Article | Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16 | Using Topic Models in Content-Based News Recommender Systems | Linköping University Electronic Press Conference Proceedings
Title:
Using Topic Models in Content-Based News Recommender Systems
Author:
Tapio Luostarinen: Comtra Oy, Savonlinna, Finland
Oskar Kohonen: Aalto University School of Science, Department of Information and Computer Science, Finland
Year:
2013
Conference:
Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Issue:
085
Article no.:
022
Pages:
239-251
No. of pages:
13
Publication type:
Abstract and Fulltext
Published:
2013-05-17
ISBN:
978-91-7519-589-6
Series:
Linköping Electronic Conference Proceedings
ISSN (print):
1650-3686
ISSN (online):
1650-3740
Series:
NEALT Proceedings Series
Publisher:
Linköping University Electronic Press; Linköpings universitet


We study content-based recommendation of Finnish news in a system with a very small group of users. We compare three standard methods, Naïve Bayes (NB), K-Nearest Neighbor (kNN) Regression, and Regularized Linear Regression, in a novel online simulation setting and in a cold-start simulation. We also apply Latent Dirichlet Allocation (LDA) to the large corpus of news and compare the learned features to those found by Singular Value Decomposition (SVD). Our results indicate that Naïve Bayes is the worst of the three models. K-Nearest Neighbor performs consistently well across input features. Regularized Linear Regression generally performs worse than kNN, but reaches performance similar to kNN with some features. With LDA features, Regularized Linear Regression gains statistically significant improvements over the word features, both on the full data set and in the cold-start simulation. In the cold-start simulation, we find that LDA gives statistically significant improvements for all the methods.

Keywords: Recommender Systems; Content-Based Recommendation; Topic Models; Latent Dirichlet Allocation; Cold-start
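
To make the kNN regression setting in the abstract more concrete, the following is a minimal sketch of predicting a user's rating for an unseen article from that user's rated articles, each represented by a topic-proportion vector such as LDA might produce. It is not the authors' implementation: the cosine similarity, the similarity-weighted average, the value of k, the 0/1 ratings, and the three-topic toy vectors are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_predict(rated, query, k=3):
    """Predict a rating as the similarity-weighted mean over the k nearest rated articles.

    rated: list of (feature_vector, rating) pairs for a single user.
    query: feature vector of the article to score.
    """
    if not rated:
        raise ValueError("need at least one rated article")
    scored = sorted(
        ((cosine(vec, query), rating) for vec, rating in rated),
        key=lambda pair: pair[0],
        reverse=True,
    )
    neighbours = scored[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0.0:
        # No neighbour overlaps with the query at all: fall back to an unweighted mean.
        return sum(rating for _, rating in neighbours) / len(neighbours)
    return sum(sim * rating for sim, rating in neighbours) / total


# Hypothetical topic proportions over three topics; rating 1.0 = liked, 0.0 = not liked.
rated = [
    ([0.8, 0.1, 0.1], 1.0),
    ([0.7, 0.2, 0.1], 1.0),
    ([0.1, 0.1, 0.8], 0.0),
    ([0.0, 0.2, 0.8], 0.0),
]
near_liked = [0.9, 0.05, 0.05]
near_disliked = [0.05, 0.05, 0.9]

score_liked = knn_predict(rated, near_liked, k=2)
score_disliked = knn_predict(rated, near_disliked, k=2)
score_mixed = knn_predict(rated, near_liked, k=4)
```

With only a handful of rated articles per user, as in a cold-start situation, dense low-dimensional vectors like these tend to overlap more often than sparse word counts, which is one plausible reason topic features could help; the sketch itself does not demonstrate that effect.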
