Abstract
In this paper we propose a method for developing a recommender system for movie series. The system is content-based, and each movie series is represented by its scripts. We experiment with several semantic similarity measures, lexico-morphological metrics, keywords, and vector space models to retrieve similar movie series. The evaluation is conducted as an experiment with informants. The best results are achieved by the distributional semantic approach (i.e., using word2vec).
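To make the content-based idea concrete, the following is a minimal sketch of one vector space approach: each series is represented by a TF-IDF vector built from its script text, and series are ranked by cosine similarity. It is an illustration only, not the implementation used in the paper. The toy scripts, the function names (tokenize, tfidf_vectors, cosine, most_similar), and the smoothed IDF formula are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [t for t in text.lower().split() if t.isalpha()]


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict: term -> weight) for every document.

    The smoothed IDF used here, log((1 + n) / (1 + df)) + 1, is an assumption
    for this sketch; the paper does not specify its weighting scheme here.
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        term_freq = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, vectors, top_n=3):
    """Rank all other series by cosine similarity to the target series."""
    scores = [
        (other, cosine(vectors[target], vectors[other]))
        for other in vectors
        if other != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical toy "scripts"; real scripts would be full episode texts.
scripts = {
    "space_drama": "the captain orders the ship to jump to the next star system",
    "space_comedy": "the clumsy captain crashes the ship into a star",
    "cooking_show": "the chef adds salt and butter to the sauce",
}
vectors = tfidf_vectors(scripts)
ranking = most_similar("space_drama", vectors)
print(ranking)
```

With these toy inputs the space-themed comedy ranks above the cooking show for the space drama, because the two space scripts share more distinctive terms. A distributional approach such as word2vec would replace the sparse TF-IDF vectors with dense vectors learned from the scripts, while the ranking step could stay the same.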
Notes
1. We also experimented with other tools for training word embeddings, such as FastText, and tried larger pre-trained word embedding models provided by RusVectōrēs and the Russian Distributional Thesaurus. In both cases the results were worse than those achieved with word2vec trained on the movie series scripts. These results are not reported in this paper due to space limits.
References
Gurbanov, T.: Non-personalized recommendations: method of associations. https://habrahabr.ru/post/257903/. Accessed 1 May 2018
Roizner, M.: How recommender systems work. https://habrahabr.ru/company/dca/blog/280700/. Accessed 1 May 2018
Ricci, F., Rokach, L., Shapira, B.: Introduction to Recommender Systems Handbook. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook, pp. 1–29 (2011). ISBN 978-0-387-85819-7, https://doi.org/10.1007/978-0-387-85820-3
Lops, P., de Gemmis, M., Semeraro, G.: Content-based recommender systems: state of the art and trends. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P.B. (eds.) Recommender Systems Handbook. ISBN 978-0-387-85819-7, pp. 73–100 (2010). https://doi.org/10.1007/978-0-387-85820-3_3
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS’13 Proceedings of the 26th International Conference on Neural Information Processing Systems, vol. 2, pp. 3111–3119 (2013)
Tambovcev, Y., Tambovceva, A., Tambovceva, L.: Typology of linguistic units distribution in text as a factor in author profiling task. Vestnik Omskogo universiteta 2, 88–96 (2008)
Pospelova, A., Yagunova, E.: The use of stylistic and genre characteristics to describe text collection style. Novie informacionnie tehnologii v avtomatizirovannih systemah, pp. 347–357 (2014)
Yagunova, E., Pivovarova, L.: Experimental and computational study of N.V.Gogol’ narrative stories. Struct. Funct. Stud. Russ. Linguist. 1(3), 83–104 (2014)
Wojciechowski, A., Gorzynski, K.: A method for measuring similarity of books: a step towards an objective recommender system for readers. In: Human Language Technology. Challenges for Computer Science and Linguistics, pp. 161–174 (2016). https://doi.org/10.1007/978-3-319-43808-5_13
Pronoza, E., Yagunova, E.: Low-level features for paraphrase identification. Adv. Artif. Intell. Soft Comput. 59–71 (2015). https://doi.org/10.1007/978-3-319-27060-9
Movie2Vec: Clustering movies by plot. https://movie2vec.wordpress.com/2016/03/22/clustering-movies-by-plot/. Accessed 1 May 2018
Paramonov, S.: How to write a simple recommender system. https://habrahabr.ru/post/230155/. Accessed 1 May 2018
Recommender systems: introduction to the cold start problem. https://habrahabr.ru/company/surfingbird/blog/168733/. Accessed 1 May 2018
Bordashshenko, A., Potemkin, A., Sazanova, E., Shekshuev, S.: Algorithm for the search of similar media reports. Int. J. “Naukovedenie” 7 (2015). ISSN 2223-5167
Myslín, M., Levy, R.: Codeswitching and predictability of meaning in discourse. Language 91(4), 871–905 (2015). https://doi.org/10.1353/lan.2015.0068
Song, Y., Roth, D.: Unsupervised sparse vector densification for short text similarity. In: NAACL HLT 2015—2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, pp. 1275–1280 (2015)
MacKay, D.: Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge (2003)
Manning, C., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Williams (2014). ISBN 978-5-8459-1623-5
Scripted Originals Hit Record 455 in 2016, FX Study Finds. https://www.hollywoodreporter.com/live-feed/scripted-originals-hit-record-455-2016-fx-study-finds-958337. Accessed 1 May 2018
Era of Peak TV Continues With 487 Scripted Shows in 2017. https://www.wsj.com/articles/era-of-peak-tv-continues-with-487-scripted-shows-in-2017-1515182593. Accessed 1 May 2018
Best movie series. https://www.kinopoisk.ru/top/lists/45/. Accessed 1 May 2018
The most popular movie series in Kinopoisk. https://www.kinopoisk.ru/top/lists/257/. Accessed 1 May 2018
Gensim. https://radimrehurek.com/gensim/. Accessed 1 May 2018
RusVectōrēs: Russian semantic models. http://rusvectores.org/ru/. Accessed 1 May 2018
Russian Distributional Thesaurus. https://nlpub.ru/Russian_Distributional_Thesaurus. Accessed 1 May 2018
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
word2vec. https://code.google.com/archive/p/word2vec/. Accessed 1 May 2018
Hierarchical clustering. https://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html. Accessed 1 May 2018
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A Density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 226–231 (1996)
Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315 (2007). https://doi.org/10.1126/science.1136800
Ward Jr., J.H.: Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244 (1963). https://doi.org/10.2307/2282967
AffinityPropagation. http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html. Accessed 1 May 2018
A Appendix
See Table 3.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Danil, B., Elena, Y., Ekaterina, P. (2018). Similarity Measures and Models for Movie Series Recommender System. In: Bodrunova, S. (ed.) Internet Science. INSCI 2018. Lecture Notes in Computer Science, vol 11193. Springer, Cham. https://doi.org/10.1007/978-3-030-01437-7_15
DOI: https://doi.org/10.1007/978-3-030-01437-7_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01436-0
Online ISBN: 978-3-030-01437-7
eBook Packages: Computer Science, Computer Science (R0)