Content-based collaborative filtering using word embedding: a case study on movie recommendation

LV Nguyen, TH Nguyen, JJ Jung - Proceedings of the international …, 2020 - dl.acm.org
Insufficient ratings make it hard to model user preferences effectively and to find trustworthy similar users in collaborative filtering (CF)-based recommendation systems; this is known as the cold-start problem. To address this problem and improve the efficiency of recommendation systems, we propose a new content-based CF approach based on item similarity. We apply the model in the movie domain and extract features such as the genres, directors, actors, and plots of the movies. We use the Jaccard coefficient to convert the extracted genre, director, and actor features to vectors, while the plot feature is converted to semantic vectors. The similarity of the movies is then calculated with the soft cosine measure over the vectorized features. We apply a word embedding model (i.e., Word2Vec) to represent the plot feature as semantic vectors instead of using traditional models such as a binary bag of words or a TF-IDF vector space. Experimental results show that the proposed system outperforms the baseline systems in terms of accuracy, precision, recall, and F1 score under cold-start conditions.
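The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy movie data, and the hand-written term-similarity matrix are all assumptions made for illustration. In the paper's setting, the term-similarity values would presumably come from Word2Vec embeddings rather than being typed in by hand.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty sets (an assumption)
    return len(a & b) / len(a | b)


def soft_cosine(x, y, s):
    """Soft cosine measure of vectors x and y under term-similarity matrix s.

    s[i][j] is the similarity between term i and term j. With s equal to
    the identity matrix this reduces to the ordinary cosine similarity.
    """
    def form(u, v):
        # Bilinear form: sum over i, j of s[i][j] * u[i] * v[j]
        return sum(s[i][j] * u[i] * v[j]
                   for i in range(len(u)) for j in range(len(v)))

    denom = sqrt(form(x, x)) * sqrt(form(y, y))
    return form(x, y) / denom if denom else 0.0


# Hypothetical toy data: categorical features of two movies.
movie_a = {"genres": {"Action", "Sci-Fi"}, "actors": {"A", "B"}}
movie_b = {"genres": {"Action", "Thriller"}, "actors": {"B", "C"}}

genre_sim = jaccard(movie_a["genres"], movie_b["genres"])
actor_sim = jaccard(movie_a["actors"], movie_b["actors"])

# Hypothetical two-term vocabulary whose terms are assumed to be 0.5 similar.
term_similarity = [[1.0, 0.5],
                   [0.5, 1.0]]
plot_sim = soft_cosine([1, 0], [0, 1], term_similarity)

print(genre_sim, actor_sim, plot_sim)
```

With the identity matrix the two plot vectors above would be orthogonal and score zero. The off-diagonal similarity is what lets the soft cosine measure credit related but non-identical terms.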