A New Machine Learning Algorithm for Users’ Movie Recommendation

Conference: CIBDA 2022 - 3rd International Conference on Computer Information and Big Data Applications
25.03.2022 - 27.03.2022 in Wuhan, China

Proceedings: CIBDA 2022

Pages: 4, Language: English, Type: PDF

Authors:
Liu, Zhongyuan (Faculty of Rail Transit, Wuxi University, Wuxi, China)
Wang, Xuefei (Shanghai Fuxing Senior High School, Shanghai, China)
Zhu, Hongzheng (College of Computer and Information Science, Faculty of Automation, Southwest University, Chongqing, China)

Abstract:
In this information era, even a movie lover may feel overwhelmed by the sheer number of movies on a movie website, so an efficient movie recommendation system is needed. This paper compares two recommendation systems, one based on user-based collaborative filtering and the other on content-based collaborative filtering. Several related studies were consulted, and a large dataset from MovieLens was used. In pre-processing, the data was transformed into a matrix of each user's ratings for the corresponding movies, so that each row of the rating matrix serves as that user's feature vector. The similarity between users was then obtained by computing the cosine similarity between these vectors. Next, the K-Nearest Neighbor (KNN) algorithm was used to select a fixed number of users most similar to the target user. Their ratings were used to predict how the target would rate each movie, and the movie with the highest predicted score was recommended to the target. For content-based collaborative filtering, the films in each category were ranked by the average and the number of their ratings, and each user's favorite movie genres were obtained from a user-genre preference matrix built from the scores that the user gave to movies of each genre. The highest-scoring movies in each of the target user's favorite genres were then recommended to the target. Experimental results indicate that both algorithms generally meet users' needs, despite differences in their results.
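The user-based pipeline described in the abstract (rating matrix, cosine similarity, KNN selection, score prediction) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The toy rating matrix is invented rather than taken from MovieLens, the function names are hypothetical, and the similarity-weighted average used for prediction is an assumption, since the abstract does not state the exact prediction formula.

```python
import math

# Toy rating matrix (hypothetical data, not MovieLens): rows are users,
# columns are movies, and 0 means "not rated".
RATINGS = [
    [5, 4, 0, 1, 0],  # user 0 (the target in the example below)
    [5, 5, 4, 1, 0],  # user 1
    [4, 5, 5, 0, 1],  # user 2
    [1, 0, 2, 5, 5],  # user 3
]


def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(ratings, target, k):
    """Return the k users most similar to `target` as (similarity, user) pairs."""
    sims = [
        (cosine_similarity(ratings[target], row), user)
        for user, row in enumerate(ratings)
        if user != target
    ]
    return sorted(sims, reverse=True)[:k]


def predict_ratings(ratings, target, k):
    """Predict scores for movies the target has not rated.

    Assumption: the prediction is a similarity-weighted average over the
    neighbors who actually rated the movie.
    """
    neighbors = nearest_neighbors(ratings, target, k)
    predictions = {}
    for movie, own_rating in enumerate(ratings[target]):
        if own_rating != 0:
            continue  # already rated, nothing to predict
        rated_by = [(sim, user) for sim, user in neighbors if ratings[user][movie] != 0]
        total = sum(sim for sim, _ in rated_by)
        if total > 0:
            weighted = sum(sim * ratings[user][movie] for sim, user in rated_by)
            predictions[movie] = weighted / total
    return predictions


def recommend(ratings, target, k=2):
    """Index of the unrated movie with the highest predicted score, or None."""
    predictions = predict_ratings(ratings, target, k)
    if not predictions:
        return None
    return max(predictions, key=predictions.get)
```

In this toy example, `recommend(RATINGS, 0, k=2)` selects users 1 and 2 as the nearest neighbors of user 0 and recommends the movie at index 2, which both neighbors rated highly. Treating 0 as "not rated" is a simplification; a real pipeline would more likely use a sparse representation or mean-centered ratings.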