
Movie Recommendations


Department of Electronics and Communication Engineering

(NBA Accredited)

MATRUSRI ENGINEERING COLLEGE


(Sponsored by Matrusri Education Society, Estd. 1980)
(Approved by AICTE, Affiliated to Osmania University)
#16-1-486, Saidabad, Hyderabad, Telangana-500 059
www.matrusri.edu.in

2020-2021
SUMMER INTERNSHIP REPORT
A REPORT SUBMITTED IN PARTIAL FULFILLMENT OF THE

REQUIREMENTS FOR THE AWARD OF DEGREE OF

BACHELOR OF ENGINEERING

IN

ELECTRONICS & COMMUNICATION ENGINEERING

BY

VENNELA (1608-17-735-054)

Under the Supervision of


Mr. P.RAVI KUMAR
M.E, (Ph. D)
Assistant Professor

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

Date:

Certificate

This is to certify that the “SUMMER INTERNSHIP REPORT” submitted by
Mr/Ms. Vennela, Roll No. 1608-17-735-054, is the work done by him/her and submitted
during the 2020-21 academic year, in partial fulfillment of the requirements for the award of
the degree of Bachelor of Engineering in Electronics and Communication Engineering of
Osmania University, Hyderabad.

M. Ravi Kumar
Internship Coordinator Head of the Department

ACKNOWLEDGEMENT

Firstly, I earnestly thank our Principal, Dr. D. Hanumantha Rao sir, for giving us this
incredible opportunity of doing an internship even during these unprecedented times.
I offer my gratitude to our HOD, Dr. N. Srinivasa Rao sir, for his constructive criticism,
which helped me all through the internship.
It gives me immense pleasure to thank Bapuji sir (Director of Appleton
Innovations) for guiding us through the crux of machine learning and helping us with
the projects.
I express my sincere thanks to the Internship Coordinator, P. Ravi Kumar sir, and
all my beloved teachers for their constant support and guidance.
I also thank my fellow scribes Jesseica, Vaishnavi, Preethi Sriya and Divya Sree
for extending their help and bolstering me.
I also express my indebtedness to my parents and extended family, without whom this
wouldn’t have been possible.

Name of the student

(Vennela)

ABSTRACT/EXECUTIVE SUMMARY

Problem: The advent of movie streaming services has made thousands of movies just a
click away. We now have movies not only from Hollywood, but also from international
cinema, documentaries, indie movies, etc. With so many movies at hand, the consumer
faces the dilemma of what to watch. At the end of the day, people just want to relax and
watch something that resonates with their mood, taste and style.

Methodology: Commercial streaming services such as Netflix and Amazon combine
semantic information about movies with user ratings to build an optimal hybrid
recommendation system. However, they still depend on human taggers for the basic feature
representations needed to classify movies or songs. Although the results
obtained from human taggers are quite good, such an approach is not scalable
when tagging hundreds and thousands of movies or millions of daily generated videos.
For a system to understand a movie, it needs movie features such as cast,
genre, plot, etc. With this information, a system can categorize movies better.
User-written movie reviews are one such source of features. They carry a substantial amount
of movie-related information such as location, time period, genre, lead characters and
memorable scene descriptions.

Solution: Three main approaches are used for our recommender systems. The first is
demographic filtering, which offers generalized recommendations to every user based
on movie popularity and genre. The system recommends the same movies to users with
similar demographic features. Since each user is different, this approach is considered to
be too simple. The basic idea behind it is that movies that are more popular and
critically acclaimed will have a higher probability of being liked by the average audience.
The second is content-based filtering, where we try to profile the user's interests using
the information collected and recommend items based on that profile. The third is
collaborative filtering, where we try to group similar users together and use information
about the group to make recommendations to the user.

INDEX

SUMMER INTERNSHIP REPORT
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT/EXECUTIVE SUMMARY
    Problem
    Methodology
    Solution
1. INDUSTRY/ORGANIZATION PROFILE
2. INTERNSHIP OBJECTIVES/LEARNING OBJECTIVES
3. INTRODUCTION
4. TOOLS EMPLOYED
5. TYPES OF RECOMMENDATION SYSTEMS
    5.1 DEMOGRAPHIC FILTERING
    5.2 CONTENT-BASED FILTERING
    5.3 COLLABORATIVE FILTERING
        i. User based collaborative filtering
        ii. Item based collaborative filtering
6. RESULTS
7. CONCLUSION
8. SKILLS ACQUIRED
9. WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES
10. BIBLIOGRAPHY

1. INDUSTRY/ORGANIZATION PROFILE

Appleton Innovations is a technology-driven start-up founded by alumni of IIT Mumbai.
Built on the foundation pillars Imagine, Innovate and Invent, Appleton Innovations is a
technical training service company which aims at imparting industry skills to students in
colleges and to the corporate and government sectors. Appleton Innovations has a
dedicated Research & Development team doing research in IoT, solar smart energy
systems and data analytics. Its areas of research include smart devices, IoT and
power solutions for smart cities.

VISION

“The company’s aim is to empower millions of students across the globe with the skills
required for the industry.”

MISSION

“They intend to create more than 20,000 young engineers across the country with
special focus on opportunities in Internet of Things (IoT), solar smart energy systems
and data analytics.”

2. INTERNSHIP OBJECTIVES/LEARNING OBJECTIVES
The main objective of an internship is to expose the student to a particular job and a profession
or industry. While we might have an idea of what a job is like, we won't know whether
it is what we thought until we actually perform it.
The main objectives of this internship are:

● To improve data processing skills with machine learning algorithms.

● To analyze tools like Python and its libraries, statistics and probability
theory, which are used by data scientists globally to solve laborious problems.

● To perform Pandas Profiling for visualizing the data.

● To learn how to determine and measure program complexity.

● Python programming with the ML libraries scikit-learn, NumPy, Matplotlib and Pandas.

● Statistical math for the algorithms.

● Supervised and unsupervised learning.

● Classification and regression ML algorithms.

● Machine learning programming and use cases.

3. INTRODUCTION
A recommendation system is a type of information filtering system which attempts to
predict the preferences of a user and makes suggestions based on those preferences. There
is a wide variety of applications for recommendation systems. They have become
increasingly popular over the last few years and are now used in most of the online
platforms that we use. The content of such platforms varies from movies, music, books
and videos, to friends and stories on social media platforms, to products on e-commerce
websites, to people on professional and dating websites, to search results
returned on Google.

Often, these systems are able to collect information about a user's
choices and use this information to improve their suggestions in the future. For example,
Facebook can monitor your interaction with various stories on your feed in order to learn
what types of stories appeal to you. Sometimes, recommender systems can make
improvements based on the activities of a large number of people. For example, if
Amazon observes that a large number of customers who buy the latest Apple MacBook
also buy a USB-C to USB adapter, it can recommend the adapter to a new user who
has just added a MacBook to their cart.

Due to the advances in recommender systems,
users constantly expect good recommendations. They have a low tolerance for services
that are not able to make appropriate suggestions. If a music streaming app is not able to
predict and play music that the user likes, the user will simply stop using it. This has
led tech companies to place a high emphasis on improving their recommendation systems.
However, the problem is more complex than it seems. Every user has different
preferences and likes. In addition, even the taste of a single user can vary depending on a
large number of factors, such as mood, season or the type of activity the user is doing. For
example, the type of music one would like to hear while exercising differs greatly from
the type of music one would listen to when cooking dinner. Another issue that
recommendation systems have to solve is the exploration vs. exploitation problem. They
must explore new domains to discover more about the user while still making the most of
what is already known about the user.

Three main approaches are used for our
recommender systems. The first, demographic filtering, offers generalized
recommendations to every user based on movie popularity or genre. The system
recommends the same movies to users with similar demographic features. Since each
user is different, this approach is considered to be too simple. The basic idea behind this
system is that movies that are more popular and critically acclaimed will have a higher
probability of being liked by the average audience. The second is content-based filtering,
where we try to profile the user's interests using the information collected and recommend
items based on that profile. The third is collaborative filtering, where we try to group
similar users together and use information about the group to make recommendations to
the user.

4. TOOLS EMPLOYED

1. Python

Python is an interpreted, high-level, general-purpose programming language. It was
created by Guido van Rossum and first released in 1991. Python's design philosophy
emphasizes code readability with its notable use of significant whitespace. Its language
constructs and object-oriented approach aim to help programmers write clear, logical code for
small and large-scale projects. Python is dynamically typed and garbage-collected. It
supports multiple programming paradigms, including procedural, object-oriented, and
functional programming.
● Interpreted
In Python there are no separate compilation and execution steps as in C/C++. It
runs the program directly from the source code. Internally, Python converts the
source code into an intermediate form called bytecode, which is then executed
by the Python virtual machine on the specific computer.
● Platform Independent
Python programs can be developed and executed on multiple operating system
platforms. Python can be used on Linux, Windows, Macintosh, Solaris and many
more.
● Multi-Paradigm
Python is a multi-paradigm programming language. Object-oriented programming
and structured programming are fully supported, and many of its features support
functional programming and aspect-oriented programming.
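As a small illustration of the "Interpreted" point above, the standard-library dis module can list the bytecode instructions that CPython generates for a function. This is only a sketch: the function add is a hypothetical example, and the exact instruction names vary between Python versions.

```python
import dis


def add(a, b):
    """Hypothetical example function used only to inspect its bytecode."""
    return a + b


# The compiled code object holds the raw bytecode as a bytes value.
bytecode = add.__code__.co_code

# dis.get_instructions yields one Instruction object per bytecode operation.
instructions = [ins.opname for ins in dis.get_instructions(add)]

print(type(bytecode).__name__, len(bytecode))
print(instructions)
```

The source file is never translated by hand: the interpreter produces this bytecode automatically the first time the function definition is executed.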
2. Jupyter Notebook
The Jupyter Notebook is an open-source web application that allows us to create and
share documents that contain live code, equations, visualizations and narrative text. Its
uses include data cleaning and transformation, numerical simulation, statistical modeling,
data visualization, machine learning, and many more.

5. TYPES OF RECOMMENDATION SYSTEMS

5.1 DEMOGRAPHIC FILTERING


Generalized recommendations are offered to every user based on movie popularity and
genre. The system recommends the same movies to users with similar demographic
features. Since each user is different, this approach is considered to be too simple. The
basic idea behind this system is that movies that are more popular and critically
acclaimed will have a higher probability of being liked by the average audience.
To build this recommender, we need to:

● Choose a metric to score or rate each movie.

● Calculate the score for every movie.

● Sort the scores and recommend the best-rated movies to the users.

We could use the average rating of a movie as the score, but this would not be fair,
since a movie with an 8.9 average rating and only 3 votes cannot be considered
better than a movie with a 7.8 average rating but 40 votes. So, we'll be using
IMDB's weighted rating (WR), which is given as:

$$WR = \frac{v}{v+m} \cdot R + \frac{m}{v+m} \cdot C$$

where

● v is the number of votes for the movie;

● m is the minimum number of votes required to be listed in the chart;

● R is the average rating of the movie; and

● C is the mean vote across the whole dataset.
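The scoring, filtering and sorting steps described above can be sketched in a few lines of plain Python. The movie list, the titles and the threshold m = 20 are assumptions made only for this illustration (the first two entries reuse the 8.9/3-votes and 7.8/40-votes example from the text). The report's own implementation is not shown here and may differ.

```python
# Hypothetical toy data: (title, vote_count, vote_average). Not from the report.
movies = [
    ("Movie A", 3, 8.9),
    ("Movie B", 40, 7.8),
    ("Movie C", 25, 6.5),
    ("Movie D", 60, 7.1),
]


def weighted_rating(v, R, m, C):
    """IMDB-style weighted rating: (v / (v + m)) * R + (m / (v + m)) * C."""
    return (v / (v + m)) * R + (m / (v + m)) * C


# C: mean rating across all movies in the data.
C = sum(avg for _, _, avg in movies) / len(movies)
# m: minimum number of votes required to be listed (assumed value for this sketch).
m = 20

# Keep only movies with at least m votes, score them, and sort best-first.
qualified = [(title, v, R) for title, v, R in movies if v >= m]
chart = sorted(
    ((title, weighted_rating(v, R, m, C)) for title, v, R in qualified),
    key=lambda pair: pair[1],
    reverse=True,
)

for title, score in chart:
    print(f"{title}: {score:.3f}")
```

With this threshold, "Movie A" never reaches the chart despite its high average, because three votes are too few to qualify.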

[Figure: Demographic Filtering]
5.2 CONTENT-BASED FILTERING SYSTEMS:
In content-based filtering, items are recommended based on comparisons between
the item profile and the user profile. A user profile is content that is found to be relevant to
the user, in the form of keywords (or features). It can be seen as a set of
keywords (terms, features) collected by the algorithm from items the user found
relevant (or interesting). The set of keywords (or features) of an item is its
item profile. For example, consider a scenario in which a person goes to a pastry shop to buy his
favorite cake ‘X’. Unfortunately, cake ‘X’ has been sold out, so
the shopkeeper recommends that the person buy cake ‘Y’, which is made
of ingredients similar to those of cake ‘X’. This is an instance of content-based filtering.

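We will be using cosine similarity to calculate a numeric quantity that denotes the
similarity between two movies. We use the cosine similarity score since it is
independent of magnitude and is relatively easy and fast to calculate. Mathematically, for
the feature vectors x and y of two movies, it is defined as follows:

$$\text{similarity}(x, y) = \cos\theta = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2} \, \sqrt{\sum_{i=1}^{n} y_i^2}}$$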
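To make the "independent of magnitude" property concrete, here is a minimal standard-library sketch of cosine similarity. The three feature vectors are hypothetical (they could be word counts, for instance). Returning 0.0 for an all-zero vector is a convention chosen for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


# Hypothetical feature vectors for three movies.
movie_x = [1, 2, 0, 1]
movie_y = [2, 4, 0, 2]  # same direction as movie_x, twice the magnitude
movie_z = [0, 0, 3, 0]  # shares no features with movie_x

print(cosine_similarity(movie_x, movie_y))  # parallel vectors: close to 1
print(cosine_similarity(movie_x, movie_z))  # orthogonal vectors: 0
```

Doubling every component of a vector leaves its score unchanged, which is why a long and a short description of the same kind of movie can still be rated as similar.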

We are now in a good position to define our recommendation function. These
are the steps we'll follow:

● Get the index of the movie given its title.

● Get the list of cosine similarity scores for that particular movie with all movies.
Convert it into a list of tuples where the first element is its position and the second
is the similarity score.

● Sort the aforementioned list of tuples based on the similarity scores; that is, the
second element.

● Get the top 10 elements of this list. Ignore the first element as it refers to
self (the movie most similar to a particular movie is the movie itself).

● Return the titles corresponding to the indices of the top elements.
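The five steps above can be sketched as follows. This is an illustrative standard-library version, not the report's code: the four-movie catalogue, its descriptions and the simple word-count vectors are assumptions, and only the names get_recommendations and cosine_sim echo names mentioned in the report.

```python
import math
from collections import Counter

# Hypothetical mini-catalogue: title -> short description (not from the report).
catalogue = {
    "Space Quest": "astronaut crew explores distant planet",
    "Star Voyage": "space crew travels to a distant star",
    "Love in Paris": "two strangers fall in love in paris",
    "Paris Nights": "romance and love on the streets of paris",
}
titles = list(catalogue)
indices = {title: i for i, title in enumerate(titles)}


def cosine(c1, c2):
    """Cosine similarity between two word-count Counters."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


# Pairwise similarity matrix over simple bag-of-words vectors.
counts = [Counter(catalogue[t].split()) for t in titles]
cosine_sim = [[cosine(a, b) for b in counts] for a in counts]


def get_recommendations(title, cosine_sim=cosine_sim, top_n=10):
    idx = indices[title]                                      # 1. index of the movie
    sim_scores = list(enumerate(cosine_sim[idx]))             # 2. (position, score) tuples
    sim_scores.sort(key=lambda pair: pair[1], reverse=True)   # 3. sort by score
    sim_scores = sim_scores[1:top_n + 1]                      # 4. skip the movie itself
    return [titles[i] for i, _ in sim_scores]                 # 5. map back to titles


print(get_recommendations("Space Quest", top_n=1))  # ['Star Voyage']
```

Skipping the first element assumes that no other movie has a description identical to the query movie. A stricter version would drop the entry whose position equals idx.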

ADVANTAGES OF CONTENT-BASED FILTERING ARE:

● They are capable of recommending unrated items.
● We can easily explain the working of the recommender system by listing the content
features of an item.
● Content-based recommender systems need only the ratings of the concerned
user and not those of any other user of the system.

DISADVANTAGES OF CONTENT-BASED FILTERING ARE:

● It does not work for a new user who has not rated any item yet, as enough ratings
are required before a content-based recommender can evaluate the user's preferences and
provide accurate recommendations.
● No recommendation of serendipitous items.
● Limited content analysis.

5.3 COLLABORATIVE FILTERING BASED SYSTEMS:
Our content-based engine suffers from some severe limitations. It is only capable of
suggesting movies which are close to a certain movie. That is, it is not capable of
capturing tastes and providing recommendations across genres.
Also, the engine that we built is not really personal, in that it doesn't capture the
personal tastes and biases of a user. Anyone querying our engine for
recommendations based on a movie will receive the same recommendations for that
movie, regardless of who she/he is.
Therefore, in this section, we will use a technique called collaborative filtering to
make recommendations to movie watchers. It is of two basic types:

A) USER BASED FILTERING - These systems recommend products to a
user that similar users have liked. For measuring the similarity between two users we
can use either Pearson correlation or cosine similarity. This filtering technique can be
illustrated with an example. In the following matrix, each row represents a user, while
the columns correspond to different movies, except the last one, which records the
similarity between that user and the target user. Each cell represents the rating that the
user gives to that movie. Assume user E is the target.
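A minimal sketch of user-based collaborative filtering with Pearson correlation is given below. The rating values are hypothetical and are not the ones from the report's matrix; only the user labels A to E and the choice of E as target follow the text. Using only positively correlated neighbours in the weighted average is a simplifying assumption of this sketch.

```python
import math

# Hypothetical ratings (user -> {movie: rating}); E is the target user.
ratings = {
    "A": {"M1": 5, "M2": 4, "M3": 1, "M4": 4},
    "B": {"M1": 4, "M2": 5, "M3": 2, "M4": 5},
    "C": {"M1": 1, "M2": 2, "M3": 5, "M4": 1},
    "D": {"M1": 2, "M2": 1, "M3": 4, "M4": 2},
    "E": {"M1": 5, "M2": 5, "M3": 1},  # E has not rated M4
}


def pearson(u, v):
    """Pearson correlation between two users over the movies both have rated."""
    common = sorted(ratings[u].keys() & ratings[v].keys())
    if len(common) < 2:
        return 0.0
    x = [ratings[u][m] for m in common]
    y = [ratings[v][m] for m in common]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mean_x) ** 2 for a in x)) * math.sqrt(
        sum((b - mean_y) ** 2 for b in y)
    )
    return num / den if den else 0.0


def predict(target, movie):
    """Similarity-weighted average of the ratings given by similar users."""
    num = den = 0.0
    for other in ratings:
        if other == target or movie not in ratings[other]:
            continue
        sim = pearson(target, other)
        if sim > 0:  # ignore users whose taste is uncorrelated or opposite
            num += sim * ratings[other][movie]
            den += sim
    return num / den if den else None


for user in "ABCD":
    print(user, round(pearson("E", user), 3))
print("Predicted rating of M4 for E:", round(predict("E", "M4"), 2))
```

In this toy data, users A and B rate like E and therefore drive the prediction, while C and D have the opposite taste and are ignored.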

Although computing user-based CF is very simple, it suffers from several problems.
One main issue is that users' preferences can change over time. This means that
precomputing the matrix based on their neighboring users may lead to bad performance.
To tackle this problem, we can apply item-based CF.

[Figure: User Based Filtering]

B) ITEM BASED COLLABORATIVE FILTERING - Instead of
measuring the similarity between users, item-based CF recommends items based on
their similarity to the items that the target user has rated. Likewise, the similarity can
be computed with Pearson correlation or cosine similarity. The major difference is that,
with item-based collaborative filtering, we fill in the blanks vertically, as opposed to the
horizontal manner of user-based CF.
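The item-based variant can be sketched in the same style. Again the ratings are hypothetical, and this sketch picks cosine similarity (one of the two measures named above), computed over the users who rated both items. The missing rating is then estimated from the target user's own ratings of similar items.

```python
import math

# Hypothetical ratings (user -> {movie: rating}); E is the target user.
ratings = {
    "A": {"M1": 5, "M2": 4, "M3": 1, "M4": 4},
    "B": {"M1": 4, "M2": 5, "M3": 2, "M4": 5},
    "C": {"M1": 1, "M2": 2, "M3": 5, "M4": 1},
    "D": {"M1": 2, "M2": 1, "M3": 4, "M4": 2},
    "E": {"M1": 5, "M2": 5, "M3": 1},  # E has not rated M4
}


def item_similarity(i, j):
    """Cosine similarity between two movies over users who rated both."""
    users = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not users:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in users)
    norm_i = math.sqrt(sum(ratings[u][i] ** 2 for u in users))
    norm_j = math.sqrt(sum(ratings[u][j] ** 2 for u in users))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0


def predict(user, item):
    """Weight the user's own ratings by how similar each rated movie is to item."""
    num = den = 0.0
    for rated_item, rating in ratings[user].items():
        sim = item_similarity(item, rated_item)
        num += sim * rating
        den += sim
    return num / den if den else None


for movie in ("M1", "M2", "M3"):
    print(movie, round(item_similarity("M4", movie), 3))
print("Predicted rating of M4 for E:", round(predict("E", "M4"), 2))
```

Because item-to-item similarities depend on all users rather than on one user's neighbours, they tend to change more slowly and can be precomputed, which is the motivation given above for preferring item-based CF.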

[Figure: Item based collaborative system]

ADVANTAGES OF COLLABORATIVE FILTERING BASED SYSTEMS:

● It is content-independent.
● CF recommender systems can suggest serendipitous items by observing the behavior of
similar-minded people.
● They can make a real quality assessment of items by considering other people's
experience.

DISADVANTAGES OF COLLABORATIVE FILTERING ARE:

● Early rater problem: Collaborative filtering systems cannot provide
recommendations for new items since there are no user ratings on which to base a
prediction.
● Gray sheep: For a CF-based system to work, groups with similar
characteristics are needed. Even if such groups exist, it is very difficult to
make recommendations to users who do not consistently agree or disagree with these groups.
● Sparsity problem: In most cases, the number of items exceeds the number of users
by a great margin, which makes it difficult to find items that are rated by enough
people.

6. RESULTS

1. Demographic Filtering

[Figure: Demographic Output]

2. Content-based Filtering Systems

[Figure: Content Based Output_1]

The output of the get_recommendations() function when passing in the new cosine_sim2
matrix is:

[Figure: Content Based Output_2]

3. Collaborative filtering based systems

[Figure: Collaborative Based Output_1]

[Figure: Collaborative Based Output_2]

7. CONCLUSION

A hybrid approach between content-based filtering and collaborative filtering is taken to
implement the system. This approach overcomes the drawbacks of each individual
algorithm and improves the performance of the system. Techniques like clustering,
similarity and classification are used to get better recommendations, thus reducing mean
absolute error and increasing precision and accuracy. In future, we can work on a hybrid
recommender using clustering and similarity for better performance. Our approach can be
further extended to other domains to recommend songs, videos, venues, news, books,
tourism, e-commerce products, etc.
In this project, we developed a prototype system for extracting movie features, i.e.
topics. We trained a model on a collection of movie reviews and used the trained model
to find similar movies. Evaluation results show that such an approach gives good results
even with a small movie collection. The results show that movie topics are efficient
features, as they perform fairly well in capturing movie genre and mood.
Movie plot results are somewhat satisfactory but need descriptive plot information and
better methods that can capture the storyline. Our small movie corpus resulted in
very little overlap between actors. The topics as an explanation in movie recommendation
are quite useful but need to be fine-tuned with the ability to rate individual topics.
User-rated movie topics could be used as feedback to the system. Finally, movie topics
are efficient features for movie recommendation systems as they represent the semantic
patterns behind movies. With user movie reviews as data, movie topics capture the
essential movie aspects such as genre and mood. Our prototype approach to feature
extraction has the potential to scale to a large number of movies.

8. SKILLS ACQUIRED

PROFESSIONAL SKILLS
● Leadership qualities
● Professionalism
● Building relationships
● Work ethics
● Self-confidence
● Building a network of would-be founders and technical specialists
● Flexibility towards work
● Ability to work independently
● Oral communication
● Problem-solving skills

SCIENTIFIC SKILLS
● Learning how to build presentations, from storyboarding to design, copywriting, and rehearsing
● Improving data processing skills with machine learning algorithms
● Python programming with the ML libraries scikit-learn, NumPy, Matplotlib and Pandas
● Analyzing tools like Python and its libraries, statistics and probability theory, which are used by data scientists globally to solve laborious problems
● Performing Pandas Profiling for visualizing the data
● Determining and measuring program complexity
● Learnt to solve problems involving statistical and mathematical concepts

9. WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES

1st week

Date        Day        Name of the topic/module completed
11/05/2020  Monday     Introduction to machine learning
                       Applications of machine learning
12/05/2020  Tuesday    Algorithms of machine learning:
                       supervised learning
                       unsupervised learning
                       reinforcement learning
13/05/2020  Wednesday  Explanation about supervised learning and its types
                       a. Classification
                       b. Regression
                       And their applications
14/05/2020  Thursday   Explained unsupervised learning and its types.
                       a. labeled data
                       b. unlabelled data
                       And their applications.
15/05/2020  Friday     Regression algorithms
                       1. Linear regression
                       2. Multiple regression
                       3. Polynomial regression
16/05/2020  Saturday   1. Explained Application Program Interface.
                       2. Introduction to Python.

2nd week

Date        Day        Name of the topic/module completed
18/05/2020  Monday     Data types in Python
                       Loops
                       Pop methods
                       Sorting technique
19/05/2020  Tuesday    Programs in Python
20/05/2020  Wednesday  Python modules
                       a. numpy
                       b. pandas
                       c. scipy
                       d. matplotlib
                       e. requests
                       f. seaborn
                       g. scikit-learn
21/05/2020  Thursday   Installation of Python modules.
22/05/2020  Friday     Creation of data frame by using numpy arrays.
                       Introduction to matplotlib online library.
23/05/2020  Saturday   Explained dictionaries in Python.
                       And difference between loc and iloc.

3rd week

Date        Day        Name of the topic/module completed
25/05/2020  Monday     Data visualization using matplotlib
                       Types of plots and their explanation.
26/05/2020  Tuesday    Basic program to plot a graph using
                       the matplotlib library.
27/05/2020  Wednesday  Introduction to seaborn module.
                       And programs using seaborn.
28/05/2020  Thursday   Explained matrix.
                       a. addition of 2 matrices
                       b. multiplication of 2 matrices.
                       c. matrix transpose.
                       d. matrix inverse.
29/05/2020  Friday     Special matrix
                       Diagonal matrix
                       Symmetric matrix
                       Orthogonal matrix.
30/05/2020  Saturday   Explained eigenvalues, and explained
                       difference between loc and iloc.

4th week

Date        Day        Name of the topic/module completed
01/06/2020  Monday     Programs using eigenvalues.
                       Eigen decomposition.
02/06/2020  Tuesday    Explanation about singular value
                       decomposition and its applications.
03/06/2020  Wednesday  Explained probability
                       a. covariance and correlation.
                       b. program for square matrix using machine
                       learning.
04/06/2020  Thursday   Reformulation.
                       Implementation of linear algebra using
                       inversion method.
05/06/2020  Friday     Introduction to scikit-learn.
                       Model fitting in supervised and unsupervised
                       learning.
06/06/2020  Saturday   Finding regression metrics using
                       a. Mean absolute error.
                       b. Mean squared error.
                       c. R² score.

5th week

Date        Day        Name of the topic/module completed
08/06/2020  Monday     Implementation of simple linear regression
                       using scikit-learn
09/06/2020  Tuesday    Prediction of Boston house price using
                       polynomial regression and ridge regression.
10/06/2020  Wednesday  Explanation about recommendations.
11/06/2020  Thursday   Converting text to numerical data.
                       Tokenizing
12/06/2020  Friday     Developing movie recommendations.
13/06/2020  Saturday   Developing movie recommendations.

6th week

Date        Day        Name of the topic/module completed
15/06/2020  Monday     Developing Zomato recommendations.
16/06/2020  Tuesday    Introduction to fruit data set using machine
                       learning.
                       Explaining about logistic regression and its
                       applications.
17/06/2020  Wednesday  Explained about root mean square error and
                       developing a program using logistic regression.
18/06/2020  Thursday   Introduction about regularization and
                       developed logistic regression for multi classes.
19/06/2020  Friday     Introduction to KNN algorithm.
20/06/2020  Saturday   Implementation of KNN algorithm using
                       Python.

7th week

Date        Day        Name of the topic/module completed
22/06/2020  Monday     Developing fruit detection using matplotlib
                       inline.
23/06/2020  Tuesday    Introduction to naive Bayes and its types.
                       a. Multinomial naive Bayes
                       b. Bernoulli naive Bayes
                       c. Gaussian naive Bayes
24/06/2020  Wednesday  Developed a survival prediction program using naive Bayes.
25/06/2020  Thursday   Introduction to Flask and its installation process.
26/06/2020  Friday     Creating html page and deploying a model.
                       And introduction to pickling.
27/06/2020  Saturday   Pickling vectorizer and loading vectorizer.
                       And created account in PythonAnywhere.

10. BIBLIOGRAPHY

1. Peng, Xiao, Shao Liangshan, and Li Xiuran. "Improved Collaborative Filtering
Algorithm in the Research and Application of Personalized Movie Recommendations",
2013 Fourth International Conference on Intelligent Systems Design and
Engineering Applications, 2013.
2. Munoz-Organero, Mario, Gustavo A. Ramírez-González, Pedro J. Munoz-Merino,
and Carlos Delgado Kloos. "A Collaborative Recommender System
Based on Space-Time Similarities", IEEE Pervasive Computing, 2010.
3. Al-Shamri, M.Y.H. "Fuzzy-genetic approach to recommender systems based
on a novel hybrid user model", Expert Systems With Applications, 10/2008.
4. Hu Jinming. "Application and research of collaborative filtering in e-commerce
recommendation system", 2010 3rd International Conference on
Computer Science and Information Technology, 07/2010.
5. Xu, Qingzhen, Jiayong Wu, and Qiang Chen. "A novel mobile personalized
recommended method based on money flow model for stock exchange",
Mathematical Problems in Engineering, 2014.
6. Yan, Bo, and Guanling Chen. "AppJoy: personalized mobile application
discovery", Proceedings of the 9th International Conference on Mobile Systems,
Applications, and Services (MobiSys '11), 2011.
7. Davidsson C, Moritz S. "Utilizing implicit feedback and context to
recommend mobile applications from first use", Proc. of the CaRR 2011. New
York: ACM Press, 2011. 19-22. http://dl.acm.org/citation.cfm?id=1961639
[doi:10.1145/1961634.1961639]
