Recommender Systems Notes
Recommender Systems
Definition?
A recommender system (or recommendation system) is a class of machine learning system
that uses data to help predict, narrow down, and find what people are looking for among an
exponentially growing number of options.
A recommender system, or a recommendation system, is a subclass of information
filtering systems that provide suggestions for items that are most pertinent to a particular
user.
Role:
Based on the user's profile, these systems can predict whether a product will be
preferred by a user or not. More broadly, recommender systems represent user preferences
for the purpose of suggesting items to purchase or examine and are now an integral part of a
lot of e-commerce sites.
Types/Techniques:
Collaborative Filtering. The collaborative filtering method is based on gathering and
analyzing data on users' behavior.
Content-Based Filtering.
Knowledge-based system.
Hybrid Recommendation Systems.
Algorithm:
A simple and commonly used choice is the linear regression algorithm. The linear regression
algorithm is used to find the best linear approximation to a data set. In a recommender
system, this algorithm is used to predict how a user will rate an item based on their past
ratings.
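As a minimal sketch of this idea, the following fits a one-feature least-squares line in pure Python. The feature (a user's average past rating in the item's genre) and all numbers are hypothetical and chosen only for illustration; a real system would use many more features and much more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_rating(slope, intercept, x):
    """Predicted rating for a feature value x."""
    return slope * x + intercept


# Hypothetical training data: the user's average past rating in a genre (x)
# and the rating the user later gave to an item of that genre (y).
genre_average = [1.0, 2.0, 3.0, 4.0, 5.0]
given_rating = [1.5, 2.0, 3.5, 4.0, 4.5]

slope, intercept = fit_line(genre_average, given_rating)
predicted = predict_rating(slope, intercept, 3.5)
```

The fitted line is then used to estimate the rating of an item the user has not rated yet.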
Benefits:
Revenue and sales increase.
User satisfaction growth.
Turnover increase.
But here are some points that we should take into consideration:
o Deficiency of information.
o Information variability.
o Unpredictable performance.
Recommender system use cases:
News and media: Recommendation systems are used in news and media platforms to
recommend articles, videos, and other content that is relevant to a user's interests.
Social media: Recommendation systems are used in social media to recommend
friends, groups, or posts that are likely to be of interest to users.
Learning:
It is "a process that leads to change, which occurs as a result of experience and
increases the potential for improved performance and future learning".
Types of Learning:
• Auditory learning ("by listening and speaking").
• Visual learning ("through the eyes, by watching").
• Kinesthetic learning (by doing, through experience and practice).
• Haptic learning ("by touching and feeling").
Artificial Intelligence:
Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to
intelligence displayed by humans or by other animals. It also refers to the simulation of human
intelligence in machines that are programmed to think and act like humans using robust
datasets.
Unit - I
Machine Learning:
Machine learning is a branch of artificial intelligence (AI) and computer science
which focuses on the use of data and algorithms to imitate the way that humans learn,
gradually improving its accuracy.
Data Science:
Data science is the study of data to extract meaningful insights for business. It is a
multidisciplinary approach that combines principles and practices from the fields of
mathematics, statistics, artificial intelligence, and computer engineering to analyze large
amounts of data.
Deep Learning:
Deep learning is a subset of machine learning, which is essentially a neural network
with three or more layers. These neural networks attempt to simulate the behavior of the
human brain (albeit far from matching its ability), allowing them to "learn" from large amounts
of data.
Big Data:
Big data is a collection of data from many different sources and is often described by
five characteristics: volume, value, variety, velocity, and veracity. Put simply, big data is
larger, more complex data sets, especially from new data sources.
Analysis vs Analytics
Analysis is the division of a whole into small components, and analytics is the science
of logical analysis. While analysis looks backward over time and works on the facts and
figures of what has happened, analytics work towards modeling the future or predicting a
result.
Labeled and unlabeled data:
Labeled data is data that has one or more predefined tags such as name, type, or number. For
example, an image may be tagged as containing an apple or a banana. In contrast, unlabelled data contains no tags
or no specified name. Labeled data is used in Supervised Learning techniques, whereas
unlabelled data is used in Unsupervised Learning.
Supervised Learning:
Supervised learning, also known as supervised machine learning, is a subcategory of
machine learning and artificial intelligence. It is defined by its use of labeled datasets to train
algorithms to classify data or predict outcomes accurately.
Steps: 1. Prepare Data. 2. Choose an Algorithm. 3. Fit a Model
Eg.:
1. Linear regression for regression problems.
2. Random forest for classification and regression problems.
3. Support vector machines for classification problems.
Unsupervised learning:
Unsupervised learning, also known as unsupervised machine learning, uses machine learning algorithms
to analyze and cluster unlabeled datasets. These algorithms discover hidden patterns or data
groupings without the need for human intervention.
Supervised Learning vs Unsupervised Learning:
Supervised machine learning is generally used to classify data or make predictions,
whereas unsupervised learning is generally used to understand relationships within datasets.
Supervised machine learning is much more resource-intensive because of the need for
labelled data.
Underfitting:
Underfitting is a situation when your model is too simple for your data. More
formally, your hypothesis about data distribution is wrong and too simple: for example,
your data is quadratic and your model is linear. This situation is also called high bias. This
means that the model's predictions are stable but systematically off, because the initial
assumption about the data is incorrect.
Overfitting:
Overfitting is a situation when your model is too complex for your data. More
formally, your hypothesis about data distribution is wrong and too complex: for example,
your data is linear and your model is a high-degree polynomial. This situation is also called
high variance. This means that your algorithm cannot make accurate predictions on new data:
changing the input data only a little changes the model output very much.
Overfitting means that your model does not make accurate predictions on unseen data. In
this case, train error is very small and val/test error is large.
When you find a good model, train error is small (but larger than in the case of
overfitting), and val/test error is small too.
In the above case, the test error and validation error are approximately the same. This
happens when everything is fine, and your train, validation, and test data have the same
distributions. If validation and test error are very different, then you need to get more data
similar to test data and make sure that you split the data correctly.
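The comparison of train error and validation error described above can be sketched as a rough rule of thumb. The function name and both thresholds are hypothetical and must be chosen per problem; this is only an illustration of the reasoning, not a standard test.

```python
def diagnose(train_error, val_error, acceptable_error, gap_tolerance):
    """Rough diagnosis from train and validation error.

    acceptable_error and gap_tolerance are hypothetical, problem-specific
    thresholds supplied by the caller.
    """
    if train_error > acceptable_error:
        # The model cannot even fit the training data: high bias.
        return "underfitting"
    if val_error - train_error > gap_tolerance:
        # Small train error but much larger validation error: high variance.
        return "overfitting"
    # Both errors are small and close to each other.
    return "good fit"


print(diagnose(0.40, 0.42, acceptable_error=0.10, gap_tolerance=0.05))
print(diagnose(0.01, 0.30, acceptable_error=0.10, gap_tolerance=0.05))
print(diagnose(0.05, 0.07, acceptable_error=0.10, gap_tolerance=0.05))
```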
To make the model more complex, you need to add more parameters (degrees of freedom).
Sometimes this means directly trying a more powerful model, one that is a priori capable of
capturing more complex dependencies (an SVM with different kernels instead of logistic
regression). If the algorithm is already quite complex (a neural network or some ensemble
model), you need to add more parameters to it, for example, increase the number of models
in boosting. In the context of neural networks, this means adding more layers / more neurons
in each layer / more connections between layers / more filters for a CNN, and so on.
To simplify the model, you need, conversely, to reduce the number of parameters.
Either completely change the algorithm (try a random forest instead of a deep neural network),
or reduce the number of degrees of freedom: fewer layers, fewer neurons, and so on.
Regularization:
Regularization refers to techniques that are used to calibrate machine learning models
in order to minimize the adjusted loss function and prevent overfitting or underfitting.
Regularization is a set of techniques that can prevent overfitting in neural networks
and thus improve the accuracy of a Deep Learning model when facing completely new data
from the problem domain.
L1 regularization:
L1 regularization, also known as Lasso regularization, is a machine-learning strategy that inhibits
overfitting by introducing a penalty term into the model's loss function based on the absolute
values of the model's parameters.
L2 regularization:
It acts like a force that shrinks each weight by a small percentage at each iteration.
Therefore, the weights shrink toward zero but do not become exactly zero. L2 regularization
penalizes the squared weights (weight^2). There is an additional parameter to tune the L2
regularization term, which is called the regularization rate (lambda).
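The two penalty terms can be written out directly. The sketch below is illustrative only: the weights, lambda, and learning rate are hypothetical values, and the last function shows the effect of one gradient step on the L2 penalty alone (the data loss is left out).

```python
def l1_penalty(weights, lam):
    """L1 (Lasso) penalty: lambda times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lambda times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l2_shrink_step(weights, lam, learning_rate):
    """One gradient step on the L2 penalty only.

    The gradient of lam * w**2 is 2 * lam * w, so every weight is multiplied
    by the same factor (1 - 2 * learning_rate * lam): it shrinks by a fixed
    percentage but does not jump to exactly zero.
    """
    factor = 1.0 - 2.0 * learning_rate * lam
    return [w * factor for w in weights]


weights = [1.0, -2.0, 3.0]  # hypothetical model weights
l1_value = l1_penalty(weights, lam=0.1)
l2_value = l2_penalty(weights, lam=0.1)
shrunk = l2_shrink_step(weights, lam=0.1, learning_rate=0.5)
```

With these values each weight is multiplied by 0.9 in one step, which matches the description of L2 regularization as removing a small percentage of each weight.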
Goal:
• Increasing the product sales is the primary goal of a recommender system, utilized by
merchants to increase their profit.
• How? By recommending carefully selected items to users, recommender systems bring
relevant items to the attention of users.
• To achieve the broader business-centric goal of increasing revenue, the common
operational and technical goals of recommender systems are as follows:
• Relevance (logical connection):
a. To recommend items that are relevant to the user at hand.
• Novelty (new and unusual):
a. To recommend items that the user has not seen in the past.
E.g., popular movies of a preferred genre would rarely be novel to the user.
Repeated recommendation of popular items can also lead to a reduction in sales
diversity.
• Serendipity (unexpected; finding good things by chance):
a. Serendipity is different from novelty in that the recommendations are truly
surprising to the user, rather than simply something they did not know about
before. The serendipitous methods focus on discovering such recommendations.
b. Eg. if a new Indian restaurant opens in a neighborhood, then the recommendation
of that restaurant to a user who normally eats Indian food is novel but not
necessarily serendipitous.
c. On the other hand, when the same user is recommended Ethiopian food, and it
was unknown to the user that such food might appeal to him, then the
recommendation is serendipitous.
d. Serendipity has the beneficial side effect of increasing sales diversity or beginning
a new trend of interest in the user, which results in long-term and strategic benefits to the
merchant.
• Netflix was founded as a mail-order digital video disc (DVD) rental company of movies
and television shows, which was eventually expanded to streaming on a subscription
basis.
• Netflix provides users the ability to rate the movies and television shows on a 5-point
scale.
• Furthermore, the user actions in terms of watching various items are also stored by
Netflix. These ratings and actions are then used by Netflix to make recommendations.
• Netflix does an excellent job of providing explanations for the recommended items.
• It explicitly provides examples of recommendations based on specific items that were
watched by the user.
• Such information provides the user with additional information to decide whether or not
to watch a specific movie
• The Netflix Prize contest is notable for its numerous contributions to recommendation
research.
Google News Personalization System:
• The system is able to recommend news to users based on their history of clicks. The clicks
are associated with specific users based on identification mechanisms enabled by Gmail
accounts.
• A click expresses a user's interest in an article, but no mechanism exists for users to show
their dislike (implicit and unary ratings).
Facebook Friend Recommendations:
• Social networking sites often recommend potential friends to users in order to increase
the number of social connections at the site.
• This kind of recommendation has slightly different goals than a product recommendation.
While a product recommendation directly increases the profit of the merchant by
facilitating product sales, an increase in the number of social connections improves the
experience of a user at a social network. This, in turn, encourages the growth of the social
network. Social networks are heavily dependent on the growth of the network to
increase their advertising revenues.
• Therefore, the recommendation of potential friends (or links) enables better growth
and connectivity of the network. This problem is also referred to as link prediction in the
field of social network analysis. Such forms of recommendations are based on structural
relationships rather than ratings data.
Computational advertising?
• Computational Advertising (CA) is a scientific sub-discipline at the intersection of
information retrieval, statistical modeling, machine learning, optimization, large
scale search and text analysis. The core problem addressed in Computational
Advertising is matchmaking between the ads and the context.
Goal of computational advertising:
• The goals of computational advertising are to achieve a more efficient allocation of
advertising resources through better targeting and improve effectiveness through
enhanced ad relevance and personalization.
• Therefore, Bob’s ratings on similar science fiction movies like Alien and Predator can be
used to predict his rating on Terminator.
• Similarity functions are computed between the columns of the ratings matrix to discover
similar items.
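The item-based idea above can be sketched as follows. This is a minimal illustration, assuming a tiny hypothetical ratings table (the users Alice, Carol, and Dave and all rating values are invented here) and plain cosine similarity between item columns; practical systems often use mean-centered variants of the similarity.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale; Bob has not rated Terminator.
ratings = {
    "Alice": {"Alien": 5, "Predator": 4, "Terminator": 5},
    "Carol": {"Alien": 1, "Predator": 2, "Terminator": 1},
    "Dave": {"Alien": 4, "Predator": 5, "Terminator": 4},
    "Bob": {"Alien": 5, "Predator": 4},
}


def item_cosine(ratings, item_a, item_b):
    """Cosine similarity of two item columns over users who rated both."""
    common = [u for u, r in ratings.items() if item_a in r and item_b in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    norm_a = sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    norm_b = sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict_item_based(ratings, user, target):
    """Similarity-weighted average of the user's own ratings on other items."""
    numerator = 0.0
    denominator = 0.0
    for item, rating in ratings[user].items():
        sim = item_cosine(ratings, target, item)
        numerator += sim * rating
        denominator += abs(sim)
    return numerator / denominator if denominator else None


bob_terminator = predict_item_based(ratings, "Bob", "Terminator")
```

Bob's own ratings of Alien and Predator are weighted by how similar each of those items is to Terminator.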
Advantage:
• The advantages of memory-based techniques are that they are simple to implement and
the resulting recommendations are often easy to explain.
Disadvantage:
• On the other hand, memory-based algorithms do not work very well with sparse ratings
matrices.
• For example, it might be difficult to find sufficiently similar users to Bob, who have rated
Gladiator. In such cases, it is difficult to robustly predict Bob’s rating of Gladiator. In
other words, such methods might lack full coverage of rating predictions.
• The lack of coverage is often not an issue, when only the top-k items are required.
2. Model-based methods:
• In model-based methods, machine learning and data mining methods are used in the
context of predictive models.
• In cases where the model is parameterized, the parameters of this model are learned
within the context of an optimization framework.
• Some examples of such model-based methods include decision trees, rule-based
models, Bayesian methods and latent factor models.
• Many of these methods, such as latent factor models, have a high level of coverage even
for sparse ratings matrices.
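As one illustration of a parameterized model whose parameters are learned within an optimization framework, the sketch below fits a small latent factor model by stochastic gradient descent. The ratings, the number of factors, the learning rate, and the regularization value are hypothetical choices, not values taken from these notes.

```python
import random
from math import sqrt


def train_latent_factors(triples, n_users, n_items, k=2, lr=0.02, reg=0.02,
                         epochs=500, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in triples:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus an L2 penalty.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(triples, P, Q):
    """Root mean squared error over the observed ratings."""
    total = 0.0
    for u, i, r in triples:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return sqrt(total / len(triples))


# Hypothetical observed ratings as (user index, item index, rating).
# The entry for user 3, item 2 is missing and is the one to be predicted.
triples = [
    (0, 0, 5), (0, 1, 4), (0, 2, 5),
    (1, 0, 1), (1, 1, 2), (1, 2, 1),
    (2, 0, 4), (2, 1, 5), (2, 2, 4),
    (3, 0, 5), (3, 1, 4),
]

P0, Q0 = train_latent_factors(triples, 4, 3, epochs=0)  # untrained factors
P, Q = train_latent_factors(triples, 4, 3)
error_before = rmse(triples, P0, Q0)
error_after = rmse(triples, P, Q)
missing_prediction = sum(pu * qi for pu, qi in zip(P[3], Q[2]))
```

Because every user and every item gets a factor vector, a prediction can be computed for any user-item pair, which is why such models have a high level of coverage.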
Types of Ratings:
• The ratings are often specified on a scale that indicates the specific level of like or dislike
of the item at hand.
• The ratings can be continuous values; for example, they might take on any value between -10 and 10.
• Usually, the ratings are interval-based, where a discrete set of ordered numbers is used
to quantify like or dislike. For
example, a 5-point rating scale might be drawn from the set {−2,−1, 0, 1, 2}, in which a
rating of −2 indicates an extreme dislike, and a rating of 2 indicates a strong affinity to
the item. Other systems might draw the ratings from the set {1, 2, 3, 4, 5}.
• The use of 5-point, 7-point, and 10-point ratings is particularly common.
Binary ratings | 5-point interval ratings | Unbalanced rating scale | Ordinal ratings
Like, Dislike  | ***** Loved it           | Excellent               | Strongly Disagree
0, 1           | **** I liked it          | Very Good               | Disagree
               | *** It was ok            | Good                    | Neutral
               | ** I didn’t like it      | Fair                    | Agree
               | * I hated it             | Poor                    | Strongly Agree
• There may be an even number of possible ratings, and the neutral rating might be
missing. This approach is referred to as a forced choice rating system.
• One can also use ordered categorical values such as {Strongly Disagree, Disagree,
Neutral, Agree, Strongly Agree} in order to achieve the same goals referred to as ordinal
ratings.
• In binary ratings, the user may express only a like or dislike for the item and nothing
else. For example, the ratings may be 0, 1, or unspecified values. The unspecified values
need to be predicted as 0-1 values.
• A special case of ratings is that of unary ratings, in which there is a mechanism for a user
to specify a liking for an item but no mechanism to specify a dislike.
• Unary ratings are particularly common, especially in the case of implicit feedback data
sets.
• In these cases, customer preferences are derived from their activities rather than their
explicitly specified ratings.
• For example, the buying behavior of a customer can be converted to unary ratings. When
a customer buys an item, it can be viewed as a preference for the item. However, the act
of not buying an item from a large universe of possibilities does not always indicate a
dislike.
• Similarly, many social networks, such as Facebook, use “like” buttons, which provide
the ability to express liking for an item. However, there is no mechanism to specify
dislike for an item.
• The implicit feedback setting can be viewed as the matrix completion analog of the
positive-unlabeled (PU) learning problem in data classification.
• A ratings matrix is sometimes referred to as a utility matrix, although the two may not
always be the same.
• In the unary rating, the matrix is referred to as a positive preference utility matrix
because it allows only the specification of positive preferences.
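The conversion of buying behavior into unary ratings can be sketched as below. The purchase log, user names, and item names are hypothetical. Note that a missing purchase is stored as None (unspecified), not as 0, because not buying an item does not necessarily indicate a dislike.

```python
def unary_matrix(purchases, users, items):
    """Build a positive preference matrix from (user, item) purchase events.

    A cell is 1 if the user bought the item, and None (unspecified) otherwise.
    """
    bought = set(purchases)
    return [[1 if (u, i) in bought else None for i in items] for u in users]


# Hypothetical purchase log.
purchases = [("ann", "book"), ("ann", "lamp"), ("ben", "lamp")]
users = ["ann", "ben"]
items = ["book", "lamp", "desk"]

matrix = unary_matrix(purchases, users, items)
for user, row in zip(users, matrix):
    print(user, row)
```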
• However, the item description of Terminator contains similar genre keywords as other
science fiction movies, such as Alien and Predator. In such cases, these movies can be
recommended to John.
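A minimal sketch of this keyword matching is given below. It assumes that John has rated Terminator highly and scores candidate movies by the Jaccard overlap of their keyword sets. The keyword sets are hypothetical, and Jaccard similarity is only one simple choice; a real content-based system would usually train a per-user model on weighted keywords.

```python
def jaccard(a, b):
    """Share of keywords that two keyword sets have in common."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical item descriptions reduced to keyword sets.
keywords = {
    "Terminator": {"science fiction", "action", "robot"},
    "Alien": {"science fiction", "horror", "space"},
    "Predator": {"science fiction", "action", "jungle"},
    "Gladiator": {"history", "drama", "rome"},
}


def rank_by_content(liked, candidates, keywords):
    """Order candidates by their best keyword overlap with any liked item."""
    scores = {
        c: max(jaccard(keywords[c], keywords[item]) for item in liked)
        for c in candidates
    }
    return sorted(candidates, key=lambda c: scores[c], reverse=True)


ranked = rank_by_content(["Terminator"], ["Gladiator", "Alien", "Predator"], keywords)
print(ranked)
```

No ratings from other users are needed here, only John's own history and the item descriptions.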
Advantage:
• It will make recommendations for new items, when sufficient rating data are not available
for that item. This is because other items with similar attributes might have been rated by
the active user.
• Therefore, the supervised model will be able to leverage these ratings in conjunction with
the item attributes to make recommendations even when there is no history of ratings for
that item.
Disadvantage:
• In many cases, content-based methods provide obvious recommendations because of the
use of keywords or content. For example, if a user has never consumed an item with a
particular set of keywords, such an item has no chance of being recommended. This is
because the constructed model is specific to the user at hand, and the community
knowledge from similar users is not leveraged. This phenomenon tends to reduce the
diversity of the recommended items, which is undesirable.
• Even though content-based methods are effective at providing recommendations for new
items, they are not effective at providing recommendations for new users. This is because
the training model for the target user needs to use the history of her ratings. In fact, it is
usually important to have a large number of ratings available for the target user in order to
make robust predictions without overfitting.
• A particular item may have attributes associated with it that correspond to its various
properties, and a user may be interested only in items with specific properties.
• For example, cars may have several makes, models, colors, engine options, and interior
options, and user interests may be regulated by a very specific combination of these
options. Thus, in these cases, the item domain tends to be complex in terms of its varied
properties, and it is hard to associate sufficient ratings with the large number of
combinations at hand.
• The recommendation process is performed on the basis of similarities between customer
requirements and item descriptions, or the use of constraints specifying user
requirements.
• The process is facilitated with the use of knowledge bases, which contain data about rules
and similarity functions to use during the retrieval process.
• The explicit specification of requirements results in greater control of users over the
recommendation process.
• In both collaborative and content-based systems, recommendations are decided entirely
by either the user's past actions/ratings, the actions/ratings of her peers, or a combination of
the two.
• Knowledge-based systems are unique in that they allow the users to explicitly specify
what they want.
Knowledge-based recommender systems can be classified on the basis of the type of the
interface (and corresponding knowledge) used to achieve the goals.
• In case-based recommender systems, specific cases are specified by the user as targets or
anchor points.
• Similarity metrics are defined on the item attributes to retrieve similar items to these
cases.
• The similarity metrics are often carefully defined in a domain-specific way. Therefore,
the similarity metrics form the domain knowledge that is used in such systems.
• The returned results are often used as new target cases with some interactive
modifications by the user.
• For example, when a user sees a returned result that is very similar to what they
want, they might re-issue a query with that target, but with some of the attributes changed
to the user’s liking.
• This interactive process is used to guide the user towards items of interest.
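The case-based loop above can be sketched as a similarity search over item attributes followed by a modified query. The houses, attribute weights, and attribute ranges below are hypothetical; in a real system they would come from carefully designed domain knowledge.

```python
def case_similarity(item, target, weights, ranges):
    """Weighted similarity in [0, 1] between an item and a target case."""
    total_weight = sum(weights.values())
    score = 0.0
    for attr, weight in weights.items():
        # Normalize the attribute difference by its assumed value range.
        diff = abs(item[attr] - target[attr]) / ranges[attr]
        score += weight * (1.0 - min(diff, 1.0))
    return score / total_weight


def retrieve(items, target, weights, ranges, top_k=2):
    """Return the top_k items most similar to the target case."""
    ranked = sorted(
        items,
        key=lambda item: case_similarity(item, target, weights, ranges),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical catalog and domain knowledge.
houses = [
    {"id": "h1", "bedrooms": 3, "price": 300_000},
    {"id": "h2", "bedrooms": 4, "price": 450_000},
    {"id": "h3", "bedrooms": 2, "price": 200_000},
]
weights = {"bedrooms": 1.0, "price": 2.0}
ranges = {"bedrooms": 4, "price": 500_000}

# First query: the user's target case.
target = {"bedrooms": 3, "price": 320_000}
first = [h["id"] for h in retrieve(houses, target, weights, ranges)]

# The user re-issues the query with one attribute changed.
revised_target = dict(target, bedrooms=4)
second = [h["id"] for h in retrieve(houses, revised_target, weights, ranges)]
```

Each round reuses the previous target with small changes, which is how the user is guided towards items of interest.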
• The form of the guidance may often take the form of search-based systems, where users
specify their constraints with a search-based interface.
• Critiquing interfaces are particularly popular for expressing feedback in such systems,
where users iteratively modify one or more attributes of a preferred item in each iteration.
Conversational systems:
• The user preferences are determined iteratively in the context of a feedback loop. The
main reason for this is that the item domain is complex and the user preferences can be
determined only in the context of an iterative conversational system.
Search-based systems:
• The user preferences are elicited by using a preset sequence of questions such as the
following: "Do you prefer a house in a suburban area or within the city?" In some cases,
specific search interfaces may be set up in order to provide the ability to specify user
constraints.
Navigation-based recommendation:
• The user specifies a number of change requests to the item being currently recommended.
• Through an iterative set of change requests, it is possible to arrive at a desirable item.
• An example of a change request specified by the user, when a specific house is being
recommended is as follows: "I would like a similar house about 5 miles west of the
currently recommended house." Such recommender systems are also referred to as
critiquing recommender systems.
Note: The main difference is that content-based systems learn from past user behavior,
whereas knowledge-based recommendation systems recommend based on active user
specification of their needs and interests.
• Different systems use different types of input, and have different strengths and
weaknesses.
• Predictive accuracy metrics, classification accuracy metrics, rank accuracy metrics, and
non-accuracy measurements are the four major types of evaluation metrics for
recommender systems.
• Predictive measures address the subject of how close ratings of recommender systems are
to the user ratings. They are a good choice for non-binary tasks.
• Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are the most popular
and easy to interpret predictive metrics.
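Both metrics follow directly from their definitions: MAE is the mean of the absolute errors, and RMSE is the square root of the mean of the squared errors. The predicted and actual ratings below are hypothetical.

```python
from math import sqrt


def mae(predicted, actual):
    """Mean Absolute Error: average size of the prediction errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root Mean Squared Error: penalizes large errors more than MAE does."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical predicted and actual ratings for four user-item pairs.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]

mae_value = mae(predicted, actual)
rmse_value = rmse(predicted, actual)
```

Here the single error of size 2 raises RMSE above MAE, which shows that RMSE is more sensitive to large individual errors.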
Domain-Specific Challenges in Recommender Systems:
• In different domains, such as temporal data, location-based data, and social data, the
context of the recommendation plays a critical role.
• Therefore, the notion of contextual recommender systems was developed to address the
additional side information that arises in these domains.
• This notion is used with different modifications for various types of data, such as
temporal data, location data, or social data.
Context-Based Recommender Systems:
• Context-based or context-aware recommender systems take various types of contextual
information into account, while making recommendations.(time, location, social media).
• E.g., clothes: the season and the location of the customer matter, and festivals and holidays affect these recommendations.
• The most common form of feedback is social tagging. Such forms of feedback are
particularly common on content sharing sites on the Web, such as Flickr (photo sharing),
last.fm (music sharing), and Bibsonomy (scientific literature sharing).
• Tags are meta-data that users utilize to add short informative keywords to the content.
• For example, a user on a music site might tag Michael Jackson’s Thriller album as
"rock."
• Such tags provide useful information about the interests of both the user and the content
of the item because the tag is associated with both.
• The tags serve as useful context for performing the recommendations. Methods for
context-sensitive recommendations can be directly used to incorporate this feedback into
the recommendation process.
Advanced Topics and Applications:
The Cold-Start Problem in Recommender Systems:
• One of the major problems in recommender systems is that the number of initially
available ratings is relatively small.
• In such cases, it becomes more difficult to apply traditional collaborative filtering models.
• While content-based and knowledge-based methods are more robust than collaborative
models in the presence of cold starts, such content or knowledge might not always be
available.
• Therefore, a number of specific methods have been designed to ameliorate the problem of
cold start in the context of recommender systems.
Attack-Resistant Recommender Systems:
• The use of recommender systems has a significant impact on the sale of various products
and services.
• As a result, the sellers of products and services have significant economic incentives to
manipulate the output of recommender systems.
• One example of such a manipulation would be to submit inflated ratings of their own
products to the recommender systems.
• A malicious rival might submit biased and negative reviews about the products of a
competitor.
• Over the years, numerous sophisticated strategies have been developed for attacking
recommender systems.
• Such attacks are highly undesirable because they reduce the overall effectiveness of the
recommender system and reduce the quality of experience for legitimate users.
• Therefore, methods are needed that enable robust recommendations in the presence of
such attacks.
Group Recommender Systems:
• The recommendation system is tailored to recommend a particular activity to a group of
users rather than a single user.
• Examples include the watching of movies or television by a group, the selection of music
in a fitness center, or travel recommendations to a group of tourists.
• Simple averaging strategies do not work well when groups are heterogeneous and contain
users with diverse tastes.
• This is because users often have an impact on each other’s tastes based on phenomena
from social psychology, such as emotional contagion (Contamination) and conformity
(orthodoxy).
Multi-Criteria Recommender Systems:
• In multi-criteria systems, ratings might be specified on the basis of different criteria by
a single user.
• For example, a user might rate movies based on the plot, music, special effects, and so on.
• Such techniques often provide recommendations by modeling the user’s utility for an
item as a vector of ratings corresponding to various criteria.
• In multi-criteria recommender systems, one can often obtain misleading results by using
only the overall rating in conjunction with a traditional recommender system.
• For example, if two users have the same overall rating for a movie, but their component
ratings for the plot and music are very different, then the two users should not be
considered similar from the perspective of a similarity-based collaborative filtering
algorithm.
• In some of the multi-criteria systems, users may not specify an overall rating at all. In
such cases, the problem is even more challenging because one needs to present ranked
lists of items to various users on the basis of multiple criteria.
• Some of the methods for group recommender systems can also be adapted to multi-
criteria recommender systems.
• However, the two topics are generally considered different because they emphasize
different aspects of the recommendation process.
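The example of two users with the same overall rating but different component ratings can be made concrete. The users, criteria, and rating values below are hypothetical, and Euclidean distance is used only as a simple stand-in for whatever similarity function a collaborative filtering algorithm would apply.

```python
from math import sqrt


def criteria_distance(ratings_a, ratings_b, criteria):
    """Euclidean distance between two users' ratings over the given criteria."""
    return sqrt(sum((ratings_a[c] - ratings_b[c]) ** 2 for c in criteria))


# Hypothetical ratings of the same movie by two users.
user_a = {"overall": 4, "plot": 5, "music": 2}
user_b = {"overall": 4, "plot": 2, "music": 5}

# Using only the overall rating, the users look identical.
overall_gap = criteria_distance(user_a, user_b, ["overall"])

# Using the component ratings, they are clearly different.
component_gap = criteria_distance(user_a, user_b, ["plot", "music"])
```

A similarity function that sees only the overall rating would treat these two users as perfect neighbors, which is the misleading result described above.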
Active Learning in Recommender Systems:
• A major challenge in recommender systems is the acquisition of sufficient ratings in order
to make robust predictions.
• The sparsity of the ratings matrix continues to be a significant impediment in effective
functioning of recommender systems.
• The acquisition of sufficient ratings can reduce the sparsity problem.
• A variety of real-world recommender systems have mechanisms to encourage users to
enter ratings in order to populate the system.
• For example, users might be provided incentives to rate certain items.
• In general, it is often difficult to obtain many ratings from a single user because of
the high cost of the acquisition process.
• Therefore, one must judiciously select the items to be rated by specific users.
• For example, if a user has already rated a lot of action movies, then asking the user to rate
another action movie does not help much in predicting ratings of other action movies, and
it helps even less in predicting ratings of movies belonging to unrelated genres.
• On the other hand, asking the user to rate movies belonging to less populated genres will
help significantly in predicting ratings of movies belonging to that genre.
• Of course, if a user is asked to rate an unrelated movie, it is not necessary that she will be
able to provide feedback because she might not have watched that movie at all.
• There are many interesting trade-offs in the problem of active learning of recommender
systems that are not encountered in other problem domains like classification.
Privacy in Recommender Systems:
• Recommender systems are based heavily on feedback from the users, which might be
implicit or explicit.
• This feedback contains significant information about the interests of the user, and it might
reveal information about their political opinions, sexual orientations, and personal
preferences.
• In many cases, such information can be highly sensitive, which leads to privacy concerns.
• Such privacy concerns are significant in that they hinder the release of data necessary for
the advancement of recommendation algorithms.
• The availability of real data is crucial for algorithmic advances.
• For example, the contribution of the Netflix Prize data set to the recommender systems
community is invaluable, in that it can be credited with motivating the development of
many state-of-the-art algorithms.