
Mindsight Codex


RECOMMENDER SYSTEM
SHARING POINTS:
● Introduction to Recommender System
● Content-based Filtering
● Collaborative Filtering
● Explicit and implicit data
● Matrix Factorization
● Hybrid recommendation
● Data pipeline
● Architecture and deployment
INTRODUCTION
RECOMMENDER SYSTEM
Product Recommender System
● A system that predicts and ranks a list of relevant
items that users are likely to purchase
● When there are many products to show, which ones
should be recommended?
The impact of recommendation and personalization

● Product recommendations drive revenue
● Shoppers buy more (marketplaces),
visitors view more (YouTube, etc.)
● Shoppers and visitors stay longer
● Upselling and cross-selling
● Better user experience
● Leads to repeat visits
Approaches in Designing Recommender System
CONTENT-BASED
RECOMMENDER SYSTEM
CONTENT-BASED
RECOMMENDATION

● Uses attributes of the product (e.g. movie
categories: cartoon, action, drama, romance)
● Uses item features to recommend new items
that are similar to what the user has liked in the
past
● Does not rely on other users' preferences or other
user-item interactions
● Example: if User A likes drama and romance, we can
use that information to recommend movies belonging to
those categories.
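The idea in the bullets above can be sketched in a few lines of Python. This is only an illustration under assumptions: the movie titles and their genre vectors are hypothetical, and the user profile is taken as the average of the liked movies' vectors, which is one simple choice among many.

```python
import numpy as np

GENRES = ["cartoon", "action", "drama", "romance"]

# Hypothetical catalogue: one vector per movie, one position per genre (1 = has genre).
movies = {
    "Movie A": [0, 0, 1, 1],
    "Movie B": [0, 1, 0, 0],
    "Movie C": [1, 0, 0, 0],
    "Movie D": [0, 0, 1, 0],
    "Movie E": [0, 1, 0, 1],
}

def recommend(liked, movies, top_n=2):
    """Rank unseen movies by the dot product with the user's genre profile."""
    # User profile: average genre vector of the movies the user liked.
    profile = np.mean([movies[title] for title in liked], axis=0)
    scores = {
        title: float(np.dot(profile, vector))
        for title, vector in movies.items()
        if title not in liked
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# A user who liked a drama/romance movie gets other drama or romance movies.
recs = recommend(["Movie A"], movies)
```

No other user's behaviour is used here: only the item attributes and this user's own history.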
ITEM’S ATTRIBUTES OR
CHARACTERISTICS
VECTORIZING ATTRIBUTES INTO EMBEDDING MAP

A mapping from a collection of items to some
finite-dimensional vector space
VECTORIZING ATTRIBUTES
MOVIE CROSS TABLE
JACCARD SIMILARITY
JACCARD DISTANCE CALCULATION
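As a rough illustration of the Jaccard calculation: the similarity of two attribute sets is the size of their intersection divided by the size of their union, and the distance is one minus that. The genre sets below are hypothetical, and treating two empty sets as identical is a convention chosen for this sketch.

```python
def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| for two collections of attributes."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)

def jaccard_distance(a, b):
    """Jaccard distance is the complement of Jaccard similarity."""
    return 1.0 - jaccard_similarity(a, b)

# Hypothetical movies described by their genre sets.
movie_x = {"action", "drama"}
movie_y = {"drama", "romance"}

sim = jaccard_similarity(movie_x, movie_y)   # 1 shared genre out of 3 distinct
dist = jaccard_distance(movie_x, movie_y)
```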
SIMILARITY MEASURES
A similarity measure is a
metric for items in an
embedding space
COSINE SIMILARITY

The closer two documents are by angle, the higher their
cosine similarity (cos θ).
SIMILARITY MEASURES WITH DOT PRODUCT
TEXT-BASED SIMILARITY
TF-IDF (Term Frequency – Inverse Document Frequency)
TEXT-BASED
SIMILARITY
Find similarity with TF-IDF
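One possible way to compute text similarity with TF-IDF is scikit-learn's TfidfVectorizer followed by cosine similarity. The item descriptions below are made up for illustration, and removing English stop words is an assumption of this sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical item descriptions.
descriptions = [
    "a young wizard attends a school of magic",
    "a wizard and his friends fight a dark lord with magic",
    "a detective investigates a murder in london",
]

# Turn each description into a TF-IDF weighted term vector.
tfidf = TfidfVectorizer(stop_words="english")
vectors = tfidf.fit_transform(descriptions)

# Pairwise cosine similarity between all descriptions (3 x 3 matrix).
similarity = cosine_similarity(vectors)
```

Here the first two descriptions share the terms "wizard" and "magic", so they score higher against each other than either does against the third.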
COLLABORATIVE
FILTERING
Collaborative Filtering
● Recommends products based on the history of user behavior and the
similarities between users.
● Similar users might like similar items
● E.g. Person 1 purchased items A, B, C. Person 2 purchased items A, B, D.
Then Person 1 might also purchase item D
● Involves matrix factorization for large matrices
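The Person 1 / Person 2 example above can be sketched as a tiny neighbour-based recommender. This is a simplified illustration, not a production method: the overlap count used as a weight and the function name are assumptions of the sketch.

```python
# Purchase history from the example above.
purchases = {
    "person_1": {"A", "B", "C"},
    "person_2": {"A", "B", "D"},
}

def recommend_from_neighbours(user, purchases):
    """Suggest items bought by users who share purchases with `user`."""
    own = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(own & items)      # number of items both users bought
        if overlap == 0:
            continue                    # no shared history, not a neighbour
        for item in items - own:        # items the user has not bought yet
            scores[item] = scores.get(item, 0) + overlap
    return sorted(scores, key=scores.get, reverse=True)

recs_person_1 = recommend_from_neighbours("person_1", purchases)
```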
Types of Collaborative Filtering
Collaborative Filtering
TWO APPROACHES TO COLLABORATIVE FILTERING

USER-BASED
● Uses the similarities between users
● Build a user-to-item matrix

ITEM-BASED
● Uses the similarities between items
● Build an item-to-user matrix
MEMORY-BASED
COLLABORATIVE
FILTERING
STEP 1
Creating user-to-item matrix

STEP 2
Calculating similarity between
users or items
USER TO ITEM MATRIX
● user 1 has purchased
items B and D
● user 2 has purchased
items A, B, C, and E
● user 4 has purchased
items A, C, and E
CALCULATE COSINE SIMILARITY BETWEEN USERS
(1, 2) AND (2, 4)
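The two comparisons can be worked through with NumPy. The vectors below encode the purchases listed above as 1 (purchased) or 0 (not purchased); the item order A to E is an assumption of this sketch.

```python
import numpy as np

# Columns are items A, B, C, D, E (1 = purchased, 0 = not purchased).
user_1 = np.array([0, 1, 0, 1, 0])   # B, D
user_2 = np.array([1, 1, 1, 0, 1])   # A, B, C, E
user_4 = np.array([1, 0, 1, 0, 1])   # A, C, E

def cosine(u, v):
    """Cosine of the angle between two purchase vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_1_2 = cosine(user_1, user_2)   # 1 / (sqrt(2) * 2), about 0.354
sim_2_4 = cosine(user_2, user_4)   # 3 / (2 * sqrt(3)), about 0.866
```

Users 2 and 4 share three items, so they are far more similar than users 1 and 2, who share only item B.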
USER-TO-USER AND ITEM-TO-ITEM
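from sklearn.metrics.pairwise import cosine_similarity

# User-to-item matrix; missing combinations become 0 so similarity can be computed
matrix = df.pivot_table(index='CustomerID', columns='StockCode', values='Quantity',
                        aggfunc='sum', fill_value=0)

# User-to-user similarity; use matrix.T for item-to-item similarity
cosine_similarity(matrix)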
MATRIX FACTORIZATION
SPARSITY
● A measure of how empty a matrix is, expressed as a percentage

The ratings dataframe is 98.30% empty.
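A minimal sketch of how sparsity can be computed: one minus the share of filled cells. The small matrix is hypothetical, and using 0 to mean "no rating" is an assumption of the example.

```python
import numpy as np

# Hypothetical user-to-item ratings; 0 is assumed to mean "no rating".
ratings_matrix = np.array([
    [5, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
])

def sparsity(matrix):
    """Fraction of cells in the matrix that are empty (zero)."""
    return 1.0 - np.count_nonzero(matrix) / matrix.size

empty_share = sparsity(ratings_matrix)
print(f"The matrix is {empty_share:.2%} empty.")
```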
MATRIX FACTORIZATION
WHAT MATRIX FACTORIZATION LOOKS LIKE
LATENT FEATURES
SINGULAR VALUE DECOMPOSITION
WHAT SVD DOES
APPLYING SVD
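As a hedged illustration of applying SVD, the sketch below factorizes a small hypothetical rating matrix with NumPy and rebuilds it from only the k largest singular values. Treating missing ratings as 0 is a simplification made for this example.

```python
import numpy as np

# Hypothetical 4 users x 4 items rating matrix (0 = not rated, a simplification).
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

def low_rank_approximation(matrix, k):
    """Rebuild the matrix from its k largest singular values (k latent features)."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Two latent features are enough to capture the two visible taste groups.
R_2 = low_rank_approximation(R, 2)
```

The rows of `U[:, :k]` can be read as user positions in the latent feature space and the columns of `Vt[:k, :]` as item positions.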
ALS ALGORITHM
ALS ALGORITHM, PREDICTING THE RATING
ALS ALGORITHM, LATENT FEATURES
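To make the alternating idea concrete, here is a small NumPy sketch of alternating least squares for explicit ratings: user factors are solved with item factors held fixed, then the reverse, and a rating is predicted as the dot product of the two. It is a teaching sketch under assumptions (hypothetical data, plain L2 regularization, 0 meaning "not rated") and is not how Spark implements ALS.

```python
import numpy as np

def als(R, mask, rank=2, reg=0.1, iterations=20, seed=0):
    """Factorize R into U @ V.T using only the observed entries in `mask`."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))   # user latent features
    V = rng.normal(scale=0.1, size=(n_items, rank))   # item latent features
    identity = np.eye(rank)
    for _ in range(iterations):
        # Hold item factors fixed; solve a ridge regression per user.
        for u in range(n_users):
            rated = mask[u]
            if rated.any():
                Vu = V[rated]
                U[u] = np.linalg.solve(Vu.T @ Vu + reg * identity, Vu.T @ R[u, rated])
        # Hold user factors fixed; solve a ridge regression per item.
        for i in range(n_items):
            rated = mask[:, i]
            if rated.any():
                Ui = U[rated]
                V[i] = np.linalg.solve(Ui.T @ Ui + reg * identity, Ui.T @ R[rated, i])
    return U, V

def rmse(R, mask, U, V):
    """Root mean squared error on the observed ratings only."""
    err = (R - U @ V.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

# Hypothetical ratings: 5 users x 4 items, 0 = not rated.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
mask = R > 0

U, V = als(R, mask)
predictions = U @ V.T   # predicted rating for every user-item pair
```

The `rank`, `reg` and `iterations` arguments play the same role as the rank, regularization and iteration hyperparameters of an ALS library.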
ALS HYPERPARAMETER
ALS Example with Apache Spark
from pyspark.ml.recommendation import ALS

# `ratings` is a Spark DataFrame with userId, movieId and rating columns
# Split the ratings dataframe into training and test data
(training_data, test_data) = ratings.randomSplit([0.8, 0.2], seed=42)

# Set the ALS hyperparameters
als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=10, maxIter=15, regParam=0.1,
          coldStartStrategy="drop", nonnegative=True, implicitPrefs=False)

# Fit the model to the training_data
model = als.fit(training_data)

# Generate predictions on the test_data
test_predictions = model.transform(test_data)
test_predictions.show()
IMPLICIT VS EXPLICIT DATA

IMPLICIT FEEDBACK EXPLICIT FEEDBACK


IMPLICIT DATA
Binary Implicit rating
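One way a binary implicit rating might be derived is to mark a user-item pair as 1 if any interaction was recorded and 0 otherwise. The sketch below assumes a pandas DataFrame of purchase events; the data is hypothetical and the column names are borrowed from the pivot table example for illustration only.

```python
import pandas as pd

# Hypothetical purchase events.
events = pd.DataFrame({
    "CustomerID": [1, 1, 2, 2, 2],
    "StockCode": ["A", "A", "B", "C", "C"],
    "Quantity": [2, 1, 1, 5, 3],
})

# Total quantity per customer and item; pairs with no purchase become 0.
totals = events.pivot_table(index="CustomerID", columns="StockCode",
                            values="Quantity", aggfunc="sum", fill_value=0)

# Binary implicit rating: 1 if the customer ever bought the item, else 0.
binary = (totals > 0).astype(int)
```

The purchased quantity is discarded here; only the fact that an interaction happened is kept.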
Recommendation engine libraries

1. from sklearn.decomposition import TruncatedSVD
2. from sklearn.metrics.pairwise import cosine_similarity
3. from pyspark.ml.recommendation import ALS
4. from surprise import SVD
5. from lightfm import LightFM
6. import implicit
APPLYING ALS WITH
SPARK
EXPLORATORY DATA ANALYSIS
COLD START PROBLEM,
HYBRID APPROACH, AND
ML OPS FOR RECSYS
HYBRID RECOMMENDATION APPROACHES
COLD START EXAMPLE FOR NEW USER
DATA MODELING PROCESS (example)

01 GATHERING DATA
02 PREPROCESSING DATA
03 CREATE INTERACTION DATA
04 CREATE RECOMMENDATION MODEL
05 EVALUATION
06 PREDICTION
DATA MODELING PROCESS (example)
01 GATHERING DATA
1. Log event history (user plays content)
2. User & content metadata
3. CRM data

02 PREPROCESSING DATA
1. Duplication check
2. Inconsistency check
3. Feature engineering
DATA MODELING PROCESS (example)

03 CREATE INTERACTION DATA
● Make interaction table with user and item

04 CREATE RECOMMENDATION MODEL
1. Split dataset into training (80%) & testing (20%)
2. Hyperparameter tuning
3. Create model with selected model

05 EVALUATION
1. Precision
2. AUC

06 PREDICTION
1. List of recommendations per user
2. List of recommendations per content
3. List of catalog per user
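Precision, listed under evaluation, is often computed for recommenders over the top k recommended items. The sketch below shows that variant under assumptions: the function name, the item labels and the choice of k are all illustrative.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that the user actually interacted with."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical ranked recommendations and held-out test interactions for one user.
recommended = ["A", "B", "C", "D", "E"]
relevant = {"B", "D", "F"}

p_at_3 = precision_at_k(recommended, relevant, 3)   # only B is a hit
p_at_5 = precision_at_k(recommended, relevant, 5)   # B and D are hits
```

In an offline evaluation this would typically be averaged over all users in the 20% test split.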
Model Evaluation (Offline Evaluation)
Recommendation Engine Journey
Model Evaluation (Online Evaluation)
Metric Evaluation (Online Evaluation)
RECOMMENDATION ENGINE
ARCHITECTURE
MACHINE LEARNING PIPELINE
NEAR REALTIME ARCHITECTURE EXAMPLE
BATCH PROCESS ARCHITECTURE EXAMPLE
(DATA ENGINEER TASK)
BATCH PROCESS ARCHITECTURE EXAMPLE
(DATA SCIENTIST TASK)
ML AUTOMATION FOR BATCH PROCESSING
PERFORMANCE CONTROL
RECOMMENDATION SYSTEM IMPLEMENTATION USEETV GO
Q&A
