Within Netflix, personalization is a key differentiator, helping members quickly discover new content that matches their taste. Done well, it creates an immersive user experience; when a recommendation is out of tune, however, it is immediately noticed by our members. During this presentation I will cover some of the personalization and recommendation tasks that jointly define the Netflix user experience, which entertains more than 130M members worldwide. In particular, I will focus on several of the algorithmic challenges related to the launch of new Netflix originals in the service, and go over concepts such as causality, incrementality, and explore-exploit strategies.
The research presented in this talk represents the collaborative efforts of a team of research scientists and engineers at Netflix on our journey to create best-in-class user experiences.
Correlation, Causation and Incrementality: Recommendation Problems at Netflix (September 2018, University of Antwerp)
9. Correlation (X,Y) is high, does it mean…
… X causes Y? … Y causes X?
In general, neither!
Most common reason: unobserved confounder
[Diagram: an unobserved confounder C pointing to observed X and observed Y]
“Omitted variable bias”
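To make the confounder point concrete, here is a minimal simulation (my own illustration, not from the talk): a hypothetical unobserved variable C drives both X and Y, so their correlation is high even though neither causes the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Unobserved confounder C drives both X and Y; X has no effect on Y.
c = rng.normal(size=n)
x = 2.0 * c + rng.normal(size=n)
y = 3.0 * c + rng.normal(size=n)

# Correlation is ~0.85 despite there being no causal link between X and Y.
print(np.corrcoef(x, y)[0, 1])
```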
11. Advertising
● High probability of conversion the day before weekly groceries irrespective
of adverts shown
● Effect of Pampers ads is null in this case.
Traditional (correlational) machine learning will fail
and waste $ on useless ads
[Timeline: weeks W1-W5, showing the probability of buying in each week and whether to advertise ($)]
In practice, Cost-Per-Incremental-Acquisition can be > 100x Cost-Per-Acquisition (!)
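A back-of-the-envelope sketch of why the two metrics can diverge so sharply; the numbers below are made up purely for illustration.

```python
# Hypothetical numbers, chosen only to illustrate the CPA vs CPIA gap.
cost_per_impression = 0.01    # $ spent per ad impression
p_buy_with_ad = 0.0500        # conversion rate when the ad is shown
p_buy_without_ad = 0.0495     # baseline conversion rate without the ad

# CPA credits every conversion to the ad; CPIA credits only the incremental lift.
cpa = cost_per_impression / p_buy_with_ad
cpia = cost_per_impression / (p_buy_with_ad - p_buy_without_ad)

print(f"CPA  = ${cpa:.2f}")   # $0.20
print(f"CPIA = ${cpia:.2f}")  # $20.00, i.e. 100x higher
```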
12. Netflix Promotions
Netflix homepage is expensive real estate (opportunity cost):
- so many titles to promote
- so few opportunities to win a “moment of truth”
[Illustration: titles D1-D5 in a row, with a promote (▶) decision for each]
13. Netflix Promotions
Netflix homepage is expensive real estate (opportunity cost):
- so many titles to promote
- so few opportunities to win a “moment of truth”
Traditional (correlational) ML systems:
- take action if probability of positive reward is high, irrespective of reward
base rate
- don’t model incremental effect of taking action
[Illustration: titles D1-D5 in a row, with a promote (▶) decision for each]
15. Why do predictive models fail?
Endogeneity: observed variable correlates with error term
● Confounding factors and omitted variable bias
● Simultaneity: Explanatory variable is jointly determined
with the dependent variable
[Diagram: Impressions and Plays, jointly determined]
16. Why do predictive models fail?
Endogeneity: observed variable correlates with error term
● Confounding factors and omitted variable bias
● Simultaneity: Explanatory variable is jointly determined
with the dependent variable
● Non-random noncompliance: Members are more likely
to visit on some hours than others, affecting “intent to
treat”
[Chart: visits per hour over the hours of the day (0-24)]
17. Correlation vs Causation
● Statistical models capture dependencies and correlation
● Common (mis)-interpretation: β captures effect size
● Changing x for large β changes prediction, but not
outcome!
● Solution: causal modeling
● Removing the bias in β, many coefficients go to zero
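A minimal sketch (assumed setup, not from the slides) of how removing omitted variable bias drives β toward zero: regressing y on x alone yields a sizeable coefficient, while also controlling for the confounder c sends the coefficient on x to roughly zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

c = rng.normal(size=n)            # confounder
x = c + rng.normal(size=n)        # x is driven by c but has no effect on y
y = 2.0 * c + rng.normal(size=n)  # y is driven by c only

# y ~ x: beta on x is ~1.0 (omitted variable bias).
beta_biased = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]

# y ~ x + c: beta on x goes to ~0 once the confounder is included.
beta_adjusted = np.linalg.lstsq(np.column_stack([np.ones(n), x, c]), y, rcond=None)[0][1]

print(beta_biased, beta_adjusted)
```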
20. Background and notation
● Title t belongs to the pool of candidate titles T, eligible for promotion in
Billboard when member m visits the homepage
● Let x_{m,t} be a context vector for member m and title t
● Let y_{m,t} be the label indicating a play of title t by member m from the homepage, after having seen a billboard.
21. What (sequence of) actions will maximize the
cumulative reward?
● Reinforcement Learning
● Multi-Armed Bandits
● Acknowledge the need for balancing
exploration and exploitation
○ Allow sub-optimal actions, to collect unbiased treatment
effects and learn the probability distributions over the
space of possible actions.
[Illustration: slot machines (multi-armed bandit) with rewards R1, R2, R3 and an unknown payoff ?]
22. ϵ-greedy policy
● Explore → Collect experimental data
○ With ϵ probability, select at random a title for promotion in Billboard
○ Log context (x_{m,t})
○ Causal observations of play-feedback (y_{m,t})
25. ϵ-greedy policy
● Explore → Collect experimental data
○ With ϵ probability, select at random a title for promotion in Billboard
○ Log context (x_{m,t})
○ Causal observations of play-feedback (y_{m,t})
● Exploit → Train on the experimental data
○ With (1-ϵ) probability, select the optimal title for promotion
● Alternatives: UCB, Thompson Sampling
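A minimal sketch of the ϵ-greedy selection step described above; score_title stands in for whatever exploit model estimates the probability of play, and all names here are illustrative assumptions.

```python
import random

def choose_billboard_title(candidates, context, score_title, epsilon=0.05):
    """epsilon-greedy: explore uniformly with probability epsilon, else exploit."""
    if random.random() < epsilon:
        # Explore: uniform-random title, logged to collect unbiased (causal) feedback.
        return random.choice(candidates), "explore"
    # Exploit: title with the highest predicted probability of play.
    return max(candidates, key=lambda t: score_title(context, t)), "exploit"
```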
26. Greedy exploit model
● Learn a model per title to predict likelihood of play
P(y_{m,t} | x_{m,t}, T) = σ( f(x_{m,t}, Θ) )
● Pick winning title:
t = argmax_t P(y_{m,t} | x_{m,t}, T)
● Various models can be used to predict probability of
play, such as logistic regression, GBDT, neural networks
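One possible realization of the per-title play model, using logistic regression from scikit-learn as the example learner (the class and feature handling are assumptions, not the production system).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class GreedyExploitModel:
    """One model per title t: P(y_{m,t} | x_{m,t}, T) = sigmoid(f(x_{m,t}, theta_t))."""

    def __init__(self, titles):
        self.models = {t: LogisticRegression(max_iter=1000) for t in titles}

    def fit_title(self, t, X, y):
        # X: context vectors x_{m,t} from exploration data; y: play labels y_{m,t}
        self.models[t].fit(X, y)

    def pick_title(self, x, candidates):
        # Greedy exploit: argmax over candidates of the predicted probability of play.
        probs = {t: self.models[t].predict_proba(x.reshape(1, -1))[0, 1]
                 for t in candidates}
        return max(probs, key=probs.get)
```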
27. Exploit - Launch Day
Correlation:
● Explore: 0.93
● Corr. ML: 0.61
● Exploit: 0.88
28. Considerations for ϵ-greedy policy
● Explore
○ Bandwidth allocation and cost of exploration
○ New vs existing titles
● Exploit
○ Model synchronisation
○ Title availability (group censoring)
○ Observation window
○ Frequency of model update
○ Incremental updates vs batch training
■ Stationarity of title popularities
29. Online learning works great for title cold start scenarios, but...
MABs are greedy, not lift-based!
31. Incrementality-based policy
● Goal: Select title for promotion that benefits most from
being shown in billboard
○ Member can play title from other sections on the homepage or search
○ Popular titles likely to appear on homepage anyway: Trending Now
○ Better utilize the most expensive real estate on the homepage!
● Define policy to be incremental with respect to probability of play
32. Incrementality-based policy
● Goal: Select title for promotion that benefits most from
being shown in billboard
t = argmax_t [ P(y_{m,t} | x_{m,t}, T, b=1) - P(y_{m,t} | x_{m,t}, T, b=0) ]
where b is an indicator for the treatment of a title being shown in billboard (b=1), versus not being shown in billboard (b=0)
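A sketch of the incrementality-based selection rule, assuming we already have estimators for the probability of play with and without a billboard impression (e.g. trained on exploration data with and without the treatment); the function names are hypothetical.

```python
def pick_incremental_title(x, candidates, p_play_with_billboard, p_play_without_billboard):
    """Pick the title whose probability of play increases most when promoted.

    p_play_with_billboard(x, t)    ~ P(y_{m,t} | x_{m,t}, T, b=1)
    p_play_without_billboard(x, t) ~ P(y_{m,t} | x_{m,t}, T, b=0)
    """
    def incremental_lift(t):
        return p_play_with_billboard(x, t) - p_play_without_billboard(x, t)

    return max(candidates, key=incremental_lift)
```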
33. Offline evaluation: Replay [Li et al, 2010]
● Relies upon uniform exploration data.
● Each record in the uniform exploration log contains {context, title k shown, reward, list of candidates}
● For every record:
○ Evaluate the trained model for all the titles in the candidate pool.
○ Pick the winning title k’
○ Keep the record in history if k’ = k (the title impressed in the logged
data) else discard it.
○ Compute the metrics from the history.
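A compact sketch of the replay estimator over logged uniform-exploration records; the record field names are assumptions for illustration.

```python
def replay_take_rate(logged_records, pick_title):
    """Replay evaluation [Li et al., 2010] on uniform exploration logs.

    Each record is assumed to contain: 'context', the shown 'title' k,
    the 'reward' (1 if played, 0 otherwise), and the 'candidates' pool.
    """
    plays, matches = 0, 0
    for rec in logged_records:
        k_prime = pick_title(rec["context"], rec["candidates"])  # model's choice
        if k_prime == rec["title"]:       # keep the record only if the choices match
            matches += 1
            plays += rec["reward"]
    # Take Rate = # Plays / # Matches
    return plays / matches if matches else float("nan")
```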
34. Offline evaluation: Replay [Li et al, 2010]
Uniform Exploration Data - Unbiased evaluation
[Flow diagram: logged (context, title, reward) records are split into train data and evaluation data; the trained model is shown the context x, picks a winner title k', and the logged reward is used only if k' = k]
Take Rate = # Plays / # Matches
35. Offline replay
Greedy exploit has a higher replay take rate than the incrementality-based model...
The incrementality-based policy sacrifices replay by selecting a lesser-known title that would benefit from being shown on the Billboard.
[Chart: lift in replay for the various algorithms compared to the Random baseline]
36. Which titles benefit from Billboard promotion?
Title A has a low baseline probability of play; however, when the billboard is shown, the probability of play increases substantially!
Title C has a higher baseline probability and may not benefit as much from being shown on the Billboard.
[Scatter plot: incremental vs baseline probability of play for various members]
37. Online observations
● Online take rates follow the offline patterns.
● Our implementation of incrementality is able to shift
engagement within the candidate pool.
39. Correlation, causation, and incrementality
Most ML algorithms are correlational, i.e. based on observational data
In this context, the explore-exploit models are causal: we train models on experimental data, where we are in control of the randomization
Incrementality can be defined as the causal lift in a metric of interest
For instance, the change in probability of play for a title in a session when a billboard is shown for that title to a member