Correlation, Causation
and Incrementality:
Recommendations at Netflix
Roelof van Zwol
Netflix
Spot the
Algorithms!
98% Match
Introducing new content
● Who will watch the show?
● How many members will
watch the show?
● Which canvas to use?
● When to promote?
Overview
● Correlation ≠ Causation
● Online-learning
● Incrementality
Correlation ≠ Causation
Should you stop buying margarine,
to save your marriage?
Correlation (X,Y) is high, does it mean…
… X causes Y? … Y causes X?
In general, neither!
Most common reason: unobserved confounder
[Diagram: unobserved confounder C causes both observed X and observed Y]
“Omitted variable bias”
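A minimal simulation (not from the talk) of how an unobserved confounder produces a strong correlation between X and Y even though neither causes the other; once the confounder is observed, the coefficient on X collapses toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Unobserved confounder C drives both X and Y; X has no causal effect on Y.
c = rng.normal(size=n)
x = 2.0 * c + rng.normal(size=n)
y = 3.0 * c + rng.normal(size=n)

print(np.corrcoef(x, y)[0, 1])          # strong correlation (~0.85)

# Naive regression of Y on X: large, spurious coefficient (omitted variable bias).
beta_naive = np.polyfit(x, y, 1)[0]

# Adding the confounder as a regressor drives the coefficient on X toward zero.
X = np.column_stack([x, c, np.ones(n)])
beta_adjusted = np.linalg.lstsq(X, y, rcond=None)[0][0]
print(beta_naive, beta_adjusted)
```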
Advertising
● High probability of conversion the day before weekly groceries irrespective
of adverts shown
● Effect of Pampers ads is null in this case.
Traditional (correlational) machine learning will fail
and waste $ on useless ads
[Diagram: weeks W1–W5, with the probability of buying per week and an "Advertise?" decision ($ = ad shown)]
In practice, Cost-Per-Incremental-Acquisition can be > 100x Cost-Per-Acquisition (!!!!!)
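A back-of-the-envelope illustration of that gap, with made-up numbers (a holdout group is assumed for estimating the no-ads baseline):

```python
# Hypothetical campaign numbers, purely to illustrate the CPA vs CPIA gap.
ad_spend = 10_000.0            # $ spent on ads
conversions_with_ads = 500     # buyers among members who saw the ads
baseline_conversions = 496     # buyers expected without ads (from a holdout group)

cpa = ad_spend / conversions_with_ads                        # $20 per acquisition
incremental = conversions_with_ads - baseline_conversions    # only 4 caused by the ads
cpia = ad_spend / incremental                                # $2,500 per incremental acquisition

print(cpa, cpia, cpia / cpa)   # the ratio here is 125x
```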
Netflix Promotions
Netflix homepage is expensive real estate (opportunity cost):
- so many titles to promote
- so few opportunities to win a “moment of truth”
Traditional (correlational) ML systems:
- take action if probability of positive reward is high, irrespective of reward
base rate
- don’t model incremental effect of taking action
[Diagram: days D1–D5, with a "Promote?" decision (▶ = Billboard impression) each day]
Surely we can do better!
Why do predictive models fail?
Endogeneity: observed variable correlates with error term
● Confounding factors and omitted variable bias
● Simultaneity: Explanatory variable is jointly determined
with the dependent variable
[Diagram: Impressions and Plays jointly determine each other]
Why do predictive models fail?
Endogeneity: observed variable correlates with error term
● Confounding factors and omitted variable bias
● Simultaneity: Explanatory variable is jointly determined
with the dependent variable
● Non-random noncompliance: Members are more likely
to visit on some hours than others, affecting “intent to
treat”
[Plot: visits per hour across the 24 hours of the day]
Correlation vs Causation
● Statistical models capture dependencies and correlation
● Common (mis)-interpretation: β captures effect size
● Changing x for large β changes prediction, but not
outcome!
● Solution: causal modeling
● Remove bias in β, many go to zero
CASE STUDY:
Content promotion
through Billboard
Online Learning
Background and notation
● Title t belongs to the pool of candidate titles T, eligible for promotion in
Billboard when member m visits the homepage
● Let x_{m,t} be a context vector for member m and title t
● Let y_{m,t} be the label indicating a play of title t by member m from the homepage, after having seen a billboard.
What (sequence of) actions will maximize the
cumulative reward?
● Reinforcement Learning
● Multi-Armed Bandits
● Acknowledge the need for balancing
exploration and exploitation
○ Allow sub-optimal actions, to collect unbiased treatment
effects and learn the probability distributions over the
space of possible actions.
[Illustration: multi-armed bandit as slot machines with unknown rewards R1, R2, R3]
ϵ-greedy policy
● Explore → Collect experimental data
○ With ϵ probability, select at random a title for promotion in Billboard
○ Log context (x_{m,t})
○ Causal observations of play-feedback (y_{m,t})
Explore - Launch Day
Correlational ML - Launch Day
Correlation
Explore 0.93
Corr. ML 0.61
ϵ-greedy policy
● Explore → Collect experimental data
○ With ϵ probability, select at random a title for promotion in Billboard
○ Log context (x_{m,t})
○ Causal observations of play-feedback (y_{m,t})
● Exploit → Train on the experimental data
○ With (1-ϵ) probability, select the optimal title for promotion
● Alternatives: UCB, Thompson Sampling
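A minimal sketch of the ϵ-greedy policy above; the title pool, context layout, and model interface (predict_proba_of_play) are illustrative placeholders, not the production system:

```python
import random

def choose_billboard_title(candidates, context, models, epsilon=0.05):
    """Epsilon-greedy selection of the title to promote in the Billboard.

    candidates: list of eligible title ids (the pool T)
    context:    dict title_id -> feature vector x_{m,t} for the current member
    models:     dict title_id -> model exposing predict_proba_of_play(x)
    """
    if random.random() < epsilon:
        # Explore: uniform random title; the (context, title, play) tuple is
        # logged as experimental data for training and offline replay.
        return random.choice(candidates), "explore"
    # Exploit: title with the highest predicted probability of play.
    scores = {t: models[t].predict_proba_of_play(context[t]) for t in candidates}
    return max(scores, key=scores.get), "exploit"
```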
Greedy exploit model
● Learn a model per title to predict likelihood of play
P(y_{m,t} | x_{m,t}, T) = σ( f(x_{m,t}, Θ) )
● Pick winning title:
t = argmax_t P(y_{m,t} | x_{m,t}, T)
● Various models can be used to predict probability of
play, such as logistic regression, GBDT, neural networks
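As a sketch, the per-title play model could be a logistic regression fit on the logged exploration data; the data layout and function names below are assumptions for illustration only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_per_title_models(explore_data):
    """Fit P(y_{m,t} = 1 | x_{m,t}) = sigma(f(x_{m,t}, theta_t)), one model per title.

    explore_data: dict title_id -> (X_t, y_t) with context rows and play labels
                  collected while that title was shown under exploration.
    """
    return {t: LogisticRegression(max_iter=1000).fit(X_t, y_t)
            for t, (X_t, y_t) in explore_data.items()}

def greedy_exploit(models, context):
    """Pick t = argmax_t P(y_{m,t} | x_{m,t}) over the candidate pool."""
    probs = {t: m.predict_proba(np.asarray(context[t]).reshape(1, -1))[0, 1]
             for t, m in models.items()}
    return max(probs, key=probs.get)
```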
Exploit - Launch Day
Correlation
Explore 0.93
Corr. ML 0.61
Exploit 0.88
Considerations for ϵ-greedy policy
● Explore
○ Bandwidth allocation and cost of exploration
○ New vs existing titles
● Exploit
○ Model synchronisation
○ Title availability (group censoring)
○ Observation window
○ Frequency of model update
○ Incremental updates vs batch training
■ Stationarity of title popularities
Online learning works great for title cold start scenarios, but...
MABs are greedy, not lift-based!
Incrementality
Incrementality-based policy
● Goal: Select title for promotion that benefits most from
being shown in billboard
○ Member can play title from other sections on the homepage or search
○ Popular titles likely to appear on homepage anyway: Trending Now
○ Better utilize most expensive real-estate on the homepage!
● Define policy to be incremental with respect to probability of play
t = argmax_t [ P(y_{m,t} | x_{m,t}, T, b=1) - P(y_{m,t} | x_{m,t}, T, b=0) ]
where b is an indicator for the treatment of a title being shown in billboard (b=1), versus not being shown in billboard (b=0)
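A sketch of that selection rule, assuming each per-title model takes the Billboard treatment indicator b as one extra feature and was trained on exploration data where b was randomized (names are illustrative):

```python
import numpy as np

def incremental_policy(models, context, candidates):
    """Pick the title with the largest causal lift from a Billboard impression."""
    def p_play(model, x, b):
        # Treatment indicator b (1 = shown in Billboard, 0 = not shown)
        # is appended to the member/title context vector.
        return model.predict_proba(np.append(x, b).reshape(1, -1))[0, 1]

    lift = {t: p_play(models[t], context[t], 1) - p_play(models[t], context[t], 0)
            for t in candidates}
    return max(lift, key=lift.get)
```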
Offline evaluation: Replay [Li et al, 2010]
● Relies upon uniform exploration data.
● Each record in the uniform exploration log contains:
{context, title k shown, reward, list of candidates}
● For every record:
○ Evaluate the trained model for all the titles in the candidate pool.
○ Pick the winning title k’
○ Keep the record in history if k’ = k (the title impressed in the logged
data) else discard it.
○ Compute the metrics from the history.
Offline evaluation: Replay [Li et al, 2010]
Uniform exploration data → unbiased evaluation
[Diagram: exploration log records (context, title, reward) are split into train and evaluation data; the trained model is revealed context x, picks a winner title k', and the logged reward is used only if k' = k]
Take Rate = # Plays / # Matches
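A compact sketch of the Replay estimator and take-rate computation; the record layout is an assumption, with reward = 1 for a play and 0 otherwise:

```python
def replay_take_rate(policy, exploration_log):
    """Offline Replay evaluation [Li et al., 2010] on uniform exploration data.

    exploration_log: iterable of (context, shown_title_k, reward, candidates)
    policy(context, candidates) -> chosen title k'
    """
    plays, matches = 0, 0
    for context, shown_title, reward, candidates in exploration_log:
        chosen = policy(context, candidates)
        if chosen == shown_title:     # keep the record only if k' = k
            matches += 1
            plays += reward           # reward: 1 if the member played the title
    return plays / matches if matches else float("nan")
```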
Offline replay
Greedy exploit has a higher replay take rate than the incrementality-based model…
The incrementality-based policy sacrifices replay by selecting a lesser-known title that would benefit from being shown on the Billboard.
[Chart: lift in replay for the various algorithms, compared to the Random baseline]
Which titles benefit from Billboard promotion?
Title A has a low baseline probability of play, but when the billboard is shown its probability of play increases substantially!
Title C has a higher baseline probability and may not benefit as much from being shown on the Billboard.
[Scatter plot: incremental vs baseline probability of play for various members]
Online observations
● Online take rates follow the offline patterns.
● Our implementation of incrementality is able to shift engagement within the candidate pool.
In Summary
Correlation, causation, and incrementality
Most ML algorithms are correlational, i.e. based on observational data
In this context, the explore-exploit models are causal
E.g. we train models on experimental data, where we are in control of the randomization
Incrementality can be defined as the causal lift in a metric of interest
For instance, the change in probability of play for a title in a session, when a billboard is shown for that title to a member
