Algorithmic Music Recommendations at Spotify

Algorithmic Music
Discovery at Spotify
Chris Johnson
@MrChrisJohnson
January 13, 2014

Monday, January 13, 14

Who am I??
•Chris Johnson

– Machine Learning guy from NYC
– Focused on music recommendations
– Formerly a graduate student at UT Austin


What is Spotify?

•
•

On demand music streaming service
“iTunes in the cloud”


3

Section name


4

Data at Spotify....
• 20 Million songs
• 24 Million active users
• 6 Million paying users
• 8 Million daily active users
• 1 TB of compressed data generated from users per day
• 700 node Hadoop Cluster
• 1 Million years worth of music streamed
• 1 Billion user generated playlists


5

Challenge: 20 Million songs... how do we
recommend music to users?


6

Recommendation Features
• Discover (personalized recommendations)
• Radio
• Related Artists
• Now Playing


7

8

How can we find good
recommendations?
• Manual Curation

• Manually Tag Attributes

• Audio Content,
Metadata, Text Analysis

• Collaborative Filtering


Collaborative Filtering - “The Netflix Prize”


9

Collaborative Filtering

10

Hey,
I like tracks P, Q, R, S!
Well,
I like tracks Q, R, S, T!

Then you should check out
track P!

Nice! Btw try track T!

Image via Erik Bernhardsson

Section name


11

Difference between movie and music recs

•

Scale of catalog

60,000 movies


20,000,000 songs

12


•

Repeated consumption


13


•

Music is more niche


14

“The Netflix Problem” Vs “The Spotify Problem

•Netflix:

Users explicitly “rate” movies

•Spotify:

Feedback is implicit through streaming behavior


15

Section name


16

Explicit Matrix Factorization

•Users explicitly rate a subset of the movie catalog
•Goal: predict how users will rate new movies
Movies

Users
Chris
Inception


17

Explicit Matrix Factorization

18

•Approximate ratings matrix by the product of lowdimensional user and movie matrices
Minimize RMSE (root mean squared error)

•

?
1
2
?
5

•
•
•

3
?
?
?
2

5
?
3
?
?

?
1
2
5
4

= user
= user

rating for movie
latent factor vector

= item



X

Y
Inception
Chris

•
•
•

= bias for user
= bias for item
= regularization parameter

Implicit Matrix Factorization

19

•Replace Stream counts with binary labels
– 1 = streamed, 0 = never streamed

•Minimize weighted RMSE (root mean squared error) using a
function of stream counts as weights

10001001
00100100
10100011
01000100
00100100
10001001

•
•
•
•

= 1 if user
= user
=i tem


streamed track

X

else 0

Y

•
•
•

= bias for user
= bias for item
= regularization parameter

Alternating Least Squares

• Initialize user and item vectors to random noise

• Fix item vectors and solve for optimal user vectors

– Take the derivative of loss function with respect to user’s vector, set
–

equal to 0, and solve
Results in a system of linear equations with closed form solution!

• Fix user vectors and solve for optimal item vectors
• Repeat until convergence
code: https://github.com/MrChrisJohnson/implicitMF

20


• Note that:
• Then, we can pre-compute
–
–

once per iteration

and
only contain non-zero elements for tracks that
the user streamed
Using sparse matrix operations we can then compute each user’s
vector efficiently in
time where
is the number of
tracks the user streamed


21



22

How do we use the learned vectors?

•User-Item score is the dot product

•Item-Item similarity is the cosine similarity

•Both operations have trivial complexity based on the number of
latent factors


23

Latent Factor Vectors in 2 dimensions


24

Section name


25

Scaling up Implicit Matrix Factorization
with Hadoop


26

Hadoop at Spotify 2009


27

Hadoop at Spotify 2014
700 Nodes in our London data center


28

Implicit Matrix Factorization with Hadoop
Map step

29

Reduce step

item vectors
item%L=0

item vectors
item%L=1

user vectors
u%K=0

u%K=0
i%L=0

u%K=0
i%L=1

...

u%K=0
i % L = L-1

u%K=0

user vectors
u%K=1

u%K=1
i%L=0

u%K=1
i%L=1

...

...

u%K=1

...

...

...

...

u % K = K-1
i%L=0

...

...

u % K = K-1
i % L = L-1

user vectors
u % K = K-1

item vectors
i % L = L-1

u % K = K-1

all log entries
u%K=1
i%L=1

Figure via Erik Bernhardsson

Implicit Matrix Factorization with Hadoop

30

One map task
Distributed
cache:
All user vectors
where u % K = x
Distributed
cache:
All item vectors
where i % L = y

Mapper

Emit contributions

Reducer

New vector!

Map input:
tuples (u, i, count)
where
u%K=x
and
i%L=y


Implicit Matrix Factorization with Spark

31

Spark

Vs
Hadoop

http://www.slideshare.net/Hadoop_Summit/spark-and-shark

Section name


32

Approximate Nearest Neighbors

code: https://github.com/Spotify/annoy

33

Ensemble of Latent Factor Models

34


AB-Testing Recommendations


35

Open Problems

•How to go from predictive model to related artists? (learning

to rank?)
How do you learn from user feedback?
How do you deal with observation bias in the user feedback?
(active learning?)
How to factor in temporal information?
How much value in content based recommendations?
How to best evaluate model performance?
How to best train an ensemble?

•
•
•
•
•
•


36

Section name

37

Thank You!


Section name


38

Section name


39

Section name


40

Section name


41

Section name


42

Algorithmic Music Recommendations at Spotify

More Related Content

Algorithmic Music Recommendations at Spotify