
What Is Shazam?


Mobile phone-based application that listens to music and compares it against a database.
 
Shazam’s Method

! Ideas
! Take advantage of music local structures
! Find salient peaks on spectrogram
! Pair peaks to form landmarks for comparison
! Efficient search by hash tables
! Use positions of landmarks as hash keys
! Use song ID and offset time as hash values
! Use time constraints to find matched landmarks
Technology

! Takes 10 second sample
! Catalogs the spectrogram (fingerprint)
! Compares against large database
! If match is found,
! Gives song information
! Offers immediate purchase
Shazam-Industry leader in audio fingerprinting

! For each audio file, generate reproducible landmarks


! Each landmark occurs at a time offset
! For each landmark, generate a “fingerprint” tag that
characterizes its location
Spectrogram and fingerprint

! Spectrogram axes
! x axis: time
! y axis: frequency
! z axis: intensity
! identify frequency of peak intensity
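
The spectrogram axes above can be illustrated with a small sketch. It is not Shazam's implementation: the frame size, hop, Hann window, naive DFT and the function names `spectrogram` and `peak_bins` are all assumptions chosen for readability, and it keeps only the single strongest bin per frame, whereas a real system would keep several local maxima.

```python
import cmath
import math

def spectrogram(signal, frame_size=64, hop=32):
    """Return one list of magnitudes (frequency bins) per time frame."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        # Hann window to reduce spectral leakage (an assumed choice)
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_size - 1)))
                    for n, x in enumerate(frame)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k; slow but dependency-free
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(windowed))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def peak_bins(spec):
    """For each time frame, return the index of the bin with peak intensity."""
    return [max(range(len(mags)), key=mags.__getitem__) for mags in spec]

SAMPLE_RATE = 1024  # Hz, hypothetical
tone = [math.sin(2 * math.pi * 128 * n / SAMPLE_RATE) for n in range(256)]
spec = spectrogram(tone)                          # x axis: frames, y axis: bins, z: magnitude
peaks = peak_bins(spec)
peak_hz = [b * SAMPLE_RATE / 64 for b in peaks]   # bin index -> frequency in Hz
```

For this pure 128 Hz tone, every frame's strongest bin maps back to 128 Hz.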

Shazam-Industry leader in audio fingerprinting

! Do same for sample

! slide the mask to match a significant number of points

! Generate list of matching fingerprints

! time_db - time_sample = constant
Shazam: Landmarks as Features

Figure: spectrogram; salient peaks of spectrogram; pair peaks in target zone to form landmarks (Avery Wang, 2003)

•  Landmark: [t1, f1, t2, f2]
•  24-bit hash key:
•  f1: 9 bits
•  Δf = f2-f1: 8 bits
•  Δt = t2-t1: 7 bits
•  Hash value:
•  Song ID
•  Landmark’s start time t1
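
A minimal sketch of how the 9 + 8 + 7 bit fields could be packed into one 24-bit key. The slide gives only the field widths; the field order inside the key, the two's-complement wrapping of a negative Δf, and the names `pack_landmark` and `unpack_key` are assumptions for illustration.

```python
F1_BITS, DF_BITS, DT_BITS = 9, 8, 7  # field widths from the slide

def pack_landmark(t1, f1, t2, f2):
    """Return (hash_key, t1): a 24-bit key plus the landmark's start time."""
    f1_q = f1 & ((1 << F1_BITS) - 1)
    # Assumption: a negative frequency difference wraps into 8 bits (two's complement)
    df = (f2 - f1) & ((1 << DF_BITS) - 1)
    dt = (t2 - t1) & ((1 << DT_BITS) - 1)
    key = (f1_q << (DF_BITS + DT_BITS)) | (df << DT_BITS) | dt
    return key, t1

def unpack_key(key):
    """Recover (f1, df, dt) from a key built by pack_landmark."""
    dt = key & ((1 << DT_BITS) - 1)
    df = (key >> DT_BITS) & ((1 << DF_BITS) - 1)
    f1 = key >> (DF_BITS + DT_BITS)
    if df >= 1 << (DF_BITS - 1):   # undo the two's-complement wrap
        df -= 1 << DF_BITS
    return f1, df, dt

key, start = pack_landmark(t1=100, f1=300, t2=140, f2=290)
fields = unpack_key(key)
```

The key stores only f1, Δf and Δt; the absolute time t1 travels with the hash value, alongside the song ID.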

hash

! anchor point and target zone


! for each point in target zone
! hash: f1, f2, Δt
! robust to time shifts, noise
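
The anchor and target-zone idea can be sketched as follows. The target-zone limits (`max_dt`, `max_df`), the `fan_out` cap and the function names are hypothetical parameters, not values from the slides.

```python
def landmarks_from_peaks(peaks, min_dt=1, max_dt=63, max_df=31, fan_out=3):
    """Pair each anchor peak with up to fan_out later peaks in its target zone.

    peaks: list of (t, f) tuples; returns a list of (t1, f1, t2, f2).
    """
    peaks = sorted(peaks)
    landmarks = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break  # peaks are time-sorted, so later ones are further away
            if dt >= min_dt and abs(f2 - f1) <= max_df:
                landmarks.append((t1, f1, t2, f2))
                paired += 1
                if paired == fan_out:
                    break
    return landmarks

def hash_of(landmark):
    """Hash fields from the slide: f1, f2 and the time difference."""
    t1, f1, t2, f2 = landmark
    return (f1, f2, t2 - t1)

peaks = [(0, 100), (5, 110), (10, 200), (20, 95), (100, 100)]
landmarks = landmarks_from_peaks(peaks)

# Shifting every peak in time leaves the hashes unchanged
shifted = landmarks_from_peaks([(t + 1000, f) for t, f in peaks])
same_hashes = [hash_of(l) for l in landmarks] == [hash_of(l) for l in shifted]
```

Because the hash contains only a time difference, the same landmark produces the same hash wherever the sample starts in the track.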

search

! search for all matching songs
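
One way to picture the search step is an inverted index from hash key to (song ID, time) entries. This is a sketch under assumed data shapes; `build_index`, `find_hits` and the toy hash values are illustrative only.

```python
from collections import defaultdict

def build_index(songs):
    """songs: {song_id: [(hash_key, t1), ...]} -> {hash_key: [(song_id, t1), ...]}"""
    index = defaultdict(list)
    for song_id, entries in songs.items():
        for key, t1 in entries:
            index[key].append((song_id, t1))
    return index

def find_hits(index, sample_entries):
    """Return {song_id: [(t_db, t_sample), ...]} for every matching hash."""
    hits = defaultdict(list)
    for key, t_sample in sample_entries:
        for song_id, t_db in index.get(key, ()):
            hits[song_id].append((t_db, t_sample))
    return hits

# Toy database: hash keys are small integers for readability
songs = {
    "song_a": [(11, 50), (22, 60), (33, 75)],
    "song_b": [(22, 5), (44, 9)],
}
index = build_index(songs)
sample = [(11, 0), (22, 10), (33, 25)]
hits = find_hits(index, sample)
```

Here `song_a` collects three hits and `song_b` one; the time pairs are what the later alignment step works on.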

2d plot of frequency hits

! if a specific song is hit multiple times:
! 2d plot of frequency hits
! x axis: time (from the beginning of the track) at which the matching frequencies appear in the song
! y axis: time (from the beginning of the recording) at which the matching frequencies appear in the sample
! if the files match, matching features should occur at similar offsets from the beginning of the file

Shazam-match

! if the files match, matching features should occur at similar offsets from the beginning of the file
Shazam-match algorithm

! matching features should occur at similar time offsets


! compute offset: dk = yk – xk
! histogram of dk
! sort dk
! scan for a cluster of values
! match: presence of a statistically significant cluster
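
The offset steps above can be sketched with a histogram of d_k = y_k - x_k. The slide sorts the offsets and scans for a cluster; for integer time bins a counter gives the same cluster, which is the shortcut assumed here. The name `best_offset` and the toy pairs are illustrative.

```python
from collections import Counter

def best_offset(pairs):
    """pairs: list of (x_k, y_k) = (time in song, time in sample).

    Returns (offset, count) for the most frequent d_k = y_k - x_k,
    or None when there are no pairs.
    """
    if not pairs:
        return None
    histogram = Counter(y - x for x, y in pairs)
    return histogram.most_common(1)[0]

# Four aligned hits (offset -50) and two spurious ones
pairs = [(50, 0), (60, 10), (75, 25), (90, 40), (12, 33), (80, 3)]
result = best_offset(pairs)
```

A tall, narrow histogram bin means many hashes agree on one offset; whether it is statistically significant depends on a threshold chosen separately.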

Shazam-no match
Hash

! no special assumption about the format of the hashes


! hash should
! be specific enough to avoid too many spurious matches
! be reproducible
! scanning:
! matching hashes must be temporally aligned
! score
! number of matching, time-aligned tokens
! from the distribution of scores of false positives
! estimate the threshold score
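
A rough sketch of estimating a threshold from false-positive scores. The slides do not say how the threshold is derived; picking the smallest score that keeps the empirical false-positive rate under a target is an assumption, as are the function name, the target rate and the made-up scores.

```python
def estimate_threshold(false_positive_scores, max_false_positive_rate=0.01):
    """Smallest integer score t such that the fraction of known
    false-positive scores >= t does not exceed the target rate.

    Assumes scores are non-negative integers (counts of aligned tokens).
    """
    n = len(false_positive_scores)
    highest = max(false_positive_scores)
    for threshold in range(0, highest + 2):
        exceeding = sum(1 for s in false_positive_scores if s >= threshold)
        if exceeding <= max_false_positive_rate * n:
            return threshold
    return highest + 1  # unreachable for a non-negative rate; kept as a safe fallback

# Made-up scores obtained by matching samples against songs known NOT to match
scores = [1, 1, 2, 2, 2, 3, 3, 4, 5, 9]
threshold_10pct = estimate_threshold(scores, max_false_positive_rate=0.1)
threshold_strict = estimate_threshold(scores, max_false_positive_rate=0.0)
```

A candidate match would then be accepted only when its count of time-aligned tokens reaches the chosen threshold.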

Technology

! Pair of peaks: peak + anchor


! Hash table search is O(1); the extra key information from pairing peaks reduces collisions by orders of magnitude
! Robustness to interference?
Societal Impact

! Introducing people to new music.


! Hear a song once, have it forever
Economic Impact

! Improves music sales


! Music sales are a convenient impulse buy
! More efficient advertisement of music.
! http://www.youtube.com/watch?v=ppJAkN4m9bY
Conclusion

! Turns the world into a record store

References

! A. Wang, “An industrial-strength audio search algorithm”, ISMIR 2003
