Characterizing The File Hosting Ecosystem: A View From The Edge (IFIP Performance 2011)


Aniket Mahanti, University of Calgary, Canada
Carey Williamson, University of Calgary, Canada
Niklas Carlsson, Linköpings universitet, Sweden
Martin Arlitt, HP Labs, USA
Anirban Mahanti, NICTA, Australia
• User-generated content has transformed how people share and disseminate information
• Web has witnessed emergence of file hosting
  - Popular services include RapidShare and Megaupload
• File hosting services offer several advantages over P2P file sharing and new-age content sharing
  - Convenience, higher file availability, improved privacy, diverse content, and economic incentives
• File hosting services offer differentiated services for free and premium users
[Figure: File Hosting Traffic (% of Internet Traffic Volume), as reported by ipoque 07 (Germany), ipoque 07 (Mid East), Sandvine 09 (Global), ipoque 09 (Global), Maier et al. IMC 09 (European ISP), and Antoniades et al. IMC 09 (Greek univ.). Annotations: Labovitz et al. SIGCOMM 10, P2P traffic (71); Labovitz et al. SIGCOMM 10, file hosting traffic.]
Growth between May/09 and May/10

Site          Users (%)   Visits (%)
RapidShare    61          79
Megaupload    128         203
YouTube       28          68
Facebook      43          69
File Hosting Ecosystem
• Present a comprehensive longitudinal characterization study of file hosting services
  - Data collected from a campus network over 1 year
• Perform multi-level analysis of usage behaviour, infrastructure, content characteristics, and user-perceived performance
• Analyze in detail the Top-5 services by traffic
• HTTP transactions trace collected from a large campus network with 33k users between Jan/09 and Dec/09
• Used Bro's parsing capabilities to summarize HTTP request-response pairs in real time
  - Application layer (HTTP headers) and transport layer (bytes transferred)
• User-identifiable information such as IP and cookies not stored
• Aggregated the traces using the HTTP Host header for the Top-5 services
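
The aggregation by HTTP Host header can be sketched as follows. This is a minimal illustration, not the authors' tooling: the domain-to-service mapping, the function names, and the (host, bytes) record layout are all assumptions, since the slides do not list the exact Host values that were matched.

```python
from collections import Counter

# Hypothetical mapping from domain to service name (an assumption; the
# slides do not give the Host values that were actually matched).
SERVICE_DOMAINS = {
    "rapidshare.com": "RapidShare",
    "megaupload.com": "Megaupload",
    "zshare.net": "zSHARE",
    "mediafire.com": "MediaFire",
    "hotfile.com": "Hotfile",
}


def service_for_host(host):
    """Return the service name for an HTTP Host header value, or None."""
    # Normalize: lowercase, drop an optional ":port" and a trailing dot.
    host = host.strip().lower().split(":")[0].rstrip(".")
    for domain, service in SERVICE_DOMAINS.items():
        # Match the domain itself or any subdomain of it.
        if host == domain or host.endswith("." + domain):
            return service
    return None


def aggregate_by_service(transactions):
    """Count transactions and sum bytes per service.

    transactions: iterable of (host, bytes_transferred) tuples
    (a hypothetical record layout).
    """
    counts = Counter()
    volume = Counter()
    for host, nbytes in transactions:
        service = service_for_host(host)
        if service is not None:
            counts[service] += 1
            volume[service] += nbytes
    return counts, volume
```

Matching on a domain suffix rather than a substring keeps unrelated hosts whose names merely contain a service name out of the aggregate.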
Data Analysis (HTTP Transactions)
• Develop signatures to distinguish free and premium users by leveraging clickstreams
• Example shows how wait time is calculated
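
One way a wait time could be derived from a clickstream is sketched below. The slide's own example did not survive extraction, so this is an assumption rather than the authors' definition: wait time is taken here as the gap between a request for a download landing page and the start of the following file transfer. The event labels "page" and "file" and the function name are hypothetical.

```python
def wait_times(clickstream):
    """Return the wait time (seconds) before each file download.

    clickstream: list of (timestamp_seconds, kind) tuples sorted by time,
    where kind is "page" (landing page request) or "file" (start of a
    file transfer). Both labels are hypothetical.
    """
    waits = []
    last_page = None
    for ts, kind in clickstream:
        if kind == "page":
            # Remember the most recent landing page request.
            last_page = ts
        elif kind == "file" and last_page is not None:
            waits.append(ts - last_page)
            # Each page request is paired with at most one download.
            last_page = None
    return waits
```

A download with no preceding page request is skipped, because no wait time can be attributed to it under this definition.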
Trace Overview

Characteristic                     Top-5 Services
Total HTTP transactions            6 million
Number of days of user activity    349 (96% of the year)
Number of files downloaded         93143
Total size of downloaded files     8 TB
Files uploaded                     393
Total size of uploaded files       393 GB
Premium file downloads             40%
Free file downloads                60%
Average file size                  106 MB
Average content size               176 MB
Campus Usage
• Considerable growth in usage for most services
• Majority of Megaupload and RapidShare files downloaded by premium users
Time Patterns
• Usage is skewed towards the evening
• Uniform usage during the week due to users in the campus residence
• Clear increase in file hosting traffic volume towards the end of the calendar year
Server Location
• Servers located using delay-based geolocation (shortest ping)
• Except for Megaupload, all other services have centralized server locations
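
The shortest-ping idea can be sketched as follows: a server is assigned the location of the landmark (a probe host with known location) that measures the lowest round-trip time to it. This is a minimal sketch under stated assumptions: the landmark names, the input layout, and the function name are hypothetical, and the slides do not describe the landmarks actually used.

```python
def locate_by_shortest_ping(rtts_ms):
    """Return (landmark, min_rtt_ms) for the landmark closest in delay.

    rtts_ms: dict mapping a landmark name to a list of RTT samples in
    milliseconds measured from that landmark to the target server
    (a hypothetical input layout). Returns None if there are no samples.
    """
    best = None
    for landmark, samples in rtts_ms.items():
        if not samples:
            continue  # no successful probes from this landmark
        # Use the minimum sample, which is least affected by queueing delay.
        rtt = min(samples)
        if best is None or rtt < best[1]:
            best = (landmark, rtt)
    return best
```

For example, with hypothetical samples `{"Frankfurt": [12.0, 9.5], "Dallas": [110.2]}` the server would be placed at the Frankfurt landmark.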
Server Infrastructure

Service      Hosting company                                         Location     /24   Host IP
RapidShare   Cogent, German Telecom, Global Crossing, TATA, Level3   Germany      48    8700
Megaupload   Carpathia, LeaseWeb                                     US/Canada/   27    938
zSHARE       Choopa                                                  US           2     97
MediaFire    Cogent, LinkRight                                       US           3     837
Hotfile      Lemuria, WZ Comm                                        US           6     178
Content Type
• For RapidShare and Hotfile, most content was archives
• Megaupload had a higher proportion of video content, while MediaFire had a large number of MP3 files
• Most zSHARE content was streaming
• Further analysis of archive content revealed that it was mostly audio/video files
Content Fragmentation and Size
• Content is highly fragmented across file hosting services (30-80%)
• Content hosted on P2P is an order of magnitude larger than on file hosting services
Content Sources
• HTTP Referer header used to categorize content sources
• Prevalence of a wide variety of sources, including portals, blogs, and forums
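
Categorizing content sources from the Referer header could look like the sketch below. The keyword rules are purely illustrative assumptions. The slides name the categories (portals, blogs, forums) but not how referring sites were assigned to them, and a real study would more likely rely on a curated site list.

```python
from urllib.parse import urlsplit

# Hypothetical keyword rules, checked in order (an assumption for
# illustration only, not the authors' classification method).
CATEGORY_RULES = [
    ("forum", ("forum", "board")),
    ("blog", ("blog",)),
    ("portal", ("portal",)),
]


def referer_category(referer):
    """Map an HTTP Referer header value to a coarse source category."""
    if not referer:
        return "none"  # header missing or empty
    host = (urlsplit(referer).hostname or "").lower()
    if not host:
        return "unknown"  # value is not an absolute URL
    for category, keywords in CATEGORY_RULES:
        if any(keyword in host for keyword in keywords):
            return category
    return "other"
```

Only the hostname is inspected, so path and query strings in the Referer value do not affect the category.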
Download Rates
• Premium download rates are an order of magnitude higher than free download rates
• Free download rates are higher than P2P
• Megaupload and MediaFire offer considerably higher download rates for free users
File Availability
• Crawled 300k files from an indexing site and recorded the date the file was uploaded
• Median file availability is 4 months, higher than P2P but lower than video sharing sites such as YouTube
• Presented the largest characterization study of the file hosting ecosystem
• Highlighted salient features of the ecosystem and identified similarities and differences between
• Results indicate that the file hosting ecosystem is
• Results also suggest that the economic model based on advertisement and subscription is

