MongoDB@Baidu
Xiao Beibei
Project Owner & Senior Developer
Baidu

Who are we?
✓ Largest internet search services in China
✓ Various products, solutions & services
✓ NASDAQ: BIDU
Market Cap: 64B
Revenue: 10B
Qtrly Growth: 33.10%

Story between 2 “Giants”
Who am I?
✓ Senior NoSQL Developer
✓ Owner of various MongoDB projects
✓ In charge of the LARGEST MongoDB cluster in CHINA

Where does MongoDB fit?

Small Step → Big Surprise
• Started from Baidu Address Book
✓ Small project
✓ Various sources
✓ Flexible schema
• More than 300 million users

Success + Confidence = More Projects
• Message & Multimedia Message Projects
• Netdisk picture metadata
• Facial Recognition System
• User Operation Log System
• Baidu Cloud
• Baidu Post Bar
… …
✓ Over 100 businesses
✓ Drive metadata > 200B
✓ PB level

Big MongoDB Cluster
• Consolidate the entry point
• All use SSD + RAID 0
• Most: 1 Master, 2 Secondaries, 2 Arbiters
• Some: 1 Master, 2 Secondaries, 1 Arbiter
[Diagram: a REST MongoDB service API in front of several standard MongoDB clusters; each cluster shows mongos, config, and shards made up of P, S and A members]

How do we use MongoDB?

Throughput!!!
Problem
• Everything runs well, BUT when WRITES > 10,000 QPS:
– Queries slow down
– Writes time out
– mongod memory usage increases
– Reads are impacted, queries slow down

Simple way is the BEST!
Root Cause
• Cache replacement: in 3.0, cache replacement is not very efficient
Solution
• Try a pilot upgrade to 3.2

Replication makes this possible
Problem
Online index creation issue
• Time-consuming
• Foreground (direct) or background
• Writes time out during index creation
Solution
• Create the index on one member at a time
• Secondaries first and primary last
• Oplog time
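
The ordering above (secondaries first, primary last) can be sketched as a small planning function. This is a minimal illustration, not the speaker's tooling: the `Member` type, the host names, and the five-member layout are hypothetical, and the slides do not say how each member was taken out of rotation while its index was built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    host: str  # hypothetical host name, for illustration only
    role: str  # "PRIMARY", "SECONDARY" or "ARBITER"


def rolling_index_order(members):
    """Return the data-bearing members in build order: secondaries, then the primary."""
    secondaries = [m for m in members if m.role == "SECONDARY"]
    primaries = [m for m in members if m.role == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    # Arbiters hold no data, so they are left out of the plan.
    return secondaries + primaries


# Assumed layout matching the "1 Master, 2 Secondary, 2 Arbiter" slide.
members = [
    Member("db1.example", "PRIMARY"),
    Member("db2.example", "SECONDARY"),
    Member("db3.example", "SECONDARY"),
    Member("arb1.example", "ARBITER"),
    Member("arb2.example", "ARBITER"),
]

plan = rolling_index_order(members)
for step, member in enumerate(plan, start=1):
    print(step, member.host, member.role)
```

Building on one member at a time keeps the others free to serve traffic, and each member has to catch up from the oplog once its build is done, which is presumably why the slide lists oplog time as a concern.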
  
Big Issue
Problem
• Queries are slow!!!
• Data increases rapidly → clusters grow accordingly
• Largest cluster = 160 shards, 2T each
Why?
• The MongoDB balancer uses a single thread to move data
• Pros & cons

Mitigation
• Reduced the balancer window from 24 to 6 hours, so that it ran in off-peak hours
• Worked well for a period of time, BUT when more …

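As a rough illustration of the balancer window above: MongoDB stores the window as an `activeWindow` field, with `start` and `stop` given as "HH:MM" strings, on the balancer document in `config.settings`. The sketch below only mirrors that document shape and checks whether a given time falls inside the window. The 00:00 to 06:00 times are an assumption (the slides give only the 6-hour length), and the check is not the server's own implementation.

```python
from datetime import time

# Shape of the balancer document in config.settings; the times are assumed.
balancer_settings = {
    "_id": "balancer",
    "activeWindow": {"start": "00:00", "stop": "06:00"},
}


def parse_hhmm(text):
    """Convert an "HH:MM" string into a datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_active_window(now, window):
    """Return True if `now` lies inside the window, including windows that wrap midnight."""
    start = parse_hhmm(window["start"])
    stop = parse_hhmm(window["stop"])
    if start <= stop:
        return start <= now < stop
    # A window such as 23:00 to 05:00 spans midnight.
    return now >= start or now < stop


print(in_active_window(time(3, 0), balancer_settings["activeWindow"]))
print(in_active_window(time(12, 0), balancer_settings["activeWindow"]))
```
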
Solution
• Shard key: uid or hash?
• Pre-allocating chunks
• Balancer or oplog?

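To make the pre-allocation idea concrete: a hashed shard key maps values onto signed 64-bit integers, so chunk boundaries can be chosen up front by cutting that range into equal slices. The function below is a sketch under that assumption. The name `hashed_split_points` and the choice of one chunk per shard are illustrative and do not come from the slides.

```python
def hashed_split_points(num_chunks):
    """Return the num_chunks - 1 boundaries that cut the signed 64-bit range evenly."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be at least 1")
    low = -(2 ** 63)   # smallest signed 64-bit value
    span = 2 ** 64     # number of distinct signed 64-bit values
    return [low + (span * i) // num_chunks for i in range(1, num_chunks)]


# Assumed example: one initial chunk for each of the 160 shards.
points = hashed_split_points(160)
print(len(points), points[0], points[-1])
```

With chunks laid out before the data arrives, new writes spread across shards from the start, which leaves less for the single-threaded balancer to move later.
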
Solution
Native Auto Balance
[Diagram: Mongos, Config Server, shard1, shard2. Labels: Move certain chunk to shard2; Please receive data; Data Transferring …; Incremental data sync; Update Chunk Information; Update Chunk Manager (on both shards); Update Chunk Cache; Delete or Not delete]

Solution
Modified Balancer
[Diagram: Mongos, Config Server, shard1, shard2. Labels: Data Transferring …; Update Chunk Manager (on both shards); Update Chunk Information; Update when WriteBack]

Iteration in Detail
• Identify: Identify a range to be migrated
• Record: Take a note of the current oplog time
• Query: Send a query to the source shard, and iterate over the returned cursor to write matching documents to the destination shard
• Scan & Match: Scan the oplog from the source shard for events recorded from the timestamp recorded at the start of this pass; matching events are then written to the destination shard
• Apply: When the last oplog event has been applied, the pass has completed and the worker process can be stopped

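The five steps above can be walked through with an in-memory simulation. This is only a sketch of the idea, not Baidu's worker: plain dictionaries stand in for the shards, a list of `(timestamp, op, uid, doc)` tuples stands in for the oplog, and the `run_pass` name and the "upsert"/"delete" operation labels are made up for the example.

```python
def run_pass(snapshot, oplog, dest, lo, hi, start_ts):
    """Migrate the uid range [lo, hi) in one pass and return the number of oplog events applied."""
    def in_range(uid):
        return lo <= uid < hi

    # Query: copy every matching document from the source snapshot to the destination.
    for uid, doc in snapshot.items():
        if in_range(uid):
            dest[uid] = dict(doc)

    # Scan & Match: replay source oplog events recorded at or after start_ts.
    applied = 0
    for ts, op, uid, doc in sorted(oplog, key=lambda event: event[0]):
        if ts < start_ts or not in_range(uid):
            continue
        if op == "delete":
            dest.pop(uid, None)
        else:
            dest[uid] = dict(doc)
        applied += 1

    # Apply: once the last event is applied, the pass is complete.
    return applied


# Identify: migrate uids in [0, 10). Record: the pass starts at oplog time 100.
snapshot = {1: {"uid": 1, "name": "a"}, 5: {"uid": 5, "name": "b"}, 12: {"uid": 12, "name": "c"}}
oplog = [
    (99, "upsert", 1, {"uid": 1, "name": "old"}),   # before the pass started, ignored
    (101, "upsert", 5, {"uid": 5, "name": "b2"}),   # written while the copy was running
    (102, "delete", 1, None),
    (103, "upsert", 12, {"uid": 12, "name": "c2"}), # outside the range, ignored
]
dest = {}
applied = run_pass(snapshot, oplog, dest, lo=0, hi=10, start_ts=100)
print(applied, dest)
```

Replaying from the recorded timestamp can re-apply a write that the copy already picked up. That is harmless here because each replayed event simply overwrites or removes the document.
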
Summary

Quick Summary
• Early adoption makes us
• 100+ diverse apps, and more are coming
• $$$ Cost savings with awesome scalability
• Continuous improvements = Confidence
• Add LSM to WT (WiredTiger) for better insert performance
• Multi-master as an option

Key Takeaways
• Baidu = Big system + Big data + Big challenge
– We need a strong & scalable DB architecture; MongoDB is fantastic!
• Upgrading to 3.x is a MUST
– WT engine, document validation, …
• Innovation & automation via customized scripts
MongoDB CAN manage our “BIG DATA”
• 600 nodes
• 160 shards
• 200B documents

Next Steps
• MongoDB is enhancing balancer performance
• Working with MongoDB as the beta tester for the new feature
• Enabling parallel chunk migration
• Remove throttling by default (for WiredTiger)

+
Questions?
