3. Who are we?
✓ Largest internet search services in China
✓ Various products, solutions & services
✓ NASDAQ: BIDU
  – Market Cap: 64B
  – Revenue: 10B
  – Quarterly Growth: 33.10%
4. Story between 2 “Giants” + Who am I?
✓ Senior NoSQL Developer
✓ Owner of various MongoDB projects
✓ In charge of the LARGEST MongoDB cluster in CHINA
6. Small Step → Big Surprise
• Started with Baidu Address Book
  ✓ Small project
  ✓ Various sources
  ✓ Flexible schema
• More than 300 million users
7. Success + Confidence = More Projects
• Message & Multimedia Message Projects
• Netdisk picture metadata
• Facial Recognition System
• User Operation Log System
• Baidu Cloud
• Baidu Post Bar
• …
✓ Over 100 businesses
✓ Drive metadata > 200B
✓ PB level
8. Big MongoDB Cluster
• Consolidated entry point
• All nodes use SSD + RAID 0
• Most replica sets: 1 Primary, 2 Secondaries, 2 Arbiters
• Some replica sets: 1 Primary, 2 Secondaries, 1 Arbiter
[Diagram: a REST MongoDB service API in front of several standard MongoDB clusters; each cluster has mongos routers, config servers, and shards made of primary (P), secondary (S) and arbiter (A) members]
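The two replica-set layouts above differ in how many member failures they survive. A minimal sketch of the majority arithmetic, assuming every member (arbiters included) has the default single vote; the function name `election_stats` is hypothetical:

```python
def election_stats(primaries: int, secondaries: int, arbiters: int) -> dict:
    """Voting arithmetic for a replica set in which every member has one vote."""
    voters = primaries + secondaries + arbiters
    majority = voters // 2 + 1  # votes needed to elect a primary
    return {
        "voters": voters,
        "majority": majority,
        # members that can be down while a primary can still be elected
        "tolerated_failures": voters - majority,
        # arbiters vote but hold no data
        "data_bearing": primaries + secondaries,
    }


for layout in [(1, 2, 2), (1, 2, 1)]:
    print(layout, election_stats(*layout))
```

Under this assumption the 1 + 2 + 1 layout tolerates only one failure, the same as a plain three-member set, which is why MongoDB's documentation recommends an odd number of voting members.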
10. Throughput!!!
• All ran well, BUT when WRITES > 10,000 QPS:
Problem: queries slow down
• Writes time out
• mongod memory usage increases
• Reads are impacted
11. Simple way is the BEST!
Root cause: cache replacement
• In 3.0, cache replacement does not work efficiently
Solution
• Pilot an upgrade to 3.2
12. Replication makes this possible
Problem: online index creation
• Time-consuming
• Direct (foreground) or background build
• Write timeouts during creation
Solution: create the index on each member in turn
• Secondaries first, primary last
• Oplog time window
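The "in turn" procedure can be sketched as a plan: order the data-bearing members secondaries first and primary last, and check that each build fits inside the oplog window so the member can catch up afterward. This is a minimal sketch; the `Member` type, the function names and the 0.5 safety factor are hypothetical, and it only plans the order rather than talking to MongoDB.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    state: str  # "PRIMARY", "SECONDARY" or "ARBITER"


def rolling_index_order(members: list[Member]) -> list[str]:
    """Secondaries first, primary last; arbiters hold no data and are skipped."""
    secondaries = [m.name for m in members if m.state == "SECONDARY"]
    primaries = [m.name for m in members if m.state == "PRIMARY"]
    return secondaries + primaries


def fits_oplog_window(build_hours: float, oplog_window_hours: float,
                      safety: float = 0.5) -> bool:
    """A member taken out for the build must still find its last applied entry
    in the oplog when it rejoins. The 0.5 safety factor is an assumption."""
    return build_hours <= oplog_window_hours * safety


members = [Member("p1", "PRIMARY"), Member("s1", "SECONDARY"),
           Member("s2", "SECONDARY"), Member("a1", "ARBITER")]
for name in rolling_index_order(members):
    # In a real rollout the primary is stepped down before its turn.
    print("build index on", name)
```

Building on one member at a time keeps the other members serving traffic, so the write timeouts of a cluster-wide build are confined to the member being indexed.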
13. Big Issue
Problem: queries slow!!!
• Data increases rapidly → clusters grow accordingly
• Largest cluster = 160 shards, 2T each
Why?
• The MongoDB balancer uses a single thread to move data
• Cons & Pros
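To see why a single-threaded balancer hurts at this scale, here is a back-of-the-envelope estimate for filling one newly added shard. The 64 MB chunk size (MongoDB's long-standing default) is assumed, "2T" is treated as 2 TiB, and the 30 seconds per chunk migration is a hypothetical figure, not a measurement from this cluster.

```python
import math

MB_PER_TB = 1024 * 1024  # treat "2T" as 2 TiB for this estimate


def chunks_for_new_shard(shards: int, tb_per_shard: int, chunk_mb: int = 64) -> int:
    """Chunks that must move so a new, empty shard holds an even share."""
    total_mb = shards * tb_per_shard * MB_PER_TB
    fair_share_mb = total_mb / (shards + 1)
    return math.ceil(fair_share_mb / chunk_mb)


def serial_hours(chunks: int, seconds_per_chunk: float) -> float:
    """One migration at a time: total time is simply chunks x per-chunk time."""
    return chunks * seconds_per_chunk / 3600


chunks = chunks_for_new_shard(shards=160, tb_per_shard=2)
hours = serial_hours(chunks, seconds_per_chunk=30)  # hypothetical per-chunk time
print(chunks, "chunks,", round(hours, 1), "hours of continuous balancing")
```

Because the balancer moves one chunk at a time, the only levers are fewer or smaller moves, or running migrations in parallel.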
14. Mitigation
• Reduced the balancer window from 24 to 6 hours so that it ran only in off-peak hours
• Worked well for a period of time, BUT when more …
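A balancer window is set through the `activeWindow` field of the `balancer` document in the `config.settings` collection. The sketch below builds that update and checks whether a given time falls inside a window that may wrap past midnight; the 01:00 to 07:00 hours are hypothetical, since the slide does not say which off-peak hours were used, and `in_window` is a local illustration rather than MongoDB's own check.

```python
from datetime import time


def balancer_window_update(start: str, stop: str) -> tuple[dict, dict]:
    """Filter and update document for config.settings; times are "HH:MM" strings."""
    return ({"_id": "balancer"},
            {"$set": {"activeWindow": {"start": start, "stop": stop}}})


def in_window(now: time, start: time, stop: time) -> bool:
    """True if `now` is inside [start, stop), handling windows that cross midnight."""
    if start <= stop:
        return start <= now < stop
    return now >= start or now < stop


filt, update = balancer_window_update("01:00", "07:00")  # hypothetical off-peak hours
# With PyMongo this would be applied as:
#   client.config.settings.update_one(filt, update, upsert=True)
print(update)
print(in_window(time(3, 0), time(1, 0), time(7, 0)))
```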
Solution
• Shard key: uid or hash?
• Pre-allocating chunks
• Balancer or oplog?
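The uid-versus-hash question can be illustrated with new user IDs that keep growing. With range sharding on `uid`, every new insert lands in the last chunk, so one shard takes all the writes and the balancer has to keep moving data; with a hashed key over pre-allocated chunks, inserts spread out from the start. This is a simplified model, not MongoDB's implementation: `illustrative_hash` uses MD5 as a stand-in and is not claimed to match MongoDB's hashed-index values, and the 8 chunks and ID ranges are hypothetical.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def illustrative_hash(uid: int) -> int:
    """Stand-in 64-bit hash: first 8 bytes of MD5, read as a signed integer."""
    digest = hashlib.md5(str(uid).encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def presplit_bounds(num_chunks: int) -> list[int]:
    """Pre-allocate chunks by splitting the signed 64-bit space evenly.
    Returns the lower bounds of chunks 1..num_chunks-1 (chunk 0 starts at -2**63)."""
    lo, span = -2**63, 2**64
    return [lo + span * i // num_chunks for i in range(1, num_chunks)]


def chunk_for(key: int, bounds: list[int]) -> int:
    """Chunk i covers [bounds[i-1], bounds[i]); bisect_right finds i."""
    return bisect_right(bounds, key)


NUM_CHUNKS = 8
new_uids = range(1_000_000, 1_001_000)  # monotonically increasing new users

# Range sharding on uid: chunks were split over the existing 0..999,999 users.
uid_bounds = [125_000 * i for i in range(1, NUM_CHUNKS)]
range_counts = Counter(chunk_for(u, uid_bounds) for u in new_uids)

# Hashed sharding: the hash space is pre-split, so inserts scatter across chunks.
hash_bounds = presplit_bounds(NUM_CHUNKS)
hash_counts = Counter(chunk_for(illustrative_hash(u), hash_bounds) for u in new_uids)

print("range on uid:", dict(range_counts))
print("hashed uid:  ", dict(sorted(hash_counts.items())))
```

The trade-off is that a hashed key turns range queries on `uid` into scatter-gather queries across shards.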
15. Native Auto Balance
Solution (diagram components: Config Server, mongos, shard1, shard2)
• Move a certain chunk to shard2
• Please receive data
• Data transferring …
• Incremental data sync
• Update chunk information
• Update chunk manager
• Update chunk cache
• Delete or not delete
17. Iteration in Detail
• Identify: identify a range to be migrated
• Record: take a note of the current oplog time
• Query: send a query to the source shard, and iterate over the returned cursor to write matching documents to the destination shard
• Scan & Match: scan the oplog on the source shard for events recorded since the timestamp noted at the start of this pass; matching events are then written to the destination shard
• Apply: when the last oplog event has been applied, the pass has completed and the worker process can be stopped
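The five steps of one pass can be modeled with in-memory shards. This is a minimal sketch under simplifying assumptions: `Shard`, `migrate_pass` and the integer oplog clock are hypothetical, documents are keyed by an integer `_id` range, and concurrent writes are simulated between the copy and the oplog scan rather than truly in parallel.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Shard:
    docs: dict = field(default_factory=dict)   # _id -> document
    oplog: list = field(default_factory=list)  # (ts, op, _id, doc)
    clock: int = 0                             # hypothetical oplog timestamp

    def write(self, _id: int, doc: dict) -> None:
        self.clock += 1
        self.docs[_id] = doc
        self.oplog.append((self.clock, "u", _id, doc))

    def delete(self, _id: int) -> None:
        self.clock += 1
        self.docs.pop(_id, None)
        self.oplog.append((self.clock, "d", _id, None))


def migrate_pass(src: Shard, dst: Shard, lo: int, hi: int,
                 during_copy: Iterable[Callable[[Shard], None]] = ()) -> int:
    """Copy the range [lo, hi) from src to dst, then replay newer oplog events."""
    # Identify: the range [lo, hi). Record: note the current oplog time.
    start_ts = src.clock
    # Query: iterate over the source documents and copy the matching ones.
    for _id, doc in list(src.docs.items()):
        if lo <= _id < hi:
            dst.docs[_id] = dict(doc)
    # Simulate writes that reached the source while the copy was running.
    for write in during_copy:
        write(src)
    # Scan & Match: oplog events after start_ts that touch the range.
    for ts, op, _id, doc in src.oplog:
        if ts > start_ts and lo <= _id < hi:
            # Apply: replay the event on the destination.
            if op == "u":
                dst.docs[_id] = dict(doc)
            else:
                dst.docs.pop(_id, None)
    return src.clock  # timestamp of the last event considered


src, dst = Shard(), Shard()
for i in range(10):
    src.write(i, {"v": i})
last_ts = migrate_pass(src, dst, 2, 6, during_copy=[
    lambda s: s.write(3, {"v": 300}),
    lambda s: s.delete(4),
    lambda s: s.write(12, {"v": 12}),  # outside the range, ignored
])
print(dst.docs)
```

Replaying the oplog after the bulk copy is what lets the source keep accepting writes during the move; the destination converges to the source's current state for that range.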
19. Quick Summary
• Early adoption makes us
  – 100+ diverse apps, and more are coming
  – $$$ cost savings with awesome scalability
• Continuous improvements = Confidence
  – Add LSM to WT (WiredTiger) to get better insert performance
  – Multi-master as an option
20. Key Takeaway
• Baidu = Big system + Big data + Big challenge
  – We need a strong & scalable DB architecture; MongoDB is fantastic!
• Upgrading to 3.x is a MUST
  – WT engine, document validation, …
• Innovation & automation via customized scripts
MongoDB CAN manage our “BIG DATA”: 600 nodes, 160 shards, 200B documents
21. Next Steps
MongoDB is enhancing balancer performance:
• Working with MongoDB as the beta tester for the new feature
• Enabling parallel chunk migration
• Removing throttling by default (for WiredTiger)
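Parallel chunk migration arrived in MongoDB 3.4, where a shard takes part in at most one migration at a time, so a cluster of n shards can run at most n/2 (rounded down) migrations at once, or 80 for a 160-shard cluster. A minimal scheduling sketch of that constraint; the greedy round-by-round planner and the shard names are hypothetical, not the balancer's actual algorithm.

```python
def plan_rounds(moves: list[tuple[str, str]]) -> list[list[tuple[str, str]]]:
    """Group (donor, recipient) moves into rounds where no shard appears twice."""
    pending = list(moves)
    rounds = []
    while pending:
        busy: set[str] = set()
        batch = []
        for donor, recipient in pending:
            # A shard already in this round cannot join another migration.
            if donor not in busy and recipient not in busy:
                batch.append((donor, recipient))
                busy.update((donor, recipient))
        for move in batch:
            pending.remove(move)
        rounds.append(batch)
    return rounds


def max_parallel(num_shards: int) -> int:
    """Upper bound on simultaneous migrations when each shard joins at most one."""
    return num_shards // 2


moves = [("s1", "s5"), ("s2", "s6"), ("s1", "s6"), ("s3", "s7")]
for i, batch in enumerate(plan_rounds(moves), start=1):
    print("round", i, batch)
print("upper bound for 160 shards:", max_parallel(160))
```

Here four moves finish in two rounds instead of four serial steps; the gain grows with the number of independent donor and recipient pairs.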