Sharding in MongoDB
Sharding is a method for distributing data across multiple machines. MongoDB uses sharding
to support deployments with very large data sets and high throughput operations.
Database systems with large data sets or high throughput applications can challenge the
capacity of a single server. For example, high query rates can exhaust the CPU capacity of
the server. Working set sizes larger than the system's RAM stress the I/O capacity of disk
drives.
There are two methods for addressing system growth: vertical and horizontal scaling.
Vertical Scaling involves increasing the capacity of a single server, such as using a more
powerful CPU, adding more RAM, or increasing the amount of storage space. Limitations in
available technology may restrict a single machine from being sufficiently powerful for a
given workload. Additionally, cloud-based providers have hard ceilings based on available
hardware configurations. As a result, there is a practical maximum for vertical scaling.
Horizontal Scaling involves dividing the system dataset and load over multiple servers,
adding additional servers to increase capacity as required. While the overall speed or capacity
of a single machine may not be high, each machine handles a subset of the overall workload,
potentially providing better efficiency than a single high-speed high-capacity server.
Expanding the capacity of the deployment only requires adding additional servers as needed,
which can be a lower overall cost than high-end hardware for a single machine. The trade-off
is increased complexity in infrastructure and maintenance for the deployment.
Sharded Cluster
A MongoDB sharded cluster consists of the following components:
shard: Each shard contains a subset of the sharded data. Each shard can be deployed
as a replica set.
mongos: The mongos acts as a query router, providing an interface between client
applications and the sharded cluster. Starting in MongoDB 4.4, mongos can support
hedged reads to minimize latencies.
config servers: Config servers store metadata and configuration settings for the
cluster.
The following graphic describes the interaction of components within a sharded cluster:
MongoDB shards data at the collection level, distributing the collection data across the shards
in the cluster.
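The role of the mongos router and the config server metadata can be sketched as follows. This is a simplified illustration in plain Python, not MongoDB's implementation: the chunk table, the shard names, and the `route` function are hypothetical, and the sketch assumes a single numeric shard key split into ranges.

```python
import bisect

# Hypothetical chunk metadata, as a config server might hold it.
# Each entry is (inclusive lower bound of a shard-key range, shard name).
# Entries are sorted by lower bound; a range runs up to the next entry's bound.
CHUNKS = [
    (float("-inf"), "shardA"),
    (1000, "shardB"),
    (5000, "shardC"),
]


def route(shard_key_value):
    """Return the name of the shard whose range contains shard_key_value."""
    bounds = [lower for lower, _ in CHUNKS]
    # bisect_right finds the first bound greater than the value;
    # the chunk just before that position owns the value.
    index = bisect.bisect_right(bounds, shard_key_value) - 1
    return CHUNKS[index][1]


print(route(42))     # shardA
print(route(1000))   # shardB
print(route(99999))  # shardC
```

The client only talks to the router; the router looks up the owning shard in the metadata and forwards the operation there.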
Shard Keys
MongoDB uses the shard key to distribute the collection's documents across shards. The
shard key consists of a field or multiple fields in the documents.
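A minimal sketch of how a shard key made of multiple fields could decide document placement. The field names (`country`, `user_id`), the shard names, and the hash-then-modulo placement are assumptions made for illustration; MongoDB's own hashed sharding assigns ranges of hash values to chunks rather than taking a simple modulo.

```python
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2"]      # hypothetical shard names
SHARD_KEY_FIELDS = ("country", "user_id")    # hypothetical compound shard key


def shard_key(document):
    """Extract the shard key values from a document, in field order."""
    return tuple(document[field] for field in SHARD_KEY_FIELDS)


def shard_for(document):
    """Pick a shard by hashing the shard key (illustrative scheme only)."""
    key_bytes = repr(shard_key(document)).encode("utf-8")
    digest = hashlib.sha256(key_bytes).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


documents = [
    {"country": "IN", "user_id": i, "name": f"user{i}"} for i in range(300)
]

# Count how many documents land on each shard.
placement = Counter(shard_for(doc) for doc in documents)
print(placement)
```

Only the shard key fields influence placement: two documents with the same `country` and `user_id` always land on the same shard, whatever their other fields contain.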
Book: Next Generation Databases: NoSQL, NewSQL, and Big Data, by Guy Harrison
Note: When the primary (master) server goes down, a secondary server becomes the new primary.
The record of operations kept by the master is called the oplog, short for operation log. The
oplog is stored in a special database called local, in the oplog.$main collection. Each
document in the oplog represents a single operation performed on the master server.
The oplog (operations log) is a special capped collection that keeps a rolling record of all operations
that modify the data stored in your databases.
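The "rolling record" behaviour of a capped collection can be illustrated with a fixed-size queue. This is a sketch, not MongoDB code: the `Oplog` class and its methods are hypothetical, and the entry fields (`ts`, `op`, `ns`, `o`) are only loosely modelled on oplog documents.

```python
from collections import deque


class Oplog:
    """A toy operation log that keeps only the most recent entries."""

    def __init__(self, max_entries):
        # A deque with maxlen silently drops the oldest entry when full,
        # which mimics the rolling nature of a capped collection.
        self.entries = deque(maxlen=max_entries)
        self.next_ts = 1

    def record(self, op, namespace, payload):
        """Append one write operation and return the stored entry."""
        entry = {"ts": self.next_ts, "op": op, "ns": namespace, "o": payload}
        self.entries.append(entry)
        self.next_ts += 1
        return entry

    def since(self, ts):
        """Return the retained entries newer than the given timestamp."""
        return [entry for entry in self.entries if entry["ts"] > ts]


oplog = Oplog(max_entries=3)
for i in range(5):
    oplog.record("i", "shop.orders", {"_id": i})

print([entry["ts"] for entry in oplog.entries])  # [3, 4, 5]
```

A member that last applied operation 1 can no longer catch up from this log, because operation 2 has already rolled off; this is why the size of the log matters for members that fall far behind.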
In a replica set, failover and primary election depend on the state and configuration of
the members (master and slaves), as follows:
1. When the primary server goes down, a secondary server becomes the new primary (master).
2. If the primary goes down, the highest-priority servers compare how up-to-date they are.
3. Among the secondaries, the highest-priority server that is most up-to-date becomes the
new primary.
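The selection rule in the steps above can be sketched as a small function. This is a deliberate simplification under stated assumptions: the `Member` class, its fields, and `elect_primary` are hypothetical, and a real election also involves voting among members, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    priority: int        # higher value is preferred; 0 means never primary
    last_applied: int    # position of the last operation applied from the oplog
    healthy: bool = True


def elect_primary(members):
    """Pick the healthy, highest-priority member that is most up-to-date."""
    candidates = [m for m in members if m.healthy and m.priority > 0]
    if not candidates:
        return None
    top_priority = max(m.priority for m in candidates)
    highest = [m for m in candidates if m.priority == top_priority]
    # Among the highest-priority members, prefer the most recent data.
    return max(highest, key=lambda m: m.last_applied)


members = [
    Member("A", priority=2, last_applied=120, healthy=False),  # failed primary
    Member("B", priority=1, last_applied=90),
    Member("C", priority=1, last_applied=100),
    Member("D", priority=0, last_applied=120),                 # not electable
]

new_primary = elect_primary(members)
print(new_primary.name)  # C
```

B and C share the highest priority among the electable members, so the tie is broken by how up-to-date each one is.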
One caveat of using slaves to scale reads in MongoDB is that replication is asynchronous.
This means that when data is inserted or updated on the master, the data on the slave will
be momentarily out of date. This is important to consider if you are serving some requests
using queries to slaves.
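The effect of asynchronous replication on reads can be shown with a toy model. The `Master` and `Slave` classes below are hypothetical and do not use any MongoDB API; the explicit `sync` call stands in for the background replication that a real deployment performs on its own schedule.

```python
class Master:
    """Accepts writes and keeps a log of them for slaves to replay."""

    def __init__(self):
        self.data = {}
        self.oplog = []

    def write(self, key, value):
        self.data[key] = value
        self.oplog.append((key, value))


class Slave:
    """Applies the master's logged operations only when sync() runs."""

    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # number of oplog entries already applied

    def sync(self):
        for key, value in self.master.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.oplog)


master = Master()
slave = Slave(master)

master.write("stock", 10)
slave.sync()

master.write("stock", 9)      # the slave has not replicated this yet
stale = slave.data["stock"]   # read from the slave returns the old value

slave.sync()
fresh = slave.data["stock"]   # after replication the slave has caught up

print(stale, fresh)  # 10 9
```

A request served from the slave between the second write and the next sync sees the old value, which is the window the paragraph above warns about.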
Note: Be sure never to write to any database on the slave that is being replicated from
the master. The slave will not revert any such writes in order to properly mirror the
master. The slave should also not have any of the databases that are being replicated
when it first starts up. If it does, those databases will never be fully synced; they
will only be updated with new operations.
Books
Guy Harrison, “Next Generation Databases: NoSQL, NewSQL, and Big Data”, Apress, 1st
Edition, 2015
Daniel G. McCreary and Ann M. Kelly, “Making Sense of NoSQL”, Manning Publications,