MongoDB Architecture Guide
The foundational concepts that underpin the
architecture of MongoDB
Introduction
Data and software are at the heart of every business. But for many organizations, realizing the full potential of the digital economy remains a significant challenge. Since the inception of MongoDB, we’ve understood that the biggest challenges developers face are related to working with data:

° Demands for higher productivity and faster time to market are being held back by rigid relational data models that are mismatched to modern code and impose complex interdependencies among engineering teams.
° Organizations are unable to work with, or extract insights from, the massive and rapidly growing amount of data generated by modern applications, including time series, geospatial, and polymorphic data.
° Monolithic and fragile legacy databases are inhibiting the wholesale shift to distributed systems and cloud computing that deliver the resilience and scale demanded by digital business and support new regulatory demands for data privacy.
° Previously separate transactional, analytical, search, and mobile workloads are converging to create rich data-driven applications and customer experiences. However, each workload has traditionally been powered by its own database, creating duplicated data silos stitched together with fragile ETL pipelines, accessed by different developer APIs.

To address some of these challenges, non-tabular (sometimes called NoSQL or non-relational) databases have been rapidly adopted over the past decade. But many of these NoSQL databases are simply Band-Aids, offering a niche set of functionality.

The problem is that typical NoSQL databases do one or two things well. They might offer more flexibility in the data model than traditional databases or scale out easily. But to do this, they discard the most valuable features of relational databases. They often sacrifice data integrity and the ability to work with data in the ways needed to build rich and valuable applications, whether these are new digital touchpoints with an organization’s customers, or modernized core back-end business processes.
“The most beautiful part is the data model. Everything is a natural JSON document. So for the developers, it is easy, really easy, for them to work quickly. They’re spending time on building business value rather than data modeling.”
- Filip Dadgar, IT manager, Toyota Material Handling Europe

{
  "_id": ObjectId("5ad88534e3632e1a35a58d00"),
  "name": {
    "first": "John",
    "last": "Doe" },
  "address": [
    { "location": "work",
      "address": {
        "street": "16 Hatfields",
        "city": "London",
        "postal_code": "SE1 8DJ"},
      "geo": { "type": "Point", "coord": [
        51.5065752,-0.109081]}},
  ],
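To illustrate how a document like the example above maps onto native data structures in application code, here is a minimal sketch in Python. It uses plain dicts and lists rather than a MongoDB driver. As assumptions for illustration, the `_id` is held as a hex string, only the fields visible in the example are included, and the helper name `city_for` is hypothetical.

```python
# A plain-Python stand-in for the example document (no driver required).
# Assumption: the ObjectId is represented here by its hex string.
person = {
    "_id": "5ad88534e3632e1a35a58d00",
    "name": {"first": "John", "last": "Doe"},
    "address": [
        {
            "location": "work",
            "address": {
                "street": "16 Hatfields",
                "city": "London",
                "postal_code": "SE1 8DJ",
            },
            "geo": {"type": "Point", "coord": [51.5065752, -0.109081]},
        }
    ],
}


def city_for(doc, location):
    """Return the city of the first address entry with the given label, or None."""
    for entry in doc.get("address", []):
        if entry.get("location") == location:
            return entry["address"]["city"]
    return None


print(city_for(person, "work"))  # London
```

The nested name and the array of addresses are read with ordinary dictionary and list access, with no join or mapping layer in between.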
Distributed Architecture: Scalable, Resilient, and Mission Critical
Through replica sets and native sharding, MongoDB enables you to scale out your applications with always-on availability. You can distribute data for low-latency user access while enforcing data sovereignty controls for data privacy regulations such as GDPR.

Availability and data protection with replica sets

MongoDB replica sets enable you to create up to 50 copies of your data, which can be provisioned across separate nodes, data centers, and geographic regions.

Replica sets are predominantly designed for resilience. If a primary node suffers an outage or is taken down for maintenance, the MongoDB cluster will automatically elect a replacement in a few seconds, switching over client connections and retrying any failed operations for you.

The replica set election process is controlled by sophisticated algorithms based on an extended implementation of the Raft consensus protocol. Before a secondary replica is promoted, the election algorithms evaluate a range of parameters including:

° Analysis of election identifiers, time stamps, and journal persistence to identify those replica set members that have applied the most recent updates from the primary replica
° Heartbeat and connectivity status with the majority of other replica set members
° User-defined priorities assigned to replica set members

By extending data protection, developers can configure replica sets to provide tunable, multinode durability and geographic awareness. Through MongoDB’s write concern, you can ensure write operations propagate to a majority of replicas in a cluster. With MongoDB 5.0, the default durability guarantee has been elevated to the majority (w:majority) write concern. Write success will now only be acknowledged in the application once it has been committed and persisted to disk on a majority of replicas.

Choosing the new default versus the former w:1 default allows for a stronger durability guarantee, where acknowledged data can survive replica set elections and complete node failures. The new w:majority default setting is fully tunable, so you can maintain the earlier w:1 default or any custom write concern you had previously configured.

You can also create custom write concerns that target specific members of a replica
set, deployed locally and in remote regions. This ensures writes are only acknowledged once custom policies have been fulfilled, such as writing to at least a primary and replica in one region and at least one replica in a second region. This reduces the risk of data loss in the event of a complete regional failure.

Beyond resilience, replica sets can also be used to scale read operations, intelligently routing queries to a copy of the data that is physically closest to the user. With sophisticated policies such as hedged reads, the cluster will automatically route queries to the two closest nodes (measured by ping distance), returning results from the fastest replica. This helps minimize queries waiting on a node that might otherwise be busy, reducing 95th and 99th percentile read latency. Note that hedged reads are available in sharded clusters only.
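The write concern rules described in this section come down to counting acknowledgements. The sketch below is a toy model in plain Python, not MongoDB driver code, of when a write would count as acknowledged under a majority rule and under a custom per-region policy. All names (`Member`, `majority`, `majority_acknowledged`, `region_policy_satisfied`) and the region labels are hypothetical, and it assumes every member is a voting, data-bearing node. In a real deployment these rules are expressed through the write concern setting rather than application code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Sequence


@dataclass(frozen=True)
class Member:
    name: str
    region: str


def majority(member_count: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return member_count // 2 + 1


def majority_acknowledged(acked: Iterable[str], members: Sequence[Member]) -> bool:
    """True once a strict majority of the members have acknowledged the write."""
    known = {m.name for m in members}
    return len(set(acked) & known) >= majority(len(members))


def region_policy_satisfied(
    acked: Iterable[str],
    members: Sequence[Member],
    required_per_region: Dict[str, int],
) -> bool:
    """True once every region has at least its required number of acknowledgements."""
    acked_names = set(acked)
    counts: Dict[str, int] = {}
    for m in members:
        if m.name in acked_names:
            counts[m.region] = counts.get(m.region, 0) + 1
    return all(counts.get(r, 0) >= n for r, n in required_per_region.items())


# Hypothetical five-member replica set spread over two regions.
members = [
    Member("a", "eu-west"), Member("b", "eu-west"), Member("c", "eu-west"),
    Member("d", "us-east"), Member("e", "us-east"),
]
policy = {"eu-west": 2, "us-east": 1}  # two members locally, one remotely

print(majority(len(members)))                                   # 3
print(majority_acknowledged({"a", "b"}, members))               # False
print(majority_acknowledged({"a", "b", "c"}, members))          # True
print(region_policy_satisfied({"a", "b", "c"}, members, policy))  # False
print(region_policy_satisfied({"a", "b", "d"}, members, policy))  # True
```

The last two lines show why a regional policy differs from a plain majority: three acknowledgements in one region form a majority, but would not survive the loss of that region.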
higher scalability across a more diverse set of workloads. MongoDB native sharding gives you the following options:

° Ranged sharding: Documents are partitioned across shards according to the shard key value. Documents with shard key values close to one another are likely to be co-located on the same shard. This approach is well suited for applications that need to optimize range-based queries, such as co-locating data for customers in a specific region on a specific set of shards.
° Hashed sharding: Documents are distributed according to an MD5 hash of the shard key value. This approach guarantees a uniform distribution of writes across shards, which is often optimal for ingesting streams of time series and event data.
° Zoned sharding: This allows developers to define specific rules governing data placement in a sharded cluster.

Beyond vertical and horizontal scaling, MongoDB also offers tiered scaling. When working in the cloud, the MongoDB Atlas Online Archive will automatically tier aged data out of the database and into cloud object storage. Archived data remains fully accessible with federated queries that span both object and database storage in a single connection string. This approach enables you to more economically scale data storage by moving it to a lower-cost storage tier without losing access to the data and without grappling with slow and complex ETL pipelines.
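The difference between the ranged and hashed sharding options can be sketched with a toy router in plain Python. This is an illustration under stated assumptions, not MongoDB's actual chunk-routing logic: the function names `ranged_shard` and `hashed_shard`, the split points, and the use of the first eight bytes of the MD5 digest with a modulo are all hypothetical simplifications.

```python
import bisect
import hashlib
from typing import Sequence


def ranged_shard(key: str, split_points: Sequence[str]) -> int:
    """Ranged placement: keys below the first split point go to shard 0,
    keys from each split point up to the next go to the following shard."""
    return bisect.bisect_right(split_points, key)


def hashed_shard(key: str, num_shards: int) -> int:
    """Hashed placement: derive the shard from an MD5 digest of the key.
    (Simplified: a real cluster assigns hash ranges to chunks, not a modulo.)"""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


split_points = ["h", "p"]  # three shards: [..h), [h..p), [p..]

# Neighbouring keys stay together under ranged placement.
print(ranged_shard("alice", split_points))   # 0
print(ranged_shard("albert", split_points))  # 0
print(ranged_shard("zoe", split_points))     # 2

# Under hashed placement the same keys are scattered by their digest.
for name in ("alice", "albert", "zoe"):
    print(name, hashed_shard(name, 3))
```

Ranged placement keeps adjacent key values on one shard, which suits range queries. Hashed placement trades that locality for a more even spread of writes.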
Figure 1: Serving always-on, globally distributed, write-everywhere apps with MongoDB Atlas Global Clusters
[Figure: diagram with the labels Document Model, Key Value Pairs, Relationships, Objects, Geospatial, Graphs, Search, Mobile, Security, Distributed Architecture, and Multi-Cloud]
Real-Time Analytics

Atlas Database allows you to deploy a read-only analytics node to serve more resource-intensive analytics queries. You can easily target analytics nodes by configuring the read preference, effectively ensuring that analytics queries leveraging MongoDB’s built-in aggregation pipeline never contend for database resources with your operational workloads. Analytics nodes, like all read-only nodes within a MongoDB cluster, do not participate in elections and can never be elected as the cluster primary.

Atlas Search

Atlas Search is built into MongoDB Atlas, making it easy to build fast, full-text search capabilities on top of your MongoDB data with no need to learn a different API or deploy a separate search technology. Atlas Search is built on top of Apache Lucene, the industry standard library. Search indexes run alongside the database and are automatically kept in sync. Supported search capabilities include fuzzy search, autocomplete, facets and filters, custom scoring, analyzers for more than 30 languages, and more.

Atlas Data Lake

Atlas Data Lake is an on-demand query service that enables you to analyze data in cloud object storage (Amazon S3) in place using the MongoDB Query API. There is no infrastructure to set up or manage. Atlas Data Lake automatically

The best way to run MongoDB in the cloud

Atlas Database delivers MongoDB as a pay-as-you-go service billed on an hourly basis. To deploy it, you can use a GUI or the admin API to select the public cloud provider, region, instance size, and features you need. Atlas Database provides:

° Automated database and infrastructure provisioning along with auto-scaling, so teams can get the database resources they need, when they need them, and elastically scale in response to application demands.
° Always-on security to protect data, including network isolation, fine-grained access controls, auditing, and end-to-end encryption down to the level of individual fields.
° Certifications with global standards for supporting compliance, including ISO 27001, SOC 2, and more. Atlas Database can be used for workloads subject to HIPAA, PCI-DSS, or GDPR.
° Built-in replication both within and across regions for always-on availability, even in the face of complete regional outages.
° Global Clusters for fully managed, globally distributed databases that provide low-
Getting Started
Every industry is in the midst of digital transformation. Many businesses are unable to realize the full potential of their investments because they fail to modernize their data architecture. As you build or remake your company for a digital world, speed matters, measured by how fast you build applications, scale them, and gain insights from the data they generate. These are the keys to applications that provide better customer experiences; enable deeper, data-driven insights; and make new products or business models possible. MongoDB enables you to meet the demands of modern apps with a complete application data platform that includes all the complementary services developers need.

In this guide we explored the foundational concepts that underpin the architecture of MongoDB. Other guides on topics such as performance, operations, and security best practices can be found at MongoDB.com.

You can get started now with MongoDB by:
1. Reviewing the Use Case Guidance White Paper to identify applicable use cases for MongoDB.
2. Spinning up a fully managed MongoDB cluster on the Atlas free tier or downloading MongoDB for local development.
3. Reviewing the MongoDB manuals and tutorials in our documentation.
Safe Harbor
The development, release, and timing of any features or functionality described for our
products remains at our sole discretion. This information is merely intended to outline our
general product direction, and it should not be relied on in making a purchasing decision, nor is
this a commitment, promise, or legal obligation to deliver any material, code, or functionality.
© 2021 MongoDB, Inc. MongoDB and the MongoDB leaf logo are
registered trademarks of MongoDB, Inc. Published November 2021.