SCYLLADB WHITE PAPER

SQL to NoSQL:
Architecture Differences
and Considerations
for Migration

CONTENTS

SQL VERSUS NOSQL BASICS
ARCHITECTURAL DIFFERENCES BETWEEN SQL AND NOSQL
  ECONOMIES OF SCALE
  REPLICAS OF DATA
  APPLICATION-DRIVEN USE CASES
  CONSISTENCY VERSUS AVAILABILITY
  ACID VERSUS BASE CONSISTENCY
  LIGHTWEIGHT TRANSACTIONS
QUERY LANGUAGES: SQL VERSUS CQL
  JOINS
  VALUES VERSUS OBJECTS
  SCALING CHARACTERISTICS
CONSIDERATIONS FOR SQL TO NOSQL MIGRATIONS
  DATA MODELS
  QUERY PATTERNS
  MATERIALIZED VIEWS
  SECONDARY INDEXES
  REFERENTIAL INTEGRITY
MIGRATION TO NOSQL
  HYBRID CONVERSION TO NOSQL
  DATA FORKLIFTING
  DATA VALIDATION
SQL AND NOSQL DATABASE ADMINISTRATION
  PERFORMANCE MONITORING AND TUNING
  BACKUP AND RECOVERY
  NODE REPAIR AND REPLACEMENT
SCYLLA NOSQL: SCALE-UP OF RDBMS AND HIGH AVAILABILITY OF NON-RELATIONAL

SQL and NoSQL: Understanding the Tradeoffs between
Flexibility, Scale and Cost
When and how to migrate data from SQL to NoSQL are matters of much debate. It can
certainly be a daunting task, but when your SQL systems hit architectural limits or your
cloud provider expenses skyrocket, it’s probably time to consider a move.

SQL VERSUS NOSQL BASICS

Since their invention in 1970 by Edgar Codd, relational databases have served as the default data store for almost every IT organization, large or small. Today, the most iconic and familiar relational databases include IBM DB2, Oracle Database, Microsoft SQL Server, PostgreSQL, and MySQL.

Structured Query Language, or SQL, was invented at IBM soon after the introduction of the relational database. Since its introduction, SQL has become the most widely used database language, used for querying data, data manipulation (insert, update and delete), data definition (schema creation and modification), and data access control. Though the terms refer to different technologies, ‘SQL’ and ‘RDBMS’ have become virtually interchangeable. Though some non-relational databases support SQL, the term “SQL database” generally means a relational database.

During the decades in which relational databases proliferated, data entry was largely a manual process. Times have changed. The advent of smartphones, the ‘app economy,’ and cloud computing in the late 2000s caused a sea change in the workloads, query types, and traffic patterns needed to support a global user base.

Fast forward to 2020, when people, smart devices, sensors, and machines emit continuous streams of data, such as user activity, IoT and machine-generated data, and metadata that encompasses geolocation and telemetry. As early as 2013, one researcher noted that 90% of the world’s data had been generated over the previous two years. This trend has only accelerated. In response to this torrent of data, many organizations have been reevaluating their use of traditional relational databases.

Cloud computing exposed many limitations of relational databases. RDBMSs proliferated in an age when databases were isolated islands with relatively stable user bases, running in a traditional client-server configuration. RDBMSs support arbitrary reshaping and joining of data, but performance can be variable and unpredictable. In restricted environments, such variable performance characteristics could be managed. The shift to a mobile, globally dispersed user base caught many organizations off guard.

During the same time period, consumer demands shifted radically. Today, users expect low-latency applications that deliver an extremely responsive experience, regardless of the user’s location. Apps that are slow and unresponsive contribute significantly to customer churn. Predictable performance became more important than the semantic flexibility afforded by RDBMSs.

Latency issues can be addressed by shifting data closer to the customer. To meet this need, data must be replicated across different geographic locations. Such geographical replication turned out to be a struggle for RDBMSs. While RDBMSs are not fit for distributed deployments, non-relational databases are designed specifically to support such topologies.

A number of alternative non-relational database systems have been proposed, including Google’s Bigtable (2006) and Amazon’s Dynamo (2007). The papers for these projects paved the way for Cassandra (2008) and MongoDB (2009).

[Figure: SQL vs NoSQL. SQL: Relational Database Management Systems (RDBMS), Online Analytical Processing (OLAP) Cube. NoSQL: Key-Value, Graph, Document, Column store.]

Today, a range of mature NoSQL databases are available to help organizations scale big data applications.

Yet, despite their origins in a long-forgotten technology cycle, relational SQL databases are by no means ‘legacy’ technology. Some SQL databases, notably PostgreSQL and MySQL, have experienced a recent resurgence in popularity. A new generation of NewSQL databases, notably Google Spanner and CockroachDB, leverage SQL as a query language and offer a distributed architecture similar to that of NoSQL databases yet provide full transactional support.

ARCHITECTURAL DIFFERENCES BETWEEN SQL AND NOSQL

ECONOMIES OF SCALE

Database administrators add capacity to RDBMS and NoSQL databases in very different ways. Typically, the only way to add capacity in a relational system is to add expensive hardware, faster CPUs, more RAM, and more advanced networking components. This is often referred to as ‘vertical’ scale, or ‘scaling up.’

In contrast, NoSQL databases are designed for low latency and high resilience, being built from the ground up to run across clusters of distributed nodes. This architecture is often referred to as ‘horizontal scale,’ or ‘scaling out.’ To add capacity to a NoSQL database, administrators simply add more nodes, a very simple process in modern cloud environments.

In a NoSQL cluster, nodes are easy to add and remove according to demand, providing ‘elastic’ capacity. This feature enables organizations to align their data footprint with the needs of the business while maintaining availability even in the face of seasonal demand spikes, node failures, and network outages.

The horizontal scale of NoSQL brings tradeoffs of its own. Adding commodity hardware to a cluster can be cheap in terms of software licenses and subscriptions. However, as more and more nodes are added in the pursuit of higher throughput and lower latency, operational overhead and administrative costs spike. Big clusters of small instances demand more attention and generate more alerts than small clusters of large instances.

(Notably, some next-generation NoSQL databases like Scylla are able to overcome this tradeoff, scaling out in a way that can take
advantage of the powerful hardware in modern servers and ultimately running in smaller, though still distributed, clusters of fewer nodes.)

REPLICAS OF DATA

Replicating data across multiple nodes allows databases to achieve higher levels of resilience. In the RDBMS world, it’s not trivial to replicate data across multiple instances. Traditional relational databases were not designed around native replication. Instead, they often rely on external tools to extract and update copies of datasets. These tools run batch processes that often take hours to complete. As a result, there is no way to ensure real-time synchronization of data among the copies of data.

While non-relational databases provide native support for data replication, they follow three basic models: multi-master databases, such as DynamoDB, master-slave architectures, such as MongoDB, and masterless, such as Scylla. Given their reliance on master nodes, both multi-master and master-slave architectures introduce a point of failure. When a master goes down, the process of electing a new master introduces a brief downtime. Even though the delay may be minimal, measured in milliseconds, that delay can still cause SLA violations.

A masterless architecture addresses this limitation. In these databases, data is replicated across multiple nodes, all of which are equal.

In a masterless architecture, no single node can bring down an entire cluster. A typical masterless topology involves three or more replicas for each dataset. Adopting a NoSQL database that implements a masterless architecture provides yet another layer of resilience for high-volume, low-latency applications.

[Figure: Masterless Architecture in Scylla. A cluster of equal nodes, each with shards (per core).]

APPLICATION-DRIVEN USE CASES

The rise in popularity of NoSQL databases paralleled the adoption of agile development and DevOps practices. Unlike RDBMSs, NoSQL databases encourage ‘application-first’ or API-first development patterns. Following these models, developers first consider queries that support the functionality specific to an application, rather than considering the data models and entities. This developer-friendly architecture paved the path to the success of the first generation of NoSQL databases.

In contrast, relational databases impose fairly rigid, schema-based structures on data models: tables consisting of columns and rows, which can be joined to enable ‘relations’ among entities. Each table typically defines an entity. Each row in a table holds one entry, and each column contains a specific piece of information for that record. The relationships among tables are clearly defined and usually enforced by schemas and database rules.
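As a minimal sketch of how relationships among tables can be enforced by schemas and database rules, the following Python example uses the standard-library sqlite3 module, with SQLite standing in for an RDBMS in general. The department and employee tables, their columns, and the helper function names are illustrative assumptions rather than part of any particular system.

```python
import sqlite3


def build_schema(conn):
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE department ("
        " dept_no TEXT PRIMARY KEY,"
        " name TEXT NOT NULL)"
    )
    # Each employee row must reference an existing department row.
    conn.execute(
        "CREATE TABLE employee ("
        " emp_no TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " dept_no TEXT NOT NULL REFERENCES department(dept_no))"
    )


def try_insert_employee(conn, emp_no, name, dept_no):
    """Return True if the row was accepted, False if the schema rejected it."""
    try:
        conn.execute(
            "INSERT INTO employee VALUES (?, ?, ?)", (emp_no, name, dept_no)
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute("INSERT INTO department VALUES ('D03', 'Database')")

accepted = try_insert_employee(conn, "E1", "Mohan", "D03")  # department exists
rejected = try_insert_employee(conn, "E2", "Vipul", "D99")  # no such department

# A join relates the two entities at query time.
joined = conn.execute(
    "SELECT e.name, d.name FROM employee e"
    " JOIN department d ON e.dept_no = d.dept_no"
).fetchall()
```

Here the database itself refuses the second insert, so the application never sees an employee that points at a missing department.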
Relational data models enforce uniformity, whereas non-relational models do not. NoSQL databases permit multiple ‘shapes’ of data objects to coexist, which is more flexible but can also be more error prone. In the world of relational databases, the schemas that support uniformity are usually managed by database administrators. This can sometimes introduce friction between administrators and development teams, resulting in long, non-agile application development lifecycles. Such highly structured data requires normalization to reduce redundancy. Since the data model is based on the entity being represented, query patterns are a secondary consideration.

NoSQL inverts this approach, placing more power in the hands of the developer and often decentralizing control over data structures. Non-relational data models are flexible, and schema management is often delegated to application developers, who are relatively free to adapt data models independently. Such a decentralized approach can accelerate development cycles and provide a more agile approach to addressing user requirements.

CONSISTENCY VERSUS AVAILABILITY

A consideration of the architectural differences between relational and non-relational databases would not be complete without the CAP theorem. The CAP theorem was formulated by Eric Brewer in 2000, as a way of expressing the key tradeoffs in distributed systems. The CAP theorem states that it is impossible for a distributed data store to provide more than two of the following three guarantees:

• Consistency: Every read receives either the most recent write or an error.
• Availability: Every request receives a response that is not an error, but with no guarantee that it contains the most recent write.
• Partition Tolerance: The system continues to operate even when an arbitrary number of messages are delayed, dropped or reordered among nodes.

[Figure: CAP Theorem. A triangle with vertices Availability (A), Consistency (C), and Partition-Tolerance (P), with Scylla marked on the diagram.]

Another way of putting this is that the CAP theorem dictates that any data store brings with it a fundamental trade-off. As such, many databases are referred to as CP (consistent and partition-tolerant, but not always available) or AP (available and partition-tolerant, but not always consistent). In CAP terms, the critical trade-off that distinguishes relational and non-relational data stores is between availability and consistency. SQL data stores sacrifice availability in favor of data consistency. NoSQL data stores sacrifice consistency in favor of availability.

It is important to note that the CAP theorem has come under significant criticism. Martin Kleppmann, in particular, has written a comprehensive Critique of the CAP Theorem. So, it is important to keep in mind that the theorem is merely a simplified model for understanding a very complex topic.

ACID VERSUS BASE CONSISTENCY

One of the defining tradeoffs between relational and non-relational datastores is in the type of consistency that they provide. In simple terms, RDBMS provides strong consistency, while NoSQL databases provide a weaker form. Consistency in general refers to a database’s ability to process concurrent transactions while preserving the integrity of the data. Somewhat confusingly, ‘consistency’ as defined in the CAP theorem has a different, though related, meaning than the consistency discussed in this section. The definition used by Brewer in the CAP theorem derives from distributed systems theory, while the definition used in this section derives from database theory.
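To make the CAP definitions of consistency and availability concrete, here is a deliberately simplified sketch in Python. The TwoReplicaStore class, its ‘CP’ and ‘AP’ modes, and the method names are hypothetical and exist only for illustration; real databases replicate and recover in far more elaborate ways.

```python
class Unavailable(Exception):
    """Raised when a request is refused rather than risk an inconsistent answer."""


class TwoReplicaStore:
    """Toy single-value store with two replicas, A and B (illustration only).

    mode="CP": during a partition, requests fail instead of diverging.
    mode="AP": during a partition, requests succeed but B may be stale.
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replica_a = None
        self.replica_b = None
        self.partitioned = False

    def write_via_a(self, value):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot replicate the write to B")
        self.replica_a = value
        if not self.partitioned:
            self.replica_b = value  # replication only succeeds while connected

    def read_via_b(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("B cannot confirm it holds the latest write")
        return self.replica_b

    def heal(self):
        # Once the partition ends, B catches up with A.
        self.partitioned = False
        self.replica_b = self.replica_a


def demo(mode):
    """Return (what a reader sees during the partition, what it sees after)."""
    store = TwoReplicaStore(mode)
    store.write_via_a("v1")
    store.partitioned = True
    try:
        store.write_via_a("v2")
        during = store.read_via_b()
    except Unavailable:
        during = "error"
    store.heal()
    return during, store.read_via_b()


ap_result = demo("AP")  # stale read during the partition, latest value afterwards
cp_result = demo("CP")  # error during the partition, last accepted value afterwards
```

In AP mode the reader gets an answer that is not the most recent write; in CP mode the reader gets an error instead. Neither mode can offer both guarantees while the partition lasts.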
In simple terms, consistency is a guarantee that a read should return the result of the latest successful write. This seems simple, but such a guarantee is incredibly difficult to deliver without impacting the performance of the system as a whole. In a relational database, a single data item is actually split across independent registers that must agree with one another. Thus, a single database write is actually decomposed into several small writes to these registers, which must be completed and visible when the read is executed. With concurrent operations running against the database, the semblance of order between the group of sub-operations needs to be maintained; the concurrent operations must be atomic. ACID consistency means the rules of relations must be satisfied. In a globally distributed database topology, which involves multiple clusters each containing many nodes, the problem of consistency becomes exponentially more complex.

In general, relational databases that support ‘strong consistency’ provide ‘ACID guarantees.’ ACID is an acronym designed to capture the essential elements of a strongly consistent database. The components of ACID are as follows:

• Atomicity: Guarantees that each transaction is treated as a single “unit”, which either succeeds completely or fails completely.
• Consistency: Guarantees that each transaction only changes affected data in permitted ways.
• Isolation: Guarantees that the concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
• Durability: A transaction’s results are permanent, even in the event of system failure.

ACID compliance is a complex and often contested topic. In fact, one popular system of analysis, the Jepsen test, is dedicated to verifying vendor consistency claims.

By their nature, ACID-compliant databases are generally slow, difficult to scale, and expensive to run and maintain. It should be noted that some RDBMS systems enable performance to be improved by relaxing ACID guarantees. Still, all SQL databases are ACID compliant to varying degrees, and as such, they all share this downside. The practical effect of ACID compliance is to make it extraordinarily difficult and expensive to achieve resilient, distributed SQL database deployments.

In contrast to RDBMS’ ACID guarantees, NoSQL databases provide so-called ‘BASE guarantees.’ BASE enables availability and relaxes the stringent consistency. The acronym BASE designates:

• Basic Availability: Data is available most of the time, even during a partial system failure.
• Soft state: Individual data items are independent and do not have to be consistent with each other.
• Eventual consistency: Data will become consistent at some unspecified point in the future.

As such, NoSQL databases sacrifice a degree of consistency in order to increase availability. Rather than providing strong consistency, NoSQL databases generally provide eventual consistency. A data store that provides BASE guarantees can occasionally fail to return the result of the latest write, providing different answers to applications making requests. Developers building applications against eventually consistent data stores often implement consistency checks in their application code.

Lightweight transactions

In a traditional SQL RDBMS, a “transaction” is a logical unit of work: a group of tasks that provides the ACID guarantees discussed above. To compensate for relaxed consistency, some NoSQL databases offer ‘lightweight transactions’ (LWTs).

Lightweight transactions are limited to a single conditional statement, which enables an atomic “compare and set” operation. Such an operation checks whether a condition is true before it conducts the transaction. If the condition is not
met, the transaction is not executed. (For this reason, LWTs are sometimes called ‘conditional statements’). LWTs do not truly lock the database for the duration of the transaction; they only ‘lock’ a single cell or row. LWTs leverage a consensus protocol such as Paxos to ensure that all nodes in the cluster agree the change is committed. In this way, LWTs can provide sufficient consistency for applications that require the availability and resilience of a distributed database.

QUERY LANGUAGES: SQL VERSUS CQL

As we’ve noted, relational databases are defined in part by their use of the Structured Query Language (SQL). In contrast, NoSQL databases employ a host of alternative query languages that have been designed to support diverse application use cases. A partial list includes MongoDB Query Language (MQL), Couchbase’s N1QL, Elasticsearch’s Query DSL, Microsoft Azure’s Cosmos DB query language, and Cassandra Query Language (CQL).

In this paper, we will focus on the most widely used NoSQL query language, CQL. While CQL is the primary language for communicating with Apache Cassandra, it is also supported by a range of familiar NoSQL databases. Common CQL-compliant databases include Scylla, DataStax Enterprise, Microsoft’s cloud-native Azure Cosmos DB, and Amazon Keyspaces.

CQL’s similarity to SQL enables developers to move between the languages with relative ease. A few distinctions between SQL and CQL include:

Joins

SQL and CQL share similar statements to store and modify data, such as Create, Alter, Drop, and Truncate commands, but unlike SQL, CQL is not designed to support joins between tables. In CQL, relations are implemented within the application, rather than within the database query.

Values versus objects

Query results are also returned differently. SQL natively returns data-typed values, usually to be read into an object one field at a time. In contrast, CQL natively returns complete objects, often serialized in extensible markup language (XML) or JavaScript object notation (JSON). This makes applications responsible for parsing these objects to obtain the desired result of a query.

Scaling characteristics

In NoSQL, data is stored across nodes in a cluster based on a token range, which is a hashed value of the primary key. By using token ranges, NoSQL databases enable objects to be stored on different nodes. CQL queries are inherently more scalable than SQL queries, having been specifically designed to query across a horizontally distributed cluster of servers, rather than a single database at a time.

CONSIDERATIONS FOR SQL TO NOSQL MIGRATIONS

Data models

SQL data models follow a normalized design; different but related pieces of information are stored in ‘relations,’ which are separate logical tables connected by joins. NoSQL databases use denormalized data models, in which redundant copies of data are added as needed by the consuming applications. The point of denormalization is to increase performance and lower latency since the joins involved in normalized data models can introduce significant performance overhead, especially in distributed topologies.

When migrating from SQL to NoSQL, the primary key in the relational table becomes the partition key in the NoSQL table. If the RDBMS table must be joined to additional tables to retrieve the business object, those closely related tables should be combined into a single NoSQL table. The NoSQL cluster ordering key determines the physical order of records, so it should be a unique value (often a composite value) that would be useful for searching.

Project  Project             Project     Project  Employee  Employee  Department  Department  Hourly
Code     Name                Manager     Budget   No.       Name      No.         Name        Rate

PC010    Reservation System  Mr. Ajay    120500   $100      Mohan     D03         Database    21.00
PC010    Reservation System  Mr. Ajay    120500   $101      Vipul     D02         Testing     16.50
PC010    Reservation System  Mr. Ajay    120500   $102      Riyaz     D01         IT          22.00
PC011    HR System           Mrs. Charu  500500   $103      Pavan     D03         Database    18.50
PC011    HR System           Mrs. Charu  500500   $104      Jitendra  D02         Testing     17.00
PC011    HR System           Mrs. Charu  500500   $315      Pooja     D01         IT          23.50
PC012    Attendance System   Mr. Rajesh  710700   $137      Rahul     D03         Database    21.50
PC012    Attendance System   Mr. Rajesh  710700   $218      Avneesh   D02         Testing     15.50
PC012    Attendance System   Mr. Rajesh  710700   $109      Vikas     D01         IT          20.50

Denormalized data
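One way to picture how such a denormalized table comes about is to start from normalized data and resolve the joins once, at write time. The Python sketch below does this for a few of the rows shown above. The split into separate project, department, employee, and assignment collections is an assumption made for illustration (the paper shows only the denormalized result), and the function names are hypothetical.

```python
# Normalized source data: each fact is stored once and linked by keys.
projects = {
    "PC010": {"name": "Reservation System", "manager": "Mr. Ajay", "budget": 120500},
    "PC011": {"name": "HR System", "manager": "Mrs. Charu", "budget": 500500},
}
departments = {"D03": "Database", "D02": "Testing"}
employees = {
    "$100": {"name": "Mohan", "dept_no": "D03", "hourly_rate": 21.00},
    "$101": {"name": "Vipul", "dept_no": "D02", "hourly_rate": 16.50},
    "$103": {"name": "Pavan", "dept_no": "D03", "hourly_rate": 18.50},
}
assignments = [("PC010", "$100"), ("PC010", "$101"), ("PC011", "$103")]


def denormalize(projects, departments, employees, assignments):
    """Resolve every join up front and emit one wide row per assignment."""
    rows = []
    for project_code, employee_no in assignments:
        project = projects[project_code]
        employee = employees[employee_no]
        rows.append({
            "project_code": project_code,
            "project_name": project["name"],
            "project_manager": project["manager"],
            "project_budget": project["budget"],
            "employee_no": employee_no,
            "employee_name": employee["name"],
            "department_no": employee["dept_no"],
            "department_name": departments[employee["dept_no"]],
            "hourly_rate": employee["hourly_rate"],
        })
    return rows


def rows_for_project(rows, project_code):
    """Answer a query from the wide rows alone, with no join at read time."""
    return [row for row in rows if row["project_code"] == project_code]


flat_rows = denormalize(projects, departments, employees, assignments)
reservation_team = [
    row["employee_name"] for row in rows_for_project(flat_rows, "PC010")
]
```

The project name, manager, and budget are repeated in every row for that project. That redundancy is the price paid for answering the query without a join.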

Partition Key                          Clustering Key                    Value

268e074a-a801476c-8db5-276eb2283b03    2011 02 03 04:05:00: heart_rate   81
fead97e9 4d77 40c9-ba15-c45478542e20   2011 02 03 04:05:05: heart_rate   80
                                       2011 02 03 04:05:10: heart_rate   89
                                       2011 12 17 09:21:00: heart_rate   84
47045a b-fd1144c6 9d0f-82428434e887    2011 02 03 04:05:00: heart_rate   83

(Each partition key identifies a partition. Rows within a partition are sorted by time.)

Example of Partition
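The layout in the figure, together with the token ranges described under Scaling characteristics, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular database is implemented: the node names, the use of MD5 as the hash, the equal-sized token ranges, the WideTable class, and the ‘device-1’ partition key are all hypothetical, and timestamps are written in an ISO-like form so that string order matches time order.

```python
import bisect
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster
TOKEN_SPACE = 2 ** 64


def token(partition_key):
    """Hash the partition key into a 64-bit token (MD5 is used only for brevity)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner_node(partition_key):
    """Each node owns one equal slice of the token space (a token range)."""
    range_size = TOKEN_SPACE // len(NODES)
    index = min(token(partition_key) // range_size, len(NODES) - 1)
    return NODES[index]


class WideTable:
    """Rows grouped by partition key and kept sorted by clustering key."""

    def __init__(self):
        self.partitions = defaultdict(list)  # partition key -> sorted rows

    def insert(self, partition_key, clustering_key, value):
        # insort keeps each partition ordered by (clustering key, value).
        bisect.insort(self.partitions[partition_key], (clustering_key, value))

    def slice(self, partition_key, start, end):
        """Return rows whose clustering key lies in [start, end]."""
        rows = self.partitions.get(partition_key, [])
        low = bisect.bisect_left(rows, (start,))
        high = bisect.bisect_right(rows, (end, float("inf")))
        return rows[low:high]


table = WideTable()
# Inserted out of order on purpose; the partition stays sorted by time.
table.insert("device-1", "2011-12-17 09:21:00", 84)
table.insert("device-1", "2011-02-03 04:05:10", 89)
table.insert("device-1", "2011-02-03 04:05:05", 80)

feb_third = table.slice("device-1", "2011-02-03 00:00:00", "2011-02-03 23:59:59")
home = owner_node("device-1")
```

Because rows inside a partition are already ordered by the clustering key, a time-range query becomes a contiguous slice of one partition on one node, which is why the choice of both keys matters so much during migration.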

Of course, partition keys and cluster ordering keys are not the only way data is queried. Additional indexes on the relational table provide the basis for secondary indexes or materialized views, in order to support an application’s search and filtering requirements.

QUERY PATTERNS

Relational databases are organized around data structures and relationships. In contrast, NoSQL databases are organized around query patterns. As noted above, the NoSQL partition key can be mapped to a primary key in an RDBMS.
Secondary keys and indexes can be added later. A UNIQUE constraint in a SQL database becomes a good candidate for a cluster ordering key in NoSQL.

Materialized views

Common, frequent queries against a database can become expensive. When the same query is run again and again, it makes sense to ‘virtualize’ the query. Materialized views address this need by enabling common queries to be represented by a database object that is continuously updated as data changes.

Secondary indexes

Secondary indexes enable queries to run against the main table using indexed values, as in an RDBMS, but they are actually implemented as a materialized view. The application is isolated from having to query the secondary index directly.

Referential integrity

Referential integrity ensures that no references between tables are broken, as occurs when a foreign key references a non-existent entry. A lack of referential integrity in a database can result in incomplete query responses, usually failing quietly, with no indication of an error. Relational databases are designed to enforce referential integrity. NoSQL databases shift the responsibility for making sure that objects are complete and correct to the API, which checks entities when loading or saving them.

MIGRATION TO NOSQL

Hybrid conversion to NoSQL

Rather than migrating an entire RDBMS to a NoSQL database, some applications benefit from leaving some data on a relational database, while moving a subset of data to a NoSQL database. A hybrid solution that spans two database types can offer the best of both worlds. For example, in some deployments, customer account information, which is infrequently updated, will be stored in an RDBMS, while transactional or streaming data, such as IoT sensor data or telemetry, might be stored in a NoSQL database. Large, growing tables and storage for streaming data, especially in the context of event-driven architecture (EDA), are good candidates to migrate to NoSQL.

Data forklifting

Tools like Apache Kafka can facilitate the process of migrating existing data from an RDBMS to NoSQL. Depending on the complexity of the conversion, more comprehensive operations may be needed. Tools such as Apache Spark, a lightning-fast unified analytics engine for big data and machine learning, can be used to enable such data conversions.

For key-value migrations, the forklifting process is trivial. For a document model with hierarchical entities (order lines, for example), the process of building the new value from the old table joins can become much more involved.

Data validation

Following a migration, teams need to validate that data was migrated in the correct form. Missing or truncated data, which has the potential to degrade system capabilities, may not be immediately obvious, making comprehensive testing essential.

During the validation process, the old and new data stores can be run in parallel; new data is added or updated in both datastores simultaneously. Reports can be run against both systems and compared for accuracy. Running some queries across a large range of data will help to find differences that could be magnified across aggregated data.

SQL AND NOSQL DATABASE ADMINISTRATION

Performance Monitoring and Tuning

As development teams become smaller and more agile, they are also increasingly sensitive to database maintenance and administrative overhead. On application-focused teams, database experts are becoming less and less common. Traditional database administration and maintenance responsibilities are often rolled into ‘full-stack’ developer and DevOps positions. Operational overhead is often in direct competition with product development efforts.

For these reasons, the choice of a database must take into account the expertise of the organization and the need or desire to build up internal expertise around a given technology.

The ongoing maintenance of a database requires close monitoring and frequent performance tuning. As datasets grow and application traffic increases, administrators need to keep a close eye on disk space, CPU consumption, memory allocation, and index fragmentation. Performance adjustments are proprietary to each database and often require significant dedicated expertise.

A database administrator never wants to see database utilization reach 100%. Therefore, administrators must provide a buffer against traffic spikes by ‘overprovisioning’ hardware. The degree to which hardware must be overprovisioned depends on the scaling characteristics of the database. In general, NoSQL databases have a flatter and more predictable performance curve. Therefore, NoSQL databases tend to allow administrators to minimize overprovisioning without compromising safety.

Performance tuning can be used to minimize overprovisioning, but it can only go so far in preventing full utilization. When performance tuning hits a wall, the database must be scaled, and the RDBMS administrator has two choices. First, the dataset can be ‘sharded,’ such that a subset of the data is stored on each node. Second, the administrator can add more powerful hardware, increasing the capacity of hardware by adding more powerful CPUs, more storage, and faster networking components.

Often, teams do both, sharding and scaling, which adds both complexity and cost. Vertical scaling adds significant cost at each step and eventually runs up against the physical limits of the network.

NoSQL databases make it easier for administrators to monitor and manage database deployments. First, they tend to be capable of running at higher levels of utilization than most RDBMSs. Second, capacity can be increased by adding new nodes running on inexpensive commodity servers. But within the family of NoSQL databases, these two capabilities vary considerably.

Some NoSQL databases also require expert administrators with detailed knowledge of proprietary tuning settings. Others adopt a more automated approach that minimizes tricky manual tuning parameters, enabling non-specialists to administer and operate the database.

Likewise, some NoSQL databases take horizontal scale to an extreme, often requiring huge clusters to achieve the required performance targets and maintain SLAs. Sometimes these clusters run into the tens of thousands of nodes. While providing a frictionless path to scale, this approach also increases operational overhead. The ideal non-relational database can efficiently use powerful modern hardware, while also enabling clusters to grow and shrink elastically with minimal administrator intervention.

Backup and Recovery

In both RDBMS and NoSQL worlds, data can become corrupt due to hardware issues, software bugs, and user errors. The resilient architecture of NoSQL databases typically provides a buffer against data loss. Still, administrators need to be able to restore the data to a known ‘good state.’ A backup and recovery plan is essential, being built around two core targets: Recovery Point Objective (RPO) and Recovery Time Objective (RTO).

• RPO is defined by the age of data in backup storage needed to resume normal operations after a failure.
• RTO defines the time needed to restore the system to a normal state.

A classic database restore plan might include a single daily backup along with differential backups every hour to support a one-hour RPO. For a large database, the recovery time for a full restore can take hours to days, and every backup takes additional storage space.
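The relationship between a backup schedule and the RPO it can support is simple arithmetic, sketched below in Python. The function names, the example date, and the simplification that a backup captures all data up to the moment it is taken are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def backup_times(day_start, interval, period=timedelta(days=1)):
    """Full backup at day_start, then one differential backup per interval."""
    times = []
    current = day_start
    while current < day_start + period:
        times.append(current)
        current += interval
    return times


def data_loss_at(failure_time, times):
    """Age of the newest backup taken at or before the failure."""
    latest = max(t for t in times if t <= failure_time)
    return failure_time - latest


def worst_case_data_loss(times, period_end):
    """Largest gap between consecutive backups: the tightest RPO supported."""
    boundaries = sorted(times) + [period_end]
    return max(later - earlier for earlier, later in zip(boundaries, boundaries[1:]))


day_start = datetime(2020, 1, 1, 0, 0)
day_end = day_start + timedelta(days=1)

# Daily full backup plus hourly differentials, as in the plan described above.
hourly_plan = backup_times(day_start, timedelta(hours=1))
# The same plan without differentials: a single daily backup.
daily_only = [day_start]

loss_at_1340 = data_loss_at(datetime(2020, 1, 1, 13, 40), hourly_plan)
hourly_rpo = worst_case_data_loss(hourly_plan, day_end)
daily_rpo = worst_case_data_loss(daily_only, day_end)
```

With hourly differentials the worst case is one hour of lost data, matching the one-hour RPO; with the daily backup alone it grows to a full day. RTO is a separate question that depends on how long the restore itself takes.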
Node Repair and Replacement

Given the distributed nature of NoSQL clusters, nodes occasionally fall out-of-sync. To address this issue, NoSQL databases provide tools to bring out-of-sync nodes up-to-date using a repair procedure. Repairs populate the node to match the data on the other replicas. Sometimes a node can fall so far out-of-sync with the cluster that it needs to be replaced. As they are bootstrapped into the cluster, fresh nodes must stream a copy of the whole dataset; for large datasets, such a refresh can take an inordinate amount of time. NoSQL databases perform such operations using a variety of algorithms, some of which are more efficient than others. Thus, some NoSQL databases recover more quickly and predictably than others.

SCYLLA NOSQL: SCALE-UP OF RDBMS AND HIGH AVAILABILITY OF NON-RELATIONAL

In this document, we have discussed a set of tradeoffs between SQL and NoSQL databases. If your use case requires ACID guarantees, then NoSQL might not be an option. But many modern, cloud-native applications are better suited to databases that support high availability and a developer-centric data model. The decision is based on business considerations: how important is each transaction? Where the aggregate scale and speed of all transactions outweigh the specific correctness of any single query, NoSQL is the best fit.

With this fundamental tradeoff in mind, one database, Scylla, has been designed from the ground up to overcome one of the key limitations of the first generation of NoSQL databases. Using a unique, close-to-the-hardware design, Scylla combines the scale-up capabilities of traditional RDBMSs with the high availability and resilience of non-relational databases. The result is a database that extracts maximum performance from modern hardware to deliver predictable low latency, while also minimizing operational overhead and significantly reducing TCO.

Many IT organizations have followed the principles in this paper and have migrated successfully from RDBMS to the Scylla NoSQL database.

SQL versus NoSQL at a glance

Orientation
  SQL: Relational
  NoSQL: Generally non-relational

Schema
  SQL: Strict and rigid schema design and data normalization
  NoSQL: Loose and more varied designs for unstructured and semi-structured data; data is generally denormalized

Language
  SQL: Structured Query Language (SQL) for defining, reading and manipulating data. Supports JOIN statements to relate data across tables.
  NoSQL: There are different languages for querying, some quite similar to SQL, such as Cassandra Query Language (CQL) for wide column databases, or others radically different, such as using object-oriented JSON for document databases.

Scalability
  SQL: Vertically scalable. Loads on a single server can be increased with CPU, RAM or SSD.
  NoSQL: Generally designed for horizontal scalability. Increased traffic can be handled by adding more servers in the database. This is useful for large and frequently changing datasets.

Structure
  SQL: Table-based, which is efficient for applications using multi-row transactions or systems that were built with a relational structure.
  NoSQL: NoSQL database structure is variable, and can be based on documents, key-value pairs, graph structures or wide-column stores.

ABOUT SCYLLADB
Scylla is the real-time big data database. API-compatible
with Apache Cassandra and Amazon DynamoDB, Scylla
embraces a shared-nothing approach that increases
throughput and storage capacity as much as 10X.
Comcast, Discord, Disney+ Hotstar, Grab, Medium,
Starbucks, Ola Cabs, Samsung, IBM, Investing.com and
many more leading companies have adopted Scylla to
realize order-of-magnitude performance improvements
and reduce hardware costs. Scylla’s database is available
as an open source project, an enterprise edition and a
fully managed database as a service. ScyllaDB was
founded by the team responsible for the KVM hypervisor.
For more information: ScyllaDB.com

United States Headquarters: 2445 Faber Place, Suite 200, Palo Alto, CA 94303 U.S.A.
Israel Headquarters: 11 Galgalei Haplada, Herzelia, Israel
Email: info@scylladb.com
Copyright © 2020 ScyllaDB Inc. All rights reserved. All trademarks or
registered trademarks used herein are property of their respective owners.
