M5_dbm_sql_notes
NoSQL databases are a broad class of database management systems that differ from traditional
relational database management systems (RDBMS) in that they do not use the standard relational
database model. The term "NoSQL" originally stood for "non-SQL" to emphasize their departure
from SQL-based relational databases, but it has since been interpreted to mean "not only SQL,"
indicating that some NoSQL databases support query languages that are SQL-like.
Use Cases:
Big Data and Real-Time Web Applications: NoSQL databases are well-suited for handling
large volumes of data with varying structures, or when an application requires real-time
access to data.
Flexible Data Models: Applications that need a flexible data model, where the data schema can
change over time without costly migrations.
Scalability: When an application needs to scale horizontally across many servers to handle
large loads or large volumes of data, NoSQL databases are often more cost-effective and
technically feasible than traditional RDBMS.
Advantages and Disadvantages:
Advantages:
Scalability: Designed to scale horizontally, that is, by spreading data and load across many servers.
Flexibility: Schema-less models allow for flexible and rapid development.
Variety: Able to handle high volumes of unstructured or semi-structured data.
Disadvantages:
Consistency: Many NoSQL databases relax full ACID (Atomicity, Consistency,
Isolation, Durability) guarantees in favor of availability and scalability, often offering eventual consistency instead.
Maturity: Relational databases benefit from decades of research, optimization, and
tooling, while NoSQL solutions are generally newer and may lack comprehensive
tools and best practices.
Complexity: The variety of data models can introduce complexity in selecting and
managing the appropriate NoSQL database for specific use cases.
1. Document stores
1. A "document" is the main unit of storage: it encapsulates data in a format such as
JSON, BSON (Binary JSON), or XML. These databases are designed to store, retrieve, and manage
such document-oriented information.
2. Document stores provide a flexible schema approach, which means that the structure of the
data can change from one document to another within the same database.
3. This flexibility makes document stores particularly well-suited for applications dealing with
large volumes of unstructured data that may not fit neatly into the rows and columns of a
traditional relational database.
Example
MongoDB, Apache CouchDB, Couchbase, Amazon DocumentDB
[
  {
    "isbn": "100-100-100",
    "title": "Alchemist",
    "authors": ["Paulo Coelho"]
  },
  {
    "isbn": "100-100-101",
    "title": "The Godfather",
    "authors": ["Mario Puzo"],
    "publisher": "Penguin Books",
    "genres": ["Novel", "Psychological Fiction"]
  }
]
The two book records above, "Alchemist" and "The Godfather", have different structures: the
record for "The Godfather" has two additional attributes (publisher and genres). This is what is
meant by a flexible schema.
Insert operation
// insertOne() is the current form; the older insert() is deprecated in mongosh
db.book.insertOne({
  isbn: "100-100-103",
  title: "Hamlet",
  authors: ["William Shakespeare"],
  publicationYear: 1599,
  genres: ["play", "Shakespearean tragedy"]
})
Query operation
db.book.find({title: "Hamlet"})
db.book.find({isbn: "100-100-103"})
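The following minimal Python sketch makes the insert and find semantics concrete without a running MongoDB server by keeping a document collection in memory. The `Collection` class and its `insert_one`/`find` methods are hypothetical. The names echo PyMongo's style, but this is not the PyMongo API. The filter supports plain equality only, so unlike MongoDB it does not match a scalar against the elements of an array field.

```python
from typing import Any


class Collection:
    """Hypothetical in-memory stand-in for a document collection (not PyMongo)."""

    def __init__(self) -> None:
        self._docs: list[dict[str, Any]] = []

    def insert_one(self, doc: dict[str, Any]) -> None:
        # No schema check: documents with any set of fields are accepted.
        self._docs.append(dict(doc))

    def find(self, query: dict[str, Any]) -> list[dict[str, Any]]:
        # Equality match on every queried field; a document that lacks
        # a queried field simply does not match.
        return [
            d for d in self._docs
            if all(k in d and d[k] == v for k, v in query.items())
        ]


book = Collection()
book.insert_one({"isbn": "100-100-100", "title": "Alchemist",
                 "authors": ["Paulo Coelho"]})
book.insert_one({"isbn": "100-100-101", "title": "The Godfather",
                 "authors": ["Mario Puzo"], "publisher": "Penguin Books",
                 "genres": ["Novel", "Psychological Fiction"]})
book.insert_one({"isbn": "100-100-103", "title": "Hamlet",
                 "authors": ["William Shakespeare"], "publicationYear": 1599,
                 "genres": ["play", "Shakespearean tragedy"]})

print(book.find({"title": "Hamlet"})[0]["publicationYear"])  # 1599
print(len(book.find({"publisher": "Penguin Books"})))         # 1
```

The three documents have different fields, yet they share one collection. The query on `publisher` skips any document that lacks that field.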
2. Key-value stores
Simplicity: The model is straightforward, with each key acting as a unique identifier through which
its associated value can be accessed.
Performance: Key-value stores are optimized for speed, particularly for read and write operations,
due to their simple data model and the ability to distribute data across multiple nodes.
Scalability: Many key-value databases are designed to scale out horizontally, making it easy to
increase capacity and throughput by adding more nodes to the system.
Flexibility: Values stored can be anything from simple data types (such as strings or numbers) to
more complex objects or binary data. The structure of the value is not enforced by the database,
providing flexibility in what can be stored.
Schema-less: There is no fixed schema required for the data, allowing the structure of values to
change without the need for database migrations.
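Key-value stores distribute data across nodes and scale out as nodes are added. Consistent hashing, a partitioning technique used by Dynamo-style stores such as Riak, illustrates how that can work. The sketch below is an illustration under assumptions, not any product's actual placement scheme. `HashRing`, the virtual-node count of 100, and the node names are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash; Python's built-in hash() is randomized per process.
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring mapping keys to node names."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several virtual positions to spread load evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first position at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"100-100-{n:03d}" for n in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys moved")
```

The design choice shows in the result. When a node joins, only keys whose ring position now falls to the new node move, roughly a quarter here. A naive `hash(key) % number_of_nodes` scheme would reshuffle most keys.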
Example
Redis, Amazon DynamoDB, Riak
To represent the given book records as key-value pairs suitable for a key-value store like Amazon
DynamoDB, you can structure each record with its ISBN as the key and the rest of the book's
information as the value.
{
  "100-100-100": {
    "title": "Alchemist",
    "authors": ["Paulo Coelho"]
  },
  "100-100-101": {
    "title": "The Godfather",
    "authors": ["Mario Puzo"],
    "publisher": "Penguin Books",
    "genres": ["Novel", "Psychological Fiction"]
  }
}
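Restructuring records so the ISBN becomes the key is a simple transformation. Here is a minimal Python sketch, with the helper name `to_key_value` made up for illustration:

```python
from typing import Any

books = [
    {"isbn": "100-100-100", "title": "Alchemist", "authors": ["Paulo Coelho"]},
    {"isbn": "100-100-101", "title": "The Godfather", "authors": ["Mario Puzo"],
     "publisher": "Penguin Books", "genres": ["Novel", "Psychological Fiction"]},
]


def to_key_value(records: list[dict[str, Any]],
                 key_field: str) -> dict[str, dict[str, Any]]:
    # Pull the key field out of each record; everything else becomes the value.
    store: dict[str, dict[str, Any]] = {}
    for rec in records:
        store[rec[key_field]] = {k: v for k, v in rec.items() if k != key_field}
    return store


kv = to_key_value(books, "isbn")
print(kv["100-100-101"]["publisher"])  # Penguin Books
```

If two records shared an ISBN, the later one would silently overwrite the earlier. This mirrors how a put with an existing key typically replaces the stored value in a key-value store.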
Insert operation
aws dynamodb put-item \
  --table-name Books \
  --item '{
    "isbn": {"S": "100-100-103"},
    "title": {"S": "Hamlet"},
    "authors": {"L": [{"S": "William Shakespeare"}]},
    "publicationYear": {"N": "1599"},
    "genres": {"L": [{"S": "play"}, {"S": "Shakespearean tragedy"}]}
  }'
Query operation
aws dynamodb get-item \
  --table-name Books \
  --key '{"isbn": {"S": "100-100-103"}}'
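The `put-item` payload wraps every value in a type descriptor: `S` for string, `N` for number, `L` for list. The minimal Python sketch below builds that shape from a plain record. `to_dynamodb` and `to_item` are hypothetical helpers, not part of the AWS CLI or SDK; boto3 provides its own `TypeSerializer` for this job. The sketch handles only strings, integers, booleans, None, lists, and dicts.

```python
import json
from typing import Any


def to_dynamodb(value: Any) -> dict[str, Any]:
    """Wrap one plain Python value in DynamoDB's typed JSON (sketch only)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, int):
        return {"N": str(value)}  # numbers are sent as strings
    if isinstance(value, list):
        return {"L": [to_dynamodb(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb(v) for k, v in value.items()}}
    # Floats, sets, bytes, etc. are deliberately left out of this sketch.
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(record: dict[str, Any]) -> dict[str, Any]:
    # A top-level item maps attribute names to typed values (no outer "M").
    return {name: to_dynamodb(v) for name, v in record.items()}


hamlet = {
    "isbn": "100-100-103",
    "title": "Hamlet",
    "authors": ["William Shakespeare"],
    "publicationYear": 1599,
    "genres": ["play", "Shakespearean tragedy"],
}
print(json.dumps(to_item(hamlet), indent=2))
```

Because numbers travel as strings, `publicationYear` appears as `{"N": "1599"}` in the CLI command.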
3. Wide-column stores
Wide-column stores are a type of NoSQL database that organizes data into tables, rows, and
dynamic columns. They combine elements of both relational databases and key-value stores but are
unique in their approach to column management. Wide-column stores allow for each row to have
a different set of columns.
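One rough way to picture rows with different column sets is a map from row key to a map of column name to value, where only written columns are stored. This Python sketch is a conceptual model only; the `table` and `put` names are hypothetical. Real wide-column stores add clustering, timestamps, and on-disk storage structures that it ignores.

```python
from collections import defaultdict
from typing import Any

# Hypothetical model: row key -> {column name: value}.
# Only columns that were written are stored, so rows can differ in shape.
table: dict[str, dict[str, Any]] = defaultdict(dict)


def put(row_key: str, **columns: Any) -> None:
    table[row_key].update(columns)


put("100-100-100", title="Alchemist", author="Paulo Coelho")
put("100-100-101", title="The Godfather", author="Mario Puzo",
    publisher="Penguin Books")

for key, cols in sorted(table.items()):
    print(key, sorted(cols))
# 100-100-100 ['author', 'title']
# 100-100-101 ['author', 'publisher', 'title']
```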
Apache Cassandra, a widely used wide-column store, encourages denormalization and duplication of
data across multiple tables (typically one table per query pattern) to optimize read performance.
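Denormalization in this style means the write path stores the same book in every table that needs it, so each read is a single key lookup with no join. Below is a Python sketch with two hypothetical query-specific tables, `books_by_isbn` and `books_by_author`:

```python
from collections import defaultdict

# Two hypothetical "tables", each keyed for one read pattern.
books_by_isbn: dict[str, dict[str, str]] = {}
books_by_author: dict[str, list[dict[str, str]]] = defaultdict(list)


def add_book(isbn: str, title: str, author: str) -> None:
    # Write path: store a separate copy of the row in every table that needs it.
    row = {"isbn": isbn, "title": title, "author": author}
    books_by_isbn[isbn] = dict(row)
    books_by_author[author].append(dict(row))


add_book("100-100-100", "Alchemist", "Paulo Coelho")
add_book("100-100-101", "The Godfather", "Mario Puzo")

# Read path: each query is a single key lookup.
print(books_by_isbn["100-100-101"]["title"])                  # The Godfather
print([b["title"] for b in books_by_author["Paulo Coelho"]])  # ['Alchemist']
```

The cost falls on the write side. Each new book is written twice, and a correction to a title must be applied to every copy.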
Creating Schema
In Apache Cassandra, to add a new attribute (for example, "language") to an existing table, you
would typically alter the table schema to include the new column.
This is a metadata-only change: existing rows are not rewritten and take up no extra storage,
because Cassandra stores only the cells that have actually been written. In many RDBMS, by
contrast, adding a column can require rewriting every existing row, adding storage overhead to all
existing records.
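The difference between a metadata-only schema change and rewriting every row can be sketched in a few lines of Python. `WideColumnTable` is hypothetical and models only the idea. Adding a column updates the column list, and existing rows keep exactly the cells they had. A missing cell reads back as `None`, analogous to the null Cassandra returns for an unset column.

```python
from typing import Any


class WideColumnTable:
    """Hypothetical sketch: schema is metadata, rows store only written cells."""

    def __init__(self, columns: list[str]) -> None:
        self.columns = list(columns)               # table metadata
        self.rows: dict[str, dict[str, Any]] = {}  # row key -> written cells

    def add_column(self, name: str) -> None:
        # Like ALTER TABLE ... ADD: only the metadata changes; no row is touched.
        self.columns.append(name)

    def insert(self, key: str, **cells: Any) -> None:
        unknown = set(cells) - set(self.columns)
        if unknown:
            raise KeyError(f"undefined columns: {sorted(unknown)}")
        self.rows.setdefault(key, {}).update(cells)

    def get(self, key: str) -> dict[str, Any]:
        # Cells that were never written read back as None.
        row = self.rows[key]
        return {c: row.get(c) for c in self.columns}


books = WideColumnTable(["title"])
books.insert("100-100-100", title="Alchemist")
books.add_column("language")
books.insert("100-100-103", title="Hamlet", language="English")

print(books.rows["100-100-100"])  # {'title': 'Alchemist'}
print(books.get("100-100-100"))   # {'title': 'Alchemist', 'language': None}
```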
4. Graph stores
1. Graph stores, or graph databases, are a type of NoSQL database designed to treat
relationships between data as equally important to the data itself.
2. They are optimized for managing interconnected data and for queries that traverse complex
relationships with many hops in the data.
3. Graph databases excel at handling data whose relationships are as important or more
important than the data itself.
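Queries that traverse relationships with many hops amount to following links outward from a starting node. The Python sketch below does a hop-limited breadth-first search over an adjacency list. The `follows` graph of readers is made up for illustration, and a graph database would traverse its own stored relationships rather than a Python dict. In a relational schema, the same question typically needs one self-join per hop.

```python
from collections import deque


def within_hops(graph: dict[str, list[str]], start: str,
                max_hops: int) -> dict[str, int]:
    """Breadth-first search returning each reachable node and its hop count."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


# Hypothetical "follows" relationships between readers.
follows = {
    "ana": ["ben"],
    "ben": ["cara", "dev"],
    "cara": ["eli"],
    "dev": [],
    "eli": ["ana"],
}
print(within_hops(follows, "ana", 2))  # {'ana': 0, 'ben': 1, 'cara': 2, 'dev': 2}
```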
// Run the whole script as a single statement so that the variables
// p1, a1, b1, ... stay in scope for the relationship clauses.
// Create Publishers
CREATE (p1:Publisher {name: "Penguin Books"})
CREATE (p2:Publisher {name: "Green Books"})
// Create Authors
CREATE (a1:Author {name: "Paulo Coelho"})
CREATE (a2:Author {name: "Mario Puzo"})
CREATE (a3:Author {name: "Gabriel Garcia Marquez"})
// Create Books
CREATE (b1:Book {isbn: "100-100-101", title: "The Alchemist"})
CREATE (b2:Book {isbn: "100-100-102", title: "The Godfather", publisher: "Penguin Books",
  genres: ['Novel', 'Psychological Fiction']})
CREATE (b3:Book {isbn: "100-100-103", title: "One Hundred Years of Solitude", publicationYear: 1967,
  publisher: "Green Books", genres: ['Novel', 'Fiction', 'Coming-of-Age']})
// Establish Relationships
CREATE (b1)-[:AUTHORED_BY]->(a1)
CREATE (b2)-[:AUTHORED_BY]->(a2)
CREATE (b2)-[:PUBLISHED_BY]->(p1)
CREATE (b3)-[:AUTHORED_BY]->(a3)
CREATE (b3)-[:PUBLISHED_BY]->(p2)
Query Operations
To find a book by its ISBN, you can use the following query:
MATCH (b:Book {isbn: "100-100-103"})
RETURN b;
To find all books authored by "Gabriel Garcia Marquez", use a query like:
MATCH (b:Book)-[:AUTHORED_BY]->(a:Author {name: "Gabriel Garcia Marquez"})
RETURN b;
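For comparison, the author query can be evaluated by hand over a small in-memory copy of the book and author data. In this Python sketch, the node and edge lists are hypothetical stand-ins for what the Cypher statements create, and the author names are assumed. `books_by_author` checks each `AUTHORED_BY` edge, which is roughly what the `MATCH` pattern expresses.

```python
# Hypothetical in-memory copy of the graph: nodes keyed by variable name,
# relationships as (from, type, to) triples.
nodes = {
    "b1": {"label": "Book", "isbn": "100-100-101", "title": "The Alchemist"},
    "b2": {"label": "Book", "isbn": "100-100-102", "title": "The Godfather"},
    "b3": {"label": "Book", "isbn": "100-100-103",
           "title": "One Hundred Years of Solitude"},
    "a1": {"label": "Author", "name": "Paulo Coelho"},
    "a2": {"label": "Author", "name": "Mario Puzo"},
    "a3": {"label": "Author", "name": "Gabriel Garcia Marquez"},
}
edges = [
    ("b1", "AUTHORED_BY", "a1"),
    ("b2", "AUTHORED_BY", "a2"),
    ("b3", "AUTHORED_BY", "a3"),
]


def books_by_author(name: str) -> list[str]:
    # Rough equivalent of:
    # MATCH (b:Book)-[:AUTHORED_BY]->(a:Author {name: $name}) RETURN b
    return [
        nodes[src]["title"]
        for src, rel, dst in edges
        if rel == "AUTHORED_BY"
        and nodes[src]["label"] == "Book"
        and nodes[dst]["label"] == "Author"
        and nodes[dst]["name"] == name
    ]


print(books_by_author("Gabriel Garcia Marquez"))  # ['One Hundred Years of Solitude']
```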