Lecture Notes: Hands-On With NoSQL - MongoDB
• Introduction to MongoDB
o Introduction to MongoDB
o Setting up MongoDB on your system
o Performing CRUD operations on databases
o Advanced queries and their use on a JSON data set
o Comparison operators
o Logical operators
o Aggregation pipeline in MongoDB
• Indexing in MongoDB
o What is indexing?
o Different types of indexes
o How to create indexes
o Best indexing practices to improve query performance
The following facts illustrate the rapid growth of unstructured data:
• 48 hours of video are uploaded to YouTube every minute.
• More than 50 websites are launched every minute.
• About 90% of all the data generated so far was created in the last two years.
• Data generation in 2020 was 44 times that of 2009.
• Brands and organisations receive more than 35,000 likes per hour on Facebook.
• More than 100 TB of data is uploaded to Facebook daily.
• Twitter saw roughly 300 million tweets every day and had more than 600 million accounts.
• Billions of pieces of content are shared through Facebook every month.
• Zettabytes (10^21 bytes) of data are generated each year.
The data generated in the era of the internet are classified as follows:
1. Structured data
2. Semi-structured data
3. Unstructured data
1. Structured Data
• Structured data has a fixed structure and can be arranged in rows and columns.
2. Semi-Structured Data
• Semi-structured data does not have a fixed structure. So, it cannot be accommodated in a
relational model.
• However, semi-structured data can be organised into attributes and values.
• Semi-structured data can be stored in CSV files, Extensible Markup Language (XML) and
JavaScript Object Notation (JSON) documents.
• In a CSV file, the first row names the attributes, and the attributes are separated by commas.
• Each subsequent row holds the values for the attributes named in the first row.
• It is important to note that traditional databases can only process structured data, which
makes up merely 10% of all data.
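The CSV and JSON formats mentioned above can be illustrated with a minimal sketch using only the Python standard library. The sample records (names, ages, cities) are made up for illustration and are not taken from the lecture data set.

```python
import csv
import io
import json

# Hypothetical sample data: attribute names in the first CSV row, values below it.
csv_text = "name,age,city\nAsha,29,Pune\nRavi,34,Delhi\n"

# Hypothetical JSON data: the second record has different attributes from the first.
json_text = (
    '[{"name": "Asha", "age": 29, "city": "Pune"},'
    ' {"name": "Ravi", "age": 34, "phones": ["98", "99"]}]'
)

# csv.DictReader uses the first row as the attribute names for every later row.
csv_records = list(csv.DictReader(io.StringIO(csv_text)))

# json.loads turns each JSON object into a dict of attribute/value pairs.
json_records = json.loads(json_text)

print(csv_records[0])
print(sorted(json_records[1].keys()))
```

Note that every CSV row shares the attributes of the header row, whereas each JSON record may carry its own set of attributes.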
NoSQL
SQL databases used to be the primary means for almost all organisations to store their data and
build applications on top of it. However, the inherent weaknesses in the way these databases work
and the huge explosion of unstructured and semi-structured data due to the rise of the Internet led
to an acute need for a new type of database technology. This is where NoSQL, or ‘Not Only SQL’,
databases emerged and filled that vacuum.
Why NoSQL?
• Necessity of joins - In order to obtain information about a particular record, you need to
join multiple tables to get the relevant data. As the number of tables increases, a join
becomes a computationally expensive operation and can, therefore, slow down your
systems.
• Lack of a flexible schema - SQL databases require every field in the schema to be
specified for each record, even if a particular record does not need it.
NoSQL databases, on the other hand, store documents in a denormalized structure and also have
a flexible schema. For example, in the document below, all the information pertaining to a
specific Order_ID is in a single place. You don’t need to perform joins to obtain the relevant data
from multiple tables.
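To make the contrast concrete, here is a minimal sketch in plain Python. The customer, order and item values are hypothetical, and the dictionaries only imitate tables and documents; no database is involved.

```python
# Normalised layout: three "tables" that must be joined on their keys.
customers = {1: {"name": "Asha", "city": "Pune"}}
orders = {1001: {"customer_id": 1}}
order_items = [
    {"order_id": 1001, "product": "Keyboard", "qty": 1, "price": 1500},
    {"order_id": 1001, "product": "Mouse", "qty": 2, "price": 400},
]

def join_order(order_id):
    """Rebuild one order by looking up the related rows in each table."""
    customer = customers[orders[order_id]["customer_id"]]
    items = [
        {key: value for key, value in row.items() if key != "order_id"}
        for row in order_items
        if row["order_id"] == order_id
    ]
    return {"Order_ID": order_id, "customer": customer, "items": items}

# Denormalised layout: everything about Order_ID 1001 lives in one document.
order_document = {
    "Order_ID": 1001,
    "customer": {"name": "Asha", "city": "Pune"},
    "items": [
        {"product": "Keyboard", "qty": 1, "price": 1500},
        {"product": "Mouse", "qty": 2, "price": 400},
    ],
}

def order_total(doc):
    """Sum qty * price over the items embedded in one order document."""
    return sum(item["qty"] * item["price"] for item in doc["items"])

print(order_total(order_document))
```

Reading the document needs a single lookup, while the normalised layout needs one lookup per table before the same information is available.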
Schema - SQL databases store information in the form of tables with a fixed schema, whereas
NoSQL databases can follow a dynamic schema.
Scaling - SQL databases are generally scaled vertically, by increasing the CPU and RAM in order
to increase the power of the existing database. NoSQL databases are more suitable for horizontal
scaling, where you keep on adding more servers to improve the performance.
Structure - SQL databases are designed to handle advanced querying requirements, whereas
NoSQL databases are utilised to handle complex databases and scale them as per requirement.
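The schema difference described above can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a SQL database and a list of dictionaries as a stand-in for a document collection, and the table and field names are hypothetical.

```python
import sqlite3

# Fixed schema: every row has the same columns, so a missing value is stored as NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT, phone TEXT)")
conn.execute(
    "INSERT INTO users VALUES (?, ?, ?)", ("Asha", "asha@example.com", None)
)

try:
    # A field that is not part of the table definition is rejected.
    conn.execute(
        "INSERT INTO users (name, twitter) VALUES (?, ?)", ("Ravi", "@ravi")
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

sql_row = conn.execute("SELECT name, email, phone FROM users").fetchone()
conn.close()

# Dynamic schema: each document carries only the fields it actually needs.
documents = [
    {"name": "Asha", "email": "asha@example.com"},
    {"name": "Ravi", "twitter": "@ravi"},
]

print(sql_row, rejected)
print([sorted(doc) for doc in documents])
```

Adding the new field to the SQL table would require changing the table definition first, whereas the document list accepts it without any change.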
BASE Property
The BASE model is an alternative to the ACID properties of relational databases. It provides
more flexibility because it does not require strict adherence to the relational model.
The BASE model accommodates NoSQL’s flexibility. There are three principles under the BASE
property:
1. Basically available - This means that the system is available most of the time, but without
any guarantee of consistency.
2. Soft state - This means that even without an input query, the state of the system may
change over time. This happens because a NoSQL database keeps on trying to make
the system consistent and available by synchronising with other systems.
3. Eventual consistency - This means that the system would eventually become consistent
once it stops receiving input. So, if we wait long enough for any given input, then we
will get consistent reads.
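The three principles above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions: the Replica and Cluster classes are hypothetical, replication is triggered by hand rather than by a background process, and no real NoSQL database behaves exactly this way.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Toy cluster: a write lands on one node and is propagated later."""

    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = []  # updates not yet applied to every replica

    def write(self, key, value):
        # The write is accepted immediately (basically available) ...
        self.replicas[0].data[key] = value
        # ... but the other replicas only learn about it later.
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # Any replica answers, even if its copy is out of date.
        return self.replicas[replica_index].data.get(key)

    def sync(self):
        # State changes here without any new input from clients (soft state).
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self.pending.clear()


cluster = Cluster(3)
cluster.write("stock", 5)
stale = cluster.read("stock", 2)   # the replica has not seen the write yet
cluster.sync()
fresh = cluster.read("stock", 2)   # after synchronisation, reads are consistent
print(stale, fresh)
```

The read before `sync()` returns an outdated answer, and the read after it returns the written value, which is the behaviour eventual consistency describes.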
Key–value stores - Data is stored as a key along with its value. A pointer and a unique identifier
are associated with every data element. Arbitrary strings are used as keys, and the value could be a
document or an image. Key–value data stores have large hash tables, which contain the keys and
the values. A popular key–value NoSQL datastore is Redis; Cassandra is often listed here as well,
although it is more precisely a wide-column store.
Document-based stores - Each record and all the associated data are stored within a document.
The documents stored are made up of tagged elements. Some examples of document-based data
stores include CouchDB and MongoDB.
Graph-based stores - These are network databases that use nodes and edges to represent and
store data. A popular example is Neo4j.
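The three data models above can be sketched with plain Python structures. The keys, documents, nodes and relation names below are hypothetical, and the snippet only imitates how each kind of store organises data; it does not use Redis, MongoDB or Neo4j.

```python
# Key-value store: arbitrary string keys mapped to opaque values in a hash table.
kv_store = {}
kv_store["user:1:name"] = "Asha"
kv_store["user:1:avatar"] = b"raw image bytes"

# Document store: each record and its associated data live in one document
# made up of tagged (named) elements.
doc_store = {
    "u1": {"name": "Asha", "orders": [{"id": 1001, "total": 2300}]},
    "u2": {"name": "Ravi", "city": "Delhi"},
}

# Graph store: nodes carry the entities, edges carry the relationships.
nodes = {
    "asha": {"label": "Person"},
    "ravi": {"label": "Person"},
    "mongodb": {"label": "Topic"},
}
edges = [
    ("asha", "FOLLOWS", "ravi"),
    ("ravi", "LIKES", "mongodb"),
]

def neighbours(node, relation):
    """Return the nodes reached from `node` along edges of the given relation."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

print(kv_store["user:1:name"])
print(doc_store["u1"]["orders"][0]["total"])
print(neighbours("asha", "FOLLOWS"))
```

The key-value lookup knows nothing about the inside of its value, the document lookup can reach into nested fields, and the graph lookup follows relationships between records.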
Here are some use cases where NoSQL databases outperform traditional databases: