Lecture Notes

Hands-on With NoSQL - MongoDB

The following topics were covered in this module:

• Introduction to NoSQL database


o Challenges associated with RDBMS
o Types of data
o Overview of NoSQL databases
o Difference between a NoSQL database and an SQL database
o BASE property of NoSQL Database
o Different types of NoSQL databases and their use cases

• Introduction to MongoDB
o Introduction to MongoDB
o Setting up MongoDB on your system
o Performing CRUD operations on databases
o Advanced queries and their use on a JSON data set
o Comparison operators
o Logical operators
o Aggregation pipeline in MongoDB

• Indexing in MongoDB
o What is indexing?
o Different types of indexes
o How to create indexes
o Best indexing practices to improve query performance

• MongoDB ETL Use-Case


o MongoDB tools and versions
o MongoDB Use-Case
▪ Architecture
▪ Steps

Powered by upGrad Education Private Limited


© Copyright. UpGrad Education Pvt. Ltd. All rights reserved
Introduction to NoSQL Database

Challenges from the Internet


• The internet grew at a very high speed in the late 1990s.
• Social media networks like Facebook, WhatsApp and Twitter, which are used for
communication, are generating tremendous amounts of data.
• As the internet became more accessible, the number of users grew, leading to an exponential
growth of data.
• Generating such huge amounts of data has created two main challenges:
1. Rapid growth of unstructured data
2. Technological issues in handling large amounts of data

The following facts illustrate the rapid growth of unstructured data:
• 48 hours of video are uploaded to YouTube every minute.
• More than 50 websites are launched every minute.
• About 90% of all the data generated so far was created in the last two years.
• Data generation in 2020 was 44 times that of 2009.
• Different brands and organisations get more than 35,000 likes per hour on Facebook.
• More than 100 TB of data is uploaded to Facebook daily.
• Twitter saw roughly 300 million tweets every day and had more than 600 million accounts.
• Billions of pieces of content are shared through Facebook every month.
• Zettabytes of data are generated each year (1 zettabyte = 10^21 bytes).

The data generated in the era of the internet are classified as follows:
1. Structured data
2. Semi-structured data
3. Unstructured data

1. Structured Data
• Structured data has a fixed structure and can be arranged in rows and columns.

• Examples include data stored in SQL databases, Excel sheets and online transaction
processing (OLTP) systems.
• Structured data is stored in a relational model on the basis of relational keys. These keys
relate records across multiple tables, so mapping data onto a predefined schema is
straightforward. Using the relational model, you can easily perform all database
management operations, such as insert, update, delete and retrieve, on structured data.
• Database software like Oracle, DB2 and MySQL are used to handle structured data.
• Structured data can also be stored in spreadsheets and OLTP systems.
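
As a minimal sketch of how relational keys relate multiple tables, the following Python snippet uses the standard sqlite3 module with an in-memory database. The table names, column names and values are hypothetical and are not taken from the lecture.

```python
import sqlite3

# In-memory database; all table and column names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ram'), (2, 'Anita');
    INSERT INTO orders VALUES (101, 1, 250.0), (102, 1, 99.5), (103, 2, 40.0);
""")

# The key customer_id relates each order to its customer through a join.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

Each result row combines columns from both tables, which is only possible because every order carries the key of its customer.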

2. Semi-Structured Data
• Semi-structured data does not have a fixed structure. So, it cannot be accommodated in a
relational model.
• However, semi-structured data can be organised into attributes and values.
• Semi-structured data can be stored in CSV files, Extensible Markup Language (XML) and
JavaScript Object Notation (JSON) documents.

Sources of semi-structured data:


1. Human-generated semi-structured data.
• Text data generated in the form of PowerPoint presentations, word processing files, etc.
• Regular social media updates
• Online chats and messages

2. Machine-generated semi-structured data


• Sensory data generated from traffic, oceanography, weather reports, etc.

Problems encountered while analysing semi-structured data:


• Semi-structured data follows an irregular structure. So, it is difficult to spot the relationship
between the data.
• Schema and data are usually tightly coupled and, hence, difficult to put in an ideal structure
like tables or relations.

JavaScript object notation (JSON)


• We do not need to predefine a rigid schema while dealing with JSON.
• It is a semi-structured data model that meets our need to handle semi-structured data.
• JSON has gained wide support from many programming languages.
• JSON is a file format that is used to store and interchange data.
• In JSON, the data is stored in a set of key-value pairs.
• For example:
{
"name": "Ram",
"age": 25
}
• The example can also be written as follows:
{ "name": "Ram", "age": 25 }
• Here, name and age are the keys, and Ram and 25 are their respective values.
• The following JSON example defines a students object that holds an array of four student records:
{"students":[
{ "firstName":"Ram", "lastName":"Joshi" },
{ "firstName":"Anita", "lastName":"Singh" },
{ "firstName":"Geeta", "lastName":"Tripathi" },
{ "firstName":"Shweta", "lastName":"Shah" }
]}
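
As a minimal sketch, the students document above can be read with Python's standard json module. The variable names are illustrative only.

```python
import json

# The students document from the example above, as a JSON string.
text = """
{"students":[
  { "firstName":"Ram", "lastName":"Joshi" },
  { "firstName":"Anita", "lastName":"Singh" },
  { "firstName":"Geeta", "lastName":"Tripathi" },
  { "firstName":"Shweta", "lastName":"Shah" }
]}
"""

data = json.loads(text)          # JSON object -> Python dict
students = data["students"]      # JSON array  -> Python list
first_names = [s["firstName"] for s in students]

print(len(students))
print(first_names)
```

No schema is declared anywhere: the keys travel with the values, so a record with an extra or missing key would still parse.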

Extensible Markup Language (XML)


• XML is a tag-based markup language that is used to describe data.
• In XML, the data can have an elaborate and intricate structure that is significantly richer and
more complex than a table of rows and columns.
• For example:

<?xml version="1.0"?>
<contact-info>
<contactNew>
<name>Ram Shah</name>
<company>upGrad</company>
<phone>9876543210</phone>
</contactNew>
</contact-info>
• Here name, company and phone are the tags and Ram Shah, upGrad and 9876543210 are
their respective values.
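
As a minimal sketch, the contact record above can be read with Python's standard xml.etree.ElementTree module, turning each tag and its value into a key and a value. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The contact document from the example above.
xml_text = """<?xml version="1.0"?>
<contact-info>
  <contactNew>
    <name>Ram Shah</name>
    <company>upGrad</company>
    <phone>9876543210</phone>
  </contactNew>
</contact-info>"""

root = ET.fromstring(xml_text)        # root element: <contact-info>
contact = root.find("contactNew")     # first <contactNew> child

# Map each child tag to its text value.
record = {child.tag: child.text for child in contact}
print(record)
```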

Comma Separated Values (CSV)


• It is a delimited text file format where each line of the file represents a single data record.
• The fields of each record are separated by a comma.
• The information is stored in the form of rows.
• Each row indicates one particular instance.
• For example:

• The first row indicates the attributes, and each attribute is separated by a comma.
• The second and third rows have the attribute values specified in the first row.
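
The example referred to above is not reproduced in these notes. As a stand-in, the following sketch uses a hypothetical three-line CSV file (the attribute names and values are assumptions) and reads it with Python's standard csv module.

```python
import csv
import io

# Hypothetical CSV content: one header row followed by two data rows.
csv_text = "name,age,city\nRam,25,Pune\nAnita,30,Delhi\n"

reader = csv.DictReader(io.StringIO(csv_text))
records = list(reader)            # one dict per data row
attributes = reader.fieldnames    # taken from the first row

print(attributes)
for record in records:
    print(record)
```

Note that CSV carries no type information, so every value, including the age, is read as a string.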

3. Unstructured Data
• The majority of the data generated in the world is unstructured data.
• For example, the data generated in the form of images, audio-clips, video files, likes,
comments, etc., are unstructured data.

Following are the sources of unstructured data:


• Data fetched from satellite surveillance
• Output of scientific research, e.g., data received from nuclear reactor experiments
• Data obtained from traffic surveillance in the form of videos and photos
• Data obtained from RADAR or SONAR, e.g., data generated during vehicular,
meteorological and oceanographic seismic profile studies
• Data generated internally in a company in the form of official documents, reports, survey
logs, and email conversations
• Data generated in social media using Twitter, WhatsApp, Facebook, etc.
• Data generated in the form of chats, SMS or location information from a mobile phone
• Data generated by websites like YouTube, Amazon and MakeMyTrip, which produce a lot of
information based on our interests

Percentage distribution of types of data:

• 80% of data falls under the category of unstructured data.


• 10% of data falls under the category of structured data.
• 10% of data falls under the category of semi-structured data.

• It is important to note that traditional databases can only process structured data which is
merely 10% of all data.
• Since 80% of the data is unstructured, there is a huge demand for database experts
who can process and analyse this data.
• The internet boom has produced new challenges and opportunities in terms of technologies
and skills.

NoSQL

SQL databases used to be the primary means for almost all organisations to store their data and
build applications on top of it. However, the inherent weaknesses in the way these databases work
and the huge explosion of unstructured and semi-structured data due to the rise of the Internet led
to an acute need for a new type of database technology. This is where NoSQL, or ‘Not Only SQL’,
databases emerged and filled that vacuum.

Why NoSQL?

SQL databases have the following two major challenges:

• Necessity of joins - In order to obtain information about a particular record, you need to
join multiple tables to get the relevant data. As the number of tables increases, a join
becomes a computationally expensive operation and can, therefore, slow down your
systems.

• Lack of a flexible schema - SQL databases require every record in a table to follow the
same predefined set of fields, even if you don’t need some of them for a particular record.

NoSQL databases, on the other hand, store documents in a denormalized structure and also have
a flexible schema. For example, in the document below, all the information pertaining to a
specific Order_ID is in a single place. You don’t need to perform joins to obtain the relevant data
from multiple tables.
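
The figure this paragraph refers to is not reproduced in these notes. As a stand-in, here is a hypothetical denormalised order document written as a Python dictionary. Apart from Order_ID, every field name and value is an assumption made for illustration.

```python
import json

# Hypothetical order document: customer and item details are embedded,
# so everything about the order lives in one place.
order = {
    "Order_ID": 1001,
    "customer": {"name": "Ram", "city": "Pune"},
    "items": [
        {"product": "Keyboard", "quantity": 1, "price": 1500},
        {"product": "Mouse", "quantity": 2, "price": 400},
    ],
    # Flexible schema: an optional field that other orders may simply omit.
    "gift_note": "Happy birthday!",
}

# The order total is computed from the single document, with no joins.
total = sum(item["quantity"] * item["price"] for item in order["items"])

print(json.dumps(order, indent=2))
print(total)
```

In a normalised relational design the same data would typically be split across customer, order and order-item tables and reassembled with joins.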

SQL vs NoSQL

Here are the main differences between SQL and NoSQL:

Schema - SQL databases store information in the form of tables with a fixed schema, whereas
NoSQL databases can follow a dynamic schema.

Scaling - SQL databases are generally scaled vertically, by increasing the CPU and RAM in order
to increase the power of the existing database. NoSQL databases are more suitable for horizontal
scaling, where you keep on adding more servers to improve the performance.
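
One common way to spread data over several servers is to hash each record's key and use the hash to pick a server. The sketch below illustrates that idea only; it is not how any particular NoSQL database does it, and the server names and key format are hypothetical.

```python
import hashlib
from collections import Counter


def shard_for(key, servers):
    """Pick a server for a key by hashing it (naive modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return servers[int(digest, 16) % len(servers)]


servers = ["server-0", "server-1", "server-2"]   # hypothetical server names
keys = ["user:%d" % i for i in range(1000)]      # hypothetical record keys

# Count how many keys land on each server.
load = Counter(shard_for(key, servers) for key in keys)
print(load)
```

With this naive modulo scheme, adding a server changes the target of most keys, which is why production systems tend to use techniques such as consistent hashing or range-based partitioning instead.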

Structure - SQL databases are table-based and designed to handle advanced querying
requirements, whereas NoSQL databases store data as key–value pairs, documents, columns or
graphs and are used to handle large, varied data sets and scale them as required.

BASE Property
The BASE model is an alternative to the ACID properties of relational databases. It provides
more flexibility because it relaxes the strict consistency guarantees that ACID requires.

The BASE model accommodates NoSQL’s flexibility. There are three principles under the BASE
property:

1. Basically available - This means that the system is available most of the time, but
without any guarantee of consistency.
2. Soft state - This means that even without an input query, the state of the system may
change over time. This happens because a NoSQL database keeps on trying to make
the system consistent and available by synchronising with other systems.
3. Eventual consistency - This means that the system would eventually become consistent
once it stops receiving input. So, if we wait long enough for any given input, then we
will get consistent reads.
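
The three principles can be illustrated with a toy simulation, assuming a store with three replicas where a write is accepted by one replica immediately and copied to the others later. The class and method names are hypothetical, and real databases use far more elaborate replication protocols.

```python
class EventuallyConsistentStore:
    """Toy model: writes reach replica 0 at once and the rest on sync()."""

    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # Basically available: the write is accepted without waiting
        # for every replica to acknowledge it.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica_index):
        return self.replicas[replica_index].get(key)

    def sync(self):
        # Soft state: replica contents change here without any new input.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("stock", 10)

stale = store.read("stock", replica_index=2)   # replica 2 not updated yet
store.sync()                                   # background synchronisation
fresh = store.read("stock", replica_index=2)   # now consistent

print(stale, fresh)
```

Before sync() runs, a read from replica 2 misses the new value; once the queued updates are applied and no further writes arrive, every replica returns the same answer.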

Types of NoSQL Databases and Use Cases


There are four basic types of NoSQL databases. These are as follows:

Key–value stores - Data is stored as a key along with its value. A pointer and a unique identifier
are associated with every data element. Arbitrary strings are used as keys and the value could be a
document or an image. Key–value data stores have large hash tables, which contain the keys and
the values. Popular key–value datastores include Redis and Amazon DynamoDB. (Cassandra is
sometimes listed here as well, although it is usually classified as a column-based store.)

Document-based stores - Each record and all the associated data are stored within a document.
The documents stored are made up of tagged elements. Some examples of document-based data
stores include CouchDB and MongoDB.

Column-based stores - Data is stored by column rather than by row. Each storage block contains data from
one column only. HBase and Hypertable are popular examples of column-based datastores.
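
To make the difference between the two layouts concrete, the sketch below holds the same hypothetical records first row by row and then column by column. It only illustrates the layout and says nothing about how HBase or Hypertable organise storage internally.

```python
# Row-oriented layout: one dict per record (hypothetical data).
rows = [
    {"id": 1, "name": "Ram", "age": 25},
    {"id": 2, "name": "Anita", "age": 30},
    {"id": 3, "name": "Geeta", "age": 29},
]

# Column-oriented layout: one list per column.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate over one attribute touches only that column's values.
average_age = sum(columns["age"]) / len(columns["age"])

print(columns)
print(average_age)
```

Reading a single attribute for many records is where the column layout helps, because the values sit together instead of being scattered across rows.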

Graph-based stores - These are network databases that use edges and nodes to represent and store data. A
popular example is Neo4j.
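
As a minimal sketch of the node-and-edge model, the following Python snippet stores a small hypothetical friendship network as an adjacency map and counts the hops between two people with a breadth-first search. It does not use Neo4j or its query language.

```python
from collections import deque

# Hypothetical edges: each pair is a friendship between two nodes.
edges = [("Ram", "Anita"), ("Anita", "Geeta"), ("Geeta", "Shweta")]

# Adjacency map: node -> set of neighbouring nodes (undirected graph).
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def hops(graph, start, goal):
    """Return the number of edges on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


print(hops(graph, "Ram", "Shweta"))
```

Relationship questions like this one follow edges directly, whereas a relational design would need a self-join for every hop.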

Here are some use cases where NoSQL databases outperform traditional databases:

● Storage of real-time big data
● Internet of Things (IoT)
● Customer 360° View
● Semi-structured data

Disclaimer: All content and material on the upGrad website is copyrighted material, either belonging to upGrad or its
bonafide contributors and is purely for the dissemination of education. You are permitted to access print and download
extracts from this site purely for your own education only and on the following basis:

• You can download this document from the website for self-use only.
• Any copies of this document, in part or full, saved to disc or to any other storage medium may only be used
for subsequent, self-viewing purposes or to print an individual extract or copy for non-commercial personal
use only.
• Any further dissemination, distribution, reproduction, copying of the content of the document herein or the
uploading thereof on other websites or use of the content for any other commercial/unauthorised purposes in
any way which could infringe the intellectual property rights of upGrad or its contributors, is strictly
prohibited.
• No graphics, images or photographs from any accompanying text in this document will be used separately for
unauthorised purposes.
• No material in this document will be modified, adapted or altered in any way.
• No part of this document or upGrad content may be reproduced or stored in any other web site or included in
any public or private electronic retrieval system or service without upGrad’s prior written permission.
• Any rights not expressly granted in these terms are reserved.
