Module 1 and NoSQL

Uploaded by

ATHARVA THAKUR


Big Data

• Big data is an evolving term that describes any voluminous amount of
structured, semi-structured and unstructured data that has the potential to
be mined for information.

• Such data cannot be processed or analyzed using traditional processes or tools.

By Santosh Tamboli Sir...

http://www.youtube.com/@santoshtamboli
Characteristics

Volume
• Volume describes the size of the data relative to the processing capability.

• The scale in question is not terabytes but zettabytes or yottabytes.

Overcoming the volume issue:

• Two options exist today: Apache Hadoop based solutions and massively
parallel processing databases such as CalPont, EMC GreenPlum, EXASOL,
HP Vertica, IBM Netezza, Kognitio, ParAccel, and Teradata Kickfire.
Velocity
• Velocity describes the frequency at which data is generated, captured, and
shared.

• It affects the ability to parse text, detect sentiment, and identify new
patterns.

• Key technologies that address velocity include stream processing and
complex event processing.

• NoSQL databases are used when relational approaches no longer make sense.
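
To make the idea of stream processing concrete, here is a minimal Python sketch that counts events inside a sliding time window as they arrive, rather than storing everything first and analyzing later. The class name `SlidingWindowCounter` and the window length are illustrative assumptions, not part of any particular streaming product.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts the events seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event times, oldest on the left

    def add(self, timestamp: float) -> int:
        """Record one event and return how many events are still in the window."""
        self._timestamps.append(timestamp)
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)
```

Each call to `add` does only a small amount of work per event, which is what lets this style of processing keep up with data that arrives at high frequency.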
Variety
• Data from social, machine-to-machine, and mobile sources adds new
data types to traditional transactional data.

• New types include content, geo-spatial, hardware data points, location-based, log
data, machine data, metrics, mobile, physical data points, process, RFID, search,
sentiment, streaming data, social, text, and web.

• The addition of unstructured data such as speech, text, image, and video increasingly
complicates the ability to categorize data.

• Some technologies that deal with unstructured data include data mining, text
analytics, and noisy text analytics.

Value

• Today data is being produced in large volumes, but just collecting it is of no
use. Instead, we have to look for data from which business insights can be
generated, which is what adds “value” to the company.

• This is where big data analytics comes into the picture. Some companies have
invested in data and data storage infrastructure but fail to understand that
aggregating data does not by itself add value.

• Data analytics helps to derive useful insights from the collected data. These
insights, in turn, add value to the decision-making process.

Validity / Veracity

• The validity and veracity of big data can be described as the assurance of quality
or credibility of the collected data.
• Since big data is vast and involves so many data sources, it is possible that
not all the collected data is accurate and of good quality.
• Questions such as “Can you trust the data that you have collected?” and “Is the
data reliable enough?” need to be addressed.
• Hence, when processing big data sets, it is important to check the validity of the
data before proceeding with further analysis.
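
A validity check before analysis can be sketched in Python as below. The field names (`name`, `age`) and the accepted age range are assumptions chosen purely for illustration; real rules depend on the data set at hand.

```python
def validate_record(record: dict) -> list:
    """Return the problems found in one record (an empty list means valid)."""
    problems = []
    # Assumed rule: the name must be a non-empty string.
    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("missing name")
    # Assumed rule: the age must be an integer between 0 and 120.
    age = record.get("age")
    if not isinstance(age, int) or isinstance(age, bool) or not 0 <= age <= 120:
        problems.append("age out of range")
    return problems


def split_valid(records):
    """Separate records that pass validation from those that do not."""
    valid, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            valid.append(record)
    return valid, rejected
```

Keeping the rejected records together with the reasons for rejection, instead of silently dropping them, makes it possible to judge how trustworthy a source is.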

Types of Big Data
Structured and Unstructured

Structured data
• Structured data refers to data that has a defined length and format.

Types of Structured data
Machine generated:
• i. Sensor data: Examples include radio frequency ID (RFID) tags, smart
meters, medical devices, and Global Positioning System (GPS) data.

• ii. Web log data: When servers, applications, networks, and so on operate,
they capture all kinds of data about their activity.

• iii. Point-of-sale data: Generated when the cashier scans the bar code of a
product that you are purchasing.

• iv. Financial data: Fields such as the company symbol and dollar value.

Human generated data

• Data generated by humans interacting with computers.

Types:
• i. Input data: Data that a human might input into a computer, such as
name, age, income, and non-free-form survey responses.

• ii. Click-stream data: Data generated every time you click a link on a
website.

• iii. Gaming-related data: Every move you make in a game can be recorded.
Unstructured data
• Unstructured data does not follow any predefined format.
Types:

a. Machine generated:

• i. Satellite images: Includes weather data or the data that the government captures in its satellite
surveillance imagery.

• ii. Scientific data: Includes seismic imagery, atmospheric data and high energy physics.

• iii. Photographs and video: Includes security, surveillance, and traffic video.

• iv. Radar or sonar data: Includes vehicular, meteorological, and oceanographic data.

b. Human generated:
Types:
i. Text internal to your company: All the text within documents, logs,
survey results, and e-mails.
ii. Social media data: Data generated on social media platforms such as
YouTube, Facebook, Twitter, LinkedIn, and Flickr.
iii. Mobile data: This includes data such as text messages and location
information.
iv. Website content: This comes from any site delivering unstructured
content, like YouTube, Flickr, or Instagram.
Traditional Vs Big data approach

Big Data challenges
• Dealing with data growth
• Shortage of skilled people
• Recruiting and retaining big data talent
• Collecting and integrating massive and diverse datasets
• Validating data
• Maintaining data integrity, security, and privacy
• Picking the right NoSQL tools
• Real-time processing can be complex

Applications of Big data
• Education industry
• Healthcare industry
• Government sector
• Media and entertainment industry
• Weather patterns
• Transportation industry
• Banking sector

What is NoSQL
• NoSQL is a set of concepts that allows the rapid and efficient
processing of data sets with a focus on performance, reliability, and
agility.
• It’s more than rows in tables: NoSQL systems store and retrieve data
from many formats, including key-value stores, graph databases, column-family
(Bigtable) stores, document stores, and even rows in tables.
• It’s free of joins: NoSQL systems allow you to extract your data using
simple interfaces without joins.
• It’s schema-free: NoSQL systems allow you to drag-and-drop your
data into a folder and then query it without creating an entity-relational
model.
• It works on many processors: NoSQL systems allow you to store
your database on multiple processors and maintain high-speed
performance.
• It uses shared-nothing commodity computers: Most (but not all)
NoSQL systems leverage low-cost commodity processors that have
separate RAM and disk.
• It supports linear scalability: When you add more processors, you
get a consistent increase in performance.
• It’s innovative: NoSQL offers alternatives to a single way of storing,
retrieving, and manipulating data.
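
One common way to spread data over shared-nothing nodes is to hash each key and use the hash to pick a node. The Python sketch below illustrates that idea under simplifying assumptions: `ShardedStore` and `node_for_key` are hypothetical names, each "node" is just a private in-memory dictionary standing in for a machine with its own RAM and disk, and the simple modulo scheme would move most keys if the node count changed (real systems often use consistent hashing to limit that).

```python
import hashlib


def node_for_key(key: str, node_count: int) -> int:
    """Map a key to one of `node_count` nodes using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class ShardedStore:
    """Toy shared-nothing store: every node owns a disjoint part of the keys."""

    def __init__(self, node_count: int) -> None:
        # One private dict per node; nodes never share state.
        self.nodes = [{} for _ in range(node_count)]

    def put(self, key: str, value) -> None:
        self.nodes[node_for_key(key, len(self.nodes))][key] = value

    def get(self, key: str):
        return self.nodes[node_for_key(key, len(self.nodes))].get(key)
```

Because every key lives on exactly one node and a lookup touches only that node, adding nodes spreads both the data and the work, which is the intuition behind linear scalability.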
NoSQL Data Architecture Patterns
• Key-value stores
• Graph stores
• Column family stores
• Document stores

Key-value stores

• A key-value store is a simple database that, when presented with a
simple string (the key), returns an arbitrarily large BLOB of data (the
value).

• Key-value stores have no query language; they provide a way to add
key-value pairs to a database and remove them from it.

• A key-value store is like a dictionary. A dictionary has a list of words,
and each word has one or more definitions.
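
The interface described above can be sketched in a few lines of Python. This is an in-memory toy, and the method names `put`, `get`, and `delete` are assumptions for illustration; actual products expose their own command names.

```python
class KeyValueStore:
    """Toy key-value store: string keys mapped to opaque byte values."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        """Add a pair, replacing any value already stored under the key."""
        self._data[key] = value

    def get(self, key: str):
        """Return the value for the key, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: str) -> None:
        """Remove the pair if it exists; do nothing otherwise."""
        self._data.pop(key, None)
```

The store never looks inside the value, so there is nothing to query except the key itself. That restriction is what keeps the interface this small.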

Graph stores

• Graph stores are important in applications that need to analyze
relationships between objects or visit all nodes in a graph in a
particular manner.

• Graph stores are highly optimized to efficiently store graph nodes and
links, and they allow you to query these graphs.

• Graph databases are useful for any business problem that has
complex relationships between objects, such as social networking,
rule-based engines, and creating mashups.

• A graph store is a system that contains a sequence of nodes and
relationships that together form a graph.
• In a key-value store there are two data fields: the key and the value. In
contrast, a graph store has three data fields: nodes, relationships, and
properties.
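
The three fields (nodes, relationships, properties) and the idea of visiting nodes in a particular manner can be illustrated with a small in-memory sketch in Python. The class `GraphStore`, its method names, and the `FOLLOWS` relationship type in the usage note are hypothetical; real graph databases provide their own query languages.

```python
from collections import deque


class GraphStore:
    """Toy graph store holding nodes, relationships, and properties."""

    def __init__(self) -> None:
        self.nodes = {}          # node id -> properties dict
        self.relationships = []  # (source, type, target, properties)

    def add_node(self, node_id, **properties) -> None:
        self.nodes[node_id] = properties

    def add_relationship(self, source, rel_type, target, **properties) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking them")
        self.relationships.append((source, rel_type, target, properties))

    def neighbors(self, node_id, rel_type):
        """Nodes directly reachable from `node_id` over `rel_type`."""
        return [t for s, r, t, _ in self.relationships
                if s == node_id and r == rel_type]

    def reachable(self, start, rel_type):
        """Breadth-first traversal: nodes reachable from `start`, in visit order."""
        seen = {start}
        order = []
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in self.neighbors(current, rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order
```

For example, after adding nodes `alice`, `bob`, and `carol` and linking alice to bob and bob to carol with `FOLLOWS`, `reachable("alice", "FOLLOWS")` walks the chain of followers. The linear scan in `neighbors` is only for brevity; a real graph store indexes relationships so this lookup stays fast.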

Column family (Bigtable) stores

• Column family stores are important NoSQL data architecture patterns because
they can scale to manage large volumes of data.

• In the MapReduce framework, the map operation has a master node which
breaks up an operation into subparts and distributes each subpart to
another node for processing, and reduce is the process where the master
node collects the results from the other nodes and combines them into the
answer to the original problem.

• Column family stores use row and column identifiers as general-purpose
keys for data lookup. They’re sometimes referred to as data stores rather
than databases.
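
The map and reduce steps described above can be sketched with a word count, the usual introductory example. The Python version below runs in one process: each text chunk stands in for the subpart that would be sent to a separate node, and the function names are illustrative assumptions, not the API of Hadoop or any other framework.

```python
from collections import defaultdict


def map_phase(chunk: str):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the intermediate pairs by key so each word is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values collected for each word into a total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    intermediate = []
    # On a cluster, each chunk would be mapped on a different node.
    for chunk in chunks:
        intermediate.extend(map_phase(chunk))
    return reduce_phase(shuffle(intermediate))
```

Calling `word_count(["big data big", "data store"])` counts the words across both chunks. Because each chunk is mapped independently, the map work can be spread over as many nodes as there are chunks.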

• HBase, Hypertable and Cassandra are good examples of systems that
have Bigtable-like interfaces.
• MonetDB, Sybase IQ and Vertica are examples of column-store
systems.
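
Using row and column identifiers as lookup keys can be pictured as a nested map, as in the Python sketch below. `ColumnFamilyStore` and its methods are hypothetical names for illustration only, not the client API of HBase or Cassandra, and the sketch leaves out the timestamp that Bigtable-style systems also keep as part of the cell key.

```python
class ColumnFamilyStore:
    """Toy column family store: row key -> column family -> column -> value."""

    def __init__(self) -> None:
        self._rows = {}

    def put(self, row_key: str, family: str, column: str, value) -> None:
        """Store one cell, creating the row and family on first use."""
        self._rows.setdefault(row_key, {}).setdefault(family, {})[column] = value

    def get(self, row_key: str, family: str, column: str):
        """Return one cell, or None if the row, family, or column is absent."""
        return self._rows.get(row_key, {}).get(family, {}).get(column)

    def get_family(self, row_key: str, family: str) -> dict:
        """Return a copy of all columns of one family for one row."""
        return dict(self._rows.get(row_key, {}).get(family, {}))
```

Rows do not need to share the same set of columns, so a sparse row simply has fewer entries in its inner dictionaries.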

Document stores
• In key-value stores and Bigtable stores, the values lack a formal structure and
aren’t indexed or searchable.

• Document stores work in the opposite manner: the key may be a simple ID,
but you can get almost any item out of a document store by querying any
value or content within the document.

• A consequence of using a document store is that everything inside a document
is automatically indexed when a new document is added.

• Document stores can tell not only that your search item is in the
document but also the search item’s exact location by using the
document path as shown below:

• Document trees have a single root element. Beneath the root
element there is a sequence of branches, sub-branches and values.

• Each branch has a related path expression that shows you how to
navigate from the root of the tree to any given branch, sub-branch or
value.
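
The link between a document tree and path expressions can be sketched in Python by walking a nested structure and reporting the path of every matching value. The function name `find_paths`, the sample document, and the slash-and-index path notation are assumptions for illustration; document stores define their own path syntax (XPath for XML, for instance).

```python
def find_paths(document, target, path=""):
    """Return the path of every value equal to `target` in a nested document."""
    matches = []
    if isinstance(document, dict):
        # A branch: descend into each named sub-branch.
        for key, child in document.items():
            matches.extend(find_paths(child, target, f"{path}/{key}"))
    elif isinstance(document, list):
        # A repeated element: descend into each position.
        for index, child in enumerate(document):
            matches.extend(find_paths(child, target, f"{path}[{index}]"))
    elif document == target:
        # A value: record where it was found.
        matches.append(path)
    return matches


# Sample document with a single root element, "person".
person_doc = {
    "person": {
        "name": "Asha",
        "address": {"city": "Pune"},
        "phones": ["111", "222"],
    }
}
```

Searching `person_doc` for `"Pune"` yields the path from the root down to the `city` value, which tells you where in the document the match sits as well as that it matched.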

CAP theorem

The three letters in CAP refer to three desirable properties of
distributed systems with replicated data:
• consistency (among replicated copies)
• availability (of the system for read and write operations)
• partition tolerance (in the face of the nodes in the system being
partitioned by a network fault)

Consistency –
• Consistency means that the nodes will have the same copies of a
replicated data item visible for various transactions.
• It is a guarantee that every node in a distributed cluster returns the same,
most recent successful write.
• In other words, every client has the same view of the data.

Availability –
• Availability means that each read or write request for a data item will
either be processed successfully or will receive a message that the
operation cannot be completed.
• Every non-failing node returns a response for all read and write
requests in a reasonable amount of time.
• The key word here is “every”: in simple terms, every node must be able
to respond in a reasonable amount of time.

Partition Tolerance –
• Partition tolerance means that the system can continue operating even
if the network connecting the nodes has a fault that results in two or
more partitions, where the nodes in each partition can only
communicate with each other.
• That is, the system continues to function and upholds its
consistency guarantees in spite of network partitions.
• Distributed systems guaranteeing partition tolerance can gracefully
recover once the partition heals.
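
The tension between these properties shows up once a partition occurs. The Python sketch below is a deliberately simplified two-replica model: `TwoNodeCluster`, the `"CP"` and `"AP"` mode labels, and the `heal` method that copies one replica's value over the other are all assumptions made for illustration. A CP-style cluster here rejects writes while partitioned so the replicas never disagree; an AP-style cluster accepts them and lets the replicas diverge until the partition heals.

```python
class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.value = None


class TwoNodeCluster:
    """Toy model of two replicas that may be cut off from each other."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica: Replica, value) -> bool:
        """Write through one replica; return True if the write was accepted."""
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                return False       # reject: the write cannot be replicated
            replica.value = value  # AP: accept locally, replicas may diverge
            return True
        replica.value = value
        other.value = value        # no partition: replicate immediately
        return True

    def read(self, replica: Replica):
        return replica.value

    def heal(self, winner: Replica) -> None:
        """End the partition and copy the chosen replica's value to both."""
        self.partitioned = False
        self.a.value = winner.value
        self.b.value = winner.value
```

Real systems are more nuanced (for example, the majority side of a partition can often keep accepting writes, and conflict resolution is rarely a simple overwrite), so treat this only as a picture of the trade-off.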

