
Chapter-1

Introduction to Data Analytics


Prepared by: Assistant Professor Manthan Rankaja
Definition of Data Analytics
• Data Analytics involves the use of specialized systems and software to
analyze data and draw insights from it.
• In the era of big data, analytics helps organizations make informed
decisions, predict trends, and understand customer behavior.
Applications of Data Analytics
• Various industries where data analytics is applied: Healthcare
(predicting disease outbreaks), Finance (fraud detection), Retail
(customer segmentation), and many more.
• Real-world examples of data analytics: Netflix’s recommendation
system, credit card fraud detection, etc.
Types of Data Analytics
• Descriptive: Analyzes historical data to understand what has
happened.
• Diagnostic: Digs deeper into data to understand the root cause of the
outcome.
• Predictive: Uses statistical models and forecasting techniques to
predict what is likely to happen.
• Prescriptive: Uses optimization and simulation algorithms to advise
on possible outcomes.
Descriptive Analytics
• Definition: Descriptive Analytics deals with the analysis of historical
data to understand changes that have occurred in a business.
• Use cases: Sales trend analysis, Social media trend analysis.
• Examples: Monthly revenue report, Social media post reach analysis.
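A minimal sketch of a descriptive analysis like the monthly revenue report above, using only Python's standard library; the revenue figures are hypothetical, invented for illustration:

```python
from statistics import mean

# Hypothetical monthly revenue figures (in thousands)
monthly_revenue = {
    "Jan": 120, "Feb": 135, "Mar": 110,
    "Apr": 150, "May": 145, "Jun": 160,
}

# Descriptive analytics: summarize what has already happened
total = sum(monthly_revenue.values())
average = mean(monthly_revenue.values())
best_month = max(monthly_revenue, key=monthly_revenue.get)

print(f"Total revenue: {total}")            # 820
print(f"Average per month: {average:.1f}")  # 136.7
print(f"Best month: {best_month}")          # Jun
```

Note that nothing here predicts or prescribes; descriptive analytics only summarizes historical data.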
Diagnostic Analytics
• Definition: Diagnostic Analytics is a form of advanced analytics that
examines data to answer the question “Why did it happen?”.
• Use cases: Sales decline analysis, Customer churn analysis.
• Examples: Analyzing customer feedback to understand a drop in
product sales, Studying customer behavior data to understand churn.
Predictive Analytics
• Definition: Predictive Analytics uses statistical techniques and
machine learning algorithms to forecast future outcomes from historical data.
• Use cases: Customer lifetime value prediction, Predictive
maintenance.
• Examples: Using past purchase history to predict a customer’s future
purchase, Predicting machine failure using sensor data.
Prescriptive Analytics
• Definition: Prescriptive Analytics goes beyond predicting future
outcomes by also suggesting actions to benefit from the predictions.
• Use cases: Supply chain optimization, Personalized marketing.
• Examples: Optimizing delivery routes in real-time to save costs,
Personalizing marketing messages based on customer behavior
prediction.
Types of Data
• Structured data: Data that is organized and formatted so it’s easily
readable.
• For example, a database of customer information where data is
organized in rows and columns.
• Unstructured data: Data that doesn’t follow a specified format. For
example, emails, social media posts, etc.
• Semi-structured data: A mix of structured and unstructured data. For
example, a document which contains metadata.
Structured Data
• Definition: Structured data is highly organized and formatted so
that it is easily searchable in relational databases.
• Examples:
Customer databases, Excel spreadsheets, etc.
• Advantages:
Easy to enter, store, query, and analyze.
• Disadvantages:
Requires a lot of time and resources to maintain.
Not suitable for complex, interconnected data.
Unstructured Data
• Definition: Unstructured data is not organized in a pre-defined
manner or does not have a pre-defined data model. It is difficult to
process and analyze.
• Examples: Word documents, PDFs, emails, audio files, etc.
• Advantages: Can capture nuanced information. More flexible as it
does not require a predefined schema.
• Disadvantages: Difficult to analyze and process. Requires more
storage space.
Semi-Structured Data
• Definition: Semi-structured data does not conform to a rigid
relational schema but contains tags or markers that label its
elements, placing it between structured and unstructured data.
• Examples: XML files, JSON files, etc.
• Advantages: More flexible than structured data, while still being
easier to analyze than unstructured data.
• Disadvantages: Can be more complex to work with and manage
compared to structured data.
• XML: eXtensible Markup Language
<person>
  <name>John Doe</name>
  <email>john.doe@example.com</email>
  <age>30</age>
</person>
• JSON: JavaScript Object Notation

{
  "person": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30
  }
}
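A short sketch of how both semi-structured formats can be parsed in Python using only the standard library; the records mirror the John Doe examples above:

```python
import json
import xml.etree.ElementTree as ET

# Parse the JSON record
json_text = '{"person": {"name": "John Doe", "email": "john.doe@example.com", "age": 30}}'
person = json.loads(json_text)["person"]
print(person["name"], person["age"])  # John Doe 30

# Parse the equivalent XML record
xml_text = ("<person><name>John Doe</name>"
            "<email>john.doe@example.com</email>"
            "<age>30</age></person>")
root = ET.fromstring(xml_text)
print(root.find("name").text, int(root.find("age").text))  # John Doe 30
```

Note the difference: JSON carries native types (the age parses as an integer), while in XML every value arrives as text and must be converted explicitly.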
Data Sources
• Explanation:
• Data sources are the locations, files, databases, or services where
data comes from.
• Understanding data sources is important as the quality and reliability
of the data can greatly impact the results of data analysis.
Databases
• Explanation: Databases are structured sets of data. They are a
common source of data for analytics.
• Discussion: There are different types of databases,
• such as SQL (relational databases) and
• NoSQL (non-relational databases like MongoDB).
• Examples: Customer information in a SQL database, product
information in a NoSQL database.
Web Data
• Explanation: Web data refers to data that is obtained from the
internet. This can include data scraped from websites, data from
social media platforms, etc.
• Discussion: Different types of web data include text data, user
behavior data, transactional data, etc.
• Examples: Tweets scraped from Twitter for sentiment analysis,
product reviews scraped from e-commerce websites.
Sensor Data
• Explanation: Sensor data is data that is collected by sensors, which
can be anything from temperature sensors to motion sensors.
• Discussion: Different types of sensor data include time series data,
spatial data, etc.
• This data is often used in IoT (Internet of Things) applications.
• Examples: Temperature data from a weather station, accelerometer
data from a smartphone
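Sensor streams are typically noisy time series, so a common first analysis step is smoothing. A small sketch with hypothetical hourly temperature readings:

```python
# Hypothetical temperature readings sampled once per hour (time series data)
temps = [20.1, 20.4, 21.0, 23.5, 22.0, 21.8]

# Smooth the series with a 3-point moving average to reduce sensor noise
window = 3
smoothed = [
    round(sum(temps[i:i + window]) / window, 2)
    for i in range(len(temps) - window + 1)
]
print(smoothed)
```

Each output point averages three consecutive readings, so short spikes (like the 23.5 reading) are damped rather than dominating the trend.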
Data Collection Types
• Primary data collection involves gathering new data directly from the
source,
• while secondary data collection involves using data that already
exists, such as data from existing databases or data collected by
others.
Data Collection Methods
• Explanation: Data collection methods refer to how we obtain data.
• Common methods include surveys, where we ask people for
information;
• experiments, where we observe outcomes under controlled
conditions;
• observations, where we collect data about real-world behavior.
Data Preprocessing
• Definition: Data preprocessing is the process of cleaning and
transforming raw data into an understandable format.
• It’s a crucial step before data analysis or data modeling.
• Overview:
• Preprocessing involves data cleaning (removing noise and
inconsistencies),
• data transformation (normalizing data),
• data integration (combining data from various sources).
Data Cleaning
• Definition: Data cleaning involves handling missing values, removing
duplicates, and treating outliers.
• It ensures the quality of the data and improves the accuracy of the
insights derived from it.
• Discussion: Techniques include imputation for handling missing
values, deduplication for removing duplicate data, and outlier
detection methods for identifying and handling anomalies in the data.
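The three cleaning steps above can be sketched in plain Python; the sensor readings, the mean-imputation choice, and the simple distance-from-mean outlier rule are all illustrative assumptions, not a prescribed method:

```python
from statistics import mean

# Hypothetical readings with duplicates, a missing value, and an outlier
readings = [21.0, 21.5, None, 21.5, 22.0, 98.0, 21.0]

# 1. Deduplication: keep only the first occurrence of each value
deduped = list(dict.fromkeys(readings))

# 2. Imputation: replace missing values with the mean of observed ones
observed = [r for r in deduped if r is not None]
imputed = [r if r is not None else mean(observed) for r in deduped]

# 3. Outlier treatment: drop values far from the mean (naive threshold rule)
m = mean(imputed)
cleaned = [r for r in imputed if abs(r - m) < 30]

print(cleaned)  # the 98.0 outlier is gone, the gap is filled
```

In practice each step needs domain judgment (e.g. whether the outlier is an error or a genuine event); the code only shows the mechanics.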
Data Transformation
• Definition: Data transformation involves changing the format,
structure, or values of data to prepare it for analysis.
• It can involve
• normalization (scaling data to a small, specified range),
• standardization (shifting the distribution of each attribute to have a
mean of zero and a standard deviation of one),
• binning (converting numerical variables into categorical
counterparts).
• Discussion: These techniques help in reducing the complexity of data
and making data compatible for analysis.
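Of the three techniques listed, binning is the simplest to show directly. A sketch that converts numeric ages into categorical groups; the bin edges and labels are invented for illustration:

```python
# Binning: convert a numerical variable (age) into a categorical one
# (the bin edges below are illustrative assumptions)
def bin_age(age):
    if age < 18:
        return "minor"
    elif age < 40:
        return "young adult"
    elif age < 65:
        return "middle-aged"
    return "senior"

ages = [12, 25, 37, 50, 70]
print([bin_age(a) for a in ages])
# ['minor', 'young adult', 'young adult', 'middle-aged', 'senior']
```

Binning trades numeric precision for categories that are easier to report on and to use in some models.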
Normalization
• Normalization involves scaling data to fit within a small, specified
range, typically between 0 and 1. This is useful when you want to
ensure that all features contribute equally to the analysis. The
formula for min-max normalization is:

• x' = (x - min) / (max - min)

• Example: [ 10, 20, 30, 40, 50 ] → [ 0, 0.25, 0.5, 0.75, 1 ]

Standardization
• Standardization transforms data to have a mean of zero and a
standard deviation of one. This is useful when you want to compare
data that have different units or scales. The formula for
standardization (the z-score) is:

• z = (x - μ) / σ

• Example: [ 10, 20, 30, 40, 50 ] → [ -1.41, -0.71, 0, 0.71, 1.41 ]
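The z-score can likewise be computed in a few lines; the example above uses the population standard deviation, which this sketch follows:

```python
from math import sqrt

def standardize(values):
    """Transform values to zero mean and unit standard deviation: (x - mu) / sigma."""
    mu = sum(values) / len(values)
    # Population standard deviation (divide by n, matching the example)
    sigma = sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sigma for v in values]

z = standardize([10, 20, 30, 40, 50])
print([round(v, 2) for v in z])
# [-1.41, -0.71, 0.0, 0.71, 1.41]
```

Here mu = 30 and sigma = sqrt(200) ≈ 14.14, which yields exactly the slide's values after rounding.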


Data Integration
• Definition: Data integration involves combining data from different
sources and providing users with a unified view of the data.
• Discussion: This process becomes significant in a variety of situations,
which include both
• commercial (when two similar companies need to merge their
databases)
• scientific (combining research findings from different bioinformatics
repositories, for example) applications.
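A toy sketch of integration: combining records from two hypothetical sources (a CRM and a billing system, both invented for illustration) into one unified view keyed by customer id:

```python
# Hypothetical records from two separate sources, keyed by customer id
crm = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
billing = {1: {"balance": 250.0}, 2: {"balance": 0.0}, 3: {"balance": 99.0}}

# Unified view: merge fields for every id seen in either source
unified = {}
for cid in sorted(set(crm) | set(billing)):
    record = {"id": cid}
    record.update(crm.get(cid, {}))
    record.update(billing.get(cid, {}))
    unified[cid] = record

print(unified[1])  # {'id': 1, 'name': 'Alice', 'balance': 250.0}
```

Even this toy case surfaces the real issues of integration: customer 3 exists in only one source, so the unified record is incomplete, and real systems must also reconcile conflicting values and mismatched keys.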
Data Analytics Tools
• Data analytics tools are software applications used to process and
analyze data. They help data analysts manage and interpret data from
various sources.
• We will be discussing the features and use cases of popular data
analytics tools like R, Python, and SAS.
SAS
• Introduction to SAS:
SAS (Statistical Analysis System) is a software suite developed by SAS
Institute for advanced analytics, business intelligence, data
management, and predictive analytics.
• Key features and use cases of SAS in data analytics:
SAS provides a graphical point-and-click user interface for non-technical
users and more advanced options through the SAS language.
It is widely used in the corporate world.
R
• Introduction to R:
• R is a programming language and free software environment for
statistical computing and graphics.
It is widely used among statisticians and data miners for developing
statistical software and data analysis.
• Key features and use cases of R in data analytics:
R provides a wide variety of statistical and graphical techniques and is
highly extensible.
It is used in fields like healthcare, finance, academia, etc.
Python
• Python is a high-level, interpreted programming language. It is known
for its simplicity and readability, making it a popular choice for
both beginners and experts in data analytics.
• Python has powerful libraries for data manipulation and analysis like
pandas, NumPy, and SciPy.
• It is used in various domains like web development, machine learning,
AI, and more.
Data Analytics Technologies
• Data analytics technologies refer to the frameworks and systems used
to process and analyze large datasets. They are designed to handle
big data and are essential for advanced analytics.
• Discussion on various technologies such as Hadoop, Spark, etc.: We
will be discussing the features and use cases of popular data analytics
technologies like Hadoop and Spark.
Hadoop
• Hadoop is an open-source software framework for storing data and
running applications on clusters of commodity hardware.
• It provides massive storage for any kind of data, enormous processing
power, and the ability to handle virtually limitless concurrent tasks or
jobs.
• Key features and use cases of Hadoop in data analytics: Hadoop is
known for its scalability, cost-effectiveness, flexibility, and fault
tolerance.
• It is used in various industries like finance, healthcare, media, etc.
Spark
• Introduction to Spark: Spark is an open-source, distributed computing
system used for big data processing and analytics.
• It provides an interface for programming entire clusters with implicit
data parallelism and fault tolerance.
• Key features and use cases of Spark in data analytics: Spark is known
for its speed, ease of use, and versatility.
• It can be used for various tasks like batch processing, real-time data
streaming, machine learning, etc.
