chapter-1 Introduction to Data Analytics
{
"person": {
"name": "John Doe",
"email": "john.doe@example.com",
"age": 30
}
}
Data Sources
• Explanation: Data sources are the locations, files, databases, or services where
data comes from.
• Understanding data sources is important because the quality and reliability
of the data directly affect the results of data analysis.
Databases
• Explanation: Databases are structured sets of data. They are a
common source of data for analytics.
• Discussion: There are different types of databases, such as SQL
(relational) databases and NoSQL (non-relational) databases like MongoDB.
• Examples: Customer information in a SQL database, product
information in a NoSQL database.
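To make the database examples concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module, with an in-memory database standing in for a relational source. The `customers` table, its columns, and the `load_customers` helper are hypothetical. No MongoDB server is assumed, so the NoSQL side is illustrated only by a JSON product document, the kind of schema-flexible record a document store holds.

```python
import json
import sqlite3


def load_customers(conn: sqlite3.Connection) -> list[tuple[int, str, str]]:
    """Read all rows from the hypothetical `customers` table."""
    cur = conn.execute("SELECT id, name, email FROM customers ORDER BY id")
    return cur.fetchall()


# An in-memory SQLite database stands in for a real relational data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
    [(1, "John Doe", "john.doe@example.com"), (2, "Jane Roe", "jane.roe@example.com")],
)
conn.commit()
rows = load_customers(conn)
print(rows)

# A document store such as MongoDB holds schema-flexible records like this one;
# here a JSON string stands in for a product document (no MongoDB client is used).
product_doc = json.loads('{"sku": "A100", "name": "Desk Lamp", "tags": ["home", "lighting"]}')
print(product_doc["tags"])
```

The SQL rows share one fixed schema, while the product document can carry fields (such as the `tags` list) that other documents in the same collection might lack.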
Web Data
• Explanation: Web data refers to data that is obtained from the
internet. This can include data scraped from websites, data from
social media platforms, etc.
• Discussion: Different types of web data include text data, user
behavior data, transactional data, etc.
• Examples: Tweets scraped from Twitter for sentiment analysis,
product reviews scraped from e-commerce websites.
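Scraped web data usually arrives as HTML that must be parsed before analysis. The sketch below uses Python's standard `html.parser` module on a static page string; the markup, the `review` class name, and the `ReviewParser` class are hypothetical, and a real scraper would first download the page (for example with `urllib.request`).

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Collect the text of every <p class="review"> element (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.reviews: list[str] = []
        self._in_review = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "p" and ("class", "review") in attrs:
            self._in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_review = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is still part of the review.
        if self._in_review:
            self.reviews[-1] += data


# A static string stands in for a downloaded product page.
page = """
<html><body>
  <h1>Desk Lamp</h1>
  <p class="review">Bright and <b>easy</b> to assemble.</p>
  <p class="price">$25</p>
  <p class="review">Stopped working after a week.</p>
</body></html>
"""

parser = ReviewParser()
parser.feed(page)
parser.close()
reviews = [r.strip() for r in parser.reviews]
print(reviews)
```

The extracted review texts are now plain strings, ready for a later step such as sentiment analysis.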
Sensor Data
• Explanation: Sensor data is data that is collected by sensors, which
can be anything from temperature sensors to motion sensors.
• Discussion: Different types of sensor data include time series data,
spatial data, etc.
• This data is often used in IoT (Internet of Things) applications.
• Examples: Temperature data from a weather station, accelerometer
data from a smartphone.
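Sensor readings are typically time series, and a common first step is aggregating them to a regular interval. This sketch averages hypothetical weather-station temperature readings per hour using only the Python standard library; the readings and the `hourly_means` helper are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical weather-station readings: (ISO timestamp, temperature in degrees C).
readings = [
    ("2024-05-01T10:05:00", 18.2),
    ("2024-05-01T10:35:00", 18.8),
    ("2024-05-01T11:10:00", 19.5),
    ("2024-05-01T11:50:00", 20.1),
]


def hourly_means(samples):
    """Group timestamped samples by the hour they fall in and average each group."""
    buckets = defaultdict(list)
    for ts, value in samples:
        t = datetime.fromisoformat(ts)
        hour = t.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        buckets[hour].append(value)
    return {hour: round(mean(vals), 2) for hour, vals in sorted(buckets.items())}


means = hourly_means(readings)
for hour, avg in means.items():
    print(hour.isoformat(), avg)
```

Irregularly spaced readings become one value per hour, which is the regular spacing most time-series methods expect.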
Data Collection Types
• Primary data collection involves gathering new data directly from the
source.
• Secondary data collection, by contrast, involves using data that already
exists, such as data from existing databases or data collected by
others.
Data Collection Methods
• Explanation: Data collection methods refer to how we obtain data.
• Common methods include surveys, where we ask people for
information;
• experiments, where we observe outcomes under controlled
conditions;
• observations, where we collect data about real-world behavior.
Data Preprocessing
• Definition: Data preprocessing is the process of cleaning and
transforming raw data into an understandable format.
• It’s a crucial step before data analysis or data modeling.
• Overview: Preprocessing involves
• data cleaning (removing noise and inconsistencies),
• data transformation (for example, normalizing data), and
• data integration (combining data from various sources).
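Of the three steps, data integration is the easiest to see in code. A minimal sketch, assuming two hypothetical sources that share a `customer_id` key: customer records exported from a database and review counts collected from the web. The hypothetical `integrate` helper keeps every customer and fills in a count of 0 where the second source has no entry.

```python
# Hypothetical source 1: customer records exported from a relational database.
customers = [
    {"customer_id": 1, "name": "John Doe"},
    {"customer_id": 2, "name": "Jane Roe"},
]

# Hypothetical source 2: review counts collected from a web platform.
review_counts = {1: 4, 3: 2}  # customer_id -> number of reviews


def integrate(records, counts):
    """Left join: keep every record and attach its review count (0 if absent)."""
    return [{**r, "reviews": counts.get(r["customer_id"], 0)} for r in records]


combined = integrate(customers, review_counts)
print(combined)
```

Whether an unmatched record should get 0, a missing-value marker, or be dropped depends on the analysis. Here customer 3 from `review_counts` is dropped because it has no matching customer record.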
Data Cleaning
• Definition: Data cleaning involves handling missing values, removing
duplicates, and treating outliers.
• It ensures the quality of the data and improves the accuracy of the
insights derived from it.
• Discussion: Techniques include imputation for handling missing
values, deduplication for removing duplicate data, and outlier
detection methods for identifying and handling anomalies in the data.
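The three cleaning techniques can be sketched in plain Python on a small hypothetical list of age records. The order (deduplicate, flag outliers with the 1.5 * IQR rule, then impute missing values) and the choice of median rather than mean imputation are assumptions for this example, not the only valid choices.

```python
from statistics import median, quantiles

# Hypothetical raw records: None marks a missing age; one row is duplicated.
raw = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 41},
    {"id": 3, "age": 41},   # exact duplicate
    {"id": 4, "age": 35},
    {"id": 5, "age": 28},
    {"id": 6, "age": 33},
    {"id": 7, "age": 250},  # implausible value
]

# 1. Deduplication: keep the first occurrence of each identical record.
seen, deduped = set(), []
for rec in raw:
    key = tuple(sorted(rec.items()))
    if key not in seen:
        seen.add(key)
        deduped.append(rec)

# 2. Outlier detection: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
observed = [r["age"] for r in deduped if r["age"] is not None]
q1, _, q3 = quantiles(observed, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [r for r in deduped if r["age"] is not None and not low <= r["age"] <= high]
kept = [r for r in deduped if r["age"] is None or low <= r["age"] <= high]

# 3. Imputation: replace missing ages with the median of the remaining values.
fill_value = median(r["age"] for r in kept if r["age"] is not None)
cleaned = [{**r, "age": fill_value if r["age"] is None else r["age"]} for r in kept]
print(cleaned)
```

Outliers are removed before imputing so that the extreme value cannot distort the fill value. Dropping is only one option; capping the value or correcting it at the source are common alternatives.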
Data Transformation
• Definition: Data transformation involves changing the format,
structure, or values of data to prepare it for analysis.
• It can involve
• normalization (scaling data to a small, specified range, such as 0 to 1),
• standardization (shifting and rescaling each attribute to have a
mean of zero and a standard deviation of one), and
• binning (converting numerical variables into categorical
counterparts).
• Discussion: These techniques reduce the complexity of the data and put
it in a form that analysis methods can use, for example by placing
attributes on comparable scales.
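The three transformations can be sketched with the Python standard library. Min-max normalization maps each value x to (x - min) / (max - min), optionally rescaled to a target range. Standardization computes z = (x - mean) / standard deviation; the population standard deviation is used here, which is an assumption. Binning assigns each value the label of the interval it falls in. The sample values, bin edges, labels, and helper names are hypothetical.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def min_max_normalize(xs, new_min=0.0, new_max=1.0):
    """Linearly rescale xs so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    if span == 0:  # all values equal: avoid division by zero
        return [new_min for _ in xs]
    return [new_min + (x - lo) / span * (new_max - new_min) for x in xs]


def standardize(xs):
    """Shift to mean 0 and rescale to (population) standard deviation 1."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def bin_values(xs, edges, labels):
    """Label each x by the interval [edges[i], edges[i+1]) it falls in; last edge inclusive."""
    out = []
    for x in xs:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} is outside the bin range")
        i = min(bisect_right(edges, x) - 1, len(labels) - 1)
        out.append(labels[i])
    return out


values = [10.0, 20.0, 30.0, 40.0, 50.0]
normalized = min_max_normalize(values)
standardized = standardize(values)
binned = bin_values([5, 17, 18, 42, 70], edges=[0, 18, 65, 120],
                    labels=["minor", "adult", "senior"])
print(normalized, standardized, binned, sep="\n")
```

Normalization keeps the data's shape but fixes its range, standardization centers and scales it, and binning trades numeric precision for categories.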
Normalization