
UNIT - 2

Data Collection and Data Pre-Processing

What is Data Collection?

Data collection is the process of collecting, measuring, and analyzing different types of information using a set of standard, validated techniques. The main objective of data collection is to gather information-rich and reliable data and analyze it to make critical business decisions. Once the data is collected, it goes through a rigorous process of data cleaning and data processing to make it truly useful for businesses. There are two main methods of data collection in research, based on the information that is required, namely:

 Primary Data Collection
 Secondary Data Collection

Primary Data Collection Methods

Primary data refers to data collected first-hand, directly from the main source. It refers to data that has never been used in the past. The data gathered by primary data collection methods is generally regarded as the best kind of data in research. The methods of collecting primary data can be further divided into quantitative data collection methods (dealing with factors that can be counted) and qualitative data collection methods (dealing with factors that are not necessarily numerical in nature).

1. Interviews
Interviews are a direct method of data collection. It is simply a
process in which the interviewer asks questions and the interviewee
responds to them. It provides a high degree of flexibility because
questions can be adjusted and changed anytime according to the
situation.

2. Observations
In this method, researchers observe a situation around them and
record the findings. It can be used to evaluate the behaviour of
different people in controlled (everyone knows they are being
observed) and uncontrolled (no one knows they are being observed)
situations. This method is highly effective because it is
straightforward and not directly dependent on other
participants. For example, a person looks at random people that
walk their pets on a busy street, and then uses this data to decide
whether or not to open a pet food store in that area.
3. Surveys and Questionnaires
Surveys and questionnaires provide a broad perspective from large
groups of people. They can be conducted face-to-face, mailed, or
even posted on the Internet to get respondents from anywhere in
the world. The answers can be yes or no, true or false, multiple
choice, and even open-ended questions. However, a drawback of
surveys and questionnaires is delayed response and the possibility
of ambiguous answers.

4. Focus Groups
A focus group is similar to an interview, but it is conducted with a
group of people who all have something in common. The data
collected is similar to in-person interviews, but they offer a better
understanding of why a certain group of people thinks in a particular
way. However, some drawbacks of this method are lack of privacy
and domination of the interview by one or two participants. Focus
groups can also be time-consuming and challenging, but they help
reveal some of the best information for complex situations.

Secondary Data Collection Methods

Secondary data refers to data that has already been collected by someone else. It is much less expensive and easier to collect than primary data. While primary data collection provides more authentic and original data, there are numerous instances where secondary data collection provides great value to organizations.
1. Internet
The use of the Internet has become one of the most popular
secondary data collection methods in recent times. There is a large
pool of free and paid research resources that can be easily accessed
on the Internet. While this method is a fast and easy way of data
collection, you should only source from authentic sites while
collecting information.
2. Government Archives
There is a lot of data available in government archives that you can make use of. The most important advantage is that the data in government archives is authentic and verifiable. The challenge, however, is that the data is not always readily available, due to a number of factors. For example, criminal records may be classified information that is difficult for anyone to access.
3. Libraries
Most researchers donate several copies of their academic research
to libraries. You can collect important and authentic information
based on different research contexts. Libraries also serve as a
storehouse for business directories, annual reports and other similar
documents that help businesses in their research.
Data Pre-processing

Data Preprocessing can be defined as the process of converting raw data into a format that is understandable and usable for further analysis. It is an important step in the Data Preparation stage. It ensures that the outcome of the analysis is accurate, complete, and consistent.

Data Types

Data Type can be defined as labeling the values a feature can hold. The data type will also determine what kinds of relational, mathematical, or logical operations can be performed on it. A few of the most common data types include Integer, Floating Point, Character, String, Boolean, Array, Date, Time, etc.

Data Summary

Data Summary can be defined as generating descriptive or summary statistics for the
features in a given dataset. For example, for a numeric column, it will
compute mean, max, min, std, etc. For a categorical variable, it will compute the
count of unique labels, labels with the highest frequency, etc.
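
As a minimal illustrative sketch (using pandas and a small made-up DataFrame, not the notes' dataset), a data summary could be generated like this -

 Python3

import pandas as pd

# a small, hypothetical dataset used only for illustration
df = pd.DataFrame({
    'age': [25, 32, 47, 51, 38],
    'city': ['Delhi', 'Mumbai', 'Delhi', 'Chennai', 'Delhi']
})

# numeric column: count, mean, std, min, quartiles, max
print(df['age'].describe())

# categorical column: number of unique labels and label frequencies
print(df['city'].nunique())
print(df['city'].value_counts())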

Why is Data Preprocessing Important?

Data Preprocessing is an important step in the Data Preparation stage of a Data Science development lifecycle that will ensure reliable, robust, and consistent results. The main objective of this step is to ensure and check the quality of data before applying any Machine Learning or Data Mining methods. Let's review some of its benefits -

 Accuracy - Data Preprocessing will ensure that input data is accurate and reliable by ensuring there are no manual entry errors, no duplicates, etc.
 Completeness - It ensures that missing values are handled and data is complete for further analysis.
 Consistency - Data Preprocessing ensures that input data is consistent, i.e., the same data kept in different places should match.
 Timeliness - Whether data is updated regularly and on a timely basis or not.
 Trustworthiness - Whether data is coming from trustworthy sources or not.
 Interpretability - Raw data is generally unusable, and Data Preprocessing converts raw data into an interpretable format.
Key Steps in Data Preprocessing

Data Cleaning

Data Cleaning uses methods to handle incorrect, incomplete, inconsistent, or missing values. Some of the techniques for Data Cleaning include -

1. Handling Missing Values

Input data can contain missing or NULL values, which must be handled before applying any Machine Learning or Data Mining techniques.

Missing values can be handled by many techniques, such as removing rows/columns containing NULL values and imputing NULL values using the mean, mode, regression, etc.
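
A minimal sketch of both approaches, assuming a small hypothetical DataFrame with a numeric 'salary' column and a categorical 'gender' column (the names are chosen only for illustration) -

 Python3

import pandas as pd
import numpy as np

# hypothetical data with missing values
df = pd.DataFrame({'salary': [50000, np.nan, 62000, 58000],
                   'gender': ['M', 'F', np.nan, 'F']})

# option 1: drop rows that contain any NULL value
dropped = df.dropna()

# option 2: impute NULLs - mean for numeric, mode for categorical
df['salary'] = df['salary'].fillna(df['salary'].mean())
df['gender'] = df['gender'].fillna(df['gender'].mode()[0])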

2. De-noising

De-noising is the process of removing noise from the data. Noisy data is meaningless data that is not interpretable or understandable by machines or humans. It can occur due to data entry errors, faulty data collection, etc.

De-noising can be performed by applying many techniques, such as binning the features, using regression to smooth the features and reduce noise, clustering to detect outliers, etc.
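
An illustrative sketch of two of these techniques with pandas (the 'price' column is hypothetical): binning groups values into coarse intervals, and a rolling mean smooths out noisy spikes -

 Python3

import pandas as pd

# hypothetical noisy feature
df = pd.DataFrame({'price': [10, 12, 11, 95, 13, 12, 14, 11]})

# binning: replace raw values by equal-width bins to reduce noise
df['price_bin'] = pd.cut(df['price'], bins=3)

# smoothing: rolling mean dampens the effect of noisy spikes
df['price_smooth'] = df['price'].rolling(window=3, min_periods=1).mean()

print(df)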

Data Integration

Data Integration can be defined as combining data from multiple sources. A few of the issues to be considered during Data Integration include the following -

Entity Identification Problem - It can be defined as identifying objects/features from multiple databases that correspond to the same entity. For example, customer_id in database A and customer_number in database B may belong to the same entity.

Schema Integration - It is used to merge two or more database schemas/metadata into a single schema. It essentially takes two or more schemas as input and determines a mapping between them. For example, the entity type CUSTOMER in one schema may be called CLIENT in another schema.

Detecting and Resolving Data Value Conflicts - The same data can be stored in different ways in different databases, and this needs to be taken care of while integrating them into a single dataset. For example, dates can be stored in various formats such as DD/MM/YYYY, YYYY/MM/DD, or MM/DD/YYYY.
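
A minimal, hypothetical sketch of these issues with pandas: the two toy "databases" below use different key names (customer_id vs. customer_number) and different date formats, which are reconciled before merging -

 Python3

import pandas as pd

# database A uses customer_id and DD/MM/YYYY dates
a = pd.DataFrame({'customer_id': [1, 2],
                  'signup': ['01/02/2021', '15/03/2021']})

# database B uses customer_number and YYYY/MM/DD dates
b = pd.DataFrame({'customer_number': [1, 2],
                  'last_order': ['2022/01/05', '2022/02/10']})

# entity identification: rename to a common key
b = b.rename(columns={'customer_number': 'customer_id'})

# data value conflicts: parse both date formats explicitly
a['signup'] = pd.to_datetime(a['signup'], format='%d/%m/%Y')
b['last_order'] = pd.to_datetime(b['last_order'], format='%Y/%m/%d')

# schema integration: merge the two sources into one dataset
merged = a.merge(b, on='customer_id')
print(merged)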

Data Reduction

Data Reduction is used to reduce the volume or size of the input data. Its main objective is to reduce storage and analysis costs and improve storage efficiency. A few of the popular techniques to perform Data Reduction include -

Dimensionality Reduction - It is the process of reducing the number of features in the input dataset. It can be performed in various ways, such as selecting the features with the highest importance, Principal Component Analysis (PCA), etc.

Numerosity Reduction - In this method, various techniques can be applied to reduce the volume of data by choosing alternative, smaller representations of the data. For example, a variable can be approximated by a regression model, and instead of storing the entire variable, we can store the regression model that approximates it.

Data Compression - In this method, data is compressed. Data Compression can be lossless or lossy, depending on whether or not information is lost during compression.
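
As a small sketch of Dimensionality Reduction with scikit-learn's PCA (the data here is random and purely illustrative) -

 Python3

import numpy as np
from sklearn.decomposition import PCA

# 100 samples with 10 features (illustrative random data)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

# keep the 3 principal components that explain the most variance
pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                 # (100, 3)
print(pca.explained_variance_ratio_)   # variance explained per component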

Data Transformation

Data Transformation is the process of converting data into a format that helps in building efficient ML models and deriving better insights. A few of the most common methods for Data Transformation include -

Smoothing - Data Smoothing is used to remove noise in the dataset, and it helps identify important features and detect patterns. Therefore, it can help in predicting trends or future events.

Aggregation - Data Aggregation is the process of transforming large volumes of data into an organized and summarized format that is more understandable and comprehensive. For example, a company may look at monthly sales data for a product instead of raw sales data to understand its performance better and forecast future sales.

Discretization - Data Discretization is the process of converting numerical or continuous variables into a set of intervals/bins. This makes data easier to analyze. For example, an age feature can be converted into intervals such as (0-10, 11-20, ...) or (child, young, ...).

Normalization - Data Normalization is the process of converting a numeric variable into a specified range such as [-1, 1], [0, 1], etc. A few of the most common approaches to performing normalization are Min-Max Normalization, Data Standardization or Data Scaling, etc.
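
A minimal sketch of Aggregation, Discretization, and Min-Max Normalization with pandas (the sales and age values below are made up for illustration) -

 Python3

import pandas as pd

# hypothetical sales records
sales = pd.DataFrame({
    'date': pd.to_datetime(['2023-01-05', '2023-01-20', '2023-02-03']),
    'amount': [100, 150, 90],
    'age': [12, 25, 47]
})

# aggregation: summarize raw sales into monthly totals
monthly = sales.groupby(sales['date'].dt.to_period('M'))['amount'].sum()

# discretization: convert the continuous age into labeled bins
sales['age_group'] = pd.cut(sales['age'], bins=[0, 10, 20, 60],
                            labels=['child', 'young', 'adult'])

# normalization: Min-Max scale the amount into the range [0, 1]
amt = sales['amount']
sales['amount_scaled'] = (amt - amt.min()) / (amt.max() - amt.min())

print(monthly)
print(sales)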

Applications of Data Preprocessing

Data Preprocessing is important in the early stages of a Machine Learning and AI application development lifecycle. A few of the most common applications include -

Improved Accuracy of ML Models - Various techniques used to preprocess data, such as Data Cleaning and Transformation, ensure that data is complete, accurate, and understandable, resulting in efficient and accurate ML models.

Reduced Costs - Data Reduction techniques can help companies save storage and compute costs by reducing the volume of the data.

Visualization - Preprocessed data is easily consumable and understandable, and can be further used to build dashboards to gain valuable insights.

Data Preprocessing is the process of converting raw datasets into a format that is consumable, understandable, and usable for further analysis. It is an important step in any Data Analysis project that will ensure the input dataset's accuracy, consistency, and completeness.

The key steps in this stage include Data Cleaning, Data Integration, Data Reduction, and Data Transformation.

It can help build accurate ML models, reduce analysis costs, and build dashboards on top of raw data.

Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) refers to the method of studying and exploring datasets to understand their main characteristics, discover patterns, locate outliers, and identify relationships between variables. EDA is normally carried out as a preliminary step before undertaking more formal statistical analyses or modeling.

The Foremost Goals of EDA

1. Data Cleaning: EDA involves examining the data for errors, missing values, and inconsistencies. It includes techniques such as data imputation, handling missing data, and identifying and removing outliers.
2. Descriptive Statistics: EDA uses descriptive statistics to understand the central tendency, variability, and distribution of variables. Measures like mean, median, mode, standard deviation, range, and percentiles are commonly used.

3. Data Visualization: EDA employs visual techniques to represent the data graphically. Visualizations such as histograms, box plots, scatter plots, line plots, heatmaps, and bar charts help in identifying patterns, trends, and relationships within the data.

4. Feature Engineering: EDA allows for the exploration of different variables and their transformations to create new features or derive meaningful insights. Feature engineering can involve scaling, normalization, binning, encoding categorical variables, and creating interaction or derived variables.
5. Correlation and Relationships: EDA helps discover relationships and dependencies between variables. Techniques such as correlation analysis, scatter plots, and cross-tabulations offer insights into the strength and direction of relationships between variables.

6. Data Segmentation: EDA can involve dividing the data into meaningful segments based on certain criteria or characteristics. This segmentation helps gain insights into specific subgroups within the data and can lead to more focused analysis.

7. Hypothesis Generation: EDA aids in generating hypotheses or research questions based on the preliminary exploration of the data. It helps form the foundation for further analysis and model building.

8. Data Quality Assessment: EDA allows for assessing the quality and reliability of the data. It involves checking for data integrity, consistency, and accuracy to make certain the data is suitable for analysis.

Types of EDA

Depending on the number of columns being analyzed, EDA can be divided into univariate, bivariate, and multivariate analysis. More broadly, there are various types of EDA techniques that can be employed depending on the nature of the data and the goals of the analysis. Here are some common types of EDA:

1. Univariate Analysis: This type of analysis focuses on analyzing individual variables in the dataset. It involves summarizing and visualizing a single variable at a time to understand its distribution, central tendency, spread, and other relevant statistics. Techniques like histograms, box plots, bar charts, and summary statistics are commonly used in univariate analysis.

2. Bivariate Analysis: Bivariate analysis involves exploring the relationship between two variables. It helps find associations, correlations, and dependencies between pairs of variables. Scatter plots, line plots, correlation matrices, and cross-tabulations are commonly used techniques in bivariate analysis.

3. Multivariate Analysis: Multivariate analysis extends bivariate analysis to include more than two variables. It aims to understand the complex interactions and dependencies among multiple variables in a dataset. Techniques such as heatmaps, parallel coordinates, factor analysis, and principal component analysis (PCA) are used for multivariate analysis.

4. Time Series Analysis: This type of analysis is mainly applied to datasets that have a temporal component. Time series analysis involves examining and modeling patterns, trends, and seasonality in the data over time. Techniques like line plots, autocorrelation analysis, moving averages, and ARIMA (AutoRegressive Integrated Moving Average) models are commonly used in time series analysis.

5. Missing Data Analysis: Missing data is a common issue in datasets, and it may impact the reliability and validity of the analysis. Missing data analysis involves identifying missing values, understanding the patterns of missingness, and using suitable techniques to deal with the missing data. Techniques such as missing data pattern analysis, imputation strategies, and sensitivity analysis are employed in missing data analysis.

6. Outlier Analysis: Outliers are data points that deviate significantly from the general pattern of the data. Outlier analysis involves identifying and understanding the presence of outliers, their potential causes, and their impact on the analysis. Techniques such as box plots, scatter plots, z-scores, and clustering algorithms are used for outlier analysis.

7. Data Visualization: Data visualization is a critical part of EDA that involves creating visual representations of the data to facilitate understanding and exploration. Various visualization techniques, such as bar charts, histograms, scatter plots, line plots, heatmaps, and interactive dashboards, are used to represent different kinds of data.

These are just a few examples of the types of EDA techniques that can be employed during data analysis. The choice of techniques depends on the data characteristics, the research questions, and the insights sought from the analysis.

Exploratory Data Analysis (EDA) Using Python Libraries

For simplicity, we will use a single dataset: the employee data. It contains 8 columns, namely First Name, Gender, Start Date, Last Login Time, Salary, Bonus %, Senior Management, and Team. The dataset is available as Employees.csv.

Let's read the dataset using the Pandas read_csv() function and print the first five rows. To print the first five rows we will use the head() function.
 Python3

import pandas as pd
import numpy as np

# read dataset using pandas
df = pd.read_csv('employees.csv')
df.head()

Output:

First five rows of the dataframe

Getting Insights About The Dataset

Let's see the shape of the data using the shape attribute.

 Python3

df.shape

Output:

(1000, 8)

This means that this dataset has 1000 rows and 8 columns.

Let's get a quick summary of the dataset using the pandas describe() method. The describe() function applies basic statistical computations to the dataset, like extreme values, count of data points, standard deviation, etc. Any missing or NaN value is automatically skipped. The describe() function gives a good picture of the distribution of the data.

Example:

 Python3

df.describe()

Output:

description of the dataframe


Note: we can also get the description of the categorical columns of the dataset if we specify include='all' in the describe() function.
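
For example (an illustrative call on the same df) -

 Python3

df.describe(include='all')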

Now, let's also see the columns and their data types. For this, we will use the info() method.

 Python3

# information about the dataset
df.info()

Output:

Information about the dataset

Changing Dtype from Object to Datetime

Start Date is an important column for employees. However, it is not of much use if we cannot handle it properly. To handle this type of data, pandas provides a special function, to_datetime(), with which we can change the object type to datetime format.

 Python3

# convert "Start Date" column to datetime data type


df['Start Date'] = pd.to_datetime(df['Start Date'])

We can see the number of unique elements in our dataset. This will help us in deciding which type of encoding to choose for converting categorical columns into numerical columns.

 Python3

df.nunique()

Output:

First Name 200
Gender 2
Start Date 972
Last Login Time 720
Salary 995
Bonus % 971
Senior Management 2
Team 10
dtype: int64

Till now, we have got an idea about the dataset. Now let's see if our dataset contains any missing values or not.

Handling Missing Values

You all must be wondering why a dataset will contain any missing
values. It can occur when no information is provided for one or
more items or for a whole unit. For Example, Suppose different
users being surveyed may choose not to share their income, and
some users may choose not to share their address in this way
many datasets went missing. Missing Data is a very big problem in
real-life scenarios. Missing Data can also refer to as NA(Not
Available) values in pandas. There are several useful functions for
detecting, removing, and replacing null values in Pandas
DataFrame :

isnull()

notnull()

dropna()

fillna()

replace()

interpolate()
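
The examples that follow cover isnull(), fillna(), replace(), and dropna(); as a quick illustrative sketch of the remaining two, notnull() and interpolate(), on a small made-up Series -

 Python3

import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

# notnull(): boolean mask of the values that are NOT missing
print(s.notnull())

# interpolate(): fill NaNs by linear interpolation between neighbours
print(s.interpolate())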

Now let’s check if there are any missing values in our dataset or
not.

Example:

 Python3

df.isnull().sum()

Output:

Null values in dataframe

We can see that every column has a different number of missing values; for example, Gender has 145 missing values and Salary has 0. Now, for handling these missing values there can be several approaches, like dropping the rows containing NaN or replacing NaN with the mean, median, mode, or some other value.

Now, let’s try to fill in the missing values of gender with the string
“No Gender”.
Example:

 Python3

df["Gender"].fillna("No Gender", inplace = True)

df.isnull().sum()

Output:

Null values in dataframe after filling Gender column

We can see that now there is no null value for the Gender column. Now, let's fill the Senior Management column with the mode value.

Example:

 Python3

mode = df['Senior Management'].mode().values[0]
df['Senior Management'] = df['Senior Management'].replace(np.nan, mode)

df.isnull().sum()

Output:

Null values in dataframe after filling Senior Management column

Now, for First Name and Team, we cannot fill the missing values with arbitrary data, so let's drop all the rows containing these missing values.

Example:

 Python3

df = df.dropna(axis = 0, how ='any')

print(df.isnull().sum())
df.shape

Output:
Null values in dataframe after dropping all null values

We can see that our dataset is now free of all missing values, and after dropping them the number of rows has been reduced from 1000 to 899.

Note: For more information, refer to Working with Missing Data in Pandas.

After removing the missing data let’s visualize our data.

Data Encoding

Some models, like Linear Regression, do not work with categorical data, so in that case we should encode the categorical columns into numerical columns. We can use different methods for encoding, like Label Encoding or One-Hot Encoding. pandas and sklearn provide different functions for encoding; in our case, we will use the LabelEncoder class from sklearn to encode the Gender column.

 Python3

from sklearn.preprocessing import LabelEncoder

# create an instance of LabelEncoder
le = LabelEncoder()

# fit and transform the "Gender" column with LabelEncoder
df['Gender'] = le.fit_transform(df['Gender'])

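The text above also mentions One-Hot Encoding; a minimal sketch with pandas get_dummies(), applied here to the Team column purely as an illustration (not part of the original example) -

 Python3

# one-hot encode the "Team" column into 0/1 indicator columns
team_dummies = pd.get_dummies(df['Team'], prefix='Team')

# attach the indicator columns and drop the original categorical column
df = pd.concat([df.drop(columns=['Team']), team_dummies], axis=1)

df.head()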

Data visualization
Data Visualization is the process of analyzing data in the form of
graphs or maps, making it a lot easier to understand the trends or
patterns in the data.

Let’s see some commonly used graphs –

Note: We will use the Matplotlib and Seaborn libraries for data visualization. If you want to know more about these modules, refer to the articles -

 Matplotlib Tutorial
 Python Seaborn Tutorial
Histogram

It can be used for both univariate and bivariate analysis.

Example:

 Python3

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(x='Salary', data=df, )
plt.show()

Output:

Histogram plot of salary column

Boxplot

It can also be used for univariate and bivariate analyses.

Example:

 Python3

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.boxplot(x="Salary", y='Team', data=df)
plt.show()

Output:

Boxplot of Salary and team column

Scatter Plot For Data Visualization

It can be used for bivariate analyses.

Example:

 Python3

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x="Salary", y='Team', data=df,
                hue='Gender', size='Bonus %')

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)

plt.show()

Output:

Scatter plot of salary and Team column

For multivariate analysis, we can use the pairplot() method of the seaborn module. We can also use it to look at the multiple pairwise bivariate distributions in a dataset.

Example:

 Python3

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df, hue='Gender', height=2)

Output:

Pairplot of columns of dataframe

Handling Outliers
An outlier is a data item/object that deviates significantly from the rest of the (so-called normal) objects. Outliers can be caused by measurement or execution errors. The analysis for outlier detection is referred to as outlier mining. There are many ways to detect outliers, and the removal process for these outliers from the dataframe is the same as removing any data item from a pandas dataframe.

Let’s consider the iris dataset and let’s plot the boxplot for the
SepalWidthCm column.

Example:

 Python3

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('Iris.csv')

sns.boxplot(x='SepalWidthCm', data=df)

Output:

Boxplot of sepal width column before outlier removal

In the above graph, the values above 4 and below 2 are acting as
outliers.

Removing Outliers

To remove an outlier, one must follow the same process as removing any entry from the dataset, using its exact position in the dataset, because all of the above methods of detecting outliers produce, as their end result, the list of data items that satisfy the outlier definition according to the method used.

Example: We will detect the outliers using IQR and then we will
remove them. We will also draw the boxplot to see if the outliers
are removed or not.

 Python3

# Importing libraries (load_boston is not needed here and has been
# removed from recent scikit-learn versions)
import numpy as np
import pandas as pd
import seaborn as sns

# Load the dataset
df = pd.read_csv('Iris.csv')

# IQR (older NumPy versions used interpolation='midpoint' instead of method)
Q1 = np.percentile(df['SepalWidthCm'], 25, method='midpoint')
Q3 = np.percentile(df['SepalWidthCm'], 75, method='midpoint')
IQR = Q3 - Q1

print("Old Shape: ", df.shape)

# Upper bound
upper = np.where(df['SepalWidthCm'] >= (Q3+1.5*IQR))

# Lower bound
lower = np.where(df['SepalWidthCm'] <= (Q1-1.5*IQR))

# Removing the Outliers
df.drop(upper[0], inplace = True)
df.drop(lower[0], inplace = True)

print("New Shape: ", df.shape)

sns.boxplot(x='SepalWidthCm', data=df)

Output:

Boxplot of sepal width after outlier removal

Note: for more information, refer to Detect and Remove the Outliers using Python.

These are some of the EDA steps we perform during a data science project; however, what you do depends on your requirements and how much data analysis is needed.
