Lecture - 04 - Data Understanding and Preparation
Data Analytics
Lecture 4
Data Understanding & Preparation
1
CRISP-DM, Phases and Tasks
Today’s Course: Data Understanding and Data Preparation
• Business Understanding: Determine Business Objectives, Assess Situation, Determine Data Mining Goals, Produce Project Plan
• Data Understanding: Collect Initial Data, Describe Data, Explore Data, Verify Data Quality
• Data Preparation: Select Data, Clean Data, Construct Data, Integrate Data, Format Data
• Modeling: Select Modeling Technique, Generate Test Design, Build Model, Assess Model
• Evaluation: Evaluate Results, Review Process, Determine Next Steps
• Deployment: Plan Deployment, Plan Monitoring & Maintenance, Produce Final Report, Review Project
2
Table of contents
1. Introduction
2. Data cleaning
3. Data reduction
4. Data transformation
3
Introduction
• Data understanding and data preparation (2nd and 3rd phases of CRISP-DM
standard process)
▪ Evaluate the quality of the data
▪ Clean the raw data and deal with missing data
▪ Integrate different sources
▪ Reduce data where necessary
▪ Perform transformations on certain variables
4
Introduction: What is data?
5
Introduction: Attribute values
6
Data quality
• Data have quality if they satisfy the requirements of the intended use
• Measures for data quality: A multidimensional view
▪ Accuracy: correct or wrong, accurate or not
▪ Completeness: not recorded, unavailable, …
▪ Consistency: some modified but some not, dangling, …
▪ Timeliness: timely update?
▪ Believability: how trustable is the data?
▪ Interpretability: how easily the data can be understood?
• Two different users may have very different assessments of the quality of a
given dataset
7
Data Preparation
• Much of the raw data contained in databases is unprocessed, incomplete, and noisy
▪ Poor quality data result in incorrect and unreliable data mining results
▪ In recent years we have moved from “all signal” to “mostly noise” data!
• Data preparation (or data preprocessing) means the manipulation of data into a form suitable for
further analysis and processing
▪ Objective: minimize garbage in, garbage out (GIGO). It improves the quality of data and
consequently helps improve the quality of data mining results
• Depending on the dataset, data preprocessing alone can account for 10-60% of all time and effort
of the data mining process (data preparation activities are routine, tedious, and time consuming)
8
Major tasks in Data Preparation
9
Table of contents
1. Introduction
2. Data cleaning
3. Data reduction
4. Data transformation
10
Data cleaning
• Data cleaning (cleansing or scrubbing) routines attempt to fill in missing values, smooth
out noise while identifying outliers, and correct inconsistencies in the data
11
Missing data
12
Missing data
13
Handling Missing Data
14
Handling missing data: Imputation
➢ Manual imputation: tedious and often infeasible
➢ Automatic imputation using one of the following (a minimal code sketch follows below):
▪ a global constant, e.g., “unknown” (effectively a new class)
▪ the attribute mean, the local mean, or the moving average
▪ the attribute mean for all samples belonging to the same class (smarter)
▪ the most probable value: inference-based, e.g., Bayesian methods, decision trees, or regression
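As an illustration of the simpler automatic strategies above, here is a minimal pandas sketch; the column names income and risk_class and all values are hypothetical:

import pandas as pd
import numpy as np

# Hypothetical data frame with missing values in the numeric column "income"
df = pd.DataFrame({
    "risk_class": ["low", "low", "high", "high", "low"],
    "income": [40_000, np.nan, 55_000, np.nan, 38_000],
})

# 1) Impute with a global constant (use a string such as "unknown" for categoricals)
df["income_const"] = df["income"].fillna(-1)

# 2) Impute with the overall attribute mean
df["income_mean"] = df["income"].fillna(df["income"].mean())

# 3) Impute with the mean of the samples belonging to the same class (smarter)
df["income_class_mean"] = df["income"].fillna(
    df.groupby("risk_class")["income"].transform("mean")
)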
15
Noisy data
16
How to handle noisy data (outliers)?
• Binning (see the sketch below):
▪ first sort the data and partition it into (equal-frequency) bins
▪ then smooth by bin means, bin medians, bin boundaries, etc.
• Regression: smooth by fitting the data to regression functions
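A minimal sketch of smoothing by equal-frequency binning; the data values are made up for illustration:

import pandas as pd

values = pd.Series([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])

# Partition into 3 equal-frequency bins
bins = pd.qcut(values, q=3, labels=False)

# Smooth by bin means: replace each value by the mean of its bin
smoothed_by_mean = values.groupby(bins).transform("mean")

# Smooth by bin medians
smoothed_by_median = values.groupby(bins).transform("median")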
17
How would you handle this?
18
Identifying misclassifications
• A frequency distribution can be used to verify that all labels of a categorical
variable are valid and consistent
• Example (Larose & Larose, 2015, p. 26)
19
The normal distribution: Review
• Bell shaped
• Symmetrical
• Mean, median and mode are equal
• Location is determined by the mean, μ
• Spread is determined by the standard deviation, σ
• The random variable has an infinite theoretical range: −∞ to +∞
[Figure: the normal density f(x), centered at the mean μ with spread σ]
20
The Standard Normal Distribution: Review
• Also known as the “z” distribution
• Mean is defined to be 0
• Standard deviation is 1
• Values above the mean have positive z-values
• Values below the mean have negative z-values
• 95.0% of the scores fall between z = −1.96 and z = +1.96
• 99.9% of the scores fall between z = −3.30 and z = +3.30
• Translate from x to the standard normal (the “z” distribution) by subtracting the mean of x and dividing by its standard deviation:

z = (x − μ) / σ

[Figure: the standard normal density f(z), with −1.96, 0 and +1.96 marked on the z axis]
21
22
Graphical methods for identifying outliers
➢ Outliers are extreme values that go against the trend of the remaining data.
➢ Certain statistical methods are sensitive to the presence of outliers and may deliver
unreliable results
➢ Graphical methods for identifying outliers for numeric variables include:
▪ Histograms
▪ Box Plot
▪ Scatter diagram
23
Histogram
❖HISTOGRAM: A graph in which the classes are marked on the horizontal axis and the class
frequencies on the vertical axis. The class frequencies are represented by the heights of
the bars and the bars are drawn adjacent to each other.
[Figure: histogram, with bins on the horizontal axis]
Histogram: Discovering Outliers
[Figure: histogram in which an isolated bar, far from the main body of the data, indicates an outlier]
Box Plot: Percentiles & Quartiles
➢ The pth percentile of a data set is a value such that at least p percent of the items take on this value
or less and at least (100 - p) percent of the items take on this value or more.
▪ Arrange the data in ascending order.
▪ Compute index i, the position of the pth percentile.
i = (p/100)n
▪ If i is not an integer, round up. The pth percentile is the value in the ith position.
▪ If i is an integer, the pth percentile is the average of the values in positions i and i+1.
➢ Quartiles are specific percentiles
▪ First Quartile = 25th Percentile
▪ Second Quartile = 50th Percentile = Median
▪ Third Quartile = 75th Percentile
➢ IQR: interquartile range = Q3 − Q1
▪ A measure of the spread of the middle 50% of the data.
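A minimal sketch implementing this textbook percentile rule (note that numpy’s default percentile interpolation can give slightly different values):

import math

def percentile(data, p):
    """pth percentile using the i = (p/100)*n rule described above."""
    x = sorted(data)
    n = len(x)
    i = (p / 100) * n
    if i != int(i):                      # i is not an integer: round up
        return x[math.ceil(i) - 1]       # value in the ith position (1-based)
    i = int(i)
    return (x[i - 1] + x[i]) / 2         # i is an integer: average positions i and i+1

data = [68, 70, 71, 72, 74, 75, 76, 78, 80, 81]
q1, q3 = percentile(data, 25), percentile(data, 75)
print(q1, q3, q3 - q1)                   # Q1, Q3 and the IQR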
Box Plot
[Figure: box plot, with the IQR (interquartile range) spanning Q1 to Q3]
27
Box Plot: Identifying outliers
• A robust rule for identifying outliers (univariate case) is the following: a
data value is an outlier if
▪ it is located 1.5(IQR) or more below Q1, or
▪ it is located 1.5(IQR) or more above Q3.
• Example: Q1 = 70, Q3 = 80. Identify the boundaries beyond which values are
considered outliers (see the sketch below):
• IQR = 80 − 70 = 10
• A value is identified as an outlier if:
• it is lower than Q1 − 1.5·IQR = 55, or
• it is higher than Q3 + 1.5·IQR = 95
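The same rule in code, as a minimal numpy sketch; the sample values are made up, and numpy’s default quantile interpolation may differ slightly from the textbook quartile rule:

import numpy as np

x = np.array([68, 70, 71, 72, 74, 75, 76, 78, 80, 81, 120])

q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = x[(x < lower) | (x > upper)]   # here the value 120 is flagged
print(lower, upper, outliers)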
28
Box Plot: Z-Score Example
29
The column ZReactionTime contains the Z scores for ReactionTime.
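A minimal sketch of how such a z-score column could be computed and used to flag unusual values; the reaction-time values are made up, and |z| > 3 is one common cutoff rather than something specified on the slide:

import pandas as pd

df = pd.DataFrame({"ReactionTime": [0.42, 0.48, 0.51, 0.55, 0.60, 1.90]})

mean = df["ReactionTime"].mean()
std = df["ReactionTime"].std()

df["ZReactionTime"] = (df["ReactionTime"] - mean) / std

# Flag observations whose z-score is unusually large in absolute value
df["is_outlier"] = df["ZReactionTime"].abs() > 3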
32
Correlation vs. Possible Relationship Between Variables
❖ Direct cause and effect,
▪ e.g., water causes plants to grow
❖ Both cause and effect,
▪ e.g., coffee consumption causes nervousness, and nervous people also drink more coffee.
❖ Relationship caused by third variable;
▪ Death due to drowning and soft drink consumption during summer.
• Both variables are related to heat and humidity (third variable).
• This is dangerous (Why?)
❖ Coincidental relationship;
▪ Increase in the number of people exercising and increase in the number of people committing crimes.
• This is even more dangerous (Why?)
❖ Correlation measures association and not causation.
33
Other problems
❑ Other data problems which require data cleaning
▪ duplicate records
▪ incomplete data
▪ inconsistent data
34
Table of contents
1. Introduction
2. Data cleaning
3. Data reduction
4. Data transformation
36
Data Reduction
37
Data reduction: keep in mind the following tradeoffs
❖ Data reduction helps reduce the processing time of the two subsequent phases
(modeling & evaluation)
▪ Ideally, we should not throw away data unless we are sure it is not essential
▪ The number of observations in the dataset should be at least 6 times the
number of attributes
• This is a so-called rule of thumb (not an exact rule)
• Need sufficient observations in order to obtain reliable modeling results
38
Table of contents
1. Introduction
2. Data cleaning
3. Data reduction
4. Data transformation
39
Data transformation
• Data are transformed or consolidated to forms appropriate for mining
• Strategies for data transformation include the following
▪ Smoothing
▪ Attribute construction
▪ Aggregation
▪ Normalization
▪ Discretization
▪ Concept hierarchy generation for nominal data
40
Normalization
• Variables tend to have ranges that vary greatly from each other
• The measurement unit used can affect the data analysis
• For some data mining algorithms, differences in ranges will lead to a tendency for the
variable with greater range to have undue influence on the results
• Data miners should normalize their numeric variables in order to standardize the scale
of effect each variable has on the results
41
Normalization
• Normalizing the data attempts to give all attributes an equal weight
• The terms standardize and normalize are used interchangeably in data preprocessing
• Algorithms that make use of distance measures, such as the k-nearest neighbors
algorithm, benefit from normalization
• Notation:
▪ X : original field value
▪ X*: normalized field value
42
Min-max normalization
• Performs a linear transformation on the original data
• Min-max normalization works by seeing how much greater the field value is
than the minimum value, min(X), and scaling this difference by the range:
X*_mm = (X − min(X)) / range(X) = (X − min(X)) / (max(X) − min(X))
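A minimal numpy sketch of min-max normalization; the income values are hypothetical but chosen so that 73,600 reproduces the worked example a few slides ahead:

import numpy as np

x = np.array([12_000, 47_000, 54_000, 73_600, 98_000], dtype=float)

x_mm = (x - x.min()) / (x.max() - x.min())
print(x_mm)   # 73,600 maps to (73600 - 12000) / (98000 - 12000) ≈ 0.716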
43
Z-score normalization
• Also called zero-mean normalization
• Z-score standardization works by taking the difference between the field
value and the field mean value, and scaling this difference by the standard
deviation of the field values:
z-score = (X − mean(X)) / SD(X)
• The z-score normalization is useful when the actual minimum and maximum
of an attribute X are unknown, or when there are outliers that dominate the
min-max normalization
• A variation of the z-score normalization replaces the standard deviation of
X by the mean absolute deviation of X
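A minimal sketch of z-score normalization and of the mean-absolute-deviation variant mentioned above; the values are the same hypothetical incomes as before:

import numpy as np

x = np.array([12_000, 47_000, 54_000, 73_600, 98_000], dtype=float)

# Standard z-score normalization
z = (x - x.mean()) / x.std()

# Variant: scale by the mean absolute deviation instead of the standard deviation
mad = np.mean(np.abs(x - x.mean()))
z_mad = (x - x.mean()) / mad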
44
Decimal scaling
❖ Decimal scaling ensures that every normalized value lies
between -1 and 1
X*_decimal = X / 10^d
• d represents the number of digits in the data value with the largest absolute value
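A minimal sketch of decimal scaling; the values are made up, with −986 and 917 echoing the example on the following slide:

import numpy as np

x = np.array([-986, 120, 917], dtype=float)

# d = number of digits in the data value with the largest absolute value
d = len(str(int(np.abs(x).max())))
x_dec = x / 10 ** d

print(x_dec)   # [-0.986  0.12   0.917]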
45
Normalization-examples
• Suppose that the minimum, maximum, mean, and standard deviation of the
values for the attribute income are $12000, $98000, $54000 and $16000.
Transform a value of $73600 for income using:
▪ Min-max normalization: a value of $73600 for income is transformed to 0.716
▪ Z-score normalization: a value of $73600 for income is transformed to 1.225
• Suppose that the recorded values range from −986 to 917. To normalize by
decimal scaling, we therefore divide each value by 1000 = 10^3
▪ −986 normalizes to −0.986 and 917 normalizes to 0.917
46
Normalization — remarks
• Normalization can change the original data quite a bit,
especially when using the z-score normalization or decimal
scaling
• It is necessary to save the normalization parameters (e.g., the
mean and standard deviation if using z-score normalization) so
that future data can be normalized in a uniform manner
• The normalization parameters now become model parameters, and the same
values should be used when the model is applied to new data (e.g., testing
data)
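A minimal sketch of what this means in practice, assuming z-score normalization with numpy; the training and new values are made up:

import numpy as np

train = np.array([12_000, 47_000, 54_000, 73_600, 98_000], dtype=float)
new_data = np.array([30_000, 105_000], dtype=float)

# "Fit": compute the normalization parameters on the training data only
mu, sigma = train.mean(), train.std()

# "Apply": reuse the SAME parameters for any future / test data
train_z = (train - mu) / sigma
new_z = (new_data - mu) / sigma      # do NOT recompute mean/std on the new data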
47
Transformations to achieve normality
• Some data mining algorithms and statistical methods require that the variables be
normally distributed
• Z-score transformation does not achieve normality
[Figure: histograms of a variable before and after the z-score transformation; the shape of the distribution is unchanged]
48
Transformations to achieve normality
❖ The skewness of a distribution is measured by
sk = 3 (mean − median) / std
▪ Much real-world data is right skewed, including most financial data
➢ right skewed : sk >0
▪ Left-skewed data is not as common (often occurs when the data is right-censored)
➢ Left skewed : sk <0
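A minimal numpy sketch of this skewness measure, using made-up right-skewed values:

import numpy as np

x = np.array([1, 2, 2, 3, 3, 4, 5, 9, 15, 40], dtype=float)

sk = 3 * (x.mean() - np.median(x)) / x.std()
print(sk)    # positive, i.e. right skewed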
49
Transformations to achieve normality
• Common transformations to achieve normality (i.e., make the data more normally
distributed by reducing skewness):
➢ ln(x)
➢ Sqrt(x)
➢ 1/x
➢ 1/sqrt(x)
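A minimal sketch applying these transformations and comparing the skewness statistic before and after, reusing the made-up right-skewed values above; note that ln(x) and 1/sqrt(x) require strictly positive data:

import numpy as np

x = np.array([1, 2, 2, 3, 3, 4, 5, 9, 15, 40], dtype=float)

def skew(v):
    # sk = 3 (mean - median) / std, as defined on the previous slide
    return 3 * (v.mean() - np.median(v)) / v.std()

for name, t in [("original", x),
                ("ln(x)", np.log(x)),
                ("sqrt(x)", np.sqrt(x)),
                ("1/x", 1 / x),
                ("1/sqrt(x)", 1 / np.sqrt(x))]:
    print(name, round(skew(t), 3))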
50
Transformations to achieve normality
• A normal probability plot can be used to check whether the new variable is
normally distributed or not (normality ≠ symmetry).
• Algorithms requiring normality usually do fine when supplied with data that
is symmetric and unimodal.
• Don’t forget to “de-transform” the data when the algorithm is done with its
results:
➢ What does this mean?
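It means that if, for example, a variable was log-transformed before modeling, the results come back on the log scale and must be mapped back to the original units. A minimal sketch under that assumption (the values and the placeholder predictions are made up):

import numpy as np

y = np.array([1.0, 3.0, 9.0, 27.0, 81.0])

# Transform to reduce skewness before modeling
y_log = np.log(y)

# ... fit a model on y_log and obtain predictions on the log scale ...
pred_log = y_log.copy()          # placeholder standing in for the model's output

# De-transform: map the results back to the original scale
pred = np.exp(pred_log)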
51
Checking Normality
52
Checking Normality: Histogram
56
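These checks are typically done visually. A minimal sketch producing a histogram and a normal probability plot with matplotlib and scipy (the data are made up; these libraries are one possible choice, not something prescribed by the slides):

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=500)    # right-skewed, clearly non-normal

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.hist(x, bins=30)                        # look for a symmetric bell shape
ax1.set_title("Histogram")

stats.probplot(x, dist="norm", plot=ax2)    # points should follow the line if normal
ax2.set_title("Normal probability plot")

plt.show()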
Transforming categorical variables into numerical variables
• In most instances, the data analyst should
avoid transforming categorical variables
to numeric variables
57
Discretization by binning numerical values
• Some algorithms prefer categorical rather than continuous predictors, in which case
we would need to partition the numerical predictors into bins or bands
• Common methods for binning numerical predictors:
▪ Equal width binning
▪ Equal frequency binning
▪ Binning by clustering
▪ Binning based on predictive value
58
Discretization by binning
▪ Binning does not use class information and is therefore an unsupervised discretization
technique
▪ Binning is sensitive to the user-specified number of bins, as well as the presence of
outliers
▪ Equal width binning is not recommended for most data mining applications (the width
of the categories can be greatly affected by the presence of outliers)
▪ Equal frequency binning assumes that each category is equally likely (an assumption
which is usually not warranted)
▪ Binning by clustering and binning based on predictive value are preferred
59
Discretization by binning --example
➢ Suppose we have the following tiny data set, which we would like to discretize into
k=3 categories: X = {1, 1, 1, 1, 1, 2, 2, 11, 11, 12, 12, 44}
▪ Equal width binning: [0, 15), [15, 30), [30, 45)
▪ Equal frequency binning: n/k = 12/3 = 4 values per bin
▪ K-means clustering with k = 3 (see the sketch below)
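A minimal sketch of these three binning approaches on the tiny data set above, using pandas and scikit-learn as one possible toolset; pd.qcut needs duplicates="drop" here because the repeated 1s produce duplicate quantile edges:

import pandas as pd
from sklearn.cluster import KMeans

x = pd.Series([1, 1, 1, 1, 1, 2, 2, 11, 11, 12, 12, 44])

# Equal width binning into the categories [0, 15), [15, 30), [30, 45)
equal_width = pd.cut(x, bins=[0, 15, 30, 45], right=False)

# Equal frequency binning (ideally n/k = 4 values per bin)
equal_freq = pd.qcut(x, q=3, duplicates="drop")   # collapses to 2 bins for this data

# Binning by k-means clustering on the single attribute
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
    x.to_numpy().reshape(-1, 1)
)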
Reclassifying categorical variables
▪ Reclassifying categorical variables is the equivalent of binning numerical variables
▪ Often, a categorical variable contains too many field values to be easily analyzable (for example, state)
▪ Data mining methods such as logistic regression perform suboptimally when confronted with
predictors containing too many field values
▪ The data analyst should reclassify the field values (the reclassification should support the objectives of
the business problem or research question)
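A minimal sketch of reclassifying a high-cardinality categorical field, e.g. collapsing US states into regions; the mapping is partial and purely illustrative:

import pandas as pd

df = pd.DataFrame({"state": ["CA", "NY", "TX", "WA", "FL", "NV"]})

# Hypothetical, partial state-to-region mapping chosen for illustration only
region_map = {"CA": "West", "WA": "West", "NV": "West",
              "NY": "Northeast", "TX": "South", "FL": "South"}

df["region"] = df["state"].map(region_map).fillna("Other")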
61