Data Preprocessing
Types of Attributes
Why preprocess the data?
Data cleaning
Data integration and transformation
Data reduction
Discretization and concept hierarchy generation
Summary
Discrete vs. Continuous Attributes
Discrete attribute
Has only a finite or countably infinite set of values
Ordinal
E.g., rankings (e.g., in the army or professions), grades
Interval
E.g., calendar dates, body temperatures
Why Preprocess Data?
No quality data, no quality mining results!
Quality decisions must be based on quality data
e.g., duplicate or missing data may cause incorrect or even misleading statistics
"Data cleaning is the number one problem in data warehousing"—DCI survey
Data extraction, cleaning, and transformation comprise the majority of the work of building a data warehouse
Data in the Real World Is Dirty
Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
e.g., occupation = " " (missing data)
Why Is Data Dirty?
Incomplete data may come from
"Not applicable" data values when collected
Different considerations between the time when the data was collected and when it is analyzed
Human/hardware/software problems
Noisy data (incorrect values) may come from
Faulty data collection instruments
Human or computer error at data entry
Errors in data transmission
Inconsistent data may come from
Different data sources
Functional dependency violations (e.g., modifying some linked data)
Duplicate records also need data cleaning
Major Tasks in Data Preprocessing
Data cleaning
Fill in missing values, smooth noisy data, identify or remove
outliers, and resolve inconsistencies
Data integration
Integration of multiple databases, data cubes, or files
Data transformation
Normalization and aggregation
Data reduction
Obtains reduced representation in volume but produces the
same or similar analytical results
Data discretization
Part of data reduction but with particular importance, especially
for numerical data
Forms of data preprocessing
Portland Installing 200 Sensors to Improve Traffic Safety, Government Computer News, Matt Leonard, June 29, 2018
Portland, OR's Traffic Safety Sensor Project involves installing 200 CityIQ
sensor nodes in the city's "high crash network" to gather data on vehicle and
pedestrian traffic, in the hope that the data will help officials learn how to prevent
fatal accidents. The sensors will be installed on 30 Portland streets that account for
more than half of the city's traffic fatalities, even though they make up only
8% of the city's roadways. The nodes, which are attached to existing street
light poles, have more than 30 built-in sensors, including two cameras that
capture images of the roadway and sidewalk, and an array of environmental
sensors for measuring temperature, pressure, and humidity. Specialized chips
power vision analysis in the units and produce metadata featuring traffic and
pedestrian counts, along with time and location stamps. The metadata is
transmitted to a cloud platform, where it can be accessed via application
programming interfaces.
'You Cannot Be Serious': IBM Taps Emotions for Wimbledon Highlights, Reuters, June 26, 2018
IBM's Watson artificial intelligence (AI) platform is analyzing
players' emotions at the Wimbledon tennis tournament to
compile highlights in which players display a heightened
sense of emotion. In addition to recognizing emotions,
Watson is also analyzing crowd noise, players' movements,
and match data. IBM's Sam Seddon says the company is
using machine learning to pinpoint scenes after exciting play
when athletes show their emotions. "If you've got the visual
element from the player, and you know that it's a tight
pressure point in the match, then those are the points that
you are going to really target in on in the highlights
package," he says. Watson also is offering a Wimbledon
chatbot service via Facebook Messenger, which provides
fans with access to customized information on scores, news,
and players.
Missing Data
How to Handle Missing Data?
Ignore the tuple: usually done when the class label is missing (assuming a classification task); not effective when the percentage of missing values per attribute varies considerably
Fill in the missing value manually: tedious + infeasible?
Use a global constant to fill in the missing value: e.g., "unknown", a new class?!
Use the attribute mean to fill in the missing value
Use the attribute mean for all samples belonging to the same class to fill in the missing value: smarter
Use the most probable value to fill in the missing value: inference-based, such as a Bayesian formula or decision tree
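The two mean-based strategies above can be sketched in a few lines of Python. This is a minimal illustration with made-up values (real projects would typically reach for a library such as pandas):

```python
from statistics import mean

def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def fill_with_class_mean(values, labels):
    """Replace None entries with the mean of observed values in the same class."""
    class_means = {}
    for label in set(labels):
        observed = [v for v, l in zip(values, labels) if l == label and v is not None]
        class_means[label] = mean(observed)
    return [class_means[l] if v is None else v for v, l in zip(values, labels)]

income = [30, None, 50, 70, None]
labels = ["a", "a", "b", "b", "b"]
print(fill_with_mean(income))            # gaps filled with the overall mean, 50
print(fill_with_class_mean(income, labels))  # class "a" mean 30, class "b" mean 60
```

The class-conditional version is the "smarter" variant from the slide: it conditions the fill value on the class label instead of pooling all tuples.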
Noisy Data
Noise: random error or variance in a measured variable
Incorrect attribute values may be due to, e.g., technology limitations
Other data problems that require cleaning: incomplete data, inconsistent data
How to Handle Noisy Data?
Binning method:
first sort data and partition into (equi-depth) bins
then smooth by bin means or bin boundaries
Regression
smooth by fitting the data to regression functions
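Regression smoothing replaces each noisy value with the value predicted by a fitted function. A minimal sketch of simple linear (least-squares) regression, with made-up data points:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

def smooth(xs, ys):
    """Replace each y with the fitted line's prediction at x."""
    a, b = fit_line(xs, ys)
    return [a * x + b for x in xs]

xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 4.2, 4.8, 6.0]   # roughly y = x + 1 with noise
print(smooth(xs, ys))            # noisy values projected onto the fitted line
```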
Simple Discretization Methods: Binning
Equal-width (distance) partitioning:
Divides the range into N intervals of equal size (uniform grid)
If A and B are the lowest and highest values of the attribute, the width of each interval is W = (B - A) / N
Equi-Depth Binning Method
• Data for price (in dollars): 24, 25, 26, 34, 28, 4, 8, 9, 21, 21, 15, 29
• Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
* Partition into (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34
* Smoothing by bin means:
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29
* Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34
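The equi-depth smoothing above can be sketched in Python (bin depth of 4, as in the example; bin means are rounded to integers, and a value tied between the two bin boundaries goes to the upper boundary, which matches the worked example):

```python
def equi_depth_bins(sorted_data, depth):
    """Partition sorted data into bins of equal size (depth)."""
    return [sorted_data[i:i + depth] for i in range(0, len(sorted_data), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin by the (rounded) bin mean."""
    return [[round(sum(b) / len(b))] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace each value by the closer of the bin's min/max boundary."""
    out = []
    for b in bins:
        lo, hi = b[0], b[-1]
        out.append([lo if v - lo < hi - v else hi for v in b])
    return out

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equi_depth_bins(prices, 4)
print(smooth_by_means(bins))       # [[9]*4, [23]*4, [29]*4]
print(smooth_by_boundaries(bins))  # [[4,4,4,15], [21,21,25,25], [26,26,26,34]]
```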
Equi-Width Binning Method
• Data for price (in dollars): 24, 25, 26, 34, 28, 4, 8, 9, 21, 21, 15, 29
• Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26,
28, 29, 34
* Partition into (equi-width) bins: Width = (34 - 4)/3 = 10
- Bin 1 [4-14): 4, 8, 9
- Bin 2 [14-24): 15, 21, 21
- Bin 3 [24-34]: 24, 25, 26, 28, 29, 34
* Smoothing by bin means:
- Bin 1: 7, 7, 7
- Bin 2: 19, 19, 19
- Bin 3: 28, 28, 28, 28, 28, 28
* Smoothing by bin boundaries:
- Bin 1: 4, 9, 9
- Bin 2: 15, 21, 21
- Bin 3: 24, 24, 24, 24, 34, 34
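The equi-width variant can be sketched similarly; here the bucket index is computed from the value's offset within the overall range, and the top value is clamped into the last bin:

```python
def equi_width_bins(sorted_data, n_bins):
    """Partition sorted data into n_bins intervals of equal width."""
    lo, hi = sorted_data[0], sorted_data[-1]
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for v in sorted_data:
        i = min(int((v - lo) / width), n_bins - 1)  # max value falls in last bin
        bins[i].append(v)
    return bins

prices = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
bins = equi_width_bins(prices, 3)
print(bins)  # [[4, 8, 9], [15, 21, 21], [24, 25, 26, 28, 29, 34]]

means = [[round(sum(b) / len(b))] * len(b) for b in bins]
print(means)  # [[7, 7, 7], [19, 19, 19], [28, 28, 28, 28, 28, 28]]
```

Note that, unlike equi-depth binning, the bins now hold different numbers of values (3, 3, and 6).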
Assignment Question
Consider the following data for the attribute age: 20, 20, 21, 22, 22, 25, 25, 25, 36, 40, 45, 46, 52, 70, 13, 15, 16, 16, 19, 25, 30, 33, 33, 35, 35, 35, 35.
Data Smoothing Using Cluster Analysis
[Figure: a noisy point (X1, Y1) is smoothed to (X1, Y1') on the fitted line y = x + 1]
Handling Redundancy in Data Integration
Correlation Analysis (Numerical Data)
Pearson's product-moment correlation coefficient of two attributes p and q:

r_{p,q} = Σ(p - p̄)(q - q̄) / ((n - 1) σ_p σ_q) = (Σ(pq) - n p̄ q̄) / ((n - 1) σ_p σ_q)

where n is the number of tuples, p̄ and q̄ are the respective means, and σ_p and σ_q are the respective standard deviations of p and q.
If r_{p,q} > 0, p and q are positively correlated; if r_{p,q} = 0, they are uncorrelated; if r_{p,q} < 0, they are negatively correlated.
Correlation Analysis (Example)

Person   Height   Self-Esteem
1        68       4.1
2        71       4.6
3        62       3.8
4        75       4.4
5        58       3.2
6        60       3.1
7        67       3.8
8        68       4.1
9        71       4.3
10       69       3.7
11       68       3.5
12       67       3.2
13       63       3.7
14       62       3.3
15       60       3.4
16       63       4.0
17       65       4.1
18       67       3.8
19       63       3.4
20       61       3.6

August 5, 2019   Data Mining: Concepts and Techniques
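The example can be checked by computing r directly. For the 20 (height, self-esteem) pairs above, the coefficient comes out to roughly 0.73, a fairly strong positive correlation:

```python
import math

def pearson_r(p, q):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((x - mp) * (y - mq) for x, y in zip(p, q))
    sp = math.sqrt(sum((x - mp) ** 2 for x in p))
    sq = math.sqrt(sum((y - mq) ** 2 for y in q))
    return cov / (sp * sq)

height = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
          68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]
print(round(pearson_r(height, esteem), 2))  # 0.73
```

Dividing by the raw sums of squared deviations is equivalent to the slide's formula, since the (n - 1) factors in the numerator and denominator cancel.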
Assignment Question

Car   Age (months)   Minimum stopping distance at 40 kph (meters)
A     9              28.4
B     15             29.3
C     24             37.6
D     30             36.2
E     38             36.5
F     46             35.3
G     53             36.2
H     60             44.1
I     64             44.8
J     76             47.2
Chi-Square Method (Categorical Data)
For categorical (discrete) data, a correlation relationship between two attributes, A and B, can be discovered by a χ² (chi-square) test.
Suppose A has c distinct values, namely a1, a2, ..., ac, and B has r distinct values, namely b1, b2, ..., br.
The data tuples described by A and B can be shown as a contingency table, with the c values of A making up the columns and the r values of B making up the rows.
Let (Ai, Bj) denote the event that attribute A takes on value ai and attribute B takes on value bj, that is, (A = ai, B = bj).

             Male   Female
Fiction      250    200
Non-fiction  50     1000
Chi-Square Method (Categorical Data)
Each possible (Ai, Bj) joint event has its own cell (or slot) in the table. The χ² value (also known as the Pearson χ² statistic) is computed as:

χ² = Σi Σj (oij - eij)² / eij

where oij is the observed frequency (i.e., actual count) of the joint event (Ai, Bj) and eij is the expected frequency of (Ai, Bj), which can be computed as

eij = count(A = ai) × count(B = bj) / N

where N is the number of data tuples, count(A = ai) is the number of tuples having value ai for A, and count(B = bj) is the number of tuples having value bj for B.
Cells that contribute the most to the χ² value are those whose actual count is very different from the expected count.
The larger the χ² value, the more likely the variables are related.
Example
Suppose that a group of 1,500 people was surveyed. The gender of each person was noted. Each person was polled as to whether their preferred type of reading material was fiction or nonfiction.
Thus, we have two attributes, gender and preferred reading.
The observed frequency (or count) of each possible joint event is summarized in the contingency table, where the numbers in parentheses are the expected frequencies (calculated based on the data distribution for both attributes):

             Male       Female
Fiction      250 (90)   200 (360)
Non-fiction  50 (210)   1000 (840)
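This example can be worked through in code. The expected counts come from row total × column total / N (e.g., 450 × 300 / 1500 = 90 for male fiction readers), and the resulting χ² for this table is about 507.9:

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

#              Male  Female
table = [[250, 200],     # fiction
         [50, 1000]]     # non-fiction
print(round(chi_square(table), 1))  # 507.9
```

A χ² this large (far above the critical value for 1 degree of freedom) indicates that gender and preferred reading are strongly correlated in this group.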
Data Transformation
Normalization:
min-max normalization
z-score normalization
Attribute/feature construction
Generalization of the data, where low-level or "primitive" (raw) data are replaced by higher-level concepts through the use of concept hierarchies. For example, categorical attributes, like street, can be generalized to higher-level concepts, like city or county. Similarly, values for numeric attributes, like age, may be mapped to higher-level concepts, like young, middle-aged, and senior.
Min-Max Normalization:
Performs a linear transformation on the original data.
Suppose minA and maxA are the minimum and maximum values of an attribute A.
Min-max normalization maps a value v of A to v' in the range [new_minA, new_maxA]:

v' = ((v - minA) / (maxA - minA)) × (new_maxA - new_minA) + new_minA

Preserves the relationships among the original data values.

Z-score example: for an attribute with mean 54,000 and standard deviation 16,000, the value 73,600 normalizes to (73,600 - 54,000) / 16,000 = 1.225.
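Both normalizations can be sketched in a few lines. The income figures follow the z-score example above; the min-max call reuses the price data from the binning example:

```python
def min_max(v, min_a, max_a, new_min=0.0, new_max=1.0):
    """Map v linearly from [min_a, max_a] onto [new_min, new_max]."""
    return (v - min_a) / (max_a - min_a) * (new_max - new_min) + new_min

def z_score(v, mean_a, std_a):
    """Normalize v by the attribute's mean and standard deviation."""
    return (v - mean_a) / std_a

print(z_score(73600, 54000, 16000))   # 1.225
print(round(min_max(24, 4, 34), 3))   # price 24 on [4, 34] maps to 0.667
```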
Chapter 3: Data Preprocessing
Data Reduction Strategies
A data warehouse may store terabytes of data
Complex data analysis/mining may take a very long time to run on the complete data set
Imagine that you have collected the data for your analysis. These data consist of the AllElectronics sales per quarter for the years 1997 to 1999. You are, however, interested in the annual sales (total per year), rather than the total per quarter. Thus the data can be aggregated so that the resulting data summarize the total sales per year instead of per quarter.
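The quarterly-to-annual aggregation can be sketched as follows; the sales figures below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical (year, quarter, sales) tuples for illustration.
quarterly = [
    (1997, 1, 224), (1997, 2, 408), (1997, 3, 350), (1997, 4, 586),
    (1998, 1, 310), (1998, 2, 395), (1998, 3, 420), (1998, 4, 610),
]

annual = defaultdict(int)
for year, quarter, sales in quarterly:
    annual[year] += sales

print(dict(annual))  # {1997: 1568, 1998: 1735}
```

The reduced representation (one row per year instead of four) is much smaller but answers the annual-sales question exactly.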
[Figure: decision-tree induction for attribute subset selection, with splits on attributes A1 and A6]
[Figure: original data vs. wavelet-approximated data]
The usefulness lies in the fact that the wavelet transformed data
can be truncated. A compressed approximation of the data can be
retained by storing only a small fraction of the strongest of the
wavelet coefficients.
For example, a 3-D data cube for sales with the dimensions item type, branch, and year must first be reduced to a 2-D cube, such as one with the dimensions item type and branch.
Example data (sorted prices):
1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14, 14, 15, 15, 15, 15, 15, 15, 18, 18, 18, 18, 18, 18, 18, 18, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 25, 25, 25, 25, 25, 28, 28, 30, 30, 30.
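One way to summarize such a list is an equal-width histogram. A sketch with bucket width 10 over the price data above:

```python
prices = [1, 1, 5, 5, 5, 5, 5, 8, 8, 10, 10, 10, 10, 12, 14, 14, 14,
          15, 15, 15, 15, 15, 15, 18, 18, 18, 18, 18, 18, 18, 18,
          20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21,
          25, 25, 25, 25, 25, 28, 28, 30, 30, 30]

WIDTH = 10
counts = {}
for p in prices:
    bucket = (p - 1) // WIDTH            # buckets 1-10, 11-20, 21-30
    lo = bucket * WIDTH + 1
    counts[(lo, lo + WIDTH - 1)] = counts.get((lo, lo + WIDTH - 1), 0) + 1

print(counts)  # {(1, 10): 13, (11, 20): 25, (21, 30): 14}
```

Three bucket counts now stand in for 52 individual values.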
They partition the objects into groups or clusters, so that objects within a
cluster are “similar" to one another and “dissimilar" to objects in other clusters.
Similarity is commonly defined in terms of how “close" the objects are in space,
based on a distance function.
Hierarchical Aggregation
Suppose that the tree contains 10,000 tuples with keys ranging from 1 to 9,999. The data in the tree can be approximated by an equi-depth histogram of 6 buckets for the key ranges 1 to 985, 986 to 3395, 3396 to 5410, 5411 to 8392, 8393 to 9543, and 9544 to 9999.
Each bucket contains roughly 10,000/6 items. Similarly, each bucket can be subdivided into smaller buckets, allowing for aggregate data at a finer level of detail.
Sampling
[Figure: drawing a sample from the raw data]
Sampling Example
With stratified sampling, a stratum can be created for each customer age group. In this way, the age group having the smallest number of customers will be sure to be represented.
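A minimal sketch of stratified sampling. The customer records and group sizes are made up; sampling is proportional, with at least one record kept per stratum so the smallest group is always represented:

```python
import random

def stratified_sample(records, key, fraction, seed=0):
    """Draw a proportional sample from each stratum (at least one per stratum)."""
    rng = random.Random(seed)
    strata = {}
    for rec in records:
        strata.setdefault(key(rec), []).append(rec)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

customers = ([("young", i) for i in range(20)] +
             [("middle-aged", i) for i in range(30)] +
             [("senior", i) for i in range(2)])
picked = stratified_sample(customers, key=lambda r: r[0], fraction=0.1)
print(len(picked))  # 2 young + 3 middle-aged + 1 senior = 6
```

A plain 10% simple random sample could easily miss the two seniors entirely; the per-stratum minimum guarantees they appear.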
Discretization
Three types of attributes:
Nominal — values from an unordered set
Ordinal — values from an ordered set
Continuous — real numbers
Discretization:
divide the range of a continuous attribute into intervals
Some classification algorithms only accept categorical attributes.
Reduce data size by discretization
Binning
Histogram analysis
Clustering analysis
A minimum interval size can also be used per level to control the recursive procedure.
An attribute defining a high concept level will usually contain a smaller number of distinct values than an attribute defining a lower concept level.
The attribute with the most distinct values is placed at the lowest level of the hierarchy.
If a user were to specify only the attribute city for a hierarchy defining location, the system may automatically drag in all of the semantically related attributes to form a hierarchy.
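The distinct-value heuristic described above can be sketched directly. The location columns and values below are made up for illustration; fewer distinct values means a higher level in the hierarchy:

```python
def hierarchy_by_distinct_values(columns):
    """Order attributes from highest concept level (fewest distinct values) down."""
    return sorted(columns, key=lambda name: len(set(columns[name])))

# Hypothetical location data: country < state < city in distinct-value counts.
location = {
    "country": ["US", "US", "US", "CA", "CA", "US"],
    "state":   ["WA", "OR", "WA", "BC", "BC", "OR"],
    "city":    ["Seattle", "Portland", "Tacoma", "Vancouver", "Victoria", "Salem"],
}
print(hierarchy_by_distinct_values(location))  # ['country', 'state', 'city']
```

Here country has 2 distinct values, state has 3, and city has 6, so the generated hierarchy is country > state > city, matching the heuristic.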