4 - Exploring Data
EXPLORING DATA
CREATED BY : ARIF DJUNAIDY (FTIF - ITS)
PRESENTED BY
11 March 2015
OUTLINE
Summary Statistics
Visualization
analysis
Making use of humans' abilities to recognize patterns
(EDA)
http://www.itl.nist.gov/div898/handbook/index.htm
exploratory techniques
In data mining, clustering and anomaly detection are
major areas of interest and are not thought of as merely
exploratory; they are not discussed further in this chapter
Repository
http://www.ics.uci.edu/~mlearn/MLRepository.html
Virginica
Versicolour
SUMMARY STATISTICS
Summary statistics are numbers that summarize properties of the data
population of people, the gender female occurs about 50% of the time.
value
The notions of frequency and mode are typically used with
categorical data
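A minimal sketch of frequency and mode for a categorical attribute, assuming the attribute is available as a plain Python list. The function names and the sample labels are made up for illustration.

```python
from collections import Counter

def frequencies(values):
    """Return {value: fraction of the data set in which it occurs}."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}

def mode(values):
    """Return the most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]

# Made-up class labels, for illustration only.
species = ["Setosa", "Virginica", "Setosa", "Versicolour", "Setosa", "Virginica"]

print(frequencies(species))
print(mode(species))
```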
PERCENTILES
For continuous data, the notion of a percentile is more useful
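One way to compute a percentile is sketched below. It assumes the nearest-rank convention (the smallest observed value with at least p% of the data at or below it); other conventions interpolate between neighbouring values and can give slightly different answers. The sample values are made up.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, for 0 < p <= 100."""
    if not values or not 0 < p <= 100:
        raise ValueError("need a non-empty list and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank in sorted order
    return ordered[rank - 1]

# Made-up petal lengths in cm, for illustration only.
petal_lengths = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9, 5.6]

for p in (25, 50, 100):
    print(p, percentile(petal_lengths, p))
```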
VISUALIZATION
Visualization is the conversion of data into a visual or tabular format
Example: Sea Surface Temperature (SST) for July 1982
Tens of thousands of data points are summarized in a single
figure
REPRESENTATION
Representation is the mapping of information to a
visual format
Data objects, their attributes, and the relationships
among data objects are translated into graphical
elements such as points, lines, shapes, and colors.
Example:
Objects are often represented as points
Their attribute values can be represented as the position of the points
ARRANGEMENT
Arrangement is the placement of visual elements within a
display
Can make a large difference in how easy it is to understand
the data
Example:
SELECTION
TWO-DIMENSIONAL HISTOGRAMS
Show the joint distribution of the values of two attributes
Example: petal width and petal length
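The joint counts behind a two-dimensional histogram can be sketched as follows. The bin edges, the half-open bin convention, and the sample measurements are assumptions chosen for illustration; a plotting library would then draw each count as a bar or a colored cell.

```python
from collections import Counter

def histogram_2d(xs, ys, x_edges, y_edges):
    """Count objects per (x bin, y bin) cell.

    Bin i covers the half-open interval [edges[i], edges[i + 1]).
    Objects that fall outside the edges are ignored.
    """
    def bin_index(value, edges):
        for i in range(len(edges) - 1):
            if edges[i] <= value < edges[i + 1]:
                return i
        return None

    counts = Counter()
    for x, y in zip(xs, ys):
        i, j = bin_index(x, x_edges), bin_index(y, y_edges)
        if i is not None and j is not None:
            counts[(i, j)] += 1
    return counts

# Made-up petal measurements in cm, for illustration only.
petal_width = [0.2, 0.2, 0.4, 1.4, 1.5, 1.3, 2.5, 1.9, 2.1]
petal_length = [1.4, 1.3, 1.7, 4.7, 4.5, 4.0, 6.0, 5.1, 5.9]

counts = histogram_2d(petal_width, petal_length,
                      x_edges=[0.0, 1.0, 2.0, 3.0],
                      y_edges=[0.0, 2.5, 5.0, 7.5])
print(dict(counts))
```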
Box Plots
Invented by J. Tukey
Another way of displaying the distribution of data
The figure shows the basic parts of a box plot, labeled from top to bottom:
outlier, 90th percentile, 75th percentile, 50th percentile, 25th percentile,
10th percentile
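The numbers a box plot of this kind needs can be sketched as below. Two assumptions are made: percentiles use the nearest-rank convention, and any value below the 10th or above the 90th percentile is reported as an outlier. Other box plot conventions exist (for example, whiskers based on the interquartile range). The sample values are made up.

```python
import math

def nearest_rank(ordered, p):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    return ordered[math.ceil(p * len(ordered) / 100) - 1]

def box_plot_summary(values):
    """Return ({percentile: value}, outliers) for the 10/25/50/75/90 box plot."""
    ordered = sorted(values)
    summary = {p: nearest_rank(ordered, p) for p in (10, 25, 50, 75, 90)}
    # Assumption: points beyond the 10th/90th percentiles are drawn as outliers.
    outliers = [v for v in ordered if v < summary[10] or v > summary[90]]
    return summary, outliers

# Made-up sepal widths in cm, for illustration only.
sepal_widths = [2.0, 2.3, 2.5, 2.6, 2.7, 2.8, 2.8, 2.9, 3.0, 3.0,
                3.0, 3.1, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.8, 4.4]

summary, outliers = box_plot_summary(sepal_widths)
print(summary)
print(outliers)
```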
VISUALIZATION TECHNIQUES:
SCATTER PLOTS
Scatter plots
Attribute values determine the position
Two-dimensional scatter plots are most common, but three-dimensional
scatter plots are also possible
VISUALIZATION TECHNIQUES:
CONTOUR PLOTS
Contour plots
Useful when a continuous attribute is measured
on a spatial grid
They partition the plane into regions of similar
values
The contour lines that form the boundaries of
these regions connect points with equal values
The most common example is contour maps of
elevation
Can also display temperature, rainfall, air
pressure, etc.
An example is a contour plot of Sea Surface Temperature (SST),
in degrees Celsius
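The partitioning idea can be sketched without drawing anything: given a few contour levels, each grid cell is assigned to the band its value falls in. This is only the "regions of similar values" step; drawing the contour lines themselves requires interpolating between grid points, which is not shown. The grid values and the levels are made up.

```python
from bisect import bisect_right

def contour_bands(grid, levels):
    """Map each grid value to the index of the band it falls in.

    With sorted levels [l0, l1, ...], band 0 holds values below l0,
    band 1 holds values in [l0, l1), and so on.
    """
    return [[bisect_right(levels, value) for value in row] for row in grid]

# Made-up temperatures (degrees Celsius) on a 3 x 4 spatial grid.
temperature = [
    [28.0, 27.5, 26.0, 24.0],
    [22.0, 21.0, 19.5, 18.0],
    [12.0, 11.0, 9.0, 4.0],
]
levels = [10.0, 20.0]  # hypothetical contour levels

bands = contour_bands(temperature, levels)
for row in bands:
    print(row)
```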
VISUALIZATION TECHNIQUES:
MATRIX PLOTS
Matrix plots
Can plot the data matrix
This can be useful when objects are sorted according to
class
Typically, the attributes are normalized to prevent one
attribute from dominating the plot, e.g., standardized to
zero mean and unit standard deviation
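Preparing a data matrix for a matrix plot can be sketched as below: rows are sorted by class and each attribute (column) is standardized. The use of the population standard deviation and the sample rows are assumptions for illustration; the resulting matrix would then be drawn as a grid of colored cells by a plotting library.

```python
import statistics

def standardize_columns(rows):
    """Rescale each column to zero mean and unit standard deviation.

    Assumes no column is constant (its standard deviation would be zero).
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stdevs = [statistics.pstdev(c) for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)]
            for row in rows]

def sort_by_class(rows, labels):
    """Reorder rows so that objects of the same class are adjacent."""
    order = sorted(range(len(rows)), key=lambda i: labels[i])
    return [rows[i] for i in order], [labels[i] for i in order]

# Made-up (sepal length, petal width) rows in cm, for illustration only.
rows = [[6.0, 2.4], [5.0, 0.2], [7.0, 1.4], [5.0, 0.4]]
labels = ["Virginica", "Setosa", "Versicolour", "Setosa"]

sorted_rows, sorted_labels = sort_by_class(rows, labels)
matrix = standardize_columns(sorted_rows)
for label, row in zip(sorted_labels, matrix):
    print(label, [round(v, 2) for v in row])
```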
VISUALIZATION TECHNIQUES:
PARALLEL COORDINATES
Parallel Coordinates
Used to plot the attribute values of high-dimensional data
Instead of using perpendicular axes, use a set of parallel
axes
The attribute values of each object are plotted as a point
on each corresponding coordinate axis and the points are
connected by a line
Thus, each object is represented as a line
Often, the lines representing a distinct class of objects
group together, at least for some attributes
Ordering of attributes is important in seeing such
groupings
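The geometry of a parallel coordinates plot can be sketched as below: each object becomes a list of (axis index, height) vertices, one per attribute, which a plotting library would connect with line segments. Scaling every axis to the range 0 to 1 is an assumption made here so that attributes with different units share one vertical scale; the sample rows are made up.

```python
def parallel_coordinates(rows):
    """Turn each object into a polyline with one vertex per attribute.

    Each vertex is (axis index, value scaled to [0, 1] on that axis).
    A constant attribute is placed at mid-height.
    """
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    lines = []
    for row in rows:
        line = []
        for axis, (value, low, high) in enumerate(zip(row, lows, highs)):
            scaled = (value - low) / (high - low) if high > low else 0.5
            line.append((axis, scaled))
        lines.append(line)
    return lines

# Made-up rows: sepal length, sepal width, petal length, petal width (cm).
rows = [
    [5.0, 3.5, 1.5, 0.2],
    [7.0, 3.0, 4.5, 1.4],
    [6.0, 3.25, 6.0, 2.6],
]

lines = parallel_coordinates(rows)
for line in lines:
    print(line)
```

Changing the column order of `rows` changes which axes are adjacent, which is why the ordering of attributes affects how visible the groupings are.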
Chernoff Faces
Approach created by Herman Chernoff
This approach associates each attribute with a
characteristic of a face
The values of each attribute determine the appearance of
the corresponding facial characteristic
Each object becomes a separate face
Relies on the human ability to distinguish faces
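The attribute-to-feature association can be sketched as a simple lookup. The facial feature names and the particular pairing below are hypothetical, chosen only for illustration; the sketch computes one parameter set per object and does not draw the faces.

```python
# Hypothetical pairing of attributes with facial characteristics.
FEATURE_FOR_ATTRIBUTE = {
    "sepal_length": "face_width",
    "sepal_width": "eye_size",
    "petal_length": "mouth_curvature",
    "petal_width": "nose_length",
}

def chernoff_parameters(objects):
    """Return one {facial feature: value in [0, 1]} dict per object."""
    faces = [{} for _ in objects]
    for attribute, feature in FEATURE_FOR_ATTRIBUTE.items():
        values = [obj[attribute] for obj in objects]
        low, high = min(values), max(values)
        for face, value in zip(faces, values):
            # Scale to [0, 1]; a constant attribute maps to the middle.
            face[feature] = (value - low) / (high - low) if high > low else 0.5
    return faces

# Made-up measurements in cm, for illustration only.
objects = [
    {"sepal_length": 5.0, "sepal_width": 3.5, "petal_length": 1.5, "petal_width": 0.2},
    {"sepal_length": 7.0, "sepal_width": 3.0, "petal_length": 4.5, "petal_width": 1.4},
    {"sepal_length": 6.0, "sepal_width": 3.25, "petal_length": 6.0, "petal_width": 2.6},
]

faces = chernoff_parameters(objects)
for face in faces:
    print(face)
```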
[Figures: examples for the three Iris classes Setosa, Versicolour, and Virginica]
THE END
THANK YOU