Data Mining Data Exploration
Example:
Histogram
– Usually shows the distribution of values of a single variable
– Divide the values into bins and show a bar plot of the number of
objects in each bin.
– The height of each bar indicates the number of objects in that bin
– The shape of the histogram depends on the number of bins
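The binning step described above can be sketched in a few lines of Python. This is a minimal illustration that assumes equal-width bins; the function name `histogram` and the sample values are made up for this example and are not from the slides. Running it with two different bin counts shows how the shape changes with the number of bins.

```python
def histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    if width == 0:
        # All values are identical: put everything in the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value would index one past the end, so clamp it
        # into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical measurements, invented for illustration only.
values = [0.2, 0.2, 0.3, 0.4, 1.0, 1.3, 1.4, 1.5, 1.8, 2.0, 2.3, 2.5]

for bins in (2, 4):
    counts = histogram(values, bins)
    print(bins, "bins:", counts)
    for c in counts:
        # One '#' per object, so bar length equals the bin count.
        print("  " + "#" * c)
```

With two bins the sample looks flat, while four bins reveal a cluster of small values, which is why it is worth trying several bin counts before reading much into a histogram's shape.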
Box Plots
– Invented by J. Tukey
– Another way of displaying the distribution of data
– The following figure shows the basic parts of a box plot
[Figure: box plot labeled with 75th percentile, 50th percentile, 25th percentile, 10th percentile, and outlier]
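The quantities a box plot displays can be computed directly. The sketch below is illustrative only: the helper names are hypothetical, the percentile uses linear interpolation (one of several common conventions), and it assumes an upper whisker at the 90th percentile to mirror the 10th percentile label, with points outside the whiskers treated as outliers. The 90th percentile and the outlier rule are assumptions, not taken from the slide.

```python
def percentile(sorted_values, p):
    """Return the p-th percentile using linear interpolation between ranks."""
    if not sorted_values:
        raise ValueError("need at least one value")
    k = (len(sorted_values) - 1) * p / 100
    lower = int(k)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = k - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def box_plot_summary(values):
    """Compute the percentiles drawn in a box plot and the outlying points."""
    s = sorted(values)
    summary = {p: percentile(s, p) for p in (10, 25, 50, 75, 90)}
    # Assumption: whiskers end at the 10th and 90th percentiles, and any
    # point beyond a whisker is drawn individually as an outlier.
    outliers = [v for v in s if v < summary[10] or v > summary[90]]
    return summary, outliers


# Hypothetical data with one obviously extreme value.
data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 9, 30]
summary, outliers = box_plot_summary(data)
print(summary)
print("outliers:", outliers)
```

The box spans the 25th to 75th percentiles, the line inside it marks the 50th percentile (the median), and the remaining values locate the whiskers.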
Scatter plots
– Attribute values determine the position
– Two-dimensional scatter plots most common, but can
have three-dimensional scatter plots
– Often additional attributes can be displayed by using
the size, shape, and color of the markers that
represent the objects
– Arrays of scatter plots are useful because they can
compactly summarize the relationships of several pairs
of attributes
◆ See example on the next slide
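An array of scatter plots is built by pairing up the attributes. The following sketch only prepares the point lists for each panel and leaves the drawing to whatever plotting tool is at hand. The function name, the attribute names, and the sample rows are hypothetical.

```python
from itertools import combinations


def scatter_plot_panels(rows, names):
    """Return one list of (x, y) points per distinct pair of attributes."""
    panels = {}
    for i, j in combinations(range(len(names)), 2):
        # Attribute i gives the horizontal position, attribute j the vertical.
        panels[(names[i], names[j])] = [(row[i], row[j]) for row in rows]
    return panels


# Hypothetical objects with three attributes each.
names = ["length", "width", "weight"]
rows = [
    (5.1, 3.5, 1.4),
    (6.4, 3.2, 4.5),
    (5.9, 3.0, 5.1),
]

for pair, points in scatter_plot_panels(rows, names).items():
    print(pair, points)
```

For d attributes this yields d(d-1)/2 distinct panels. A full d-by-d array repeats each pair with the axes swapped, so plotting only one triangle loses no information.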
Matrix plots
– Can plot the data matrix
– This can be useful when objects are sorted according
to class
– Typically, the attributes are normalized to prevent one
attribute from dominating the plot
◆ Often standardized to have mean 0 and standard deviation 1
– Plots of similarity or distance matrices can also be
useful for visualizing the relationships between objects
– Examples of matrix plots are presented on the next two
slides
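The preparation steps for a matrix plot (sorting objects by class, standardizing each attribute, and computing a distance matrix) can be sketched as follows. This is a minimal illustration with made-up data. The function names are hypothetical, the population standard deviation (dividing by n) is one possible choice, and Euclidean distance is assumed for the distance matrix.

```python
import math


def standardize_columns(rows):
    """Rescale every attribute (column) to mean 0 and standard deviation 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Population standard deviation (divide by n); dividing by n - 1
    # is an equally common convention.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / n)
        for c, m in zip(cols, means)
    ]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def distance_matrix(rows):
    """Pairwise Euclidean distances between objects (rows)."""
    return [[math.dist(a, b) for b in rows] for a in rows]


# Hypothetical data: the second attribute is on a much larger scale.
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 400.0]]
labels = ["b", "a", "b", "a"]

# Sort objects by class so that rows of the same class are adjacent.
order = sorted(range(len(rows)), key=lambda i: labels[i])
sorted_rows = [rows[i] for i in order]

z = standardize_columns(sorted_rows)
d = distance_matrix(z)

for label_index, z_row in zip(order, z):
    print(labels[label_index], [round(v, 2) for v in z_row])
```

Without standardization the second attribute would dominate both the color scale of the data matrix plot and every distance in `d`, because its raw values are about a hundred times larger.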
© Tan,Steinbach, Kumar Introduction to Data Mining
Visualization of the Iris Data Matrix
[Figure: matrix plot of the Iris data; color scale annotated "standard deviation or score?"]
Star Plots
– Similar approach to parallel coordinates, but axes
radiate from a central point
– The line connecting the values of an object is a
polygon
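The geometry of a star plot is simple to compute: each attribute gets an axis at an evenly spaced angle, and the scaled value sets the distance from the center. The sketch below is illustrative only. The function name, the min-max scaling to the range 0 to 1, and the sample values and ranges are assumptions for this example.

```python
import math


def star_polygon(values, mins, maxs):
    """Return the (x, y) vertices of the star-plot polygon for one object."""
    n = len(values)
    vertices = []
    for k, (v, lo, hi) in enumerate(zip(values, mins, maxs)):
        # Scale each attribute to [0, 1] so that all axes are comparable.
        r = (v - lo) / (hi - lo) if hi > lo else 0.0
        # Axis k radiates from the center at an evenly spaced angle.
        angle = 2 * math.pi * k / n
        vertices.append((r * math.cos(angle), r * math.sin(angle)))
    return vertices


# Illustrative four-attribute object with assumed attribute ranges.
flower = [5.1, 3.5, 1.4, 0.2]
mins = [4.3, 2.0, 1.0, 0.1]
maxs = [7.9, 4.4, 6.9, 2.5]

for x, y in star_polygon(flower, mins, maxs):
    print(round(x, 3), round(y, 3))
```

Connecting the vertices in order and closing the path back to the first vertex gives the polygon for that object. Drawing one polygon per object makes differences in shape easy to compare.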
[Figure: panels labeled Setosa, Versicolour, and Virginica]
OLAP