Data Mining: Identify and Characterize A Data Set
• The data set records the strength of sonar returns from underwater objects at
different angles.
• By using historical bottom-imaging data taken by sonar, we can monitor changes
in the sea bottom and predict whether an object is a mine.
• The knowledge extracted by data mining is greatly affected by the quality of
the data. Common data quality problems:
• Missing data
• Duplicate data
• Noisy data
DATA QUALITY ISSUES IN SONAR DATASET
• No missing values
• No duplicate values
• Possible noisy values caused by faulty sensors
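The checks behind the first two bullets can be sketched as follows. This is a minimal illustration, not the procedure used for the slides: the rows are hypothetical values for two attributes, and `count_missing` and `count_duplicates` are helper names introduced here.

```python
# Hypothetical rows of (attribute_1, attribute_2); None would mark a missing reading.
rows = [
    (0.02, 0.04),
    (0.05, 0.08),
    (0.03, 0.06),
]


def count_missing(rows):
    """Count the cells whose value is None."""
    return sum(value is None for row in rows for value in row)


def count_duplicates(rows):
    """Count the rows that exactly repeat another row."""
    return len(rows) - len(set(rows))


missing = count_missing(rows)
duplicates = count_duplicates(rows)
print(missing, duplicates)
```

A count of zero for both matches the "no missing values" and "no duplicate values" findings; noise from faulty sensors cannot be detected by counting and needs a smoothing step instead.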
Data integration
Data reduction
Data compression
Data transformation
Data discretization
DATA PRE-PROCESSING FOR ATTRIBUTE 1 AND ATTRIBUTE 2
• Data cleaning
• Data integration
• Data reduction
• Data transformation
• Data discretization
EXAMPLE
MISSING VALUE
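One common way to handle a missing value is to replace it with the mean of the observed values. The sketch below assumes that approach; the slides do not state which method was used, and the values and the function name `fill_missing_with_mean` are hypothetical.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical readings for attribute 1, with one missing entry.
attribute_1 = [0.25, None, 0.75, 0.5]
filled = fill_missing_with_mean(attribute_1)
print(filled)  # [0.25, 0.5, 0.75, 0.5]
```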
NORMALIZING
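Normalizing rescales an attribute to a common range so that attributes with large values do not dominate. The sketch below assumes min-max normalization to [0, 1], which is one common choice; the slides do not say which method was used, and the values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no range to scale by.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical readings for attribute 2.
attribute_2 = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(attribute_2)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```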
RESULT
SMOOTH NOISY DATA
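Noisy values can be smoothed by binning: sort the values, split them into equal-size bins, and replace each value with the mean of its bin. The sketch below assumes this bin-means technique; the slides do not state which smoothing method was used, and the readings are hypothetical.

```python
from statistics import mean


def smooth_by_bin_means(values, bin_size):
    """Sort the values, group them into bins of bin_size, and replace
    every value with the mean of its bin. The result is in sorted order."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


# Hypothetical noisy readings.
readings = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(readings, 3)
print(smoothed)
```

With a bin size of 3 the bins are (4, 8, 15), (21, 21, 24) and (25, 28, 34), so the values become 9, 22 and 29 respectively. Larger bins smooth more aggressively but discard more detail.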
RESULT
REMOVE DUPLICATE DATA
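Removing duplicates means keeping only the first occurrence of each identical row. A minimal sketch, assuming rows are compared on all attributes; the rows and the function name `remove_duplicate_rows` are hypothetical.

```python
def remove_duplicate_rows(rows):
    """Return the rows with exact repeats dropped, keeping first occurrences in order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical rows of (attribute_1, attribute_2); the third repeats the first.
rows = [(0.02, 0.04), (0.05, 0.08), (0.02, 0.04)]
unique_rows = remove_duplicate_rows(rows)
print(unique_rows)  # [(0.02, 0.04), (0.05, 0.08)]
```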
RESULT
SUMMARY
• What is the data about?
• What type of benefit might you hope to get from data mining?
• What type of data mining do you think would be relevant?
• Discuss data quality issues
• For at least two attributes, discuss data pre-processing and give an
example of how it would be done.