Data Mining: Identify and Characterize A Data Set

The document discusses a sonar dataset containing signals bounced off rocks and mines underwater. It identifies predictive classification as the relevant data mining task given the labeled classes. It also discusses issues like noisy attributes and pre-processing techniques like data cleaning, normalization, smoothing and removing duplicates that could be applied to attributes before modeling.

Uploaded by

Ushna Khalid
Copyright
© All Rights Reserved

DATA MINING

IDENTIFY AND CHARACTERIZE A DATA SET


MANAHIL NASIR 2016-CS-610
HADIA ILYAS 2016-CS-615
ZEESHAN ASHRAF 2016-CS-633
NUMAN ASLAM 2016-CS-635
USHNA KHAIQ 2016-CS-658
WHAT IS THE DATA ABOUT?
SONAR

• Sonar stands for Sound Navigation and Ranging

• A sound propagation technology

• Used in underwater navigation, communication and detection of submarine objects
SONAR DATASET
• Sonar signals bounced off a rock or a mine under water

• Each pattern records the strength of the sonar returns from an underwater object at different angles

• Signals were obtained from a variety of aspect angles, spanning:
  - 180 degrees for the rock
  - 90 degrees for the mine


DATA VARIABLES
• Number of examples: 208

• Input variables: 60 (numeric)
  - Range: 0.0 to 1.0
  - Each represents the energy within a particular frequency band

• Output variable: 1 (nominal)
  - Rock: 97 patterns obtained by bouncing signals off rocks
  - Mine: 111 patterns obtained by bouncing signals off mines
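The figures above can be checked directly against the raw data. The sketch below assumes the data is stored as a plain CSV file in which each line holds the 60 input values followed by a class label (`R` for rock, `M` for mine), as in the UCI "Connectionist Bench (Sonar, Mines vs. Rocks)" distribution. The helpers `parse_rows`, `load_sonar` and `describe` are hypothetical names used for illustration, not part of any library.

```python
import csv
from collections import Counter

# Assumption: each line holds 60 comma-separated energy values followed by a
# class label ("R" = rock, "M" = mine), as in the UCI sonar distribution.

def parse_rows(lines):
    """Parse an iterable of CSV lines into (features, labels)."""
    features, labels = [], []
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        features.append([float(v) for v in row[:-1]])  # numeric inputs
        labels.append(row[-1].strip())                  # nominal output
    return features, labels

def load_sonar(path):
    """Read a sonar CSV file from disk (the path is supplied by the caller)."""
    with open(path, newline="") as f:
        return parse_rows(f)

def describe(features, labels):
    """Summarize the properties listed under DATA VARIABLES."""
    values = [v for row in features for v in row]
    return {
        "examples": len(features),
        "inputs": len(features[0]) if features else 0,
        "min": min(values) if values else None,
        "max": max(values) if values else None,
        "classes": dict(Counter(labels)),
    }
```

Assuming the UCI file has been saved locally as `sonar.all-data`, calling `describe(*load_sonar("sonar.all-data"))` should report 208 examples and 60 inputs, with every value between 0.0 and 1.0, plus the number of patterns in each class.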
WHAT TYPE OF BENEFIT MIGHT YOU HOPE TO GET FROM DATA MINING?
RECOGNIZING SURFACE TYPE
• Recognizing the type of surface

• Determining whether a signal was bounced off an ordinary object or a submarine

• Helping in modern warfare


DEPTH OF OCEAN
• Finding the depth of the ocean floor

• Predicting whether the depth of the ocean will increase or decrease


OBJECT TYPES
• Finding out what kinds of objects lie on the ocean floor

• Predicting the type of an object on the ocean floor


DISASTER PREVENTION
• Data mining techniques have been applied to sonar signals for disaster prevention

• Using historical sonar images of the ocean floor, we can monitor changes in the bottom and predict:
  - whether a disaster will occur
  - how large it will be
WHAT TYPE OF DATA MINING DO YOU THINK WOULD BE RELEVANT?
DATA MINING TASKS

There are two types of data mining tasks:

• Predictive data mining

• Descriptive data mining


PREDICTIVE VS DESCRIPTIVE DATA MINING (CONT.)

PREDICTIVE DATA MINING
• Describes what can happen in the future with the help of past data analysis
• Produces results that do not guarantee accuracy

DESCRIPTIVE DATA MINING
• Identifies what happened in the past by analyzing stored data
• Provides accurate data
PREDICTIVE VS DESCRIPTIVE DATA MINING

PREDICTIVE DATA MINING
• Answers questions such as:
  - What will happen next?
  - What is the outcome if these trends continue?
  - What actions are required to be taken?
• Uses supervised learning functions

DESCRIPTIVE DATA MINING
• Answers questions such as:
  - What happened?
  - Where exactly is the problem?
  - What is the frequency of the problem?
• Uses unsupervised learning functions


DATA MINING MODEL AND TASK
DATA MINING RELEVANT TO SONAR DATASET

• The dataset is labeled and contains two classes:
  - Rock
  - Mine

• The task is to predict the type of object (rock or mine) from the given data

• Predictive data mining
  - Classification
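Classification can be illustrated with a very small supervised model. The sketch below uses k-nearest neighbours (k-NN) with Euclidean distance, chosen here only because it needs no external libraries; it is an assumed example of a predictive method, not the model the slides prescribe. Each pattern is a list of the 60 energy values and each label is a class name such as `R` or `M`.

```python
import math
from collections import Counter

# Hypothetical example: k-nearest neighbours is used here only as one simple
# supervised classifier; the slides do not name a specific algorithm.

def euclidean(a, b):
    """Euclidean distance between two patterns of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training patterns."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of held-out patterns whose predicted label matches the true one."""
    correct = sum(knn_predict(train_X, train_y, q, k) == t
                  for q, t in zip(test_X, test_y))
    return correct / len(test_y)
```

In practice the 208 labeled patterns would be split into training and test sets (or cross-validated) before calling `accuracy`. An odd `k` avoids tied votes between the two classes.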
DISCUSS DATA QUALITY ISSUES
DATA QUALITY ISSUES
• Data quality is a major concern in data mining and knowledge extraction

• The knowledge extracted by data mining is greatly affected by the quality of the data

• There are three major data quality problems:
  - Missing data
  - Duplicate data
  - Noisy data
DATA QUALITY ISSUES IN SONAR DATASET

• No missing values

• No duplicate values

• 20% of the attributes are noisy

• Heavy noise due to:
  - background interference
  - faulty sensors
  - detection targets located nearly out of range
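The missing-value and duplicate checks reported above can be reproduced with a few lines of code. This sketch assumes rows are read as lists of strings (for example from `csv.reader`) and treats an empty cell or a `?` as missing, a common convention that is an assumption here. It does not attempt to measure how noisy the attributes are.

```python
from collections import Counter

# Assumption: rows are lists of strings as read by csv.reader, and an empty
# cell or "?" marks a missing value.
MISSING_MARKERS = {"", "?"}

def count_missing(rows):
    """Number of cells whose value is a missing-value marker."""
    return sum(1 for row in rows for v in row if v.strip() in MISSING_MARKERS)

def count_duplicates(rows):
    """Number of rows that exactly repeat an earlier row."""
    counts = Counter(tuple(row) for row in rows)
    return sum(n - 1 for n in counts.values())
```

On a clean file both functions would return 0, matching the "no missing values" and "no duplicate values" findings.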


FOR AT LEAST TWO ATTRIBUTES, DISCUSS DATA PRE-PROCESSING, AND GIVE AN EXAMPLE OF HOW IT WOULD BE DONE
DATA PRE-PROCESSING
• The major tasks in data pre-processing are:
  - Data cleaning
  - Data integration
  - Data reduction
  - Data compression
  - Data transformation
  - Data discretization
DATA PRE-PROCESSING FOR ATTRIBUTE 1 AND ATTRIBUTE 2

• Data cleaning

• Data integration

• Data reduction

• Data transformation

• Data discretization
EXAMPLE
MISSING VALUE
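One common way to handle missing values in a numeric attribute, such as attribute 1 or attribute 2, is to replace each gap with the mean of the known values. The sketch below assumes missing entries are represented as `None`; since the data quality check found no missing values in the sonar data, on the real dataset it would return the column unchanged. Filling with the class-wise mean (rock or mine) is a frequently used alternative.

```python
# Assumption: missing entries in a numeric attribute column are stored as None.
def fill_missing_with_mean(column):
    """Replace each None in one attribute column with the mean of the known values."""
    known = [v for v in column if v is not None]
    if not known:
        raise ValueError("attribute has no known values to average")
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]
```

For instance, `fill_missing_with_mean([0.25, None, 0.75])` returns `[0.25, 0.5, 0.75]`.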
NORMALIZING
RESULT
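Min-max normalization is assumed here as the normalization method: it rescales an attribute linearly with v' = (v - min) / (max - min) * (new_max - new_min) + new_min, so the smallest value maps to `new_min` and the largest to `new_max`. The sonar inputs already lie between 0.0 and 1.0, but an individual attribute such as attribute 1 need not span that whole range, so each attribute is rescaled separately. `min_max_normalize` is a hypothetical helper, not the authors' code; z-score standardization would be another option.

```python
def min_max_normalize(column, new_min=0.0, new_max=1.0):
    """Linearly rescale one attribute column to the range [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant attribute: avoid dividing by zero
        return [new_min for _ in column]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in column]
```

For example, `min_max_normalize([2.0, 4.0, 6.0])` returns `[0.0, 0.5, 1.0]`.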
SMOOTH NOISY DATA
RESULT
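Smoothing by bin means is one standard way to reduce noise in a single attribute: sort the values, partition them into equal-frequency bins, and replace every value with the mean of its bin. The slides do not say which smoothing method was used, so this technique is an assumption. The hypothetical helper below returns the smoothed values in the original row order so they can be written back into attribute 1 or attribute 2.

```python
def smooth_by_bin_means(column, bin_size):
    """Smooth one attribute by equal-frequency binning and bin means.

    Values are sorted, split into consecutive bins of `bin_size` values (the
    last bin may be smaller), and each value is replaced by its bin's mean.
    The result is returned in the original row order.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    order = sorted(range(len(column)), key=lambda i: column[i])
    smoothed = [0.0] * len(column)
    for start in range(0, len(order), bin_size):
        members = order[start:start + bin_size]
        mean = sum(column[i] for i in members) / len(members)
        for i in members:
            smoothed[i] = mean
    return smoothed
```

For example, with `bin_size=3` the values 4, 8, 15, 21, 21, 24, 25, 28, 34 become three copies each of 9.0, 22.0 and 29.0. Larger bins smooth more aggressively but blur more genuine variation.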
REMOVE DUPLICATE DATA
RESULT
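Removing duplicates keeps the first occurrence of each identical row and drops the rest. In the sketch below (a hypothetical helper) whole rows are compared by default; passing `columns`, for example `[0, 1]` for attribute 1 and attribute 2, compares only those positions instead. Because the slides report no duplicate values in the sonar data, a whole-row comparison would be expected to return all rows unchanged.

```python
def remove_duplicates(rows, columns=None):
    """Return rows with repeats removed, keeping the first occurrence.

    If `columns` (a list of column indices) is given, only those columns are
    compared; otherwise the whole row is compared.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row) if columns is None else tuple(row[c] for c in columns)
        if key not in seen:  # first time this key appears: keep the row
            seen.add(key)
            unique.append(row)
    return unique
```

Keeping the first occurrence preserves the original row order, which matters if rows are later matched back to their source.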
SUMMARY
• What is the data about?
• What type of benefit might you hope to get from data mining?
• What type of data mining do you think would be relevant?
• Discuss data quality issues
• For at least two attributes, discuss data pre-processing, and give an example of how it would be done
