Preprocessing_1

These are preprocessing slides

Uploaded by

Fareeha Butt
Attribute Types in Machine Learning
By Dr. Adven
What are Attributes?
• Attributes, also known as features or variables, are the
characteristics or properties of the data in machine learning. They are
crucial because they define the type of information collected and
determine the kind of analysis that can be performed.
Types of Attributes
Nominal Attributes
• Nominal attributes represent categories or names.
• Examples: Gender (Male, Female), Hair Color (Black, Brown, Blonde),
Nationality (American, Canadian).
Ordinal Attributes
• Ordinal attributes represent categories with a meaningful order but no
consistent difference between them.
• Examples: Education Level (High School, Bachelor's, Master's, PhD),
Customer Satisfaction (Very Dissatisfied, Dissatisfied, Neutral, Satisfied,
Very Satisfied).
Interval Attributes
• Interval attributes represent numerical values with meaningful
intervals but no true zero point.
• Examples: Temperature in Celsius or Fahrenheit, Calendar dates (e.g.,
years 2000, 2001, 2002).
Ratio Attributes
• Ratio attributes represent numerical values with meaningful intervals
and a true zero point.
• Examples: Height, Weight, Age, Salary, Distance.
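The practical difference between interval and ratio attributes is whether a ratio of two values means anything. The sketch below illustrates this with made-up temperatures and weights; the helper name `celsius_to_kelvin` and all values are illustrative assumptions, not part of the slides.

```python
# Illustrative values only; none of these numbers come from the slides.

def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15

t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
diff_c = t2_c - t1_c  # a 10-degree difference

# The ratio on the Celsius scale is arithmetically 2.0, but it is not
# meaningful: 0 degrees Celsius is an arbitrary zero, so 20 degrees is
# not "twice as hot" as 10 degrees.
naive_ratio = t2_c / t1_c

# Kelvin has a true zero, so the ratio is meaningful there.
kelvin_ratio = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)

# Weight is a ratio attribute: 80 kg really is twice 40 kg.
w1_kg, w2_kg = 40.0, 80.0
weight_ratio = w2_kg / w1_kg

print(diff_c, naive_ratio, round(kelvin_ratio, 4), weight_ratio)
```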
Binary Attributes
• Binary attributes have only two categories or states.
• Examples: Yes/No, True/False, Male/Female, Pass/Fail.
Discrete Attributes
• Discrete attributes have a finite or countably infinite number of values.
• Examples: Number of children, Number of cars owned, Shoe size.
Continuous Attributes
• Continuous attributes can take any real value within a range, so they
have infinitely many possible values.
• Examples: Temperature, Height, Weight, Time.
Role of Attributes in Data Mining
• Attributes play a crucial role in data mining tasks, such as:
1. Classification: Attributes serve as input features to classify data into
predefined categories.
2. Clustering: Attributes help in grouping data points into clusters based
on similarity.
3. Association Rule Mining: Attributes are used to find relationships
between different variables in the dataset.
4. Regression: Attributes are used to predict a continuous target
variable.
Attribute Transformation
• Normalization: Scaling attributes to a specific range, often [0, 1] or
[-1, 1].
• Standardization: Transforming attributes to have a mean of zero and
a standard deviation of one.
• Discretization: Converting continuous attributes into discrete
attributes.
• Encoding: Converting categorical attributes into numerical formats,
such as one-hot encoding.
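The four transformations above can be sketched with the Python standard library alone. This is a minimal illustration, not a reference implementation: the function names, the choice of population standard deviation, the user-supplied bin edges, and the alphabetical category order are all assumptions made for the example.

```python
from bisect import bisect_right
from statistics import mean, pstdev

def min_max_normalize(values, low=0.0, high=1.0):
    """Scale values linearly into [low, high]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:  # all values identical: map everything to `low`
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / span for v in values]

def standardize(values):
    """Shift to mean 0 and scale to standard deviation 1.

    Assumption: population standard deviation (pstdev) is used.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def discretize(values, edges, labels):
    """Map each continuous value to a labelled bin.

    `edges` must be sorted, and len(labels) must equal len(edges) + 1.
    A value equal to an edge falls into the bin above that edge.
    """
    return [labels[bisect_right(edges, v)] for v in values]

def one_hot(values):
    """Return (categories, rows) with one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

print(min_max_normalize([10, 20, 30]))
print(standardize([2, 4, 4, 4, 5, 5, 7, 9]))
print(discretize([12, 30, 75], [30, 60], ["low", "mid", "high"]))
print(one_hot(["red", "blue", "red"]))
```

One-hot encoding suits nominal attributes because it imposes no order on the categories; for ordinal attributes, an integer code that preserves the order may be a better fit.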
Data Cleaning
• Data cleaning and preprocessing in machine learning involve several
steps and techniques, each explained in turn:
1. Parsing: Parsing involves analyzing a string of symbols, in either
natural or computer languages, to understand its structure. In data
mining, it refers to breaking data down into its components, such as
splitting a full name into first and last names or extracting date
components from a timestamp.
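Both parsing examples can be sketched in a few lines. The function names and the timestamp format are assumptions for illustration, and the name splitter uses a deliberately simple rule (last token is the last name) that will not suit every naming convention.

```python
from datetime import datetime

def split_full_name(full_name):
    """Split a full name into (first, last).

    Simplifying assumption: the last whitespace-separated token is the
    last name, and everything before it is the first name.
    """
    parts = full_name.strip().split()
    if not parts:
        return ("", "")
    if len(parts) == 1:
        return (parts[0], "")
    return (" ".join(parts[:-1]), parts[-1])

def parse_timestamp(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Extract date components from a timestamp string.

    The default format is an assumption; pass `fmt` for other layouts.
    """
    dt = datetime.strptime(text, fmt)
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
    }

print(split_full_name("  Ada Lovelace "))
print(parse_timestamp("2024-03-15 14:30:00"))
```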
2. Correcting: Correcting refers to fixing errors or inaccuracies in the data.
• This step involves identifying and correcting incorrect data entries, such
as typos, spelling errors, or logical inconsistencies (e.g., a future date for
a past event).
3. Standardizing: Standardizing is the process of bringing data into a
common format or structure.
• Standardizing ensures consistency in data format across the dataset. For
example, converting all dates to a single format (e.g., "YYYY-MM-DD") or
standardizing measurement units (e.g., converting all weights to
kilograms).
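A minimal sketch of the two standardizing examples follows. The list of accepted input date formats, the day-first reading of slash-separated dates, and the unit table are assumptions chosen for illustration; a real dataset would need its own list.

```python
from datetime import datetime

# Assumed input formats, tried in order. Slash and dot dates are read
# day-first, which is an assumption about the data source.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

# Conversion factors to kilograms for a few assumed unit labels.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def standardize_date(text):
    """Return the date as 'YYYY-MM-DD', or None if no format matches."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None

def weight_to_kg(value, unit):
    """Convert a weight to kilograms; raises KeyError for unknown units."""
    return value * TO_KG[unit.strip().lower()]

print(standardize_date("15/03/2024"))
print(standardize_date("15.03.2024"))
print(weight_to_kg(10, "lb"))
```

Returning `None` for unrecognized dates keeps bad entries visible so they can be handled in the correcting step rather than silently guessed.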
4. Dealing with Missing Values: Handling missing values involves
strategies to address gaps in the data.
• Techniques include:
• Imputation: Replacing missing values with estimated ones, such as
mean, median, or mode.
• Deletion: Removing records or variables with missing data when the
affected portion is small or insignificant.
• Prediction: Using algorithms to predict missing values based on other
available data.
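Imputation and deletion can be sketched as follows, assuming missing values are represented as `None`. The function names are hypothetical, and prediction-based filling is left out because it depends on the choice of model.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the rest."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    estimators = {"mean": mean, "median": median, "mode": mode}
    fill = estimators[strategy](observed)
    return [fill if v is None else v for v in values]

def drop_incomplete(records):
    """Keep only records (dicts) in which no field is None."""
    return [r for r in records if all(v is not None for v in r.values())]

print(impute([1.0, None, 3.0]))                   # mean of observed values
print(impute([1, None, 2, 10], "median"))         # median resists the outlier 10
print(impute(["a", None, "b", "a"], "mode"))      # mode works for categories
print(drop_incomplete([{"a": 1, "b": None}, {"a": 2, "b": 3}]))
```

Mean and median apply only to numerical attributes, while the mode also works for nominal ones.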
5. Dealing with Noisy Data: Noisy data refers to data that contains
errors, outliers, or irrelevant information.
• Methods to handle noisy data include:
• Smoothing Techniques: Such as binning, clustering, or regression, to
reduce noise.
• Outlier Detection: Identifying and treating data points that deviate
significantly from the rest of the data.
• Filtering: Removing or transforming data that is irrelevant or
contributes to noise in the dataset.
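Two of these methods can be sketched briefly: smoothing by bin means (one form of binning) and outlier detection with the interquartile-range rule. The slides do not name a specific outlier test, so the IQR rule, the multiplier `k=1.5`, the function names, and the sample data are all assumptions for illustration.

```python
from statistics import mean, quantiles

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into equal-frequency bins of
    `bin_size`, and replace each value with the mean of its bin.

    The result is returned in sorted order.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    Uses statistics.quantiles with its default ('exclusive') method.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))

readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))
```

Whether a flagged point is removed, capped, or kept is a judgment call that depends on the data, since an outlier may be a genuine observation rather than an error.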
