classification basic concept.data mining

The document provides an overview of data mining, focusing on machine learning and classification techniques. It explains the types of learning, including supervised and unsupervised learning, and details the classification process, its applications, and various techniques used. Additionally, it discusses the advantages and disadvantages of classification in data mining.

Uploaded by

ssri62439

CLASSIFICATION: BASIC CONCEPTS IN DATA MINING
PRESENTED BY
JAYASRI.A
VAISHNAVI.E
FELIX RAJ.A
BALACHANDAR.R
Machine learning

• Machine learning is a subfield of artificial intelligence, which is broadly defined as the capability of a machine to imitate intelligent human behavior. Artificial intelligence systems are used to perform complex tasks in a way that is similar to how humans solve problems.
• TYPES OF LEARNING:
  • SUPERVISED LEARNING
  • UNSUPERVISED LEARNING
Supervised learning

o It is defined by its use of labeled datasets to train algorithms to
classify data or predict outcomes accurately.
o Supervised learning uses a training set to teach models to yield the
desired output.
o This training dataset includes inputs and correct outputs, which allow
the model to learn over time.
o The algorithm measures its accuracy through the loss function, adjusting
until the error has been sufficiently minimized.
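The loop of measuring a loss and adjusting until the error is small can be sketched as follows. This is a minimal illustration, not a method taken from the slides: the toy data, the one-weight model, the mean squared error loss, and the names `mse_loss` and `train` are all assumptions chosen for brevity.

```python
# Toy labeled training set: inputs with their known correct outputs (here y = 2x).
# These values are made up for illustration.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]

def mse_loss(weight, xs, ys):
    """Mean squared error of the model y_hat = weight * x."""
    return sum((weight * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, learning_rate=0.01, tolerance=1e-6, max_steps=10_000):
    """Adjust the weight by gradient descent until the loss is small enough."""
    weight = 0.0
    for _ in range(max_steps):
        if mse_loss(weight, xs, ys) < tolerance:
            break
        # Gradient of the mean squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        weight -= learning_rate * gradient
    return weight

learned_weight = train(inputs, targets)
final_loss = mse_loss(learned_weight, inputs, targets)
```

Here the labeled pairs play the role of the training set, and the loop stops once the loss falls below the chosen tolerance.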
Unsupervised learning

o Unsupervised learning, also known as unsupervised machine
learning, uses machine learning algorithms to analyze and cluster
unlabeled datasets.
o These algorithms discover hidden patterns or data groupings without
the need for human intervention.
o Its ability to discover similarities and differences in information makes it
the ideal solution for exploratory data analysis, cross-selling strategies,
customer segmentation, and image recognition.
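As one possible illustration of clustering unlabeled data, the sketch below runs a small one-dimensional k-means. The slides do not name a specific algorithm, so k-means, the function name `k_means_1d`, the starting centers, and the spend values are all assumptions.

```python
def k_means_1d(values, centers, iterations=10):
    """Group 1-D values around the nearest of the given starting centers."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to the index of its closest center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled data: customer spend amounts with no class labels.
spend = [12, 15, 14, 90, 95, 88]
centers, clusters = k_means_1d(spend, centers=[0.0, 100.0])
```

No labels are supplied at any point; the two groups emerge only from the distances between the values.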
Types of supervised learning

• Classification
  • Classification is a technique that aims to reproduce known class assignments. The predicted response is a category, so the data is separated into “classes”.
• Regression
  • Regression applies to continuous data (value functions). The predicted output values are real numbers. It deals with problems such as predicting the price of a house or the trend of a stock price at a given time.
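The difference between the two shows up in the type of the output. The sketch below is only illustrative: the function names, the size threshold, and the price coefficients are hypothetical values, not figures from the slides.

```python
def classify_house(area_sq_m, threshold=100.0):
    """Classification: the output is one of a fixed set of class labels."""
    return "large" if area_sq_m >= threshold else "small"

def predict_price(area_sq_m, price_per_sq_m=1500.0, base_price=20000.0):
    """Regression: the output is a real number."""
    return base_price + price_per_sq_m * area_sq_m

house_class = classify_house(120.0)   # a category
house_price = predict_price(120.0)    # a continuous value
```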
Data analytics

• Data analytics focuses on processing and performing statistical analysis of existing datasets. Analysts concentrate on creating methods to capture, process, and organize data to uncover actionable insights for current problems, and establishing the best way to present this data.
• TYPES:
  • Descriptive
  • Diagnostic
  • Predictive
  • Prescriptive
CLASSIFICATION

• Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data.
• Classification is a widely used technique in data mining and is applied in a variety of domains, such as email filtering, sentiment analysis, and medical diagnosis.
• Classification is the problem of identifying to which of a set of categories (subpopulations) a new observation belongs, on the basis of a training set of data containing observations whose category membership is known.
Examples of Classification

• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
Example: raw mango vs. ripe mango
Feature 1: Weight
Feature 2: Weight and color

Color          Weight   Label
[22-140-10]    300      Raw
[10-240-10]    330      Ripe
[12-235-10]    310      Ripe
[250-130-10]   307      Raw
[80-220-20]    333      Ripe

[Plots: the samples on a weight axis alone, and on color and weight axes together]
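To show how such a labeled table could be used, here is a minimal nearest-neighbor sketch over the five mango samples. Several things are assumptions: reading each bracketed color as a red-green-blue triple, the choice of a 1-nearest-neighbor rule (the slides do not say which classifier applies here), the function names, and the two query mangoes.

```python
# Training samples from the table: (color triple, weight, label).
# Reading the color as (red, green, blue) is an assumption.
mangoes = [
    ((22, 140, 10), 300, "Raw"),
    ((10, 240, 10), 330, "Ripe"),
    ((12, 235, 10), 310, "Ripe"),
    ((250, 130, 10), 307, "Raw"),
    ((80, 220, 20), 333, "Ripe"),
]

def squared_distance(color_a, weight_a, color_b, weight_b):
    """Squared Euclidean distance over the three color values plus the weight."""
    color_part = sum((a - b) ** 2 for a, b in zip(color_a, color_b))
    return color_part + (weight_a - weight_b) ** 2

def classify_mango(color, weight, samples=mangoes):
    """Return the label of the closest training sample (1-nearest neighbor)."""
    nearest = min(samples, key=lambda s: squared_distance(color, weight, s[0], s[1]))
    return nearest[2]

# Hypothetical new mangoes to classify.
first_guess = classify_mango((15, 230, 10), 320)
second_guess = classify_mango((30, 150, 10), 302)
```

The features are left unscaled to keep the sketch short; in practice, scaling color and weight to comparable ranges would usually be considered first.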
RAW DATASET CLASSIFICATION

Raw dataset -> Features/Label -> Learning Algorithm -> Model
Classification Techniques

• Decision tree based method
• Rule-based method
• Memory-based reasoning
• Neural networks
• Naive Bayes and Bayesian networks
• Support vector machines
• Linear Regression
Steps of classification

1. Learning Step (Training Phase): Construction of the classification model.
Different algorithms are used to build a classifier by making the model
learn from the available training set. The model has to be trained to
predict accurate results.
2. Classification Step: The model is used to predict class labels. The
constructed model is tested on test data to estimate the accuracy of the
classification rules.
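The two steps can be sketched with a very small classifier. Everything specific here is an assumption made for illustration: the nearest-class-mean rule, the function names, and the transaction amounts used as training and test data.

```python
def fit_class_means(samples):
    """Learning step: compute the mean feature value of each class."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(class_means, value):
    """Classification step: pick the class whose mean is closest to the value."""
    return min(class_means, key=lambda label: abs(value - class_means[label]))

def accuracy(class_means, samples):
    """Fraction of samples whose predicted label matches the known label."""
    correct = sum(predict(class_means, v) == label for v, label in samples)
    return correct / len(samples)

# Hypothetical data: (transaction amount, known label).
training_set = [(20, "legitimate"), (35, "legitimate"), (50, "legitimate"),
                (900, "fraudulent"), (1200, "fraudulent"), (1500, "fraudulent")]
test_set = [(40, "legitimate"), (1000, "fraudulent"),
            (700, "legitimate"), (30, "legitimate")]

model = fit_class_means(training_set)      # step 1: learn from the training set
test_accuracy = accuracy(model, test_set)  # step 2: estimate accuracy on test data
```

The test set is kept separate from the training set, so the accuracy reflects how the model handles examples it has not seen.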
Training and Testing

• The goal is to produce a trained (fitted) model that generalizes well to new, unknown data. The fitted model is evaluated using “new” examples from the held-out datasets (validation and test datasets) to estimate the model's accuracy in classifying new data.
• Example:
  • Suppose a person is sitting under a fan and the fan starts to fall on him; he should move aside to avoid getting hurt. This is his training: learning to move away. In testing, if the person sees a heavy object coming towards him or falling on him and moves aside, the system tests positive; if he does not move aside, the system tests negative.
ADVANTAGES

• Mining-based methods are cost-effective and efficient
• Helps in identifying criminal suspects
• Helps in predicting the risk of diseases
• Helps banks and financial institutions identify defaulters so
that they can decide whether to approve cards, loans, etc.
DISADVANTAGES

• Privacy: There are chances that a company may give some information about its customers to other vendors or use this information for its own profit.
• Accuracy Problem: An accurate model must be selected in order to get the best accuracy and results.
THANK YOU
