Data Mining Process, Techniques, Tools & Examples
What is Data Mining?
Data mining is the process of searching huge data sets for hidden, valid, and potentially useful patterns. Data mining is all about discovering unsuspected, previously unknown relationships among the data.
The insights derived from data mining can be used for marketing, fraud detection, scientific discovery, and more.
Types of Data
Data mining can be performed on the following types of data:
• Relational databases
• Data warehouses
• Advanced databases and information repositories
• Object-oriented and object-relational databases
• Transactional and spatial databases
• Heterogeneous and legacy databases
• Multimedia and streaming databases
• Text databases
• Text and Web data (text mining and Web mining)
Business understanding:
In this phase, business and data-mining goals are established.
• First, understand the business and client objectives. Define what your client wants (which, many times, even the client does not know).
• Take stock of the current data mining scenario. Factor resources, assumptions, constraints, and other significant factors into your assessment.
• Using the business objectives and the current scenario, define your data mining goals.
• A good data mining plan is detailed and should be developed to accomplish both the business and the data mining goals.
Data understanding:
In this phase, a sanity check is performed on the data to determine whether it is appropriate for the data mining goals.
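A sanity check of this kind can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function `sanity_report`, the sample rows, and the expected range for `weekly_sales` are all hypothetical. It counts missing values per column and numeric values outside an expected range, two quick signals of whether the data can support the mining goals.

```python
import math

def sanity_report(rows, numeric_ranges):
    """Per column, count missing (None) values and, for columns listed in
    numeric_ranges, values outside the expected [low, high] range."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        missing = sum(1 for v in values if v is None)
        out_of_range = 0
        if col in numeric_ranges:
            low, high = numeric_ranges[col]
            out_of_range = sum(
                1 for v in values if v is not None and not (low <= v <= high)
            )
        report[col] = {"missing": missing, "out_of_range": out_of_range}
    return report

# Hypothetical weekly sales extract; negative sales are treated as suspect.
sales = [
    {"store": "A", "weekly_sales": 1200.0},
    {"store": "B", "weekly_sales": None},
    {"store": "C", "weekly_sales": -50.0},
    {"store": None, "weekly_sales": 980.0},
]
print(sanity_report(sales, {"weekly_sales": (0.0, math.inf)}))
# -> {'store': {'missing': 1, 'out_of_range': 0}, 'weekly_sales': {'missing': 1, 'out_of_range': 1}}
```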
Data preparation:
In this phase, the data is made production-ready.
The data preparation process can consume about 90% of the project's time.
Data cleaning is a process to "clean" the data by smoothing noisy data and filling in missing values.
For example, in a customer demographics profile, age data may be missing; the data is incomplete and the gaps should be filled. In some cases, there could be outliers, for instance an age value of 300. The data could also be inconsistent, for instance the same customer's name being spelled differently in different tables.
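The three problems in this example (a missing age, an impossible age of 300, and inconsistent name spellings) can be handled with a simple cleaning pass. The sketch below is one possible strategy, assuming rows are plain Python dicts: it treats ages outside a hypothetical plausible range (`MAX_PLAUSIBLE_AGE = 120`) as missing, fills them with the median of the valid ages, and normalizes names by trimming whitespace and title-casing. Real projects may prefer other imputation methods or record-matching rules.

```python
from statistics import median

# Hypothetical customer table with the three problems described above.
customers = [
    {"id": 1, "name": "Jane Doe", "age": 34},
    {"id": 2, "name": "john smith", "age": None},   # missing value
    {"id": 3, "name": "  JANE DOE ", "age": 300},   # outlier + inconsistent name
    {"id": 4, "name": "Ali Khan", "age": 51},
]

MAX_PLAUSIBLE_AGE = 120  # assumed cutoff; choose per domain

def is_valid_age(age):
    return age is not None and 0 <= age <= MAX_PLAUSIBLE_AGE

def clean(rows):
    # Compute the fill value from valid ages only, so the outlier cannot skew it.
    fill_age = median(r["age"] for r in rows if is_valid_age(r["age"]))
    cleaned = []
    for r in rows:
        age = r["age"] if is_valid_age(r["age"]) else fill_age
        # Normalize names: trim, collapse inner whitespace, title-case.
        name = " ".join(r["name"].split()).title()
        cleaned.append({**r, "name": name, "age": age})
    return cleaned

for row in clean(customers):
    print(row)
```

Running it fills both the missing age and the age of 300 with 42.5 (the median of 34 and 51), and both spellings of Jane Doe become identical.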
Data transformation:
Data transformation operations change the data to make it useful in data mining, and they contribute toward the success of the mining process. The following transformations can be applied:
Aggregation: Summary or aggregation operations are applied to the data. For example, weekly sales data is aggregated to calculate monthly and yearly totals.
Attribute construction: New attributes that are helpful for data mining are constructed from the given set of attributes and added to it.
The result of this process is a final data set that can be used in modeling.
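The aggregation and attribute-construction steps described above can be illustrated with a small sketch. The weekly figures and function names are hypothetical, and it makes a simplifying assumption: each week is counted in the month in which it starts (a real calendar would need to split weeks that straddle two months). The constructed attribute, each month's share of its year's sales, is only one example of a derived attribute.

```python
from collections import defaultdict
from datetime import date

# Hypothetical weekly sales, keyed by the date each week starts.
weekly_sales = [
    (date(2023, 1, 2), 1200.0),
    (date(2023, 1, 9), 900.0),
    (date(2023, 2, 6), 1500.0),
    (date(2024, 1, 1), 1100.0),
]

def aggregate(weekly):
    """Aggregation: roll weekly totals up to monthly and yearly totals.
    Simplification: a week counts toward the month it starts in."""
    monthly = defaultdict(float)
    yearly = defaultdict(float)
    for week_start, amount in weekly:
        monthly[(week_start.year, week_start.month)] += amount
        yearly[week_start.year] += amount
    return dict(monthly), dict(yearly)

def construct_share_of_year(monthly, yearly):
    """Attribute construction: derive a new attribute, each month's share
    of its year's sales, from the aggregated attributes."""
    return [
        {"year": y, "month": m, "total": t, "share_of_year": t / yearly[y]}
        for (y, m), t in sorted(monthly.items())
    ]

monthly, yearly = aggregate(weekly_sales)
print(yearly)  # -> {2023: 3600.0, 2024: 1100.0}
for row in construct_share_of_year(monthly, yearly):
    print(row)
```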
Modelling:
In this phase, mathematical models are used to determine data patterns.
Evaluation:
In this phase, the identified patterns are evaluated against the business objectives.
• Results generated by the data mining model should be evaluated against the business objectives.
• Gaining business understanding is an iterative process. In fact, new business requirements may arise as a result of data mining.
• A go/no-go decision is made on moving the model into the deployment phase.
2. Clustering:
Clustering analysis is a data mining technique that identifies data points that are similar to each other. This process helps to understand the differences and similarities between the data.
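The text does not name a specific clustering algorithm; k-means is one widely used choice, and the following is a minimal pure-Python sketch of it. The customer points (annual spend and visits per month) are made up, and seeding the centroids with the first `k` points is a simplification chosen to keep the result deterministic; libraries typically use smarter seeding such as k-means++.

```python
import math

def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means. Simplification: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers: (annual spend in $1,000s, store visits per month).
points = [(1, 2), (1.5, 1.8), (8, 8), (9, 9.5), (1, 0.6), (9, 11)]
labels, centroids = kmeans(points, k=2)
print(labels)  # -> [0, 0, 1, 1, 0, 1]
```

The low-spend, infrequent customers end up in cluster 0 and the high-spend, frequent ones in cluster 1, even though both seed points started in the same natural group.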
3. Regression:
Regression analysis is the data mining method of identifying and analyzing the relationship between variables. It is used to predict the value of a specific variable given the values of other variables.
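For a single predictor, the simplest form of regression is an ordinary least-squares line, y = a + b·x, with slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept a = ȳ - b·x̄. The sketch below implements that formula directly; the advertising-spend and sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data: advertising spend (x) vs. sales (y).
ad_spend = [1, 2, 3, 4, 5]
revenue = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_line(ad_spend, revenue)
print(round(a, 2), round(b, 2))  # -> 1.09 1.97
print(round(a + b * 6, 2))       # predicted sales at spend 6 -> 12.91
```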
4. Association Rules:
This data mining technique helps to find associations between two or more items. It discovers hidden patterns in the data set.
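Association rules are commonly scored by support (the fraction of transactions containing the items together) and confidence (how often the right-hand item appears when the left-hand item does). The sketch below is a simplified, pairs-only illustration of that idea, in the spirit of algorithms such as Apriori but not a full implementation of them; the basket data and both thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for single-item rules lhs -> rhs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

for lhs, rhs, support, confidence in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
# bread -> butter: support=0.60, confidence=0.75
# butter -> bread: support=0.60, confidence=1.00
# milk -> bread: support=0.40, confidence=0.67
```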
5. Outlier detection:
This type of data mining technique refers to the observation of data items in the dataset that do not match an expected pattern or expected behavior. This technique can be used in a variety of domains, such as intrusion detection, fraud detection, or fault detection. Outlier detection is also called outlier analysis or outlier mining.
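One simple, widely used way to flag outliers in a single numeric attribute is Tukey's fences: values below Q1 - 1.5·IQR or above Q3 + 1.5·IQR are flagged, where IQR = Q3 - Q1. The sketch below uses Python's `statistics.quantiles` for the quartiles; the transaction amounts are made up, and the multiplier 1.5 is a conventional default rather than a rule.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical card transaction amounts; one is far out of pattern.
amounts = [52, 48, 50, 47, 53, 49, 51, 500]
print(iqr_outliers(amounts))  # -> [500]
```

Because quartiles are barely affected by a single extreme value, this check is less easily masked by the outlier itself than a mean-and-standard-deviation rule on a small sample.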
6. Sequential Patterns:
This data mining technique helps to discover or identify similar patterns or trends in transaction data over a certain period.
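A minimal way to look for sequential patterns is to count how many customers bought item A at some point before item B. The sketch below does only that, for pairs of items; the purchase histories and the `frequent_ordered_pairs` helper are hypothetical, and full sequential-pattern algorithms (GSP and PrefixSpan, for example) handle longer sequences and time windows.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase histories, each already sorted by purchase date.
sequences = {
    "c1": ["phone", "case", "charger"],
    "c2": ["phone", "charger"],
    "c3": ["laptop", "mouse"],
    "c4": ["phone", "case"],
}

def frequent_ordered_pairs(sequences, min_support=0.5):
    """Return {(earlier, later): support} for item pairs that occur in that
    order in at least min_support of the customers' sequences."""
    counts = Counter()
    for seq in sequences.values():
        # combinations() keeps positional order, so (a, b) means a came before b;
        # the set counts each customer at most once per pair.
        counts.update(set(combinations(seq, 2)))
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(sorted(frequent_ordered_pairs(sequences).items()))
# -> [(('phone', 'case'), 0.5), (('phone', 'charger'), 0.5)]
```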
For example, a marketer might learn that their best customers are married women between the ages of 45 and 54 who earn more than $80,000 per year. Marketing efforts can then be targeted at that demographic.
A bank wants to find new ways to increase revenue from its credit card operations. It wants to check whether usage would double if fees were halved. The bank has multiple years of records on average credit card balances, payment amounts, credit limit usage, and other key parameters. It creates a model to check the impact of the proposed business policy. The results show that cutting fees in half for a targeted customer base could increase revenues by $10 million.
R language:
R is an open-source language and environment for statistical computing and graphics. It offers a wide variety of statistical techniques (classical statistical tests, time-series analysis, classification) and graphical techniques, as well as effective data handling and storage facilities.
Oracle Data Mining:
Oracle Data Mining, popularly known as ODM, is a component of Oracle Advanced Analytics, an option of Oracle Database. This data mining tool allows data analysts to generate detailed insights and make predictions. It helps predict customer behavior, develop customer profiles, and identify cross-selling opportunities.
Insurance: Data mining helps insurance companies price their products profitably and promote new offers to their new or existing customers.
Education: Data mining helps educators access student data, predict achievement levels, and find students or groups of students who need extra attention, for example, students who are weak in maths.
Manufacturing: With the help of data mining, manufacturers can predict the wear and tear of production assets. They can anticipate maintenance, which helps them minimize downtime.
Banking: Data mining helps the finance sector get a view of market risks and manage regulatory compliance. It helps banks identify probable defaulters when deciding whether to issue credit cards, loans, etc.
Retail: Data mining techniques help retail malls and grocery stores identify the most sellable items and arrange them in the most eye-catching positions. It helps store owners come up with offers that encourage customers to increase their spending.
Service Providers: Service providers such as mobile phone and utility companies use data mining to predict when and why a customer might leave. They analyze billing details, customer service interactions, and complaints made to the company to assign each customer a probability score of leaving and offer incentives.
E-Commerce: E-commerce websites use data mining to offer cross-sells and up-sells. One of the most famous examples is Amazon, which uses data mining techniques to get more customers into its e-commerce store.
Crime Investigation: Data mining helps crime investigation agencies decide where to deploy the police workforce (where is a crime most likely to happen, and when?), whom to search at a border crossing, etc.
Bioinformatics: Data mining helps mine biological data from the massive datasets gathered in biology and medicine.