What Is Data Mining

Data mining is the process of analyzing large amounts of data to discover patterns and relationships. It involves collecting data, organizing it in a database, and using software to identify patterns. The goal is to extract useful information that can help companies make better business decisions. Popular techniques include association rule mining, which analyzes customer purchasing habits, and classification, which sorts customers into groups. The typical data mining process involves understanding the business goals, preparing the data, building models to identify patterns, evaluating the results, and implementing changes based on the findings. Data mining is widely used across industries for applications such as marketing, fraud detection, and customer segmentation.

Uploaded by

Amran Anwar

What Is Data Mining? How It Works, Benefits, Techniques, and Examples
https://www.investopedia.com/terms/d/datamining.asp

What Is Data Mining?

Data mining is a process used by companies to turn raw data into useful information. By
using software to look for patterns in large batches of data, businesses can learn more
about their customers to develop more effective marketing strategies, increase sales and
decrease costs. Data mining depends on effective data collection, warehousing, and
computer processing.

KEY TAKEAWAYS

• Data mining is the process of analyzing a large batch of information to discern trends
and patterns.

• Data mining can be used by corporations for everything from learning about what
customers are interested in or want to buy to fraud detection and spam filtering.

• Data mining programs break down patterns and connections in data based on what
information users request or provide.

• Social media companies use data mining techniques to commodify their users in
order to generate profit.

• This use of data mining has come under criticism lately as users are often unaware
of the data mining happening with their personal information, especially when it is
used to influence preferences.

How Data Mining Works
Data mining involves exploring and analyzing large blocks of information to glean
meaningful patterns and trends. It can be used in a variety of ways, such as database
marketing, credit risk management, fraud detection, spam email filtering, or even to discern
the sentiment or opinion of users.

The data mining process breaks down into five steps. First, organizations collect data and
load it into their data warehouses. Next, they store and manage the data, either on in-house
servers or in the cloud. Business analysts, management teams, and information
technology professionals access the data and determine how they want to organize it. Then,
application software sorts the data according to the user's criteria, and finally, the end user
presents the data in an easy-to-share format, such as a graph or table.
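The five steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real warehouse: a plain list stands in for the data warehouse, and the sales records, the "region" grouping, and the function names are all hypothetical.

```python
from collections import defaultdict

# Step 1: collect raw records (hypothetical sales data)
def collect():
    return [
        {"region": "North", "product": "coffee", "amount": 4.50},
        {"region": "South", "product": "tea", "amount": 3.00},
        {"region": "North", "product": "tea", "amount": 3.25},
        {"region": "South", "product": "coffee", "amount": 4.75},
        {"region": "North", "product": "coffee", "amount": 5.00},
    ]

# Step 2: store and manage the data (a plain list stands in for the warehouse)
warehouse = []
warehouse.extend(collect())

# Step 3: organize the data the way analysts decided (total sales per key)
def organize(records, key):
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return totals

# Step 4: sort according to the user's criteria (largest total first)
def sort_results(totals):
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Step 5: present the results in an easy-to-share format (a text table)
def present(rows):
    lines = [f"{'Region':<8}{'Sales':>8}"]
    for name, total in rows:
        lines.append(f"{name:<8}{total:>8.2f}")
    return "\n".join(lines)

report = present(sort_results(organize(warehouse, "region")))
print(report)
```

In a real deployment, steps 2 and 3 would typically live in a database or cloud warehouse and the presentation step in a reporting tool; the functions here only mark where each step happens.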

Data Warehousing and Mining Software

Data mining programs analyze relationships and patterns in data based on what users
request. For example, a company can use data mining software to create classes of
information. To illustrate, imagine a restaurant wants to use data mining to determine when
it should offer certain specials. It looks at the information it has collected and creates
classes based on when customers visit and what they order.
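To make the restaurant example concrete, the sketch below assigns each visit to a time-of-day class and counts what is ordered within each class. The visit log and the hour boundaries for "breakfast", "lunch", and "dinner" are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical visit log: (hour of visit, item ordered)
visits = [
    (8, "pancakes"), (9, "coffee"), (8, "coffee"),
    (12, "burger"), (13, "salad"), (12, "burger"),
    (19, "steak"), (20, "wine"), (19, "steak"),
]

def time_class(hour):
    """Assign a visit to a class based on when it happened (assumed boundaries)."""
    if hour < 11:
        return "breakfast"
    if hour < 16:
        return "lunch"
    return "dinner"

# Count orders within each class
orders_by_class = defaultdict(Counter)
for hour, item in visits:
    orders_by_class[time_class(hour)][item] += 1

# The most common item per class hints at which special to offer, and when
top_items = {cls: counts.most_common(1)[0][0] for cls, counts in orders_by_class.items()}
print(top_items)
```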

In other cases, data miners find clusters of information based on logical relationships or look
at associations and sequential patterns to draw conclusions about trends in consumer
behavior.

Warehousing is an important aspect of data mining. Warehousing is when companies
centralize their data into one database or program. With a data warehouse, an organization
may spin off segments of the data for specific users to analyze and use. However, in other
cases, analysts may start with the data they want and create a data warehouse based on
those specifications.
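One way to picture "spinning off segments" of a warehouse is a SQL view over a central table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a real data warehouse; the table, its columns, and the view name are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the central data warehouse
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, department TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Downtown", "bakery", 120.0),
        ("Downtown", "produce", 80.0),
        ("Airport", "bakery", 200.0),
        ("Airport", "produce", 50.0),
    ],
)

# Spin off a segment for the bakery team: they query the view, not the whole table
conn.execute(
    "CREATE VIEW bakery_sales AS "
    "SELECT store, amount FROM sales WHERE department = 'bakery'"
)

rows = conn.execute(
    "SELECT store, amount FROM bakery_sales ORDER BY amount DESC"
).fetchall()
print(rows)
conn.close()
```

A view keeps a single copy of the data in the central table while giving each group of users only the slice it needs.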

TIP: Cloud data warehouse solutions use the storage space and computing power of a cloud
provider to store data from data sources. This allows smaller companies to leverage digital
solutions for storage, security, and analytics.

Data Mining Techniques

Data mining uses algorithms and various techniques to convert large collections of data into
useful output. The most popular types of data mining techniques include:

• Association rules, also referred to as market basket analysis, search for
relationships between variables. These relationships in themselves create additional value
within the data set because they link pieces of data together. For example, association rules
would search a company's sales history to see which products are most commonly
purchased together; with this information, stores can plan, promote, and forecast
accordingly.

• Classification assigns objects to predefined classes. These classes describe
characteristics of items or represent what the data points have in common with each other.
This data mining technique allows the underlying data to be more neatly categorized
and summarized across similar features or product lines.

• Clustering is similar to classification. However, clustering identifies similarities
between objects, then groups those items based on what makes them different from
other items. While classification may result in groups such as "shampoo",
"conditioner", "soap", and "toothpaste", clustering may identify groups such as
"hair care" and "dental health".

• Decision trees are used to classify or predict an outcome based on a set list of
criteria or decisions. A decision tree asks a series of cascading questions that sort
the dataset based on the responses given. Often depicted as a tree-like visual, a
decision tree allows for specific direction and user input when drilling deeper
into the data.

• K-Nearest Neighbor (KNN) is an algorithm that classifies data based on its
proximity to other data. KNN rests on the assumption that data points that are
close to each other are more similar to each other than to other data points.
This non-parametric, supervised technique is used to predict features of a group
based on individual data points.

• Neural networks process data through the use of nodes. Each node is composed
of inputs, weights, and an output. Through supervised learning, the network adjusts
its weights so that inputs map to known outputs, loosely mirroring how neurons in the
human brain are interconnected. The model can then be tuned, for example by adjusting
threshold values, to improve its accuracy.

• Predictive analysis strives to leverage historical information to build graphical or
mathematical models to forecast future outcomes. Overlapping with regression
analysis, this data mining technique aims to estimate an unknown future figure
based on the current data on hand.
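Market basket analysis commonly scores a rule such as "bread → butter" by its support (the share of all baskets that contain both items) and its confidence (the share of baskets containing the first item that also contain the second). The sketch below computes both over a handful of hypothetical transactions. It is a brute-force pair count, not an efficient rule-mining algorithm such as Apriori, which real tools use to search large catalogs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sales history: each set is one customer's basket
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

n = len(baskets)
item_counts = Counter(item for basket in baskets for item in basket)
# Count every unordered pair of items bought together (pairs stored sorted)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def rule_stats(antecedent, consequent):
    """Support and confidence for the rule antecedent -> consequent."""
    pair = tuple(sorted((antecedent, consequent)))
    together = pair_counts[pair]
    support = together / n
    confidence = together / item_counts[antecedent]
    return support, confidence

support, confidence = rule_stats("bread", "butter")
print(f"bread -> butter: support={support:.2f}, confidence={confidence:.2f}")
```

Here bread appears in four of five baskets and bread with butter in three, so the rule has support 0.60 and confidence 0.75.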
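The k-nearest neighbor idea fits in a few lines: measure the distance from a new point to every labeled point, take the k closest, and let them vote. The customer features, segment labels, and the choice of k = 3 below are hypothetical, and Euclidean distance is one common choice of distance among several.

```python
import math
from collections import Counter

# Hypothetical labeled customers: (visits per month, average spend) -> segment
training = [
    ((2, 15.0), "occasional"),
    ((3, 20.0), "occasional"),
    ((1, 10.0), "occasional"),
    ((12, 60.0), "loyal"),
    ((10, 55.0), "loyal"),
    ((15, 70.0), "loyal"),
]

def knn_predict(point, data, k=3):
    """Classify `point` by majority vote among its k nearest labeled neighbors."""
    by_distance = sorted(data, key=lambda item: math.dist(point, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((11, 58.0), training))
print(knn_predict((2, 12.0), training))
```

Because the two features are on different scales, average spend dominates the distance in this toy data; in practice, features are often standardized before running KNN.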

The Data Mining Process

To be most effective, data analysts generally follow a certain flow of tasks along the data
mining process. Without this structure, an analyst may encounter an issue in the middle of
their analysis that could have easily been prevented had they prepared for it earlier. The
data mining process is usually broken into the following steps.

Step 1: Understand the Business

Before any data is touched, extracted, cleaned, or analyzed, it is important to understand
the underlying entity and the project at hand. What are the goals the company is trying to
achieve by mining data? What is their current business situation? What are the findings of
a SWOT analysis? Before looking at any data, the mining process starts by understanding
what will define success at the end of the process.

3
Step 2: Understand the Data

Once the business problem has been clearly defined, it's time to start thinking about data.
This includes what sources are available, how the data will be secured and stored, how
information will be gathered, and what the final outcome or analysis may look like. This step
also critically examines the limits on data, storage, security, and collection and assesses
how these constraints will impact the data mining process.

Step 3: Prepare the Data

It's now time to get our hands on information. Data is gathered, uploaded, extracted, or
calculated. It is then cleaned, standardized, scrubbed for outliers, assessed for mistakes,
and checked for reasonableness. During this stage of data mining, the data may also be
checked for size, as an overly large collection of information may unnecessarily slow
computations and analysis.
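Several of the cleaning tasks in this step can be sketched with the standard library. Using hypothetical order records, the sketch below drops exact duplicates, standardizes inconsistent city labels, and removes outliers with the interquartile-range (IQR) rule. The 1.5 × IQR cutoff is a common convention, not a requirement, and `statistics.quantiles` requires Python 3.8 or later.

```python
from statistics import quantiles

# Hypothetical raw order records: (customer, city, order total)
raw = [
    ("c1", "new york", 42.0),
    ("c2", "New York ", 38.5),
    ("c2", "New York ", 38.5),   # exact duplicate
    ("c3", "BOSTON", 45.0),
    ("c4", "boston", 40.0),
    ("c5", "Chicago", 39.0),
    ("c6", "chicago", 5000.0),   # suspicious outlier
]

# 1. Remove exact duplicates while keeping the original order
deduped = list(dict.fromkeys(raw))

# 2. Standardize text fields (trim whitespace, consistent capitalization)
standardized = [(c, city.strip().title(), total) for c, city, total in deduped]

# 3. Drop outliers that fall outside 1.5 * IQR of the order totals
totals = [total for _, _, total in standardized]
q1, _, q3 = quantiles(totals, n=4, method="inclusive")
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean = [row for row in standardized if low <= row[2] <= high]

for row in clean:
    print(row)
```

Whether a flagged value is a true error or a rare but valid order is a judgment call; here it is simply dropped, but an analyst might instead review it.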
Step 4: Build the Model

With our clean data set in hand, it's time to crunch the numbers. Data scientists use the
data mining techniques described above to search for relationships, trends, associations, or
sequential patterns. The data may also be fed into predictive models to assess how previous
bits of information may translate into future outcomes.

Step 5: Evaluate the Results

The data-centered aspect of data mining concludes by assessing the findings of the data
model(s). The outcomes from the analysis may be aggregated, interpreted, and presented
to decision-makers who have largely been excluded from the data mining process to this point.
In this step, organizations can choose to make decisions based on the findings.

Step 6: Implement Change and Monitor

The data mining process concludes with management taking steps in response to the
findings of the analysis. The company may decide that the information was not strong enough,
or the findings not relevant enough, to change course. Alternatively, the company may
strategically pivot based on the findings. In either case, management reviews the ultimate
impact on the business and starts future data mining cycles by identifying new business
problems or opportunities.

IMPORTANT: Different data mining process models have different steps, though the
general process is usually broadly similar. For example, the Knowledge Discovery in Databases
(KDD) model has nine steps, the CRISP-DM model has six steps, and the SEMMA process model
has five steps.

Applications of Data Mining
In today's age of information, it seems like almost every department, industry, sector, and
company can make use of data mining. Data mining is a broad process that has many
different applications as long as there is a body of data to analyze.

Sales
The ultimate goal of a company is to make money, and data mining encourages smarter,
more efficient use of capital to drive revenue growth. Consider the point-of-sale register at
your favorite local coffee shop. For every sale, that coffeehouse collects the time a purchase
was made, what products were sold together, and what baked goods are most popular.
Using this information, the shop can strategically craft its product line.

Marketing
Once the coffeehouse above knows its ideal line-up, it's time to implement the changes.
However, to make its marketing efforts more effective, the store can use data mining to
understand where its clients see ads, what demographics to target, where to place digital
ads, and what marketing strategies most resonate with customers. This includes
aligning marketing campaigns, promotional offers, cross-sell offers, and programs with the
findings of data mining.

Manufacturing
For companies that produce their own goods, data mining plays an integral part in analyzing
how much each raw material costs, what materials are being used most efficiently, how
time is spent along the manufacturing process, and what bottlenecks negatively impact the
process. Data mining helps ensure the flow of goods is uninterrupted and as low-cost as possible.
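A first pass at spotting a bottleneck can be as simple as averaging how long each stage takes and flagging the slowest one. The stage names and timings below are hypothetical, and this sketch compares only average durations.

```python
from statistics import mean

# Hypothetical timings (minutes) recorded for each production stage
stage_times = {
    "cutting": [12, 14, 13, 12],
    "welding": [25, 31, 28, 30],
    "painting": [18, 17, 19, 18],
    "packing": [8, 9, 7, 8],
}

# Average time per stage
averages = {stage: mean(times) for stage, times in stage_times.items()}

# The stage with the longest average duration is the likeliest bottleneck
bottleneck = max(averages, key=averages.get)
print(f"Likely bottleneck: {bottleneck} ({averages[bottleneck]:.1f} min on average)")
```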

Fraud Detection
The heart of data mining is finding patterns, trends, and correlations that link data points
together. Therefore, a company can use data mining to identify outliers or correlations that
should not exist. For example, a company may analyze its cash flow and find a recurring
transaction to an unknown account. If this is unexpected, the company may wish to
investigate whether funds are being mismanaged.
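The cash-flow check described above can be sketched as follows: group outgoing payments by destination account, then flag any account that receives repeated payments but is not on the company's list of known payees. The account IDs, amounts, dates, and the "two or more payments" threshold are all hypothetical.

```python
from collections import defaultdict

# Accounts the company expects to pay (hypothetical vendor list)
known_accounts = {"ACME-001", "OFFICE-SUPPLY-7", "LANDLORD-22"}

# Hypothetical outgoing transactions: (date, destination account, amount)
transactions = [
    ("2024-01-03", "ACME-001", 1200.00),
    ("2024-01-05", "XK-99812", 450.00),
    ("2024-01-15", "LANDLORD-22", 3000.00),
    ("2024-02-05", "XK-99812", 450.00),
    ("2024-02-07", "OFFICE-SUPPLY-7", 89.99),
    ("2024-03-05", "XK-99812", 450.00),
]

# Group payment history by destination account
payments = defaultdict(list)
for date, account, amount in transactions:
    payments[account].append((date, amount))

# Flag recurring payments (2 or more) to accounts that are not on the known list
flagged = {
    account: history
    for account, history in payments.items()
    if account not in known_accounts and len(history) >= 2
}

for account, history in flagged.items():
    print(f"Review {account}: {len(history)} payments, first on {history[0][0]}")
```

A flag is only a prompt to investigate; the unknown account may turn out to be a legitimate vendor that was never added to the list.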

Human Resources
Human resources often has a wide range of data available for processing including data on
retention, promotions, salary ranges, company benefits and utilization of those benefits, and
employee satisfaction surveys. Data mining can correlate this data to get a better
understanding of why employees leave and what entices recruits to join.

Customer Service
Customer satisfaction may be caused (or destroyed) for a variety of reasons. Imagine a
company that ships goods. A customer may become unhappy with ship time, shipping
quality, or communication on shipment expectations. That same customer may become
frustrated with long telephone wait times or slow e-mail responses. Data mining gathers
operational information about customer interactions and summarizes findings to determine
weak points as well as highlights of what the company is doing right.

Benefits of Data Mining
Data mining helps ensure a company is collecting and analyzing reliable data. It is often a
more rigid, structured process that formally identifies a problem, gathers data related to the
problem, and strives to formulate a solution. Therefore, data mining can help a business
become more profitable, efficient, or operationally stronger.

Data mining can look very different across applications, but the overall process can be used
with almost any new or legacy application. Essentially any type of data can be gathered and
analyzed, and almost every business problem that relies on quantifiable evidence can be
tackled using data mining.

The end goal of data mining is to take raw bits of information and determine if there is
cohesion or correlation among the data. This benefit of data mining allows a company to
create value from the information it has on hand that would otherwise not be
apparent. Though data models can be complex, they can also yield fascinating results,
unearth hidden trends, and suggest unique strategies.

Limitations of Data Mining

This complexity of data mining is one of the largest disadvantages to the process. Data
analytics often requires technical skillsets and certain software tools. Some smaller
companies may find this to be a barrier to entry too difficult to overcome.

Data mining doesn't always guarantee results. A company may perform statistical analysis,
make conclusions based on strong data, implement changes, and not reap any benefits.
Through inaccurate findings, market changes, model errors, or inappropriate data
populations, data mining can only guide decisions and not ensure outcomes.

There is also a cost component to data mining. Data tools may require ongoing costly
subscriptions, and some bits of data may be expensive to obtain. Security and privacy
concerns can be addressed, though the additional IT infrastructure may be costly as well. Data
mining may also be most effective when using huge data sets; however, these data sets
must be stored and require heavy computational power to analyze.

FAST FACTS: Even large companies or government agencies have challenges with data
mining. Consider the FDA's white paper on data mining that outlines the challenges of bad
information, duplicate data, underreporting, or overreporting.

Data Mining and Social Media
One of the most lucrative applications of data mining has been that of social media.
Platforms like Facebook (owned by Meta), TikTok, Instagram, and Twitter gather reams of
data about individual users to make inferences about their preferences in order to send
targeted marketing ads. This data is also used to try to influence user behavior and change
their preferences, whether it be for a consumer product or who they will vote for in an
election.

Data mining on social media has become a big point of contention, with several
investigative reports and exposés showing just how nefarious mining users' data can be. At
the heart of the issue, users may agree to the terms and conditions of the sites without
realizing how their personal information is being collected or to whom it is being sold.

Examples of Data Mining

Data mining can be used for good, or it can be used illicitly. Here is an example of both.

eBay and e-Commerce

eBay collects countless bits of information every day, including listings, sales, buyers,
and sellers. eBay uses data mining to attribute relationships between products, assess
desired price ranges, analyze prior purchase patterns, and form product categories. eBay
outlines the recommendation process as:

• Raw item metadata and user historical data are aggregated.
• Scripts are run on a trained model to generate and predict the item and user.
• A KNN search is performed.
• The results are written to a database.
• The real-time recommendation takes the user ID, calls the database results, and
displays them to the user.
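The recommendation flow listed above can be mimicked in miniature. In this hypothetical sketch, hand-written vectors stand in for whatever a trained model would produce for items and users, a brute-force nearest-neighbor search stands in for the KNN search step, and a Python dictionary plays the role of the results database. None of these names or values come from eBay.

```python
import math

# Stand-in for model output: vectors for items and users (hypothetical values)
item_vectors = {
    "camera": (0.9, 0.1, 0.0),
    "tripod": (0.8, 0.2, 0.1),
    "lens": (0.85, 0.05, 0.1),
    "sneakers": (0.0, 0.9, 0.3),
    "jacket": (0.1, 0.8, 0.4),
}
user_vectors = {
    "user_42": (0.88, 0.1, 0.05),
    "user_7": (0.05, 0.85, 0.35),
}

def knn_items(user_vec, k=2):
    """Brute-force KNN search: the k items whose vectors are closest to the user's."""
    ranked = sorted(item_vectors, key=lambda item: math.dist(user_vec, item_vectors[item]))
    return ranked[:k]

# Batch step: write each user's results to a "database" (a dict here)
recommendation_db = {user: knn_items(vec) for user, vec in user_vectors.items()}

def recommend(user_id):
    """Real-time step: look up the precomputed results by user ID."""
    return recommendation_db.get(user_id, [])

print(recommend("user_42"))
print(recommend("user_7"))
```

Splitting the work this way keeps the expensive search in the batch step, so the real-time step is only a fast lookup.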

Facebook-Cambridge Analytica Scandal

Another cautionary example of data mining is the Facebook-Cambridge Analytica data
scandal. During the 2010s, the British consulting firm Cambridge Analytica collected
personal data belonging to millions of Facebook users. This information was later analyzed to
assist the 2016 presidential campaigns of Ted Cruz and Donald Trump. It is also suspected
that Cambridge Analytica interfered with other notable events such as the Brexit
referendum.

In light of inappropriate data mining and misuse of user data, Facebook agreed to pay
$100 million for misleading investors about the use of consumer data. The Securities and
Exchange Commission claimed Facebook discovered the misuse in 2015 but did not correct
disclosures for more than two years.

What Are the Types of Data Mining?
Data mining is broken into two basic aspects: predictive data mining and descriptive data
mining. Predictive data mining is a type of analysis that extracts data that may be helpful in
determining an outcome. Descriptive data mining is a type of analysis that summarizes the
data to inform users about a given outcome.

How Is Data Mining Done?

Data mining relies on big data and advanced computing processes including machine
learning and other forms of artificial intelligence (AI). The goal is to find patterns that can
lead to inferences or predictions from otherwise unstructured or large data sets.

What Is Another Term for Data Mining?

Data mining also goes by the less-used term knowledge discovery in data, or KDD.

Where Is Data Mining Used?

Data mining applications range from the financial sector, which looks for patterns in the
markets, to governments trying to identify potential security threats. Corporations, and
especially online and social media companies, use data mining on their users to create
profitable advertising and marketing campaigns that target specific sets of users.

The Bottom Line

Modern businesses have the ability to gather information on customers, products,
manufacturing lines, employees, and storefronts. These random pieces of information may
not tell a story on their own, but data mining techniques, applications, and tools help piece
together information to drive value. The ultimate goal of the data mining process is to
compile data, analyze the results, and execute operational strategies based on data mining
results.
