Data Processing in Data Mining

Data processing involves collecting, filtering, analyzing, and presenting raw data in a readable format. It is usually performed by a team in a multi-step process. The data is processed either manually or automatically using computer programs and algorithms. Proper data processing is crucial for organizations to gain insights from data and make better business decisions. Some commonly used tools for data processing include Hadoop, Storm, and Qubole.

Uploaded by

Kartik Tiwari


Data processing is the collection of raw data and its translation into usable information.
The raw data is collected, filtered, sorted, processed, analyzed, stored, and then presented
in a readable format. It is usually performed step by step by a team of data scientists
and data engineers in an organization.

Data processing is carried out either automatically or manually. Nowadays, most data is
processed automatically by computer, which is faster and generally more accurate. The
processed data can take different forms, such as graphics or audio, depending on the
software and the data processing methods used.

The collected data is processed and then translated into the form required for the task
at hand. The data is acquired from Excel files, databases, text files, and unstructured
sources such as audio clips, images, GPRS, and video clips.

Data processing is crucial for organizations to create better business strategies and
increase their competitive edge. By converting the data into a readable format
like graphs, charts, and documents, employees throughout the organization can
understand and use the data.

The most commonly used tools for data processing are Storm, Hadoop, HPCC,
Statwing, Qubole, and CouchDB. Data processing is a key step of the data mining
process. Analyzing raw, unprocessed data is more complicated, and the results can be
misleading. Therefore, it is better to process data before analysis. How data is
processed largely depends on the following:

o The volume of data that needs to be processed.
o The complexity of data processing operations.
o Capacity and inbuilt technology of respective computer systems.
o Technical skills and time constraints.

Stages of Data Processing

The data processing cycle consists of the following six stages.

1. Data Collection

The collection of raw data is the first step of the data processing cycle. The raw data
collected has a huge impact on the output produced. Hence, raw data should be
gathered from defined and accurate sources so that the subsequent findings are valid
and usable. Raw data can include monetary figures, website cookies, profit/loss
statements of a company, user behavior, etc.

2. Data Preparation

Data preparation or data cleaning is the process of sorting and filtering the raw data to
remove unnecessary and inaccurate data. Raw data is checked for errors, duplication,
miscalculations, or missing data and transformed into a suitable form for further analysis
and processing. This ensures that only the highest quality data is fed into the processing
unit.
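
The checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the record layout (`id` and `amount` fields), the sample values, and the `prepare` function name are assumptions made for the example.

```python
def prepare(records):
    """Drop duplicate records and records with a missing or invalid amount."""
    seen = set()
    cleaned = []
    for rec in records:
        amount = rec.get("amount")
        # Skip records whose amount is missing or is not a number.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            continue
        key = (rec.get("id"), amount)
        # Skip exact duplicates of a record that was already kept.
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned


# Hypothetical raw records containing the problems named in the text.
raw = [
    {"id": 1, "amount": 120.0},
    {"id": 1, "amount": 120.0},   # duplicate
    {"id": 2, "amount": None},    # missing value
    {"id": 3, "amount": "n/a"},   # invalid entry
    {"id": 4, "amount": 75.5},
]
clean = prepare(raw)
print(clean)
```

Only the records with ids 1 and 4 survive, so later stages never see the duplicate or the unusable entries.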

3. Data Input

In this step, the prepared data is converted into machine-readable form and fed into the
processing unit. This can be in the form of data entry through a keyboard, scanner, or
any other input source.

4. Data Processing

In this step, the input data is subjected to various data processing methods, often using
machine learning and artificial intelligence algorithms, to generate the desired output.
This step may vary slightly from process to process depending on the source of data being
processed (data lakes, online databases, connected devices, etc.) and the intended use
of the output.

5. Data Interpretation or Output

The data is finally transmitted and displayed to the user in a readable form like graphs,
tables, vector files, audio, video, documents, etc. This output can be stored and further
processed in the next data processing cycle.

6. Data Storage

The last step of the data processing cycle is storage, where data and metadata are
stored for further use. This allows quick access and retrieval of information whenever
needed. Proper data storage is also necessary for compliance with data protection
legislation such as the GDPR.
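
To make the six stages concrete, the cycle can be sketched as a chain of small Python functions. This is an illustrative sketch under stated assumptions: the sample CSV text, the `region` and `sales` columns, and every function name are hypothetical, and storage is simulated with a JSON string rather than a real database or file.

```python
import csv
import io
import json

# Hypothetical raw data as it might arrive from a source system.
RAW_CSV = """region,sales
north,100
south,250
north,50
south,
east,abc
"""


def collect():
    # 1. Collection: obtain the raw data.
    return RAW_CSV


def prepare(raw_text):
    # 2. Preparation: drop lines with a missing or non-numeric sales field.
    header, *lines = raw_text.strip().splitlines()
    good = [ln for ln in lines if ln.split(",")[1].strip().isdigit()]
    return "\n".join([header, *good])


def data_input(clean_text):
    # 3. Input: parse the text into machine-readable, typed records.
    reader = csv.DictReader(io.StringIO(clean_text))
    return [{"region": r["region"], "sales": int(r["sales"])} for r in reader]


def process(records):
    # 4. Processing: compute total sales per region.
    totals = {}
    for rec in records:
        totals[rec["region"]] = totals.get(rec["region"], 0) + rec["sales"]
    return totals


def output(totals):
    # 5. Output: render the result as a readable text table.
    return "\n".join(f"{region:<6} {total:>5}" for region, total in sorted(totals.items()))


def store(totals):
    # 6. Storage: serialize the result for later use (a JSON string here).
    return json.dumps(totals, sort_keys=True)


totals = process(data_input(prepare(collect())))
report = output(totals)
stored = store(totals)
print(report)
```

Each function takes the previous stage's result as its input, which mirrors how the output of one processing cycle can feed the next.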

Why Should We Use Data Processing?

In the modern era, most work relies on data, so large amounts of data are collected for
purposes such as academic and scientific research, institutional use, personal and
private use, and commercial use. This collected data must be processed so that it
passes through all the above steps and is sorted, stored, filtered, analyzed, and
presented in the required format.

The time consumed and the intricacy of processing depend on the required results.
Where large amounts of data are acquired, processing becomes unavoidable if authentic
results are to be obtained, whether in data mining or in data research.

Methods of Data Processing

There are three main data processing methods:

1. Manual Data Processing

In this method, data is processed manually. The entire procedure of collecting,
filtering, sorting, calculating, and other logical operations is carried out by people,
without any electronic device or automation software. The method needs few tools and
little equipment. However, it is error-prone, labor-intensive, and slow.

2. Mechanical Data Processing

Data is processed mechanically through the use of devices and machines. These can
include simple devices such as calculators, typewriters, and printing presses. Simple
data processing operations can be achieved with this method. It produces far fewer errors
than manual data processing, but growing data volumes have made this method more
complex and difficult to use.

3. Electronic Data Processing

Data is processed with modern technologies using data processing software and
programs. The software gives a set of instructions to process the data and yield output.
This method is the most expensive but provides the fastest processing speeds with the
highest reliability and accuracy of output.

Types of Data Processing

There are different types of data processing based on the source of data and the steps
taken by the processing unit to generate an output. There is no one-size-fits-all method
that can be used for processing raw data.

1. Batch Processing: In this type of data processing, data is collected and
processed in batches. It is used for large amounts of data. For example, the
payroll system.
2. Single User Programming Processing: It is usually done by a single person for
personal use. This technique is suitable even for small offices.
3. Multiple Programming Processing: This technique allows more than one program to be
held in memory and executed concurrently. When data is broken down into frames and
processed using two or more Central Processing Units (CPUs) within a single computer
system, it is also known as parallel processing. Multiple programming techniques
increase the computer's overall working efficiency. A good example of multiple
programming processing is weather forecasting.
4. Real-time Processing: This technique lets the user interact directly with the
computer system and receive results immediately, which eases data processing. It is
also known as the direct mode or the interactive mode technique and is developed
exclusively to perform one task. It is a sort of online processing that always remains
under execution. For example, withdrawing money from an ATM.
5. Online Processing: This technique enters and processes data directly, rather than
storing or accumulating it first and processing it later. It is designed to reduce
data entry errors, as it validates data at various points and ensures that only
correct data is entered. This technique is widely used for online applications. For
example, barcode scanning.
6. Time-sharing Processing: This is another form of online data processing that
lets several users share the resources of an online computer system. This technique
is adopted when results are needed swiftly. As the name suggests, the system allocates
processing time among its users. The major advantages of time-sharing processing
are:
o Several users can be served simultaneously.
o All the users have an almost equal amount of processing time.
o There is a possibility of interaction with the running programs.
7. Distributed Processing: This is a specialized data processing technique in which
several remotely located computers are connected to a single host computer, forming
a network. All these computer systems are linked by a high-speed communication
network, which facilitates communication between them. The central computer system
maintains the master database and monitors the network.
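
The difference between batch processing (item 1) and online processing (item 5) can be sketched as follows. This is a simplified illustration, not a production design: the transaction format, the validation rule, and the names `validate`, `process_batch`, and `OnlineProcessor` are all assumptions made for the example.

```python
def validate(txn):
    """Return True if a transaction carries a positive numeric amount."""
    amount = txn.get("amount")
    return isinstance(amount, (int, float)) and amount > 0


def process_batch(transactions):
    """Batch processing: accumulate all data first, then process it in one run."""
    valid = [t for t in transactions if validate(t)]
    return sum(t["amount"] for t in valid)


class OnlineProcessor:
    """Online processing: validate and apply each transaction as it arrives."""

    def __init__(self):
        self.balance = 0
        self.rejected = 0

    def submit(self, txn):
        # Validation happens at the point of entry, so bad data never gets in.
        if not validate(txn):
            self.rejected += 1
            return False
        self.balance += txn["amount"]
        return True


# Hypothetical transactions; the second one is invalid.
transactions = [{"amount": 100}, {"amount": -5}, {"amount": 40}]

batch_total = process_batch(transactions)

online = OnlineProcessor()
results = [online.submit(t) for t in transactions]
print(batch_total, online.balance, online.rejected, results)
```

Both approaches reach the same total here. The batch version reports nothing until the whole run finishes, while the online version accepts or rejects each entry immediately.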

Examples of Data Processing

Data processing occurs in our daily lives whether or not we are aware of it. Here are
some real-life examples of data processing:

o Stock trading software converts millions of stock data points into a simple graph.
o An e-commerce company uses the search history of customers to recommend
similar products.
o A digital marketing company uses demographic data to plan location-specific
campaigns.
o A self-driving car uses real-time data from sensors to detect pedestrians and
other cars on the road.

Importance of Data Processing in Data Mining

In today's world, data has a significant bearing on researchers, institutions, commercial
organizations, and individual users. Data is often imperfect, noisy, and inconsistent,
so it requires additional processing. Once data has been gathered, the question arises
of how to store, sort, filter, analyze, and present it. This is where data mining comes
into play.

The complexity of this process depends on the scope of data collection and the
complexity of the required results. How time-consuming the process is depends on the
steps that must be applied to the collected data and on the type of output file
desired. This issue becomes pressing when a large amount of data has to be processed,
which is why data mining is widely used nowadays.

Once data is gathered, it needs to be stored. The data can be stored in physical
form using paper-based documents, or on laptops, desktop computers, and other data
storage devices. With the rise and rapid development of data mining and big data,
the process of data collection has become more complicated and time-consuming. Many
operations are necessary to conduct a thorough data analysis.

At present, data is mostly stored in digital form. This allows data to be processed
faster and converted into different formats, so the user can choose the most suitable
output.

Tasks in Data Preprocessing

1. Data Cleaning: It is also known as scrubbing. This task involves
filling in missing values, smoothing or removing noisy data and
outliers, and resolving inconsistencies.

2. Data Integration: This task involves integrating data from
multiple sources such as databases (relational and non-relational),
data cubes, files, etc. The data sources can be homogeneous or
heterogeneous. The data obtained from the sources can be
structured, unstructured, or semi-structured in format.

3. Data Transformation: This involves normalisation and
aggregation of data according to the needs of the data set.

4. Data Reduction: During this step, the data is reduced. The number of
records or the number of attributes or dimensions can be reduced.
Reduction is performed so that the reduced data
produces the same, or nearly the same, results as the original data.

5. Data Discretization: It is considered a part of data reduction.
The numerical attributes are replaced with nominal ones.
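
Two of the tasks above, normalisation (task 3) and discretization (task 5), can be sketched in Python. This is a minimal sketch under assumptions: min-max scaling is only one of several normalisation schemes, and the age values, bin boundaries, and labels are hypothetical choices made for illustration.

```python
def min_max_normalize(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them all to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Replace each number with the label of the first bin whose upper bound covers it.

    `bins` is a list of (upper_bound, label) pairs sorted by upper bound;
    the last upper bound should be infinity so that every value gets a label.
    """
    labels = []
    for v in values:
        for upper, label in bins:
            if v <= upper:
                labels.append(label)
                break
    return labels


# Hypothetical numeric attribute and bin definitions.
ages = [18, 25, 40, 62, 80]
scaled = min_max_normalize(ages)
groups = discretize(ages, [(30, "young"), (60, "middle"), (float("inf"), "senior")])
print(scaled)
print(groups)
```

After discretization the five numeric ages collapse into three nominal labels, which is why the task is treated as a form of data reduction.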

Data Cleaning

The data cleaning process detects and removes the errors and
inconsistencies present in the data and improves its quality. Data
quality problems arise from misspellings during data entry, missing
values, or other invalid data. In essence, "dirty" data is transformed
into clean data. "Dirty" data does not produce accurate or useful
results: garbage in, garbage out. Handling this data is therefore very
important, and professionals spend a lot of their time on this step.
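
As a rough illustration of these ideas, the sketch below repairs misspelled entries and fills in missing values. It is only one possible approach: the `CANONICAL_CITY` mapping, the `city` and `age` fields, the sample rows, and the choice of the mean as the fill value are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical mapping from misspelled or inconsistent entries to a canonical value.
CANONICAL_CITY = {"delhi": "Delhi", "dehli": "Delhi", "mumbai": "Mumbai"}


def clean(rows):
    """Resolve inconsistent city spellings and fill missing ages with the mean."""
    observed = [r["age"] for r in rows if r["age"] is not None]
    # Use the mean of the observed ages as the fill value (None if nothing was observed).
    fill = mean(observed) if observed else None
    cleaned = []
    for r in rows:
        raw_city = r["city"].strip()
        # Look up the canonical spelling; keep the trimmed original if it is unknown.
        city = CANONICAL_CITY.get(raw_city.lower(), raw_city)
        age = r["age"] if r["age"] is not None else fill
        cleaned.append({"city": city, "age": age})
    return cleaned


# Hypothetical "dirty" rows: a misspelling, stray whitespace, and a missing value.
dirty = [
    {"city": " dehli", "age": 30},
    {"city": "Mumbai", "age": None},
    {"city": "DELHI ", "age": 40},
]
result = clean(dirty)
print(result)
```

Filling with the mean is a simple default; depending on the data, the median or a model-based estimate may be a better choice.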
