Data Science Process and Machine Learning

Uploaded by

cs235214205

Data Science Process

The data science process consists of six stages:

1. Discovery or Setting the research goal

2. Retrieving data

3. Data preparation

4. Data exploration

5. Data modeling

6. Presentation and automation

• Fig. 1.3.1 shows the data science design process.

• Step 1: Discovery or Defining research goal

This is the process of gaining a business understanding of the problem and of the data the user has, and deciphering what each piece of data means. This could entail determining exactly what data is required and the best methods for obtaining it, as well as what each data point means in terms of the company. If we are given a data set by a client, for example, we need to know what each column and row represents.

• Step 2: Retrieving data

This step involves collecting the data required for the project from all the identified internal and external sources, which helps to answer the business question.

• Step 3: Data preparation

Data can have many inconsistencies, such as missing values, blank columns and incorrect data formats, which need to be cleaned. We need to process, explore and condition the data before modeling. Clean data gives better predictions.
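A minimal sketch of this cleaning step in plain Python (standard library only). The records, the column names and the choice of median imputation are hypothetical illustrations, not a prescribed method; in practice a library such as pandas is often used for the same operations.

```python
from statistics import median

# Hypothetical raw records showing the inconsistencies named above:
# a missing value (None), a column that is blank in every row ("notes"),
# and numbers stored in an incorrect format (strings, one with extra spaces).
raw = [
    {"customer_id": 1, "age": "34", "city": "Pune", "notes": ""},
    {"customer_id": 2, "age": " 41 ", "city": "Mumbai", "notes": ""},
    {"customer_id": 3, "age": None, "city": "Pune", "notes": ""},
    {"customer_id": 4, "age": "29", "city": None, "notes": ""},
]

def clean(records):
    """Return cleaned copies of the records (the input is not modified)."""
    # 1. Drop columns that are blank in every record.
    blank = {k for k in records[0] if all(r[k] in (None, "") for r in records)}
    rows = [{k: v for k, v in r.items() if k not in blank} for r in records]
    # 2. Fix the data format: convert age strings to numbers.
    for r in rows:
        r["age"] = float(r["age"].strip()) if r["age"] else None
    # 3. Fill missing ages with the median of the known ages.
    fill = median(r["age"] for r in rows if r["age"] is not None)
    for r in rows:
        if r["age"] is None:
            r["age"] = fill
        # 4. Fill missing categories with an explicit placeholder label.
        if r["city"] is None:
            r["city"] = "Unknown"
    return rows

for row in clean(raw):
    print(row)
```

The median is used here rather than the mean because it is less sensitive to extreme values; which imputation is appropriate depends on the data.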

• Step 4: Data exploration

Data exploration is concerned with building a deeper understanding of the data: how variables interact with each other, how the data is distributed, and whether there are outliers. To achieve this, use descriptive statistics, visual techniques and simple modeling. This step is also called Exploratory Data Analysis (EDA).
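The three EDA questions above (distribution, interaction between variables, outliers) can be sketched with Python's standard `statistics` module. The spend/units data is made up, the `pearson` and `iqr_outliers` helpers are hypothetical names, and the 1.5 × IQR rule is just one common outlier heuristic, assumed here for illustration.

```python
from statistics import mean, median, stdev, quantiles

# Hypothetical sample: advertising spend vs. units sold for 8 stores.
spend = [10, 12, 15, 18, 20, 22, 25, 90]
units = [100, 115, 140, 170, 190, 205, 230, 260]

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def iqr_outliers(values):
    """Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Distribution: centre and spread of one variable.
print("spend mean:", mean(spend), "median:", median(spend), "stdev:", round(stdev(spend), 2))
# Interaction: how strongly spend and units move together.
print("correlation:", round(pearson(spend, units), 3))
# Outliers.
print("outliers in spend:", iqr_outliers(spend))
```

Here the single large spend value (90) pulls the mean (26.5) well above the median (19.0), and the IQR rule flags it as the only outlier, which is exactly the kind of finding that should be investigated before modeling.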

• Step 5: Data modeling

In this step, the actual model building process starts. Here, the data scientist splits the data into training and testing sets. Techniques like association, classification and clustering are applied to the training data set. Once the model is prepared, it is tested against the testing data set.
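The split, train and test cycle of this step might look like the sketch below. It uses a deliberately simple nearest-centroid classifier as a stand-in model; the dataset, the 75/25 split ratio and all helper names are hypothetical choices for illustration.

```python
import random

def train_test_split(rows, test_ratio=0.25, seed=42):
    """Shuffle a copy of the data and split it into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def fit_centroids(train):
    """Training: compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids,
               key=lambda lbl: sum((a - b) ** 2 for a, b in zip(features, centroids[lbl])))

def accuracy(centroids, test):
    """Testing: fraction of held-out examples that are predicted correctly."""
    return sum(predict(centroids, f) == y for f, y in test) / len(test)

# Hypothetical, well-separated two-class dataset of 20 examples.
data = [((float(x), x + 1.0), "A") for x in range(10)] + \
       [((x + 20.0, x + 21.0), "B") for x in range(10)]
train, test = train_test_split(data)
model = fit_centroids(train)
print(f"train={len(train)} test={len(test)} accuracy={accuracy(model, test):.2f}")
```

The important design point is that the testing set is never used while fitting, so the accuracy estimates how the model behaves on unseen data.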

• Step 6: Presentation and automation

In this stage, the final baselined model is delivered along with reports, code and technical documents. After thorough testing, the model is deployed into a real-time production environment. The key findings are communicated to all stakeholders, which helps them decide whether the project is a success or a failure based on the model's outputs.

What is Machine Learning?

Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns within datasets, allowing them to make predictions on new, similar data without explicit programming for each task. Traditional machine learning combines data with statistical tools to predict outputs, yielding actionable insights. This technology finds applications in diverse fields such as image and speech recognition, natural language processing, recommendation systems, fraud detection, portfolio optimization, and automating tasks.

Types of Machine Learning

Machine learning algorithms can be trained in many ways, with each method
having its pros and cons. Based on these methods and ways of learning,
machine learning is broadly categorized into four main types:


1. Supervised machine learning

This type of ML involves supervision: machines are trained on labeled datasets and then predict outputs based on that training. In a labeled dataset, each input is already mapped to its correct output, so the machine is trained with inputs and their corresponding outputs. In subsequent phases, the trained model is asked to predict the outcomes for a test dataset.

For example, consider an input dataset of parrot and crow images. Initially, the
machine is trained to understand the pictures, including the parrot and crow’s
color, eyes, shape, and size. Post-training, an input picture of a parrot is
provided, and the machine is expected to identify the object and predict the
output. The trained machine checks for the various features of the object, such
as color, eyes, shape, etc., in the input picture, to make a final prediction. This is
the process of object identification in supervised machine learning.
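The parrot/crow example can be sketched as a tiny classifier. This assumes each picture has already been reduced to two hypothetical numeric features (greenness of the feathers and relative body size, both on a 0-1 scale), and it uses k-nearest neighbours, one of many possible classification algorithms, chosen here only because it is short.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (greenness, relative size) -> label.
training = [
    ((0.90, 0.30), "parrot"), ((0.80, 0.35), "parrot"), ((0.95, 0.25), "parrot"),
    ((0.05, 0.60), "crow"),   ((0.10, 0.65), "crow"),   ((0.00, 0.55), "crow"),
]

def knn_predict(train, features, k=3):
    """Label a new picture by majority vote among its k nearest labeled examples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(training, (0.85, 0.32)))  # a green, small bird
print(knn_predict(training, (0.05, 0.62)))  # a dark, larger bird
```

Because k-NN compares raw distances, features should be on comparable scales; that is why both made-up features here lie between 0 and 1.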

The primary objective of the supervised learning technique is to map the input variable (a) to the output variable (b). Supervised machine learning is further classified into two broad categories:

• Classification: These refer to algorithms that address classification problems where the output variable is categorical; for example, yes or no, true or false, male or female, etc. Real-world applications of this category are evident in spam detection and email filtering.

Some known classification algorithms include the Random Forest Algorithm, Decision Tree Algorithm, Logistic Regression Algorithm, and Support Vector Machine Algorithm.

• Regression: Regression algorithms handle regression problems, where the output variable is continuous. Some of them, such as linear regression, assume a linear relationship between the input and output variables. Examples include weather prediction, market trend analysis, etc.

Popular regression algorithms include the Simple Linear Regression Algorithm, Multivariate Regression Algorithm, Decision Tree Algorithm, and Lasso Regression.
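Simple linear regression fits y = b0 + b1·x by ordinary least squares, where b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. A minimal sketch in plain Python; the sunshine/temperature data is hypothetical and was chosen to lie exactly on a line so the result is easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical data: hours of sunshine (x) vs. daily temperature in °C (y).
hours = [2, 4, 6, 8, 10]
temp = [14.0, 18.0, 22.0, 26.0, 30.0]

b0, b1 = fit_simple_linear_regression(hours, temp)
print(f"temp = {b0:.1f} + {b1:.1f} * hours")   # temp = 10.0 + 2.0 * hours
print("prediction for 7 hours:", b0 + b1 * 7)  # prediction for 7 hours: 24.0
```

The prediction for 7 hours is a continuous value, which is what distinguishes regression from classification.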

2. Unsupervised machine learning

Unsupervised learning refers to a learning technique that’s devoid of supervision. Here, the machine is trained using an unlabeled dataset and learns to group or describe the data without any supervision. An unsupervised learning algorithm aims to group the unsorted dataset based on the input’s similarities, differences, and patterns.

For example, consider an input dataset of images of a fruit-filled container. Here, the images are not labeled, so the model is not told which fruit each image shows. When we input the dataset into the ML model, the task of the model is to identify patterns in the objects, such as color, shape, or other differences seen in the input images, and categorize them. Once the categories have been learned, the machine can assign new images from a test dataset to them.
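The fruit example can be sketched with k-means clustering. This assumes each image has been reduced to two hypothetical features (redness and roundness, 0-1 scale) and that the number of groups, k = 3, is chosen in advance. For a reproducible result the sketch uses a simple deterministic farthest-first initialization instead of the random or k-means++ initialization common in libraries.

```python
import math

def farthest_first(points, k):
    """Start at the first point, then repeatedly add the point farthest
    from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new = [tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:  # centroids stopped moving: converged
            break
        centroids = new
    return centroids, clusters

# Hypothetical (redness, roundness) features; the model never sees these group names.
apples = [(0.90, 0.90), (0.85, 0.95), (0.95, 0.85)]
bananas = [(0.10, 0.20), (0.15, 0.10), (0.05, 0.15)]
limes = [(0.10, 0.90), (0.15, 0.85), (0.05, 0.95)]
fruit = apples + bananas + limes

centroids, clusters = kmeans(fruit, k=3)
for c, members in zip(centroids, clusters):
    print(c, members)
```

The algorithm only discovers that there are three groups of similar points; naming them "apples", "bananas" and "limes" is still left to a human.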

Unsupervised machine learning is further classified into two types:

• Clustering: The clustering technique refers to grouping objects into clusters based on parameters such as similarities or differences between objects; for example, grouping customers by the products they purchase.

Some known clustering algorithms include the K-Means Clustering Algorithm, Mean-Shift Algorithm, and DBSCAN Algorithm. Principal Component Analysis and Independent Component Analysis are also unsupervised techniques, but they transform or reduce the dimensions of the data rather than form clusters.

• Association: Association learning refers to identifying typical relationships between the variables of a large dataset. It determines the dependency of various data items and maps associated variables. Typical applications include web usage mining and market data analysis.

Popular association rule algorithms include the Apriori Algorithm, Eclat Algorithm, and FP-Growth Algorithm.
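Association rules are usually judged by support (how often items occur together) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both on hypothetical market-basket data and applies the Apriori pruning idea for the first two levels only: a pair can be frequent only if each of its items is frequent on its own. It is an illustration, not a full Apriori implementation.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "eggs"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return count(itemset) / len(transactions)

def frequent_pairs(min_support=0.4):
    items = sorted({i for t in transactions for i in t})
    # Apriori pruning: keep only items that are frequent on their own.
    frequent_items = [i for i in items if support({i}) >= min_support]
    pairs = {}
    for pair in combinations(frequent_items, 2):
        s = support(set(pair))
        if s >= min_support:
            pairs[pair] = s
    return pairs

def confidence(antecedent, consequent):
    """Estimate of P(consequent | antecedent) = count(A and B) / count(A)."""
    return count(antecedent | consequent) / count(antecedent)

print(frequent_pairs())
print("confidence(bread -> butter):", confidence({"bread"}, {"butter"}))
```

With this data, 3 of the 4 baskets containing bread also contain butter, so the rule "bread → butter" has a confidence of 0.75.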

3. Semi-supervised learning

Semi-supervised learning combines characteristics of both supervised and unsupervised machine learning. It uses a combination of labeled and unlabeled datasets to train its algorithms. By using both types of data, semi-supervised learning reduces the need for large amounts of labeled data (a drawback of supervised learning) while still using some labels to guide learning (which unsupervised learning lacks).

Consider an example of a college student. A student learning a concept under a teacher’s supervision in college is termed supervised learning. In unsupervised learning, a student self-learns the same concept at home without a teacher’s guidance. Meanwhile, a student revising the concept after learning under the direction of a teacher in college is a semi-supervised form of learning.
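One common semi-supervised technique is self-training: train on the few labeled examples, use the model to assign pseudo-labels to the unlabeled data, retrain on both, and repeat. The sketch below assumes hypothetical one-dimensional data and a nearest-centroid model; it illustrates the idea rather than any particular library's method.

```python
from statistics import mean

# Hypothetical data: only two points carry labels; the rest are unlabeled.
labeled = [(1.0, "low"), (9.0, "high")]
unlabeled = [1.5, 2.0, 2.5, 3.0, 7.0, 7.5, 8.0, 8.5]

def centroids(examples):
    """Model: the mean value of each class."""
    classes = {label for _, label in examples}
    return {c: mean(x for x, label in examples if label == c) for c in classes}

def nearest(cents, x):
    """Predict the class whose centroid is closest to x."""
    return min(cents, key=lambda c: abs(x - cents[c]))

def self_train(labeled, unlabeled, rounds=5):
    cents = centroids(labeled)  # trained on labeled data only
    for _ in range(rounds):
        # Pseudo-label the unlabeled points with the current model.
        pseudo = [(x, nearest(cents, x)) for x in unlabeled]
        # Retrain on labeled + pseudo-labeled data.
        new = centroids(labeled + pseudo)
        if new == cents:  # pseudo-labels no longer change the model
            break
        cents = new
    return cents, pseudo

cents, pseudo = self_train(labeled, unlabeled)
print(cents)
print(pseudo)
```

The unlabeled points move the class centres from 1.0 and 9.0 to 2.0 and 8.0, so the model reflects all ten points even though only two were labeled.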
4. Reinforcement learning

Reinforcement learning is a feedback-based process. Here, the AI component (the agent) explores its surroundings by trial and error, takes actions, learns from experience, and improves its performance. The agent is rewarded for each good action and penalized for each wrong move. Thus, the reinforcement learning agent aims to maximize its total reward by performing good actions.

Unlike supervised learning, reinforcement learning has no labeled data, and the agent learns from experience only. Consider video games. Here, the game defines the environment, each move of the agent is an action, and the resulting position in the game is the agent's state. The agent receives feedback in the form of rewards and punishments, which affect the overall game score. The ultimate goal of the agent is to achieve a high score.
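The reward-driven trial-and-error loop can be sketched with tabular Q-learning, one standard reinforcement learning algorithm. The environment is hypothetical: a corridor of five cells where the agent starts in cell 0 and earns a reward of +1 only when it reaches cell 4. The learning rate, discount factor and exploration rate are illustrative values, not recommendations.

```python
import random

# Hypothetical environment: cells 0..4, goal at cell 4.
# Moving left from cell 0 leaves the agent where it is.
N_STATES, GOAL = 5, 4
ACTIONS = ("left", "right")

def step(state, action):
    """Apply an action; return (next_state, reward, episode_finished)."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(s):
        best = max(q[(s, a)] for a in ACTIONS)
        # Break ties randomly so the untrained agent still wanders around.
        return rng.choice([a for a in ACTIONS if q[(s, a)] == best])

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap the episode length
            # Explore with probability epsilon, otherwise exploit what was learned.
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(s)
            s2, r, done = step(s, a)
            # Q-learning update: move Q(s, a) toward reward + discounted best future value.
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
            if done:
                break
    return q

q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

No move is ever labeled "correct"; the agent learns to always move right purely because that path leads to the reward soonest, and the discount factor gamma makes shorter paths more valuable.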

Reinforcement learning is applied across different fields such as game theory, information theory, and multi-agent systems. Reinforcement learning is further divided into two types of methods or algorithms:

• Positive reinforcement learning: This refers to adding a reinforcing stimulus after a specific behavior of the agent, which makes it more likely that the behavior will occur again in the future, e.g., adding a reward after a behavior.
• Negative reinforcement learning: This refers to strengthening a specific behavior because it avoids or removes a negative outcome.
