Data Science
Data science combines math and statistics, specialized programming, advanced analytics,
artificial intelligence (AI) and machine learning with specific subject matter expertise to
uncover actionable insights hidden in an organization’s data. These insights can be used to
guide decision making and strategic planning.
This logic behind the data, and the process of manipulating it, is what is known as data
science.
Data science is the study of data to extract meaningful insights for business. It is a
multidisciplinary approach that combines principles and practices from the fields of
mathematics, statistics, artificial intelligence, and computer engineering to analyze large
amounts of data. This analysis helps data scientists to ask and answer questions like what
happened, why it happened, what will happen, and what can be done with the results.
Data Science Process Life Cycle
Some steps are necessary in any data science task to derive fruitful results from the data at
hand.
Data Collection – After formulating the problem statement, the main task is to collect data
that can support our analysis and manipulation. Sometimes data is collected by conducting a
survey, and at other times by web scraping.
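For example, a minimal collection sketch in Python, assuming the pandas and requests libraries are installed; the file name and URL below are placeholders, not real sources:

import pandas as pd
import requests

# Load survey responses from a local CSV file (hypothetical file name).
survey = pd.read_csv("survey_responses.csv")

# Scrape tabular data from a web page (hypothetical URL); read_html
# returns one DataFrame per HTML table found on the page.
html = requests.get("https://example.com/statistics").text
tables = pd.read_html(html)
print(survey.head(), len(tables))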
Data Cleaning – Most real-world data is unstructured and requires cleaning and conversion
into structured form before it can be used for any analysis or modeling.
Exploratory Data Analysis – This is the step in which we try to find the hidden patterns in
the data at hand. We analyze the different factors that affect the target variable and the
extent to which each does so, how the independent features are related to one another, and
what can be done to achieve the desired results. This also gives us a direction in which to
work when starting the modeling process.
Model Building – Different types of machine learning algorithms and techniques have been
developed that can easily identify complex patterns in data, a task that would be very
tedious for a human.
Model Deployment – After a model is developed and performs well on a holdout or
real-world dataset, we deploy it and monitor its performance. This is the stage where what
we learned from the data is applied in real-world applications and use cases.
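As an illustration, a minimal deployment sketch, assuming scikit-learn and joblib are available; the model choice and file name are only examples:

from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a model and persist it to disk so a serving process can reuse it.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)
dump(model, "model.joblib")

# In the deployed application, reload the artifact and make predictions;
# in production this would typically sit behind an API endpoint.
deployed = load("model.joblib")
print(deployed.predict(X[:3]))

Monitoring then means comparing such live predictions against actual outcomes over time.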
Steps for Data Science Processes:
Step 1: Defining research goals and creating a project charter
Spend time understanding the goals and context of your research. Continue asking questions
and devising examples until you grasp the exact business expectations, identify how your
project fits in the bigger picture, appreciate how your research is going to change the
business, and understand how they’ll use your results.
Create a project charter
A project charter requires teamwork, and your input covers at least the following:
A clear research goal
The project mission and context
How you’re going to perform your analysis
What resources you expect to use
Proof that it’s an achievable project, or proof of concepts
Deliverables and a measure of success
A timeline
Step 2: Retrieving Data
Start with data stored within the company
Finding data even within your own company can sometimes be a challenge.
This data can be stored in official data repositories such as databases, data marts, data
warehouses, and data lakes maintained by a team of IT professionals.
Getting access to the data may take time and involve company policies.
Step 3: Cleansing, integrating, and transforming data
Cleaning:
Data cleansing is a subprocess of the data science process that focuses on removing errors in
your data so your data becomes a true and consistent representation of the processes it
originates from.
The first type is the interpretation error, in which a value is taken at face value even though
it cannot be true, such as a record stating that a person’s age is greater than 300 years.
The second type of error points to inconsistencies between data sources or against your
company’s standardized values. An example of this class of errors is putting “Female” in one
table and “F” in another when they represent the same thing: that the person is female.
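A minimal cleansing sketch in pandas covering both error types; the column names and the age cut-off are assumptions for illustration:

import pandas as pd

df = pd.DataFrame({"age": [34.0, 350.0, 28.0],
                   "sex": ["Female", "F", "male"]})

# Interpretation error: an age above a plausible maximum is invalid,
# so mark it as missing rather than trusting the recorded value.
df.loc[df["age"] > 120, "age"] = float("nan")

# Inconsistency between sources: map variant codes such as "F" and
# "female" onto one standardized value.
df["sex"] = df["sex"].str.lower().map(
    {"f": "Female", "female": "Female", "m": "Male", "male": "Male"})
print(df)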
Integrating:
Combining Data from different Data Sources.
Your data comes from several different places, and in this sub step we focus on integrating
these different sources.
You can perform two operations to combine information from different data sets. The first
operation is joining and the second operation is appending or stacking.
Joining Tables:
Joining tables allows you to combine the information of one observation found in one table
with the information that you find in another table.
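For instance, a small pandas join, with made-up tables for illustration:

import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2],
                          "city": ["Pune", "Delhi"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2],
                       "amount": [250, 120, 90]})

# Join on the shared key, so each order row gains its customer's city.
joined = orders.merge(customers, on="customer_id", how="left")
print(joined)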
Appending Tables:
Appending or stacking tables is effectively adding observations from one table to another
table.
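A matching append (stack) sketch, again with made-up tables:

import pandas as pd

jan = pd.DataFrame({"customer_id": [1, 2], "amount": [250, 90]})
feb = pd.DataFrame({"customer_id": [3], "amount": [40]})

# The tables share the same columns, so stacking simply adds the
# observations of one table below those of the other.
stacked = pd.concat([jan, feb], ignore_index=True)
print(stacked)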
Transforming Data
Certain models require their data to be in a certain shape.
Reducing the Number of Variables
Sometimes you have too many variables and need to reduce the number because they don’t
add new information to the model.
Having too many variables in your model makes the model difficult to handle, and certain
techniques don’t perform well when you overload them with too many input variables.
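One common way to reduce the number of variables is principal component analysis (PCA), which the text does not name explicitly; a minimal sketch with scikit-learn’s bundled iris data:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Project the four original features onto two principal components,
# keeping most of the variance while halving the number of variables.
X, _ = load_iris(return_X_y=True)
reduced = PCA(n_components=2).fit_transform(X)
print(reduced.shape)  # (150, 2)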
Dummy variables can take only two values: true (1) or false (0). They’re used to indicate the
presence or absence of a categorical effect that may explain the observation.
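In pandas, dummy variables can be created with get_dummies; a minimal sketch with an invented column:

import pandas as pd

df = pd.DataFrame({"sex": ["Female", "Male", "Female"]})

# Each category becomes its own 0/1 column indicating the presence
# or absence of that categorical effect for each observation.
dummies = pd.get_dummies(df["sex"], dtype=int)
print(dummies)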
Step 4: Exploratory Data Analysis
During exploratory data analysis you take a deep dive into the data.
Information becomes much easier to grasp when shown in a picture; therefore, you mainly
use graphical techniques to gain an understanding of your data and the interactions between
variables.
Common techniques include the bar plot, line plot, scatter plot, multiple plots, Pareto
diagram, link-and-brush diagram, histogram, and box-and-whisker plot.
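A minimal plotting sketch using matplotlib and scikit-learn’s bundled iris data; any dataset with numeric columns would work the same way:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True).frame

# Histogram: the distribution of a single variable.
iris["sepal length (cm)"].plot.hist(bins=20)
plt.show()

# Scatter plot: the interaction between two variables.
iris.plot.scatter(x="sepal length (cm)", y="petal length (cm)")
plt.show()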
Step 5: Build the Models
Building the models is the next step, with the goal of making better predictions, classifying
objects, or gaining an understanding of the system being modeled.
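A minimal model-building sketch with scikit-learn; the decision tree here is just one possible choice of algorithm, not one prescribed by the text:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Fit a classifier on the training split, then check how well it
# generalizes to the held-out test split.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(model.score(X_test, y_test))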
Step 6: Presenting findings and building applications on top of them
The last stage of the data science process is where your soft skills will be most useful, and
yes, they’re extremely important.
This stage involves presenting your results to the stakeholders and industrializing your
analysis process for repetitive reuse and integration with other tools.
Benefits and uses of data science and big data
Governmental organizations are also aware of data’s value. A data scientist in a governmental
organization gets to work on diverse projects such as detecting fraud and other criminal
activity or optimizing project funding.
Nongovernmental organizations (NGOs) are also no strangers to using data. They use it to
raise money and defend their causes. The World Wildlife Fund (WWF), for instance,
employs data scientists to increase the effectiveness of their fundraising efforts.
Universities use data science in their research and also to enhance the study experience of
their students, for example through MOOCs (massive open online courses).
Tools for Data Science Process
As time has passed, the tools used to perform different data science tasks have evolved
considerably. Software such as Matlab and Power BI, and programming languages such as
Python and R, provide many utility features that help us complete even the most complex
tasks efficiently and within a limited time. Python, R, Matlab, and Power BI are among the
most popular tools in this domain.
Usage of Data Science Process
The Data Science Process is a systematic approach to solving data-related problems and
consists of the following steps:
Problem Definition: Clearly defining the problem and identifying the goal of the analysis.
Data Collection: Gathering and acquiring data from various sources, including data cleaning
and preparation.
Data Exploration: Exploring the data to gain insights and identify trends, patterns, and
relationships.
Data Modeling: Building mathematical models and algorithms to solve problems and make
predictions.
Evaluation: Evaluating the model’s performance and accuracy using appropriate metrics (see
the sketch after this list).
Deployment: Deploying the model in a production environment to make predictions or
automate decision-making processes.
Monitoring and Maintenance: Monitoring the model’s performance over time and making
updates as needed to improve accuracy.
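As a sketch of the evaluation step, assuming a classification task and scikit-learn; the label vectors below are invented purely for illustration:

from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

y_true = [0, 1, 1, 0, 1, 0]   # actual labels
y_pred = [0, 1, 0, 0, 1, 1]   # the model's predictions

print(accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred))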
Issues of Data Science Process
Data Quality and Availability: Data quality can affect the accuracy of the models developed
and therefore, it is important to ensure that the data is accurate, complete, and consistent.
Data availability can also be an issue, as the data required for analysis may not be readily
available or accessible.
Bias in Data and Algorithms: Bias can exist in data due to sampling techniques, measurement
errors, or imbalanced datasets, which can affect the accuracy of models. Algorithms can also
perpetuate existing societal biases, leading to unfair or discriminatory outcomes.
Model Overfitting and Underfitting: Overfitting occurs when a model is too complex and fits
the training data too well, but fails to generalize to new data. On the other hand, underfitting
occurs when a model is too simple and is not able to capture the underlying relationships in
the data.
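A small sketch that makes the contrast visible, using decision trees of different depths on scikit-learn’s bundled breast-cancer data; the depth values are arbitrary examples:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A depth-1 stump is too simple (underfitting); an unlimited-depth tree
# memorizes the training data (overfitting). Compare train vs. test accuracy.
for depth in (1, 3, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(depth,
          round(tree.score(X_train, y_train), 3),
          round(tree.score(X_test, y_test), 3))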
Model Interpretability: Complex models can be difficult to interpret and understand, making
it challenging to explain the model’s decisions. This can be an issue when it comes to making
business decisions or gaining stakeholder buy-in.
Privacy and Ethical Considerations: Data science often involves the collection and analysis of
sensitive personal information, leading to privacy and ethical concerns. It is important to
consider privacy implications and ensure that data is used in a responsible and ethical
manner.
Technical Challenges: Technical challenges can arise during the data science process such as
data storage and processing, algorithm selection, and computational scalability.