UNIT 4 Data Science Notes

Uploaded by

parth170309

UNIT-4 DATA SCIENCE

1. What is Data Science?

Data Science is a concept that unifies statistics, data analysis, machine learning
and their related methods in order to understand and analyse actual phenomena
with data. It employs techniques and theories drawn from many fields within the
context of Mathematics, Statistics, Computer Science, and Information Science.
2. What are the applications of Data Science?
Data Science is not a new field. It mainly revolves around analysing data and,
when it comes to AI, this analysis helps in making the machine intelligent
enough to perform tasks by itself. Data Science has various applications
in today’s world. Some of them are:
• Fraud and Risk Detection
• Genetics & Genomics
• Internet Search
• Targeted Advertising
• Website Recommendations
• Airline Route Planning

3. What are the different sources of collecting data?

Data of any required type can be collected from various sources, and the data
collection process can be categorised in two ways: Offline and Online.

4. What points should be kept in mind while accessing data from any of the data
sources?
While accessing data from any data source, the following points should be kept
in mind:
1) Only data that is available for public use should be taken.
2) Personal datasets should only be used with the consent of the owner.
3) One should never breach someone’s privacy to collect data.
4) Data should only be taken from reliable sources, as data collected from
random sources can be wrong or unusable.
5) Reliable sources of data ensure the authenticity of data, which helps in
proper training of the AI model.

5. What are some of the popular formats for storing data?
1) CSV: CSV stands for comma-separated values. It is a simple file format used
to store tabular data. Each line of the file is a data record, and each record
consists of one or more fields separated by commas, which is why such files
are known as CSV files.
2) Spreadsheet: A spreadsheet is a piece of paper or a computer program
used for accounting and recording data in rows and columns into
which information can be entered. Microsoft Excel is a program which helps
in creating spreadsheets.
3) SQL: SQL stands for Structured Query Language. It is a domain-specific
language designed for managing data held in different kinds of DBMS
(Database Management System). It is particularly useful in handling
structured data.
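
As a rough illustration of the CSV and SQL formats described above, the sketch below parses a small CSV text and then queries the same records with SQL. It uses only Python's standard `csv` and `sqlite3` modules; the student names, marks, table name and query are made-up assumptions, not part of these notes.

```python
import csv
import io
import sqlite3

# Hypothetical CSV text: the first line is the header, each later line is one record.
csv_text = "name,marks\nAsha,82\nRavi,91\n"

# DictReader splits every record on commas and maps fields to the header names.
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(rows[0])  # every field is read as a string

# Load the same records into an in-memory SQL database (hypothetical table name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(r["name"], int(r["marks"])) for r in rows],
)

# A structured query: names of students scoring above 85.
top = conn.execute("SELECT name FROM students WHERE marks > 85").fetchall()
print(top)
conn.close()
```

Note that CSV stores everything as text, so the marks have to be converted with `int()` before they are stored in the typed SQL column.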

6. What is NumPy?
NumPy, which stands for Numerical Python, is the fundamental package for
mathematical and logical operations on arrays in Python. It is commonly used
for working with numbers and offers a wide range of arithmetic operations
that make numerical work easier. NumPy works with arrays, which are
homogeneous collections of data.

7. Define Array.
An array is a set of multiple values of the same datatype. The values can be
numbers, characters, booleans, etc., but a single array holds only one datatype.
In NumPy, arrays are known as ND-arrays (N-Dimensional Arrays), since NumPy
supports creating n-dimensional arrays in Python.
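
The single-datatype and n-dimensional properties described above can be checked with a short sketch. The values are made up for illustration, and the exact integer width NumPy picks may differ between platforms.

```python
import numpy as np

marks = np.array([78, 85, 91])           # 1-D array, all integers
mixed = np.array([1, 2.5, 3])            # integers are upcast so that all values are floats
grid = np.array([[1, 2, 3], [4, 5, 6]])  # 2-D array: 2 rows, 3 columns

print(marks.dtype)  # an integer dtype
print(mixed.dtype)  # float64: one datatype for the whole array
print(grid.ndim, grid.shape)
```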

8. What is the difference between Array and List?
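A minimal sketch of two commonly noted differences, based on standard Python and NumPy behaviour rather than on these notes: a list can mix datatypes and treats `+` and `*` as joining and repeating, while a NumPy array holds one datatype and applies arithmetic to each element.

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# Lists: + joins two lists, * repeats the list.
list_plus = py_list + [4]
list_times = py_list * 2

# Arrays: + and * are applied element by element.
arr_plus = np_arr + 4
arr_times = np_arr * 2

# A list may hold values of different datatypes.
mixed_list = [1, "two", 3.0]

print(list_plus, list_times)
print(arr_plus, arr_times)
```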

9. What is Pandas?
Pandas is a software library written for the Python programming language for data
manipulation and analysis. It offers data structures and operations for
manipulating numerical tables and time series. The name is derived from the term
“panel data”, which refers to data sets that include observations over multiple
time periods for the same individuals, and is also a play on the phrase
“Python data analysis”.

10. Pandas is well suited for which kinds of data?

Pandas is well suited for many kinds of data:
• Tabular data with heterogeneously-typed columns, as in an SQL table or
Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time series data.
• Arbitrary matrix data (homogeneously typed or heterogeneous) with row
and column labels
• Any other form of observational / statistical data sets. The data need not be
labelled at all to be placed into a Pandas data structure
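
The kinds of data listed above might look as follows in Pandas. This is only a sketch: the column names, dates and values are invented for illustration.

```python
import pandas as pd

# Tabular data with differently typed columns, as in an SQL table or spreadsheet.
table = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera"],
    "marks": [82.5, 91.0, 77.0],
    "passed": [True, True, True],
})
print(table.dtypes)  # one dtype per column

# Time series data whose dates are not evenly spaced.
dates = pd.to_datetime(["2024-01-01", "2024-01-03", "2024-01-10"])
readings = pd.Series([20.1, 21.4, 19.8], index=dates)

# Data without any labels: Pandas numbers the rows and columns itself.
unlabelled = pd.DataFrame([[1, 2], [3, 4]])
print(list(unlabelled.columns))
```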

11. What are the two primary data structures of Pandas?

The two primary data structures of Pandas, Series (1-dimensional) and DataFrame
(2-dimensional), handle the vast majority of typical use cases in finance, statistics,
social science, and many areas of engineering.
Here are just a few of the things that pandas does well:
• Easy handling of missing data (represented as NaN) in floating point as well as
non-floating-point data
• Size mutability: columns can be inserted and deleted from DataFrame and higher
dimensional objects
• Automatic and explicit data alignment: objects can be explicitly aligned to a set of
labels, or the user can simply ignore the labels and let Series, DataFrame, etc.
automatically align the data in computations
• Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
• Intuitive merging and joining data sets
• Flexible reshaping and pivoting of data sets
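
A few of the features above (label alignment, missing data as NaN, inserting and deleting columns, merging) can be sketched with a Series and a DataFrame. All labels and values here are hypothetical.

```python
import numpy as np
import pandas as pd

# Series (1-dimensional): values are aligned on their labels, not their positions.
s1 = pd.Series([10, 20, 30], index=["a", "b", "c"])
s2 = pd.Series([1, 2, 3], index=["b", "c", "d"])
total = s1 + s2           # "a" and "d" exist on one side only, so they become NaN
filled = total.fillna(0)  # one way to handle the missing values

# DataFrame (2-dimensional) with a missing mark stored as NaN.
df = pd.DataFrame({"id": [1, 2, 3], "marks": [80.0, np.nan, 65.0]})
mean_marks = df["marks"].mean()  # NaN is skipped by default

# Size mutability: insert a column, then delete it again.
df["grade"] = ["A", None, "C"]
del df["grade"]

# Merging two data sets on a shared column.
names = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
merged = df.merge(names, on="id")
print(total)
print(merged)
```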

12. What is Matplotlib?

Matplotlib is a visualization library in Python for 2D plots of arrays. It is a
multiplatform data visualization library built on NumPy arrays.
One of the greatest benefits of visualization is that it gives us visual access to
huge amounts of data in an easily digestible form. Matplotlib comes with a wide
variety of plots. Plots help us understand trends and patterns and make
correlations. They are typically instruments for reasoning about quantitative
information. Some types of graphs that we can make with this package are listed
below:
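For instance, a line plot and a bar chart, two of the plot kinds Matplotlib supports, might be drawn as in the sketch below. The data, titles and axis labels are made up, and the `Agg` backend is chosen only so that the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np

tests = np.arange(1, 6)
marks = np.array([62, 70, 75, 81, 90])  # made-up marks across five tests

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(8, 3))

# Line plot: shows a trend across an ordered axis.
ax_line.plot(tests, marks, marker="o")
ax_line.set_title("Line plot")
ax_line.set_xlabel("Test number")
ax_line.set_ylabel("Marks")

# Bar chart: compares quantities across categories.
ax_bar.bar(["A", "B", "C"], [12, 18, 7])
ax_bar.set_title("Bar chart")

fig.tight_layout()

# Render the figure to PNG in memory; fig.savefig("plots.png") would write a file instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```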
13. What are the Statistical tools widely used in Python?
Statistical tools widely used in Python are:

14. What are types of issues we can face with data?

While collecting data, it is possible that the data might come with some errors. Let
us first look at the types of issues we can face with data:
1) Erroneous Data: There are two ways in which data can be erroneous:
• Incorrect values: The values in the dataset (at random places) are incorrect.
For example, there is a decimal value in the phone number column, or a name
in the marks column.
• Invalid or Null values: At some places, the values get corrupted and hence
become invalid. Many times, you will find NaN values in the dataset. These
are null values which do not hold any meaning and cannot be processed.
2) Missing Data: In some datasets, some cells remain empty because their values
are missing.
3) Outliers: Data which does not fall within the expected range of a certain
element are referred to as outliers. For example, consider the marks of
students in a class. Suppose a student was absent for the exams and hence
got 0 marks. If these marks are included, the whole class’s average
goes down. To prevent this, the average is taken over the range of marks from
highest to lowest while keeping this particular result separate. This ensures
that the class average reflects the actual data.
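
The three kinds of issues above can be sketched on a small, made-up marks column. Treating a mark of 0 as the outlier is an assumption taken from the absent-student example, not a general rule for detecting outliers.

```python
import pandas as pd

# Hypothetical raw column: a name where a mark should be (incorrect value),
# an empty cell (missing data), and a 0 for an absent student (outlier).
raw_marks = pd.Series(["78", "Ravi", "86", None, "0", "91"])

# Convert to numbers; anything that cannot be converted becomes NaN.
marks = pd.to_numeric(raw_marks, errors="coerce")
unusable = int(marks.isna().sum())  # count of invalid or missing entries

# Drop the NaN entries, then compare the average with and without the outlier.
clean = marks.dropna()
mean_all = clean.mean()
mean_without_zero = clean[clean != 0].mean()

print(unusable, mean_all, mean_without_zero)
```

Here the single 0 pulls the average from 85.0 down to 63.75, which is the effect the outlier example describes.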
