UNIT 4 Data Science Notes
4. What points should be kept in mind while accessing data from any of the data
sources?
While accessing data from any data source, the following points should be kept
in mind:
1) Only data that is available for public use should be taken.
2) Personal datasets should only be used with the consent of the owner.
3) One should never breach someone’s privacy to collect data.
4) Data should only be taken from reliable sources, as data collected from
random sources can be wrong or unusable.
5) Reliable sources of data ensure the authenticity of data which helps in
proper training of the AI model.
5. What are some of the popular formats for storing data?
1) CSV: CSV stands for Comma-Separated Values. It is a simple file format used
to store tabular data. Each line of the file is a data record, and each record
consists of one or more fields separated by commas. Because the values in each
record are separated by commas, these files are known as CSV files.
2) Spreadsheet: A spreadsheet is a piece of paper or a computer program used
for accounting and recording data in rows and columns into which information
can be entered. Microsoft Excel is a program that helps in creating
spreadsheets.
3) SQL: SQL stands for Structured Query Language. It is a domain-specific
language used in programming and is designed for managing data held in
different kinds of DBMS (Database Management Systems). It is particularly
useful for handling structured data.
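As a minimal sketch of the CSV and SQL formats described above, the example below parses a small CSV text and then loads the same records into an SQL table. It uses only Python's standard csv and sqlite3 modules. The sample data, the table name students, and the column names are hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: a header line followed by three records.
csv_text = "name,marks\nAsha,82\nRavi,74\nMeena,91\n"

# Each line is one record; DictReader splits the fields on commas
# and uses the header line as the field names.
records = list(csv.DictReader(io.StringIO(csv_text)))

# Load the same records into an in-memory SQL table (assumed name: students).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [(r["name"], int(r["marks"])) for r in records],
)

# An SQL query that selects the students who scored above 80.
top = conn.execute(
    "SELECT name FROM students WHERE marks > 80 ORDER BY name"
).fetchall()
conn.close()

print(records[0])  # {'name': 'Asha', 'marks': '82'}
print(top)         # [('Asha',), ('Meena',)]
```

Note that the CSV reader returns every field as text, so the marks are converted to integers before being stored in the table.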
6. What is NumPy?
NumPy, which stands for Numerical Python, is the fundamental package for
mathematical and logical operations on arrays in Python. It is a commonly used
package for working with numbers. NumPy provides a wide range of arithmetic
operations, which makes working with numbers easier. NumPy also works with
arrays, which are homogeneous collections of data.
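The arithmetic and logical operations mentioned above can be sketched as follows. This is a small illustration, assuming NumPy is installed. The array values and variable names are made up for the example.

```python
import numpy as np

# A hypothetical array of marks.
marks = np.array([10, 20, 30, 40])

# Arithmetic operations apply to every element at once.
doubled = marks * 2      # [20 40 60 80]
shifted = marks + 5      # [15 25 35 45]

# Aggregate operations over the whole array.
total = marks.sum()      # 100
average = marks.mean()   # 25.0

# A logical operation gives one True/False value per element.
above_20 = marks > 20    # [False False  True  True]

print(doubled, shifted, total, average, above_20)
```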
7. Define Array.
An array is a set of multiple values of the same datatype. The values can be
numbers, characters, booleans, etc., but a single array can hold only one
datatype. In NumPy, arrays are known as ND-arrays (N-Dimensional Arrays),
because NumPy supports creating n-dimensional arrays in Python.
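The two ideas in this definition, a single datatype and n dimensions, can be checked with a short sketch. It assumes NumPy is installed, and the values are arbitrary examples.

```python
import numpy as np

# A one-dimensional array and a two-dimensional array.
one_d = np.array([1, 2, 3])
two_d = np.array([[1, 2, 3], [4, 5, 6]])

# ndim gives the number of dimensions; shape gives the size of each one.
print(one_d.ndim)    # 1
print(two_d.ndim)    # 2
print(two_d.shape)   # (2, 3)

# An array holds one datatype only: mixing integers and a float
# makes NumPy convert every value to float.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)   # float64
```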