Data Handling Using Pandas and Data Visualization - Assessment1 Class Room Notes
I Data Handling using Pandas and Data Visualization Class Room Notes
Chapter No. 1
Python Pandas - I
Pandas is a Python library for data analysis.
Data analysis refers to the process of evaluating large data sets using analytical and statistical tools in order
to discover useful information and conclusions that support business decision-making.
Pandas is an open-source, BSD-licensed library built for the Python programming language.
Pandas offers high-performance, easy-to-use data structures and data analysis tools.
To work with pandas, you need to import the pandas library into your Python environment.
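A minimal sketch of that import step. Importing under the alias `pd` is a widespread convention (the programs in these notes use it too); printing `pd.__version__` is only a quick check that the library is installed, and the version shown depends on your environment.

```python
# Import pandas under its conventional alias pd
import pandas as pd

# Print the installed pandas version to confirm the import worked
print(pd.__version__)
```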
Features of Pandas
1) It can read and write data in many different formats and work with many data types (integer, float, string, etc.).
2) It can perform calculations in all the ways data is organized, i.e. across rows and down columns.
3) It can easily select a subset of data from large data sets and even combine multiple data sets
together. It has functionality to find and fill missing data.
4) It allows you to apply operations to independent groups within the data.
5) It supports reshaping of data into different forms.
6) It supports advanced time-series functionality (time-series forecasting is the use of a model to
predict future values based on previously observed values).
7) It supports visualization by integrating with libraries such as matplotlib and seaborn.
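Several of the features above can be tried in a few lines. The sketch below is only an illustration: the column names `section` and `marks` and their values are hypothetical. It uses `isna()` and `fillna()` to find and fill missing data, `groupby()` to apply an operation to independent groups, and `sum()` with `axis=0` / `axis=1` to calculate down columns and across rows.

```python
import numpy as np
import pandas as pd

# Hypothetical marks data, with one missing value (NaN)
df = pd.DataFrame({
    "section": ["A", "A", "B", "B"],
    "marks": [80, np.nan, 70, 90],
})

# Find missing data: isna() gives True for each missing cell
missing = df["marks"].isna()
print(missing.sum())   # number of missing marks

# Fill missing data: replace NaN with 0
df["marks"] = df["marks"].fillna(0)

# Apply an operation to independent groups: mean marks per section
print(df.groupby("section")["marks"].mean())

# Calculate down columns (axis=0) and across rows (axis=1)
nums = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
print(nums.sum(axis=0))   # column totals
print(nums.sum(axis=1))   # row totals
```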
Note: Pandas is best at handling large tabular data sets whose columns hold different data types.
Pandas offers several data structures, each suited to a different kind of work. Of these, the two basic
data structures, Series and DataFrame, are universally popular for their dependability. Pandas also had a
third data structure, Panel, but we are not going to study it (Panel has been removed from recent versions of pandas).
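To make the two structures concrete, here is a small sketch that builds one Series and one DataFrame; the values and the column names `name` and `age` are hypothetical sample data.

```python
import pandas as pd

# A Series: one-dimensional, one data type, with an index (0, 1, 2 by default)
s = pd.Series([10, 20, 30])
print(s)

# A DataFrame: a two-dimensional table; each column may hold a different data type
df = pd.DataFrame({
    "name": ["Asha", "Ravi"],   # hypothetical sample data
    "age": [16, 17],
})
print(df)
print(df.shape)   # (rows, columns)
```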
-----------------------------------------------------------------------------------------------------------------------------------------------
# Program No.1
# Program to create an empty series
import pandas as pd
obj1 = pd.Series(dtype="float64")   # dtype given explicitly: pandas 2.0 and later default an empty Series to object
print(obj1)
-----------------------------------------------------------------------------------------------------------------------------------------------
Output:
Series([], dtype: float64)
-----------------------------------------------------------------------------------------------------------------------------------------------
Comparison of Series and DataFrame
No.  Property      Series                                    DataFrame
1    Dimensions    1-dimensional                             2-dimensional
2    Data type     Homogeneous: all elements of a Series     Heterogeneous: a DataFrame object can
                   object must be of the same data type      have elements of different data types
3    Values        Value mutable: values can be changed      Value mutable: values can be changed
4    Size          Size immutable: once a Series object is   Size mutable: once a DataFrame object is
                   created, its size cannot be changed. If   created, its size can still change in
                   you add/drop an element, a new Series     place: elements can be added to or
                   object is created internally              dropped from the existing object
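The rows of this table can be checked in code. In the sketch below (sample data hypothetical), `dtype` and `dtypes` show homogeneous versus heterogeneous data, both objects accept changes to existing values, `drop()` on a Series returns a new object while the original keeps its length, and assigning a new column grows the DataFrame in place.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
df = pd.DataFrame({"roll": [1, 2], "name": ["Asha", "Ravi"]})

# Homogeneous vs heterogeneous: a Series has one dtype, a DataFrame one dtype per column
print(s.dtype)
print(df.dtypes)

# Value mutable: both allow changing existing values
s[0] = 100
df.loc[0, "name"] = "Anu"

# Size: drop() on a Series returns a NEW Series; the original keeps its length
s2 = s.drop(0)
print(len(s), len(s2))   # 3 2

# A DataFrame can grow in place by adding a column
df["marks"] = [88, 92]
print(df.shape)   # (2, 3)
```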
When a Series object stores a NaN value, Pandas requires its data type to be a floating-point type.
Integer data that contains NaN is therefore promoted automatically to a floating-point type (float64),
because NaN is not supported by integer types.
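A quick way to see this promotion, assuming NumPy's `np.nan` as the missing value: a list of plain integers gives an integer dtype, while the same integers with one `np.nan` added give `float64`.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
print(ints.dtype)          # an integer dtype: no missing values

with_nan = pd.Series([1, 2, np.nan])
print(with_nan.dtype)      # float64: the integers were promoted because of NaN
print(with_nan)
```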
Series Object Attributes
When you create a Series object, all information related to it is available through its attributes. You can
use these attributes in the format <Series object>.<attribute name> to get information about the Series object.
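A minimal sketch of reading a few common Series attributes in that format; the Series and its index labels `a`, `b`, `c` are hypothetical. These are attributes, not methods, so they are written without parentheses.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a custom index and one missing value
obj = pd.Series([5.0, np.nan, 7.5], index=["a", "b", "c"])

print(obj.index)     # the index labels
print(obj.values)    # the data as a NumPy array
print(obj.dtype)     # float64
print(obj.shape)     # (3,)
print(obj.size)      # 3: number of elements, NaN included
print(obj.ndim)      # 1
print(obj.empty)     # False
print(obj.hasnans)   # True
```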
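-----------------------------------------------------------------------------------------------------------------------------------------------
# Program to sort a Series by its values in ascending order
import pandas as pd
info = pd.Series([31, 41, 51])   # assumed data: any Series holding 31, 41, 51 at index 0, 1, 2 gives this output
print(info.sort_values(ascending=True))
-----------------------------------------------------------------------------------------------------------------------------------------------
Output:
0    31
1    41
2    51
dtype: int64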
-----------------------------------------------------------------------------------------------------------------------------------------------
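`sort_values()` sorts a Series by its values, and each index label moves together with its value. The sketch below uses a hypothetical, deliberately unsorted Series so that the reordering of the index is visible, and also shows `ascending=False` for descending order. Because `sort_values()` returns a new Series by default, the original is left unchanged.

```python
import pandas as pd

info = pd.Series([41, 51, 31])   # hypothetical values, deliberately unsorted

# Ascending order (the default): index labels travel with their values
print(info.sort_values(ascending=True))

# Descending order
print(info.sort_values(ascending=False))

# sort_values returns a new Series; info itself is unchanged
print(info.tolist())   # [41, 51, 31]
```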