Data Handling Using Pandas-II
Descriptive Statistics
Statistics is a branch of mathematics that deals with
CREATED BY: SACHIN BHARDWAJ, PGT (CS) KV NO.1 TEZPUR, MR. VINOD KUMAR VERMA,
PGT (CS) KV OEF KANPUR
For More Updates Visit: www.python4csip.com
1- max()
Syntax-
df['columnname'].max()
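A minimal sketch of max() on a single column, on the whole data frame, and across each row. The data frame df and its columns 'Maths' and 'Science' are made up for illustration.

```python
import pandas as pd

# Hypothetical marks of four students (illustrative data only)
df = pd.DataFrame({'Maths': [78, 92, 65, 88],
                   'Science': [81, 70, 95, 60]})

col_max = df['Maths'].max()   # largest value in one column
all_max = df.max()            # largest value of every column, returned as a Series
row_max = df.max(axis=1)      # largest value in each row

print(col_max)
print(all_max)
print(row_max)
```

The same three forms (one column, all columns, axis=1) also apply to min(), sum(), mean() and the other functions in this section.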
2- min()
Syntax-
df['columnname'].min()
3- count()
Syntax-
df['columnname'].count()
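A point worth checking with count() is that it counts only the non-missing values, so it can be smaller than the number of rows. A minimal sketch, assuming a made-up 'Maths' column with one missing entry:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second student's marks are missing
df = pd.DataFrame({'Maths': [78, np.nan, 65, 88]})

n_values = df['Maths'].count()   # counts only the non-missing entries
n_rows = len(df)                 # counts every row, missing or not

print(n_values, n_rows)
```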
4- mean()
Syntax-
df['columnname'].mean()
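A minimal sketch of mean(), assuming the same kind of made-up marks data. The mean of a column is its sum divided by its count, which the last line checks.

```python
import pandas as pd

# Hypothetical marks of four students (illustrative data only)
df = pd.DataFrame({'Maths': [78, 92, 65, 88],
                   'Science': [81, 70, 95, 60]})

col_mean = df['Maths'].mean()    # (78 + 92 + 65 + 88) / 4
all_means = df.mean()            # mean of every column
row_means = df.mean(axis=1)      # mean of each row

# mean is the same as sum divided by count
same = col_mean == df['Maths'].sum() / df['Maths'].count()

print(col_mean)
print(all_means)
print(row_means)
```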
5- sum()
Syntax-
df['columnname'].sum()
6- median()
Syntax-
df['columnname'].median()
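The median is the middle value of the sorted data; with an even number of values it is the average of the two middle values. A minimal sketch of both cases, using made-up marks:

```python
import pandas as pd

# Hypothetical data with an odd and an even number of values
df_odd = pd.DataFrame({'Marks': [78, 92, 65]})
df_even = pd.DataFrame({'Marks': [78, 92, 65, 88]})

median_odd = df_odd['Marks'].median()     # sorted: 65, 78, 92 -> middle value
median_even = df_even['Marks'].median()   # sorted: 65, 78, 88, 92 -> (78 + 88) / 2

print(median_odd, median_even)
```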
7- mode()
Syntax-
df['columnname'].mode()
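Unlike the earlier functions, mode() returns a Series rather than a single number, because more than one value can be the most frequent. A minimal sketch, assuming a made-up 'Marks' column in which two values tie:

```python
import pandas as pd

# Hypothetical data: 70 and 85 both occur twice
df = pd.DataFrame({'Marks': [70, 85, 70, 90, 85, 60]})

modes = df['Marks'].mode()   # Series holding every most-frequent value, in ascending order

print(modes)
```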
8- quantile()
The word "quartile" is taken from the word "quantile" and the word
example-
dataset are below a given line. It also states that there are
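In pandas, quartiles are computed with the quantile() method. A minimal sketch, assuming a made-up 'Marks' column; 0.25, 0.5 and 0.75 stand for the first, second and third quartile, and pandas interpolates linearly between data points by default.

```python
import pandas as pd

# Hypothetical data, already in sorted order for easy checking
df = pd.DataFrame({'Marks': [10, 20, 30, 40, 50]})

q1 = df['Marks'].quantile(0.25)                     # first quartile
quartiles = df['Marks'].quantile([0.25, 0.5, 0.75]) # all three quartiles as a Series

print(q1)
print(quartiles)
```

The second quartile (0.5) is the same value that median() returns.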
9- Variance
Syntax-
df['columnname'].var()
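A minimal sketch of var(), assuming a made-up 'Marks' column. By default pandas computes the sample variance (dividing by n - 1); passing ddof=0 gives the population variance (dividing by n).

```python
import pandas as pd

# Hypothetical data: mean is 5, sum of squared deviations is 32
df = pd.DataFrame({'Marks': [2, 4, 4, 4, 5, 5, 7, 9]})

sample_var = df['Marks'].var()              # 32 / (8 - 1), default ddof=1
population_var = df['Marks'].var(ddof=0)    # 32 / 8

print(sample_var, population_var)
```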
10- Standard Deviation
Syntax-
df['columnname'].std()
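The standard deviation is the square root of the variance. A minimal sketch, assuming the same kind of made-up 'Marks' column; as with var(), the default is the sample form (ddof=1).

```python
import math
import pandas as pd

# Hypothetical data: population variance is 4, so population std is 2
df = pd.DataFrame({'Marks': [2, 4, 4, 4, 5, 5, 7, 9]})

sample_std = df['Marks'].std()              # square root of 32 / 7
population_std = df['Marks'].std(ddof=0)    # square root of 32 / 8

# std squared gives back the variance
matches_var = math.isclose(sample_std ** 2, df['Marks'].var())

print(sample_std, population_std)
```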
groupby()
Example:- Program to group the data city-wise and find the
maximum temperature for each city.
30:-Temp in Evening
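A minimal sketch of such a program. The original program's data is not shown here, so this assumes the city and temperature values used in the pivot_table() examples of these notes, with two readings per city.

```python
import pandas as pd

# Assumed data: two temperature readings for each city
df = pd.DataFrame({
    'City': ['DELHI', 'DELHI', 'MUMBAI', 'MUMBAI', 'CHENNAI', 'CHENNAI'],
    'Temp': [28, 30, 22, 24, 32, 34]})

# Group the rows by city, then take the maximum Temp inside each group
max_temp = df.groupby('City')['Temp'].max()

print(max_temp)
```

The result is a Series indexed by city name, with the groups in sorted order.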
Sorting
Sorting in a data frame can be done row-wise or column-wise. By default,
sorting is done row-wise.
Syntax:-
df.sort_values(by='column_name')
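A minimal sketch of sort_values() in ascending and descending order, assuming a made-up data frame with 'Name' and 'Marks' columns. sort_values() returns a new data frame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical data (illustrative only)
df = pd.DataFrame({'Name': ['Ravi', 'Anita', 'Kiran'],
                   'Marks': [65, 92, 78]})

asc = df.sort_values(by='Marks')                     # lowest marks first (default)
desc = df.sort_values(by='Marks', ascending=False)   # highest marks first

print(asc)
print(desc)
```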
sort_index()
To sort the data based on index Value.
Syntax:
Example 1:- To sort the data frame based on index in ascending order
Example 2:- To sort the data frame based on index in descending order
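The two examples above can be sketched as follows, assuming a made-up data frame whose index labels are deliberately out of order.

```python
import pandas as pd

# Hypothetical data with an unsorted index
df = pd.DataFrame({'Marks': [65, 92, 78]}, index=[3, 1, 2])

asc = df.sort_index()                   # Example 1: ascending index order (default)
desc = df.sort_index(ascending=False)   # Example 2: descending index order

print(asc)
print(desc)
```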
Renaming index
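One common way to rename index labels is the rename() method with a mapping from old labels to new ones. This is a sketch under that assumption, with made-up data; the approach used in the original example may differ.

```python
import pandas as pd

# Hypothetical data with the default index 0, 1, 2
df = pd.DataFrame({'Marks': [65, 92, 78]})

# Map each old index label to a new one; df itself is not modified
renamed = df.rename(index={0: 'Ravi', 1: 'Anita', 2: 'Kiran'})

print(renamed)
```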
Deleting index
1. pivot()
2. pivot_table()
pivot()
Dataframe(df)
import pandas as pd
# Runs scored by each team in two seasons
data={
'Year':['2018','2019','2018','2019','2018','2019'],
'Team':['MI','MI','RCB','RCB','CSK','CSK'],
'Runs':[2500,2650,2200,2400,2300,2700]}
df=pd.DataFrame(data)
# One row per Year, one column per Team, cells filled from Runs
pv=pd.pivot(df,index='Year',columns='Team',values='Runs')
print(df)
print(pv)
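Output-
   Year  Team  Runs
0  2018    MI  2500
1  2019    MI  2650
2  2018   RCB  2200
3  2019   RCB  2400
4  2018   CSK  2300
5  2019   CSK  2700
Team   CSK    MI   RCB
Year
2018  2300  2500  2200
2019  2700  2650  2400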
pivot_table()
What will happen if we have multiple rows with the same values for these
columns.
pivot, but it aggregates the values from rows with duplicate entries for
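import pandas as pd
data={
'Date':['1-1-2019','1-1-2019','1-2-2019','1-2-2019','1-3-2019','1-3-2019'],
'City':['DELHI','DELHI','MUMBAI','MUMBAI','CHENNAI','CHENNAI'],
'Temp':[28,30,22,24,32,34],
'Humidity':[60,55,80,70,90,85]
}
df=pd.DataFrame(data)
print(df)
# The default aggregation is the mean, so the two readings of each city are averaged
pv=pd.pivot_table(df,index='City',values='Temp')
print(pv)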
Output-
       Date     City  Temp  Humidity
0  1-1-2019    DELHI    28        60
1  1-1-2019    DELHI    30        55
2  1-2-2019   MUMBAI    22        80
3  1-2-2019   MUMBAI    24        70
4  1-3-2019  CHENNAI    32        90
5  1-3-2019  CHENNAI    34        85
         Temp
City
CHENNAI    33
DELHI      29
MUMBAI     23
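The difference between the two functions shows up when the same index and column pair occurs in more than one row. A minimal sketch, assuming a shortened version of the data above: pivot() raises a ValueError because it cannot decide which value to keep, while pivot_table() combines the duplicates (by mean, unless aggfunc says otherwise).

```python
import pandas as pd

# Shortened data: each (Date, City) pair appears twice
df = pd.DataFrame({
    'Date': ['1-1-2019', '1-1-2019', '1-2-2019', '1-2-2019'],
    'City': ['DELHI', 'DELHI', 'MUMBAI', 'MUMBAI'],
    'Temp': [28, 30, 22, 24]})

# pivot() refuses to reshape when entries are duplicated
try:
    df.pivot(index='Date', columns='City', values='Temp')
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print('pivot() failed:', err)

# pivot_table() aggregates the duplicates instead (mean by default)
pv = pd.pivot_table(df, index='Date', columns='City', values='Temp')
print(pv)
```

Cells with no data at all, such as MUMBAI on 1-1-2019 here, come out as NaN.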
import pandas as pd
data={
'Date':['1-1-2019','1-1-2019','1-2-2019','1-2-2019','1-3-2019','1-3-2019'],
'city':['DELHI','DELHI','MUMBAI','MUMBAI','CHENNAI','CHENNAI'],
'Temp':[28,30,22,24,32,34],
'Humidity':[60,55,80,70,90,85]
}
df=pd.DataFrame(data)
print(df)
# aggfunc='max' keeps the highest reading of each city instead of the mean
pv=pd.pivot_table(df,index='city',values='Temp', aggfunc='max')
print(pv)
Output-
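import pandas as pd
data={
'Date':['1-1-2019','1-1-2019','1-2-2019','1-2-2019','1-3-2019','1-3-2019'],
'city':['DELHI','DELHI','MUMBAI','MUMBAI','CHENNAI','CHENNAI'],
'Temp':[28,30,22,24,32,34],
'Humidity':[60,55,80,70,90,85]
}
df=pd.DataFrame(data)
print(df)
# With no values given, every remaining numeric column (Temp, Humidity) is aggregated
print(pd.pivot_table(df,index='Date',columns='city'))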
Output-
In many cases, the data that we receive from different sources may not be
perfect; some values may be missing. For example, in the given program
the employee name is missing in one row and the date of joining is
missing in another row.
When we convert the data into a data frame, the missing data is
represented by NaN (Not a Number). NaN is the default marker for
a missing value.
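A minimal sketch of how missing values appear, assuming a small employee data frame with columns 'empid', 'ename' and 'Doj'; the rows and dates are made up. isnull() marks each missing cell as True, and summing it gives the number of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data: one name and one date of joining are missing
emp = pd.DataFrame({'empid': [101, 102, 103],
                    'ename': ['Sachin', np.nan, 'Vinod'],
                    'Doj': ['12-01-2012', '15-01-2012', np.nan]})

print(emp)                                 # missing cells are shown as NaN

missing_per_column = emp.isnull().sum()    # count of missing values in each column
print(missing_per_column)
```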
But this is not useful as it is filling any type of column with 0. We can
fill each column with a different value by passing the column name and
the value to be used to fill in that column.
For example- to fill 'ename' with 'Name Missing' and 'Doj' with '00-00-
method.
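Filling each column with its own value can be sketched by passing a dictionary to fillna(). This assumes a made-up employee data frame; the placeholder 'Date Missing' for 'Doj' is a hypothetical choice for illustration, not the value used in the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data: one name and one date of joining are missing
emp = pd.DataFrame({'empid': [101, 102, 103],
                    'ename': ['Sachin', np.nan, 'Vinod'],
                    'Doj': ['12-01-2012', '15-01-2012', np.nan]})

# Keys are column names, values are the fill value for that column
filled = emp.fillna({'ename': 'Name Missing', 'Doj': 'Date Missing'})

print(filled)
```

fillna() returns a new data frame, so emp still contains its NaN values afterwards.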
If we do not want any missing data and want to remove the rows that
have NA or NaN values, we can use the dropna() method.
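A minimal sketch of dropna(), again assuming a made-up employee data frame. With no arguments, every row containing at least one missing value is removed; the subset parameter limits the check to the named columns.

```python
import numpy as np
import pandas as pd

# Hypothetical employee data: one name and one date of joining are missing
emp = pd.DataFrame({'empid': [101, 102, 103],
                    'ename': ['Sachin', np.nan, 'Vinod'],
                    'Doj': ['12-01-2012', '15-01-2012', np.nan]})

cleaned = emp.dropna()                      # drop rows with any missing value
named_only = emp.dropna(subset=['ename'])   # drop rows only where 'ename' is missing

print(cleaned)
print(named_only)
```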
Environment and then scroll down to find mysql connector and mysql
1. Start Python
2. Import the mysql.connector package
3. Create or open a database
4. Open and establish a connection to the database
5. Create a cursor object or its instance (required for Pandas to MySQL)
6. Read a SQL query (for MySQL to Pandas) and execute a query (for Pandas to MySQL)
7. Commit the transaction (for Pandas to MySQL)
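The steps above can be sketched for the Pandas to MySQL direction as follows. This is only an outline under stated assumptions: the function name insert_employees is hypothetical, the table is assumed to be called employee with columns empid, ename and Doj, and the host, user, password and database values must be supplied by the caller. It needs a running MySQL server and the mysql-connector-python package to actually execute.

```python
import pandas as pd

def insert_employees(df, host, user, password, database):
    """Write each row of df into an assumed 'employee' table (hypothetical helper)."""
    import mysql.connector   # step 2: import the connector package

    # Steps 3 and 4: open a connection to an existing database
    con = mysql.connector.connect(host=host, user=user,
                                  password=password, database=database)
    try:
        cur = con.cursor()   # step 5: create a cursor object
        # Step 6: execute one INSERT query per row of the data frame
        for row in df.itertuples(index=False):
            cur.execute(
                "INSERT INTO employee (empid, ename, Doj) VALUES (%s, %s, %s)",
                (int(row.empid), row.ename, row.Doj))
        con.commit()         # step 7: commit the transaction
    finally:
        con.close()          # always release the connection
```

The values are passed separately from the query text (the %s placeholders) so that the connector handles quoting, and empid is converted with int() because the connector may not accept NumPy integer types directly.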
After the execution of the program the records in employee table are-
Example 2-
After the execution of the program the record in employee table got
updated from Sachin to Sachin Bhardwaj-
Example 1- To retrieve column empid and Doj from employee table into
data frame emp.
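A runnable sketch of this kind of retrieval using pd.read_sql(). So that it runs without a MySQL server, it uses an in-memory SQLite database as a stand-in, with an assumed employee table and made-up rows; with MySQL, the connection object would instead come from mysql.connector.connect(...) and the query would stay the same.

```python
import sqlite3
import pandas as pd

# Stand-in database: in-memory SQLite instead of MySQL (assumption for illustration)
con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE employee (empid INTEGER, ename TEXT, Doj TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(101, 'Sachin', '12-01-2012'),
                 (102, 'Vinod', '15-01-2012')])
con.commit()

# Read only the two requested columns into the data frame emp
emp = pd.read_sql("SELECT empid, Doj FROM employee", con)
print(emp)

con.close()
```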
Example 2-
To retrieve all the tables from database sachin into data frame emp.
provide core python based sql expressions and object oriented python
following library:
In Above program-
Example 2-