Advanced Python Unit5 Pandas
Pandas: Pandas is a Python library used for working with data sets. It has functions for analyzing, cleaning, exploring, and manipulating
data. The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
Why Use Pandas? Pandas lets us analyze large data sets and draw conclusions based on statistical theory. Pandas can clean messy data
sets and make them readable and relevant. Relevant data is very important in data science.
What Can Pandas Do? Pandas gives you answers about the data, such as:
• Is there a correlation between two or more columns?
• What is the average value?
• What is the maximum value?
• What is the minimum value?
Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
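The questions above can be tried on a tiny table. The following is a minimal sketch, not part of the original notebook: the DataFrame `sample` and its column names `hours` and `score` are made-up assumptions used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
sample = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, np.nan],
    "score": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Cleaning: drop every row that contains an empty (NaN) value.
cleaned = sample.dropna()

# Correlation between two columns of the cleaned data.
corr = cleaned["hours"].corr(cleaned["score"])

print(cleaned)
print("correlation:", corr)
print("average:", cleaned["score"].mean())
print("max:", cleaned["score"].max())
print("min:", cleaned["score"].min())
```

dropna() returns a new DataFrame here, so `sample` itself is left unchanged.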
Installation of Pandas:
If Python and pip are already installed on the system, installing Pandas is easy. Install it using this command: pip install
pandas. If this command fails, use a Python distribution that already has Pandas installed, such as Anaconda (which also ships the Spyder IDE).
In [1]:
pip install pandas
import pandas as pd
Pandas Series: A Series is like a column in a table. It is a one-dimensional array that can hold data of any type.
localhost:8888/nbconvert/html/Advanced_Python_Unit5_Pandas.ipynb?download=false 1/24
6/8/23, 2:54 PM Advanced_Python_Unit5_Pandas
In [2]:
# create a simple pandas series from a list
import pandas as pd
a=[10,'ab',30,'COMPUTER']
mycol=pd.Series(a)
print(mycol)
0 10
1 ab
2 30
3 COMPUTER
dtype: object
Labels: A label works like an index. If no index is passed, the elements get the default index, which runs from 0 to (number_of_elements-1).
A label can be used to access any element of the series.
In [3]:
#return the element of series using index(label)
print(mycol[1])
ab
In [4]:
#creating your own index
import pandas as pd
b=[15,25,35]
s=pd.Series(b,index=['x','y','z'])
print(s)
x 15
y 25
z 35
dtype: int64
In [5]:
print(s['y'])
print(s.iloc[1])#positional access; s[1] on a labelled series is deprecated in recent pandas
25
25
In [6]:
#creating series through an array
import pandas as pd
import numpy as np
data=np.array([5,6,7,8])
ds=pd.Series(data,index=[10,20,30,40])
print(ds)
10 5
20 6
30 7
40 8
dtype: int32
In [7]:
#creating a series through a dictionary
data_dict={'a':10,'b':20,'c':30}#avoid the name dict, which shadows the built-in type
ds1=pd.Series(data_dict)
print(ds1)
a 10
b 20
c 30
dtype: int64
In [8]:
#creating a series from a dictionary with an explicit index
#an index label that is not a key of the dictionary (here 1) gets NaN
data_dict={'a':10,'b':20,'c':30}
ds2=pd.Series(data_dict,index=['a','b',1])
print(ds2)
a 10.0
b 20.0
1 NaN
dtype: float64
In [9]:
#creating a series with repeated index labels
#index labels do not have to be unique
data_dict={'a':10,'b':20,'c':30}
ds3=pd.Series(data_dict,index=['a','b','c','a','b','c'])
print(ds3)
a 10
b 20
c 30
a 10
b 20
c 30
dtype: int64
In [10]:
#creating series of a particular scalar value with your own index
ds4=pd.Series(20,index=[10,20,30,40,50])
print(ds4)
10 20
20 20
30 20
40 20
50 20
dtype: int64
In [11]:
#creating series using numpy functions
ds5=pd.Series(np.ones(10))
print(ds5)
ds6=pd.Series(np.arange(12,25))
print(ds6)
0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 1.0
7 1.0
8 1.0
9 1.0
dtype: float64
0 12
1 13
2 14
3 15
4 16
5 17
6 18
7 19
8 20
9 21
10 22
11 23
12 24
dtype: int32
DataFrames: Data sets in Pandas are usually multi-dimensional tables, called DataFrames. A Series is like a column; a DataFrame is the whole
table. A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a table with rows and columns. A DataFrame is
heterogeneous, i.e. its columns can hold values of different types.
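To see what "different types of values" means in practice, the sketch below builds a small DataFrame and prints the type of each column. The name `mixed` and the fractional marks are illustrative assumptions, not data from the notebook.

```python
import pandas as pd

# Hypothetical table: a text column, an integer column and a float column.
mixed = pd.DataFrame({
    "Names": ["abhishek", "harsh"],
    "Rollno": [100, 101],
    "Marks": [90.5, 88.0],
})

print(mixed.dtypes)  # one dtype per column
print(mixed.shape)   # (number of rows, number of columns)
```

Each column is a Series with its own dtype, which is why a single table can mix text and numbers.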
In [12]:
#creating a dataframe from a dictionary of three lists:
import pandas as pd
data = {
"Names" :['abhishek','harsh','nikhil'],
"Rollno":[100,101,102],
'Marks':[90,88,85]
}
df=pd.DataFrame(data)
print(df)
In [13]:
#creating a dataframe from a dictionary of three lists, with your own index
import pandas as pd
data = {
"Names" :['abhishek','harsh','nikhil'],
"Rollno":[100,101,102],
'Marks':[90,88,85]
}
df1=pd.DataFrame(data,index=['st1','st2','st3'])
print(df1)
In [14]:
# loc attribute is used for Locating a row in dataframe
#the output is a series of a particular row
print(df1.loc['st1'])
Names abhishek
Rollno 100
Marks 90
Name: st1, dtype: object
In [15]:
print(df1['Names'])
#print(df1['st1']) --> raises KeyError: [] selects columns, so use df1.loc['st1'] for a row
st1 abhishek
st2 harsh
st3 nikhil
Name: Names, dtype: object
In [16]:
print(df1.loc[['st1','st2']])
In [17]:
#creating dataframes using numpy arrays
import numpy as np
import pandas as pd
npar=np.array([['abc','xyz','uvw'],[101,102,103],[90,88,85]])
df2=pd.DataFrame(npar,columns=['ID1','ID2','ID3'],index=['Names','Rollno','Marks'])
print(df2)
In [18]:
#creating a dataframe using a dictionary of numpy arrays
import numpy as np
import pandas as pd
npar=np.array([['abc','xyz','uvw'],[101,102,103],[90,88,85]])
np_dict={'Names':npar[0],'RollNo':npar[1],'Marks':npar[2]}
df3=pd.DataFrame(np_dict)
print(df3)
In [19]:
#creating DF using list of lists
import pandas as pd
list_of_list=[['abc',101,90],['xyz',102,88],['uvw',103,85]]
df4=pd.DataFrame(list_of_list,columns=['Names','Rollno','Marks'])
print(df4)
In [20]:
#creating DF using list of Dictionary.
import pandas as pd
list_of_dict=[{'Names':'abc','Rollno':101,'Marks':90},
{'Names':'xyz','Rollno':102,'Marks':88},
{'Names':'uvw','Rollno':103,'Marks':85}]
df5=pd.DataFrame(list_of_dict)
print(df5)
In [21]:
#creating DF using dictionary of series.
s1=pd.Series(['abc','xyz','uvw'])
s2=pd.Series([101,102,103])
s3=pd.Series([90,88,85])
dict_of_series={'Names':s1,'Rollno':s2,'marks':s3}
df6=pd.DataFrame(dict_of_series)
print(df6)
In [22]:
#accessing index of a Dataframes
print(df6.index)
print(df1.index)
In [23]:
#Accessing Columns of a Data Frame
print(df6.columns)
In [24]:
#indexing and slicing is allowed on col header
print(df6['Names'])
print(df6['Names'][1])
print(df6['Names'][0:3:2])
0 abc
1 xyz
2 uvw
Name: Names, dtype: object
xyz
0 abc
2 uvw
Name: Names, dtype: object
In [25]:
#inserting a new col in existing DF
#new col will be added as right most col
df6['Address']=['noida','delhi','gzb']
print(df6)
In [26]:
df6['pra_status']=df6['marks']>80
print(df6)
In [27]:
#insertion of new column at desired location in a dataframe
df6.insert(1,'Section',['A','B','A'])#insert(location,col_name,values)
print(df6)
In [28]:
#dropping or deleting any col from a DF
#del keyword
del df6['pra_status']
print(df6)
In [29]:
#dropping a col with pop
#The pop() method removes the specified column from the DataFrame.
#The pop() method returns the removed columns as a Pandas Series object.
df7=df6.pop('Section')
print(df6)
print(df7)
DROP:
The drop() method removes the specified row or column.
By specifying the column axis (axis='columns'), the drop() method removes the specified column.
By specifying the row axis (axis='index'), the drop() method removes the specified row.
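The two axis values can be compared side by side. This is a minimal sketch on a throwaway DataFrame; the name `demo` is an assumption and is unrelated to the df6 used in the cells of this section.

```python
import pandas as pd

demo = pd.DataFrame({
    "Names": ["abc", "xyz", "uvw"],
    "Rollno": [101, 102, 103],
    "Marks": [90, 88, 85],
})

# axis='columns' removes a column; the result is a new DataFrame.
without_marks = demo.drop("Marks", axis="columns")

# axis='index' removes a row, identified by its index label.
without_first_row = demo.drop(0, axis="index")

print(without_marks)
print(without_first_row)
print(demo)  # unchanged, because inplace=True was not used
```

Without inplace=True, drop() leaves the original DataFrame untouched and returns the modified copy.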
In [30]:
#dropping a col or row from DF using drop()
print(df6)
df6.drop('marks', axis=1,inplace=True)
print(df6)
In [31]:
#dropping a row
df6.drop(0,inplace=True)
print(df6)
In [32]:
#resetting index to default
df6.reset_index(drop=True,inplace=True)
print(df6)
In [33]:
#renaming a col of a DF
df6.rename(columns={'Names':'St_Name'})
In [34]:
print(df6)
In [35]:
#rename() returns a new DF; for a permanent change set inplace=True (or assign the result back)
df6.rename(columns={'Names':'St_Name'},inplace=True)
print(df6)
In [36]:
#creating a DF using numpy random
import pandas as pd
import numpy as np
dfr=pd.DataFrame(np.random.rand(250,5))
print(dfr)#it will print first and last five rows of DF
#print(dfr.to_string())#to_string method we can display complete DF
0 1 2 3 4
0 0.367943 0.821273 0.946864 0.072266 0.204492
1 0.170343 0.338376 0.030022 0.587543 0.271135
2 0.747926 0.305134 0.307875 0.619911 0.487226
3 0.203329 0.195697 0.029494 0.658130 0.124452
4 0.823576 0.944135 0.882877 0.233529 0.783012
.. ... ... ... ... ...
245 0.641040 0.193620 0.993376 0.606001 0.518447
246 0.228899 0.918485 0.972524 0.009383 0.074809
247 0.921852 0.555687 0.161811 0.380073 0.098612
248 0.001418 0.463014 0.465351 0.018872 0.421695
249 0.710599 0.584821 0.290798 0.594491 0.747483
In [37]:
dfr.head()#it will provide first five rows
Out[37]: 0 1 2 3 4
In [38]:
dfr.tail()#it will provide last five rows
Out[38]: 0 1 2 3 4
In [39]:
print(dfr.head(10))#for first 10 rows
print(dfr.tail(15))#for last 15 rows
0 1 2 3 4
0 0.367943 0.821273 0.946864 0.072266 0.204492
1 0.170343 0.338376 0.030022 0.587543 0.271135
2 0.747926 0.305134 0.307875 0.619911 0.487226
3 0.203329 0.195697 0.029494 0.658130 0.124452
4 0.823576 0.944135 0.882877 0.233529 0.783012
5 0.096257 0.105003 0.361212 0.891744 0.662168
6 0.141845 0.001445 0.731281 0.683104 0.680288
7 0.111243 0.453738 0.489484 0.382065 0.343865
8 0.669937 0.198583 0.782764 0.621946 0.115719
9 0.225105 0.541675 0.419373 0.123157 0.738246
0 1 2 3 4
235 0.947489 0.402227 0.854571 0.966971 0.454158
236 0.245189 0.816329 0.252606 0.794678 0.418327
237 0.272883 0.235026 0.198268 0.806964 0.571526
238 0.849212 0.561047 0.540024 0.758774 0.304188
239 0.168498 0.327868 0.324537 0.939529 0.314467
240 0.814759 0.496979 0.047802 0.245085 0.077342
241 0.987266 0.755909 0.662925 0.214623 0.384270
242 0.750276 0.061821 0.447048 0.527276 0.144031
In [40]:
print(df5)
In [41]:
df5.index=['st1','st2','st3']
print(df5)
In [42]:
print(df5.loc['st1','Rollno'])# loc: provides data by using labels
101
In [43]:
print(df5.iloc[0,1])#iloc: provides data by using integer positions
101
In [44]:
df5.iloc[:2,:2]
In [48]:
#writing on a csv file
df5.to_csv(r"C:\Users\callage\Downloads\cs.csv")
#provide path where you want to save the data frame file
In [50]:
#writing on an excel file
df5.to_excel(r"C:\Users\callage\Downloads\login.xlsx")
In [57]:
#reading a csv file
#note: read_csv() expects a text file; pointing it at login.xlsx (a binary
#Excel file) raises UnicodeDecodeError, so read the CSV written above instead
dfcs=pd.read_csv(r"C:\Users\callage\Downloads\cs.csv")
print(dfcs)
print(dfcs.to_string())
#to read the Excel file, use pd.read_excel(r"C:\Users\callage\Downloads\login.xlsx")
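The write and read steps can also be tried without a fixed Windows path. The sketch below is an illustration under assumptions: the DataFrame `frame` is made up, and the file is written to a temporary directory instead of the Downloads folder used in the cells of this section.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame(
    {"Names": ["abc", "xyz"], "Marks": [90, 88]},
    index=["st1", "st2"],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cs.csv")
    frame.to_csv(path)                     # the index is written as the first column
    back = pd.read_csv(path, index_col=0)  # read that first column back as the index

print(back)
```

Passing index_col=0 stops the saved index from coming back as an extra unnamed column.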
In [ ]:
pip install --upgrade pandas
Aggregation: An aggregation function computes a summary value across one or more columns. Frequently used functions: sum, min, max, mean,
count.
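The aggregation functions can be tried without any data file. This is a minimal sketch on a made-up table; the DataFrame `players` and its values are assumptions that only mimic the kind of numeric columns found in a player data set.

```python
import pandas as pd

# Hypothetical numeric data for illustration.
players = pd.DataFrame({
    "Age": [25, 30, 22],
    "Salary": [1000, 3000, 2000],
})

# Passing function names as strings applies each function to every column.
summary = players.agg(["sum", "min", "max", "mean", "count"])
print(summary)

# A single column can be aggregated directly.
print(players["Salary"].sum())
```

The result of agg() with a list is a DataFrame with one row per function and one column per original column.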
In [ ]:
dfc1=pd.read_csv(r"C:\Users\Admin\Desktop\nba.csv")
In [ ]:
print(dfc1)
In [ ]:
print(dfc1.Weight.count())
print(dfc1.Salary.count())
print(dfc1.Salary.sum())
print(dfc1.Salary.max())
print(dfc1.Salary.min())
In [ ]:
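#aggregate(): it is used to apply more than one function to the desired columns
#function names are passed as strings; only numeric columns are selected here
print(dfc1[['Number','Age','Weight','Salary']].aggregate(['sum','max','min','mean','count']))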
In [ ]:
#applying aggregate on selected columns and selected functions
print(dfc1.aggregate({'Number':['min','max'],'Age':['max', 'mean'],'Salary':['sum','count']}))
In [ ]:
#named aggregation: each keyword names an output row as (column, function)
print(dfc1.aggregate(x=('Number','min'),y=('Age','max'),z=('Salary','sum')))
In [ ]:
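#agg(): is an alias of aggregate(); axis='columns' applies the function row-wise
#select_dtypes(include='number') keeps only numeric columns so that 'mean' is defined
dfc1.select_dtypes(include='number').agg('mean', axis='columns')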
In [ ]:
df=pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]],columns=['A','B','C'])
print(df)
In [ ]:
print(df.agg('mean', axis='columns'))# will provide value of aggfun rowwise
print(df.agg('mean'))# will provide value of aggfun colwise
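Grouping: groupby() is used to perform grouping. By "group by" we are referring to a process involving one or more of the following steps: splitting the data into groups based on some criteria, applying a function to each group independently, and combining the results into a data structure.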
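Grouping is commonly described as split, apply, combine. The sketch below spells the three steps out by hand and then compares them with the one-line groupby() form. The DataFrame `sales` and its values are made-up assumptions for illustration.

```python
import pandas as pd

sales = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1, 2, 3, 4],
})

# Split: one sub-DataFrame per distinct value of column A.
pieces = {key: group for key, group in sales.groupby("A")}

# Apply: compute a sum for each piece independently.
totals = {key: group["C"].sum() for key, group in pieces.items()}

# Combine: collect the per-group results into one Series.
combined = pd.Series(totals)

# groupby() does all three steps in a single expression.
direct = sales.groupby("A")["C"].sum()

print(combined)
print(direct)
```

Both results hold the same totals per group; the manual version is only meant to make the three steps visible.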
In [ ]:
dfg=pd.DataFrame({
'A':['foo','bar','foo','bar','bar','foo','foo','bar','foo'],
'B':['one','one','one','two','three','two','three','one','two'],
'C':np.random.randint(2,9,size=(9,)),
'D':np.random.randint(10,16,size=(9,))
})
print(dfg)
In [ ]:
#groupby(): is used to create groups among entries of DF based on values in a col
g=dfg.groupby('A')
print(g)
In [ ]:
print(g.groups)# to check the groups inside groupby data frame
In [ ]:
#iterating through the groups df
for i,j in g:
    print(i)
    print(j)
In [ ]:
g1=dfg.groupby('B')
In [ ]:
print(g1.groups)
In [ ]:
for i,j in g1:
    print(i)
    print(j)
In [ ]:
#applying aggregation function on groups
print(g.sum())
print(g1.sum())
print(g.max())
print(g1.max())
print(g.min())
print(g1.min())
print(g['C'].max())# can perform agg. on a specific col among all the groups
In [ ]:
# Grouping can be done by multiple columns too.
#it forms a hierarchical index
print(dfg)
g2=dfg.groupby(['A','B'])
print(g2.groups)
In [ ]:
for i,j in g2:
    print(i)
    print(j)
In [ ]:
#agg on multiple col groups
print(g2.sum())
In [ ]:
#get_group: is used to access a particular group
g3=g.get_group('foo')
print(g3)
In [ ]:
sts=pd.DataFrame({
'Names':['john','mary','henry','akash','vaibhav'],
'Teams':['alpha','beta','gama','delta','ita']
})
print(sts)
In [ ]:
st=pd.DataFrame({
'Names':['john','mary','akash','vaibhav','tony','harsh'],
'Marks':[56,78,89,67,59,75]
})
print(st)
Types of Merging:
• inner join
• outer join
• left join
• right join
In [ ]:
#inner join: is like the intersection of dataframes based on a column
#merge(): function is used for merging dfs
rst=pd.merge(sts,st)#by default an inner join on the columns common to both dfs (here 'Names')
print(sts)
print(st)
print(rst)
In [ ]:
rst=pd.merge(sts,st,on='Names',how='inner')
print(rst)
In [ ]:
#outer join: is like union operation
rst=pd.merge(sts,st,on='Names',how='outer')
print(rst)
In [ ]:
#left join: corresponding to the left dataframe entries
rst=pd.merge(sts,st,on='Names',how='left')
print(rst)
In [ ]:
#right join: corresponding to the right dataframe entries
rst=pd.merge(sts,st,on='Names',how='right')
print(rst)
In [ ]:
#using the indicator parameter in merge: when indicator is set to True,
#a _merge column is added showing the origin of each row (left_only, right_only or both)
#outer join: is like union operation with indicator
rst=pd.merge(sts,st,on='Names',how='outer',indicator=True)
print(rst)
In [ ]:
#suffix parameter of merge function
sts['Marks']=[50,55,43,48,67]
print(sts)
In [ ]:
#outer join: is like union operation
#both DF has common column
rst=pd.merge(sts,st,on='Names',how='outer')
print(rst)
In [ ]:
#suffix: it will provide name to duplicate cols of DFs
#both DF has common column
rst=pd.merge(sts,st,on='Names',how='outer',suffixes=('_sts','_st'))
print(rst)
In [ ]:
#data Manipulation
#loc and iloc
dfc=pd.read_csv(r'C:\Users\Admin\Desktop\nba.csv')
print(dfc)
In [ ]:
#iloc: it is index(int) based location finder
#select rows and cols based on default index
dfc.iloc[3] #out put in form of a data series
In [ ]:
dfc.iloc[3]['Team'] #labeling of series generated by iloc
In [ ]:
#iloc: can access more than one row at a time
dfc.iloc[[2,4,5,6]]
In [ ]:
dfc1=dfc.set_index('Name')
dfc1.head(10)
In [ ]:
dfc1.iloc[3] #it will still work by ignoring column used as label
In [ ]:
#dfc1.iloc['John Holland']
#Cannot index by location index with a non-integer key.
In [ ]:
#slicing: works with iloc
print(dfc.iloc[:4])
In [ ]:
print(dfc.iloc[:5,1:4])
In [ ]:
print(dfc.iloc[:5,[4,5]])
In [ ]:
#loc property: it selects data by label (here the user-defined index set with set_index)
#it accesses a group of rows and columns by their labels
print(dfc1.head(10))
In [ ]:
dfc1.loc['John Holland']
In [ ]:
dfc1.loc[['John Holland'],['Team','Number']]
# for accessing multiple col values of a located row
In [ ]:
dfc.loc[2] #it will also work on integer index of non labeled dfs
loc vs iloc:
loc                                        | iloc
It is label based.                         | It is integer-position based.
In slicing, loc includes the end point.    | iloc does not include the end point.
It uses both integer and label indexes.    | It uses only integer indexes.
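The end point difference between loc and iloc is easy to check on a small table. This is a minimal sketch; the DataFrame `table` and its labels are assumptions made up for illustration.

```python
import pandas as pd

table = pd.DataFrame(
    {"Marks": [90, 88, 85, 70]},
    index=["st1", "st2", "st3", "st4"],
)

# loc slices by label and includes the end label 'st3'.
by_label = table.loc["st1":"st3"]

# iloc slices by integer position and excludes position 2.
by_position = table.iloc[0:2]

print(by_label)     # rows st1, st2 and st3
print(by_position)  # rows st1 and st2
```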
In [ ]:
#slicing with loc
dfc1.loc['John Holland':'Jordan Mickey'] #end point is inclusive
In [ ]:
dfc1.Salary>2500000 #boolean series
In [ ]:
#all the data of employees having salary greater than 2500000
dfc1.loc[dfc1.Salary>2500000]
In [ ]:
#All the data of employees having salary greater than 2500000
#and weight greater than 200
dfc1.loc[(dfc1.Salary>2500000)&(dfc1.Weight>200)]
In [ ]:
#Adding new rows in existing DF using loc
print(dfc)
dfc.loc[458]=[1,2,3,4,5,6,7,8,9]
print(dfc)
In [ ]:
#Apply Function on DFs
students=pd.DataFrame({
'Name':['ram','shyam','raju','sonu','golu','harsh','nikhil','abhi'],
'Marks1':[50,45,65,76,43,89,64,93],
'Marks2':[30,40,50,60,70,80,90,20]
})
print(students)
In [ ]:
#Apply a function to check
#whether a student has passed or failed according to Marks1
#pass the function name to apply() and set axis=1 so that it is called once per row
def check(row):
    if row['Marks1']>=50:
        return 'Pass'
    else:
        return 'Fail'
students['Result']=students.apply(check,axis=1)
print(students)
In [ ]:
#check whether the names in a DF belong to a list or not
def names(row):
    if row['Name'] in ['ram','raju','sonu','nikhil','abhi']:
        return 'yes'
    else:
        return 'no'
students['flag']=students.apply(names,axis=1)
print(students)
In [ ]:
#Insert a col at the end holding the grade of each student according to their marks
def grade(row):
    n=row['Marks1']+row['Marks2']
    p=(n*100)/200 #percentage out of a total of 200 marks
    if p>=90:
        return 'A+'
    elif p>=80:
        return 'A'
    elif p>=70:
        return 'B+'
    elif p>=60:
        return 'B'
    elif p>=50:
        return 'C'
    elif p>=40:
        return 'D'
    else:
        return 'FAIL'
students['Grade']=students.apply(grade,axis=1)
print(students)
In [ ]:
students['Address']=['noida','delhi','gzb',np.nan,np.nan,'delhi','gzb',np.nan]
print(students)
In [ ]:
#isnull(): it will return true if value contained is null
students.isnull()
In [ ]:
#notnull(): reverse of isnull
students.notnull()
In [ ]:
#fillna(): is used to write a value in place of null
#assigning the result back avoids calling inplace=True on a column selection
students['Address']=students['Address'].fillna(value='Unknown')
print(students)
In [ ]:
#describe(): statistical data table
students.describe()
In [ ]:
#End of pandas
#https://pandas.pydata.org/pandas-docs