Advanced Python Unit5 Pandas

Pandas is a Python library used for working with and analyzing datasets. It allows users to clean, manipulate, and explore data efficiently. Pandas provides functions for tasks like handling missing data, computing summary statistics, and correlating between variables. Data is stored in Pandas using Series (one-dimensional) and DataFrames (two-dimensional, like a spreadsheet). Series are like columns of data and DataFrames can contain multiple Series. Users can create Series and DataFrames from lists, NumPy arrays, and dictionaries to work with their data in Pandas.

Uploaded by

anupamtripathi3656


6/8/23, 2:54 PM Advanced_Python_Unit5_Pandas

Pandas: Pandas is a Python library for working with data sets. It has functions for analyzing, cleaning, exploring, and manipulating
data. The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.

Why Use Pandas? Pandas lets us analyze large data sets and draw conclusions based on statistical theory. Pandas can clean messy data
sets and make them readable and relevant. Relevant data is very important in data science.

What Can Pandas Do? Pandas gives you answers about the data, such as:
• Is there a correlation between two or more columns?
• What is the average value?
• What is the maximum value?
• What is the minimum value?
Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
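
The questions above can be sketched in a few lines. This is only an illustration: the column names `hours` and `score` and their values are made up here and do not come from the notebook's data.

```python
import pandas as pd
import numpy as np

# Hypothetical sample data; the last row has a missing value
sample = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, np.nan],
    "score": [10.0, 20.0, 30.0, 40.0, 50.0],
})

cleaned = sample.dropna()                        # cleaning: drop rows with empty (NaN) values
corr = cleaned["hours"].corr(cleaned["score"])   # correlation between two columns
avg = cleaned["score"].mean()                    # average value
hi, lo = cleaned["score"].max(), cleaned["score"].min()
print(len(cleaned), corr, avg, hi, lo)
```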

Installation of Pandas:

If Python and pip are already installed on the system, installing Pandas is easy. Install it using this command: pip install
pandas. If this command fails, use a Python distribution that already includes Pandas, such as Anaconda (which also ships the Spyder IDE).

In [1]:
pip install pandas

Requirement already satisfied: pandas in c:\programdata\anaconda3\lib\site-packages (1.3.4)

Requirement already satisfied: python-dateutil>=2.7.3 in c:\programdata\anaconda3\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: numpy>=1.17.3 in c:\programdata\anaconda3\lib\site-packages (from pandas) (1.20.3)
Requirement already satisfied: pytz>=2017.3 in c:\programdata\anaconda3\lib\site-packages (from pandas) (2021.3)
Requirement already satisfied: six>=1.5 in c:\programdata\anaconda3\lib\site-packages (from python-dateutil>=2.7.3->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
WARNING: Ignoring invalid distribution -andas (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution -andas (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution -andas (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution -andas (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution -andas (c:\programdata\anaconda3\lib\site-packages)
WARNING: Ignoring invalid distribution -andas (c:\programdata\anaconda3\lib\site-packages)
Import Pandas: Once Pandas is installed, import it in your application with the import keyword: import pandas. Pandas is now
imported and ready to use. By convention, it is imported under the alias pd:

import pandas as pd

Pandas Series: A Series is like a column in a table. It is a one-dimensional array that can hold data of any type.


In [2]:
# create a simple pandas series from a list
import pandas as pd
a=[10,'ab',30,'COMPUTER']
mycol=pd.Series(a)
print(mycol)

0 10
1 ab
2 30
3 COMPUTER
dtype: object
Labels: A label works like an index. If no index is passed, the elements get the default index, which runs from 0 to (number_of_elements-1).
A label can be used to access any element of the Series.

In [3]:
#return the element of series using index(label)
print(mycol[1])

In [4]:
#creating your own index
import pandas as pd
b=[15,25,35]
s=pd.Series(b,index=['x','y','z'])
print(s)

x 15
y 25
z 35
dtype: int64

In [5]:
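print(s['y'])#access by label
print(s.iloc[1])#access by position; s[1] on a Series with string labels is deprecated in recent pandas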

25
25

In [6]:
#creating series through an array
import pandas as pd
import numpy as np
data=np.array([5,6,7,8])
ds=pd.Series(data,index=[10,20,30,40])
print(ds)

10 5
20 6
30 7
40 8
dtype: int32

In [7]:
#creating a series through a dictionary
data_dict={'a':10,'b':20,'c':30}#avoid naming a variable dict: it shadows the built-in type
ds1=pd.Series(data_dict)
print(ds1)

a 10
b 20
c 30
dtype: int64

In [8]:
#creating series through dictionary with an explicit index
#only the listed labels are kept; a label missing from the dictionary (here 1) gets NaN
data_dict={'a':10,'b':20,'c':30}
ds2=pd.Series(data_dict,index=['a','b',1])
print(ds2)

a 10.0
b 20.0
1 NaN
dtype: float64

In [9]:
#creating series with repeated index
#index labels do not have to be unique
data_dict={'a':10,'b':20,'c':30}
ds3=pd.Series(data_dict,index=['a','b','c','a','b','c'])
print(ds3)

a 10
b 20
c 30
a 10
b 20
c 30
dtype: int64

In [10]:
#creating series of a particular scalar value with your own index
ds4=pd.Series(20,index=[10,20,30,40,50])
print(ds4)

10 20
20 20
30 20
40 20
50 20
dtype: int64

In [11]:
#creating series using numpy functions
ds5=pd.Series(np.ones(10))
print(ds5)
ds6=pd.Series(np.arange(12,25))
print(ds6)

0 1.0
1 1.0
2 1.0
3 1.0
4 1.0
5 1.0
6 1.0
7 1.0
8 1.0
9 1.0
dtype: float64
0 12
1 13
2 14
3 15
4 16
5 17
6 18
7 19
8 20
9 21
10 22
11 23
12 24
dtype: int32

DataFrames: Data sets in Pandas are usually stored as two-dimensional tables called DataFrames. If a Series is like a column, a DataFrame is the whole
table. A Pandas DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a table with rows and columns. A DataFrame is
heterogeneous, i.e. its columns can hold values of different types.
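
As a small sketch of what "heterogeneous" means in practice, each column below keeps its own dtype, and selecting one column gives back a Series. The `Percent` column is a hypothetical addition used only for this illustration.

```python
import pandas as pd

# Each column is a Series with its own dtype
frame = pd.DataFrame({
    "Names": ["abhishek", "harsh"],
    "Rollno": [100, 101],
    "Percent": [90.5, 88.0],   # hypothetical column, for illustration only
})

col = frame["Rollno"]          # selecting one column returns a Series
print(type(col).__name__)
print(frame.dtypes)            # one dtype per column
```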

In [12]:
#creating a dataframe from three series(list)/ dictionary list:
import pandas as pd
data = {
"Names" :['abhishek','harsh','nikhil'],
"Rollno":[100,101,102],
'Marks':[90,88,85]
}
df=pd.DataFrame(data)
print(df)

Names Rollno Marks

0 abhishek 100 90
1 harsh 101 88
2 nikhil 102 85

In [13]:
#creating a dataframe from a dictionary of three lists, with your own index
import pandas as pd
data = {
"Names" :['abhishek','harsh','nikhil'],
"Rollno":[100,101,102],
'Marks':[90,88,85]
}
df1=pd.DataFrame(data,index=['st1','st2','st3'])
print(df1)

Names Rollno Marks

st1 abhishek 100 90
st2 harsh 101 88
st3 nikhil 102 85

In [14]:
# loc attribute is used for Locating a row in dataframe
#the output is a series of a particular row
print(df1.loc['st1'])

Names abhishek
Rollno 100
Marks 90
Name: st1, dtype: object

In [15]:
print(df1['Names'])
#print(df1['st1'])--> not allowed: [] selects columns, so a row label raises KeyError; use df1.loc['st1']

st1 abhishek
st2 harsh
st3 nikhil
Name: Names, dtype: object

In [16]:
print(df1.loc[['st1','st2']])

Names Rollno Marks

st1 abhishek 100 90
st2 harsh 101 88

In [17]:
#creating dataframes using numpy arrays
#note: a numpy array has a single dtype, so mixing text and numbers stores every value as a string
import numpy as np
import pandas as pd
npar=np.array([['abc','xyz','uvw'],[101,102,103],[90,88,85]])
df2=pd.DataFrame(npar,columns=['ID1','ID2','ID3'],index=['Names','Rollno','Marks'])
print(df2)

ID1 ID2 ID3

Names abc xyz uvw
Rollno 101 102 103
Marks 90 88 85

In [18]:
#creating dataframe using a dictionary of numpy arrays
import numpy as np
import pandas as pd
npar=np.array([['abc','xyz','uvw'],[101,102,103],[90,88,85]])
np_dict={'Names':npar[0],'RollNo':npar[1],'Marks':npar[2]}
df3=pd.DataFrame(np_dict)
print(df3)

Names RollNo Marks

0 abc 101 90
1 xyz 102 88
2 uvw 103 85

In [19]:
#creating DF using list of lists
import pandas as pd
list_of_list=[['abc',101,90],['xyz',102,88],['uvw',103,85]]
df4=pd.DataFrame(list_of_list,columns=['Names','Rollno','Marks'])
print(df4)

Names Rollno Marks

0 abc 101 90
1 xyz 102 88
2 uvw 103 85

In [20]:
#creating DF using list of Dictionary.
import pandas as pd
list_of_dict=[{'Names':'abc','Rollno':101,'Marks':90},
{'Names':'xyz','Rollno':102,'Marks':88},
{'Names':'uvw','Rollno':103,'Marks':85}]
df5=pd.DataFrame(list_of_dict)
print(df5)

Names Rollno Marks

0 abc 101 90
1 xyz 102 88
2 uvw 103 85

In [21]:
#creating DF using dictionary of series.
s1=pd.Series(['abc','xyz','uvw'])
s2=pd.Series([101,102,103])
s3=pd.Series([90,88,85])
dict_of_series={'Names':s1,'Rollno':s2,'marks':s3}
df6=pd.DataFrame(dict_of_series)
print(df6)

Names Rollno marks

0 abc 101 90
1 xyz 102 88
2 uvw 103 85

In [22]:
#accessing index of a Dataframes
print(df6.index)
print(df1.index)

RangeIndex(start=0, stop=3, step=1)

Index(['st1', 'st2', 'st3'], dtype='object')


In [23]:
#Accessing Columns of a Data Frame
print(df6.columns)

Index(['Names', 'Rollno', 'marks'], dtype='object')

In [24]:
#a selected column is a Series, so indexing and slicing work on it
print(df6['Names'])
print(df6['Names'][1])
print(df6['Names'][0:3:2])

0 abc
1 xyz
2 uvw
Name: Names, dtype: object
xyz
0 abc
2 uvw
Name: Names, dtype: object

In [25]:
#inserting a new col in existing DF
#new col will be added as right most col
df6['Address']=['noida','delhi','gzb']
print(df6)

Names Rollno marks Address

0 abc 101 90 noida
1 xyz 102 88 delhi
2 uvw 103 85 gzb

In [26]:
df6['pra_status']=df6['marks']>80
print(df6)

Names Rollno marks Address pra_status

0 abc 101 90 noida True
1 xyz 102 88 delhi True
2 uvw 103 85 gzb True

In [27]:
#insertion of new column at desired location in a dataframe
df6.insert(1,'Section',['A','B','A'])#insert(location,col_name,values)
print(df6)

Names Section Rollno marks Address pra_status

0 abc A 101 90 noida True
1 xyz B 102 88 delhi True
2 uvw A 103 85 gzb True

In [28]:
#dropping or deleting any col from a DF
#del keyword
del df6['pra_status']
print(df6)

Names Section Rollno marks Address

0 abc A 101 90 noida
1 xyz B 102 88 delhi
2 uvw A 103 85 gzb

In [29]:
#dropping a col with pop
#The pop() method removes the specified column from the DataFrame.
#The pop() method returns the removed columns as a Pandas Series object.
df7=df6.pop('Section')
print(df6)
print(df7)

Names Rollno marks Address

0 abc 101 90 noida
1 xyz 102 88 delhi
2 uvw 103 85 gzb
0 A
1 B
2 A
Name: Section, dtype: object

DROP:
The drop() method removes the specified row or column.

By specifying the column axis (axis='columns'), the drop() method removes the specified column.

By specifying the row axis (axis='index'), the drop() method removes the specified row.
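
A minimal sketch of both axes on a throwaway DataFrame (the name `demo` and its labels are made up for this illustration). Note that `drop()` returns a new DataFrame unless `inplace=True` is passed.

```python
import pandas as pd

demo = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["r1", "r2", "r3"])

no_col = demo.drop("B", axis="columns")   # same as axis=1
no_row = demo.drop("r2", axis="index")    # same as axis=0 (the default)

# drop() returned new DataFrames; demo itself is unchanged
print(no_col.columns.tolist(), no_row.index.tolist(), demo.shape)
```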

In [30]:
#dropping a col or row from DF using drop()
print(df6)
df6.drop('marks', axis=1,inplace=True)
print(df6)

Names Rollno marks Address

0 abc 101 90 noida
1 xyz 102 88 delhi
2 uvw 103 85 gzb
Names Rollno Address
0 abc 101 noida
1 xyz 102 delhi
2 uvw 103 gzb

In [31]:
#dropping a row
df6.drop(0,inplace=True)
print(df6)

Names Rollno Address

1 xyz 102 delhi
2 uvw 103 gzb

In [32]:
#resetting index to default
df6.reset_index(drop=True,inplace=True)
print(df6)

Names Rollno Address

0 xyz 102 delhi
1 uvw 103 gzb

In [33]:
#renaming a col of a DF (rename() returns a renamed copy; df6 itself is unchanged)
df6.rename(columns={'Names':'St_Name'})

Out[33]: St_Name Rollno Address

0 xyz 102 delhi

1 uvw 103 gzb

In [34]:
print(df6)


Names Rollno Address

0 xyz 102 delhi
1 uvw 103 gzb

In [35]:
#to make the change permanent, set inplace=True (or assign the result back to df6)
df6.rename(columns={'Names':'St_Name'},inplace=True)
print(df6)

St_Name Rollno Address

0 xyz 102 delhi
1 uvw 103 gzb

In [36]:
#creating a DF using numpy random
import pandas as pd
import numpy as np
dfr=pd.DataFrame(np.random.rand(250,5))
print(dfr)#a large DF is truncated: only the first and last five rows are printed
#print(dfr.to_string())#to_string() displays the complete DF

[250 rows x 5 columns]

In [37]:
dfr.head()#it will provide first five rows

Out[37]: 0 1 2 3 4

0 0.367943 0.821273 0.946864 0.072266 0.204492

1 0.170343 0.338376 0.030022 0.587543 0.271135

2 0.747926 0.305134 0.307875 0.619911 0.487226

3 0.203329 0.195697 0.029494 0.658130 0.124452

4 0.823576 0.944135 0.882877 0.233529 0.783012

In [38]:
dfr.tail()#it will provide last five rows

Out[38]: 0 1 2 3 4

245 0.641040 0.193620 0.993376 0.606001 0.518447

246 0.228899 0.918485 0.972524 0.009383 0.074809

247 0.921852 0.555687 0.161811 0.380073 0.098612

248 0.001418 0.463014 0.465351 0.018872 0.421695

249 0.710599 0.584821 0.290798 0.594491 0.747483

In [39]:
print(dfr.head(10))#for first 10 rows
print(dfr.tail(15))#for last 15 rows

0 1 2 3 4
0 0.367943 0.821273 0.946864 0.072266 0.204492
1 0.170343 0.338376 0.030022 0.587543 0.271135
2 0.747926 0.305134 0.307875 0.619911 0.487226
3 0.203329 0.195697 0.029494 0.658130 0.124452
4 0.823576 0.944135 0.882877 0.233529 0.783012
5 0.096257 0.105003 0.361212 0.891744 0.662168
6 0.141845 0.001445 0.731281 0.683104 0.680288
7 0.111243 0.453738 0.489484 0.382065 0.343865
8 0.669937 0.198583 0.782764 0.621946 0.115719
9 0.225105 0.541675 0.419373 0.123157 0.738246
0 1 2 3 4
235 0.947489 0.402227 0.854571 0.966971 0.454158
236 0.245189 0.816329 0.252606 0.794678 0.418327
237 0.272883 0.235026 0.198268 0.806964 0.571526
238 0.849212 0.561047 0.540024 0.758774 0.304188
239 0.168498 0.327868 0.324537 0.939529 0.314467
240 0.814759 0.496979 0.047802 0.245085 0.077342
241 0.987266 0.755909 0.662925 0.214623 0.384270
242 0.750276 0.061821 0.447048 0.527276 0.144031
243 0.131279 0.069744 0.992107 0.563838 0.461645
244 0.346798 0.246244 0.902342 0.073600 0.842876
245 0.641040 0.193620 0.993376 0.606001 0.518447
246 0.228899 0.918485 0.972524 0.009383 0.074809
247 0.921852 0.555687 0.161811 0.380073 0.098612
248 0.001418 0.463014 0.465351 0.018872 0.421695
249 0.710599 0.584821 0.290798 0.594491 0.747483

In [40]:
print(df5)

Names Rollno Marks

0 abc 101 90
1 xyz 102 88
2 uvw 103 85

In [41]:
df5.index=['st1','st2','st3']
print(df5)

Names Rollno Marks

st1 abc 101 90
st2 xyz 102 88
st3 uvw 103 85

In [42]:
print(df5.loc['st1','Rollno'])# loc: provides data by using labels

101

In [43]:
print(df5.iloc[0,1])#iloc: provides data by using integer positions

101

In [44]:
df5.iloc[:2,:2]

Out[44]: Names Rollno

st1 abc 101

st2 xyz 102

Writing an existing DataFrame to a file (.csv, .xlsx, .json)


In [48]:
#writing on a csv file
df5.to_csv(r"C:\Users\callage\Downloads\cs.csv")
#provide path where you want to save the data frame file

In [50]:
#writing on an excel file
df5.to_excel(r"C:\Users\callage\Downloads\login.xlsx")

Reading csv, xlsx, or json files and converting them into DataFrames

In [54]:
#reading an excel file
dfc=pd.read_excel(r"C:\Users\callage\Downloads\login.xlsx")
print(dfc)

Unnamed: 0 Names Rollno Marks

0 st1 abc 101 90
1 st2 xyz 102 88
2 st3 uvw 103 85
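
The `Unnamed: 0` column in the output above appears because the row labels (`st1`, `st2`, `st3`) were written to the file as an ordinary, unnamed first column. The sketch below reproduces the effect with a CSV round trip through an in-memory buffer; the buffer and the small `df` are assumptions made here to avoid file paths, and the same `index_col` idea applies to `read_excel`.

```python
import io
import pandas as pd

df = pd.DataFrame({"Names": ["abc", "xyz"], "Marks": [90, 88]}, index=["st1", "st2"])

buf = io.StringIO()
df.to_csv(buf)                             # the index is written as an unnamed first column
buf.seek(0)
plain = pd.read_csv(buf)                   # the index comes back as a column called "Unnamed: 0"
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)   # treat the first column as the index again

print(plain.columns.tolist())
print(restored.index.tolist())
```

Alternatively, passing `index=False` when writing leaves the row labels out of the file altogether.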

In [57]:
#reading a csv file
#read_csv() expects a text file: calling it on login.xlsx (a binary Excel file) raises
#UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 14: invalid start byte
#so read the csv file written earlier instead
dfcs=pd.read_csv(r"C:\Users\callage\Downloads\cs.csv")
print(dfcs)
print(dfcs.to_string())

In [ ]:
pip install --upgrade pandas

Aggregation: An aggregation function summarizes the values of one or more columns as a single value. Frequently used functions: sum, min, max, mean,
count.
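
A minimal sketch of these functions on a tiny DataFrame (the name `nums` and its values are made up for illustration, so the result can be checked by hand):

```python
import pandas as pd

nums = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

total = nums["A"].sum()                                    # one function on one column
table = nums.agg(["sum", "min", "max", "mean", "count"])   # several functions on every column
print(total)
print(table)
```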

In [ ]:
dfc1=pd.read_csv(r"C:\Users\Admin\Desktop\nba.csv")

In [ ]:
print(dfc1)

In [ ]:
print(dfc1.Weight.count())
print(dfc1.Salary.count())
print(dfc1.Salary.sum())
print(dfc1.Salary.max())
print(dfc1.Salary.min())

In [ ]:
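#aggregate(): it is used to apply more than one function on the desired cols
#function names are passed as strings; only numeric cols are kept because mean is undefined for text
print(dfc1.select_dtypes('number').aggregate(['sum','max','min','mean','count']))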

In [ ]:
#applying aggregate on selected columns and selected functions
print(dfc1.aggregate({'Number':['min','max'],'Age':['max', 'mean'],'Salary':['sum','count']}))

In [ ]:
#applying aggregate on selected columns and selected functions
print(dfc1.aggregate(x=('Number','min'),y=('Age','max'),z=('Salary','sum')))

In [ ]:
#agg(): can also be used for aggregation on complete DF
#axis='columns' aggregates across each row, so keep only the numeric cols
dfc1.select_dtypes('number').agg('mean', axis='columns')


In [ ]:
df=pd.DataFrame([[1,2,3],[4,5,6],[7,8,9]],columns=['A','B','C'])
print(df)

In [ ]:
print(df.agg('mean', axis='columns'))# will provide value of aggfun rowwise
print(df.agg('mean'))# will provide value of aggfun colwise

Grouping: groupby() is used to perform grouping. By grouping we refer to a process involving one or more of the following steps:

> Splitting the data into groups based on some criteria

> Applying a function to each group independently

> Combining the results into a data structure
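
The three steps can be sketched on a small hand-checkable DataFrame (the names `sales`, `team` and `points` are hypothetical, chosen only for this illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    "team": ["x", "y", "x", "y", "x"],
    "points": [1, 2, 3, 4, 5],
})

grouped = sales.groupby("team")      # split: one group per distinct team value
totals = grouped["points"].sum()     # apply sum to each group, then combine into a Series
print(totals)
```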

In [ ]:
dfg=pd.DataFrame({
'A':['foo','bar','foo','bar','bar','foo','foo','bar','foo'],
'B':['one','one','one','two','three','two','three','one','two'],
'C':np.random.randint(2,9,size=(9,)),
'D':np.random.randint(10,16,size=(9,))
})
print(dfg)

In [ ]:
#groupby(): is used to create groups among entries of DF based on values in a col
g=dfg.groupby('A')
print(g)

In [ ]:
print(g.groups)# to check the groups inside groupby data frame

In [ ]:
#iterating through the groups df
for i,j in g:
    print(i)
    print(j)

In [ ]:
g1=dfg.groupby('B')

In [ ]:
print(g1.groups)


In [ ]:
for i,j in g1:
    print(i)
    print(j)

In [ ]:
#applying aggregation function on groups
print(g.sum())
print(g1.sum())
print(g.max())
print(g1.max())
print(g.min())
print(g1.min())
print(g['C'].max())# can perform agg. on a specific col among all the groups

In [ ]:
# Grouping can be done by multiple columns too.
#it forms a hierarchical index
print(dfg)
g2=dfg.groupby(['A','B'])
print(g2.groups)

In [ ]:
for i,j in g2:
    print(i)
    print(j)

In [ ]:
#agg on multiple col groups
print(g2.sum())

In [ ]:
#get_group: is used to access a particular group
g3=g.get_group('foo')
print(g3)

Merging two Data Frames

In [ ]:
sts=pd.DataFrame({
'Names':['john','mary','henry','akash','vaibhav'],
'Teams':['alpha','beta','gama','delta','ita']
})
print(sts)

In [ ]:
st=pd.DataFrame({
'Names':['john','mary','akash','vaibhav','tony','harsh'],
'Marks':[56,78,89,67,59,75]
})
print(st)

Types of Merging:
• inner join
• outer join
• left join
• right join
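
As a quick sketch of how the four join types differ, the example below merges two tiny DataFrames and records which keys survive each join. The frames `left` and `right` and their column names are made up for this illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "l": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "r": [20, 30, 40]})

# For each join type, collect the keys that appear in the merged result
keys = {
    how: sorted(pd.merge(left, right, on="key", how=how)["key"])
    for how in ("inner", "outer", "left", "right")
}
print(keys)
```

The inner join keeps only keys present in both frames, the outer join keeps every key, and the left and right joins keep every key of the corresponding frame.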

In [ ]:
#inner join : is like intersection of dataframes based on a col
#merge(): function is used for merging dfs
rst=pd.merge(sts,st)#by default an inner join on the columns common to both DFs (here Names)
print(sts)
print(st)
print(rst)

In [ ]:
rst=pd.merge(sts,st,on='Names',how='inner')
print(rst)

In [ ]:
#outer join: is like union operation
rst=pd.merge(sts,st,on='Names',how='outer')
print(rst)

In [ ]:
#left join: corresponding to the left dataframe entries
rst=pd.merge(sts,st,on='Names',how='left')
print(rst)

In [ ]:
#right join: corresponding to the right dataframe entries
rst=pd.merge(sts,st,on='Names',how='right')
print(rst)

In [ ]:
#using indicator parameter in merge: when indicator is set to True,
#the result gets a _merge column showing each row's origin (left_only, right_only or both)
#outer join: is like union operation with indicator
rst=pd.merge(sts,st,on='Names',how='outer',indicator=True)
print(rst)

In [ ]:
#suffix parameter of merge function
sts['Marks']=[50,55,43,48,67]
print(sts)

In [ ]:
#outer join: is like union operation
#both DF has common column
rst=pd.merge(sts,st,on='Names',how='outer')
print(rst)

In [ ]:
#suffix: it will provide name to duplicate cols of DFs
#both DF has common column
rst=pd.merge(sts,st,on='Names',how='outer',suffixes=('_sts','_st'))
print(rst)

In [ ]:
#data Manipulation
#loc and iloc
dfc=pd.read_csv(r'C:\Users\Admin\Desktop\nba.csv')
print(dfc)

In [ ]:
#iloc: integer-position based indexer
#selects rows and cols by position, regardless of the index labels
dfc.iloc[3] #output is a Series holding the fourth row

In [ ]:
dfc.iloc[3]['Team'] #label-based access on the Series returned by iloc

In [ ]:
#iloc: can access more than one row at a time
dfc.iloc[[2,4,5,6]]

In [ ]:
dfc1=dfc.set_index('Name')
dfc1.head(10)

In [ ]:
dfc1.iloc[3] #iloc still works: it uses positions and ignores the Name labels

In [ ]:
#dfc1.iloc['John Holland']
#Cannot index by location index with a non-integer key.

In [ ]:
#slicing: works with iloc
print(dfc.iloc[:4])

In [ ]:
print(dfc.iloc[:5,1:4])

In [ ]:
print(dfc.iloc[:5,[4,5]])

In [ ]:
#loc property: label-based indexer
#accesses a group of rows and cols by their labels
print(dfc1.head(10))

In [ ]:
dfc1.loc['John Holland']

In [ ]:
dfc1.loc[['John Holland'],['Team','Number']]
# for accessing multiple col values of a located row

In [ ]:
dfc.loc[2] #it will also work on integer index of non labeled dfs

loc                                          iloc
It is label based.                           It is integer-position based.
In slicing, loc includes the end point.      In slicing, iloc excludes the end point.
It works with labels (integer or string).    It works only with integer positions.
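
The difference in slicing can be checked on a small Series (the Series `s` below is made up for this illustration):

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

by_label = s.loc["a":"c"]   # label slice: the end point "c" is included
by_pos = s.iloc[0:2]        # position slice: the end position 2 is excluded
print(by_label.tolist(), by_pos.tolist())
```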

In [ ]:
#slicing with loc
dfc1.loc['John Holland':'Jordan Mickey'] #end point is inclusive

In [ ]:
dfc1.Salary>2500000 #boolean series

In [ ]:
#all the data of employees having salary greater than 2500000
dfc1.loc[dfc1.Salary>2500000]

In [ ]:
#All the data of employees having salary greater than 2500000
#and weight greater than 200
dfc1.loc[(dfc1.Salary>2500000)&(dfc1.Weight>200)]

In [ ]:
#Adding new rows in existing DF using loc
print(dfc)
dfc.loc[458]=[1,2,3,4,5,6,7,8,9]
print(dfc)

In [ ]:
#Apply Function on DFs
students=pd.DataFrame({
'Name':['ram','shyam','raju','sonu','golu','harsh','nikhil','abhi'],
'Marks1':[50,45,65,76,43,89,64,93],
'Marks2':[30,40,50,60,70,80,90,20]
})
print(students)

In [ ]:
#Apply a function to check
#if a person is fail or pass according to Marks1 along with axis
#you have to pass function name as a parameter in apply and set axis.
def check(row):
    if row['Marks1']>=50:
        return 'Pass'
    else:
        return 'Fail'
students['Result']=students.apply(check,axis=1)
print(students)

In [ ]:
#check whether names of a DF belongs to a list or not
def names(row):
    if(row['Name'] in ['ram','raju','sonu','nikhil','abhi']):
        return 'yes'
    else:
        return 'no'
students['flag']=students.apply(names,axis=1)
print(students)

In [ ]:
#Insert a col at the end having grades of student according to their marks
def grade(row):
    n=row['Marks1']+row['Marks2']
    p=(n*100)/200
    if p>=90:
        return 'A+'
    elif p>=80:
        return 'A'
    elif p>=70:
        return 'B+'
    elif p>=60:
        return 'B'
    elif p>=50:
        return 'C'
    elif p>=40:
        return 'D'
    else:
        return 'FAIL'

students['Grade']=students.apply(grade,axis=1)
print(students)

In [ ]:
students['Address']=['noida','delhi','gzb',np.nan,np.nan,'delhi','gzb',np.nan]
print(students)

In [ ]:
#isnull(): it will return true if value contained is null
students.isnull()

In [ ]:
#notnull(): reverse of isnull
students.notnull()

In [ ]:
#fillna(): is used to write a value inplace of null
students['Address']=students['Address'].fillna('Unknown')#assign back; inplace=True on a selected column is deprecated in recent pandas
print(students)

In [ ]:
#describe(): statistical data table
students.describe()

In [ ]:
#End of pandas
#https://pandas.pydata.org/pandas-docs

No ratings yet
Pandas
21 pages
Unit 2 Mca275 PPT Part 2
No ratings yet
Unit 2 Mca275 PPT Part 2
33 pages
Oracle Certified Professional Java Programmer OCPJP 1Z0 809
From Everand
Oracle Certified Professional Java Programmer OCPJP 1Z0 809
Manish Soni
No ratings yet
Shakir Hassan - CV
No ratings yet
Shakir Hassan - CV
1 page
Hid Spec v10
No ratings yet
Hid Spec v10
123 pages
ADYPU Online Brochure
No ratings yet
ADYPU Online Brochure
12 pages
Esxcfg-Nics - L: Vmware Compatibility Guide
No ratings yet
Esxcfg-Nics - L: Vmware Compatibility Guide
2 pages
cse
No ratings yet
cse
3 pages
A Detailed Study On Text Mining Techniques
No ratings yet
A Detailed Study On Text Mining Techniques
4 pages
GL_MASS_UPLOAD_USING_SAP_MIGRATION_COCKPIT_1735726653
No ratings yet
GL_MASS_UPLOAD_USING_SAP_MIGRATION_COCKPIT_1735726653
12 pages
1 - Introduction of AI
No ratings yet
1 - Introduction of AI
17 pages
Accenture Common Application and Ms Office Practice Set 2 A04562f7
63% (8)
Accenture Common Application and Ms Office Practice Set 2 A04562f7
4 pages
Damir Vadas, Oracle As I Learned - How To ... Redo Logs Generation
No ratings yet
Damir Vadas, Oracle As I Learned - How To ... Redo Logs Generation
19 pages
SCM COP 2017 Barkman Digitalbusinessplanningstrategy
No ratings yet
SCM COP 2017 Barkman Digitalbusinessplanningstrategy
23 pages
Powered By: /revos - in /revosauto Revos - in /company/revos
No ratings yet
Powered By: /revos - in /revosauto Revos - in /company/revos
2 pages
Diagnosing Computer Systems PC Diagnosing
No ratings yet
Diagnosing Computer Systems PC Diagnosing
4 pages
NICNET
No ratings yet
NICNET
15 pages
Programmable 2-Pll Vcxo Clock Synthesizer With 1.8-V, 2.5-V and 3.3-V Lvcmos Outputs
No ratings yet
Programmable 2-Pll Vcxo Clock Synthesizer With 1.8-V, 2.5-V and 3.3-V Lvcmos Outputs
29 pages
BCA - Guidelines For BCA Project Work - SGVU
No ratings yet
BCA - Guidelines For BCA Project Work - SGVU
7 pages
Ex3700 Um En-1
No ratings yet
Ex3700 Um En-1
47 pages
Air India Training Manual Passage ESS
No ratings yet
Air India Training Manual Passage ESS
16 pages
JD - VMWare Systems Engineer I
No ratings yet
JD - VMWare Systems Engineer I
2 pages
ds1307 Real Time Clock Breakout Board Kit
No ratings yet
ds1307 Real Time Clock Breakout Board Kit
20 pages
Heidenhain TNC 151 AP Service Instructions
No ratings yet
Heidenhain TNC 151 AP Service Instructions
78 pages
280p PDF
0% (1)
280p PDF
2 pages
Iot Intel Xeon d2700 Processors Kontron Solution Brief
No ratings yet
Iot Intel Xeon d2700 Processors Kontron Solution Brief
4 pages
ICSOC 2022 Process Oriented Intents
No ratings yet
ICSOC 2022 Process Oriented Intents
9 pages
1KHW001498he Technical Data ETL600
No ratings yet
1KHW001498he Technical Data ETL600
7 pages
Configure and Deploy Intune MDM
100% (2)
Configure and Deploy Intune MDM
92 pages
Flip Flops
No ratings yet
Flip Flops
12 pages
TR 068
No ratings yet
TR 068
50 pages
Liebert Apm 30 600 KW Brochure English
No ratings yet
Liebert Apm 30 600 KW Brochure English
8 pages
Chapter 2
No ratings yet
Chapter 2
11 pages


Requirement already satisfied: pandas in c:\programdata\anaconda3\lib\site-packages (1.3.4)
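A quick way to confirm that the installation worked is to import the library and print its version. This is a minimal sketch; the variable name `installed_version` is only illustrative, and the version printed on your machine may differ from the 1.3.4 reported above.

```python
import pandas as pd

# pd.__version__ holds the version string of the installed pandas package.
installed_version = pd.__version__
print(installed_version)
```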

Names Rollno Marks

Names Rollno Marks

Names Rollno Marks
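The column headers above look like the output of a DataFrame built from a dictionary of lists. A minimal sketch of that step is given here; the names and roll numbers follow values that appear later in these notes, while the marks are assumed sample values, not the original notebook's data.

```python
import pandas as pd

# Each dictionary key becomes a column; each list supplies that column's rows.
# The Marks values are made up for illustration.
data = {
    "Names": ["abc", "xyz", "uvw"],
    "Rollno": [101, 102, 103],
    "Marks": [78, 85, 91],
}
df = pd.DataFrame(data)
print(df)
```

All lists must have the same length, otherwise pandas raises a ValueError.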

ID1 ID2 ID3

Names RollNo Marks

Names Rollno Marks

Names Rollno Marks

Names Rollno marks

RangeIndex(start=0, stop=3, step=1)

Index(['Names', 'Rollno', 'marks'], dtype='object')
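The two outputs above most likely come from the `index` and `columns` attributes of a three-row DataFrame. A short sketch, using assumed sample values for the marks column:

```python
import pandas as pd

df = pd.DataFrame({
    "Names": ["abc", "xyz", "uvw"],
    "Rollno": [101, 102, 103],
    "marks": [78, 85, 91],  # assumed sample values
})

# With no explicit index, pandas assigns row labels 0, 1, 2 as a RangeIndex.
print(df.index)    # RangeIndex(start=0, stop=3, step=1)

# The column labels are stored in an Index object; the dtype shown in its
# repr can vary between pandas versions.
print(df.columns)
```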

Names Rollno marks Address

Names Rollno marks Address pra_status

Names Section Rollno marks Address pra_status
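The headers above show columns being added one at a time: Address and pra_status at the end, then Section at position 1. One way to get this result is sketched below. The cell values are assumptions for illustration, except the addresses of xyz and uvw, which appear later in these notes.

```python
import pandas as pd

df = pd.DataFrame({
    "Names": ["abc", "xyz", "uvw"],
    "Rollno": [101, 102, 103],
    "marks": [78, 85, 91],  # assumed sample values
})

# Assigning to a new column name appends the column at the right.
df["Address"] = ["noida", "delhi", "gzb"]       # "noida" is an assumed value
df["pra_status"] = ["done", "pending", "done"]  # assumed values

# insert() places a new column at a given position (0-based), here after Names.
df.insert(1, "Section", ["A", "B", "A"])        # assumed values
print(df.columns.tolist())
```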

Names Section Rollno marks Address

Names Rollno marks Address

Names Rollno marks Address

Names Rollno Address

Names Rollno Address
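The shrinking headers above suggest that pra_status, Section and marks were removed in turn. The sketch below shows three common ways to drop a column; which of them the original notebook used is not known, and the cell values are assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "Names": ["abc", "xyz", "uvw"],
    "Section": ["A", "B", "A"],
    "Rollno": [101, 102, 103],
    "marks": [78, 85, 91],
    "Address": ["noida", "delhi", "gzb"],
    "pra_status": ["done", "pending", "done"],
})

# drop() returns a new DataFrame unless the result is assigned back.
df = df.drop(columns=["pra_status"])

# axis=1 means "columns"; this is equivalent to the columns= form.
df = df.drop("Section", axis=1)

# del removes a column in place.
del df["marks"]
print(df.columns.tolist())
```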

Out[33]:
  St_Name  Rollno Address
0     xyz     102   delhi
1     uvw     103     gzb

Names Rollno Address

St_Name Rollno Address
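The change from Names to St_Name in the headers above can be produced with `rename`. The sketch below also shows one possible way to arrive at a two-row result numbered from 0, by dropping the first row and resetting the index; that second step is an assumption about how the output was obtained.

```python
import pandas as pd

df = pd.DataFrame({
    "Names": ["abc", "xyz", "uvw"],
    "Rollno": [101, 102, 103],
    "Address": ["noida", "delhi", "gzb"],  # "noida" is an assumed value
})

# rename() returns a new DataFrame; the original keeps its old column name.
renamed = df.rename(columns={"Names": "St_Name"})

# Drop the row labelled 0, then renumber the remaining rows from 0.
trimmed = renamed.drop(index=0).reset_index(drop=True)
print(trimmed)
```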

[250 rows x 5 columns]

0    0.367943  0.821273  0.946864  0.072266  0.204492
1    0.170343  0.338376  0.030022  0.587543  0.271135
3    0.203329  0.195697  0.029494  0.658130  0.124452
4    0.823576  0.944135  0.882877  0.233529  0.783012
245  0.641040  0.193620  0.993376  0.606001  0.518447
246  0.228899  0.918485  0.972524  0.009383  0.074809
247  0.921852  0.555687  0.161811  0.380073  0.098612
248  0.001418  0.463014  0.465351  0.018872  0.421695
249  0.710599  0.584821  0.290798  0.594491  0.747483

243 0.131279 0.069744 0.992107 0.563838 0.461645
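The numbers above look like a 250 x 5 DataFrame of random values viewed from the top, from the bottom, and as a single random row. A sketch of how such a frame can be built and inspected is given here; the use of NumPy's `default_rng` and the seed are assumptions, so the values will not match the ones printed above.

```python
import numpy as np
import pandas as pd

# 250 rows and 5 columns of uniform random numbers in [0, 1).
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(rng.random((250, 5)))

print(df.head())                         # first 5 rows (labels 0 to 4)
print(df.tail())                         # last 5 rows (labels 245 to 249)
print(df.sample(n=1, random_state=0))    # one randomly chosen row
```

`head()` and `tail()` accept a row count, for example `df.head(10)`.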

Names Rollno Marks

Names Rollno Marks

Out[44]:
    Names  Rollno
st1   abc     101
st2   xyz     102
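The output above uses the row labels st1 and st2 and shows only two columns. A sketch of one way to get it, assuming the labels were set through the `index` argument and the subset was taken with `loc` (the third row label and the marks are assumed values):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Names": ["abc", "xyz", "uvw"],
        "Rollno": [101, 102, 103],
        "Marks": [78, 85, 91],  # assumed sample values
    },
    index=["st1", "st2", "st3"],  # custom row labels instead of 0, 1, 2
)

# loc selects by label: first the rows, then the columns.
subset = df.loc[["st1", "st2"], ["Names", "Rollno"]]
print(subset)
```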

Writing an existing DataFrame to a file (.csv, .xlsx, .json)
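A minimal sketch of writing a DataFrame to disk with `to_csv` and `to_json`. The file names and the temporary directory are assumptions chosen so the example cleans up after itself; `to_excel` works the same way but needs an extra Excel engine such as openpyxl, so it is only shown as a comment.

```python
import json
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Names": ["abc", "xyz", "uvw"],
    "Rollno": [101, 102, 103],
    "Marks": [78, 85, 91],  # assumed sample values
})

with tempfile.TemporaryDirectory() as out_dir:
    csv_path = os.path.join(out_dir, "students.csv")    # hypothetical file name
    json_path = os.path.join(out_dir, "students.json")  # hypothetical file name

    df.to_csv(csv_path)                       # the row index is written as the first column
    df.to_json(json_path, orient="records")   # one JSON object per row
    # df.to_excel(os.path.join(out_dir, "students.xlsx"))  # requires openpyxl

    # Read the files back as plain text to see what was written.
    with open(csv_path) as f:
        csv_header = f.readline().strip()
    with open(json_path) as f:
        json_records = json.load(f)

print(csv_header)
print(json_records[0])
```

Passing `index=False` to `to_csv` leaves the row index out of the file.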

Reading CSV, XLSX, or JSON files and converting them into a DataFrame

Unnamed: 0 Names Rollno Marks
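The "Unnamed: 0" column above typically appears when a CSV that was saved together with its row index is read back without telling pandas which column holds the index. The sketch below reproduces this in memory with `io.StringIO` instead of a real file (an assumption made to keep the example self-contained) and shows `index_col=0` as one way to avoid the extra column.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Names": ["abc", "xyz", "uvw"],
    "Rollno": [101, 102, 103],
    "Marks": [78, 85, 91],  # assumed sample values
})

# With no path, to_csv returns the CSV text; the index is included by default.
csv_text = df.to_csv()

# Plain read: the saved index has an empty header, so it becomes "Unnamed: 0".
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index instead.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(with_extra.columns.tolist())
print(restored.columns.tolist())
```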

C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in read_csv(filepath_or_buffer, sep, delimiter, h

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _read(filepath_or_buffer, kwds)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in __init__(self, f, engine, **kwds)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers\readers.py in _make_engine(self, engine)

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py in __init__(self, src, **kwds)
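The traceback above passes through `read_csv` and the C parser wrapper, but the final error message is not shown. One common cause of a failure at this point is a file path that does not exist; assuming that is the case here, the error can be caught as sketched below. The file name is hypothetical.

```python
import pandas as pd

missing_path = "no_such_file_for_demo.csv"  # hypothetical path that does not exist

try:
    df = pd.read_csv(missing_path)
except FileNotFoundError as exc:
    # read_csv raises FileNotFoundError when a local path cannot be opened.
    df = None
    print("Could not read the file:", exc)
```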

Splitting the data into groups based on some criteria
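Splitting into groups is done with `groupby`. A minimal sketch, assuming a Section column as the grouping criterion and made-up marks:

```python
import pandas as pd

df = pd.DataFrame({
    "Section": ["A", "B", "A", "B"],  # assumed sample values
    "Marks": [70, 80, 90, 60],        # assumed sample values
})

# Iterating over a groupby object yields (group key, sub-DataFrame) pairs.
for section, group in df.groupby("Section"):
    print(section, len(group))

# An aggregate such as mean() is computed once per group.
mean_marks = df.groupby("Section")["Marks"].mean()
print(mean_marks)
```

Here section A averages (70 + 90) / 2 = 80 and section B averages (80 + 60) / 2 = 70.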

Merging two DataFrames
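Two DataFrames that share a key column can be combined with `merge`. The sketch below uses Rollno as the key; both frames and their values are assumptions for illustration.

```python
import pandas as pd

students = pd.DataFrame({
    "Rollno": [101, 102, 103],
    "Names": ["abc", "xyz", "uvw"],
})
marks = pd.DataFrame({
    "Rollno": [101, 102, 104],
    "Marks": [78, 85, 66],  # assumed sample values
})

# Default (inner) merge keeps only roll numbers present in both frames.
inner = pd.merge(students, marks, on="Rollno")

# A left merge keeps every student; missing marks become NaN.
left = students.merge(marks, on="Rollno", how="left")

print(inner)
print(left)
```

Other values of `how` are "right" and "outer", which keep all rows of the second frame or of both frames.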
