Chapter-2 Python Pandas

This chapter discusses pandas, a popular Python library for data analysis. It covers pandas data structures like Series and DataFrame, as well as common operations like selecting/accessing data, descriptive statistics, pivoting, sorting, grouping, and aggregating. Functions like loc, iloc, describe(), hist(), groupby(), agg(), and transform() are explained. The chapter aims to teach readers how to efficiently work with data frames in pandas.

Uploaded by

Swarnim Jain

Chapter -2

Python Pandas

Introduction :

Pandas is one of the most preferred and widely used data science libraries. It offers efficient data structures which are not only powerful, but also very convenient and flexible.

In this chapter we will learn :

Advanced operations on DataFrame – pivoting, sorting and aggregation.

Descriptive statistics – histogram and quantiles.

Various functions, reindexing and altering labels.

Pandas provides two basic data structures – Series and DataFrame.

Series – It represents a one-dimensional array of indexed data.

DataFrame – It stores data in a two-dimensional way, i.e. in rows and columns.

Some attributes of DataFrame are :

1. index – the index (row labels) of the DataFrame.

2. columns – the column labels of the DataFrame.

3. axes – Returns a list representing both the axes, i.e. the row labels and the column labels.

4. size – Returns the number of elements in the DataFrame.

5. shape – Returns the dimensions of the DataFrame as a (rows, columns) tuple.

6. values – Returns the DataFrame's data in the form of a NumPy array.
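The attributes listed above can be tried on a small DataFrame. The sketch below uses made-up data (the labels and numbers are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical data, used only to illustrate the attributes.
df = pd.DataFrame(
    {"Classes": [28, 36, 24], "Quarter": [1, 2, 3]},
    index=["Tahira", "Gurjyot", "Anusha"],
)

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.axes)     # list holding the row labels and the column labels
print(df.size)     # number of elements: 3 rows x 2 columns
print(df.shape)    # (rows, columns)
print(df.values)   # the data as a 2-D NumPy array
```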

Selecting/Accessing a Column :
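A column is commonly selected with square brackets or, when the label is a valid Python identifier, with dot notation. A minimal sketch on a hypothetical DataFrame (the data is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Tutor": ["Tahira", "Anusha"], "Classes": [28, 36]})

by_brackets = df["Classes"]          # one column, returned as a Series
by_attribute = df.Classes            # same column using dot notation
several = df[["Tutor", "Classes"]]   # a list of labels returns a DataFrame

print(by_brackets)
print(several)
```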
Selecting/Accessing a Subset from a DataFrame using Row/Column Names :

Syntax :

<DataFrameObject>.loc[<startrow>:<endrow>]

To access selected columns use :

<DF object>.loc[:, <start column>:<end column>]

To access a range of columns from a range of rows, use :

<DataFrameObject>.loc[<startrow>:<endrow>, <startcolumn>:<endcolumn>]
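The three loc forms above can be sketched as follows. The city data is hypothetical. Note that a label slice with loc includes both the start and the end label.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {
        "Population": [10, 20, 30, 40],
        "Hospitals": [1, 2, 3, 4],
        "Schools": [5, 6, 7, 8],
    },
    index=["Delhi", "Mumbai", "Kolkata", "Chennai"],
)

rows = df.loc["Mumbai":"Kolkata"]                        # range of rows
cols = df.loc[:, "Population":"Hospitals"]               # range of columns
both = df.loc["Delhi":"Kolkata", "Hospitals":"Schools"]  # rows and columns

print(rows)
print(cols)
print(both)
```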

Obtaining a Subset/Slice from a DataFrame using Row/Column Numeric Index/Position :

<DF object>.iloc[<start row index>:<end row index>, <start col index>:<end column index>]
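A short sketch of positional slicing with iloc, again on hypothetical data. Unlike loc, an iloc slice follows normal Python slicing and excludes the end position.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {
        "Population": [10, 20, 30, 40],
        "Hospitals": [1, 2, 3, 4],
        "Schools": [5, 6, 7, 8],
    },
    index=["Delhi", "Mumbai", "Kolkata", "Chennai"],
)

# Rows at positions 0 and 1, columns at positions 1 and 2.
subset = df.iloc[0:2, 1:3]
print(subset)
```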

Selecting/Accessing Individual Values :

<DF object>.<column>[<row name or row numeric index>]
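As a sketch, a single value can be read with the form above. The at and iat accessors shown alongside are an alternative that this chapter does not cover; they are included here only as an assumption about what a reader may find convenient. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Classes": [28, 36]}, index=["Tahira", "Anusha"])

by_label = df.Classes["Anusha"]      # <DF object>.<column>[<row name>]
with_at = df.at["Anusha", "Classes"] # label-based scalar access
with_iat = df.iat[1, 0]              # position-based scalar access

print(by_label, with_at, with_iat)
```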

Adding and Deleting Columns in DataFrames

Deleting Columns : use the del statement to delete a column

del <DF object>[<column name>]
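A minimal sketch of adding columns by assignment and deleting one with del. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Tutor": ["Tahira", "Anusha"], "Classes": [28, 36]})

df["Country"] = ["USA", "UK"]  # add a column from a list of values
df["Quarter"] = 1              # a single value is repeated for every row
del df["Quarter"]              # delete a column in place

print(df)
```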

Descriptive Statistics with Pandas :

Pandas includes many useful statistical functions.

Reference dataframe, namely sal_df

Functions min() and max() : The min() and max() functions find the minimum and maximum values, respectively, from a given set of data.

Parameters :

axis : (0 or 1) by default, the minimum and maximum are calculated along axis 0, i.e. for each column.
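The sketch below shows min() and max() along both axes. The sal_df built here is a hypothetical stand-in (its labels and values are assumptions), since the reference table is not reproduced in these notes.

```python
import pandas as pd

# Hypothetical stand-in for sal_df; not the original reference table.
sal_df = pd.DataFrame(
    {"2016": [2000, 10000, 12000, 13000], "2017": [2000, 6000, 7000, 7000]},
    index=["Qtr1", "Qtr2", "Qtr3", "Qtr4"],
)

col_min = sal_df.min()        # axis=0 (default): one value per column
col_max = sal_df.max()
row_max = sal_df.max(axis=1)  # axis=1: one value per row

print(col_min)
print(col_max)
print(row_max)
```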
Functions mode() , mean(), median() :

mode() – Returns the mode value (i.e., the value that appears most often) from a set of values.

Parameters :

axis : 0 or ‘index’ : get mode of each column

axis : 1 or ‘columns’ : get mode of each row

mean() – Returns the computed mean (average) from a set of values.

median() – Returns the middle number from a sorted set of numbers. When the count of numbers is even, it is the average of the two middle numbers :

(2000 10000 12000 13000)

(10000 + 12000) = 22000, 22000/2 = 11000

(2000 6000 7000 7000)

(6000 + 7000) = 13000, 13000/2 = 6500
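The two worked median examples above can be checked with a few lines of pandas. This is only a sketch on the same two sets of numbers; the variable names are arbitrary.

```python
import pandas as pd

first = pd.Series([2000, 10000, 12000, 13000])
second = pd.Series([2000, 6000, 7000, 7000])

print(first.median())   # average of the two middle values 10000 and 12000
print(second.median())  # average of the two middle values 6000 and 7000
print(second.mean())    # (2000 + 6000 + 7000 + 7000) / 4
print(second.mode())    # mode() returns a Series, as there can be ties
```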

Functions count( ) and sum( )

count( ) – counts the non-NA entries for each row or column.

sum( ) – returns the sum of values for the requested axis.
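A small sketch of count() and sum() on hypothetical data containing one missing value, to show that count() skips NA entries and that sum() ignores them by default.

```python
import pandas as pd

# Hypothetical data with one missing (NaN) entry in column A.
df = pd.DataFrame({"A": [1.0, float("nan"), 3.0], "B": [4, 5, 6]})

print(df.count())        # non-NA entries in each column
print(df.count(axis=1))  # non-NA entries in each row
print(df.sum())          # column totals, NaN is skipped
print(df.sum(axis=1))    # row totals
```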

Applying Functions on a Subset of Dataframe :

Sometimes we need to apply a function on a selected column, a selected row, or some other subset of the dataframe. In each case, select the subset first and then call the function on the selection.

Applying Function on a Column of a DataFrame :

To apply a function on a column, write –

<dataframe>[<column name>]

Applying Function on Multiple Columns of a DataFrame :

<dataframe>[[<column name>, <column name>,….]]

Applying Function on a Row of a DataFrame :

<dataframe>.loc[<row index>, :]

Applying Functions on a Range of Rows of a DataFrame :

<dataframe>.loc[<start row> : <end row>, :]

Applying Functions to a Subset of the DataFrame :

<dataframe>.loc[<start row> : <end row>, <start column> : <end column>]
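Each of the selections above can be followed directly by a function call. The sketch below uses sum() on hypothetical data; any of the statistical functions from this chapter could be used in its place.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    index=["r1", "r2", "r3"],
)

one_column = df["A"].sum()                      # a single column
two_columns = df[["A", "B"]].sum()              # multiple columns
one_row = df.loc["r1", :].sum()                 # a single row
row_range = df.loc["r1":"r2", :].sum()          # a range of rows
subset = df.loc["r1":"r2", "B":"C"].sum()       # rows and columns together

print(one_column, one_row)
print(two_columns)
print(row_range)
print(subset)
```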

Advanced Operations on DataFrame :

Three advanced operations are :

1. pivoting
2. sorting
3. aggregation
1. Pivoting : Pivoting is a summary technique that works on tabular data (i.e., data in rows and columns). It rearranges the data from rows and columns, possibly aggregating data from multiple sources, into a report form (with rows transferred to columns) so that the data can be viewed from a different perspective.

Real life example :

An online tutoring company maintains its data about tutors and

online classes in the following table.

Using pivot Function :

Cells in the pivoted table which do not have a matching entry
in the original one are set with NaN.

Now change the rows and columns , i.e. the index and columns
arguments

We can skip the values argument:
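The three pivot() variations described above can be sketched as follows. The tutoring table is not reproduced in these notes, so the one-quarter data below (tutor names, class counts and countries) is hypothetical.

```python
import pandas as pd

# Hypothetical one-quarter data; not the original tutoring table.
df = pd.DataFrame(
    {
        "Tutor": ["Tahira", "Gurjyot", "Anusha", "Jacob"],
        "Classes": [28, 36, 41, 32],
        "Country": ["USA", "UK", "Japan", "USA"],
    }
)

# Countries as rows, tutors as columns; unmatched cells become NaN.
by_country = df.pivot(index="Country", columns="Tutor", values="Classes")

# Swap the index and columns arguments.
by_tutor = df.pivot(index="Tutor", columns="Country", values="Classes")

# Without values, all remaining columns are used and the result
# gets two-level column labels.
no_values = df.pivot(index="Tutor", columns="Country")

print(by_country)
print(by_tutor)
print(no_values)
```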

The above data is for one quarter only. The online tutoring company has data for the entire year as shown below :

Suppose the index, i.e. the rows, is specified as ‘Tutor’ and the columns as ‘Country’. Now there are multiple entries for the same tutor, with different values, for the same country.

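Consider Tahira’s entries :

Tutor    Classes   Quarter   Country
Tahira   28        1         USA
Tahira   36        2         USA
Tahira   24        3         Brazil
Tahira   36        4         Japan

Try to create a row for tutor Tahira from the above data with columns as Country.

         USA   Brazil   Japan
Tahira   ?     24       36

The USA cell cannot be filled because there are two USA entries (28 and 36). In this situation pivot( ) cannot choose between them and raises an error.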
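The problem with Tahira's duplicate USA entries can be reproduced with the four rows shown above. As a sketch, pivot() is wrapped in try/except so that the ValueError it raises for duplicate index/column pairs is printed instead of stopping the program.

```python
import pandas as pd

# Tahira's four entries from the table above.
tahira = pd.DataFrame(
    {
        "Tutor": ["Tahira", "Tahira", "Tahira", "Tahira"],
        "Classes": [28, 36, 24, 36],
        "Quarter": [1, 2, 3, 4],
        "Country": ["USA", "USA", "Brazil", "Japan"],
    }
)

try:
    tahira.pivot(index="Tutor", columns="Country", values="Classes")
    pivot_failed = False
except ValueError as err:
    # Two rows share the pair (Tahira, USA), so pivot() cannot reshape.
    pivot_failed = True
    print("pivot() failed:", err)
```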

Using pivot_table( ) function :

Example-1 Considering the tutoring company data, compute total classes per tutor.

Example-2 Considering the tutoring company data, compute the number of countries (count) per tutor.

Example-3 Considering the tutoring company data, compute total classes by country.

Example-4 Considering the tutoring company data, compute total classes on two fields, tutor-wise and country-wise.

Example-5 Considering the tutoring company data, compute average classes on two fields.
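The five examples above can be sketched with pivot_table(), which aggregates duplicate entries using the function given in aggfunc. The full-year table is not reproduced in these notes, so the data below is partly hypothetical: Tahira's rows are the ones shown earlier, while Anusha's rows are assumptions.

```python
import pandas as pd

# Tahira's rows come from the notes; Anusha's rows are hypothetical.
df = pd.DataFrame(
    {
        "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
        "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
        "Quarter": [1, 2, 3, 4, 1, 2, 3, 4],
        "Country": ["USA", "USA", "Brazil", "Japan", "UK", "UK", "Japan", "UK"],
    }
)

# Example-1: total classes per tutor
ex1 = pd.pivot_table(df, index="Tutor", values="Classes", aggfunc="sum")
# Example-2: count of country entries per tutor
ex2 = pd.pivot_table(df, index="Tutor", values="Country", aggfunc="count")
# Example-3: total classes by country
ex3 = pd.pivot_table(df, index="Country", values="Classes", aggfunc="sum")
# Example-4: total classes tutor-wise and country-wise
ex4 = pd.pivot_table(df, index=["Tutor", "Country"], values="Classes", aggfunc="sum")
# Example-5: average classes tutor-wise and country-wise
ex5 = pd.pivot_table(df, index=["Tutor", "Country"], values="Classes", aggfunc="mean")

for result in (ex1, ex2, ex3, ex4, ex5):
    print(result, end="\n\n")
```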
Sorting : Sorting refers to arranging values in a particular order.

sort_values( ) – this function arranges the values in ascending or descending order.
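A minimal sketch of sort_values() on Tahira's entries from the pivoting section: ascending order, descending order, and sorting on two columns with a different direction for each.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Tutor": ["Tahira", "Tahira", "Tahira", "Tahira"],
        "Classes": [28, 36, 24, 36],
        "Country": ["USA", "USA", "Brazil", "Japan"],
    }
)

ascending = df.sort_values("Classes")                    # default: ascending
descending = df.sort_values("Classes", ascending=False)  # descending
# Country A to Z, and within each country Classes from high to low.
two_keys = df.sort_values(["Country", "Classes"], ascending=[True, False])

print(ascending)
print(descending)
print(two_keys)
```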
Creating Histogram :

Histogram – A histogram is a plot that lets you discover, and show, the underlying frequency distribution (shape) of a set of continuous data.

The hist( ) function of pandas is used to create a histogram.

Consider the following histogram, computed from a dataset containing the ages of 20 people.

37 28 38 44 53 69 74 53 35 38 66 46 24 45 92 48 51 62 58 57

Bin      Frequency   Ages included in Bin
20-30    2           28, 24
30-40    4           37, 38, 35, 38
40-50    4           44, 46, 45, 48
50-60    5           53, 53, 51, 58, 57
60-70    3           69, 66, 62
70-80    1           74
80-90    0           --
90-100   1           92
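The bin frequencies in the table above can be verified numerically. The sketch below uses numpy.histogram() to count the ages per bin, which is a swapped-in way of checking the numbers without drawing anything; the hist() call that would draw the plot is left as a comment because it needs matplotlib to be installed.

```python
import numpy as np
import pandas as pd

ages = pd.Series([37, 28, 38, 44, 53, 69, 74, 53, 35, 38,
                  66, 46, 24, 45, 92, 48, 51, 62, 58, 57])

# Bin edges 20, 30, ..., 100 give the eight bins of the table.
edges = list(range(20, 101, 10))
counts, _ = np.histogram(ages, bins=edges)

for low, high, count in zip(edges[:-1], edges[1:], counts):
    print(f"{low}-{high}: {count}")

# To draw the histogram itself (requires matplotlib):
# ages.hist(bins=edges)
```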
Function Application : It means that a function (a library function or a user-defined function) may be applied on a dataframe in multiple ways :

(a) on the whole dataframe

(b) row-wise or column-wise

(c) on individual elements, i.e. element-wise

For the above-mentioned three types of function application, Pandas offers the following three functions :

(a) pipe() – dataframe-wise function application.

(b) apply() – row-wise/column-wise function application.

(c) applymap() – element-wise function application.

(a) pipe( ) function : Piping functions through pipe( ) basically means chaining functions in the order in which they are executed.

pipe() Example 1 : Function add( ) followed by multiply( ) applied on a dataframe.
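A sketch of the add-then-multiply chain. The notes do not show how add() and multiply() are defined, so the two user-defined functions and the data below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical helper functions; each takes the dataframe as first argument.
def add(frame, value):
    return frame + value

def multiply(frame, value):
    return frame * value

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# pipe() passes the dataframe as the first argument of each function,
# so this reads left to right: first add 10, then multiply by 2.
result = df.pipe(add, 10).pipe(multiply, 2)

# The same computation written as nested calls.
nested = multiply(add(df, 10), 2)

print(result)
```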

The apply and applymap() functions :

apply( ) – apply is a series function, so it applies the given function to one row or one column of the dataframe at a time (a single row or column of a dataframe is equivalent to a series).

Syntax - <dataframe>.apply(<funcname>, axis = 0)

axis : 0 or 1, default 0; the axis along which the function is applied.

If axis is 0 or ‘index’ : function is applied on each column.

If axis is 1 or ‘columns’ : function is applied on each row.

applymap( ) – is an element function, so it applies the given function to each individual element separately. (From pandas 2.1 onwards the same operation is available as <dataframe>.map( ), and applymap( ) is deprecated.)

Syntax - <dataframe>.applymap(<funcname>)

To use apply( ) row-wise, write :

<dataframe>.apply(<func>, axis = 1)

NOTE – apply( ) gives one result per column/row only if the passed function is a Series function, i.e. one that reduces a series to a single value. If you pass a single-value function that works element by element, such as np.sqrt, the result has the same shape as the dataframe and apply( ) behaves like applymap( ).
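The difference between apply() and applymap() can be sketched on a small hypothetical dataframe. Because applymap() is deprecated in newer pandas versions in favour of DataFrame.map(), the sketch picks whichever of the two is available; that fallback is a convenience added here, not part of the chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [1, 4], "B": [9, 16]})

# A Series function: reduces each column (or row) to one value.
def value_range(series):
    return series.max() - series.min()

col_ranges = df.apply(value_range)          # axis=0: one value per column
row_ranges = df.apply(value_range, axis=1)  # axis=1: one value per row

# An element-wise function passed to apply(): same shape as df.
roots = df.apply(np.sqrt)

# Element-wise application; use map() where it exists, else applymap().
elementwise = df.map if hasattr(df, "map") else df.applymap
squares = elementwise(lambda x: x ** 2)

print(col_ranges)
print(row_ranges)
print(roots)
print(squares)
```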
Function groupby( ) :

Rows that have the same value in a given field are grouped together to form groups, e.g. for creating Tutor-wise groups :

All the rows having Tutor as Tahira will be clubbed to form the Tahira group.

groupby( ) function – is used to create groups from the duplicate values in the same field.

The groupby( ) function creates the groups internally and does not display the grouped data by default.
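A sketch of Tutor-wise grouping. Tahira's rows are the ones shown in the pivoting section; Anusha's rows are hypothetical. Printing the groupby object shows only an object description, so groups, size() and get_group() are used to look inside it.

```python
import pandas as pd

# Tahira's rows come from the notes; Anusha's rows are hypothetical.
df = pd.DataFrame(
    {
        "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
        "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
        "Country": ["USA", "USA", "Brazil", "Japan", "UK", "UK", "Japan", "UK"],
    }
)

grp = df.groupby("Tutor")

print(grp)                      # only a DataFrameGroupBy object description
print(grp.groups)               # group name -> row labels in that group
print(grp.size())               # number of rows in each group
tahira_rows = grp.get_group("Tahira")
print(tahira_rows)
```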
Grouping on Multiple Columns : Create a group for each Tutor and, within each tutor group, a Country-wise subgroup :

Example : to get the group having Tutor as ‘Anusha’ and Country as ‘UK’, write :
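A sketch of grouping on two columns and fetching one subgroup. With more than one grouping column, get_group() takes a tuple with one value per column. Anusha's rows below are hypothetical.

```python
import pandas as pd

# Tahira's rows come from the notes; Anusha's rows are hypothetical.
df = pd.DataFrame(
    {
        "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
        "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
        "Country": ["USA", "USA", "Brazil", "Japan", "UK", "UK", "Japan", "UK"],
    }
)

grp = df.groupby(["Tutor", "Country"])

# One value per grouping column, passed as a tuple.
anusha_uk = grp.get_group(("Anusha", "UK"))
print(anusha_uk)
```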

Aggregation via groupby () :

agg( ) method – aggregates the data of the dataframe using one or more operations over the specified axis.

Syntax - <dataframe>.agg(func, axis = 0)

For example, for the values 36, 40, 30, 32 :

mean : (36 + 40 + 30 + 32) / 4 = 138 / 4 = 34.5

median : the sorted values are 30 32 36 40. Since n = 4 is even, the median is the average of the values at positions n/2 = 2 and (n/2) + 1 = 3, i.e. (32 + 36) / 2 = 34

sum : 36 + 40 + 30 + 32 = 138

We may combine groupby( ) and agg( ) in a single command :
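A sketch of groupby() and agg() combined in one command. The data is partly hypothetical: Anusha's rows are chosen so that her group contains the values 36, 40, 30 and 32 used in the worked arithmetic above.

```python
import pandas as pd

# Tahira's rows come from the notes; Anusha's rows are hypothetical.
df = pd.DataFrame(
    {
        "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
        "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
    }
)

# Several aggregations at once, computed separately for each tutor.
summary = df.groupby("Tutor")["Classes"].agg(["mean", "median", "sum"])
print(summary)

# agg() also works without groupby(), on the whole column.
overall = df["Classes"].agg(["min", "max"])
print(overall)
```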
The transform( ) function : This function transforms the aggregate data by repeating the summary result for each row of the group, so that the result has the same shape as the original data.

Q- What if we want to add this aggregate data to the dataframe itself?
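One way to answer the question above, as a sketch: transform() returns one value per original row, so its result can be assigned directly as a new column. The column name TutorTotal and Anusha's rows are hypothetical.

```python
import pandas as pd

# Tahira's rows come from the notes; Anusha's rows are hypothetical.
df = pd.DataFrame(
    {
        "Tutor": ["Tahira"] * 4 + ["Anusha"] * 4,
        "Classes": [28, 36, 24, 36, 36, 40, 30, 32],
    }
)

# The group total is repeated on every row of that group.
df["TutorTotal"] = df.groupby("Tutor")["Classes"].transform("sum")
print(df)
```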
Reindexing and Altering Labels : The methods provided by Pandas for reindexing and relabeling are :

(i) rename( ) – simply renames the index and/or column labels in a dataframe.
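A minimal sketch of rename() on hypothetical data. Old labels are mapped to new labels with dictionaries, and a new dataframe is returned while the original keeps its labels.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Classes": [28, 36]}, index=["Qtr1", "Qtr2"])

renamed = df.rename(
    index={"Qtr1": "Q1", "Qtr2": "Q2"},  # old row label -> new row label
    columns={"Classes": "Sessions"},     # old column label -> new column label
)

print(renamed)
print(df)  # the original dataframe is unchanged
```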
(ii) reindex( ) – specifies a new order for the existing index and column labels, and/or creates new index/column labels.

(a) Reordering the existing indexes using reindex( )

(b) Adding new indexes using reindex( ) :
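Both uses of reindex() can be sketched on hypothetical data: reordering labels that already exist, and asking for labels that do not exist yet, which are created and filled with NaN.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"Classes": [28, 36, 24]}, index=["Qtr1", "Qtr2", "Qtr3"])

# (a) Reorder the existing row labels.
reordered = df.reindex(["Qtr3", "Qtr1", "Qtr2"])

# (b) Ask for a row label that does not exist: it is added with NaN.
extended = df.reindex(["Qtr1", "Qtr2", "Qtr3", "Qtr4"])

# The same idea for columns, using the columns argument.
with_new_column = df.reindex(columns=["Classes", "Country"])

print(reordered)
print(extended)
print(with_new_column)
```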
(iii) reindex_like( ) – creates indexes/column labels based on another dataframe object.

<dataframe>.reindex_like(other)
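A sketch of reindex_like() with two hypothetical dataframes: the result takes its row and column labels from the other dataframe, keeps matching values, and fills everything else with NaN.

```python
import pandas as pd

# Hypothetical dataframes for illustration.
other = pd.DataFrame(
    {"Classes": [0, 0, 0], "Country": ["", "", ""]},
    index=["Qtr1", "Qtr2", "Qtr3"],
)
df = pd.DataFrame({"Classes": [28, 36]}, index=["Qtr2", "Qtr1"])

aligned = df.reindex_like(other)
print(aligned)
```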
Solved Problems :

1. Consider the following code to create two dataframes with similar values. What will be printed by the code given below ? Justify your answer.

import pandas as pd
df1 = pd.DataFrame([1,2,3])
df2 = pd.DataFrame([[1,2,3]])
print("df1")
print(df1)
print("df2")
print(df2)

Ans :
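The original answer is not reproduced in these notes. As a sketch of the reasoning: a flat list is treated as three rows of a single column, while a nested list with one inner list is treated as a single row with three columns. Checking the shapes confirms this.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3])    # flat list: each item becomes a row
df2 = pd.DataFrame([[1, 2, 3]])  # one inner list: a single row

print(df1.shape)  # 3 rows, 1 column
print(df2.shape)  # 1 row, 3 columns
print(df1)
print(df2)
```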
