
Chapter 2: Advanced Operations on DataFrames

The document discusses various techniques for pivoting and sorting pandas DataFrames. It explains how to use the pivot(), pivot_table(), stack(), and unstack() methods to reshape DataFrames and create pivot tables. It also covers sorting DataFrames using the sort_values() and sort_index() methods to reorder rows or columns based on column values or indices in ascending or descending order. Examples are provided to demonstrate how to apply these methods to sample DataFrames and interpret the results.

Advanced Operations on DataFrames
Based on CBSE Curriculum
Informatics Practices, Class-12

CHAPTER-2
By:
Mrs. Neha Tyagi (PGT CS)
KV No-5 2nd Shift, Jaipur
KVS RO Jaipur
Pivoting DataFrame
• Pandas is a popular library for data analysis.
• Pivoting is one of the key operations for a data analyst. It means
choosing an axis for tabular data, so that the data is reorganized
around that axis.
• Using Pandas, MS-Excel-style pivot tables can be created.
• These tables summarize large data sets and produce meaningful
reports, which saves time.
• A pivot table lets us fetch important records from a large and
detailed data set.
• Pivot tables can automatically sort, count, total, etc.
• In general, pivoting means using the unique values of an
index/column to build a new DataFrame.
• To make a pivot table we use pivot( ) or pivot_table( ) from
pandas.

Neha Tyagi, KV No-5 Jaipur


Pivoting using pivot( ) method
• The pivot( ) method creates a new DataFrame by reshaping the data on the
basis of column values.
• This method takes 3 arguments: index, columns and values. The columns
argument is compulsory; index and values are optional.
• As the value of each argument you pass a column name of the
original table.
• pivot( ) then creates a new table whose row and column indices come from
the columns you gave as arguments.
• The cell values of the new table come from the column given as the
values parameter. Its syntax is -
pandas.pivot(data, index, columns, values)
• Here data is the DataFrame to reshape, and index is the column whose
values become the index (row labels) of the new DataFrame.
• columns is the column whose values become the column names of the new
DataFrame.
• values is the column whose values fill the cells of the new
DataFrame.
Pivoting using pivot( ) method
Syntax: pandas.pivot(data, index, columns, values)
• Example: creating the DataFrame

• Creating the pivot table

We can see that a new table has been created in which the values of
the Score column are spread across different columns, while the Name
and Subject values match the original table. Where no matching value
exists, NaN (a missing value) is inserted automatically.
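
The slide's original table is not available here, so the following is only a minimal sketch of the example it describes. The names and scores are hypothetical; only the column names Name, Subject and Score come from the text above.

```python
import pandas as pd

# Hypothetical sample data (the original slide's table is not available).
df = pd.DataFrame({
    "Name": ["Amit", "Amit", "Riya", "Riya", "Sohan"],
    "Subject": ["Maths", "Science", "Maths", "Science", "Maths"],
    "Score": [85, 90, 78, 88, 92],
})

# Rows come from Name, columns from Subject, cell values from Score.
pv = df.pivot(index="Name", columns="Subject", values="Score")
print(pv)
```

Sohan has no Science score in this sample, so that cell of the pivot table holds NaN.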

Using pivot( ) method with .fillna( )
Syntax: pandas.pivot(data, index, columns, values).fillna(value)
• Example: creating the DataFrame

• Creating the pivot table with .fillna( )

We can see that a new table has been created in which the values of
the Score column are spread across different columns, while the Name
and Subject values match the original table. Where no matching value
exists, there is a blank space instead of NaN.
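
A possible sketch of this step, assuming the blank is produced by filling with an empty string; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; Sohan has no Science score.
df = pd.DataFrame({
    "Name": ["Amit", "Amit", "Riya", "Riya", "Sohan"],
    "Subject": ["Maths", "Science", "Maths", "Science", "Maths"],
    "Score": [85, 90, 78, 88, 92],
})

# fillna("") replaces every NaN in the pivoted table with an empty string.
pv = df.pivot(index="Name", columns="Subject", values="Score").fillna("")
print(pv)
```

Any other fill value, such as 0, can be passed to fillna( ) in the same way.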
Pivoting by Multiple columns
To pivot by multiple columns, we simply leave the values parameter out of the call.
Syntax: pandas.pivot(data, index, columns)
• Example: creating the DataFrame

• Creating the pivot table with .fillna( )

Pivoting by Multiple columns. . .
• In the last example we saw that several column levels were
created: for each name, the values appear once per subject for
every remaining column (for example, once for scores and once for grades).
• We can filter them to keep only the part we need.
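
As a rough illustration, assume a table with hypothetical Score and Grade columns. Leaving out values pivots both of them, and one block can then be selected by its top-level label.

```python
import pandas as pd

# Hypothetical sample data with two value columns: Score and Grade.
df = pd.DataFrame({
    "Name": ["Amit", "Amit", "Riya", "Riya", "Sohan"],
    "Subject": ["Maths", "Science", "Maths", "Science", "Maths"],
    "Score": [85, 90, 78, 88, 92],
    "Grade": ["A", "A", "B", "A", "A"],
})

# Without values, every remaining column is pivoted, which gives
# two-level column labels such as (Score, Maths) and (Grade, Maths).
pv = df.pivot(index="Name", columns="Subject")
print(pv)

# Filtering: keep only the Score block of the pivoted table.
scores = pv["Score"]
print(scores)
```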

Pivot Problem
• We should always remember that if the same combination of index
and column values occurs more than once in the data, pivot( )
raises a ValueError.
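
A small sketch of the problem, using hypothetical data in which the pair (Amit, Maths) appears twice:

```python
import pandas as pd

# Hypothetical data: the combination (Amit, Maths) occurs twice.
df = pd.DataFrame({
    "Name": ["Amit", "Amit", "Riya"],
    "Subject": ["Maths", "Maths", "Maths"],
    "Score": [85, 70, 78],
})

try:
    df.pivot(index="Name", columns="Subject", values="Score")
    error_message = None
except ValueError as exc:
    # pivot() cannot decide which of the two scores to put in the cell.
    error_message = str(exc)

print(error_message)
```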

Using stack( ) and unstack( ) methods
• The stack( ) and unstack( ) methods both flip the layout of a DataFrame:
one turns a level of the columns into rows, and the other turns a level of
the rows into columns. Stacking a DataFrame means moving the innermost column
index to become the innermost row index; the opposite action is known as
unstacking.

Using the stack( ) method

After using the stack( ) method, everything that was horizontal becomes
vertical: it takes the last level of the column labels and converts it
into the last level of the row labels.
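
A minimal sketch of stacking, assuming a small pivoted table built from hypothetical data:

```python
import pandas as pd

# Hypothetical data, pivoted into a wide table with one column per subject.
df = pd.DataFrame({
    "Name": ["Amit", "Amit", "Riya", "Riya"],
    "Subject": ["Maths", "Science", "Maths", "Science"],
    "Score": [85, 90, 78, 88],
})
wide = df.pivot(index="Name", columns="Subject", values="Score")

# stack() moves the column labels (Subject) into the row index,
# so the wide table becomes a long Series indexed by (Name, Subject).
long = wide.stack()
print(long)
```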

Using stack( ) and unstack( ) methods. . .

stack( ) can also be applied repeatedly. If a second stack( ) follows
the first one, it moves the remaining column levels into the rows as well.
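
To illustrate repeated stacking, here is a sketch with a hypothetical table that has two column levels, Term and Subject:

```python
import pandas as pd

# Hypothetical table with two column levels: Term (outer), Subject (inner).
cols = pd.MultiIndex.from_product(
    [["Term1", "Term2"], ["Maths", "Science"]], names=["Term", "Subject"]
)
wide = pd.DataFrame(
    [[85, 90, 80, 95], [78, 88, 82, 84]],
    index=pd.Index(["Amit", "Riya"], name="Name"),
    columns=cols,
)

once = wide.stack()   # Subject moves to the rows; Term stays as columns
twice = once.stack()  # Term moves too; the result is a Series
print(once)
print(twice)
```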

Using stack( ) and unstack( ) methods. . .

This is unstacking.

Unstacking works like stacking in the opposite direction: it moves a
level of the row index into the columns. In this example the argument 0
is passed, which selects the level to be moved.
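
A possible sketch of unstacking on hypothetical data. It assumes the argument 0 refers to the level number, where level 0 is the outermost row index level.

```python
import pandas as pd

# Hypothetical wide table, stacked into a long Series first.
df = pd.DataFrame({
    "Name": ["Amit", "Amit", "Riya", "Riya"],
    "Subject": ["Maths", "Science", "Maths", "Science"],
    "Score": [85, 90, 78, 88],
})
wide = df.pivot(index="Name", columns="Subject", values="Score")
long = wide.stack()

# unstack() with no argument moves the innermost row level (Subject)
# back to the columns, restoring the wide layout.
back = long.unstack()

# unstack(0) moves level 0 (Name) to the columns instead.
by_name = long.unstack(0)
print(back)
print(by_name)
```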

Pivoting using pivot_table( ) method. . . .
• This is a generalization of the pivot( ) method.
• When the data contains duplicate entries for the same index/column
combination, the pivot_table( ) method is used, because it aggregates them.
• A pivot table can contain counts, sums and other aggregations of the table data.
• The pivot_table( ) method creates a DataFrame that resembles an Excel sheet.
• This method is used to convert rows into columns and vice versa.
• It allows grouping by any data field.
• Its syntax is
pandas.pivot_table(DataFrame, values=None, index=None,
columns=None, aggfunc='mean',
fill_value=None, margins=False, dropna=True,
margins_name='All')
• Not all the arguments are necessary in pivot_table( ),
because some arguments have default values.

Pivoting using pivot_table( ) method. . .
• In its syntax -
– DataFrame: a pandas DataFrame.
– values: optional; the column to be aggregated.
– index: a column, grouper, array or list whose values become the rows.
– columns: a column, grouper, array or list whose values become the columns.
– aggfunc: the aggregation function (default 'mean').
– fill_value: the value used to replace missing values in the resulting table.
– margins: a boolean whose default is False. If it is set to True, row and
column totals are added to the resulting DataFrame.
– dropna: if this is True, columns whose entries are all NaN are not included.
– margins_name='All': if margins is True, this is the name given to the row
and column that hold the totals.
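
The parameters above can be tried out on a small table. This is only a sketch with hypothetical data, in which the pair (Amit, Maths) appears twice so that aggregation is needed.

```python
import pandas as pd

# Hypothetical data with a duplicate (Amit, Maths) entry.
df = pd.DataFrame({
    "Name": ["Amit", "Amit", "Amit", "Riya", "Riya"],
    "Subject": ["Maths", "Maths", "Science", "Maths", "Science"],
    "Score": [80, 90, 70, 60, 100],
})

# Duplicates are combined with aggfunc; 'mean' averages 80 and 90.
pt = pd.pivot_table(df, values="Score", index="Name",
                    columns="Subject", aggfunc="mean")
print(pt)

# margins=True adds a totals row and column, named by margins_name.
totals = pd.pivot_table(df, values="Score", index="Name",
                        columns="Subject", aggfunc="sum",
                        margins=True, margins_name="Total")
print(totals)
```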
Pivoting using pivot_table( ) method. . .
We create a pivot table considering the following data.

We can take this data from a CSV file.

or

Pivoting using pivot_table( ) method. . .
A pivot table can also be created in the following way.

Pay attention to the values of aggfunc.
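
The slide's own call is not available, so as an illustration of how aggfunc changes the result, this sketch passes a list of aggregation functions over hypothetical data:

```python
import pandas as pd

# Hypothetical data: several scores per student.
df = pd.DataFrame({
    "Name": ["Amit", "Amit", "Amit", "Riya", "Riya"],
    "Subject": ["Maths", "Maths", "Science", "Maths", "Science"],
    "Score": [80, 90, 70, 60, 100],
})

# A list of functions gives one block of columns per function.
pt = pd.pivot_table(df, values="Score", index="Name",
                    aggfunc=["sum", "max", "count"])
print(pt)
```

The resulting columns are labelled by function name first and by value column second, for example (sum, Score).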

Pivoting using pivot_table( ) method. . .
An exercise for you

Pivoting using pivot_table( ) method. . .
Solution
First, create a DataFrame from the table using pandas.
After that, apply the following functions:

Sorting of DataFrames
• The data of a DataFrame can be sorted according to the values of rows and
columns.
• By default, sorting is done on rows in ascending order.
• Pandas DataFrames have two useful sort functions:
– sort_values( ): sorts the data by the values of the given column(s), in
ascending or descending order.
– sort_index( ): sorts by labels, either rows (axis=0) or columns (axis=1).
• Their syntax is as follows:
• DataFrame.sort_values(by, axis=0, ascending=True, inplace=False)
• DataFrame.sort_index(axis=0, ascending=True, inplace=False)
• Here –
• by: the column (or list of columns) to sort by; used by sort_values( ) only.
• axis: passing 0 means sorting is done row-wise and 1 means
column-wise.
• ascending: True by default.
• inplace: False by default; set it to True if you do not want a new DataFrame.
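
A short sketch of these parameters on a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical data with an unordered row index.
df = pd.DataFrame(
    {"Name": ["Riya", "Amit", "Sohan"], "Score": [78, 92, 85]},
    index=[2, 0, 1],
)

# sort_values returns a new DataFrame sorted by the Score column.
by_score = df.sort_values(by="Score")

# sort_index with axis=1 sorts the column labels (here in descending order).
by_col = df.sort_index(axis=1, ascending=False)

# With inplace=True the DataFrame itself is changed and None is returned.
copy = df.copy()
result = copy.sort_values(by="Score", inplace=True)

print(by_score)
print(by_col)
```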

DataFrames Sorting...
By default, sorting is in ascending order.

To sort in descending order, see the following example.

The value of the ascending parameter is False.

If we give two columns like this, then sorting is done on multiple
columns.
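
The slide's own examples are not available; a minimal sketch with hypothetical data could look like this:

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "Name": ["Riya", "Amit", "Sohan", "Neha"],
    "Class": [12, 11, 12, 11],
    "Score": [78, 92, 85, 66],
})

# Descending order: set ascending to False.
desc = df.sort_values(by="Score", ascending=False)

# Multiple columns: sort by Class first, then by Score within each class.
# A list for ascending gives one direction per column.
multi = df.sort_values(by=["Class", "Score"], ascending=[True, False])

print(desc)
print(multi)
```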
Sort by index

Sorting by index in ascending order and in descending order.
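
A brief sketch of both directions, using a hypothetical DataFrame with string row labels:

```python
import pandas as pd

# Hypothetical data with unordered row labels.
df = pd.DataFrame({"Score": [78, 92, 85]}, index=["c", "a", "b"])

asc = df.sort_index()                  # row labels a, b, c
desc = df.sort_index(ascending=False)  # row labels c, b, a

print(asc)
print(desc)
```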

Points to remember:
1. The pivot( ) method creates a new table whose row and column labels are unique values.
2. The pivot( ) method is used to pivot without aggregation.
3. Stacking means moving the innermost column index to the innermost row index.

• Please follow our blog and subscribe to our YouTube
channel to get all the chapters and lectures.

www.pythontrends.wordpress.com
