Python Pandas Presentation

PANDAS FOR PYTHON
By:
Aayushi Pathak, Bhauk Yadav, Abhijeet, Srishti Jain, Praveen Shahani
Table of contents

01 Basic Understanding of Pandas

02 Pandas for Data Analysis

03 Broadway Theatre Example Using Pandas


Introduction

• Pandas is a popular open-source Python library for data manipulation and analysis. It provides data structures and functions that enable users to work with structured data efficiently.
01 Basics of Pandas
• Pandas can be easily installed using pip, which is a package manager for Python. To install pandas, run the following command in the terminal: pip install pandas
• Once installed, you can import the library into your Python program using the following code (pd is the conventional alias): import pandas as pd
Working with Series in Pandas

• A Series is a one-dimensional labeled array that can hold any data type.
• We can also provide custom labels for the Series using the index parameter.
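
The two points above can be sketched as follows. The values and labels are hypothetical, chosen only for illustration; they are not the ones from the original slide.

```python
import pandas as pd

# Hypothetical marks, used only for illustration.
marks = pd.Series([85, 92, 78])
# Without an explicit index, pandas labels the items 0, 1, 2, ...
print(marks)

# The index parameter attaches custom labels to the same values.
labeled = pd.Series([85, 92, 78], index=["maths", "physics", "chemistry"])
print(labeled)

# Items can then be looked up by label.
print(labeled["physics"])
```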
Working with DataFrames in Pandas

• A DataFrame is a two-dimensional, table-like data structure with labeled rows and columns. It can be created, for example, from a dictionary that maps column names to lists of values.
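
A minimal sketch of creating a DataFrame from a dictionary. The column names and values are assumptions made for this example, not the data shown on the original slide.

```python
import pandas as pd

# Hypothetical data: each key becomes a column, each list holds that column's values.
data = {
    "name": ["Asha", "Ravi", "Meera"],
    "marks": [85, 92, 78],
    "course": ["BBA", "MBA", "MBA"],
}
df = pd.DataFrame(data)

print(df)        # rows are labeled 0, 1, 2 by default
print(df.shape)  # (number of rows, number of columns)
```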
Pivoting a DataFrame

pivot() is used to reshape a given DataFrame, using the unique values of the given index and columns to form the axes of the result. It does not support data aggregation; passing multiple value columns will result in a MultiIndex in the columns.
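
A small sketch of pivot() on made-up ticket data. The column names day, show and tickets are hypothetical. Because pivot() does not aggregate, each (day, show) pair appears only once here; for data with repeated pairs, pivot_table() is the aggregating alternative.

```python
import pandas as pd

# Hypothetical long-format data: one row per (day, show) pair.
long_df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Tue"],
    "show": ["A", "B", "A", "B"],
    "tickets": [100, 150, 120, 130],
})

# Rows come from "day", columns from "show", cell values from "tickets".
wide = long_df.pivot(index="day", columns="show", values="tickets")
print(wide)
```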
Descriptive Statistics Using Pandas
• Descriptive statistics are brief informational coefficients that summarize a given data set.
• They are broken down into measures of central tendency and measures of variability (spread).
• Measures of central tendency include the mean, median, and mode, while measures of variability include standard deviation, variance, and the minimum and maximum values.
• df.describe() reports most of these at once for numeric columns: count, mean, standard deviation, minimum, the 25%, 50% (median) and 75% quantiles, and maximum. Mode and variance are not included; use df.mode() and df.var() for those.
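
The measures above can be computed as in this sketch. The marks column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical numeric column, used only for illustration.
df = pd.DataFrame({"marks": [70, 80, 90, 80]})

# describe() returns count, mean, std, min, quartiles and max in one table.
stats = df.describe()
print(stats)

# Measures that describe() does not report are available as separate methods.
median = df["marks"].median()
mode = df["marks"].mode()   # a Series, because there can be several modes
variance = df["marks"].var()  # sample variance (divides by n - 1)
print(median, list(mode), variance)
```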
02 Pandas for Data Analysis
Steps Covered

• Importing the Data
• Data Manipulation
• Data Exploration
• Data Reindexing and Altering
• Data Cleaning
Importing Data

• The first step in data cleaning is to import the data into Pandas. Pandas provides several functions to
read different types of data, such as CSV, Excel, SQL, and more.
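
A minimal sketch of importing data with read_csv(). To keep the example self-contained, the CSV content is a hypothetical in-memory string; in practice a file path would be passed instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,marks\nAsha,85\nRavi,92\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Other readers follow the same pattern, for example pd.read_excel(...)
# for Excel files and pd.read_sql(...) for SQL queries.
```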
Data Exploration

• Before cleaning the data, it is important to explore it and identify any potential issues. Pandas provides several methods to explore the data, such as head(), tail(), info(), describe(), and more.
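
The exploration methods named above can be tried on a small table. The data below is hypothetical.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Kabir"],
    "marks": [85, 92, 78, 88],
})

first_rows = df.head(2)  # first 2 rows (5 by default)
last_rows = df.tail(2)   # last 2 rows (5 by default)
print(first_rows)
print(last_rows)

df.info()             # prints column names, non-null counts and dtypes
print(df.describe())  # summary statistics for the numeric columns
```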
Data Cleaning

• Once we have explored the data and identified any potential issues, we can start cleaning the data. Pandas provides several functions for data cleaning, such as dropna(), fillna(), replace(), and more.
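
A short sketch of the three cleaning methods named above. The data, the "N/A" placeholder and the choice to fill with the column mean are all assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing number and one placeholder string.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera"],
    "marks": [85.0, np.nan, 78.0],
    "course": ["BBA", "N/A", "MBA"],
})

# dropna() removes every row that contains at least one NaN.
dropped = df.dropna()

# fillna() replaces NaN; here the marks column is filled with its mean.
filled = df.fillna({"marks": df["marks"].mean()})

# replace() swaps one value for another wherever it occurs.
replaced = df.replace("N/A", "Unknown")

print(dropped, filled, replaced, sep="\n\n")
```

Each method returns a new DataFrame and leaves df unchanged.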
Data Manipulation

01 Filtering
02 Sorting
03 Merging
04 Grouping
Data Filtering

• Filtering is the process of selecting a subset of data based on specific conditions. Pandas provides several tools for filtering data, such as the loc[] and iloc[] indexers (used with square brackets, not called as functions) and the query() method.
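
A sketch of the filtering tools mentioned above, on hypothetical data.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Kabir"],
    "marks": [85, 92, 78, 88],
    "course": ["BBA", "MBA", "MBA", "BBA"],
})

# Boolean condition: keep rows where marks exceed 80.
high = df[df["marks"] > 80]

# loc[] selects by label or condition, optionally choosing columns as well.
mba = df.loc[df["course"] == "MBA", ["name", "marks"]]

# iloc[] selects by integer position: here the first two rows.
first_two = df.iloc[:2]

# query() takes the condition as a string expression.
high_mba = df.query("marks > 80 and course == 'MBA'")

print(high, mba, first_two, high_mba, sep="\n\n")
```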
Data Sorting

• Sorting is the process of arranging the data in a specific order based on one or more columns. Pandas provides a sort_values() function for sorting data.
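
A sketch of sort_values() with one and with several columns. The data is hypothetical.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meera", "Kabir"],
    "marks": [85, 92, 78, 88],
    "course": ["BBA", "MBA", "MBA", "BBA"],
})

# Sort by one column, highest marks first.
by_marks = df.sort_values("marks", ascending=False)

# Sort by course (A to Z), then by marks (highest first) within each course.
by_course_then_marks = df.sort_values(["course", "marks"], ascending=[True, False])

print(by_marks, by_course_then_marks, sep="\n\n")
```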
Data Merging

• Merging is the process of combining two or more DataFrames into a single DataFrame based on a common column. Pandas provides a merge() function for merging data, which joins two DataFrames at a time.
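
A sketch of merge() on a shared column. The two tables and the student_id key are hypothetical.

```python
import pandas as pd

# Two hypothetical tables that share the student_id column.
students = pd.DataFrame({"student_id": [1, 2, 3], "name": ["Asha", "Ravi", "Meera"]})
marks = pd.DataFrame({"student_id": [1, 2, 4], "marks": [85, 92, 70]})

# Inner join (the default): keep only ids present in both tables.
inner = pd.merge(students, marks, on="student_id")

# Left join: keep every student; missing marks become NaN.
left = students.merge(marks, on="student_id", how="left")

print(inner, left, sep="\n\n")
```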
Data Grouping

• Grouping is the process of grouping the data based on one or more columns and then applying a function to each group. Pandas provides a groupby() function for grouping data.
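
A sketch of groupby() followed by an aggregation. The data and the choice of aggregations are assumptions for this example.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    "course": ["BBA", "MBA", "MBA", "BBA"],
    "marks": [85, 92, 78, 88],
})

# Group rows by course, then apply one function to each group.
mean_by_course = df.groupby("course")["marks"].mean()

# Several functions can be applied at once with agg().
summary = df.groupby("course")["marks"].agg(["mean", "max", "count"])

print(mean_by_course, summary, sep="\n\n")
```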
Data Reindexing
Reindexing in Pandas can be used to change the index of rows and columns of a DataFrame.

Step: First, create a DataFrame in Python.

– Here name, marks and course are the column names, and 1, 2, 3, 4, 5 are the row labels.
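
A table of this shape could be built as follows. Only the column names and the row labels 1 to 5 come from the text above; the cell values are hypothetical.

```python
import pandas as pd

# Column names and row labels follow the text; the values are made up.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meera", "Kabir", "Neha"],
        "marks": [85, 92, 78, 88, 95],
        "course": ["BBA", "MBA", "MBA", "BBA", "MBA"],
    },
    index=[1, 2, 3, 4, 5],
)
print(df)
```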
Reindexing the Rows
One can reindex a single row or multiple rows by using the reindex() method. Labels in the new index that are not present in the DataFrame are assigned NaN by default.

– Reindexing the rows here only changes their position (for example, from the 1st position to the 2nd or 3rd). It does not rename the row labels.
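
A sketch of reindexing rows with reindex(). The table is hypothetical, and the label 6 is deliberately absent from it to show the NaN behaviour.

```python
import pandas as pd

# Hypothetical table with row labels 1, 2, 3.
df = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meera"], "marks": [85, 92, 78]},
    index=[1, 2, 3],
)

# Existing labels in a new order: rows move, labels and values stay together.
reordered = df.reindex([3, 1, 2])

# Label 6 does not exist in df, so its row is filled with NaN.
with_missing = df.reindex([1, 2, 6])

print(reordered, with_missing, sep="\n\n")
```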
Reindexing the Columns
We can reindex a single column or multiple columns by using the reindex() method and specifying the axis we want to reindex. Labels in the new index that are not present in the DataFrame are assigned NaN by default.
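
A sketch of reindexing columns. The table is hypothetical, and grade is a made-up column name that does not exist in it.

```python
import pandas as pd

# Hypothetical table, used only for illustration.
df = pd.DataFrame({
    "name": ["Asha", "Ravi"],
    "marks": [85, 92],
    "course": ["BBA", "MBA"],
})

# axis="columns" applies the new labels to the columns instead of the rows.
# "marks" is left out, so it is dropped; "grade" is new, so it is all NaN.
reordered = df.reindex(["course", "name", "grade"], axis="columns")
print(reordered)
```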

– Use the ffill() function to fill the missing values along the index axis.
– When ffill() is applied along the index axis, any missing value is filled with the corresponding value from the previous row.
– Here we just make a DataFrame with some missing values; these values are denoted by NaN.
– First, we fill the missing values along the index axis.
– Now we fill the NaN values along the column axis.
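
The three steps above can be sketched like this. The column names a, b, c and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame with some missing values (NaN).
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, np.nan],
    "c": [7.0, 8.0, np.nan],
})

# Index axis (the default): each NaN takes the value from the previous row.
by_rows = df.ffill()

# Column axis: each NaN takes the value from the column to its left.
by_cols = df.ffill(axis="columns")

print(df, by_rows, by_cols, sep="\n\n")
```

A NaN with no earlier value to copy, such as the first row of b or the first column of the second row, stays NaN.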
Altering/Rename Column Labels
Using the rename() function: One way of renaming the columns in a Pandas DataFrame is by using the rename() function. This method is quite useful when we need to rename only some selected columns, because we specify information only for the columns that are to be renamed.
– By assigning a list of new column names
– By using the DataFrame.set_axis() function
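
The three renaming approaches can be sketched as follows. The table and the new names (score, student, programme) are hypothetical.

```python
import pandas as pd

# Hypothetical table, used only for illustration.
df = pd.DataFrame({
    "name": ["Asha", "Ravi"],
    "marks": [85, 92],
    "course": ["BBA", "MBA"],
})

# rename(): only the listed columns change; the rest keep their names.
renamed = df.rename(columns={"marks": "score"})

# Assigning a list: every column needs a new name, in order.
assigned = df.copy()
assigned.columns = ["student", "score", "programme"]

# set_axis(): same idea, but it returns a new DataFrame.
via_set_axis = df.set_axis(["student", "score", "programme"], axis="columns")

print(renamed, assigned, via_set_axis, sep="\n\n")
```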
03 Broadway Theatre Example Using Pandas
https://colab.research.google.com/drive/1HDKICQU0foyTdIHHkFyNQTUxrNzlfK9l?usp=sharing
THANK YOU
