Introduction To Pandas - Ipynb - Colaboratory
Overview
In this module we will look at a Python library called Pandas.
Pandas offers a data structure called pandas.DataFrame, which is similar to a NumPy array but
with added functionality.
The added functionality includes operations that we often use in data science, such as omitting
missing values and replacing values.
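As a rough illustration of the two operations named above, the sketch below drops rows with a missing value and then replaces a placeholder value. The readings dataframe, its column names, and the use of -1.0 as a placeholder are assumptions made for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; np.nan marks a missing value.
readings = pd.DataFrame({
    "sensor": ["s1", "s2", "s3", "s4"],
    "value": [10.0, np.nan, -1.0, 12.5],
})

# Omit every row that contains a missing value.
complete = readings.dropna()

# Replace the assumed placeholder value -1.0 with 0.0.
cleaned = complete.replace(-1.0, 0.0)

print(cleaned)
```

Both dropna() and replace() return new dataframes here, so readings itself is left unchanged.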
Pandas Basics
Similar to NumPy, we can import Pandas by calling import pandas as pd .
The as pd part lets us call library functions as pd.foo() , where foo stands for any function
name. If you omit as pd and import the library by calling import pandas , you will have to call
functions as pandas.foo() .
import pandas as pd
Pandas DataFrame
pandas.DataFrame is a widely used tabular data structure, similar to a spreadsheet, which we can
use to manage data within our Python code.
Unlike in NumPy, we can give names to columns, which makes it easy to manipulate and find our
data within a huge dataset.
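The cell that builds populationData is not shown in this section, so here is a minimal sketch of what such a dataframe might look like. Only the column names 'state' and 'pop' and the index labels 'A' to 'E' come from the text; the 'year' column and all the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical values: only the column names 'state' and 'pop' and the
# index labels 'A'..'E' are taken from the text. 'year' is an assumed column.
populationData = pd.DataFrame(
    {
        "state": ["Ohio", "Ohio", "Nevada", "Nevada", "Texas"],
        "year": [2000, 2001, 2001, 2002, 2002],
        "pop": [1.5, 1.7, 2.4, 2.9, 3.1],
    },
    index=["A", "B", "C", "D", "E"],  # custom row labels
)
```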
print(populationData)
https://colab.research.google.com/drive/1kIUvQX1ynNzPYz7RfzLI9b069BY9-fgS#printMode=true 1/7
8/5/2020 Introduction to Pandas.ipynb - Colaboratory
We can see that dataframes allow us to name columns, which helps when finding data
across multiple columns.
Also note that we can have a custom index for the rows (other than the regular 0, 1, 2, ... index in arrays).
Here in the population dataset we have the custom indices 'A', 'B', 'C', 'D', 'E'.
Accessing Columns
We can access columns by using column names. For example, we can access state names in our
population data by calling populationData['state'] .
populationData['state']
When we need to access multiple columns, we can pass a list of the column names we need inside
the [] , which gives the nested brackets seen below.
populationData[['state', 'pop']]
Also, we can view the columns that are in the data frame by calling populationData.columns
populationData.columns
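The two access styles return different types, which the sketch below makes visible. The populationData values are assumed for illustration; only the column names 'state' and 'pop' and the labels 'A' to 'E' come from the text.

```python
import pandas as pd

# Assumed sample data; see the column names and index labels in the text.
populationData = pd.DataFrame(
    {
        "state": ["Ohio", "Ohio", "Nevada", "Nevada", "Texas"],
        "pop": [1.5, 1.7, 2.4, 2.9, 3.1],
    },
    index=["A", "B", "C", "D", "E"],
)

one_column = populationData["state"]            # single name -> Series
two_columns = populationData[["state", "pop"]]  # list of names -> DataFrame

print(type(one_column).__name__)    # Series
print(type(two_columns).__name__)   # DataFrame
print(list(populationData.columns))
```

A single column name gives back a one-dimensional Series, while a list of names gives back a smaller dataframe, even when the list holds only one name.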
Activity
You have been given a sales dataframe that includes data on the effect of multiple factors on the
sales price of a commodity.
Your task is to first determine what columns are in the dataset, and then extract 3 columns from
it: Date, price, and a factor of your choice.
salesDataframe = pd.read_csv('https://www.cs.odu.edu/~sampath/courses/f19/cs620/files/data/va
salesDataframe
Accessing Rows
There are multiple ways to retrieve rows from a dataframe.
Since pandas allows us to set a custom index, we can use either the custom index or the regular
index (i.e., 0, 1, 2, ...) to access data.
populationData.index
When we haven't specified a custom index for our dataframe, its index is simply the regular
index (0, 1, 2, ...).
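To see the difference between the two cases, the sketch below builds one dataframe without an index and one with custom labels. Both dataframes and their values are assumptions for illustration.

```python
import pandas as pd

# No index given: pandas falls back to the regular 0, 1, 2, ... index.
default_df = pd.DataFrame({"pop": [1.5, 1.7, 2.4]})
print(default_df.index)

# Custom labels passed through the index parameter.
custom_df = pd.DataFrame({"pop": [1.5, 1.7, 2.4]}, index=["A", "B", "C"])
print(custom_df.index)
```

In the first case the index is a RangeIndex, so the labels used by loc and the positions used by iloc happen to coincide.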
head
Through head(n) we can get the first n rows of the dataframe.
populationData.head(3)
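One detail worth knowing is that head() can also be called without an argument, in which case pandas returns the first 5 rows. The sketch below uses an assumed 7-row dataframe to show both calls.

```python
import pandas as pd

# Assumed 7-row dataframe, large enough to show the default of 5 rows.
numbers = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60, 70]})

first_three = numbers.head(3)  # explicit n
first_five = numbers.head()    # n defaults to 5

print(len(first_three), len(first_five))
```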
loc operation
We can use the custom index we have on the dataframe (i.e., 'A', 'B', 'C', 'D', 'E' in our
population dataframe) to retrieve rows. We can do that by using loc on our dataframe.
populationData.loc['A']
To select multiple rows, we can pass a list of index labels, similar to the way we accessed
multiple columns.
populationData.loc[['A', 'C']]
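The sketch below shows what the two loc calls return and adds a third form, loc[row, column], which picks out a single cell. The populationData values are assumed for illustration.

```python
import pandas as pd

# Assumed sample data with the custom labels used in the text.
populationData = pd.DataFrame(
    {
        "state": ["Ohio", "Ohio", "Nevada", "Nevada", "Texas"],
        "pop": [1.5, 1.7, 2.4, 2.9, 3.1],
    },
    index=["A", "B", "C", "D", "E"],
)

row = populationData.loc["A"]            # one label -> Series for that row
rows = populationData.loc[["A", "C"]]    # list of labels -> DataFrame
value = populationData.loc["C", "pop"]   # row label and column name -> one cell

print(row["state"], list(rows.index), value)
```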
iloc operation
We can also use the regular index (i.e., 0, 1, 2, ...) to retrieve data. For that we use the iloc
operation, which works like the loc operation we used previously but takes integer positions.
populationData.iloc[1]
populationData.iloc[[1, 3]]
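To compare iloc with loc side by side, the sketch below retrieves the same rows both ways and also shows how the two treat slices. The populationData values are assumed, and slicing is an addition not covered in the text above.

```python
import pandas as pd

# Assumed sample data with the custom labels used in the text.
populationData = pd.DataFrame(
    {
        "state": ["Ohio", "Ohio", "Nevada", "Nevada", "Texas"],
        "pop": [1.5, 1.7, 2.4, 2.9, 3.1],
    },
    index=["A", "B", "C", "D", "E"],
)

# Position 1 is the row labelled 'B', so both calls return the same row.
by_position = populationData.iloc[1]
by_label = populationData.loc["B"]
print(by_position.equals(by_label))

# Slices differ: iloc excludes the end position, loc includes the end label.
position_slice = populationData.iloc[1:3]   # positions 1 and 2
label_slice = populationData.loc["B":"C"]   # labels 'B' through 'C'
print(list(position_slice.index), list(label_slice.index))
```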
Activity
Consider the sales dataset we have. Retrieve a set of rows of your choice using the head() , iloc
and loc operations.
Creating Dataframes
To create dataframes we use the pd.DataFrame() function along with the data for the
dataframe.
We pass pd.DataFrame a dictionary that maps each column name to the list of values for that
column.
Let's create the temperature dataset from the NumPy exercise. Here we have 2 arrays for inside and
outside temperature readings.
data = {'inside' : [166, 108, 229, 194, 266, 102, 235, 188, 183, 129],
'outside' : [251, 238, 236, 161, 108, 291, 121, 183, 137, 133]}
temperatureDataframe = pd.DataFrame(data)
temperatureDataframe
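As a quick check on the result, the sketch below rebuilds the same dataframe and inspects its shape. It also illustrates one constraint of the dictionary form: every column needs the same number of values. The mismatched dictionary is a made-up example.

```python
import pandas as pd

data = {'inside': [166, 108, 229, 194, 266, 102, 235, 188, 183, 129],
        'outside': [251, 238, 236, 161, 108, 291, 121, 183, 137, 133]}
temperatureDataframe = pd.DataFrame(data)

# 10 readings per column, 2 columns.
print(temperatureDataframe.shape)          # (10, 2)
print(list(temperatureDataframe.columns))  # ['inside', 'outside']

# Hypothetical mismatched input: columns of different lengths are rejected.
try:
    pd.DataFrame({'inside': [166, 108], 'outside': [251]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```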
Activity
Your task is to create a dataframe that has both acceleration data and temperature data. You may
choose the column names.