Python Pandas-Data Frames
DataFrame
A DataFrame is a two-dimensional labelled data structure, similar to a table in MySQL.
It contains rows and columns, and therefore has both a row index and a column index.
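The two indexes can be inspected directly. The following is a minimal sketch; the data, the row labels r1 to r3 and the column labels x and y are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only to show the two indexes
df = pd.DataFrame(
    [[1, 2], [3, 4], [5, 6]],
    index=['r1', 'r2', 'r3'],   # row labels (row index)
    columns=['x', 'y'],         # column labels (column index)
)

row_labels = list(df.index)
col_labels = list(df.columns)
n_rows, n_cols = df.shape       # (number of rows, number of columns)
print(row_labels, col_labels, n_rows, n_cols)
```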
Creation of DataFrame
(A) Creation of an empty DataFrame
>>> import pandas as pd
>>> dFrameEmt = pd.DataFrame()
>>> dFrameEmt
Empty DataFrame
Columns: []
Index: []
A DataFrame can also be created from a list of lists; each inner list becomes one row:
import pandas as pd
a1=[1,2,3,4]
a2=[10,20,30,40]
a3=[20,45,67,8]
x=pd.DataFrame([a1,a2,a3],columns=['A','B','C','D'])
print(x)
(B) Creation of DataFrame from NumPy ndarrays
>>> import numpy as np
>>> array1 = np.array([10,20,30])
>>> array2 = np.array([100,200,300])
>>> array3 = np.array([-10,-20,-30, -40])
>>> dFrame4 = pd.DataFrame(array1)
>>> dFrame4
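As a rough illustration of how array shape maps onto the DataFrame: a one-dimensional array becomes a single column labelled 0, while each row of a two-dimensional array becomes one DataFrame row. The names grid and dFrameGrid below are hypothetical and not part of the original examples.

```python
import numpy as np
import pandas as pd

array1 = np.array([10, 20, 30])
dFrame4 = pd.DataFrame(array1)     # 1-D array: one column, labelled 0

# Hypothetical 2-D array: every row of the array becomes one DataFrame row
grid = np.array([[1, 2, 3], [4, 5, 6]])
dFrameGrid = pd.DataFrame(grid, columns=['A', 'B', 'C'])

print(dFrame4.shape)       # (3, 1)
print(dFrameGrid.shape)    # (2, 3)
```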
>>> dFrame5 = pd.DataFrame([array1, array3, array2], columns=['A', 'B', 'C', 'D'])
>>> dFrame5
     A    B    C     D
0   10   20   30   NaN
1  -10  -20  -30 -40.0
2  100  200  300   NaN
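The NaN entries arise because two of the arrays have only three elements while four column labels are given. A small sketch that checks which rows lack a value for column 'D'; the variable name missing_in_D is an illustrative choice.

```python
import numpy as np
import pandas as pd

array1 = np.array([10, 20, 30])
array2 = np.array([100, 200, 300])
array3 = np.array([-10, -20, -30, -40])
dFrame5 = pd.DataFrame([array1, array3, array2], columns=['A', 'B', 'C', 'D'])

# Rows built from the 3-element arrays have no value for column 'D'
missing_in_D = dFrame5['D'].isna().tolist()
print(missing_in_D)
```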
(C) Creation of DataFrame from List of Dictionaries
# Create list of dictionaries
>>> listDict = [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20}]
>>> dFrameListDict = pd.DataFrame(listDict)
>>> dFrameListDict
    a   b     c
0  10  20   NaN
1   5  10  20.0
Here, the dictionary keys are taken as column labels, and the values corresponding to each
key are taken as rows. There will be as many rows as the number of dictionaries present in the list.
The number of columns in the DataFrame is equal to the number of distinct keys across all the
dictionaries of the list. A key that is missing from a dictionary gives NaN in that row.
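The row and column counts can be checked as follows. This is a sketch only; the second list, disjointDicts, is a hypothetical example added to show what happens when the dictionaries share no keys.

```python
import pandas as pd

listDict = [{'a': 10, 'b': 20}, {'a': 5, 'b': 10, 'c': 20}]
dFrameListDict = pd.DataFrame(listDict)

n_rows, n_cols = dFrameListDict.shape            # one row per dictionary
columns = list(dFrameListDict.columns)           # keys seen anywhere in the list
missing_c = dFrameListDict['c'].isna().tolist()  # True where a dictionary had no 'c'

# Hypothetical list whose dictionaries share no keys:
# each has 2 keys, yet the DataFrame gets 4 columns
disjointDicts = [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}]
dFrameDisjoint = pd.DataFrame(disjointDicts)
print(n_rows, n_cols, columns, missing_c, dFrameDisjoint.shape)
```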
(D) Creation of DataFrame from Dictionary of Lists
>>> dictForest = {'State': ['Assam', 'Delhi', 'Kerala'], 'GArea': [78438, 1483, 38852], 'VDF': [2797, 6.72, 1663]}
>>> dFrameForest = pd.DataFrame(dictForest)
>>> dFrameForest
    State  GArea      VDF
0   Assam  78438  2797.00
1   Delhi   1483     6.72
2  Kerala  38852  1663.00
We can change the sequence of columns in a DataFrame.
>>> dFrameForest1 = pd.DataFrame(dictForest, columns=['State', 'VDF', 'GArea'])
>>> dFrameForest1
    State      VDF  GArea
0   Assam  2797.00  78438
1   Delhi     6.72   1483
2  Kerala  1663.00  38852
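The columns argument does more than reorder: it also selects which keys of the dictionary are used, and a label that is not a key produces a column of NaN. A minimal sketch follows; the label 'Total' and the names dFrameSubset and dFrameExtra are hypothetical.

```python
import pandas as pd

dictForest = {'State': ['Assam', 'Delhi', 'Kerala'],
              'GArea': [78438, 1483, 38852],
              'VDF': [2797, 6.72, 1663]}

# Only the listed keys are kept, in the order given
dFrameSubset = pd.DataFrame(dictForest, columns=['VDF', 'State'])

# 'Total' (hypothetical) is not a key of dictForest, so the column is all NaN
dFrameExtra = pd.DataFrame(dictForest, columns=['State', 'Total'])

print(list(dFrameSubset.columns))
print(dFrameExtra['Total'].isna().all())
```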
(E) Creation of DataFrame from Series
>>> seriesA = pd.Series([1,2,3,4,5], index=['a', 'b', 'c', 'd', 'e'])
>>> seriesB = pd.Series([1000,2000,-1000,-5000,1000], index=['a', 'b', 'c', 'd', 'e'])
>>> seriesC = pd.Series([10,20,-10,-50,100], index=['z', 'y', 'a', 'c', 'e'])
>>> dFrame6 = pd.DataFrame(seriesA)
>>> dFrame6
   0
a  1
b  2
c  3
d  4
e  5
>>> dFrame7 = pd.DataFrame([seriesA, seriesB])
>>> dFrame7
      a     b     c     d     e
0     1     2     3     4     5
1  1000  2000 -1000 -5000  1000
>>> dFrame8 = pd.DataFrame([seriesA, seriesC])
>>> dFrame8
      a    b     c    d      e     z     y
0   1.0  2.0   3.0  4.0    5.0   NaN   NaN
1 -10.0  NaN -50.0  NaN  100.0  10.0  20.0
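When the Series do not share the same labels, the columns of the DataFrame are the union of all the Series labels, and values are placed by label rather than by position. A small sketch that checks this for seriesA and seriesC; the helper names all_labels, value_a and gap_z are illustrative.

```python
import pandas as pd

seriesA = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
seriesC = pd.Series([10, 20, -10, -50, 100], index=['z', 'y', 'a', 'c', 'e'])
dFrame8 = pd.DataFrame([seriesA, seriesC])

all_labels = set(dFrame8.columns)       # union of the labels of both Series
value_a = dFrame8.loc[1, 'a']           # seriesC's value for label 'a'
gap_z = pd.isna(dFrame8.loc[0, 'z'])    # seriesA has no label 'z'
print(sorted(all_labels), value_a, gap_z)
```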
(F) Creation of DataFrame from Dictionary of Series
>>> ResultSheet = {
    'Arnab': pd.Series([90, 91, 97], index=['Maths','Science','Hindi']),
    'Ramit': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']),
    'Samridhi': pd.Series([89, 91, 88], index=['Maths','Science','Hindi']),
    'Riya': pd.Series([81, 71, 67], index=['Maths','Science','Hindi']),
    'Mallika': pd.Series([94, 95, 99], index=['Maths','Science','Hindi'])}
>>> ResultDF = pd.DataFrame(ResultSheet)
>>> ResultDF
         Arnab  Ramit  Samridhi  Riya  Mallika
Maths       90     92        89    81       94
Science     91     81        91    71       95
Hindi       97     96        88    67       99
>>> ResultDF.rename({'Arnab':'Student1','Ramit':'Student2','Samridhi':'Student3','Mallika':'Student4'}, axis='columns')   # returns a renamed copy; ResultDF keeps its original labels for the examples that follow
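Note that rename() returns a new DataFrame and leaves the original unchanged unless the result is assigned back. A minimal sketch with two of the students; the names subjects and renamed are illustrative.

```python
import pandas as pd

subjects = ['Maths', 'Science', 'Hindi']
ResultDF = pd.DataFrame({
    'Arnab': pd.Series([90, 91, 97], index=subjects),
    'Ramit': pd.Series([92, 81, 96], index=subjects),
})

# rename() builds a new DataFrame; ResultDF itself is not modified
renamed = ResultDF.rename({'Arnab': 'Student1', 'Ramit': 'Student2'}, axis='columns')

print(list(ResultDF.columns))   # original labels
print(list(renamed.columns))    # new labels
```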
Accessing DataFrame Elements through Indexing
There are two ways of indexing DataFrames: label based indexing and Boolean indexing.
(A) Label Based Indexing:- DataFrame.loc[] is the indexer used for label based indexing
with DataFrames. When a single row label is passed, it returns that row as a Series.
>>> ResultDF.loc['Science']
Arnab       91
Ramit       81
Samridhi    91
Riya        71
Mallika     95
Name: Science, dtype: int64
Also, note that when the row label is passed as an integer value, it
is interpreted as a label of the index and not as an integer position
along the index, for example:
>>> dFrame10Multiples = pd.DataFrame([10,20,30,40,50])
>>> dFrame10Multiples.loc[2]
0    30
Name: 2, dtype: int64
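The difference between a label and a position only becomes visible when the integer labels are not in the default order 0, 1, 2, and so on. The sketch below uses a hypothetical DataFrame, dFrameLabels, whose row labels are deliberately reversed.

```python
import pandas as pd

# Hypothetical frame whose integer row labels are not in positional order
dFrameLabels = pd.DataFrame([10, 20, 30], index=[2, 1, 0])

by_label = dFrameLabels.loc[2, 0]                 # row labelled 2, column labelled 0
third_by_position = dFrameLabels[0].tolist()[2]   # third value in storage order

print(by_label)            # 10, because the row labelled 2 is stored first
print(third_by_position)   # 30
```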
When a single column label is passed, it returns the column
as a Series.
>>> ResultDF.loc[:,'Arnab']   # the marks of 'Arnab' in all the subjects, returned as a Series
>>> ResultDF.loc[['Science', 'Hindi']]   # pass a list of row labels to read more than one row from a DataFrame
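The type of the result depends on what is passed: a single label gives a Series, while a list of labels gives a DataFrame, even when the list has only one entry. A short sketch, using a compact two-student stand-in for ResultDF built from a dictionary of lists:

```python
import pandas as pd

# Compact stand-in for ResultDF with two of the students
ResultDF = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                        index=['Maths', 'Science', 'Hindi'])

one_column = ResultDF.loc[:, 'Arnab']           # single column label
two_rows = ResultDF.loc[['Science', 'Hindi']]   # list of row labels
one_row_frame = ResultDF.loc[['Science']]       # one-element list

print(type(one_column).__name__)      # Series
print(type(two_rows).__name__)        # DataFrame
print(type(one_row_frame).__name__)   # DataFrame
```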
Accessing DataFrames Element through Indexing
(B) Boolean Indexing:- A Boolean value represents one of two states, True (indicated by 1)
or False (indicated by 0). In Boolean indexing, we select subsets of data based on the
actual values in the DataFrame rather than their row/column labels.
>>> ResultDF.loc['Maths'] > 90
Arnab       False
Ramit        True
Samridhi    False
Riya        False
Mallika      True
Name: Maths, dtype: bool
>>> ResultDF.loc[:,'Arnab'] > 90   # to check in which subjects 'Arnab' has scored more than 90
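A comparison like this yields a Boolean Series, which can in turn be passed to loc[] to keep only the rows where the condition holds. A minimal sketch, using a compact two-student stand-in for ResultDF; the names mask and above_90 are illustrative.

```python
import pandas as pd

# Compact stand-in for ResultDF with two of the students
ResultDF = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                        index=['Maths', 'Science', 'Hindi'])

mask = ResultDF.loc[:, 'Arnab'] > 90   # Boolean Series, one value per subject
above_90 = ResultDF.loc[mask]          # keep only the rows where mask is True

print(mask.tolist())
print(list(above_90.index))
```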
Accessing DataFrame Elements through Slicing
>>> ResultDF.loc['Maths':'Science']
>>> ResultDF.loc['Maths':'Science', 'Arnab']
Maths      90
Science    91
Name: Arnab, dtype: int64
>>> ResultDF.loc['Maths':'Science', 'Arnab':'Samridhi']
>>> ResultDF.loc['Maths':'Science', ['Arnab','Samridhi']]
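Unlike ordinary Python slices, a label slice with loc[] includes both the start and the end label. The following sketch contrasts the two, again using a compact two-student stand-in for ResultDF; the names rows, block and list_slice are illustrative.

```python
import pandas as pd

# Compact stand-in for ResultDF with two of the students
ResultDF = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                        index=['Maths', 'Science', 'Hindi'])

rows = ResultDF.loc['Maths':'Science']                   # both end labels included
block = ResultDF.loc['Maths':'Science', 'Arnab':'Ramit']  # rows and columns sliced

subjects = ['Maths', 'Science', 'Hindi']
list_slice = subjects[0:1]                               # plain Python slice excludes the end

print(list(rows.index))   # ['Maths', 'Science']
print(block.shape)        # (2, 2)
print(list_slice)         # ['Maths']
```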
Filtering Rows in DataFrames
In DataFrames, Boolean values like True (1) and False (0) can be
associated with indices. They can also be used to filter the records
using the DataFrame.loc[] indexer.
In order to select or omit particular row(s), we can use a Boolean list
specifying True for the rows to be shown and False for the ones to be
omitted in the output. The list must contain one value per row.
>>> ResultDF.loc[[True, False, True]]   # the row with index Science is omitted
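The Boolean list does not have to be typed by hand; it can be computed from the row labels. A small sketch, assuming a compact two-student stand-in for ResultDF; the names kept, mask and same_rows are illustrative.

```python
import pandas as pd

# Compact stand-in for ResultDF with two of the students
ResultDF = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                        index=['Maths', 'Science', 'Hindi'])

kept = ResultDF.loc[[True, False, True]]   # hand-written list, one value per row

# The same list built from the row labels: False only for 'Science'
mask = [label != 'Science' for label in ResultDF.index]
same_rows = ResultDF.loc[mask]

print(list(kept.index))
print(kept.equals(same_rows))
```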
Joining, Merging and Concatenation of DataFrames