Python Pandas-Data Frames

Uploaded by

Pari Khanuja
Data Frames

Dataframe

A DataFrame is a two-dimensional labelled data structure, like a table in MySQL.
It contains rows and columns, and therefore has both a row index and a column index.
Creation of DataFrame
(A) Creation of an empty DataFrame
>>> import pandas as pd
>>> dFrameEmt = pd.DataFrame()
>>> dFrameEmt
Empty DataFrame
Columns: []
Index: []
Creation of DataFrame
import pandas as pd
a1=[1,2,3,4]
a2=[10,20,30,40]
a3=[20,45,67,8]
x=pd.DataFrame([a1,a2,a3],columns=['A','B','C','D'])
print(x)
(B) Creation of DataFrame from NumPy ndarrays
>>> import numpy as np
>>> array1 = np.array([10,20,30])
>>> array2 = np.array([100,200,300])
>>> array3 = np.array([-10,-20,-30, -40])
>>> dFrame4 = pd.DataFrame(array1)
>>> dFrame4
>>> dFrame5 = pd.DataFrame([array1, array3, array2], columns=['A', 'B', 'C', 'D'])
>>> dFrame5
     A    B    C     D
0   10   20   30   NaN
1  -10  -20  -30 -40.0
2  100  200  300   NaN
(C) Creation of DataFrame from List of Dictionaries
# Create list of dictionaries
>>> listDict = [{'a':10, 'b':20}, {'a':5, 'b':10, 'c':20}]
>>> dFrameListDict = pd.DataFrame(listDict)
>>> dFrameListDict
    a   b     c
0  10  20   NaN
1   5  10  20.0
Here, the dictionary keys are taken as column labels, and the values corresponding to each key are taken as rows. There will be as many rows as the number of dictionaries present in the list. The number of columns in the DataFrame is equal to the number of distinct keys across all the dictionaries in the list; a key missing from a dictionary is filled with NaN in that row.
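The relationship between dictionary keys and columns can be checked with a small sketch. The `records` list below is hypothetical and is chosen so that no single dictionary holds all the keys.

```python
import pandas as pd

# Hypothetical records: neither dictionary contains all three keys.
records = [{'a': 10, 'b': 20}, {'b': 5, 'c': 15}]
df = pd.DataFrame(records)

# One row per dictionary; one column per distinct key.
print(df.shape)           # (2, 3)
print(list(df.columns))   # ['a', 'b', 'c']

# Keys missing from a dictionary appear as NaN in that row.
missing_per_column = df.isna().sum().to_dict()
print(missing_per_column)  # {'a': 1, 'b': 0, 'c': 1}
```

Each dictionary has only two keys, yet the DataFrame has three columns, because the columns are built from every key that appears anywhere in the list.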
(D) Creation of DataFrame from Dictionary of Lists
>>> dictForest = {'State': ['Assam', 'Delhi', 'Kerala'], 'GArea': [78438, 1483, 38852], 'VDF': [2797, 6.72, 1663]}
>>> dFrameForest = pd.DataFrame(dictForest)
>>> dFrameForest
    State  GArea      VDF
0   Assam  78438  2797.00
1   Delhi   1483     6.72
2  Kerala  38852  1663.00
We can change the sequence of columns in a DataFrame.
>>> dFrameForest1 = pd.DataFrame(dictForest, columns=['State', 'VDF', 'GArea'])
>>> dFrameForest1
    State      VDF  GArea
0   Assam  2797.00  78438
1   Delhi     6.72   1483
2  Kerala  1663.00  38852
(E) Creation of DataFrame from Series
seriesA = pd.Series([1,2,3,4,5],index = ['a', 'b', 'c', 'd', 'e'])
seriesB = pd.Series ([1000,2000,-1000,-5000,1000], index = ['a', 'b', 'c', 'd',
'e'])
seriesC = pd.Series([10,20,-10,-50,100], index = ['z', 'y', 'a', 'c', 'e'])
>>> dFrame6 = pd.DataFrame(seriesA)
>>> dFrame6
   0
a  1
b  2
c  3
d  4
e  5
>>> dFrame7 = pd.DataFrame([seriesA, seriesB])
>>> dFrame7
      a     b     c     d     e
0     1     2     3     4     5
1  1000  2000 -1000 -5000  1000
>>> dFrame8 = pd.DataFrame([seriesA, seriesC])
>>> dFrame8
      a    b     c    d      e     z     y
0   1.0  2.0   3.0  4.0    5.0   NaN   NaN
1 -10.0  NaN -50.0  NaN  100.0  10.0  20.0
(F) Creation of DataFrame from Dictionary of Series
>>> ResultSheet={'Arnab': pd.Series([90, 91,
97],index=['Maths','Science','Hindi']),
'Ramit': pd.Series([92, 81, 96], index=['Maths','Science','Hindi']),
'Samridhi': pd.Series([89, 91, 88], index=['Maths','Science','Hindi']),
'Riya': pd.Series([81, 71, 67], index=['Maths','Science','Hindi']),
'Mallika': pd.Series([94, 95, 99], index=['Maths','Science','Hindi']) }
>>> ResultDF = pd.DataFrame(ResultSheet)
>>> ResultDF
         Arnab  Ramit  Samridhi  Riya  Mallika
Maths       90     92        89    81       94
Science     91     81        91    71       95
Hindi       97     96        88    67       99

When a DataFrame is created from a dictionary of Series, the resulting index (the row labels) is the union of all the Series indexes used to create the DataFrame.
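The union of indexes is easiest to see when the Series do not share all their labels. The following is a minimal sketch with hypothetical marks, in which the two Series have only 'Maths' in common.

```python
import pandas as pd

# Hypothetical marks; the two Series share only the 'Maths' label.
marks = {
    'Arnab': pd.Series([90, 91], index=['Maths', 'Science']),
    'Ramit': pd.Series([92, 96], index=['Maths', 'Hindi']),
}
df = pd.DataFrame(marks)

# The row labels are the union of both indexes.
print(sorted(df.index))            # ['Hindi', 'Maths', 'Science']

# A label absent from one Series is filled with NaN in that column.
print(df.loc['Hindi', 'Arnab'])    # nan
print(df.loc['Maths', 'Ramit'])    # 92.0
```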
Operations on rows and columns in DataFrames
(A) Adding a New Column to a DataFrame
>>> ResultDF['Preeti']=[89,78,76]
>>> ResultDF
Assigning values to a column label that does not yet exist creates a new column at the end of the DataFrame.
If the column label already exists, the assignment replaces the values of that column instead of adding a new one:
>>> ResultDF['Ramit']=[99, 98, 78]
>>> ResultDF
>>> ResultDF['Arnab']=90 # assigns the same value to every row of the column
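The three kinds of column assignment can be compared side by side. This is a minimal sketch on a reduced, two-student version of the marks table, with `df` as a stand-in name.

```python
import pandas as pd

df = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                  index=['Maths', 'Science', 'Hindi'])

df['Preeti'] = [89, 78, 76]   # new label: the column is added at the end
df['Ramit'] = [99, 98, 78]    # existing label: the values are overwritten
df['Arnab'] = 90              # single value: repeated for every row

print(df)
```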


(B) Adding a New Row to a DataFrame
We can add a new row to a DataFrame using the DataFrame.loc[ ] indexer.
>>> ResultDF.loc['English'] = [85, 86, 83, 80, 90, 89]
>>> ResultDF
We cannot use this approach to add a second row with an index label that already exists. In that case, the row with that label is updated instead. DataFrame.loc[ ] can also be used to set every value in a row to a single value.
If we try to add a row with fewer values than the number of columns in the DataFrame, it results in a ValueError with the message: ValueError: Cannot set a row with mismatched columns.
Similarly, if we try to add a column with fewer values than the number of rows in the DataFrame, it results in a ValueError with the message: ValueError: Length of values does not match length of index.
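Both failures can be reproduced safely inside try/except blocks. This is a sketch on a hypothetical two-column frame; the exact wording and capitalisation of the messages may differ between pandas versions, so the code only collects and prints them.

```python
import pandas as pd

df = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                  index=['Maths', 'Science', 'Hindi'])

errors = []

try:
    df.loc['English'] = [85]      # two columns, but only one value
except ValueError as exc:
    errors.append(str(exc))

try:
    df['Preeti'] = [89, 78]       # three rows, but only two values
except ValueError as exc:
    errors.append(str(exc))

for message in errors:
    print(message)

# Neither failed assignment changed the DataFrame.
print(df.shape)
```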
Further, we can set all values of a DataFrame to a particular value, for example:
>>> ResultDF[:] = 0 # Set all values in ResultDF to 0
>>> ResultDF
         Arnab  Ramit  Samridhi  Riya  Mallika  Preeti
Maths        0      0         0     0        0       0
Science      0      0         0     0        0       0
Hindi        0      0         0     0        0       0
English      0      0         0     0        0       0
(C) Deleting Rows or Columns from a DataFrame
We can use the DataFrame.drop() method to delete rows and columns from a DataFrame.
We need to specify the labels to be dropped and the axis from which they are to be dropped. To delete a row, the parameter axis is assigned the value 0; to delete a column, it is assigned the value 1.
>>> ResultDF = ResultDF.drop('Science', axis=0)
>>> ResultDF = ResultDF.drop(['Samridhi','Ramit','Riya'], axis=1)
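The examples above assign the result of drop() back to ResultDF because drop() returns a new DataFrame by default and leaves the original untouched. A minimal sketch, using a hypothetical three-student frame:

```python
import pandas as pd

df = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96], 'Riya': [81, 71, 67]},
                  index=['Maths', 'Science', 'Hindi'])

without_science = df.drop('Science', axis=0)       # drop one row
only_arnab = df.drop(['Ramit', 'Riya'], axis=1)    # drop several columns

print(list(without_science.index))   # ['Maths', 'Hindi']
print(list(only_arnab.columns))      # ['Arnab']
print(df.shape)                      # (3, 3): the original is unchanged
```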
(D) Renaming Row Labels of a DataFrame
We can change the labels of rows and columns in a DataFrame using the DataFrame.rename() method.
>>> ResultDF=ResultDF.rename({'Maths':'Sub1','Science':'Sub2','English':'Sub3','Hindi':'Sub4'}, axis='index')
>>> print(ResultDF)
The parameter axis='index' specifies that the row labels are to be changed. If no new label is passed for an existing label, that row label is left as it is, for example:
>>> ResultDF=ResultDF.rename({'Maths':'Sub1','Science':'Sub2','Hindi':'Sub4'}, axis='index')
>>> print(ResultDF)
(E) Renaming Column Labels of a DataFrame
To alter the column names of ResultDF we can again use the rename() method. The parameter axis='columns' implies we want to change the column labels:
>>> ResultDF=ResultDF.rename({'Arnab':'Student1','Ramit':'Student2','Samridhi':'Student3','Mallika':'Student4'}, axis='columns')
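A partial mapping renames only the labels it mentions, on either axis. The sketch below uses a hypothetical two-student frame and stores the result under a new name, `renamed`, so that the original can be compared.

```python
import pandas as pd

df = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                  index=['Maths', 'Science', 'Hindi'])

# 'Science' and 'Ramit' are not in the mappings, so they keep their labels.
renamed = df.rename({'Maths': 'Sub1', 'Hindi': 'Sub3'}, axis='index')
renamed = renamed.rename({'Arnab': 'Student1'}, axis='columns')

print(list(renamed.index))     # ['Sub1', 'Science', 'Sub3']
print(list(renamed.columns))   # ['Student1', 'Ramit']
print(list(df.index))          # ['Maths', 'Science', 'Hindi']
```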
Accessing DataFrame Elements through Indexing
There are two ways of indexing DataFrames: label-based indexing and Boolean indexing.
(A) Label Based Indexing:- DataFrame.loc[ ] is the indexer used for label-based indexing with DataFrames.
>>> ResultDF.loc['Science']
Arnab       91
Ramit       81
Samridhi    91
Riya        71
Mallika     95
Name: Science, dtype: int64
Also, note that when the row label is passed as an integer value, it
is interpreted as a label of the index and not as an integer position
along the index, for example:
>>> dFrame10Multiples = pd.DataFrame([10,20,30,40,50])
>>> dFrame10Multiples.loc[2]
0 30
Name: 2, dtype: int64
When a single column label is passed, it returns the column
as a Series.
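The difference between an integer label and an integer position only becomes visible when the two disagree. The following is a minimal sketch with a hypothetical frame whose row labels are deliberately out of positional order.

```python
import pandas as pd

# Hypothetical frame: the integer row labels differ from the row positions.
df = pd.DataFrame({'value': [10, 20, 30]}, index=[2, 1, 0])

first = df.loc[2, 'value']    # the row labelled 2, which is the first row
print(first)                  # 10

label_missing = False
try:
    df.loc[5]                 # no row carries the label 5
except KeyError:
    label_missing = True
print(label_missing)          # True
```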
>>> ResultDF.loc[:,'Arnab'] # the marks of 'Arnab' in all the subjects, returned as a Series
>>> ResultDF.loc[['Science', 'Hindi']] # to read more than one row from a DataFrame
Accessing DataFrames Element through Indexing
(B) Boolean Indexing:- Boolean means a binary variable that can
represent either of the two states - True (indicated by 1) or False
(indicated by 0). In Boolean indexing, we can select the subsets of
data based on the actual values in the DataFrame rather than their
row/column labels.
>>> ResultDF.loc['Maths'] > 90
Arnab       False
Ramit        True
Samridhi    False
Riya        False
Mallika      True
Name: Maths, dtype: bool
>>> ResultDF.loc[:,'Arnab']>90 # to check in which subjects 'Arnab' has scored more than 90
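A comparison such as the one above only produces True/False values; to select the matching data, the Boolean result is passed back to loc. A minimal sketch on a three-student subset of the marks table:

```python
import pandas as pd

df = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96], 'Mallika': [94, 95, 99]},
                  index=['Maths', 'Science', 'Hindi'])

mask = df.loc[:, 'Arnab'] > 90     # Boolean Series indexed by subject
print(mask.tolist())               # [False, True, True]

# Passing the mask to loc keeps only the rows where it is True.
selected = df.loc[mask]
print(list(selected.index))        # ['Science', 'Hindi']
```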
Accessing DataFrame Elements through Slicing
>>> ResultDF.loc['Maths': 'Science']
>>> ResultDF.loc['Maths': 'Science', 'Arnab']
Maths      90
Science    91
Name: Arnab, dtype: int64
>>> ResultDF.loc['Maths': 'Science', 'Arnab':'Samridhi']
>>> ResultDF.loc['Maths': 'Science', ['Arnab','Samridhi']]
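Unlike ordinary Python slices, label slices with loc include the end label. This is why the 'Maths':'Science' slice above returns both rows. A minimal sketch on a three-student subset:

```python
import pandas as pd

df = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96], 'Samridhi': [89, 91, 88]},
                  index=['Maths', 'Science', 'Hindi'])

rows = df.loc['Maths':'Science']                       # both end labels are included
block = df.loc['Maths':'Science', 'Arnab':'Ramit']     # slice rows and columns together

print(list(rows.index))      # ['Maths', 'Science']
print(list(block.columns))   # ['Arnab', 'Ramit']
print(block.shape)           # (2, 2)
```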
Filtering Rows in DataFrames
In DataFrames, Boolean values like True (1) and False (0) can be associated with indices. They can also be used to filter the records using the DataFrame.loc[ ] indexer.
In order to select or omit particular row(s), we can use a Boolean list specifying True for the rows to be shown and False for the ones to be omitted in the output.
>>> ResultDF.loc[[True, False, True]] # the row with index label Science is omitted
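The Boolean list needs one entry per row. The sketch below, on a hypothetical three-row frame, shows a valid filter and then a list that is too short, which pandas rejects.

```python
import pandas as pd

df = pd.DataFrame({'Arnab': [90, 91, 97], 'Ramit': [92, 81, 96]},
                  index=['Maths', 'Science', 'Hindi'])

kept = df.loc[[True, False, True]]
print(list(kept.index))            # ['Maths', 'Hindi']

rejected = False
try:
    df.loc[[True, False]]          # two flags for three rows
except IndexError as exc:
    rejected = True
    print('rejected:', exc)
```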
Joining, Merging and Concatenation of DataFrames

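We can use the pandas.DataFrame.append() method to merge two DataFrames. It appends the rows of the second DataFrame at the end of the first DataFrame. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0; in current versions, pandas.concat() does the same job.
# append dFrame1 to dFrame2
>>> dFrame2 = dFrame2.append(dFrame1, sort=True) # column labels are sorted
>>> dFrame2 = dFrame2.append(dFrame1, sort=False) # column labels are left unsorted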
The parameter verify_integrity of the append() method may be set to True when we want to raise an error if the row labels are duplicated. By default, verify_integrity=False.
The parameter ignore_index of the append() method may be set to True when we do not want to use the row index labels. By default, ignore_index=False.
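DataFrame.append() is not available in pandas 2.0 and later. Below is a minimal sketch of the same operations using pandas.concat(), which accepts the same sort, ignore_index and verify_integrity parameters. The contents of dFrame1 and dFrame2 are hypothetical, since the slides do not define them.

```python
import pandas as pd

# Hypothetical frames sharing the column 'C2' and the row label 'R2'.
dFrame1 = pd.DataFrame({'C1': [1, 2], 'C2': [3, 4]}, index=['R1', 'R2'])
dFrame2 = pd.DataFrame({'C2': [10, 20], 'C5': [30, 40]}, index=['R4', 'R2'])

# Rows of dFrame1 are placed after the rows of dFrame2; columns are the union.
stacked = pd.concat([dFrame2, dFrame1], sort=True)
print(list(stacked.index))      # ['R4', 'R2', 'R1', 'R2']
print(list(stacked.columns))    # ['C1', 'C2', 'C5']

# ignore_index=True discards the row labels and numbers the rows from 0.
renumbered = pd.concat([dFrame2, dFrame1], ignore_index=True)
print(list(renumbered.index))   # [0, 1, 2, 3]

# verify_integrity=True refuses the duplicate row label 'R2'.
duplicate_rejected = False
try:
    pd.concat([dFrame2, dFrame1], verify_integrity=True)
except ValueError as exc:
    duplicate_rejected = True
    print('duplicate labels:', exc)
```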
Attributes of DataFrames
Consider the following dictionary of Series:
>>> ForestArea = {
'Assam': pd.Series([78438, 2797, 10192, 15116], index=['GeoArea', 'VeryDense', 'ModeratelyDense', 'OpenForest']),
'Kerala': pd.Series([38852, 1663, 9407, 9251], index=['GeoArea', 'VeryDense', 'ModeratelyDense', 'OpenForest']),
'Delhi': pd.Series([1483, 6.72, 56.24, 129.45], index=['GeoArea', 'VeryDense', 'ModeratelyDense', 'OpenForest'])}
>>> ForestAreaDF = pd.DataFrame(ForestArea)
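A few commonly used DataFrame attributes can be inspected on ForestAreaDF as follows. The selection of attributes here (index, columns, shape, size, dtypes, T) is an assumption and is not taken from the slides; the `rows` helper list is also introduced only for this sketch.

```python
import pandas as pd

rows = ['GeoArea', 'VeryDense', 'ModeratelyDense', 'OpenForest']
ForestAreaDF = pd.DataFrame({
    'Assam': pd.Series([78438, 2797, 10192, 15116], index=rows),
    'Kerala': pd.Series([38852, 1663, 9407, 9251], index=rows),
    'Delhi': pd.Series([1483, 6.72, 56.24, 129.45], index=rows),
})

print(list(ForestAreaDF.index))     # row labels
print(list(ForestAreaDF.columns))   # ['Assam', 'Kerala', 'Delhi']
print(ForestAreaDF.shape)           # (4, 3): rows, columns
print(ForestAreaDF.size)            # 12: total number of values
print(ForestAreaDF.dtypes)          # data type of each column
print(ForestAreaDF.T.shape)         # (3, 4): transpose swaps rows and columns
```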


Importing and Exporting Data between CSV Files and DataFrames
(For Practicals only)
Importing a CSV file to a DataFrame
>>> marks = pd.read_csv("C:/NCERT/ResultData.csv",sep =",", header=0)
★ The first parameter to read_csv() is the name of the comma-separated data file along with its path.
★ The parameter sep specifies whether the values are separated by a comma, semicolon, tab, or any other character. The default value for sep is a comma.
★ The parameter header specifies the number of the row whose values are to be used as the column names. header=0 implies that column names are taken from the first line of the file. By default, header='infer', which behaves like header=0 when the names parameter is not given.
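The effect of sep and header can be tried without a file on disk. In this sketch, io.StringIO stands in for the file path, and the file contents are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file contents; io.StringIO stands in for a path on disk.
comma_text = "RollNo,Name,Marks\n1,Arnab,90\n2,Ramit,92\n"
semicolon_text = "RollNo;Name;Marks\n1;Arnab;90\n2;Ramit;92\n"

# With no sep given, values are split on commas.
default_sep = pd.read_csv(io.StringIO(comma_text))

# A different separator has to be named explicitly.
explicit_sep = pd.read_csv(io.StringIO(semicolon_text), sep=';', header=0)

print(list(default_sep.columns))          # ['RollNo', 'Name', 'Marks']
print(default_sep.equals(explicit_sep))   # True
```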
★ We can explicitly specify column names using the parameter names while creating the DataFrame using the read_csv() function.
>>> marks1 = pd.read_csv("C:/NCERT/ResultData1.csv", sep=",", names=['RNo','StudentName','Sub1','Sub2'])
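When names is given and header is not, read_csv() treats the first line of the file as data rather than as column names. A minimal sketch, again with io.StringIO standing in for the file and with hypothetical contents:

```python
import io
import pandas as pd

# Hypothetical file with no header line.
no_header_text = "1,Arnab,90,85\n2,Ramit,92,81\n"

marks1 = pd.read_csv(io.StringIO(no_header_text), sep=",",
                     names=['RNo', 'StudentName', 'Sub1', 'Sub2'])

print(list(marks1.columns))   # ['RNo', 'StudentName', 'Sub1', 'Sub2']
print(len(marks1))            # 2: the first line was read as a data row
```

If the file does contain a header line, passing header=0 together with names replaces the labels from that line instead of reading it as data.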
Exporting a DataFrame to a CSV file
We can use the to_csv() method to save a DataFrame to a text or CSV file.
>>> ResultDF.to_csv(path_or_buf='C:/NCERT/resultout.csv', sep=',')
➔ This creates a file by the name resultout.csv in the folder C:/NCERT
on the hard disk.
➔ In case we do not want the column names to be saved to the file we
may use the parameter header=False.
➔ Another parameter index=False is used when we do not want the row
labels to be written to the file on disk.
>>> ResultDF.to_csv('C:/NCERT/resultonly.txt', sep='@', header=False, index=False)
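The effect of header and index can be previewed without writing a file: when no path is passed, to_csv() returns the text it would have written. A minimal sketch on a hypothetical two-student frame:

```python
import pandas as pd

df = pd.DataFrame({'Arnab': [90, 91], 'Ramit': [92, 81]}, index=['Maths', 'Science'])

# With no path given, to_csv returns the CSV text as a string.
full = df.to_csv(sep=',')
bare = df.to_csv(sep='@', header=False, index=False)

print(full.splitlines())   # [',Arnab,Ramit', 'Maths,90,92', 'Science,91,81']
print(bare.splitlines())   # ['90@92', '91@81']
```

In the first output the header line starts with an empty field, which is the unnamed column holding the row labels.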
