Class XII (As Per CBSE Board) : Informatics Practices
Chapter 1
Data Handling
using Pandas -1
Visit : for regular updates
2. DataFrame
DataFrame is like a two-dimensional array with
heterogeneous data.
Sr. No.  Admn No.  Student Name   Class  Section  Gender  Date of Birth
1        001284    NIDHI MANDAL   I      A        Girl    07/08/2010
2        001285    SOUMYADIP      I      A        Boy     24/02/2011
3        001286    SHREYAANG      I      A        Boy     29/12/2010
Basic features of a DataFrame are:
Heterogeneous data
Size Mutable
Data Mutable
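The three features listed above can be seen in a short sketch. The column names and the added RollNo values are assumptions made for illustration; only the student names come from the table above.

```python
import pandas as pd

# Illustrative records loosely modelled on the table above
df = pd.DataFrame({
    "Name": ["NIDHI MANDAL", "SOUMYADIP", "SHREYAANG"],
    "Class": ["I", "I", "I"],
    "Gender": ["Girl", "Boy", "Boy"],
})

# Size mutable: a new (numeric) column is added next to the text columns
df["RollNo"] = [1, 2, 3]  # hypothetical values

# Data mutable: a single value is changed in place
df.loc[0, "Class"] = "II"

print(df.dtypes)  # heterogeneous data: text columns and an integer column
print(df.shape)
```
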
Pandas Series
A Series is like a one-dimensional array capable of holding data
of any type (integer, string, float, Python objects, etc.).
A Series can be created using the constructor.
Syntax :- pandas.Series(data, index, dtype, copy)
A Series can also be created from an ndarray, a
dictionary, or a scalar value.
Series can be created using
1. Array
2. Dict
3. Scalar value or constant
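The three creation routes listed above might look like the following sketch. The variable names, the scalar value 5 and its index are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# 1. From an array: the default index is 0, 1, 2, ...
s_array = pd.Series(np.array(["a", "b", "c", "d"]))

# The same data with an explicit index starting at 100
s_indexed = pd.Series(np.array(["a", "b", "c", "d"]),
                      index=[100, 101, 102, 103])

# 2. From a dict: the keys become the index labels
s_dict = pd.Series({"a": 0.0, "b": 1.0, "c": 2.0})

# With an explicit index the values are reordered, and labels
# that are missing from the dict (here 'd') are filled with NaN
s_reindexed = pd.Series({"a": 0.0, "b": 1.0, "c": 2.0},
                        index=["b", "c", "d", "a"])

# 3. From a scalar: the value is repeated for every index label
s_scalar = pd.Series(5, index=[0, 1, 2, 3])

print(s_array, s_indexed, s_dict, s_reindexed, s_scalar, sep="\n\n")
```
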
Pandas Series
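Output (empty Series)
Series([], dtype: float64)
Note : pandas 2.0 and later report dtype: object for an empty Series

Output (default index)
0    a
1    b
2    c
3    d
dtype: object
Note : the default index starts from 0

Output (custom index)
100    a
101    b
102    c
103    d
dtype: object
Note : the index starts from 100

Output (default order of labels)
a    0.0
b    1.0
c    2.0
dtype: float64

Output (index given as b, c, d, a)
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64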
Pandas Series
Head function
Output
a    1
b    2
c    3
dtype: int64
Returns the first 3 elements
Pandas Series
tail function
Output
c    3
d    4
e    5
dtype: int64
Returns the last 3 elements
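The head and tail outputs shown above are consistent with a Series like the one below. The code that produced them is not part of the text, so this is a reconstruction under that assumption.

```python
import pandas as pd

# Assumed data: five integers labelled 'a' to 'e'
s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

first_three = s.head(3)  # first 3 elements
last_three = s.tail(3)   # last 3 elements

print(first_three)
print(last_three)
```

Called without an argument, both head() and tail() return five elements.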
Pandas Series
Retrieve Data Using Label (Index)
Output
c    3
d    4
dtype: int64
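Label-based retrieval could be sketched as follows, assuming the same five-element Series used for head and tail. A single label returns one value, while a list of labels returns a smaller Series.

```python
import pandas as pd

# Assumed data: five integers labelled 'a' to 'e'
s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

single = s["a"]         # one label gives a single value
subset = s[["c", "d"]]  # a list of labels gives a Series

print(single)
print(subset)
```
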
Pandas Series
Retrieve Data by Selection
There are three methods for data selection:
loc gets rows (or columns) with particular labels from
the index.
iloc gets rows (or columns) at particular positions in
the index (so it only takes integers).
ix usually tries to behave like loc but falls back to
behaving like iloc if a label is not present in the index.
ix was deprecated in pandas 0.20 and removed in pandas 1.0, so use loc and iloc instead.
Pandas Series
Retrieve Data from selection
e.g.
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN
>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc (pandas older than 1.0 only)
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
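The difference between label-based and position-based selection is easier to see when the values are distinct. The sketch below reuses the index from the example above but, as an assumption for illustration, stores the numbers 0 to 9 instead of NaN.

```python
import pandas as pd

# Same index as above; the values 0..9 record each element's position
s = pd.Series(range(10), index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_position = s.iloc[:3]  # first three rows, whatever their labels
by_label = s.loc[:3]      # every row up to and including label 3

at_label_1 = s.loc[1]      # the row whose label is 1 (sixth row)
at_position_1 = s.iloc[1]  # the second row (its label is 48)

print(by_position)
print(by_label)
print(at_label_1, at_position_1)
```
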
Pandas DataFrame
It is a two-dimensional data structure, just like any table
(with rows & columns).
Basic Features of DataFrame
Columns may be of different types
Size can be changed (mutable)
Labeled axes (rows / columns)
Arithmetic operations on rows and columns
Pandas DataFrame
Create a DataFrame from Lists
e.g. 1
import pandas as pd1
data1 = [1,2,3,4,5]
df1 = pd1.DataFrame(data1)
print (df1)
output
   0
0  1
1  2
2  3
3  4
4  5
e.g. 2
import pandas as pd1
data1 = [['Freya',10],['Mohak',12],['Dwivedi',13]]
df1 = pd1.DataFrame(data1,columns=['Name','Age'])
print (df1)
output
      Name  Age
0    Freya   10
1    Mohak   12
2  Dwivedi   13
Pandas DataFrame
Create a DataFrame from Dict of ndarrays / Lists
import pandas as pd1
data1 = {'Name':['Freya', 'Mohak'],'Age':[9,10]}
df1 = pd1.DataFrame(data1)
print (df1)
output
    Name  Age
0  Freya    9
1  Mohak   10
To label the rows, replace the 3rd statement in the above program with the line below
(the index must have one label per row, so two labels here):
df1 = pd1.DataFrame(data1, index=['rank1','rank2'])
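A minimal sketch of the custom index, under the assumption that the dictionary holds two rows as in the program above. It also shows what happens when the number of labels does not match the number of rows.

```python
import pandas as pd

data = {"Name": ["Freya", "Mohak"], "Age": [9, 10]}

# One label per row: two rows, two labels
df = pd.DataFrame(data, index=["rank1", "rank2"])
print(df)

# Four labels for two rows do not match, so pandas raises ValueError
mismatch_detected = False
try:
    pd.DataFrame(data, index=["rank1", "rank2", "rank3", "rank4"])
except ValueError as exc:
    mismatch_detected = True
    print("ValueError:", exc)
```
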
Pandas DataFrame
Create a DataFrame from List of Dicts
import pandas as pd1
data1 = [{'x': 1, 'y': 2},{'x': 5, 'y': 4, 'z': 5}]
df1 = pd1.DataFrame(data1)
print (df1)
x y z
0 1 2 NaN
1 5 4 5.0
Column Deletion
# df1 is assumed to have columns named 'one' and 'two'
del df1['one'] # Deleting the column 'one' using the del statement
df1.pop('two') # Deleting the column 'two' using the pop() method
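The deletion statements above need a DataFrame that actually has those columns. The sketch below builds one with assumed values and adds drop() for comparison, which returns a new DataFrame instead of changing the original.

```python
import pandas as pd

# Assumed data with three columns
df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6], "three": [7, 8, 9]})

del df["one"]           # removes the column in place, returns nothing
popped = df.pop("two")  # removes the column in place and returns it as a Series

# drop() leaves df untouched and returns a new DataFrame
without_three = df.drop(columns=["three"])

print(list(df.columns))
print(popped)
print(list(without_three.columns))
```
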
Rename columns
>>> import pandas as pd
>>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
>>> df.rename(columns={"A": "a", "B": "c"})
   a  c
0  1  4
1  2  5
2  3  6
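One detail that the example above does not show is that rename() returns a new DataFrame and leaves the original unchanged, so the result has to be assigned to keep it. A short sketch, with the variable name renamed chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# rename() returns a new DataFrame; df keeps its old column names
renamed = df.rename(columns={"A": "a", "B": "c"})

print(list(df.columns))
print(list(renamed.columns))
```
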
Pandas DataFrame
Row Selection, Addition, and Deletion
#Selection by Label
import pandas as pd1
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two' : pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print (df1.loc['b'])
one 2.0
two 2.0
Name: b, dtype: float64
Pandas DataFrame
#Selection by integer location
import pandas as pd1
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print (df1.iloc[2])
one 3.0
two 3.0
Name: c, dtype: float64
Pandas DataFrame
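Addition of Rows
import pandas as pd1
# Illustrative data; any two DataFrames with the same columns will do
df1 = pd1.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd1.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
# DataFrame.append() was removed in pandas 2.0; concat() replaces it
df1 = pd1.concat([df1, df2])
print (df1)
Deletion of Rows
# Drop rows with label 0
df1 = df1.drop(0)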
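When rows from two DataFrames are combined, the row labels of both are kept, which affects what drop(0) removes afterwards. The sketch below uses pd.concat with assumed data and compares the result with and without ignore_index=True.

```python
import pandas as pd

# Assumed data: two small DataFrames with the same columns
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

stacked = pd.concat([df1, df2])                        # labels 0, 1, 0, 1
renumbered = pd.concat([df1, df2], ignore_index=True)  # labels 0, 1, 2, 3

# drop(0) removes every row labelled 0
dropped_from_stacked = stacked.drop(0)        # two rows remain
dropped_from_renumbered = renumbered.drop(0)  # three rows remain

print(dropped_from_stacked)
print(dropped_from_renumbered)
```
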
Pandas DataFrame
Iterate over rows in a dataframe
import pandas as pd1
import numpy as np1
raw_data1 = {'name': ['freya', 'mohak'],
'age': [10, 1],
'favorite_color': ['pink', 'blue'],
'grade': [88, 92]}
df1 = pd1.DataFrame(raw_data1, columns = ['name', 'age',
'favorite_color', 'grade'])
for index, row in df1.iterrows():
print (row["name"], row["age"])
freya 10
mohak 1
Pandas DataFrame
Head & Tail
head() returns the first n rows (observe the index values). By default it returns
five rows, but you may pass a custom number. tail() returns the
last n rows, e.g.
import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack'])}
#Create a DataFrame
df = pd.DataFrame(d)
print ("Our data frame is:")
print (df)
print ("The first two rows of the data frame are:")
print (df.head(2))
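The text above mentions tail() and the default of five rows, but the program only calls head(2). A short sketch of the remaining cases, using the same names and assuming a single Name column:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Tom", "James", "Ricky", "Vin", "Steve", "Smith", "Jack"]}
)

default_head = df.head()  # no argument: first five rows
last_two = df.tail(2)     # last two rows, original index values kept

print(default_head)
print(last_two)
```
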
Pandas DataFrame
Indexing a DataFrame using .loc[ ] :
.loc[ ] is a label-based indexer: it selects data by the labels of the rows and columns.
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4),
index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])
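The DataFrame built above is never actually indexed in the text. A few typical .loc[ ] selections on it might look like this sketch; the chosen labels are arbitrary examples.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(8, 4),
                  index=["a", "b", "c", "d", "e", "f", "g", "h"],
                  columns=["A", "B", "C", "D"])

col_a = df.loc[:, "A"]                        # all rows, column 'A'
subset = df.loc[["a", "b", "f"], ["A", "C"]]  # chosen rows and columns
row_range = df.loc["a":"c"]                   # label slice, end label included

print(col_a)
print(subset)
print(row_range)
```

Unlike ordinary Python slices, a label slice such as "a":"c" includes the end label.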
# dictionary of lists (named data so that the built-in dict is not shadowed)
data = {'name':["Mohak", "Freya", "Roshni"],
'degree': ["MBA", "BCA", "M.Tech"],
'score':[90, 40, 80]}
Output
   0   1   2
0  1   4   7
1  4  10  16
2  9  18  27
Pandas DataFrame
Binary operation over
dataframe with dataframe
import pandas as pd
x = pd.DataFrame({0: [1,2,3], 1: [4,5,6], 2: [7,8,9] })
y = pd.DataFrame({0: [1,2,3], 1: [4,5,6], 2: [7,8,9] })
new_x = x.add(y, axis=0)
print (new_x)
Output
   0   1   2
0  2   8  14
1  4  10  16
2  6  12  18
Note :- similarly, we can use the sub, mul and div functions
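The other binary functions named in the note can be sketched with the same x and y. The last step multiplies a DataFrame by a Series; the values in row_factors are an assumption for illustration.

```python
import pandas as pd

x = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]})
y = pd.DataFrame({0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]})

difference = x.sub(y)  # element-wise x - y
product = x.mul(y)     # element-wise x * y
quotient = x.div(y)    # element-wise x / y, always floating point

# DataFrame with Series: axis=0 matches the Series to the row index,
# so row 0 is multiplied by 1, row 1 by 2 and row 2 by 3
row_factors = pd.Series([1, 2, 3])  # hypothetical factors
scaled = x.mul(row_factors, axis=0)

print(difference, product, quotient, scaled, sep="\n\n")
```
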
Pandas DataFrame
Merging/combining dataframe(different styles)
print (df)