Pandas Dataframe
Features of DataFrame
Columns can be of different types
Size-mutable
Labeled axes (rows and columns)
Can perform arithmetic operations on rows and columns
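A minimal sketch of these features, assuming a small made-up table. The column names 'id', 'score', 'name', 'bonus' and 'total' are hypothetical:

```python
import pandas as pd

# Hypothetical data: columns of different types (int, float, str)
df = pd.DataFrame({'id': [1, 2, 3],
                   'score': [7.5, 8.0, 6.5],
                   'name': ['x', 'y', 'z']})
print(df.dtypes)

# Size-mutable: a new (hypothetical) column can be added after creation
df['bonus'] = [0.5, 1.0, 1.5]

# Arithmetic on columns is element-wise and aligned on the row labels
df['total'] = df['score'] + df['bonus']
print(df)
```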
A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
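As an illustration (a sketch with made-up values and labels), index and columns supply the row and column labels, and dtype forces a single type on all values; copy is left at its default here:

```python
import pandas as pd

data = [[1, 2], [3, 4]]
# index labels the rows, columns labels the columns, dtype converts every value to float
df = pd.DataFrame(data, index=['r1', 'r2'], columns=['x', 'y'], dtype=float)
print(df)
print(df.dtypes)
```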
A pandas DataFrame can be created from various inputs, such as:
Lists
Dicts
Series
NumPy ndarrays
Another DataFrame
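The sketch below builds a small DataFrame from each kind of input listed. The labels 'num', 'let', 'p', 'q' and 'r' are arbitrary examples:

```python
import numpy as np
import pandas as pd

# From a list of lists (each inner list is one row)
from_list = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['num', 'let'])
# From a dict (each key becomes a column)
from_dict = pd.DataFrame({'num': [1, 2], 'let': ['a', 'b']})
# From a named Series (the name becomes the column label)
from_series = pd.DataFrame(pd.Series([1, 2], name='num'))
# From a 2-D NumPy ndarray
from_ndarray = pd.DataFrame(np.arange(6).reshape(2, 3), columns=['p', 'q', 'r'])
# From another DataFrame
from_frame = pd.DataFrame(from_dict)

for frame in (from_list, from_dict, from_series, from_ndarray, from_frame):
    print(frame.shape)
```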
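Example 1
The following example shows how to create a DataFrame by passing a list of dictionaries.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)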
Example 2
The following example shows how to create a
DataFrame by passing a list of dictionaries and the row
indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print (df)
The following example shows how to create a DataFrame
with a list of dictionaries, row indices, and column indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
# Two column labels, both matching dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'],
columns=['a', 'b'])
# Two column labels; 'b1' is not a dictionary key, so that column is all NaN
df2 = pd.DataFrame(data, index=['first', 'second'],
columns=['a', 'b1'])
print (df1)
print (df2)
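A short check of how missing keys are handled, reusing the data above: a key absent from one dictionary becomes NaN in that row, and a label passed in columns= that matches no key (here 'b1') produces a column of NaN:

```python
import pandas as pd

data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]

# 'c' is missing from the first dictionary, so that cell is NaN
df = pd.DataFrame(data, index=['first', 'second'])
print(df)

# 'b1' is not a key in either dictionary, so the whole column is NaN
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df2)
```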
A dictionary of Series can be passed to form a DataFrame.
The resulting index is the union of all the Series indexes
passed.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print (df)
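To see the union at work, the sketch below inspects the same DataFrame: label 'd' exists only in 'two', so 'one' gets NaN for that row and is stored as float to hold it:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Row index = union of both Series indexes
print(df.index.tolist())
# No value for label 'd' in 'one', so pandas fills NaN
print(df.loc['d', 'one'])
print(df.dtypes)
```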
To work with columns, we perform basic operations on them such as
selecting, deleting, adding and renaming.
Column Selection:
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# select two columns
print(df[['Name', 'Qualification']])
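A related point, sketched with a trimmed copy of the employee data: a single label inside [] returns a Series, while a list of labels (even a list with one element) returns a DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Age': [27, 24, 22, 32]})

ages = df['Age']        # single label -> Series
subset = df[['Age']]    # list of labels -> DataFrame with one column
print(type(ages).__name__, type(subset).__name__)
```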
Column Addition:
To add a column to a pandas DataFrame, we can declare a new list and assign
it as a column of an existing DataFrame.
import pandas as pd
dic = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
       'Height': [5.1, 6.2, 5.1, 5.2],
       'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(dic)
# Assign a list (example values) as a new column; its length must match the number of rows
df['Address'] = ['Delhi', 'Bangalore', 'Chennai', 'Patna']
print(df)
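Besides plain assignment, pandas offers DataFrame.insert(), which places a column at a given position in place, and DataFrame.assign(), which returns a new DataFrame. A minimal sketch; the 'ID' values and the 'Tall' threshold of 5.5 are made up for illustration:

```python
import pandas as pd

dic = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
       'Height': [5.1, 6.2, 5.1, 5.2],
       'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
df = pd.DataFrame(dic)

# insert(position, label, values) modifies df in place; 0 puts the column first
df.insert(0, 'ID', [101, 102, 103, 104])  # hypothetical IDs

# assign() leaves df unchanged and returns a new DataFrame with the extra column
df2 = df.assign(Tall=df['Height'] > 5.5)  # hypothetical threshold
print(df.columns.tolist())
print(df2.columns.tolist())
```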
A. Keyframe
B. DataFrame
C. Statistics
D. Econometrics
Which of the following inputs can be accepted by
DataFrame?
a) Structured ndarray
b) Series
c) DataFrame
d) All of the mentioned
Identify the correct statement:
A. The standard marker for missing data in Pandas
is NaN
B. Series acts in a way similar to an array
C. Both of the above
D. None of the above
If data is an ndarray, index must be the same
length as data.
a) True
b) False
Column Deletion
Columns can be deleted with del, popped with pop() or dropped with drop().
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
'three' : pd.Series([10,20,30], index=['a','b','c'])}
df = pd.DataFrame(d)
print(df)
# using the del statement
del df['one']
print(df)
# using the pop() method (removes the column and returns it)
df.pop('two')
print(df)
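drop() is not demonstrated in the code above. A minimal sketch with the same data: drop(columns=...) returns a new DataFrame and leaves the original intact, whereas pop() removes the column in place and returns it as a Series:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

# drop() returns a new DataFrame without the listed columns
trimmed = df.drop(columns=['one', 'two'])
print(trimmed.columns.tolist())
print(df.columns.tolist())   # df still has all three columns

# pop() removes 'two' from df itself and returns it
popped = df.pop('two')
print(popped.tolist())
print(df.columns.tolist())
```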
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
'Age':[27, 24, 22, 32],
'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# select all rows and the second to fourth columns
print(df[df.columns[1:4]])
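The same selection can be written positionally with iloc, where ':' means all rows and 1:4 means column positions 1, 2 and 3. A sketch comparing both forms:

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
df = pd.DataFrame(data)

by_names = df[df.columns[1:4]]   # slice the column labels, then select them
by_position = df.iloc[:, 1:4]    # all rows, column positions 1 to 3
print(by_position.columns.tolist())
```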
Selection by Label
Rows can be selected by passing a row label to the loc indexer.
loc is label-based, which means that rows and columns are specified
by their row and column labels.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
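loc also accepts a column label after the row label, and label slices include their end label. A sketch using the same DataFrame:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

cell = df.loc['b', 'two']             # one row label and one column label
block = df.loc[['a', 'c'], ['two']]   # lists of labels give a DataFrame
span = df.loc['b':'d', 'one']         # label slice: 'd' is included
print(cell, block.shape, span.index.tolist())
```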
Selection by integer location
Rows can be selected by passing an integer position to the iloc indexer.
iloc is integer-position based, so rows and columns are specified
by their integer positions, starting at 0.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])
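Likewise, iloc accepts a column position after the row position, and integer slices exclude their end position, as in ordinary Python slicing. A sketch on the same DataFrame:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

row = df.iloc[2]        # third row (label 'c') as a Series
cell = df.iloc[3, 1]    # fourth row, second column ('two' at 'd')
rows = df.iloc[0:2]     # positions 0 and 1; position 2 is excluded
print(row.name, cell, rows.index.tolist())
```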
Select Multiple Rows
Multiple rows can be selected using slice notation with the ':' operator.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c',
'd'])}
df = pd.DataFrame(d)
print(df[2:4])
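On this DataFrame, df[2:4] slices by position, so it returns the rows labelled 'c' and 'd'. The sketch below shows the equivalent explicit forms; the loc version names its end label because label slices are inclusive:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

by_brackets = df[2:4]       # positions 2 and 3
by_iloc = df.iloc[2:4]      # same rows, explicitly positional
by_loc = df.loc['c':'d']    # same rows, by label (end label included)
print(by_brackets.index.tolist(), by_iloc.index.tolist(), by_loc.index.tolist())
```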
Addition of Rows
Add new rows to a DataFrame by concatenating it with another DataFrame using
pd.concat(). The new rows are added at the end. (The older DataFrame.append()
method was deprecated in pandas 1.4 and removed in pandas 2.0.)
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
print(df)
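pd.concat keeps each frame's original row labels by default, so concatenating these two frames yields labels 0, 1, 0, 1. Passing ignore_index=True renumbers the rows instead, as this sketch shows:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

kept = pd.concat([df, df2])                           # labels 0, 1, 0, 1
renumbered = pd.concat([df, df2], ignore_index=True)  # labels 0, 1, 2, 3
print(kept.index.tolist(), renumbered.index.tolist())
```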
Deletion of Rows
Use an index label to delete (drop) rows from a DataFrame. If the label is
duplicated, every row with that label is dropped.
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
# Drop rows with label 0 (two rows carry this label after concatenation)
df = df.drop(0)
print(df)
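After concatenation the labels 0 and 1 each appear twice, so dropping label 0 removes two rows. A sketch of how to check for duplicate labels with index.is_unique and, when only one row should go, renumber first with reset_index(drop=True):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
combined = pd.concat([df, df2])

print(combined.index.is_unique)           # False: labels 0 and 1 repeat
both_zeros_gone = combined.drop(index=0)  # removes every row labelled 0

renumbered = combined.reset_index(drop=True)  # labels become 0, 1, 2, 3
one_row_gone = renumbered.drop(index=0)       # removes only the first row
print(len(both_zeros_gone), len(one_row_gone))
```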
A DataFrame is two-dimensional, and df.shape gives
its size as (number of rows, number of columns).
This is a tuple, so if we need to store the numbers
of rows and columns in separate variables, we can
unpack it: rows, cols = df.shape
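For example, with a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})  # hypothetical data
rows, cols = df.shape   # unpack the (rows, columns) tuple
print(rows, cols)
```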
The pandas head() method returns the top n rows
(5 by default) of a DataFrame or Series.
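A quick sketch on a hypothetical 10-row DataFrame; tail() is the counterpart that returns the last n rows:

```python
import pandas as pd

df = pd.DataFrame({'n': range(10)})  # hypothetical column holding 0..9
first_five = df.head()    # default n=5
first_three = df.head(3)
last_two = df.tail(2)
print(first_three)
print(last_two)
```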
We can get summary statistics for the data in a
DataFrame, such as its max, min and mean, with
just one command: df.describe()
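A sketch using part of the employee data from earlier. By default describe() summarises only the numeric columns (count, mean, std, min, quartiles, max); passing include='all' also covers text columns:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
                   'Age': [27, 24, 22, 32]})
stats = df.describe()   # numeric columns only by default
print(stats)
```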
The function to see the first few observations in a
DataFrame is
A. dataframe_object.head()
B. dataframe_object.start()
C. head()
D. All
What is the syntax to remove a column from a
DataFrame?
A. del dataframe_object(Column_name)
B. del Column_name
C. del dataframe_object()
D. None of the above
The syntax to check the uniqueness of labels is
A. df.index.is_unique
B. df.is_unique
C. index.is_unique
D. None of the above
What is the method for generating multiple
statistics?
A. df.explain()
B. df.stat()
C. df.describe()
D. All
What is the syntax for reading a CSV file into a
DataFrame in pandas?
A. df = pd.read_csv(file_name.csv)
B. df = pd.read_csv()
C. df = read_csv(file_name.csv)
D. All
What function is used to fill missing data?
A. df.fillna(value)
B. fillna(value)
C. df.fillna()
D. fillna()
The operator used for concatenation of
strings is
A. :
B. +
C. *
D. All
The index of the last character in a string of length N is
A. 0
B. 1
C. N
D. N - 1