Pandas Dataframe

A DataFrame is a two-dimensional data structure where data is aligned in a tabular format with rows and columns. It can be created from many different input types like lists, dictionaries, and NumPy arrays. DataFrames allow labeling of rows and columns and can perform arithmetic operations on rows and columns. Common operations on DataFrames include selecting, adding, deleting, and renaming columns as well as selecting, adding, deleting, and sorting rows.

Uploaded by

James Prakash
Copyright
© All Rights Reserved

 A DataFrame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns.

 Features of DataFrame
 Columns can be of different types
 Size: mutable
 Labeled axes (rows and columns)
 Arithmetic operations can be performed on rows and columns
 A pandas DataFrame can be created using the following constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
 A pandas DataFrame can be created from various inputs, such as:
 Lists
 Dictionaries
 Series
 NumPy ndarrays
 Another DataFrame
 The subsequent slides of this lecture show how to create a DataFrame using these inputs.
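NumPy ndarrays are listed among the accepted inputs. As a minimal sketch (the array values and the labels r1, r2, x, y, z are made up for illustration), the constructor parameters data, index, columns and dtype could be combined like this:

```python
import numpy as np
import pandas as pd

# Hypothetical 2x3 integer array, used only for illustration
arr = np.array([[1, 2, 3], [4, 5, 6]])

# index labels the rows, columns labels the columns, dtype casts the values
df = pd.DataFrame(arr, index=['r1', 'r2'], columns=['x', 'y', 'z'], dtype=float)
print(df)
print(df.dtypes)
```

Every column here is numeric, so the cast to float applies to the whole table.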
 Create an Empty DataFrame
 The most basic DataFrame that can be created is an empty DataFrame.
# Import the pandas library, aliased as pd
import pandas as pd
df = pd.DataFrame()
print(df)
 Create a DataFrame from Lists
 A DataFrame can be created from a single list or a list of lists.
 Example 1
import pandas as pd
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)
 Example 3
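import pandas as pd
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'], dtype=float)
print(df)
 Note: Observe that the dtype parameter changes the type of the Age column to floating point. This reflects older pandas behavior; recent releases may raise an error when a column such as Name cannot be cast to the requested dtype.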
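Depending on the pandas version, passing dtype=float for data that also contains strings can fail, because the Name column cannot be cast. A hedged alternative, assuming only the Age column should become floating point, is to build the DataFrame first and then cast that one column with astype:

```python
import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# Cast only the Age column; the Name column is left untouched
df = df.astype({'Age': float})
print(df.dtypes)
```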
 Create a DataFrame from a Dictionary using the default Constructor
 The DataFrame constructor accepts a data object that can be an ndarray, a dictionary, etc.
 If a dictionary is passed as data, its values should be list-like objects such as Series, arrays, or lists, e.g.
# Dictionary with list objects as values
studentData = {
    'name': ['jack', 'Riti', 'Aadi'],
    'age': [34, 30, 16],
    'city': ['Sydney', 'Delhi', 'New york']
}
 When a DataFrame is initialised with this kind of dictionary, each item (key/value pair) is converted to one column: the key becomes the column name and the list in the value becomes the column data.
# Dictionary with list objects as values
import pandas as pd
studentData = {
    'name': ['jack', 'Riti', 'Aadi'],
    'age': [34, 30, 16],
    'city': ['Sydney', 'Delhi', 'New york']
}
dfObj = pd.DataFrame(studentData)
print(dfObj)
 All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays.
 If no index is passed, the index defaults to range(n), where n is the array length.
import pandas as pd
data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)
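The two length rules can be illustrated with a short sketch. The index labels rank1 to rank4 and the deliberately mismatched second dictionary are assumptions made up for this example:

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}

# Four rows of data, so the custom index needs exactly four labels
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
print(df)

# Lists of unequal length are rejected with a ValueError
try:
    pd.DataFrame({'Name': ['Tom', 'Jack'], 'Age': [28]})
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```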
 A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys are taken as column names.
 The following example shows how to create a DataFrame by passing a list of dictionaries.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
# The first dictionary has no key 'c', so that cell is NaN
print(df)
 Example 2
 The following example shows how to create a DataFrame by passing a list of dictionaries and the row indices.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
 The following example shows how to create a DataFrame with a list of dictionaries, row indices, and column indices.
import pandas as pd
data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
# Two column labels, both matching dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# Two column labels, one of which ('b1') matches no dictionary key, so it is filled with NaN
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
 A dictionary of Series can be passed to form a DataFrame. The resulting index is the union of all the Series indexes passed.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
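To make the union of indexes concrete, the following sketch reuses the same two Series and checks which cells end up missing. Labels absent from one Series are filled with NaN:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# The row labels are the union of both Series indexes
print(list(df.index))

# Series 'one' has no label 'd', so that cell is filled with NaN
print(df['one'].isna().tolist())
```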
 To work with columns, we perform basic operations such as selecting, deleting, adding, and renaming.
 Column Selection:
import pandas as pd
# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)
# Select two columns
print(df[['Name', 'Qualification']])
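Renaming is listed among the column operations. One common way to do it is the rename method, sketched here with hypothetical new labels (FullName, AgeYears) chosen only for illustration:

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi'], 'Age': [27, 24]}
df = pd.DataFrame(data)

# rename returns a new DataFrame; the original column labels are unchanged
renamed = df.rename(columns={'Name': 'FullName', 'Age': 'AgeYears'})
print(renamed.columns.tolist())
print(df.columns.tolist())
```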
 Column Addition:
 To add a column to a pandas DataFrame, we can declare a new list and assign it as a column of an existing DataFrame.
import pandas as pd
# Define a dictionary containing students' data
dic = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
       'Height': [5.1, 6.2, 5.1, 5.2],
       'Qualification': ['Msc', 'MA', 'Msc', 'Msc']}
# Convert the dictionary into a DataFrame
df = pd.DataFrame(data=dic)
# Declare a list that is to be converted into a column
address = ['Delhi', 'Bangalore', 'Chennai', 'Patna']
# Use 'Address' as the column name and assign the list to it
df['Address'] = address
# Observe the result
print(df)
 Column Addition:
import pandas as pd
dic = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
       'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(dic)
# Add a new column to an existing DataFrame by assigning a new Series to a column label
df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(df)
# Add a new column computed from two existing columns
df['four'] = df['one'] + df['three']
print(df)
 The key data structure in pandas is called?
A. Keyframe
B. DataFrame
C. Statistics
D. Econometrics
 Which of the following inputs can be accepted by DataFrame?
a) Structured ndarray
b) Series
c) DataFrame
d) All of the mentioned
 Identify the correct statement:
A. The standard marker for missing data in pandas is NaN
B. Series act in a way similar to that of an array
C. Both of the above
D. None of the above
 If data is an ndarray, index must be the same
length as data.
a) True
b) False
 Column Deletion
 Columns can be deleted, popped, or dropped.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)
print(df)
# Using the del statement
del df['one']
print(df)
# Using the pop method
df.pop('two')
print(df)
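The slide mentions that columns can also be dropped. As a small sketch on the same data, drop returns a new DataFrame without the listed columns, while pop removes a column in place and returns it:

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
df = pd.DataFrame(d)

# drop returns a new DataFrame; df itself still has all three columns
dropped = df.drop(columns=['three'])
print(dropped.columns.tolist())

# pop removes the column from df and returns it as a Series
two = df.pop('two')
print(two.tolist())
print(df.columns.tolist())
```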
import pandas as pd
# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into a DataFrame
df = pd.DataFrame(data)
# Select all rows and the second to fourth columns
print(df[df.columns[1:4]])
 Selection by Label
 Rows can be selected by passing a row label to loc.
 loc is label-based, which means that rows and columns are specified by their row and column labels.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
 Selection by integer location
 Rows can be selected by passing an integer location to iloc.
 iloc is integer-position based, so rows and columns are specified by their integer positions.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])
 Select Multiple Rows
 Multiple rows can be selected using the ':' (slice) operator.
import pandas as pd
d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])
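loc and iloc also accept slices, and both can take a column selector after the comma. The sketch below, on the same data, contrasts the two: a label slice includes its end label, while a position slice excludes its end position.

```python
import pandas as pd

d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Label-based: rows 'b' through 'c' (end label included), column 'two'
by_label = df.loc['b':'c', 'two']

# Position-based: rows at positions 1 and 2 (end position excluded), second column
by_position = df.iloc[1:3, 1]

print(by_label.tolist())
print(by_position.tolist())
```

Both selections pick the same two cells here, which makes the difference in slice endpoints easy to compare.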
 Addition of Rows
 Add new rows to a DataFrame using pd.concat (older pandas versions also offered df.append, which was removed in pandas 2.0). The new rows are added at the end.
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
print(df)
 Deletion of Rows
 Use an index label to delete or drop rows from a DataFrame. If the label is duplicated, multiple rows are dropped.
 Note that combining the two DataFrames, as in the previous example, produces duplicate labels.
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])
# Drop rows with label 0 (two rows carry this label)
df = df.drop(0)
print(df)
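The effect of duplicate labels can be checked directly. In this sketch, ignore_index=True (a pd.concat option) is used as one way to relabel the combined rows so that each label is unique:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# Keeping the original labels gives 0, 1, 0, 1, so drop(0) removes two rows
kept_labels = pd.concat([df, df2])
print(len(kept_labels.drop(0)))

# ignore_index=True relabels the rows 0..3, so drop(0) removes one row
relabeled = pd.concat([df, df2], ignore_index=True)
print(len(relabeled.drop(0)))
```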
 A DataFrame is a two-dimensional structure; df.shape gives its shape as (rows, columns).
 Since this is a tuple, the number of rows and columns can be unpacked into two variables when needed.
 The pandas head() method returns the top n (5 by default) rows of a DataFrame or Series.
 A summary of the data in the DataFrame, such as its max, min, and mean, can be obtained with a single command: df.describe()
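These three inspection tools can be tried together on a small table. The Age and Height values below are invented purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Age': [27, 24, 22, 32, 30, 28],
                   'Height': [5.1, 6.2, 5.1, 5.2, 5.8, 5.5]})

# shape is a (rows, columns) tuple, so it can be unpacked directly
rows, cols = df.shape
print(rows, cols)

# head() returns the first 5 rows by default; head(2) returns the first 2
print(df.head())
print(df.head(2))

# describe() summarizes numeric columns (count, mean, std, min, quartiles, max)
summary = df.describe()
print(summary)
```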
 The function to see the first few observations in a DataFrame is
 A. dataframe_object.head()
 B. dataframe_object.start()
 C. head()
 D. All
 What is the syntax to remove a column from a DataFrame?
 A. del dataframe_object(Column_name)
 B. del Column_name
 C. del dataframe_object()
 D. None of the above
 The syntax to check uniqueness of labels
 A. df.index.is_unique
 B. df.is_unique
 C. index.is_unique
 D. None of the above
 What is the method for generating multiple
statistics
 A. df.explain()
 B. df.stat()
 C. df.describe()
 D. All
 What is the syntax for reading a CSV file into a DataFrame in pandas?
 A. df = pd.read_csv(file_name.csv)
 B. df = pd.read_csv()
 C. df = read_csv(file_name.csv)
 D. All
 What function is used to fill missing data
 A. df.fillna(value)
 B. fillna(value)
 C. df.fillna()
 D. fillna()
 The operator used for concatenation of
strings is
 A. :
 B. +
 C. *
 D. All
 The index of last character in the string is
 A. 0
 B. 1
 C. N
 D. N -1
