Python Pandas

The document provides an overview of Pantech eLearning which was established in 2004. It has over 2000 workshops and 300 FDPs reaching over 12.5 lacs students. The company has over 100 staff working across 7 wings including R&D and an industrial design services lab. It conducts workshops, FDPs, internships and online training. The document then provides details about a webinar on Python programming for data science including an introduction to pandas for data analysis and manipulation. It describes pandas data structures like Series, DataFrames and Panels and how to create and access data within these structures.

Uploaded by

Raja

Women’s Christian College

 Department Of Computer Applications.

 Python Programming For Data Science.

 Convenor:

 Ms.Sylvia Mary D

 (HOD)

 (Department Of Computer Applications)


Webinar On Python Programming
For Data Science
Overview of Pantech eLearning
Profile: Established in 2004, 7 wings, 100+ team size
Training: 2000+ workshops, 300+ FDPs, 12.5 lacs+ students
R&D: Industrial Design Services Lab, equipment manufacturer, working on funded projects
Services: Workshops, FDPs, internships, value-added courses, online training, project guidance
Follow us on:
Instagram--https://instagram.com/pantechelearning?igshid=1fohp030onteu

Telegram--https://t.me/pantechelearning

YouTube--https://youtube.com/c/PantecheLearning.

Podcasts--https://www.instagram.com/tv/COPQIigJZFi/?igshid=4gjhz4dlls1p
Python - Pandas
Pandas is an open-source Python library.
It provides highly efficient data structures and data
analysis tools for the Python programming language.
Python with pandas is used in a variety of domains
like academics, finance, economics, statistics and
PRE-REQUISITES
To learn pandas, one should be familiar with basic
computer programming terminology.
A basic knowledge of another programming language is
essential.
Pandas uses most of the functionality of NumPy.
A basic understanding of NumPy is necessary to
understand pandas.
Pandas
It is an open-source Python library used for data manipulation and data
analysis.
Python with pandas is used in a variety of domains like statistics, finance
and web analytics.
Using pandas, the following five steps can be accomplished:
Load
Organise
Manipulate
Model
Analyze
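The five steps can be sketched on a tiny data set as follows. This is only an illustration: the CSV text, the column names and the variables `dept_mean` and `best_dept` are made up for the example, and the "Model" step is reduced to a simple per-department mean.

```python
import io
import pandas as pd

# Hypothetical CSV data held in memory so the sketch is self-contained.
csv_text = """name,dept,marks
Sam,ECE,78
Geetha,CSE,85
Kala,ECE,75
Mala,CSE,70
"""

# Load: read the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Organise: use the student name as the row index.
df = df.set_index("name")

# Manipulate: add a derived column (marks as a fraction of 100).
df["fraction"] = df["marks"] / 100

# Model: summarise each department by its mean marks (a very simple summary).
dept_mean = df.groupby("dept")["marks"].mean()

# Analyze: find the department with the highest mean.
best_dept = dept_mean.idxmax()
print(dept_mean)
print(best_dept)
```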
Important Features Of Pandas:
Efficient DataFrame object with customized indexing.
Supports different file formats for loading data into in-memory
data objects.
Aligns data and handles missing data.
Used for reshaping data sets.
Performs label-based slicing, indexing and subsetting of
large datasets.
Columns can be inserted into and deleted from a dataset.
Data can be grouped for aggregation and transformation.
Important Features Of Pandas:
High performance when merging and joining
data.
Provides time-series functionality.
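Two of these features, data alignment with missing-data handling and merging, can be illustrated with a short sketch. The Series, DataFrames and variable names below are made up for the example.

```python
import pandas as pd

# Alignment: arithmetic matches values by label, not by position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])
total = a + b             # "x" has no partner in b, so the result there is NaN

# Missing data: replace NaN with a chosen value.
filled = total.fillna(0)

# Merging: join two tables on a shared column.
left = pd.DataFrame({"name": ["Sam", "Geetha"], "dept": ["ECE", "CSE"]})
right = pd.DataFrame({"name": ["Sam", "Geetha"], "marks": [78, 85]})
merged = left.merge(right, on="name")

print(total)
print(filled)
print(merged)
```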
PANDAS – ENVIRONMENT SETUP
The standard Python distribution does not include the pandas module.
Pandas can be installed using the Python package installer, pip:
pip install pandas
If Anaconda is installed, pandas is installed along with it.
Anaconda is an open-source Python distribution for the SciPy stack.
It is available for Windows, Linux and macOS.
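After installation, a quick check like the following can confirm that pandas and NumPy import correctly. It is a minimal sketch; the version numbers printed depend on the local installation.

```python
import numpy as np
import pandas as pd

# Print the installed versions to confirm that both imports work.
print("pandas:", pd.__version__)
print("numpy:", np.__version__)
```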
Pandas
It deals with the following three data structures.
Series
DataFrame
Panel (deprecated in pandas 0.20 and removed in 0.25; current versions provide only Series and DataFrame)
These data structures are built on top of NumPy
arrays, which makes them fast and efficient.
Dimension and Description:
A higher-dimensional data structure is a container of lower-dimensional
data structures.
A DataFrame is a container of Series, and a Panel is a container of DataFrames.
Series – It is a one-dimensional collection of similar elements, for
example a collection of integers.
Points to Consider:
Collection of similar elements.
Size cannot be changed (i.e., it is size-immutable).
Values of the data can be changed (i.e., it is value-mutable).
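Value mutability can be seen in a small sketch: an existing element is overwritten in place while the length of the Series stays the same. The variable name `s` and the values are illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50])

# Overwrite the element at position 2; the Series keeps its length.
s.iloc[2] = 35

print(s.tolist())
print(len(s))
```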
Data Frame:
It is a heterogeneous collection of data elements, and
the size of the table can be changed.
The DataFrame is used in a variety of fields and is one of the most
useful data structures.
It is a 2D labelled, size-mutable tabular data structure.
Panel:
It is a 3D labelled, size-mutable array.
It is difficult to build and handle arrays of two or more dimensions.
More burden is placed on the user to consider the orientation of the data when writing
functions.
Using pandas data structures, the mental effort of the user is reduced.
With a tabular DataFrame, it is more useful to think of the index (rows) and the columns rather than
axis 0 and axis 1.
All pandas data structures are value-mutable (values can be changed).
Except Series, all are size-mutable.
Series is size-immutable.
DataFrame is widely used and is one of the most important data structures.
Panel is a less frequently used data structure.
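The point about index and columns versus axis 0 and axis 1 can be sketched as follows. The table and variable names are made up; `axis="index"` and `axis="columns"` are the named equivalents of `axis=0` and `axis=1`.

```python
import pandas as pd

df = pd.DataFrame(
    {"maths": [80, 70], "physics": [60, 90]},
    index=["Sam", "Mala"],
)

# Summing along the index (axis 0) gives one total per column.
per_subject = df.sum(axis="index")

# Summing along the columns (axis 1) gives one total per row.
per_student = df.sum(axis="columns")

print(per_subject)
print(per_student)
```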
Panel:
A Panel is a three-dimensional data structure with a
heterogeneous collection of data.
A Panel cannot easily be represented in a graphical format.
A Panel can be illustrated as a container of DataFrames.

Important Points:
Heterogeneous data
Size mutable
Data mutable
Series:
It is a one-dimensional data structure with a homogeneous
collection of elements.
For example, it can contain a collection of integers like
10, 20, 30, 40, 50.
Pandas – Series:
A pandas Series can be created using the following constructor:
pandas.Series(data, index, dtype, copy)
Series:
The parameters of the constructor are as follows:
Data: Takes various forms such as an ndarray, a list or a constant.
Index: Index values must be unique and hashable, and the index
must be of the same length as the data.
Default is np.arange(n) if no index is passed.
Dtype: Indicates the data type.
If no dtype is passed, the data type will be inferred.
Copy: Specifies whether the data is copied.
Default value is False.
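A minimal sketch of the constructor with all four parameters passed explicitly is shown below. The values and labels are illustrative; the second Series shows the default index that is generated when none is given.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3])

# data, index, dtype and copy passed explicitly.
s = pd.Series(data=values, index=["a", "b", "c"], dtype="float64", copy=True)

# Because the data was copied, changing the source array does not affect s.
values[0] = 99

# With no index, labels 0..n-1 are generated automatically.
default_index = pd.Series([1, 2, 3])

print(s)
print(list(default_index.index))
```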
Series:
A series can be created using various inputs like:
Array
Dict
Scalar value or constant.
Creation of Empty Series:
The most basic Series that can be created is an empty
Series.
Example:
# import the pandas library, aliased as pd
import pandas as pd
# dtype is given explicitly; recent pandas versions otherwise default to object
s = pd.Series(dtype="float64")
print(s)

Output:
Series([], dtype: float64)
Create a series from ndarray:
If the data passed is an ndarray, then any index passed must
be of the same length.
If no index is passed, then by default the index will be
range(n), where n is the array length:
[0, 1, 2, 3, ..., len(array)-1].
Example 1:
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
OUTPUT:
0    a
1    b
2    c
3    d
dtype: object
No index values are passed.
By default, the indices assigned range from 0 to
len(data)-1, i.e. from 0 to 3.
EXAMPLE 2:
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = np.array(['b','c','d','a'])
s = pd.Series(data,index=[10,11,12,13])
print(s)
Output:
10    b
11    c
12    d
13    a
dtype: object
Create a series from Dict:
A dictionary can be passed as an input.
If no index is specified, then the dictionary keys are used
to create the index (in insertion order in current pandas versions; older versions sorted the keys).
If an index is passed, then the values in the data
corresponding to the labels in the index will be pulled
out.
EXAMPLE 1:
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)
OUTPUT:
a    0.0
b    1.0
c    2.0
dtype: float64

Dictionary keys are used to construct the index.


EXAMPLE 2:
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)
Output:
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64

We can see that the index order is preserved and the missing
element is filled with NaN (Not a Number).
Create a Series From a Scalar:
If the data is a scalar value, then an index must be
provided.
The value will be repeated to match the length of the
index.
Example:
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
Output:
0    5
1    5
2    5
3    5
dtype: int64
Accessing data from series with position:
Data in a Series can be accessed in a similar way to that in an ndarray.
Example:
Retrieve the first element.
Counting starts from zero in the array.
It means that the first element is stored at the 0th position, and so on.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve the first element by position
# (.iloc is used because plain s[0] is deprecated for label-indexed Series in recent pandas)
print(s.iloc[0])
Output:
1
Example 3:
Retrieve the first three elements in the Series.
If a : is placed in front of an index (as in s[:3]), all items before
that index are extracted; if it is placed after the index (as in s[3:]),
all items from that index onwards are extracted.
If two parameters (with : between them) are used, the items
between these two index positions are extracted.
The end index is excluded.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve the first three elements
print(s[:3])

Output:
a    1
b    2
c    3
dtype: int64
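The two-parameter form of slicing, and the form with the colon after the index, can be sketched as follows. The example uses `.iloc` to make the positional intent explicit; the variable names `middle` and `tail` are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Two parameters: positions 1, 2 and 3 are returned; position 4 is excluded.
middle = s.iloc[1:4]

# Colon after the index: everything from position 3 onwards.
tail = s.iloc[3:]

print(middle)
print(tail)
```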
Example 4:
Retrieve the last three elements:
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve the last three elements
print(s[-3:])
Output:
c    3
d    4
e    5
dtype: int64
Retrieve the Data using Label (Index)
A Series is like a fixed-size dict.
As in a dictionary, values can be read and set by index label.
Example: Retrieve a single element using an index label value.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve a single element
print(s['a'])
Output:
1
Example:
Retrieve multiple elements using a list of index label values.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# retrieve multiple elements
print(s[['a','c','d']])
Output:
a    1
c    3
d    4
dtype: int64
Example 5:
If a label is not contained in the index, then an exception is raised.
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
# try to retrieve a label that does not exist
print(s['f'])
Output:
KeyError: 'f'
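One possible way to avoid the exception is sketched below, using a membership test, `Series.get` with an optional default, or an explicit `try`/`except`. The variable names are illustrative, and which approach fits best depends on the program.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

# Membership tests look at the index labels.
has_f = "f" in s

# get() returns None (or a supplied default) instead of raising KeyError.
missing = s.get("f")
fallback = s.get("f", 0)

# Alternatively, catch the exception explicitly.
try:
    value = s["f"]
except KeyError:
    value = None

print(has_f, missing, fallback, value)
```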
Data Frame:
It is a two-dimensional array with different data elements (i.e., it is a
heterogeneous collection of data elements).
Data is stored in a tabular format in the form of rows and columns.
For example, consider the following DataFrame.
Name    Dept  Semester  Percentage
Sam     ECE   I         78
Geetha  CSE   II        85
Kala    ECE   III       75
Mala    CSE   IV        70
Features Of DataFrame:
Columns can be of different types.
The size of the DataFrame can be changed (i.e., it is size-mutable).
Labeled axes (rows and columns).
Various arithmetic operations can be performed on rows and columns.
pandas.DataFrame:
A pandas DataFrame can be created using the following constructor.
pandas.DataFrame(data, index, columns, dtype, copy)
Parameters:
Data: Takes various forms such as ndarray, Series, map, list, dict, constant
and also another DataFrame.
Index: Row labels to be used for the resulting frame; this parameter is
optional.
Default is np.arange(n) if no index is passed.
Columns: Column labels; the default value is np.arange(n).
This is only true if no column labels are passed.
Dtype: Specifies the data type of each column.
Copy: Specifies whether the input data is copied. The default is False in older versions; since pandas 1.3 the default is None.
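The constructor parameters can be sketched as follows, with explicit row labels, column labels and data type, followed by a DataFrame that relies on the defaults. The array contents and label names are illustrative.

```python
import numpy as np
import pandas as pd

# Explicit row labels, column labels and data type.
df = pd.DataFrame(
    data=np.array([[1, 2], [3, 4]]),
    index=["row1", "row2"],
    columns=["a", "b"],
    dtype="float64",
)

# With no index or columns, integer labels 0..n-1 are generated for both.
default_df = pd.DataFrame(np.array([[1, 2], [3, 4]]))

print(df)
print(default_df)
```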
DataFrame Creation:
A pandas DataFrame can be created by using various
inputs like:
Lists
Dict
Series
NumPy ndarrays
Another DataFrame
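Creation from a NumPy ndarray, from another DataFrame and from a single Series can be sketched as follows. All names and values are illustrative.

```python
import numpy as np
import pandas as pd

# From a 2D NumPy ndarray, with column labels supplied.
arr = np.array([[1, 2], [3, 4]])
from_array = pd.DataFrame(arr, columns=["a", "b"])

# From another DataFrame, keeping only the requested column.
from_frame = pd.DataFrame(from_array, columns=["b"])

# From a single Series; the Series name becomes the column label.
from_series = pd.DataFrame(pd.Series([10, 20], name="marks"))

print(from_array)
print(from_frame)
print(from_series)
```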
Data Frame Description:
The table shows the students' performance
department-wise and their percentage marks in each
semester. Data is represented in the form of rows and
columns. Each column denotes an attribute and each
row denotes a student/person.
Data Type Of Columns
Name – String
Dept – String
Semester – String
Percentage – Integer

Heterogeneous collection of data (collection of different data
elements)
Size can be changed (size-mutable)
Data can be changed (data-mutable)
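The student table and the three properties listed above can be sketched as follows. The column `Passed` and the pass mark of 75 are assumptions made for the example; the text columns are typically reported with a non-numeric dtype such as object, depending on the pandas version.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["Sam", "Geetha", "Kala", "Mala"],
    "Dept": ["ECE", "CSE", "ECE", "CSE"],
    "Semester": ["I", "II", "III", "IV"],
    "Percentage": [78, 85, 75, 70],
})

# Heterogeneous: each column has its own data type.
print(students.dtypes)

# Arithmetic on a column: the mean percentage.
mean_percentage = students["Percentage"].mean()

# Size-mutable: add a new column (hypothetical pass mark of 75).
students["Passed"] = students["Percentage"] >= 75

# Data-mutable: change a single value in place.
students.loc[3, "Percentage"] = 72

print(students)
```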
Empty DataFrame Creation:
The most basic DataFrame that can be created is an empty DataFrame.
# import the pandas library, aliased as pd
import pandas as pd
df = pd.DataFrame()
print(df)
Output:
Empty DataFrame
Columns: []
Index: []
Pandas - Example
A pandas DataFrame consists of 3 principal components,
namely the data, rows and columns.
A DataFrame can be created from a list, a dictionary or a
list of dictionaries.
import pandas as pd
lst = ['sun', 'earth', 'mars', 'venus', 'moon']
df = pd.DataFrame(lst)
print(df)
DOUBTS:
Pandas - Example
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
Output:
   0
0  1
1  2
2  3
3  4
4  5
Example
import pandas as pd
data = [['Ros',10],['Popy',12],['Sunny',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
print(df)
Output:
    Name  Age
0    Ros   10
1   Popy   12
2  Sunny   13
Create a data frame from Dict of ndarrays/lists
All the ndarrays must be of the same length.
If an index is passed, then the length of the index
should be equal to the length of the arrays.
If no index is passed, then by default the index will be
range(n), where n is the array length.
Example:
import pandas as pd
data = {'Name':['Tom', 'Sam', 'Ricky', 'Steve'],
        'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age
0    Tom   28
1    Sam   34
2  Ricky   29
3  Steve   42
Indexed Data Frame Using Arrays
Create an indexed data frame using arrays.
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],
        'Age':[28,34,29,42]}
df = pd.DataFrame(data,
                  index=['rank1','rank2','rank3','rank4'])
print(df)
Output:
        Name  Age
rank1    Tom   28
rank2   Jack   34
rank3  Steve   29
rank4  Ricky   42
Create a Data Frame from list of Dicts:
A list of dictionaries can be passed as input to create a data frame.
Example 1:
Create a DataFrame by passing a list of dictionaries.
import pandas as pd
data = [{'a': 3, 'b': 5},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data)
print(df)
Output:
   a   b     c
0  3   5   NaN
1  5  10  20.0
Create a data frame by passing a list of dictionaries and row
indices:
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
df = pd.DataFrame(data, index=['first', 'second'])
print(df)
Output:
        a   b     c
first   1   2   NaN
second  5  10  20.0
Task:
Write a python program to illustrate the use of passing
a dictionary to a dataframe.
Example 3:
Create a data frame with a list of dictionaries, row indices and column indices.
import pandas as pd
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
# With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# With two column indices, one of them with a different name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
Output:
# df1 output
        a   b
first   1   2
second  5  10
# df2 output
        a  b1
first   1 NaN
second  5 NaN
Fill in the missing code:
import pandas as ____
data _____ [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]
# With two column indices, values same as dictionary keys
df1 = _____.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# With two column indices, one of them with a different name
df2 = ______.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)
Create a data frame from dict of series:
A dictionary of Series can be passed to form a DataFrame.
The resultant index is the union of all the Series indexes passed.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3],
                       index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4],
                       index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df)
Output:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
Column Selection:
We can understand this by selecting a column from the DataFrame.
Example:
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df['one'])
DOUBTS:
Output:
a    1.0
b    2.0
c    3.0
d    NaN
Name: one, dtype: float64
Column Addition:
We can understand this concept by adding a new column to the DataFrame.
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
# Adding a new column to an existing DataFrame object with
# a column label by passing a new Series
print("Adding a new column by passing as Series:")
df['three'] = pd.Series([10,20,30],index=['a','b','c'])
print(df)
print("Adding a new column using the existing columns in DataFrame:")
df['four'] = df['one'] + df['three']
print(df)
DOUBTS:
Output:
Adding a new column by passing as Series:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Adding a new column using the existing columns in DataFrame:
   one  two  three  four
a  1.0    1   10.0  11.0
b  2.0    2   20.0  22.0
c  3.0    3   30.0  33.0
d  NaN    4    NaN   NaN
Column Deletion:
Columns can be deleted or popped.
EXAMPLE:
# Using the previous DataFrame, we will delete a column
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
     'three' : pd.Series([10,20,30], index=['a','b','c'])}
df = pd.DataFrame(d)
print("Our dataframe is:")
print(df)
# using del function
print("Deleting the first column using DEL function:")
del df['one']
print(df)
# using pop function
print("Deleting another column using POP function:")
df.pop('two')
print(df)
Output:
Our dataframe is:
   one  two  three
a  1.0    1   10.0
b  2.0    2   20.0
c  3.0    3   30.0
d  NaN    4    NaN
Deleting the first column using DEL function:
   two  three
a    1   10.0
b    2   20.0
c    3   30.0
d    4    NaN
Deleting another column using POP function:
   three
a   10.0
b   20.0
c   30.0
d    NaN
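The difference between popping and dropping can be sketched as follows: `pop` removes the column in place and returns it as a Series, while `drop` returns a new DataFrame and leaves the original unchanged. The small table and variable names are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# pop removes the column from df and returns it as a Series.
popped = df.pop("two")

# drop returns a new DataFrame; df itself still has "one" and "three".
remaining = df.drop(columns=["one"])

print(popped)
print(df)
print(remaining)
```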
Row selection, deletion and addition:
Selection by label:
import pandas as pd
d = {'one' : pd.Series([1, 2, 3],
                       index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4],
                       index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.loc['b'])
Output:
one    2.0
two    2.0
Name: b, dtype: float64
Selection by integer location:
Rows can be selected by passing the integer location to
the iloc indexer.

import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df.iloc[2])
Output:
one    3.0
two    3.0
Name: c, dtype: float64
Slice Rows:
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
     'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)
print(df[2:4])
Output:
   one  two
c  3.0    3
d  NaN    4
Addition of Rows:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])
print(df)
Output:
   a  b
0  1  2
1  3  4
0  5  6
1  7  8
Deletion of Rows:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
df = pd.concat([df, df2])
# Drop rows with label 0
df = df.drop(0)
print(df)
Output:
   a  b
1  3  4
1  7  8
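In the output above two rows disappear, because both of the original frames contributed a row labelled 0. If only one row should be removed, one option (a sketch, not the only approach) is to renumber the rows first so that every label is unique. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

combined = pd.concat([df, df2])              # row labels are 0, 1, 0, 1

# drop(0) removes every row whose label is 0.
dropped_both = combined.drop(0)

# Renumbering the rows makes the labels unique, so drop(0) removes one row.
renumbered = combined.reset_index(drop=True)  # row labels are 0, 1, 2, 3
dropped_one = renumbered.drop(0)

print(dropped_both)
print(dropped_one)
```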
What to do after internship registration:

1. Log in to www.pantechelearning.com
2. Access the videos daily for the next 30 days. Practice the
concepts and submit assignments.
3. Ask your doubts in the VIP Group. The group link is available in your
dashboard.
4. Finish all the videos and download your certificate from your
dashboard.
Internship Certificate (Sample)
30 Days Internship on Machine Learning Master Class
Reg Link: https://imjo.in/Rb6xqe
Discount Coupon Code: WELCOMEML

Happy learning
Call / whatsapp : +91 9840974408
THANKS
Further Information:

Senthil Kumar – 98409 74406
Srinivasan – 7010888841
Kumarasamy - 8925533489

www.pantechelearning.com | training@pantechmail.com
CREDITS: This presentation template was created by Slidesgo,
including icons by Flaticon, and infographics & images by Freepik
