Introduction to Python Pandas for Data Analytics
Srijith Rajamohan
Advanced Research Computing, Virginia Tech
Tuesday 19th July, 2016
Course Contents
This week:
Introduction to Python
Python Programming
NumPy
Plotting with Matplotlib
Section 1
1 Introduction to Python
2 Python programming
3 NumPy
4 Matplotlib
5 Introduction to Pandas
6 Case study
7 Conclusion
Python Features
Why Python?
Interpreted
Intuitive and minimalistic code
Expressive language
Dynamically typed
Python Features
Advantages
Ease of programming
Minimizes the time to develop and maintain code
Modular and object-oriented
Large community of users
A large standard and user-contributed library
Disadvantages
Interpreted and therefore slower than compiled languages
Decentralized with packages
Code Performance vs Development Time
[Figure: code performance vs development time]
Versions of Python
Example
$ python --version
Section 2: Python programming
Variables
Variable names can contain alphanumeric characters and underscores, and cannot start with a digit
It is common to have variable names start with a lower-case letter and class names start with a capital letter
Some keywords are reserved, such as and, assert, break, lambda. A list of keywords is located at https://docs.python.org/2.5/ref/keywords.html
Python is dynamically typed: the type of a variable is derived from the value it is assigned
A variable is assigned using the = operator
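The dynamic typing and reserved-keyword points above can be checked interactively. This is a minimal sketch; the variable name `answer` is illustrative and not taken from the slides, and `keyword` is a standard-library module.

```python
import keyword

# The same name can be rebound to values of different types.
answer = 42
first_type = type(answer).__name__     # 'int'
answer = "forty-two"
second_type = type(answer).__name__    # 'str'
print(first_type, second_type)

# Reserved words cannot be used as variable names.
print(keyword.iskeyword("lambda"))     # a reserved keyword
print(keyword.iskeyword("answer"))     # an ordinary identifier
```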
Variable types
Variable types
Integer (int)
Float (float)
Boolean (bool)
Complex (complex)
String (str)
...
User Defined! (classes)
Documentation
https://docs.python.org/2/library/types.html
https://docs.python.org/2/library/datatypes.html
Variable types
Example
>>> log_file = open("/home/srijithr/logfile", "r")
>>> type(log_file)  # Python 2 output; Python 3 reports _io.TextIOWrapper
file
Variable types
Variables can be cast to a different type
Example
>>> share_of_rent = 295.50 / 2.0
>>> type(share_of_rent)
float
>>> rounded_share = int(share_of_rent)  # int() truncates the fractional part
>>> type(rounded_share)
int
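Casting with `int()` drops the fractional part rather than rounding. The sketch below reuses the slide's `share_of_rent` value to contrast truncation with `round()`; the other variable names are illustrative assumptions.

```python
share_of_rent = 295.50 / 2.0           # 147.75

truncated = int(share_of_rent)         # drops the fractional part
nearest = round(share_of_rent)         # rounds to the nearest integer
as_text = str(share_of_rent)           # numbers can also be cast to strings
back_to_float = float(as_text)         # and strings back to numbers

print(truncated, nearest, as_text, back_to_float)
```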
Operators
Arithmetic operators: +, -, *, /, // (floor division, which also applies to floating point numbers), ** (power)
Boolean operators: and, or and not
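A short sketch of these operators in action, assuming Python 3 semantics for `/` (in Python 2, dividing two integers with `/` performs floor division instead). The variable names are illustrative.

```python
true_division = 7 / 2            # 3.5 in Python 3
floor_division = 7 // 2          # 3
float_floor = 7.0 // 2           # 3.0, floor division also works on floats
negative_floor = -7 // 2         # -4, the result is floored, not truncated
power = 2 ** 10                  # 1024
logic = (3 > 2) and not (2 > 3)  # boolean operators combine comparisons

print(true_division, floor_division, float_floor, negative_floor, power, logic)
```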
Strings (str)
Example
>>> dir(str)
[..., 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith',
 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit',
 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower',
 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition',
 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase',
 'title', 'translate', 'upper', 'zfill']
Strings
Example
>>> greeting = "Hello world!"
>>> len(greeting)
12
>>> greeting
'Hello world!'
>>> greeting[0]  # indexing starts at 0
'H'
>>> greeting.replace("world", "test")
'Hello test!'
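A few more of the `str` methods listed by `dir(str)` can be tried on the same `greeting` string. This is a sketch; the selection of methods and the result names are assumptions made for illustration.

```python
greeting = "Hello world!"

first_word = greeting[:5]            # slicing: characters 0 to 4
last_char = greeting[-1]             # negative indices count from the end
words = greeting.split()             # split on whitespace
shouted = greeting.upper()           # strings are immutable, so this is a new string
rejoined = "-".join(words)           # join a list of strings with a separator
found_at = greeting.find("world")    # index of the first match, or -1 if absent

print(first_word, last_char, words, shouted, rejoined, found_at)
```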
Printing strings
Example
# print joins its arguments with a space
>>> print("Go", "Hokies")
Go Hokies
# concatenated without space
>>> print("Go" + "Tech" + "Go")
GoTechGo
# C-style string formatting
>>> print("Bar Tab = %f" % 35.28)
Bar Tab = 35.280000
# Creating a formatted string
>>> total = "My Share = %.2f. Tip = %d" % (11.76, 2.352)
>>> print(total)
My Share = 11.76. Tip = 2
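The same formatted string can also be built with `str.format`, which appears in the `dir(str)` listing, or with an f-string (an assumption here: f-strings require Python 3.6 or later). A minimal sketch with illustrative variable names:

```python
share, tip = 11.76, 2.352

# C-style formatting, as on the slide; %d keeps only the integer part of tip.
old_style = "My Share = %.2f. Tip = %d" % (share, tip)

# str.format needs an explicit int() because {:d} rejects floats.
new_style = "My Share = {:.2f}. Tip = {:d}".format(share, int(tip))

# f-string equivalent (Python 3.6+).
f_style = f"My Share = {share:.2f}. Tip = {int(tip)}"

print(old_style)
print(new_style)
print(f_style)
```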
Lists
Array of elements of arbitrary type
Example
>>> numbers = [1, 2, 3]
>>> type(numbers)
list
>>> arbitrary_array = [1, numbers, "hello"]
>>> type(arbitrary_array)
list
Lists
Example
# create a new empty list
>>> characters = []
# add elements using append
>>> characters.append("A")
>>> characters.append("d")
>>> characters.append("d")
>>> print(characters)
['A', 'd', 'd']
Lists
Lists are mutable - their values can be changed.
Example
>>> characters = ["A", "d", "d"]
# Changing second and third element
>>> characters[1] = "p"
>>> characters[2] = "p"
>>> print(characters)
['A', 'p', 'p']
Lists
Example
Tuples
Tuples are like lists except that they are immutable. The other difference is in performance.
Example
>>> point = (10, 20)  # Note () for tuples instead of []
>>> type(point)
tuple
>>> point = 10, 20
>>> type(point)
tuple
>>> point[1] = 40  # This will fail!
TypeError: 'tuple' object does not support item assignment
Dictionary
Dictionaries are collections of key-value pairs
Example
>>> prices = {"Eggs": 2.30,
...           "Sausage": 4.15,
...           "Spam": 1.59,}
>>> type(prices)
dict
>>> print(prices)
{'Eggs': 2.3, 'Sausage': 4.15, 'Spam': 1.59}
>>> prices["Spam"]
1.59
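Beyond lookup by key, dictionaries support insertion, membership tests, safe lookup and iteration. The sketch below extends the `prices` example; the "Bacon" and "Toast" entries are hypothetical values added for illustration.

```python
prices = {"Eggs": 2.30, "Sausage": 4.15, "Spam": 1.59}

prices["Bacon"] = 3.75                   # add a new key-value pair (hypothetical item)
has_spam = "Spam" in prices              # membership tests check the keys
toast_price = prices.get("Toast", 0.0)   # get() returns a default instead of raising KeyError

# Iterate over key-value pairs.
for item, price in prices.items():
    print(item, price)

item_count = len(prices)
```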
Conditional statements: if, elif, else
Example
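A minimal sketch of an if / elif / else chain follows. The function name `describe_price` and its thresholds are assumptions chosen for illustration; the prices reuse values from the dictionary example.

```python
def describe_price(price):
    """Classify a price with if / elif / else (thresholds are arbitrary)."""
    if price < 2.0:
        label = "cheap"
    elif price < 4.0:
        label = "moderate"
    else:
        label = "expensive"
    return label

# Only the first branch whose condition is true is executed.
for item, price in [("Spam", 1.59), ("Eggs", 2.30), ("Sausage", 4.15)]:
    print(item, "is", describe_price(price))
```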
Loops - For
Example
>>> for i in [1, 2, 3]:  # i is an arbitrary variable for use within the loop section
...     print(i)
1
2
3
>>> for word in ["scientific", "computing", "with", "python"]:
...     print(word)
scientific
computing
with
python
Loops - While
Example
>>> i = 0
>>> while i < 5:
...     print(i)
...     i = i + 1
0
1
2
3
4
Functions
Example
>>> def print_word_length(word):
...     """
...     Print a word and how many characters it has
...     """
...     print(word + " has " + str(len(word)) + " characters.")
>>> print_word_length("Diversity")
Diversity has 9 characters.
Functions - arguments
Passing immutable arguments like integers, strings or tuples acts like call-by-value
They cannot be modified!
Passing mutable arguments like lists behaves like call-by-reference
Functions - arguments
Call-by-value
Example
>>> def make_me_rich(balance):
...     balance = 1000000
>>> account_balance = 500
>>> make_me_rich(account_balance)
>>> print(account_balance)
500
Functions - arguments
Call-by-reference
Example
>>> def talk_to_advisor(tasks):
...     tasks.insert(0, "Publish")
...     tasks.insert(1, "Publish")
...     tasks.insert(2, "Publish")
>>> todos = ["Graduate", "Get a job", "...", "Profit!"]
>>> talk_to_advisor(todos)
>>> print(todos)
['Publish', 'Publish', 'Publish', 'Graduate', 'Get a job', '...', 'Profit!']
Functions - arguments
However, assigning a new object to the argument does not change the caller's variable
A new memory location is created for the new list
The argument name becomes a local variable bound to that new list
Example
>>> def switcheroo(favorite_teams):
...     print(favorite_teams)
...     favorite_teams = ["Redskins"]
...     print(favorite_teams)
>>> my_favorite_teams = ["Hokies", "Nittany Lions"]
>>> switcheroo(my_favorite_teams)
['Hokies', 'Nittany Lions']
['Redskins']
>>> print(my_favorite_teams)
['Hokies', 'Nittany Lions']
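The difference between mutating an argument and rebinding it can be made explicit by checking object identity. A minimal sketch; the function names `append_in_place` and `rebind` are hypothetical and not from the slides.

```python
def append_in_place(items):
    items.append("Publish")      # mutates the object the caller passed in

def rebind(items):
    items = ["Redskins"]         # rebinds the local name to a brand-new list
    return items

todos = ["Graduate"]
append_in_place(todos)           # the caller sees the change

teams = ["Hokies", "Nittany Lions"]
new_list = rebind(teams)         # the caller's list is untouched

print(todos)
print(teams, new_list, new_list is teams)
```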
Functions - Multiple Return Values
Example
>>> print(squared)
9
>>> print(cubed)
27
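One way to obtain `squared` and `cubed` values like those printed above is to return a tuple and unpack it. This is a sketch under assumptions: the function name `powers` and the argument 3 are hypothetical, chosen only because they reproduce the printed 9 and 27.

```python
def powers(x):
    """Return the square and the cube of x as a tuple."""
    return x ** 2, x ** 3        # the comma builds a tuple

# Tuple unpacking assigns each returned value to its own name.
squared, cubed = powers(3)
print(squared)
print(cubed)
```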
Functions - Default Values
Example
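Default values let callers omit arguments or override them by keyword. The sketch below is an assumption-laden illustration: the function `bar_tab` and its parameters `tip_rate` and `people` are hypothetical names, not taken from the slides.

```python
def bar_tab(subtotal, tip_rate=0.25, people=1):
    """Return each person's share of a bill, including the tip."""
    return subtotal * (1 + tip_rate) / people

default_share = bar_tab(30.0)                 # uses both defaults
split_share = bar_tab(30.0, people=2)         # overrides one default by keyword
custom_tip = bar_tab(30.0, tip_rate=0.5)      # overrides the other

print(default_share, split_share, custom_tip)
```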
Section 3: NumPy
NumPy
Used in almost all numerical computations in Python
Used for high-performance vector and matrix computations
Provides fast precompiled functions for numerical routines
Written in C and Fortran
Vectorized computations
Why NumPy?
Example
>>> from numpy import *
>>> import time
>>> def trad_version():
...     t1 = time.time()
...     X = range(10000000)
...     Y = range(10000000)
...     Z = []
...     for i in range(len(X)):
...         Z.append(X[i] + Y[i])
...     return time.time() - t1
>>> trad_version()
1.9738149642944336
Why NumPy?
Example
>>> def numpy_version():
...     t1 = time.time()
...     X = arange(10000000)
...     Y = arange(10000000)
...     Z = X + Y
...     return time.time() - t1
>>> numpy_version()
0.059307098388671875
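The comparison can be repeated with an explicit `import numpy as np` and `time.perf_counter`, which is better suited to timing than `time.time`. This is a sketch, not the slides' benchmark: the function names and the smaller problem size are assumptions, and actual timings depend on the machine.

```python
import time
import numpy as np

def loop_version(n):
    """Add two sequences element by element in pure Python."""
    x, y = list(range(n)), list(range(n))
    start = time.perf_counter()
    z = []
    for i in range(n):
        z.append(x[i] + y[i])
    return z, time.perf_counter() - start

def vector_version(n):
    """Add two arrays with a single vectorized operation."""
    x, y = np.arange(n), np.arange(n)
    start = time.perf_counter()
    z = x + y
    return z, time.perf_counter() - start

n = 100000   # smaller than on the slides so the sketch runs quickly
loop_result, loop_time = loop_version(n)
vector_result, vector_time = vector_version(n)
print("loop: %.4f s, numpy: %.4f s" % (loop_time, vector_time))
```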
Arrays
Example
>>> from numpy import *
# the argument to the array function is a Python list
>>> v = array([1, 2, 3, 4])
# the argument to the array function is a nested Python list
>>> M = array([[1, 2], [3, 4]])
>>> type(v), type(M)
(numpy.ndarray, numpy.ndarray)
Arrays
Example
>>> v.shape, M.shape
((4,), (2, 2))
>>> M.size
4
>>> M.dtype
dtype('int64')
# Explicitly define the type of the array
>>> M = array([[1, 2], [3, 4]], dtype=complex)
Arrays - Using array-generating functions
Example
>>> x = arange(0, 10, 1)  # arguments: start, stop, step
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> linspace(0, 10, 11)  # arguments: start, end and number of points (start and end points are included)
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])
Diagonal and Zero matrix
Example
>>> diag([1, 2, 3])
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])
>>> zeros((3, 3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
Array Access
Example
>>> M = random.rand(3, 3)
>>> M
array([[ 0.37389376,  0.64335721,  0.12435669],
       [ 0.01444674,  0.13963834,  0.36263224],
       [ 0.00661902,  0.14865659,  0.75066302]])
>>> M[1, 1]
0.13963834214755588
Array Access
Example
# Access the second row (index 1; indexing starts at 0)
>>> M[1]
array([ 0.01444674,  0.13963834,  0.36263224])
# The same row can also be accessed using this notation
>>> M[1, :]
array([ 0.01444674,  0.13963834,  0.36263224])
# Access the second column (index 1)
>>> M[:, 1]
array([ 0.64335721,  0.13963834,  0.14865659])
Array Access
Example
# You can also assign values to an entire row or column
>>> M[1, :] = 0
>>> M
array([[ 0.37389376,  0.64335721,  0.12435669],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.00661902,  0.14865659,  0.75066302]])
Array Slicing
Example
# Extract slices of an array
>>> M[1:3]
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.00661902,  0.14865659,  0.75066302]])
>>> M[1:3, 1:2]
array([[ 0.        ],
       [ 0.14865659]])
Array Slicing - Negative Indexing
Example
# Negative indices start counting from the end of the array
>>> M[-2]
array([ 0.,  0.,  0.])
>>> M[-1]
array([ 0.00661902,  0.14865659,  0.75066302])
Array Access - Strided Access
Example
# Strided access
>>> M[::2, ::2]
array([[ 0.37389376,  0.12435669],
       [ 0.00661902,  0.75066302]])
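Because the slides use random values, the indexing rules are easier to verify on an array with known contents. The sketch below assumes a 3x3 array built with `np.arange(9).reshape(3, 3)`, which is not part of the slides.

```python
import numpy as np

# A 3x3 array holding 0..8, so every result can be checked by eye.
M = np.arange(9).reshape(3, 3)

element = M[1, 1]        # row index 1, column index 1
second_row = M[1, :]     # all columns of row index 1
second_col = M[:, 1]     # all rows of column index 1
block = M[1:3, 1:2]      # rows 1-2, column 1; slices keep both dimensions
last_row = M[-1]         # negative indices count from the end
corners = M[::2, ::2]    # every second row and every second column

print(element, second_row, second_col, last_row)
print(block)
print(corners)
```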
Array Operations - Scalar
These operations are applied to all the elements in the array
Example
>>> M * 2
array([[ 0.74778752,  1.28671443,  0.24871338],
       [ 0.        ,  0.        ,  0.        ],
       [ 0.01323804,  0.29731317,  1.50132603]])
>>> M + 2
array([[ 2.37389376,  2.64335721,  2.12435669],
       [ 2.        ,  2.        ,  2.        ],
       [ 2.00661902,  2.14865659,  2.75066302]])
Matrix multiplication
Example
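With NumPy arrays, `*` multiplies element by element, while matrix products use `np.dot` (or the `@` operator, assuming Python 3.5 or later). A minimal sketch with small, illustrative inputs:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
v = np.array([1, 2])

elementwise = A * A              # multiplies matching elements
matrix_product = np.dot(A, A)    # true matrix-matrix product
matrix_vector = np.dot(A, v)     # matrix-vector product
with_operator = A @ A            # same as np.dot(A, A) for 2D arrays

print(elementwise)
print(matrix_product)
print(matrix_vector)
```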
Iterating over Array Elements
In general, avoid iteration over elements
Iterating is slow compared to a vector operation
If you must, use the for loop
In order to enable vectorization, ensure that user-written functions can work with vector inputs.
Use the vectorize function
Use the any or all function with arrays
Vectorize
Example
>>> def Theta(x):
...     """
...     Scalar implementation of the Heaviside step function.
...     """
...     if x >= 0:
...         return 1
...     else:
...         return 0
...
>>> Theta(1.0)
1
>>> Theta(-1.0)
0
Vectorize
Without vectorize we would not be able to pass v to the function
Example
>>> v
array([1, 2, 3, 4])
>>> Tvec = vectorize(Theta)
>>> Tvec(v)
array([1, 1, 1, 1])
>>> Tvec(1.0)
array(1)
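`np.vectorize` is mainly a convenience wrapper that still calls the Python function once per element, so a natively vectorized expression is often preferable when one exists. The sketch below compares three ways of evaluating the step function; the test array and the lowercase names are assumptions for illustration.

```python
import numpy as np

def theta(x):
    """Scalar Heaviside step function, as on the slides."""
    return 1 if x >= 0 else 0

v = np.array([-2.0, -0.5, 0.0, 1.5])

via_vectorize = np.vectorize(theta)(v)     # wraps the scalar function
via_where = np.where(v >= 0, 1, 0)         # element-wise selection
via_compare = (v >= 0).astype(int)         # boolean mask cast to integers

print(via_vectorize, via_where, via_compare)
```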
Arrays in conditions
Use the any or all functions associated with arrays
Example
>>> v
array([1, 2, 3, 4])
>>> (v > 3).any()
True
>>> (v > 3).all()
False
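The reason `any` or `all` is needed is that a multi-element boolean array has no single truth value, so using it directly in an `if` raises a `ValueError`. A minimal sketch; the flag and message names are illustrative.

```python
import numpy as np

v = np.array([1, 2, 3, 4])

# Using the comparison result directly in a condition is ambiguous.
try:
    if v > 3:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

# any() and all() reduce the boolean array to a single value.
if (v > 3).any():
    message = "at least one element exceeds 3"
else:
    message = "no element exceeds 3"

print(ambiguous, message)
```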
Section 4: Matplotlib
Matplotlib
Used for generating 2D and 3D scientific plots
Support for LaTeX
Fine-grained control over every aspect
Many output file formats including PNG, PDF, SVG, EPS
Matplotlib - Customize matplotlibrc
Use matplotlib.matplotlib_fname() to determine from where the current matplotlibrc is loaded
Customization options can be found at http://matplotlib.org/users/customizing.html
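Settings can also be inspected and overridden at runtime through `matplotlib.rcParams`, without editing the matplotlibrc file. A small sketch; the choice of `lines.linewidth` and the value 2.0 are arbitrary assumptions.

```python
import matplotlib

# Location of the matplotlibrc file currently in effect.
rc_path = matplotlib.matplotlib_fname()
print(rc_path)

# rcParams behaves like a dictionary of the active settings.
original_width = matplotlib.rcParams["lines.linewidth"]
matplotlib.rcParams["lines.linewidth"] = 2.0   # applies to the current session only
print(original_width, matplotlib.rcParams["lines.linewidth"])
```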
Matplotlib
Matplotlib is the entire library
Pyplot - a module within Matplotlib that provides access to the underlying plotting library
Pylab - a convenience module that combines the functionality of Pyplot with NumPy
The Pylab interface is convenient for interactive plotting
Pylab
Example
>>> import pylab as pl
>>> pl.ioff()
>>> pl.isinteractive()
False
>>> x = [1, 3, 7]
>>> pl.plot(x)  # if interactive mode is off use show() after the plot command
[<matplotlib.lines.Line2D object at 0x10437a190>]
>>> pl.savefig('fig_test.pdf', dpi=600, format='pdf')
>>> pl.show()
Pylab
[Figure: "Simple Pylab plot"]
Pylab
Example
>>> import numpy as np
>>> X = np.linspace(-np.pi, np.pi, 256, endpoint=True)
>>> C, S = np.cos(X), np.sin(X)
# Plot cosine with a blue continuous line of width 1 (points)
>>> pl.plot(X, C, color="blue", linewidth=1.0, linestyle="-")
>>> pl.xlabel("X"); pl.ylabel("Y")
>>> pl.title("Sine and Cosine waves")
# Plot sine with a green continuous line of width 1 (points)
>>> pl.plot(X, S, color="green", linewidth=1.0, linestyle="-")
>>> pl.show()
Pylab
[Figure: "Sine and Cosine waves", x-axis X, y-axis Y]
Pylab - subplots
Example
>>> pl.figure(figsize=(8, 6), dpi=80)
>>> pl.subplot(1, 2, 1)
>>> C, S = np.cos(X), np.sin(X)
>>> pl.plot(X, C, color="blue", linewidth=1.0, linestyle="-")
>>> pl.subplot(1, 2, 2)
>>> pl.plot(X, S, color="green", linewidth=1.0, linestyle="-")
>>> pl.show()
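The same two-panel figure can be built with Matplotlib's object-oriented interface, where `plt.subplots` returns the figure and its axes explicitly. This is a sketch, not the slides' approach: the "Agg" backend, the axis titles and the in-memory PNG buffer are assumptions made so the example runs without a display or output file.

```python
import io

import matplotlib
matplotlib.use("Agg")   # non-interactive backend, assumed so no display is required
import matplotlib.pyplot as plt
import numpy as np

X = np.linspace(-np.pi, np.pi, 256, endpoint=True)

# One row, two columns; each axes object is plotted on directly.
fig, (ax_cos, ax_sin) = plt.subplots(1, 2, figsize=(8, 6), dpi=80)
ax_cos.plot(X, np.cos(X), color="blue", linewidth=1.0, linestyle="-")
ax_cos.set_title("Cosine")
ax_sin.plot(X, np.sin(X), color="green", linewidth=1.0, linestyle="-")
ax_sin.set_title("Sine")

# Render to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```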
Pylab - subplots
[Figure: two side-by-side subplots]
Pyplot
Example
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> plt.isinteractive()
False
>>> x = np.linspace(0, 3*np.pi, 500)
>>> plt.plot(x, np.sin(x**2))
[<matplotlib.lines.Line2D object at 0x104bf2b10>]
>>> plt.title('Pyplot plot')
<matplotlib.text.Text object at 0x104be4450>
>>> plt.savefig('fig_test_pyplot.pdf', dpi=600, format='pdf')
>>> plt.show()
Pyplot
[Figure: "Pyplot plot"]
Pyplot - legend
Example
>>> import matplotlib.pyplot as plt
>>> line_up, = plt.plot([1, 2, 3], label='Line 2')
>>> line_down, = plt.plot([3, 2, 1], label='Line 1')
>>> plt.legend(handles=[line_up, line_down])
<matplotlib.legend.Legend at 0x1084cc950>
>>> plt.show()
Pyplot - legend
[Figure: two lines with legend entries "Line 2" and "Line 1"]
Pyplot - 3D plots
Surface plots
[Figure: surface plot]
Visit http://matplotlib.org/gallery.html for a gallery of plots produced by Matplotlib
Section 5: Introduction to Pandas
What is Pandas?
Pandas is an open source, BSD-licensed library
High-performance, easy-to-use data structures and data analysis tools
Built for the Python programming language
Pandas - import modules
Example
>>> from pandas import DataFrame, read_csv
# General syntax to import a library but no functions:
>>> import pandas as pd  # this is how I usually import pandas
Pandas - Create a dataframe
Example
>>> d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
...      'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
>>> df = pd.DataFrame(d)
>>> df
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
Pandas - Create a dataframe
Example
>>> names = ['Bob', 'Jessica', 'Mary', 'John', 'Mel']
>>> births = [968, 155, 77, 578, 973]
# To merge these two lists together we will use the zip function.
>>> BabyDataSet = list(zip(names, births))
>>> BabyDataSet
[('Bob', 968), ('Jessica', 155), ('Mary', 77), ('John', 578), ('Mel', 973)]
Pandas - Create a data frame and write to a csv file
Use the pandas module to create a dataset.
Example
>>> df = pd.DataFrame(data=BabyDataSet, columns=['Names', 'Births'])
>>> df.to_csv('births1880.csv', index=False, header=False)
Pandas - Read data from a file
Import data from the csv file
Example
>>> df = pd.read_csv(filename)
# Don't treat the first row as a header
>>> df = pd.read_csv(Location, header=None)
# Provide specific names for the columns
>>> df = pd.read_csv(Location, names=['Names', 'Births'])
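Writing and reading can be checked as a round trip. The sketch below reuses the slides' names and births but writes to an in-memory `io.StringIO` buffer instead of a file on disk; that substitution is an assumption made to keep the example self-contained.

```python
import io

import pandas as pd

names = ["Bob", "Jessica", "Mary", "John", "Mel"]
births = [968, 155, 77, 578, 973]
df = pd.DataFrame(data=list(zip(names, births)), columns=["Names", "Births"])

# Write without the index and header row, as on the slides.
buffer = io.StringIO()
df.to_csv(buffer, index=False, header=False)

# Because no header was written, column names are supplied when reading back.
buffer.seek(0)
restored = pd.read_csv(buffer, names=["Names", "Births"])
print(restored)
```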
Pandas - Get data types
Example
# Check data type of the columns
>>> df.dtypes
Names     object
Births     int64
dtype: object
# Check data type of Births column
>>> df.Births.dtype
dtype('int64')
Pandas - Take a look at the data
Example
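A common way to take a first look at a data frame is with `head`, `tail` and `shape`. This is a sketch that assumes the names-and-births data from the earlier slides; it is not necessarily what this slide originally showed.

```python
import pandas as pd

df = pd.DataFrame({
    "Names": ["Bob", "Jessica", "Mary", "John", "Mel"],
    "Births": [968, 155, 77, 578, 973],
})

first_rows = df.head(3)      # first three rows
last_rows = df.tail(2)       # last two rows
n_rows, n_cols = df.shape    # (rows, columns)

print(first_rows)
print(last_rows)
print(n_rows, n_cols)
```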
Pandas - Take a look at the data
Example
>>> df.values
array([['Bob', 968],
       ['Jessica', 155],
       ['Mary', 77],
       ['John', 578],
       ['Mel', 973]], dtype=object)
>>> df.index
Int64Index([0, 1, 2, 3, 4], dtype='int64')
Pandas - Working on the data
Example
Pandas - Describe the data
Example
>>> df['Names'].unique()
array(['Mary', 'Jessica', 'Bob', 'John', 'Mel'], dtype=object)
>>> print(df['Names'].describe())
count     1000
unique       5
top        Bob
freq       206
Name: Names, dtype: object
Pandas - Add a column
Example
>>> d = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# Create dataframe
>>> df = pd.DataFrame(d)
# Name the column
>>> df.columns = ['Rev']
# Add another one and set the value in that column
>>> df['NewCol'] = 5
Pandas - Accessing and indexing the data
Example
# Perform operations on columns
>>> df['NewCol'] = df['NewCol'] + 1
# Delete a column
>>> del df['NewCol']
# Edit the index name
>>> i = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
>>> df.index = i
Pandas - Accessing and indexing the data
Example
Pandas - Accessing and indexing the data
Example
# Find based on index value
>>> df.at['a', 'Rev']
0
>>> df.iat[0, 0]
0
Pandas - Accessing and indexing for loc
A slice of labels (note that, contrary to usual Python slices, both the start and the stop are included!)
A boolean array
Pandas - Accessing and indexing for iloc
A slice object with ints, e.g. 1:7
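The difference between label slices and integer slices is easiest to see side by side. The sketch below rebuilds the single-column `Rev` frame with the index `a` to `j` from the earlier slides; the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Rev": range(10)}, index=list("abcdefghij"))

by_label = df.loc["b":"d"]         # label slice: both 'b' and 'd' are included
by_position = df.iloc[1:3]         # integer slice: position 3 is excluded
by_mask = df.loc[df["Rev"] > 7]    # boolean array selects matching rows
scalar = df.at["a", "Rev"]         # fast scalar access by label

print(by_label)
print(by_position)
print(by_mask)
print(scalar)
```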
Pandas - Accessing and indexing summarized
Example
loc: works on index labels only
iloc: works on integer positions
ix: the most general; supports both index- and position-based retrieval (deprecated in later pandas releases and removed in pandas 1.0)
at: gets scalar values; it's a very fast loc
iat: gets scalar values; it's a very fast iloc
Pandas - Missing data
How do you deal with data that is missing or contains NaNs?
Example
>>> df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'],
...                   columns=['one', 'two', 'three'])
>>> df.loc['a', 'two'] = np.nan
>>> df
        one       two     three
a -1.192838       NaN -0.337037
c  0.110718 -0.016733 -0.137009
e  0.153456  0.266369 -0.064127
f  1.709607 -0.424790 -0.792061
h -1.076740 -0.872088 -0.436127
Pandas - Missing data
How do you deal with data that is missing or contains NaNs?
Example
>>> df.isnull()
     one    two  three
a  False   True  False
c  False  False  False
e  False  False  False
f  False  False  False
h  False  False  False
Pandas - Missing data
You can fill this data in a number of ways.
Example
>>> df.fillna(0)
        one       two     three
a -1.192838  0.000000 -0.337037
c  0.110718 -0.016733 -0.137009
e  0.153456  0.266369 -0.064127
f  1.709607 -0.424790 -0.792061
h -1.076740 -0.872088 -0.436127
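Filling with a constant is only one option. The sketch below shows a few alternatives on a small made-up frame; which one is appropriate depends on the data, so treat these as illustrations rather than recommendations.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a single missing value
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [np.nan, 4.0, 6.0]},
    index=["a", "c", "e"],
)

n_missing = int(df.isnull().sum().sum())   # count NaNs in the whole frame

filled_zero = df.fillna(0)                 # constant fill
filled_mean = df.fillna(df.mean())         # fill with each column's mean
back_filled = df.bfill()                   # copy the next valid value upward
dropped = df.dropna()                      # or drop incomplete rows entirely
```

None of these calls modifies df; each returns a new frame.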
Pandas - Query the data
Also, use the query method where you can embed boolean expressions on columns within quotes
Example
>>> df.query('one > 0')
        one       two     three
c  0.110718 -0.016733 -0.137009
e  0.153456  0.266369 -0.064127
f  1.709607 -0.424790 -0.792061
>>> df.query('one > 0 & two > 0')
        one       two     three
e  0.153456  0.266369 -0.064127
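For comparison, here is a sketch of the same kind of filter written both as a query string and as a boolean mask. The frame and the threshold variable are made up for illustration; the '@' prefix is how query refers to a local Python variable.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [-1.0, 0.5, 2.0], "two": [0.3, -0.2, 0.4]},
    index=["a", "c", "e"],
)

threshold = 0  # a local variable, referenced inside query() with '@'

via_query = df.query("one > @threshold and two > @threshold")

# Equivalent boolean-mask form; each comparison needs its own parentheses
via_mask = df[(df["one"] > threshold) & (df["two"] > threshold)]
```

In the mask form the parentheses are required because & binds more tightly than > in plain Python.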
Pandas - Apply a function
Example
three 0.727934
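A minimal sketch of DataFrame.apply on a made-up frame. It is not the example from the slide; the functions and values are assumptions chosen to show that apply passes one whole column (or, with axis=1, one whole row) to the function.

```python
import pandas as pd

df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [10.0, 20.0, 60.0]})

# By default the function receives one column (a Series) at a time
col_range = df.apply(lambda col: col.max() - col.min())

# With axis=1 the function receives one row at a time
row_total = df.apply(lambda row: row["one"] + row["two"], axis=1)
```

The result has one entry per column in the first call and one entry per row in the second.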
Pandas - Applymap a function
You can apply any function element-wise to the data in a dataframe
Example
>>> df.applymap(np.sqrt)
        one       two  three
a       NaN       NaN    NaN
c  0.332742       NaN    NaN
e  0.391735  0.516109    NaN
f  1.307520       NaN    NaN
h       NaN       NaN    NaN
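The NaN entries above come from taking the square root of negative numbers. The sketch below reproduces that on a made-up frame and also shows an element-wise Python function. Recent pandas releases (2.1 and later) provide the element-wise method as DataFrame.map, so the code picks whichever name is available; the helper name elementwise is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"one": [4.0, -1.0], "two": [9.0, 16.0]})

# Use DataFrame.map where it exists, otherwise fall back to applymap
elementwise = getattr(df, "map", None) or df.applymap

labels = elementwise(lambda x: "neg" if x < 0 else "pos")

# NumPy ufuncs can be called on the frame directly;
# np.sqrt yields NaN for negative inputs
with np.errstate(invalid="ignore"):
    roots = np.sqrt(df)
```

For NumPy functions the direct call is usually faster than an element-wise Python function, because it operates on whole arrays.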
Pandas - Query data
Determine if certain values exist in a Series or dataframe
Example
>>> s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
>>> s.isin([2, 4, 6])
4    False
3    False
2     True
1    False
0     True
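The boolean result of isin is most often used as a row filter. A small sketch, using a made-up frame:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "baz", "foo"], "C": [1, 2, 3, 4]})

mask = df["A"].isin(["foo", "baz"])   # boolean Series, one entry per row
selected = df[mask]                   # rows whose A is in the list
excluded = df[~mask]                  # '~' inverts the mask
```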
Pandas - Query data
Use the where method
Example
>>> s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype='int64')
>>> s.where(s > 3)
4   NaN
3   NaN
2   NaN
1   NaN
0     4
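A sketch of two variations on where, using the same Series as above: supplying a replacement value instead of NaN, and the complementary mask method.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(5), index=np.arange(5)[::-1], dtype="int64")

kept = s.where(s > 3)          # entries failing the condition become NaN
replaced = s.where(s > 3, -1)  # or a replacement value of your choice
hidden = s.mask(s > 3)         # mask() is the inverse: matching entries become NaN
```

Because NaN is a floating-point value, kept and hidden are converted to float, while replaced can stay integer.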
Pandas - Grouping the data
grouped = obj.groupby([key1, key2])
Pandas - Grouping the data
Example
>>> df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
...                          'foo', 'bar', 'foo', 'foo'],
...                    'B': ['one', 'one', 'two', 'three',
...                          'two', 'two', 'one', 'three'],
...                    'C': np.random.randn(8),
...                    'D': np.random.randn(8)})
Pandas - Grouping the data
Example
>>> df
     A      B         C         D
0  foo    one  0.469112 -0.861849
1  bar    one -0.282863 -2.104569
2  foo    two -1.509059 -0.494929
3  bar  three -1.135632  1.071804
4  foo    two  1.212112  0.721555
5  bar    two -0.173215 -0.706771
6  foo    one  0.119209 -1.039575
7  foo  three -1.044236  0.271860
Pandas - Grouping the data
Group by either A or B columns or both
Example
>>> grouped = df.groupby('A')
>>> grouped = df.groupby(['A', 'B'])
# Sorts by default, disable this for potential speedup
>>> grouped = df.groupby('A', sort=False)
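To see what the sort flag changes, here is a minimal sketch on a frame with the same A and B columns as above. The C values are fixed numbers chosen for this sketch (an assumption, in place of the random data) so the results are reproducible.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],  # fixed values, not random
})

by_a = df.groupby("A")
by_a_b = df.groupby(["A", "B"])
unsorted = df.groupby("A", sort=False)

sorted_keys = [key for key, _ in by_a]          # group keys in sorted order
first_seen_keys = [key for key, _ in unsorted]  # keys in order of first appearance
n_pairs = by_a_b.ngroups                        # number of distinct (A, B) pairs
```

Only the order of the groups differs between the sorted and unsorted versions; the rows inside each group keep their original order either way.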
Pandas - Grouping the data
Get statistics for the groups
Example
>>> grouped.size()
>>> grouped.describe()
>>> grouped.count()
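The difference between size and count shows up when a group contains missing values. A sketch with fixed, made-up numbers and one NaN:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": [1.0, 2.0, 3.0, 4.0, np.nan, 6.0, 7.0, 8.0],  # one missing value in 'foo'
})
grouped = df.groupby("A")

sizes = grouped.size()              # rows per group, NaNs included
counts = grouped["C"].count()       # non-null values per group
summary = grouped["C"].describe()   # count, mean, std, min, quartiles, max
```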
Pandas - Grouping the data
Print the grouping
Example
>>> list(grouped)
[('bar',      A      B         C         D
1  bar    one -1.303028 -0.932565
3  bar  three  0.135601  0.268914
5  bar    two -0.320369  0.059366),
 ('foo',      A      B         C         D
0  foo    one  1.066805 -1.252834
2  foo    two -0.180407  1.686709
4  foo    two  0.228522 -0.457232
6  foo    one -0.553085  0.512941
7  foo  three -0.346510  0.434751)]
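Each element of that list is a (key, sub-frame) pair, so a grouping can also be walked with a for loop or queried for a single group. A sketch with fixed, made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
grouped = df.groupby("A")

rows_per_group = {}
for name, group in grouped:       # name is the key, group is a DataFrame
    rows_per_group[name] = len(group)

bar_rows = grouped.get_group("bar")   # fetch one group by its key
```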
Pandas - Grouping the data
Get the first and last elements of each grouping. Also, apply the sum function to each column
Example
>>> grouped.first()
       B         C         D
A
bar  one -1.303028 -0.932565
foo  one  1.066805 -1.252834
# Similar results can be obtained with grouped.last()
>>> grouped.sum()
            C         D
A
bar -1.487796 -0.604285
foo  0.215324  0.924336
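A sketch of first, last and sum with fixed, made-up numbers. Selecting the numeric columns before summing is a choice made in this sketch: it keeps the result independent of how a given pandas version treats text columns such as B.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "D": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
})
grouped = df.groupby("A")

first_rows = grouped.first()         # first entry of every column, per group
last_rows = grouped.last()           # last entry of every column, per group
totals = grouped[["C", "D"]].sum()   # sum only the numeric columns
```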
Pandas - Grouping the data
Example
foo 0.215324 0.924336
Pandas - Grouping the data
Apply multiple functions to a grouped column
Example
>>> grouped['C'].agg([np.sum, np.mean])
     sum  mean
A
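A sketch of multi-function aggregation with fixed, made-up numbers. It passes the function names as strings, which pandas also accepts, and adds the named-aggregation form; the output column names total_c and mean_d are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "D": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
})
grouped = df.groupby("A")

# Several functions on one column: one output column per function
stats = grouped["C"].agg(["sum", "mean"])

# Named aggregation: choose the output names and mix source columns
named = grouped.agg(total_c=("C", "sum"), mean_d=("D", "mean"))
```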
Pandas - Grouping the data
Example
>>> plt.show()
Pandas - Grouping the data
Apply a transformation to the grouping
Example
>>> f = lambda x: x*2
>>> transformed = grouped.transform(f)
>>> print(transformed)
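Doubling gives the same result with or without grouping, so the sketch below adds a transformation that does depend on the group: subtracting each group's mean. The values are fixed and made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})
grouped_c = df.groupby("A")["C"]

doubled = grouped_c.transform(lambda x: x * 2)
demeaned = grouped_c.transform(lambda x: x - x.mean())  # centre each group on zero
```

Unlike an aggregation, transform returns one value per original row, aligned with the original index.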
Pandas - Grouping the data
Apply a filter to select a group based on some criterion.
Example
>>> grouped.filter(lambda x: sum(x['C']) > 0)
     A      B         C         D
0  foo    one  1.066805 -1.252834
2  foo    two -0.180407  1.686709
4  foo    two  0.228522 -0.457232
6  foo    one -0.553085  0.512941
7  foo  three -0.346510  0.434751
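A reproducible sketch of the same idea with fixed, made-up numbers. The threshold of 20 is arbitrary and chosen so that exactly one group passes.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
    "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# The function is called once per group and must return True or False;
# all rows of a group are kept or dropped together
kept = df.groupby("A").filter(lambda g: g["C"].sum() > 20)
```

Here the foo rows sum to 24 and are kept, while the bar rows sum to 12 and are dropped.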
Section 6
Case study
Cost of College
We are going to analyze the College Scorecard data on the cost of college, provided by the federal government
https://collegescorecard.ed.gov/data/
Cost of College
Find the top 10 schools by median 10-year debt
Find the top 10 schools by median earnings
Find the top 10 schools with the best SAT scores
Find the top 10 schools with the best return on investment
Find the average median earnings per state
Compute the correlation between the SAT scores and median income
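The tasks above map onto a handful of pandas calls. The sketch below runs them on four made-up rows with hypothetical column names; the real College Scorecard files use their own column names and need cleaning first. The earnings-to-debt ratio is only one possible proxy for return on investment, chosen here as an assumption.

```python
import pandas as pd

# Made-up rows and hypothetical column names, for illustration only
schools = pd.DataFrame({
    "school": ["A", "B", "C", "D"],
    "state": ["VA", "VA", "NC", "NC"],
    "sat_avg": [1100, 1300, 1000, 1200],
    "median_earnings": [40000, 60000, 35000, 50000],
    "median_debt": [20000, 25000, 17500, 20000],
})

# Top-N by a column (N=2 here because the toy frame is tiny)
top_earnings = schools.nlargest(2, "median_earnings")

# One possible return-on-investment proxy: earnings divided by debt
schools["roi"] = schools["median_earnings"] / schools["median_debt"]
best_roi = schools.sort_values("roi", ascending=False).head(2)

# Average of the median earnings per state
earnings_by_state = schools.groupby("state")["median_earnings"].mean()

# Pearson correlation between SAT scores and median earnings
sat_income_corr = schools["sat_avg"].corr(schools["median_earnings"])
```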
Cost of College - Generate metrics and create interactive visualizations using Bokeh
Generate metrics and create interactive visualizations using Bokeh
Create an interactive choropleth visualization
Interactive Choropleth for querying and visualization
Section 7
Conclusion
Questions
Thank you for attending!