Python For Data Science
1
Part I
print('1')
print('1', '2')
print('1', '2', '3')
print('1', '2', '3', '4')  # after writing this, press Shift+Enter in your Jupyter notebook
Out:
1
1 2
1 2 3
1 2 3 4
This may be fun, but it is a long and tiring way to print such a pattern. After some experience we will write a short piece of code that does the same. Note that there are many ways to write code that all give the same output.
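As a preview of that shorter code, here is a minimal sketch that builds each row of the pattern with a `for` loop and `range` (both introduced later in this book) instead of four separate print calls:

```python
# Build each row as a string, then print it: "1", "1 2", "1 2 3", "1 2 3 4"
rows = []
for n in range(1, 5):                                 # n takes the values 1, 2, 3, 4
    row = " ".join(str(k) for k in range(1, n + 1))   # numbers 1..n separated by spaces
    rows.append(row)
    print(row)
```

Changing `range(1, 5)` to `range(1, 11)` would print ten rows with no extra typing.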
y=4
print (y)
Out: 4
2*x  # multiplication
Out: 4.4
x/2  # division
Out: 1.1
x**2  # x raised to the power of 2
Out: 4.840000000000001
# Let us put them together horizontally and add some space between values
print(2*x, ' ', x/2, ' ', x**2)
Out: 4.4 1.1 4.840000000000001
print(x+y, ' ', x*y, ' ', x/y)
Out: 6.2 8.8 0.55
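The assignment of x is not shown in this excerpt; the outputs above (for example x + y = 6.2 with y = 4) suggest x = 2.2, which is assumed in the sketch below. It also shows where 4.840000000000001 comes from: 2.2 cannot be stored exactly as a binary floating-point number, so squaring it leaves a tiny rounding error, and the built-in round() tidies the display.

```python
x = 2.2   # assumed value, inferred from the outputs above
y = 4

print(2 * x, x / 2, x ** 2)   # 4.4 1.1 4.840000000000001
print(round(x ** 2, 2))       # 4.84 after rounding to 2 decimal places
print(x + y, x * y, x / y)    # 6.2 8.8 0.55
```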
2
USER-DEFINED FUNCTIONS
$g(x) = x^2$
Out: 9
Look at this:
$h(x, y) = x + y$
The function h is like another machine that eats two variables x and y and spits out their sum.
$h(2, 3) = 2 + 3 = 5$
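In Python, such "machines" are written with `def`. A minimal sketch of the two functions above, g(x) = x^2 and h(x, y) = x + y:

```python
def g(x):
    """Square the input: g(x) = x**2."""
    return x ** 2

def h(x, y):
    """Add the two inputs: h(x, y) = x + y."""
    return x + y

print(g(3))      # 9
print(h(2, 3))   # 5
```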
    print('I will help you')
else:
    print('I will not help you')
Out: I will not help you
you to get a value greater than 4 on both dice when they are rolled. This is now your winning condition. The condition can therefore be restated as: Dice A must have a value greater than 4 and Dice B must have a value greater than 4 too.
if A > 4 and B > 4:  # notice the and
    print('You win this difficult game of two dice')
else:
    print('You lose')
Out: You win this difficult game of two dice
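The values of A and B are not shown above. A sketch that rolls two six-sided dice with Python's `random` module and applies the same winning condition; the function name `play` is only illustrative.

```python
import random

def play(A, B):
    """Return the result for dice values A and B (win only if both exceed 4)."""
    if A > 4 and B > 4:
        return 'You win this difficult game of two dice'
    else:
        return 'You lose'

A = random.randint(1, 6)   # randint includes both ends, so 1..6
B = random.randint(1, 6)
print(A, B, play(A, B))
```

Only the rolls (5, 5), (5, 6), (6, 5) and (6, 6) win, so this sketch prints a loss most of the time.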
Out:
H
e
l
l
o
The 'i' in the code touches each letter of D one by one, and every time it touches a letter, it executes the print command, which prints the letter that i currently holds.
for i in D:
    print(i, end=' ')
# Here we intend to print horizontally instead of vertically, as done in the previous cell.
Out: H e l l o
How about bringing down, element by element, the values of a list containing numbers?
E = [1, 2, 3, 4, 5]  # This is a list. We will read more about it later
for i in E:
    print(i)
Out:
1
2
3
4
5
for i in range(1, 4):
    for j in range(1, 3):
        print('Hello')
Out:
Hello
Hello
Hello
Hello
Hello
Hello
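Let us try to understand why we get six Hellos. The variable i takes the values 1, 2, 3. When i = 1, j takes the values 1, 2; the same happens for i = 2 and i = 3. Each (i, j) pair prints one Hello, and there are 3 times 2 = 6 pairs.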
i=1  j=1
     j=2
i=2  j=1
     j=2
i=3  j=1
     j=2
Out:
Hello 1 1
Hello 1 2
Hello 2 1
Hello 2 2
Hello 3 1
Hello 3 2
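The count of six can be checked directly: the outer loop runs 3 times and, for each of those, the inner loop runs 2 times. A small sketch that counts the passes instead of printing:

```python
count = 0
for i in range(1, 4):        # i = 1, 2, 3
    for j in range(1, 3):    # j = 1, 2 for every value of i
        count += 1           # one pass of the inner loop = one 'Hello'
print(count)                 # 6
```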
import pandas as pd
df = pd.DataFrame(data)
print(df)
Out:
i j
0 Hello 1 1
1 Hello 1 2
2 Hello 2 1
3 Hello 2 2
4 Hello 3 1
5 Hello 3 2
We will understand the above code later, but you can already see how each printed Hello is related to its corresponding i and j values.
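The cell above uses a variable `data` that is not defined in this excerpt. One way it could be built, assuming a dictionary of lists filled inside the nested loop; the column names 'word', 'i' and 'j' are hypothetical:

```python
import pandas as pd

data = {'word': [], 'i': [], 'j': []}   # hypothetical column names
for i in range(1, 4):
    for j in range(1, 3):
        data['word'].append('Hello')    # one row per printed Hello
        data['i'].append(i)
        data['j'].append(j)

df = pd.DataFrame(data)
print(df)
```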
6
LIST AND DICTIONARY
L_NUM = [1, 2, 3, 4]  # L_NUM is a list
L_STR = ['a', 'b', 'c', 'd']  # L_STR is a list too
L_FRUIT_DATA = ['apple', 'orange', 'mango']  # L_FRUIT_DATA is a list of fruit names
L_SALARY_DATA = [25000, 30000, 40000, 80000]  # L_SALARY_DATA is a list of the salaries of 4 employees in a company
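A few everyday operations on these lists, as a short sketch using the salary figures above and Python's built-in `len`, `sum` and `max`:

```python
L_FRUIT_DATA = ['apple', 'orange', 'mango']
L_SALARY_DATA = [25000, 30000, 40000, 80000]

print(L_FRUIT_DATA[0])        # apple  (positions start at 0)
print(L_FRUIT_DATA[-1])       # mango  (-1 means the last element)
print(len(L_SALARY_DATA))     # 4 employees
print(sum(L_SALARY_DATA))     # 175000, the total salary bill
print(max(L_SALARY_DATA))     # 80000, the highest salary
```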
PYTHON & DATA SCIENCE
7
NUMPY QUICK
L = [1, 2, 3]
L_1 = [4, 5, 6]
L + L_1
Out: [1, 2, 3, 4, 5, 6]
Well, adding two lists gives you a third list containing all the values of the first and second lists, but it does not do the mathematical (element-by-element) addition. NumPy will help us. Convert both lists to NumPy arrays and store them (storing them is optional), and then add them.
import numpy as np
N = np.array(L)  # We have converted list L into a numpy array and stored it in N
N_1 = np.array(L_1)  # We have converted list L_1 into a numpy array and stored it in N_1
N + N_1  # WE ADD THE TWO NUMPY ARRAYS HERE
Out: array([5, 7, 9])
ALL GOOD. Now we understand what a list can do and what NumPy can do.
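The difference shows up with other operators too. A sketch comparing the same operations on a list and on a NumPy array: arrays work element by element, while `*` on a list repeats it.

```python
import numpy as np

N = np.array([1, 2, 3])
N_1 = np.array([4, 5, 6])

print(N + N_1)      # [5 7 9]     element-wise addition
print(N * N_1)      # [ 4 10 18]  element-wise multiplication
print(N * 2)        # [2 4 6]     every element doubled
print([1, 2, 3] * 2)  # [1, 2, 3, 1, 2, 3]  a list is repeated, not doubled
```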
8
PANDAS QUICK
dic_2 = {'Ram_toys': [1, 2, 3], 'Shyam_toys': [4, 5, 6], 'Radha_toys': [7, 8, 9]}  # This is a key-number dictionary.
# Here the keys are 'Ram_toys', 'Shyam_toys' and 'Radha_toys', and they store only lists.
print ( dic_2 )
Out: {'Ram_toys': [1, 2, 3], 'Shyam_toys': [4, 5, 6], 'Radha_toys': [7, 8, 9]}
It would be nice to have them in tabular form. This is where the pandas DataFrame enters and helps. Think of a pandas DataFrame as something that eats a dictionary of type 2 (keys that store lists) and turns it into a table.
pd.DataFrame(dic_2, index=['cat', 'dog', 'dragon'])
Out:
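The printed table did not survive here. In it, each key of dic_2 becomes a column and each index label becomes a row. A quick sketch that checks the shape and reads one cell with `.loc[row label, column name]`:

```python
import pandas as pd

dic_2 = {'Ram_toys': [1, 2, 3], 'Shyam_toys': [4, 5, 6], 'Radha_toys': [7, 8, 9]}
df = pd.DataFrame(dic_2, index=['cat', 'dog', 'dragon'])

print(df.shape)                      # (3, 3): 3 rows (cat, dog, dragon), 3 columns
print(df.loc['dog', 'Shyam_toys'])   # 5: row 'dog', column 'Shyam_toys'
```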
import numpy as np
import matplotlib.pyplot as plt
def f(x):
    return 2*x + 3  # here we define a straight line
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])  # the values of x for which you want to plot f(x), i.e. y = f(x) = 2x + 3
print(f(x))
plt.plot(f(x))
Out: [ 3  5  7  9 11 13 15 17 19 21]
def g(x):
    return x**2  # here we define a non-linear curve
import matplotlib.pyplot as plt
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])  # the values of x for which you want to plot g(x), i.e. y = g(x) = x^2
print(g(x))
plt.plot(g(x))
Out: [ 0  1  4  9 16 25 36 49 64 81]
# Plot many functions in one figure. Here is an example
plt.plot(g(x))
plt.plot(f(x))
Out:
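When several functions share one figure, labels and a legend make it clear which curve is which. A sketch using standard matplotlib calls (`label=`, `plt.legend()`); passing x as the first argument puts the actual x values on the horizontal axis:

```python
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return 2 * x + 3      # straight line

def g(x):
    return x ** 2         # non-linear curve (parabola)

x = np.arange(10)         # 0, 1, ..., 9, the same values as the array used above

plt.figure()              # start a fresh figure
plt.plot(x, f(x), label='f(x) = 2x + 3')
plt.plot(x, g(x), label='g(x) = x**2')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()              # shows which colour belongs to which label
```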
B) HISTOGRAMS
Histograms basically show a frequency distribution, that is, how many times a particular number appears in a list.
List = [1, 1, 1, 1, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6]
plt.hist(List, width=0.1)  # I have reduced the width here.
plt.hist(List, width=1.1)  # I have increased the width here.
Now it's your job to understand how the width affects the intuition behind the visualisation. Let's move on.
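The bar heights in a histogram are counts. `collections.Counter` from Python's standard library produces those counts directly, which is a handy check on the plot. (As far as we can tell, the `width` argument above only changes how wide each bar is drawn; how values are grouped into bars is controlled by hist's `bins` argument.)

```python
from collections import Counter

List = [1, 1, 1, 1, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6]
freq = Counter(List)               # maps each value to how many times it appears

for value in sorted(freq):
    print(value, freq[value])      # value, frequency
```

The output shows 4 appearing most often (7 times) and 2 and 3 only once each.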
Remember our DataFrame? Let's give it a name, df.
df = pd.DataFrame(dic_2, index=['cat', 'dog', 'dragon'])
# Let us call individual columns.
df['Ram_toys']
Out:
cat       1
dog       2
dragon    3
Name: Ram_toys, dtype: int64
df['Shyam_toys']
Out:
cat       4
dog       5
dragon    6
Name: Shyam_toys, dtype: int64
df['Radha_toys']
C) BAR PLOT
# Let us use the pandas library to plot a bar graph
df.plot.bar()
Out:
Kids = list(df.keys())
values = list(df.values)
# Here we took the keys and values from df and kept them in lists named Kids and values
print(Kids)
print(values)
Out:
Here you can ask yourself whether a histogram would make any sense for the DataFrame df under consideration.
D) SCATTER PLOT
To understand the scatter plot, let us work with a dataset. The following is a simple dataset with two columns, 'Hours' and 'Scores'. It records how many hours each student studied and the score that student got; for example, a student who studied 2.5 hours got a score of 21.
You can put 'Hours' on the x-axis and 'Scores' on the y-axis. Let us first look at the dataset.
# READ DATA USING PANDAS
df = pd.read_csv(r"C:\Users\HP\Downloads\student_scores.csv")
df
Out:
Let us look at the plot, the scatter plot. We will use the seaborn library.
Out:
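The plotting code for this figure is not shown. A minimal sketch with plain matplotlib, using as a stand-in the first five (Hours, Scores) pairs of the dataset as they appear later in this book; with the full file loaded you would pass df['Hours'] and df['Scores'] instead. Seaborn's `sns.scatterplot(data=df, x='Hours', y='Scores')` draws the same kind of plot.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for student_scores.csv: its first five rows only
df_small = pd.DataFrame({'Hours': [2.5, 5.1, 3.2, 8.5, 3.5],
                         'Scores': [21, 47, 27, 75, 30]})

plt.figure()
points = plt.scatter(df_small['Hours'], df_small['Scores'])  # one dot per student
plt.xlabel('Hours')
plt.ylabel('Scores')
```

Even with five points, the dots rise from left to right: more hours tend to go with higher scores.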
Out:
sns.pairplot(data=df)
Out:
E) BOX PLOT
df.plot.box()
Out:
Out:
It is clear that the box plot tells you the spread of values in the 'Hours' column and the 'Scores' column.
df.describe()
Out:
To understand the box plot, let us look at the minimum and maximum values of both columns. The hours range from a minimum of 1.1 to a maximum of 9.2, and the same is reflected in the box plot. The scores range from a minimum of 17 to a maximum of 95, and the same is reflected in the box plot.
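The box itself is drawn from the quartiles. A sketch that computes them for the Hours column, using the 25 values printed in the "Let's play with data" chapter. By default the whiskers reach at most 1.5 times the box height (the interquartile range) beyond the box, so they end exactly at the minimum and maximum only when there are no outliers, as is the case for this data.

```python
import pandas as pd

hours = pd.Series([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7, 5.9, 4.5,
                   3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8])

print(hours.min(), hours.max())            # 1.1 9.2
print(hours.quantile([0.25, 0.5, 0.75]))   # lower quartile, median, upper quartile
```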
Part III
MUST DO EXERCISES
10
SOME BASIC PROBLEMS
(x+y) - 2*x*z/2
Out: -6.5
5. Find the square root of x
x**(1/2)
Out: 1.4142135623730951
6. Let's solve a math problem with the help of Python.
Q. My brother is 5 years older than me. If my age is 15, find my brother's age.
brother_age = my_age+5
my_age= 15
brother_age
Out:
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 brother_age = my_age+5
      2 my_age = 15
      3 brother_age
NameError: name 'my_age' is not defined
Why didn't it work?
Because Python follows your instructions in sequence. It did not recognise "my_age" in the 1st line because you introduced it only in the 2nd line.
Let's see whether swapping the order works.
my_age= 15
brother_age = my_age+5
brother_age
Out: 20
7. What is the type of "brother_age"?
type(brother_age)
Out: int
8. Print your brother's age in a sentence.
print("My brother's age is " + str(brother_age))  # + can only join a string to another string, so the number is converted with str()
Out: My brother’s age is 20
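Two other ways to print the same sentence without calling str(), sketched with an f-string (available since Python 3.6) and with print's comma-separated arguments:

```python
my_age = 15
brother_age = my_age + 5

sentence = f"My brother's age is {brother_age}"   # {} inserts the value into the text
print(sentence)                                    # My brother's age is 20
print("My brother's age is", brother_age)          # print adds the space itself
```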
9. Input a number and report whether it is even or odd.
n = int(input("Enter a number: "))
if n % 2 == 0:
    print("It is an even number")
else:
    print("It is an odd number")
Out: Enter a number: 3
It is an odd number
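`input()` waits for someone to type, which makes the check hard to try on many numbers. A sketch that puts the same `% 2` test in a function (the name `parity` is just illustrative) and runs it on a few values:

```python
def parity(n):
    """Return 'even' or 'odd' using the remainder after division by 2."""
    return "even" if n % 2 == 0 else "odd"

for n in [3, 10, 0, -7]:
    print(n, "is an", parity(n), "number")
```

In Python, `-7 % 2` is 1, so negative odd numbers are also reported correctly.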
    else:
        print("You can buy your ticket from the counter on the right")
else:
    print("you do not require a ticket. You can enter the museum")
Out: Enter your age: 6
Enter your gender (male/ female)male
you do not require a ticket. You can enter the museum
List and Dictionaries
Out: [0, 1, 2, 3]
3. The above way of making the list of locations is time-consuming if the list contains, say, 100 elements. Can you find an easier way to do this?
A better way is to use a loop. Let us write code below to do the same task.
L_Location = []
for i in L:
    L_Location.append(L.index(i))
L_Location
Out: [0, 1, 2, 3]
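One caution about the loop above: `L.index(i)` returns the position of the first match, so if L contained a repeated value the same position would be recorded twice. The built-in `enumerate` hands over each position together with its element and avoids this. A sketch with a hypothetical list containing a repeated 2:

```python
L = [1, 2, 2, 4]                      # note the repeated 2

with_index = [L.index(v) for v in L]  # [0, 1, 1, 3]: the second 2 is found at position 1

L_Location = []
for position, value in enumerate(L):  # position = 0, 1, 2, 3
    L_Location.append(position)
print(L_Location)                     # [0, 1, 2, 3]

print(list(range(len(L))))            # [0, 1, 2, 3], the same result in one line
```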
Let us explain 'append'. Say L=[1,2,3,4]. Suppose you want to add another number, 5, to the list. Then all you need to do is
L.append(5)
Now print L. You will see the following:
[1, 2, 3, 4, 5]
4. Can you write code to show the evolution of the list L_Location?
L_Location = []
for i in L:
    L_Location.append(L.index(i))
    print(L_Location)
Out: [0]
[0, 1]
[0, 1, 2]
[0, 1, 2, 3]
Note how the list grows by one element on each pass of the loop.
5. Can you slice the list and make a new list with a limited number of elements from the original list?
L = [1, 2, 3, 4]
L_1 = L[0:2]
L_1
Out: [1, 2]
L = [1, 2, 3, 4]
L_2 = L[0:3]
L_2
Out: [1, 2, 3]
L = [1, 2, 3, 4]
L_3 = L[1:3]
L_3
Out: [2, 3]
L = [1, 2, 3, 4]
L_4 = L[2:3]
L_4
Out: [3]
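The rule behind these results: `L[start:stop]` begins at index start and stops just before index stop, so it returns stop - start elements. A sketch with a few more forms, including an omitted start or stop, negative indices and a step:

```python
L = [1, 2, 3, 4]

print(L[:2])    # [1, 2]        start omitted: from the beginning
print(L[2:])    # [3, 4]        stop omitted: to the end
print(L[-2:])   # [3, 4]        negative index counts from the end
print(L[::2])   # [1, 3]        step 2: every second element
print(L[::-1])  # [4, 3, 2, 1]  step -1: a reversed copy
```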
6. Make a list containing the numbers from 0 to 19.
J = [*range(20)]
J
Out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19]
7. Make a list containing the even numbers from 20 to 30.
J_1 = [*range(20, 30, 2)]
J_1
Out: [20, 22, 24, 26, 28]
So here the range goes from 20 to 30 with step size 2. Note that range stops before its end value, so 30 itself is left out; range(20, 31, 2) would include it. A for loop can also help us with this task.
for i in range(20, 30, 2):
    print(i)
Out: 20
22
24
26
28
All we need to do is append these values to an empty list.
J_2 = []
for i in range(20, 30, 2):
    J_2.append(i)
print(J_2)
Out: [20, 22, 24, 26, 28]
8. Given two lists containing the names of flowers, what happens when you add them?
H_1 = ['rose', 'sunflower', 'marigold']
H_2 = ['lotus', 'jasmine']
H_1 + H_2
Out: [’rose’, ’sunflower’, ’marigold’, ’lotus’,
’jasmine’]
Thus, adding two lists gives you a bigger list, with the elements of the second list placed after those of the first.
9. How can a list be used to represent a 2 by 2 matrix?
Matrix_2D = [[1, 2], [3, 4]]
Out: 1 2 3 4
Here we have two loops. The 'i' brings down each row and the 'j' brings down each element in that row.
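The loop that printed 1 2 3 4 is not shown in this excerpt. A sketch consistent with the description, where i walks through the rows and j through the elements of each row; it also shows how a single element is reached with two indices:

```python
Matrix_2D = [[1, 2], [3, 4]]

for i in Matrix_2D:        # i is a whole row: [1, 2], then [3, 4]
    for j in i:            # j is one element of that row
        print(j, end=' ')  # prints 1 2 3 4 on one line
print()

print(Matrix_2D[1][0])     # 3: row 1, column 0 (counting from 0)
```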
12. Given the dictionary dic_1={'a':1,'b':2,'c':3}, use it to make a pandas Series. Then write code to bring down all the keys and values in it.
dic_1 = {'a': 1, 'b': 2, 'c': 3}
import pandas as pd
df_1 = pd.Series(dic_1)
# Bring down all the values
for i in df_1.values:
    print(i)
Out: 1
2
3
# Bring down all the keys
for i in df_1.keys():
    print(i)
Out: a
b
c
13. Given dic_2={'a':[100,200,300],'b':[400,500,600]}, use a pandas DataFrame to make a table out of this dictionary. Then write code to bring down the keys and the associated lists.
dic_2 = {'a': [100, 200, 300], 'b': [400, 500, 600]}
df_2 = pd.DataFrame(dic_2)
df_2
Out:
for i in df_2.keys():
    print(i)
Out: a
b
Basically, 'a' and 'b' are just the column names of the table. Now let us bring down the column values (the list values).
df_2['a']
df_2['b']
df_2.iloc[1]
df_2.iloc[2]
df_2.loc[0, 'a']
Out: 100
Note that the value 100 can be located exactly if the index value (on the left) and the column name (on the top) are known. The code goes like this:
df_2.loc[index value, column name]
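`loc` looks a cell up by label (index value and column name), while `iloc` uses integer positions. With the default index 0, 1, 2 the labels and positions of the rows coincide, but the columns differ: 'a' and 'b' versus 0 and 1. A sketch on the same df_2:

```python
import pandas as pd

df_2 = pd.DataFrame({'a': [100, 200, 300], 'b': [400, 500, 600]})

print(df_2.loc[0, 'a'])   # 100: index label 0, column 'a'
print(df_2.loc[2, 'b'])   # 600: index label 2, column 'b'
print(df_2.iloc[1, 1])    # 500: row position 1, column position 1 (column 'b')
```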
11
LET'S PLAY WITH DATA
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Out:
# Bring out the hours column from the dataframe
df['Hours']
Out: 0 2.5
1 5.1
2 3.2
3 8.5
4 3.5
5 1.5
6 9.2
7 5.5
8 8.3
9 2.7
10 7.7
11 5.9
12 4.5
13 3.3
14 1.1
15 8.9
16 2.5
17 1.9
18 6.1
19 7.4
20 2.7
21 4.8
22 3.8
23 6.9
24 7.8
Name: Hours, dtype: float64
Note that the extra values on the left are indices.
# Bring out the values for Hours without the indices
for i in df['Hours']:
    print(i)
Out: 2.5
5.1
3.2
8.5
3.5
1.5
9.2
5.5
8.3
2.7
7.7
5.9
4.5
3.3
1.1
8.9
2.5
1.9
6.1
7.4
2.7
4.8
3.8
6.9
7.8
# Let us create a blank list and put the values there
H = []
for i in df['Hours']:
    H.append(i)
print(H)
Out: [2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5,
8.3, 2.7, 7.7, 5.9, 4.5, 3.3, 1.1, 8.9, 2.5, 1.9,
6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8]
# Can we do the same in any other way? Yes. NumPy can help
H_1 = np.array(df['Hours'])
print(H_1)
Out: [2.5 5.1 3.2 8.5 3.5 1.5 9.2 5.5 8.3 2.7
7.7 5.9 4.5 3.3 1.1 8.9 2.5 1.9 6.1 7.4 2.7 4.8
3.8 6.9 7.8]
# How about we take out those values of hours that fall within the range 6 to 9 and put them in a list
H_condition = []
for i in df['Hours']:
    if i > 6 and i <= 9:
        H_condition.append(i)
print(H_condition)
Out: [8.5, 8.3, 7.7, 8.9, 6.1, 7.4, 6.9, 7.8]
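NumPy can do the same filtering without an explicit loop, using a boolean mask. A sketch with the Hours values printed above (with the file loaded, np.array(df['Hours']) gives the same array). Note that arrays need `&` and parentheses instead of the word `and`.

```python
import numpy as np

H_1 = np.array([2.5, 5.1, 3.2, 8.5, 3.5, 1.5, 9.2, 5.5, 8.3, 2.7, 7.7, 5.9, 4.5,
                3.3, 1.1, 8.9, 2.5, 1.9, 6.1, 7.4, 2.7, 4.8, 3.8, 6.9, 7.8])

mask = (H_1 > 6) & (H_1 <= 9)   # True where 6 < value <= 9
H_selected = H_1[mask]          # keeps only the positions where mask is True
print(H_selected.tolist())      # [8.5, 8.3, 7.7, 8.9, 6.1, 7.4, 6.9, 7.8]
```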
df.iloc[0:3]
Out:
Out:
2.5 21
5.1 47
3.2 27
8.5 75
3.5 30
1.5 20
9.2 88
5.5 60
8.3 81
2.7 25
7.7 85
5.9 62
4.5 41
3.3 42
1.1 17
8.9 95
2.5 30
1.9 24
6.1 67
7.4 69
2.7 30
4.8 54
3.8 35
6.9 76
7.8 86
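The cell that printed these (Hours, Scores) pairs is not shown. One way to print pairs like these, assuming df holds the Hours and Scores columns; a three-row stand-in built from the first rows above keeps the sketch self-contained:

```python
import pandas as pd

# Stand-in for the full student_scores.csv (first three rows only)
df = pd.DataFrame({'Hours': [2.5, 5.1, 3.2], 'Scores': [21, 47, 27]})

pairs = []
for h, s in zip(df['Hours'], df['Scores']):   # walk both columns side by side
    pairs.append((h, s))
    print(h, s)                               # e.g. 2.5 21
```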