UNIT 1 PYTHON PROGRAMMING-II
========================================
Python Libraries : →
Python libraries are collections of pre-written code that we can use to perform common
tasks, making programming easier. They are like toolkits that provide functions and
methods so that we do not have to write everything from scratch.
Some common libraries that are used in AI →
NumPy Library : →→
NumPy, short for Numerical Python, is a powerful Python library used for numerical
computing. It is a general-purpose array-processing package.
Methods of creating numpy array : →
1) Using List
import numpy as np
L = [56, 32, 21, 78, 90 ]
ar = np.array(L)
print (ar)
O/ P → [56 32 21 78 90]
2) Using arange() method
import numpy as np
ar = np.arange(5)
print (ar)
O/ P → [0 1 2 3 4]
ar1 = np.arange(2, 11, 2)
print(ar1)
O/P → [2 4 6 8 10]
3) Using linspace() method
import numpy as np
ar = np.linspace(1, 5, 6)
print (ar)
O/ P →[1. 1.8 2.6 3.4 4.2 5. ]
4) Using ones() method
import numpy as np
ar = np.ones(5)
print (ar)
O/ P → [1. 1. 1. 1. 1.]
5) Using zeros() method
import numpy as np
ar = np.zeros(5)
print (ar)
O/ P → [0. 0. 0. 0. 0.]
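The five methods above can be compared side by side. The sketch below is only an illustration: the variable names a, b, c, d and the dtype=int argument are additions that do not appear in these notes.

```python
import numpy as np

# arange(start, stop, step): stop is excluded, spacing is fixed by step
a = np.arange(2, 11, 2)

# linspace(start, stop, num): stop is included, num fixes how many values we get
b = np.linspace(1, 5, 6)

# ones() and zeros() create float arrays unless a dtype is given
c = np.ones(5)
d = np.zeros(5, dtype=int)   # dtype=int is an addition, not used in the notes

print(a, len(a))
print(b, len(b))
print(c.dtype, d.dtype)
```

In short, use arange() when the step size matters and linspace() when the number of values matters.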
Creating a 2-D array
import numpy as np
ar = np.arange(10).reshape(2,5)
print (ar)
O/P → [[0 1 2 3 4]
       [5 6 7 8 9]]
import numpy as np
L = [4, 8, 9, 1, 7, 3, 7, 5]
ar = np.array(L).reshape(2,4)
print(ar)
O/P → [[4 8 9 1]
       [7 3 7 5]]
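reshape() works only when rows x columns equals the number of elements in the array. A minimal sketch of this rule follows; the name two_by_five and the failing 3 x 4 shape are chosen here only for illustration.

```python
import numpy as np

ar = np.arange(10)                 # 10 elements: 0 .. 9

two_by_five = ar.reshape(2, 5)     # 2 * 5 == 10, so this works
print(two_by_five.shape)           # (2, 5)
print(two_by_five.ndim)            # 2

# 3 * 4 == 12 does not match 10 elements, so NumPy raises ValueError
try:
    ar.reshape(3, 4)
except ValueError as err:
    print("reshape failed:", err)
```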
Pandas Library : →
The name Pandas is derived from "panel data".
Pandas is one of the most preferred and widely used libraries for data analysis. It offers
efficient data structures in which we can store data and perform operations on it, along
with high-level data manipulation tools.
Pandas provides two main data structures for manipulating data. They are:
● Series
● DataFrame
e.g. a Series
Index/Label   Data
0             35
1             56
2             45
3             67
4             78
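A Series like the one in the table above can be created from a list. This is a minimal sketch; the list name marks is hypothetical and does not come from these notes.

```python
import pandas as pd

marks = [35, 56, 45, 67, 78]   # the Data column of the table above
S = pd.Series(marks)           # index labels default to 0, 1, 2, ...

print(S)
print(S[2])                    # value stored under label 2
```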
--------------------------------------
We can also assign user-defined labels to the index : →→
In the code below, we replace the default index (0, 1, 2, ...) with our own labels.
import pandas as pd
L = [ 3, 5, 7, 8,10 ]
S = pd.Series( L)
S.index = ['I', 'II', 'III', 'IV', 'V' ]
print( S )
Output ----->
I       3
II      5
III     7
IV      8
V      10
dtype: int64
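The labels can also be supplied while creating the Series, through the index parameter of pd.Series(). A short sketch using the same data as above:

```python
import pandas as pd

# index= sets the labels at creation time instead of assigning S.index later
S = pd.Series([3, 5, 7, 8, 10], index=['I', 'II', 'III', 'IV', 'V'])

print(S['III'])          # single value by label
print(S[['I', 'V']])     # several values by a list of labels
```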
Output ------>
RollNo Name Fees Gender
0 12 Piyush 700 M
1 30 Rajat 780 M
2 40 Hema 567 F
3 70 Rajiv 990 M
In the above DataFrame, the keys of the dictionary become the column labels. By default,
the row index is 0, 1, 2, ...
Adding a column ===>
df[ 'Third' ] = 80
First Second Third
A 45 50 80
B 78 60 80
C 43 87 80
D 21 76 80
Note : → If the column already exists in the DataFrame then the assignment
statement will update the values of the already existing column.
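The behaviour described in the note can be tried out as follows. The DataFrame is rebuilt from the table above; the column 'Fourth' and the value 90 are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'First': [45, 78, 43, 21],
                   'Second': [50, 60, 87, 76]},
                  index=['A', 'B', 'C', 'D'])

df['Third'] = 80               # new column, same value in every row
df['Fourth'] = [1, 2, 3, 4]    # new column, one value per row (hypothetical)
df['Third'] = 90               # 'Third' already exists, so it is updated

print(df)
```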
First Second
A 45.0 50.0
B 78.0 60.0
C 43.0 87.0
D 21.0 76.0
E 7.0 7.0
First Second
A 45.0 50.0
B 78.0 60.0
C 43.0 87.0
D 21.0 76.0
E 56.0 32.0
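The statements that produced the two tables above are not shown in these notes. One way to get the same rows, assuming row 'E' was added through loc, is sketched below. Depending on the pandas version, the values may be displayed as floats (45.0) or as integers (45).

```python
import pandas as pd

df = pd.DataFrame({'First': [45, 78, 43, 21],
                   'Second': [50, 60, 87, 76]},
                  index=['A', 'B', 'C', 'D'])

df.loc['E'] = 7            # new row: the same value goes into every column
print(df)

df.loc['E'] = [56, 32]     # row 'E' already exists, so its values are updated
print(df)
```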
df is a DataFrame ------>
RN Name Fees Gender
VI 34 Jay 750 M
VII 31 Om 870 M
VIII 30 Geeta 900 F
IX 25 Deepa 650 F
X 21 Yash 1070 M
XI 32 Tanu 865 F
XII 45 Umesh 1200 M
Note : → The following program displays the common attributes of a DataFrame.
import pandas as pd
import numpy as np
a = np.arange(10, 30, 2).reshape(2, 5)
df = pd.DataFrame(a)
df.index = ["First", "Second" ]
df.columns = ['a', 'b', 'c', 'd', 'e' ]
print(df)
print("-" * 30)
print(df.size)
print("-" * 30)
print(df.dtypes)
print("-" * 30)
print(df.values)
print("-" * 30)
print(df.ndim)
print("-" * 30)
print(df.index)
print("-" * 30)
print(df.columns)
print("-" * 30)
print(df.shape)
print("-" * 30)
print(df.empty)
print("-" * 30)
print(df.axes)
output :---->
a b c d e
First 10 12 14 16 18
Second 20 22 24 26 28
------------------------------
10
------------------------------
a int64
b int64
c int64
d int64
e int64
dtype: object
------------------------------
[[10 12 14 16 18]
[20 22 24 26 28]]
------------------------------
2
------------------------------
Index(['First', 'Second'], dtype='object')
------------------------------
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
------------------------------
(2, 5)
------------------------------
False
------------------------------
[Index(['First', 'Second'], dtype='object'), Index(['a', 'b', 'c', 'd', 'e'], dtype='object')]
To access multiple columns, put the column names in a list inside the square brackets:
Name_of_DataFrame[ [col1, col2, ...] ]
e.g.
import pandas as pd
d = { 'RollNo' : [12, 30, 40, 70],
'Name' : ['Piyush', 'Rajat', 'Hema', 'Rajiv'], "Fees" : [700, 780, 567, 990 ],
'Gender' : ["M", "M", 'F', "M"] }
df = pd.DataFrame( d )
print(df)
Output:-
RollNo Name Fees Gender
0 12 Piyush 700 M
1 30 Rajat 780 M
2 40 Hema 567 F
3 70 Rajiv 990 M
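With the DataFrame above, multiple columns can be selected as in this sketch. The variable name sub is hypothetical.

```python
import pandas as pd

d = {'RollNo': [12, 30, 40, 70],
     'Name': ['Piyush', 'Rajat', 'Hema', 'Rajiv'],
     'Fees': [700, 780, 567, 990],
     'Gender': ['M', 'M', 'F', 'M']}
df = pd.DataFrame(d)

# a list of column names inside the brackets returns a smaller DataFrame
sub = df[['Name', 'Fees']]
print(sub)

# one name gives a Series; a one-element list still gives a DataFrame
print(type(df['Name']).__name__)
print(type(df[['Name']]).__name__)
```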
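The loc attribute accesses rows and columns by their labels. Its general forms are:
df.loc[ row ]
df.loc[ row, column ]
df.loc[ Boolean array/List/Series ]
df.loc[ row_start : row_end, col_start : col_end ]   (a range of rows and columns; both ends are included)
df.loc[ [row1, row2, ...], [col1, col2, ...] ]   (multiple rows and columns, in any order)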
e.g.
Suppose df is a DataFrame ----
A B C
I 3 4 5
II 7 8 9
III 2 5 6
IV 6 8 3
V 2 1 4
e.g. accessing the second row ----
df.loc[ 'II' ]
e.g. accessing the first, fourth and fifth rows with the first & third columns ----
df.loc[ [ 'I', 'IV', 'V' ], [ 'A', 'C' ] ]
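The loc forms listed above can be tried on the same A/B/C DataFrame. A minimal sketch follows; the result names (row, cell, block, picked) are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 7, 2, 6, 2],
                   'B': [4, 8, 5, 8, 1],
                   'C': [5, 9, 6, 3, 4]},
                  index=['I', 'II', 'III', 'IV', 'V'])

row = df.loc['II']                             # one row, returned as a Series
cell = df.loc['II', 'B']                       # one value
block = df.loc['II':'IV', 'A':'B']             # label ranges include both ends
picked = df.loc[['I', 'IV', 'V'], ['A', 'C']]  # rows/columns in any order

print(row)
print(cell)
print(block)
print(picked)
```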
Output ====>
RN Name Fees Gender
X 34 Jay 750 M
XII 31 Om 870 M
X 30 Geeta 900 F
XI 25 Deepa 650 F
XII 21 Yash 1070 M
XI 32 Tanu 865 F
XII 45 Umesh 1200 M
e.g. display the names of those students whose fees are more than 1000
con = df['Fees'] > 1000
print( df.loc[ con, 'Name' ] )
OR
print( df[con]['Name'] )
e.g. display details of those male students whose fees are more than 800 ----->
con = (df['Gender'] == 'M') & (df['Fees'] > 800)
print( df.loc[con] )
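Both conditions can be run end to end as in the sketch below. The student data is copied from the table above; the row labels are left at the default 0, 1, 2, ... because the conditions do not depend on them, and the names names and males are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'RN': [34, 31, 30, 25, 21, 32, 45],
                   'Name': ['Jay', 'Om', 'Geeta', 'Deepa', 'Yash', 'Tanu', 'Umesh'],
                   'Fees': [750, 870, 900, 650, 1070, 865, 1200],
                   'Gender': ['M', 'M', 'F', 'F', 'M', 'F', 'M']})

# a comparison on a column gives a Boolean Series: True where the row matches
con = df['Fees'] > 1000
names = df.loc[con, 'Name']

# each condition needs its own parentheses when combined with & (and) or | (or)
con2 = (df['Gender'] == 'M') & (df['Fees'] > 800)
males = df.loc[con2]

print(names)
print(males)
```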
We can also access row(s) through the iloc attribute of a DataFrame, which uses the integer
positions of the rows (0, 1, 2, ...) instead of their labels.
e.g.
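A minimal sketch of iloc, reusing the A/B/C DataFrame from the loc examples. The result names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 7, 2, 6, 2],
                   'B': [4, 8, 5, 8, 1],
                   'C': [5, 9, 6, 3, 4]},
                  index=['I', 'II', 'III', 'IV', 'V'])

second = df.iloc[1]                  # second row (position 1), same as df.loc['II']
first_two = df.iloc[0:2]             # positions 0 and 1; the stop position is excluded
picked = df.iloc[[0, 3, 4], [0, 2]]  # rows I, IV, V with columns A, C
cell = df.iloc[1, 2]                 # row position 1, column position 2

print(second)
print(first_two)
print(picked)
print(cell)
```

Unlike loc, a slice in iloc does not include its end position.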
Let us create a DataFrame from the "studentsmarks.csv" file.
import pandas as pd
df = pd.read_csv("studentsmarks.csv")
print(df)
# total number of missing (NaN) values in the whole DataFrame
print( df.isnull().sum().sum() )
Output : →
3
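The contents of the CSV file are not shown in these notes, so the sketch below uses a small made-up table read from a string instead of a file. The names and marks are hypothetical; they are chosen so that three values are missing.

```python
import io
import pandas as pd

# hypothetical data standing in for the CSV file; empty fields become NaN
csv_text = """Name,Maths,Science
Asha,78,
Ravi,,65
Meena,90,88
Karan,55,
"""
df = pd.read_csv(io.StringIO(csv_text))

per_column = df.isnull().sum()     # first sum(): missing values in each column
total = df.isnull().sum().sum()    # second sum(): adds the column counts together

print(per_column)
print(total)
```

isnull() marks every missing cell as True, so the first sum() counts them column by column and the second sum() gives the total for the whole DataFrame.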