UNIT 1 PYTHON PROGRAMMING-II

The document provides an overview of Python libraries, focusing on NumPy and Pandas, which are essential for numerical computing and data analysis, respectively. It details methods for creating arrays and data structures, including Series and DataFrames, along with examples of manipulating these structures by adding, deleting, and accessing data. Additionally, it covers attributes of DataFrames and various ways to access rows and columns within them.

UNIT 1: PYTHON PROGRAMMING-II

========================================
Python Libraries : →
Python libraries are collections of pre-written code that we can use to perform common
tasks, making programming easier. They are like toolkits that provide functions and
methods so that we do not have to write everything from scratch.
Some common libraries that are used in AI →

NumPy Library : →→
NumPy, short for Numerical Python, is a powerful Python library used for numerical
computing. It is a general-purpose array-processing package.
Methods of creating numpy array : →
1) Using List
import numpy as np
L = [56, 32, 21, 78, 90 ]
ar = np.array(L)
print (ar)
O/ P → [56 32 21 78 90 ]
2) Using arange() method
import numpy as np
ar = np.arange(5)            # 0 up to, but not including, 5
print(ar)
O/P → [0 1 2 3 4]
ar1 = np.arange(2, 11, 2)    # start 2, stop before 11, step 2
print(ar1)
O/P → [ 2  4  6  8 10]
3) Using linspace() method
import numpy as np
ar = np.linspace(1, 5, 6)    # 6 evenly spaced values from 1 to 5, both included
print(ar)
O/P → [1.  1.8 2.6 3.4 4.2 5. ]
4) Using ones() method
import numpy as np
ar = np.ones(5)
print (ar)
O/ P → [1. 1. 1. 1. 1.]
5) Using zeros() method
import numpy as np
ar = np.zeros(5)
print (ar)
O/ P → [0. 0. 0. 0. 0.]

6) Using full() method


import numpy as np
ar = np.full(6, 3)    # full(shape, fill_value): 6 elements, each equal to 3
print (ar)
O/P → [3 3 3 3 3 3]

Creating 2 - D array
import numpy as np
ar = np.arange(10).reshape(2,5)
print (ar)
O/P → [[0 1 2 3 4]
 [5 6 7 8 9]]

import numpy as np
L = [4, 8, 9, 1, 7, 3, 7, 5]
ar = np.array(L).reshape(2,4)
print(ar)
O/P → [[4 8 9 1]
 [7 3 7 5]]
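
The creation functions shown earlier also accept a shape tuple, so a 2-D array can be built without reshape(). A minimal sketch; the variable names z, f and r are illustrative and not from the original notes:

```python
import numpy as np

# A shape tuple (rows, columns) creates a 2-D array directly.
z = np.zeros((2, 3))      # 2 rows, 3 columns, every element 0.0
f = np.full((2, 3), 7)    # 2 rows, 3 columns, every element 7

print(z.shape)            # (2, 3)
print(f)

# reshape() needs the element count to match: 2 * 5 == 10
r = np.arange(10).reshape(2, 5)
print(r.ndim)             # 2
```

If the element count does not match the requested shape, reshape() raises a ValueError.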

Pandas Library : →
The name Pandas is derived from the term "panel data".
Pandas is one of the most preferred and widely used libraries for data analysis. It offers
efficient data structures for storing data and performing operations on it, along with
high-level data manipulation tools.
Pandas provides two main data structures for manipulating data. They are:
● Series
● Data Frame

Series Data Structure :--->


Pandas Series is a one-dimensional array-like object containing a sequence of values.
Each of these values is associated with a label called index.
A Series data structure has two main components --
1) A collection of actual data
2) An associated data label (index)

e.g.
Index/Label    Data
0              35
1              56
2              45
3              67
4              78
--------------------------------------
We can also assign user-defined labels to the index : →→

Index/Label    Data
Jan            31
Feb            28
March          31
April          30
May            31
-------------------------------------------
Index/Label    Data
Amit           78
Shivani        98
Rajat          87
Ankit          46
Riya           88

Creating series object by python sequence (list) : →


import pandas as pd
L = [ 3, 5, 7, 8,10 ]
S = pd.Series( L)
print( S )
Output ----->
0     3
1     5
2     7
3     8
4    10
dtype: int64

In the above code, we can also define our own index/label of data.
import pandas as pd
L = [ 3, 5, 7, 8,10 ]
S = pd.Series( L)
S.index = ['I', 'II', 'III', 'IV', 'V' ]
print( S )
Output ----->
I       3
II      5
III     7
IV      8
V      10
dtype: int64

** Creating series object through NumPy array =====>


import pandas as pd
import numpy as np
N = np.arange(10, 15)
S = pd.Series( N )
print( S )
Output ----->
0    10
1    11
2    12
3    13
4    14
dtype: int64

** Creating series through dictionary


import pandas as pd
d = {'Jan' : 31, 'Feb' : 28, 'March' : 31}
s = pd.Series ( d)
print (s)
Output ------->
Jan      31
Feb      28
March    31
dtype: int64
In the above example, the keys of the dictionary become the index labels. We can still
assign our own labels afterwards.
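
One way to use different labels is to assign a new list to the index attribute, as was done earlier for a list-based Series. A short sketch; the labels 'M1' to 'M3' and the variable t are made up for illustration:

```python
import pandas as pd

d = {'Jan': 31, 'Feb': 28, 'March': 31}
s = pd.Series(d)

# Replace the dictionary keys with new labels of the same length.
s.index = ['M1', 'M2', 'M3']
print(s)

# Labels can also be supplied when building a Series from a list.
t = pd.Series([31, 28, 31], index=['Jan', 'Feb', 'March'])
print(t['Feb'])    # 28
```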

Data Frame data structure


** It is a Pandas data structure that stores data in 2-D (tabular) form.
** It is a 2-D labelled array: an ordered collection of columns, where each column can
store a different kind of data.
** It consists of three things...
1) Actual data
2) Row index/ label
3) Column index / label

Creation of a DataFrame from dictionary of array/lists:

We can create data frame through 2-D dictionary ----->


In such a dictionary, each key maps to a list of values.
e.g.
import pandas as pd
d = { 'RollNo' : [12, 30, 40, 70],
      'Name' : ['Piyush', 'Rajat', 'Hema', 'Rajiv'],
      'Fees' : [700, 780, 567, 990],
      'Gender' : ['M', 'M', 'F', 'M'] }
df = pd.DataFrame( d )
print(df)

Output ------>
   RollNo    Name  Fees Gender
0      12  Piyush   700      M
1      30   Rajat   780      M
2      40    Hema   567      F
3      70   Rajiv   990      M
In the above DataFrame, the keys of the dictionary become the column labels. By default,
the row labels are 0, 1, 2, .....
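
The default row labels can be replaced by passing an index argument when the DataFrame is created. A minimal sketch reusing part of the data above; the labels 'S1' to 'S4' are assumptions made for illustration:

```python
import pandas as pd

d = {'RollNo': [12, 30, 40, 70],
     'Name': ['Piyush', 'Rajat', 'Hema', 'Rajiv']}

# index= supplies one row label per row, in place of 0, 1, 2, ...
df = pd.DataFrame(d, index=['S1', 'S2', 'S3', 'S4'])
print(df)
```

The list given to index must have as many labels as there are rows.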

Creation of DataFrame from List of Dictionaries


import pandas as pd
listDict = [ {'a' : 10, 'b' : 20}, {'a' : 5, 'b' : 10, 'c' : 20} ]
a = pd.DataFrame(listDict)
print(a)
Output : →
    a   b     c
0  10  20   NaN
1   5  10  20.0
The first dictionary has no key 'c', so that cell is filled with NaN (a missing value).

Adding column / row in a DataFrame ==>


We can add a column or a row to a given DataFrame ------
Suppose df is a DataFrame that has some columns and rows, and we want to add a column and
a row.
   First  Second
A     45      50
B     78      60
C     43      87
D     21      76

Column ===>
df[ 'Third' ] = 80
   First  Second  Third
A     45      50     80
B     78      60     80
C     43      87     80
D     21      76     80

df[ 'Third' ] = df[ 'First' ] + df[ 'Second' ]
   First  Second  Third
A     45      50     95
B     78      60    138
C     43      87    130
D     21      76     97

Through loc attribute :--------->
df.loc[ : , 'Third' ] = [ 19, 56, 43, 11 ]

   First  Second  Third
A     45      50     19
B     78      60     56
C     43      87     43
D     21      76     11

Note : → If the column already exists in the DataFrame, the assignment statement
updates the values of the existing column instead of adding a new one.

Adding row through loc attribute :->
df.loc[ 'E', : ] = 7

   First  Second
A   45.0    50.0
B   78.0    60.0
C   43.0    87.0
D   21.0    76.0
E    7.0     7.0

df.loc[ 'E', : ] = [ 56, 32 ]

   First  Second
A   45.0    50.0
B   78.0    60.0
C   43.0    87.0
D   21.0    76.0
E   56.0    32.0
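
The column and row additions above can be combined into one runnable sketch. The DataFrame construction is an assumption, since the notes show only the resulting tables, and the row values [56, 32, 88] are illustrative:

```python
import pandas as pd

# Build the starting DataFrame shown above.
df = pd.DataFrame({'First': [45, 78, 43, 21],
                   'Second': [50, 60, 87, 76]},
                  index=['A', 'B', 'C', 'D'])

# New column computed from two existing columns.
df['Third'] = df['First'] + df['Second']

# New row: one value per column, in column order.
df.loc['E'] = [56, 32, 88]
print(df)
```

The list assigned to a new row must contain exactly one value for each existing column.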

Deleting column/row from DataFrame ==>


There are some methods through which we can delete a row/column :---
del command -------> to delete a column
pop() method -------> to delete a column
drop() method -------> to delete both rows and columns

df is a DataFrame ------>
      RN   Name  Fees Gender
VI    34    Jay   750      M
VII   31     Om   870      M
VIII  30  Geeta   900      F
IX    25  Deepa   650      F
X     21   Yash  1070      M
XI    32   Tanu   865      F
XII   45  Umesh  1200      M

e.g. To delete Name column


del df[ 'Name' ]

e.g. To delete Fees column by pop()


df.pop( 'Fees')
e.g. To delete RN column by drop()
df.drop('RN', axis =1, inplace =True )
In the drop() method, we specify the axis to say whether a row or a column is to be deleted.

Note : →
axis=0 refers to rows
axis=1 refers to columns
inplace=True ----> makes the change in the DataFrame itself (otherwise drop() returns a new DataFrame)

e.g. To delete the XI row...
df.drop('XI', axis=0, inplace=True)

e.g. To delete more than one row


df.drop(['X', 'XII'], axis=0, inplace=True)

e.g. To delete more than one column


df.drop(['RN', 'Fees'], axis=1, inplace=True)
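
The three deletion methods can be compared in one small sketch. It uses a shortened version of the table above, and the variable names fees and smaller are illustrative; note that drop() without inplace=True leaves the original DataFrame unchanged:

```python
import pandas as pd

df = pd.DataFrame({'RN': [34, 31, 30],
                   'Name': ['Jay', 'Om', 'Geeta'],
                   'Fees': [750, 870, 900]},
                  index=['VI', 'VII', 'VIII'])

del df['Name']             # removes the column in place
fees = df.pop('Fees')      # removes the column and returns it as a Series

# Without inplace=True, drop() returns a new DataFrame; df keeps row VII.
smaller = df.drop('VII', axis=0)

print(list(df.columns))    # ['RN']
print(list(smaller.index))
```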

Attributes of DataFrame ============>


There are the following attributes -----
size =>
It gives the number of elements in the DataFrame (rows × columns).
dtypes =>
It gives the data type of each column (e.g. int64, float64, object).
values =>
It returns the data of the DataFrame as a NumPy array.
ndim =>
It returns the number of dimensions (2 for a DataFrame).
index =>
It returns the row labels (index) of the DataFrame.
columns =>
It returns the column labels of the DataFrame.
shape =>
It returns the number of rows and columns as a tuple.
empty =>
It checks whether the DataFrame is empty and gives the result as True or False.
T =>
It transposes the DataFrame, swapping the index and the columns.
axes =>
It returns a list containing the row axis labels and the column axis labels.
e.g.

import pandas as pd
import numpy as np
a = np.arange(10, 30, 2).reshape(2, 5)
df = pd.DataFrame(a)
df.index = ["First", "Second" ]
df.columns = ['a', 'b', 'c', 'd', 'e' ]
print(df)
print("-" * 30)

print(df.size)
print("-" * 30)

print(df.dtypes)
print("-" * 30)

print(df.values)
print("-" * 30)

print(df.ndim)
print("-" * 30)

print(df.index)
print("-" * 30)

print(df.columns)
print("-" * 30)

print(df.shape)
print("-" * 30)

print(df.empty)
print("-" * 30)

print(df.axes)

output :---->
         a   b   c   d   e
First   10  12  14  16  18
Second  20  22  24  26  28
------------------------------
10
------------------------------
a    int64
b    int64
c    int64
d    int64
e    int64
dtype: object
------------------------------
[[10 12 14 16 18]
 [20 22 24 26 28]]
------------------------------
2
------------------------------
Index(['First', 'Second'], dtype='object')
------------------------------
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
------------------------------
(2, 5)
------------------------------
False
------------------------------
[Index(['First', 'Second'], dtype='object'), Index(['a', 'b', 'c', 'd', 'e'], dtype='object')]
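
The T attribute is listed above but not used in the example. A brief sketch of it on the same data; the variable name t is illustrative:

```python
import numpy as np
import pandas as pd

a = np.arange(10, 30, 2).reshape(2, 5)
df = pd.DataFrame(a, index=['First', 'Second'],
                  columns=['a', 'b', 'c', 'd', 'e'])

# T swaps rows and columns: a (2, 5) DataFrame becomes (5, 2).
t = df.T
print(t)
print(t.shape)    # (5, 2)
```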

Accessing Column from DataFrame ==>


We can access column(s) in the following ways ----
DataFrame["column_name"]
DataFrame.column_name

Either form accesses a single column of the DataFrame. The dot form works only when the
column name is a valid Python identifier and does not clash with a DataFrame attribute or
method.

To access multiple columns, put the column names in a list inside the square brackets....
DataFrame[ [col1, col2, ...... ] ]
e.g.
import pandas as pd
d = { 'RollNo' : [12, 30, 40, 70],
      'Name' : ['Piyush', 'Rajat', 'Hema', 'Rajiv'],
      'Fees' : [700, 780, 567, 990],
      'Gender' : ['M', 'M', 'F', 'M'] }
df = pd.DataFrame( d )
print(df)

Output:-
   RollNo    Name  Fees Gender
0      12  Piyush   700      M
1      30   Rajat   780      M
2      40    Hema   567      F
3      70   Rajiv   990      M

Now we want to show the column Name. We can use the command -
df[ 'Name' ]
Output ---> we get the result as a Series object
0    Piyush
1     Rajat
2      Hema
3     Rajiv
Name: Name, dtype: object

If we use the command as -
df[ ['Name'] ]

Output --> we get the result as a DataFrame object
     Name
0  Piyush
1   Rajat
2    Hema
3   Rajiv

Now we access more than one column.
df[ [ 'RollNo', 'Fees', 'Gender' ] ]

Output :-> we get the result as a DataFrame object -----
   RollNo  Fees Gender
0      12   700      M
1      30   780      M
2      40   567      F
3      70   990      M

Accessing individual value from DataFrame ======>


We can access an individual value as ---
Here the DataFrame is df ---
     A  B  C
I    3  4  5
II   7  8  9
III  2  5  6

Now we want to access 8 from the DataFrame ---------
df[ 'B' ][ 'II' ] or df.B[ 'II' ]
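
The same value can also be read in a single step with loc or at, giving the row label first and the column label second. A small sketch; the DataFrame construction is assumed from the table above:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 7, 2], 'B': [4, 8, 5], 'C': [5, 9, 6]},
                  index=['I', 'II', 'III'])

# Column first, then row label (the form used above).
print(df['B']['II'])        # 8

# Row label and column label in one step.
print(df.loc['II', 'B'])    # 8
print(df.at['II', 'B'])     # 8
```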

Accessing row(s) from DataFrame ===>


We can access row(s) through two attributes (loc and iloc) and through slicing :--

df.loc[ row ]
df.loc[ row, column ]
df.loc[ Boolean array/list/Series ]
df.loc[ row_start : row_end, column_start : column_end ]
df.loc[ [list of rows in any order], [list of columns in any order] ]
e.g.
Suppose df is a DataFrame ----
     A  B  C
I    3  4  5
II   7  8  9
III  2  5  6
IV   6  8  3
V    2  1  4
e.g. accessing the second row ----
df.loc[ 'II' ]

e.g. accessing the last three rows.....
df.loc[ 'III' : 'V' ]

e.g. accessing the first two rows with the last two columns ------
df.loc[ 'I' : 'II', 'B' : 'C' ]

e.g. accessing the second column with the help of the loc attribute ------
df.loc[ : , 'B' ]

e.g. accessing the first, fourth and fifth rows with the first and third columns ----
df.loc[ [ 'I', 'IV', 'V' ], [ 'A', 'C' ] ]
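
The loc forms above can be tried together in one runnable sketch. The DataFrame construction is assumed from the table, and the variable names row, last3, block and picked are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 7, 2, 6, 2],
                   'B': [4, 8, 5, 8, 1],
                   'C': [5, 9, 6, 3, 4]},
                  index=['I', 'II', 'III', 'IV', 'V'])

row = df.loc['II']                   # one row, returned as a Series
last3 = df.loc['III':'V']            # label slices include the end label
block = df.loc['I':'II', 'B':'C']    # rows I to II, columns B to C
picked = df.loc[['I', 'IV', 'V'], ['A', 'C']]

print(last3)
print(block)
print(picked)
```

Unlike ordinary Python slices, a label slice in loc includes its end label, which is why 'III':'V' returns three rows.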

Conditional selection in dataFrame :-


import pandas as pd
d = { 'RN' : [34, 31, 30, 25, 21, 32, 45],
      'Name' : ['Jay', 'Om', 'Geeta', 'Deepa', 'Yash', 'Tanu', 'Umesh'],
      'Fees' : [750, 870, 900, 650, 1070, 865, 1200],
      'Gender' : ['M', 'M', 'F', 'F', 'M', 'F', 'M'] }
df = pd.DataFrame( d )
df.index = ['VI', 'VII', 'VIII', 'IX', 'X', 'XI', 'XII']
print(df)
print("-"*31)

Output ====>
      RN   Name  Fees Gender
VI    34    Jay   750      M
VII   31     Om   870      M
VIII  30  Geeta   900      F
IX    25  Deepa   650      F
X     21   Yash  1070      M
XI    32   Tanu   865      F
XII   45  Umesh  1200      M
-------------------------------

e.g. display the rows of male students


con = df['Gender'] == 'M'
print( df.loc [con] )

e.g. display the names of those students whose fees are more than 1000
con = df['Fees'] > 1000
print( df.loc[ con, 'Name' ] )
OR
print( df[con]['Name'] )

e.g. display details of those male students whose fees are more than 800 ----->
con = (df['Gender'] == 'M') & (df['Fees'] > 800)
print( df.loc[con] )
We can also access row(s) through the iloc attribute of a DataFrame. With iloc we use the
integer positions of rows (and columns), starting from 0, instead of their labels.
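
A minimal sketch of iloc on a small DataFrame. The data and the variable names second, first_two and cell are assumptions made for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 7, 2, 6, 2],
                   'B': [4, 8, 5, 8, 1],
                   'C': [5, 9, 6, 3, 4]},
                  index=['I', 'II', 'III', 'IV', 'V'])

second = df.iloc[1]         # second row (positions start at 0)
first_two = df.iloc[0:2]    # positions 0 and 1; the end position is excluded
cell = df.iloc[1, 1]        # row position 1, column position 1

print(second)
print(first_two)
print(cell)                 # 8
```

Position slices in iloc exclude the end position, whereas label slices in loc include the end label.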

Import and Export Data between CSV Files and DataFrames : →


CSV files, which stand for Comma-Separated Values, are simple text files used to store
tabular data. Each line in a CSV file represents a row in the table, and each value in the row
is separated by a comma.
Pandas provides powerful tools to read CSV files into Data Frames, which are data
structures that allow us to perform complex operations on the data with ease. This makes
CSV files a go-to format for data scientists and analysts working with Python.

Importing a CSV file to a DataFrame : →


Using the read_csv() function, we can import tabular data from a CSV file into a pandas
DataFrame by passing the file name as an argument (e.g. pd.read_csv("filename.csv")).

e.g.
Let us create a DataFrame from the "studentsmarks.csv" file.
import pandas as pd
df = pd.read_csv("studentsmarks.csv")
print(df)
print(df)

Exporting a DataFrame to a CSV file : →


We can use the to_csv() function to save a DataFrame to a text or CSV file.
e.g.
df.to_csv("resultout.csv", index=False)    # index=False: do not write the row labels
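
A small round-trip sketch: a DataFrame is written with to_csv() and read back with read_csv(). The data, the temporary folder and the variable names path and back are assumptions made so that the example runs without an existing file:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'Name': ['Jay', 'Om'], 'Fees': [750, 870]})

# Write to a temporary folder so that no real file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'resultout.csv')
df.to_csv(path, index=False)    # index=False leaves the row labels out

back = pd.read_csv(path)        # the first line of the file becomes the column names
print(back)
```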

Handling Missing Values : →


The two most common strategies for handling missing values explained in this section are:
i) Drop the row having missing values OR
ii) Estimate the missing value

Checking Missing Values


Pandas provides the function isnull() to check whether any value is missing in the
DataFrame.
It checks every value and returns a DataFrame of the same shape, containing True where a
value is missing and False otherwise.

Drop Missing Values


Dropping removes the entire row (object) that has the missing value(s). This strategy
reduces the size of the dataset used in data analysis, so it should be used only when a
few objects have missing values.
The dropna() function can be used to drop such rows from the DataFrame.

Estimate the missing value


Missing values can be filled by using estimations or approximations, e.g. a value just
before (or after) the missing value, or the average/minimum/maximum of the values of that
attribute. In some cases, missing values are replaced by zeros (or ones).
The fillna(num) function can be used to replace missing value(s) by the value specified in
num. For example, fillna(0) replaces each missing value by 0. Similarly, fillna(1) replaces
each missing value by 1.
e.g.
import pandas as pd
import numpy as np
ResultSheet = { 'Maths': [90, 91, 97, 89, 65, 93],
                'Science': [92, 81, np.nan, 87, 50, 88],
                'English': [89, 91, 88, 78, 77, 82],
                'Hindi': [81, 71, 67, 82, np.nan, 89],
                'AI': [94, 95, 99, np.nan, 96, 99] }
marks = pd.DataFrame(ResultSheet)
marks.index =['Heena','Shefali','Meera','Joseph','Suhana','Bismeet']
print(marks)

Output : →
         Maths  Science  English  Hindi    AI
Heena       90     92.0       89   81.0  94.0
Shefali     91     81.0       91   71.0  95.0
Meera       97      NaN       88   67.0  99.0
Joseph      89     87.0       78   82.0   NaN
Suhana      65     50.0       77    NaN  96.0
Bismeet     93     88.0       82   89.0  99.0

check for missing values


print(marks.isnull())
Output : →
         Maths  Science  English  Hindi     AI
Heena    False    False    False  False  False
Shefali  False    False    False  False  False
Meera    False     True    False  False  False
Joseph   False    False    False  False   True
Suhana   False    False    False   True  False
Bismeet  False    False    False  False  False

To find the number of NaN values in each column and in the whole dataset

print(marks.isnull().sum())
Output : →
Maths      0
Science    1
English    0
Hindi      1
AI         1
dtype: int64

print(marks.isnull().sum().sum())
Output : →
3

apply dropna() for the above case


drop=marks.dropna()
print(drop)
Output : →
         Maths  Science  English  Hindi    AI
Heena       90     92.0       89   81.0  94.0
Shefali     91     81.0       91   71.0  95.0
Bismeet     93     88.0       82   89.0  99.0

Estimate the missing value


FillZero = marks.fillna(0)
print(FillZero)
Output : →
         Maths  Science  English  Hindi    AI
Heena       90     92.0       89   81.0  94.0
Shefali     91     81.0       91   71.0  95.0
Meera       97      0.0       88   67.0  99.0
Joseph      89     87.0       78   82.0   0.0
Suhana      65     50.0       77    0.0  96.0
Bismeet     93     88.0       82   89.0  99.0
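
The notes also mention filling a missing value with the column average or with the value just before it. A hedged sketch of both on a reduced version of the marks data; the variable names by_mean and by_prev are illustrative:

```python
import numpy as np
import pandas as pd

marks = pd.DataFrame({'Science': [92, 81, np.nan, 87],
                      'Hindi': [81, np.nan, 67, 82]},
                     index=['Heena', 'Shefali', 'Meera', 'Joseph'])

# Replace each missing value with the average of its own column.
by_mean = marks.fillna(marks.mean())

# Replace each missing value with the value just before it in the column.
by_prev = marks.ffill()

print(by_mean)
print(by_prev)
```

ffill() cannot fill a missing value in the first row, because there is no earlier value to copy.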
