Module 3: Advanced Python Libraries
Python Libraries
• NumPy
• Visualization using Matplotlib
NumPy
Introduction to NumPy
The NumPy package provides:
1. The whole NumPy library is based on one main object: ndarray
(which stands for N-dimensional array).
2. This object is a multidimensional homogeneous array with a
predetermined number of items. It is homogeneous because all the
elements within it are of the same type and the same size. The size of
an ndarray is fixed when it is created; changing the size creates a new
array and deletes the original.
3. The data type is specified by another NumPy object called dtype
(data-type); each ndarray is associated with exactly one dtype.
4. Mathematical operations on large amounts of data can be performed
very efficiently and with fewer lines of code.
Example:
import numpy as np
list1 = [10, 20, 30, 40, 50]
array1 = np.array(list1)
print(array1) # [10 20 30 40 50]
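The points about homogeneity and dtype can be checked directly on an array. The following is a minimal sketch; the variable names mixed and ints are illustrative and not part of the original slides.

```python
import numpy as np

# A Python list may mix ints and floats; np.array converts them to one dtype.
mixed = np.array([10, 20.5, 30])      # the ints are upcast to float64
ints = np.array([10, 20, 30, 40, 50])

print(mixed.dtype)   # float64
print(ints.ndim)     # 1 (number of dimensions)
print(ints.shape)    # (5,)
print(ints.size)     # 5 (total number of elements)
```

The integer dtype of ints is platform dependent, so it is not printed here.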
Vectorization:
1. Using NumPy, we can perform complex vector operations in a
single statement.
2. Example: adding two arrays A and B
C:
for (i = 0; i < r; i++)
{
    for (j = 0; j < c; j++)
    {
        C[i][j] = A[i][j] + B[i][j];
    }
}
Python (NumPy):
C = A + B
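To see that the single NumPy statement does the same work as the nested loops, both versions can be run side by side. This is a small sketch with made-up 2x2 inputs; the names C_loop and C_vec are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

# Explicit nested loops, mirroring the C version
r, c = A.shape
C_loop = np.zeros((r, c), dtype=A.dtype)
for i in range(r):
    for j in range(c):
        C_loop[i][j] = A[i][j] + B[i][j]

# Vectorized version: one statement, no explicit loops
C_vec = A + B

print(C_vec)
print(np.array_equal(C_loop, C_vec))   # True
```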
Program 1: Write a Python program using NumPy to perform addition
and multiplication operations on two arrays. Initialize two arrays, a
and b, with elements [0, 1, 2, 3] and [11, 12, 13, 14], respectively.
Perform element-wise addition of the arrays and store the result in
array c. Similarly, perform element-wise multiplication of the arrays
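and store the result in array d. Finally, print both arrays c and d.
import numpy as np
a = np.array([0, 1, 2, 3])
b = np.array([11, 12, 13, 14])
c = a + b # element-wise addition
d = a * b # element-wise multiplication
print(c) # [11, 13, 15, 17]
print(d) # [0, 12, 26, 42]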
Broadcasting:
1. Broadcasting describes how NumPy treats operands of different
shapes in element-by-element operations (arithmetic, logical,
bitwise, etc.): the smaller operand, such as a scalar, is stretched
to match the shape of the larger one.
2. Program 2: Write a Python program using NumPy to determine
whether the elements of an array are greater than 2 or not.
Initialize an array a with elements [1, 3, 0]. Print a boolean array
indicating whether each element of a is greater than 2.
import numpy as np
a = np.array([1, 3, 0])
print(a > 2) # [False, True, False]
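Broadcasting also applies when both operands are arrays of different shapes. The sketch below uses illustrative values (m and row are not from the slides) to show a scalar and a 1-D row being stretched across a 2-D array.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])    # shape (3,)

plus_one = m + 1      # the scalar is applied to every element
plus_row = m + row    # the row is applied to each row of m
mask = m > 2          # comparison against a scalar gives a boolean array

print(plus_one)
print(plus_row)
print(mask)
```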
Importing NumPy:
1. NumPy is a Python package and can be imported like
any other package.
2. Here are some ways to import the NumPy package:
import numpy
import numpy as np
from numpy import *
Arrays in NumPy:
Creation of NumPy Array: To define a new ndarray, the easiest way is to
use the array() function, passing a Python list containing the elements to
be included in it as an argument.
Arrays can be created in several ways:
a. Using List or Tuple
import numpy as np
a = np.array([2, 3, 4]) # using a list, 1-D array
print(type(a)) # <class 'numpy.ndarray'>
b. Creating 2D array
a = np.array([[1, 3, 5], [2, 4, 6]])
print(a[0][0]) # 1
print(a.ndim) # 2, ndim gives the number of dimensions of the array
c. Defining the type of array
a = np.array([[1, 2], [3, 4]], dtype=complex)
print(type(a)) # <class 'numpy.ndarray'>
print(a.dtype) # complex128
d. Creating arrays of zeros and ones: the zeros() function creates an array
full of zeros, while the ones() function creates an array full of ones in a
very similar way.
arr = np.ones((3, 3), int)
print(arr) # [[1, 1, 1],
           #  [1, 1, 1],
           #  [1, 1, 1]]
e. To create Identity Matrix
arr = np.identity(3, int)
print(arr) # [[1, 0, 0],
           #  [0, 1, 0],
           #  [0, 0, 1]]
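The three constructors can be compared in one short run. This is a sketch; the (2, 3) shape and the names z, o and i3 are chosen only for illustration.

```python
import numpy as np

z = np.zeros((2, 3), int)    # 2 rows, 3 columns, all zeros
o = np.ones((2, 3), int)     # same shape, all ones
i3 = np.identity(3, int)     # 3x3 identity: ones on the diagonal only

print(z)
print(o)
print(i3)
print(i3.diagonal())         # the diagonal holds the ones
```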
g. Using the arange function: This function returns an ndarray object containing
evenly spaced values within a given range.
The format of the function is as follows: numpy.arange(start, stop, step, dtype)
Program 2:
a) Create a NumPy array a of 10 elements, starting at 0 and ending at 9, using the
arange function.
b) Create a NumPy array b containing a sequence of numbers starting from 1 and
incrementing by 2 until reaching just before 11.
c) Create an array arr starting from 10 and ending just before 30, with elements
incrementing by 5 each time.
d) Create an array arr starting from 0 and ending just before 2, with elements
incrementing by 0.3 each time.
e) Create a NumPy array x containing floating-point numbers from 0 to 4.
import numpy as np
# a)
a = np.arange(10) # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(a)
# b)
b = np.arange(1, 11, 2) # start (inclusive), end (exclusive), step: [1, 3, 5, 7, 9]
print(b)
# c)
arr = np.arange(10, 30, 5) # [10, 15, 20, 25]
print(arr)
# d)
arr = np.arange(0, 2, 0.3) # [0, 0.3, 0.6, 0.9, 1.2, 1.5, 1.8]
print(arr)
# e)
x = np.arange(5, dtype=float) # dtype set: [0., 1., 2., 3., 4.]
print(x)
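For a positive step, arange produces ceil((stop - start) / step) elements, which explains why part d) yields 7 values. The sketch below checks this for the slide's example; expected_len is an illustrative name, and floating-point steps can round unexpectedly in other cases.

```python
import math
import numpy as np

start, stop, step = 0, 2, 0.3
arr = np.arange(start, stop, step)

# Number of elements predicted from the arguments
expected_len = math.ceil((stop - start) / step)

print(len(arr), expected_len)   # 7 7
print(arr[-1] < stop)           # True: the stop value itself is excluded
```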
Array Iteration:
A = np.array([1, 4, 5], int)
for x in A:
    print(x, end=", ") # 1, 4, 5,
A = np.array([[1, 2], [3, 4], [5, 6]], int) # 2-D array: each row unpacks into (x, y)
for (x, y) in A:
    print(x * y, end=", ") # 2, 12, 30,
Array Basic Operations
Program 3: Demonstration of basic element-wise
operations (addition, subtraction, multiplication, squaring) on
NumPy arrays
a = np.array([10, 20, 30, 40])
b = np.arange(4) # [0, 1, 2, 3]
c = a + b
print(c) # [10, 21, 32, 43]
d = a - b
print(d) # [10, 19, 28, 37]
print(a * b) # [0, 20, 60, 120]
print(b ** 2) # [0, 1, 4, 9]
Array Basic Operations (using built-in functions)
Program 4: Given matrices A and B:
A = [[1, 1],     B = [[2, 0],
     [0, 1]]          [3, 4]]
Using NumPy, perform the following operations:
1. Element-wise Product: Calculate the element-wise product of
matrices A and B.
2. Matrix Product: Compute the matrix product of matrices A and B.
Provide the results for both operations.
A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])
print(A * B) # Element-wise product: [[2, 0], [0, 4]]
print(A.dot(B)) # Matrix product: [[5, 4], [3, 4]]
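The matrix product can also be written with the @ operator, which gives the same result as dot() for 2-D arrays. A short sketch using the same A and B (the names elementwise and matrix are illustrative):

```python
import numpy as np

A = np.array([[1, 1], [0, 1]])
B = np.array([[2, 0], [3, 4]])

elementwise = A * B   # multiplies matching positions
matrix = A @ B        # row-by-column matrix product, same as A.dot(B)

print(elementwise)
print(matrix)
print(np.array_equal(matrix, A.dot(B)))   # True
```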
Program 5: Consider the matrix A defined as follows:
A = [[1, 1],
     [0, 1]]
Using NumPy, compute the following:
1. The sum of all elements in matrix A.
2. The product of all elements in matrix A.
3. The minimum value in matrix A.
4. The maximum value in matrix A.
5. The shape of matrix A.
a = np.array([[1, 1], [0, 1]])
print(a)
print(np.sum(a)) # 3, sum of all the elements
print(np.prod(a)) # 1*1*0*1 = 0
print(np.min(a)) # 0
print(np.max(a)) # 1
print(np.shape(a)) # (2, 2)
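These functions reduce over the whole array by default. As a hedged extension of Program 5, the optional axis argument restricts the reduction to columns (axis=0) or rows (axis=1); the variable names below are illustrative.

```python
import numpy as np

a = np.array([[1, 1], [0, 1]])

col_sums = np.sum(a, axis=0)   # add down each column
row_sums = np.sum(a, axis=1)   # add across each row
col_mins = np.min(a, axis=0)   # smallest value in each column

print(col_sums)   # [1 2]
print(row_sums)   # [2 1]
print(col_mins)   # [0 1]
```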
Program 6a: Demonstrates NumPy operations on arrays such as
concatenation, sorting, finding unique elements, and accessing
diagonal elements of 2D array.
1. To concatenate arrays
a = np.array([1,2])
b = np.array([3,4])
c = np.array([5,6])
print(np.concatenate([a,b,c])) #[1,2,3,4,5,6]
2. Sorting of array:
A = np.array([6, 2, 5, -1, 0])
print(np.sort(A)) # [-1, 0, 2, 5, 6]
3. Unique elements:
A = np.array([1, 1, 4, 5, 5, 7])
print(np.unique(A)) # [1, 4, 5, 7]
4. Diagonal elements of 2D array:
A = np.array([[1, 2], [3, 4]])
print(A.diagonal()) # [1, 4]
5. Transpose:
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(A.transpose()) # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
Indexing and Slicing
Indexing
a = np.arange(5)
print(a[0]) # access first element: 0
print(a[4]) # access last element: 4
print(a[-1]) # access last element using negative indexing: 4
a[3] = 10 # assigning new value to the index
print(a) # [0, 1, 2, 10, 4]
Slicing
a = np.arange(10)
print(a[1:10:2]) # start, end (exclusive), step: [1, 3, 5, 7, 9]
b = np.arange(10)
b[5:] = 10 # assigns 10 from index 5 till last
print(b) # [0, 1, 2, 3, 4, 10, 10, 10, 10, 10]
c = np.arange(5)
b[5:] = c[::-1] # assigns c in reverse order
print(b) # [0, 1, 2, 3, 4, 4, 3, 2, 1, 0]
Program 6b: Comparison in Array
1. The output of the comparison is of Boolean type.
import numpy as np
a = np.array([1, 3, 0], float)
b = np.array([0, 3, 2], float)
print(a > b) # [True, False, False]
print(np.ceil(a)) # [1., 3., 0.] (the values are already whole numbers)
Data Visualization
• The results of data analysis are presented to the management in the
form of pictures/graphs.
• This helps to predict the future of the company.
• Ex: How much money was spent in the last 5 weeks on purchasing raw
materials?
• How many items were found defective in the production unit?
• The goal of visualization is to communicate information clearly
and efficiently using statistical graphs, plots and diagrams.
Data Visualization – Bar Graph
• Data visualization: representing data in the
form of pictures or graphs.
• Use the pyplot submodule of matplotlib.
• Bar graph: represents data as
vertical/horizontal bars. Used for comparing
quantities.
• Example: draw a bar graph of empid against Sal.
Program 10a: Using Python and Matplotlib, create a
bar plot to visualize the salary distribution among employees based
on the provided dataset. The dataset consists of employee IDs and
their salaries. Ensure that the plot is labeled "EMPLOYEE
DATA" and that the bars are colored red, with the x-axis
representing employee IDs and the y-axis representing salaries.
Additionally, include a legend and the title 'Employee Salary
Distribution' for the plot. Finally, discuss any observations or
insights you can gather from the generated visualization in terms of
salary distribution among employees.
Dataset:
x = [1001, 1003, 1006, 1007, 1009, 1011]
y = [10000, 23000.50, 18000, 16500, 12000.75, 9999.99]
import matplotlib.pyplot as plt
# Provided data (same values as the dataset above)
x = [1001, 1003, 1006, 1007, 1009, 1011]
y = [10000, 23000.50, 18000, 16500, 12000.75, 9999.99]
# Creating the bar plot
plt.bar(x, y, label="EMPLOYEE DATA", color='red')
plt.xlabel('Employee IDs')
plt.ylabel('Salary')
plt.title('Employee Salary Distribution')
plt.legend()
# Displaying the plot
plt.show()
Creating two bar graphs
Program 10b: Using Python and Matplotlib, create a program to
generate two bar graphs representing the salary distribution for the
Sales and Production departments. The Sales department dataset
includes employee IDs (x) and salaries (y), while the Production
department dataset includes different employee IDs (x1) and salaries
(y1). Each bar graph should display employee IDs on the x-axis and
corresponding salaries on the y-axis. Ensure to label each bar graph
appropriately with the department names. After generating the
visualizations, discuss any insights or observations you can gather
regarding the salary distribution among employees in the Sales and
Production departments.
Dataset:
x = [1001, 1003, 1006, 1007, 1009, 1011]
y = [10000, 23000.50, 18000, 16500, 12000.75, 9999.99]
x1 = [1002, 1004, 1010, 1008, 1014, 1015]
y1 = [5000, 6000, 4500, 12000, 9000, 10000]
import matplotlib.pyplot as plt
#initialize empids and salaries for sales department
x = [1001, 1003, 1006, 1007, 1009, 1011]
y = [10000, 23000.50, 18000, 16500, 12000.75, 9999.99]
#initialize empids and salaries for Production department
x1 = [1002, 1004, 1010, 1008, 1014, 1015]
y1 = [5000, 6000, 4500, 12000, 9000, 10000]
#create bar graphs
plt.bar(x, y, label='Sales Department', color='red')
plt.bar(x1, y1, label='Production Department', color='blue')
plt.xlabel('Empids')
plt.ylabel('Salaries')
plt.legend()
plt.show()
Horizontal Bar Graph
• Use barh() method
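A minimal sketch of a horizontal bar graph, reusing the Sales department dataset from the earlier programs. With barh() the first argument gives the positions on the y-axis and the second gives the bar lengths, so the axis labels swap compared with bar().

```python
import matplotlib.pyplot as plt

# Employee IDs and salaries (Sales department data from the earlier example)
x = [1001, 1003, 1006, 1007, 1009, 1011]
y = [10000, 23000.50, 18000, 16500, 12000.75, 9999.99]

# Employee IDs go on the y-axis; salaries become the bar lengths
bars = plt.barh(x, y, label='Sales Department', color='red')
plt.xlabel('Salaries')
plt.ylabel('Empids')
plt.legend()
plt.show()
```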
Histogram
• Histograms show distribution of values
• Shows values grouped into bins or intervals
• X-axis has values
• Y-axis has frequencies
• Ex: show the ages of employees in a group as a
histogram
• plt.hist(………)
Program 11: Using Python and Matplotlib, create a program to
generate a histogram illustrating the distribution of employee ages
within the company. The x-axis should represent employee ages,
while the y-axis should represent the number of employees falling
within each age group. The histogram bars should be colored green
and appropriately labeled. Additionally, include a legend in the plot.
After generating the visualization, discuss any insights or
observations you can gather regarding the age demographics of the
company's workforce.
The age of each employee is as follows: [22, 45, 30, 59, 58, 56, 57, 45,
43, 43, 50, 40, 34, 33, 25, 19].
The bins to be used are [0, 10, 20, 30, 40, 50, 60].
import matplotlib.pyplot as plt
emp_ages = [22,45,30,59,58,56,57,45,43,43,50,40,34,33,25,19]
bins = [0,10,20,30,40,50,60]
plt.hist(emp_ages, bins, label='Employee Age Distribution',
         histtype='bar', rwidth=0.8, color='green')
plt.xlabel('Employee Ages')
plt.ylabel('No. of Employees')
plt.title('PU')
plt.legend()
plt.show()
• Most of the employees are between 40 and 59 years old.
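The observation can be checked numerically without drawing the plot. The sketch below uses np.histogram, which applies the same binning rule as a histogram plot (each bin includes its left edge; only the last bin also includes its right edge). The name share_40_plus is illustrative.

```python
import numpy as np

emp_ages = [22, 45, 30, 59, 58, 56, 57, 45, 43, 43, 50, 40, 34, 33, 25, 19]
bins = [0, 10, 20, 30, 40, 50, 60]

# counts[i] is the number of ages falling in the bin edges[i] to edges[i+1]
counts, edges = np.histogram(emp_ages, bins=bins)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right}: {n}")

# Fraction of employees in the 40-50 and 50-60 bins
share_40_plus = counts[4:].sum() / counts.sum()
print(share_40_plus)   # 0.625
```

Ten of the sixteen employees fall in the two highest bins, which supports the statement above.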
Creating a Pie Chart
• A pie chart shows a circle which is divided into
sectors.
• Each sector represents a proportion of a whole
• Suppose there are four departments in a
company: Sales, Production, HR and Finance.
• The distribution of employees as 50%, 20%, 15%
and 15% in the respective departments can be
shown as a pie chart.
• plt.pie(slices, labels=depts, colors=cols, startangle=90,
explode=(0, 0.2, 0, 0), shadow=True, autopct='%.1f%%')
• labels = list of labels
• colors = list of colors
• startangle=90 indicates that the first slice starts at 90 degrees,
measured counterclockwise from the x-axis; by default it starts at 0 degrees.
• explode=(0, 0.2, 0, 0) indicates how far each slice should stand out
from the centre, as a fraction of the radius; 0 means the slice is not
pulled out.
• 0.2 indicates that the second slice should come out slightly.
• shadow=True indicates that the pie chart should be displayed with
a shadow.
• autopct='%.1f%%' indicates how to display the percentages on the
slices: one decimal place followed by a percent sign (written as %% in
the format string).
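When autopct is a string, Matplotlib formats each label as autopct % value, so the usual rules of Python %-formatting apply and a literal percent sign must be doubled. A plain-Python sketch of the difference (fmt and err are illustrative names):

```python
# '%.1f' prints one decimal place; '%%' produces a literal percent sign
fmt = '%.1f%%'
print(fmt % 50)   # 50.0%
print(fmt % 15)   # 15.0%

# A single trailing '%' is an incomplete format specifier and raises an error
try:
    '%.1f%' % 50
    err = None
except ValueError as exc:
    err = exc
print(type(err).__name__)   # ValueError
```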
Program 12: Using Python and Matplotlib, create a pie chart to
illustrate the distribution of employees across the Sales, Production,
HR, and Finance departments. Assume that the distribution of
employees in each department is as follows: Sales (50%), Production
(20%), HR (15%), and Finance (15%). You should label each sector
of the pie chart with the corresponding department name and
include a shadow effect for better visualization. Additionally, the
second largest sector (Production) should be slightly exploded to
stand out. After generating the visualization, discuss any insights or
observations you can gather from the pie chart regarding the
distribution of employees across different departments within the
company.
import matplotlib.pyplot as plt
slices = [50,20,15,15]
depts = ['Sales', 'Production', 'HR', 'Finance']
cols = ['magenta', 'cyan', 'brown', 'gold']
plt.pie(slices, labels=depts, colors=cols, startangle=90,
        explode=(0, 0.2, 0, 0), shadow=True, autopct='%.1f%%')
plt.title('XYZ Company')
plt.legend()
plt.show()
Creating a Line Graph
• A line graph is a graph that shows the results in
the form of lines.
• We need x and y coordinates to create a line.
• plt.plot(x, y, 'colorname')
Program 13: Using Python and Matplotlib, create a line graph to
visualize the annual profits of XYZ Company from 2012 to 2017.
The provided dataset includes the years (x-axis) and corresponding
profits in million INR (y-axis). Label the x-axis as "Years" and the y-
axis as "Profit in million INR". Additionally, give the graph a title
"XYZ Company". After generating the visualization, discuss any
insights or observations you can gather from the line graph
regarding the company's profit trends over the six-year period.
Dataset:
years = ['2012', '2013', '2014', '2015', '2016', '2017']
profits = [9,10, 10.5, 8.8, 10.9, 9.75]
import matplotlib.pyplot as plt
years = ['2012', '2013', '2014', '2015', '2016', '2017']
profits = [9,10, 10.5, 8.8, 10.9, 9.75]
plt.plot(years, profits, 'blue')
plt.title('XYZ Company')
plt.xlabel('Years')
plt.ylabel('Profit in million INR')
plt.show()
The graph shows that profit was highest in 2016 and lowest in
2015.
What is a Data Frame?
• A data frame is used to represent data in the form of rows and
columns.
• Data can come from a file, an Excel spreadsheet, a Python sequence
(lists and tuples) or a dictionary.
• After storing data in the frame, various operations
can be done to analyze and understand it.
• The 'pandas' package in Python is used for data analysis
and manipulation.
• The 'xlrd' package is used to retrieve data from Excel
spreadsheets.
• pandas: the name is derived from 'panel data'
(multidimensional data).
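As a small sketch of building a data frame from a dictionary: each key becomes a column name and each list becomes that column's values. The employee records below are made up for illustration; only the column names Empid, Ename and Salary follow the slides.

```python
import pandas as pd

# Hypothetical employee records (not the course's empdata.csv)
data = {
    'Empid': [1001, 1002, 1003],
    'Ename': ['Asha', 'Ravi', 'Meena'],
    'Salary': [10000.0, 23000.5, 18000.0],
}
df = pd.DataFrame(data)

print(df)
print(df.shape)           # (3, 3): 3 rows, 3 columns
print(list(df.columns))   # ['Empid', 'Ename', 'Salary']
```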
Creating Data Frames from .csv files
• Create an excel file and store the following data
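• Save the file as empdata.csv (with the .csv extension)
• Type the following in Jupyter Notebook
import pandas as pd
# Raw string (r"...") so that the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r"C:\Users\Admin\Desktop\PU I Sem 2019-2020\CSE 317 Prog in Python\Lecture Slides\empdata.csv")
df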
Operations on Data Frame
• To retrieve a range of rows
>>> df[2:5]
>>> df[::2]
>>> df[5:0:-1]
• To retrieve column names
>>> df.columns
• To retrieve column data
>>> df.Empid
>>> df["Empid"]
• To retrieve data from multiple columns
>>> df[["Empid", "Ename"]]
• To find minimum and maximum values of a
column
>>> df["Salary"].max()
>>> df["Salary"].min()
• To display statistical information
>>> df.describe()
Queries on Data
• To display the details of the employees whose
salary is greater than 20000
>>> df[df.Salary > 20000]
• To display only the Empid and Names of the
employees whose salary is greater than 20000
>>> df[["Empid", "Ename"]][df.Salary > 20000]
• To get the details of the highest paid employee
>>> df[df.Salary == df.Salary.max()]
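The queries can be tried without the CSV file by building a small data frame in memory. The rows below are hypothetical; the column names match those used in the queries. The second query is written with df.loc, an equivalent form that selects rows and columns in one step.

```python
import pandas as pd

# Hypothetical data standing in for empdata.csv
df = pd.DataFrame({
    'Empid': [1001, 1002, 1003, 1004],
    'Ename': ['Asha', 'Ravi', 'Meena', 'John'],
    'Salary': [10000.0, 23000.5, 18000.0, 25000.0],
})

high = df[df.Salary > 20000]                           # all columns, filtered rows
names = df.loc[df.Salary > 20000, ['Empid', 'Ename']]  # filtered rows, two columns
top = df[df.Salary == df.Salary.max()]                 # highest paid employee

print(high)
print(names)
print(top)
```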