Python-Unit-4

Python Programming

Unit-IV
A.Appandairaj MTech CSE
HOD/MCA
Ganadipathy Tulsi’s Jain Engineering College
• Modules: Introduction
• Module Loading and Execution
• Packages
• Making Your Own Module
• The Python Libraries for data processing, data mining and visualization: NumPy, Pandas, Matplotlib, Plotly
• Frameworks: Django, Flask, Web2Py
NumPy
• NumPy is a Python library.
• NumPy is used for working with arrays.
• NumPy is short for "Numerical Python".
• It also has functions for working in the domains of
linear algebra, Fourier transforms, and matrices.
• NumPy was created in 2005 by Travis Oliphant. It
is an open source project and you can use it
freely.
• In Python we have lists that serve the purpose of
arrays, but they are slow to process.
• NumPy aims to provide an array object that is up
to 50x faster than traditional Python lists.
• The array object in NumPy is an N-dimensional array type called ndarray. It comes with many supporting functions that make working with arrays easy.

• Arrays are very frequently used in data science, where speed and
resources are very important.
• NumPy offers the following advantages for data analysis:
1. It performs array-oriented computing.
2. It implements multidimensional arrays efficiently.
3. It performs scientific computations.
4. It can perform Fourier transforms and reshape the data stored in multidimensional arrays.
5. It provides built-in functions for linear algebra and random number generation.
• NumPy Ndarray
• ndarray is the n-dimensional array object defined in NumPy. It stores a collection of elements of the same type.
• In other words, an ndarray is a collection of objects that share one data type (dtype).
• Creating an ndarray object
• An ndarray object can be created with the array routine of the numpy module. To use it, we first need to import numpy.

• Syntax: numpy.array(object, dtype = None, copy = True, order = 'K', subok = False, ndmin = 0)
• a = numpy.array([1, 2, 3])
• a = numpy.array([[1, 2, 3], [4, 5, 6]])
• a = numpy.array([1, 3, 5, 7], complex)
• import numpy as np
• arr = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [9, 10, 11, 23]])
• print(arr.ndim)
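• The dtype and ndmin parameters of numpy.array can be sketched as follows. This is a minimal illustration with made-up values, not part of the original examples:

```python
import numpy as np

# dtype forces the element type; ndmin pads the shape with leading axes.
a = np.array([1, 2, 3], dtype=float)
b = np.array([1, 2, 3], ndmin=2)

print(a, a.dtype)           # elements are stored as float64
print(b, b.shape, b.ndim)   # shape (1, 3), 2 dimensions
```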
• Finding the size of each array element
• The itemsize attribute gives the size of each
array item, that is, the number of bytes taken by
each array element.
import numpy as np
a = np.array([[1,2,3]])
print("Each item contains",a.itemsize,"bytes")
• Finding the data type of each array item
• To check the data type of each array item, use the dtype
attribute.
import numpy as np
a = np.array([[1,2,3]])
print("Each item is of the type",a.dtype)

• Finding the shape and size of the array
• To get the shape and size of the array, use the shape and
size attributes of the NumPy array. size is the total number
of elements; shape is a tuple with the length of each dimension.
import numpy as np
a = np.array([[1,2,3,4,5,6,7]])
print("Array Size:",a.size)
print("Shape:",a.shape)
• Reshaping the array objects
• By the shape of the array, we mean the number of rows and columns of a multi-dimensional array.
• The reshape() method changes the number of rows and columns of a multi-dimensional array without changing its data. The total number of elements must stay the same.
import numpy as np
a = np.array([[1,2],[3,4],[5,6]])
print("printing the original array..")
print(a)
a=a.reshape(2,3)
print("printing the reshaped array..")
print(a)
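• A short sketch of two reshape details, using an illustrative 6-element array: -1 asks NumPy to infer one dimension, and a shape whose element count does not match raises ValueError.

```python
import numpy as np

a = np.arange(1, 7)      # [1 2 3 4 5 6], 6 elements

b = a.reshape(2, 3)      # 2 rows x 3 columns
c = a.reshape(3, -1)     # -1 lets NumPy infer the second dimension (2)

try:
    a.reshape(4, 2)      # 4 * 2 = 8 does not match 6 elements
except ValueError as err:
    error_message = str(err)
    print("reshape failed:", error_message)
```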

• Slicing in the Array
• Slicing extracts a range of elements from a NumPy array. It uses the same
start:stop:step syntax as a Python list, applied to each axis. The example
below picks out single elements by row and column index.
import numpy as np
a = np.array([[1,2],[3,4],[5,6]])
print(a[0,1])
print(a[2,0])
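• The example above selects single elements. A minimal sketch of range slicing on the same array might look like this (the variable names are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

first_two_rows = a[0:2]     # rows 0 and 1, all columns
second_column = a[:, 1]     # every row, column 1
every_other_row = a[::2]    # rows 0 and 2

print(first_two_rows)
print(second_column)
print(every_other_row)
```

Unlike list slices, these basic slices are views: changing an element of first_two_rows also changes a.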
• Finding the maximum, minimum, and sum of
the array elements
• NumPy provides the max(), min(), and
sum() functions, which find the
maximum, minimum, and sum of the array
elements respectively.
• import numpy as np
• a = np.array([1,2,3,10,15,4])
• print("The array:",a)
• print("The maximum element:",a.max())
• print("The minimum element:",a.min())
• print("The sum of the elements:",a.sum())
• Linspace
• The linspace() function returns evenly spaced values over a given
interval. The following example returns 10 evenly spaced values
over the interval 5 to 15, with both endpoints included.

• import numpy as np
• a=np.linspace(5,15,10)
• print(a)
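• A brief sketch of how linspace spaces its values, using the same interval. The retstep and endpoint arguments are optional parameters of np.linspace:

```python
import numpy as np

a = np.linspace(5, 15, 10)                      # 10 values, both ends included
b, step = np.linspace(5, 15, 10, retstep=True)  # also return the spacing
c = np.linspace(5, 15, 10, endpoint=False)      # stop value excluded

print(a[0], a[-1])   # first and last values
print(step)          # spacing = (15 - 5) / (10 - 1)
print(c[-1])         # last value when the endpoint is left out
```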

• Finding square root and standard deviation
• The sqrt() and std() functions of NumPy find the square root of each
array element and the standard deviation of the array elements
respectively.
• Standard deviation measures how much the elements of the array vary
from the mean value of the array.
import numpy as np
a = np.array([[1,2,30],[10,15,4]])
print(np.sqrt(a))
print(np.std(a))
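• To make the definition concrete, the standard deviation can be computed by hand and compared with np.std. This is a sketch on the same array; manual_std is an illustrative name:

```python
import numpy as np

a = np.array([[1, 2, 30], [10, 15, 4]])

mean = a.mean()
manual_std = np.sqrt(((a - mean) ** 2).mean())  # population standard deviation

print(manual_std)
print(np.std(a))          # same value: np.std uses ddof=0 by default
print(np.std(a, ddof=1))  # sample standard deviation (divides by N - 1)
```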
• Arithmetic operations on the array
• The numpy module allows us to perform arithmetic operations on multi-dimensional arrays directly. Each operation is applied element by element.
• In the following example, arithmetic operations are performed on the two multi-dimensional arrays a and b.
import numpy as np
a = np.array([[1,2,30],[10,15,4]])
b = np.array([[1,2,3],[12, 19, 29]])
print("Sum of array a and b\n",a+b)
print("Product of array a and b\n",a*b)
print("Division of array a and b\n",a/b)
• Array Concatenation
• NumPy provides vertical stacking (vstack) and
horizontal stacking (hstack), which let us
concatenate two multi-dimensional
arrays vertically or horizontally.
import numpy as np
a = np.array([[1,2,30],[10,15,4]])
b = np.array([[1,2,3],[12, 19, 29]])
print("Arrays vertically concatenated\n",np.vstack((a,b)))
print("Arrays horizontally concatenated\n",np.hstack((a,b)))
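• Both kinds of stacking can also be written with np.concatenate and an axis argument. A small sketch using the same two arrays:

```python
import numpy as np

a = np.array([[1, 2, 30], [10, 15, 4]])
b = np.array([[1, 2, 3], [12, 19, 29]])

v = np.concatenate((a, b), axis=0)  # same result as np.vstack((a, b))
h = np.concatenate((a, b), axis=1)  # same result as np.hstack((a, b))

print(v.shape)  # (4, 3)
print(h.shape)  # (2, 6)
```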
• NumPy Array Axis
• The dimensions of a NumPy multi-dimensional array are called axes. In a 2-D array, axis 0 runs down the rows and axis 1 runs across the columns.
• Passing axis = 0 to a calculation therefore gives one result per column, and axis = 1 gives one result per row, for example the sum of the elements in each column or in each row.
• The following example calculates the maximum element of each column, the minimum element of each row, and the sum of the elements in each row:
import numpy as np
a = np.array([[1,2,30],[10,15,4]])
print("The array:",a)
print("The maximum elements of columns:",a.max(axis = 0))
print("The minimum element of rows",a.min(axis = 1))
print("The sum of all rows",a.sum(axis = 1))
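• One way to read the axis argument is that it names the axis that gets collapsed. A sketch on the same array, with illustrative variable names:

```python
import numpy as np

a = np.array([[1, 2, 30], [10, 15, 4]])   # shape (2, 3): 2 rows, 3 columns

col_max = a.max(axis=0)   # collapses the rows: one value per column
row_min = a.min(axis=1)   # collapses the columns: one value per row
row_sum = a.sum(axis=1)

print(col_max)  # [10 15 30]
print(row_min)  # [1 4]
print(row_sum)  # [33 29]
```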

• NumPy Broadcasting
• In mathematical operations, we may need to
combine arrays of different shapes.
NumPy can perform such operations by stretching
(broadcasting) the smaller array to match the larger
one, provided their shapes are compatible.
import numpy as np
a = np.array([1,2,3,4,5,6,7])
b = np.array([2,4,6,8,10,12,14])
c = a*b
print(c)
Output:
[ 2 8 18 32 50 72 98]

import numpy as np
a = np.array([1,2,3,4,5,6,7])
b = np.array([2,4,6,8,10,12,14,19])
c = a*b
print(c)
Output:
ValueError: operands could not be broadcast together with shapes (7,) (8,)
import numpy as np
a = np.array([[1,2,3,4],[2,4,5,6],[10,20,39,3]])
b = np.array([2,4,6,8])
print("\nprinting array a..")
print(a)
print("\nprinting array b..")
print(b)
print("\nAdding arrays a and b ..")
c = a + b;
print(c)
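• Shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. A minimal sketch with an extra, illustrative column array col:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 4, 5, 6], [10, 20, 39, 3]])  # shape (3, 4)
b = np.array([2, 4, 6, 8])                                    # shape (4,)
col = np.array([[100], [200], [300]])                         # shape (3, 1)

row_result = a + b      # b is stretched across the 3 rows
col_result = a + col    # col is stretched across the 4 columns

print(row_result.shape)
print(col_result.shape)
print(col_result[0])    # first row of a plus 100
```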
• NumPy Array Iteration
import numpy as np
a = np.array([[1,2,3,4],[2,4,5,6],[10,20,39,3]])
print("Printing array:")
print(a);
print("Iterating over the array:")
for x in np.nditer(a):
    print(x,end=' ')
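• Order of Iteration
• There are two ways of laying out the values of a NumPy array in memory:
1. F-style (Fortran) order: column-major, columns are stored one after another
2. C-style order: row-major, rows are stored one after another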
import numpy as np

a = np.array([[1,2,3,4],[2,4,5,6],[10,20,39,3]])
print("\nPrinting the array:\n")
print(a)
print("\nPrinting the transpose of the array:\n")
at = a.T
print(at)
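print("\nIterating over the transposed array\n")
for x in np.nditer(at):
    print(x, end=' ')
print("\nCopying the transposed array in C-style order:\n")
c = at.copy(order = 'C')
print(c)
for x in np.nditer(c):
    print(x, end=' ')
print("\nCopying the transposed array in F-style order:\n")
d = at.copy(order = 'F')
print(d)
for x in np.nditer(d):
    print(x, end=' ')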
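• The memory layout of a copy can be checked through its flags, and np.nditer accepts an order argument that fixes the visiting order. A sketch on the same array; row_wise and col_wise are illustrative names:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 4, 5, 6], [10, 20, 39, 3]])

c = a.copy(order='C')   # row-major layout
f = a.copy(order='F')   # column-major layout
print(c.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])

# The order argument decides the visiting order regardless of the layout.
row_wise = [int(x) for x in np.nditer(a, order='C')]
col_wise = [int(x) for x in np.nditer(a, order='F')]
print(row_wise)
print(col_wise)
```

Both copies hold the same values; only the arrangement in memory differs.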
• NumPy Mathematical Functions
import numpy as np
arr = np.array([0, 30, 60, 90, 120, 150, 180])
print("\nThe sin value of the angles",end = " ")
print(np.sin(arr * np.pi/180))
print("\nThe cosine value of the angles",end = " ")
print(np.cos(arr * np.pi/180))
print("\nThe tangent value of the angles",end = " ")
print(np.tan(arr * np.pi/180))

import numpy as np
arr = np.array([12.202, 90.23120, 123.020, 23.202])
print("printing the original array values:",end = " ")
print(arr)
print("Array values rounded to 2 decimal places",np.around(arr, 2))
print("Array values rounded to the nearest ten (decimals = -1)",np.around(arr, -1))
• The numpy.floor() function
import numpy as np
arr = np.array([12.202, 90.23120, 123.020, 23.202])
print(np.floor(arr))
• The numpy.ceil() function
import numpy as np
arr = np.array([12.202, 90.23120, 123.020, 23.202])
print(np.ceil(arr))
• Numpy statistical functions
• import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
print("Array:\n",a)
print("\nMedian of array along axis 0:",np.median(a,0))
print("Mean of array along axis 0:",np.mean(a,0))
print("Average of array along axis 1:",np.average(a,1))
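• np.mean and np.average give the same result unless weights are supplied. A small sketch with made-up scores and weights:

```python
import numpy as np

scores = np.array([80, 90, 100])   # illustrative data
weights = np.array([1, 1, 2])      # the last score counts twice

plain = np.mean(scores)
unweighted = np.average(scores)
weighted = np.average(scores, weights=weights)

print(plain)       # 90.0
print(unweighted)  # 90.0
print(weighted)    # (80 + 90 + 200) / 4 = 92.5
```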
• What Can Pandas Do?
• Pandas gives you answers about the data, such as:
1. Is there a correlation between two or more columns?
2. What is the average value?
3. What is the maximum value?
4. What is the minimum value?
5. Pandas can also delete rows that are not relevant or that
contain wrong values, such as empty or NULL values. This is
called cleaning the data.
• C:\Users\Your Name>pip install pandas
• import pandas
• import pandas as pd
• import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
• Python Pandas Series
• A Pandas Series is a one-dimensional labeled array that
can store various data types.
• We can easily convert a list, tuple, or dictionary into a Series
with the pd.Series() constructor.
• The row labels of a Series are called the index. A Series cannot
contain multiple columns.
• import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
• import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
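• When a Series is built from a dictionary, the keys become the index labels. The following sketch shows label access and the index argument, which selects only some of the keys:

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

s = pd.Series(calories)
print(s["day2"])     # access a value by its label

# Passing index keeps only the listed keys, in that order
subset = pd.Series(calories, index=["day1", "day2"])
print(subset)
```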
• DataFrame
• A Pandas DataFrame is a 2 dimensional data structure, like a 2
dimensional array, or a table with rows and columns.
• import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
• Locate Row
• As you can see from the result above, the DataFrame is like a table
with rows and columns.
• Pandas uses the loc attribute to return one or more specified rows.
• print(df.loc[0])
• print(df.loc[[0, 1]])
• df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
• print(df.loc["day2"])
• Pandas Read CSV
• A simple way to store big data sets is to use CSV files (comma
separated values files).
• CSV files contain plain text in a well-known format that can
be read by almost any program, including Pandas.
• import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
• print(pd.options.display.max_rows)
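• To try read_csv without a file on disk, the CSV text can be wrapped in io.StringIO. The content below is hypothetical and only stands in for data.csv:

```python
import io
import pandas as pd

# Hypothetical CSV content, used in place of data.csv
csv_text = "Duration,Pulse,Calories\n60,110,409.1\n45,117,\n45,109,282.4\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)               # (rows, columns)
print(df.head(2))             # first two rows
print(df["Calories"].isna())  # the empty cell is read as NaN
```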
• Read JSON
• Big data sets are often stored, or extracted as JSON.
• JSON is plain text, but has the format of an object, and is well
known in the world of programming, including Pandas.
• import pandas as pd
df = pd.read_json('data.json')
print(df.to_string())
• Pandas - Cleaning Data
• Data Cleaning
• Data cleaning means fixing bad data in your data set.
• Bad data could be:
• Empty cells
• Data in wrong format
• Wrong data
• Duplicates
• Empty Cells
• Empty cells can potentially give you a wrong result when you
analyze data.
• Remove Rows
• One way to deal with empty cells is to remove rows that contain
empty cells.
• This is usually OK, since data sets can be very big, and removing a
few rows will not have a big impact on the result.
• import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())
• By default, the dropna() method returns a new DataFrame and does not change
the original.
• dropna(inplace = True) does NOT return a new DataFrame. Instead, it
removes all rows containing NULL values from the original
DataFrame.
• df.dropna(inplace = True)
print(df.to_string())
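• The difference between the two forms of dropna() can be seen on a small in-memory DataFrame. The data here is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({"Duration": [60, 45, 45],
                   "Calories": [409.1, np.nan, 282.4]})

new_df = df.dropna()          # returns a copy without the incomplete row
rows_before = len(df)         # the original still has all rows

df.dropna(inplace=True)       # modifies df itself and returns None
rows_after = len(df)

print(rows_before, len(new_df), rows_after)
```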
• Replace Empty Values
• Another way of dealing with empty cells is to insert a new value
instead.
• This way you do not have to delete entire rows just because of
some empty cells.
• The fillna() method allows us to replace empty cells with a value:
• import pandas as pd
df = pd.read_csv('data.csv')
df.fillna(130, inplace = True)
• Replace Only For Specified Columns
• The example above replaces all empty cells in the whole DataFrame.
• To only replace empty values for one column, specify the column
name for the DataFrame:
• import pandas as pd
df = pd.read_csv('data.csv')
df["Calories"] = df["Calories"].fillna(130)
• Replace Using Mean, Median, or Mode
• import pandas as pd
df = pd.read_csv('data.csv')
# Pick ONE of the three replacement values:
x = df["Calories"].mean()
# x = df["Calories"].median()
# x = df["Calories"].mode()[0]
df["Calories"] = df["Calories"].fillna(x)
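• A sketch of how the three replacement values differ on a tiny, made-up column. Which one suits a real data set depends on the data:

```python
import numpy as np
import pandas as pd

# Hypothetical column with one empty cell
df = pd.DataFrame({"Calories": [400.0, np.nan, 300.0, 300.0]})

mean_value = df["Calories"].mean()      # NaN is skipped: (400 + 300 + 300) / 3
median_value = df["Calories"].median()  # middle value of 300, 300, 400
mode_value = df["Calories"].mode()[0]   # most frequent value

df["Calories"] = df["Calories"].fillna(median_value)
print(mean_value, median_value, mode_value)
print(df)
```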
• Pandas - Fixing Wrong Data
• In our example, it is most likely a typo, and the value should
be "45" instead of "450", and we could just insert "45" in row
7:
• df.loc[7, 'Duration'] = 45
• Removing Duplicates
• print(df.duplicated())
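• duplicated() only reports duplicates. To actually remove them, pandas provides drop_duplicates(). A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: the third row repeats the first
df = pd.DataFrame({"Duration": [60, 45, 60],
                   "Calories": [409, 282, 409]})

flags = df.duplicated()          # True for every row that repeats an earlier one
deduped = df.drop_duplicates()   # keeps the first occurrence of each row

print(flags)
print(deduped)
```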
• Pandas - Data Correlations
• A great aspect of the Pandas module is the corr() method.
• The corr() method calculates the relationship between each
column in your data set.
• df.corr()
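• corr() returns a table with one row and one column per numeric column; values near 1 or -1 indicate a strong relationship. A sketch on made-up numbers, where Calories rises exactly in step with Duration:

```python
import pandas as pd

# Hypothetical data: Calories is proportional to Duration, Rest falls as it grows
df = pd.DataFrame({"Duration": [30, 45, 60, 90],
                   "Calories": [200, 300, 400, 600],
                   "Rest":     [90, 60, 45, 30]})

corr = df.corr()
print(corr)
```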
• Pandas - Plotting
• import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot()
plt.show()
• Scatter Plot
• Specify that you want a scatter plot with the kind argument:
• kind = 'scatter'
• A scatter plot needs an x- and a y-axis.
• In the example below we will use "Duration" for the x-axis and
"Calories" for the y-axis.
• Include the x and y arguments like this:
• x = 'Duration', y = 'Calories'
• df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
plt.show()
• Histogram
• Use the kind argument to specify that you want a histogram:
• kind = 'hist'
• A histogram needs only one column.
• A histogram shows us the frequency of each interval, e.g. how
many workouts lasted between 50 and 60 minutes?
• In the example below we will use the "Duration" column to
create the histogram:
• df["Duration"].plot(kind = 'hist')
• Matplotlib (Python Plotting Library)
• Human minds adapt more readily to visual representations
of data than to textual data.
• We can easily understand things when they are visualized.
• It is better to represent data as a graph, where we
can analyze it more efficiently and make specific
decisions based on the analysis.
• Before learning matplotlib, we need to understand data
visualization and why it is important.
• Data Visualization
• Why do we need data visualization?
• Data visualization can perform the following tasks:
1. It identifies areas that need improvement and attention.
2. It clarifies the factors.
3. It helps to understand which product to place where.
4. It predicts sales volumes.

• Matplotlib is a Python library defined as a multi-platform
data visualization library built on NumPy arrays. It can be used in
Python scripts, shells, web applications, and other graphical user
interface toolkits.

• Benefits of Data Visualization
• 1. Building ways of absorbing information
• 2. Visualize relationship and patterns in Businesses
• 3. Take action on the emerging trends faster
• 4. Geological based Visualization
• The General Concept of Matplotlib
• Figure: It is a whole figure which may hold one or more axes
(plots). We can think of a Figure as a canvas that holds plots.

• Axes: A Figure can contain several Axes. Each Axes consists of two or
three (in the case of 3D) Axis objects, and has
a title, an x-label, and a y-label.
• Axis: Axis objects are number-line-like objects that are
responsible for generating the graph limits.
• Artist: An Artist is everything we see on the graph, such as Text
objects, Line2D objects, and collection objects. Most Artists
are tied to an Axes.
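• The relationship between Figure, Axes, and Axis can be sketched with plt.subplots. The titles and data are illustrative, and the Agg backend is chosen here only so that the sketch runs without opening a window:

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2)   # one Figure holding two Axes

ax1.plot([1, 2, 3], [4, 5, 1])
ax1.set_title("Left plot")
ax1.set_xlabel("X axis")
ax1.set_ylabel("Y axis")

ax2.bar(["a", "b"], [3, 7])
ax2.set_title("Right plot")

print(len(fig.axes))               # number of Axes in the Figure
print(type(ax1.xaxis).__name__)    # each Axes owns an x-Axis and a y-Axis
```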
• Basic Example of plotting Graph
• from matplotlib import pyplot as plt
• plt.plot([1,2,3],[4,5,1])
• plt.show()

• from matplotlib import pyplot as plt


• x = [5, 2, 7]
• y = [1, 10, 4]
• plt.plot(x, y)
• plt.title('Line graph')
• plt.ylabel('Y axis')
• plt.xlabel('X axis')
• plt.show()
• from matplotlib import pyplot as plt
• plt.plot([1,2,3,4,5],[1,4,9,16,25])
• plt.ylabel("y axis")
• plt.xlabel('x axis')
• plt.show()

• Format strings for plt.plot():
'b'    Blue marker with default shape
'ro'   Red circle
'-g'   Green solid line
'--'   A dashed line with the default color
'^k:'  Black triangle-up markers connected by a dotted line

• from matplotlib import pyplot as plt
• plt.plot([1, 2, 3, 4,5], [1, 4, 9, 16,25], 'ro')
• plt.axis([0, 6, 0, 20])
• plt.show()
1. Line graph
from matplotlib import pyplot as plt
x = [4,8,9]
y = [10,12,15]
plt.plot(x,y)
plt.title("Line graph")
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
2. Bar graphs
from matplotlib import pyplot as plt
players = ['Virat','Rohit','Shikhar','Hardik']
runs = [51,87,45,67]
plt.bar(players,runs,color = 'green')
plt.title('Score Card')
plt.xlabel('Players')
plt.ylabel('Runs')
plt.show()
• 3. Pie Chart
from matplotlib import pyplot as plt
# Pie chart, where the slices will be ordered and plotted counter-clockwise:
Players = 'Rohit', 'Virat', 'Shikhar', 'Yuvraj'
Runs = [45, 30, 15, 10]
explode = (0.1, 0, 0, 0) # "explode" the 1st slice
fig1, ax1 = plt.subplots()
ax1.pie(Runs, explode=explode, labels=Players, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.show()
• 4. Histogram
from matplotlib import pyplot as plt
population_age = [21,53,60,49,25,27,30,42,40,1,2,102,95,8,15,
                  105,70,65,55,70,75,60,52,44,43,42,45]
bins = [0,10,20,30,40,50,60,70,80,90,100]
plt.hist(population_age, bins, histtype='bar', rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()
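• The bar heights that plt.hist draws can be inspected with np.histogram, which counts the values per bin in the same way. A sketch using the same data; note that the ages 102 and 105 fall outside the last bin edge (100) and are not counted:

```python
import numpy as np

population_age = [21, 53, 60, 49, 25, 27, 30, 42, 40, 1, 2, 102, 95, 8, 15,
                  105, 70, 65, 55, 70, 75, 60, 52, 44, 43, 42, 45]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# counts[i] is the number of ages in the interval bins[i] to bins[i + 1]
counts, edges = np.histogram(population_age, bins)

print(counts)
print(counts.sum(), "of", len(population_age), "ages fall inside the bins")
```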