Python-Unit-4
Unit-IV
A.Appandairaj MTech CSE
HOD/MCA
Ganadipathy Tulsi’s Jain Engineering College
• Modules: Introduction
• Module Loading and Execution
• Packages
• Making Your Own Module
• The Python Libraries for data processing, data mining and visualization:
NumPy, Pandas, Matplotlib, Plotly
• Frameworks: Django, Flask, Web2Py
NumPy
• NumPy is a Python library.
• NumPy is used for working with arrays.
• NumPy is short for "Numerical Python".
• It also has functions for working in the domains of
linear algebra, Fourier transforms, and matrices.
• NumPy was created in 2005 by Travis Oliphant. It
is an open source project and you can use it
freely.
• In Python, lists can serve the purpose of
arrays, but they are slow to process.
• NumPy aims to provide an array object that is up
to 50x faster than traditional Python lists.
• NumPy provides an N-dimensional array type called ndarray.
• Arrays are very frequently used in data science, where speed and
resources are very important.
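• As a rough sketch of the difference in style (it does not measure speed), the same element-wise squaring can be written with a list and with an ndarray. The sample values are hypothetical.

```python
import numpy as np

# Hypothetical sample data: the integers 0..9.
values = list(range(10))

# Plain Python list: each element is handled in an explicit loop.
squares_list = [v * v for v in values]

# NumPy array: the operation is written once for the whole array.
arr = np.array(values)
squares_arr = arr * arr

print(squares_list)
print(squares_arr)
```

Both forms give the same numbers; the array form is the "array-oriented computing" that NumPy is built around.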
• There are the following advantages of using NumPy for data analysis.
1. NumPy performs array-oriented computing.
5. NumPy provides the in-built functions for linear algebra and random
number generation.
• NumPy Ndarray
• Ndarray is the n-dimensional array object defined in
NumPy. It stores a collection of elements of the
same type.
• An ndarray is created with the numpy.array function:
• >>> numpy.array(object, dtype = None, copy = True, order = None, subok = False, ndmin = 0)
• a = numpy.array([1, 2, 3])
• a = numpy.array([[1, 2, 3], [4, 5, 6]])
• a = numpy.array([1, 3, 5, 7], complex)
• import numpy as np
• arr = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [9, 10, 11, 23]])
• print(arr.ndim)
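• A short sketch of the basic attributes of the arrays created above. The variable name c is chosen here for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [9, 10, 11, 23]])
print(arr.ndim)   # number of dimensions
print(arr.shape)  # (rows, columns)
print(arr.size)   # total number of elements

# Passing a dtype converts the elements while the array is built.
c = np.array([1, 3, 5, 7], dtype=complex)
print(c.dtype)
```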
• Finding the size of each array element
• The itemsize attribute gives the size of each
array item, i.e. the number of bytes taken by
each array element.
import numpy as np
a = np.array([[1,2,3]])
print("Each item contains",a.itemsize,"bytes")
• Finding the data type of each array item
• To check the data type of each array item, the dtype
attribute is used.
import numpy as np
a = np.array([[1,2,3]])
print("Each item is of the type",a.dtype)
• import numpy as np
• a=np.linspace(5,15,10)
• print(a)
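• np.linspace(start, stop, num) returns num evenly spaced values, with both end points included by default. A small check of that behaviour for the call above:

```python
import numpy as np

a = np.linspace(5, 15, 10)

# The gaps between neighbouring values are all equal.
steps = np.diff(a)

print(len(a))        # number of generated values
print(a[0], a[-1])   # first and last value
print(steps)
```

With 10 points there are 9 gaps, so each step is (15 - 5) / 9.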
• Standard deviation measures how much the elements of the array vary
from the mean value of the array.
import numpy as np
a = np.array([[1,2,30],[10,15,4]])
print(np.sqrt(a))
print(np.std(a))
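• To make the definition concrete, the standard deviation can be computed step by step and compared with np.std. This sketch assumes the default behaviour of np.std (population standard deviation over all elements).

```python
import numpy as np

a = np.array([[1, 2, 30], [10, 15, 4]])

mean = a.mean()                       # mean of all elements
squared_dev = (a - mean) ** 2         # squared distance of each element from the mean
manual_std = np.sqrt(squared_dev.mean())

print(manual_std)
print(np.std(a))
```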
• Arithmetic operations on the array
• The numpy module allows us to perform the
arithmetic operations on multi-dimensional
arrays directly.
• In the following example, the arithmetic
operations are performed on the two multi-
dimensional arrays a and b.
import numpy as np
a = np.array([[1,2,30],[10,15,4]])
b = np.array([[1,2,3],[12, 19, 29]])
print("Sum of array a and b\n",a+b)
print("Product of array a and b\n",a*b)
print("Division of array a and b\n",a/b)
• Array Concatenation
• NumPy provides vertical stacking (vstack) and
horizontal stacking (hstack), which allow
us to concatenate two multi-dimensional
arrays vertically or horizontally.
import numpy as np
a = np.array([[1,2,30],[10,15,4]])
b = np.array([[1,2,3],[12, 19, 29]])
print("Arrays vertically concatenated\n",np.vstack((a,b)))
print("Arrays horizontally concatenated\n",np.hstack((a,b)))
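• A brief sketch of how stacking changes the shape, and how the same results can be obtained with np.concatenate and an explicit axis:

```python
import numpy as np

a = np.array([[1, 2, 30], [10, 15, 4]])
b = np.array([[1, 2, 3], [12, 19, 29]])

v = np.vstack((a, b))   # rows of b are placed below rows of a
h = np.hstack((a, b))   # columns of b are placed to the right of a

print(v.shape)
print(h.shape)

# The same results using concatenate with an explicit axis.
same_v = np.array_equal(v, np.concatenate((a, b), axis=0))
same_h = np.array_equal(h, np.concatenate((a, b), axis=1))
print(same_v, same_h)
```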
• NumPy Array Axis
• In a two-dimensional NumPy array, axis 0 runs down the rows
and axis 1 runs across the columns. A calculation along axis 0
therefore gives one result per column, and a calculation along
axis 1 gives one result per row.
• We can mention the axis to perform row-level or
column-level calculations like the addition of row or
column elements.
• The following example calculates the maximum element of
each column, the minimum element of
each row, and the sum of the elements
in each row:
import numpy as np
a = np.array([[1,2,30],[10,15,4]])
print("The array:",a)
print("The maximum elements of columns:",a.max(axis = 0))
print("The minimum element of rows",a.min(axis = 1))
print("The sum of all rows",a.sum(axis = 1))
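• The effect of the axis argument is easiest to see from the results themselves. A small sketch using the same array:

```python
import numpy as np

a = np.array([[1, 2, 30], [10, 15, 4]])   # 2 rows, 3 columns

col_max = a.max(axis=0)   # one value per column
row_min = a.min(axis=1)   # one value per row
row_sum = a.sum(axis=1)   # one value per row
col_sum = a.sum(axis=0)   # one value per column

print(col_max)
print(row_min)
print(row_sum)
print(col_sum)
```

axis = 0 gives three results (one per column) and axis = 1 gives two results (one per row).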
• NumPy Broadcasting
• In mathematical operations, we may need to
combine arrays of different shapes.
NumPy can perform such operations when
the shapes are compatible.
• Arrays of the same shape:
import numpy as np
a = np.array([1,2,3,4,5,6,7])
b = np.array([2,4,6,8,10,12,14])
c = a*b
print(c)
Output:
[ 2 8 18 32 50 72 98]
• Arrays of incompatible shapes:
import numpy as np
a = np.array([1,2,3,4,5,6,7])
b = np.array([2,4,6,8,10,12,14,19])
c = a*b
print(c)
Output:
ValueError: operands could not be broadcast together with shapes (7,) (8,)
import numpy as np
a = np.array([[1,2,3,4],[2,4,5,6],[10,20,39,3]])
b = np.array([2,4,6,8])
print("\nprinting array a..")
print(a)
print("\nprinting array b..")
print(b)
print("\nAdding arrays a and b ..")
c = a + b;
print(c)
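• A sketch of which shapes broadcast together. The column vector col and the failing array are hypothetical values added for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 4, 5, 6], [10, 20, 39, 3]])  # shape (3, 4)
b = np.array([2, 4, 6, 8])                                    # shape (4,)
row_sum = a + b       # b is added to every row of a

col = np.array([[100], [200], [300]])                         # shape (3, 1)
col_sum = a + col     # col is added to every column of a

# A length-3 array does not match the last dimension of a (4),
# so this addition cannot be broadcast.
try:
    a + np.array([1, 2, 3])
    failed = False
except ValueError:
    failed = True

print(row_sum)
print(col_sum)
print(failed)
```

Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or one of them is 1.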
• NumPy Array Iteration
import numpy as np
a = np.array([[1,2,3,4],[2,4,5,6],[10,20,39,3]])
print("Printing array:")
print(a);
print("Iterating over the array:")
for x in np.nditer(a):
    print(x,end=' ')
• Order of Iteration
• There are two ways of storing values in NumPy arrays:
1. F-style order (column-major, as in Fortran)
2. C-style order (row-major, as in C)
import numpy as np
a = np.array([[1,2,3,4],[2,4,5,6],[10,20,39,3]])
print("\nPrinting the array:\n")
print(a)
print("\nPrinting the transpose of the array:\n")
at = a.T
print(at)
print("\nIterating over the transposed array\n")
for x in np.nditer(at):
    print(x, end=' ')
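• The order of iteration can also be requested explicitly through the order argument of np.nditer. A minimal sketch (the list names are chosen here for illustration):

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [2, 4, 5, 6], [10, 20, 39, 3]])

# C-style: visit the elements row by row.
c_order = [int(x) for x in np.nditer(a, order='C')]

# F-style: visit the elements column by column.
f_order = [int(x) for x in np.nditer(a, order='F')]

# With the default order, nditer follows the memory layout, so the
# transpose is visited in the same sequence as the original array.
t_order = [int(x) for x in np.nditer(a.T)]

print(c_order)
print(f_order)
print(t_order)
```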
• NumPy Mathematical Functions
import numpy as np
arr = np.array([0, 30, 60, 90, 120, 150, 180])
print("\nThe sin value of the angles",end = " ")
print(np.sin(arr * np.pi/180))
print("\nThe cosine value of the angles",end = " ")
print(np.cos(arr * np.pi/180))
print("\nThe tangent value of the angles",end = " ")
print(np.tan(arr * np.pi/180))
import numpy as np
arr = np.array([12.202, 90.23120, 123.020, 23.202])
print("printing the original array values:",end = " ")
print(arr)
print("Array values rounded off to 2 decimal position",np.around(arr, 2))
print("Array values rounded off to -1 decimal position",np.around(arr, -1))
• The numpy.floor() function
import numpy as np
arr = np.array([12.202, 90.23120, 123.020, 23.202])
print(np.floor(arr))
• The numpy.ceil() function
import numpy as np
arr = np.array([12.202, 90.23120, 123.020, 23.202])
print(np.ceil(arr))
• Numpy statistical functions
• import numpy as np
a = np.array([[1,2,3],[4,5,6],[7,8,9]])
print("Array:\n",a)
print("\nMedian of array along axis 0:",np.median(a,0))
print("Mean of array along axis 0:",np.mean(a,0))
print("Average of array along axis 1:",np.average(a,1))
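• np.mean and np.average give the same result unless weights are supplied. A short sketch; the weights are hypothetical.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_median = np.median(a, 0)   # median of each column
col_mean = np.mean(a, 0)       # mean of each column
row_avg = np.average(a, 1)     # plain average of each row

# Weighted average of each row: the last column counts twice.
row_weighted = np.average(a, axis=1, weights=[1, 1, 2])

print(col_median, col_mean, row_avg)
print(row_weighted)
```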
• What Can Pandas Do?
1. Is there a correlation between two or more columns?
2. What is the average value?
3. What is the max value?
4. What is the min value?
5. Pandas can also delete rows that are not relevant or that
contain wrong values, such as empty or NULL values. This is
called cleaning the data.
• C:\Users\Your Name>pip install pandas
• import pandas
• import pandas as pd
• import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
• Python Pandas Series
• The Pandas Series can be defined as a one-dimensional array that
is capable of storing various data types.
• We can easily convert a list, tuple, or dictionary into a Series
using the Series() constructor.
• The row labels of series are called the index. A Series cannot
contain multiple columns.
• import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
• import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
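• When a Series is built from a dictionary, the keys become the index labels, and values can be read by label. A brief sketch using the same data (the name subset is chosen here for illustration):

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)

# Access a single value through its label.
print(myvar["day2"])

# Passing index keeps only the listed keys of the dictionary.
subset = pd.Series(calories, index=["day1", "day2"])
print(subset)
```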
• DataFrame
• A Pandas DataFrame is a 2 dimensional data structure, like a 2
dimensional array, or a table with rows and columns.
• import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
• Locate Row
• As you can see from the result above, the DataFrame is like a table
with rows and columns.
• Pandas uses the loc attribute to return one or more specified row(s).
• print(df.loc[0])
• print(df.loc[[0, 1]])
• df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
• print(df.loc["day2"])
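• The type of the result of loc depends on what is passed in. A small sketch with the same data:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

one_row = df.loc["day2"]               # a single label returns a Series
two_rows = df.loc[["day1", "day2"]]    # a list of labels returns a DataFrame
one_cell = df.loc["day2", "calories"]  # row label and column label return one value

print(type(one_row).__name__)
print(type(two_rows).__name__)
print(one_cell)
```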
• Pandas Read CSV
• A simple way to store big data sets is to use CSV files (comma
separated values).
• CSV files contain plain text in a well-known format that can
be read by almost any tool, including Pandas.
• import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())
• print(pd.options.display.max_rows)
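• The file data.csv is not included here, so the following sketch reads a few hypothetical rows from an in-memory string instead. read_csv accepts a file-like object as well as a file name.

```python
import io
import pandas as pd

# Hypothetical CSV content; the last row has an empty Calories cell.
csv_text = "Duration,Calories\n60,409.1\n45,282.4\n30,\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df.to_string())                # prints every row
print(pd.options.display.max_rows)   # row limit used when printing df directly
```

The empty cell is read as a missing value (NaN).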
• Read JSON
• Big data sets are often stored, or extracted as JSON.
• JSON is plain text, but has the format of an object, and is well
known in the world of programming, including Pandas.
• import pandas as pd
df = pd.read_json('data.json')
print(df.to_string())
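• As data.json is not included here, this sketch reads a small hypothetical JSON document from an in-memory string. The layout (column name, then row index, then value) is an assumption about how such a file might look.

```python
import io
import pandas as pd

# Hypothetical JSON content: {column: {row index: value}}.
json_text = '{"Duration": {"0": 60, "1": 45}, "Calories": {"0": 409.1, "1": 282.4}}'

df = pd.read_json(io.StringIO(json_text))
print(df.to_string())
```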
• Pandas - Cleaning Data
• Data Cleaning
• Data cleaning means fixing bad data in your data set.
• Bad data could be:
• Empty cells
• Data in wrong format
• Wrong data
• Duplicates
• Empty Cells
• Empty cells can potentially give you a wrong result when you
analyze data.
• Remove Rows
• One way to deal with empty cells is to remove rows that contain
empty cells.
• This is usually OK, since data sets can be very big, and removing a
few rows will not have a big impact on the result.
• import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())
• The dropna() method returns a new DataFrame, and will not change
the original.
• dropna(inplace = True) will NOT return a new DataFrame, but it
will remove all rows containing NULL values from the original
DataFrame.
• df.dropna(inplace = True)
print(df.to_string())
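• The difference between the two forms of dropna() can be checked on a small hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical data with one empty cell.
df = pd.DataFrame({
    "Duration": [60, 45, 30],
    "Calories": [409.1, float("nan"), 240.0]
})

new_df = df.dropna()          # returns a cleaned copy
rows_before = len(df)         # the original is unchanged at this point

result = df.dropna(inplace=True)   # changes df itself and returns None
rows_after = len(df)

print(len(new_df), rows_before, rows_after, result)
```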
• Replace Empty Values
• Another way of dealing with empty cells is to insert a new value
instead.
• This way you do not have to delete entire rows just because of
some empty cells.
• The fillna() method allows us to replace empty cells with a value:
• import pandas as pd
df = pd.read_csv('data.csv')
df.fillna(130, inplace = True)
• Replace Only For Specified Columns
• The example above replaces all empty cells in the whole
DataFrame.
• To replace empty values in one column only, specify the column
name:
• import pandas as pd
df = pd.read_csv('data.csv')
# Passing a dictionary limits the replacement to the named column
df.fillna({"Calories": 130}, inplace = True)
• Replace Using Mean, Median, or Mode
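• import pandas as pd
df = pd.read_csv('data.csv')
# Choose ONE of the following replacement values:
x = df["Calories"].mean()        # the average value
# x = df["Calories"].median()    # the middle value
# x = df["Calories"].mode()[0]   # the most frequent value
df.fillna({"Calories": x}, inplace = True)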
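• A self-contained sketch of filling empty cells with the column mean. The sample values are hypothetical; only the column name "Calories" comes from the examples above.

```python
import pandas as pd

# Hypothetical data with one empty cell.
df = pd.DataFrame({"Calories": [100.0, float("nan"), 300.0]})

# mean() ignores the empty cell, so x is the mean of 100 and 300.
x = df["Calories"].mean()

# Assigning the result back to the column replaces the empty cell.
df["Calories"] = df["Calories"].fillna(x)

print(x)
print(df)
```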
• Pandas - Fixing Wrong Data
• In our example, it is most likely a typo, and the value should
be "45" instead of "450", and we could just insert "45" in row
7:
• df.loc[7, 'Duration'] = 45
• Removing Duplicates
• print(df.duplicated())
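• duplicated() only reports the duplicates. To remove them, drop_duplicates() can be used. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data: the second row repeats the first.
df = pd.DataFrame({
    "Duration": [60, 60, 45],
    "Pulse": [110, 110, 117]
})

flags = df.duplicated()        # True for every row that repeats an earlier row
deduped = df.drop_duplicates() # returns a new DataFrame without the repeats

print(flags.tolist())
print(deduped)
```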
• Pandas - Data Correlations
• A great aspect of the Pandas module is the corr() method.
• The corr() method calculates the correlation between each
pair of columns in your data set.
• df.corr()
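• To show how the result of corr() reads, this sketch uses hypothetical columns that are exactly linear in each other. "Rest" is an invented column name.

```python
import pandas as pd

# Hypothetical data: Calories rises with Duration, Rest falls with it.
df = pd.DataFrame({
    "Duration": [30, 45, 60, 90],
    "Calories": [150, 225, 300, 450],
    "Rest": [9, 7, 5, 1]
})

corr = df.corr()
print(corr)
```

A value close to 1 means the two columns rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.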
• Pandas - Plotting
• import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot()
plt.show()
• Scatter Plot
• Specify that you want a scatter plot with the kind argument:
• kind = 'scatter'
• A scatter plot needs an x- and a y-axis.
• In the example below we will use "Duration" for the x-axis and
"Calories" for the y-axis.
• Include the x and y arguments like this:
• x = 'Duration', y = 'Calories'
• df.plot(kind = 'scatter', x = 'Duration', y = 'Calories')
plt.show()
• Histogram
• Use the kind argument to specify that you want a histogram:
• kind = 'hist'
• A histogram needs only one column.
• A histogram shows us the frequency of each interval, e.g. how
many workouts lasted between 50 and 60 minutes?
• In the example below we will use the "Duration" column to
create the histogram:
• df["Duration"].plot(kind = 'hist')
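• The frequencies that a histogram displays can also be counted directly. This sketch uses np.histogram on hypothetical durations with hand-picked 10-minute intervals; it is not what the plot call above runs internally.

```python
import numpy as np
import pandas as pd

# Hypothetical workout durations in minutes.
durations = pd.Series([45, 50, 55, 60, 60, 65, 75])

# Interval edges: 40-50, 50-60, 60-70, 70-80.
counts, edges = np.histogram(durations, bins=[40, 50, 60, 70, 80])

print(counts)
print(edges)
```

Each interval includes its left edge and excludes its right edge, except the last one, which includes both.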
• Matplotlib (Python Plotting Library)
• The human mind grasps a visual representation of data more
readily than textual data.
• Artist: An Artist is everything we see on the graph, such as Text
objects, Line2D objects, and collection objects. Most Artists
are tied to an Axes.
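• A minimal sketch of the Artist idea: the objects returned by plotting calls are Artists attached to an Axes. It assumes the non-interactive "Agg" backend so that no window is opened; the title text is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, nothing is displayed
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.lines import Line2D

fig, ax = plt.subplots()

# plot returns a list of Line2D objects; here there is exactly one.
(line,) = ax.plot([1, 2, 3], [4, 5, 1])

# set_title returns the Text object that holds the title.
title = ax.set_title("Sample plot")

line_is_artist = isinstance(line, Line2D) and isinstance(line, Artist)
title_is_artist = isinstance(title, Artist)
tied_to_axes = line.axes is ax

print(line_is_artist, title_is_artist, tied_to_axes)
plt.close(fig)
```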
• Basic Example of plotting Graph
• from matplotlib import pyplot as plt
• plt.plot([1,2,3],[4,5,1])
• plt.show()