Unit 7 Python Libraries For Data Science
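import numpy as np
# 2x2 float array holding 0..3, matching the output below
arr1 = np.arange(4, dtype=float).reshape(2, 2)
print('First array:')
print(arr1)
arr2 = np.array([12, 12])
print('\nSecond array:')
print(arr2)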
Output:
First array:
[[ 0.  1.]
 [ 2.  3.]]

Second array:
[12 12]

Adding the two arrays:
[[ 12.  13.]
 [ 14.  15.]]

Subtracting the two arrays:
[[-12. -11.]
 [-10.  -9.]]

Multiplying the two arrays:
[[  0.  12.]
 [ 24.  36.]]

Dividing the two arrays:
[[ 0.          0.08333333]
 [ 0.16666667  0.25      ]]
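The statements that produced the four arithmetic results above are not shown. A minimal sketch that would reproduce them, assuming `arr1` is the 2x2 float array printed above and that the element-wise functions `np.add()`, `np.subtract()`, `np.multiply()`, and `np.divide()` were used:

```python
import numpy as np

# Assumption: arr1 is rebuilt here to match the printed 2x2 float array.
arr1 = np.arange(4, dtype=float).reshape(2, 2)
arr2 = np.array([12, 12])

# arr2 has shape (2,), so NumPy broadcasts it across each row of arr1.
added = np.add(arr1, arr2)
subtracted = np.subtract(arr1, arr2)
multiplied = np.multiply(arr1, arr2)
divided = np.divide(arr1, arr2)

print('Adding the two arrays:')
print(added)
print('\nSubtracting the two arrays:')
print(subtracted)
print('\nMultiplying the two arrays:')
print(multiplied)
print('\nDividing the two arrays:')
print(divided)
```

The operators `+`, `-`, `*`, and `/` give the same element-wise results on NumPy arrays.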
numpy.reciprocal() This function returns the reciprocal of its argument, element-wise. For integer arrays, elements with absolute values larger than 1 always give 0, because the result is truncated to an integer, and for integer 0 a warning is issued. Float arrays return the true reciprocal. Example:
Output
Our array is:
[ 25. 1.33 1. 1. 100. ]
After applying reciprocal function:
[ 0.04 0.7518797 1. 1. 0.01 ]
The second array is:
[25]
After applying reciprocal function:
[0]
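The code for this example is missing. The sketch below is one way to reproduce the output shown, assuming the first array is a float array and the second an integer array (the variable names `a` and `b` are illustrative):

```python
import numpy as np

# Float array: the value 1.33 makes NumPy choose a float dtype.
a = np.array([25, 1.33, 1, 1, 100])
print('Our array is:')
print(a)
recip_a = np.reciprocal(a)
print('After applying reciprocal function:')
print(recip_a)

# Integer array: the reciprocal is truncated to an integer, so 1/25 becomes 0.
b = np.array([25], dtype=int)
print('The second array is:')
print(b)
recip_b = np.reciprocal(b)
print('After applying reciprocal function:')
print(recip_b)
```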
numpy.power() This function treats elements in the first input array as the
base and returns it raised to the power of the corresponding element in the
second input array.
Output:
First array is:
[ 5 10 15]
Applying power function:
[ 25 100 225]
Second array is:
[1 2 3]
Applying power function again:
[ 5 100 3375]
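The code for this example is not shown. A minimal sketch consistent with the output above, assuming the first call raises every element to the power 2 and the second uses the array `[1, 2, 3]` as exponents:

```python
import numpy as np

a = np.array([5, 10, 15])
print('First array is:')
print(a)

# A scalar exponent is applied to every element.
squared = np.power(a, 2)
print('Applying power function:')
print(squared)

b = np.array([1, 2, 3])
print('Second array is:')
print(b)

# An array exponent is paired element by element: 5**1, 10**2, 15**3.
powered = np.power(a, b)
print('Applying power function again:')
print(powered)
```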
numpy.mod() This function returns the remainder of division of the
corresponding elements in the two input arrays. The function numpy.remainder()
produces the same result.
Output:
First array:
[ 5 15 20]
Second array:
[2 5 9]
Applying mod() function:
[1 0 2]
Applying remainder() function:
[1 0 2]
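The code for this example is not shown. A short sketch that would give the output above (the variable names are illustrative):

```python
import numpy as np

a = np.array([5, 15, 20])
b = np.array([2, 5, 9])
print('First array:')
print(a)
print('Second array:')
print(b)

# Element-wise remainders: 5 % 2, 15 % 5, 20 % 9
mod_result = np.mod(a, b)
print('Applying mod() function:')
print(mod_result)

rem_result = np.remainder(a, b)
print('Applying remainder() function:')
print(rem_result)
```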
7.1.3 N-dimensional Array Processing
NumPy is mainly used for working with n-dimensional arrays. NumPy arrays are
homogeneous, meaning all elements must be of the same data type. They can
have any number of dimensions, but the most commonly used are 1D, 2D, and 3D
arrays.
1. 1D arrays: These are also known as vectors and are created using the
`np.array()` function.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
Output:
[1 2 3 4 5]
2. 2D arrays: These are also known as matrices and are created using the
`np.array()` function with multiple nested lists.
Example:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
Output:
[[1 2 3]
 [4 5 6]]
3. 3D arrays: These are created using the `np.array()` function with lists
nested three levels deep (a list of 2D arrays).
Example:
import numpy as np
arr = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(arr)
Output:
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
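To check how many dimensions an array has, the `ndim`, `shape`, and `dtype` attributes can be inspected. The sketch below reuses the three example arrays (the variable names are illustrative) and also shows the homogeneity rule: mixing an integer and a float gives a single float type.

```python
import numpy as np

vector = np.array([1, 2, 3, 4, 5])
matrix = np.array([[1, 2, 3], [4, 5, 6]])
cube = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

# ndim is the number of dimensions; shape gives the size along each one.
for name, arr in [('vector', vector), ('matrix', matrix), ('cube', cube)]:
    print(name, 'ndim =', arr.ndim, 'shape =', arr.shape)

# NumPy upcasts mixed inputs so that every element shares one data type.
mixed = np.array([1, 2.5])
print(mixed.dtype)
```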
➢ NumPy also provides a range of functions to create n-dimensional arrays,
such as `np.zeros()`, `np.ones()`, `np.eye()`, `np.random.random()`, and
`np.empty()`.
❖ Example:
import numpy as np
arr1 = np.zeros((2, 3, 4))
arr2 = np.ones((2, 3))
arr3 = np.eye(5)
arr4 = np.random.random((2, 3))
print(arr1)
print(arr2)
print(arr3)
print(arr4)
Output:
[[[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]

 [[0. 0. 0. 0.]
  [0. 0. 0. 0.]
  [0. 0. 0. 0.]]]
[[1. 1. 1.]
 [1. 1. 1.]]
[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
[[0.43407942 0.37427243 0.46211803]
 [0.84423743 0.80177559 0.23460201]]
Pandas can also delete rows that are not relevant or that contain wrong
values, such as empty or NULL values. This is called cleaning the data.
Series:
A Series is a one-dimensional labeled array capable of holding data of any type.
It can be created using a list or array, and it contains both the data and index
labels. The index can be customized to make it easier to work with the data.
Example:
import pandas as pd
# Creating a simple Series
s = pd.Series([1, 2, 3, 4, 5])
# Using a custom index
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
print(s)
Output:
a    1
b    2
c    3
d    4
e    5
dtype: int64
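A custom index makes it possible to look up values by label. A brief sketch, reusing the Series from the example above:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Look up a single value by its label.
value_c = s['c']
print(value_c)

# Select several labels at once; the result is again a Series.
subset = s[['a', 'e']]
print(subset)
```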
DataFrame:
❖ A DataFrame is a two-dimensional table in which the columns can have
different types. It can be thought of as a dictionary of Series objects
where each Series represents a column. It can be created using lists,
dictionaries, or other DataFrame objects. It also contains both the data
and index labels.
Example:
import pandas as pd
# Creating a simple DataFrame using a dictionary
data = {'name': ['John', 'Jane', 'James', 'Emily'],
        'age': [30, 25, 35, 28]}
df = pd.DataFrame(data)
print(df)
Output:
    name  age
0   John   30
1   Jane   25
2  James   35
3  Emily   28
Pandas provides many built-in functions and methods to work with these
data structures, including but not limited to:
- Importing and Exporting: Pandas supports reading data from and writing data
to many different file formats including CSV, Excel, JSON, SQL databases and
more.
- Selection and Indexing: Pandas supports advanced data selection and
indexing functionality, including Boolean indexing, label-based indexing, and
more.
- Data cleaning and transformation: DataFrames can be manipulated using
built-in or custom functions, and missing data can be addressed using
interpolation or deletion.
- Aggregation and Grouping: Pandas supports aggregation and grouping
functionality including groupby, pivot tables, and cross-tabulation.
Pandas is a powerful tool that makes data analysis tasks easier and more
efficient.
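The features listed above can be sketched on the small DataFrame from the earlier example. The `city` column and its values are hypothetical additions so that there is something to group by, and the CSV round trip uses an in-memory string instead of a file.

```python
import io
import pandas as pd

df = pd.DataFrame({'name': ['John', 'Jane', 'James', 'Emily'],
                   'age': [30, 25, 35, 28],
                   'city': ['Pune', 'Delhi', 'Pune', 'Delhi']})  # hypothetical column

# Selection: Boolean indexing keeps the rows where the condition is True.
older = df[df['age'] > 27]

# Selection: label-based indexing with .loc (row label 0, column 'name').
first_name = df.loc[0, 'name']

# Aggregation and grouping: mean age per city.
mean_age = df.groupby('city')['age'].mean()

# Exporting and importing: write to CSV text, then read it back.
csv_text = df.to_csv(index=False)
restored = pd.read_csv(io.StringIO(csv_text))

print(older)
print(first_name)
print(mean_age)
print(restored)
```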
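import matplotlib.pyplot as plt
x = [1, 2, 3]                # x axis values
y = [2, 4, 1]                # corresponding y axis values
plt.plot(x, y)               # plot the points as a line
plt.xlabel('x - axis')       # name the x-axis (label text is illustrative)
plt.ylabel('y - axis')       # name the y-axis (label text is illustrative)
plt.title('My first graph')  # plot title (text is illustrative)
plt.show()                   # display the plot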
Output:
The code is largely self-explanatory. It follows these steps:
• Define the x-axis and corresponding y-axis values as lists.
• Plot them on the canvas using the .plot() function.
• Give a name to the x-axis and y-axis using the .xlabel() and .ylabel() functions.
• Give a title to the plot using the .title() function.
• Finally, view the plot using the .show() function.
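import matplotlib.pyplot as plt

# line 1 points
x1 = [1, 2, 3]
y1 = [2, 4, 1]
# plotting the line 1 points
plt.plot(x1, y1, label="line 1")

# line 2 points
x2 = [1, 2, 3]
y2 = [4, 1, 3]
# plotting the line 2 points
plt.plot(x2, y2, label="line 2")

# show a legend built from the labels, then display the plot
plt.legend()
plt.show()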
Output:
• Here, we plot two lines on the same graph. We differentiate between
them by giving each a name (label), which is passed as an argument to
the .plot() function.
• The small rectangular box giving information about the type of line and
its color is called a legend. We can add a legend to our plot
using the .legend() function.
Customization of Plots
Here, we discuss some elementary customizations applicable to almost any
plot.
# x axis values
x = [1,2,3,4,5,6]
# corresponding y axis values
y = [2,4,1,5,2,6]
Output:
As you can see, we have done several customizations like
• setting the line-width, line-style, line-color.
• setting the marker, marker’s face color, marker’s size.
• overriding the x and y-axis range. If overriding is not done, pyplot
module uses the auto-scale feature to set the axis range and scale.
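Only the data lists of this example are shown. The sketch below is one possible version of the customized plot. All style values (colors, widths, marker, and axis limits) are illustrative assumptions, not the original settings.

```python
import matplotlib.pyplot as plt

# x axis values
x = [1, 2, 3, 4, 5, 6]
# corresponding y axis values
y = [2, 4, 1, 5, 2, 6]

# Line and marker styling; every value here is an illustrative choice.
(line,) = plt.plot(x, y, color='green', linestyle='dashed', linewidth=3,
                   marker='o', markerfacecolor='blue', markersize=12)

# Override the automatic axis ranges.
plt.xlim(1, 8)
plt.ylim(1, 8)

plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('Some customizations')
plt.show()
```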
Bar Chart
# heights of bars
height = [10, 24, 36, 40, 5]
Output :
• Here, we use the plt.bar() function to plot a bar chart.
• The x-coordinates of the bars (their centers, by default) are passed along
with the heights of the bars.
• You can also give names to the x-axis coordinates by passing a list of
names as the tick_label argument.
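Only the `height` list of this example is shown. A minimal sketch of the full bar chart follows. The x positions in `left`, the names in `tick_label`, and the styling are assumptions made for illustration.

```python
import matplotlib.pyplot as plt

# x-coordinates of the bars (assumed positions)
left = [1, 2, 3, 4, 5]
# heights of bars
height = [10, 24, 36, 40, 5]
# names shown under the bars (hypothetical)
tick_label = ['one', 'two', 'three', 'four', 'five']

# plt.bar() returns a container holding one rectangle per bar.
bars = plt.bar(left, height, tick_label=tick_label, width=0.8, color='green')

plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('My bar chart')
plt.show()
```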
Histogram
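import matplotlib.pyplot as plt

# ages of the people (raw observations, not frequencies)
ages = [2,5,70,40,30,45,50,45,43,40,44,
        60,7,13,57,18,90,77,32,21,20,40]
# setting the range and no. of intervals
# (named value_range so the built-in range() is not shadowed)
value_range = (0, 100)
bins = 10
# plotting a histogram
plt.hist(ages, bins=bins, range=value_range, color='green',
         histtype='bar', rwidth=0.8)
# x-axis label
plt.xlabel('age')
# frequency label
plt.ylabel('No. of people')
# plot title
plt.title('My histogram')
# function to show the plot
plt.show()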
Output:
• Here, we use the plt.hist() function to plot a histogram.
• The raw observations are passed as the ages list; plt.hist() counts the
frequencies itself.
• The range is set by defining a tuple containing the min and max
values.
• The next step is to “bin” the range of values (that is, divide the entire
range of values into a series of intervals) and then count how many
values fall into each interval. Here we have defined bins = 10, so the
range 0 to 100 is split into 10 intervals, each of width 100/10 = 10.
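The binning step can be checked without drawing anything. The sketch below uses `np.histogram()`, which performs the same counting as `plt.hist()`, on the ages list from the example.

```python
import numpy as np

ages = [2, 5, 70, 40, 30, 45, 50, 45, 43, 40, 44,
        60, 7, 13, 57, 18, 90, 77, 32, 21, 20, 40]

# counts[i] is the number of ages that fall between edges[i] and edges[i + 1].
counts, edges = np.histogram(ages, bins=10, range=(0, 100))

print(edges)   # the 11 boundaries of the 10 intervals
print(counts)  # how many ages fall into each interval
```

Each interval includes its left edge and excludes its right edge, except the last one, which includes both.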
Scatter plot
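import matplotlib.pyplot as plt

# x-axis values
x = [1,2,3,4,5,6,7,8,9,10]
# y-axis values
y = [2,4,5,7,6,8,9,11,12,12]
# plotting points as a scatter plot
# (the label, marker, and size values here are illustrative)
plt.scatter(x, y, label="points", marker="*", s=30)
# x-axis label
plt.xlabel('x - axis')
# y-axis label
plt.ylabel('y - axis')
# plot title
plt.title('My scatter plot!')
# showing legend
plt.legend()
# function to show the plot
plt.show()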
Output:
• Here, we use the plt.scatter() function to plot a scatter plot.
• As with a line plot, we define the x and corresponding y-axis values here as well.
• The marker argument is used to set the character to use as a marker. Its size
can be defined using the s parameter.
Pie-chart
# defining labels
activities = ['eat', 'sleep', 'work', 'play']
# plotting legend
plt.legend()
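Only the labels and the legend call of this example are shown. A minimal sketch of a complete pie chart follows. The `slices` values are hypothetical, and the sketch uses the Axes object returned by `plt.subplots()` rather than the plain `plt` calls used elsewhere in this unit.

```python
import matplotlib.pyplot as plt

# defining labels
activities = ['eat', 'sleep', 'work', 'play']
# portion covered by each label (hypothetical values)
slices = [3, 7, 8, 6]

fig, ax = plt.subplots()
# Each value becomes one wedge, sized in proportion to the total.
ax.pie(slices, labels=activities)
# plotting legend
ax.legend()
plt.show()
```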
The output of the above program looks like this:
• To set the x-axis values, we use the np.arange() method in which the first
two arguments are for range and the third one for step-wise increment.
The result is a NumPy array.
• To get corresponding y-axis values, we simply use the
predefined np.sin() method on the NumPy array.
• Finally, we plot the points by passing x and y arrays to
the plt.plot() function.
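The program these points describe is not shown. The steps can be sketched as follows, assuming the curve is plotted from 0 to 2π in steps of 0.1 (the original range and step are not given):

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed range and step: from 0 up to (but not including) 2*pi, step 0.1.
x = np.arange(0, 2 * np.pi, 0.1)
# np.sin() is applied to every element of the array at once.
y = np.sin(x)

# Plot the points as a curve.
plt.plot(x, y)
plt.show()
```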
So, in this part, we discussed various types of plots we can create in Matplotlib.
There are more plots that haven’t been covered, but the most significant ones
are discussed here.
2. Changing plot colors, line styles, and marker styles: You can use the color,
linestyle, and marker parameters in the plot() function to change the color, line
style, and marker style of the plot, respectively. Here's an example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]
plt.plot(x, y, color='red', linestyle='--', marker='o')
plt.show()
This changes the color to red, the line style to dashed, and the marker style to circles.
3. Changing plot size and resolution: You can use the figure() function to change
the size and resolution of the plot. Here's an example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]
fig = plt.figure(figsize=(8, 6), dpi=100)
plt.plot(x, y)
plt.show()
4. Adding grid lines: You can use the grid() function to add grid lines to the plot.
Here's an example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [10, 8, 6, 4, 2]
plt.plot(x, y)
plt.grid(True)
plt.show()
This adds grid lines to the plot.
These are just a few examples of the many customization options available in
Matplotlib. By exploring the documentation, you can discover many more ways
to customize your plots.