22am901 Data Science Using Python Unit 2
Please read this disclaimer before proceeding:
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.
22AM901
DATA SCIENCE USING PYTHON
UNIT II
Department : Artificial Intelligence and
Machine learning
Created by : Dr C Ambhika
Date : 18.01.2024
1. Table of Contents
S NO  CONTENTS
1     CONTENTS
2     COURSE OBJECTIVES
3     PRE REQUISITES (COURSE NAMES WITH CODE)
4     SYLLABUS (WITH SUBJECT CODE, NAME, LTPC DETAILS)
5     COURSE OUTCOMES
6     CO-PO/PSO MAPPING
7     LECTURE PLAN – UNIT 2
8     ACTIVITY BASED LEARNING – UNIT 2
10    ASSIGNMENTS 1 – UNIT 2
12    PART B Q s
13    PART C Q s
16    ASSESSMENT SCHEDULE
17    PRESCRIBED TEXT BOOKS & REFERENCE BOOKS
2. COURSE OBJECTIVES
3. PREREQUISITE
4. SYLLABUS
UNIT I INTRODUCTION
Need for data science – benefits and uses of Data Science and Big Data – facets of data – data
science process – setting the research goal – retrieving data – cleansing, integrating, and
transforming data – exploratory data analysis – build the models – presenting and building
applications
List of Exercise/Experiments:
1. Download, install and explore the features of R/Python for data analytics
• Installing Anaconda
• Basic Operations in Jupyter Notebook
• Basic Data Handling
List of Exercise/Experiments:
Data manipulation with Pandas – Data Indexing and Selection – Handling missing data –
Hierarchical indexing – Combining datasets – Aggregation and Grouping – String operations –
Working with time series – High performance Pandas.
List of Exercise/Experiments:
1. Perform the fundamental Pandas data structure operations: the Series, DataFrame, and Index.
2. Implement the data selection operations.
3. Implement the data indexing operations such as loc, iloc, and ix.
4. From the given sample data set, perform the operations for handling missing data such as None and NaN.
5. Work with the operations on null values (isnull(), notnull(), dropna(), fillna()).
Importing Data into Excel from Different Data Source – Data Cleansing and Preliminary
Data Analysis - Correlations and the importance of Variables Technical requirements -
Implementing Time Series
List of Exercise/Experiments:
Importing Matplotlib – Simple line plots – Simple scatter plots – visualizing errors – density and
contour plots – Histograms – legends – colors – subplots – text and annotation – customization –
three dimensional plotting - Geographic Data with Basemap - Visualization with Seaborn.
List of Exercise/Experiments:
5. COURSE OUTCOMES
6. CO-PO MAPPING
COs   PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2  PSO3
CO1   3    2    2    -    1    -    -    -    -    -     -     -     -     3     3
CO2   3    3    3    -    3    -    -    -    -    -     -     -     -     3     3
CO3   3    3    3    -    3    -    -    -    -    -     -     -     -     3     3
CO4   3    3    3    -    3    -    -    -    -    -     -     -     -     3     3
CO5   3    3    3    -    3    -    -    -    -    -     -     -     -     3     3
1 – Low, 2 – Medium, 3 – Strong
7. LECTURE PLAN
S No  Topics           No of periods  Proposed date  Actual lecture date  Pertaining CO  Taxonomy level  Mode of delivery
5     Boolean Logic    1                                                  CO2            K3              MD1, MD5
7     Sorting Arrays   1                                                  CO2            K3              MD1, MD5
8     Structured Data  1                                                  CO2            K3              MD1, MD5
ASSESSMENT COMPONENTS
AC 1. Unit Test
AC 2. Assignment
AC 3. Course Seminar
AC 4. Course Quiz
AC 5. Case Study
AC 6. Record Work
AC 7. Lab / Mini Project
AC 8. Lab Model Exam
AC 9. Project Review
MODE OF DELIVERY
MD 1. Oral presentation
MD 2. Tutorial
MD 3. Seminar
MD 4. Hands On
MD 5. Videos
MD 6. Field Visit
8. ACTIVITY BASED LEARNING
Activity name:
Creating and Automating an Interactive Dashboard using Python
Students will gain a better understanding of how the Python libraries and other features of Python work with any dataset.
9. LECTURE NOTES
UNIT 2
1. Introduction to NumPy
1.1 NumPy (short for Numerical Python)
NumPy provides an efficient interface to store and operate on dense data buffers. NumPy arrays are faster and more compact than Python lists, and NumPy offers a wide range of fast and efficient ways of creating arrays and manipulating the numerical data inside them. The advantage in storage and speed grows as the arrays grow larger. NumPy also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
NumPy arrays are stored in one contiguous block of memory, unlike lists, so processes can access and manipulate them very efficiently. This property is called locality of reference in computer science, and it is a main reason why NumPy is faster than lists. NumPy is also optimized to work with the latest CPU architectures.
An array consumes less memory and is convenient to use. NumPy uses much less memory to store data, and it provides a mechanism for specifying the data types, which allows the code to be optimized even further.
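Users of Python are often drawn in by its ease of use, one piece of which is dynamic typing. While a statically typed language like C or Java requires each variable to be explicitly declared, a dynamically typed language like Python skips this specification. For example, in C we might specify a particular operation as follows:
/* C code */
int result = 0;
for (int i = 0; i < 100; i++) {
    result += i;
}
In Python the equivalent operation could be written this way:
# Python code
result = 0
for i in range(100):
    result += i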
Notice the main difference: in C, the data types of each variable are explicitly declared,
while in Python the types are dynamically inferred. This means, for example, that we
can assign any kind of data to any variable:
# Python code
x = 4
x = "four"
Here we’ve switched the contents of x from an integer to a string. The same thing in
C would lead (depending on compiler settings) to a compilation error or other
unintended consequences:
/* C code */
int x = 4;
x = "four"; // FAILS
This sort of flexibility is one piece that makes Python and other dynamically typed
languages convenient and easy to use. Understanding how this works is an important
piece of learning to analyze data efficiently and effectively with Python. But what this
type flexibility also points to is the fact that Python variables are more than just their
value; they also contain extra information about the type of the value.
The standard Python implementation is written in C. This means that every Python
object is simply a cleverly disguised C structure, which contains not only its value, but
other information as well. For example, when we define an integer in Python, such as
x = 10000, x is not just a “raw” integer. It’s actually a pointer to a compound C
structure, which contains several values. Looking through the source code, we find
that the integer (long) type definition effectively looks like this (once the C macros
are expanded)
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
• ob_refcnt, a reference count that helps Python silently handle memory allocation and deallocation
• ob_type, which encodes the type of the variable
• ob_size, which specifies the size of the following data members
• ob_digit, which contains the actual integer value that we expect the Python variable to represent
A Python integer is a pointer to a position in memory containing all the Python object
information, including the bytes that contain the integer value. This extra information
in the Python integer structure is what allows Python to be coded so freely and
dynamically.
Let’s consider now what happens when we use a Python data structure that holds
many Python objects. The standard mutable multielement container in Python is the
list. We can create a list of integers as follows:
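In[1]: L = list(range(10))
       L
Out[1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In[2]: type(L[0])
Out[2]: int
Or, similarly, a list of strings:
In[3]: L2 = [str(c) for c in L]
       L2
Out[3]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
In[4]: type(L2[0])
Out[4]: str
Because of Python's dynamic typing, we can even create heterogeneous lists:
In[5]: L3 = [True, "2", 3.0, 4]
       [type(item) for item in L3]
Out[5]: [bool, str, float, int]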
But this flexibility comes at a cost: to allow these flexible types, each item in the list
must contain its own type info, reference count, and other information—that is, each
item is a complete Python object.
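The cost described above can be made visible with a rough size comparison. This is only a sketch: the element count `n` is an arbitrary choice, `sys.getsizeof` reports sizes for the CPython implementation, and the exact byte counts vary between Python versions.

```python
import sys
import numpy as np

n = 1000  # arbitrary element count chosen for illustration
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)  # explicit dtype so each item is 8 bytes

# A list stores pointers; every element is a separate int object with its own header.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An ndarray stores the raw values in one contiguous, fixed-type buffer.
array_bytes = np_array.nbytes

print("list (container + int objects):", list_bytes, "bytes")
print("ndarray data buffer:           ", array_bytes, "bytes")
```

The list total counts both the container and the integer objects it points to, which is why it comes out several times larger than the array buffer.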
Python offers several different options for storing data in efficient, fixed-type data buffers. The built-in array module can be used to create dense arrays of a uniform type:
In[6]: import array
       L = list(range(10))
       A = array.array('i', L)
       A
Out[6]: array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Here 'i' is a type code indicating the contents are integers. Much more useful, however, is the ndarray object of the NumPy package. While Python's array object provides efficient storage of array-based data, NumPy adds to this efficient operations on that data. We will explore these operations in later sections; here we'll demonstrate several ways of creating a NumPy array. We'll start with the standard NumPy import, under the alias np:
In[7]: import numpy as np
First, we can use np.array to create arrays from Python lists:
In[8]: # integer array:
       np.array([1, 4, 2, 5, 3])
Out[8]: array([1, 4, 2, 5, 3])
Remember that unlike Python lists, NumPy is constrained to arrays that all contain
the same type. If types do not match, NumPy will upcast if possible (here, integers
are upcast to floating point):
If we want to explicitly set the data type of the resulting array, we can use the dtype
keyword:
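Both behaviours can be checked in a few lines. The values below are illustrative only; the sketch shows a mixed integer and float list being upcast, and the dtype keyword forcing a specific type.

```python
import numpy as np

# Mixed ints and floats: NumPy upcasts every element to floating point.
mixed = np.array([3.14, 4, 2, 3])
print(mixed, mixed.dtype)

# The dtype keyword sets the element type explicitly.
as_float32 = np.array([1, 2, 3, 4], dtype='float32')
print(as_float32, as_float32.dtype)
```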
Finally, unlike Python lists, NumPy arrays can explicitly be multidimensional; here's one way of initializing a multidimensional array using a list of lists:
In[11]: np.array([[2, 3, 4], [4, 5, 6], [6, 7, 8]])
Out[11]: array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])
The inner lists are treated as rows of the resulting two-dimensional array.
Especially for larger arrays, it is more efficient to create arrays from scratch using
routines built into NumPy. Here are several examples:
In[12]: # Create a length-10 integer array filled with zeros
        np.zeros(10, dtype=int)
In[13]: # Create a 3x5 floating-point array filled with 1s
        np.ones((3, 5), dtype=float)
In[14]: # Create a 3x5 array filled with 3.14
        np.full((3, 5), 3.14)
In[15]: # Create an array filled with a linear sequence
        # starting at 0, ending at 20, stepping by 2
        np.arange(0, 20, 2)
Out[15]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
In[16]: # Create an array of five values evenly spaced between 0 and 1
        np.linspace(0, 1, 5)
In[17]: # Create a 3x3 array of uniformly distributed
        # random values between 0 and 1
        np.random.random((3, 3))
In[18]: # Create a 3x3 array of normally distributed random values
        # with mean 0 and standard deviation 1
        np.random.normal(0, 1, (3, 3))
In[19]: # Create a 3x3 array of random integers in the interval [0, 10)
        np.random.randint(0, 10, (3, 3))
In[20]: # Create a 3x3 identity matrix
        np.eye(3)
Out[20]: array([[ 1., 0., 0.],
       [ 0., 1., 0.],
       [ 0., 0., 1.]])
In[21]: # Create an uninitialized array of three values; the contents are
        # whatever happens to already exist at that memory location
        np.empty(3)
The data type of an array can also be specified using a string:
        np.zeros(10, dtype='int16')
2. The Basics of NumPy Arrays
Newer tools such as Pandas are built around the NumPy array. NumPy array manipulation covers the following basic operations:
● Attributes of arrays: determining the size, shape, memory consumption, and data types of arrays.
● Indexing of arrays: getting and setting the value of individual array elements.
● Slicing of arrays: getting and setting smaller subarrays within a larger array.
● Reshaping of arrays: changing the shape of a given array.
● Joining and splitting of arrays: combining multiple arrays into one, and splitting one array into many.
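We'll define three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array:
import numpy as np
np.random.seed(0)  # seed for reproducibility
x1 = np.random.randint(10, size=6)          # one-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # three-dimensional array
Each array has the attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
Output:
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
Another useful attribute is the dtype, the data type of the array:
print("dtype:", x3.dtype)
Output:
dtype: int64
Other attributes include itemsize, which lists the size (in bytes) of each array element, and nbytes, which lists the total size (in bytes) of the array:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
Output:
itemsize: 8 bytes
nbytes: 480 bytes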
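As a small sketch of how these attributes relate, nbytes is simply size multiplied by itemsize, so the memory footprint depends on the chosen dtype. The array contents and the two dtypes below are arbitrary choices for illustration.

```python
import numpy as np

values = np.arange(60).reshape((3, 4, 5))  # same shape as x3 in the text

small = values.astype(np.int16)  # 2 bytes per element
large = values.astype(np.int64)  # 8 bytes per element

for arr in (small, large):
    # nbytes is always size * itemsize
    print(arr.dtype, "itemsize:", arr.itemsize, "nbytes:", arr.nbytes)
```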
In a one-dimensional array, the i-th value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists.
x1
Output:
array([5, 0, 3, 3, 7, 9])
x1[4]
Output:
7
To index from the end of the array, we can use negative indices.
x1[-1]
Output:
9
x1[-2]
Output:
7
In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices.
x2
Output:
array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])
x2[0, 0]
Output:
3
x2[2, 0]
Output:
1
x2[2, -1]
Output:
7
Values can also be modified using any of the above index notation:
x2[0, 0] = 12
x2
Output:
array([[12, 5, 2, 4],
       [ 7, 6, 8, 8],
       [ 1, 6, 7, 7]])
Unlike Python lists, NumPy arrays have a fixed type. That is, if we attempt to insert a floating-point value into an integer array, the value will be silently truncated.
x1[0] = 3.14159  # this will be truncated!
x1
Output:
array([3, 0, 3, 3, 7, 9])
Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use:
x[start:stop:step]
If any of these are unspecified, they default to the values start=0, stop=size of
dimension, step=1. We can access sub-arrays in one dimension and in multiple
dimensions:
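One-dimensional subarrays
x = np.arange(10)
x[:5]  # first five elements
Output:
array([0, 1, 2, 3, 4])
x[5:]  # elements from index 5 onward
Output:
array([5, 6, 7, 8, 9])
x[1::2]  # every other element, starting at index 1
Output:
array([1, 3, 5, 7, 9])
A potentially confusing case is when the step value is negative. In this case, the defaults for start and stop are swapped. This becomes a convenient way to reverse an array:
x[::-1]  # all elements, reversed
Output:
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
x[5::-2]  # every other element in reverse, starting at index 5
Output:
array([5, 3, 1])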
Multi-dimensional subarrays
Multi-dimensional slices work in the same way, with multiple slices separated by
commas. For example:
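x2
Output:
array([[12, 5, 2, 4],
       [ 7, 6, 8, 8],
       [ 1, 6, 7, 7]])
x2[:2, :3]  # two rows, three columns
Output:
array([[12, 5, 2],
       [ 7, 6, 8]])
x2[:3, ::2]  # all rows, every other column
Output:
array([[12, 2],
       [ 7, 8],
       [ 1, 7]])
Subarray dimensions can even be reversed together:
x2[::-1, ::-1]
Output:
array([[ 7, 7, 6, 1],
       [ 8, 8, 6, 7],
       [ 4, 2, 5, 12]])
Accessing array rows and columns
One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):
print(x2[:, 0])  # first column of x2
Output:
[12 7 1]
print(x2[0, :])  # first row of x2
Output:
[12 5 2 4]
In the case of row access, the empty slice can be omitted for a more compact syntax:
print(x2[0])  # equivalent to x2[0, :]
Output:
[12 5 2 4]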
Array slices return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices are copies. Consider the two-dimensional array from before:
print(x2)
Output:
[[12 5 2 4]
 [ 7 6 8 8]
 [ 1 6 7 7]]
Let's extract a 2×2 subarray from this:
x2_sub = x2[:2, :2]
print(x2_sub)
Output:
[[12 5]
 [ 7 6]]
Now if we modify this subarray, we'll see that the original array is changed:
x2_sub[0, 0] = 99
print(x2_sub)
Output:
[[99 5]
 [ 7 6]]
print(x2)
Output:
[[99 5 2 4]
 [ 7 6 8 8]
 [ 1 6 7 7]]
When we work with large datasets, we can access and process pieces of these
datasets without the need to copy the underlying data buffer.
It is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the copy() method:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)
Output:
[[99 5]
 [ 7 6]]
If we now modify this subarray, the original array is not altered:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)
Output:
[[42 5]
 [ 7 6]]
print(x2)
Output:
[[99 5 2 4]
 [ 7 6 8 8]
 [ 1 6 7 7]]
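The difference between a view and a copy can also be checked directly rather than inferred from printed values. The sketch below assumes np.shares_memory as the test and uses the variable names view and copy, which are not part of the notes; it also contrasts the behaviour with a plain Python list slice.

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

view = x2[:2, :2]         # slicing returns a view onto the same buffer
copy = x2[:2, :2].copy()  # copy() allocates a new, independent buffer

print(np.shares_memory(x2, view))  # True
print(np.shares_memory(x2, copy))  # False

view[0, 0] = 99  # writes through to x2
copy[0, 0] = 42  # leaves x2 untouched
print(x2[0, 0])  # 99

# By contrast, slicing a Python list always produces a copy.
lst = [1, 2, 3, 4]
lst_slice = lst[:2]
lst_slice[0] = 99
print(lst[0])  # 1
```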
MATERIALS:
https://www.youtube.com/watch?v=QUT1VHiLmmI
https://www.youtube.com/watch?v=ZGsLUC49Jns
https://www.youtube.com/watch?v=4-epfRgaiq4
Another useful type of operation is reshaping of arrays. This can be done using the reshape method. For example, if we want to put the numbers 1 through 9 in a 3×3 grid, we can do the following:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
Output:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
For this to work, the size of the initial array must match the size of the reshaped array.
Where possible, the reshape method will use a no-copy view of the initial array, but
with non-contiguous memory buffers this is not always the case.
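Both conditions can be observed in a short experiment. This is a sketch under the assumption that np.shares_memory is an acceptable way to tell a view from a copy; the transposed array serves as one example of a non-contiguous buffer.

```python
import numpy as np

x = np.arange(1, 10)
grid = x.reshape((3, 3))
shares = np.shares_memory(x, grid)  # contiguous input: reshape returns a view
print("view of x:", shares)

# Sizes must match: 9 elements cannot fill a 2x5 array.
try:
    x.reshape((2, 5))
    size_error = None
except ValueError as err:
    size_error = err
print("size mismatch:", size_error)

# The transpose is a non-contiguous view, so flattening it needs a copy.
flat = grid.T.reshape(9)
flat_shares = np.shares_memory(grid, flat)
print("flattened transpose shares memory:", flat_shares)
```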
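Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. This can be done with the reshape method, or with the np.newaxis keyword within a slice operation:
x = np.array([1, 2, 3])
x.reshape((1, 3))  # row vector via reshape
Output:
array([[1, 2, 3]])
x[np.newaxis, :]  # row vector via newaxis
Output:
array([[1, 2, 3]])
x.reshape((3, 1))  # column vector via reshape
Output:
array([[1],
       [2],
       [3]])
x[:, np.newaxis]  # column vector via newaxis
Output:
array([[1],
       [2],
       [3]])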
It is also possible to combine multiple arrays into one and to conversely split a single
array into multiple arrays.
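Concatenation of arrays
Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
Output:
array([1, 2, 3, 3, 2, 1])
More than two arrays can be concatenated at once:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))
Output:
[ 1 2 3 3 2 1 99 99 99]
np.concatenate can also be used for two-dimensional arrays:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
np.concatenate([grid, grid])  # concatenate along the first axis
Output:
array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])
np.concatenate([grid, grid], axis=1)  # concatenate along the second axis
Output:
array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])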
For arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
np.vstack([x, grid])  # vertically stack the arrays
Output:
array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])
y = np.array([[99],
              [99]])
np.hstack([grid, y])  # horizontally stack the arrays
Output:
array([[ 9, 8, 7, 99],
       [ 6, 5, 4, 99]])
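Splitting of arrays
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
Output:
[1 2 3] [99 99] [3 2 1]
N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit are similar:
grid = np.arange(16).reshape((4, 4))
grid
Output:
array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11],
       [12, 13, 14, 15]])
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)
Output:
[[0 1 2 3]
 [4 5 6 7]]
[[ 8 9 10 11]
 [12 13 14 15]]
left, right = np.hsplit(grid, [2])
print(left)
print(right)
Output:
[[ 0 1]
 [ 4 5]
 [ 8 9]
 [12 13]]
[[ 2 3]
 [ 6 7]
 [10 11]
 [14 15]]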
MATERIALS
https://www.youtube.com/watch?v=3osJ59xXAGo
https://www.youtube.com/watch?v=KehyltXMrZE
NumPy provides an easy and flexible interface to optimized computation with arrays of data. Computation on NumPy arrays can be very fast, or it can be very slow. The key to making it fast is to use vectorized operations, generally implemented through NumPy's universal functions (ufuncs). NumPy's ufuncs can be used to make repeated calculations on array elements much more efficient.
Python's default implementation (known as CPython) does some operations very slowly.
This is in part due to the dynamic, interpreted nature of the language: the fact that types
are flexible, so that sequences of operations cannot be compiled down to efficient machine
code as in languages like C and Fortran.
Recently there have been various attempts to address this weakness: well-known examples
are the PyPy project, a just-in-time compiled implementation of Python; the
Cython project, which converts Python code to compilable C code; and the Numba
project, which converts snippets of Python code to fast LLVM bytecode. Each of these has
its strengths and weaknesses, but none of the three approaches has yet surpassed the
reach and popularity of the standard CPython engine.
The relative sluggishness of Python generally shows up in situations where many small operations are being repeated, for instance when looping over arrays to operate on each element. For example, imagine we have an array of values and we'd like to compute the reciprocal of each:
import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)
If we measure the execution time of this code for a large input, we see that this operation is very slow:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)
It takes several seconds to compute these million operations and to store the result. Each
time the reciprocal is computed, Python first examines the object's type and does a
dynamic lookup of the correct function to use for that type. If we were working in
compiled code instead, this type specification would be known before the code executes
and the result could be computed much more efficiently.
For many types of operations, NumPy provides a convenient interface into just this kind of statically typed, compiled routine. This is known as a vectorized operation. It is accomplished by simply performing an operation on the array, which is then applied to each element. The vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.
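Outside of IPython, where the %timeit magic is not available, the same comparison can be sketched with the standard timeit module. The array size, the number of repetitions, and the use of np.random.default_rng are assumptions made for this illustration; the measured times depend on the machine, so none are quoted here.

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    # Explicit Python loop: one interpreted step per element.
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.default_rng(0)
big_array = rng.integers(1, 100, size=100_000)

loop_result = compute_reciprocals(big_array)
vectorized_result = 1.0 / big_array  # the loop runs inside compiled NumPy code

loop_time = timeit.timeit(lambda: compute_reciprocals(big_array), number=3)
vec_time = timeit.timeit(lambda: 1.0 / big_array, number=3)
print("loop:       %.4f s for 3 runs" % loop_time)
print("vectorized: %.4f s for 3 runs" % vec_time)
```

Both approaches produce the same values; only the place where the loop executes differs.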
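Compare the results of the following two:
print(compute_reciprocals(values))
print(1.0 / values)
Looking at the execution time for the big array, we see that the vectorized operation completes orders of magnitude faster than the Python loop:
%timeit (1.0 / big_array)
Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to quickly execute repeated operations on values in NumPy arrays. Ufuncs are extremely flexible; we can also operate between two arrays:
np.arange(5) / np.arange(1, 6)
Output:
array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])
Ufunc operations are not limited to one-dimensional arrays; they can act on multi-dimensional arrays as well:
x = np.arange(9).reshape((3, 3))
2 ** x
Output:
array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])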
Computations using vectorization through ufuncs are more efficient than their counterparts implemented using Python loops, especially as the arrays grow in size. Any time we see such a loop in a Python script, we should consider whether it can be replaced with a vectorized expression.
Ufuncs exist in two types: unary ufuncs, which operate on a single input, and binary ufuncs, which operate on two inputs.
Array arithmetic
NumPy's ufuncs make use of Python's native arithmetic operators. The standard addition,
subtraction, multiplication, and division can all be used:
x = np.arange(4)
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division
Output:
x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]
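There is also a unary ufunc for negation, a ** operator for exponentiation, and a % operator for modulus:
print("-x =", -x)
print("x ** 2 =", x ** 2)
print("x % 2 =", x % 2)
Output:
-x = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2 = [0 1 0 1]
In addition, these can be strung together however we wish, and the standard order of operations is respected:
-(0.5*x + 1) ** 2
Output:
array([-1.  , -2.25, -4.  , -6.25])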
Each of these arithmetic operations is a wrapper around a specific function built into NumPy. For example, the + operator is a wrapper for the add function:
np.add(x, 2)
Output:
array([2, 3, 4, 5])
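The same correspondence holds for the other arithmetic operators. The sketch below pairs each operator with the NumPy function it wraps and checks that both give identical results; the sample array and the dictionary name pairs are illustrative choices.

```python
import numpy as np

x = np.arange(1, 5)

# operator expression paired with the equivalent ufunc call
pairs = {
    "+": (x + 2, np.add(x, 2)),
    "-": (x - 2, np.subtract(x, 2)),
    "unary -": (-x, np.negative(x)),
    "*": (x * 2, np.multiply(x, 2)),
    "/": (x / 2, np.divide(x, 2)),
    "//": (x // 2, np.floor_divide(x, 2)),
    "**": (x ** 2, np.power(x, 2)),
    "%": (x % 2, np.mod(x, 2)),
}

for symbol, (by_operator, by_ufunc) in pairs.items():
    print(symbol, np.array_equal(by_operator, by_ufunc))

all_match = all(np.array_equal(a, b) for a, b in pairs.values())
```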
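Absolute value
Just as NumPy understands Python's built-in arithmetic operators, it also understands Python's built-in absolute value function:
x = np.array([-2, -1, 0, 1, 2])
abs(x)
Output:
array([2, 1, 0, 1, 2])
The corresponding NumPy ufunc is np.absolute, which is also available under the alias np.abs:
np.absolute(x)
Output:
array([2, 1, 0, 1, 2])
np.abs(x)
Output:
array([2, 1, 0, 1, 2])
This ufunc can also handle complex data, in which case the absolute value returns the magnitude:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)
Output:
array([5., 5., 2., 1.])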
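Trigonometric functions
NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist are the trigonometric functions. We'll start by defining an array of angles:
theta = np.linspace(0, np.pi, 3)
Now we can compute some trigonometric functions on these values:
print("theta      =", theta)
print("sin(theta) =", np.sin(theta))
print("cos(theta) =", np.cos(theta))
print("tan(theta) =", np.tan(theta))
The values are computed to within machine precision, which is why values that should be zero do not always hit exactly zero. Inverse trigonometric functions are also available:
x = [-1, 0, 1]
print("x         =", x)
print("arcsin(x) =", np.arcsin(x))
print("arccos(x) =", np.arccos(x))
print("arctan(x) =", np.arctan(x))
Output:
x         = [-1, 0, 1]
arcsin(x) = [-1.57079633  0.          1.57079633]
arccos(x) = [3.14159265 1.57079633 0.        ]
arctan(x) = [-0.78539816  0.          0.78539816]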
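Another common type of operation available as NumPy ufuncs is the exponentials:
x = [1, 2, 3]
print("x =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))
Output:
x = [1, 2, 3]
e^x = [ 2.71828183  7.3890561  20.08553692]
2^x = [2. 4. 8.]
3^x = [ 3  9 27]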
The inverses of the exponentials, the logarithms, are also available. The basic np.log gives the natural logarithm; to compute the base-2 or base-10 logarithm, use np.log2 or np.log10:
x = [1, 2, 4, 10]
print("x =", x)
print("ln(x) =", np.log(x))
print("log2(x) =", np.log2(x))
print("log10(x) =", np.log10(x))
Output:
x = [1, 2, 4, 10]
ln(x) = [0.         0.69314718 1.38629436 2.30258509]
log2(x) = [0.         1.         2.         3.32192809]
log10(x) = [0.         0.30103    0.60205999 1.        ]
There are also some specialized versions that are useful for maintaining precision with very small input:
x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))
When x is very small, these functions give more precise values than if the raw np.log or np.exp were to be used.
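The precision gain can be seen with a single very small input. In this sketch the value 1e-10 is an arbitrary choice, and the comparison relies on the fact that log(1 + x) is approximately x when x is tiny.

```python
import numpy as np

x = 1e-10  # tiny input chosen for illustration

naive = np.log(1 + x)   # 1 + x is rounded to double precision before the log
precise = np.log1p(x)   # computes log(1 + x) without forming 1 + x first

# For tiny x the true value of log(1 + x) is very close to x itself.
naive_rel_err = abs(naive - x) / x
precise_rel_err = abs(precise - x) / x

print("np.log(1 + x):", naive, " relative error:", naive_rel_err)
print("np.log1p(x):  ", precise, " relative error:", precise_rel_err)
```

The naive form loses several significant digits because the rounding of 1 + x is large relative to x.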
Specialized ufuncs
NumPy has many more ufuncs available, including hyperbolic trig functions, bitwise arithmetic, comparison operators, conversions from radians to degrees, rounding, and remainders.
Another source of more specialized and obscure ufuncs is the submodule scipy.special. If we want to compute some obscure mathematical function on our data, chances are it is implemented in scipy.special.
from scipy import special
# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x) =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2) =", special.beta(x, 2))
Output:
gamma(x) = [1.00000000e+00 2.40000000e+01 3.62880000e+05]
ln|gamma(x)| = [ 0.          3.17805383 12.80182748]
beta(x, 2) = [0.5        0.03333333 0.00909091]
# Error function (integral of Gaussian),
# its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x) =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))
MATERIALS
https://www.youtube.com/watch?v=VuaQKtygva4
https://www.youtube.com/watch?v=kOn2lCrd37w
https://www.youtube.com/watch?v=shi56WRsiM8
Specifying output
For large calculations, it is useful to be able to specify the array where the result of the
calculation will be stored. Rather than creating a temporary array, this can be used to write
computation results directly to the memory location where we want them to be. For all
ufuncs, this can be done using the out argument of the function:
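x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)
Output:
[ 0. 10. 20. 30. 40.]
This can even be used with array views. For example, we can write the results of a computation to every other element of a specified array:
y = np.zeros(10)
np.power(2, x, out=y[::2])
print(y)
Output:
[ 1. 0. 2. 0. 4. 0. 8. 0. 16. 0.]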
If we had instead written y[::2] = 2 ** x, this would have resulted in the creation of a
temporary array to hold the results of 2 ** x, followed by a second operation copying those
values into the y array.
1.4 Aggregates
For binary ufuncs, there are aggregates that can be computed directly from the object. For example, if we'd like to reduce an array with a particular operation, we can use the reduce method of any ufunc. A reduce repeatedly applies a given operation to the elements of an array until only a single result remains.
For example, calling reduce on the add ufunc returns the sum of all elements in the array:
x = np.arange(1, 6)
np.add.reduce(x)
Output:
15
Similarly, calling reduce on the multiply ufunc results in the product of all array elements:
np.multiply.reduce(x)
Output:
120
To store all the intermediate results of the computation, we can instead use accumulate:
np.add.accumulate(x)
Output:
array([ 1, 3, 6, 10, 15])
np.multiply.accumulate(x)
Output:
array([ 1, 2, 6, 24, 120])
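To make the idea of a reduction concrete, the sketch below compares the ufunc methods with the equivalent fold written using functools.reduce from the standard library. The variable names are illustrative.

```python
from functools import reduce
import operator
import numpy as np

x = np.arange(1, 6)

total = np.add.reduce(x)             # ((((1 + 2) + 3) + 4) + 5)
product = np.multiply.reduce(x)      # ((((1 * 2) * 3) * 4) * 5)
running_sum = np.add.accumulate(x)   # keeps every intermediate sum
running_product = np.multiply.accumulate(x)

# The same folds in plain Python, one element at a time.
py_total = reduce(operator.add, x.tolist())
py_product = reduce(operator.mul, x.tolist())

print(total, py_total)
print(product, py_product)
print(running_sum, running_product)
```

For these particular cases NumPy also offers the dedicated functions np.sum, np.prod, np.cumsum, and np.cumprod, which compute the same results.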
Outer products
Finally, any ufunc can compute the output of all pairs of two different inputs using the outer
method. This allows us, in one line, to do things like create a multiplication table:
x = np.arange(1, 6)
np.multiply.outer(x, x)
Output:
array([[ 1, 2, 3, 4, 5],
       [ 2, 4, 6, 8, 10],
       [ 3, 6, 9, 12, 15],
       [ 4, 8, 12, 16, 20],
       [ 5, 10, 15, 20, 25]])
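The outer method works with any binary ufunc, not only multiply. As a sketch, the block below builds the multiplication table, checks it against the equivalent expression that multiplies a column vector by a row vector, and builds an addition table in the same way.

```python
import numpy as np

x = np.arange(1, 6)

table = np.multiply.outer(x, x)      # table[i, j] = x[i] * x[j]
via_columns = x[:, np.newaxis] * x   # column vector times row vector
sums = np.add.outer(x, x)            # sums[i, j] = x[i] + x[j]

print(table)
print(np.array_equal(table, via_columns))
print(sums)
```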
MATERIALS
https://www.youtube.com/watch?v=PP7NfO5cd-I
https://www.youtube.com/results?search_query=Aggregations+IN+NUMPY+NPTEL
https://www.youtube.com/watch?v=orQuiFokFPM
Aggregations
While working with a large amount of data, a first step is to compute summary statistics for
the data. The most common summary statistics are the mean and standard deviation, which
allow us to summarize the "typical" values in a dataset, but other aggregates are also useful
such as the sum, product, median, minimum and maximum, quantiles, etc.
Python itself can compute the sum of all values in an array using the built-in sum function:
import numpy as np
L = np.random.random(100)
sum(L)
Output:
55.61209116604941
The syntax is similar to that of NumPy's sum function, and the result is the same in the simplest case:
np.sum(L)
Output:
55.612091166049424
However, because it executes the operation in compiled code, NumPy's version of the operation is computed much more quickly:
big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)
Python has built-in min and max functions, used to find the minimum value and maximum value of any given array:
min(big_array), max(big_array)
Output:
(1.1717128136634614e-06, 0.9999976784968716)
NumPy's corresponding functions have similar syntax and operate more quickly:
np.min(big_array), np.max(big_array)
Output:
(1.1717128136634614e-06, 0.9999976784968716)
%timeit min(big_array)
%timeit np.min(big_array)
Output:
1000 loops, best of 3: 497 µs per loop
For min, max, sum, and several other NumPy aggregates, a shorter syntax is to use methods of the array object itself:
print(big_array.min(), big_array.max(), big_array.sum())
One common type of aggregation operation is an aggregate along a row or column. Say we have some data stored in a two-dimensional array:
M = np.random.random((3, 4))
print(M)
By default, each NumPy aggregation function returns the aggregate over the entire array:
M.sum()
Output:
6.0850555667307118
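Aggregation functions take an additional argument specifying the axis along which the aggregate is computed. For example, we can find the minimum value within each column by specifying axis=0:
M.min(axis=0)
The function returns four values, corresponding to the four columns of numbers. Similarly, we can find the maximum value within each row:
M.max(axis=1)
The axis keyword specifies the dimension of the array that will be collapsed, rather than the dimension that will be returned. So specifying axis=0 means that the first axis will be collapsed: for two-dimensional arrays, this means that values within each column will be aggregated.
Most aggregates have a NaN-safe counterpart that computes the result while ignoring missing values.
The following table provides a list of useful aggregation functions available in NumPy:
Function name   NaN-safe version   Description
np.sum          np.nansum          Compute sum of elements
np.prod         np.nanprod         Compute product of elements
np.mean         np.nanmean         Compute mean of elements
np.std          np.nanstd          Compute standard deviation
np.var          np.nanvar          Compute variance
np.min          np.nanmin          Find minimum value
np.max          np.nanmax          Find maximum value
np.argmin       np.nanargmin       Find index of minimum value
np.argmax       np.nanargmax       Find index of maximum value
np.median       np.nanmedian       Compute median of elements
np.percentile   np.nanpercentile   Compute rank-based statistics of elements
np.any          N/A                Evaluate whether any elements are true
np.all          N/A                Evaluate whether all elements are true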
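A small hand-made array makes the axis argument and the NaN-safe variants easier to follow than random data. The values below, including the position of the missing entry, are invented for this sketch.

```python
import numpy as np

# 3 rows x 4 columns, with one missing value marked as NaN
M = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, np.nan, 7.0, 8.0],
              [9.0, 10.0, 11.0, 12.0]])

col_min = np.nanmin(M, axis=0)  # collapses the rows: one value per column
row_max = np.nanmax(M, axis=1)  # collapses the columns: one value per row

plain_sum = np.sum(M)    # NaN propagates through the ordinary aggregate
safe_sum = np.nansum(M)  # the NaN-safe version ignores the missing value

print(col_min)
print(row_max)
print(plain_sum, safe_sum)
```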
Example: What is the Average Height of US Presidents?
Aggregates available in NumPy can be useful for summarizing a set of values. As a simple
example, let's consider the heights of all US presidents. This data is available in the file
president_heights.csv, which is a simple comma-separated list of labels and values:
We use the Pandas package to read the file and extract this information.
Output:
[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
 174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183
Now that we have this data array, we can compute a variety of summary statistics:
print("Mean height:", heights.mean())
In each case, the aggregation operation reduces the entire array to a single summarizing value, which gives us information about the distribution of values. We can also compute quantiles:
print("Median:", np.median(heights))
Output:
Median: 182.0
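The loading and summary steps can be sketched end to end as follows. Because the real file is not reproduced in these notes, the block reads a tiny stand-in CSV from a string: the column names order, name, and height(cm) and the placeholder labels are assumptions, and only the first five height values are taken from the listing above, so the printed statistics describe this sample and not the full data set.

```python
import io
import numpy as np
import pandas as pd

# Stand-in for president_heights.csv (assumed column names, placeholder labels).
csv_text = """order,name,height(cm)
1,President A,189
2,President B,170
3,President C,189
4,President D,163
5,President E,183
"""

data = pd.read_csv(io.StringIO(csv_text))
heights = np.array(data["height(cm)"])  # extract one column as a NumPy array

print("Mean height:       ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height:    ", heights.min())
print("Maximum height:    ", heights.max())
print("25th percentile:   ", np.percentile(heights, 25))
print("Median:            ", np.median(heights))
print("75th percentile:   ", np.percentile(heights, 75))
```

With the real file, the string buffer would be replaced by the path to president_heights.csv.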
We see that the median height of US presidents is 182 cm. It is often more useful to see a visual representation of this data, which we can accomplish using tools in Matplotlib. For example, this code generates the following chart:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # set plot style
Output:
[Figure: chart of the heights of US presidents]
NumPy's universal functions can be used to vectorize operations and thereby remove slow Python loops. Another means of vectorizing operations is to use NumPy's broadcasting functionality. Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication, etc.) on arrays of different sizes.
MATERIALS
https://www.youtube.com/watch?v=oG1t3qlzq14
https://www.youtube.com/watch?v=tuKHsfAehz4
https://www.youtube.com/watch?v=0u9OzBSRZec
Introducing Broadcasting
For arrays of the same size, binary operations are performed on an element-by-element basis:
import numpy as np
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b
Output:
array([5, 6, 7])
Broadcasting allows these types of binary operations to be performed on arrays of different sizes. For example, we can just as easily add a scalar (think of it as a zero-dimensional array) to an array:
a + 5
Output:
array([5, 6, 7])
We can think of this as an operation that stretches or duplicates the value 5 into the array [5, 5, 5] and adds the results. The advantage of NumPy's broadcasting is that this duplication of values does not actually take place, but it is a useful mental model as we think about broadcasting.
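The mental model can be made explicit with np.broadcast_to, which returns a read-only view that looks like the stretched array without copying the value. This is a sketch for illustration; ordinary code simply writes a + 5.

```python
import numpy as np

a = np.array([0, 1, 2])

# What the scalar 5 conceptually looks like once broadcast to a's shape.
stretched = np.broadcast_to(5, a.shape)
print(stretched)          # [5 5 5]
print(stretched.strides)  # (0,): every element refers to the same stored value

# Adding the scalar and adding the stretched view give the same result.
print(a + 5)
print(a + stretched)
```

The zero stride confirms that no duplication takes place in memory.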
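We can similarly extend this to arrays of higher dimension. Observe the result when we add a one-dimensional array to a two-dimensional array:
M = np.ones((3, 3))
M
Output:
array([[ 1., 1., 1.],
       [ 1., 1., 1.],
       [ 1., 1., 1.]])
M + a
Output:
array([[ 1., 2., 3.],
       [ 1., 2., 3.],
       [ 1., 2., 3.]])
Here the one-dimensional array a is stretched, or broadcast, across the second dimension in order to match the shape of M.
More complicated cases can involve broadcasting of both arrays. Consider the following example:
a = np.arange(3)
b = np.arange(3)[:, np.newaxis]
print(a)
print(b)
Output:
[0 1 2]
[[0]
 [1]
 [2]]
a + b
Output:
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
Here we have stretched both a and b to match a common shape, and the result is a two-dimensional array. The geometry of these examples is visualized in the following figure.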
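[Figure: visualization of NumPy broadcasting. The light boxes represent the broadcasted values.]
Rules of Broadcasting
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:
Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
Broadcasting example 1
Adding a two-dimensional array to a one-dimensional array:
M = np.ones((2, 3))
a = np.arange(3)
Let's consider an operation on these two arrays. The shapes of the arrays are:
M.shape = (2, 3)
a.shape = (3,)
We see by rule 1 that the array a has fewer dimensions, so we pad it on the left with ones:
M.shape -> (2, 3)
a.shape -> (1, 3)
By rule 2, we now see that the first dimension disagrees, so we stretch this dimension to match:
M.shape -> (2, 3)
a.shape -> (2, 3)
The shapes match, and the final shape will be (2, 3):
M + a
Output:
array([[ 1., 2., 3.],
       [ 1., 2., 3.]])
Broadcasting example 2
a = np.arange(3).reshape((3, 1))
b = np.arange(3)
a.shape = (3, 1)
b.shape = (3,)
Rule 1 says we must pad the shape of b with ones:
a.shape -> (3, 1)
b.shape -> (1, 3)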
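And rule 2 tells us that we upgrade each of these ones to match the corresponding size
of the other array:
a.shape -> (3, 3)
b.shape -> (3, 3)
Because the result matches, these shapes are compatible. We can see this here:
a + b
Output:
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
Now consider an example in which the two arrays are not compatible:
M = np.ones((3, 2))
a = np.arange(3)
M.shape = (3, 2)
a.shape = (3,)
By rule 1, we pad the shape of a with ones:
M.shape -> (3, 2)
a.shape -> (1, 3)
By rule 2, the first dimension of a is stretched to match that of M:
M.shape -> (3, 2)
a.shape -> (3, 3)
Now by rule 3, the final shapes do not match, so these two arrays are incompatible:
M + a
Output:
ValueError: operands could not be broadcast together with shapes (3,2) (3,)
Reshaping a into a column with np.newaxis makes the shapes compatible:
a[:, np.newaxis].shape
Output:
(3, 1)
M + a[:, np.newaxis]
Output:
array([[1., 1.],
       [2., 2.],
       [3., 3.]])
These broadcasting rules apply to any binary ufunc. For example, the logaddexp(a, b) function:
np.logaddexp(M, a[:, np.newaxis])
Output:
array([[1.31326169, 1.31326169],
       [1.69314718, 1.69314718],
       [2.31326169, 2.31326169]])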
Broadcasting in Practice
As ufuncs allow a NumPy user to remove the need to explicitly write slow Python loops,
broadcasting extends this ability. One example is centering an array of data.
Consider an array of 10 observations, each of which consists of 3 values. Using the
standard convention, we will store this in a 10×3 array:
X = np.random.random((10, 3))
We can compute the mean of each feature using the mean aggregate across the first
dimension:
Xmean = X.mean(0)
Xmean
Output: an array of three values, one mean per feature
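The centering step itself is a single broadcast subtraction. The following is a minimal sketch, assuming a 10×3 array of random values; the seed and the use of `np.random.default_rng` are choices made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)      # seed chosen arbitrarily for repeatability
X = rng.random((10, 3))             # 10 observations, 3 features

Xmean = X.mean(axis=0)              # shape (3,): one mean per feature
X_centered = X - Xmean              # (10, 3) - (3,) broadcasts across the rows

# After centering, every feature mean should be zero up to rounding error.
col_means = X_centered.mean(axis=0)
print(np.allclose(col_means, 0.0))  # True
```

By rule 1 the shape (3,) is padded to (1, 3), and by rule 2 it is stretched to (10, 3), so no explicit loop over the rows is needed.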
Plotting a two-dimensional function
Broadcasting is very useful in displaying images based on two-dimensional functions.
If we want to define a function z = f(x, y), broadcasting can be used to compute the
function across the grid:
# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)[:, np.newaxis]
z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
We will use Matplotlib to plot this two-dimensional array:
%matplotlib inline
import matplotlib.pyplot as plt
plt.imshow(z, origin='lower', extent=[0, 5, 0, 5],
           cmap='viridis')
plt.colorbar();
Output:
[Figure: image plot of z with a color bar]
The result is a visualization of the two-dimensional function.
Comparisons, Masks, and Boolean Logic
Masking comes up when you want to extract, modify, count, or otherwise manipulate
values in an array based on some criterion: for example, you might wish to count all
values greater than a certain value, or perhaps remove all outliers that are above
some threshold. In NumPy, Boolean masking is often the most efficient way to
accomplish these types of tasks.
Imagine you have a series of data that represents the amount of precipitation each
day for a year in a given city. For example, here we'll load the daily rainfall statistics
for the city of Seattle in 2014, using Pandas:
In [1]: import numpy as np
        import pandas as pd
        inches.shape
Out[1]: (365,)
In [2]: %matplotlib inline
        import matplotlib.pyplot as plt
In [3]: plt.hist(inches, 40);
This histogram gives us a general idea of what the data looks like: despite its reputation,
the vast majority of days in Seattle saw near zero measured rainfall in 2014. But this
doesn't do a good job of conveying some information we'd like to see: for example, how
many rainy days were there in the year? What is the average precipitation on those
rainy days? How many days were there with more than half an inch of rain?
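Each of these questions can be answered with a Boolean mask. The sketch below uses synthetic values as a stand-in for the rainfall data: the array `inches` here is generated at random (an assumption made so the example is self-contained) and does not reproduce the Seattle measurements.

```python
import numpy as np

# Synthetic stand-in for one year of daily rainfall in inches (NOT the Seattle data):
# roughly 60% dry days, the rest drawn from an exponential distribution.
rng = np.random.default_rng(0)
inches = np.where(rng.random(365) < 0.6, 0.0, rng.exponential(0.3, 365))

rainy = inches > 0                    # Boolean mask, one entry per day
n_rainy = np.sum(rainy)               # each True counts as 1
mean_rainy = inches[rainy].mean()     # masking keeps only the rainy days
n_heavy = np.sum(inches > 0.5)        # days with more than half an inch

print("Rainy days:", n_rainy)
print("Mean precipitation on rainy days:", mean_rainy)
print("Days with more than 0.5 inches:", n_heavy)
```

The comparison `inches > 0` is itself a ufunc, so the mask is built without a Python loop, and indexing with the mask returns only the elements where it is True.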
Another style of array indexing is known as fancy indexing. Fancy indexing is like the simple
indexing, but we pass arrays of indices in place of single scalars. This allows us to very
quickly access and modify complicated subsets of an array's values.
Fancy indexing is passing an array of indices to access multiple array elements at once.
For example, consider the following array:
import numpy as np
rand = np.random.RandomState(42)
x = rand.randint(100, size=10)
print(x)
Output:
[51 92 14 71 60 20 82 86 74 74]
ind = [3, 7, 4]
x[ind]
Output:
array([71, 86, 60])
Fancy indexing also works in multiple dimensions. Consider the following array:
X = np.arange(12).reshape((3, 4))
X
Output:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
Like with standard indexing, the first index refers to the row, and the second to the
column:
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]
Output:
array([ 2,  5, 11])
The first value in the result is X[0, 2], the second is X[1, 1], and the third is X[2, 3].
The pairing of indices in fancy indexing follows all the broadcasting rules. So, for
example, if we combine a column vector and a row vector within the indices, we get
a two-dimensional result:
X[row[:, np.newaxis], col]
Output:
array([[ 2,  1,  3],
       [ 6,  5,  7],
       [10,  9, 11]])
Here, each row value is matched with each column vector, exactly as we saw in
broadcasting of arithmetic operations. For example:
row[:, np.newaxis] * col
Output:
array([[0, 0, 0],
       [2, 1, 3],
       [4, 2, 6]])
In fancy indexing, the return value reflects the broadcasted shape of the indices
rather than the shape of the array being indexed.
For even more powerful operations, fancy indexing can be combined with the other
indexing schemes:
print(X)
Output:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
All of these indexing options combined lead to a very flexible set of operations for
accessing and modifying array values.
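As an illustration of combining the schemes, the sketch below mixes fancy indexing with a simple index, a slice, and a Boolean mask on a 3×4 array. The particular index lists and the mask are chosen only for this example.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))

# Simple index for the row, fancy index for the columns.
print(X[2, [2, 0, 1]])               # [10  8  9]

# Slice for the rows, fancy index for the columns.
print(X[1:, [2, 0, 1]])              # rows [6 4 5] and [10 8 9]

# Fancy index for the rows, Boolean mask for the columns.
row = np.array([0, 1, 2])
mask = np.array([True, False, True, False])
print(X[row[:, np.newaxis], mask])   # rows [0 2], [4 6] and [8 10]
```

In the last case the mask selects columns 0 and 2, and the column vector of row indices is broadcast against it, so the result has shape (3, 2).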
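Example: Selecting Random Points
One common use of fancy indexing is the selection of subsets of rows from a matrix.
For example, we might have an N by D matrix representing N points in D dimensions,
such as the following points drawn from a two-dimensional normal distribution:
mean = [0, 0]
cov = [[1, 2],
       [2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape
Output:
(100, 2)
We can use fancy indexing to select 20 random points: first choose 20 random indices
with no repeats, then use these indices to select a portion of the original array:
indices = np.random.choice(X.shape[0], 20, replace=False)
selection = X[indices]  # fancy indexing here
Now to see which points were selected, let's over-plot large circles at the locations of
the selected points:
plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.scatter(selection[:, 0], selection[:, 1],
            facecolor='none', s=200);
[Figure: scatter plot of all points, with the selected points circled]
This sort of strategy is often used to quickly partition datasets, as is often needed in
train/test splitting for validation of statistical models, and in sampling approaches to
answering statistical questions.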
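A train/test partition of this kind can be sketched with a shuffled index array. The data, the 80/20 proportion, and the seed below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed
X = rng.normal(size=(100, 2))              # 100 hypothetical points in two dimensions

indices = rng.permutation(X.shape[0])      # shuffled row indices, no repeats
test_idx, train_idx = indices[:20], indices[20:]

X_test = X[test_idx]                       # fancy indexing selects whole rows
X_train = X[train_idx]
print(X_train.shape, X_test.shape)         # (80, 2) (20, 2)
```

Because the permutation contains every row index exactly once, the two subsets are disjoint and together cover the whole dataset.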
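Modifying Values with Fancy Indexing
As fancy indexing can be used to access parts of an array, it can also be used to modify
parts of an array. For example, say we have an array of indices and we want to set
the corresponding items in an array to some value:
x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
print(x)
Output:
[ 0 99 99  3 99  5  6  7 99  9]
x[i] -= 10
print(x)
Output:
[ 0 89 89  3 89  5  6  7 89  9]
Repeated indices with these operations can cause some potentially unexpected results.
Consider the following:
x = np.zeros(10)
x[[0, 0]] = [4, 6]
print(x)
Output:
[6. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
The result, of course, is that x[0] contains the value 6. But consider this operation:
i = [2, 3, 3, 4, 4, 4]
x[i] += 1
x
Output:
array([6., 0., 1., 1., 1., 0., 0., 0., 0., 0.])
We might expect that x[3] would contain the value 2 and x[4] would contain the value 3, as
this is how many times each index is repeated. This does not happen because x[i] += 1 is
shorthand for x[i] = x[i] + 1: the sum is evaluated once and then assigned, so it is the
assignment, not the increment, that is repeated.
So what if we want a method where the operation is repeated? For this, we can use the at()
method of ufuncs and do the following:
x = np.zeros(10)
np.add.at(x, i, 1)
print(x)
Output:
[0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]
The at() method does an in-place application of the given operator at the specified
indices (here, i) with the specified value (here, 1). Another method that is similar in
spirit is the reduceat() method of ufuncs.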
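We can use these ideas to efficiently bin data to create a histogram by hand. For
example, imagine we have 100 values and would like to quickly find where they fall
within an array of bins. We could compute it using ufunc.at like this:
np.random.seed(42)
x = np.random.randn(100)
# compute a histogram by hand
bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)
# find the appropriate bin for each x
i = np.searchsorted(bins, x)
# add 1 to each of these bins
np.add.at(counts, i, 1)
The counts now reflect the number of points within each bin; in other words, a histogram.
Matplotlib provides the plt.hist() routine, which does the same in a single line:
plt.hist(x, bins, histtype='step');
This function will create a nearly identical plot to the one seen here. To compute the
binning, Matplotlib uses the np.histogram function, which does a very similar
computation to what we did before. Let's compare the two here:
print("NumPy routine:")
%timeit counts, edges = np.histogram(x, bins)
print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins, x), 1)
The same comparison can be repeated on a much larger array:
x = np.random.randn(1000000)
print("NumPy routine:")
%timeit counts, edges = np.histogram(x, bins)
print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins, x), 1)
The timings depend on the machine. Typically the custom routine is faster on the small
array, while np.histogram is faster on the large one.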
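To see that the hand-rolled binning and np.histogram agree, the counts can be compared directly. This is a sketch under stated assumptions: the data are synthetic, the seed is arbitrary, and the values are clipped so that every one lies strictly inside the bin range.

```python
import numpy as np

rng = np.random.default_rng(42)                  # arbitrary seed
x = np.clip(rng.normal(size=100), -4.9, 4.9)     # keep every value inside the bins
bins = np.linspace(-5, 5, 20)

# Hand-rolled histogram: searchsorted returns, for each value, the index of the
# first bin edge that is >= the value; add.at accumulates repeated indices.
counts = np.zeros_like(bins)
np.add.at(counts, np.searchsorted(bins, x), 1)

# np.histogram returns one count per interval, so it has one entry fewer.
hist, edges = np.histogram(x, bins)
print(np.array_equal(counts[1:], hist))
```

The custom counts are indexed by the right-hand edge of each interval, which is why `counts[1:]` lines up with the output of `np.histogram`.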
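For example, a simple selection sort repeatedly finds the minimum value from a list, and makes
swaps until the list is sorted. We can code this in just a few lines of Python:
import numpy as np
def selection_sort(x):
    for i in range(len(x)):
        # index of the smallest value in the unsorted tail x[i:]
        swap = i + np.argmin(x[i:])
        (x[i], x[swap]) = (x[swap], x[i])
    return x
x = np.array([2, 1, 4, 3, 5])
selection_sort(x)
Output:
array([1, 2, 3, 4, 5])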
Fast Sorting in NumPy: np.sort and np.argsort
To return a sorted version of the array without modifying the input, you can use np.sort:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)
Output:
array([1, 2, 3, 4, 5])
To sort the array in place, we can use the sort method of arrays:
x.sort()
print(x)
Output:
[1 2 3 4 5]
A related function is argsort, which instead returns the indices of the sorted elements:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)
Output:
[1 0 3 2 4]
The first element of this result gives the index of the smallest element, the second
value gives the index of the second smallest, and so on. These indices can then be
used (via fancy indexing) to construct the sorted array if required:
x[i]
Output:
array([1, 2, 3, 4, 5])
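One practical use of argsort is to reorder a second array by the values of the first. In the sketch below, the names and scores are made-up values used only for illustration.

```python
import numpy as np

scores = np.array([72, 95, 58, 88])                    # hypothetical scores
names = np.array(["Asha", "Bala", "Chitra", "Dev"])    # hypothetical labels

order = np.argsort(scores)      # indices that sort the scores in ascending order
print(order)                    # [2 0 3 1]
print(names[order])             # ['Chitra' 'Asha' 'Dev' 'Bala']
print(names[order[::-1]])       # descending: ['Bala' 'Dev' 'Asha' 'Chitra']
```

Sorting `scores` directly would lose the link to `names`; keeping the index array preserves it.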
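Sorting along rows or columns
A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows
or columns of a multidimensional array using the axis argument. For example:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)
Output:
[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]
# sort each column of X
np.sort(X, axis=0)
Output:
array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])
# sort each row of X
np.sort(X, axis=1)
Output:
array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])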
Partial Sorts: Partitioning
NumPy provides the np.partition function. np.partition takes an array and a number K;
the result is a new array with the smallest K values to the left of the partition, and
the remaining values to the right, in arbitrary order:
x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)
Output:
array([2, 1, 3, 4, 6, 5, 7])
The first three values in the resulting array are the three smallest in the array, and the
remaining array positions contain the remaining values. Within the two partitions, the
elements have arbitrary order.
Similarly to sorting, we can partition along an arbitrary axis of a multidimensional array:
np.partition(X, 2, axis=1)
Output:
array([[3, 4, 6, 7, 6, 9],
       [2, 3, 4, 7, 6, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 9, 5]])
The result is an array where the first two slots in each row contain the smallest values
from that row, with the remaining values filling the remaining slots.
Finally, just as there is a np.argsort that computes indices of the sort, there is a np.argpartition
that computes indices of the partition.
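Because the order inside each partition is unspecified, it is safer to check the partition property than to rely on an exact printed order. A minimal sketch, using a small array chosen for illustration:

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3
part = np.partition(x, k)

# The order inside each side is unspecified, so compare the groups after sorting.
print(np.sort(part[:k]))        # [1 2 3]
print(part[k])                  # 4, the value that sits at index k in sorted order
print(np.sort(part[k + 1:]))    # [5 6 7]

# argpartition follows the same idea but returns indices instead of values.
idx = np.argpartition(x, k)
print(np.array_equal(np.sort(x[idx[:k]]), np.sort(part[:k])))   # True
```

Partitioning is useful when only the K smallest values are needed, since it avoids the work of a full sort.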
Example: k-Nearest Neighbors
We use this argsort function along multiple axes to find the nearest neighbors of each
point in a set. We will start by creating a random set of 10 points on a two-dimensional
plane. Using the standard convention, we will arrange these in a 10×2 array:
X = rand.rand(10, 2)
The scatter plot of the above is:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set()  # Plot styling
plt.scatter(X[:, 0], X[:, 1], s=100);
Output:
[Figure: scatter plot of the 10 points]
Next we compute the matrix of squared distances between each pair of points, using
broadcasting:
dist_sq = np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis=-1)
To check for correctness, we should see that the diagonal of this matrix (i.e., the set
of distances between each point and itself) is all zero:
dist_sq.diagonal()
Output:
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
With the pairwise squared distances computed, we can now use np.argsort to sort along
each row. The leftmost columns will then give the indices of the nearest neighbors:
nearest = np.argsort(dist_sq, axis=1)
print(nearest)
In the printed result, the first column gives the numbers 0 through 9 in order: this is
due to the fact that each point's closest neighbor is itself. If we are only interested in
the nearest k neighbors, all we need is to partition each row so that the smallest k + 1
squared distances come first, with larger distances filling the remaining positions of
the array. We can do this with the np.argpartition function:
K = 2
nearest_partition = np.argpartition(dist_sq, K + 1, axis=1)
[Figure: the points, each joined by lines to its two nearest neighbors]
Each point in the plot has lines drawn to its two nearest neighbors. Some of the points
have more than two lines coming out of them: this is due to the fact that if point A is
one of the two nearest neighbors of point B, this does not necessarily imply that point B
is one of the two nearest neighbors of point A.
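The whole procedure fits in a few lines. The sketch below is self-contained and rests on assumptions made for illustration: 10 random points, K = 2, and an arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(42)     # arbitrary seed
X = rng.random((10, 2))             # 10 hypothetical points in the plane
K = 2

# (10, 1, 2) - (1, 10, 2) broadcasts to (10, 10, 2); summing the squares over
# the last axis gives the (10, 10) matrix of squared distances.
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
dist_sq = np.sum(diff ** 2, axis=-1)

# Partition each row so that the K + 1 smallest distances (the point itself
# plus its K neighbors) come first, then drop the point itself.
nearest_partition = np.argpartition(dist_sq, K + 1, axis=1)[:, :K + 1]
for i, row in enumerate(nearest_partition):
    neighbors = [int(j) for j in row if j != i]
    print(i, neighbors)
```

Squared distances are enough here because taking the square root does not change the ordering of the distances.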
Structured Data: NumPy's Structured Arrays
NumPy's structured arrays and record arrays provide efficient storage for compound,
heterogeneous data. While the patterns shown here are useful for simple operations,
Pandas DataFrames are often used for such tasks as well.
import numpy as np
Consider that we have several categories of data on a number of people (name, age,
and weight), and we want to store these values for use in a Python program. It would
be possible to store these in three separate arrays:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]
There is nothing here that tells us that the three arrays are related; it would be more
natural if we could use a single structure to store all of this data. NumPy can handle
this through structured arrays, which are arrays with compound data types.
We can create a structured array using a compound data type specification:
# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
print(data.dtype)
Output:
[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
Here 'U10' translates to "Unicode string of maximum length 10", 'i4' translates to
"4-byte (32-bit) integer", and 'f8' translates to "8-byte (64-bit) float".
Now that we have created an empty container array, we can fill the array with our lists
of values:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)
Output:
[('Alice', 25, 55.0) ('Bob', 45, 85.5) ('Cathy', 37, 68.0)
 ('Doug', 19, 61.5)]
We can refer to values either by index or by name:
# Get all names
data['name']
Output:
array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')
# Get first row of data
data[0]
Output:
('Alice', 25, 55.0)
# Get the name from the last row
data[-1]['name']
Output:
'Doug'
Boolean masking allows us to do operations such as filtering on age:
# Get names where age is under 30
data[data['age'] < 30]['name']
Output:
array(['Alice', 'Doug'], dtype='<U10')
Pandas provides a DataFrame object, which is a structure built on NumPy arrays
that offers a variety of useful data manipulation functionality.
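The operations above can be tried end to end in one short script. This is a sketch with sample values assumed for illustration; sorting with `order=` is an extra step added here and is not discussed in the notes.

```python
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']   # sample values for illustration
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

young = data[data['age'] < 30]            # Boolean mask built from one field
print(young['name'])                      # ['Alice' 'Doug']

by_age = np.sort(data, order='age')       # sort whole records by the age field
print(by_age['name'])                     # ['Doug' 'Alice' 'Cathy' 'Bob']
print(data['weight'].mean())              # 67.5
```

Each field behaves like an ordinary NumPy array, so masks and aggregates work on it directly while the records stay together.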
Creating Structured Arrays
Structured array data types can be specified in a number of ways:
np.dtype({'names': ('name', 'age', 'weight'),
          'formats': ('U10', 'i4', 'f8')})
Output:
dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])
Numerical types can be specified using Python types or NumPy dtypes:
np.dtype({'names': ('name', 'age', 'weight'),
          'formats': ((np.str_, 10), int, np.float32)})
Output:
dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])
A compound type can also be specified as a list of tuples:
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])
Output:
dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
We can specify the types alone in a comma-separated string:
np.dtype('S10,i4,f8')
Output:
dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])
The first (optional) character is < or >, which means "little endian" or "big endian",
respectively, and specifies the ordering convention for significant bits. The next
character specifies the type of data: characters, bytes, ints, floating points, and so on.
The last character or characters represents the size of the object in bytes.
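The different specification styles describe the same layout, which can be checked directly. A minimal sketch, assuming the 'S10', 'i4', 'f8' field formats:

```python
import numpy as np

dt_dict = np.dtype({'names': ('name', 'age', 'weight'),
                    'formats': ('S10', 'i4', 'f8')})
dt_list = np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])
dt_str = np.dtype('S10,i4,f8')      # fields receive the default names f0, f1, f2

print(dt_dict == dt_list)           # True
print(dt_str.names)                 # ('f0', 'f1', 'f2')
print(dt_list.itemsize)             # 22 bytes per record: 10 + 4 + 8
```

The comma-separated form is the shortest, but it gives up meaningful field names, so the dictionary or list-of-tuples form is usually easier to read later.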
More Advanced Compound Types
We can create a type where each element contains an array or matrix of values. Here, we
will create a data type with a mat component consisting of a 3×3 floating-point
matrix:
tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(1, dtype=tp)
print(X[0])
print(X['mat'][0])
Output:
(0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
Now each element in the X array consists of an id and a 3×3 matrix.
A NumPy dtype maps directly onto a C structure definition, so the buffer containing the
array content can be accessed directly within an appropriately written C program.
1.9.3 Record Arrays: Structured Arrays with a Twist
NumPy provides the np.recarray class, which is almost identical to the structured
arrays but with one additional feature: fields can be accessed as attributes rather
than as dictionary keys.
data['age']
Output:
array([25, 45, 37, 19], dtype=int32)
If we view our data as a record array, we can access this with fewer keystrokes:
data_rec = data.view(np.recarray)
data_rec.age
Output:
array([25, 45, 37, 19], dtype=int32)
For record arrays, there is some extra overhead involved in accessing the fields,
even when using the same syntax:
%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age
Output:
1000000 loops, best of 3: 241 ns per loop
100000 loops, best of 3: 4.61 µs per loop
100000 loops, best of 3: 7.27 µs per loop
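The record-array view can be tried on a small array. The following sketch assumes a two-field layout with made-up values; it also shows that the view shares memory with the original structured array.

```python
import numpy as np

data = np.zeros(3, dtype=[('name', 'U10'), ('age', 'i4')])
data['name'] = ['Alice', 'Bob', 'Cathy']   # sample values for illustration
data['age'] = [25, 45, 37]

data_rec = data.view(np.recarray)    # a view: no data is copied
print(data_rec.age)                  # [25 45 37]

data_rec.age[0] = 26                 # writes through to the original array
print(data['age'][0])                # 26
```

Attribute access is a convenience only. As the timings above suggest, the dictionary-style lookup on the plain structured array is the cheaper option inside tight loops.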
MATERIALS
https://www.youtube.com/watch?v=3eAMvnIxQd0
https://www.youtube.com/watch?v=0MEO9wzSxTE
https://www.youtube.com/watch?v=awXS7_-52fY
https://www.youtube.com/watch?v=8y-o1zWSXR8
https://www.youtube.com/watch?v=0KxB7IMoqQg
10. ASSIGNMENT
Q. No. | Question | CO Level | K Level
11. PART-A
UNIT-2 Q&A
1. What are the categories of basic array manipulation? (CO2,K2)
Attributes of arrays: determining the size, shape, memory consumption, and data types of arrays
Indexing of arrays
Slicing of arrays
Reshaping of arrays
Joining and splitting of arrays: combining multiple arrays into one, and splitting one array into many
The NumPy slicing syntax follows that of the standard Python list. To access a slice of an
array x:
x[start:stop:step]
If any of these are unspecified, they default to the values start=0, stop=size of dimension,
step=1.
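A few slices of a small array show how the defaults behave. This is a minimal sketch, with the array chosen for illustration:

```python
import numpy as np

x = np.arange(10)

print(x[:5])      # first five elements: [0 1 2 3 4]
print(x[5:])      # elements from index 5 on: [5 6 7 8 9]
print(x[4:7])     # a middle sub-array: [4 5 6]
print(x[::2])     # every other element: [0 2 4 6 8]
print(x[1::2])    # every other element, starting at index 1: [1 3 5 7 9]
print(x[::-1])    # all elements, reversed: [9 8 7 6 5 4 3 2 1 0]
```

When the step is negative, the defaults for start and stop are swapped, which is why `x[::-1]` is a convenient way to reverse an array.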
What will be the output for the below code:
array
print :
Output:
What do you mean by ufuncs?
Ufuncs are the universal functions. The vectorized operations in NumPy are implemented
via ufuncs, whose main purpose is to quickly execute repeated operations on values in
NumPy arrays. NumPy's universal functions can be used to vectorize operations and
thereby remove slow Python loops.
a) Rule 1: If the two arrays differ in their number of dimensions, the shape of
the one with fewer dimensions is padded with ones on its leading (left) side.
b) Rule 2: If the shape of the two arrays does not match in any dimension,
the array with shape equal to 1 in that dimension is stretched to match the
other shape.
c) Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an
error is raised.
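data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
print(data.dtype)
Output:
[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]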
While the NumPy array has an implicitly defined integer index used to access the
values, the Pandas Series has an explicitly defined index associated with the values.
This explicit index definition gives the Series object additional capabilities. For
example, the index need not be an integer, but can consist of values of any desired
type. For example, we can use strings as an index.
How can the Series object be modified?
The first sentinel value used by Pandas is None, a Python singleton object that is
often used for missing data in Python code. Because it is a Python object, None
cannot be used in any arbitrary NumPy/Pandas array, but only in arrays with data
type 'object' (i.e., arrays of Python objects).
Multi-indexing is used to represent two-dimensional data within a one-dimensional
Series. We can also use it to represent data of three or more dimensions in a Series
or DataFrame. Each extra level in a multi-index represents an extra dimension of
data.
The method describe() computes several common aggregates for each column and
returns the result. We can use this method on a dataset after dropping rows with
missing values, for example with dropna().describe().
What is split, apply, and combine?
The split step involves breaking up and grouping a DataFrame depending on
the value of the specified key.
The apply step involves computing some function, usually an aggregate,
transformation, or filtering, within the individual groups.
The combine step merges the results of these operations into an output array.
Numexpr evaluates the expression in a way that does not use full-sized temporary
arrays, and can be much more efficient than NumPy, especially for large arrays. The
Pandas eval() and query() tools are conceptually similar, and depend on the Numexpr
package.
12. PART B
QUESTIONS:
UNIT II
I. Explain all the array manipulation functions in NumPy with examples. (CO2, K3)
III. Explain aggregation functions and fancy indexing in NumPy with examples. (CO2, K3)
IV. Explain selection sort and the other sorting methods used in NumPy with examples. (CO2,
K3)
13. PART C
QUESTIONS:
UNIT II
I. How to create an empty and a full NumPy array?
II. Create a NumPy array filled with all zeros
III. Create a NumPy array filled with all ones
IV. Check whether a NumPy array contains a specified row
V. How to remove rows in a NumPy array that contain non-numeric values?
VI. Remove single-dimensional entries from the shape of an array
VII. Find the number of occurrences of a sequence in a NumPy array
VIII. Find the most frequent value in a NumPy array
IX. Combining a one and a two-dimensional NumPy Array
X. How to build an array of all combinations of two NumPy arrays?
XI. How to add a border around a NumPy array?
XII. How to compare two NumPy arrays?
XIII. How to check whether specified values are present in NumPy array?
XIV. How to get all 2D diagonals of a 3D NumPy array?
XV. Flatten a Matrix in Python using NumPy
14 SUPPORTIVE ONLINE CERTIFICATION COURSES
NPTEL: https://onlinecourses.nptel.ac.in/noc21_cs69/preview?
UDEMY: https://www.udemy.com/topic/data-science/
MOOC: https://mooc.es/course/introduction-to-data-science-in-python
Edx: https://learning.edx.org/course/course-v1:Microsoft+DAT208x+2T2016/home
GEEKSFORGEEKS: https://www.geeksforgeeks.org/data-science-fundamentals/
15. Real Time Applications in Day to Day Life and to Industry
NumPy is useful for performing mathematical and logical operations on large, high-dimensional arrays and
matrices, and it supports a wide range of numerical functions efficiently. NumPy simplifies coding, its
documentation is freely available online, and it works together with other libraries to make tasks more
efficient. Here are four real-life areas where Python, the language NumPy is built for, is used:
1. Web Development
Python is a popular choice for web development, with frameworks such as Pyramid,
Django, and Flask. These frameworks include standard libraries that make protocol
integration easy and efficient.
2. Education Sector
Python is also used in the development of online courses and education programs. It is an
easy language to learn for beginners since its syntax is similar to English. It provides a
beginner with a standard library and a variety of resources to get a handle on the language,
making it easier to learn. As a result, Python is a preferred programming language for
beginners in developing education programs at both basic and advanced levels.
3. Game Development
Battlefield 2, a popular video game released in 2005, used Python for much of its game
logic and scripting. Python frameworks commonly used in game development include
Pygame, PyKyra, Pyglet, PyOpenGL, Kivy, Panda3D, and Cocos2D.
4. Software Development
Software developers use Python widely because it simplifies the development of complex
applications. The language is used for project management, as a support language, and
for build control and testing.
NumPy mainly works with numerical data, while Pandas deals primarily with tabular data. With
Pandas, you can work with numeric data and time series in a fast, easy-to-use
environment. Pandas is written in Python, Cython, and C and is built around the NumPy library.
Data can be imported into Pandas from various file formats, including JSON, SQL, Microsoft
Excel, etc. Lastly, Pandas is used for data analysis and visualization, and NumPy is widely used
for numerical calculations.
NumPy stands for Numerical Python, and SciPy stands for Scientific Python; both are essential
Python libraries used to manipulate data in various ways. NumPy provides efficient operations
on arrays of homogeneous data. SciPy is a collection of tools built on NumPy that support
integration, differentiation, gradient optimization, and many other functions. Much
general-purpose numerical computation in Python is done with SciPy.
Python indexing begins at 0 and is performed with brackets, whereas MATLAB indexing begins
at one and is performed with parentheses. NumPy provides efficient operations on arrays of
homogeneous data in Python. Python can thus be used as a high-level language for manipulating
numerical data, similar to IDL, MATLAB, or Yorick. In MATLAB, everything is treated as an array,
whereas everything is a more general object in Python. In MATLAB, strings are arrays of
characters or arrays of strings, whereas in Python, strings are their own type of object, called str.
MATLAB's scripting language was designed for linear algebra, so some array manipulations are
easier in MATLAB than in NumPy.
The math module is part of the Python standard library. It provides basic mathematical
functions (trigonometry, logarithms) as well as some commonly used constants. NumPy, on the
other hand, is a third-party package designed for scientific computation, with its core written
in C. It is the de facto package for numerical and vector operations in Python. Comparing
NumPy with MATLAB, the code for a task can be almost identical while the performance is
very different. In one reported comparison (the task is not specified here), MATLAB took
0.252454 seconds to complete the task, while NumPy took 0.973672151566 seconds, nearly four
times as long.
16.ASSESSMENT SCHEDULE
17.PRESCRIBED TEXT BOOKS AND REFERENCE BOOKS
TEXT BOOKS:
1. Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning
Publications, 2016. (first two chapters for Unit I)
2. Ashwin Pajankar, Aditya Joshi, Hands-on Machine Learning with Python: Implement Neural
Network Solutions with Scikit-learn and PyTorch, Apress, 2022.
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016.
REFERENCES:
1. Roger D. Peng, R Programming for Data Science, Lulu.com, 2016
2. Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques", 3rd Edition,
Morgan Kaufmann, 2012.
3. Samir Madhavan, Mastering Python for Data Science, Packt Publishing, 2015
4. Laura Igual, Santi Seguí, "Introduction to Data Science: A Python Approach to Concepts,
Techniques and Applications", 1st Edition, Springer, 2017
5. Peter Bruce, Andrew Bruce, "Practical Statistics for Data Scientists: 50 Essential Concepts", 3rd
Edition, O'Reilly, 2017
6. Hector Guerrero, “Excel Data Analysis: Modelling and Simulation”, Springer International
Publishing, 2nd Edition, 2019
7. NPTEL Courses:
a. Data Science for Engineers
https://onlinecourses.nptel.ac.in/noc23_cs17/preview
b. Python for Data Science - https://onlinecourses.nptel.ac.in/noc23_cs21/preview
18.MINI PROJECT SUGGESTIONS