
22am901 Data Science Using Python Unit 2

Uploaded by

faizrahman4059
Please read this disclaimer before proceeding:
This document is confidential and intended solely for the educational purpose of
RMK Group of Educational Institutions. If you have received this document
through email in error, please notify the system manager. This document
contains proprietary information and is intended only to the respective group /
learning community as intended. If you are not the addressee you should not
disseminate, distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake and delete
this document from your system. If you are not the intended recipient you are
notified that disclosing, copying, distributing or taking any action in reliance on
the contents of this information is strictly prohibited.

22AM901
DATA SCIENCE USING PYTHON

UNIT II
Department : Artificial Intelligence and
Machine learning

Batch/Year : 2022 - 2026 /II

Created by : Dr C Ambhika

Date : 18.01.2024

1. Table of Contents

S NO  CONTENTS
1     CONTENTS
2     COURSE OBJECTIVES
3     PRE REQUISITES (COURSE NAMES WITH CODE)
4     SYLLABUS (WITH SUBJECT CODE, NAME, LTPC DETAILS)
5     COURSE OUTCOMES
6     CO-PO/PSO MAPPING
7     LECTURE PLAN – UNIT 2
8     ACTIVITY BASED LEARNING – UNIT 2
9     LECTURE NOTES – UNIT 2
10    ASSIGNMENTS 1 – UNIT 2
11    PART A (Q & A) WITH K LEVEL & CO UNIT 2
12    PART B Qs
13    PART C Qs
14    SUPPORTIVE ONLINE CERTIFICATION COURSES UNIT 2
15    REAL TIME APPLICATIONS IN DAY TO DAY LIFE AND TO INDUSTRY
16    ASSESSMENT SCHEDULE
17    PRESCRIBED TEXT BOOKS & REFERENCE BOOKS
18    MINI PROJECT SUGGESTIONS

22AM901 – DATA SCIENCE USING PYTHON

2. COURSE OBJECTIVES

Learn the fundamentals of Data Science.
Acquire skills in data preparation and preprocessing steps.
Learn the tools and packages in Python for Data Science.
Understand the various Excel functions used to solve Data Science problems.
Acquire knowledge of data interpretation and visualization techniques.

3.PREREQUISITE

22AM901 – DATA SCIENCE USING PYTHON

4 .SYLLABUS

22AM901 – DATA SCIENCE USING PYTHON

UNIT I INTRODUCTION

Need for data science – benefits and uses of Data Science and Big Data – facets of data – data
science process – setting the research goal – retrieving data – cleansing, integrating, and
transforming data – exploratory data analysis – build the models – presenting and building
applications

List of Exercise/Experiments:

1. Download, install and explore the features of R/Python for data analytics
• Installing Anaconda
• Basic Operations in Jupyter Notebook
• Basic Data Handling

UNIT II NUMPY FOR DATA SCIENCE

Introduction to NumPy – The Basics of NumPy Arrays – Universal Functions – Aggregation –
Computation on Arrays – Comparisons, Masks and Boolean Logic – Fancy Indexing – Sorting
Arrays – Structured Data: NumPy's Structured Arrays

List of Exercise/Experiments:

1. Creation of a NumPy array using a tuple
2. Determine the size, shape and dimension of the array
3. Manipulation with array attributes
4. Creation of a subarray
5. Perform the reshaping of the array along the row vector and column vector
6. Create two arrays and perform the concatenation among the arrays
7. Perform the statistics operations for the data (the sum, product, median, minimum and
maximum, quantiles, argmin, argmax etc.).
8. Using any data set, compute the mean, standard deviation and percentile.


UNIT III MANIPULATION WITH PANDAS

Data manipulation with Pandas – Data Indexing and Selection – Handling missing data –
Hierarchical indexing – Combining datasets – Aggregation and Grouping – String operations –
Working with time series – High performance Pandas.

List of Exercise/Experiments:

1. Perform the fundamental Pandas data structure operations: the Series, DataFrame,
and Index.
2. Implement the data selection operations
3. Implement the data indexing operations like: loc, iloc, and ix
4. From the given sample data set, perform the operations of handling missing data like
None and NaN.
5. Manipulate null values with the operations isnull(), notnull(), dropna(), fillna()

UNIT IV DATA SCIENCE IN SPREADSHEET

Importing Data into Excel from Different Data Source – Data Cleansing and Preliminary
Data Analysis - Correlations and the importance of Variables Technical requirements -
Implementing Time Series

List of Exercise/Experiments:

1. Explore the basic functions in Excel
2. Perform the task of importing the data into Excel from a data set
3. Do the data processing operations like data cleansing and data preparation


UNIT V DATA VISUALIZATION

Importing Matplotlib – Simple line plots – Simple scatter plots – visualizing errors – density and
contour plots – Histograms – legends – colors – subplots – text and annotation – customization –
three dimensional plotting - Geographic Data with Basemap - Visualization with Seaborn.

List of Exercise/Experiments:

1. Exploring data visualization using Excel
2. Basic plots using Matplotlib.
3. Implementation of a scatter plot.
4. Construction of histogram, bar plot, subplots, line plots.
5. Implement three-dimensional plotting

5.COURSE OUTCOMES

Course Outcome   Statement of the Course Outcome                        Cognitive/Affective Level
CO1              Apply the skillset in data processing.                 K2
CO2              Interpreting the various uses of libraries.            K2
CO3              Understand the real-world data and information.        K2, K3
CO4              Apply data science using Excel & Python.               K2, K3
CO5              Interpret data using visualization tools in Python.    K2, K3

6. CO-PO MAPPING

COs   PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2  PSO3
CO1   3    2    2    -    1    -    -    -    -    -     -     -     -     3     3
CO2   3    3    3    -    3    -    -    -    -    -     -     -     -     3     3
CO3   3    3    3    -    3    -    -    -    -    -     -     -     -     3     3
CO4   3    3    3    -    3    -    -    -    -    -     -     -     -     3     3
CO5   3    3    3    -    3    -    -    -    -    -     -     -     -     3     3
1 – Low, 2 – Medium, 3 – Strong

7. LECTURE PLAN

(The Proposed date and Actual Lecture Date columns are blank in the source.)

S No  Topics                                                     No of periods  Pertaining CO  Taxonomy level  Mode of delivery
1     Introduction to NumPy                                      1              CO2            K1              MD1, MD5
2     Basics of NumPy Arrays                                     1              CO2            K2              MD1, MD5
3     Universal Functions, Aggregations, Computation on Arrays   1              CO2            K2              MD1, MD5
4     Comparisons, Masks                                         1              CO2            K3              MD1, MD5
5     Boolean Logic                                              1              CO2            K3              MD1, MD5
6     Fancy Indexing                                             1              CO2            K3              MD1, MD5
7     Sorting Arrays                                             1              CO2            K3              MD1, MD5
8     Structured Data                                            1              CO2            K3              MD1, MD5
9     NumPy's Structured Array                                   1              CO2            K2              MD1, MD5


ASSESSMENT COMPONENTS
AC 1. Unit Test
AC 2. Assignment
AC 3. Course Seminar
AC 4. Course Quiz
AC 5. Case Study
AC 6. Record Work
AC 7. Lab / Mini Project
AC 8. Lab Model Exam
AC 9. Project Review

MODE OF DELIVERY
MD 1. Oral presentation
MD 2. Tutorial
MD 3. Seminar
MD 4. Hands On
MD 5. Videos
MD 6. Field Visit

8. ACTIVITY BASED LEARNING

Activity name:
Creating and Automating an Interactive Dashboard using Python

Students will gain a better understanding of how the Python libraries and other features
of Python work with any dataset.

The steps involved in this activity are:

1. Downloading daily updated data from the web using selenium
2. Updating data directories using the shutil, glob, and os Python libraries
3. Simple cleaning of Excel files with pandas
4. Formatting time series data frames to be input into plotly graphs
5. Creating a local web page for your dashboard using dash

Guidelines to do an activity:
1) Students can form groups (3 students / team).
2) Take any dataset.
3) Import Python libraries (follow the steps mentioned above).
4) Conduct a peer review (each team will be reviewed by all other teams and mentors).

9.LECTURE NOTES
UNIT 2

NUMPY FOR DATA SCIENCE

1. Introduction to NumPy
1.1 NumPy (short for Numerical Python)
NumPy provides an efficient interface to store and operate on dense data buffers.
NumPy arrays are faster and more compact than Python lists, and NumPy offers a wide
range of fast and efficient ways of creating arrays and manipulating the numerical
data inside them. The advantage in storage and speed grows as the arrays grow larger
in size. NumPy also has functions for working in the domains of linear algebra,
Fourier transforms, and matrices.

1.2 Why use NumPy?

NumPy arrays are stored in one contiguous block of memory, unlike lists, so processes
can access and manipulate them very efficiently. This behavior is called locality of
reference in computer science, and it is the main reason why NumPy is faster than
lists. NumPy is also optimized to work with the latest CPU architectures.

An array consumes less memory and is convenient to use. NumPy uses much less
memory to store data, and it provides a mechanism for specifying the data types, which
allows the code to be optimized even further.

You can import NumPy and check its version as follows:

In[1]: import numpy
       numpy.__version__

Out[1]: '1.20.3'

In[2]: import numpy as np
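
As a rough illustration of the storage point above, the following sketch compares how a Python list and a NumPy array hold the same one thousand integers. The variable names and the choice of int64 are assumptions made for this example, and the byte counts reported for Python objects depend on the interpreter and platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# The array keeps its values in one contiguous buffer of fixed-size items.
print(arr.flags['C_CONTIGUOUS'])   # True
print(arr.nbytes)                  # 8000 (8 bytes per int64 value)

# The list holds pointers; every element is a separate Python object.
list_total = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
print(list_total)                  # platform-dependent, larger than arr.nbytes
```

The list total counts the pointer table plus each integer object, which is why it comes out several times larger than the array buffer.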

I. Understanding Data Types in Python


Effective data-driven science and computation requires understanding how data is
stored and manipulated.

Users of Python are often drawn in by its ease of use, one piece of which is dynamic
typing. While a statically typed language like C or Java requires each variable to be
explicitly declared, a dynamically typed language like Python skips this specification.

For example, in C you might specify a particular operation as follows:

/* C code */
int result = 0;
for (int i = 0; i < 100; i++) {
    result += i;
}

While in Python the equivalent operation could be written this way:

# Python code
result = 0
for i in range(100):
    result += i

Notice the main difference: in C, the data types of each variable are explicitly declared,
while in Python the types are dynamically inferred. This means, for example, that we
can assign any kind of data to any variable:

# Python code
x = 4
x = "four"

Here we’ve switched the contents of x from an integer to a string. The same thing in
C would lead (depending on compiler settings) to a compilation error or other
unintended consequences:

/* C code */
int x = 4;
x = "four";  // FAILS

This sort of flexibility is one piece that makes Python and other dynamically typed
languages convenient and easy to use. Understanding how this works is an important
piece of learning to analyze data efficiently and effectively with Python. But what this
type flexibility also points to is the fact that Python variables are more than just their
value; they also contain extra information about the type of the value.

II. A Python Integer Is More Than Just an Integer

The standard Python implementation is written in C. This means that every Python
object is simply a cleverly disguised C structure, which contains not only its value, but
other information as well. For example, when we define an integer in Python, such as
x = 10000, x is not just a “raw” integer. It’s actually a pointer to a compound C
structure, which contains several values. Looking through the source code, we find
that the integer (long) type definition effectively looks like this (once the C macros
are expanded)

struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};

A single integer in Python 3.4 actually contains four pieces:

• ob_refcnt, a reference count that helps Python silently handle memory allocation and
deallocation

• ob_type, which encodes the type of the variable

• ob_size, which specifies the size of the following data members

• ob_digit, which contains the actual integer value that we expect the Python variable
to represent

A Python integer is a pointer to a position in memory containing all the Python object
information, including the bytes that contain the integer value. This extra information
in the Python integer structure is what allows Python to be coded so freely and
dynamically.
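
The overhead described above can be observed directly. The sketch below assumes the standard CPython interpreter, where sys.getsizeof reports the size of the whole integer object; the exact number is implementation-specific, so only the comparison matters here.

```python
import sys
import numpy as np

x = 10000
print(type(x))            # <class 'int'>
print(sys.getsizeof(x))   # size of the full Python object in bytes (CPython-specific)

# An element of a NumPy int64 array stores only the raw 8-byte value.
a = np.array([10000], dtype=np.int64)
print(a.itemsize)         # 8

# Dynamic typing: the same name can be rebound to an object of another type.
x = "four"
print(type(x))            # <class 'str'>
```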

III A Python List Is More Than Just a List

Let’s consider now what happens when we use a Python data structure that holds
many Python objects. The standard mutable multielement container in Python is the
list. We can create a list of integers as follows:

In[1]: L = list(range(10))
       L

Out[1]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In[2]: type(L[0])

Out[2]: int

Or, similarly, a list of strings:

In[3]: L2 = [str(c) for c in L]
       L2

Out[3]: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In[4]: type(L2[0])

Out[4]: str

Because of Python's dynamic typing, we can even create heterogeneous lists:

In[5]: L3 = [True, "2", 3.0, 4]
       [type(item) for item in L3]

Out[5]: [bool, str, float, int]

But this flexibility comes at a cost: to allow these flexible types, each item in the list
must contain its own type info, reference count, and other information—that is, each
item is a complete Python object.

IV Fixed-Type Arrays in Python

Python offers several different options for storing data in efficient, fixed-type data
buffers. The built-in array module can be used to create dense arrays of a uniform type:

In[6]: import array
       L = list(range(10))
       A = array.array('i', L)
       A

Out[6]: array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here 'i' is a type code indicating the contents are integers. Much more useful, however,
is the ndarray object of the NumPy package. While Python's array object provides
efficient storage of array-based data, NumPy adds to this efficient operations on that
data. We will explore these operations in later sections; here we'll demonstrate several
ways of creating a NumPy array. We'll start with the standard NumPy import, under
the alias np:

In[7]: import numpy as np

V Creating Arrays from Python Lists

First, we can use np.array to create arrays from Python lists:

In[8]: # integer array:
       np.array([1, 4, 2, 5, 3])

Out[8]: array([1, 4, 2, 5, 3])

Remember that unlike Python lists, NumPy is constrained to arrays that all contain
the same type. If types do not match, NumPy will upcast if possible (here, integers
are upcast to floating point):

In[9]: np.array([3.14, 4, 2, 3])

Out[9]: array([ 3.14, 4. , 2. , 3. ])

If we want to explicitly set the data type of the resulting array, we can use the dtype
keyword:

In[10]: np.array([1, 2, 3, 4], dtype='float32')

Out[10]: array([ 1., 2., 3., 4.], dtype=float32)
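
The upcasting and dtype rules above can be checked by inspecting the dtype attribute of each result. This is a small sketch; the variable names are chosen for illustration, and the default integer width is platform-dependent, so only its kind is printed.

```python
import numpy as np

ints = np.array([1, 4, 2, 5, 3])
mixed = np.array([3.14, 4, 2, 3])
forced = np.array([1, 2, 3, 4], dtype='float32')

print(ints.dtype.kind)    # 'i' (signed integer; exact width is platform-dependent)
print(mixed.dtype)        # float64: the integers were upcast to floating point
print(forced.dtype)       # float32: set explicitly with the dtype keyword

# np.result_type reports the dtype NumPy selects when combining two types.
print(np.result_type(np.int64, np.float64))   # float64
```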

Finally, unlike Python lists, NumPy arrays can explicitly be multidimensional; here's
one way of initializing a multidimensional array using a list of lists:

In[11]: # nested lists result in multidimensional arrays
        np.array([range(i, i + 3) for i in [2, 4, 6]])

Out[11]: array([[2, 3, 4],
                [4, 5, 6],
                [6, 7, 8]])

The inner lists are treated as rows of the resulting two-dimensional array.

VI Creating Arrays from Scratch

Especially for larger arrays, it is more efficient to create arrays from scratch using
routines built into NumPy. Here are several examples:

In[12]: # Create a length-10 integer array filled with zeros
        np.zeros(10, dtype=int)

Out[12]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In[13]: # Create a 3x5 floating-point array filled with 1s
        np.ones((3, 5), dtype=float)

Out[13]: array([[ 1., 1., 1., 1., 1.],
                [ 1., 1., 1., 1., 1.],
                [ 1., 1., 1., 1., 1.]])

In[14]: # Create a 3x5 array filled with 3.14
        np.full((3, 5), 3.14)

Out[14]: array([[ 3.14, 3.14, 3.14, 3.14, 3.14],
                [ 3.14, 3.14, 3.14, 3.14, 3.14],
                [ 3.14, 3.14, 3.14, 3.14, 3.14]])

In[15]: # Create an array filled with a linear sequence
        # Starting at 0, ending at 20, stepping by 2
        # (this is similar to the built-in range() function)
        np.arange(0, 20, 2)

Out[15]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])

In[16]: # Create an array of five values evenly spaced between 0 and 1
        np.linspace(0, 1, 5)

Out[16]: array([ 0. , 0.25, 0.5 , 0.75, 1. ])

In[17]: # Create a 3x3 array of uniformly distributed
        # random values between 0 and 1
        np.random.random((3, 3))

Out[17]: array([[ 0.99844933, 0.52183819, 0.22421193],
                [ 0.08007488, 0.45429293, 0.20941444],
                [ 0.14360941, 0.96910973, 0.946117 ]])

In[18]: # Create a 3x3 array of normally distributed random values
        # with mean 0 and standard deviation 1
        np.random.normal(0, 1, (3, 3))

Out[18]: array([[ 1.51772646, 0.39614948, -0.10634696],
                [ 0.25671348, 0.00732722, 0.37783601],
                [ 0.68446945, 0.15926039, -0.70744073]])

In[19]: # Create a 3x3 array of random integers in the interval [0, 10)
        np.random.randint(0, 10, (3, 3))

Out[19]: array([[2, 3, 4],
                [5, 7, 8],
                [0, 5, 0]])

In[20]: # Create a 3x3 identity matrix
        np.eye(3)

Out[20]: array([[ 1., 0., 0.],
                [ 0., 1., 0.],
                [ 0., 0., 1.]])

In[21]: # Create an uninitialized array of three values (float64 by default)
        # The values will be whatever happens to already exist at that
        # memory location
        np.empty(3)

VII NumPy Standard Data Types

NumPy arrays contain values of a single type, so it is important to have detailed
knowledge of those types and their limitations. Because NumPy is built in C, the types
will be familiar to users of C, Fortran, and other related languages. Note that when
constructing an array, you can specify them using a string:

np.zeros(10, dtype='int16')

Or using the associated NumPy object:

np.zeros(10, dtype=np.int16)

The standard NumPy data types are listed below.
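
To make the limitations of fixed-width types concrete, the sketch below prints the storage size of a few common dtypes and the value range of int8. The particular dtypes chosen are an illustrative selection, not the full list of NumPy types.

```python
import numpy as np

# Storage size in bytes for a handful of common dtypes.
for name in ['int8', 'int16', 'int32', 'int64', 'float32', 'float64']:
    dt = np.dtype(name)
    print(name, dt.itemsize, "bytes")

# np.iinfo reports the representable range of an integer type.
info = np.iinfo(np.int8)
print(info.min, info.max)    # -128 127

# Array arithmetic that leaves this range wraps around silently.
small = np.array([127], dtype=np.int8)
wrapped = small + np.int8(1)
print(wrapped)               # [-128]
```

Choosing a narrow dtype saves memory, but the wraparound shown in the last lines is the price: the range has to be large enough for every value the computation can produce.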

2. The Basics of NumPy Arrays

Newer tools such as Pandas are built around the NumPy array. NumPy array
manipulation is used to:

● Access data and subarrays
● Split arrays
● Reshape arrays
● Join arrays

Categories of basic array manipulations

Attributes of arrays
Determining the size, shape, memory consumption, and data types of arrays.

Indexing of arrays
Getting and setting the value of individual array elements.

Slicing of arrays
Getting and setting smaller subarrays within a larger array.

Reshaping of arrays
Changing the shape of a given array.

Joining and splitting of arrays
Combining multiple arrays into one, and splitting one array into many.

2.1 NumPy Array Attributes

We define three random arrays: a one-dimensional, a two-dimensional, and a
three-dimensional array. NumPy's random number generator is used, which we seed
with a set value in order to ensure that the same random arrays are generated each
time the code is run.

import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)          # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))     # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

Each array has the attributes ndim (the number of dimensions), shape (the size of each
dimension), and size (the total number of elements):

print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

Output:

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60

Another useful attribute is the dtype, the data type of the array:

print("dtype:", x3.dtype)

Output:

dtype: int64

Other attributes include itemsize, which lists the size (in bytes) of each array element,
and nbytes, which lists the total size (in bytes) of the array:

print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

Output:

itemsize: 8 bytes
nbytes: 480 bytes

In general, we expect that nbytes is equal to itemsize times size.
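
The relationship between these attributes can be verified directly. The sketch below rebuilds x3 with the same seed so that it runs on its own.

```python
import numpy as np

np.random.seed(0)
x3 = np.random.randint(10, size=(3, 4, 5))

# size is the product of the entries in shape
print(x3.size == np.prod(x3.shape))         # True

# nbytes is the per-element size times the number of elements
print(x3.nbytes == x3.itemsize * x3.size)   # True
```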

2.2 Array Indexing: Accessing Single Elements

In a one-dimensional array, the ith value (counting from zero) can be accessed by
specifying the desired index in square brackets, just as with Python lists.

x1

Output:

array([5, 0, 3, 3, 7, 9])

x1[0]

Output:

5

x1[4]

Output:

7

To index from the end of the array, we can use negative indices.

x1[-1]

Output:

9

x1[-2]

Output:

7

In a multi-dimensional array, items can be accessed using a comma-separated tuple
of indices.

x2

Output:

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

x2[0, 0]

Output:

3

x2[2, 0]

Output:

1

x2[2, -1]

Output:

7

Values can also be modified using any of the above index notation:

x2[0, 0] = 12
x2

Output:

array([[12, 5, 2, 4],
       [ 7, 6, 8, 8],
       [ 1, 6, 7, 7]])

Unlike Python lists, NumPy arrays have a fixed type. That is, if we attempt to insert a
floating-point value into an integer array, the value will be silently truncated.

x1[0] = 3.14159 # this will be truncated!
x1

Output:

array([3, 0, 3, 3, 7, 9])
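
The truncation happens without any warning, so it is worth seeing in isolation. The sketch below uses an array with the same values as x1; the second assignment and the astype conversion are additions made for illustration.

```python
import numpy as np

x = np.array([5, 0, 3, 3, 7, 9])
x[0] = 3.14159          # fractional part is dropped, no warning is raised
x[1] = -2.7             # truncation is toward zero, so this stores -2
print(x)                # [ 3 -2  3  3  7  9]

# Converting to a floating-point dtype first preserves the value.
xf = x.astype(float)
xf[0] = 3.14159
print(xf[0])            # 3.14159
```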

2.3 Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use
them to access subarrays with the slice notation, marked by the colon (:) character.

The NumPy slicing syntax follows that of the standard Python list. To access a slice of
an array x:

x[start:stop:step]

If any of these are unspecified, they default to the values start=0, stop=size of
dimension, step=1. We can access subarrays in one dimension and in multiple
dimensions:

One-dimensional subarrays

x = np.arange(10)
x

Output:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[:5] # first five elements

Output:

array([0, 1, 2, 3, 4])

x[5:] # elements from index 5 onward

Output:

array([5, 6, 7, 8, 9])

x[4:7] # middle subarray

Output:

array([4, 5, 6])

x[::2] # every other element

Output:

array([0, 2, 4, 6, 8])

x[1::2] # every other element, starting at index 1

Output:

array([1, 3, 5, 7, 9])

A potentially confusing case is when the step value is negative. In this case, the
defaults for start and stop are swapped. This becomes a convenient way to reverse an
array:

x[::-1] # all elements, reversed

Output:

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

x[5::-2] # reversed every other from index 5

Output:

array([5, 3, 1])
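
The swapped defaults for a negative step are easier to see when each case is printed next to its explicit form. This is a small sketch; the third example and the use of the built-in slice object are additions made for illustration.

```python
import numpy as np

x = np.arange(10)

# With a negative step, an omitted start defaults to the last element
# and an omitted stop runs through the first element.
print(x[::-1])      # [9 8 7 6 5 4 3 2 1 0]
print(x[5::-2])     # [5 3 1]
print(x[:5:-2])     # [9 7]  (stops before index 5)

# The bracket syntax is shorthand for a slice object.
s = slice(None, None, -1)
print(np.array_equal(x[s], x[::-1]))   # True
```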

Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by
commas. For example:

x2

Output:

array([[12, 5, 2, 4],
       [ 7, 6, 8, 8],
       [ 1, 6, 7, 7]])

x2[:2, :3] # two rows, three columns

Output:

array([[12, 5, 2],
       [ 7, 6, 8]])

x2[:3, ::2] # all rows, every other column

Output:

array([[12, 2],
       [ 7, 8],
       [ 1, 7]])

Finally, subarray dimensions can even be reversed together:

x2[::-1, ::-1]

Output:

array([[ 7, 7, 6, 1],
       [ 8, 8, 6, 7],
       [ 4, 2, 5, 12]])

Accessing array rows and columns

Single rows or columns of an array can be accessed by combining indexing and slicing,
using an empty slice marked by a single colon (:):

print(x2[:, 0]) # first column of x2

Output:

[12 7 1]

print(x2[0, :]) # first row of x2

Output:

[12 5 2 4]

In the case of row access, the empty slice can be omitted for a more compact syntax:

print(x2[0]) # equivalent to x2[0, :]

Output:

[12 5 2 4]

Subarrays as no-copy views

Array slices return views rather than copies of the array data. This is one area in
which NumPy array slicing differs from Python list slicing: in lists, slices will be
copies. Consider the two-dimensional array from before:

print(x2)

Output:

[[12 5 2 4]
 [ 7 6 8 8]
 [ 1 6 7 7]]

Extract a 2×2 subarray from this:

x2_sub = x2[:2, :2]
print(x2_sub)

Output:

[[12 5]
 [ 7 6]]

Now if we modify this subarray, we'll see that the original array is changed:

x2_sub[0, 0] = 99
print(x2_sub)

Output:

[[99 5]
 [ 7 6]]

print(x2)

Output:

[[99 5 2 4]
 [ 7 6 8 8]
 [ 1 6 7 7]]

This default behavior is useful: when we work with large datasets, we can access and
process pieces of these datasets without the need to copy the underlying data buffer.

Creating copies of arrays

It is sometimes useful to instead explicitly copy the data within an array or a subarray.
This can be most easily done with the copy() method:

x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

Output:

[[99 5]
 [ 7 6]]

If we now modify this subarray, the original array is not altered:

x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

Output:

[[42 5]
 [ 7 6]]

print(x2)

Output:

[[99 5 2 4]
 [ 7 6 8 8]
 [ 1 6 7 7]]
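
Whether two arrays refer to the same buffer can be tested rather than inferred from side effects. The sketch below uses np.shares_memory and the base attribute; the array values match the x2 example, and the names view and copy are chosen for illustration.

```python
import numpy as np

x2 = np.array([[12, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

view = x2[:2, :2]          # slice: a view on x2's buffer
copy = x2[:2, :2].copy()   # explicit copy: owns a separate buffer

print(np.shares_memory(x2, view))   # True
print(np.shares_memory(x2, copy))   # False
print(view.base is x2)              # True: the view records its parent array

view[0, 0] = 99    # visible through x2
copy[0, 0] = 42    # not visible through x2
print(x2[0, 0])    # 99
```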

MATERIALS:

https://www.youtube.com/watch?v=QUT1VHiLmmI

https://www.youtube.com/watch?v=ZGsLUC49Jns

https://www.youtube.com/watch?v=4-epfRgaiq4

2.4 RESHAPING OF ARRAYS

Another useful type of operation is reshaping of arrays. This can be done using the
reshape method. For example, if we want to put the numbers 1 through 9 in a 3×3
grid, we can do the following:

grid = np.arange(1, 10).reshape((3, 3))
print(grid)

Output:

[[1 2 3]
 [4 5 6]
 [7 8 9]]

For this to work, the size of the initial array must match the size of the reshaped array.
Where possible, the reshape method will use a no-copy view of the initial array, but
with non-contiguous memory buffers this is not always the case.
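
Both conditions, matching sizes and the view-or-copy behavior, can be demonstrated in a few lines. This is a sketch under the assumption that a transposed array serves as the non-contiguous example; np.shares_memory is used to tell a view from a copy.

```python
import numpy as np

a = np.arange(1, 10)
grid = a.reshape((3, 3))
print(np.shares_memory(a, grid))     # True: contiguous input, so reshape gives a view

# A mismatched size raises ValueError (9 elements cannot fill a 2 x 5 grid).
try:
    a.reshape((2, 5))
except ValueError as err:
    print("reshape failed:", err)

# Flattening a transposed (non-contiguous) array forces a copy.
flat = grid.T.reshape(9)
print(flat)                          # [1 4 7 2 5 8 3 6 9]
print(np.shares_memory(grid, flat))  # False
```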

Another common reshaping pattern is the conversion of a one-dimensional array into
a two-dimensional row or column matrix. This can be done with the reshape method,
or by making use of the newaxis keyword within a slice operation:

x = np.array([1, 2, 3])

# row vector via reshape
x.reshape((1, 3))

Output:

array([[1, 2, 3]])

# row vector via newaxis
x[np.newaxis, :]

Output:

array([[1, 2, 3]])

# column vector via reshape
x.reshape((3, 1))

Output:

array([[1],
       [2],
       [3]])

# column vector via newaxis
x[:, np.newaxis]

Output:

array([[1],
       [2],
       [3]])

2.5 Array Concatenation and Splitting

It is also possible to combine multiple arrays into one and to conversely split a single
array into multiple arrays.

Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is done using the routines
np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays
as its first argument:

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

Output:

array([1, 2, 3, 3, 2, 1])

We can also concatenate more than two arrays at once:

z = [99, 99, 99]
print(np.concatenate([x, y, z]))

Output:

[ 1 2 3 3 2 1 99 99 99]

It can also be used for two-dimensional arrays:

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# concatenate along the first axis
np.concatenate([grid, grid])

Output:

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)

Output:

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

For arrays of mixed dimensions, we can use the np.vstack (vertical stack) and np.hstack
(horizontal stack) functions:

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

Output:

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

Output:

array([[ 9, 8, 7, 99],
       [ 6, 5, 4, 99]])

Similarly, np.dstack will stack arrays along the third axis.
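
A quick way to keep the stacking functions apart is to look at the shape each one produces. The sketch below reuses the arrays from the examples above (col stands in for y) and adds one failing call to show the shape requirement of np.concatenate.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
col = np.array([[99],
                [99]])

print(np.vstack([x, grid]).shape)     # (3, 3): x is added as an extra row
print(np.hstack([grid, col]).shape)   # (2, 4): col is added as an extra column
print(np.dstack([grid, grid]).shape)  # (2, 3, 2): stacked along a new third axis

# np.concatenate needs matching sizes on every axis except the one being joined.
try:
    np.concatenate([grid, col])       # axis 0: column counts 3 and 1 differ
except ValueError as err:
    print("concatenate failed:", err)
```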

Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions
np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving
the split points:

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

Output:

[1 2 3] [99 99] [3 2 1]

Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and
np.vsplit are similar:

grid = np.arange(16).reshape((4, 4))
grid

Output:

array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11],
       [12, 13, 14, 15]])

upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

Output:

[[0 1 2 3]
 [4 5 6 7]]
[[ 8 9 10 11]
 [12 13 14 15]]

left, right = np.hsplit(grid, [2])
print(left)
print(right)

Output:

[[ 0 1]
 [ 4 5]
 [ 8 9]
 [12 13]]
[[ 2 3]
 [ 6 7]
 [10 11]
 [14 15]]

Similarly, np.dsplit will split arrays along the third axis.
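
The sketch below checks the N split points to N + 1 pieces rule, shows that splitting and concatenating are inverse operations, and gives a small np.dsplit case. The integer form of np.split and the cube array are additions made for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])

# Two split points give three pieces; joining them restores the input.
parts = np.split(x, [3, 5])
print(len(parts))                 # 3
print(np.concatenate(parts))      # [ 1  2  3 99 99  3  2  1]

# An integer instead of a list asks for that many equal-sized pieces.
halves = np.split(x, 2)
print(halves[0], halves[1])       # [ 1  2  3 99] [99  3  2  1]

# np.dsplit acts on the third axis of an array with at least three dimensions.
cube = np.arange(16).reshape((2, 2, 4))
front, back = np.dsplit(cube, [2])
print(front.shape, back.shape)    # (2, 2, 2) (2, 2, 2)
```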

MATERIALS

https://www.youtube.com/watch?v=3osJ59xXAGo

https://www.youtube.com/watch?v=KehyltXMrZE

3. COMPUTATION ON NUMPY ARRAYS: UNIVERSAL FUNCTIONS

NumPy provides an easy and flexible interface to optimized computation with arrays of
data. Computation on NumPy arrays can be very fast, or it can be very slow. The key to
making it fast is to use vectorized operations, generally implemented through NumPy's
universal functions (ufuncs). NumPy's ufuncs can be used to make repeated calculations
on array elements much more efficient.

i) The Slowness of Loops

Python's default implementation (known as CPython) does some operations very slowly.
This is in part due to the dynamic, interpreted nature of the language: because types
are flexible, sequences of operations cannot be compiled down to efficient machine
code as in languages like C and Fortran.

Recently there have been various attempts to address this weakness: well-known examples
are the PyPy project, a just-in-time compiled implementation of Python; the Cython
project, which converts Python code to compilable C code; and the Numba project, which
converts snippets of Python code to fast LLVM bytecode. Each of these has its strengths
and weaknesses, but none of the three approaches has yet surpassed the reach and
popularity of the standard CPython engine.

The slowness is most apparent when many small operations are repeated, for instance
when looping over an array to operate on each element. For example, imagine we have an
array of values and we'd like to compute the reciprocal of each:

import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

Output:

array([ 0.16666667, 1. , 0.25 , 0.25 , 0.125])

If we measure the execution time of this code for a large input, we see that this operation
is very slow:

big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

Output:

1 loop, best of 3: 2.91 s per loop

It takes several seconds to compute these million operations and to store the result. The
bottleneck is not the arithmetic itself but the type checking and function dispatch that
CPython must do at each cycle of the loop. Each time the reciprocal is computed, Python
first examines the object's type and does a dynamic lookup of the correct function to use
for that type. If we were working in compiled code instead, this type specification would
be known before the code executes and the result could be computed much more efficiently.

ii) Introducing UFuncs

For many types of operations, NumPy provides a convenient interface into statically
typed, compiled routines. This is known as a vectorized operation. It can be
accomplished by simply performing an operation on the array, which will then be applied
to each element. This vectorized approach is designed to push the loop into the compiled
layer that underlies NumPy, leading to much faster execution.

Compare the results of the following two:

print(compute_reciprocals(values))
print(1.0 / values)

Output:

[0.16666667 1. 0.25 0.25 0.125 ]
[0.16666667 1. 0.25 0.25 0.125 ]

The execution time for the big array shows that it completes orders of magnitude faster
than the Python loop:

%timeit (1.0 / big_array)

Output:

100 loops, best of 3: 4.6 ms per loop

Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to
quickly execute repeated operations on values in NumPy arrays.
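
The %timeit command only exists inside IPython or Jupyter. As a sketch for a plain Python script, the comparison can be repeated with the standard timeit module; the array size of 100,000 and the repeat count of 3 are arbitrary choices for this example, and the measured times will differ from machine to machine.

```python
import timeit
import numpy as np

def compute_reciprocals(values):
    # Element-by-element loop, as in the example above.
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.RandomState(0)
big_array = rng.randint(1, 100, size=100_000)

# Both approaches produce the same numbers.
loop_result = compute_reciprocals(big_array)
vec_result = 1.0 / big_array
print(np.allclose(loop_result, vec_result))   # True

# Total time for three runs of each version.
loop_time = timeit.timeit(lambda: compute_reciprocals(big_array), number=3)
vec_time = timeit.timeit(lambda: 1.0 / big_array, number=3)
print(f"loop: {loop_time:.4f} s, vectorized: {vec_time:.4f} s")
```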

Ufuncs are extremely flexible. Besides operating between a scalar and an array, we can
also operate between two arrays:

np.arange(5) / np.arange(1, 6)

Output:

array([ 0. , 0.5 , 0.66666667, 0.75, 0.8 ])

Ufunc operations are not limited to one-dimensional arrays; they can act on
multi-dimensional arrays as well:

x = np.arange(9).reshape((3, 3))
2 ** x

Output:

array([[ 1, 2, 4],
       [ 8, 16, 32],
       [ 64, 128, 256]])

Computations using vectorization through ufuncs are more efficient than their counterparts
implemented using Python loops, especially as the arrays grow in size. Any time we see
such a loop in a Python script, we should consider whether it can be replaced with a
vectorized expression.

3.3 Exploring NumPy's UFuncs

Ufuncs exist in two types: unary ufuncs, which operate on a single input, and binary
ufuncs, which operate on two inputs.
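
The number of inputs a ufunc takes is recorded in its nin attribute, which gives a simple way to tell the two types apart. The sketch below uses np.negative and np.add as representative examples.

```python
import numpy as np

print(type(np.negative))              # <class 'numpy.ufunc'>
print(np.negative.nin, np.add.nin)    # 1 2

x = np.arange(4)
print(np.negative(x))    # unary: one input array   -> [ 0 -1 -2 -3]
print(np.add(x, x))      # binary: two input arrays -> [0 2 4 6]
```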

Array arithmetic

NumPy's ufuncs make use of Python's native arithmetic operators. The standard addition,
subtraction, multiplication, and division can all be used:

x = np.arange(4)

print("x =", x)

print("x + 5 =", x + 5)

print("x - 5 =", x - 5)

print("x * 2 =", x * 2)

print("x / 2 =", x / 2)

print("x // 2 =", x // 2) # floor division

Output:

x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [ 0. 0.5 1. 1.5]
x // 2 = [0 0 1 1]

There is also a unary ufunc for negation, a ** operator for exponentiation and a % operator
for modulus:

print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)

Output:

-x = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2 = [0 1 0 1]

These operations can also be strung together, and the standard order of operations is
respected:

-(0.5*x + 1) ** 2

Output:

array([-1. , -2.25, -4. , -6.25])

Each of these arithmetic operations is a convenient wrapper around a specific function built
into NumPy. For example, the + operator is a wrapper for the add function:

np.add(x, 2)

Output:

array([2, 3, 4, 5])

The following table lists the arithmetic operators implemented in NumPy
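One way to check the correspondence between operators and ufuncs is to evaluate both forms and compare them. The sketch below does this for the standard arithmetic operators; the list name pairs and the sample array are illustrative choices, not part of the original notes.

```python
import numpy as np

x = np.arange(1, 5)  # [1 2 3 4]

# (operator expression, equivalent ufunc call, description)
pairs = [
    (x + 2,  np.add(x, 2),          "addition"),
    (x - 2,  np.subtract(x, 2),     "subtraction"),
    (-x,     np.negative(x),        "unary negation"),
    (x * 2,  np.multiply(x, 2),     "multiplication"),
    (x / 2,  np.divide(x, 2),       "division"),
    (x // 2, np.floor_divide(x, 2), "floor division"),
    (x ** 2, np.power(x, 2),        "exponentiation"),
    (x % 2,  np.mod(x, 2),          "modulus"),
]

# Each line prints the operation name and whether both forms agree.
for via_operator, via_ufunc, name in pairs:
    print(name, np.array_equal(via_operator, via_ufunc))
```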

Absolute value

Just as NumPy understands Python's built-in arithmetic operators, it also understands
Python's built-in absolute value function:

x = np.array([-2, -1, 0, 1, 2])
abs(x)

Output:
array([2, 1, 0, 1, 2])

The corresponding NumPy ufunc is np.absolute, which is also available under the alias np.abs:

np.absolute(x)

Output:
array([2, 1, 0, 1, 2])

np.abs(x)

Output:
array([2, 1, 0, 1, 2])

This ufunc can also handle complex data, in which case the absolute value returns the magnitude:

x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)

Output:
array([ 5., 5., 2., 1.])

Trigonometric functions
NumPy provides a large number of useful ufuncs, and some of the most useful for the data
scientist are the trigonometric functions. We start by defining an array of angles:

theta = np.linspace(0, np.pi, 3)

Now we can compute some trigonometric functions on these values:

print("theta = ", theta)


print("sin(theta) = ", np.sin(theta))

print("cos(theta) = ", np.cos(theta))

print("tan(theta) = ", np.tan(theta))

Output:

theta = [ 0. 1.57079633 3.14159265]
sin(theta) = [ 0.00000000e+00 1.00000000e+00 1.22464680e-16]
cos(theta) = [ 1.00000000e+00 6.12323400e-17 -1.00000000e+00]
tan(theta) = [ 0.00000000e+00 1.63312394e+16 -1.22464680e-16]

The values are computed to within machine precision, which is why values that should be
zero do not always hit exactly zero. Inverse trigonometric functions are also available:

x = [-1, 0, 1]
print("x = ", x)

print("arcsin(x) = ", np.arcsin(x))

print("arccos(x) = ", np.arccos(x))

print("arctan(x) = ", np.arctan(x))

Output:

x = [-1, 0, 1]
arcsin(x) = [-1.57079633 0. 1.57079633]
arccos(x) = [ 3.14159265 1.57079633 0. ]
arctan(x) = [-0.78539816 0. 0.78539816]

Exponents and logarithms

Another common type of operation available as a NumPy ufunc is the exponential:

x = [1, 2, 3]
print("x =", x)
print("e^x =", np.exp(x))
print("2^x =", np.exp2(x))
print("3^x =", np.power(3, x))

Output:

x = [1, 2, 3]
e^x = [ 2.71828183 7.3890561 20.08553692]
2^x = [ 2. 4. 8.]
3^x = [ 3 9 27]

The inverses of the exponentials, the logarithms, are also available. The basic np.log gives
the natural logarithm; the base-2 and base-10 logarithms are available as well:

x = [1, 2, 4, 10]
print("x =", x)
print("ln(x) =", np.log(x))
print("log2(x) =", np.log2(x))
print("log10(x) =", np.log10(x))

Output:

x = [1, 2, 4, 10]
ln(x) = [ 0. 0.69314718 1.38629436 2.30258509]
log2(x) = [ 0. 1. 2. 3.32192809]
log10(x) = [ 0. 0.30103 0.60205999 1. ]

There are some specialized versions that are useful for maintaining precision with very small
input:

x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))

Output:

exp(x) - 1 = [ 0. 0.0010005 0.01005017 0.10517092]
log(1 + x) = [ 0. 0.0009995 0.00995033 0.09531018]

When x is very small, these functions give more precise values than if the raw np.log or
np.exp were to be used.
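The precision gain is easiest to see with an input much smaller than those shown here. The following sketch assumes x = 1e-10 (an illustrative value) and compares np.exp(x) - 1 with np.expm1(x) against the series approximation x + x**2 / 2, which is accurate for such a small x.

```python
import numpy as np

x = 1e-10

naive = np.exp(x) - 1     # subtracting 1 cancels most of the significant digits
stable = np.expm1(x)      # computes exp(x) - 1 without that cancellation

# For tiny x, exp(x) - 1 is approximately x + x**2 / 2.
reference = x + x**2 / 2

print("naive :", naive)
print("expm1 :", stable)
print("naive error:", abs(naive - reference))
print("expm1 error:", abs(stable - reference))
```

The same reasoning applies to np.log1p(x) versus np.log(1 + x).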

Specialized ufuncs

NumPy has many more ufuncs available, including hyperbolic trig functions, bitwise arithmetic,
comparison operators, conversions from radians to degrees, rounding and remainders.

Another source of more specialized and obscure ufuncs is the submodule scipy.special. If we
want to compute some obscure mathematical function on our data, chances are it is
implemented in scipy.special.

from scipy import special

# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x) =", special.gamma(x))
print("ln|gamma(x)| =", special.gammaln(x))
print("beta(x, 2) =", special.beta(x, 2))

Output:

gamma(x) = [ 1.00000000e+00 2.40000000e+01 3.62880000e+05]
ln|gamma(x)| = [ 0. 3.17805383 12.80182748]
beta(x, 2) = [ 0.5 0.03333333 0.00909091]

# Error function (integral of Gaussian),
# its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x) =", special.erf(x))
print("erfc(x) =", special.erfc(x))
print("erfinv(x) =", special.erfinv(x))

Output:

erf(x) = [ 0. 0.32862676 0.67780119 0.84270079]

erfc(x) = [ 1. 0.67137324 0.32219881 0.15729921]

erfinv(x) = [ 0. 0.27246271 0.73286908 inf]

MATERIALS

https://www.youtube.com/watch?v=VuaQKtygva4

https://www.youtube.com/watch?v=kOn2lCrd37w

https://www.youtube.com/watch?v=shi56WRsiM8

3.4 Advanced Ufunc Features

Specifying output

For large calculations, it is useful to be able to specify the array where the result of the
calculation will be stored. Rather than creating a temporary array, this can be used to write
computation results directly to the memory location where we want them to be. For all
ufuncs, this can be done using the out argument of the function:

x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)
print(y)

Output:

[ 0. 10. 20. 30. 40.]

This can even be used with array views. For example, we can write the results of a
computation to every other element of a specified array:

y = np.zeros(10)
np.power(2, x, out=y[::2])
print(y)

Output:

[ 1. 0. 2. 0. 4. 0. 8. 0. 16. 0.]

If we had instead written y[::2] = 2 ** x, this would have resulted in the creation of a
temporary array to hold the results of 2 ** x, followed by a second operation copying those
values into the y array.

Aggregates

For binary ufuncs, there are aggregates that can be computed directly from the object. For
example, if we'd like to reduce an array with a particular operation, we can use the reduce
method of any ufunc. A reduce repeatedly applies a given operation to the elements of an
array until only a single result remains.

For example, calling reduce on the add ufunc returns the sum of all elements in the array:

x = np.arange(1, 6)
np.add.reduce(x)

Output:

15

Similarly, calling reduce on the multiply ufunc results in the product of all array elements:

np.multiply.reduce(x)

Output:

120

To store all the intermediate results of the computation, we can instead use accumulate:

np.add.accumulate(x)

Output:
array([ 1, 3, 6, 10, 15])

np.multiply.accumulate(x)

Output:
array([ 1, 2, 6, 24, 120])
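For these particular cases NumPy also offers dedicated functions that give the same results. The short sketch below places each ufunc method next to its dedicated counterpart (np.sum, np.prod, np.cumsum, np.cumprod) so the equivalence can be checked directly.

```python
import numpy as np

x = np.arange(1, 6)  # [1 2 3 4 5]

print(np.add.reduce(x), np.sum(x))                 # both give the total
print(np.multiply.reduce(x), np.prod(x))           # both give the product
print(np.add.accumulate(x), np.cumsum(x))          # running totals
print(np.multiply.accumulate(x), np.cumprod(x))    # running products
```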

Outer products

Finally, any ufunc can compute the output of all pairs of two different inputs using the outer
method. This allows us, in one line, to do things like create a multiplication table:

x = np.arange(1, 6)

np.multiply.outer(x,x)

Output:

array([[ 1, 2, 3, 4, 5],

[ 2, 4, 6, 8, 10],

[ 3, 6, 9, 12, 15],

[ 4, 8, 12, 16, 20],

[ 5, 10, 15, 20, 25]])
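The outer method is closely related to broadcasting. As a small sketch, the same multiplication table can be produced by multiplying a column vector by a row vector; the variable names table_outer and table_broadcast are illustrative.

```python
import numpy as np

x = np.arange(1, 6)

table_outer = np.multiply.outer(x, x)
table_broadcast = x[:, np.newaxis] * x   # column vector times row vector

print(np.array_equal(table_outer, table_broadcast))
print(np.add.outer(x, x)[0])             # first row of the corresponding addition table
```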

MATERIALS

https://www.youtube.com/watch?v=PP7NfO5cd-I

https://www.youtube.com/results?search_query=Aggregations+IN+NUMPY+NPTEL

https://www.youtube.com/watch?v=orQuiFokFPM

Aggregations

While working with a large amount of data, a first step is to compute summary statistics for
the data. The most common summary statistics are the mean and standard deviation, which
allow us to summarize the "typical" values in a dataset, but other aggregates are also useful,
such as the sum, product, median, minimum and maximum, and quantiles.

NumPy has fast built-in aggregation functions for working on arrays.

4.1 Summing the Values in an Array

Python itself can compute the sum of all values in an array using the built-in sum function:

import numpy as np
L = np.random.random(100)
sum(L)

Output:

55.61209116604941

The syntax is similar to that of NumPy's sum function, and the result is the same in the
simplest case:

np.sum(L)

Output:

55.612091166049424

However, because it executes the operation in compiled code, NumPy's version of the
operation is computed much more quickly:

big_array = np.random.rand(1000000)
%timeit sum(big_array)
%timeit np.sum(big_array)

Output:

10 loops, best of 3: 104 ms per loop
1000 loops, best of 3: 442 µs per loop

4.2 Minimum and Maximum

Python has built-in min and max functions, used to find the minimum value and maximum
value of any given array:

min(big_array), max(big_array)

Output:

(1.1717128136634614e-06, 0.9999976784968716)

NumPy's corresponding functions have similar syntax and operate more quickly:

np.min(big_array), np.max(big_array)

Output:

(1.1717128136634614e-06, 0.9999976784968716)

%timeit min(big_array)
%timeit np.min(big_array)

Output:

10 loops, best of 3: 82.3 ms per loop
1000 loops, best of 3: 497 µs per loop

For min, max, sum, and several other NumPy aggregates, a shorter syntax is to use methods
of the array object itself:

print(big_array.min(), big_array.max(), big_array.sum())

Output:

1.17171281366e-06 0.999997678497 499911.628197

4.3 Multidimensional aggregates

One common type of aggregation operation is an aggregate along a row or column. Say we
have some data stored in a two-dimensional array:

M = np.random.random((3, 4))
print(M)


Output:

[[ 0.8967576 0.03783739 0.75952519 0.06682827]

[ 0.8354065 0.99196818 0.19544769 0.43447084]

[ 0.66859307 0.15038721 0.37911423 0.6687194 ]]


By default, each NumPy aggregation function will return the aggregate over the entire
array:

M.sum()

Output:

6.0850555667307118

Aggregation functions take an additional argument specifying the axis along which the aggregate
is computed. For example, we can find the minimum value within each column by specifying
axis=0:

M.min(axis=0)

Output:

array([ 0.66859307, 0.03783739, 0.19544769, 0.06682827])

The function returns four values, corresponding to the four columns of numbers. Similarly, we can find the
maximum value within each row:

M.max(axis=1)

Output:

array([ 0.8967576 , 0.99196818, 0.6687194 ])

The axis keyword specifies the dimension of the array that will be collapsed, rather than the dimension that will
be returned. So specifying axis=0 means that the first axis will be collapsed: for two-dimensional arrays, this
means that values within each column will be aggregated.
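A quick way to internalize the axis convention is to look at the shapes before and after aggregation. The sketch below uses a small deterministic array (an illustrative stand-in for the random M above) so the results are easy to verify by hand.

```python
import numpy as np

M = np.arange(12).reshape((3, 4))   # 3 rows, 4 columns

col_sums = M.sum(axis=0)   # collapses the 3 rows    -> one value per column
row_sums = M.sum(axis=1)   # collapses the 4 columns -> one value per row

print(M.shape, col_sums.shape, row_sums.shape)
print(col_sums)
print(row_sums)
```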

4.4 Other aggregation functions

Most aggregates have a NaN-safe counterpart that computes the result while ignoring missing values.
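As a brief illustration of the NaN-safe variants, the sketch below uses a made-up array with one missing value, marked by np.nan, and compares a plain aggregate with its NaN-safe counterparts.

```python
import numpy as np

data = np.array([1.0, 2.0, np.nan, 4.0])

print(np.sum(data))       # nan: a single missing value propagates to the result
print(np.nansum(data))    # 7.0
print(np.nanmean(data))   # mean of the three valid values
print(np.nanmax(data))    # 4.0
```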

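The following table provides a list of useful aggregation functions available in NumPy:

Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true
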
Example: What is the Average Height of US Presidents?

Aggregates available in NumPy can be useful for summarizing a set of values. As a simple
example, let's consider the heights of all US presidents. This data is available in the file
president_heights.csv, which is a simple comma-separated list of labels and values:

!head -4 data/president_heights.csv

Output:

order,name,height(cm)
1,George Washington,189
2,John Adams,170
3,Thomas Jefferson,189

We use the Pandas package to read the file and extract this information:

import pandas as pd
data = pd.read_csv('data/president_heights.csv')
heights = np.array(data['height(cm)'])
print(heights)

Output:

[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173

174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183

177 185 188 188 182 185]

Now that we have this data array, we can compute a variety of summary statistics:

print("Mean height: ", heights.mean())
print("Standard deviation:", heights.std())
print("Minimum height: ", heights.min())
print("Maximum height: ", heights.max())

Output:

Mean height: 179.738095238

Standard deviation: 6.93184344275

Minimum height: 163

Maximum height: 193

In each case, the aggregation operation reduces the entire array to a single summarizing
value, which gives us information about the distribution of values. We can also compute
quantiles:

print("25th percentile: ", np.percentile(heights, 25))
print("Median: ", np.median(heights))
print("75th percentile: ", np.percentile(heights, 75))

Output:

25th percentile: 174.25

Median: 182.0

75th percentile: 183.0

We see that the median height of US presidents is 182 cm. Sometimes it is more useful to see
a visual representation of this data, which we can accomplish using tools in Matplotlib.
For example, this code generates the following chart:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # set plot style

plt.hist(heights)
plt.title('Height Distribution of US Presidents')
plt.xlabel('height (cm)')
plt.ylabel('number');

Output:

[Figure: histogram of the heights of US presidents]

Computation on Arrays: Broadcasting

NumPy's universal functions can be used to vectorize operations and thereby remove slow
Python loops. Another means of vectorizing operations is to use NumPy's broadcasting
functionality. Broadcasting is simply a set of rules for applying binary ufuncs (e.g., addition,
subtraction, multiplication, etc.) on arrays of different sizes.

MATERIALS

https://www.youtube.com/watch?v=oG1t3qlzq14

https://www.youtube.com/watch?v=tuKHsfAehz4

https://www.youtube.com/watch?v=0u9OzBSRZec

Introducing Broadcasting

For arrays of the same size, binary operations are performed on an element-by-element
basis:

import numpy as np
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])
a + b

Output:

array([5, 6, 7])

Broadcasting allows these types of binary operations to be performed on arrays of different
sizes. For example, we can add a scalar to an array:

a + 5

Output:

array([5, 6, 7])

We can think of this as an operation that stretches or duplicates the value 5 into the array
[5, 5, 5] and adds the results. The advantage of NumPy's broadcasting is that this
duplication of values does not actually take place, but it is a useful mental model as we
think about broadcasting.

We can similarly extend this to arrays of higher dimension. Consider adding a
one-dimensional array to a two-dimensional array:

M = np.ones((3, 3))
M

Output:

array([[ 1., 1., 1.],

[ 1., 1., 1.],

[ 1., 1., 1.]])

M + a

Output:

array([[ 1., 2., 3.],

[ 1., 2., 3.],

[ 1., 2., 3.]])

Here the one-dimensional array a is stretched or broadcast across the second dimension in
order to match the shape of M.

In more complicated cases, both arrays are broadcast:

a = np.arange(3)
b = np.arange(3)[:, np.newaxis]
print(a)
print(b)

Output:

[0 1 2]

[[0]

[1]

[2]]

a + b

Output:

array([[0, 1, 2],

[1, 2, 3],

[2, 3, 4]])

Here we have stretched both a and b to match a common shape, and the result is a
two-dimensional array. The geometry of these examples is visualized in the following figure.

[Figure: visualization of NumPy broadcasting] The light boxes represent the broadcasted values.

Rules of Broadcasting

Broadcasting in NumPy follows a strict set of rules to determine the interaction
between the two arrays:

Rule 1: If the two arrays differ in their number of dimensions, the shape of the
one with fewer dimensions is padded with ones on its leading (left) side.
Rule 2: If the shape of the two arrays does not match in any dimension, the
array with shape equal to 1 in that dimension is stretched to match the other
shape.
Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an
error is raised.
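The three rules can be written out as a short function that works on shape tuples alone. This is a sketch for illustration, not NumPy's actual implementation; the function name broadcast_shape is hypothetical, and the final line cross-checks one case against NumPy's own np.broadcast.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shape tuples."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with ones on its left side.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b:
            result.append(dim_a)
        elif dim_a == 1:          # Rule 2: stretch a size-1 dimension
            result.append(dim_b)
        elif dim_b == 1:
            result.append(dim_a)
        else:                     # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))
print(broadcast_shape((3, 1), (3,)))
# Cross-check against NumPy's own result for the same shapes.
print(np.broadcast(np.ones((3, 1)), np.ones(3)).shape)
```

Calling broadcast_shape((3, 2), (3,)) raises ValueError, which mirrors the error NumPy reports for incompatible shapes.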
Broadcasting example 1

Adding a two-dimensional array to a one-dimensional array:

M = np.ones((2, 3))
a = np.arange(3)

Let's consider an operation on these two arrays. The shapes of the arrays are:

M.shape = (2, 3)
a.shape = (3,)

We see by rule 1 that the array a has fewer dimensions, so we pad it on the left with
ones:

M.shape -> (2, 3)
a.shape -> (1, 3)

By rule 2, we now see that the first dimension disagrees, so we stretch this dimension
to match:

M.shape -> (2, 3)
a.shape -> (2, 3)

The shapes match, and we see that the final shape will be (2, 3):

M + a

Output:

array([[ 1., 2., 3.],
       [ 1., 2., 3.]])

An example where both arrays need to be broadcast:

a = np.arange(3).reshape((3, 1))
b = np.arange(3)

The shapes of the arrays:

a.shape = (3, 1)
b.shape = (3,)

Rule 1 says we must pad the shape of b with ones:

a.shape -> (3, 1)
b.shape -> (1, 3)
And rule 2 tells us that we upgrade each of these ones to match the corresponding size
of the other array:

a.shape -> (3, 3)
b.shape -> (3, 3)

Because the result matches, these shapes are compatible. We can see this here:

a + b

Output:

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

An example in which the two arrays are not compatible:

M = np.ones((3, 2))
a = np.arange(3)

Here the matrix M is transposed. The shapes of the arrays are:

M.shape = (3, 2)
a.shape = (3,)

Again, by rule 1 we must pad the shape of a with ones:

M.shape -> (3, 2)
a.shape -> (1, 3)

By rule 2, the first dimension of a is stretched to match that of M:

M.shape -> (3, 2)
a.shape -> (3, 3)

Now by rule 3, the final shapes do not match, so these two arrays are incompatible:

M + a

Output:

ValueError: operands could not be broadcast together with shapes (3,2) (3,)

We can try making a and M compatible by padding a's shape with ones on the right
rather than the left, but this is not how the broadcasting rules work. That sort of
flexibility might be useful in some cases, but it would lead to potential areas of
ambiguity. If we want to do right-side padding, we can explicitly do so by reshaping
the array. For this we have to use the np.newaxis keyword:

a[:, np.newaxis].shape

Output:

(3, 1)

M + a[:, np.newaxis]

Output:

array([[ 1., 1.],
       [ 2., 2.],
       [ 3., 3.]])

These broadcasting rules apply to any binary ufunc. For example, the logaddexp(a, b)
function, which computes log(exp(a) + exp(b)):

np.logaddexp(M, a[:, np.newaxis])

Output:

array([[ 1.31326169, 1.31326169],
       [ 1.69314718, 1.69314718],
       [ 2.31326169, 2.31326169]])

Broadcasting in Practice

As ufuncs allow a NumPy user to remove the need to explicitly write slow Python loops,
broadcasting extends this ability. One example is centering an array of data.
Consider an array of 10 observations, each of which consists of 3 values. Using the
standard convention, we will store this in a 10×3 array:

X = np.random.random((10, 3))

We can compute the mean of each feature using the mean aggregate across the first
dimension:

Xmean = X.mean(0)
Xmean

Output:

array([ 0.53514715, 0.66567217, 0.44385899])

And now we can center the X array by subtracting the mean (this is a broadcasting
operation):

X_centered = X - Xmean

To check that we have done this correctly, we can check that the centered array has
near zero mean:

X_centered.mean(0)

Output:

array([ 2.22044605e-17, -7.77156117e-17, -1.66533454e-17])

To within machine precision, the mean is now zero.

Plotting a two-dimensional function

Broadcasting is very useful in displaying images based on two-dimensional functions.
If we want to define a function z = f(x, y), broadcasting can be used to compute the
function across the grid:

# x and y have 50 steps from 0 to 5
x = np.linspace(0, 5, 50)
y = np.linspace(0, 5, 50)[:, np.newaxis]

z = np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)

We will use Matplotlib to plot this two-dimensional array:

%matplotlib inline
import matplotlib.pyplot as plt
plt.imshow(z, origin='lower', extent=[0, 5, 0, 5], cmap='viridis')
plt.colorbar();

Output:

[Figure: image plot of z over the grid, with a color bar]

The result is a visualization of the two-dimensional function.

Comparisons, Masks, and Boolean Logic

Masking comes up when you want to extract, modify, count, or otherwise manipulate
values in an array based on some criterion: for example, you might wish to count all
values greater than a certain value, or perhaps remove all outliers that are above
some threshold. In NumPy, Boolean masking is often the most efficient way to
accomplish these types of tasks.
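A minimal sketch of these operations (count, extract, modify) on a small made-up array is shown below; the values and the two thresholds are illustrative assumptions.

```python
import numpy as np

values = np.array([0.0, 0.3, 1.2, 0.0, 0.7, 2.5])  # made-up measurements

mask = values > 0.5                 # Boolean array, one entry per element
print(mask)
print(np.count_nonzero(mask))       # count: how many values exceed the threshold
print(values[mask])                 # extract: just those values

capped = values.copy()
capped[capped > 2.0] = 2.0          # modify: cap outliers above a second threshold
print(capped)
```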

Imagine you have a series of data that represents the amount of precipitation each
day for a year in a given city. For example, here we'll load the daily rainfall statistics
for the city of Seattle in 2014, using Pandas:

In [1]: import numpy as np
import pandas as pd

# use Pandas to extract rainfall inches as a NumPy array
rainfall = pd.read_csv('data/Seattle2014.csv')['PRCP'].values
inches = rainfall / 254 # 1/10mm -> inches
inches.shape

Out[1]: (365,)

The array contains 365 values, giving daily rainfall in inches from January 1 to
December 31, 2014. As a first quick visualization, let's look at the histogram of rainy
days shown in the figure below.

In [2]: %matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # set plot styles

In [3]: plt.hist(inches, 40);

[Figure: Histogram of 2014 rainfall in Seattle]

This histogram gives us a general idea of what the data looks like: despite its reputation,
the vast majority of days in Seattle saw near zero measured rainfall in 2014. But this
doesn't do a good job of conveying some information we'd like to see: for example, how
many rainy days were there in the year? What is the average precipitation on those
rainy days? How many days were there with more than half an inch of rain?

Fancy Indexing

Another style of array indexing is known as fancy indexing. Fancy indexing is like the simple
indexing, but we pass arrays of indices in place of single scalars. This allows us to very
quickly access and modify complicated subsets of an array's values.

Fancy indexing is passing an array of indices to access multiple array elements at once.
For example, consider the following array:

import numpy as np
rand = np.random.RandomState(42)
x = rand.randint(100, size=10)
print(x)

Output:

[51 92 14 71 60 20 82 86 74 74]

Suppose we want to access three different elements. We could do it like this:

[x[3], x[7], x[2]]

Output:

[71, 86, 14]

Alternatively, we can pass a single list or array of indices to obtain the same kind of result:

ind = [3, 7, 4]
x[ind]

Output:

array([71, 86, 60])

When using fancy indexing, the shape of the result reflects the shape of the index
arrays rather than the shape of the array being indexed:

ind = np.array([[3, 7],
                [4, 5]])
x[ind]

Output:

array([[71, 86],
       [60, 20]])

Fancy indexing also works in multiple dimensions. Consider the following array:

X = np.arange(12).reshape((3, 4))
X

Output:

array([[ 0, 1, 2, 3],
       [ 4, 5, 6, 7],
       [ 8, 9, 10, 11]])
Like with standard indexing, the first index refers to the row, and the second to the
column:

row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]

Output:

array([ 2, 5, 11])

The first value in the result is X[0, 2], the second is X[1, 1], and the third is X[2, 3].
The pairing of indices in fancy indexing follows all the broadcasting rules. So, for
example, if we combine a column vector and a row vector within the indices, we get
a two-dimensional result:

X[row[:, np.newaxis], col]

Output:

array([[ 2, 1, 3],
       [ 6, 5, 7],
       [10, 9, 11]])

Here, each row value is matched with each column vector, exactly as we saw in
broadcasting of arithmetic operations. For example:

row[:, np.newaxis] * col

Output:

array([[0, 0, 0],
       [2, 1, 3],
       [4, 2, 6]])

In fancy indexing, the return value reflects the broadcasted shape of the indices
rather than the shape of the array being indexed.

Combined Indexing

For even more powerful operations, fancy indexing can be combined with the other
indexing schemes:

print(X)

Output:

[[ 0 1 2 3]
 [ 4 5 6 7]
 [ 8 9 10 11]]

We can combine fancy and simple indices:

X[2, [2, 0, 1]]

Output:

array([10, 8, 9])

We can also combine fancy indexing with slicing:

X[1:, [2, 0, 1]]

Output:

array([[ 6, 4, 5],
       [10, 8, 9]])

And we can combine fancy indexing with masking:

mask = np.array([1, 0, 1, 0], dtype=bool)
X[row[:, np.newaxis], mask]

Output:

array([[ 0, 2],
       [ 4, 6],
       [ 8, 10]])

All of these indexing options combined lead to a very flexible set of operations for
accessing and modifying array values.

Example: Selecting Random Points

One common use of fancy indexing is the selection of subsets of rows from a matrix.
For example, we might have an N by D matrix representing N points in D dimensions,
such as the following points drawn from a two-dimensional normal distribution:

mean = [0, 0]
cov = [[1, 2],
       [2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape

Output:

(100, 2)

We can visualize these points as a scatter plot:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # for plot styling
plt.scatter(X[:, 0], X[:, 1]);

Output:

[Figure: scatter plot of the normally distributed points]

Let's use fancy indexing to select 20 random points. We'll do this by first choosing 20
random indices with no repeats, and use these indices to select a portion of the original
array:

indices = np.random.choice(X.shape[0], 20, replace=False)
indices

Output:

[array of 20 distinct row indices; the values vary from run to run]

selection = X[indices] # fancy indexing here
selection.shape

Output:

(20, 2)

Now to see which points were selected, let's over-plot large circles at the locations of
the selected points:

plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
plt.scatter(selection[:, 0], selection[:, 1],
            facecolor='none', s=200);

[Figure: the selected points circled on the scatter plot]

This sort of strategy is often used to quickly partition datasets, as is often needed in
train/test splitting for validation of statistical models, and in sampling approaches to
answering statistical questions.

Modifying Values with Fancy Indexing

As fancy indexing can be used to access parts of an array, it can also be used to modify
parts of an array. For example, say we have an array of indices and we want to set
the corresponding items in an array to some value:

x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
print(x)

Output:

[ 0 99 99 3 99 5 6 7 99 9]

We can use any assignment-type operator for this. For example:

x[i] -= 10
print(x)

Output:

[ 0 89 89 3 89 5 6 7 89 9]

The repeated indices with these operations can cause some potentially unexpected results.
Consider the following:

x = np.zeros(10)
x[[0, 0]] = [4, 6]
print(x)

Output:

[ 6. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Where did the 4 go? The result of this operation is to first assign x[0] = 4, followed by
x[0] = 6. The result, of course, is that x[0] contains the value 6. But consider this operation:

i = [2, 3, 3, 4, 4, 4]
x[i] += 1
x

Output:

array([ 6., 0., 1., 1., 1., 0., 0., 0., 0., 0.])

We might expect that x[3] would contain the value 2, and x[4] would contain the value 3, as
this is how many times each index is repeated.

Why is this not the case? Conceptually, this is because x[i] += 1 is meant as a shorthand of
x[i] = x[i] + 1. The expression x[i] + 1 is evaluated, and then the result is assigned to the
indices in x. With this in mind, it is not the augmentation that happens multiple times, but
the assignment, which leads to the non-intuitive results.

So what if we want a method where the operation is repeated? For this, we can use the at
method of ufuncs and do the following:

x = np.zeros(10)
np.add.at(x, i, 1)
print(x)

Output:

[ 0. 0. 1. 2. 3. 0. 0. 0. 0. 0.]

The at method does an in-place application of the given operator at the specified
indices (here, i) with the specified value (here, 1). Another method that is similar in
spirit is the reduceat method of ufuncs.
Example: Binning Data

We can use these ideas to efficiently bin data to create a histogram by hand. For
example, imagine we have a set of values and would like to quickly find where they fall
within an array of bins. We could compute it using ufunc.at like this:

np.random.seed(42)
x = np.random.randn(100)

# compute a histogram by hand
bins = np.linspace(-5, 5, 20)
counts = np.zeros_like(bins)

# find the appropriate bin for each x
i = np.searchsorted(bins, x)

# add 1 to each of these bins
np.add.at(counts, i, 1)

The counts now reflect the number of points within each bin; in other words, a
histogram:

# plot the results
plt.plot(bins, counts, drawstyle='steps');

Output:

[Figure: histogram computed by hand]

Matplotlib provides the plt.hist routine, which does the same in a single line:

plt.hist(x, bins, histtype='step');

This function will create a nearly identical plot to the one seen here. To compute the
binning, Matplotlib uses the np.histogram function, which does a very similar
computation to what we did before. Let's compare the two here:

print("NumPy routine:")
%timeit counts, edges = np.histogram(x, bins)
print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins, x), 1)

Output:

NumPy routine:
10000 loops, best of 3: 97.6 µs per loop
Custom routine:
10000 loops, best of 3: 19.5 µs per loop

NumPy's algorithm is more flexible, and in particular is designed for better performance
when the number of data points becomes large:

x = np.random.randn(1000000)
print("NumPy routine:")
%timeit counts, edges = np.histogram(x, bins)
print("Custom routine:")
%timeit np.add.at(counts, np.searchsorted(bins, x), 1)

Output:

NumPy routine:
10 loops, best of 3: 68.7 ms per loop
Custom routine:
10 loops, best of 3: 135 ms per loop


Sorting Arrays

This section covers sorting the values in a list or array.

For example, a simple selection sort repeatedly finds the minimum value from a list, and makes
swaps until the list is sorted. We can code this in just a few lines of Python:

import numpy as np

def selection_sort(x):
    for i in range(len(x)):
        swap = i + np.argmin(x[i:])
        (x[i], x[swap]) = (x[swap], x[i])
    return x

x = np.array([2, 1, 4, 3, 5])
selection_sort(x)

Output:

array([1, 2, 3, 4, 5])

Fast Sorting in NumPy: np.sort and np.argsort

To return a sorted version of the array without modifying the input, you can use np.sort:

x = np.array([2, 1, 4, 3, 5])
np.sort(x)

Output:

array([1, 2, 3, 4, 5])

To sort the array in-place, we can use the sort method of arrays:

x.sort()
print(x)

Output:

[1 2 3 4 5]

A related function is argsort, which instead returns the indices of the sorted elements:

x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)

Output:

[1 0 3 2 4]

The first element of this result gives the index of the smallest element, the second
value gives the index of the second smallest, and so on. These indices can then be
used (via fancy indexing) to construct the sorted array if required:

x[i]

Output:

array([1, 2, 3, 4, 5])

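Sorting along rows or columns

A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows
or columns of a multidimensional array using the axis argument. For example:

rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)

Output:

[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]

# sort each column of X
np.sort(X, axis=0)

Output:

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

# sort each row of X
np.sort(X, axis=1)

Output:

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])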

Partial Sorts: Partitioning

NumPy provides the np.partition function. np.partition takes an array and a number K;
the result is a new array with the smallest K values to the left of the partition, and
the remaining values to the right, in arbitrary order:

x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)

Output:

array([2, 1, 3, 4, 6, 5, 7])

The first three values in the resulting array are the three smallest in the array, and the
remaining array positions contain the remaining values. Within the two partitions, the
elements have arbitrary order.

Similarly to sorting, we can partition along an arbitrary axis of a multidimensional array:

np.partition(X, 2, axis=1)

Output:

array([[3, 4, 6, 7, 6, 9],
       [2, 3, 4, 7, 6, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 9, 5]])

The result is an array where the first two slots in each row contain the smallest values
from that row, with the remaining values filling the remaining slots.

Finally, just as there is a np.argsort that computes indices of the sort, there is a
np.argpartition that computes indices of the partition.
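A common use of np.argpartition is picking out the k smallest values without paying for a full sort. The sketch below assumes k = 3 and a small sample array; because the order within a partition is arbitrary, the selected values are sorted before printing.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3

idx = np.argpartition(x, k)[:k]      # indices of the k smallest values, in arbitrary order
smallest = x[idx]

print(sorted(smallest.tolist()))     # sort only for display
```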

Example: k-Nearest Neighbors

We use the argsort function along multiple axes to find the nearest neighbors of each
point in a set. We will start by creating a random set of 10 points on a two-dimensional
plane. Using the standard convention, we will arrange these in a 10×2 array:

X = rand.rand(10, 2)

The scatter plot of the above is:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # Plot styling
plt.scatter(X[:, 0], X[:, 1], s=100);

Output:

[Figure: scatter plot of the 10 points]

We will compute the distance between each pair of points. The squared distance
between two points is the sum of the squared differences in each dimension. Using the
efficient broadcasting and aggregation provided by NumPy, we can compute the matrix
of square distances in a single line of code:

dist_sq = np.sum((X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2, axis=-1)

It will be useful to break it down into its component steps:

# for each pair of points, compute differences in their coordinates
differences = X[:, np.newaxis, :] - X[np.newaxis, :, :]
differences.shape

Output:

(10, 10, 2)

# square the coordinate differences
sq_differences = differences ** 2
sq_differences.shape

Output:

(10, 10, 2)

# sum the coordinate differences to get the squared distance
dist_sq = sq_differences.sum(-1)
dist_sq.shape

Output:

(10, 10)

To check for correctness, we should see that the diagonal of this matrix (i.e., the set
of distances between each point and itself) is all zero:

dist_sq.diagonal()

Output:

array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

With the pairwise square distances computed, we can now use np.argsort to sort along
each row. The leftmost columns will then give the indices of the nearest neighbors:

nearest = np.argsort(dist_sq, axis=1)
print(nearest)

Output:

[10×10 array of indices; each row lists the points ordered from nearest to farthest]

The first column gives the numbers 0 through 9 in order: this is due to the fact that
each point's closest neighbor is itself. To find the k nearest neighbors, all we need is to
partition each row so that the smallest k + 1 squared distances come first, with larger
distances filling the remaining positions of the array. We can do this with the
np.argpartition function:

K = 2
nearest_partition = np.argpartition(dist_sq, K + 1, axis=1)

In order to visualize this network of neighbors, we can plot the points along with lines
representing the connections from each point to its two nearest neighbors:

plt.scatter(X[:, 0], X[:, 1], s=100)

# draw lines from each point to its two nearest neighbors
K = 2
for i in range(X.shape[0]):
    for j in nearest_partition[i, :K+1]:
        # plot a line from X[i] to X[j]
        # use some zip magic to make it happen:
        plt.plot(*zip(X[j], X[i]), color='black')

Each point in the plot has lines drawn to its two nearest neighbors. Some of the points
have more than two lines coming out of them: this is due to the fact that if point A is
one of the two nearest neighbors of point B, this does not necessarily imply that point B
is one of the two nearest neighbors of point A.

S P sS s
NumPy s structured arrays and record arrays prov de eff c ent stora e for compound
hetero eneous data Wh le the patterns are useful for s mple operat ons the pandas
dataframes are also used

import numpy as np
Consider that we have several categories of data on a number of people (name, age,
and weight), and we want to store these values for use in a Python program. It would
be possible to store these in three separate arrays:

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

There is nothing here that tells us that the three arrays are related; it would be more
natural if we could use a single structure to store all of this data. NumPy can handle
this through structured arrays, which are arrays with compound data types.

We can create a structured array using a compound data type specification:

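# Use a compound data type for structured arrays
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
print(data.dtype)
Output:
[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
Here 'U10' translates to "Unicode string of maximum length 10", 'i4' translates to
"4-byte (32-bit) integer", and 'f8' translates to "8-byte (64-bit) float".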
Now that we have created an empty container array, we can fill the array with our lists
of values:
data['name'] = name
data['age'] = age
data['weight'] = weight
print(data)
Output:
[('Alice', 25, 55.0) ('Bob', 45, 85.5) ('Cathy', 37, 68.0) ('Doug', 19, 61.5)]
We can refer to values either by index or by name:
# Get all names
data['name']
Output:
array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')
# Get first row of data
data[0]
Output:
('Alice', 25, 55.0)
# Get the name from the last row
data[-1]['name']
Output:
'Doug'
Boolean masking allows us to do operations such as filtering on age:
# Get names where age is under 30
data[data['age'] < 30]['name']
Output:
array(['Alice', 'Doug'], dtype='<U10')
Pandas provides a DataFrame object, which is a structure built on NumPy arrays
that offers a variety of useful data manipulation functionality.
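As a rough illustration of how a structured array relates to a DataFrame, the sketch below builds a small structured array and hands it to `pd.DataFrame`. The names, ages, and weights are sample values chosen for the example, and `under_30` is a hypothetical variable name.

```python
import numpy as np
import pandas as pd

# Structured array with three named fields (sample values)
data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

# The field names become the DataFrame's column labels
df = pd.DataFrame(data)

# The same Boolean-mask filter, expressed on the DataFrame
under_30 = df[df['age'] < 30]['name'].tolist()
print(df.columns.tolist())
print(under_30)
```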

Creating Structured Arrays
Structured array data types can be specified in a number of ways:
np.dtype({'names': ('name', 'age', 'weight'),
          'formats': ('U10', 'i4', 'f8')})
Output:
dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])
Numerical types can be specified using Python types or NumPy dtypes:
np.dtype({'names': ('name', 'age', 'weight'),
          'formats': ((np.str_, 10), int, np.float32)})
Output:
dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])
A compound type can also be specified as a list of tuples:
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])
Output:
dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])
We can specify the types alone in a comma-separated string:
np.dtype('S10,i4,f8')
The first (optional) character is < or >, which means "little endian" or "big endian",
respectively, and specifies the ordering convention for significant bits. The next
character specifies the type of data: characters, bytes, ints, floating points, and so on.
The last character or characters represent the size of the object in bytes.
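The three parts of a type string can be inspected on the resulting dtype object. This is a small sketch; the variable names `dt` and `compound` are illustrative, and the byte-order attribute is printed without a stated result because NumPy reports it relative to the machine's native order.

```python
import numpy as np

# '>' = big endian, 'i' = integer, '4' = four bytes
dt = np.dtype('>i4')
print(dt.byteorder, dt.kind, dt.itemsize)

# A comma-separated string builds a compound type with
# automatically generated field names
compound = np.dtype('S10,i4,f8')
print(compound.names)
print(compound.itemsize)   # 10 + 4 + 8 bytes, packed without padding
```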

More Advanced Compound Types
We can create a type where each element contains an array or matrix of values. Here, we
will create a data type with a mat component consisting of a 3×3 floating-point
matrix:
tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(1, dtype=tp)
print(X[0])
print(X['mat'][0])
Output:
(0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
[[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
Now each element in the X array consists of an id and a 3×3 matrix.
A NumPy dtype maps directly onto a C structure definition, so the buffer containing the
array content can be accessed directly within an appropriately written C program.
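One way to see the fixed, C-struct-like layout is to look at the size of each element and of the raw buffer. The sketch below assumes a two-element array and fills in sample values purely for illustration.

```python
import numpy as np

tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(2, dtype=tp)
X['id'] = [1, 2]
X['mat'][1] = np.eye(3)     # second element holds an identity matrix

# Each element occupies 8 bytes (id) + 9 * 8 bytes (mat)
print(tp.itemsize)
# Selecting the field stacks the per-element matrices
print(X['mat'].shape)
# The raw buffer is simply the elements laid out back to back
print(len(X.tobytes()))
```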
1.9.3 RecordArrays: Structured Arrays with a Twist
NumPy provides the np.recarray class, which is almost identical to the structured
arrays but with one additional feature: fields can be accessed as attributes rather
than as dictionary keys. Previously, we accessed the ages by writing:
data['age']
Output:
array([25, 45, 37, 19], dtype=int32)
If we view our data as a record array, we can access this with fewer keystrokes:
data_rec = data.view(np.recarray)
data_rec.age
Output:
array([25, 45, 37, 19], dtype=int32)
The downside of record arrays is that there is some extra overhead involved in accessing
the fields, even when using the same syntax:
%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age
Output:
1000000 loops, best of 3: 241 ns per loop
100000 loops, best of 3: 4.61 µs per loop
100000 loops, best of 3: 7.27 µs per loop
MATERIALS
https://www.youtube.com/watch?v=3eAMvnIxQd0
https://www.youtube.com/watch?v=0MEO9wzSxTE
https://www.youtube.com/watch?v=awXS7_-52fY
https://www.youtube.com/watch?v=8y-o1zWSXR8
https://www.youtube.com/watch?v=0KxB7IMoqQg

10. ASSIGNMENT

Q. No.  Question                                                                       CO Level  K Level
1.      Find the number of rows and columns of a given matrix using NumPy             CO2       K3
2.      NumPy – Fibonacci Series using Binet Formula                                  CO2       K3
3.      Extracting the real and imaginary parts of a NumPy array of complex numbers   CO2       K3
4.      Explain Aggregation Functions and Fancy Indexing with examples in NumPy.      CO2       K2
5.      Explain selection sort and other sorting methods used in NumPy with examples. CO2       K2

11. PART-A
UNIT-2 Q&A

1. What are the categories of basic array manipulation? (CO2,K2)
Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays.
Indexing of arrays: Getting and setting the value of individual array elements.
Slicing of arrays: Getting and setting smaller subarrays within a larger array.
Reshaping of arrays: Changing the shape of a given array.
Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many.

2. What is the syntax for numpy slicing? (CO2,K2)

The NumPy slicing syntax follows that of the standard Python list. To access a slice of an
array x, use:

x[start:stop:step]

If any of these are unspecified, they default to the values start=0, stop=size of dimension,
and step=1. We can access subarrays in one dimension and in multiple dimensions.
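The slicing syntax can be tried on small arrays. This is a minimal sketch; the arrays and variable names are chosen only for illustration.

```python
import numpy as np

x = np.arange(10)            # 0, 1, ..., 9

first_five = x[:5]           # start defaults to 0
every_other = x[::2]         # stop defaults to the size, step is 2
middle = x[1:8:3]            # explicit start, stop, and step

# Multidimensional slices use one slice per dimension
x2 = np.arange(12).reshape(3, 4)
sub = x2[:2, :3]             # first two rows, first three columns

print(first_five, every_other, middle)
print(sub)
```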

3. What will be the output for the below code: array

print :
Output:

4. What do you mean by ufuncs?
Ufuncs are the universal functions. The vectorized operations in NumPy are implemented
via ufuncs, whose main purpose is to quickly execute repeated operations on values in
NumPy arrays. NumPy's universal functions can be used to vectorize operations and
thereby remove slow Python loops.
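To make the contrast concrete, the sketch below computes reciprocals once with an explicit Python loop and once with a ufunc. The input values and variable names are illustrative.

```python
import numpy as np

values = np.array([1.0, 2.0, 4.0, 5.0])

# Loop version: one Python-level operation per element
loop_result = np.empty(len(values))
for i in range(len(values)):
    loop_result[i] = 1.0 / values[i]

# Vectorized version: the division ufunc works on the whole array at once
ufunc_result = 1.0 / values

print(np.allclose(loop_result, ufunc_result))
```

On small arrays the difference is negligible; the benefit of the vectorized form grows with the array size.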

5. What is the purpose of the axis keyword?

The axis keyword specifies the dimension of the array that will be collapsed, rather
than the dimension that will be returned. So specifying axis=0 means that the first
axis will be collapsed: for two-dimensional arrays, this means that values within each
column will be aggregated.
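A small two-dimensional array shows which axis is collapsed. The matrix `M` below is an illustrative assumption.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)

# axis=0 collapses the rows, leaving one value per column
col_sums = M.sum(axis=0)
# axis=1 collapses the columns, leaving one value per row
row_sums = M.sum(axis=1)

print(col_sums)
print(row_sums)
```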

6. What are the rules for broadcasting?

Broadcasting in NumPy follows a strict set of rules to determine the interaction
between the two arrays:

a) Rule 1: If the two arrays differ in their number of dimensions, the shape of
the one with fewer dimensions is padded with ones on its leading (left) side.
b) Rule 2: If the shape of the two arrays does not match in any dimension,
the array with shape equal to 1 in that dimension is stretched to match the
other shape.
c) Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an
error is raised.
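The three rules can be observed on small arrays. The shapes below are chosen only to trigger each rule, and `raised` is a hypothetical flag used to record the error case.

```python
import numpy as np

a = np.arange(3)                   # shape (3,)
M = np.ones((2, 3))                # shape (2, 3)

# Rule 1 pads a to (1, 3); Rule 2 stretches it to (2, 3)
result = M + a

# Rule 2 applied to both operands: (3,) -> (1, 3) and (3, 1) -> (3, 3)
b = np.arange(3).reshape(3, 1)
grid = a + b

# Rule 3: (3, 2) against (1, 3) disagrees in the last dimension
try:
    np.ones((3, 2)) + a
    raised = False
except ValueError:
    raised = True

print(result.shape, grid.shape, raised)
```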

7. What is fancy indexing?

Fancy indexing is a style of array indexing that is like simple indexing, but we pass
arrays of indices in place of single scalars. This allows us to very quickly access and
modify complicated subsets of an array's values.
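Both access and modification through index arrays can be sketched as follows. The array contents and index lists are illustrative.

```python
import numpy as np

x = np.array([51, 92, 14, 71, 60, 20])

# Access: pass a list (or array) of indices instead of a scalar
ind = [3, 5, 0]
picked = x[ind]

# Modify: assign to several positions in one statement
y = x.copy()
y[[1, 2]] = 0

print(picked)
print(y)
```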

8. What is the difference between np.sort and np.argsort?

np.sort is used to return a sorted version of the array without modifying the input.

np.argsort is used to return the indices of the sorted elements.
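The relationship between the two functions is easiest to see side by side. The input array is an illustrative assumption.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

sorted_x = np.sort(x)      # new sorted array; x itself is unchanged
order = np.argsort(x)      # indices that would sort x

print(sorted_x)
print(order)
# Fancy indexing with the argsort result reproduces the sorted array
print(x[order])
```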

9. What is the output of the given code?

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
print(data.dtype)
Output:
[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]

10. What is the difference between a NumPy array and a Pandas Series?

While the NumPy array has an implicitly defined integer index used to access the
values, the Pandas Series has an explicitly defined index associated with the values.
This explicit index definition gives the Series object additional capabilities. For
example, the index need not be an integer, but can consist of values of any desired
type. For example, we can use strings as an index.
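A short sketch of the difference, using illustrative values and labels:

```python
import numpy as np
import pandas as pd

# NumPy array: positions 0..3 are the only way to address values
arr = np.array([0.25, 0.5, 0.75, 1.0])

# Series: the same values with an explicit string index
s = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

print(arr[1])      # access by implicit integer position
print(s['b'])      # access by explicit label
```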
11. How can the Series object be modified?

Series objects can be modified with a dictionary-like syntax. Just as we can extend a
dictionary by assigning to a new key, we can extend a Series by assigning to a new
index value.
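The dictionary-like behaviour can be sketched as follows; the labels and values are illustrative.

```python
import pandas as pd

s = pd.Series([0.25, 0.5], index=['a', 'b'])

s['e'] = 1.25      # new index value: the Series grows by one entry
s['a'] = 0.0       # existing index value: the entry is overwritten

print(s)
```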

12. What is the Python None object?

The first sentinel value used by Pandas is None, a Python singleton object that is
often used for missing data in Python code. Because it is a Python object, None
cannot be used in any arbitrary NumPy/Pandas array, but only in arrays with data
type 'object' (i.e., arrays of Python objects).
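A brief sketch of what happens when None appears in an array. The values are illustrative, and `raised` is a hypothetical flag recording whether aggregation failed.

```python
import numpy as np

vals = np.array([1, None, 3, 4])
print(vals.dtype)          # the array falls back to Python objects

# Aggregations run at the Python level and fail on None
try:
    vals.sum()
    raised = False
except TypeError:
    raised = True

print(raised)
```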

13. What is the use of multi-indexing?

Multi-indexing is used to represent two-dimensional data within a one-dimensional
Series. We can also use it to represent data of three or more dimensions in a Series
or DataFrame. Each extra level in a multi-index represents an extra dimension of
data.
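As a minimal sketch, the Series below stores two-dimensional data (region by year) in one dimension. The region names, years, and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: one value per (region, year) pair
index = pd.MultiIndex.from_tuples(
    [('north', 2000), ('north', 2010), ('south', 2000), ('south', 2010)],
    names=['region', 'year'],
)
sales = pd.Series([10, 12, 20, 25], index=index)

# Select by giving one label per level
print(sales['north', 2010])

# unstack() turns the inner level into columns, giving a 2-D table
table = sales.unstack()
print(table)
```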

14. What is the pd.merge function?

The pd.merge() function implements a number of types of joins: the one-to-one,
many-to-one, and many-to-many joins. All three types of joins are accessed via an
identical call to the pd.merge() interface. The type of join performed depends on the
form of the input data.
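The sketch below uses the same `pd.merge()` call for a one-to-one and a many-to-one join. All table contents are hypothetical sample data.

```python
import pandas as pd

employees = pd.DataFrame({'employee': ['Bob', 'Jake', 'Lisa'],
                          'group': ['Accounting', 'Engineering', 'Engineering']})
hire = pd.DataFrame({'employee': ['Lisa', 'Bob', 'Jake'],
                     'hire_date': [2004, 2008, 2012]})
supervisors = pd.DataFrame({'group': ['Accounting', 'Engineering'],
                            'supervisor': ['Carly', 'Guido']})

# One-to-one: each employee appears once in both tables
one_to_one = pd.merge(employees, hire)

# Many-to-one: several employees share one group row
many_to_one = pd.merge(one_to_one, supervisors)

print(many_to_one)
```

In both calls the join key is inferred from the column name the two tables share.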

15. What is the describe() method?

The describe() method computes several common aggregates for each column and
returns the result. We can use this method on a dataset after dropping rows with
missing values, for example with df.dropna().describe().
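A minimal sketch of this pattern on a small hypothetical table with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, np.nan, 4.0],
                   'b': [10.0, 20.0, 30.0, 40.0]})

# Drop the row containing NaN, then summarize each remaining column
summary = df.dropna().describe()
print(summary)
```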
16. What is split, apply, and combine?
The split step involves breaking up and grouping a DataFrame depending on
the value of the specified key.
The apply step involves computing some function, usually an aggregate,
transformation, or filtering, within the individual groups.
The combine step merges the results of these operations into an output array.
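In Pandas, the three steps are typically carried out by a single `groupby` expression. The table below is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'data': range(6)})

# split on 'key', apply sum() within each group, combine into one Series
sums = df.groupby('key')['data'].sum()
print(sums)
```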

17. What is the use of the get() and slice() operations?

The get() and slice() operations enable vectorized element access from each array.
For example, we can get a slice of the first three characters of each array using
str.slice(0, 3).
The get() and slice() methods also let us access elements of arrays returned by split().
For example, to extract the last name of each entry, we can combine split() and get().
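Both operations can be sketched on a small Series of names. The names are sample values.

```python
import pandas as pd

names = pd.Series(['Graham Chapman', 'John Cleese', 'Terry Gilliam'])

# First three characters of every entry
first_three = names.str.slice(0, 3)

# Split on whitespace, then take the last piece of each resulting list
last_names = names.str.split().str.get(-1)

print(first_three.tolist())
print(last_names.tolist())
```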

18. What do you mean by datetime and dateutil?

The datetime type is used to manually build a date. Using the dateutil module, we
can parse dates from a variety of string formats. With a datetime object, we can print
the day of the week.
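A short sketch of the three operations, assuming the third-party `dateutil` package is installed (it is a dependency of Pandas). The date is an arbitrary example.

```python
from datetime import datetime
from dateutil import parser

# Build a date manually
built = datetime(year=2015, month=7, day=4)

# Parse the same date from a free-form string
parsed = parser.parse("4th of July, 2015")

# Print the day of the week
print(built.strftime('%A'))
print(parsed == built)
```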

19. What is the advantage of using the Numexpr library?

The Numexpr library gives the ability to compute compound expressions element by
element, without the need to allocate full intermediate arrays.

Numexpr evaluates the expression in a way that does not use full-sized temporary
arrays, and can be much more efficient than NumPy, especially for large arrays. The
Pandas eval() and query() tools are conceptually similar, and depend on the Numexpr
package.
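The Pandas tools mentioned above can be sketched as follows. The random data, the seed, and the column names are assumptions for illustration; the results should match the plain expressions whether or not Numexpr is installed, since Pandas can fall back to ordinary Python evaluation.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(1000, 3), columns=['A', 'B', 'C'])

# Plain Pandas: B * C is materialized as a temporary before the addition
total_plain = df['A'] + df['B'] * df['C']

# eval(): the whole compound expression is passed as a string
total_eval = df.eval('A + B * C')

# query(): string-based filtering, equivalent to a Boolean mask
selected = df.query('A < 0.5 and B < 0.5')
masked = df[(df['A'] < 0.5) & (df['B'] < 0.5)]

print(np.allclose(total_plain, total_eval))
print(selected.equals(masked))
```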

12. PART B
QUESTIONS :
UNIT –II

I. Explain all the array manipulation functions with examples in NumPy. (CO2, K3)

II. Write short notes on Computation on Arrays. (CO2, K2)

III. Explain Aggregation Functions and Fancy Indexing with examples in NumPy. (CO2, K3)

IV. Explain selection sort and other sorting methods used in NumPy with examples. (CO2, K3)

13. PART C
QUESTIONS :
UNIT –II

I. How to create an empty and a full NumPy array?
II. Create a NumPy array filled with all zeros
III. Create a NumPy array filled with all ones
IV. Check whether a NumPy array contains a specified row
V. How to remove rows in a NumPy array that contain non-numeric values?
VI. Remove single-dimensional entries from the shape of an array
VII. Find the number of occurrences of a sequence in a NumPy array
VIII. Find the most frequent value in a NumPy array
IX. Combining a one-dimensional and a two-dimensional NumPy array
X. How to build an array of all combinations of two NumPy arrays?
XI. How to add a border around a NumPy array?
XII. How to compare two NumPy arrays?
XIII. How to check whether specified values are present in a NumPy array?
XIV. How to get all 2D diagonals of a 3D NumPy array?
XV. Flatten a matrix in Python using NumPy

14. SUPPORTIVE ONLINE CERTIFICATION COURSES

NPTEL: https://onlinecourses.nptel.ac.in/noc21_cs69/preview?

COURSERA: https://www.coursera.org/learn/python-data-analysis

UDEMY: https://www.udemy.com/topic/data-science/

MOOC: https://mooc.es/course/introduction-to-data-science-in-python

edX: https://learning.edx.org/course/course-v1:Microsoft+DAT208x+2T2016/home

GEEKSFORGEEKS: https://www.geeksforgeeks.org/data-science-fundamentals/

15. Real Time Applications in Day to Day Life and to Industry
NumPy is useful for performing mathematical and logical operations on large, high-dimensional arrays and
matrices. With it, you can perform a wide range of numerical functions efficiently. NumPy simplifies coding,
is documented online, and works with other libraries to make tasks more efficient. Here are four real-life
areas where Python, the language NumPy is built for, is used:

1. Web Development

Python is a popular language for web development, with frameworks such as Pyramid,
Django, and Flask. Standard libraries are included in these frameworks, making protocol
integration easy and efficient.

2. Education Sector

Python is also used in the development of online courses and education programs. It is an
easy language to learn for beginners since its syntax is similar to English. It provides a
beginner with a standard library and a variety of resources to get a handle on the language,
making it easier to learn. As a result, Python is a preferred programming language for
beginners in developing education programs at both basic and advanced levels.

3. Game Development

Battlefield 2, one of the most popular video games of the 2000s, used Python to script
its add-ons and much of its game logic. Python frameworks are commonly used in game
development, including Pygame, PyKyra, Pyglet, PyOpenGL, Kivy, Panda3D, Cocos2D, etc.

4. Software Development

Python is widely used by software developers because it simplifies the development of
complex applications. The language is used for project management, as a support language,
and for build control and testing.

NumPy vs. Other Technologies & Methodologies


NumPy vs. Pandas

NumPy mainly works with numerical data, while Pandas deals primarily with tabular data. With
Pandas, you can work with numeric data and time series in a fast, easy-to-use
environment. Pandas is written in Python, Cython, and C and is built around the NumPy library.
Data can be imported into Pandas from various file formats, including JSON, SQL, Microsoft
Excel, etc. Lastly, Pandas is used for data analysis and visualization, and NumPy is widely used
for numerical calculations.

NumPy vs. Scipy

NumPy stands for Numerical Python, and SciPy stands for Scientific Python; both are essential
Python libraries used to manipulate data in various ways. NumPy provides efficient operations
on arrays of homogeneous data. SciPy is a set of Python tools built on NumPy that support
integration, differentiation, gradient optimization, and many other functions. Much
general-purpose numerical computation in Python is done via SciPy.

NumPy vs. Matlab

Python indexing begins at 0 and is performed with brackets, whereas MATLAB indexing begins
at one and is performed with parentheses. NumPy provides efficient operations on arrays of
homogeneous data in Python. Python can thus be used as a high-level language for manipulating
numerical data, similar to IDL, MATLAB, or Yorick. In MATLAB, everything is treated as an array,
whereas everything is a more general object in Python. In MATLAB, strings are arrays of
characters or arrays of strings, whereas in Python, strings are their own type of object, called str.
MATLAB's scripting language was designed for linear algebra, so some array manipulations are
easier in MATLAB than in NumPy.

NumPy vs. Math

Math is part of the Python standard library. It provides basic mathematical functions
(trigonometry, logarithms) as well as some commonly used constants. NumPy, on the other hand,
is a third-party package designed for scientific computation and written largely in C; it is the
de facto package for numerical and vector operations in Python. While the code is almost
identical, the performance can be very different. In one reported timing, MATLAB took 0.252454
seconds to complete the task, while NumPy took 0.973672151566 seconds, nearly four times as long.

16. ASSESSMENT SCHEDULE

Tentative schedule for the Assessment During 2023-24 EVEN semester

CYCLE TEST – I: 29.01.2024
MCQ UNIT I & II: 09.02.2024
FIAT: 12.02.2024
CYCLE TEST – II: 14.03.2024
MCQ UNIT III & IV: 28.03.2024
SIAT: 01.04.2024
MODEL: 01.05.2024

17. PRESCRIBED TEXT BOOKS AND REFERENCE BOOKS

TEXT BOOKS:
1. Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, "Introducing Data Science", Manning
Publications, 2016. (first two chapters for Unit I)
2. Ashwin Pajankar, Aditya Joshi, Hands-on Machine Learning with Python: Implement Neural
Network Solutions with Scikit-learn and PyTorch, Apress, 2022.
3. Jake VanderPlas, "Python Data Science Handbook", O'Reilly.
REFERENCES:
1. Roger D. Peng, R Programming for Data Science, Lulu.com, 2016
2. Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques", 3rd Edition,
Morgan Kaufmann, 2012.
3. Samir Madhavan, Mastering Python for Data Science, Packt Publishing, 2015
4. Laura Igual, Santi Seguí, "Introduction to Data Science: A Python Approach to Concepts,
Techniques and Applications", 1st Edition, Springer, 2017
5. Peter Bruce, Andrew Bruce, "Practical Statistics for Data Scientists: 50 Essential Concepts", 3rd
Edition, O'Reilly, 2017
6. Hector Guerrero, "Excel Data Analysis: Modelling and Simulation", Springer International
Publishing, 2nd Edition, 2019
7. NPTEL Courses:
a. Data Science for Engineers
https://onlinecourses.nptel.ac.in/noc23_cs17/preview
b. Python for Data Science - https://onlinecourses.nptel.ac.in/noc23_cs21/preview

18. MINI PROJECT SUGGESTIONS

a) Human Action Recognition
b) Forest Fire Prediction
c) Road Lane Line Detection
d) Recognition of Speech Emotion
e) Gender and Age Detection with Data Science
f) Handwritten Digit & Character Recognition Project
g) Weather Prediction
h) Keyword generation for Google Ads
i) Traffic Signs Recognition
j) Air Pollution Prediction
k) Product Price Suggestions

Thank you
