
Unit 5


21CSS101J - PROGRAMMING FOR PROBLEM SOLVING

Unit – 05 : Session – 01 : SLO - 01


Creating NumPy Array
• There are various ways to create or initialize arrays in NumPy. One of the
most common approaches is the numpy.array() function, which takes a list
or tuple of values as an argument and returns an ndarray object (a NumPy array).
• In Python, matrix-like data structures are most commonly represented
with NumPy arrays.
• The numpy Python package is well-developed for efficient
computation of matrices.
• N-Dimensional arrays play a major role in machine learning and
data science.
• In order to use NumPy arrays, we have to initialize or create
NumPy arrays.
# Import numpy module
import numpy as np

Create NumPy Array


• NumPy supports N-dimensional arrays. Let’s see how to initialize single- and
multi-dimensional arrays using the numpy.array() function, which returns an ndarray object.
Syntax of numpy.array()
numpy.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
like=None)
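As a minimal sketch of two of the parameters in this signature, the example below passes dtype and ndmin to numpy.array(). The input list and the variable names are illustrative and not part of the original slides.

```python
import numpy as np

values = [1, 2, 3]  # illustrative input list

# dtype forces the element type of the new array.
as_float = np.array(values, dtype=float)

# ndmin pads the shape with leading axes until the array has that many dimensions.
as_2d = np.array(values, ndmin=2)

print(as_float, as_float.dtype)  # elements are stored as floating point
print(as_2d, as_2d.shape)        # shape gains a leading axis of length 1
```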

Create a Single Dimension NumPy Array


• We can create a single-dimensional array using a list of numbers.
• The numpy.array() function is the most familiar way to create a NumPy array from
other array-like objects. For example, you can use it to create an array from a
Python list or tuple.
# Import numpy module
import numpy as np

Creation of 1D numpy array


arr1=np.array([10,20,30])
print("My 1D array:\n",arr1)
Create Multi-Dimensional NumPy Array
• A list of lists creates a 2D NumPy array. Similarly, deeper nesting creates N-dimensional arrays.

Create a 2D array by using numpy.array() function

arr2 = np.array([[10,20,30],[40,50,60]])
print("My 2D numpy array:\n", arr2)

Creating a Three-dimensional Array and Beyond


• To create a three-dimensional array, specify 3 parameters to the reshape function.
array = np.arange(27).reshape(3,3,3)
array

Output:
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])
Example
Create a sequence of integers from 0 up to (but not including) 20 in steps of 3

arr = np.arange(0, 20, 3)
print("A sequential array with steps of 3:\n", arr)

Create a sequence of 5 evenly spaced values from 0 to 3 (both endpoints included)

arr = np.linspace(0, 3, 5)
print("A sequential array with 5 values between 0 and 3:\n", arr)
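The difference between the two functions is easiest to see side by side. The sketch below reuses the same arguments as the examples above; only the variable names are new.

```python
import numpy as np

# arange takes a step size and excludes the stop value.
stepped = np.arange(0, 20, 3)

# linspace takes a number of points and includes the stop value by default.
spaced = np.linspace(0, 3, 5)

print(stepped)  # integers 0, 3, ..., 18
print(spaced)   # five floats from 0.0 to 3.0
```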

Use asarray() to convert a list to an array

values = [20,40,60,80]   # avoid the name "list", which shadows the built-in type
array = np.asarray(values)
print("Array:", array)
Example
Use empty() to create an array
shape = (3, 4) # 3 rows and 4 columns
arr1 = np.empty(shape) # the values are uninitialized (arbitrary)
print("Array with values:\n", arr1)

Use zeros() to create an array


arr = np.zeros((3,2))
print("numpy array:\n", arr)
print("Type:", type(arr))

Use ones() to create an array


arr = np.ones((2,3))
print("numpy array:\n", arr)
print("Type:", type(arr))
Example
Create an array from an existing array using copy()
arr=np.array([10,20,30])
arr1=arr.copy()
print("Original array",arr)
print("Copied array",arr1)

Assign an array using the = operator (no copy is made)

arr=np.array([10,20,30])
arr1=arr   # arr1 is a second name for the same array, not a copy
print("Original array",arr)
print("Same array via arr1",arr1)
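The practical difference between copy() and = shows up once the original array is modified. The following is a small sketch with illustrative variable names: the name bound with = still refers to the same data, while the copy is independent.

```python
import numpy as np

original = np.array([10, 20, 30])
alias = original               # same array object, no data is copied
independent = original.copy()  # new array with its own data

original[0] = 99               # modify through the original name

print(alias)        # the alias sees the change
print(independent)  # the copy keeps the old values
print(alias is original, independent is original)
```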
Numpy Indexing
• Elements in NumPy arrays can be accessed by indexing.
• Indexing is an operation that pulls out a select set of values from an array.
• The index of a value in an array is that value's location within the array.
• There is a difference between the value and where the value is stored in
an array.
• An array with 3 values is created in the code section below.
import numpy as np
a = np.array([2,4,6])
print(a)
[2 4 6]
• The array above contains three values: 2, 4 and 6. Each of these values
has a different index.
Remember counting in Python starts at 0 and ends at n-1.
• The value 2 has an index of 0. We could also say 2 is in location 0 of the
array.
• The value 4 has an index of 1 and the value 6 has an index of 2.

• The table below shows the index (or location) of each value in the array.

Index   0   1   2
Value   2   4   6

• Individual values stored in an array can be accessed with indexing.


• The general form to index a NumPy array is below:
<value> = <array>[index]
• Where <value> is the value stored in the array, <array> is the array object name
and [index] specifies the index or location of that value.
• In the array above, the value 6 is stored at index 2.
import numpy as np
a = np.array([2,4,6])
print(a)
value = a[2]
print(value)
[2 4 6]
6
Multi-dimensional Array Indexing:
• Multi-dimensional arrays can be indexed as well. A simple 2-D array is defined by a list of
lists.
import numpy as np
a = np.array([[2,3,4],[6,7,8]])
print(a)
[[2 3 4]
[6 7 8]]
Values in a 2-D array can be accessed using the general notation below:
<value> = <array>[row,col]
• Where <value> is the value pulled out of the 2-D array and [row,col] specifies the row and
column index of the value.
• We can access the value 8 in the array above by calling the row and column index [1,2]. This
corresponds to the 2nd row (remember row 0 is the first row) and the 3rd column (column 0 is
the first column).

import numpy as np
a = np.array([[2,3,4],[6,7,8]])
print(a)
value = a[1,2]
print(value)
[[2 3 4]
[6 7 8]]
8
Assigning Values with Indexing
• Array indexing is used to access values in an array. It can also be used to assign values to
an array.
• The general form used to assign a value to a particular index or location in an array is below:
<array>[index] = <value>
• Where <value> is the new value going into the array and [index] is the location the new value will occupy.
• The code below puts the value 10 into index 2 (the third location) of the array a.

import numpy as np
a = np.array([2,4,6])
a[2] = 10
print(a)
[ 2 4 10]
• Values can also be assigned to a particular location in a 2-D array using the form:
<array>[row,col] = <value>
• The code example below shows the value 20 assigned to the 2nd row (index 1) and 3rd column (index 2) of the
array.
import numpy as np
a = np.array([[2,3,4],[6,7,8]])
print(a)
a[1,2]=20
print(a)
[[2 3 4]
[6 7 8]]
[[ 2 3 4]
[ 6 7 20]]
Negative Indexing
• Use negative indexing to access an array from the end.
Example:
Print the last element of the second row:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('Last element of the second row: ', arr[1, -1])
Output:
Last element of the second row:  10
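A negative index counts back from the end of an axis, so index -1 refers to the same element as index n-1. The sketch below checks this on the same array; the variable names are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
n_cols = arr.shape[1]  # number of columns in each row

last_via_negative = arr[1, -1]          # last column, counted from the end
last_via_positive = arr[1, n_cols - 1]  # same element, counted from the start
print(last_via_negative, last_via_positive)

# Negative indices work on every axis: last row, second-to-last column.
second_to_last = arr[-1, -2]
print(second_to_last)
```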

Unit – 05 : Session – 02: SLO - 1

SRM Institute of Science and Technology 13


Numpy Array attributes

• We are going to learn about the array attributes in NumPy that are essential to know if you want to transform your arrays.

• Array attributes in Numpy


• ndarray.shape
• ndarray.ndim
• ndarray.itemsize
• ndarray.T



Array Attributes in Numpy
• Array attributes describe properties of an array such as its
shape, number of dimensions, and item size.
• They are accessed on a NumPy ndarray object, for example
arr.shape.



ndarray.shape
• This attribute returns the dimensions of the array as a tuple.
Assigning a new tuple to it reshapes the array in place.

import numpy as np
arr = np.array([[1,2,3,4],[5,6,7,8]])
arr
O/P: array([[1,2,3,4], [5,6,7,8]])
You can change the shape of the array by assigning a new tuple with the same total number of elements
arr.shape = (4,2)
arr
O/P: array([[1,2], [3,4], [5,6], [7,8]])



ndarray.ndim
• This attribute returns the number of dimensions of the array.

arr = np.array([[1,2,3,4],[5,6,7,8]])
arr
O/P: array([[1,2,3,4], [5,6,7,8]])

arr = np.arange(10).reshape(2,5)
arr
O/P: array([[0,1,2,3,4], [5,6,7,8,9]])

arr.ndim
O/P: 2
ndarray.itemsize
• This attribute returns the size of each array element
in bytes.

import numpy as np
arr = np.array([1,2,3,4,5])
arr.itemsize

O/P: 8 (with the default 64-bit integer dtype; the value depends on the dtype)



ndarray.T
• This attribute returns the transpose of the array. It
converts rows into columns and columns into rows.

arr = np.array([[1,2,3,4], [5,6,7,8]])
arr
O/P: array([[1,2,3,4], [5,6,7,8]])

Apply the transpose

arr.T
O/P: array([[1,5], [2,6], [3,7], [4,8]])
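The four attributes can be read from a single array, as in the sketch below. The dtype is set explicitly to np.int64 here (an assumption made so that itemsize is the same on every platform), and reshape() is shown as an alternative that returns a new array instead of modifying the shape in place.

```python
import numpy as np

# Explicit 64-bit integers, so each element occupies exactly 8 bytes.
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int64)

print(arr.shape)     # rows and columns as a tuple
print(arr.ndim)      # number of dimensions
print(arr.itemsize)  # bytes per element

transposed = arr.T            # rows become columns
reshaped = arr.reshape(4, 2)  # same elements in row order, new shape

print(transposed.shape)
print(reshaped)
```

Note that reshape(4, 2) and .T both produce a 4 x 2 result here, but with the elements in a different order.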



Slicing using Numpy
• Slicing in Python means taking elements from one given
index up to, but not including, another given index.

We pass a slice instead of an index, like this:

[start:end]

We can also define the step, like this:

[start:end:step]



Slicing using numpy
Example:
Slice elements from index 1 up to (but not including) index 5 of the following
array:
import numpy as np
arr = np.array([1,2,3,4,5,6,7])
print(arr[1:5])

O/P: [2 3 4 5]



Negative Slicing
Use the minus operator to refer to an index from the end
Example:
Slice from index 3 from the end to index 1 from the
end:
import numpy as np
arr = np.array([1,2,3,4,5,6,7])
print(arr[-3:-1])

O/P: [5 6]



STEP
Use the step value to determine the step of the slicing
Example:
Return every other element from index 1 to index 5:
import numpy as np
arr = np.array([1,2,3,4,5,6,7])
print(arr[1:5:2])

O/P: [2 4]



Slicing 2D Array
Example:
From the second row, slice elements from index 1 up to (but not including) index 4:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print(arr[1, 1:4])

O/P: [7 8 9]
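Slices can be applied to both axes of a 2D array at once. The sketch below, with illustrative variable names, takes a rectangular block and a single column, and also shows that a basic slice is a view onto the original data rather than a copy.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

block = arr[0:2, 1:4]  # both rows, columns 1 to 3
column = arr[:, 2]     # every row, column 2 only

print(block)
print(column)

# A basic slice shares memory with the original array,
# so writing to the slice also changes arr.
block[0, 0] = 100
print(arr[0, 1])
```

If an independent block is needed, calling .copy() on the slice avoids changing the original array.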



Unit – 05 : Session – 03 : SLO - 03
Descriptive Statistics in NumPy
• Descriptive Statistics is the building block of data science.
• Advanced analytics is often incomplete without analyzing descriptive statistics
of the key metrics.
• In simple terms, descriptive statistics can be defined as the measures that
summarize a given data, and these measures can be broken down further into
the measures of central tendency and the measures of dispersion.
• Measures of dispersion are values that describe how the data varies.
• They give us a sense of how much the data tends to diverge from the typical
value, while measures of central tendency give us an idea of the typical value of the
distribution.
• Measures of central tendency include mean, median, and the mode.
• On the other hand, the measures of dispersion include standard deviation,
variance, and the interquartile range.
• We will cover the following topics in detail:
1. Percentile in NumPy.
2. Variance in Numpy.
Percentile

• Percentile is a measure which indicates the value below which a given percentage of
points in a dataset fall. For instance, the 35th percentile(\(P_{35}\)) is the score below
which 35% of the data points may be found.
• We can observe that median represents the 50th percentile. Similarly, we can have 0th
percentile representing the minimum and 100th percentile representing the maximum
of all data points.
• There are various methods of calculation of quartiles and percentiles, but we will stick to
the one below. To calculate \(k^{th}\) percentile(\(P_{k}\)) for a data set of \(N\)
observations which is arranged in increasing order, go through the following steps:
• Step 1: Calculate \(\displaystyle i=\frac{k}{100}\times N\)
• Step 2: If \(i\) is a whole number, then count the observations in the data set from left to
right till we reach the \(i^{th}\) data point. The \(k^{th}\) percentile, in this case, is equal
to the average of the value of \(i^{th}\) data point and the value of the data point that
follows it.
• Step 3: If \(i\) is not a whole number, then round it up to the nearest integer and count
the observations in the data set from left to right till we reach the \(i^{th}\) data point.
The \(k^{th}\) percentile is then equal to the value corresponding to this data point.
Example

• Suppose we want to calculate \(P_{27}\) for the marks of students in Subject 2.
Let us first arrange the data in increasing order, which results in the following
dataset: {8, 9, 12, 14.5, 15.5, 17, 18}.
• Following the steps above,
Step 1: \(\displaystyle i=\frac{27}{100}\times 7 = 1.89\)
Step 2: Not applicable here as 1.89 is not a whole number, so let us move
to step 3.
Step 3: Rounding up 1.89 gives 2, hence the 27th percentile is the value of the
second observation, i.e., 9.
Therefore, 9 is the \(27^{th}\) percentile, which means that 27% of the students have
scored below 9.
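The three steps above can be sketched in plain Python as follows. The function name manual_percentile is hypothetical, and the handling of the two boundary cases (i = 0 and i = N, where the i-th or the following data point does not exist) is an assumption, since the steps do not cover them.

```python
import math

def manual_percentile(values, k):
    """k-th percentile following the three-step rule described above."""
    data = sorted(values)  # the rule assumes increasing order
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    if not 0 <= k <= 100:
        raise ValueError("k must be between 0 and 100")

    # Step 1: i = (k / 100) * N. Wholeness is tested on k * N
    # to avoid floating-point rounding in the division.
    if (k * n) % 100 == 0:
        i = int((k * n) // 100)
        # Assumption: clamp at the ends, where a neighbour is missing.
        if i == 0:
            return data[0]
        if i == n:
            return data[-1]
        # Step 2: average the i-th data point and the one after it
        # (1-based positions i and i + 1).
        return (data[i - 1] + data[i]) / 2

    # Step 3: round i up and take the value at that 1-based position.
    i = math.ceil(k * n / 100)
    return data[i - 1]

subject_2 = [8, 9, 17, 14.5, 12, 18, 15.5]  # unsorted marks from the example
print(manual_percentile(subject_2, 27))     # matches the worked example
```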
Percentiles with NumPy
• numpy.percentile(a, q, axis=None, interpolation='linear')
• a: array containing the numbers whose percentile is required
q: percentile to compute (must be between 0 and 100)
axis: axis or axes along which the percentile is computed; the default is to compute it
over the flattened array
interpolation: one of 'linear', 'lower', 'higher', 'midpoint' or
'nearest'. This parameter specifies the method to use when the
desired percentile lies between two data points, say i and j. In NumPy 1.22 and
later this parameter is named method, and interpolation is a deprecated alias.
• linear: returns i + (j-i)*fraction, where fraction is the fractional part of the index
surrounded by i and j
• lower: returns i
• higher: returns j
• midpoint: returns (i+j)/2
• nearest: returns the nearest point, whether i or j
• numpy.percentile() agrees with the manual calculation of percentiles (as shown
above) only when interpolation is set to 'lower'.
Example

>>> import numpy as np
>>> A=np.array([[10,14,11,7,9.5,15,19],[8,9,17,14.5,12,18,15.5],
...             [15,7.5,11.5,10,10.5,7,11],[11.5,11,9,12,14,12,7.5]])
>>> B=A.T
>>> a=np.percentile(B,27,axis=0, interpolation='lower')
>>> b=np.percentile(B,25,axis=1, interpolation='lower')
>>> c=np.percentile(B,75,axis=0, interpolation='lower')
>>> d=np.percentile(B,50,axis=0, interpolation='lower')
>>> print(a)
[ 9.5 9. 7.5 9. ]
>>> print(b)
[ 8. 7.5 9. 7. 9.5 7. 7.5]
>>> print(c)
[ 14. 15.5 11. 12. ]
>>> print(d)
[ 11. 14.5 10.5 11.5]
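Newer NumPy releases (1.22 and later) name this keyword method instead of interpolation. The sketch below is one way to write a call that works with either spelling; the helper name percentile_lower is hypothetical, and the data is the Subject 2 row from the example.

```python
import numpy as np

marks = np.array([8, 9, 17, 14.5, 12, 18, 15.5])  # Subject 2 marks

def percentile_lower(a, q):
    """Percentile with the 'lower' rule, for old and new NumPy versions."""
    try:
        return np.percentile(a, q, method="lower")         # NumPy 1.22+
    except TypeError:
        return np.percentile(a, q, interpolation="lower")  # older NumPy

lower_27 = percentile_lower(marks, 27)
linear_27 = np.percentile(marks, 27)  # default 'linear' rule

print(lower_27)   # the data point just below the fractional position
print(linear_27)  # interpolated between the two neighbouring points
```

For this data the 'lower' rule returns 9, matching the manual calculation, while the default 'linear' rule interpolates between 9 and 12.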
Importance of Percentile
• Percentile gives the relative position of a particular value within the dataset.
• If we are interested in relative positions, then the mean and standard deviation
do not make sense.
• In the case of exam scores, we do not know whether it was a difficult
exam in which 7 points out of 20 was an amazing score.
• In this case, raw scores by themselves are meaningless, but the percentile
reflects the relative standing.
• For example, GRE and GMAT scores are all in terms of percentiles.
• Another good property of percentiles is that they have a universal interpretation,
i.e., it does not depend on whether we are looking at exam scores or the
height of the players across a few basketball teams.
• The 55th percentile always means that 55% of the values are found below
that value and the other 45% above it.
• It helps in comparing data sets which have different means and deviations.
Variance
• Variance is another measure of dispersion.

• It is the square of the standard deviation and the covariance of the random
variable with itself.

• numpy.var(a, axis=None, dtype=None, ddof=0)

• Parameters are the same as numpy.mean except:
• ddof : int, optional (ddof stands for delta degrees of freedom. The divisor
used in the calculation is N – ddof, where N is the number of elements.
The default value of ddof is 0.)
Variance with Numpy

>>> import numpy as np
>>> A=np.array([[10,14,11,7,9.5,15,19],[8,9,17,14.5,12,18,15.5],
...             [15,7.5,11.5,10,10.5,7,11],[11.5,11,9,12,14,12,7.5]])
>>> B=A.T
>>> a = np.var(B,axis=0)
>>> b = np.var(B,axis=1)
>>> print(a)
[ 13.98979592 12.8877551 6.12244898 3.92857143]
>>> print(b)
[ 6.546875 5.921875 8.796875 7.546875 2.875 16.5 19.0625 ]
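The effect of ddof is easiest to see on a tiny data set. The sketch below uses illustrative values that are not from the slides and compares the default divisor N with the divisor N - 1 obtained with ddof=1.

```python
import numpy as np

sample = np.array([2.0, 4.0, 6.0])  # illustrative data, mean is 4.0

population_var = np.var(sample)      # ddof=0: divide by N
sample_var = np.var(sample, ddof=1)  # ddof=1: divide by N - 1

# The same quantity written out by hand for ddof=1.
squared_dev = (sample - sample.mean()) ** 2
manual_sample_var = squared_dev.sum() / (len(sample) - 1)

print(population_var)
print(sample_var, manual_sample_var)
```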


Importance of Variance:
• It is an important measure in descriptive statistics because it allows us to
measure the spread of a data set around its mean.
• If the observations in a data set are highly spread out, a summary such as
the mean may not be meaningful on its own.

Limitations of descriptive statistics


• Descriptive statistics are limited in that we can only summarize
the people or objects that were actually measured.
• The data cannot be used to generalize to other people or objects.
• For example, if we have recorded the marks of the students for the past few
years and want to predict the marks for the next exam, we cannot do that
relying only on descriptive statistics; inferential statistics would help.
• Descriptive statistics can often be difficult when we are dealing with a large
dataset.
Querying from Data Frames

Definition and Usage

The query() method allows you to query the DataFrame.

The query() method takes a query expression as a string parameter, which has to
evaluate to either True or False for each row.

It returns the rows of the DataFrame where the query expression evaluates
to True.

Syntax
dataframe.query(expr, inplace)
Parameters
The inplace parameter is a keyword argument.

Parameter   Value        Description

expr                     Required. A string that represents a query expression.
inplace     True|False   Optional. A boolean value that specifies whether the query()
                         method should leave the original DataFrame untouched and
                         return a copy (inplace = False, the default), or make the
                         changes in the original DataFrame (inplace = True).
Return Value

A DataFrame with the new result, or None if the changes were
made in the original DataFrame (inplace = True)

import pandas as pd

data = {
"name": ["Sally", "Mary", "John"],
"age": [50, 40, 30]
}

df = pd.DataFrame(data)

print(df.query('age > 35'))


Output:

    name  age
0  Sally   50
1   Mary   40
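Query expressions can combine several conditions, and the inplace argument changes what is returned. The sketch below reuses the same data; the variable names and the second condition are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Sally", "Mary", "John"], "age": [50, 40, 30]})

# Combine conditions with "and"; string values are quoted inside the expression.
older_not_mary = df.query('age > 35 and name != "Mary"')
print(older_not_mary)
print(len(df))  # the original DataFrame still has all rows

# With inplace=True the DataFrame itself is filtered and None is returned.
working = df.copy()
result = working.query("age < 45", inplace=True)
print(result)
print(working)
```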
Unit – 05 : Session – 08 : SLO - 01
Speed Testing between NumPy and Pandas

• Pandas and NumPy are both essential tools in Python.
• NumPy runs vector and matrix operations very efficiently, while Pandas
provides R-like data frames that allow intuitive tabular data analysis.
• NumPy is more optimized for arithmetic computations.
• Pandas performs better when the number of rows is 500K or more.
• NumPy performs better when the number of rows is 50K or less.
• Indexing a Pandas Series is very slow compared to indexing a NumPy array.
• Indexing of NumPy arrays is very fast.


• To understand the speed comparison between NumPy and Pandas, let's take the example of
indexing Pandas Series objects and NumPy arrays.
• Example:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: np.__version__
Out[3]: '1.8.2'
In [4]: pd.__version__
Out[4]: '0.14.1'
In [5]: a = np.arange(100)
In [6]: aa = np.arange(100, 200)
In [7]: s = pd.Series(a)
In [8]: ss = pd.Series(aa)
In [9]: i = np.random.choice(a, size=10)
• Performance comparison of NumPy and Pandas:

In [10]: %timeit a[i]
1000000 loops, best of 3: 998 ns per loop
In [11]: %timeit s[i]
10000 loops, best of 3: 168 µs per loop
Indexing the array is over 100 times faster than indexing the Series.
In [12]: %timeit a * aa
1000000 loops, best of 3: 1.21 µs per loop
In [13]: %timeit s * ss
10000 loops, best of 3: 88.5 µs per loop
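The %timeit commands above only work inside IPython. A rough equivalent in a plain Python script could use the standard timeit module, as in the sketch below. The repeat count is an arbitrary choice, and the measured times depend on the machine and on the installed NumPy and Pandas versions, so they will not match the figures above.

```python
import timeit

import numpy as np
import pandas as pd

a = np.arange(100)
s = pd.Series(a)
i = np.random.choice(a, size=10)  # ten random positions to look up

repeats = 1000  # arbitrary number of repetitions for this sketch

# timeit returns the total time in seconds for all repetitions.
array_time = timeit.timeit(lambda: a[i], number=repeats)
series_time = timeit.timeit(lambda: s[i], number=repeats)

print("array  indexing: %.2f microseconds per call" % (array_time / repeats * 1e6))
print("series indexing: %.2f microseconds per call" % (series_time / repeats * 1e6))
```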
Why is Pandas so much slower than NumPy?
• Pandas does a lot of work when you index into a
Series, and it does that work in Python.
• As an illustration, here’s a visualization made by profiling s[i]
(refer to the picture below).
• Each colored arc is a different function call in Python. There are
about 100 calls there.
• By contrast, here’s the visualization made by profiling a[i]:
• There’s actually nothing to see because array indexing goes
straight into the NumPy C extensions, and the Python profiler
can’t see what’s going on there.
• Pandas is fast enough most of the time, and you get the benefit of
Pandas’ sophisticated indexing features.
• It’s only in loops that the microseconds start to add up to minutes.
Unit – 05 : Session – 02 : SLO - 02



Other Python Libraries (1/12)
• The Python standard library is distributed with Python alongside the
core language (its syntax, semantics, and tokens).
• It comes with built-in modules that give users access to basic
functionality such as I/O, along with other important modules.
• The Python library is mostly written in the C language.
• The Python standard library has over 200 core modules.
• All these factors make Python a powerful programming language.
• The Python standard library is very important.



Other Python Libraries (2/12)
• Without the standard library, programmers cannot access much of Python's functionality.
• Apart from that, Python has several libraries that make the
programmer's life easier.
• Let's explore some of the most popular libraries:
• Matplotlib
• SciPy
• scikit-learn
• Seaborn
• TensorFlow
• Keras
• Scrapy
• PyGame
• PyBrain
• Statsmodels
Other Python Libraries (3/12)
Matplotlib:
• This library handles the plotting of numerical data.
• For this reason, it is used in data analysis.
• It is an open source library for plotting high-quality figures
such as pie charts, scatterplots, boxplots and graphs.



Other Python Libraries (4/12)
SciPy:
• SciPy is a Python library.
• It is an open-source library, especially designed for scientific
computing, information processing, and high-level computing.
• The library includes a large number of user-friendly methods and
functions for quick and convenient computation.
• SciPy can be used for mathematical computations alongside
NumPy.
• cluster, fftpack, constants, integrate, io, linalg, interpolate,
ndimage, odr, optimize, signal, spatial, special, sparse, and stats
are just a few of the subpackages available in SciPy.



Other Python Libraries (5/12)
scikit-learn:
• scikit-learn is also a Python-based open source machine learning
library.
• Both supervised and unsupervised learning methods are available
in this library.
• It provides common algorithms and is built on NumPy and SciPy,
and it works together with Matplotlib.
• A well-known application of scikit-learn is Spotify
music recommendations.
Other Python Libraries (6/12)
Seaborn:
• This package allows visualization of statistical models. It
is mainly based on Matplotlib and lets you create
statistical graphics in the following ways:
• A dataset-oriented API for comparing variables
• Easy creation of complex visualizations, including multi-plot grids
• Univariate and bivariate visualizations for comparing
subsets of data



Other Python Libraries (7/12)
TensorFlow:
• TensorFlow is an open source library for high performance
numerical computing.
• Deep learning and ML algorithms also make use of it.
• It was developed by researchers in the Google Brain team
within the Google AI organization and is now widely used by
mathematicians, physicists and machine learning researchers for
complex mathematical calculations.



Other Python Libraries (8/12)
Keras:
• Keras is an open-source, Python-based neural network library that
allows in-depth experimentation with deep neural networks.
• As deep learning becomes more popular, Keras is emerging as a
viable option.
• According to its developers, Keras is an API (application
programming interface) designed for humans, not machines.
• Compared to TensorFlow and Theano, Keras has a higher
acceptance rate in the research community and industry.
• Before installing Keras, users must first install the TensorFlow
backend engine.



Other Python Libraries (9/12)
Scrapy:
• Scrapy is a web scraping tool that can scrape multiple pages within a
minute.
• Scrapy is also an open source Python framework for
extracting data from websites.
• Maintained by Scrapinghub Ltd, it is a fast, high-level web
crawling and scraping library.



Other Python Libraries (10/12)
pygame:
• This library provides a simple interface to the Simple
DirectMedia Layer (SDL) graphics, audio, and input libraries,
which work across platforms.
• It is used to develop video games using computer graphics and sound
libraries in the Python programming language.



Other Python Libraries (11/12)
PyBrain:
• PyBrain is a fast and easy machine learning library compared to
other Python learning libraries.
• PyBrain is also an open source library of ML algorithms, suitable for
beginners getting started with research.
• PyBrain's main goal is to provide ML algorithms that are flexible
and easy to use even for novice programmers.
• It also includes ready-made environments for comparing
algorithms.
Other Python Libraries (12/12)
statsmodels:
• Statsmodels is a Python library useful for analyzing and
estimating statistical models.
• The library is used to perform statistical tests and other tasks with
high quality results.
• With its user-friendly interface, the Python programming language is
widely used in many real-world applications.
• Because it is a dynamically typed, high-level language, it is spreading
rapidly in the area of troubleshooting.
• Python is increasingly being used in popular applications such as
YouTube and DropBox.
• The availability of Python libraries also allows users to perform
many tasks without writing the code themselves.