Unit 5
arr2 = np.array([[10,20,30],[40,50,60]])
print("My 2D numpy array:\n", arr2)
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
• The table below shows the index (or location) of each value in the array.
import numpy as np
a = np.array([[2,3,4],[6,7,8]])
print(a)
value = a[1,2]   # row index 1, column index 2
print(value)
[[2 3 4]
 [6 7 8]]
8
Assigning Values with Indexing
• Array indexing is used to access values in an array. It can also be used to assign new values to
an array.
• The general form used to assign a value to a particular index or location in an array is below:
<array>[index] = <value>
• Where <value> is the new value going into the array and [index] is the location the new value will occupy.
• The code below puts the value 10 into index 2 (the third location) of the array a.
import numpy as np
a = np.array([2,4,6])
a[2] = 10
print(a)
[ 2  4 10]
• Values can also be assigned to a particular location in a 2-D array using the form:
<array>[row,col] = <value>
• The code example below shows the value 20 assigned to the 2nd row (index 1) and 3rd column (index 2) of the
array a.
import numpy as np
a = np.array([[2,3,4],[6,7,8]])
print(a)
a[1,2] = 20
print(a)
[[2 3 4]
 [6 7 8]]
[[ 2  3  4]
 [ 6  7 20]]
Negative Indexing
• Use negative indexing to access an array from the end.
Print the last element from the 2nd dim:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('Last element from 2nd dim: ', arr[1, -1])
Last element from 2nd dim: 10
import numpy as np
arr = np.array([[1,2,3,4],[5,6,7,8]])
arr
O/P: array([[1,2,3,4], [5,6,7,8]])
You can change the shape of the array in place by assigning a new tuple to arr.shape, as long as the total number of elements stays the same.
arr.shape = (4,2)
arr
O/P: array([[1,2],[3,4],[5,6],[7,8]])
arr = np.arange(10).reshape(2,5)
arr
O/P: array([[0,1,2,3,4],[5,6,7,8,9]])
O/P: 2
SRM Institute of Science and Technology 17
• The itemsize attribute returns the size, in bytes, of each
element of the array.
import numpy as np
arr = np.array([1,2,3,4,5])
arr.itemsize
O/P: 8 (for the default 64-bit integer type)
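The value reported by itemsize depends on the array's dtype, which in turn can depend on the platform's default integer size. A minimal sketch, using hypothetical arrays with the dtype set explicitly so the result is the same everywhere:

```python
import numpy as np

# Hypothetical arrays; the dtype is fixed so the result does not depend
# on the platform's default integer size.
a32 = np.array([1, 2, 3, 4, 5], dtype=np.int32)
a64 = np.array([1, 2, 3, 4, 5], dtype=np.int64)

print(a32.itemsize)  # 4 bytes per element
print(a64.itemsize)  # 8 bytes per element

# The data buffer occupies size * itemsize bytes in total.
print(a64.nbytes == a64.size * a64.itemsize)  # True
```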
O/P: [2,3,4,5]
O/P: [5,6]
O/P: [2,4]
O/P: [7,8,9]
• A percentile is a measure that indicates the value below which a given percentage of
points in a dataset fall. For instance, the 35th percentile (\(P_{35}\)) is the score below
which 35% of the data points may be found.
• We can observe that the median represents the 50th percentile. Similarly, the 0th
percentile represents the minimum and the 100th percentile represents the maximum
of all data points.
• There are various methods of calculating quartiles and percentiles, but we will stick to
the one below. To calculate the \(k^{th}\) percentile (\(P_{k}\)) for a data set of \(N\)
observations arranged in increasing order, go through the following steps:
• Step 1: Calculate \(\displaystyle i=\frac{k}{100}\times N\)
• Step 2: If \(i\) is a whole number, then count the observations in the data set from left to
right till we reach the \(i^{th}\) data point. The \(k^{th}\) percentile, in this case, is equal
to the average of the value of \(i^{th}\) data point and the value of the data point that
follows it.
• Step 3: If \(i\) is not a whole number, then round it up to the nearest integer and count
the observations in the data set from left to right till we reach the \(i^{th}\) data point.
The \(k^{th}\) percentile is then equal to the value of this data point.
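The three steps above can be sketched in plain Python. The function name manual_percentile and the scores list are hypothetical, chosen only for illustration, and the sketch assumes 0 < k < 100 so that the data point following the i-th one always exists in Step 2.

```python
import math

def manual_percentile(data, k):
    """k-th percentile by the three-step rule, assuming 0 < k < 100."""
    if not 0 < k < 100:
        raise ValueError("this sketch only handles 0 < k < 100")
    values = sorted(data)            # observations in increasing order
    n = len(values)
    i = k * n / 100                  # Step 1
    if i.is_integer():               # Step 2: average i-th and (i+1)-th points
        i = int(i)
        return (values[i - 1] + values[i]) / 2
    i = math.ceil(i)                 # Step 3: round up, take the i-th point
    return values[i - 1]

scores = [3, 5, 7, 8, 9, 11, 13, 15]      # hypothetical data, N = 8
print(manual_percentile(scores, 50))      # i = 4, so (8 + 9) / 2 = 8.5
print(manual_percentile(scores, 27))      # i = 2.16, rounded up to 3, so 7
```

Positions in the steps are counted from 1, while Python lists are indexed from 0, which is why the code reads values[i - 1] for the i-th data point.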
Step 2: Not applicable here, as 1.89 is not a whole number, so let us move
to Step 3.
Therefore, 9 is the \(27^{th}\) percentile, which means that 27% of the students have
scored below 9.
Percentiles with NumPy
• numpy.percentile(a, q, axis=None, interpolation='linear')
• a: array containing the numbers whose percentile is required
q: percentile to compute (must be between 0 and 100)
axis: axis or axes along which the percentile is computed; the default is to compute it
over the flattened array
interpolation: it can take the values 'linear', 'lower', 'higher', 'midpoint' or
'nearest' (this parameter is named method in NumPy 1.22 and later). It specifies the
method to use when the desired percentile lies between two data points, say i and j.
• linear: returns i + (j-i)*fraction, fraction here is the fractional part of the index
surrounded by i and j
• lower: returns i
• higher: returns j
• midpoint: returns (i+j)/2
• nearest: returns the nearest point whether i or j
• numpy.percentile() agrees with the manual calculation of percentiles (as shown
above) only when interpolation is set to 'lower'.
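The effect of each option can be seen on a small array. This is a minimal sketch with hypothetical data; it assumes NumPy 1.22 or later, where the keyword is spelled method (older releases use interpolation for the same purpose).

```python
import numpy as np

data = np.array([10.0, 20.0, 30.0, 40.0, 50.0])  # hypothetical sample

# q = 40 falls at fractional index 0.4 * (5 - 1) = 1.6,
# i.e. between i = 20.0 and j = 30.0.
results = {}
for method in ("linear", "lower", "higher", "midpoint", "nearest"):
    results[method] = float(np.percentile(data, 40, method=method))
    print(method, results[method])

# lower    -> 20.0  (returns i)
# higher   -> 30.0  (returns j)
# midpoint -> 25.0  (returns (i + j) / 2)
# linear   -> about 26.0  (i + (j - i) * 0.6)
```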
• Variance is the square of the standard deviation and the covariance of the random
variable with itself.
>>> A=np.array([[10,14,11,7,9.5,15,19],[8,9,17,14.5,12,18,15.5],
>>> B=A.T
>>> a = np.var(B,axis=0)
>>> b = np.var(B,axis=1)
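The axis argument of np.var and the relationship between variance and standard deviation can be checked on a small array. The data below is hypothetical and is not the array A from the example above.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # hypothetical data

col_var = np.var(A, axis=0)          # variance down each column
row_var = np.var(A, axis=1)          # variance across each row
print(col_var)
print(row_var)

# Variance equals the squared standard deviation.
print(np.allclose(col_var, np.std(A, axis=0) ** 2))    # True

# It is also the covariance of a variable with itself:
# the mean of (x - mean(x)) * (x - mean(x)).
x = A[0]
self_cov = np.mean((x - x.mean()) * (x - x.mean()))
print(np.isclose(self_cov, np.var(x)))                 # True
```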
Querying from Data Frames
The query() method takes a query expression as a string parameter, which has to
evaluate to either True or False for each row.
It returns a DataFrame containing the rows for which the expression is True.
dataframe.query(expr, inplace)
The inplace parameter is a keyword argument; when it is True, the original DataFrame is modified and nothing is returned.
import pandas as pd
data = {
  "name": ["Sally", "Mary", "John"],
  "age": [50, 40, 30]
}
df = pd.DataFrame(data)
name age
0 Sally 50
1 Mary 40
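A complete run of query() on this data might look like the sketch below. The expression "age > 35" is an assumption: it is one condition that produces the two rows shown above, but the original expression is not given.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Sally", "Mary", "John"],
    "age": [50, 40, 30],
})

# Assumed expression: keep the rows where age is greater than 35.
older = df.query("age > 35")
print(older)

# df itself is unchanged because inplace defaults to False.
print(len(df))  # 3
```

With inplace=True the filtering would be applied to df directly and the call would return None.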
Unit – 05 : Session – 08 : SLO - 01
Speed Testing between NumPy and Pandas
• NumPy runs vector and matrix operations very efficiently, while Pandas
provides R-like data frames that allow intuitive tabular data analysis.
• NumPy is more optimized for arithmetic computations.
• NumPy performs better when the number of rows is 50K or fewer.
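A rough way to test the speed claim on your own machine is to time the same operation on a NumPy array and on a Pandas Series. This is only a sketch: the choice of sum() as the operation, the repeat count, and the random data are assumptions, and the timings will vary with hardware and library versions.

```python
import timeit

import numpy as np
import pandas as pd

n = 50_000                                   # row count from the threshold above
values = np.random.default_rng(0).random(n)  # hypothetical random data
series = pd.Series(values)

# Time 200 repetitions of the same reduction on each container.
numpy_time = timeit.timeit(lambda: values.sum(), number=200)
pandas_time = timeit.timeit(lambda: series.sum(), number=200)

print(f"NumPy : {numpy_time:.4f} s")
print(f"Pandas: {pandas_time:.4f} s")
```

Changing n makes it possible to see how the gap between the two behaves as the number of rows grows.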