Unit 5
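import numpy as np
arr2 = np.array([[10,20,30],[40,50,60]])
print("My 2D numpy array:\n", arr2)
Output:
My 2D numpy array:
[[10 20 30]
[40 50 60]]
A 3-D array is printed as a stack of 2-D blocks, for example (partial output):
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],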
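The number of dimensions of an array can be checked with its ndim and shape attributes. The sketch below compares a 2-D and a 3-D array; the 3-D array built with np.arange(18).reshape(2, 3, 3) is an illustrative assumption, not taken from the slide.

```python
import numpy as np

arr2 = np.array([[10, 20, 30], [40, 50, 60]])   # 2 rows x 3 columns
arr3 = np.arange(18).reshape(2, 3, 3)            # 2 blocks, each 3 x 3

print(arr2.ndim, arr2.shape)   # 2 (2, 3)
print(arr3.ndim, arr3.shape)   # 3 (2, 3, 3)
print(arr3[1])                 # the second 3 x 3 block (values 9 to 17)
```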
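• The table below shows the index (or location) of each value in the 2-D array a used in the next example:
         column 0    column 1    column 2
row 0    a[0,0] = 2  a[0,1] = 3  a[0,2] = 4
row 1    a[1,0] = 6  a[1,1] = 7  a[1,2] = 8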
import numpy as np
a = np.array([[2,3,4],[6,7,8]])
print(a)
value = a[1,2]
print(value)
[[2 3 4]
[6 7 8]]
8
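To list every index of a 2-D array at once, np.ndenumerate yields each (row, col) index together with its value. A small sketch using the same array a:

```python
import numpy as np

a = np.array([[2, 3, 4], [6, 7, 8]])

# ndenumerate yields (index_tuple, value) pairs in row-major order
for index, value in np.ndenumerate(a):
    print(index, value)

# Indexing with a tuple is the same as indexing with [row, col]
assert a[(1, 2)] == a[1, 2] == 8
```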
Assigning Values with Indexing
• Array indexing is used to access values in an array. It can also be used to assign values to an array.
• The general form used to assign a value to a particular index or location in an array is below:
<array>[index] = <value>
• Where <value> is the new value going into the array and [index] is the location the new value will occupy.
• The code below puts the value 10 at index 2 (the third location) of the array a.
import numpy as np
a = np.array([2,4,6])
a[2] = 10
print(a)
[ 2 4 10]
• Values can also be assigned to a particular location in a 2-D array using the form:
<array>[row,col] = <value>
• The code example below shows the value 20 assigned to the 2nd row (index 1) and 3rd column (index 2) of the
array.
import numpy as np
a = np.array([[2,3,4],[6,7,8]])
print(a)
a[1,2]=20
print(a)
[[2 3 4]
[6 7 8]]
[[ 2 3 4]
[ 6 7 20]]
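Indexing a 2-D array with a single index selects a whole row, so the same assignment form can replace a row at once. A short sketch on the array a from above; it also shows that a float assigned into an integer array is converted to the array's integer type (the fractional part is dropped):

```python
import numpy as np

a = np.array([[2, 3, 4], [6, 7, 8]])

a[0] = [9, 9, 9]     # replace the whole first row (index 0)
a[1, 0] = 5.7        # converted to the integer dtype of a, so 5 is stored
print(a)
```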
Negative Indexing
• Use negative indexing to access an array from the end.
Example:
Print the last element from the 2nd dim:
import numpy as np
arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])
print('Last element from 2nd dim: ', arr[1, -1])
Output:
Last element from 2nd dim: 10
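A negative index -k along an axis means position (length - k) on that axis, so arr[1, -1] and arr[1, 4] name the same element. A small sketch with the same array:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[-1, -1])   # last row, last column -> 10
print(arr[1, 4])     # same element with positive indices -> 10
print(arr[-2, -3])   # row 0, column 5 - 3 = 2 -> 3
print(arr[-1])       # the whole last row
```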
21CSS101J – PROGRAMMING FOR PROBLEM SOLVING
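import numpy as np
arr = np.array([[1,2,3,4],[5,6,7,8]])
arr
O/P: array([[1,2,3,4], [5,6,7,8]])
You can change the shape of an array by assigning a new tuple to its shape attribute (the total number of elements must stay the same):
arr.shape = (4,2)
arr
O/P: array([[1,2],[3,4],[5,6],[7,8]])
The reshape() method returns an array with the requested shape:
arr = np.arange(10).reshape(2,5)
arr
O/P: array([[0,1,2,3,4],[5,6,7,8,9]])
The ndim attribute gives the number of dimensions:
arr.ndim
O/P: 2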
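Assigning to shape changes the array itself, while reshape() leaves the original untouched and returns a reshaped array (a view of the same data when possible). A minimal sketch contrasting the two, plus the -1 shortcut that lets NumPy infer one dimension:

```python
import numpy as np

arr = np.arange(8)            # [0 1 2 3 4 5 6 7], ndim 1
r = arr.reshape(2, 4)         # reshaped array; arr itself keeps shape (8,)
print(arr.shape, r.shape)     # (8,) (2, 4)

arr.shape = (4, 2)            # reshapes arr in place
print(arr.shape, arr.ndim)    # (4, 2) 2

# -1 lets NumPy infer one dimension from the total size
print(np.arange(10).reshape(5, -1).shape)   # (5, 2)

# The total number of elements must stay the same
try:
    arr.reshape(3, 3)
except ValueError as err:
    print("ValueError:", err)
```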
SRM Institute of Science and Technology
ndarray.itemsize
• This attribute returns the size, in bytes, of each element of the array.
import numpy as np
arr = np.array([1,2,3,4,5])
arr.itemsize
O/P: 8 (each element is a 64-bit integer; on platforms whose default integer type is 32-bit, the result is 4)
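The element size depends on the array's dtype, and the total memory used by the elements (nbytes) is size × itemsize. A short sketch comparing a few dtypes:

```python
import numpy as np

for dtype in (np.int8, np.int32, np.float32, np.float64):
    arr = np.array([1, 2, 3, 4, 5], dtype=dtype)
    # nbytes = number of elements * bytes per element
    print(arr.dtype, arr.itemsize, arr.nbytes)
```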
• A percentile indicates the value below which a given percentage of the points in a dataset fall. For instance, the 35th percentile (\(P_{35}\)) is the score below which 35% of the data points may be found.
• The median is the 50th percentile. Similarly, the 0th percentile is the minimum and the 100th percentile is the maximum of all data points.
• There are various methods for calculating quartiles and percentiles; we will use the one below. To calculate the \(k^{th}\) percentile (\(P_{k}\)) of a data set of \(N\) observations arranged in increasing order, follow these steps:
• Step 1: Calculate \(\displaystyle i=\frac{k}{100}\times N\)
• Step 2: If \(i\) is a whole number, count the observations from left to right until you reach the \(i^{th}\) data point. The \(k^{th}\) percentile is then the average of the \(i^{th}\) data point and the data point that follows it.
• Step 3: If \(i\) is not a whole number, round it up to the nearest integer and count the observations from left to right until you reach the \(i^{th}\) data point. The \(k^{th}\) percentile is the value of this data point.
Example
Step 2: Not applicable here, because \(i = 1.89\) is not a whole number, so we move to Step 3.
Therefore, 9 is the \(27^{th}\) percentile, which means that 27% of the students scored below 9.
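The three-step method can be sketched in Python as below. This is a minimal sketch, assuming an integer k with 0 < k < 100 (the steps as written break down at k = 0 and k = 100). The score list is hypothetical: the example's data is not shown, but i = 1.89 with k = 27 implies N = 7.

```python
def manual_percentile(data, k):
    """k-th percentile by the three-step method (sketch: integer k, 0 < k < 100)."""
    if not 0 < k < 100:
        raise ValueError("this sketch only handles 0 < k < 100")
    x = sorted(data)     # the method needs the data in increasing order
    n = len(x)
    # Step 1: i = k/100 * N, kept as the integer product k*N to avoid float rounding
    kn = k * n
    if kn % 100 == 0:
        # Step 2: i is a whole number -> average the i-th value and the one after it
        i = kn // 100
        return (x[i - 1] + x[i]) / 2     # 1-based positions i and i+1
    # Step 3: i is not whole -> round up and take the i-th value
    i = (kn + 99) // 100                 # integer ceiling of kn / 100
    return x[i - 1]


scores = [18, 7, 12, 9, 15, 10, 14]       # hypothetical scores for N = 7 students
print(manual_percentile(scores, 27))      # i = 1.89 -> rounded up to 2 -> 9
```

Working with the integer product k × N keeps the "is i a whole number?" test exact, which a floating-point check like `(k / 100 * n).is_integer()` cannot always guarantee.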
Percentiles with NumPy
• numpy.percentile(a, q, axis=None, interpolation='linear')
• a: array containing the numbers whose percentile is required
q: percentile to compute (must be between 0 and 100)
axis: axis or axes along which the percentiles are computed; the default is to compute the percentile of the flattened array
interpolation: can take the values 'linear', 'lower', 'higher', 'midpoint' or 'nearest'. This parameter specifies the method used when the desired percentile lies between two data points, say i and j.
• linear: returns i + (j-i)*fraction, where fraction is the fractional part of the index surrounded by i and j
• lower: returns i
• higher: returns j
• midpoint: returns (i+j)/2
• nearest: returns i or j, whichever is nearest
• In NumPy 1.22 and later, interpolation is deprecated and renamed to method (for example, method='lower').
• numpy.percentile() does not follow the step-by-step method above: it places the \(k^{th}\) percentile at the 0-based position \((N-1)\times\frac{k}{100}\) of the sorted data. With interpolation set to 'lower' it often reproduces the manual result, but not always; for example, when \(i\) is a whole number the manual method averages two data points.
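The effect of each interpolation choice can be seen on a small sorted data set. This sketch assumes NumPy 1.22 or newer, so it passes method= (older versions use interpolation=); the scores are hypothetical. It also checks that the 0th, 50th and 100th percentiles are the minimum, median and maximum.

```python
import numpy as np

scores = np.array([7, 9, 10, 12, 14, 15, 18])   # hypothetical data, already sorted

# For q = 27 the 0-based position is (7 - 1) * 0.27 = 1.62, i.e. between 9 and 10
for m in ("linear", "lower", "higher", "midpoint", "nearest"):
    print(m, np.percentile(scores, 27, method=m))

# 0th, 50th and 100th percentiles: minimum, median and maximum
print(np.percentile(scores, [0, 50, 100]))
```

Here 'lower' gives 9, the same value the three-step method gives for the 27th percentile of these seven scores.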
• Variance is the square of the standard deviation; equivalently, it is the covariance of a random variable with itself.
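>>> import numpy as np
>>> A = np.array([[10,14,11,7,9.5,15,19],[8,9,17,14.5,12,18,15.5],
...               [15,7.5,11.5,10,10.5,7,11],[11.5,11,9,12,14,12,7.5]])
>>> B = A.T                  # transpose: B has 7 rows and 4 columns
>>> a = np.var(B, axis=0)    # variance of each column of B (4 values)
>>> b = np.var(B, axis=1)    # variance of each row of B (7 values)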
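The definition of variance can be checked numerically on one row of A: np.var matches the mean squared deviation from the mean, the square of np.std, and the covariance of the row with itself. Note that np.var divides by N by default (ddof=0) while np.cov divides by N - 1 unless bias=True, so the sketch passes bias=True.

```python
import numpy as np

x = np.array([10, 14, 11, 7, 9.5, 15, 19])   # first row of A above

var = np.var(x)                          # population variance (divides by N)
manual = np.mean((x - x.mean()) ** 2)    # average squared deviation from the mean
std_sq = np.std(x) ** 2                  # square of the standard deviation
cov_xx = np.cov(x, bias=True)            # covariance of x with itself, also divided by N

print(var, manual, std_sq, float(cov_xx))

# Sample versions (divide by N - 1) also agree with each other
print(np.var(x, ddof=1), float(np.cov(x)))
```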
Querying from Data Frames
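The query() method takes a query expression as a string parameter; the expression must evaluate to either True or False.
It returns the rows of the DataFrame for which the expression evaluates to True.
Syntax
dataframe.query(expr, inplace)
Parameters
expr: required. The query string to evaluate.
inplace: optional, default False. If True, the DataFrame itself is filtered and None is returned. The inplace parameter is a keyword argument.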
import pandas as pd
data = {
"name": ["Sally", "Mary", "John"],
"age": [50, 40, 30]
}
df = pd.DataFrame(data)
# keep only the rows where age is over 35
print(df.query('age > 35'))
Output:
name age
0 Sally 50
1 Mary 40
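Beyond simple comparisons, a query string can refer to a Python variable by prefixing it with @ and can combine conditions with and / or. The sketch below uses the same data; the variable name min_age is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Sally", "Mary", "John"], "age": [50, 40, 30]})

min_age = 35
# '@' refers to a Python variable from the surrounding scope
print(df.query("age > @min_age"))

# Conditions can be combined with 'and' / 'or'
print(df.query("age < 45 and name != 'John'"))

# inplace=True filters df itself and returns None
result = df.query("age > @min_age", inplace=True)
print(result, len(df))   # None 2
```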
Unit – 05 : Session – 08 : SLO - 01
Speed Testing between NumPy and Pandas
• NumPy runs vector and matrix operations very efficiently, while Pandas provides R-like data frames that allow intuitive tabular data analysis.
• NumPy is more optimized for arithmetic computations.
• Pandas has better performance when the number of rows is 500K or more.
• NumPy has better performance when the number of rows is 50K or less.
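A speed test of this kind can be run with the standard timeit module. The sketch below times one operation (summing a column of random floats) as a NumPy array and as a pandas Series; the helper name time_sum and the row counts are assumptions for illustration. Timings depend on the machine and library versions, and one operation does not settle where any crossover lies, so treat the row-count thresholds above as rough rules of thumb.

```python
import timeit

import numpy as np
import pandas as pd


def time_sum(n_rows, repeat=5, number=20):
    """Best-of-`repeat` seconds per call for summing n_rows floats in NumPy vs pandas."""
    arr = np.random.default_rng(0).random(n_rows)
    ser = pd.Series(arr)
    t_np = min(timeit.repeat(lambda: arr.sum(), repeat=repeat, number=number)) / number
    t_pd = min(timeit.repeat(lambda: ser.sum(), repeat=repeat, number=number)) / number
    return t_np, t_pd


for n in (1_000, 50_000, 500_000):
    t_np, t_pd = time_sum(n)
    print(f"{n:>8} rows  NumPy {t_np * 1e6:8.1f} us   pandas {t_pd * 1e6:8.1f} us")
```

Part of pandas' overhead comes from extra work it does per call: for example, Series.sum() skips missing values (NaN) by default, while ndarray.sum() does not.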