Estimation of Variable | set 2

Last Updated : 05 Aug, 2022

Prerequisite: Estimation of Variable | set 1
Terms related to Variability Metrics : 

-> Deviation 
-> Variance
-> Standard Deviation
-> Mean Absolute Deviation
-> Median Absolute Deviation
-> Order Statistics
-> Range
-> Percentile 
-> Inter-quartile Range

 

  • Median Absolute Deviation: Mean Absolute Deviation, Variance, and Standard Deviation (discussed in the previous set) are not robust to extreme values and outliers. The Median Absolute Deviation fixes this: instead of averaging deviations around the mean, we take the median of the absolute deviations from the median. 
     

  • Example : 
Sequence : [2, 4, 6, 8] 
Median   = 5
Absolute deviations around median = [3, 1, 1, 3]

Median Absolute Deviation = median of [1, 1, 3, 3] = 2

Python3

# Median Absolute Deviation

import numpy as np

def mad(data):
    # median of the absolute deviations from the median
    data = np.asarray(data)
    return np.median(np.absolute(data - np.median(data)))

Sequence = [2, 4, 10, 6, 8, 11]

print("Median Absolute Deviation : ", mad(Sequence))


Output : 

Median Absolute Deviation :  3.0
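
Because the MAD uses medians at both steps, a single extreme value barely moves it. The following sketch (not from the original article) appends one outlier to the sequence above and compares NumPy's standard deviation with the mad() function defined earlier: the standard deviation inflates by two orders of magnitude, while the MAD does not change.

Python3

# Robustness check: one outlier inflates the standard
# deviation but leaves the Median Absolute Deviation intact
import numpy as np

def mad(data):
    data = np.asarray(data)
    return np.median(np.absolute(data - np.median(data)))

clean  = [2, 4, 10, 6, 8, 11]
spiked = clean + [1000]   # append a single extreme value

print("Std (clean)  : ", round(np.std(clean), 2))
print("Std (spiked) : ", round(np.std(spiked), 2))
print("MAD (clean)  : ", mad(clean))
print("MAD (spiked) : ", mad(spiked))

Output : 

Std (clean)  :  3.18
Std (spiked) :  347.55
MAD (clean)  :  3.0
MAD (spiked) :  3.0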
  • Order Statistics: This variability measurement approach is based on the spread of ranked (sorted) data.
  • Range: It is the most basic measurement belonging to Order Statistics: the difference between the largest and the smallest value of the dataset. It gives a quick sense of the spread of the data, but it is very sensitive to outliers; a common remedy is to drop the extreme values first, as in the example and the sketch below. 
    Example : 
Sequence : [2, 30, 50, 46, 37, 91]
Here, 2 and 91 are outliers

Range = 91 - 2 = 89
Range without outliers = 50 - 30 = 20
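A minimal sketch of both computations, assuming NumPy is available: np.ptp (peak-to-peak) returns max minus min, and sorting then slicing off the first and last elements drops the two extremes.

Python3

# Range
import numpy as np

Sequence = [2, 30, 50, 46, 37, 91]

# full range: largest value minus smallest value
print("Range : ", np.ptp(Sequence))

# drop the smallest and largest values, then recompute
trimmed = np.sort(Sequence)[1:-1]
print("Range without outliers : ", np.ptp(trimmed))

Output : 

Range :  89
Range without outliers :  20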
  • Percentile: It is a very good measure of variability that avoids outliers. The Pth percentile is a value such that at least P% of the values are less than or equal to it and at least (100 – P)% of the values are greater than or equal to it. 
    The Median is the 50th percentile of the data. 
    Example : 
Sequence : [2, 30, 50, 46, 37, 91] 
Sorted   : [2, 30, 37, 46, 50, 91]

50th percentile = (37 + 46) / 2 = 41.5 
  • Code – 

Python3

# Percentile

import numpy as np

Sequence = [2, 30, 50, 46, 37, 91]

# NumPy linearly interpolates between the two nearest
# ranks when a percentile falls between data points
print("50th Percentile : ", np.percentile(Sequence, 50))
print("60th Percentile : ", np.percentile(Sequence, 60))


Output : 

50th Percentile :  41.5
60th Percentile :  46.0
  • Inter-Quartile Range (IQR): It also works on ranked (sorted) data. Three quartiles divide the data: Q1 (the 25th percentile), Q2 (the 50th percentile, i.e. the median), and Q3 (the 75th percentile). The Inter-Quartile Range is the difference between Q3 and Q1. 
    Example : 
Sequence : [2, 30, 50, 46, 37, 91] 
Q1 (25th percentile) : 31.75
Q2 (50th percentile) : 41.5
Q3 (75th percentile) : 49

IQR = Q3 - Q1 = 17.25
  • Code – 1 

Python3

# Inter-Quartile Range

from scipy.stats import iqr

Sequence = [2, 30, 50, 46, 37, 91]

print("IQR : ", iqr(Sequence))


Output : 

IQR :  17.25
  • Code – 2 

Python3

import numpy as np

Sequence = [2, 30, 50, 46, 37, 91]

# Inter-Quartile Range as Q3 - Q1
iqr = np.subtract(*np.percentile(Sequence, [75, 25]))

print("IQR : ", iqr)


Output : 

IQR :  17.25
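
As a follow-up sketch (an extension, not part of the original text): a standard use of the IQR is the 1.5 × IQR rule, which flags values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR as outliers. On this sequence it recovers exactly the two values (2 and 91) called outliers in the Range example above.

Python3

# 1.5 * IQR rule for flagging outliers
import numpy as np

Sequence = [2, 30, 50, 46, 37, 91]

q1, q3 = np.percentile(Sequence, [25, 75])
iqr = q3 - q1                # 17.25

lower = q1 - 1.5 * iqr       # 31.75 - 25.875 = 5.875
upper = q3 + 1.5 * iqr       # 49.00 + 25.875 = 74.875

print("Outliers : ", [x for x in Sequence if x < lower or x > upper])

Output : 

Outliers :  [2, 91]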

