Normal Distribution For ML
Mathematical Definition:
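For reference, the standard probability density function of a normal distribution with mean μ and standard deviation σ, which is presumably what this definition refers to, can be written as:

$$
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty
$$

Here μ sets the location of the peak and σ > 0 sets the spread of the curve.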
2. It is a continuous distribution.
5. It is unimodal.
Area Properties:
• 95% of the area lies within μ ± 1.96σ
• 99% of the area lies within μ ± 2.576σ
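These area properties can be checked numerically. The sketch below uses only Python's standard library (`statistics.NormalDist`); the helper names `coverage` and `multiplier` are made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage(k: float) -> float:
    """Probability that a normal variable falls within mu ± k*sigma."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def multiplier(central_area: float) -> float:
    """Value of k such that mu ± k*sigma contains the given central area."""
    return std_normal.inv_cdf(0.5 + central_area / 2.0)


# Area covered by a few common multiples of sigma.
for k in (1.0, 1.96, 2.0, 2.576, 3.0):
    print(f"within {k:>5} sigma: {coverage(k):.4f}")

# Multipliers that correspond to the 95% and 99% central areas.
for area in (0.95, 0.99):
    print(f"{area:.0%} central area -> k = {multiplier(area):.3f}")
```

The last loop gives k ≈ 1.960 for 95% and k ≈ 2.576 for 99%, and `coverage(3.0)` comes out at roughly 0.9973.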
Thus, almost all the data (about 99.7%) lies within 3 standard deviations
of the mean. This rule gives a simple check for outliers, since points
beyond μ ± 3σ are rare under normality, and it is helpful when judging
whether a distribution is approximately normal.
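As a rough illustration of the 3-standard-deviation rule, here is a minimal sketch that flags outliers in a synthetic sample. The function name `three_sigma_outliers`, the sample, and the injected value 120.0 are all assumptions made for the example, not part of any dataset discussed here.

```python
import numpy as np


def three_sigma_outliers(values, k=3.0):
    """Return a boolean mask marking points more than k std devs from the mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    std = values.std()
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return np.zeros(values.shape, dtype=bool)
    return np.abs(values - mean) > k * std


# Synthetic, roughly normal feature with one artificial extreme value appended.
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=50.0, scale=5.0, size=1000)
sample_with_outlier = np.append(sample, 120.0)

mask = three_sigma_outliers(sample_with_outlier)
print("flagged values:", sample_with_outlier[mask])
print("share within 3 sigma:", 1.0 - mask.mean())
```

Keep in mind that the mean and standard deviation are themselves pulled by extreme values, so this check works best when outliers are few.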
So it is better to explore the data critically and check the underlying
distribution of each variable before fitting a model.
Visualization Techniques:
Feature Analysis:
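When the quantiles of two variables are plotted against each other,
the resulting plot is known as a quantile-quantile plot, or Q-Q plot
(qqplot). It shows whether the distributions of the two variables are
similar in location, scale, and shape: if they are, the points fall
close to a straight line. To check normality, the quantiles of a feature
are plotted against the quantiles of a theoretical normal distribution.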
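To make the idea concrete, here is a minimal sketch that computes the points of a normal Q-Q plot by hand and summarizes how straight the line is with a correlation coefficient. The helper `normal_qq_points`, the plotting positions (i - 0.5)/n, and both synthetic samples are assumptions chosen for illustration; in practice a ready-made routine such as `scipy.stats.probplot` can be used instead.

```python
from statistics import NormalDist

import numpy as np


def normal_qq_points(values):
    """Return (theoretical normal quantiles, sorted sample values)."""
    observed = np.sort(np.asarray(values, dtype=float))
    n = observed.size
    # Plotting positions strictly inside (0, 1), one per sorted observation.
    probs = (np.arange(1, n + 1) - 0.5) / n
    standard_normal = NormalDist()
    theoretical = np.array([standard_normal.inv_cdf(p) for p in probs])
    return theoretical, observed


rng = np.random.default_rng(seed=1)
normal_sample = rng.normal(loc=10.0, scale=2.0, size=500)
skewed_sample = rng.exponential(scale=2.0, size=500)

for name, sample in (("normal", normal_sample), ("exponential", skewed_sample)):
    theo, obs = normal_qq_points(sample)
    # A correlation close to 1 means the Q-Q points lie near a straight line.
    r = np.corrcoef(theo, obs)[0, 1]
    print(f"{name}: Q-Q correlation = {r:.4f}")
```

Plotting `theo` against `obs` as a scatter plot gives the usual picture: the normal sample hugs a straight line, while the skewed sample bends away from it in the tails.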
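Here we can see that the feature is not exactly normally distributed,
but it roughly resembles a normal distribution. Standardizing this
feature (StandardScaler) before feeding it to a model is therefore
likely to give a good result. Note that standardization only shifts and
rescales the values; it does not change the shape of the distribution.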
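The effect of standardization can be sketched with plain NumPy. The code below mimics what scikit-learn's `StandardScaler` does by default (subtract the mean, divide by the population standard deviation); the helpers `standardize` and `skewness` and the synthetic gamma-distributed feature are assumptions for the example.

```python
import numpy as np


def standardize(values):
    """Shift to mean 0 and scale to standard deviation 1 (population std)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def skewness(values):
    """Sample skewness: the mean of the cubed standardized values."""
    z = standardize(values)
    return float(np.mean(z ** 3))


# Synthetic feature that is roughly, but not exactly, bell-shaped.
rng = np.random.default_rng(seed=2)
feature = rng.gamma(shape=20.0, scale=1.0, size=2000)
scaled = standardize(feature)

print("mean before/after:", feature.mean(), scaled.mean())
print("std before/after: ", feature.std(), scaled.std())
print("skew before/after:", skewness(feature), skewness(scaled))
```

The mean and standard deviation become 0 and 1, while the skewness stays the same, which is why a strongly skewed feature may need a different transformation (for example a log transform) rather than standardization alone.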
5. Time Series Analysis: While time series data may not always
strictly follow a normal distribution, understanding the normal
distribution’s properties can be helpful in modeling and
forecasting time series data, especially when dealing with
residuals in models like ARIMA (AutoRegressive Integrated
Moving Average).
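As a simplified illustration of checking residuals, the sketch below fits an AR(1) model by least squares on simulated data and looks at whether the residuals behave like normal noise. This is a stand-in for a full ARIMA fit, not an ARIMA implementation; the series, the coefficient 0.7, and all variable names are assumptions for the example.

```python
import numpy as np

# Simulate an AR(1) series y_t = phi * y_{t-1} + e_t with normal noise.
rng = np.random.default_rng(seed=3)
n = 2000
phi_true = 0.7
noise = rng.normal(loc=0.0, scale=1.0, size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = phi_true * series[t - 1] + noise[t]

# Least-squares estimate of phi (no intercept) and the resulting residuals.
previous, current = series[:-1], series[1:]
phi_hat = float(np.dot(previous, current) / np.dot(previous, previous))
residuals = current - phi_hat * previous

# Simple normality diagnostics on the standardized residuals.
z = (residuals - residuals.mean()) / residuals.std()
skew = float(np.mean(z ** 3))                    # near 0 for normal data
excess_kurtosis = float(np.mean(z ** 4) - 3.0)   # near 0 for normal data
within_196 = float(np.mean(np.abs(z) <= 1.96))   # near 0.95 for normal data

print(f"phi_hat = {phi_hat:.3f}")
print(f"skew = {skew:.3f}, excess kurtosis = {excess_kurtosis:.3f}")
print(f"share within 1.96 sigma = {within_196:.3f}")
```

If the residuals of a real model show strong skew, heavy tails, or a share within 1.96σ far from 95%, that is a hint that the normality assumption behind the usual prediction intervals may not hold.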