Distribution and Statistical Inference
9.1
Unit objectives IBM ICE (Innovation Centre for Education)
IBM Power Systems
• Learn about the applied probability techniques used in data analytics and visualization
• Gain knowledge on the probability distributions used in data analytics and visualization
• Learn about various kinds of testing, such as hypothesis testing and parametric & non-parametric
tests like the t-test, chi-square test, and ANOVA.
• Understand the concept of dimension reduction such as PCA and factor analysis
Basic concepts of probability theory
• Experiments
– Deterministic.
– Random.
• Basic terminologies.
– Outcome.
– Trial.
– Sample space.
– Event.
– Mutually exclusive events.
– Exhaustive events.
– Equally likely events.
– Probability space.
Defining probability (1 of 2)
• Von Mises's statistical (empirical, or frequency) definition of probability: if a trial is repeated
many times under the same conditions, the limit of the ratio of the number of times an event
occurs to the total number of trials, as the number of trials increases indefinitely, is called the
probability of that event.
• Multiplication theorem for two events A and B:
$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$, provided P(A) > 0 and P(B) > 0.
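The frequency definition can be illustrated with a short simulation. The sketch below assumes a fair six-sided die and the event "the face is even"; the function names, the fixed seed, and the trial counts are illustrative choices, not part of the slides. The relative frequency should settle near 0.5 as the number of trials grows.

```python
import random


def relative_frequency(event, trials, seed=0):
    """Estimate P(event) as (number of occurrences) / (number of trials)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials


def is_even(face):
    """Event of interest: the die shows 2, 4 or 6."""
    return face % 2 == 0


# The estimate should move closer to 0.5 as the number of trials increases.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(is_even, n))
```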
Bayes’ Theorem
• If an event A can occur only if one of a set of mutually exclusive events B1, B2,
B3, ..., Bk happens, and if P(Bi) ≠ 0 for i = 1, 2, 3, ..., k, then
$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{k} P(B_j)\,P(A \mid B_j)}$
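A minimal numeric sketch of Bayes' theorem follows. The scenario is hypothetical: three machines B1, B2, B3 produce 50%, 30% and 20% of the output, with assumed defect rates of 1%, 2% and 3%, and A is the event "the item is defective". The function name is also an illustrative choice.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(B_i | A) for each i, given P(B_i) and P(A | B_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(A), by the law of total probability
    return [j / total for j in joint]


# Hypothetical numbers: share of output per machine and defect rate per machine.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.03]

posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors)
```

The posteriors always sum to 1 because the denominator is the total probability of A across all the Bi.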
Random variables (1 of 5)
• Random variables are a fundamental tool for probabilistic modelling. They help us to
model unknown numerical quantities.
• Probability mass function: a discrete random variable is described by specifying the
probability of each value it can take.
• To model continuous quantities probabilistically, we can discretize their domain and
represent them as discrete random variables.
• The cumulative distribution function (CDF) of a random variable X is defined as F(x) =
P(X ≤ x).
Random Variables (4 of 5)
• Intuitively, f_X(x)Δ is the probability that X lies in an interval of width Δ around x as Δ → 0.
By the fundamental theorem of calculus, the probability that X belongs to an interval is
$P(a < X \le b) = F(b) - F(a) = \int_a^b f_X(x)\,dx$
• Example: the probability mass function and cumulative distribution function for rolling a die.
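The die example can be worked out directly. The sketch below assumes a fair six-sided die, so each face has probability 1/6, and builds the CDF from F(x) = P(X ≤ x); exact fractions are used to avoid rounding. The names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

FACES = range(1, 7)

# Probability mass function of a fair die: every face is equally likely.
pmf = {face: Fraction(1, 6) for face in FACES}


def cdf(x):
    """F(x) = P(X <= x): add up the PMF over all faces not exceeding x."""
    return sum((p for face, p in pmf.items() if face <= x), Fraction(0))


for face in FACES:
    print(face, pmf[face], cdf(face))
```

The CDF is a step function: it jumps by 1/6 at each face and stays flat in between.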
Expectation and variance of a random variable (1 of 2)
• The probability density function (or probability mass function) and the cumulative distribution
function are helpful in characterizing the features of a random variable.
• Some other features of random variables are characterized by the concepts of:
– Expectation
– Variance
• For a discrete random variable X, which takes the values x1, x2, ... with respective
probabilities p1, p2, ..., the expectation of X is defined as
$E(X) = \sum_i x_i\,p_i$
Expectation and variance of a random variable (2 of 2)
• Variance: the spread of a random variable is described by its variance. It gives some insight
into how values are clustered and distributed around the distribution's arithmetic mean.
– Var(X) = E[(X − E(X))²] is the variance of a random variable X.
• The variance is usually denoted σ² = Var(X). The standard deviation is the positive square
root of the variance.
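As a worked example of these definitions, the sketch below computes the expectation, variance and standard deviation of a fair six-sided die. The die and the function names are illustrative assumptions; exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction


def expectation(values, probs):
    """E(X) = sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))


def variance(values, probs):
    """Var(X) = E[(X - E(X))^2]."""
    mean = expectation(values, probs)
    return sum((x - mean) ** 2 * p for x, p in zip(values, probs))


values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

die_mean = expectation(values, probs)   # 7/2
die_var = variance(values, probs)       # 35/12
die_std = float(die_var) ** 0.5         # positive square root of the variance

print(die_mean, die_var, die_std)
```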
Probability distribution
• Probability distribution: a statistical function that lists all possible outcomes a random
variable can take in a random process, together with each outcome's probability of
occurrence.
Discrete probability distribution (1 of 2)
• Bernoulli distribution: a Bernoulli experiment is a single trial with only two possible
outcomes, 1 (success) and 0 (failure). A random variable X with a Bernoulli
distribution therefore takes the value 1 with the probability of success, say p, and the value 0
with the probability of failure, q = 1 − p.
• The probability mass function is given by $P(X = x) = p^x (1-p)^{1-x}$, where x ∈ {0, 1}.
• The mean and variance of a Bernoulli random variable are E(X) = p and Var(X) = p(1 − p).
Discrete probability distribution (2 of 2)
• One characteristic of the binomial distribution is that the probability of success need not
equal the probability of failure; its graph is symmetric only when p = 0.5.
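The effect of unequal success and failure probabilities can be checked numerically. The sketch below uses the standard binomial probability mass function P(X = k) = C(n, k) p^k (1 − p)^(n − k), which is not stated on the slide and is brought in here as an assumption, along with the illustrative values n = 10 and p = 0.3.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n = 10
skewed = [binomial_pmf(k, n, 0.3) for k in range(n + 1)]     # p != 1 - p
symmetric = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]  # p == 1 - p

print([round(v, 4) for v in skewed])
print([round(v, 4) for v in symmetric])
```

With n = 1 the function reduces to the Bernoulli case: `binomial_pmf(1, 1, p)` is p and `binomial_pmf(0, 1, p)` is 1 − p.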
Continuous probability distributions (1 of 2)
• Continuous random variables are defined by an infinite number of potential outcomes and a
continuous density function f(x). The probability of any single point is therefore 0, i.e. P(X = x) = 0.
• A random variable X follows a normal distribution with parameters μ and σ² if its PDF is
given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
• If you assume that the remaining lifetime is independent of the lifetime already elapsed (that is,
no "aging" mechanism is at work), waiting times can be modelled as exponentially distributed.
• A random variable X follows an exponential distribution with parameter λ > 0 if its PDF is
given by
$f(x) = \lambda e^{-\lambda x}$ for x ≥ 0, and f(x) = 0 otherwise.
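The two densities and the "no aging" property can be sketched in a few lines. The code below assumes the standard normal and exponential PDFs; the function names and the values of λ, s and t are illustrative. The last lines check memorylessness, P(X > s + t | X > s) = P(X > t), using the exponential survival function P(X > x) = e^(−λx).

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def exponential_pdf(x, lam):
    """Density of an exponential distribution with rate lam > 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exponential_survival(x, lam):
    """P(X > x) for an exponential distribution with rate lam > 0."""
    return math.exp(-lam * x) if x >= 0 else 1.0


# Memorylessness: having already waited s does not change the remaining wait.
lam, s, t = 0.5, 2.0, 3.0
conditional = exponential_survival(s + t, lam) / exponential_survival(s, lam)
unconditional = exponential_survival(t, lam)
print(conditional, unconditional)
```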
Statistical analysis
• Descriptive statistical analysis helps to understand the data and is a very important part of
analytics.
• A distribution's modality is determined by the number of peaks it contains. Some distributions
have only one peak (unimodal), but distributions with two (bimodal) or more (multimodal) peaks also occur.
• The following picture shows graphical representations of the three modality types:
Figure: Modality
Source: https://miro.medium.com/max/1693/0*m_Fd3Opt6L70LiYS.png
Skewness
Figure: Skewness
Source: https://miro.medium.com/max/750/0*F1mkGYUbmqtZzLKF.jpg
Data visualizations for descriptive analytics: Box plot
• Box plot
– A box plot is based on the five-number summary (minimum, first quartile, median, third quartile,
maximum, in increasing order) and can be used to provide a graphical overview of the centre and
spread of the observed values in a data set.
Figure: QQ plots
Source: Introduction%20to%20Statistics%20and%20Data%20Analysis%20in%20R.pdf
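The five numbers behind a box plot can be computed with the Python standard library. This is a minimal sketch: the data are made up, and the "inclusive" quartile method is one of several conventions, so other tools may report slightly different quartiles for the same data.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # "inclusive" treats the data as the full population when interpolating.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


# Made-up observations, deliberately unsorted.
observations = [7, 1, 9, 3, 5, 2, 8, 4, 6]
summary = five_number_summary(observations)
print(summary)
```

The box spans Q1 to Q3, the line inside it marks the median, and the whiskers extend towards the minimum and maximum.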
Data visualizations for descriptive analytics: Histogram
• Histogram
Figure: Histogram
Source: Introduction%20to%20Statistics%20and%20Data%20Analysis%20in%20R.pdf
Data visualizations for descriptive analytics: Kernel density plots
• Inferential statistics allows you to make predictions (“inferences”) from that data.
• With inferential statistics, the user can take data from samples and make generalizations
about a population.
• Population distribution
– The sampling distribution helps to estimate the population parameter
– Example of the effect of sample size on the sampling distribution
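The effect of sample size can be illustrated by simulation. The sketch below assumes a uniform population on [0, 1) (population mean 0.5), draws many samples of a given size, and measures how widely the sample means scatter. The population, sample sizes, repeat count and seed are all illustrative assumptions.

```python
import random
import statistics


def sample_means(sample_size, repeats=2000, seed=0):
    """Draw `repeats` samples of `sample_size` values and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(repeats)
    ]


# Spread of the sampling distribution of the mean for two sample sizes.
spread = {n: statistics.stdev(sample_means(n)) for n in (10, 100)}
print(spread)
```

Larger samples give sample means that cluster more tightly around the population mean, which is why they yield more precise estimates.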
Inferential statistics: Confidence interval
• Confidence interval
– It is a type of interval estimate, based on the sampling distribution, that provides a range of values
likely to include the population parameter
– Formally, a confidence interval for the population mean can be described as:
$\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$
– Where, X̄ = the sample mean
– Zα/2 = Z value for the desired confidence level 1 − α
– σ = standard deviation of the population
– n = sample size
– Example: for a 95% confidence interval, α = 0.05 and Z = 1.96
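A confidence interval of this kind can be computed as follows. The sketch assumes the population standard deviation σ is known and uses the z-based interval (sample mean) ± Z · σ / √n; the sample mean, σ and n in the example are made-up values, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) bounds of a z-based confidence interval for the mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Made-up example: sample mean 50, population standard deviation 10, n = 100.
lower, upper = z_confidence_interval(50.0, 10.0, 100)
print(round(lower, 2), round(upper, 2))
```

Raising the confidence level widens the interval, while a larger sample narrows it.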
Inferential statistics: Hypothesis
• Hypothesis
– Null hypothesis: H0 is usually referred to as the hypothesis of "no difference."
• Significance testing starts with the presumption that the null hypothesis is true.
– Alternative hypothesis: the negation of the null hypothesis is the alternative hypothesis.
• It is set up so that the alternative hypothesis is accepted when the null hypothesis is rejected.
– Test statistic: the test statistic is computed from sample data.
• Based on the significance of the test result, the decision is made to reject or not reject the null hypothesis.
Inferential statistics: Type I and Type II error
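• Type I error: rejecting the null hypothesis when it is actually true.
• Type II error: failing to reject the null hypothesis when it is actually false.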
• Hypothesis
– H0: There is no significant difference between the two treatments.
– H0: μ1 = μ2 (or) μ1 − μ2 = 0
– H1: There is a significant difference between the two treatments.
– H1: μ1 ≠ μ2 (or) μ1 − μ2 ≠ 0
• Test Statistic
– The test statistic is given by:
– Where,
– n1 and n2 = number of observations in the two groups respectively.
– x̄1 and x̄2 = sample means of the two groups.
– S1² and S2² = sample variances of the two groups.
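As one concrete instance of a two-sample test statistic built from these quantities, the sketch below computes the unpooled (Welch) form t = (x̄1 − x̄2) / sqrt(S1²/n1 + S2²/n2). Choosing the unpooled rather than the pooled-variance form is an assumption, and the two groups of measurements are made up.

```python
import math
from statistics import mean, variance


def welch_t_statistic(group1, group2):
    """Two-sample t statistic using separate (unpooled) sample variances."""
    n1, n2 = len(group1), len(group2)
    # variance() is the sample variance (divides by n - 1).
    standard_error = math.sqrt(variance(group1) / n1 + variance(group2) / n2)
    return (mean(group1) - mean(group2)) / standard_error


# Made-up measurements for two treatments.
treatment_a = [5, 7, 9]
treatment_b = [1, 2, 3]
t_two_sample = welch_t_statistic(treatment_a, treatment_b)
print(t_two_sample)
```

A value of t far from 0 is evidence against H0: μ1 = μ2; how far is "far" depends on the degrees of freedom and the chosen significance level.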
Paired t-test
• Test Statistic
– It is given by:
$t = \frac{\bar{d}}{S_d / \sqrt{n}}$
– Where,
– n = number of paired observations in the given sample.
– d̄ = mean of the differences between paired observations in the given sample.
– S_d = standard deviation of the differences.
• Assumptions
– The pairs should be sampled randomly and independently of one another.
– The differences between pairs should be normally, or at least roughly normally, distributed
in the population.
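The paired statistic can be computed directly from the differences. The sketch below assumes the form t = d̄ / (S_d / √n), with S_d the sample standard deviation of the differences; the before and after measurements are made up and the function name is illustrative.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """t statistic for paired observations, based on their differences."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # stdev() is the sample standard deviation (divides by n - 1).
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Made-up measurements on the same four subjects before and after a treatment.
before = [10, 12, 9, 11]
after = [12, 14, 10, 14]
t_paired = paired_t_statistic(before, after)
print(t_paired)
```

Working on the differences removes subject-to-subject variation, which is why pairing can detect effects that an unpaired comparison would miss.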
ANOVA
• The goal is to decide whether the mean differences obtained for sample data are significant
enough to support a hypothesis that mean differences exist between the populations from
which the samples are taken.
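One common way to make this decision is the one-way ANOVA F ratio, which compares the variation between group means with the variation within groups. The F ratio is not spelled out on the slide, so the formula below (mean square between divided by mean square within) is brought in as an assumption, and the three groups are made-up data.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Made-up samples from three populations.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio = one_way_anova_f(samples)
print(f_ratio)
```

A large F ratio means the group means differ by more than the within-group scatter would explain.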
Non-parametric test
• Assumptions
– Does not require normal distributions or homogeneity of variances,
– But does require independent observations and assumes the dependent variable is continuous.
• Test Statistic
– Where,
Kruskal-Wallis Test
• Test statistic
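As a sketch of how a Kruskal-Wallis statistic can be computed, the code below uses the commonly quoted form H = 12 / (N(N + 1)) · Σ(Rᵢ² / nᵢ) − 3(N + 1), where Rᵢ is the rank sum of group i, nᵢ its size and N the total number of observations. This form, without a correction for ties, is an assumption; the helper names and the data are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic from the rank sums of each group (no tie correction)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Made-up scores for three independent groups.
h_statistic = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_statistic)
```

Because the statistic depends only on ranks, it is unaffected by any monotonic rescaling of the data.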
Checkpoint (1 of 2)
1. The goal of ANOVA is to decide whether the ___ obtained for sample data are significant
enough to support a hypothesis.
2. The Mann-Whitney U test is a ____ method that determines how ranked scores differ between
two independent groups
3. Central tendency describes the inclination of data values to cluster around their _______
4. An experiment is called _______ when just two outcomes are possible and it is replicated
numerous times.
True or False:
1. The goal of ANOVA is to decide whether the mean differences obtained for sample data
are significant enough to support a hypothesis.
2. The Mann-Whitney U test is a non-parametric method that determines how ranked scores differ
between two independent groups
3. Central tendency describes the inclination of data values to cluster around their average.
4. An experiment is called binomial when just two outcomes are possible and it is replicated
numerous times.
True or False