Module 2-Random Variables (Modified) (1)
Definition and examples of random variables and their types. Mean, variance
and standard deviation of a Discrete probability distribution.
Discrete probability distribution: a set of pairs $(x_i, p(x_i))$, where $x_i$ denotes an outcome (a single event) and $p(x_i)$ denotes its associated probability. X is the random variable associated with the experiment, and the $x_i$ are the values it can take.
Mean = $\sum_i x_i \, p(x_i)$
Variance = $\sum_i (x_i - \text{mean})^2 \, p(x_i)$
Standard deviation = $\sqrt{\text{variance}}$
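The three formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `discrete_stats` and the sample data (number of heads in three tosses of a fair coin) are assumptions, not part of the notes.

```python
from math import sqrt

def discrete_stats(xs, ps):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    # A valid distribution needs non-negative probabilities that sum to 1.
    if any(p < 0 for p in ps) or abs(sum(ps) - 1.0) > 1e-9:
        raise ValueError("probabilities must be non-negative and sum to 1")
    mean = sum(x * p for x, p in zip(xs, ps))
    variance = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))
    return mean, variance, sqrt(variance)

# Illustrative data (assumed): number of heads in three tosses of a fair coin.
xs = [0, 1, 2, 3]
ps = [1 / 8, 3 / 8, 3 / 8, 1 / 8]
mean, variance, sd = discrete_stats(xs, ps)
print(mean, variance, sd)
```

The validity check at the top mirrors the first step of the exercises that follow: confirm the table is a probability distribution before computing its moments.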
Ques – Define random variable, probability function, discrete probability
distribution, probability density function (probability mass function), cumulative
distribution function.
Q1. Check whether the following distribution is a discrete probability distribution. Find mean and
variance
(i) Find k
(ii) Find 𝑃(𝑋 < 4), 𝑃(𝑋 ≥ 5), 𝑃(3 < 𝑋 ≤ 6)
Q3. A random variable X has the following probability function given by the following table:
P(X)   k   k   3k   $k^2 + k$   $6k^2$
(i) Find the value of k. (ii) Evaluate P(X<3), P(1<X<4) (iii) Determine the distribution
function of X.
(i) Find the value of k and calculate the mean and variance.
Q6. A random variable X has the following probability function:
x      0    1    2    3       4
P(x)   k    2k   2k   $k^2$   $5k^2$
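Problems of this kind are solved by requiring the probabilities to sum to 1. A minimal sketch follows, assuming the Q6 table reads k, 2k, 2k, k^2, 5k^2 for x = 0, 1, 2, 3, 4 (the extracted table is ambiguous, so this reading is an assumption).

```python
from math import sqrt

# Assumed reading of the table: P(x) = k, 2k, 2k, k^2, 5k^2 for x = 0..4.
# Normalization: k + 2k + 2k + k^2 + 5k^2 = 1  ->  6k^2 + 5k - 1 = 0.
a, b, c = 6.0, 5.0, -1.0
disc = b * b - 4 * a * c
roots = [(-b + sqrt(disc)) / (2 * a), (-b - sqrt(disc)) / (2 * a)]
k = next(r for r in roots if r > 0)  # a probability cannot be negative

xs = [0, 1, 2, 3, 4]
ps = [k, 2 * k, 2 * k, k ** 2, 5 * k ** 2]
mean = sum(x * p for x, p in zip(xs, ps))
variance = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))
print(k, sum(ps), mean, variance)
```

Only the positive root is kept, since the negative root would make several probabilities negative.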
Q 7 If three coins are tossed, find the expectation and variance of the number of heads.
[Ans: 3/2, 3/4]
Q 8 A coin is tossed until a head appears. Find the expected number of tosses required.
[Ans:2]
Mean (X) = E(X) = $\sum_i x_i \, f(x_i)$, where $f(x_i)$ is the probability function corresponding to $x_i$.
Correlation of X and Y = $\rho(X, Y) = \dfrac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y}$
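The correlation formula above can be evaluated directly from a joint probability table. The sketch below assumes the standard definition COV(X, Y) = E[(X − E(X))(Y − E(Y))], which the notes do not spell out, and uses a hypothetical joint table named `joint`.

```python
from math import sqrt

# Hypothetical joint probability table: keys are (x, y), values are P(X=x, Y=y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """Expected value of g(X, Y) under the joint table."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

ex = expect(lambda x, y: x)
ey = expect(lambda x, y: y)
cov = expect(lambda x, y: (x - ex) * (y - ey))
sigma_x = sqrt(expect(lambda x, y: (x - ex) ** 2))
sigma_y = sqrt(expect(lambda x, y: (y - ey) ** 2))
rho = cov / (sigma_x * sigma_y)
print(cov, rho)
```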
Binomial distribution:
n = number of trials
x = number of successes required in the n trials
p = probability of success
q = probability of failure = 1 − p
P(x) = $\binom{n}{x} \, p^x q^{n-x}$
Mean = np
Variance = npq
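The binomial formula above translates directly into code. The helper names `binom_pmf` and `binom_at_least` are illustrative assumptions; `math.comb` is the standard-library binomial coefficient.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for n trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_at_least(x_min, n, p):
    """P(X >= x_min), obtained by summing the probability function."""
    return sum(binom_pmf(x, n, p) for x in range(x_min, n + 1))

# Example: ten fair coins, probability of at least seven heads.
n, p = 10, 0.5
prob = binom_at_least(7, n, p)
mean, variance = n * p, n * p * (1 - p)
print(prob, mean, variance)
```

"At most" and "fewer than" questions can be handled the same way, either by summing the lower terms or by taking one minus the complementary sum.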
Q1. In a sampling of a large number of parts manufactured by a company, the mean number of
defectives in a sample of 20 is 2. Out of 1000 such samples, how many would be expected to
contain at least 3 defective parts? Ans: P(x>=3) = 0.323, i.e. about 323 samples
Q2. The probability that a person aged 60 years will live up to 70 is 0.65. What is the probability
that out of 10 persons aged 60, at least 7 will live up to 70? Ans: P(x>=7)=0.5138
Q3. The number of telephone lines busy at an instant of time is a binomial variate with probability
0.1 that a line is busy. If 10 lines are chosen at random, then what is the probability that
(i) No line is busy ii) All lines are busy iii) At least one line is busy iv) At most two lines are
busy
Ans: i) P(0)=0.3487 ii) P(10)=(0.1)^10 iii) P(x>=1)=0.6513 iv) P(x<=2)=0.9298
Q4. In 800 families with 5 children each, how many families would be expected to have
(i) 3 boys ii) 5 girls iii) Either 2 or 3 boys
iv) At most 2 girls? Assume the probabilities for boys and girls to be equal.
Ans: i) f(3)=250 ii) f(0)=25 iii) f(2)+f(3)=500 iv) f(5)+f(4)+f(3)=400
Q5. A die is thrown 8 times. Find the probability that 3 falls (i) exactly 2 times (ii) at least once
(iii) at most 7 times.
[Ans: 0.260476, 0.7674319, 0.9999994]
Q6. Ten coins are tossed simultaneously. Find the probability of getting at least seven heads.
[ANS: 0.171875]
Q7. In a hurdle race a player has to cross 10 hurdles. The probability that he will clear each hurdle
is 5/6. What is the probability that he will knock down fewer than 2 hurdles.
[Ans: 0.4845]
Find A and the probabilities that she will speak more than 10 minutes, less than 5 minutes,
and between 5 and 10 minutes. [Ans: 1/5, 0.1353, 0.6321, 0.2325]
Q 2 Find the constant c such that the function
Find the value of k and the probability that on a given day the electric consumption is
more than the expected electric consumption. [Ans: 1/9, 0.406]
Q 4 A continuous random variable X has p.d.f. f(x) given by
f(x) = 2ax + b for 0 ≤ x < 2,
     = 0 otherwise.
If the mean of the distribution is 3, find the constants a and b. [Ans: 3/2, -5/2]
Q 6 In a certain city the daily consumption of water (in millions of litres) is a random variable
X with p.d.f.
2. Probability density function p(x) = $\dfrac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$
6. In order to solve the question, always reduce your given distribution to the standard
normal distribution with $\mu = 0$ and $\sigma = 1$ by taking $\dfrac{x - \mu}{\sigma} = z$.
7. Values for the standard normal distribution are given in a table, where
$\varphi(z_1) = \dfrac{1}{\sqrt{2\pi}} \int_0^{z_1} e^{-z^2/2} \, dz = p(0 \le z \le z_1)$
10. To read $\varphi(z_1)$ for $z_1 = 0.pq$ from the table, locate 0.p in the leftmost column and
0.0q in the top row. The entry at their intersection is $\varphi(z_1)$, the area under the
bell-shaped curve from 0 to $z_1$.
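The table value, i.e. the area under the standard normal curve from 0 to z1, can also be computed from the error function, using the identity that this area equals 0.5·erf(z1/√2). The function name `phi` below is an illustrative choice; the four probabilities are the ones asked for in Q1 that follows.

```python
from math import erf, sqrt

def phi(z1):
    """Area under the standard normal curve between 0 and z1 (for z1 >= 0)."""
    return 0.5 * erf(z1 / sqrt(2))

# Each probability is assembled from table-style areas measured from 0,
# using the symmetry of the curve about z = 0.
p_i = 0.5 - phi(0.85)          # p(z >= 0.85)
p_ii = phi(1.64) - phi(0.88)   # p(-1.64 <= z <= -0.88)
p_iii = 0.5 - phi(2.43)        # p(z <= -2.43)
p_iv = 2 * phi(1.94)           # p(|z| <= 1.94)
print(round(p_i, 4), round(p_ii, 4), round(p_iii, 4), round(p_iv, 4))
```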
Q1. Evaluate the following probabilities with the help of normal probability tables.
i) 𝒑(𝒛≥𝟎.𝟖𝟓) ii) 𝒑(−𝟏.𝟔𝟒≤𝒛≤−𝟎.𝟖𝟖) iii) 𝒑(𝒛≤−𝟐.𝟒𝟑) iv) 𝒑(|𝒛|≤𝟏.𝟗𝟒)
Ans: i) 0.1977 ii) 0.1389 iii) 0.0075 iv) 0.9476
Q2. If x is a normal variate with mean 30 and standard deviation 5, find the probabilities of
the following:
i) 26 ≤ x ≤ 40 ii) x ≥ 45 iii) |x − 30| > 5 Ans: i) 0.7653 ii) 0.0013 iii) 0.3174
Q3. The marks of 1000 students in an examination follow a normal distribution with mean 70 and
standard deviation 5. Find the number of students whose marks will be
i) Less than 65 ii) More than 75 iii) Between 65 and 75.
Ans: i) 159 ii) 159 iii) 683
Q4. In a normal distribution 31% of the items are under 45 and 8% of the items are over 64. Find
the mean and standard deviation.
Ans: Mean=50 and S.D.=10
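Q4 runs the table lookup in reverse: the two percentages fix two z values, and the substitution z = (x − μ)/σ gives two linear equations in μ and σ. A minimal sketch using Python's `statistics.NormalDist` in place of the printed table (this swap is an assumption of the example, not the method of the notes):

```python
from statistics import NormalDist

std = NormalDist()               # standard normal: mu = 0, sigma = 1
z_low = std.inv_cdf(0.31)        # z with 31% of the area to its left
z_high = std.inv_cdf(1 - 0.08)   # z with 8% of the area to its right

# Solve mu + z_low * sigma = 45 and mu + z_high * sigma = 64.
sigma = (64 - 45) / (z_high - z_low)
mu = 45 - z_low * sigma
print(round(mu, 2), round(sigma, 2))
```

The exact inverse gives values very close to the tabulated answer, which rounds z to −0.5 and 1.4.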
• Chebyshev's inequality
In probability theory, Chebyshev's inequality provides an upper bound on the probability of deviation
of a random variable (with finite variance) from its mean. More specifically, the probability that a
random variable deviates from its mean by more than $k\sigma$ is at most $1/k^2$, where k is any positive
constant.
Unlike the 68-95-99.7 rule, which applies only to normal distributions, Chebyshev's inequality is more
general: it states that at least 75% of values must lie within two standard deviations of the mean and
at least 88.89% within three standard deviations, for a broad range of different probability distributions.
Probabilistic statement
Let X be an integrable random variable with finite non-zero variance $\sigma^2$ (and thus finite expected
value $\mu$). Then for any real number k > 0,
$\Pr(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \dfrac{1}{k^2}$
$\Pr(|X - \mu| \ge k\sigma) \le \dfrac{1}{k^2}$
Using this, you can calculate the minimum percentage of data that lies within k standard deviations of
the mean.
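The two bounds can be sketched as small helper functions. The names `chebyshev_within` and `chebyshev_tail` are illustrative assumptions; the coin example uses the binomial mean np and variance npq for 50 tosses of a fair coin.

```python
from math import sqrt

def chebyshev_within(k):
    """Lower bound on Pr(|X - mu| < k*sigma)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return max(0.0, 1.0 - 1.0 / k ** 2)

def chebyshev_tail(k):
    """Upper bound on Pr(|X - mu| >= k*sigma)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k ** 2)

print(chebyshev_within(2), chebyshev_within(3))

# Fair coin tossed 50 times: mu = 25, sigma^2 = 12.5, deviation of 10 heads.
sigma = sqrt(50 * 0.5 * 0.5)
coin_bound = chebyshev_tail(10 / sigma)
print(coin_bound)
```

The clamping to the range [0, 1] matters only for k ≤ 1, where the inequality is true but gives no information.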
Problems
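Q1. Suppose a fair coin is tossed 50 times. The bound on the probability that the number of
heads will be greater than 35 or less than 15 can be found using Chebyshev's inequality.
Here $\mu = np = 25$ and $\sigma^2 = npq = 12.5$. Taking the deviation of 10 from the mean as $k\sigma$,
i.e. $k = 10/\sigma$, gives $1/k^2 = \sigma^2/10^2 = 0.125$.
In other words, the chance of a fair coin coming up heads outside the range of 15 to 35 times is
at most 0.125.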
Q2. Suppose 1,000 applicants show up for a job interview, but there are only 70 positions
available. To select the best 70 people amongst the 1,000 applicants, the employer gives an
aptitude test to judge their abilities. The mean score on the test is 60, with a standard deviation
of 6. If an applicant scores an 84, can they assume they are getting a job?
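A score of 84 lies $k = (84 - 60)/6 = 4$ standard deviations above the mean, so by Chebyshev's
inequality at most $1/k^2 = 1/16$ of the applicants, about 63 people, scored 84 or above. With 70
positions available, an applicant who scores an 84 can be assured they got the job.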
Q-1 Suppose X is a random variable with mean 100 and standard deviation 5.
(a) What conclusions can you draw for k = 2, 3 using Chebyshev’s inequality?
(b) Estimate the probability that x lies between 80 and 120.
(c) Find [a,b] about 100 for which the probability that x lies in the interval is at least 99
percent.
Q- 2 Let X be a random variable with mean 100 and standard deviation 10. Use Chebyshev’s
inequality to estimate (a) P(X ≥ 120) and (b) P(X ≤ 75).
Q-3 Suppose a random variable X has mean μ = 25 and standard deviation σ = 2. Use Chebyshev’s
inequality to estimate
Q5. Let X be a random variable with mean μ = 80 and unknown standard deviation σ.
Use Chebyshev’s inequality to find a value of σ for which Pr(75 ≤ X ≤ 85) ≥ 0.9.
The lognormal distribution is a continuous, right-skewed probability distribution with a long tail to
the right. It is used to represent things like income distributions, chess game lengths, and the
time it takes to repair a maintainable system, among other things.
Data of this type are generally skewed, with low mean values, large variation, and all-positive
values. Since ln(x) exists only for positive x, the values must be positive.
$X = e^{\mu + \sigma Z}$
where Z is a standard normal variable, and μ and σ are the mean and standard deviation of the logarithm of X, respectively.
The probability density function of the lognormal distribution is defined by the two parameters μ and σ,
for x > 0. Once lognormal data are transformed using logarithms, μ can be viewed as the mean and σ
as the standard deviation of the transformed data.
$f(x) = \dfrac{1}{x\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu}{\sigma}\right)^2}$
Here,
μ is the mean of ln(x) (the location parameter of the log-transformed data) and σ is the standard
deviation of ln(x) (the shape parameter of the distribution).
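The density above can be evaluated numerically as follows. The mean and variance helpers use the standard closed forms e^(μ + σ²/2) and (e^(σ²) − 1)·e^(2μ + σ²), which are stated here as an assumption of the sketch, with μ and σ taken to be the mean and standard deviation of ln(x). All function names are illustrative.

```python
from math import exp, log, pi, sqrt

def lognormal_pdf(x, mu, sigma):
    """Lognormal density; mu and sigma describe ln(x), and f(x) = 0 for x <= 0."""
    if x <= 0:
        return 0.0
    z = (log(x) - mu) / sigma
    return exp(-0.5 * z * z) / (x * sigma * sqrt(2 * pi))

def lognormal_mean(mu, sigma):
    """Mean of X, using the standard closed form (assumed, see lead-in)."""
    return exp(mu + sigma ** 2 / 2)

def lognormal_variance(mu, sigma):
    """Variance of X, using the standard closed form (assumed, see lead-in)."""
    return (exp(sigma ** 2) - 1) * exp(2 * mu + sigma ** 2)

# Illustrative parameters: mu = 0 and sigma = 1 for ln(x).
print(lognormal_pdf(1.0, 0.0, 1.0))
print(lognormal_mean(0.0, 1.0), lognormal_variance(0.0, 1.0))
```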
Mean of Lognormal Distribution
where $\mu = \ln x - \dfrac{1}{2} \ln\left(\dfrac{\sigma^2}{x^2} + 1\right)$
We already know that μ′ is the mean value. μ is the mean of the natural logarithms. x values are
the times-to-failure.
The lognormal distribution differs significantly in shape from the normal distribution. The normal
distribution is symmetrical, whereas the lognormal distribution is not: it takes only positive values
and its curve is skewed to the right.
The expected value is also known as the mean of the distribution and provides useful information
about the average that may be expected from a large number of repeated trials. A distribution’s
variance indicates how “spread out” the data is. The standard deviation, or the square root of the
variance, is related and useful because it is in the same units as the data.
4. For the same σ, the lognormal distribution curve skewness increases as μ increases.
5. For σ values significantly greater than 1, the lognormal distribution curve rises very sharply
in the beginning. It then follows the coordinate axis, peaks out early, and then decreases
sharply like an exponential curve.
6. μ or the mean in terms of X = ln(x) is the scale parameter and not the location parameter
as in the case of the normal probability distribution function.
7. σ or the standard deviation in terms of X = ln(x) is the shape parameter and not the scale
parameter, as in the normal probability distribution function.
Example 1:
The daily website visitors of a small blog follow a lognormal distribution with a mean of 50
visitors and a geometric standard deviation of 1.1. Calculate the variance of the daily website
visitors.
Solution:
To find the variance $\sigma^2$ we will use the formula for the variance of a lognormal distribution:
• $\sigma^2$ is the variance
• μ = 50
• σ = 1.1
Putting these values in the formula we get,
$\sigma^2$ = 21.1829
∴ The variance of the daily website visitors is approximately 21.1829.