
Module 2-Random Variables (Modified) (1)

Module 2 covers basic probability concepts including random variables, their types, and key statistical parameters such as expectation and variance. It discusses various probability distributions like Binomial, Poisson, and Normal, along with their properties and applications. Additionally, it includes practical problems and Chebyshev's inequality for understanding deviations in probability distributions.

MODULE 2-Basic Probability

Basic Probability: Random variables: Discrete and continuous random variables,
Expectation of Random Variables, Variance of random variables, Chebyshev’s
Inequality. Probability distributions: Binomial, Poisson and Normal - evaluation of
statistical parameters for these three distributions, Correlation Tests for normality,
Log normal etc.

Definition and examples of random variables and their types. Mean, variance
and standard deviation of a Discrete probability distribution.

Discrete probability distribution: the set of pairs $(x_i, p(x_i))$, where $x_i$ denotes
a value (a single outcome) of the random variable X associated with an experiment and
$p(x_i)$ denotes its probability, with $p(x_i) \ge 0$ and $\sum_i p(x_i) = 1$.

Mean = $\sum_i x_i \, p(x_i)$
Variance = $\sum_i (x_i - \text{mean})^2 \, p(x_i)$
Standard deviation = $\sqrt{\text{variance}}$
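As a quick check of the three formulas above, the sketch below evaluates them for a fair six-sided die. The die and the helper name `describe` are illustrative assumptions and are not part of the module's problem set.

```python
from fractions import Fraction
from math import sqrt

def describe(pmf):
    """Return (mean, variance, sd) of a discrete distribution given as {x: p(x)}."""
    # A valid distribution has non-negative probabilities that sum to 1.
    assert all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, variance, sqrt(variance)

# Hypothetical example: a fair die, each face with probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean, variance, sd = describe(die)
print(mean, variance, round(sd, 4))
```

Exact fractions are used so that the check "probabilities sum to 1" is not disturbed by floating-point rounding.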
Ques – Define random variable, probability function, discrete probability
distribution, probability density function (probability mass function), cumulative
distribution function.

Q1. Check whether the following distribution is a discrete probability distribution. Find mean and
variance

Q2. The probability density function of a variate X is

(i) Find k
(ii) Find 𝑃(𝑋 < 4), 𝑃(𝑋 ≥ 5), 𝑃(3 < 𝑋 ≤ 6)

Q3. A random variable X has the following probability function given by the following table:

(i) Find the value of k
(ii) Evaluate P(X < 6), P(X ≥ 6), P(0 < X ≤ 5)
(iii) Also find the probability distribution.

Q4. A random variable X has the following probability function:


X       1     2     3     4         5
P(X)    k     k     3k    k² + k    6k²

(i) Find the value of k. (ii) Evaluate P(X<3), P(1<X<4) (iii) Determine the distribution
function of X.

Q5. A random variable X has the following probability function:


x -2 -1 0 1 2 3

P(x) 0.1 k 0.2 2k 0.3 k

(i) Find the value of k and calculate the mean and variance.
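One possible way to work Q5 numerically is sketched below: each table entry is written as a constant plus a multiple of k, k is fixed by requiring the probabilities to sum to 1, and the mean and variance follow from the definitions. The variable names are illustrative assumptions.

```python
from fractions import Fraction

xs = [-2, -1, 0, 1, 2, 3]
# P(x) = 0.1, k, 0.2, 2k, 0.3, k: constant part and multiple of k for each entry.
const = [Fraction(1, 10), 0, Fraction(2, 10), 0, Fraction(3, 10), 0]
k_coeff = [0, 1, 0, 2, 0, 1]

# Total probability: sum(const) + k * sum(k_coeff) = 1
k = (1 - sum(const)) / sum(k_coeff)
probs = [c + k * m for c, m in zip(const, k_coeff)]

mean = sum(x * p for x, p in zip(xs, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
print(k, mean, variance)
```

The same pattern does not apply directly to tables containing k² terms (as in Q4 and Q6), where the normalisation condition becomes a quadratic in k and only the positive root is admissible.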
Q6. A random variable X has the following probability function:
x       0     1     2     3     4
P(x)    k     2k    2k    k²    5k²

(i) Find the value of k.


(ii) Evaluate P(X<3), P(0<X<4)
(iii) Determine the distribution function of X.

Q 7 If three coins are tossed, find the expectation and variance of the number of heads.
[Ans: 3/2, 3/4]

Q 8 A coin is tossed until a head appears. Find the expectation of tosses required.
[Ans:2]
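The answer to Q 8 can be checked by summing the series E = Σ n · q^(n−1) · p, where the first head appears on toss n with probability q^(n−1) · p. The sketch below assumes a fair coin (p = 1/2) and truncates the series at 60 terms; the function name is a hypothetical helper.

```python
from fractions import Fraction

def expected_tosses(p, terms):
    """Partial sum of sum_{n>=1} n * (1-p)^(n-1) * p."""
    q = 1 - p
    return sum(n * q ** (n - 1) * p for n in range(1, terms + 1))

# Fair coin assumed; 60 terms is an arbitrary cut-off for the infinite series.
approx = float(expected_tosses(Fraction(1, 2), 60))
print(approx)
```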

Expectation of Random Variables, Variance of random variables

A bivariate (joint) probability distribution of two discrete random variables X and Y is a
function f(x, y) that satisfies $f(x, y) \ge 0$ for all (x, y) and $\sum_x \sum_y f(x, y) = 1$.

$P(x_i, y_j)$ or $f(x_i, y_j)$ = joint probability that $X = x_i$ and $Y = y_j$ happen simultaneously


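E(X) = expected value of X, i.e. the probability-weighted average of its values

Mean (X) = E(X) = $\mu_X = \sum_i x_i \, f(x_i)$, where $f(x_i)$ is the probability function corresponding
to $x_i$

Variance (X) = $E[(X - \mu_X)^2] = \sum_{i=1}^{n} (x_i - \mu_X)^2 \, f(x_i)$

Standard deviation $\sigma_X = \sqrt{\text{variance}}$

Correlation of X and Y = $\rho(X, Y) = \dfrac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y}$, where $\mathrm{COV}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$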
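To illustrate how these quantities are computed from a joint distribution, the sketch below uses a small hypothetical table f(x, y); the table values and the helper `expect` are assumptions made for the example. The covariance is taken as E[(X − μX)(Y − μY)], the standard definition.

```python
from math import sqrt

# Hypothetical joint probability table f(x, y); the entries sum to 1.
joint = {
    (0, 0): 0.2, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.6,
}

def expect(g):
    """E[g(X, Y)] = sum over all (x, y) of g(x, y) * f(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))
rho = cov / sqrt(var_x * var_y)
print(round(cov, 4), round(rho, 4))
```

The correlation always lies between −1 and 1; here it is positive because X and Y tend to take the value 1 together.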

Discrete Probability Distribution :


1. Binomial distribution: Statement, Mean, Variance and S.D. (No proof)

n = number of trials
x = number of successes in the n trials (x = 0, 1, ..., n)
p = probability of success in a single trial
q = 1 − p = probability of failure
P(x) = $\binom{n}{x} p^x q^{n-x}$
Mean = np
Variance = npq
Standard deviation = $\sqrt{npq}$
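The statement above can be checked numerically. The sketch below builds the binomial probabilities for illustrative values n = 10 and p = 0.1 (chosen here as an assumption) and compares the mean and variance computed from the definitions with np and npq.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = C(n, x) * p^x * q^(n - x) with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.1  # illustrative values
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
variance = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
# Compare with 1, n*p and n*p*q respectively.
print(round(sum(pmf), 6), round(mean, 6), round(variance, 6))
```

Cumulative probabilities such as P(x ≥ 1) or P(x ≤ 2) are obtained by summing the relevant entries of `pmf`.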

Q1. In a sampling of a large number of parts manufactured by a company, the mean number of
defectives in a sample of 20 is 2. Out of 1000 such samples, how many would be expected to
contain at least 3 defective parts? Ans: P(x>=3)=0.323, so about 323 samples

Q2. The probability that a person aged 60 years will live up to 70 is 0.65. What is the probability
that out of 10 persons aged 60 at least 7 of them will live up to 70. Ans: P(x>=7)=0.5138

Q3. The number of telephone lines busy at an instant of time is a binomial variate with probability
0.1 that a line is busy. If 10 lines are chosen at random, then what is the probability that
(i) No line is busy ii) All lines are busy iii) At least one line is busy iv) At most two lines are
busy
Ans: i) P(0)=0.3487 ii) P(10)=(0.1)^10 iii) P(x>=1)=0.6513 iv) P(x<=2)=0.9298

Q4. In 800 families with 5 children each how many families would be expected to have
(i) 3 boys ii) 5 girls iii) Either 2 or 3 boys
(iv)At most 2 girls by assuming probabilities for boys and girls to be equal.
Ans: i) f(3)=250 ii) f(0)=25 iii) f(2)+f(3)=500 iv) f(5)+f(4)+f(3)=400

Q5. A die is thrown 8 times. Find the probability that a 3 turns up (i) Exactly 2 times (ii) At least once
(iii) At most 7 times.
[Ans: 0.260476, 0.7674319, 0.9999994]

Q6. Ten coins are tossed simultaneously. Find the probability of getting at least seven heads.
[ANS: 0.171875]

Q7. In a hurdle race a player has to cross 10 hurdles. The probability that he will clear each hurdle
is 5/6. What is the probability that he will knock down fewer than 2 hurdles?
[Ans: (5/6)^10 + 10(1/6)(5/6)^9 = 0.4845]

2. Poisson distribution: Statement, Mean, Variance and S.D. (No proof) :

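The Poisson distribution is the limiting case of the binomial distribution in which n goes to
infinity and p approaches 0, while np tends to a finite number m.
P(x) = Probability of x successes = $\dfrac{m^x e^{-m}}{x!}$, x = 0, 1, 2, ...
Mean = m
Variance = m
Standard deviation = $\sqrt{m}$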
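The limiting relationship can be seen numerically. The sketch below compares binomial probabilities for a large n and small p with the Poisson probabilities for m = np; the values n = 2000 and p = 0.001 are illustrative assumptions.

```python
from math import comb, exp, factorial

def poisson_pmf(x, m):
    """P(x) = m^x * e^(-m) / x!"""
    return m ** x * exp(-m) / factorial(x)

def binomial_pmf(x, n, p):
    """P(x) = C(n, x) * p^x * (1 - p)^(n - x)"""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 2000, 0.001   # large n, small p (illustrative)
m = n * p            # m = 2
for x in range(4):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, m), 5))
```

The two columns agree to about three decimal places, which is why the Poisson formula is used in place of the binomial one when n is large and p is small.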
Q1. If the probability of a bad reaction from a certain injection is 0.001, determine the chance that
out of 2000 individuals, more than two will get a bad reaction. Ans: 0.3233
Q2. 2% of the fuses manufactured by a firm are found to be defective. Find the probability that a
box containing 200 fuses contains i) No defective fuses ii) 3 or more defective fuses
Ans: i) 0.0183 ii) 0.7619
Q3. The probability that a pen manufactured by a company will be defective is 1/500. The pens
are supplied in packets of size 10. Use Poisson distribution to calculate the approximate number
of packets containing i)No defective ii) One defective iii) Two defective pens,
in a consignment of 10,000 packets. Ans: i) 9802 ii)196 iii) 2
Q4. The number of accidents in a year to taxi drivers in a city follows a Poisson distribution with
mean 3. Out of 1000 taxi drivers, find approximately the number of drivers with
i) No accident in a year ii) More than 3 accidents in a year
Ans: i) 50 ii) 350

Continuous Probability Distribution


Q 1 The length of time (in minutes) that a lady speaks on the telephone is found to be a random
phenomenon with p.d.f.

Find A and the probabilities that she will speak for more than 10 minutes, less than 5 minutes,
and between 5 and 10 minutes. [Ans: 1/5, 0.1353, 0.6321, 0.2325]
Q 2 Find the constant c such that the function

is a density function, and find (i) c (ii) P(1<X<2)


Q 3 The daily consumption of electric power (in millions of kWh) is a r.v. X with p.d.f.

Find the value of k and the probability that on a given day the electric consumption is
more than the expected electric consumption. [Ans: 1/9, 0.406]
Q 4 A continuous random variable X has p.d.f. f(x) given by
f(x) = 2ax + b for 0 ≤ x < 2,
     = 0 otherwise.
If the mean of the distribution is 3, find the constants a and b. [Ans: a = 3/2, b = −5/2]

Q 5 A continuous random variable X has probability density function given by

Find E(X) and E(X²).

Q 6 In a certain city the daily consumption of water (in millions of litres) is a random variable X
with p.d.f.

Find the variance in the daily consumption of water.

* Normal distribution: Statement, Mean, Variance and S.D. (No proof)

1. The given data follow a normal distribution, so the random variable x denotes
normally distributed data, with $-\infty < x < \infty$, $-\infty < \mu < \infty$ and $\sigma > 0$.

2. Probability density function $p(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$

3. Normal distribution is always symmetric about the mean $\mu$.

4. Area under the curve p(x) gives probability, which implies $\int_{-\infty}^{\infty} p(x)\,dx = 1$.

5. Mean for this data is $\mu$ and variance is $\sigma^2$.

6. In order to solve a question, always reduce the given distribution to the standard
normal distribution with $\mu = 0$ and $\sigma = 1$ by taking $z = \dfrac{x - \mu}{\sigma}$.

7. Values for the standard normal distribution are given in a table, where
$\varphi(z_1) = \dfrac{1}{\sqrt{2\pi}} \int_0^{z_1} e^{-z^2/2}\,dz = p(0 \le z \le z_1)$

8. The area is equally distributed about the mean: 0.5 on each side.

9. Since the curve is symmetric about the mean, probabilities for negative values of z can be
evaluated from the positive side: $p(-z_1 \le z \le 0) = \varphi(z_1)$.

10. To read $\varphi(z_1)$ for $z_1 = 0.pq$, take the row labelled 0.p and the column labelled 0.0q. The entry
at their intersection is $\varphi(z_1)$, the area under the bell-shaped curve from 0 to $z_1$.
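When a printed table is not at hand, the tabulated area from 0 to z1 can be reproduced with the error function, using the identity φ(z1) = ½ · erf(z1/√2). The sketch below is one way to do this with the Python standard library; the helper names `phi` and `prob_between` are assumptions introduced for the example.

```python
from math import erf, sqrt

def phi(z1):
    """Area under the standard normal curve from 0 to z1 (z1 >= 0)."""
    return 0.5 * erf(z1 / sqrt(2))

def prob_between(a, b):
    """P(a <= z <= b), using symmetry for negative arguments."""
    signed = lambda z: phi(z) if z >= 0 else -phi(-z)
    return signed(b) - signed(a)

print(round(phi(0.85), 4))                  # table entry for z1 = 0.85
print(round(0.5 - phi(0.85), 4))            # P(z >= 0.85)
print(round(prob_between(-1.94, 1.94), 4))  # P(|z| <= 1.94)
```

The second line of output uses item 8 (area 0.5 on each side of the mean) and the third uses item 9 (symmetry).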
Q1. Evaluate the following probabilities with the help of normal probability tables.
i) 𝒑(𝒛≥𝟎.𝟖𝟓) ii) 𝒑(−𝟏.𝟔𝟒≤𝒛≤−𝟎.𝟖𝟖) iii) 𝒑(𝒛≤−𝟐.𝟒𝟑) iv) 𝒑(|𝒛|≤𝟏.𝟗𝟒)
Ans: i) 0.1977 ii) 0.1389 iii) 0.0075 iv) 0.9476

Q2. If x is a normal variate, with mean 30 and standard deviation 5, then find the probabilities of
the following
i) 26 ≤ x ≤ 40 ii) x ≥ 45 iii) |x − 30| > 5 Ans: i) 0.7653 ii) 0.0013 iii) 0.3174

Q3. The marks of 1000 students in an examination follows a normal distribution with mean 70 and
standard deviation 5. Find the number of students whose marks will be
i) Less than 65 ii) More than 75 iii) Between 65 and 75.
Ans: i) 159 ii) 159 iii) 683

Q4. In a normal distribution 31% of the items are under 45 and 8% of the items are over 64. Find
the mean and standard deviation.
Ans: Mean=50 and S.D.=10
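Q4 runs the table lookup in reverse: the two percentages give two z values, and the two equations (45 − μ)/σ = z_low and (64 − μ)/σ = z_high are solved for μ and σ. The sketch below does this with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()               # standard normal, mean 0 and sd 1
z_low = std.inv_cdf(0.31)        # 31% of items lie below 45
z_high = std.inv_cdf(1 - 0.08)   # 8% lie above 64, so 92% lie below

# (45 - mu) / sigma = z_low and (64 - mu) / sigma = z_high
sigma = (64 - 45) / (z_high - z_low)
mu = 45 - z_low * sigma
print(round(mu, 2), round(sigma, 2))
```

With z values rounded as in a printed table (about −0.5 and 1.4) the same two equations give the mean 50 and standard deviation 10 quoted in the answer.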

• Chebyshev's inequality

In probability theory, Chebyshev's inequality provides an upper bound on the probability of deviation
of a random variable (with finite variance) from its mean. More specifically, the probability that a
random variable deviates from its mean by $k\sigma$ or more is at most $1/k^2$, where k is any positive
constant.
Compared with rules that hold only for the normal distribution, Chebyshev's inequality is more general:
it states that at least 75% of values must lie within two standard deviations of the mean and at least
88.89% within three standard deviations, for a broad range of different probability distributions.

Probabilistic statement

Let X be an integrable random variable with finite non-zero variance $\sigma^2$ (and thus finite expected
value $\mu$). Then for any real number k > 0

$$\Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}$$

or, equivalently,

$$\Pr(\mu - k\sigma < X < \mu + k\sigma) \ge 1 - \frac{1}{k^2}$$

Using this, you can calculate the minimum percentage of the data that lies within k standard deviations
of the mean.
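The guaranteed percentages follow directly from 1 − 1/k². A minimal sketch, with a hypothetical helper name, is:

```python
def chebyshev_within(k):
    """Lower bound on Pr(|X - mu| < k*sigma) for any k > 0."""
    return max(0.0, 1 - 1 / k ** 2)

for k in (1, 1.5, 2, 3, 4):
    print(k, round(100 * chebyshev_within(k), 2), "% at least")
```

For k ≤ 1 the bound is 0, so the inequality only carries information for k greater than 1.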

Problems
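Q1. Suppose a fair coin is tossed 50 times. The bound on the probability that the number of
heads will be greater than 35 or less than 15 can be found using Chebyshev's Inequality.
The number of heads has mean 25 and variance 12.5, so a deviation of 10 from the mean
corresponds to k² = 10²/12.5 = 8 and the bound is 1/k² = 0.125.
In other words, the chance of a fair coin coming up heads outside the range of 15 to 35 times is
at most 0.125.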
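Chebyshev's bound holds for every distribution, so it is usually far from tight. The sketch below compares the bound for this coin example with the exact binomial probability; it assumes the number of heads follows a binomial distribution with n = 50 and p = 0.5.

```python
from math import comb

n, p = 50, 0.5
mean, variance = n * p, n * p * (1 - p)   # 25 and 12.5

# Chebyshev: Pr(|X - 25| >= 10) <= variance / 10^2
bound = variance / 10 ** 2

# Exact binomial probability of fewer than 15 or more than 35 heads.
exact = sum(comb(n, x) for x in range(n + 1) if x < 15 or x > 35) / 2 ** n
print(bound, exact)
```

The exact probability is much smaller than the bound, which is expected: the inequality uses only the mean and variance, not the shape of the distribution.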

Q2. Suppose 1,000 applicants show up for a job interview, but there are only 70 positions
available. To select the best 70 people amongst the 1,000 applicants, the employer gives an
aptitude test to judge their abilities. The mean score on the test is 60, with a standard deviation
of 6. If an applicant scores an 84, can they assume they are getting a job?

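A score of 84 is k = (84 − 60)/6 = 4 standard deviations above the mean, so by Chebyshev's
inequality at most 1/4² = 1/16 of the 1,000 applicants, about 63 people, scored 84 or more.
With 70 positions available, an applicant who scores an 84 can be assured they got the job.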

Q1. Suppose X is a random variable with mean 100 and standard deviation 5.
(a) What conclusion can you draw for k = 2, 3 through Chebyshev’s inequality?
(b) Estimate the probability that X lies between 80 and 120.
(c) Find [a, b] about 100 for which the probability that X lies in the interval is at least 99
percent.
Q2. Let X be a random variable with mean 100 and standard deviation 10. Use Chebyshev’s
inequality to estimate (a) P(X ≥ 120) and (b) P(X ≤ 75).
Q3. Suppose a random variable X has mean μ = 25 and standard deviation σ = 2.
Use Chebyshev’s inequality to estimate

(i) Pr(X ≤ 35) and (ii) Pr(X ≥ 20)


Q4. Let X be a random variable with mean μ = 40 and standard deviation σ= 5. Use
Chebyshev’s inequality to find b for which
Pr(40−b ≤X ≤40+b) ≥ 0.95.

Q5. Let X be a random variable with mean μ = 80 and unknown standard deviation σ.
Use Chebyshev’s inequality to find a value of σ for which
Pr(75 ≤ X ≤ 85) ≥ 0.9.
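Problems of the type in Q4 and Q5 invert the inequality: setting 1 − 1/k² equal to the required probability gives k = 1/√(1 − probability), and the half-width of the interval is kσ. The sketch below follows that approach; the helper name `k_for_coverage` is an assumption.

```python
from math import sqrt

def k_for_coverage(target):
    """Smallest k with 1 - 1/k^2 >= target (0 < target < 1)."""
    return 1 / sqrt(1 - target)

# Q4: mu = 40, sigma = 5, coverage 0.95 -> half-width b = k * sigma
b = k_for_coverage(0.95) * 5

# Q5: mu = 80, half-width 5, coverage 0.9 -> largest sigma = 5 / k
sigma_max = 5 / k_for_coverage(0.9)
print(round(b, 2), round(sigma_max, 2))
```

Any b larger than the printed value, or any σ smaller than the printed value, also satisfies the corresponding requirement.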

Log normal distribution (Self-study)

The lognormal distribution is a continuous, right-skewed probability distribution with a long right
tail. It is used to represent things like income distributions, chess game lengths, and the
time it takes to repair a maintainable system, among other things.

Data that follow this distribution are typically skewed, with low mean values, large variation,
and all-positive values. Values must be positive because ln(x) exists only for positive x.

[Figure: example of a lognormal distribution curve]

Lognormal Distribution of a Random Variable

By definition, a lognormal distribution is the continuous probability distribution of a random
variable whose logarithm is normally distributed. Let x be a standard normal variable, which means
the probability distribution of x is normal, centered at 0 and with a variance of 1. Then a log-normal
distribution is defined as the probability distribution of a random variable

$X = e^{\mu + \sigma x}$

where μ and σ are the mean and standard deviation of the logarithm of X, respectively.

Equivalently, if the random variable X is lognormally distributed, then Y = ln(X) = μ + σx has a
normal distribution with mean μ and standard deviation σ.

[Figure: lognormal distribution of a random variable]

Probability Density Function

The probability density function for the lognormal is defined by the two parameters μ and σ,
where x > 0. When our lognormal data is transformed using logarithms our μ can then be viewed
as the mean and σ as the standard deviation.
$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{\ln(x) - \mu}{\sigma}\right)^2}$$

Here,

μ is the mean and σ the standard deviation of ln(x). For the lognormal distribution itself, σ acts
as the shape parameter.
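The definition X = e^(μ + σx) can be tried out by simulation. The sketch below draws standard normal values, transforms them, and checks that the logarithms of the results have mean close to μ and standard deviation close to σ. The parameter values, seed and sample size are arbitrary assumptions for the illustration.

```python
import math
import random
import statistics

mu, sigma = 1.0, 0.5          # illustrative parameters of ln(X)
rng = random.Random(0)        # fixed seed so the run is repeatable

# X = exp(mu + sigma * x) with x standard normal
samples = [math.exp(mu + sigma * rng.gauss(0, 1)) for _ in range(50_000)]
logs = [math.log(v) for v in samples]

def lognormal_pdf(x, mu, sigma):
    """Density f(x) of the lognormal distribution for x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

print(round(statistics.mean(logs), 3), round(statistics.stdev(logs), 3))
print(min(samples) > 0, round(lognormal_pdf(math.e, mu, sigma), 4))
```

Every simulated value is positive, in line with the requirement that ln(x) must exist.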
Mean of Lognormal Distribution

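The mean of the lognormal distribution, μ′, is given by:

$$\mu' = e^{\mu + \frac{1}{2}\sigma^2}$$

Conversely, μ can be obtained from the mean $\bar{x}$ and the variance $\sigma_x^2$ of the x values themselves:

$$\mu = \ln \bar{x} - \frac{1}{2} \ln\left(\frac{\sigma_x^2}{\bar{x}^2} + 1\right)$$

Here μ′ is the mean value of x, μ is the mean of the natural logarithms, and the x values are
the times-to-failure.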
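The relation between the mean of x and the parameters of ln(x) can be checked with a round trip. The sketch below also uses the standard variance formula Var(X) = (e^(σ²) − 1) · e^(2μ + σ²), which is a known result and is not stated in these notes; the function names and the parameter values are assumptions.

```python
import math

def lognormal_mean_var(mu, sigma):
    """Mean and variance of X when ln(X) ~ Normal(mu, sigma^2)."""
    mean = math.exp(mu + sigma ** 2 / 2)
    var = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    return mean, var

def log_params(mean, var):
    """Recover (mu, sigma) of ln(X) from the mean and variance of X."""
    s2 = math.log(var / mean ** 2 + 1)
    return math.log(mean) - s2 / 2, math.sqrt(s2)

m, v = lognormal_mean_var(1.0, 0.5)
print(round(m, 4), round(v, 4), log_params(m, v))
```

Converting to the mean and variance of x and back returns the original parameters, up to floating-point rounding.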

Lognormal Distribution Curve

The lognormal distribution differs significantly in shape from the normal distribution. The normal
distribution is symmetrical, whereas the lognormal distribution is not: because it takes only
positive values and has a long right tail, its curve is right-skewed.

The expected value is also known as the mean of the distribution and provides useful information
about the average that may be expected from a large number of repeated trials. A distribution’s
variance indicates how “spread out” the data is. The standard deviation, or the square root of the
variance, is related and useful because it is in the same units as the data.

The lognormal distribution curve shows the following properties:

1. The lognormal distribution curve is a distribution skewed to the right.


2. The lognormal distribution curve starts at zero, increases to its mode, and decreases
thereafter.
3. The degree of skewness increases as σ increases, for a given μ

o μ = mean of the natural logarithms of the times-to-failure.


o σ = standard deviation of the natural logarithms of the times-to-failure.

4. For the same σ, increasing μ stretches the lognormal distribution curve further to the right; the
skewness coefficient itself depends only on σ.
5. For σ values significantly greater than 1, the lognormal distribution curve rises very sharply
in the beginning. It essentially follows the vertical axis, peaks out early, and then decreases
sharply like an exponential curve.
6. μ or the mean in terms of X = ln(x) is the scale parameter and not the location parameter
as in the case of the normal probability distribution function.
7. σ or the standard deviation in terms of X = ln(x) is the shape parameter and not the scale
parameter, as in the normal probability distribution function.

Example 1:
The daily website visitors of a small blog follow a lognormal distribution with a mean of 50
visitors and a geometric standard deviation of 1.1. Calculate the variance of the daily website
visitors.
Solution:

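To find the variance we will use the formula for the variance of a lognormal distribution, written
in terms of its mean m and the standard deviation σ of ln(X):

$$\mathrm{Var}(X) = m^2 \left(e^{\sigma^2} - 1\right)$$

According to the given information, we have:

• m = 50 (mean of the daily visitors)
• geometric standard deviation = 1.1, so σ = ln(1.1) ≈ 0.0953 and σ² ≈ 0.00908
Putting these values in the formula we get
Var(X) = 2500 × (e^0.00908 − 1) ≈ 22.81
∴ Variance of the daily website visitors is approximately 22.81.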
