
Complete Lectures PME


Probability Methods in Engineering

CONTENTS
1

1. Introduction to Statistics
2. Measures of Central Tendency
3. Probability
4. Conditional Probability & Mathematical Expectation
5. Probability Distributions
6. Probability Densities
7. Sampling Distributions
8. Regression & Correlation Analysis
9. Estimation of Parameters
10. Testing of Hypotheses
Agenda
2

 Introduction of Course
 Why you should study Probability Methods in Engineering
 Method of Statistical Thinking
 Contents of Course
 Grade Criteria
 First Topic: Treatment of Data (or Data Analysis)

Introduction of Course
3

Course Prerequisites:
Calculus (differential & integral)
Algebra (solution of systems of equations)
Recommended Books
1. Applied Statistics and Probability for Engineers, by Douglas C. Montgomery
2. Probability and Statistics for Engineers, by Irwin Miller & John E. Freund
3. Statistical Methods for Engineers & Scientists, by Walpole & Myers
4. Introduction to Statistical Theory, by Sher Muhammad
Student's Effort: Besides class hours (only 48 hrs), every student should devote at least 6 hours a week to grasping the content of the book and the class notes, and to working out the examples.
Why you should study Engineering Analysis & Statistics
4

Statistics is a discipline comprising the procedures and techniques used to collect, process, and analyse numerical data in order to reach decisions in the face of uncertainty.
Descriptive Statistics consists of organizing and summarizing data.
Inferential Statistics consists of using the data you have collected to form conclusions.
THE ENGINEERING METHOD AND STATISTICAL THINKING
5

An engineer is someone who solves problems of interest to society by the efficient


application of scientific principles. Engineers accomplish this by either refining an
existing product or process or by designing a new product or process that meets
customers’ needs.
Steps in the Engineering Method
6

1. Develop a clear and concise description of the problem.


2. Identify, at least tentatively, the important factors that affect this problem or that may play a role in its
solution.

3. Propose a model for the problem, using scientific or engineering knowledge of the phenomenon being
studied. State any limitations or assumptions of the model.

4. Conduct appropriate experiments and collect data to test or validate the tentative model or conclusions
made in steps 2 and 3.

5. Refine the model on the basis of the observed data.

6. Manipulate the model to assist in developing a solution to the problem.

7. Conduct an appropriate experiment to confirm that the proposed solution to the problem is both
effective and efficient.
8. Draw conclusions or make recommendations based on the problem solution.
Why you should study probability
7
How to make accurate predictions:
 The mark you will get on the final
 Your average annual income over the next several years
 Whether it will rain tomorrow
How to make sound decisions:
 Can you make money by playing the Lottery?
Inference: a sample is drawn from a population, and statistics computed from the sample are used to make inferences about the population's parameters.

Weekly Plan of Course


8
Week # Course Contents
1 Introduction to Statistics, Treatment of data
2 Measures of central tendency, Variance, Standard deviation
3 Counting Principle, Probability & its elementary theorems
4 Conditional probabilities, Bayes' theorem
5 Mathematical expectation and decision making
6 Sampling distributions
7 Probability distributions
8 Probability densities
9 Regression, Correlation & Rank Correlation
10 Standard error of estimates
11 Curve fitting by least square methods incorporating linear, polynomial, exponential & power function
12 Estimation of parameters
13 Point and interval estimates
14 Confidence interval
15 Statistical decisions
16 Hypotheses tests
Grade Criteria
9
Sr. # Marks Distribution % weight
1. Quiz 15
2. Attendance 00
3. Assignment/ Presentation 10
4. Mid Term EXAM 25
5. Final Term EXAM 50
Total 100
Presentation of Data
10

There are three main ways of presenting data:
Classification
Tabulation
Graphical Display
Classification
11

Classification is the sorting of data into homogeneous classes or groups according to their common characteristics.
OR
The process of dividing a set of data into classes or groups in such a way that:
Observations in the same class are similar.
Observations in each class are dissimilar to those in other classes.
Tabulation
12

Tabulation is the systematic presentation of classified data under suitable headings, placed in the form of rows and columns.
 This logical arrangement makes the data easy to understand, facilitates comparison, and provides an effective way to convey information to the reader.
Graphical Display
13

A visual representation of statistical data in the form of lines, areas and other geometrical shapes is known as graphical representation.
Graphical display is further divided into two types:
 Graph
 Diagram
Frequency Distribution
14

The organization of data in a table which shows the distribution of the data into classes or groups, together with the number of observations in each class, is called a frequency distribution.
The number of observations in each class is referred to as its frequency.
Frequency distribution terms
15

 Class Limit
 Class Boundary
 Class Interval
 Class Mark ( Midpoint value)
Make a Classification
16
of data into groups

106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125, 111, 92, 86, 70, 110, 126,
68, 130, 129, 139, 115, 128, 100, 186, 84, 99, 113, 204, 111, 141, 136, 123, 90,
115, 98, 110, 78, 185, 162, 178, 140, 152, 173, 146, 158, 194, 148, 90, 107, 181,
131, 75, 184, 104, 110, 80, 118, 82.

Range = 204 − 68 = 136; 136/7 = 19.43, so let h = 20.
17
Class Limits | Class Boundaries | Class Marks | Tally Marks | Frequency
65-84 | 64.5-84.5 | 74.5 | IIII IIII | 9
85-104 | 84.5-104.5 | 94.5 | IIII IIII | 10
105-124 | 104.5-124.5 | 114.5 | IIII IIII IIII II | 17
125-144 | 124.5-144.5 | 134.5 | IIII IIII | 10
145-164 | 144.5-164.5 | 154.5 | IIII | 5
165-184 | 164.5-184.5 | 174.5 | IIII | 4
185-204 | 184.5-204.5 | 194.5 | IIII | 5
(Total frequency = 60.)
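The grouping above can be reproduced programmatically. The following is a minimal Python sketch; the function name `frequency_table` is illustrative, not from the lecture.

```python
# Build the frequency distribution for the raw data on this slide:
# range = 204 - 68 = 136, class width h = 20, first class 65-84.
data = [106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125, 111, 92, 86,
        70, 110, 126, 68, 130, 129, 139, 115, 128, 100, 186, 84, 99, 113, 204,
        111, 141, 136, 123, 90, 115, 98, 110, 78, 185, 162, 178, 140, 152, 173,
        146, 158, 194, 148, 90, 107, 181, 131, 75, 184, 104, 110, 80, 118, 82]

def frequency_table(values, lower=65, width=20, classes=7):
    """Return (class_limits, frequencies) using inclusive class limits."""
    freqs = [0] * classes
    for v in values:
        idx = (v - lower) // width      # index of the class the value falls in
        freqs[idx] += 1
    limits = [(lower + i * width, lower + i * width + width - 1)
              for i in range(classes)]
    return limits, freqs

limits, freqs = frequency_table(data)
for (lo, hi), f in zip(limits, freqs):
    print(f"{lo}-{hi}: {f}")
```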
Difference between Graph and Diagram
18

 A graph is a representation of data by a continuous curve.
 A diagram is any other form of visual representation.
Diagrams
19

Simple Bar Chart
A simple bar chart consists of horizontal or vertical bars of equal width, with lengths proportional to the values they represent.
Example: Draw a simple bar diagram to represent the turnover of a company for 5 years.
20
Years | 1965 | 1966 | 1967 | 1968 | 1969
Turnover (in Dollars) | 35000 | 42000 | 43500 | 48000 | 48500
[Bar chart: turnover in dollars by year, 1965-1969]
Multiple Bar Chart 21

A multiple bar chart shows two or more characteristics corresponding to the values of a common variable in the form of grouped bars whose lengths are proportional to the values of the characteristics; each bar is coloured or shaded differently.

Example:
Draw a multiple bar diagram to show the area and production of cotton from the following data.
22
Year | Area | Production
1965-66 | 2866 | 1588
1970-71 | 3233 | 2229
1975-76 | 3420 | 1937
[Grouped bar chart: Area and Production by year]
Component Bar Chart
23

A component bar chart is an effective technique in which each bar is divided into two or more sections whose sizes are proportional to the component parts of the total displayed by that bar.

Example: Draw a component Bar chart of Population city wise


Cities Total Male Female
Peshawar 64 33 31

Rawalpindi 40 21 19

Sargodha 60 32 28

Lahore 65 35 30
Component Bar Chart
24

[Component bar chart: Male and Female population for Peshawar, Rawalpindi, Sargodha and Lahore]
Pie Diagram
25

 
 A pie diagram consists of a circle divided into sectors whose areas are proportional to the various parts into which the whole quantity is divided.
Example: Represent the total expenditure and expenditure of
various items of a family by the Pie diagram.
26

Items | Food | Clothing | House rent | Fuel | Misc.
Expenditure | 50 | 30 | 20 | 15 | 35
Angle of sector | 50/150×360 = 120 | 30/150×360 = 72 | 20/150×360 = 48 | 15/150×360 = 36 | 35/150×360 = 84
[Pie chart: Food, Clothing, House Rent, Fuel, Misc.]


Graphs
27
Historigram:
A curve showing changes in the value of one or more items from one period of time to the next is known as a historigram.
Example
28
Year 1929 1930 1931
No. of Cars 98 74 68

[Line graph: No. of Cars, 1929-1931]
Histogram
29

A histogram consists of a set of adjacent rectangles whose bases are marked off by the class boundaries on the X-axis and whose heights are proportional to the frequencies associated with the respective classes.
Example: Construct a histogram for following frequency Distribution relating ages of
telephone operators
30

Age (Years) 18-19 20-24 25-29 30-34 35-44 45-54

No. of Operators 9 188 160 123 84 15

Class Boundaries | Class Interval | Proportional Height (= frequency / class interval)
17.5-19.5 | 2 | 4.5
19.5-24.5 | 5 | 37.6
24.5-29.5 | 5 | 32
29.5-34.5 | 5 | 24.6
34.5-44.5 | 10 | 8.4
44.5-54.5 | 15 | 1
Exercise
31

How to collect data?


Not Exactly Frequency curve
32
 Frequency polygon & Frequency curve
[Figures: frequency polygon and frequency curve]
MEASURES OF CENTRAL TENDENCY
34
Outline
35

Properties, Merits & Demerits of Averages


Properties of Variance
Calculation of Averages & Variance
Consistency criteria
Chebyshev's Theorem
Empirical Rule
Choice of Average
AVERAGE
36

 A single value that can represent a whole set of data is known as an average.
Types of Average
Arithmetic mean (AM)
Geometric mean (GM)
Harmonic mean (HM)
Median
Mode
Importance of Measures of central tendency
37
  To find representative value
Measures of central tendency or averages give us one value for the distribution and this value
represents the entire distribution.
 To condense data
     Collected and classified figures are scattered. To condense these figures we use average. Average
converts the whole set of figures into just one figure.
  To make comparisons
    To make comparisons of two or more than two distributions, we have to find the representative
values of these distributions. These representative values are found with the help of measures of the
central tendency.
  Helpful in further statistical analysis
    Many techniques of statistical analysis like Measures of Dispersion, Measures of Skewness, Measures
of Correlation, and Index Numbers are based on measures of central tendency. 
Ungrouped & Grouped data
38
An unorganized raw data set is referred to as ungrouped data.
For example (see slide # 17 of the previous lecture):
106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125, 111, 92, 86, 70, 110, 126, 68, 130, 129, 139, 115, 128, 100, 186, 84, 99, 113, 204, 111, 141, 136, 123, 90, 115, 98, 110, 78, 185, 162, 178, 140, 152, 173, 146, 158, 194, 148, 90, 107, 181, 131, 75, 184, 104, 110, 80, 118, 82.
A raw data set can be organized by constructing a table showing the frequency distribution of the variable. Such a frequency table is often referred to as grouped data.
For example (see slide # 18 of the previous lecture):
Class Boundaries | Frequency
64.5-84.5 | 9
84.5-104.5 | 10
104.5-124.5 | 17
124.5-144.5 | 10
144.5-164.5 | 5
164.5-184.5 | 4
184.5-204.5 | 5
Population & Sample
39

A statistical population is a collection of all data (individuals or objects) possessing a common characteristic.
A sample is a subgroup or subset of the population.
A parameter is a numerical measure obtained from a population.
A statistic (not to be confused with statistics) is a characteristic or measure obtained from a sample.
Arithmetic mean with Properties
40
The sum of all the values divided by the number of values is known as the arithmetic mean.
This can be either a population mean (denoted by μ) or a sample mean (denoted by x̄):
x̄ = Σx / n  and  μ = ΣX / N
For grouped data it is calculated by x̄ = Σfx / Σf.
The sum of the deviations of all the observations from their arithmetic mean is zero, i.e.
Σ(x − x̄) = 0
The sum of the squares of the deviations of the observations from the arithmetic mean is a minimum, i.e.
Σ(x − x̄)² ≤ Σ(x − a)²
where a is an arbitrary value.
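The two properties above can be checked numerically. A minimal Python sketch, using an illustrative data set (the same nine values used in a later ungrouped-data example):

```python
# Check the two arithmetic-mean properties on a small data set.
xs = [45, 32, 37, 46, 39, 36, 41, 48, 36]
mean = sum(xs) / len(xs)                      # arithmetic mean

# Property 1: deviations from the mean sum to zero.
dev_sum = sum(x - mean for x in xs)

# Property 2: the sum of squared deviations is minimised at a = mean.
def ssd(a):
    """Sum of squared deviations about an arbitrary value a."""
    return sum((x - a) ** 2 for x in xs)
```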
Combined mean & Weighted mean
41

Weighted mean: the mean obtained when each value is multiplied by its weight and summed, and this sum is divided by the total of the weights:
X̄w = ΣWX / ΣW
Combined mean
If a series of n observations consists of two components having n1 and n2 observations (n1 + n2 = n) and means x̄1 and x̄2 respectively, then the combined mean of the n observations is given by
X̄c = (n1 x̄1 + n2 x̄2) / (n1 + n2)
Arithmetic Mean
42

Merits:
 The arithmetic mean is the most popular average used in statistical analysis.
 It is very simple to understand and easy to calculate.
 The calculation of the A.M is based on all the observations in the series.
 The A.M is amenable to further algebraic treatment.
 It is strictly defined.
 It provides a good means of comparison.
Demerits:
 The A.M is affected by the extreme (lowest & highest) values in a series.
 In case of a missing observation in a series it is not possible to calculate the A.M.
 For a frequency distribution with open-end classes the calculation of the A.M is impossible.
GEOMETRIC MEAN
43

The geometric mean of n positive observations is defined as the nth root of the product of all observations:
G.M = (X1 × X2 × ... × Xn)^(1/n)
or G.M = antilog{ Σ log Xi / n }
For grouped data it is calculated by
G = (X1^f1 × X2^f2 × ... × Xn^fn)^(1/Σf)
or G = antilog{ Σ fi log xi / Σ fi }
Combined Geometric mean
44

If G1 and G2 are the geometric means of two components having n1 and n2 observations, and Gc is the geometric mean of the combined series of n (n = n1 + n2) observations, then
Gc = G1^w1 × G2^w2
where w1 = n1/(n1 + n2) and w2 = n2/(n1 + n2).
Geometric mean
45

Merits:
 It is rigidly defined and its value is a precise figure.
 It is based on all observations.
 It is capable of further algebraic treatment.
 It is not affected by extreme values.
Demerits:
 It cannot be calculated if any of the observations is zero or negative.
 Its calculation is rather difficult.
 It is not easy to understand.
 It may not coincide with any of the observations.
Harmonic mean
46

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocal values:
H.M = 1 / A.M{1/x1, 1/x2, ..., 1/xn} = n / (1/x1 + 1/x2 + ... + 1/xn)
For grouped data it is calculated by
H.M = Σf / Σ(f/x)
Harmonic Mean
47

Merits:
 Its value is based on all the observations of the data.
 It is less affected by extreme values.
 It is suitable for further mathematical treatment.
 It is strictly defined.
Demerits:
 It is not simple to calculate or easy to understand.
 It cannot be calculated if one of the observations is zero.
 The H.M is always less than or equal to the A.M and G.M.
Relation between A.M, G.M, and H.M
48

The relation between the A.M, G.M, and H.M is given by the inequality
H.M ≤ G.M ≤ A.M
Equality holds only if all the values in the distribution are equal.
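The inequality can be illustrated numerically. A minimal sketch with the three mean definitions (function names are illustrative; the sample values 15, 20, 25 are reused from a later example):

```python
import math

def am(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)

def gm(xs):
    """Geometric mean: nth root of the product."""
    return math.prod(xs) ** (1 / len(xs))

def hm(xs):
    """Harmonic mean: reciprocal of the mean of reciprocals."""
    return len(xs) / sum(1 / x for x in xs)

xs = [15, 20, 25]
```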
Median
49

The median is the value that divides an arranged set of data into two equal parts.
For n observations x1, x2, ..., xn, first arrange the data in ascending or descending order.
For odd n, the ((n + 1)/2)th observation is the median.
For even n, the median is the A.M of the (n/2)th and ((n/2) + 1)th values.
For grouped data, first choose the median group by locating n/2 in the cumulative frequencies. Then
Median = l + (h/f)(n/2 − c)
where l = lower class boundary of the median group,
c = cumulative frequency of the group preceding the median group,
h = class interval,
f = frequency of the median group,
n = Σf.
Median
50

Merits:
 The median is useful in case of frequency distributions with open-end classes.
 The median is recommended if the distribution has unequal classes.
 Extreme values do not affect the median as strongly as they affect the arithmetic mean.
 It is easy to calculate and understand.
Demerits:
 For calculating the median it is necessary to arrange the data, whereas other averages do not need arrangement.
 Since it is a positional average, its value is not determined by all the observations in the series.
 It cannot be calculated if the first class is chosen as the median class.
 The median is not capable of further algebraic calculation.
Mode
51

Mode is the most frequent value in the data.
For grouped data, first choose the modal group by finding the largest frequency. Then
Mode = l + ((fm − f1) / ((fm − f1) + (fm − f2))) × h
where l = lower class boundary of the modal group,
fm = frequency of the modal group,
f1 = frequency of the group preceding the modal group,
f2 = frequency of the group following the modal group,
h = class interval.
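Both grouped-data formulas can be sketched in a few lines of Python. The function names are illustrative; the boundaries and frequencies below are the lecture's worked example (classes 64.5-204.5):

```python
def grouped_median(boundaries, freqs):
    """Median = l + (h/f)(n/2 - c), using the class that contains n/2."""
    n = sum(freqs)
    cum = 0                                     # cumulative frequency c
    for (lo, hi), f in zip(boundaries, freqs):
        if cum + f >= n / 2:
            return lo + (hi - lo) / f * (n / 2 - cum)
        cum += f

def grouped_mode(boundaries, freqs):
    """Mode = l + (fm - f1)/((fm - f1) + (fm - f2)) * h for the modal class."""
    m = freqs.index(max(freqs))                 # class with largest frequency
    lo, hi = boundaries[m]
    fm = freqs[m]
    f1 = freqs[m - 1] if m > 0 else 0           # preceding group
    f2 = freqs[m + 1] if m + 1 < len(freqs) else 0  # following group
    return lo + (fm - f1) / ((fm - f1) + (fm - f2)) * (hi - lo)

bounds = [(64.5, 84.5), (84.5, 104.5), (104.5, 124.5), (124.5, 144.5),
          (144.5, 164.5), (164.5, 184.5), (184.5, 204.5)]
freqs = [9, 10, 17, 10, 5, 4, 5]
```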
Mode
52

Merits:
 It is easy to calculate and simple to understand.
 It is not affected by extreme values.
 Its value can be determined in case of open-end class intervals.
 The mode is strictly defined.
Demerits:
 It is not suitable for further mathematical treatment.
 The value of the mode cannot always be determined.
 The value of the mode is not based on each and every value of the series.
 It cannot be calculated if the first or last class is chosen as the modal class.
Relative position of mean, median, mode for three distribution
53

 Positively skewed distribution (mean > median > mode)
 Symmetric distribution (mean = median = mode)
 Negatively skewed distribution (mean < median < mode)


Variance
54

The mean of the squared deviations from the arithmetic mean is known as the variance:
σ² = Σ(x − x̄)² / N  or  σ² = Σx²/N − (Σx/N)²
Properties of variance
 Var(X) cannot be negative.
 Var(a) = 0, where a is constant.
 Var(aX) = a² Var(X), where a is constant.
 Var(X + a) = Var(X), where a is constant.
 Var(aX + b) = a² Var(X), where a & b are constant.
 For independent X and Y: Var(X + Y) = Var(X) + Var(Y)
 For independent X and Y: Var(X − Y) = Var(X) + Var(Y) (the variances add; they never subtract)
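A few of the constant-shift and scaling properties can be verified numerically on a small data set. A minimal sketch (the values and constants are illustrative):

```python
# Numerical check of the variance properties (population variance).
def var(xs):
    """Population variance: mean of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

xs = [2, 4, 4, 4, 5, 5, 7, 9]
a, b = 3, 7
shifted = [x + b for x in xs]       # X + b
scaled  = [a * x + b for x in xs]   # aX + b
```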
Standard Deviation
55

The positive square root of the variance is known as the standard deviation. It is denoted by "σ".
For a population: σ = √( Σ(x − x̄)² / N )
For a sample: S = √( Σ(x − x̄)² / (n − 1) )
For grouped data: σ = √( Σfx²/Σf − (Σfx/Σf)² )
Standard Deviation
56

A smaller value of the standard deviation indicates that most observations in the data are close to the arithmetic mean; a larger value indicates that the observations are widely scattered about the mean.

Properties of standard deviation
 S.D(X) cannot be negative.
 S.D(a) = 0, where a is constant.
 S.D(aX) = |a| S.D(X), where a is constant.
 S.D(X + a) = S.D(X), where a is constant.
 S.D(aX + b) = |a| S.D(X), where a & b are constant.
 For independent X and Y: S.D(X ± Y) = √(Var(X) + Var(Y)); standard deviations do not simply add or subtract.
EXAMPLE (Ungrouped Data)
57
Find arithmetic mean of 45, 32 ,37, 46, 39, 36, 41, 48, 36.
A.M =360/9 =40
Find geometric mean of 45, 32, 37, 46, 39, 36, 41, 48, 36.
X Log x
45 1.6532
32 1.5051
37 1.5682
46 1.6627
39 1.591
36 1.5563
41 1.6127
48 1.6812
36 1.5563
∑logx = 14.387
GM = antilog{14.387/9} = 39.67
58
Find the harmonic mean of 15, 20, 25.
H.M = 3 / (1/15 + 1/20 + 1/25) = 19.15
Find the median of
(i) 15, 20, 25  (ii) 2, 2, 3, 3, 3, 4, 4, 4, 5, 5
(i) Median = 20  (ii) Median = (3 + 4)/2 = 3.5
Find the mode of 2, 3, 4, 5, 6, 7, 8, 9, 10: the mode does not exist
Find the mode of 2, 3, 3, 3, 4, 4, 4, 4, 4: mode = 4
Find the mode of 2, 3, 3, 3, 4, 4, 4, 5, 6: mode = 3, 4
Find the mode of 2, 2, 3, 3, 4, 4, 5, 5, 6: mode = 2, 3, 4, 5
EXAMPLE (Grouped Data)
59

Find the A.M, G.M, H.M, Median and Mode from data given below
Class limits Frequency
65-84 9
85-104 10
105-124 17
125-144 10
145-164 5
165-184 4
185-204 5
A.M (Solution)
60

Class limits Frequency (f) x fx


65-84 9 74.5 670.5
85-104 10 94.5 945
105-124 17 114.5 1946.5
125-144 10 134.5 1345
145-164 5 154.5 772.5
165-184 4 174.5 698
185-204 5 194.5 972.5
∑f=60 ∑fx=7350
x̄ = 7350/60 = 122.5
G.M (Solution)
61

Class limits Frequency (f) x logx f logx


65-84 9 74.5 1.872 16.85
85-104 10 94.5 1.975 19.75
105-124 17 114.5 2.058 34.99
125-144 10 134.5 2.128 21.28
145-164 5 154.5 2.188 10.94
165-184 4 174.5 2.241 8.964
185-204 5 194.5 2.288 11.44
∑f=60 ∑f logx=124.248

124.248 
G.M  anti log    117 .1
 60 
H.M (Solution)
62

Class limits Frequency (f) f/x


65-84 9 0.1208
85-104 10 0.1058
105-124 17 0.1484
125-144 10 0.0743
145-164 5 0.0323
165-184 4 0.0229
185-204 5 0.0257
∑f=60 ∑f/x=0.5304

H.M = 60/0.5304 = 113.1
Median (Solution)
63

Class limits Frequency Class boundaries Cumulative frequency


65-84 9 64.5-84.5 9
85-104 10 84.5-104.5 19
105-124 17 104.5-124.5 36
125-144 10 124.5-144.5 46
145-164 5 144.5-164.5 51
165-184 4 164.5-184.5 55
185-204 5 184.5-204.5 60
n/2 = 60/2 = 30, so the median class is 104.5-124.5.
Median = 104.5 + (20/17)(30 − 19) = 117.4
Mode (Solution)
64

Class limits Frequency Class boundaries


65-84 9 64.5-84.5
85-104 10 84.5-104.5
105-124 17 104.5-124.5
125-144 10 124.5-144.5
145-164 5 144.5-164.5
165-184 4 164.5-184.5
185-204 5 184.5-204.5

The highest frequency is 17, so the modal class is 104.5-124.5.
Mode = 104.5 + ((17 − 10) / ((17 − 10) + (17 − 10))) × 20 = 114.5
CO-EFFICIENT OF VARIATION
65

The coefficient of variation is expressed as the percentage ratio of the standard deviation to the arithmetic mean:
C.V = (σ / x̄) × 100
where σ = standard deviation and x̄ = arithmetic mean.

Note: 1. The coefficient of variation measures the relative variation in data.
2. A smaller coefficient of variation indicates that the data are more consistent.
3. A larger coefficient of variation indicates greater variability in the data.

CONSISTENCY CRITERION
When there are two sets of data with their coefficients of variation, the data set with the smaller coefficient of variation is the more consistent.
Example: Goals scored by two teams A and B in a football
season were as follows. Find which team is more consistent.
66

No of goals No of matches No of matches f1 x f1 x2 f2 x f2 x2


(x) of team A (f1) of team B (f2)
00 27 17 00 00 00 00
01 09 09 09 09 09 09
02 08 06 16 32 12 24
03 05 05 15 45 15 45
04 04 03 16 64 12 48
SUM 53 40 56 150 48 126

Team A: x̄A = 56/53 = 1.057, σA = √(150/53 − 1.057²) = 1.309, C.V for team A = (1.309/1.057) × 100 ≈ 123.9%
Team B: x̄B = 48/40 = 1.2, σB = √(126/40 − 1.2²) = 1.308, C.V for team B = (1.308/1.2) × 100 ≈ 109.0%
Since team B has the smaller coefficient of variation, team B is more consistent.
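Recomputing the coefficients of variation from the frequency table above gives the comparison directly. A minimal sketch (the `cv` helper is illustrative):

```python
import math

# Goals (x), matches for team A (fA) and team B (fB), from the table above.
x  = [0, 1, 2, 3, 4]
fA = [27, 9, 8, 5, 4]
fB = [17, 9, 6, 5, 3]

def cv(x, f):
    """Coefficient of variation (%) for a frequency distribution."""
    n = sum(f)
    mean = sum(fi * xi for fi, xi in zip(f, x)) / n
    var = sum(fi * xi * xi for fi, xi in zip(f, x)) / n - mean ** 2
    return math.sqrt(var) / mean * 100

cv_A, cv_B = cv(x, fA), cv(x, fB)
```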
Chebyshev's Theorem
67

The proportion of the values that fall within k standard deviations of the mean is at least 1 − 1/k², where k is any number greater than 1.
"Within k standard deviations" means the interval from x̄ − ks to x̄ + ks.
Chebyshev's theorem is true for any data set, no matter what the distribution.
Empirical Rule
The empirical rule is valid only for bell-shaped (normal) distributions. The following statements are true:
 Approximately 68% of the data values fall within one standard deviation of the mean.
 Approximately 95% of the data values fall within two standard deviations of the mean.
 Approximately 99.7% of the data values fall within three standard deviations of the mean.
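Chebyshev's guarantee can be checked on an arbitrary, even heavily skewed, data set. A minimal sketch (the data values are illustrative):

```python
import statistics

# Chebyshev: at least 1 - 1/k^2 of any data set lies within k standard
# deviations of the mean. Verify on an arbitrary skewed sample.
data = [2, 3, 5, 5, 6, 7, 9, 12, 15, 21, 30, 45]
mean = statistics.fmean(data)
sd = statistics.pstdev(data)          # population standard deviation

def within(k):
    """Proportion of values inside [mean - k*sd, mean + k*sd]."""
    return sum(mean - k * sd <= x <= mean + k * sd for x in data) / len(data)
```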
Exercise
68

1. Prove that H.M ≤ G.M ≤ A.M.

2. Find the A.M, G.M, H.M, median and mode of the following data:
Wages in rupees | Less than 10 | Less than 20 | Less than 30 | Less than 40 | Less than 50
No. of workers | 5 | 17 | 20 | 22 | 25

3. Prove the following for an arbitrary choice of a:
Σ(x − x̄) = 0 and Σ(x − x̄)² ≤ Σ(x − a)²
Exercise
69
4. What are the least and greatest values the arithmetic mean can take, given that the frequency f3 must exist?
Groups | 60-62 | 63-65 | 66-68 | 69-71 | 72-74
Frequencies | 15 | 54 | f3 | 81 | 24
5. How Arithmetic mean, median is affected if every value of the variable is increased by 2 and multiplied by 5?

6. The G.M of 10 items of a series was 16.2. It was later found that one of the items was wrongly taken as 12.9 instead of
21.9. Calculate the correct G.M.

7. If n1=2, n2 = 3 and n3=5 and GM1=8, GM2=10 and GM3=15. find the combined geometric mean of all the observations.

8. Find the average rate of growth of a population which, in the first decade, increased by 20%, in the second decade by 30% and in the third by 45%.
9. Write a precise note on “Proper choice of measures of central tendency” for different type of data.
Probability
70
Outline
71

Basic definitions
Counting principles
Probability
Elementary theorem of probability
Examples of Probability
OBSERVATION AND VARIABLE
72
Observation: Any numerical recording of information is known as an observation.
EXAMPLE 1: Classification of a coin toss as a head or tail.
EXAMPLE 2: Heights of persons.
Variable: A characteristic that varies with an individual or object is called a variable.
EXAMPLE: Age is a variable, as it varies from person to person.
Random variable
73

 A random variable is some numerical


outcomes of a random process.
Example 1: Toss a coin 10 times
X=Number of heads
Example 2: Toss a coin until a head occur
X= Number of tosses needed
More random variables
74

 Toss a die
X=points showing on upper face of die
 Test a light bulb
X=lifetime of bulb
 Test 20 light bulbs
X=average lifetime of bulbs
Types of Random variable
75
1) Discrete random variable:
A discrete random variable can take only a discrete (countable) set of values, such as integers or whole numbers.
For example: number of students in a class; number of Facebook friends.
2) Continuous random variable:
A continuous random variable can take any value within a given interval.
For example: height, weight and temperature; lifetime of a bulb; time t > 0 or [0, t].
Experiment
76
An experiment is a process whose results yield a set of data.
For example: tossing a coin.
 Random Experiment:
An experiment which produces different results under similar conditions is called a random experiment.

Experiment | Outcomes
Flip a coin | Heads, Tails
Exam Marks | Numbers: 0, 1, 2, ..., 100
Course Grades | F, C, C+, B−, B, B+, A−, A
Sample Space
77

A set consisting of all the possible outcomes of a random experiment is


known as sample space. It is usually denoted by S.
Example: In case of tossing a die sample space is S = { 1,2,3,4,5,6 }
Event
An event is an individual outcome of a random experiment, or more generally a subset of the sample space.
Let A be the event that the dots on the upper face of the die are even: A = {2, 4, 6}
Let B be the event that the dots on the upper face of the die are divisible by 3: B = {3, 6}
Counting Rule
78

Permutation
Choose r objects out of n distinct objects where order matters (no repetition):
nPr = n!/(n − r)!
Choose all n objects out of n objects with order:
nPn = n!
How many six-letter words are possible from the letters A, B, C, D, E, F if D, E, F are together in the given order?
How many six-letter words are possible from the letters A, B, C, D, E, F if D, E, F are together in any order?

Combination
Choose r objects out of n objects without regard to order:
nCr = n!/(r!(n − r)!)
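The standard library covers both counting rules, and the two word-arrangement questions above follow by treating D-E-F as a single block. A minimal sketch:

```python
import math

# nPr and nCr via the standard library.
nPr = math.perm   # math.perm(n, r) = n!/(n-r)!
nCr = math.comb   # math.comb(n, r) = n!/(r!(n-r)!)

# DEF together in the given order: the block "DEF" plus A, B, C -> 4 units.
fixed_order = math.factorial(4)
# DEF together in any order: 4 units, times 3! internal orders of the block.
any_order = math.factorial(4) * math.factorial(3)
```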
Product rule
79

If sets S1 & S2 contain n and m elements respectively, there are m × n ways of choosing first an element of S1 and then an element of S2.

For example: {H, T} is the sample space for a coin and {1, 2, 3, 4, 5, 6} is the sample space for a die.
The joint experiment of rolling a die and tossing a coin simultaneously contains m × n = 2 × 6 = 12 sample points:
 {(1,H), (2,H), (3,H), (4,H), (5,H), (6,H), (1,T), (2,T), (3,T), (4,T), (5,T), (6,T)}
Compliment of an Event
80

Let A be an event in a sample space S. Then the complement of event A is
Ā = S − A
For example: if A = {2, 4, 6} and S = {1, 2, 3, 4, 5, 6}, then Ā = {1, 3, 5}.

Disjoint or mutually exclusive events
Two events A and B are said to be disjoint if and only if they cannot occur at the same time, i.e. A ∩ B = Ø.
Probability
81

Probability measures uncertainty.
The probability of an event A is defined as
P(A) = (number of outcomes favourable to A) / (total number of outcomes in the sample space)
Axioms of Probability
For any event Ai in a sample space S:
i. 0 ≤ P(Ai) ≤ 1
ii. Σi P(Ai) = 1, where the sum runs over all mutually exclusive outcomes of S.
Probability Examples
82

EXAMPLE 1
(Assuming a fair die) S = {1, 2, 3, 4, 5, 6}
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
Then:
P(EVEN) = P(2) + P(4) + P(6)
= 1/6 + 1/6 + 1/6
= 3/6
= 1/2
Example 2
83

A fair coin is tossed two times. What is the probability that at least one head appears?
SOLUTION:
Sample space S= {HH, HT, TH, TT}
n(S)=2x2=22=4
Let A be an event that at least one head appears
A= {HH, HT, TH}
n (A) =3
We know that
P(A)=n(A)/n(S)
P(A)=3/4
So probability of at least one head appear is 3/4
Example 3
84
An employer wishes to hire 3 people from a group of 15 equally qualified applicants, of whom 8 are men and 7 are women.
If he selects 3 candidates at random, what is the probability that
a) all three selected are women;
b) at least one woman is selected?

Solution (a) Let A be an event that all three selected are women.

n(S) = 15C3= 455.


n (A) = 7C3 X 8C0 = 35
P(A)=n(A)/n(S)
P(A)=35/455 =1/13
Solution (b)
85

Let B be an event that at least one woman is selected


n(S) = 15C3= 455.
n(B) = (7C1 X 8C2 ) + ( 7C2 X8C1 ) + ( 7C3X 8C0 )
= 399
P(B)=n(B)/n(S)
P(B)=399/455 =57/65
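Both parts can be checked with exact rational arithmetic; part (b) is also quickly confirmed via the complement (1 minus the probability that all three are men). A minimal sketch:

```python
from fractions import Fraction
from math import comb

total = comb(15, 3)                          # ways to choose 3 of 15
p_all_women = Fraction(comb(7, 3), total)    # all three are women

# At least one woman = 1 - P(no woman) = 1 - P(all three are men).
p_at_least_one = 1 - Fraction(comb(8, 3), total)
```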
Probability Theorems
86

1. If Ā is a complement of event A in a sample space S then


P(Ā)= 1-P(A) or P(A)= 1-P(Ā)
2. If A and B are any two events in a sample space S then
probability that one of them occur is P(A U B) and
calculated by formula
P(A U B)= P(A) + P(B) – P(A∩B)
2.1 If A and B are disjoint events then
(A∩B) = Ø & P(A∩B) = 0
P(AUB) =P(A) + P(B)
Probability Theorems
87

3. If A, B and C are any three events, then the probability that at least one of them occurs is
P(A∪B∪C) = P(A) + P(B) + P(C) − P(A∩B) − P(B∩C) − P(A∩C) + P(A∩B∩C)

4. If A1, A2, ..., Ak are any k events, then the probability that at least one of them occurs is
P(A1 ∪ A2 ∪ ... ∪ Ak) = Σi P(Ai) − Σ(i<j) P(Ai ∩ Aj) + Σ(i<j<l) P(Ai ∩ Aj ∩ Al) − ... + (−1)^(k−1) P(A1 ∩ A2 ∩ ... ∩ Ak)

The probability of an event given that another event has occurred is


called a conditional probability
Probability Theorems
88

5. If A and B are any two events of sample space then

P(A∩B)= P(A|B). P(B) if P(B)≠ 0

P(A∩B)= P(B|A). P(A) if P(A)≠ 0


Where P(A|B) is the conditional probability of A if B has already occurred.
& P(B|A) is the conditional probability of B if A has already occurred.
5.1 If A and B are independent events of sample space S
P(A∩B) = P(A) . P(B) ________ (1)
P(A|B) = P(A) (By equation 1& theorem 5)
& P(B|A)=P(B)
EXAMPLE 4
89
A coin is tossed four times. What is the probability that at least one head occurs?

Solution: n(S)=24=16
S={HHHH,HHHT,HHTT,HTTT,TTTT,HTHH,HHTH,THH
H,HTTH,HTHT,THTH,TTHH, THHT,TTTH,TTHT,THTT}
Let A be an event that at least one head occurred
A={HHHH,HHHT,HHTT,HTTT, HTHH,HHTH,THHH,HTTH,HTHT,THTH,TTHH,
THHT,TTTH,TTHT,THTT}
n(A)=15
P(A)=15/16
Alternative Solution
90

Alternatively we can apply (Theorem 1) to find probability of event A


Let Ā be an event that no head occurred
Ā ={TTTT}
n(Ā)=1
P(Ā)=n(Ā)/n(S) =1/16

By using Theorem 1
P(A)= 1-P(Ā)= 1- 1/16= 15/16
Example 5
91

A pair of Dice are thrown randomly. Find the probability


of getting a total of either 5 or 11?
Solution:
Experiment: rolling 2 dice and summing the 2 numbers on top.
Sample Space: S = {2, 3, ..., 12}
92
Sum table:
+ | 1 | 2 | 3 | 4 | 5 | 6
1 | 2 | 3 | 4 | 5 | 6 | 7
2 | 3 | 4 | 5 | 6 | 7 | 8
3 | 4 | 5 | 6 | 7 | 8 | 9
4 | 5 | 6 | 7 | 8 | 9 | 10
5 | 6 | 7 | 8 | 9 | 10 | 11
6 | 7 | 8 | 9 | 10 | 11 | 12
P(5) = 4/36, P(11) = 2/36
93

Let A be the event that the dice sum to 5: A = {(1,4), (2,3), (3,2), (4,1)}
Let B be the event that the dice sum to 11: B = {(5,6), (6,5)}
P(A) = 4/36, P(B) = 2/36, and since A ∩ B = Ø, P(A∩B) = 0.
The probability of getting a total of either 5 or 11 is
P(A∪B) = P(A) + P(B) − P(A∩B) = 4/36 + 2/36 = 6/36 = 1/6
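The same answer falls out of enumerating all 36 equally likely outcomes. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))
favour = [o for o in outcomes if sum(o) in (5, 11)]   # sum is 5 or 11
p = Fraction(len(favour), len(outcomes))
```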
Example 6
94

The probability that a consumer testing service will


rate a product Very poor, Poor, Fair, Good, Very good,
Excellent are 0.07, 0.12, 0.17, 0.32, 0.21, 0.11
respectively. What is the probability that it will rate the
product?
i) Very poor, poor, fair or good.
ii) Good, Very good, or Excellent
Solution
95
 Let A be an event that testing service rate the product very poor
 Let B be an event that testing service rate the product poor
 Let C be an event that testing service rate the product Fair
 Let D be an event that testing service rate the product Good
 Let E be an event that testing service rate the product Very good
 Let F be an event that testing service rate the product Excellent

P(A) = 0.07, P(B) = 0.12, P(C) = 0.17, P(D) = 0.32, P(E) = 0.21, P(F) = 0.11
(i) P(A∪B∪C∪D) = ?
96

Since events A, B, C & D are disjoint, by Theorem 2.1
P(A∪B∪C∪D) = P(A) + P(B) + P(C) + P(D) = 0.07 + 0.12 + 0.17 + 0.32 = 0.68
(ii) P(D∪E∪F) = ?
Since events D, E & F are disjoint, by Theorem 2.1
P(D∪E∪F) = P(D) + P(E) + P(F) = 0.32 + 0.21 + 0.11 = 0.64
Example 7
97

A card is drawn randomly from a deck of playing card what is the probability
that it is a diamond card, a face card or card of king.
Information
Deck of card consist on
52 cards
13 Diamond card 4 Suit
Diamond, Spade, Club, Heart

13 Spade card Each suit contain one card of

13 Club card Ace, King, Queen , Jack, 2, 3, 4, 5, 6, 7 , 8, 9, 10

13 Heart card
Solution
98

Let A be an event that a card is diamond


Let B be an event that a card is faced
Let C be an event that a card is king
P(A) = 13/52,  P(B) = 12/52,  P(C) = 4/52

P(A∪B∪C) = ?

P(AUBUC)=P(A)+P(B)+P(C)-P(A∩B)-P(A∩C)-P(B∩C)+P(A∩B∩C)
99

A∩B: face cards which are diamonds = 3
A∩C: diamond cards which are kings = 1
B∩C: face cards which are kings = 4
A∩B∩C: face cards which are kings of diamonds = 1

P(A∩B) = 3/52
P(A∩C) = 1/52
P(B∩C) = 4/52
P(A∩B∩C) = 1/52

P(A∪B∪C) = 13/52 + 12/52 + 4/52 − 3/52 − 1/52 − 4/52 + 1/52 = 22/52 = 11/26
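The count above can be verified by brute force over a 52-card deck (illustrative Python, not part of the original slides):

```python
from fractions import Fraction

# Build a 52-card deck: 13 ranks in each of 4 suits.
ranks = ['A', 'K', 'Q', 'J', '2', '3', '4', '5', '6', '7', '8', '9', '10']
suits = ['diamond', 'spade', 'club', 'heart']
deck = [(rank, suit) for rank in ranks for suit in suits]

faces = {'K', 'Q', 'J'}
# A card counts if it is a diamond (A), a face card (B), or a king (C).
hits = [c for c in deck if c[1] == 'diamond' or c[0] in faces or c[0] == 'K']

p = Fraction(len(hits), len(deck))
print(p)  # 11/26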
Example 8
100

Two coins are tossed. What is the conditional probability that two heads appear, given that there is at least one head?
Solution:
Sample space for two coins: S = {HH, HT, TH, TT}
Let A be the event that 2 heads appear: A = {HH}, so P(A) = 1/4
Let B be the event that at least 1 head appears: B = {HH, HT, TH}, so P(B) = 3/4
P(A|B) = ?
101

P( A  B)
P( A | B) 
P( B)
A ∩ B= {HH} 1
P( A  B) 
4
1
1
P( A | B)  4 
3 3
4
NOTE THAT (!)

P ( A | B )  P ( A)
Exercise
102
A maintenance firm has gathered the following information regarding the failure
mechanisms for air conditioning systems:
                              Evidence of gas leaks
                                YES       NO
Evidence of          YES         55       17
electrical
failure              NO          32        3
The units without evidence of gas leaks or electrical failure showed other types of
failure. If this is a representative sample of AC failure, find the probability
(a) That failure involves a gas leak
(b) That there is evidence of electrical failure given that there was a gas leak
(c) That there is evidence of a gas leak given that there is evidence of electrical
failure
Conditional Probability
103
Outline
104

 Conditional Probabilities
 Total Probability
 Baye’s Theorem
 Mathematical Expectation
 Decision Making
Conditional Probability
105

The probability of an event given that another event


has occurred is called a conditional probability
5. If A and B are any two events of sample space then
P(A∩B)= P(A|B). P(B) if P(B)≠ 0
P(A∩B)= P(B|A). P(A) if P(A)≠ 0
Where P(A|B) is the conditional probability of A if B has
already occurred.
& P(B|A) is the conditional probability of B if A has already
occurred.
106

5.1 If A and B are independent events of sample space S


P(A|B) = P(A)
& P(B|A) = P(B)
P(A∩B) = P(A) . P(B)
6. If, in an experiment, the events A1, A2, …, Ak can occur, then
P(A1 ∩ A2 ∩ … ∩ Ak) = P(A1)·P(A2|A1)·P(A3|A1∩A2)·…·P(Ak|A1∩A2∩…∩Ak−1)

6.1 If A1, A2, …, Ak are independent events of sample space S
P(A1 ∩ A2 ∩ … ∩ Ak) = P(A1)·P(A2)·…·P(Ak)
Total Probability Theorem
109

For any events A and B
A = (A∩B) ∪ (A∩B̄)

7. For any events A and B the total probability theorem is
P(A) = P(A∩B) + P(A∩B̄)
     = P(A|B)·P(B) + P(A|B̄)·P(B̄)

7.1 Suppose B1, B2, B3, …, Bk are k mutually exclusive and exhaustive events. Then
P(A) = P(A∩B1) + P(A∩B2) + … + P(A∩Bk)
     = P(A|B1)·P(B1) + P(A|B2)·P(B2) + … + P(A|Bk)·P(Bk)
     = Σ_(i=1)^k P(A|Bi)·P(Bi)
Baye’s Theorem
110

By definition of conditional probability
P(A∩B) = P(A|B)·P(B)

8. By comparison
P(B∩A) = P(B|A)·P(A)
P(B|A)·P(A) = P(A|B)·P(B)
P(B|A) = P(A|B)·P(B) / P(A),  provided P(A) ≠ 0

8.1 Suppose B1, B2, B3, …, Bk are k mutually exclusive and exhaustive events & A is any event. Then for 1 ≤ r ≤ k
P(Br|A) = P(A|Br)·P(Br) / P(A),  provided P(A) ≠ 0
        = P(A|Br)·P(Br) / (P(A|B1)·P(B1) + P(A|B2)·P(B2) + … + P(A|Bk)·P(Bk))
        = P(A|Br)·P(Br) / Σ_(i=1)^k P(A|Bi)·P(Bi)
Example 1 (Total probability Theorem)
111

In a certain assembly plant, three machines, B1, B2, and B3, make 30%, 45%, and
25% of the products respectively. It is known from past experience that 2%, 3%,
and 2% of the products made by each machine are defective respectively. Now
suppose that a finished product is randomly selected. What is the probability that
it is defective?
Solution: Consider the following events
A: The product is defective
B1: The product is made by machine B1
B2: The product is made by machine B2
B3: The product is made by machine B3
112

By applying theorem 7.1 we can write


P(A) = P(A|B1) P(B1)+ P(A|B2) P(B2)+ P(A|B3) P(B3)

P(A|B1) P(B1) = (0.3) (0.02) = 0.006.


P(A|B2) P(B2)= (0.45)(0.03) = 0.0135,
P(A|B3) P(B3)= (0.25)(0.02) = 0.005,

and hence
P(A) = 0.006 + 0.0135 + 0.005 = 0.0245
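The weighted sum in Theorem 7.1 is a one-line computation in Python (illustrative, not part of the original slides):

```python
# Machine production shares and per-machine defect rates from the example.
share = {'B1': 0.30, 'B2': 0.45, 'B3': 0.25}
defect = {'B1': 0.02, 'B2': 0.03, 'B3': 0.02}

# Total probability theorem: P(A) = sum over machines of P(A|Bi) * P(Bi)
p_defective = sum(defect[m] * share[m] for m in share)
print(round(p_defective, 4))  # 0.0245
```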
113

Suppose that a product was randomly selected and it is


defective. What is the probability that this product was
made by machine B1

Questions of this type can be answered by using the


Baye’s theorem
Example 2 (Baye’s Theorem)
114

If a product were chosen randomly and found to be defective, what is the


probability that it was made by machine B3?
P(B3|A)= ?
Solution: Using Baye’s theorem

P(B3|A) = P(A|B3)·P(B3) / (P(A|B1)·P(B1) + P(A|B2)·P(B2) + P(A|B3)·P(B3))
        = 0.005 / 0.0245 = 0.2041
In view of the fact that a defective product was selected, this result suggests that it probably was not made by machine B3.
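Continuing the same sketch, Baye's rule divides one term of the total-probability sum by the whole sum (illustrative Python, not part of the original slides):

```python
share = {'B1': 0.30, 'B2': 0.45, 'B3': 0.25}
defect = {'B1': 0.02, 'B2': 0.03, 'B3': 0.02}

# Denominator: total probability of a defective product, P(A).
p_defective = sum(defect[m] * share[m] for m in share)

# Baye's rule: P(B3|A) = P(A|B3) * P(B3) / P(A)
posterior_B3 = defect['B3'] * share['B3'] / p_defective
print(round(posterior_B3, 4))  # 0.2041
```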

Using Baye’s rule, a statistical methodology called the Bayesian method


Probability Distribution/ Densities
115
For a discrete random variable, the distribution of probabilities for each outcome x to occur is denoted by p(x), with properties
0 ≤ p(x) ≤ 1,  Σ p(x) = 1

The function f(x) is a probability density function for a continuous random variable X defined over the set of real numbers R, if
1. f(x) ≥ 0 for all x
2. ∫_(-∞)^(∞) f(x) dx = 1
3. P(a ≤ x ≤ b) = ∫_a^b f(x) dx
Mathematical Expectation
116

For a discrete random variable: Let X be a discrete random variable with probability distribution p(x). Then the expected value of X is
E(x) = Σ x·p(x)

For a continuous random variable: Let X be a continuous random variable with probability density function f(x). Then the expected value of X is
E(x) = ∫_(-∞)^(∞) x·f(x) dx
Properties of Mathematical Expectation
117

E(a) = a (where a is a constant)
E(ax ± b) = aE(x) ± b (where a and b are constants)
E(xy) = E(x)·E(y) (if x and y are independent)
E(x ± y) = E(x) ± E(y)
Var(x) = E(x²) − [E(x)]²
Example
118

What is the expected value of the number of heads when three fair coins are tossed?
X Outcome P(x) xP(x)

0 TTT 1/8 0/8

1 HTT,THT,TTH 3/8 3/8

2 HTH,HHT,THH 3/8 6/8

3 HHH 1/8 3/8

E=∑x p(x)
=12/8=1.5
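The expectation table can be reproduced by enumerating the 8 outcomes (illustrative Python, not part of the original slides):

```python
from itertools import product
from fractions import Fraction

# The 8 equally likely outcomes of tossing three fair coins.
outcomes = list(product('HT', repeat=3))
p = Fraction(1, len(outcomes))

# E(X) = sum of (number of heads) * (probability of the outcome)
expected = sum(s.count('H') * p for s in outcomes)
print(expected)  # 3/2
```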
119
A salesman can earn Rs 1000 per day if the day is rainy; otherwise he loses Rs 200. What is the expected earning of the salesman, if the probability of rain is 0.3?

x       P(x)    x·P(x)
1000    0.3     300
-200    0.7     -140

E = Σ x·p(x) = 160
The expected earning of the salesman is Rs 160 per day.
Decision Making problem
120

Two new product designs are to be compared on the basis of revenue potential. Marketing
feels that the revenue from design A can be predicted quite accurately to be $3 million. The
revenue potential of design B is more difficult to assess. Marketing concludes that there is a
probability of 0.3 that the revenue from design B will be $7 million, but there is a 0.7 probability that
the revenue will be only $2 million. Which design do you prefer?

Let X denote the revenue from design A. Because there is no uncertainty in the revenue
from design A, we can model the distribution of the random variable X as $3 million with
probability 1. Therefore, E(x)= $3 million .
Let Y denote the revenue from design B. The expected value of Y in millions of dollars is

E(y)= 7 (0.3) + 2(0.7)= $3.5 million .


Because E(Y) exceeds E(X), we might prefer design B.
Example 1 (Density function)
121

Let the continuous random variable X denote the current measured in a thin copper wire in milliamperes. Assume that the range of X is [0, 20 mA], and assume that the probability density function of X is f(x) = 0.05 for 0 < x < 20.
What is the probability that a current measurement is less than 10 milliamperes?
Is f(x) a probability density function?
∫_0^20 0.05 dx = 1
P(0 ≤ x ≤ 10) = ∫_0^10 0.05 dx = 0.5
Mathematical Expectation(for continuous variable)
122

For the copper current measurement in (Example 1) the expected value of X is
E(x) = ∫_0^20 x·f(x) dx = 10

In (Example 1) X is the current measured in milliamperes. What is the expected value of the squared current?
E(x²) = ∫_0^20 x²·f(x) dx = 133.33
Example 2 (Density function)
123

Let the continuous random variable X denote the diameter of a hole


drilled in a sheet metal component. The target diameter is 12.5
millimeters. Most random disturbances to the process result in larger
diameters. Historical data show that the distribution of X can be
modeled by a probability density function

f(x)  20e -20(x-12.5) for x  12.5


(i) If a part with a diameter larger than 12.60 millimeters is scrapped,
what proportion of parts is scrapped?

P( x  12.6)   f ( x)dx  0.135
12.6
124

(ii) What proportion of parts is between 12.5 and 12.6 millimeters?
P(12.5 < x < 12.6) = ∫_12.5^12.6 f(x) dx = 0.865
Alternatively
Because the total area under f(x) equals 1, we can also calculate
P(12.5 < x < 12.6) = 1 − P(x > 12.6) = 1 − 0.135 = 0.865
Mathematical Expectation(for continuous variable)
125

For the drilling operation in (Example 2) the expected value of X is
E(x) = ∫_12.5^∞ x·20e^(−20(x−12.5)) dx = 12.55
126
Probability Distributions
127
Outline
128

 Probability distribution
 Uniform distribution
 Binomial distribution
 Hypergeometric distribution
 Geometric distribution
 Poisson distribution
 The mean of a probability distribution
 Standard deviation of a probability distribution
Probability distributions
129

Statistical Experiment: any process by which measurements are obtained.

A quantitative variable, x is a random variable if its value is determined by the outcome of


a random experiment.

By probability distribution, we mean a correspondence that assigns probabilities to the values


of a random variable.

For a discrete random variable, the probability for each outcome x to occur is denoted by f(x), known as the probability distribution if it satisfies
0 ≤ f(x) ≤ 1
Σ f(x) = 1
Uniform distribution
130

A random variable X has a discrete uniform distribution if each of the


n values in its range, say x1, x2, x3,…xn, has equal probability
f (xi)=1/n

For example Roll a die, X= Dot appear on upper face of die

x 1 2 3 4 5 6

f(x) 1/6 1/6 1/6 1/6 1/6 1/6


Example
131

Toss a coin twice. X = # of heads

x    f(x)
0    1/4
1    1/2
2    1/4

Toss a coin ten times. X = # of heads?

Pick up 2 cards from a deck of cards. X = # of aces?
Example
132

Check whether the correspondence given by
f(x) = (x+3)/15, for x = 1, 2, and 3
can serve as the probability distribution of some random variable.

Substituting x = 1, 2, and 3 into f(x) gives 4/15, 5/15, and 6/15.
They are all between 0 and 1. The sum is 4/15 + 5/15 + 6/15 = 1.
So it can serve as the probability distribution of some random variable.
Exercise
133

Verify that for the number of heads obtained in four flips of a balanced
coin the probability distribution is given by
4
 x
f ( x )    , for x=0, 1, 2, 3, and 4
16

In many applied problems, we are interested in the probability that an event will occur x times out of n.
Roll a die 3 times. X=# of sixes
134

S=a six, N=not a six


No six: (x=0) NNN  (5/6)(5/6)(5/6)

One six: (x=1)


NNS  (5/6)(5/6)(1/6)
NSN  same
SNN  same
Two sixes: (x=2)
NSS  (5/6)(1/6)(1/6)
SNS  same
SSN  same
Three sixes: (x=3)
SSS (1/6)(1/6)(1/6)
Binomial distribution
135

x    f(x)
0    (5/6)³
1    3(1/6)(5/6)²
2    3(1/6)²(5/6)
3    (1/6)³

f(x) = C(3, x)·(1/6)^x·(5/6)^(3−x)
Toss a die 5 times. X=# of six. Find P(X=2)
S = six, N = not a six
136

SSNNN: (1/6)(1/6)(5/6)(5/6)(5/6) = (1/6)²(5/6)³
SNSNN: (1/6)(5/6)(1/6)(5/6)(5/6) = (1/6)²(5/6)³
SNNSN: (1/6)(5/6)(5/6)(1/6)(5/6) = (1/6)²(5/6)³
SNNNS, NSSNN, NSNSN, NSNNS, NNSSN, NNSNS, NNNSS: same

There are 10 ways to choose 2 of 5 places for S:
C(5, 2) = 5!/(2!(5−2)!) = 5!/(2!3!) = (5·4·3!)/(2·1·3!) = 10

P(x = 2) = 10·(1/6)²·(5/6)³ = C(5, 2)·[P(S)]^(# of S)·[1−P(S)]^(5 − # of S)
n independent trials; p probability of a success; x=# of successes

137
A trial with only two possible outcomes is used so frequently as a building block of a random experiment
that it is called a Bernoulli trial.
A random experiment consists of n Bernoulli trials such that
1) There are a fixed number of trials. This is denoted by n.
2) The n trials are independent and repeated under identical conditions.
3) Each trial results in only two possible outcomes, labeled as “success’’ and “failure’’
4) The probability of a success in each trial, denoted as p, remains constant

The random variable X is a binomial random variable with parameters n and p. The probability function of X is
f(x) = C(n, x)·p^x·(1−p)^(n−x)
where C(n, x) is the number of ways to choose x places for the successes.
Roll a die 20 times. X=# of 6’s, n=20, p=1/6
138

f(x) = C(20, x)·(1/6)^x·(5/6)^(20−x)
P(x = 4) = C(20, 4)·(1/6)⁴·(5/6)¹⁶

Flip a fair coin 10 times. X = # of heads
f(x) = C(10, x)·(1/2)^x·(1/2)^(10−x) = C(10, x)·(1/2)¹⁰
Geometric distribution
139

Rather than repeat a fixed number of trials, we repeat the experiment


until the first success.
Let the random variable X denote the number of trials until the first
success.
Then X is a geometric random variable with parameter p and probability
function is
x 1
f ( x)  (1  P ) P x  1, 2, 3,...
Hypergeometric distribution
140

If we sample with replacement and the trials are all independent, the
binomial distribution applies.

If we sample without replacement, a different probability distribution


applies. ( Hypergeometric distribution )
Example
141

Pick up n balls from a box without replacement. The box contains a


white balls and b black balls

X=# of white balls picked

n picked

a successes X= # of successes

b non-successes
In the box: a successes, b non-successes
142
The probability of getting x successes (white balls):
p(x) = (# of ways to pick n balls with x successes) / (total # of ways to pick n balls)
# of ways to pick x successes = (# of ways to choose x successes)·(# of ways to choose n−x non-successes) = C(a, x)·C(b, n−x)

A sample of size n objects is selected randomly (without replacement) from the a+b objects.
Let the random variable X denote the number of successes in the sample. Then X is a hypergeometric random variable and its probability function is
f(x) = C(a, x)·C(b, n−x) / C(a+b, n),  x = 0, 1, 2, …, a
Example
143

52 cards. Pick n = 5.
X = # of aces, then a = 4, b = 48
P(X = 2) = C(4, 2)·C(48, 3) / C(52, 5)
Example
144

A box has 100 batteries.
a = 98 good ones
b = 2 bad ones
n = 10
X = # of good ones
P(X = 8) = C(98, 8)·C(2, 2) / C(100, 10)
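The battery probability can be evaluated exactly with binomial coefficients (illustrative Python, not part of the original slides):

```python
from math import comb
from fractions import Fraction

def hypergeom_pmf(x, a, b, n):
    # f(x) = C(a, x) * C(b, n-x) / C(a+b, n)
    return Fraction(comb(a, x) * comb(b, n - x), comb(a + b, n))

# P(8 good ones in a sample of 10 from 98 good + 2 bad batteries)
p = hypergeom_pmf(8, 98, 2, 10)
print(p)  # 1/110
```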
Poisson distribution
145
This distribution is used to model the number of “rare” events that occur in a time
interval, volume, area, length, etc…
Example: Number of deaths from horse kicks in the Army in different years

Given an interval of real numbers, assume counts occur at random throughout the interval.
If the interval can be partitioned into subintervals of small enough length such that
The number of successes in a fixed subinterval, follows a Poisson process provided the
following conditions are met
1. The probability of two or more successes in any sufficiently small subinterval is 0.
2. The probability of success is the same for any two subintervals of equal length.
3. The number of successes in any subinterval is independent of the number of successes in
any other subinterval provided the subintervals are not overlapping.
Poisson distribution
146

The random variable X that equals the number of counts in the interval is a Poisson random variable with parameter λ, and the probability function of X is
f(x) = λ^x·e^(−λ) / x!,  x = 0, 1, 2, …

When there is a large number of trials, but a small probability of success, binomial
calculation becomes impractical
Limiting case of Binomial dist
147

Radioactive decay
x = # of particles/min
λ = 2 particles per minute
P(x = 3) = 2³·e⁻² / 3!
Example
148

Radioactive decay
X = # of particles/hour
λ = 2 particles/min × 60 min/hour = 120 particles/hr
P(x = 125) = 120¹²⁵·e⁻¹²⁰ / 125!
The Poisson Distribution
Emission of α-particles
149

In 1910, Ernest Rutherford and Hans Geiger recorded the number of α-particles emitted from a polonium source in successive intervals of one-eighth of a minute.
The results are reported in the table.
Does a Poisson probability function accurately describe the number of α-particles emitted?

No. α-particles   Observed
0                 57
1                 203
2                 383
3                 525
4                 532
5                 408
6                 273
7                 139
8                 45
9                 27
10                10
11                4
12                0
13                1
14                1
Over 14           0
Total             2608

Source: Rutherford, Sir Ernest; Chadwick, James; and Ellis, C.D. Radiations from Radioactive Substances. London, Cambridge University Press, 1951, p. 172.
150
Calculation of λ:
λ = No. of particles per interval = 10097/2608 = 3.87

Expected values = 2608 · e^(−3.87)·(3.87)^x / x!

No. α-particles   Observed   Expected
0                 57         54
1                 203        210
2                 383        407
3                 525        525
4                 532        508
5                 408        394
6                 273        254
7                 139        140
8                 45         68
9                 27         29
10                10         11
11                4          4
12                0          1
13                1          1
14                1          1
Over 14           0          0
Total             2608       2607
The mean of a probability distribution
151

X=# of 6’s in 3 tosses of a die


x f(x)
0 (5/6)3
1 3 (1/6) (5/6)2
2 3 (1/6)2 (5/6)
3 (1/6)3

Expected long run average of X?


152
The average or mean value of x in the long run over repeated experiments is the weighted average of the possible x values, weighted by their probabilities of occurrence.

E(X) = μ_X = Σ_(x=0)^3 x·C(3, x)·(1/6)^x·(5/6)^(3−x)
     = 0·(5/6)³ + 1·3(1/6)(5/6)² + 2·3(1/6)²(5/6) + 3·(1/6)³ = 1/2
In general
153

X = # showing on a die
mean: μ_x = E(x) = Σ x·f(x)
E(x) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5
The population is all possible outcomes of the experiment (tossing a die).
154

Population mean=3.5
Box of equal number of
1’s 2’s 3’s
4’s 5’s 6’s

E(X)=(1)(1/6)+(2)(1/6)+(3)(1/6)+
(4)(1/6)+(5)(1/6)+(6)(1/6)
=3.5
X=# of heads in 2 coin tosses
155

Box of 0’s, 1’s and 2’s


with twice as many 1’s as 0’s or 2’s.)

X 0 1 2
P(x) 1/4 ½ 1/4

Population Mean=1
For probability distribution
156

For example,
3 white balls, 2 red balls
Pick 2 without replacement
X = # of white balls

x    P(x)
0    P(RR) = (2/5)(1/4) = 2/20 = 0.1
1    P(RW or WR) = P(RW ∪ WR) = P(RW) + P(WR) = (2/5)(3/4) + (3/5)(2/4) = 0.6
2    P(WW) = (3/5)(2/4) = 6/20 = 0.3

μ = E(X) = (0)(0.1) + (1)(0.6) + (2)(0.3) = 1.2
The mean of a probability distribution
157

 Binomial distribution
n= # of trials,
p=probability of success on each trial
X=# of successes
E(x) = Σ x·C(n, x)·p^x·(1−p)^(n−x) = np
158

Toss a die n=60 times, X=# of 6’s


known that p=1/6
μ=μX =E(X)=np=(60)(1/6)=10

We expect to get 10 6’s.


Hypergeometric Distribution
159

a – successes
b – non-successes
pick n balls without replacement
X=# of successes
E(x) = Σ x·C(a, x)·C(b, n−x)/C(a+b, n) = n·a/(a+b)
Example
160

50 balls
20 red
30 blue
n=10 chosen without replacement
X=# of red

E(x) = n·(a/(a+b)) = 10·(20/50) = 10·0.4 = 4
Since 40% of the balls in our box are red, we expect on average
40% of the chosen balls to be red. 40% of 10=4.
Standard Deviation of a Probability Distribution
161

Variance:
σ² = weighted average of (X−μ)² by the probability of each possible x value = Σ (x−μ)²·f(x)
Standard deviation:
σ = √(Σ (x−μ)²·f(x))
Example
162

 Toss a coin n=2 times. X=# of heads


μ=np=(2)(½)=1
x    (x−μ)²   f(x)   (x−μ)²·f(x)
0    1        1/4    1/4
1    0        1/2    0
2    1        1/4    1/4
________________________
σ² = 1/2
σ = 0.707
Variance for Binomial distribution
163

 σ2=np(1-p)
where n is # of trials and p is probability of a success.
From the previous example, n=2, p=0.5
Then
σ2=np(1-p)=2*0.5*(1-0.5)=0.5
Variance for Hypergeometric distributions
164

Hypergeometric:
σ² = n·(a/(a+b))·(b/(a+b))·((a+b−n)/(a+b−1))
   = np(1−p) × finite population correction factor
Alternative formula
165
σ² = Σ x²·f(x) − μ²

Example: X binomial n=2, p=0.5


x 0 1 2
f(x) 0.25 0.50 0.25
Get σ2 from one of the 3 methods
1. Definition for variance
2. Formula for binomial distribution
3. Alternative formula
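All three methods can be compared directly (illustrative Python, not part of the original slides):

```python
# X ~ Binomial(n=2, p=0.5): f(0)=0.25, f(1)=0.50, f(2)=0.25
xs = [0, 1, 2]
fs = [0.25, 0.50, 0.25]
mu = sum(x * f for x, f in zip(xs, fs))                     # mu = np = 1

var_def = sum((x - mu) ** 2 * f for x, f in zip(xs, fs))    # 1. definition
var_binom = 2 * 0.5 * (1 - 0.5)                             # 2. np(1-p)
var_alt = sum(x * x * f for x, f in zip(xs, fs)) - mu ** 2  # 3. E[x^2] - mu^2

print(var_def, var_binom, var_alt)  # 0.5 0.5 0.5
```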
PROBABILITY DENSITIES
166
Outline
167

Probability densities
Uniform distribution
Gamma Function
Exponential distribution
Gamma distribution
Beta distribution
Normal distribution
Standardized Normal distribution
Normal approximate to binomial
Probability densities
168

The function f(x) is a probability density function for a continuous


random variable X defined over the set of real numbers R, if
1. f ( x )  0 for all x

2. 

f ( x ) dx  1

b
3. P ( a  x  b)  
a
f ( x ) dx

4. P (x  a)  0
.

Density functions are commonly used in engineering to describe physical systems


169
 For example, consider the density of a loading on a long, thin beam as shown in Fig (A) For any
point x along the beam, the density can be described by a function (in grams/cm).

Fig (A)

 The total loading between points a and b is determined as the integral of the density function
from a to b. This integral is the area under density function over interval as shown in Fig (B).

Fig (B)
Correspondence of Area Probability
170

 A histogram is an approximation to a probability density function. For each interval


of the histogram, the area of the bar equals the relative frequency
of the measurements in the interval. The relative frequency
is an estimate of the probability that a
measurement falls in the interval.

 Similarly, the area under f(x) over any interval equals the true probability that a
measurement falls in the interval.
Example 1 (Density function)
171

Let the continuous random variable X denote the current measured in a thin copper wire in milliamperes. Assume that the range of X is [0, 20 mA], and assume that the probability density function of X is f(x) = 0.05 for 0 < x < 20.
What is the probability that a current measurement is between 5 and 10 milliamperes?
Is f(x) a probability density function?
∫_0^20 0.05 dx = 1
P(5 ≤ x ≤ 10) = ∫_5^10 0.05 dx = 0.25
Uniform Distribution
175

Used to model random variables that tend to occur “evenly” over a


range of values
Probability of any interval of values proportional to its width
f(x) = 1/(b−a),  a ≤ x ≤ b
f(x) = 0,  elsewhere
Mean & Variance of Uniform distribution
176
E(x) = ∫_a^b x·(1/(b−a)) dx = (1/(b−a))·[x²/2]_a^b = (b²−a²)/(2(b−a)) = ((b−a)(b+a))/(2(b−a)) = (a+b)/2

E(x²) = ∫_a^b x²·(1/(b−a)) dx = (1/(b−a))·[x³/3]_a^b = (b³−a³)/(3(b−a)) = (a²+b²+ab)/3

Var(x) = E(x²) − [E(x)]² = (a²+b²+ab)/3 − ((a+b)/2)²
       = (4(a²+b²+ab) − 3(a²+b²+2ab))/12 = (a²+b²−2ab)/12 = (b−a)²/12
Gamma Function
177
Γ(α) = ∫_0^∞ y^(α−1)·e^(−y) dy

Γ(α+1) = ∫_0^∞ y^α·e^(−y) dy.  Integrating by parts:
u = y^α ⇒ du = α·y^(α−1)·dy
dv = e^(−y)·dy ⇒ v = −e^(−y)

Γ(α+1) = ∫_0^∞ y^α·e^(−y) dy = [−y^α·e^(−y)]_0^∞ + α·∫_0^∞ y^(α−1)·e^(−y) dy
       = (0 − 0) + α·∫_0^∞ y^(α−1)·e^(−y) dy = α·Γ(α)   (Recursive Property)

Note that if α is an integer, Γ(α) = (α−1)!

Consider the integral ∫_0^∞ y^(α−1)·e^(−y/β) dy.  Letting x = y/β:
y = β·x ⇒ dy = β·dx
∫_0^∞ y^(α−1)·e^(−y/β) dy = ∫_0^∞ (β·x)^(α−1)·e^(−x)·β·dx = β^α·∫_0^∞ x^(α−1)·e^(−x) dx = β^α·Γ(α)
Exponential Distribution
178

 Right-Skewed distribution with maximum at x=0


 Random variable can only take positive values
 Used to model inter-arrival times/distances for a Poisson process
f(x) = (1/β)·e^(−x/β),  x ≥ 0
f(x) = 0,  elsewhere
Exponential Distribution Mean & Variance
179
E(x) = ∫_0^∞ x·(1/β)·e^(−x/β) dx = (1/β)·∫_0^∞ x^(2−1)·e^(−x/β) dx
     = (1/β)·Γ(2)·β² = (2−1)!·β = β

E(x²) = ∫_0^∞ x²·(1/β)·e^(−x/β) dx = (1/β)·∫_0^∞ x^(3−1)·e^(−x/β) dx
      = (1/β)·Γ(3)·β³ = β²·(3−1)! = 2β²

Var(x) = E(x²) − [E(x)]² = 2β² − β² = β²
Gamma Distribution
180

 Family of Right-Skewed Distributions


 Random variable can take only positive values

f(x) = (1/(Γ(α)·β^α))·x^(α−1)·e^(−x/β),  x ≥ 0, α, β > 0
f(x) = 0,  otherwise

Gamma Distribution Mean & Variance
181
E(x) = ∫_0^∞ x·(1/(Γ(α)·β^α))·x^(α−1)·e^(−x/β) dx = (1/(Γ(α)·β^α))·∫_0^∞ x^((α+1)−1)·e^(−x/β) dx
     = (1/(Γ(α)·β^α))·Γ(α+1)·β^(α+1) = α·Γ(α)·β/Γ(α) = αβ

E(x²) = ∫_0^∞ x²·(1/(Γ(α)·β^α))·x^(α−1)·e^(−x/β) dx = (1/(Γ(α)·β^α))·∫_0^∞ x^((α+2)−1)·e^(−x/β) dx
      = (1/(Γ(α)·β^α))·Γ(α+2)·β^(α+2) = (α+1)·α·Γ(α)·β²/Γ(α) = α(α+1)β²

Var(x) = E(x²) − [E(x)]² = α(α+1)β² − (αβ)² = α²β² + αβ² − α²β² = αβ²
Beta Distribution
182

 Used to model probabilities (can be generalized to any finite, positive range)


 Parameters allow a wide range of shapes to model empirical data

 (   )  1  1
 ( )(  ) x (1  x ) 0  x  1,  ,   0

f ( x)  
0 otherwise


Normal Distribution
183

 Bell-shaped distribution with tendency for individuals to clump around the


group mean
 Many estimators have approximate normal sampling distributions

1 ( x )2
1 
f ( x)  e 2 2
   x  ,      ,   0
2 2
184
Normal Distribution – Normalizing Constant
185
( x ) 2
 
Consider the integral :  e 2 2
dx  k (we want to solve for k )


x dz 1
Changing variables : z     dx  dz
 dx 
( x ) 2
z2
   
k   
e 2 2
dx  

e 2
dz
z2
k  


  
e 2
dz

  z12  z 22 
2 1 z2 z2
 k    1   2  
     e 2
dz1  e 2
dz 2    e 2 dz1dz 2
      

Changing to Polar Co - Ordinates :


z1  r cos  , z 2  r sin  with domains : r  (0,  ),   [0,2 ) and dz1dz 2  rdrd
 k 
2
   
1 2
z1  z 22  2  
1 2

r cos 2   sin 2   2  
1 2
r

 
     
e 2
dz1dz 2    0 0
e 2
rdrd   
0 0
e 2
rdrd (cos 2   sin 2   1)

1 2 
2  r 2 2 2
 
0
e 2
d   0
( 0  ( 1))d  0
d   0
 2
r 0
2
 k 
   2  k 2  2 2  k  2 2
 
Obtaining the Value of Γ(1/2)
186

From the previous slide, we get: ∫_(-∞)^(∞) e^(−z²/2) dz = √(2π)
⇒ ∫_0^∞ e^(−z²/2) dz = √(2π)/2 = √(π/2)

Now, consider: Γ(1/2) = ∫_0^∞ y^(1/2−1)·e^(−y) dy = ∫_0^∞ y^(−1/2)·e^(−y) dy

Changing variables: y = z²/2 ⇒ dy = z·dz
Γ(1/2) = ∫_0^∞ (z²/2)^(−1/2)·e^(−z²/2)·z dz = √2·∫_0^∞ e^(−z²/2) dz = √2·√(π/2) = √π

Γ(1/2) = √π
Standardized Normal Distribution Mean& Variance
187
Z ~ N(0, 1) ⇒ f(z) = (1/√(2π))·e^(−z²/2)

E(Z) = ∫_(-∞)^(∞) z·(1/√(2π))·e^(−z²/2) dz = (1/√(2π))·[−e^(−z²/2)]_(-∞)^(∞) = (1/√(2π))·(0 − 0) = 0

E(Z²) = ∫_(-∞)^(∞) z²·(1/√(2π))·e^(−z²/2) dz = (2/√(2π))·∫_0^∞ z²·e^(−z²/2) dz

Changing variables: y = z²/2 ⇒ dy = z·dz, z = √(2y)
(2/√(2π))·∫_0^∞ z·e^(−z²/2)·z dz = (2/√(2π))·∫_0^∞ √(2y)·e^(−y) dy = (2/√π)·∫_0^∞ y^(3/2−1)·e^(−y) dy
= (2/√π)·Γ(3/2) = (2/√π)·(1/2)·Γ(1/2) = (2/√π)·(√π/2) = 1

Var(Z) = E(Z²) − [E(Z)]² = 1 − 0² = 1
188
189
Example
190

Let Z be a standard Normal variable, then find following probabilities


1. P(Z > 1.26)
2. P(Z<- 0.86)
3. P(Z > -1.37)
4. P(-1.25< Z <0.37)
5. P(Z<-4.6)
6. Find the value z such that P(Z > z)=0.05
7. Find the value of z such that P(-z< Z <z)=0.99
191
Solution
192

1. P(Z > 1.26)= 1- 0.89616 =0.10384


2. P(Z<- 0.86) =0.19490
3. P(Z > -1.37)= 0.91465
4. P(-1.25< Z <0.37)=0.64431 - 0.10565 = 0.53866
5. P(Z<-4.6) =0
6. P(Z > z)=0.05 Z=1.65
7. P(-z< Z <z)=0.99 Z=2.58
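Table lookups like these can be reproduced with the error function from Python's math module; small last-digit differences from the printed table come from rounding (illustrative, not part of the original slides):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF, expressed via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(1 - phi(1.26), 4))           # P(Z > 1.26)         ≈ 0.1038
print(round(phi(-0.86), 4))              # P(Z < -0.86)        ≈ 0.1949
print(round(phi(0.37) - phi(-1.25), 4))  # P(-1.25 < Z < 0.37) ≈ 0.5387
```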
Example
193

(a) Suppose the current measurements in a strip of wire are assumed to follow a normal
distribution with a mean of 10 milliamperes and a variance of 4 (milliamperes)2 . What is the
probability that a measurement will exceed 13 milliamperes?

Let X denote the current in milliamperes. The required probability can be represented as P(X > 13).
Let Z = (X - 10)/2 X > 13 corresponds to Z >1.5
P (Z >1.5)= 0.06681

(b) Determine the value for which the probability that a current measurement is below this value is 0.98.
We need the value of x such that P(X < x) = 0.98. By standardizing, this probability expression can be written as
P(Z < z) = 0.98.
The nearest probability from the table is P(Z < 2.05) = 0.97982, so x = 10 + 2(2.05) = 14.1 milliamperes.

Normal approximation to binomial distribution
194
Normal approximation to binomial distribution
195
Exercise
196

Find the mean & variance of the Beta distribution.
Show that the following functions are probability distributions.

1. Uniform Distribution:  f(x) = 1/(b−a) for a ≤ x ≤ b; 0 elsewhere
2. Exponential Distribution:  f(x) = (1/β)·e^(−x/β) for x ≥ 0; 0 elsewhere
3. Gamma Distribution:  f(x) = (1/(Γ(α)·β^α))·x^(α−1)·e^(−x/β) for x ≥ 0, α, β > 0; 0 otherwise
4. Beta Distribution:  f(x) = (Γ(α+β)/(Γ(α)·Γ(β)))·x^(α−1)·(1−x)^(β−1) for 0 ≤ x ≤ 1, α, β > 0; 0 otherwise
SAMPLING DISTRIBUTION
197
Outline
198
 Sampling
 Sampling without replacement
 Sampling with replacement
 Sampling distribution
 Mean & variance of sampling distribution
 Standard error
 Law of large number
 Central limit theorem
Type of inference
199

Estimation: We can estimate the value of a population parameter.

Testing: We can formulate a decision about a population parameter.

Regression: We can make predictions about the value of a statistical variable.
200
Why sampling dist?
201

To evaluate the reliability of our inference, we


need to know about the probability distribution
of the sample statistic.

Our interest is in the formation of sampling


distributions for sample means(statistic) and
sample variances(statistic).
Need of sample
202

Essentially, we would like to know the parameter.


But in most cases it is hard to know the parameter since the population is too large. So
we have to estimate the parameter by some proper statistics computed from the
sample.

For example, the mean of the data from a sample is used to give information about
the overall mean in the population from which that sample was drawn

Our interest is to know something about the population, but because our time,
resources, and efforts are limited, we can take a sample to learn about the population.
Need of sample
(otherwise for parameter calculation we have to pay the following)
203

Time of researcher and those being surveyed.


Cost to group or agency commissioning the survey.
Confidentiality, anonymity, and other ethical issues.
Interference with population.

 Large sample could alter the nature of population, e.g. opinion surveys.
 Destruction of population, e.g. crash test only a small sample of
automobiles.
 Cooperation of respondents – individuals, firms, administrative
agencies.
 In some cases partial data is all that is available, e.g. fossils and historical
records, climate change.
Choice of sample
204

Random Sample : A sample designed in such a way as to ensure that


(1) Every observation of the population has an equal chance of being chosen, and
(2) Every combination of n observations has an equal chance of being chosen.
Statistic (sample statistic)
205

• A number that describes a sample

• Known after we take a sample

• Changes from sample to sample

• Used to estimate an unknown parameter


Selection of simple random sample
206

Simple random sample is a sample of size n selected in a


manner that each possible sample of size n has the same
probability of being selected.
Where
N is the symbol given for the size of the population or the number of
elements in the population.

n is the symbol given for the size of the sample or the number of
elements in the sample.
Sampling without replacement
207

Draw a simple random sample of size 2 without replacement from a population of 4 elements.

Population elements are A, B, C, D. N = 4, n = 2.

If the order of selection does not matter (i.e. we are interested only in what elements are selected), the equally likely random samples are:

samples of size 2:  AB  AC  AD  BC  BD  CD
samples of size 3:  ABC  ABD  ACD  BCD

This is the number of combinations:
NCn = N!/(n!(N - n)!) = 4!/(2!(4 - 2)!) = 6
Sampling with replacement
208

After any element is randomly selected, replace it and randomly select another element. But this could lead to the same element being selected more than once.

Draw a simple random sample of size 2 with replacement from a population of 4 elements.

Population elements are A, B, C, D. N=4, n=2.

The number of random samples is N^n: 4² = 16 samples of size 2, and 4³ = 64 samples of size 3.
209
samples of size 2:
AA  AB  AC  AD
BA  BB  BC  BD
CA  CB  CC  CD
DA  DB  DC  DD

samples of size 3:
AAA AAB AAC AAD  ABA ABB ABC ABD  ACA ACB ACC ACD  ADA ADB ADC ADD
BAA BAB BAC BAD  BBA BBB BBC BBD  BCA BCB BCC BCD  BDA BDB BDC BDD
CAA CAB CAC CAD  CBA CBB CBC CBD  CCA CCB CCC CCD  CDA CDB CDC CDD
DAA DAB DAC DAD  DBA DBB DBC DBD  DCA DCB DCC DCD  DDA DDB DDC DDD
Sampling distribution serves as a bridge between the
sample and the population
210

Sampling distribution

Statistic

Parameter
Sampling distribution
211

• This is not the distribution of the sample.

• The sampling distribution is the distribution


of sample statistic.

• If we take many samples and get the statistic


for each of those samples, the probability
distribution of all those statistics is the
sampling distribution.
Sampling distribution
212

A sampling distribution is the probability distribution for all


possible values of the sample statistic.

Each sample contains different elements so the value of the


sample statistic differs for each sample selected. These
statistics provide different estimates of the parameter. The
sampling distribution describes how these different values
are distributed.
Sampling distribution of the sample mean x̄
213

A probability distribution of sample means that would


be obtained from all possible samples of the same size

 If the expected value of the statistic x̄ is μ, the sample mean has the characteristic of being an unbiased estimator of μ. In this case,

E(x̄) = μ
Sampling distribution approximately a normal distribution
214

If a simple random sample is drawn from a normally distributed population, the sampling distribution of x̄ is normally distributed.
The sampling distribution will approximate a normal curve even if the population you started with does NOT look normal (HOW?)

The mean of the sampling distribution of x̄ is equal to the population mean μ:

μ_x̄ = μ

If the sample size n is reasonably small relative to the population size N, then the standard deviation of the sampling distribution of x̄ is the population standard deviation σ divided by the square root of the sample size:

σ_x̄ = σ/√n
Standard error
215

The standard deviation of the sampling distribution is called the


standard error.
Law of large numbers
216

Q: Each sample from the same population will have a different mean. Why is the sample mean a reasonable estimate of the population mean?
EXAMPLE 1
217

Population has 6 elements: 1, 2, 3, 4, 5, 6 (like numbers on dice)


We want to find the sampling distribution of the mean for n=2
If we sample with replacement, what will happen

1 + 2 + 3 + 4 + 5 + 6 = 21, so μ = 21/6 = 3.5

σ² = 91/6 - (21/6)² = 35/12 ≈ 2.9167

There is only 1 way to get a mean of 1, but 6 ways to get a mean of 3.5
Sample with mean
218

Mean M of each of the 36 samples (row = 1st draw, column = 2nd draw):

1st\2nd   1     2     3     4     5     6
1         1     1.5   2     2.5   3     3.5
2         1.5   2     2.5   3     3.5   4
3         2     2.5   3     3.5   4     4.5
4         2.5   3     3.5   4     4.5   5
5         3     3.5   4     4.5   5     5.5
6         3.5   4     4.5   5     5.5   6
Sampling distribution of sample mean
219

x̄      f(x̄)    P(x̄)    x̄·P(x̄)    x̄²·P(x̄)
1       1       1/36     1/36       1/36
1.5     2       2/36     3/36       4.5/36
2       3       3/36     6/36       12/36
2.5     4       4/36     10/36      25/36
3       5       5/36     15/36      45/36
3.5     6       6/36     21/36      73.5/36
4       5       5/36     20/36      80/36
4.5     4       4/36     18/36      81/36
5       3       3/36     15/36      75/36
5.5     2       2/36     11/36      60.5/36
6       1       1/36     6/36       36/36
Sum     36      1        126/36     493.5/36

μ_x̄ = E(x̄) = Σ x̄·P(x̄) = 126/36 = 3.5

σ²_x̄ = Σ x̄²·P(x̄) - [Σ x̄·P(x̄)]² = 493.5/36 - (126/36)² = 1.45833
With replacement
220

μ_x̄ = μ = 3.5

σ²_x̄ = σ²/n = 2.9166/2 = 1.4583
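The result σ²_x̄ = σ²/n can be checked by brute-force enumeration of all 36 with-replacement samples; a sketch:

```python
from itertools import product

pop = [1, 2, 3, 4, 5, 6]
mu = sum(pop) / len(pop)                               # 3.5
sigma2 = sum((x - mu) ** 2 for x in pop) / len(pop)    # 35/12, about 2.9167

# All 36 ordered samples of size n = 2, drawn with replacement.
means = [(a + b) / 2 for a, b in product(pop, repeat=2)]
mu_xbar = sum(means) / len(means)
var_xbar = sum((m - mu_xbar) ** 2 for m in means) / len(means)

print(mu_xbar, var_xbar)   # 3.5 and about 1.4583
```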
221

The sampling distribution shows the relation between the probability


of a statistic and the statistic’s value for all possible samples of size n
drawn from a population.
Hypothetical Distribution of Sample Means
[Figure: frequency f(M) plotted against the mean value M]
Example 2
222

Population has 5 elements: 0, 3, 6, 9,12


We want to find the sampling distribution of the mean for n=3
If we sample without replacement,

μ = 30/5 = 6

σ² = 18
Sample with mean
223

sample mean sample mean sample mean


0, 3, 6 3 3, 6, 9 6 6, 9, 12 9
0, 3, 9 4 3, 6, 12 7
0, 3, 12 5 3, 9,12 8
0, 6, 9 5
0, 6, 12 6
0, 9, 12 7
Sampling distribution of sample mean
224
x̄      f(x̄)    P(x̄)    x̄·P(x̄)    x̄²·P(x̄)
3       1       1/10     3/10       9/10
4       1       1/10     4/10       16/10
5       2       2/10     10/10      50/10
6       2       2/10     12/10      72/10
7       2       2/10     14/10      98/10
8       1       1/10     8/10       64/10
9       1       1/10     9/10       81/10
Sum     10      1        60/10      390/10

μ_x̄ = E(x̄) = Σ x̄·P(x̄) = 60/10 = 6

σ²_x̄ = Σ x̄²·P(x̄) - [Σ x̄·P(x̄)]² = 390/10 - (60/10)² = 3
225

μ_x̄ = μ = 6

σ²_x̄ = (σ²/n)·(N - n)/(N - 1) = (18/3)·(5 - 3)/(5 - 1) = 6·(2/4) = 3
Central limit theorem
226

Sampling error
The sample cannot be fully representative of the population
As such, there is variability due to chance
We could have a thousand sample means and none of them equal exactly the
population mean.

The sampling error is the difference between the point estimate (value of
the estimator) and the value of the parameter. This is the error caused by
sampling only a subset of elements of a population, rather than all
elements in a population. Our interest lies in minimizing the sampling
error, but all samples have some such error associated with them.
Central limit theorem
227

For any population , regardless of form, the sampling distribution of


the mean will approach a normal distribution as the sample size (n)
gets larger.
 This of course begs the question of what n is ‘large enough’

Furthermore, the sampling distribution of the mean will have a mean


equal to µ (the population mean), and a standard deviation equal to
σ/√n

If X ~ Normal(μ, σ²), then X̄ ~ Normal(μ, σ²/n)
Central limit theorem(CLT)
228

The sampling distribution of the sample mean, is approximated by a normal


distribution when the sample is a simple random sample and the sample
size n is large.
In this case, the mean of the sampling distribution is the population mean, μ,
and the standard deviation of the sampling distribution is the population
standard deviation, σ, divided by the square root of the sample size.
A sample size of 100 or more elements is generally considered sufficient to
permit using the CLT.
If the population from which the sample is drawn is symmetrically
distributed, n > 30 may be sufficient to use the CLT.
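A small simulation illustrates the CLT: repeated sample means drawn from a flat, non-normal population have mean close to μ and variance close to σ²/n. The population, sample size, and repetition count here are arbitrary choices for illustration:

```python
import random

random.seed(0)

pop = list(range(1, 11))                                # uniform on 1..10, not bell-shaped
mu = sum(pop) / len(pop)                                # 5.5
sigma2 = sum((x - mu) ** 2 for x in pop) / len(pop)     # 8.25

n, reps = 40, 20000
means = [sum(random.choices(pop, k=n)) / n for _ in range(reps)]

m = sum(means) / reps
v = sum((x - m) ** 2 for x in means) / reps
print(m, v, sigma2 / n)    # m near 5.5, v near 0.206
```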
REGRESSION & CORRELATION
229
Outline
230

Correlation

Rank Correlation

 Linear Regression Line

 Least squares principles

 Curve fitting by least squares

Relation Between Correlation & Regression Coefficients


Relation Between two variables X & Y
231

(a) Linear
232

(b) Linear
233

(c) Curvilinear
234

(d) Curvilinear
235

(e) No Relationship
Correlation
236

Two variables are said to be correlated if they tend to vary simultaneously in some direction.

If both variables tend to increase (or decrease) together, the correlation is said to be Positive.

If one variable tends to increase while the other tends to decrease, the correlation is said to be Negative.
Correlation coefficient
237

The correlation coefficient is a quantitative measure of the strength of the linear relationship between two variables.

r = [n Σxy - Σx Σy] / √{[n Σx² - (Σx)²][n Σy² - (Σy)²]}

or equivalently

r = Σ(x - x̄)(y - ȳ) / √{[Σ(x - x̄)²][Σ(y - ȳ)²]}

r = sample correlation coefficient
n = sample size
x = independent variable
y = dependent variable
Example of correlation coefficient
238

y x yx y2 x2
487 3 1,461 237,169 9
445 5 2,225 198,025 25
272 2 544 73,984 4
641 8 5,128 410,881 64
187 2 374 34,969 4
440 6 2,640 193,600 36
346 7 2,422 119,716 49
238 1 238 56,644 1
312 4 1,248 97,344 16
269 2 538 72,361 4
655 9 5,895 429,025 81
563 6 3,378 316,969 36
Sum 4855 55 26091 2240687 329
239
r = [n Σxy - Σx Σy] / √{[n Σx² - (Σx)²][n Σy² - (Σy)²]}

r = [12(26,091) - 55(4,855)] / √{[12(329) - (55)²][12(2,240,687) - (4,855)²]} = 0.8325
Regression
240

Correlation describes the strength of a linear relationship between two variables

Regression analysis describes the relationship between two (or more) variables.
Examples: Income and educational level
Demand for electricity and the weather

Regression tells us how to draw the straight line described by the correlation

Definition: The relationship between the expected value of the dependent variable Y and the independent variable X is known as the regression line of Y on X.

A good line is one that minimizes the sum of squared differences between the points and the
line.
Interpretation of Regression line
241

Y = bX + a

b = slope = (change in Y)/(change in X)
a = Y-intercept
Interpretation of Regression coefficient
242

The interpretation of the regression coefficient b is that it gives the average change in the dependent variable for a unit increase in the independent variable.

The slope coefficient may be positive or negative, depending on the


relationship between the two variables.
ESTIMATED REGRESSION
243

y  ab x  x y
 xy 
 ( x  x )( y  y ) or  n or r
Sy
b  (  x ) 2
Sx
 (x  x) 2
 x 2

n
a  y  bx
ŷ = Estimated, or predicted, y value
a = Unbiased estimate of the regression intercept
b = Unbiased estimate of the regression slope
x = Value of the independent variable
Example of regression line Y on X
244

y x yx y2 x2
487 3 1,461 237,169 9
445 5 2,225 198,025 25
272 2 544 73,984 4
641 8 5,128 410,881 64
187 2 374 34,969 4
440 6 2,640 193,600 36
346 7 2,422 119,716 49
238 1 238 56,644 1
312 4 1,248 97,344 16
269 2 538 72,361 4
655 9 5,895 429,025 81
563 6 3,378 316,969 36
Sum 4855 55 26091 2240687 329
245

b = [Σxy - (Σx Σy)/n] / [Σx² - (Σx)²/n] = [26,091 - 55(4,855)/12] / [329 - (55)²/12] = 49.9101

a = ȳ - b x̄ = 404.5833 - 49.9101(4.5833) = 175.8288

ŷ = 175.8288 + 49.9101 x
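The same least-squares formulas, applied in code to the table's data, reproduce the slope and intercept above; a sketch:

```python
x = [3, 5, 2, 8, 2, 6, 7, 1, 4, 2, 9, 6]
y = [487, 445, 272, 641, 187, 440, 346, 238, 312, 269, 655, 563]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
b = (sum(a * c for a, c in zip(x, y)) - sum(x) * sum(y) / n) \
    / (sum(a * a for a in x) - sum(x) ** 2 / n)
a = ybar - b * xbar

print(round(b, 4), round(a, 4))   # 49.9101 175.8288
```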
The principle of least squares
246

The principle of least squares consists of determining the values of the unknown parameters that will minimize the sum of squares of errors:

Σ(y - ŷ)²  should be minimized

A residual (or Error) is the difference between the actual value of


the dependent variable and the value predicted by the regression line.

y - ŷ
Error Analysis
247

SSE = Σ(y - ŷ)²   or simply   SSE = Σy² - a Σy - b Σxy

TSS = Σ(y - ȳ)²

SSR = Σ(ŷ - ȳ)²

TSS = SSE + SSR

Total Sum of Squares = Sum of Squares of Error + Sum of Squares of Regression
coefficient of determination
248
The coefficient of determination is the portion of the total variation in
the dependent variable that is explained by its relationship with the
independent variable.
The coefficient of determination is also called R-squared and is denoted as R2.
R² = SSR / TSS

R² = 191,600.62 / 276,434.90 = 0.6931
69.31% of the variation in the data for this sample can be explained by
the linear relationship between X and Y
249

COEFFICIENT OF DETERMINATION, SINGLE INDEPENDENT VARIABLE CASE

R² = r²

where:
R² = coefficient of determination
r = simple correlation coefficient
MEAN SQUARE REGRESSION
250

MSR = SSR / k

MEAN SQUARE ERROR

MSE = SSE / (n - k - 1)

where:
SSE = Sum of squares error
n = Sample size
k = Number of independent variables
Curve fitting by least squares
251

Y = a + bX                Linear curve
Y = a + bX + cX²          Parabolic curve
Y = a + bX + cX² + dX³    Cubic curve
Y = a e^(bX)              Exponential curve
Y = a X^b                 Power curve
Y = 1/(a + bX)            Hyperbolic curve
Linearization & Normal Equation
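As one worked linearization, Y = a·e^(bX) becomes the straight line ln Y = ln a + bX, so ordinary least squares on (X, ln Y) recovers a and b. The data below is synthetic, generated from a = 2 and b = 0.5 purely to check the recovery:

```python
import math

xs = [0, 1, 2, 3, 4, 5]
ys = [2.0 * math.exp(0.5 * x) for x in xs]     # exact exponential data (synthetic)

n = len(xs)
lys = [math.log(v) for v in ys]                # linearize: ln Y = ln a + b X

b = (n * sum(x * l for x, l in zip(xs, lys)) - sum(xs) * sum(lys)) \
    / (n * sum(x * x for x in xs) - sum(xs) ** 2)
ln_a = (sum(lys) - b * sum(xs)) / n
a = math.exp(ln_a)

print(round(a, 6), round(b, 6))   # recovers 2.0 and 0.5
```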
Ranking
252

An ordered arrangement of objects (or individuals) according to some characteristic of interest is called ranking.

The correlation between two sets of rankings is known as rank correlation.

Let the two rankings of n objects with respect to characters A & B be, respectively, x1, x2, . . . , xn & y1, y2, . . . , yn.
We assume that no two or more objects are given the same rank (ranking without ties). Then the rank correlation rs is calculated by

rs = 1 - 6 Σdi² / [n(n² - 1)]

If there are ties among the ranks of individuals, then for each tie involving m individuals add the quantity (m³ - m)/12 to Σdi².
Example 1
253

 In a study of the relationship between level of education and income, the following data was obtained. Find the relationship between them.
Sample Education (X) Income(Y)

A Preparatory. 25
B Primary. 10
C College 8
D secondary 15
E Illiterate 50
F University. 60
Without tie
254

Rank Rank di=x-y di2


X Y
5 3 2 4
4 5 -1 1
2 6 -4 16
3 4 -1 1
6 2 4 16
1 1 0 0

∑di² = 38

rs = 1 - 6(38)/[6(6² - 1)] = 1 - 228/210 = -0.0857
Example 2
255

 In a study of the relationship between level of education and income, the following data was obtained. Find the relationship between them.
Sample Education Income
(X) (Y)
A Preparatory. 25
B Primary. 10
C University. 8
D secondary 10
E secondary 15
F Illiterate 50
G University. 60
With tie
256

Rank X   Rank Y   di = x - y   di²
5        3        2            4
6        5.5      0.5          0.25
1.5      7        -5.5         30.25
3.5      5.5      -2           4
3.5      4        -0.5         0.25
7        2        5            25
1.5      1        0.5          0.25

∑di² = 64

There are 3 ties, each among m = 2 observations, so for each tie add (2³ - 2)/12 = 0.5 to ∑di²:

rs = 1 - 6(64 + 0.5 + 0.5 + 0.5)/[7(7² - 1)] = 1 - 393/336 = -0.169
Exercise
257

The following table gives the distribution of the total population and
those who are totally or partially blind among them. Find out if there is
any correlation between age and blindness.

Age                            0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
No. of persons (in thousands)  100    60      40      36      24      11      6       3
Blind                          55     40      40      40      36      22      18      5
Exercise
258

For the data given on {Slide 16}, fit the following (IF POSSIBLE) & decide which is the best fit:
X = c + dY                Linear curve   (check: does r² = b·d, the product of the two regression slopes?)
Y = a + bX + cX²          Parabolic curve
Y = a + bX + cX² + dX³    Cubic curve
Y = a e^(bX)              Exponential curve
Y = a X^b                 Power curve
Y = 1/(a + bX)            Hyperbolic curve
Also find Mean Squares Regression, Mean Squares Error & Coefficient of Determination.
ESTIMATION OF PARAMETERS
259
Outline
260

Estimators & Estimates

Properties of a Good Estimator

Point Estimators versus Interval Estimators

Confidence Intervals
261

Statistical
Methods

Descriptive Inferential
Statistics Statistics

Hypothesis
Estimation
Testing
Estimators & Estimates
262

 Estimators are the random variables used to estimate population parameters, while the specific values of these variables are the estimates.

 Example: the estimator of μ is often

X̄ = (Σ Xi)/n, summing over i = 1, ..., n

but if the observed values of X are 1, 2, 3, and 6, the estimate is 3.
So the estimator is a formula; the estimate is a number.
Properties of a Good Estimator
263

1. Unbiasedness
2. Efficiency
3. Sufficiency
4. Consistency
Unbiasedness
264

 An estimator θ̂ (“theta hat”) is unbiased if its expected value equals the value of the parameter θ (theta) being estimated:

E(θ̂) = θ

In other words, on average the estimator is right on target.
Examples
265

Since E(X̄) = μ, X̄ is an unbiased estimator of μ.

Since E(X/n) = p, the sample proportion X/n is an unbiased estimator of p.

Since E(s²) = σ², s² is an unbiased estimator of σ².

Recall that s² = Σ(Xi - X̄)² / (n - 1), summing over i = 1, ..., n.

If we divided by n instead of by n - 1, we would not have an unbiased estimator of σ². That is why s² is defined the way it is.
Bias
266

bias = E(θ̂) - θ

Note: The bias of an unbiased estimator is zero
Mean Squared Error (MSE)
267

MSE = E[(θ̂ - θ)²]

which happens to equal

σ² + bias²

where σ² here is the variance of the estimator.
Efficiency
268

The most efficient estimator is the one with the smallest MSE.
Efficiency
269

Since MSE = σ² + bias²,

for unbiased estimators (where the bias is zero), MSE = σ², the variance of the estimator.

So if you are comparing unbiased estimators, the most efficient one is the one with the smallest variance.

If you have two estimators, one of which has a small bias & a small variance and the other has no bias but a large variance, the more efficient one may be the one that is just slightly off on average, but that is more frequently in the right vicinity.
Example: sample mean & median
270
As we have found, the sample mean is an unbiased estimator of μ.
It turns out that the sample median is also an unbiased estimator of μ.
We know the variance of the sample mean is σ²/n.
The variance of the sample median is (π/2)(σ²/n).
Since π is about 3.14, π/2 > 1.
So the variance of the sample median is greater than σ²/n, the variance of
the sample mean.
Since both estimators are unbiased, the one with the smaller variance
(the sample mean) is the more efficient one.
In fact, among all unbiased estimators of μ, the sample mean is the one
with the smallest variance.
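A simulation makes the mean-vs-median comparison concrete: for normal samples, the variance of the sample median comes out roughly π/2 times that of the sample mean. The sample size and repetition count below are arbitrary choices:

```python
import random
import statistics

random.seed(1)

n, reps = 25, 20000
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

v_mean = statistics.pvariance(means)
v_med = statistics.pvariance(medians)
print(v_mean, v_med, v_med / v_mean)   # ratio roughly pi/2, about 1.57
```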
Sufficiency
271

An estimator is said to be sufficient if it uses all the information about


the population parameter that the sample can provide.
Examples
272

Example 1: The sample median is not a sufficient estimator because it


uses only the ranking of the observations, and not their numerical values
{with the exception of the middle one(s)}.

Example 2: The sample mean, however, uses all the information, and
therefore is a sufficient estimator.
Consistency
273

An estimator is said to be consistent if it yields estimates that converge


in probability to the population parameter being estimated as n
approaches infinity.

In other words, as the sample size increases, The estimator spends more
and more of its time closer and closer to the parameter value.

One way that an estimator can be consistent is for its bias and its variance
to approach zero as the sample size approaches infinity.
Example of a consistent estimator
274

[Figure: distributions of the estimator for n = 5, n = 50, and n = 500, each centered more tightly on μ]

As the sample size increases, the bias & the variance are both shrinking.
Example (Sample Mean)
275
We know that the mean of X̄ is μ.
So its bias not only goes to zero as n approaches infinity, its bias is always zero.

The variance of the sample mean is σ²/n.

As n approaches infinity, that variance approaches zero.

Since both the bias and the variance go to zero as n approaches infinity, the sample mean is a consistent estimator.
A great estimator: the sample mean X
276

We have found that the sample mean is a great estimator of the population mean μ,
because
It is unbiased,
efficient,
sufficient,
& consistent.
Point Estimators versus Interval Estimators
277
Up until now we have considered point estimators that provide us
with a single value as an estimate of a desired parameter.

It is unlikely, however, that our estimate will precisely equal our
parameter.

We, therefore, may prefer to report something like this: We are 95%
certain that the parameter is between “a” and “b.”

This statement is a confidence interval.


Building a
Confidence Interval
278

[Figure: standard normal curve with area 0.4750 between 0 and 1.96]

We know that P(0 < Z < 1.96) = 0.4750.
Then P(-1.96 < Z < 1.96) = 0.95.

We also know that (X̄ - μ)/(σ/√n) is distributed as a standard normal (Z).

So there is a 95% probability that

-1.96 < (X̄ - μ)/(σ/√n) < 1.96

279

Multiplying through by σ/√n:     -1.96(σ/√n) < X̄ - μ < 1.96(σ/√n)

Subtracting off X̄:              -X̄ - 1.96(σ/√n) < -μ < -X̄ + 1.96(σ/√n)

Multiplying by -1 and flipping the inequalities appropriately:
                                 X̄ + 1.96(σ/√n) > μ > X̄ - 1.96(σ/√n)

Flipping the entire expression:  X̄ - 1.96(σ/√n) < μ < X̄ + 1.96(σ/√n)

So we have a 95% Confidence Interval for the Population Mean μ
280

X̄ - 1.96(σ/√n) < μ < X̄ + 1.96(σ/√n)
Example
281
Suppose a sample of 25 students at a university has a sample mean IQ of 127. If the population standard deviation is 5.4, calculate the 95% confidence interval for the population mean.
(An intelligence quotient, or IQ, is a score derived from one of several standardized tests designed to assess intelligence.)

X̄ - 1.96(σ/√n) < μ < X̄ + 1.96(σ/√n)

127 - 1.96(5.4/√25) < μ < 127 + 1.96(5.4/√25)

127 - 2.12 < μ < 127 + 2.12

124.88 < μ < 129.12

We are 95% certain that the population mean is between 124.88 & 129.12
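The interval can be reproduced in a few lines; a sketch:

```python
n, xbar, sigma = 25, 127.0, 5.4
z = 1.96                           # z value for 95% confidence

half_width = z * sigma / n ** 0.5  # 1.96 * 5.4 / 5 = 2.1168
lo, hi = xbar - half_width, xbar + half_width
print(round(lo, 2), round(hi, 2))  # 124.88 129.12
```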
When we say we are 95% certain that the population mean μ is between 124.88 & 129.12, it means that
282

The population mean μ is a fixed number, but we don’t know what it is.
Our confidence intervals, however, vary with the random sample that
we take.

Sometimes we get a more typical sample, sometimes a less typical one.

If we took 100 random samples and from them calculated 100


confidence intervals, 95 of the intervals should contain the
population mean that we are trying to estimate.
What if we want a confidence level other than 95%?
283
X̄ - 1.96(σ/√n) < μ < X̄ + 1.96(σ/√n)

In our formula, the 1.96 came from the fact that the Z distribution will be between -1.96 and 1.96 95% of the time.
To get a different confidence level, all we need to do is find the Z values such that we are between them the desired percent of the time.
Using that Z value, we have the general formula for the confidence interval for the population mean μ:

X̄ - Z(σ/√n) < μ < X̄ + Z(σ/√n)
Determining Z values for confidence intervals
284

[Figure: standard normal curve; central area 0.9800, area 0.4900 between 0 and k; -k = -2.33, k = 2.33]
Suppose we want a 98% confidence interval.

We need to find 2 values, call them –k and k, such that Z is between them 98% of the time.

Then Z will be between 0 and k with probability half of 0.98, which is 0.49 .

Look in the body of the Z table for the value closest to 0.49, which is 0.4901 .

The number on the border of the table corresponding to 0.4901 is 2.33.

So that is your value of k, and the number you use for Z in your confidence interval.
Sometimes 2 numbers in the Z table are equally close to the value you want.

285

For example, if you want a 90% confidence interval, you look for half of 0.90 in the body of the Z
table, that is, 0.45

You find 0.4495 and 0.4505. Both are off by 0.0005.

The number on the border of the table corresponding to 0.4495 is 1.64


The number corresponding to 0.4505 is 1.65

Usually in these cases, we use the average of 1.64 and 1.65, which is 1.645

Similarly for the 99% confidence interval, we usually use 2.575


Which interval is wider: One with a higher confidence level (such as
99%) or one with a lower confidence level (such as 90%)?
286

You would definitely be more confident with the wider interval.

Thus, when the confidence level needs to be very high (such as 99%), the
interval needs to be wide.
Let’s redo the IQ example with a different confidence level
287

We had a sample of 25 students with a sample mean IQ of 127. The population standard deviation was 5.4.
Calculate the 99% confidence interval for the population mean.

Our general formula is:   X̄ - Z(σ/√n) < μ < X̄ + Z(σ/√n)

We said that the Z value for 99% confidence is 2.575.
Putting in our values,

127 - 2.575(5.4/√25) < μ < 127 + 2.575(5.4/√25)

or 124.22 < μ < 129.78
We had for the 95% confidence interval
288

124.88 < μ < 129.12

We just got for the 99% confidence interval:
124.22 < μ < 129.78
The 99% confidence interval starts a little lower & ends a little
higher than the 95% interval.
So the 99% interval is wider than the 95% interval, as we said it
should be.
TESTING OF HYPOTHESIS
289
Outline
290

 Basics of Hypothesis Testing

 Testing a Claim About a Mean μ, When σ Is Known

 Testing a Claim About a Mean μ, When σ Is Not Known

 Testing a Claim About a Proportion

 Testing a Claim About a Standard Deviation or Variance


291

Statistical
Methods

Descriptive Inferential
Statistics Statistics

Hypothesis
Estimation
Testing
Objectives
292
Given a claim, identify the null hypothesis, and the alternative hypothesis, and
express them both in symbolic form.

Given a claim and sample data, calculate the value of the test statistic.

Given a significance level, identify the critical value(s).

Given a value of the test statistic, identify the P-value.

State the conclusion of a hypothesis test in simple terms.

Identify the type I and type II errors that could be made when testing a given
claim.
Hypothesis Testing
293

After 2 hours of frustration trying to fill out a


NTS form, you are skeptical about the NTS
claim that the form takes 15 minutes on
average to complete.

How would you challenge the NTS claim?


Methods to Test a Claim
294

First method: find a confidence interval for the average amount of time to fill out the form, and then determine whether the interval suggests an average different from 15 minutes.

Another method to test a claim: conduct a Hypothesis Test.
Hypothesis
295

Hypothesis is a claim or statement about a


property of a population(Parameter)

Hypothesis Test is a standard procedure


for testing a claim about a property of a
population
Example
296

A quality engineer would like to determine whether the production


process he is charged with monitoring is still producing products whose
mean response value is supposed to be μ0 (process is in-control), or
whether it is producing products whose mean response value is now
different from the required value of μ0 (process is out-of-control).

 Statement 1 (Null): μ = μ0 (process in-control)

 Statement 2 (Alternative): μ ≠ μ0 (process out-of-control)
297
Null Hypothesis
 Represented by H0
 Statement about the value of a population parameter that is under investigation.
 Always stated as an Equality. It asserts there is no change.
For the NTS form, the null hypothesis is:  H0: μ = 15 minutes

Alternative Hypothesis
 Represented by H1
 Statement about the value of a population parameter that must be true if the null hypothesis is false.
 Stated in one of three forms: >, <, ≠
For the NTS form, the alternative hypothesis is:  H1: μ > 15 minutes
Null Hypothesis: H0
298

Statement about the value of a population parameter that is equal to some claimed value
H0: μ = 98.6
H0: p = 0.5
H0: μ = 15
Test the Null Hypothesis directly:
Reject H0 or fail to reject H0
Form of Alternative Hypothesis
299

Left-tailed Tests:  H0: μ = k;  H1: μ < k
Right-tailed Tests: H0: μ = k;  H1: μ > k
Two-tailed Tests:   H0: μ = k;  H1: μ ≠ k
Note about Forming Your Own Claims (Hypotheses)
300

If you are conducting a study and want to use a hypothesis test to support your claim, the claim must be worded so that it becomes the alternative hypothesis. This means your claim must be expressed using only

<, >, ≠
Type of Test to Use
301

This depends on what you suspect. For the NTS form, you suspected the mean time was greater than claimed, so you would lean to a right-tailed test.

If you suspect that the average length of time you get from your phone battery is less than claimed, you would use a left-tailed test.

If the production process mean response value is supposed to be μ0 (process is in-control), but may now be different from the required value of μ0 (process is out-of-control), you would use a two-tailed test.
Data to Collect
302

You will collect information similar to that done for confidence


intervals.

If the distribution is not normal, you will need a sample size of at least
30 to test the mean. If the population standard deviation is known, you
will use z.

 If the population standard deviation is not known, you may use t,


especially if the sample is not very large.
Test Statistic
303

If the distribution is normal (or the sample size is larger than 30) and the standard deviation is known, then

z = (x̄ - μ) / (σ/√n)
Hypothesis Tests when  is unknown
304

Follow the same procedures as before. If the distribution is not approximately normal, then the sample size must be at least 30.

Except use the t-distribution with d.f. = n - 1, and the test statistic will be

t = (x̄ - μ) / (s/√n)
Test Statistic For Proportion
305

The test statistic is a value computed from the sample data, and it is used in making the decision about the rejection of the null hypothesis.

Test statistic for proportions:

z = (p̂ - p) / √(pq/n)
Test Statistic for Variance
306

The test statistic is a value computed from the sample data, and it is used in making the decision about the rejection of the null hypothesis.

χ² = (n - 1)s² / σ²
Conclusions
307

 Every hypothesis test ends with the experimenters (you and I) either
 Rejecting the Null Hypothesis, or
 Failing to Reject the Null Hypothesis

 As strange as it may seem, you never accept the Null Hypothesis.


The best you can ever say about the Null Hypothesis is that you don’t
have enough evidence, based on a sample, to reject it!
P-Value
308

The P-value (or p-value or probability value) is the probability of


getting a value of the test statistic that is at least as extreme as the one
representing the sample data, assuming that the null hypothesis is true.

The null hypothesis is rejected if the P-value is very small, such as 0.05
or less. The smaller the P-value, the stronger the evidence against H0
Interpreting the p-value
309

The smaller the p-value, the more statistical evidence exists to support
the alternative hypothesis.
 If the p-value is less than 1%, there is overwhelming evidence that
supports the alternative hypothesis.
 If the p-value is between 1% and 5%, there is a strong evidence that
supports the alternative hypothesis.
 If the p-value is between 5% and 10% there is a weak evidence that
supports the alternative hypothesis.
 If the p-value exceeds 10%, there is no evidence that supports the
alternative hypothesis.
Interpreting the p-value
310
Overwhelming Evidence (Highly Significant):  p < .01
Strong Evidence (Significant):               .01 < p < .05
Weak Evidence (Not Significant):             .05 < p < .10
No Evidence (Not Significant):               p > .10


311

If we reject the null hypothesis, we conclude that there is enough


evidence to infer that the alternative hypothesis is true.

If we fail to reject the null hypothesis, we conclude that there is not
enough statistical evidence to infer that the alternative hypothesis is
true.
This does not mean that we have proven that the null hypothesis is true.
Types of Errors
312

A Type I error occurs if we reject the null hypothesis when it is true.
A Type II error occurs if we fail to reject the null hypothesis when it is false.
Example
A type I error is analogous to convicting an innocent person for a
crime they didn’t commit.
A type II error is analogous to failing to convict a guilty person.
Type I & Type II Error
313

Referring to H0, the Null Hypothesis:

                       H0 True         H0 False
Reject H0              Type I Error    O.K.
Fail to Reject H0      O.K.            Type II Error
Level of Significance
314

 The level of significance α is the probability of rejecting the null hypothesis when it is true.

 A common level of significance is .05 (meaning that, if the null hypothesis is true, there is only a 5% chance of wrongly rejecting it).

 We will reject the null hypothesis if P-value ≤ α.

 If P-value > α, we do not reject the null hypothesis.


Summary of Hypothesis Tests
315

Determine the null and alternative hypotheses and set the level of significance α.

Collect the data and compute the test statistic.

Compute the P-value.

If P-value ≤ α, then reject H0; if P-value > α, then do not reject H0.
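The four steps above can be sketched end to end for a two-tailed z test of a mean, using only Python's standard library (the sample numbers here are made up for illustration):

```python
import math

def normal_cdf(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_test_mean(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test for a mean; returns (z, p_value, decision)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))          # step 2: test statistic
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))          # step 3: two-tailed P-value
    decision = "reject H0" if p_value <= alpha else "do not reject H0"  # step 4
    return z, p_value, decision

# Hypothetical numbers: H0: mu = 50 vs H1: mu != 50,
# sample mean 52, sigma = 8, n = 64, alpha = .05
z, p, decision = z_test_mean(xbar=52.0, mu0=50.0, sigma=8.0, n=64)
print(round(z, 2), round(p, 4), decision)  # 2.0 0.0455 reject H0
```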
Critical Region (or Rejection Region)
318

Set of all values of the test statistic that would cause a rejection of the
null hypothesis
Right-tailed Test
319
H0: µ = µ0
H1: µ > µ0 (points right)

Critical region: values in the right tail that differ significantly from H0.
Left-tailed Test
320
H0: µ = µ0
H1: µ < µ0 (points left)

Critical region: values in the left tail that differ significantly from H0.
Critical Region Method
321

 As with the previous method for hypothesis tests, determine H0, H1 and α.

 Instead of computing the P-value and comparing it to α, you predetermine the critical region, that is, the values of the test statistic at which you will reject H0.

 Then compute the test statistic; if it falls in the critical region, reject H0, otherwise do not reject H0.
Example
322
Suppose the population mean and standard deviation are 17.09 and 3.87, respectively, and we test:
H0: µ = 17.09
H1: µ ≠ 17.09
At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus z.025 = 1.96 and our rejection region is:
z < –1.96  or  z > 1.96
323

 The estimated mean is x̄ = 17.55 from a sample of 100 observations.

Using our standardized test statistic, we find that:

z = (x̄ – µ) / (σ / √n) = (17.55 – 17.09) / (3.87 / √100) ≈ 1.19

Since z = 1.19 is neither greater than 1.96 nor less than –1.96, we cannot reject the null hypothesis in favor of H1.
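This computation can be reproduced directly (a sketch using the numbers of the example):

```python
import math

# From the example: H0: mu = 17.09 vs H1: mu != 17.09,
# sample mean 17.55, sigma = 3.87, n = 100, alpha = .05
z = (17.55 - 17.09) / (3.87 / math.sqrt(100))
print(round(z, 2))  # 1.19

# Critical-region check for the two-tailed test at alpha = .05
reject = abs(z) > 1.96
print(reject)  # False: z is not in the critical region, so H0 stands
```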
Example
324
A survey of n = 880 randomly selected adult drivers showed
that 56% of those respondents admitted to running red lights.
Find the value of the test statistic for the claim that the
majority of all adult drivers admit to running red lights.
Solution 325

The preceding example showed that the given claim results in the following null and alternative hypotheses:
H0: p = 0.5 and H1: p > 0.5

Because we work under the assumption that the null hypothesis is true with p = 0.5, we get the following test statistic:

z = (p̂ – p) / √(pq / n) = (0.56 – 0.5) / √((0.5)(0.5) / 880) ≈ 3.56
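The same computation in code (a sketch using the survey example's numbers):

```python
import math

# From the survey example: n = 880 drivers, p_hat = 0.56, H0: p = 0.5
n, p_hat, p0 = 880, 0.56, 0.5
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(round(z, 2))  # 3.56
```

A value this far in the right tail gives a very small P-value, which is why the claim that the majority run red lights is supported.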
Decision Criterion
327

Traditional method:
Reject H0 if the test statistic falls within the critical region.
Fail to reject H0 if the test statistic does not fall within the critical region.

P-value method:
Reject H0 if P-value ≤ α (where α is the significance level, such as 0.05).
Fail to reject H0 if P-value > α.
Decision Criterion
328

Confidence Intervals:
Because a confidence interval estimate of a population parameter contains the likely values of that parameter, reject a claim that the parameter has a value not included in the confidence interval.
Decision
329

True State of Nature

                           H0 is true            H0 is false
We decide to reject H0     Type I error          Correct decision
                           (rejecting a true
                           null hypothesis)
We fail to reject H0       Correct decision      Type II error
                                                 (failing to reject a
                                                 false null hypothesis)

Controlling Type I and Type II Errors
330

· α is the probability of a Type I error.
· β is the probability of a Type II error.
· The experimenters (you and I) have the freedom to set the α-level for a particular hypothesis test. That level is called the level of significance for the test. Changing α can (and often does) affect the result of the test, that is, whether you reject or fail to reject H0.
· It would be wonderful if we could force both α and β to equal zero. Unfortunately, these quantities have an inverse relationship: as α increases, β decreases, and vice versa.
· The only way to decrease both α and β is to increase the sample size. To make both quantities equal zero, the sample size would have to be infinite; you would have to sample the entire population.
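The inverse relationship between α and β can be illustrated numerically. Note that β depends on the true mean, which must be assumed; the true mean of 18 below is a made-up value for illustration, attached to the earlier two-tailed test of H0: µ = 17.09 with σ = 3.87 and n = 100:

```python
import math

def normal_cdf(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def beta_two_tailed(mu0, mu_true, sigma, n, z_crit):
    """P(fail to reject H0) for a two-tailed z test when the true mean is mu_true."""
    se = sigma / math.sqrt(n)
    # Fail-to-reject band for the sample mean x-bar
    lo, hi = mu0 - z_crit * se, mu0 + z_crit * se
    return normal_cdf((hi - mu_true) / se) - normal_cdf((lo - mu_true) / se)

# Smaller alpha (narrower rejection region) -> larger beta, and vice versa
beta_05 = beta_two_tailed(17.09, 18.0, 3.87, 100, 1.96)   # alpha = .05
beta_01 = beta_two_tailed(17.09, 18.0, 3.87, 100, 2.576)  # alpha = .01
print(round(beta_05, 3), round(beta_01, 3))  # approximately 0.348 0.589
```

Tightening α from .05 to .01 roughly doubles β here, exactly the tradeoff described above; only a larger n reduces both.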
