
Probability and Statistics

This document provides an introduction to probability and statistics for scientists and technologists. It covers basic concepts of probability, random variables, probability distributions, descriptive statistics, inferential statistics, hypothesis testing, regression models, and experimental design. The document is divided into four parts and contains multiple chapters within each part.

Probability and Statistics

For Scientists and Technologists


A Course in Probability and Statistics
By
Radzuan Razali
Afza Shafie
Contents:
Part One: Probability, Random Variables and Distribution
Chapter 1:
1. The basic concept of probability
1.1 Introduction
1.2 Sample space, probability of events, counting rule
1.3 Conditional probability
1.4 Multiplication rule
1.5 Bayes' theorem
Chapter 2:
2. Discrete random variable and probability distribution
2.1 Introduction
2.2 Discrete random variable
2.3 Discrete probability distribution
2.4 Special functions for discrete probability distribution
Chapter 3:
3. Continuous random variable and probability distribution
3.1 Introduction
3.2 Continuous random variable
3.3 Continuous probability distribution
3.4 Special functions for continuous probability distribution
Part Two: Descriptive Statistics
Chapter 4:
4. Data display and summary of data
4.1 Introduction
4.2 The definition and the difference between sample and population
4.3 Graphical display of data: stem-and-leaf and box-plot
4.4 The mean, variance and standard deviation of the data
Chapter 5:
5. Random sample, Central Limit Theorem, Normal Approximation and Statistical process control: X-bar and R-charts
5.1 Introduction
5.2 Random sample
5.3 The sampling distribution of X
5.4 Central Limit Theorem
5.5 Normal Approximation for Binomial and Poisson distributions
5.6 Statistical process control: X-bar and R-charts
Part Three: Inferential Statistics
Chapter 6:
6. Hypothesis testing for single population
6.1 Introduction
6.2 Test about a sample mean for large sample, population variance is known
6.3 The P-value for the test and confidence interval for mean
6.4 Test about sample mean for small sample, population variance is unknown
6.5 Test about sample mean for small sample, population variance is unknown but the sample size is large, n > 30
6.6 The P-value for the test and confidence interval for mean
6.7 Test about proportion
6.8 The P-value for the test and confidence interval for proportion
6.9 Confidence interval for proportion
6.10 Test about variance
6.11 The P-value for the test and confidence interval for variance
Chapter 7:
7. Hypothesis testing for two populations
7.1 Introduction
7.2 Test about the difference between the means of two populations with known variances
7.3 The P-value for the test and confidence interval between the means of two populations
7.4 Test about the difference between the means of two populations with variances unknown but assumed to be equal
7.5 The P-value for the test and confidence interval between the means of two populations
7.6 Test about the difference between the means of two populations with variances unknown and assumed not to be equal
7.7 The P-value for the test and confidence interval between the means of two populations
7.8 Test about the difference between the means of two populations with variances unknown but the sample sizes for both populations are large, n1 > 30 and n2 > 30
7.9 The P-value for the test and confidence interval between the means of two populations
7.10 Test about the difference between the proportions of two populations
7.11 The P-value for the test and confidence interval between the proportions of two populations
Chapter 8:
8. Simple linear regression model
8.1 Introduction
8.2 Least squares estimator to determine the intercept and slope
8.3 Assessment of the regression: standard error of estimate, coefficient of determination and t-test of the parameters
8.4 Significance test for the regression model
Chapter 9:
9. Multiple linear regression models
9.1 Introduction
9.2 Normal equations
9.3 The coefficient of determination
9.4 Confidence intervals and significance tests
Part Four: Design of Experiments
Chapter 10:
10. The design and analysis of experiments
10.1 Introduction to design of experiments
10.2 One-way ANOVA
10.3 Two-way ANOVA
Appendix
Table 1: The normal Z-distribution
Table 2: The Student's t-distribution
Table 3: The chi-squared, χ², distribution
Table 4: The F-distribution
Preface
This book provides an introduction to probability and statistics, with particular emphasis on applications in applied sciences, technology and engineering. Introductory texts on engineering statistics typically spend a great deal of time on basic probability ideas in the first several chapters. In fact, basic probability can easily fill a standard introductory course. Because engineering students often have only one probability/statistics course, the material needs to be reorganized to allow for coverage of statistical methodology.
This book is divided into four parts. Part I covers the basic concepts of probability and distributions in Chapters 1 to 3. In Chapter 1, we give a brief introduction to the basic concepts of probability. In Chapter 2, we introduce discrete random variables and their probability distributions. Continuous random variables and their probability distributions are discussed in Chapter 3.
Part II covers descriptive statistics in Chapters 4 and 5. In Chapter 4, we introduce ways to display and summarize data. The random sample, the central limit theorem, the normal approximation and statistical process control are discussed in Chapter 5.
Engineering students must also be given some experience in basic data analysis, and inferential statistics is an important part of the statistical methods used for it. Part III covers inferential statistics in Chapters 6 to 9. In Chapter 6, we introduce hypothesis testing for a single population; hypothesis testing for two populations is discussed in Chapter 7. In Chapters 8 and 9, we introduce simple and multiple linear regression, respectively.
Finally, in Part IV, we want students to gain some experience with real applications in engineering. Related topics such as factorial design and the design of experiments are discussed in Chapter 10.
Afza Shafie
Radzuan Razali
April 2010.
Chapter 1
1. Basic concept of probability
Learning objectives:
At the end of this chapter, students should be able to:
Define and construct the sample space of an experiment.
Define random events, identify types of events, and apply Venn diagrams and set laws to find event sets, including intersection, union and complement.
Identify mutually exclusive and exhaustive events.
Apply Bayes' theorem to find the conditional probability of an event when the sample space is partitioned into several mutually exclusive and exhaustive subsets.
1.1 Introduction
Probability theory is the study of randomness and uncertainty. Probability forms the basis on which we can make inferences about a population from its distribution, and it provides methods for quantifying the chance, or likelihood, associated with various outcomes. Probability helps to explain many everyday occurrences, and we actually discuss it frequently.
Probability is also used every day in engineering and technology, for example the probability of a good part being produced, or the reliability of a new machine (reliabilities are actually probabilities).
An engineer wants to be fairly certain that the percentage of good rods is at least 90%; otherwise he will shut down the process for recalibration. How certain can he be that at least 90% of the 1000 rods are good?
What is the difference between probability and inferential statistics? In probability, the properties of the population under study are assumed known, and questions regarding a sample taken from the population are posed and answered. In inferential statistics, the characteristics of a sample are available to the experimenter, and this information enables the experimenter to draw conclusions about the population.
1.1.1 Definitions:
Some definitions and terms in basic probability must be known and well understood. Among them are:
Random process: a situation in which the possible results are known but the actual result cannot be predicted with certainty in advance.
Outcome: each possible result of a random process.
Experiment: a process by which an observation or measurement is obtained (it yields outcomes).
1.2 Sample space, probability of events, counting rule
Before data are collected, analysed and interpreted, the way the random experiment is modelled is crucial. The related terms, such as sample space and event, are important.
1.2.1 Sample Space:
The sample space, denoted by S, is the set of all possible outcomes of an experiment.
An event is any collection (subset) of outcomes contained in the sample space S.
An event is called simple if it consists of exactly one outcome and compound if it consists of more than one outcome. The null event is an event with no outcomes; it is the impossible event, or empty set.
Example 1.1:
Experiment: roll a die.
The sample space is: S = {1, 2, 3, 4, 5, 6}
The simple events (or outcomes) are:
E1: observe a 1 = {1}, E2 = {2}, E3 = {3}, E4 = {4}, E5 = {5}, E6 = {6}
The compound events are:
A: observe an odd number = {1, 3, 5}
B: observe a number greater than or equal to 4 = {4, 5, 6}
Example 1.2:
Toss a coin three times and observe the number of heads. The sample space is
S = {0, 1, 2, 3}
The sample space for the lifetime of a machine (in days) is
S = {t | t ≥ 0} = [0, ∞)
The sample space for the number of calls at a telephone exchange during a specific time interval is
S = {0, 1, …}
Knowledge of set theory is important for understanding the basics of probability. The union of events A and B, denoted by A ∪ B and read "A or B", is the event consisting of all outcomes that are in A, in B, or in both.
The intersection of A and B, denoted by A ∩ B and read "A and B", is the event consisting of all outcomes that are in both A and B.
The complement of event A, denoted by A^C (also written A′), is the event consisting of all outcomes in the sample space S that are not contained in A.
If two events A and B have no outcomes in common, they are said to be mutually exclusive, or disjoint, events. This means that if one of the events occurs, the other cannot.
All these events can be visualized with a Venn diagram.
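The set operations above map directly onto Python's built-in `set` type. The following is a minimal sketch using the die events of Example 1.1; the variable and function names are illustrative choices, not notation from the text.

```python
# Events from Example 1.1 (one roll of a die), modelled as Python sets.
S = {1, 2, 3, 4, 5, 6}   # sample space
A = {1, 3, 5}            # observe an odd number
B = {4, 5, 6}            # observe a number >= 4

union = A | B            # "A or B"
intersection = A & B     # "A and B"
complement_A = S - A     # outcomes in S that are not in A


def mutually_exclusive(x, y):
    """Two events are mutually exclusive when they share no outcome."""
    return x.isdisjoint(y)


print(sorted(union))                        # [1, 3, 4, 5, 6]
print(sorted(intersection))                 # [5]
print(sorted(complement_A))                 # [2, 4, 6]
print(mutually_exclusive(A, B))             # False, both contain 5
print(mutually_exclusive(A, complement_A))  # True
```

An event and its complement are always mutually exclusive, which the last line confirms for A.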
1.2.2 Probability of Events
An event is a subset of all of the possible outcomes of an experiment. To each event E we assign a number P(E), called the probability of E, which gives a precise measure of the chance that E will occur. The probability of an event E is defined as the ratio of the number of outcomes favorable to the event, n, to the total number of possible outcomes, N. That is, P(E) = n/N.
For example, if a die is tossed repeatedly, what would we expect the probability of an even number, P(E = 2 or 4 or 6), to be in the long run?
In this experiment an even number can occur in three ways, so n = 3. The total number of possible outcomes is six, so N = 6. Hence the probability that an even number occurs is
P(E = 2 or 4 or 6) = 3/6 = 0.5
Conditions of Probability
A probability, denoted by P, is a rule (or function) that assigns a number between 0 and 1 to each event and must satisfy:
0 ≤ P(E) ≤ 1 for any event E
P(∅) = 0, P(S) = 1
If A1, A2, … is an infinite collection of mutually exclusive events, then
$$P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots$$
The probability of the complement of any event A is given as
$$P(A') = 1 - P(A)$$
For example, if P(rain tomorrow) = 0.6 then P(no rain tomorrow) = 0.4.
Other notations for the complement of A are A^c and Ā.
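The ratio definition and the conditions above can be checked numerically. This is a small sketch, assuming equally likely outcomes for a single die roll; the helper `prob` is a hypothetical name introduced only for illustration.

```python
from fractions import Fraction

S = frozenset({1, 2, 3, 4, 5, 6})  # sample space of one die roll


def prob(event, space=S):
    """Classical probability n/N; valid only for equally likely outcomes."""
    favorable = set(event) & space
    return Fraction(len(favorable), len(space))


even = {2, 4, 6}
p_even = prob(even)          # n = 3, N = 6
p_not_even = prob(S - even)  # probability of the complement

print(float(p_even))              # 0.5
print(prob(set()), prob(S))       # 0 1
print(p_not_even == 1 - p_even)   # True
```

Exact fractions are used instead of floats so that the complement rule can be compared with `==` without rounding concerns.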
Example 1.3:
An oil-prospecting firm plans to drill two exploratory wells. Past evidence is used to assess the possible outcomes listed in the following table:
Event   Description                              Probability
A       Neither well produces oil or gas         0.85
B       Exactly one well produces oil or gas     0.12
C       Both wells produce oil or gas            0.03
Find P(A ∪ B), P(B ∪ C) and P(B′), and give a description of each.
Solution:
Events A, B and C are mutually exclusive because the occurrence of one event precludes the occurrence of either of the other two.
P(A or B) = P(A) + P(B) = 0.97 (probability that at most one well produces oil or gas)
P(B or C) = P(B) + P(C) = 0.15 (probability that at least one well produces oil or gas)
P(B′) = 1 − P(B) = 0.88 (probability that either neither well or both wells produce oil or gas)
1.2.3 General Addition Law
Let A and B be two events defined in a sample space S. Then
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
If two events A and B are mutually exclusive, then
$$P(A \cap B) = 0$$
Thus
$$P(A \cup B) = P(A) + P(B)$$
This can be extended to more than two mutually exclusive events.
Example 1.4
In a residential area in Ipoh, 45% of all households subscribe to the Sinar Harian newspaper published in a nearby city, 75% subscribe to Utusan Malaysia, and 30% of all households subscribe to both papers. Draw a Venn diagram for this problem.
If a household is selected at random, what is the probability that it subscribes to
a) at least one of the two newspapers
b) exactly one of the two newspapers
Solution:
a) A = event that the household subscribes to Sinar Harian, B = event that it subscribes to Utusan Malaysia
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.45 + 0.75 − 0.30 = 0.9
b) P(exactly one) = P(A ∩ B′) + P(A′ ∩ B) = 0.15 + 0.45 = 0.6
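The arithmetic of Example 1.4 can be reproduced in a few lines. This sketch assumes the figures stated in the example (45%, 75% and 30%); the variable names are illustrative.

```python
# Subscription probabilities from Example 1.4.
p_a = 0.45    # subscribes to Sinar Harian
p_b = 0.75    # subscribes to Utusan Malaysia
p_ab = 0.30   # subscribes to both

# General addition law: P(A or B) = P(A) + P(B) - P(A and B)
p_at_least_one = p_a + p_b - p_ab

# "Exactly one" removes the overlap from each event before adding.
p_only_a = p_a - p_ab
p_only_b = p_b - p_ab
p_exactly_one = p_only_a + p_only_b

print(round(p_at_least_one, 2))  # 0.9
print(round(p_exactly_one, 2))   # 0.6
```

The results are rounded for display because the inputs are binary floating-point approximations of the decimal percentages.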
The probability of an event A equals the number of outcomes (sample points) contained in A divided by the total number of possible outcomes. That is:
P(A) = n(A) / n(S)
Important condition: all outcomes must be equally likely to occur. Listing the outcomes is inefficient when n(S) is large.
1.2.4 Counting Rule:
Counting rules eliminate the need to list each simple event and make it easy to assign probabilities to various events when the outcomes are equally likely. They are especially helpful when the sample space is quite large.
Product (Multiplication) Rule
If there are k elements (or things) to choose, with n1 choices for the first element, n2 for the second element, and so on up to nk choices for the k-th element, then the number of possible ways of selecting them is n1 × n2 × … × nk. The rule applies only when the elements are different or the order of the elements matters.
Example 1.5:
A chemical engineer wishes to conduct an experiment to determine how these four factors affect the quality of the coating. She is interested in comparing two charge levels, three density levels, four temperature levels, and five speed levels. How many experimental conditions are possible?
Solution:
The number of possible experimental conditions is 2 × 3 × 4 × 5 = 120
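The product rule can be cross-checked by listing every condition explicitly. The sketch below uses the level counts of Example 1.5; the factor names are labels taken from the example, and the level values themselves are placeholders (0, 1, 2, …) because the text does not give them.

```python
from itertools import product
from math import prod

# Number of levels per factor, as stated in Example 1.5.
levels = {"charge": 2, "density": 3, "temperature": 4, "speed": 5}

# Product rule: multiply the number of choices at each stage.
count_by_rule = prod(levels.values())

# Brute-force enumeration of every (charge, density, temperature, speed) tuple.
conditions = list(product(*(range(n) for n in levels.values())))

print(count_by_rule)    # 120
print(len(conditions))  # 120
print(conditions[0])    # (0, 0, 0, 0)
```

Enumeration is only practical for small designs; the product rule gives the count without building the list.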
Permutations and Combinations
A permutation is an ordered arrangement of k objects taken from a set of n distinct objects (k ≤ n).
The number of permutations of k objects from n distinct objects is denoted by the symbol P_{k,n}:
$$P_{k,n} = {}^{n}P_{k} = \frac{n!}{(n-k)!}$$
Example 1.6:
Eight teaching assistants are available to grade an exam of four questions. We wish to select a different assistant to grade each question (only one assistant per question). In how many possible ways can the assistants be chosen for grading?
Solution:
The number of possible ways is
$$P_{4,8} = {}^{8}P_{4} = \frac{8!}{(8-4)!} = 1680$$
Combination
A combination is an unordered subset of k objects taken from a set of n distinct objects.
The number of combinations of k objects from n distinct objects is denoted by the symbol C_{k,n}:
$$C_{k,n} = {}^{n}C_{k} = \binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{P_{k,n}}{k!}$$
Permutation vs. Combination
Permutations are larger in number than combinations. For example, (1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2) and (3, 2, 1) are all different permutations of the numbers 1, 2 and 3. However, they all represent the same combination of numbers.
Example 1.7:
Fifteen players compete in a tournament. In how many ways can
a) rankings be assigned to the top five competitors?
b) the best five competitors be randomly chosen?
Solution
a) The number of rankings that can be assigned to the top five competitors is
$$P_{5,15} = {}^{15}P_{5} = \frac{15!}{10!} = 360{,}360$$
b) The number of ways that five competitors can be chosen is
$$C_{5,15} = \binom{15}{5} = \frac{15!}{5!\,10!} = 3{,}003$$
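The counts in Examples 1.6 and 1.7 can be verified with the factorial formulas. This is a sketch; the two helper functions are hypothetical names that follow the text's (k, n) argument order, while Python's standard `math.perm` and `math.comb` (Python 3.8 or later) take n first.

```python
from math import comb, factorial, perm


def permutations_count(k, n):
    """Ordered selections of k objects from n distinct objects: n!/(n-k)!."""
    return factorial(n) // factorial(n - k)


def combinations_count(k, n):
    """Unordered selections: each combination corresponds to k! permutations."""
    return permutations_count(k, n) // factorial(k)


print(permutations_count(4, 8))    # 1680    (Example 1.6)
print(permutations_count(5, 15))   # 360360  (Example 1.7 a)
print(combinations_count(5, 15))   # 3003    (Example 1.7 b)

# Cross-check against the standard library, which uses (n, k) order.
print(perm(8, 4) == permutations_count(4, 8))    # True
print(comb(15, 5) == combinations_count(5, 15))  # True
```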
1.3 Conditional Probability. Independent Events.
Sometimes it is useful to know the probability that an event will occur given that another event has occurred. Given two possible events, if we know that one event has occurred, this information can be used in calculating the other event's probability.
1.3.1 Conditional Probability
The conditional probability of A, given that B has already occurred, is denoted as P(A | B) and defined as:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad \text{provided } P(B) > 0$$
The conditional probability of B, given that A has already occurred, is denoted as P(B | A) and defined as:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad \text{provided } P(A) > 0$$
Example 1.8:
The Information Resource Center (IRC), UTP displays three types of books entitled "Science" (S), "Engineering" (E), and "Technology" (T). The reading habits of a randomly selected reader with respect to these types of books are
Read regularly   S      E      T      S∩E    S∩T    E∩T    S∩E∩T
Probability      0.14   0.23   0.37   0.08   0.09   0.13   0.05
Find the following probabilities and interpret
a) P(S | E)
b) P(S | E ∪ T)
c) P(S | reads at least one)
d) P(S ∪ E | T)
Solution:
a)
$$P(S \mid E) = \frac{P(S \cap E)}{P(E)} = \frac{0.08}{0.23} = 0.3478$$
b)
$$P(S \mid E \cup T) = \frac{P(S \cap (E \cup T))}{P(E \cup T)} = \frac{0.12}{0.47} = 0.2553$$
c)
$$P(S \mid \text{reads at least one}) = P(S \mid S \cup E \cup T) = \frac{P(S \cap (S \cup E \cup T))}{P(S \cup E \cup T)} = \frac{P(S)}{P(S \cup E \cup T)} = \frac{0.14}{0.49} = 0.2857$$
d) Here P((S ∪ E) ∩ T) = P(S ∩ T) + P(E ∩ T) − P(S ∩ E ∩ T) = 0.09 + 0.13 − 0.05 = 0.17, so
$$P(S \cup E \mid T) = \frac{P((S \cup E) \cap T)}{P(T)} = \frac{0.17}{0.37} = 0.4595$$
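The four conditional probabilities of Example 1.8 can be recomputed from the table by inclusion-exclusion. This is a sketch under the assumption that the table values are 0.14, 0.23, 0.37, 0.08, 0.09, 0.13 and 0.05; the dictionary keys and variable names are illustrative.

```python
# Table of Example 1.8: single events, pairwise and triple intersections.
p = {"S": 0.14, "E": 0.23, "T": 0.37,
     "SE": 0.08, "ST": 0.09, "ET": 0.13, "SET": 0.05}

# Unions and intersections obtained by inclusion-exclusion.
p_E_or_T = p["E"] + p["T"] - p["ET"]                  # P(E u T)
p_S_and_E_or_T = p["SE"] + p["ST"] - p["SET"]         # P(S n (E u T))
p_any = (p["S"] + p["E"] + p["T"]
         - p["SE"] - p["ST"] - p["ET"] + p["SET"])    # P(S u E u T)
p_S_or_E_and_T = p["ST"] + p["ET"] - p["SET"]         # P((S u E) n T)

# Conditional probability = P(intersection) / P(conditioning event).
a = p["SE"] / p["E"]
b = p_S_and_E_or_T / p_E_or_T
c = p["S"] / p_any
d = p_S_or_E_and_T / p["T"]

for label, value in zip("abcd", (a, b, c, d)):
    print(label, round(value, 4))
```

Note that the triple intersection must be subtracted once when combining S ∩ T and E ∩ T, otherwise readers of all three book types are counted twice.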
[Figure: Venn diagram for Example 1.8. Region probabilities: E only 0.07, T only 0.20, S only 0.02, S∩T only 0.04, S∩E only 0.03, E∩T only 0.08, S∩E∩T 0.05]
1.3.2 Independent Events
The probability of both events occurring can be calculated by rearranging the terms in the expression for conditional probability:
$$P(A \cap B) = P(A \mid B)\,P(B)$$
Two events A and B are called independent if the probability of event A is not affected by the occurrence of event B, so
$$P(A \mid B) = P(A) \quad \text{and} \quad P(A \cap B) = P(A)\,P(B)$$
Example 1.9:
In rolling a fair die, let event A = {1, 3, 5} and event B = {4, 5, 6}.
Are events A and B independent?
Solution:
P(A) = 1/2, P(B) = 1/2 and P(A ∩ B) = 1/6.
Since P(A ∩ B) ≠ P(A)P(B) = 1/4, A and B are not independent events.
What is the difference between mutually exclusive and independent events?
Two events are mutually exclusive (disjoint) if both cannot happen when the experiment is performed, so P(A | B) = 0, and vice versa.
For two mutually exclusive events: P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B).
Mutually exclusive events with nonzero probabilities must be dependent.
For two independent events: P(A ∩ B) = P(A)P(B), so P(A ∪ B) = P(A) + P(B) − P(A)P(B).
Example 1.10:
Toss a single die and observe the events
A: a number less than 4
B: a number less than or equal to 2
C: a number greater than 3
Are events A and B independent? Are events A and B mutually exclusive?
Are events A and C independent? Are events A and C mutually exclusive?
Solution:
P(A) = 1/2, P(B) = 1/3, P(C) = 1/2
P(A | B) = 1 ≠ P(A), so A and B are dependent but not mutually exclusive.
A and C are dependent and mutually exclusive.
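Independence and mutual exclusivity are separate checks, which a short script makes concrete. The sketch below assumes a fair die and reuses the events of Examples 1.9 and 1.10; the helper names are illustrative.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # fair die, equally likely outcomes


def prob(event):
    """Classical probability of an event within S."""
    return Fraction(len(event & S), len(S))


def independent(x, y):
    """Independent when P(x and y) equals P(x) * P(y)."""
    return prob(x & y) == prob(x) * prob(y)


def mutually_exclusive(x, y):
    """Mutually exclusive when the events share no outcome."""
    return not (x & y)


# Example 1.9
A, B = {1, 3, 5}, {4, 5, 6}
print(independent(A, B))          # False: 1/6 != 1/4

# Example 1.10
A2, B2, C2 = {1, 2, 3}, {1, 2}, {4, 5, 6}
print(independent(A2, B2), mutually_exclusive(A2, B2))  # False False
print(independent(A2, C2), mutually_exclusive(A2, C2))  # False True
```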
1.4 Bayes' theorem
1.4.1 Multiplicative Law of Probability and Independence
For two events A and B,
$$P(A \cap B) = P(A \mid B)\,P(B)$$
Events A and B are independent if and only if
$$P(A \cap B) = P(A)\,P(B)$$
If events A1, …, Ak are independent, then
$$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)\,P(A_2) \cdots P(A_k)$$
The multiplication rule is most useful when the experiment consists of several stages in succession. The conditioning event, B, describes the outcome of the first stage and A the outcome of the second, so that P(A | B), which conditions on what occurs first, will often be known.
Example 1.11:
During a space shot, the primary computer system is backed up by two secondary systems. They operate independently of one another, and each is 95% reliable.
What is the probability that all three systems will be operable at the time of the launch?
Solution
Let
A1: event that the main system is operable
A2: event that the first backup is operable
A3: event that the second backup is operable
Given P(A1) = P(A2) = P(A3) = 0.95
Since they operate independently,
P(A1 ∩ A2 ∩ A3) = P(A1) P(A2) P(A3) = 0.857
1.4.2 The Law of Total Probability
Suppose B1, B2, …, Bn are mutually exclusive and exhaustive in S. Then for any event A,
$$P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$$
1.4.3 Bayes' Theorem
Suppose B1, B2, …, Bn are mutually exclusive and exhaustive (their union is S). Let A be an event such that P(A) > 0. Then for any event Bk, k = 1, 2, …, n,
$$P(B_k \mid A) = \frac{P(A \cap B_k)}{P(A)} = \frac{P(A \mid B_k)\,P(B_k)}{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}$$
Example 1.12:
A store stocks bulbs for LCD projectors from three suppliers. Suppliers A, B and C supply 10%, 20% and 70% of the bulbs, respectively. It has been determined that company A's bulbs are 1% defective, while company B's are 3% defective and company C's are 4% defective. If a bulb is selected at random and found to be defective, what is the probability that it came from supplier B?
Solution:
Let D be the event that a bulb is defective. Then the probability that it came from supplier B is
$$P(B \mid D) = \frac{P(B)\,P(D \mid B)}{P(A)\,P(D \mid A) + P(B)\,P(D \mid B) + P(C)\,P(D \mid C)} = \frac{(0.2)(0.03)}{(0.1)(0.01) + (0.2)(0.03) + (0.7)(0.04)} = 0.1714$$
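The law of total probability and Bayes' theorem translate into two short functions. This is a minimal sketch applied to the supplier figures of Example 1.12; the function names and the dictionary layout are assumptions made for illustration.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum over i of P(A | B_i) * P(B_i), for a partition B_1..B_n."""
    return sum(priors[k] * likelihoods[k] for k in priors)


def bayes_posterior(priors, likelihoods):
    """P(B_k | A) for every k, using the total probability as denominator."""
    evidence = total_probability(priors, likelihoods)
    return {k: priors[k] * likelihoods[k] / evidence for k in priors}


# Example 1.12: supplier shares and defect rates.
priors = {"A": 0.10, "B": 0.20, "C": 0.70}        # P(supplier)
defect_rate = {"A": 0.01, "B": 0.03, "C": 0.04}   # P(D | supplier)

p_defective = total_probability(priors, defect_rate)
posterior = bayes_posterior(priors, defect_rate)

print(round(p_defective, 3))      # 0.035
print(round(posterior["B"], 4))   # 0.1714
```

Because the suppliers form a partition, the posterior probabilities over A, B and C sum to 1, which is a useful sanity check on the inputs.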
Exercise Chapter 1:
1. Each message in a digital communication system is classified as to whether it is received within the time specified by the system design. If 3 messages are classified, what is an appropriate sample space for this experiment?
2. A digital scale is used that provides weights to the nearest gram. Let event A: a weight exceeds 11 grams, B: a weight is less than or equal to 15 grams, C: a weight is greater than or equal to 8 grams and less than 12 grams. What is the sample space for this experiment? Find
(a) A ∪ B (b) A′ (c) A ∩ B
(d) (A ∪ C)′ (e) A ∩ B ∩ C (f) B′ ∩ C
3. Samples of building materials from three suppliers are classified for conformance to air-quality specifications. The results from 100 samples are summarized as follows:
                 Conforms
                 Yes    No
Supplier   R     30     10
           S     22      8
           T     25      5
Let A denote the event that a sample is from supplier R, and B denote the event that a sample conforms to the specifications. If a sample is selected at random, determine the following probabilities:
(a) P(A) (b) P(B) (c) P(B′)
(d) P(A ∪ B) (e) P(A ∩ B) (f) P(A ∪ B′)
(g) P(A | B) (h) P(B | A)
4. The compact discs from a certain supplier are analyzed for scratch and shock resistance. The results from 100 discs tested are summarized as follows:
                          Scratch resistance
                          High   Low
Shock resistance  High    30     10
                  Medium  22      8
                  Low     25      5
Let A denote the event that a disc has high shock resistance, and B denote the event that a disc has high scratch resistance. If a disc is selected at random, determine the following probabilities:
(a) P(A) (b) P(B) (c) P(B′)
(d) P(A ∪ B) (e) P(A ∩ B) (f) P(A ∪ B′)
(g) P(A | B) (h) P(B | A)
5. The reaction times (in minutes) of a reactor for two batches are measured in an experiment.
(a) Define the sample space of the experiment.
(b) Define event A, where the reaction time of the first batch is less than 45 minutes, and event B, where the reaction time of the second batch is greater than 75 minutes.
(c) Find A ∪ B, A ∩ B and A′.
(d) Verify whether events A and B are mutually exclusive.
6. When a die is rolled and a coin is tossed, use a tree diagram to describe the set of possible outcomes and find the probability that the die shows an odd number and the coin shows a head.
7. A bag contains 3 black and 4 white balls. Two balls are drawn at random one at a time without replacement.
(i) What is the probability that the second ball drawn is black?
(ii) What is the conditional probability that the first ball drawn is black if the second ball is known to be black?
8. An oil-prospecting firm plans to drill two exploratory wells. Past evidence is used to assess the possible outcomes listed in the following table:
Event   Description                              Probability
A       Neither well produces oil or gas         0.80
B       Exactly one well produces oil or gas     0.18
C       Both wells produce oil or gas            0.02
Find and give a description for P(A ∪ B), P(B ∪ C) and P(B′).
9. In a residential suburb, 60% of all households subscribe to the metro newspaper published in a nearby city, 80% subscribe to the local paper, and 50% of all households subscribe to both papers. Draw a Venn diagram for this problem. If a household is selected at random, what is the probability that it subscribes to
(a) at least one of the two newspapers
(b) exactly one of the two newspapers
10. In a student organization election, we want to elect one president from five candidates, one vice president from six candidates, and one secretary from three candidates. How many outcomes are possible?
11. Suppose each student is assigned a 5-digit number. How many different numbers can be created?
12. A chemical engineer wishes to conduct an experiment to determine how these four factors affect the quality of the coating. She is interested in comparing three charge levels, five density levels, four temperature levels, and three speed levels. How many experimental conditions are possible?
13. A menu has five appetizers, three soups, seven main courses, six salad dressings and eight desserts. In how many ways can
(a) a full meal be chosen?
(b) a meal be chosen if either an appetizer or a soup is ordered, but not both?
14. Ten teaching assistants are available to grade a test of four questions. We wish to select a different assistant to grade each question (only one assistant per question). In how many possible ways can the assistants be chosen for grading?
15. A participant samples 8 products and is asked to pick the best, the second best, and the third best. How many possible ways are there?
16. Suppose that in the taste test, each participant samples eight products and is asked to select the three best products. What is the number of possible outcomes?
17. A contractor has 8 suppliers from which to purchase electrical supplies. He will select 3 of these at random and ask each supplier to submit a project bid. In how many ways can the selection of bidders be made?
18. Twenty players compete in a tournament. In how many ways can
(a) rankings be assigned to the top five competitors?
(b) the best five competitors be randomly chosen?
19. Three balls are selected at random without replacement from the jar below. Find the probability that one ball is red and two are black.
20. A university warehouse has received a shipment of 25 printers, of which 10 are laser printers and 15 are inkjet models. If 6 of these 25 are selected at random by a technician, what is the probability that exactly 3 of those selected are laser printers?
21. There are 17 broken light bulbs in a box of 100 light bulbs. A random sample of 3 light bulbs is chosen without replacement.
(a) How many ways are there to choose the sample?
(b) How many samples contain no broken light bulbs?
(c) What is the probability that the sample contains no broken light bulbs?
(d) How many ways are there to choose a sample that contains exactly 1 broken light bulb?
(e) What is the probability that the sample contains no more than 1 broken light bulb?
22. An agricultural research establishment grows vegetables and grades each one as either good or bad for its taste, good or bad for its size, and good or bad for its appearance. Overall, 78% of the vegetables have a good taste. However, only 69% of the vegetables have both a good taste and a good size. Also, 5% of the vegetables have a good taste and a good appearance, but a bad size. Finally, 84% of the vegetables have either a good size or a good appearance.
(a) If a vegetable has a good taste, what is the probability that it also has a good size?
(b) If a vegetable has a bad size and a bad appearance, what is the probability that it has a good taste?
23. A local library displays three types of books entitled "Science" (S), "Arts" (A), and "Novels" (N). The reading habits of a randomly selected reader with respect to these types of books are
Read regularly   S      A      N      S∩A    S∩N    A∩N    S∩A∩N
Probability      0.14   0.23   0.37   0.08   0.09   0.13   0.05
Find the following probabilities and interpret
(a) P(S | A)
(b) P(S | A ∪ N)
(c) P(S | reads at least one)
(d) P(S ∪ A | N)
24. A batch of 500 containers for frozen orange juice contains 5 that are defective. Two are selected at random, without replacement, from the batch. Let A and B denote the events that the first and second containers selected are defective, respectively.
(a) Are A and B independent events?
(b) If the sampling were done with replacement, would A and B be independent?
25. Every day (Monday to Friday) a batch of components sent by a first supplier arrives at a certain inspection facility. Two days a week, a batch also arrives from a second supplier. Eighty percent of all batches from supplier 1 pass inspection, and 90% of batches from supplier 2 pass inspection. On a randomly selected day, what is the probability that two batches pass inspection?
26. The probability is 1% that an electrical connector that is kept dry fails during the warranty period of a portable computer. If the connector is ever wet, the probability of a failure during the warranty period is 5%. If 90% of the connectors are kept dry and 10% are wet, what proportion of connectors fail during the warranty period?
27. Computer keyboard failures are due to faulty electrical connects (12%) or mechanical defects (88%). Mechanical defects are related to loose keys (27%) or improper assembly (73%). Electrical connect defects are caused by defective wires (35%), improper connections (13%) or poorly welded wires (52%). Find the probability that a failure is due to
(a) loose keys
(b) improperly connected or poorly welded wires.
28. During a space shot, the primary computer system is backed up by two secondary systems. They operate independently of one another, and each is 90% reliable. What is the probability that all three systems will be operable at the time of the launch?
29. A store stocks light bulbs from three suppliers. Suppliers A, B and C supply 10%, 20% and 70% of the bulbs, respectively. It has been determined that company A's bulbs are 1% defective, while company B's are 3% defective and company C's are 4% defective. If a bulb is selected at random and found to be defective, what is the probability that it came from supplier B?
30. A particular city has three airports. Airport A handles 50% of all airline traffic, while airports B and C handle 30% and 20%, respectively. The rates of losing a bag at airports A, B and C are 0.3, 0.15 and 0.14, respectively. If a passenger arrives in the city and loses a bag, what is the probability that the passenger arrived at airport A?
31. A company rated 75% of its employees as satisfactory and 25% as unsatisfactory. Of the satisfactory ones 80% had experience; of the unsatisfactory ones only 40%. If a person with experience is hired, what is the probability that (s)he will be satisfactory?
32. In a certain assembly plant, three machines, B1, B2 and B3, make 30%, 45% and 25%, respectively, of the products. It is known from past experience that 2%, 3% and 2% of the products made by each machine, respectively, are defective. Now, suppose that a finished product is randomly selected.
(a) What is the probability that it is defective?
(b) If a product was chosen randomly and found to be defective, what is the probability that it was produced by machine B3?
33. Three machines A, B and C produce identical items. Of their respective outputs, 5%, 4% and 3% of the items are faulty. On a certain day A has produced 25%, B has produced 30% and C has produced 45% of the total output. An item selected at random is found to be faulty. What are the chances that it was produced by C?
34. Suppose that a test for Influenza A (H1N1) disease has a very high success rate: if a tested
patient has the disease, the test accurately reports this, a 'positive', 99% of the time, and if a
tested patient does not have the disease, the test accurately reports that, a 'negative', 95% of
the time. Suppose also, however, that only 0.1% of the population have that disease.
(a) What is the probability that the test returns a positive result?
(b) If the patient has a positive result, what is the probability that he has the disease?
(c) What is the probability of a false positive?
35. An insurance company charges younger drivers a higher premium than it does older drivers
because younger drivers as a group tend to have more accidents. The company has 3 age
groups: Group A includes those less than 25 years old and has 22% of all its policyholders.
Group B includes those 25-39 years old and has 43% of all its policyholders. Group C
includes those 40 years old and older and has 35% of all its policyholders. Company records
show that in any given one-year period, 11% of its Group A policyholders have an accident.
The percentages for groups B and C are 3% and 2%, respectively.
(a) What is the probability that one of the company's policyholders has an accident
during the next 12 months?
(b) Suppose Mr. Chong has just had a car accident. If he is one of the company's
policyholders, what is the probability that he is under 25?
Chapter 2
2. Discrete random variable and discrete probability distributions
Learning objectives:
At the end of this chapter, students should be able to:
Define random variables
Differentiate between discrete and continuous random variables
Define discrete probability distributions
Know the special functions for discrete probability distributions
2.1 Introduction
A random variable is a rule that assigns a number to each outcome of an experiment.
These numbers are called the measured values of the random variable. Capital letters
such as X, Y and Z are used to denote random variables, and small letters such as x, y and z
denote their measured values.
Example 2.1:
Select a soccer player; the random variable Y is the number of goals the player has
scored during the season.
The measured values of Y are 0, 1, 2, 3, ...
Take the test marks for 100 engineering students; the random variable Z is the average
mark scored by the students.
The values of Z are 65.0, 67.8, 70.5, 77.3, ...
There are two types of random variables: discrete random variables and
continuous random variables.
2.2 Discrete random variable
The measured values of a discrete random variable are finite or countable, and are
typically integers. The number of students in this class is an example of a
discrete random variable.
2.3 Continuous random variable
The measured values of a continuous random variable are real numbers and can be any
value within a range. The weight of students in this class is an
example of a continuous random variable.
Example 2.2:
Identify whether each random variable below is a discrete or a continuous random variable.
(i) The number of female students in the class.
(ii) The number of telephone calls.
(iii) The time between two accidents.
(iv) The number of cracks in a certain length of road.
(v) The height of the athletes who participated in the Asian Games.
(vi) The volume of water in the tank.
(vii) A score on the statistics final examination.
(viii) The number of cars on the road at a certain period of time.
Solution:
(i) discrete
(ii) discrete
(iii) continuous
(iv) discrete
(v) continuous
(vi) continuous
(vii) continuous
(viii) discrete
2.4 Discrete probability distribution
If a random variable is discrete, its probability distribution is called a
discrete probability distribution or probability mass function (pmf).
Suppose the experiment is flipping a coin two times. This simple experiment has
four possible outcomes, giving the sample space HH, HT, TH and TT. Now, let the random
variable X represent the number of heads that result from this experiment. The random
variable X can only take on the values 0, 1 or 2, so it is a discrete random variable.
The probability distribution for this experiment appears below.
Number of heads, X    Probability function, P(X)
0                     1/4
1                     1/2
2                     1/4
The above table represents a discrete probability distribution, and the probability function
$P(X = x_i)$ is called the probability mass function (pmf) of X because it relates each value of a
discrete random variable to its probability of occurrence.
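The coin-flipping table above can be reproduced by enumerating the sample space. The following is only an illustrative sketch in Python (the course text itself uses no code); the variable names `outcomes` and `pmf` are chosen here for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two flips of a fair coin: HH, HT, TH, TT.
outcomes = ["".join(flips) for flips in product("HT", repeat=2)]

# X = number of heads in each outcome; all four outcomes are equally likely.
counts = Counter(outcome.count("H") for outcome in outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

Exact fractions are used so that the probabilities can be compared with the table without rounding.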
2.4.1 The properties of the probability mass function (pmf)
The pmf $P(X = x_i)$ of a discrete random variable X must satisfy two conditions:
(i) $0 \le P(X = x_i) \le 1$
(ii) $\sum_{x_i} P(X = x_i) = 1$
Given the pmf, the probability of any event involving X can be calculated. For example, the
probability that X is at most one is
$P(X \le 1) = P(X = 0) + P(X = 1)$
Example 2.3:
Two balls are drawn at random in succession without replacement from an urn containing
4 red balls and 6 black balls. Find the probabilities of all the possible outcomes.
Solution:
Let X denote the number of red balls in the outcome.
Possible outcomes    RR   RB   BR   BB
X                    2    1    1    0
Here, $x_1 = 2$, $x_2 = 1$, $x_3 = 1$, $x_4 = 0$.
Now, the probability of getting 2 red balls when we draw out the balls one at a time is:
Probability of first ball being red = 4/10
Probability of second ball being red = 3/9 (because there are 3 red balls left in the urn, out
of a total of 9 balls left). So:
$P(RR) = \frac{4}{10} \times \frac{3}{9} = \frac{12}{90} = \frac{2}{15}$
Likewise, the probability of red first is 4/10, followed by black with probability 6/9 (because
there are 6 black balls still in the urn and 9 balls altogether). So:
$P(RB) = \frac{4}{10} \times \frac{6}{9} = \frac{24}{90} = \frac{4}{15}$
Similarly for black then red:
$P(BR) = \frac{6}{10} \times \frac{4}{9} = \frac{24}{90} = \frac{4}{15}$
Finally, for 2 black balls:
$P(BB) = \frac{6}{10} \times \frac{5}{9} = \frac{30}{90} = \frac{5}{15}$
So the probability distribution is:
X           2      1      0
P(X = x)    2/15   8/15   5/15
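The urn calculation can be checked by multiplying the conditional probabilities for each colour sequence. This is a minimal sketch, assuming the urn holds 4 red and 6 black balls as in Example 2.3; the helper `draw_two` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction

RED, BLACK = 4, 6  # urn contents, as read from Example 2.3


def draw_two(first_is_red, second_is_red):
    """Probability of one colour sequence when drawing two balls without replacement."""
    red, black = RED, BLACK
    prob = Fraction(1)
    for is_red in (first_is_red, second_is_red):
        total = red + black
        if is_red:
            prob *= Fraction(red, total)
            red -= 1      # one red ball fewer for the next draw
        else:
            prob *= Fraction(black, total)
            black -= 1    # one black ball fewer for the next draw
    return prob


# Accumulate P(X = x), where X is the number of red balls drawn.
pmf = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
for first in (True, False):
    for second in (True, False):
        pmf[int(first) + int(second)] += draw_two(first, second)

print(pmf)
```

The two mixed sequences (red then black, black then red) both contribute to X = 1, which is why that probability is twice 4/15.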
Example 2.4:
Given the probability distribution,
X           0      1     2    3     4      5
P(X = x)    1/10   1/5   k    1/5   3/10   1/10
Find the value of k that makes P(X = x) a valid pmf of X.
Solution:
For P(X = x) to be a valid pmf of X, it must satisfy $\sum_{x_i} P(X = x_i) = 1$. Hence,
$\sum_{x_i} P(X = x_i) = \frac{1}{10} + \frac{1}{5} + k + \frac{1}{5} + \frac{3}{10} + \frac{1}{10} = 1 \;\Rightarrow\; k = \frac{1}{10}$.
2.4.2 The cumulative distribution function (cdf)
The cdf of a discrete random variable X is defined by
$F(x) = P(X \le x) = \sum_{x_i \le x} P(X = x_i)$
Example 2.5:
Given the pmf,
X           2      1      0
P(X = x)    2/15   8/15   5/15
Find the cdf of X.
Solution:
For $x < 0$, $F(x) = P(X \le x) = 0$
For $0 \le x < 1$, $F(x) = P(X \le 0) = P(X = 0) = 5/15$
For $1 \le x < 2$, $F(x) = P(X \le 1) = P(X = 0) + P(X = 1) = 5/15 + 8/15 = 13/15$
For $x \ge 2$, $F(x) = P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 5/15 + 8/15 + 2/15 = 1$
So the cdf of X is:
$F(x) = \begin{cases} 0, & x < 0 \\ 5/15, & 0 \le x < 1 \\ 13/15, & 1 \le x < 2 \\ 1, & x \ge 2 \end{cases}$
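The step-function cdf of Example 2.5 follows directly from summing the pmf over all values not exceeding x. A short sketch, assuming the pmf values 5/15, 8/15 and 2/15 for X = 0, 1, 2 (the function name `cdf` is illustrative):

```python
from fractions import Fraction

# pmf of Example 2.5, keyed by the value of X.
pmf = {0: Fraction(5, 15), 1: Fraction(8, 15), 2: Fraction(2, 15)}


def cdf(x):
    """F(x) = P(X <= x): add up the pmf over every support point not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


for x in (-1, 0, 0.5, 1, 1.5, 2, 3):
    print(f"F({x}) = {cdf(x)}")
```

Evaluating at non-integer points such as 0.5 or 1.5 shows that F stays constant between the jumps.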
2.4.3 The mean and the variance of X
Given the pmf of X, P(X = x), all the parameters of X such as the mean, the variance and
the standard deviation can be determined by using the definition of expectation.
The mean of X is defined by
$\mu = E(X) = \sum_{i=1}^{n} x_i P(X = x_i)$
The variance of X is defined by
$\sigma^2 = Var(X) = E(X^2) - (E(X))^2 = \sum_{i=1}^{n} x_i^2 P(X = x_i) - \mu^2$
The standard deviation, $\sigma$, is the square root of the variance.
Example 2.6:
Let X be a random variable with pmf
X           2      1      0
P(X = x)    2/15   8/15   5/15
Find the mean, the variance and the standard deviation of X.
Solution:
The mean of X is:
$E(X) = \sum_{i=1}^{n} x_i P(X = x_i) = 2(2/15) + 1(8/15) + 0(5/15) = 12/15 = 0.8$
The variance of X is:
$Var(X) = E(X^2) - (E(X))^2 = \sum_{i=1}^{n} x_i^2 P(X = x_i) - \mu^2 = 2^2(2/15) + 1^2(8/15) + 0^2(5/15) - (0.8)^2 = 16/15 - 0.64 = 0.427$
The standard deviation is $\sigma = \sqrt{0.427} = 0.653$
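The mean, variance and standard deviation of Example 2.6 can be recomputed from the expectation definitions. The sketch below assumes the pmf 2/15, 8/15, 5/15 for X = 2, 1, 0; the variable names are illustrative only.

```python
from fractions import Fraction
from math import sqrt

pmf = {2: Fraction(2, 15), 1: Fraction(8, 15), 0: Fraction(5, 15)}

# E(X) = sum of x * P(X = x)
mean = sum(x * p for x, p in pmf.items())

# Var(X) = E(X^2) - (E(X))^2
second_moment = sum(x**2 * p for x, p in pmf.items())
variance = second_moment - mean**2

std_dev = sqrt(variance)

print(mean, variance, round(std_dev, 3))
```

Working in fractions gives the exact values 4/5 and 32/75, which round to the 0.8 and 0.427 shown in the example.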
2.5 Special functions for discrete probability distribution
There are many special discrete probability distributions, such as the Bernoulli distribution,
the Binomial distribution and the Poisson distribution.
2.5.1 Bernoulli distribution
A Bernoulli experiment is an experiment with only two possible outcomes. For example, toss
a fair coin once and let X be the number of heads. There are only two possible outcomes,
X = 0 or X = 1, with probability distribution:
Possible outcomes    Head   Tail
X                    1      0
P(X = x)             1/2    1/2
2.5.2 Binomial distribution
If the Bernoulli experiment is conducted n independent times, and the random variable X is the
number of successes, then the probability distribution of X is called the Binomial distribution,
with pmf
$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$
where p is the probability of success and q = 1 - p.
By using the definition, it can be shown that, if X has a Binomial distribution, then the
mean of X is E(X) = np and the variance of X is Var(X) = npq.
Example 2.7:
A fair coin is tossed 10 times, and X is the number of heads.
(i) What is the pmf of X?
(ii) Find the probability that heads will appear exactly 5 times.
(iii) What is the probability of no heads?
(iv) Find the mean and the variance of X.
Solution:
(i) The probability mass function of X is given by:
$P(X = x) = \binom{10}{x} (0.5)^x (0.5)^{10-x}, \quad x = 0, 1, 2, \ldots, 10$
(ii) $P(X = 5) = \binom{10}{5} (0.5)^5 (0.5)^5 = \frac{10!}{5!\,5!} (0.5)^5 (0.5)^5 = 0.246$
(iii) $P(X = 0) = \binom{10}{0} (0.5)^0 (0.5)^{10} = \frac{10!}{0!\,10!} (0.5)^0 (0.5)^{10} = 0.00098$
(iv) The mean of X is np = 10(0.5) = 5
The variance of X is npq = 10(0.5)(0.5) = 2.5
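The binomial probabilities in Example 2.7 can be verified with a few lines of Python. This is a sketch only; `binomial_pmf` is a helper name introduced here, and `math.comb` from the standard library supplies the binomial coefficient.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a Binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.5  # ten tosses of a fair coin

p_five_heads = binomial_pmf(5, n, p)   # 252/1024
p_no_heads = binomial_pmf(0, n, p)     # 1/1024
mean = n * p
variance = n * p * (1 - p)

print(round(p_five_heads, 3), round(p_no_heads, 5), mean, variance)
```

Summing `binomial_pmf(x, n, p)` over x = 0, ..., n returns 1, which is a quick check of the second pmf condition.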
2.5.3 Poisson distribution
Another important discrete distribution is the Poisson distribution. The random variable X
is the number of occurrences in the interval of interest. An example of a Poisson
distribution is the number of accidents within a certain period of time. If X is a random
variable with a Poisson distribution, then the pmf of X is given by
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, 3, \ldots$
where $\lambda$ is the mean of X for the interval of interest.
By using the definition, it can be shown that, if X has a Poisson distribution, then the mean
of X is $E(X) = \lambda$ and the variance of X is $Var(X) = \lambda$.
Example 2.8:
Anne's answering machine receives about 6 telephone calls between 8 a.m. and 10 a.m.
What is the probability that Anne receives more than 1 call in the next 15 minutes?
Solution:
Let X = the number of calls Anne receives in 15 minutes. The interval of interest is 15
minutes, and the random variable X takes on the values 0, 1, 2, ... If Anne receives, on
average, 6 telephone calls in 2 hours, then she receives 6/8 = 0.75 calls in 15
minutes, on average. So $\lambda = 0.75$.
Hence, the probability that Anne receives more than 1 call in the next 15 minutes is
$P(X > 1) = 1 - P(X \le 1) = 1 - [P(X = 0) + P(X = 1)] = 1 - \left[ \frac{e^{-0.75}(0.75)^0}{0!} + \frac{e^{-0.75}(0.75)^1}{1!} \right] = 1 - (0.4724 + 0.3543) \approx 0.173$
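The Poisson calculation for Anne's answering machine can be sketched as follows, assuming 6 calls per 2 hours scaled to a 15-minute interval; `poisson_pmf` is an illustrative helper, not part of the course text.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


# 6 calls in 120 minutes, rescaled to the 15-minute interval of interest.
lam = 6 * (15 / 120)

# P(X > 1) is the complement of P(X = 0) + P(X = 1).
p_more_than_one = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))

print(lam, round(p_more_than_one, 3))
```

The key step is rescaling the mean to the interval of interest before applying the pmf.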
Exercise 2
1. Identify each of the random variables as a continuous or discrete random variable.
(a) The number of atoms
(b) The number of fish in a pond
(c) The home team score in a football game
(d) The voltage on a power line
(e) A score on the mathematics final exam
(f) The volume of gas in the tank
(g) The number of cars at the petrol station
(h) The number of accidents in Ipoh
(i) The number of cakes left in the pantry
(j) The height of civil engineering students in UTP.
2. Let
x       0      1      2    3
P(x)    0.15   0.25   k    0.35
(i) Find the value of k that results in a valid probability distribution.
(ii) Find the expected value of X and the standard deviation of X.
(iii) What is the probability that X is greater than or equal to 1?
3. At UTP, the business students run an investment club. Each semester they create investment
portfolios in multiples of RM1,000 each. Records from the past several years show the
following probabilities of profits (rounded to the nearest RM50). In the table below, x =
profit per RM1,000 and P(x) is the probability of earning that profit.
x       0      50     100   150   200
P(x)    0.15   0.35   k     0.2   0.05
(a) Determine the value of k that results in a valid probability distribution.
(b) The profit per RM1,000 is a random variable. Is it discrete or continuous? Explain.
(c) Find the expected value of the profit in a RM1,000 portfolio.
(d) Find the standard deviation of the profit.
(e) What is the probability of a profit of RM150 or more in a RM1,000 portfolio?
4. Let X denote the number of bars of service on your cell phone whenever you are at an
intersection, with the following probabilities:
x           0      1      2      3      4      5
P(X = x)    0.05   0.15   0.20   0.35   0.15   0.1
Determine the following:
(a) F(x)
(b) Mean and variance
(c) P(X < 2)
(d) $P(X \le 2.5)$
5. A local cab company is interested in the number of pieces of luggage a cab carries on a taxi
run. A random sample of 260 taxi runs gave the following information, where x = number of
pieces of luggage and f is the frequency with which taxi runs carried x pieces of luggage.
x :   0    1    2    3    4    5    6    7    8    9    10
f :   42   51   63   38   19   16   12   10   6    2    1
(a) Find the probability distribution for x.
(b) Estimate the probability that a taxi run will have from 0 to 4 pieces of luggage
(including 0 and 4).
(c) What is the expected value of x?
(d) What is the standard deviation of x?
6. A professor estimates the probability that he will receive at least one telephone call at home
during the hours of 5 pm to 7 pm on a weekday to be 1/3. Use the formulas for computing
binomial probabilities to answer the following questions:
(a) What is the probability that he will receive at least one call on all five of the next five
weekday nights?
(b) What is the probability that he will not receive a call on any of the next five weekday
nights?
(c) What is the probability that he will receive a call on at least four of the next five weekday
nights?
7. The probability of successfully landing a plane using a flight simulator is given as 0.80. Nine
randomly and independently chosen student pilots are asked to try to fly the plane using the
simulator.
(a) What is the probability that all the student pilots successfully land the plane using the
simulator?
(b) What is the probability that none of the student pilots successfully lands the plane using
the simulator?
(c) What is the probability that exactly eight of the student pilots successfully land the plane
using the simulator?
8. Suppose X has a Poisson distribution with a mean of 7. Determine the following.
(a) P(X = 0);
(b) P(X = 5);
(c) P(X < 3); and
AdB
B 4 A G P
.
9. At the McDonald's drive-thru window, it was found that during slower
periods of the day, vehicles visited at the rate of 15 per hour. Determine the probability that
(a) no vehicles visit the drive-thru within a ten-minute interval during one of these slow
periods;
(b) only 3 vehicles visit the drive-thru within a ten-minute interval during one of these
slow periods; and
(c) at least three vehicles visit the drive-thru within a ten-minute interval during one of
these slow periods.
10. The number of cracks in a section of PLUS highway that are significant enough to require
repair is assumed to follow a Poisson distribution with a mean of two cracks per kilometer.
Determine the probability that
(a) there are no cracks at all in 2 km of highway;
(b) there is at least one crack in 500 meters of highway; and
(c) there are exactly 3 cracks in 0.5 km of highway.
Chapter 3
3. Continuous probability distributions
Learning objectives:
At the end of this chapter, students should be able to:
Define continuous probability distributions
Know the special functions for continuous probability distributions
3.1 Introduction
If the outcome of the experiment is a continuous random variable, its
probability distribution is called a continuous probability distribution or probability
density function (pdf).
3.1.1 The properties of the probability density function (pdf)
The pdf f(x) of a continuous random variable X must satisfy two conditions:
(i) $f(x) \ge 0$ for all x (unlike a pmf, a pdf may take values greater than 1)
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Given the pdf, the probability of any event involving X can be calculated. For example, the
probability that X is at most one is
$P(X \le 1) = \int_{-\infty}^{1} f(x)\,dx$
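Example 3.1:
Let X be a continuous random variable with pdf given by
$f(x) = \begin{cases} kx^2, & 0 \le x \le 2 \\ 0, & \text{elsewhere} \end{cases}$
Find the value of k that makes f(x) a valid pdf of X.
Solution:
To be a valid pdf, f(x) must satisfy
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
So,
$\int_{-\infty}^{\infty} f(x)\,dx = \int_0^2 kx^2\,dx = k \left[ \frac{x^3}{3} \right]_0^2 = \frac{8k}{3} = 1 \;\Rightarrow\; k = \frac{3}{8}$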
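The normalising constant of Example 3.1 can also be checked numerically. The sketch below uses a simple midpoint rule written for this illustration (the helper `integrate` is not from the text) and assumes the pdf has the form k x^2 on [0, 2].

```python
def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Area under x^2 on [0, 2]; the exact value is 8/3.
area_without_k = integrate(lambda x: x**2, 0.0, 2.0)

# k must scale the total area to 1.
k = 1 / area_without_k

print(round(area_without_k, 6), round(k, 6))
```

The numerical result agrees with the analytical value k = 3/8 to several decimal places, which is a useful sanity check when the antiderivative is harder to find.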
3.1.2 The cumulative distribution function (cdf)
The cdf of a continuous random variable X is defined by
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \quad -\infty < x < \infty$
Example 3.2:
Let X be a continuous random variable with pdf given by
$f(x) = \begin{cases} 3x^2, & 0 \le x \le 1 \\ 0, & \text{elsewhere} \end{cases}$
Find
(i) P(X < 0.5) and P(0.5 < X < 0.75)
(ii) The cdf of X.
Solution:
(i)
$P(X < 0.5) = \int_{-\infty}^{0.5} f(x)\,dx = \int_0^{0.5} 3x^2\,dx = \left[ x^3 \right]_0^{0.5} = (0.5)^3 = \frac{1}{8}$
$P(0.5 < X < 0.75) = \int_{0.5}^{0.75} f(x)\,dx = \int_{0.5}^{0.75} 3x^2\,dx = \left[ x^3 \right]_{0.5}^{0.75} = (0.75)^3 - (0.5)^3 = \frac{19}{64}$
(ii) The cdf of X:
For $x < 0$, $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt = 0$
For $0 \le x \le 1$, $F(x) = P(X \le x) = \int_{-\infty}^{0} 0\,dt + \int_0^x 3t^2\,dt = x^3$
For $x > 1$, $F(x) = P(X \le x) = \int_{-\infty}^{0} 0\,dt + \int_0^1 3t^2\,dt + \int_1^x 0\,dt = 1$
So the cdf of X is:
$F(x) = \begin{cases} 0, & x < 0 \\ x^3, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}$
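Once the cdf is known, interval probabilities follow by subtraction, P(a < X < b) = F(b) - F(a). A minimal sketch for the cdf of Example 3.2, assuming F(x) = x^3 on [0, 1] (the function name `cdf` is illustrative):

```python
def cdf(x):
    """Cdf of Example 3.2: 0 below 0, x^3 on [0, 1], and 1 above 1."""
    if x < 0:
        return 0.0
    if x <= 1:
        return x**3
    return 1.0


p_below_half = cdf(0.5)                 # P(X < 0.5)
p_between = cdf(0.75) - cdf(0.5)        # P(0.5 < X < 0.75)

print(p_below_half, p_between)
```

For a continuous random variable the endpoints carry no probability, so strict and non-strict inequalities give the same answers.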
3.1.3 The mean and the variance of X
Given the pdf of X, f(x), all the parameters of X such as the mean, the variance
and the standard deviation can be determined by using the definition of expectation.
The mean of X is defined by
$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
The variance of X is defined by
$\sigma^2 = Var(X) = E(X^2) - (E(X))^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2$
The standard deviation, $\sigma$, is the square root of the variance.
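Example 3.3:
Let X be a continuous random variable with pdf given by
$f(x) = \begin{cases} \frac{3}{8} x^2, & 0 \le x \le 2 \\ 0, & \text{elsewhere} \end{cases}$
Find the mean and the variance of X.
Solution:
The mean of X is
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^2 \frac{3}{8} x^3\,dx = \left[ \frac{3}{32} x^4 \right]_0^2 = \frac{3}{2} = 1.5$
The variance of X is
$Var(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 = \int_0^2 \frac{3}{8} x^4\,dx - (1.5)^2 = \left[ \frac{3}{40} x^5 \right]_0^2 - \frac{9}{4} = \frac{12}{5} - \frac{9}{4} = \frac{3}{20} = 0.15$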
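For a polynomial pdf such as the one in Example 3.3, the moments can be computed exactly from the antiderivative. The sketch below assumes f(x) = (3/8) x^2 on [0, 2]; the helper `moment` is a name introduced for this illustration.

```python
from fractions import Fraction


def moment(m, k=Fraction(3, 8), upper=2):
    """E[X^m] for f(x) = k x^2 on [0, upper], using the antiderivative k x^(m+3) / (m+3)."""
    return k * Fraction(upper) ** (m + 3) / (m + 3)


mean = moment(1)                      # integral of x f(x)
variance = moment(2) - mean**2        # E(X^2) - (E(X))^2

print(mean, variance, float(variance))
```

`moment(0)` returns 1, confirming that k = 3/8 makes f(x) a valid pdf.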
3.2 Special functions for continuous probability distribution
There are many special continuous probability distributions, such as the Uniform
distribution, the Exponential distribution, the Gamma distribution and the Normal
distribution.
3.2.1 Uniform distribution
If the random variable X has a uniform distribution on [a, b], then the pdf of X is given
by
$f(x) = \begin{cases} \frac{1}{b - a}, & a \le x \le b \\ 0, & \text{elsewhere} \end{cases}$
By using the definition, it can be shown that, if X has a uniform distribution, then
the mean of X is $E(X) = (a + b)/2$ and the variance of X is $Var(X) = (b - a)^2/12$.
3.2.2 Exponential distribution
If the random variable X has an exponential distribution, then the pdf of X is
given by
$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{elsewhere} \end{cases}$
By using the definition, it can be shown that, if X has an exponential distribution, then
the mean of X is $E(X) = 1/\lambda$ and the variance of X is $Var(X) = 1/\lambda^2$.
Example 3.4:
Let X be the number of individuals failing in a large group, and suppose X has an exponential
distribution. If we assume that the mean of X under a certain situation is 10, what
is the probability that more than 20 will fail at the same time?
Solution:
The random variable X has pdf
$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{elsewhere} \end{cases}$
where $\lambda$ is 1/(mean of X). So, $\lambda = 1/10 = 0.1$.
Hence,
$P(X > 20) = \int_{20}^{\infty} 0.1 e^{-0.1x}\,dx = \left[ -e^{-0.1x} \right]_{20}^{\infty} = e^{-(0.1)(20)} = e^{-2} = 0.135$
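The tail probability of an exponential random variable has the closed form P(X > x) = e^{-λx}, which makes the calculation in Example 3.4 a one-liner. A sketch, assuming a mean of 10 so that λ = 0.1 (`exponential_tail` is an illustrative helper name):

```python
from math import exp


def exponential_tail(x, lam):
    """P(X > x) = exp(-lam * x) for an exponential random variable with rate lam, x >= 0."""
    return exp(-lam * x)


mean = 10
lam = 1 / mean          # the rate is the reciprocal of the mean

p_more_than_20 = exponential_tail(20, lam)

print(round(p_more_than_20, 3))
```

Interval probabilities follow by subtraction, for example P(a < X < b) = `exponential_tail(a, lam) - exponential_tail(b, lam)`.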
3.2.3 Normal distribution
If the random variable X has a Normal distribution, then the pdf of X is given by
$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$
By using the definition, it can be shown that, if X is a random variable with a
normal distribution, then the mean of X is $E(X) = \mu$ and the variance of X is
$Var(X) = \sigma^2$. Such a random variable is written as
$X \sim N(\mu, \sigma^2)$
This distribution is called the non-standard normal distribution. Probabilities for X
can be found by integrating the pdf f(x), but this integration is not easy to calculate.
By using the transformation $Z = (X - \mu)/\sigma$, the random variable X is
changed to the random variable Z with pdf
$f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}, \quad -\infty < z < \infty$
Z is a normally distributed random variable with mean 0 and variance 1, and is
written as
$Z \sim N(0, 1)$
This distribution is called the standard normal distribution.
Probabilities for Z can be calculated from the pdf, and the values of the
integral $\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} f(t)\,dt$ are tabulated in the Standard Normal
Distribution Table.
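In place of the table, the standard normal cdf can be evaluated through the error function, using the identity Φ(z) = (1 + erf(z/√2))/2. The sketch below applies the standardisation described above to illustrative values μ = 15 and σ = 5; the function names `phi` and `normal_cdf` are introduced here only for the example.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cdf, Phi(z) = P(Z <= z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), by standardising to Z = (x - mu) / sigma."""
    return phi((x - mu) / sigma)


# Illustrative values: mean 15, standard deviation 5, threshold 25 (so z = 2).
p_at_least_25 = 1 - normal_cdf(25, mu=15, sigma=5)

print(round(phi(2), 3), round(p_at_least_25, 3))
```

The printed values correspond to the table entries Φ(2) ≈ 0.977 and 1 - Φ(2) ≈ 0.023.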
Example 3.5:
After completing a study, the civil engineering department in Universiti
Teknologi PETRONAS (UTP) concluded that the time UTP employees spend
commuting to work each day is normally distributed with a mean equal to 15
minutes and a standard deviation equal to 5 minutes. One employee has indicated
that he commutes 25 minutes per day. Find the probability that an employee
would commute 25 or more minutes per day.
Solution:
The random variable X is the time employees spend commuting to work and
$X \sim N(\mu, \sigma^2)$
where $\mu = 15$ and $\sigma^2 = (5)^2$.
Hence,
$P(X \ge 25) = P\left( Z \ge \frac{25 - 15}{5} \right) = P(Z \ge 2) = 1 - P(Z < 2) = 1 - \Phi(2) = 1 - 0.977 = 0.023$
Exercise 3
1.Suppose that G is a continuous random variable having the probability density function

AaB ?ind the value of constant ;
AbB ?ind P,-$"5MGM$"5-
AcB $etermine x such that .A G N xB F 4.!
AdB $etermine the mean and the variance of G"
2. *et G be a continuous random variable %ith pdf given by

'
< <

else%here , 4
2 1 ,
B A
x x ;
x >
?ind
AaB the value of constant ;
,b- P,G M 1-
AcB the mean of G
AdB the standard deviation of G.

'

< <

else?<ee
>o ;
>
, 4
1 ' 1 '
B ' A
2
3.*et G be a continuous random variable %ith pdf given by

'

>

4 , 4
4 ,
B A
2
x
x ;xe
x >
x
?ind
AaB the value of constant ;
,b- P,G N 1-
,/- P,$ M G M 2-
AdB the mean of G
AeB the variance of G.
4. *et G be a continuous random variable %ith pdf given by

'

else%here , 4
3 4 ,
B A
2
x ;x
x >
?ind
AaB the value of constant ;
AbB the /d>, J,x-
,/- P,G N1-
AdB the mean of G
AeB the variance of G.
!. ?ind the cumulative probability distribution of G given that the density function is
?ind
AaB the value of constant ;
AbB the /d>, J,x-
,/- P,$"25 M G M $"5-
AdB the mean of G
AeB the variance of G.
6. Suppose a random variable X has a uniform distribution with a = 5 and b = 9. Find
(a) P(5.5 < X < 8)
(b) P(X < 7)
(c) the mean of X

'

< <

else?<ee
x >o x ;
x >
, 4
1 4 B, 1 A
B A
4
(d) the standard deviation of X.
5.*et G be an e'ponential random variable %ith X F 4.41. Calculate the follo%ing
probabilities#
AaBPAG M !4B
AbBPAx 3 /4B
AcBPA!4 T x T /4B
(d) What is the mean and the variance of X?
8. The lifetime of a certain electronic component is known to be exponentially
distributed with a mean lifetime of 100 hours. What is the probability that
(a) the lifetime of the component is more than 100 hours?
(b) the lifetime of the component is between 50 and 100 hours?
(c) a component will fail before 50 hours?
9. The time between telephone calls to ASTRO, a cable television payment processing
center, follows an exponential distribution with a mean of 1.5 minutes. What is the
probability that the time between the next two calls will be
(a) at least 45 seconds?
(b) between 50 and 100 seconds? and
(c) at most 150 seconds?
10. The mean weight of 500 UTP students is 68 kg and the variance is 72.25 kg^2. Find the
probability of students who weigh
(a) between 65 kg and 72 kg
(b) more than 70 kg
11. An average LCD projector bulb manufactured by the ABC Corporation lasts 300 days
with a variance of 2500 days^2. Assuming that the bulb life is normally distributed,
what is the probability that the bulb will last
(a) at most 365 days?
(b) between 250 days and 350 days?
(c) at least 400 days?
12. The line width of a tool used for semiconductor manufacturing is assumed to be
normally distributed with a mean of 0.5 micrometer and a standard deviation of 0.05
micrometer.
(a) What is the probability that a line width is greater than 0.62 micrometer?
(b) What is the probability that a line width is between 0.47 and 0.63 micrometer?
(c) The line width of 90% of samples is below what value?
Chapter 4
4. Data display and summary of data
Learning objectives:
At the end of this chapter, students should be able to:
Explain the difference between population and sample
Find the sample mean, sample variance and sample standard deviation
Plot data using a stem and leaf display
Construct the Box-Plot
4.1 Introduction
The major use of inferential statistics is to use information from a sample to infer
something about a population. A population is a collection of data whose
properties are analyzed. The population is the complete collection to be studied; it
contains all subjects of interest. A sample is a part of the population of interest, a
sub-collection selected from a population. A parameter is a numerical
measurement that describes a characteristic of a population, while a statistic is a
numerical measurement that describes a characteristic of a sample. In general, we
will use a statistic to infer something about a parameter.
4.2 Mean and variance
The mean is the sum of all numbers in the list divided by the total number of values in the
list. If the given list is a statistical population, then the mean is called the population
mean; if the given list is a statistical sample, then the mean is called the sample
mean. The population mean is denoted by $\mu$. The
sample mean makes a good estimator of the population mean, as its expected value
is the same as the population mean.
Often, since the population variance is an unknown parameter, it is estimated by
the mean sum of squares, which changes the distribution of the standardized sample mean from
a normal distribution to a Student's t distribution with n - 1 degrees of freedom.
The population mean and variance and the sample mean and sample variance
can be expressed as follows. The following equations make the
difference clear.
Population mean and variance are defined as:
Mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$        Variance: $\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
where N is the size of the population.
Sample mean and sample variance are defined as:
Mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$        Variance: $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
where n is the sample size.
Example 4.1:
Given the sample data 55, 68, 90, 42, 89, 70, find the sample mean and the
sample variance of this data.
Solution:
The mean is
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{55 + 68 + 90 + 42 + 89 + 70}{6} = \frac{414}{6} = 69$
The variance is
$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n - 1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right] = \frac{1}{5} \left[ 30334 - \frac{(414)^2}{6} \right] = 353.6$
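The sample mean and variance of Example 4.1 can be checked with Python's standard `statistics` module, whose `variance` function uses the n - 1 divisor. The sketch assumes the six observations are 55, 68, 90, 42, 89 and 70 (the reading consistent with the stated sum 414 and sum of squares 30334).

```python
from statistics import mean, stdev, variance

data = [55, 68, 90, 42, 89, 70]  # observations of Example 4.1, as read here
n = len(data)

x_bar = mean(data)       # sample mean
s2 = variance(data)      # sample variance with divisor n - 1
s = stdev(data)          # sample standard deviation

# Shortcut formula from the worked solution: (sum of squares - (sum)^2 / n) / (n - 1)
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(x_bar, s2, round(s, 2), s2_shortcut)
```

`statistics.pvariance` would give the population version with divisor N instead.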
4.3 A Stem and Leaf plot
Data can be shown in a variety of ways including graphs, charts and tables. A
Stem and Leaf plot is a type of graph that is similar to a histogram but shows
more information. The Stem-and-Leaf plot summarizes the shape of a set of data
(the distribution) and provides extra detail regarding individual values.
The data is arranged by place value. The digits in the largest place are referred to
as the stem and the digits in the smallest place are referred to as the leaf (leaves).
The leaves are always displayed to the right of the stem. Stem and Leaf plots are
great organizers for large amounts of information. They provide an 'at a glance' tool
for specific information in large sets of data; otherwise one would have a long list of
marks to sift through and analyze. The total number of data values, the median and the mode
can also be determined from a Stem and Leaf plot. They are usually used when there are
large amounts of numbers or data to analyze. Series of scores on sports teams,
series of temperatures or rainfall over a period of time, and series of classroom test
scores are examples of when Stem and Leaf plots could be used.
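The construction described above (stems from the tens digit, leaves from the ones digit, leaves listed to the right in increasing order) can be sketched in a few lines. The `scores` list below is hypothetical data made up for the illustration, and `stem_and_leaf` is a helper name introduced here.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem); leaves are the sorted ones digits."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical test scores, used only to demonstrate the layout.
scores = [72, 58, 65, 81, 65, 77, 90, 58, 65]

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because every individual value can be read back from the plot, the mode (here 65) and the median are easy to find by counting leaves.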
Example 4.2:
The following data are the temperatures for August in Malaysia.
77 80 82 68 65 59 61
55 50 62 61 70 69 64
65 70 62 65 65 75 76
85 80 82 83 79 79 71
80 77 89
Use the Stem and Leaf plot to determine the mode and the median for the
temperatures.
Solution:
The first step is to place the numbers in order from smallest to largest and arrange
them by tens (stem) and ones (leaf).
Temperatures
Tens   Ones
5      0 5 9
6      1 1 2 2 4 5 5 5 5 8 9
7      0 0 1 5 6 7 7 9 9
8      0 0 0 2 2 3 5 9
The mode is 65 and the median is 70.
4.4 A Box plot
In descriptive statistics, a box plot (also known as a box-and-whisker diagram)
is an excellent visual summary of many important aspects of a data distribution
through its five-number summary: the smallest observation (sample
minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest
observation (sample maximum). A box plot may also indicate which observations,
if any, might be considered outliers. A box plot can be drawn either horizontally or
vertically.
4.4.1 Construct a box plot
Step 1: Place the numbers in order from smallest to largest.
Step 2: Find the median, $Q_2$, the lower quartile, $Q_1$, and the upper quartile, $Q_3$, of the
given set of data.
Step 3: Find the interquartile range (IQR). The IQR is the difference between the
upper quartile and the lower quartile.
Step 4: Draw the box plot, either horizontally or vertically.
Step 5: Calculate 1.5 IQR and mark the range that extends 1.5 IQR below the lower
quartile and 1.5 IQR above the upper quartile. Values that fall outside
this range are called outliers. Values that fall more than 3 IQR below the
lower quartile or above the upper quartile are called extreme outliers.
Example 4.3:
Suppose that thirty UTP students live in Village 2. Their ages are as follows:
16, 24, 21, 2/, 24, 17, 2!, 24, 22, 21,
17, 24, 2!, 26, 24, 24, 2/, 24, 3!, 15,
16, 24, 24, 21, 22, 25, 2!, 26, 25, 24.
Step 1: Place the numbers in order from smallest to largest.
17, 18, 18, 19, 19, 20, 20, 20, 20, 20,
21, 21, 21, 22, 22, 24, 24, 24, 24, 24,
25, 25, 25, 25, 26, 26, 27, 27, 28, 35.
Step 2: Find the median, $Q_2$, the lower quartile, $Q_1$, and the upper quartile, $Q_3$, of the
given set of data.
The median, $Q_2 = (x_{15} + x_{16})/2 = (22 + 24)/2 = 23$
The position of $Q_1$ = (0.25)(n + 1) = 0.25(31) = 7.75
So the lower quartile is $Q_1 = x_7 + 0.75(x_8 - x_7) = 20 + 0.75(20 - 20) = 20$
The position of $Q_3$ = (0.75)(n + 1) = 0.75(31) = 23.25
So the upper quartile is $Q_3 = x_{23} + 0.25(x_{24} - x_{23}) = 25 + 0.25(25 - 25) = 25$
Step 3: The interquartile range (IQR) = $Q_3 - Q_1$ = 25 - 20 = 5
The 1.5 IQR = 7.5 and 3 IQR = 15
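The quartile and outlier calculations of Example 4.3 can be sketched in code. The function below follows the (n + 1) position rule with linear interpolation used in the worked solution; it is an illustrative helper (named `quartile` here), and other software may use slightly different quartile conventions. The ages are the sorted values from Step 1.

```python
def quartile(sorted_data, q):
    """Quantile by the (n + 1) position rule with linear interpolation (1-based positions)."""
    n = len(sorted_data)
    position = q * (n + 1)
    lower = int(position)          # index of the observation just below the position
    fraction = position - lower
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + fraction * (above - below)


ages = [17, 18, 18, 19, 19, 20, 20, 20, 20, 20,
        21, 21, 21, 22, 22, 24, 24, 24, 24, 24,
        25, 25, 25, 25, 26, 26, 27, 27, 28, 35]

q1, q2, q3 = (quartile(ages, q) for q in (0.25, 0.5, 0.75))
iqr = q3 - q1

# Outlier limits: 1.5 IQR beyond the quartiles; extreme outliers lie beyond 3 IQR.
lower_limit, upper_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower_limit or a > upper_limit]
extreme = [a for a in ages if a < q1 - 3 * iqr or a > q3 + 3 * iqr]

print(q1, q2, q3, iqr, (lower_limit, upper_limit), outliers, extreme)
```

With these limits the age 35 is an outlier but not an extreme outlier, and the whiskers of the box plot end at 17 and 28, the most extreme values inside the limits.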
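Step 4: Draw the box plot, either horizontally or vertically.
[Figure: horizontal box plot of the ages. The box runs from Q1 = 20 to Q3 = 25 (IQR = 5) with the median at Q2 = 23; the whiskers end at 17 and 28; the 1.5 IQR limits are at 12.5 and 32.5; the value 35 is marked as an outlier.]
Exercise 4: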
1. Find the mean, median and mode for the following observations:
/.! 5.6 4./ 3.5 /.! 7.2 12.1 /.! 3.5 14.6

2. Find the mean, median and mode for the following observations:
2.3 3./ 2./ 2.6 3.2 3./ 4.3 !.2 /.7 2.6 3./
3. Seven oxide thickness measurements of wafers are studied to assess quality in a
semiconductor manufacturing process. The data (in angstroms) are: 1264, 1280, 1301,
1300, 1292, 1307, and 1275. Calculate the sample average, variance and standard
deviation.
4. The following data are direct solar intensity measurements (watts/m^2) on different
days at a location in southern Spain: 562, 869, 708, 775, 775, 704, 809, 856, 655, 806,
878, 909, 918, 558, 768, 870, 918, 940, 946, 661, 820, 898, 935, 952, 957, 693, 835,
905, 939, 955, 960, 498, 653, 730, 753. Calculate the sample mean, variance and
sample standard deviation.
5. Find the mean, variance and standard deviation of the following sample of marks for
the probability and statistics final examination.
64.7 61.7 64.6 57.4 56.2 5/.!
5!.4 53.6 52.5 52./ 51.4 54.7
/7.3 /6./ /5.! //.6 /!.2 /4.4
!7.! !6.3 !6.! !5./ !/.7 !!.2
46.2 46.4 45.6 4/.! 4!.7 44./
36.3 35.4 3/.6 3/.! 3!./ 34.7
36.4
6. Find the mean, variance and standard deviation of the following sample of marks for
the engineering drawing course.
76.4 76.1 76.4 75.6 7/.4 7!.2 74.3 72./ 71.6 74.!
67./ 66.5 65.3 6/.6 6!.5 64.2. 63.5 62.6 64.! 64.6
57.5 56.2 55.4 55.4 5/.6 5!.7 54.2 53.7 52./ 51.4
/7.6 /6./ /5.! //.6 /!.2 /4.4 /3.5 /2.6 /1.4 /4.5
!7.2 !6.3 !6.! !5./ !/.7 !!.2 !4.5 !3.7 !2.7 !1.2
!7./ 46.4 45.6 4/.! 4!.7 44./ 43.6 42.5 41.6 44./
37.6 35.4 3/.6 3/.! 3!./ 34.7 33.2 33.6 32.5 31./
5.The shear strengths of 144 spot %elds in a titanium alloy follo%. Construct a stem(and(
leaf diagram for the %eld strength data and comment on any important features that
you notice.
!44
6
!43
1
!45
!
!44
2
!35
/
!36
6
!4!
7
!42
2
!41
/
!43
!
!42
4
!42
7
!44
1
!44
/
!46
5
!41
/
!36
2
!3!
5
!36
6
!4!
5
!44
5
!4/
7
!41
/
!35
5
!4!
4
!35
!
!44
7
!4!
7
!44
!
!42
7
!4/
3
!44
6
!46
1
!4!
3
!42
2
!3!
4
!42
1
!44
/
!44
4
!4/
/
!37
7
!37
1
!45
5
!44
5
!32
7
!45
3
!42
3
!44
1
!41
2
!36
4
!44
!
!43
/
!4!
4
!4!
3
!42
6
!41
6
!4/
!
!42
5
!42
1
!37
/
!36
1
!42
!
!36
6
!36
6
!35
6
!46
1
!36
5
!44
4
!46
2
!44
/
!44
1
!41
1
!37
7
!43
1
!44
4
!41
3
!44
/
!34
2
!4!
2
!42
4
!4!
6
!46
!
!43
1
!41
/
!43
1
!37
4
!37
7
!43
!
!36
5
!4/
2
!36
3
!44
1
!44
5
!36
!
!44
4
!42
2
!44
6
!3/
/
!43
4
!41
6
(a) Construct a stem-and-leaf display for these data.
(b) Find the median, the quartiles, and the 5th and 95th percentiles.
8. The data that follow represent the yield on 90 consecutive batches of ceramic
substrate to which a metal coating has been applied by a vapor-deposition process.
74.1 65.3 74.1 72.4 64./ 6!.4
73.2 64.1 72.1 74./ 63./ 6/./
74./ 74.1 7/.4 67.1 6!.4 71.5
71.4 7!.2 66.2 66.6 67.5 65.!
66.2 6/.1 6/.4 6/.4 65./ 64.2
6/.1 74.3 6!.4 6!.1 6!.1 6!.1
7!.1 73.2 64.7 64.4 67./ 74.!
74.4 6/.5 56.3 73.5 74.4 7!./
72.4 63.4 67./ 65.5 74.1 66.3
65.3 7!.3 74.3 74./ 74.3 64.1
6/./ 74.1 73.1 67.4 75.3 63.5
71.2 75.6 74./ 66./ 7/.6 62.7
6/.1 73.1 7/.3 64.1 74.4 65.3
74.4 6/.4 74.5 62./ 7/.1 6/.4
67.1 65./ 71.1 63.1 76.4 64.!
(a) Construct a cumulative frequency plot and histogram for the yield.
(b) Construct a stem-and-leaf display for these data.
(c) Find the median, the quartiles, and the 5th and 95th percentiles for the yield.
9. The average ages of the football players on each team of the premier league are as follows.
27.4 27.6 27.4 31.6 32.5 34.4
26.! 25.7 34.7 27.3 26.6 26./
27.1 31.4 34.5 34.3 27.5 31.4
26.4 26.7 25.5 26.5 34.! 27.6
2/./ 25.7 25.7 27.7 27.3 26.1
(a) Construct a cumulative frequency plot and histogram for the ages.
(b) Construct a stem-and-leaf display for these data.
(c) Find the median, the quartiles, and the 5th and 95th percentiles for the ages.
10. The following "cold start ignition times" of an automobile engine were obtained for a test
vehicle:
1.5! 1.72 2./2 2.3! 3.47 3.1! 2.!3 1.71
(a) Calculate the sample median, the quartiles and the IQR.
(b) Construct a box plot of the data.
11. The follo%ing data are the @oint temperatures of the 9(rings A_?B for each test firing or
actual launch of the space shuttle roc1et motor Afrom Pesidential +o77ission on t<e
Spa/e S<uttle +<allenFe A//ident, :ol. 1, pp. 127P131B# 64, 47, /1, 44, 63, /5, 4!,
//, 54, /7, 64, !6, /6, /4, /5, 52, 53, 54, !5, /3, 54, 56, !2, /5, !3, /5, 5!, /1, 54, 61,
5/, 57, 5!, 5/, !6, 31.
AaB Compute the sample mean and sample standard deviation=
AbB Calculate the median, the 8uartiles and the I\)=
AcB Construct a bo' plot of the data and comment on the possible presence of outliers.
12. Ipoh .antai 0ospital compiles data on the length of stay by patients in short(term
hospitals. , random sample of 26 patients yielded the follo%ing data on length of
stay, in days.
3 / 1! 5 3 !! 1
4 4 12 16 7 / 12
! 14 13 5 1 23 7
/ 6 11 7 4 21 14
AaB Compute the sample mean and sample standard deviation=
AbB Calculate the median, the 8uartiles and the I\)=
AcB Construct a bo' plot of the data and comment on the possible presence of outliers.

oooOOOooo
Chapter 5
5. Random sample, central limit theorem and Normal Approximation.
Statistical process control
Learning objectives:
At the end of this chapter, student should be able to:
Define the random sample and sample mean
Use the Central Limit Theorem to define the sample mean distribution
Define the Normal Approximation to Binomial and Poisson distribution
Construct the $\bar{X}$ chart and $R$ chart in statistical process control
5.1 Random sample and sample mean
In statistical terms, a random sample is a set of independent random variables
$X_1, X_2, \ldots, X_n$ drawn from a population in such a way that every random
variable has the same distribution and every member of the population has the
same chance of being selected.
The sample mean is the average of the sample. If we have n observations in one
sample, the sample mean is the total of the observations divided by the sample
size, n:
$\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$
5.2 Central Limit Theorem and sample mean distribution
The Central Limit Theorem says that if $X_1, X_2, \ldots, X_n$ is a random sample
from a population with mean $\mu$ and variance $\sigma^2$, and the sample size n is
large, then the sample mean $\bar{X}$ is approximately normally distributed with
mean $\mu$ and variance $\sigma^2/n$. If the population itself is normal, the result
holds exactly for any n. That is,
$\bar{X} \sim N(\mu, \sigma^2/n)$
Example 5.1:
At the chemical engineering department, Universiti Teknologi PETRONAS, the mean
age of the students is 20.6 years, and the variance is 20 years$^2$. A random
sample of 80 students is drawn from 250 students. What is the probability that the
average age of these students is greater than 22 years old?
Solution:
The mean of $X$ is $E(X) = 20.6$ and the variance of $X$ is $V(X) = 20$.
For $n = 80$, the mean of $\bar{X}$ is $E(\bar{X}) = 20.6$ and $V(\bar{X}) = \dfrac{20}{80} = 0.25$.
Hence, $\bar{X} \sim N(20.6, 0.25)$.
So, $P(\bar{X} > 22) = P\left(Z > \dfrac{22 - 20.6}{\sqrt{0.25}}\right) = P(Z > 2.8) = 1 - \Phi(2.8) = 1 - 0.9974 = 0.0026$
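The calculation in Example 5.1 can be checked numerically. The sketch below assumes only the Python standard library; the function name `prob_sample_mean_exceeds` is a hypothetical helper introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_exceeds(mu, sigma2, n, threshold):
    """P(X-bar > threshold) when X-bar ~ N(mu, sigma2 / n)."""
    # The standard deviation of the sample mean is sqrt(sigma2 / n).
    sampling_dist = NormalDist(mu, sqrt(sigma2 / n))
    return 1.0 - sampling_dist.cdf(threshold)

# Values of Example 5.1: mean 20.6, variance 20, sample size 80.
p = prob_sample_mean_exceeds(mu=20.6, sigma2=20.0, n=80, threshold=22.0)
print(round(p, 4))  # about 0.0026
```

Note that the standardisation divides by the standard deviation $\sqrt{0.25} = 0.5$, not by the variance 0.25.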
5.3 Normal Approximation
The binomial and Poisson distributions are discrete, whereas the normal
distribution is continuous. We need to take this into account when we use the
normal distribution to approximate a binomial or Poisson probability, by applying
a continuity correction.
The continuity correction, $\pm 0.5$, applied to a probability for $X$ depends on
the inequality sign ($\le$, $<$, $\ge$, $>$). For an integer $a$ and the
approximating normal variable $Y$, for example, $P(X \le a) \approx P(Y \le a + 0.5)$
and $P(X \ge a) \approx P(Y \ge a - 0.5)$.
5.3.1 Normal approximation to Binomial
The Central Limit Theorem says that as n increases, the binomial distribution
with n trials and probability p of success gets closer and closer to a normal
distribution. That is, the binomial probability of any event gets closer and closer
to the normal probability of the same event.
The normal distribution is a good approximation to the binomial when n is
sufficiently large and p is not too close to 0 or 1. How large n needs to be depends
on the value of p. It is better to be conservative and limit the use of the normal
distribution as an approximation to the binomial to cases where $np > 5$ and $n(1 - p) > 5$.
That is, if we have a random variable $X \sim Bin(n, p)$ where n is large and
$np > 5$, then probabilities for $X$ can be calculated approximately using the normal
distribution. The random variable $X$ is then approximately normally distributed with
mean $\mu = np$ and variance $\sigma^2 = np(1 - p)$, i.e. $X \sim N(\mu, \sigma^2)$.
Example 5.2:
Suppose a fair coin is tossed 50 times. What is the probability of getting
between 9 and 11 heads (inclusive)?
Solution:
Let $X$ be the random variable representing the number of heads thrown.
$X \sim Bin(50, 0.5)$
Since n is large and $np = 25 > 5$, we can use the normal approximation to find the
probability. $X$ is then approximately normally distributed with mean $np = 25$ and
variance $np(1 - p) = 12.5$, i.e. $X \sim N(25, 12.5)$. Hence, with the continuity correction,
$P(9 \le X \le 11) \approx P\left(\dfrac{(9 - 0.5) - 25}{\sqrt{12.5}} \le Z \le \dfrac{(11 + 0.5) - 25}{\sqrt{12.5}}\right)$
$= P(-4.67 \le Z \le -3.82)$
$= \Phi(-3.82) - \Phi(-4.67) \approx 0.00007$
Note that the standardisation divides by the standard deviation $\sqrt{12.5} \approx 3.54$, not by the variance 12.5.
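To see how close the approximation in Example 5.2 is, the exact binomial probability can be computed alongside it. This is a minimal sketch using the standard library; the two function names are hypothetical helpers.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_range_exact(n, p, lo, hi):
    """Exact P(lo <= X <= hi) for X ~ Bin(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

def binomial_range_normal(n, p, lo, hi):
    """Normal approximation to P(lo <= X <= hi) with continuity correction."""
    approx_dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return approx_dist.cdf(hi + 0.5) - approx_dist.cdf(lo - 0.5)

# Values of Example 5.2: 50 tosses of a fair coin, between 9 and 11 heads.
exact = binomial_range_exact(50, 0.5, 9, 11)
approx = binomial_range_normal(50, 0.5, 9, 11)
print(f"exact = {exact:.6f}, normal approximation = {approx:.6f}")
```

Both values are tiny because 9 to 11 heads lies far in the lower tail of a distribution centred at 25; in the tails the relative error of the normal approximation tends to be larger than near the mean.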
5.3.2 Normal approximation to Poisson
The normal distribution can also be used to approximate the Poisson distribution
for large values of $\lambda$ (the mean of the Poisson distribution).
That is, if we have a random variable $X \sim Poisson(\lambda)$ and $\lambda$ is large, then
probabilities for $X$ can be calculated approximately using the normal distribution.
The random variable $X$ is then approximately normally distributed with mean
$\mu = \lambda$ and variance $\sigma^2 = \lambda$, i.e. $X \sim N(\lambda, \lambda)$.
Example 5.3:
A car hire firm has 20 cars to hire. The number of demands for a car per day
follows a Poisson distribution with mean 3. Calculate the probability that at most
ten cars will be hired in one day.
Solution:
Let the random variable $X$ denote the number of demands for a car.
The given mean value is 3. By the Poisson distribution,
$P(X = x) = \dfrac{e^{-3} 3^x}{x!}, \quad x = 0, 1, 2, 3, 4, \ldots, 20$
Treating $\lambda$ as large enough (for a mean as small as 3 the approximation is only
rough), the probability can be calculated using a normal approximation with mean 3
and variance also 3, i.e. $X \sim N(3, 3)$.
Hence, with the continuity correction,
$P(X \le 10) \approx P\left(Z \le \dfrac{(10 + 0.5) - 3}{\sqrt{3}}\right) = P(Z \le 4.33) = \Phi(4.33) \approx 1.0000$
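The same comparison can be made for Example 5.3. The sketch below, again with hypothetical helper names and only the standard library, evaluates the exact Poisson probability and its normal approximation.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def poisson_cdf(k, lam):
    """Exact P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))

def poisson_cdf_normal(k, lam):
    """Normal approximation to P(X <= k) with continuity correction."""
    return NormalDist(lam, sqrt(lam)).cdf(k + 0.5)

# Values of Example 5.3: mean demand 3 per day, at most 10 cars.
exact = poisson_cdf(10, 3.0)
approx = poisson_cdf_normal(10, 3.0)
print(f"exact = {exact:.5f}, normal approximation = {approx:.5f}")
```

Both results are very close to 1, so the conclusion of the example does not depend on which method is used.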
5.4 Statistical process control
Statistical process control (SPC) is a powerful set of tools that implements the concept
of prevention as a shift from the traditional quality by inspection/correction. SPC
is a technique that employs statistical tools for controlling and improving
processes. It is an important ingredient in continuous process improvement (CPI)
strategies. It uses simple statistical means to control, monitor, and improve
processes.
Among the most commonly used tools of SPC are:
histograms
cause-and-effect diagrams
Pareto diagrams
control charts
scatter or correlation diagrams
run charts
process flow diagrams
The most important SPC tool is the control chart. A control chart is a graphical
representation of process performance over time, concerned with how (or
whether) processes vary at different intervals and with identifying nonrandom or
assignable causes of variation. Control charts also provide a powerful
analytical tool for monitoring process variability and changes in the process
mean. Two charts are commonly used in SPC: the $\bar{X}$-chart and the $R$-chart.
The $\bar{X}$ and Range ($R$) charts are a set of control charts for variables data (data
that is both quantitative and continuous in measurement, such as a measured
dimension or time). The $\bar{X}$-chart monitors the process location over time, based
on the average of a series of observations, called a subgroup, while the $R$-chart
monitors the variation between observations in the subgroup over time.
The $\bar{X}$-chart and $R$-chart are used when you can rationally collect measurements
in groups (subgroups) of between two and ten observations. The charts' x-axes are
time based, so that the charts show a history of the process. The data is time-
ordered; that is, entered in the sequence in which it was generated.
5.4.1 How to construct $\bar{X}$-chart and $R$-chart
In order to construct the charts, the sample means, the average of the subgroups and
the control limits must be calculated.
The sample mean is calculated from a set of n data values as
$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$.
The average of the subgroup data is calculated as
$\bar{\bar{x}} = \dfrac{1}{mn}\sum_{j=1}^{m}\sum_{i=1}^{n} x_{ij}$
where n is the subgroup size and m is the total number of subgroups included in
the analysis.
This $\bar{\bar{x}}$ is the centre line of the $\bar{X}$-chart and is called the estimated process mean.
The average range is calculated as $\bar{r} = \dfrac{1}{m}\sum_{i=1}^{m} r_i$, where $r_i$ is the range between the
largest and the smallest value in subgroup i. It is the centre line of the $R$-chart.
The upper and lower limits for the $\bar{X}$-chart are calculated by using the formula
$\bar{\bar{x}} \pm A_2\bar{r}$, where $A_2$ can be found from the process control chart table.
The upper limit for the $R$-chart is calculated by using the formula $D_4\bar{r}$ and the
lower limit by using the formula $D_3\bar{r}$, where $D_4$ and $D_3$ can be found from the process
control chart table.
After the centre line and limits are calculated, the charts can be
constructed by plotting the sample number versus $\bar{x}$ for the $\bar{X}$-chart
and the sample number versus $r$ for the $R$-chart.
Example 5.4:
A component part for a jet aircraft engine is manufactured by an investment
casting process. The vane opening on this casting is an important functional
parameter of the part.
We will illustrate the use of $\bar{X}$ and $R$ control charts to assess the statistical
stability of this process. The table presents 20 samples of five parts each. The
values given in the table have been coded by using the last three digits of the
dimension; that is, 31.6 should be 0.50316 inch.
Sample Number   x1   x2   x3   x4   x5   $\bar{x}$    r
 1              33   29   31   32   33   31.6    4
 2              33   31   35   37   31   33.4    6
 3              35   37   33   34   36   35.0    4
 4              30   31   33   34   33   32.2    4
 5              33   34   35   33   34   33.8    2
 6              38   37   39   40   38   38.4    3
 7              30   31   32   34   31   31.6    4
 8              29   39   38   39   39   36.8   10
 9              28   33   35   36   43   35.0   15
10              38   33   32   35   32   34.0    6
11              28   30   28   32   31   29.8    4
12              31   35   35   35   34   34.0    4
13              27   32   34   35   37   33.0   10
14              33   33   35   37   36   34.8    4
15              35   37   32   35   39   35.6    7
16              33   33   27   31   30   30.8    6
17              35   34   34   30   32   33.0    5
18              32   33   30   30   33   31.6    3
19              25   27   34   27   28   28.2    9
20              35   35   36   33   30   33.8    6
(a) Construct $\bar{X}$ and $R$ control charts.
(b) After the process is in control, estimate the process mean and standard
deviation.
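As a starting point for part (a), the trial control limits can be computed as sketched below. The data are the 20 coded subgroups of the vane-opening table; the constants $A_2 = 0.577$, $D_3 = 0$ and $D_4 = 2.115$ for subgroups of size 5 are assumed values of the kind found in a standard control chart table and should be checked against the table used in the course.

```python
# Coded vane-opening data: 20 subgroups of 5 measurements each.
subgroups = [
    [33, 29, 31, 32, 33], [33, 31, 35, 37, 31], [35, 37, 33, 34, 36],
    [30, 31, 33, 34, 33], [33, 34, 35, 33, 34], [38, 37, 39, 40, 38],
    [30, 31, 32, 34, 31], [29, 39, 38, 39, 39], [28, 33, 35, 36, 43],
    [38, 33, 32, 35, 32], [28, 30, 28, 32, 31], [31, 35, 35, 35, 34],
    [27, 32, 34, 35, 37], [33, 33, 35, 37, 36], [35, 37, 32, 35, 39],
    [33, 33, 27, 31, 30], [35, 34, 34, 30, 32], [32, 33, 30, 30, 33],
    [25, 27, 34, 27, 28], [35, 35, 36, 33, 30],
]

# Assumed control chart constants for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.115

means = [sum(g) / len(g) for g in subgroups]      # x-bar of each subgroup
ranges = [max(g) - min(g) for g in subgroups]     # r of each subgroup

x_double_bar = sum(means) / len(means)            # centre line of the X-bar chart
r_bar = sum(ranges) / len(ranges)                 # centre line of the R chart

xbar_ucl = x_double_bar + A2 * r_bar
xbar_lcl = x_double_bar - A2 * r_bar
r_ucl = D4 * r_bar
r_lcl = D3 * r_bar

# Sample numbers (1-based) that plot outside the trial limits.
out_xbar = [i + 1 for i, m in enumerate(means) if not xbar_lcl <= m <= xbar_ucl]
out_r = [i + 1 for i, r in enumerate(ranges) if not r_lcl <= r <= r_ucl]

print(f"X-bar chart: CL={x_double_bar:.2f}, LCL={xbar_lcl:.2f}, UCL={xbar_ucl:.2f}")
print(f"R chart:     CL={r_bar:.2f}, LCL={r_lcl:.2f}, UCL={r_ucl:.2f}")
print("Outside X-bar limits:", out_xbar, " Outside R limits:", out_r)
```

Points flagged here are only candidates for investigation; whether they are removed before the limits are revised depends on finding an assignable cause.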
"#ercise 2
1. Suppose 0
)
, 0
-
, ., 0
-%
is a sample from normal distribution = ',
-
( %ith + 1,

-
+ 4. ?ind
AaB E'pectation and :ariance of
AbB $istribution of
2. &iven that 0 is normally distributed %ith mean !4 and standard deviation 4, compute
the follo%ing for nF2!.
AaB ean and variance of G
AbB B 47 A G P
AcB B !2 A > G P
/4
6
6
AdB B ! . !1 47 A G P
3. &iven that 0 is normally distributed %ith mean 24 and standard deviation 2, compute
the follo%ing for nF44.
AaB ean and variance of G
AbB B 17 A G P
AcB B 22 A > G P
AdB B ! . 21 17 A G P
4. *et G denote the number of fla%s in a 1 in length of copper %ire. The pmf of G is
given in the follo%ing table
-F' 4 1 2 3
.A-F'B 4.4
6
4.37 4.12 4.41
144 %ires are sampled from this population. >hat is the probability that the average
number of fla%s per %ire in this sample is less than 4.!D
!. ,t a large university, the mean age of the students is 22.3 years, and the standard
deviation is 4 years. , random sample of /4 students is dra%n. >hat is the
probability that the average age of these students is greater than 23 yearsD
/. ,ssuming an e8ual chance of a ne% baby being a boy or a girl, %hat is the probability
that /4 or more out of the ne't 144 births at .antai 0ospital %ill be girlsD
5. If 14C of MT. students are international students, %hat is the probability that fe%er
than 144 in a random sample of 616 students are coming from overseasD
6. Suppose that a sample of n F 1,/44 tires of the same type are obtained at random from
an ongoing production process in %hich 6C of all such tires produced are defective.
>hat is the probability that in such a sample 1!4 or fe%er tires %ill be defectiveD
7. ?or overseas flights, an airline has three different choices on its dessert menuaice
cream, apple pie, and chocolate ca1e. "ased on past e'perience the airline feels that
each dessert is e8ually li1ely to be chosen.
AaB If a random sample of four passengers is selected, %hat is the probability that at
least t%o %ill choose ice cream for dessertD
AbB If a random sample of 21 passengers is selected, %hat is the appoxi7ate
probability that at least t%o %ill choose ice cream for dessertD
14. Suppose that at a certain automobile plant, the number of %or1 stoppage is a .oisson
distribution %ith an average per day due to e8uipment problems during the production
process is 12.4.>hat is the appro'imate probability of having 1! o >e?e %or1
stoppages due to e8uipment problems on any given dayD
11. The number of cars arriving per minute at a toll booth on a particular bridge is
Poisson distributed with a mean of 2.5. What is the probability that in any given
minute
(a) no cars arrive
(b) not more than two cars arrive
If the expected number of cars arriving at the toll booth per ten-minute interval is
25.0, what is the approximate probability that in any given ten-minute period
(c) not more than 20 cars arrive
(d) between 20 and 30 cars arrive
12. A component part for a jet aircraft engine is manufactured by an investment casting
process. The vane opening on this casting is an important functional parameter of the
part. We will illustrate the use of $\bar{X}$ and $R$ control charts to assess the statistical
stability of this process. The table presents 20 samples of five parts each. The values
given in the table have been coded by using the last three digits of the dimension; that
is, 31.6 should be 0.50316 inch.
Sample Number x1 x2 x3 x4 x5 $\bar{x}$ r
1 33 29 31 32 33 31.6 4
2 33 31 35 37 31 33.4 6
3 35 37 33 34 36 35.0 4
4 30 31 33 34 33 32.2 4
5 33 34 35 33 34 33.8 2
6 38 37 39 40 38 38.4 3
7 30 31 32 34 31 31.6 4
8 29 39 38 39 39 36.8 10
9 28 33 35 36 43 35.0 15
10 38 33 32 35 32 34.0 6
11 28 30 28 32 31 29.8 4
12 31 35 35 35 34 34.0 4
13 27 32 34 35 37 33.0 10
14 33 33 35 37 36 34.8 4
15 35 37 32 35 39 35.6 7
16 33 33 27 31 30 30.8 6
17 35 34 34 30 32 33.0 5
18 32 33 30 30 33 31.6 3
19 25 27 34 27 28 28.2 9
20 35 35 36 33 30 33.8 6
(a) Construct $\bar{X}$ and $R$ control charts.
(b) After the process is in control, estimate the process mean and standard deviation.
13. The overall length of a skew used in a knee replacement device is monitored using
$\bar{X}$ and $R$ charts. The following table gives the length for 20 samples of size 4.
(Measurements are coded from 2.00 mm; that is, 15 is 2.15 mm.)
9bservation 9bservation
Sample 1 2 3 4 Sample 1 2 3 4
1 1
/
1
6
1
!
13 11 1
4
1
4
1
!
1
3
2 1
/
1
!
1
5
1/ 12 1
!
1
3
1
!
1
/
3 1
!
1
/
2
4
1/ 13 1
3
1
5
1
/
1
!
4 1
4
1
/
1
4
12 14 1
1
1
4
1
4
2
1
! 1
4
1
!
1
3
1/ 1! 1
4
1
!
1
4
1
3
/ 1
/
1
4
1
/
1! 1/ 1
6
1
!
1
/
1
4
5 1
/
1
/
1
4
1! 15 1
4
1
/
1
7
1
/
6 1
5
1
3
1
5
1/ 16 1
/
1
4
1
3
1
7
7 1
!
1
1
1
3
1/ 17 1
5
1
7
1
5
1
3
14 1
!
1
6
1
4
13 24 1
2
1
!
1
2
1
5
(a) Using all the data, find trial control limits for $\bar{X}$ and $R$ charts, construct the chart,
and plot the data.
(b) Use the trial control limits from part (a) to identify out-of-control points. If
necessary, revise your control limits, assuming that any samples that plot outside
the control limits can be eliminated.
(c) Assuming that the process is in control, estimate the process mean and process
standard deviation.
14. The thickness of a printed circuit board (PCB) is an important quality parameter. Data
on board thickness (in cm) are given below for 25 samples of three boards each.
Saple 1 % - Saple 1 % -
1 4.4/27 4.4/3/ 4.4/44 14 4.4/4! 4.4/44 4.4/31
2 4.4/34 4.4/31 4.4/22 1! 4.4/17 4.4/44 4.4/32
3 4.4/26 4.4/31 4.4/33 1/ 4.4/31 4.4/25 4.4/34
4 4.4/34 4.4/34 4.4/31 15 4.4/1/ 4.4/23 4.4/31
! 4.4/17 4.4/26 4.4/34 16 4.4/34 4.4/34 4.4/2/
/ 4.4/13 4.4/27 4.4/34 17 4.4/3/ 4.4/31 4.4/27
5 4.4/34 4.4/37 4.4/2! 24 4.4/44 4.4/3! 4.4/27
6 4.4/26 4.4/25 4.4/22 21 4.4/26 4.4/2! 4.4/1/
7 4.4/23 4.4/2/ 4.4/33 22 4.4/1! 4.4/2! 4.4/17
14 4.4/31 4.4/31 4.4/33 23 4.4/34 4.4/32 4.4/34
11 4.4/3! 4.4/34 4.4/36 24 4.4/3! 4.4/27 4.4/3!
12 4.4/23 4.4/34 4.4/34 2! 4.4/23 4.4/27 4.4/34
13 4.4/3! 4.4/31 4.4/34
(a) Using all the data, find trial control limits for $\bar{X}$ and $R$ charts, construct the chart,
and plot the data.
(b) Use the trial control limits from part (a) to identify out-of-control points. If
necessary, revise your control limits, assuming that any samples that plot outside
the control limits can be eliminated.
(c) Assuming that the process is in control, estimate the process mean and process
standard deviation.
ooo999ooo
Chapter 6
6. Hypothesis Testing - One population
Learning objectives:
At the end of this chapter, student should be able to:
Explain the concept of hypothesis testing
Understand the procedure or steps to perform the test
Do a test about the mean when the population variance is known and when it is
unknown
Do a test about the proportion
Perform a test about the variance
6.1 Introduction
There are two types of statistical inference: estimation of population parameters
and hypothesis testing. Hypothesis testing is one of the most important tools in the
application of statistics to real-life problems. Most often, decisions concerning
populations must be made on the basis of sample information. Statistical
tests are used in arriving at these decisions.
Statistical hypothesis tests are based on the concept of proof by contradiction. For
example, say we test the mean ($\mu$) of a population to see if an experiment has
caused an increase or decrease in $\mu$. We do this by proof by contradiction,
formulating a null hypothesis against an alternative hypothesis.
6.1.1 Null Hypothesis:
It is a hypothesis which states that there is no difference between the procedures
and is denoted by $H_0$. For the above example the corresponding $H_0$ would be that
there has been no increase or decrease in the mean. It is always the null hypothesis
that is tested, i.e., we either reject or fail to reject the null hypothesis, because the
distribution of the test statistic is specified only under the null hypothesis.
6.1.2 Alternative Hypothesis:
It is a hypothesis which states that there is a difference between the procedures
and is denoted by $H_1$.
In hypothesis testing, either a correct decision or a false decision is made
about the null hypothesis, as summarized in this table.
Suppose          Accept $H_0$ as true                          Reject $H_0$ as false
$H_0$ is true    Correct decision. Probability: $1 - \alpha$   Type I error. Probability: $\alpha$
$H_0$ is false   Type II error. Probability: $\beta$           Correct decision. Probability: $1 - \beta$
We make a correct decision if we accept $H_0$ when $H_0$ is true or if we reject
$H_0$ when $H_0$ is actually false.
The risk of rejecting the null hypothesis when we should not reject it is called a
type I error, with probability $\alpha$. It means that we make a false decision because we
reject $H_0$ when $H_0$ is actually true.
When we accept $H_0$ but $H_0$ is actually not true, we make a wrong
decision, and this decision is called a type II error. The probability of a type II error
is $\beta$. We cannot determine $\beta$ (beta) with the statistical tools you learn in this
course.
The probability of a type I error, $\alpha$, is called the level of significance, $(1 - \alpha)100\%$
is called the confidence level of the test and $(1 - \beta)$ is called the "power" of the
test.
6.1.3 Types of test
In hypothesis testing there are three types of test on any parameter of interest
(written here as $\theta$): the two-tailed (sided) test, the upper-tailed test and the
lower-tailed test, as in the table below.
     Null Hypothesis, $H_0$    Alternative Hypothesis, $H_1$    Type
1    $\theta = \theta_0$       $\theta \ne \theta_0$            Two tailed test
2    $\theta = \theta_0$       $\theta > \theta_0$              Upper tailed test
3    $\theta = \theta_0$       $\theta < \theta_0$              Lower tailed test
6.1.4 Test statistics
It is the random variable whose value is tested to arrive at a decision. In
hypothesis testing the right choice of test statistic is essential. Among the test
statistics used in hypothesis testing are the Z-statistic, T-statistic, $\chi^2$-statistic and
F-statistic. The choice of test statistic depends on the parameter of interest
that we want to test. The Central Limit Theorem states that for large sample sizes
($n > 30$) drawn randomly from a population, the distribution of the means of those
samples will approximate normality, even when the data in the parent population
are not distributed normally. Hence, when the population variance is known, a Z-
statistic is usually used for large sample sizes ($n > 30$). However, large
samples are often not easy to obtain and the population variance is often unknown; then
the t-distribution can be used, with the population standard deviation estimated by the
sample standard deviation, s. To test the population variance or standard
deviation, the $\chi^2$-statistic is used. When performing multiple comparisons by
one-way ANOVA, the F-statistic is normally used.
6.1.5 Rejection region
It is the part of the sample space (critical region) where the null hypothesis $H_0$ is
rejected. The size of this region is determined by the probability ($\alpha$) of the sample
point falling in the critical region when $H_0$ is true. $\alpha$ is also known as the level of
significance, the probability of the value of the random variable falling in the
critical region. Also it should be noted that the term "statistical significance"
refers only to the rejection of a null hypothesis at some level $\alpha$. It implies that the
observed difference between the sample statistic and the mean of the sampling
distribution is unlikely to have occurred by chance alone.
6.1.6 Decision making
If the test statistic falls in the rejection/critical region, then we reject $H_0$;
this means that there is enough evidence to support the alternative
hypothesis. Otherwise we fail to reject $H_0$, which means that there is not enough
evidence to support the claim that $H_1$ is true.
6.1.7 Steps to do the test
In hypothesis testing, there are seven steps to perform any statistical test:
(i) Identify the parameter of interest.
(ii) State the hypotheses: Null Hypothesis and Alternative Hypothesis
(iii) Determine the appropriate Test Statistic
(iv) Determine the critical value
(v) Determine the Rejection/Critical Region or P-value or $100(1 - \alpha)\%$ confidence
interval
(vi) Calculate the Test Statistic
(vii) Make a decision or conclusion based on step (v).
6.2 Testing about the mean for large sample size and the variance is known
If the parameter of interest is the population mean and the
variance is known or the sample size is very large, then the test can be performed
as below:
Step 1: To test about the mean, $\mu$; the population variance $\sigma^2$ is known
Step 2: $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ or $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$
Step 3: Test statistic: $Z_0 = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$
Step 4: The critical value at significance level $\alpha$ is $z_{\alpha}$ for a one-tailed (sided) test and
$z_{\alpha/2}$ for a two-tailed (sided) test.
Step 5: i. The critical region:
Alternative Hypothesis, $H_1$            Rejection Criteria (Reject $H_0$) if
Two tailed test, $\mu \ne \mu_0$         $Z_0 > z_{\alpha/2}$ or $Z_0 < -z_{\alpha/2}$
Upper tailed test, $\mu > \mu_0$         $Z_0 > z_{\alpha}$
Lower tailed test, $\mu < \mu_0$         $Z_0 < -z_{\alpha}$
ii. P-value approach:
Alternative Hypothesis, $H_1$            Reject $H_0$ if P-value $< \alpha$
Two tailed test, $\mu \ne \mu_0$         P-value $= 2[1 - \Phi(|z_0|)]$
Upper tailed test, $\mu > \mu_0$         P-value $= 1 - \Phi(z_0)$
Lower tailed test, $\mu < \mu_0$         P-value $= \Phi(z_0)$
iii. $100(1 - \alpha)\%$ Confidence Intervals:
Alternative Hypothesis, $H_1$            Reject $H_0$ if $\mu_0$ falls outside of the interval
Two tailed test, $\mu \ne \mu_0$         $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
Upper tailed test, $\mu > \mu_0$         $\bar{x} - z_{\alpha}\dfrac{\sigma}{\sqrt{n}} \le \mu$
Lower tailed test, $\mu < \mu_0$         $\mu \le \bar{x} + z_{\alpha}\dfrac{\sigma}{\sqrt{n}}$
Step 6: Calculate the test statistic, $Z_0$, in step 3.
Step 7: Decision: make a conclusion based on the criteria in step 5.
Example 6.1:
Test the hypothesis that the mean age of UTP students is less than 21, given a
random sample of 20 individuals who have a mean age of 20, and assume that age
is normally distributed with variance 20.
i. Test the hypothesis that the mean age is less than 21. Use $\alpha = 0.05$.
ii. What is the P-value for this test?
iii. Construct a 95% two-sided CI on the mean age.
iv. Use the CI found in part (iii) to test the hypothesis.
Solution:
i. (1) The parameter of interest is the true mean age of UTP students, $\mu$.
(2) The hypotheses:
$H_0: \mu = 21$ vs $H_1: \mu < 21$
(3) The test statistic is:
$Z_0 = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
(4) Critical value: $\alpha = 0.05$, so $z_{\alpha} = z_{0.05} = 1.65$
(5) The critical region: reject $H_0$ if $Z_0 < -1.65$
(6) Computation:
$\bar{x} = 20$, $\mu_0 = 21$, $\sigma^2 = 20$, $n = 20$
$Z_0 = \dfrac{20 - 21}{\sqrt{20/20}} = -1$
(7) Result and conclusion:
Since $Z_0 = -1 > -1.65$, we fail to reject $H_0$ and conclude that there is not enough
evidence to say the true mean age of UTP students is less than 21 years
old at $\alpha = 0.05$.
ii. The P-value for this test is $\Phi(-1) = 0.159$. Since the P-value $> 0.05$, we fail
to reject $H_0$.
iii. A 95% two-sided CI on the mean age is
$\bar{x} - z_{0.025}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{0.025}\dfrac{\sigma}{\sqrt{n}}$
$20 - 1.96\sqrt{\dfrac{20}{20}} \le \mu \le 20 + 1.96\sqrt{\dfrac{20}{20}}$
$18.04 \le \mu \le 21.96$
iv. Since 21 is in the interval, we fail to reject $H_0$ and conclude that there is not
enough evidence to say the true mean age of UTP students is less than 21
years old at $\alpha = 0.05$.
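The numbers in Example 6.1 can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function names `z_test_lower_tailed` and `two_sided_ci` are hypothetical helpers introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_lower_tailed(xbar, mu0, sigma2, n, alpha=0.05):
    """Lower-tailed z-test of H0: mu = mu0 against H1: mu < mu0."""
    std_normal = NormalDist()                     # N(0, 1)
    z0 = (xbar - mu0) / sqrt(sigma2 / n)          # test statistic
    p_value = std_normal.cdf(z0)                  # P(Z <= z0)
    z_crit = std_normal.inv_cdf(1 - alpha)        # z_alpha
    return z0, p_value, z0 < -z_crit

def two_sided_ci(xbar, sigma2, n, level=0.95):
    """Two-sided confidence interval for the mean, variance known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    half_width = z * sqrt(sigma2 / n)
    return xbar - half_width, xbar + half_width

# Values of Example 6.1: x-bar = 20, mu0 = 21, variance 20, n = 20.
z0, p_value, reject = z_test_lower_tailed(20.0, 21.0, 20.0, 20)
lo, hi = two_sided_ci(20.0, 20.0, 20)
print(f"z0 = {z0:.2f}, P-value = {p_value:.3f}, reject H0: {reject}")
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```

The code uses the exact critical value $z_{0.05} \approx 1.645$ rather than the rounded table value 1.65; the decision is the same.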
6.3 Testing about the mean when the population variance is unknown
If the parameter of interest is the population mean and the
variance is unknown or the sample size is small, then the test is performed in the same
way as when the variance is known, except that in step 3 the $Z$ statistic is
replaced by the $T$ statistic and $\sigma$ is replaced by $s$, the
sample standard deviation.
Step 1: To test about the mean, $\mu$; the population variance is unknown
Step 2: $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ or $H_1: \mu > \mu_0$ or $H_1: \mu < \mu_0$
Step 3: Test statistic: $T_0 = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$ if $\mu = \mu_0$
Step 4: The critical value at significance level $\alpha$ is $t_{\alpha, n-1}$ for a one-tailed (sided) test
and $t_{\alpha/2, n-1}$ for a two-tailed (sided) test.
Step 5: i. The critical region:
Alternative Hypothesis, $H_1$            Rejection Criteria (Reject $H_0$) if
Two tailed test, $\mu \ne \mu_0$         $T_0 > t_{\alpha/2, n-1}$ or $T_0 < -t_{\alpha/2, n-1}$
Upper tailed test, $\mu > \mu_0$         $T_0 > t_{\alpha, n-1}$
Lower tailed test, $\mu < \mu_0$         $T_0 < -t_{\alpha, n-1}$
ii. P-value approach:
Alternative Hypothesis, $H_1$            Reject $H_0$ if P-value $< \alpha$
Two tailed test, $\mu \ne \mu_0$         P-value $= 2P(T_{n-1} > |t_0|)$
Upper tailed test, $\mu > \mu_0$         P-value $= P(T_{n-1} > t_0)$
Lower tailed test, $\mu < \mu_0$         P-value $= P(T_{n-1} < t_0)$
iii. $100(1 - \alpha)\%$ Confidence Intervals:
Alternative Hypothesis, $H_1$            Reject $H_0$ if $\mu_0$ falls outside of the interval
Two tailed test, $\mu \ne \mu_0$         $\bar{x} - t_{\alpha/2, n-1}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2, n-1}\dfrac{s}{\sqrt{n}}$
Upper tailed test, $\mu > \mu_0$         $\bar{x} - t_{\alpha, n-1}\dfrac{s}{\sqrt{n}} \le \mu$
Lower tailed test, $\mu < \mu_0$         $\mu \le \bar{x} + t_{\alpha, n-1}\dfrac{s}{\sqrt{n}}$
Step 6: Calculate the test statistic, $T_0$, in step 3.
Step 7: Decision: make a conclusion based on the criteria in step 5.
Example 6.2:
A particular brand of diet margarine was analyzed to determine the level of
polyunsaturated fatty acid (in percent). A sample of six packages resulted in the
following data: 16.8, 17.2, 17.4, 16.9, 16.5 and 17.1.
i. Using the P-value approach, test the hypothesis that the mean is not 17.0.
ii. Construct a 95% two-sided CI on the mean.
iii. Use the CI found in part (ii) to test the hypothesis.
[4 marks]
Solution:
i. (1) The parameter of interest is the true mean level of polyunsaturated fatty acid, $\mu$;
the variance is unknown.
(2) The hypotheses:
$H_0: \mu = 17$ vs $H_1: \mu \ne 17$
(3) The test statistic is:
$t_0 = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
(4) Significance level: $\alpha = 0.05$
(5) The critical region: reject $H_0$ if P-value $< 0.05$
(6) Computation:
$\bar{x} = 16.98$, $s = 0.3188$
$t_0 = \dfrac{16.98 - 17}{0.3188/\sqrt{6}} = -0.1537$
From the t-table, $|t_0| = 0.1537$ with 5 degrees of freedom falls below $t_{0.40, 5} = 0.267$,
so the upper-tail area exceeds 0.40 and the P-value $> 2(0.40) = 0.8$.
(7) Result and conclusion:
Since the P-value $> 0.05$, we fail to reject $H_0$ and conclude that there is not enough
evidence that the true mean differs from 17 at $\alpha = 0.05$.
ii. A 95% two-sided CI on the mean is
$\bar{x} - t_{0.025, 5}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{0.025, 5}\dfrac{s}{\sqrt{n}}$
$16.98 - 2.571\left(\dfrac{0.3188}{\sqrt{6}}\right) \le \mu \le 16.98 + 2.571\left(\dfrac{0.3188}{\sqrt{6}}\right)$
$16.645 \le \mu \le 17.3146$
iii. Since 17 falls in the interval, we fail to reject $H_0$ and conclude that there is not
enough evidence that the true mean differs from 17 at $\alpha = 0.05$.
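The sample statistics, the test statistic and the interval of Example 6.2 can be recomputed as sketched below. The Python standard library has no t-distribution, so the critical value $t_{0.025,5} = 2.571$ is passed in as a table value; the helper names are hypothetical. Because the code keeps the unrounded mean (16.9833 rather than 16.98), it gives $t_0 \approx -0.128$ instead of $-0.1537$, which does not change the conclusion.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """Return (t0, x-bar, s) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)                  # sample standard deviation (n - 1 divisor)
    return (xbar - mu0) / (s / sqrt(n)), xbar, s

def t_interval(sample, t_crit):
    """Two-sided CI for the mean, given the t-table value t_{alpha/2, n-1}."""
    n = len(sample)
    half_width = t_crit * stdev(sample) / sqrt(n)
    return mean(sample) - half_width, mean(sample) + half_width

# Data of Example 6.2: polyunsaturated fatty acid level (percent) in six packages.
fatty_acid = [16.8, 17.2, 17.4, 16.9, 16.5, 17.1]

t0, xbar, s = t_statistic(fatty_acid, mu0=17.0)
lo, hi = t_interval(fatty_acid, t_crit=2.571)   # t_{0.025,5} from the t-table

print(f"x-bar = {xbar:.4f}, s = {s:.4f}, t0 = {t0:.4f}")
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```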
6.4 Testing about the proportion
In hypothesis testing, the procedure to test about the proportion of the population
is the same as the procedure to test about the mean when the population variance
is known.
Step 1: To test about the population proportion, $p$
Step 2: $H_0: p = p_0$ versus $H_1: p \ne p_0$ or $H_1: p > p_0$ or $H_1: p < p_0$
Step 3: Test statistic: $Z_0 = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} \sim N(0, 1)$ if $p = p_0$, where $\hat{p} = \dfrac{X}{n}$
Step 4: The critical value at significance level $\alpha$ is $z_{\alpha}$ for a one-tailed (sided) test and
$z_{\alpha/2}$ for a two-tailed (sided) test.
Step 5: i. The critical region:
Alternative Hypothesis, $H_1$       Rejection Criteria (Reject $H_0$) if
Two tailed test, $p \ne p_0$        $Z_0 > z_{\alpha/2}$ or $Z_0 < -z_{\alpha/2}$
Upper tailed test, $p > p_0$        $Z_0 > z_{\alpha}$
Lower tailed test, $p < p_0$        $Z_0 < -z_{\alpha}$
ii. P-value approach:
Alternative Hypothesis, $H_1$       Reject $H_0$ if P-value $< \alpha$
Two tailed test, $p \ne p_0$        P-value $= 2[1 - \Phi(|z_0|)]$
Upper tailed test, $p > p_0$        P-value $= 1 - \Phi(z_0)$
Lower tailed test, $p < p_0$        P-value $= \Phi(z_0)$
iii. $100(1 - \alpha)\%$ Confidence Intervals:
Alternative Hypothesis, $H_1$       Reject $H_0$ if $p_0$ falls outside of the interval
Two tailed test, $p \ne p_0$        $\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$
Upper tailed test, $p > p_0$        $\hat{p} - z_{\alpha}\sqrt{\hat{p}(1 - \hat{p})/n} \le p$
Lower tailed test, $p < p_0$        $p \le \hat{p} + z_{\alpha}\sqrt{\hat{p}(1 - \hat{p})/n}$
Step 6: Calculate the test statistic, $Z_0$, in step 3.
Step 7: Decision: make a conclusion based on the criteria in step 5.
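The steps for the proportion test can be sketched in code as follows. The counts used (60 successes in 100 trials, tested against $p_0 = 0.5$) are hypothetical and chosen only for illustration, and the function name `proportion_z_test_upper` is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test_upper(x, n, p0, alpha=0.05):
    """Upper-tailed z-test of H0: p = p0 against H1: p > p0."""
    std_normal = NormalDist()
    p_hat = x / n                                   # sample proportion
    z0 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)     # test statistic (step 3)
    p_value = 1 - std_normal.cdf(z0)                # P(Z > z0)
    z_crit = std_normal.inv_cdf(1 - alpha)          # z_alpha (step 4)
    return p_hat, z0, p_value, z0 > z_crit

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5.
p_hat, z0, p_value, reject = proportion_z_test_upper(x=60, n=100, p0=0.5)
print(f"p-hat = {p_hat:.2f}, z0 = {z0:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

The standard error in the test statistic uses $p_0$, the value under $H_0$, whereas the confidence intervals in step 5(iii) use $\hat{p}$.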
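6.5 Testing about the variance
If the parameter of interest is the population variance or standard
deviation, the same steps are used, except that in step 3 the test statistic is the
$\chi^2$ statistic. The steps are as follows:
Step 1: To test about the population variance, $\sigma^2$
Step 2: $H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 \ne \sigma_0^2$ or $H_1: \sigma^2 > \sigma_0^2$ or $H_1: \sigma^2 < \sigma_0^2$
Step 3: Test statistic: $\chi_0^2 = \dfrac{(n - 1)s^2}{\sigma_0^2} \sim \chi^2_{n-1}$ if $\sigma^2 = \sigma_0^2$
Step 4: The critical value at significance level $\alpha$ is $\chi^2_{\alpha, n-1}$ (or $\chi^2_{1-\alpha, n-1}$) for a one-tailed
(sided) test, and $\chi^2_{\alpha/2, n-1}$ and $\chi^2_{1-\alpha/2, n-1}$ for a two-tailed (sided) test.
Step 5: i. The critical region:
Alternative Hypothesis, $H_1$                 Rejection Criteria (Reject $H_0$) if
Two tailed test, $\sigma^2 \ne \sigma_0^2$    $\chi_0^2 > \chi^2_{\alpha/2, n-1}$ or $\chi_0^2 < \chi^2_{1-\alpha/2, n-1}$
Upper tailed test, $\sigma^2 > \sigma_0^2$    $\chi_0^2 > \chi^2_{\alpha, n-1}$
Lower tailed test, $\sigma^2 < \sigma_0^2$    $\chi_0^2 < \chi^2_{1-\alpha, n-1}$
ii. P-value approach:
Alternative Hypothesis, $H_1$                 Reject $H_0$ if P-value $< \alpha$
Two tailed test, $\sigma^2 \ne \sigma_0^2$    P-value $= 2\min\{P(\chi^2_{n-1} > \chi_0^2),\ P(\chi^2_{n-1} < \chi_0^2)\}$
Upper tailed test, $\sigma^2 > \sigma_0^2$    P-value $= P(\chi^2_{n-1} > \chi_0^2)$
Lower tailed test, $\sigma^2 < \sigma_0^2$    P-value $= P(\chi^2_{n-1} < \chi_0^2)$
iii. $100(1 - \alpha)\%$ Confidence Intervals:
Alternative Hypothesis, $H_1$                 Reject $H_0$ if $\sigma_0^2$ falls outside of the interval
Two tailed test, $\sigma^2 \ne \sigma_0^2$    $\dfrac{(n - 1)s^2}{\chi^2_{\alpha/2, n-1}} \le \sigma^2 \le \dfrac{(n - 1)s^2}{\chi^2_{1-\alpha/2, n-1}}$
Upper tailed test, $\sigma^2 > \sigma_0^2$    $\dfrac{(n - 1)s^2}{\chi^2_{\alpha, n-1}} \le \sigma^2$
Lower tailed test, $\sigma^2 < \sigma_0^2$    $\sigma^2 \le \dfrac{(n - 1)s^2}{\chi^2_{1-\alpha, n-1}}$
Step 6: Calculate the test statistic, $\chi_0^2$, in step 3.
Step 7: Decision: make a conclusion based on the criteria in step 5.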
Example 6.3:
Aerospace engineers claim that the standard deviation of the percentage in an alloy used in aerospace casting is greater than 0.3. 51 parts were randomly selected, and the sample standard deviation of the percentage in the alloy is $s = 0.37$.
(i) At $\alpha = 0.05$, do these data support the claim of the engineers?
(ii) What is the P-value for this test?
(iii) Construct a 95% two-sided CI for $\sigma$. What is the conclusion?
Solution:
(i) (1) The parameter of interest is the population variance $\sigma^2$.
(2) Hypotheses: $H_0: \sigma^2 = (0.3)^2$ vs $H_1: \sigma^2 > (0.3)^2$
(3) Test statistic:
$$\chi_0^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
(4) Significance level: $\alpha = 0.05$
(5) Critical region: reject $H_0$ if $\chi_0^2 > \chi^2_{0.05,50} = 67.50$
(6) Computation:
$$\chi_0^2 = \frac{50(0.37)^2}{(0.3)^2} = 76.0556$$
(7) Result and conclusion:
Since $76.0556 > 67.50$, we reject the null hypothesis and conclude that the engineers' claim is supported at the 0.05 level of significance.
(ii) From the $\chi^2$ table, $\chi^2_{0.025,50} = 71.42$ and $\chi^2_{0.01,50} = 76.15$. Since $71.42 < 76.0556 < 76.15$, the P-value satisfies $0.01 < P < 0.025$. Because the P-value is less than 0.05, we reject the null hypothesis.
(iii) The 95% two-sided CI is
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}$$
$$\frac{50(0.37)^2}{71.42} \le \sigma^2 \le \frac{50(0.37)^2}{32.36}$$
$$0.0958 \le \sigma^2 \le 0.2115$$
$$0.310 \le \sigma \le 0.460$$
Since $\sigma = 0.3$ is outside the interval, we reject the null hypothesis and conclude that the engineers' claim is supported at the 0.05 level of significance.
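The arithmetic of this example can be checked with a short script. This is a sketch under stated assumptions: the $\chi^2$ critical values (67.50, 71.42, 32.36) are the table values quoted in the example and are passed in by hand, and the function names are illustrative.

```python
from math import sqrt

def chi_square_statistic(s, n, sigma0):
    """Test statistic (n - 1) s^2 / sigma0^2 for H0: sigma^2 = sigma0^2."""
    return (n - 1) * s**2 / sigma0**2

def variance_ci(s, n, chi2_upper_tail, chi2_lower_tail):
    """Two-sided CI for sigma^2 from table values chi2_{a/2} and chi2_{1-a/2}."""
    ss = (n - 1) * s**2
    return ss / chi2_upper_tail, ss / chi2_lower_tail

chi0 = chi_square_statistic(s=0.37, n=51, sigma0=0.3)
reject_h0 = chi0 > 67.50  # upper-tailed test, table value chi2_{0.05,50}

var_lo, var_hi = variance_ci(s=0.37, n=51, chi2_upper_tail=71.42, chi2_lower_tail=32.36)
print(f"chi0^2 = {chi0:.4f}, reject H0: {reject_h0}")
print(f"{var_lo:.4f} <= sigma^2 <= {var_hi:.4f}")
print(f"{sqrt(var_lo):.3f} <= sigma <= {sqrt(var_hi):.3f}")
```

Taking square roots of the variance bounds gives the interval for $\sigma$ itself, which is what part (iii) compares with 0.3.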
oooOOOooo
Chapter 7
1. A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system-activation temperature is 130°. A sample of 9 systems, when tested, yields an average activation temperature of 131.08°F. If the distribution of activation times is normal with standard deviation 1.5°F, do the data contradict the firm's claim at level of significance $\alpha = 0.01$? What is the P-value for this test?
2. A random sample of 50 battery packs is selected and subjected to a life test. The average life of these batteries is 4.05 hours. Assume that battery life is normally distributed with standard deviation equal to 0.2 hour. Is there evidence to support the claim that mean battery life exceeds 4 hours? Use $\alpha = 0.05$. What is the P-value for this test?
3. The flow discharge of the Perak River (measured in m³/s) was obtained at random. 40 readings were collected and the mean flow discharge was found to be 3.815 m³/s with a standard deviation of 0.5 m³/s.
(a) Test the hypothesis that the mean flow discharge at the Perak River is not equal to 4 m³/s. Use $\alpha = 0.05$;
(b) Use the P-value approach to test the null hypothesis.
(c) Construct a 95% two-sided CI on the mean flow discharge. What is the conclusion?
4. A civil engineer is analyzing the compressive strength of concrete. Compressive strength is approximately normally distributed with variance $\sigma^2 = 1000$ psi². A random sample of 12 specimens has a mean compressive strength of $\bar{x} = 3255.42$ psi.
(a) Test the hypothesis that the mean compressive strength is 3500 psi. Use $\alpha = 0.01$;
(b) What is the smallest level of significance at which you would be willing to reject the null hypothesis?;
(c) Construct a 95% two-sided CI on mean compressive strength; and
(d) Construct a 99% two-sided CI on mean compressive strength. Compare the width of this confidence interval with the width of the one in part (c). What is your comment?
5. A new process for producing synthetic diamonds can be operated at a profitable level only if the average weight of the diamonds is greater than 0.5 karat. To evaluate the profitability of the process, six diamonds are generated with recorded weights 0.46, 0.61, 0.52, 0.48, 0.57 and 0.54 karat.
(a) At the 5% significance level, do the six measurements present sufficient evidence that the average weight of the diamonds produced by the process is in excess of 0.5 karat?
(b) Use the P-value approach to test the null hypothesis.
(c) Construct a 95% CI on the average weight of the diamonds.
6. A cigarette company claims that its cigarettes contain an average of only 10 mg of tar. A random sample of 25 cigarettes shows the average tar content to be 12.5 mg with a standard deviation of 4.5 mg.
(a) Construct a hypothesis test to determine whether the average tar content of the cigarettes exceeds 10 mg, using the P-value approach;
(b) Construct a 95% two-sided CI on the average tar content of the cigarettes.
7. Regardless of age, about 20% of Malaysian adults participate in fitness activities at least twice a week. In a local survey of 100 adults over 40 years old, a total of 15 people indicated that they participated in a fitness activity at least twice a week.
(a) Do these data indicate that the participation rate for adults over 40 years of age is significantly less than 20%? Carry out a test at the 10% significance level and draw the appropriate conclusion.
(b) Construct a 95% two-sided CI on the participation rate.
8. A survey done one year ago showed that 45% of the population participated in recycling programs. In a recent poll, a random sample of 1250 people showed that 588 participate in recycling programs.
(a) Test the hypothesis that the proportion of the population who participate in recycling programs is greater than it was one year ago. Use a 5% significance level.
(b) Construct a 95% two-sided CI on the proportion.
9. An Ipoh city council member gave a speech in which she said that 18% of all private homes in the city had been undervalued by the county tax assessor's office. In a follow-up story, the local newspaper reported that it had taken a random sample of 91 private homes. Using a professional evaluator to evaluate the property and checking against county tax records, it found that 14 of the homes had been undervalued.
(a) Do these data indicate that the proportion of private homes that are undervalued by the county tax assessor is different from 18%? Use a 5% significance level.
(b) Construct a 95% two-sided CI on the proportion.
10. Engineers designing the front-wheel-drive half shaft of a new model automobile claim that the variance in the displacement of the constant velocity joints of the shaft is less than 1.5 mm. 20 simulations were conducted and the following result was obtained: $s = 1.41$.
(a) At $\alpha = 0.05$, do these data support the claim of the engineers?
(b) What is the P-value for this test?
(c) Construct a two-sided CI for $\sigma$.
11. Aerospace engineers claim that the standard deviation of the percentage in an alloy used in aerospace casting is greater than 0.3. 51 parts were randomly selected, and the sample standard deviation of the percentage in the alloy is $s = 0.37$.
(a) At $\alpha = 0.05$, do these data support the claim of the engineers?
(b) What is the P-value for this test?
(c) Construct a 95% two-sided CI for $\sigma$. What is the conclusion?
12. Scientists claim that the variance of the sugar content of the syrup in canned peaches is 18 mg². A random sample of 10 cans yields a sample standard deviation of 4.8 mg.
(a) At $\alpha = 0.05$, do these data support the claim of the scientists?
(b) What is the P-value for this test?
(c) Construct a 95% two-sided CI for $\sigma$. What is the conclusion?
oooOOOooo
Chapter 7
7. Hypothesis Testing - Two Populations
Learning objectives:
At the end of this chapter, students should be able to:
Understand the procedure or steps to perform a test for two populations.
Test the difference between two means when the population variances are known.
Test the difference between two means when the population variances are unknown but assumed to be equal.
Test the difference between two means when the population variances are unknown and assumed to be unequal.
Test the difference between two proportions.
Test the difference between two variances.
7.1 Introduction
In hypothesis testing for two populations, the procedure is the same as in hypothesis testing for one population, but now we want to test the difference between two population parameters of interest. For example, we may want to test the difference between the means of the two populations, $\mu_1$ and $\mu_2$, or the difference between the proportions of the two populations, $p_1$ and $p_2$.
7.1.1 Types of test
There are three types of test on any parameters of interest: the two-tailed (two-sided) test, the upper-tailed test and the lower-tailed test, as in the table below. Here $\theta_1$ and $\theta_2$ stand for the two parameters being compared (for example $\mu_1$ and $\mu_2$, or $p_1$ and $p_2$).
Type | Null hypothesis, $H_0$ | Alternative hypothesis, $H_1$ | Test
1 | $\theta_1 - \theta_2 = 0$ | $\theta_1 - \theta_2 \ne 0$ | Two-tailed test
2 | $\theta_1 - \theta_2 = 0$ | $\theta_1 - \theta_2 > 0$ | Upper-tailed test
3 | $\theta_1 - \theta_2 = 0$ | $\theta_1 - \theta_2 < 0$ | Lower-tailed test
7.1.2 Steps to do the test
In hypothesis testing for two populations, the procedure to perform the test is the same as in hypothesis testing for one population.
(i) Identify the parameter of interest.
(ii) State the hypotheses: the null hypothesis and the alternative hypothesis.
(iii) Determine the appropriate test statistic.
(iv) Determine the critical value.
(v) Determine the rejection (critical) region, the P-value, or the $100(1-\alpha)\%$ confidence interval.
(vi) Calculate the test statistic.
(vii) Make a decision or conclusion based on step (v).
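The decision in step (vii) depends on which of the three test types is used. The sketch below illustrates this for a test statistic that follows the standard normal distribution; the function name, the labels for the alternatives and the value $z_0 = 2.1$ are hypothetical choices made for illustration.

```python
from statistics import NormalDist

def reject_h0(z0, alpha, alternative):
    """Critical-region decision for a standard normal test statistic z0."""
    std_normal = NormalDist()
    if alternative == "two-sided":
        # Reject when |z0| exceeds z_{alpha/2}.
        return abs(z0) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        # Upper-tailed test: reject when z0 > z_alpha.
        return z0 > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        # Lower-tailed test: reject when z0 < -z_alpha.
        return z0 < -std_normal.inv_cdf(1 - alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")

# Hypothetical observed statistic, checked against all three test types.
decisions = {alt: reject_h0(2.1, 0.05, alt) for alt in ("two-sided", "greater", "less")}
print(decisions)
```

The same observed statistic can lead to different decisions depending on the alternative, which is why the hypotheses in step (ii) must be fixed before the data are examined.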
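7.2 Testing the difference between two means when both population variances are known
If the parameter of interest is the difference between the means of two populations and both variances are known, the test can be performed as follows:
Step 1: The parameter of interest is the difference between the two means, $\mu_1 - \mu_2$, when both population variances, $\sigma_1^2$ and $\sigma_2^2$, are known.
Step 2: $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \ne 0$ or $H_1: \mu_1 - \mu_2 > 0$ or $H_1: \mu_1 - \mu_2 < 0$
Step 3: Test statistic:
$$Z_0 = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1),$$
where $\mu_1 - \mu_2$ takes its hypothesized value (0 under $H_0$).
Step 4: The critical value at significance level $\alpha$ is $z_{\alpha}$ for a one-tailed (one-sided) test and $z_{\alpha/2}$ for a two-tailed (two-sided) test.
Step 5: i. The critical region:
Alternative hypothesis, $H_1$ | Rejection criterion (reject $H_0$ if)
Two-tailed test, $\mu_1 - \mu_2 \ne 0$ | $z_0 > z_{\alpha/2}$ or $z_0 < -z_{\alpha/2}$
Upper-tailed test, $\mu_1 - \mu_2 > 0$ | $z_0 > z_{\alpha}$
Lower-tailed test, $\mu_1 - \mu_2 < 0$ | $z_0 < -z_{\alpha}$
ii. P-value approach:
Alternative hypothesis, $H_1$ | Reject $H_0$ if $P < \alpha$
Two-tailed test, $\mu_1 - \mu_2 \ne 0$ | P-value $= 2[1 - \Phi(|z_0|)]$
Upper-tailed test, $\mu_1 - \mu_2 > 0$ | P-value $= 1 - \Phi(z_0)$
Lower-tailed test, $\mu_1 - \mu_2 < 0$ | P-value $= \Phi(z_0)$
iii. $100(1-\alpha)\%$ confidence intervals:
Alternative hypothesis, $H_1$ | Reject $H_0$ if $\mu_1 - \mu_2 = 0$ falls outside the interval
Two-tailed test, $\mu_1 - \mu_2 \ne 0$ | $(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Upper-tailed test, $\mu_1 - \mu_2 > 0$ | $(\bar{x}_1 - \bar{x}_2) - z_{\alpha}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2$ (lower confidence bound)
Lower-tailed test, $\mu_1 - \mu_2 < 0$ | $\mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$ (upper confidence bound)
Step 6: Calculate the test statistic $Z_0$ from step 3.
Step 7: Decision: draw a conclusion based on the criteria in step 5.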
Example 7.1:
The burning rates of two different solid-fuel propellants used in rocket systems are being studied. It is known that both propellants have approximately the same standard deviation of burning rate, namely 3 cm/second. Two random samples, each of 20 specimens, are tested, and the sample mean burning rates are 18 cm/second and 24 cm/second respectively.
i. Test the hypothesis that both propellants have the same mean burning rate, using the P-value approach.
ii. Construct a two-sided 95% CI on the difference in means, $\mu_1 - \mu_2$.
iii. What is the practical meaning of this interval?
Solution:
(i) (1) The parameter of interest is the difference in mean burning rate, $\mu_1 - \mu_2$; the variances $\sigma_1^2$ and $\sigma_2^2$ are known.
(2) Hypotheses: $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \ne 0$
(3) Test statistic:
$$Z_0 = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$
(4) Significance level: $\alpha = 0.05$
(5) Critical region: reject $H_0$ if P-value $< 0.05$
(6) Computation of $z_0$ and the P-value:
$$z_0 = \frac{18 - 24}{\sqrt{\dfrac{9}{20} + \dfrac{9}{20}}} = -6.32$$
P-value $= 2[1 - \Phi(6.32)] = 2[1 - 1] \approx 0$
(7) Result and conclusion:
Since the P-value is less than 0.05, we reject $H_0$. The two propellants do not have the same mean burning rate.
ii. A 95% two-sided CI on the difference in means, $\mu_1 - \mu_2$, is
$$(\bar{x}_1 - \bar{x}_2) - z_{0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$$(18 - 24) - 1.96\sqrt{\frac{9}{20} + \frac{9}{20}} \le \mu_1 - \mu_2 \le (18 - 24) + 1.96\sqrt{\frac{9}{20} + \frac{9}{20}}$$
$$-7.859 \le \mu_1 - \mu_2 \le -4.141$$
iii. Since $\mu_1 - \mu_2 = 0$ is not in the interval, we reject $H_0$. The two propellants do not have the same mean burning rate.
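The propellant example can be verified with the standard library alone. The sketch below is illustrative: the function name is hypothetical, the inputs are the values stated in the example (means 18 and 24 cm/second, $\sigma = 3$, samples of 20), and the critical value is computed rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2, alpha=0.05):
    """Two-tailed z test and CI for mu1 - mu2 with known variances."""
    std_normal = NormalDist()
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of xbar1 - xbar2
    diff = xbar1 - xbar2
    z0 = diff / se                               # H0: mu1 - mu2 = 0
    p_value = 2 * (1 - std_normal.cdf(abs(z0)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return z0, p_value, (diff - z_crit * se, diff + z_crit * se)

z0, p_value, (lower, upper) = two_sample_z_test(18, 24, 3, 3, 20, 20)
print(f"z0 = {z0:.2f}, P-value = {p_value:.2g}")
print(f"{lower:.3f} <= mu1 - mu2 <= {upper:.3f}")
```

The interval excludes 0, which agrees with the very small P-value: both approaches reject $H_0$.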
7.2 Testing the difference between two means when both population variances are unknown
If the parameter of interest is the difference between the means of two populations and both variances are unknown, the test falls into one of two cases. In the first case we assume that the population variances are the same, $\sigma_1^2 = \sigma_2^2$; in the second case we assume that the variances are not equal, $\sigma_1^2 \ne \sigma_2^2$.
7.2.1 First case: $\sigma_1^2 = \sigma_2^2$
Testing the difference between two means when both population variances are unknown but $\sigma_1^2 = \sigma_2^2$ can be performed as follows:
Step 1: The parameter of interest is the difference between the two means, $\mu_1 - \mu_2$, when both population variances, $\sigma_1^2$ and $\sigma_2^2$, are unknown but $\sigma_1^2 = \sigma_2^2$.
Step 2: $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \ne 0$ or $H_1: \mu_1 - \mu_2 > 0$ or $H_1: \mu_1 - \mu_2 < 0$
Step 3: Test statistic:
$$T_0 = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{with } df = n_1 + n_2 - 2,$$
where
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$
Step 4: The critical value at significance level $\alpha$ is $t_{\alpha,\,n_1+n_2-2}$ for a one-tailed (one-sided) test and $t_{\alpha/2,\,n_1+n_2-2}$ for a two-tailed (two-sided) test.
Step 5: i. The critical region:
Alternative hypothesis, $H_1$ | Rejection criterion (reject $H_0$ if)
Two-tailed test, $\mu_1 - \mu_2 \ne 0$ | $t_0 > t_{\alpha/2,\,n_1+n_2-2}$ or $t_0 < -t_{\alpha/2,\,n_1+n_2-2}$
Upper-tailed test, $\mu_1 - \mu_2 > 0$ | $t_0 > t_{\alpha,\,n_1+n_2-2}$
Lower-tailed test, $\mu_1 - \mu_2 < 0$ | $t_0 < -t_{\alpha,\,n_1+n_2-2}$
ii. P-value approach:
Alternative hypothesis, $H_1$ | Reject $H_0$ if $P < \alpha$
Two-tailed test, $\mu_1 - \mu_2 \ne 0$ | P-value $= 2P(T_{n_1+n_2-2} > |t_0|)$
Upper-tailed test, $\mu_1 - \mu_2 > 0$ | P-value $= P(T_{n_1+n_2-2} > t_0)$
Lower-tailed test, $\mu_1 - \mu_2 < 0$ | P-value $= P(T_{n_1+n_2-2} < t_0)$
iii. $100(1-\alpha)\%$ confidence intervals:
Alternative hypothesis, $H_1$ | Reject $H_0$ if $\mu_1 - \mu_2 = 0$ falls outside the interval
Two-tailed test, $\mu_1 - \mu_2 \ne 0$ | $(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
Upper-tailed test, $\mu_1 - \mu_2 > 0$ | $(\bar{x}_1 - \bar{x}_2) - t_{\alpha,\,n_1+n_2-2}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} \le \mu_1 - \mu_2$ (lower confidence bound)
Lower-tailed test, $\mu_1 - \mu_2 < 0$ | $\mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha,\,n_1+n_2-2}\, s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$ (upper confidence bound)
Step 6: Calculate the test statistic $t_0$ from step 3.
Step 7: Decision: draw a conclusion based on the criteria in step 5.
Example 7.2:
Professor Adams taught the same large lecture course for two terms. Except for negligible differences, the two courses were the same. However, one met at 8 a.m. and the other met at 11 a.m. The two courses were given final exams of the same degree of difficulty covering the same material. Both exams were worth 100 points. A random sample of 49 students from the 8 a.m. class had an average score of 73.2 with standard deviation 8.1. A random sample of 36 students from the 11 a.m. class had an average score of 78.1 with standard deviation 10.0. Assume that the population variances are the same and that the data are drawn from normal distributions.
i. Do these data indicate that the mean score for the 11 a.m. class is higher than the mean score for the 8 a.m. class? Use a 5% significance level.
ii. What is the P-value for this test?
iii. Construct a two-sided 95% CI on the difference in average scores.
Solution:
(i) (1) The parameter of interest is the difference between the mean score at 8 a.m., $\mu_1$, and the mean score at 11 a.m., $\mu_2$; we test whether $\mu_2$ is higher than $\mu_1$.
(2) Hypotheses: $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 < \mu_2$
(3) Test statistic:
$$T_0 = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \quad \text{with } df = n_1 + n_2 - 2, \quad \text{where } s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
(4) Critical value: $\alpha = 0.05$, $t_{0.05,83} = 1.658$
(5) Critical region: reject $H_0$ if $t_0 < -1.658$
(6) Computation: with $\bar{x}_1 = 73.2$, $\bar{x}_2 = 78.1$, $s_1 = 8.1$, $s_2 = 10$, $n_1 = 49$, $n_2 = 36$,
$$s_p^2 = \frac{(48)(8.1)^2 + (35)(10)^2}{49 + 36 - 2} = \frac{6649.28}{83}, \qquad s_p = 8.95$$
so
$$t_0 = \frac{73.2 - 78.1}{8.95\sqrt{\dfrac{1}{49} + \dfrac{1}{36}}} = \frac{-4.9}{1.96} = -2.5, \quad \text{with } df = 83.$$
(7) Result and conclusion:
Since $t_0 < -1.658$, we reject $H_0$ at $\alpha = 0.05$. The mean scores of the two classes are not the same: there is enough evidence to say that the 11 a.m. class scored higher than the 8 a.m. class.
ii. The P-value for the test:
From the t-table with 83 df, $|t_0| = 2.5$ lies between $t = 2.358$ and $t = 2.617$, which gives $0.005 < P < 0.01$. Since $P < 0.05$, we reject $H_0$ at the 0.05 level of significance and conclude that there is enough evidence to say that the 11 a.m. class scored higher than the 8 a.m. class.
iii. A 95% CI for the difference in mean scores, where $t_{0.025,83} = 1.98$, is
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
$$(73.2 - 78.1) - (1.98)(8.95)\sqrt{\frac{1}{49} + \frac{1}{36}} \le \mu_1 - \mu_2 \le (73.2 - 78.1) + (1.98)(8.95)\sqrt{\frac{1}{49} + \frac{1}{36}}$$
$$-8.7899 \le \mu_1 - \mu_2 \le -1.0101$$
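The pooled computation in this example is easy to get wrong by hand, so a short check may help. This is a sketch, not the authors' method: the function name is illustrative, the inputs are the summary statistics stated in the example, and the critical value 1.658 is the table value quoted there.

```python
from math import sqrt

def pooled_t_statistic(xbar1, s1, n1, xbar2, s2, n2):
    """Pooled-variance t statistic for H0: mu1 - mu2 = 0 (equal variances assumed)."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled std deviation
    t0 = (xbar1 - xbar2) / (sp * sqrt(1 / n1 + 1 / n2))
    return t0, sp, df

# 8 a.m. class is sample 1, 11 a.m. class is sample 2.
t0, sp, df = pooled_t_statistic(73.2, 8.1, 49, 78.1, 10.0, 36)
t_crit = 1.658  # table value quoted in the example
reject_h0 = t0 < -t_crit  # lower-tailed test, H1: mu1 < mu2
print(f"sp = {sp:.2f}, t0 = {t0:.2f}, df = {df}, reject H0: {reject_h0}")
```

Without intermediate rounding the statistic is about -2.49, which rounds to the -2.5 obtained by hand and leads to the same decision.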
7.2.2 Second case: $\sigma_1^2 \ne \sigma_2^2$
Testing the difference between two means when both population variances are unknown and $\sigma_1^2 \ne \sigma_2^2$ can be performed as follows:
Step 1: The parameter of interest is the difference between the two means, $\mu_1 - \mu_2$, when both population variances, $\sigma_1^2$ and $\sigma_2^2$, are unknown and $\sigma_1^2 \ne \sigma_2^2$.
Step 2: $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \ne 0$ or $H_1: \mu_1 - \mu_2 > 0$ or $H_1: \mu_1 - \mu_2 < 0$
Step 3: Test statistic:
$$T_0 = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
with degrees of freedom given by
$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$$
Step 4: The critical value at significance level $\alpha$ is $t_{\alpha,\nu}$ for a one-tailed (one-sided) test and $t_{\alpha/2,\nu}$ for a two-tailed (two-sided) test.
Step 5: i. The critical region:
Alternative hypothesis, $H_1$ | Rejection criterion (reject $H_0$ if)
Two-tailed test, $\mu_1 - \mu_2 \ne 0$ | $t_0 > t_{\alpha/2,\nu}$ or $t_0 < -t_{\alpha/2,\nu}$
Upper-tailed test, $\mu_1 - \mu_2 > 0$ | $t_0 > t_{\alpha,\nu}$
Lower-tailed test, $\mu_1 - \mu_2 < 0$ | $t_0 < -t_{\alpha,\nu}$
ii. P-value approach:
Alternative hypothesis, $H_1$ | Reject $H_0$ if $P < \alpha$
Two-tailed test, $\mu_1 - \mu_2 \ne 0$ | P-value $= 2P(T_{\nu} > |t_0|)$
Upper-tailed test, $\mu_1 - \mu_2 > 0$ | P-value $= P(T_{\nu} > t_0)$
Lower-tailed test, $\mu_1 - \mu_2 < 0$ | P-value $= P(T_{\nu} < t_0)$
iii. $100(1-\alpha)\%$ confidence intervals:
Alternative hypothesis, $H_1$ | Reject $H_0$ if $\mu_1 - \mu_2 = 0$ falls outside the interval
Two-tailed test, $\mu_1 - \mu_2 \ne 0$ | $(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\nu}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\nu}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
Upper-tailed test, $\mu_1 - \mu_2 > 0$ | $(\bar{x}_1 - \bar{x}_2) - t_{\alpha,\nu}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} \le \mu_1 - \mu_2$ (lower confidence bound)
Lower-tailed test, $\mu_1 - \mu_2 < 0$ | $\mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha,\nu}\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$ (upper confidence bound)
Step 6: Calculate the test statistic $t_0$ from step 3.
Step 7: Decision: draw a conclusion based on the criteria in step 5.
Example 7.3:
Two companies manufacture a rubber material intended for use in an automotive application. The part will be subjected to abrasive wear in the application, so we decide to compare the material produced by each company in a test. Twenty-five samples of material from each company are tested in an abrasion test, and the amount of wear after 1000 cycles is observed. The sample mean and standard deviation of wear for companies A and B, respectively, are
$\bar{x}_A = 20$ mg/1000 cycles, $s_A = 2$ mg/1000 cycles,
$\bar{x}_B = 12$ mg/1000 cycles, $s_B = 8$ mg/1000 cycles.
i. Do the data support the claim that the two companies produce material with different mean wear? Use $\alpha = 0.05$, and assume that each population is normally distributed but that their variances are not equal. What is the P-value for this test?
ii. Construct a two-sided 95% CI that addresses the question in part (i).
Solution:
(i) (1) The parameter of interest is the difference between the two means, $\mu_A$ and $\mu_B$; the variances are unknown and not equal.
(2) Hypotheses: $H_0: \mu_A - \mu_B = 0$ vs $H_1: \mu_A - \mu_B \ne 0$
(3) Test statistic:
$$T_0 = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}}$$
(4) Critical value: $\alpha = 0.05$, with
$$\nu = \frac{\left(\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}\right)^2}{\dfrac{(s_A^2/n_A)^2}{n_A - 1} + \dfrac{(s_B^2/n_B)^2}{n_B - 1}} = \frac{\left(\dfrac{2^2}{25} + \dfrac{8^2}{25}\right)^2}{\dfrac{(2^2/25)^2}{24} + \dfrac{(8^2/25)^2}{24}} \approx 27$$
and $t_{0.025,27} = 2.052$.
(5) Critical region: reject $H_0$ if $t_0 > 2.052$ or $t_0 < -2.052$
(6) Computation: with $\bar{x}_A = 20$, $\bar{x}_B = 12$, $s_A = 2$, $s_B = 8$, $n_A = n_B = 25$,
$$t_0 = \frac{20 - 12}{\sqrt{\dfrac{2^2}{25} + \dfrac{8^2}{25}}} = 4.85$$
(7) Result and conclusion:
Since $t_0 > 2.052$, we reject the null hypothesis; there is enough evidence that the means are different.
The P-value for the test is $P < 0.001$, since $t_0 = 4.85$ with 27 df exceeds $t_{0.0005,27} = 3.69$, so the area in each tail is less than 0.0005.
(ii) A 95% CI for the difference in means $\mu_A - \mu_B$ is
$$(\bar{x}_A - \bar{x}_B) - t_{\alpha/2,\nu}\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} \le \mu_A - \mu_B \le (\bar{x}_A - \bar{x}_B) + t_{\alpha/2,\nu}\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$
$$(20 - 12) - 2.052\sqrt{\frac{2^2}{25} + \frac{8^2}{25}} \le \mu_A - \mu_B \le (20 - 12) + 2.052\sqrt{\frac{2^2}{25} + \frac{8^2}{25}}$$
$$4.616 \le \mu_A - \mu_B \le 11.384$$
From the CI, since $\mu_A - \mu_B = 0$ is not in the interval, we reject $H_0$. There is strong evidence that $\mu_A$ and $\mu_B$ are different.
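The degrees-of-freedom formula for unequal variances is the step most prone to arithmetic slips. The sketch below recomputes it for the rubber-wear data; the function name is illustrative, and only the summary statistics stated in the example are used.

```python
from math import sqrt

def welch_t_statistic(xbar1, s1, n1, xbar2, s2, n2):
    """t statistic and approximate degrees of freedom when variances are unequal."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # estimated variances of each sample mean
    t0 = (xbar1 - xbar2) / sqrt(v1 + v2)     # H0: mu1 - mu2 = 0
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t0, nu

# Company A is sample 1, company B is sample 2.
t0, nu = welch_t_statistic(20, 2, 25, 12, 8, 25)
print(f"t0 = {t0:.2f}, nu = {nu:.2f}")
```

The computed $\nu$ is not an integer in general; it is rounded to a whole number before the t-table is consulted.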
7.3 Testing the difference between two proportions, $p_1$ and $p_2$
If the parameter of interest is the difference between the proportions of two populations, the test can be performed as follows:
Step 1: The parameter of interest is the difference between the two proportions, $p_1 - p_2$.
Step 2: $H_0: p_1 - p_2 = 0$ versus $H_1: p_1 - p_2 \ne 0$ or $H_1: p_1 - p_2 > 0$ or $H_1: p_1 - p_2 < 0$
Step 3: Test statistic:
$$Z_0 = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim N(0, 1), \qquad \text{where } \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}.$$
Step 4: The critical value at significance level $\alpha$ is $z_{\alpha}$ for a one-tailed (one-sided) test and $z_{\alpha/2}$ for a two-tailed (two-sided) test.
Step 5: i. The critical region:
Alternative hypothesis, $H_1$ | Rejection criterion (reject $H_0$ if)
Two-tailed test, $p_1 - p_2 \ne 0$ | $z_0 > z_{\alpha/2}$ or $z_0 < -z_{\alpha/2}$
Upper-tailed test, $p_1 - p_2 > 0$ | $z_0 > z_{\alpha}$
Lower-tailed test, $p_1 - p_2 < 0$ | $z_0 < -z_{\alpha}$
ii. P-value approach:
Alternative hypothesis, $H_1$ | Reject $H_0$ if $P < \alpha$
Two-tailed test, $p_1 - p_2 \ne 0$ | P-value $= 2[1 - \Phi(|z_0|)]$
Upper-tailed test, $p_1 - p_2 > 0$ | P-value $= 1 - \Phi(z_0)$
Lower-tailed test, $p_1 - p_2 < 0$ | P-value $= \Phi(z_0)$
iii. $100(1-\alpha)\%$ confidence intervals:
Alternative hypothesis, $H_1$ | Reject $H_0$ if $p_1 - p_2 = 0$ falls outside the interval
Two-tailed test, $p_1 - p_2 \ne 0$ | $(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
Upper-tailed test, $p_1 - p_2 > 0$ | $(\hat{p}_1 - \hat{p}_2) - z_{\alpha}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \le p_1 - p_2$ (lower confidence bound)
Lower-tailed test, $p_1 - p_2 < 0$ | $p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z_{\alpha}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$ (upper confidence bound)
Step 6: Calculate the test statistic $Z_0$ from step 3.
Step 7: Decision: draw a conclusion based on the criteria in step 5.
Example 7.4:
In a study on the effects of sodium-restricted diets on hypertension, 24 out of 55 hypertensive patients were on sodium-restricted diets, and 36 out of 149 non-hypertensive patients were on sodium-restricted diets.
i. Test the hypothesis that the proportion of patients on sodium-restricted diets is higher for hypertensive patients at $\alpha = 0.05$.
ii. What is the P-value for this test?
iii. Construct a two-sided 95% CI and comment.
Solution:
i. (1) The parameters of interest are the proportion of hypertensive patients on sodium-restricted diets, $p_A$, and the proportion of non-hypertensive patients on sodium-restricted diets, $p_B$.
(2) Hypotheses: $H_0: p_A = p_B$ vs $H_1: p_A > p_B$
(3) Test statistic:
$$Z_0 = \frac{(\hat{p}_A - \hat{p}_B) - (p_A - p_B)}{\sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}, \qquad \text{where } \hat{p} = \frac{x_A + x_B}{n_A + n_B}$$
(4) Critical value: $\alpha = 0.05$, $z_{\alpha} = 1.65$
(5) Critical region: reject $H_0$ if $z_0 > 1.65$
(6) Computation: with $n_A = 55$, $n_B = 149$, $x_A = 24$, $x_B = 36$, $\hat{p}_A = 0.44$, $\hat{p}_B = 0.24$,
$$\hat{p} = \frac{24 + 36}{55 + 149} = 0.29, \qquad z_0 = \frac{0.44 - 0.24}{\sqrt{0.29(1 - 0.29)\left(\dfrac{1}{55} + \dfrac{1}{149}\right)}} = 2.79$$
(7) Result and conclusion:
Since $z_0 > 1.65$, we reject $H_0$. There is enough evidence to claim that the proportion of patients on sodium-restricted diets is higher among hypertensive patients than among non-hypertensive patients.
ii. P-value $= 1 - \Phi(2.79) = 1 - 0.9974 = 0.0026$. Since the P-value is less than 0.05, we reject the null hypothesis. There is enough evidence to claim that the proportion of patients on sodium-restricted diets is higher among hypertensive patients than among non-hypertensive patients.
iii. A 95% CI on the difference between the two proportions is
$$(\hat{p}_A - \hat{p}_B) - z_{\alpha/2}\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}} \le p_A - p_B \le (\hat{p}_A - \hat{p}_B) + z_{\alpha/2}\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}$$
$$(0.44 - 0.24) - (1.96)\sqrt{\frac{(0.44)(0.56)}{55} + \frac{(0.24)(0.76)}{149}} \le p_A - p_B \le (0.44 - 0.24) + (1.96)\sqrt{\frac{(0.44)(0.56)}{55} + \frac{(0.24)(0.76)}{149}}$$
$$0.052 \le p_A - p_B \le 0.348$$
From the CI, since $p_A - p_B = 0$ is not in the interval, we reject $H_0$. There is a significant difference between the two proportions of patients.
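The sodium-diet example can be recomputed directly from the counts. The sketch below is illustrative (the function name is hypothetical); it works with unrounded sample proportions, so its results differ slightly from a hand calculation that rounds the proportions to two decimals first.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Upper-tailed z test of H0: p1 = p2 and a two-sided CI for p1 - p2."""
    std_normal = NormalDist()
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                      # pooled estimate under H0
    z0 = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    p_value = 1 - std_normal.cdf(z0)                    # H1: p1 > p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled, for the CI
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return z0, p_value, (p1 - p2 - z_crit * se, p1 - p2 + z_crit * se)

# Hypertensive patients are sample 1, non-hypertensive patients are sample 2.
z0, p_value, (lower, upper) = two_proportion_z_test(24, 55, 36, 149)
print(f"z0 = {z0:.2f}, P-value = {p_value:.4f}")
print(f"{lower:.3f} <= pA - pB <= {upper:.3f}")
```

Note that the test uses the pooled proportion in the standard error, while the confidence interval uses the two sample proportions separately, as in the formulas of this section.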
7.3 Testing the difference between two variances, $\sigma_1^2$ and $\sigma_2^2$
If the parameter of interest is the difference between the variances of two populations, the test can be performed as follows:
Step 1: The parameters of interest are the two variances, $\sigma_1^2$ and $\sigma_2^2$.
Step 2: $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \ne \sigma_2^2$ or $H_1: \sigma_1^2 > \sigma_2^2$ or $H_1: \sigma_1^2 < \sigma_2^2$
Step 3: Test statistic:
$$F_0 = \frac{s_1^2}{s_2^2}$$
Step 4: The critical value at significance level $\alpha$ is $f_{\alpha}$ for a one-tailed (one-sided) test and $f_{\alpha/2}$ for a two-tailed (two-sided) test, with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Lower-tail values follow from $f_{1-\alpha,u,v} = 1/f_{\alpha,v,u}$.
Step 5: i. The critical region:
Alternative hypothesis, $H_1$ | Rejection criterion (reject $H_0$ if)
Two-tailed test, $\sigma_1^2 \ne \sigma_2^2$ | $F_0 > f_{\alpha/2,\,n_1-1,\,n_2-1}$ or $F_0 < f_{1-\alpha/2,\,n_1-1,\,n_2-1}$
Upper-tailed test, $\sigma_1^2 > \sigma_2^2$ | $F_0 > f_{\alpha,\,n_1-1,\,n_2-1}$
Lower-tailed test, $\sigma_1^2 < \sigma_2^2$ | $F_0 < f_{1-\alpha,\,n_1-1,\,n_2-1}$
ii. $100(1-\alpha)\%$ confidence interval on the ratio of two variances:
Alternative hypothesis, $H_1$ | Reject $H_0$ if $\sigma_1^2/\sigma_2^2 = 1$ falls outside the interval
Two-tailed test, $\sigma_1^2 \ne \sigma_2^2$ | $\dfrac{s_1^2}{s_2^2} f_{1-\alpha/2,\,n_2-1,\,n_1-1} \le \dfrac{\sigma_1^2}{\sigma_2^2} \le \dfrac{s_1^2}{s_2^2} f_{\alpha/2,\,n_2-1,\,n_1-1}$
Upper-tailed test, $\sigma_1^2 > \sigma_2^2$ | $\dfrac{s_1^2}{s_2^2} f_{1-\alpha,\,n_2-1,\,n_1-1} \le \dfrac{\sigma_1^2}{\sigma_2^2}$ (lower confidence bound)
Lower-tailed test, $\sigma_1^2 < \sigma_2^2$ | $\dfrac{\sigma_1^2}{\sigma_2^2} \le \dfrac{s_1^2}{s_2^2} f_{\alpha,\,n_2-1,\,n_1-1}$ (upper confidence bound)
Step 6: Calculate the test statistic $F_0$ from step 3.
Step 7: Decision: draw a conclusion based on the criteria in step 5.
Example 7.5:
A random sample of 12 air pollution index readings at the UTP station produced a variance of 0.0340, while a random sample of another 13 air pollution index readings at the Tronoh station produced a variance of 0.0525.
(i) Are the population variances equal? Use $\alpha = 0.05$.
(ii) Find the 95% two-sided confidence interval on the ratio of the two variances.
Solution:
i. (1) The parameters of interest are the two variances of the pollution indexes, $\sigma_1^2$ and $\sigma_2^2$.
(2) Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \ne \sigma_2^2$
(3) Test statistic:
$$F_0 = \frac{s_1^2}{s_2^2}$$
(4) Critical value: $\alpha = 0.05$, $f_{\alpha/2,12,13} = 3.15$
(5) Critical region: reject $H_0$ if
$$F_0 > f_{0.025,12,13} = 3.15 \quad \text{or} \quad F_0 < f_{0.975,12,13} = \frac{1}{3.15} = 0.32$$
(6) Computation:
$$F_0 = \frac{s_1^2}{s_2^2} = \frac{0.0340}{0.0525} = 0.648$$
(7) Result and conclusion:
Since $0.32 < F_0 < 3.15$, we cannot reject $H_0$. There is not enough evidence to say that the variances of the two pollution indexes are different.
ii. A 95% two-sided confidence interval on the ratio of the two variances of the pollution indexes is
$$\frac{s_1^2}{s_2^2} f_{1-\alpha/2,\,n_2-1,\,n_1-1} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2} f_{\alpha/2,\,n_2-1,\,n_1-1}$$
$$\frac{0.0340}{0.0525}\left(\frac{1}{3.15}\right) \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{0.0340}{0.0525}(3.15)$$
$$0.206 \le \frac{\sigma_1^2}{\sigma_2^2} \le 2.041$$
From the CI, since $\sigma_1^2/\sigma_2^2 = 1$ is in the interval, we cannot reject $H_0$. There is not enough evidence to say that the variances of the two pollution indexes are different.
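The variance-ratio computation can be sketched as follows. Two assumptions are made loudly here: the table value 3.15 is taken from the example exactly as given and used for both tails (the lower critical value is its reciprocal), and the function name is illustrative. With $n_1 = 12$ and $n_2 = 13$ the degrees of freedom $n_1 - 1$ and $n_2 - 1$ are 11 and 12, so a reader may wish to repeat the calculation with table values for those degrees of freedom.

```python
def variance_ratio_test(s1_sq, s2_sq, f_upper):
    """Two-tailed F test of H0: sigma1^2 = sigma2^2 and a CI for the ratio.

    f_upper is a table value supplied by the caller; the lower critical
    value is approximated here by its reciprocal, as in the example.
    """
    f0 = s1_sq / s2_sq
    f_lower = 1 / f_upper
    reject_h0 = f0 > f_upper or f0 < f_lower
    ci = (f0 * f_lower, f0 * f_upper)
    return f0, reject_h0, ci

f0, reject_h0, (lower, upper) = variance_ratio_test(0.0340, 0.0525, f_upper=3.15)
print(f"F0 = {f0:.3f}, reject H0: {reject_h0}")
print(f"{lower:.3f} <= sigma1^2/sigma2^2 <= {upper:.3f}")
```

Because the interval contains 1, the sketch agrees with the conclusion of the example.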
Exercise 7
1. A random sample of size $n_1 = 25$ taken from a normal population with $\sigma_1 = 5.2$ has a mean equal to 81. A second random sample of size $n_2 = 36$, taken from a different normal population with $\sigma_2 = 3.4$, has a mean equal to 76.
(a) Do the data indicate that the true mean values $\mu_1$ and $\mu_2$ are different? Carry out a test at $\alpha = 0.01$.
(b) Find a 90% CI on the difference in means.
2. Two machines are used for filling plastic bottles with a net volume of 16.0 oz. The fill volume can be assumed normal with $\sigma_1 = 0.02$ and $\sigma_2 = 0.025$. A member of the quality engineering staff suspects that both machines fill to the same mean net volume, whether or not this volume is 16.0 oz. A random sample of 10 bottles is taken from the output of each machine with the following results:
(a) Do you think the engineer is correct? Use the P-value approach.
(b) Find a 95% CI on the difference in means.
3. Two machines are used to fill plastic bottles with dishwashing detergent. The standard deviations of fill volume are known to be $\sigma_1 = 0.01$ and $\sigma_2 = 0.15$ fluid ounce for the two machines, respectively. Two random samples of $n_1 = 12$ bottles from machine 1 and $n_2 = 10$ bottles from machine 2 are selected, and the sample mean fill volumes are $\bar{x}_1 = 30.61$ and $\bar{x}_2 = 30.24$ fluid ounces. Assume normality.
(a) Test the hypothesis that both machines fill to the same mean volume. Use the P-value approach;
(b) Construct a 90% two-sided CI on the mean difference in fill volume; and
(c) Construct a 95% two-sided CI on the mean difference in fill volume. Compare and comment on the width of this interval relative to the width of the interval in part (b).
4. To find out whether a new serum will arrest leukemia, 9 mice, all with an advanced stage of the disease, are selected. 5 mice receive the treatment and 4 do not. Survival times, in years, from the time the experiment commenced are as follows:
Treatment: 2.1 5.3 1.4 4.6 0.9
No treatment: 1.9 0.5 2.8 3.1
At the 0.05 level of significance, can the serum be said to be effective? Assume the two distributions have equal variances.
5. A new policy regarding overtime pay was implemented. This policy decreased the pay factor for overtime work. Neither the staffing pattern nor the work loads changed. To determine if overtime loads changed under the policy, a random sample of employees was selected. Their overtime hours for a randomly selected week before and for another randomly selected week after the policy change were recorded as follows:
Employees: 1 2 3 4 5 6 7 8 9 10 11 12
Before: 5 4 2 8 10 4 9 3 6 4 1 5
After: 3 7 5 3 7 4 4 1 2 3 2 2
Assume that the two population variances are equal and the underlying population is normally distributed.
(a) Is there any evidence to support the claim that the average number of hours worked as overtime per week changed after the policy went into effect? Use a P-value approach in arriving at this conclusion.
(b) Construct a 95% CI for the difference in means before and after the policy change. Interpret this interval.
77
/. The diameter of steel rods manufactured on t%o different e'trusion machines is being
investigated. T%o random samples of si2es n
1
F 1! and n
2
F 15 are selected, and
respectively. ,ssume that data are dra%n
normal distribution %ith e8ual variances.
AaB Is there evidence to support the claim that the t%o machines produce rods %ith
different mean diameters D Mse the p P value approach.
AbB Construct a 7!C CI on the difference in mean rod diameter.
7. The following data represent the running times of films produced by 2 motion-picture companies. Test the hypothesis that the average running time of films produced by company 2 exceeds the average running time of films produced by company 1 by 10 minutes against the one-sided alternative that the difference is less than 10 minutes. Use α = 0.01 and assume the distributions of times to be approximately normal with unequal variances.
Company   Time (minutes)
1         102  86  98  109  92
2         81  165  97  134  92  87  114
8. Two companies manufacture a rubber material intended for use in an automotive application. 25 samples of material from each company are tested, and the amount of wear after 1000 cycles is observed. For company 1, the sample mean and standard deviation of wear are $\bar{x}_1 = 20.12$ mg/1000 cycles and $s_1 = 1.9$ mg/1000 cycles, and for company 2 we obtain $\bar{x}_2 = 11.64$ mg/1000 cycles and $s_2 = 7.9$ mg/1000 cycles.
(a) Do the sample data support the claim that the two companies produce material with different mean wear? Assume each population is normally distributed but with unequal variances.
(b) Construct a 95% CI for the difference in mean wear of these two companies. Interpret this interval.
9. Professor A claims that a probability and statistics student can increase his or her score on tests if the person is provided with a pre-test the week before the exam. To test her theory she selected 16 probability and statistics students at random and gave these students a pre-test the week before an exam. She also selected an independent random sample of 12 students who were given the same exam but did not have access
to the pre(test. The first group had a mean score of 57.4 %ith standard deviation 6.6.
The second group had sample mean score 51.2 %ith standard deviation 5.7.
(a) Do the data support Professor A's claim that the mean score of students who get a pre-test is different from the mean score of those who do not get a pre-test before an exam? Use the P-value approach and assume that their variances are not equal.
(b) Construct a 95% CI for the difference in mean score of students who get a pre-test and those who do not get a pre-test before an exam. Interpret this interval.
10. A vote is to be taken among residents of a town and the surrounding county to determine whether a proposed chemical plant should be constructed. If 120 of 200 town voters favour the proposal and 240 of 500 county residents favour it, would you agree that the proportion of town voters favouring the proposal is higher than the proportion of county voters? Use α = 0.05.
11. The rollover rate of sport utility vehicles is a transportation safety issue. Safety advocates claim that manufacturer A's vehicle has a higher rollover rate than that of manufacturer B. One hundred crashes for each of these vehicles were examined. The rollover rates were $p_A = 0.35$ and $p_B = 0.25$.
(a) By using the P-value approach, does manufacturer A's vehicle have a higher rollover rate than manufacturer B's?
(b) Construct a 95% one-sided CI on the difference in the two rollover rates of the vehicles. Interpret this interval.
12. Professor Rady gave 58 A's and B's to a class of 125 students in his section of English 101. The next term Professor Hady gave 45 A's and B's to a class of 115 students in his section of English 101.
(a) By using a 5% significance level, test the claim that Professor Rady gives a higher percentage of A's and B's in English 101 than Professor Hady does. What is your comment?
(b) Construct a 95% one-sided CI on the difference in the percentage of A's and B's in English 101 given by these two professors.
13. The diameter of steel rods manufactured on two different extrusion machines is being investigated. Two random samples of sizes $n_1 = 15$ and $n_2 = 17$ are selected, and the sample means and sample variances are $\bar{x}_1 = 8.73$, $s_1^2 = 0.35$ and $\bar{x}_2 = 8.68$, $s_2^2 = 0.40$, respectively.
(a) Is there evidence to conclude that the variance of the diameter of steel rods is different for the two machines? Use the P-value approach.
(b) Construct a 95% two-sided CI on the difference in mean rod diameter.
14. Professor A claims that a probability and statistics student can increase his or her score on tests if the person is provided with a pre-test the week before the exam. To test her theory she selected 16 probability and statistics students at random and gave these students a pre-test the week before an exam. She also selected an independent random sample of 12 students who were given the same exam but did not have access
to the pre(test. The first group had a mean score of 57.4 %ith standard deviation 6.6.
The second group had sample mean score 51.2 %ith standard deviation 5.7.
(a) Do the data support Professor A's claim that the mean score of students who get a pre-test is different from the mean score of those who do not get a pre-test before an exam? Use the P-value approach and assume that their variances are not equal.
(b) Construct a 95% two-sided CI for the difference in mean score of students who get a pre-test and those who do not get a pre-test before an exam. Interpret this interval.
1!. The melting points of t%o alloys used %ere investigated by melting 1! samples of
each material. The sample standard deviation for alloy 1 %as 2.34
o
? and for alloy 2
%as 2.!
o
?.
(a) Do the sample data support a claim that both alloys have the same variance of melting point? Use α = 0.05.
(b) Construct a 95% two-sided confidence interval on the ratio of the two variances.
16. A study was conducted to test whether there are differences between the variances of petrol consumption of two types of petrol, RON95 and RON97. Five cars were selected at random and the petrol consumption in km/liter for each petrol type was obtained as follows:
Km per liter
RON95 RON97
Car 1 6.7 7.2
Car 2 5.! 5.6
Car 3 6.2 6.!
Car 4 6./ 6.6
Car ! 7.! 7.4
(a) Do the sample data support a claim that both petrol types have the same variance of petrol consumption? Use α = 0.05.
(b) Construct a 95% two-sided confidence interval on the ratio of the two variances.
oooOOOooo
Chapter 8
8. Simple Linear Regression
Learning Outcomes:
At the end of the lesson, the student should be able to
Use the least squares method to estimate the intercept and slope of the linear regression model
Carry out tests to determine if the model obtained is an adequate fit to the data
Construct confidence intervals on regression parameters
8.1 Introduction
Regression models are statistical models which describe the variation in one (or more) variable(s) when one or more other variable(s) vary. Inference based on such models is known as regression analysis. Two types of regression model are considered here: the simple linear regression model and the multiple linear regression model.
Simple linear regression is a linear regression model with a single predictor variable. In other words, simple linear regression fits a straight line through the set of n points in such a way that the sum of squared residuals of the model (that is, the vertical distances between the points of the data set and the fitted line) is as small as possible.
The word "simple" refers to the fact that this regression model has only one response (dependent) variable and one independent variable. The dependent variable, y, is also referred to as the response, and the independent variable, x, is also referred to as the regressor or predictor variable. The slope of the fitted line equals the correlation between y and x multiplied by the ratio of the standard deviations of these variables, $s_y / s_x$. The intercept of the fitted line is such that the line passes through the center of mass $(\bar{x}, \bar{y})$ of the data points. The statistical relation between x and y may be expressed as follows:
\[ Y = \beta_0 + \beta_1 x + \varepsilon \tag{8.1} \]
where $\beta_0$ is called the intercept of the regression and $\beta_1$ is the slope of the regression. These two parameters are called the regression coefficients. The slope, $\beta_1$, can be interpreted as the change in the mean value of Y for a unit change in x.
The random error term, $\varepsilon$, is assumed to follow the normal distribution with a mean of 0 and a variance of $\sigma^2$. Since Y is the sum of this random term and the mean value, E(Y) (which is a constant), the variance of Y at any given value of x is also $\sigma^2$. Therefore, at any given value of x, say $x_i$, the dependent variable Y follows a normal distribution with a mean of $\beta_0 + \beta_1 x_i$ and a standard deviation of $\sigma$.
8.2 Fitted Regression Model
The true regression line corresponding to Equation (8.1) is usually unknown. However, the regression line can be estimated by estimating the coefficients $\beta_0$ and $\beta_1$ from an observed data set. The estimates, $\hat{\beta}_0$ and $\hat{\beta}_1$, are calculated using the least squares method. The estimated regression line, obtained using the values of $\hat{\beta}_0$ and $\hat{\beta}_1$, is called the fitted line. The least squares estimates, $\hat{\beta}_0$ and $\hat{\beta}_1$, are given by
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} \tag{8.2} \]
and
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \tag{8.3} \]
where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the mean of all the observed values and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the mean of all values of the predictor variable at which the observations were taken.
Once $\hat{\beta}_0$ and $\hat{\beta}_1$ are known, the fitted regression model can be written as:
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \tag{8.4} \]
where $\hat{y}$ is the fitted or estimated value based on the fitted regression model. It is an estimate of the mean value, E(Y). The fitted value, $\hat{y}_i$, for a given value of the predictor variable, $x_i$, may be different from the corresponding observed value, $y_i$. The difference between the two values is called the residual,
\[ e_i = y_i - \hat{y}_i \tag{8.5} \]
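The least squares estimates and residuals described in this section can be sketched in a few lines of Python. This is only an illustration: the function names and the five data points are hypothetical and do not come from the text.

```python
def least_squares_fit(x, y):
    """Return (b0, b1): the least squares intercept and slope."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # Slope: corrected sum of cross products over corrected sum of squares
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x ** 2 / n)
    # Intercept: forces the line through the point (x_bar, y_bar)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


def residuals(x, y, b0, b1):
    """Return the residuals e_i = y_i - y_hat_i."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical illustrative data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_fit(x, y)
e = residuals(x, y, b0, b1)
print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x")
print("sum of residuals:", round(sum(e), 10))
```

For these data the fitted line is approximately y_hat = 0.05 + 1.99x, and the residuals sum to zero (up to rounding), which is a general property of a least squares line with an intercept.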
8.3 Assessment of the Regression Model
The fitted regression model, Equation (8.4), can be used to estimate the value of the response, y, for a certain value of the predictor variable x.
The regression model can be evaluated by three methods of assessment: the error of estimate, $\hat{\sigma}$; the coefficient of determination, $R^2$; and testing the slope of the regression.
8.3.1 Method 1: The error of estimate, $\hat{\sigma}$
The error of estimate is the square root of the error sum of squares divided by the error degrees of freedom, n - 2, i.e. $\hat{\sigma} = \sqrt{SS_E / (n - 2)}$. The smaller $\hat{\sigma}$ is, the more successful the linear regression model is in explaining the response, y.
8.3.2 Method 2: The coefficient of determination, $R^2$
The coefficient of determination can be interpreted as the proportion of variability in the observed response variable that is explained by the linear regression model. It measures the strength of the linear relationship and is given by
\[ R^2 = 1 - \frac{SS_E}{SS_T} \]
The greater $R^2$ is, the more successful the linear regression model.
8.3.3 Method 3: Testing the slope, $\beta_1$
The significance of the fitted regression model can be tested by using Student's t-test on the slope parameter, $\beta_1$. The test statistic is
\[ t_0 = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} \]
If $|t_0| > t_{\alpha/2,\, n-2}$, then the null hypothesis, $H_0: \beta_1 = 0$, is rejected. This means that the regression model is adequate and fits the data; otherwise there is insufficient evidence of a linear relationship between x and y.
8.3.4 The ANOVA Approach
The analysis of variance (ANOVA) can also be used to test for the significance of regression, as in the table below:
ANOVA table for simple linear regression, $Y = \beta_0 + \beta_1 x + \varepsilon$:
Source of variation   Degrees of freedom   Sum of squares   Mean square                F
Regression            1                    $SS_R$           $MS_R = SS_R / 1$          $MS_R / MS_E$
Error                 n - 2                $SS_E$           $MS_E = SS_E / (n - 2)$
Total                 n - 1                $SS_T$
where
$SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is called the regression sum of squares, which measures the variability explained by the regression model.
$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is called the error sum of squares, which measures the unexplained variability in the response.
$SS_T = S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$ is called the total sum of squares, which measures the total variability in the response.
$SS_T$ can be written as
\[ SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = SS_R + SS_E \]
If the value of F is large compared to $F_{\alpha,\, 1,\, n-2}$, then the regression model is significant.
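As a rough sketch of how the three sums of squares, the error of estimate, the coefficient of determination and the F statistic fit together, the following Python fragment computes them for a small data set. The function name, dictionary keys and data are hypothetical illustrations, not part of the text.

```python
import math


def regression_anova(x, y):
    """Decompose the total variation of y for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx                 # least squares slope
    b0 = y_bar - b1 * x_bar          # least squares intercept
    y_hat = [b0 + b1 * xi for xi in x]

    ss_r = sum((yh - y_bar) ** 2 for yh in y_hat)          # regression SS
    ss_e = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error SS
    ss_t = sum((yi - y_bar) ** 2 for yi in y)               # total SS
    ms_e = ss_e / (n - 2)                                    # error mean square
    return {
        "SS_R": ss_r,
        "SS_E": ss_e,
        "SS_T": ss_t,
        "error_of_estimate": math.sqrt(ms_e),
        "R2": 1 - ss_e / ss_t,
        "F": (ss_r / 1) / ms_e,
    }


# Hypothetical illustrative data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

table = regression_anova(x, y)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The computed F would then be compared with the tabulated value of F for 1 and n - 2 degrees of freedom at the chosen significance level; the table lookup is left out of the sketch.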
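8.4 Confidence intervals on regression parameters
A 100(1 - α)% confidence interval on the slope $\beta_1$ in a simple linear regression is given by
\[ \hat{\beta}_1 - t_{\alpha/2,\, n-2}\; se(\hat{\beta}_1) \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2,\, n-2}\; se(\hat{\beta}_1) \]
Similarly, a 100(1 - α)% confidence interval on the intercept $\beta_0$ is given by
\[ \hat{\beta}_0 - t_{\alpha/2,\, n-2}\; se(\hat{\beta}_0) \le \beta_0 \le \hat{\beta}_0 + t_{\alpha/2,\, n-2}\; se(\hat{\beta}_0) \]
where
\[ se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \quad \text{and} \quad se(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right]} \]
with $\hat{\sigma}^2 = SS_E / (n - 2)$ and $S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$.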
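The interval formulas can be tried out with a short Python sketch. It assumes the critical value of t is read from a t table and passed in by hand, since the standard library has no t quantile function; the function name and data are hypothetical.

```python
import math


def slope_intercept_intervals(x, y, t_crit):
    """Return t0 for the slope and confidence intervals for slope and intercept.

    t_crit is the tabulated value t_{alpha/2, n-2}, supplied by the caller.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ss_e = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ss_e / (n - 2)                      # estimate of the error variance
    se_b1 = math.sqrt(sigma2_hat / s_xx)
    se_b0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))
    return {
        "t0": b1 / se_b1,
        "slope_ci": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
        "intercept_ci": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
    }


# Hypothetical illustrative data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_CRIT = 3.182  # t_{0.025, 3} from a t table, for a 95% interval with n = 5

result = slope_intercept_intervals(x, y, T_CRIT)
print("t0 for the slope:", round(result["t0"], 2))
print("95% CI for slope:", result["slope_ci"])
print("95% CI for intercept:", result["intercept_ci"])
```

In this illustration the interval for the slope excludes zero while the interval for the intercept contains it, which matches what the corresponding t-tests would conclude at the same level.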
Example 8.1
The following measurements of the specific heat of a certain chemical were made in order to investigate the variation in specific heat with temperature.
Temperature (°C):  0     10    20    30    40    50
Specific heat:     0.51  0.55  0.57  0.59  0.63  0.65
i. Plot the points on a scatter diagram.
ii. Estimate the regression line of specific heat on temperature.
iii. Estimate the value of the specific heat when the temperature is 35 °C.
Example 8.2
The following data were collected on 8 lung cancer patients, where x measures the number of years the patient smoked cigarettes (or any form of nicotine product) and y is the physician's subjective evaluation of the extent of lung damage on a scale of 0 to 100.
x AyearsB 2! 3! 22 1! 46 37 42 31
8 A4(144B !! /4 !4 34 5! 54 51 !!
An analysis of variance is conducted using a statistical software package and the output is displayed in the table below:
The regression equation is
y = 21.228 + 1.230x
Predictor   Coef     SE Coef   T       P-value
Constant    21.228   9.442     2.248   0.066
x           1.230    0.280     4.397   0.005
S = 8.17169   R-squared = ?   R-squared (adj) = 0.724
Analysis of Variance
Source           DF   SS        MS        F        P-value
Regression       1    1290.84   1290.84   19.331   0.005
Residual Error   6    400.66    66.777
Total            7    1691.50
i. Estimate the predicted physician scores, on the extent of lung damage, for two patients who have smoking habits of 20 and 40 years respectively.
ii. Obtain a 95% confidence interval for the true slope $\beta_1$.
iii. Estimate the coefficient of determination ($R^2$). Discuss briefly the value obtained.
iv. Conduct a hypothesis test on the significance of the regression at level of significance α = 0.05.
v. Find a 95% confidence interval for the physician's subjective evaluation of the extent of lung damage.
Exercise 8
1. The manager of a car plant %ishes to investigate ho% the plant;s electricity usage
depends upon the plant production. The data is given belo%
.roduction
A)millionB
A'B
4.!1 3.!6 4.31 !.4/ !./4 4.77 !.27 !.63 4.5 !./1 4.7 4.2
Electricity
Msage
AyB
2.46 2.2/ 2.45 2.55 2.77 3.4! 3.16 3.4/ 3.43 3.2/ 2./5 2.!3
AaB Estimate the linear regression e8uation
AbB ,n estimate for the electricity usage %hen ' F !
AcB ?ind a 74C Confidence Interval for the electricity usage.
2. ,n e'periment %as set up to investigate the variation of the specific heat of a certain
chemical %ith temperature. The data is given belo%
Temperature
o
?
A'B
!4 /4 54 64 74 144
0eat
AyB
1./4
1./4
1./3
1./!
1./5
1./5
1.54
1.52
1.51
1.52
1.51
1.54
AaB Estimate the linear regression e8uation
AbB .lot the results on a scatter diagram
AcB ,n estimate for the specific heat %hen the temperature is 5!
o
?
AdB ?ind a 7!C Confidence Interval for the specific heat.
3. An engineer at a semiconductor company wants to model the relationship between the device HFE (y) and the parameter Emitter-RS (x). Data for Emitter-RS was first collected and a statistical analysis was carried out; the output is displayed in the table given.
Regression Analysis: y = 1075.2 - 63.87x
Predictor   Coef     SE Coef   T       P-value
Constant    1075.2   121.1     8.88    0.000
x           -63.87   8.002     -7.98   0.000
S = 19.4   R-Sq = 0.78
Analysis of variance
Source       DF   SS      MS      F
Regression   1    23965   23965   63.70
Residual     18   6772    376
Total        19   30737
(a) Estimate HFE when the Emitter-RS is 14.5.
(b) Obtain a 95% confidence interval for the true slope $\beta_1$.
(c) Test for significance of regression for α = 0.05.
4. A chemical engineer wants to model the relationship between the purity of oxygen (y) produced in a chemical distillation process and the percentage of hydrocarbons (x) that are present in the main condenser of the distillation unit. A statistical analysis was carried out and the output is displayed in the table given.
Regression Analysis: y = 74.3 + 14.9x
Predictor   Coef     SE Coef   T       P-value
Constant    74.283   1.593     46.62   0.000
x           14.947   1.317     11.35   0.000
S = 1.087   R-Sq = 87.7%
Analysis of variance
Source       DF   SS       MS       F
Regression   1    152.13   152.13   128.86
Residual     18   21.25    1.18
Total        19   173.38
(a) Estimate the purity of oxygen when the percentage of hydrocarbons is 1%.
(b) Obtain a 95% confidence interval for the true slope $\beta_1$.
(c) Test for significance of regression for α = 0.05.
!. )egression methods %ere used to analy2e the data from a study investigating the
relationship bet%een road%ay surface temperature AxB and pavement deflection A8B.
The data follo%.
Temperature
x
$eflection
8
Temperature
x
$eflection
8
54.4 4./21 52.5 4./35
55.4 4./!5 /5.6 4./25
52.1 4./44 5/./ 4./!2
52.6 4./23 53.4 4./34
56.3 4.//1 54.! 4./25
54.! 4./41 52.1 4./31
54.4 4./35 51.2 4./41
52.4 4./34 53.4 4./31
5!.2 4./44 52.5 4./34
5/.4 4./37 51.4 4./36
AaB Estimate the intercept and slope regression coefficients. >rite the
estimated
regression line.
AbB Compute SS
E
and estimate the variance.
AcB ?ind the standard error of the slope and intercept coefficients.
AdB Sho% that
AeB Compute the coefficient of determination, E
2
. Comment on the value.
AfB Mse a t(test to test for significance of the intercept and slope coefficients at
. &ive the P(values of each and comment on your results.
AgB Construct the ,+9:, table and test for significance of regression using the
P
value. Comment on your results and their relationship to your results in part AfB.
AhB Construct 7!C CIs on the intercept and slope. Comment on the relationship of
these
CIs and your findings in parts AfB and AgB.
/. The designers of a database information system that allo%s its users to search
bac1%ards for several days %anted to develop a formula to predict the time it %ould
be ta1e to search. ,ctually elapsed time %as measured for several different values of
days. The measured data is sho%n in the follo%ing table#
+umber of $ays 1 2 4 6 1/ 2!
Elapsed Time 4./
!
4.57 1.3/ 2.2/ 3.!7 !.37
AaB Estimate the intercept and slope regression coefficients. >rite the estimated
regression line.
AbB Compute SS
E
and estimate the variance.
AcB ?ind the standard error of the slope and intercept coefficients.
AdB Sho% that
AeB Compute the coefficient of determination, E
2
. Comment on the value.
AfB Mse a t(test to test for significance of the intercept and slope coefficients at .
&ive the P(values of each and comment on your results.
AgB Construct the ,+9:, table and test for significance of regression using the P(value.
Comment on your results and their relationship to your results in part AviB.
AhB Construct 7!C CIs on the intercept and slope. Comment on the relationship of these
CIs and your findings in parts AviB and AviiB.
Chapter 9
9. Multiple Linear Regression
Learning Outcomes:
At the end of the lesson, the student should be able to
Use the least squares method to estimate a multiple linear model
Carry out tests to determine if the model obtained is an adequate fit to the data
9.1 Introduction
The general purpose of multiple regression (the term was first used by Pearson, 1908) is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. It is an extension of the simple linear regression model.
Consider the following data consisting of n sets of values:
\[ (y_1, x_{11}, x_{21}, \ldots, x_{k1}),\; (y_2, x_{12}, x_{22}, \ldots, x_{k2}),\; \ldots,\; (y_n, x_{1n}, x_{2n}, \ldots, x_{kn}) \]
The value of the dependent variable $y_i$ is modeled as
\[ Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon \tag{9.1} \]
where $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are called the regression coefficients, which can be estimated by using the least squares method. The random error term, $\varepsilon$, is assumed to follow the normal distribution with a mean of 0 and a variance of $\sigma^2$.
9.2 Fitted Regression Model
The true regression model corresponding to Equation (9.1) is usually unknown. However, the regression model can be estimated by estimating the coefficients $\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ from an observed data set. The estimated regression model, obtained using the values of $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$, is called the fitted model equation. The least squares estimates, $\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$, are given by the solution of the least squares normal equations. These equations can be solved by using matrices. Then we have the fitted regression model as given below:
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k \]
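To illustrate the matrix solution, the sketch below builds the normal equations $(X^T X)\hat{\beta} = X^T y$, where X is the data matrix with a leading column of ones, and solves them by Gaussian elimination. It uses the cooking-time data of Example 9.1. The function names are hypothetical, and a statistics package would normally be used instead of hand-written elimination.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_multiple_regression(predictors, y):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Cooking time y, oven width x1 and temperature x2 from Example 9.1
y = [6.4, 15.05, 18.75, 30.25, 44.86, 48.94, 51.55, 61.5, 100.44, 111.42]
x1 = [1.32, 2.69, 3.56, 4.41, 5.35, 6.3, 7.12, 8.87, 9.8, 10.65]
x2 = [1.15, 3.4, 4.1, 8.75, 14.82, 15.15, 15.32, 18.18, 35.19, 40.4]

coefficients = fit_multiple_regression(list(zip(x1, x2)), y)
fitted = [coefficients[0] + coefficients[1] * a + coefficients[2] * b
          for a, b in zip(x1, x2)]
residual = [yi - fi for yi, fi in zip(y, fitted)]
print("b0, b1, b2 =", [round(c, 3) for c in coefficients])
```

A useful check on any least squares solution is that the residuals are orthogonal to every column of X, so they sum to zero and are uncorrelated with each predictor.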
9.3 Assessment of the Regression Model
The ANOVA Approach
The analysis of variance (ANOVA) can be used to test for the significance of regression, as in the table below:
ANOVA table for multiple linear regression, $Y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon$:
Source of variation   Degrees of freedom   Sum of squares   Mean square                    F
Regression            k                    $SS_R$           $MS_R = SS_R / k$              $MS_R / MS_E$
Error                 n - (k + 1)          $SS_E$           $MS_E = SS_E / (n - (k + 1))$
Total                 n - 1                $SS_T$
where
$SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is called the regression sum of squares, which measures the variability explained by the regression model.
$SS_E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is called the error sum of squares, which measures the unexplained variability in the response.
$SS_T = S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$ is called the total sum of squares, which measures the total variability in the response.
$SS_T$ can be written as
\[ SS_T = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = SS_R + SS_E \]
The ANOVA table is used to test for significance of regression. The hypotheses are:
\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \quad \text{vs} \quad H_1: \text{at least one } \beta_j \ne 0 \]
The test statistic is
\[ F_0 = \frac{MS_R}{MS_E} \]
If the value of $F_0$ is large compared to $F_{\alpha,\, k,\, n-(k+1)}$, then the multiple regression model is significant.
Example 9.1:
A set of experimental runs was made to determine a way of predicting cooking time y at various levels of oven width x1 and temperature x2. The data were recorded as follows:
y        x1      x2
6.4      1.32    1.15
15.05    2.69    3.4
18.75    3.56    4.1
30.25    4.41    8.75
44.86    5.35    14.82
48.94    6.3     15.15
51.55    7.12    15.32
61.5     8.87    18.18
100.44   9.8     35.19
111.42   10.65   40.4
i. Estimate the multiple linear regression equation for the data.
ii. Test whether the regression explained by the model obtained in part (i) is significant at the 0.01 level of significance.
Solution:
i. Using the computer for computations, the following results were obtained.
The regression equation is
Cooking time = 0.568 + 2.706 width + 2.051 temperature
Predictor   Coef    SE Coef   T        P
Constant    0.568   0.585     0.970    0.364
Width       2.706   0.194     13.935   0.000
Temp.       2.051   0.046     44.38    0.000
S = 0.6334   R-Sq = 100%   R-Sq(adj) = 100%
ii. The hypotheses are
\[ H_0: \beta_1 = \beta_2 = 0 \quad \text{vs} \quad H_1: \beta_1 \text{ and } \beta_2 \text{ are not both zero} \]
The following ANOVA table is obtained:
Analysis of Variance
Source           DF   SS          MS         F           P
Regression       2    10953.334   5476.667   13647.872   0.000
Residual Error   7    2.809       0.401
Total            9    10956.143
Since $F > F_{0.01,\, 2,\, 7} = 9.55$, we reject $H_0$. It means that the regression is significant.
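The arithmetic behind this F test is small enough to check directly. The sketch below recomputes the mean squares and $F_0$ from the sums of squares as read from the ANOVA table of this example (n = 10 runs, k = 2 predictors); the function name is hypothetical and the critical value is the tabulated one quoted above.

```python
def regression_f_statistic(ss_r, ss_e, n, k):
    """Return (MS_R, MS_E, F0) for a regression with k predictors and n runs."""
    ms_r = ss_r / k
    ms_e = ss_e / (n - (k + 1))
    return ms_r, ms_e, ms_r / ms_e


# Sums of squares as read from the ANOVA table of the cooking-time example
ms_r, ms_e, f0 = regression_f_statistic(ss_r=10953.334, ss_e=2.809, n=10, k=2)
F_CRITICAL = 9.55  # tabulated F for alpha = 0.01 with 2 and 7 degrees of freedom

print(f"MS_R = {ms_r:.3f}, MS_E = {ms_e:.3f}, F0 = {f0:.1f}")
print("reject H0" if f0 > F_CRITICAL else "do not reject H0")
```

The recomputed $F_0$ differs slightly from the printed table value because the table entries are rounded; the conclusion is unaffected.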
"#ercise /
1. &iven the data#
Test +umber 8 x1 x2
1 1./ 1 1
2 2.1 1 2
3 2.4 2 1
4 2.6 2 2
! 3./ 2 3
/ 3.6 3 2
5 4.3 2 4
6 4.7 4 2
7 !.5 4 3
14 ! 3 4
AaB ?it a multiple linear regression model to these data.
2. &iven the data#
9bservation
+umber .ull Strength 8
>ire *ength
x1 $ie 0eight x2
1 7.7! 2 !4
2 24.4! 6 114
3 31.5! 11 124
4 3!.44 14 !!4
! 2!.42 6 27!
/ 1/.6/ 4 244
5 14.36 2 35!
6 7./4 2 !2
7 24.3! 7 144
14 25.!4 6 344
11 15.46 4 412
12 35.44 11 444
13 41.7! 12 !44
14 11.// 2 3/4
1! 21./! 4 24!
1/ 15.67 4 444
15 /7.44 24 /44
16 14.34 1 !6!
17 34.73 14 !44
24 4/.!7 1! 2!4
21 44.66 1! 274
22 !4.12 1/ !14
23 !/./3 15 !74
24 22.13 / 144
2! 21.1! ! 444
AbB ?it a multiple linear regression model to these data.
3. , study %as performed to investigate the shear strength of soil ,8- as it related to
depth in meter Ax
1
B and percentage moisture content Ax
2
B. Ten observations %ere
collected and the follo%ing summary 8uantities obtained#

/ . !7! , 351 , 6 . 53/ , 144 , 6 . !!4 , 43
, 3!2 , 12 , 527 , 31 , 7 . 244 , !
, 71/ , 1 , !!3 , 223 , 14
2
2 1
2 1
2
2
2
1
2 1






i i i i i
i i i i
i i i
8 8 x 8 x
x x x x
8 x x n
AaB Estimate the parameters to fit the multiple regression models for these data.
AbB >hat is the predicted strength %hen x
1
F16meter and x
2
F 43C.
4. , set of e'perimental runs %ere made to determine a %ay of predicting coo1ing time
8 at various levels of oven %idth x1, and temperature x2. The data %ere recorded as
follo%s#
AaB ?it a multiple linear regression model to these data.
AbB Estimate and the standard errors of the regression coefficients.
AcB Test for significance of and .
AdB .redict the useful range %hen brightness F 64 and contrast F 5!. Construct a 7!C
.I.
AeB Compute the mean response of the useful range %hen brightness F 64 and contrast
F 5!. Compute a 7!C CI.
AfB Interpret parts AdB and AeB and comment on the comparison bet%een the 7!C .I
and 7!C CI.
!. ,n article in Vpti/al EnFineeinF AJ9perating Curve E'traction of a CorrelatorLs
?ilter,K :ol. 43, 2444, pp. 255!P2557B reported the use of an optical correlator to
perform an e'periment by varying brightness and contrast. The resulting modulation
is characteri2ed by the useful range of gray levels. The data are sho%n
"rightness ACB# !
4
/
1
/
!
14
4
14
4
14
4
!4 !5 !4
Contrast ACB# !
/
6
4
5
4
!4 /! 64 2! 3! 2/
Mseful range AngB# 7
/
!
4
!
4
11
2
7/ 64 1!
!
14
4
2!
!
AaB ?it a multiple linear regression model to these data.
AbB Estimate and the standard errors of the regression coefficients.
AcB Test for significance of and .
8 x1 x2
6.4 1.32 1.15
15.05 2.69 3.4
18.75 3.56 4.1
30.25 4.41 8.75
44.86 5.35 14.82
48.94 6.3 15.15
51.55 7.12 15.32
61.5 8.87 18.18
100.44 9.8 35.19
111.42 10.65 40.4
AdB .redict the useful range %hen brightness F 64 and contrast F 5!. Construct a 7!C
.I.
AeB Compute the mean response of the useful range %hen brightness F 64 and contrast
F 5!. Compute a 7!C CI.
AfB Interpret parts AdB and AeB and comment on the comparison bet%een the 7!C .I
and 7!C CI.
/. , study %as performed on %ear of a bearing y and its relationship to '
1
F oil viscosity
and '
2
F load. The follo%ing data %ere obtained#
'
1
1./ 1!.
!
22.4 43.4 33.4 44.4
'
2
6!
1
61/ 14!
6
124
1
13!
5
111
!
y 27
3
234 152 71 113 12!
AaB ?ir a multiple regression model to these data.
AbB Estimate
2
and the standard errors of the regression coefficients.
AcB Mse the model to predict %ear %hen '1 F 2! and '2 F 1444.
AdB ?it a multiple regression model %ith an interaction term to these data.
AeB Estimate
2
and seACB for this ne% model. 0o% did these 8uantities
changeD $oes
this tell you anything about the value of adding the interaction term to the modelD
AfB Mse the model in AdB, to predict %hen '
1
F2! and '
2
F1444. Compare this
prediction %ith the predicted value from part AcB above.
Chapter 10
10. Design of Experiment
At the end of the lesson, the student should be able to:
Design and conduct factorial experiments involving one and two factors using factorial design.
Understand the concept of one-way and two-way ANOVA.
Know how to construct an ANOVA table.
Understand how to use ANOVA to analyze data from these experiments.
Analyze and interpret main effects and interactions.
10.1 Introduction
Experimental design techniques based on statistics are useful in the engineering world for improving the performance of manufacturing processes.
By using designed experiments, we can determine which subsets of the process variables have the most influence on process performance. Among the advantages of using experimental design are improved process yield, reduced variability in the process, reduced design and development time, and reduced cost of operation. Statistically designed experiments allow efficiency and economy in the experimental process. If data are collected without an experimental design, it may not be possible to extract the desired information. Results of the analysis may be confusing, misleading, not credible and not reproducible. Normally, when several factors are of interest in an experiment, a factorial experiment should be used.
10.2 Terminology and definition
Some common terms in experimental design are:
• Factor: a variable whose influence upon the response variable is being studied in the experiment.
• Factor level: the different modes or settings of a factor.
• Trial (or run): the application of a treatment to an experimental unit.
• Treatment or level combination: a specific combination of the levels of different factors.
• Experimental units (subjects): the basic units for which the response measurements are collected.
• Replicates: the number of experimental units on which a particular treatment is applied.
• Factorial experiment: an experiment in which, in each complete replicate, all possible combinations of the levels of the factors are investigated.
• Effect of a factor: the change in response produced by a change in the level of the factor.
• Main effect: the primary factors in the study that change the response variable.
• Interaction effect: the change in the response variable that is due to an interaction between the factors.
10.3 One-Way ANOVA
A factor is a variable that can take one of several levels used to differentiate one group from another. An experiment has a one-way, or completely randomized, design if several levels of one factor are being studied and the individuals are randomly assigned to its levels. (There is only one way to group the data.)
Analysis of variance (ANOVA) is the technique used to determine whether more than two population means are equal. One-way ANOVA is used for completely randomized, one-way designs. The response variable is the variable you're comparing. The factor variable is the categorical variable being used to define the groups. We will assume k samples (groups). It is called one-way because each value is classified in exactly one way. Examples include comparisons by gender, race, color, etc. Some assumptions have to be made: the data are randomly sampled, the variances of the groups are equal, and the residuals are normally distributed. The null hypothesis is that the means are all equal and the alternative hypothesis is that at least one of the means is different. The ANOVA doesn't test that one mean is less than another; it only tests whether all means are equal or at least one is different. The hypotheses are:
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_k \quad \text{vs} \quad H_1: \text{at least one mean is not equal} \]
The one-way ANOVA table is:
Source of variation       SS               DF      MS                                          F                             P
Between group/Treatment   $SS_{treatment}$ k - 1   $MS_{treatment} = SS_{treatment}/(k - 1)$   $MS_{treatment}/MS_{error}$
Within group (Error)      $SS_{error}$     N - k   $MS_{error} = SS_{error}/(N - k)$
Total                     $SS_{total}$     N - 1
Here $N = n_1 + n_2 + \cdots + n_k$ is the total number of observations.
Calculation of SS:
Grand Mean:
The grand mean is the average of all the values when the factor is ignored. It is a weighted average of the individual sample means.
\[ \bar{\bar{x}} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}}{N} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} \]
Total variation:
The variation between the observations and the grand mean, given by
\[ SS_{Total} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}^2 - \frac{\left(\sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}\right)^2}{N} \]
Between Group Variation:
The variation between each sample mean and the grand mean, given by
\[ SS_{Treatment} = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2 = \sum_{i=1}^{k} \frac{\left(\sum_{j=1}^{n_i} x_{ij}\right)^2}{n_i} - \frac{\left(\sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}\right)^2}{N} \]
Within Group Variation:
The weighted total of the individual variations, given by
\[ SS_{Error} = \sum_{i=1}^{k} (n_i - 1) s_i^2, \quad \text{where } s_i^2 \text{ is the variance within group } i, \]
or, equivalently, $SS_{Error} = SS_{Total} - SS_{Treatment}$.
Calculation of degrees of freedom, df:
The between group/treatment df is one less than the number of groups. If we have three groups/treatments, then the df for treatment is 2.
The within group df is the sum of the individual df's of each group. Say the sample sizes are 7, 9, and 8. Then the df for within treatments is 6 + 8 + 7 = 21.
The total df is one less than the total sample size. If the total number of observations is 24, then the df for total is 24 - 1 = 23.
E'ample#
The statistics classroom is divided into three ro%s# front, middle, and bac1. The
.rofessor noticed that the further the students %ere from him, the more li1ely they
%ere to miss class or sleep in the class. 0e %anted to see if the students sit in front
and near to him %ill did better on the e'ams. , random sample of the students in
each ro% %as ta1en. The score for those students on the e'am %as recorded as
122
( ) x x x x
;
i
n
C
iC
;
i
n
C
iC


1 1
%
%
1 1
( ) x x n x x n SS
;
i
i i i
;
i
i @eat7ent



%
1
%
1
B A
@eat7ent @otal Eo
i
i
;
i
i Eo
SS SS SS
s
s n SS

9)
group a %ithin variance the is %here
B A
%
%
1
1
?ront# 62, 63, 75, 73, !!, /5, !3
iddle# 63, 56, /6, /1, 55, !4, /7, !1, /3
"ac1# 36, !7, !!, //, 4!, !2, !2, /1
i. Construct the one(%ay ,+9:, table.
ii. Test %hether the difference in the mean scores of the front,
middle, and bac1
ro%s in class are significance or not.
Solution:
Row       Scores                                Sum of x    Sum of x^2    n
Front     82, 83, 97, 93, 55, 67, 53            530         41,994        7
Middle    83, 78, 68, 61, 77, 54, 69, 51, 63    604         41,494        9
Back      38, 59, 55, 66, 45, 52, 52, 61        428         23,460        8
Total                                           1,562       106,948       24
$N = n_1 + n_2 + n_3 = 7 + 9 + 8 = 24$ (number of observations)
$k = 3$ (number of groups)
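Calculation of GRAND MEAN:
$$\bar{x} = \frac{\sum_{i=1}^{k} n_i\bar{x}_i}{N} = \frac{1562}{24} = 65.08, \qquad N\bar{x}^2 = \frac{\left(\sum_{i=1}^{k} n_i\bar{x}_i\right)^2}{N} = \frac{(1562)^2}{24} = 101{,}660.2$$
Calculation of SS:
$$SS_{Total} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij}-\bar{x}\right)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}x_{ij}^2 - N\bar{x}^2 = 106{,}948 - 101{,}660.2 = 5{,}287.8$$
$$SS_{Treatment} = \sum_{i=1}^{k} n_i\bar{x}_i^2 - N\bar{x}^2 = \frac{(530)^2}{7} + \frac{(604)^2}{9} + \frac{(428)^2}{8} - 101{,}660.2$$
$$= 40{,}128.57 + 40{,}535.11 + 22{,}898 - 101{,}660.2 = 1{,}901.48$$
$$SS_{Error} = SS_{Total} - SS_{Treatment} = 5{,}287.8 - 1{,}901.48 = 3{,}386.32$$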
One-Way ANOVA
Source of Variation        SS         DF    MS        F       P
Between group/Treatment    1901.48    2     950.74    5.90    0.009
Within group (Error)       3386.32    21    161.25
Total                      5287.80    23
$F_{Critical} = f_{0.05,2,21} = 3.47$
Since $F = 5.90 > F_{Critical} = 3.47$, we reject the null hypothesis that the mean scores of
the three rows in class are the same. So we can conclude that at least one row has
a different mean.
We can also use the p-value to make a decision. From the ANOVA table, P-value =
0.009, which is less than the significance level of 0.05, so we reject the null
hypothesis. There is enough evidence to support the claim that there is a difference
in the mean scores of the front, middle, and back rows in class. The ANOVA
does not tell which row is different.
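The one-way ANOVA table for the classroom example can be checked numerically. The sketch below is one possible way to do it in plain Python; the function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the text, and the critical value 3.47 is copied from the solution rather than computed.

```python
def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # N, total number of observations
    grand_sum = sum(sum(g) for g in groups)
    correction = grand_sum ** 2 / n_total    # N * (grand mean)^2

    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_treatment = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_error = ss_total - ss_treatment

    df_treatment, df_error = k - 1, n_total - k
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "ss_treatment": ss_treatment, "ss_error": ss_error, "ss_total": ss_total,
        "df_treatment": df_treatment, "df_error": df_error,
        "ms_treatment": ms_treatment, "ms_error": ms_error,
        "f": ms_treatment / ms_error,
    }

front = [82, 83, 97, 93, 55, 67, 53]
middle = [83, 78, 68, 61, 77, 54, 69, 51, 63]
back = [38, 59, 55, 66, 45, 52, 52, 61]

table = one_way_anova([front, middle, back])
F_CRITICAL = 3.47  # f(0.05; 2, 21), quoted from the text, not computed here
for name, value in table.items():
    print(f"{name:>13}: {value:.2f}")
print("reject H0" if table["f"] > F_CRITICAL else "fail to reject H0")
```

Because the code does not round the correction term to 101,660.2, its sums of squares differ from the hand calculation in the second decimal place (about 1901.52 instead of 1901.48 for the treatment); the F ratio still rounds to 5.90.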
10.4 Two-Way ANOVA
A factorial design can be used to identify factors with significant effects on the
response, to identify (discover) interactions among factors, to identify which
factors have the most important effects on the response, and lastly to decide whether
further investigation of a factor's effect is justified. The $2^k$ factorial design means
that the experiment has been set up with k factors at 2 levels for each factor. The
objective is to test and determine which of the main effects and interactions
of the k factors at 2 levels are important.
The Analysis of Variance (ANOVA) can be used to analyze the data from
experimental designs. With the ANOVA, the null hypothesis that the effect is
equal to 0 is tested. When $H_0$ is rejected, this provides evidence that the factor
involved actually affects the outcome (response). However, some assumptions
should be made before we do the analysis. Among the assumptions are the same
number of replicates for each treatment, at least 2 replications for each cell, and
each treatment being a random sample from a normal population.
10.5 The $2^2$ factorial design
The $2^2$ factorial design means that the experiment has been set up with 2 factors at
2 levels for each factor. The objective is to test and determine which of the
two main effects and the interaction are important for the 2 factors at 2 levels. In
one experiment, an example of two factors is factor A: reaction time and factor
B: reaction temperature. For factor A, the two levels are time at 1 hour and 2
hours. These levels can also be denoted as - (minus) for one level and + (plus) for
the other level. For factor B, the two levels are 35°C (-) and 55°C (+). This can be
explained by the table below:
                              Factor A (Time)
                              1 hour (-)               2 hours (+)
Factor B         35°C (-)     Yields measured:         Yields measured:
(Temperature)                 x_111, x_112, x_113      x_121, x_122, x_123
                 55°C (+)     Yields measured:         Yields measured:
                              x_211, x_212, x_213      x_221, x_222, x_223
where n = 3 replications and $x_{ijk}$, $k = 1, \ldots, n$ are the observations in the cell $(i, j)$.
The levels can be in the form of variable data (numbers) or attribute data such as
male and female, on and off. Normally one level is designated as
high (+) and the other level as low (-), as explained in the table below:
                              Factor A (Time)
                              Low (-)      High (+)
Factor B         Low (-)      (1)          a
(Temperature)    High (+)     b            ab
All possible treatment combinations of the levels of the factors, called a factorial
experiment, are given in a design or test matrix for the $2^2$ factorial design as follows:
Treatment      Factorial Effect
Combination    A    B    AB
(1)            -    -    +
a              +    -    -
b              -    +    -
ab             +    +    +
The letters (1), a, b and ab represent the total of all n observations at each
treatment combination.
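The sign pattern of the design matrix follows a simple rule: the AB column is the product of the A and B columns. A short sketch that generates the table in the order (1), a, b, ab is given below; the function name `design_matrix_2x2` is a hypothetical helper, and -1 and +1 stand in for the low and high levels.

```python
from itertools import product

def design_matrix_2x2():
    """Sign table of the 2^2 design in the order (1), a, b, ab."""
    rows = []
    # B is the outer (slower) index so that the order is (1), a, b, ab
    for b_sign, a_sign in product((-1, +1), repeat=2):
        label = ("a" if a_sign > 0 else "") + ("b" if b_sign > 0 else "") or "(1)"
        rows.append((label, a_sign, b_sign, a_sign * b_sign))  # AB = A * B
    return rows

for label, a, b, ab in design_matrix_2x2():
    signs = " ".join("+" if s > 0 else "-" for s in (a, b, ab))
    print(f"{label:<4} {signs}")
```

Each printed row lists the signs for A, B and AB, matching the design matrix row by row.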
10.6 Estimate the effects of factors in the $2^2$ factorial design
The effects of the main factors, factor A and factor B, and the effect of the AB interaction
can be calculated using the formulas below.
Effect of main factor A is:
$$A = \frac{a + ab}{2n} - \frac{b + (1)}{2n} = \frac{1}{2n}\left[a + ab - b - (1)\right]$$
Effect of main factor B is:
$$B = \frac{b + ab}{2n} - \frac{a + (1)}{2n} = \frac{1}{2n}\left[b + ab - a - (1)\right]$$
Effect of AB interaction is:
$$AB = \frac{ab + (1)}{2n} - \frac{a + b}{2n} = \frac{1}{2n}\left[ab + (1) - a - b\right]$$
where the quantity in square brackets [...] is called the contrast.
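The three effect formulas differ only in the signs attached to the four treatment totals. As a rough illustration, assuming invented totals and a hypothetical function name `factorial_effects`, the estimates can be computed like this:

```python
def factorial_effects(one, a, b, ab, n):
    """Effect estimates for a 2^2 design from the four treatment totals.

    one, a, b, ab are the totals of the n replicates at (1), a, b and ab.
    """
    contrasts = {
        "A": a + ab - b - one,
        "B": b + ab - a - one,
        "AB": ab + one - a - b,
    }
    # Each effect is its contrast divided by 2n
    return {name: c / (2 * n) for name, c in contrasts.items()}

# Hypothetical totals of n = 2 replicates each (numbers invented for illustration)
effects = factorial_effects(one=20, a=30, b=24, ab=46, n=2)
print(effects)
```

For these made-up totals the A effect is 8: the average response at the high level of A, (30 + 46)/4 = 19, minus the average at the low level, (20 + 24)/4 = 11.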
10.7 Sum of squares formula for ANOVA
The effects of the main factors and the interaction can be tested by using the
two-way ANOVA table.
ANOVA Table:
Source of    Degrees of    Sum of     Mean                      F             P-Value
Variation    Freedom       Squares    Squares
A            1             SS_A       MS_A = SS_A/1             MS_A/MS_E
B            1             SS_B       MS_B = SS_B/1             MS_B/MS_E
AB           1             SS_AB      MS_AB = SS_AB/1           MS_AB/MS_E
Error        4n - 4        SS_E       MS_E = SS_E/(4n - 4)
Total        4n - 1        SS_T
The sum of squares for factor A, $SS_A$, is given by
$$SS_A = \frac{1}{4n}\left[a + ab - b - (1)\right]^2 = \frac{(Contrast_A)^2}{4n}$$
The sum of squares for factor B, $SS_B$, is given by
$$SS_B = \frac{1}{4n}\left[b + ab - a - (1)\right]^2 = \frac{(Contrast_B)^2}{4n}$$
The sum of squares for the interaction AB, $SS_{AB}$, is given by
$$SS_{AB} = \frac{1}{4n}\left[ab + (1) - a - b\right]^2 = \frac{(Contrast_{AB})^2}{4n}$$
The total sum of squares, $SS_T$, is given by
$$SS_T = \sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{n} x_{ijk}^2 - \frac{1}{4n}\left(\sum_{i=1}^{2}\sum_{j=1}^{2}\sum_{k=1}^{n} x_{ijk}\right)^2$$
The error sum of squares is obtained by subtraction:
$$SS_E = SS_T - SS_A - SS_B - SS_{AB}$$
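Since both the effect estimate and the sum of squares are built from the same contrast, one can be written in terms of the other. The short derivation below uses only the contrast-based formulas for the effect of A and for its sum of squares; the same steps apply to B and AB.

```latex
\begin{aligned}
\text{Contrast}_A &= a + ab - b - (1) = 2n\,A \\
SS_A &= \frac{(\text{Contrast}_A)^2}{4n} = \frac{(2n\,A)^2}{4n} = n\,A^2
\end{aligned}
```

For instance, with n = 3 replicates and an estimated effect A = 2, this gives a sum of squares of 3 x 2^2 = 12.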
10.8 Least squares regression model
An initial estimated regression model is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_1 x_2$$
where
$\hat{\beta}_0$ = constant (grand average of all 4n observations)
$\hat{\beta}_1$ = the estimated coefficient of $x_1$ (the effect of having factor A) = (effect A)/2
$\hat{\beta}_2$ = the estimated coefficient of $x_2$ (the effect of having factor B) = (effect B)/2
$\hat{\beta}_3$ = the estimated coefficient of $x_1 x_2$ (the effect of interaction between factor A
and factor B) = (effect AB)/2
The final regression model can be determined from the ANOVA table. For
instance, if the interaction between A and B is not significant, then the
final regression model is
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$
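The halving of each effect makes sense if the regressors are coded as -1 for the low level and +1 for the high level, which is an assumption of the sketch below rather than something stated explicitly above. With invented estimates and a hypothetical function name `fitted_value`, the full and reduced models can be evaluated at the four design points:

```python
def fitted_value(grand_average, effect_a, effect_b, effect_ab, x1, x2,
                 include_interaction=True):
    """Predicted response at coded levels x1, x2 (each -1 or +1)."""
    b0 = grand_average
    b1, b2, b3 = effect_a / 2, effect_b / 2, effect_ab / 2  # coefficient = effect / 2
    y_hat = b0 + b1 * x1 + b2 * x2
    if include_interaction:
        y_hat += b3 * x1 * x2
    return y_hat

# Hypothetical estimates (invented for illustration only)
grand_average, effect_a, effect_b, effect_ab = 15.0, 8.0, 5.0, 3.0
for x1, x2 in [(-1, -1), (+1, -1), (-1, +1), (+1, +1)]:
    full = fitted_value(grand_average, effect_a, effect_b, effect_ab, x1, x2)
    reduced = fitted_value(grand_average, effect_a, effect_b, effect_ab, x1, x2,
                           include_interaction=False)
    print(x1, x2, full, reduced)
```

Under this coding the full model reproduces the four cell averages exactly, so dropping a non-significant interaction term is the only step that changes the fitted values.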
Example 1:
An engineer is interested in the effect of cutting speed (A) and tool
geometry (B) on the life in hours of a machine tool. Two cutting speeds and
two different geometries are used. Three experimental tests were done at each of
the four combinations. The data are as follows:
Tool Geometry (B)    Cutting Speed (A)
                     Low           High
1                    22  28  20    34  37  29
2                    18  15  16    11  10  10
(a) Construct the $2^2$ factorial design table.
(b) Find the estimates of all effects and the interaction.
(c) Construct the ANOVA table for each effect; test the null hypothesis that the
effect is equal to 0.
Solution:
(a) The $2^2$ factorial design table:
Treatment      Factorial Effect    Life time (hour)
Combination    A    B    AB        Observations    Total    Average
(1)            -    -    +         22  28  20      70       23.33
a              +    -    -         34  37  29      100      33.33
b              -    +    -         18  15  16      49       16.33
ab             +    +    +         11  10  10      31       10.33
(b) Estimates of the effects:
$A = \frac{1}{2n}[a + ab - b - (1)] = \frac{1}{6}[100 + 31 - 49 - 70] = 2$
$B = \frac{1}{2n}[b + ab - a - (1)] = \frac{1}{6}[49 + 31 - 100 - 70] = -15$
$AB = \frac{1}{2n}[(1) + ab - b - a] = \frac{1}{6}[70 + 31 - 49 - 100] = -8$
(c) ANOVA table:
To construct the ANOVA table we have to find the sums of squares for A, B, AB
and Total.
$SS_A = [a + ab - b - (1)]^2/4n = [100 + 31 - 49 - 70]^2/12 = 144/12 = 12$
$SS_B = [b + ab - a - (1)]^2/4n = [49 + 31 - 100 - 70]^2/12 = 8100/12 = 675$
$SS_{AB} = [(1) + ab - b - a]^2/4n = [70 + 31 - 49 - 100]^2/12 = 2304/12 = 192$
$SS_T = (22^2 + 28^2 + 20^2 + 34^2 + 37^2 + 29^2 + 18^2 + 15^2 + 16^2 + 11^2 + 10^2 + 10^2) - (250)^2/12$
$= 6160 - 5208.333 = 951.667$
$SS_E = SS_T - SS_A - SS_B - SS_{AB} = 951.667 - 12 - 675 - 192 = 72.667$
ANOVA Table:
Source of variation    SS         df    MS       F_0
A                      12         1     12       1.321
B                      675        1     675      74.315
AB                     192        1     192      21.138
Error                  72.667     8     9.083
Total                  951.667    11
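The entries of this ANOVA table can be reproduced from the raw tool-life observations. The sketch below is one way to do so in plain Python; the function name `anova_2x2` and the dictionary layout are illustrative choices, not part of the text.

```python
def anova_2x2(cells):
    """Sums of squares and F ratios for a replicated 2^2 design.

    cells maps the treatment labels "(1)", "a", "b", "ab" to lists of
    n observations each.
    """
    n = len(cells["(1)"])
    one, a, b, ab = (sum(cells[key]) for key in ("(1)", "a", "b", "ab"))
    contrasts = {
        "A": a + ab - b - one,
        "B": b + ab - a - one,
        "AB": one + ab - a - b,
    }
    ss = {name: c ** 2 / (4 * n) for name, c in contrasts.items()}
    observations = [x for values in cells.values() for x in values]
    ss_total = sum(x * x for x in observations) - sum(observations) ** 2 / (4 * n)
    ss_error = ss_total - sum(ss.values())
    ms_error = ss_error / (4 * (n - 1))          # error df = 4n - 4
    f0 = {name: value / ms_error for name, value in ss.items()}  # 1 df per effect
    return ss, ss_total, ss_error, ms_error, f0

tool_life = {
    "(1)": [22, 28, 20],
    "a": [34, 37, 29],
    "b": [18, 15, 16],
    "ab": [11, 10, 10],
}
ss, ss_total, ss_error, ms_error, f0 = anova_2x2(tool_life)
for name in ("A", "B", "AB"):
    print(f"{name:<5} SS={ss[name]:8.3f}  F0={f0[name]:7.3f}")
print(f"Error SS={ss_error:.3f}  MS={ms_error:.3f}")
print(f"Total SS={ss_total:.3f}")
```

The F ratios from the code (about 74.312 and 21.138) agree with the table up to the rounding of the error mean square to 9.083 in the hand calculation.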
From the F-table: $F_{0.05,1,8} = 5.32$
Conclusion:
The effect of A is not significant because $F_0 < 5.32$. But because $F_0$ is greater than
5.32 for B and AB, the effects of factor B and the interaction are
significant to the effective life of the machine tool.
Exercise 10
1. UTP wishes to compare four programs for training staff to perform a certain task.
Twenty new staff are randomly assigned to the training programs, with 5 in each
program. At the end of the training, a test is conducted to see how quickly trainees
can perform the task. The number of times the task is performed per minute is
recorded for each trainee and the results are given in the table below:
             Time in minutes
Program 1    9, 12, 14, 11, 13
Program 2    12, 14, 11, 13, 11
Program 3    9, 8, 7, 8, 11
Program 4    10, 6, 9, 9, 10
i. Construct the ANOVA table.
ii. Are there any differences between the four programs?
2. A farmer wants to determine which type of fertilizer is best for growing mango trees
in his orchard. He chooses three types of fertilizers, F1, F2 and F3, and treats two
mango trees with each fertilizer. The number of mangoes from each tree is recorded in
the table below:
Fertilizer    Number of mangoes from each tree
F1            32, 40
F2            43, 42
F3            20, 10
i. Construct the ANOVA table.
ii. Are there any differences between the three fertilizers?
3. An engineer is investigating the thickness of an epitaxial layer which will be subject to
two variations in A, deposition time (+ for short time, and - for long time) and two
levels of B, arsenic flow rate (- for 55% and + for 59%). The engineer conducts a $2^2$
factorial design with n = 4 replicates. The data are as follows:
Deposition Time    Arsenic Level
                   B - (Low, 55%)                    B + (High, 59%)
A - (Long)         14.037, 14.165, 13.972, 13.907    13.880, 13.860, 14.032, 13.914
A + (Short)        14.821, 14.757, 14.843, 14.878    14.888, 14.921, 14.415, 14.932
i. Construct the 2 x 2 factorial design table.
ii. Find the estimates of all effects and the interaction.
iii. Construct the ANOVA table for each effect; test the null hypothesis that the effect
is equal to 0.
4. A two-factor experimental design was conducted to investigate the lifetime of a
component being manufactured. The two factors are A (design) and B (cost of
material). Two levels ((+) and (-)) of each factor are considered. Three components
are manufactured with each combination of design and material, and the total lifetime
measured (in hours) is as shown in the table below:
Treatment      Design    Material    AB    Total lifetime of 3
Combination    A         B                 components (in hours)
(1)            -         -           +     122
a              +         -           -     60
b              -         +           -     120
ab             +         +           +     118
i. Perform a two-way analysis of variance to estimate the effects of design and
material expense on the component lifetime if the total sum of squares is 1050.
ii. Based on your results in part (i), what conclusions can you draw from the
factorial experiment?
iii. Indicate which effects are significant to the lifetime of a component.
iv. Write the least squares fitted model using only the significant sources.
5. An engineer suspects that the surface finish of metal parts is influenced by the type of
paint used and the drying time. He selected two drying times, 20 and 30 minutes, and
used two types of paint. Three parts are tested with each combination of paint type
and drying time. The data are as follows:
Paint     Drying Time (min)
          20 min        30 min
ICI       74, 64, 50    78, 85, 92
NIPPON    92, 86, 68    66, 45, 85
i. Compute the estimates of the effects and their standard errors for this design.
ii. Perform an analysis of variance of the appropriate regression model for this
design. Include in your analysis hypothesis tests for each coefficient, as well as
residual analysis. State your final conclusions about the adequacy of the model.
Compare your results to part (i) and comment.
6. An experiment involves a storage battery used in the launching mechanism of a
shoulder-fired ground-to-air missile. Two material types can be used to make the
battery plates. The objective is to design a battery that is relatively unaffected by the
ambient temperature. The output response from the battery is effective life in hours.
Two temperature levels are selected, and a factorial experiment with four replicates is
run. The data are as follows:
Material    Temperature (°F)
            Low                   High
1           130, 155, 74, 180     20, 70, 82, 58
2           138, 110, 168, 160    96, 104, 82, 60
i. Compute the estimates of the effects and their standard errors for this design.
ii. Perform an analysis of variance of the appropriate regression model for this
design. Include in your analysis hypothesis tests for each coefficient, as well as
residual analysis. State your final conclusions about the adequacy of the model.
Compare your results to part (i) and comment.
7. An article in the IEEE Transactions on Semiconductor Manufacturing (Vol. 5, 1992,
pp. 214-222) describes an experiment to investigate the surface charge on a silicon
wafer. The factors thought to influence induced surface charge are cleaning method
(spin rinse dry or SRD and spin dry or SD) and the position on the wafer where the
charge was measured. The surface charge ($\times 10^{11}$ q/cm$^3$) response data are shown.
Cleaning Method    Test Position
                   L                      R
SD                 1.66, 1.90, 1.92       1.84, 1.84, 1.62
SRD                -4.21, -1.35, -2.08    -7.58, -2.20, -5.36
i. Compute the estimates of the effects and their standard errors for this design.
ii. Perform an analysis of variance of the appropriate regression model for this
design. Include in your analysis hypothesis tests for each coefficient, as well as
residual analysis. State your final conclusions about the adequacy of the model.
Compare your results to part (i) and comment.
oooOOOooo
