PPT07 - Discrete Data Analysis

STAT6174 – Probability Theory and Applied Statistics
Topic 7
Discrete Data Analysis
LEARNING OUTCOMES
LO2: Use appropriate statistical techniques for decision making in real problems
LO3: Use MINITAB software to conduct analyses
LO4: Interpret the results of statistical calculations and the output of analyses using MINITAB
LO5: Explain the appropriate decision based on the solution of a statistical method
OUTLINE
• Inferences on a population proportion
• Comparing two population proportions
• Goodness of fit tests for one-way contingency tables
• Testing for independence in two-way contingency tables
• Application with MINITAB
Inferences on a population proportion
A discrete random variable takes only discrete values.
Examples:
• A random variable measuring the number of errors in a software product: 0, 1, 2, 3, …
• A product's quality level: high, medium, or low
• Machine breakdown: mechanical failure, electrical failure, or operator misuse
• Discrete data analysis is also often referred to as categorical data analysis.
• A discrete data set consists of the frequencies or counts of the observations found at each of the possible levels or cells.
• The data can be represented in a tabular form known as a contingency table and analysed with chi-square tests, such as the goodness of fit test.
Population: proportion p with the characteristic
Random sample of size n:
                    With characteristic    Without characteristic
Cell probability    p                      1 − p
Cell frequency      x                      n − x
Estimation
Suppose that a parameter p represents the unknown proportion of a population that possesses a particular characteristic. If X of the n sampled units possess the characteristic, p can be estimated by
$$\hat{p} = \frac{X}{n}$$
Confidence Interval
• For large enough values of n, the sample proportion can be taken to have approximately the normal distribution
$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right), \qquad \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0,1)$$
• Confidence interval estimation:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Hypothesis Test
Test statistic:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$
Hypotheses and critical values (reject H0 if):
• H0: p ≥ p0 vs H1: p < p0: $z \le -z_{\alpha}$
• H0: p ≤ p0 vs H1: p > p0: $z \ge z_{\alpha}$
• H0: p = p0 vs H1: p ≠ p0: $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$
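The three decision rules above can be sketched in Python. This is a minimal illustration, assuming the large-sample z statistic without continuity correction and Python 3.8+ (for the standard library's `statistics.NormalDist`, used here for the normal quantiles $z_\alpha$ and $z_{\alpha/2}$); the function name `one_proportion_z_test` and its `alternative` labels are hypothetical, not part of MINITAB or the course material.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def one_proportion_z_test(x, n, p0, alpha=0.05, alternative="two-sided"):
    """Large-sample z test for a single proportion (hypothetical helper).

    Returns (z, reject), where reject is True if H0 is rejected at level alpha.
    """
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    if alternative == "less":          # H0: p >= p0  vs  H1: p < p0
        reject = z <= -STD_NORMAL.inv_cdf(1 - alpha)
    elif alternative == "greater":     # H0: p <= p0  vs  H1: p > p0
        reject = z >= STD_NORMAL.inv_cdf(1 - alpha)
    elif alternative == "two-sided":   # H0: p = p0   vs  H1: p != p0
        z_half = STD_NORMAL.inv_cdf(1 - alpha / 2)
        reject = z >= z_half or z <= -z_half
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, reject
```

For example, `one_proportion_z_test(14, 23, 0.5)` gives z ≈ 1.04 and does not reject H0 at α = 5%.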
Example
A biologist is interested in whether opossums give birth to male and female progeny with equal probabilities. A group of opossums is observed, and out of 23 births, 14 are male and 9 are female. Suppose that each opossum offspring has a probability p of being male, independently of any other births. The number of male births out of 23 births is then a random variable with a B(23, p) distribution. Use α = 5%.
H0: p = 0,5
H1: p ≠ 0,5
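Test statistic (with a continuity correction, since x = 14 > np0 = 11,5):
$$z = \frac{x - np_0 - 0,5}{\sqrt{np_0(1-p_0)}} = \frac{14 - 11,5 - 0,5}{\sqrt{23(0,5)(1-0,5)}} = 0,83$$
Without the continuity correction, the large-sample statistic gives $z = \frac{14/23 - 0,5}{\sqrt{0,5(1-0,5)/23}} = 1,04$; the conclusion is the same.
Critical value:
Reject H0 if $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$, that is, $z \ge 1,96$ or $z \le -1,96$.
P-value = 2Φ(−0,83) = 2(0,2033) = 0,4066
Conclusion: do not reject H0, because −1,96 < 0,83 < 1,96 (equivalently, the p-value 0,4066 > 0,05). There is not sufficient evidence to conclude that male and female births are not equally likely.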
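As a check on the arithmetic, the sketch below recomputes the opossum test statistic with and without the continuity correction, using only the Python standard library (3.8+ for `statistics.NormalDist`). It assumes, as in the example, that x > np0, so 0,5 is subtracted; the variable names are illustrative. Note that the unrounded corrected statistic gives a p-value of about 0,404; the 0,4066 above comes from rounding z to 0,83 before the table lookup.

```python
import math
from statistics import NormalDist

x, n, p0 = 14, 23, 0.5               # male births, total births, value of p under H0
se0 = math.sqrt(n * p0 * (1 - p0))   # sd of X under H0: sqrt(n p0 (1 - p0))

z_plain = (x - n * p0) / se0         # no continuity correction
z_cc = (x - n * p0 - 0.5) / se0      # continuity correction (subtract 0.5 because x > n * p0)

p_value_cc = 2 * NormalDist().cdf(-abs(z_cc))  # two-sided p-value
print(f"z (corrected) = {z_cc:.2f}, z (plain) = {z_plain:.2f}, p-value = {p_value_cc:.3f}")
```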
Confidence Interval Estimation (95%)
$$\frac{14}{23} - 1,96\sqrt{\frac{(14/23)(1 - 14/23)}{23}} \le p \le \frac{14}{23} + 1,96\sqrt{\frac{(14/23)(1 - 14/23)}{23}}$$
$$0,409 \le p \le 0,808$$
Comparing two population proportions
Definition
If the random variable X1 measures the number of successes observed in a sample of size n1 from population 1, then X1 ~ B(n1, p1).
Similarly, if the random variable X2 measures the number of successes observed in a sample of size n2 from population 2, then X2 ~ B(n2, p2).
Estimation
Inference on the difference between the two population proportions, p1 − p2.
The unbiased point estimates of the two population proportions are
$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$
and the difference is estimated by
$$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$
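Confidence Interval
Confidence interval estimation:
$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
where
$$\hat{q}_1 = 1 - \hat{p}_1, \qquad \hat{q}_2 = 1 - \hat{p}_2$$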
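A minimal sketch of this interval in Python, assuming the unpooled standard error shown in the formula and Python 3.8+ for `statistics.NormalDist`. The helper name `two_proportion_ci` is hypothetical. Applying it to the opossum diet counts from this section's example (19 of 33 versus 15 of 30) is our own illustration; the slides do not report this interval.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample confidence interval for p1 - p2 using unpooled standard errors."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    se = math.sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# Illustration only: opossum offspring with / without the enhanced diet.
lower, upper = two_proportion_ci(19, 33, 15, 30)
print(f"{lower:.3f} <= p1 - p2 <= {upper:.3f}")  # prints: -0.170 <= p1 - p2 <= 0.322
```

Because this interval contains 0, it is consistent with the two-sided test not detecting a difference at the 5% level.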
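Hypothesis Test
Test statistic:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}, \quad \hat{p}_1 = \frac{X_1}{n_1}, \quad \hat{p}_2 = \frac{X_2}{n_2}$$
where X1 and X2 are the numbers of successes in samples 1 and 2, and p1 − p2 is set to 0, its value at the boundary of H0.
Hypotheses and critical values (reject H0 if):
• H0: p1 − p2 ≥ 0 vs H1: p1 − p2 < 0: $z \le -z_{\alpha}$
• H0: p1 − p2 ≤ 0 vs H1: p1 − p2 > 0: $z \ge z_{\alpha}$
• H0: p1 − p2 = 0 vs H1: p1 − p2 ≠ 0: $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$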
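The pooled test above can be sketched as a single function. This assumes Python 3.8+ (`statistics.NormalDist` for the critical values); the name `two_proportion_z_test` and the `alternative` labels are hypothetical. The pooled estimate is used only in the standard error, exactly as in the formula, and the function assumes 0 < p̂ < 1 so that the standard error is not zero.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05, alternative="two-sided"):
    """Pooled large-sample z test on p1 - p2 (hypothetical helper).

    Returns (z, reject), where reject is True if H0 is rejected at level alpha.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                  # pooled estimate of the common p
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat - 0) / se                  # (p1 - p2) taken as 0 under H0
    if alternative == "less":          # H1: p1 - p2 < 0
        reject = z <= -STD_NORMAL.inv_cdf(1 - alpha)
    elif alternative == "greater":     # H1: p1 - p2 > 0
        reject = z >= STD_NORMAL.inv_cdf(1 - alpha)
    elif alternative == "two-sided":   # H1: p1 - p2 != 0
        z_half = STD_NORMAL.inv_cdf(1 - alpha / 2)
        reject = z >= z_half or z <= -z_half
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, reject
```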
Example
Recall that x = 19 of the n = 33 offspring raised by opossums with the enhanced diet are male, and that y = 15 of the m = 30 offspring raised by opossums without the enhanced diet are male. The Trivers–Willard hypothesis suggests that the proportion of male offspring is higher with the enhanced diet than without it. Use α = 5%.
Hypotheses:
H0: p1 − p2 = 0
H1: p1 − p2 > 0
Example
• Hypothesis:
H0: p1 − p2 ≤ 0
H1: p1 − p2 > 0
• α = 5% = 0,05
• Test statistic:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{19/33 - 15/30 - 0}{\sqrt{0,54(1-0,54)\left(\frac{1}{33} + \frac{1}{30}\right)}} = 0,6$$
$$\hat{p} = \frac{19 + 15}{33 + 30} = 0,54$$
Example
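Critical value:
Reject H0 if $z \ge z_{\alpha}$, that is, $z \ge 1,645$.
Conclusion: do not reject H0, because 0,6 < 1,645. There is not sufficient evidence that the enhanced diet increases the proportion of male offspring.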
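The example's arithmetic can be reproduced with the Python standard library (3.8+ for `statistics.NormalDist`, which supplies $z_{0,05}$). This is a check of the numbers above, not MINITAB output; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x1, n1 = 19, 33   # male offspring / total, enhanced diet
x2, n2 = 15, 30   # male offspring / total, without enhanced diet

p_pool = (x1 + x2) / (n1 + n2)                               # 34/63, about 0.54
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se
z_crit = NormalDist().inv_cdf(1 - 0.05)                      # z_alpha for alpha = 5%
reject_h0 = z >= z_crit                                      # one-sided: H1: p1 - p2 > 0

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
# prints: z = 0.60, critical value = 1.645, reject H0: False
```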
Goodness of fit tests for one-way contingency tables
Definition
• Used for the analysis of a classification with more than two levels.
• The hypothesis tests described here assess whether the probability vector of a multinomial distribution takes a specified value.
• Goodness of fit tests, also referred to as chi-square tests, are used to test this hypothesis.
Definition
Population
Random sample of size n, classified into k categories:
Category          1     2     …   k
Cell frequency    x1    x2    …   xk
Goodness of Fit Test
• Consider a multinomial distribution with k cells and a set of unknown probabilities p1, p2, …, pk.
• Based on a set of observed cell frequencies x1, x2, …, xk with x1 + x2 + … + xk = n, the hypotheses are:
H0: pi = pi* for all i, 1 ≤ i ≤ k
H1: pi ≠ pi* for at least one i
Goodness of Fit Test
• Chi-square test statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i}$$
with the expected cell frequencies $e_i = np_i^*$, computed under H0.
• Critical value: reject H0 if
$$\chi^2 \ge \chi^2_{k-1,\alpha}$$
where $\chi^2_{k-1,\alpha}$ is the upper α critical point of the chi-square distribution with k − 1 degrees of freedom.
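The statistic and its degrees of freedom can be computed with a short function; the critical value $\chi^2_{k-1,\alpha}$ still comes from a chi-square table. This is a minimal sketch in plain Python; the name `chi_square_gof` is hypothetical, and the input checks (matching lengths, probabilities summing to 1) are our own additions.

```python
def chi_square_gof(observed, probs):
    """Chi-square goodness of fit statistic for H0: p_i = p_i* (hypothetical helper).

    Returns (chi2, degrees_of_freedom, expected_counts).
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("hypothesised probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probs]                 # e_i = n * p_i*
    chi2 = sum((x - e) ** 2 / e for x, e in zip(observed, expected))
    return chi2, len(observed) - 1, expected          # k - 1 degrees of freedom
```

Reject H0 when the returned statistic is at least the tabulated $\chi^2_{k-1,\alpha}$.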
Example
Recall that out of n = 46 machine breakdowns, x1 = 9 are attributable to electrical problems, x2 = 24 to mechanical problems, and x3 = 13 to operator issues.
It has been suggested that the probabilities of these three kinds of breakdown are, respectively,
p1* = 0,2   p2* = 0,5   p3* = 0,3
The plausibility of this suggestion can be examined with a chi-square goodness of fit test. Use α = 5%.
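Example
Hypothesis:
H0: p1 = 0,2, p2 = 0,5, p3 = 0,3
H1: at least one pi differs from the value specified in H0
Chi-square goodness of fit test:
$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i} = 0,0942$$
Critical value:
$$\chi^2_{2,\,5\%} = 5,991$$
Conclusion: do not reject H0, because 0,0942 < 5,991.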
• Chi-square goodness of fit test
        Electrical          Mechanical          Operator misuse
xi      x1 = 9              x2 = 24             x3 = 13
ei      46 × 0,2 = 9,2      46 × 0,5 = 23       46 × 0,3 = 13,8
$$\chi^2 = \frac{(9 - 9,2)^2}{9,2} + \frac{(24 - 23)^2}{23} + \frac{(13 - 13,8)^2}{13,8} = 0,0942$$
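The breakdown example can be checked in a few lines of Python. One shortcut applies only because k − 1 = 2 here: a chi-square variable with 2 degrees of freedom is exponential with mean 2, so $P(\chi^2 \ge x) = e^{-x/2}$ and $\chi^2_{2,\alpha} = -2\ln\alpha$. That closed form is our addition for checking the table value 5,991; it does not hold for other degrees of freedom.

```python
import math

observed = [9, 24, 13]        # electrical, mechanical, operator misuse
probs = [0.2, 0.5, 0.3]       # p_i* under H0
n = sum(observed)             # 46
expected = [n * p for p in probs]   # [9.2, 23.0, 13.8]
chi2 = sum((x - e) ** 2 / e for x, e in zip(observed, expected))

# df = 2 only: upper tail is exp(-x/2), so the critical value is -2 ln(alpha).
alpha = 0.05
critical = -2 * math.log(alpha)
p_value = math.exp(-chi2 / 2)
print(f"chi2 = {chi2:.4f}, critical value = {critical:.3f}, p-value = {p_value:.3f}")
# prints: chi2 = 0.0942, critical value = 5.991, p-value = 0.954
```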
Testing for independence in two-way contingency tables
Definition
• A two-way contingency table is a set of frequencies that summarizes how a set of objects is simultaneously classified under two different categorizations (variables).
• If the first categorization has r levels and the second categorization has c levels, the table consists of a set of frequencies xij, where 1 ≤ i ≤ r and 1 ≤ j ≤ c.
Definition
Rows: first categorization (variable); columns: second categorization (variable)

            1      2      …   j      …   c      Totals
1           x11    x12    …   x1j    …   x1c    x1.
2           x21    x22    …   x2j    …   x2c    x2.
…           …      …      …   …      …   …      …
i           xi1    xi2    …   xij    …   xic    xi.
…           …      …      …   …      …   …      …
r           xr1    xr2    …   xrj    …   xrc    xr.
Totals      x.1    x.2    …   x.j    …   x.c    n = x..
Hypothesis Test
• The hypotheses:
H0: There is no relationship between the two variables (independent)
H1: There is a relationship between the two variables (dependent)
• Test statistic:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}$$
where the expected cell frequencies are
$$e_{ij} = \frac{x_{i.}\,x_{.j}}{n}$$
Hypothesis Test
• Critical value: reject H0 if
$$\chi^2 \ge \chi^2_{(r-1)(c-1),\alpha}$$
where $\chi^2_{(r-1)(c-1),\alpha}$ is the upper α critical point of the chi-square distribution with (r − 1)(c − 1) degrees of freedom.
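The steps of the independence test (marginal totals, expected counts $e_{ij} = x_{i.}x_{.j}/n$, the statistic, and the degrees of freedom) can be sketched as one plain-Python function. The name `chi_square_independence` and the list-of-rows input format are assumptions for illustration; the critical value still comes from a chi-square table.

```python
def chi_square_independence(table):
    """Chi-square statistic for independence in an r x c table (hypothetical helper).

    `table` is a list of r rows, each a list of c observed counts x_ij.
    Returns (chi2, degrees_of_freedom, expected_counts).
    """
    row_totals = [sum(row) for row in table]            # x_i.
    col_totals = [sum(col) for col in zip(*table)]      # x_.j
    n = sum(row_totals)                                 # x_..
    # e_ij = x_i. * x_.j / n
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
    chi2 = sum(
        (x - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for x, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)         # (r - 1)(c - 1)
    return chi2, df, expected
```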
Example
• Test whether gender and the type of product of interest are related (α = 5%).
               Type of product
Gender         A       B       Total
Female         12      108     120
Male           24      156     180
Total          36      264     300

Example
The hypotheses:
H0: There is no relationship between gender and type of product
H1: There is a relationship between gender and type of product
Test statistic:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.758$$
Example
Gender     Type A                            Type B                              Total
Female     Observed = 12, Expected = 14.4    Observed = 108, Expected = 105.6    120
Male       Observed = 24, Expected = 21.6    Observed = 156, Expected = 158.4    180
Total      36                                264                                 300

$$e_{11} = \frac{(36)(120)}{300} = 14.4, \quad e_{12} = \frac{(264)(120)}{300} = 105.6, \quad e_{21} = \frac{(36)(180)}{300} = 21.6, \quad e_{22} = \frac{(264)(180)}{300} = 158.4$$
Example
• Reject H0 if
$$\chi^2 \ge \chi^2_{(r-1)(c-1),\alpha}, \qquad \chi^2_{1,\,5\%} = 3.841$$
• Conclusion: do not reject H0, because 0.758 < 3.841. There is not sufficient evidence of a relationship between gender and type of product.
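The gender example can be reproduced in plain Python (3.8+ for `statistics.NormalDist`). Because (r − 1)(c − 1) = 1 here, a chi-square variable is the square of a standard normal, so $\chi^2_{1,\alpha} = z_{\alpha/2}^2$ and the p-value is $2(1 - \Phi(\sqrt{\chi^2}))$; this 1-degree-of-freedom shortcut is our addition for checking the table value 3.841 and does not apply to larger tables.

```python
import math
from statistics import NormalDist

observed = [[12, 108],   # female: type A, type B
            [24, 156]]   # male:   type A, type B

row_totals = [sum(row) for row in observed]          # [120, 180]
col_totals = [sum(col) for col in zip(*observed)]    # [36, 264]
n = sum(row_totals)                                  # 300
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
chi2 = sum((x - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for x, e in zip(obs_row, exp_row))

# df = 1 only: chi-square is a squared standard normal.
alpha = 0.05
critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
print(f"chi2 = {chi2:.3f}, critical value = {critical:.3f}, p-value = {p_value:.3f}")
```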
Application with MINITAB
Inferences on a Population Proportion
(Example 1)
Bina Nusantara University 40

Comparing Two Population
Proportions (Example 2)

Testing for Independence in Two-Way Contingency Tables
(Example 4)

Output

EXERCISES
(1)
(2)
REFERENCES
Hayter, Anthony J. (2012). Probability and Statistics for Engineers and Scientists, 4th edition. Cengage Learning, Chapter 10.
