
PPT07 - Discrete Data Analysis


STAT6174 – Probability Theory and Applied Statistics
Topic 7
Discrete Data Analysis
LEARNING OUTCOMES
LO2: Use proper statistical techniques for statistical decision making in real problems
LO3: Use MINITAB software to conduct analyses
LO4: Interpret the results of statistical calculations and the output of analyses using MINITAB
LO5: Explain the suitable decision from the statistical method solution
OUTLINE
• Inferences on a population proportion
• Comparing two population proportions
• Goodness of fit tests for one-way contingency tables
• Testing for independence in two-way contingency tables
• Application with MINITAB
Inferences on a population proportion
A discrete random variable takes only discrete values.
Examples:
• The number of errors in a software product: 0, 1, 2, 3, …
• A product's quality level: high, medium, or low
• The cause of a machine breakdown: mechanical failure, electrical failure, or operator misuse
• Discrete data analysis is often also referred to as categorical data analysis.
• A discrete data set consists of the frequencies or counts of the observations found at each of the possible levels or cells.
• The data can be represented in a tabular form known as a contingency table and analyzed with chi-square goodness of fit tests.
Population: proportion p with the characteristic
Random sample of size n

                    With characteristic    Without characteristic
Cell probability    p                      1 - p
Cell frequency      x                      n - x
Estimation
Suppose that a parameter p represents the unknown proportion of a population that possesses a particular characteristic. If x of the n sampled observations have the characteristic, p can be estimated by the sample proportion:

$$\hat{p} = \frac{x}{n}$$
Confidence Interval
• For large enough values of n, the sample proportion can be taken to have approximately a normal distribution:

$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right), \qquad \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1)$$

• Confidence interval estimation:

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
Hypothesis Test
Test statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$$

Hypotheses                       Critical value (reject H0 if)
H0: p ≥ p0,  H1: p < p0          z < -z_α
H0: p ≤ p0,  H1: p > p0          z > z_α
H0: p = p0,  H1: p ≠ p0          z > z_{α/2} or z < -z_{α/2}
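The three rejection rules can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function name `reject_h0` and the labels "less", "greater", and "two-sided" are assumptions made for this sketch, not part of the course material or of MINITAB.

```python
from statistics import NormalDist


def reject_h0(z: float, alpha: float, alternative: str) -> bool:
    """Apply the critical-value rule for a z test on a proportion.

    alternative (hypothetical labels):
      "less"      -> H1: p < p0, reject if z < -z_alpha
      "greater"   -> H1: p > p0, reject if z > z_alpha
      "two-sided" -> H1: p != p0, reject if |z| > z_{alpha/2}
    """
    std_normal = NormalDist()  # standard normal N(0, 1)
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown alternative: {alternative!r}")


# Upper critical points of the standard normal for alpha = 0.05
z_alpha = NormalDist().inv_cdf(0.95)        # about 1.645
z_half_alpha = NormalDist().inv_cdf(0.975)  # about 1.960
```

Computing the critical points with `inv_cdf` replaces the table lookup; the decision itself is just a comparison of z with the appropriate critical point.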
Example
A biologist is interested in whether opossums give birth to male and female progeny with equal probabilities. A group of opossums is observed, and out of 23 births, 14 are male and 9 are female. Suppose that each opossum offspring has a probability p of being male, independent of any other births. The number of male births out of 23 births is then a random variable with a B(23, p) distribution. Use α = 5%.

H0: p = 0.5
H1: p ≠ 0.5
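Test statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{14/23 - 0.5}{\sqrt{0.5(1-0.5)/23}} = 1.04$$

With a continuity correction of 0.5 applied to the count, the statistic becomes

$$z = \frac{x - np_0 - 0.5}{\sqrt{np_0(1-p_0)}} = \frac{14 - 11.5 - 0.5}{\sqrt{23(0.5)(0.5)}} = 0.83$$

and this corrected value is the one used for the p-value.

Critical value: reject H0 if z > z_{α/2} or z < -z_{α/2}, that is, if z > 1.96 or z < -1.96.

P-value = 2Φ(-0.83) = 2(0.2033) = 0.4066

Conclusion: do not reject H0. There is not sufficient evidence to conclude that male and female births are not equally likely.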
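The arithmetic of this example can be checked with a short Python sketch. The reported z of 0.83 matches the statistic with a continuity correction of 0.5 applied to the count; without the correction the same data give about 1.04. The function names below are illustrative assumptions, and only the standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(x: int, n: int, p0: float, continuity: bool = False) -> float:
    """z statistic for H0: p = p0, based on x successes in n trials."""
    diff = x - n * p0
    if continuity:
        # Move the count 0.5 toward n*p0 (but never past it).
        shrink = min(0.5, abs(diff))
        diff = diff - shrink if diff > 0 else diff + shrink
    return diff / sqrt(n * p0 * (1 - p0))


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z."""
    return 2 * NormalDist().cdf(-abs(z))


z_plain = proportion_z(14, 23, 0.5)                       # about 1.04
z_corrected = proportion_z(14, 23, 0.5, continuity=True)  # about 0.83
p_value = two_sided_p_value(z_corrected)                  # about 0.40
```

The p-value from the unrounded statistic is close to 0.40, slightly below the 0.4066 obtained from the rounded z = 0.83. Either way it is far above α = 0.05, so the conclusion does not change.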
Confidence Interval Estimation

$$\frac{14}{23} - 1.96\sqrt{\frac{(14/23)(1 - 14/23)}{23}} \le p \le \frac{14}{23} + 1.96\sqrt{\frac{(14/23)(1 - 14/23)}{23}}$$

$$0.409 \le p \le 0.808$$
Comparing two population proportions
Definition

If the random variable X1 measures the number of successes observed in a sample of size n1 from population 1, then X1 ~ B(n1, p1).

Similarly, if the random variable X2 measures the number of successes observed in a sample of size n2 from population 2, then X2 ~ B(n2, p2).
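Estimation
Inference concerns the difference between the two population proportions, p1 - p2.

The unbiased point estimates of the two population proportions are

$$\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}$$

and the point estimate of the difference is

$$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2}$$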
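Confidence Interval
Confidence interval estimation:

$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \le p_1 - p_2 \le (\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$

where $\hat{q}_1 = 1 - \hat{p}_1$ and $\hat{q}_2 = 1 - \hat{p}_2$.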
Hypothesis Test
Test statistic:

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}, \quad \hat{p}_1 = \frac{X_1}{n_1}, \quad \hat{p}_2 = \frac{X_2}{n_2}$$

where X1 and X2 are the numbers of successes in samples 1 and 2.

Hypotheses                               Critical value (reject H0 if)
H0: p1 - p2 ≥ 0,  H1: p1 - p2 < 0        z < -z_α
H0: p1 - p2 ≤ 0,  H1: p1 - p2 > 0        z > z_α
H0: p1 - p2 = 0,  H1: p1 - p2 ≠ 0        z > z_{α/2} or z < -z_{α/2}
Example
Recall that x = 19 of the n = 33 offspring raised by opossums with the enhanced diet are male, and that y = 15 of the m = 30 offspring raised by opossums without the enhanced diet are male. The Trivers-Willard hypothesis suggests that the proportion of males is higher with the enhanced diet (p1) than without it (p2). Use α = 5% = 0.05.

• Hypotheses:
H0: p1 - p2 ≤ 0
H1: p1 - p2 > 0
• Test statistic:

$$\hat{p} = \frac{19 + 15}{33 + 30} = 0.54$$

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{(19/33 - 15/30) - 0}{\sqrt{0.54(1 - 0.54)\left(\frac{1}{33} + \frac{1}{30}\right)}} = 0.6$$
• Critical value:
Reject H0 if z > z_α, that is, if z > 1.645.
• Conclusion: do not reject H0, because 0.6 < 1.645.
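The two-sample calculation can be sketched in Python as follows, using the diet data from the example. The function names are illustrative assumptions. The test uses the pooled estimate of p, while the interval for p1 - p2 uses the separate (unpooled) estimates, as in the formulas of this section.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled z statistic for testing p1 - p2 = 0."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error


def difference_ci(x1: int, n1: int, x2: int, n2: int,
                  confidence: float = 0.95) -> tuple[float, float]:
    """Two-sided interval for p1 - p2 with the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * std_error, diff + z * std_error


z = two_proportion_z(19, 33, 15, 30)       # about 0.60
p_value = 1 - NormalDist().cdf(z)          # one-sided, H1: p1 - p2 > 0
lower, upper = difference_ci(19, 33, 15, 30)
```

The one-sided p-value is well above 0.05 and the interval for p1 - p2 contains zero, both consistent with not rejecting H0.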
Goodness of fit tests for one-way contingency tables
Definition

• Used when a classification with more than two levels is analyzed.

• Hypothesis tests are described for assessing whether the probability vector of a multinomial distribution takes a specified value.

• Goodness of fit tests, also referred to as chi-square tests, are used to test the hypothesis.
Population
Random sample of size n, classified into k categories:

                  Category 1    Category 2    …    Category k
Cell frequency    x1            x2            …    xk

Goodness of Fit Test
• Consider a multinomial distribution with k cells and a set of unknown probabilities p1, p2, …, pk.
• Based on a set of observed cell frequencies x1, x2, …, xk with x1 + x2 + … + xk = n, the hypotheses are:
H0: pi = pi* for all 1 ≤ i ≤ k
H1: pi ≠ pi* for at least one i
• Chi-square test statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i}$$

with the expected cell frequencies $e_i = n p_i^*$.

• Critical value
Reject H0 if $\chi^2 > \chi^2_{(k-1),\alpha}$, where $\chi^2_{(k-1),\alpha}$ is the upper α critical point of the chi-square distribution with k - 1 degrees of freedom.
Example
Recall that out of n = 46 machine breakdowns, x1 = 9 are attributable to electrical problems, x2 = 24 are attributable to mechanical problems, and x3 = 13 are attributable to operator misuse.

It is suggested that the probabilities of these three kinds of breakdown are, respectively,

p1* = 0.2    p2* = 0.5    p3* = 0.3

The plausibility of this suggestion can be examined with a chi-square goodness of fit test. Use α = 5%.
Hypotheses:
H0: p1 = 0.2, p2 = 0.5, p3 = 0.3
H1: at least one pi differs from its hypothesized value
Chi-square goodness of fit test:

$$\chi^2 = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i} = 0.0942$$

Critical value: $\chi^2_{(2),0.05} = 5.991$
Conclusion: do not reject H0, because 0.0942 < 5.991.
• Chi-square goodness of fit test: calculation

      Electrical        Mechanical       Operator misuse
xi    x1 = 9            x2 = 24          x3 = 13
ei    46 × 0.2 = 9.2    46 × 0.5 = 23    46 × 0.3 = 13.8

$$\chi^2 = \frac{(9 - 9.2)^2}{9.2} + \frac{(24 - 23)^2}{23} + \frac{(13 - 13.8)^2}{13.8} = 0.0942$$
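The goodness of fit calculation for the breakdown data can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function name `chi_square_gof` is illustrative, and the p-value line relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper tail probability is exp(-x/2). For other degrees of freedom a statistical table or library would be needed.

```python
from math import exp


def chi_square_gof(observed, probabilities):
    """Return the chi-square statistic and the expected frequencies e_i = n * p_i*."""
    n = sum(observed)
    expected = [n * p for p in probabilities]
    statistic = sum((x - e) ** 2 / e for x, e in zip(observed, expected))
    return statistic, expected


observed = [9, 24, 13]          # electrical, mechanical, operator misuse
hypothesized = [0.2, 0.5, 0.3]  # p1*, p2*, p3*
statistic, expected = chi_square_gof(observed, hypothesized)

# Valid only for k - 1 = 2 degrees of freedom: P(chi2 > x) = exp(-x / 2)
p_value = exp(-statistic / 2)

print(round(statistic, 4))  # 0.0942
```

The p-value of roughly 0.95 says the observed counts are very close to what the hypothesized probabilities predict, in line with the comparison against the critical value 5.991.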
Testing for independence in two-way contingency tables
Definition

• A two-way contingency table is a set of frequencies that summarizes how a set of objects is simultaneously classified under two different categorizations (variables).

• If the first categorization has r levels and the second categorization has c levels, then the data take the form of a set of frequencies

xij, where 1 ≤ i ≤ r and 1 ≤ j ≤ c
First categorization    Second categorization (variable)
(variable)              1      2      …    j      …    c      Totals
1                       x11    x12    …    x1j    …    x1c    x1.
2                       x21    x22    …    x2j    …    x2c    x2.
…                       …      …      …    …      …    …      …
i                       xi1    xi2    …    xij    …    xic    xi.
…                       …      …      …    …      …    …      …
r                       xr1    xr2    …    xrj    …    xrc    xr.
Totals                  x.1    x.2    …    x.j    …    x.c    n = x..
Hypothesis Test
• The hypotheses
H0: There is no relationship between the two variables (independent)
H1: There is a relationship between the two variables (dependent)

• Test statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}$$

where the expected cell frequency is $e_{ij} = \frac{x_{i\cdot}\, x_{\cdot j}}{n}$.
• Critical value
Reject H0 if $\chi^2 > \chi^2_{(r-1)(c-1),\alpha}$, where $\chi^2_{(r-1)(c-1),\alpha}$ is the upper α critical point of the chi-square distribution with (r-1)(c-1) degrees of freedom.
Example
• Test whether gender and the type of product of interest are related (α = 5%).

          Type of product
Gender    A      B      Total
Female    12     108    120
Male      24     156    180
Total     36     264    300

The hypotheses
H0: There is no relationship between gender and type of product
H1: There is a relationship between gender and type of product
Test statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}} = \frac{(12 - 14.4)^2}{14.4} + \frac{(108 - 105.6)^2}{105.6} + \frac{(24 - 21.6)^2}{21.6} + \frac{(156 - 158.4)^2}{158.4} = 0.758$$
Expected frequencies:

$$e_{11} = \frac{(36)(120)}{300} = 14.4, \quad e_{12} = \frac{(264)(120)}{300} = 105.6, \quad e_{21} = \frac{(36)(180)}{300} = 21.6, \quad e_{22} = \frac{(264)(180)}{300} = 158.4$$

          Type of product
Gender    A                                  B                                   Total
Female    Observed = 12, Expected = 14.4     Observed = 108, Expected = 105.6    120
Male      Observed = 24, Expected = 21.6     Observed = 156, Expected = 158.4    180
Total     36                                 264                                 300
• Reject H0 if $\chi^2 > \chi^2_{(r-1)(c-1),\alpha}$, where $\chi^2_{(1),5\%} = 3.841$.
• Conclusion: do not reject H0, because 0.758 < 3.841. There is not sufficient evidence of a relationship between gender and type of product.
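The independence test for the gender and product table can be sketched in Python as follows. The function name `chi_square_independence` is an assumption for illustration. The p-value line is valid only for 1 degree of freedom, where a chi-square variable is the square of a standard normal, so its upper tail probability equals erfc(sqrt(x/2)).

```python
from math import erfc, sqrt


def chi_square_independence(table):
    """Chi-square statistic, expected frequencies and degrees of freedom
    for an r x c table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # e_ij = (row total i) * (column total j) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (x - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for x, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, expected, dof


table = [[12, 108],   # Female: product A, product B
         [24, 156]]   # Male:   product A, product B
statistic, expected, dof = chi_square_independence(table)

# Valid only when dof == 1: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = erfc(sqrt(statistic / 2))

print(round(statistic, 3), dof)  # 0.758 1
```

A p-value of about 0.38 is well above α = 0.05, matching the comparison of 0.758 with the critical value 3.841.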
Application with MINITAB
[MINITAB screenshot: Inferences on a Population Proportion (Example 1)]
[MINITAB screenshot: Comparing Two Population Proportions (Example 2)]
[MINITAB screenshot: Testing for Independence in Two Way (Example 4)]
[MINITAB screenshot: Output]

Bina Nusantara University



EXERCISES
(1)
(2)
REFERENCES

Hayter, Anthony J. (2012). Probability and Statistics for Engineers and Scientists, 4th edition. Cengage Learning. Chapter 10.
