Biostatistics Teaching
By
Kamukama Robert
Chapter 1
Introduction To
Biostatistics
QUANTITATIVE (CONTINUOUS)
Number of Children
Hb
CONTINUOUS DATA
DISCRETE DATA
Interval scale:
Data are placed in meaningful intervals and order. The units of
measurement are arbitrary.
Data Collection
Data Presentation
– Tabulation
– Diagrams
– Graphs
Descriptive Statistics
– Measures of Location
– Measures of Dispersion
– Measures of Skewness & Kurtosis
Inferential Statistics
– Estimation: Point estimate, Interval estimate
– Hypothesis Testing: Univariate analysis, Multivariate analysis
Frequency Distributions
• data distribution – pattern of variability
– the center of a distribution
– the ranges
– the shapes
• simple frequency distributions
• grouped frequency distributions
– midpoint
Tabulate the hemoglobin values of 30 adult
male patients listed below
<9.0            0    2    2
9.0 – 9.9       1    3    4
10.0 – 10.9     3    5    8
11.0 – 11.9     6    8   14
12.0 – 12.9    10    6   16
13.0 – 13.9     5    4    9
14.0 – 14.9     3    2    5
15.0 – 15.9     2    0    2
Total          30   30   60
Elements of a Table
An ideal table should have:
Number
Title
Column headings
Footnotes
Number – Table number for identification in a report
Continuous data
– Histogram
– Frequency polygon (curve)
– Stem-and-leaf plot
– Box-and-whisker plot
Example data
32 28 36 30 27 42 63 68
65 44 25 24 28 22 27 79
31 28 42 36 51 74 25 43
32 12 51 57 12 45 25 28
21 38 50 31 27 42 38 49
27 43 22 23 47 64 24 16
31 46 52 11 19 23 28 49
12 49 43 30
Histogram
Stem and leaf plot
Stem-and-leaf of Age   N = 60
Leaf Unit = 1.0
   6   1  122269
  19   2  1223344555777788888
 (11)  3  00111226688
  13   4  2223334567999
   5   5  01127
   4   6  3458
   2   7  49
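The stem-and-leaf plot above can be rebuilt from the 60 example ages. The following is a minimal sketch, not the software used for the slide: the function name stem_and_leaf is made up for illustration, and the first printed column is the number of leaves on each stem.

```python
from collections import defaultdict

# The 60 ages listed under "Example data".
ages = [
    32, 28, 36, 30, 27, 42, 63, 68,
    65, 44, 25, 24, 28, 22, 27, 79,
    31, 28, 42, 36, 51, 74, 25, 43,
    32, 12, 51, 57, 12, 45, 25, 28,
    21, 38, 50, 31, 27, 42, 38, 49,
    27, 43, 22, 23, 47, 64, 24, 16,
    31, 46, 52, 11, 19, 23, 28, 49,
    12, 49, 43, 30,
]


def stem_and_leaf(values):
    """Group values by tens digit (stem); the units digit is the leaf."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)


plot = stem_and_leaf(ages)
for stem in sorted(plot):
    leaves = "".join(str(leaf) for leaf in plot[stem])
    # count of leaves, stem, leaves
    print(f"{len(plot[stem]):>3}  {stem}  {leaves}")
```

The per-stem counts also give the grouped frequency distribution of the same data with class width 10, which is what a histogram of these ages would display.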
* A population:
It is the largest collection of values of a
random variable for which we have an
interest at a particular time.
For example:
The weights of all the children enrolled in
a certain elementary school.
Populations may be finite or infinite.
Class interval   Frequency
30 – 39 11
40 – 49 46
50 – 59 70
60 – 69 45
70 – 79 16
80 – 89 1
Total 189
The Mid-interval:
It can be computed by adding the lower bound of
the interval to its upper bound and then
dividing by 2.
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Example:
Here is a random sample of size 10 of ages, where
$x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$,
$x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.
$S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• $S = \sqrt{S^2}$ : Sample standard deviation.
• $\bar{x}$ : Sample mean.
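As a check on the sample mean and sample variance, the sketch below applies Python's standard statistics module to the ten ages from the example. The variable names are illustrative only; variance and stdev use the n - 1 divisor, which matches the sample variance.

```python
from statistics import mean, stdev, variance

# Random sample of size 10 of ages from the example.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

x_bar = mean(ages)    # sum of the observations divided by n
s2 = variance(ages)   # sum of squared deviations divided by n - 1
s = stdev(ages)       # square root of the sample variance

print(f"mean = {x_bar:.1f}, variance = {s2:.2f}, sd = {s:.2f}")
```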
Negative (A)            28   35   63
Bipolar Disorder (B)    19   38   57
Unipolar (C)            41   44   85
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0$
$P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}, \quad P(A) \neq 0$
Definition.2
$P(\bar{D}) = 1 - P(D)$
$P(D \mid T) = \dfrac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid \bar{D})\,P(\bar{D})}$
where,
$P(T \mid \bar{D}) = 1 - P(\bar{T} \mid \bar{D})$
A false positive is when the test indicates a positive result (T) when
the person does not have the disease D
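To make Bayes' rule for P(D | T) concrete, here is a small sketch. The numbers are hypothetical and do not come from the slides (sensitivity 0.90, specificity 0.95, prevalence 0.01), and the function name is illustrative.

```python
def predictive_value_positive(sensitivity, specificity, prevalence):
    """P(D | T) by Bayes' rule.

    sensitivity = P(T | D)
    specificity = P(not T | not D), so P(T | not D) = 1 - specificity
    prevalence  = P(D)
    """
    false_positive_rate = 1.0 - specificity
    numerator = sensitivity * prevalence
    denominator = numerator + false_positive_rate * (1.0 - prevalence)
    return numerator / denominator


# Hypothetical inputs, chosen only to illustrate the formula.
ppv = predictive_value_positive(sensitivity=0.90, specificity=0.95, prevalence=0.01)
print(f"P(D | T) = {ppv:.4f}")
```

With a rare disease, most positive results are false positives even for a fairly accurate test, which is why P(D | T) is much smaller than P(T | D) here.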
$x! = x(x-1)(x-2)\cdots(1)$
* Note: 0! = 1
Text Book : Basic Concepts and Methodology for the Health Sciences
Properties of the binomial
distribution
1. $f(x) \ge 0$
2. $\sum_{x} f(x) = 1$
3. The parameters of the binomial
distribution are n and p
4. $\mu = E(X) = np$
5. $\sigma^2 = \mathrm{var}(X) = np(1-p)$
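The properties listed above can be verified numerically. The sketch below uses hypothetical parameters n = 10 and p = 0.3 (not from the slides) and the standard binomial probability function f(x) = C(n, x) p^x (1 - p)^(n - x).

```python
from math import comb


def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.3  # hypothetical parameters for illustration
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                              # property 2
mean = sum(x * f for x, f in enumerate(pmf))                  # property 4
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))     # property 5

print(f"sum f(x) = {total:.6f}, E(X) = {mean:.4f}, var(X) = {var:.4f}")
```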
1. $f(x) \ge 0$
2. $\sum_{x} f(x) = 1$
3. $\mu = E(X)$
4. $\sigma^2 = \mathrm{var}(X)$
• π, e : constants
• µ: population mean.
• σ : Population standard deviation.
1 < 2 < 3
Note that : (As seen in Figure
4.6.2)
Example 4.6.2:
P(-2.74 < Z < 1.53) is the area between
-2.74 and 1.53.
P(-2.74 < Z < 1.53) = P(Z < 1.53) - P(Z < -2.74)
= 0.9370 - 0.0031 = 0.9339.
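The table look-up in Example 4.6.2 can be reproduced without a printed table. This sketch computes the standard normal cumulative distribution from the error function; the helper name phi is illustrative.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal cumulative probability P(Z < z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


upper = phi(1.53)    # area to the left of 1.53
lower = phi(-2.74)   # area to the left of -2.74
area = upper - lower

print(f"P(-2.74 < Z < 1.53) = {upper:.4f} - {lower:.4f} = {area:.4f}")
```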
Example : 2.71
0.84
How to transform normal
distribution (X) to standard
normal distribution (Z)?
• This is done by the following formula:
$z = \dfrac{x - \mu}{\sigma}$
• Example:
• If X is normal with µ = 3, σ = 2, find the
value of the standard normal Z if X = 6.
• Answer:
$z = \dfrac{x - \mu}{\sigma} = \dfrac{6 - 3}{2} = 1.5$
Example 4.7.1:
The 'Uptime' is a custom-made, lightweight, battery-operated
activity monitor that records the amount of time an individual
spends in the upright position. In a study of children ages 8 to 15
years, the researchers found that the amount of time children
spend in the upright position followed a normal distribution with
a mean of 5.4 hours and a standard deviation of 1.3 hours. Find:
1-The probability that the child spends less than 3
hours in the upright position in a 24-hour period
$P(X < 3) = P\left(\dfrac{X - \mu}{\sigma} < \dfrac{3 - 5.4}{1.3}\right) = P(Z < -1.85) = 0.0322$
-------------------------------------------------------------------------
2-The probability that the child spends more than 5
hours in the upright position in a 24-hour period
$P(X > 5) = P\left(\dfrac{X - \mu}{\sigma} > \dfrac{5 - 5.4}{1.3}\right) = P(Z > -0.31) = 1 - P(Z < -0.31) = 1 - 0.3783 = 0.6217$
P( X = 6.2) = 0
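The probabilities in Example 4.7.1 can be checked with NormalDist from Python's standard library. This is only a sketch. It does not round z to two decimals as the table method does, so the results differ from the table values in the third or fourth decimal place.

```python
from statistics import NormalDist

# Upright time in hours: normal with mean 5.4 and standard deviation 1.3.
upright = NormalDist(mu=5.4, sigma=1.3)

p_less_3 = upright.cdf(3)        # P(X < 3)
p_more_5 = 1 - upright.cdf(5)    # P(X > 5)

print(f"P(X < 3) = {p_less_3:.4f}")
print(f"P(X > 5) = {p_more_5:.4f}")
# For a continuous variable, P(X = 6.2) is exactly 0: a single point has no area.
```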
mean.
3- It ranges from -∞ to +∞.
------------------------------
t (7, 0.975) = 2.3646
t (24, 0.995) = 2.7969
--------------------------
If P (T(18) > t) = 0.975,
then t = -2.1009
-------------------------
If P (T(22) < t) = 0.99,
then t = 2.508
$\bar{x} = \dfrac{\sum x_i}{n}$
is an estimator of the population mean, μ. The
single numerical value that results from
evaluating this formula is called an estimate of
the parameter μ.
6.2 Confidence Interval for
a Population Mean (C.I.)
Suppose researchers wish to estimate the mean
of some normally distributed population.
They draw a random sample of size n from the
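$P(L \le \mu \le U) = 1 - \alpha$
$P\left(\bar{x} - t_{(1-\alpha/2),\,n-1}\,\dfrac{s}{\sqrt{n}} < \mu < \bar{x} + t_{(1-\alpha/2),\,n-1}\,\dfrac{s}{\sqrt{n}}\right) = 1 - \alpha$
$t_{(1-\alpha/2),\,n-1} = t_{0.975,\,18} = 2.1009$ (refer to table E)
$t_{0.975,\,18}\,(s/\sqrt{n}) = 2.1009\,(130.9/\sqrt{19}) = 63.1$
$250.8 \pm 2.1009\,(130.9/\sqrt{19}) \rightarrow (187.7,\ 313.9)$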
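The interval arithmetic above can be checked with a few lines of code. In this sketch the t value 2.1009 is taken from the table as quoted in the example rather than computed, and the variable names are illustrative.

```python
from math import sqrt

x_bar = 250.8     # sample mean
s = 130.9         # sample standard deviation
n = 19            # sample size
t_crit = 2.1009   # t(0.975, 18), taken from the table

margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"margin = {margin:.1f}")
print(f"95% C.I. = ({lower:.1f}, {upper:.1f})")
```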
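$(\bar{x}_1 - \bar{x}_2) - Z_{1-\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + Z_{1-\alpha/2}\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$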
2) When variances are unknown but equal, and the
sample size is small, the C.I. has the form:
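$(\bar{x}_1 - \bar{x}_2) - t_{(1-\alpha/2),\,(n_1+n_2-2)}\, S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{(1-\alpha/2),\,(n_1+n_2-2)}\, S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
where
$S_p^2 = \dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$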
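$(\bar{x}_1 - \bar{x}_2) - Z_{1-\alpha/2}\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + Z_{1-\alpha/2}\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}$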
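$n_1 + n_2 - 2 = 18 + 10 - 2 = 26$
$t_{(1-\alpha/2),\,(n_1+n_2-2)} = t_{0.995,\,26} = 2.7787$, then the 99% C.I. for $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2),\,(n_1+n_2-2)}\, S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
where
$S_p^2 = \dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = \dfrac{(17 \times 9.3^2) + (9 \times 11.5^2)}{18 + 10 - 2} = 102.33$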
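The pooled variance in this example is easy to verify. The sketch below recomputes it and also the half-width of the 99% interval. The sample means are not shown in this part of the slides, so only the margin around the difference of the means is computed. The t value is the tabulated one quoted above.

```python
from math import sqrt

n1, s1 = 18, 9.3      # size and standard deviation of sample 1
n2, s2 = 10, 11.5     # size and standard deviation of sample 2
t_crit = 2.7787       # t(0.995, 26), taken from the table

# Pooled variance: weighted average of the two sample variances.
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Half-width of the C.I. for mu1 - mu2.
margin = t_crit * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)

print(f"Sp^2 = {sp2:.2f}, margin = {margin:.2f}")
```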
The 98% C. I is
$\hat{P} \pm Z_{1-\alpha/2}\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n}} = 0.18 \pm 2.33\sqrt{\dfrac{0.18(1 - 0.18)}{1220}}$
The 99% C. I is
$(\hat{P}_F - \hat{P}_M) \pm Z_{1-\alpha/2}\sqrt{\dfrac{\hat{P}_F(1 - \hat{P}_F)}{n_F} + \dfrac{\hat{P}_M(1 - \hat{P}_M)}{n_M}}$
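σ² is known (n large or small):   $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
σ² is unknown, n large:           $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
σ² is unknown, n small:           $T = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$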
ii) If $H_A: \mu > \mu_0$
Reject $H_0$ if $Z > Z_{1-\alpha}$ (when using the Z-test)
H : μ 30
A
5.Decision Rule
The alternative hypothesis is
$H_A: \mu < 30$
$Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{27 - 30}{\sqrt{20}/\sqrt{10}} = -2.12$
5. Decision Rule: Reject $H_0$ if $Z < Z_{\alpha}$, where
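A short sketch of this one-sample Z test follows. The sample mean (27), hypothesized mean (30), variance (20) and n (10) are read from the test statistic above. The significance level α = 0.05 is an assumption, since the slide does not show it at this point.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu0 = 27, 30    # sample mean and hypothesized mean
sigma2, n = 20, 10     # known population variance and sample size
alpha = 0.05           # assumed significance level (not shown on the slide)

z = (x_bar - mu0) / sqrt(sigma2 / n)

# Lower-tail critical value Z_alpha for the alternative mu < mu0.
z_alpha = NormalDist().inv_cdf(alpha)
reject = z < z_alpha

print(f"Z = {z:.2f}, Z_alpha = {z_alpha:.3f}, reject H0: {reject}")
```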
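σ² is known (n1, n2 large or small):
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
σ² is unknown (n1, n2 small), population variances equal:
$T = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
where
$S_p^2 = \dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$
σ² is unknown (n1, n2 small), population variances not equal:
$T = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$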
Case 2: If the populations are not normally distributed,
n1 and n2 are large (n1 ≥ 30, n2 ≥ 30),
and the population variances are known,
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
ii) $H_A: \mu_1 > \mu_2 \rightarrow \mu_1 - \mu_2 > 0$
Reject $H_0$ if $Z > Z_{1-\alpha}$ (when using the Z-test)
Or reject $H_0$ if $T > t_{1-\alpha,\,(n_1+n_2-2)}$ (when using the T-test)
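4. Test Statistic:
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(4.5 - 3.4) - 0}{\sqrt{\dfrac{1}{12} + \dfrac{1.5}{15}}} = 2.57$
5. Decision Rule:
Reject $H_0$ if $Z > Z_{1-\alpha/2}$ or $Z < -Z_{1-\alpha/2}$
$Z_{1-\alpha/2} = Z_{1-0.05/2} = Z_{0.975} = 1.96$ (from table D)
6. Conclusion: Reject $H_0$ since 2.57 > 1.96
Or, using the p-value: reject $H_0$ if p < α. Here p-value = 0.0102 < 0.05, so reject $H_0$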
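The test statistic and two-sided p-value for this example can be reproduced as follows. This is a sketch using the standard normal distribution from Python's statistics module. The inputs are the values that appear in the test statistic, and α = 0.05 as in the decision rule.

```python
from math import sqrt
from statistics import NormalDist

x1, x2 = 4.5, 3.4        # sample means
var1, var2 = 1.0, 1.5    # known population variances
n1, n2 = 12, 15          # sample sizes
alpha = 0.05

z = ((x1 - x2) - 0) / sqrt(var1 / n1 + var2 / n2)

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha / 2)         # Z(0.975)
p_value = 2 * (1 - std_normal.cdf(abs(z)))         # two-sided p-value

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
```

Both routes agree: Z exceeds 1.96 and the p-value is below 0.05, so H0 is rejected.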
Example 7.3.2 page 240
The purpose of a study by Tam was to investigate wheelchair
maneuvering in individuals with lower-level spinal cord injury (SCI)
and healthy controls (C). Subjects used a modified wheelchair to
incorporate a rigid seat surface to facilitate the specified
experimental measurements. The data for measurements of the
left ischial tuberosity for SCI and
control C are shown below
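4. Test Statistic:
$T = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{(126.1 - 133.1) - 0}{\sqrt{756.04}\,\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -0.569$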
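The pooled-variance T statistic for Example 7.3.2 can be recomputed from the summary values that appear in the formula (means 126.1 and 133.1, pooled variance 756.04, ten subjects per group). A minimal sketch:

```python
from math import sqrt

x1, x2 = 126.1, 133.1   # sample means (SCI and control)
sp2 = 756.04            # pooled variance from the example
n1, n2 = 10, 10         # sample sizes

t = ((x1 - x2) - 0) / (sqrt(sp2) * sqrt(1 / n1 + 1 / n2))

print(f"T = {t:.3f}")
```

The statistic is negative because the first sample mean is smaller than the second.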
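4. Test Statistic:
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \dfrac{(59.01 - 46.61) - 0}{\sqrt{\dfrac{44.89^2}{53} + \dfrac{34.85^2}{54}}} = 1.59$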
5. Decision Rule:
Reject H 0 if Z > Z1-α
Z1-α = Z0.99 = 2.33 (from table D)
ii) If $H_A: P > P_0$
Reject $H_0$ if $Z > Z_{1-\alpha}$
_____________________________
iii) If $H_A: P < P_0$
H0: P = 0.063
HA: P > 0.063
4. Test Statistic:
$Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \dfrac{0.08 - 0.063}{\sqrt{\dfrac{0.063(0.937)}{301}}} = 1.21$
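The one-proportion Z statistic can be checked directly. In the sketch below the one-sided p-value is an addition for illustration; the slide itself only reports Z.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.08   # sample proportion
p0 = 0.063     # hypothesized proportion under H0
n = 301        # sample size

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Upper-tail p-value for the alternative P > 0.063.
p_value = 1 - NormalDist().cdf(z)

print(f"Z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```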
ii) If $H_A: P_1 > P_2$
Reject $H_0$ if $Z > Z_{1-\alpha}$
_____________________________
iii) If $H_A: P_1 < P_2$
Reject $H_0$ if $Z < -Z_{1-\alpha}$
Note: Z1-α/2 , Z1-α , Zα are tabulated values obtained from
table D
6. Conclusion: reject or fail to reject H0
Example 7.6.1 page 262
Noonan syndrome is a genetic condition that can affect the heart, growth,
blood clotting, and mental and physical development. Noonan examined
the stature of men and women with Noonan syndrome. The study contained 29
male and 44 female adults. One of the cut-off values used to assess
stature was the third percentile of adult height. Eleven of the males fell
below the third percentile of adult male height, while 24 of the females
fell below the third percentile of female adult height. Does this study
provide sufficient evidence for us to conclude that among subjects with
Noonan syndrome, females are more likely than males to fall below the
respective third percentile of adult height? Let α = 0.05
Solution:
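1. Data: $n_M = 29$, $n_F = 44$, $x_M = 11$, $x_F = 24$, α = 0.05
$\bar{p} = \dfrac{x_M + x_F}{n_M + n_F} = \dfrac{11 + 24}{29 + 44} = 0.479$, $\hat{p}_M = \dfrac{x_M}{n_M} = \dfrac{11}{29} = 0.379$, $\hat{p}_F = \dfrac{x_F}{n_F} = \dfrac{24}{44} = 0.545$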
2- Assumption : Two populations are independent .
3.Hypotheses:
Case II : H0: PF = PM → PF - PM = 0
HA: PF > PM → PF - PM > 0
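4. Test Statistic:
$Z = \dfrac{(\hat{p}_F - \hat{p}_M) - (p_F - p_M)}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_F} + \dfrac{\bar{p}(1 - \bar{p})}{n_M}}} = \dfrac{(0.545 - 0.379) - 0}{\sqrt{\dfrac{(0.479)(0.521)}{44} + \dfrac{(0.479)(0.521)}{29}}} = 1.39$
5. Decision Rule:
Reject $H_0$ if $Z > Z_{1-\alpha}$, where $Z_{1-\alpha} = Z_{1-0.05} = Z_{0.95} = 1.645$
6. Conclusion: Fail to reject $H_0$
since Z = 1.39 < $Z_{1-\alpha}$ = 1.645
Or, since P-value = 0.0823 > α, fail to reject $H_0$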
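The whole of Example 7.6.1 can be reproduced from the four counts. This sketch pools the two samples to estimate the common proportion under H0, then computes Z and the upper-tail p-value; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_m, n_m = 11, 29   # males below the third percentile, number of males
x_f, n_f = 24, 44   # females below the third percentile, number of females
alpha = 0.05

p_m = x_m / n_m
p_f = x_f / n_f
p_bar = (x_m + x_f) / (n_m + n_f)   # pooled proportion under H0

z = ((p_f - p_m) - 0) / sqrt(p_bar * (1 - p_bar) / n_f + p_bar * (1 - p_bar) / n_m)

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)   # Z(0.95)
p_value = 1 - std_normal.cdf(z)          # upper tail, since HA: PF > PM

print(f"Z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z > z_crit else "fail to reject H0")
```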
Exercises:
Questions: Pages 234-237
7.2.1, 7.8.2, 7.3.1, 7.3.6, 7.5.2, 7.6.1
H.W:
7.2.8, 7.2.9, 7.2.11, 7.2.15, 7.3.7, 7.3.8, 7.3.10,
7.5.3, 7.6.4
=B
0.5 620
1.0 630
1.5 800
2.0 840
2.5 840
3.0 870
3.5 1010
4.0 940
4.5 950
5.0 1130
49   81
50   88
53   83
55   99
60   91
55   89
60   95
50   90
[Scatter plot of the data above; horizontal axis 0 to 70, vertical axis 0 to 120]
Value of r (positive or negative)    Meaning
0.00 to 0.19                         A very weak correlation
0.20 to 0.39                         A weak correlation
0.40 to 0.69                         A modest correlation
0.70 to 0.89                         A strong correlation
0.90 to 1.00                         A very strong correlation
• With Pearson's r, we add the products of the deviations to see whether the
positive products or the negative products are more abundant and sizable.
• Positive products indicate cases in which the variables go in the same
direction (that is, both taller and heavier than average or both shorter and
lighter than average);
• negative products indicate cases in which the variables go in opposite
directions (that is, taller but lighter than average or shorter but heavier than
average).
• Computational Formula for Pearson's Correlation Coefficient r
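The formula itself did not survive on this slide, so the sketch below uses the standard computational form, r = [nΣxy - ΣxΣy] / sqrt([nΣx² - (Σx)²][nΣy² - (Σy)²]), and compares it with the deviation form. The eight (x, y) pairs are read from the scatter-plot data earlier in these notes. That reading is an assumption, because the figure's axis labels were mixed into the data.

```python
from math import sqrt

# Eight (x, y) pairs as read from the scatter-plot data (assumed reading).
x = [49, 50, 53, 55, 60, 55, 60, 50]
y = [81, 88, 83, 99, 91, 89, 95, 90]
n = len(x)

# Computational form: only sums of x, y, xy, x^2 and y^2 are needed.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
)

# Deviation form: sum of products of deviations from the means.
mx, my = sum_x / n, sum_y / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r_dev = sxy / sqrt(sxx * syy)

print(f"r (computational) = {r:.4f}, r (deviations) = {r_dev:.4f}")
```

Both forms give the same value, which falls in the "modest correlation" band of the interpretation table.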
↓        Second Criterion
         1      2      3      ...   c      Total
1        N11    N12    N13    ...   N1c    N1.
2        N21    N22    N23    ...   N2c    N2.
.        .      .      .      ...   .      .
.        .      .      .            .      .
                                           N
Chi-square Test
After calculating the expected frequencies,
prepare a table of expected frequencies and use the
Chi-square statistic:
$\chi^2 = \sum_{i=1}^{k} \left[\dfrac{(o_i - e_i)^2}{e_i}\right]$
where the summation is over all r × c = k cells.
D.F.: the degrees of freedom for using the table are (r-
1)(c-1) for α level of significance
Note that the test is always one-sided.
Tabulated value: $\chi^2_{\alpha,\,(r-1)(c-1)} = 5.991$
Calculations:
$\chi^2 = \dfrac{(260 - 247.86)^2}{247.86} + \dfrac{(299 - 311.14)^2}{311.14} + \cdots + \dfrac{(14 - 11.69)^2}{11.69} = 9.091$
p-value < 0.025
We also reject the hypothesis at the 0.025 level of
significance.
ODDS RATIO
$OR = \dfrac{ad}{bc}$
where a, b, c and d are the numbers given in the
following table:
Risk Factor ↓    Sample
                 Cases    Control    Total
Present          a        b          a+b
Absent           c        d          c+d
Total            a+c      b+d
We may construct a 100(1-α)% CI for OR by the formula:
$\widehat{OR}^{\,1 \pm (z_{\alpha/2}/\sqrt{X^2})}$
$OR = \dfrac{(64)(3496)}{(342)(68)} = 9.62$
                      Obesity status
Smoked throughout     64      342      406
Never smoked          68      3496     3564
Total                 132     3838     3970
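The odds ratio in this example is the cross-product ratio of the 2 × 2 table. A minimal sketch, with a, b, c, d assigned in the same order as in the generic table (first row a, b; second row c, d):

```python
# Cell counts from the smoking and obesity table.
a, b = 64, 342     # smoked throughout
c, d = 68, 3496    # never smoked

odds_ratio = (a * d) / (b * c)

print(f"OR = {odds_ratio:.2f}")
```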
Pair no.          1    2    3    4    5    6    7    8    9    10   11   12
Instructed        1.5  2.0  3.5  3.0  3.5  2.5  2.0  1.5  1.5  2.0  3.0  2.0
Not instructed    2.0  2.0  4.0  2.5  4.0  3.0  3.5  3.0  2.5  2.5  2.5  2.5
Difference        -    0    -    +    -    -    -    -    -    -    +    -
H0 : P(+) = P(-) = 0.5
1. Data: Scores of dental hygiene for matched pairs; one member was
instructed how to brush and the other remained uninstructed.
2. Assumption: The distribution of the variable is continuous.
3. H0 : The median of the differences is zero [P(+) = P(-)]
HA : The median of the differences is negative
[P(+) < P(-)]
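The sign test for the dental hygiene data can be carried out as below. This is a sketch under the usual convention that zero differences are dropped. The p-value is the binomial probability, with p = 0.5, of observing as few plus signs as were seen, or fewer.

```python
from math import comb

instructed = [1.5, 2.0, 3.5, 3.0, 3.5, 2.5, 2.0, 1.5, 1.5, 2.0, 3.0, 2.0]
not_instructed = [2.0, 2.0, 4.0, 2.5, 4.0, 3.0, 3.5, 3.0, 2.5, 2.5, 2.5, 2.5]

diffs = [a - b for a, b in zip(instructed, not_instructed)]
plus = sum(1 for d in diffs if d > 0)
minus = sum(1 for d in diffs if d < 0)
n = plus + minus   # zero differences are discarded

# One-sided p-value: P(K <= plus) for K ~ Binomial(n, 0.5).
p_value = sum(comb(n, k) for k in range(plus + 1)) / 2**n

print(f"plus = {plus}, minus = {minus}, n = {n}, p-value = {p_value:.4f}")
```

A small p-value supports HA, that the median difference is negative.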
H0: Mx ≤ My
HA: Mx > My
H0: Mx = My
HA: Mx ≠ My