Biostatistics Assignment
Prepared by:
Name: Wondimnew Walle
ID: UGR/2899/12
October 21, 2021
Assignment one
1. Consider the experiment of tossing a fair die and define the following events:
A = {Observe an even number of dots}
B = {Observe a number of dots less than or equal to 4}.
Are events A and B independent?
Solution:
A is the event of observing an even number when a single die is tossed, which has
probability 3/6 = 1/2.
A = {2, 4, 6}
B is the event of observing a number less than or equal to 4, which has probability
4/6 = 2/3.
B = {1, 2, 3, 4}
Two events A and B are independent if the knowledge that one occurred does not
affect the chance that the other occurs.
Two events are independent if any one of the following is true:
P(A|B) = P(A)
P(B|A) = P(B)
P(A AND B) = P(A)P(B)
Checking the third condition: A AND B = {2, 4}, so P(A AND B) = 2/6 = 1/3, and
P(A) x P(B) = 1/2 x 2/3 = 1/3.
The two values are equal, so the condition is satisfied. Hence, A and B are
independent events.
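The same check can be done by enumerating the sample space. The sketch below is only an illustration; the helper name `prob` is hypothetical and it assumes all six faces are equally likely.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
outcomes = set(range(1, 7))
A = {x for x in outcomes if x % 2 == 0}  # even number of dots
B = {x for x in outcomes if x <= 4}      # at most 4 dots


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))


p_a, p_b, p_ab = prob(A), prob(B), prob(A & B)

# A and B are independent exactly when P(A and B) = P(A) * P(B).
independent = p_ab == p_a * p_b
print(p_a, p_b, p_ab, independent)  # 1/2 2/3 1/3 True
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.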
2. Suppose that three programmers are designing computer code for a project:
Mr. A has designed 60% of the code, Mr. B 30% and Mr. C 10%. Suppose further
that Mr. A has a bug in 3% of his work, Mr. B in 7% of his work, and Mr. C in 5%
of his.
A. What percentage of the code written has a bug?
B. Given that you find a bug in a line of code, who is most likely to have written it?
Who is least likely?
C. How does the ordering compare to the unconditional probabilities and why
does this relationship make sense?
Solution:
A. To find the total percentage of code with a bug, we add up the share of buggy
code contributed by each person (law of total probability).
Bug by Mr. A = 3% (60%) = 0.03(0.6) = 0.018
Bug by Mr. B = 7% (30%) = 0.07(0.3) = 0.021
Bug by Mr. C = 5% (10%) = 0.05(0.1) = 0.005
Total = 0.018 + 0.021 + 0.005 = 0.044
= 44/1000
= 44/1000 x 100
= 4.4%
Therefore, 4.4% of the code written has a bug.
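B. Given that a line has a bug, the probability that each person wrote it follows
from Bayes' rule:
P(A | bug) = 0.018/0.044 = 0.409
P(B | bug) = 0.021/0.044 = 0.477
P(C | bug) = 0.005/0.044 = 0.114
Mr. B is most likely to have written the buggy line and Mr. C is least likely.
C. The unconditional ordering is A (0.60) > B (0.30) > C (0.10), whereas the
ordering given a bug is B > A > C. Mr. A and Mr. B swap places because Mr. B's
bug rate (7%) is more than twice Mr. A's (3%), which outweighs Mr. A's larger
share of the code. Mr. C stays last because he wrote only 10% of the code.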
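As a cross-check of parts A to C, the total bug probability and the probability that each programmer wrote a buggy line can be computed directly. This is a minimal sketch; the dictionary names `share` and `bug_rate` are illustrative only.

```python
# Share of the code written by each programmer and each one's bug rate.
share = {"A": 0.60, "B": 0.30, "C": 0.10}
bug_rate = {"A": 0.03, "B": 0.07, "C": 0.05}

# Law of total probability: P(bug) = sum of P(bug | author) * P(author).
p_bug = sum(share[k] * bug_rate[k] for k in share)

# Bayes' rule: P(author | bug) = P(bug | author) * P(author) / P(bug).
posterior = {k: share[k] * bug_rate[k] / p_bug for k in share}

# Authors ordered from most to least likely, before and after seeing a bug.
prior_ranking = sorted(share, key=share.get, reverse=True)
posterior_ranking = sorted(posterior, key=posterior.get, reverse=True)

print(round(p_bug, 3))
print({k: round(v, 3) for k, v in posterior.items()})
print(prior_ranking, posterior_ranking)
```

The two rankings differ in their first two places, which is the comparison part C asks about.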
Assignment two
3. Suppose you take a sample of N independent biologists to determine how
many of them use valid statistical methods.
• In particular, you have a sample of N independent, identically distributed RVs
Yi with p = P(Yi = 1).
• What is the distribution of the number of successes Y = ∑ Yi (i = 1, ..., N) in N trials?
Y ~ Bin(N, p), with
$$P(Y = y) = \frac{N!}{y!\,(N-y)!}\, p^{y} (1-p)^{N-y}$$
When y = N, P(Y = N) becomes p^N, which is the probability that all N trials are
successes.
• Calculate the probability that 0 out of 10 biologists use valid statistical methods
when the probability of using valid statistical methods is 0.8
Solution:
This is a binomial distribution of the random variable X.
The random experiment has n = 10 trials, with probability of success in each single
trial p = 0.8 and probability of failure q = 1 - p = 1 - 0.8 = 0.2.
For a binomial random variable X, the probability is given by the formula
$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$
With x = 0: P(X = 0) = (0.8)^0 (0.2)^10 = 1.024 x 10^-7.
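The binomial formula can be evaluated numerically. The function name `binomial_pmf` below is a hypothetical helper, not part of any library; it uses `math.comb` from the Python standard library (Python 3.8 or later).

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability that 0 of 10 biologists use valid methods when p = 0.8.
p_zero = binomial_pmf(0, 10, 0.8)
print(p_zero)

# Sanity check: the probabilities over k = 0..10 should sum to 1.
total = sum(binomial_pmf(k, 10, 0.8) for k in range(11))
```

The result is 0.2 raised to the tenth power, roughly one in ten million.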
Assignment three
4. Assume that among diabetics the fasting blood level of glucose is
approximately normally distributed with a mean of 105 mg per 100 ml and SD of 9
mg per 100 ml.
a) What proportions of diabetics have levels between 90 and 125 mg per 100 ml?
Solution:
This question concerns the probability distribution of a continuous random
variable, the blood glucose level, so we use the normal distribution to
calculate the probability.
Let X be the fasting blood glucose level.
Mean (µ) = 105 mg/100 ml and SD (σ) = 9 mg/100 ml
Z = (raw score - population mean) / population SD, i.e. Z = (x-μ)/σ
P(90 < X < 125) = P((90-105)/9 < Z < (125-105)/9)
= P(-1.67 < Z < 2.22) = P(Z < 2.22) - P(Z < -1.67)
= 0.9868 - 0.0475
= 0.9393
Therefore, about 94% of diabetics have a blood glucose level between 90 and 125 mg/100 ml.
b) What proportions of diabetics have levels below 87.4 mg per 100 ml?
Solution:
P(X < 87.4) = P(Z < (87.4-105)/9)
= P(Z < -1.96) = P(Z > 1.96) = 1 - P(Z < 1.96)
= 1 - 0.9750
= 0.025
Therefore, 2.5% of diabetics have a level below 87.4 mg/100 ml.
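Parts a) and b) can be checked without a table using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch; the variable names are illustrative. Because it does not round z to two decimals, the results differ from the table values in the third or fourth decimal place.

```python
from statistics import NormalDist

# Fasting blood glucose among diabetics: mean 105, SD 9 (mg per 100 ml).
glucose = NormalDist(mu=105, sigma=9)

# a) P(90 < X < 125) = F(125) - F(90), where F is the cumulative distribution.
p_between = glucose.cdf(125) - glucose.cdf(90)

# b) P(X < 87.4) = F(87.4).
p_below = glucose.cdf(87.4)

print(round(p_between, 3), round(p_below, 3))
```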
c) What level cuts off the lower 10% of diabetics?
Solution:
By the symmetry of the normal curve, the z value that cuts off the lower 10% is the
negative of the one that cuts off the upper 10%. From the standard normal table,
P(Z < 1.28) = 0.90, so P(Z < z0) = 0.10 gives z0 = -1.28.
Z = (x-μ)/σ, so -1.28 = (x-105)/9
x = 105 - 1.28(9) = 93.5 mg/100 ml
Therefore, 93.5 mg/100 ml cuts off the lowest 10%.
d) What are the two levels which encompass 95% of diabetics?
Solution:
From the normal distribution table, the central 95% of the area lies between
z = -1.96 and z = 1.96, so we can use these Z values to find the blood glucose
level at the two points.
When z = -1.96:
-1.96 = (x1-μ)/σ = (x1-105)/9
x1 = 87.36 mg/100 ml
When z = 1.96:
1.96 = (x2-μ)/σ = (x2-105)/9
x2 = 122.64 mg/100 ml
Therefore, the two levels that encompass 95% of diabetics are blood glucose levels of
87.36 mg/100 ml and 122.64 mg/100 ml.
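Parts c) and d) ask for quantiles, which is the inverse of the table lookup. A minimal sketch with `NormalDist.inv_cdf` follows; it assumes the 95% interval in part d) is the symmetric central interval, as in the solution above.

```python
from statistics import NormalDist

glucose = NormalDist(mu=105, sigma=9)

# c) Level below which 10% of diabetics fall (the 10th percentile).
lower_10 = glucose.inv_cdf(0.10)

# d) Central 95%: cut 2.5% from each tail.
low_95 = glucose.inv_cdf(0.025)
high_95 = glucose.inv_cdf(0.975)

print(round(lower_10, 1), round(low_95, 2), round(high_95, 2))
```

The two limits in part d) are symmetric about the mean of 105.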
5. Among a large group of coronary patients it is found that their serum
cholesterol levels approximate a normal distribution. It was found that 10% of the
group had cholesterol levels below 182.3 mg per 100 ml whereas 5% had values
above 359.0 mg per 100 ml. What is the mean and SD of the distribution?
Solution:
From the standard normal distribution (SND) table, the Z value that cuts off the
lowest 10% (area of 0.10) is -1.28.
At this level the serum cholesterol is 182.3 mg/100 ml.
Z = (x-μ)/σ
-1.28 = (182.3-μ)/σ
µ = 182.3 + 1.28σ .......... equation 1
The Z value that cuts off the upper 5% (area of 0.05) is 1.645.
Z = (x-μ)/σ
1.645 = (359.0-µ)/σ
µ = 359.0 - 1.645σ .......... equation 2
Combining equations 1 and 2 gives the mean and SD:
182.3 + 1.28σ = 359.0 - 1.645σ
2.925σ = 176.7
σ = 60.4 mg/100 ml. Hence, the standard deviation is 60.4 mg/100 ml.
Substituting the SD into equation 1 gives the mean:
µ = 182.3 + 1.28(60.4)
= 259.6 mg/100 ml
Hence, the mean is 259.6 mg/100 ml.
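The two quantile equations can also be solved numerically. The sketch below uses unrounded z values from `NormalDist().inv_cdf`, so its answers can differ slightly from a hand calculation based on table values rounded to two or three decimals. Variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# z values for the 10th and 95th percentiles.
z_low = std_normal.inv_cdf(0.10)
z_high = std_normal.inv_cdf(0.95)

x_low, x_high = 182.3, 359.0  # cholesterol levels at those percentiles

# From x = mu + z * sigma at both points:
#   x_high - x_low = (z_high - z_low) * sigma
sigma = (x_high - x_low) / (z_high - z_low)
mu = x_low - z_low * sigma

print(round(sigma, 1), round(mu, 1))
```

As a check, a normal distribution with these parameters places 10% of values below 182.3 and 5% above 359.0.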
Assignment four
6. Let A and B denote two independent genetic traits. Suppose the probability
that an individual will exhibit trait A is ½ and the probability that an individual will
exhibit trait B is ¾.
a) What is the probability that an individual will exhibit both traits?
Solution:
Since the two traits are independent, the probability of both traits is
P(A and B) = P(A) x P(B) = 1/2 x 3/4 = 3/8 = 0.375
b) Neither trait?
Solution:
If A and B are independent, so are their complements, so the probability of neither trait is
P(A’ and B’) = P(A’) x P(B’) = 1/2 x 1/4 = 1/8 = 0.125
c) Trait A but not trait B?
Solution:
A and B’ are also independent, so the probability of trait A but not trait B is
P(A and B’) = P(A) x P(B’) = 1/2 x 1/4 = 1/8 = 0.125
d) Trait B but not trait A?
Solution:
P(A’ and B) = P(A’) x P(B) = 1/2 x 3/4 = 3/8 = 0.375
e) Exactly one trait?
Solution:
We sum the probabilities of the two mutually exclusive ways that yield “exactly
one”:
Pr[exactly one] = Pr[(A, not B) or (not A, B)]
= Pr[A, not B] + Pr[not A, B]
= [(0.50)(0.25)] + [(0.50)(0.75)] = 0.125 + 0.375
= 0.50
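All five answers for question 6 follow from the product rule for independent events. The sketch below recomputes them with exact fractions and confirms that the four combinations of traits account for every individual; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(1, 2)  # P(trait A)
p_b = Fraction(3, 4)  # P(trait B)

# Independence lets us multiply marginal probabilities (and complements).
both = p_a * p_b
neither = (1 - p_a) * (1 - p_b)
a_only = p_a * (1 - p_b)
b_only = (1 - p_a) * p_b
exactly_one = a_only + b_only

print(both, neither, a_only, b_only, exactly_one)  # 3/8 1/8 1/8 3/8 1/2
```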
7. A physician develops a diagnostic test that is positive for 95% of the patients
who have disease and is positive for 10% of the patients who do not have disease.
Of patients tested, 20% actually have disease. Suppose you evaluate a patient by
administering this diagnostic test and obtain a positive result. Using the
information given, calculate the probability that this patient has disease.
Solution:
This is a conditional probability problem, which we can solve with Bayes' formula.
We are asked to calculate Pr(disease | + test).
Given:
• Pr(+ test | disease) = 0.95
• Pr(+ test | no disease) = 0.10
• Pr(disease) = 0.20, which implies Pr(no disease) = 0.80
Pr(disease | +) = Pr(disease and +) / Pr(+) .......... conditional probability
= Pr(+ | disease) Pr(disease) / Pr(+)
= Pr(+ | disease) Pr(disease) / [Pr(+ | disease) Pr(disease) + Pr(+ | no
disease) Pr(no disease)]
= (0.95)(0.20) / [(0.95)(0.20) + (0.10)(0.80)]
= 0.19 / 0.27
= 0.7037
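The Bayes calculation above can be wrapped in a small function. The name `posterior_given_positive` and its parameter names are hypothetical, chosen only for this sketch.

```python
def posterior_given_positive(sensitivity, false_positive_rate, prevalence):
    """P(disease | positive test) by Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = false_positive_rate * (1 - prevalence)
    # Denominator is P(+), by the law of total probability.
    return true_positive / (true_positive + false_positive)


p_disease = posterior_given_positive(
    sensitivity=0.95, false_positive_rate=0.10, prevalence=0.20
)
print(round(p_disease, 4))  # 0.7037
```

Lowering `prevalence` while keeping the other two inputs fixed lowers the result, since false positives then make up a larger share of all positive tests.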
8. The height, X, of young American women is normally distributed with mean
μ=65.5 and standard deviation σ=2.5 inches. Find the probability of each of the
following events.
a. X < 67
Solution:
Height is a continuous random variable, so its probability is found using the normal
distribution and the z score.
Z = (x-μ)/σ
P(X < 67) = P(Z < (67-65.5)/2.5) = P(Z < 0.6)
= 0.7257
b. 64 < X < 67
Solution:
P(64 < X < 67) = P((64-65.5)/2.5 < Z < (67-65.5)/2.5)
= P(-0.6 < Z < 0.6) = P(Z < 0.6) - P(Z < -0.6)
= 0.7257 - 0.2743
= 0.4514
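Both probabilities in question 8 can be verified with `statistics.NormalDist`. This is a sketch with illustrative variable names; because it works with unrounded values, the last digit can differ from a result built from four-decimal table entries.

```python
from statistics import NormalDist

# Height of young American women: mean 65.5 in, SD 2.5 in.
height = NormalDist(mu=65.5, sigma=2.5)

# a) P(X < 67)
p_a = height.cdf(67)

# b) P(64 < X < 67); both limits are 0.6 SD from the mean.
p_b = height.cdf(67) - height.cdf(64)

print(round(p_a, 4), round(p_b, 3))
```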
9. Four buses carrying 148 students arrive at a football stadium. The buses carry,
respectively, 25, 33, 40 and 50 students. After everyone gets off the buses, a
student is picked at random. Let X denote the number of students that
were on his/her bus. Also, one of the drivers is picked at random. Let Y denote the
number of students that were on his/her bus.
(a) Compute E[X] and E[Y ]. How do you explain the difference?
Solution:
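X and Y both take the values 25, 33, 40 and 50. A randomly picked student comes
from a bus carrying x students with probability x/148, and a randomly picked
driver comes from any given bus with probability 1/4.
x or y    P(X = x)    P(Y = y)
25        25/148      1/4
33        33/148      1/4
40        40/148      1/4
50        50/148      1/4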
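The expected values follow from the table by weighting each bus size by its probability. The sketch below is one way to carry out that calculation; the variable names are illustrative.

```python
from fractions import Fraction

bus_sizes = [25, 33, 40, 50]
total_students = sum(bus_sizes)  # 148

# X: a student is picked, so a bus of size n is chosen with probability n/148.
e_x = sum(Fraction(n, total_students) * n for n in bus_sizes)

# Y: a driver is picked, so each bus is chosen with probability 1/4.
e_y = sum(Fraction(1, len(bus_sizes)) * n for n in bus_sizes)

print(float(e_x), float(e_y))
```

E[X] comes out larger than E[Y] because picking a student favors the larger buses: a bus with 50 students is twice as likely to be represented as one with 25, whereas every driver is equally likely.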
$$CI = p \pm Z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$
$$CI = 0.144 \pm 2.58 \sqrt{\frac{(0.144)(0.856)}{n}}$$
Solution:
The task is to calculate the sample size needed to estimate the mean serum
cholesterol of a large population.
For a population size N > 10,000, the formula is
n=
Assignment five
12. Please review the literature and identify the sample size calculation formulas for the
following study designs
a. Unmatched and matched case-control study design
A case-control study compares patients who have a disease or
outcome of interest (cases) with patients who do not have the disease or
outcome (controls). It looks back retrospectively to compare how frequently the
exposure to a risk factor is present in each group, in order to determine the
relationship between the risk factor and the disease. The sample size calculation
for an unmatched case-control study gives the sample size recommended for a study,
given a set of parameters and the desired confidence level.
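Sample size when a proportion is the parameter of the study, or the data are on a
nominal/ordinal scale:
$$n = \left(\frac{r+1}{r}\right) \times \frac{p(1-p)\,(Z_{1-\beta} + Z_{1-\alpha/2})^{2}}{(P_1 - P_2)^{2}}$$
where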
n = Desired number of samples
r = Control to cases ratio (1 if same numbers of subject in both groups)
p = Proportion of population = (P1 +P2 )/2
Z1-β = It is the desired power (0.84 for 80% power and 1.28 for 90% power)
z1-α/2 = Critical value and a standard value for the corresponding level of
confidence. (At 95% CI or 5% type I error it is 1.96 and at 99% CI or 1% type
I error it is 2.58)
P1 = Proportion in cases
P2 = Proportion in controls
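To show how the quantities defined above combine, here is a minimal sketch. It assumes the standard two-proportion form n = ((r+1)/r) x p(1-p)(Z1-β + Z1-α/2)² / (P1 - P2)², which matches the listed definitions; the function name `case_control_sample_size` and the input proportions are hypothetical, chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def case_control_sample_size(p1, p2, r=1.0, alpha=0.05, power=0.80):
    """Cases needed for an unmatched case-control study on proportions.

    Assumed formula: n = ((r + 1) / r) * p(1 - p) * (Z_beta + Z_alpha)^2 / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # average proportion
    n = (r + 1) / r * p_bar * (1 - p_bar) * (z_beta + z_alpha) ** 2 / (p1 - p2) ** 2
    return ceil(n)  # round up to a whole subject


# Hypothetical inputs: exposure in 40% of cases and 20% of controls, 1:1 ratio.
n_cases = case_control_sample_size(p1=0.40, p2=0.20)
print(n_cases)
```

Rounding up with `ceil` is a common convention so that the study does not fall below the target power.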
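Sample size when the data are on an interval/ratio (quantitative) scale and the mean
is the parameter of the study:
$$n = \left(\frac{r+1}{r}\right) \times \frac{\sigma^{2}\,(Z_{1-\beta} + Z_{1-\alpha/2})^{2}}{d^{2}}$$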
A cohort study is an analytic study design in which healthy individuals
with and without exposure to some risk factor are followed up over a
certain time period and their outcomes are observed. The cohort study design is the
best available scientific method for measuring the effects of a suspected risk
factor. In a prospective cohort study, researchers raise a question and form a
hypothesis about what might cause a disease. Then they observe a group of
people, known as the cohort, over a period of time. The results can be
compared by using relative risk.
The sample size for a cohort study can be calculated with the following
formula:
$$\text{Sample size} = \frac{Z_{1-\alpha/2}^{2}\, P(1-P)}{d^{2}}$$
For a quantitative study:
$$\text{Sample size} = \frac{2\sigma^{2}\,(Z_{\alpha/2} + Z_{\beta})^{2}}{d^{2}}$$
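The two formulas above can be sketched as small functions. This assumes the forms Z²P(1-P)/d² for a proportion and 2σ²(Zα/2 + Zβ)²/d² for a quantitative outcome; the function names and the example inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion(p, d, alpha=0.05):
    """Sample size = Z^2 * P(1 - P) / d^2, where d is the margin of error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z**2 * p * (1 - p) / d**2)


def n_for_means(sigma, d, alpha=0.05, power=0.80):
    """Sample size per group = 2 * sigma^2 * (Z_alpha/2 + Z_beta)^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / d**2)


# Hypothetical inputs for illustration only.
n_prop = n_for_proportion(p=0.5, d=0.05)   # proportion 50%, margin 5 points
n_mean = n_for_means(sigma=10, d=5)        # SD 10, detectable difference 5
print(n_prop, n_mean)
```

Using P = 0.5 gives the largest value of P(1-P), so it is a conservative choice when no prior estimate of the proportion is available.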
Daniel-1995-Biostatistics.
https://www.ncbi.nlm.nih.gov/pmc/articles
The end!