
HWK4 - Correctversion


Stat 371 Homework #4

Allison Kelley

• Submit your homework to Canvas by the due date and time. Email your lecturer if you have
extenuating circumstances and need to request an extension.
• If an exercise asks you to use R, include a copy of all relevant code and output in your submitted
homework file. You can copy/paste your code, take screenshots, or compile your work in an
Rmarkdown document.
• If a problem does not specify how to compute the answer, you may use any appropriate method. I
may ask you to use R or use manual calculations on your exams, so practice accordingly.
• You must include an explanation and/or intermediate calculations for an exercise to be complete.
• Be sure to submit the HWK4 Autograde Quiz which will give you ~20 of your 40 accuracy points.
• 50 points total: 40 points accuracy, and 10 points completion

Discrete Random Variables


Exercise 1. A chemical supply company ships a certain solvent in 10-gallon drums. Let X represent the
number of drums ordered by a randomly chosen customer. Assume X has the following probability mass
function (pmf). The mean and variance of X are µX = 2.3 and σX^2 = 1.81.

X P(X=x)
1 0.4
2 0.2
3 0.2
4 0.1
5 0.1

a. Calculate P (X ≤ 2) and describe what it means in the context of the problem.

P(X ≤ 2) = P(X = 1) + P(X = 2) = 0.4 + 0.2 = 0.6. There is a 60% chance that a randomly chosen
customer orders at most 2 drums (that is, 1 or 2 drums) of the solvent.
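For readers working in R, here is a minimal sketch of part (a); the vector names `x` and `p` are chosen here for illustration. It also re-derives the stated mean 2.3 and variance 1.81 from the table.

```r
# pmf of X (number of drums ordered), typed in from the table above
x <- 1:5
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

stopifnot(isTRUE(all.equal(sum(p), 1)))  # a valid pmf sums to 1

# P(X <= 2): add the probabilities of every value x <= 2
p_le_2 <- sum(p[x <= 2])
p_le_2

# re-derive the stated mean and variance of X
mu_x  <- sum(x * p)
var_x <- sum((x - mu_x)^2 * p)
c(mean = mu_x, variance = var_x)
```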

b. Let Y be the number of gallons ordered, so Y = 10X. Find the probability mass function of
Y.
Because Y = 10X is a one-to-one transformation, each value of Y is 10 times a value of X and
keeps that value's probability:

Y P(Y=y)
10 0.4
20 0.2
30 0.2
40 0.1
50 0.1
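The same relabeling can be sketched in R, assuming the pmf of X is entered as vectors `x` and `p` (illustrative names); only the support changes, while the probabilities carry over.

```r
# pmf of X from Exercise 1
x <- 1:5
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

# Y = 10X is one-to-one, so each y = 10x keeps the probability of its x
pmf_y <- data.frame(y = 10 * x, prob = p)
pmf_y
```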
c. Calculate the expected value (mean) number of gallons ordered, µY.

µY = 10(0.4) + 20(0.2) + 30(0.2) + 40(0.1) + 50(0.1) = 4 + 4 + 6 + 4 + 5 = 23

The expected number of gallons ordered is 23 gallons. This agrees with µY = 10µX = 10(2.3) = 23.
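A quick R check of the weighted sum above, assuming the pmf of Y is entered as vectors `y` and `p` (illustrative names). The last line uses linearity of expectation as a cross-check.

```r
# pmf of Y = 10X
y <- c(10, 20, 30, 40, 50)
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)

# E(Y) = sum of y * P(Y = y)
mu_y <- sum(y * p)
mu_y

# linearity of expectation: E(10X) = 10 E(X) = 10 * 2.3
10 * 2.3
```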


d. Calculate the standard deviation of the number of gallons ordered, σY.

The mean of Y is µY = 23, and the standard deviation of Y is the square root of its variance.
The variance of Y is:
σY^2 = 0.4(10-23)^2 + 0.2(20-23)^2 + 0.2(30-23)^2 + 0.1(40-23)^2 + 0.1(50-23)^2 = 181.
Then σY = sqrt(181) ≈ 13.4536 gallons. (Equivalently, σY^2 = 10^2 σX^2 = 100(1.81) = 181.)
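A short R sketch of the variance calculation, again with illustrative vector names `y` and `p`; the final line cross-checks against the rule Var(aX) = a^2 Var(X).

```r
y <- c(10, 20, 30, 40, 50)
p <- c(0.4, 0.2, 0.2, 0.1, 0.1)
mu_y <- sum(y * p)

# Var(Y) = sum of (y - mu_y)^2 * P(Y = y); the sd is its square root
var_y <- sum((y - mu_y)^2 * p)
sd_y  <- sqrt(var_y)
c(variance = var_y, sd = sd_y)

# shortcut: sd(10X) = 10 * sd(X) = 10 * sqrt(1.81)
10 * sqrt(1.81)
```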

Normal Random Variables
Exercise 2. Weights of female cats of a certain breed (A) are well approximated by a normal distribution
with mean 4.1 kg and standard deviation 0.6 kg: WA ∼ N(4.1, 0.6^2).
a. What proportion of female cats of that breed (A) have weights between 3.7 and 4.4 kg?
Using the pnorm function we have:

> pnorm(3.7, 4.1, 0.6)
[1] 0.2524925
> pnorm(4.4, 4.1, 0.6)
[1] 0.6914625

Then we subtract the CDF value at 3.7 from the CDF value at 4.4 to get the area under the curve between
the two weights: 0.69146 - 0.25249 ≈ 0.4390. About 43.9% of female cats of breed A weigh between 3.7 and 4.4 kg.
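The two pnorm calls can also be combined into a single expression. This is only a compact alternative to the steps above; `prob_a` is an illustrative name.

```r
# P(3.7 < W_A < 4.4) for W_A ~ N(4.1, 0.6^2):
# pnorm() is vectorised, and diff() subtracts the lower CDF value from the upper one
prob_a <- diff(pnorm(c(3.7, 4.4), mean = 4.1, sd = 0.6))
prob_a
```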

b. A female cat of that breed (A) has a weight that is 0.5 standard deviations above the
mean. What proportion of female cats of that breed (A) are heavier than this one?
The standard deviation is 0.6 kg, so 0.5 standard deviations is 0.5(0.6) = 0.3 kg. Adding this to the
mean of 4.1 kg gives a weight of 4.4 kg.
To find the area to the right of 4.4, we use 1 - pnorm:
1 - pnorm(4.4, 4.1, 0.6) = 0.3085375, so about 30.9% of female cats of breed A are heavier than this one.
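An equivalent way to get the right tail in R is pnorm's `lower.tail = FALSE` argument, which avoids the subtraction. The second call repeats the calculation on the z scale, where this cat sits 0.5 SD above the mean (the object names are illustrative).

```r
# P(W_A > 4.4): upper tail directly
upper <- pnorm(4.4, mean = 4.1, sd = 0.6, lower.tail = FALSE)
upper

# same probability on the standard normal scale: z = 0.5
upper_z <- pnorm(0.5, lower.tail = FALSE)
upper_z
```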

c. How heavy is a female cat of this breed whose weight is the 80th percentile?
qnorm(0.8, 4.1, 0.6) = 4.604973, so a female cat at the 80th percentile weighs about 4.60 kg.
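As a check, the percentile can also be built from the standard normal quantile using x = µ + zσ; `q80` is an illustrative name.

```r
# 80th percentile of W_A ~ N(4.1, 0.6^2)
q80 <- qnorm(0.8, mean = 4.1, sd = 0.6)
q80

# equivalently, mean + z * sd with z = qnorm(0.8), about 0.8416
4.1 + qnorm(0.8) * 0.6
```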

d. What is the IQR of weights for female cats of this breed using the normal distribution
approximation?
qnorm(0.75, 4.1, 0.6) - qnorm(0.25, 4.1, 0.6) = 4.504694 - 3.695306 ≈ 0.8094 kg
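A sketch confirming the IQR, plus the observation that for any normal distribution the IQR equals 2·qnorm(0.75)·σ, whatever the mean (`iqr_a` is an illustrative name).

```r
# IQR of W_A ~ N(4.1, 0.6^2): Q3 - Q1
iqr_a <- qnorm(0.75, 4.1, 0.6) - qnorm(0.25, 4.1, 0.6)
iqr_a

# the quartiles sit symmetrically at mean -/+ qnorm(0.75) * sd
2 * qnorm(0.75) * 0.6
```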

e. Females from another breed of cats (breed B) have weights well approximated by a
normal distribution with mean 10.6 lb and standard deviation of 0.9 lb: WB.lb ∼ N(10.6,
0.9^2). Transform the weights of cat breed B into kilograms using the conversion: 1 lb ≈
0.454 kg. You can use the transformation: WB = 0.454(WB.lb). Compare the shape, center,
and spread of the two breeds.
Mean: 10.6 lb × 0.454 kg/lb = 4.8124 kg
SD: 0.9 lb × 0.454 kg/lb = 0.4086 kg

Breed A: WA ∼ N(4.1, 0.6^2)    Breed B: WB ∼ N(4.8124, 0.4086^2) ≈ N(4.8, 0.41^2)


Mean comparison: 4.1 kg (A) vs. 4.81 kg (B)
SD comparison: 0.6 kg (A) vs. 0.41 kg (B)
Shape: both distributions are normal (symmetric and bell-shaped), because multiplying a normal random
variable by a constant gives another normal random variable.
Center: breed B's distribution is shifted to the right of breed A's by about 0.71 kg, since its mean is larger.
Spread: breed B's smaller standard deviation means its weights are more tightly concentrated around its
mean, so breed B has less spread than breed A.
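A minimal R sketch of the unit conversion, using the exercise's 0.454 kg/lb factor (the variable names are illustrative). Multiplying a normal random variable by a positive constant scales both its mean and its standard deviation by that constant.

```r
kg_per_lb <- 0.454  # conversion factor given in the exercise

# W_B = 0.454 * W_B.lb, so the mean and sd are both multiplied by 0.454
mean_b <- kg_per_lb * 10.6
sd_b   <- kg_per_lb * 0.9

data.frame(breed   = c("A", "B"),
           mean_kg = c(4.1, mean_b),
           sd_kg   = c(0.6, sd_b))
```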

Sampling Distributions
Exercise 3. A serving of breakfast cereal has a sugar content that is well approximated by a Normal
random variable X with mean 13 g and variance 1.3^2 g^2. We can consider each serving as an independent
and identical draw from X.
a. In what percent of servings will the sugar content be above 13.3 g?
X ~ N(13, 1.3^2). "Above" means the area to the right: 1 - pnorm(13.3, 13, 1.3) = 0.408747, so about 40.9% of servings contain more than 13.3 g of sugar.
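The same tail probability in R, using `lower.tail = FALSE` instead of subtracting from 1 (`p_above` is an illustrative name).

```r
# P(X > 13.3) for X ~ N(13, 1.3^2)
p_above <- pnorm(13.3, mean = 13, sd = 1.3, lower.tail = FALSE)
p_above
100 * p_above  # as a percent
```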

b. What is the probability that a randomly chosen serving will have a sugar content between
13.877 and 12.123? What do we call the difference: 13.877-12.123=1.754?
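pnorm(13.877, 13, 1.3) - pnorm(12.123, 13, 1.3) = 0.5000798 ≈ 0.50. About 50% of servings fall between these two
values, which sit symmetrically around the mean, so they are (approximately) the first and third quartiles of X.
The difference 13.877 - 12.123 = 1.754 is therefore the interquartile range (IQR).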
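A sketch showing where 12.123 and 13.877 come from: to three decimals they are the first and third quartiles of X (`quartiles` and `prob_b` are illustrative names).

```r
# first and third quartiles of X ~ N(13, 1.3^2)
quartiles <- qnorm(c(0.25, 0.75), mean = 13, sd = 1.3)
quartiles
diff(quartiles)   # the IQR

# probability between the two values given in the exercise
prob_b <- diff(pnorm(c(12.123, 13.877), mean = 13, sd = 1.3))
prob_b
```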

c. Calculate the probability that in 6 servings, only 1 has a sugar content below 13 g.
The number of servings (out of 6) with sugar content below 13 g is a binomial random variable. The
probability that one serving is below 13 g is pnorm(13, 13, 1.3) = 0.5 (13 g is the mean, and the normal
distribution is symmetric). In dbinom(1, 6, 0.5), 1 is the desired number of servings below 13 g, 6 is the
number of trials (servings), and 0.5 is the success probability from pnorm.
(We can use the binomial distribution because each serving either is below 13 g or is not, the 6 servings are
independent, and each has the same probability 0.5 of being below 13 g.)
dbinom(1, 6, 0.5) = 6(0.5)^6 = 0.09375 ≈ 0.094
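A minimal R check of part (c); `p_below` and `prob_one` are illustrative names. The last line evaluates the binomial pmf formula by hand to confirm what dbinom returns.

```r
# probability that a single serving is below 13 g (13 is the mean, so 0.5)
p_below <- pnorm(13, mean = 13, sd = 1.3)

# Binomial(n = 6, p = p_below): probability of exactly 1 serving below 13 g
prob_one <- dbinom(1, size = 6, prob = p_below)
prob_one

# same value from choose(n, k) * p^k * (1 - p)^(n - k)
choose(6, 1) * p_below^1 * (1 - p_below)^5
```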
d. Describe the sampling distribution for the mean sugar content of 6 servings, X̄.
X̄ ~ N(mean, variance/n), so X̄ ~ N(13, 1.3^2/6): it is normal, centered at 13 g, with standard deviation 1.3/sqrt(6) ≈ 0.531 g.
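A Monte Carlo sanity check of part (d), not required by the exercise. The seed, the number of replications, and the object name `xbar` are arbitrary choices made here; the simulated mean and standard deviation should land close to 13 and 1.3/sqrt(6) ≈ 0.531.

```r
set.seed(371)  # arbitrary seed, for reproducibility only

# simulate 10,000 sample means, each from 6 iid servings of X ~ N(13, 1.3^2)
xbar <- replicate(10000, mean(rnorm(6, mean = 13, sd = 1.3)))

# compare the simulated center and spread with the theory
c(sim_mean = mean(xbar), sim_sd = sd(xbar), theory_sd = 1.3 / sqrt(6))
```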

e. What is the interquartile range of the sampling distribution for the sample mean X̄ when
n=6? Is that value larger or smaller than the IQR implied in part (b)? Why do the
relative sizes of the IQRs make sense?
Mean = 13
SD = 1.3/sqrt(6)
qnorm(0.75, 13, 1.3/sqrt(6)) - qnorm(0.25, 13, 1.3/sqrt(6)) = 0.7159341

The variance of X̄ is 1.3^2/6, and the standard deviation is its square root, 1.3/sqrt(6).
We use qnorm because the IQR is the difference between the third and first quartiles: we
subtract qnorm(0.25, ...) from qnorm(0.75, ...) as shown above. The value 0.7159 is smaller than the
IQR from part b (1.754) because dividing the variance by the sample size 6 makes the standard
deviation of X̄ smaller by a factor of sqrt(6) (indeed, 1.754/sqrt(6) ≈ 0.716). Averages of 6 servings vary
less than single servings do, so sample means land closer to 13.
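A sketch comparing the two IQRs directly (the names are illustrative); their ratio comes out to sqrt(6), matching the explanation that the sample mean's standard deviation is smaller by that factor.

```r
sigma <- 1.3
n <- 6

# IQR for a single serving and for the mean of n servings
iqr_x    <- qnorm(0.75, 13, sigma) - qnorm(0.25, 13, sigma)
iqr_xbar <- qnorm(0.75, 13, sigma / sqrt(n)) - qnorm(0.25, 13, sigma / sqrt(n))

c(iqr_x = iqr_x, iqr_xbar = iqr_xbar, ratio = iqr_x / iqr_xbar, sqrt_n = sqrt(n))
```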

f. What is the probability that the mean sugar content in 6 servings is more than 13.3 g?
Mean = 13, SD = 1.3/sqrt(6)

$$P(\bar{X} > 13.3) = P\left(\frac{\bar{X} - 13}{1.3/\sqrt{6}} > \frac{13.3 - 13}{1.3/\sqrt{6}}\right) = P\left(Z > \frac{0.3\sqrt{6}}{1.3}\right) = P(Z > 0.565)$$

Using a standard normal table, the cumulative probability for z = 0.565 is about 0.714. We then
subtract this value from 1 to find the area to the right: 1 - 0.714 = 0.286.
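The same probability computed in R, both through the z-score and directly on the scale of X̄ (the names `se`, `z`, and `p_f` are illustrative). Software avoids the rounding involved in reading a table.

```r
se <- 1.3 / sqrt(6)         # standard deviation of the mean of 6 servings
z  <- (13.3 - 13) / se      # standardized cutoff
z

# upper-tail area, on the z scale and directly on the X-bar scale
p_f <- pnorm(z, lower.tail = FALSE)
p_f
pnorm(13.3, mean = 13, sd = se, lower.tail = FALSE)
```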
g. Is it more or less likely that the mean sugar content is above 13.3 g in 10 servings or 6
servings (as computed in f)? Can you explain it without actually computing the new
probability?
It is less likely with 10 servings. The standard deviation of X̄ is 1.3/sqrt(n), so a larger sample size
decreases it (1.3/sqrt(10) ≈ 0.41 vs. 1.3/sqrt(6) ≈ 0.53) and concentrates the sampling distribution more
tightly around the mean of 13. Because 13.3 is above the mean, a smaller spread leaves less probability in
the tail above 13.3, so the chance that the mean sugar content exceeds 13.3 g decreases.
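Part (g) can be confirmed numerically with a small helper; `p_above_mean` is a hypothetical function written for this illustration, not part of the assignment.

```r
# P(X-bar > cutoff) when X-bar is the mean of n iid N(mu, sigma^2) servings
p_above_mean <- function(n, mu = 13, sigma = 1.3, cutoff = 13.3) {
  pnorm(cutoff, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)
}

# n = 6 (part f) versus n = 10
sapply(c(6, 10), p_above_mean)
```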

h. Suppose each cereal box of this type contains 10 servings and consider the total sugar
content in each box as a sum of 10 iid random draws from X ∼ N(13, 1.3^2). If you
were to eat a whole box of cereal, above what total sugar content would you consume
with 95% probability? Show and briefly explain your calculations.
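Let T be the total sugar in a box of n = 10 servings. The mean of a sum of iid draws is n times the mean of one
serving: 10(13) = 130 g. The variances add, so Var(T) = 10(1.3^2) = 16.9 and SD(T) = 1.3·sqrt(10) ≈ 4.111 g.
Thus T ∼ N(130, 16.9).
"Above c with 95% probability" means P(T > c) = 0.95, so c is the 5th percentile of T. The z-score with
5% of the area to its left is z = -1.645 (qnorm(0.05)).
c = 130 - 1.645(4.111) ≈ 130 - 6.762 = 123.24 g
With 95% probability, you would consume more than about 123.24 g of sugar from the whole box.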
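A minimal R sketch of part (h), assuming the 10 servings are iid N(13, 1.3^2) as stated. The second half is an optional simulation check (the seed and replication count are arbitrary): about 95% of simulated box totals should exceed the cutoff.

```r
n     <- 10    # servings per box
mu    <- 13    # mean sugar per serving (g)
sigma <- 1.3   # sd of sugar per serving (g)

# a sum of n iid N(mu, sigma^2) draws is N(n * mu, n * sigma^2)
total_mean <- n * mu
total_sd   <- sqrt(n) * sigma

# "above c with 95% probability" means P(T > c) = 0.95,
# so c is the 5th percentile of the total T
cutoff <- qnorm(0.05, mean = total_mean, sd = total_sd)
cutoff

# optional simulation check: share of simulated box totals above the cutoff
set.seed(10)
totals <- replicate(10000, sum(rnorm(n, mean = mu, sd = sigma)))
mean(totals > cutoff)
```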

Exercise 4. You will be comparing the sampling distributions for two different estimators of σ, the
population standard deviation.
When trying to estimate the standard deviation of a population (σ) from a sample we could use:
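
$$s_1 = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \qquad \text{or} \qquad s_2 = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}}$$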
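The two formulas translate directly into R; `s1` and `s2` are illustrative function names, and the sample below is a made-up draw from the exercise's N(3, 5^2) population. R's built-in `sd()` uses the n - 1 divisor, so it agrees with `s1`.

```r
# s1: divide the sum of squared deviations by n - 1 (same as sd())
s1 <- function(x) sqrt(sum((x - mean(x))^2) / (length(x) - 1))

# s2: divide by n instead
s2 <- function(x) sqrt(sum((x - mean(x))^2) / length(x))

set.seed(8)                      # arbitrary seed
x <- rnorm(8, mean = 3, sd = 5)  # one simulated sample of size 8
c(s1 = s1(x), sd = sd(x), s2 = s2(x))
```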

The graphs below give the sampling distributions produced by these estimators when drawing a sample
of size 8 from a normal population with mean µX = 3 and standard deviation σX = 5.
What do you notice about the mean of the standard deviations produced using the s1 estimator compared
to the s2 estimator compared to the true population standard deviation? Why do we prefer to use the
s1 formulation when we have a sample of data and are interested in estimating the population standard
deviation? (You should use the resulting histograms to help you answer the question and use the word
“bias”.)

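Both estimators are biased low: the mean of the generated standard deviations falls below the population
standard deviation σ = 5 for each of them. However, the mean of the s1 values falls closer to σ = 5 than the
mean of the s2 values, so s1 has less bias. This is because s2 divides by n instead of n - 1, which makes every s2
value smaller than the corresponding s1 value (s2 = s1·sqrt((n - 1)/n)) and pushes its sampling distribution
further below σ. Dividing by n - 1 makes s1^2 an unbiased estimator of σ^2, which is why we prefer the s1
formulation when estimating the population standard deviation from a sample.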
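A sketch of the kind of simulation behind the histograms, under the stated setup (samples of size 8 from N(3, 5^2)). The seed and the 10,000 replications are assumptions, since the original simulation settings are not given. Both means should fall below 5, with the s2 mean further away.

```r
set.seed(371)   # arbitrary seed
n     <- 8      # sample size
mu    <- 3      # population mean
sigma <- 5      # population sd
reps  <- 10000  # number of simulated samples (assumed)

# each row of the matrix is one sample of size n from N(mu, sigma^2)
samples <- matrix(rnorm(reps * n, mean = mu, sd = sigma), nrow = reps)

s1_vals <- apply(samples, 1, sd)          # divisor n - 1
s2_vals <- s1_vals * sqrt((n - 1) / n)    # divisor n, rescaled from s1

c(mean_s1 = mean(s1_vals), mean_s2 = mean(s2_vals), sigma = sigma)
```

Passing `s1_vals` and `s2_vals` to `hist(..., freq = FALSE)` would give density histograms comparable to the figures for this exercise.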

[Figure: two density histograms, "s1 sampling distribution" and "s2 sampling distribution" (horizontal axis 0 to 12, vertical axis Density), each marking "Population SD sigma=5" and the "Mean of Generated S1 SDs" or "Mean of Generated S2 SDs".]
