
RM Assignment 2

1. The document provides examples and explanations for statistical concepts including central limit theorem, z-tests, t-tests, and confidence intervals. 2. For a one-tailed z-test example, the document tests if the mean weight of water bottles is greater than 10kg. For two tailed t-test examples, it tests if a material's mean strength is equal to a target value, and if the mean blood sodium concentration of patients is within the total population range. 3. The document calculates a 95% confidence interval for the mean blood sodium concentration using a t-distribution, given a sample size of 18 patients.

Uploaded by

7 Edu
Copyright
© © All Rights Reserved

RESOURCE METHODOLOGY

Assignment-2
Q1. Take any data with 50 data points each for two sets and find the following:

1. No of observations
2. Minimum Value
3. Maximum Value
4. Range
5. Mean
6. Median
7. Mode
8. Standard Deviation
9. Variance
10. Coefficient of variation

X 1 2 2 3 3 3 4 4 4 4 5 5
5 5 5 6 6 6 6 6 6 7 7 7
7 7 7 7 8 8 8 8 8 8 8 8
8 9 9 9 9 9 9 9 9 10 10 10
10 10

Y 2 4 4 6 6 6 8 8 8 8 10 10
10 10 10 12 12 12 12 12 12 14 14 14
14 14 14 14 16 16 16 16 16 16 16 16
16 18 18 18 18 18 18 18 18 20 20 20
20 20
                                  X          Y
1.  Number of observations        50         50
2.  Minimum Value                 1          2
3.  Maximum Value                 10         20
4.  Range                         9          18
5.  Mean                          6.68       13.36
6.  Median                        7          14
7.  Mode                          8          16
8.  Standard Deviation            2.35       4.71
9.  Variance                      5.54       22.15
10. Coefficient of variation      35.23%     35.23%

Working (population formulas, using all 50 observations):

X: N = 50, M = 334 / 50 = 6.68, SS = 276.88
σ² = SS / N = 276.88 / 50 = 5.5376
σ = √σ² = √5.5376 = 2.3532
CV = (σ / M) × 100 = (2.3532 / 6.68) × 100 = 35.23

Y: N = 50, M = 668 / 50 = 13.36, SS = 1107.52
σ² = SS / N = 1107.52 / 50 = 22.1504
σ = √σ² = √22.1504 = 4.7064
CV = (σ / M) × 100 = (4.7064 / 13.36) × 100 = 35.23

Every Y value is exactly twice the corresponding X value, so the mean and standard deviation double, the variance quadruples, and the coefficient of variation is unchanged.

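Median:

When the number of observations n is odd, the formula is:

$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ term}$$

When the number of observations n is even, the formula is:

$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ term} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ term}}{2}$$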

Standard deviation:

The formula for the standard deviation of a sample is:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

where n is the sample size and x̄ is the sample mean.

The formula for the standard deviation of an entire population is:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where N is the population size and μ is the population mean.

Variance:

The formula for the variance of a sample is:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where n is the sample size and x̄ is the sample mean.

The formula for the variance of an entire population is:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where N is the population size and μ is the population mean.

Coefficient of variation:

Coefficient of Variation = (Standard Deviation / Mean) * 100.

In symbols: CV = (SD / x̄) * 100.
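As a cross-check on the formulas above, the following is a minimal Python sketch that computes the ten statistics directly from the 50 values listed for X and Y. It assumes the population (divide-by-N) formulas; the names `X_COUNTS` and `describe` are illustrative and not part of the assignment.

```python
import statistics

# Frequency table for X as listed in the assignment (value -> count); 50 values in total.
X_COUNTS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 9, 9: 8, 10: 5}
x = [value for value, count in X_COUNTS.items() for _ in range(count)]
y = [2 * value for value in x]  # every listed Y value is twice the matching X value


def describe(data):
    """Return the ten descriptive statistics, using population (divide-by-N) formulas."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return {
        "n": len(data),
        "min": min(data),
        "max": max(data),
        "range": max(data) - min(data),
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "sd": sd,
        "variance": statistics.pvariance(data),
        "cv_percent": sd / mean * 100,
    }


summary_x = describe(x)
summary_y = describe(y)
for name, summary in (("X", summary_x), ("Y", summary_y)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Swapping `pstdev` and `pvariance` for `stdev` and `variance` would give the sample (divide-by-n−1) versions instead.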

Q2. Assume two sets of data “X” & “Y”. Calculate covariance between X & Y.

The covariance gives some information about how X and Y are statistically related. Covariance is a
measure of the relationship between two random variables: it evaluates to what extent the variables
change together. A positive value means the variables tend to move in the same direction, and a
negative value means they tend to move in opposite directions. However, the metric does not assess
the strength of the dependency between the variables, because its size depends on the units of
measurement.

For a sample, the formula is:

$$\operatorname{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

Where:

• Xi – the values of the X-variable
• Yi – the values of the Y-variable
• X̄ – the mean (average) of the X-variable
• Ȳ – the mean (average) of the Y-variable
• n – the number of data points

Year     Price X    Price Y    Difference X    Difference Y    Product of
                               (X − X̄)         (Y − Ȳ)         differences
2010     1692       68         -352.8          -41.2           14535.36
2011     1978       102        -66.8           -7.2            480.96
2012     1884       110        -160.8          0.8             -128.64
2013     2151       112        106.2           2.8             297.36
2014     2519       154        474.2           44.8            21244.16
Total    10224      546        0.0             0.0             36429.20
Mean     2044.8     109.2

Covariance (XY) = 36429.2 / (5 − 1) = 36429.2 / 4 = 9,107.3

In such a case, the positive covariance indicates that the price of the stock and the S&P 500 tend to
move in the same direction.
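The covariance calculation can be reproduced with a short Python sketch. It assumes the sample formula (dividing by n − 1) and uses the five yearly prices from the table; the function name `sample_covariance` is illustrative.

```python
# Yearly prices for 2010-2014, taken from the table above.
price_x = [1692, 1978, 1884, 2151, 2519]
price_y = [68, 102, 110, 112, 154]


def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    products = [(a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


cov_xy = sample_covariance(price_x, price_y)
print(f"Covariance(X, Y) = {cov_xy:.1f}")
```

Dividing by n instead of n − 1 would give the population covariance, which is smaller for the same data.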

Q3. Briefly explain central limit theorem with brief examples.

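The central limit theorem is a statistical result which states that, for a population with a finite
variance, the distribution of sample means approaches a normal distribution as the sample size
increases, whatever the shape of the population distribution. The mean of the sample means is
approximately equal to the mean of the whole population, and their standard deviation (the standard
error) is σ/√n. The normal approximation is generally considered adequate for sample sizes over 30.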

Example: The average weight of a water bottle is 30 kg with a standard deviation of 1.5 kg. If a
sample of 45 water bottles is selected at random from a consignment and their weights are
measured, find the probability that the mean weight of the sample is less than 28 kg.
Solution:

Population mean: μ = 30 kg

Population standard deviation: σ = 1.5 kg

Sample size: n = 45 (which is greater than 30)

The standard deviation of the sample mean (standard error) is:
σ/√n = 1.5 / √45 = 1.5 / 6.7082 = 0.2236

Find the z-score for a sample mean of x̄ = 28 kg:

z = (28 – 30) / 0.2236 = -8.94

Using a z-score table OR the normal cdf function on a statistical calculator,
P(z < -8.94) ≈ 0
Thus the probability that the mean weight of the sample is less than 28 kg is effectively zero: a
sample mean of 28 kg lies almost 9 standard errors below the population mean.
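The calculation for this example can be checked with a minimal Python sketch. It assumes the sample mean is normally distributed with standard error σ/√n, as the central limit theorem suggests for n = 45; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu = 30.0      # population mean weight (kg)
sigma = 1.5    # population standard deviation (kg)
n = 45         # sample size
x_bar = 28.0   # sample mean of interest (kg)

# Standard error of the mean: sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

# z-score of the sample mean, then the lower-tail probability P(Z < z)
z = (x_bar - mu) / standard_error
probability = NormalDist().cdf(z)

print(f"standard error = {standard_error:.4f}")
print(f"z = {z:.2f}")
print(f"P(sample mean < {x_bar}) = {probability:.3g}")
```

Dividing by √n rather than by σ/√n is an easy slip to make; the standard error shrinks as n grows, so larger samples make a 2 kg deviation less likely, not more.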

Q4. Analyse z-Test with three examples for the following.

1. One example for a one-tailed test
2. Two examples for a two-tailed test

One tailed test

A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling
distribution, is called a one-tailed test.

For example, suppose the null hypothesis states that the mean is less than or equal to 10. The
alternative hypothesis would be that the mean is greater than 10. The region of rejection would
consist of a range of numbers located on the right side of sampling distribution; that is, a set of
numbers greater than 10.

Two Tailed Test

A test of a statistical hypothesis, where the region of rejection is on both sides of the sampling
distribution, is called a two-tailed test.

For example,

1. Suppose the null hypothesis states that the mean is equal to 10. The alternative hypothesis would
be that the mean is less than 10 or greater than 10. The region of rejection would consist of a range
of numbers located on both sides of the sampling distribution; that is, the region of rejection would
consist partly of numbers that were less than 10 and partly of numbers that were greater than 10.

2. In the mechanics of materials, the strength of a material is its ability to withstand an applied load
without failure or plastic deformation. Suppose we perform a two-sided 1-sample t-test where we
compare the mean strength (4.1) of parts from a supplier to a target value (5). We use a two-tailed
test because we care whether the mean is greater than or less than the target value.
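The difference between the one-tailed and two-tailed rejection regions can be illustrated with a small Python sketch. The function `z_test` and the numbers used (sample mean 10.4, σ = 1.5, n = 45, hypothesised mean 10) are hypothetical values chosen for illustration, not data from the assignment, and the population standard deviation is assumed to be known.

```python
import math
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, tail="two"):
    """Return (z, p_value) for a z-test with known population sigma.

    tail: "right" for H1: mean > mu0, "left" for H1: mean < mu0,
          "two" for H1: mean != mu0.
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf(z)
    if tail == "right":
        p_value = 1 - cdf
    elif tail == "left":
        p_value = cdf
    else:
        # Two-tailed: probability in both tails beyond |z|
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Hypothetical data: H0 says the mean is 10.
z, p_one = z_test(10.4, 10, 1.5, 45, tail="right")
_, p_two = z_test(10.4, 10, 1.5, 45, tail="two")
print(f"z = {z:.3f}, one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

With these hypothetical numbers the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which shows why the choice of tail has to be made before looking at the data.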

Q5. Analyse t-Test with an example (Solution should include confidence intervals). Consider two
scenarios each having 25 data points, one for a “One-tailed test” and another for a “Two-tailed test”.

A rare congenital disease, Everley’s syndrome, generally causes a reduction in concentration of
blood sodium. This is thought to provide a useful diagnostic sign as well as a clue to the efficacy of
treatment. Little is known about the subject, but the director of a dermatological department in a
London teaching hospital is known to be interested in the disease and has seen more cases than
anyone else. Even so, he has seen only 18. The patients were all aged between 20 and 44.

The mean blood sodium concentration of these 18 cases was 115 mmol/l, with standard deviation of
12 mmol/l. Assuming that blood sodium concentration is Normally distributed what is the 95%
confidence interval within which the mean of the total population of such cases may be expected to
lie?

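The data are set out as follows:

Number of observations: n = 18
Mean blood sodium concentration: 115 mmol/l
Standard deviation: SD = 12 mmol/l
Standard error of the mean: SE = SD / √n = 12 / √18 = 2.83 mmol/l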

To find the 95% confidence interval above and below the mean we now have to find a multiple of
the standard error. In large samples we have seen that the multiple is 1.96 . For small samples we
use the table of t . As the sample becomes smaller t becomes larger for any particular level of
probability. Conversely, as the sample becomes larger t becomes smaller and approaches the values
given in table A, reaching them for infinitely large samples.

Since the size of the sample influences the value of t, the size of the sample is taken into account in
relating the value of t to probabilities in the table. Some useful parts of the full t table appear in . The
left hand column is headed d.f. for “degrees of freedom”. The use of these was noted in the
calculation of the standard deviation. In practice the degrees of freedom amount in these
circumstances to one less than the number of observations in the sample. With these data we have
18 – 1 = 17 d.f. This is because only 17 observations plus the total number of observations are
needed to specify the sample, the 18th being determined by subtraction.
To find the number by which we must multiply the standard error to give the 95% confidence
interval we enter table B at 17 in the left hand column and read across to the column headed 0.05 to
discover the number 2.110. The 95% confidence intervals of the mean are now set as follows:

Mean + 2.110 SE to Mean – 2.110 SE

which gives us:

115 – (2.110 x 2.83) to 115 + (2.110 x 2.83) or 109.03 to 120.97 mmol/l.

We may then say, with a 95% chance of being correct, that the range 109.03 to 120.97 mmol/l
includes the population mean.

Likewise, the 99% confidence interval of the mean is as follows:

Mean + 2.898 SE to Mean – 2.898 SE

which gives:

115 – (2.898 x 2.83) to 115 + (2.898 x 2.83) or 106.80 to 123.20 mmol/l.
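Both intervals can be reproduced with a minimal Python sketch. The t multipliers 2.110 and 2.898 are taken from the text (17 degrees of freedom) and hard-coded, since the Python standard library has no t distribution; the names `T_MULTIPLIER` and `confidence_interval` are illustrative.

```python
import math

n = 18        # number of patients
mean = 115.0  # mean blood sodium concentration (mmol/l)
sd = 12.0     # sample standard deviation (mmol/l)

# Standard error of the mean: SD / sqrt(n)
se = sd / math.sqrt(n)

# Two-sided t multipliers for 17 degrees of freedom, as quoted in the text.
T_MULTIPLIER = {"95%": 2.110, "99%": 2.898}


def confidence_interval(centre, standard_error, multiplier):
    """Return (lower, upper) limits: centre -/+ multiplier * standard error."""
    margin = multiplier * standard_error
    return centre - margin, centre + margin


ci_95 = confidence_interval(mean, se, T_MULTIPLIER["95%"])
ci_99 = confidence_interval(mean, se, T_MULTIPLIER["99%"])
print(f"SE = {se:.2f} mmol/l")
print(f"95% CI: {ci_95[0]:.2f} to {ci_95[1]:.2f} mmol/l")
print(f"99% CI: {ci_99[0]:.2f} to {ci_99[1]:.2f} mmol/l")
```

The 99% interval is wider than the 95% interval because a larger multiple of the same standard error is needed to capture the population mean with greater confidence.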

Difference of sample mean from population mean (one sample t test)

Estimations of plasma calcium concentration in the 18 patients with Everley’s syndrome gave a mean
of 3.2 mmol/l, with standard deviation 1.1. Previous experience from a number of investigations and
published reports had shown that the mean was commonly close to 2.5 mmol/l in healthy people
aged 20-44, the age range of the patients. Is the mean in these patients abnormally high?

We set the figures out as follows:

Mean of general population, μ: 2.5 mmol/l
Mean of sample, x̄: 3.2 mmol/l
Standard deviation of sample, SD: 1.1 mmol/l
Standard error of sample mean, SE = SD / √n = 1.1 / √18 = 0.26 mmol/l
Difference between means: 3.2 – 2.5 = 0.7 mmol/l
t = difference between means divided by standard error of sample mean = 0.7 / 0.26 = 2.69

Ignoring the sign of the t value, and entering table B at 17 degrees of freedom, we find that 2.69
comes between probability values of 0.02 and 0.01, in other words between 2% and 1%, and so
0.01 < P < 0.02. It is therefore unlikely that the sample with mean 3.2 came from the population with
mean 2.5, and we may conclude that the sample mean is, at least statistically, unusually high.
Whether it should be regarded clinically as abnormally high is something that needs to be considered
separately by the physician in charge of that case.
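The one sample t test above can be sketched in Python as follows. The critical values 2.567 (P = 0.02) and 2.898 (P = 0.01) are the standard two-sided t table entries for 17 degrees of freedom and are hard-coded as an assumption, since the standard library has no t distribution; variable names are illustrative.

```python
import math

n = 18                 # number of patients
sample_mean = 3.2      # mean plasma calcium in the sample (mmol/l)
population_mean = 2.5  # mean in healthy people aged 20-44 (mmol/l)
sd = 1.1               # sample standard deviation (mmol/l)

# Standard error of the sample mean and the t statistic
se = sd / math.sqrt(n)
t = (sample_mean - population_mean) / se
degrees_of_freedom = n - 1

# Two-sided critical values for 17 d.f. from a standard t table (assumed here).
T_CRIT_P02 = 2.567  # P = 0.02
T_CRIT_P01 = 2.898  # P = 0.01

print(f"SE = {se:.3f}, t = {t:.2f}, d.f. = {degrees_of_freedom}")
if T_CRIT_P02 < abs(t) < T_CRIT_P01:
    print("0.01 < P < 0.02")
```

Without intermediate rounding t comes out at about 2.70; the hand calculation gives 2.69 because the standard error is rounded to 0.26 first. Either way the value lies between the two critical values.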
Q6. Assume an example and analyse Chi-Square test.

The Chi-square formula is used in the Chi-square test to compare observed data with the data that
would be expected under a hypothesis. Chi-Square is one of the most useful non-parametric
statistics. The Chi-Square test is used when the data consist of people distributed across categories,
to find out whether that distribution is different from what would be expected by chance.

• A very small Chi-Square test statistic means that your observed data fits your expected data
extremely well.
• A very large Chi-Square test statistic means that the data does not fit very well. If the
chi-square value is larger than the critical value for the chosen significance level and degrees of
freedom, you can reject the null hypothesis.

Chi-Square is one way to show a relationship between two categorical variables. There are two types
of variables in statistics: numerical variables and non-numerical (categorical) variables. The value
can be calculated by using the given observed frequency and expected frequency.

Formula for Chi-Square Test

The Chi-Square is denoted by χ² and the formula is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where,

• O = Observed frequency
• E = Expected frequency
• ∑ = Summation
• χ² = Chi-Square value

Example: Calculate the chi-square value for the following data:

                 Male                 Female
Full Stop        6 (observed)         6 (observed)
                 6.24 (expected)      5.76 (expected)
Rolling Stop     16 (observed)        15 (observed)
                 16.12 (expected)     14.88 (expected)
No Stop          4 (observed)         3 (observed)
                 3.64 (expected)      3.36 (expected)

Solution:
Now calculate Chi Square using the following formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Calculate this formula for each cell, one at a time. For example, cell #1 (Male/Full Stop):
Observed number is: 6
Expected number is: 6.24
Therefore, (6 – 6.24)² / 6.24 = 0.0092
Continue doing this for the rest of the cells, and add the final numbers for each cell together to get
the final Chi-Square number. There are 6 total cells, so at the end, you should be adding six numbers
together for your final Chi-Square number.
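The remaining cells can be worked through with a short Python sketch. It recomputes the expected counts from the row and column totals of the observed table and then sums the six contributions. The critical value 5.991 (5% level, 2 degrees of freedom) is taken from a standard chi-square table and is an assumption added here, as are the variable names.

```python
# Observed counts from the example: rows are stop types, columns are (Male, Female).
observed = {
    "Full Stop": (6, 6),
    "Rolling Stop": (16, 15),
    "No Stop": (4, 3),
}

row_totals = {label: sum(counts) for label, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(row_totals.values())

chi_square = 0.0
contributions = {}
for label, counts in observed.items():
    for j, obs in enumerate(counts):
        # Expected count under independence: row total * column total / grand total
        expected = row_totals[label] * col_totals[j] / grand_total
        term = (obs - expected) ** 2 / expected
        contributions[(label, j)] = term
        chi_square += term

# Degrees of freedom: (rows - 1) * (columns - 1)
degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)
CRITICAL_5_PERCENT_DF2 = 5.991  # from a standard chi-square table (assumed)

print(f"chi-square = {chi_square:.4f} with {degrees_of_freedom} d.f.")
print("reject H0" if chi_square > CRITICAL_5_PERCENT_DF2 else "do not reject H0")
```

For this table the statistic is far below the critical value, so the observed counts fit the expected counts closely and there is no evidence that stopping behaviour differs between males and females.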
