
AdHStat1 3notes


Statistics 1

Unit 1.3

Sampling Methods

Here are a few brief notes on sampling.

Know the difference between a Census and a Sample Survey.

In a Census the entire population is surveyed. In many cases the population is huge or geographically
dispersed (for example, the U.K. adult population or all readers of a national newspaper), so a Census is
too expensive and time-consuming. To save expense, a sample is selected from the population, with the
important aim that the views or characteristics of the members of the sample accurately represent
those of the population as a whole.

The following link gives a sufficient explanation of the sampling methods (quota sampling is not covered in the
clip but should still be studied):

http://www.youtube.com/watch?v=be9e-Q-jC-0
(or search for ‘sampling methods’ on YouTube)

Convenience Sampling

In this method the people in the sample select themselves, e.g. "phone-in" polls for T.V. news programmes or
newspaper surveys asking people to respond. A sample selected in this way is useless: the people watching
the programme are unlikely to be representative of the whole population, and people with extreme views are
much more likely to respond. Further, in a "phone-in" poll there is nothing to stop someone voting several
times!

Simple Random Sampling

This is equivalent to putting all the names in a hat, mixing the contents thoroughly and drawing the names
out at random. Each population member has an equal chance of selection.
For a small population this is fairly easy: say the population is of size N. Then to select a random sample of
size n we could number the population 1, 2, ..., N, select n different random numbers from this range and
choose the people corresponding to these n random numbers.
For large, widely dispersed populations, however, this process is nearly impossible, so other sampling methods
are used.
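
The "number the population, then pick n different random numbers" procedure can be sketched in Python as below. This is a minimal illustration assuming a hypothetical population of 200 labelled pupils; the function name simple_random_sample is made up for this example. Python's standard random.sample draws without replacement, so no pupil can be chosen twice.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Return n members chosen without replacement (the 'names in a hat' method)."""
    rng = random.Random(seed)
    # Number the members 0, 1, ..., N-1 and pick n different numbers from that range.
    chosen_numbers = rng.sample(range(len(population)), n)
    # Choose the members corresponding to those numbers.
    return [population[i] for i in chosen_numbers]


# Hypothetical population: 200 pupils labelled pupil1 ... pupil200.
pupils = [f"pupil{i}" for i in range(1, 201)]
sample = simple_random_sample(pupils, 20, seed=1)
print(sample)
```

The seed is only there to make a run repeatable; omitting it gives a different sample each time.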

Stratified Random Sampling

Here the population is divided into non-overlapping groups, or strata (e.g. age-groups), and a random sample
of appropriate size, often in proportion to the size of the stratum, is drawn from each stratum. This gives a
good picture of the views of the different strata, so this type of sampling provides extra information.
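
One common way to choose the "appropriate size" is proportional allocation, where each stratum's share of the sample matches its share of the population. The Python sketch below assumes that rule and uses three hypothetical age-group strata of sizes 300, 500 and 200; the function name stratified_sample is also hypothetical.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Proportional allocation: stratum h contributes about n * N_h / N members."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Round the proportional share; rounding can make the total differ slightly from n.
        k = round(n * len(members) / total)
        # Simple random sample within this stratum.
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical strata (population of 1000 split by age-group).
strata = {
    "18-30": [f"A{i}" for i in range(300)],
    "31-50": [f"B{i}" for i in range(500)],
    "51+": [f"C{i}" for i in range(200)],
}
result = stratified_sample(strata, 50, seed=2)
print({name: len(s) for name, s in result.items()})  # {'18-30': 15, '31-50': 25, '51+': 10}
```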

Cluster Sampling

The population is divided into non-overlapping clusters. A simple random sample of clusters is drawn and all
the members of these clusters are interviewed.
For example, there are 360 secondary schools in Scotland. We could select a Cluster Sample of S5 H Maths
candidates by selecting 50 schools at random and taking as our sample every S5 H Maths candidate in these 50
schools.
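
A minimal Python sketch of the school example, under the hypothetical assumption that each of the 360 schools has exactly 30 S5 Higher Maths candidates (real schools would differ). Whole clusters are chosen by simple random sampling and every member of each chosen cluster is included; the function name cluster_sample is illustrative only.

```python
import random


def cluster_sample(clusters, k, seed=None):
    """Select k clusters at random and return every member of the selected clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)  # simple random sample of cluster names
    return [member for name in chosen for member in clusters[name]]


# Hypothetical data: 360 schools, each with 30 candidates.
schools = {
    f"school{s}": [f"school{s}-cand{c}" for c in range(30)]
    for s in range(1, 361)
}
sample = cluster_sample(schools, 50, seed=3)
print(len(sample))  # 1500, i.e. 50 schools x 30 candidates
```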

Quota Sampling

This is similar to Stratified Sampling, but does not use random selection.
Interviewers are given quotas to fill, e.g. 10 men aged 41-50, 12 women aged 21-30, etc.
The quotas are chosen so that the sample gives a good representation of the population.
This type of sampling is often done in practice, but there can be problems if the interviewers are not
conscientious in their work.

Much opinion sampling is now done over the telephone. Fifty years ago this would have given a biased
sample, since people in lower income brackets seldom had access to a telephone. Now, however, almost
everyone has access to a telephone, so telephone polls are generally not thought to introduce this bias.

Systematic Sampling

Say there are 200 pupils in S4 and we wish to select a Systematic Sample of 20 pupils. The sampling interval
is 200 ÷ 20 = 10.
Number the pupils 1, 2, ..., 200.
Choose a random integer from 1 to 10. Say the number chosen is k.
Then the Systematic Sample consists of pupils k, k + 10, k + 20, ..., k + 190.
This is awkward for a large population, and there is a small chance that there will be some periodicity in our
listing which could lead to a biased sample.
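
The systematic sample can be sketched in Python as follows, assuming (as in the S4 example) that the population size N is an exact multiple of the sample size n, so that the sampling interval is N ÷ n. The function name systematic_sample is hypothetical.

```python
import random


def systematic_sample(N, n, seed=None):
    """Return the numbers k, k + m, k + 2m, ... where m = N // n and k is a random start."""
    m = N // n                    # sampling interval, e.g. 200 // 20 = 10
    rng = random.Random(seed)
    k = rng.randint(1, m)         # random integer from 1 to m inclusive
    return list(range(k, N + 1, m))


chosen = systematic_sample(200, 20, seed=4)
print(chosen)
```

Only the starting point is random; once k is fixed the rest of the sample is determined. That is why periodicity in the list matters: if, say, every tenth name on the list shared some characteristic, the whole sample could share it too.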

Distribution of Sample Means

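Suppose that $X_1, X_2, \ldots, X_n$ are independent random variables from the same distribution. That is, we have
selected a sample of size n from a large population, and the random sampling process used allows us to
assume independence. We therefore say that these random variables are independent and identically
distributed.

Each random variable $X_i$ ($i = 1, 2, \ldots, n$) has the same mean, $\mu$, and variance, $\sigma^2$:

i.e. $E(X_i) = \mu$ and $V(X_i) = \sigma^2$

The mean of this random sample is denoted by $\bar{X}$ and calculated by

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

This sample mean, $\bar{X}$, has expected value $\mu$. This seems obvious, but you must be able to prove it as
follows:

$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}(n\mu) = \mu$$

This sample mean, $\bar{X}$, has variance $\frac{\sigma^2}{n}$. Again, you must be able to prove this as follows (the second
step uses the independence of the $X_i$):

$$V(\bar{X}) = V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}$$

Thus, sample means are distributed with mean $\mu$ and variance $\frac{\sigma^2}{n}$.

The standard deviation of the sample mean, $\frac{\sigma}{\sqrt{n}}$, is called the standard error of the mean.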
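
The results E(X̄) = μ and V(X̄) = σ²/n can be checked numerically. The Python sketch below draws many samples of size 16 from N(305, 10²), the juice-can population used in the example in these notes, and compares the mean and standard deviation of the sample means with 305 and 10/√16 = 2.5. The number of repetitions (20000) and the seed are arbitrary choices.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=None):
    """Draw `reps` samples of size n from N(mu, sigma^2) and return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]


means = simulate_sample_means(mu=305, sigma=10, n=16, reps=20000, seed=5)
print(statistics.fmean(means))  # close to 305
print(statistics.stdev(means))  # close to 10 / sqrt(16) = 2.5
```

A simulation like this illustrates the result but is not a proof; the algebraic proofs above are still required.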

Example

A drinks manufacturer makes cans of juice whose contents are normally distributed with mean 305 ml and
standard deviation 10 ml.
A random sample of 16 cans is taken. What is the probability that the sample mean lies between 301 ml and
309 ml?

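Note how this differs from the questions in section 1.2, which concerned the value taken by a single random
variable. In this section we are concerned with the value taken by the mean of a sample of two or more
observations.

Now $X \sim N(305, 10^2)$ and $n = 16$, so $\bar{X} \sim N\left(305, \frac{10^2}{16}\right)$, i.e. the standard error is $\frac{10}{\sqrt{16}} = 2.5$.

Hence

$$P(301 < \bar{X} < 309) = P\left(\frac{301 - 305}{2.5} < Z < \frac{309 - 305}{2.5}\right) = P(-1.6 < Z < 1.6) = 2\Phi(1.6) - 1 = 2(0.9452) - 1 = 0.8904$$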
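
The same probability can be checked with Python's standard statistics.NormalDist class, which takes a mean and a standard deviation. This is only a check on the table-based working: it uses exact Normal probabilities rather than four-figure tables, so answers to other questions may differ from table values in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 305, 10, 16
# Distribution of the sample mean: N(mu, sigma^2 / n), i.e. standard error sigma / sqrt(n) = 2.5.
sample_mean = NormalDist(mu, sigma / sqrt(n))
p = sample_mean.cdf(309) - sample_mean.cdf(301)
print(round(p, 4))  # 0.8904
```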

Questions

1. A population is . A random sample of 25 is taken from the population. State the distribution
of the sample mean and its parameters.

2. The lengths of a certain type of caterpillar are known to be normally distributed with a mean length of
6.4 cm and s.d. 1.8 cm. A random sample of 40 caterpillars is taken. Find the probability that the
sample mean is 5.8 cm or less.

3. The weight of cans of sardines can be assumed to be normally distributed with grams.
Quality control takes a random sample of size 5 at regular intervals. Find the probability that the
sample mean is more than 260 grams.

4. The life of fluorescent lighting tubes is assumed to be normally distributed with mean 400 days
and s.d. 35 days. In a large office building, a random sample of size 10 is chosen.
a) Find the probability that the sample mean is less than 380 days.
b) If the sample size is increased to 20, find the probability that the sample mean is less than 380
days.
c) Comment on your answers to a) and b) and state whether you would expect the probability to
increase or decrease if the sample size were increased further.

5. . A computer program generated a simple random sample of four observations from this
distribution:
8.1814 11.9008 7.5648 10.4211

a) Calculate the sample mean to 2 decimal places.


b) Calculate the standard error of the mean.
c) Calculate the probability that the mean of a random sample of four observations from this
distribution would take a value less than that calculated in part a).

6. The heights of men in a population are normally distributed with a mean of 175cm and standard
deviation of 7.5cm.
a) If a man is chosen at random, what is the probability that his height is greater than 180 cm?

If a simple random sample of nine men is drawn,


b) Calculate the standard error of the mean.
c) Calculate the probability that the sample mean height exceeds 180cm

7. A certain brand of light bulb has lifetimes which are normally distributed with a mean of 1000 hours
and standard deviation 100 hours.

a) Calculate the probability that a randomly selected light bulb has a lifetime less than 900 hours.
b) Calculate the standard error of the mean of a random sample of 6 light bulbs.
c) Calculate the probability that the mean lifetime of a random sample of six light bulbs is less than
900 hours.

8. .
a) Calculate P .
A random sample of size n is drawn from this population.
b) If n=100, state the distribution of and calculate P
c) Calculate the sample size n if we require P

9. The average number of minutes spent sleeping per night by first year university students is
.
a) Find the probability that a randomly selected student spends less than an average of 430 minutes
asleep per night.
b) A researcher at a certain university takes a random sample of 25 first year students and asks each
of them to record the average time they spent asleep. Find the probability that the sample mean is
less than 430 minutes.

The Central Limit Theorem

As before, $X_1, X_2, \ldots, X_n$ is a random sample of independent and identically distributed random variables,
all from the same distribution with the same mean $\mu$ and variance $\sigma^2$. However, whether they all came
from a Poisson distribution, a Binomial distribution, a Normal distribution or some other distribution is unknown.

The Central Limit Theorem then states that, for sufficiently large n, the sample mean, $\bar{X}$, is distributed
as follows:

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately;}$$

the larger n is, the better the approximation will be.

In practice, a sample size of at least 20 or 30 usually gives a good approximation.

This is amazing! Regardless of the distribution of the original random variables, the sampling distribution
of $\bar{X}$ is approximately Normal provided n is large enough.
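
A quick simulation makes the Central Limit Theorem concrete. The Python sketch below assumes an exponential population with mean 1 and variance 1 (a strongly skewed, clearly non-Normal distribution, chosen only for illustration), takes 20000 samples of size n = 30, and checks that the sample means have mean close to 1, standard deviation close to 1/√30 ≈ 0.183, and that about 95% of them lie within 1.96 standard errors of the mean, as a Normal distribution would predict.

```python
import random
import statistics

rng = random.Random(6)
n, reps = 30, 20000
mu, sigma = 1.0, 1.0            # exponential with rate 1: mean 1, standard deviation 1

# Each entry is the mean of one sample of size n from the skewed population.
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

se = sigma / n ** 0.5           # standard error sigma / sqrt(n)
within = sum(abs(m - mu) < 1.96 * se for m in means) / reps

print(statistics.fmean(means))  # close to 1
print(statistics.stdev(means))  # close to 0.183
print(within)                   # close to 0.95 if the Normal approximation is good
```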

To summarise the above results:

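If the population is normally distributed, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ for any sample size n.

If the population distribution is unknown, then $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$ approximately for large sample size n.

Exam questions often test this.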

Questions:
1. The mean of a population is 45 and the standard deviation is 4.8.
a) A random sample of 100 is taken. State the distribution of the sample mean and its parameters.
b) If a random sample of size 12 is taken, what can be said about the distribution of the sample mean?

2. A company which bottles soft drinks uses a machine which dispenses the drink into bottles so that
the mean amount dispensed is 2 litres with s.d. 20 millilitres. A random sample of 50 bottles is taken.
Find the probability that the sample mean is between 1.995 litres and 2.005 litres.

Confidence Interval for the Population Mean

We have a normally distributed population with known variance $\sigma^2$ and unknown mean $\mu$.

[This is a somewhat unnatural state of affairs, if you think about it.]

We take a random sample of size n and use the sample mean $\bar{x}$ as an estimate of $\mu$. This sample mean will
generally not be equal to the population mean, so we try to identify a range of plausible values for the
population mean.

A 95% confidence interval for $\mu$ is

$$\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

A 95% interval leaves 2.5% in each tail, so 0.975 is used to obtain the value $z = 1.96$ from p. 8 of the
tables.

We may interpret this as saying that if we take a large number of samples and compute a confidence interval
for each sample, then 95% of these intervals would be expected to contain the true population mean.

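A 99% confidence interval for $\mu$ is

$$\left(\bar{x} - 2.58\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.58\frac{\sigma}{\sqrt{n}}\right)$$

Likewise, this requires 0.995 to obtain the value $z = 2.58$.

Other values can be obtained from p. 8.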
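
As an alternative to looking up p. 8 of the tables, the z-value for any confidence level can be found from the inverse of the standard Normal cumulative distribution function. A short Python sketch using the standard library's statistics.NormalDist:

```python
from statistics import NormalDist

z_values = {}
for level in (0.90, 0.95, 0.98, 0.99):
    # Put (1 - level) / 2 in each tail, so look up the (1 + level) / 2 quantile.
    z_values[level] = round(NormalDist().inv_cdf((1 + level) / 2), 3)
print(z_values)  # {0.9: 1.645, 0.95: 1.96, 0.98: 2.326, 0.99: 2.576}
```

The value 2.58 used in these notes is 2.576 rounded to two decimal places.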

When $\sigma$, the true population s.d., is not known, it can be replaced by the sample s.d. (s) as long as n is large.
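
The interval formula can be wrapped in a small Python function. This is a sketch with a hypothetical function name, z_interval; it computes z exactly rather than reading it from tables, so its endpoints can differ from hand calculations with 1.96 or 2.58 in the final decimal places. The usage line reproduces the cake example in these notes (σ = 20 g, n = 16, sample mean 125 g).

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level=0.95):
    """Confidence interval xbar ± z * sigma / sqrt(n) for a Normal mean with known sigma."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for 95%, 2.576 for 99%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = z_interval(125, 20, 16)
print(round(low, 1), round(high, 1))  # 115.2 134.8
```

For large samples with unknown σ, the sample s.d. s can be passed in place of sigma, as described above.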

Example

A cake manufacturer makes cakes whose weights are normally distributed with standard deviation 20 g. A
random sample of 16 cakes has mean 125 g. Find a 95% confidence interval for the mean weight.

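Now $\frac{\sigma}{\sqrt{n}} = \frac{20}{\sqrt{16}} = 5$.

Hence the 95% C.I. is

$$125 \pm 1.96 \times 5 = 125 \pm 9.8, \text{ i.e. } (115.2\text{ g},\ 134.8\text{ g})$$

We can be 95% confident that the true mean lies in this interval: intervals constructed in this way contain the
true mean for 95% of samples, so about 5% of such intervals miss it!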
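
The "large number of samples" interpretation of a confidence interval can be checked by simulation. The Python sketch below assumes, purely for illustration, that the cake population really is N(125, 20²); it repeatedly draws samples of 16 cakes, builds a 95% interval from each, and counts how often the interval contains the (assumed) true mean 125.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

rng = random.Random(7)
mu, sigma, n, reps = 125, 20, 16, 10000  # true mean assumed known only for this simulation
z = NormalDist().inv_cdf(0.975)
half_width = z * sigma / sqrt(n)

hits = 0
for _ in range(reps):
    xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
    # Count the interval as a "hit" if it contains the true mean.
    if xbar - half_width <= mu <= xbar + half_width:
        hits += 1

coverage = hits / reps
print(coverage)  # close to 0.95
```

In any single run, each individual interval either contains 125 or it does not; the 95% describes the long-run proportion of intervals that do.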

Exam questions often test your understanding of this.


Questions
1. A sample of 200 female students was taken in a large university campus. The heights were measured
and the sample mean was found to be 184 cm with s.d. 2.7 cm. Calculate:
a) a 95%,
b) a 99% confidence interval for the mean height of the whole female student population.

2. It is believed that the mean lifetime of a battery has changed but the standard deviation remains the
same at 14.5 hours. A random sample of 25 batteries was found to have a mean lifetime of 240.2
hours. Calculate a 95% confidence interval for the mean lifetime of these batteries.

3. The s.d. of the lifetime of limestone drilling bits used in the oil industry is known to be 24 hours. A random sample
of 25 drilling bits was found to have a mean lifetime of 300 hours. Calculate a 95% CI for the population
mean.

4. The owner of a large banana plantation wants to know the average yield per tree. He takes a random
sample of 100 trees and finds that the sample mean is 130.8 bananas with s.d. 28.4. Assuming that the
sample s.d. is a good estimate for the population s.d, construct a 95% CI for the population mean.

Confidence Interval for a Population Proportion (discrete)

[Population Proportion is often called Binomial Proportion]

Here we have a "yes/no" situation, i.e. Binomial.

For example, a random sample of 1000 voters is asked "Will you vote for candidate A in the Presidential
Election?"
The sample will give a proportion who say "yes" and a proportion who say "no".
We use the sample proportion $\hat{p}$ to estimate the true population proportion $p$.

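Now if X is the number saying "yes" from the random sample of size n, then $X \sim B(n, p)$.
Recall that $E(X) = np$ and $V(X) = np(1 - p)$.

Now $\hat{p} = \frac{X}{n}$, so $E(\hat{p}) = \frac{E(X)}{n} = \frac{np}{n} = p$

And $V(\hat{p}) = \frac{V(X)}{n^2} = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n}$

Also, for large n, $\hat{p}$ is approximately normally distributed. Be able to prove the results for $E(\hat{p})$ and $V(\hat{p})$.

$\sqrt{\frac{p(1 - p)}{n}}$ is known as the standard error of the proportion.

From our random sample we take $\hat{p} = \frac{x}{n}$, where x is the observed number saying "yes", and estimate the
standard error by $\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$.

A 95% confidence interval for $p$ is

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

Note that since we are using the Normal Approximation to the Binomial Distribution here, we should always
check that $n\hat{p} > 5$ and $n(1 - \hat{p}) > 5$, i.e. the numbers in each of the two categories exceed 5.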
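
A Python sketch of the proportion interval, including the check that both categories exceed 5. The function name proportion_interval is hypothetical, and as before z is computed exactly rather than taken from tables. The usage line reproduces the voter example in these notes (200 out of 500).

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, level=0.95):
    """Approximate CI p_hat ± z * sqrt(p_hat (1 - p_hat) / n) via the Normal approximation."""
    p_hat = x / n
    # Normal approximation check: both categories should contain more than 5.
    if not (n * p_hat > 5 and n * (1 - p_hat) > 5):
        raise ValueError("Normal approximation not appropriate: a category has 5 or fewer")
    z = NormalDist().inv_cdf((1 + level) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of the proportion
    return p_hat - z * se, p_hat + z * se


low, high = proportion_interval(200, 500)
print(round(low, 3), round(high, 3))  # 0.357 0.443
```

Raising an error, rather than silently returning an interval, is a design choice for this sketch: an interval computed when a category is too small would look just as precise as a valid one.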

Example
From a random sample of 500 voters, 200 say they will vote for candidate A. Find a 95% C.I. for the true
proportion.

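$\hat{p} = \frac{200}{500} = 0.4$, so the estimated standard error is $\sqrt{\frac{0.4 \times 0.6}{500}} = 0.0219$ (both $n\hat{p} = 200$ and
$n(1 - \hat{p}) = 300$ exceed 5).

Thus the required interval is

$$0.4 \pm 1.96 \times 0.0219 = 0.4 \pm 0.043, \text{ i.e. } (0.357,\ 0.443)$$

We can be 95% confident that the population proportion lies in this interval.
Thus if candidate A claims to have the support of more than half of the voters, we have strong evidence to
refute his claim, since the entire 95% interval lies well below 0.5.
Exam questions often require this kind of interpretation.
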
Questions
1. A random sample of 30 pupils at a large secondary school were asked how they had travelled to
school that day. Nine replied that they had travelled by car.
a) Calculate
i) The sample proportion
ii) The estimated standard error
b) Construct a 95% confidence interval for the true proportion of pupils at this school who travel to
school by car. Interpret this CI.

2. A sample of 1000 Scottish voters found that 220 people said they intended to vote Lib Dem.
Construct an approximate 90% confidence interval for the true proportion of voters who intend to vote
Lib Dem. Why might this CI give a biased estimate of the actual behaviour of voters?

3. The characteristics of transistors produced by a single production line are highly variable. The
transistors are coded A, B or C according to certain characteristics. From a sample of 80, 60 were
found to be code A. Calculate a 95% confidence interval for the proportion of transistors being
produced that are of type A.

4. From a sample of 150 students, 30 were awarded a grade A. Calculate a 95% CI for the proportion of
all students sitting the exam who would be awarded a grade A.
