Some Definitions: Probability Sampling
Some Definitions
That's it. With those terms defined we can begin to define the different
probability sampling methods.
Procedure: Use a table of random numbers, a computer random number generator, or a mechanical
device to select the sample.
For example, let's say you want to select 100 clients to survey and that
there were 1000 clients over the past 12 months. Then, the sampling
fraction is f = n/N = 100/1000 = .10 or 10%. Now, to actually draw the
sample, you have several options. You could print off the list of 1000
clients, tear them into separate strips, put the strips in a hat, mix them
up real good, close your eyes and pull out the first 100. But this
mechanical procedure would be tedious and the quality of the sample
would depend on how thoroughly you mixed them up and how
randomly you reached in. Perhaps a better procedure would be to use
the kind of ball machine that is popular with many of the state
lotteries. You would need three sets of balls numbered 0 to 9, one set
for each of the digits from 000 to 999 (if we select 000 we'll call that
1000). Number the list of names from 1 to 1000 and then use the ball
machine to select the three digits that identify each person. The
obvious disadvantage here is that you need to get the ball machine.
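A computer random number generator avoids both the hat and the ball machine. The following is a minimal sketch, assuming the sampling frame is simply a list of client numbers from 1 to 1000; the function name simple_random_sample and the fixed seed are illustrative choices, not part of the procedure described above.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)        # seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical sampling frame: clients numbered 1 to 1000.
clients = list(range(1, 1001))
sample = simple_random_sample(clients, 100, seed=42)

# Sampling fraction f = n/N, as in the text.
sampling_fraction = len(sample) / len(clients)
print(len(sample), sampling_fraction)
```

Because the draw is without replacement, no client can be selected twice, which matches pulling strips from a hat without putting them back.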
For example, let's say that the population of clients for our agency can
be divided into three groups: Caucasian, African-American and
Hispanic-American. Furthermore, let's assume that both the African-Americans
and Hispanic-Americans are relatively small minorities of
the clientele (10% and 5% respectively). If we just did a simple
random sample of n=100 with a sampling fraction of 10%, we would
expect by chance alone to get only about 10 and 5 persons, respectively,
from our two smaller groups. And, by chance, we could get fewer
than that! If we stratify, we can do better. First, let's determine how
many people we want to have in each group. Let's say we still want to
take a sample of 100 from the population of 1000 clients over the past
year. But we think that in order to say anything about subgroups we
will need at least 25 cases in each group. So, let's sample 50
Caucasians, 25 African-Americans, and 25 Hispanic-Americans. We
know that 10% of the population, or 100 clients, are African-American.
If we randomly sample 25 of these, we have a within-stratum
sampling fraction of 25/100 = 25%. Similarly, we know that 5% or 50
clients are Hispanic-American. So our within-stratum sampling fraction
will be 25/50 = 50%. Finally, by subtraction we know that there are
850 Caucasian clients. Our within-stratum sampling fraction for them
is 50/850 = about 5.88%. Because the groups are more
homogeneous within-group than across the population as a whole, we
can expect greater statistical precision (less variance). And, because
we stratified, we know we will have enough cases from each group to
say something about each subgroup.
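The arithmetic in this example can be checked with a short sketch. It assumes each stratum is available as a list of client IDs; the ID ranges, the function name stratified_sample and the seed are made up for illustration, while the stratum sizes (850, 100, 50) and targets (50, 25, 25) come from the example.

```python
import random

def stratified_sample(strata, targets, seed=None):
    """Draw a simple random sample of targets[name] units within each stratum."""
    rng = random.Random(seed)
    sample, fractions = {}, {}
    for name, members in strata.items():
        n_h = targets[name]
        sample[name] = rng.sample(members, n_h)      # random draw inside the stratum
        fractions[name] = n_h / len(members)         # within-stratum sampling fraction
    return sample, fractions

# Hypothetical frame: client IDs grouped by stratum, sizes taken from the text.
strata = {
    "Caucasian": list(range(1, 851)),            # 850 clients
    "African-American": list(range(851, 951)),   # 100 clients
    "Hispanic-American": list(range(951, 1001)), # 50 clients
}
targets = {"Caucasian": 50, "African-American": 25, "Hispanic-American": 25}

sample, fractions = stratified_sample(strata, targets, seed=1)
for name, f in fractions.items():
    print(f"{name}: {f:.2%}")
```

The three fractions differ (about 5.88%, 25% and 50%), which is what makes this a disproportionate stratified sample.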
All of this will be much clearer with an example. Let's assume that we
have a population that only has N=100 people in it and that you want
to take a sample of n=20. To use systematic sampling, the population
must be listed in a random order. The sampling fraction would be f =
20/100 = 20%. In this case, the interval size, k, is equal to N/n =
100/20 = 5. Now, select a random integer from 1 to 5. In our example,
imagine that you chose 4. Now, to select the sample, start with the 4th
unit in the list and take every k-th unit (every 5th, because k=5). You
would be sampling units 4, 9, 14, 19, and so on up to 99, and you would
wind up with 20 units in your sample. For this to work, it is essential
that the units in the list be in random order.
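The same steps can be written as a short sketch. It assumes N is an exact multiple of n, as in the example, and that the list is already in random order; the function name systematic_sample and its start argument (used here to reproduce the choice of 4) are illustrative assumptions.

```python
import random

def systematic_sample(units, n, start=None, seed=None):
    """Take every k-th unit after a random start, where k = N/n.

    Assumes len(units) is an exact multiple of n, as in the example
    (N=100, n=20), and that units are already in random order.
    """
    N = len(units)
    k = N // n                                      # interval size
    if start is None:
        start = random.Random(seed).randint(1, k)   # random integer from 1 to k
    # Positions in the text are 1-based, so subtract 1 to index the list.
    return [units[pos - 1] for pos in range(start, N + 1, k)]

population = list(range(1, 101))   # hypothetical list of units numbered 1..100
sample = systematic_sample(population, n=20, start=4)
print(sample[:4], sample[-1], len(sample))   # [4, 9, 14, 19] 99 20
```

Starting at 4 with k=5, the last unit selected is 99, and any start from 1 to 5 yields exactly 20 units.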
For instance, in the figure we see a map of the counties in New York
State. Let's say that we have to do a survey of town governments that
will require us going to the towns personally. If we do a simple random
sample state-wide we'll have to cover the entire state geographically.
Instead, we decide to do a cluster sampling of five counties (marked
in red in the figure). Once these are selected, we go to every town
government in the five areas. Clearly this strategy will help us to
economize on our mileage. Cluster or area sampling, then, is useful in
situations like this, and is done primarily for efficiency of
administration. Note also that we probably don't have to worry about
using this approach if we are conducting a mail or telephone survey.
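A minimal sketch of the county example follows. The county and town labels are made up, since the figure's actual counties are not listed here, and the function name cluster_sample is an assumption; the logic is simply to pick clusters at random and then take every unit inside them.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters, then keep every unit in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)        # stage 1: sample clusters
    units = [u for c in chosen for u in clusters[c]]         # measure all units in them
    return chosen, units

# Hypothetical frame: 8 made-up counties with 3 made-up towns each.
towns_by_county = {
    f"County {c}": [f"Town {c}{t}" for t in range(1, 4)] for c in "ABCDEFGH"
}

chosen, towns = cluster_sample(towns_by_county, 5, seed=7)
print(chosen)
print(len(towns), "town governments to visit")
```

Only the selected counties need to be visited, which is where the savings in mileage come from.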
The four methods we've covered so far -- simple, stratified, systematic
and cluster -- are the simplest random sampling strategies. In most
real applied social research, we would use sampling methods that are
considerably more complex than these simple variations. The most
important principle here is that we can combine the simple methods
described earlier in a variety of useful ways that help us address our
sampling needs in the most efficient and effective manner possible.
When we combine sampling methods, we call this multi-stage sampling.
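As a rough illustration of combining methods, the sketch below chains two of the simple ones: it first draws a random sample of clusters and then draws a simple random sample within each selected cluster. This is only one possible combination, not a prescribed design; the tract and resident labels, the function name two_stage_sample and the sample sizes are all hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: simple random sample inside each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        # Never ask for more units than the cluster actually contains.
        c: rng.sample(clusters[c], min(n_per_cluster, len(clusters[c])))
        for c in chosen
    }

# Hypothetical frame: 10 made-up tracts with 40 made-up residents each.
tracts = {
    f"Tract {i}": [f"Resident {i}-{j}" for j in range(1, 41)]
    for i in range(1, 11)
}

selected = two_stage_sample(tracts, n_clusters=3, n_per_cluster=10, seed=5)
for tract, residents in selected.items():
    print(tract, len(residents))
```

The second stage could just as well be a stratified or systematic draw; the point is only that each stage reuses one of the simple methods.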
For example, consider the idea of sampling New York State residents
for face-to-face interviews. Clearly we would want to do some type of
cluster sampling as the first stage of the process. We might sample
townships or census tracts throughout the state. But in cluster
sampling we would then go on to measure everyone in the clusters
we select. Even if we are sampling census tracts we may not be able
to measure everyone who is in the census tract. So, we might set up a
stratified sampling process within the clusters. In this case, we would