

UNIT FIVE: SAMPLING DESIGN

Basic Concepts in Sampling


The Need for Sampling and Characteristics of a Good Sample Design
Types of Sampling techniques
5.1. BASIC CONCEPTS IN SAMPLING

Meaning of Sampling
Sampling is the process of selecting a suitable or representative part of a population for the purpose
of determining parameters or characteristics of the whole population. The basic idea of sampling is
that by selecting some of the elements in a population, we may draw conclusions about the entire
population. In statistics this is called making INFERENCES.
A population is the total collection of elements about which we wish to make some inferences or
want to know some characteristics. A population element is the subject on which the measurement is
being taken. Examples of populations are hospital patients, road accidents, pet owners, unoccupied
property or bridges. It is usually far too expensive and too time consuming to collect information
from every member of the population, exceptions being the General Election and The Census, so
instead we collect it from a sample.
Bias as a statistical term means systematic error. To say that you want an unbiased sample may sound like you're
trying to get a sample that is error free. As appealing as this notion may be, it is impossible to achieve!
Error always occurs, even when using the most unbiased sampling techniques. One source of error is
the act of sampling itself.
The major objective of sampling theory is to provide accurate estimates of unknown parameters from
sample statistics that can be easily calculated.
The list from which the respondents are drawn is referred to as the sampling frame or working
population. It includes lists that are available or that are constructed from different sources
specifically for the study. Directories, membership or customer lists, even invoices or credit card
receipts can serve as a sampling frame. However, comprehensiveness, accuracy, currency, and
duplication are all factors that must be considered when determining whether there are any potential
sampling frame errors. For instance, if reservations and payments for certain business travelers are made
by their companies without specifying the actual guest name, these travelers would not be included if the
sampling frame is the hotel’s guest list. This could lead to underrepresentation of business
travelers.
The population we want to know about is called the target population, as it is the one we are
interested in and targeting. Identifying the target population is not always as easy as it might appear,
and once identified there are many practical difficulties. If your target population is cat owners, how
do you find a list of them? If it is to be of any use, the sample must represent the whole of the
population we are interested in, and not be biased in any way. This is where the skill in sampling lies: in
choosing a sample that will be as representative as possible. As a general rule the larger the sample,
the better it is for estimating characteristics of the population.
The sampling frame is the complete listing of population elements from which the sample is actually
drawn. As a practical matter, however, the sampling frame often differs from the theoretical
population. The list normally contains errors and omissions.

5.2. The Need for Sampling and Characteristics of a Good Sample Design

Why Sample?
There are different reasons for taking a sample from a population rather than considering the whole
population in a given investigation (census).
Economic advantage (Cost factor): The economic advantage of using a sample in research is obvious.
Taking a sample requires fewer resources than a census.
Time factor: Sampling helps to collect vital information more quickly. The speed of execution reduces
the time between the recognition of a need for information and the availability of that information. The
modern world is highly dynamic. Therefore, any study must be completed in a short time; otherwise, by
the time the survey is completed, the situations, characteristics, or conditions might have changed. There
should be timely delivery of information for decision-making.
The experiment may be destructive: Sometimes "measuring" or "testing" something destroys it. The
government requires automakers who want to sell cars to demonstrate that their cars can survive certain
crash tests. Obviously, the company can't be expected to crash every car to see if it survives! So the
company crashes only a sample of cars.
Samples may result in higher quality: The results obtained by sampling often are almost as accurate as
and sometimes even more accurate than those obtained from a census. This is mainly because trained and
experienced investigators generally conduct the entire work in a sample survey. Sampling offers the
possibility of better interviewing (testing), more thorough investigation of missing, wrong or suspicious
information, better supervision, and better processing than is possible with complete coverage.
Detailed information: More detailed information can be obtained through a sample survey, because the
smaller volume of data is manageable enough to allow detailed analyses.
Inaccessibility: There are some populations that are so difficult to get access to that only a sample can be
used. Examples include people in prison, crashed airplanes in the deep seas, and presidents. The inaccessibility
may be economic or time related. Sometimes it might be the case that not all units in the population can
be identified, such as all the air molecules in a basin. So to measure air pollution, you take a sample of air
molecules. Also, even if all those air molecules could be identified, it would be too expensive and too
time consuming to measure them all.
Nevertheless, the advantages of sampling over census studies are less compelling when the population
is small and the variability is high. In such cases a census is called for. A CENSUS is feasible when
the population is small and necessary when the elements are quite different from each other (high
variability).
What is a good sample?
The ultimate test of a sample design is how well it represents the characteristics of the population it
purports to represent. In measurement terms, the sample must be valid. Validity of a sample depends
on two considerations: accuracy and precision. Accuracy refers to the magnitude of non-sampling
(systematic) error, while precision refers to the extent to which sample statistics represent population
parameters, that is, the degree of sampling error.
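Precision, in the sense of sampling error, can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the population (the numbers 1 to 10,000), the function name spread_of_sample_means, the number of repetitions, and the seed are all hypothetical choices, not part of the original notes. It draws many simple random samples of a given size and measures how much the sample mean varies from one sample to the next.

```python
import random
import statistics

def spread_of_sample_means(population, n, repetitions=200, seed=0):
    """Standard deviation of the sample mean over repeated random samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repetitions)]
    return statistics.stdev(means)

# Hypothetical population: the values 1..10,000.
population = list(range(1, 10001))

small = spread_of_sample_means(population, 25)    # samples of 25 elements
large = spread_of_sample_means(population, 400)   # samples of 400 elements
print("spread with n=25: ", round(small, 1))
print("spread with n=400:", round(large, 1))
```

The spread for the larger sample size should come out clearly smaller, which is what higher precision means here. The simulation says nothing about accuracy: a biased sampling frame would shift every sample mean in the same direction regardless of sample size.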

5.3. Types of Sampling techniques

There are several decisions to be made in securing a sample. An investigator should consider the
following points in the process of sampling design:
What is the relevant population?
Sometimes there may be confusion whether the population consists of individuals, households, or
families, or a combination of these. If a study concerns income, then the definition of the population
element as individual or household can make quite a difference. Good operational definitions are
critical at this point.
What are the parameters of interest?
What is the sampling frame?
What size of sample is needed?
How much will it cost?
There are two bases on which sampling approaches may be classified. These are representation and
element selection. According to representation, sampling is classified as Probability and
Non-Probability Sampling. With respect to element selection, on the other hand, we have Restricted and
Unrestricted samples.

Probability (Random) Sampling                 Non-Probability (Non-Random) Sampling
Allows use of statistics, tests hypotheses    Exploratory research, generates hypotheses
Can estimate population parameters            Population parameters are not of interest
Guards against selection bias                 Adequacy of the sample can't be known
Must have random selection of units           Cheaper, easier, quicker to carry out

PROBABILITY SAMPLING
With probability sampling, all elements (e.g., persons, households) in the population have some
opportunity of being included in the sample, and the mathematical probability that any one of them
will be selected can be calculated.
Many strategies can be used to create a probability sample. Each starts with a sampling frame, which
can be thought of as a list of all elements in the population of interest (e.g., names of individuals,
telephone numbers, house addresses, census tracts). The sampling frame operationally defines the
target population from which the sample is drawn and to which the sample data will be generalized.
Random sampling techniques help ensure that bias is not introduced regarding who is included in the
survey. Five common random sampling techniques are:
Simple random sampling,
Systematic sampling,
Stratified sampling,
Cluster sampling, and
Multi-stage sampling
Simple Random Sampling (SRS)
A simple random sample is one in which each member (person) in the total population has an equal
chance of being picked for the sample. In addition, the selection of one member should in no way
influence the selection of another. Simple random sampling should be used with a homogeneous
population, that is, one composed of members who all possess the same attribute you are interested in
measuring. In identifying the population to be surveyed, homogeneity can be determined by asking the
question, “What is (are) the common characteristic(s) that are of interest?” These may include such
characteristics as age, sex, rank/grade, position, income, religious or political affiliation, etc. --
whatever you are interested in measuring.
The best way to choose a simple random sample is to use a random number table (or let a computer
generate a series of random numbers automatically). In either case, you would assign each member of
the population a unique number (or perhaps use a number already assigned to them such as house
number, telephone number, P.O.Box, etc.). The members of the population chosen for the sample will
be those whose numbers are identical to the ones extracted from the random number table (or
computer) in succession until the desired sample size is reached. Many statistical texts or mathematical
tables treat random number generation. A less rigorous procedure for determining randomness is to
write the name of each member of the population on a separate card, and with continuous mixing,
draw out cards until the sample size is reached.
Steps in SRS
Begin with a sampling frame = a list of every element in the population.
Find a Random Number Table or use Excel to generate random numbers. One can also use the
Lottery Method.
Pick the first number randomly.
Pick another number, and repeat the selection process until you have your full sample.
There are two desirable qualities associated with SRS:
EQUAL PROBABILITY = every element has an equal probability of inclusion (the real definition of
random).
INDEPENDENT SELECTION = Choosing one element first doesn't have any influence on what other
elements get chosen. A violation of this would be a matched pair sample, where you choose a
husband and wife together. Inclusion of the husband definitely affects choosing the wife.
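The steps above can be sketched in a few lines of Python, letting the computer play the role of the random number table. This is a minimal sketch, not a prescribed procedure: the function name simple_random_sample, the frame of 100 numbered members, and the seed are illustrative assumptions. It relies on random.sample from the standard library, which draws distinct elements so that every element of the frame has the same chance of inclusion.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each equally likely."""
    rng = random.Random(seed)      # pass a seed only to make a run repeatable
    return rng.sample(frame, n)    # selection without replacement

# Hypothetical sampling frame: population members numbered 1..100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sorted(sample))
```

Leaving seed as None gives a different sample on each run, which is what a real survey would want; the fixed seed is only there so the example can be reproduced.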
Systematic Sampling
Under systematic sampling only the first unit of the sample is selected at random and the remaining
units are selected at fixed intervals.
Systematic sampling is especially applicable when the population to be studied is arranged in time. It
is often used in industry, where an item is selected for testing from a production line (say, every fifteen
minutes) to ensure that machines and equipment are working to specification.
This technique could also be used when questioning people in a sample survey. A market researcher
might select every 10th person who enters a particular store, after selecting a person at random as a
starting point; or interview occupants of every 5th house in a street, after selecting a house at random
as a starting point.
Steps in Systematic Sampling
Begin with a numbered sampling frame.
Choose your sampling interval = number in population divided by number desired in sample, or I = N/n.
If a systematic sample of 500 students were to be carried out in a university with an enrolled population
of 10,000, the sampling interval would be: I = N/n = 10,000/500 = 20. If I is not a whole number, then it is
rounded up.
Choose a random starting number between 1 and I.
Select the element that corresponds to the random number. Then, instead of picking a second random
number, count out the interval (I) and choose that element, and so on. When you get to the end of the list, go
back to the beginning until you have your full sample.
For example, to select a sample of 25 dorm rooms in your college dorm, make a list of all the room
numbers in the dorm. Say there are 100 rooms. Divide the total number of rooms (100) by the number
of rooms you want in the sample (25). The answer is 4. This means that you are going to select every
fourth dorm room from the list. But you must first consult a table of random numbers. Pick any point
on the table, and read across or down until you come to a number between 1 and 4. This is your
random starting point. Say your random starting point is "3". This means you select dorm room 3 as
your first room, and then every fourth room down the list (3, 7, 11, 15, 19, etc.) until you have 25
rooms selected.
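The dorm room example can be reproduced with a short Python sketch. The function name systematic_sample and its parameters are illustrative assumptions, and the sketch assumes that N is a whole multiple of n, as in the example, so it does not handle rounding the interval or wrapping around the list.

```python
import random

def systematic_sample(frame, n, start=None, seed=None):
    """Take every I-th element of the frame after a random start, where I = N/n."""
    N = len(frame)
    if N % n != 0:
        raise ValueError("this sketch assumes N is a whole multiple of n")
    interval = N // n
    if start is None:
        # Random starting point between 1 and the interval, inclusive.
        start = random.Random(seed).randint(1, interval)
    # Positions are counted from 1, list indices from 0.
    return [frame[start - 1 + k * interval] for k in range(n)]

rooms = list(range(1, 101))                      # 100 dorm rooms, numbered 1..100
picked = systematic_sample(rooms, 25, start=3)   # starting point 3, as in the example
print(picked[:5])    # [3, 7, 11, 15, 19]
print(len(picked))   # 25
```

Only the starting point is random here; once it is fixed, the whole sample is determined, which is why the order of the list matters in systematic sampling.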
Stratified Sampling
A general problem with random sampling is that you could, by chance, miss out a particular group in
the sample. However, if you form the population into groups, and sample from each group, you can
make sure the sample is representative.
A stratified random sample is defined as a combination of independent samples selected in proper
proportions from homogeneous groups within a heterogeneous population. The procedure calls for
categorizing the heterogeneous population into groups that are homogeneous by themselves. If one
group is proportionally larger than another, its sample size should also be proportionally larger. The
number of groups to be considered is determined by the characteristics of the population. Many times
the survey plan will determine some or all of the groups. For example, if you are comparing enlisted
and officer segments on your base, each of these will be a separate group.
In stratified sampling, the population is divided into groups called strata. A sample is then drawn from
within these strata. In ideal stratification, groups (strata) are assumed to be internally homogeneous and
externally heterogeneous.
Stratified random sampling requires a detailed knowledge of the distribution of attributes or
characteristics of interest in the population to determine the homogeneous groups that lie within it. A
stratified random sample is superior to a simple random sample since the population is divided into
smaller homogeneous groups before sampling, and this yields less variation within the sample. This
makes possible the desired degree of accuracy with a smaller sample size.
But, if you cannot accurately identify the homogeneous groups, you are better off using the simple
random sample since improper stratification can lead to serious error.
Stratified sampling techniques are generally used when the population is heterogeneous, or
dissimilar, where certain homogeneous, or similar, sub-populations can be isolated (strata). Simple
random sampling is most appropriate when the entire population from which the sample is taken is
homogeneous.
Primary purpose of stratified sampling method is to increase the representativeness of the sample
without increasing the size of the sample on the basis of having greater knowledge of the population
characteristics.
Steps in Stratified RS:
Divide the population into k non-overlapping groups (strata), so that N1 + N2 + N3 + ... + Nk = N.
Then select the same fraction n/N of the elements in each stratum, which is called Proportionate
Sampling, where ni = (n/N)Ni for stratum i.
Each stratum is properly represented so that the sample proportion is equal to the population proportion, and
n1 + n2 + ... + nk = n.
As an alternative to proportionate sampling, one may consider disproportionate sampling, in which
the sample size taken from each stratum is based on judgment rather than on the stratum's share of
the population. Under disproportionate stratified sampling, we take a larger sample from a given
stratum if the stratum is large, is more variable internally, and is cheaper to sample.
Example:
Suppose you wanted to find out the attitudes of students on your campus about immigration.
You may want to be sure to sample students who are from every region of the country as well as
foreign students. Say your student body of 10,000 is made up of 5,000 - West; 1,000 - East;
2,000 - Midwest; 1,500 - South; 500 - Foreign. If you want to select a sample of 200
students, how many students will be selected from each group?
With proportionate allocation, each sub-group appears in the sample in proportion to its
presence in the total student body, so the pooled findings can be generalized to the total student
population directly. If the allocation were disproportionate, you would have to apply weights to
the findings for each sub-group, proportional to its presence in the total student body.
Given: N = 10,000, n = 200, N1 = 5,000, N2 = 1,000, N3 = 2,000, N4 = 1,500, N5 = 500
Sample proportion to be selected from each stratum is n/N = 200/10,000 = 0.02.
Thus, n1 = 0.02(5,000) = 100, n2 = 0.02(1,000) = 20, n3 = 0.02(2,000) = 40,
n4 = 0.02(1,500) = 30, n5 = 0.02(500) =10.
And 100+20+40+30+10 = 200
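The same allocation can be checked with a short Python sketch of the formula ni = (n/N)Ni. The function name proportionate_allocation and the dictionary layout are illustrative assumptions; the stratum sizes and the sample size of 200 come from the example.

```python
def proportionate_allocation(strata_sizes, n):
    """Allocate a total sample of n across strata in proportion to stratum size."""
    N = sum(strata_sizes.values())
    # ni = (n / N) * Ni, rounded to a whole number of elements.
    return {name: round(n * size / N) for name, size in strata_sizes.items()}

students = {"West": 5000, "East": 1000, "Midwest": 2000, "South": 1500, "Foreign": 500}
allocation = proportionate_allocation(students, 200)
print(allocation)                 # {'West': 100, 'East': 20, 'Midwest': 40, 'South': 30, 'Foreign': 10}
print(sum(allocation.values()))   # 200
```

In this example every ni is already a whole number. When it is not, rounding can make the stratum samples add up to slightly more or less than n, and the sketch does not correct for that.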
Cluster Sampling
Cluster sampling divides the population into groups, or clusters. These clusters are internally
heterogeneous and externally homogeneous. In other words, any two clusters are assumed to be similar
while individual elements within a given cluster are different. Within each selected cluster, units are
then chosen by simple random sampling or some other method. Ideally the clusters chosen should be dissimilar so
that the sample is as representative of the population as possible.
Cluster sampling views the units in a population not only as members of the total population but
also as members of naturally occurring clusters within the population. For example, city residents
are also residents of neighborhoods, blocks, and housing structures. Other examples of clusters may be
factories, schools and geographic areas such as electoral sub-divisions. The selected clusters are then
used to represent the population.
The basic premise in cluster sampling is that each cluster will be a prototype of the population. Hence,
analysis conducted on one cluster will reflect the attributes of the whole population. The question,
however, is to what extent this assumption is practical, especially in the context of a business
environment. This is why the statistical efficiency of cluster sampling is said to be poor compared to
other sampling options.
Two conditions foster the use of cluster sampling: economic efficiency and practical unavailability of
sampling frame.
Cluster designs where the primary sampling unit represents a cluster of units based on geographic area
are called Area Sampling.
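A one-stage version of cluster sampling can be sketched as follows. This is only an illustration under assumptions: the function name cluster_sample, the neighborhood names, and the household labels are hypothetical, and the sketch keeps every unit in each selected cluster rather than sub-sampling within clusters.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Select m whole clusters at random and keep every unit inside them."""
    rng = random.Random(seed)
    # sorted() turns the cluster names into a list in a stable order.
    chosen = rng.sample(sorted(clusters), m)
    units = [unit for name in chosen for unit in clusters[name]]
    return chosen, units

# Hypothetical clusters: neighborhoods and the households in each.
neighborhoods = {
    "North": ["N1", "N2", "N3"],
    "South": ["S1", "S2"],
    "East": ["E1", "E2", "E3", "E4"],
    "West": ["W1", "W2", "W3"],
}
chosen, households = cluster_sample(neighborhoods, 2, seed=1)
print(chosen)
print(households)
```

Only the list of clusters is needed to draw the sample, not a list of every household in the city, which is the practical appeal when a full sampling frame is unavailable.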
Multistage Sampling
Sometimes the population is too large and scattered for it to be practical to make a list of the entire
population from which to draw a SRS. For instance, when a polling organization samples US
voters, they do not do a SRS. Since voter lists are compiled by counties, they might first do a sample
of the counties and then sample within the selected counties. This illustrates two stages. In some
instances, they might use even more stages. At each stage, they might do a stratified random sample on
sex, race, income level, or any other useful variable on which they could get information before
sampling.
In a multistage random sample, a large area, such as a country, is first divided into smaller regions
(such as sub cities), and a random sample of these regions is collected. In the second stage, a random
sample of smaller areas (such as weredas) is taken from within each of the regions chosen in the first
stage. Then, in the third stage, a random sample of even smaller areas (such as neighborhoods) is taken
from within each of the areas chosen in the second stage. If these areas are sufficiently small for the
purposes of the study, then the researcher might stop at the third stage. If not, he or she may continue
to sample from the areas chosen in the third stage, etc., until appropriately small areas have been
chosen.
Multi-stage sampling is like cluster sampling, but involves selecting a sample within each chosen
cluster, rather than including all units in the cluster. Thus, multi-stage sampling involves selecting a
sample in at least two stages. In the first stage, large groups or clusters are selected. These clusters are
designed to contain more population units than are required for the final sample. In the second stage,
population units are chosen from selected clusters to derive a final sample. If more than two stages are
used, the process of choosing population units within clusters continues until the final sample is
achieved.
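A two-stage design can be sketched in Python as follows. The function name two_stage_sample, the region names, and the household labels are hypothetical, and the sketch assumes simple random sampling at both stages; a real design might stratify at either stage, as described above.

```python
import random

def two_stage_sample(clusters, m, k, seed=None):
    """Stage 1: pick m clusters at random. Stage 2: pick up to k units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)               # first-stage selection
    return {
        name: rng.sample(clusters[name], min(k, len(clusters[name])))
        for name in chosen                                 # second-stage selection
    }

# Hypothetical first-stage units (regions) with their households.
regions = {
    "Region A": ["A1", "A2", "A3", "A4", "A5"],
    "Region B": ["B1", "B2", "B3", "B4"],
    "Region C": ["C1", "C2", "C3", "C4", "C5", "C6"],
    "Region D": ["D1", "D2", "D3"],
}
selected = two_stage_sample(regions, m=2, k=3, seed=7)
for region, units in selected.items():
    print(region, units)
```

Adding a third stage would mean repeating the second step on smaller areas inside each chosen unit until the areas are small enough for the study.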

NON-PROBABILITY SAMPLING
Unlike the case of probability sampling, in non-probability sampling the probability that an elementary
unit in the population will be included in the sample is unknown. It is not predetermined. Instead of
an objective approach, we follow subjective approaches. Individual elementary units are selected based
not on chance but on personal intuition, feeling, judgment, etc.
With non-probability sampling, not every unit has a chance of selection in the sample and the process
involves some amount of subjectivity instead of following predetermined, probabilistic pathways. This
can be useful in small scale exploratory studies where we wish to gain great familiarity with the
population rather than to reach statistical solutions.
Samples are selected by the discretion of the researcher. They are often quick and cheap to create,
even if they usually are less representative than random ones.
Convenience Sampling
These are the ones like "man on the street interviews," or whoever walks by. The researcher selects
units that are convenient, close at hand, easy to reach, etc. Convenience sampling means that members
of such samples are chosen mainly because they are readily available and willing to be involved -
hence there is a saving of time and money.
Such samples might not be representative of the population and so it might be difficult to make
conclusions about a population based on this type of sample. If your sample is made up of volunteers,
then it is likely to be biased because the volunteers may be actively supporting/promoting a point of
view.
Examples of convenience samples include selecting:
the first ten cars to enter a car park,
the first ten people to walk through a turnstile at a sporting event,
females in the first row of a concert.
Purposive Sampling
Purposive sampling is a non-probability sampling approach that conforms to certain criteria. There are
two types of purposive sampling, quota sampling and judgment sampling.
Quota Sampling
A Quota is a sample size for a sub-group. It is sometimes useful to establish quotas to ensure that your
sample accurately reflects relevant sub-groups in your target population. For example, men and
women have somewhat different opinions in many areas. If you want your survey to accurately reflect
the general population's opinions, you will want to ensure that the percentages of men and women in
your sample reflect their percentages in the general population.
In quota sampling the selection of the sample is made by the interviewer, who has been given quotas to
fill from specified sub-groups of the population. For example, an interviewer may be told to sample
50 females between the ages of 45 and 60.
Quota sampling is a type of stratified sampling in which selection within the strata is non-random. The
researcher constructs quotas for different types of units. For example, the researcher might interview a
fixed number of shoppers at a mall, half of whom are male and half of whom are female.
Market and opinion researchers often use quota sampling. Its main advantages are that it is less costly
and easier to administer than many other methods.
The main argument against quota sampling, as already explained, is that it does not meet the basic
requirement of randomness. Some units may have no chance of selection, or the chance of selection
may be unknown. Therefore, the sample may be biased.
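The mechanics of filling quotas can be sketched as follows. The function name quota_sample, the shopper records, and the quota of two per group are hypothetical. The sketch accepts people in the order they happen to arrive until each quota is full, which makes the non-random selection within each sub-group explicit.

```python
def quota_sample(stream, quotas, group_of):
    """Accept people in arrival order until every sub-group quota is filled."""
    remaining = dict(quotas)            # copy so the caller's quotas are untouched
    sample = []
    for person in stream:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):  # all quotas filled
            break
    return sample

# Hypothetical shoppers in the order they pass the interviewer: (id, sex).
shoppers = [(1, "female"), (2, "female"), (3, "female"), (4, "male"),
            (5, "female"), (6, "male"), (7, "male")]
picked = quota_sample(shoppers, {"male": 2, "female": 2}, group_of=lambda s: s[1])
print(picked)   # [(1, 'female'), (2, 'female'), (4, 'male'), (6, 'male')]
```

Shoppers 3, 5 and 7 never had a chance of selection once their group's quota was full or the sample was complete, which illustrates why the selection probabilities in quota sampling are unknown.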
Judgmental Sampling
The procedure is simply to ask an expert on the issue being investigated to define the members that
should comprise the sample. The representativeness of the sample is determined solely by the
judgment of the researcher. Since not every member of the population has an equal chance of
being chosen, a judgment sample is also a nonrandom sampling method. A judgment sample should
therefore not be used where statistical inference about the population is required, because the
sample does not meet the criterion of randomness.
Snowball Sampling
In snowball sampling, you begin by identifying someone who meets the criteria for inclusion in your
study. You then ask them to recommend others who they may know who also meet the criteria.
Although this method would hardly lead to representative samples, there are times when it may be the
best method available. Snowball sampling is especially useful when you are trying to reach
populations that are inaccessible or hard to find. For instance, if you are studying the homeless, you
are not likely to be able to find good lists of homeless people within a specific geographical area.
However, if you go to that area and identify one or two, you may find that they know very well who
the other homeless people in their vicinity are and how you can find them.
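The referral process can be sketched as a walk through a network of acquaintances. Everything named here is an illustrative assumption: the function snowball_sample, the referral table, and the people labelled A to E are hypothetical, and the sketch assumes that everyone who is referred meets the criteria and agrees to take part.

```python
from collections import deque

def snowball_sample(seeds, referrals, max_size):
    """Start from seed participants and follow referrals until max_size is reached."""
    sample = []
    seen = set(seeds)       # people already recruited or waiting to be contacted
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for other in referrals.get(person, []):
            if other not in seen:   # do not recruit the same person twice
                seen.add(other)
                queue.append(other)
    return sample

# Hypothetical referral table: who each participant says they know.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["A"]}
print(snowball_sample(["A"], referrals, max_size=4))   # ['A', 'B', 'C', 'D']
```

Who ends up in the sample depends entirely on the starting person and their contacts, which is why such samples are rarely representative even when they are the only practical option.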
