Chapter 1 - Introduction To Sampling Techniques
Population:
Population is an aggregate of all units about which we are interested according to some
predetermined objective and are available in a specified area at a specified period of time.
For example, if the predetermined objective is to investigate the socioeconomic condition
of child laborers on the JU campus, then all the child laborers who are working on the
Jahangirnagar University campus constitute the population.
Finite population:
If a population has a definite number of units, then it is called a finite population. For
example, if the predetermined objective is to estimate the total number of milking cows
of our country during the period 2005, then each milking cow of our country during that
period is a unit and all the milking cows of this country during this period constitute a
finite population because the number of units is finite.
Infinite population:
If a population has an indefinite or uncountable number of units, then it is called an infinite
population. For example, if the predetermined objective is to estimate the number of
customers entering a shopping center in every two-hour period, then each customer who can
enter the shopping center in that time period is a unit, and all the customers who can
enter the shopping center in that time period constitute an infinite population because
the number of such customers is treated as uncountable or infinite.
Target population and sampled population:
A target population is the entire group about which information is desired and conclusions
are to be drawn. The population that we actually sample is the sampled population. It is
also called the survey population. The sampled population is usually more restricted than the
target population. The chief reasons for the difference between the target population and the
sampled population are non-response and non-coverage.
Sample:
A sample is a representative part or subset of the population. For example, suppose
that a tire manufacturer has developed a new tire designed to provide an increase in mileage
over the firm’s current tires. To estimate the mean number of miles provided by the new tires,
the manufacturer selected a sample of 120 tires for testing. The test results provided a sample
mean of 36,500 miles. Hence, the estimate of the mean tire mileage for the population of
new tires was 36,500 miles.
Random sample:
Any sample selected by a chance mechanism with a known chance of selection is called a
random sample. The chances of selection need not be equal for all samples, so long as
they are known. A random sample is free from selection bias.
Sampling technique:
A sampling technique is a scientific process of selecting a sample from a population; it
may also embrace the derivation of estimates from the sample and of inferences about that
population.
Sampling unit:
A sampling unit, or simply unit, is a well-defined, distinct and identifiable element or
group of elements on which observations can be made. For a household survey, the
housing apartments or families may constitute the sampling units.
Unit of inquiry:
A unit of inquiry is the unit about which information is required. It may or may not be the
same as the sampling unit. We may select a sample of households and obtain information
about household members. Here households serve as sampling unit and the household
members serve as the unit of inquiry.
Sampling frame:
A sampling frame is a complete list of all units or groups of units of the population to be
sampled, organized and arranged in such a manner that every unit occurs once and only
once in the list and no unit is excluded from the list. In sampling problems, we encounter
two types of sampling frames: area sampling frames and list sampling frames.
Area sampling frames are usually used to sample geographical areas. With this technique,
each element of the population is associated with a particular geographical area
constituted by a group of people or households. In this case, a sample of areas is drawn
and either all elements or a part of them in the selected areas are included in the survey.
A list sampling frame is a complete list of well-defined reporting units. The list should
contain relevant information about individual units, which will enable efficient sampling.
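The "once and only once" requirement on a list frame can be checked mechanically. The following is a minimal sketch, assuming an independent reference list of population units is available (in practice it often is not); the function name check_frame and the household identifiers are hypothetical.

```python
from collections import Counter

def check_frame(frame_ids, population_ids):
    """Return (units listed more than once, units missing from the frame)."""
    counts = Counter(frame_ids)
    duplicates = sorted(u for u, c in counts.items() if c > 1)
    missing = sorted(set(population_ids) - set(counts))
    return duplicates, missing

# Hypothetical household identifiers, invented for illustration only.
population = ["H01", "H02", "H03", "H04", "H05"]
frame = ["H01", "H02", "H02", "H04", "H05"]

duplicates, missing = check_frame(frame, population)
print(duplicates)  # ['H02']  (listed twice)
print(missing)     # ['H03']  (excluded from the frame)
```

A frame passes the check only when both returned lists are empty.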
Parameter, Statistic, Estimator and Estimate:
A population characteristic is called a parameter. A sample characteristic is called a statistic.
It is a function of observable random variables and is therefore itself a random variable. If it is
used to estimate an unknown parameter, then it is called an estimator. A particular value
of the estimator is called an estimate. For example, the population mean $\mu$, population
variance $\sigma^2$, population standard deviation $\sigma$, population proportion $p$, etc. are
parameters. The sample mean $\bar{X}$, sample variance $s^2$, sample standard deviation $s$,
sample proportion $\hat{p}$, etc. are statistics as well as estimators.
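To make the distinction concrete, the sketch below computes the usual statistics from one sample; each printed value is an estimate of the corresponding parameter. The mileage figures are hypothetical and are not the data of the tire example, and the 36,000-mile threshold used for the proportion is likewise an assumption.

```python
import statistics

# Hypothetical sample of tire mileages (in miles), invented for illustration.
sample = [36100, 36900, 35800, 37200, 36500]

x_bar = statistics.mean(sample)      # sample mean, estimates the population mean
s2 = statistics.variance(sample)     # sample variance (divisor n - 1)
s = statistics.stdev(sample)         # sample standard deviation

# Sample proportion of tires lasting more than 36,000 miles.
p_hat = sum(x > 36000 for x in sample) / len(sample)

print(x_bar, s2, round(s, 2), p_hat)
```

A different sample from the same population would give different values of these statistics, which is why each statistic is itself a random variable.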
Non probability sampling:
Non probability sampling is a non random and subjective method of sampling where the
selection of the population elements comprising the sample depends on the personal
judgment or discretion of the sampler.
The distinguishing feature of non probability sampling is that in such sampling, the
selection of population elements is not made through any probability mechanism and
because of this, the investigator cannot claim that his or her sample is representative of
the population. This greatly limits the investigator’s ability to generalize the findings
beyond the specific sample studied. Further, no confidence interval estimation is possible
for non probability sampling. A variety of non probability sampling designs are used. Some
of the most widely used non probability sampling designs are convenience sampling,
accidental sampling, purposive sampling (judgment and quota), snowball sampling, etc.
Convenience sampling:
Non probability samples that are unrestricted are known as convenience samples.
Researchers or field workers have the freedom to choose whomever they find, thus the
name convenience. The convenience sample may consist of respondents living in an
easily accessible locality. Undoubtedly, it is the simplest and least reliable form of non
probability sampling. Its primary virtue is its low cost.
Although convenience sampling offers no control over precision, this method is quite
frequently used, especially in market research and public opinion surveys. It is
used because probability sampling is often a time-consuming and expensive procedure
and, in fact, may not be feasible in many situations. In the early stages of exploratory
research, when one is seeking guidance, convenience sampling is recommended.
Accidental sampling:
An accidental sample is one in which the cases selected are whatever cases happen to be
immediately available. In such sampling, individuals are selected as
they appear in a process. If, for example, diabetic patients are chosen as they arrive in a
queue in front of a hospital counter, the resulting sample is an accidental
sample.
Purposive sampling:
A non-probability sampling method that conforms to certain criteria is called purposive
sampling. There are two major types of purposive sampling, which are judgment
sampling and quota sampling.
Judgment sampling:
Judgment sampling, or expert choice, is one in which the cases are included for
investigation through a planned selection procedure. In this case, individuals are selected
who are considered to be most representative of the population as a whole. It is called
judgment sampling because the choice of the individual units depends entirely on the
sampler, who decides, on his or her own judgment, which units conform to some criteria
and should be selected.
For example, in a study of labor problems, one may decide to talk only with those who have
experienced discrimination on the job. Similarly, election results may be predicted from
only a few selected persons because of their predictive records in past elections.
Quota sampling:
Quota sampling is a non probability sampling method, analogous to stratified sampling, in
which the interviewers are told to contact and interview a certain number of individuals
from certain sub groups or strata of the population to make up the total sample.
In this method, individuals are not pre-selected at all. Once the strata are formed
(usually based on sex, age, social status, region of residence, etc.), the general breakdown of
the sample is decided (that is, how many persons in each sex category, each age group or
each social class are to be included) and quota assignments are allocated to the
interviewers. The selection of the individuals to be interviewed within the strata
is then left to the interviewers. The factors (sex, age, social status, region of
residence, etc.) that are used to form the strata are termed
quota controls.
This technique is widely used by market researchers, political opinion seekers and many
others to avoid the cost problems of interviewing a pre-selected sample of individuals.
The term quota arises from the fact that in this method, the interviewers are given quotas
of certain sub-groups (strata) of the population at the very outset to build a sample
roughly proportional to the population. That is, quotas of the desired number of sample cases
are computed in proportion to the population sub groups. The sample quotas are divided
among the interviewers, who then do their best to choose persons who fit the restrictions
of their quota controls.
For example, if it is known that one-third of the population lives in urban areas and two-
thirds in rural areas, the sample can be selected purposively from urban and rural areas in
the same proportion. Thus, a total of 300 respondents would mean 100 urban residents
and 200 rural residents to be included in the sample.
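The proportional allocation in this example can be sketched as a short function. This is only an illustration: the function name proportional_quotas and the largest-remainder rounding rule (used so that the quotas always add up to the total) are assumptions, not part of the method as described above.

```python
from fractions import Fraction

def proportional_quotas(total, shares):
    """Split a total sample size across strata in proportion to population shares."""
    raw = {name: total * share for name, share in shares.items()}
    quotas = {name: int(value) for name, value in raw.items()}  # whole parts
    leftover = total - sum(quotas.values())
    # Hand any remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda name: raw[name] - quotas[name], reverse=True)
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    return quotas

# Shares taken from the example: one-third urban, two-thirds rural.
shares = {"urban": Fraction(1, 3), "rural": Fraction(2, 3)}
quotas = proportional_quotas(300, shares)
print(quotas)  # {'urban': 100, 'rural': 200}
```

Exact fractions are used for the shares so that one-third of 300 is computed without floating-point rounding.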
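Note that quota sampling may be considered analogous to stratified sampling with the
added requirement that each stratum is generally represented in the sample in the same
proportion as in the entire population; unlike stratified random sampling, however, the
selection within strata is not made by a probability mechanism.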
The essential difference between a probability sample and quota sample is that with the
former, interviewers are required to interview specified (pre-selected) persons selected by
a probability mechanism, while with the latter they have to complete their quotas in
whatever way they choose.
It is a common practice, although not necessarily mandatory, for quota samples, to adopt
random selection at the initial stages of selection in exactly the same way as probability
samples. In that case, the difference between a probability sample and a quota sample
lies in the selection of the final sampling units, say individuals.
Advantages of quota sampling:
(1) Its cost per element is lower than that of a probability sample.
(2) It is easier to administer and can be executed more quickly than a probability
sample.
(3) It can always achieve its intended sample size in each stratum, whereas with a pre-
selected random sample, there will always be some selected individuals who cannot be
found at home or who have migrated elsewhere or who refuse to co-operate, resulting in
increased non-response.
Disadvantages of quota sampling:
(1) The choice of subjects is left to field workers to make on a judgment basis and thus it
suffers from selection bias.
(2) Since the procedure for selecting the sample is ill-defined, there is no valid method of
estimating the standard error of a sample estimator.
Snowball sampling:
Snowball sampling is a non probability sampling method in which persons initially chosen for
the sample are used as informants to locate, through a referral network, other persons
having the characteristics that make them eligible for the sample.
It is the colorful name for a technique of building up a list or sample of a special
population. Some recent authors have referred to snowball sampling as chain referral
sampling. It has achieved increased use in recent years in situations where respondents
are difficult to identify and are best located by using an initial set of members of the
population as informants through a referral network approach.
For example, consider the selection of beggars for which no frame is available. This can
be best done by asking an initial group of beggars to supply the names of other beggars
they come across. Selection of mosque imams or sex workers can also be made
following this network approach, since members of these populations may well know each
other, particularly in small areas.
Although snowball sampling is generally considered non probability sampling, strategies
have been developed to draw snowball samples through a probabilistic approach, which
allows the computation of sampling errors and the use of statistical tests of significance. If one
wishes the snowball sample to be probabilistic, one should sample randomly within each
stage.
Snowball sampling, whether probabilistic or non probabilistic, is conducted in stages. In
the first stage, a few persons possessing the requisite characteristic are identified and
interviewed. These persons are used as informants to identify others who qualify for
inclusion in the sample. The second stage involves interviewing these persons and so on.
The term snowball stems from the analogy of a snowball, which begins small but
becomes bigger and bigger as it rolls downhill. Snowball sampling has been particularly
used to study drug cultures, heroin addiction, teenage gang activities and other issues
where respondents may not be readily visible or are difficult to identify and contact.
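The stage-by-stage procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the referral network is given as a dictionary, and the function name snowball_sample and its per_stage option (random selection among the persons named at each stage, as the probabilistic variant suggests) are hypothetical.

```python
import random

def snowball_sample(referrals, seeds, stages, per_stage=None, rng=None):
    """Build a sample in stages through a referral network.

    referrals: dict mapping each person to the persons they can name.
    seeds: the initial informants (first stage).
    per_stage: if given, at most this many newly named persons are drawn
               at random in each later stage; otherwise all are included.
    """
    rng = rng or random.Random()
    sampled = list(seeds)
    seen = set(seeds)
    current = list(seeds)
    for _ in range(stages - 1):
        # Persons named by the current stage who are not yet in the sample.
        named = []
        for person in current:
            for contact in referrals.get(person, []):
                if contact not in seen and contact not in named:
                    named.append(contact)
        if per_stage is not None and len(named) > per_stage:
            named = rng.sample(named, per_stage)
        if not named:
            break  # the referral chain has dried up
        seen.update(named)
        sampled.extend(named)
        current = named
    return sampled

# Hypothetical referral network, invented for illustration only.
network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
result = snowball_sample(network, seeds=["A"], stages=3)
print(result)  # ['A', 'B', 'C', 'D', 'E']
```

Starting from a single informant, the sample grows from one person to three and then to five over three stages, which is the snowball effect the name refers to.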
Advantages of non probability sampling:
(1) It may meet the sampling objectives satisfactorily in some instances. It may be
perfectly adequate if the researcher has no desire to generalize his or her findings beyond
the sample or if the study is a trial run for a longer study to be attempted at a later date.
(2) In situations where a truly representative probability sample would be too complicated,
time-consuming or expensive, or would call for more planning and repeated callbacks, a
carefully controlled non probability sample remains the only viable means of collecting data.
(3) Sometimes it is impossible or too expensive to find enough cases of a particular type
by using probability sampling. In such cases, non probability sampling may be the only
feasible alternative.
(4) Since it is less complicated and much less expensive, it may be executed on a
spur-of-the-moment basis to take advantage of available respondents without the statistical
complexity of probability sampling.
Disadvantages of non probability sampling:
(1) It offers no insight into the reliability of the resulting estimates and hence, no
generalization can be made regarding the population which is being sampled.
(2) The investigator cannot claim that his or her sample is representative of the
population, since the probability that a unit will be chosen is not known.
(3) The investigator is unable to estimate the degree of departure from representation
(sampling error).
(4) No sampling theory can be developed out of a non probability sample, since no
element of random selection is involved.
(5) No comparison of the results can be made for the non probability samples unless one
can find a situation in which the results are known, either for the whole population or for
a probability sample.
Probability sampling:
Probability sampling is the scientific method of selecting samples according to some laws
of chance in which each unit in the population has some definite pre-assigned probability
of being selected in the sample.
As a result, selection bias can be avoided and statistical theory can be
applied to derive the properties of the estimators. A probability sample is so designed that
statistical inference about the population can be based on the measures of variability
computed from the sample data. In addition, probability sampling allows us to construct a
confidence interval within which the true value of the population parameter is expected to
lie.
A good number of probability sampling designs are in use. Among the most widely used
are simple random sampling, stratified sampling, systematic sampling, cluster sampling,
multi-stage sampling, multi-phase sampling, probability proportional to size sampling,
etc.
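As an illustration of the confidence interval mentioned above, the sketch below draws a simple random sample without replacement and computes an approximate interval for the population mean. It assumes the usual normal approximation (z = 1.96 for roughly 95% confidence) and the standard finite population correction, neither of which is derived in this chapter; the function name and the population values are hypothetical.

```python
import math
import random
import statistics

def srs_mean_ci(population, n, z=1.96, rng=None):
    """Draw a simple random sample of size n without replacement and return
    (sample mean, lower limit, upper limit) of an approximate interval."""
    rng = rng or random.Random()
    N = len(population)
    sample = rng.sample(population, n)
    mean = statistics.mean(sample)
    s2 = statistics.variance(sample)
    # Estimated standard error with the finite population correction (1 - n/N).
    se = math.sqrt((1 - n / N) * s2 / n)
    return mean, mean - z * se, mean + z * se

# Hypothetical population of 1000 values, invented for illustration only.
population = list(range(1, 1001))
mean, lower, upper = srs_mean_ci(population, 100, rng=random.Random(1))
print(mean, lower, upper)
```

Because every unit has a known chance of selection, the standard error can be estimated from the sample itself; no comparable calculation is justified for a non probability sample.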
Systematic sampling:
Systematic sampling consists of selecting only the first unit at random, the rest being
automatically selected according to some predetermined pattern involving regular
spacing of units. Suppose that a sample of n units is to be selected from a population of
N units.
Let these units be numbered from 1 to N in some order. Let $N = nk$, where k is an
integer called the sampling interval. To select a sample of n units, choose a unit at random
from the first k units and every k-th unit thereafter. Thus, if the unit randomly selected
happens to be numbered r and the predetermined sampling interval is k, the sample will
consist of the units bearing numbers:
$r,\; r + k,\; r + 2k,\; \ldots,\; r + (n - 1)k$
For example, suppose that a population consists of 15 elements, numbered serially from
01 to 15 and that a random sample of 3 units is desired. To achieve this, select at random
one of the first 5 units, 01 to 05 and then every 5th unit in the sequence. If the first unit is
03, then the sample will consist of units 03, 08 and 13. If the first unit is 01, then the
sample will consist of units 01, 06 and 11. This procedure is termed linear systematic
sampling.
If $N \neq nk$, then a systematic sample will contain either n or n + 1 units depending on the
serial number of the first selected unit. Such a procedure is called non-linear systematic
sampling.
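The selection rule above can be sketched in a few lines. The function names are hypothetical, and the population of 17 units used for the case where N is not a multiple of k is an assumption added for illustration.

```python
import random

def systematic_from_start(N, k, r):
    """Units r, r + k, r + 2k, ... not exceeding N, for start r and interval k."""
    return list(range(r, N + 1, k))

def linear_systematic_sample(N, n, rng=None):
    """Select n of the units 1..N by linear systematic sampling (needs N = n*k)."""
    if N % n != 0:
        raise ValueError("this sketch of linear systematic sampling needs N = n*k")
    k = N // n                  # sampling interval
    rng = rng or random.Random()
    r = rng.randint(1, k)       # random start among the first k units
    return systematic_from_start(N, k, r)

# The example from the text: N = 15, n = 3, so k = 5.
print(systematic_from_start(15, 5, 3))  # [3, 8, 13]
print(systematic_from_start(15, 5, 1))  # [1, 6, 11]

# When N is not a multiple of k, the sample size depends on the start.
print(systematic_from_start(17, 5, 1))  # [1, 6, 11, 16]
print(systematic_from_start(17, 5, 3))  # [3, 8, 13]
```

Only the start r is random; once it is drawn, every other unit in the sample is determined by the interval k.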
Cluster sampling:
In random sampling, the population can be divided into a finite number of distinct and
identifiable units defined as sampling units. The smallest units into which the population
can be divided are called the elementary units or elements of the population. The groups
of such elementary units or elements, which are internally heterogeneous and externally
homogeneous with respect to the study variables, are known as clusters.
If we treat these clusters as sampling units and select only a sample of them and if all the
elementary units or elements in the selected clusters are included in the sample, then the
method is known as single-stage cluster sampling or simply, cluster sampling.
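Single-stage cluster sampling can be sketched as follows, assuming the clusters themselves are chosen by simple random sampling (the text above does not fix the selection method). The function name and the village and household labels are hypothetical.

```python
import random

def cluster_sample(clusters, m, rng=None):
    """Select m clusters at random and include every element of each one."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), m)      # the clusters are the sampling units
    elements = [e for name in chosen for e in clusters[name]]
    return chosen, elements

# Hypothetical villages (clusters) and households (elements).
villages = {
    "V1": ["h11", "h12", "h13"],
    "V2": ["h21", "h22"],
    "V3": ["h31", "h32", "h33", "h34"],
    "V4": ["h41"],
}
chosen, elements = cluster_sample(villages, 2, rng=random.Random(0))
print(chosen, elements)
```

Note that the number of elements in the sample is not fixed in advance: it depends on the sizes of the clusters that happen to be selected.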
A survey is a general term that refers to the collection of data by means of interviews,
questionnaires or direct observations. A census is the complete count of all elements about
which we are interested. A sample survey is a study involving a subset (sample) of
individuals selected from a larger population by accepted statistical methods. It is an
alternative to a complete count of a population, serving as a basis for estimates or inferences
for that population.
Steps in planning and executing a sample survey:
A sample survey is an efficient technique of providing relevant information for
drawing inferences about a population. From an economic point of view, it is often the only
viable means of studying the population. It is therefore essential to describe the main steps involved
in executing a sample survey. Some of the steps are as follows:
(1) Objectives of the study: Whenever we plan a sample survey, a clear and concise
statement of the objectives should be laid down. The objectives must be kept simple
enough to be understood by those working on the survey and to be met successfully when
the survey is completed.
(2) Target population: The population from which the sample is to be drawn should be
defined and identified in clear and unambiguous terms. The target population may be
narrowed to a survey population to take account of practical constraints.
(3) Data: The data to be collected must be relevant and pertinent to the purpose of the
survey. Keeping the objectives in view, a detailed list of variables should be prepared and
defined, and how these variables will be measured should be indicated in advance.
(4) Precision desired: In a sample survey, only a part of the population is measured, so
the survey results are almost always subject to sampling error. Error of measurement is
an additional source of distortion in the survey results. These errors can be reduced to some
extent by using a larger sample and improved measuring instruments, but this involves
additional cost, time and effort. Consequently, the degree of precision
desired in the results must be specified in advance.
(5) Sampling frame: A sampling frame is an indispensable tool for conducting a sample
survey. A complete, accurate and up-to-date sampling frame must be constructed in order
to draw a valid sample.
(6) Duration of the study: Once the date of execution of the study is decided, it remains
to set up a work schedule for the completion of the various stages of the study.
(7) Sample design: Sampling design refers to the methods to be followed in selecting a
sample from the target population and the estimation technique (that is, formula for
computing the sample statistics). These statistics are the estimates used to infer the
population parameters.
Implicitly, the sampling design also includes such issues as the choice of the
sampling frame, determination of the size of the sample, estimation of reliability of the
estimates, stratification procedure, sample allocation method, clustering of the sample,
etc.
The sampling design, which is the most efficient in terms of cost, reliability and
appropriateness to meet the objectives of the study, should be employed. An
appropriately chosen sampling design is highly desirable to obtain reliable estimates of
the population parameters.
(8) Survey design: Survey design is the process of preparing a complete plan of
operations to be followed in conducting a survey and disseminating its intended results.
Specifically, it includes, among others, decisions on such factors as variables to be included
in the survey (called survey variables), the method of data collection (whether by self
administered questionnaire, interview schedule, telephonic conversation or direct
interview), construction of questionnaire, organizing fieldwork, data processing and data
analysis.
It seems obvious that the survey objectives covered under survey design determine the
sample design and in practice the sample design must be developed as an integral part of
the overall survey design. Survey design and sample design are thus two interrelated
concepts and each is complementary to the other.
(9) Sample size determination: Determination of sample size is perhaps the most
difficult part of a statistical investigation. It is often claimed that a sample should bear
some proportional relationship to the size of the population from which it is to be drawn.
This is not true. The size of a sample is a function of the variation in the population
of the characteristics under study and of the precision of the estimate needed by the researcher.
A sample of 500 may be appropriate sometimes, while a sample of more than 2000 is
required in other circumstances. In another case, perhaps a sample of only 50 is called
for.
(10) Preparation of field materials: The questionnaire is an important instrument for any
scientific study. It is therefore necessary to construct questionnaires relevant to the study,
keeping in mind the objectives of the study. Instruction manuals for filling in the
questionnaire must also be prepared in advance so that the field workers can collect data
without any difficulty.
(11) Selection and training of field workers: The validity of the survey results largely
depends on the personnel involved and their efficiency. It is therefore important to select
and train the field workers carefully. Training is especially important if the interview method
is followed because the interviewers’ personal styles and presentations largely affect the
rate of response and the accuracy of responses.
(12) Pre-testing: Pre-testing is a trial or operation that allows us to test the questionnaire
or other measurement instruments in the field, to screen interviewers and to check on the
management of field operations. The results of the pre-test usually suggest that some
modification must be made before a full-scale sampling is undertaken. It provides the
means of uncovering deficiencies and the basis for corrective action prior to carrying out
the actual survey. It may also suggest the amount of workload to be assigned to each
investigator and an insight into the data processing operation in advance.
(13) Fieldwork: Efficient organization of the fieldwork is a pre-requisite of the
successful completion of a statistical investigation. The personnel involved in the
fieldwork should receive adequate training related to the work. Appropriate measures
should be taken so that the field personnel are regularly supervised and their work is
monitored. The quality of the work should be ensured at a very early stage so that
any inconsistencies or shortcomings can be removed well before the completion of the
work. An instruction manual should be prepared for all categories of persons involved in
the field operation. This will help ensure good-quality data and hence accurate
estimates.
(14) Data management: Large surveys generate huge amounts of data. Hence, a well
prepared data management plan is of prime importance. This plan should include the
steps for processing data from the very inception of the study until the final analysis is
completed. The administrative and computer procedures to be used, the type of staff
available and whether any training will be needed to facilitate data management should
also be described. A quality control scheme should also be included in the plan in order
to check for agreement between processed data and data gathered in the field.
(15) Editing and checking: A detailed plan must be outlined at the outset to check and
edit the field data for erroneous and inconsistent entries soon after they are at hand.
Both manual and computer checking may be employed to detect inconsistencies in the data.
Any erroneous entry that cannot be corrected at this stage should be corrected by
re-interviewing the respondents.
(16) Data processing and analysis: Once the data are checked, edited and corrected for
errors, processing of data should be attempted keeping in view the objectives of the
study. This task also needs careful planning.
The next step is the statistical analysis, which is carried out to arrive at the desired
estimates of the population parameters. Statistical methods, which will be used for the
analysis of the data, should be outlined, including a description of how the information
collected will be used to test the stated hypothesis and how any missing data will be dealt
with.
(17) Project management: For a collaborative study involving several organizations,
it should be indicated at the planning stage who will have the overall responsibility,
which other organizations will be involved, what their responsibilities will be and the
manner in which the work will be coordinated and monitored.
(18) Report writing: Finally, the findings of the study highlighting the policy
implications and suggesting possible actions and measures to be taken including policy
recommendation, should be written in a report.
(19) Lessons learned: A survey is a complex undertaking and is liable to a large margin of
error if not properly handled. Because of this complexity, things rarely go exactly as we
plan. The main obstacles and difficulties, which interfere with the successful completion
of the study within the time and cost proposed, should therefore be described.
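Step (9) above states that sample size depends on the variability in the population and on the precision required, not on a fixed proportion of the population. One way to see this is the standard textbook formula for estimating a mean under simple random sampling, n0 = (zS/d)^2, optionally adjusted by a finite population correction. The formula is not given in this chapter; it and the function name below are brought in as assumptions for illustration.

```python
import math

def sample_size_for_mean(S, d, z=1.96, N=None):
    """Sample size needed to estimate a mean to within a margin of error d.

    S: anticipated population standard deviation (an assumed planning value).
    d: desired margin of error, in the same units as S.
    z: normal deviate for the confidence level (1.96 for about 95%).
    N: population size; if given, the finite population correction is applied.
    """
    n0 = (z * S / d) ** 2
    if N is not None:
        n0 = n0 / (1 + n0 / N)
    return math.ceil(n0)

print(sample_size_for_mean(S=10, d=1))             # 385
print(sample_size_for_mean(S=10, d=2))             # 97 (less precision needed)
print(sample_size_for_mean(S=10, d=1, N=1000))     # 278
print(sample_size_for_mean(S=10, d=1, N=1000000))  # 385
```

Under these assumptions, a thousandfold increase in population size raises the required sample only from 278 to 385, while halving the required precision cuts it to about a quarter.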
units due to their perishable or fragile nature. The alternative in this situation is to take
only a few of the units.
For example, consider the problem of checking the quality of mango juice produced by a
company. One way to test the quality is to drink the entire lot, which is impracticable.
The testing of electric bulbs, screws, glasses and medicines are all examples of this type,
where sampling is a must.
Limitations of sampling:
Despite several advantages of sample survey over complete count, it has some
disadvantages or limitations too. They are as follows:
(1) The results of a sample survey are subject to sampling error and on that account are
less precise than those of a complete enumeration.
(2) A sample may seriously over represent, under represent or even fail to represent the
population. In such instances, the estimates provided by such surveys are liable to a larger
margin of error.
(3) Sampling theory requires the services of trained and qualified personnel and
sophisticated equipment for its planning, execution and analysis. In the absence of these,
the results of the sample survey are not trustworthy.
(4) If information is required about each and every unit of the universe,
there is no alternative to complete enumeration. Moreover, if time and money are
not important factors or if the universe is not too large, a complete enumeration may be
better than any sampling method.
Criteria of a perfect sample design:
A perfect sample design is expected to meet certain criteria, which include, among others,
the criteria of accuracy, reliability, validity and efficiency. We provide below a brief
account of these concepts.
(1) Accuracy: The accuracy of a sample estimate refers to its closeness to the true
population value. The closer the sample estimate to the population value, the greater is its
accuracy. The accuracy of an estimate is generally assessed on the basis of its mean
square error. The smaller the mean square error of an estimator, the greater is its
accuracy.
So, a good sample design must allow us to obtain valid estimates of the error of its
estimates, which is ordinarily expressed in terms of mean square error. This is possible
only when the sample is a probability sample.
(2) Reliability: If we assume that there is no measurement error in the survey, then the
reliability or precision of an estimator can be stated in terms of its sampling variance or
equivalently, of its standard error.
The standard error measures the precision with which the estimate from a particular
sample approximates the hypothetical average result from all possible samples. The
smaller the standard error of an estimate, the greater is its reliability.
So, a good sample design must allow us to obtain valid estimates of its sampling
variability, which is ordinarily expressed in terms of standard error.
(3) Validity: If we assume that there is no measurement error in the survey, then the
validity of an estimator can be evaluated by examining the bias of the estimator. The
smaller the bias, the greater is the validity.
Bias refers to how far the average value of the estimator lies from the parameter. Thus, if
$t$ is an estimator and $\theta$ is its corresponding population parameter, then the bias of
the estimator is expressed as: $B(t) = E(t) - \theta$.
The use of faulty data collection tools could lead to biased results. Bias can also be
introduced as a consequence of improper sampling design. This may result in the sample
not being representative of the study population.
So, a good sample design must be oriented to the research objectives in terms of its
selection and estimation of the population values. Furthermore, it must
comply with the survey design and suit the survey environment.
(4) Efficiency: The criterion of efficiency is related to the cost of sampling. A sampling
design is considered to be more efficient than another if the former results in lower costs
than the latter, with the same degree of reliability. So, economy is another aspect of
a sample design. A good sample design must therefore involve the lowest cost for
the fulfillment of the survey objectives.
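The notions of bias, sampling variance and mean square error used in criteria (1) to (3) can be checked numerically on a small population by listing every possible sample, which is the "hypothetical average result from all possible samples" referred to above. The sketch assumes simple random sampling without replacement, so that all samples are equally likely; the population values and the function name are hypothetical. It also illustrates the standard identity that mean square error equals variance plus squared bias, which the chapter does not state.

```python
from itertools import combinations
from statistics import mean

def estimator_properties(population, n, estimator, parameter):
    """Enumerate all equally likely samples of size n and return
    (bias, variance, mean square error) of the estimator."""
    estimates = [estimator(s) for s in combinations(population, n)]
    expected = mean(estimates)                       # E(t)
    bias = expected - parameter                      # B(t) = E(t) - theta
    variance = mean((t - expected) ** 2 for t in estimates)
    mse = mean((t - parameter) ** 2 for t in estimates)
    return bias, variance, mse

# Hypothetical population of five values, invented for illustration only.
population = [2, 4, 6, 8, 10]
theta = mean(population)   # the population mean is the parameter

# The sample mean is unbiased, so its MSE equals its variance.
bias, variance, mse = estimator_properties(population, 2, mean, theta)
print(bias, variance, mse)  # 0 3 3

# A deliberately biased estimator: the sample mean plus one.
b2, v2, m2 = estimator_properties(population, 2, lambda s: mean(s) + 1, theta)
print(b2, v2, m2)  # 1 3 4
```

The second estimator is exactly as reliable as the first (same variance) but less valid (non-zero bias) and therefore less accurate (larger mean square error).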