
Sampling


Inferential Statistics

SAMPLING & SAMPLING DISTRIBUTION

Khalid Ilyas Siddiqui


SAMPLE

• A subset of a larger population, for example talkative, knowledgeable people selected from a large population.
Population

• Any Complete Group


- People
- Sales Territories
- Stores
Census or Registration

Investigation of all the individual elements that make up a population.
STAGES IN THE SELECTION OF A SAMPLE
Define the target population

Select a sampling frame

Determine if a
probability or nonprobability
sampling method will be chosen

Plan procedure
for selecting sampling units
Target Population

• Relevant Population

• Operationally Define
SAMPLING FRAME

• A list of elements from which the sample may be drawn.
For Example
• Working population.
• Mailing lists (database marketers)
SAMPLING UNITS

• Group selected for the sample.
• Primary Sampling Units (PSU): e.g. all the students who are interviewed in the first meeting.
• Secondary Sampling Units: e.g. the HR employees of MAIWAND Institute.
• Tertiary Sampling Units: e.g. all management and security people of MAIWAND Institute.
Random Sampling Error

• The difference between the sample results and the results of a census conducted using identical procedures.
• Statistical fluctuation due to chance variations.
Systematic Errors
An error that is not determined by chance but is introduced by an inaccuracy in the system.
• Nonsampling errors
• Unrepresentative sample results
• Not due to chance
• Due to study design or imperfections in execution
Errors Associated with Sampling
• Sampling Frame Error
The variation between the population and the sampling frame. This error occurs when the sampling frame is imperfect, e.g. a telephone directory lists phone numbers but does not accurately reflect the size of the population.
• Random Sampling Error
The chance variation that remains even when the sample has been selected by a random method.
• Nonresponse Error
Nonresponse errors occur when the survey fails to get a response to one, or possibly all, of the questions.
Two Major Categories of
Sampling
• PROBABILITY SAMPLING
• Known, nonzero probability for every element.
• NONPROBABILITY SAMPLING
• Probability of selecting any particular member is
unknown.
NONPROBABILITY
SAMPLING

1. CONVENIENCE.

2. JUDGMENT.

3. QUOTA
The assembled sample has the same proportions of individuals as the entire population.
4. SNOWBALL
The sample grows as existing members recruit further members, e.g. people who have many friends are more likely to be recruited into the sample (compare a tree sales system).
PROBABILITY SAMPLING

1. SIMPLE RANDOM SAMPLE.


2. SYSTEMATIC SAMPLE.
3. STRATIFIED SAMPLE.
4. CLUSTER SAMPLE.
5. MULTISTAGE AREA SAMPLE.
CONVENIENCE
SAMPLING

• Also called haphazard or accidental sampling.
• The sampling procedure of obtaining the
people or units that are most conveniently
available.
JUDGMENT SAMPLING

• Also called purposive sampling.
• An experienced individual selects the sample based on his or her judgment about some appropriate characteristic required of the sample members.
SIMPLE RANDOM
SAMPLING

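• A sampling procedure that ensures that each element in the population will have an equal chance of being included in the sample.
For Example
• Draw students from the full roll of class A7 by lottery. (Selecting one person from each row, or 5 male and 5 female students, fixes the group counts in advance and is closer to stratified selection.)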
SYSTEMATIC SAMPLING

• A simple process
• Every nth name from the list will be drawn
STRATIFIED SAMPLING

• Probability sample
• Subsamples are drawn within different strata (divisions or sections).
• Elements within each stratum are more or less alike on some characteristic.
• Do not confuse with quota sampling.
CLUSTER SAMPLING

• The purpose of cluster sampling is to sample economically while retaining the characteristics of a probability sample.
• The primary sampling unit is no longer the individual element in the population.
• The primary sampling unit is a larger cluster of elements located in proximity to one another.
EXAMPLES OF CLUSTERS

Population Element        Possible Clusters in Afghanistan

Afghan adult population   Provinces
                          Districts
                          Urban areas
                          Villages
                          Households
WHAT IS THE APPROPRIATE SAMPLE DESIGN?

1. DEGREE OF ACCURACY.

2. RESOURCES.

3. TIME.

4. ADVANCE KNOWLEDGE OF THE POPULATION.

5. NATIONAL VERSUS INTERNATIONAL.

6. NEED FOR STATISTICAL ANALYSIS.


After the Sample Design is Selected

• Determine sample size


• Select actual sample units
• Conduct fieldwork
SAMPLING…….

[Figure: diagram relating the TARGET POPULATION, the STUDY POPULATION, and the SAMPLE]
Types of Samples

• Probability (Random) Samples
- Simple random sample
- Systematic random sample
- Stratified random sample
- Multistage sample
- Multiphase sample
- Cluster sample
• Non-Probability Samples
- Convenience sample
- Purposive sample
- Quota sample
Process
• The sampling process comprises several stages:
- Defining the population of concern
- Specifying a sampling frame, a set of items or events possible to measure
- Specifying a sampling method for selecting items or events from the frame
- Determining the sample size
- Implementing the sampling plan
- Sampling and data collecting
- Reviewing the sampling process
Population definition

• A population can be defined as including all people or items with the characteristic one wishes to understand.
• Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population.
Population definition…….

• Note also that the population from which the sample is drawn may not be the same as the population about which we actually want information. Often there is a large but not complete overlap between these two groups due to frame issues etc.
• Sometimes they may be entirely separate: for instance, we might study rats in order to get a better understanding of human health, or we might study records from people born in 2008 in order to make predictions about people born …
SAMPLING FRAME

• In the most straightforward case, such as the sentencing of a batch of material from production (acceptance sampling by lots), it is possible to identify and measure every single item in the population and to include any one of them in our sample. However, in the more general case this is not possible. There is no way to identify all rats in the set of all rats. Where voting is not compulsory, there is no way to identify which people will actually vote at a forthcoming election (in advance of the election).
• As a remedy, we seek a sampling frame which has the property that we can identify every single element and include any of them in our sample.
• The sampling frame must be representative of the population.
PROBABILITY SAMPLING
• A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined.
• When every element in the population has the same probability of selection, this is known as an 'equal probability of selection' (EPS) design. Such designs are also referred to as 'self-weighting' because all sampled units are given the same weight.
PROBABILITY SAMPLING…….

Probability sampling includes:
• Simple Random Sampling
• Systematic Sampling
• Stratified Random Sampling
• Cluster Sampling
• Multistage Sampling
• Multiphase Sampling
NONPROBABILITY SAMPLING

• Any sampling method where some elements of the population have no chance of selection (these are sometimes referred to as 'out of coverage' or 'undercovered'), or where the probability of selection can't be accurately determined. It involves the selection of elements based on assumptions regarding the population of interest, which form the criteria for selection. Hence, because the selection of elements is nonrandom, nonprobability sampling does not allow the estimation of sampling errors.
• Example: We visit every household in a given street, and interview the first person to answer the door. In any household with more than one occupant, this is a nonprobability sample, because some people are more likely to answer the door (e.g. an unemployed person who spends most of their time at home is more likely to answer than an employed housemate who might be at work when the interviewer calls) and it's not practical to calculate these probabilities.
NONPROBABILITY SAMPLING…….
• Nonprobability sampling includes Accidental Sampling, Quota Sampling and Purposive Sampling. In addition, nonresponse effects may turn any probability design into a nonprobability design if the characteristics of nonresponse are not well understood, since nonresponse effectively modifies each element's probability of being sampled.
SIMPLE RANDOM SAMPLING

• Applicable when the population is small, homogeneous and readily available.
• All subsets of the frame of a given size are given an equal probability. Each element of the frame thus has an equal probability of selection.
• It provides for the greatest number of possible samples. This is done by assigning a number to each unit in the sampling frame.
• A table of random numbers or a lottery system is used to determine which units are to be selected.
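The lottery procedure for simple random sampling can be sketched in a few lines of Python. This is only an illustrative sketch: the function name simple_random_sample, the frame of 50 numbered units and the seed are assumptions, not part of the slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)          # seed only makes the draw repeatable
    return rng.sample(list(frame), n)  # sampling without replacement

# Hypothetical frame: units numbered 1..50, as in a lottery.
frame = range(1, 51)
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

The standard-library call random.sample plays the role of the random number table here; any other unbiased random source would serve equally well.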
SIMPLE RANDOM SAMPLING……..

• Estimates are easy to calculate.
• Simple random sampling is always an EPS design, but not all EPS designs are simple random sampling.
• Disadvantages
- If the sampling frame is large, this method is impracticable.
- Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
REPLACEMENT OF SELECTED UNITS

• Sampling schemes may be without replacement ('WOR': no element can be selected more than once in the same sample) or with replacement ('WR': an element may appear multiple times in the one sample).
• For example, if we catch fish, measure them, and immediately return them to the water before continuing with the sample, this is a WR design, because we might end up catching and measuring the same fish more than once. However, if we do not return the fish to the water (e.g. if we eat the fish), this becomes a WOR design.
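The difference between the two schemes can be illustrated with the fish example. The sketch below assumes a hypothetical pond of 10 labelled fish and an arbitrary seed; neither comes from the slides.

```python
import random

rng = random.Random(7)                      # arbitrary seed, for repeatability only
fish = [f"fish_{i}" for i in range(1, 11)]  # hypothetical pond of 10 fish

# WOR: a caught fish is not returned, so no fish can appear twice.
wor = rng.sample(fish, 5)

# WR: each fish is returned before the next catch, so repeats are possible.
wr = [rng.choice(fish) for _ in range(5)]

print("without replacement:", wor)
print("with replacement:   ", wr)
```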
SYSTEMATIC SAMPLING

• Systematic sampling relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list.
• Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards. In this case, k = (population size / sample size).
• It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from within the first to the kth element in the list.
• A simple example would be to select every 10th name from the telephone directory (an 'every 10th' sample, also referred to as 'sampling with a skip of 10').
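A minimal sketch of the random start followed by every kth element, assuming a hypothetical list of 1000 numbered names and a population size that is an exact multiple of the sample size (the function name systematic_sample is also an assumption):

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start within the first k elements, then every k-th element."""
    k = len(frame) // n                        # skip interval; assumes len(frame) is a multiple of n
    start = random.Random(seed).randrange(k)   # 0-based index of the random start
    return frame[start::k]

directory = list(range(1, 1001))               # hypothetical list of 1000 numbered names
picked = systematic_sample(directory, 100, seed=3)
print(picked[:5])
```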
SYSTEMATIC SAMPLING……

• As described above, systematic sampling is an EPS method, because all elements have the same probability of selection (in the example given, one in ten). It is not 'simple random sampling' because different subsets of the same size have different selection probabilities: e.g. the set {4,14,24,...,994} has a one-in-ten probability of selection, but the set {4,13,24,34,...} has zero probability of selection.
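The EPS-but-not-simple-random property can be checked by enumerating every sample a skip-of-10 design can produce. The sketch assumes a list numbered 1 to 1000, which is consistent with the set {4,14,...,994} quoted in the slide but is not stated there.

```python
from math import comb

N, k = 1000, 10   # assumed list length and skip

# The only samples the design can produce: one per possible random start.
possible = [frozenset(range(start, N + 1, k)) for start in range(1, k + 1)]

# Each element sits in exactly one of the k samples, so its inclusion probability is 1/k.
inclusion = {x: sum(x in s for s in possible) / k for x in (4, 13, 994)}

regular = frozenset(range(4, N + 1, k))                   # {4, 14, 24, ..., 994}
mixed = frozenset([4, 13] + list(range(24, N + 1, k)))    # {4, 13, 24, 34, ...}

print(inclusion)
print(regular in possible, mixed in possible)
print(comb(N, N // k) > len(possible))   # simple random sampling allows far more subsets
```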
SYSTEMATIC SAMPLING……

• ADVANTAGES:
- Sample easy to select
- Suitable sampling frame can be identified easily
- Sample evenly spread over entire reference population
• DISADVANTAGES:
- Sample may be biased if hidden periodicity in population coincides with that of selection.
- Difficult to assess precision of estimate from one survey.
STRATIFIED SAMPLING

• Where the population embraces a number of distinct categories, the frame can be organized into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.
• Every unit in a stratum has the same chance of being selected.
• Using the same sampling fraction for all strata ensures proportionate representation in the sample.
• Adequate representation of minority subgroups of interest can be ensured by stratification and by varying the sampling fraction between strata as required.
STRATIFIED SAMPLING……

• Finally, since each stratum is treated as an independent population, different sampling approaches can be applied to different strata.
• Drawbacks to using stratified sampling:
- First, the sampling frame of the entire population has to be prepared separately for each stratum.
- Second, when examining multiple criteria, stratifying variables may be related to some, but not to others, further complicating the design, and potentially reducing the utility of the strata.
- Finally, in some cases (such as designs with a large number of strata, or those with a specified minimum sample size per group), …
STRATIFIED SAMPLING…….

Draw a sample from each stratum

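Drawing a sample from each stratum with the same sampling fraction might look like the sketch below. The urban/rural population, the 10% fraction and the function name stratified_sample are assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Sample each stratum independently with the same sampling fraction."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)      # organise the frame into strata
    return {name: rng.sample(members, round(len(members) * fraction))
            for name, members in strata.items()}

# Hypothetical population: 60 urban and 40 rural units.
units = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
result = stratified_sample(units, lambda u: u[0], 0.10, seed=5)
print({name: len(chosen) for name, chosen in result.items()})
```

Passing a different fraction per stratum instead of a single value would give the disproportionate design used to boost minority subgroups.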
POSTSTRATIFICATION

• Stratification is sometimes introduced after the sampling phase in a process called "post-stratification".
• This approach is typically implemented due to a lack of prior knowledge of an appropriate stratifying variable or when the experimenter lacks the necessary information to create a stratifying variable during the sampling phase. Although the method is susceptible to the pitfalls of post hoc approaches, it can provide several benefits in the right situation. Implementation usually follows a simple random sample. In addition to allowing for stratification on an ancillary variable, post-stratification can be used to implement weighting, which can improve the precision of a sample's estimates.
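One common way to implement post-stratification weighting is to weight each stratum's sample mean by that stratum's known share of the population. The sketch below assumes this form of weighting; the data, the shares and the function name poststratified_mean are hypothetical.

```python
from collections import Counter

def poststratified_mean(sample, population_shares):
    """sample: (stratum, value) pairs, e.g. from a simple random sample.
    population_shares: known share of each stratum in the population (sums to 1)."""
    counts = Counter(s for s, _ in sample)
    totals = Counter()
    for s, v in sample:
        totals[s] += v
    # Weight each stratum's sample mean by its known population share.
    return sum(population_shares[s] * totals[s] / counts[s] for s in counts)

# Hypothetical sample in which men are over-represented (3 of 4 respondents).
sample = [("men", 10), ("men", 20), ("men", 30), ("women", 40)]
shares = {"men": 0.5, "women": 0.5}
estimate = poststratified_mean(sample, shares)
unweighted = sum(v for _, v in sample) / len(sample)
print(estimate, unweighted)
```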
OVERSAMPLING

• Choice-based sampling is one of the stratified sampling strategies. In this, data are stratified on the target and a sample is taken from each stratum so that the rare target class will be more represented in the sample. The model is then built on this biased sample. The effects of the input variables on the target are often estimated with more precision with the choice-based sample even when a smaller overall sample size is taken, compared to a random sample. The results usually must be adjusted to correct for the oversampling.
CLUSTER SAMPLING

• Cluster sampling is an example of 'two-stage sampling'.
- First stage: a sample of areas is chosen.
- Second stage: a sample of respondents within those areas is selected.
• The population is divided into clusters of homogeneous units, usually based on geographical contiguity.
• Sampling units are groups rather than individuals.
• A sample of such clusters is then selected.
• All units from the selected clusters are studied.
CLUSTER SAMPLING…….

• Advantages:
- Cuts down on the cost of preparing a sampling frame.
- This can reduce travel and other administrative costs.
• Disadvantages: sampling error is higher than for a simple random sample of the same size.
• Often used to evaluate vaccination coverage in EPI.
CLUSTER SAMPLING…….

• Identification of clusters
- List all cities, towns, villages and wards of cities in the target area under study, with their populations.
- Calculate the cumulative population and divide by 30; this gives the sampling interval.
- Select a random number less than or equal to the sampling interval, having the same number of digits. The area whose cumulative population contains this number forms the 1st cluster.
- Random number + sampling interval = position of the 2nd cluster in the cumulative population.
- 2nd cluster + sampling interval = 3rd cluster.
- Last or 30th cluster = 29th cluster + sampling interval.
CLUSTER SAMPLING…….

Two types of cluster sampling methods:
• One-stage sampling. All of the elements within selected clusters are included in the sample.
• Two-stage sampling. A subset of elements within selected clusters is randomly selected for inclusion in the sample.
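The two variants differ only in what happens inside the selected clusters. A minimal sketch, assuming six hypothetical villages of 20 households each, two selected clusters and a subsample of five households per cluster:

```python
import random

rng = random.Random(11)   # arbitrary seed, for repeatability only

# Hypothetical frame: 6 villages (clusters) of 20 households each.
clusters = {f"village_{c}": [f"v{c}_house_{h}" for h in range(20)] for c in range(6)}

# Stage 1 (both variants): draw a random sample of clusters.
chosen = rng.sample(sorted(clusters), 2)

# One-stage: every element of each selected cluster enters the sample.
one_stage = [u for c in chosen for u in clusters[c]]

# Two-stage: a random subset of elements is drawn within each selected cluster.
two_stage = [u for c in chosen for u in rng.sample(clusters[c], 5)]

print(len(one_stage), len(two_stage))
```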
CLUSTER SAMPLING…….

Area     Freq     cf      Cluster
I        2000     2000    1
II       3000     5000    2
III      1500     6500
IV       4000    10500    3
V        5000    15500    4, 5
VI       2500    18000    6
VII      2000    20000    7
VIII     3000    23000    8
IX       3500    26500    9
X        4500    31000    10
XI       4000    35000    11, 12
XII      4000    39000    13
XIII     5000    44000    14, 15
XIV      2000    46000
XV       3000    49000    16
XVI      3500    52500    17
XVII     4000    56500    18, 19
XVIII    4500    61000    20
XIX      4000    65000    21, 22
XX       4000    69000    23
XXI      2000    71000    24
XXII     2000    73000
XXIII    3000    76000    25
XXIV     3000    79000    26
XXV      5000    84000    27, 28
XXVI     2000    86000    29
XXVII    1000    87000
XXVIII   1000    88000
XXIX     1000    89000    30
XXX      1000    90000

Sampling interval = 90000/30 = 3000
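The 30-cluster selection in the table can be reproduced with a cumulative-population lookup. In this sketch the random start of 1700 is an assumption, chosen because it reproduces the cluster column of the table; the slides do not state the start that was used. Area XIII is entered as 5000 so that it agrees with the cumulative column (39000 to 44000).

```python
from bisect import bisect_left
from itertools import accumulate

# Populations of areas I..XXX, taken from the table.
populations = [2000, 3000, 1500, 4000, 5000, 2500, 2000, 3000, 3500, 4500,
               4000, 4000, 5000, 2000, 3000, 3500, 4000, 4500, 4000, 4000,
               2000, 2000, 3000, 3000, 5000, 2000, 1000, 1000, 1000, 1000]

cumulative = list(accumulate(populations))
interval = cumulative[-1] // 30            # 90000 / 30 = 3000
start = 1700                               # assumed random start (<= interval)

# Selection points: start, start + interval, start + 2 * interval, ...
points = [start + i * interval for i in range(30)]

# Each point falls in the first area whose cumulative population reaches it.
areas = [bisect_left(cumulative, p) for p in points]          # 0-based area index
clusters_per_area = [areas.count(i) for i in range(len(populations))]

for i, n in enumerate(clusters_per_area, start=1):
    print(f"area {i:2d}: {n} cluster(s)")
```

Larger areas can receive more than one cluster and small areas may receive none, which is how the procedure makes selection proportional to population size.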
Difference Between Strata and Clusters

• Although strata and clusters are both non-overlapping subsets of the population, they differ in several ways.
• All strata are represented in the sample, but only a subset of clusters is in the sample.
• With stratified sampling, the best survey results occur when elements within strata are internally homogeneous. However, with cluster sampling, the best results occur when elements within clusters are internally heterogeneous.
MULTISTAGE SAMPLING

• A complex form of cluster sampling in which two or more levels of units are embedded one in the other.
- First stage: a random sample of districts is chosen in all states.
- Second stage: a random sample of talukas and villages follows.
- Third stage: the units are houses.
• All ultimate units (houses, for instance) selected at the last step are surveyed.
MULTISTAGE SAMPLING……..

• This technique is essentially the process of taking random samples of preceding random samples.
• Not as effective as true random sampling, but probably solves more of the problems inherent to random sampling.
• An effective strategy because it banks on multiple randomizations. As such, extremely useful.
• Multistage sampling is used frequently when a complete list of all members of the population does not exist or is inappropriate.
• Moreover, by avoiding the use of all sample units in all selected clusters, multistage sampling avoids the large, and perhaps unnecessary, costs associated with traditional cluster sampling.
MULTI PHASE SAMPLING

• Part of the information is collected from the whole sample and part from a subsample.
• In a TB survey:
- Phase I: MT in all cases
- Phase II: chest X-ray in MT +ve cases
- Phase III: sputum examination in X-ray +ve cases
• A survey by such a procedure is less costly, less laborious and more purposeful.
MATCHED RANDOM SAMPLING

• A method of assigning participants to groups in which pairs of participants are first matched on some characteristic and then individually assigned randomly to groups.
• The procedure for matched random sampling arises in the following contexts:
- Two samples in which the members are clearly paired, or are matched explicitly by the researcher. For example, IQ measurements or pairs of identical twins.
- Samples in which the same attribute, or variable, is measured twice on each subject, under different circumstances. Commonly called repeated measures.
• Examples include the times of a group of athletes for 1500m before and after a week of special training; the milk yields of cows before and after being fed a particular diet.
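Matching on a characteristic and then assigning the members of each pair at random could be sketched as follows. The participants, their IQ scores and the function name matched_assignment are hypothetical, and pairing neighbours in sorted order is only one simple way to match.

```python
import random

def matched_assignment(participants, key, seed=None):
    """Sort on the matching characteristic, pair neighbours, split each pair at random."""
    rng = random.Random(seed)
    ordered = sorted(participants, key=key)
    group_a, group_b = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]   # two participants with similar scores
        rng.shuffle(pair)                     # random assignment within the pair
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b

# Hypothetical participants with IQ scores.
people = [("P1", 98), ("P2", 121), ("P3", 100), ("P4", 119), ("P5", 110), ("P6", 108)]
a, b = matched_assignment(people, key=lambda p: p[1], seed=2)
print(a)
print(b)
```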
QUOTA SAMPLING

• The population is first segmented into mutually exclusive sub-groups, just as in stratified sampling.
• Then judgment is used to select subjects or units from each segment based on a specified proportion.
• For example, an interviewer may be told to sample 200 females and 300 males between the ages of 45 and 60.
• It is this second step which makes the technique one of non-probability sampling.
• In quota sampling the selection of the sample is non-random.
• For example, interviewers might be tempted to interview those who look most helpful. The problem is that these samples may be biased because not everyone gets a chance of selection. This non-random element is its greatest weakness, and quota versus probability has …
CONVENIENCE SAMPLING

• Sometimes known as grab or opportunity sampling or accidental or haphazard sampling.
• A type of nonprobability sampling which involves the sample being drawn from that part of the population which is close to hand, that is, readily available and convenient.
• The researcher using such a sample cannot scientifically make generalizations about the total population from this sample because it would not be representative enough.
• For example, if the interviewer were to conduct a survey at a shopping center early in the morning on a given day, the people that he/she could interview would be limited to those present there at that given time. Their views would not represent those of other members of society in the area, as they would if the survey were conducted at different times of day and several times per week.
• This type of sampling is most useful for pilot testing.
• In social science research, snowball sampling is a similar technique, where existing study subjects are used to recruit more subjects into the sample.
CONVENIENCE SAMPLING…….

• Use results that are easy to get
Judgmental sampling or Purposive sampling

• The researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when there is a limited number of people who have expertise in the area being researched.
PANEL SAMPLING

• A method of first selecting a group of participants through a random sampling method and then asking that group for the same information again several times over a period of time.
• Therefore, each participant is given the same survey or interview at two or more time points; each period of data collection is called a "wave".
• This sampling methodology is often chosen for large-scale or nationwide studies in order to gauge changes in the population with regard to any number of variables, from chronic illness to job stress to weekly food expenditures.
• Panel sampling can also be used to inform researchers about within-person health changes due to age or to help explain changes in continuous dependent variables such as spousal interaction.
• There have been several proposed methods of analyzing panel sample data, including growth curves.
What sampling method do you recommend?

• Determining the proportion of undernourished five-year-olds in a village.
• Investigating the nutritional status of preschool children.
• Selecting maternity records for the study of previous abortions or duration of postnatal stay.
• In estimation of immunization coverage in a province, data on seven children aged 12-23 months in 30 clusters are used to determine the proportion of fully immunized children in the province.
• Give reasons why cluster sampling is used in this …
Probability proportional to size sampling

• In some cases the sample designer has access to an "auxiliary variable" or "size measure", believed to be correlated to the variable of interest, for each element in the population. These data can be used to improve accuracy in sample design. One option is to use the auxiliary variable as a basis for stratification, as discussed above.
• Another option is probability-proportional-to-size ('PPS') sampling, in which the selection probability for each element is set to be proportional to its size measure, up to a maximum of 1. In a simple PPS design, these selection probabilities can then be used as the basis for Poisson sampling. However, this has the drawbacks of variable sample size, and different portions of the population may still be over- or under-represented due to chance variation in selections. To address this problem, PPS may be combined with a systematic approach.
Contd.

• Example: Suppose we have six schools with populations of 150, 180, 200, 220, 260, and 490 students respectively (total 1500 students), and we want to use student population as the basis for a PPS sample of size three. To do this, we could allocate the first school numbers 1 to 150, the second school 151 to 330 (= 150 + 180), the third school 331 to 530, and so on to the last school (1011 to 1500). We then generate a random start between 1 and 500 (equal to 1500/3) and count through the school populations by multiples of 500. If our random start was 137, we would select the schools which have been allocated numbers 137, 637, and 1137, i.e. the first, fourth, and sixth schools.
• The PPS approach can improve accuracy for a given sample size by concentrating the sample on large elements that have the greatest impact on population estimates. PPS sampling is commonly used for surveys of businesses, where element size varies greatly and auxiliary information is often available: for instance, a survey attempting to measure the number of guest-nights spent in hotels might use each hotel's number of rooms as an auxiliary variable. In some cases, an older measurement of the variable of interest can be used as an auxiliary variable when attempting to produce more current estimates.
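The school example can be checked with a short sketch of systematic PPS selection. The function name pps_systematic is an assumption; the school sizes and the random start of 137 are the ones given in the example.

```python
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, n, start):
    """Return 1-based positions of the elements hit by start, start + interval, ..."""
    cumulative = list(accumulate(sizes))   # upper end of each element's number range
    interval = cumulative[-1] // n         # 1500 / 3 = 500 in the school example
    points = [start + i * interval for i in range(n)]
    return [bisect_left(cumulative, p) + 1 for p in points]

schools = [150, 180, 200, 220, 260, 490]
selected = pps_systematic(schools, 3, start=137)
print(selected)   # the first, fourth and sixth schools
```

The largest school covers 490 of the 1500 allocated numbers, so it is hit by roughly a third of all possible random starts, which is what makes the selection proportional to size.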
Event sampling
• Event Sampling Methodology (ESM) is a new form of sampling method that allows researchers to study ongoing experiences and events that vary across and within days in their naturally occurring environment. Because of the frequent sampling of events inherent in ESM, it enables researchers to measure the typology of activity and detect the temporal and dynamic fluctuations of work experiences. The popularity of ESM as a new form of research design has increased over recent years because it addresses the shortcomings of cross-sectional research: researchers can now detect intra-individual variances across time, where once they were unable to. In ESM, participants are asked to record their experiences and perceptions in a paper or electronic diary.
• There are three types of ESM:
- Signal contingent: random beeping notifies participants to record data. The advantage of this type of ESM is minimization of recall bias.
- Event contingent: records data when certain events occur
Contd.
- Interval contingent: records data according to the passing of a certain period of time
• ESM has several disadvantages. One of the disadvantages of ESM is that it can sometimes be perceived as invasive and intrusive by participants. ESM also leads to possible self-selection bias. It may be that only certain types of individuals are willing to participate in this type of study, creating a non-random sample. Another concern is related to participant cooperation. Participants may not actually fill out their diaries at the specified times. Furthermore, ESM may substantively change the phenomenon being studied. Reactivity or priming effects may occur, such that repeated measurement may cause changes in the participants' experiences. This method of sampling data is also highly vulnerable to common method variance.[6]
Contd.
• Further, it is important to think about whether or not an appropriate dependent variable is being used in an ESM design. For example, it might be logical to use ESM in order to answer research questions which involve dependent variables with a great deal of variation throughout the day. Thus, variables such as change in mood, change in stress level, or the immediate impact of particular events may be best studied using ESM methodology. However, it is not likely that utilizing ESM will yield meaningful predictions when measuring someone performing a repetitive task throughout the day or when dependent variables are long-term in nature (coronary heart problems).
