Chapter 3 - Sampling Design and Data Collection

The document outlines the principles of sampling design in research, including the importance of defining the target population, sampling frame, sample size, and sampling methods. It discusses the two main types of sampling methods: probability and non-probability sampling, along with their advantages and disadvantages. Additionally, it covers data collection methods, emphasizing the differences between primary and secondary data, and various techniques for collecting primary data.

Uploaded by

kechohaile
Copyright
© All Rights Reserved

Research Methods

Instructor: Solomon (Ph.D)


Sampling Design
• Sampling design refers to the overall strategy or
framework that includes the sampling method, along
with the structure of the sampling process.
• It encompasses decisions on how to define the
population, how to select the sample, how many
participants to include, and how the sample is divided
or grouped.
• It influences the accuracy, reliability, and precision of
the results.
• It involves decisions like:
− The Target Population: Defining the group of individuals
from which the sample will be drawn.
– The Sampling Frame: The actual list or representation of
the population.
– Sample Size: How many individuals to include in the
sample.
– Sampling Method: Whether probability or non-probability
sampling methods will be used.

• Therefore, the researcher must select or prepare a sample
design that is reliable and appropriate for his/her
research study.
Components of Sampling Design
• Population: The complete set of items, datasets,
systems, or individuals being studied in the research.
– Example:
▪ If a researcher wants to study the eating habits of college students in
Ethiopia, the population would be all college students in Ethiopia.
▪ NLP: All Amharic-written text on the web when creating a corpus for
training language models.
▪ SE: All software development teams globally when researching
productivity factors in agile development.
• In research, there are two common types of
population:
▪ Finite Population: A population with a countable and limited
number of members.
– Example:
▪ All students enrolled in a specific university/college.
▪ Infinite Population: A population that is so large that it is
impractical to count all members, or it is theoretically
uncountable.
– Example:
▪ All stars in the universe.
• Sampling Frame: It is essentially the list or complete set
of elements (individuals, units, or objects) that are
eligible to be included in the sample. It should ideally
cover the entire population.
− Example: For studying employees in a company, the sampling frame
might be the company’s employee database.
• Sample Size: The number of individuals or units
selected from the population to form the sample.
− Example: If there are 10K employees in a company and the researcher
selects 500 employees for the study, the sample size is 500.
• Sampling Method: It refers to the specific technique
used to select individuals from the population.
− It can be broadly categorized into two types:
▪ Probability Sampling (each member has a known chance of being
selected).
▪ Non-Probability Sampling (not random).
• Sampling Errors and Bias: These are common
challenges in CS research, and minimizing them is
essential for valid results.
− Types of Errors:
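a) Sampling Error: The gap between a sample result and the true
population value that arises because only part of the population is
observed. It tends to be larger when the sample is small or does not
represent the population well.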
✓ For example, training a chatbot using customer service logs from a
single industry (e.g., banking) may not generalize to other domains
like retail.
b) Bias:
▪ Selection Bias: occurs when the sample selected for the study is not
representative of the population, leading to incorrect conclusions.
✓ Example: The researcher only uses high-resolution images from a
specific camera brand in the evaluation dataset.
▪ Data Bias: occurs when the data used for training, testing, or analysis
contains systematic errors or prejudices that skew the results.
✓ Example: The training data includes tweets predominantly from a
specific political group, with limited diversity in opinion.

• Summary:
− Selection Bias: Results from how the sample is chosen. The
sample is not representative of the population.
− Data Bias: Results from errors in the data itself. The data used in
training or analysis is systematically skewed in some way.
Sample Size and Representativeness
• Obviously, the sample has to be representative of the
population to make generalizations possible.
• But, how is it possible to ensure representativeness?
– One way of taking care of representativeness is to use the
procedure of randomization.
– According to the Law of Statistical Regularity, a reasonably
small sample can represent the population well if the subjects of
the sample are selected at random. The basic premise of sampling is:
“Randomize whenever possible”: Take cases at random, assign
them to groups at random if this is required, and give
experimental treatments at random in experimental research.
• What if randomization is impossible?
− Sometimes randomization is difficult to put into practice. In that
case, rely on the second law, the Law of Inertia of the Large
Sample, which follows as a consequence of the Law of Statistical
Regularity.
− The Law of Inertia of the Large Sample says that a large sample is
more stable and more representative than a small one. The
sampling error shrinks as the sample grows: for a simple random
sample, the standard error is inversely proportional to the square
root of the sample size.
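As a rough illustration of how error falls with sample size, the sketch below uses the standard error of the mean, sigma / sqrt(n), which applies to simple random sampling. The function name `standard_error` and the value of `sigma` are assumptions made for this example and are not part of the slides.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean for a simple random sample of size n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sigma / math.sqrt(n)

# Hypothetical population standard deviation, chosen only for illustration.
sigma = 10.0

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(sigma, n))
```

Because the error shrinks with the square root of n, each further gain in precision costs proportionally more observations.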
• There are general suggestions to be followed in making
decisions regarding sample size:
✓ If the population under study is homogeneous, a small sample is
sufficient. On the other hand, a much larger sample is necessary
if there is greater variability in the units of the population.
✓ A small sample is often satisfactory in an intensive laboratory
study in which greater precision is desired.
✓ The sample size may be adequate or inadequate depending on
how the sample is drawn. For example, a sample size of 27 may
be satisfactory in simple random sampling, but this may not be
adequate in stratified and cluster sampling because various strata
or cluster must be represented.
▪ As a general rule, it may be said that a researcher has to consider
the number of factors or independent variables and the number
of groups or categories in each factor to generally decide on
sample size.
− According to Draper and Smith, we may say that sample size (n)
is a function of factors (Xi) and categories (Ck) such that a
minimum of 10 observations is required for each category of a
factor.
$n = 10 \times (C_{f1} \times C_{f2} \times C_{f3} \times \dots \times C_{fn})$
Where:
▪ $n$ = sample size
▪ $C_{f1}$ = number of categories of factor 1
▪ $C_{f2}$ = number of categories of factor 2
▪ $C_{f3}$ = number of categories of factor 3
▪ $C_{fn}$ = number of categories of factor n
For example, if the researcher has two factors in the research
(say, sex and age) such that there are 2 categories (male and
female) in the first factor and 4 categories in the second factor
(say, grade 5, 7, 9 and 11), then the minimum sample size this
researcher has to draw = 10*2*4 = 80 students.
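The rule of thumb above (10 observations for every combination of factor categories) can be sketched as a small helper. The name `min_sample_size` and the `per_cell` parameter are assumptions introduced for this example.

```python
from math import prod

def min_sample_size(categories_per_factor, per_cell=10):
    """Minimum n = per_cell * product of the category counts of each factor."""
    if not categories_per_factor or any(c < 1 for c in categories_per_factor):
        raise ValueError("each factor needs at least one category")
    return per_cell * prod(categories_per_factor)

# Two factors from the example: sex (2 categories) and grade (4 categories).
print(min_sample_size([2, 4]))  # 80
```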
❖ If the sample is directly drawn from a known population, then the
minimum required sample size can be determined using
Yamane’s formula for sample size estimation from a single
population:
$n = \frac{N}{1 + N e^{2}}$
Where: n = sample size, N = population size, e = margin of error
(precision level), commonly set to 0.05.
− Example: If you have a population of 35,000 cases and a
margin of error of 5%, then applying this formula gives
a sample size of about 396 cases.
− Solution: $n = \frac{35000}{1 + 35000 \times 0.05^{2}} = \frac{35000}{88.5} \approx 395.5$, which rounds up to 396.
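A minimal sketch of Yamane's formula in code, assuming the result is rounded up to the next whole case. The function name `yamane_sample_size` is hypothetical.

```python
import math

def yamane_sample_size(population_size: int, margin_of_error: float = 0.05) -> int:
    """Yamane's formula: n = N / (1 + N * e^2), rounded up to a whole case."""
    if population_size <= 0 or margin_of_error <= 0:
        raise ValueError("population size and margin of error must be positive")
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# The worked example: N = 35,000 and e = 0.05.
print(yamane_sample_size(35_000, 0.05))  # 396
```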
❖ If the actual size of the population is not known, then sampling
from a single population can be estimated by the following
formula:
$n = \frac{p(1-p)Z^{2}}{E^{2}}$
Where:
n is the sample size,
p is the proportion of the population having the attribute of interest
(assumed to be 50% or 0.5 when unknown),
Z is the z-score for the chosen confidence level (e.g., 1.96 for 95%) &
E is the margin of error.
− Exercise: How many respondents (n) do you need when you plan
to conduct research on the efficiency and usability of a certain
mobile application if the margin of error is 5%?
− Ans: n ≈ 384 (taking p = 0.5 and Z = 1.96 for a 95% confidence level).
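The exercise can be checked with a short sketch of the unknown-population formula. It assumes a 95% confidence level and p = 0.5, neither of which is stated in the exercise itself, and the function name `unknown_population_sample_size` is hypothetical. The z-score is taken from the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

def unknown_population_sample_size(margin_of_error, confidence=0.95, p=0.5):
    """n = p(1 - p) * Z^2 / E^2, returned unrounded."""
    if not 0 < confidence < 1 or not 0 < p < 1 or margin_of_error <= 0:
        raise ValueError("invalid confidence, proportion, or margin of error")
    # Two-sided z-score for the confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p * (1 - p) * z ** 2 / margin_of_error ** 2

n = unknown_population_sample_size(0.05)
print(round(n))  # 384
```

The unrounded value is slightly above 384, so a researcher who prefers to round up conservatively would take 385 respondents instead.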
Types of Sampling Methods
• There are two major types of sampling methods:
probability and non-probability sampling.
– Probability sampling method is a kind of sample selection where
randomization is used instead of deliberate choice. Each member of
the population has a known, non-zero chance of being selected.
– Non-probability sampling methods are where the researcher
deliberately picks items or individuals for the sample based on non-
random factors such as convenience, geographic availability, or costs.
• Types of probability sampling - every member of the
population has a chance of being selected.
1) In simple random sampling, each individual has an equal
probability of being chosen, and each selection is independent
of the others.
✓ To conduct this type of sampling, you can use tools like random number
generators or other techniques that are based entirely on chance.
2) Systematic sampling involves selecting units or elements at
regular intervals from an ordered list of the population.
3) Stratified sampling divides the population into subgroups
(strata), and random samples are drawn from each stratum in
proportion to its size in the population.
▪ To use this sampling method, you divide the population into
subgroups (called strata) based on the relevant characteristic (e.g.,
gender identity, age range, income bracket, job role, etc).
▪ Based on the overall proportions of the population, you calculate
how many people should be sampled from each subgroup. Then you
use random or systematic sampling to select a sample from each
subgroup.
4) Cluster sampling also involves dividing the population into
subgroups (clusters), but each cluster should have similar
characteristics to the whole population.
▪ Instead of sampling individuals from each subgroup, you randomly
select entire subgroups.
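The four probability sampling methods above can be sketched with Python's standard `random` module. This is only an illustration under stated assumptions: the employee frame, the department and office labels, and all function names are hypothetical, and the stratified version uses simple proportional allocation with rounding.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every unit in the frame has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick every k-th unit after a random start in the first interval."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, n, rng):
    """Draw from each stratum in proportion to its share of the frame."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        # Rounding means the total can differ slightly from n in general.
        share = round(n * len(units) / len(frame))
        sample.extend(rng.sample(units, share))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

# Hypothetical sampling frame: 100 employees tagged with a department.
frame = [(i, "eng" if i < 60 else "ops" if i < 90 else "hr") for i in range(100)]
rng = random.Random(42)  # fixed seed so the sketch is repeatable

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda unit: unit[1], 10, rng)

# Hypothetical clusters: the same employees grouped into five offices of 20.
clusters = {f"office_{j}": frame[j * 20:(j + 1) * 20] for j in range(5)}
clus = cluster_sample(clusters, 2, rng)

print(len(srs), len(sys_sample), len(strat), len(clus))
```

With this frame the stratified sample contains 6 "eng", 3 "ops", and 1 "hr" employee, mirroring the 60/30/10 split of the population, while the cluster sample keeps all 40 employees of the two selected offices.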
[Figure: Stratified Sampling, with panels "Raw Data" and "Stratified Sample"]
• Types of non-probability sampling - individuals are
selected based on non-random criteria, and not every
individual has a chance of being included.
1) Convenience sampling includes the individuals who happen to
be most accessible to the researcher.
▪ This is an easy and inexpensive way to gather initial data, but there
is no way to tell if the sample is representative of the population, so
it can’t produce generalizable results.
▪ Convenience samples are at risk for both sampling bias and selection
bias.
2) Voluntary response sampling - Instead of the researcher
choosing participants and directly contacting them, people
volunteer themselves (e.g. by responding to a public online
survey).
▪ Voluntary response samples are always at least somewhat biased, as
some people are naturally more likely to volunteer than others,
leading to self-selection bias.
3) Purposive sampling (judgement sampling) is the process
whereby the researcher selects a sample based on experience
or knowledge of the group to be sampled.
▪ It is often used in qualitative research, where the researcher wants
to gain detailed knowledge about a specific phenomenon rather
than make statistical inferences, or where the population is very
small and specific.
4) Snowball sampling - if the population is hard to access,
snowball sampling can be used to recruit participants via other
participants. This process continues, with the sample
“snowballing” as more participants are recruited through
referrals.
▪ The downside here is also representativeness, as you have no way of
knowing how representative your sample is due to the reliance on
participants recruiting others. This can lead to sampling bias.
5) Quota sampling relies on the non-random selection of a
predetermined number or proportion of units. This is called a
quota.
▪ First divide the population into distinct groups (strata) based on
specific characteristics, such as age, gender, or income. Then set a
predetermined quota for each group, ensuring that the sample
reflects these proportions. Participants are selected until the quotas
for all groups are met, but the selection process within each group is
not random, often based on availability.
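Quota sampling as described above can be sketched as filling fixed quotas in the order people become available. The names `quota_sample`, `arrivals`, and the age groups are assumptions made for this example.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept people in arrival order until every group's quota is full."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:  # availability order, not a random draw
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas met
    return sample

# Hypothetical respondents in the order they were reached.
arrivals = [
    ("a", "under_30"), ("b", "under_30"), ("c", "30_plus"),
    ("d", "under_30"), ("e", "30_plus"), ("f", "30_plus"),
]
quotas = {"under_30": 2, "30_plus": 2}

picked = quota_sample(arrivals, lambda person: person[1], quotas)
print([name for name, _ in picked])  # ['a', 'b', 'c', 'e']
```

Respondent "d" is skipped because the under-30 quota is already full, which shows how the final sample depends on who happens to be reached first rather than on chance.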

❖ Each method has its strengths and weaknesses, and
researchers must carefully consider these factors to
ensure the sample is representative and generalizable
to the larger population.
Data Collection Methods
• Data Collection is the process by which the researcher
collects the information needed to answer the
research problem.
− The task of data collection begins after a research problem
has been defined and research design/plan chalked out.
• While deciding about the method of data collection to be
used for the study, the researcher should keep in mind
two types of data viz., primary and secondary.
− Primary data are those which are collected afresh and for
the first time, and thus happen to be original in character.
− Secondary data, on the other hand, are those which have
already been collected by someone else and which have
already been passed through the statistical process. Using them
involves less cost, time and effort.
• The researcher would have to decide which sort of
data s/he would be using (thus collecting) for his/her
study and accordingly, s/he would have to select one
or the other method of data collection.

• The methods of collecting primary and secondary data
differ since primary data are to be originally collected,
while in the case of secondary data, the nature of data
collection work is merely that of compilation.
Advantages and Disadvantages of Using
Primary Data
• Advantages of using primary data:
– The investigator collects data specific to the problem under study.
– There is no doubt about the quality of the data collected (for the
investigator).
• Disadvantages of using primary data:
– The investigator has to deal with all the hassles of data collection.
▪ Deciding why, what, how, when to collect.
▪ Getting the data collected (personally or through others).
▪ Getting funding and dealing with funding agencies.
▪ Ethical considerations (consent, etc.)
– Ensuring the data collected is of a high standard.
– Cost of obtaining the data is often the major expense in studies.
Advantages and Disadvantages of Using
Secondary Data
• Advantages of using secondary data:
– No hassles of data collection.
– It is less expensive.
– The investigator is not personally responsible for the quality of
data (“I didn’t do it”).
• Disadvantages of using secondary data:
– The investigator cannot decide what is to be collected.
– One can only hope that the data is of good quality.
– Obtaining additional data is not possible.
Primary Data Collection
• It involves the collection of original data directly from the
source or through direct interaction with the respondents.
– This method allows researchers to obtain firsthand information
specifically tailored to their research objectives.
• There are various techniques for primary data collection,
including:
a) Surveys and Questionnaires: Researchers design structured
questionnaires or surveys to collect data from individuals or
groups. These can be conducted through face-to-face
interviews, telephone calls, mail, or online platforms.
b) Interviews: Interviews involve direct interaction between the
researcher and the respondent. They can be conducted in person,
over the phone, or through video conferencing. They can be
structured (with predefined questions), semi-structured
(allowing flexibility), or unstructured (more conversational).
c) Observations: Researchers observe and record behaviors,
actions, or events in their natural setting. This method is useful
for gathering data on human behavior, interactions, or
phenomena without direct intervention.
d) Experiments: Experimental studies involve the manipulation
of variables to observe their impact on the outcome.
Researchers control the conditions and collect data to draw
conclusions about cause-and-effect relationships.
e) Focus Groups: Focus groups are a qualitative data collection
method that is used to explore the opinions, knowledge,
perceptions, and concerns of individuals in regard to a
particular topic. The focus group typically involves six to ten
individuals who have some knowledge or experience with the
topic.
Secondary Data Collection
• It involves using existing data that someone else collected,
usually for a purpose other than the present study.
Researchers analyze and interpret this data to extract
relevant information.
− Secondary data means data that are already available, i.e., data
which have already been collected and analyzed by someone
else.
• Secondary data can be obtained from various sources,
including:
a) Published Sources: Researchers refer to books, academic
journals, magazines, newspapers, government reports, and
other published materials that contain relevant data.
b) Online Databases: Numerous online databases provide access
to a wide range of secondary data, such as research
articles, statistical information, economic data, and social
surveys.
c) Government and Institutional Records: Government agencies,
research institutions, and organizations often maintain
databases or records that can be used for research
purposes.
d) Publicly Available Data: Data shared by individuals,
organizations, or communities on public platforms, websites,
or social media can be accessed and utilized for research.
e) Past Research Studies: Previous research studies and their
findings can serve as valuable secondary data sources.
Researchers can review and analyze the data to gain insights or
build upon existing knowledge.
• The researcher, before using secondary data, must
see that they possess the following characteristics:
1) Reliability of data: The reliability can be tested by finding out
such things about the said data:
✓ Who collected the data?
✓ What were the sources of data?
✓ Were they collected by using proper methods?
✓ At what time were they collected?
✓ Was there any bias of the compiler?
✓ What level of accuracy was desired? Was it achieved?
2) Suitability of data: The data that are suitable for one enquiry
may not necessarily be found suitable in another enquiry.
Hence, if the available data are found to be unsuitable, they
should not be used by the researcher.
▪ In this context, the researcher must carefully examine the
definition of various terms and units of collection used at the time
of collecting the data from the primary source originally. Similarly,
the nature, scope and object of the original enquiry must also be
studied. If the researcher finds differences in these, the data will
remain unsuitable for the present enquiry and should not be used.
3) Adequacy of data: If the level of accuracy achieved in data is
found inadequate for the purpose of the present enquiry, they
will be considered as inadequate and should not be used by the
researcher. The data will also be considered inadequate, if they
are related to an area which may be either narrower or wider
than the area of the present enquiry.
Selection of Appropriate Method for
Data Collection
• There are various methods of data collection. As such
the researcher must wisely select the method(s) for
her/his own study, keeping in view the following factors:
1) Nature, scope and object of enquiry: This constitutes the most
important factor affecting the choice of a particular method.
The method selected should be such that it suits the type of enquiry
that is to be conducted by the researcher. This factor is also
important in deciding whether the data already available
(secondary data) are to be used or the data not yet available
(primary data) are to be collected.
2) Availability of funds: Availability of funds for the research project
determines to a large extent the method to be used for the
collection of data. When funds at the disposal of the researcher are
very limited, s/he will have to select a comparatively cheaper
method which may not be as efficient and effective as some other
costly method. Finance, in fact, is a big constraint in practice and
the researcher has to act within this limitation.
3) Time factor: Availability of time has also to be taken into
account in deciding a particular method of data collection. Some
methods take relatively more time, whereas with others the
data can be collected in a comparatively shorter duration. The
time at the disposal of the researcher, thus, affects the selection
of the method by which the data are to be collected.
4) Precision required: Precision required is yet another important
factor to be considered at the time of selecting the method of
collection of data.
Benefits of Data Collection
• Collecting data offers several benefits, including:
– Knowledge and Insight
– Evidence-Based Decision Making
– Problem Identification and Solution
– Identifying Trends and Predictions
– Support for Research and Development
– Policy Development
– Quality Improvement
– Knowledge Sharing and Collaboration, etc.
Thank You!
