Lecture 2 - Sampling Techniques - Data Processing - and Data Collection Methods
By
Prof. Ruwan Jayathilaka
B.A, MEcon (Colombo), MSc (NUS), PhD (Griffith)
SLIIT : https://www.sliit.lk/faculty-of-business/staff/ruwan.j/
Google Scholar: https://scholar.google.com/citations?user=OggscHUAAAAJ&hl=en
Research Gate: https://www.researchgate.net/profile/Ruwan_Jayathilaka
Email: ruwan.j@sliit.lk
30th June 2023
Outline
• Sampling techniques
• Why sampling?
• Types of sampling
• Probability samples
• Non-probability samples
• How to select the sample size?
• Data Processing
• Why Data Preprocessing
• Data Quality and Important Steps
• Data collection methods
7/30/2023
Where are you now?
Research philosophy and approach → Research design → Data collection (sampling, secondary data, observation, interviews, questionnaire) → Data processing and analysis (quantitative and qualitative methods) → Conclusion and reports
Sample:
A subset of the population
Why sampling?
Target Population:
The population to be studied, i.e. the population to which the investigator wants to generalize the results
Sampling Unit:
The smallest unit from which a sample can be selected
Sampling frame:
A list of all the sampling units from which the sample is drawn
Sampling scheme:
The method of selecting sampling units from the sampling frame
Types of sampling
• Probability samples
• Non-probability samples
SAMPLING TECHNIQUES
PROBABILITY SAMPLING
AN EXAMPLE/1
• A study of job satisfaction in a company of 9,000 employees. Research funds allow you to interview 450 employees (sampling fraction 450/9,000 = 1/20).
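A 1/20 sampling fraction fits two standard probability designs: a systematic sample (a random start, then every 20th employee on the list) or a simple random sample of 450. The sketch below shows both. It assumes a hypothetical sampling frame of employee IDs 1 to 9,000; the function names are illustrative only.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample: random start within the first interval, then every k-th unit."""
    k = len(frame) // n                       # sampling interval: 9000 // 450 = 20
    start = random.Random(seed).randrange(k)  # random start between 0 and k - 1
    return frame[start::k][:n]

def simple_random_sample(frame, n, seed=None):
    """Simple random sample without replacement: every unit has the same chance."""
    return random.Random(seed).sample(frame, n)

# Hypothetical sampling frame: employee IDs 1..9000
frame = list(range(1, 9001))
sys_sample = systematic_sample(frame, 450, seed=1)
srs = simple_random_sample(frame, 450, seed=1)
print(len(sys_sample), len(srs))  # 450 450
```

Systematic sampling is simpler to carry out from a printed list. It can be biased if the list has a periodic pattern that matches the interval.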
AN EXAMPLE/2
⚫ Suppose a UK national study of job satisfaction in the largest UK firms needs to be carried out (5,000 interviews required), but only limited funds for travel are available.
⚫ How would you select your sample?
⚫ Cluster sampling: the first stage of sampling selects not a unit (e.g. the employee) but a group of units. For example, randomly select 10 firms from the UK FTSE 100 index, then randomly select 500 employees in each company (10 × 500 = 5,000 interviews).
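The two-stage design in this example (random firms first, then random employees within each selected firm) can be sketched as below. The firm and employee lists are hypothetical placeholders (100 firms of 1,000 employees each), not real FTSE data.

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly select whole clusters (firms).
    Stage 2: randomly select units (employees) within each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical frame: 100 firms, each with 1,000 employee IDs
firms = {f"firm_{i:03d}": [f"firm_{i:03d}_emp_{j}" for j in range(1000)] for i in range(100)}
sample = two_stage_cluster_sample(firms, n_clusters=10, n_per_cluster=500, seed=42)
total = sum(len(v) for v in sample.values())
print(len(sample), total)  # 10 5000
```

Interviewers only need to travel to 10 sites instead of up to 100. The price is usually lower precision than a simple random sample of the same size, because employees within a firm tend to resemble each other.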
Cluster sampling
• Useful, for example, for widely dispersed populations
SAMPLING TECHNIQUES
Snowball sampling
• The researcher makes initial contact with a small group
• These respondents introduce others in their network
e.g. Bryman’s (1999) sample of British visitors to Disney theme parks
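The snowball procedure can be sketched as recruitment over a referral network. Start with the initial contacts, then add the people each respondent introduces, wave by wave, until the target size is reached. The network, the names and the breadth-first order are hypothetical choices made for illustration.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Recruit the initial contacts, then follow each respondent's referrals
    wave by wave until target_size respondents are reached."""
    recruited, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        recruited.append(person)
        for contact in referrals.get(person, []):
            if contact not in seen:  # never recruit the same person twice
                seen.add(contact)
                queue.append(contact)
    return recruited

# Hypothetical referral network: whom each respondent introduces
referrals = {"A": ["C", "D"], "B": ["D", "E"], "C": ["F"], "E": ["G", "H"]}
print(snowball_sample(referrals, seeds=["A", "B"], target_size=6))
# ['A', 'B', 'C', 'D', 'E', 'F']
```

Only people reachable from the initial contacts can ever enter the sample. This is why snowball samples are non-probability samples.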
Types of non-probability sampling: Summary
• Quota: select subjects as they come until the quota set for each group is filled
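Here "as they come" is read as accepting respondents in arrival order, each one only while the quota for their group is still open. The groups, quota sizes and arrival stream below are hypothetical.

```python
def quota_sample(arrivals, quotas):
    """Take respondents in the order they come, accepting each one only
    while the quota for their group is still open."""
    filled = {group: [] for group in quotas}
    for person, group in arrivals:
        if group in quotas and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == q for g, q in quotas.items()):
            break  # every quota is full: stop recruiting
    return filled

# Hypothetical stream of passers-by (id, group) and a quota of 2 per group
arrivals = [(1, "male"), (2, "male"), (3, "female"), (4, "male"), (5, "female"), (6, "female")]
print(quota_sample(arrivals, {"male": 2, "female": 2}))
# {'male': [1, 2], 'female': [3, 5]}
```

The group proportions are controlled, but who ends up in each group depends on who happens to arrive first. That is the source of the non-probability bias listed below.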
SAMPLING BIAS
• A biased sample does not represent the population
• Some groups in the population are over-represented; others are under-represented
• Sources of bias
• non-probability sampling
• inadequate sampling frame
• non-response
SAMPLING: LIMITS TO GENERALISATION
• In all cases, findings can only be generalised to the population from which the sample was selected
• Time, historical events and cohort effects also
limit generalisation
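Sample size
Quantitative variable (estimating a mean): $n = \dfrac{Z^2 \sigma^2}{D^2}$
Qualitative variable (estimating a proportion): $n = \dfrac{Z^2 \pi (1 - \pi)}{D^2}$
where Z is the standard normal value for the chosen confidence level (e.g. 1.96 for 95%, 2.58 for 99%), σ is the standard deviation, π is the expected proportion, and D is the maximum acceptable error (precision).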
Problem 1
A study is to be performed to determine a certain parameter in a community. From a previous study, a standard deviation (SD) of 46 was obtained.
If a sampling error of up to 4 is to be accepted, how many subjects should be included in this study at the 99% level of confidence?
Answer
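Problem 1 is a direct application of the formula for a quantitative variable, with σ = 46, D = 4 and 99% confidence. The sketch below computes it, rounding n up to the next whole subject. Z is taken from Python's standard normal distribution rather than read from a printed table; that is a choice made here, not something the slide specifies.

```python
import math
from statistics import NormalDist

def sample_size_mean(sd, error, confidence):
    """n = Z^2 * sigma^2 / D^2 for estimating a mean, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided Z, about 2.576 for 99%
    return math.ceil(z**2 * sd**2 / error**2)

print(sample_size_mean(sd=46, error=4, confidence=0.99))  # 878
```

If Z is first rounded to 2.58, as in many printed tables, the result is about 880.3, which rounds up to 881. Small differences between textbook answers usually come from how Z is rounded.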
Problem 2
It was desired to estimate the proportion of anaemic children in a certain preparatory school. In a similar study at another school, a proportion of 30% was detected.
Compute the minimal sample size required at a confidence level of 95%, accepting a difference of up to 4% from the true population proportion.
Answer
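Problem 2 uses the formula for a qualitative variable, with π = 0.30, D = 0.04 and 95% confidence. As in Problem 1, the sketch below assumes that Z comes from the standard normal distribution and that n is rounded up.

```python
import math
from statistics import NormalDist

def sample_size_proportion(p, error, confidence):
    """n = Z^2 * p(1 - p) / D^2 for estimating a proportion, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z**2 * p * (1 - p) / error**2)

print(sample_size_proportion(p=0.30, error=0.04, confidence=0.95))  # 505
```

With Z = 1.96 the unrounded value is about 504.2, so the minimal sample is again 505 children.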
[Figure: precision vs. cost]
Data Processing
• Why Data Preprocessing
• Data Quality
• Important Steps
Introduction
• The data, after collection, has to be prepared for analysis.
• Collected data is raw, and it must be converted to a form that is suitable for the required analysis.
• The results of the analysis are strongly affected by the form of the data.
• So, proper data preparation is a must to get reliable results.
Every row is a single case
Data Processing
Types of Data
There are four types (measurement scales) of data:
• Nominal
• Examples: ID numbers, eye color, zip codes
• Ordinal
• Examples: rankings (e.g., taste of potato chips on a
scale from 1-10), grades, height in {tall, medium,
short}
• Interval
• Examples: calendar dates, temperatures in Celsius or Fahrenheit
• Ratio
• Examples: temperature in Kelvin, length, time, counts
Data Quality
• What kinds of data quality problems?
• How can we detect problems with the data?
• What can we do about these problems?
Noise
• Noise refers to modification of original values
• Examples: distortion of a person’s voice when talking on a poor phone line, and “snow” on a television screen
Outliers
• Outliers are data objects with characteristics that are considerably different from most of the other data objects in the data set
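The slide does not name a detection rule. One common rule is Tukey's fences, which flag values more than 1.5 × IQR below the first quartile or above the third quartile. The sketch below uses that rule as a swapped-in technique. The ages are hypothetical and echo the "Age = 110" example in the Data Cleaning slide.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

ages = [21, 23, 25, 22, 24, 26, 23, 110]  # hypothetical ages; 110 looks like an entry error
print(iqr_outliers(ages))  # [110]
```

A flagged value is a candidate for checking against the questionnaire, not automatically an error.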
Missing Values
• Reasons for missing values
• Information is not collected
(e.g., people decline to give their age, weight and
income)
• Attributes may not be applicable to all cases
(e.g., annual income is not applicable to children)
• Handling missing values
• Eliminate Data Objects
• Estimate Missing Values
• Ignore the Missing Value During Analysis
• Replace with all possible values (weighted by their
probabilities)
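The first three handling options can be contrasted on a tiny data set. The records below are hypothetical, and `None` marks a missing value. "Estimate" is shown here as mean imputation, which is only one of several possible estimation methods.

```python
from statistics import mean

# Hypothetical survey records; None marks a missing value
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},  # respondent declined to give age
    {"id": 3, "age": 29, "income": None},     # respondent declined to give income
    {"id": 4, "age": 41, "income": 61000},
]

# 1) Eliminate data objects: keep only complete records (listwise deletion)
complete = [r for r in records if None not in r.values()]

# 2) Estimate missing values: here, replace a missing age with the mean observed age
age_mean = mean(r["age"] for r in records if r["age"] is not None)
imputed = [{**r, "age": r["age"] if r["age"] is not None else age_mean} for r in records]

# 3) Ignore the missing value during analysis: use only the values that are present
mean_income = mean(r["income"] for r in records if r["income"] is not None)

print(len(complete), imputed[1]["age"], mean_income)
```

Deletion is simple but discards half of this small sample. Imputation keeps every record but understates the variability of the imputed variable.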
Duplicate Data
• Data set may include data objects that are duplicates,
or almost duplicates of one another
• Major issue when merging data from heterogeneous sources
• Examples:
• Same person with multiple email addresses
• Data cleaning
• Process of dealing with duplicate data issues
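A minimal sketch of duplicate removal follows. Records are matched on a normalised key, not on the email address, because the same person may use several addresses. The choice of name plus date of birth as the key is a hypothetical one; real studies need a key suited to their data.

```python
def deduplicate(records, key_fields):
    """Keep the first record for each normalised key; later near-duplicates are dropped."""
    seen, unique = set(), []
    for r in records:
        # Normalise case and surrounding whitespace so 'Ann Perera ' and 'ann perera' match
        key = tuple(str(r[f]).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

# Hypothetical merged list: the same person appears with two email addresses
people = [
    {"name": "Ann Perera", "dob": "1990-04-12", "email": "ann@work.example"},
    {"name": "ann perera ", "dob": "1990-04-12", "email": "ann.p@home.example"},
    {"name": "Kamal Silva", "dob": "1985-11-02", "email": "kamal@work.example"},
]
print(len(deduplicate(people, ["name", "dob"])))  # 2
```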
Important Steps
Questionnaire checking → Editing → Coding → Classification → Tabulation → Graphical representation
Questionnaire Checking
When the data is collected through questionnaires, the first step of the data preparation process is to check whether the questionnaires are acceptable.
Questionnaire Checking
• A questionnaire returned from the field may be
unacceptable for several reasons.
• Parts of the questionnaire may be incomplete: inadequate answers, or no responses to specific questions.
• The pattern of responses may indicate that the
respondent did not understand or follow the
instructions.
• The responses show little variance.
• One or more pages are missing.
• The questionnaire is answered by someone who does
not qualify for participation.
• Fictitious interviews, inconsistencies, illegible responses, yea- or nay-saying patterns, middle-of-the-road patterns
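Responses that "show little variance" (yea-saying, nay-saying or middle-of-the-road patterns) can be screened automatically across a battery of rating items. The sketch below flags respondents whose answers have a population standard deviation below a cut-off. The 0.5 threshold and the six-item 1 to 5 scale are hypothetical choices.

```python
from statistics import pstdev

def flag_low_variance(responses, min_sd=0.5):
    """Return IDs whose answers to a battery of rating items barely vary."""
    return [rid for rid, answers in responses.items() if pstdev(answers) < min_sd]

# Hypothetical answers to six 1-5 rating items
responses = {
    "R01": [4, 2, 5, 3, 4, 1],
    "R02": [5, 5, 5, 5, 5, 5],  # yea-saying pattern
    "R03": [3, 3, 3, 3, 3, 3],  # middle-of-the-road pattern
}
print(flag_low_variance(responses))  # ['R02', 'R03']
```

A flag only marks a questionnaire for the editor to review. Some respondents genuinely hold uniform views.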
Editing
• Editing of data is the process of examining the collected raw data (especially in surveys) to detect errors and omissions, and to correct them when possible.
• Data must be inspected for completeness and consistency.
• E.g. a respondent may not answer the questions on marriage, age or income…
• But in other questions, the respondent answers that he/she has been married for 10 years and has 3 children
• What, then, is the respondent's age? Income?
Editing (contd.)
• Three basic approaches for editing:
- Go back to the respondents for clarification
- Infer from other responses
- Discard the response altogether
Editing (contd.)
Treatment of Unsatisfactory Responses
Editing (contd.)
• Returning to the Field – The questionnaires with
unsatisfactory responses may be returned to the
field, where the interviewers recontact the
respondents.
• Assigning Missing Values – If returning the
questionnaires to the field is not feasible, the editor
may assign missing values to unsatisfactory
responses.
• Discarding Unsatisfactory Respondents – In this approach, the respondents with unsatisfactory responses are simply discarded.
Coding
• Coding refers to the process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes.
• It is a process of translating information gathered from questionnaires or other sources into something that can be analyzed.
• It involves assigning a value to the information given; often the value is given a label.
• Coding can make data more consistent:
• Example: Question = Sex
• Answers = Male, Female, M, or F
• Coding will avoid such inconsistencies
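The "Sex" example can be sketched as a small codebook. Every raw variant (Male, Female, M, F, in any case) maps to one numeric code, and a separate label table records what each code means. The codes 1 = Male and 2 = Female are a hypothetical choice; any scheme works as long as it is documented.

```python
# Hypothetical codebook for the question "Sex": every answer variant maps to one code
SEX_CODES = {"male": 1, "m": 1, "female": 2, "f": 2}
SEX_LABELS = {1: "Male", 2: "Female"}  # value labels make the meaning of each code explicit

def code_sex(answer):
    """Return the numeric code for a raw answer, or None if it cannot be coded."""
    return SEX_CODES.get(answer.strip().lower())

raw = ["Male", "F", "female", " M", "unknown"]
codes = [code_sex(a) for a in raw]
print(codes)  # [1, 2, 2, 1, None]
```

Answers that cannot be coded come back as `None`, so they surface during editing instead of silently becoming a new category.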
Coding : Systems
• Common coding systems (code and label) for
dichotomous variables:
• 0=No 1=Yes
(1 = value assigned, Yes= label of value)
• OR: 1=No 2=Yes
• When you assign a value you must also make it clear what
that value means
Coding: Tip
Classification
• Classification of data is the process of arranging data in groups or classes on the basis of common characteristics.
Classification (cont.)
• Attributes: characteristics for which only their presence or absence in individual items can be noted.
Tabulation
Graphical Representation
Data Cleaning
• Checking the data for consistency and treating missing values.
• One of the first steps in analyzing data is to “clean” it
of any obvious data entry errors:
• Outliers? (really high or low numbers)
Example: Age = 110 (really 10 or 11?)
• Value entered that doesn’t exist for variable?
Example: 2 entered where 1=male, 0=female
• Missing values?
Did the person not give an answer? Was answer
accidentally not entered into the database?
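The three checks on this slide (out-of-range values, codes that do not exist for the variable, and missing values) can be sketched as a per-record validator. The 0 to 100 age range and the records are hypothetical; the sex codes follow the slide's example (1 = male, 0 = female).

```python
# Hypothetical validation rules; real ranges and codes must come from the study's codebook
VALID_SEX = {0, 1}    # 1 = male, 0 = female, as in the slide's example
AGE_RANGE = (0, 100)  # plausible ages for this hypothetical survey

def check_record(rec):
    """Return a list of problems found in one data-entry record."""
    problems = []
    age, sex = rec.get("age"), rec.get("sex")
    if age is None:
        problems.append("age missing")
    elif not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        problems.append(f"age out of range: {age}")
    if sex is None:
        problems.append("sex missing")
    elif sex not in VALID_SEX:
        problems.append(f"invalid sex code: {sex}")
    return problems

rows = [{"id": 1, "age": 110, "sex": 1}, {"id": 2, "age": 35, "sex": 2}, {"id": 3, "age": None, "sex": 0}]
for r in rows:
    print(r["id"], check_record(r))
```

Each flagged record still needs a human decision, for example going back to the paper questionnaire to see whether "110" was really "10" or "11".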
Data Adjusting
Data adjusting is not always necessary, but it can sometimes improve the quality of the analysis.
Variable Respecification
• Variable respecification involves the transformation of
data to create new variables or modify existing
variables.
• E.g., the researcher may create new variables that are composites of several other variables.
• Dummy variables are used for respecifying categorical variables. The general rule is that to respecify a categorical variable with K categories, K-1 dummy variables are needed.
Usage category    Original code    Dummy 1    Dummy 2    Dummy 3
Nonusers                1             1          0          0
Light users             2             0          1          0
Medium users            3             0          0          1
Heavy users             4             0          0          0
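The table's K-1 rule (four usage categories, three dummies, with heavy users as the all-zero reference category) can be reproduced with a short function. The function name and the default reference argument are illustrative.

```python
CATEGORIES = ["Nonusers", "Light users", "Medium users", "Heavy users"]

def dummy_code(category, categories=CATEGORIES, reference="Heavy users"):
    """Return K-1 dummy variables; the reference category is coded as all zeros."""
    return [1 if category == c else 0 for c in categories if c != reference]

for c in CATEGORIES:
    print(c, dummy_code(c))
```

Dropping one category avoids perfect collinearity in a regression: if all four dummies were included they would always sum to 1. Each remaining dummy's coefficient is then read relative to the reference group.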
Standardization: $Z_i = (X_i - \bar{X}) / s_x$, where $\bar{X}$ is the mean and $s_x$ the standard deviation of X
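A minimal sketch of the standardization formula follows. It assumes that $s_x$ is the sample standard deviation (n - 1 denominator); the data values are hypothetical. After the transformation, the variable has mean 0 and standard deviation 1.

```python
from statistics import mean, stdev

def standardize(xs):
    """Z_i = (X_i - mean) / s_x, using the sample standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print([round(v, 2) for v in z])
```

Standardized scores let variables measured on different scales (e.g. income and age) be compared or combined.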
            Qualitative                                  Quantitative
Sampling    Open-ended and less structured protocols     Random sampling
            (flexible)
Tools       Depend on interactive interviews             Structured data collection instruments
Results     Produce results that give meaning,           Produce results that generalize,
            experience and views                         compare and summarize
1) Document Review
2) Observation
3) Interview (face-to-face)
4) Focus Group Discussion
Document Review
• A qualitative research project may require review of
documents such as:
• Course syllabi
• Faculty journals
• Meeting minutes
• Strategic plans
• Newspapers
Observation
• Observation is a technique that involves systematically
selecting, watching and recording behaviour and
characteristics of living beings, objects or phenomena.
• Without training, our observations will heavily reflect our
personal choices of what to focus on and what to
remember.
• You need to heighten your sensitivity to details that you
would normally ignore and at the same time to be able to
focus on phenomena of true interest to your study.
Observation : Preparation
1) Determine the purpose of the observation activity as
related to the overall research objectives
2) Determine the population(s) to be observed
3) Consider the accessibility of the population(s) and
the venues in which you would like to observe them
4) Investigate possible sites for participant observation
5) Select the site(s), time(s) of day, and date(s), and
anticipate how long you will collect participant
observation data on each occasion
6) Decide how field staff will divide up or pair off to
cover all sites most effectively
Interview
• An Interview is a data-collection (generation)
technique that involves oral questioning of
respondents.
• Answers to the questions posed during an interview
can be recorded by writing them down or by tape-
recording the responses, or by a combination of
both.
• An interview can take half an hour and may extend over several hours; repeat interviews are possible
• Organising the interview (structure)
• Relationship with respondents
Interview - Skills
• Think about the motivations of interviewees and their
implications
• Listen more than you speak
• Build trust: know about the company/organisation, telephone and then send a letter, use appropriate language (student/researcher, interview/discussion), show interest and enthusiasm
• Ask straightforward questions
• Consider the location of the interview
• Begin with the general (things people know - build
confidence)
• Keep to time
Example:
• A district health officer had noticed that there were an
unusually large number of cases of malnutrition of
children under 5 reported from one area in her
district. Because she had little idea of why there might
be more malnutrition in that area, she decided to organise three focus group discussions (FGDs): one with leaders, one with mothers and one with health staff from the area. She hoped to identify potential causes of the problem through the FGDs and then develop a more intensive study, if necessary.