In-Class Exercise #1 Notes
SYSTEMATIC SAMPLE
Partition the N items in the frame into n groups of k items, where
k = N / n, rounded to the nearest integer.
To select a systematic sample, choose the first item at random from the
first k items in the frame.
Then select the remaining n - 1 items by taking every kth item
thereafter from the entire frame.
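The steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name systematic_sample and the seed parameter are assumptions made for the example, and the frame is assumed to be an ordinary list. Because k is rounded, the result can contain fewer than n items when N is not a multiple of n.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Return up to n items from frame, taking every kth item after a random start."""
    N = len(frame)
    if not 1 <= n <= N:
        raise ValueError("n must be between 1 and the frame size")
    k = max(1, round(N / n))      # group size, rounded to the nearest integer
    rng = random.Random(seed)     # seed is only here to make runs repeatable
    start = rng.randrange(k)      # random position among the first k items
    return frame[start::k][:n]    # every kth item thereafter, capped at n

# Hypothetical frame of 100 numbered items, sample of 10 (so k = 10)
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=42)
```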
STRATIFIED SAMPLE
Divide the population into two or more subgroups (known as strata)
according to some common characteristic.
A simple random sample is selected from each stratum, with sample sizes
proportional to strata sizes.
The samples from the strata are then combined into one sample.
Application: Population of voters
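A proportional stratified sample could be sketched as follows. The names stratified_sample and stratum_of, and the urban/rural voter data, are hypothetical and chosen only for illustration. Each stratum's share of the sample is rounded, so the combined sample size may differ slightly from n when the proportions do not divide evenly.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)   # group items by common characteristic
    N = len(population)
    sample = []
    for members in strata.values():
        size = round(n * len(members) / N)      # proportional allocation (rounded)
        sample.extend(rng.sample(members, min(size, len(members))))
    return sample                               # subgroup samples combined into one

# Hypothetical voter list: 60 urban and 40 rural voters
voters = [(i, "urban") for i in range(60)] + [(i, "rural") for i in range(60, 100)]
sample = stratified_sample(voters, lambda v: v[1], 10, seed=0)
```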
CLUSTER SAMPLE
Population is divided into several "clusters", each representative of the
population.
A simple random sample of clusters is selected.
All items in the selected clusters can be used, or items can be chosen from
within each selected cluster using another sampling technique.
Application: Election exit polls
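As a rough sketch of the procedure above, the following Python selects whole clusters at random and keeps every item in them. The function name cluster_sample and the precinct data are assumptions for the example, and clusters are assumed to be stored as a dictionary mapping a cluster name to its list of items.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Select a simple random sample of clusters and keep all items in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # random sample of cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical exit poll: each precinct is a cluster of voters
precincts = {
    "P1": ["Ana", "Ben"],
    "P2": ["Cho"],
    "P3": ["Dev", "Eli", "Fay"],
    "P4": ["Gus"],
}
selected = cluster_sample(precincts, 2, seed=3)
```

To sample within each selected cluster instead of using every item, a simple random sample could be drawn from each list in the returned dictionary.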
DATA CLEANING
Data Cleaning corrects defects and inconsistencies in the data and ensures
the data are of suitable quality for analysis.
Invalid Variable Values can be identified as incorrect by simple scanning
techniques, so long as operational definitions exist for the variables
the data represent.
Coding Errors can result from poor recording or entry of data values, or
from computerised operations such as copy-and-paste or data import.
Data Integration Errors arise when data from two different
computerised sources, such as two different data repositories, are
combined into one data set for analysis.
Missing Values are values that were not collected for a variable.
Outliers are values that seem excessively different from most of the
other values.
Nonresponse Error arises from failure to collect data on all items in the
sample and can result in nonresponse bias.
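Three of the checks above (missing values, invalid variable values, and outliers) can be sketched as a simple scan over one numeric variable. This is an illustration under stated assumptions: the function name scan_column is hypothetical, missing values are assumed to be stored as None, the operational definition is assumed to be a valid range from low to high, and the 1.5 x IQR rule is one common convention for flagging outliers, not a method given in these notes.

```python
import statistics

def scan_column(values, low, high):
    """Return the positions of missing, invalid, and outlying values in a list."""
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]
    # Invalid: outside the range allowed by the variable's operational definition
    invalid = [i for i, v in present if not low <= v <= high]
    valid = [(i, v) for i, v in present if low <= v <= high]
    # Outliers (assumed rule): more than 1.5 * IQR below Q1 or above Q3
    q1, _, q3 = statistics.quantiles([v for _, v in valid], n=4)
    iqr = q3 - q1
    outliers = [i for i, v in valid if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
    return {"missing": missing, "invalid": invalid, "outliers": outliers}

# Hypothetical variable whose valid range is 0 to 120
report = scan_column([10, 12, None, 11, 13, 12, 11, 95, -4], low=0, high=120)
```

A flagged outlier is not necessarily an error; the scan only identifies values that merit a closer look.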
ETHICAL ISSUES
Coverage error can result in selection bias and becomes an ethical issue if
particular groups or individuals are purposely excluded from the frame so that
the survey results are more favourable to the survey’s sponsor.
Nonresponse error can lead to nonresponse bias and becomes an ethical issue
if the sponsor knowingly designs the survey so that particular groups or
individuals are less likely than others to respond.