Unit 4
Field work and tabulation of Data, Processing and analysis of data, Hypothesis – Formulation:
Types of Errors in Formulation of Hypothesis, Null and Alternate Hypothesis; Tests: t-test, Chi-Square, F-test and Z-test, Analysis of Variance.
Selection of Fieldworkers
The first step in the fieldwork process is the selection of fieldworkers. The researcher should (1)
develop job specifications for the project, taking into account the mode of data collection, (2)
decide what characteristics the fieldworkers should have, and (3) recruit appropriate individuals.
Training of Fieldworkers
Training may be conducted in person at a central location or, if the interviewers are
geographically dispersed, by mail, by videoconferencing, or by using the Internet. Training
ensures that all interviewers administer the questionnaire in the same manner so that the data can
be collected uniformly. Training should cover making the initial contact, asking the questions,
probing, recording the answers, and terminating the interview.
Supervision of Fieldworkers
Supervision of fieldworkers means making sure that they are following the procedures and
techniques in which they were trained. Supervision involves quality control and editing,
sampling control, control of cheating, and central office control.
Validation of Fieldwork
Validation of fieldwork means verifying that the fieldworkers are submitting authentic
interviews. To validate the study, the supervisors call 10 to 25 percent of the respondents to
inquire whether the fieldworkers actually conducted the interviews. The supervisors ask about
the length and quality of the interview, reaction to the interviewer, and basic demographic data.
The demographic information is cross-checked against the information reported by the
interviewers on the questionnaires.
Evaluation of Fieldworkers
It is important to evaluate fieldworkers to provide them with feedback on their performance as
well as to identify the better fieldworkers and build a better, high-quality field force. The
evaluation criteria should be clearly communicated to the fieldworkers during their training. The
evaluation of fieldworkers should be based on the criteria of cost and time, response rates,
quality of interviewing, and quality of data.
Data preparation includes editing, coding, and data entry and is the activity that ensures the
accuracy of the data and their conversion from raw form to reduced and classified forms that are
more appropriate for analysis.
Editing - The process of checking the completeness, consistency, and legibility of data and
making the data ready for coding and transfer to storage. Editing detects errors and omissions,
corrects them when possible, and certifies that maximum data quality standards are achieved.
The editor’s purpose is to guarantee that data are:
• Accurate.
• Consistent with the intent of the question and other information in the survey.
• Uniformly entered.
• Complete.
• Arranged to simplify coding and tabulation.
Field Editing - Preliminary editing by a field supervisor on the same day as the interview to
catch technical omissions, check legibility of handwriting, and clarify responses that are logically
or conceptually inconsistent. The field supervisor also validates field results by reinterviewing
some percentage of the respondents on some questions to verify that they have participated. Field
editing is used to:
1. Identify technical omissions such as a blank page on an interview form.
2. Check legibility of handwriting for open-ended responses.
3. Clarify responses that are logically or conceptually inconsistent.
In-house editing - A rigorous editing job performed by a centralized office staff.
Coding involves assigning numbers or other symbols to answers so that the responses can be
grouped into a limited number of categories. In coding, categories are the partitions of a data set
of a given variable (e.g., if the variable is gender, the partitions are male and female).
Categorization is the process of using rules to partition a body of data. Both closed- and open-
response questions must be coded.
A codebook contains each variable in the study and specifies the application of coding rules to
the variable. It is used by the researcher or research staff to promote more accurate and more
efficient data entry. It is the definitive source for locating the positions of variables in the data
file during analysis. Precoding means assigning codebook codes to variables in a study and
recording them on the questionnaire. It is helpful for manual data entry because it makes the step
of completing a data entry coding sheet unnecessary. With a precoded instrument, the codes for
variable categories are accessible directly from the questionnaire. The coding should be
exhaustive, appropriate to the research problem, mutually exclusive and derived from one
classification principle.
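A codebook can be sketched as a simple lookup structure. The variables, categories, and numeric codes below are illustrative assumptions, not from any actual study; note how the reserved code for non-response keeps the scheme exhaustive while the category codes stay mutually exclusive.

```python
# A minimal codebook sketch: hypothetical variables and codes
# (variable names and categories here are illustrative assumptions).
MISSING = 9  # reserved code for non-response or unreadable answers

codebook = {
    "gender": {"male": 1, "female": 2},
    "satisfaction": {"low": 1, "medium": 2, "high": 3},
}

def code_response(variable, answer):
    """Translate a raw answer into its numeric code; MISSING flags non-response."""
    return codebook[variable].get(answer.strip().lower(), MISSING)

raw = [("gender", "Female"), ("satisfaction", "high"), ("gender", "")]
coded = [code_response(var, ans) for var, ans in raw]
print(coded)  # [2, 3, 9]
```

In practice the codebook would also record each variable's position in the data file, which is what makes it the definitive source for locating variables during analysis.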
Data Entry - Data entry converts information gathered by secondary or primary method to a
medium for viewing and manipulation. Keyboarding remains a mainstay for researchers who
need to create a data file immediately and store it in a minimal space on a variety of media.
However, researchers have benefitted from more efficient ways of speeding up the research
process, especially from bar coding and optical character and mark recognition.
Tabulation refers to the orderly arrangement of data in a table or other summary format. When
this tabulation process is done by hand, the term tallying is used. Counting the different ways
respondents answered a question and arranging them in a simple tabular form yields a frequency
table. The actual number of responses to each category is a variable’s frequency distribution.
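Counting responses into a frequency table can be done directly with a counter. The responses below are made-up answers to a single hypothetical question:

```python
from collections import Counter

# Hypothetical survey answers to one question (illustrative data).
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes"]

frequency = Counter(responses)   # the variable's frequency distribution
total = sum(frequency.values())

# A simple frequency table: category, count, and percentage of responses.
for category, count in frequency.most_common():
    print(f"{category:10s} {count:3d} {100 * count / total:6.1f}%")
```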
Cross-tabulation - The appropriate technique for addressing research questions involving relationships among multiple less-than-interval variables; it results in a combined frequency table displaying one variable in rows and another in columns.
Measures of Central Tendency - Central tendency can be measured in three ways - the mean,
median, or mode - each of which has a different meaning.
Mean - The mean is simply the arithmetic average, and it is perhaps the most common measure
of central tendency.
Median - The next measure of central tendency, the median, is the midpoint of the distribution,
or the 50th percentile. In other words, the median is the value below which half the values in the
sample fall, and above which half of the values fall.
Mode - In statistics the mode is the measure of central tendency that identifies the value that
occurs most often.
Measures of Dispersion - Another way to summarize the data is to calculate the dispersion of
the data, or how the observations vary from the mean.
The Range - The simplest measure of dispersion is the range. It is the distance between the
smallest and the largest values of a frequency distribution.
Variance - A measure of variability or dispersion. Its square root is the standard deviation.
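The measures of central tendency and dispersion above can be computed with Python's standard statistics module; the data set here is illustrative:

```python
import statistics

# Illustrative sample data.
data = [4, 8, 6, 5, 3, 8, 9, 4, 8]

mean = statistics.mean(data)       # arithmetic average
median = statistics.median(data)   # midpoint of the distribution (50th percentile)
mode = statistics.mode(data)       # value that occurs most often
rng = max(data) - min(data)        # range: largest minus smallest value
var = statistics.variance(data)    # sample variance (divides by n - 1)
sd = statistics.stdev(data)        # standard deviation = square root of variance

print(mean, median, mode, rng)
```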
A hypothesis often states that one concept is related to another. Therefore, the concepts listed in
the hypotheses must have operational measures if the research is to be performed. In research, a
hypothesis serves several important functions:
• It guides the direction of the study.
• It identifies facts that are relevant and those that are not.
• It suggests which form of research design is likely to be most appropriate.
Types of Hypotheses
Descriptive Hypotheses - They state the existence, size, form, or distribution of some variable.
Correlational hypotheses state that the variables occur together in some specified manner
without implying that one causes the other.
With explanatory (causal) hypotheses, there is an implication that the existence of or a change in one variable causes or leads to a change in the other variable.
Null and Alternative Hypothesis – The null hypothesis (H0) is used for testing. It is a
statement that no difference exists between the parameter (a measure taken by a census of the
population or a prior measurement of a sample of the population) and the statistic being
compared to it (a measure from a recently drawn sample of the population).
A null hypothesis is a statement of the status quo, one of no difference or no effect. If the null
hypothesis is not rejected, no changes will be made. An alternative hypothesis (H1) is one in
which some difference or effect is expected. Accepting the alternative hypothesis will lead to
changes in opinions or actions. Thus, the alternative hypothesis is the opposite of the null
hypothesis.
Following the classical statistics approach, we accept or reject a hypothesis on the basis of sampling
information alone. Since any sample will almost surely vary somewhat from its population, we
must judge whether the differences are statistically significant or insignificant. A difference has
statistical significance if there is good reason to believe the difference does not represent
random sampling fluctuations only.
A significance level is a critical probability associated with a statistical hypothesis test that
indicates how likely it is that an inference supporting a difference between an observed value and
some statistical expectation is true. The term p-value stands for probability-value and is
essentially another name for an observed or computed significance level.
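The link between the p-value and the significance level is just a decision rule, sketched below with illustrative numbers:

```python
# Decision rule linking the p-value to the significance level
# (both numbers are illustrative assumptions).
alpha = 0.05    # significance level chosen before the test
p_value = 0.03  # observed (computed) significance level from the sample

# Reject the null hypothesis when the p-value falls below alpha.
reject_h0 = p_value < alpha
print(reject_h0)  # True
```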
The terms parametric statistics and nonparametric statistics refer to the two major groupings
of statistical procedures. The major distinction between them lies in the underlying assumptions
about the data to be analyzed.
Parametric statistics - Involve numbers with known, continuous distributions; when the data
are interval or ratio scaled and the sample size is large, parametric statistical procedures are
appropriate.
Nonparametric statistics - Appropriate when the variables being analysed do not conform to
any known or continuous distribution.
The t -test - A univariate t-test is appropriate for testing hypotheses involving some observed
mean against some specified value. The t-distribution, like the standardized normal curve, is a symmetrical, bell-shaped distribution with a mean of 0, but with somewhat thicker tails. When the sample size (n) is larger than 30, the t-distribution and Z-distribution are almost identical. Although the t-test is strictly appropriate for tests involving small sample sizes (less than 30) with unknown standard deviations, researchers commonly apply the t-test for comparisons involving the mean of an interval or ratio measure. The precise height and shape of the t-distribution vary with
sample size. More specifically, the shape of the t-distribution is influenced by its degrees of
freedom (df).
The degrees of freedom are determined by the number of distinct calculations that are possible
given a set of information. In the case of a univariate t-test, the degrees of freedom are equal to
the sample size (n) minus one. The calculation of t closely resembles the calculation of the Z-
value. To calculate t, use the formula t = (X̄ - µ)/(s/√n), where X̄ = sample mean, µ = hypothesized population mean, s = sample standard deviation, and n = sample size.
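A one-sample t statistic can be computed directly from this formula. The sample data and hypothesized mean below are illustrative assumptions:

```python
import math

# One-sample t-test sketch: does the sample mean differ from a hypothesized
# population mean mu0? (data and mu0 are illustrative assumptions).
sample = [12.1, 11.6, 12.4, 12.0, 11.8, 12.3, 11.9, 12.2]
mu0 = 12.0

n = len(sample)
xbar = sum(sample) / n
# Sample standard deviation (divides by n - 1).
s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
# t = (sample mean - hypothesized mean) / standard error of the mean.
t = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1  # degrees of freedom for a univariate t-test

print(f"t = {t:.3f} with {df} degrees of freedom")
```

The computed t is then compared with the table value of t for n - 1 degrees of freedom at the chosen significance level.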
Z - test
z-test is based on the normal probability distribution and is used for judging the significance of
several statistical measures, particularly the mean. The Z-distribution and the t-distribution are
very similar, and thus the Z-test and t-test will provide much the same result in most situations.
However, when the population standard deviation (σ) is known, the Z-test is most appropriate. The Z-test is also used when the sample size is greater than 30. The test statistic is Z = (X̄ - µ)/S X̄, where X̄ = sample mean, µ = population mean, and S X̄ = standard error = σ/√n, where σ = population standard deviation and n = sample size.
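The Z statistic follows the same pattern; the numbers below are illustrative assumptions for a case where σ is known and n > 30:

```python
import math

# Z-test sketch for a single mean when sigma is known (illustrative numbers).
xbar = 52.0   # sample mean
mu = 50.0     # hypothesized population mean
sigma = 8.0   # known population standard deviation
n = 64        # sample size (> 30)

standard_error = sigma / math.sqrt(n)   # sigma / sqrt(n)
z = (xbar - mu) / standard_error

# Compare |z| with the critical value 1.96 for a 5% two-tailed test.
reject_h0 = abs(z) > 1.96
print(z, reject_h0)  # 2.0 True
```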
As a test of independence, χ2 test enables us to explain whether or not two attributes are
associated. In such a situation, we proceed with the null hypothesis that the two attributes are
independent. On this basis we first calculate the expected frequencies and then work out the
value of χ2. If the calculated value of χ2 is less than the table value at a certain level of significance for the given degrees of freedom, we conclude that the null hypothesis stands, which means that the two attributes are independent (not associated).
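The chi-square procedure described above (expected frequencies under independence, then the χ2 statistic) can be sketched for a 2x2 table; the observed counts are illustrative assumptions:

```python
# Chi-square test of independence on a 2x2 table (illustrative counts).
observed = [
    [30, 20],   # rows: categories of the first attribute
    [20, 30],   # columns: categories of the second attribute
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: (row total * column total) / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
# Compare chi_square with the table value (3.841 at the 5% level for df = 1).
print(chi_square, df)  # 4.0 1
```

Here χ2 = 4.0 exceeds the table value 3.841, so the null hypothesis of independence would be rejected at the 5% level.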
Analysis of variance (ANOVA) - A statistical technique for examining the differences among
means for two or more populations. The null hypothesis, typically, is that all means are equal.
In its simplest form, analysis of variance must have a dependent variable that is metric (measured
using an interval or ratio scale). There must also be one or more independent variables.
The basic principle of ANOVA is to test for differences among the means of the populations by
examining the amount of variation within each of these samples, relative to the amount of
variation between the samples.
ANOVA Technique
One-way (or single factor) ANOVA: Under one-way ANOVA, we consider only one factor; the factor is of interest because several possible types of samples can occur within it. We then determine whether there are differences within that factor. The technique involves the following steps:
(i) Obtain the mean of each sample, i.e., obtain X̄1, X̄2, X̄3, ..., X̄k when there are k samples.
(ii) Work out the mean of the above sample means.
(iii) Take the deviations of the sample means from the mean of the sample means and calculate
the square of such deviations which may be multiplied by the number of items in the
corresponding sample, and then obtain their total. This is known as the sum of squares for
variance between the samples (or SS between).
(iv) Divide the result of step (iii) by the degrees of freedom (k-1) between the samples to obtain the variance or mean square (MS) between samples.
(v) Obtain the deviations of the values of the sample items for all the samples from
corresponding means of the samples and calculate the squares of such deviations and then obtain
their total. This total is known as the sum of squares for variance within samples (or SS within).
(vi) Divide the result of step (v) by the degrees of freedom (n-k) within samples to obtain the variance or mean square (MS) within samples.
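The steps above, followed by the F-ratio (MS between divided by MS within) that completes the test, can be sketched as follows; the three samples are illustrative and assumed to be of equal size, so that the grand mean equals the mean of the sample means as in step (ii):

```python
# One-way ANOVA following the steps above (sample data are illustrative;
# equal sample sizes are assumed so the grand mean is the mean of means).
samples = [
    [6, 7, 3, 8],   # sample 1
    [5, 5, 3, 7],   # sample 2
    [5, 4, 3, 4],   # sample 3
]

k = len(samples)                      # number of samples
n = sum(len(s) for s in samples)      # total number of items

# (i) mean of each sample
means = [sum(s) / len(s) for s in samples]
# (ii) mean of the sample means
grand_mean = sum(means) / k
# (iii) SS between: squared deviations of sample means from the grand mean,
#       each multiplied by the number of items in its sample
ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
# (iv) MS between = SS between / (k - 1)
ms_between = ss_between / (k - 1)
# (v) SS within: squared deviations of items from their own sample mean
ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
# (vi) MS within = SS within / (n - k)
ms_within = ss_within / (n - k)

# F-ratio: compared with the table value of F for (k - 1, n - k) df.
f_ratio = ms_between / ms_within
print(ms_between, ms_within, f_ratio)
```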