
AAU DEPARTMENT OF PADM Research Methods for Public Administration

Chapter Seven: Data Analysis


7.1 Introduction
The data, after collection, have to be processed and analyzed in accordance with the outline laid
down for the purpose at the time of developing the research plan. This is essential for a scientific
study and for ensuring that we have all relevant data for making contemplated comparisons and
analysis.
Technically speaking, processing implies editing, coding, classification and tabulation of collected
data so that they are amenable to analysis. The term analysis refers to the computation of certain
measures along with searching for patterns of relationship that exist among data-groups. Thus, in
the process of analysis, relationships or differences supporting or conflicting with original or new
hypotheses should be subjected to statistical tests of significance. Generally speaking, analysis of
data involves a number of closely related operations which are performed with the purpose of
summarizing the collected data and organizing these in such a manner that they answer the research
question(s). With this brief introduction concerning the concepts of processing and analysis, we can
now proceed with the explanation of all the processing operations.
7.2 Data Processing
Editing
Editing of data is a process of examining the collected raw data (especially in surveys) to detect
errors and omissions and to correct these when possible. As a matter of fact, editing involves a
careful scrutiny of the completed questionnaires and/or schedules. Editing is done to assure that the
data are accurate, consistent with other facts gathered, uniformly entered, as completed as possible
and have been well arranged to facilitate coding and tabulation.

With regard to points or stages at which editing should be done, one can talk of field editing and
central editing. Field editing consists in the review of the reporting forms by the investigator to
complete (translate or rewrite) what the interviewer has written in abbreviated and/or illegible
form at the time of recording the respondents’ responses. This type of editing is necessary in view
of the fact that individual writing styles often can be difficult for others to read. This sort of editing
should be done as soon as possible after the interview, preferably on the very day or on the next
day. While doing field editing, the investigator must restrain himself and must not correct errors of
omission by simply guessing what the informant would have said if the question had been asked.


Central editing should take place when all forms or schedules have been completed and returned to
the office. This type of editing implies that all forms should get a thorough editing by a single editor
in a small study and by a team of editors in case of a large inquiry. Editor(s) may correct the
obvious errors such as an entry in the wrong place, entry recorded in months when it should have
been recorded in weeks, and the like. In case of inappropriate or missing replies, the editor can
sometimes determine the proper answer by reviewing the other information in the schedule. At
times, the respondent can be contacted for clarification. The editor must strike out the answer if the
same is inappropriate and s/he has no basis for determining the correct answer or the response. In
such a case, an editing entry of ‘no answer’ is called for. All the wrong replies, which are quite
obvious, must be dropped from the final results, especially in the context of mail surveys.

Editors must keep in view several points while performing their work:
a) They should be familiar with instructions given to the interviewers and coders as well as
with the editing instructions supplied to them for the purpose.
b) While crossing out an original entry for one reason or another, they should just draw a single
line on it so that the same may remain legible.
c) They must make entries (if any) on the form in some distinctive color and that too in a
standardized form.
d) They should initial all answers, which they change or supply.
e) Editor’s initials and the date of editing should be placed on each completed form or
schedule.

Coding
Coding refers to the process of assigning numerals or other symbols to answers so that responses
can be put into a limited number of categories or classes. Such classes should be appropriate to the
research problem under consideration. They must also possess the characteristic of exhaustiveness
(i.e., there must be a class for every data item) and also that of mutual exclusivity, which means that
a specific answer can be placed in one and only one cell in a given category set. Another rule to be
observed is that of unidimensionality by which is meant that every class is defined in terms of only
one concept.

Coding is necessary for efficient analysis and through it the several replies may be reduced to a
small number of classes which contain the critical information required for analysis. Coding
decisions should usually be taken at the designing stage of the questionnaire. This makes it possible


to pre-code the questionnaire choices, which in turn is helpful for computer tabulation as one
can key punch straightforwardly from the original questionnaires. But in case of hand coding, some
standard method may be used. One such standard method is to code in the margin with a colored
pencil. The other method can be to transcribe the data from the questionnaire to a coding sheet.
Whatever method is adopted, one should see that coding errors are altogether eliminated or reduced
to the minimum level.
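As a sketch of what a coding scheme looks like in practice, the snippet below assigns numeric codes to survey answers. The codebook, the category labels and the catch-all "Other" code are illustrative assumptions, not taken from the text; the point is that the scheme is exhaustive (every answer gets a code) and mutually exclusive (every answer gets exactly one code).

```python
# Illustrative coding scheme (hypothetical categories): map each answer
# to exactly one numeric code. The "Other" class keeps the scheme
# exhaustive; lookup in a single dictionary keeps it mutually exclusive.
CODEBOOK = {
    "strongly agree": 1,
    "agree": 2,
    "neutral": 3,
    "disagree": 4,
    "strongly disagree": 5,
}
OTHER = 9  # catch-all class so every data item has a code

def code_response(answer):
    """Return the numeric code for a cleaned-up answer string."""
    return CODEBOOK.get(answer.strip().lower(), OTHER)

responses = ["Agree", "strongly agree", "no opinion", "Disagree"]
codes = [code_response(r) for r in responses]
```

A hand coder would write these codes in the questionnaire margin; here the same transcription happens in one pass, which also removes transcription errors.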
Classification
Most research studies result in a large volume of raw data which must be reduced into
homogeneous groups if we are to get meaningful relationships. This fact necessitates classification
of data which happens to be the process of arranging data in groups or classes on the basis of
common characteristics. Data having a common characteristic are placed in one class and in this
way the entire data get divided into a number of groups or classes. Classification can be one of the
following two types, depending upon the nature of the phenomenon involved:
a) Classification According to Attributes: As stated above, data are classified on the basis of
common characteristics, which can be descriptive such as literacy, sex, honesty, etc. Descriptive
characteristics refer to qualitative phenomenon, which cannot be measured quantitatively; only their
presence or absence in an individual item can be noticed. Data obtained this way based on certain
attributes are known as statistics of attributes and their classification is said to be classification
according to attributes.

b) Classification According to Class-Intervals: Unlike descriptive characteristics, the numerical
characteristics refer to quantitative phenomenon, which can be measured through some statistical
units. Data relating to income, production, age, weight, etc. come under this category; and are
classified on the basis of class intervals. For instance, persons whose incomes, say, are within Br.
201 to Br. 500 can form one group; those whose incomes are within Br. 501 to Br. 700 can form
another group and so on. In this way, the entire data may be divided into a number of groups or
classes or what are usually called, ‘class-intervals.’ Each group or class-interval, thus, has an upper
limit as well as a lower limit which are known as class limits. The difference between the two class
limits is known as class magnitude. We may have classes with equal class magnitudes or with
unequal class magnitudes. The number of items which fall in a given class is known as the
frequency of the given class. All the classes or groups, with their respective frequencies taken


together and put in the form of a table, are described as a grouped frequency distribution. Classification
according to class intervals usually involves the following three main problems:
i. How many classes should be there? What should be their magnitudes?
There can be no specific answer with regard to the number of classes. The decision about this calls
for skill and experience of the researcher. However, the objective should be to display the data in
such a way as to make it meaningful for the analyst.
Typically, we may have 5 to 15 classes. With regard to the second part of the question, we can say
that, to the extent possible, class-intervals should be of equal magnitudes, but in some cases unequal
magnitudes may result in better classification. Hence the researcher’s objective judgement plays an
important part in this connection. Multiples of 2, 5 and 10 are generally preferred while determining
class magnitudes.

It should also be kept in mind that in case one or two or very few items have very high or very low
values, one may use what are known as open-ended intervals in the overall frequency distribution.
Such intervals may be expressed like under Br. 500 or Br. 10001 and over. Such intervals are
generally not desirable, but often cannot be avoided. The researcher must always remain conscious
of this fact while deciding the issue of the total number of class intervals in which the data are to be
classified.
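The rule of thumb above (roughly 5 to 15 classes, with magnitudes in convenient multiples of 2, 5 or 10) can be sketched as a small helper. The income figures and the rounding heuristic are hypothetical; the researcher's judgement, as the text notes, still governs the final choice.

```python
import math

def class_magnitude(values, n_classes=10, multiple=5):
    """Suggest an equal class magnitude: span the data range with the
    requested number of classes, rounded up to a convenient multiple."""
    span = max(values) - min(values)
    raw = span / n_classes
    return multiple * math.ceil(raw / multiple)

# Hypothetical incomes in Birr:
incomes = [230, 480, 510, 640, 760, 905, 1120, 1340]
width = class_magnitude(incomes, n_classes=5)  # raw width 222 -> 225
```

Rounding up (rather than to the nearest multiple) guarantees the chosen classes still cover the whole data range.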
ii. How to choose class limits?
While choosing class limits, the researcher must take into consideration the criterion that the mid-
point (generally worked out by taking the sum of the upper limit and lower limit of a class and
then dividing this sum by 2) of a class-interval and the actual average of items of that class interval
should remain as close to each other as possible. Consistent with this, the class limits should be
located at multiples of 2, 5, 10, 20, 100 and such other figures. Class limits may generally be stated
in any of the following forms:
Exclusive type class intervals: They are usually stated as follows:
10–20
20–30
30–40
40–50
The above intervals should be read as under:
10 and under 20
20 and under 30


30 and under 40
40 and under 50
Thus, under the exclusive type class intervals, the items whose values are equal to the upper limit of
a class are grouped in the next higher class. For example, an item whose value is exactly 30 would
be put in 30–40 class interval and not in 20–30 class interval.
In simple words, we can say that under exclusive type class intervals, the upper limit of a class
interval is excluded and items with values less than the upper limit (but not less than the lower
limit) are put in the given class interval.
Inclusive type class intervals: They are usually stated as follows:
11–20
21–30
31–40
41–50
In inclusive type class intervals the upper limit of a class interval is also included in the class
interval concerned. Thus, an item whose value is 20 will be put in the 11–20 class interval. The stated
upper limit of the class interval 11–20 is 20 but the real limit is 20.99999… and as such the 11–20 class
interval really means 11 and under 21.
When the phenomenon under consideration happens to be a discrete one (i.e., can be measured and
stated only in integers), then we should adopt inclusive type classification. But when the
phenomenon happens to be a continuous one capable of being measured in fractions as well, we can
use exclusive type class intervals.
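The two conventions can be sketched as simple classification functions (the interval boundaries mirror the examples above; hyphens stand in for the dashes in the text). Note how a value of exactly 30 lands in different classes under the two types.

```python
def exclusive_class(value, lower=10, width=10, n_classes=4):
    """Exclusive-type intervals 10-20, 20-30, ...: an item equal to an
    upper limit goes to the next higher class."""
    for i in range(n_classes):
        lo = lower + i * width
        hi = lo + width
        if lo <= value < hi:          # upper limit excluded
            return f"{lo}-{hi}"
    return None

def inclusive_class(value, lower=11, width=10, n_classes=4):
    """Inclusive-type intervals 11-20, 21-30, ...: the upper limit
    belongs to its own class."""
    for i in range(n_classes):
        lo = lower + i * width
        hi = lo + width - 1
        if lo <= value <= hi:         # upper limit included
            return f"{lo}-{hi}"
    return None
```

So `exclusive_class(30)` yields the 30-40 class while `inclusive_class(20)` yields the 11-20 class, exactly as the worked examples in the text describe.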
iii. How to determine the frequency of each class?
This can be done either by tally sheets or by mechanical aids. Under the technique of tally sheet, the
class-groups are written on a sheet of paper (commonly known as the tally sheet) and for each item
a stroke (usually a small vertical line) is marked against the class group in which it falls. The
general practice is that after every four small vertical lines in a class group, the fifth line for the item
falling in the same group is drawn as a horizontal line through the said four lines, and the resulting
group (IIII with a strike) represents five items. All this facilitates the counting of items in each one of the class
groups. An illustrative tally sheet can be shown as under:
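The tally-sheet counting described above can be mimicked electronically: each item is placed in its class and the per-class counts accumulate, just as strokes do on paper. The marks data below are hypothetical.

```python
from collections import Counter

def frequency_distribution(values, lower, width, n_classes):
    """Count how many items fall in each exclusive-type class interval
    (the electronic equivalent of a tally sheet)."""
    counts = Counter()
    for v in values:
        idx = int((v - lower) // width)   # which class the item falls in
        if 0 <= idx < n_classes:
            lo = lower + idx * width
            counts[f"{lo}-{lo + width}"] += 1
    return dict(counts)

# Hypothetical marks, classified into 10-20, 20-30, 30-40, 40-50:
marks = [12, 15, 18, 22, 25, 27, 33, 38, 41, 45, 47]
dist = frequency_distribution(marks, lower=10, width=10, n_classes=4)
```

The resulting dictionary, with classes and their respective frequencies taken together, is the grouped frequency distribution the text defines.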


Tabulation
When a mass of data has been assembled, it becomes necessary for the researcher to arrange the
same in some kind of concise and logical order. This procedure is referred to as tabulation. Thus,
tabulation is the process of summarizing raw data and displaying the same in compact form (i.e., in
the form of statistical tables) for further analysis. In a broader sense, tabulation is an orderly
arrangement of data in columns and rows.
Tabulation is essential because of the following reasons.
1. It conserves space and reduces explanatory and descriptive statements to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.
Tabulation can be done by hand or by mechanical or electronic devices. The choice depends on the
size and type of study, cost considerations, time pressures and the availability of tabulating
machines or computers. In relatively large inquiries, we may use mechanical or computer tabulation
if other factors are favourable and necessary facilities are available. Hand tabulation is usually
preferred in case of small inquiries where the number of questionnaires is small and they are of
relatively short length. Hand tabulation may be done using the direct tally, the list and tally or the
card sort and count methods. When there are simple codes, it is feasible to tally directly from the
questionnaire.

Under this method, the codes are written on a sheet of paper, called tally sheet, and for each
response a stroke is marked against the code in which it falls. Usually after every four strokes
against a particular code, the fifth response is indicated by drawing a diagonal or horizontal line
through the strokes. These groups of five are easy to count and the data are sorted against each code
conveniently. In the listing method, the code responses may be transcribed onto a large work-sheet,
allowing a line for each questionnaire. This way a large number of questionnaires can be listed on
one work sheet. Tallies are then made for each question. The card sorting method is the most


flexible hand tabulation. In this method the data are recorded on special cards of convenient size
and shape with a series of holes. Each hole stands for a code and when cards are stacked, a needle
passes through particular hole representing a particular code. These cards are then separated and
counted.

In this way frequencies of various codes can be found out by the repetition of this technique. We
can as well use the mechanical devices or the computer facility for tabulation purpose in case we
want quick results, our budget permits their use and we have a large volume of straightforward
tabulation involving a number of cross-breaks.

Tabulation may also be classified as simple and complex tabulation. The former type of tabulation
gives information about one or more groups of independent questions, whereas the latter type of
tabulation shows the division of data in two or more categories and as such is designed to give
information concerning one or more sets of inter-related questions. Simple tabulation generally
results in one-way tables which supply answers to questions about one characteristic of data only.

As against this, complex tabulation usually results in two-way tables (which give information about
two inter-related characteristics of data), three-way tables (giving information about three
interrelated characteristics of data) or still higher order tables, also known as manifold tables, which
supply information about several interrelated characteristics of data. Two-way tables, three-way
tables or manifold tables are all examples of what is sometimes described as cross tabulation.
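A minimal sketch of a two-way table: count respondents for every combination of two interrelated characteristics. The survey records (sex by employment status) are hypothetical.

```python
from collections import Counter

def cross_tab(rows):
    """Build a two-way table: one count per (row, column) combination
    of two interrelated characteristics."""
    return Counter((r, c) for r, c in rows)

# Hypothetical records: (sex, employment status)
records = [("F", "employed"), ("F", "unemployed"), ("M", "employed"),
           ("F", "employed"), ("M", "employed")]
table = cross_tab(records)
```

Adding a third characteristic to each tuple would give a three-way table; in general, tuples of length n give the n-way (manifold) cross tabulation described above.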
Generally accepted principles of tabulation: Such principles of tabulation, particularly of
constructing statistical tables, can be briefly stated as follows:
1. Every table should have a clear, concise and adequate title so as to make the table
intelligible without reference to the text and this title should always be placed just above the
body of the table.
2. Every table should be given a distinct number to facilitate easy reference.
3. The column headings (captions) and the row headings (stubs) of the table should be clear
and brief.
4. The units of measurement under each heading or sub-heading must always be indicated.
5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the
table, along with the reference symbols used in the table.
6. Source or sources from where the data in the table have been obtained must be indicated just
below the table.


7. Usually the columns are separated from one another by lines which make the table more
readable and attractive. Lines are always drawn at the top and bottom of the table and below
the captions.
8. There should be thick lines to separate the data under one class from the data under another
class and the lines separating the sub-divisions of the classes should be comparatively thin
lines.
9. The columns may be numbered to facilitate reference.
10. Those columns whose data are to be compared should be kept side by side. Similarly,
percentages and/or averages must also be kept close to the data.
11. It is generally considered better to round off figures before tabulation, as this would
reduce unnecessary details in the table itself.
12. In order to emphasize the relative significance of certain categories, different kinds of type,
spacing and indentations may be used.
13. It is important that all column figures be properly aligned. Decimal points and (+) or (–)
signs should be in perfect alignment.
14. Abbreviations should be avoided to the extent possible and ditto marks should not be used in
the table.
15. Miscellaneous and exceptional items, if any, should be usually placed in the last row of the
table.
16. Table should be made as logical, clear, accurate and simple as possible. If the data happen to
be very large, they should not be crowded in a single table for that would make the table
unwieldy and inconvenient.
17. Total of rows should normally be placed in the extreme right column and that of columns
should be placed at the bottom.
18. The arrangement of the categories in a table may be chronological, geographical,
alphabetical or according to magnitude to facilitate comparison. Above all, the table must
suit the needs and requirements of an investigation.

Some Problems in Processing


We can take up the following two problems of processing the data for analytical purposes:
(a) The problem concerning “Don’t know” (or DK) responses: While processing the data, the
researcher often comes across some responses that are difficult to handle. One category of such
responses may be ‘Don’t Know Response’ or simply DK response. When the DK response group is


small, it is of little significance. But, when it is relatively big, it becomes a matter of major concern
in which case the question arises: Is the question which elicited DK response useless? The answer
depends on two possibilities, viz., the respondent actually may not know the answer or the researcher may
have failed to obtain the appropriate information. In the first case the question concerned is said to be
all right and the DK response is taken as a legitimate DK response. But in the second case, the DK response is
more likely to be a failure of the questioning process.

How are DK responses to be dealt with by researchers? The best way is to design better types of
questions. Good rapport of interviewers with respondents will result in minimizing DK responses.
But, what about the DK responses that have already taken place? One way to tackle this issue is to
estimate the allocation of DK answers from other data in the questionnaire. The other way is to keep
DK responses as a separate category in tabulation where we can consider it as a separate reply
category if DK responses happen to be legitimate, otherwise we should let the reader make his own
decision. Yet, another way is to assume that DK responses occur more or less randomly and as such
we may distribute them among the other answers in the ratio in which the latter have occurred.
Similar results will be achieved if all DK replies are excluded from tabulation and that too without
inflating the actual number of other responses.
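The proportional-distribution approach mentioned above can be sketched as follows. It rests on the stated assumption that DK responses occur more or less randomly; the tallied counts are hypothetical.

```python
def redistribute_dk(counts, dk_key="DK"):
    """Spread DK answers over the other categories in proportion to
    their observed frequencies (assumes DK responses occur at random)."""
    dk = counts.get(dk_key, 0)
    others = {k: v for k, v in counts.items() if k != dk_key}
    total = sum(others.values())
    return {k: v + dk * v / total for k, v in others.items()}

# Hypothetical tallies: 60 Yes, 30 No, 9 Don't Know.
tallied = {"Yes": 60, "No": 30, "DK": 9}
adjusted = redistribute_dk(tallied)
```

The 9 DK replies are split 2:1 between Yes and No, the ratio in which the substantive answers occurred; the relative percentages therefore come out the same as if the DK replies had simply been excluded from the base.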

(b) Use of Percentages: Percentages are often used in data presentation for they simplify numbers,
reducing all of them to a 0 to 100 range. Through the use of percentages, the data are reduced in the
standard form with base equal to 100 which fact facilitates relative comparisons. While using
percentages, the following rules should be kept in view by researchers:
1. Two or more percentages must not be averaged unless each is weighted by the group size
from which it has been derived.
2. Use of too large percentages should be avoided, since a large percentage is difficult to
understand and tends to confuse, defeating the very purpose for which percentages are used.
3. Percentages hide the base from which they have been computed. If this is not kept in view,
the real differences may not be correctly read.
4. Percentage decreases can never exceed 100 per cent and as such for calculating the
percentage of decrease, the higher figure should invariably be taken as the base.
5. Percentages should generally be worked out in the direction of the causal-factor in case of
two-dimension tables and for this purpose we must select the more significant factor out of
the two given factors as the causal factor.
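Rule 1 above is worth a worked sketch, since the unweighted average of percentages is a common error. The group sizes and percentages below are hypothetical.

```python
def weighted_percentage(percentages, group_sizes):
    """Average several percentages correctly: weight each by the size
    of the group from which it was derived (rule 1 above)."""
    total = sum(group_sizes)
    return sum(p * n for p, n in zip(percentages, group_sizes)) / total

# Hypothetical survey: 80% of 200 respondents vs 40% of 50 respondents.
overall = weighted_percentage([80, 40], [200, 50])
```

The weighted figure is 72%, whereas the naive unweighted mean of 80 and 40 would misleadingly report 60%; the difference is entirely due to the unequal group sizes hidden behind the two percentages (rule 3 above).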


7.3 Data Analysis


Elements/Types of Analysis
As stated earlier, by analysis we mean the computation of certain indices or measures along with
searching for patterns of relationship that exist among the data groups. Analysis, particularly in case
of survey or experimental data, involves estimating the values of unknown parameters of the
population and testing of hypotheses for drawing inferences.

Analysis may, therefore, be categorized as descriptive analysis and inferential analysis. Inferential
analysis is often known as statistical analysis. “Descriptive analysis is largely the study of
distributions of one variable. But, this study may also provide us with profiles of companies, work
groups, persons and other subjects on any of a multiple of characteristics such as size, composition,
efficiency, preferences, etc.”. This sort of analysis may be in respect of one variable (described as
unidimensional analysis), or in respect of two variables (described as bivariate analysis) or in
respect of more than two variables (described as multivariate analysis). In this context we work out
various measures that show the size and shape of a distribution(s) along with the study of measuring
relationships between two or more variables.

We may as well talk of correlation analysis and causal analysis. Correlation analysis studies the
joint variation of two or more variables for determining the amount of correlation between two or
more variables. Causal analysis is concerned with the study of how one or more variables affect
changes in another variable. It is thus a study of functional relationships existing between two or
more variables. This analysis can be termed as regression analysis. Causal analysis is considered
relatively more important in experimental researches, whereas in most social and business
researches, our interest lies more in understanding and controlling relationships between variables
than in determining causes per se, and as such we consider correlation analysis as relatively more
important.

Descriptive Analysis
Descriptive statistical analysis is concerned with numerical description of a particular group
observed and any similarity to those outside the group cannot be taken for granted. The data
describe one group and that one group only.


Much simple research involves descriptive statistics and provides valuable information about the
nature of a particular group or class.
Most commonly used methods of descriptive analysis are:
a. Calculating frequency distribution usually in percentages of items under study.
b. Calculating percentiles and percentile ranks.
c. Calculating measures of central tendency-mean, median and mode and establishing norms.
d. Calculating measures of dispersion - standard deviation/mean deviation, quartile deviation
and range.
e. Calculating measures of relationship - coefficient of correlation, reliability and validity by
the rank-difference and product moment methods.
f. Graphical presentation of data – frequency polygon, curve, histogram, etc.
While analyzing their data, investigators usually make use of as many of the above simple statistical
devices as necessary for the purpose of their study.
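Several of the measures listed above are available directly in Python's standard library; the scores below are a hypothetical data set used only to illustrate them.

```python
import statistics

# Hypothetical test scores for one observed group:
scores = [12, 15, 15, 18, 20, 22, 25]

central = {
    "mean": statistics.mean(scores),      # arithmetic average
    "median": statistics.median(scores),  # middle value when sorted
    "mode": statistics.mode(scores),      # most frequent value
}
spread = {
    "range": max(scores) - min(scores),   # crudest measure of dispersion
    "stdev": statistics.stdev(scores),    # sample standard deviation
}
```

In keeping with the caution above, these figures describe this one group only; nothing about any wider population follows from them without the inferential methods of the next section.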

Inferential Analysis
When a particular finding emerges from data analysis the manager asks whether the empirical
findings represent the true picture or have occurred as a result of a sampling accident. Statistical
inference is the process where we generalize from sample results to a population from which the
sample has been drawn. Thus, statistical inference is the process where we extend our knowledge
obtained from a random sample, which is only a small part of the population to the whole
population. Broadly, statistical inference involves hypothesis testing.

Hypotheses Testing
If we find a difference between two samples, we would like to know whether this is a “real” difference
(i.e., present in the population) or just a “chance” difference (i.e., one that could simply be the result
of random sampling error).
Hypothesis testing begins with an assumption, called a hypothesis, that we make about a population
parameter. We then collect sample data and calculate sample statistics, such as the mean and standard
deviation, to decide how likely it is that our hypothesized population parameter is correct.
Essentially, the process involves judging whether a difference between a sample and assumed
population value is significant or not. The smaller the difference the greater the chance that our
hypothesized value for the mean is correct.

Some examples of real world situations where we might want to test hypotheses:

 A random sample of 100 families in Addis Ababa finds that they consume more of a
particular brand of Addis tea per family than a random sample of 100 families in Adama.
It could be that the observed difference was caused by sampling accident and that there is
actually no difference between the two populations. However, if the results are not caused
by pure sampling fluctuations, then we have a case for the firm to take some further
marketing action based on the sample findings.
 Colgate Palmolive has decided that a new TV ad campaign can only be justified if more than
55% of viewers see the ads. In this case, the company requests a marketing research
company to carry out a survey to assess viewership. The agency comes back with an ad
penetration of 50% for a random sample of 1000. It is now the company’s problem to assess
whether the sample viewing proportion is representative of the hypothesized level of
viewership that the company desires, i.e. 55%. Can differences between the two proportions
be attributed to sampling error or is the ads true viewership actually lower.
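The viewership question can be answered with a standard one-sample test of proportion; this is a sketch, not the company's actual analysis, and the 1.645 cut-off assumes a one-tailed test at the 5% significance level.

```python
import math

def one_proportion_z(p_hat, p0, n):
    """z statistic for testing an observed sample proportion p_hat
    against a hypothesized population proportion p0."""
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    return (p_hat - p0) / se

# Observed 50% penetration in a sample of 1000 vs the hypothesized 55%:
z = one_proportion_z(0.50, 0.55, 1000)
```

Here z comes out around -3.2, far beyond the one-tailed 5% cut-off of -1.645, so the 5-point shortfall is very unlikely to be a sampling fluctuation: the evidence suggests true viewership is below the 55% the campaign requires.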

