Module 6 Data Analysis and Report Preparation
Data cleaning is one of the most important processes in data analysis and is the first step after
data collection. It ensures that the dataset is free of inaccurate or corrupt information.
It can be carried out manually using data wrangling tools, or it can be automated by running the data
through a computer program. Data cleaning involves several processes; once they are completed,
the data is ready for analysis.
Data cleaning is the process of modifying data to ensure that it is free of irrelevant and incorrect
information. Also known as data cleansing, it entails identifying the incorrect, irrelevant, incomplete,
or otherwise “dirty” parts of a dataset and then replacing or cleaning them.
Although sometimes thought of as boring, data cleansing is valuable because it improves the
quality of the data and therefore the efficiency of the analysis built on it. The process can be
automated or done manually.
The process of data cleansing may involve the removal of typographical errors, data validation,
and data enhancement. It continues until the data meets the data quality criteria, which
include validity, accuracy, and completeness.
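The cleaning checks described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the records, the field names, the rule that two records sharing an `id` are duplicates, and the 1 to 10 severity scale are all hypothetical and not taken from any real study.

```python
# Hypothetical survey records; field names and values are illustrative only.
raw_records = [
    {"id": 1, "county": " County A ", "severity": "3"},
    {"id": 2, "county": "county b", "severity": "4"},
    {"id": 2, "county": "county b", "severity": "4"},   # duplicate of record 2
    {"id": 3, "county": "County C", "severity": ""},     # incomplete
    {"id": 4, "county": "County A", "severity": "12"},   # outside the assumed 1-10 scale
]


def clean(records, low=1, high=10):
    """Return (cleaned_rows, rejected_rows) after basic cleaning checks."""
    seen_ids = set()
    cleaned, rejected = [], []
    for rec in records:
        if rec["id"] in seen_ids:                 # drop records that repeat an id
            continue
        seen_ids.add(rec["id"])
        county = rec["county"].strip().title()    # normalise spacing and letter case
        text = rec["severity"].strip()
        if not text.isdigit():                    # completeness: missing or non-numeric
            rejected.append((rec["id"], "missing severity"))
            continue
        severity = int(text)
        if not low <= severity <= high:           # validity: outside the allowed range
            rejected.append((rec["id"], "severity out of range"))
            continue
        cleaned.append({"id": rec["id"], "county": county, "severity": severity})
    return cleaned, rejected


cleaned_rows, rejected_rows = clean(raw_records)
print(cleaned_rows)
print(rejected_rows)
```

Rejected records are reported with a reason rather than silently discarded, so that a person can decide whether each one can be corrected.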
Data Processing:-
Data remain in raw form until they are processed and analyzed. Processing is
a statistical method by which the collected data are organized so that further analysis and
interpretation become easy. It is an intermediary stage between the collection of data and
their analysis and interpretation.
Processing stages:-
There are four important stages in the processing of data. They are:
1. Editing
2. Coding
3. Classification
4. Tabulation
Editing:-
As soon as the researcher receives the data, he should screen it for accuracy. Editing is the
process of examining the data collected through various methods to detect errors and omissions
and correct them for further analysis. Through editing, it is ensured that the collected data are
Miss.S.A.Jagtap Page 1
RM Chapter No-6 Data Analysis and Report Preparation
accurate, consistent with other facts gathered, uniformly entered and well arranged so that further
analysis is made easier.
Practical guidelines for editing:-
While editing, care has to be taken to see that the data are as accurate and complete as possible.
The following points are to be noted:
1. The editor should be familiar with the copy of instructions given to the interviewers.
2. The original entry, if found incorrect, should not be destroyed or erased. Instead,
it should be crossed out in such a manner that it is still legible.
3. Any modification to the original entry by the editor must be specifically indicated.
4. All completed schedules must bear the signature of the editor and the date.
5. An incorrect answer to a question can be corrected only if the editor is absolutely sure of the
answer; otherwise it should be left as it is.
6. Inconsistent, incomplete or missing answers should not be used.
7. See that all numerical answers are converted to the same units.
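Guidelines 2, 3, 4 and 7 can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the height question, the unit labels, the editor identifier `editor-01` and the date are hypothetical. The point is only that the edited copy keeps the original entry, records who changed it and when, and brings all numerical answers to the same unit.

```python
from datetime import date

# Hypothetical schedule entries: heights reported in mixed units.
answers = [
    {"respondent": 1, "height": 170.0, "unit": "cm"},
    {"respondent": 2, "height": 1.65, "unit": "m"},
    {"respondent": 3, "height": 68.0, "unit": "in"},
]

TO_CM = {"cm": 1.0, "m": 100.0, "in": 2.54}  # conversion factors to centimetres


def edit_entry(entry, editor, when):
    """Return an edited copy; the original value is kept, not overwritten."""
    edited = dict(entry)
    edited["original"] = (entry["height"], entry["unit"])  # "crossed out" but still legible
    edited["height"] = round(entry["height"] * TO_CM[entry["unit"]], 1)
    edited["unit"] = "cm"                                  # all answers in the same unit
    edited["edited_by"] = editor                           # signature of the editor
    edited["edited_on"] = when.isoformat()                 # date of editing
    return edited


edited_answers = [edit_entry(a, "editor-01", date(2024, 1, 15)) for a in answers]
for row in edited_answers:
    print(row)
```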
Coding
Coding is the process by which response categories are summarized by numerals or other symbols
so that subsequent operations of data analysis can be carried out. It facilitates efficient analysis
of the collected data and helps in
reducing several replies to a small number of classes which contain the critical information
required for analysis. In general, it reduces the huge amount of information collected into a form
that is amenable to analysis.
Steps in coding
1. Study the answers carefully.
2. Develop a coding frame by listing the answers and by assigning codes to each of them.
3. Prepare a coding manual with the details of variable names, codes and instructions.
4. If the coding manual has already been prepared before the collection of the data, make the
required additions for the open-ended and partially coded questions.
Coding rules
1. Give each respondent a code number for identification.
2. Provide a code number for each question.
3. All responses, including ‘don’t know’, ‘no opinion’, etc., are to be coded.
4. Assign additional codes to partially coded questions.
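A coding frame of this kind can be sketched as a simple lookup table. The question, the answer categories and the numeric codes below are hypothetical; in practice they would come from the coding manual prepared for the study.

```python
# Hypothetical coding frame for one question; the codes are illustrative.
CODING_FRAME = {
    "yes": 1,
    "no": 2,
    "don't know": 8,   # 'don't know' and 'no opinion' are coded too (rule 3)
    "no opinion": 9,
}
OTHER_CODE = 7  # additional code for answers outside the pre-coded list (rule 4)

responses = ["Yes", "no", "Don't know", "YES", "maybe later", "No opinion"]


def code_response(answer):
    """Map a raw answer to its numeric code."""
    return CODING_FRAME.get(answer.strip().lower(), OTHER_CODE)


# Rule 1: each respondent gets an identifying code number (1, 2, 3, ...).
coded = {resp_id: code_response(ans) for resp_id, ans in enumerate(responses, start=1)}
print(coded)
```

Normalising case and spacing before the lookup keeps "Yes" and "YES" in the same class, which is the reduction of several replies to a small number of classes described above.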
Classification
Classification is the process of reducing a large mass of data into homogeneous groups for
meaningful analysis. It converts data from a complex, unintelligible form into an understandable,
intelligible one. It divides data into different groups or classes according to their similarities
and dissimilarities. When the data are classified, they give a summary of the whole information.
Objectives of classification
1. To organize data into a concise, logical and intelligible form.
2. To make the similarities and dissimilarities between various classes clear.
3. To facilitate comparison between various classes of data.
4. To help the researcher in understanding the significance of various classes of data.
5. To facilitate analysis and formulate generalizations.
Types of classification
Tabulation
Tabulation is the next step after classification. It is an orderly arrangement of data in rows and
columns. It is defined as the “measurement of data in columns and rows”. Data presented in tabular
form are much easier to read and understand than data presented in text. The main purpose of
tabulation is to prepare the data for final analysis. It is a stage between classification of data and
final analysis.
Objectives of Tabulation
1. To clarify the purpose of enquiry
2. To make the significance of data clear.
Bar Graphs: Bar graphs are used to display the frequency distributions for variables measured
at the nominal and ordinal levels. Bar graphs use the same width for all the bars on the graph,
and there is space between the bars. Label the parts of the graph, including the title, the left (Y)
or vertical axis, the bottom (X) or horizontal axis, and the bar labels.
For example, the following table shows the average injury rate per 1,000 employees for counties
in State X for the years 1980 to 1990.
Year   1980   1981   1982   1983   1984   1985   1986   1987   1988   1989   1990
Rate    3.6    4.2    3.4    5.5    3.8    3.1    1.7    1.8    1.0    1.6    0.9
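As a rough illustration of how such a series can be inspected before it is drawn as a graph, the following Python sketch stores the table and prints a crude text chart. The scale of one `*` per 0.5 injuries per 1,000 employees is an arbitrary choice made for this example.

```python
# Average injury rate per 1,000 employees, State X, 1980-1990 (from the table).
years = list(range(1980, 1991))
rates = [3.6, 4.2, 3.4, 5.5, 3.8, 3.1, 1.7, 1.8, 1.0, 1.6, 0.9]
injury_rate = dict(zip(years, rates))

peak_year = max(injury_rate, key=injury_rate.get)  # year with the highest rate
low_year = min(injury_rate, key=injury_rate.get)   # year with the lowest rate

# One '*' per 0.5 injuries per 1,000 employees gives a rough picture of the trend.
for year, rate in injury_rate.items():
    print(year, "*" * round(rate / 0.5), rate)

print("highest:", peak_year, "lowest:", low_year)
```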
A cumulative frequency polygon is used to display the cumulative distribution of values for a
variable.
PIE CHART: Another way to show the relationships between classes or categories of a variable
is in a pie or circle chart. In a pie chart, each "slice" represents the proportion of the total
phenomenon that is due to each of the classes or groups.
ANALYSIS OF DATA
Analysis of data is considered to be a highly skilled and technical job which should be carried out
only by the researcher himself or under his close supervision. Analysis of data means critical
examination of the data for studying the characteristics of the object under study and for
determining the patterns of relationship among the variables relating to it, using both quantitative
and qualitative methods.
Purpose of Analysis
Statistical analysis of data serves several major purposes.
1. It summarizes a large mass of data into an understandable and meaningful form.
2. It makes descriptions exact.
3. It aids the drawing of reliable inferences from observational data.
4. It facilitates identification of the causal factors underlying complex phenomena.
5. It helps in making estimations or generalizations from the results of sample surveys.
6. Inferential analysis is useful for assessing the significance of specific sample results under
Univariate Analysis:
➢ Univariate techniques are used for analyzing data when there is a single measurement of
each element in the sample or when there are several measurements on each element but
each variable is analyzed in isolation.
➢ These techniques focus on averages and distributions. Univariate analysis, looking at single
variables, is typically the first procedure one does when examining data being used for the
first time.
➢ Univariate analysis explores each variable in a data set, separately. It looks at the range of
values, as well as the central tendency of the values.
➢ It describes the pattern of response to the variable. It describes each variable on its own.
Raw Data: Obtain a printout of the raw data for all the variables. Raw data resembles a matrix,
with the variable names heading the columns, and the information for each case or record displayed
across the rows.
Example: Raw data for a study of injuries among county workers (first 10 cases)
Injury Report No.   County (Region) Name   Cause of Injury   Severity of Injury
1                   County A               Fall              3
2                   County B               Auto              4
3                   County C               Fall              6
4                   County C               Fall              4
5                   County B               Fall              5
6                   County A               Violence          9
7                   County A               Auto              3
8                   County A               Violence          2
9                   County A               Violence          9
10                  County B               Auto              3
It is difficult to tell what is going on with each variable in this data set. Raw data is difficult to
grasp, especially with a large number of cases or records. Univariate descriptive statistics can
summarize large quantities of numerical data and reveal patterns in the raw data. In order to present
the information in a more organized format, start with univariate descriptive statistics for each
variable.
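A minimal Python sketch of univariate descriptive statistics for the ten cases in the raw data table follows. It assumes that severity can be treated as a numeric score; the variable names are chosen for this example only.

```python
from collections import Counter
from statistics import mean, median, mode

# The ten cases from the raw data table: (report no., county, cause, severity).
cases = [
    (1, "County A", "Fall", 3), (2, "County B", "Auto", 4),
    (3, "County C", "Fall", 6), (4, "County C", "Fall", 4),
    (5, "County B", "Fall", 5), (6, "County A", "Violence", 9),
    (7, "County A", "Auto", 3), (8, "County A", "Violence", 2),
    (9, "County A", "Violence", 9), (10, "County B", "Auto", 3),
]

# Frequency distributions for the two categorical variables.
cause_counts = Counter(cause for _, _, cause, _ in cases)
county_counts = Counter(county for _, county, _, _ in cases)

# Range and central tendency for the numeric variable.
severity = [s for _, _, _, s in cases]

print("Cause of injury:", dict(cause_counts))
print("County:", dict(county_counts))
print("Severity range:", min(severity), "to", max(severity))
print("Severity mean:", mean(severity), "median:", median(severity), "mode:", mode(severity))
```

Each variable is summarized on its own: falls are the most frequent cause (4 of 10 cases), half of the reports come from County A, and severity ranges from 2 to 9 with a mean of 4.8.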
Bivariate analysis
Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the concept
of relationship between two variables, whether there exists an association and the strength of this
association, or whether there are differences between two variables and the significance of these
differences.
Cross-Tabulation: In simple tabulation, the frequency and the percentage for each question are
calculated. In cross-tabulation, responses to two questions are combined and the data are tabulated
together. A cross-tabulation counts the number of observations in each cross-category of two
variables.
For example, cross-tabulating a two-category measure of income (low and high income
households) with a two-category measure of purchase intention for a product (low and high purchase
intention) gives a basic cross-classification as shown in the following table:
                                     Income
Purchase Intention           Low Income   High Income
Low Purchase Intention          120            60
High Purchase Intention          80           190
Total                           200           250
The results of cross-tabulation show the number of sample respondents with low income having
low purchase intention, low income with high purchase intention, high income with low purchase
intention and high income with high purchase intention.
To make the cross-tabulations more meaningful, frequencies can be expressed as percentages. The
percentages can be computed in three different ways: (i) row-wise, so that the percentages in each
row add up to 100%; (ii) column-wise, so that the percentages in each column add up to 100%; or
(iii) as cell percentages, such that the percentages across all cells add up to 100%. The interpretation
of percentages is most useful to the researcher.
The basis for calculating category percentages depends upon the nature of the relationship between the
variables. One of the variables could be viewed as the dependent variable and the other one as the
independent variable. In the following table, purchase intention could be treated as the dependent
variable, which depends upon income (the independent variable).
                                     Income
Purchase Intention           Low Income   High Income
Low Purchase Intention          60%           24%
High Purchase Intention         40%           76%
Total                          100%          100%
Cross-table of purchase intention and income (column-wise percentages)
From the above table, we can conclude that 60% of respondents with low income have
low purchase intention for the product, whereas 40% have high purchase intention. Among
respondents with high income, 24% have low purchase intention whereas 76% have high purchase intention.
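The three ways of computing percentages can be checked with a short Python sketch that uses the observed counts from the income and purchase-intention cross-tabulation. The dictionary layout and variable names are choices made for this example.

```python
# Observed counts from the income x purchase-intention cross-tabulation.
counts = {
    "Low purchase intention": {"Low income": 120, "High income": 60},
    "High purchase intention": {"Low income": 80, "High income": 190},
}
columns = ["Low income", "High income"]

column_totals = {c: sum(row[c] for row in counts.values()) for c in columns}
grand_total = sum(column_totals.values())

# (ii) Column-wise: each column adds up to 100%.
column_pct = {r: {c: 100 * row[c] / column_totals[c] for c in columns}
              for r, row in counts.items()}
# (i) Row-wise: each row adds up to 100%.
row_pct = {r: {c: 100 * row[c] / sum(row.values()) for c in columns}
           for r, row in counts.items()}
# (iii) Cell percentages: all cells together add up to 100%.
cell_pct = {r: {c: 100 * row[c] / grand_total for c in columns}
            for r, row in counts.items()}

for label, table in [("column %", column_pct), ("row %", row_pct), ("cell %", cell_pct)]:
    print(label)
    for r, row in table.items():
        print("  ", r, {c: round(v, 1) for c, v in row.items()})
```

The column-wise result reproduces the 60%, 24%, 40% and 76% figures of the percentage table, which is the appropriate basis here because income is treated as the independent variable.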
Univariate analysis:
• does not deal with causes or relationships
• the major purpose of univariate analysis is to describe
• bar graph, histogram, pie chart, line graph, box-and-whisker plot
• Sample question: How many of the students in the freshman class are female?
Bivariate analysis:
• deals with causes or relationships
• the major purpose of bivariate analysis is to explain
• tables where one variable is contingent on the values of the other variable
• Sample question: Is there a relationship between the number of females in Computer Programming
and their scores in Mathematics?
Interpretation
Interpretation refers to the technique of drawing inferences from the collected facts and explaining
the significance of those inferences after an analytical and experimental study. It is a
search for the broader and more abstract meaning of the research findings. If the interpretation is not
done carefully, misleading conclusions may be drawn. The interpreter must be creative with
ideas and should be free from bias and prejudice.
Fundamental principles of interpretation
1. Sound interpretation involves willingness on the part of the interpreter to see what is in the
data.
2. Sound interpretation requires that the interpreter knows something more than the mere
figures.
3. Sound interpretation demands logical thinking.
4. Clear and simple language is necessary for communicating the interpretation.
Need for interpretation (importance of interpretation.)
1. It is through interpretation that the interpreter is able to know the abstract principles lying in
his conclusions.
2. On the basis of the principles underlying his findings, a researcher can make various
predictions about the various other events which are unrelated to his area of findings.
3. Interpretation leads to the establishment of explanatory concepts.
4. Only through interpretation can a researcher appreciate why his findings are what they
are.
5. The interpretation of the findings of an exploratory research study usually results in
hypotheses for experimental research.
Steps involved in the technique of interpretation
1. The researcher must give reasonable explanations of the relations he has found. He must be
able to see uniformity in diversified research findings so that generalization of findings is
possible.
2. If any extraneous information is collected during the study, it must be considered while
interpreting the final result of the research study.
3. The researcher can consult those having insight into the study who can point out
omissions and errors in logical arguments.
4. The researcher must consider all relevant factors affecting the problem at the time of
interpretation.
5. Conclusions that appear correct at the beginning may prove to be inaccurate later, so the
researcher must not be in a hurry while interpreting.
Parametric Tests:
The population mean (μ), standard deviation (σ) and proportion (p) are called the
parameters of a distribution.
Tests of hypotheses concerning the mean and proportion are based on the assumption that
the population(s) from where the sample is drawn is normally distributed.
Tests based on the above parameters are called parametric tests.
Non-Parametric Tests:-
There are situations where the populations under study are not normally distributed. The
data collected from these populations is extremely skewed. Therefore, the parametric tests
are not valid.
The option is to use a non-parametric test. These tests are called the distribution-free tests
as they do not require any assumption regarding the shape of the population distribution
from where the sample is drawn.
These tests can also be used for small sample sizes where the normality assumption
does not hold true.
Advantages of Non-Parametric Tests:-
They can be applied to many situations as they do not have the rigid requirements of their
parametric counterparts, like the sample having been drawn from the population following
a normal distribution.
There can be applications where a numeric observation is difficult to obtain but a rank
value is not. By using ranks, it is possible to relax the assumptions regarding the underlying
populations.
Non-parametric tests can often be applied to the nominal and ordinal data that lack exact
or comparable numerical values.
Non-parametric tests involve very simple computations compared to the corresponding
parametric tests.
Disadvantages of Non-Parametric Tests:-
A lot of information is wasted because the exact numerical data is reduced to a qualitative
form. The increase or the gain is denoted by a plus sign whereas a decrease or loss is
denoted by a negative sign. No consideration is given to the quantity of the gain or loss.
Non-parametric methods are less powerful than parametric tests when the basic
assumptions of parametric tests are valid.
The null hypothesis in a non-parametric test is loosely defined compared with that of a parametric
test. Therefore, whenever the null hypothesis is rejected, a non-parametric test yields a
less precise conclusion than the parametric test.
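The loss of information described in the first disadvantage can be seen in the sign test, a common non-parametric test that records only whether each paired observation went up or down. The sketch below is an illustration under stated assumptions: the source does not name the test, and the paired scores are hypothetical.

```python
from math import comb

# Hypothetical paired scores (before, after) for eight respondents.
pairs = [(62, 70), (55, 53), (71, 78), (64, 64), (58, 66), (49, 57), (60, 65), (67, 72)]

# Only the direction of change is kept; the size of each gain or loss is ignored.
signs = ["+" if after > before else "-" for before, after in pairs if after != before]
n = len(signs)          # ties (no change) are dropped
plus = signs.count("+")
minus = n - plus

# Under H0 a plus and a minus are equally likely, so the number of plus
# signs follows a Binomial(n, 0.5) distribution. Two-sided exact p-value:
k = min(plus, minus)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)

print("plus:", plus, "minus:", minus, "p-value:", round(p_value, 4))
```

With six gains and one loss the exact two-sided p-value is 0.125, so at the 5% level the null hypothesis is not rejected even though most scores rose. A parametric test using the actual differences could reach a different conclusion, which illustrates the lower power noted above.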
Difference between Parametric & Non-parametric Tests:-
Compare the sample value of the statistic as obtained in previous step with the critical value
at a given level of significance and make the decision.
Chi-square test for goodness of fit
The hypothesis to be tested in this case is:
H0 : Probabilities of the occurrence of events E1, E2, ..., Ek are given by the specified probabilities
p1, p2, ..., pk
H1 : Probabilities of the k events are not the pi stated in the null hypothesis.
The procedure has already been explained.
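A small Python sketch of the goodness-of-fit calculation is given here. The data are hypothetical (60 rolls of a die, with H0 stating that each face has probability 1/6), and the critical value is the standard tabulated chi-square value for 5 degrees of freedom at the 5% level of significance.

```python
# Hypothetical data: 60 rolls of a die; H0 says each face has probability 1/6.
observed = [8, 12, 10, 9, 11, 10]
probs = [1 / 6] * 6                      # p1, ..., pk specified by H0

n = sum(observed)
expected = [n * p for p in probs]        # expected frequency of each event under H0

# Chi-square statistic: sum over events of (O - E)^2 / E.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                   # k - 1 degrees of freedom

CRITICAL_5PCT_DF5 = 11.070               # tabulated value, alpha = 0.05, 5 d.f.
reject_h0 = chi_square > CRITICAL_5PCT_DF5

print("chi-square:", round(chi_square, 3), "d.f.:", df, "reject H0:", reject_h0)
```

The sample value of about 1.0 is far below the critical value, so these hypothetical data give no reason to reject the specified probabilities.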
Chi-square test for independence of variables
The chi-square test can be used to test the independence of two variables each having at least two
categories. The test makes use of contingency tables, also referred to as cross-tabs, with the cells
corresponding to a cross-classification of attributes or events. A contingency table with three rows
and four columns (as an example) is as shown below.
Assuming that there are r rows and c columns, the count in the cell corresponding to the ith row
and the jth column is denoted by Oij, where i = 1, 2, ..., r and j = 1, 2, ..., c. The total for row i is
denoted by Ri whereas that corresponding to column j is denoted by Cj. The total sample size is
given by n, which is also the sum of all the r row totals or the sum of all the c column totals.
The hypothesis test for independence is:
H0 : Row and column variables are independent of each other.
H1 : Row and column variables are not independent.
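The hypothesis is tested using a chi-square test statistic for independence given by:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{n}$$
where Eij is the count expected in the cell for row i and column j if H0 is true.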
For a given level of significance α, the sample value of the chi-square is compared with the critical
value for (r – 1)(c – 1) degrees of freedom to make a decision.
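The test of independence can be sketched in Python by reusing the income and purchase-intention counts from the cross-tabulation example earlier in this chapter. Expected counts are computed from the row and column totals as Ri × Cj / n, and the critical value is the standard tabulated chi-square value for 1 degree of freedom at the 5% level; the variable names are choices made for this example.

```python
# Observed counts Oij: rows = low / high purchase intention,
# columns = low / high income.
observed = [
    [120, 60],
    [80, 190],
]

row_totals = [sum(row) for row in observed]         # Ri
col_totals = [sum(col) for col in zip(*observed)]   # Cj
n = sum(row_totals)                                 # total sample size

# Expected count in each cell if the two variables are independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)

CRITICAL_5PCT_DF1 = 3.841                           # tabulated value, alpha = 0.05, 1 d.f.
reject_h0 = chi_square > CRITICAL_5PCT_DF1

print("expected:", expected)
print("chi-square:", round(chi_square, 2), "d.f.:", df, "reject H0:", reject_h0)
```

The sample chi-square of 60 greatly exceeds the critical value of 3.841, so for these counts the hypothesis that purchase intention is independent of income is rejected.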
A research report is considered a major component of any research study, as the research remains
incomplete until the report has been presented or written. No matter how good a research study is and how
meticulously it has been conducted, the findings of the research are of little value unless
they are effectively documented and communicated to others. The research results must invariably enter
the general store of knowledge. Writing a report is the last step in a research study and requires a set of
skills somewhat different from those called for in actually conducting the research.
Provides a framework for the work that can be conducted in the same or related
areas
• Clear, concise and accurate
• Easy for the audience to understand
• Appropriate for the audience
• Well organised with clear section headings
Purpose of Research Report:-
• It is a formal statement of the research process and its results.
• It narrates the problem studied, the method used for studying it, and the findings and conclusions of the study.
Characteristics of a Report:-
• It is a narrative but authoritative document on the outcome of a research effort.
• It presents highly specific information for a clearly designated audience.
• It is non-persuasive as a form of communication.
• It is a simple, readable and accurate form of communication.
Technical Report/Thesis:-
• It is a comprehensive, full report of the research process and its outcome.
• It is meant for the academic community.
• It is a formal, long report covering all the aspects of the research process: statement of the
problem, objectives, methods and techniques used, sampling, field and other research procedures,
sources, tools and methods, data processing, analysis, findings, conclusions and suggestions.
Popular Report:-
• This type of report is designed for an audience of executives/administrators and other non-technical
users.
• The reader is less concerned with methodological details but more interested in studying quickly
the major findings and conclusions.
Interim Report:-
• When there is a long time lag between data collection and the presentation of the results in the case of a
sponsored project, the study may lose its significance and usefulness and the sponsor may also lose
interest in it.
• One of the most effective ways to avoid such eventualities is to present an interim report.
• It is intended to last for only a short time until something concrete is found.
• The interim report contains a narration of what has been done so far and what its outcomes were. It enables
the sponsoring agency to take action without waiting for the full report.
Summary Report:-
• A summary report is generally prepared for the consumption of the lay audience, namely the general
public.
• The preparation of this type of report is desirable for any study whose findings are of general
interest.
• It is written in non-technical language with a liberal use of pictorial charts.
• It just contains a brief reference to the objective of the study, its major findings and their implications.
• It is a short report of two or three pages.
• Its size is so limited as to be suited for publication in daily newspapers.
Research Abstract:-
• This is a short summary of the technical report.
• It is usually prepared by a doctoral student before submitting his thesis.
• Its copies are sent by the University along with the letters of request to the
examiners invited to evaluate the thesis.
• It contains a brief presentation of the statement of the
problem, the objectives of the study, methods and techniques used and an overview of the report.
• A brief summary of the results of the study may also be added.
• This abstract is primarily meant for enabling
the examiner-invitees to decide whether the study belongs to the area of their specialization and interest.
Report structure:-
Preliminary Section
▪ Title Page
▪ Letter of Authorization
▪ Executive Summary
▪ Acknowledgements
▪ Table of Contents
Background Section
▪ Problem Statement
▪ Study Introduction & Background
▪ Scope & Objectives of the Study
▪ Review of Literature
Methodology Section
▪ Research Design
▪ Sampling Design
▪ Data Collection
▪ Data Analysis
Findings Section
▪ Results
▪ Interpretation of Results
Conclusions Section
▪ Conclusion & Recommendations
▪ Limitations of the Study
Appendices
Glossary of terms
Bibliography
Anyone reading the research report must be told enough about the study
to place it in its general scientific context, judge the adequacy of its methods and thus
form an opinion of how seriously the findings are to be taken. For this purpose a
proper layout of the report is needed. The layout of the report refers to what the research report should
contain. A comprehensive layout of the research report should comprise preliminary pages, the
main text and the end matter. Let us deal with them separately.
Preliminary Pages-
In its preliminary pages the report should carry a title and date, followed by acknowledgements in
the form of ‘Preface’ or ‘Foreword’. Then there should be a table of contents followed by a list of
tables and illustrations so that the decision-maker or anybody interested in reading the report can
easily locate the required information in the report.
Main Text
The main text provides the complete outline of the research report along with all details. The title of
the research study is repeated at the top of the first page of the main text, and then follow the other
details on pages numbered consecutively, beginning with the second page. Each main section of
the report should begin on a new page. The main text of the report should have the following
sections:
1. Introduction
2. Statement of findings and recommendations
3. The results
4. The implications drawn from the results; and
5. The summary.
1. Introduction: The purpose of introduction is to introduce the research project to the readers.
It should contain a clear statement of the objectives of research i.e., enough background
should be given to make clear to the reader why the problem was considered worth
investigating. A brief summary of other relevant research may also be stated so that the
present study can be seen in that context. The hypotheses of study, if any, and the
definitions of the major concepts employed in the study should be explicitly stated in the
introduction of the report.
The methodology adopted in conducting the study must be fully explained. The scientific
reader would like to know in detail about such things: How was the study carried out? What
was its basic design? If the study was an experimental one, then what were the experimental
2. Statement of findings and recommendations: After introduction, the research report must
contain a statement of findings and recommendations in non-technical language so that it
can be easily understood by all concerned. If the findings happen to be extensive, at this
point they should be put in the summarised form.
3. Results: A detailed presentation of the findings of the study, with supporting data in the
form of tables and charts together with a validation of results, is the next step in writing the
main text of the report. This generally comprises the main body of the report, extending
over several chapters. The results section of the report should contain statistical summaries
and reductions of the data rather than the raw data. All the results should be presented in
logical sequence and split into readily identifiable sections. All relevant results must
find a place in the report. But how one is to decide what is relevant is the basic
question. Quite often guidance comes primarily from the research problem and from the
hypotheses, if any, with which the study was concerned. But ultimately the researcher must
rely on his own judgement in deciding the outline of his report. Nevertheless, it is still
necessary that he states clearly the problem with which he was concerned, the procedure
by which he worked on the problem, the conclusions at which he arrived, and the bases for
his conclusions.
4. Implications of the results: Toward the end of the main text, the researcher should again
put down the results of his research clearly and precisely. He should state the implications
that flow from the results of the study, for the general reader is interested in the implications
for understanding human behaviour. Such implications may have three aspects as stated
below:
▪ A statement of the inferences drawn from the present study which may be expected
to apply in similar circumstances.
▪ The conditions of the present study which may limit the extent of legitimate
generalizations of the inferences drawn from the study.
▪ The relevant questions that still remain unanswered or new questions raised by the
study along with suggestions for the kind of research that would provide answers
for them. It is considered a good practice to finish the report with a short conclusion
which summarises and recapitulates the main points of the study. The conclusion
drawn from the study should be clearly related to the hypotheses that were stated
in the introductory section. At the same time, a forecast of the probable future of
the subject and an indication of the kind of research which needs to be done in that
particular field is useful and desirable.
5. Summary: It has become customary to conclude the research report with a very brief summary,
restating in brief the research problem, the methodology, the major findings and the major
conclusions drawn from the research results.
End Matter
At the end of the report, appendices should be listed for all technical data such as
questionnaires, sample information, mathematical derivations and the like. A bibliography of
sources consulted should also be given. An index (an alphabetical listing of names, places and topics
along with the numbers of the pages in a book or report on which they are mentioned or discussed)
should invariably be given at the end of the report. The value of the index lies in the fact that it works
as a guide to the reader for the contents of the report.