
RESEARCH METHODOLOGY NOTES
1. AN OVERVIEW OF RESEARCH METHODOLOGY

Is Research Science or Art?

• Objectivity of Investigator
– Unbiased
– Procedural integrity
– Accurate reporting

• Accuracy of Measurement
– Valid and Reliable
– Meaningful and useful
– Appropriate design (sample, execution)

• Open-minded to Findings
– Willing to refute expectations
– Acknowledge limitations

Research is the art of scientific investigation:

• A movement from the known to the unknown

• A voyage of discovery

Objectives of Research
1. To achieve new insights into a phenomenon (exploratory/formulative research).
2. To portray accurately the characteristics of a particular individual, situation, or group (descriptive research).
3. To determine the frequency with which something occurs or with which it is associated with something else (diagnostic research).
4. To test a hypothesis of a causal relationship between variables (hypothesis-testing research).

Why Research?
• Taking the challenge to solve an unsolved problem.
• Desire to get intellectual satisfaction of doing some creative work.
• Desire to get a research degree.
• Desire to move up the career ladder in academic institutions.
• Desire to be of service to society.

Significance of Research
1. Research inculcates scientific and inductive thinking and promotes the development of logical habits of thinking.
2. Research provides the basis for nearly all government policies in our economic system.
3. It helps to solve various operational and planning problems of business and industry (market research, operations research, demand forecasting).

Conceptualizing the Research


• Curiosity and intuition play an important role

• What concept or puzzling phenomenon is interesting?

The Research Process

[Circular process diagram, redrawn as a list:]
(1) Identifying the research problem/opportunity
(2) Determining the research design
(3) Determining the data collection method
(4) Designing the data collection forms
(5) Designing the sample and collecting the data
(6) Analyzing the data
(7) Preparing and presenting the report
(8) Follow-up

Key Properties of Research

Validity - Have you measured (or observed) what you think you have? Were the instruments
used suitable for purpose? Have you adequately and faithfully captured the ‘state of affairs’?

Reliability
Even if the methods are valid, can we be sure that the data are consistent and a true
reflection of the phenomena under study?

Replicability
This is essential in scientific work; it means that the work has been done and described in such a way that it is repeatable. In social science exact replication is often impossible, but similar studies add to the weight of evidence.

Generalisability
Are the findings generally applicable, for example to other contexts, situations, times, or
persons other than the sample?
Establishing Credibility
Credibility is a property of good research. It requires:

• Care and attention in planning and conducting the research.
• Care and attention in writing it up in such a way that readers have confidence in the integrity of the work.

Research reputations are established by repeatedly carrying out interesting and worthwhile work that is consistently methodologically strong and accurately reported.

Approaches to the Research: Deductive and Inductive

Deductive Approach:

Deductive reasoning works from the more general to the more specific. It is a "top-down"
approach. We might begin with thinking up a theory about our topic of interest. We then
narrow that down into more specific hypotheses that we can test. We narrow down even
further when we collect observations to address the hypotheses. This ultimately leads us to
be able to test the hypotheses with specific data -- a confirmation (or not) of our original
theories.

Inductive Approach:
Inductive reasoning works the other way, moving from specific observations to broader
generalizations and theories. This is a "bottom-up" approach. In inductive reasoning, we:

• begin with specific observations and measures,
• detect patterns and regularities,
• formulate some tentative hypotheses that we can explore, and
• finally end up developing some general conclusions or theories.

Types of Research
1. Descriptive (Ex-post facto research) Vs. Analytical (Critical Evaluation of the
material).
2. Applied (Action) Vs. Fundamental (Basic or Pure).
3. Quantitative (Inferential/experimental/ simulation) Vs. Qualitative.
4. Conceptual (abstract idea or theory) Vs. Empirical (Experience or observations based
on data)
5. Longitudinal Research (Over a time period such as clinical or diagnostic research) Vs.
Laboratory or Simulation Research.

Quantitative and Qualitative Research


Quantitative Research
is an inquiry into an identified problem, based on testing a theory, measured with numbers,
and analyzed using statistical techniques. The goal of quantitative methods is to determine
whether the predictive generalizations of a theory hold true. All quantitative research
requires a hypothesis before research can begin.

Qualitative Research
By contrast, a study based upon a qualitative process of inquiry has the goal of
understanding a social or human problem from multiple perspectives. Qualitative research
is conducted in a natural setting and involves a process of building a complex and holistic
picture of the phenomenon of interest. In qualitative research, a hypothesis is not needed
to begin research.

• Quantitative Research

In quantitative research, the researcher is ideally an objective observer who neither participates in nor influences what is being studied.

• Qualitative Research

In qualitative research, however, it is thought that the researcher can learn the most by participating in and/or being immersed in the research situation.

Characteristics of quantitative and qualitative research


Quantitative | Qualitative
Objective | Subjective
Research questions: How many? Strength of association? | Research questions: What?
"Hard" science | "Soft" science
Literature review must be done early in the study | Literature review may be done as the study progresses or afterwards
Tests theory | Develops theory
One reality: focus is concise and narrow | Multiple realities: focus is complex and broad
Facts are value-free and unbiased | Facts are value-laden and biased
Reduction, control, precision | Discovery, description, understanding, shared interpretation
Measurable | Interpretive
Mechanistic: parts equal the whole | Organismic: whole is greater than the parts
Reports statistical analysis; basic element of analysis is numbers | Reports rich narrative, individual interpretation; basic element of analysis is words/ideas
Researcher is separate | Researcher is part of the process
Subjects | Participants
Context-free | Context-dependent
Hypotheses | Research questions
Reasoning is logistic and deductive | Reasoning is dialectic and inductive
Establishes relationships, causation | Describes meaning, discovery
Uses instruments | Uses communications and observation
Strives for generalization | Strives for uniqueness
Generalizations leading to prediction, explanation, and understanding | Patterns and theories developed for understanding
Highly controlled setting: experimental setting (outcome oriented) | Flexible approach: natural setting (process oriented)
Sample size: n | Sample size is not a concern; seeks an "information-rich" sample

Which one to choose?


Choose a more quantitative method when most of the following conditions apply:

 The research is confirmatory rather than exploratory, i.e. this is a frequently researched topic, and (numerical) data from earlier research are available.

 You are trying to measure a trend (almost impossible with qualitative research).

 There is no ambiguity about the concepts being measured, and only one way to measure each concept.

 The concept is being measured on a ratio or ordinal scale.

And choose a qualitative method when most of these conditions apply:

 You have no existing research data on this topic.

 The most appropriate unit of measurement is not certain (individuals? households? organizations?).

 The concept is assessed on a nominal scale, with no clear demarcation points.

 You are exploring the reasons why people do or believe something.

One extreme example:

 You are studying the trends in weather in the town where you live. There aren't many
variables: temperature ranges, wind speed, rainfall, barometric pressure, and perhaps
a few others. Most of the variables are measured mechanically, and a lot of historical
data exists. You wouldn't even consider doing qualitative research on this.

Research Methods vs Methodology

Research Methods:
• Methods of data collection.
• Statistical methods used for establishing relationships between the data and the unknowns.
• Methods used to evaluate the accuracy of the results obtained.

Research Methodology:
• The research methods themselves, plus
• Consideration of the logic behind the methods we use.

Research Process
Series of actions or steps necessary to effectively carry out research and the desired
sequencing of these steps.

A. Formulating the Research Problem:


• Understanding the problem thoroughly, and

• Rephrasing the same into meaningful terms from an analytical point of view.

• Studies of broader literature (Conceptual and empirical)
This stage is important because:
1. The research problem needs to be defined unambiguously.
2. It guides the collection of relevant data, the choice of research methods, etc.

B. Extensive Literature Survey:


1. Abstracting and indexing journals, and published or unpublished bibliographies.
2. Identifying relevant academic journals, conference proceedings, government reports, books, etc.
C. Development of Working Hypothesis:
• Tentative assumptions made in order to draw out and test their logical or empirical consequences.
• It represents prior thinking about the subject.

D. Preparing the Research Design – Exploration / Description / Diagnosis /


Experimentation.
Research design needs consideration of the following:
• The means of obtaining the information
• The availability and skills of the researcher
• Time available
• Cost factor

E. Determining the sample design – Probability / Non-probability.

Probability:
Simple Random Sampling
Systematic Sampling
Stratified Random Sampling
Cluster/Area Sampling
Multi-stage Sampling

Non-Probability / Purposive /Deliberate sampling:


Convenience Sampling
Judgment Sampling
Quota Sampling

F. Collecting the Data: Collection of only appropriate data

Primary Data -
By observation, through personal interviews, telephone interviews, and by mailing of questionnaires.

Secondary Data

G. Analysis of Data:
1. Computation of statistics, viz. mean, median, mode, standard deviation, coefficient of variation, coefficient of skewness, etc. (see the sketch below).
2. Designing a regression equation for estimating the dependent variable as a function of a set of independent variables.
3. Performing correlation analysis.
4. Factor, discriminant, and conjoint analysis.
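A minimal Python sketch of the computations in item 1 (the data values here are hypothetical):

    # Basic descriptive statistics for a small, hypothetical data set.
    import statistics

    data = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]

    mean = statistics.mean(data)
    median = statistics.median(data)
    mode = statistics.mode(data)
    sd = statistics.stdev(data)     # sample standard deviation
    cv = sd / mean * 100            # coefficient of variation, in percent

    print(mean, median, mode, round(sd, 2), round(cv, 1))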

H. Hypotheses Testing:

I. Interpretation of Results:

J. Validation of Results: The results after interpretation must be validated by using


past data. It ensures credibility of the results.

K. Report Writing

2. RESEARCH DESIGN

RESEARCH DESIGN
It is a conceptual structure within which research is conducted. It constitutes the blueprint for the collection, measurement and analysis of data.

The research design addresses the following questions:


1. What is the study about?
2. Why is the study being done?
3. Where will the study be carried out?
4. What type of data is required?
5. Where can the required data be found?
6. What periods of time will the study include?
7. What will be the sample design?
8. What techniques of data collection will be used?
9. How will the data be analyzed?
10. Style of Report.

Thus the Essentials of the Research Design are:


• The design is an activity and time-based plan.
• The design is always based on the research question.
• The design guides the selection of the sources and types of information.
• The design is a framework for specifying the relationships among the study’s
variables.
• The design outlines procedures for every research activity.

Research Design | Exploratory/Formulative Study | Descriptive/Diagnostic Study
Overall design | Flexible (for considering different aspects of the problem) | Rigid design
Sampling design | Non-probability (purposive or judgment sampling) | Probability (random sampling)
Statistical design | No pre-planned design for analysis | Pre-planned design for analysis
Observational design | Unstructured instruments for collection of data | Structured or well-thought-out instruments for collection of data
Operational design | No fixed decisions about the operational procedures | Advance decisions about operational procedures

Determining Research Design


• Exploratory Research: collecting information in an unstructured and informal
manner.

• Descriptive Research: refers to a set of methods and procedures describing marketing variables.

• Causal Research (experiments and other approaches): allows isolation of causes and effects via use of experiments or surveys.

Components of Research Design


A. Sampling Designs.
Probability: Simple Random Sampling, Systematic Sampling, Stratified Random
Sampling, Cluster Sampling, Multi-stage Sampling

Non-Probability: Convenience Sampling, Judgment Sampling, Quota Sampling, Snowball Sampling

B. Statistical Designs (Sample size, collection and analyses of data).

C. Operational Designs (Techniques by which the procedures specified in the sampling, statistical and observational designs can be carried out).

Features of a Good Design


1. Generally, the design which minimizes biases and maximizes the reliability of the
data collected and analyzed is considered a good design.

2. A good research design involves consideration of the following factors:


a. The means of obtaining information
b. The availability and skills of the researcher and the staff,
if any
c. The objective of the problem to be studied
d. The nature of the problem to be studied.
e. The availability of time and money for the research work.

Important Concepts Relating to Research Design


1. Dependent and independent variables.

2. Extraneous variables: independent variables that are not related to the purpose of the study but may affect the dependent variable are termed extraneous variables.

3. Control: the technical term used when we design the study to minimize the effects of extraneous independent variables.

4. Confounded relationships: when the dependent variable is not free from the influence of extraneous variables.

5. Research hypotheses: a research hypothesis is a predictive statement that relates an independent variable to a dependent variable. Predictive statements which are assumed but not to be tested are not termed research hypotheses.

6. Experimental and non-experimental hypothesis-testing research: for example, comparing mothers aged 15-45 with the overall group above 15 years.

7. Experimental and control groups: for example, using a dummy variable for caste/religion.

3. METHODS OF DATA COLLECTION

Methods of Data Collection


• Primary data: information that is developed or gathered by the researcher
specifically for the research project at hand

• Secondary data: information that has previously been gathered by someone other
than the researcher and/or for some other purpose than the research project at hand

Classification of Secondary Data


• Internal secondary data: data that have been collected within the firm

• Internal databases: databases (collection of data and information describing items


of interest) consisting of information gathered by a company typically during the
normal course of business transactions

• External secondary data: data obtained from outside the firm. Types:

• Published: sources of information prepared for public distribution and found in libraries or a variety of other entities

• Syndicated Services Data: data provided by firms that collect data in a standard format and make them available to subscribing firms

• External Databases: databases provided by outside firms; many are now available online (online information databases):
• Bibliographic databases: citations by subject
• Numeric or statistical databases, e.g. the 2001 Census
• Government reports, other studies
• Directory or list databases
• Comprehensive databases: contain all of the above

Advantages of Secondary Data


• Obtained quickly (compared to primary data gathering)
• Inexpensive (compared to primary data gathering)
• Usually available
• Enhances existing primary data

Limitations of Secondary Data


• Exact data that one may need may not be available.
• May have difficulty in getting access.
• Errors in the database.
• Possible coding problems.
• Data may be available but may have problems:
Missing or incomplete data.
Unknown definitions of data.
Changed definitions or procedures.
Might be too aggregated.

Evaluating Secondary Data


• What was the purpose of the study?
• Who collected the information and when was this done?
• What information was collected (questions, scales, etc.)?
• How was the information obtained (sampling frame, method of sample draw,
communication method, resulting sample, etc.)?
• How consistent is the information with other published information?

Locating Secondary Data Sources


• Step 1:Identify what you wish to know and what you already know about your
topic.
• Step 2: Develop a list of key words and names.
• Step 3: Begin your search using several library and Web sources.
• Step 4: Compile the literature you have found and evaluate your findings.

Primary Data Collection


Methods of Primary Data: Observation, In-depth Techniques, Experimentation, Surveys

Types of Observations
Structured (Descriptive)
Unstructured (Exploratory)
Participant (Anthropological)
Non-Participant (Political forecasts)
Disguised Participation (Presence of the observer is hidden)

In-Depth Techniques:
Focus groups Interviews
Interviews: Personal, Telephonic, Focused, Non-Directive
Projective Techniques

Primary Data Collection Methods


1. Observation
 Human or physical observation includes mystery shopping, cameras in store,
watching children handle toys, etc.
 Ethnography – watching behaviors in the consumers’ natural setting
 Mechanical or electronic observation using Nielsen people meter, eye tracking
devices, or using software to track behaviors on the Web, etc.
Limitations:
A. Expensive method in terms of time and money
B. Limited Information.
C. Interference of unforeseen factors

Merits of interview method

• It provides greater in-depth information.
• Can overcome resistance through persuasion.
• Personal information can be sought.
• Low non-response.
• Can secure the most spontaneous reactions.
• Adaptability of the language to the level of the interviewee.
• Can collate supplementary information which may be of great value in interpreting the results.
• Interviewer can clarify unclear questions.
• Literacy is not required.
• Interviewer can collect more complex answers and observations.
• Interviewer can minimize missing and inappropriate responses.
• Interviewer can prevent the respondent from answering out of sequence.

Interview method is most useful when:


• Other methods do not make sense.
• When the issues are complex and in-depth understanding is needed.
• When the issues and questions are still being determined.

Pre-requisites of interviewing

• Careful selection, training, and briefing of the interviewer.


Must ask questions properly and intelligently.
Must answer legitimate questions of the interviewee.
Should not show surprise or disapproval
Must discourage irrelevant conversation.
• Interviewer must possess the technical competence and necessary practical
experience.
• Occasional field checks.

Guidelines for successful Interviewing:


1) Choose the time when the interviewee is at ease.
2) Approach must be friendly and informal.
3) Establish Rapport with the interviewee.
- People are motivated to communicate when atmosphere is favorable.
4) Listen with understanding, respect and curiosity.
5) Control the course of the interview and avoid irrelevant conversation.

Demerits (of the telephone interview):
1. No thinking space for the interviewee.
2. The survey is restricted to those who have telephone facilities.
3. Unsuitable for intensive surveys where comprehensive answers are required.
4. Greater possibility of bias.

Limitations of the interview method


• Possibility of data collection and interpretation biases.
• Time-consuming when the sample is large.
• May introduce systematic errors.
• Lack of proper rapport with the interviewee.

Through mailed questionnaires

Merits:
Low cost.
Free from the bias of the interviewer.
Enough thinking space.
Can reach otherwise inaccessible people.
Sample can be larger.

Demerits:
Low rate of return.
Only educated and cooperating people can be approached.
Difficulty in modifying the approach once the questionnaire is sent.
Possibility of ambiguous replies/omission of questions.
The method is likely to be the slowest of all.

Important considerations while framing a questionnaire

A) General form (closed-ended/open-ended).


Question sequence:
The first few questions are important because they are likely to influence the attitude of the respondent and secure the desired cooperation.

• First questions should “break the ice”


• General to specific order of questions
• Questions on personal or sensitive topics left towards the end
• Avoid a series of questions that are likely to elicit the same response (bias)
• One question can affect another
• Questions should be easily understood and should be simple
• There should always be provision for indications of uncertainty. e.g." Don't know”
“No preference”

Questionnaire Design: General Principles

Open-ended vs closed-ended questions:

Open-ended questions generate answers that are more nuanced and information-rich.

They permit subjects the freedom to answer the question in their own words (without pre-specified alternatives).
Open-ended questions do not provide respondents with any answers from which to choose.

Open-ended Questions: Advantages and Disadvantages

– Advantages:
• Respondents are not forced to choose between categories
• May better reflect respondents' thoughts/beliefs
• Appropriate when the list of possible answers is excessive
• Lets the respondent have the say and tell the researcher what he means, not vice versa (can obtain unanticipated answers)

– Disadvantages:
• Respondent may say too much or too little
• May provide incomplete or unintelligible answers
• Flexibility in responses is difficult to code and analyze; interpretations of answers may vary
• Too much variance in responses
• Expensive and time-consuming

Closed-ended Questions
Closed-ended questions provide respondents with a list of responses from which to choose.
Alternatively, closed-ended questions can provide multiple choices for the respondent to
accept or reject

Closed-ended Questions: Advantages and Disadvantages

- Advantages:
• Easy to answer and takes little time
• Answers can be precoded (assigned a number) and easily transferred to a computer
• Answers are easy to compare
• Easier to elicit responses to sensitive questions
• Answers are more reliable
• The meaning of responses is clearer to the researcher

– Disadvantages:
• May not be accurate: forces people to accept categories, or puts too many people into the "other" category
• Answers are relative to the response scale provided
• Respondent's preferred choice may not be among the listed alternatives
• The choices listed communicate the kind of response wanted
• Wording of response choices may influence responses

Difference between a questionnaire and a schedule

1) Questionnaires are sent through mail to the informants while schedules are filled in either
by the researcher himself or by the enumerators who are specially appointed for the
purpose.
2) Questionnaire is relatively cheap, but data collection through schedules is expensive.
3) Non-response is high in case of a questionnaire.
4) In case of a questionnaire, identity of the person who has actually filled in may be
unknown as he/she might be doing it on behalf of someone else.
5) Questionnaire method is slow as many respondents may not return the filled in response
in time.
6) Personal contact is not possible in case of questionnaires.
7) Questionnaire method can be used only when respondents are literate and cooperative.
8) Coverage with a questionnaire can be wider and cheaper.

9) Risk of collecting incomplete and wrong information is relatively more under the
questionnaire method particularly when people are unable to understand questions properly.
10) Observation method can also be used along with the schedules but it is not possible with
the questionnaire.

4. BASIC STATISTICAL MEASURES

The data of a given situation must be characterized by some statistical measure for the purpose of estimation, comparison with similar data, or making inferences about the population to which the sample belongs.

Statistical measures can be classified into:

1. Measures of Central tendencies


2. Measures of Variation
3. Measures of Skewness
4. Measures of Kurtosis
5. Time series
6. Correlation
7. Regression

1. Measures of Central Tendency: Following are the measures of central tendency.


(a) Arithmetic Mean
(b) Weighted Arithmetic Mean
(c) Median
(d) Mode
(e) Geometric Mean
(f) Harmonic Mean

2. Measures of Variation are


1. Range and Coefficient of range
2. Quartile deviation and Average Deviation
3. Standard Deviation
4. Coefficient of Variation

3. Measures of Skewness: The shape of a distribution is another characteristic of concern. The shape describes how the frequencies of observations are distributed, and is summarized by the coefficient of skewness, which ranges from -1 to +1. If the coefficient is zero the distribution is symmetrical. If it is positive the distribution is positively skewed, and then Mean > Median > Mode. If the coefficient is negative the distribution is negatively skewed, and then Mean < Median < Mode.

Measures of Skewness:
1. Karl Pearson's coefficient of skewness = (Mean - Mode) / Standard deviation.
Using the empirical relation Mode = 3(Median) - 2(Mean), this becomes
CS = 3(Mean - Median) / Standard deviation.
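A minimal Python sketch of both forms of the coefficient (the observations are hypothetical):

    # Karl Pearson's coefficient of skewness, computed both ways.
    import statistics

    x = [2, 3, 3, 4, 5, 6, 9]   # hypothetical observations

    mean = statistics.mean(x)
    median = statistics.median(x)
    mode = statistics.mode(x)
    sd = statistics.stdev(x)

    cs_mode = (mean - mode) / sd            # (Mean - Mode) / st. deviation
    cs_median = 3 * (mean - median) / sd    # 3(Mean - Median) / st. deviation

    # Both values are positive here, indicating positive skew
    # (Mean > Median > Mode); zero would indicate symmetry.
    print(round(cs_mode, 3), round(cs_median, 3))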

4. Kurtosis: Even if we know the measures of central tendency, variation and skewness, we still cannot form a complete idea about a distribution. We should also know the convexity of the distribution or of the frequency curve, i.e. its kurtosis. Kurtosis gives an idea of the flatness or peakedness of the frequency curve.
It is measured by the coefficient β2 (the ratio of the fourth central moment to the square of the second, μ4/μ2²) or its derivative γ2 = β2 - 3.
The normal curve is called mesokurtic.

5. Time Series: Researchers often have to deal with quantities whose values change with time. To obtain knowledge about the nature of variation of a quantity over time, a time series can be used.
Such values can be plotted on a graph, known as a historigram of the time series.

The fluctuations may be due to:

(a) Causes which operate over a long time period


(b) Causes which operate over a short time period

These causes are segregated and this process is called analysis of time series.

The variations in the value of the variable can be analysed into the following three main components:
1. The basic or long-term trend
2. Short-term or periodic changes
3. Irregular fluctuations

Measurement of Trend:
a. Free-hand smoothing.
b. Sectional Average: the whole series is divided into a suitable number of sections and the average of each section is found. These averages are plotted against the mid-years of their sections, and a free-hand smooth curve is drawn through the points. The curve represents the trend.
c. Method of Moving Average: fluctuations due to cyclical changes are eliminated by averaging the values of the variable over a specified number of successive years; the number of years depends upon the average length of the cycle found in the series. The means are plotted and successive points are joined by straight line segments. The resulting polygonal graph indicates the trend of the given time series. (See the sketch after this list.)
d. Method of Least Squares: time deviations t are measured from the mid-year of the series, and the trend line Y = a + bt is fitted by least squares (with Σt = 0 this gives a = ΣY/n and b = ΣtY/Σt²). The trend ordinates computed from this equation, plotted against the corresponding years, give the line of best fit in the least-squares sense.
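A short Python sketch of the moving-average method in item (c); the series and the three-year window are hypothetical:

    # Three-year moving average of a hypothetical annual series.
    # The window length should match the average cycle length in the series.
    values = [120, 135, 128, 150, 162, 158, 171, 180]
    window = 3

    trend = [sum(values[i:i + window]) / window
             for i in range(len(values) - window + 1)]

    # Each mean is centred on the middle year of its window; plotting the
    # means and joining successive points traces the trend polygon.
    print([round(t, 1) for t in trend])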

Correlation: The relation between two or more characteristics of a population or a sample can be studied with the help of a statistical method called correlation. If two quantities vary in a related manner, so that an increase or decrease in one tends to be accompanied by a movement in the same or in the opposite direction in the other, the two quantities are said to be correlated. Correlation may be positive or negative, and perfect or imperfect.

Methods: 1. Graphic method; 2. Scatter diagram; 3. Coefficient of correlation.

The coefficient of correlation is a numerical measure of correlation.

1. Karl Pearson's coefficient of correlation, also called the product moment correlation
2. Spearman's rank correlation

A test of significance is done for both measures by using the t-test.
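A minimal sketch of both measures in Python using scipy.stats, on hypothetical paired data; pearsonr and spearmanr also return the p-value of the associated significance test:

    # Product moment and rank correlation with significance tests.
    from scipy import stats

    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2, 1, 4, 3, 7, 5, 8, 9]

    r, p_r = stats.pearsonr(x, y)        # Karl Pearson's coefficient
    rho, p_rho = stats.spearmanr(x, y)   # Spearman's rank correlation

    print(round(r, 3), round(p_r, 4))
    print(round(rho, 3), round(p_rho, 4))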

The Coefficient of Determination measures the variation explained by the independent variable. It is the ratio of explained variation to total variation.

Regression Analysis: Regression means stepping towards the average. Regression is the dependence of a variable on one or more other variables.

Y = a + bX + u is the linear form of the regression equation.

Y = a + b1X1 + b2X2 + ... + bnXn + u is an example of a multiple regression equation.

Here u is a random error term. To fit the straight line we apply the method of least squares: to estimate a and b we minimize the sum of the squared errors Σui², which leads to the normal equations whose solution gives the estimates.

Coefficient of Determination: In order to measure the strength of the relationship between the dependent and independent variable(s) we calculate the statistic called the coefficient of determination (r² for simple regression, R² for multiple regression).

This measure is developed on the basis of two levels of variation:
1. the variation of the Y values around the fitted regression line, given by Σ(Y - Ŷ)², and
2. the variation of the Y values around their own mean, given by Σ(Y - Ȳ)², where Ȳ is the mean of Y.

Then r² (or R²) = 1 - Σ(Y - Ŷ)² / Σ(Y - Ȳ)² = 1 - Σe² / Σ(Y - Ȳ)².

The value of r²/R² shows the goodness of fit of the regression equation: the higher the value, the closer the fit; the lower the value, the poorer the fit.
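A self-contained Python sketch that fits Y = a + bX by least squares and computes r² exactly as defined above (the data are hypothetical):

    # Simple linear regression by least squares, with r-squared.
    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Least-squares estimates of b and a.
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x

    fitted = [a + b * xi for xi in x]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # sum of e^2
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)               # about the mean
    r2 = 1 - ss_res / ss_tot

    print(round(a, 3), round(b, 3), round(r2, 4))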

5. SAMPLING METHODS

Sampling is the process of selecting a randomized subset of the members of the population of a study and collecting data about their attributes. Based on the data of the sample, the analyst draws inferences about the population.

Advantages of Sampling:
(1) Less time taken to collect data
(2) Less cost for data collection
(3) Physical impossibility of complete enumeration necessitates sampling
(4) Greater accuracy of the collected data, owing to the limited size of the sample.

Sampling Frame: The complete list of all the members/units of the population from which
each sampling unit is selected is known as sampling frame. It should be free from error.

Sampling Methods:

Sampling methods are divided into two categories:


(a) Probability Sampling
(b) Non-Probability Sampling

Probability Sampling: In probability sampling each unit of the population has a known probability of being selected as a unit of the sample. This probability varies from one method of probability sampling to another.

In non-probability sampling there may be instances that certain units of population will have
zero probability of selection, because judgment biases and convenience of the interviewer
are considered to be the criteria for the selection of sample units.

Probability Sampling Methods:


(1) Simple Random sampling
(2) Systematic Sampling
(3) Stratified Sampling
(4) Cluster Sampling
(5) Multistage Sampling

(1) Simple Random Sampling: Let N = number of units in the population and n = number of units in the sample, where n < N.
There are two ways of performing SRS: (a) with replacement and (b) without replacement.

SRS with replacement: each unit of the population has an equal probability of being selected at every draw:
Probability of selection = 1/N
Selection is done by using random number tables.

SRS without replacement: the probability of selection changes from draw to draw:
Probability at the first draw = 1/N
Probability at the second draw = 1/(N-1)
...
Probability at the nth draw = 1/(N-(n-1))

Units are selected from the population based on the respective probabilities, for example using Monte Carlo simulation.
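A minimal Python sketch of both variants; random draws stand in for the random number tables and Monte Carlo selection described above (the population labels are hypothetical):

    # Simple random sampling from a population of N = 100 labelled units.
    import random

    population = list(range(1, 101))

    # Without replacement: successive draws have probabilities
    # 1/N, 1/(N-1), ..., 1/(N-(n-1)).
    sample_wor = random.sample(population, 10)

    # With replacement: every draw selects each unit with probability 1/N.
    sample_wr = random.choices(population, k=10)

    print(sample_wor)
    print(sample_wr)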

(2) Systematic Sampling: This is a special kind of random sampling in which the selection of the first unit of the sample from the population is based on randomisation. The remaining units of the sample are selected from the population at a fixed interval, where the
sampling interval width is I = N/n.
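A sketch of systematic selection in Python (hypothetical N = 100, n = 10):

    # Systematic sampling: random start, then every I-th unit (I = N/n).
    import random

    population = list(range(1, 101))
    n = 10
    interval = len(population) // n          # I = N/n = 10

    start = random.randint(0, interval - 1)  # randomised first unit
    sample = population[start::interval]     # fixed interval thereafter

    print(sample)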

(3) Stratified Sampling: This is an improvement over simple random sampling and systematic sampling.
The population is divided into a specified set of strata such that members within a stratum have similar attributes, while members of different strata have dissimilar attributes.

(a) Proportional stratified sampling: the same proportion of units is selected from each stratum. Suitable when there is not much difference (low variance) in attributes within each stratum.
The sample n is selected such that n = n1 + n2 + ... + nk, where
N = population size, Ni = size of stratum i, ni = size of the sub-sample from stratum i, and
n1/N1 = n2/N2 = ... = nk/Nk = n/N, so that
n1 = n.N1/N, ..., nk = n.Nk/N

(b) Disproportional stratified sampling: different proportions of units are selected from each stratum, used when attributes differ and variance is high. A stratum with more variance receives proportionately more sampling units than a stratum with less variance:
ni = n.qi.si / Σ qi.si
where si is the standard deviation of stratum i and qi = Ni/N. (See the sketch below.)
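A sketch of both allocation rules in Python, for three hypothetical strata:

    # Allocating a sample of n = 100 across three strata.
    N = [500, 300, 200]    # stratum sizes (population N = 1000)
    s = [8.0, 15.0, 5.0]   # stratum standard deviations (hypothetical)
    n = 100

    total = sum(N)

    # Proportional allocation: ni = n * Ni / N.
    proportional = [round(n * Ni / total) for Ni in N]

    # Disproportional allocation: ni = n * qi * si / sum(qi * si), qi = Ni/N.
    q = [Ni / total for Ni in N]
    denom = sum(qi * si for qi, si in zip(q, s))
    disproportional = [round(n * qi * si / denom) for qi, si in zip(q, s)]

    print(proportional)      # [50, 30, 20]
    print(disproportional)   # the high-variance stratum gets more units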

(4) Cluster Sampling: The population is divided into different clusters. Members within a cluster are dissimilar in terms of their attributes, but the clusters themselves are similar to each other. Each cluster can therefore be treated as a small population possessing all the attributes of the full population. Any one cluster is selected, and all units of that cluster constitute the sample.

(5) Multistage Sampling: In a large-scale survey covering an entire nation, the sampling frame will be very large. In such a study the multistage sampling technique is used.

The entire country is divided into regions.

Stage 1: States are sampled from each region using stratified sampling. Here it is assumed that the states within a region are similar while the regions are dissimilar.

Stage 2: Cluster sampling can then be used within each selected state, treating the districts of the state as clusters.

Stage 3: In each selected district, random sampling may be used to select a proportional number of units.

II. Non-Probability Sampling Methods:

1. Convenience Sampling
2. Judgment Sampling also called Purposive Sampling
3. Quota sampling
4. Snowball Sampling

6. HOW TO CONDUCT SURVEYS

Introduction
This lecture will help you learn how to conduct a survey and design a questionnaire. Survey research is conducted in almost all areas of management, economics and the social sciences, so it is quite relevant to understand the various techniques and tools used in survey research. As faculty in management and the social sciences, you conduct market research, socio-economic evaluation studies, opinion-based studies, and policy and programme assessment studies. For all such types of studies, conducting a survey through a questionnaire becomes essential. Keeping this in view, this lecture focuses on survey research techniques and how to design a questionnaire that gets the true opinions of your sample. Questionnaires are the most common marketing research method. They are used for structured interviews, written surveys, email, and Internet surveys.
Conducting a survey is a useful way of finding something out, especially when
`human factors' are under investigation. Although surveys often investigate subjective
issues, a well-designed survey should produce quantitative, rather than qualitative, results.
That is, the results should be expressed numerically, and be capable of rigorous analysis.
Researchers quite often underestimate how difficult it is to carry out a survey well; a good
survey is more than a handful of questionnaires and a couple of bar charts: it requires careful
planning, methodical application, and detailed analysis of the results.

Methods of getting information


We are living in an information age. More information has been published in the last
decade than in all previous history. Everyone uses information to make decisions about the
future. If our information is accurate, we have a high probability of making a good decision.
If our information is inaccurate, our ability to make a correct decision is diminished. Better
information usually leads to better decisions. There are seven common ways to get information. These are: literature searches, talking with people, focus groups, personal interviews, telephone surveys, mail surveys, and email or Internet surveys.

1. A literature search involves reviewing all readily available materials. These materials
can include internal company information, relevant trade publications, newspapers,
magazines, annual reports, company literature, on-line databases, and any other published
materials.

2. Talking with people is a good way to get information during the initial stages of a
research project. It can be used to gather information that is not publicly available, or that is
too new to be found in the literature. Examples might include meetings with prospects,
customers, suppliers, and other types of business conversations at trade shows, seminars, and
association meetings. Although often valuable, the information has questionable validity
because it is highly subjective and might not be representative of the population.

3. A Focus Group is used as a preliminary research technique to explore people’s ideas and
attitudes. It is often used to test new approaches (such as products or advertising), and to
discover customer concerns. A group of 6 to 20 people meet in a conference-room-like
setting with a trained moderator. The room usually contains a one-way mirror for viewing,
including audio and video capabilities. The moderator leads the group's discussion and keeps
the focus on the areas you want to explore. Their disadvantage is that the sample is small and
may not be representative of the population in general.

4. Personal Interviews are a way to get in-depth and comprehensive information. They
involve one person interviewing another person for personal or detailed information.
Personal interviews are very expensive because of the one-to-one nature of the interview.
Typically, an interviewer will ask questions from a written questionnaire and record the
answers verbatim. Personal interviews (because of their expense) are generally used only
when subjects are not likely to respond to other survey methods.

5. Telephone Surveys are the fastest method of gathering information from a relatively large
sample (100-400 respondents). The interviewer follows a prepared script that is essentially
the same as a written questionnaire. However, unlike a mail survey, the telephone survey
allows the opportunity for some opinion probing. Telephone surveys generally last less than
ten minutes.

6. Mail Surveys are a cost-effective method of gathering information. They are ideal for
large sample sizes, or when the sample comes from a wide geographic area. They cost a little
less than telephone interviews, however, they take over twice as long to complete (eight to
twelve weeks). Because there is no interviewer, there is no possibility of interviewer bias.
The main disadvantage is the inability to probe respondents for more detailed information.

7. Email and Internet surveys are relatively new, and little is known about the effect of
sampling bias in Internet surveys. While it is clearly the most cost effective and fastest
method of distributing a survey, the demographic profile of the Internet user does not
represent the general population, although this is changing. Before doing an e-mail or
Internet survey, carefully consider the effect that this bias might have on the results.

What is a Survey?
A survey is a method of collecting information directly from people about their ideas, feelings, health, plans, beliefs, and social, educational and financial background. It usually takes the form of self-administered questionnaires and interviews. Self-administered questionnaires can be completed by hand or by computer. Interviews take place in person or over the telephone.

Why do we conduct survey?


There are at least three good reasons for conducting surveys.

1. A policy needs to be set or a programme must be planned.

Surveys are conducted to meet policy or programme needs. For instance, a company is considering providing day care for children of its working staff. How many staff have young children? How many would use the service?

2. You want to evaluate the effectiveness of programmes to change people’s


knowledge, attitudes, health or welfare.
3. You are a researcher and a survey is used to assist you.

When is a Survey Best?


Many methods are available for obtaining information about people. A survey is one
of them. Surveys can be used to make policy, or to plan and evaluate programmes and conduct research, when the information you need should come directly from people. The data they provide are descriptions of attitudes, values, habits and background characteristics such as age, health, education and income.

Types of Survey

1. Cross-sectional Survey
With this design, data are collected at a single point in time. Think of a cross
sectional survey as a snapshot of a group of people or organizations. Cross-sectional
surveys have several advantages.

2. Longitudinal Surveys

With longitudinal survey, data are collected over time. At least three variations are
particularly useful.
(a) Trend: a trend design means surveying a general population over time, drawing a fresh sample on each occasion. For example, studying rural people's socio-economic conditions over time.
(b) Cohort: in a cohort survey, you study a particular group over time, but the people sampled from that group may vary from wave to wave.
(c) Panel: a panel survey means collecting data from the same sample of respondents each time.

The Survey Content

 Select your information needs or Hypotheses.


 Make sure you can get the information you need
 Do not ask for information unless you can act on it.
 Survey items may take the form of open-ended or forced-choice questions. Forced-choice questions with several choices are easier to score than open-ended, short-answer, or essay questions. Open-ended questions give respondents an opportunity to state a position in their own words; unfortunately, these words may be difficult to interpret.

Rules for Writing Survey Items with Forced Choices

1. Each question should be meaningful to respondents. In a survey of political views,


the questions should be about the political process, parties, candidates and so on. If
you introduce other questions that have no readily obvious purpose, such as those
about age or gender, you might want to explain why they are being asked.
2. Use Standard English. Because you want an accurate answer to each survey items,
you must use conventional grammar, spelling and syntax.
3. Make questions concrete: questions should be close to the respondent’s personal
experience. For instance, asking respondents if they enjoyed a book is more abstract
than asking if they recommended it to others or read more books of the same author.
4. Avoid biased words and phrases: certain names, places and views are emotionally
charged. When included in a survey, they unfairly influence people’s responses.
5. Check your own biases: an additional source of bias is present when survey writers
are unaware of their own position towards a topic. Look at this:

Do you think the left parties and the Congress will soon reach a greater degree of understanding? (Biased question.)

When you have questions that you suspect encourage strong views on either side, rephrase them neutrally. Better: in your opinion, in the next two years, how is the relationship between the left parties and the Congress likely to change?

Much improvement
Some improvement
Some worsening
Much worsening
Impossible to predict

6. Use caution when asking about personal matters: another source of bias may result from questions that may intimidate the respondent, such as: How much do you earn each year? Are you single or divorced? How do you feel about your teacher, counselor or doctor? When personal information is essential to the survey, you can ask such questions in the least emotionally charged way by providing categories of responses.

Example:

Poor: What was your annual income last year? Rs…………………………..


Better: In which category does your annual income last year fit best:

Below Rs. 100000


Rs. 100000 - Rs.150000
Rs. 150000 – Rs. 200000
Rs. 200000 – Rs. 250000
Rs. 250000 and above.

7. Each question should contain just one thought: do not use questions to which a respondent’s truthful answer could be both yes and no at the same time.

Survey Design
Here we discuss options and provide suggestions on how to design and conduct successful survey research. There are 7 steps in survey research:

1. Establish the objectives of the study - What you want to examine


2. Determine your sample - Whom you will interview
3. Choose interviewing methodology - How you will interview
4. Create your questionnaire - What you will ask
5. Pre-test the questionnaire, if practical - Test the questions
6. Conduct interviews and enter data - Ask the questions
7. Analyze the data - Produce the reports

Setting Objectives
The first step in any survey is deciding the objectives. If your objectives are unclear, the
results will probably be unclear. Some typical objectives may be like these:

 The potential market for a new product or service


 Ratings of current products or services
 Employee attitudes
 Customer/patient satisfaction levels 
 Reader/viewer/listener opinions
 Association member opinions

 Opinions about political candidates or issues
 Corporate images

Selecting Your Sample

There are two main components in determining whom you will interview. The first
is deciding what kind of people to interview. Researchers often call this group the target
population. If you conduct an employee attitude survey or an association membership
survey, the population is obvious. If you are trying to determine the likely success of a
product, the target population may be less obvious. Correctly determining the target
population is critical. If you do not interview the right kinds of people, you will not
successfully meet your goals.

The next thing to decide is how many people you need to interview. You must make a
decision about your sample size based on factors such as: time available, budget and
necessary degree of precision.
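The notes do not give a formula, but a common rule of thumb for estimating a proportion, assuming simple random sampling, is n = z²p(1-p)/e². A sketch in Python:

    # Sample size for estimating a proportion (assumes simple random
    # sampling): n = z^2 * p * (1 - p) / e^2, where z is the confidence
    # multiplier, p the expected proportion (0.5 is most conservative),
    # and e the desired margin of error.
    import math

    z, p, e = 1.96, 0.5, 0.05   # 95% confidence, +/- 5 percentage points
    n = math.ceil(z ** 2 * p * (1 - p) / e ** 2)

    print(n)   # 385 respondents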

Bias
A survey is biased if its outcome has been influenced by factors other than the one being
studied. Bias is occasionally overt: the experimenter is not open-minded about the results,
and interprets them wrongly. But more often bias comes from poor survey design. A typical
problem is that of comparing two groups of people that are not really alike. For example, if
there are more men than women in one group, and more women than men in another, the
responses of the groups to any question will be influenced by the differences between men
and women. The solution to this problem is that of randomization. In some cases it is
necessary to use `stratified' random sampling to ensure that the sample is typical of the
population.

Selecting Respondents
Select survey respondents at random from the intended audience. If at all possible, identify
a comparison group that doesn't get the information so that you can see how much of the
change in knowledge, attitude, and/or behavior is a result of your information versus a result
of other factors in the market place. This is a variation on a control group; in a real
experiment, you would randomly assign people to either a group that gets the information or
the control group that would not. But random assignment is not feasible in the context of
report cards, so a comparison group is an acceptable alternative. One of the easiest ways to
create a comparison group is to collect baseline data, i.e., responses to key questions
collected before the information was disseminated. This is often referred to as a "pre/post"
survey. You do not have to contact the same people before and after the distribution period.
But be sure to survey a representative sample each time so that their responses are
comparable.

Interviewing Methods
Once you have decided on your sample you must decide on your method of data collection.
Each method has advantages and disadvantages. Some of the methods are listed as follows:

1. Personal Interview: An interview is called personal when the Interviewer asks the
questions face-to-face with the Interviewee. Personal interviews can take place in the home,
at a shopping mall, on the street, outside a movie theater or polling place, and so on.

2. Telephone Surveys: Surveying by telephone is the most popular interviewing method.
This is made possible by nearly universal coverage.

3. Mail Surveys: One way of improving response rates to mail surveys is to mail a postcard
telling your sample to watch for a questionnaire in the next week or two. Another is to follow
up a questionnaire mailing after a couple of weeks with a card asking people to return the
questionnaire.

4. Computer Direct Interviews: These are interviews in which the Interviewees enter their
own answers directly into a computer. They can be used at malls, trade shows, offices, and
so on. Some researchers set up a Web page survey for this purpose.

5. Email Surveys: Email surveys are both very economical and very fast. More people have
email than have full Internet access. This makes email a better choice than a Web page
survey for some populations. On the other hand, email surveys are limited to simple
questionnaires, whereas Web page surveys can include complex logic.

6. Internet/Intranet (Web Page) Surveys: Web surveys are rapidly gaining popularity. They
have major speed, cost, and flexibility advantages, but also significant sampling limitations.
These limitations make software selection especially important and restrict the groups you
can study using this technique. Internet survey is recommended mainly when your target
population is Internet users. Business-to-business research and employee attitude surveys
can often meet this requirement. Another reason to use a Web page survey is when you want
to show video or both sound and graphics. A Web page survey may be the only practical
way to have many people view and react to a video.

Tips for Improving Response Rates

 Know your respondents. Make certain the questions are understandable to


them, to the point, and not insensitive to their social and cultural values.
 Use trained personnel to recruit respondents and conduct surveys. Set up a
quality assurance system for monitoring quality and retraining
 Identify a larger number of eligible respondents than you need in case you do
not get the sample size you need.
 Keep survey responses confidential or anonymous
 Send reminders to complete mailed surveys and make repeat phone calls.
 Provide gift or cash incentives
 Be realistic about the eligibility criteria. Anticipate the proportion of respondents who may not be able to participate because of survey circumstances (such as incorrect addresses) or by chance (sudden illness).
 Formally respect each respondent’s privacy.

Part-II

DESIGNING QUESTIONNAIRE/SCHEDULE

Introduction
The questionnaire is widely used for data collection in survey research. It is a fairly reliable tool for gathering data from large, diverse and scattered groups. A questionnaire is a list of questions sent to a number of persons for their answers; it obtains standardised results that can be tabulated and treated statistically.

Sometimes a distinction is made between `questionnaire’ and `schedule’ or `interview guide’. Generally a questionnaire is mailed to the respondents, who are to give answers in a manner specified either in the covering letter or in the main questionnaire itself. On the other hand, a schedule refers to a form of questionnaire which is generally filled in by the investigator himself: he/she sits with the informant face to face and fills in the form. The schedule is more effective than the mailed questionnaire because in most cases respondents do not respond properly to a mailed questionnaire owing to ignorance, illiteracy, and lack of awareness and interest, whereas with a schedule the investigator has face-to-face contact with the respondents and is able to get reliable information from them.

Types of Questionnaire
Questionnaires may be broadly of two types, viz. structured and unstructured questionnaires. According to P.V. Young, structured questionnaires are those “which pose definite, concrete, and pre-determined questions, i.e. they are prepared in advance and not constructed on the spot during the question period”. Additional questions may be asked only when some clarification is required. Answers to these questions are normally given with high precision; for example, questions on age, sex, marital status, number of children, nationality, etc. are automatically structured. Structured questionnaires may further be grouped into closed-form or open-end questionnaires. A closed-form questionnaire is one in which the questions are set in such a manner that only a few alternative answers are allowed, so the informant is left with only a few choices. For example: do you think poverty and unemployment have increased in India after economic reform? Yes/No/Can’t say.

In the above question, the respondent has to select one out of three alternatives. The open-ended questionnaire, on the other hand, is one in which the respondent has full choice of his own style, diction, length and perception; he has ample freedom while providing answers to open questions.

The unstructured questionnaire contains a set of questions which are not structured in advance and which may be adjusted according to the needs of the question period. The unstructured questionnaire is used mainly for conducting interviews. Flexibility is its chief merit.

A widespread criticism of closed questionnaires is that they force people to choose among the offered alternatives instead of answering in their own words. On the other hand, closed questions spell out the response options; they are more specific than open questions and therefore more apt to communicate the same frame of reference to all respondents.

Let us take a hypothetical case: we want to identify the most important problem facing the country. In an open-closed experiment, people are asked what they think is the most important problem facing the nation. In the closed-ended framework, we set five alternatives, namely unemployment, economic disparity, crime, poor governance and inflation. One open-ended question is also set. In response to the open-ended question, the respondents may identify power shortages as the most vital problem of the country. Thus, open-ended questions are also relevant, especially when the researcher has inadequate knowledge about the various problems faced by the country.

Construction of Questionnaire

A. General Considerations

1. Most problems with questionnaire analysis can be traced back to the design phase
of the project. Well-defined goals are the best way to assure a good questionnaire
design. When the goals of a study can be expressed in a few clear and concise
sentences, the design of the questionnaire becomes considerably easier. The
questionnaire is developed to directly address the goals of the study.
2. One of the best ways to clarify your study goals is to decide how you intend to use
the information. This sounds obvious, but many researchers neglect this task.
3. Be sure to commit the study goals to writing. Whenever you are unsure of a
question, refer to the study goals and a solution will become clear. Ask only
questions that directly address the study goals.
4. KISS - keep it short and simple. If you present a 20-page questionnaire, most potential respondents will give up in horror before even starting. One of the most effective methods of maximizing response is to shorten the questionnaire.
5. If your survey runs over a few pages, try to eliminate questions. Many people have
difficulty knowing which questions could be eliminated. For the elimination round,
read each question and ask, "How am I going to use this information?" If the
information will be used in a decision-making process, then keep the question... it's
important. If not, throw it out.
6. Involve other experts and relevant decision-makers in the questionnaire design
process.
7. Formulate a plan for doing the statistical analysis during the design stage of the
project. Know how every question will be analyzed and be prepared to handle
missing data. If you cannot specify how you intend to analyze a question or use the
information, do not use it in the survey.
8. Provide a well written cover page. The respondent's next impression comes from
the cover letter (for mailed questionnaire). It provides your best chance to persuade
the respondent to complete the survey.
9. Give your questionnaire a title that is short and meaningful to the respondents. A
questionnaire with a title is generally perceived to be more credible than one
without.
10. Begin with a few non-threatening and interesting items. If the first items are too
threatening or "boring", there is little chance that the person will complete the
questionnaire.
11. Leave adequate space for respondents to make comments. Leaving space for
comments will provide valuable information not captured by the response
categories.
12. Place the most important items in the first half of the questionnaire. Respondents
often send back partially completed questionnaires.

13. Use professional production methods for the questionnaire—either desktop
publishing or typesetting and key-lining. Be creative.
14. The final test of a questionnaire is to try it on representatives of the target
audience.

B. Language
The wording of a question is extremely important. Researchers strive for objectivity
in surveys and, therefore, must be careful not to lead the respondent into giving a desired
answer. Many investigators have confirmed that slight changes in the way questions are
worded can have a significant impact on how people respond.
Because questionnaires are usually written by educated persons who have a special
interest in and understanding of the topic of their investigation, and because these people
usually consult with other educated and concerned persons, it is common for questionnaires
to be overwritten, overcomplicated, and too demanding of the respondent. Special care is
therefore required to cast questions that are clear and straightforward in four important
respects: simple language, common concepts, manageable tasks and widespread
information.

In choosing the language for a good questionnaire, the nature and structure of
population to be studied should be kept in mind. Technical terms and jargons should be
avoided to the maximum possible extent. Words used in ordinary conversation should be
preferred. For example:

Acquaint - inform
Assist - help
Consider - think
Reside - live
State - say
Sufficient - enough
Initiate - start and so on

In surveys of the general population, questions should consist of simple words which
convey the exact meaning. Ambiguous and vague words should be avoided. As far as
possible, words of the local dialect should be used. Double-barrelled questions (two
questions packed into one) should be avoided.
Common concepts should be used in the questionnaire. Mathematical abstractions
such as 'variance' tend to be difficult for the general public; survey investigators would
not think of asking the general public questions about variances or standard deviations. They
know perfectly well that the concept of an average is much more widely understood than
others.

C. Question Content
A questionnaire designer has to ensure that all the necessary items are duly
incorporated in the questionnaire. The investigator may take the help of standard checklists
to see that all the required items are included in the questionnaire. The checklist can also be
prepared by the investigator himself. Check lists may differ depending upon the aims and
objectives of the survey research. Some of the important items of checklist of content are
as follows:

1. Is this question necessary? Just how will it be used?
2. Are several questions needed on the subject matter of this one question?
3. Do the respondents have the information necessary to answer the questions?
4. Does the question need to be more concrete, more specific and closely related to
the respondent’s experience?
5. Is the question content sufficiently general and free from spurious concreteness
and specificity?
6. Is the question content biased or loaded in one direction – without accompanying
questions to balance the emphasis?

D. Question Types

Researchers use three basic types of questions: multiple choice, numeric open end and
text open end (sometimes called "verbatim"). Examples of each kind of question follow:

1. Multiple Choice Question

Where do you live? (1) Northern Region (2) Central Region (3) Eastern Region
(4) Western Region (5) Southern Region

2. Numeric Open End Question

How much did you spend on fruits last week? -------------

3. Text Open End Question

How can your company improve its working conditions?

---------------------------------------------------------------------------------
Rating Scales and Agreement Scales are two common types of questions.

Rating scale Example

How would you rate this Product?

1. Excellent
2. Very good
3. Good
4. Fair
5. Poor

On a scale where “10” means you have a great amount of interest in a subject and
“1” means you have none at all, how would you rate your interest in each of the following
topics?

1. New economic policy
2. SEZ policy
3. Corporate social responsibility
4. Labour Market reforms

Agreement scale Example

How much do you agree with each of the following statements:

Sl. No. | Statement                                                          | Strongly agree | Agree | Agree somewhat | Disagree | Strongly disagree
1       | My manager provides constructive criticism                        |                |       |                |          |
2       | Our medical plan provides adequate coverage                       |                |       |                |          |
3       | Globalization has benefited the Indian Economy                    |                |       |                |          |
4       | Rural-urban disparities have increased in the post-reform period. |                |       |                |          |

E. Qualities of a Good Question

The qualities of a good question are as follows:

1. Evokes the truth. Questions must be non-threatening. Anonymous questionnaires
that contain no identifying information are more likely to produce honest responses
than those identifying the respondent.
2. Asks for an answer on only one dimension. For example, a researcher investigating
a new food snack asks "Do you like the texture and flavor of the snack?" If a
respondent answers "no", then the researcher will not know if the respondent
dislikes the texture or the flavor, or both. Another questionnaire asks, "Were you
satisfied with the quality of our food and service?", which repeats the same mistake.
3. Can accommodate all possible answers. Multiple choice items are the most popular
type of survey questions because they are generally the easiest for a respondent to
answer and the easiest to analyze. For example, consider the question:

What brand of computer do you own?


A. IBM PC
B. Apple

Clearly, there are many problems with this question. What if the respondent doesn't own
a microcomputer? What if he owns a different brand of computer? What if he owns both
an IBM PC and an Apple? There are two ways to correct this kind of problem.

The first way is to make each response a separate dichotomous item on the questionnaire.
For example:

Do you own an IBM PC? (circle: Yes or No)

Do you own an Apple computer? (circle: Yes or No)

Another way to correct the problem is to add the necessary response categories and allow
multiple responses. This is the preferable method because it provides more information
than the previous method.

What brand of computer do you own?


(Check all that apply)
Do not own a computer
IBM PC
Apple
Other

4. Has mutually exclusive options. A good question leaves no ambiguity in the mind of the
respondent. There should be only one correct or appropriate choice for the respondent to
make.

5. Produces variability of responses. When a question produces no variability in
responses, we are left with considerable uncertainty about why we asked the question and
what we learned from the information. If a question does not produce variability in
responses, it will not be possible to perform any statistical analyses on the item. For
example:

What do you think about this report?


A. It's the worst report I've read
B. It's somewhere between the worst and best
C. It's the best report I've read

Since almost all responses would be choice B, very little information is learned.

6. Does not presuppose a certain state of affairs. Among the most subtle mistakes in
questionnaire design are questions that make an unwarranted assumption. An example of
this type of mistake is:

Are you satisfied with your current auto insurance? (Yes or No)

This question will present a problem for someone who does not currently have auto
insurance. Write your questions so they apply to everyone.
One of the most common mistaken assumptions is that the respondent knows the correct
answer to the question. Industry surveys often contain very specific questions that the
respondent may not know the answer to. For example:

What percent of your budget do you spend on direct mail advertising?

7. Does not imply a desired answer. The wording of a question is extremely important. As
examples:

Don't you think most of the politicians are corrupt?

8. Does not use emotionally loaded or vaguely defined words. Quantifying adjectives (e.g.,
most, least, majority) are frequently used in questions. It is important to understand that these
adjectives mean different things to different people.

F. Question Sequence
Items of a questionnaire should be grouped into logically coherent sections.
Grouping questions that are similar will make the questionnaire easier to complete, and the
respondent will feel more comfortable. Questions that use the same response formats, or
those that cover a specific topic, should appear together. Each question should follow
comfortably from the previous question. Writing a questionnaire is similar to writing
anything else. Transitions between questions should be smooth. Questionnaires that jump
from one unrelated topic to another feel disjointed and are not likely to produce high
response rates.
Some researchers have suggested that it may be necessary to present general
questions before specific ones in order to avoid response contamination. Other researchers
have reported that when specific questions were asked before general questions,
respondents tended to exhibit greater interest in the general questions.
The numbering of questions should be in a logical sequence. To check the sequence
of questions the following questions should be answered.

1. Are the answers to the questions likely to be influenced by the content of the
preceding questions?
2. Are the questions led up to in a natural way?
3. Do some questions come too early or too late from the point of view of arousing
interest and receiving sufficient attention, avoiding resistance and inhibitions?

G. Commandments for Construction of a Good Questionnaire

D.C. Miller provides a guide to the questionnaire construction:

1. Keep the language pitched to the level of respondent.


2. Try to pick words that have the same meaning for every one.
3. Avoid long questions
4. Do not have a priori assumption that your respondent possesses factual information
or first hand opinions.
5. Establish the frame of reference you have in mind.
6. In framing a question, either suggest all possible alternatives or do not suggest
any.
7. Protect your respondent’s ego.
8. If you are after unpleasant orientations, give your respondent a chance to express
his positive feeling first so that he is not put in an unfavourable light.
9. Decide whether you need a direct question, an indirect question, or an indirect
followed by a direct question.
10. Decide whether the question should be open or closed.
11. Decide whether general or specific questions are needed.
12. Avoid ambiguous wording.
13. Avoid biased questions.
14. Phrase questions so that they are not unnecessarily objectionable.

15. Decide whether a personal or impersonal question will obtain the better response.
16. Questions should be limited to a single idea or a single reference.

Pre-test the Questionnaire


The last step in questionnaire design is to test a questionnaire with a small number of
interviews before conducting your main interviews. Ideally, you should test the survey on
the same kinds of people you will include in the main study. Pre-tests and pilot studies are the
essence of a good questionnaire. They enable the investigator to identify the mistakes and
unwarranted and undesirable trends that might have crept into the questionnaire. They help in
enriching the design of the questionnaire and assist in testing the validity and reliability of
the statistical techniques to be adopted for data processing and analysis.

Questionnaire for Interviewers/Investigator


After pre-testing a questionnaire, a questionnaire for the interviewer should be
constructed for getting relevant feedback, so that the mistakes or inconsistencies observed
in the questionnaire may be removed.

1. Did any of the questions seem to make respondent uncomfortable?


2. Did you have to repeat any questions?
3. Did respondent misinterpret any questions?
4. Which questions were the most difficult or awkward for you to read? Have you
come to dislike any specific questions? Why?
5. Did any of the sections seem to drag?
6. Were there any sections in which you felt that the respondent would have liked the
opportunity to say more? and so on.

After finalising the questionnaire/schedule by correcting it on the basis of pre-testing,
the investigator has to collect data from the field. The following points should be
taken into account by the investigator while collecting data through the questionnaire/schedule.

1. He must plan in advance and should fully know the problem under consideration. He
must choose a suitable time and place so that the respondent is at ease during the
interview.
2. All possible efforts should be made to establish proper rapport with the informant;
people are motivated to communicate when the atmosphere is favourable.
3. He must know that ability to listen with understanding, respect and curiosity is the
gateway to communication, and hence acts accordingly during the survey.
4. Investigator’s approach must be friendly and informal. Initially friendly greetings in
accordance with the cultural pattern of the respondent should be exchanged and then
the purpose of the survey should be explained.
5. To the extent possible, there should be a free-flow interview and the questions
must be well phrased in order to have full cooperation of the respondent.

7. PARAMETRIC AND NON-PARAMETRIC STATISTICAL TESTS

Testing of hypothesis, developed by Neyman and Pearson, employs statistical tools to
arrive at a decision in situations where there is an element of uncertainty.
In a test of hypothesis we see whether there is a significant difference between a parameter
and a statistic, between two parameters, or between two statistics.
Inference about a population on the basis of a sample may pertain to a certain hypothesis.

Hypothesis: A hypothesis is an assumption or a theoretical proposition that is capable of
empirical verification or disproof. It may or may not be true.
A statistical hypothesis is an assertion about the probability distribution of one or more
random variables. It is simple if it completely specifies the probability distribution of the
population, and complex (or composite) if it does not.

Test of Significance:
The procedure to assess the significance of the difference between a sample statistic and
the corresponding population parameter, or of the difference between two independent
statistics, is called a test of significance.
Example: an agronomist wants to establish from his experimental data whether the average
yield of a new variety has some specific value or not,

or whether the yields of two varieties of wheat are the same.

Question is how to arrive at the conclusion whether difference is real (significant) or due to
chance (called non-significant) and how large difference is to be considered statistically
significant.

Hypotheses are of two types:

Null Hypothesis (H0) and Alternate Hypothesis (H1)
The decision-maker should always logically adopt a neutral or null attitude towards the
outcome of the experiment.
The null hypothesis is a statistical hypothesis (set by the statistician acting as a judge) of no
difference, and it is tested for its possible rejection under the assumption that it is true.
The null hypothesis shall be rejected or not rejected at a certain level of significance
α. The null hypothesis should never be accepted on the basis of one sample statistic alone.

Alternate Hypothesis: Set by the experimenter. The hypothesis representing the opposite of
the null hypothesis is called the alternate hypothesis. It is any statistical hypothesis which is
complementary to the null hypothesis.
Ex. If we are to test whether the average per capita income of two states differs significantly or not,
the null hypothesis will be
H0: μa = μb (i.e., the PCI of states A and B do not differ significantly)

Alternate Hypothesis
1. H1: μa ≠ μb (two-tailed alternate)
2. H1: μa > μb (one-tailed; right-tailed alternate)
3. H1: μa < μb (one-tailed; left-tailed alternate)

Significance Level: The significance level is the probability with which the null hypothesis
will be rejected due to sampling error even though it is true.
The decision to reject or accept the null hypothesis depends upon the information contained in the
sample, and there is always a risk of taking a wrong decision. One is liable to commit two
types of errors.

TYPE-I Error: The error of rejecting the null hypothesis on the basis of the information
contained in the sample when actually it is true is called Type-I error (the probability of
rejecting the null hypothesis when it is true). It is denoted by α. (In quality control it is called
producer's risk because it is the probability of rejecting a good lot.)

The probability of committing a Type-I error is the level of significance: α = Level of
Significance = Probability of rejecting H0 when it is true.

Type-II Error: It is the probability of accepting the null hypothesis when it is false.
It is also called consumer's risk because it is the probability of accepting a bad lot. It is denoted by β.

Hypotheses H0 and H1 are mutually exclusive events, i.e., if H0 is accepted (rejected) then
H1 is rejected (accepted).

Power of Test: The probability of rejecting the null hypothesis on the basis of sample
information when the null hypothesis is actually false is called the power of a test.
Therefore Power of Test = Prob.(Reject H0 when H0 is false)

Since Prob.(Accept H0 when H0 is false) = β,

Power of test = 1 − β

So the test will be more powerful when the β error is small.
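To make the α/β trade-off concrete, here is a minimal computational sketch (not part of the original notes) of the critical value, β and power for a right-tailed z-test on a normal mean with known variance; all numerical values are illustrative assumptions.

# Illustrative sketch: alpha, beta and power for a right-tailed z-test
# on a normal mean with known sigma (all values below are assumptions).
from scipy.stats import norm

mu0, mu1 = 50.0, 52.0        # hypothesized mean (H0) and assumed true mean (H1)
sigma, n = 5.0, 25           # known population SD and sample size
alpha = 0.05                 # chosen level of significance (Type-I error)

se = sigma / n ** 0.5                            # SE of the sample mean
crit = norm.ppf(1 - alpha, loc=mu0, scale=se)    # critical point under H0

beta = norm.cdf(crit, loc=mu1, scale=se)         # P(accept H0 | H0 false)
power = 1 - beta                                 # P(reject H0 | H0 false)
print(f"critical value = {crit:.2f}, beta = {beta:.3f}, power = {power:.3f}")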

Sample Space: If the population size is N and a random sample of size n is drawn, the number of
possible samples is k = NCn.
Suppose some statistic 't' is computed from each of the samples:
t = f(x1, x2, x3, …, xn). The possible sample statistics t1, t2, t3, …, tk constitute the sample space.
It is used to test the null hypothesis. Some samples will lead to rejection of H0; others may lead to
acceptance of H0.
Thus sample space of statistic is divided into two disjoint and exhaustive sets.

Critical Region (W) : It is part of sample space which leads to rejection of null hypothesis
if given sample statistic fall in this region.

Acceptance Region: It is that part of sample space, which leads to acceptance of null
hypothesis, if sample statistic falls in it.

Critical Point: The point in sample space which divides the sample space in two mutually
disjoint and exhaustive sets is known as critical Point.
The critical points are the tabulated values for different sampling distributions. The form of the
sample space is determined by the sampling distribution concerned, like t, F, χ², Z, etc.
Sampling Distribution: Sampling distribution is Probability distribution of a statistic.

Two tailed and one tailed tests: A two tailed test rejects the null hypothesis if, say, the
sample mean is significantly higher or lower than the hypothesized value of the mean of the
population. Such a test is appropriate when the null hypothesis is some specified value and
the alternative hypothesis is a value not equal to the specified value of the null hypothesis.

Symbolically, the two-tailed test is appropriate when we have H0: μ = μ0
and H1: μ ≠ μ0, which may mean μ > μ0 or μ < μ0. Thus in a two-tailed test there are two
rejection regions, one on each tail of the curve.

One-tailed test: A one-tailed test would be used when we are to test, say, whether the
population mean is either lower than or higher than some hypothesized value.
For example, if H0: μ = μ0 and
H1: μ < μ0, then we are interested in what is known as a left-tailed test, or if
H1: μ > μ0, then it is a one-tailed test known as a right-tailed test. (There is only
one rejection region, either on the left tail or the right tail.)

Tests of Hypotheses:
Tests of hypotheses (also known as tests of significance) can be classified as:
1. Parametric Tests
2. Non-parametric Tests

Parametric Tests: Parametric tests usually assume certain properties of the parent
population from which the sample is drawn. Assumptions such as that the observations come from a
normal population, that the sample size is large, and assumptions about population parameters like the
mean and variance must hold good before parametric tests can be used. The probability distribution of the
statistic (the sampling distribution) is known, i.e., it follows a particular distribution like t, F or Z.
Parametric tests cannot be applied if the nature of the parent population is unknown and the data are
measured on a nominal or ordinal scale.
The important parametric tests are: the z-test, t-test, χ²-test and F-test. (The χ²-test is also used as
a test of goodness of fit and as a test of independence, in which case it is a non-
parametric test.)
All these tests are based on the assumption of normality, i.e., the source of data is considered
to be normally distributed.

Z-test: It is based on the normal probability distribution and is used for judging the
significance of several statistical measures, particularly the mean. The relevant test statistic,
Z, is worked out and compared with its probable value at a specified level of significance for
judging the significance of the measure concerned.
As n becomes large, the Z-test is generally used even when the binomial distribution or t-
distribution is applicable, on the presumption that such a distribution tends to approximate the
normal distribution.
The Z-test is used for comparing the sample mean to the population mean when the population
variance is known, for judging the significance of the difference between the means of two independent
samples when the population variance is known, for comparing a sample proportion to a theoretical value
of the population proportion, or for judging the difference in proportions of two independent samples
when n happens to be large. This test may also be used for judging the significance of the median, mode,
coefficient of correlation and several other measures.
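As a hedged illustration of the mechanics described above, the following sketch carries out a one-sample Z-test for a mean with known population variance; the figures are made up for demonstration.

# Illustrative sketch: one-sample Z-test for a mean when the
# population variance is known (the numbers are assumed).
from scipy.stats import norm

x_bar, mu0 = 103.2, 100.0    # sample mean and hypothesized mean
sigma, n = 12.0, 64          # known population SD and sample size

z = (x_bar - mu0) / (sigma / n ** 0.5)
p_two_tailed = 2 * (1 - norm.cdf(abs(z)))
print(f"Z = {z:.2f}, two-tailed p = {p_two_tailed:.4f}")
# Reject H0 at the 5% level if |Z| > 1.96 (i.e., if p < 0.05).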

t-test: The t-test is based on the t-distribution and is considered an appropriate test for judging the
significance of a sample mean, or of the difference between the means of two samples, in the case of
small sample(s) when the population variance is not known (the sample variance is then used for the
population variance). In case the two samples are related, we use the paired t-test (difference test) for
judging the significance of the mean of the differences between the two related samples. It is also used
for testing the significance of the coefficients of simple and partial correlation. The relevant test
statistic, t, is calculated from the sample data and then compared with its probable value based on the
t-distribution, read from the table at a given level of significance and degrees of freedom, for
accepting or rejecting the hypothesis.
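The three variants mentioned above (one-sample, two independent samples, and paired samples) can be run with scipy; a minimal sketch on invented data follows.

# Illustrative sketch: one-sample, independent two-sample and paired
# t-tests using scipy.stats (the data below are invented).
from scipy import stats

before = [72, 75, 68, 80, 74, 69, 77, 71]
after  = [70, 73, 69, 76, 71, 68, 74, 70]

t1, p1 = stats.ttest_1samp(before, popmean=70)   # H0: mean = 70
t2, p2 = stats.ttest_ind(before, after)          # H0: equal means (independent samples)
t3, p3 = stats.ttest_rel(before, after)          # paired (difference) t-test
print(f"one-sample:  t = {t1:.2f}, p = {p1:.3f}")
print(f"independent: t = {t2:.2f}, p = {p2:.3f}")
print(f"paired:      t = {t3:.2f}, p = {p3:.3f}")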

χ²-test: It is based on the chi-square distribution and, as a parametric test, is used for comparing
a sample variance to a theoretical population variance:
χ² = ∑(Xi − X̄)²/σ² = (n − 1)S²/σ², with (n − 1) d.f.

F-test: The F-test is based on the F-distribution and is used to compare the variances of two
independent samples. This test is also used in the context of analysis of variance (ANOVA)
for judging the significance of more than two sample means at one and the same time. It is
also used for judging the significance of multiple correlation coefficients. The test statistic, F,
is calculated and compared with its probable value for accepting or rejecting the null
hypothesis. (We use the F-ratio table for the relevant d.f. at a given level of significance.)
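For the ANOVA use of the F-test mentioned above, a minimal sketch on invented samples is given below.

# Illustrative sketch: one-way ANOVA F-test for the equality of three
# group means using scipy.stats.f_oneway (the samples are invented).
from scipy import stats

group_a = [23, 25, 21, 27, 24]
group_b = [30, 28, 33, 29, 31]
group_c = [26, 24, 27, 25, 28]

f_stat, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.4f}")   # small p => at least one mean differs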

Non-Parametric Tests: The tests which are used when the data may be non-normal
and/or when it may not be possible to estimate the parameter(s) of the data are called non-
parametric tests. Since these tests make no assumptions about the distribution of the data or its
parameters, they are also called distribution-free tests. Non-parametric tests can be used for
nominal data (qualitative categories) and ordinal data (ranked data, e.g., 'greater' or 'less'). These
tests require less calculation, because there is no need to compute parameters. They can also
be applied to very small samples, more specifically during pilot studies in market research.
Inference about the population can be made by non-parametric tests when the assumptions of the
standard methods cannot be satisfied, since non-parametric tests involve no, or less restrictive,
assumptions when compared to parametric tests.
Main non-parametric tests are
1. One-sample tests
a. one sample sign tests
b. Chi-square test
c. Kolmogorov-Smirnov test
d. Run test for randomness
2. Two- Sample tests
a. Two-sample sign test
b. Median test
c. Mann-Whitney U test (Rank sum test)
3. K-sample test
a. Median test
b. Kruskal-Wallis test (H test)
c. Kendall’s coefficient of concordance test
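Two of the tests listed above are available directly in scipy; the sketch below applies them to made-up samples.

# Illustrative sketch: Mann-Whitney U (two-sample rank test) and
# Kruskal-Wallis H (k-sample test) on invented data.
from scipy import stats

sample1 = [12, 15, 11, 19, 14, 13]
sample2 = [22, 18, 25, 17, 21, 20]
sample3 = [16, 14, 18, 15, 17, 19]

u_stat, p_u = stats.mannwhitneyu(sample1, sample2)        # two-sample rank sum test
h_stat, p_h = stats.kruskal(sample1, sample2, sample3)    # k-sample H test
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_u:.4f}")
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_h:.4f}")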

8. HOW TO WRITE RESEARCH PROPOSAL

ROLE OF RESEARCH IN UNIVERSITIES

 Leverage funds to expand facilities


 Accord international recognition
 Support staff training
 Enable universities to participate in community service
 Enable academic staff to achieve promotion
 Generate new knowledge for national growth and development

 Two main purposes: (i) to get a degree and (ii) conduct sponsored and Consultancy
research projects.
 With increasing privatization of higher education and shrinking public grants,
greater stress of Academic Institutions is on generating their own resources.
 Academic institutions need faculty capable of doing independent sponsored and
consultancy research projects and of leading the research team
 A good research proposal (RP) is not only necessary for a high quality of research
but also for getting grant from the funding agencies
 A RP must be convincing to anonymous experts who examine it and see whether it
is methodologically sound, conceptually clear and would make significant
contribution to the knowledge on the subject.
 As a large number of RPs are submitted to the funding agencies for financial assistance, your
proposal needs to be excellent, and not just very good, to get approved for the
grant.

TWO MAIN TYPES OF FUNDED RESEARCH

1. Research you really want to do:


Find sponsor!
--CSIR, MHRD, UGC, DST, UNDP, Foundations, NGOs, World Bank, DFID, Ministries

2. Topics some sponsor wants to see done:


Industries, organizations, Ministries,
market surveys, evaluation studies, R&D projects,

WHAT IS RESEARCH PROPOSAL?


 A RP is the presentation of an idea that you wish to pursue.

 It is intended to convince funding agency/ RDC that you have a worthwhile


research project and that you have the competence and the work-plan to complete
it.

 A good RP presumes that you have already thought about your project and have
devoted some time and efforts in gathering information, reading and organizing
your thoughts

 A high quality proposal not only promises success for the project, but also
impresses RDC about your potential as a researcher.

TYPES OF PROPOSALS
 Letter proposal
– Preliminary expression of interest to an investor
– Developed into full proposal only with donor’s consent
– Most unsolicited proposals should first be in form of letter proposal
 Full proposal
– Often in response to “Request For Proposal” - RFP

COMPONENTS OF A LETTER PROPOSAL


It should contain:
– Summary
– Statement of problem
– Solution
– Budget and budget explanatory notes
– Capability of investigators and the institution (Curriculum Vitae)

COMPONENTS OF A FULL PROPOSAL


 Title page
 Executive summary
 Introduction
 Problem statement
 Project description
 Project hypothesis
 Expected outputs
 Study methods/Research approach
 Budget and budgetary notes.
 Logical framework
 Project management and personnel
 References
 Important attachments

SEVERAL DIMENSIONS FOR RP


1. Content: Basic vs. applied
2. Time Frame: Short-term vs. long-term
3. Scope: Program vs. project
4. Teaming: Single PI vs. multiple
5. Selection: Competitive vs. sole source
6. Client: Scientific vs. “agency”

WHAT SHOULD THE RP ACCOMPLISH?

FOR SCIENTIFIC AGENCIES


Need to convince reviewers of scientific merit, and of your qualifications and
ability to successfully make an important contribution to the state-of-the-art.
FOR Sponsoring Agencies
Need to convince sponsoring agency that you understand the problem, that you
have a realistic approach that is likely to succeed, that could be implemented, and that you
will deliver results that will make them look good.

FEATURES OF A GOOD RP

 Doability: a good RP must be systematic, coherent and doable
 Parsimony: simple, unambiguous, free of jargon, able to convey what you want to do
and how you want to do it.
 The most important ideas are highlighted.
 Consistency among objectives, hypotheses and title
 Contains Executive summary
 A detailed schedule of activities
 Collaboration, if any clearly stated.
 Follows all of the directions given in the proposal guidelines.
 Appendices for detailed and lengthy materials
 The length consistent with the guidelines of funding agency
 The budget and the proposal narrative are consistent.
 The uses of fund are clearly indicated.
 The qualifications and experience and credentials of PI and Co-PI mentioned

Process of Selection of a Topic


 Three factors: interest, competence and relevance

 Identify broad area and then narrow down—follow general to particular approach

 Study broadly—books, journals and reports: the more you read, the more likely
you will encounter a topic that interest you

 Think when you read; most of the ideas come upon surprisingly while you are
reading. Think and think beyond

 Be inclusive with your thinking: do not try to eliminate ideas too quickly. Build on
your ideas and see how many different research topics you can identify. Be
expansive in your thinking at this stage—you will not be able to do this later.

 Write down your ideas: whenever you have a good idea, no matter how small and
how immature it may be, write it down and save it. Later on, when you check your
‘idea box’, you will be surprised to find how many brilliant ideas you already have.
 Develop a topic that has interested you throughout your graduate or undergraduate
career
 Think about the top three issues you want to study, then turn them into questions
 Look at class notes; your teachers may have pointed out potential research topics
or commented on unanswered questions in the field
 Talk with professors or advisors about possible topics
 Study broadly to identify gaps in the literature
 Get feedback on a potential topic from your advisor
 Do research to discover why your topic has not been studied before.

 Does the topic appeal to your interest?

 Will it bring any pecuniary reward or advancement in status?
 Can you afford the time and expense involved in completing the work?
 Will you get ample facilities for conducting the investigation? (e.g. skilled
advice, equipment, field workers etc.)
 Does the topic give sufficient scope to a problem that needs investigation?
 Are the results of practical or utilitarian significance?
 Does the topic cover a gap in the existing field of knowledge and offer new
solutions, make advancements in techniques, thought or practices?

WRITING THE PROPOSAL (TITLE)
 The function of title is to encapsulate in a few words the essence of the research

 It should be catchy, small and informative

 Should have some key words reflecting variables, theoretical basis, and purposes,
time, place, etc.

 Leave out phrases like “an investigation into”, “a study of”, “aspects of” as these
are obvious attributes of a research project

THE BACKGROUND

In the background, the researchers should:


• Create reader interest in the topic
• Make sure that the reviewers know in the first few sentences what your project is
about. It is a good idea to start a RP like this: “in the proposed study we seek to
examine..”
• Lay the broad foundation of the problem that lead to the study
• Place the study within the larger context of the scholarly literature
• Reach out to a specific audience
• If a researcher is working within a particular theoretical framework of enquiry, the
theory of inquiry should be introduced
• The efficient use of references to keep the background short
• A flow in the text, where every issue raised leads to the research problem.

STATEMENT OF THE PROBLEM

 It must be clear from the text what the nature of the problem is, how it was
identified and why it is significant.
 A problem statement should be presented within a context and the context should
be briefly explained, including a discussion on the conceptual framework.
 The problem might be defined as the issue that exists in the literature, theory or
practice that leads to a need for the study.
 The problem should be clearly defined, making the evaluation easy for the
reviewers/RDC members.
 Effective problem statements answer the question: Why does this research need to
be conducted?

PROCESS OF PREPARING THE STATEMENT

 Prepare the arguments that lead to the statement of your problem.


 Make a list of issues you will address in your account of the context
 Issues should be summarized in a few words and put in an ordered sequence so
that it is possible to progress from one issue to the next in a logical manner
 Keep the list brief but make sure it covers all vital issues
 Write your research problem in one sentence at the end of the list
 Check that in your sequence the issues all relate to the problem and lead logically
towards it
 If there are gaps in the logic of the argument, add linking issues
 When you think that the argument is cogent, put some ‘flesh on the bones’ in the text
by making full sentences and adding references

REVIEW OF LITERATURE
 It provides the background and context for the research problem.
 It shares with the readers the results of other studies that are closely related to the
proposed study.
 It relates the proposed study to the ongoing dialogue in the literature, filling in
gaps.
 It provides a framework for establishing the importance of the study, as well as a
benchmark for comparing the results of the study with other findings.
 Demonstrate to the reader that you have a comprehensive knowledge of the field.
 Help to avoid statements that imply little has been done in the area.

The literature review must address three areas:

 Topic or problem area: This part of the literature review covers material directly
related to the problem being studied, possibly drawn from several separate substantive areas.
 Theory area: Investigators must identify the theory which relates to the problem
area.

 METHODOLOGY: Review of the literature related to various aspects of their


chosen method, including design, selection of subjects, and methods of data
collection. It describes research methods and measurement approaches used in
previous studies.

PURPOSE OF THE STUDY

 Purpose of the study should be clearly stated. If it is not clear to the writer, it
cannot be clear to the reader.
 Briefly define the specific area of the research.
 Foreshadow the hypotheses to be tested or the questions to be raised as well as
significance of the study. These will require specific elaboration in separate
sections.
 Should incorporate rationale for the study.
 Key points:
Start with “the purpose of the study is---”.
Clearly identify and define the central ideas of the study
Identify the specific method of inquiry to be used.

OBJECTIVES
 Typically comes after problem definition, motivation and significance.
 Start with overall objective (or aim of the research), and then state two to four
specific objectives
 Write a list of the objectives of your research. Think of as many as you can. When you have
done it, consider each one carefully by asking the questions:
 How will the objective be achieved—methods, resources, skills, time?
 Is it realistically possible to achieve it?
 What results are required to achieve it?
 Is the objective central to your study?
 Are there any overlaps between the objectives?
 Is there any sequence or hierarchy that links one to another? If so, are they in the
correct order?
 Are there too many objectives to be realistically achievable?

HYPOTHESES
 State what kind of relationships you expect to find between variables or factors.
 A hypothesis is particularly necessary in the search for cause and effect relationships.
 It is your intelligent guess about the possible relationship. Do not press hard to prove
that your guess is right. It is more common to disprove than to prove a hypothesis.
A good hypothesis should possess the following features:
 must be conceptually clear
 should have empirical reference
 must be specific
 should be related to available techniques
 Should be related to a body of theory

METHODS AND PROCEDURES

 The RP must specify the research operations you will undertake and the way you
will interpret the results of these operations in terms of your central problem.

 Do not just tell what you mean to achieve, tell how you will achieve.

 A methodology is not just a list of research tasks but an argument as to why these
tasks add up to the best attack on the problem.
 Indicates the methodological steps you will take to answer every question or test
every hypothesis.
 The variables you propose to control and how—experimentally or statistically
 Sampling design, data collection, data analysis techniques, Instruments, etc.

RELEVANCE/SIGNIFICANCE
 Indicate how your research will refine, revise, or extend existing
knowledge.

 Such refinements, revisions, or extensions may have either substantive,


theoretical, or methodological significance.

 Think about implications—how results of the study may affect scholarly


research, theory, practice, policy.

 Will results influence programs, methods, and/or interventions?

 Will results influence policy decisions?

 How will results of the study be implemented, and what innovations will
come about?

POSSIBLE OUTCOME

 Do not “promise mountains and deliver molehills”


 Be quite precise as to the nature and scope of the outcomes and as to who might be
the beneficiaries
 Make sure that the outcomes relate directly to the purpose of the research

Problems from reviewers’ point of view

 Problem: They may not get the significance of your proposed research.
Solution: Write a compelling argument.
 Problem: They may not be familiar with all your methods.
Solution: Write to the non-expert in the field.
 Problem: They may not be familiar with your lab.
Solution: Show them you can do the job.
 Problem: They may get worn out by having to read 10 to 15 applications in detail.
Solution: Write clearly and concisely, and make sure your application is neat, well
organized, and visually appealing. Leave out anything that is not absolutely
critical.

TOP REASONS WHY PROPOSALS FAIL

 Problem not important enough.


 Study not likely to produce useful information.
 Methods unsuited to the objective.
 Problem more complex than investigator appears to realize.
 Too little detail in the research plan to convince reviewers.
 Over-ambitious Research Plan
 Direction or sense of priority not clearly defined,
 Lack of original or new ideas.
 Proposal lacking enough preliminary data or preliminary data do not support
project's feasibility.
 Insufficient consideration of statistical needs.
 Deadline for submission not met
 Guidelines for proposal - content, format, length, etc. - not followed
 Study or project not a priority topic for the funding agency

QUESTIONS BY THE RDC/ EVALUATION COMMITTEE

 Does the title of the RP exactly identify and delineate the area of investigation?

 Are the objectives clearly related to the problem?

 Does the proposal clearly identify the problem?

 Does the proposal give adequate reasons to show that the study will contribute: to
knowledge; development of theory in the subject; or to either theoretical or
practical methodology?

 Does the proposed RP show: how the research will be structured; how the research
will be phased and carried out; the techniques and methods to be used and the reasons for using
them; and that the proposed work is practicable and can commence within the time frame?

CONVINCE THE REVIEWERS
You need to convince the selection committee that:

 your research proposal promises a notable advancement or innovation in the


discipline or results of importance to a broad range of applications;

 you have identified well-formulated short- and long-term goals;


 attaining these goals would be a significant contribution to the discipline;

 you have a good chance of attaining the goals with the resources available.

CAPTURE THE REVIEWERS’ ATTENTION

Every proposal reader constantly scans for clear answers to three questions:

 What are we going to learn as the result of the proposed project that we do not
know now?
 Why is it worth knowing?
 How will we know that the conclusions are valid?
 The opening paragraph, or the first page at most, is your chance to grab the attention.
This is the moment to overstate, rather than understate, your point or question.
 Questions that are clearly posed are an excellent way to begin a proposal
 Most proposals are reviewed by multidisciplinary committees.

ABSTRACT VS EXECUTIVE SUMMARY


 An abstract is an abbreviated summary of a research proposal, while an ES is,
basically, anything but a product presentation, and nothing but a persuasive sales
pitch.
 While the abstract aims at convincing the reader to go through the whole document
in order to quench his thirst for information, the ES, on the contrary, aims at
persuading the reader, who is supposed to be a decision maker, to take or forgo an
action.
 Audience of abstract—specialized readers or researchers; of ES—decision-makers
 Scope of Abstract—thesis, articles, patents for academics and public goods; of
ES—sponsored and consultancy projects, especially as private goods
 Content of abstract—mainly technical (problem, scope, methodology, results,
conclusions); of ES—mainly managerial (outcomes & benefits, problem solutions
and recommendations)
 Length of Abstract—shorter than the ES
 Style of abstract—technical, static and more academic; of the ES—managerial,
dynamic and enthusiastic

DOS AND DON’TS

 Be constructive (diplomatic) in reviewing others’ work;

BAD: All previous studies are worthless because they failed to recognize the effect of X
on Y. Chen and Smith (1998) tried but their approach was simply wrong. Ours is the first
study to address this question correctly.

BETTER: Previous studies have made important contributions to this challenging
problem; however, none of the published studies appear to have completely accounted for
the effect of X on Y. A pioneering effort in this direction is described by Chen and Smith
(1998).

Do not assume that your reader/reviewer knows the problem.

DO NOT use language like:


“It is well known…”, “it is obvious”… or “it is trivial to show…”.
BETTER: “It is generally accepted in the literature…”

9. REFINING SKILLS IN BASIC STATISTICAL ANALYSIS

Objective
To give participants greater confidence in data analysis using statistical software.

Learning outcomes
On completion of the lecture, participants would be able to understand the following:

 What is regression analysis?


 What is regression good for?
 What are the various types of regression models?
 What kinds of variables can be used in regression model?
 How can we judge how good the predictions are?
 How do we judge how good the coefficient estimates are?
 How to test hypothesis?
 How to construct confidence interval?
 How does regression ‘control’ for variables?
 What are mediating variables?
 How are the regression results interpreted?
 What can go wrong with regression analysis?
 What are the limitations of regression analysis?

Beneficiaries:

Faculty from engineering, management, pure sciences, and social sciences who have been
actively engaged in guiding research scholars and carrying out sponsored research projects.
A basic understanding of statistics will be useful, but is certainly not essential.

Regression Analysis

It is a statistical method for studying the relationship between a single dependent variable
and one or more independent variables, with a view to estimating and/or predicting the
(population) mean or average value of the former in terms of known or fixed values of the
latter. When there is only one independent variable, it is called bivariate analysis, and when
there are more independent variables than one, it is known as multivariate analysis.

Variables

Dependent Independent

Explained Explanatory
Predictand Predictor
Regressand Regressor
Endogenous Exogenous

Types of Regression models

 Deterministic
 Stochastic/ Probabilistic

If we construct a model which hypothesizes an exact relationship between the
dependent and independent variables, it is called a deterministic model. On the other hand,
if we believe that the model should be constructed to allow for random error, then we
hypothesize a probabilistic model. The latter includes both a deterministic component and a
random error component.

Y = a + bX (deterministic)

Y = a + bX + u (stochastic)

How to derive the OLS equations?

The stochastic model is Y = a + bX + u, with fitted values Ŷ = a + bX.

∑e² = ∑(Y − Ŷ)²

= ∑(Y − a − bX)²

Differentiating with respect to a and b respectively and equating to zero, we get

∂∑e²/∂a = −2 ∑(Y − a − bX) = 0

∂∑e²/∂b = −2 ∑X(Y − a − bX) = 0

Dividing both sides by −2, we get

∑(Y − a − bX) = 0, i.e. ∑Y − na − b∑X = 0,

∑X(Y − a − bX) = 0, i.e. ∑XY − a∑X − b∑X² = 0

∑Y = na + b∑X

∑XY = a∑X + b∑X²

By solving these two normal equations, the intercept and slope of the linear regression line are
estimated. The values of a and b can also be estimated directly by the following formulae:

b = ∑(Yi − Ȳ)(Xi − X̄) / ∑(Xi − X̄)²

a = Ȳ − bX̄

where Ȳ is the mean of the dependent variable and X̄ is the mean of the independent variable.

Bivariate regression can be done manually. When the number of independent variables is
more than one, it becomes cumbersome to estimate the coefficients manually. For this, special
computer packages are used to run the regression programme.
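As a check on the formulae above, here is a minimal sketch of bivariate OLS in Python; the data are made up for demonstration.

# Illustrative sketch: bivariate OLS using the normal-equation formulas,
# b = sum((Y - Ybar)(X - Xbar)) / sum((X - Xbar)^2), a = Ybar - b*Xbar.
import numpy as np

X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([5.1, 8.9, 13.2, 16.8, 21.1])

b = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

Y_hat = a + b * X              # fitted values
print(f"a = {a:.3f}, b = {b:.3f}")
# np.polyfit(X, Y, 1) gives the same estimates and can serve as a cross-check.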

Assumptions

All variables must be measured without error. (No measurement error.)

For each set of values of the K independent variables, (X1j, X2j, …, Xkj),

E(uj) = 0, i.e., the mean of the error term is zero.

For each set of values of the K independent variables, VAR(uj) = σ² (i.e., the variance of the
error term is constant) (homoscedasticity). Violation of this assumption creates the
problem of heteroscedasticity.

For any two sets of values of the K independent variables, COV(ui, uj) = 0 (i.e., the
error terms are uncorrelated); violation creates the autocorrelation problem.

For each Xi, COV(Xi, u) = 0 (i.e., each independent variable is uncorrelated with the
error term); violation creates the endogeneity problem.

There is no perfect collinearity among the independent variables; otherwise there is a
multicollinearity problem.

For each set of values of the K independent variables, uj is normally distributed.

Violation of these assumptions will provide biased estimates of the coefficients.

What is Regression good for?

There are two uses of regression: prediction and causal analysis. In a prediction
study, the aim is to develop a formula for making predictions about the dependent
variable, based on the observed values of the exogenous variables. For example, an
economist may want to predict next year’s GNP based on such variables as last year’s
GNP, current interest rates, the current rate of investment and other variables.

In a causal analysis, the independent variables are regarded as causes of the
dependent variable. The aim is to determine whether a particular exogenous variable
really affects the endogenous variable and to estimate the magnitude of that effect, if
any. However, these two uses are not mutually exclusive.

Why is linear regression so popular?

It does two things: for prediction studies, it makes it possible to combine many
variables to produce optimal predictions of the endogenous variable, and for causal
analysis, it separates the effects of the exogenous variables on the endogenous variable.
Sophisticated non-linear regression models are very complicated and require a high level
of mathematical skill and specialized software.

What will happen if true relationship is not linear?

In a bivariate analysis, the relationship between the two variables can easily be
identified by plotting the data on a graph. If a straight line is formed, a linear function is
used. It is difficult to know the linearity among the variables when the number of
exogenous variables is large. If the real relationship is non-linear and a linear function
is used, the analysis may provide inefficient results. A useful general principle in science
is that when you do not know the true form of a relationship, start with something simple.
A linear equation is perhaps the simplest way to describe a relationship between two or
more variables and still get reasonably accurate predictions. Furthermore, it is easy to
modify the linear equation to represent certain kinds of non-linearity.
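One such modification is to add a squared term to the equation; the sketch below fits Y = a + b1·X + b2·X² by least squares on invented data.

# Illustrative sketch: representing a simple non-linearity by adding a
# squared term to the linear equation (the data are invented).
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 7.2, 12.1, 18.8, 27.5])

# Design matrix with an intercept column, X and X^2
A = np.column_stack([np.ones_like(X), X, X ** 2])
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
print("a, b1, b2 =", np.round(coeffs, 3))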

What kinds of data are needed for regression analysis?

To do a regression analysis, we first need a set of cases, and these cases must be in
sufficient number: most regression analysts would be reluctant to do a regression with
fewer than five cases per variable. Because we cannot afford to study the entire
population, we take a probability sample with n cases. This is a sample in which the
probability of selecting any possible sample of size n is known or can be calculated. There
are basically three types of probability samples: simple random samples, stratified
samples and cluster samples.

What kinds of variables can be used in Regression model?

1. Interval Scale Variables: Quantitative variables like age, income, years of
schooling, production, consumption, etc. are measured on some well-defined
scale. For each of these scales, it is reasonable to claim that an increase of a specified
amount means the same thing no matter where you start. These variables are the most
appropriate for regression analysis.
2. Ordinal Scale Variables: Many variables in management and the social sciences are
based on the opinions of respondents. For example, people may be asked whether
they strongly agree, agree, agree somewhat, disagree or strongly disagree with the
statement that “local government is more effective in disaster management than
central government”. Most people would accept the claim that higher scores
represent stronger agreement with the statement, but it is not at all clear that the
distance between 1 and 2 is the same as the distance between 2 and 3, or between 4 and
for regression analysis because the linear equation, to be meaningful, requires
information on the magnitude of changes. If you use such variables, you are
implicitly assuming that an increase or decrease of one unit on the scale means the
same no matter where you start.
3. Nominal Scale Variables: There are some variables that do not have any order at all,
like gender, marital status, literate-illiterate, rural-urban, poor-rich, etc. If the
variables have two categories, just assign a score of 1 to one of the categories and 0
to the other. Thus binary numbers are generated. Such variables are called dummy
variables or indicator variable. In case of more than two categories, more than one
dummy variable can be generated. Dummy variables are appropriate as exogenous
variables for the regression analysis. If dependent variable is dummy, then some
advanced regression models such as logistic regression are used.
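A small sketch of dummy coding with pandas, using an invented data frame, is given below.

# Illustrative sketch: turning nominal variables into 0/1 dummy
# variables with pandas (the tiny data frame is invented).
import pandas as pd

df = pd.DataFrame({
    "income": [420, 515, 388, 610],
    "gender": ["male", "female", "female", "male"],
    "region": ["rural", "urban", "rural", "urban"],
})

# drop_first=True keeps k-1 dummies per variable, avoiding perfect collinearity
dummies = pd.get_dummies(df, columns=["gender", "region"], drop_first=True)
print(dummies)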

How can we judge how good the predictions are?

The most common statistic for doing this is the coefficient of determination (R²). The basic
idea behind R² is to compare two quantities:

 The sum of squared errors produced by the least squares equation and
 The sum of squared errors for a least squares equation with no independent variables
(just the intercept).

When an equation has no independent variables, the least squares estimate for the intercept
is just the mean of the dependent variable.

R² = 1 − ∑(Y − Ŷ)² / ∑(Yi − Ȳ)²

R² = 1 − RSS/TSS = ESS/TSS

Adjusted R̄² = 1 − (RSS/(n − k)) / (TSS/(n − 1))

F = (ESS/(k − 1)) / (RSS/(n − k)) = (ESS/(k − 1)) × (n − k)/RSS = ESS(n − k) / RSS(k − 1)

Dividing ESS and RSS by TSS, we get

F = (n − k)(ESS/TSS) / (k − 1)(RSS/TSS) = (n − k)R² / (k − 1)(1 − R²)
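Continuing the made-up bivariate example used earlier, R² can be computed directly from the residual and total sums of squares, as the sketch below shows.

# Illustrative sketch: R-squared from RSS and TSS for the invented
# bivariate data used in the OLS sketch above.
import numpy as np

X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
Y = np.array([5.1, 8.9, 13.2, 16.8, 21.1])

b = np.sum((Y - Y.mean()) * (X - X.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()
Y_hat = a + b * X

rss = np.sum((Y - Y_hat) ** 2)        # residual sum of squares
tss = np.sum((Y - Y.mean()) ** 2)     # total sum of squares
print(f"R^2 = {1 - rss / tss:.4f}")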

How do we judge how good the coefficient estimates are?

In any regression analysis, we typically want to know something about the accuracy
of the numbers we get when we calculate estimates of regression coefficients. There are three
possible sources of error:

 Measurement error: very few variables can be measured with perfect accuracy,
especially in social sciences.
 Sampling error: In many cases, our data are only a sample from larger population
and the sample will never be exactly like the population.
 Uncontrolled variations: there may be so many other variables that are not under
the control. They can disturb the relationship between the dependent and independent
variables included in the function.

The basic assumption is that the errors occur in a random and unsystematic fashion.
We evaluate the extent and importance of this random variation by calculating confidence
intervals or hypothesis tests.

Confidence intervals give us a range of possible values for the coefficients. Although
we may not be certain that the true value falls in the calculated range, we can be reasonably
confident. Hypothesis tests are used to answer the question of whether or not the true
coefficient is zero.

Confidence Interval at 95% level = b ± 2 S.E.

For instance, if b is 600 and SE is 210, then the confidence interval is 600 + (2 x 210)
= 1020, and 600 – (2 x 210) = 180. We can say that we are 95% confident that the true
coefficient lies somewhere between 180 and 1020.

In published research using regression analysis, we are more likely to see
hypothesis tests than confidence intervals. Usually, the kind of question people most want
answered is “does this particular variable really affect the dependent variable?” If a variable
has no effect, then its true coefficient is zero. To know whether a coefficient is significantly
different from zero, a t-test is conducted:

t-statistic = b / SE

Then we consult a t-table (or the computer does this for us) to calculate the associated p-value. If the
p-value is small, it is taken as evidence that the coefficient is not zero.
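Using the illustrative coefficient from the confidence-interval example above (b = 600, SE = 210) and an assumed 28 residual degrees of freedom, the sketch below computes the rough 95% interval and the two-tailed p-value.

# Illustrative sketch: rough 95% CI (b +/- 2*SE) and two-tailed p-value
# for a regression coefficient (b, SE and the d.f. are assumptions).
from scipy.stats import t

b, se, df = 600.0, 210.0, 28     # coefficient, standard error, residual d.f.

ci_low, ci_high = b - 2 * se, b + 2 * se
t_stat = b / se
p_value = 2 * (1 - t.cdf(abs(t_stat), df))
print(f"approx. 95% CI: ({ci_low:.0f}, {ci_high:.0f})")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")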

What is Hypothesis Testing?

It is analogous to a decision reached in a court of law. Under the court system, a defendant is
brought to trial and is assumed to be not guilty. For the judge or jury to reach a finding
of guilty, sufficient evidence must be produced. In the court system, errors can be made:
an innocent defendant can be found guilty, and a guilty individual can be found not guilty. Under
a legal system where the evidence must show beyond a shadow of doubt that the assumption
of non-guilt is to be rejected, there is a primary concern for the error of the first
type, i.e., of convicting an innocent person. Just as a defendant is assumed not guilty until
proven guilty, in hypothesis testing the null hypothesis is assumed true until there is
sufficient evidence that it is not true.

How does regression ‘control’ for variables?

Another use of multiple regression is to examine the effects of some independent
variables on the dependent variable while controlling for other independent variables. In
regression analysis, the coefficient for one variable can be interpreted while holding the other
variables constant.

How do we interpret regression results?

Results are interpreted in different ways depending on the type of variable.
As discussed, there are three types of variables: interval scales, ordinal scales, and nominal
scales.

What can go wrong with regression analysis?


Any tool as widely used as regression is bound to be frequently misused. Nowadays,
statistical packages are so user-friendly that anyone can perform a multiple regression with
a few mouse clicks. As a result, many researchers apply it to their data with little
understanding of the underlying assumptions or the possible pitfalls.

1. Inclusion of wrong variables

The following results indicate that wrongly selected data and variables may provide
misleading results. State-wise data for the year 2000-01 were collected by some researchers
to assess the impact of technical education on economic development. The results are
as follows:

PCI = 50.96** + 0.33** TE + 0.08 GE                  R² = 0.55      (1)
      (2.71)    (3.64)     (0.47)                    F-Value = 8.42

PCI = 59.55** + 0.29** TEH + 0.05 TEL + 0.005 GE     R² = 0.61      (2)
      (3.32)    (2.77)      (0.63)      (30.03)      F-Value = 8.42

** Significant at 5% level; figures in parentheses are t-values.

Where:

PCI = per capita net state domestic products

TE = Overall technical education

TEH = higher level technical education

TEL = lower level technical education

GE = general education

2. Leaving out important variables


There are two possible reasons for putting a variable in the regression model: you
want to know the effect of the variable on the dependent variable and you want to
control for the variable. Obviously, researchers will include variables that are the
main focus of their study, but they may not be so careful about including important
control variables. What makes a control variable important? To answer this question,
you need to answer two other questions: Does a variable have a causal effect on the
dependent variable? Is the variable correlated with those variables whose effects are
the focus of the study? If the answer to both questions is ‘yes’, the control variable is to be
considered important.
3. Reverse causation
If dependent variable affects one or more independent variables, the resulting biases
can be as serious as those produced by the omission of important variables. This
problem—known as reverse causation—actually can be worse than the omitted
variables problem because: every coefficient in the model may be biased and it is
hard to design a study that will adequately solve this problem.
4. Sample Size
Sample size has a profound effect on tests of statistical significance. The general
principle is this: In a small sample, statistically significant coefficients should be
taken seriously, but a non-significant coefficient is extremely weak evidence for the
absence of an effect. Small samples have low power to test hypotheses. Therefore, the
larger the sample, the more accurate the results will be.
5. Effect of mediating variables
Even if the sample is not small, there is another reason for being cautious in concluding
that a variable has no effect: it is possible that other variables mediate the effect of
that variable. If those other variables are also included in the model, the effect of the
variable you are interested in may disappear. Let us consider the following model:
AIRjee = a + b1 SES + b2 INTERMARKS + b3 COACHING + U
If SES has a big effect on INTERMARKS and INTERMARKS, in turn, has a big
effect on AIRjee, then INTERMARKS is an intervening variable between SES
(socio-economic status) and AIRjee. If we put both variables in the model, the coefficient
on SES captures only its direct effect, which may be close to zero. We may then mistakenly
conclude that SES has no impact on AIRjee.
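
The masking effect of a mediator is easy to demonstrate by simulation. In the hypothetical sketch below (all data made up), SES influences AIRjee only through INTERMARKS, so the SES coefficient collapses once the mediator enters the model:

```python
import numpy as np

# Hypothetical simulation: SES affects AIRjee only through INTERMARKS.
rng = np.random.default_rng(1)
n = 5000
ses = rng.normal(size=n)
intermarks = 2.0 * ses + rng.normal(size=n)          # SES -> INTERMARKS
air = 1.5 * intermarks + rng.normal(size=n)          # INTERMARKS -> AIRjee

def ols(y, X):
    """Least squares coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + list(X))
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Without the mediator, SES shows a large (total) effect ...
print(ols(air, [ses]))                # SES slope around 3.0
# ... with the mediator included, the SES coefficient collapses toward zero.
print(ols(air, [ses, intermarks]))    # SES slope near 0, INTERMARKS near 1.5
```
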
6. Multicollinearity Problem
It is something that nearly all users of multiple regression have heard about.
However, their knowledge is often limited to two facts.

 It is bad.
 It has something to do with high correlation among the independent
variables.
It comes in two forms: extreme and near-extreme. Extreme multicollinearity means
that at least two of the independent variables are perfectly correlated. The computer
easily detects this type of problem. Near-extreme multicollinearity means simply
that there are strong linear relationships among the exogenous variables. In the
presence of this problem, regression coefficients tend to have larger SEs than they
would have in its absence.
This problem can be identified by: (1) estimating correlations among the variables; if the
value of r is 0.8 or above, there is a severe problem; (2) fitting a regression between
two variables; if the R² is 0.60 or above, there is a problem; (3) the tolerance value,
i.e., 1 − R²; if it falls below 0.4, there is a problem; and finally (4) the variance inflation
factor (VIF), i.e., 1/tolerance. The square root of the VIF tells us how much larger the SE is,
compared with what it would be if that variable were uncorrelated with the other
variables.
Time series data, panel data and aggregate data are more prone to this problem. There
are various solutions: deletion of one or more variables from the model, combining
the collinear variables into an index, and performing joint hypothesis tests.
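
The tolerance and VIF diagnostics above can be computed directly. A minimal sketch using statsmodels, with made-up, deliberately near-collinear data:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors: x2 is nearly a copy of x1 (near-extreme collinearity).
rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(name, "tolerance:", 1 / vif, "VIF:", vif)   # x1, x2 show huge VIFs
```
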
7. Heteroscedasticity problem
The word homoscedasticity is derived from a Greek phrase meaning ‘same variance’.
Its opposite is heteroscedasticity, which means that the degree of random noise in the
equation varies with the values of the x-variables. It can be checked by plotting the
data on a graph. This problem has two effects:
 Inefficiency: in the presence of this problem, OLS is not optimal as it gives
equal weights to all observations.
 Biased SE: in its presence, SE estimates can be seriously biased. That in turn
leads to bias in test statistics and confidence intervals.
Solutions: WLS (weighted least squares) and transforming the data.
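
A sketch of the check and the WLS remedy on hypothetical data whose noise grows with x; the Breusch-Pagan test stands in here for the graphical check mentioned above:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical data: error variance rises with x (heteroscedastic).
rng = np.random.default_rng(3)
x = rng.uniform(1, 10, size=300)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)
X = sm.add_constant(x)

ols_fit = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(ols_fit.resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)      # small p-value => heteroscedasticity

# WLS: weight each observation by the inverse of its (assumed) error variance.
wls_fit = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls_fit.params)
```
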
8. Auto-correlation problem
Auto-correlation or serial correlation refers to the case in which the residual error
terms from different observations are correlated. It can be caused by several factors,
including omission of an important explanatory variable or the use of an incorrect
functional form. Whatever the cause may be, it influences the outcome of
hypothesis testing. Its effect is to underestimate the SE of the coefficients. This in turn
yields inflated t-ratios, which means that coefficients may be found to be
significantly different from zero when in fact they are not.
This problem can be diagnosed by the Durbin-Watson test (DW test). In the presence
of autocorrelation, OLS is not efficient; GLS is preferred.
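
A minimal sketch of the DW diagnostic on hypothetical data with AR(1) errors (a DW statistic near 2 suggests no autocorrelation; well below 2 suggests positive autocorrelation):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical time series with serially correlated errors.
rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal()   # AR(1) error process
y = 1.0 + 0.5 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
print("DW statistic:", durbin_watson(fit.resid))   # well below 2 here
```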

What are the limitations of regression?

 It is based on measures of central tendency.
 It cannot handle more than one dependent variable simultaneously.

How do we run a regression?
 How do we choose a computer package?
 How do we get our data into the computer package?
 What else should we do before running the regression?
 How do we indicate which regression model to run?
 How do we interpret computer output?
 What are the common options in regression packages?
 What are standardized coefficients and how are they interpreted?

10. Statistical Software: SPSS
Introduction

In this lecture, we shall attempt to make you aware of the main features of SPSS and to
enrich your skill in applying the software to analyse data and draw meaningful
results. As SPSS is very comprehensive and flexible software,
covering almost all aspects of data processing, cleaning, tabulating, analyzing and
reporting, it would not be feasible to discuss all these aspects in one session. Therefore,
the discussion will be limited to some of the most relevant features of the software.
SPSS (Statistical Package for the Social Sciences) can take data from almost any
type of file and use them to generate tabulated reports, charts, and plots of distributions
and trends, descriptive statistics, and to conduct complex statistical analyses. Our institute
has a site license for this software that allows researchers to use the software in any general
access labs on campus. A brief note about the software is given in this write up. Detailed
discussions along with some practical examples will be made during the interactive
session on SPSS.

1. The Data Window

When you start SPSS for Windows, the first thing you will see is the data window. The
data window has a spreadsheet akin to an Excel spreadsheet. You can directly enter data into
it. Cases (observations) are recorded in rows and variables in columns. You can cut, paste,
and delete rows and columns as per the requirement.

2. The Output Window

The output window displays the output from statistical analyses and any charts you have
run. A table can be edited by double-clicking on the section of the table that you would
like to edit. Furthermore, tables can be opened as pivot tables and edited from the pivot table
window so that one may adjust the look of the table.

3. The Syntax Window

There are two approaches to working in SPSS: using a point-and-click approach, and using
SPSS syntax to program commands and routines. The syntax window is used when data are
to be extracted from large databases. CSO and NSSO unit-level data are extracted through the
syntax window. It is a very useful record-keeping tool. When using the point-and-click
approach, all commands and procedures are stored "in the background." This information
can then be pasted to the syntax window, using the "Paste" option found in the GUI interface
(i.e., the point-and-click approach). Having this information saved in the syntax window can
save a researcher abundant grief if printouts of output are lost. In addition, you can place
comments in the syntax window to indicate what it is that you are doing.

Syntax Editor window can be opened by doing the following: under File, select New, and
then Syntax. Keep in mind the following rules when writing syntax:

 Each command must begin on a new line and end with a period (.).
 Most subcommands are separated by slashes (/). The slash before the first
subcommand on a command is usually optional.
 Variable names must be spelled out fully.

 Text included within apostrophes or quotation marks must be contained on a single
line.
 Each line of command syntax cannot exceed 80 characters.
 A period (.) must be used to indicate decimals, regardless of your Windows regional
settings.
 Variable names ending in a period can cause errors in commands created by the
dialog boxes. You cannot create such variable names in the dialog boxes, and you
should generally avoid them.

Command syntax is case insensitive, and three-letter abbreviations can be
used for many command specifications. You can use as many lines as you want to
specify a single command. You can add space or break lines at almost any point
where a single blank is allowed, such as around slashes, parentheses, arithmetic
operators, or between variable names.

4. Editing Options in SPSS

There are several default options in SPSS that you may find useful to change. You can edit
these options by going to the edit menu and selecting options. You will get a dialog box.

5. Getting Data into SPSS

There are three main ways to get data into SPSS: (a) creating a new SPSS data file, (b)
opening existing SPSS data files, and (c) importing data from another source such as an
ASCII file, an Excel spreadsheet, etc.

i. Creating new SPSS data files

Data can be directly entered into SPSS similar to an Excel spreadsheet. You may also cut
and paste data from other applications into SPSS. However, if you are going to enter data
directly, you will need to name and define your variables.

ii. Opening existing SPSS system files

Opening existing SPSS files is a simple procedure, similar to opening other Windows files.
Select "Open" from the File menu, and you will find a dialog box.

iii. Importing data from an ASCII file

In order to use the data in SPSS, the data must be converted to a file format that SPSS can
recognize, namely something in *.sav format. SPSS can read in ASCII data, which can then
be saved in *.sav format.

iv. Importing data from other file formats

SPSS allows the user to open data directly into SPSS from many different file formats.
For example, SPSS will directly open Excel, SAS, Lotus, and *.dbf (database) files. All
the user needs to do is to go to the File Menu, select "Open", "Data", select the correct file
type from the "Files of Type" drop down menu, and navigate to the file you wish to open.

6. Saving Data in SPSS

Saving data in SPSS is very similar to other Windows applications. Select "Save as" from
the File Menu, move to the directory in which you want to save the file, and give the file any
name you desire. SPSS allows you to give your files descriptive names, without having an
eight character restriction. The default file type in which the data file will be saved is *.sav.
If you wish to save it as another file type (i.e., Excel), simply change the file type in the "Save
as Type" drop down menu.

7. Variable and Value Labels

A good data set will include variable and value labels that provide a fuller description of
both the variable and the meaning of each value within a variable (for nominal and ordinal
data; value labels are not needed for continuous data). Unlike the variable names, which are
limited to 8 characters, a label may be up to 120 characters long and gives a fuller
description of a variable. Value labels may be given, for example, as 1 for males and 2 for
females for a variable on gender.

Missing Values

SPSS has two types of missing values that are automatically excluded from statistics
computed by procedures: system-missing values and user-missing values. Any variable for
which a valid value cannot be read from raw data or computed is assigned the system-
missing value. User-missing values are values that you tell SPSS to treat as missing for
particular variables. These values are values (other than blanks) that you coded into your
data to indicate non-acceptable responses.

Common menus of SPSS


1. File menu: used to create a new file, open an existing file, and read in spreadsheet or
database files created by other software programs.
2. Edit Menu: used to cut, copy, and paste data values from the Data Editor; modify or
copy text from the Viewer or Syntax Editor; copy charts for pasting into other
publications from the Chart Editor, etc.
3. View Menu: used to turn toolbars and the status bar on and off, and turn grid lines on
and off from all window types; and control the display of value labels and data values
in the Data Editor.
4. Analyze: this menu is selected for various statistical procedures such as cross-
tabulation, analysis of variance, correlation, linear regression, and factor analysis.
5. Graphs: graphs menu is used to create bar charts, pie charts, histograms, scatter-
plots, and other full-color, high-resolution graphs. Some statistical procedures also
generate graphs. All graphs can be customized with the Chart Editor.
6. Utilities: used to display information about variables in the working data file and
control the list of variables from all window types; change the designated Viewer
and Syntax Editor, etc.
7. Window: use the Window menu to switch between SPSS windows or to minimize all
open SPSS windows.
8. Help: this menu opens a standard Microsoft Help window containing information on
how to use the many features of SPSS. Context-sensitive help is available through the
dialog boxes.

Statistical Analysis through SPSS

1. Descriptive Data Analysis

For some types of variables (especially continuous variables), we will want to
obtain summary statistics other than the number of cases in each category of
the variable. For example, we might be interested in the mean, median, or
standard deviation of a particular variable.

 Select Descriptive Statistics from Analyze menu


 Choose Frequencies/descriptive statistics
 A dialog box appears. Names of all the variables in the data set appear on the left
side of the dialog box.
 Select the variable from the list.
 Click the arrow button right to the selected variable.

Now the selected variable appears in a box on the right and disappears from the left box.
Note that when a variable is highlighted in the left box, the arrow button is pointed right for
you to complete the selection. When a variable is highlighted in the right box, the arrow
button is pointed left to enable you to deselect a variable (by clicking the button) if necessary.
If you need additional statistics besides the frequency count, click the Statistics... button at
the bottom of the screen. When the Statistics... dialog box appears, make appropriate
selections and click Continue. In this instance, we are interested only in frequency counts.
The output appears on the Viewer screen

The mean, standard deviation, minimum, and maximum are displayed by default. The
variables are displayed, by default, in the order in which you selected them. Click Options...
for other statistics and display order. The following output will be displayed on the Viewer
screen.

The MEANS procedure displays means, standard deviations, and group counts for
dependent variables based on grouping variables. To run the MEANS procedure:

 Select Analyze/Compare Means/Means...


 Select the dependent variables and independent variable
 Click Options...

 Select Mean, Number of cases, and Standard Deviation. Normally these options are
selected by default. If any other options are selected, deselect them by clicking them
 Click Continue
 Click OK
 The output will be displayed on the Viewer screen.

Editing Pivot Tables

SPSS displays the output in pivot table with cells divided with vertical lines. Sometimes,
the default width of the output table columns is not enough to fit the values that will be
inserted in the cells. To edit a pivot table, double-click the pivot table and this activates the
Pivot Table Editor. Or click the right mouse button on the pivot table and, from the context
menu, choose SPSS Pivot Table Object/Open; the pivot table will be ready to edit in its
own separate Pivot Table Editor window.

Printing the Output

Once you are satisfied with your analysis you may want to obtain a hard copy of the output.
You may print the entire output on the viewer window, or delete the sections you do not
want before you print. Or you can save the output to a diskette or hard drive and print it later.
The SPSS data file contains the actual data, variable and value labels, and missing values that
appear in the SPSS Data Editor window.

Correlation analysis

A correlation analysis is performed to quantify the strength of association between two
numeric variables. Select Analyze/Correlate/Bivariate... This opens the Bivariate
Correlations dialog box. The numeric variables in your data file appear on the source list
on the left side of the screen.

Linear Regression

 Choose Analyze/Regression/Linear... The Linear Regression dialog box appears.


 Choose the dependent variable
 Choose the independent variables

T-test
T-test is a data analysis procedure to test the hypothesis that two population means are equal.
SPSS can compute independent (not related) and dependent (related) t-tests. For
independent t-tests, you must have a grouping variable with exactly two values (e.g., male
and female, pass and fail). The variable may either be numeric or character. Suppose you
have a grouping variable with more than two categories. You may use the RECODE
(Transform/Recode) command to collapse the categories into two groups. RECODE is a
powerful SPSS command for data transformation with both numeric and string variables.

Select Analyze/Compare Means/Independent-Samples T-test...

 Select Variables
 Select Grouping Variable.
 Click on Define Groups...
 Type 1 for Group 1, and 2 for Group 2.

A t-test with two related variables is performed using the Paired-Samples T-Test from the
Analyze/Compare Means menu.
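
For readers who want to reproduce the same tests outside SPSS, a minimal scipy sketch with hypothetical data (both the independent-samples and paired-samples versions) is:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two independent groups (e.g., group 1 vs group 2).
rng = np.random.default_rng(5)
group1 = rng.normal(70, 10, size=40)
group2 = rng.normal(65, 10, size=40)
t_ind, p_ind = stats.ttest_ind(group1, group2)   # independent-samples t-test
print(t_ind, p_ind)

# Hypothetical before/after measurements on the same cases.
before = rng.normal(50, 8, size=30)
after = before + rng.normal(2, 4, size=30)
t_rel, p_rel = stats.ttest_rel(before, after)    # paired-samples t-test
print(t_rel, p_rel)
```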

One-way Analysis of Variance

The statistical technique used to test the null hypothesis that several population means are
equal is called analysis of variance. It is called that because it examines the variability in the
sample, and based on the variability, it determines whether there is a reason to believe the
population means are not equal. The statistical test for the null hypothesis that all of the
groups have the same mean in the population is based on computing the ratio of within and
between group variability estimates, called the F statistic. A significant F value only tells
you that the population means are probably not all equal. It does not tell you which pairs of
groups appear to have different means. To pinpoint exactly where the differences are,
multiple comparisons may be performed.
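
Outside SPSS, the same one-way ANOVA F test can be sketched with scipy on hypothetical group samples:

```python
import numpy as np
from scipy import stats

# Hypothetical samples from three groups.
rng = np.random.default_rng(6)
g1 = rng.normal(10, 2, 30)
g2 = rng.normal(11, 2, 30)
g3 = rng.normal(13, 2, 30)

f_stat, p_value = stats.f_oneway(g1, g2, g3)   # one-way ANOVA F test
print(f_stat, p_value)
# A small p-value says the means are probably not all equal; pairwise
# (post-hoc) comparisons are needed to locate the differences.
```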

11. DATA ENVELOPMENT ANALYSIS TECHNIQUES

Introduction
Performance of any decision-making unit (DMU) largely depends on how
efficiently inputs are used in the production, marketing and distribution processes. As
resources at its disposal are limited and have competing uses, they are to be optimally applied
to enhance productivity, efficiency and profitability. In order to survive in today’s
competitive environment, it has to improve its performance not only relative to its past
performances but also relative to its competitors in the industry. In this context, it becomes
vital to study inter-firm comparison to identify best practices of efficient firms in resource
utilization and apply them to improve the efficiency of relatively less efficient firms.
In order to identify the extent to which a firm produces output efficiently and cost-
effectively, its economic efficiency is estimated. Economic efficiency is the product of two
efficiencies: technical efficiency and allocative efficiency. Technical efficiency refers to ‘the
firm’s ability to produce the maximum possible output from a given combination of inputs
and technology, regardless of market demand and prices’. Allocative efficiency refers to
the firm’s ability to use the inputs in optimal proportion, given their respective prices.
Classical production theory assumes that given the level of technology, a production
function shows maximum quantity of output that a firm can produce with the given set of
inputs. This means that the firm produces output with 100 per cent technical efficiency.
However, in reality, a firm’s realised output may be below the potential output. Hence,
measurement of individual firm’s technical efficiency becomes essential to know the extent
of deviation of firm’s actual output from its potential output. There are two most popular
approaches to estimate technical efficiency—Data Envelopment Analysis (DEA) and
Stochastic Production Frontier (SPF) Analysis. In this lecture, a detailed discussion will be
held on the DEA and the efficiency estimating procedure will be taught through DEA
software.

Genesis of DEA
Farrell (1957) laid the foundation for new approaches to efficiency and productivity
analysis at the micro level, involving new insights on two issues: how to define efficiency
and productivity, and how to calculate the benchmark technology and the efficiency
measures. He showed how to define economic efficiency and how to decompose it into its
technical and allocative components. He defines technical efficiency as the ratio of observed
output to the maximum potential output that can be attained from given inputs. If a firm’s
actual output is below the potential output, the shortfall is regarded as an indicator of
inefficiency. Allocative efficiency (AE) of a firm is defined as the ratio of minimum cost to
the actual cost. It refers to the firm’s ability to use the inputs in optimal proportion, given the
prices of inputs.
Farrell’s paper gave birth to two approaches of efficiency measurement—
deterministic frontier approach and stochastic frontier approach (SFA). Deterministic
frontiers are parametric as well as non-parametric. Aigner and Chu (1968), Afriat (1972),
Richmond (1974), and Schmidt (1976) develop parametric deterministic models, while
Charnes, Cooper and Rhodes (1978) evolve a non-parametric deterministic approach,
popularly known as Data Envelopment Analysis (DEA) which is extended by Banker,
Charnes, and Cooper (1984). SFA is developed independently by Aigner, Lovell and
Schmidt (1977) and Meeusen and Broeck (1977) and later on extended by Jondrow, Lovell,
Materov, and Schmidt (1982) and Battese and Coelli (1992; 1995). Both DEA and SFA are
being applied by the researchers to measure technical efficiency of decision- making units
(DMUs) using cross-sectional as well as panel data. Earlier, economists

64
usually prefer to use econometric methods to measure efficiency. In the 1990s, many of
them have also started using DEA because of its ability to handle multiple inputs and outputs
and its suitability for studying the performance of both manufacturing and service sectors’
DMUs.

STOCHASTIC FRONTIER ANALYSIS


Deterministic frontier approach does not incorporate the measurement errors and
other noise. In it, all deviations from the frontier are assumed to be the result of technical
inefficiency, whereas, stochastic frontier production function (SFPF) accommodates
exogenous shocks. This involves the specification of the error term as being made up of two
components: a symmetric component, which permits random variation of the frontier across
firms and captures the effects of measurement error, other statistical noise, and random
shocks outside the DMU's control; and a one-sided component capturing the effects of
inefficiency relative to the stochastic frontier.
Aigner, Lovell and Schmidt (1977), Meeusen and van den Broeck (1977), and
Battese and Corra (1977) propose the SFPF. Consider the following Cobb-Douglas
production function:

y_i = f(x_i, β) + ε_i,    i = 1, 2, ..., N.    (1)

where y_i is the logarithm of the (scalar) output (Y) for the ith firm; x_i is a (K+1)-row vector
whose first element is "1" and the remaining elements are the logarithms of the K input
quantities used by the ith firm; β = (β_0, β_1, ..., β_K) is a (K+1)-column vector of unknown
parameters to be estimated; and ε_i is a random error given by ε_i = v_i − u_i. Thus,
equation (1) can be written as:

y_i = f(x_i, β) + v_i − u_i,    i = 1, 2, ..., N.    (2)

v_i ~ N(0, σ_v²) is a two-sided error term representing the usual statistical noise found in any
relationship, and u_i ≥ 0 is a one-sided error term representing technical inefficiency in the
sense that it measures the shortfall of output (y_i) from its maximal possible value given by
the stochastic frontier [f(x_i, β) + v_i]. The model (2) is known as the SFPF because the output
values are bounded above by the stochastic (random) variable exp(f(x_i, β) + v_i). The random
error v_i can be positive or negative (Coelli, et al., 1998).
Direct estimates of the stochastic frontier model can be obtained either by
maximum likelihood or by corrected ordinary least squares (COLS) methods. Introducing
specific probability distributions for v_i and u_i, and assuming that u_i and v_i are independent
and that x_i is exogenous, the asymptotic properties of the maximum likelihood estimators can
be obtained. The model can also be estimated by COLS by adjusting the constant term by
E(u_i), which is derived from the moments of the OLS residuals. Once a model of this form
is estimated, one can readily obtain residuals ε̂_i = y_i − f(x_i, β̂), which can be regarded as
estimates of the error terms ε_i.

Meeusen et al. (1977) assign an exponential distribution to u, Battese and Corra
(1977) assign a half-normal distribution to u, and Aigner et al. (1977) consider both
distributions for u. Parameters to be estimated are β, σ_v² and the variance parameter σ_u²
associated with u. Either distributional assumption on u implies that the composed error
(v − u) is negatively skewed, and statistical efficiency requires that the model be estimated by
maximum likelihood. After estimating the production frontier, an estimate of mean technical
inefficiency in the sample is provided by E(−u) = E(v − u) = −(2/π)^(1/2) σ_u in the normal-
half-normal case and by E(−u) = E(v − u) = −σ_u in the normal-exponential case.
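
As a rough illustration only (not the estimation routine used in the studies cited above), the normal-half-normal log-likelihood can be coded and maximised numerically. The data, variable names and starting values below are hypothetical:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def neg_loglik(params, y, X):
    """Negative log-likelihood of the normal-half-normal SFPF (Aigner et al., 1977)."""
    k = X.shape[1]
    beta = params[:k]
    sigma_v, sigma_u = np.exp(params[k]), np.exp(params[k + 1])  # keep scales positive
    sigma = np.sqrt(sigma_v**2 + sigma_u**2)
    lam = sigma_u / sigma_v
    eps = y - X @ beta                         # composed error v - u
    ll = (np.log(2.0 / sigma)
          + norm.logpdf(eps / sigma)
          + norm.logcdf(-eps * lam / sigma))
    return -ll.sum()

# Hypothetical data: log-output y, log-input matrix X (first column of ones).
rng = np.random.default_rng(7)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.6]) + rng.normal(0, 0.2, n) - np.abs(rng.normal(0, 0.3, n))

start = np.r_[np.linalg.lstsq(X, y, rcond=None)[0], np.log(0.2), np.log(0.3)]
result = minimize(neg_loglik, start, args=(y, X), method="BFGS")
print(result.x[:2])   # frontier coefficient estimates
```
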
The SFA approach gives a less biased measure of efficiency. However, it could only
provide average technical efficiency measures for the sample observations. Although these
aggregate measures are useful in a way, individual observation-specific technical efficiency
measures are more useful from a policy viewpoint. Jondrow, Lovell, Materov
and Schmidt (1982) and Kalirajan and Flinn (1983) independently considered the Aigner
et al. (1977) and Meeusen and van den Broeck (1977) stochastic models to predict the
random variable u_i under the assumption that ε_i is known. SFA does not have an a priori
justification for the selection of any particular distributional form of the random error term,
and the resulting efficiency measures may be sensitive to the distributional assumption.
Another problem with SFA is that it cannot handle multiple output variables at a time
(Thanassoulis, 2001).

DATA ENVELOPMENT ANALYSIS


DEA is a linear programming (LP) based multi-factor productivity analysis model
for measuring the relative efficiency of a homogenous set of DMUs. It optimises on each
individual observation with the objective of calculating a discrete piecewise frontier
determined by the set of Pareto-efficient DMUs. It does not require any specific
assumptions about the functional form. It calculates a maximal performance measure for
each DMU relative to all other DMUs in the observed population, with the sole requirement
that each DMU lie on or below the extremal frontier. Each DMU not on the frontier is scaled
down against a convex combination of the DMUs on the frontier facet closest to it (Charnes,
et al. 1994).

There is an increasing concern with measuring and comparing the efficiency of
organisational units such as local authority departments, schools, hospitals, shops, bank
branches and similar instances where there is a relatively homogeneous set of units.

The usual measure of efficiency, i.e.:

Efficiency = Output / Input

is often inadequate due to the existence of multiple inputs and outputs related to different
resources, activities and environmental factors. DEA methodology is developed to solve
this problem. This technique is quite useful for measuring the efficiency of service sector
DMUs, especially government organizations providing public goods.
We have two basic DEA models: the CCR model, developed by Charnes, Cooper and
Rhodes in 1978, and the BCC model, developed by Banker, Charnes, and Cooper in 1984. The
CCR model generalises the single output/input ratio measure of efficiency for a single DMU in
terms of a fractional linear programming (FLP) formulation, transforming the multiple
output/input characteristics of each DMU to those of a single "virtual" output and "virtual"
input. The model defines the relative efficiency for any DMU as a weighted sum of outputs
divided by a weighted sum of inputs, where all efficiency scores are restricted to lie between
zero and one. An efficiency score less than one means that a linear combination of other
units from the sample could produce the same vector of outputs using a smaller vector of
inputs. The score reflects the radial distance from the estimated production frontier to the
DMU under consideration. Variables in the model are input-output weights, and the LP
solution produces the weights most favourable to the unit under reference. In order to
calculate efficiency scores, the FLP is converted into an LP by normalising either the numerator
or the denominator of the fractional programming objective function. In the case of an output-
maximization DEA program, the weighted sum of inputs is constrained to be unity to
maximize the weighted sum of outputs, while in an input-minimization DEA program, the
weighted sum of outputs is constrained to be unity to minimize the weighted sum of inputs.
The CCR model is based on the constant returns to scale assumption. Under this assumption,
if the input levels of a feasible input-output correspondence are scaled up or down, then
another feasible input-output correspondence is obtained in which the output
levels are scaled by the same factor as the input levels (Thanassoulis, 2001).
Another version of DEA was given by Banker, Charnes and Cooper (1984). The
primary difference between the BCC and CCR models is the convexity constraint, which
represents the returns to scale. The CCR model is based on the assumption that constant
returns to scale exist at the efficient frontiers, whereas BCC assumes variable returns to scale
frontiers. CCR efficiency is overall technical efficiency (OTE), known as global technical
efficiency, whereas BCC efficiency is the pure technical efficiency (PTE) net of scale effect,
known as local technical efficiency. If a DMU scores a value of one on both CCR-efficiency
and BCC-efficiency, it is operating at the most productive scale size (MPSS). If a DMU has
a BCC-efficiency score of one and a CCR-efficiency score less than one, it is operating locally
efficiently but not globally efficiently due to the scale size of the DMU. Thus, inefficiency
in any DMU may be caused by the inefficient operation of the DMU itself (BCC-
inefficiency) or by the disadvantageous conditions under which the DMU is operating (scale-
inefficiency). Scale efficiency is estimated by dividing the CCR-efficiency by the BCC-
efficiency for a DMU. Another technique based on DEA is the Malmquist Productivity Index
(MPI), proposed by Caves, et al. in 1982. The MPI is defined with distance functions. For
panel data, distance functions permit us to describe multiple input-output production
technologies without behavioural objectives such as profit maximisation or cost
minimisation. A detailed description of the MPI model is presented later in this lecture.

GROWTH OF DEA APPROACH


Since the publication of the seminal paper of Charnes, et al. (1978), numerous
research papers have been written on both theoretical and applied aspects of the DEA
approach. On the theoretical side, a number of DEA models and their extensions have been
developed. Weight restriction, non-discretionary inputs and outputs, categorical inputs and
outputs, sensitivity analysis, input congestion, returns to scale, bad outputs, super-
efficiency, target setting, etc. are the major aspects on which extensions of DEA models have
been made. In parallel with the theoretical development, a wide range of empirical studies
have also been published which evince the inexhaustible potential of DEA for innovative
applications.
Originally, DEA was applied to estimate the relative efficiency of non-profit
organizations such as educational institutions, government hospitals, public utilities, etc.
where market prices are not generally available. However, its ability to use multiple output-
input variables without a priori underlying functional form assumption has motivated the
researchers to extend it to profit organizations also. Some of the areas where applications
of DEA have been made frequently by researchers are: banks, academic institutions,
hospitals, public utilities like gas, water, electricity supply, police services, transport
services, agriculture, and industry. Moreover, development of DEA- based MPI for
measuring total factor productivity growth and its decomposition into technical efficiency
change and technical progress is the significant achievement in the field of productivity
analysis.

Terminology of DEA

1. Benchmarking: It is the process of comparing the performance of an individual


organization against a benchmark, or ideal level of performance. Benchmarks can be set
on the basis of performance over time or across a sample of similar organizations or
some externally set standard.
2. Best Practices: Best practices refer to the set of management and work practices that
results in the highest potential or optimal quantity and combination of outputs for a
given quantity and combination of inputs (productivity) for a group of similar
organisations.
3. Decision Making Unit (DMU): The term DMU was first used by Charnes, Cooper and
Rhodes in 1978 in their seminal paper on DEA. A DMU means an individual production unit
producing tangible or intangible output under private, cooperative, government or any
other organization's ownership. It comprises manufacturing firms, banking and
insurance companies, transport and communication firms, hospitals, schools and
universities, other service providing firms, government organizations, local
governments, municipal corporations, etc. For measuring the relative performance of
individual DMUs, the set of DMUs should face the same fundamental characteristics in
terms of environment and technological constraints. If someone wants to assess the
efficiency of educational institutions, the DMUs in the dataset should be homogeneous.
For instance, schools cannot be compared with universities.
4. Economies of Scale: It refers to increasing a firm’s size until it obtains the minimum
cost per unit of output.
5. Inefficiency: The amount by which a firm lies below the estimated frontier can be
regarded as measure of inefficiency. Under the given technology, if actual output of a
firm equals the potential output, the firm would not have inefficiency in the production.
6. Most Productive Scale Size (MPSS): It is that size at which a DMU obtains 100 percent
pure technical efficiency and scale efficiency. This is possible when a DMU attains an
efficiency score of one under constant returns to scale technology assumption.
7. Pareto Efficiency: A DMU is Pareto-efficient if it is not possible to reduce any one of
its input levels without increasing at least another one of its input levels and /or without
lowering at least one of its output levels.
8. Peer: A peer is an efficient DMU which acts as a reference point (in terms of input and
output mix) for inefficient DMUs.
9. Productivity: It can be defined as the ratio of a measure of output to a measure of one or
more of the inputs used to produce the output. There are two main concepts of productivity:
partial (single) factor productivity and total (multiple) factor productivity. Partial factor
productivity is a simple ratio of volume of total output to the volume of total quantity of
a single input. For instance, labour productivity is measured by dividing the total
production of a firm by the number of total workers (or total hours of work) of that firm.
Partial factor productivity concept cannot provide the true performance of a resource.
For instance, labour productivity in a firm can be raised either by improving the quality
of human resource through training and retraining or simply by retrenching the
manpower and using more capital and technology intensive production process.
Therefore, total factor productivity (TFP) index is measured to assess the overall
productivity of a firm or industry. TFP is a ratio of weighted sum of output to the
weighted sum of inputs. A TFP index value greater than one indicates positive
growth in productivity, and a value less than one means negative
growth. If the value of the index is equal to one, there is no growth in productivity.
Various methods have been developed to compute TFP. In this study, we apply a non-
parametric DEA-based method, known as MPI to measure the TFP growth in the sugar
mills.
10. Production Frontier: The production frontier gives the maximal output that can be
achieved with a given amount of inputs.

11. Returns to Scale: It refers to a measure of change in output resulting from a change in
the scale of a firm’s operation as determined by its input usage. There are three returns
to scale—increasing, constant and decreasing. When inputs are doubled and output
increases more than double, it is increasing returns to scale. If the output increases in the
same proportion as inputs are increased, it is constant returns to scale. Decreasing returns
to scale exists when output increases less than the proportional increase in the inputs.
12. Pure Technical Efficiency: It refers to the proportion of technical efficiency which is
attributed to the efficient conversion of inputs into output. Effect of size of plant on the
efficiency is neutralized in it. It is also known as managerial efficiency or local
efficiency. It is estimated through BCC DEA model which is based on the variable
returns to scale technology assumption. Value of pure technical efficiency score lies
between zero and one.
13. Technical Efficiency: Technical efficiency refers to the firm’s ability to produce the
maximum possible output from a given combination of inputs and technology. In DEA,
technical efficiency is determined by the difference between the observed quantities of
a DMU’s output (s) to input (s) and the ratio achieved by best practice DMUs. It is,
therefore, a relative technical efficiency, not the absolute technical efficiency. Its value
lies between zero and one. If a DMU is on the production frontier and does not have any
input or output slack, its technical efficiency score will be equal to one. Technical
efficiency can be decomposed into scale efficiency and pure technical efficiency.
14. Scale Efficiency: The extent to which an organization can take advantage of returns to
scale by altering its size towards the optimal scale. In DEA analysis, scale efficiency for a
DMU is calculated by dividing the CCR efficiency score by the BCC efficiency score. As the
BCC score is more than or equal to the CCR score, the value of the scale efficiency score lies
between zero and one.
15. Slacks: Slacks in DEA refer to the extra quantity by which an input (output) can be
reduced (increased) to obtain technical efficiency after all inputs (outputs) have been
radially reduced to reach the production frontier.

DEA MODELS
The basic DEA models are described below:

CCR Model
This model generalizes the usual output/input ratio measure of efficiency for a given
firm in terms of a fractional linear programming (FLP) formulation. Mathematically, the
relative efficiency of the kth DMU is given by:

Max h_k = ( Σ_{r=1}^{s} u_rk y_rk ) / ( Σ_{i=1}^{m} v_ik x_ik )    (1)

subject to:

( Σ_{r=1}^{s} u_rk y_rj ) / ( Σ_{i=1}^{m} v_ik x_ij ) ≤ 1,    j = 1, ..., k, ..., n

u_rk / ( Σ_{i=1}^{m} v_ik x_ik ) ≥ ε,    r = 1, ..., s

v_ik / ( Σ_{i=1}^{m} v_ik x_ik ) ≥ ε,    i = 1, ..., m

Where:

y_rk = the amount of the rth output produced by the kth DMU; x_ik = the amount of the
ith input used by the kth DMU; u_rk = the weight given to the rth output of the kth DMU;
v_ik = the weight given to the ith input of the kth DMU; n = no. of DMUs; s = no. of
outputs; m = no. of inputs; and ε = a non-Archimedean (infinitesimal) constant.

The above objective function is reformulated as an LP problem as follows:

Max w_k = Σ_{r=1}^{s} μ_rk y_rk    (2)

subject to:

Σ_{i=1}^{m} ν_ik x_ik = 1

Σ_{r=1}^{s} μ_rk y_rj − Σ_{i=1}^{m} ν_ik x_ij ≤ 0,    j = 1, ..., n

μ_rk ≥ ε,    r = 1, ..., s;    ν_ik ≥ ε,    i = 1, ..., m

Since the number of DMUs is generally larger than the total number of inputs and
outputs, solving the dual of the model can reduce the computational burden.
Mathematically, the dual formulation of the above model is:

Min z_k = θ_k − ε ( Σ_{r=1}^{s} S⁺_rk + Σ_{i=1}^{m} S⁻_ik )    (3)

subject to:

Σ_{j=1}^{n} λ_jk y_rj − S⁺_rk = y_rk,    r = 1, ..., s

θ_k x_ik − Σ_{j=1}^{n} λ_jk x_ij − S⁻_ik = 0,    i = 1, ..., m

λ_jk ≥ 0,    j = 1, ..., n;    θ_k free;    S⁺_rk, S⁻_ik ≥ 0,    r = 1, ..., s, i = 1, ..., m

Where:

S⁻_ik = slacks in the ith input of the kth DMU; S⁺_rk = slacks in the rth output of the
kth DMU; the λ_jk's are non-negative dual variables; and θ_k (a scalar) is the (proportional)
reduction applied to all inputs of DMU k to impose efficiency. If, for DMU k, θ*_k = 1 and
all slacks are zero, it is Pareto efficient. Non-zero slacks and/or θ*_k < 1 identify the sources
and amount of any inefficiency that may exist in the DMU under reference.
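
As an illustrative sketch (not part of the original formulation above), the envelopment model (3) can be solved with any LP solver. The minimal input-oriented program below uses scipy on a small hypothetical data set; the non-Archimedean ε and the slack variables are omitted for brevity, so it returns only the radial score θ*:

```python
import numpy as np
from scipy.optimize import linprog

def ccr_input_efficiency(X, Y, k):
    """Input-oriented CCR (CRS) efficiency of DMU k via the envelopment (dual) LP.

    X: (n_dmus, m) input matrix; Y: (n_dmus, s) output matrix.
    Returns theta*, the proportional input contraction factor.
    """
    n, m = X.shape
    s = Y.shape[1]
    # Decision variables: [theta, lambda_1, ..., lambda_n]
    c = np.r_[1.0, np.zeros(n)]                 # minimise theta
    # Input constraints: sum_j lambda_j x_ij - theta x_ik <= 0
    A_in = np.hstack([-X[[k]].T, X.T])          # shape (m, 1+n)
    b_in = np.zeros(m)
    # Output constraints: -sum_j lambda_j y_rj <= -y_rk
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])
    b_out = -Y[k]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]), b_ub=np.r_[b_in, b_out],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun

# Hypothetical data: 5 DMUs, 2 inputs, 1 output.
X = np.array([[4.0, 3.0], [7.0, 3.0], [8.0, 1.0], [4.0, 2.0], [2.0, 4.0]])
Y = np.array([[1.0], [1.0], [1.0], [1.0], [1.0]])
for k in range(len(X)):
    print(k, round(ccr_input_efficiency(X, Y, k), 3))
# Adding the convexity constraint sum(lambda) = 1 (via A_eq/b_eq) gives the BCC
# score, and CCR score / BCC score gives scale efficiency.
```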

BCC Model
The primary difference between the BCC model and the CCR model is the convexity
constraint. In the BCC model, the λ_jk's are restricted to sum to one (i.e., Σ_{j=1}^{n} λ_jk = 1).
If we impose Σ_{j=1}^{n} λ_jk ≤ 1 instead of Σ_{j=1}^{n} λ_jk = 1, then the model is converted
into the Non-Increasing Returns to Scale (NIRS) model. Similarly, if we impose
Σ_{j=1}^{n} λ_jk ≥ 1 instead of Σ_{j=1}^{n} λ_jk = 1, then the model is known as the
Non-Decreasing Returns to Scale (NDRS) model.

The technical efficiency measured by the CCR model includes the effects of both
scale and technical efficiencies. The BCC model measures the pure technical efficiency net
of the scale effect. It captures the pure resource-conversion efficiencies, irrespective of whether
the DMUs operate at increasing, decreasing or constant returns to scale. Scale efficiency of
a DMU is estimated by dividing the CCR efficiency score by the BCC efficiency score. As the
BCC efficiency score is more than or equal to the CCR efficiency score, the value of the scale
efficiency score will be less than or equal to one.

[Figure 1: Comparison of CRS and VRS Frontiers — a single input-output diagram in which
the CRS frontier is the straight line o-i-c-m and the VRS frontier is the piecewise-linear
envelope a-b-c-d-e; an inefficient DMU is shown at interior point k.]

Figure 1 compares the CCR and BCC models. The CCR model is based
on the constant returns to scale (CRS) technology assumption and the BCC model is based on
the variable returns to scale (VRS) technology assumption. The CRS surface is the straight
line oicm and the VRS surface is abcde.

Efficiency of any interior point (such as 'k') is intuitively given by the distance
between the envelope and itself. Typically, such a distance may be measured either
horizontally along the x-axis or vertically along the y-axis, providing an input-oriented or
output-oriented measure, respectively. For example, using an input-oriented measure, the
technical efficiency of DMU 'k' will be measured by hi/hk under the CRS technology
assumption and by hj/hk under the VRS technology assumption. A measure of scale
efficiency is provided by the ratio hi/hj. A DMU at point 'c' is operating at the most
productive scale size (MPSS).

Advantages and Limitations of DEA

DEA methodology has several advantages over the traditional regression-based
production function approach. A few of them are: it can handle multiple inputs and outputs;
it doesn't require any assumption of a functional form relating inputs to outputs; DMUs are
directly compared against a peer or combination of peers; inputs and outputs in the model
can have different units; it sets targets for inefficient DMUs to make them efficient; it also
identifies slacks in inputs and outputs; and it estimates a single efficiency score for each
DMU. This approach also has certain advantages over SFA. Apart from not imposing any
functional form on production or technology, it makes the minimum of assumptions about the
underlying technology. SFA can use only a single output variable, while DEA can use more
than one output variable. In the case of the stochastic frontier approach, the parameter
estimates are sensitive to the choice of the probability distributions specified for the
disturbance terms (Ray, Seon n.d.), whereas DEA requires no such assumption. However,
DEA has several limitations also, such as:
1. Since DEA is an extreme point technique, noise such as measurement error can
cause significant problems.
2. In DEA, efficiency is defined relative to the efficiency of other firms under
consideration. It is not an absolute measure.
3. Since DEA is a nonparametric technique, statistical hypothesis testing is difficult.
4. DEA scores are sensitive to input-output specification and the size of the sample.

Precautions to be taken

1. Since no hypothesis testing is possible, data accuracy must be given priority.
2. In order to make sufficient discrimination between DMUs, the sample size should be
adequate. It should be at least three times greater than the sum of input-output
variables.
3. Most important exercise in DEA is the identification of input-output variables.
Regression analysis can be conducted to identify the best fit in output and input
variables. Zero and negative values of any input or output should be avoided.
Variables in the model should be as few as possible.
4. Data scaling should be done before applying DEA so that input-output variables do
not have excessively large values.

Post-DEA Analysis

A key aspect of DEA is incorporating environmental factors into the model as either inputs
or outputs. Resources available to units are classed as inputs whilst activity levels or
performance measures are represented by outputs. One approach to incorporating
environmental factors is to consider whether they are effectively additional resources to
the unit, in which case they can be incorporated as inputs, or whether they are resource users,
in which case they may be better included as outputs. For example in comparing efficiency
of schools research has indicated that in general parents of higher educational attainment
provide greater support to their children and therefore are effectively an additional resource
to the schools and should be classed as an input. Tobit regression is an appropriate method to
study the impact of environmental and background factors on efficiency. It assumes that the
data are truncated, or censored, above or below certain values. In DEA, values of the
dependent variable are censored as they range between 0 and 1.

Productivity Measurement Methods

Introduction

Productivity growth is one of the major determinants of the competitiveness and
profitability of a firm. A higher level of productivity growth may result in lower product
prices, better remuneration and working conditions for employees, better returns to
investors and adequate surplus to the firm for plant expansion and modernization.
Technical change and technical efficiency change are the two sources of productivity
growth. A study of these sources is crucial for identifying the factors that are responsible
for the productivity stagnation and for adopting appropriate measures at firm, industry
and government levels to improve the productivity. In this chapter, we examine the
productivity growth and its sources in the sugar mills of Uttar Pradesh. A non-parametric
approach, known as Malmquist productivity index (MPI) is applied on the panel data of
seven years collected from 36 sugar mills of the state. Output- oriented DEA method is
used for estimation of the TFP growth and its decomposition into technical efficiency
change and technical progress. Technical efficiency change is further decomposed into
pure technical efficiency change and scale efficiency change. The study also examines
inter-sector and inter-region variations in the TFP growth and its components.

Productivity Measurement Approaches


Most commonly used measures of productivity are partial or single factor
productivity and total factor productivity. Single factor productivity is the ratio of total output
to the quantity or number of the factor for which productivity is to be estimated. Single factor
productivity provides a distorted view about the contribution of a factor to the total
production. For instance, partial productivity of labour can be increased by reducing
quantity of labour and increasing quantity of capital in the production unit. Therefore,
concept of total factor productivity (TFP) is more relevant in the context of resource use
efficiency. TFP is defined as the ratio of weighted sum of output to the weighted sum of
inputs. Over the last three decades, several theories and methods of TFP measurement have
been developed. Before the mid 1990s, most studies estimated TFP growth by growth
accounting approach (Frank et al., 2002). The approach is based on unrealistic assumptions
of perfect competition and constant returns to scale. It assumes that a firm operates on its
production frontier, implying that the firm has 100 per cent technical efficiency. Thus, TFP
growth measured through this approach is due to technical change, not due to technical
efficiency change (Mawson, et al., 2003). Parametric (stochastic frontier analysis) and non-
parametric (DEA-based MPI) are the other two productivity measurement approaches which
use panel data for estimation of productivity of individual production units. These
approaches do not assume that all production units operate at 100 per cent technical
efficiency. According to the MPI approach, TFP can increase not only due to technical
progress (shifting of the frontier) but also due to
improvement in technical efficiency (catch-up). The approach has become quite popular
because: (i) it does not require price data, therefore suitable when price data are not available
or price data are distorted; (ii) it rests on much weaker behavioral assumptions, since it does
not assume cost minimizing or revenue maximizing behaviour; (iii) it uses panel data and
provides a decomposition of productivity change into two components— technical change
and technical efficiency change. Technical change reflects improvement or deterioration in
the performance of best practice firms, while technical efficiency change reflects the
convergence toward or the divergence from best practice on the part of the remaining firms.
The significance of the decomposition is that it provides information on the source of overall
productivity change in the firms.

THE MPI Model


The MPI was initially introduced by Caves, Christensen and Diewert (CCD) in 1982
and was empirically applied by Fare, Grosskopf, Lindgren and Roos (FGLR) in 1992 and
Fare, Grosskopf, Norris and Zhing (FGNZ) in 1994. Since then, several extended versions
of MPI and its decomposition have been developed by the researchers. A few of them are:
Ray and Desli (1997), Simer and Wilson (1998), Grifell-Tatje and Lovell (1999), Balk
(2001), Kumar and Russell (2002) and Chem and Ali (2004).
DEA analysis is static in nature as the performance of a mill is assessed in response
to the best practice mills in a given year. The shift of frontier overtime is not accounted for
by this assessment. To account for this dynamic shift, the MPI model is used. Since it is also
capable of decomposing the productivity growth in technical efficiency change and
technical progress, it is able to shed light on the mechanism of productivity c hange (Ma, et
al., 2002).
The MPI is defined with distance functions. Distance functions allow us to describe
multiple input-output production technology without the need to specify a behavioural
objective such as cost minimization or profit maximization ( Coelli, et al., 1998). Both output
and input distance functions can be defined. With the given input vector, an output distance
function maximizes the proportional expansion of the output vector, whereas in case of input
distance function, the aim is to mi nimise the input vector, given the output vector.
The output-oriented Malmquist TFP change index between period t (the base period)
and period t+1 is given by

t t 1 t 1 t 1
( yt 1t , xtt1 ) ]1 2 (1)
M 0t1 ( y t 1 , x t 1 , y t , x t )  [ D0 (t y t , x t ) * D0 t 1
D ( y ,x ) D (y,x )
Equation (1) is the geometric mean of the tw0o indices—technical efficiency change
0

and technical change. The first is estimated with respect to period t technology and second
with respect to period t+1 technology. Assuming that Dt0 ( yt , xt )  1 and
D0t ( yt 1 , xt 1 )  1, equation (7.1) can be rewritten as

t t t
t 1 t 1 Dt 1 ( yt 1 , xt 1 ) Dt ( 0yt 1 , xt 1 ) D ( y ,0 x ) 1 (2)
M0 ( y ,x , y , x )  0
t1 t t
[ * ] 2
t t t t 1 t 1 t 1 t 1 t t
D 0 (y , x ) D (0 y , x ) D ( 0y , x )
Where, the ratio outside the square brackets in equation (7.2) represents technical
efficiency change (effch) and the expression in the square brackets indicates technical
change (techch). Thus, MPI can be decomposed into change in technical efficiency
(catching up) and into change in frontier (technical progress):

75
Dt01 ( yt 1 , xt 1 ) (3)
effch 
Dt 0( yt , xt )
t t 1 t 1 Dt ( yt , xt ) 1
techch  [ D 0( y , x ) * t 0 ] 2 (4)
1 t t
Dt 1 ( yt 1 , xt 1 ) D ( y , x )
Technical e0fficiency change 0 (effch) measures the change in technical efficiency
between periods t and t+1 with respect to the production possibilities existing in each period.
Technical change (techch) is the geometric mean of the shifts in frontier at the factor ratios
of periods t+1 and t respectively. The value of the MPI greater than 1 means productivity
growth and a value less than 1 means deterioration in productivity. The same is applicable
to each of the components of the Malmquist Productivity Index.
Figure 2 describes the MPI with one input (x) and one output (y) under CRS
technology and its decomposition into efficiency change and technical change. The MPI under
CRS technology indicates a rise in potential productivity as the technology frontier shifts
from t to t+1. Points P and R in the figure represent the input-output combinations of a
production unit (mill) in periods t and t+1, respectively. In both periods, the unit is operating
below the production possibility frontier.

[Figure 2: Malmquist Productivity Indices using CRS Technology — frontiers in periods t
and t+1 in input-output space, with the observed outputs projected onto frontier outputs
Y1, Y2 and Y3.]

Technical efficiency change and technical change are represented by the distance
functions. In terms of the distances along the y-axis, the index becomes

M(y^{t+1}, x^{t+1}, y^t, x^t) = [(y^{t+1}/Y3) / (y^t/Y1)] × [ ((y^{t+1}/Y2) / (y^{t+1}/Y3)) × ((y^t/Y1) / (y^t/Y2)) ]^{1/2}    (5)

Efficiency change = (y^{t+1}/Y3) / (y^t/Y1)    (6)

Technical change = [ ((y^{t+1}/Y2) / (y^{t+1}/Y3)) × ((y^t/Y1) / (y^t/Y2)) ]^{1/2}    (7)
In order to calculate productivity change between periods t and t+1, we need to solve
four different LP problems: D_0^t(x^t, y^t), D_0^{t+1}(x^t, y^t), D_0^t(x^{t+1}, y^{t+1}),
and D_0^{t+1}(x^{t+1}, y^{t+1}). Mathematical formulations are shown in Box-1. If technical
efficiency change is to be decomposed into scale efficiency change and pure technical
efficiency change, two more LP problems are to be solved by adding the convexity restriction
to (8) and (9), that is, one would estimate these two distance functions relative to VRS
technology (Coelli et al., 1998).
Box-1
Linear Programming Formulation of MPI

The MPI requires the following four LP problems:

[d X ,Y   max
 ,
subject to


( 7.8)

[d X ,Y   max
 ,
subject to

(7.9)

[d X ,Y   max
 ,
subject to



(7.10)

[d X ,Y   max
 ,
subject to

 
(7.11)
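As an illustrative aside (not part of the original notes), the four LP problems in Box-1 can be
solved with any LP routine. The sketch below uses Python with scipy.optimize.linprog, following
the CRS output-oriented formulation of Coelli et al. (1998); the function names output_distance
and malmquist and the data layout are assumptions made here for illustration only.

    import numpy as np
    from scipy.optimize import linprog

    def output_distance(x_i, y_i, X_ref, Y_ref):
        # Output-oriented CRS distance function: d = 1/phi*, where
        # phi* = max phi  s.t.  phi*y_i <= Y'lambda,  X'lambda <= x_i,  lambda >= 0.
        # X_ref, Y_ref hold one reference unit per row (inputs / outputs).
        n = X_ref.shape[0]
        c = np.concatenate(([-1.0], np.zeros(n)))               # minimize -phi
        A_out = np.column_stack((y_i, -Y_ref.T))                # phi*y_i - Y'lam <= 0
        A_in = np.column_stack((np.zeros(len(x_i)), X_ref.T))   # X'lam <= x_i
        res = linprog(c, A_ub=np.vstack((A_out, A_in)),
                      b_ub=np.concatenate((np.zeros(len(y_i)), x_i)),
                      bounds=[(0, None)] * (n + 1), method="highs")
        return 1.0 / res.x[0]

    def malmquist(x_t, y_t, x_t1, y_t1, X_t, Y_t, X_t1, Y_t1):
        # MPI of one unit between periods t and t+1, via equations (1)-(4).
        d_t_t   = output_distance(x_t,  y_t,  X_t,  Y_t)    # D_0^t(y^t, x^t)
        d_t_t1  = output_distance(x_t1, y_t1, X_t,  Y_t)    # D_0^t(y^{t+1}, x^{t+1})
        d_t1_t  = output_distance(x_t,  y_t,  X_t1, Y_t1)   # D_0^{t+1}(y^t, x^t)
        d_t1_t1 = output_distance(x_t1, y_t1, X_t1, Y_t1)   # D_0^{t+1}(y^{t+1}, x^{t+1})
        effch = d_t1_t1 / d_t_t                                  # equation (3)
        techch = np.sqrt((d_t_t1 / d_t1_t1) * (d_t_t / d_t1_t))  # equation (4)
        return effch * techch, effch, techch

Because each distance function is returned as 1/phi*, same-period values are at most 1 while
cross-period values may exceed 1, which is exactly what the decomposition above requires.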
12. ADVANCED MULTIVARIATE ANALYSIS

In this lecture, we shall discuss advanced topics of multivariate analysis: discriminant
analysis, factor analysis and multivariate analysis of variance (MANOVA).

Discriminant Analysis

Researchers often wish to classify people or objects into two or more groups. One
might need to classify persons as buyers or non-buyers, good or bad credit risks or superior,
average or poor performers in some activity. The objective is to establish a procedure to find
the predictors that best classify subjects.
Discriminant analysis joins a nominally scaled criterion or dependent variable with
one or more independent variables that are interval or ratio scaled. Once the discriminant
equation is found, it can be used to predict the classification of a new observation. The
researcher may be interested to check whether the predictor variables discriminate among
the groups. More specifically, it is essential to identify which independent variable is more
important when compared to other predictor variables. This is done by calculating a linear
function.
Discriminant function analysis, commonly known as discriminant analysis (DA), is used to
classify cases into the values of a categorical dependent variable, usually a dichotomy. It is
applied when the grouping variable has only two categories. Multiple discriminant analysis (MDA) is
used to classify a categorical dependent that has more than two categories. MDA is
sometimes also called discriminant factor analysis or canonical discriminant analysis.

DA shares all the usual assumptions of correlation, requiring linear and homoscedastic
relationships. Like multiple regression, it also assumes proper model specification (inclusion
of all important independents and exclusion of extraneous variables). It also assumes the
dependent variable is a true dichotomy.

Objectives of DA

 To classify cases into groups using a discriminant prediction equation.


 To test theory by observing whether cases are classified as predicted.
 To investigate differences between or among groups.
 To determine the percent of variance in the dependent variable explained by the
independents.
 To assess the relative importance of the independent variables in classifying the
dependent variable.
 To discard variables which are little related to group distinctions.

Key Terms and Concepts


 Discriminating variables: These are the independent variables, also called
predictors.

 The criterion variable. This is the dependent variable, also called the grouping
variable.

 Discriminant function: A discriminant function is a latent variable that is created as a
linear combination of discriminating (independent) variables. This is analogous to multiple
regression, but the b's are discriminant coefficients which maximize the distance between
the means of the criterion (dependent) variable.

 The eigenvalue, also called the characteristic root of each discriminant function,
reflects the ratio of importance of the dimensions which classify cases of the
dependent variable. There is one eigenvalue for each discriminant function. The
eigenvalues assess relative importance because they reflect the percents of variance
explained in the dependent variable, cumulating to 100% for all functions.

 The relative percentage of a discriminant function equals a function's eigenvalue


divided by the sum of all eigenvalues of all discriminant functions in the model.
Thus it is the percent of discriminating power for the model associated with a given
discriminant function.

 The canonical correlation, R*, is a measure of the association between the groups
formed by the dependent and the given discriminant function. When R* is zero, there
is no relation between the groups and the function. When the canonical correlation is
large, there is a high correlation between the discriminant functions and the groups.
R* is used to tell how much each function is useful in determining group differences.
An R* of 1.0 indicates that all of the variability in the discriminant scores can be
accounted for by that dimension.

 The discriminant score, also called the DA score, is the value resulting from
applying a discriminant function formula to the data for a given case. The Z score is
the discriminant score for standardized data.

 Unstandardized discriminant coefficients are used in the formula for making the
classifications in DA, much as b coefficients are used in regression in making
predictions. The constant plus the sum of products of the unstandardized coefficients
with the observations yields the discriminant scores. That is, discriminant
coefficients are the regression-like b coefficients in the discriminant function, in the
form L = b1 x1 + b2x2 + ... + bnxn + c, where L is the latent variable formed by the
discriminant function, the b's are discriminant coefficients, the x's are discriminating
variables, and c is a constant. There will be no constant when the data are
standardized or are deviations from the mean. The discriminant function coefficients
are partial coefficients, reflecting the unique contribution of each variable to the
classification of the criterion variable. The standardized discriminant coefficients,
like beta weights in regression, are used to assess the relative classifying importance
of the independent variables.

 Standardized discriminant coefficients, also termed the standardized canonical


discriminant function coefficients, are used to compare the relative importance of the
independent variables, much as beta weights are used in regression.

 Tests of significance: Wilks' lambda is used to test the significance of the


discriminant function as a whole. In SPSS, the "Wilks' Lambda" table will have a
column labeled "Test of Function(s)" and a row labeled "1 through n" (where n is
the number of discriminant functions). The "Sig." level for this row is the
significance level of the discriminant function as a whole. A significant lambda
means one can reject the null hypothesis that the two groups have the same mean
discriminant function scores and conclude the model is discriminating.

 ANOVA table for discriminant scores is another overall test of the DA model. It is an
F test, where a "Sig." p value < .05 means the model differentiates discriminant scores
between the groups significantly better than chance (than a model with just the
constant).

 (Variable) Wilks' lambda also can be used to test which independents contribute
significantly to the discriminant function. The smaller the variable Wilks' lambda for
an independent variable, the more that variable contributes to the discriminant
function. Lambda varies from 0 to 1, with 0 meaning group means differ (thus the
more the variable differentiates the groups), and 1 meaning all group means are the
same. The F test of Wilks's lambda shows which variables' contributions are
significant. Wilks's lambda is sometimes called the U statistic. In SPSS, this use of
Wilks' lambda is in the "Tests of equality of group means" table in DA output.

Method of Estimation

DA is done by calculating a linear function of the form:

Di = d0 + d1X1 + d2X2 + ... + dpXp

where

Di is the score on discriminant function i; the Xs are the values of the discriminating
variables used in the analysis; the di's are weighting coefficients; and d0 is a constant.

A single discriminant equation is required if the categorization calls for two groups. If three
groups are involved, two discriminant equations are required. If more categories are called for
in the dependent variable, a separate discriminant function must be calculated for each pair of
classifications in the criterion group. Here we shall describe two-group DA.

Let X1 and X2 be the predictor variables, G1 and G2 the two groups, and n1 and n2 the
numbers of observations in G1 and G2, respectively.

Calculation Process

1. Find the means of X1 and X2 in each group. Let x̄1(G1) and x̄2(G1) be the means of X1
and X2 in Group-1, and x̄1(G2) and x̄2(G2) the corresponding means in Group-2. Also find
the aggregate means x̄1 and x̄2 over both groups.

2. In each group, find ΣX1², ΣX2² and ΣX1X2.

3. Define the linear composite as Di = d1X1 + d2X2 and find the values of d1 and d2 by
solving the following normal equations:

d1 Σ(X1 − x̄1)² + d2 Σ(X1 − x̄1)(X2 − x̄2) = x̄1(G2) − x̄1(G1)

d1 Σ(X1 − x̄1)(X2 − x̄2) + d2 Σ(X2 − x̄2)² = x̄2(G2) − x̄2(G1)

The sums of squares in the above normal equations can be computed with the following
simple formulas:

Σ(X1 − x̄1)² = Σ(X1 − x̄1(G1))² + Σ(X1 − x̄1(G2))²

Σ(X2 − x̄2)² = Σ(X2 − x̄2(G1))² + Σ(X2 − x̄2(G2))²

Σ(X1 − x̄1)(X2 − x̄2) = Σ(X1 − x̄1(G1))(X2 − x̄2(G1)) + Σ(X1 − x̄1(G2))(X2 − x̄2(G2))

where

Σ(X1 − x̄1(G1))² = ΣX1² − n1 x̄1(G1)²   (sums taken over Group-1)
Σ(X1 − x̄1(G2))² = ΣX1² − n2 x̄1(G2)²   (sums taken over Group-2)
Σ(X2 − x̄2(G1))² = ΣX2² − n1 x̄2(G1)²
Σ(X2 − x̄2(G2))² = ΣX2² − n2 x̄2(G2)²
Σ(X1 − x̄1(G1))(X2 − x̄2(G1)) = ΣX1X2 − n1 x̄1(G1) x̄2(G1)
Σ(X1 − x̄1(G2))(X2 − x̄2(G2)) = ΣX1X2 − n2 x̄1(G2) x̄2(G2)

4. In each group, find the discriminant score for each combination of the variables X1 and
X2. Then find the average of the discriminant scores of each group and also the grand mean
of the discriminant scores for the entire problem.

5. Find the variability between groups (VBG) using the following formula:

VBG = n1(S̄1 − S̄)² + n2(S̄2 − S̄)²

where S̄1 and S̄2 are the means of the discriminant scores in Group-1 and Group-2,
respectively, and S̄ is the grand mean of the discriminant scores for the entire problem.
6. Find the variability within groups (VWG) using the following formula:

VWG = Σ_{j=1..n1} (S1j − S̄1)² + Σ_{j=1..n2} (S2j − S̄2)²

where S1j and S2j are the discriminant scores for the j-th set of observations in Group-1
and Group-2, respectively, and S̄1 and S̄2 are the means of the discriminant scores of
Group-1 and Group-2.

7. Find the discriminant ratio: K = VBG / VWG.

This is the maximum possible ratio between the variability between groups and the
variability within groups.

Example
The director of a management school wants to carry out a discriminant analysis concerning
the effect of two factors, namely the yearly spending on infrastructure of the school (X1,
Rs lakh) and the yearly spending on interface events of the school (X2, Rs lakh), on the
grading of the school by an inspection team. The data are given below:
Table-1
Year      Grade           Expenditure on infrastructure (Rs lakh) X1   Expenditure on interface events (Rs lakh) X2
1993-94   Below average   3    4
94-95     Below average   4    5
95-96     Above average   10   7
96-97     Below average   5    4
97-98     Below average   6    6
98-99     Above average   11   4
99-00     Below average   7    4
00-01     Above average   12   5
01-02     Below average   8    7
02-03     Below average   9    5
03-04     Above average   13   6
04-05     Above average   14   8
(Below = 0 and Above = 1)

Calculation process

Table-2
Year      Grade (Group-1)   X1 (Rs lakh)   X2 (Rs lakh)
1993-94   0                 3              4
94-95     0                 4              5
96-97     0                 5              4
97-98     0                 6              6
99-00     0                 7              4
01-02     0                 8              7
02-03     0                 9              5
Total                       42             35
Mean                        6              5

Table-3
Year      Grade (Group-2)   X1 (Rs lakh)   X2 (Rs lakh)
95-96     1                 10             7
98-99     1                 11             4
00-01     1                 12             5
03-04     1                 13             6
04-05     1                 14             8
Total                       60             30
Mean                        12             6
Aggregate mean (G-1+G-2)    8.5            5.41666

Table-4
Year      Grade (Group-1)   X1   X2   X1²   X2²   X1X2
1993-94   0                 3    4    9     16    12
94-95     0                 4    5    16    25    20
96-97     0                 5    4    25    16    20
97-98     0                 6    6    36    36    36
99-00     0                 7    4    49    16    28
01-02     0                 8    7    64    49    56
02-03     0                 9    5    81    25    45
Total                       42   35   280   183   217

Table-5
Year      Grade (Group-2)   X1   X2   X1²   X2²   X1X2
95-96     1                 10   7    100   49    70
98-99     1                 11   4    121   16    44
00-01     1                 12   5    144   25    60
03-04     1                 13   6    169   36    78
04-05     1                 14   8    196   64    112
Total                       60   30   730   190   364

Table-6
Sum of squares                                                       Below   Above   Total
Σ(X1 − x̄1)² = Σ(X1 − x̄1(G1))² + Σ(X1 − x̄1(G2))²                     28      10      38
Σ(X2 − x̄2)² = Σ(X2 − x̄2(G1))² + Σ(X2 − x̄2(G2))²                     8       10      18
Σ(X1 − x̄1)(X2 − x̄2) = Σ(X1 − x̄1(G1))(X2 − x̄2(G1))
                       + Σ(X1 − x̄1(G2))(X2 − x̄2(G2))                7       4       11

Discriminant Function

Di = d1X1 + d2X2

Normal Equations

d1 Σ(X1 − x̄1)² + d2 Σ(X1 − x̄1)(X2 − x̄2) = x̄1(G2) − x̄1(G1)

d1 Σ(X1 − x̄1)(X2 − x̄2) + d2 Σ(X2 − x̄2)² = x̄2(G2) − x̄2(G1)

38 d1 + 11 d2 = 12 − 6 = 6
11 d1 + 18 d2 = 6 − 5 = 1

Solving the equations, we get d1 = 0.17229 and d2 = −0.04973.

Di = 0.17229X1 − 0.04973X2

Computation of the Discriminant Ratio (K)

Mean discriminant score of each group:

Mean score for G-1 = 0.17229 x̄1(G1) − 0.04973 x̄2(G1)
                   = 0.17229 × 6 − 0.04973 × 5
                   = 0.78509

Mean score for G-2 = 0.17229 × 12 − 0.04973 × 6
                   = 1.76910

Mean score for the aggregate = 0.17229 × 8.5 − 0.04973 × 5.41666
                             = 1.195094

Summary of Discriminant Scores and their Group Averages

Group-1                                         Group-2
Year      Score (S1j)   (S1j − S̄1)²             Year    Score (S2j)   (S2j − S̄2)²
1993-94   0.31795       0.218220                95-96   1.37479       0.155480
94-95     0.44051       0.118755                98-99   1.69627       0.005304
96-97     0.66253       0.015021                00-01   1.81883       0.002473
97-98     0.73536       0.002473                03-04   1.94139       0.029684
99-00     1.00711       0.049293                04-05   2.01422       0.060084
01-02     1.03021       0.060084
02-03     1.30196       0.267155
Total     5.49563       0.730981                        8.84550       0.253025
Mean (G-1) 0.78509                              Mean (G-2) 1.76910
Grand mean (aggregate) 1.195094

Variability between Groups

VBG = n1(S̄1 − S̄)² + n2(S̄2 − S̄)²
    = 7 × (0.78509 − 1.195094)² + 5 × (1.76910 − 1.195094)²
    = 2.824137

Variability within Groups

VWG = Σ_{j=1..7} (S1j − S̄1)² + Σ_{j=1..5} (S2j − S̄2)²
    = 0.730981 + 0.253025
    = 0.984006

Discriminant Ratio

K = VBG / VWG = 2.824137 / 0.984006 = 2.87

This is the maximum possible ratio between 'the variability between groups' and 'the
variability within groups'. In the discriminant function, the coefficient of X1 (0.17229) is
much larger in magnitude than the small, negative coefficient of X2 (−0.04973), which
indicates that the variable X1 (spending on infrastructure) is more important than spending
on interface events in discriminating between the groups.
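The whole worked example can be reproduced in a few lines. The following Python sketch is an
illustration, not part of the original notes; the function name two_group_discriminant is an
assumption:

    import numpy as np

    def two_group_discriminant(X_g1, X_g2):
        # Solve the normal equations W d = (mean_2 - mean_1), where W is the
        # pooled within-group sum-of-squares and cross-products matrix.
        m1, m2 = X_g1.mean(axis=0), X_g2.mean(axis=0)
        W = (X_g1 - m1).T @ (X_g1 - m1) + (X_g2 - m2).T @ (X_g2 - m2)
        d = np.linalg.solve(W, m2 - m1)
        s1, s2 = X_g1 @ d, X_g2 @ d                    # discriminant scores
        n1, n2 = len(s1), len(s2)
        grand = (s1.sum() + s2.sum()) / (n1 + n2)      # grand mean score
        vbg = n1 * (s1.mean() - grand) ** 2 + n2 * (s2.mean() - grand) ** 2
        vwg = ((s1 - s1.mean()) ** 2).sum() + ((s2 - s2.mean()) ** 2).sum()
        return d, vbg / vwg

    # Data from Table-2 and Table-3 (columns: X1 infrastructure, X2 interface events)
    g1 = np.array([[3, 4], [4, 5], [5, 4], [6, 6], [7, 4], [8, 7], [9, 5]], float)
    g2 = np.array([[10, 7], [11, 4], [12, 5], [13, 6], [14, 8]], float)
    d, K = two_group_discriminant(g1, g2)
    print(d, K)   # d is approximately [0.17229, -0.04973]; K is approximately 2.87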

FACTOR ANALYSIS

Introduction

Factor analysis can simultaneously manage over a hundred variables, compensate for
random error and invalidity, and disentangle complex interrelationships into their major and
distinct regularities. It takes thousands of measurements and qualitative observations and
resolves them into distinct patterns of occurrence. It makes explicit and more precise the
building of fact-linkages going on continuously in the human mind. It is a means by which
the regularity and order in phenomena can be discerned.

What is Factor analysis?

Factor analysis refers to a variety of statistical techniques whose common objective
is to represent a set of variables in terms of a smaller number of hypothetical variables (Kim
& Mueller, 1984: 9). It is a technique of data reduction. When a researcher deals with a large
number of variables and does not know exactly which of them are exogenous and which
endogenous, this technique may be of great use for meaningful analysis and interpretation of
data. The term factor analysis was first introduced by Thurstone in 1931.

Factor analysis assumes that the observed variables are linear combinations of some
underlying (hypothetical or unobservable) factors. Some of these factors are assumed to be
common to two or more variables and some are assumed to be unique to each variable. The
unique factors are assumed to be orthogonal to each other. They do not contribute to the co-
variation among the observed variables (Kim & Mueller, 1983: 8). As in other multivariate
analysis, in factor analysis too, we are concerned with the variance. We want to know how
big it is and where it is. The purpose of this technique is to examine which variables have
what amount of variance in common.
Many statistical methods are used to study the relation between independent and
dependent variables. Factor analysis is different; it is used to study the patterns of
relationship among many dependent variables, with the goal of discovering something about
the nature of the independent variables that affect them, even though those independent
variables were not measured directly. Thus answers obtained by factor analysis are
necessarily more hypothetical and tentative than is true when independent variables are
observed directly.
A factor analysis usually begins with a correlation matrix. It can also use co-
variances. Without getting deeply into the mathematics, we can say that factor analysis
attempts to express each variable as the sum of common and unique portions. The common
portions of all the variables are by definition fully explained by the common factors, and the
unique portions are ideally perfectly uncorrelated with each other. The degree to which a
given data set fits this condition can be judged from an analysis of what is usually called the
"residual correlation matrix".
A typical factor analysis suggests answers to four major questions:

1. How many different factors are needed to explain the pattern of relationships
among these variables?
2. What is the nature of those factors?
3. How well do the hypothesized factors explain the observed data?
4. How much purely random or unique variance does each observed variable include?

Uses of Factor Analysis

The main applications of factor analytic techniques are: (1) to reduce the number of
variables and (2) to detect structure in the relationships between variables, that is to classify
variables. Therefore, factor analysis is applied as a data reduction or structure detection
method. If a scientist has a table of data and he suspects that these data are interrelated in a
complex fashion, then factor analysis may be used to untangle the linear relationships into
their separate patterns.

Factor analysis may be employed to discover the basic structure of a domain. As a


case in point, a scientist may want to uncover the primary independent lines or dimensions-
-such as size, leadership, and age--of variation in group characteristics and behavior. Data
collected on a large sample of groups and factor analyzed can help disclose this structure.

It can also be used to group interdependent variables into descriptive categories, such
as ideology, revolution, liberal voting, and authoritarianism.

A scientist often wishes to develop a scale on which individuals, groups, or nations


can be rated and compared. A problem in developing a scale is to weight the characteristics
being combined. Factor analysis offers a solution by dividing the characteristics into
independent sources of variation (factors). Each factor then represents a scale based on the
empirical relationships among the characteristics.

The technique can be used to transform data to meet the assumptions of other
techniques. For instance, application of the multiple regression technique assumes that
predictors are statistically unrelated. If the predictor variables are correlated in violation of
the assumption, factor analysis can be employed to reduce them to a smaller set of
uncorrelated factor scores. The scores may be used in the regression analysis in place of the
original variables, with the knowledge that the meaningful variation in the original data has
not been lost.

Major Steps in Factor Analysis

1. The first step in factor analysis is the preparation of the data matrix, which has two modes:
the entity mode, which represents the cases (observations) arranged as rows, and the variable
mode, which represents the variables arranged as columns. After the data matrix, the correlation
matrix of the variables is prepared.

2. The second step in this analysis is the extraction of common factors that can adequately
explain the observed correlation among the variables. There are several methods of
extraction such as: Maximum Likelihood, Least Square, Alpha Factoring, Image
Factoring, and Principal Component Analysis. The main purpose of extraction is to
know whether a small number of factors can account for the correlation among a much
larger number of variables.

3. There are several criteria to determine the number of initial factors to be extracted by
PCA. Notable among them are the scree test and the eigenvalue-greater-than-or-equal-to-one
criterion suggested by Kaiser (1960). In the present analysis, we have applied both criteria.
Both methods provide the same number of factors.

4. The initially extracted factors are rarely interpretable. In order to get the meaningful
results from the initially extracted common factors, the next step is the rotation of these
factors. The purpose of rotation is to achieve the simplest possible factor structure.
Rotation cannot improve the degree of fit between the data and the factor
structure; it makes the results interpretable. There are several methods of rotation. In
orthogonal rotation, three methods: Quartimax, Varimax, and Equimax, are applied,
while in oblique rotation, two methods: Reference Axes, and Primary pattern matrix, are
used. According to Harman (1968), the varimax solution seems to be the “best”
parsimonious analytical solution.

5. Lastly, for the interpretation and analysis of factors, variables with the highest factor
loadings (weights) are taken into account.

How many Factors to Extract?

Note that as we extract consecutive factors, they account for less and less
variability. The decision of when to stop extracting factors basically depends on when
there is only very little "random" variability left. Kaiser's criterion of eigenvalues greater
than 1 can be adopted for the identification of factors; this criterion, proposed by Kaiser
(1960), is probably the one most widely used. Another method is the scree test, first
proposed by Cattell (1966), in which the successive eigenvalues are plotted in a simple
line plot. Cattell suggests finding the place where the smooth decrease of eigenvalues
appears to level off to the right of the plot; according to this criterion, we would
probably retain two or three factors in an example of this kind.
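As a minimal sketch (not part of the original notes), both criteria can be checked directly
from the eigenvalues of the correlation matrix; the function name kaiser_and_scree and the use
of NumPy are assumptions for illustration:

    import numpy as np

    def kaiser_and_scree(data):
        # Eigenvalues of the correlation matrix, sorted as for a scree plot;
        # the Kaiser rule retains factors whose eigenvalue is >= 1.
        R = np.corrcoef(data, rowvar=False)           # data: cases x variables
        eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
        for i, ev in enumerate(eigvals, start=1):
            verdict = "retain" if ev >= 1.0 else "drop"
            print(f"Factor {i}: eigenvalue = {ev:.3f} "
                  f"({100 * ev / len(eigvals):.1f}% of variance) -> {verdict}")
        return eigvals

Plotting the returned eigenvalues against the factor number gives the scree plot to which
Cattell's levelling-off criterion is applied.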

Terminology of factor analysis

1. Common factor: a factor on which two or more variables load.

2. Common variance: variance in a variable shared with common factors. Factor analysis
assumes that a variable's variance is composed of three components: common, specific and
error.

3. Communality: the proportion of a variable's variance explained by the factor structure.
It is denoted by h².

4. Complex variable: a variable which loads on two or more factors.

5. Eigenvalue: the variance in a set of variables explained by a factor or component, denoted
by lambda (λ). The eigenvalue is the sum of the squared loadings in the column of a factor matrix.

6. Factor loading: a term used to refer to factor pattern coefficients or structure


coefficients.

7. Factor scores: linear combinations of variables that are used to estimate the cases'
scores on the factors or components. Least squares estimates of factor scores are the most
commonly used.

8. Parsimony principle: When two or more theories explain the data equally well, select
the simplest theory. Factor analysis application: If a two-factor and a three-factor model
explain about the same amount of variance, interpret the two-factor model.

9. Unique variance: that variance of a variable that is not explained by common factors.
Unique variance is composed of specific and error variance.

10. Varimax rotation: an orthogonal rotation criterion which maximizes the variance of
the squared elements in the columns of a factor matrix. Varimax is the most common
rotational criterion.

Explanation of Method by Example


Factor analysis begins with the construction of a new set of variables based on the
relationships in the correlation matrix. While this can be done in a number of ways, the most
frequently used approach is Principal Component Analysis (PCA). This method transforms a set of
variables into a new set of composite variables or principal components that are not correlated
with each other. These linear combinations of variables, called factors, account for the
variance in the data as a whole. The best combination makes up the first principal component
and is the first factor. The second principal component is defined as the best linear
combination of variables for explaining the variance not accounted for by the first factor. In
turn, there may be a third, fourth, and kth component, each being the best linear combination
of variables not accounted for by the previous factors. The process continues until all the
variance is accounted for, but from a practical point of view it is usually stopped after a
small number of factors have been extracted.
The output of the PCA of a hypothetical case is shown in the following table.

Variable                  Un-rotated factors            Rotated factors
                          I       II       h²           I       II
A                         0.70    -0.40    0.65         0.79    0.15
B                         0.60    -0.50    0.61         0.75    0.03
C                         0.60    -0.35    0.48         0.68    0.10
D                         0.50     0.50    0.50         0.06    0.70
E                         0.60     0.50    0.61         0.13    0.77
F                         0.60     0.60    0.72         0.07    0.85
Eigenvalue                2.18     1.39
Percentage of variance    36.30    23.20
Cumulative percentage     36.30    59.50

The values in this table are correlation coefficients between the factor and the variable. For
instance, 0.70 is the r between variable A and Factor I. These correlations are called
loadings. Eigenvalues are the sums of squared factor loadings. For example, the eigenvalue
for Factor I is 0.70² + 0.60² + 0.60² + 0.50² + 0.60² + 0.60² = 2.18. When divided by the
number of variables, an eigenvalue yields an estimate of the amount of total variance
explained by the factor. Communalities (h²) measure the variance in each variable that is
explained by the two factors. The communality is the sum of the squared factor loadings of all
the factors for a variable. For instance, for variable A the communality is 0.70² + (−0.40)²
= 0.65, indicating that 65 per cent of the variance in variable A is statistically explained
in terms of Factor I and Factor II.
Un-rotated factor loadings do not provide meaningful results. They are difficult to
interpret. What one would like to find is some pattern in which factor I would be heavily
loaded on some variables and factor II on others. Such a condition would suggest rather
“pure” constructs underlying each factor. One attempts to secure this less ambiguous
condition between factors and variables by rotation, which can be conducted through an
orthogonal method. The rotated factor loadings are given in the table. They show that
the measurement from six variables may be summarized by two underlying factors.
The interpretation of factor loadings is largely subjective. There is no way to
calculate the meanings of factors; they are what one sees in them. For this reason, factor
analysis is largely used for exploration. One can detect patterns in latent variables, discover
new concepts and reduce data.
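To make the mechanics concrete, the following minimal Python sketch (an illustration, not part
of the original notes; pc_loadings is an assumed name) extracts un-rotated principal-component
loadings and communalities of the kind shown in the table above:

    import numpy as np

    def pc_loadings(data, n_factors=2):
        # Un-rotated principal-component loadings: each loading equals the
        # eigenvector element times the square root of its eigenvalue.
        R = np.corrcoef(data, rowvar=False)            # data: cases x variables
        eigvals, eigvecs = np.linalg.eigh(R)
        order = np.argsort(eigvals)[::-1][:n_factors]  # largest eigenvalues first
        loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
        communalities = (loadings ** 2).sum(axis=1)    # h^2 for each variable
        pct_variance = 100 * eigvals[order] / R.shape[0]
        return loadings, communalities, pct_variance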

MANOVA

Analysis of variance is a special case of the regression model, which is generally used to
analyse data collected through experimentation. Multivariate analysis of variance (MANOVA)
examines the relationship between several dependent and independent variables. Whereas
ANOVA assesses the differences between groups, MANOVA examines the dependence
relationship between a set of variables across a set of groups. It is a technique which
determines the effects of independent categorical variables on multiple continuous
dependent variables. It is usually used to compare several groups with respect to multiple
continuous variables. The main distinction between MANOVA and ANOVA is that several
dependent variables are considered in MANOVA.

Classification of MANOVA

1. One-way MANOVA: similar to the one-way ANOVA, it analyses the variance
between one independent variable and multiple dependent variables.
2. Two-way MANOVA: it analyses the variance between two independent variables
and multiple dependent variables.

Assumptions of MANOVA

1. Normal Distribution

The dependent variables should be normally distributed within groups. Overall, the F test
is robust to non-normality, if the non-normality is caused by skewness rather than by
outliers. Tests for outliers should be run before performing a MANOVA, and outliers
should be transformed or removed.

2. Linearity

It assumes that there are linear relationships among all pairs of dependent variables, all
pairs of covariates, and all dependent variable-covariate pairs in each cell.

3. Homogeneity of Variances

Homogeneity of variances assumes that the dependent variables exhibit equal levels of
variance across the range of predictor variables. Homoscedasticity can be examined
graphically or by means of a number of statistical tests.

4. Homogeneity of Variances and Covariances

In multivariate designs, with multiple dependent measures, the homogeneity of variances


assumption described earlier also applies. However, since there are multiple dependent
variables, it is also required that their intercorrelations (covariances) are homogeneous
across the cells of the design.

5. Multicollinearity and Singularity

When correlations among dependent variables are high, problems of multicollinearity and
singularity exist. Multicollinearity: the relationship between pairs of variables is high
(r > .90). Singularity: a variable is redundant if it is a combination of two or more of
the other variables.

Example:

A social scientist wished to compare those respondents who had lodged an organ donor card
with those who had not. Three hundred and eighty-eight new drivers completed a
questionnaire that measured their attitudes towards organ donation, their feelings about
organ donation and their previous exposure to the issue. It was hypothesized that individuals
who agreed to be donors would have more positive attitudes towards organ donation, more
positive feelings towards organ donation and greater previous exposure to the issues.
Therefore, the independent variable was whether a donor card had been signed and the
dependent variables were attitudes towards organ donation, feelings towards organ donation
and previous exposure to organ donation. Attitudes and feelings were measured on traditional
scales with a Likert-scale response format. Exposure was measured in terms of media
exposure and personal experience. Conceptually and theoretically these dependent variables
were believed to be related and so MANOVA was the analysis of choice. Complete data are
available on www.johmwiley.com.au/highered/spssv

Results

Between-Subjects Factors
                       Value Label   N
signed donor card   1  yes           189
                    2  no            188

Descriptive Statistics
Dependent variable                signed donor card   Mean    Std. Deviation   N
exposure to donation issues       yes                 11.78   15.946           189
                                  no                  8.69    14.752           188
                                  Total               10.24   15.420           377
attitude towards organ donation   yes                 85.03   47.899           189
                                  no                  96.48   62.916           188
                                  Total               90.74   56.114           377
feelings towards organ donation   yes                 28.07   9.277            189
                                  no                  31.20   8.948            188
                                  Total               29.63   9.236            377

Box's Test of Equality of Covariance Matrices(a)
Box's M   19.260
F         3.182
df1       6
df2       1018790.282
Sig.      .004
Tests the null hypothesis that the observed covariance matrices of the dependent variables
are equal across groups.
a. Design: Intercept + donor

Multivariate Tests(b)
Effect      Statistic            Value    F           Hypothesis df   Error df   Sig.
Intercept   Pillai's Trace       .935     1790.688a   3.000           373.000    .000
            Wilks' Lambda        .065     1790.688a   3.000           373.000    .000
            Hotelling's Trace    14.402   1790.688a   3.000           373.000    .000
            Roy's Largest Root   14.402   1790.688a   3.000           373.000    .000
donor       Pillai's Trace       .033     4.255a      3.000           373.000    .006
            Wilks' Lambda        .967     4.255a      3.000           373.000    .006
            Hotelling's Trace    .034     4.255a      3.000           373.000    .006
            Roy's Largest Root   .034     4.255a      3.000           373.000    .006
a. Exact statistic

b. Design: Intercept + donor

Levene's Test of Equality of Error Variances(a)
                                  F        df1   df2   Sig.
exposure to donation issues       2.936    1     375   .087
attitude towards organ donation   15.346   1     375   .000
feelings towards organ donation   1.284    1     375   .258
Tests the null hypothesis that the error variance of the dependent variable is equal across
groups.
a. Design: Intercept + donor

Tests of Between-Subjects Effects
Source            Dependent Variable                Type III Sum of Squares   df    Mean Square   F          Sig.
Corrected Model   exposure to donation issues       903.925(a)                1     903.925       3.830      .051
                  attitude towards organ donation   12372.705(b)              1     12372.705     3.960      .047
                  feelings towards organ donation   922.187(c)                1     922.187       11.100     .001
Intercept         exposure to donation issues       39489.506                 1     39489.506     167.331    .000
                  attitude towards organ donation   3105144.376               1     3105144.376   993.910    .000
                  feelings towards organ donation   331042.346                1     331042.346    3984.772   .000
donor             exposure to donation issues       903.925                   1     903.925       3.830      .051
                  attitude towards organ donation   12372.705                 1     12372.705     3.960      .047
                  feelings towards organ donation   922.187                   1     922.187       11.100     .001
Error             exposure to donation issues       88498.590                 375   235.996
                  attitude towards organ donation   1171563.820               375   3124.170
                  feelings towards organ donation   31153.824                 375   83.077
Total             exposure to donation issues       128924.000                377
                  attitude towards organ donation   4288063.000               377
                  feelings towards organ donation   363028.000                377
Corrected Total   exposure to donation issues       89402.515                 376
                  attitude towards organ donation   1183936.525               376
                  feelings towards organ donation   32076.011                 376
a. R Squared = .010 (Adjusted R Squared = .007)
b. R Squared = .010 (Adjusted R Squared = .008)
c. R Squared = .029 (Adjusted R Squared = .026)

Estimated Marginal Means: signed donor card
Dependent Variable                signed donor card   Mean     Std. Error   95% CI Lower Bound   Upper Bound
exposure to donation issues       yes                 11.783   1.117        9.586                13.980
                                  no                  8.686    1.120        6.483                10.889
attitude towards organ donation   yes                 85.026   4.066        77.032               93.021
                                  no                  96.484   4.077        88.468               104.500
feelings towards organ donation   yes                 28.069   .663         26.765               29.372
                                  no                  31.197   .665         29.890               32.504
Box's M tests the homogeneity of the variance-covariance matrices. We have
homogeneity of variance because this test is not significant at an alpha level of 0.001.

The multivariate tests of significance test whether there are significant group
differences on a linear combination of the dependent variables. We notice that several
statistics are available. Pillai's Trace criterion is considered to have acceptable power and
to be the most robust statistic against violation of assumptions. Having obtained a significant
multivariate effect for donor, i.e., a significance of F less than 0.05, an examination of the
univariate F-test for each variable indicates which individual dependent variables contribute
to the significant multivariate effect.

We can conclude that a person's decision to act as a donor is significantly influenced
by their feelings towards organ donation. No significant main effects were found for the
other dependent measures.
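For readers who wish to reproduce this kind of output outside SPSS, a minimal Python sketch
using statsmodels is given below (an illustration, not part of the original notes; the file
name donor.csv and the column names are assumptions):

    import pandas as pd
    from statsmodels.multivariate.manova import MANOVA

    # Hypothetical file holding the organ-donor data with columns:
    # donor (signed card: yes/no), exposure, attitude, feelings.
    df = pd.read_csv("donor.csv")
    mv = MANOVA.from_formula("exposure + attitude + feelings ~ donor", data=df)
    print(mv.mv_test())   # Pillai's trace, Wilks' lambda, Hotelling's trace, Roy's root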

13. DATA INTERPRETATION AND REPORT WRITING

What are research reports and why would I write one?


1. A research report is the only concrete evidence of your research.

2. The quality of the research may be judged directly by the quality of the writing and
how well you convey the importance of your findings.

3. If you are submitting a research report for a class or to an organization, check for
specific requirements and guidelines before beginning to write your research
report.

Types of Report
• Scientific/lab
• Technical
• Business
• Research
• Academic overview

All vary slightly in their purpose & structure.

Writing & Editing Your Report: Writing the first draft for yourself
Where do you start writing: the introduction or elsewhere?

• Reports are rarely written in linear order. The order for writing the final sections
may be Conclusions, Introductions & finally the abstract. These are the sections
most likely to be read by readers
• For every 1000 readers who see your title, 100 may read the abstract, perhaps 10
will read some of the main report [conclusions, some results etc], at most 1 may
follow all the way through
• A middle section such as the methods/ system design may be a good starting point
• Writing notes for the introduction, some background theory or a review of previous
studies may help you to clarify the focus of your report.

Writing specifications
• Use 10- or 12-point font.
• The most acceptable fonts: Arial, Times New Roman (the old reliable), Verdana, Lucida.
• Unacceptable fonts: Broadway, Brush Script, Chiller, Courier, Freestyle Script, Gigli, Old
English Text, Playbill, etc.

• Your reports should be in pristine condition when they are turned in

• No frayed edges or coffee stains (front OR back)

Section/point identification systems

The identification system represents important choices made by the writer regarding:

• The range of the material covered,
• The relative importance of the sections in the report, and the relatedness of
information within sections.

It therefore plays a very important role in communicating meaning to the reader. The
report presents meaning and information in two complementary and equivalent ways:
• the meaning represented by the words, thought, research and information
• the meaning represented by the layout

Choosing the Layout System

The writer chooses one of the following two layout systems:

• The decimal numbering, or
• The number-letter.

Once a system is chosen, the writer must present this system consistently throughout the
report.

The Decimal Numbering System

First level (of importance/generality):   1.0   2.0   3.0   4.0   5.0   (also termed the A heading)

N.B. The 'point-zero' is not always used in decimal numbering systems.

Second level:   1.1   2.1   3.1   4.1   5.1   (also termed the B heading)

Third level:   1.1.1   2.1.1   3.1.1   4.1.1   5.1.1   (also termed the C heading)

Fourth level:   1.1.1.1   2.1.1.1   3.1.1.1   4.1.1.1   5.1.1.1   (also termed the D heading)

The Indenting with Decimal Numbering System

This is generally used with indenting to structure the text in the following way. It is
possible for a reader to gain a strong indication of the relatedness and relative importance
of the parts of the text as a result of this layout, even though no meaning from the content
is provided.

1.0
  1.1
  1.2
    1.2.1
    1.2.2
      1.2.2.1
      1.2.2.2
2.0

Number - letter
(still encountered, but becoming less commonly used)
First level (of importance/generality)
(A heading) I II III IV V VI VII

Second level
(B heading) A B C D E F G

Third level
(C heading) 1 2 3 4 5 6 7

Fourth level
(D heading) (a) (b) (c) (d) (e) (f) (g)
Fifth level
(E heading) (i) (ii) (iii) (iv) (v) (vi) (vii)

The Indenting with Number-Letter System

I
  A
  B
    1
    2
      (a)
      (b)
II
  A

Writing your report: Developing the structure


• Abstract – Executive Summary.
• Introduction – Background – Literature review.
• Method.
• Results (Analysis).
• Discussion.
• Conclusion (Recommendations).
• References.
• Appendices.

How to Write a Report. The Title Page

General guidelines:

-There are four main pieces of information that have to be included on the title page:
- The report title;
- The name of the person, company, or organization for whom the report has been
prepared;

- The name of the author and the company or university which originated the report;

- The date the report was completed.

-A title page might also include contract number, a security classification, or a copy
number depending on the nature of the report you are writing.

Table of contents
• Your report should include a table of contents if longer than about 5-10 pages.

• This allows the reader to quickly find the relevant section.

• While many word processing packages will automatically generate a table of


contents, it is wise to check that the page numbers are correct before printing and
before submission.

Reports: Writing the abstract


The abstract is of utmost importance, for it is read by 10 to 500 times more people than
hear or read the entire article. It should not be a mere recital of the subjects covered. The
abstract should be a condensation and concentration of the essential information in the
paper.
Although the abstract is the first part of the report to be read, you should write it last,
after writing the introduction.
• Needs to stand alone, i.e. be complete in itself
• Allows the reader to gain a very brief but complete overview of your entire report,
from aims to conclusions
• Does not act as an introduction
• Typically 100-200 words; one paragraph
• Highly succinct but must be cohesive, i.e. flow well
• Most important section along with the conclusions

Beginning an introduction

Introductions serve as a place for you:


• To catch your reader’s attention,

• Also help to place your project in its context (whether that context is background
information or your purpose in writing is up to you).

Consider the following examples; they represent two extremes that writers can take in
beginning their introductions.

What is the problem with this sentence as an opening to an introduction?

• The universe has been expanding from the very moment that it was born.

One of the ways that the sentence above might be rewritten is:

Recent studies suggest that the universe will continue expanding forever and may pick
up speed over time.

The rewritten sentence establishes the report’s context within “recent studies” concerning
a specific theory related to universe expansion. This context is much more specific than
that of the original sentence.

The introduction may carry out the following roles


• Gives some background to the study & sets the scene for the report.
• Explains connections with any previous work & gaps
• Explains background theory [longer reports?]
• Explains your aims/hypotheses clearly
• Explains briefly what you will do & why the study is being carried out
• Explains briefly how the report is structured [signposting]
Main Body
• Text with headings and sub-headings.

In general, the body of the research report will include three distinct sections:
• A section on theories, models, and your own hypothesis
• A section in which you discuss the materials and methods you used in your
research
• A section in which you present and interpret the results of your research

• The headings should be self-explanatory. The main body of the report needs to be
clear, concise and follow a logical order.

• Figures and tables must be referred to in the body of the text and need to have clear
captions. Label figures at the bottom and tables at the top in numerical order.

• Each figure should be capable of being understood on its own using the caption as
the only reference.

Theories, Models and Your Own Hypothesis


This section can be very important, especially for:

• Research articles,
• Formal reports, or
• scientific papers.

Inclusion of such theories and models directly affects:

The hypothesis that you propose and on which you base your research.

When you develop hypotheses, you predict what you will find after you conduct your
research. This prediction is based on existing theories, models, evidence, and logic.

In this section, you may need to:

• Define and explain your hypothesis and the theories and models you used to
develop it.

• Define and explain competing hypotheses, theories, and models, including their
strengths and weaknesses.

• Compare and contrast the specific points where they agree or disagree.

The Literature Review: What others did to solve the problem

Review important relevant sources.

Identify:
• the theoretical framework
• the variables
• methodological or conceptual gaps/flaws of previous research.

This leads to your research question/hypothesis.

The following questions are good ones to work through: What do I expect this
experiment to reveal? Why?

• How does my hypothesis directly answer the question posed by the problem?

• How does the hypothesis fit in with other hypotheses or more general theory? How
will my work challenge or support the work of others?

• What is the current theory to which it relates?

• What are alternative views to this theory? What are the strengths and weakness of
those views?
• On what literature did I or can I base my explanation?

Materials and Methods

All materials and methods sections should address the following questions:

• How was the experiment designed?

• On what subjects or materials was the experiment performed?

• How were the subjects/materials prepared?

• What machinery and equipment was used in the experiment?

• What sequence of events did you follow as you handled the subjects/materials or as
you recorded data?

Results: Presenting data
All preceding sections of the report (Introduction, Materials and Methods, etc.) lead in
to the Results section of the report and all subsequent sections will consider what the
results mean (conclusion, recommendations, etc.).

How should I incorporate figures and tables into my report?


Figures and tables should help to simplify information, so you should consider using them
when words are not able to convey information as efficiently as a visual aid would be able
to.

Consider using figures and tables when you need to decipher information or the analysis
of information, when you need to describe relationships among data that are not apparent
otherwise, and when you need to communicate purely visual aspects of a phenomenon or
apparatus.

Tables or lists are simple ways to organize the precise data points themselves in
one-to-one relationships.

A graph is best at showing the trend or relationship between two dimensions, or the
distribution of data points in a certain dimension (i.e., time, space, across studies,
statistically).

A pie chart is best at showing the relative areas, volumes, or amounts into which a whole
(100%) has been divided.

Flow charts show the organization or relationships between discrete parts of a system. For
that reason they are often used in computer programming.

• The most important general rule is that tables and figures should supplement rather
than simply repeat information in the report.

• You should never include a table or figure simply to include them. This is
redundant and wastes your reader’s time.

Additionally, all tables and figures should:


• Be self-contained—they should make complete sense on their own without
reference to the text
• Be cited in the text—it will be very confusing to your audience to suddenly come
upon a table or figure that is not introduced somewhere in the text. They will not
have a context for understanding its relevance to your report.
• Include a number such as Table 1 or Figure 10—this will help you to distinguish
multiple tables and figures from each other.

• Include a concise title—it is a good idea to make the most important feature of the
data the title of the figure.

• Use legends and clear, concise, descriptive titles for tables and figures.

• Ensure all axes of graphs are labeled and that units are identified in all
tables and figures.

Results & Discussion: Interpretation of Data

This section of the report is important because it demonstrates the meaning of your
research.

This section of the paper draws upon writing skills that other sections do not because you
need to write persuasively in this section as you convince readers that your interpretation
of data is logical and correct.

As you develop your argument in this section, consider arranging your evidence in the
order that best highlights your main point, cite authorities that have come to similar
interpretations under similar circumstances, and consider the superiority of your
conclusions to opposing viewpoints.

For most research reports, the most certain part of your case will be your data, and
many research reports will develop along this outline:

 Begin with a discussion of the data.

 Move on to generalize about or analyze the data.

 Consider how the data addresses the research problem or hypothesis outlined in the
Introduction.

 Proceed from most general features of the data to more specific results

 Discuss what can be inferred from the data as they relate to other research and
scientific concepts
 Compare with other studies and draw conclusions based on your findings.

 Refer back to the original hypotheses you were testing

 Identify sources of error/limitation and any inadequacies of your techniques


especially if:

 your results are inadequate, negative, or not consistent with earlier studies or with
your own hypothesis.

 Do not try to defend your research or minimize the seriousness of the limitation in
your interpretation; instead, focus on the limitation only as it affects the research
and try to account for it.

One Way to sum up the results and discussion could be as follows:


What is already known on this topic
• Low birth weight is associated with poor cognitive development
• Few studies have examined this association across the full birthweight range in the
normal population.

What this study adds


• Birth weight is significantly associated with cognitive ability at age 8 years
through adolescence, and into early adulthood, independent of social background

• The associations between birth weight and cognitive function at ages 8, 11, and 15
are evident across the normal birthweight range (>2.5 kg) and so are not accounted
for exclusively by low birth weight
• Birth weight is also associated with educational attainment, suggesting that the
association between birth weight and cognition may have functional implications

Conclusions

The conclusion is important because:


• It is your last chance to convey the significance and meaning of your research to
your reader by concisely summarizing your findings and generalizing their
importance.

• It is also a place to raise questions that remain unanswered and to discuss


ambiguous data.

• The conclusions you draw are opinions, based on the evidence presented in the
body of your report, but because they are opinions you should not tell the reader
what to do or what action they should take.

Be sure that you use language that distinguishes conclusions from inferences.

Use phrases like “This research demonstrates . . .” to present your conclusions and phrases
like “This research suggests . . .” or “This research implies . . .” to discuss implications.

Make sure that readers can tell your conclusions from the implications of those
conclusions, and do not claim too much for your research in discussing implications. You
can use phrases such as “Under the following circumstances,” “In most instances,” or “In
these specific cases” to warn readers that they should not generalize your conclusions.

You might also raise unanswered questions and discuss ambiguous data in your
conclusion.

Raising questions or discussing ambiguous data does not mean that your own work is
incomplete or faulty; rather, it connects your research to the larger work of science and
parallels the introduction in which you also raised questions.

The following is an example taken from a text that evaluated the hearing and speech
development following the implantation of a cochlear implant. The authors of “Beginning
To Talk At 20 Months: Early Vocal Development In a Young Cochlear Implant Recipient,”
published in Journal of Speech, Language, and Hearing Research, titled their conclusion
“Summary and Caution.” Using this title calls readers’ attention to the limitations of their
research.

Recommendations

This section appears in a report when the results and conclusions indicate that further work
needs to be done or when you have considered several ways to resolve a problem or improve
a situation and want to determine which one is best.

This gives you another opportunity to demonstrate how your research fits within the larger
project of science.

It also demonstrates that you fully understand the importance and implications of your
research, as you suggest ways that it could continue to be developed.

References

Reference sections are important because:

• Like the sections on the procedure you used to gather data, they allow other
researchers to build on or to duplicate your research.

• Without references, readers will not be able to tell whether the information that you
present is credible, and they will not be able to find it for themselves.

• Reference sections also allow you to refer to other researchers’ work without
reviewing that work in detail. You can refer readers to your reference page for more
information.

It is best to compile your own reference list containing a variety of information. This will
save you from having to track down pieces of information you may have neglected to make
note of if they are specifically requested after you have filed a source, returned it to the
library, or misplaced it.

Information to include on your reference list


 The reference list is placed at the end of the report.

 It is arranged in alphabetical order of authors' surnames and chronologically for each


author.

 The reference list includes only references cited in the text. The author's surname is
placed first, immediately followed by the year of publication. This date is often
placed in brackets.

 The title of the publication appears after the date followed by place of publication,
then publisher (some sources say publisher first, then place of publication).

 The important thing is to check for any special requirements or, if there are none, to
be consistent.

Information to include on your reference list

1. The Harvard (author-date) system is the one usually encountered in the sciences and
social sciences.

2. Notice that the titles of books, journals and other major works appear in italics (or
are underlined when handwritten), while the titles of articles and smaller works
which are found in larger works are placed in (usually single) quotation marks.


Harvard System: Examples


BOOK
Begon, M., Harper, J.L. & Townsend, C.R. (1990). Ecology: Individuals, Populations and
Communities. Oxford: Blackwell Scientific Publications.
JOURNAL ARTICLE
Hirschberger, P. & Bauer, T. (1994). The coprophagous insect fauna and its influence on
dung disappearance. Pedobiologia, 38, 375-384.
BOOK CHAPTER
Holt, R.D. (1993). Ecology at the mesoscale: the influence of regional processes on local
communities. In R. E. Ricklefs & D. Schluter (Eds.), Species Diversity in Ecological
Communities. Chicago: University of Chicago Press, 77-88.
INTERNET SITE
Crook, A. C. & Finn, J. (2002). STARS: Scientific Training by Assignment for Research
Students [online]. Available from: http://www.ucc.ie/research/stars [Accessed 16th
November 2004].

Quotations

 When the exact words of a writer are quoted, they must be reproduced exactly in
all respects: wording, spelling, punctuation, capitalisation and paragraphing.

 Quotations should be carefully selected and sparingly used, as too many quotations
can lead to a poorly integrated argument.

 Use of a direct quotation is justified when:


 --changes, through paraphrasing, may cause misinterpretation

 --the original words are so concisely and convincingly expressed that they
cannot be improved upon

 --a major argument needs to be documented as evidence


 --the student wishes to comment upon, refute or analyse the ideas expressed
in another source.
 The intention of the original text must not be altered.

Short quotations (up to 4 lines):


Incorporate the quotation into the sentence or paragraph, without disrupting the flow of
the text, using the same spacing as in the rest of the text. The source of the quotation is
either acknowledged in a footnote or in the text. Use single quotation marks at the
beginning and end of the quotation:

EXAMPLE: The Style Manual (1978, p. 46) states that 'the modern tendency to use single
quotation marks rather than double is recommended.'

Long quotations (more than thirty words):
Indent the quotation from the remainder of the text.

Do not use quotation marks.

Some writers recommend the use of smaller type or italics to set off indented quotations.

Introduce the quotation appropriately, and cite the source at the end of the quotation as
you would in your text.

Ellipsis
Irrelevancies within very long quotations can be omitted by the use of an ellipsis which is
indicated by three spaced dots (. . .).

Nowadays it is not usual to place an ellipsis at the beginning or the end of a quotation
which is intended to stand alone or forms part of one of your own sentences.

Appendices

o You should place information in an appendix when it is relevant to your subject but
needs to be kept separate from the main body of the report, to avoid interrupting the
line of development of the report.

o An appendix should include only one set of data, but additional appendices are
acceptable if you need to include several sets of data that do not belong in the same
appendix.

o Label each appendix with a letter: A, B, C, and so on.

o Place the appendices not in order of their importance to you, but in the order in
which you referred to them in your report. Paginate each appendix separately, so
that the first page of each appendix begins with 1.

Defining Your Terms

o A good general rule to follow is to define all terms that you are not completely sure
your audience will understand the same way you do.

o Words to focus on are those key to your research, those relatively new or unfamiliar,
and those that readers could not look up for themselves in a standard dictionary.
Jargon

o You should take your audience into consideration when deciding whether to include
jargon in your writing.

o Consider their vocabulary and whether they will be familiar with a word or phrase
before you use it.

o Do not include jargon without taking your audience into account. Jargon can come
between your writing and your reader, and readers who do not understand it may
see its use as impolite.

Writing Numbers, Measurements, and Equations

Writing Numbers

1) Spell out numbers between zero and ten, and use figures for all other numbers.
Examples: two cats, 11 materials, one attempt, 20,000 residents

Unfortunately, there are a number of exceptions to this general guideline. Make sure that
you are as familiar with the exceptions as you are with the rule.

Exceptions (use figures even for small numbers):
Mathematical operations: raised to the power of 4
Units of measurement: 6 feet
Age: 9 years old
Time: 1 pm
Dates: June 8, 2001
Page numbers: page 4
Percentages: 2 percent
Money: $5
Proportions: 100:1

All numbers that begin a sentence should be spelled out:
Seven times the tests failed.

2) When you use two or more numbers in the same section of writing, use figures. This
makes them easier to see and compare.

Example: We are requesting funding to purchase 25 pumps, 15 fans, and 5 ducts.

Exception:
If none of the numbers included is larger than 10, then spell out all of the numbers.
We are requesting funding to purchase nine pumps, six fans, and three ducts.
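
Rules 1 and 2 are mechanical enough to check automatically. Here is a minimal sketch of
the basic zero-to-ten rule only; the function name spell_small is invented for this
illustration, and none of the exceptions listed above (dates, money, ages, and so on) are
handled.

# Spell out zero through ten; use figures (with thousands separators) otherwise.
WORDS = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]

def spell_small(n: int) -> str:
    return WORDS[n] if 0 <= n <= 10 else f"{n:,}"

print(spell_small(2))      # two
print(spell_small(11))     # 11
print(spell_small(20000))  # 20,000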

3) Form the plural of a number by adding ’s.
Example: All of the 15’s tested within acceptable limits.

4) Use hyphens when you write fractions, a sequence or range of values, and between a
number and unit of measure when they modify a noun.
Examples:
Fractions: thirty-three thirty-fifths
Sequence or range of values: pages 167-170; pages 224-35
Number and unit of measure used to modify a noun: 20-pound dog, 20-ounce pitcher

5) Use decimals instead of fractions, whenever possible. Decimals are easier to type and
to read. Write both decimals and fractions as figures.

6) A zero is always placed before the decimal point for numbers less than one.

7) Spell out the shorter of two numbers that appear consecutively in a phrase.
Examples:
Not '4 6-inch nails', but '4 six-inch nails'.
Not '20 1,000-piece puzzles', but 'Twenty 1,000-piece puzzles'.

Writing Measurements

8) Separate the figure from the name of the measure with a space, but do not separate %
or $ from the figure with a space.
Examples: 3.4 hr, $22, 50%

9) Do not use a period after the abbreviation of a measure.
Example: 3.4 hr

10) Use figures for years and decades, and do not abbreviate them.
Not '’30s', but 'the 1930s'; not 'the fifties', but 'the 1950s'.
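
Rule 8 can likewise be captured in a small helper. This is only a sketch: fmt_quantity is
an invented name, and real reports involve many more unit conventions than the three
cases shown.

# A space between the figure and the name of the measure,
# but no space between the figure and % or $.
def fmt_quantity(value, unit: str) -> str:
    if unit == "$":
        return f"${value:g}"   # currency symbol precedes the figure
    if unit == "%":
        return f"{value:g}%"
    return f"{value:g} {unit}"

print(fmt_quantity(3.4, "hr"))  # 3.4 hr
print(fmt_quantity(22, "$"))    # $22
print(fmt_quantity(50, "%"))    # 50%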

Writing Equations

11) Place equations on a separate line and number them consecutively, with the number
in parentheses at the right margin.

12) Do not use punctuation after the equation, but punctuate words that introduce
equations as you would words forming any other sentence.

13) Refer to an equation in the body of the text by its number in parentheses.
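
If you typeset your report in LaTeX, for example, the equation environment follows these
three rules automatically: the equation sits on its own line, is numbered in parentheses
at the right margin, and can be cited by number with \ref. A minimal sketch (the
sample-size formula is just example content):

\documentclass{article}
\begin{document}

The required sample size is estimated from
% the equation environment sets the equation on its own line and
% numbers it in parentheses at the right margin
\begin{equation}
  n = \frac{z^2 \, p (1 - p)}{e^2}
  \label{eq:sample-size}
\end{equation}
where $p$ is the expected proportion and $e$ is the margin of error.
The text then refers to equation~(\ref{eq:sample-size}) by its number.

\end{document}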

Reports: Writing the Results

This section presents the results [data] from the experiment or model. Do not just
include figures and tables; ensure that your text provides a commentary guiding the
reader through them:
• give each figure's and table's location and a summary of its purpose in the
report, e.g. 'Figure 3.2 shows how the incidence of malaria increases when...'
• add statements highlighting key trends
• check that figures are clearly presented [see slide 10]
• remember that the reader will look at the figures and tables only if directed to do
so in the text.

Editing Your Report with a Critical Friend

Murphy's Law of Errata Detection: "The very first person to see your mistake is always
the last person you want to know about it."

Reading your own work, you do not always spot errors, because you may read your draft
the way you want it to sound.

Work with a critical friend, someone who gives honest advice, perhaps from outside your
field.

As soon as you sit in front of the paper with your critical friend, your perspective may
change from that of the writer to that of a potential reader of the paper.

Do not over-edit the first part you write.

Try using the editing questions provided by the Purdue University Online Writing Lab
(next slide).

Style & Vocabulary

Style:
• Formal and objective
• No 'I' or 'you'; no contracted forms such as 'can't'
• Avoidance of direct questions and standard negatives
• No colloquial English: 'lots', 'stuff', 'things'

Vocabulary:
• Formal verbs are chosen, e.g. 'investigated' (from Latin/Greek) rather than
two-part verbs such as 'look into'
• Precise and often abstract vocabulary

A Few Grammar Points

• The passive (e.g. 'Tests were made...') is frequently used, but do not overuse it.
• Longer sentences with subordinate clauses are often used; do not make them too
long or complex.
• Make claims carefully using the modal verbs can, may, might, etc.: 'This compound
may cause an increased incidence of...'
• Noun groups are often used to convey information concisely (nominalisation).
• 'It's' and 'its': do not use 'it's', which is a spoken contraction of 'it is'/'it has';
'its' is a possessive adjective.

To Sum up

 Use and evaluate all the data you report, and do not be discouraged if your results
differ from published studies or from what you expected.
 Justify all tables and figures by discussing their content and labeling them clearly.
 Be creative in your presentation of data, your analysis, and your interpretation of
data; play around with different variations before completing your report.
 Do not force conclusions from your data, or fudge data by omitting data that do not
support pre-conceived conclusions.
 Make sure all calculations and analyses are relevant to the hypotheses you are
testing and the overall objectives of the study.
 Justify your ideas and conclusions with data, facts, background literature, and
sound reasoning.
 Keep the different sections of the report discrete, i.e. methods in the methods
section, results in the results section; leave discussion and interpretation of those
results for the discussion section.
 Plan your writing: organize your thoughts and data, and sketch the report before
actually writing. This will help maximize your time efficiency and lead to a
concise, well-structured report.
