Unit 3 RM

UNIT III
DATA COLLECTION
3.1 INTRODUCTION
The next step in the research process, after identifying the type of research the researcher intends to do, is deciding on the data collection techniques. The data collection technique differs for different types of research design. There are predominantly two types of data: (i) primary data and (ii) secondary data.
Primary data are data that a researcher collects for the specific purpose of investigating the research problem at hand. Secondary data are data that were collected not for the immediate study but for purposes other than the problem at hand. Both types of data offer specific advantages and disadvantages.
a) Secondary data offer cost and time economies to the researcher, as they already exist in various forms in the company or in the market.
b) They are feasible for a firm to collect.
c) Since they were collected for some other purpose, they may not fit the problem as defined perfectly.
d) The objectives, nature and methods used to collect the secondary data may not be appropriate to the present situation.
In most cases, secondary data help the researcher to:
a) Identify the problem.
b) Better define the problem.
c) Develop an approach to the problem.
d) Formulate an appropriate research design by identifying the key variables.
e) Answer certain research questions and formulate hypotheses.
f) Interpret the primary data in more depth.
3.2 SECONDARY DATA
Secondary data are data that already exist in accessible records, having already been collected and treated statistically by the persons maintaining the records. In other words, secondary data are data that have already been collected, presented, tabulated and treated with the necessary statistical techniques, and from which conclusions have been drawn. Therefore, collecting secondary data does not mean doing any original enumeration; it merely means obtaining data that have already been collected by agencies, reliable persons, government departments, research workers, dependable organisations, etc. Secondary data are easily obtainable from reliable records, books, government publications and journals.
Once primary data have been originally collected and moulded by statisticians or statistical machinery, they become secondary in the hands of all other persons who may wish to handle them for their own purposes or studies. It follows, therefore, that primary and secondary data are not rigidly demarcated and that the distinction between them is only one of degree. If a person 'X' collects some data originally, the data are primary data to 'X', whereas the same data, when used by another person 'Y', become secondary data to 'Y'.
3.3 SOURCES OF SECONDARY DATA
The following are some of the sources of secondary data:

1. Central and State government publications.
2. Publications brought out by international organisations like the UNO, UNESCO, etc.
3. Foreign government publications.
4. Official publications as well as reports of municipalities, district parishads, etc.
5. Reports and publications of commissions (such as the U.G.C., the Education Commission and the Tariff Commission), chambers of commerce, co-operative societies, trade associations, banks, stock exchanges, business houses, etc.
6. Well-known newspapers and journals like the Economic Times, The Financial Express, Indian Journal of Economics, Commerce, Capital, Eastern Economist, etc. Further, year books such as the Times of India Year Book and the Statesman's Year Book also provide valuable data.
7. Publications brought out by research institutions and universities, as well as those published by research workers, provide considerable secondary data.
8. Internet and website sources.
Though this list cannot be said to be exhaustive, it fairly indicates the chief sources of secondary data. Besides the sources mentioned above, there are a number of other important sources, such as the records of various government departments; unpublished manuscripts of eminent scholars, research workers, statisticians and economists; and the records of private organisations, labour bureaus and business firms.
3.4 TYPES OF SECONDARY DATA
Secondary data are of two types: internal and external. Data that originate within the company are called internal data. If they were collected for some other purpose, they are internal secondary data. Such data offer a significant advantage, as they are readily available in the company at low cost. The most convenient example of internal secondary data is the company's sales figures. An important internal source of secondary data is database marketing, which involves the use of computers to capture and track customer profiles and purchase details. The information about customer profiles can serve as the foundation for marketing programmes or products.
Published external secondary data refers to data available outside the company. There is such a large pool of published data in the market that it is easy to underestimate what is available and thereby bypass relevant information. Several sources of external data are available. They are:
General Business Data
Guides are small booklets containing information about a particular trade or business.
Directories are helpful for identifying individuals or organisations that collect specific data.
Indexes are used to locate information on a particular topic across several different publications.
Non-governmental statistical data refers to published statistical data of great interest to researchers. Graphic and statistical analyses can be performed on these data to draw meaningful inferences.
Government Sources
Census data are published by the Government and contain information about the population of the country.
Other Government publications may cover, for example, the availability of train tickets just before a train leaves.
Computerised Databases
Online databases consist of data pertaining to a particular sector (e.g., banks) and are accessed with a computer through a telecommunication network.
Bibliographic databases comprise citations to articles published in journals, magazines, newspapers, etc.
Numeric databases contain numerical and statistical information, for example, time-series data about stock markets.
Directory databases provide information on individuals, organisations and services, e.g., Getit Yellow Pages.
Special-purpose databases are databases developed online for a special purpose.
External Data: Syndicated
In response to the growing need for data pertaining to markets, consumers, etc., companies have started collecting and selling standardised data designed to serve information needs shared by a number of organisations. Syndicated data sources can be further classified as (a) consumer data, (b) retail data, (c) wholesale data, (d) industrial data, (e) advertising evaluation data and (f) media and audience data.
Consumer data relate to consumers' purchases and the circumstances surrounding the purchase.
Retail data rely on retailing establishments as their source. The data collected focus on the products or services sold through the outlets and/or the characteristics of the outlets themselves.
3.5 VERIFICATION OF SECONDARY DATA
Before accepting secondary data, it is always necessary to scrutinise them properly with regard to their accuracy and reliability. The authorities collecting a particular type of data may unknowingly have used wrong investigation procedures. Hence, it is always necessary to verify secondary data in the following manner:
(i) Whether the organisation that has collected the data is reliable.
(ii) Whether appropriate statistical methods were used by the primary data enumerators and investigators.
(iii) Whether the data were collected at the proper time.

3.6 COLLECTION OF PRIMARY DATA
By primary data we mean data that have been collected originally, for the first time. In other words, primary data may be the outcome of an original statistical enquiry, a measurement of facts or a count that is undertaken for the first time. For instance, population census data are primary. Primary data, being fresh from the field of investigation, are very often referred to as raw data. Collecting primary data requires a good deal of time, money and energy.
3.6.1 QUESTIONNAIRE
A questionnaire is defined as a formalised schedule for collecting data from respondents. It may also be called a schedule, an interview form or a measuring instrument.
Measurement error is a serious problem in questionnaire construction. The broad objective of a questionnaire is to collect data free of measurement error. Specifically, the objectives of a questionnaire are as follows:
a) It must translate the information needed into a set of specific questions that the respondents can and will answer.
b) The questions should measure what they are supposed to measure.
c) It must stimulate the respondents to participate in the data collection process. The respondents should be adequately motivated by the design of the questionnaire.
d) It should not contain ambiguous statements that confuse the respondents.

3.6.1.1 Questionnaire Components
A questionnaire typically consists of five sections. They are:
a) Identification data
b) Request for cooperation
c) Instructions
d) Information sought
e) Classification data
3.6.2 OBSERVATION METHODS
This method is used when the researcher feels that survey methods may not be relevant for data collection. On subjective issues, respondents need to be observed rather than asked, lest biases and prejudices creep into their responses. The observation method may be either structured or unstructured. Structured observation involves specifying in advance the set of items to be observed and how the measurements are to be recorded. In unstructured observation, the observer monitors all aspects of the phenomena that seem relevant to the problem at hand. In this context, the observer may keep an open mind when studying the persons or objects.
3.7 SAMPLING DESIGN
Research does not exist without sampling. Every research study requires the selection of some kind of sample; sampling is the lifeblood of research.
Any research study aims to obtain information about the characteristics or parameters of a population. A population is the aggregate of all the elements that share some common set of characteristics and that comprise the universe for the purpose of the research problem. In other words, a population is defined as the totality of all cases that conform to some designated specifications. The specifications help the researcher define which elements ought to be included and which excluded. Sometimes, the groups of interest to the researcher are small enough to allow the researcher to collect data from all the elements of the population.
Collecting data from the entire population is referred to as a census study. A census involves a complete enumeration of the elements of a population.
When the number of elements is large, collecting data from the aggregate of all the elements (the population) can make the researcher incur huge costs and time, and sometimes it may not be possible at all. An alternative is to collect information from a portion of the population by taking a sample of elements and then, on the basis of the information collected from the sample elements, to infer the characteristics of the population. Hence, sampling is the process of selecting units (e.g., people, organisations) from a population of interest so that by studying the sample we may fairly generalise our results back to the population from which they were chosen.
While deciding on the sampling, the researcher should clearly define the target population, without allowing any ambiguity or inconsistency about the boundary of the aggregate set of respondents. To do so, the researcher may have to use his or her wisdom, logic and judgement to define the boundary of the population in keeping with the objectives of the study.
3.8 TYPES OF SAMPLING PLANS
Sampling techniques are classified into two broad categories: probability samples and non-probability samples.
3.8.1 Probability Sampling Techniques
Probability samples are characterised by the fact that the sampling units are selected by chance. In such a case, each member of the population has a known, non-zero probability of being selected. It may not be true that all samples have the same probability of selection, but it is possible to state the probability of selecting any particular sample of a given size. It is also possible to calculate the probability that any given population element will be included in the sample. This requires a precise definition of the target population as well as of the sampling frame.
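For simple random sampling without replacement (described below), these probabilities can be calculated exactly. The following is a minimal Python sketch, assuming a population of N elements and a sample of size n; the function names are hypothetical. Out of C(N, n) equally likely samples, C(N-1, n-1) contain any given element, so the inclusion probability simplifies to n/N.

```python
from fractions import Fraction
from math import comb


def inclusion_probability(N: int, n: int) -> Fraction:
    """Probability that one given element is included in a simple random
    sample of size n drawn without replacement from N elements."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    # Samples containing the given element / all possible samples of size n
    return Fraction(comb(N - 1, n - 1), comb(N, n))


def sample_probability(N: int, n: int) -> Fraction:
    """Probability of drawing one particular sample of size n."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    return Fraction(1, comb(N, n))


print(inclusion_probability(100, 5))  # 1/20
print(sample_probability(100, 5))     # 1/75287520
```

With N = 100 and n = 5, every element has a 1 in 20 chance of inclusion, while any one specific set of 5 elements is very unlikely to be drawn.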
Probability sampling techniques differ in terms of sampling efficiency, a concept that refers to the trade-off between sampling cost and precision. Precision refers to the level of uncertainty about the characteristics being measured. Precision is inversely related to sampling error but directly related to cost: the greater the precision, the greater the cost, so there must be a trade-off between the two. The researcher should therefore choose the sampling design that is most efficient for the study.
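The cost and precision trade-off can be made concrete with the standard error of a sample mean under simple random sampling, σ/√n. The sketch below is illustrative only: it assumes a population standard deviation of 20 and a hypothetical cost of 50 per sample element, neither of which comes from the text, and it ignores the finite population correction.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean under simple random sampling
    (infinite-population approximation): sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


SIGMA = 20.0          # assumed population standard deviation (illustrative)
COST_PER_UNIT = 50.0  # hypothetical cost of observing one sample element

for n in (25, 100, 400):
    se = standard_error(SIGMA, n)
    cost = COST_PER_UNIT * n
    print(f"n={n:4d}  standard error={se:5.2f}  cost={cost:8.0f}")
```

Under these assumptions, each quadrupling of the sample size halves the sampling error (doubling precision) but quadruples the cost.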
Probability sampling techniques are broadly classified as simple random sampling, systematic sampling and stratified sampling.
Simple Random Sampling
This is the most important and widely used probability sampling technique. It gains much significance because it is used to frame the concepts and arguments in statistics. Another important feature is that it gives each element in the population a known and equal probability of selection. In addition, every element is selected independently of every other element. The method resembles a lottery, in which names are placed in a box, the box is shuffled, and the names of the winners are then drawn out in an unbiased manner.
Simple random sampling follows a definite, though not rigid, process. It involves compiling a sampling frame in which each element is assigned a unique identification number. Random numbers are then generated, using either a random number table or a computer, to determine which elements to include in the sample. For example, suppose a researcher is interested in investigating the behavioural pattern of customers when deciding to purchase a computer, and wants to draw a sample of 5 elements from a sampling frame containing 100 elements. The sample may be chosen by simple random sampling by arranging the 100 elements in order, starting with row 1 and column 1 of a random number table, and going down the column until 5 numbers between 1 and 100 have been selected. Numbers outside this range are ignored. Random number tables are found in most statistics books. Such a table consists of a randomly generated series of the digits 0 to 9. To enhance readability, a space is left after every 4th digit and after every 10th row. The researcher may begin reading anywhere in the random number table; however, once started, the researcher should continue to read across the row or down the column. The most important feature of simple random sampling is that it facilitates representation of the population by the sample, ensuring that the statistical conclusions are valid.
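When a computer replaces the random number table, the selection of 5 customers from a frame of 100 can be sketched in Python as follows. This is a minimal sketch: the function name is hypothetical, and the seed is fixed only so that the draw can be reproduced.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n elements from the sampling frame without replacement,
    giving every element the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Sampling frame: 100 customers identified by the numbers 1..100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 5, seed=42)
print(sorted(sample))
```

Because `random.sample` draws without replacement, no identification number can appear twice, which matches the rule of ignoring repeated or out-of-range numbers when reading a table.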
Systematic Sampling
This is another widely used sampling technique, favoured because of its ease and convenience. It is conducted by choosing a random starting point and then picking every ith element in succession from the sampling frame. The sampling interval, i, is determined by dividing the population size N by the sample size n and rounding to the nearest integer.
Consider a situation where the researcher intends to choose 10 elements from a population of 100. To choose these 10 elements, the elements are numbered from 1 to 100. With 100 population elements and a sample of size 10, the ratio is 10/100 = 1/10, meaning that one element in 10 will be selected. The sampling interval will, therefore, be 10. This means that after a random start (a number between 1 and 10, taken from the random number table), the researcher has to choose every 10th element.
Systematic sampling is similar to simple random sampling in that each population element has a known and equal probability of selection. The difference is that simple random sampling gives every possible sample of size n a known and equal probability of selection, whereas in systematic sampling only the permissible samples of size n have a known and equal probability of selection. The remaining samples of size n have a zero probability of being selected.
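The procedure for choosing 10 elements from 100 can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the random start is drawn from 1 to i by the computer rather than read from a random number table.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every i-th element after a random start, where the sampling
    interval i = N / n rounded to the nearest integer."""
    N = len(frame)
    # Python's round() breaks exact ties (e.g. 2.5) towards the even integer
    i = max(1, round(N / n))
    rng = random.Random(seed)
    start = rng.randint(1, i)                  # random start between 1 and i
    positions = range(start, N + 1, i)         # 1-based: start, start+i, ...
    return [frame[p - 1] for p in positions][:n]


frame = list(range(1, 101))   # population of 100 numbered elements
print(systematic_sample(frame, 10, seed=7))
```

When N/n is not a whole number, the rounded interval can yield slightly more or fewer than n positions; this sketch simply truncates any excess. Note that only the i possible starting points give permissible samples, which is why all other samples of size n have zero probability.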
Stratified Sampling
Stratified sampling is a two-step process. It is distinguished from simple random sampling and systematic sampling in that:
a) it requires division of the parent population into mutually exclusive and collectively exhaustive subsets (strata);
b) a simple random sample of elements is chosen independently from each group or subset.
Therefore, every population element should be assigned to one and only one stratum, and no population element should be omitted. Next, elements are selected from each stratum by simple random sampling. Stratified sampling differs from quota sampling in that the sample elements are selected probabilistically rather than on the basis of convenience or judgement.
Strata are created by a divider called the stratification variable. This variable divides the population into strata based on homogeneity, heterogeneity, relatedness or cost. Sometimes more than one variable is used for stratification. Stratification aims to make the elements within each stratum homogeneous, while the elements in different strata should have a high degree of heterogeneity. The number of strata to be formed is left to the discretion of the researcher, though researchers agree that the optimum number of strata may be 6.
The reasons for using stratified sampling are as follows:
a) it ensures representation of all important sub-populations in the sample;
b) the cost per observation in the survey may be reduced;
c) it combines the use of simple random sampling with potential gains in precision;
d) estimates of the population parameters may be wanted for each sub-population; and
e) increased accuracy at given cost.
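The two steps (assign every element to exactly one stratum, then draw an independent simple random sample within each) can be sketched in Python. The text does not say how the total sample is split among strata, so this sketch assumes proportionate allocation; the function name and the urban/rural population are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, n, seed=None):
    """Proportionate stratified random sample.

    frame      -- list of population elements
    stratum_of -- function mapping an element to its stratum
                  (the stratification variable)
    n          -- total sample size, allocated in proportion to stratum size
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in frame:            # each element goes into exactly one stratum
        strata[stratum_of(element)].append(element)

    N = len(frame)
    sample = {}
    for name, members in strata.items():
        n_h = round(n * len(members) / N)          # proportionate allocation
        # Independent simple random sample within this stratum
        sample[name] = rng.sample(members, min(n_h, len(members)))
    return sample


# Hypothetical population: 100 customers, 60 urban and 40 rural
population = [(i, "urban" if i <= 60 else "rural") for i in range(1, 101)]
result = stratified_sample(population, lambda c: c[1], n=10, seed=1)
for stratum, chosen in result.items():
    print(stratum, [c[0] for c in chosen])
```

Here the sample of 10 is split into 6 urban and 4 rural customers. With other stratum sizes the rounded allocations may not add up to exactly n, so in practice they might need adjusting.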
3.8.2 Non-probability Sampling Methods
Non-probability sampling does not involve random selection. It relies on the personal judgement of the researcher rather than on chance to select sample elements. Sometimes this judgement is imposed by the researcher, while in other cases the selection of population elements to be included is left to individual field workers. The decision maker may also contribute to including a particular individual in the sampling frame. Evidently, non-probability sampling leaves an unknown degree of sampling error associated with the sample.
Sampling error is the degree to which a sample might differ from the population. In probability sampling, results inferred to the population can be reported plus or minus the sampling error. In non-probability sampling, however, the degree to which the sample differs from the population remains unknown, so results cannot be reported in this way. Even so, we cannot conclude that sampling error is inherent in a non-probability sample.
Non-probability samples may also yield good estimates of the population characteristics. However, since the inclusion of elements in the sample is not determined probabilistically, the estimates obtained are not statistically projectable to the population.
The most commonly used non-probability sampling methods are convenience sampling,
judgment sampling, quota sampling, and snowball sampling.

Convenience Sampling
Convenience samples are sometimes called accidental samples because the elements included in the sample are simply the convenient ones, that is, elements that happen to be at the right place at the right time: where and when the information for the study is being collected. The selection of respondents is left to the discretion of the interviewer. Popular examples of convenience sampling include (a) respondents who gather in a church, (b) students in a classroom, (c) mall-intercept interviews conducted without qualifying the respondents for the study, (d) tear-out questionnaires included in magazines and (e) people on the street. In these examples, the people may not be qualified respondents; however, they form part of the sample by virtue of assembling in the place where the researcher is conveniently placed.
Convenience sampling is the least expensive and least time-consuming of all sampling techniques. Its disadvantage is that the researcher has no way of knowing whether the sample chosen is representative of the target population.
Judgement Sampling
This is a form of convenience sampling, also called purposive sampling, because the sample elements are chosen on the expectation that they can serve the research purpose. The sample elements are chosen based on the researcher's judgement, which may lead him or her to conclude that a particular individual is representative of the population of interest.
The distinguishing feature of judgement sampling is that the population elements are purposively selected. Again, the selection is not based on their being representative, but rather on their ability to offer the contributions sought. In judgement sampling, the researcher may be well aware of the characteristics of the prospective respondents, so that he or she includes individuals with the requisite experience and knowledge to offer some perspective on the research question.
Quota Sampling
Quota sampling is another non-probability sampling technique. It attempts to ensure that the sample chosen by the researcher is representative by selecting elements in such a way that the proportion of sample elements possessing a certain characteristic is approximately the same as the proportion of elements with that characteristic in the population.
Quota sampling is viewed as a two-stage restricted judgemental sampling technique. The first stage consists of developing control categories, or quotas, of population elements. Control characteristics, such as age, sex and race, are identified on the basis of judgement. Then the distribution of these characteristics in the target population is determined. For example, the researcher may use control categories such that he or she intends to study 40% men and 60% women in a population. Sex is the control characteristic and the percentages fixed are the quotas.
In the second stage, sample elements are selected based on convenience or judgement. Once the quotas have been determined, there is considerable freedom to select the elements to be included in the sample, provided the quotas are respected. For example, the researcher may not choose more than 40% men and 60% women in the study. Even if the researcher comes across qualified men after reaching the 40% mark, he or she would still restrict entry of men into the sample and keep searching for women until the quota is fulfilled.
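The second stage, accepting whoever is encountered until each quota is full, can be sketched in Python. This is a minimal sketch assuming the text's 40% men and 60% women quotas applied to a hypothetical sample of 10 (4 men, 6 women) and a made-up order in which respondents are met in the field; the function name is also hypothetical.

```python
def fill_quotas(arrivals, quotas):
    """Accept respondents in the order they are encountered (convenience),
    but only while the quota for their control category is still open."""
    counts = {category: 0 for category in quotas}
    sample = []
    for person, category in arrivals:
        if category in quotas and counts[category] < quotas[category]:
            sample.append(person)
            counts[category] += 1
        if counts == quotas:          # all quotas filled: stop recruiting
            break
    return sample, counts


# Sample of 10 with 40% men / 60% women quotas -> 4 men, 6 women
quotas = {"man": 4, "woman": 6}
# Hypothetical order in which respondents are met in the field
arrivals = [("R1", "man"), ("R2", "man"), ("R3", "woman"), ("R4", "man"),
            ("R5", "man"), ("R6", "man"), ("R7", "woman"), ("R8", "woman"),
            ("R9", "man"), ("R10", "woman"), ("R11", "woman"), ("R12", "woman")]
sample, counts = fill_quotas(arrivals, quotas)
print(sample)   # ['R1', 'R2', 'R3', 'R4', 'R5', 'R7', 'R8', 'R10', 'R11', 'R12']
print(counts)   # {'man': 4, 'woman': 6}
```

Respondents R6 and R9 are turned away even though they may be qualified, because the quota for men is already full; this mirrors the restriction described above.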
Snowball Sampling
This is another popular non-probability technique, widely used especially in academic research. In this technique, an initial group of respondents is selected, usually at random. After being interviewed, these respondents are asked to identify others who belong to the target population of interest. Subsequent respondents are selected based on the information provided by the selected group members, who may make referrals based on their understanding of the qualifications of other prospective respondents. The method therefore involves both probability and non-probability methods: the initial respondents are chosen by a random method and the subsequent respondents by non-probability methods.
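The mix of a random first wave and referral-driven later waves can be sketched in Python. This is a minimal sketch under stated assumptions: the referral network, the function name and the rule of following referrals in the order they are received (breadth-first) are all hypothetical choices for illustration.

```python
import random
from collections import deque


def snowball_sample(population, referrals, n_initial, target, seed=None):
    """Snowball sample: random initial respondents, then their referrals.

    population -- list of known members of the target population
    referrals  -- dict mapping a respondent to the people he or she names
    """
    rng = random.Random(seed)
    initial = rng.sample(population, n_initial)   # probability stage
    sample, queue = [], deque(initial)
    seen = set(initial)
    while queue and len(sample) < target:
        person = queue.popleft()
        sample.append(person)                     # interview this respondent
        for named in referrals.get(person, []):  # non-probability stage
            if named not in seen:
                seen.add(named)
                queue.append(named)
    return sample


# Hypothetical referral network (who names whom after being interviewed)
referrals = {"A": ["D", "E"], "B": ["E", "F"], "C": ["G"],
             "D": ["H"], "E": ["K"], "F": ["I", "J"], "G": [], "H": []}
known = ["A", "B", "C"]
print(snowball_sample(known, referrals, n_initial=2, target=6, seed=3))
```

Only the first wave is drawn at random; who enters the sample afterwards depends entirely on whom earlier respondents choose to name.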
