Unit 3 RM
DATA COLLECTION
3.1 INTRODUCTION
After identifying the type of research to be undertaken, the next step in the research process is to select the data collection techniques. The data collection technique differs with the type of research design. There are predominantly two types of data: (i) primary data and (ii) secondary data.
Primary data are data that a researcher collects for the specific purpose of investigating the research problem at hand. Secondary data are data that were collected not for the immediate study but for some other purpose. Both types of data offer specific advantages and disadvantages.
a) Secondary data offer cost and time economies to the researcher as they already exist in various forms in the company or in the market.
c) Since they were collected for some other purpose, they may sometimes not fit the problem defined perfectly.
d) The objectives, nature and methods used to collect the secondary data may not be appropriate
Secondary data are data that already exist in accessible records, having been collected and treated statistically by the persons maintaining the records. In other words, secondary data are data that have already been collected, presented, tabulated and treated with the necessary statistical techniques, and from which conclusions have been drawn. Collecting secondary data therefore does not mean doing some original enumeration; it merely means obtaining data that have already been collected by some agencies, reliable persons, government departments, research workers, dependable organisations, etc. Secondary data are easily
Once primary data have been collected and moulded by statisticians or statistical machinery, they become secondary data in the hands of all other persons who may wish to handle them for their own purposes or studies. It follows, therefore, that primary and secondary data are not sharply demarcated and that the distinction between them is one of degree only. If a person 'X' collects some data originally, then the data are primary data to 'X', whereas the same data when used by another person 'Y' become secondary data to 'Y'.
2. Publications brought out by international organisations like the UNO, UNESCO, etc.
6. Well-known newspapers and journals like the Economic Times, The Financial Express, Indian
Year Books such as Times of India Year Book, Statesman's Year Book also provide valuable data.
Though the above list of secondary data sources cannot be said to be thorough or complete, it fairly indicates the chief sources of secondary data. Besides the sources mentioned above, there are a number of other important sources, such as records of governments in various departments, unpublished manuscripts of eminent scholars, research workers, statisticians and economists, private organisations, labour bureaus and records of business firms.
3.4 TYPES OF SECONDARY DATA
Secondary data are of two types. Data that originate from within the company are called internal data. If they were collected for some other purpose, they are internal secondary data. These offer a significant advantage as they are readily available in the company at low cost. The most convenient example of internal secondary data is the figures relating to sales of the product.
Published external secondary data refers to data available outside the company. There is such a pool of published data available in the market that it is sometimes easy to underestimate what is available and thereby bypass relevant information. Several sources of external data are available. They are:
Directories are helpful for identifying individuals or organisations that collect specific data.
using an index.
researchers. Graphic and statistical analyses can be performed on these data to draw meaningful inferences.
Government Sources
Census data is a report published by the Government containing information about the
Other Government publications may pertain to, for example, the availability of train tickets just before a train leaves.
Computerised Databases
Online databases are databases consisting of data pertaining to a particular sector (e.g.,
newspapers etc.
Numeric databases contain numerical and statistical information, for example, time series data about stock markets.
Directory databases provide information on individuals, organisations and services, e.g., Getit Yellow Pages.
Consumer data relate to consumers' purchases and the circumstances surrounding the purchase.
Retail data rely on retailing establishments for their data. The data collected focus on the products or services sold through the outlets and/or the characteristics of the outlets themselves.
Before accepting secondary data it is always necessary to scrutinise them properly with regard to their accuracy and reliability. It may happen that the authorities collecting a particular type of data unknowingly carry out investigations using the wrong procedures. Hence it is always necessary to verify the secondary data in the following manner:
(i) Whether the organization that has collected the data is reliable.
(ii) Whether the appropriate statistical methods were used by the primary data enumerators and investigators.
By primary data we mean data that have been collected originally, for the first time. In other words, primary data may be the outcome of an original statistical enquiry, a measurement of facts or a count that is undertaken for the first time. For instance, population census data are primary. Primary data, being fresh from the fields of investigation, are very often referred to as raw data. The collection of primary data requires a good deal of time, money and energy.
3.6.1 QUESTIONNAIRE
a) It must translate the information needed into a set of specific questions that the respondents
c) It must stimulate the respondents to participate in the data collection process. The respondents should be adequately motivated by the virtual construct of the questionnaire.
a) Identification data
c) Instruction
d) Information sought
e) Classification of data
This is another method, used when the researcher feels that survey methods may not be so relevant for data collection. On subjective issues, respondents need to be observed rather than asked, lest biases and prejudices creep into their responses. The observation method may be either structured or unstructured. The structured observation method involves having a set of items to be observed and a specification of how the measurements are to be recorded. In unstructured observation, the observer monitors all aspects of the phenomena that seem relevant to the problem at hand. In this context, the observer may keep an open mind in studying the persons or objects.
research does not exist without sampling. Every research study requires the selection of some
Any research study aims to obtain information about the characteristics or parameters of a population. A population is the aggregate of all the elements that share some common set of characteristics and that comprise the universe for the purpose of the research problem. In other words, a population is defined as the totality of all cases that conform to some designated specifications. The specifications help the researcher to define the elements that ought to be included and those to be excluded. Sometimes, groups that are of interest to the researcher may be small enough to allow the researcher to collect data from all the elements of the population. Collection of data from the entire population is referred to as a census study. A census involves a complete enumeration of the elements of a population.
When the number of elements is large, collecting data from the aggregate of all the elements (the population) may cost the researcher a great deal of money and time, and may sometimes be practically impossible. An alternative is to collect information from a portion of the population: a sample of elements is taken from the population and, on the basis of the information collected from the sample elements, the characteristics of the population are inferred. Hence, sampling is the process of selecting units (e.g., people, organizations) from a population of interest so that by studying the sample we may fairly generalize our results back to the population from which they were chosen.
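
The idea of inferring a population characteristic from a sample can be illustrated with a small simulation. This is only a sketch: the population below is artificial (10,000 simulated values), and the names `population`, `census_mean` and `sample_mean` are hypothetical, chosen for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical population: 10,000 simulated measurements (not real data).
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Census: measure every element of the population.
census_mean = statistics.mean(population)

# Sample: measure only 200 randomly chosen elements and infer from them.
sample = rng.sample(population, 200)
sample_mean = statistics.mean(sample)

print(f"census mean: {census_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the census mean; the gap is the price paid for measuring 200 elements instead of 10,000.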
While deciding on the sampling, the researcher should clearly define the target population, without any ambiguity or inconsistency about the boundary of the aggregate set of respondents. To do so, the researcher may have to use wisdom, logic and judgement to define the boundary of the population in keeping with the objectives of the study.
Sampling techniques are classified into two broad categories: probability samples and non-probability samples.
Probability samples are characterised by the fact that the sampling units are selected by chance. In such a case, each member of the population has a known, non-zero probability of being selected. Not all possible samples need have the same probability of selection, but it is possible to state the probability of selecting any particular sample of a given size. One can also calculate the probability that any given population element will be included in the sample. This requires a precise definition of the target population as well as the sampling frame.
Probability sampling techniques differ in terms of sampling efficiency, a concept that refers to the trade-off between sampling cost and precision. Precision refers to the level of
Simple random sampling is the most important and widely used probability sampling technique. It gains much significance because it is used to frame the concepts and arguments in statistics. Another important feature is that it allows each element in the population to have a known and equal probability of selection. This means that every element is selected independently of every other element. The method resembles a lottery in which names are placed in a box, the box is shuffled, and the names of the winners are then drawn out in an unbiased manner.
Simple random sampling has a definite, though not rigid, process. It involves compiling a sampling frame in which each element is assigned a unique identification number. Random numbers are then generated, using either a random number table or a computer, to determine which elements to include in the sample. For example, a researcher is interested in investigating the
and going down the column until 5 numbers between 1 and 100 are selected. Numbers outside this range are ignored. Random number tables are found in most statistics books. A table consists of a randomly generated series of digits from 0 to 9. To enhance the readability of the numbers, a space is given after every 4th digit and after every 10th row. The researcher may begin reading from anywhere in the random number table; however, once started, the researcher should continue to read across the row or down the column. The most important feature of simple random sampling is that it facilitates representation of the population by the sample, ensuring that the statistical conclusions are valid.
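
When a computer is used instead of a random number table, the selection of 5 numbers between 1 and 100 might look like the following minimal sketch. It assumes a sampling frame whose elements are simply numbered 1 to 100; the function name `simple_random_sample` is hypothetical, and the standard-library `random` module stands in for the table.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every element has an equal chance of selection."""
    rng = random.Random(seed)    # a seed makes the draw repeatable
    return rng.sample(frame, n)  # sampling without replacement

# Assumed sampling frame: identification numbers 1..100.
frame = list(range(1, 101))
sample = simple_random_sample(frame, 5, seed=42)
print(sample)
```

Because the generator only ever draws from the frame, there are no out-of-range numbers to ignore, which is the one step of the table procedure that has no counterpart here.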
Systematic Sampling
This is another widely used sampling technique, chosen for its ease and convenience. As in simple random sampling, it begins by choosing a random starting point; every i-th element in succession is then picked from the sampling frame. The sample interval, i, is determined by dividing the population size N by the sample size n and rounding to the nearest integer.
Consider a situation where the researcher intends to choose 10 elements from a population of 100. In order to choose these 10 elements, number the elements from 1 to 100. With 100 population elements and a sample of size 10, the sampling fraction is 10/100 = 1/10, meaning that one element in 10 will be selected. The sample interval will, therefore, be 10. This means that after a random start from any point in the random table, the researcher has to choose every 10th element.
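
The procedure can be sketched in a few lines of Python. This is an illustrative sketch rather than a prescribed method: the function name `systematic_sample` is hypothetical, and the random start is assumed to fall within the first interval so that the sample stays inside the frame.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick every i-th element after a random start, where i = N / n rounded."""
    N = len(frame)
    interval = round(N / n)          # sample interval i
    rng = random.Random(seed)
    start = rng.randrange(interval)  # assumed: random start within the first interval
    # If rounding pushes the interval above N / n, fewer than n elements may result.
    return frame[start::interval][:n]

frame = list(range(1, 101))          # elements numbered 1 to 100
sample = systematic_sample(frame, 10, seed=7)
print(sample)
```

With N = 100 and n = 10 the interval is 10, so consecutive selected elements are always exactly 10 apart, whatever the starting point.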
Systematic sampling is similar to simple random sampling in that each population element has a known and equal probability of selection. However, the difference lies in that, in systematic sampling, only the permissible samples of size n have a known and equal probability of selection. The remaining samples of size n have a zero probability of being selected.
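
The contrast between the two techniques becomes concrete on a very small population. The sketch below uses an assumed population of 6 elements and a sample size of 2 (values chosen only to keep the enumeration short) and lists every sample each technique can produce.

```python
from itertools import combinations

N, n = 6, 2                      # assumed toy population and sample size
interval = N // n                # sample interval = 3
population = list(range(1, N + 1))

# Simple random sampling: every subset of size n is a possible sample.
srs_samples = list(combinations(population, n))

# Systematic sampling: one permissible sample per possible starting point.
systematic_samples = [tuple(population[start::interval]) for start in range(interval)]

print(len(srs_samples))          # 15
print(systematic_samples)        # [(1, 4), (2, 5), (3, 6)]
```

Each element appears in exactly one of the three systematic samples, so every element still has the same chance of selection (1 in 3), but 12 of the 15 possible samples of size 2 can never be drawn.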
Stratified sampling
a) It requires division of the parent population into mutually exclusive and exhaustive subsets;
b) A simple random sample of elements is chosen independently from each group or subset.
This means that every population element should be assigned to one and only one stratum and that no population element should be omitted. Next, elements are selected from each stratum by the simple random sampling technique. Stratified sampling differs from quota sampling in that the sample elements are selected probabilistically rather than on the basis of convenience or judgement.
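
The two steps, partitioning into strata and then drawing a simple random sample within each stratum, can be sketched as follows. The population, the "urban"/"rural" strata and the per-stratum sample sizes are all hypothetical values used only for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Partition the population into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in population:
        # Each element is assigned to exactly one stratum; none is omitted.
        strata[stratum_of(element)].append(element)
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

# Hypothetical population: 100 respondents, the first 60 urban and the rest rural.
population = [(i, "urban" if i <= 60 else "rural") for i in range(1, 101)]
result = stratified_sample(population,
                           stratum_of=lambda person: person[1],
                           n_per_stratum={"urban": 6, "rural": 4},
                           seed=1)
print(result)
```

Here the sample sizes are proportional to the stratum sizes (6 of 60 and 4 of 40), which is one possible allocation, not the only one.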
Strata are created by a divider called the stratification variable. This variable divides the
c) it combines the use of simple random sampling with potential gains in precision;
d) estimates of the population parameters may be wanted for each sub-population and;
Non-probability sampling does not involve random selection. It relies on the personal judgement of the researcher rather than chance to select sample elements. Sometimes this judgement is imposed by the researcher, while in other cases the selection of population elements to be included is left to the individual field workers. The decision maker may also contribute to including a particular individual in the sampling frame. Evidently, non-probability sampling does
Sampling error is the degree to which a sample might differ from the population. Therefore, while inferring to the population, results cannot be reported as plus or minus the sampling error. In non-probability sampling, the degree to which the sample differs from the population remains unknown. However, we cannot conclude that sampling error is an inherent feature of non-probability samples.
The most commonly used non-probability sampling methods are convenience sampling,
Convenience samples are sometimes called accidental samples because the elements included
in
convenient elements. This refers to the element happening to be at the right place at the right time, that is, where and when the information for the study is being collected. The selection of the respondents is left to the discretion of the interviewer. Popular examples of convenience sampling include (a) respondents who gather in a church, (b) students in a classroom, (c) mall intercept interviews without qualifying the respondents for the study, (d) tear-out questionnaires included in magazines and (e) people on the street. In the above examples, the people may not be qualified respondents; however, they form part of the sample by virtue of assembling in the place where the researcher is conveniently placed.
Convenience sampling is the least expensive and least time-consuming of all sampling techniques. The disadvantage of convenience sampling is that the researcher has no way of knowing whether the sample chosen is representative of the target population.
research purpose. The sample elements are chosen based on the judgement that prevails in the
conclude that a particular individual may be a representative of the population in which one is
interested.
The distinguishing feature of judgement sampling is that the population elements are purposively selected. Again, the selection is made not because the elements are representative, but rather because they can offer the contributions sought. In judgement sampling, the researcher may be well aware of the characteristics of the prospective respondents, in order that he or she includes the individual in the
requisite experience and knowledge to offer some perspective on the research question.
Quota Sampling
Quota sampling is another non-probability sampling technique. It attempts to ensure that the sample chosen by the researcher is representative by selecting elements in such a way that the proportion of the sample elements possessing a certain characteristic is approximately the same as the
characteristics involve age, sex, and race identified on the basis of judgement. Then the
women in a population. Sex is the control group and the percentages fixed are the quotas.
In the second stage, sample elements are selected based on convenience or judgement. Once the quotas have been determined, there is considerable freedom to select the elements to be included in the sample. For example, the researcher may not choose more than 40% men and 60% women in the study. Even if the researcher comes across qualified men after reaching the 40% mark, he/she would still restrict the entry of men into the sample and keep searching for women till the quota is fulfilled.
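
The quota-filling rule in the second stage can be sketched as a simple loop. The sketch assumes a total sample of 10, so the 40% and 60% quotas become 4 men and 6 women; the respondent list and the function name `quota_sample` are hypothetical, and respondents are taken in the order they are encountered, which corresponds to selection by convenience.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in the order met until every group's quota is filled."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person, group in respondents:
        # Accept only while this group's quota is still open.
        if counts.get(group, 0) < quotas.get(group, 0):
            sample.append((person, group))
            counts[group] += 1
        if counts == quotas:  # all quotas filled: stop searching
            break
    return sample

# Hypothetical stream of respondents: men ("M") and women ("F") alternate.
respondents = [(f"R{i}", "M" if i % 2 == 0 else "F") for i in range(1, 21)]
sample = quota_sample(respondents, quotas={"M": 4, "F": 6})
print(sample)
```

Once four men have been accepted, later men are skipped even though they are available, and the loop continues until the sixth woman is found.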
Snowball Sampling