
Unit Iv (Research Methods in Business)


UNIT – IV:

Processing of Data: Editing – Coding – Classification – Tabulation.


Concept of standard error: Criteria for judging Significance at Various levels.
Hypothesis:
Meaning – Basic concept of Hypotheses testing – Flow diagram for testing.

PROCESSING OF DATA

Data processing is concerned with editing, coding, classifying, tabulating,
charting and diagramming research data. The essence of data processing in
research is data reduction. Data reduction involves winnowing out the
irrelevant from the relevant data, establishing order from chaos and giving
shape to a mass of data. Data processing in research consists of five important
steps. They are:

1. Editing of Data
2. Coding of Data
3. Classification of Data
4. Tabulation of Data
5. Data Diagrams

1. Editing of Data
Editing is the first step in data processing. It is the process of examining
the data collected in questionnaires/schedules to detect errors and omissions,
to see that they are corrected and that the schedules are ready for tabulation.
When the whole data collection is over, a final and thorough check is
made. Mildred B. Parten in her book points out that the editor is responsible
for seeing that the data are:
• As accurate as possible,
• Consistent with other facts secured,
• Uniformly entered,
• As complete as possible,
• Acceptable for tabulation and arranged to facilitate coding and tabulation.

TYPES OF EDITING
1. Editing for quality asks the following questions: are the data forms
complete, are the data free of bias, are the recordings free of errors, are the
inconsistencies in responses within limits, is there evidence of dishonesty by
enumerators or interviewers, and is there any wanton manipulation of data?
2. Editing for tabulation makes certain accepted modifications to the data, or
even rejects certain pieces of data, in order to facilitate tabulation. For
instance, an extremely high or low value may be ignored or bracketed with a
suitable class interval.
3. Field Editing is done by the enumerator. The schedule filled up by the
enumerator or the respondent might have some abbreviated writings, illegible
writings and the like. These are rectified by the enumerator. This should be
done soon after the enumeration or interview before the loss of memory. The
field editing should not extend to giving some guess data to fill up omissions.
4. Central Editing is done by the researcher after getting all schedules or
questionnaires or forms from the enumerators or respondents. Obvious
errors can be corrected. For missing data or information, the editor may
substitute data or information by reviewing information provided by other,
similarly placed respondents. A definitely inappropriate answer is removed and
“no answer” is entered when reasonable attempts to get the appropriate
answer fail to produce results.
Editors must keep in view the following points while performing their
work:

• They should be familiar with the instructions given to the interviewers and
coders as well as with the editing instructions supplied to them for the
purpose,
• While crossing out an original entry for one reason or another, they
should just draw a single line on it so that the same may remain legible,
• They must make entries (if any) on the form in some distinctive color
and that too in a standardized form,
• They should initial all answers which they change or supply,
• Editor’s initials and the date of editing should be placed on each
completed form or schedule.
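
The checks an editor applies for completeness and consistency can be sketched in a few lines of Python. This is only an illustration: the field names, the plausible age range and the consistency rule are assumptions, not part of any standard editing scheme. In line with the advice on field editing, the sketch only flags problems and never fills in guessed values.

```python
# Hypothetical schedule fields; names and limits are illustrative assumptions.
REQUIRED_FIELDS = ("respondent_id", "age", "years_employed")


def edit_record(record):
    """Return the list of problems an editor would flag for one schedule."""
    problems = []
    # Completeness: every required field must have an entry.
    for field in REQUIRED_FIELDS:
        if record.get(field) is None:
            problems.append(f"missing {field}")
    age = record.get("age")
    years = record.get("years_employed")
    # Accuracy: the value must lie in a plausible range (assumed 15 to 100).
    if age is not None and not 15 <= age <= 100:
        problems.append("age outside plausible range")
    # Consistency with other facts secured on the same schedule.
    if age is not None and years is not None and years > age:
        problems.append("years_employed inconsistent with age")
    return problems


schedules = [
    {"respondent_id": "R1", "age": 30, "years_employed": 5},
    {"respondent_id": "R2", "age": 25, "years_employed": 40},
    {"respondent_id": "R3", "age": None, "years_employed": 3},
]
for schedule in schedules:
    print(schedule["respondent_id"], edit_record(schedule))
```

Flagged schedules would then go back to the enumerator or be handled in central editing.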

2. Coding of Data
Coding is necessary for efficient analysis; through it the several replies
may be reduced to a small number of classes which contain the critical
information required for analysis. Coding decisions should usually be taken at
the designing stage of the questionnaire. This makes it possible to pre-code
the questionnaire choices, which in turn is helpful for computer tabulation,
as one can key-punch straight from the original questionnaires. In the
case of hand coding, some standard method may be used. One such standard
method is to code in the margin with a colored pencil. Another is
to transcribe the data from the questionnaire to a coding sheet. Whatever
method is adopted, one should see that coding errors are altogether
eliminated or reduced to the minimum level.

Coding is the process/operation by which data/responses are organized into
classes/categories and numerals or other symbols are given to each item
according to the class in which it falls. In other words, coding involves two
important operations: (a) deciding the categories to be used and (b) allocating
individual answers to them. These categories should be appropriate to the
research problem, exhaustive of the data, mutually exclusive and
uni-directional. Since coding eliminates much of the information in the raw
data, it is important that researchers design category sets carefully in order
to utilize the available data more fully.
The study of the responses is the first step in coding. In the case of
pre-coded questions, coding begins at the preparation of interview schedules.
Secondly, a coding frame is developed by listing the possible answers to each
question and assigning code numbers or symbols to each of them, which are
the indicators used for coding. The coding frame is an outline of what is coded
and how it is to be coded. That is, a coding frame is a set of explicit rules
and conventions used to classify observations on a variable into values, which
are then transformed into numbers. Thirdly, after preparing the coding frame,
the gradual process of fitting the answers to the questions must begin.
Lastly, transcription is undertaken, i.e., transferring the information from
the schedules to a separate sheet called the transcription sheet. A
transcription sheet is a large summary sheet which contains the
answers/codes of all the respondents. Transcription may not be necessary
when only simple tables are required and the number of respondents is small.
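
A coding frame and a transcription sheet can be sketched in Python as follows. The question, the answer labels, the code numbers and the reserved "no answer" code 9 are all illustrative assumptions; the point is only that every raw reply falls into exactly one category, so the frame is exhaustive and mutually exclusive.

```python
# Hypothetical coding frame for one closed question (labels and codes are made up).
CODING_FRAME = {
    "strongly agree": 1,
    "agree": 2,
    "neutral": 3,
    "disagree": 4,
    "strongly disagree": 5,
}
NO_ANSWER = 9  # reserved code so that the categories remain exhaustive


def code_response(raw):
    """Return the numeric code for one raw answer."""
    if raw is None:
        return NO_ANSWER
    key = raw.strip().lower()
    # Anything not listed in the frame is coded as "no answer".
    return CODING_FRAME.get(key, NO_ANSWER)


def build_transcription_sheet(responses):
    """Map respondent id -> code: one column of a transcription sheet."""
    return {rid: code_response(answer) for rid, answer in responses.items()}


raw_answers = {"R01": "Agree", "R02": " strongly disagree ", "R03": None, "R04": "maybe"}
sheet = build_transcription_sheet(raw_answers)
print(sheet)  # {'R01': 2, 'R02': 5, 'R03': 9, 'R04': 9}
```

In a real study an unexpected reply such as "maybe" would normally be reviewed by the coder rather than silently coded as missing; the fallback here just keeps the sketch short.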

3. Classification of Data
Classification or categorization is the process of grouping the statistical data
under various understandable homogeneous groups for the purpose of
convenient interpretation. Uniformity of attributes is the basic criterion for
classification, and the grouping of data is made according to similarity.
Classification becomes necessary when there is diversity in the data
collected, for meaningful presentation and analysis. However,
it is meaningless in respect of homogeneous data. A good classification should
have the characteristics of clarity, homogeneity, equality of scale,
purposefulness and accuracy.
Objectives of Classification
1. The complex, scattered and haphazard data are organized into a concise,
logical and intelligible form.
2. It is possible to make the characteristics of similarities and
dissimilarities clear.
3. Comparative studies are possible.
4. Understanding of the significance is made easier and thereby a good deal
of human energy is saved.
5. Underlying unity amongst different items is made clear and expressed.
6. Data are so arranged that analysis and generalization become possible.

Classification is of two types, viz., quantitative classification, which is on
the basis of variables or quantity, and qualitative classification, which is
classification according to attributes. The former is the way of grouping the
variables, say, quantifying the variables in cohesive groups, while the latter
groups the data on the basis of attributes or qualities. Again, it may be
multiple classification or dichotomous classification. The former is the way of
making many (more than two) groups on the basis of some quality or
attribute, while the latter is the classification into two groups on the basis
of presence or absence of a certain quality. Grouping the workers of a factory
under various income groups (class intervals) comes under multiple
classification, and dividing them into skilled workers and unskilled
workers is dichotomous classification. The tabular form of such
classification is known as a statistical series, which may be inclusive or
exclusive.
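
The factory example can be sketched in Python. The worker records and the class limits below are made-up values used only for illustration. The intervals follow the usual exclusive convention, in which the upper limit of one class is the lower limit of the next and a value equal to the upper limit is placed in the higher class.

```python
# Made-up data: (worker id, monthly income, skilled?).
workers = [
    ("W1", 8500, True),
    ("W2", 12000, False),
    ("W3", 20000, True),
    ("W4", 10000, False),
    ("W5", 27500, True),
]

# Exclusive class intervals: lower limit included, upper limit excluded.
INTERVALS = [(0, 10000), (10000, 20000), (20000, 30000)]


def income_class(income):
    """Return the exclusive class interval [lower, upper) containing income."""
    for lower, upper in INTERVALS:
        if lower <= income < upper:
            return (lower, upper)
    raise ValueError(f"income {income} falls outside every class interval")


multiple = {}                                   # more than two groups
dichotomous = {"skilled": [], "unskilled": []}  # exactly two groups
for worker_id, income, skilled in workers:
    multiple.setdefault(income_class(income), []).append(worker_id)
    dichotomous["skilled" if skilled else "unskilled"].append(worker_id)

print(multiple)
print(dichotomous)
```

Note that W4, whose income equals the limit 10000, lands in the 10000 to 20000 class; under an inclusive series (for example 0 to 9999, 10000 to 19999) both limits would belong to the class and no value would sit on a shared boundary.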
4. Tabulation of Data
Tabulation is the process of summarizing raw data and displaying it in
compact form for further analysis. Therefore, preparing tables is a very
important step. Tabulation may be by hand, mechanical, or electronic. The
choice is made largely on the basis of the size and type of study, alternative
costs, time pressures, and the availability of computers, and computer
programmes. If the number of questionnaires is small, and their length short,
hand tabulation is quite satisfactory.

Tables may be divided into: (i) Frequency tables, (ii) Response tables, (iii)
Contingency tables, (iv) Uni-variate tables, (v) Bi-variate tables, (vi) Statistical
tables and (vii) Time series tables.
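
As a small illustration of electronic tabulation, the sketch below builds a uni-variate frequency table and a bi-variate (contingency) table from raw records using only the Python standard library. The survey records are invented for the example.

```python
from collections import Counter

# Made-up survey records: (gender, preferred brand).
records = [
    ("F", "A"), ("M", "B"), ("F", "A"), ("M", "A"),
    ("F", "B"), ("M", "B"), ("F", "A"), ("M", "C"),
]

# Uni-variate frequency table: number of respondents per brand.
brand_freq = Counter(brand for _, brand in records)

# Bi-variate (contingency) table: number of respondents per (gender, brand) pair.
cross = Counter(records)

brands = sorted(brand_freq)
genders = sorted({gender for gender, _ in records})

# Print the contingency table with captions (columns), stubs (rows) and totals.
print("Gender " + " ".join(f"{b:>5}" for b in brands) + "  Total")
for g in genders:
    row = [cross[(g, b)] for b in brands]
    print(f"{g:<6} " + " ".join(f"{n:>5}" for n in row) + f"  {sum(row):>5}")
print("Total  " + " ".join(f"{brand_freq[b]:>5}" for b in brands) + f"  {len(records):>5}")
```

The column headings play the role of captions and the row labels the role of stubs, matching the parts of a table described in this section.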

Generally a research table has the following parts: (a) table number, (b) title
of the table, (c) caption (column heading), (d) stub (row heading), (e) body,
(f) head note, (g) foot note.

As a general rule the following steps are necessary in the preparation of a
table:
1. Title of table: The table should be first given a brief, simple and clear title
which may express the basis of classification.
2. Columns and rows: Each table should be prepared in just adequate
number of columns and rows.
3. Captions and stubs: The columns and rows should be given simple and
clear captions and stubs.
4. Ruling: Columns and rows should be divided by means of thin or thick
rulings.
5. Arrangement of items: Comparable figures should be arranged side by
side.
6. Deviations: These should be arranged in the column near the original data
so that their presence may easily be noted.
7. Size of columns: This should be according to the requirement.
8. Arrangements of items: This should be according to the problem.
9. Special emphasis: This can be done by writing important data in bold or
special letters.
10. Unit of measurement: The unit should be noted below the lines.
11. Approximation: This should also be noted below the title.
12. Foot – notes: These may be given below the table.
13. Total: Totals of each column and grand total should be in one line.
14. Source: The source of the data must be given. For primary data, write
"primary data".
It is not always necessary to present facts in tabular form if they can be
presented more simply in the body of the text. Tabular presentation enables the
reader to follow more quickly than textual presentation. A table should not
merely repeat information covered in the text. The same information should
not, of course, be presented in both tabular form and graphical form. Smaller
and simpler tables may be presented in the text, while large and complex
tables may be placed at the end of the chapter or report.

5. Data Diagrams
Diagrams are charts and graphs used to present data. They help to attract the
reader's attention and to present data more effectively, and they allow
creative presentation of data. Data diagrams are classified into:
1. Charts: A chart is a diagrammatic form of data presentation. Bar charts,
rectangles, squares and circles can be used to present data. Bar charts are
uni-dimensional, while rectangles, squares and circles are two-dimensional.
2. Graphs: The method of presenting numerical data in visual form is called a
graph. A graph gives the relationship between two variables by means of either
a curve or a straight line. Graphs may be divided into two categories: (1)
Graphs of Time Series and (2) Graphs of Frequency Distribution. In graphs of
time series one of the factors is time and the other or others is/are the study
factors. Graphs of frequency distribution show the distribution of, say,
executives by income, age, etc.

CONCEPT OF STANDARD ERROR
Standard error (SE) is a statistic that indicates how accurately sample data
represent the whole population. It measures the accuracy with which a
sample distribution represents a population by using the standard deviation.

STANDARD ERROR FORMULA

The accuracy of a sample that describes a population is identified through the
SE formula. The sample mean deviates from the population mean, and the
expected size of that deviation is given as:

SE = S / √n

where S is the standard deviation and n is the number of observations.

HOW TO CALCULATE STANDARD ERROR

Step 1: Note the number of measurements (n) and determine the sample
mean (μ). It is the average of all the measurements.

Step 2: Determine how much each measurement varies from the mean.

Step 3: Square all the deviations determined in step 2 and add them together:
Σ(xi – μ)²

Step 4: Divide the sum from step 3 by one less than the total number of
measurements (n-1).

Step 5: Take the square root of the obtained number, which is the standard
deviation (S).

Step 6: Finally, divide the standard deviation obtained by the square root of
the number of measurements (n) to get the standard error of your estimate.

Go through the example given below to understand the method of calculating
standard error.
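
The six steps can be followed in a short Python sketch. The five measurements are invented for illustration; the variable names are likewise only illustrative.

```python
import math

measurements = [12.0, 15.0, 11.0, 14.0, 13.0]  # made-up sample

n = len(measurements)                          # Step 1: number of measurements
mean = sum(measurements) / n                   # Step 1: sample mean
deviations = [x - mean for x in measurements]  # Step 2: deviation of each value
sum_sq = sum(d ** 2 for d in deviations)       # Step 3: sum of squared deviations
variance = sum_sq / (n - 1)                    # Step 4: divide by n - 1
std_dev = math.sqrt(variance)                  # Step 5: standard deviation
std_error = std_dev / math.sqrt(n)             # Step 6: standard error

print(f"mean = {mean:.1f}")                # mean = 13.0
print(f"std deviation = {std_dev:.3f}")    # std deviation = 1.581
print(f"standard error = {std_error:.3f}") # standard error = 0.707
```

Here the squared deviations sum to 10, so the variance is 10 / 4 = 2.5, the standard deviation is √2.5 ≈ 1.581, and the standard error is 1.581 / √5 ≈ 0.707.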

CRITERIA FOR JUDGING SIGNIFICANCE AT VARIOUS LEVELS

In everyday language significant means important, but when used in statistics,
‘significant’ means that a result is unlikely to be due to chance alone; it
does not (necessarily) mean that it is highly important. A
research finding may be true without being important.

The significance level (or α level) is a threshold that determines whether a
study result can be considered statistically significant after performing the
planned statistical tests. It is most often set to 5% (or 0.05), although other
levels may be used depending on the study. It is the probability of rejecting
the null hypothesis when it is true (the probability of committing a type I
error). For example, a significance level of 0.05 indicates a 5% risk of
concluding that a difference exists when there is no actual difference.

p-value

The probability value (p-value) is the probability of obtaining an effect at
least as large as the one that was observed, assuming that the null hypothesis
is true. It is not the probability that the null hypothesis is true, nor the
probability that the observed effect was produced by chance.

The p-value helps to quantify the evidence against the null hypothesis:

a large p-value suggests that the observed effect is quite likely if the null
hypothesis is true.

a small p-value (equal to or less than the significance level) suggests that the
observed evidence is not very likely if the null hypothesis is true, i.e. either
a very unusual event has happened or the null hypothesis is incorrect.

The p-value is compared with a pre-defined cut-off for the test (the
significance level). If it is equal to or smaller than this value, the
estimated effect is considered to be significant. Often a cut-off of 0.05 or
0.01 (written ‘p ≤ 0.05’ or ‘p ≤ 0.01’) is chosen.
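
The comparison of a p-value with the cut-off can be sketched in Python. The sketch assumes a test statistic z that follows the standard normal distribution under the null hypothesis and a two-sided test; the function names are illustrative, and the z values are arbitrary.

```python
import math


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_significant(p_value, alpha=0.05):
    """Apply the cut-off rule: significant when p <= alpha."""
    return p_value <= alpha


for z in (1.0, 2.0, 3.0):
    p = two_sided_p_value(z)
    print(
        f"z = {z:.1f}  p = {p:.4f}  "
        f"significant at 5%: {is_significant(p)}  "
        f"at 1%: {is_significant(p, alpha=0.01)}"
    )
```

With these inputs, z = 2.0 gives p ≈ 0.0455, which is significant at the 5% level but not at the 1% level, showing that the same result can pass one cut-off and fail a stricter one.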

HYPOTHESIS

“A proposition, condition or principle which is assumed, perhaps without
belief, in order to draw out its logical consequences and by this method to test
its accord with facts which are known or may be determined.”
- Webster’s New International Dictionary
“A hypothesis is a tentative generalization the validity of which remains to be
tested. In its most elementary stage, the hypothesis may be any hunch, guess,
imaginative idea which becomes the basis for further investigation.”
- Lundberg
CHARACTERISTICS OF HYPOTHESIS
1. Hypothesis should be clear and precise. If the hypothesis is not clear and
precise, the inferences drawn on its basis cannot be taken as reliable.
2. Hypothesis should be capable of being tested. In a swamp of untestable
hypotheses, many a time the research programs have bogged down. Some
prior studies may be done by researchers in order to make the hypothesis a
testable one. A hypothesis “is testable if other deductions can be made from it
which, in turn, can be confirmed or disproved by observation.”
3. Hypothesis should state the relationship between variables if it
happens to be a relational hypothesis.
4. Hypothesis should be limited in scope and must be specific. A
researcher must remember that narrower hypotheses are generally more
testable and he should develop such hypotheses.
5. Hypothesis should be stated as far as possible in most simple terms so
that the same is easily understandable by all concerned. But one must
remember that simplicity of the hypothesis has nothing to do with its
significance.
6. Hypothesis should be consistent with most known facts i.e., it must be
consistent with a substantial body of established facts. In other words, it
should be one which judges accept as being the most likely.
7. Hypothesis should be amenable to testing within a reasonable time.
One should not use even an excellent hypothesis, if the same cannot be tested
in a reasonable time for one cannot spend a lifetime collecting data to test it.
8. Hypothesis must explain the facts that gave rise to the need for
explanation. This means that by using the hypothesis plus other known and
accepted generalizations, one should be able to deduce the original problem
condition. Thus hypothesis must actually explain what it claims to explain; it
should have the empirical reference.

IMPORTANCE OF HYPOTHESIS
1. Helps in the testing of the theories.
2. Serves as a great platform in the investigation activities.
3. Provides guidance to the research work or study.
4. Hypothesis sometimes suggests theories.
5. Helps in knowing the needs of the data.
6. Explains social phenomena.
7. Develops the theory.
8. Also acts as a bridge between the theory and the investigation.

FORMULATION OF HYPOTHESIS
(i) Making a formal statement: This step consists in making a formal
statement of the null hypothesis (H0) and also of the alternative hypothesis
(Ha). This means that hypotheses should be clearly stated, considering the
nature of the research problem. For instance, Mr. Mohan of the Civil
Engineering Department wants to test the load bearing capacity of an old
bridge, which must be more than 10 tons. In that case he can state his
hypotheses as under:
Null hypothesis H0: μ = 10 tons
Alternative hypothesis Ha: μ > 10 tons
Take another example. The average score in an aptitude test administered at
the national level is 80. To evaluate a state’s education system, the test was
given to 100 of the state’s students selected on a random basis, and their
average score was 75. The state wants to know if there is a significant
difference between the local scores and the national scores. In such a
situation the hypotheses may be stated as under:
Null hypothesis H0: μ = 80
Alternative hypothesis Ha: μ ≠ 80
The formulation of hypotheses is an important step which must be
accomplished with due care in accordance with the object and nature of the
problem under consideration. It also indicates whether we should use a one-
tailed test or a two-tailed test. If Ha is of the type greater than (or of the type
lesser than), we use a one-tailed test, but when Ha is of the type “whether
greater or smaller” then we use a two-tailed test.
(ii) Selecting a significance level: The hypotheses are tested on a
pre-determined level of significance, which should therefore be specified.
Generally, in practice, either the 5% level or the 1% level is adopted for the
purpose. The factors that affect the level of significance are: (a) the
magnitude of the difference between sample means; (b) the size of the samples;
(c) the variability of measurements within samples; and (d) whether the
hypothesis is directional or non-directional (a directional hypothesis is one
which predicts the direction of the difference between, say, means). In brief,
the level of significance must be adequate in the context of the purpose and
nature of enquiry.
(iii) Deciding the distribution to use: After deciding the level of significance,
the next step in hypothesis testing is to determine the appropriate sampling
distribution. The choice generally remains between normal distribution and
the t-distribution. The rules for selecting the correct distribution are similar to
those which we have stated earlier in the context of estimation.
(iv) Selecting a random sample and computing an appropriate value:
Another step is to select a random sample(s) and compute an appropriate
value from the sample data concerning the test statistic utilizing the relevant
distribution. In other words, draw a sample to furnish empirical data.
(v) Calculation of the probability: One has then to calculate the probability
that the sample result would diverge as widely as it has from expectations, if
the null hypothesis were in fact true.
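
Steps (i) to (v) can be traced on the aptitude-test example (national mean 80, sample of 100 students with mean 75). The text does not give a standard deviation, so the value sigma = 20 below is purely an assumption made so that the arithmetic can be shown; with a different value the conclusion could differ. The sketch also assumes a normal (z) test is appropriate.

```python
import math
from statistics import NormalDist

# (i) Hypotheses: H0: mu = 80 against Ha: mu != 80 (two-tailed).
mu0 = 80.0
sample_mean = 75.0
n = 100
sigma = 20.0  # ASSUMPTION: not given in the text, chosen only for illustration

# (ii) Significance level.
alpha = 0.05

# (iii) Distribution: normal, treating sigma as known and n as large.
# (iv) Test statistic computed from the sample.
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu0) / standard_error

# (v) Probability of a result at least this extreme if H0 were true.
p_value = 2 * NormalDist().cdf(-abs(z))

decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}, decision at {alpha:.0%}: {decision}")
```

Under the assumed sigma, the standard error is 2, z is -2.5 and the two-sided p-value is about 0.012, so H0 would be rejected at the 5% level but not at the 1% level.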

TYPES OF HYPOTHESIS
1. Working Hypothesis
Working hypothesis is a preliminary assumption of the researcher about the
research topic, particularly when sufficient information is not available to
establish a hypothesis, and as a step towards formulating the final research
hypothesis. Working hypotheses are used to design the final research plan, to
place the research problem in its right context and to reduce the research
topic to an acceptable size.
2. Scientific Hypothesis
Scientific hypothesis contains statement based on or derived from sufficient
theoretical and empirical data.
3. Alternative Hypothesis
An alternative hypothesis is paired with the null hypothesis and states the
opposite of the null hypothesis. In statistical tests of the null hypothesis,
acceptance of H0 (null hypothesis) means rejection of the alternative
hypothesis, and rejection of H0 similarly means acceptance of the alternative
hypothesis.
4. Research Hypothesis
Research hypothesis is a researcher’s proposition about some social fact
without reference to its particular attributes. Researcher believes that it is
true and wants that it should be disproved, e.g., Muslims have more children
than Hindus, or drug abuse is found among upper-class students living in
hostels or rented rooms. Research hypothesis may be derived from theories
or may result in developing of theories.
5. Null Hypothesis
Null hypothesis is the reverse of the research hypothesis. It is a hypothesis
of no relationship. Null hypotheses do not exist in reality but are used to
test research hypotheses.
6. Statistical Hypothesis
Statistical hypothesis, according to Winter, is a statement/observation about
statistical populations that one seeks to support or refute. The things are
reduced to numerical quantities and decisions are made about these
quantities, e.g., income difference between two groups: group A is richer than
group B. Null hypothesis will be: group A is not richer than group B. Here,
variables are reduced to measurable quantities.
BASIC CONCEPTS CONCERNING TESTING OF HYPOTHESES

(a) Null hypothesis and alternative hypothesis: In the context of
statistical analysis, we often talk about null hypothesis and alternative
hypothesis. If we are to compare method A with method B about its
superiority and if we proceed on the assumption that both methods are
equally good, then this assumption is termed as the null hypothesis. As against
this, if we think that method A is superior or method B is inferior,
we are then stating what is termed as the alternative hypothesis. The null
hypothesis is generally symbolized as H0 and the alternative hypothesis as Ha.

(b) The level of significance: This is a very important concept in the
context of hypothesis testing. It is always some percentage (usually 5%)
which should be chosen with great care, thought and reason. In case we take
the significance level at 5 per cent, then this implies that H0 will be rejected
when the sampling result (i.e., observed evidence) has a less than 0.05
probability of occurring if H0 is true. In other words, the 5 per cent level of
significance means that the researcher is willing to take as much as a 5 per
cent risk of rejecting the null hypothesis when it (H0) happens to be true.
Thus the significance level is the maximum value of the probability of
rejecting H0 when it is true and is usually determined in advance before
testing the hypothesis.

(c) Decision rule or test of hypothesis: Given a hypothesis H0 and an
alternative hypothesis Ha, we make a rule, known as the decision rule,
according to which we accept H0 (i.e., reject Ha) or reject H0 (i.e., accept Ha).
For instance, if H0 is that a certain lot is good (there are very few defective
items in it) against Ha that the lot is not good (there are too many defective
items in it), then we must decide the number of items to be tested and the
criterion for accepting or rejecting the hypothesis. We might test 10 items in
the lot and plan our decision saying that if there are none or only 1 defective
item among the 10, we will accept H0; otherwise we will reject H0 (or accept
Ha). This sort of basis is known as a decision rule.

(d) Type I and Type II errors: In the context of testing of hypotheses, there
are basically two types of errors we can make. We may reject H0 when H0 is
true and we may accept H0 when in fact H0 is not true. The former is known
as Type I error and the latter as Type II error. In other words, Type I error
means rejection of a hypothesis which should have been accepted and Type II
error means accepting a hypothesis which should have been rejected. Type I
error is denoted by α (alpha), known as α error, also called the level of
significance of the test; and Type II error is denoted by β (beta), known as β
error.
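
For the lot-inspection rule described in (c), test 10 items and accept H0 if at most 1 is defective, the two error probabilities can be computed once defect rates are specified. The text gives no rates, so the values below (5% defective for a good lot, 30% for a bad lot) are assumptions made only for illustration, and the sketch treats the items as independent draws (binomial model).

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k defectives among n independent items."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def prob_accept(n, max_defectives, p):
    """Probability that the decision rule accepts H0 when the defect rate is p."""
    return sum(binomial_pmf(k, n, p) for k in range(max_defectives + 1))


N_TESTED = 10
MAX_DEFECTIVES = 1
P_GOOD = 0.05  # ASSUMPTION: defect rate when H0 (lot is good) is true
P_BAD = 0.30   # ASSUMPTION: defect rate when Ha (lot is not good) is true

alpha = 1 - prob_accept(N_TESTED, MAX_DEFECTIVES, P_GOOD)  # reject a good lot
beta = prob_accept(N_TESTED, MAX_DEFECTIVES, P_BAD)        # accept a bad lot

print(f"alpha (Type I)  = {alpha:.3f}")  # alpha (Type I)  = 0.086
print(f"beta  (Type II) = {beta:.3f}")   # beta  (Type II) = 0.149
```

Making the rule stricter (for example, accepting only when there are no defectives) would lower β but raise α, which is the usual trade-off between the two errors for a fixed sample size.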

(e) Two-tailed and One-tailed tests: In the context of hypothesis testing,
these two terms are quite important and must be clearly understood. A
two-tailed test rejects the null hypothesis if, say, the sample mean is
significantly higher or lower than the hypothesized value of the mean of the
population. Such a test is appropriate when the null hypothesis is some
specified value and the alternative hypothesis is a value not equal to the
specified value of the null hypothesis.
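
The difference between the two kinds of test shows up in the critical value used. The sketch below assumes a z test (standard normal test statistic); the function names are illustrative. A two-tailed test splits α between both tails, while a one-tailed test puts all of α in one tail.

```python
from statistics import NormalDist


def critical_values(alpha=0.05):
    """Return (two-tailed, one-tailed) critical z values for the given alpha."""
    std_normal = NormalDist()
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    one_tailed = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    return two_tailed, one_tailed


def reject_two_tailed(z, alpha=0.05):
    """Reject H0 when z is significantly higher OR lower than expected."""
    two_tailed, _ = critical_values(alpha)
    return abs(z) > two_tailed


two, one = critical_values(0.05)
print(f"two-tailed critical z = ±{two:.3f}")  # ±1.960
print(f"one-tailed critical z = {one:.3f}")   # 1.645
print(reject_two_tailed(-2.5), reject_two_tailed(1.8))  # True False
```

A statistic of z = 1.8 is therefore not significant in a two-tailed test at the 5% level, although it would exceed the one-tailed critical value of about 1.645.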

FLOW DIAGRAM FOR HYPOTHESIS TESTING
