
Unit 4 RM Bba


RESEARCH METHODOLOGY BBA-2023

Unit 4 Data Processing and Analysis

Dr. Saileja Mohanty


Assistant Professor
School of Business

MIT World Peace University, Pune

Email id- saileja.mohanty@mitwpu.edu.in


Outline of the presentation

Syllabus of Unit 4 Data processing and analysis


➢ Editing
➢ Coding

➢ Classification

➢ Tabulation

➢ Scaling and Measurement


Data processing and analysis

◼ The data, after collection, has to be processed and analysed in accordance with the
outline laid down for the purpose at the time of developing the research plan.
◼ This is essential for a scientific study and for ensuring that we have all relevant
data for making contemplated comparisons and analysis.
◼ Technically speaking, processing implies editing, coding, classification and
tabulation of collected data so that they are amenable to analysis.
◼ The term analysis refers to the computation of certain measures along with
searching for patterns of relationship that exist among data-groups.
◼ Thus, “in the process of analysis, relationships or differences supporting or
conflicting with original or new hypotheses should be subjected to statistical tests
of significance to determine with what validity data can be said to indicate any
conclusions”.
PROCESSING OPERATIONS

❖ 1. Editing: Editing of data is a process of examining the collected raw data
(especially in surveys) to detect errors and omissions and to correct these when
possible.
◼ As a matter of fact, editing involves a careful scrutiny of the completed
questionnaires and/or schedules.
◼ Editing is done to assure that the data are accurate, consistent with other facts
gathered, uniformly entered, as complete as possible and well arranged
to facilitate coding and tabulation.
◼ With regard to points or stages at which editing should be done, one can talk of
field editing and central editing.
◼ Field editing consists in the review of the reporting forms by the investigator for
completing (translating or rewriting) what the latter has written in abbreviated
and/or in illegible form at the time of recording the respondents’ responses.
◼ Central editing should take place when all forms or schedules have been completed
and returned to the office.
◼ This type of editing implies that all forms should get a thorough editing by a single
editor in a small study and by a team of editors in case of a large inquiry.
◼ Editor(s) may correct the obvious errors such as an entry in the wrong place, entry
recorded in months when it should have been recorded in weeks, and the like.
◼ In case of inappropriate or missing replies, the editor can sometimes determine the
proper answer by reviewing the other information in the schedule.
◼ The editor must strike out the answer if the same is inappropriate and he has no
basis for determining the correct answer or the response.
◼ In such a case an editing entry of ‘no answer’ is called for. All the wrong replies,
which are quite obvious, must be dropped from the final results, especially in the
context of mail surveys.
◼ Editors must keep in view several points while performing their work:
◼ (a) They should be familiar with instructions given to the interviewers and coders
as well as with the editing instructions supplied to them for the purpose.
◼ (b) While crossing out an original entry for one reason or another, they should just
draw a single line on it so that the same may remain legible.
◼ (c) They must make entries (if any) on the form in some distinctive colour and that
too in a standardised form.
◼ (d) They should initial all answers which they change or supply.
◼ (e) Editor’s initials and the date of editing should be placed on each completed
form or schedule.
◼ 2. Coding: Coding refers to the process of assigning numerals or other symbols to
answers so that responses can be put into a limited number of categories or classes.
◼ Such classes should be appropriate to the research problem under consideration.
◼ They must also possess the characteristic of exhaustiveness (i.e., there must be a
class for every data item) and also that of mutual exclusivity, which means that a
specific answer can be placed in one and only one cell in a given category set.
◼ Another rule to be observed is that of unidimensionality by which is meant that
every class is defined in terms of only one concept.
◼ Coding is necessary for efficient analysis and through it the several replies may be
reduced to a small number of classes which contain the critical information
required for analysis.
◼ Coding decisions should usually be taken at the designing stage of the
questionnaire.
◼ This makes it possible to pre-code the questionnaire choices, which in turn is
helpful for computer tabulation as one can key-punch directly from the
original questionnaires.
◼ But in case of hand coding some standard method may be used. One such standard
method is to code in the margin with a coloured pencil.
◼ The other method can be to transcribe the data from the questionnaire to a coding
sheet.
◼ Whatever method is adopted, one should see that coding errors are altogether
eliminated or reduced to the minimum level.
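The coding rules above can be illustrated with a minimal sketch. It assumes a hypothetical code book for a marital-status question (the same 1 to 4 numbering used later in these slides); the residual "no answer" code 9 and the function name are illustrative assumptions, not part of the syllabus.

```python
# Hypothetical code book for one closed question; the residual code 9
# is an illustrative assumption.
CODE_BOOK = {
    "single": 1,
    "married": 2,
    "widowed": 3,
    "divorced": 4,
}
NO_ANSWER = 9  # residual class keeps the scheme exhaustive


def code_response(raw):
    """Return the numeric code for one raw answer."""
    if raw is None:
        return NO_ANSWER
    key = raw.strip().lower()
    # A dictionary lookup yields exactly one code per answer (mutual
    # exclusivity); anything unrecognised falls into the residual
    # class (exhaustiveness).
    return CODE_BOOK.get(key, NO_ANSWER)


responses = ["Married", " single", "divorced", "", None, "MARRIED"]
codes = [code_response(r) for r in responses]
print(codes)
```

Each class here is defined by a single concept (marital status only), which is what the unidimensionality rule asks for.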
◼ 3. Classification: Most research studies result in a large volume of raw data which
must be reduced into homogeneous groups if we are to get meaningful
relationships.
◼ This fact necessitates classification of data which happens to be the process of
arranging data in groups or classes on the basis of common characteristics.
◼ Data having a common characteristic are placed in one class and in this way the
entire data get divided into a number of groups or classes.
◼ Classification can be one of the following two types, depending upon the nature of
the phenomenon involved:
◼ (a) Classification according to attributes: As stated above, data are classified on the
basis of common characteristics which can either be descriptive (such as literacy,
sex, honesty, etc.) or numerical (such as weight, height, income, etc.).
◼ Descriptive characteristics refer to qualitative phenomena which cannot be
measured quantitatively; only their presence or absence in an individual item can
be noticed.
◼ Data obtained this way on the basis of certain attributes are known as statistics of
attributes and their classification is said to be classification according to attributes.
◼ (b) Classification according to class-intervals: Unlike descriptive characteristics,
the numerical characteristics refer to quantitative phenomena which can be
measured through some statistical units.
◼ Data relating to income, production, age, weight, etc. come under this category.
Such data are known as statistics of variables and are classified on the basis of
class intervals.
◼ For instance, persons whose incomes, say, are within Rs 201 to Rs 400 can form
one group, those whose incomes are within Rs 401 to Rs 600 can form another
group and so on.
◼ In this way the entire data may be divided into a number of groups or classes or what
are usually called ‘class-intervals’. Each group or class-interval, thus, has an upper
limit as well as a lower limit which are known as class limits.
◼ The difference between the two class limits is known as class magnitude. We may
have classes with equal class magnitudes or with unequal class magnitudes. The
number of items which fall in a given class is known as the frequency of the given
class.
◼ All the classes or groups, with their respective frequencies taken together and put in the
form of a table, are described as group frequency distribution or simply frequency
distribution.
◼ Classification according to class intervals usually involves the following three main
problems:
◼ (i) How many classes should there be? What should be their magnitudes? There can
be no specific answer with regard to the number of classes.
◼ The decision about this calls for skill and experience of the researcher. However,
the objective should be to display the data in such a way as to make it meaningful
for the analyst.
◼ Typically, we may have 5 to 15 classes. With regard to the second part of the
question, we can say that, to the extent possible, class-intervals should be of equal
magnitudes, but in some cases unequal magnitudes may result in better
classification.
◼ Hence the researcher’s objective judgement plays an important part in this
connection.
◼ Some statisticians adopt the following formula, suggested by H.A. Sturges, for
determining the size of class interval:
◼ i = R/(1 + 3.3 log N)
◼ where i = size of class interval; R = Range (i.e., difference between the values of
the largest item and smallest item among the given items); N = Number of items to
be grouped.
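As a rough illustration of the formula above, the sketch below computes i for a made-up list of values. It assumes that "log" means the common (base-10) logarithm, which is the usual reading of Sturges' rule; the function name and the sample data are hypothetical.

```python
import math


def sturges_class_interval(values):
    """Size of class interval i = R / (1 + 3.3 log N), with log to base 10."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two items to form a range")
    data_range = max(values) - min(values)  # R: largest item minus smallest item
    return data_range / (1 + 3.3 * math.log10(n))


# Illustrative data: 100 items running from 0 to 990 in steps of 10.
sample = list(range(0, 1000, 10))
i = sturges_class_interval(sample)
print(round(i, 2))
```

In practice the result would be rounded to a convenient figure before the class limits are fixed.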
◼ (ii) How to choose class limits? While choosing class limits, the researcher must
take into consideration the criterion that the mid-point (generally worked out by
taking the sum of the upper limit and lower limit of a class and then dividing this
sum by 2) of a class-interval and the actual average of items of that class interval
should remain as close to each other as possible.
◼ Consistent with this, the class limits should be located at multiples of 2, 5, 10, 20,
100 and such other figures. Class limits may generally be stated in any of the
following forms:
◼ Exclusive type class intervals: They are usually stated as follows:
◼ 10–20

◼ 20–30

◼ 30–40

◼ 40–50

◼ The above intervals should be read as under:

◼ 10 and under 20

◼ 20 and under 30

◼ 30 and under 40

◼ 40 and under 50

◼ Thus, under the exclusive type class intervals, the items whose values are equal to the
upper limit of a class are grouped in the next higher class.
◼ For example, an item whose value is exactly 30 would be put in 30–40 class interval and
not in 20–30 class interval.
◼ In simple words, we can say that under exclusive type class intervals, the upper limit of a
class interval is excluded and items with values less than the upper limit (but not less than
the lower limit) are put in the given class interval.
◼ Inclusive type class intervals: They are usually stated as follows:

◼ 11–20
◼ 21–30
◼ 31–40
◼ 41–50

◼ In inclusive type class intervals the upper limit of a class interval is also included in the
class interval concerned. Thus, an item whose value is 20 will be put in 11–20 class
interval. The stated upper limit of the class interval 11–20 is 20 but the real limit is
20.99999 and as such 11–20 class interval really means 11 and under 21.
◼ When the phenomenon under consideration happens to be a discrete one (i.e., can be
measured and stated only in integers), then we should adopt inclusive type classification.
◼ But when the phenomenon happens to be a continuous one capable of being measured in
fractions as well, we can use exclusive type class intervals.
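The difference between the two conventions can be sketched in a few lines. The function names, the default lower limits (10 and 11) and the class width of 10 are assumptions chosen to match the example intervals above.

```python
def exclusive_class(value, lower=10, width=10):
    """Class 'a and under b': an item equal to the upper limit goes to the next class."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width)


def inclusive_class(value, lower=11, width=10):
    """Class 'a to b' for integer data: both stated limits belong to the class."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width - 1)


print(exclusive_class(30))    # falls in 30-40, not 20-30
print(exclusive_class(29.5))  # a fractional value still falls in 20-30
print(inclusive_class(20))    # falls in 11-20
print(inclusive_class(21))    # falls in 21-30
```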
◼ (iii) How to determine the frequency of each class? This can be done either by tally
sheets or by mechanical aids.
◼ Under the technique of tally sheet, the class-groups are written on a sheet of paper
(commonly known as the tally sheet) and for each item a stroke (usually a small
vertical line) is marked against the class group in which it falls.
◼ The general practice is that after every four small vertical lines in a class group,
the fifth line for the item falling in the same group is drawn as a horizontal line
through the said four lines, and the resulting flower (IIII) represents five items.
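A tally sheet can be imitated in code as follows. This is only a sketch: the data values are invented, the classes are the exclusive type intervals 10–20 to 40–50 used earlier, and the text rendering of a completed group of five as `||||/` is an assumption made for display.

```python
from collections import Counter


def frequency_distribution(values, lower, width, n_classes):
    """Count items per exclusive class interval [lower + k*width, lower + (k+1)*width)."""
    counts = Counter()
    for v in values:
        index = int((v - lower) // width)
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} falls outside the class limits")
        counts[index] += 1
    return [(lower + k * width, lower + (k + 1) * width, counts[k])
            for k in range(n_classes)]


def tally_marks(count):
    """Render a frequency as strokes, with every fifth stroke closing a group."""
    groups = ["||||/"] * (count // 5)
    remainder = count % 5
    if remainder:
        groups.append("|" * remainder)
    return " ".join(groups)


# Illustrative raw data (not from the slides).
data = [12, 15, 20, 22, 25, 29, 30, 31, 38, 40, 41, 45, 49, 18, 27, 33]
table = frequency_distribution(data, lower=10, width=10, n_classes=4)
for low, high, freq in table:
    print(f"{low} and under {high}: {tally_marks(freq)} ({freq})")
```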
4. Tabulation:
◼ When a mass of data has been assembled, it becomes necessary for the researcher to arrange the
same in some kind of concise and logical order. This procedure is referred to as tabulation.
◼ Thus, tabulation is the process of summarising raw data and displaying the same in compact form
(i.e., in the form of statistical tables) for further analysis.
◼ In a broader sense, tabulation is an orderly arrangement of data in columns and rows.
◼ Tabulation is essential because of the following reasons.
◼ 1. It conserves space and reduces explanatory and descriptive statement to a minimum.
◼ 2. It facilitates the process of comparison.
◼ 3. It facilitates the summation of items and the detection of errors and omissions.
◼ 4. It provides a basis for various statistical computations.
◼ Tabulation can be done by hand or by mechanical or electronic devices.
◼ The choice depends on the size and type of study, cost considerations, time pressures and the
availability of tabulating machines or computers.
◼ In relatively large inquiries, we may use mechanical or computer tabulation if other factors are
favourable and necessary facilities are available.
◼ Hand tabulation is usually preferred in case of small inquiries where the number of questionnaires
is small and they are of relatively short length.
◼ Hand tabulation may be done using the direct tally, the list and tally or the card sort and count
methods.
◼ Tabulation may also be classified as simple and complex tabulation.
◼ The former type of tabulation gives information about one or more groups of independent
questions, whereas the latter type of tabulation shows the division of data in two or more categories
and as such is designed to give information concerning one or more sets of inter-related questions.
◼ Simple tabulation generally results in one-way tables which supply answers to questions about one
characteristic of data only.
◼ As against this, complex tabulation usually results in two-way tables (which give information about
two inter-related characteristics of data), three-way tables (giving information about three
interrelated characteristics of data) or still higher order tables, also known as manifold tables, which
supply information about several interrelated characteristics of data.
◼ Two-way tables, three-way tables or manifold tables are all examples of what is sometimes
described as cross tabulation.
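The contrast between a one-way table and a two-way (cross) table can be sketched as below. The records are hypothetical coded answers, and the helper names are assumptions introduced only for this illustration.

```python
from collections import Counter

# Hypothetical records: (sex, marital status) for six respondents.
records = [
    ("male", "single"), ("female", "married"), ("male", "married"),
    ("female", "single"), ("female", "married"), ("male", "single"),
]


def one_way(rows, position):
    """Simple tabulation: frequencies for a single characteristic."""
    return Counter(row[position] for row in rows)


def two_way(rows):
    """Cross tabulation: frequencies for each pair of characteristics."""
    return Counter(rows)


sex_counts = one_way(records, 0)
cross = two_way(records)

row_labels = sorted({r[0] for r in records})
col_labels = sorted({r[1] for r in records})

# Row totals go in the extreme right column and column totals at the bottom.
print("sex".ljust(8) + "".join(c.rjust(9) for c in col_labels) + "Total".rjust(9))
for r in row_labels:
    cells = [cross[(r, c)] for c in col_labels]
    print(r.ljust(8) + "".join(str(x).rjust(9) for x in cells) + str(sum(cells)).rjust(9))
col_totals = [sum(cross[(r, c)] for r in row_labels) for c in col_labels]
print("Total".ljust(8) + "".join(str(x).rjust(9) for x in col_totals)
      + str(sum(col_totals)).rjust(9))
```

The one-way counts answer a question about one characteristic only, while the cross table shows how the two characteristics vary together.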
◼ Generally accepted principles of tabulation:
◼ 1. Every table should have a clear, concise and adequate title so as to make the table intelligible
without reference to the text and this title should always be placed just above the body of the table.
◼ 2. Every table should be given a distinct number to facilitate easy reference.
◼ 3. The column headings (captions) and the row headings of the table should be clear and brief.
◼ 4. The units of measurement under each heading or sub-heading must always be indicated.
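◼ 5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the table.
◼ 6. The source or sources from which the data in the table have been obtained must be indicated just
below the table.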
◼ 7. Usually the columns are separated from one another by lines which make the table more readable and
attractive.
◼ 8. There should be thick lines to separate the data under one class from the data under another class and the
lines separating the sub-divisions of the classes should be comparatively thin lines.
◼ 9. The columns may be numbered to facilitate reference.
◼ 10. Those columns whose data are to be compared should be kept side by side.
◼ 11. It is generally considered better to approximate figures before tabulation as the same would reduce
unnecessary details in the table itself.
◼ 12. In order to emphasise the relative significance of certain categories, different kinds of type, spacing and
indentations may be used.
◼ 13. It is important that all column figures be properly aligned. Decimal points and (+) or (–) signs should be in
perfect alignment.
◼ 14. Abbreviations should be avoided to the extent possible and ditto marks should not be used in the table.
◼ 15. Miscellaneous and exceptional items, if any, should usually be placed in the last row of the table.
◼ 16. Table should be made as logical, clear, accurate and simple as possible.
◼ 17. Total of rows should normally be placed in the extreme right column and that of columns should be placed
at the bottom.
◼ 18. The arrangement of the categories in a table may be chronological, geographical, alphabetical or according
to magnitude to facilitate comparison.
Scaling and Measurement
◼ Technically speaking, measurement is a process of mapping aspects of a domain onto other aspects of a
range according to some rule of correspondence.
◼ In measuring, we devise some form of scale in the range (in terms of set theory, range may refer to some
set) and then transform or map the properties of objects from the domain (in terms of set theory, domain
may refer to some other set) onto this scale.

For example, in case we are to find the male to female attendance ratio while conducting a study of persons who attend some show,
then we may tabulate those who come to the show according to sex. In terms of set theory, this process is one of mapping the
observed physical properties of those coming to the show (the domain) on to a sex classification (the range).

The rule of correspondence is: If the object in the domain appears to be male, assign it “0” and if female assign it “1”.

Similarly, we can record a person’s marital status as 1, 2, 3 or 4, depending on whether the person is single, married, widowed or
divorced.

We can as well record “Yes or No” answers to a question as “0” and “1” (or as 1 and 2 or perhaps as 59 and 60).
◼ Nominal data are numerical in name only, because they do not share any of the properties of the numbers we
deal with in ordinary arithmetic. For instance, if we record marital status as 1, 2, 3, or 4 as stated above, we cannot
write 4 > 2 or 3 < 4 and we cannot write 3 – 1 = 4 – 2, 1 + 3 = 4 or 4 ÷ 2 = 2.
◼ In those situations when we cannot do anything except set up inequalities, we refer to the data as ordinal data.
◼ For instance, if one mineral can scratch another, it receives a higher hardness number and on Mohs’ scale the
numbers from 1 to 10 are assigned respectively to talc, gypsum, calcite, fluorite, apatite, feldspar, quartz,
topaz, sapphire and diamond.
◼ With these numbers we can write 5 > 2 or 6 < 9 as apatite is harder than gypsum and feldspar is softer than
sapphire.
◼ However, we cannot write, for example, 10 – 9 = 5 – 4, because the difference in hardness between diamond and
sapphire is actually much greater than that between apatite and fluorite.
◼ When in addition to setting up inequalities we can also form differences, we refer to the data as interval data.
Suppose we are given the following temperature readings (in degrees Fahrenheit):
◼ 58°, 63°, 70°, 95°, 110°, 126° and 135°. In this case, we can write 110° > 70° or 95° < 135° which simply
means that 110° is warmer than 70° and that 95° is cooler than 135°.
◼ We can also write for example 95° – 70° = 135° – 110°, since equal temperature differences are equal in the
sense that the same amount of heat is required to raise the temperature of an object from 70° to 95° or from
110° to 135°.
MEASUREMENT SCALES
◼ From what has been stated above, we can write that scales of measurement can be
considered in terms of their mathematical properties. The most widely used
classification of measurement scales are: (a) nominal scale; (b) ordinal scale; (c)
interval scale; and (d) ratio scale.
◼ (a) Nominal scale: Nominal scale is simply a system of assigning number symbols to
events in order to label them. The usual example of this is the assignment of numbers to
basketball players in order to identify them. Such numbers cannot be considered to be
associated with an ordered scale for their order is of no consequence; the numbers are
just convenient labels for the particular class of events and as such have no quantitative
value.
◼ (b) Ordinal scale: The lowest level of the ordered scale that is commonly used is the
ordinal scale. The ordinal scale places events in order, but there is no attempt to make
the intervals of the scale equal in terms of some rule. Rank orders represent ordinal
scales and are frequently used in research relating to qualitative phenomena.
◼ A student’s rank in his graduation class involves the use of an ordinal scale. One
has to be very careful in making statements about scores based on ordinal scales.
For instance, if Ram’s position in his class is 10 and Mohan’s position is 40, it
cannot be said that Ram’s position is four times as good as that of Mohan. The
statement would make no sense at all.
◼ (c) Interval scale: In the case of interval scale, the intervals are adjusted in terms of
some rule that has been established as a basis for making the units equal. The units
are equal only in so far as one accepts the assumptions on which the rule is based.
Interval scales can have an arbitrary zero, but it is not possible to determine for
them what may be called an absolute zero or the unique origin.
◼ (d) Ratio scale: Ratio scales have an absolute or true zero of measurement. The
term ‘absolute zero’ is not as precise as it was once believed to be. We can
conceive of an absolute zero of length and similarly we can conceive of an absolute
zero of time. For example, the zero point on a centimeter scale indicates the
complete absence of length or height.
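The practical difference between an interval scale and a ratio scale can be checked with a short calculation. The sketch below reuses the Fahrenheit readings from the earlier example; the conversion helpers and the chosen lengths are illustrative assumptions. It shows that equal differences stay equal when an interval scale is re-expressed with another arbitrary zero (Celsius), whereas ratios do not survive, while ratios on a true-zero scale such as length survive a change of unit.

```python
import math


def fahrenheit_to_celsius(f):
    """Re-express a temperature on another interval scale with a different zero."""
    return (f - 32) * 5 / 9


def cm_to_inches(cm):
    """Re-express a length in another unit; zero length stays zero."""
    return cm / 2.54


# Interval scale: equal differences remain equal after conversion.
diff_low = fahrenheit_to_celsius(95) - fahrenheit_to_celsius(70)
diff_high = fahrenheit_to_celsius(135) - fahrenheit_to_celsius(110)
print(math.isclose(diff_low, diff_high))

# Interval scale: a ratio is not meaningful, because it changes with the zero point.
ratio_f = 110 / 55
ratio_c = fahrenheit_to_celsius(110) / fahrenheit_to_celsius(55)
print(math.isclose(ratio_f, ratio_c))

# Ratio scale: a ratio is unchanged by a change of unit.
ratio_cm = 10 / 5
ratio_in = cm_to_inches(10) / cm_to_inches(5)
print(math.isclose(ratio_cm, ratio_in))
```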
Likert-type Scales
◼ Summated Scales (or Likert-type Scales) are developed by utilizing the item analysis
approach wherein a particular item is evaluated on the basis of how well it discriminates
between those persons whose total score is high and those whose score is low. Those
items or statements that best meet this sort of discrimination test are included in the final
instrument.
◼ Most frequently used summated scales in the study of social attitudes follow the pattern
devised by Likert. For this reason they are often referred to as Likert-type scales.
◼ In a Likert scale, the respondent is asked to respond to each of the statements in terms of
several degrees, usually five degrees (but at times 3 or 7 may also be used) of agreement
or disagreement.
◼ For example, when asked to express opinion whether one considers his job quite
pleasant, the respondent may respond in any one of the following ways: (i) strongly
agree, (ii) agree, (iii) undecided, (iv) disagree, (v) strongly disagree.
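A summated score and a crude item analysis can be sketched as follows. Several things here are assumptions rather than content from the slides: the 5-to-1 scoring of the five responses, the made-up answers of four respondents to three statements, and the use of a simple difference of means between the higher-scoring and lower-scoring halves as the discrimination measure (a fuller item analysis would normally use a proper statistical test).

```python
# Assumed scoring: 5 = strongly agree ... 1 = strongly disagree.
SCORES = {
    "strongly agree": 5,
    "agree": 4,
    "undecided": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

# Hypothetical answers of four respondents to three statements.
respondents = [
    ["strongly agree", "agree", "agree"],
    ["agree", "agree", "undecided"],
    ["disagree", "agree", "disagree"],
    ["strongly disagree", "agree", "disagree"],
]

scored = [[SCORES[answer] for answer in row] for row in respondents]
totals = [sum(row) for row in scored]  # summated score per respondent


def item_discrimination(scored_rows, total_scores, item):
    """Mean item score of the high-total half minus that of the low-total half."""
    order = sorted(range(len(scored_rows)), key=lambda i: total_scores[i], reverse=True)
    half = len(order) // 2
    high, low = order[:half], order[-half:]

    def mean_for(indices):
        return sum(scored_rows[i][item] for i in indices) / len(indices)

    return mean_for(high) - mean_for(low)


discrimination = [item_discrimination(scored, totals, k) for k in range(3)]
print(totals)
print(discrimination)
```

In this invented data the second statement receives the same answer from everyone, so it does not separate high scorers from low scorers and would be a candidate for removal from the final instrument.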
Thank You
