
UNIT IV: Analysis of data, editing, processing, consolidation and tabulation, application of techniques, scaling techniques:


Editing:
The editing of data is the process of examining the raw data to detect errors and omissions and to correct them, where possible, so as to ensure legibility, completeness, consistency and accuracy.
The recorded data must be legible so that it can be coded later. An illegible response may be corrected by getting in touch with the people who recorded it, or alternatively it may be inferred from other parts of the questionnaire. Completeness requires that all the items in the questionnaire be fully answered. If some questions are not answered, the interviewer may be contacted to find out whether he failed to record the response or the respondent refused to answer the question. In the case of the former, it is quite likely that the interviewer will not remember the answer. In such a case the respondent may be contacted again, or alternatively this particular piece of data may be treated as missing data.
It is also very important to check whether or not the respondent is consistent in answering the questions. For example, a respondent who claims to make purchases by credit card may turn out not to have one.
Inaccuracy in survey data may be due to interviewer bias or cheating. One way of spotting this is to look for a common pattern of responses in the instruments of a particular interviewer. Apart from ensuring quality data, editing also facilitates the coding and tabulation of data. In fact, editing involves a careful scrutiny of the completed questionnaires.
The editing can be done at two stages: 1. Field Editing, and 2. Central Editing.
Field Editing: Field editing consists of a review of the reporting forms by the investigator, completing or translating what was written in abbreviated form at the time of interviewing the respondent. This form of editing is necessary because handwriting varies from individual to individual and is sometimes difficult for the tabulator to understand. This sort of editing should be done as soon as possible after the interview, while the material is still fresh in memory. While doing so, care should be taken that the investigator does not correct errors of omission by simply guessing what the respondent would have answered if the question had been put to him.
Central Editing: Central editing should be carried out when all the forms or schedules have been completed and returned to the headquarters. This type of editing requires that all the forms be thoroughly edited by a single person (editor) in a small field study, or by a small group of persons in the case of a large field study. The editor may correct obvious errors, such as an entry in the wrong place, or an entry recorded in daily terms when it should have been recorded in weeks/months, etc. Sometimes inappropriate or missing replies can also be determined by the editor by reviewing the other information recorded in the schedule. If necessary, the respondent may be contacted for clarification. All replies that are obviously incorrect must be struck out from the schedules. While editing, the editor should be familiar with the instructions and the codes given to the interviewers. Any new (corrected) entry made by the editor should be in some distinctive form and should be initialled by the editor. The date of editing may also be recorded on the schedule for future reference.
Coding:
Coding is the process of assigning symbols (alphabetical or numerical or both) to the answers so that the responses can be grouped into a limited number of classes or categories.
The classes should be appropriate to the research problem being studied. They must be exhaustive and mutually exclusive, so that each answer can be placed in one and only one cell of a given category. Further, every class must be defined in terms of only one concept.
Coding is necessary for the efficient analysis of data. Coding decisions should usually be taken at the design stage of the questionnaire itself, so that the likely responses to questions are pre-coded. This simplifies computer tabulation of the data for further analysis. It may be noted that errors in coding should be eliminated altogether, or at least reduced to the minimum possible level. Coding an open-ended question is more tedious than coding a closed-ended question. For a closed-ended or structured question, the coding scheme is very simple and is designed prior to the field work.
For example: What is your monthly income?
A) Less than Rs. 5000   B) Rs. 5000 – 8999   C) Rs. 9000 – 12999   D) Rs. 13000 or above
We may code the class 'less than Rs. 5000' as '1', 'Rs. 5000 – 8999' as '2', 'Rs. 9000 – 12999' as '3' and 'Rs. 13000 or above' as '4'.
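To see how such pre-coding might be applied during data entry, here is a minimal sketch in Python; the dictionary name, function name and sample responses are illustrative assumptions, not part of the source.

# Minimal sketch: pre-coding a closed-ended income question.
# The bracket labels and code values mirror the example above.
INCOME_CODES = {
    "Less than Rs. 5000": 1,
    "Rs. 5000 - 8999": 2,
    "Rs. 9000 - 12999": 3,
    "Rs. 13000 or above": 4,
}

def code_income(response):
    """Return the numeric code for a bracket label, or None for a missing/unknown reply."""
    return INCOME_CODES.get(response)

responses = ["Rs. 5000 - 8999", "Rs. 13000 or above", "Less than Rs. 5000"]
print([code_income(r) for r in responses])   # [2, 4, 1]

Because the categories are exhaustive and mutually exclusive, every valid response maps to exactly one code, which is what makes later tabulation straightforward.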
Tabulation:
The classification of data is made with reference to time or some other variables. Graphs are used as a visual form of presentation of data. Tabulation is used for the summarization and condensation of data. It aids in the analysis of relationships, trends and other summarization of the given data. Tabulation may be simple or complex. Simple tabulation results in one-way tables, which can be used to answer questions related to one characteristic of the data. Complex tabulation usually results in two-way tables, which give information about two interrelated characteristics of the data; three-way tables, which give information about three interrelated characteristics; and still higher-order tables, which supply information about several interrelated characteristics of the data. The following are the important characteristics of a table:
1. Every table should have a clear and concise title to make it understandable without
reference to the text.
2. This title should always be just above the body of the table.
3. Every table should be given a distinct number to facilitate easy reference.
4. Every table should have captions (column headings) and stubs (row headings), and they should be clear and brief. The units of measurement used must always be indicated.
5. Source or sources from where the data in the table have been obtained must be indicated
at the bottom of the table.
6. Explanatory footnotes, if any, concerning the table should be given beneath the table along with a reference symbol.
7. The columns in the tables may be numbered to facilitate reference. Abbreviations
should be used to the minimum possible extent.
8. The tables should be logical, clear, accurate and as simple as possible.
9. The arrangement of the data categories in a table may be chronological, geographical, alphabetical or according to magnitude, to facilitate comparison.
10. Finally, the table must suit the needs and requirements of the research study.

Source : https://egyankosh.ac.in/bitstream/123456789/10421/1/Unit-9.pdf
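As a rough illustration of simple (one-way) and complex (two-way) tabulation, the following sketch uses Python with pandas; the column names and survey values are invented for the example and are not from the source above.

import pandas as pd

# Invented survey responses for illustration.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
    "payment_mode": ["Cash", "Credit card", "Cash", "Cash", "Credit card", "Credit card"],
})

# Simple (one-way) tabulation: frequencies of a single characteristic.
one_way = df["gender"].value_counts()
print(one_way)

# Complex (two-way) tabulation: two interrelated characteristics cross-classified,
# with row and column totals added as margins.
two_way = pd.crosstab(df["gender"], df["payment_mode"], margins=True)
print(two_way)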
Application of Techniques :

What does a statistical test do?


Statistical tests work by calculating a test statistic – a number that describes how much the relationship between the variables in your test differs from the null hypothesis of no relationship.

The test then yields a p-value (probability value). The p-value estimates how likely it is that you would see a difference as large as the one described by the test statistic if the null hypothesis of no relationship were true.

If the value of the test statistic is more extreme than the critical value expected under the null hypothesis (equivalently, if the p-value falls below your chosen significance level), you can infer a statistically significant relationship between the predictor and outcome variables.

If the value of the test statistic is less extreme than that critical value, you cannot infer a statistically significant relationship between the predictor and outcome variables.
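For instance, a minimal sketch in Python using scipy (the two groups of scores are invented for illustration) computes a test statistic and p-value and applies the conventional 0.05 significance level:

from scipy import stats

# Invented scores for two independent groups.
group_a = [72, 75, 78, 80, 69, 74, 77]
group_b = [65, 70, 68, 72, 66, 71, 69]

# Test statistic and p-value under the null hypothesis of no difference in means.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Conventional decision rule: reject the null hypothesis if p < 0.05.
if p_value < 0.05:
    print("Statistically significant difference between the groups.")
else:
    print("No statistically significant difference detected.")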

When to perform a statistical test


You can perform statistical tests on data that have been collected in a statistically valid manner
– either through an experiment, or through observations made using probability sampling
methods.

For a statistical test to be valid, your sample size needs to be large enough to approximate the
true distribution of the population being studied.

To determine which statistical test to use, you need to know:

• whether your data meet certain assumptions, and
• the types of variables that you are dealing with.
Statistical assumptions
Statistical tests make some common assumptions about the data they are testing:

1. Independence of observations (a.k.a. no autocorrelation): the observations/variables you include in your test are not related (for example, multiple measurements of a single test subject are not independent, while measurements of multiple different test subjects are independent).
2. Homogeneity of variance: the variance within each group being compared is similar
among all groups. If one group has much more variation than others, it will limit the
test’s effectiveness.
3. Normality of data: the data follows a normal distribution (a.k.a. a bell curve). This
assumption applies only to quantitative data.

If your data do not meet the assumptions of normality or homogeneity of variance, you may be
able to perform a nonparametric statistical test, which allows you to make comparisons
without any assumptions about the data distribution.

If your data do not meet the assumption of independence of observations, you may be able to
use a test that accounts for structure in your data (repeated-measures tests or tests that include
blocking variables).
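As a rough sketch of how these assumptions might be checked in practice, the following Python/scipy example (with invented sample data) applies the Shapiro-Wilk test for normality and Levene's test for homogeneity of variance:

from scipy import stats

# Invented measurements for two groups.
group_a = [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.4]
group_b = [5.0, 4.8, 5.3, 4.9, 5.1, 4.7, 5.2]

# Normality: Shapiro-Wilk test (null hypothesis: the data come from a normal distribution).
for name, sample in [("A", group_a), ("B", group_b)]:
    stat, p = stats.shapiro(sample)
    print(f"Group {name}: Shapiro-Wilk p = {p:.3f}")

# Homogeneity of variance: Levene's test (null hypothesis: the group variances are equal).
stat, p = stats.levene(group_a, group_b)
print(f"Levene's test p = {p:.3f}")

# Large p-values give no evidence against the assumptions; small p-values
# suggest that a nonparametric alternative may be the safer choice.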

Types of variables
The types of variables you have usually determine what type of statistical test you can use.

Quantitative variables represent amounts of things (e.g. the number of trees in a forest). Types of quantitative variables include:

• Continuous (a.k.a. ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
• Discrete (a.k.a. integer variables): represent counts and usually can't be divided into units smaller than one (e.g. 1 tree).

Categorical variables represent groupings of things (e.g. the different tree species in a forest). Types of categorical variables include:

• Ordinal: represent data with an order (e.g. rankings).
• Nominal: represent group names (e.g. brands or species names).
• Binary: represent data with a yes/no or 1/0 outcome (e.g. win or lose).

Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment, these are the independent and dependent variables). The table and sections below outline the most commonly used parametric and nonparametric tests.

Difference between Parametric Test and Non-Parametric Test:

Properties                  Parametric              Non-parametric
Assumptions                 Yes                     No
Central tendency value      Mean                    Median
Correlation                 Pearson                 Spearman
Probability distribution    Normal                  Arbitrary
Population knowledge        Required                Not required
Used for                    Interval data           Nominal data
Applicability               Variables               Attributes & variables
Examples                    z-test, t-test, etc.    Kruskal-Wallis, Mann-Whitney, etc.

Choosing a parametric test:


Parametric tests usually have stricter requirements than nonparametric tests, and are able to
make stronger inferences from the data. They can only be conducted with data that adheres to
the common assumptions of statistical tests.

Parametric tests make certain assumptions about the distribution of the data, including normality and homogeneity of variances, and are typically used when the data are continuous and approximately normally distributed. Here are some commonly used parametric tests in social science, along with examples:

1. Independent Samples t-test:

- Purpose: Compare means of two independent groups.

- Example: Compare the average scores of two groups of students, one taught with traditional
methods and the other with a new teaching method.

2. Paired Samples t-test:

- Purpose: Compare means of two related groups (matched pairs or repeated measures).

- Example: Assess the effectiveness of a new therapy by comparing the scores of individuals
before and after the therapy.

3. Analysis of Variance (ANOVA):

- Purpose: Compare means of more than two groups.


- Example: Investigate whether there is a significant difference in the average scores of
students taught using three different teaching methods.

4. Analysis of Covariance (ANCOVA):

- Purpose: Similar to ANOVA but with the addition of one or more covariates to control for
potential confounding variables.

- Example: Determine if there is a significant difference in the average scores of students across different teaching methods while controlling for their prior knowledge as a covariate.

5. Linear Regression:

- Purpose: Examine the relationship between two continuous variables, with one as the
predictor and the other as the outcome.

- Example: Explore the relationship between the amount of time spent studying and exam
scores.

6. Multiple Regression:

- Purpose: Extend linear regression to examine the relationship between one dependent
variable and multiple independent variables.

- Example: Investigate the factors influencing job performance by considering variables like
education, experience, and motivation.

7. Multivariate Analysis of Variance (MANOVA):

- Purpose: Extension of ANOVA for multiple dependent variables.

- Example: Examine whether there are significant differences in multiple outcome variables
(e.g., test scores, creativity scores) across different teaching methods.

8. Multivariate Regression:

- Purpose: Extension of multiple regression for multiple dependent variables.

- Example: Predict both academic achievement and emotional well-being based on various
predictor variables like study habits, social support, etc.

Remember to check the assumptions of each test before applying them to your data. Violation
of assumptions may lead to inaccurate results. Additionally, it's crucial to interpret results in
the context of your study and research questions.
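As a rough illustration of how a few of these parametric tests are run in practice, here is a minimal Python/scipy sketch; all scores are invented and the scenario labels simply echo the examples above (the numbering in the comments follows the list):

from scipy import stats

# Invented exam scores for three teaching methods.
traditional = [62, 65, 70, 68, 64, 66]
new_method  = [72, 75, 71, 78, 74, 73]
blended     = [69, 70, 72, 68, 71, 73]

# 1. Independent samples t-test: traditional vs. new teaching method.
t, p = stats.ttest_ind(traditional, new_method)
print(f"Independent t-test: t = {t:.2f}, p = {p:.4f}")

# 2. Paired samples t-test: invented before/after therapy scores for the same people.
before = [30, 28, 35, 32, 31]
after  = [25, 24, 30, 29, 27]
t, p = stats.ttest_rel(before, after)
print(f"Paired t-test: t = {t:.2f}, p = {p:.4f}")

# 3. One-way ANOVA: all three teaching methods compared at once.
f, p = stats.f_oneway(traditional, new_method, blended)
print(f"ANOVA: F = {f:.2f}, p = {p:.4f}")

# 5. Simple linear regression: invented study hours vs. exam scores.
hours  = [2, 4, 5, 7, 8, 10]
scores = [55, 60, 64, 70, 74, 82]
result = stats.linregress(hours, scores)
print(f"Regression: slope = {result.slope:.2f}, p = {result.pvalue:.4f}")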

Choosing a nonparametric test

Non-parametric tests don’t make as many assumptions about the data, and are useful when one
or more of the common statistical assumptions are violated. However, the inferences they make
aren’t as strong as with parametric tests.
Nonparametric tests are used when the assumptions of parametric tests are violated or when
dealing with ordinal or non-normally distributed data. Here are some commonly used
nonparametric tests in social science along with examples:

1. Mann-Whitney U Test:

- Purpose: Compare the distribution of scores between two independent groups.

- Example: Assess whether there is a significant difference in the rankings of job satisfaction
between two different departments in a company.

2. Wilcoxon Signed-Rank Test:

- Purpose: Compare the distribution of scores between two related groups.

- Example: Examine whether there is a significant difference in the pre- and post-treatment
anxiety levels of individuals in a therapeutic intervention.

3. Kruskal-Wallis Test:

- Purpose: Nonparametric alternative to one-way ANOVA, used for comparing more than
two independent groups.

- Example: Investigate whether there are differences in the levels of perceived stress among
individuals in different occupational sectors.

4. Friedman Test:

- Purpose: Nonparametric alternative to repeated measures ANOVA, used for comparing more than two related groups.

- Example: Analyze whether there are differences in the performance of students across three different teaching methods over multiple testing sessions.

5. Chi-Square Test of Independence:

- Purpose: Determine if there is a significant association between two categorical variables.

- Example: Investigate whether there is a relationship between gender and preferred learning
style among a group of students.

6. Spearman's Rank-Order Correlation:

- Purpose: Assess the strength and direction of a monotonic relationship between two
variables.

- Example: Examine whether there is a significant correlation between the amount of time
spent on extracurricular activities and academic achievement.

7. Kendall's Tau:
- Purpose: Another nonparametric measure of correlation, similar to Spearman's correlation.

- Example: Investigate the association between the ranks of job satisfaction and years of work
experience among employees.

8. Mood's Median Test:

- Purpose: Test the equality of medians between two or more independent groups.

- Example: Compare the median income levels across different regions to determine if there
are significant differences.

When using nonparametric tests, it's important to note that they might be less powerful than
their parametric counterparts in certain situations. Additionally, the interpretation may differ,
as nonparametric tests often focus on ranks and medians rather than means. Always consider
the specific characteristics of your data and research questions when choosing the appropriate
test.
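A comparable sketch for the nonparametric side uses Python with scipy; all data are invented and the numbering in the comments follows the list above:

from scipy import stats

# Invented satisfaction rankings for two departments.
dept_a = [3, 4, 2, 5, 4, 3, 4]
dept_b = [2, 3, 1, 2, 3, 2, 1]

# 1. Mann-Whitney U test: two independent groups.
u, p = stats.mannwhitneyu(dept_a, dept_b)
print(f"Mann-Whitney U = {u:.1f}, p = {p:.4f}")

# 3. Kruskal-Wallis test: three independent groups (invented stress scores).
sector_1 = [12, 15, 14, 10, 13]
sector_2 = [18, 20, 17, 19, 16]
sector_3 = [11, 9, 12, 10, 13]
h, p = stats.kruskal(sector_1, sector_2, sector_3)
print(f"Kruskal-Wallis H = {h:.2f}, p = {p:.4f}")

# 5. Chi-square test of independence: gender vs. preferred learning style (invented counts).
observed = [[20, 15, 10],   # counts per learning style, group 1
            [18, 22, 12]]   # counts per learning style, group 2
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"Chi-square = {chi2:.2f}, p = {p:.4f}")

# 6. Spearman's rank-order correlation: activity hours vs. achievement (invented).
activity_hours = [1, 3, 2, 5, 4, 6]
achievement    = [55, 62, 58, 75, 70, 80]
rho, p = stats.spearmanr(activity_hours, achievement)
print(f"Spearman's rho = {rho:.2f}, p = {p:.4f}")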

Scaling Techniques:
MEASUREMENT AND SCALING:
a) Measurement: Measurement is the process of observing and recording the observations that are collected as part of research. The recording of the observations may be in terms of numbers or other symbols assigned to characteristics of objects according to certain prescribed rules. The respondent's characteristics may be feelings, attitudes, opinions, etc. For example, you may assign '1' for male and '2' for female respondents. In response to a question on whether he/she is using the ATM provided by a particular bank branch, the respondent may say 'yes' or 'no'. You may wish to assign the number '1' for the response yes and '2' for the response no. We assign numbers to these characteristics for two reasons. First, the numbers facilitate further statistical analysis of the data obtained. Second, numbers facilitate the communication of measurement rules and results. The most important aspect of measurement is the specification of rules for assigning numbers to characteristics. The rules for assigning numbers should be standardised and applied uniformly, and must not change over time or across objects.
b) Scaling: Scaling is the assignment of objects to numbers or semantics according to a rule. In scaling, the objects are usually text statements of attitude, opinion, or feeling. For example, consider a scale locating customers of a bank according to the characteristic "agreement with the satisfactory quality of service provided by the branch". Each customer interviewed may respond with a semantic label such as 'strongly agree', 'somewhat agree', 'somewhat disagree' or 'strongly disagree'. We may even assign each of these responses a number: for example, 'strongly agree' as '1', 'somewhat agree' as '2', 'somewhat disagree' as '3', and 'strongly disagree' as '4'. Each respondent would therefore be assigned 1, 2, 3 or 4.
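A minimal sketch of this assignment rule in Python (the dictionary and variable names are illustrative; the labels and codes mirror the example above):

# Assigning numbers to semantic responses according to a fixed rule,
# as in the bank-service agreement example above.
AGREEMENT_SCALE = {
    "strongly agree":    1,
    "somewhat agree":    2,
    "somewhat disagree": 3,
    "strongly disagree": 4,
}

responses = ["strongly agree", "somewhat disagree", "somewhat agree"]
scores = [AGREEMENT_SCALE[r] for r in responses]
print(scores)   # [1, 3, 2]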
Typically, there are four levels of measurement scales or methods of assigning numbers: (a)
Nominal scale, (b) Ordinal scale, (c) Interval scale, and (d) Ratio scale.
a) Nominal Scale is the crudest among all measurement scales but it is also the simplest scale.
In this scale the different scores on a measurement simply indicate different categories. The
nominal scale does not express any values or relationships between variables. For example,
labelling men as ‘1’ and women as ‘2’, which is the most common way of labelling gender for data recording purposes, does not mean women are ‘twice something or other’ compared with men; nor does it suggest that men are somehow ‘better’ than women.
Another example of a nominal scale is to classify the respondents' income into three groups: the highest income as group 1, the middle income as group 2, and the low income as group 3. The nominal scale is often referred to as a categorical scale. The assigned numbers have no arithmetic properties and act only as labels. The only statistical operation that can be performed on nominal scales is a frequency count; no average other than the mode can be determined. In
designing and developing a questionnaire, it is important that the response categories must
include all possible responses. In order to have an exhaustive number of responses, you might
have to include a category such as ‘others’, ‘uncertain’, ‘don’t know’, or ‘can’t remember’ so
that the respondents will not distort their information by forcing their responses in one of the
categories provided. Also, you should be careful and be sure that the categories provided are
mutually exclusive so that they do not overlap or get duplicated in any way.
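Since only frequency counts and the mode are meaningful for nominal data, a permissible analysis might look like the following sketch (Python with pandas; the coded gender values are invented):

import pandas as pd

# Invented nominal data: gender coded 1 = Male, 2 = Female (the numbers are labels, not quantities).
gender = pd.Series([1, 2, 2, 1, 2, 2, 1])

# Frequency count: the only statistical operation appropriate to a nominal scale.
print(gender.value_counts())

# Mode: the only "average" that makes sense for nominal data.
print("Mode:", gender.mode().tolist())

# A mean of these codes would be meaningless, since the numbers merely name categories.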
b) Ordinal Scale involves the ranking of items along the continuum of the characteristic being
scaled. In this scale, the items are classified according to whether they have more or less of a
characteristic. For example, you may wish to ask TV viewers to rank TV channels according to their preference, and the responses may look like this:

TV Channel        Viewer's preference
Doordarshan-1     1
Star Plus         2
NDTV News         3
Aaj Tak TV        4

The main characteristic of the ordinal scale is that the categories have a logical or ordered relationship. This type of scale permits the measurement of degrees of difference (that is, ‘more’ or ‘less’) but not the specific amount of difference (that is, how much ‘more’ or ‘less’). This scale is very common in marketing, satisfaction and attitudinal research. As another example, a fast food home delivery shop may wish to ask its customers: How would you rate the service of our staff? (1) Excellent (2) Very Good (3) Good (4) Poor (5) Worst. Suppose respondent X gave the response ‘Excellent’ and respondent Y gave the response ‘Good’; we may say that respondent X thought the service provided was better than respondent Y did. But we do not know how much better, and we cannot even say that both respondents have the same understanding of what constitutes ‘good service’. In marketing research, ordinal scales are used to measure relative attitudes, opinions, and preferences. Here we rank the attitudes, opinions and preferences from best to worst or from worst to best. However, the amount of difference between the ranks cannot be found out. Using ordinal-scale data, we can compute statistics like the median and mode, but not the mean.
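For ordinal data such as the service ratings above, a brief sketch (Python with pandas; the responses are invented, coded 1 = Excellent through 5 = Worst) shows which statistics are and are not meaningful:

import pandas as pd

# Invented ordinal ratings: order matters, but the distances between codes do not.
ratings = pd.Series([1, 2, 2, 3, 1, 4, 2, 3])

print("Median:", ratings.median())          # meaningful: the middle rank
print("Mode:", ratings.mode().tolist())     # meaningful: the most frequent rating
# The arithmetic mean (2.25 here) is not meaningful, because the gap between
# "Excellent" and "Very Good" need not equal the gap between "Good" and "Poor".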
c) Interval Scale is a scale in which numbers are used to rank attributes such that numerically equal distances on the scale represent equal distances in the characteristic being measured. An interval scale contains all the information of an ordinal scale, but it also allows one to compare the difference/distance between attributes. For example, the difference between ‘1’ and ‘2’ is equal to the difference between ‘3’ and ‘4’, and the difference between ‘2’ and ‘4’ is twice the difference between ‘1’ and ‘2’. However, in an interval scale the zero point is arbitrary and is not a true zero. This, of course, has implications for the type of data manipulation and analysis we can carry out on data collected in this form. It is possible to add or subtract a constant to all of the scale values without affecting the form of the scale, but one cannot multiply or divide the values. Measuring temperature is an example of an interval scale: we cannot say 40°C is twice as hot as 20°C. The reason is that 0°C does not mean there is no temperature; it is merely a relative point on the Centigrade scale. Due to the lack of an absolute zero point, the interval scale does not allow the conclusion that 40°C is twice as hot as 20°C. Interval scales may be in either numeric or semantic formats.
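The arithmetic restrictions of the interval scale can be verified with a tiny numeric sketch (Python; the temperatures are illustrative):

# Interval-scale arithmetic: differences are meaningful, ratios are not.
temps_celsius = [20, 40]

# Adding a constant to every value preserves all differences.
shifted = [t + 10 for t in temps_celsius]
print(temps_celsius[1] - temps_celsius[0], shifted[1] - shifted[0])   # 20 and 20

# Ratios are not preserved, because 0 on the Celsius scale is an arbitrary zero.
print(temps_celsius[1] / temps_celsius[0])            # 2.0 on the Celsius scale
kelvin = [t + 273.15 for t in temps_celsius]
print(round(kelvin[1] / kelvin[0], 2))                 # about 1.07, so 40°C is not "twice as hot" as 20°C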

References:
https://egyankosh.ac.in/
https://chat.openai.com/
