
CHAPTER-8

PROCESSING AND ANALYSIS OF DATA

Data Processing
• Data are analyzed and interpreted using statistical, econometric and mathematical
techniques so as to draw results; this process is known as data analysis.
• According to Wikipedia, "Data analysis is a process of gathering, modeling and
transforming data with the goal of highlighting useful information, suggestions,
conclusions and supporting decision making. Data analysis has multiple facets and
approaches, encompassing diverse techniques under a variety of names, in different
business, science and social science domains."
• Data should be processed using specific techniques to draw conclusions.
• Processing techniques are used to make data valid, simple and reliable. Data are, first of
all, classified and grouped on the basis of their nature, quality and trends.
• Data processing procedures are given below:
EDITING
• Editing of data is a process of examining the collected raw data (specially in surveys) to
detect errors and omissions and to correct these when possible. Especially, the data
obtained from interview, observation and questionnaire should be edited.
• Field editing: the review of the reporting forms by the investigator to complete
(translate or rewrite) what was written in abbreviated and/or illegible form at the time
of recording the respondents’ responses.
• Central editing: should take place when all forms or schedules have been completed and
returned to the office.
The main objectives of editing are to ensure the following things:
• Accuracy of data
• Consistent with the intent of the questions
• Data is uniformly entered
• It is complete
• Simplify coding and tabulation
The editor should follow these rules while editing data:
• Be familiar with the instructions given to interviewers and interviewees.
• Do not destroy or erase the original entry.
• Make all edited entries on an instrument in a distinctive color and in a standard form.
• Initial all changed or amended answers.
• Place your initials and the date of editing on each completed instrument.
CODING
• Coding refers to the process of assigning numerals or other symbols to answers so that
responses can be put into a limited number of categories or classes.
• Such classes should be appropriate to the research problem under consideration.
• They must also possess the characteristic of exhaustiveness (i.e., there must be a class for
every data item) and that of mutual exclusivity, which means that a specific answer
can be placed in one and only one cell in a given category set.

• Data collected through observation techniques will not be uniform; such unsystematic and
varied information should be systematized.
The following rules are to be followed while coding:
• Coding should avoid ambiguity and duplication.
• All the codes are to be defined.
• Coding system should be developed while developing data collection design.
• Codes are to be recorded in code book.
• Codes should be appropriate to the research problem and purpose.
• It should be mutually exclusive and exhaustive.
• It should be derived from one of the classification principles.
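The rules above can be sketched as a small code book in Python; the answer categories and numerals below are hypothetical, chosen only to illustrate mutual exclusivity and exhaustiveness:

```python
# A minimal sketch of a code book (hypothetical categories and numerals).
# Each answer maps to exactly one numeral (mutually exclusive), and a
# catch-all "other" code keeps the scheme exhaustive.
CODE_BOOK = {
    "strongly agree": 1,
    "agree": 2,
    "undecided": 3,
    "disagree": 4,
    "strongly disagree": 5,
    "other": 9,  # catch-all so every data item has a class
}

def code_response(answer: str) -> int:
    """Assign a numeral to a raw answer, falling back to the 'other' code."""
    return CODE_BOOK.get(answer.strip().lower(), CODE_BOOK["other"])

responses = ["Agree", "disagree", "no idea", "Strongly Agree"]
codes = [code_response(r) for r in responses]
print(codes)  # [2, 4, 9, 1]
```

Recording the mapping in one place, before data collection begins, is exactly the "code book" the rules above call for.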
CLASSIFICATION
• Classification means separating items on the basis of similarity in characteristics and
grouping them into various classes.
• Data having a common characteristic are placed in one class and in this way the entire
data get divided into a number of groups or classes.
Classification of data can be made on the following bases:
• Geographical classification: village, district, zone, development region
• Chronological classification: time frame
• Qualitative classification: characteristics or qualities
• Quantitative classification: on the basis of class intervals, e.g. production, value, marks
obtained, weight
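Quantitative classification by class interval can be sketched as follows; the marks and the interval width are hypothetical:

```python
# Group hypothetical marks into class intervals of width 20.
def classify(value, width=20, lowest=0):
    """Return the class interval label a value falls into."""
    lower = lowest + ((value - lowest) // width) * width
    return f"{lower}-{lower + width}"

marks = [12, 35, 58, 61, 79, 95]
groups = {}
for m in marks:
    groups.setdefault(classify(m), []).append(m)
print(groups)  # {'0-20': [12], '20-40': [35], '40-60': [58], '60-80': [61, 79], '80-100': [95]}
```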
TABULATION
• Tabulation is the process of arranging data in a systematic manner into rows and
columns.
• It is the final step in collection and compilation of data.
Reasons for tabulation:
• It conserves space and reduces explanatory and descriptive statement to a minimum.
• It facilitates the process of comparison.
• It facilitates the summation of items and the detection of errors and omissions.
• It provides a basis for various statistical computations.
• It avoids repetition.
Main Parts of a Table
Parts of a table depend on the nature of research, design of data and objective of research.
Following are the main parts:
• Number of the table
• Title of the table
• Column caption
• Title of the row
• Body of the table
• Head note
• Footnote
• Sources
Tabulation can be done by hand or by mechanical or electronic devices.
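A minimal sketch of two-way tabulation done electronically, assuming hypothetical survey records (gender by response):

```python
from collections import Counter

# Hypothetical records: (row attribute, column attribute) pairs.
records = [("male", "yes"), ("female", "no"), ("male", "yes"),
           ("female", "yes"), ("male", "no")]
table = Counter(records)  # cell counts keyed by (row, column)

rows = sorted({g for g, _ in records})
cols = sorted({r for _, r in records})
print("        " + "  ".join(cols))
for g in rows:
    print(f"{g:8}" + "  ".join(str(table[(g, c)]) for c in cols))
```

Each cell of the table is simply a count of matching records, which is what makes tabulated data easy to compare and to check for omissions.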

• Presenting data in a precise form so as to make it easier to describe, analyze and
interpret is known as summarizing of data.
• Data can be summarized using tables, graphs and charts.
TABLE
Table can be classified into different groups on the basis of its construction:
1. Simple table: presents a single characteristic; also known as a one-way table.
2. Complex table: presents more than one characteristic. It can be further classified:
• Two-way table: two characteristics
• Three-way table: three characteristics
• Manifold table: more than three characteristics
GRAPHS AND CHARTS
Graphic presentation refers to the presentation of data in geometrical figures.
Graphs and charts help:
• To present the highest and lowest figures of a variable, a simple bar diagram can be constructed.
• To present the ratio of a variable, a pie chart can be prepared.
• To present the trend of variables, a line chart or time series graph is prepared.
• To present the scatter of a variable, a scatter diagram is prepared.
Advantages of Diagram
• It creates an impression on the mind of the observer.
• It makes it easy to understand the facts, draw conclusions and make decisions.
• It saves time and labour.
• It facilitates comparison of data.
• It is more attractive and convincing.
• It helps to remember the facts for a longer period of time.
General rules for constructing diagrams
Generally, the following rules are followed while constructing a diagram:
• Title
• Proper proportion between width and height
• Selection of scale
• Neatness and cleanliness
• Footnote
• Selection of diagram
• Simplicity
• Index
Types of diagram
• Bar diagram: Simple bar-diagram, Sub-divided bar diagram, Percentage bar-diagram &
Multiple bar diagram
• Pie chart
GRAPHS
• Graphs present time series data and frequency distributions.
• Graphs are more precise and accurate than diagrams and can be effectively used for
further statistical analysis to study slopes and rates of change, and for forecasting.
Types of graph:

• Time series graph: distribution of data on the basis of units of time is known as time
series data.
• Scatter diagram: a graph drawn to see the joint distribution of two variables is
known as a scatter diagram.
• Graphs presenting functional relationships: functional relationships of the variables can
be classified into the following two groups:
i. Linear relationship: if the graph of the relationship between two variables is a straight
line, the relationship is taken as linear.
ii. Non-linear relationship: if the graph does not form a straight line when the values
of the variables are plotted, such as the relationship between social and economic activities,
it forms a non-linear relationship.
Statistical Analysis
• After arranging the data in a workable design, they are analyzed using statistical tools to
draw conclusions; this is known as statistical analysis.
• The tools used for analysis of data depend on the scale used while collecting the data.
Classification of Statistical Analysis:
Scale used | Suitable average | Suitable dispersion      | Suitable correlation       | Suitable test of significance
Nominal    | Mode             | None                     | Contingency coefficient    | Chi-square test
Ordinal    | Median           | Quartile deviation       | Rank correlation           | Sign test
Interval   | Mean             | Standard deviation       | Coefficient of correlation | t- or F-test
Ratio      | Mean             | Coefficient of variation | All of the above           | All of the above
Classification of Statistical Analysis
1. Descriptive Statistics:
• Statistical tools that are used to explain the activities or fundamental
characteristics of data are known as descriptive statistics.
• Frequency, mean, median and mode are taken as descriptive statistics.
• It helps to get summarized information about the sample units.
• Estimation, tests of significance and trend analysis, by contrast, fall under inferential statistics.
• Using descriptive statistics, a businessman can assess the average profit, the ratio
of profit and the change in profit from the sale of goods at a point in time.
• But a researcher cannot draw inferential conclusions applying descriptive statistics alone.
• For example, descriptive statistical tools show an increase in sales but do not explain
why sales have increased.
Some of the descriptive statistical tools are: frequency, mean (simple arithmetic mean, weighted
arithmetic mean, geometric mean) and mode.
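These tools are available directly in Python's standard `statistics` module; the daily profit figures below are hypothetical:

```python
import statistics as st

# Descriptive statistics on a hypothetical sample of daily profits.
profits = [120, 150, 150, 180, 200, 150, 90]
print("mean:", st.mean(profits))      # arithmetic mean
print("median:", st.median(profits))  # middle value of the sorted data
print("mode:", st.mode(profits))      # most frequent value
```

These figures summarize the sample but, as noted above, say nothing about *why* profits changed.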
Measures of dispersion:

• An average shows a value around which the data in a series are heavily concentrated.
But such a study cannot explain every dimension of the variables.
• Dispersion shows the scatter of the data.
• It means how much the data are scattered from the mean value.
• It shows how much lower and higher the items are relative to the mean value.
• The main aim of measures of dispersion is to measure the reliability of a measure of
central tendency and to compare the consistency of two or more sets of data.
Dispersion is measured by: range, quartile deviation, mean deviation, standard deviation and
coefficient of standard deviation/variation.
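A short sketch of three of these measures on a hypothetical data set, using only the standard library:

```python
import statistics as st

# Measures of dispersion for a hypothetical data set.
data = [10, 12, 23, 23, 16, 23, 21, 16]
rng = max(data) - min(data)      # range: highest minus lowest value
sd = st.pstdev(data)             # population standard deviation
cv = sd / st.mean(data) * 100    # coefficient of variation, in percent
print(rng, round(sd, 2), round(cv, 2))  # 13 4.9 27.22
```

The coefficient of variation, being unit-free, is what allows the consistency of two data sets measured in different units to be compared.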
2. Inferential Statistics:
• Statistical tools that are used to estimate population parameters, or to make decisions about
them, using data collected from a sample are known as inferential statistics.
a. Estimation statistics:
Estimation is made using following two methods in statistics:
i. Confidence interval:
• It estimates a population value as a range between two points.
• It helps to estimate the value or characteristics of the population by analyzing the sample.
If a campus estimates that admissions will be between 300 and 500 students, and admissions
fall within these two figures, this is confidence-interval estimation.
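The admissions idea can be sketched as a conventional 95% confidence interval for a mean, using the normal critical value 1.96; all figures below are hypothetical:

```python
import math

# 95% confidence interval for mean admissions (hypothetical figures).
sample_mean, sample_sd, n = 400, 80, 60
margin = 1.96 * sample_sd / math.sqrt(n)  # z * standard error
low, high = sample_mean - margin, sample_mean + margin
print(f"95% CI: ({low:.1f}, {high:.1f})")
```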
ii. Parameter estimation:
• A statistical method that estimates the relationship of variables in the
population is known as parameter estimation.
• Linear regression, mode and correlation are some examples of parameter estimation.
b. Hypothesis testing
• Inference on population characteristics or parameters is often made on the basis of
sample observations, especially when the population is large and it may not be possible to
enumerate all the sampling units belonging to the population.
• In doing so, one has to take the help of certain assumptions about the characteristics of
population which are known as hypothesis.
• Hypothesis is tested on the basis of sample.
• Such hypotheses are tested using various statistical tools based on the analysis of sample.
• It helps to estimate the population parameter from the analysis of sample.
• Generally, a hypothesis is tested based on the probability value (p-value).
• First of all, the level of significance or confidence level should be determined.
• If a researcher sets a 5% significance level, then a p-value of less than 0.05 leads to
rejecting the null hypothesis and accepting the alternative hypothesis.
• The p-value measures the significance of the test.
Procedure for Testing Hypothesis
We have to follow the following steps for testing a hypothesis.
• State the null and alternative hypotheses: first of all, the researcher should set the hypotheses
based on the literature. The researcher should set both the null (H0) and alternative (H1)
hypotheses. The null hypothesis is formulated using words such as "no difference", presenting
a situation similar to the current position, but in the alternative hypothesis the researcher
assumes the existence of a difference or effect.
• Establish a level of significance: the level of significance signifies the accepted probability of
committing an error. It is set at the discretion of the investigator
depending on the sensitivity of the issue under study. If a 5% level of significance is
accepted, the researcher is willing to take a 5 percent risk of rejecting the null hypothesis
even if it is true.
• Choose a suitable test statistic: for the purpose of rejecting or accepting the null hypothesis,
a suitable statistical tool is chosen, known as the test statistic. The selection depends on the
nature of the data and the required precision. For example, if data are collected using ratio
or interval scales then t-tests, z-tests and ANOVA can be used, but if the data are generated
using ordinal or ranking scales then the chi-square test and other non-parametric tests can be used.
• Obtain the critical value: consult the appropriate table (z-table, t-table, etc.)
to find the critical value, e.g. Z(5%) = 1.96 from the normal table. The critical
value separates the region of rejection from the region of acceptance.
• Conclusions: the researcher draws conclusions by comparing the tabulated and calculated
values. If the calculated value is less than or equal to the tabulated value at a certain level of
significance then the null hypothesis is accepted, meaning there is no significant difference
between the sample statistic and the population parameter.
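The steps above can be sketched as a one-sample z-test at the 5% level; the sample figures are hypothetical and the population standard deviation is assumed known:

```python
import math

# One-sample z-test at the 5% level (hypothetical figures; sigma known).
def one_sample_z(sample_mean, mu0, sigma, n, critical=1.96):
    """Return the z statistic and whether H0 is rejected (two-tailed)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return z, abs(z) > critical

# H0: average daily sales = Rs. 10,000; sample of 64 days averages 10,500.
z, reject = one_sample_z(sample_mean=10500, mu0=10000, sigma=2000, n=64)
print(round(z, 2), "reject H0" if reject else "accept H0")  # 2.0 reject H0
```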
Parametric and Non-Parametric Test for Testing Hypothesis
• The mechanism of inferential statistics is categorized into two groups i.e. parametric and
non-parametric tests.
• The test of hypothesis assuming that the samples are taken from normally distributed
population is known as parametric test/statistics.
• Most of the parametric statistical tools require the use of interval or ratio scale for the
collection of data. With these levels of measurement, arithmetic operations are
meaningful and mean, variance and standard deviation can be computed and interpreted.
• Non-parametric test does not require normally distributed population. So, it is also known
as distribution free technique. Non-parametric tools are used when data is collected using
nominal and ordinal scale.
Important Parametric Tests in Testing Hypothesis
a. Z-test:
• The z-test is used to check the significance of the mean; it can be used when the
population is assumed to be normally distributed.
• In a z-test, the data analyst calculates the z-value and compares it with the tabulated value
of 'z' to test the hypothesis. The z-test can also be applied to the binomial distribution.
We can use the z-test for the following:
• To judge the significance of statistical measures, particularly the mean.
• To compare the mean of a sample with some hypothesized mean of the population. For
example, Is the average sales of a product more than Rs. 10,000 per day?
• To judge the significance of the difference between the means of two independent samples. For
example: is there any difference in the performance of old and young employees?

• To measure the significance of the median, mode, coefficient of correlation and other
measures.
• To compare a sample proportion with the theoretical population proportion when the
population variance is known.
b. T-test:
• The t-test is a univariate hypothesis test using the t distribution rather than the z
distribution, used when the standard deviation of the population is unknown and the
sample size is small.
• The paired t-test is used when two samples are inter-related, to test the significance of
the difference between the means of the two samples.
• The t-test is also used to test the significance of simple and partial correlation coefficients.
• The calculated value of t is compared with the tabulated value; on the basis of this
comparison, the null hypothesis is accepted or rejected.
• For example, a researcher takes 100 vehicles as a sample and finds that a vehicle runs
52.5 km on 10 litres of petrol, with a sample standard deviation of 14. Does this result
show that the population average is still 50?
• In this example, the standard deviation of a single sample (S) is given; it is used in place
of the population standard deviation (σ). When the sample standard deviation is used as
the population standard deviation, the t distribution is used even if the sample size is
not very small.
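The vehicle example can be computed directly with the one-sample t statistic t = (x̄ − μ0)/(s/√n):

```python
import math

# One-sample t statistic for the vehicle example:
# n = 100, sample mean = 52.5, hypothesized mean = 50, sample sd = 14.
n, xbar, mu0, s = 100, 52.5, 50, 14
t = (xbar - mu0) / (s / math.sqrt(n))
print(round(t, 3))  # 1.786
# At 99 degrees of freedom the two-tailed 5% critical value is about 1.98,
# so the null hypothesis (population average = 50) is not rejected.
```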
c. Analysis of Variance (ANOVA)
• The t-test and z-test only test the difference in means of one or two samples.
• Studying more than two samples at a time is not possible using the t- and z-tests.
• Thus, to study more than two samples at a time, analysis of variance (ANOVA)
is used. This tool helps to study the source and degree of variance.
• It partitions the variance into between-group and within-group variance.
• It is used to test the significance of the means of more than two samples at a time.
• It is also used to test the significance of multiple correlation coefficients.
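A minimal one-way ANOVA sketch on three hypothetical samples, computing F as the ratio of between-group to within-group variance:

```python
import statistics as st

# One-way ANOVA: F = between-group mean square / within-group mean square.
def one_way_anova(groups):
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total observations
    grand = st.mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (st.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - st.mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three hypothetical samples with clearly different means.
F = one_way_anova([[5, 6, 7], [8, 9, 10], [11, 12, 13]])
print(round(F, 2))  # 27.0
```

The computed F is then compared with the tabulated F value at (k−1, n−k) degrees of freedom, just as with the t- and z-tests above.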
Important Non–Parametric Tests for Hypothesis Testing
a. One-sample test
   i. Setting hypothesis
   ii. Selecting statistical test
   iii. Setting significance level
   iv. Calculating the χ² value
   v. Finding the critical test value
   vi. Interpretation
b. Use of the chi-square test in a two-independent-sample test
   i. Setting hypothesis
   ii. Selecting statistical test
   iii. Determining significance level
   iv. Calculating the χ² value
   v. Finding the critical test value
   vi. Interpretation
c. Two related sample test
d. K-independent sample test
e. K-related sample test

Chi-square Test
• The chi-square test is widely used as a non-parametric test in research.
• Generally, the chi-square test is used to check the dependency of two or more
groups. Chi-square is used when:
• The data are measured on a nominal scale.
• The sample size is more than 50.
• No expected frequency is less than 5; if one is, adjacent categories are combined until
each expected frequency exceeds 5.
• The individuals or events are divided into two or more nominal groups, such as yes/no
groups; agree, undecided and disagree groups; or a, b, c, d groups.
• We can test the significance of the difference between observed and expected data using
the chi-square test.
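A sketch of the chi-square test of independence on a hypothetical 2×2 table (rows = group, columns = yes/no answers):

```python
# Chi-square test of independence on a hypothetical 2x2 table.
observed = [[30, 20],   # group A: yes, no
            [10, 40]]   # group B: yes, no

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / total  # expected frequency
        chi2 += (obs - exp) ** 2 / exp

print(round(chi2, 2))  # 16.67; critical value at 1 df, 5% level: 3.84
```

Since 16.67 exceeds the critical value 3.84, the two groups' answers would be judged dependent in this hypothetical example.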
Measures of Association
Correlation
• A research project typically includes several variables; beyond the means and standard
deviations of the dependent and independent variables, we would often like to know how
one variable is related to another.
• Correlation shows the nature, direction and significance of the bivariate relationship
between two variables.
• A Pearson correlation matrix indicates the direction, strength and significance of the
relationship between each pair of variables in the study.
Regression Analysis
• A statistical technique that is used to measure the degree of relationship between a dependent
and an independent variable is known as regression analysis.
• It involves two kinds of variables, i.e. independent and dependent variables.
• It estimates the change in the dependent variable due to changes in the independent
variable.
• An equation estimated to measure the degree of change in the dependent variable due to
change in an independent variable is known as a simple regression equation.
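A simple regression line y = a + bx can be fitted by least squares; the data below (x = advertising, y = sales) are hypothetical:

```python
# Simple linear regression y = a + b*x fitted by least squares.
def fit_line(x, y):
    """Return (intercept a, slope b) for the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return my - b * mx, b

x = [1, 2, 3, 4, 5]     # hypothetical advertising spend
y = [3, 5, 7, 9, 11]    # hypothetical sales
a, b = fit_line(x, y)
print(a, b)  # 1.0 2.0 -> each extra unit of x adds 2 units to y
```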
Time Series Analysis
• A statistical technique that is used to study the variation in variables on the basis of
time is known as time series analysis. This technique is widely used in
business research. For example, time series analysis is used to study changes in
production, sales, etc. over time.
• Time series analysis helps to predict future trends based on past activities.
• Trend refers to the upward and downward flow of activities over a certain period of time.
• Trend alone does not determine the activities; other elements also affect the
activities of an organization. Those elements are given below:
1. Secular trend or long-term fluctuation: Social factors/activities often show a definite tendency
to increase or decrease over a considerable period of time. Such a tendency is known as secular
trend or long-term fluctuation. Such a trend is estimated based on yearly data. For example,
production and sales can be forecast considering long-term past data. Secular trend can
be measured using the following methods:
• Semi average method
• Least square method
• Graphical method
• Moving average method
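The moving-average method can be sketched as follows; the window width and the sales figures are hypothetical:

```python
# Secular trend by the moving-average method: each point is replaced
# by the average of a sliding window (here a 3-year window).
def moving_average(series, window=3):
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

sales = [100, 120, 110, 130, 150, 140, 160]  # hypothetical yearly sales
print(moving_average(sales))  # [110.0, 120.0, 130.0, 140.0, 150.0]
```

The averaged series smooths out short-term fluctuations, leaving the underlying upward trend visible.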
• An analyst should consider the following two factors while studying secular trend:
• Cyclical variation: the movements are wave-like in character but not definitely periodic.
The fluctuation seen in a time series over a long-term period is known as cyclical variation.
• Seasonal variation: the fluctuation seen within a one-year period due to various seasonal
causes such as festivals, religious and cultural functions, etc.
2. Random or irregular variation: Irregular movements may be episodic or
accidental. Episodic changes may be caused by such factors as strikes, lockouts, earthquakes
or other types of disaster or natural calamity. They result in sharp and pronounced breaks in
the variables and show no apparent tendency to recur at stated intervals.
Multivariate Analysis
• This topic is concerned with statistical methods designed to elicit information from
data sets that include simultaneous measurements on many variables. This body of
methodology is called multivariate analysis.
The process of multivariate method is given below:
• Data reduction or structural simplification.
• Sorting and grouping.
• Investigating the dependence of variables on each other.
• Prediction.
• Hypothesis construction and testing.
Some of the important techniques of multivariate analysis are given below:
• Multiple regression
• Multivariate analysis of variance (MANOVA)
• Canonical correlation analysis
• Factor analysis
• Cluster analysis
• Multidimensional scaling
• Latent structure analysis
Analysis of Qualitative Data
• Data which are expressed subjectively or in language, not in numbers, and
collected through observation, interview and discussion are known as qualitative data.
• Generally, qualitative data are collected from open-ended questions or observation.
Qualitative data must be systematized to make them understandable.
• Such data are analyzed using various techniques to draw conclusions; this is
known as analysis of qualitative data. Analysis of qualitative data explains the data in a way
that helps to understand the significance and complexity of the subject matter.
There is no rigid process of analyzing the qualitative data but some steps that are followed by
most of the analysis are given below:

• Data reduction: data are initially collected in large volumes. To classify the data into
different classes, the data must be reduced.
• Data presentation: data are presented in a certain format for the integration of data.
This keeps the data in condensed form so that data reduction and summarizing are
possible.
• Drawing conclusions: the researcher should draw conclusions from the presentation and
analysis of the data. The researcher should check for bias and consistency in the data, which
reflects the reliability and validity of the research. Reliability can be increased by taking
feedback from external experts.
Methods of Analyzing Qualitative Data
• Qualitative data analysis is related to integration, classification and trend/ behavior of the
data. Thus, they are analyzed using different methods which are given below:
Content Analysis
• Content analysis is a qualitative data analysis technique for the systematic, objective and
quantitative description of the content of data collected through interviews,
questionnaires, schedules and other expressions in written or verbal form.
• It is used to analyze data obtained from case studies, details from field work and open-
ended questions.
• The researcher synthesizes scattered information and data so that they can be analyzed
and conclusions drawn. For example, radio, TV and other seminars, meetings, etc.
discuss the quality of products or services all over the world.
• The collection of information related to the quality of products and services from published
and unpublished, written and oral sources, and the purposeful integration of such information
to make it measurable, is considered content analysis.
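Making verbal content measurable can be sketched as a word-frequency count; the interview transcripts below are hypothetical:

```python
from collections import Counter
import re

# Counting how often quality-related terms appear in hypothetical
# interview transcripts: one simple way to quantify verbal content.
transcripts = [
    "The product quality is good but the service is slow.",
    "Good quality, good price, but delivery service needs work.",
]
words = re.findall(r"[a-z]+", " ".join(transcripts).lower())
freq = Counter(words)
print(freq["quality"], freq["good"], freq["service"])  # 2 3 2
```

Once responses are reduced to counts like these, they can be tabulated and tested with the quantitative tools described earlier.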
Features of content analysis:
• Systematic: logical and systematic
• Objectivity: purposive and unbiased
• Generalizability: findings should be applicable in practice, theoretically accepted,
empirical
Guidelines for content analysis:
• Clear operational definition of the units of analysis: identification and definition of the units
the researcher wants to analyze. Research questions must focus on such units, and such units
should be operationalized. Operationalization refers to defining concepts by linking them with
theories.
• Clear definition of the response categories: responses obtained from the respondents must
be classified into different groups; such groups must be able to serve the research
objectives and must be independent of each other.
• Analysis of material: every material should be thoroughly observed and analyzed before
developing categories of data for content analysis.
• Decision on developing categories: first of all, the researcher should decide on the creation of
classes of data. Such classes must be independent. The researcher should consider the
repetition, number and size of the classes.

• Maintaining impartiality: persons who are involved in data collection should not be
involved in content analysis. If the content analyst knows the purpose of the study, there is
a chance of bias.
• Assessing the validity of the content analysis: the researcher should put maximum
effort into collecting all the relevant materials and analyzing them.
Limitations of Content Analysis
• Non-reliable result
• Difficult to categorize data/material
• Difficult to get clear and appropriate information
• Costly
• Difficult to generalize
Steps for Conducting Content Analysis
• Identify the essential data
• Develop bases for tabulation
• Develop bases for content analysis
• Develop the layout for the construction of design
• Classify various variables into various groups
• Establish procedures for the use of materials
• Prepare outline of analysis and utilizing them
Narrative Analysis
• A technique of recording and analyzing information about a subject based on the stories
told by respondents or people related to an event or subject matter is known as
narrative analysis.
• In this process, the researcher requests the respondents to provide detailed information
related to a subject or event on the basis of observation or experience.
• There are no pre-determined questions, and respondents are not asked to give answers to
set questions.
• Narrative data come from various sources. Researchers may obtain them from responses to
open-ended questions, feedback from focus groups, notes from field observation or
published reports.
• Information collected for narrative analysis helps to analyze the various dimensions of
the society and human behavior.
• It gives information about the society and behavior of the people.
Elements of narrative analysis
• Understanding level
• Data collection
• Analysis
• People’s understanding over events
• Key actors and events
Steps for narrative analysis
• Obtaining data
• Focusing on analysis of data obtained from autobiography, interview, focus group
discussion etc.

• Codifying data using sign or symbols
• Identify the relationship among the various classes
Thematic Analysis
• Theme refers to the main point or quality of a subject or event.
• Thematic analysis is used to identify the major points of the data, analyze those data and
prepare a report.
• Thematic analysis is the work of searching for themes in the data, events or subjects that
are important for the description of the phenomena.
• The process involves identifying themes through careful reading and re-reading
of the data, noting down initial ideas, coding interesting features of the data, relating codes
to themes, generating a thematic map, ongoing analysis to refine each theme and
producing a report through continuous analysis.
Steps of thematic analysis
• Reviewing the previous literature
• Generating initial codes
• Searching for themes
• Reviewing themes
• Defining and naming themes
• Preparing report

CASE
A research project is assigned to a group of MBS students pursuing their MBS
within Kathmandu valley. The main objective of the study was to assess the market share
of a motorcycle-producing company and its potential market. They discussed the issue and
decided to collect the data in two categories: motorcycle users and non-users.
The data were collected from 150 people met in Indra Chowk, Kathmandu. A
total of 20 respondents were not willing to answer, 30 people answered only
half of the questions, and the rest provided answers usable for further processing.
After the collection of data, they organized the data in charts and tables. They decided to
test the distribution of the data. They expected the data to be normally distributed but surprisingly
found a binomial distribution.
Questions:
1. Which tool of data analysis, parametric or non-parametric, can you use in such data?
-Non-parametric
2. Which statistical tool do you use to analyze data?
-Chi- square
3. Can you use correlation and regression analysis in such cases? Explain.
-Correlation can be used, but regression cannot, because the data are not normally distributed.

