

International Journal of Research in Management & Business Studies (IJRMBS 2015)
Vol. 2 Issue 1 Jan. - Mar. 2015
ISSN : 2348-6503 (Online), ISSN : 2348-893X (Print)

Data Analysis in Business Research: Key Concepts


¹Niharika Singh, ²Dr. Amit Kumar Singh
¹Research Scholar, Dept. of Management, Mizoram University, Aizawl
²Assistant Professor, Dept. of Management, Mizoram University, Aizawl

Abstract
However adequate, valid and reliable the data may be, it does not serve any worthwhile purpose unless it is carefully analyzed. There are a number of techniques that can be used while analyzing data. These techniques fall into two categories, descriptive and inferential, constituting descriptive and inferential analysis. They serve many purposes: to summarize the data in a simple manner, to organize it so that it is easier to understand, and to use the data to test theories about a larger population. Given the ready availability of computer software, tedious formulae and calculations can be avoided today. But there is no substitute for a good understanding of the conceptual basis of the analytic methodologies one applies in order to draw inferences from hard-won research data. Hence, an effort has been made in this paper to provide a theoretical introduction to a few of the most widely used analytical tools, which will allow one to produce meaningful data analysis in business research.

Keywords
Descriptive analysis, Inferential analysis, Hypothesis Testing, Estimation, Measures of Central Tendency, Measures of Dispersion, Measures of Relationship

I. Introduction
Data analysis is used in many industries to allow companies and organizations to make better business decisions. Data analysis is one of the many steps that must be completed when conducting research, but it assumes special significance. Data collected from various primary and secondary sources in its raw form is incredibly useful but also overwhelming; it is almost impossible for a researcher to deal with all this data in its raw form. Through data analysis such data is presented in a suitable and summarized form, without any loss of relevant information, so that it can be efficiently used for decision making. Data can be presented in tabular or graphic form. The tabular form (tables) implies numerical presentation of data. The graphical form (figures) involves the presentation of data in terms of structures which can be visually interpreted, e.g., bar charts, pie charts, histograms, line charts etc.
Charts, graphs and textual write-ups of data are all forms of data analysis. These methods are designed to refine and distill the data so that readers can have clean, interesting information without needing to sort through all of the data on their own. It has to be noted that research data analysis provides the crucial link between research data and the information that is needed to address research questions. Data analysis has multiple facets and approaches, encompassing diverse statistical techniques under a variety of names in different business, science and social science domains.
Analysis of data means studying the tabulated material in order to determine inherent facts or meaning. A plan of analysis can and should be prepared in advance, before the actual collection of material.
Processing and analysis of data are always found to be interwoven. Many experts are of the view that analysis of data is different from processing of data. Prof. John Gatting made a distinction between analysis of data and processing of data. According to him, processing of data refers to concentrating, recasting and dealing with the data so that they are as amenable to analysis as possible, while analysis of data refers to seeing the data in the light of the hypotheses or research questions and the prevailing theories, and drawing conclusions that are as amenable to theory formation as possible (Gupta 2010). But there are experts who do not like to make a difference between processing and analysis. Technically speaking, processing implies editing, coding, classification and tabulation of collected data so that they are amenable to analysis.

II. Objectives
The main objective of this paper is to provide a detailed summary of data analysis and its uses, for understanding the concept of data analysis in behavioural research.

III. Methodology
For this study, data is collected from secondary sources, and the available literature has been reviewed and analyzed for understanding the concept and use of data analysis in behavioural research.

IV. Discussion On Data Analysis

A. Definition of Data Analysis
The term analysis refers to the computation of certain measures (such as measures of central tendency, variation etc.) along with searching for patterns of relationship (such as correlation, regression) that exist among data groups. Apart from that, in the process of analysis, relationships or differences supporting or conflicting with original or new hypotheses should be subjected to statistical tests of significance to determine with what validity the data can be said to indicate any conclusions. Analysis, therefore, can be categorized as descriptive and inferential analysis (inferential analysis is also known as statistical analysis). Descriptive analysis deals with the computation of certain indices from raw data and with establishing relations between two or more variables, whereas inferential analysis is concerned with (a) the estimation of population parameters, and (b) the testing of statistical hypotheses or tests of significance.
As Prof. Wilkinson & Bhandarkar quoted, "Analysis of data involves a number of closely related operations that are performed with the purpose of summarizing the collected data and organizing these in such a manner that they will yield answers to the research questions or suggest hypotheses or questions if no such questions or hypotheses had initiated the study." (Mohan & Elangovan 2011)
In general, data analysis is the science of examining raw data with the purpose of drawing conclusions about the information.
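The two categories of analysis just defined can be illustrated with a short sketch in Python (not part of the original paper; the sales figures are invented for illustration). Computing a mean and standard deviation from the sample itself is descriptive analysis; using those indices to build a rough interval estimate for the unseen population mean is inferential analysis.

```python
import math
import statistics

# Hypothetical sample of 10 monthly sales figures (illustrative data only).
sales = [42, 38, 51, 45, 40, 47, 39, 44, 50, 43]

# Descriptive analysis: indices computed from the raw data itself.
mean = statistics.mean(sales)   # measure of central tendency
sd = statistics.stdev(sales)    # measure of dispersion (sample s.d.)

# Inferential analysis: using the sample to estimate the population mean.
# A rough 95% interval estimate, assuming approximate normality and using
# 1.96 as the critical value (a large-sample simplification).
margin = 1.96 * sd / math.sqrt(len(sales))
interval = (mean - margin, mean + margin)

print(f"sample mean = {mean:.1f}, s.d. = {sd:.2f}")
print(f"95% interval estimate for the population mean: "
      f"{interval[0]:.1f} to {interval[1]:.1f}")
```

The same ten numbers thus answer two different questions: "what does this sample look like?" (descriptive) and "what can we say about the population it came from?" (inferential).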

© 2015, IJRMBS All Rights Reserved www.ijrmbs.com



B. Goals of Data Analysis
1. Giving a feel to research data: After data collection, the first step towards understanding the huge mass of data that has been gathered is to arrange the material in a concise and logical order. This procedure is referred to as the classification and tabulation of data. However, these forms of presentation may not be very interesting to the common man; too many figures are often confusing and may fail to convey effectively the message for which they are meant. Hence, another convincing and easily understood method of presenting the data is the use of graphs and diagrams. Constructing tables and graphs for the concerned data is a major part of analysis, and facilitates better understanding and comparison of the data.
2. Identifying average values and variability: Most research studies result in a large volume of raw data, which must be suitably reduced so that it can be read easily and used for further analysis. One of the most important goals of data analysis is to get one single value that describes the characteristic of the entire mass of unwieldy data. Such a value is called the central value or average. The most important averages are the mean, median and mode. The various measures of average values alone cannot adequately describe a set of observations, unless all the observations are the same. Identifying the scatteredness or variability of the mass of data in a series from the average is equally important in describing the data.
3. Identifying relations between variables: One of the ways to get better insights into the data is by discovering how variables are related to each other, i.e. whether with an increase in one variable there is an increase in the other, and vice versa. An effort is also made to know the cause-and-effect relation between two or more variables.
4. To make inferences about population parameters: In most research studies, it is not possible to enumerate the whole population in the study. Hence, a part of the population, i.e. a sample, is taken for the study. One of the goals of data analysis of these samples is to use the information contained in the sample of observations (such as the sample mean or sample standard deviation) for drawing conclusions or making inferences about the larger population (such as the population mean, standard deviation etc.).
5. To test hypotheses: A statistical hypothesis is an assumption about any aspect of a population, e.g., "there is no relationship between compensation and job satisfaction" (a null hypothesis). Analysis of data is carried out to test a hypothesis on the basis of sample values, so that the hypothesis can be accepted or rejected. Ultimate decisions are taken on the basis of the collected information and the result of the test.

C. Types of Analysis
As mentioned earlier, statistical analysis can be categorized into descriptive and inferential analysis (see Chart 1).

Chart 1. Types of Analysis of Data

1. Descriptive analysis: Descriptive analysis is mostly concerned with the computation of certain indices or measures from the raw data. Zikmund has quoted, "…with descriptive analysis, the raw data is transformed into a form that will make them easy to understand & interpret." It is largely the study of distributions of one variable. This sort of analysis can proceed in three different ways:
• Univariate analysis: When a single variable is analyzed alone, e.g., a statistic such as the "mean" which might refer to the age group of students, it is known as univariate analysis.
• Bivariate analysis: When some association is measured between two variables simultaneously, e.g., cross-classification of age groups, it is known as bivariate analysis.
• Multivariate analysis: In multivariate analysis, three or more variables are investigated simultaneously, allowing us to consider the effects of more than one variable at the same time, for example, explaining job satisfaction in terms of age, sex, salary and so on. Multivariate analysis includes techniques like multiple regression analysis, multiple discriminant analysis, multivariate analysis of variance (MANOVA), factor analysis and canonical analysis. Some of these terms are briefly described in upcoming sections.
The important statistical measures that are used in descriptive analysis are:
a) Measures of Central Tendency (or location, or average values): The simplest type of statistical analysis of data containing a set of observations is the calculation of a single value which could be taken as representative of the data. There are various measures for arriving at this value, known as measures of central tendency, location or average values. These measures indicate a value where all the observations can be assumed to be located or concentrated. They are also important for further, higher statistical calculations. There are three such measures:
i. Mean: There are three types of means:
Arithmetic mean: The arithmetic mean (typically referred to simply as the mean) is the most common measure of central tendency. It is considered highly valuable since it considers the whole data set and gives equal importance to every observation. One special case is the weighted arithmetic mean, where unequal weight or importance is given to the observations. We calculate the arithmetic mean by adding together all the values in the data set and then dividing that sum by the number of values.
Geometric mean: In many cases, giving equal importance to all observations may lead to misleading answers. One measure of location that can be used in such cases is the geometric mean. The Geometric Mean (G.M.) of a series of n observations is defined as the nth root of their product. It is to be noticed that if any observation is zero, calculating the G.M. is not possible, since the product of the values becomes zero.
Harmonic Mean: The Harmonic Mean (H.M.) is defined as the reciprocal of the arithmetic mean of the reciprocals of the observations. This mean is used in averaging rates when the time factor is variable and the act being performed is the same; for example, H.M. is used for calculating the average speed of a car. The main limitation of the H.M. is that it cannot be calculated if any value is zero.
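The three means just described can be computed as follows. This is a minimal Python sketch with invented values; the weights in the weighted arithmetic mean are chosen arbitrarily for illustration.

```python
import math
import statistics

# Hypothetical observations (illustrative data only).
values = [4, 8, 16]

# Arithmetic mean: sum of values divided by their count.
am = statistics.mean(values)                 # (4 + 8 + 16) / 3

# Weighted arithmetic mean: unequal importance given to observations.
weights = [1, 2, 3]
wam = sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Geometric mean: nth root of the product; undefined if any value is 0.
gm = math.prod(values) ** (1 / len(values))  # (4 * 8 * 16) ** (1/3)

# Harmonic mean: reciprocal of the arithmetic mean of reciprocals,
# e.g. for averaging speeds over a fixed distance.
hm = len(values) / sum(1 / v for v in values)

print(am, wam, gm, hm)
```

For any positive data set the three means satisfy H.M. ≤ G.M. ≤ A.M., with equality only when all observations are identical.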
ii. Median: There are situations in which a data set has extreme values at the lower or higher end, termed outliers in statistical language. In such cases the arithmetic mean is not desirable to use, since it easily gets affected by those extreme values. For example, if the data are 2, 3, 5, 2, 22, the mean will be 6.8, which cannot be considered a good representative of the data. Hence, another measure of location, the median, is used in such cases. Further, whenever the exact values of some observations are not available, the median is used.
The median is the point that divides a distribution of scores into two equal parts, one part comprising all values greater and the other all values less than the median. To be remembered: the median is a hypothetical point in the distribution; it may or may not be an actual score.
iii. Mode: The third central tendency statistic is the mode. The mode is defined as the 'most fashionable' value, i.e. the observation occurring most frequently in a set of data. For example, in the data series 2, 3, 4, 2, 2, 6 and 9, the mode is 2 because 3 observations have this value. The mode is frequently used in cases where complete data are not available, as well as when the data is in qualitative form, where only data regarding the presence/absence of the observation can be obtained.
b) Measures of variation or dispersion: In addition to central tendency, every data set can be characterized by its variation and shape. Two or more data sets may have the same central tendency but wide disparities in the formation of the set. Variation measures the dispersion, or disparities, of the values in a data set. Dispersion may be defined as statistical summaries throwing light on the differences of items from one another or from an average. Most commonly used in statistics are the standard deviation and variance, but there are many others, discussed below:
i. Range: The range is the simplest measure of variation in a set of data, and is defined as the difference between the maximum and minimum values of the observations. However, since it depends only on the minimum and maximum values and does not utilize the full information in the data, it is not considered very reliable.
ii. Semi Inter-Quartile Range or Quartile Deviation: Quartiles split a set of data into four equal parts: the first quartile Q1 carries 25% of the data set values below it; the second quartile Q2, 50%; and the third quartile Q3, 75%. The interquartile range (also called midspread) is the difference between the third and first quartiles in a data set, i.e., Q3 - Q1. The interquartile range measures the spread in the middle 50% of the data. However, a much more popular measure of variation is the Semi Inter-Quartile Range or Quartile Deviation, defined as (Q3 - Q1)/2.
iii. Mean or Average Deviation: Mean Deviation is defined as the average of the differences of individual items from some average of the series, which can be the mean, median or mode. Such a difference of an individual item from some average value is termed a deviation. While calculating the mean deviation, all deviations are treated as positive, ignoring the actual sign.
iv. Variance and Standard Deviation: In the mean deviation the negative sign is ignored, as otherwise the total deviation comes out to be zero, since similar values with opposite signs cancel each other. Another way of getting over this problem of the total deviation being zero is to take the squares of the deviations of the observations from the mean. The sum of squared deviations divided by the number of observations is known as the variance, and its square root is known as the standard deviation. Karl Pearson introduced these terms.
c) Measure of Asymmetry (skewness): A distribution of data values is either symmetrical or skewed. In a symmetrical distribution, the values below the mean are distributed exactly as the values above the mean; the low and high values balance each other out. In a skewed distribution, the values are not symmetrical around the mean, resulting in an imbalance of low or high values. Skewness is a measure of asymmetry in data. The data can be negatively (or left) skewed or positively (or right) skewed. In a left-skewed distribution, most of the values are in the upper portion of the distribution, whereas in a right-skewed distribution, most of the values are in the lower portion.
If the distribution is skewed, the extent of skewness can be measured by Bowley's Coefficient of Skewness or Pearson's Measure of Skewness.
Kurtosis is an indicator of the peakedness of a distribution. Karl Pearson called it "Convexity of Curve". A bell-shaped or normal curve is Mesokurtic, whereas a curve more peaked than the normal curve is Leptokurtic and a curve flatter than the normal curve is Platykurtic.
Knowing the shape of a distribution is necessary, since some assumptions about shape are made for the use of certain statistical methods.
d) Measures of Relationship: Very often, researchers are interested in studying the relationship between two or more variables, which is done with the help of correlation and regression analysis. The ideas identified by the terms correlation and regression were developed by Sir Francis Galton in England.
Correlation is a statistical technique that describes the degree of relationship between two variables, in which with a change in the value of one variable, the value of the other variable also changes. The degree of correlation between two variables is called simple correlation. The degree of correlation between one variable and several other variables is called multiple correlation.
The simplest and yet probably the most useful graphical technique for displaying the relationship between two variables is the scatter diagram (also called scatter plot). Here, the data for two variables are plotted on the x and y axes of a graph. If the points are scattered around a straight line, the correlation is linear, and if the points are scattered around a curve, the correlation is non-linear (curvilinear).
The scatter plot gives a rough indication of the nature and strength of the relationship between two variables. The quantitative measurement of the degree of correlation between two variables is performed by the coefficient of correlation. It was developed by Karl Pearson, the great biologist and statistician, hence referred to as the "Pearsonian Correlation Coefficient" (also known as the product moment correlation coefficient). It is denoted by the Greek letter ρ (rho) when calculated from population values, and by 'r' when calculated from sample values. The value of the coefficient of correlation varies between two limits, +1 and -1. The value +1 shows a perfect positive relationship between variables, -1 shows perfect negative correlation and 0 indicates zero correlation. If the relationship between two variables is such that with an increase in the value of one, the value of the other increases or decreases in a fixed proportion, correlation between the variables is said to be perfect. Perfect positive correlation means an increase in one variable brings an increase in the other in the same proportion, and vice versa. Perfect negative correlation means an increase in one variable decreases the other variable in the same proportion. Zero correlation shows there is no linear relationship between the two variables. It is to be noted that 'r' indicates the extent of only the linear relationship; a zero value only indicates there is no linear relationship, but there could be some other, non-linear relationship.
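The dispersion measures described under b) can be sketched in Python for a small invented data set. Note that several quartile conventions exist; `statistics.quantiles` with the inclusive method is one of them, so textbook answers may differ slightly at the quartiles.

```python
import math
import statistics

# Hypothetical set of 8 observations (illustrative data only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum.
rng = max(data) - min(data)

# Quartile deviation: (Q3 - Q1) / 2, using inclusive quartiles.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

# Mean deviation about the mean: average of absolute deviations,
# i.e. all deviations treated as positive.
mean = statistics.mean(data)
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

# Population variance: squared deviations averaged; its square
# root is the standard deviation.
variance = sum((x - mean) ** 2 for x in data) / len(data)
sd = math.sqrt(variance)

print(rng, quartile_deviation, mean_deviation, variance, sd)
```

Squaring the deviations (rather than dropping signs, as the mean deviation does) is what makes the variance and standard deviation tractable in further calculations.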
The Pearsonian Correlation Coefficient discussed above is applicable only when data is in interval or ratio form, i.e. when quantitative measurement of variables such as height, weight, temperature and income is possible. In some cases, such as beauty, honesty or similar traits, data is available only in ordinal or rank form, and Karl Pearson's formula for the correlation coefficient is not applicable. Hence, Charles Edward Spearman in 1904 developed a measure called 'Spearman Rank Correlation' to measure the correlation between the ranks of two variables. It is denoted as rs. The value of this correlation coefficient also ranges between +1 and -1. Spearman rank correlation is said to be a non-parametric or distribution-free method, since it does not require the assumption of a normal distribution for both variables. One similar kind of method used for measuring the association between ranks of variables is the Kendall Tau rank correlation.
When one or both of the variables is in categorical form, i.e. not measurable but such that, on the basis of presence or absence in each case, it is possible to know their frequency or total number of occurrences, the data is said to be on a nominal scale. In such cases, to know the association between two attributes, the 'coefficient of contingency' or 'coefficient of mean square contingency' introduced by Karl Pearson is used.
Correlation analysis deals with exploring the correlation between two or more variables, whereas regression analysis attempts to establish the nature of the relationship between variables, that is, to study the functional relationship between the variables and thereby provide a mechanism for prediction, or forecasting. For example, correlation tells us there is a strong relation between advertisement and sales; regression will predict how much increase in advertisement will give how much increase in sales.
Regression analysis can be of two types: simple (dealing with two variables) and multiple (dealing with more than two variables). If the relationship between two variables, one independent (or predictor, or explanatory) variable and the other dependent (or explained) variable, is a linear function or a straight line, then the linear function is called the simple regression equation, and the straight line is known as the regression line. It is a "line of best fit", i.e. the line on which the difference between the actual and estimated values will be minimum. The simple regression equation is used to make predictions.
y = a + bx OR x = a + by are the two possible regression equations in the case of two variables involved in regression analysis. The first is said to be the regression equation of y on x, and so on. In the first equation y, and in the second equation x, is the dependent variable, whereas x in the first equation and y in the second is the independent variable. Here, 'a' and 'b' are constants: 'a' is the intercept and 'b' is the slope or inclination, most popularly known as the regression coefficient. The regression coefficient gives the change in the dependent variable when the independent variable changes by 1 unit. To estimate the relationship between x and y it is vital to determine 'a' and 'b'. This is done through the Principle of Least Squares, which also provides the criterion for selecting the "line of best fit" mentioned in the last paragraph.
In case a curved relationship is found between variables, the correlation ratio eta (η) gives the degree of their association. It may be noted that the methods discussed until now involved only two variables, i.e., simple correlation analysis and regression analysis. However, very often one is required to study the relation between more than two variables: the impact of several independent variables, jointly together, on a dependent variable. This is possible through multiple correlation and multiple regression analysis respectively. Here, multiple correlation coefficients are obtained, which indicate the relation between one dependent variable and several independent variables, by using multiple regression equations.
When the correlation between any two variables is analyzed while the effect of a third variable on these two variables is held constant or removed, such analysis is known as partial correlation analysis, and such correlation is termed the partial correlation coefficient. Similarly, the partial regression coefficient is the value indicating the change that will be caused in the dependent variable by a unit change in an independent variable when the other independent variables are held constant. As a matter of fact, multiple correlation coefficients and multiple regression coefficients are applicable only in the case of ratio or interval data. In the case of ordinal data such correlation can be enumerated by Kendall partial rank correlation, and in the case of nominal data discriminant analysis is used. (See Table 1)

Table 1: Choice of relationship analysis tool based on number of variables and scale of measurement

For two variables (i.e. simple correlation):
  Interval or ratio data: Pearson product moment correlation coefficient
  Ordinal data: Spearman rank order correlation coefficient or Kendall Tau rank correlation
  Nominal data: Contingency coefficient
For more than two variables (i.e. multiple correlation):
  Interval or ratio data: Multiple regression analysis
  Ordinal data: Kendall partial rank correlation
  Nominal data: Discriminant analysis

Source: Compiled by Authors

2. Inferential Analysis: Inferential analysis is mainly concerned with (a) the estimation of population values such as the population mean and population standard deviation, and (b) various tests of significance/testing of hypotheses. Inferential analysis plays a major role in statistics, since it is mostly not possible to cover the whole population while conducting research; hence a sample is chosen, and using inferential analysis the sample values obtained are used to infer about the population. The objective of inferential analysis is to use the information contained in a small sample of observations for drawing a conclusion or making an inference about the larger population. Such an inference may be in the form of estimation or testing of hypotheses or assumptions. For example, one could either estimate a population parameter based on a sample statistic, like the 'mean life of a car battery', or one could test the claim of a company that the 'mean life of a car battery is 3 years'. In both cases an inference about the population is made. There are various methods of estimation and testing of hypotheses.
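The correlation and regression machinery described above can be sketched numerically. In this minimal Python illustration (the advertisement and sales figures are invented), r is computed from the sums of squares and cross-products, and the least-squares constants 'a' and 'b' give the regression line of y on x used for prediction.

```python
import statistics

# Hypothetical advertisement spend (x) and sales (y); illustrative only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = statistics.mean(x), statistics.mean(y)

# Sums of cross-products and squares of deviations from the means.
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

# Pearson correlation coefficient r.
r = sxy / (sxx * syy) ** 0.5

# Least-squares regression of y on x: y = a + b*x, where the regression
# coefficient b is the change in y per unit change in x.
b = sxy / sxx
a = my - b * mx

predicted = a + b * 6        # forecast of y for x = 6
print(f"r = {r:.3f}, y = {a:.2f} + {b:.2f}x, prediction at x=6: {predicted:.2f}")
```

Note how the two analyses share their building blocks: b equals r scaled by the ratio of the two standard deviations, which is why a strong correlation is a precondition for a useful regression forecast.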
a) Estimation: Estimation deals with the estimation of parameters such as the population mean based on sample values. The method or rule of estimation is called an estimator, like the sample mean; the value which the method or rule gives in a particular case is the estimate of the population parameter. In other words, an estimator is a function of sample values used to estimate a parameter of the population. With the help of a sample of observations, an estimate can be given in the form of a specific number, like 25 years, or in the form of an interval, like 23-27 years. In the former case it is referred to as a point estimate, whereas in the latter case it is termed an interval estimate.
i. Point estimate: A point estimate is used to estimate a population parameter with the help of a sample of observations. A point estimate is a single value, say 50; this number is taken as the best value of the unknown population parameter. An estimator is said to be efficient if it has minimum variance, such as the sample arithmetic mean. There are several methods of estimating the parameters of a distribution, such as maximum likelihood, least squares, the method of moments and minimum chi-square.
ii. Interval estimate: A point estimate gives a single value, taken as the best estimate of the parameter. However, if other data are collected from the same population, the point estimate may change. In a real-life situation the population parameter may not be exactly equal to the sample statistic, but could be around this value. Thus it may be more logical to assume that the population value lies in an interval containing the sample value, such as 48-52, known as an interval estimate. It is expected that the true value of the population will fall within this interval with the desired level of confidence; hence the name 'confidence interval' is given.
The interval should be within reasonable limits, and these limits are statistically calculated. The limits or intervals so arrived at are referred to as confidence intervals or confidence limits. Since we are estimating a population parameter from sample values, we can never make any estimation with 100% confidence. The desired confidence for estimation is termed the confidence level. Usually, a 95% level of confidence is considered adequate. One can then state: 'with 95% confidence we can say that the population parameter will fall somewhere within the confidence interval of 40-50'.
b) Testing of Hypothesis/Test of Significance: In most cases it is almost impossible to get knowledge about a population parameter; therefore, hypothesis testing, or the test of significance, is the often-used strategy for deciding whether a sample offers such support for a hypothesis or assumption that generalizations about the population can be made. In other words, the test can find the probability that a sample statistic would differ from a parameter or another sample.
Hypothesis testing typically begins with some assumption or hypothesis or claim about a particular parameter of a population. It could be a parameter of a distribution, like the mean, describing the population; the parameters of two or more populations; or correlations or associations between two or more characteristics of a population. Hypotheses can be of two types: null and alternative hypotheses. The null hypothesis is considered to be a hypothesis of "no relationship", such as 'there is no significant difference between sample means'. The term null hypothesis is said to have been introduced by R. A. Fisher. The word "null" is used because the nature of testing is that we try our best to nullify or reject this hypothesis based on the sample collected. When the null hypothesis is rejected, the opposite of the null hypothesis, i.e. the alternative hypothesis, is automatically accepted. The alternative hypothesis is the statement
The null hypothesis is denoted as Ho and the alternative hypothesis is denoted as Ha. It has to be kept in mind that we cannot prove a hypothesis to be true; we may only find evidence that supports the hypothesis. Suppose we have failed to reject the null hypothesis: this does not mean the null hypothesis has been proven to be true, because the decision is made only on the basis of sample information.
Once the null and alternative hypotheses have been set up, the next step is to decide on the level of significance. It is used as a criterion for rejecting the null hypothesis. It is expressed as a percentage like 5% or 1%, or sometimes as 0.05 or 0.01. It is the level at which we are likely to reject the null hypothesis even if it is true. Next, a decision is taken on the appropriate statistic, such as t, z, F etc. Based on the level of significance, the critical or tabulated value is found. After calculating the statistic from the given sample of observations, the test statistic is compared with the critical value. If the calculated value (statistic) is equal to or less than the critical value, the difference between the result and the expected value is insignificant, and this insignificant difference can be attributed to sampling error; hence the null hypothesis is accepted. Whereas, if the calculated value is higher than the critical value, the difference is said to be significant and cannot be attributed to sampling error; therefore the null hypothesis is rejected.
Whenever we take a decision about a population based on a sample, the decision cannot be 100% reliable. The possibilities are: we could reject the null hypothesis even if it is true, termed a Type I error, denoted as α; or we could accept the null hypothesis even if it is false, termed a Type II error, denoted as β.
The Type I error is also referred to as the level of significance, as discussed above. The quantity 1 - β is called the 'power' of the test, signifying the test's ability to reject the null hypothesis when it is false, and 1 - α is called the confidence coefficient.
Various tests of significance have been developed to meet various types of requirements. They may be broadly classified into parametric and non-parametric tests. Parametric tests are based on the assumption that the observations are drawn from a normal distribution. Since the testing procedure requires assumptions about the type of population or parameter values, these tests are known as 'parametric tests'. The tests of significance developed for situations where this condition is not satisfied are known as 'non-parametric tests' or 'distribution-free tests'. As a matter of fact, parametric tests are more powerful than non-parametric tests.
The various parametric and non-parametric tests of significance, performing different functions in different conditions, are mentioned below in tabular form:

Table 2: Choice of parametric/non-parametric test based on function to perform & scale of measurement

Function: Test of significance of one sample
  Parametric tests (Interval/Ratio Data): 't' (mean known, S.D. unknown); 'z' (mean known, S.D. known)
  Non-Parametric Tests (Ordinal/Nominal Data): Sign test
which is intended to be accepted if the null hypothesis is rejected.
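The estimation and testing steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the sample data and the hypothesized mean of 50 are invented for the example, and the normal ('z') approximation is used in place of the t distribution:

```python
import math
from statistics import NormalDist, mean, stdev

# Invented sample data (e.g. scores), for illustration only.
sample = [48, 52, 50, 47, 53, 49, 51, 50, 46, 54, 50, 49, 52, 48, 51]

n = len(sample)
x_bar = mean(sample)      # point estimate of the population mean
s = stdev(sample)         # sample standard deviation (population S.D. unknown)

# 95% confidence interval: x_bar +/- z * s / sqrt(n),
# where z is the 97.5th percentile of the standard normal distribution.
z = NormalDist().inv_cdf(0.975)               # approx. 1.96
margin = z * s / math.sqrt(n)
ci = (x_bar - margin, x_bar + margin)

# Two-tailed test of H0: population mean = 50 vs H1: mean != 50.
mu0 = 50
z_stat = (x_bar - mu0) / (s / math.sqrt(n))   # standardized test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

# Reject H0 when p_value falls below alpha, the Type I error rate.
alpha = 0.05
reject = p_value < alpha
```

Here alpha is the chosen level of significance; with 1-α = 0.95, `ci` is the 95% confidence interval discussed above, and `reject` records the decision on the null hypothesis.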

Various parametric and non-parametric tests of significance, performing different functions in different conditions, are presented below in tabular form:

Table 2: Choice of parametric/non-parametric test based on function to perform & scale of measurement

1. Test of significance of one sample
   Parametric tests (interval/ratio data): 't' test (mean known, S.D.* unknown); 'z' test (mean known, S.D. known)
   Non-parametric tests (ordinal/nominal data): sign test

2. Test of significance for difference between two independent samples
   Parametric tests: independent samples 't' test (S.D. unknown); 'z' test (S.D. known)
   Non-parametric tests: Kolmogorov-Smirnov two-sample test, Mann-Whitney U test or Wilcoxon rank-sum test (ordinal data); chi-square test (nominal data)

3. Test of significance for difference between two paired samples (series of samples taken from the same population)
   Parametric tests: paired 't' test
   Non-parametric tests: Wilcoxon matched-pairs signed-ranks test (ordinal data); McNemar test for the significance of changes (nominal data)

4. Test of significance for difference between a series of independent samples
   Parametric tests: 'F' test & one-way ANOVA (analysis of variance)
   Non-parametric tests: Kruskal-Wallis rank-sum test (H test) or Wilcoxon-Wilcox multiple comparison test (ordinal data); chi-square test for k independent samples (nominal data)

5. Test of significance for difference between a series of paired samples
   Parametric tests: repeated-measures ANOVA (analysis of variance)
   Non-parametric tests: Friedman two-way analysis of variance (ordinal data); Cochran Q test (nominal data)

Source: Compiled by Authors
(*S.D. is the standard deviation of the population)
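As one worked example from Table 2, the sign test (the non-parametric one-sample test in the first row) can be implemented in a few lines. This is an illustrative sketch using only the Python standard library; the ratings and the hypothesized median are invented data:

```python
from math import comb

def sign_test_p(sample, hypothesized_median):
    """Exact two-sided sign test. Under H0 the population median equals
    the hypothesized value, so each observation is equally likely to
    fall above or below it; ties are discarded."""
    plus = sum(1 for x in sample if x > hypothesized_median)
    minus = sum(1 for x in sample if x < hypothesized_median)
    n = plus + minus                       # ties excluded
    k = min(plus, minus)
    # Two-sided p-value from the Binomial(n, 1/2) distribution.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Invented ordinal-style ratings, tested against a hypothesized median of 3.
ratings = [4, 5, 3, 4, 2, 5, 4, 4, 5, 1, 4, 5]
p = sign_test_p(ratings, 3)    # H0 retained at the 5% level if p >= 0.05
```

Because the sign test uses only the direction of each observation relative to the hypothesized median, it needs no distributional assumption, which is exactly why it sits in the non-parametric column of the table.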

Apart from the above functions, the chi-square test, denoted χ², is used as a test of goodness of fit, i.e. how well observed values fit with expected values, and as a test of independence, i.e. it tests for an existing association between two categorical variables. The non-parametric alternative for the χ² test of goodness of fit is the Kolmogorov-Smirnov test. Apart from that, the 't' test is used for testing the significance of a correlation or regression coefficient (slope); the 'F' test and ANOVA are used for testing the significance of multiple regression coefficients.
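The chi-square test of independence just described can be sketched for the simplest case of a 2x2 contingency table. The counts below are invented for illustration; with one degree of freedom the p-value can be obtained exactly from the complementary error function, so only the standard library is needed:

```python
import math

def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 contingency table
    given as [[a, b], [c, d]]. Returns (statistic, p_value), using
    P(chi2_1 > x) = erfc(sqrt(x / 2)) for one degree of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    grand = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Invented counts: two groups (rows) against two categories (columns).
observed = [[30, 10],
            [20, 40]]
stat, p = chi_square_2x2(observed)
significant = p < 0.05   # association between the two variables
```

For larger tables the same expected-count calculation applies, but the p-value then requires the chi-square distribution with (r-1)(c-1) degrees of freedom rather than the one-degree-of-freedom shortcut used here.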
V. Conclusions
For any successful study, analysis of data is one of the most crucial steps. It is always advisable that the analysis be designed before the data are actually collected; otherwise there is always the danger of being too late and the chance of missing out on relevant facts. There are a number of analytical tools that can be used for summarizing the data and making inferences about the population based on sample values. But to use any tool, certain assumptions have to be fulfilled; therefore, these assumptions always have to be kept in mind by the researcher before applying any analytical tool. That is why it is said that analysis requires a lot of experience and knowledge in the field of data analysis.

Author's Profile
Niharika Singh is a research scholar (SRF) in the Department of Management, Mizoram University, Aizawl. She is pursuing her Ph.D. in the area of employee retention. She has published 1 article in an international journal and contributed 3 chapters to edited books. Miss Singh has presented 6 papers and participated in 8 national and international seminars.

Dr. Amit Kumar Singh is Assistant Professor in the Department of Management, Mizoram University, Aizawl. He received his education from BHU Varanasi, VBS PU Jaunpur and IIM Ahmedabad. He has published 2 books and more than 35 research papers in international/national journals and edited books, and has presented or delivered lectures in more than 50 seminars/workshops/conferences. He has also served as a member of different academic bodies like BOS, School Board and Academic Council (as Head i/c). Currently he is on the advisory board of 12 international journals. Dr. Singh is also working as a Japan International Cooperation Agency (JICA) service consultant for the Department of Forest, Government of Mizoram.
