Basic Functions of Excel
C) AVERAGE: - The average is the sum of all the numbers divided by
the number of addends (respondents).
E) MINIMUM: - The MIN function returns the smallest value from a
supplied set of numeric values.
CHAPTER 2: GRAPHICAL REPRESENTATION OF DATA
TYPES OF DIAGRAMS:
2) A bar graph represents categories on one axis and discrete
values on the other. The goal is to show the relationship between
the two axes.
3) Bar charts can also show large changes in data over time.
IMPORTANCE OF LINE GRAPHS
1) Line graphs are used to track changes over short and long
periods of time.
In the pie chart, the total of all the data is equal to 360 degrees.
The angle used to represent each item is calculated in proportion to
that item's share of the total.
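Outside Excel, this proportional calculation can be sketched in a few lines of Python; the category totals below are hypothetical:

```python
# Each pie-chart slice's angle is proportional to its share of the total,
# so the angles of all slices always add up to 360 degrees.
def pie_angles(values):
    total = sum(values)
    return [v / total * 360 for v in values]

sales = [50, 30, 20]       # hypothetical category totals
print(pie_angles(sales))   # [180.0, 108.0, 72.0]
```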
CHAPTER 3: MEASURES OF CENTRAL TENDENCY
1) MEAN: mean is the sum of the value of each observation in
a dataset divided by the number of observations. This is also
known as the arithmetic average.
Merits of mean:
1) It is capable of being treated mathematically and hence it is widely
used in statistical analysis.
2) It is based on all observations and can be regarded as
representative of the given data.
3) It is least affected by the fluctuation of sampling.
Demerits of mean:
1) It can be determined neither by inspection nor by graphical location.
2) It is greatly affected by extreme observations and hence does not
adequately represent data containing extreme points.
FORMULA: - =AVERAGE(number1, number2, …)
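The same calculation can be sketched outside Excel with Python's standard-library statistics module; the observations below are hypothetical:

```python
import statistics

# Arithmetic mean: the sum of the observations divided by their count,
# the same calculation Excel's =AVERAGE performs.
data = [10, 20, 30, 40]                 # hypothetical observations
print(statistics.mean(data))            # 25

# Demerit in action: a single extreme observation pulls the mean sharply.
print(statistics.mean(data + [400]))    # 100
```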
2) MEDIAN: The median is that value of the series which divides the
group into two equal parts, one part comprising all values greater than
the median value and the other part comprising all the values smaller
than the median value.
Merits of median
1) The median value is not distorted by the extreme values of the series.
2) The median is always a definite, specific value in the series.
3) The median value can be estimated also through the graphic
presentation of data.
Demerits of median
1) The median is of limited representative character, as it is not based on all
the items in the series.
2) When the median is located somewhere between the two middle
values, it remains only an approximate measure, not a precise value.
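A short Python sketch of both points above, using hypothetical data:

```python
import statistics

# Median: the middle value of the sorted series; it splits the data into
# two equal parts and is not pulled around by extreme values.
print(statistics.median([10, 20, 30, 40, 400]))   # 30

# Demerit in action: with an even number of items the median falls between
# the two middle values, so it is only an approximate (interpolated) figure.
print(statistics.median([10, 20, 30, 40]))        # 25.0
```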
3) MODE: The value of the variable which occurs most
frequently in a distribution is called the mode.
Merits of mode
1) The calculation of mode does not require knowledge of all
the items and frequencies of a distribution.
2) Mode is less affected by marginal values in the series.
3) Mode is a very simple measure of central tendency.
Demerits of mode
1) Mode is an uncertain and vague measure of the central
tendency.
2) Mode is not capable of further algebraic treatment.
3) When the frequencies of all items are identical, it is difficult
to identify the modal value.
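Both the merit and the demerit above can be sketched in Python (hypothetical data):

```python
import statistics

# Mode: the most frequently occurring value in the distribution.
print(statistics.mode([2, 3, 3, 5, 3, 7, 5]))   # 3

# Demerit in action: when all items occur equally often there is no single
# modal value; statistics.multimode returns every candidate.
print(statistics.multimode([1, 2, 3]))          # [1, 2, 3]
```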
CHAPTER 4: MEASURES OF DISPERSION
What is Dispersion?
Dispersion is the tendency of data to be scattered over a range. It
is an important feature of a frequency distribution and is also
called spread or variation. Range, quartile deviation and standard
deviation are all measures of dispersion.
Measures of Dispersion
Measures of dispersion are important for describing the spread of the
data, or its variation around a central value. Two distinct samples may
have the same mean or median, but completely different levels of
variability, or vice versa. A proper description of a set of data should
include both of these characteristics. There are various methods that
can be used to measure the dispersion of a dataset, each with its own
set of advantages and disadvantages. They are as follows:
Methods to Calculate:-
1) RANGE: - A range is the most common and easily understandable
measure of dispersion. It is the difference between two extreme
observations of the data set.
IMPORTANCE OF RANGE
1) It is the simplest of the measures of dispersion
2) Easy to calculate and understand
3) Independent of change of origin
FORMULA: RANGE=Xmax-Xmin
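The formula above is a one-liner in Python (the data are hypothetical):

```python
# Range: the difference between the two extreme observations,
# RANGE = Xmax - Xmin.
def value_range(data):
    return max(data) - min(data)

print(value_range([7, 2, 9, 4, 11]))   # 9
```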
2) QUARTILE DEVIATION: - half the difference between the third and
first quartiles.
IMPORTANCE OF QUARTILE DEVIATION
2) It uses half of the data
3) Independent of change of origin
4) The best measure of dispersion for open-end classification
FORMULA: QD = (Q3 − Q1) / 2
Q1 = QUARTILE(array, 1)
Q3 = QUARTILE(array, 3)
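The quartile deviation can also be sketched in Python; statistics.quantiles with method="inclusive" uses the same interpolation as Excel's QUARTILE function, and the data below are hypothetical:

```python
import statistics

# Quartile deviation (semi-interquartile range): QD = (Q3 - Q1) / 2.
data = [2, 4, 6, 8, 10, 12, 14]   # hypothetical observations
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
qd = (q3 - q1) / 2
print(q1, q3, qd)                  # 5.0 11.0 3.0
```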
3) STANDARD DEVIATION: A standard deviation is the positive
square root of the arithmetic mean of the squares of the deviations of the
given values from their arithmetic mean. It is denoted by a Greek letter
sigma, σ. It is also referred to as root mean square deviation.
FORMULA: σ = √( Σ(x − x̄)² / N )
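The definition above can be checked directly in Python; the data are a hypothetical sample:

```python
import math
import statistics

# Standard deviation per the definition above: the positive square root of
# the mean of the squared deviations from the arithmetic mean.
data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
mean = statistics.mean(data)
sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / len(data))
print(sigma)                       # 2.0

# statistics.pstdev performs the same population calculation.
print(statistics.pstdev(data))     # 2.0
```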
CHAPTER 5: SKEWNESS AND KURTOSIS
KURTOSIS:
The degree of tailedness of a distribution is measured by kurtosis.
It tells us the extent to which the distribution is more or less
outlier-prone (heavier- or lighter-tailed) than the normal
distribution. Kurtosis is all about the tails of the distribution,
not the peakedness or flatness. It describes extreme values in one
tail versus the other, and is in effect a measure of the outliers
present in the distribution.
1) MESOKURTIC: (β= 3)
This distribution has a kurtosis statistic similar to that of the
normal distribution. It means that the extreme values of the
distribution are similar to that of a normal distribution
characteristic.
2) LEPTOKURTIC: (β > 3)
The distribution is longer and its tails are fatter. The peak is
higher and sharper than the mesokurtic peak, which means the data
are heavy-tailed, with a profusion of outliers.
Outliers stretch the horizontal axis of the histogram, which makes
the bulk of the data appear in a narrow ("skinny") vertical range,
giving a leptokurtic distribution its "skinniness". Some have thus
characterized leptokurtic distributions as "concentrated toward the
mean," but the more relevant point is that occasional extreme
outliers cause this appearance of concentration.
3) PLATYKURTIC: (β < 3)
The distribution is shorter and its tails are thinner than those of
the normal distribution. The peak is lower and broader than the
mesokurtic peak, which means the data are light-tailed and lack
outliers. This is because extreme values are rarer than in the
normal distribution.
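The β statistic used in the three cases above can be computed from its moment definition; the sample below is hypothetical:

```python
import statistics

# Kurtosis beta = m4 / m2**2: the fourth central moment divided by the
# squared variance. beta = 3 marks the mesokurtic (normal) benchmark;
# beta > 3 is leptokurtic, beta < 3 platykurtic.
def kurtosis_beta(data):
    mean = statistics.mean(data)
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2

# An evenly spread sample with no outliers is light-tailed (platykurtic).
print(kurtosis_beta([1, 2, 3, 4, 5]))   # 1.7 (< 3)
```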
SKEWNESS
Skewness is asymmetry in a statistical distribution, in which the
curve appears distorted or skewed either to the left or to the
right. It is the degree of distortion from the symmetrical bell
curve or the normal distribution. Skewness can be quantified to
define the extent to which a distribution differs from a normal
distribution. It measures the lack of symmetry in data
distribution.
It differentiates extreme values in one versus the other tail. A
symmetrical distribution will have a skewness of 0.
TYPES OF SKEWNESS:
1) SYMMETRIC SKEW: NORMAL
2) ASYMMETRIC SKEW: POSITIVE SKEW AND NEGATIVE SKEW
[Figure: symmetric and asymmetric distributions]
MEASURES OF SKEWNESS
1) KARL PEARSON’S COEFFICIENT OF SKEWNESS:
This method is the one most frequently used for measuring skewness.
The coefficient of skewness is:
Sk = (Mean − Mode) / Standard Deviation
or, when the mode is ill-defined, the equivalent median-based form:
Sk = 3 × (Mean − Median) / Standard Deviation
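A minimal Python sketch of the median-based form of Pearson's coefficient, Sk = 3 × (Mean − Median) / σ, with hypothetical data:

```python
import statistics

# Karl Pearson's coefficient of skewness, median-based form:
# Sk = 3 * (mean - median) / standard deviation.
def pearson_skewness(data):
    return (3 * (statistics.mean(data) - statistics.median(data))
            / statistics.pstdev(data))

# A hypothetical sample with a long right tail gives a positive coefficient.
print(pearson_skewness([1, 2, 2, 3, 3, 3, 4, 10]))
```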
2) BOWLEY’S COEFFICIENT OF SKEWNESS:
This method is based on quartiles. The value of this coefficient of
skewness is zero for a symmetrical distribution; if the value is
greater than zero the distribution is positively skewed, and if it
is less than zero the distribution is negatively skewed. This
method is particularly useful for open-end distributions and where
extreme values are present. Also, when positional measures are
called for, skewness should be measured by Bowley's coefficient of
skewness.
FORMULA: - Sk = (Q3 + Q1 − 2 × Median) / (Q3 − Q1)
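Bowley's quartile-based coefficient can be sketched in Python (hypothetical data; statistics.quantiles with method="inclusive" matches Excel's QUARTILE interpolation):

```python
import statistics

# Bowley's quartile coefficient of skewness:
# Sk = (Q3 + Q1 - 2*Median) / (Q3 - Q1); zero for a symmetric series.
def bowley_skewness(data):
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 + q1 - 2 * q2) / (q3 - q1)

print(bowley_skewness([2, 4, 6, 8, 10]))   # 0.0 (symmetric series)
```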
To apply this percentile-based coefficient of skewness, we first
have to determine the values of the 10th, 50th and 90th percentiles.
CHAPTER 6: CORRELATION
Correlation is a statistical measure that indicates the extent to
which two or more variables fluctuate together. It is a statistical
method which enables the researcher to find whether two
variables are related and to what extent they are related.
Correlation can be described as the sympathetic movement of two or
more variables.
TYPES OF CORRELATION
1) POSITIVE CORRELATION
2) NEGATIVE CORRELATION
3) NO CORRELATION
2) NEGATIVE CORRELATION: - Negative correlation occurs when an
increase in one variable decreases the value of another.
3) NO CORRELATION: - No correlation occurs when there is no linear
dependency between the variables.
CHAPTER 7:- REGRESSION
Regression analysis is used to estimate the relationships between
two or more variables. The dependent variable is the main factor
you are trying to understand and predict; the independent variables
are the factors that might influence it. Regression analysis helps
you understand how the dependent variable changes when one of the
independent variables varies, and allows you to determine
mathematically which of those variables really has an impact.
R SQUARE: - It is the Coefficient of Determination, which is used
as an indicator of the goodness of fit; it shows how closely the
data points fall to the regression line. The R2 value is calculated
from the total sum of squares: more precisely, the sum of the
squared deviations of the original data from their mean. In our
example, R2 is 0.86, which is a good fit. It means that 86% of the
variation in the dependent variable (y-values) is explained by the
independent variables (x-values).
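R squared can be computed from its definition in a few lines of Python; the observed and fitted values below are hypothetical:

```python
# R squared (coefficient of determination) from its definition:
# R^2 = 1 - (residual sum of squares) / (total sum of squares).
def r_squared(y, y_pred):
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - p) ** 2 for v, p in zip(y, y_pred))
    return 1 - ss_res / ss_tot

y_obs = [3, 5, 7, 10]        # hypothetical observed values
y_fit = [3, 5, 7, 9]         # hypothetical fitted values from a regression
print(r_squared(y_obs, y_fit))   # close to 1 -> good fit
```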
P – VALUE: - The p-value for each term tests the null hypothesis
that the coefficient is equal to zero (no effect). A low p-value (<
0.05) indicates that you can reject the null hypothesis. In other
words, a predictor that has a low p-value is likely to be a
meaningful addition to your model, because changes in the
predictor's value are related to changes in the response variable.
REGRESSION GRAPH: -
[Scatter plot of Y values (0 to 80,000) against X values (0 to
70,000) with a fitted linear trend line, Linear (Y)]
An ANOVA test (analysis of variance) is a way to find out whether
survey or experiment results are significant. In other words, it
helps you figure out whether you need to reject the null hypothesis
or accept the alternative hypothesis. Basically, you're testing
groups to see if there's a difference between them. For example, a
manufacturer has two different processes to make light bulbs and
wants to know whether one process is better than the other.
ONE-WAY ANOVA: - ANOVA – SINGLE FACTOR
The one-way analysis of variance (ANOVA) is used to determine
whether there are any statistically significant differences
between the means of three or more independent (unrelated)
groups.
In a one-way ANOVA there are two possible hypotheses.
(1) The null hypothesis (H0) is that there is no difference between
the groups: all the group means are equal.
(2) The alternative hypothesis (H1) is that at least one group mean
differs from the others.
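The F statistic behind this test can be computed from first principles; the group data below are hypothetical (e.g. bulb lifetimes under three processes):

```python
import statistics

# One-way ANOVA F statistic from first principles:
# F = mean square between groups / mean square within groups.
def anova_f(groups):
    k = len(groups)                                  # number of groups
    n = sum(len(g) for g in groups)                  # total observations
    grand = statistics.mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2
                     for g in groups)
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical lifetimes (hours) from three bulb-making processes.
groups = [[10, 12, 14], [20, 22, 24], [30, 32, 34]]
print(anova_f(groups))   # 75.0 -> far above typical F critical values
```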
F CRIT: - This is the critical value of F. If the calculated F
value is beyond the F critical value, we reject the null
hypothesis; if the F value is within the F critical value, we fail
to reject it. Thus F critical is also known as the line, or point,
of rejection.
A two-way ANOVA test is a statistical test used to determine the
effect of two nominal predictor variables on a continuous outcome
variable. It tests the effect of two independent variables on a
dependent variable, and analyzes their effects on the expected
outcome along with any interaction between them.
EXAMPLE: -
A new fertilizer has been developed to increase the yield on
crops, and the makers of the fertilizer want to better
understand which of the three formulations (blends) of this
fertilizer are most effective for wheat, corn, barley and rice
(crops). They test each of the three blends on one sample of
each of the four types of crops.
HYPOTHESIS: - There are two null hypotheses: one for the rows
and the other for the columns. They are as follows:-
For rows:-
H0: there is no significant difference in yield between the
(population) means of the blends
For columns:-
H0: there is no significant difference in yield between the
(population) means for the crop types
ANALYSIS :-
In this case, for rows, the F value is 12.8264 and the F critical
value is 5.143. The F value is beyond the F critical value, so we
reject the null hypothesis and conclude that there is a significant
difference in yield between the (population) means of the blends.
(2) FOR COLUMNS:-