
CHAPTER 1: BASIC FUNCTIONS OF EXCEL

A) SUM: The total amount coming from the addition of two or more numbers, amounts, or items.

FORMULA: =SUM(number1, number2, …)
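For instance, assuming the sample values 2, 3 and 5 are entered in cells A1:A3, =SUM(A1:A3) would return 10; typing =SUM(2, 3, 5) directly gives the same result.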

B) PRODUCT: The total amount coming from the multiplication of two or more numbers.

FORMULA: =PRODUCT(number1, number2, …)
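For instance, =PRODUCT(2, 3, 4) would return 24; with those sample values in cells A1:A3, =PRODUCT(A1:A3) gives the same result.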

C) AVERAGE: The average is the sum of all the numbers divided by the number of addends (or respondents).

FORMULA: =AVERAGE(number1, number2, …)
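For instance, =AVERAGE(2, 4, 6) would return 4, since (2 + 4 + 6) / 3 = 4.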

D) MAXIMUM: The MAX function returns the largest value from a supplied set of numeric values.

FORMULA: =MAX(number1, number2, …)
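For instance, =MAX(7, 2, 9, 4) would return 9.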

E) MINIMUM: The MIN function returns the smallest value from a supplied set of numeric values.

FORMULA: =MIN(number1, number2, …)
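For instance, =MIN(7, 2, 9, 4) would return 2.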

F) COUNT: The COUNT function returns the number of numeric values in a supplied set of values.

FORMULA: =COUNT(number1, number2, …)
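For instance, =COUNT(1, 2, "apple") would return 2, because COUNT counts only numeric values and ignores text entries.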

CHAPTER 2: GRAPHICAL REPRESENTATION OF DATA

Graphic representation is another way of analyzing numerical data. A graph is a sort of chart through which statistical data are represented in the form of lines or curves drawn across the coordinate points plotted on its surface. Graphs enable us to study the cause-and-effect relationship between two variables. They help to measure the extent of change in one variable when another variable changes by a certain amount. Graphs are also easy to understand and eye-catching.

TYPES OF DIAGRAMS:

1) BAR GRAPHS: A bar graph is a very frequently used graph in statistics. It is a type of graph which contains rectangular bars whose lengths are proportional to the numerical values they represent. In a bar graph, the bars may be plotted either horizontally or vertically, but vertical bar graphs are used more often than horizontal ones.

IMPORTANCE OF BAR GRAPHS

1) A bar diagram makes it easy to compare sets of data between different groups at a glance.
2) The graph represents categories on one axis and a discrete value on the other. The goal is to show the relationship between the two axes.
3) Bar charts can also show big changes in data over time.

2) LINE GRAPHS: A line graph is a kind of graph which represents data as a series of points connected by segments of straight lines. In a line graph, the data points are plotted on a graph and joined together with straight lines.

IMPORTANCE OF LINE GRAPHS

1) Line graphs are used to track changes over short and long periods of time.
2) A line graph gives a clear visual comparison between two variables, which are represented on the X-axis and Y-axis.
3) When smaller changes exist, line graphs are better to use than bar graphs.

3) PIE CHARTS: A pie chart is defined as a chart consisting of a circle divided into sectors. Each sector forms a certain portion of the total (in terms of percentage). In a pie chart, the total of all the data corresponds to 360 degrees, and the angles used to represent the different items are calculated in proportion to their share of the total.

IMPORTANCE OF PIE CHARTS

Pie charts are best to use when you are trying to compare parts of a whole. They do not show changes over time.
CHAPTER 3: MEASURES OF CENTRAL TENDENCY
1) MEAN: The mean is the sum of the values of all observations in a dataset divided by the number of observations. It is also known as the arithmetic average.

Merits of mean:
1) It is capable of being treated mathematically and hence is widely used in statistical analysis.
2) It is based on all observations, so it can be regarded as representative of the given data.
3) It is least affected by the fluctuations of sampling.

Demerits of mean:
1) It can be determined neither by inspection nor by graphical location.
2) It is strongly affected by extreme observations and hence does not adequately represent data containing extreme points.

FORMULA: =AVERAGE(number1, number2, …)
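To illustrate demerit 2 with some sample values, consider 1, 2, 3 and 100: =AVERAGE(1, 2, 3, 100) would return 26.5, a value far above three of the four observations, because the single extreme point pulls the mean upward.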

2) MEDIAN: The median is the value of the series which divides the group into two equal parts, one part comprising all values greater than the median and the other comprising all values smaller than it.

Merits of median
1) The median is not distorted by the extreme values of the series.
2) The median is always a specific value in the series.
3) The median can also be estimated through the graphic presentation of data.

Demerits of median
1) The median is of limited representative character, as it is not based on all the items in the series.
2) When the median is located somewhere between the two middle values, it remains only an approximate measure, not a precise value.

FORMULA: =MEDIAN(number1, number2, …)
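For instance, =MEDIAN(3, 5, 7) would return the exact middle value 5, while =MEDIAN(3, 5, 7, 9) would return 6, the average of the two middle values, illustrating the approximation described in demerit 2.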

3) MODE: The value of the variable which occurs most
frequently in a distribution is called the mode.

Merits of mode
1) The calculation of the mode does not require knowledge of all the items and frequencies of a distribution.
2) The mode is less affected by marginal values in the series.
3) The mode is a very simple measure of central tendency.

Demerits of mode
1) The mode is an uncertain and vague measure of central tendency.
2) The mode is not capable of further algebraic treatment.
3) When the frequencies of all items are identical, it is difficult to identify the modal value.

FORMULA: =MODE(number1, number2, …)
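For instance, =MODE(2, 3, 3, 5, 7) would return 3, the most frequent value. In Excel 2010 and later the equivalent function is MODE.SNGL; MODE is retained for compatibility.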

CHAPTER 4: MEASURES OF DISPERSION
What is Dispersion?
Dispersion is the tendency of data to be scattered over a range. It is an important feature of a frequency distribution and is also called spread or variation. Range, quartile deviation and standard deviation are all measures of dispersion.

Measures of Dispersion
Measures of dispersion are important for describing the spread of the data, or its variation around a central value. Two distinct samples may have the same mean or median but completely different levels of variability, or vice versa. A proper description of a set of data should include both of these characteristics. There are various methods that can be used to measure the dispersion of a dataset, each with its own advantages and disadvantages. They are as follows.

Methods to calculate:

1) RANGE: The range is the most common and most easily understood measure of dispersion. It is the difference between the two extreme observations of the data set.

IMPORTANCE OF RANGE
1) It is the simplest of the measures of dispersion.
2) It is easy to calculate and understand.
3) It is independent of change of origin.

FORMULA: Range = Xmax − Xmin
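Excel has no built-in RANGE function, so the range is usually computed from MAX and MIN. For instance, with sample data in cells A1:A10, =MAX(A1:A10)-MIN(A1:A10) would return the range.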

2) QUARTILE DEVIATION: The quartiles divide a data set into quarters. The first quartile (Q1) is the middle number between the smallest number and the median of the data. The second quartile (Q2) is the median of the data set. The third quartile (Q3) is the middle number between the median and the largest number.

IMPORTANCE OF QUARTILE DEVIATION

1) All the drawbacks of the range are overcome by the quartile deviation.
2) It uses half of the data.
3) It is independent of change of origin.
4) It is the best measure of dispersion for open-end classification.

Q1 = first quartile (25th percentile)
Q2 = second quartile (50th percentile), the median
Q3 = third quartile (75th percentile)

FORMULA: QD = (Q3 − Q1)/2
Q1 = QUARTILE(array, 1)
Q3 = QUARTILE(array, 3)
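As a worked illustration with the sample values 1 through 9 in cells A1:A9, =QUARTILE(A1:A9, 1) would return 3 and =QUARTILE(A1:A9, 3) would return 7, so QD = (7 − 3)/2 = 2. In Excel 2010 and later, QUARTILE.INC is the equivalent of QUARTILE.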

3) STANDARD DEVIATION: The standard deviation is the positive square root of the arithmetic mean of the squares of the deviations of the given values from their arithmetic mean. It is denoted by the Greek letter sigma, σ. It is also referred to as the root mean square deviation.

IMPORTANCE OF STANDARD DEVIATION

1) Squaring the deviations overcomes the drawback of ignoring signs that affects the mean deviation.
2) It is suitable for further mathematical treatment.
3) It is least affected by fluctuations of the observations.
4) The standard deviation is zero if all the observations are constant.
5) It is independent of change of origin.

FORMULA: =STDEVP(number1, number2, …)
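As a worked illustration, the sample values 2, 4, 4, 4, 5, 5, 7, 9 have mean 5 and squared deviations summing to 32, so σ = √(32/8) = 2; =STDEVP(2, 4, 4, 4, 5, 5, 7, 9) would return 2. Note that STDEVP gives the population standard deviation (called STDEV.P in Excel 2010 and later), while STDEV estimates the standard deviation of a sample.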

CHAPTER 5: SKEWNESS AND KURTOSIS
KURTOSIS:
The degree of tailedness of a distribution is measured by kurtosis. It tells us the extent to which the distribution is more or less outlier-prone (heavier- or lighter-tailed) than the normal distribution. Kurtosis is all about the tails of the distribution, not the peakedness or flatness. It is used to describe the extreme values in one tail versus the other, and it is in effect a measure of the outliers present in the distribution.

DIFFERENT CATEGORIES OF KURTOSIS DISTRIBUTION:

1) MESOKURTIC (β = 3):
This distribution has a kurtosis statistic similar to that of the normal distribution, meaning that its extreme values are similar to those of a normal distribution.

2) LEPTOKURTIC (β > 3):
The distribution is longer and its tails are fatter. The peak is higher and sharper than in a mesokurtic distribution, which means that the data are heavy-tailed, with a profusion of outliers.
Outliers stretch the horizontal axis of the histogram, which makes the bulk of the data appear in a narrow ("skinny") vertical range, giving a leptokurtic distribution its "skinny" look. Some have thus characterized leptokurtic distributions as "concentrated toward the mean," but the more relevant point is that occasional extreme outliers cause this appearance of concentration.

3) PLATYKURTIC (β < 3):
The distribution is shorter and its tails are thinner than those of the normal distribution. The peak is lower and broader than in a mesokurtic distribution, which means that the data are light-tailed and lack outliers. The reason is that the extreme values are less extreme than those of the normal distribution.

FORMULA: =KURT(number1, number2, …)
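One caution worth noting: Excel's KURT function returns the sample excess kurtosis, which already subtracts 3 from the β value described above. A mesokurtic (normal-like) data set therefore gives a KURT value near 0, a leptokurtic one a positive value, and a platykurtic one a negative value.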

SKEWNESS
Skewness is asymmetry in a statistical distribution, in which the curve appears distorted or skewed either to the left or to the right. It is the degree of distortion from the symmetrical bell curve of the normal distribution. Skewness can be quantified to define the extent to which a distribution differs from a normal distribution. It measures the lack of symmetry in a data distribution and differentiates extreme values in one tail versus the other. A symmetrical distribution has a skewness of 0.

TYPES OF SKEWNESS:
1) SYMMETRIC SKEW: NORMAL
2) ASYMMETRIC SKEW: POSITIVE SKEW AND NEGATIVE SKEW

SYMMETRIC

NORMAL: In a normal distribution, the graph appears as a classical, symmetrical "bell-shaped curve." The mean, or average, and the mode, the maximum point on the curve, are equal. In a perfect normal distribution, the tails on either side of the curve are exact mirror images of each other.

ASYMMETRIC

POSITIVE SKEW: When a distribution is skewed to the right, the tail on the curve's right-hand side is longer than the tail on the left-hand side, and the mean is greater than the mode. This situation is called positive skewness.

NEGATIVE SKEW: When a distribution is skewed to the left, the tail on the curve's left-hand side is longer than the tail on the right-hand side, and the mean is less than the mode. This situation is called negative skewness.

MEASURES OF SKEWNESS
1) KARL PEARSON'S COEFFICIENT OF SKEWNESS:
This method is most frequently used for measuring skewness. The coefficient of skewness is calculated as follows:

Sk = (Mean − Mode) / Standard Deviation

Karl Pearson's coefficient of skewness is based on the same relationship as the formula for the empirical mode. The direction of skewness is determined by observing whether the mean is greater than or less than the mode. The extent of departure from symmetry is ascertained by observing the extent to which the mean is pulled away from the mode, and it is expressed in standard units in order to obtain a measure that is independent of the unit of measurement.

2) BOWLEY'S COEFFICIENT OF SKEWNESS:
This method is based on quartiles. The value of this coefficient of skewness is zero for a symmetrical distribution; if the value is greater than zero the distribution is positively skewed, and if the value is less than zero it is negatively skewed. This method is particularly useful in the case of open-end distributions and where extreme values are present. Also, when positional measures are called for, skewness should be measured by Bowley's coefficient of skewness.

FORMULA: SB = (Q3 + Q1 − 2 × Median) / (Q3 − Q1)

3) KELLEY'S COEFFICIENT OF SKEWNESS:
Another measure of skewness, devised by Kelley, is based on percentiles and deciles. To apply this coefficient of skewness we first have to determine the values of the 10th, 50th and 90th percentiles:

SK = (P90 + P10 − 2 × P50) / (P90 − P10)

FORMULA FOR SKEWNESS IN EXCEL: =SKEW(number1, number2, …)
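For instance, a right-skewed sample such as =SKEW(1, 2, 3, 4, 100) would return a positive value, while a perfectly symmetrical sample such as =SKEW(1, 2, 3, 4, 5) would return 0. Note that SKEW computes the sample skewness, so its value can differ from Pearson's, Bowley's and Kelley's coefficients for the same data.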

CHAPTER 6: CORRELATION
Correlation is a statistical measure that indicates the extent to which two or more variables fluctuate together. It is a statistical method which enables the researcher to find whether two variables are related and to what extent. Correlation is often described as the sympathetic movement of two or more variables.

TYPES OF CORRELATION
1) POSITIVE CORRELATION
2) NEGATIVE CORRELATION
3) NO CORRELATION

1) POSITIVE CORRELATION: Positive correlation occurs when an increase in one variable increases the value of another.

2) NEGATIVE CORRELATION: Negative correlation occurs when an increase in one variable decreases the value of another.

3) NO CORRELATION: No correlation occurs when there is no linear dependency between the variables.

FORMULA: =CORREL(array1, array2)
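For instance, with one variable in cells A1:A5 and the other in B1:B5 (illustrative ranges), =CORREL(A1:A5, B1:B5) would return a coefficient between −1 and +1: values near +1 indicate positive correlation, values near −1 negative correlation, and values near 0 no linear correlation. Note that CORREL takes exactly two arrays.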

CHAPTER 7: REGRESSION
Regression analysis is used to estimate the relationships between two or more variables. The dependent variable is the main factor you are trying to understand and predict; the independent variables are the factors that might influence it. Regression analysis helps you understand how the dependent variable changes when one of the independent variables varies, and it allows you to determine mathematically which of those variables really has an impact.

REGRESSION EQUATION: Y = bX + a

where: Y is the dependent variable,
X is the independent variable,
b is the slope of the regression line, which is the rate of change of Y as X changes, and
a is the Y-intercept, which is the expected mean value of Y when all X variables are equal to 0.
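The coefficients of this equation can also be obtained directly with Excel worksheet functions, independently of the regression output discussed below. As a minimal sketch, assuming the Y values are in B2:B8 and the X values in A2:A8 (illustrative ranges for the 7 families in the example):

b: =SLOPE(B2:B8, A2:A8)
a: =INTERCEPT(B2:B8, A2:A8)
R²: =RSQ(B2:B8, A2:A8)

A predicted Y for a new X value can then be obtained with =FORECAST(x, B2:B8, A2:A8).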

EXAMPLE: A survey of 7 families has been taken on their yearly expenditure on wheat and rice, where wheat is denoted as X and rice as Y. Using regression analysis we will interpret this data. The null hypothesis is that there is no significant difference between the yearly expenditure on rice and on wheat.

R SQUARE: This is the coefficient of determination, which is used as an indicator of the goodness of fit. It shows how closely the points fall on the regression line. The R² value is calculated from the total sum of squares, that is, the sum of the squared deviations of the original data from the mean. In our example, R² is 0.86, which is a good fit: it means that 86% of our values fit the regression model. In other words, 86% of the variation in the dependent variable (y-values) is explained by the independent variables (x-values).

P-VALUE: The p-value for each term tests the null hypothesis that the coefficient is equal to zero (no effect). A low p-value (< 0.05) indicates that you can reject the null hypothesis. In other words, a predictor that has a low p-value is likely to be a meaningful addition to your model, because changes in the predictor's value are related to changes in the response variable.

Conversely, a larger (insignificant) p-value suggests that changes in the predictor are not associated with changes in the response.

In the output, we can see that the p-value for the expenditure is greater than the common alpha level of 0.05. This means we accept the null hypothesis: there is no significant difference between the yearly expenditure on rice and on wheat.

REGRESSION GRAPH: [Scatter chart of the Y values (0 to 80,000) against the X values (0 to 70,000), showing the data series Y with a fitted linear trend line, Linear (Y).]

CHAPTER 8: ANOVA

An ANOVA test (analysis of variance) is a way to find out whether survey or experiment results are significant. In other words, it helps you figure out whether you need to reject the null hypothesis or accept the alternative hypothesis. Basically, you are testing groups to see if there is a difference between them. For example, a manufacturer has two different processes to make light bulbs and wants to know whether one process is better than the other.
ONE-WAY ANOVA (ANOVA: SINGLE FACTOR):
The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of three or more independent (unrelated) groups.
In a one-way ANOVA there are two possible hypotheses:
(1) The null hypothesis (H0) is that there is no difference between the groups: all group means are equal.
(2) The alternative hypothesis (H1) is that at least one group mean differs from the others.

EXAMPLE: A school district uses four different methods of teaching its students how to read and wants to find out whether there is any significant difference between the reading scores achieved using the four methods. It creates a sample of 8 students for each of the four methods.

F CRIT: This is the critical value. If the value of F is beyond the F critical value, we reject the null hypothesis; if the F value is within the F critical value, we accept the null hypothesis. Thus F critical is also known as the line, or point, of rejection.
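The critical value can also be computed directly with a worksheet function rather than read from the ANOVA output. For instance, =F.INV.RT(0.05, df1, df2) (FINV in older Excel versions) would return the F critical value at the 0.05 significance level, where df1 and df2 are the between-group and within-group degrees of freedom. For the reading example, with 4 groups of 8 students, df1 = 3 and df2 = 28, giving a value of about 2.95, in line with the F critical value reported in the analysis below.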

ANALYSIS OF THE EXAMPLE:

In our example, F = 3.055, F critical = 2.94, and the p-value = 0.04466 < 0.05. The value of F is beyond the F critical value, so we reject the null hypothesis and conclude that there are significant differences between the methods.

TWO-WAY ANOVA (ANOVA: TWO-FACTOR WITHOUT REPLICATION):

A two-way ANOVA test is a statistical test used to determine the effect of two nominal predictor variables on a continuous outcome variable. It tests the effect of two independent variables on a dependent variable, analyzing the effect of the independent variables on the expected outcome along with their relationship to the outcome itself.

EXAMPLE:
A new fertilizer has been developed to increase the yield of crops, and the makers of the fertilizer want to better understand which of the three formulations (blends) of this fertilizer is most effective for wheat, corn, barley and rice (the crops). They test each of the three blends on one sample of each of the four types of crops.

HYPOTHESIS: There are two null hypotheses, one for the rows and the other for the columns. They are as follows:

For rows:
H0: there is no significant difference in yield between the (population) means of the blends.

For columns:
H0: there is no significant difference in yield between the (population) means for the crop types.

ANALYSIS:

(1) FOR ROWS:

For rows, the F value = 12.8264 and the F critical value = 5.143. The F value is beyond the F critical value, so we reject the null hypothesis and conclude that there is a significant difference in yield between the (population) means of the blends.

(2) FOR COLUMNS:

For columns, the F value = 2.631 and the F critical value = 4.757. The F value is within the F critical value, so we accept the null hypothesis that there is no significant difference in yield between the (population) means for the crop types.

