MS Excel NNN
TABLE OF CONTENTS
MEGASTAT
ANOVA
STATISTICS:
Statistics is a branch of mathematics that transforms numbers into useful information for the
decision maker. Statistics does this by providing a set of methods for analyzing the numbers.
BASIC TERMINOLOGY
The basic terms used in statistics are:
o Variable: It is a characteristic of an item or individual. It is something that
changes or varies.
For ex: In a business, we observe sales, expenses and net profits to have different
values from year to year. These different values are the “DATA” to be analyzed.
o Population: It consists of all the items or individuals about which you would
like to draw a conclusion.
o Parameter: It is a numerical measure that describes a characteristic of a
population.
o Sample: It is the portion of a population selected for analysis.
o Statistic: It is a single value obtained to describe the relevant characteristics
about a sample.
BRANCHES OF STATISTICS
o DESCRIPTIVE STATISTICS
o INFERENTIAL STATISTICS
DATA COLLECTION:
The process of collecting data is known as data collection.
EXPERIMENT
A coin toss has all the attributes of a statistical experiment. There is more than one possible
outcome. We can specify each possible outcome in advance - heads or tails. And there is an
element of chance. We cannot know the outcome until we actually flip the coin.
An experiment involves the creation of an artificial situation in order that the researcher can
manipulate one or more variables while controlling all other variables and measuring the
resultant effects.
TYPES OF EXPERIMENT
o FIELD EXPERIMENT
o LAB EXPERIMENT
1. PRINCIPLE OF RANDOMIZATION:
This principle indicates that we should design or plan the experiment in such a way that
the variations caused by extraneous factors can all be combined under the general
heading of “chance.” For instance, if we grow one variety of rice in the first half of
a field and another variety in the other half, then it is quite possible
that the soil fertility differs between the first half and the other half. If
this is so, our results would not be realistic. In such a situation, we may assign the variety
of rice to be grown in different parts of the field on the basis of some random sampling
technique i.e., we may apply randomization principle and protect ourselves against the
effects of the extraneous factors (soil fertility differences in the given case). As such,
through the application of the principle of randomization, we can have a better estimate
of the experimental error.
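The random assignment described above can be sketched in Python. This is only an illustration: the plot names, the variety labels and the helper `randomize_plots` are hypothetical and are not part of the original experiment.

```python
import random

def randomize_plots(plots, treatments, seed=None):
    """Randomly assign treatments to plots, using each treatment equally often."""
    if len(plots) % len(treatments) != 0:
        raise ValueError("number of plots must be a multiple of the number of treatments")
    rng = random.Random(seed)
    # Build a balanced list of treatments, then shuffle it so that position
    # in the field (and hence soil fertility) is left to chance.
    repeats = len(plots) // len(treatments)
    allocation = list(treatments) * repeats
    rng.shuffle(allocation)
    return dict(zip(plots, allocation))

# Hypothetical field divided into 8 plots and two rice varieties
plots = [f"plot_{i}" for i in range(1, 9)]
layout = randomize_plots(plots, ["variety_A", "variety_B"], seed=42)
```

Fixing `seed` only makes the layout reproducible; leaving it as `None` gives a fresh random layout on every run.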
3. PRINCIPLE OF REPLICATION:
According to the Principle of Replication, the experiment should be repeated more than
once. Thus, each treatment is applied in many experimental units instead of one. By
doing so the statistical accuracy of the experiments is increased. For example, suppose
we are to examine the effect of two varieties of rice. For this purpose we may divide the
field into two parts and grow one variety in one part and the other variety in the other
part. We can then compare the yield of the two parts and draw a conclusion on that basis.
But if we are to apply the principle of replication to this experiment, then we first divide
the field into several parts, grow one variety in half of these parts and the other variety in
the remaining parts. We can then collect the yield data for the two varieties and draw a
conclusion by comparing them. The result so obtained will be more reliable in
comparison to the conclusion we draw without applying the principle of replication. The
entire experiment can even be repeated several times for better results. Conceptually
replication does not present any difficulty, but computationally it does. For example, if an
experiment requiring a two-way analysis of variance is replicated, it will then require a
three-way analysis of variance since replication itself may be a source of variation in the
data. However, it should be remembered that replication is introduced in order to increase
the precision of a study; that is to say, to increase the accuracy with which the main
effects and interactions can be estimated.
A. CONDUCTING AN EXPERIMENT
1. Selection of relevant variables: While conducting an experiment, a researcher
needs to select those variables that best operationalise the concepts.
For ex: If we want to study the impact of salary on the performance of the employees
in an organization, the salary may be divided into three categories (high, middle
and low), so that there are three levels of the independent variable for measurement.
B. CLASSIFICATION OF EXPERIMENTAL DESIGNS
Experimental designs:
o PRE-EXPERIMENTAL DESIGNS
o QUASI-EXPERIMENTAL DESIGNS
o TRUE-EXPERIMENTAL DESIGNS
o STATISTICAL DESIGNS
c) Static-group comparison:
A group that has experienced some treatment is compared with one that has not. Observed
differences between the two groups are assumed to be a result of the treatment.
2. QUASI-EXPERIMENTAL DESIGNS:
A quasi-experiment is simply defined as not a true experiment. Since the main component of a
true experiment is randomly assigned groups, this means a quasi-experiment does not have
randomly assigned groups. Why are randomly assigned groups so important, given that they are the
only difference between quasi-experimental and true experimental designs?
When performing an experiment, a researcher is attempting to demonstrate that variable A
influences or causes variable B to do something. They want to demonstrate cause and effect.
Random assignment helps ensure that there is no pre-existing condition that will influence the
variables and mess up the results.
This design involves a series of periodic measurements on the dependent variable for a
group of test units.
2. Multiple Time Series Design: This design may be presented symbolically as:
Experimental Group   O1 O2 O3 O4 X O5 O6 O7 O8
Control Group        O’1 O’2 O’3 O’4 O’5 O’6 O’7 O’8
In the example of a sales training programme, the multiple time-series design can be presented
as follows:
Group 1 (salesmen)   O1 O2 O3 O4 X (Training) O5 O6 O7 O8 (Sales after training)
Group 2 (salesmen)   O’1 O’2 O’3 O’4 No Training O’5 O’6 O’7 O’8 (Sales without training)
3. TRUE EXPERIMENTAL DESIGN: A true experiment has one main
component - randomly assigned groups. This translates to every participant having an
equal chance of being in the experimental group, where they are subject to a
manipulation, or the control group, where they are not manipulated.
3. LATIN SQUARE DESIGN: This design is used when we want to separate out
the effect of two extraneous variables.
SURVEY
ADVANTAGES OF SURVEY
o Can be developed in less time (compared to other data-collection methods)
o Cost-effective, but cost depends on survey mode
o Can be administered remotely via online, mobile devices, mail, email, kiosk, or
telephone
o Relatively easy to administer
o Surveys conducted remotely can reduce or prevent geographical dependence
o Capable of collecting data from a large number of respondents
o Numerous questions can be asked about a subject, giving extensive flexibility in data
analysis
o With survey software, advanced statistical techniques can be utilized to analyze survey
data to determine validity, reliability, and statistical significance, including the ability to
analyze multiple variables
o A broad range of data can be collected (e.g., attitudes, opinions, beliefs, values, behavior,
factual)
o Standardized surveys are relatively free from several types of errors
DISADVANTAGES OF SURVEY
o Respondents may not feel encouraged to provide accurate, honest answers
o Respondents may not feel comfortable providing answers that present themselves in an
unfavorable manner
o Respondents may not be fully aware of their reasons for any given answer because of
lack of memory on the subject, or even boredom
o Surveys with closed-ended questions may have a lower validity rate than other question
types
o Data errors due to question non-responses may exist. The respondents who
choose to respond to a survey question may differ from those who chose not to
respond, thus creating bias
o Survey question answer options could lead to unclear data because certain answer options
may be interpreted differently by respondents. For example, the answer option
“somewhat agree” may represent different things to different subjects, and have its own
meaning to each individual respondent. ‘Yes’ or ‘no’ answer options can also be
problematic. Respondents may answer “no” if the option “only once” is not available
o Customized surveys can run the risk of containing certain types of errors
DATA ENTRY IN MS-EXCEL
DATA:
Information in raw or unorganized form (such as letters, numbers, or symbols) that
refers to, or represents, conditions, ideas, or objects is called data. Data is limitless and can
be present everywhere in the universe.
TYPES OF DATA
1. PRIMARY DATA: Primary data means original data that has been collected
specially for the purpose in mind. It means someone collected the data from the original
source first hand. Data collected this way is called primary data.
2. SECONDARY DATA: Secondary data are data that have already been collected by,
and are readily available from, other sources. Such data are cheaper and more quickly
obtainable than primary data and may also be available when primary data cannot be
obtained at all.
1) Time and cost effective: Usually the time and cost required to collect secondary data are
less than the effort required to collect primary data. Data is available freely or at far lesser
cost through secondary sources.
2) Extensiveness of data: Data collected by governments and other institutes is usually
very extensive and covers a large spectrum of issues. An organization can filter that data
and consider only the parts which it is targeting.
A. DATA ENTRY: Direct input of data in the appropriate data fields of a database,
through the use of a human data-input device such as a keyboard, mouse, stylus,
or touch screen, or through speech recognition software. See also data capture and
data logging.
Here we take an example to understand how data entry is done in Microsoft Excel.
Suppose we want to enter the salary data of 10 employees.
In the above worksheet, the 1st column shows the identification numbers of the 10 employees. The 2nd column
contains the coded values entered for gender (2 for male). The 3rd column contains the salary
of the employees.
1. Open the workbook that you want to save as an Excel 2007 workbook.
2. Click the Microsoft Office Button, and then click Save As.
3. In the File name box, accept the suggested name or type a new name for the workbook.
4. In the Save as type list, do one of the following: ...
5. Click Save.
DESCRIPTIVE STATISTICS
Meaning:
Descriptive statistics are used to describe the basic features of the data in a study. They provide
simple summaries about the sample and the measures. Together with simple graphics analysis,
they form the basis of virtually every quantitative analysis of data.
Descriptive statistics are broken down into measures of central tendency and measures
of variability, or spread. Measures of central tendency include the mean, median and mode,
while measures of variability include the standard deviation or variance, the minimum and
maximum values, and the kurtosis and skewness.
There are three measures of central tendency.
Arithmetic Mean
Median
Mode
II. Median: The median is the number that lies halfway into the set. To find the median,
the data should first be arranged in order from least to greatest. To remember the
definition of a median, just think of the median of a road, which is the middlemost part
of the road.
The median is represented by ‘M’.
If the total number of items in the series is odd, then use:
M = ((N+1)/2)th item.
III. Mode: The mode is the value that appears most often in a set of data. The mode of
a discrete probability distribution is the value x at which its probability mass
function takes its maximum value. In other words, it is the value that is most likely to be
sampled. The mode of a continuous probability distribution is the value x at which
its probability density function has its maximum value, so the mode is at the peak.
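The three measures of central tendency can be sketched in Python as follows. The data values are hypothetical. The odd-N branch follows the ((N+1)/2)th-item rule; the even-N branch (averaging the two middle items) is the usual convention and is an assumption here, since only the odd case is stated above.

```python
from collections import Counter

def mean(values):
    """Arithmetic mean: sum of the items divided by their number."""
    return sum(values) / len(values)

def median(values):
    """Middle item of the ordered series."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)th item, which is index `mid` when counting from 0
        return ordered[mid]
    # Even N: average of the two middle items (usual convention)
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """All values that occur most often (a series can have more than one mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

data = [12, 15, 11, 15, 18, 20, 15]  # hypothetical series of 7 items
central = (mean(data), median(data), modes(data))
```

In Excel the worksheet functions AVERAGE, MEDIAN and MODE play the same role.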
After leaving some rows blank, type MEAN, MEDIAN, MODE.
2. Then type the formulas for mean, median and mode.
3. Then press ‘ENTER’.
4.2 MEASURES OF DISPERSION
The simplest meaning that can be attached to the word ‘dispersion’ is a lack of uniformity in the
sizes or quantities of the items of a group or series. According to Reiglemen, “Dispersion is the
extent to which the magnitudes or quantities of the items differ, the degree of diversity.” The
word dispersion may also be used to indicate the spread of the data.
In all these definitions, we can find the basic property of dispersion as a value that indicates the
extent to which all other values are dispersed about the central value in a particular distribution.
(a) Range:
It is the simplest method of studying dispersion. Range is the difference between
the smallest value and the largest value of a series. While computing range, we do
not take into account frequencies of different groups.
Formula: Absolute Range = L – S
Coefficient of Range = (L – S) / (L + S)
where, L represents the largest value in a distribution
S represents the smallest value in a distribution
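A minimal Python sketch of both range measures is given below. The data values are hypothetical, and the coefficient is computed with the usual definition (L – S) / (L + S), which assumes L + S is not zero.

```python
def absolute_range(values):
    """Absolute range = largest value (L) minus smallest value (S)."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative measure: (L - S) / (L + S); assumes L + S != 0."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)

data = [20, 35, 50, 80]                 # hypothetical series
rng_abs = absolute_range(data)          # 80 - 20
rng_coef = coefficient_of_range(data)   # 60 / 100
```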
(b) Quartile Deviation (Q.D.)
The concept of ‘Quartile Deviation’ takes into account only the values of the
‘Upper quartile’ (Q3) and the ‘Lower quartile’ (Q1). Quartile Deviation is also
called the ‘semi-inter-quartile range’. It is a better method when we are interested in
knowing the range within which a certain proportion of the items fall.
‘Quartile Deviation’ can be obtained as:
(i) Inter-quartile range = Q3 – Q1
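The quartile-based measures can be sketched with Python's standard `statistics` module. The data are hypothetical, and the quartile deviation is taken as half of the inter-quartile range, (Q3 – Q1) / 2, which is the usual definition. Note that different tools interpolate quartiles differently, so the figures may not match Excel's QUARTILE function exactly.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q3) using the default 'exclusive' method of statistics.quantiles."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q1, q3

def interquartile_range(values):
    q1, q3 = quartiles(values)
    return q3 - q1

def quartile_deviation(values):
    """Half of the inter-quartile range."""
    return interquartile_range(values) / 2

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical series
iqr = interquartile_range(data)
qd = quartile_deviation(data)
```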
(d) Standard deviation:
In statistics, the standard deviation (SD, also represented by the Greek letter
sigma σ or the Latin letter s) is a measure that is used to quantify the amount of
variation or dispersion of a set of data values. A low standard deviation
indicates that the data points tend to be close to the mean (also called the expected
value) of the set, while a high standard deviation indicates that the data points are
spread out over a wider range of values.
Formula: σ = √(∑d²/n)
where d is the deviation of each value from the mean and n is the number of values.
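The formula can be checked with a short Python sketch. It divides by n, as in the formula above, so it gives the population standard deviation; the data values are hypothetical.

```python
import math

def population_std_dev(values):
    """sigma = sqrt(sum(d^2) / n), where d is each value's deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical series with mean 5
sigma = population_std_dev(data)  # sum of d^2 is 32, so sigma = sqrt(32 / 8)
```

If the data are a sample rather than the whole population, the divisor n - 1 is normally used instead, which gives a slightly larger value.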
(f) SKEWNESS:
It is a term in statistics used to describe asymmetry from the normal
distribution in a set of statistical data. Skewness can be negative or positive,
depending on whether the data points are skewed to the left (negative) or to the
right (positive) of the data average. A dataset that shows this characteristic
differs from a normal bell curve.
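One common way to put a number on skewness is the moment coefficient, the third central moment divided by the 1.5th power of the second. The sketch below uses that definition as an assumption, since no formula is given above, and the data are hypothetical. Excel's SKEW function applies a sample-size adjustment, so its result will differ somewhat.

```python
def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = moment_skewness([1, 2, 3, 4, 5])        # no skew
right_skewed = moment_skewness([1, 1, 1, 2, 10])    # long tail to the right
left_skewed = moment_skewness([1, 9, 10, 10, 10])   # long tail to the left
```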
DESCRIPTIVE STATISTICS USING MS-EXCEL
1. TYPE THE DATA INTO EXCEL, in a single column.
FOR EX: if we have twelve items in our data set, type them into cells A2 through A13.
A window named Descriptive Statistics will open.
4. Type an input range into the “Input Range” text box.
For ex: type “A1:A13” into the box, or select the range with the mouse.
5. Check the box “Labels in first row” if the column has a title in row 1;
otherwise leave the box unchecked.
6. Type a cell location into the “Output Range” box. For ex, type “A17”.
Make sure that the two columns the output will occupy do not already have data in them.
7. Click the “Summary Statistics” check box and then click “OK” to
display descriptive statistics.
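For comparison, a similar summary can be produced outside Excel. The Python sketch below computes a subset of the figures that the Summary Statistics option reports (kurtosis and skewness are left out). The twelve values and the helper `describe` are hypothetical, and the labels are only chosen to resemble Excel's output.

```python
import math
import statistics

def describe(values):
    """Return a dictionary of basic descriptive statistics for one column of data."""
    n = len(values)
    sample_sd = statistics.stdev(values)  # sample SD (n - 1 divisor)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": sample_sd / math.sqrt(n),
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": sample_sd,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }

scores = [4, 8, 6, 5, 3, 8, 9, 7, 8, 6, 5, 7]  # twelve hypothetical items
summary = describe(scores)
```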
MEGASTAT
MegaStat for Excel is a full-featured Excel add-in that performs statistical analyses with an Excel
workbook. It performs basic functions, such as descriptive statistics, frequency distributions, and
probability calculations as well as hypothesis testing, ANOVA, regression, and more.
1. Start Excel
2. Click: File → Options
3. Click Add-Ins on the left menu list. You will now see a list of Excel
Add-Ins. MegaStat should be in the Inactive Application Add-ins list as shown here:
4. Click Go... for Manage Excel Add-Ins near the bottom of the screen and the Add-ins window
will appear.
5. Click the check box next to MegaStat in the Add-Ins list unless it is already checked. Click
OK when MegaStat is checked. If more than one MegaStat is listed you probably did not
uninstall an earlier version. Check only the latest version and after completing the next step go to
Control Panel and uninstall the previous version.
6. Click the Add-Ins ribbon. MegaStat will be on the ribbon and ready to use as shown
below. Your computer may also show other installed add-ins. Your particular setup may
look slightly different because of different colors and schemes. MegaStat should be on
the Add-Ins ribbon whenever you open Excel and should remain on the Add-ins ribbon
until you remove it.
HYPOTHESIS TESTING IN MS EXCEL
T-TEST: A t-test is an analysis of two population means through the use of statistical
examination; a t-test with two samples is commonly used with small sample sizes, testing the
difference between the samples when the variances of the two normal distributions are not known.
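To test the significance of the difference between two means in the case of paired
data, the appropriate test statistic ‘t’ to be used is:
t = (d̄ / s) × √n
where d̄ is the mean of the paired differences, s is the standard deviation of the
differences and n is the number of pairs.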
Procedure:
4. Calculate the table value of t for n-1 degrees of freedom at a specific level of significance
using the TINV function.
5. Decision: If the computed value is more than the table value, the difference is said to be
significant; otherwise it is insignificant.
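The paired t statistic and the decision rule can be sketched in Python. The before/after scores below are hypothetical illustration values, not data taken from a worksheet, and the table value is passed in by hand (in Excel it would come from TINV). The value 4.60409 used here is the two-tailed 1% value for 4 degrees of freedom.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t = (d_bar / s) * sqrt(n) for paired data; returns (t, degrees of freedom)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    d_bar = statistics.mean(differences)
    s = statistics.stdev(differences)  # sample SD of the differences (n - 1 divisor)
    return d_bar / s * math.sqrt(n), n - 1

def is_significant(t_value, table_value):
    """Decision rule: significant only if |t| exceeds the table value."""
    return abs(t_value) > table_value

before = [110, 120, 123, 132, 125]  # hypothetical scores before training
after = [120, 118, 125, 136, 121]   # hypothetical scores after training
t_value, df = paired_t_statistic(before, after)
significant = is_significant(t_value, table_value=4.60409)
```

With these values the computed t is well below the table value, so the difference would be treated as insignificant.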
T TEST IN MS EXCEL
Decision: The calculated value of t = 0.82 is less than the table value 4.60409. Therefore the null
hypothesis is accepted and it is concluded that there is no change in IQ after the training programme.
2. Z-TEST IN MS EXCEL
A z-test is a statistical test used to determine whether two population means are different when
the variances are known and the sample size is large. The test statistic is assumed to have a
normal distribution, and nuisance parameters such as standard deviation should be known for an
accurate z-test to be performed.
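A two-sample z-test with known population standard deviations can be sketched in Python as below. The formula z = (x̄1 – x̄2) / √(σ1²/n1 + σ2²/n2) is the standard one and is stated here as an assumption, since it is not written out above; all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2):
    """Return (z, two-tailed p-value) for the difference between two means."""
    standard_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    # Two-tailed probability of a value at least this extreme under the standard normal
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

# Hypothetical summary figures for two large samples
z, p = two_sample_z(mean1=52.0, mean2=50.0, sigma1=4.0, sigma2=3.0, n1=64, n2=36)
```

A small p-value (for example below 0.05) would lead to rejecting the hypothesis that the two population means are equal.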
ANOVA
Analysis of Variance (ANOVA) is a statistical method used to test differences between two or
more means. It may seem odd that the technique is called "Analysis of Variance" rather than
"Analysis of Means." As you will see, the name is appropriate because inferences about means
are made by analyzing variance.
TECHNIQUES OF ANOVA
where µ = group mean and k = number of groups. If, however, the one-way ANOVA returns a
statistically significant result, we accept the alternative hypothesis (HA), which is that there are
at least two group means that are statistically significantly different from each other.
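The F value behind a one-way ANOVA can be sketched in Python. The three groups are hypothetical, and the function only computes F and its degrees of freedom; the comparison with a table (critical) value is left to the reader.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation between the group means and the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # hypothetical observations, k = 3 groups
f_value, df_between, df_within = one_way_anova(groups)
```

This shows why the method is called analysis of variance: the conclusion about the means comes from the ratio of the between-group variance to the within-group variance.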
2. Under the DATA TAB click on DATA ANALYSIS. A window named DATA
ANALYSIS will open.
TWO WAY ANOVA: In statistics, the two-way analysis of variance (ANOVA) is an
extension of the one-way ANOVA that examines the influence of two
different categorical independent variables on one continuous dependent variable. The
two-way ANOVA not only aims at assessing the main effect of each independent
variable but also if there is any interaction between them.
4. A window named ANOVA:TWO FACTOR WITHOUT REPLICATION will
open.
5. In INPUT RANGE box, select range.
6. Check the Labels box.
7. In ALPHA box type level of significance.
8. In OUTPUT OPTIONS, click on the OUTPUT RANGE radio button and select
any cell where you want the results to appear.
9. CLICK on OK.
INTERPRETATION:
FOR ZONES:
Since the calculated value of F is less than the table value of F, the null hypothesis is
accepted and it can be concluded that all the zones are similar so far as sales are concerned.
FOR SALESMEN:
The calculated value of F = 0.75
Table / critical value of F = 4.757062663
Since the calculated value of F is less than the table value of F, the null hypothesis is
accepted and it can be concluded that there is no difference in the sales of the four salesmen.
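The two F values reported by a two-factor ANOVA without replication can be sketched in Python. The table below is hypothetical (3 rows, standing for zones, by 4 columns, standing for salesmen) and is not the data used above; each calculated F would then be compared with its table value in the same way as in the interpretation.

```python
def two_way_anova_no_replication(table):
    """table[i][j] holds the single observation for row level i and column level j."""
    r, c = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual variation
    df_rows, df_cols = r - 1, c - 1
    df_error = df_rows * df_cols
    ms_error = ss_error / df_error
    return {
        "F_rows": (ss_rows / df_rows) / ms_error,
        "F_columns": (ss_cols / df_cols) / ms_error,
        "df_rows": df_rows,
        "df_columns": df_cols,
        "df_error": df_error,
    }

# Hypothetical sales: rows = 3 zones, columns = 4 salesmen
sales = [
    [1, 2, 3, 6],
    [2, 4, 3, 7],
    [3, 3, 6, 8],
]
result = two_way_anova_no_replication(sales)
```

With 4 column levels and 3 row levels, the column factor is tested on (3, 6) degrees of freedom and the row factor on (2, 6).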