Q 1. (A) What Do You Understand by Word "Statistics", Give Out Its Definitions (Minimum by 4 Authors) As Explained by Various Distinguished Authors
Q 1. (a) What do you understand by the word “Statistics”? Give its definitions (minimum by
4 authors) as explained by various distinguished authors.
Answer 1 [a].
Prof. Horace Secrist
Croxton and Cowden
Prof. Ya Lun Chou
Wallis and Roberts
The word statistics in our everyday life means different things to different
people. To a football fan, statistics are the information about rushing yardage,
passing yardage, and first downs, given at halftime. To a manager of a power
generating station, statistics may be information about the quantity of
pollutants being released into the atmosphere.
And to a college student, statistics are the grades made on all the quizzes
in a course this semester.
Each of these people is using the word statistics correctly, yet each uses
it in a slightly different way and for a somewhat different purpose. Statistics is
a word that can refer to quantitative data or to a field of study. As a field of
study, statistics is the science of collecting, organizing and interpreting
numerical facts, which we call data. We are bombarded by data in our everyday life.
The collection and study of data are important in the work of many
professions, so that training in the science of statistics is valuable preparation
for a variety of careers. Each month, for example, government statistical offices
release the latest numerical information on unemployment and inflation.
Farmers study data from field trials of new crop varieties. Engineers gather
data on the quality and reliability of manufactured products. Most areas of
academic study make use of numbers, and therefore also make use of methods
of statistics. Whatever else it may be, statistics is, first and foremost, a
collection of tools used for converting raw data into information to help
decision makers in their work. The science of data - statistics - is the subject
of this course.
The word statistik comes from the Italian word statista (meaning
“statesman”). It was first used by Gottfried Achenwall (1719-1772), a professor
at Marburg and Göttingen. Dr. E.A.W. Zimmermann introduced the word
statistics to England. Its use was popularized by Sir John Sinclair in his work
“Statistical Account of Scotland 1791-1799”. Long before the eighteenth
century, however, people had been recording and using data. Official
government statistics are as old as recorded history. The emperor Yao had
taken a census of the population in China in the year 2238 B.C. The Old
Testament contains several accounts of census taking.
In 1662, Captain John Graunt used thirty years of the Bills of Mortality to make
predictions about the number of persons who would die from various diseases
and the proportion of male and female births that could be expected.
Summarized in his work, Natural and Political Observations ...Made upon the
Bills of Mortality, Graunt’s study was a pioneer effort in statistical analysis. For
Submitted by: AMIT ALEXANDER Page 2
his achievement in using past records to predict future events, Graunt was
made a member of the original Royal Society. The history of the development of
statistical theory and practice is a lengthy one. We have only begun to list the
people who have made significant contributions to this field. Later we will
encounter others whose names are now attached to specific laws and methods.
Many people have brought to the study of statistics refinements or innovations
that, taken together, form the theoretical basis of what we will study in this
7. Statistics are placed in relation to each other: The facts must be placed
in such a way that a comparative and analytical study becomes possible. Thus,
only related facts which are arranged in logical order can be called statistics.
Thus, statistics is an important tool for carrying out various complicated
calculations and is very useful for interpreting data of various categories.
(b) Enumerate some important developments of statistical theory; also explain the merits and
limitations of statistics.
Answer 1 [b]
During the century that followed this work, other authors, including
James and Nicholas Bernoulli, Pierre Rémond de Montmort, and Abraham
De Moivre, developed more powerful mathematical tools in order to calculate
odds in more complicated games. De Moivre, Thomas Simpson, and others
also used the theory to calculate fair prices for annuities and insurance
Karl Pearson developed the concept of the chi-square goodness-of-fit test. Sir Ronald
Fisher (1890-1962) made a major contribution in the field of experimental
design, turning it into a science. Since 1935 “design of experiments” has made
rapid progress, making the collection and analysis of statistical data prompter and more
economical. Design of experiments is the complete sequence of steps taken
ahead of time to ensure that the appropriate data will be obtained, which will
permit an objective analysis and will lead to valid inferences regarding the
stated problem.
Facilitating comparison
Helping in predictions.
It is only a tool or means to an end and not the end itself which has to be
intelligently identified using this tool.
James Bernoulli, Pierre Fermat, Blaise Pascal, Christiaan Huygens,
Abraham De Moivre, Pierre-Simon Laplace, Carl Friedrich Gauss, Pafnuty
Tchebycheff, Jacques Quetelet, Francis Galton and Karl Pearson are among the
major contributors to the field of mathematics and statistics. They are the
pillars of statistics. Their contributions are important to the whole
world, and without them statistical calculations would not be as easy as they
seem at present.
Q 2. (a) Define elementary theory of sets, also explain various methods by giving suitable
examples. Narrate the utility of “Set Theory” in an organization.
Answer 2 [a]
A set is a collection of items, objects or elements, governed by a rule
indicating whether an object belongs to the set or not. In the conventional notation
of sets,
capital letters such as A, B, C, X, U, S, etc. are used to denote sets. Braces ‘{ }’ are
used as the notation for the collection of objects or elements in the set. The Greek letter
epsilon “∈” is used to denote “belongs to”. A vertical line ‘|’ is used to denote
the expression ‘such that’. The letter ‘I’ is used to denote an ‘integer’. Using the above
notation, a set called A consisting of the elements 0, 1, 2, 3, 4, 5 may be
mathematically denoted in any of the following ways:
A = {0, 1, 2, 3, 4, 5}
A = {x | 0 ≤ x ≤ 5}
Read as A is (a set of) (variable x) (such that) (x lies between 0 and 5, both
inclusive)
where C = {0, 1, 2, 3, 4, 5}
7) Equal sets: if A is a subset of B and B is a subset of A, then A and B are called
equal sets. This can be denoted as follows:
If A ⊆ B and B ⊆ A then A = B
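The notation and the definition of equal sets can be tried out directly. The following is a minimal sketch in Python; the helper name `is_subset` and the range of integers searched (from -10 to 10) are assumptions made only for illustration.

```python
# Set A listed element by element, and set C built from a rule:
# integers x such that 0 <= x <= 5 (searched within an assumed range).
A = {0, 1, 2, 3, 4, 5}
C = {x for x in range(-10, 11) if 0 <= x <= 5}

def is_subset(p, q):
    """Return True when every element of p also belongs to q."""
    return all(element in q for element in p)

# Equal sets: A is a subset of C and C is a subset of A.
sets_are_equal = is_subset(A, C) and is_subset(C, A)

print(3 in A)           # membership test, "3 belongs to A"
print(sets_are_equal)   # both subset conditions hold
```

Building the set from a rule mirrors the "such that" notation, while the two subset checks mirror the definition of equal sets.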
(b) Explain the meaning and type of “Data” as applicable in any business. How would you
classify and tabulate the Data, support your answer with examples.
Answer 2 [b]
Classification and tabulation of data:
Data is any group of observations or measurements related to the area of a
business interest and to be used for decision making. It can be of the following
two types.
2. Secondary data: (are compiled by someone other than the user of the data
for decision making purpose)
geographical areas;
chronological sequences;
qualitative attributes like urban or rural,
male or female,
literate or illiterate,
undergraduate,
graduate or postgraduate,
employed or unemployed and so on;
After data is classified it is represented in a tabular form. A self-explanatory
and comprehensive table has a
table number,
title of the table,
caption (column or sub-column headings),
stub (row or sub-row headings),
body containing the main data, which occupies the cells of the table after the
data has been classified under the various captions and stubs. Head notes are
added at the top of the table for general information regarding the relevance of
the table or for cross reference or links with other literature. Foot notes are
appended for clarification, explanation or as additional comments on any of the
cells in the table.
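Classification followed by tabulation can be sketched in a few lines of Python. The records below are invented purely for illustration, and the table number, title and footnote are likewise hypothetical.

```python
from collections import Counter

# Hypothetical data: (area, employment status) for ten respondents.
records = [
    ("urban", "employed"), ("urban", "unemployed"), ("rural", "employed"),
    ("rural", "employed"), ("urban", "employed"), ("rural", "unemployed"),
    ("urban", "employed"), ("rural", "employed"), ("urban", "unemployed"),
    ("rural", "employed"),
]

counts = Counter(records)                 # classification by two attributes
stubs = ["urban", "rural"]                # row headings
captions = ["employed", "unemployed"]     # column headings

lines = ["Table 1: Respondents by area and employment status (hypothetical)"]
lines.append(f"{'Area':<8}" + "".join(f"{c:>12}" for c in captions) + f"{'Total':>8}")
for stub in stubs:
    row = [counts[(stub, c)] for c in captions]
    lines.append(f"{stub:<8}" + "".join(f"{v:>12}" for v in row) + f"{sum(row):>8}")
lines.append("Footnote: figures are invented for illustration only.")

table = "\n".join(lines)
print(table)
```

The counting step is the classification; the formatted lines supply the table number, title, captions, stubs, body and a footnote.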
Q 3. (a) Describe Arithmetic, Geometric and Harmonic means with suitable examples.
Explain merits and limitations of Geometric mean.
Answer 3 [a]
Arithmetic mean
The arithmetic mean is the "standard" average, often simply called the "mean".
It is used for many purposes but also often abused by incorrectly using it to
describe skewed distributions, with highly misleading results. The classic
example is average income - using the arithmetic mean makes it appear to be
much higher than is in fact the case. Consider the scores {1, 2, 2, 2, 3, 9}. The
arithmetic mean is about 3.17, but five out of six scores are below this!
The arithmetic mean of a set of numbers is the sum of all the members of the
set divided by the number of items in the set. (The word set is used perhaps
somewhat loosely; for example, the number 3.8 could occur more than once in
such a "set".) The arithmetic mean is what pupils are taught very early to call
the "average." If the set is a statistical population, then we speak of the
population mean. If the set is a statistical sample, we call the resulting statistic
a sample mean. The mean may be conceived of as an estimate of the median.
When the mean is not an accurate estimate of the median, the set of numbers,
or frequency distribution, is said to be skewed.
We denote the set of data by X = {x1, x2, ..., xn}. The symbol µ (Greek: mu) is
used to denote the arithmetic mean of a population. We use the name of the
variable, X, with a horizontal bar over it as the symbol ("X bar") for a sample
mean. Both are computed in the same way:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$
The mean may often be confused with the median, mode or range. The mean is
the arithmetic average of a set of values, or distribution; however, for skewed
distributions, the mean is not necessarily the same as the middle value
(median), or the most likely (mode). For example, mean income is skewed
upwards by a small number of people with very large incomes, so that the
majority have an income lower than the mean. By contrast, the median income
is the level at which half the population is below and half is above. The mode
income is the most likely income, and favors the larger number of people with
lower incomes. The median and mode are often more intuitive measures of such data.
For example, the arithmetic mean of six values: 34, 27, 45, 55, 22, 34 is
$\frac{34 + 27 + 45 + 55 + 22 + 34}{6} = \frac{217}{6} \approx 36.17$
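The two sets of numbers used in this answer can be checked with a short Python sketch. It relies only on the standard `statistics` module; the variable names are chosen here for illustration.

```python
from statistics import mean, median

skewed_scores = [1, 2, 2, 2, 3, 9]       # the skewed scores from the text
six_values = [34, 27, 45, 55, 22, 34]    # the six values from the text

skewed_mean = mean(skewed_scores)        # 19 / 6
skewed_median = median(skewed_scores)    # middle of the sorted scores
below_mean = sum(1 for s in skewed_scores if s < skewed_mean)
six_mean = sum(six_values) / len(six_values)   # 217 / 6

print(round(skewed_mean, 2), skewed_median, below_mean, round(six_mean, 2))
```

For the skewed scores the mean is pulled upward by the single value 9, so it sits above the median and above five of the six scores.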
Geometric mean
The geometric mean is an average which is useful for sets of numbers which
are interpreted according to their product and not their sum (as is the case
with the arithmetic mean). For example rates of growth.
The geometric mean of a set of positive data is defined as the product of all the
members of the set, raised to a power equal to the reciprocal of the number of
members. In a formula: the geometric mean of a1, a2, ..., an is $\left(\prod_{i=1}^{n} a_i\right)^{1/n}$, which is $\sqrt[n]{a_1 a_2 \cdots a_n}$. The
geometric mean is useful to determine "average factors". For example, if a stock
rose 10% in the first year, 20% in the second year and fell 15% in the third
year, then we compute the geometric mean of the factors 1.10, 1.20 and 0.85
as (1.10 × 1.20 × 0.85)^(1/3) = 1.0391... and we conclude that the stock rose on
average 3.91 percent per year. The geometric mean of a data set is always
smaller than or equal to the set's arithmetic mean (the two means are equal if
and only if all members of the data set are equal). This allows the definition of
the arithmetic-geometric mean, a mixture of the two which always lies in
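between. The geometric mean is also the arithmetic-harmonic mean in the
sense that if two sequences (an) and (hn) are defined:
$a_{n+1} = \frac{a_n + h_n}{2}, \quad a_0 = x$
and
$h_{n+1} = \frac{2}{\frac{1}{a_n} + \frac{1}{h_n}}, \quad h_0 = y$
then an and hn will converge to the geometric mean of x and y.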
The geometric mean is an average that is useful for sets of positive numbers
that are interpreted according to their product and not their sum (as is the
case with the arithmetic mean) e.g. rates of growth.
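For example, the geometric mean of six values: 34, 27, 45, 55, 22, 34 is:
$(34 \times 27 \times 45 \times 55 \times 22 \times 34)^{1/6} = 1{,}699{,}493{,}400^{1/6} \approx 34.545$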
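A minimal Python sketch of the geometric mean follows, applied to the stock growth factors and the six values discussed in this answer. The function name `geometric_mean` and the use of logarithms (to avoid a very large product) are choices made here, not part of the original answer.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

growth_factors = [1.10, 1.20, 0.85]       # +10%, +20%, -15%
six_values = [34, 27, 45, 55, 22, 34]

stock_gm = geometric_mean(growth_factors)
six_gm = geometric_mean(six_values)
six_am = sum(six_values) / len(six_values)

print(round(stock_gm, 4), round(six_gm, 3), round(six_am, 3))
```

The last two printed figures illustrate that the geometric mean does not exceed the arithmetic mean of the same values.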
Harmonic mean
The harmonic mean is an average which is useful for sets of numbers which
are defined in relation to some unit, for example speed (distance per unit of
time).
In mathematics, the harmonic mean is one of several methods of calculating an
average.
The harmonic mean of the positive real numbers a1,...,an is defined to be
$H = \frac{n}{\frac{1}{a_1} + \frac{1}{a_2} + \cdots + \frac{1}{a_n}}$
The harmonic mean is never larger than the geometric mean or the arithmetic
mean (see generalized mean). In certain situations, the harmonic mean
provides the correct notion of "average". For instance, if for half the distance of
a trip you travel at 40 miles per hour and for the other half of the distance you
travel at 60 miles per hour, then your average speed for the trip is given by the
harmonic mean of 40 and 60, which is 48; that is, the total amount of time for
the trip is the same as if you traveled the entire trip at 48 miles per hour.
Similarly, if in an electrical circuit you have two resistors connected in parallel,
one with 40 ohms and the other with 60 ohms, then the average resistance of
the two resistors is 48 ohms; that is, the total resistance of the circuit is the
same as it would be if each of the two resistors were replaced by a 48-ohm
resistor. (Note: this is not to be confused with their equivalent resistance, 24
ohm, which is the resistance needed for a single resistor to replace the two
resistors at once.)
For example, the harmonic mean of the six values 34, 27, 45, 55, 22, and 34 is
$\frac{6}{\frac{1}{34} + \frac{1}{27} + \frac{1}{45} + \frac{1}{55} + \frac{1}{22} + \frac{1}{34}} \approx 33.018$
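The harmonic mean can be sketched the same way. The function name `harmonic_mean` and the 240-mile trip length used for the cross-check are assumptions for illustration only.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(values) / sum(1 / v for v in values)

average_speed = harmonic_mean([40, 60])             # two equal-distance legs
six_hm = harmonic_mean([34, 27, 45, 55, 22, 34])

# Cross-check with an assumed 240-mile trip: 120 miles at each speed.
total_time = 120 / 40 + 120 / 60                    # hours
direct_speed = 240 / total_time

print(round(average_speed, 3), round(direct_speed, 3), round(six_hm, 3))
```

The cross-check computes the average speed from total distance and total time, which agrees with the harmonic mean of the two speeds.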
It is rigidly defined.
It gives less weight to large items and more to small items. Thus the
geometric mean of a set of positive values is never greater than their
arithmetic mean.
The G.M. cannot be computed meaningfully when any value is zero or
negative: a single zero makes the whole product zero, and negative values can leave the root undefined.
Answer 3 [b]
(like tossing a coin, drawing cards from a pack or throwing a die) are equally
likely. For this reason it is not valid in the following cases: (a) where the
outcomes of experiments are not equally likely, for example the lives of different
makes of bulbs.
a) The probability of an event ranges from 0 to 1. That is, an event that is sure not
to happen has probability 0 and an event that is sure to happen has
probability 1.
b) The probability of the entire sample space (that is, the set of all the
possible outcomes of an experiment) is 1. Mathematically, P(S) = 1.
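These two properties can be illustrated with the throw of a fair die, where all outcomes are equally likely. The sketch below is only an illustration; the function name `probability` is hypothetical.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}     # one throw of a fair die

def probability(event):
    """Classical probability: favourable outcomes / equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

p_impossible = probability({7})          # an event that cannot happen
p_even = probability({2, 4, 6})          # an ordinary event
p_certain = probability(sample_space)    # the whole sample space, P(S)

print(p_impossible, p_even, p_certain)
```

Exact fractions are used so that the values 0 and 1 appear without rounding error.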
Q 5. (a) What is “Chi - Square” (χ²) test, narrate the steps for determining value of χ² with
suitable examples. Explain the conditions for applying χ² and uses of Chi-Square test.
Answer 5 [a]
Chi-square test
The observed cell frequencies are organized in rows and columns like a
spreadsheet. This table of observed cell frequencies is called a contingency
table, and the chi-square test is part of a contingency table analysis.
The chi-square statistic is the sum of the contributions from each of the
individual cells. Every cell in a table contributes something to the overall chi-
square statistic. If a given cell differs markedly from the expected frequency,
then the contribution of that cell to the overall chi-square is large. If a cell is
close to the expected frequency for that cell, then the contribution of that cell
to the overall chi-square is low. A large chi-square statistic indicates that
somewhere in the table, the observed frequencies differ markedly from the
expected frequencies. It does not tell which cell (or cells) are causing the high
chi-square...only that they are there. When a chi-square is high, you must
visually examine the table to determine which cell(s) are responsible.
When there are exactly two rows and two columns, the chi-square statistic
becomes inaccurate, and Yates' correction for continuity is usually applied.
Statistics Calculator will automatically use Yates' correction for two-by-two
tables when the expected frequency of any cell is less than 5 or the total N is
less than 50.
If there is only one column or one row (a one-way chi-square test), the degrees
of freedom is the number of cells minus one. For a two way chi-square, the
degrees of freedom is the number of rows minus one times the number of
columns minus one.
Using the chi-square statistic and its associated degrees of freedom, the
software reports the probability that the differences between the observed and
expected frequencies occurred by chance. Generally, a probability of .05 or less
is considered to be a significant difference.
A standard spreadsheet interface is used to enter the counts for each cell. After
you've finished entering the data, the program will print the chi-square,
degrees of freedom and probability of chance.
Use caution when interpreting the chi-square statistic if any of the expected
cell frequencies are less than five. Also, use caution when the total for all cells
is less than 50.
This test was developed by Karl Pearson (1857-1936), analytical statistician and
professor of applied mathematics, London, whose concept of the coefficient of
correlation is most widely used. This test considers the magnitude of
discrepancy between theory and observation and is defined as
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O = observed frequencies
E = expected frequencies
2) Take the difference between O and E for each cell and calculate its square,
(O - E)².
4) Compare the calculated value with the table value at the given degrees of freedom and
specified level of significance. If, at the stated level, the calculated value is more
than the table value, the difference between the theoretical and observed frequencies
is considered significant; it could not have arisen from fluctuations of
simple sampling. However, if the calculated value is less than the table value, the
difference is not considered significant; it is regarded as due to fluctuations of simple sampling
and therefore ignored.
Conditions for applying χ²
1) N must be large, say more than 50, to ensure the similarity between
the theoretically correct distribution and our sampling distribution.
2) No theoretical cell frequency should be too small, say less
than 5, because that may lead to overestimation of the value of χ² and may result
in rejection of the hypothesis. If we get such frequencies, we should pool
them with the preceding or succeeding frequencies. For two-by-two tables,
Yates' correction for continuity is applied instead.
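The calculation of the statistic, the degrees of freedom and the comparison with the table value can be sketched as follows. The 2 x 3 table of observed frequencies is hypothetical; 5.991 is the standard tabulated value for 2 degrees of freedom at the 5% level of significance.

```python
# Hypothetical 2 x 3 contingency table of observed frequencies (rows x columns).
observed = [
    [30, 20, 10],
    [20, 30, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of a cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum the contribution (O - E)^2 / E of every cell.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Two-way table: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(col_totals) - 1)

TABLE_VALUE_5_PERCENT_DF2 = 5.991   # tabulated chi-square, 2 d.f., 5% level
significant = chi_square > TABLE_VALUE_5_PERCENT_DF2

print(round(chi_square, 3), degrees_of_freedom, significant)
```

In this invented table N is 150 and every expected frequency is at least 5, so both conditions for applying the test are met.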
1) As a test of independence
Whether two or more attributes are associated or not can be tested by framing a
hypothesis and testing it against the table value. For example, whether the use of quinine is
effective in the control of fever, or whether the complexions of husbands and wives are associated. Consider two
variables at the nominal or ordinal levels of measurement. A question of
interest is: Are the two variables of interest independent(not related)or are they
related (dependent)?
When the variables are independent, we are saying that knowledge of one gives
us no information about the other variable. When they are dependent, we are
saying that knowledge of one variable is helpful in predicting the value of the
other variable. One popular method used to check for independence is the chi-squared
test of independence. This version of the chi-squared test is a
nonparametric procedure, whereas the test of significance about a single
population variance is a parametric procedure.
Assumptions:
1. We take a random sample of size n.
3. Observations are cross classified according to two criteria such that each
observation belongs to one and only one level of each criterion.
2) As a test of goodness of fit
The Test for independence (one of the most
frequent uses of Chi Square) is for testing the null hypothesis that two criteria
of classification, when applied to a population of subjects are independent. If
they are not independent then there is an association between them. A
statistical test in which the validity of one hypothesis is tested without
specification of an alternative hypothesis is called a goodness-of-fit test. The
general procedure consists in defining a test statistic, which is some function
of the data measuring the distance between the hypothesis and the data (in
fact, the badness-of-fit), and then calculating the probability of obtaining data
which have a still larger value of this test statistic than the value observed,
assuming the hypothesis is true. This probability is called the size of the test or
confidence level. Small probabilities (say, less than one percent) indicate a poor
fit. Especially high probabilities (close to one) correspond to a fit which is too
good to happen very often, and may indicate a mistake in the way the test was
applied, such as treating data as independent when they are correlated. An
attractive feature of the chi-square goodness-of-fit test is that it can be applied
to any univariate distribution for which you can calculate the cumulative
distribution function. The chi-square goodness-of-fit test is applied to binned
data (i.e., data put into classes). This is actually not a restriction since for non-
binned data you can simply calculate a histogram or frequency table before
generating the chi-square test. However, the values of the chi-square test
statistic are dependent on how the data is binned. Another disadvantage of the
chi-square test is that it requires a sufficient sample size in order for the chi-
square approximation to be valid.
two or more independent random samples are drawn from the same population or
different population. The Test for Homogeneity answers the proposition that
several populations are homogeneous with respect to some characteristic.
(b) How do you define “Index Numbers” ? Narrate the nature and types of Index numbers
with adequate examples.
Answer 5 [b]
According to Croxton and Cowden, index numbers are devices for measuring
differences in the magnitude of a group of related variables.
1) Index numbers are specialized averages used for comparison in situations where
two or more series are expressed in different units or represent different items,
e.g. the consumer price index representing prices of various items or the index of
industrial production representing various commodities produced.
2) Index numbers measure the net change in a group of related variables over a
period of time.
3) Index numbers measure the effect of change over a period of time, across a
range of industries, geographical regions or countries.
Price index numbers: A price index is any single number calculated from an
array of prices and quantities over a period. Since not all prices and quantities
Q 6. (a) What are the important Index Numbers used in Indian Economy. Explain index
numbers of Industrial Production.
Answer 6. [a]
1) Decide on the class of people for whom the index number is to be
computed, for instance industrial personnel, officers or teachers; also
decide on the geographical area to be covered.
2) Conduct a family budget inquiry covering the class of people for whom the
index number is to be computed. The enquiry should be conducted for the base
year by the process of random sampling. This would give information regarding
the nature, quality and quantities of commodities consumed by an average
family of the class and also the amount spent on different items of
4) Collect retail prices in respect of the items from the localities in which the
class of people concerned reside, or from the markets where they usually make
their purchases.
7) Separate index numbers are first determined for each of the five major
groups, by calculating the weighted average of the price relatives of the selected
items in the group.
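The weighted average of price relatives for one group can be sketched as below. The items, prices and weights are invented for illustration; the weights stand for the expenditure shares that a family budget enquiry would supply.

```python
# Hypothetical items of one group (food): (base price, current price, weight).
food_items = {
    "rice":  (20.0, 25.0, 50),
    "milk":  (10.0, 11.0, 30),
    "sugar": (40.0, 42.0, 20),
}

def group_index(items):
    """Weighted average of price relatives (current / base * 100)."""
    total_weight = sum(weight for _, _, weight in items.values())
    weighted_sum = sum(
        (current / base * 100) * weight
        for base, current, weight in items.values()
    )
    return weighted_sum / total_weight

food_index = group_index(food_items)
print(round(food_index, 2))
```

The same function could be applied to each major group in turn, and the group indices could then be combined with group weights in the same manner.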
Mining industries like iron ore, iron, coal, copper, petroleum etc.
The figures of output for the various industries classified above are obtained on a
monthly, quarterly or yearly basis. Weights are assigned to the various industries
on the basis of some criterion such as capital invested, turnover, net output or
production. Usually the weights in the index are based on the values of net
output of the different industries. The index of industrial production is obtained by
taking the simple arithmetic mean or geometric mean of the relatives. When the simple
arithmetic mean is used the formula for constructing the index is as follows.
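As an illustration only, the sketch below computes an index of industrial production from quantity relatives (current output divided by base-year output, times 100), first as a weighted arithmetic mean and then as a weighted geometric mean. The industries, outputs and weights are hypothetical, and this weighted form is a common textbook construction assumed here rather than taken from the original answer.

```python
import math

# Hypothetical industries: (base-year output, current output, weight).
industries = {
    "mining":        (100.0, 110.0, 20),
    "manufacturing": (200.0, 250.0, 60),
    "electricity":   (50.0, 60.0, 20),
}

relatives = {name: q1 / q0 * 100 for name, (q0, q1, _) in industries.items()}
weights = {name: w for name, (_, _, w) in industries.items()}
total_weight = sum(weights.values())

# Weighted arithmetic mean of the quantity relatives.
iip_arithmetic = sum(relatives[n] * weights[n] for n in industries) / total_weight

# Weighted geometric mean of the same relatives, via logarithms.
iip_geometric = math.exp(
    sum(weights[n] * math.log(relatives[n]) for n in industries) / total_weight
)

print(round(iip_arithmetic, 2), round(iip_geometric, 2))
```

The geometric version comes out slightly lower than the arithmetic one because the relatives are not all equal.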
Answer 6 [b]
The Bernoulli distribution, which takes value 1 with probability p and value 0
with probability q = 1 - p.
e = exponential constant = 2.7183
π = mathematical constant = 3.1416
A bar graph such that the area over each class interval is proportional to the
relative frequency of data within this interval. In plotting a histogram, one starts
by dividing the range of all values into non-overlapping intervals, called class
intervals, in such a way that every piece of data is contained in some class
interval.
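Dividing the range into class intervals and counting the data in each can be sketched as follows. The scores and the class width of 10 are hypothetical; because all intervals have equal width, bar height is proportional to bar area.

```python
# Hypothetical scores; class intervals 0-9, 10-19, ..., each of width 10.
scores = [3, 12, 15, 21, 24, 27, 33, 35, 36, 38, 41, 44, 47, 52, 58]
width = 10

frequencies = {}
for score in scores:
    lower = (score // width) * width      # lower limit of the class interval
    frequencies[lower] = frequencies.get(lower, 0) + 1

# Print a simple text histogram with the relative frequency of each class.
for lower in sorted(frequencies):
    count = frequencies[lower]
    relative = count / len(scores)
    print(f"{lower:>2}-{lower + width - 1:<2} {'#' * count:<6} {relative:.2f}")
```

Every score falls into exactly one interval, so the class frequencies add up to the number of scores.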
Visually strong
The horizontal axis of the histogram shows the
midpoints of the intervals. Bars are centered above the midpoints of the
intervals. The vertical axis of the histogram shows the frequency of the scores
in each of the intervals on the horizontal axis. The 20 to 29 interval shows an
extremely high or low score. This histogram has no title.
The scales used in this histogram are suitable because
the data set is small. However, the raw data is not shown.
The graph shown below has the shape of the famous bell curve. This bell curve
is the same as a normal probability curve. It is the density curve:
its height shows the relative likelihood (probability density) at any given
point on the curve. This illustrates the strong central bias for any given data.
The shape of the cumulative curve is probably not as familiar, but it is more
useful. Starting from the left and moving to the right, the cumulative curve is
the accumulated area under the bell curve (the density curve) behind, or
to the left of, each point. This is the curve used to calculate useful probabilities.
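The relation between the two curves can be sketched with the standard normal distribution. The function names below are chosen for illustration, and the cumulative curve is computed with the error function from Python's `math` module.

```python
import math

def normal_density(x, mean=0.0, sd=1.0):
    """Height of the bell curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def normal_cumulative(x, mean=0.0, sd=1.0):
    """Area under the bell curve to the left of x."""
    z = (x - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

peak = normal_density(0.0)              # highest point, at the mean
half = normal_cumulative(0.0)           # half the area lies left of the mean
within_one_sd = normal_cumulative(1.0) - normal_cumulative(-1.0)

print(round(peak, 4), half, round(within_one_sd, 4))
```

A probability for a range of values is obtained by subtracting two readings of the cumulative curve, as in the last computed quantity.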