Stats For Managers - Intro

Statistics for Managers –
Theoretical Introduction
Prepared By:
Manuj Madan,
Assistant Professor,
Chitkara University
Elements Vs Variables?
Elements Vs Variables
Elements are the entities on which data are collected; variables are the
characteristics of interest for each element.
Quantitative Variable?
Quantitative Variable
A quantitative variable tells us the quantity of what is measured
(how much or how many).
Categorical Variable?
Categorical Variable
A categorical variable does not measure a quantity of something;
it identifies the category to which an element belongs.
Is Telephone Country Code a categorical
variable? Yes/No
Yes, because it does not measure a
quantity of something
Ordinal Vs Nominal Variables?
Ordinal Vs Nominal Variables
A categorical variable can be either ordinal or nominal. When the categories have
a specified order, for example when a customer rates a product or service as
‘Not Satisfied’, ‘Moderately satisfied’ or ‘Extremely satisfied’, the variable is
called a ________________________ variable.
Ordinal categorical
When the categories have no order, the variable is called a nominal variable.
Let us see some examples on the next slide.
Example of identifying variable types
Identifiers are a special type of categorical variable

Population in Statistics?
Population in Statistics
The population is the whole set of elements that is the focus of our study.

I want to identify “How many male students wear jeans
daily in the statistics class of Chitkara University?” What is my
population?
Answer: The statistics class, not Chitkara University

Sampling Frame?
Sampling Frame
The list from which we draw our sample is called the sampling frame.
What should be the sampling frame if I want to identify -
“How many male students wear jeans daily in the statistics class of Chitkara
University?”
Male students of the statistics class only

Sample?
Sample
It is a subset of population.
Census?
Census
A census is a special sample that contains the whole of the population.
Statistic and Parameter?
Statistic and Parameter
A statistic is a description of a sample, whereas a parameter is a
description of a population.
The mean, median or mode computed for a sample is called a statistic;
computed for a population, it is known as a parameter.
Descriptive Statistics?
Descriptive Statistics
A descriptive statistic is a summary statistic that quantitatively describes or

summarizes features of a collection of information.
Any examples of descriptive statistics?
Mean, median, standard deviation etc. are descriptive in nature because they describe
one set of data; no generalization about other data sets is made from them.
Inferential Statistics?
Inferential Statistics
Inferential statistics makes inferences and predictions about a population
based on a sample of data taken from that population.
Any examples?
Hypothesis testing (we will see what it is later in the course).
Frequency Distribution?
Frequency Distribution
It is a tabular summary of data showing the number of

items in each non-overlapping class.
Probability Distribution?
Probability Distribution
It is a theoretical frequency distribution, i.e. one that describes
how outcomes are expected to vary.
Relative Frequency?
Relative Frequency
Frequency of class divided by total of all frequencies.
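A minimal Python sketch of this definition. The survey answers and the function name `relative_frequencies` are made up for illustration and do not come from the slides.

```python
from collections import Counter

# Hypothetical survey answers (illustrative data only)
responses = ["Satisfied", "Not Satisfied", "Satisfied", "Satisfied", "Not Satisfied"]

def relative_frequencies(values):
    """Return {class: frequency of class / total of all frequencies}."""
    counts = Counter(values)       # frequency distribution
    total = sum(counts.values())   # total of all frequencies
    return {cls: freq / total for cls, freq in counts.items()}

rel = relative_frequencies(responses)
print(rel)
```

The relative frequencies of all classes always add up to 1.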

Scatter Diagram?
Scatter Diagram
A graph showing the relationship between two quantitative variables.

Trendline?
Trendline
A trendline is a line that approximates the relationship between
two variables.
Bar Chart vs Histogram
Bar Chart vs Histogram
A bar chart displays qualitative (categorical) data,
whereas a histogram displays quantitative data.
Note: A Pareto diagram is a type of bar chart in which the bars
are arranged in descending order of height.
Simpson’s Paradox
Conclusions drawn from two or more separate crosstabulations can
be reversed when the data are aggregated into a single crosstabulation.
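The reversal can be seen in a small numerical sketch. The counts below are hypothetical, chosen only to show the effect: sales rep A has the higher success rate in each region, yet rep B has the higher rate once the regions are pooled.

```python
# Hypothetical counts: (successes, attempts) for each sales rep in each region
data = {
    "Region 1": {"A": (81, 87), "B": (234, 270)},
    "Region 2": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, attempts):
    return successes / attempts

# Separate crosstabulations: compare A and B within each region
by_region = {
    region: {rep: rate(*counts) for rep, counts in reps.items()}
    for region, reps in data.items()
}

# Aggregated crosstabulation: pool the counts over both regions
totals = {}
for reps in data.values():
    for rep, (s, n) in reps.items():
        ts, tn = totals.get(rep, (0, 0))
        totals[rep] = (ts + s, tn + n)
combined = {rep: rate(*counts) for rep, counts in totals.items()}

print(by_region)
print(combined)
```

The reversal happens because B made most attempts in the easier region, so the pooled rates mix two very different group sizes.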
Percentile?
Percentile
The pth percentile is a value such that at least p percent of the observations are
less than or equal to it.
i = (p/100) * n, where n is the number of observations arranged in ascending order
If i is an integer, the pth percentile is the average of the values in positions i and
i+1; if i is not an integer, the next integer greater than i
denotes the position of the pth percentile.
Interquartile Range?
Interquartile Range
Q3 – Q1
where Q3 = 75th percentile
and Q1 = 25th percentile
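The position rule i = (p/100) * n and the interquartile range can be sketched in Python as follows. The data set and the function name `percentile` are illustrative assumptions. Note that library routines such as `numpy.percentile` use different interpolation rules by default and may return slightly different values.

```python
import math

def percentile(data, p):
    """Return the p-th percentile (0 < p < 100) using i = (p/100) * n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)              # positions refer to ascending order
    n = len(values)
    i = p * n / 100                    # same as (p/100) * n
    if i == int(i):                    # i is an integer
        i = int(i)
        # average of the values in positions i and i+1 (1-based positions)
        return (values[i - 1] + values[i]) / 2
    # i is not an integer: the next integer above i gives the position
    return values[math.ceil(i) - 1]

# Hypothetical data set of n = 8 observations
data = [2, 4, 5, 7, 8, 10, 12, 15]
q1 = percentile(data, 25)    # i = 2   -> average of 2nd and 3rd values
q3 = percentile(data, 75)    # i = 6   -> average of 6th and 7th values
p85 = percentile(data, 85)   # i = 6.8 -> 7th value
iqr = q3 - q1                # interquartile range Q3 - Q1
print(q1, q3, p85, iqr)
```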
Variance and Standard
Deviation?
Variance and Standard Deviation
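$\sigma^2 = \sum (x_i - \mu)^2 / N$ is the population variance,
where $N$ is the population size and $\mu$ is the population mean.
$s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$ is the sample variance,
where $n$ is the sample size and $\bar{x}$ is the sample mean.
Standard deviation is another measure of the dispersion of data and is calculated
as the square root of the variance.
Why do we use n-1 in the case of a sample?
Dividing by n-1, the degrees of freedom, avoids bias: dividing by n would tend to
underestimate the population variance.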
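A short Python sketch of the two variance formulas, written out by hand so that the divisors N and n-1 are visible. The observations are hypothetical. Python's `statistics.pvariance` and `statistics.variance` compute the same two quantities.

```python
import math

def population_variance(xs):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
pop_var = population_variance(data)
samp_var = sample_variance(data)
# Standard deviation is the square root of the variance
print(pop_var, math.sqrt(pop_var))
print(samp_var, math.sqrt(samp_var))
```

For the same data the sample variance is always a little larger than the population variance, because the divisor is smaller.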
Coefficient of variation?
Coefficient of Variation
It is equal to the standard deviation divided by the mean,
multiplied by 100.
It is used to compare the dispersion of data sets that have different units or very
different means, where the standard deviation alone is not the right measure of dispersion.
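As a rough illustration, the sketch below compares two hypothetical data sets measured in different units. The data and the function name are assumptions, and the sample standard deviation (`statistics.stdev`) is used.

```python
from statistics import mean, stdev

def coefficient_of_variation(xs):
    """Standard deviation divided by the mean, times 100 (a percentage)."""
    return stdev(xs) / mean(xs) * 100   # stdev is the sample (n - 1) version

heights_cm = [160, 165, 170, 175, 180]   # hypothetical data
weights_kg = [50, 60, 70, 80, 90]        # hypothetical data

cv_h = coefficient_of_variation(heights_cm)
cv_w = coefficient_of_variation(weights_kg)
print(round(cv_h, 2), round(cv_w, 2))
```

Because the result is a unit-free percentage, the two values can be compared directly even though one data set is in centimetres and the other in kilograms.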
Formula of Z-score?
Formula of Z-score
Z-score = $(x_i - \bar{x}) / \sigma$
where $x_i$ = observation,
$\bar{x}$ = mean of the sample data and
$\sigma$ = standard deviation of the sample data
What is an Outlier?
A value in a set of observations that lies abnormally far from the mean, median or
mode.
If Z-score > 3 or Z-score < -3, then
the observation is an outlier
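A possible Python sketch of the Z-score rule for flagging outliers. The data are hypothetical, and the sample standard deviation is assumed for the denominator.

```python
from statistics import mean, stdev

def z_scores(xs):
    """Z-score of each observation: (x - mean) / standard deviation."""
    x_bar = mean(xs)
    s = stdev(xs)   # sample standard deviation (assumption)
    return [(x - x_bar) / s for x in xs]

def outliers(xs, threshold=3):
    """Observations whose Z-score is above +threshold or below -threshold."""
    return [x for x, z in zip(xs, z_scores(xs)) if abs(z) > threshold]

# Hypothetical data: twenty ordinary values and one extreme value
data = [48, 49, 50, 51, 52] * 4 + [95]
print(outliers(data))
```

In very small data sets no observation can reach a Z-score of 3, so the rule is only useful with a reasonable number of observations.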
How to calculate covariance
and correlation?
Calculate covariance and correlation
Covariance = $\sum (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)$
where n = number of paired observations of the two variables
$x_i$ = ith observation of the x variable
$y_i$ = ith observation of the y variable
$\bar{x}$ = mean of the x variable observations
$\bar{y}$ = mean of the y variable observations
Correlation coefficient = covariance(x, y) / ($\sigma_x \sigma_y$)
where $\sigma_x$ = standard deviation of the x variable observations
$\sigma_y$ = standard deviation of the y variable observations
Both are called
Measures of Association
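Both measures can be computed by hand as in the sketch below. The paired data are hypothetical, and the function names are illustrative. Sample standard deviations (divisor n-1) are used so that they match the covariance divisor.

```python
import math

def covariance(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))   # covariance of x with itself = variance
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

x = [1, 2, 3, 4, 5]   # hypothetical paired observations
y = [2, 4, 5, 4, 5]
cov_xy = covariance(x, y)
r_xy = correlation(x, y)
print(cov_xy, round(r_xy, 4))
```

Covariance depends on the units of x and y, while the correlation coefficient always lies between -1 and +1.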
Correlation vs Regression?
Correlation vs Regression
Both are used to describe the nature and strength of the relationship between two
continuous variables.
Correlation focusses on association, whereas regression is inclined towards
making predictions.
Regression analysis always has a dependent and an independent variable,
whereas correlation can be computed for any two variables.
Do they establish cause and effect? Yes/No
NEVER
Broad types of sampling?
Broad types of sampling
Probability Sampling and Non-Probability Sampling.

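What is probability sampling?
Sampling in which every element has a known chance of being selected, so that
statistical analysis can be done on the sample, is called probability sampling.
Non-probability Sampling?
Elements are chosen by convenience or judgement, so their chances of selection are
unknown and no statistical analysis can be done. Also called judgement sampling.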
Simple Random Sampling?
Simple Random Sampling
It is a type of probability sampling in which each element of the population
has an equal chance of being selected.
Is it possible in real world? Yes/No
No, that is why sampling errors exist.
Biased Samples?
Biased Samples
The parliament is debating some gun control laws. You are asked to conduct
an opinion survey. Because hunters are the ones that are most affected by the
gun control laws, you went to a hunting lodge and interviewed the members
there. Then you reported that in a survey done by you, about 97 percent of
the respondents were in favour of repealing all gun control laws.
A week later, the Parliament took up another bill: “Should working pregnant
women be given a maternity leave of one year with full pay to take care of
new-born babies?” Because this issue affects women most, this time you went
to all the high-rise office complexes in your city and interviewed several
working women of child-bearing age. Again you reported that in a survey
done by you, about 93 percent of the respondents were in favour of the one-
year maternity leave with full pay.
In both of these situations you picked a biased sample by choosing people
who would have very strong feelings on one side of the issue.
Other names of
non-probability Sampling?
Convenience sample (drawn according to the convenience of the researcher)
Purposive or judgement sampling (drawn based on the experience of an expert)
Sample Size and Its Determination?
Sample Size and Its Determination?
Sample size is an optimization between achieving the objectives and the costs / resources available.

Size depends on:
• Nature of universe (homogeneous or heterogeneous)
• Nature of study (intense: small, or general: large)
• Type of sampling technique (a small SRS is superior to a large, badly selected sample)
• Availability of finance
• Standard of accuracy
Primary Data and Secondary Data?
Primary data is data collected first-hand through methods such as
questionnaires, observation, interviews and schedules.
Secondary data is data that already exists: you use somebody else’s
primary data in your research, giving references. Sources of
secondary data must be checked for reliability.
Observation Method -
Advantages and Disadvantages?
Observation Method –
Advantages and Disadvantages
This method is subject to checks and controls of
validity & reliability.
Subjective bias is eliminated.
Information obtained is current and independent of
respondents’ willingness to respond.
Less demanding of respondents, but a very costly method.
Some people are rarely accessible to direct
observation.
Interview Method – Types?
Interview Method – Types
Structured – Set of predetermined questions.

Unstructured – freedom to ask supplementary questions and omit certain
questions and may even change the sequence of questions.
Focussed – attention on given experience of respondent & its effects.
Clinical interview – concerned with an individual’s life experience and feelings.
Non-directive interview – encourage respondents to talk about the given topic
with a bare minimum of direct questioning.
Question Sequence – easy questions in the
beginning or end?
Ideally, easy questions should be at the beginning,
because if the respondent abandons the questions near the end,
considerable information will already have been
obtained.
What is Schedule method of collecting
primary data?
In the schedule method an enumerator fills in the form while collecting the data,
whereas a questionnaire is filled in by the respondent. This is
the only difference between the two methods.
What is the probability distribution of the
possible number of tails from two tosses of a
fair coin?
No. of tails = 0 (H,H)
Probability of outcome = 0.5 × 0.5 = 0.25
No. of tails = 1 (T,H) or (H,T)
Probability of outcome = 0.25 + 0.25 = 0.5
No. of tails = 2 (T,T)
Probability of outcome = 0.5 × 0.5 = 0.25
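The same distribution can be obtained by listing every equally likely outcome of two tosses, as in this small Python sketch (the variable names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Count how many outcomes give 0, 1 or 2 tails
tail_counts = Counter(outcome.count("T") for outcome in outcomes)

# Probability = favourable outcomes / total outcomes
distribution = {
    tails: Fraction(count, len(outcomes))
    for tails, count in sorted(tail_counts.items())
}
print(distribution)
```

Changing `repeat=2` to another number gives the distribution for more tosses of a fair coin.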
Frequency Distribution
vs Probability Distribution
A frequency distribution is a listing of the observed frequencies of all the
outcomes of an experiment that actually occurred when the experiment
was done, whereas
a probability distribution is a listing of the probabilities of all the possible outcomes
that could result if the experiment were done.
Types of Probability Distribution
Discrete, in which the variable can take only a limited number of values that can be
listed. The probability that you were born in a given month is discrete
because there are only 12 possible values.
Continuous, in which the variable under consideration is allowed to take on any
value within a range, e.g. examining the level of effluent in a variety of
streams. We would expect a continuous range of ppm from very low levels in
clear mountain streams to very high levels in polluted streams.
Random Variables
A variable is random if it takes on different values

as a result of outcomes of a random experiment.
A random variable can be discrete or continuous.

Bernoulli Process assumptions?
1. Each trial has only 2 possible outcomes i.e. heads / tails, yes/no,
success or failure.
2. Probability of outcome of any trial remains fixed over time e.g. with
a fair coin, the probability of head is 0.5.
3. All trials are statistically independent i.e. one outcome of toss does
not affect outcome of any other toss.
Binomial Distribution?
It is applied to discrete random variables only. It describes data
resulting from an experiment known as a Bernoulli process.
Probability of r successes in n trials =
$\frac{n!}{r!\,(n-r)!}\, p^r q^{n-r}$
where p = probability of success
q = probability of failure = 1-p
n = no. of trials undertaken and r = no. of successes
Using Binomial tables
Measures of central tendency and
dispersion for binomial distribution
Mean = np and
standard deviation = Square root(npq)
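A minimal Python sketch of the binomial probability, using the standard form n! / (r! (n-r)!) × p^r × q^(n-r), together with the mean np and the standard deviation √(npq). The function names and the example of 5 tosses of a fair coin are illustrative.

```python
import math

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n Bernoulli trials."""
    q = 1 - p                                   # probability of failure
    return math.comb(n, r) * p**r * q**(n - r)  # comb(n, r) = n! / (r! (n-r)!)

def binomial_mean(n, p):
    return n * p

def binomial_sd(n, p):
    return math.sqrt(n * p * (1 - p))

# Example: probability of exactly 2 heads in 5 tosses of a fair coin
prob_2_heads = binomial_pmf(2, 5, 0.5)
print(prob_2_heads, binomial_mean(5, 0.5), binomial_sd(5, 0.5))
```

Summing `binomial_pmf(r, n, p)` over r = 0, 1, ..., n gives 1, which is a quick check on any binomial table.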
Poisson Distribution?
Discrete probability distribution again. Poisson distribution is useful for

characterizing events with very low probabilities of occurrence within some definite
time or space.
This is used in cases such as arrivals of trucks and cars at a tollbooth.

The number of patients who arrive at a physician’s office in a given interval of time will be
0, 1, 2, 3, 4, 5 or some other whole number.
Formula is $P(x) = \lambda^x e^{-\lambda} / x!$
where P(x) = probability of exactly x occurrences
λ = mean number of occurrences per interval of time
Poisson as an approximation
of Binomial
The Poisson distribution can approximate the binomial when n is large and p is small.
The formula becomes:
$P(x) = (np)^x e^{-np} / x!$
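The sketch below evaluates the Poisson formula P(x) = λ^x e^(-λ) / x! and compares it with the exact binomial probability for one case where n is large and p is small. The values n = 100, p = 0.02 and x = 3 are chosen only for illustration.

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean is lam."""
    return lam**x * math.exp(-lam) / math.factorial(x)

def binomial_pmf(r, n, p):
    """Exact binomial probability of r successes in n trials."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

n, p, x = 100, 0.02, 3               # illustrative values: large n, small p
exact = binomial_pmf(x, n, p)        # binomial probability
approx = poisson_pmf(x, n * p)       # Poisson with lambda replaced by np
print(round(exact, 4), round(approx, 4))
```

The two results agree to about two decimal places here; the agreement gets worse as p grows.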
Normal Distribution?
It is a continuous probability distribution.
It occupies important place in statistics.

Characteristics are:
1. Unimodal bell shaped curve
2. Mean of normally distributed population lies at centre of its normal curve
3. Median and mode of distribution are also at centre
4. Two tails of normal distribution extend indefinitely
Areas under the normal curve
68 % of all values lie within ± 1 σ of the mean
95.5 % of all values lie within ± 2 σ of the mean
99.7 % of all values lie within ± 3 σ of the mean
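These areas can be checked without a Z-table. The sketch below uses the error function `math.erf` from Python's standard library; the function name `area_within` is an illustrative choice.

```python
import math

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(area_within(k) * 100, 1), "%")
```

The computed percentages are approximately 68.3, 95.4 and 99.7, which the slide rounds to 68, 95.5 and 99.7.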

Z-table demonstration
Shortcoming of Normal Distribution
Tails approach horizontal axis, but never touch it

resulting in some probability that random variable
can take on enormous values.
Normal Distribution as approximation of
Binomial Distribution
Although the normal distribution is continuous, it
can be used to approximate a discrete distribution
such as the binomial when np > 5 and nq > 5.
Other continuous distributions
1. t-distribution
2. Chi-square
3. F-distribution
Sampling distribution of the mean
A probability distribution of all the possible means of

samples of given size n, from a population.
Sampling Error
Error or variation among sample statistics due to chance;

a measure of the extent to which we expect the means
from different samples to vary from the population mean,
owing to the chance error in sampling.
Standard error
The standard deviation of the sampling distribution

of a statistic.
Central Limit theorem
It is a result assuring that the sampling distribution of mean

approaches normality as the sample size increases,
regardless of the shape of the population distribution from
which the sample is collected.
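A small simulation can make the sampling distribution, the standard error and the central limit theorem concrete. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 (a strongly skewed shape), and the sample sizes, the number of samples and the seed are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)   # fixed seed so the run is repeatable

def sample_means(n, num_samples=2000):
    """Means of num_samples random samples of size n from a skewed population."""
    return [
        mean([random.expovariate(1.0) for _ in range(n)])  # population mean is 1
        for _ in range(num_samples)
    ]

means_small = sample_means(5)    # sampling distribution of the mean for n = 5
means_large = sample_means(50)   # sampling distribution of the mean for n = 50

# The standard deviation of the sample means estimates the standard error
print(mean(means_small), stdev(means_small))
print(mean(means_large), stdev(means_large))
```

Both sets of sample means centre on the population mean, the standard error shrinks as n grows, and a histogram of `means_large` looks much closer to a bell shape than the skewed population.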
Estimator and estimate
A sample statistic used to estimate a population
parameter is called an estimator.
A specific observed value of an estimator is an estimate.

Point estimate and interval
estimate
A single number used to estimate an unknown population
parameter is a point estimate.
A range of values used to estimate an unknown population
parameter is an interval estimate.
Hypothesis?
Any assumption about the population

NULL and Alternate Hypothesis
The null hypothesis specifies a parameter and a value
for that parameter.
The alternate hypothesis specifies the range of plausible
values for the parameter if we succeed in rejecting the null.
Testing Hypothesis
We must state the assumed or hypothesized value of the

population parameter before we begin sampling.
The assumption we wish to test is called NULL hypothesis

and is symbolized by H0(H sub zero).
Purpose of hypothesis testing
The purpose of hypothesis testing is not to question the
computed value of the sample statistic but to make a judgment
about the difference between the sample statistic and the hypothesized
population parameter.
Significance Level
A value indicating the percentage of sample values that is

outside limits, assuming the null hypothesis is correct.
It can also be called probability of rejecting Null hypothesis

when actually it is true.
Generally it is taken as 5% or 1% in real world situations.

Type I and Type II error
Rejecting a null hypothesis when it is true is a Type I error; its probability is denoted by α.
Accepting a null hypothesis when it is false is a Type II error; its probability is denoted by β.
The power of a test is 1 - β.
A trade-off between the two errors is needed, depending on the penalties attached
to each error.
Two tailed and one tailed tests
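A two-tailed test is a hypothesis test in which the null hypothesis is rejected if the
sample value is significantly higher or lower than the hypothesized
value of the population parameter; it involves two
rejection regions.
A one-tailed test has a single rejection region: the null hypothesis is rejected only
if the sample value is significantly higher (or only if it is significantly lower)
than the hypothesized value.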
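As an illustration of a two-tailed test at a chosen significance level, here is a sketch of a z-test for a mean. It rests on assumptions that are not in the slides: the population standard deviation is known, the sampling distribution of the mean is normal, and the test statistic is z = (sample mean - hypothesized mean) / (σ/√n). All numbers are hypothetical.

```python
import math

def two_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, p_value, reject_null) for H0: population mean = mu0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Area under the standard normal curve to the left of |z|
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)          # both tails count in a two-tailed test
    return z, p_value, p_value < alpha

# Hypothetical example: H0 says the mean is 100; a sample of 36 gives a mean of 106
z, p_value, reject_5 = two_tailed_z_test(106, 100, sigma=15, n=36, alpha=0.05)
_, _, reject_1 = two_tailed_z_test(106, 100, sigma=15, n=36, alpha=0.01)
print(round(z, 2), round(p_value, 4), reject_5, reject_1)
```

With these numbers the null hypothesis is rejected at the 5% level but not at the 1% level, which shows how the chosen significance level changes the decision.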
Two sample tests
Hypothesis tests based on samples taken from two

populations in order to compare their means or
proportions.
t-Distribution
A family of probability distributions, distinguished by their
individual degrees of freedom and similar in form to the normal distribution,
used when the population standard deviation is unknown
and the sample size is less than 30.
Chi-Square distribution
A family of probability distributions, differentiated by their degrees of freedom,
used to test a number of different hypotheses about variances, proportions and
goodness of fit.
A chi-square test is done when both variables are categorical.
ANOVA
A statistical technique used to test the equality of 3 or

more sample means and thus, make inference as to
whether the samples come from populations having the
same mean.
Goodness of fit test
A statistical test for determining whether there is a significant

difference between an observed frequency distribution and a
theoretical probability distribution hypothesized to describe the
observed distribution.
