
Business Statistics

Kaushal Jha (BE + MBA)


What is data?
• Data is distinct pieces of information, usually formatted in a special way.
• Types of data
1. Categorical Data
2. Numerical Data
Types of Data
• Nominal Scale:
A nominal scale classifies data into distinct categories in which no ranking is implied.
For example: Gender, Marital Status
• Ordinal Scale:
An ordinal scale classifies data into distinct categories in which ranking is implied. For
example:
1. Faculty rank: Professor, Associate Professor, Assistant Professor
2. Student grade: A, B, C, D, E, F
• Interval scale:
An interval scale is an ordered scale in which the difference between measurements is a
meaningful quantity but the measurements do not have a true zero point. For example:
1. Temperature in Fahrenheit and Celsius
2. Calendar years
• Ratio scale:
A ratio scale is an ordered scale in which the difference between measurements is a
meaningful quantity and the measurements have a true zero point. Hence, ratios are meaningful and
all arithmetic operations can be performed on ratio-scale data. For example: Weight, Age, Salary, etc.
Probability
• Types of probability
• Axioms
• Addition and Multiplication Rule
• Independence of events
• Probability Tree
• Bayes Theorem
• Concept of Mean, Median and Mode
Central tendency
Types of Kurtosis
Categorization of Kurtosis

Category            Mesokurtic      Platykurtic    Leptokurtic
Tailedness          Medium-tailed   Thin-tailed    Fat-tailed
Outlier frequency   Medium          Low            High
Kurtosis            Moderate (3)    Low (< 3)      High (> 3)
Excess kurtosis     0               Negative       Positive


[Figures: mesokurtic, platykurtic, and leptokurtic distribution examples]
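The kurtosis categories can be checked numerically. The sketch below is an illustration only: it computes kurtosis as the fourth central moment divided by the squared variance, and the function names, the tolerance `tol`, and both data sets are assumptions made for this example.

```python
from statistics import fmean


def kurtosis(values):
    """Population kurtosis: fourth central moment / variance squared."""
    m = fmean(values)
    n = len(values)
    variance = sum((x - m) ** 2 for x in values) / n
    fourth_moment = sum((x - m) ** 4 for x in values) / n
    return fourth_moment / variance ** 2


def classify(values, tol=0.1):
    """Label data by excess kurtosis (kurtosis - 3).

    tol is a hypothetical cut-off for "close enough to 0".
    """
    excess = kurtosis(values) - 3
    if abs(excess) <= tol:
        return "mesokurtic"
    return "leptokurtic" if excess > 0 else "platykurtic"


# Made-up data: evenly spread values (thin tails) vs. mostly
# central values with two extreme outliers (fat tails).
flat = [1, 2, 3, 4, 5]
heavy = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]

print(kurtosis(flat), classify(flat))
print(kurtosis(heavy), classify(heavy))
```

The evenly spread data has kurtosis below 3 (negative excess kurtosis), while the data with two outliers has kurtosis above 3 (positive excess kurtosis).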
Random Variable
• Random Variable - A random variable (also known as a stochastic variable) is a variable whose value
depends on the numerical outcome of a random phenomenon.

• Discrete Random Variable - A random variable is discrete if it takes countable values, typically whole
numbers such as counts.

• Continuous Random Variable - A random variable is said to be continuous if it can take any value within
an interval. Continuous random variables are used to denote measurements such as height, weight, time, etc.

• Expected Value - Expected value (also known as EV, expectation, average, or mean value) is the
long-run average value of a random variable. It is the probability-weighted average of all possible
values.
• Examples of Discrete and Continuous Random Variables
Example 1: Number of Items Sold (Discrete)
Example 2: Number of Customers (Discrete)
Example 3: Number of Defective Products (Discrete)
Example 4: Marathon Time (Continuous)
Example 5: Interest Rate (Continuous)
Example 6: Plant Height (Continuous)
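As a small illustration of expected value as a probability-weighted average, the sketch below works with a discrete random variable stored as a value-to-probability mapping. The function name and the defect probabilities are invented for this example.

```python
def expected_value(distribution):
    """Probability-weighted average of a discrete random variable.

    distribution: dict mapping each possible value to its probability.
    """
    total = sum(distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in distribution.items())


# Hypothetical distribution: number of defective products in a batch.
defects = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}
print(expected_value(defects))  # 0*0.60 + 1*0.25 + 2*0.10 + 3*0.05

# A fair six-sided die, each face with probability 1/6.
die = {face: 1 / 6 for face in range(1, 7)}
print(expected_value(die))
```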
Binomial Distribution

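• The binomial distribution gives the probability of each possible number of successes when a binary
(success/failure) event is repeated a fixed number of times, independently and with the same probability of success.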
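The slide does not give the formula, so the following is a sketch using the standard binomial probability P(X = k) = C(n, k) × p^k × (1 − p)^(n − k). The function name and the coin-toss numbers are assumptions chosen for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Illustrative case: exactly 5 heads in 10 tosses of a fair coin.
print(binomial_pmf(5, 10, 0.5))

# The probabilities over all possible outcomes (0..n) add up to 1.
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))
```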
Poisson Distribution
• The Poisson distribution gives the probability of a given number of events occurring in a fixed interval of
time or space. In other words, when you know how often the event happens on average, the Poisson distribution
can be used to find how likely each number of occurrences is in a set period.

• P(X = x) = (e^(-λ) × λ^x) / x!

• x = Actual number of successes occurring
• λ (lambda) = Average number of successes in the specified interval or region
• e ≈ 2.718
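A minimal sketch of the Poisson calculation, assuming the standard formula P(X = x) = e^(−λ) × λ^x / x!. The function name and the average rate of 2 events per period are made up for the example.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x events when the average count is lam."""
    return exp(-lam) * lam ** x / factorial(x)


# Hypothetical rate: on average 2 events per period.
lam = 2.0
for x in range(5):
    print(x, round(poisson_pmf(x, lam), 4))

# Summing over a long enough range of counts gives (almost exactly) 1.
print(sum(poisson_pmf(x, lam) for x in range(50)))
```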
Sampling Distribution
• What is Sampling distribution?
• Types of Sampling distribution
Sampling distribution
• It is the probability distribution of a statistic obtained from a large number of samples drawn from a specific population.
• For mean values, when we take all the possible samples of a given size from the population and plot the probability
of the mean values of those samples on a graph, we get the sampling distribution of the sample mean.
• In a sampling distribution of the sample mean
μ(X̄) = μ, σ(X̄) = σ / √n
• The shape of a sampling distribution is not necessarily bell-shaped. But as the sample size increases,
the graph tends towards a normal distribution curve. This is the Central Limit Theorem.
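The two identities for the sampling distribution of the mean can be verified exactly on a tiny population by listing every possible sample. The sketch below assumes sampling with replacement (ordered samples); the six-value population and n = 2 are arbitrary choices for illustration.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

# Small made-up population and sample size.
population = [1, 2, 3, 4, 5, 6]
n = 2

# Every possible ordered sample of size n, drawn with replacement,
# and the mean of each one.
sample_means = [fmean(sample) for sample in product(population, repeat=n)]

mu = fmean(population)       # population mean
sigma = pstdev(population)   # population standard deviation

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

print(mean_of_means, mu)               # the two values agree
print(sd_of_means, sigma / sqrt(n))    # the two values agree
```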
Central Limit Theorem
• When the population is not normally distributed or is skewed but the sample size is fairly large (as a rule
of thumb, greater than 30), the sampling distribution of the mean approaches the shape of a normal distribution
(for sampling with replacement, or when the sample is small relative to the population).
• If the population is normally distributed, then the sampling distribution would also be normal no matter
what the sample size is.
• Also, the mean of the sample means would be equal to the population mean.
• And the standard deviation of the sample means would be equal to the standard deviation of the population
divided by the square root of the sample size.
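The theorem can be illustrated by simulation. The sketch below is not a proof: it draws repeated samples from a skewed (exponential) population with mean 1 and standard deviation 1, and compares the sample means for a small and a large sample size. The population, the number of trials, the seed, and the helper names are all assumptions for this example.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def simulate_sample_means(n, trials=5000, seed=0):
    """Means of `trials` random samples of size n from an
    exponential population (mean 1, standard deviation 1)."""
    rng = random.Random(seed)
    return [fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]


def skewness(values):
    """Average cubed z-score; 0 for a symmetric distribution."""
    m = fmean(values)
    s = pstdev(values)
    return fmean(((x - m) / s) ** 3 for x in values)


means_small = simulate_sample_means(n=2)
means_large = simulate_sample_means(n=40)

# Skewness shrinks towards 0 (the normal value) as n grows.
print(skewness(means_small), skewness(means_large))

# Mean of the sample means is close to the population mean (1), and
# their standard deviation is close to sigma / sqrt(n) = 1 / sqrt(40).
print(fmean(means_large), pstdev(means_large), 1 / sqrt(40))
```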
Standard Error
• The term “standard error” is used to refer to the standard deviation of a sample statistic such as the mean or
median.
• In other words, the standard error of a statistic is the standard deviation of its sampling distribution.
• If the statistic is the sample mean, it is called the “Standard Error of the Mean”.
• So, the standard error of the mean refers to the standard deviation of the theoretical probability distribution of
the sample mean (i.e. its sampling distribution) for samples taken from the population.
• It measures how much the sample statistic typically deviates from the population parameter.
• The typical size of the deviation of the sample mean from the population mean is the standard error of the mean:
σ(X̄) = σ / √n
Types of Sampling
• Simple Random Sampling - It involves selecting a sample of size n from a population of size N so that all
elements of the population have an equal chance of being part of the sample. Example: lotteries, tables of
random numbers, etc.
• Systematic Random Sampling - It involves using a random start to determine the first element of the
sample, and the rest of the sample is selected systematically, i.e. at every kth interval, where k =
N/n. Example: Assume there are 100 patients (N) in a hospital and a sample of 20 patients (n) is to be selected.
Then k = 100/20 = 5, so after a random start among the first 5 patients, every 5th patient is selected.
• Stratified Sampling - The population is divided into groups called STRATA according to some chosen
classification category such as age, gender, geographic location, and so on. A subsample from each stratum is
selected by simple random sampling.
• Cluster Sampling - Elements of the population are divided into groups called CLUSTERS. Clusters are
naturally occurring groups like barangays, cities, or municipalities. A simple random sample of clusters is
selected, and the elements of the chosen clusters form the sample.
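The systematic procedure can be sketched in a few lines, using the hospital numbers from the example (N = 100, n = 20, so k = 5). The function name, the patient numbering 1 to 100, and the seed are assumptions; the sketch also assumes N is a multiple of n.

```python
import random


def systematic_sample(population, n, rng=None):
    """Pick a random start within the first k elements, then take
    every kth element, where k = N // n."""
    rng = rng or random.Random()
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start: index 0 .. k-1
    return [population[start + i * k] for i in range(n)]


# Patients numbered 1..100 (hypothetical IDs).
patients = list(range(1, 101))
sample = systematic_sample(patients, 20, random.Random(42))
print(sample)
```

Only the first element is random; once it is chosen, the rest of the sample is fully determined by the interval k.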
Estimation
• An estimate is a specific observed numerical value used to estimate an unknown
population parameter.
• Point Estimate - It is a single numerical value used to estimate an
unknown population parameter. The sample mean is a point estimate of the
population mean.
• Interval Estimate - It is a range of values used to estimate an
unknown population parameter. Interval estimates of a population
parameter are called confidence intervals.
Confidence Interval

• The confidence interval for a population mean is based on the sample mean and the standard deviation. Thus, the formula to find the CI is
• X̄ ± Zα/2 × [ σ / √n ]
Where,
X̄ = Sample mean
Zα/2 = Confidence coefficient (critical value of the standard normal distribution)
α = Significance level (1 - confidence level)
σ = Population standard deviation
n = Sample size
The value after the ± symbol is known as the margin of error.

LCL = X̄ - Zα/2 × [ σ / √n ]
UCL = X̄ + Zα/2 × [ σ / √n ]
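A short sketch of the LCL and UCL calculation, assuming the population standard deviation σ is known. The function name and the numbers (sample mean 50, σ = 10, n = 100, 95% confidence) are invented for illustration; `NormalDist().inv_cdf` from Python's standard library supplies the Z value.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (LCL, UCL) for the population mean, sigma known."""
    alpha = 1 - confidence                    # significance level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z(alpha/2), about 1.96 for 95%
    margin = z * sigma / sqrt(n)              # margin of error
    return x_bar - margin, x_bar + margin


# Hypothetical sample: mean 50, sigma 10, n = 100.
lcl, ucl = confidence_interval(x_bar=50.0, sigma=10.0, n=100)
print(round(lcl, 2), round(ucl, 2))
```

With these inputs the margin of error is about 1.96, so the interval runs from roughly 48.04 to 51.96.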
Hypothesis
• It is an assumption or an idea.
• A statistical hypothesis is a claim (assertion, statement, belief or
assumption) about an unknown population parameter value.
• For example: A judge assumes that a person charged with a crime is
innocent and subjects this assumption (hypothesis) to verification by
reviewing the evidence and hearing testimony before reaching a
verdict.
Hypothesis Testing
• A hypothesis is a statement about the true value of the
population parameter, to be tested using sample statistics.
• To test the validity of the claim or assumption about the population
parameter,
a) A sample is drawn from the population and analysed.
b) The results of the analysis are used to decide whether the claim is
supported or rejected.
Hypothesis steps
• Null hypothesis - The hypothesis which is initially assumed to be true,
although the sample evidence may later show it to be false.
It states that there is no difference between the
sample statistic and the population parameter.
• Hypothesis testing requires that the null hypothesis be considered
true (no difference) until the results observed from the sample data
provide sufficient evidence against it.
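One common way to carry out such a test for a population mean is a two-sided one-sample z-test, sketched below. The slides do not name a specific test, so the choice of test, the 5% significance level, the function name, and all the numbers are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0, sigma known.

    Returns (z statistic, p-value, reject H0?).
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical claim: the population mean is 50. A sample of 100
# observations has mean 52.5; sigma is assumed to be 10.
z, p, reject = z_test(x_bar=52.5, mu0=50.0, sigma=10.0, n=100)
print(z, round(p, 4), reject)
```

Here the sample mean lies 2.5 standard errors from the claimed value, which is unlikely under the null hypothesis, so the null hypothesis is rejected at the 5% level.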
Correlation
• Correlation measures the linear relationship between two random variables. It describes both the nature
(direction) and the strength of the relationship between the two variables.
• Correlation lies between -1 and +1.
• A zero correlation indicates that there is no linear relation between the variables. -1 indicates perfect negative
correlation and +1 indicates perfect positive correlation.
• Positive Correlation - If one variable increases when the other increases, or one decreases when the other
decreases, the correlation is positive.
• Negative Correlation - If one variable increases when the other decreases, and vice versa, the correlation is
negative.
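The slides do not give a formula, so the sketch below uses the standard Pearson correlation coefficient. The function name and the small data sets are made up to show the three cases: +1, -1, and 0.

```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # perfect positive correlation
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # perfect negative correlation

# y = x squared on a symmetric range: strongly related, but not
# linearly, so the correlation is zero.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```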
Regression
• It is the measure of the average relationship between two or more variables in terms of the original units of
the data.
• Simple regression - We study only two variables at a time, in which one variable is dependent and the other
is independent. For example: the functional relationship between income and expenditure.
• Multiple regression - We study more than two variables, among which one is dependent and the others are
independent. For example: the study of the effect of rain and irrigation on the yield of wheat is an example of multiple
regression.
• Linear Regression - When one variable changes with another variable in a fixed ratio, it is known as linear
regression, and the graph of the relationship is a straight line.
• Non-Linear Regression - When one variable changes with another variable in a changing ratio, it is
referred to as non-linear regression.
• Partial/Total regression - In partial regression, two or more variables are studied for a functional relationship, but
the relation between only two variables is studied at a time while the other variables are held constant. In total
regression, all the variables are studied together.
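A simple linear regression can be sketched with the usual least-squares formulas for slope and intercept. The income and expenditure figures below are invented for illustration and lie exactly on a straight line; the function name is also an assumption.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical data: expenditure (dependent) against income
# (independent), in the same units.
income = [10, 20, 30, 40, 50]
expenditure = [9, 17, 25, 33, 41]

slope, intercept = fit_line(income, expenditure)
print(slope, intercept)

# Predicted expenditure for an income of 60.
print(intercept + slope * 60)
```

The slope is expressed in the original units of the data: here, each extra unit of income goes with 0.8 units of extra expenditure.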
