Ba 03


1

Business Analytics
2

Objectives:
• Statistical learning, including quantitative and qualitative analysis techniques
• Predictive analytics using linear, polynomial and logistic regression techniques, and model comparison
• The use of the above analysis and visualization to aid decision making


3

Content:
• Business Analytics - Introduction
• Statistical Methods for Business Analytics
• Basics of Hypothesis Testing
• Correlation and Regression
• Multiple Linear Regression
• Model Comparison and Performance
• Classification
• Time Series Analysis
4

Till now…
5

All Together:
N: 75
Mean: 22.8
Median: 23
Mode: 24
SD: 3.235
Skewness: -0.329
Five-number summary (min, Q1, median, Q3, max): 14, 21, 23, 25, 30
6

Introduction
• What is Business Analytics
• Types of Data Analysis: Descriptive, Predictive and Prescriptive
• Big Data Analytics – Volume, Velocity, Variety
• Data Mining
• Data Visualization
• Data Analytics Lifecycle
• Business Intelligence vs. Data Science
7

Summary
1) Functions and their Variables
2) Statistical Learning
3) Estimating Function
4) Purpose of Estimating Function:
• Inferences
• Predictions
5) Methods to Estimate f
• Parametric
• Non-parametric
6) Prediction Accuracy vs Interpretability
7) Supervised vs Unsupervised Learning
8

Interval Estimates
13

Sampling Distribution

[Figure: sampling distribution of x̄, centered at μ. The interval from μ − z_{α/2}·σ_x̄ to μ + z_{α/2}·σ_x̄ contains 1 − α of all x̄ values, with an area of α/2 in each tail.]
14

Margin of Error
A point estimator cannot be expected to provide the
exact value of the population parameter.

An interval estimate can be computed by adding and subtracting a margin of error to the point estimate.

Point Estimate ± Margin of Error

x̄ ± Margin of Error

The purpose of an interval estimate is to provide information about how close the point estimate is to the value of the parameter.
15

Interval Estimate - Population Known

Sampling
distribution
of x

/2 1 -  of all /2


x values

x

z /2  x z /2  x
16

Interval Estimate - Population Known


Sampling
distribution
of x
1 -  of all
/2 /2
x values
interval
does not x
 interval
include  z /2  x z /2  x includes 
[------------------------- x -------------------------]
[------------------------- x -------------------------]
[------------------------- x -------------------------]
17

Interval Estimate - Population Known


Interval Estimate of

x  z /2
n

where: x is the sample mean


1 - is the confidence coefficient
z/2 is the z value providing an area of
/2 in the upper tail of the standard
normal probability distribution
 is the population standard deviation
n is the sample size
18

Interval Estimate - Population Known


Discount Sounds has 260 retail outlets throughout the United
States. The firm is evaluating a potential location for a new
outlet, based in part, on the mean annual income of the
individuals in the marketing area of the new location.
A sample of size n = 36 was taken; the sample mean income is
$41,100. The population is not believed to be highly skewed.
The population standard deviation is estimated to be $4,500,
and the confidence coefficient to be used in the interval
estimate is 0.95.

$41,100 + $1,470
or
$39,630 to $42,570
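
The margin of error in this example can be traced back to the σ-known interval formula. A worked version follows, assuming the standard normal table value z_{.025} = 1.96, which is not printed on the slide:

```latex
\[
z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
  = 1.96 \times \frac{4500}{\sqrt{36}}
  = 1.96 \times 750
  = 1470
\]
\[
\bar{x} \pm 1470 = 41{,}100 \pm 1470
  \quad\Rightarrow\quad 39{,}630 \text{ to } 42{,}570
\]
```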
19

Command:
CONFIDENCE.NORM(alpha,standard_dev,size)
• Alpha: Required. The significance level used to compute the confidence level. The confidence level equals 100*(1 - alpha)%; in other words, an alpha of 0.05 indicates a 95 percent confidence level.
• Standard_dev: Required. The population standard deviation for the data range, which is assumed to be known.
• Size: Required. The sample size.

Sample Average: 41100
Sample Size: 36
Population Std Deviation: 4500
At 95% Level of Confidence
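
For readers working outside a spreadsheet, the same margin of error can be sketched in Python. This is a minimal illustration, not the spreadsheet's own implementation: the function name `confidence_norm` is hypothetical and simply mirrors the argument order shown above, and the inputs are the values listed on this slide.

```python
from math import sqrt
from statistics import NormalDist


def confidence_norm(alpha: float, standard_dev: float, size: int) -> float:
    """Margin of error z_(alpha/2) * sigma / sqrt(n), population sigma known.

    Hypothetical helper that mirrors the argument order of the spreadsheet
    command on the slide.
    """
    # z value that leaves an area of alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * standard_dev / sqrt(size)


# Values from the slide: alpha = 0.05, sigma = 4500, n = 36, sample mean = 41100
sample_mean = 41100
margin = confidence_norm(0.05, 4500, 36)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"margin of error: {margin:.0f}")
print(f"interval: {lower:.0f} to {upper:.0f}")
```

The result is only the margin of error; the interval is obtained by subtracting it from and adding it to the sample mean, as in the worked example.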
20

Command:
CONFIDENCE.NORM(alpha,standard_dev,size)

Data: 27, 25, 23, 26, 20, 22, 19, 28, 23, 16, 27, 23, 24, 19, 22, 21, 28, 12, 17, 26

Population Std Deviation = 4.394
21

Interval Estimate: Population Unknown


Interval Estimate
s
x  t /2
n

where: 1 - = the confidence coefficient


t/2 = the t value providing an area of /2
in the upper tail of a t distribution
with n - 1 degrees of freedom
s = the sample standard deviation
22

Interval Estimate: Population Unknown


A reporter for a student newspaper is writing an article on the cost of off-
campus housing. A sample of 16 one-bedroom apartments within a half-
mile of campus resulted in a sample mean of $750 per month and a sample
standard deviation of $55.

Let us provide a 95% confidence interval estimate of the mean rent per
month for the population of one- bedroom efficiency apartments within a
half-mile of campus. We will assume this population to be normally
distributed.

At 95% confidence,  = 0.05, and /2 = 0.025.


t0.025 is based on n - 1 = 15 degrees of freedom.
In the t distribution table we see that t0.025 = 2.131.
24

Interval Estimate: Population Unknown


Interval Estimate
s
x  t.025 Margin
n of Error
55
750  2.131  750  29.30
16

We are 95% confident that the mean rent per month


for the population of one-bedroom apartments within
a half-mile of campus is between $720.70 and $779.30.
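
The arithmetic of this interval can be checked with a short Python sketch. It assumes the t-table value 2.131 quoted on the earlier slide and hard-codes it, since the Python standard library has no t distribution; the function name `t_interval` is illustrative only.

```python
from math import sqrt

# Values from the slides. T_CRIT is the table value for alpha/2 = 0.025
# with n - 1 = 15 degrees of freedom; it is hard-coded (an assumption of
# this sketch) rather than computed.
SAMPLE_MEAN = 750.0
SAMPLE_SD = 55.0
N = 16
T_CRIT = 2.131


def t_interval(mean: float, sd: float, n: int, t_crit: float):
    """Return (lower, upper, margin) for mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin, margin


lower, upper, margin = t_interval(SAMPLE_MEAN, SAMPLE_SD, N, T_CRIT)
print(f"margin of error: {margin:.2f}")
print(f"interval: {lower:.2f} to {upper:.2f}")
```

For a different sample size or confidence level, the critical value would have to be looked up again in the t table, because it depends on both α and the degrees of freedom.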
25

Command:
CONFIDENCE.T(alpha,standard_dev,size)
• Alpha: Required. The significance level used to compute the confidence level. The confidence level equals 100*(1 - alpha)%; in other words, an alpha of 0.05 indicates a 95 percent confidence level.
• Standard_dev: Required. The sample standard deviation for the data range.
• Size: Required. The sample size.

Sample Average: 750
Sample Size: 16
Sample Std Deviation: 55
At 95% Level of Confidence
26

Command:
CONFIDENCE.T(alpha,standard_dev,size)

Data: 27, 25, 23, 26, 20, 22, 19, 28, 23, 16, 27, 23, 24, 19, 22, 21, 28, 12, 17, 26
27

Interval Estimate: Process

Can the population standard deviation σ be assumed known?

• Yes: use σ.
  \[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
• No: use the sample standard deviation s to estimate σ.
  \[ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \]
30

Hypothesis Test: Null and Alternative

One-tailed (lower-tail)    One-tailed (upper-tail)    Two-tailed
H0: μ ≥ μ0                 H0: μ ≤ μ0                 H0: μ = μ0
Ha: μ < μ0                 Ha: μ > μ0                 Ha: μ ≠ μ0
31

Hypothesis Test: Steps


Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the value of the test statistic.

p-Value Approach
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value < α.
32

Hypothesis Test: Steps


Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the value of the test statistic.

Critical Value Approach
Step 4. Use the level of significance to determine the critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection rule to determine whether to reject H0.
33

Hypothesis Test: Exercise

The response times for a random sample of 40 medical emergencies were tabulated. The sample mean is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes.
The EMS director wants to perform a hypothesis test, with a .05 level of significance, to determine whether the service goal of 12 minutes or less is being achieved.

H0: μ ≤ 12
Ha: μ > 12

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47 \]

For z = 2.47, cumulative probability = 0.9932.
p-value = 1 − 0.9932 = 0.0068
Because the p-value 0.0068 is less than α = .05, we reject H0.
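
The p-value approach for this upper-tail test can be sketched in Python with the standard library. This is an illustration under the exercise's stated values; the function name `upper_tail_z_test` is hypothetical. Because the code keeps z unrounded, its p-value is close to, but not exactly, the table-based 0.0068.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(sample_mean: float, mu0: float, sigma: float, n: int):
    """Return (z, p_value) for H0: mu <= mu0 versus Ha: mu > mu0, sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


# Values from the exercise: x-bar = 13.25, mu0 = 12, sigma = 3.2, n = 40
alpha = 0.05
z, p_value = upper_tail_z_test(13.25, 12, 3.2, 40)

print(f"z = {z:.2f}")
print(f"p-value = {p_value:.4f}")
print(f"reject H0: {p_value < alpha}")
```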
34

Hypothesis Test: Exercise

H0: μ ≤ 12
Ha: μ > 12

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47 \]

For α = .05, z_{.05} = 1.645
Reject H0 if z > 1.645
Because 2.47 > 1.645, we reject H0.
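
The critical value approach for the same exercise can be sketched as follows. The critical value is taken from the inverse standard normal CDF instead of a printed table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05

# Critical value: the z value with an area of alpha in the upper tail
z_crit = NormalDist().inv_cdf(1 - alpha)

# Test statistic from the exercise: x-bar = 13.25, mu0 = 12, sigma = 3.2, n = 40
z = (13.25 - 12) / (3.2 / sqrt(40))

# Rejection rule for an upper-tail test: reject H0 if z > z_crit
reject_h0 = z > z_crit

print(f"critical value = {z_crit:.3f}")
print(f"test statistic = {z:.2f}")
print(f"reject H0: {reject_h0}")
```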
36

Command:
Z.TEST(array,x,[sigma])

• Array: Required. The array or range of data against which to test x.
• x: Required. The value to test.
• Sigma: Optional. The population (known) standard deviation. If omitted, the sample standard deviation is used.

Sample Average: 13.25
Population Std Deviation: 3.2
At 95% Level of Confidence
Ho: µ ≥ 12
Ha: µ < 12
37

Command:
Z.TEST(array,x,[sigma])

Note: Z.TEST always gives the p-value for the right-hand side (upper tail), that is, the area to the right of the sample mean.

Sample Average: 10.75
Population Std Deviation: 3.2
At 95% Level of Confidence
Ho: µ ≥ 12
Ha: µ < 12
38

Command:
Z.TEST(array,x,[sigma])

Data: 27, 25, 23, 26, 20, 22, 19, 28, 23, 16, 27, 23, 24, 19, 22, 21, 28, 12, 17, 26

Population Std Deviation: 4.394
At 95% Level of Confidence
Ho: μ ≥ 20
Ha: μ < 20
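
The right-hand-side behaviour noted for Z.TEST can be imitated in Python to see how a lower-tail p-value is recovered from it. This is a sketch, not the spreadsheet's implementation: `z_test` is a hypothetical stand-in that follows the argument descriptions on the slides, and `data` is the 20-value column shown with this slide.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_test(array, x, sigma=None):
    """Upper-tail p-value for the sample mean of `array` against the value x.

    Hypothetical stand-in for the slide's command: if sigma is omitted,
    the sample standard deviation is used, as the slide describes.
    """
    n = len(array)
    if sigma is None:
        sigma = stdev(array)
    z = (mean(array) - x) / (sigma / sqrt(n))
    return 1 - NormalDist().cdf(z)  # area to the right of z


# Data column from the slide; sigma = 4.394 and the test value 20 are as stated there
data = [27, 25, 23, 26, 20, 22, 19, 28, 23, 16,
        27, 23, 24, 19, 22, 21, 28, 12, 17, 26]

p_upper = z_test(data, 20, 4.394)  # right-hand side, as the command reports it
p_lower = 1 - p_upper              # p-value for the lower-tail test Ha: mu < 20

print(f"sample mean = {mean(data):.1f}")
print(f"upper-tail p-value = {p_upper:.4f}")
print(f"lower-tail p-value = {p_lower:.4f}")
```

For a lower-tail alternative such as Ha: μ < 20, the right-hand-side result has to be subtracted from 1 before it is compared with α.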
39

Hypothesis Test: Two-Tailed Hypothesis Test

H0: μ = μ0
Ha: μ ≠ μ0

1. Compute the value of the test statistic z.
2. If z is in the upper tail (z > 0), compute the probability that z is greater than or equal to the value of the test statistic. If z is in the lower tail (z < 0), compute the probability that z is less than or equal to the value of the test statistic.
3. Double the tail area obtained in step 2 to obtain the p-value.

The rejection rule:
Reject H0 if the p-value < α.
40

Hypothesis Test: Two-Tailed Hypothesis Test

H0: μ = μ0
Ha: μ ≠ μ0

1. Compute the value of the test statistic z.
2. The critical values will occur in both the lower and upper tails of the standard normal curve.
3. Use the standard normal probability distribution table to find z_{α/2} (the z-value with an area of α/2 in the upper tail of the distribution).

The rejection rule is:
Reject H0 if z < -z_{α/2} or z > z_{α/2}.
41

Hypothesis Test: Exercise


The production line for Glow toothpaste is designed
to fill tubes with a mean weight of 6 oz. Periodically,
a sample of 30 tubes will be selected in order to
check the filling process.
Quality assurance procedures call for the
continuation of the filling process if the sample
results are consistent with the assumption that the
mean filling weight for the population of toothpaste
tubes is 6 oz.; otherwise the process will be adjusted.
Assume that a sample of 30 toothpaste tubes
provides a sample mean of 6.1 oz. The population
standard deviation is believed to be 0.2 oz. Perform
a hypothesis test, at the 0.03 level of significance, to
help determine whether the filling process should
continue operating or be stopped and corrected.
42

Hypothesis Test: Exercise

H0: μ = 6
Ha: μ ≠ 6
α = .03

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{6.1 - 6}{.2/\sqrt{30}} = 2.74 \]

For α/2 = 0.03/2 = 0.015, z_{.015} = 2.17
Reject H0 if z < -2.17 or z > 2.17
43

Hypothesis Test: Exercise

H0: μ = 6
Ha: μ ≠ 6
α = .03

\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{6.1 - 6}{.2/\sqrt{30}} = 2.74 \]

For z = 2.74, cumulative probability = 0.9969
p-value = 2 × (1 − 0.9969) = 0.0062
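
Both routes through the two-tailed test (p-value and critical value) can be checked with a small Python sketch using the exercise's numbers. The function name `two_tailed_z_test` is illustrative, and the critical value comes from the inverse normal CDF instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean: float, mu0: float, sigma: float, n: int):
    """Return (z, p_value) for H0: mu = mu0 versus Ha: mu != mu0, sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    tail_area = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return z, 2 * tail_area                   # double the tail area


# Values from the exercise: x-bar = 6.1, mu0 = 6, sigma = 0.2, n = 30
alpha = 0.03
z, p_value = two_tailed_z_test(6.1, 6, 0.2, 30)

# Critical value with an area of alpha/2 in the upper tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"critical values = -{z_crit:.2f} and {z_crit:.2f}")
print(f"reject H0 (p-value rule): {p_value < alpha}")
print(f"reject H0 (critical value rule): {abs(z) > z_crit}")
```

The two rules should always agree, since they are two ways of making the same comparison.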
45

Hypothesis Test: Exercise

H0: μ = 6
Ha: μ ≠ 6

There is sufficient statistical evidence to infer that the alternative hypothesis is true (i.e. the mean filling weight is not 6 ounces).
46

Command:
Z.TEST(array,x,[sigma])

Modification for a two-tailed test: 2 * MIN(p Value, 1 – p Value), where p Value is the result of Z.TEST.

Sample Average: 6.1
Sample Size: 30
Population Std Deviation: 0.2
At 97% Level of Confidence
Ho: μ = 6
Ha: μ ≠ 6
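
The modification can be expressed as a one-line helper. A minimal sketch follows; the name `two_tailed_p_value` is hypothetical, and the input 0.0031 is the one-tail area 1 − 0.9969 from the worked exercise.

```python
def two_tailed_p_value(one_tailed_p: float) -> float:
    """Apply the slide's modification: 2 * MIN(p, 1 - p)."""
    return 2 * min(one_tailed_p, 1 - one_tailed_p)


# One-tail area from the worked exercise: 1 - 0.9969 = 0.0031
upper_tail = 0.0031
lower_tail = 1 - upper_tail  # what a one-sided result looks like from the other side

print(round(two_tailed_p_value(upper_tail), 4))
print(round(two_tailed_p_value(lower_tail), 4))
```

Taking the minimum makes the result independent of which side of the hypothesized mean the sample mean falls on: both calls give 0.0062, matching the two-tailed p-value computed by hand.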
47

Command:
Z.TEST(array,x,[sigma])

Modification for a two-tailed test: 2 * MIN(p Value, 1 – p Value), where p Value is the result of Z.TEST.

Data: 27, 25, 23, 26, 20, 22, 19, 28, 23, 16, 27, 23, 24, 19, 22, 21, 28, 12, 17, 26

Population Std Deviation: 4.394
At 95% Level of Confidence
Ho: μ = 20
Ha: μ ≠ 20
48

Thank You!

???
