
Normal Distribution and Regression Notes

To predict the final exam grade of a student with a midterm grade of 60, we use the regression equation y = 0.8x + 34.2. Plugging in x = 60, we get y = 0.8(60) + 34.2 = 48 + 34.2 = 82.2. Therefore, the predicted final exam grade is 82.2.

Data Management

Math in the Modern World Module 5


“Statistics is the Grammar of Science.”

—Karl Pearson—
Normal Distributions
• Frequency Distributions
• Standard Normal Distribution
Linear Regression and Correlation
• Linear Regression
• Linear Correlation Coefficient
01
Normal Distributions
Normal Distributions and the Empirical Rule
Let us recall…
• Statistics consists of a body of methods for
collecting and analyzing data.
• You can use statistical methods to determine what
kind and how much sample data you need to gather,
how you should organize and summarize these data,
and how you can analyze them and make
conclusions from them.
• There are three important components for the
success of any statistical research study – design,
description, and inference.
1. Design – the researcher must know the
appropriate statistical methods to carry out a
plan, implement rules, and evaluate experiments
properly.
2. Description – the researcher must know how to
guide readers in understanding the methods of a
research and in analyzing its results.
3. Inference – the researcher must use the results
of data analysis to make good predictions and
correct decisions.
The Normal Distribution
• The normal distribution is perhaps the most
commonly used continuous probability distribution in
the entire field of statistics.
• It provides a good model for most continuous
populations.
• It has a bell-shaped curve also known as the
normal curve.
Empirical Rules for Normal Distribution
• Any normal population has the following characteristics:
1. About 68.3% of the population falls within the interval 𝜇 ± 𝜎;
2. About 95.4% of the population falls within the interval 𝜇 ± 2𝜎;
3. About 99.7% of the population falls within the interval 𝜇 ± 3𝜎;
where 𝜇 is the population mean and 𝜎 is the population standard deviation.
• This can also be stated as follows: in a normal distribution, approximately
• 68.3% of the data lie within 1 standard deviation of the mean.
• 95.4% of the data lie within 2 standard deviations of the mean.
• 99.7% of the data lie within 3 standard deviations of the mean.
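The three percentages can be checked numerically. The sketch below is an illustration only and is not taken from the slides: it assumes the identity P(|Z| < k) = erf(k/√2) for a standard normal variable, and the helper name `within_k_sigma` is hypothetical.

```python
import math


def within_k_sigma(k):
    """Fraction of a normal population within k standard deviations of the mean.

    Assumes P(|Z| < k) = erf(k / sqrt(2)); the function name is illustrative.
    """
    return math.erf(k / math.sqrt(2))


# Print the share of the population inside the intervals mu ± k*sigma.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sigma(k):.1%}")
```

Rounded to one decimal place, the computed shares agree with the 68.3%, 95.4%, and 99.7% figures quoted in the rule.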
Empirical Rules for Normal Distribution - Example
Empirical Rules for Normal Distribution – Skills Check Up
A vegetable distributor knows that during the month of August, the
weights of its tomatoes are normally distributed with a mean of 0.61
lb and a standard deviation of 0.15 lb.

a. What percent of the tomatoes weigh less than 0.76 lb?


b. In a shipment of 6000 tomatoes, how many tomatoes can be
expected to weigh more than 0.31 lb?
c. In a shipment of 4500 tomatoes, how many tomatoes can be
expected to weigh from 0.31 lb to 0.91 lb?
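One possible way to set up this check-up with the empirical rule is sketched below. It relies on the fact that 0.76, 0.31, and 0.91 lb sit exactly 1 or 2 standard deviations from the mean, and it uses the 68.3% / 95.4% / 99.7% figures from the rule. The names `WITHIN`, `sigmas_from_mean`, and `fraction_below` are hypothetical helpers, not part of the slides.

```python
MEAN_LB = 0.61   # mean tomato weight from the problem statement
SD_LB = 0.15     # standard deviation from the problem statement

# Empirical-rule shares within 1, 2, and 3 standard deviations of the mean.
WITHIN = {1: 0.683, 2: 0.954, 3: 0.997}


def sigmas_from_mean(weight):
    """Whole number of standard deviations between a weight and the mean.

    Rounding is safe here only because the exercise values fall on
    whole multiples of the standard deviation (an assumption of this sketch).
    """
    return round((weight - MEAN_LB) / SD_LB)


def fraction_below(weight):
    """Approximate share of tomatoes lighter than the given weight."""
    k = sigmas_from_mean(weight)
    if k == 0:
        return 0.5
    half = WITHIN[abs(k)] / 2  # share between the mean and k sigmas, one side
    return 0.5 + half if k > 0 else 0.5 - half


percent_below_076 = 100 * fraction_below(0.76)                    # part a
count_above_031 = round(6000 * (1 - fraction_below(0.31)))        # part b
count_between = round(4500 * (fraction_below(0.91) - fraction_below(0.31)))  # part c

print(percent_below_076, count_above_031, count_between)
```

With the rule's figures this gives roughly 84% for part a; the counts for parts b and c follow from multiplying the corresponding shares by the shipment sizes.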
The Standard Normal Distribution
It is often helpful to convert data values 𝑥 to z-scores, as we did in the previous section, by using the z-score formulas
$$z_x = \frac{x - \mu}{\sigma} \quad \text{(population)}, \qquad z_x = \frac{x - \bar{x}}{s} \quad \text{(sample)}.$$
The Standard Normal Distribution
• If the original distribution of x values is a normal
distribution, then the corresponding distribution of z-
scores will also be a normal distribution.
• This normal distribution of z-scores is called the
standard normal distribution.
• The standard normal distribution has a mean of 0 and a standard deviation of 1 (see Figure 13.7).
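A minimal sketch of the conversion to z-scores, assuming the population form z = (x − 𝜇)/𝜎. The data values are hypothetical and the function name `z_scores` is illustrative. After standardizing, the z-scores have mean 0 and standard deviation 1.

```python
import statistics


def z_scores(values):
    """Convert data values to z-scores using the population mean and SD."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]


data = [52, 58, 61, 64, 70]  # hypothetical measurements, for illustration only
z = z_scores(data)

z_mean = statistics.fmean(z)   # approximately 0
z_sd = statistics.pstdev(z)    # approximately 1
print(z_mean, z_sd)
```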
We take note:
Because the standard normal distribution is symmetrical about the mean of 0, we can also use Table 13.10 to find the area of a region that is located to the left of the mean. This process is explained in the next example.
The Standard Normal Distribution - Example
The Standard Normal Distribution
In Figure 13.10, the region to the right of 𝑧 = 0.82 is called a tail region.
A tail region is a region of the standard normal distribution to the right of a positive z-value or to the left of a negative z-value.
To find the area of a tail region, we subtract the entry in Table 13.10 from 0.500. This procedure is illustrated in the next example.
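The subtraction can be sketched in code. This assumes that each entry in Table 13.10 is the area between 0 and z, which is what the "subtract from 0.500" step implies; the entry is approximated here with `math.erf` rather than read from the table. The function names are hypothetical.

```python
import math


def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and |z|.

    Stands in for a table lookup; assumes the table lists areas from 0 to z.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


def tail_area(z):
    """Area of the tail region beyond z (right of a positive z, left of a negative z)."""
    return 0.5 - area_zero_to_z(z)


entry = area_zero_to_z(0.82)   # plays the role of the table entry for z = 0.82
tail = tail_area(0.82)         # area of the tail region to the right of z = 0.82
print(f"entry = {entry:.4f}, tail = {tail:.4f}")
```

By symmetry, `tail_area(-0.82)` returns the same value for the region to the left of 𝑧 = −0.82.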
The Standard Normal Distribution - Example
The Standard Normal Distribution – Skills Check Up
1. Find the area of the standard normal distribution to the left of:
a) 𝑧 = −1.25
b) 𝑧 = −2.53

2. Find the area of the standard normal distribution to the right of:
a) 𝑧 = 2.5
b) 𝑧 = 0.24
The Standard Normal Distribution
Because the area of a portion of the standard normal distribution can be interpreted as a percentage of the data or as a probability that the variable lies in an interval, we can use the standard normal distribution to solve many application problems.
Solving an Application - Example
Solving an Application – Skills Check Up
A study of the careers of professional football players shows that the
lengths of their careers are nearly normally distributed, with a mean
of 6.1 years and a standard deviation of 1.8 years.

a. What percent of professional football players have a career of more than 9 years?
b. If a professional football player is chosen at random, what is the
probability that the player will have a career of between 3 and 4
years?
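One way to set up this check-up in code is sketched below. It converts each bound to a z-score and uses a cumulative-area function built on `math.erf` in place of a table lookup. The function `normal_cdf` and the variable names are hypothetical, and results read from a printed table may differ slightly because z is usually rounded to two decimals there.

```python
import math


def normal_cdf(x, mean, sd):
    """Area to the left of x under a normal curve with the given mean and SD."""
    z = (x - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


MEAN_YEARS = 6.1  # mean career length from the problem statement
SD_YEARS = 1.8    # standard deviation from the problem statement

# Part a: share of careers longer than 9 years (a right-tail area).
p_more_than_9 = 1 - normal_cdf(9, MEAN_YEARS, SD_YEARS)

# Part b: probability that a career lasts between 3 and 4 years.
p_between_3_and_4 = (normal_cdf(4, MEAN_YEARS, SD_YEARS)
                     - normal_cdf(3, MEAN_YEARS, SD_YEARS))

print(f"{p_more_than_9:.1%}", f"{p_between_3_and_4:.3f}")
```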
02
Linear Regression and Correlation
Linear Regression
• Linear Regression is the most basic and commonly used predictive analysis in the field of statistics and time analysis.
• Regression estimates are used to describe data and
to explain the nature of relationship among the
variables involved.
Correlation Analysis and Regression Analysis
• Correlation Analysis is a statistical technique used to measure the strength of the association among variables; by itself it does not establish a causal relationship.
• Regression Analysis is used to determine the nature of the
relationship.
• In a two-variable linear regression or simple linear regression,
a positive relationship occurs when the two variables increase at
the same time while a negative relationship occurs when one
variable increases and the other variable decreases, or vice
versa.
Linear Correlation Coefficient
• To determine whether a linear relationship exists between two variables, use the correlation coefficient 𝑟, whose values range from −1 to 1.

Values of 𝑟      Interpretation
Close to +1      Strong positive linear relationship
Close to 0       Weak or no linear relationship
Close to −1      Strong negative linear relationship

Linear Correlation Coefficient
• Rule of thumb for interpreting the size of a correlation coefficient
Figure 13.19 shows some scatter diagrams along with the type of linear correlation that exists between the 𝑥 and 𝑦 variables.
The closer |𝑟| is to 1, the stronger the linear relationship between the variables.
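Linear Correlation Coefficient
The value of 𝑟 may be obtained by the least squares method:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$$

where

$$SS_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \qquad SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}, \qquad SS_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n},$$

and 𝑛 is the sample size and “𝑆𝑆” stands for sum of squares.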
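The sums of squares translate directly into code. The sketch below uses a small hypothetical data set, not the one from the slides, and the function names `sums_of_squares` and `correlation` are illustrative.

```python
import math


def sums_of_squares(xs, ys):
    """Return (SSxx, SSyy, SSxy) computed from the sums of x, y, x^2, y^2, and xy."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_xx, ss_yy, ss_xy


def correlation(xs, ys):
    """Linear correlation coefficient r = SSxy / sqrt(SSxx * SSyy)."""
    ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

ss_xx, ss_yy, ss_xy = sums_of_squares(xs, ys)  # 10, 6, and 6 for this data
r = correlation(xs, ys)                        # 6 / sqrt(60), about 0.77
print(ss_xx, ss_yy, ss_xy, round(r, 4))
```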
Linear Correlation Coefficient
We note that:
• The linear correlation coefficient indicates the strength of a linear relationship between two variables; however, it does not indicate the presence of a cause-and-effect relationship.
• The square of 𝑟 is called the coefficient of determination. It measures the proportion of the variability in the dependent variable 𝑦 that is explained by its linear relationship with the independent variable 𝑥.
Least-Squares Line
The line that best fits a given set of points is called the least-squares line of the linear regression model:

$$y = mx + b$$

where

$$m \ (\text{slope}) = \frac{SS_{xy}}{SS_{xx}}, \qquad b \ (\text{intercept}) = \bar{y} - m\bar{x},$$

$\bar{x}$ = mean of $x$, and $\bar{y}$ = mean of $y$.
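A minimal sketch of the least-squares line, taking 𝑚 as the slope and 𝑏 as the intercept of 𝑦 = 𝑚𝑥 + 𝑏, so that 𝑚 = SSxy/SSxx and 𝑏 = ȳ − 𝑚x̄. The data set is hypothetical and the function name `least_squares_line` is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    slope = ss_xy / ss_xx
    intercept = sum_y / n - slope * (sum_x / n)  # y-bar minus slope times x-bar
    return slope, intercept


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = least_squares_line(xs, ys)   # slope 0.6 and intercept 2.2 for this data
print(f"y = {m:.2f}x + {b:.2f}")
```

The fitted line always passes through the point (x̄, ȳ), which gives a quick sanity check on the computed intercept.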
Linear Correlation Coefficient
In your work with applications that involve the linear correlation coefficient 𝑟, it is important to
remember the following properties of 𝑟.
Using Excel – Example
The grades of 10 senior high school students on a midterm report
𝑥 and on the final examination 𝑦 are as follows:

x 78 72 50 99 68 94 72 81 96 68

y 86 56 65 99 70 84 80 55 99 70

a. Determine the correlation coefficient r.


b. Determine the linear regression line.
c. Predict the final examination grade of a student whose
midterm grade is 60.
Solution
Using Excel, we obtain the following:
We now have the value of 𝑟.
We solve for the values of the slope and the intercept.
Answers (a and c)
With 𝑟 = 0.87, there exists a strong positive linear relationship between the midterm report and the final examination grade. The relationship is given by
𝑦 = 𝑚𝑥 + 𝑏
𝑦 = 0.7158𝑥 + 23.7135
Thus, if 𝑥 = 60, the final examination grade is estimated to be
𝑦 = 0.7158(60) + 23.7135 ≈ 67
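The prediction step can be written as a short function. The slope and intercept are the values reported above; the names `SLOPE`, `INTERCEPT`, and `predict_final` are hypothetical.

```python
SLOPE = 0.7158       # slope of the fitted line reported above
INTERCEPT = 23.7135  # intercept of the fitted line reported above


def predict_final(midterm):
    """Estimate the final examination grade from a midterm grade."""
    return SLOPE * midterm + INTERCEPT


estimate = predict_final(60)   # 42.948 + 23.7135 = 66.6615
print(round(estimate))         # rounds to 67
```

The line is fitted to midterm grades between 50 and 99, so estimates for midterm grades well outside that range should be treated with caution.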


Line Chart
1. Select the data set.
2. Insert: Scatter Plot.
3. Edit the Chart Title.
4. Right-click on the data and choose “Add Trendline”.
5. Choose Linear.
6. Set the Dash Type.
7. Click the “+” symbol to edit Chart Elements.
8. Label each axis.

Line Chart (Answer b)
