
Doromal, Ana Mae C.

22-1-00246
TCIE 2-2 TCCB 122
ASSIGNMENT #1- FINALS

❖ Empirical Models
• Many problems in engineering and science involve exploring the relationships
between two or more variables.
• Regression analysis is a statistical technique that is very useful for these types of
problems.
• For example, in a chemical process, suppose that the yield of the product is related
to the process-operating temperature.
• Regression analysis can be used to build a model to predict yield at a given
temperature level.

Based on the scatter diagram, it is probably reasonable to assume that the mean of the random
variable Y is related to x by the following straight-line relationship:

E(Y|x) = β0 + β1x

where the slope β1 and intercept β0 of the line are called regression coefficients. The simple
linear regression model is given by

Y = β0 + β1x + ℇ

where ℇ is the random error term.


We think of the regression model as an empirical model. Suppose that the mean and variance of
ℇ are 0 and σ^2, respectively. Then the mean of Y given x is

E(Y|x) = E(β0 + β1x + ℇ) = β0 + β1x

The variance of Y given x is

V(Y|x) = V(β0 + β1x + ℇ) = σ^2

• The true regression model is a line of mean values:

μY|x = β0 + β1x

where β1 can be interpreted as the change in the mean of Y for a unit change in x.
• Also, the variability of Y at a particular value of x is determined by the error variance, σ^2.
• This implies there is a distribution of Y-values at each x and that the variance of this
distribution is the same at each x.
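A quick simulation makes these two facts concrete. This is only a sketch with hypothetical values (β0 = 2, β1 = 3, σ = 0.5, and x fixed at 4): at a fixed x, simulated Y-values are centered at β0 + β1x, and their variance is approximately σ^2.

```python
import numpy as np

# Hypothetical regression coefficients and error standard deviation
beta0, beta1, sigma = 2.0, 3.0, 0.5

rng = np.random.default_rng(0)
x = 4.0  # fix one value of the predictor
# Simulate many observations of Y = beta0 + beta1*x + eps at this x
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=100_000)

print(y.mean())  # close to E(Y|x) = beta0 + beta1*x = 14.0
print(y.var())   # close to sigma^2 = 0.25
```

Repeating this at a different x shifts the center of the distribution but leaves its variance the same, which is exactly the assumption stated above.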

❖ Correlation
• Correlation is defined as the statistical association between two variables.
• A correlation exists between two variables when one of them is related to the other
in some way. A scatterplot is the best place to start. A scatterplot (or scatter diagram)
is a graph of the paired (x, y) sample data with a horizontal x-axis and a vertical y-
axis. Each individual (x, y) pair is plotted as a single point.
In this example, we plot bear chest girth (y) against bear length (x). When examining a scatterplot,
we should study the overall pattern of the plotted points. In this example, we see that the value for
chest girth does tend to increase as the value of length increases. We can see an upward slope and
a straight-line pattern in the plotted data points.
A scatterplot can identify several different types of relationships between two variables.
• A relationship has no correlation when the points on a scatterplot do not show any pattern.
• A relationship is non-linear when the points on a scatterplot follow a pattern but not a
straight line.
• A relationship is linear when the points on a scatterplot follow a somewhat straight-line
pattern. This is the relationship that we will examine.
Linear relationships can be either positive or negative. Positive relationships have points that
incline upwards to the right. As x values increase, y values increase. As x values decrease, y values
decrease. For example, when studying plants, height typically increases as diameter increases.

Negative relationships have points that decline downward to the right. As x values increase, y
values decrease. As x values decrease, y values increase. For example, as wind speed increases,
wind chill temperature decreases.

Non-linear relationships have an apparent pattern, just not a linear one. For example, as age
increases, height increases up to a point, then levels off after reaching a maximum height.
When two variables have no relationship, there is no straight-line relationship or non-linear
relationship. When one variable changes, it does not influence the other variable.

➢ Linear Correlation Coefficient


Because visual examinations are largely subjective, we need a more precise and objective measure
to define the correlation between the two variables. To quantify the strength and direction of the
relationship between two variables, we use the linear correlation coefficient:

r = [1/(n − 1)] Σ [(xi − x̄)/sx][(yi − ȳ)/sy]

where x̄ and sx are the sample mean and sample standard deviation of the x’s, and ȳ and sy are the
mean and standard deviation of the y’s. The sample size is n.

An alternate computation of the correlation coefficient is:

r = SSxy / √(SSxx · SSyy)

where

SSxy = Σxy − (Σx)(Σy)/n
SSxx = Σx² − (Σx)²/n
SSyy = Σy² − (Σy)²/n
The linear correlation coefficient is also referred to as Pearson’s product moment correlation
coefficient in honor of Karl Pearson, who originally developed it. This statistic numerically
describes how strong the straight-line or linear relationship is between the two variables and the
direction, positive or negative.

The properties of “r”:

• It is always between -1 and +1.


• It is a unitless measure so “r” would be the same value whether you measured the two
variables in pounds and inches or in grams and centimeters.
• Positive values of “r” are associated with positive relationships.
• Negative values of “r” are associated with negative relationships.
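These properties can be checked directly. The sketch below computes r from standardized scores with hypothetical bear length and chest girth measurements (the numbers are illustrative, not from the original data set), and confirms that r is unchanged when both variables are converted from inches to centimeters.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient from standardized scores."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    sy = math.sqrt(sum((y - ybar) ** 2 for y in ys) / (n - 1))
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical bear length (x) and chest girth (y) measurements, in inches
length = [48.0, 58.0, 61.0, 63.0, 70.0]
girth = [26.0, 45.0, 43.0, 48.0, 55.0]

r = pearson_r(length, girth)  # strong positive linear relationship
# Unitless: rescaling both variables (inches -> centimeters) leaves r the same
r_cm = pearson_r([2.54 * x for x in length], [2.54 * y for y in girth])
```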

Correlation is not causation. Just because two variables are correlated does not mean that one
variable causes another variable to change.

Examine these next two scatterplots. Both data sets have an r = 0.01, but they are very different.
Plot 1 shows little linear relationship between x and y variables. Plot 2 shows a strong non-linear
relationship. Pearson’s linear correlation coefficient only measures the strength and direction of a
linear relationship. Ignoring the scatterplot could result in a serious mistake when describing the
relationship between two variables.
When you investigate the relationship between two variables, always begin with a scatterplot. This
graph allows you to look for patterns (both linear and non-linear). The next step is to quantitatively
describe the strength and direction of the linear relationship using “r”. Once you have established
that a linear relationship exists, you can take the next step in model building.

❖ Simple Linear Regression

Once we have identified two variables that are correlated, we would like to model this relationship.
We want to use one variable as a predictor or explanatory variable to explain the other variable,
the response or dependent variable. To do this, we need a good relationship between our two
variables. The model can then be used to predict changes in our response variable. A strong
relationship between the predictor variable and the response variable leads to a good model.

A simple linear regression model is a mathematical equation that allows us to predict a response
for a given predictor value.
Our model will take the form of ŷ = b0 + b1x, where b0 is the y-intercept, b1 is the slope, x is the
predictor variable, and ŷ is an estimate of the mean value of the response variable for any value of
the predictor variable.
The y-intercept is the predicted value for the response (y) when x = 0. The slope describes the
change in y for each one-unit change in x. Let’s look at this example to clarify the interpretation
of the slope and intercept.

Example 1
A hydrologist created a model to predict the volume flow for a stream at a bridge crossing with a
predictor variable of daily rainfall in inches.
Ans
ŷ = 1.6 + 29x. The y-intercept of 1.6 can be interpreted this way: On a day with no rainfall, there
will be 1.6 gal. of water/min. flowing in the stream at that bridge crossing. The slope tells us that
if it rained one inch that day the flow in the stream would increase by an additional 29 gal./min. If
it rained 2 inches that day, the flow would increase by an additional 58 gal./min.

Example 2
What would be the average stream flow if it rained 0.45 inches that day?
Ans
ŷ = 1.6 + 29x = 1.6 + 29(0.45) = 14.65 gal./min.
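The two examples above amount to evaluating the fitted line at a given rainfall. A minimal sketch of the hydrologist’s model:

```python
def stream_flow(rainfall_in):
    """Predicted stream flow (gal./min.) from daily rainfall (inches)."""
    return 1.6 + 29 * rainfall_in

print(stream_flow(0.0))   # y-intercept: 1.6 gal./min. on a day with no rain
print(stream_flow(0.45))  # 14.65 gal./min.
```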

The Least-Squares Regression Line (shortcut equations)

The equation is given by ŷ = b0 + b1x

where b1 = r(sy/sx) is the slope and b0 = ȳ − b1x̄ is the y-intercept of the regression line.

An alternate computational equation for the slope is:

b1 = SSxy / SSxx
This simple model is the line of best fit for our sample data. The regression line does not go through
every point; instead, it balances the difference between all data points and the straight-line model.
The difference between the observed data value and the predicted value (the value on the straight
line) is the error or residual. The criterion to determine the line that best describes the relation
between two variables is based on the residuals.

Residual = Observed – Predicted

For example, if you wanted to predict the chest girth of a black bear given its weight, you could
use the following model.
Chest girth = 13.2 + 0.43(weight)

The predicted chest girth of a bear that weighed 120 lb. is 64.8 in.

Chest girth = 13.2 + 0.43(120) = 64.8 in.

But a measured bear chest girth (observed value) for a bear that weighed 120 lb. was 62.1 in.

The residual would be 62.1 – 64.8 = -2.7 in.

A negative residual indicates that the model is over-predicting. A positive residual indicates that
the model is under-predicting. In this instance, the model over-predicted the chest girth of a bear
that weighed 120 lb.
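The bear calculation above can be reproduced directly; the sign of the residual tells us which way the model missed.

```python
b0, b1 = 13.2, 0.43              # chest-girth model coefficients
weight = 120                     # bear weight, lb.
observed = 62.1                  # measured chest girth, in.

predicted = b0 + b1 * weight     # 64.8 in.
# Residual = Observed - Predicted; negative means the model over-predicts
residual = observed - predicted  # -2.7 in.
```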

This random error (residual) considers all unpredictable and unknown factors that are not included
in the model. An ordinary least squares regression line minimizes the sum of the squared errors
between the observed and predicted values to create a best fitting line. The differences between
the observed and predicted values are squared to deal with the positive and negative differences.
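A minimal least-squares fit using the shortcut equations above (pure Python, hypothetical sample data) shows the mechanics. One consequence of minimizing the sum of squared errors is that the residuals of the fitted line sum to zero.

```python
def least_squares(xs, ys):
    """Slope and intercept via the shortcut (computational) equations."""
    n = len(xs)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b1 = ss_xy / ss_xx                   # slope
    b0 = sum(ys) / n - b1 * sum(xs) / n  # intercept: y-bar - b1 * x-bar
    return b0, b1

# Hypothetical (x, y) sample data with a roughly linear pattern
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(sum(residuals))  # the least-squares residuals sum to (numerically) zero
```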
