Advanced Engineering Statistics: Lectures For M.SC
Lectures for
M.Sc. Course
2022 – 2023
Introduction to Statistics
Statistics: The branch of scientific inquiry that provides methods for organizing
and summarizing data, and for using information in the data to draw various
conclusions.
Descriptive Statistics: The part of statistics that deals with methods for
organizing and summarizing data. Descriptive methods can be used with a list of
all population members (a census) or when the data consist of a sample.
[Figure: diagram relating Population and Sample through Probability and Statistics]
Overview
The Statistical Package for the Social Sciences (SPSS)
provides a powerful statistical-analysis and data-management system in a
graphical environment, using descriptive menus and simple dialog boxes
to do most of the work for the researcher.
Windows
There are a number of different types of windows in SPSS:
Data Editor. The Data Editor displays the contents of the data file. You
can create new data files or modify existing data files with the Data
Editor. If you have more than one data file open, there is a separate Data
Editor window for each data file.
Viewer. All statistical results, tables, and charts are displayed in the
Viewer. You can edit the output and save it for later use. A Viewer
window opens automatically the first time you run a procedure that
generates output.
Chart Editor. You can modify high-resolution charts and plots in chart
windows. You can change the colors, select different type fonts or sizes,
switch the horizontal and vertical axes, rotate 3-D scatterplots, and even
change the chart type.
Text Output Editor. Text output that is not displayed in pivot tables can
be modified with the Text Output Editor. You can edit the output and
change font characteristics (type, style, color, size).
Syntax Editor. You can paste your dialog box choices into a syntax
window, where your selections appear in the form of command syntax.
You can then edit the command syntax to use special features that are not
available through dialog boxes. You can save these commands in a file
for use in subsequent sessions.
Figure 1: Data Editor and Viewer
If you have more than one open Viewer window, output is routed to the
designated Viewer window. If you have more than one open Syntax
Editor window, command syntax is pasted into the designated Syntax
Editor window. The designated windows are indicated by a plus sign in
the icon in the title bar. You can change the designated windows at any
time.
The designated window should not be confused with the active window,
which is the currently selected window. If you have overlapping
windows, the active window appears in the foreground.
If you open a window, that window automatically becomes the active
window and the designated window.
Make the window that you want to designate the active window
(click anywhere in the window).
Click the Designate Window button on the toolbar (the plus sign
icon).
Or
Status Bar
The status bar at the bottom of each SPSS window provides the following
information:
Command status. For each procedure or command that you run, a case
counter indicates the number of cases processed so far. For statistical
procedures that require iterative processing, the number of iterations is
displayed.
Split File status. The message Split File on indicates that the data file has
been split into separate groups for analysis, based on the values of one or
more grouping variables.
Dialog Boxes
Most menu selections open dialog boxes. You use dialog boxes to select
variables and options for analysis. Dialog boxes for statistical procedures
and charts typically have two basic components:
Target variable list(s). One or more lists indicating the variables that
you have chosen for the analysis, such as dependent and independent
variable lists.
Variable Names and Variable Labels in Dialog Box
Lists
You can display either variable names or variable labels in dialog box
lists, and you can control the sort order of variables in source variable
lists.
Figure 3: Resized dialog box
OK. Runs the procedure. After you select your variables and choose any
additional specifications, click OK to run the procedure and close the
dialog box.
Paste. Generates command syntax from the dialog box selections and
pastes the syntax into a syntax window. You can then customize the
commands with additional features that are not available from dialog
boxes.
Reset. Deselects any variables in the selected variable list(s) and resets all
specifications in the dialog box and any subdialog boxes to the default
state.
Cancel. Cancels any changes that were made in the dialog box settings
since the last time it was opened and closes the dialog box. Within a
session, dialog box settings are persistent. A dialog box retains your last
set of specifications until you override them.
Selecting Variables
To select a single variable, simply select it in the source variable list and
drag and drop it into the target variable list. You can also use the arrow
button to move variables from the source list to the target lists. If there is
only one target variable list, you can double-click individual variables to
move them from the source list to the target list. You can also select
multiple variables:
Figure 5: Variable information
Get your data into SPSS. You can open a previously saved SPSS data
file, you can read a spreadsheet, database, or text data file, or you can
enter your data directly in the Data Editor.
Select the variables for the analysis. The variables in the data file are
displayed in a dialog box for the procedure.
Run the procedure and look at the results. Results are displayed in the
Viewer.
Homework:
0.684, 2.54, 0.924, 3.13, 1.03, 0.598, 0.483, 3.52, 1.285, 2.65, 1.497
Data Editor
The Data Editor provides a convenient, spreadsheet-like method for creating and
editing data files.
The Data Editor window opens automatically when you start a session.
In both views of the Data Editor (Data View and Variable View), you can add, change,
and delete information that is contained in the data file.
Data View
Many of the features of Data View are similar to the features that are found in
spreadsheet applications. There are, however, several important distinctions:
Rows are cases. Each row represents a case or an observation. For example,
each individual respondent to a questionnaire is a case.
Columns are variables. Each column represents a variable or characteristic
that is being measured. For example, each item on a questionnaire is a
variable.
Cells contain values. Each cell contains a single value of a variable for a case.
The cell is where the case and the variable intersect. Cells contain only data
values. Unlike spreadsheet programs, cells in the Data Editor cannot contain
formulas.
The data file is rectangular. The dimensions of the data file are determined
by the number of cases and variables. You can enter data in any cell. If you
enter data in a cell outside the boundaries of the defined data file, the data
rectangle is extended to include any rows and/or columns between that cell
and the file boundaries. There are no “empty” cells within the boundaries of
the data file. For numeric variables, blank cells are converted to the system-
missing value. For string variables, a blank is considered a valid value.
Variable View
Variable View contains descriptions of the attributes of each variable in the data file.
In Variable View:
You can add or delete variables and modify attributes of variables, including the
following attributes:
Variable name
Data type
Number of digits or characters
Number of decimal places
Descriptive variable and value labels
User-defined missing values
Column width
Measurement level
All of these attributes are saved when you save the data file.
Variable Names
Variable names can be up to 64 bytes long. 64 bytes typically means 64 characters in
single-byte languages (for example, English, French, German, Spanish, Italian, Hebrew,
Russian, Greek, Arabic, and Thai) and 32 characters in double-byte languages (for example,
Japanese, Chinese, and Korean).
You can specify the level of measurement as scale (numeric data on an interval or ratio scale),
ordinal, or nominal. Nominal and ordinal data can be either string (alphanumeric) or numeric.
Nominal. A variable can be treated as nominal when its values represent categories with
no intrinsic ranking (for example, the department of the company in which an employee
works).
Examples of nominal variables include region, zip code, and religious affiliation.
Ordinal. A variable can be treated as ordinal when its values represent categories with
some intrinsic ranking (for example, levels of service satisfaction from highly dissatisfied
to highly satisfied). Examples of ordinal variables include attitude scores representing
degree of satisfaction or confidence and preference rating scores.
Scale. A variable can be treated as scale when its values represent ordered categories with
a meaningful metric, so that distance comparisons between values are appropriate.
Examples of scale variables include age in years and income in thousands of dollars.
Variable Type
Variable Type specifies the data type for each variable. By default, all new variables are assumed
to be numeric. You can use Variable Type to change the data type.
Numeric. A variable whose values are numbers. Values are displayed in standard numeric format.
The Data Editor accepts numeric values in standard format or in scientific notation.
Comma. A numeric variable whose values are displayed with commas delimiting every three
places. Values cannot contain commas to the right of the decimal indicator.
Dot. A numeric variable whose values are displayed with periods delimiting every three places
and with the comma as a decimal delimiter. Values cannot contain periods to the right of the
decimal indicator.
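Scientific notation. A numeric variable whose values are displayed with an embedded E and a
signed power-of-10 exponent. Values can be entered with or without an exponent; the exponent
can be preceded by E or D with an optional sign, or by the sign alone, for example, 123,
1.23E2, 1.23D2, 1.23E+2, and 1.23+2.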
Date. A numeric variable whose values are displayed in one of several calendar-date or clock-
time formats.
Dollar. A numeric variable displayed with a leading dollar sign ($), commas delimiting every
three places, and a period as the decimal delimiter.
Custom currency. A numeric variable whose values are displayed in one of the custom currency
formats that you have defined on the Currency tab of the Options dialog box.
String. A variable whose values are not numeric and therefore are not used in calculations. The
values can contain any characters up to the defined length. (Also known as an alphanumeric
variable)
Value Labels
You can assign descriptive value labels for each value of a variable. This process is particularly
useful if your data file uses numeric codes to represent non-numeric categories (for example,
codes of 1 and 2 for male and female).
Click the button in the Values cell for the variable that you want to define.
For each value, enter the value and a label.
Click Add to enter the value label.
Click OK.
Column Width
You can specify a number of characters for the column width. Column widths can also be changed
in Data View by clicking and dragging the column borders.
Column formats affect only the display of values in the Data Editor. Changing the column width
does not change the defined width of a variable.
In Variable View, select the attribute cell that you want to apply to other variables.
From the menus choose:
Edit
Copy
Select the attribute cell(s) to which you want to apply the attribute. (You can select
multiple target variables.)
From the menus choose:
Edit
Paste
If you paste the attribute to blank rows, new variables are created with default attributes for all
attributes except the selected attribute.
Custom variable attributes can be displayed and edited in the Data Editor in Variable View.
Figure 8: Customize Variable View
In Variable View, right-click on the Labels or Values column and from the context menu
choose:
Spelling
or
In Variable View, from the menus choose:
Utilities
Spelling
Converting numeric or date into string. Numeric (for example, numeric, dollar, dot, or
comma) and date formats are converted to strings if they are pasted into a string variable cell.
Converting string into numeric or date. String values that contain acceptable characters for
the numeric or date format of the target cell are converted to the equivalent numeric or date value.
Converting date into numeric. Date and time values are converted to a number of seconds if
the target cell is one of the numeric formats (for example, numeric, dollar, dot, or comma).
Because dates are stored internally as the number of seconds since October 14, 1582, converting
dates to numeric values can yield some extremely large numbers. For example, the date 10/29/91
is converted to a numeric value of 12,908,073,600.
Converting numeric into date or time. Numeric values are converted to dates or times if the
value represents a number of seconds that can produce a valid date or time. For dates, numeric
values that are less than 86,400 are converted to the system-missing value.
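The date arithmetic described above can be checked with a short Python sketch. It assumes that 10/29/91 means October 29, 1991, and it relies on Python's proleptic Gregorian calendar; the function names are illustrative only and are not part of SPSS, and None stands in for the system-missing value.

```python
from datetime import date

# Reference date for internal date storage, as stated in the text.
EPOCH = date(1582, 10, 14)
SECONDS_PER_DAY = 86_400

def date_to_seconds(d):
    """Seconds elapsed between the reference date and midnight of d."""
    return (d - EPOCH).days * SECONDS_PER_DAY

def seconds_to_date(seconds):
    """Date for a number of seconds; None (standing in for system-missing)
    when the value is less than one full day (86,400 seconds)."""
    if seconds < SECONDS_PER_DAY:
        return None
    whole_days = int(seconds // SECONDS_PER_DAY)
    return date.fromordinal(EPOCH.toordinal() + whole_days)

print(date_to_seconds(date(1991, 10, 29)))   # 12908073600
print(seconds_to_date(12_908_073_600))       # 1991-10-29
print(seconds_to_date(50_000))               # None
```

The first printed value matches the 12,908,073,600 quoted above, which corresponds to 149,399 days of 86,400 seconds each.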
Working with Multiple Data Sources
Multiple data sources can be open at the same time, making it easier to:
Any previously open data sources remain open and available for further use.
When you first open a data source, it automatically becomes the active dataset.
Copying and pasting an entire variable in Data View by selecting the variable name at the
top of the column pastes all of the data and all of the variable definition attributes for that
variable.
Copying and pasting variable definition attributes or entire variables in Variable View
pastes the selected attributes (or the entire variable definition) but does not paste any data
values.
Data Transformations
In an ideal situation, your raw data are perfectly suitable for the type of analysis you want to
perform, and any relationships between variables are either conveniently linear or neatly
orthogonal. Unfortunately, this is rarely the case. Preliminary analysis may reveal inconvenient
coding schemes or coding errors, or data transformations may be required in order to expose the
true relationship between variables.
You can perform data transformations ranging from simple tasks, such as collapsing categories for
analysis, to more advanced tasks, such as creating new variables based on complex equations and
conditional statements.
Computing Variables
Use the Compute dialog box to compute values for a variable based on numeric transformations of
other variables.
Figure 2: Compute Variable dialog box
To Compute Variables
You can paste functions or commonly used system variables by selecting a group from
the Function group list and double-clicking the function or variable in the Functions and
Special Variables list (or select the function or variable and click the arrow adjacent to the
Function group list). Fill in any parameters indicated by question marks (only applies to
functions).
The function group labeled All provides a listing of all available functions and system variables. A
brief description of the currently selected function or variable is displayed in a reserved area in the
dialog box.
Compute Variable: If Cases
The If Cases dialog box allows you to apply data transformations to selected subsets of cases,
using conditional expressions. A conditional expression returns a value of true, false, or missing
for each case.
Functions
Many types of functions are supported, including:
Arithmetic functions
Statistical functions
String functions
Date and time functions
Distribution functions
Random variable functions
Missing value functions
Scoring functions (SPSS Server only)
For more information and a detailed description of each function, type functions on the Index tab
of the Help system.
Missing Values in Functions
Functions and simple arithmetic expressions treat missing values in different ways. In the
expression:
(var1+var2+var3)/3
the result is missing if a case has a missing value for any of the three variables.
In the expression:
MEAN(var1, var2, var3)
the result is missing only if the case has missing values for all three variables.
For statistical functions, you can specify the minimum number of arguments that must have
nonmissing values. To do so, type a period and the minimum number after the function name, as
in:
MEAN.2(var1, var2, var3)
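The two ways of treating missing values can be mimicked in Python. This is only a sketch of the behavior described above, not SPSS code: None stands in for a missing value, and the function names and the sample case are hypothetical.

```python
def arithmetic_mean(values):
    """Mimics (var1+var2+var3)/3: missing if any value is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)

def mean_min_valid(values, min_valid=1):
    """Mimics a statistical mean function: uses only nonmissing values and
    returns missing when fewer than min_valid values are present."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)

case = [4.0, None, 8.0]           # hypothetical case with var2 missing
print(arithmetic_mean(case))      # None
print(mean_min_valid(case))       # 6.0
print(mean_min_valid(case, 3))    # None
```

With the default minimum of one valid value, the result is missing only when all three values are missing; raising the minimum to three reproduces the stricter behavior of the arithmetic expression.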
Transform
Count Values within Cases...
Optionally, you can define a subset of cases for which to count occurrences of values.
The If Cases dialog box allows you to count occurrences of values for a selected subset of cases,
using conditional expressions. A conditional expression returns a value of true, false, or missing
for each case.
Figure 4: Count Occurrences If Cases dialog box
Transform
Recode into Same Variables...
Select the variables you want to recode. If you select multiple variables, they must be the
same type (numeric or string).
Click Old and New Values and specify how to recode values.
Optionally, you can define a subset of cases to recode. The If Cases dialog box for doing this is
the same as the one described for Count Occurrences.
Transform
Recode into Different Variables...
Select the variables you want to recode. If you select multiple variables, they must be the
same type (numeric or string).
Enter an output (new) variable name for each new variable and click Change.
Click Old and New Values and specify how to recode values.
Optionally, you can define a subset of cases to recode. The If Cases dialog box for doing this is
the same as the one described for Count Occurrences.
Missing observations can be problematic in analysis, and some time series measures cannot be
computed if there are missing values in the series. Sometimes the value for a particular
observation is simply not known.
Missing data at the beginning or end of a series pose no particular problem; they simply shorten
the useful length of the series. Gaps in the middle of a series (embedded missing data) can be a
much more serious problem. The extent of the problem depends on the analytical procedure you
are using.
Default new variable names are the first six characters of the existing variable used to create it,
followed by an underscore and a sequential number. For example, for the variable price, the new
variable name would be price_1.
Figure 6: Replace Missing Values dialog box
Transform
Replace Missing Values...
Select the estimation method you want to use to replace missing values.
Select the variable(s) for which you want to replace missing values.
Series mean. Replaces missing values with the mean for the entire series.
Mean of nearby points. Replaces missing values with the mean of valid surrounding values.
The span of nearby points is the number of valid values above and below the missing value used
to compute the mean.
Median of nearby points. Replaces missing values with the median of valid surrounding values.
The span of nearby points is the number of valid values above and below the missing value used
to compute the median.
Linear interpolation. Replaces missing values using a linear interpolation. The last valid value
before the missing value and the first valid value after the missing value are used for the
interpolation. If the first or last case in the series has a missing value, the missing value is not
replaced.
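Two of the estimation methods listed above, series mean and linear interpolation, can be sketched in Python as follows. The code illustrates the descriptions in the text rather than the SPSS implementation; None stands in for a missing value, and the function names and the data series are hypothetical.

```python
from statistics import mean

def series_mean_fill(series):
    """Replace each missing value (None) with the mean of all valid values."""
    m = mean(v for v in series if v is not None)
    return [m if v is None else v for v in series]

def linear_interpolation_fill(series):
    """Replace embedded missing values by interpolating between the last valid
    value before the gap and the first valid value after it. Missing values
    at the start or end of the series are left as None."""
    filled = list(series)
    valid_idx = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(valid_idx, valid_idx[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled

data = [None, 2.0, None, None, 8.0, 9.0, None]   # hypothetical series
print(series_mean_fill(data))
print(linear_interpolation_fill(data))  # [None, 2.0, 4.0, 6.0, 8.0, 9.0, None]
```

The series mean fills every gap, including those at the ends, whereas linear interpolation fills only embedded gaps.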
Means
The Means procedure calculates subgroup means and related univariate statistics for dependent
variables within categories of one or more independent variables. Optionally, you can obtain a
one-way analysis of variance, eta, and tests for linearity.
Example. Measure the average amount of fat absorbed by three different types of cooking oil,
and perform a one-way analysis of variance to see whether the means differ.
Statistics. Sum, number of cases, mean, median, grouped median, standard error of the mean,
minimum, maximum, range, variable value of the first category of the grouping variable, variable
value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard
error of kurtosis, skewness, standard error of skewness, percentage of total sum, percentage of
total N, percentage of sum in, percentage of N in, geometric mean, and harmonic mean. Options
include analysis of variance, eta, eta squared, and tests for linearity R and R².
Data. The dependent variables are quantitative, and the independent variables are categorical. The
values of categorical variables can be numeric or string.
Assumptions. Some of the optional subgroup statistics, such as the mean and standard deviation,
are based on normal theory and are appropriate for quantitative variables with symmetric
distributions. Robust statistics, such as the median, are appropriate for quantitative variables that
may or may not meet the assumption of normality. Analysis of variance is robust to departures
from normality, but the data in each cell should be symmetric. Analysis of variance also assumes
that the groups come from populations with equal variances. To test this assumption, use Levene’s
homogeneity-of-variance test, available in the One-Way ANOVA procedure.
Select one or more dependent variables.
Use one of the following methods to select categorical independent variables:
Select one or more independent variables. Separate results are displayed for each
independent variable.
Means Options
You can choose one or more of the subgroup statistics for the variables within each category of
each grouping variable.
T Tests
Three types of t tests are available:
Independent-samples t test (two-sample t test). Compares the means of one variable for two
groups of cases. Descriptive statistics for each group and Levene’s test for equality of variances
are provided, as well as both equal- and unequal-variance t values and a 95% confidence interval
for the difference in means.
Paired-samples t test (dependent t test). Compares the means of two variables for a single
group. This test is also for matched pairs or case-control study designs. The output includes
descriptive statistics for the test variables, the correlation between the variables, descriptive
statistics for the paired differences, the t test, and a 95% confidence interval.
One-sample t test. Compares the mean of one variable with a known or hypothesized value.
Descriptive statistics for the test variables are displayed along with the t test. A 95% confidence
interval for the difference between the mean of the test variable and the hypothesized test value is
part of the default output.
Independent-Samples T Test
The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for
this test, the subjects should be randomly assigned to two groups, so that any difference in
response is due to the treatment (or lack of treatment) and not to other factors.
Select one or more quantitative test variables. A separate t test is computed for each
variable.
Select a single grouping variable, and then click Define Groups to specify two codes for
the groups that you want to compare.
Optionally, click Options to control the treatment of missing data and the level of the
confidence interval.
For numeric grouping variables, define the two groups for the t test by specifying two values or a
cutpoint:
Use specified values. Enter a value for Group 1 and another value for Group 2. Cases
with any other values are excluded from the analysis. Numbers need not be integers (for
example, 6.25 and 12.5 are valid).
Cutpoint. Enter a number that splits the values of the grouping variable into two sets. All
cases with values that are less than the cutpoint form one group, and cases with values
that are greater than or equal to the cutpoint form the other group.
Paired-Samples T Test
The Paired-Samples T Test procedure compares the means of two variables for a single group.
The procedure computes the differences between values of the two variables for each case and
tests whether the average differs from 0.
Example 1: Scientists and engineers frequently wish to compare two different techniques for
measuring or determining the value of a variable. In such situations, interest centers on testing
whether or not the mean difference in measurements is zero.
The following test results, reported by a construction laboratory, give the compressive
strength of a concrete mix evaluated by two different methods: cube molds and cylinder
molds. Use the t test at level 0.05 to see whether or not the true average difference in
measured compressive strength for the two methods is zero.
Sample No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Cube strength 25.09 24.18 25.61 25.56 31.69 27.6 20.98 21.98 24.79 22.81 24.14 29.54 31.74 30.58
Cylinder strength 24.98 22.54 23.36 25.65 30.00 23.18 24.1 21.29 23.42 21.24 24.68 26.04 27.22 25.18
Solution:
Calculate the average difference between the two methods of compressive strength
measurement, testing by cubes and testing by cylinders:
Sample No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Cube strength 25.09 24.18 25.61 25.56 31.69 27.6 20.98 21.98 24.79 22.81 24.14 29.54 31.74 30.58
Cylinder strength 24.98 22.54 23.36 25.65 30.00 23.18 24.1 21.29 23.42 21.24 24.68 26.04 27.22 25.18
Difference 0.11 1.64 2.25 -0.09 1.69 4.42 -3.12 0.69 1.37 1.57 -0.54 3.5 4.52 5.4
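Let μD = true average difference between cube strength measurements and cylinder strength
measurements. The hypotheses are:
Ho: μD = 0
Versus
Ha: μD ≠ 0
The test is therefore two-tailed. From the t table, $t_{0.025,13} = 2.16$.
The sample average difference and sample standard deviation of the differences are:
$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i = \frac{23.41}{14} = 1.67$
and $s_D = 2.282$
So,
$t_{paired} = \frac{\bar{d}}{s_D/\sqrt{n}} = \frac{1.67}{2.282/\sqrt{14}} = 2.74$
Since 2.74 > 2.16, Ho is rejected at level 0.05; the data suggest that the true average
difference between the two methods is not zero.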
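The paired t statistic for Example 1 can be recomputed from the tabulated strengths with a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative, and the critical value 2.16 is taken from the t table quoted in the solution rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

cube = [25.09, 24.18, 25.61, 25.56, 31.69, 27.6, 20.98,
        21.98, 24.79, 22.81, 24.14, 29.54, 31.74, 30.58]
cylinder = [24.98, 22.54, 23.36, 25.65, 30.00, 23.18, 24.1,
            21.29, 23.42, 21.24, 24.68, 26.04, 27.22, 25.18]

diffs = [a - b for a, b in zip(cube, cylinder)]   # cube minus cylinder
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)                 # sample standard deviation (n - 1 divisor)
t_paired = d_bar / (s_d / sqrt(n))

T_CRIT = 2.16                      # t(0.025, 13), from the t table in the text
print(round(d_bar, 2), round(s_d, 3), round(t_paired, 2))   # 1.67 2.282 2.74
print("reject Ho" if abs(t_paired) > T_CRIT else "do not reject Ho")
```

Because the test is two-tailed, the absolute value of the statistic is compared with the critical value.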
One-Sample T Test
The One-Sample T Test procedure tests whether the mean of a single variable differs from a
specified constant.
Select one or more variables to be tested against the same hypothesized value.
Enter a numeric test value against which each sample mean is compared.
Optionally, click Options to control the treatment of missing data and the level of the
confidence interval.
Confidence Interval. By default, a 95% confidence interval for the difference between the mean
and the hypothesized test value is displayed. Enter a value between 1 and 99 to request a different
confidence level.
Example 2: In order to test concrete mixture performance for a new type of cement, a
concrete producer selected six cube samples to test. The compressive strength values for
the six specimens were 27.2, 29.3, 31.5, 28.7, 30.2, and 29.6 MPa. The concrete producer wishes
to say that concrete of this type averages at least 30 MPa on such a test. Do the sample data
contradict the validity of this claim?
Solution:
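The producer will claim that μ ≥ 30 MPa unless the data strongly suggest otherwise.
So, the relevant hypotheses are: Ho: μ = 30 MPa versus Ha: μ < 30 MPa.
The test statistic is
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
With n = 6, the sample mean, variance, and standard deviation are
$\bar{x} = 29.42$
$s^2 = 2.086$
$s = 1.44$
and
$t = \frac{29.42 - 30}{1.44/\sqrt{6}} = -0.99$
At level 0.05 with 5 degrees of freedom, the rejection region is t ≤ -2.015. Since t = -0.99 is
not in the rejection region (-0.99 > -2.015), Ho is not rejected at level 0.05.
The claim that the producer wishes to make is not contradicted by the data.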
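The one-sample calculation in Example 2 can be reproduced in Python as a check. The sketch below uses the six strengths given in the example; the variable names are illustrative, and the critical value 2.015 is the one used in the solution, not a computed quantity.

```python
from math import sqrt
from statistics import mean, variance

strength = [27.2, 29.3, 31.5, 28.7, 30.2, 29.6]   # MPa, from Example 2
mu0 = 30.0                                         # hypothesized mean, MPa

n = len(strength)
x_bar = mean(strength)
s2 = variance(strength)            # sample variance (n - 1 divisor)
t = (x_bar - mu0) / (sqrt(s2) / sqrt(n))

T_CRIT = 2.015                     # t(0.05, 5), as used in the text
reject = t <= -T_CRIT              # lower-tailed test, Ha: mu < 30
print(round(x_bar, 2), round(s2, 3), round(t, 2), reject)   # 29.42 2.086 -0.99 False
```

The comparison is one-sided because the alternative hypothesis is μ < 30 MPa, so only large negative values of t lead to rejection.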
Homework:
Q1: A certain type of brick is being considered for use in a particular construction
project. The brick will be used unless sample evidence strongly suggests that the true
average compressive strength is below 22 MPa. A random sample of 9 bricks is
selected and each is subjected to a compressive strength test. The resulting sample
average compressive strength and sample standard deviation of compressive strength
are 21.3 MPa and 1.1 MPa respectively. State the relevant hypotheses and carry out a
test to reach a decision using level of significance (α) a) 0.05 b) 0.01?
Note:
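T-Test
Group Statistics
[table contents not recoverable]
Independent Samples Test: VAR00010, equal variances assumed
Levene's Test for Equality of Variances: F = .262, Sig. = .613
t-test for Equality of Means: t = 2.543, df = 24, Sig. (2-tailed) = .018, Mean Difference = .13263, Std. Error Difference = .05216
95% Confidence Interval of the Difference: Lower = .02497, Upper = .24028
Equal variances not assumed: [values not recoverable]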
Previously,
Example:-
A construction engineer has four different suppliers of this aggregate material who
obtain their raw materials from four different locations. The engineer would like to
assess whether the aggregates from the four different locations have different values
of resilient modulus. An experiment is performed by randomly selecting ten samples
of aggregate from each of the four locations and measuring their resilient modulus.
The resilient data set is given below:
H.W
1
2
MULTIPLE LINEAR REGRESSION MODEL
1.0 Introduction
Many applications of regression analysis involve situations in which there is
more than one regressor variable. A regression model that contains more than one
regressor variable is called a multiple regression model.
As an example, suppose that the effective life of a cutting tool depends on the
cutting speed and the tool angle. A multiple regression model that might describe this
relationship is
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$ [1]
where Y represents the tool life, x1 represents the cutting speed, x2 represents the tool
angle, and ε is a random error term. This is a multiple linear regression model with
two regressors. The term linear is used because Equation 1 is a linear function of the
unknown parameters β0, β1, and β2.
In general, the model
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$ [2]
is called a multiple linear regression model with k regressor variables. The parameters
βj, j = 0, 1, …, k, are called the regression coefficients.
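As an illustration of how the regression coefficients of such a model can be estimated, the following Python sketch computes least-squares estimates by solving the normal equations. It is a minimal teaching example under stated assumptions, not the procedure SPSS uses: the function name is hypothetical, and the data are made up and noise-free (generated from y = 1 + 2x1 + 3x2) so that the recovered coefficients can be checked by eye.

```python
def fit_linear_regression(X, y):
    """Least-squares estimates of b0..bk for y = b0 + b1*x1 + ... + bk*xk.

    X is a list of rows [x1, ..., xk]. Solves the normal equations
    (A^T A) b = A^T y by Gauss-Jordan elimination with partial pivoting.
    """
    A = [[1.0] + list(row) for row in X]          # prepend intercept column
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
beta = fit_linear_regression(X, y)
print([round(b, 6) for b in beta])   # approximately [1.0, 2.0, 3.0]
```

With real data the error term ε is not zero, so the estimates only approximate the unknown parameters.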
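Models that are more complex in structure than Equation 2 may often still be
analyzed by multiple linear regression techniques.
For example, consider the cubic polynomial model in one regressor variable:
$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \epsilon$ [3]
If we let $x_1 = x$, $x_2 = x^2$, and $x_3 = x^3$, Equation 3 can be written as
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$ [4]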
which is a linear regression model. Figure 2(a) and (b) shows the three-dimensional
plot of the regression model
and the corresponding two-dimensional contour plot. Notice that, although this model
is a linear regression model, the shape of the surface that is generated by the model is
not linear.
EXAMPLE 1:
Table 1 shows data on the pull strength of a wire bond in a semiconductor
manufacturing process, together with wire length and die height.
[Table 1: wire bond pull strength data; contents not recoverable]
A three-dimensional scatter plot of the data is presented in Fig. 4a.
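Figure 4, a: Three-dimensional scatter plot of the wire bond pull strength data.
Figure 4, b: Matrix of scatter plots (from SPSS) for the wire bond pull strength data in
Table 1.
The multiple linear regression model considered for these data is
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$
where Y = pull strength, x1 = wire length, and x2 = die height.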
1
This equation can be used to predict pull strength for pairs of values of the regressor
variables wire length (x1) and die height (x2). For example, the first observation has x1= 2
and x2= 50, and the fitted value is
The corresponding observed value is y1= 9.95. The residual corresponding to the first
observation is
Estimating σ²
Just as in simple linear regression, it is important to estimate σ², the variance
of the error term ε, in a multiple regression model.
Homework 1:
A study was performed on wear of a bearing y and its relationship to x1 oil viscosity
and x2 load. The following data were obtained.
2.0 Categorical Regressors and Indicator Variables
1: 1
Since the type of cutting tool likely affects the surface finish, we will fit the
model
$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon$
where Y is the surface finish, x1 is the lathe speed in revolutions per minute, and x2 is
an indicator variable denoting the type of cutting tool used; that is,
$x_2 = 0$ for tool type 302, and $x_2 = 1$ for tool type 416.
The parameters in this model may be easily interpreted. If x2 = 0, the model becomes
$Y = \beta_0 + \beta_1 x_1 + \epsilon$
which is a straight-line model with slope β1 and intercept β0. However, if x2 = 1, the
model becomes
$Y = \beta_0 + \beta_1 x_1 + \beta_2 (1) + \epsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \epsilon$
which is a straight-line model with slope β1 and intercept β0 + β2. Thus, the model
implies that surface finish is linearly related to lathe speed and that the slope β1 does
not depend on the type of cutting tool used. However, the type of cutting tool does
affect the intercept, and β2 indicates the change in the intercept associated with a
change in tool type from 302 to 416.
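The parallel-lines interpretation can be illustrated numerically. In the Python sketch below, the coefficient values are hypothetical and chosen only for illustration (they are not estimates from any data set), the function name is an assumption, and the error term is omitted so that the function returns the mean response.

```python
def predicted_finish(x1, tool_type, b0, b1, b2):
    """Mean surface finish under Y = b0 + b1*x1 + b2*x2 (error term omitted)."""
    x2 = 1 if tool_type == 416 else 0     # indicator: 0 for type 302, 1 for type 416
    return b0 + b1 * x1 + b2 * x2

# Hypothetical coefficients, for illustration only
b0, b1, b2 = 15.0, 0.125, -13.0
for speed in (200, 250):
    gap = (predicted_finish(speed, 416, b0, b1, b2)
           - predicted_finish(speed, 302, b0, b1, b2))
    print(speed, gap)     # the gap equals b2 (-13.0) at every speed
```

Because the gap between the two tool types is the same at every lathe speed, the two fitted lines are parallel and differ only in their intercepts.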