
Advanced Engineering Statistics: Lectures For M.SC

University of Technology

Civil Engineering Department

Building Engineering and Construction Project Management Branch
Postgraduate Studies: Master's (M.Sc.)

Advanced Engineering Statistics

Lectures for
M.Sc. Course

Prof. Dr. Maan S. Hassan
Asst. Prof. Dr. Waleed A. Abbas

2022 – 2023

Introduction to Statistics
Statistics: The branch of scientific inquiry that provides methods for organizing
and summarizing data, and for using information in the data to draw various
conclusions.

Descriptive Statistics: The part of statistics that deals with methods for
organizing and summarizing data. Descriptive methods can be used with a list of
all population members (a census) or when the data consist of a sample.

Inferential Statistics: The part of statistics used when the data are a sample and
the objective is to go beyond the sample and draw conclusions about the population
based on sample information.

Figure: Diagram relating Population and Sample through Probability and Statistics

Link between statistics and software

Overview
Statistical Package for the Social Sciences (SPSS)
provides a powerful statistical-analysis and data-management system in a
graphical environment, using descriptive menus and simple dialog boxes
to do most of the work for the researcher.

In addition to the simple point-and-click interface for statistical analysis,
SPSS provides:

Data Editor. The Data Editor is a versatile spreadsheet-like system for
defining, entering, editing, and displaying data.

Viewer. The Viewer makes it easy to browse your results, selectively
show and hide output, change the display order of results, and move
presentation-quality tables and charts to and from other applications.

Multidimensional pivot tables. Your results come alive with
multidimensional pivot tables. Explore your tables by rearranging rows,
columns, and layers. Uncover important findings that can get lost in
standard reports. Compare groups easily by splitting your table so that
only one group is displayed at a time.

High-resolution graphics. High-resolution, full-color pie charts, bar
charts, histograms, scatterplots, 3-D graphics, and more are included as
standard features.

Database access. Retrieve information from databases by using the
Database Wizard instead of complicated SQL queries.

Data transformations. Transformation features help get your data ready
for analysis. You can easily subset data; combine categories; add,
aggregate, merge, split, and transpose files; and more.

Others. Online Help and a command language.

Windows
There are a number of different types of windows in SPSS:

Data Editor. The Data Editor displays the contents of the data file. You
can create new data files or modify existing data files with the Data
Editor. If you have more than one data file open, there is a separate Data
Editor window for each data file.

Viewer. All statistical results, tables, and charts are displayed in the
Viewer. You can edit the output and save it for later use. A Viewer
window opens automatically the first time you run a procedure that
generates output.

Pivot Table Editor. Output that is displayed in pivot tables can be
modified in many ways with the Pivot Table Editor. You can edit text,
swap data in rows and columns, add color, create multidimensional
tables, and selectively hide and show results.

Chart Editor. You can modify high-resolution charts and plots in chart
windows. You can change the colors, select different type fonts or sizes,
switch the horizontal and vertical axes, rotate 3-D scatterplots, and even
change the chart type.

Text Output Editor. Text output that is not displayed in pivot tables can
be modified with the Text Output Editor. You can edit the output and
change font characteristics (type, style, color, size).

Syntax Editor. You can paste your dialog box choices into a syntax
window, where your selections appear in the form of command syntax.
You can then edit the command syntax to use special features that are not
available through dialog boxes. You can save these commands in a file
for use in subsequent sessions.

Figure 1: Data Editor and Viewer

Designated Window versus Active Window

If you have more than one open Viewer window, output is routed to the
designated Viewer window. If you have more than one open Syntax
Editor window, command syntax is pasted into the designated Syntax
Editor window. The designated windows are indicated by a plus sign in
the icon in the title bar. You can change the designated windows at any
time.

The designated window should not be confused with the active window,
which is the currently selected window. If you have overlapping
windows, the active window appears in the foreground.
If you open a window, that window automatically becomes the active
window and the designated window.

Changing the Designated Window

 Make the window that you want to designate the active window
(click anywhere in the window).
 Click the Designate Window button on the toolbar (the plus sign
icon).
Or
 From the menus choose:
Utilities
Designate Window

Status Bar
The status bar at the bottom of each SPSS window provides the following
information:

Command status. For each procedure or command that you run, a case
counter indicates the number of cases processed so far. For statistical
procedures that require iterative processing, the number of iterations is
displayed.

Filter status. If you have selected a random sample or a subset of cases
for analysis, the message Filter on indicates that some type of case
filtering is currently in effect and not all cases in the data file are included
in the analysis.

Weight status. The message Weight on indicates that a weight variable is
being used to weight cases for analysis.

Split File status. The message Split File on indicates that the data file has
been split into separate groups for analysis, based on the values of one or
more grouping variables.

Dialog Boxes

Most menu selections open dialog boxes. You use dialog boxes to select
variables and options for analysis. Dialog boxes for statistical procedures
and charts typically have two basic components:

Source variable list. A list of variables in the active dataset. Only
variable types that are allowed by the selected procedure are displayed in
the source list. Use of short string and long string
variables is restricted in many procedures.

Target variable list(s). One or more lists indicating the variables that
you have chosen for the analysis, such as dependent and independent
variable lists.

Variable Names and Variable Labels in Dialog Box Lists

You can display either variable names or variable labels in dialog box
lists, and you can control the sort order of variables in source variable
lists.

 To control the default display attributes of variables in source lists,
choose Options on the Edit menu.
 To change the source variable list display attributes within a dialog
box, right-click on any variable in the source list and select the
display attributes from the context menu. You can display either
variable names or variable labels (names are displayed for any
variables without defined labels), and you can sort the source list
by file order, alphabetical order, or measurement level. For more
information on measurement level, see Variable Measurement Level.

Figure 2: Variable labels displayed in a dialog box

Resizing Dialog Boxes


You can resize dialog boxes just like windows, by clicking and dragging
the outside borders or corners. For example, if you make the dialog box
wider, the variable lists will also be wider.

Figure 3: Resized dialog box

Dialog Box Controls


There are five standard controls in most dialog boxes:

OK. Runs the procedure. After you select your variables and choose any
additional specifications, click OK to run the procedure and close the
dialog box.

Paste. Generates command syntax from the dialog box selections and
pastes the syntax into a syntax window. You can then customize the
commands with additional features that are not available from dialog
boxes.

Reset. Deselects any variables in the selected variable list(s) and resets all
specifications in the dialog box and any subdialog boxes to the default
state.

Cancel. Cancels any changes that were made in the dialog box settings
since the last time it was opened and closes the dialog box. Within a
session, dialog box settings are persistent. A dialog box retains your last
set of specifications until you override them.

Help. Provides context-sensitive Help. This control takes you to a Help
window that contains information about the current dialog box.

Selecting Variables
To select a single variable, simply select it in the source variable list and
drag and drop it into the target variable list. You can also use the arrow
button to move variables from the source list to the target lists. If there is
only one target variable list, you can double-click individual variables to
move them from the source list to the target list. You can also select
multiple variables:

 To select multiple variables that are grouped together in the
variable list, click the first variable and then Shift-click the last
variable in the group.
 To select multiple variables that are not grouped together in the
variable list, click the first variable, then Ctrl-click the next
variable, and so on (Macintosh: Command-click).

Data Type, Measurement Level, and Variable List Icons

The icons that are displayed next to variables in dialog box lists provide
information about the variable type and measurement level.

Figure 4: Measurement level

Getting Information about Variables in Dialog Boxes


 Right-click a variable in the source or target variable list.

 Choose Variable Information.

Figure 5: Variable information

Basic Steps in Data Analysis

Analyzing data with SPSS is easy. All you have to do is:

Get your data into SPSS. You can open a previously saved SPSS data
file, you can read a spreadsheet, database, or text data file, or you can
enter your data directly in the Data Editor.

Select a procedure. Select a procedure from the menus to calculate
statistics or to create a chart.

Select the variables for the analysis. The variables in the data file are
displayed in a dialog box for the procedure.

Run the procedure and look at the results. Results are displayed in the
Viewer.

Homework:

Strength is an important characteristic of materials used in precast
ferrocement elements. Each of 11 elements was subjected to a severe
stress test, and the maximum width of the resulting cracks was recorded
and is shown below.
1. Compute the descriptive statistics (i.e., standard deviation, variance, range,
skewness, mean, median, and mode).
2. Draw the frequency chart (a histogram with a normal curve).

0.684, 2.54, 0.924, 3.13, 1.03, 0.598, 0.483, 3.52, 1.285, 2.65, 1.497
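
As a cross-check on the SPSS output, the requested statistics can also be computed directly. The sketch below uses Python's standard statistics module. It assumes the sample (n - 1) forms of the variance and standard deviation and the adjusted Fisher-Pearson skewness; these definitions and the variable names are assumptions made for illustration, not part of the homework statement.

```python
import statistics

# Maximum crack widths for the 11 ferrocement elements (homework data)
crack_widths = [0.684, 2.54, 0.924, 3.13, 1.03, 0.598,
                0.483, 3.52, 1.285, 2.65, 1.497]

n = len(crack_widths)
mean = statistics.mean(crack_widths)
median = statistics.median(crack_widths)
modes = statistics.multimode(crack_widths)    # every value if none repeats
variance = statistics.variance(crack_widths)  # sample variance (n - 1)
std_dev = statistics.stdev(crack_widths)      # sample standard deviation
data_range = max(crack_widths) - min(crack_widths)

# Adjusted Fisher-Pearson skewness (assumed definition)
skewness = (n / ((n - 1) * (n - 2))) * sum(
    ((x - mean) / std_dev) ** 3 for x in crack_widths
)

print(f"mean={mean:.4f} median={median:.4f} range={data_range:.4f}")
print(f"variance={variance:.4f} sd={std_dev:.4f} skewness={skewness:.4f}")
print("no repeated value" if len(modes) == n else f"mode(s)={modes}")
```

Because no crack width occurs more than once, the data have no single mode; the sketch reports that case explicitly.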

Data Editor

The Data Editor provides a convenient, spreadsheet-like method for creating and
editing data files.

The Data Editor window opens automatically when you start a session.

The Data Editor provides two views of your data:

 Data View. This view displays the actual data values or defined value labels.
 Variable View. This view displays variable definition information, including
defined variable and value labels and data type (for example, string, date, or
numeric).

In both views, you can add, change, and delete information that is contained in the
data file.

Data View

Figure 1: Data View

Many of the features of Data View are similar to the features that are found in
spreadsheet applications. There are, however, several important distinctions:
 Rows are cases. Each row represents a case or an observation. For example,
each individual respondent to a questionnaire is a case.

 Columns are variables. Each column represents a variable or characteristic
that is being measured. For example, each item on a questionnaire is a
variable.
 Cells contain values. Each cell contains a single value of a variable for a case.
The cell is where the case and the variable intersect. Cells contain only data
values. Unlike spreadsheet programs, cells in the Data Editor cannot contain
formulas.
 The data file is rectangular. The dimensions of the data file are determined
by the number of cases and variables. You can enter data in any cell. If you
enter data in a cell outside the boundaries of the defined data file, the data
rectangle is extended to include any rows and/or columns between that cell
and the file boundaries. There are no “empty” cells within the boundaries of
the data file. For numeric variables, blank cells are converted to the system-
missing value. For string variables, a blank is considered a valid value.
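
The rectangular rule can be pictured with a small model. The sketch below is a hypothetical in-memory stand-in for Data View, not SPSS code: None plays the role of the system-missing value, an empty string is a valid blank for string variables, and the function name set_cell is invented for illustration. New columns are assumed to be numeric.

```python
# Hypothetical in-memory model of Data View, for illustration only.
SYSMIS = None  # stands in for the system-missing value

def set_cell(rows, var_types, row, col, value):
    """Store value at (row, col), extending the data rectangle if needed.

    rows      -- list of cases, each a list of cell values
    var_types -- list of "numeric" or "string", one entry per column
    New columns are assumed to be numeric.
    """
    while len(var_types) <= col:          # add any missing columns
        var_types.append("numeric")
    while len(rows) <= row:               # add any missing rows
        rows.append([])
    for case in rows:                     # pad every case to full width
        for c in range(len(case), len(var_types)):
            case.append(SYSMIS if var_types[c] == "numeric" else "")
    rows[row][col] = value

rows = [[1.0, "a"], [2.0, "b"]]
var_types = ["numeric", "string"]

# Entering a value outside the 2 x 2 rectangle extends it to 4 x 3
set_cell(rows, var_types, row=3, col=2, value=9.5)
for case in rows:
    print(case)
```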

Variable View

Figure 2: Variable View

Variable View contains descriptions of the attributes of each variable in the data file.
In Variable View:

 Rows are variables.


 Columns are variable attributes.

You can add or delete variables and modify attributes of variables, including the
following attributes:

 Variable name
 Data type
 Number of digits or characters
 Number of decimal places
 Descriptive variable and value labels
 User-defined missing values
 Column width
 Measurement level

All of these attributes are saved when you save the data file.
Variable Names

The following rules apply to variable names:

 Each variable name must be unique; duplication is not allowed.
 Variable names can be up to 64 bytes long, and the first character must be a letter.
Subsequent characters can be any combination of letters, numbers, and a period (.).

64 bytes typically means 64 characters in single-byte languages (for example, English, French,
German, Spanish, Italian, Hebrew, Russian, Greek, Arabic, and Thai) and
32 characters in double-byte languages (for example, Japanese, Chinese, and Korean).

 Variable names cannot contain spaces.
 The period (.), underscore (_), $, #, and @ characters can be used within variable names. For
example, A._$@#1 is a valid variable name.
 Variable names ending with a period (.) should be avoided, since the period may be
interpreted as a command terminator.
 Variable names ending in underscores should be avoided, since such names may conflict
with names of variables automatically created by commands and procedures.
 Reserved keywords cannot be used as variable names. Reserved keywords are ALL,
AND, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, and WITH.
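
The rules above can be collected into a small checker. This is a minimal sketch in Python, not an SPSS facility: the function name check_variable_name is hypothetical, the byte count assumes UTF-8 encoding, and treating names and reserved keywords as case-insensitive is an assumption that the rules above do not state.

```python
import re

RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# First character a letter; then letters, digits, and . _ $ # @
NAME_PATTERN = re.compile(r"[^\W\d_][\w.$#@]*")

def check_variable_name(name, existing=()):
    """Return a list of problems with name (an empty list means acceptable)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("must start with a letter and use only "
                        "letters, digits, . _ $ # @")
    if len(name.encode("utf-8")) > 64:       # assumption: UTF-8 byte count
        problems.append("longer than 64 bytes")
    if name.upper() in RESERVED:             # assumption: case-insensitive
        problems.append("reserved keyword")
    if name.upper() in {e.upper() for e in existing}:
        problems.append("duplicate name")
    if name.endswith((".", "_")):
        problems.append("avoid ending with a period or underscore")
    return problems

for candidate in ["A._$@#1", "my var", "1abc", "WITH", "total_"]:
    print(candidate, "->", check_variable_name(candidate) or "ok")
```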

Variable Measurement Level

You can specify the level of measurement as scale (numeric data on an interval or ratio scale),
ordinal, or nominal. Nominal and ordinal data can be either string (alphanumeric) or numeric.

 Nominal. A variable can be treated as nominal when its values represent categories with
no intrinsic ranking (for example, the department of the company in which an employee
works).

Examples of nominal variables include region, zip code, and religious affiliation.

 Ordinal. A variable can be treated as ordinal when its values represent categories with
some intrinsic ranking (for example, levels of service satisfaction from highly dissatisfied
to highly satisfied). Examples of ordinal variables include attitude scores representing
degree of satisfaction or confidence and preference rating scores.

 Scale. A variable can be treated as scale when its values represent ordered categories with
a meaningful metric, so that distance comparisons between values are appropriate.
Examples of scale variables include age in years and income in thousands of dollars.

Variable Type

Variable Type specifies the data type for each variable. By default, all new variables are assumed
to be numeric. You can use Variable Type to change the data type.

Figure 3: Variable Type dialog box

The available data types are as follows:

Numeric. A variable whose values are numbers. Values are displayed in standard numeric format.
The Data Editor accepts numeric values in standard format or in scientific notation.

Comma. A numeric variable whose values are displayed with commas delimiting every three
places. Values cannot contain commas to the right of the decimal indicator.

Dot. A numeric variable whose values are displayed with periods delimiting every three places
and with the comma as a decimal delimiter. Values cannot contain periods to the right of the
decimal indicator.

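Scientific notation. A numeric variable whose values are displayed with an embedded E and a
signed power-of-10 exponent. Values can be entered with or without an exponent; the exponent
can be preceded by E or D with an optional sign, or by the sign alone (for example, 123,
1.23E2, 1.23D2, 1.23E+2, and 1.23+2).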

Date. A numeric variable whose values are displayed in one of several calendar-date or clock-
time formats.

Dollar. A numeric variable displayed with a leading dollar sign ($), commas delimiting every
three places, and a period as the decimal delimiter.

Custom currency. A numeric variable whose values are displayed in one of the custom currency
formats that you have defined on the Currency tab of the Options dialog box.

String. A variable whose values are not numeric and therefore are not used in calculations. The
values can contain any characters up to the defined length. String variables are also known as
alphanumeric variables.
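
To see how the same stored number looks under several of these display formats, the following Python sketch imitates the Comma, Dot, Dollar, and Scientific notation styles. The helper names are hypothetical, and the output only approximates the SPSS display (for instance, Python always prints at least two exponent digits).

```python
def fmt_comma(value, decimals=2):
    """Comma format: commas every three places, period as decimal."""
    return f"{value:,.{decimals}f}"

def fmt_dot(value, decimals=2):
    """Dot format: periods every three places, comma as decimal."""
    # Swap the two separators produced by the comma format
    return fmt_comma(value, decimals).translate(str.maketrans(",.", ".,"))

def fmt_dollar(value, decimals=2):
    """Dollar format: leading $ plus the comma format."""
    return "$" + fmt_comma(value, decimals)

def fmt_scientific(value, decimals=2):
    """Scientific notation with an embedded E and a signed exponent."""
    return f"{value:.{decimals}E}"

x = 1234567.891
print(fmt_comma(x))         # 1,234,567.89
print(fmt_dot(x))           # 1.234.567,89
print(fmt_dollar(x))        # $1,234,567.89
print(fmt_scientific(123))  # 1.23E+02
```

In every case only the display changes; the stored numeric value is the same.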

Value Labels

You can assign descriptive value labels for each value of a variable. This process is particularly
useful if your data file uses numeric codes to represent non-numeric categories (for example,
codes of 1 and 2 for male and female).

 Value labels can be up to 120 bytes.

Figure 4: Value Labels dialog box
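
Conceptually, value labels are a lookup from stored codes to descriptive text. The Python sketch below illustrates this with the male/female coding mentioned above; the function add_value_label is hypothetical, the byte limit is checked assuming UTF-8, and showing the raw code for unlabeled values is an assumption about display behavior.

```python
MAX_LABEL_BYTES = 120

value_labels = {}  # numeric code -> descriptive label

def add_value_label(labels, value, label):
    """Attach a label to a code, enforcing the 120-byte limit."""
    if len(label.encode("utf-8")) > MAX_LABEL_BYTES:  # assumption: UTF-8
        raise ValueError("value label longer than 120 bytes")
    labels[value] = label

add_value_label(value_labels, 1, "Male")
add_value_label(value_labels, 2, "Female")

codes = [1, 2, 2, 1, 3]
# Codes without a defined label are shown as the raw value (assumed)
displayed = [value_labels.get(code, code) for code in codes]
print(displayed)
```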

To Specify Value Labels

 Click the button in the Values cell for the variable that you want to define.
 For each value, enter the value and a label.
 Click Add to enter the value label.
 Click OK.

Column Width

You can specify a number of characters for the column width. Column widths can also be changed
in Data View by clicking and dragging the column borders.

Column formats affect only the display of values in the Data Editor. Changing the column width
does not change the defined width of a variable.

To Apply Individual Attributes from a Defined Variable

 In Variable View, select the attribute cell that you want to apply to other variables.
 From the menus choose:
Edit
Copy

 Select the attribute cell(s) to which you want to apply the attribute. (You can select
multiple target variables.)
 From the menus choose:
Edit
Paste

If you paste the attribute to blank rows, new variables are created with default attributes for all
attributes except the selected attribute.

Displaying and Editing Custom Variable Attributes

Custom variable attributes can be displayed and edited in the Data Editor in Variable View.

Figure 7: Custom variable attributes displayed in Variable View

Custom variable attribute names are enclosed in square brackets.


Attribute names that begin with a dollar sign are reserved and cannot be modified.
A blank cell indicates that the attribute does not exist for that variable; the text Empty
displayed in a cell indicates that the attribute exists for that variable but no value has been
assigned to the attribute for that variable. Once you enter text in the cell, the attribute exists
for that variable with the value you enter.
The text Array... displayed in a cell indicates that this is an attribute array—an attribute that
contains multiple values. Click the button in the cell to display the list of values.

To Display and Edit Custom Variable Attributes

 In Variable View, from the menus choose:


View
Customize Variable View...
 Select (check) the custom variable attributes you want to display. (The custom variable
attributes are the ones enclosed in square brackets.)

Figure 8: Customize Variable View

Spell Checking Variable and Value Labels

To check the spelling of variable labels and value labels:

 In Variable View, right-click on the Labels or Values column and from the context menu
choose:
Spelling
or
 In Variable View, from the menus choose:
Utilities
Spelling

Converting numeric or date into string. Numeric (for example, numeric, dollar, dot, or
comma) and date formats are converted to strings if they are pasted into a string variable cell.

Converting string into numeric or date. String values that contain acceptable characters for
the numeric or date format of the target cell are converted to the equivalent numeric or date value.

Converting date into numeric. Date and time values are converted to a number of seconds if
the target cell is one of the numeric formats (for example, numeric, dollar, dot, or comma).
Because dates are stored internally as the number of seconds since October 14, 1582, converting
dates to numeric values can yield some extremely large numbers. For example, the date 10/29/91
is converted to a numeric value of 12,908,073,600.

Converting numeric into date or time. Numeric values are converted to dates or times if the
value represents a number of seconds that can produce a valid date or time. For dates, numeric
values that are less than 86,400 are converted to the system-missing value.
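
The date arithmetic above can be reproduced with Python's datetime module, which uses the same Gregorian calendar rules. This is a sketch for checking the numbers, not SPSS code; the function names are hypothetical, and None is used to stand in for the system-missing value.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1582, 10, 14)   # start of the internal date scale
SECONDS_PER_DAY = 86_400

def date_to_seconds(moment):
    """Number of seconds from October 14, 1582 to the given date/time."""
    return int((moment - EPOCH).total_seconds())

def seconds_to_date(seconds):
    """Inverse conversion; values under one day map to missing (None)."""
    if seconds < SECONDS_PER_DAY:
        return None
    return EPOCH + timedelta(seconds=seconds)

print(f"{date_to_seconds(datetime(1991, 10, 29)):,}")  # 12,908,073,600
print(seconds_to_date(12_908_073_600))
print(seconds_to_date(500))                            # None
```

The first line reproduces the value quoted above for 10/29/91.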

Working with Multiple Data Sources
Multiple data sources can be open at the same time, making it easier to:

 Switch back and forth between data sources.


 Compare the contents of different data sources.
 Copy and paste data between data sources.
 Create multiple subsets of cases and/or variables for analysis.
 Merge multiple data sources from various data formats (for example, spreadsheet,
database, text data) without saving each data source first.

Figure 1: Two data sources open at same time

 Any previously open data sources remain open and available for further use.
 When you first open a data source, it automatically becomes the active dataset.

Copying and Pasting Information between Datasets


You can copy both data and variable definition attributes from one dataset to another dataset in
basically the same way that you copy and paste information within a single data file.

 Copying and pasting selected data cells in Data View pastes only the data values, with no
variable definition attributes.

 Copying and pasting an entire variable in Data View by selecting the variable name at the
top of the column pastes all of the data and all of the variable definition attributes for that
variable.
 Copying and pasting variable definition attributes or entire variables in Variable View
pastes the selected attributes (or the entire variable definition) but does not paste any data
values.

Data Transformations
In an ideal situation, your raw data are perfectly suitable for the type of analysis you want to
perform, and any relationships between variables are either conveniently linear or neatly
orthogonal. Unfortunately, this is rarely the case. Preliminary analysis may reveal inconvenient
coding schemes or coding errors, or data transformations may be required in order to expose the
true relationship between variables.

You can perform data transformations ranging from simple tasks, such as collapsing categories for
analysis, to more advanced tasks, such as creating new variables based on complex equations and
conditional statements.

Computing Variables
Use the Compute dialog box to compute values for a variable based on numeric transformations of
other variables.

 You can compute values for numeric or string (alphanumeric) variables.


 You can create new variables or replace the values of existing variables. For new
variables, you can also specify the variable type and label.
 You can compute values selectively for subsets of data based on logical conditions.
 You can use more than 70 built-in functions, including arithmetic functions, statistical
functions, distribution functions, and string functions.

Figure 2: Compute Variable dialog box

To Compute Variables

 From the menus choose:


Transform
Compute Variable...
 Type the name of a single target variable. It can be an existing variable or a new variable
to be added to the active dataset.
 To build an expression, either paste components into the Expression field or type directly
in the Expression field.

 You can paste functions or commonly used system variables by selecting a group from
the Function group list and double-clicking the function or variable in the Functions and
Special Variables list (or select the function or variable and click the arrow adjacent to the
Function group list). Fill in any parameters indicated by question marks (only applies to
functions).

The function group labeled All provides a listing of all available functions and system variables. A
brief description of the currently selected function or variable is displayed in a reserved area in the
dialog box.

 String constants must be enclosed in quotation marks or apostrophes.


 If values contain decimals, a period (.) must be used as the decimal indicator.
 For new string variables, you must also select Type & Label to specify the data type.

Compute Variable: If Cases

The If Cases dialog box allows you to apply data transformations to selected subsets of cases,
using conditional expressions. A conditional expression returns a value of true, false, or missing
for each case.

Figure 3: Compute Variable If Cases dialog box



 If the result of a conditional expression is true, the case is included in the selected subset.
 If the result of a conditional expression is false or missing, the case is not included in the
selected subset.
 Most conditional expressions use one or more of the six relational operators (<, >, <=, >=,
=, and ~=) on the calculator pad.
 Conditional expressions can include variable names, constants, arithmetic operators,
numeric (and other) functions, logical variables, and relational operators.
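
The three-valued outcome of a conditional expression can be sketched as follows. This Python example is illustrative only: it handles a single comparison of one variable with a constant, represents a missing value as None, and uses hypothetical names (evaluate, select_cases) and made-up cases.

```python
import operator

# The six relational operators from the calculator pad
RELATIONAL = {"<": operator.lt, ">": operator.gt, "<=": operator.le,
              ">=": operator.ge, "=": operator.eq, "~=": operator.ne}

def evaluate(case, variable, op, constant):
    """Return True, False, or None (missing) for one case."""
    value = case[variable]
    if value is None:            # a missing input gives a missing result
        return None
    return RELATIONAL[op](value, constant)

def select_cases(cases, variable, op, constant):
    """Keep only the cases for which the condition is true."""
    return [c for c in cases if evaluate(c, variable, op, constant) is True]

cases = [
    {"id": 1, "age": 25},
    {"id": 2, "age": 61},
    {"id": 3, "age": None},      # missing value
]
print(select_cases(cases, "age", ">=", 40))   # only case 2 is selected
```

Case 3 is excluded because its result is missing, not because it is false; the two outcomes are treated the same way when the subset is formed.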

Functions
Many types of functions are supported, including:

 Arithmetic functions
 Statistical functions
 String functions
 Date and time functions
 Distribution functions
 Random variable functions
 Missing value functions
 Scoring functions (SPSS Server only)

For more information and a detailed description of each function, type functions on the Index tab
of the Help system.

Missing Values in Functions
Functions and simple arithmetic expressions treat missing values in different ways. In the
expression:

(var1+var2+var3)/3

the result is missing if a case has a missing value for any of the three variables.

In the expression:

MEAN(var1, var2, var3)

the result is missing only if the case has missing values for all three variables.

For statistical functions, you can specify the minimum number of arguments that must have
nonmissing values. To do so, type a period and the minimum number after the function name, as
in:

MEAN.2(var1, var2, var3)
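
The difference between the three forms can be sketched in Python, with None standing in for a missing value. The function names are hypothetical; min_valid plays the role of the number typed after the period in MEAN.2.

```python
def mean_expression(*values):
    """(var1 + var2 + ...)/n: missing if any value is missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)

def mean_function(*values, min_valid=1):
    """MEAN(...) or MEAN.k(...): average of the nonmissing values,
    missing unless at least min_valid values are present."""
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)

case = (4.0, None, 8.0)
print(mean_expression(*case))                         # None
print(mean_function(*case))                           # 6.0
print(mean_function(*case, min_valid=2))              # 6.0
print(mean_function(4.0, None, None, min_valid=2))    # None
```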

Count Occurrences of Values within Cases


This dialog box creates a variable that counts the occurrences of the same value(s) in a list of
variables for each case. For example, a survey might contain a list of magazines with yes/no check
boxes to indicate which magazines each respondent reads. You could count the number of yes
responses for each respondent to create a new variable that contains the total number of magazines
read.
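
The magazine example can be sketched in a few lines of Python. The variable names (mag_a, mag_b, mag_c, n_read) and the responses are invented for illustration.

```python
def count_values(case, variables, targets):
    """Count how many of the listed variables hold one of the target values."""
    return sum(1 for name in variables if case[name] in targets)

respondents = [
    {"mag_a": "yes", "mag_b": "no", "mag_c": "yes"},
    {"mag_a": "no", "mag_b": "no", "mag_c": "no"},
]
magazines = ["mag_a", "mag_b", "mag_c"]

# New variable: total number of magazines read by each respondent
for person in respondents:
    person["n_read"] = count_values(person, magazines, {"yes"})

print([person["n_read"] for person in respondents])   # [2, 0]
```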

Figure 6: Count Occurrences of Values within Cases dialog box

To Count Occurrences of Values within Cases

 From the menus choose:

Transform
Count Values within Cases...

 Enter a target variable name.


 Select two or more variables of the same type (numeric or string).
 Click Define Values and specify which value or values should be counted.

Optionally, you can define a subset of cases for which to count occurrences of values.

Count Occurrences: If Cases

The If Cases dialog box allows you to count occurrences of values for a selected subset of cases,
using conditional expressions. A conditional expression returns a value of true, false, or missing
for each case.

Figure 4: Count Occurrences If Cases dialog box

Recode into Same Variables


The Recode into Same Variables dialog box allows you to reassign the values of existing variables
or collapse ranges of existing values into new values. For example, you could collapse salaries
into salary range categories.
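
A minimal Python sketch of the salary example follows. The cut points (25,000 and 50,000), the category codes, and the sample salaries are hypothetical; missing values (None) are assumed to stay missing.

```python
def recode_salary(salary):
    """Collapse a salary into a range category (hypothetical cut points)."""
    if salary is None:
        return None            # leave missing values missing
    if salary < 25000:
        return 1               # below 25,000
    if salary < 50000:
        return 2               # 25,000 up to 49,999
    return 3                   # 50,000 and above

salaries = [18000, 25000, 49999, 72000, None]

# Recoding "into the same variable" overwrites the original values
salaries = [recode_salary(s) for s in salaries]
print(salaries)   # [1, 2, 2, 3, None]
```

Because the original values are overwritten, the raw salaries are lost after recoding; recoding into a different variable keeps them.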

Figure 5: Recode into Same Variables dialog box

 From the menus choose:

Transform
Recode into Same Variables...

 Select the variables you want to recode. If you select multiple variables, they must be the
same type (numeric or string).
 Click Old and New Values and specify how to recode values.

Optionally, you can define a subset of cases to recode. The If Cases dialog box for doing this is
the same as the one described for Count Occurrences.

To Recode Values of a Variable into a New Variable

 From the menus choose:

Transform
Recode into Different Variables...

 Select the variables you want to recode. If you select multiple variables, they must be the
same type (numeric or string).
 Enter an output (new) variable name for each new variable and click Change.
 Click Old and New Values and specify how to recode values.

Optionally, you can define a subset of cases to recode. The If Cases dialog box for doing this is
the same as the one described for Count Occurrences.

Replace Missing Values

Missing observations can be problematic in analysis, and some time series measures cannot be
computed if there are missing values in the series. Sometimes the value for a particular
observation is simply not known.

Missing data at the beginning or end of a series pose no particular problem; they simply shorten
the useful length of the series. Gaps in the middle of a series (embedded missing data) can be a
much more serious problem. The extent of the problem depends on the analytical procedure you
are using.

Default new variable names are the first six characters of the existing variable used to create it,
followed by an underscore and a sequential number. For example, for the variable price, the new
variable name would be price_1.

Figure 6: Replace Missing Values dialog box

 From the menus choose:

Transform
Replace Missing Values...

 Select the estimation method you want to use to replace missing values.
 Select the variable(s) for which you want to replace missing values.

Optionally, you can:



 Enter variable names to override the default new variable names.
 Change the estimation method for a selected variable.

Estimation Methods for Replacing Missing Values

Series mean. Replaces missing values with the mean for the entire series.

Mean of nearby points. Replaces missing values with the mean of valid surrounding values.
The span of nearby points is the number of valid values above and below the missing value used
to compute the mean.

Median of nearby points. Replaces missing values with the median of valid surrounding values.
The span of nearby points is the number of valid values above and below the missing value used
to compute the median.

Linear interpolation. Replaces missing values using a linear interpolation. The last valid value
before the missing value and the first valid value after the missing value are used for the
interpolation. If the first or last case in the series has a missing value, the missing value is not
replaced.
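
The estimation methods above can be compared on a small series. The following Python sketch is an illustration, not the SPSS implementation: None marks a missing value, the function names and the example series are invented, and the choice to leave a gap unfilled when fewer than span valid values exist on either side is an assumption.

```python
import statistics

def series_mean(series):
    """Replace every missing value with the mean of the valid values."""
    fill = statistics.mean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]

def nearby(series, index, span):
    """Up to span valid values on each side of position index (span >= 1)."""
    before = [v for v in series[:index] if v is not None][-span:]
    after = [v for v in series[index + 1:] if v is not None][:span]
    return before, after

def fill_nearby(series, span, summary=statistics.mean):
    """Mean (or median) of nearby points."""
    result = list(series)
    for i, v in enumerate(series):
        if v is None:
            before, after = nearby(series, i, span)
            # Assumption: leave the gap if either side has too few points
            if len(before) == span and len(after) == span:
                result[i] = summary(before + after)
    return result

def linear_interpolation(series):
    """Interpolate between the valid values that bracket each gap."""
    result = list(series)
    valid_idx = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(valid_idx, valid_idx[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = series[left] + step * (i - left)
    return result  # leading and trailing gaps are left unchanged

series = [10.0, None, 14.0, 20.0, None, None, 26.0, None]
print(series_mean(series))
print(fill_nearby(series, span=1))
print(fill_nearby(series, span=1, summary=statistics.median))
print(linear_interpolation(series))
```

Note that the last value of the example series stays missing under both the nearby-points methods and linear interpolation, because no valid value follows it.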

Means
The Means procedure calculates subgroup means and related univariate statistics for dependent
variables within categories of one or more independent variables. Optionally, you can obtain a
one-way analysis of variance, eta, and tests for linearity.

Example. Measure the average amount of fat absorbed by three different types of cooking oil,
and perform a one-way analysis of variance to see whether the means differ.
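
The calculation behind this example can be sketched in Python. The measurements below are made up for illustration, and the code computes only the subgroup statistics, the one-way ANOVA F statistic, and eta squared; it does not compute a p-value.

```python
import statistics
from collections import defaultdict

# Hypothetical measurements: (oil type, grams of fat absorbed)
observations = [
    ("oil_1", 64), ("oil_1", 72), ("oil_1", 68),
    ("oil_2", 78), ("oil_2", 91), ("oil_2", 97),
    ("oil_3", 75), ("oil_3", 93), ("oil_3", 78),
]

groups = defaultdict(list)
for oil, fat in observations:
    groups[oil].append(fat)

# Subgroup statistics: N, mean, and standard deviation per category
for oil, values in groups.items():
    print(oil, len(values), statistics.mean(values), statistics.stdev(values))

# One-way analysis of variance
all_values = [fat for _, fat in observations]
grand_mean = statistics.mean(all_values)
ss_between = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2
                 for v in groups.values())
ss_within = sum((x - statistics.mean(v)) ** 2
                for v in groups.values() for x in v)
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_statistic = (ss_between / df_between) / (ss_within / df_within)
eta_squared = ss_between / (ss_between + ss_within)

print(f"F({df_between}, {df_within}) = {f_statistic:.3f}")
print(f"eta squared = {eta_squared:.3f}")
```

A large F relative to the F distribution with the stated degrees of freedom suggests that the group means differ; SPSS reports the corresponding significance level.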

Statistics. Sum, number of cases, mean, median, grouped median, standard error of the mean,
minimum, maximum, range, variable value of the first category of the grouping variable, variable
value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard
error of kurtosis, skewness, standard error of skewness, percentage of total sum, percentage of
total N, percentage of sum in, percentage of N in, geometric mean, and harmonic mean. Options
include analysis of variance, eta, eta squared, and tests for linearity (R and R²).

Data. The dependent variables are quantitative, and the independent variables are categorical. The
values of categorical variables can be numeric or string.

Assumptions. Some of the optional subgroup statistics, such as the mean and standard deviation,
are based on normal theory and are appropriate for quantitative variables with symmetric
distributions. Robust statistics, such as the median, are appropriate for quantitative variables that
may or may not meet the assumption of normality. Analysis of variance is robust to departures
from normality, but the data in each cell should be symmetric. Analysis of variance also assumes
that the groups come from populations with equal variances. To test this assumption, use Levene’s
homogeneity-of-variance test, available in the One-Way ANOVA procedure.

To Obtain Subgroup Means


 From the menus choose:
Analyze
Compare Means
Means...

Figure 1: Means dialog box

 Select one or more dependent variables.
 Use one of the following methods to select categorical independent variables:
 Select one or more independent variables. Separate results are displayed for each
independent variable.

Means Options

Figure 2: Means Options dialog box

You can choose one or more of the subgroup statistics for the variables within each category of
each grouping variable.
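As a rough illustration of what the Means procedure reports, the sketch below computes N, the mean, and the standard deviation of a dependent variable within each category of a grouping variable. The fat-absorption values and oil labels are hypothetical (the lecture does not give the data), and `subgroup_means` is an illustrative name, not an SPSS function.

```python
from collections import defaultdict
from statistics import mean, stdev

def subgroup_means(dependent, grouping):
    """Return N, mean and sample standard deviation for each category."""
    groups = defaultdict(list)
    for value, category in zip(dependent, grouping):
        groups[category].append(value)
    return {
        category: {
            "N": len(values),
            "Mean": mean(values),
            "Std. Deviation": stdev(values),
        }
        for category, values in groups.items()
    }

# Hypothetical data: fat absorbed for three types of cooking oil.
fat = [64, 72, 68, 78, 91, 97, 75, 93, 78]
oil = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

table = subgroup_means(fat, oil)
for category, stats in table.items():
    print(category, stats)
```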

T Tests
Three types of t tests are available:

Independent-samples t test (two-sample t test). Compares the means of one variable for two
groups of cases. Descriptive statistics for each group and Levene’s test for equality of variances
are provided, as well as both equal- and unequal-variance t values and a 95% confidence interval
for the difference in means.

Paired-samples t test (dependent t test). Compares the means of two variables for a single
group. This test is also for matched pairs or case-control study designs. The output includes
descriptive statistics for the test variables, the correlation between the variables, descriptive
statistics for the paired differences, the t test, and a 95% confidence interval.

One-sample t test. Compares the mean of one variable with a known or hypothesized value.
Descriptive statistics for the test variables are displayed along with the t test. A 95% confidence
interval for the difference between the mean of the test variable and the hypothesized test value is
part of the default output.

Independent-Samples T Test
The Independent-Samples T Test procedure compares means for two groups of cases. Ideally, for
this test, the subjects should be randomly assigned to two groups, so that any difference in
response is due to the treatment (or lack of treatment) and not to other factors.

To Obtain an Independent-Samples T Test

 From the menus choose:


Analyze
Compare Means
Independent-Samples T Test...

Figure 3: Independent-Samples T Test dialog box

 Select one or more quantitative test variables. A separate t test is computed for each
variable.
 Select a single grouping variable, and then click Define Groups to specify two codes for
the groups that you want to compare.
 Optionally, click Options to control the treatment of missing data and the level of the
confidence interval.

Independent-Samples T Test Define Groups

Figure 4: Define Groups dialog box for numeric or string variables

For numeric grouping variables, define the two groups for the t test by specifying two values or a
cutpoint:

 Use specified values. Enter a value for Group 1 and another value for Group 2. Cases
with any other values are excluded from the analysis. Numbers need not be integers (for
example, 6.25 and 12.5 are valid).

 Cutpoint. Enter a number that splits the values of the grouping variable into two sets. All
cases with values that are less than the cutpoint form one group, and cases with values
that are greater than or equal to the cutpoint form the other group.
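The two ways of defining groups can be sketched as follows. This is a minimal illustration of the rules just described, with hypothetical case values and function names; each case is a pair (grouping value, test value).

```python
def groups_by_values(cases, group1, group2):
    """Use specified values: cases with any other grouping value are excluded."""
    g1 = [y for g, y in cases if g == group1]
    g2 = [y for g, y in cases if g == group2]
    return g1, g2

def groups_by_cutpoint(cases, cutpoint):
    """Cutpoint: values below the cutpoint form one group,
    values greater than or equal to it form the other."""
    below = [y for g, y in cases if g < cutpoint]
    at_or_above = [y for g, y in cases if g >= cutpoint]
    return below, at_or_above

# Hypothetical cases: (grouping variable, test variable)
cases = [(6.25, 1.1), (12.5, 2.0), (9.0, 1.7), (6.25, 0.9), (12.5, 2.4)]

by_values = groups_by_values(cases, 6.25, 12.5)   # the case with 9.0 is dropped
by_cutpoint = groups_by_cutpoint(cases, 9.0)      # the case with 9.0 joins the upper group
```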

Paired-Samples T Test
The Paired-Samples T Test procedure compares the means of two variables for a single group.
The procedure computes the differences between values of the two variables for each case and
tests whether the average differs from 0.

To Obtain a Paired-Samples T Test

 From the menus choose:


Analyze
Compare Means
Paired-Samples T Test...

Figure 5: Paired-Samples T Test dialog box

 Select one or more pairs of variables.
 Optionally, click Options to control the treatment of missing data and the level of the
confidence interval.

Example 1: Scientists and engineers frequently wish to compare two different techniques for
measuring or determining the value of a variable. In such situations, interest centers on testing
whether or not the mean difference in measurements is zero.

The following test results, reported by a construction laboratory, give the compressive strength
of a concrete mix evaluated by two different methods: cube molds and cylinder molds. Use the
t test at level 0.05 to determine whether the true average difference in measured compressive
strength between the two methods is zero.

Sample No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Cube strength 25.09 24.18 25.61 25.56 31.69 27.6 20.98 21.98 24.79 22.81 24.14 29.54 31.74 30.58
Cylinder strength 24.98 22.54 23.36 25.65 30.00 23.18 24.1 21.29 23.42 21.24 24.68 26.04 27.22 25.18

Solution:

Calculate, for each sample, the difference between the two compressive strength measurements
(cube strength minus cylinder strength):

Sample No. 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Cube strength 25.09 24.18 25.61 25.56 31.69 27.6 20.98 21.98 24.79 22.81 24.14 29.54 31.74 30.58
Cylinder strength 24.98 22.54 23.36 25.65 30.00 23.18 24.1 21.29 23.42 21.24 24.68 26.04 27.22 25.18
Difference 0.11 1.64 2.25 -0.09 1.69 4.42 -3.12 0.69 1.37 1.57 -0.54 3.5 4.52 5.4

Let μD = average difference between cube strength measurements and cylinder strength
measurements.

and:

$\bar{d} = \frac{\sum d_i}{n} = \frac{23.41}{14} = 1.67$

So, the relevant hypotheses are:

Ho: μD = 0

Versus

Ha: μD ≠ 0

The test is therefore two-tailed. From the t table, $t_{0.025,13} = 2.16$,

so Ho will be rejected if $t_{paired} \ge 2.16$ or $t_{paired} \le -2.16$.

The sample average difference and the sample standard deviation of the differences are:

$\bar{d} = 1.67$

and $s_D = 2.282$

So,

$t_{paired} = \frac{\bar{d}}{s_D/\sqrt{n}} = \frac{1.67}{2.282/\sqrt{14}} = 2.74$

Because 2.74 > 2.16,

Ho is rejected (i.e. average difference is something other than zero.)
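The hand calculation in Example 1 can be checked with a short script. This is a sketch using only the Python standard library; the critical value 2.16 is taken from the t table as quoted above rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

cube = [25.09, 24.18, 25.61, 25.56, 31.69, 27.6, 20.98,
        21.98, 24.79, 22.81, 24.14, 29.54, 31.74, 30.58]
cylinder = [24.98, 22.54, 23.36, 25.65, 30.00, 23.18, 24.1,
            21.29, 23.42, 21.24, 24.68, 26.04, 27.22, 25.18]

# Differences for each sample (cube minus cylinder)
d = [a - b for a, b in zip(cube, cylinder)]
n = len(d)

d_bar = mean(d)                      # sample average difference
s_d = stdev(d)                       # sample standard deviation of the differences
t_paired = d_bar / (s_d / sqrt(n))   # paired t statistic with n - 1 = 13 df

t_crit = 2.16                        # t(0.025, 13) from the t table
reject_h0 = abs(t_paired) >= t_crit

print(round(d_bar, 2), round(s_d, 3), round(t_paired, 2), reject_h0)
```

The rounded values agree with the hand calculation (1.67, 2.282, and 2.74), so Ho is rejected at level 0.05.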

One-Sample T Test
The One-Sample T Test procedure tests whether the mean of a single variable differs from a
specified constant.

To Obtain a One-Sample T Test

 From the menus choose:


Analyze
Compare Means
One-Sample T Test...

Figure 6: One-Sample T Test dialog box

 Select one or more variables to be tested against the same hypothesized value.
 Enter a numeric test value against which each sample mean is compared.
 Optionally, click Options to control the treatment of missing data and the level of the
confidence interval.

One-Sample T Test Options

Figure 7: One-Sample T Test Options dialog box

Confidence Interval. By default, a 95% confidence interval for the difference between the mean
and the hypothesized test value is displayed. Enter a value between 1 and 99 to request a different
confidence level.

Example 2: To test the performance of a concrete mixture made with a new type of cement, a
concrete producer selected six cube samples for testing. The compressive strength values for
the six specimens were 27.2, 29.3, 31.5, 28.7, 30.2, and 29.6 MPa. The concrete producer wishes
to say that concrete of this type averages at least 30 MPa on such a test. Does the sample data
contradict the validity of this claim?

Solution:

The producer will claim that μ ≥ 30 MPa unless the data strongly suggests otherwise.

So, the relevant hypotheses are: Ho: μ = 30 MPa versus Ha: μ < 30 MPa.

The formula for the test statistic value is:

$t = \frac{\bar{x} - 30}{s/\sqrt{n}}$

At significance level α = 0.05,

Ho should be rejected if $t \le -t_{\alpha,n-1} = -t_{0.05,5} = -2.015$


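$\bar{x} = \frac{\sum x_i}{n} = \frac{176.5}{6} = 29.42$,

$s^2 = 2.086$

$s = 1.44$

and

$t = \frac{\bar{x} - 30}{s/\sqrt{n}} = \frac{29.42 - 30}{1.44/\sqrt{6}} = -0.99$

Since t = -0.99 is not in the rejection region (t > -2.015), Ho is not rejected at level 0.05.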

The claim that the producer wishes to make is not contradicted by the data.
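Example 2 can be verified in the same way. The sketch below uses only the Python standard library; the critical value 2.015 is the table value quoted above, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

strength = [27.2, 29.3, 31.5, 28.7, 30.2, 29.6]  # MPa
mu0 = 30.0                                       # hypothesized mean, MPa

n = len(strength)
x_bar = mean(strength)
s2 = variance(strength)      # sample variance (divisor n - 1)
s = stdev(strength)
t = (x_bar - mu0) / (s / sqrt(n))

t_crit = 2.015               # t(0.05, 5) from the t table
reject_h0 = t <= -t_crit     # lower-tailed test, Ha: mu < 30

print(round(x_bar, 2), round(s2, 3), round(s, 2), round(t, 2), reject_h0)
```

The rounded results match the hand calculation (29.42, 2.086, 1.44, and -0.99), and Ho is not rejected.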

Homework:
Q1: A certain type of brick is being considered for use in a particular construction
project. The brick will be used unless sample evidence strongly suggests that the true
average compressive strength is below 22 MPa. A random sample of 9 bricks is
selected and each is subjected to a compressive strength test. The resulting sample
average compressive strength and sample standard deviation of compressive strength
are 21.3 MPa and 1.1 MPa, respectively. State the relevant hypotheses and carry out a
test to reach a decision using a level of significance (α) of a) 0.05 and b) 0.01.

Q2: A sample of 8 speedometers of a particular brand is obtained, and each is
calibrated to check for accuracy at 55 km/hr. The resulting sample data are shown
below. Let μ = the true average reading when actual speed is 55 km/hr. Does the
sample evidence strongly suggest that μ ≠ 55? Use a test level (α) of a) 0.01 and b) 0.05.

53.2, 53.4, 52.9, 54.0, 54.9, 53.6, 54.5, 53.8, 54.1

Note:

 Homework should be submitted next week.
 Solve each problem both by hand calculation and with the SPSS software, and then
compare the results.

T-Test

Group Statistics

            VAR00011    N     Mean     Std. Deviation    Std. Error Mean
VAR00010    1.00        10    .4920    .11727            .03708
            2.00        16    .3594    .13616            .03404

Independent Samples Test (VAR00010)

Levene's Test for Equality of Variances: F = .262, Sig. = .613

t-test for Equality of Means

                               t        df        Sig. (2-tailed)    Mean Difference    Std. Error Difference    95% Confidence Interval of the Difference (Lower, Upper)
Equal variances assumed        2.543    24        .018               .13263             .05216                   .02497, .24028
Equal variances not assumed    2.635    21.429    .015               .13263             .05034                   .02807, .23718
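The t values, degrees of freedom, and standard errors in this output can be reproduced from the Group Statistics table alone. The sketch below applies the standard pooled-variance and unequal-variance (Welch-Satterthwaite) formulas; the function name is illustrative, and small discrepancies in the last digit arise because the printed means and standard deviations are rounded.

```python
from math import sqrt

def two_sample_t(n1, m1, s1, n2, m2, s2):
    """Return (t, df, standard error) for the equal-variance and
    unequal-variance forms of the independent-samples t test."""
    diff = m1 - m2

    # Equal variances assumed: pooled variance with n1 + n2 - 2 df
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se_pooled = sqrt(sp2 * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch-Satterthwaite df
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    se_welch = sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    return {
        "equal": (diff / se_pooled, n1 + n2 - 2, se_pooled),
        "unequal": (diff / se_welch, df_welch, se_welch),
    }

# Values copied from the Group Statistics table above
result = two_sample_t(10, 0.4920, 0.11727, 16, 0.3594, 0.13616)
for name, (t, df, se) in result.items():
    print(name, round(t, 2), round(df, 2), round(se, 4))
```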

Example:

Immediately below the asphalt surface of a roadway is a layer of base material
composed of a crushed stone or gravel aggregate. The resilient modulus of this
aggregate is a measure of how the aggregate deforms when subjected to stress, and it
is an important property affecting the manner in which the roadway responds to loads.

A construction engineer has four different suppliers of this aggregate material who
obtain their raw materials from four different locations. The engineer would like to
assess whether the aggregates from the four different locations have different values
of resilient modulus. An experiment is performed by randomly selecting ten samples
of aggregate from each of the four locations and measuring their resilient modulus.
The resilient modulus data set is given below:

Location 1 Location 2 Location 3 Location 4


30060.00 31280.00 29950.00 30430.00
28740.00 30380.00 29190.00 28120.00
29140.00 30620.00 31870.00 30310.00
29090.00 29650.00 30010.00 31650.00
30220.00 28080.00 28490.00 29770.00
31120.00 27920.00 31600.00 33100.00
31360.00 27420.00 29450.00 28680.00
28300.00 28860.00 32890.00 31730.00
29750.00 29850.00 31170.00 31480.00
33350.00 30550.00 29470.00 32960.00
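One way to assess whether the four locations differ is a one-way analysis of variance. The sketch below computes the between-group and within-group sums of squares and the F statistic from the data above, using only the Python standard library. The function name is illustrative, and the critical value of about 2.87 for F(0.05; 3, 36) is an assumption taken from standard F tables, not from the lecture.

```python
from statistics import mean

locations = {
    "Location 1": [30060, 28740, 29140, 29090, 30220, 31120, 31360, 28300, 29750, 33350],
    "Location 2": [31280, 30380, 30620, 29650, 28080, 27920, 27420, 28860, 29850, 30550],
    "Location 3": [29950, 29190, 31870, 30010, 28490, 31600, 29450, 32890, 31170, 29470],
    "Location 4": [30430, 28120, 30310, 31650, 29770, 33100, 28680, 31730, 31480, 32960],
}

def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f

ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(list(locations.values()))

f_crit = 2.87  # approximate F(0.05; 3, 36) from standard tables (assumption)
reject_h0 = f_stat >= f_crit
print(round(f_stat, 2), df_b, df_w, reject_h0)
```

With these data F is about 1.50 on 3 and 36 degrees of freedom, which is below the assumed critical value, so the hypothesis of equal mean resilient modulus would not be rejected at the 0.05 level.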

Homework:

Re-solve the previous example using the following data:

Location 1 Location 2 Location 3 Location 4


34569.00 38161.60 32196.25 30430.00
33051.00 37063.60 31379.25 28120.00
33511.00 37356.40 34260.25 30310.00
33453.50 36173.00 32260.75 31650.00
34753.00 34257.60 30626.75 29770.00
35788.00 34062.40 33970.00 33100.00
36064.00 33452.40 31658.75 28680.00
32545.00 35209.20 35356.75 31730.00
34212.50 36417.00 33507.75 31480.00
38352.50 37271.00 31680.25 32960.00

MULTIPLE LINEAR REGRESSION MODEL

1.0 Introduction
Many applications of regression analysis involve situations in which there is
more than one regressor variable. A regression model that contains more than one
regressor variable is called a multiple regression model.

As an example, suppose that the effective life of a cutting tool depends on the
cutting speed and the tool angle. A multiple regression model that might describe this
relationship is

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$ [1]

where Y represents the tool life, x1 represents the cutting speed, x2 represents the tool
angle, and ε is a random error term. This is a multiple linear regression model with
two regressors. The term linear is used because Equation 1 is a linear function of the
unknown parameters β0, β1, and β2.

The regression model in Equation 1 describes a plane in the three-dimensional space
of Y, x1, and x2. Figure 1(a) shows this plane for the regression model

In general, the dependent variable or response Y may be related to k independent or
regressor variables. The model

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$ [2]

is called a multiple linear regression model with k regressor variables. The parameters
βj, j = 0, 1, …, k, are called the regression coefficients.

Models that are more complex in structure than Equation 2 may often still be
analyzed by multiple linear regression techniques.

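For example, consider the cubic polynomial model in one regressor variable.

$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon$ [3]

If we let $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$, Equation 3 can be written as

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$ [4]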

which is a multiple linear regression model with three regressor variables.

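Models that include interaction effects may also be analyzed by multiple
linear regression methods. An interaction between two variables can be represented
by a cross-product term in the model, such as

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$ [5]

If we let $x_3 = x_1 x_2$ and $\beta_3 = \beta_{12}$, Equation 5 can be written as

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$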

which is a linear regression model. Figure 2(a) and (b) show the three-dimensional
plot of the regression model

and the corresponding two-dimensional contour plot. Notice that, although this model
is a linear regression model, the shape of the surface that is generated by the model is
not linear.

In general, any regression model that is linear in the parameters (the β’s) is a
linear regression model, regardless of the shape of the surface that it generates.
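The point that an interaction model is still linear in the parameters can be illustrated numerically: the cross-product is simply treated as a third regressor column. In this sketch the coefficient values, data, and function names are hypothetical and chosen only for illustration.

```python
def with_interaction(x1, x2):
    """Build the extra regressor x3 = x1 * x2 for each observation."""
    return [a * b for a, b in zip(x1, x2)]

def predict(beta, x1, x2, x3):
    """Mean response b0 + b1*x1 + b2*x2 + b3*x3; linear in the betas."""
    b0, b1, b2, b3 = beta
    return [b0 + b1 * a + b2 * b + b3 * c for a, b, c in zip(x1, x2, x3)]

# Hypothetical regressor values and coefficients (b3 plays the role of beta_12)
x1 = [0.0, 1.0, 2.0]
x2 = [1.0, 1.0, 2.0]
x3 = with_interaction(x1, x2)
beta = (1.0, 2.0, 3.0, 0.5)

y_mean = predict(beta, x1, x2, x3)
print(x3, y_mean)
```

The surface traced out over (x1, x2) is curved because of the x1·x2 term, yet each prediction is a plain weighted sum of the coefficients, which is why ordinary multiple linear regression methods still apply.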

EXAMPLE 1:
Table 1 below shows data on pull strength of a wire bond in a semiconductor
manufacturing process, wire length, and die height.

Table 1: Wire bond pull strength data (pull strength, wire length, and die height)
A three-dimensional scatter plot of the data is presented in Fig. 4a.

Figure 4, a: Three-dimensional scatter plot of the wire bond pull strength data

Figure 4, b shows a matrix of two-dimensional scatter plots of the data. These
displays can be helpful in visualizing the relationships among variables in a
multivariable data set.

Figure 4, b: Matrix of scatter plots (from SPSS) for the wire bond pull strength data in
Table 1.

Specifically, we will fit the multiple linear regression model

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where Y = pull strength, x1 = wire length, and x2 = die height.


From the data in Table 1 we calculate:

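For the model Y = β0 + β1 x1 + β2 x2 + ε, the normal equations are:

$n\hat{\beta}_0 + \hat{\beta}_1 \sum x_{i1} + \hat{\beta}_2 \sum x_{i2} = \sum y_i$

$\hat{\beta}_0 \sum x_{i1} + \hat{\beta}_1 \sum x_{i1}^2 + \hat{\beta}_2 \sum x_{i1} x_{i2} = \sum x_{i1} y_i$

$\hat{\beta}_0 \sum x_{i2} + \hat{\beta}_1 \sum x_{i1} x_{i2} + \hat{\beta}_2 \sum x_{i2}^2 = \sum x_{i2} y_i$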

Inserting the computed summations into the normal equations, we obtain

The solution to this set of equations is

Therefore, the fitted regression equation is

This equation can be used to predict pull strength for pairs of values of the regressor
variables wire length (x1) and die height (x2). For example, the first observation has x1= 2
and x2= 50, and the fitted value is

The corresponding observed value is y1= 9.95. The residual corresponding to the first
observation is

Figure 5 shows a three-dimensional plot of the plane of predicted values ŷ generated
from this equation.
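The procedure used in this example (form the summations, set up the normal equations, solve for the coefficients, then compute fitted values and residuals) can be sketched in plain Python. The data below are hypothetical and are not the wire bond data of Table 1; the function names are illustrative, and the small Gaussian-elimination solver simply stands in for whatever method is used to solve the three equations.

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_two_regressors(x1, x2, y):
    """Least squares estimates for Y = b0 + b1*x1 + b2*x2 + error."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * v for a, v in zip(x1, y))
    s2y = sum(b * v for b, v in zip(x2, y))
    normal_matrix = [[n, s1, s2], [s1, s11, s12], [s2, s12, s22]]
    return solve_linear_system(normal_matrix, [sy, s1y, s2y])

# Hypothetical data (not from Table 1)
x1 = [1, 2, 3, 4, 5, 6]
x2 = [10, 30, 20, 50, 40, 60]
y = [5.1, 8.9, 9.2, 14.8, 15.1, 19.9]

b0, b1, b2 = fit_two_regressors(x1, x2, y)
fitted = [b0 + b1 * a + b2 * b for a, b in zip(x1, x2)]
residuals = [obs - fit for obs, fit in zip(y, fitted)]
```

A useful check is that the residuals sum to zero and are uncorrelated with each regressor, which is exactly what the normal equations require.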

Estimating σ2
Just as in simple linear regression, it is important to estimate σ2, the variance
of the error term ε, in a multiple regression model.
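The usual unbiased estimator, stated here as a standard result rather than taken from the surviving text of the lecture, divides the residual (error) sum of squares by n − p, where p = k + 1 is the number of estimated coefficients including the intercept:

```latex
\hat{\sigma}^2 = \frac{SS_E}{n - p}
               = \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{n - p},
\qquad p = k + 1
```

For the two-regressor model Y = β0 + β1 x1 + β2 x2 + ε, p = 3, so the divisor is n − 3.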

Homework 1:

A study was performed on wear of a bearing y and its relationship to x1 oil viscosity
and x2 load. The following data were obtained.

(a) Fit a multiple linear regression model to these data.
(b) Estimate σ2 and the standard errors of the regression coefficients.
(c) Use the model to predict wear when x1 = 25 and x2 = 1000.
(d) Fit a multiple linear regression model with an interaction term to these data.
(e) Estimate σ2 and se(βˆj) for this new model. How did these quantities change?
Does this tell you anything about the value of adding the interaction term to the
model?
(f) Use the model in (d) to predict wear when x1 = 25 and x2 = 1000. Compare this prediction
with the predicted value from part (c) above.

2.0 Categorical Regressors and Indicator Variables

The regression models presented in previous sections have been based on
quantitative variables, that is, variables that are measured on a numerical scale. For
example, variables such as temperature, pressure, distance, and voltage are
quantitative variables.
Occasionally, we need to incorporate categorical, or qualitative, variables in
a regression model.
For example, suppose that one of the variables in a regression model is the
operator who is associated with each observation yi. Assume that only two operators
are involved. We may wish to assign different levels to the two operators to account
for the possibility that each operator may have a different effect on the response. The
usual method of accounting for the different levels of a qualitative variable is to use
indicator variables. For example, to introduce the effect of two different operators
into a regression model, we could define an indicator variable as follows:

x = 0 if the observation is from operator 1
x = 1 if the observation is from operator 2

EXAMPLE 1: A mechanical engineer is investigating the surface finish of metal
parts produced on a lathe and its relationship to the speed (in revolutions per minute)
of the lathe. The data are shown in Table 1. Note that the data have been collected
using two different types of cutting tools.

Table 1: Surface finish data
Since the type of cutting tool likely affects the surface finish, we will fit the
model

$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$

where Y is the surface finish, x1 is the lathe speed in revolutions per minute, and x2 is
an indicator variable denoting the type of cutting tool used; that is,

x2 = 0 for tool type 302
x2 = 1 for tool type 416

The parameters in this model may be easily interpreted. If x2 = 0, the model becomes

$Y = \beta_0 + \beta_1 x_1 + \varepsilon$

which is a straight-line model with slope β1 and intercept β0. However, if x2 = 1, the
model becomes

$Y = \beta_0 + \beta_1 x_1 + \beta_2 (1) + \varepsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon$

which is a straight-line model with slope β1 and intercept β0 + β2. Thus, the model
implies that surface finish is linearly related to lathe speed and that the slope β1 does
not depend on the type of cutting tool used. However, the type of cutting tool does
affect the intercept, and β2 indicates the change in the intercept associated with a
change in tool type from 302 to 416.
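The interpretation of β2 as a shift in the intercept can be seen in a small sketch. The coefficient values below are hypothetical (the fitted values from the example are not reproduced here), and the function names are illustrative; only the 0/1 coding of tool types 302 and 416 follows the text.

```python
def tool_indicator(tool_type):
    """Indicator variable x2: 0 for tool type 302, 1 for tool type 416."""
    codes = {302: 0, 416: 1}
    return codes[tool_type]

def predict_finish(beta, speed, tool_type):
    """Mean surface finish b0 + b1*speed + b2*x2."""
    b0, b1, b2 = beta
    return b0 + b1 * speed + b2 * tool_indicator(tool_type)

beta = (10.0, 0.1, -5.0)  # hypothetical coefficients (b0, b1, b2)

# At any fixed lathe speed, switching from tool 302 to tool 416
# changes the prediction by exactly b2; the slope b1 is unchanged.
shift_at_200 = predict_finish(beta, 200, 416) - predict_finish(beta, 200, 302)
shift_at_250 = predict_finish(beta, 250, 416) - predict_finish(beta, 250, 302)
print(shift_at_200, shift_at_250)
```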
