
Minitab StatGuide: Multivariate



Principal Components
Summary

Use Principal Components to form a smaller number of uncorrelated variables. The goal of principal components analysis is to explain the
maximum amount of variance with the fewest principal components.
For example:
• You record information on 10 socioeconomic variables and you want to reduce the variables into a smaller number of components to
more easily analyze the data.
• You want to analyze the customer responses to several attributes of a new product in order to form a smaller number of uncorrelated
variables that are easier to interpret.
Principal components analysis is commonly used as one step in a series of analyses. For example, you can use Principal Components to
reduce your data and avoid multicollinearity, or when you have too many predictors relative to the number of observations.
A principal components analysis often uncovers unsuspected relationships, allowing you to interpret the data in a new way.
You can perform principal components analysis when you have one sample and several variables are measured on each sampling unit.

Data Description

A bank requires eight pieces of information when applying for a loan: income, education level, age, length of time at current residence,
length of time with current employer, savings, debt, and number of credit cards. A bank administrator wants to analyze this information for
reporting purposes.

Principal Components
Eigenanalysis − Eigenvalues

Use the eigenvalue results to determine the number of principal components.


One way to determine the number of principal components is based on the size of the eigenvalues. In an eigenanalysis of the correlation
matrix, the eigenvalues equal the variances of the principal components. According to the Kaiser criterion, retain
principal components with eigenvalues greater than 1.
You can also decide on the number of principal components based on the amount of explained variance. For example, you may retain
components that cumulatively explain 90% of the variance. Another technique is to analyze a scree plot. Use any one or combination of
these techniques to determine the number of principal components.
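For readers working outside Minitab, the following minimal Python/NumPy sketch illustrates the same eigenanalysis of a correlation matrix and two of these selection rules. The data matrix and the 90% threshold are hypothetical, used only for illustration.

    import numpy as np

    # Hypothetical data matrix: rows are loan applicants, columns are the 8 variables
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 8))

    R = np.corrcoef(X, rowvar=False)           # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenanalysis of a symmetric matrix
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    proportion = eigvals / eigvals.sum()       # proportion of variance per component
    cumulative = proportion.cumsum()

    n_kaiser = int(np.sum(eigvals > 1))               # Kaiser criterion: eigenvalues > 1
    n_90 = int(np.argmax(cumulative >= 0.90)) + 1     # components needed for 90% of the variance
    print(n_kaiser, n_90)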

Example Output

Eigenanalysis of the Correlation Matrix

Eigenvalue 3.5476 2.1320 1.0447 0.5315 0.4112


Proportion 0.443 0.266 0.131 0.066 0.051
Cumulative 0.443 0.710 0.841 0.907 0.958

Eigenvalue 0.1665 0.1254 0.0411


Proportion 0.021 0.016 0.005
Cumulative 0.979 0.995 1.000

Interpretation

For the loan applicant data:


• The first principal component has variance 3.5476 (equal to the largest eigenvalue) and accounts for 0.443 (44.3%) of the total
variation in the data.
• The second principal component (variance 2.1320) accounts for 0.266 (26.6%) of the total data variation.
• The third principal component (variance 1.0447) accounts for 0.131 (13.1%) of the total data variation.
The first three principal components, whose variances (eigenvalues) are greater than 1, together represent 0.841 (84.1%) of the total variability,
suggesting that three principal components adequately explain the variation in the data.

Principal Components
Eigenanalysis − Coefficients

The principal components are the linear combinations of the original variables that account for the variance in the data. The maximum
number of components extracted always equals the number of variables. The eigenvectors, which consist of the coefficients
corresponding to each variable, are used to calculate the principal component scores. The coefficients indicate the relative weight of
each variable in the component. The larger the absolute value of the coefficient, the more important the corresponding variable is in
constructing the component.
Note: You must standardize the variables to obtain the correct component scores.

Example Output

Variable PC1 PC2 PC3 PC4 PC5


Income 0.314 0.145 -0.676 -0.347 -0.241


Education 0.237 0.444 -0.401 0.240 0.622


Age 0.484 -0.135 -0.004 -0.212 -0.175
Residence 0.466 -0.277 0.091 0.116 -0.035
Employ 0.459 -0.304 0.122 -0.017 -0.014
Savings 0.404 0.219 0.366 0.436 0.143
Debt -0.067 -0.585 -0.078 -0.281 0.681
Credit cards -0.123 -0.452 -0.468 0.703 -0.195

Variable PC6 PC7 PC8


Income 0.494 0.018 -0.030
Education -0.357 0.103 0.057
Age -0.487 -0.657 -0.052
Residence -0.085 0.487 -0.662
Employ -0.023 0.368 0.739
Savings 0.568 -0.348 -0.017
Debt 0.245 -0.196 -0.075
Credit cards -0.022 -0.158 0.058

Interpretation

For the loan applicant data, the first principal component's scores are computed from the original data using the coefficients listed under
PC1:
PC1 = 0.314 Income + 0.237 Education + 0.484 Age + 0.466 Residence + 0.459 Employ + 0.404 Savings − 0.067 Debt − 0.123 Credit cards
The interpretation of the principal components is subjective and requires knowledge of the data:
• Age (0.484), Residence (0.466), Employ (0.459), and Savings (0.404) have large positive loadings on component 1, so label this
component Applicant Background.
• Debt (−0.585) and Credit cards (−0.452) have large negative loadings on component 2, so label this component Credit History.
• Income (−0.676) and Education (−0.401) have large negative loadings on component 3, so label this component Academic and Income qualifications.
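As a rough sketch of this calculation, continuing the NumPy example above with the same hypothetical data, the component scores are obtained by applying the eigenvector coefficients to the standardized variables:

    # Continuing the sketch above: scores come from standardized data, not the raw values
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    scores = Z @ eigvecs                               # all principal component scores
    pc1 = scores[:, 0]                                 # first principal component (PC1) scores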

Principal Components: Topics


Summary
Eigenvalues
Coefficients
Graphs
Scree plot
Score plot
Loading plot
Biplot
Principal Components
Graphs − Scree Plot

The scree, or eigenvalue, plot provides one method for determining the number of principal components. The scree plot displays the
component number versus the corresponding eigenvalue. The eigenvalues of the correlation matrix equal the variances of the principal
components; therefore, choose the number of components based on the size of the eigenvalues.
The ideal pattern is a steep curve, followed by a bend and then a straight line. Retain those components in the steep curve before the first
point that starts the line trend. In practice, you may have difficulty interpreting a scree plot. Use your knowledge of the data and the
results from the other methods of selecting components to help decide the number of components to retain.
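If you are reproducing the analysis outside Minitab, a scree plot is straightforward to draw. A minimal sketch with matplotlib, using the eigenvalues from the earlier hypothetical NumPy example:

    import matplotlib.pyplot as plt

    # Scree plot: component number on the x-axis, eigenvalue on the y-axis
    plt.plot(range(1, len(eigvals) + 1), eigvals, marker="o")
    plt.xlabel("Component number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()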

Example Output


Interpretation

For the loan applicant data, you can conclude that the first three principal components account for most of the total variability in the data
(given by the eigenvalues). The remaining principal components account for a very small proportion of the variability (close to zero) and
are probably unimportant.

Principal Components
Graphs − Score Plot

The score plot graphs the second principal component scores versus the first principal component scores. If the first two components
account for most of the variance in the data, you can use the score plot to assess the data structure and detect clusters, outliers, and
trends. The plot may reveal groupings of points, which may indicate two or more separate distributions in the data. If the data follow a
normal distribution and no outliers are present, the points are randomly distributed around zero.
To create plots for other components, store the scores and use Graph > Scatterplot.

Example Output

Interpretation

For the loan applicant data, the point in the lower right hand corner may be an outlier. Investigate this point further.

Principal Components
Graphs − Loading Plot

The loading plot provides information about the loadings of the first two principal components.

Example Output


Interpretation

For the loan applicant data:


• Age, Residence, Employ, and Savings have large positive loadings on component 1, so label this component Applicant Background.
• Debt and Credit Cards have large negative loadings on component 2, so label this component Credit history.

Principal Components
Graphs − Biplot

The biplot overlays the score and loading plots. Use the biplot to assess the data structure and loadings on one graph. The second
principal component scores are plotted versus the first principal component scores. The loadings for these two principal components are
plotted on the same graph.

Example Output

Interpretation

For the loan applicant data:


• Age, Residence, Employ, and Savings have large positive loadings on component 1, so label this component Applicant background.
• Debt and Credit cards have large negative loadings on component 2, so label this component Credit History.
• The data point in the lower right hand corner may be an outlier. You may want to investigate this point further.

Multivariate analysis
Use multivariate analysis to:
• Understand or reduce the data dimension by analyzing the data covariance structure.
− Principal Components − use to reduce the data into a smaller number of components
− Factor Analysis − use to describe the covariance among variables in terms of a few underlying factors
• Assign group membership.


− Discriminant Analysis − use to classify observations into two or more groups when groups are known
− Cluster Observations − use to classify similar observations into groups when groups are initially unknown
− Cluster Variables − use to classify similar variables into groups when groups are initially unknown
− Cluster K-Means − use to classify similar observations into groups when groups are initially unknown but good starting points for
clusters are available
Because Minitab does not provide tests of significance for the multivariate procedures, interpreting the results is somewhat subjective.
However, you can make informed conclusions if you are familiar with the data.
Multivariate analysis of variance (MANOVA) is another multivariate technique. MANOVA allows you to compare the means of
variables across groups. See the ANOVA StatGuide for information on both Balanced and General MANOVA techniques.

Principal components analysis or factor analysis


You can use principal components analysis or factor analysis to summarize the data covariance structure in a smaller number of dimensions.
Although the techniques are similar, they are designed for different purposes. Principal components analysis is used to reduce
data into a smaller number of components, and factor analysis is used to understand what constructs underlie the data. The two analyses
are often performed on the same data. For example, you can conduct a principal components analysis to determine the number of factors
to extract in a factor analytic study.

Correlation or covariance matrix


Use the correlation matrix to calculate the principal components if variables are measured by different scales and you want to standardize
them or if the variances differ widely among variables. You can use the covariance or correlation matrix in all other situations.

Determining number of components


You can determine the number of principal components using several methods:
• Kaiser method − only retain components with eigenvalues greater than 1.
• Scree test − find the place on the scree plot where the smooth decrease of eigenvalues levels off. Retain those components in the
steep curve before the first point that starts the line trend.
• Percent variation explained − retain components that cumulatively explain a certain percent of variation. The acceptable level of
explained variance depends on how you use Principal Components. For descriptive purposes, you may only need 80% of the
variance explained. However, if you are doing other analyses on these data, you may want to have at least 90% of the variance
explained.
These methods may detect a different number of components for the same data. In such cases, choose the most interpretable and logical
solution for your data.

Factor Analysis
Summary

Use Factor Analysis to summarize the data covariance structure into a smaller number of dimensions. The emphasis in factor analysis is
to describe the covariance among variables in terms of a few underlying, unobservable random quantities, or factors.
For example, you may want to analyze:
• Customer responses on several attributes of a new product to form fewer uncorrelated factors that are easier to interpret.
• Test scores in different subject areas, looking for correlations among the variables that may indicate the existence of factors.
You can perform factor analysis when you have one sample and several variables are measured on each sampling unit.

Data Description

Job applicants were measured on 12 different characteristics: academic record, appearance, communication, company fit, experience,
job fit, letter of interest, likeability, organization, potential, résumé, and self-confidence. After conducting a principal components analysis,
you speculate that five factors may fit the data well. You want to conduct a factor analysis using a maximum likelihood extraction to
determine what factors underlie the data.

Factor Analysis: Topics


Summary
Loadings and communalities
Unrotated
Rotated
Sorted rotated
Factor scores
Coefficients
Graphs
Scree plot


Score plot
Loading plot
Biplot
Factor Analysis
Loadings and Communalities − Unrotated

Use factor analysis to determine the underlying factors responsible for correlations in the data.
The unrotated factor analysis displays:
• Loadings − represent how much a factor explains a variable. High loadings (positive or negative) indicate that the factor strongly
influences the variable. Low loadings (positive or negative) indicate that the factor has a weak influence on the variables. Examine the
loading pattern to determine on which factor each variable loads. Some variables may load on multiple factors. In the unrotated factor
loading table, the loadings are often difficult to interpret.
• Communality − each variable's proportion of variability that is explained by the factors. The closer the communality is to 1, the better
the variable is explained by the factors. You can decide to add a factor if it contributes to the fit of certain variables.
• Variance − variability in the data explained by each factor. Variance equals the eigenvalue if you use principal components to extract
factors.
• %Var − proportion of variability in the data explained by each factor.
After you select the number of factors, try different rotations so you can more easily interpret the factor loadings. Look at the rotated
solution to interpret the factors.

Example Output

Maximum Likelihood Factor Analysis of the Correlation Matrix

Unrotated Factor Loadings and Communalities

Variable Factor1 Factor2 Factor3 Factor4 Factor5 Communality


Academic record 0.571 -0.329 -0.270 0.246 0.471 0.789
Appearance 0.590 -0.290 0.228 0.398 -0.016 0.643
Communication 0.558 -0.505 0.465 -0.166 0.159 0.835
Company Fit 0.616 -0.620 -0.174 -0.210 -0.117 0.853
Experience 0.585 -0.138 -0.404 0.226 0.264 0.646
Job Fit 0.635 -0.606 -0.343 -0.168 -0.160 0.943
Letter 0.824 0.154 0.006 -0.253 0.021 0.768
Likeability 0.582 -0.371 0.153 0.354 -0.234 0.680
Organization 0.465 -0.659 0.500 -0.119 0.095 0.925
Potential 0.606 -0.476 -0.356 0.221 0.307 0.864
Resume 0.964 0.244 0.018 -0.013 -0.004 0.989
Self-confidence 0.549 -0.378 0.157 0.583 -0.263 0.878

Variance 4.9426 2.2441 1.0782 0.9621 0.5841 9.8111


% Var 0.412 0.187 0.090 0.080 0.049 0.818

Interpretation

For the job applicant data, 5 factors were extracted from the 12 variables. All variables are well represented by the 5 chosen factors,
given that the corresponding communalities are generally high. For example, 0.789, or 78.9%, of the variability in Academic Record is
explained by the 5 factors. Also, the 5 chosen factors explain most of the total data variation (0.818 or 81.8%).

Factor Analysis
Loadings and Communalities − Rotated

Factor rotation simplifies the loading structure, allowing you to more easily interpret the factor loadings. There are four methods to
orthogonally rotate the initial factor loadings:
• Equimax − maximizes variance of squared loadings within both variables and factors.
• Varimax − maximizes variance of squared loadings within factors (i.e. simplifies the columns of the loading matrix); the most widely
used rotation method. This method attempts to make the loadings either large or small to ease interpretation.
• Quartimax − maximizes variance of squared loadings within variables (i.e. simplifies the rows of the loading matrix).
• Orthomax − a general rotation that includes the above three, depending on the value of the parameter gamma (0-1).
All methods simplify the loading structure. However, one method may not work best in all cases. You may want to try different rotations
and use the one that produces the most interpretable results.
The percent of the total variability explained by the factors does not change with rotation and the communalities remain the same. But,
after rotation, the factors are more evenly balanced in the percent of variability that they account for (compare the % Var rows from the
unrotated and rotated output).
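As a rough illustration outside Minitab, scikit-learn's FactorAnalysis supports a varimax rotation. The sketch below is only an approximation of the analysis above: its estimation details differ from Minitab's maximum likelihood extraction, and the data matrix is hypothetical.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # Hypothetical matrix: rows are job applicants, columns are the 12 characteristics
    rng = np.random.default_rng(2)
    X = rng.normal(size=(50, 12))
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)       # standardize the variables

    fa = FactorAnalysis(n_components=5, rotation="varimax").fit(Z)

    loadings = fa.components_.T                   # rows = variables, columns = factors
    communalities = (loadings ** 2).sum(axis=1)   # proportion of each variable explained by the factors
    variance = (loadings ** 2).sum(axis=0)        # variance accounted for by each factor
    pct_var = variance / Z.shape[1]               # share of total variance, analogous to % Var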

Example Output

Rotated Factor Loadings and Communalities


Varimax Rotation

Variable Factor1 Factor2 Factor3 Factor4 Factor5 Communality


Academic record 0.815 0.191 0.206 0.167 -0.135 0.789
Appearance 0.283 0.644 0.317 0.213 -0.047 0.643
Communication 0.139 0.242 0.815 0.235 -0.197 0.835
Company Fit 0.300 0.216 0.376 0.207 -0.729 0.853
Experience 0.694 0.196 -0.048 0.290 -0.199 0.646
Job Fit 0.372 0.228 0.231 0.210 -0.810 0.943
Letter 0.184 0.073 0.193 0.805 -0.208 0.768
Likeability 0.173 0.703 0.247 0.182 -0.248 0.680
Organization 0.097 0.315 0.864 0.070 -0.255 0.925
Potential 0.785 0.268 0.186 0.126 -0.355 0.864
Resume 0.267 0.297 0.104 0.899 -0.100 0.989
Self-confidence 0.216 0.877 0.166 0.090 -0.162 0.878

Variance 2.2803 2.1752 1.9204 1.8175 1.6177 9.8111


% Var 0.190 0.181 0.160 0.151 0.135 0.818

Interpretation

For the job applicant data, the varimax rotation was performed. You can now interpret the factors more easily. For example:
• Academic record (0.815), Experience (0.694), and Potential (0.785) have large positive loadings on factor 1, so label this factor
qualifications.
• Appearance (0.644), Likeability (0.703), and Self-confidence (0.877) have large positive loadings on factor 2, so label this factor
personal qualities.
• Communication (0.815) and Organization (0.864) have large positive loadings on factor 3, so label this factor work skills.
• Letter (0.805) and Resume (0.899) have large positive loadings on factor 4, so label this factor writing skills.
• Company Fit (−0.729) and Job Fit (−0.810) have large negative loadings on factor 5, so label this factor company and job fit.

Factor Analysis
Loadings and Communalities − Sorted Rotated

The sorted rotated table contains the same content as the rotated solution, but the factor loadings are sorted by size, which makes them
easier to read.
Minitab sorts by the maximum absolute loading for each factor, so that the high loadings for all factors form a diagonal.

Example Output

Sorted Rotated Factor Loadings and Communalities

Variable Factor1 Factor2 Factor3 Factor4 Factor5 Communality


Academic record 0.815 0.191 0.206 0.167 -0.135 0.789
Potential 0.785 0.268 0.186 0.126 -0.355 0.864
Experience 0.694 0.196 -0.048 0.290 -0.199 0.646
Self-confidence 0.216 0.877 0.166 0.090 -0.162 0.878
Likeability 0.173 0.703 0.247 0.182 -0.248 0.680
Appearance 0.283 0.644 0.317 0.213 -0.047 0.643
Organization 0.097 0.315 0.864 0.070 -0.255 0.925
Communication 0.139 0.242 0.815 0.235 -0.197 0.835
Resume 0.267 0.297 0.104 0.899 -0.100 0.989
Letter 0.184 0.073 0.193 0.805 -0.208 0.768
Job Fit 0.372 0.228 0.231 0.210 -0.810 0.943
Company Fit 0.300 0.216 0.376 0.207 -0.729 0.853

Variance 2.2803 2.1752 1.9204 1.8175 1.6177 9.8111


% Var 0.190 0.181 0.160 0.151 0.135 0.818

Interpretation

For the job applicant data, you can easily see what variables load on each factor:
• Academic record (0.815), Experience (0.694), and Potential (0.785) have large positive loadings on factor 1, so label this factor
Qualifications.
• Appearance (0.644), Likeability (0.703), and Self-confidence (0.877) have large positive loadings on factor 2, so label this factor
Personal Qualities.
• Communication (0.815) and Organization (0.864) have large positive loadings on factor 3, so label this factor Work Skills.
• Letter (0.805) and Resume (0.899) have large positive loadings on factor 4, so label this factor Writing Skills.
• Company Fit (−0.729) and Job Fit (−0.810) have large negative loadings on factor 5, so label this factor Company and Job Fit.


Factor Analysis
Coefficients

The coefficients are used to calculate the factor scores. Minitab calculates factor scores by multiplying the factor score coefficients (listed
under Factor1, Factor2, and so on) by your data after they have been centered by subtracting the means and scaled.
Use factor scores to:
• Examine the behavior of observations
• Perform another analysis, such as regression or MANOVA
Note: You must standardize the variables to obtain the correct factor scores.

Example Output

Factor Score Coefficients

Variable Factor1 Factor2 Factor3 Factor4 Factor5


Academic record 0.482 -0.101 0.066 -0.091 0.204
Appearance 0.028 0.154 0.010 -0.046 0.071
Communication -0.004 -0.095 0.372 0.010 0.100
Company Fit -0.081 -0.059 0.017 0.005 -0.324
Experience 0.209 -0.023 -0.038 -0.037 0.050
Job Fit -0.143 -0.073 -0.251 -0.018 -1.007
Letter -0.030 -0.106 0.045 0.072 -0.043
Likeability -0.065 0.221 -0.045 -0.045 -0.016
Organization -0.113 -0.035 0.793 -0.042 0.151
Potential 0.594 -0.065 -0.001 -0.144 0.125
Resume -0.104 0.055 -0.119 1.108 0.231
Self-confidence -0.119 0.807 -0.216 -0.194 0.046

Interpretation

For the job applicant data, the first factor scores are computed from the original data using the coefficients listed under Factor1:
Factor 1 = 0.482 Academic record + 0.028 Appearance − 0.004 Communication − 0.081 Company Fit + 0.209 Experience − 0.143 Job Fit
− 0.030 Letter − 0.065 Likeability − 0.113 Organization + 0.594 Potential − 0.104 Resume − 0.119 Self-confidence
The factor score coefficient pattern matches the loading pattern. For example, for factor 1 the coefficients with
the highest absolute value (Academic record, Experience, and Potential) match the three variables that load on factor 1.
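A minimal sketch of this calculation, using the Factor1 coefficients from the table above and one applicant's standardized values. The applicant's values are hypothetical.

    import numpy as np

    # One applicant's standardized scores for the 12 variables, in the order of the table
    # (Academic record, Appearance, Communication, ..., Self-confidence); values are made up
    z = np.array([1.2, 0.3, -0.5, 0.1, 0.8, -0.2, 0.0, 0.4, -0.1, 0.9, 0.2, 0.5])

    factor1_coeffs = np.array([0.482, 0.028, -0.004, -0.081, 0.209, -0.143,
                               -0.030, -0.065, -0.113, 0.594, -0.104, -0.119])

    factor1_score = z @ factor1_coeffs   # factor 1 score = coefficients applied to standardized data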

Factor Analysis
Graphs − Scree Plot

Use the scree, or eigenvalue, plot (a graph of the factors versus the corresponding eigenvalues) to provide visual information about the
factors. The eigenvalues of the correlation matrix equal the variances of the factors in the unrotated solution.
From this plot, you can determine how well the chosen number of factors fits the data.

Example Output

Interpretation

For the job applicant data, you can conclude that the first five factors account for most of the total variability in the data (given by the
eigenvalues). The remaining factors account for a very small proportion of the variability (close to zero) and are likely unimportant.


Factor Analysis
Graphs − Score Plot

The score plot graphs the second factor scores versus the first factor scores. The plot of the factors provides checks on the assumption of
normality and reveals outliers. If the data are normal and no outliers are present, the score plot shows the points randomly distributed
around zero.
When the first two factors account for most of the variance, you can use the score plot to visually assess the structure of your data.
To create plots for other factors, store the scores and use Graph > Scatterplot.

Example Output

Interpretation

For the job applicant data, the data appear normal and no outliers are apparent.

Factor Analysis
Graphs − Loading Plot

After selecting the number of factors, try different rotations so you can more easily interpret the factor loadings. The loading plot provides
information about the loadings of the first two factors.

Example Output

Interpretation

For the job applicant data, the varimax rotation was performed. You can now interpret the first two factors more easily:
• Academic record, Experience, and Potential have large positive loadings on factor 1, so label this factor Qualifications.
• Appearance, Likeability, and Self-confidence have large positive loadings on factor 2, so label this factor Personal Qualities.

Factor Analysis


Graphs − Biplot

The biplot overlays the score and loading plots. Use the biplot to assess the data structure and loadings on one graph. The second
factor scores are plotted versus the first factor scores. The loadings for these two factors are plotted on the same graph.

Example Output

Interpretation

For the job applicant data, the varimax rotation was performed. You can now interpret the first two factors more easily.
• The data appear normal and no outliers are apparent
• Academic record, Experience, and Potential have large positive loadings on factor 1, so label this factor Qualifications.
• Appearance, Likeability, and Self-confidence have large positive loadings on factor 2, so label this factor Personal qualities.

Extraction methods
Minitab offers two extraction methods for factor analysis: principal components and maximum likelihood. When performing factor analysis:
• If the factors and the errors obtained after fitting the factor model are assumed to follow a normal distribution, use the maximum
likelihood method to obtain maximum likelihood estimates of the factor loadings.
• If the factors and errors obtained after fitting the factor model are not assumed to follow a normal distribution, use the principal
components method.

Steps in factor analysis


The goal of factor analysis is to find a few factors, or unobservable variables, that explain most of the data variability and yet make
contextual sense. You must decide how many factors to use, and find loadings that make the most sense for your data.
The number of factors is often based on the proportion of variance explained by the factors, subject matter knowledge, and
reasonableness of the solution.
1. Perform a principal components analysis to reduce the data and to give you an idea of the number of factors. Examine the
proportion of variability explained by different factors, the eigenvalues, and the scree plot to narrow down your choice of how many
factors to use.
2. Perform a factor analysis, specifying the number of factors. Try different rotation methods to help you interpret the factor loadings.
3. Label the factors based on your knowledge of the data.
4. Construct scales or factor scores for use in other analyses.

Item Analysis
Summary

Use Item Analysis to evaluate whether the items in a survey or test assess the same skill or characteristic. Item analysis is commonly used
in the social sciences, education, and the service quality industry.

Data Description

A restaurant manager wants to assess customer satisfaction with the following three questions:
Item 1 – How satisfied are you with our services?
Item 2 – How likely are you to visit us again?


Item 3 – How likely are you to recommend our restaurant to others?


Responses are recorded using the appropriate 5-point Likert scale below:
1 = Very dissatisfied, 2 = Dissatisfied, 3 = Neutral, 4 = Satisfied, and 5 = Very satisfied
1 = Very unlikely, 2 = Unlikely, 3 = Neutral, 4 = Likely, and 5 = Very likely
Before using these questions, the manager wants to assess whether the three questions truly measure the same construct, customer
satisfaction. A random sample of 50 customers is chosen and their responses are recorded.

Item Analysis
Test Statistics − Correlation Matrix

Use the correlation matrix to assess the strength and direction of the relationship between two items or variables. Items that measure the
same construct should have high, positive correlation values. If the items are not highly correlated, then the items may be ambiguous,
difficult to understand, or measure different constructs.
Often, variables with correlation values greater than 0.7 are considered highly correlated. However, your correlation benchmark value will
depend on subject area knowledge and the number of items in your analysis.

Example Output

Correlation Matrix

Item 1 Item 2
Item 2 0.903
Item 3 0.867 0.864

Cell Contents: Pearson correlation

Interpretation

For the customer service data, Item 1 and Item 2 have a correlation of 0.903. All the items are highly correlated to each other.

Item Analysis
Test Statistics − Item and Total Statistics

The tabulated statistics table summarizes information about each item and the total for all items:
• Total count – Number of observations.
• Mean – Sum of all observations divided by the total count.
• StDev – Measure of dispersion analogous to the average distance (independent of direction) of each observation from the mean.

Example Output

Item and Total Statistics

Total
Variable Count Mean StDev
Item 1 50 3.1600 1.2675
Item 2 50 2.8400 1.3607
Item 3 50 2.9400 1.3463
Total 50 8.9400 3.8087

Interpretation

For the customer service data, Item 1 has 50 observations, the mean response is 3.1600, and the standard deviation is 1.2675.

Item Analysis
Test Statistics − Cronbach's Alpha

A measure of internal consistency used to assess how reliably multiple items in a survey or test assess the same skill or characteristic.
Values range between 0 and 1. If Cronbach's alpha is low, then the items may not reliably measure a single construct. Typically, a value
of 0.7 or higher is considered good. However, the benchmark value varies by subject area and the number of items.
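Cronbach's alpha can also be computed directly from the item variances and the variance of the total score. A minimal Python sketch follows; the example responses are hypothetical.

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Cronbach's alpha for an (n_respondents, n_items) array of scores."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)         # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical 5-point Likert responses from four customers on the three items
    responses = np.array([[4, 5, 4],
                          [2, 2, 3],
                          [5, 5, 5],
                          [3, 2, 3]])
    print(cronbach_alpha(responses))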

Example Output

Cronbach's Alpha = 0.9550

Interpretation

For the customer service data, the overall Cronbach's alpha is 0.9550. The value is greater than the common benchmark of 0.7 and
suggests that the items are measuring the same construct.


Item Analysis
Test Statistics − Omitted Item Statistics

Use the omitted item statistics table to assess whether removing an item from the analysis improves the internal consistency:
• Adj Total Mean – Mean of the total after omitting an item from the set.
• Adj Total Stdev – Standard deviation after omitting an item from the set.
• Item-Adj Total Corr – Correlation between the scores of one omitted item and the total scores of all other items. In practice, values
range between 0 and 1. A higher value suggests that the omitted item measures the same construct as the other items.
• Squared Multiple Corr – Coefficient of determination (R²) when the omitted item is regressed on the remaining items. Values range
between 0 and 1. A higher value suggests that the omitted item measures the same construct as the other items.
• Cronbach's Alpha – Cronbach's alpha calculated after an item is omitted from the analysis. Fairly constant values for all omitted items
suggest that all items measure the same construct. An increase in Cronbach's alpha for a specific omitted item suggests that it does
not measure the same construct.
If an omitted item has a low Item-Adj Total Corr, a low Squared Multiple Corr, and Cronbach's alpha substantially increases, then you
might consider removing it from the set.
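A rough sketch of how some of these omitted-item quantities can be computed, continuing the cronbach_alpha sketch above. The squared multiple correlation, which requires regressing the omitted item on the remaining items, is left out here.

    import numpy as np

    def omitted_item_stats(items: np.ndarray):
        """For each omitted item: adjusted total mean/StDev, item-adjusted total
        correlation, and Cronbach's alpha recomputed without that item."""
        for j in range(items.shape[1]):
            rest = np.delete(items, j, axis=1)           # drop item j from the set
            adj_total = rest.sum(axis=1)                 # total score of the remaining items
            corr = np.corrcoef(items[:, j], adj_total)[0, 1]
            yield (adj_total.mean(), adj_total.std(ddof=1), corr, cronbach_alpha(rest))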

Example Output

Omitted Item Statistics

Omitted     Adj. Total   Adj. Total   Item-Adj.    Squared         Cronbach's
Variable    Mean         StDev        Total Corr   Multiple Corr   Alpha
Item 1      5.780        2.613        0.9166       0.8447          0.9268
Item 2      6.100        2.525        0.9134       0.8413          0.9277
Item 3      6.000        2.563        0.8870       0.7869          0.9476

Interpretation

For the customer service data, the item-adjusted total correlations when items 1, 2, and 3 are omitted are 0.9166, 0.9134, and 0.8870,
respectively.
The squared multiple correlations when items 1, 2, and 3 are omitted are 0.8447, 0.8413, and 0.7869, respectively.
Cronbach's alpha when items 1, 2, and 3 are omitted is 0.9268, 0.9277, and 0.9476, respectively.
The Item-Adj Total Corr and Squared Multiple Corr values are consistently high, and the Cronbach's alpha values do not differ substantially.
Collectively, the evidence suggests that all items measure the same construct.

Item Analysis: Topics


Summary
Correlation matrix
Item and total statistics
Cronbach's alpha
Omitted item statistics
Graphs
Matrix plot
Item Analysis
Graphs − Matrix Plot

Use the plot to visually assess the relationship between every combination of items or variables.

Example Output


Interpretation

For the customer service data, the plot suggests that all the items have a positive, linear relationship.

Cluster Observations
Summary

Use the cluster observations procedure to classify similar observations into groups when the groups are initially unknown. Cluster
observations uses a hierarchical clustering procedure.
For example, you can:
• Make measurements on 5 nutritional characteristics of 12 breakfast cereal brands, then group the cereal brands that have similar
characteristics
• Record information about characteristics of recreation parks in the United States, then group the parks by their similarities
You can perform the clustering of observations analysis when:
• You want to sort observations into two or more categories
• You do not have missing data

Data Description

You are currently developing a new piece of sporting equipment and you want to test different groups on the equipment's ease of use.
You have data on gender, height, weight, and handedness of 20 people, and you want to group the available people by their similarities.

Cluster Observations
Amalgamation Steps

At each amalgamation step, two clusters are joined until only one cluster remains. The amalgamation table shows the clusters joined,
their similarity level, distance between them, and other characteristics of each newly formed cluster. In Minitab, your choice of linkage
method and distance measure will greatly influence the clustering outcome.
You should look at the similarity level and distance between the joined clusters to choose the number of clusters for the final partition of
data. The step prior to where the values change abruptly may determine the number of clusters for the final partition. For the final
partition, you want reasonably large similarity levels and reasonably small distances between the joined clusters.
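Outside Minitab, the same hierarchy can be built with SciPy. The sketch below uses standardized variables, Euclidean distance, and complete linkage as in the example output; the data matrix and the similarity formula 100(1 − d/dmax) are assumptions for illustration.

    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import pdist

    # Hypothetical data: 20 people x 4 variables (gender, height, weight, handedness)
    rng = np.random.default_rng(3)
    X = rng.normal(size=(20, 4))
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # standardize the variables

    d = pdist(Z, metric="euclidean")                     # distances between observations
    merges = linkage(d, method="complete")               # amalgamation steps (complete linkage)
    # Each row of `merges`: the two clusters joined, the distance between them,
    # and the number of observations in the newly formed cluster.
    similarity = 100 * (1 - merges[:, 2] / d.max())      # Minitab-style similarity level (assumed formula)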

Example Output

Standardized Variables, Euclidean Distance, Complete Linkage


Amalgamation Steps
Number
of obs .
Number of Similarity Distance Clusters New in new
Step clusters level level joined cluster cluster
1 19 96.6005 0.16275 13 16 13 2
2 18 95.4642 0.21715 17 20 17 2
3 17 95.2648 0.22669 6 9 6 2
4 16 92.9178 0.33905 17 18 17 3
5 15 90.5296 0.45339 11 15 11 2
6 14 90.3124 0.46378 12 19 12 2
7 13 88.2431 0.56285 2 14 2 2
8 12 88.2431 0.56285 5 8 5 2
9 11 85.9744 0.67146 6 10 6 3
10 10 83.0639 0.81080 7 13 7 3
11 9 83.0639 0.81080 1 3 1 2


12 8 81.4039 0.89027 2 17 2 5
13 7 79.8185 0.96617 6 11 6 5
14 6 78.7534 1.01716 4 12 4 3
15 5 66.2112 1.61760 2 5 2 7
16 4 62.0036 1.81904 1 6 1 7
17 3 41.0474 2.82229 1 4 1 10
18 2 40.1718 2.86421 2 7 2 10
19 1 0.0000 4.78739 1 2 1 20

Interpretation

For the sports data, the amalgamation steps show that:


• The similarity level decreases, first by increments of about 4 or less, then by about 21 (from 62.0036 to 41.0474) at steps 16 and 17
(from 4 clusters to 3).
• The distance between the joined clusters increases, first by about 0.6 or less, then by about 1 (from 1.81904 to 2.82229) at steps 16
and 17 (from 4 clusters to 3).
These facts could indicate that 4 clusters are reasonably sufficient for the final partition, assuming that this grouping makes intuitive sense
for the data.

Cluster Observations
Final Partition

Minitab summarizes each cluster in the final partition by the:


• Number of observations
• Within cluster sum of squares
• Average distance from the observation to the cluster centroid
• Maximum distance of the observation to the cluster centroid.
The Cluster Centroids are the vectors of variable means for the observations in the clusters and are used as cluster midpoints.
The Distances Between Cluster Centroids (computed using the chosen distance measure between observations) show how far the
formed clusters are from each other. These numbers are not very informative by themselves, but you can compare the cluster differences
to see how different the clusters are from each other.

Example Output

Number of clusters: 4
Average Maximum
Within distance distance
Number of cluster sum from from
observations of squares centroid centroid
Cluster1 7 3.25713 0.612540 1.12081
Cluster2 7 2.72247 0.581390 0.95186
Cluster3 3 0.55977 0.398964 0.54907
Cluster4 3 0.37116 0.326533 0.48848

Cluster Centroids
Variable Cluster1 Cluster2 Cluster3 Cluster4 Grand centroid
Gender 0.97468 -0.97468 0.97468 -0.97468 -0.0000000
Height -1.00352 1.01283 -0.37277 0.35105 -0.0000000
Weight -0.90672 0.93927 -0.86797 0.79203 -0.0000000
Handedness 0.63808 0.63808 -1.48885 -1.48885 -0.0000000

Distances Between Cluster Centroids


Cluster1 Cluster2 Cluster3 Cluster4
Cluster1 0.00000 3.35759 2.21882 3.61171
Cluster2 3.35759 0.00000 3.67557 2.23236
Cluster3 2.21882 3.67557 0.00000 2.66074
Cluster4 3.61171 2.23236 2.66074 0.00000

Interpretation

For the sports data, four clusters are evident in the final partition. The within-cluster sums of squares are reasonably small; the sum of
squares is largest for the first cluster (3.25713). You should further analyze your data to see if the grouping makes sense.

Cluster Observations: Topics


Summary
Amalgamation steps
Final partition
Graphs
Dendrogram


Cluster Observations
Graphs − Dendrogram

The dendrogram displays the groups formed by clustering of observations and their similarity levels. You can also display the distance
level on the y-axis. You should analyze the data to see if the classification makes sense.
You can label observations using a column of case labels to help interpret the clusters in the dendrogram.

Example Output

Interpretation

The final partition for the sports data contains four clusters of the 20 people, classified by similarities:

Cluster Observations Similarity

Red 7 Right-handed females

Blue 3 Left-handed females

Green 7 Right-handed males

Orange 3 Left-handed males


The orange cluster has the highest similarity level.

Steps in clustering of observations


The final grouping of objects (also called the final partition) is the grouping of objects that should identify groups whose observations
share common characteristics. The decision about final grouping is also called cutting the dendrogram. The complete dendrogram (tree
diagram) is a graphical depiction of the amalgamation of observations into one cluster. Cutting the dendrogram is akin to drawing a line
across the dendrogram to specify the final grouping.
How do you know where to cut the dendrogram?
• First execute cluster analysis without specifying a final partition. Examine the similarity and distance levels in the Session window
results and in the dendrogram. The pattern of how similarity or distance values change from step to step can help you to choose the
final grouping. The step prior to the values changing abruptly may identify a good point for cutting the dendrogram, if this makes
sense for your data.
• After choosing where you wish to make your partition, rerun the clustering procedure, using either number of clusters or similarity level
to give you either a set number of groups or a similarity level for cutting the dendrogram. Examine the resulting clusters in the final
partition to see if the grouping seems logical.
• Looking at dendrograms for different final groupings can also help you to decide which one makes the most sense for your data.
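In the SciPy sketch above, cutting the dendrogram corresponds to cutting the linkage tree either at a set number of clusters or at a chosen distance level. A minimal sketch, with hypothetical cut values:

    from scipy.cluster.hierarchy import fcluster

    # Final partition with a set number of clusters (here 4), as in the sports example
    labels = fcluster(merges, t=4, criterion="maxclust")

    # Or cut at a chosen distance level instead of specifying a cluster count
    labels_by_distance = fcluster(merges, t=1.9, criterion="distance")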

Standardizing variables
When the variables are in different units, you should standardize all variables to minimize the effect of scale differences. Minitab
standardizes all variables by subtracting the means and dividing by the standard deviation before calculating the distance matrix. When
you standardize variables, the grand centroid is 0 for all clusters.

Distance measures for observations


The distance matrix contains distances between observations. Minitab provides five measures to calculate distance (you should choose a
distance measure according to properties of your data).


• Euclidean distance − a standard mathematical measure of distance (square root of the sum of squared differences).
• Pearson distance − the square root of the sum of squared differences divided by the variances. This distance gives a standardized measure of
distance.
• Manhattan distance − the sum of absolute differences, so that outliers receive less weight than they would if the Euclidean method
were used.
• Squared Euclidean and squared Pearson distances − the square of the Euclidean and Pearson distances, respectively. Therefore, the
distances that are large under the Euclidean and Pearson methods are even larger under the squared Euclidean and squared
Pearson distances.
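A minimal NumPy sketch of these distance measures for a pair of observations; the observation vectors and variances are hypothetical.

    import numpy as np

    # Two hypothetical observation vectors and the per-variable variances used by the Pearson distance
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 0.5, 3.5])
    var = np.array([1.5, 0.8, 2.0])

    euclidean = np.sqrt(((x - y) ** 2).sum())
    pearson = np.sqrt((((x - y) ** 2) / var).sum())
    manhattan = np.abs(x - y).sum()
    squared_euclidean = ((x - y) ** 2).sum()
    squared_pearson = (((x - y) ** 2) / var).sum()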

Linkage methods for observations


A linkage rule is necessary for calculating inter-cluster distances when a cluster has multiple observations.
• Single linkage or "nearest neighbor" − the distance between two clusters is the minimum distance between an observation in one
cluster and an observation in the other cluster. Single linkage is a good choice when clusters are clearly separated.
• Average linkage − the distance between two clusters is the mean distance between an observation in one cluster and an observation
in the other cluster.
• Centroid linkage − the distance between two clusters is the distance between the cluster centroids.
• Complete linkage or "furthest neighbor" − the distance between two clusters is the maximum distance between an observation in one
cluster and an observation in the other cluster. This method ensures that all observations in a cluster are within a maximum distance
and tends to produce clusters with similar diameters. The results can be sensitive to outliers.
• Median linkage − the distance between two clusters is the median distance between an observation in one cluster and an observation
in the other cluster. This technique uses the median rather than the mean, thus dampening the influence of outliers.
• McQuitty's linkage − when two clusters are joined, the distance of the new cluster to any other cluster is calculated as the average of
the distances of the soon-to-be-joined clusters to the other cluster. For example, if clusters 1 and 3 are to be joined into a new cluster,
say 1*, then the distance from 1* to cluster 4 is the average of the distances from 1 to 4 and 3 to 4.
• Ward's linkage − the distance between two clusters is the sum of squared deviations from points to centroids. The objective of Ward's
linkage is to minimize the within-cluster sum of squares. It tends to produce clusters with similar numbers of observations, but it is
sensitive to outliers. In Ward's linkage, the distance between two clusters can be larger than dmax, the maximum value in the original
distance matrix. If this happens, the similarity will be negative.
If you choose Average, Centroid, Median, or Ward as the linkage method, you should use one of the squared distance measures.

Cluster Variables
Summary

Use the cluster variables procedure to classify variables into groups when the groups are initially not known. The primary reason to
cluster variables is to reduce the number of variables. Cluster variables uses a hierarchical clustering procedure.
For example, you can:
• Record information about the characteristics of recreation parks in the United States, then group the similar variables to reduce the
data.
• Conduct a study to determine the effects of a change in environment on blood pressure. You can record different physical
measurements of the subjects, analyze the variables for possible similar characteristics, and combine the variables to reduce the
data.
You can perform the clustering of variables analysis when:
• You want to sort variables into two or more categories
• You do not have missing data

Data Description

You conduct a study to determine the effects of the number of media outlets, the number of universities, and the literacy rate on the college
admissions of the population. For 10 cities around the world, you count the number of newspaper copies, radios, and television sets per 1,000
people. You also determine the literacy rate and whether a university is located in each city. You want to reduce the number of variables
by combining variables based on similar characteristics.

Cluster Variables
Amalgamation Steps

At each amalgamation step, two clusters of variables are joined until only one cluster remains. The amalgamation table shows the clusters
joined, their similarity level, the distance between them, and other characteristics of each newly formed cluster. In Minitab, your choice of
linkage method and distance measure will greatly influence the clustering outcome.
You should look at the similarity level and distance between the joined clusters to choose the number of clusters for the final partition of
data. The step prior to where the values change abruptly may determine the number of clusters for the final partition. For the final
partition, you want reasonably large similarity levels and reasonably small distances between the joined clusters.

Example Output


Correlation Coefficient Distance, Average Linkage


Amalgamation Steps
Number
of obs.
Number of Similarity Distance Clusters New in new
Step clusters level level joined cluster cluster
1 4 93.9666 0.120669 2 3 2 2
2 3 93.1548 0.136904 4 5 4 2
3 2 87.3150 0.253700 1 4 1 3
4 1 79.8113 0.403775 1 2 1 5

Interpretation

For the college admissions data, the amalgamation steps show that the similarity level changes first from 93.97 to 93.15, then changes
abruptly from 93.1548 to 87.3150 at steps 2 and 3 (from 3 clusters to 2). This change could indicate that 3 clusters are reasonably
sufficient for the final partition, assuming that this grouping makes sense for the data.

Cluster Variables
Final Partition

If you request a final partition, you receive a list of the variables in each cluster.

Example Output

Final Partition

Cluster 1
Newspaper
Cluster 2
Radios TV Sets
Cluster 3
Literacy Rate University

Interpretation

For the college admissions data, three clusters are formed:


• Number of newspaper copies per 1,000 people
• Number of radios and television sets
• Literacy level and whether a university is located in the city
This grouping seems reasonable.

Cluster Variables: Topics


Summary
Amalgamation steps
Final partition
Graphs
Dendrogram
Cluster Variables
Graphs − Dendrogram

The dendrogram displays the groups formed by clustering of variables, and their similarity levels. You can also display distance levels on
the y-axis. You should analyze your data to see if the classification makes sense.

Example Output


Interpretation

For the college admissions data, the final partition contains three clusters classified by similarities:

Cluster Variables Similarity

Red 1 Number of newspaper copies

Blue 2 Literacy rate and presence of a university

Green 2 Number of radios and television sets

Steps in clustering of variables


The final grouping of clusters (also called the final partition) is the grouping of clusters that should identify groups whose variables
share common characteristics. The decision about the final grouping is also called cutting the dendrogram. The complete dendrogram (tree
diagram) is a graphical depiction of the amalgamation of variables into one cluster. Cutting the dendrogram is akin to drawing a line
across the dendrogram to specify the final grouping.
How do you know where to cut the dendrogram?
• You might first execute cluster analysis without specifying a final partition. Examine the similarity and distance levels in the Session
window results and in the dendrogram. The pattern of how similarity or distance values change from step to step can help you to
choose the final grouping. The step where the values change abruptly may identify a good point for cutting the dendrogram, if this
makes sense for your data.
• After choosing where you wish to make your partition, rerun the clustering procedure, using either Number of clusters or Similarity
level to give you either a set number of groups or a similarity level for cutting the dendrogram. Examine the resulting clusters in the
final partition to see if the grouping seems logical.
• Looking at dendrograms for different final groupings can also help you to decide which one makes the most sense for your data.
• If the purpose behind the clustering of variables is data reduction, you may decide to use your knowledge of the data to a greater
degree in determining the final partition.

Distance measures for variables


The distance matrix contains distances between variables. Minitab provides two methods to measure distance (you should choose a
distance measure according to properties of your data).
• Correlations for distance measures − The (i,j) entry of the distance matrix is d(i,j) = 1 − ρ(i,j), where ρ(i,j) is the correlation between variables i
and j. The correlation method gives distances between 0 and 1 for positive correlations, and between 1 and 2 for negative correlations. If
it makes sense to consider negatively correlated data to be farther apart than positively correlated data, use the correlation method.
• Absolute correlations for distance measures − The (i,j) entry of the distance matrix is d(i,j) = 1 − |ρ(i,j)|, where ρ(i,j) is the correlation between
variables i and j. The absolute correlation method gives distances between 0 and 1. If you think that the strength of the relationship is
important in considering distance and not the sign, then use the absolute correlation method.
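A minimal SciPy/NumPy sketch of clustering variables with the correlation distance and average linkage, as in the example output; the data matrix is hypothetical.

    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import squareform

    # Hypothetical data: 10 cities x 5 variables (newspapers, radios, TVs, literacy, university)
    rng = np.random.default_rng(4)
    X = rng.normal(size=(10, 5))

    rho = np.corrcoef(X, rowvar=False)         # correlations between the variables
    dist = 1 - rho                             # correlation distance: d(i,j) = 1 - rho(i,j)
    abs_dist = 1 - np.abs(rho)                 # absolute correlation distance

    np.fill_diagonal(dist, 0.0)                # condensed form requires zeros on the diagonal
    merges = linkage(squareform(dist, checks=False), method="average")   # average linkage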

Linkage methods for variables


A linkage rule is necessary for calculating inter-cluster distances when there are multiple variables in a cluster.
• Single linkage or "nearest neighbor" − the distance between two clusters is the minimum distance between a variable in one cluster
and a variable in the other cluster. Single linkage is a good choice when clusters are clearly separated.
• Average linkage − the distance between two clusters is the mean distance between a variable in one cluster and a variable in the
other cluster.
• Centroid linkage − the distance between two clusters is the distance between the cluster centroids .


• Complete linkage or "furthest neighbor" − the distance between two clusters is the maximum distance between a variable in one
cluster and a variable in the other cluster. This method ensures that all variables in a cluster are within a maximum distance and tends
to produce clusters with similar diameters. The results can be sensitive to outliers.
• Median linkage − the distance between two clusters is the median distance between a variable in one cluster and a variable in the
other cluster. This technique uses the median rather than the mean, thus downweighting the influence of outliers.
• McQuitty's linkage − when two clusters are joined, the distance of the new cluster to any other cluster is calculated as the average of
the distances of the soon-to-be-joined clusters to the other cluster. For example, if clusters 1 and 3 are to be joined into a new cluster,
say 1*, then the distance from 1* to cluster 4 is the average of the distances from 1 to 4 and 3 to 4.
• Ward's linkage − the distance between two clusters is the sum of squared deviations from points to centroids. The objective of Ward's
linkage is to minimize the within-cluster sum of squares. It tends to produce clusters with similar numbers of variables, but it is
sensitive to outliers. In Ward's linkage, the distance between two clusters can be larger than dmax, the maximum value in the original
distance matrix. If this happens, the similarity will be negative.

Cluster K-Means
Summary

Use Cluster K-Means to cluster observations into groups when the groups are initially unknown. This procedure uses non-hierarchical
clustering of observations. K-means clustering works best when sufficient information is available to make good starting cluster
designations.
For example, you can use Cluster K-Means to:
• Group people based on physical fitness level when you suspect that people fall into three categories
• Cluster population data with known characteristics from a large database
Use Cluster K-Means when you have:
• Good starting points for clusters
• Multiple measurements on items or subjects
• No missing data

Data Description

As a business analyst, you want to classify 22 successful small-to-medium size manufacturing companies into meaningful groups for
future analyses.
You suspect that the data fall into three groups:
• Established companies − in existence for over 15 years, over 130 clients, a strong return on investments, and strong sales
• Mid-growth companies − in existence for over 10 years, over 100 clients, a good return on investments, and good sales
• Young companies − in existence for less than 10 years, less than 100 clients, a good return on investments, and good sales

Cluster K-Means: Topics


Summary
Final partition
Clusters and centroids
Cluster K-Means
Final Partition − Clusters and Centroids

Minitab summarizes each cluster in the final partition by:


• Number of observations
• Within cluster sum of squares
• Average distance from observation to the cluster centroid
• Maximum distance of the observation to the cluster centroid
In general, a cluster with a small sum of squares is more compact than one with a large sum of squares.
The cluster centroids are the vectors of variable means for the observations in the clusters and are used as cluster midpoints.
The distances between cluster centroids show the distances between the formed clusters. These numbers are not very informative by
themselves, but you can compare the cluster distances to see how different the clusters are from each other. The nearest cluster is the
one which has the smallest Euclidean distance between the observation and the centroid of the cluster.

Example Output

Standardized Variables

Final Partition

Number of clusters: 3

            Number of      Within cluster   Average distance   Maximum distance
            observations   sum of squares   from centroid      from centroid
Cluster1     4              1.593            0.578              0.884
Cluster2     8              8.736            0.964              1.656
Cluster3    10             12.921            1.093              1.463

Cluster Centroids

Grand
Variable Cluster1 Cluster2 Cluster3 centroid
Clients 1.2318 0.5225 -0.9108 0.0000
Rate of return 1.2942 0.2217 -0.6950 0.0000
Sales 1.1866 0.5157 -0.8872 0.0000
Years 1.2030 0.5479 -0.9195 0.0000

Distances Between Cluster Centroids

Cluster1 Cluster2 Cluster3


Cluster1 0.0000 1.5915 4.1658
Cluster2 1.5915 0.0000 2.6488
Cluster3 4.1658 2.6488 0.0000

Interpretation

For the manufacturing company data, K-means clustering classified the 22 companies as 4 established companies, 8 mid-growth
companies, and 10 young companies. Cluster 1 (established companies) has the smallest within-cluster sum of squares (1.593), probably
because there are only 4 observations in this cluster.

Initializing Cluster K-Means


K-means clustering begins with a grouping of observations into a predefined number of clusters.
1 Minitab evaluates each observation, moving it into the nearest cluster. The nearest cluster is the one that has the smallest Euclidean
distance between the observation and the centroid of the cluster.
2 When a cluster changes, by losing or gaining an observation, Minitab recalculates the cluster centroid.
3 This process is repeated until no more observations can be moved into a different cluster. At this point, all observations are in their
nearest cluster according to the criterion listed above.
Unlike hierarchical clustering of observations, K-means clustering can split two observations into separate clusters after they have
been grouped together.

K-means procedures work best when you provide good starting points for clusters.
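
One way to supply starting points in practice is to pass initial centroids explicitly. The sketch below uses scikit-learn in Python as an illustration (Minitab's algorithm may differ in details); the data and the seed rows chosen as starting centroids are hypothetical.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical measurements on 22 companies (for example Clients, Rate of return, Sales, Years).
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(22, 4)))

# Good starting points: one representative company from each suspected group.
init_centroids = X[[0, 5, 12]]         # assumed row indices of typical companies

km = KMeans(n_clusters=3, init=init_centroids, n_init=1).fit(X)
print(km.labels_)                       # final cluster membership
print(km.cluster_centers_)              # cluster centroids (vectors of variable means)
print(km.inertia_)                      # total within-cluster sum of squares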

Outliers
The presence of outliers (unusually large or small values) in your data can affect the clustering of observations. The clusters are typically
larger when outliers are not removed, and the resulting solution is less distinct. If you have a large data set, consider deleting outliers
from the Cluster K-Means analysis.

Discriminant Analysis
Summary

Use Discriminant Analysis to classify observations into two or more groups if you have a sample with known groups. You can also use a
discriminant analysis to investigate how variables contribute to group separation and to place objects or individuals into defined groups;
for example, to classify three species of birds based on wingspan, color, and diet.
Discriminant analysis is similar to multiple regression because both techniques use two or more predictor variables and a single response
variable. However, in discriminant analysis, the response variable is categorical. Examples of categorical variables are gender, color, and
operational status.
You can perform a linear or quadratic discriminant analysis:
• Use a linear analysis when you assume the covariance matrices are equal for all groups.
• Use a quadratic analysis when you assume the covariance matrices are not equal for all groups.
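
As a rough illustration of the two options (in Python with scikit-learn, not Minitab), the sketch below fits both a linear and a quadratic rule to hypothetical data.

import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Hypothetical scores for 3 groups of 60 students, with different group means.
rng = np.random.default_rng(2)
y = np.repeat([1, 2, 3], 60)
X = rng.normal(size=(180, 2)) + np.repeat([[4.0, 2.0], [2.0, 1.0], [0.0, 0.0]], 60, axis=0)

lda = LinearDiscriminantAnalysis().fit(X, y)      # assumes equal covariance matrices
qda = QuadraticDiscriminantAnalysis().fit(X, y)   # allows a separate covariance matrix per group

print(lda.predict(X[:3]))
print(qda.predict(X[:3]))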

Data Description

High school administrators assign each student to one of three educational tracks:
• 1 − for above average students who can learn independently and have strong math and language skills
• 2 − for average students who learn best with a moderate amount of teacher attention and have average math and language skills
• 3 − for students who require substantial interaction with the teacher and have weak math and language skills
An intelligence test and motivation assessment were administered to 60 students from each track. School officials want to know if the
students' intelligence test and motivation assessment scores accurately classify student placement.


Discriminant Analysis : Topics


Summary
Classification
Without cross-validation
With cross-validation
Distance and discriminant functions
Squared distance between groups
Linear discriminant function for groups
Group and pooled statistics
Descriptive statistics
Covariance matrices
Summary
Summary of classified observations
Discriminant Analysis
Classification − Without Cross-Validation

In the summary of the classification results, Minitab classifies the observations by:
• The group in which they are placed
• The group in which they are predicted to be placed
Minitab provides the following statistics for each true group :
• Total N − the number of observations in each true group
• N Correct − the number of observations correctly placed in each true group
• Proportion − the proportion of observations correctly placed in each true group
N, N Correct, and Proportion Correct are also shown for all groups.
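
The per-group and overall statistics follow directly from the counts in the classification table. As a check, the short Python sketch below recomputes them from the counts shown in the example output below.

import numpy as np

# Counts copied from the example output below (rows: put-into group, columns: true group).
put_into = np.array([[59,  5,  0],
                     [ 1, 53,  3],
                     [ 0,  2, 57]])

total_n = put_into.sum(axis=0)                      # Total N for each true group
n_correct = np.diag(put_into)                       # N Correct for each true group
print(total_n)                                      # [60 60 60]
print(np.round(n_correct / total_n, 3))             # [0.983 0.883 0.95 ]
print(round(n_correct.sum() / total_n.sum(), 3))    # 0.939 overall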

Example Output

Linear Method for Response: Track

Predictors: IQ Score, Motivation

Group 1 2 3
Count 60 60 60

Summary of classification
True Group
Put into Group 1 2 3
1 59 5 0
2 1 53 3
3 0 2 57
Total N 60 60 60
N correct 59 53 57
Proportion 0.983 0.883 0.950

N = 180 N Correct = 169 Proportion Correct = 0.939

Interpretation

For the education data:


• Group 1 has a correct placement rate of 98.3% (0.983)
• Group 2 has a correct placement rate of 88.3% (0.883)
• Group 3 has a correct placement rate of 95.0% (0.950)
Classifying group 2 students presents the most problems. Overall, 169 out of 180 students, or 93.9% (0.939), are correctly placed.

Discriminant Analysis
Classification − With Cross-Validation

The discriminant analysis output includes a summary of the classification results with cross-validation. Cross-validation compensates for
an optimistic error rate. Therefore, the cross-validation classification results are more conservative than classification without
cross-validation.
As in the regular classification situation, observations are classified by:
• The group in which they are placed
• The group in which they are predicted to be placed
Minitab provides the following statistics for each true group :


• Total N − the number of observations in each true group


• N Correct − the number of observations correctly placed in each true group
• Proportion − the proportion of observations correctly placed in each true group
N, N Correct, and Proportion Correct are also shown for all groups.
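
A leave-one-out cross-validated classification table can be sketched in Python as follows; this is an illustration with hypothetical data, not Minitab's implementation.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import confusion_matrix

# Hypothetical scores for 3 groups of 60 students.
rng = np.random.default_rng(3)
y = np.repeat([1, 2, 3], 60)
X = rng.normal(size=(180, 2)) + np.repeat([[4.0, 2.0], [2.0, 1.0], [0.0, 0.0]], 60, axis=0)

# Each student is held out in turn, the rule is refit on the other 179 students,
# and the held-out student is then classified with that refit rule.
xval_pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
cm = confusion_matrix(y, xval_pred, labels=[1, 2, 3])
print(cm.T)                                   # rows: put-into group, columns: true group
print(round(np.diag(cm).sum() / len(y), 3))   # overall proportion correct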

Example Output

Summary of Classification with Cross-validation

True Group
Put into Group 1 2 3
1 59 5 0
2 1 52 3
3 0 3 57
Total N 60 60 60
N correct 59 52 57
Proportion 0.983 0.867 0.950

N = 180 N Correct = 168 Proportion Correct = 0.933

Interpretation

For the education data:


• Group 1 has a correct placement rate of 98.3% (0.983)
• Group 2 has a correct placement rate of 86.7% (0.867)
• Group 3 has a correct placement rate of 95.0% (0.950)
Classifying group 2 students presents the most problems.
Overall, 168 out of 180 students, or 93.3% (0.933), are correctly placed.

Discriminant Analysis
Distance and Discriminant Functions − Squared Distance Between Groups

An observation is classified into a group if the squared distance (also called the Mahalanobis distance) of the observation to the group
center (mean) is the minimum. For linear discriminant analysis, an assumption is made that covariance matrices are equal for all groups.
The table displays the distance between groups. Linear discriminant analysis has the property of symmetric squared distance, meaning
that the linear discriminant function of group i evaluated with the mean of group j is equal to the linear discriminant function of group j
evaluated with the mean of group i.
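
As a rough check in Python (an illustration, not Minitab's code), the squared Mahalanobis distance between two group means can be computed from the pooled covariance matrix and the group means reported later in this section.

import numpy as np

# Pooled covariance matrix and group means copied from the output later in this section.
pooled_cov = np.array([[65.759,  4.730],
                       [ 4.730,  8.964]])
mean_g1 = np.array([127.37, 53.600])    # IQ Score, Motivation means for group 1
mean_g3 = np.array([ 78.25, 40.150])    # group 3

# Squared Mahalanobis distance between the two group means (symmetric in the groups).
diff = mean_g1 - mean_g3
d2 = float(diff @ np.linalg.solve(pooled_cov, diff))
print(round(d2, 1))   # about 48.1, matching the 48.0911 in the table below up to rounding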

Example Output

Squared Distance Between Groups

1 2 3
1 0.0000 12.9853 48.0911
2 12.9853 0.0000 11.3197
3 48.0911 11.3197 0.0000

Interpretation

For the education data, the greatest distance is between groups 1 and 3 (48.0911). The difference between groups 1 and 2 is 12.9853,
and the difference between groups 2 and 3 is 11.3197.

Discriminant Analysis
Distance and Discriminant Functions − Linear Discriminant Function for Groups

The linear discriminant scores for each group correspond to the regression coefficients in multiple regression analyses. In general, you
can fit a linear equation of the type:
Group = b0 + b1x1 + b2x2 + ... + bmxm
where:
• b0 is the constant
• b1 through bm are the estimated regression coefficients
• x1... xm are the predictors
The linear discriminant functions discriminate between groups. For example, when you have three groups, Minitab estimates a function
for discriminating between:
• Group 1 and groups 2 and 3
• Group 2 and groups 1 and 3


• Group 3 and groups 1 and 2


The groups with the largest linear discriminant function, or regression coefficients, contribute most to the classification of observations.
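
Under the equal-covariance assumption, the coefficients and constant for each group can be reconstructed from the pooled covariance matrix and the group means. The Python sketch below (an illustration, not Minitab's code) reproduces the group 1 column of the table below from values reported later in this section.

import numpy as np

# Pooled covariance matrix and group 1 means, copied from the output later in this section.
pooled_cov = np.array([[65.759,  4.730],
                       [ 4.730,  8.964]])
mean_g1 = np.array([127.37, 53.600])    # IQ Score, Motivation

# Coefficients: inverse(pooled covariance) * group mean.
# Constant: -0.5 * mean' * inverse(pooled covariance) * mean
# (a ln(prior probability) term is added when prior probabilities are specified).
coef = np.linalg.solve(pooled_cov, mean_g1)
const = -0.5 * mean_g1 @ coef
print(np.round(coef, 2))   # approximately [1.57 5.15], as in the table below
print(round(const, 2))     # approximately -237.85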

Example Output

Linear Discriminant Function for Groups

1 2 3
Constant -237.85 -170.58 -115.65
IQ Score 1.57 1.19 0.90
Motivation 5.15 4.66 4.00

Interpretation

For the education data, group 1 has the largest linear discriminant function coefficients (for example, 1.57 for IQ Score), indicating that
group 1 contributes more than group 2 or 3 to the classification of group membership.

Discriminant Analysis
Group and Pooled Statistics − Descriptive Statistics

The descriptive statistic summary displays the pooled mean and group means, and the pooled standard deviation and group standard
deviations.

Example Output

Pooled Mean and Means for Group

Variable      Pooled Mean   Group 1   Group 2   Group 3
IQ Score           102.08    127.37    100.62     78.25
Motivation         47.056    53.600    47.417    40.150

Pooled StDev and StDevs for Group

Variable      Pooled StDev  Group 1   Group 2   Group 3
IQ Score             8.109    8.308     9.266     6.511
Motivation           2.994    2.409     3.243     3.251

Interpretation

For the education intelligence test data:


• Group 1 has the highest mean score (127.37)
• Group 2 has the second highest mean (100.62)
• Group 3 has the lowest mean (78.25)
The overall intelligence test mean for all groups was 102.08.
The pooled standard deviation for the intelligence test for all groups is 8.109. The standard deviations for groups 1 through 3 are: 8.308,
9.266, and 6.511.
For the education motivation assessment data:
• Group 1 has the highest mean score (53.600)
• Group 2 has the second highest mean (47.417)
• Group 3 has the lowest mean (40.150)
The overall motivation mean for all groups was 47.056.
The pooled standard deviation for the motivation assessment for all groups is 2.994. The standard deviations for groups 1 through 3 are:
2.409, 3.243, and 3.251.
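
As a quick check of the pooled values, the pooled standard deviation is the square root of the pooled variance, sum((n_i − 1) s_i²) / (N − k); with 60 students per group this reduces to the average of the group variances. A short Python sketch using the intelligence test values above:

import numpy as np

# Group standard deviations for the intelligence test, copied from the output above.
group_sd = np.array([8.308, 9.266, 6.511])
n = np.array([60, 60, 60])

# Pooled variance = sum((n_i - 1) * s_i^2) / (N - k); pooled stdev is its square root.
pooled_var = ((n - 1) * group_sd**2).sum() / (n.sum() - len(n))
print(round(float(np.sqrt(pooled_var)), 3))   # approximately 8.109, matching the output above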

Discriminant Analysis
Group and Pooled Statistics − Covariance Matrices

The covariance tables display the pooled covariance matrix and the covariance matrices for each group. Covariance is a measure of
linear association among observations.

Example Output

Pooled Covariance Matrix

IQ Score Motivation
IQ Score 65.759
Motivation 4.730 8.964

Covariance matrix for Group 1


IQ Score Motivation
IQ Score 69.0158
Motivation 0.6915 5.8034

Covariance matrix for Group 2

IQ Score Motivation
IQ Score 85.8675
Motivation 4.6031 10.5184

Covariance matrix for Group 3

IQ Score Motivation
IQ Score 42.3941
Motivation 8.8941 10.5703

Interpretation

For the education data, the IQ Score variances (the diagonal entries of the covariance matrices) for groups 1 through 3 are 69.0158, 85.8675, and 42.3941. The pooled variance is 65.759.

Discriminant Analysis
Summary of Classified Observations

The summary of classified observations provides the following information:


• Obs − observation number for each observation (the symbols ** denote a misclassified observation)
• True Group − actual group in which the observation is classified
• Pred Group − predicted group for each observation
• X-val Group − predicted group for each observation based on the cross-validation procedure
• Squared Distance Pred and Squared Distance X-val − the squared distance from the observation to each group's mean, based on results without and with
cross-validation
• Probability Pred and Probability X-val − predicted probabilities of a student being placed in each group based on results without and
with cross-validation
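
With equal prior probabilities, the probability columns are proportional to exp(−0.5 × squared distance) for each group. The short Python sketch below (an illustration, assuming equal priors) reproduces the Pred probabilities reported for observation 4 in the output below.

import numpy as np

# Squared distances (Pred column) for observation 4, copied from the output below.
d2 = np.array([3.524, 3.028, 25.579])

# Posterior probability of each group, assuming equal prior probabilities.
w = np.exp(-0.5 * d2)
print(np.round(w / w.sum(), 2))   # approximately [0.44 0.56 0.  ], as in the table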

Example Output

Summary of Classified Observations

     True   Pred   X-val          Squared Distance       Probability
Obs  Group  Group  Group  Group   Pred       X-val       Pred   X-val
1 1 1 1 1 1.257 1.302 0.99 0.99
2 10.155 10.102 0.01 0.01
3 42.782 42.564 0.00 0.00
2 1 1 1 1 1.098 1.136 1.00 1.00
2 21.379 21.388 0.00 0.00
3 63.521 63.554 0.00 0.00
3 1 1 1 1 1.257 1.302 0.99 0.99
2 10.155 10.102 0.01 0.01
3 42.782 42.564 0.00 0.00
4** 1 2 2 1 3.524 3.699 0.44 0.42
2 3.028 3.072 0.56 0.58
3 25.579 25.960 0.00 0.00

--------------------------------------------------------------

178 3 3 3 1 40.644 40.465 0.00 0.00
2 9.234 9.182 0.02 0.02
3 1.531 1.589 0.98 0.98
179 3 3 3 1 39.0226 38.9067 0.00 0.00
2 7.3604 7.3357 0.03 0.03
3 0.5249 0.5414 0.97 0.97
180 3 3 3 1 38.1331 38.0429 0.00 0.00
2 6.7162 6.7011 0.05 0.05
3 0.6069 0.6263 0.95 0.95

Note: Table truncated for space.

Interpretation

For the education data, the summary of classified observations indicates the group in which each observation is predicted to belong. For
example, school officials placed student 4 into group 1, but the analysis predicts that this student belongs in group 2, so the observation is flagged (**) as misclassified.


Linear versus quadratic discriminant analysis


You can perform linear or quadratic discriminant analyses:
• Use a linear analysis when you assume the covariance matrices are equal for all groups.
• Use a quadratic analysis when you assume the covariance matrices are not equal for all groups.
For linear discriminant analysis, an observation is classified into a group if the squared distance (also called the Mahalanobis distance) of
the observation to the group center (mean) is the minimum. An assumption is made that covariance matrices are equal for all groups. The
unique part of the squared distance formula for each group is called the linear discriminant function. For any observation, the group with
the smallest squared distance has the largest linear discriminant function and the observation is then classified into this group.
Linear discriminant analysis has the property of symmetric squared distance: the linear discriminant function of group i evaluated with the
mean of group j is equal to the linear discriminant function of group j evaluated with the mean of group i.
With a quadratic discriminant analysis, no assumption that the groups have equal covariance matrices exists. As with a linear discriminant
analysis, an observation is classified into the group that has the smallest squared distance. However, the squared distance does not
simplify into a linear function, hence the name quadratic discriminant analysis.
Unlike linear distance, quadratic distance is not symmetric. In other words, the quadratic discriminant function of group i evaluated with
the mean of group j is not equal to the quadratic discriminant function of group j evaluated with the mean of group i. In the Minitab results,
quadratic distance is called the generalized squared distance. If the determinant of the sample group covariance matrix is less than one,
the generalized squared distance can be negative.
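
A minimal Python sketch of the generalized squared distance used in quadratic discriminant analysis (prior probabilities are omitted for simplicity; when they are specified, a −2 ln(prior) term is included as well). The example point x is hypothetical; the group 1 covariance values are those shown earlier in this section.

import numpy as np

def generalized_squared_distance(x, mean_i, cov_i):
    # (x - m_i)' * inverse(S_i) * (x - m_i) + ln(det(S_i)).
    # The ln(det(S_i)) term is why the value can be negative when the determinant
    # of the sample group covariance matrix is less than one.
    diff = np.asarray(x, dtype=float) - np.asarray(mean_i, dtype=float)
    return float(diff @ np.linalg.solve(cov_i, diff) + np.log(np.linalg.det(cov_i)))

# Example with the group 1 covariance matrix and means shown earlier.
cov_g1 = np.array([[69.0158, 0.6915], [0.6915, 5.8034]])
print(round(generalized_squared_distance([120, 50], [127.37, 53.600], cov_g1), 2))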

Cross-validation
Cross-validation is used to compensate for an optimistic apparent error rate. The apparent error rate is the percent of misclassified
observations. This number tends to be optimistic because the classified data are the same data used to build the classification function.
The cross-validation routine works by omitting each observation one at a time, recalculating the classification function using the remaining
data, and then classifying the omitted observation. Computation takes approximately four times longer with this procedure. When
cross-validation is performed, Minitab displays an additional summary table.
As an alternative to cross-validation, you can calculate a more realistic error rate by splitting your data into two parts. Use one part to
create the discriminant function, and the other part as a validation set. Predict group membership for the validation set and calculate the
error rate as the percent of these data that are misclassified.
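
A sketch of this split-sample approach in Python (illustrative only, with hypothetical data): fit the rule on one half and estimate the error rate on the held-out half.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Hypothetical scores for 3 groups of 60 students.
rng = np.random.default_rng(4)
y = np.repeat([1, 2, 3], 60)
X = rng.normal(size=(180, 2)) + np.repeat([[4.0, 2.0], [2.0, 1.0], [0.0, 0.0]], 60, axis=0)

# Use one part to build the discriminant function and the other as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, stratify=y, random_state=1)
rule = LinearDiscriminantAnalysis().fit(X_train, y_train)

error_rate = np.mean(rule.predict(X_val) != y_val)   # proportion of validation cases misclassified
print(round(float(error_rate), 3))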

Predicting with discriminant analysis


You can use a discriminant analysis to calculate the discriminant functions from observations with known groups. When new observations
are made, you can use the discriminant function to predict which group they belong to.
If the explanatory variables do not follow a multivariate normal distribution with equal covariance matrices for each level of the response,
standard discriminant analysis procedures are statistically inconsistent. In such cases, logistic regression gives more accurate
results.

Prior probabilities
Sometimes you know the probability of an observation belonging to a group prior to conducting a discriminant analysis. For example, if
you are classifying the buyers of a particular car, you may already know that 60% of purchasers are male and 40% are female. If you
know or can estimate these probabilities, a discriminant analysis can use these prior probabilities in calculating the posterior probabilities
(the probabilities of assigning observations to groups given the data).
With the assumption that the data have a normal distribution, the linear discriminant function is increased by ln(pi), where pi is the prior
probability of group i. Because observations are assigned to groups according to the smallest generalized distance, or equivalently the
largest linear discriminant function, the effect is to increase the posterior probabilities for a group with a high prior probability.
Specifying prior probabilities can greatly affect the accuracy of your results. Investigate whether the unequal proportions across groups
reflect a real difference in the true population or whether the difference is a result of sampling error.
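
In terms of the earlier linear discriminant function table, specifying priors amounts to adding ln(p_i) to each group's constant. The Python sketch below illustrates this with the constants from that table and purely hypothetical prior probabilities.

import numpy as np

# Constants from the linear discriminant function table shown earlier.
constants = np.array([-237.85, -170.58, -115.65])

# Hypothetical prior probabilities for groups 1, 2, and 3.
priors = np.array([0.2, 0.3, 0.5])

# Each group's constant is increased by ln(p_i), which raises the posterior
# probability of groups with a high prior probability.
adjusted = constants + np.log(priors)
print(np.round(adjusted, 2))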
