
Minitab StatGuide: Multivariate



Principal Components
Summary

Use Principal Components to form a smaller number of uncorrelated variables. The goal of principal components analysis is to explain the
maximum amount of variance with the fewest principal components.
For example:
• You record information on 10 socioeconomic variables and you want to reduce the variables into a smaller number of components to
more easily analyze the data.
• You want to analyze the customer responses to several attributes of a new product in order to form a smaller number of uncorrelated
variables that are easier to interpret.
Principal components analysis is commonly used as one step in a series of analyses. For example, you can use Principal Components to
reduce your data and avoid multicollinearity, or when you have too many predictors relative to the number of observations.
A principal components analysis often uncovers unsuspected relationships, allowing you to interpret the data in a new way.
You can perform principal components analysis when you have one sample and several variables are measured on each sampling unit.

Data Description

A bank requires eight pieces of information when applying for a loan: income, education level, age, length of time at current residence,
length of time with current employer, savings, debt, and number of credit cards. A bank administrator wants to analyze this information for
reporting purposes.

Principal Components
Eigenanalysis − Eigenvalues

Use the eigenvalue results to determine the number of principal components.


One way to determine the number of principal components is based on the size of the eigenvalues. In an eigenanalysis of the correlation
matrix, the eigenvalues equal the variances of the principal components. According to the Kaiser criterion, retain
principal components with eigenvalues greater than 1.
You can also decide on the number of principal components based on the amount of explained variance. For example, you may retain
components that cumulatively explain 90% of the variance. Another technique is to analyze a scree plot. Use any one or combination of
these techniques to determine the number of principal components.
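For readers working outside Minitab, the following minimal Python/NumPy sketch illustrates the same eigenanalysis of a correlation matrix and two of these selection rules. The data matrix and the 90% threshold are hypothetical, used only for illustration.

    import numpy as np

    # Hypothetical data matrix: rows are loan applicants, columns are the 8 variables
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 8))

    R = np.corrcoef(X, rowvar=False)           # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(R)       # eigenanalysis of a symmetric matrix
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues from largest to smallest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    proportion = eigvals / eigvals.sum()       # proportion of variance per component
    cumulative = proportion.cumsum()

    n_kaiser = int(np.sum(eigvals > 1))               # Kaiser criterion: eigenvalues > 1
    n_90 = int(np.argmax(cumulative >= 0.90)) + 1     # components needed for 90% of the variance
    print(n_kaiser, n_90)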

Example Output

Eigenanalysis of the Correlation Matrix

Eigenvalue 3.5476 2.1320 1.0447 0.5315 0.4112


Proportion 0.443 0.266 0.131 0.066 0.051
Cumulative 0.443 0.710 0.841 0.907 0.958

Eigenvalue 0.1665 0.1254 0.0411


Proportion 0.021 0.016 0.005
Cumulative 0.979 0.995 1.000

Interpretation

For the loan applicant data:


• The first principal component has variance 3.5476 (equal to the largest eigenvalue) and accounts for 0.443 (44.3%) of the total
variation in the data.
• The second principal component (variance 2.1320) accounts for 0.266 (26.6%) of the total data variation.
• The third principal component (variance 1.0447) accounts for 0.131 (13.1%) of the total data variation.
The first three principal components, whose variances (eigenvalues) are greater than 1, together represent 0.841 (84.1%) of the total variability,
suggesting that three principal components adequately explain the variation in the data.

Principal Components
Eigenanalysis − Coefficients

The principal components are the linear combinations of the original variables that account for the variance in the data. The maximum
number of components extracted always equals the number of variables. The eigenvectors, which consist of the coefficients
corresponding to each variable, are used to calculate the principal component scores. The coefficients indicate the relative weight of
each variable in the component. The larger the absolute value of the coefficient, the more important the corresponding variable is in
constructing the component.
Note: You must standardize the variables to obtain the correct component scores.

Example Output

Variable PC1 PC2 PC3 PC4 PC5


Income 0.314 0.145 -0.676 -0.347 -0.241


Education 0.237 0.444 -0.401 0.240 0.622


Age 0.484 -0.135 -0.004 -0.212 -0.175
Residence 0.466 -0.277 0.091 0.116 -0.035
Employ 0.459 -0.304 0.122 -0.017 -0.014
Savings 0.404 0.219 0.366 0.436 0.143
Debt -0.067 -0.585 -0.078 -0.281 0.681
Credit cards -0.123 -0.452 -0.468 0.703 -0.195

Variable PC6 PC7 PC8


Income 0.494 0.018 -0.030
Education -0.357 0.103 0.057
Age -0.487 -0.657 -0.052
Residence -0.085 0.487 -0.662
Employ -0.023 0.368 0.739
Savings 0.568 -0.348 -0.017
Debt 0.245 -0.196 -0.075
Credit cards -0.022 -0.158 0.058

Interpretation

For the loan applicant data, the first principal component's scores are computed from the original data using the coefficients listed under
PC1:
PC1 = 0.314 Income + 0.237 Education + 0.484 Age + 0.466 Residence + 0.459 Employ + 0.404 Savings − 0.067 Debt − 0.123 Credit cards
The interpretation of the principal components is subjective and requires knowledge of the data:
• Age (0.484), Residence (0.466), Employ (0.459), and Savings (0.404) have large positive loadings on component 1, so label this
component Applicant Background.
• Debt (−0.585) and Credit cards (−0.452) have large negative loadings on component 2, so label this component Credit History.
• Income (−0.676) and Education (−0.401) have large negative loadings on component 3, so label this component Academic and Income qualifications.
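As a rough sketch of this calculation, continuing the NumPy example above with the same hypothetical data, the component scores are obtained by applying the eigenvector coefficients to the standardized variables:

    # Continuing the sketch above: scores come from standardized data, not the raw values
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
    scores = Z @ eigvecs                               # all principal component scores
    pc1 = scores[:, 0]                                 # first principal component (PC1) scores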

Principal Components: Topics


Summary
Eigenvalues
Coefficients
Graphs
Scree plot
Score plot
Loading plot
Biplot
Principal Components
Graphs − Scree Plot

The scree, or eigenvalue, plot provides one method for determining the number of principal components. The scree plot displays the
component number versus the corresponding eigenvalue. The eigenvalues of the correlation matrix equal the variances of the principal
components; therefore, choose the number of components based on the size of the eigenvalues.
The ideal pattern is a steep curve, followed by a bend and then a straight line. Retain those components in the steep curve before the first
point that starts the line trend. In practice, you may have difficulty interpreting a scree plot. Use your knowledge of the data and the
results from the other methods of selecting components to help decide the number of components to retain.
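If you are reproducing the analysis outside Minitab, a scree plot is straightforward to draw. A minimal sketch with matplotlib, using the eigenvalues from the earlier hypothetical NumPy example:

    import matplotlib.pyplot as plt

    # Scree plot: component number on the x-axis, eigenvalue on the y-axis
    plt.plot(range(1, len(eigvals) + 1), eigvals, marker="o")
    plt.xlabel("Component number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()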

Example Output


Interpretation

For the loan applicant data, you can conclude that the first three principal components account for most of the total variability in the data
(given by the eigenvalues). The remaining principal components account for a very small proportion of the variability (close to zero) and
are probably unimportant.

Principal Components
Graphs − Score Plot

The score plot graphs the second principal component scores versus the first principal component scores. If the first two components
account for most of the variance in the data, you can use the score plot to assess the data structure and detect clusters, outliers, and
trends. The plot may reveal groupings of points, which may indicate two or more separate distributions in the data. If the data follow a
normal distribution and no outliers are present, the points are randomly distributed around zero.
To create plots for other components, store the scores and use Graph > Scatterplot.

Example Output

Interpretation

For the loan applicant data, the point in the lower right hand corner may be an outlier. Investigate this point further.

Principal Components
Graphs − Loading Plot

The loading plot provides information about the loadings of the first two principal components.

Example Output


Interpretation

For the loan applicant data:


• Age, Residence, Employ, and Savings have large positive loadings on component 1, so label this component Applicant Background.
• Debt and Credit Cards have large negative loadings on component 2, so label this component Credit history.

Principal Components
Graphs − Biplot

The biplot overlays the score and loading plots. Use the biplot to assess the data structure and loadings on one graph. The second
principal component scores are plotted versus the first principal component scores. The loadings for these two principal components are
plotted on the same graph.

Example Output

Interpretation

For the loan applicant data:


• Age, Residence, Employ, and Savings have large positive loadings on component 1, so label this component Applicant background.
• Debt and Credit cards have large negative loadings on component 2, so label this component Credit History.
• The data point in the lower right hand corner may be an outlier. You may want to investigate this point further.

Multivariate analysis
Use multivariate analysis to:
• Understand or reduce the data dimension by analyzing the data covariance structure.
− Principal Components − use to reduce the data into a smaller number of components
− Factor Analysis − use to describe the covariance among variables in terms of a few underlying factors
• Assign group membership.


− Discriminant Analysis − use to classify observations into two or more groups when groups are known
− Cluster Observations − use to classify similar observations into groups when groups are initially unknown
− Cluster Variables − use to classify similar variables into groups when groups are initially unknown
− Cluster K-Means − use to classify similar observations into groups when groups are initially unknown but good starting points for
clusters are available
Because Minitab does not provide tests of significance for the multivariate procedures, interpreting the results is somewhat subjective.
However, you can make informed conclusions if you are familiar with the data.
Multivariate analysis of variance (MANOVA) is another multivariate technique. MANOVA allows you to compare the means of
variables across groups. See the ANOVA StatGuide for information on both Balanced and General MANOVA techniques.

Principal components analysis or factor analysis


You can use principal components analysis or factor analysis to summarize the data covariance structure in a smaller number of dimensions.
Although the techniques are similar, they are designed for different purposes. Principal components analysis is used to reduce
data into a smaller number of components, and factor analysis is used to understand what constructs underlie the data. The two analyses
are often performed on the same data. For example, you can conduct a principal components analysis to determine the number of factors
to extract in a factor analytic study.

Correlation or covariance matrix


Use the correlation matrix to calculate the principal components if variables are measured by different scales and you want to standardize
them or if the variances differ widely among variables. You can use the covariance or correlation matrix in all other situations.

Determining number of components


You can determine the number of principal components using several methods:
• Kaiser method − only retain components with eigenvalues greater than 1.
• Scree test − find the place on the scree plot where the smooth decrease of eigenvalues levels off. Retain those components in the
steep curve before the first point that starts the line trend.
• Percent variation explained − retain components that cumulatively explain a certain percent of variation. The acceptable level of
explained variance depends on how you use Principal Components. For descriptive purposes, you may only need 80% of the
variance explained. However, if you are doing other analyses on these data, you may want to have at least 90% of the variance
explained.
These methods may detect a different number of components for the same data. In such cases, choose the most interpretable and logical
solution for your data.

Factor Analysis
Summary

Use Factor Analysis to summarize the data covariance structure into a smaller number of dimensions. The emphasis in factor analysis is
to describe the covariance among variables in terms of a few underlying, unobservable random quantities, or factors.
For example, you may want to analyze:
• Customer responses on several attributes of a new product to form fewer uncorrelated factors that are easier to interpret.
• Test scores in different subject areas, looking for correlations among the variables that may indicate the existence of factors.
You can perform factor analysis when you have one sample and several variables are measured on each sampling unit.

Data Description

Job applicants were measured on 12 different characteristics: academic record, appearance, communication, company fit, experience,
job fit, letter of interest, likeability, organization, potential, résumé, and self-confidence. After conducting a principal components analysis,
you speculate that five factors may fit the data well. You want to conduct a factor analysis using a maximum likelihood extraction to
determine what factors underlie the data.

Factor Analysis: Topics


Summary
Loadings and communalities
Unrotated
Rotated
Sorted rotated
Factor scores
Coefficients
Graphs
Scree plot


Score plot
Loading plot
Biplot
Factor Analysis
Loadings and Communalities − Unrotated

Use factor analysis to determine the underlying factors responsible for correlations in the data.
The unrotated factor analysis displays:
• Loadings − represent how much a factor explains a variable. High loadings (positive or negative) indicate that the factor strongly
influences the variable. Low loadings (positive or negative) indicate that the factor has a weak influence on the variables. Examine the
loading pattern to determine on which factor each variable loads. Some variables may load on multiple factors. In the unrotated factor
loading table, the loadings are often difficult to interpret.
• Communality − each variable's proportion of variability that is explained by the factors. The closer the communality is to 1, the better
the variable is explained by the factors. You can decide to add a factor if it contributes to the fit of certain variables.
• Variance − variability in the data explained by each factor. Variance equals the eigenvalue if you use principal components to extract
factors.
• %Var − proportion of variability in the data explained by each factor.
After you select the number of factors, try different rotations so you can more easily interpret the factor loadings. Look at the rotated
solution to interpret the factors.

Example Output

Maximum Likelihood Factor Analysis of the Correlation Matrix

Unrotated Factor Loadings and Communalities

Variable Factor1 Factor2 Factor3 Factor4 Factor5 Communality


Academic record 0.571 -0.329 -0.270 0.246 0.471 0.789
Appearance 0.590 -0.290 0.228 0.398 -0.016 0.643
Communication 0.558 -0.505 0.465 -0.166 0.159 0.835
Company Fit 0.616 -0.620 -0.174 -0.210 -0.117 0.853
Experience 0.585 -0.138 -0.404 0.226 0.264 0.646
Job Fit 0.635 -0.606 -0.343 -0.168 -0.160 0.943
Letter 0.824 0.154 0.006 -0.253 0.021 0.768
Likeability 0.582 -0.371 0.153 0.354 -0.234 0.680
Organization 0.465 -0.659 0.500 -0.119 0.095 0.925
Potential 0.606 -0.476 -0.356 0.221 0.307 0.864
Resume 0.964 0.244 0.018 -0.013 -0.004 0.989
Self-confidence 0.549 -0.378 0.157 0.583 -0.263 0.878

Variance 4.9426 2.2441 1.0782 0.9621 0.5841 9.8111


% Var 0.412 0.187 0.090 0.080 0.049 0.818

Interpretation

For the job applicant data, 5 factors were extracted from the 12 variables. All variables are well represented by the 5 chosen factors,
given that the corresponding communalities are generally high. For example, 0.789, or 78.9%, of the variability in Academic Record is
explained by the 5 factors. Also, the 5 chosen factors explain most of the total data variation (0.818 or 81.8%).

Factor Analysis
Loadings and Communalities − Rotated

Factor rotation simplifies the loading structure, allowing you to more easily interpret the factor loadings. There are four methods to
orthogonally rotate the initial factor loadings:
• Equimax − maximizes variance of squared loadings within both variables and factors.
• Varimax − maximizes variance of squared loadings within factors (i.e. simplifies the columns of the loading matrix); the most widely
used rotation method. This method attempts to make the loadings either large or small to ease interpretation.
• Quartimax − maximizes variance of squared loadings within variables (i.e. simplifies the rows of the loading matrix).
• Orthomax − a general rotation that includes the above three, depending on the value of the parameter gamma (0-1).
All methods simplify the loading structure. However, one method may not work best in all cases. You may want to try different rotations
and use the one that produces the most interpretable results.
The percent of the total variability explained by the factors does not change with rotation and the communalities remain the same. But,
after rotation, the factors are more evenly balanced in the percent of variability that they account for (compare the % Var rows from the
unrotated and rotated output).
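As a rough illustration outside Minitab, scikit-learn's FactorAnalysis supports a varimax rotation. The sketch below is only an approximation of the analysis above: its estimation details differ from Minitab's maximum likelihood extraction, and the data matrix is hypothetical.

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    # Hypothetical matrix: rows are job applicants, columns are the 12 characteristics
    rng = np.random.default_rng(2)
    X = rng.normal(size=(50, 12))
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)       # standardize the variables

    fa = FactorAnalysis(n_components=5, rotation="varimax").fit(Z)

    loadings = fa.components_.T                   # rows = variables, columns = factors
    communalities = (loadings ** 2).sum(axis=1)   # proportion of each variable explained by the factors
    variance = (loadings ** 2).sum(axis=0)        # variance accounted for by each factor
    pct_var = variance / Z.shape[1]               # share of total variance, analogous to % Var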

Example Output

Rotated Factor Loadings and Communalities


Varimax Rotation

Variable Factor1 Factor2 Factor3 Factor4 Factor5 Communality


Academic record 0.815 0.191 0.206 0.167 -0.135 0.789
Appearance 0.283 0.644 0.317 0.213 -0.047 0.643
Communication 0.139 0.242 0.815 0.235 -0.197 0.835
Company Fit 0.300 0.216 0.376 0.207 -0.729 0.853
Experience 0.694 0.196 -0.048 0.290 -0.199 0.646
Job Fit 0.372 0.228 0.231 0.210 -0.810 0.943
Letter 0.184 0.073 0.193 0.805 -0.208 0.768
Likeability 0.173 0.703 0.247 0.182 -0.248 0.680
Organization 0.097 0.315 0.864 0.070 -0.255 0.925
Potential 0.785 0.268 0.186 0.126 -0.355 0.864
Resume 0.267 0.297 0.104 0.899 -0.100 0.989
Self-confidence 0.216 0.877 0.166 0.090 -0.162 0.878

Variance 2.2803 2.1752 1.9204 1.8175 1.6177 9.8111


% Var 0.190 0.181 0.160 0.151 0.135 0.818

Interpretation

For the job applicant data, the varimax rotation was performed. You can now interpret the factors more easily. For example:
• Academic record (0.815), Experience (0.694), and Potential (0.785) have large positive loadings on factor 1, so label this factor
qualifications.
• Appearance (0.644), Likeability (0.703), and Self-confidence (0.877) have large positive loadings on factor 2, so label this factor
personal qualities.
• Communication (0.815) and Organization (0.864) have large positive loadings on factor 3, so label this factor work skills.
• Letter (0.805) and Resume (0.899) have large positive loadings on factor 4, so label this factor writing skills.
• Company Fit (−0.729) and Job Fit (−0.810) have large negative loadings on factor 5, so label this factor company and job fit.

Factor Analysis
Loadings and Communalities − Sorted Rotated

The sorted rotated table contains the same content as the rotated solution, but the factor loadings are sorted by size, which makes them
easier to read.
Minitab sorts by the maximum absolute loading for each factor, so that the high loadings for all factors form a diagonal.

Example Output

Sorted Rotated Factor Loadings and Communalities

Variable Factor1 Factor2 Factor3 Factor4 Factor5 Communality


Academic record 0.815 0.191 0.206 0.167 -0.135 0.789
Potential 0.785 0.268 0.186 0.126 -0.355 0.864
Experience 0.694 0.196 -0.048 0.290 -0.199 0.646
Self-confidence 0.216 0.877 0.166 0.090 -0.162 0.878
Likeability 0.173 0.703 0.247 0.182 -0.248 0.680
Appearance 0.283 0.644 0.317 0.213 -0.047 0.643
Organization 0.097 0.315 0.864 0.070 -0.255 0.925
Communication 0.139 0.242 0.815 0.235 -0.197 0.835
Resume 0.267 0.297 0.104 0.899 -0.100 0.989
Letter 0.184 0.073 0.193 0.805 -0.208 0.768
Job Fit 0.372 0.228 0.231 0.210 -0.810 0.943
Company Fit 0.300 0.216 0.376 0.207 -0.729 0.853

Variance 2.2803 2.1752 1.9204 1.8175 1.6177 9.8111


% Var 0.190 0.181 0.160 0.151 0.135 0.818

Interpretation

For the job applicant data, you can easily see what variables load on each factor:
• Academic record (0.815), Experience (0.694), and Potential (0.785) have large positive loadings on factor 1, so label this factor
Qualifications.
• Appearance (0.644), Likeability (0.703), and Self-confidence (0.877) have large positive loadings on factor 2, so label this factor
Personal Qualities.
• Communication (0.815) and Organization (0.864) have large positive loadings on factor 3, so label this factor Work Skills.
• Letter (0.805) and Resume (0.899) have large positive loadings on factor 4, so label this factor Writing Skills.
• Company Fit (−0.729) and Job Fit (−0.810) have large negative loadings on factor 5, so label this factor Company and Job Fit.


Factor Analysis
Coefficients

The coefficients are used to calculate the factor scores. Minitab calculates factor scores by multiplying the factor score coefficients (listed
under Factor1, Factor2, and so on) by your data after they have been centered by subtracting the means and scaled.
Use factor scores to:
• Examine the behavior of observations
• Perform another analysis, such as regression or MANOVA
Note: You must standardize the variables to obtain the correct factor scores.

Example Output

Factor Score Coefficients

Variable Factor1 Factor2 Factor3 Factor4 Factor5


Academic record 0.482 -0.101 0.066 -0.091 0.204
Appearance 0.028 0.154 0.010 -0.046 0.071
Communication -0.004 -0.095 0.372 0.010 0.100
Company Fit -0.081 -0.059 0.017 0.005 -0.324
Experience 0.209 -0.023 -0.038 -0.037 0.050
Job Fit -0.143 -0.073 -0.251 -0.018 -1.007
Letter -0.030 -0.106 0.045 0.072 -0.043
Likeability -0.065 0.221 -0.045 -0.045 -0.016
Organization -0.113 -0.035 0.793 -0.042 0.151
Potential 0.594 -0.065 -0.001 -0.144 0.125
Resume -0.104 0.055 -0.119 1.108 0.231
Self-confidence -0.119 0.807 -0.216 -0.194 0.046

Interpretation

For the job applicant data, the first factor scores are computed from the original data using the coefficients listed under Factor1:
Factor 1 = 0.482 Academic record + 0.028 Appearance − 0.004 Communication − 0.081 Company Fit + 0.209 Experience − 0.143 Job Fit
− 0.030 Letter − 0.065 Likeability − 0.113 Organization + 0.594 Potential − 0.104 Resume − 0.119 Self-confidence
The factor score coefficient pattern matches the loading pattern. For example, for factor 1 the coefficients with
the highest absolute value (Academic record, Experience, and Potential) match the three variables that load on factor 1.
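A minimal sketch of this calculation, using the Factor1 coefficients from the table above and one applicant's standardized values. The applicant's values are hypothetical.

    import numpy as np

    # One applicant's standardized scores for the 12 variables, in the order of the table
    # (Academic record, Appearance, Communication, ..., Self-confidence); values are made up
    z = np.array([1.2, 0.3, -0.5, 0.1, 0.8, -0.2, 0.0, 0.4, -0.1, 0.9, 0.2, 0.5])

    factor1_coeffs = np.array([0.482, 0.028, -0.004, -0.081, 0.209, -0.143,
                               -0.030, -0.065, -0.113, 0.594, -0.104, -0.119])

    factor1_score = z @ factor1_coeffs   # factor 1 score = coefficients applied to standardized data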

Factor Analysis
Graphs − Scree Plot

Use the scree, or eigenvalue, plot (a graph of the factors versus the corresponding eigenvalues) to provide visual information about the
factors. The eigenvalues of the correlation matrix equal the variances of the factors in the unrotated solution.
From this plot, you can determine how well the chosen number of factors fits the data.

Example Output

Interpretation

For the job applicant data, you can conclude that the first five factors account for most of the total variability in the data (given by the
eigenvalues). The remaining factors account for a very small proportion of the variability (close to zero) and are likely unimportant.


Factor Analysis
Graphs − Score Plot

The score plot graphs the second factor scores versus the first factor scores. The plot of the factors provides checks on the assumption of
normality and reveals outliers. If the data are normal and no outliers are present, the score plot shows the points randomly distributed
around zero.
When the first two factors account for most of the variance, you can use the score plot to visually assess the structure of your data.
To create plots for other factors, store the scores and use Graph > Scatterplot.

Example Output

Interpretation

For the job applicant data, the data appear normal and no outliers are apparent.

Factor Analysis
Graphs − Loading Plot

After selecting the number of factors, try different rotations so you can more easily interpret the factor loadings. The loading plot provides
information about the loadings of the first two factors.

Example Output

Interpretation

For the job applicant data, the varimax rotation was performed. You can now interpret the first two factors more easily:
• Academic record, Experience, and Potential have large positive loadings on factor 1, so label this factor Qualifications.
• Appearance, Likeability, and Self-confidence have large positive loadings on factor 2, so label this factor Personal Qualities.

Factor Analysis


Graphs − Biplot

The biplot overlays the score and loading plots. Use the biplot to assess the data structure and loadings on one graph. The second
factor scores are plotted versus the first factor scores. The loadings for these two factors are plotted on the same graph.

Example Output

Interpretation

For the job applicant data, the varimax rotation was performed. You can now interpret the first two factors more easily.
• The data appear normal and no outliers are apparent
• Academic record, Experience, and Potential have large positive loadings on factor 1, so label this factor Qualifications.
• Appearance, Likeability, and Self-confidence have large positive loadings on factor 2, so label this factor Personal qualities.

Extraction methods
Minitab offers two extraction methods for factor analysis: principal components and maximum likelihood. When performing factor analysis:
• If the factors and the errors obtained after fitting the factor model are assumed to follow a normal distribution, use the maximum
likelihood method to obtain maximum likelihood estimates of the factor loadings.
• If the factors and errors obtained after fitting the factor model are not assumed to follow a normal distribution, use the principal
components method.

Steps in factor analysis


The goal of factor analysis is to find a few factors, or unobservable variables, that explain most of the data variability and yet make
contextual sense. You must decide how many factors to use, and find loadings that make the most sense for your data.
The number of factors is often based on the proportion of variance explained by the factors, subject matter knowledge, and
reasonableness of the solution.
1. Perform a principal components analysis to reduce the data and to give you an idea of the number of factors. Examine the
proportion of variability explained by different factors, the eigenvalues, and the scree plot to narrow down your choice of how many
factors to use.
2. Perform a factor analysis, specifying the number of factors. Try different rotation methods to help you interpret the factor loadings.
3. Label the factors based on your knowledge of the data.
4. Construct scales or factor scores for use in other analyses.

Item Analysis
Summary

Use Item Analysis to evaluate whether the items in a survey or test assess the same skill or characteristic. Item analysis is commonly used
in the social sciences, education, and the service quality industry.

Data Description

A restaurant manager wants to assess customer satisfaction with the following three questions:
Item 1 – How satisfied are you with our services?
Item 2 – How likely are you to visit us again?


Item 3 – How likely are you to recommend our restaurant to others?


Responses are recorded using the appropriate 5-point Likert scale below:
1 = Very dissatisfied, 2 = Dissatisfied, 3 = Neutral, 4 = Satisfied, and 5 = Very satisfied
1 = Very unlikely, 2 = Unlikely, 3 = Neutral, 4 = Likely, and 5 = Very likely
Before using these questions, the manager wants to assess whether the three questions truly measure the same construct, customer
satisfaction. A random sample of 50 customers is chosen and their responses are recorded.

Item Analysis
Test Statistics − Correlation Matrix

Use the correlation matrix to assess the strength and direction of the relationship between two items or variables. Items that measure the
same construct should have high, positive correlation values. If the items are not highly correlated, then the items may be ambiguous,
difficult to understand, or measure different constructs.
Often, variables with correlation values greater than 0.7 are considered highly correlated. However, your correlation benchmark value will
depend on subject area knowledge and the number of items in your analysis.

Example Output

Correlation Matrix

Item 1 Item 2
Item 2 0.903
Item 3 0.867 0.864

Cell Contents: Pearson correlation

Interpretation

For the customer service data, Item 1 and Item 2 have a correlation of 0.903. All the items are highly correlated to each other.

Item Analysis
Test Statistics − Item and Total Statistics

The tabulated statistics table summarizes information about each item and the total for all items:
• Total count – Number of observations.
• Mean – Sum of all observations divided by the total count.
• StDev – Measure of dispersion analogous to the average distance (independent of direction) of each observation from the mean.

Example Output

Item and Total Statistics

Total
Variable Count Mean StDev
Item 1 50 3.1600 1.2675
Item 2 50 2.8400 1.3607
Item 3 50 2.9400 1.3463
Total 50 8.9400 3.8087

Interpretation

For the customer service data, Item 1 has 50 observations, the mean response is 3.1600, and the standard deviation is 1.2675.

Item Analysis
Test Statistics − Cronbach's Alpha

A measure of internal consistency used to assess how reliably multiple items in a survey or test assess the same skill or characteristic.
Values range between 0 and 1. If Cronbach's alpha is low, then the items may not reliably measure a single construct. Typically, a value
of 0.7 or higher is considered good. However, the benchmark value varies by subject area and the number of items.
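Cronbach's alpha can also be computed directly from the item variances and the variance of the total score. A minimal Python sketch follows; the example responses are hypothetical.

    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """Cronbach's alpha for an (n_respondents, n_items) array of scores."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)         # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)     # variance of the total score
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical 5-point Likert responses from four customers on the three items
    responses = np.array([[4, 5, 4],
                          [2, 2, 3],
                          [5, 5, 5],
                          [3, 2, 3]])
    print(cronbach_alpha(responses))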

Example Output

Cronbach's Alpha = 0.9550

Interpretation

For the customer service data, the overall Cronbach's alpha is 0.9550. The value is greater than the common benchmark of 0.7 and
suggests that the items are measuring the same construct.


Item Analysis
Test Statistics − Omitted Item Statistics

Use the omitted item statistics table to assess whether removing an item from the analysis improves the internal consistency:
• Adj Total Mean – Mean of the total after omitting an item from the set.
• Adj Total Stdev – Standard deviation after omitting an item from the set.
• Item-Adj Total Corr – Correlation between the scores of one omitted item and the total scores of all other items. In practice, values
range between 0 and 1. A higher value suggests that the omitted item measures the same construct as the other items.
• Squared Multiple Corr – Coefficient of determination (R²) when the omitted item is regressed on the remaining items. Values range
between 0 and 1. A higher value suggests that the omitted item measures the same construct as the other items.
• Cronbach's Alpha – Cronbach's alpha calculated after an item is omitted from the analysis. Fairly constant values for all omitted items
suggest that all items measure the same construct. An increase in Cronbach's alpha for a specific omitted item suggests that it does
not measure the same construct.
If an omitted item has a low Item-Adj Total Corr, a low Squared Multiple Corr, and Cronbach's alpha substantially increases, then you
might consider removing it from the set.
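A rough sketch of how some of these omitted-item quantities can be computed, continuing the cronbach_alpha sketch above. The squared multiple correlation, which requires regressing the omitted item on the remaining items, is left out here.

    import numpy as np

    def omitted_item_stats(items: np.ndarray):
        """For each omitted item: adjusted total mean/StDev, item-adjusted total
        correlation, and Cronbach's alpha recomputed without that item."""
        for j in range(items.shape[1]):
            rest = np.delete(items, j, axis=1)           # drop item j from the set
            adj_total = rest.sum(axis=1)                 # total score of the remaining items
            corr = np.corrcoef(items[:, j], adj_total)[0, 1]
            yield (adj_total.mean(), adj_total.std(ddof=1), corr, cronbach_alpha(rest))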

Example Output

Omitted Item Statistics

Omitted     Adj. Total   Adj. Total   Item-Adj.    Squared         Cronbach's
Variable    Mean         StDev        Total Corr   Multiple Corr   Alpha
Item 1      5.780        2.613        0.9166       0.8447          0.9268
Item 2      6.100        2.525        0.9134       0.8413          0.9277
Item 3      6.000        2.563        0.8870       0.7869          0.9476

Interpretation

For the customer service data, the item-adjusted total correlations when items 1, 2, and 3 are omitted are 0.9166, 0.9134, and 0.8870,
respectively.
The squared multiple correlations when items 1, 2, and 3 are omitted are 0.8447, 0.8413, and 0.7869, respectively.
Cronbach's alpha when items 1, 2, and 3 are omitted is 0.9268, 0.9277, and 0.9476, respectively.
The Item-Adj Total Corr and Squared Multiple Corr values are consistently high, and the Cronbach's alpha values do not differ substantially.
Collectively, the evidence suggests that all items measure the same construct.

Item Analysis: Topics


Summary
Correlation matrix
Item and total statistics
Cronbach's alpha
Omitted item statistics
Graphs
Matrix plot
Item Analysis
Graphs − Matrix Plot

Use the plot to visually assess the relationship between every combination of items or variables.

Example Output


Interpretation

For the customer service data, the plot suggests that all the items have a positive, linear relationship.

Cluster Observations
Summary

Use the cluster observations procedure to classify similar observations into groups when the groups are initially unknown. Cluster
observations uses a hierarchical clustering procedure.
For example, you can:
• Make measurements on 5 nutritional characteristics of 12 breakfast cereal brands, then group the cereal brands that have similar
characteristics
• Record information about characteristics of recreation parks in the United States, then group the parks by their similarities
You can perform the clustering of observations analysis when:
• You want to sort observations into two or more categories
• You do not have missing data

Data Description

You are currently developing a new piece of sporting equipment and you want to test different groups on the equipment's ease of use.
You have data on gender, height, weight, and handedness of 20 people, and you want to group the available people by their similarities.

Cluster Observations
Amalgamation Steps

At each amalgamation step, two clusters are joined until only one cluster remains. The amalgamation table shows the clusters joined,
their similarity level, distance between them, and other characteristics of each newly formed cluster. In Minitab, your choice of linkage
method and distance measure will greatly influence the clustering outcome.
You should look at the similarity level and distance between the joined clusters to choose the number of clusters for the final partition of
data. The step prior to where the values change abruptly may determine the number of clusters for the final partition. For the final
partition, you want reasonably large similarity levels and reasonably small distances between the joined clusters.
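Outside Minitab, the same hierarchy can be built with SciPy. The sketch below uses standardized variables, Euclidean distance, and complete linkage as in the example output; the data matrix and the similarity formula 100(1 − d/dmax) are assumptions for illustration.

    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import pdist

    # Hypothetical data: 20 people x 4 variables (gender, height, weight, handedness)
    rng = np.random.default_rng(3)
    X = rng.normal(size=(20, 4))
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # standardize the variables

    d = pdist(Z, metric="euclidean")                     # distances between observations
    merges = linkage(d, method="complete")               # amalgamation steps (complete linkage)
    # Each row of `merges`: the two clusters joined, the distance between them,
    # and the number of observations in the newly formed cluster.
    similarity = 100 * (1 - merges[:, 2] / d.max())      # Minitab-style similarity level (assumed formula)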

Example Output

Standardized Variables, Euclidean Distance, Complete Linkage


Amalgamation Steps
Number
of obs .
Number of Similarity Distance Clusters New in new
Step clusters level level joined cluster cluster
1 19 96.6005 0.16275 13 16 13 2
2 18 95.4642 0.21715 17 20 17 2
3 17 95.2648 0.22669 6 9 6 2
4 16 92.9178 0.33905 17 18 17 3
5 15 90.5296 0.45339 11 15 11 2
6 14 90.3124 0.46378 12 19 12 2
7 13 88.2431 0.56285 2 14 2 2
8 12 88.2431 0.56285 5 8 5 2
9 11 85.9744 0.67146 6 10 6 3
10 10 83.0639 0.81080 7 13 7 3
11 9 83.0639 0.81080 1 3 1 2


12 8 81.4039 0.89027 2 17 2 5
13 7 79.8185 0.96617 6 11 6 5
14 6 78.7534 1.01716 4 12 4 3
15 5 66.2112 1.61760 2 5 2 7
16 4 62.0036 1.81904 1 6 1 7
17 3 41.0474 2.82229 1 4 1 10
18 2 40.1718 2.86421 2 7 2 10
19 1 0.0000 4.78739 1 2 1 20

Interpretation

For the sports data, the amalgamation steps show that:


• The similarity level decreases, first by increments of about 4 or less, then by about 21 (from 62.0036 to 41.0474) at steps 16 and 17
(from 4 clusters to 3).
• The distance between the joined clusters increases, first by about 0.6 or less, then by about 1 (from 1.81904 to 2.82229) at steps 16
and 17 (from 4 clusters to 3).
These facts could indicate that 4 clusters are reasonably sufficient for the final partition, assuming that this grouping makes intuitive sense
for the data.

Cluster Observations
Final Partition

Minitab summarizes each cluster in the final partition by the:


• Number of observations
• Within cluster sum of squares
• Average distance from the observation to the cluster centroid
• Maximum distance of the observation to the cluster centroid.
The Cluster Centroids are the vectors of variable means for the observations in the clusters and are used as cluster midpoints.
The Distances Between Cluster Centroids (computed using the chosen distance measure between observations) show how far the
formed clusters are from each other. These numbers are not very informative by themselves, but you can compare the cluster differences
to see how different the clusters are from each other.

Example Output

Number of clusters: 4
Average Maximum
Within distance distance
Number of cluster sum from from
observations of squares centroid centroid
Cluster1 7 3.25713 0.612540 1.12081
Cluster2 7 2.72247 0.581390 0.95186
Cluster3 3 0.55977 0.398964 0.54907
Cluster4 3 0.37116 0.326533 0.48848

Cluster Centroids
Variable Cluster1 Cluster2 Cluster3 Cluster4 Grand centroid
Gender 0.97468 -0.97468 0.97468 -0.97468 -0.0000000
Height -1.00352 1.01283 -0.37277 0.35105 -0.0000000
Weight -0.90672 0.93927 -0.86797 0.79203 -0.0000000
Handedness 0.63808 0.63808 -1.48885 -1.48885 -0.0000000

Distances Between Cluster Centroids


Cluster1 Cluster2 Cluster3 Cluster4
Cluster1 0.00000 3.35759 2.21882 3.61171
Cluster2 3.35759 0.00000 3.67557 2.23236
Cluster3 2.21882 3.67557 0.00000 2.66074
Cluster4 3.61171 2.23236 2.66074 0.00000

Interpretation

For the sports data, four clusters are evident in the final partition. The within-cluster sums of squares are reasonably small; the sum of
squares is largest for the first cluster (3.25713). You should further analyze your data to see if the grouping makes sense.

Cluster Observations: Topics


Summary
Amalgamation steps
Final partition
Graphs
Dendrogram


Cluster Observations
Graphs − Dendrogram

The dendrogram displays the groups formed by clustering of observations and their similarity levels. You can also display the distance
level on the y-axis. You should analyze the data to see if the classification makes sense.
You can label observations using a column of case labels to help interpret the clusters in the dendrogram.

Example Output

Interpretation

The final partition for the sports data contains four clusters of the 20 people, classified by similarities:

Cluster Observations Similarity

Red 7 Right-handed females

Blue 3 Left-handed females

Green 7 Right-handed males

Orange 3 Left-handed males


The orange cluster has the highest similarity level.

Steps in clustering of observations


The final grouping of objects (also called the final partition) is the grouping of objects that should identify groups whose observations
share common characteristics. The decision about final grouping is also called cutting the dendrogram. The complete dendrogram (tree
diagram) is a graphical depiction of the amalgamation of observations into one cluster. Cutting the dendrogram is akin to drawing a line
across the dendrogram to specify the final grouping.
How do you know where to cut the dendrogram?
• First execute cluster analysis without specifying a final partition. Examine the similarity and distance levels in the Session window
results and in the dendrogram. The pattern of how similarity or distance values change from step to step can help you to choose the
final grouping. The step prior to the values changing abruptly may identify a good point for cutting the dendrogram, if this makes
sense for your data.
• After choosing where you wish to make your partition, rerun the clustering procedure, using either number of clusters or similarity level
to give you either a set number of groups or a similarity level for cutting the dendrogram. Examine the resulting clusters in the final
partition to see if the grouping seems logical.
• Looking at dendrograms for different final groupings can also help you to decide which one makes the most sense for your data.
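In the SciPy sketch above, cutting the dendrogram corresponds to cutting the linkage tree either at a set number of clusters or at a chosen distance level. A minimal sketch, with hypothetical cut values:

    from scipy.cluster.hierarchy import fcluster

    # Final partition with a set number of clusters (here 4), as in the sports example
    labels = fcluster(merges, t=4, criterion="maxclust")

    # Or cut at a chosen distance level instead of specifying a cluster count
    labels_by_distance = fcluster(merges, t=1.9, criterion="distance")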

Standardizing variables
When the variables are in different units, you should standardize all variables to minimize the effect of scale differences. Minitab
standardizes all variables by subtracting the means and dividing by the standard deviation before calculating the distance matrix. When
you standardize variables, the grand centroid is 0 for all clusters.

Distance measures for observations


The distance matrix contains distances between observations. Minitab provides five measures to calculate distance (you should choose a
distance measure according to properties of your data).


• Euclidean distance − a standard mathematical measure of distance (square root of the sum of squared differences).
• Pearson distance − the square root of the sum of squared differences divided by the variances. This distance gives a standardized measure of
distance.
• Manhattan distance − the sum of absolute differences, so that outliers receive less weight than they would if the Euclidean method
were used.
• Squared Euclidean and squared Pearson distances − the square of the Euclidean and Pearson distances, respectively. Therefore, the
distances that are large under the Euclidean and Pearson methods are even larger under the squared Euclidean and squared
Pearson distances.
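A minimal NumPy sketch of these distance measures for a pair of observations; the observation vectors and variances are hypothetical.

    import numpy as np

    # Two hypothetical observation vectors and the per-variable variances used by the Pearson distance
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 0.5, 3.5])
    var = np.array([1.5, 0.8, 2.0])

    euclidean = np.sqrt(((x - y) ** 2).sum())
    pearson = np.sqrt((((x - y) ** 2) / var).sum())
    manhattan = np.abs(x - y).sum()
    squared_euclidean = ((x - y) ** 2).sum()
    squared_pearson = (((x - y) ** 2) / var).sum()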

Linkage methods for observations


A linkage rule is necessary for calculating inter-cluster distances when a cluster has multiple observations.
• Single linkage or "nearest neighbor" − the distance between two clusters is the minimum distance between an observation in one
cluster and an observation in the other cluster. Single linkage is a good choice when clusters are clearly separated.
• Average linkage − the distance between two clusters is the mean distance between an observation in one cluster and an observation
in the other cluster.
• Centroid linkage − the distance between two clusters is the distance between the cluster centroids.
• Complete linkage or "furthest neighbor" − the distance between two clusters is the maximum distance between an observation in one
cluster and an observation in the other cluster. This method ensures that all observations in a cluster are within a maximum distance
and tends to produce clusters with similar diameters. The results can be sensitive to outliers.
• Median linkage − the distance between two clusters is the median distance between an observation in one cluster and an observation
in the other cluster. This technique uses the median rather than the mean, thus dampening the influence of outliers.
• McQuitty's linkage − when two clusters are joined, the distance of the new cluster to any other cluster is calculated as the average of
the distances of the soon-to-be-joined clusters to the other cluster. For example, if clusters 1 and 3 are to be joined into a new cluster,
say 1*, then the distance from 1* to cluster 4 is the average of the distances from 1 to 4 and 3 to 4.
• Ward's linkage − the distance between two clusters is the sum of squared deviations from points to centroids. The objective of Ward's
linkage is to minimize the within-cluster sum of squares. It tends to produce clusters with similar numbers of observations, but it is
sensitive to outliers. In Ward's linkage, the distance between two clusters can be larger than dmax, the maximum value in the original
distance matrix. If this happens, the similarity will be negative.
If you choose Average, Centroid, Median, or Ward as the linkage method, you should use one of the squared distance measures.

Cluster Variables
Summary

Use the cluster variables procedure to classify variables into groups when the groups are initially not known. The primary reason to
cluster variables is to reduce the number of variables. Cluster variables uses a hierarchical clustering procedure.
For example, you can:
• Record information about the characteristics of recreation parks in the United States, then group the similar variables to reduce the
data.
• Conduct a study to determine the effects of a change in environment on blood pressure. You can record different physical
measurements of the subjects, analyze the variables for possible similar characteristics, and combine the variables to reduce the
data.
You can perform the clustering of variables analysis when:
• You want to sort variables into two or more categories
• You do not have missing data

Data Description

You conduct a study to determine the effects of the number of media outlets, the number of universities, and the literacy rate on the college
admissions of the population. For 10 cities around the world, you count the number of newspaper copies, radios, and television sets per 1,000
people. You also determine the literacy rate and whether a university is located in each city. You want to reduce the number of variables
by combining variables based on similar characteristics.

Cluster Variables
Amalgamation Steps

At each amalgamation step, two clusters of variables are joined until only one cluster remains. The amalgamation table shows the clusters
joined, their similarity level, the distance between them, and other characteristics of each newly formed cluster. In Minitab, your choice of
linkage method and distance measure will greatly influence the clustering outcome.
You should look at the similarity level and distance between the joined clusters to choose the number of clusters for the final partition of
data. The step prior to where the values change abruptly may determine the number of clusters for the final partition. For the final
partition, you want reasonably large similarity levels and reasonably small distances between the joined clusters.

Example Output


Correlation Coefficient Distance, Average Linkage


Amalgamation Steps
Number
of obs.
Number of Similarity Distance Clusters New in new
Step clusters level level joined cluster cluster
1 4 93.9666 0.120669 2 3 2 2
2 3 93.1548 0.136904 4 5 4 2
3 2 87.3150 0.253700 1 4 1 3
4 1 79.8113 0.403775 1 2 1 5

Interpretation

For the college admissions data, the amalgamation steps show that the similarity level changes first from 93.97 to 93.15, then changes
abruptly from 93.1548 to 87.3150 at steps 2 and 3 (from 3 clusters to 2). This change could indicate that 3 clusters are reasonably
sufficient for the final partition, assuming that this grouping makes sense for the data.

Cluster Variables
Final Partition

If you request a final partition, you receive a list of the variables in each cluster.

Example Output

Final Partition

Cluster 1
Newspaper
Cluster 2
Radios TV Sets
Cluster 3
Literacy Rate University

Interpretation

For the college admissions data, three clusters are formed:


• Number of newspaper copies per 1,000 people
• Number of radios and television sets
• Literacy level and whether a university is located in the city
This grouping seems reasonable.

Cluster Variables: Topics


Summary
Amalgamation steps
Final partition
Graphs
Dendrogram
Cluster Variables
Graphs − Dendrogram

The dendrogram displays the groups formed by clustering of variables, and their similarity levels. You can also display distance levels on
the y-axis. You should analyze your data to see if the classification makes sense.

Example Output


Interpretation

For the college admissions data, the final partition contains three clusters classified by similarities:

Cluster Variables Similarity

Red 1 Number of newspaper copies

Blue 2 Literacy rate and presence of a university

Green 2 Number of radios and television sets

Steps in clustering of variables


The final grouping of clusters (also called the final partition) is the grouping of clusters that should identify groups whose variables
share common characteristics. The decision about the final grouping is also called cutting the dendrogram. The complete dendrogram (tree
diagram) is a graphical depiction of the amalgamation of variables into one cluster. Cutting the dendrogram is akin to drawing a line
across the dendrogram to specify the final grouping.
How do you know where to cut the dendrogram?
• You might first execute cluster analysis without specifying a final partition. Examine the similarity and distance levels in the Session
window results and in the dendrogram. The pattern of how similarity or distance values change from step to step can help you to
choose the final grouping. The step where the values change abruptly may identify a good point for cutting the dendrogram, if this
makes sense for your data.
• After choosing where you wish to make your partition, rerun the clustering procedure, using either Number of clusters or Similarity
level to give you either a set number of groups or a similarity level for cutting the dendrogram. Examine the resulting clusters in the
final partition to see if the grouping seems logical.
• Looking at dendrograms for different final groupings can also help you to decide which one makes the most sense for your data.
• If the purpose behind the clustering of variables is data reduction, you may decide to use your knowledge of the data to a greater
degree in determining the final partition.

Distance measures for variables


The distance matrix contains distances between variables. Minitab provides two methods to measure distance (you should choose a
distance measure according to properties of your data).
• Correlations for distance measures − The (i,j) entry of the distance matrix is d(i,j) = 1 − ρ(i,j), where ρ(i,j) is the correlation between variables i
and j. The correlation method gives distances between 0 and 1 for positive correlations, and between 1 and 2 for negative correlations. If
it makes sense to consider negatively correlated data to be farther apart than positively correlated data, use the correlation method.
• Absolute correlations for distance measures − The (i,j) entry of the distance matrix is d(i,j) = 1 − |ρ(i,j)|, where ρ(i,j) is the correlation between
variables i and j. The absolute correlation method gives distances between 0 and 1. If you think that the strength of the relationship is
important in considering distance and not the sign, then use the absolute correlation method.
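A minimal SciPy/NumPy sketch of clustering variables with the correlation distance and average linkage, as in the example output; the data matrix is hypothetical.

    import numpy as np
    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import squareform

    # Hypothetical data: 10 cities x 5 variables (newspapers, radios, TVs, literacy, university)
    rng = np.random.default_rng(4)
    X = rng.normal(size=(10, 5))

    rho = np.corrcoef(X, rowvar=False)         # correlations between the variables
    dist = 1 - rho                             # correlation distance: d(i,j) = 1 - rho(i,j)
    abs_dist = 1 - np.abs(rho)                 # absolute correlation distance

    np.fill_diagonal(dist, 0.0)                # condensed form requires zeros on the diagonal
    merges = linkage(squareform(dist, checks=False), method="average")   # average linkage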

Linkage methods for variables


A linkage rule is necessary for calculating inter-cluster distances when there are multiple variables in a cluster.
• Single linkage or "nearest neighbor" − the distance between two clusters is the minimum distance between a variable in one cluster
and a variable in the other cluster. Single linkage is a good choice when clusters are clearly separated.
• Average linkage − the distance between two clusters is the mean distance between a variable in one cluster and a variable in the
other cluster.
• Centroid linkage − the distance between two clusters is the distance between the cluster centroids .


• Complete linkage or "furthest neighbor" − the distance between two clusters is the maximum distance between a variable in one
cluster and a variable in the other cluster. This method ensures that all variables in a cluster are within a maximum distance and tends
to produce clusters with similar diameters. The results can be sensitive to outliers.
• Median linkage − the distance between two clusters is the median distance between a variable in one cluster and a variable in the
other cluster. This technique uses the median rather than the mean, thus downweighting the influence of outliers.
• McQuitty's linkage − when two clusters are joined, the distance of the new cluster to any other cluster is calculated as the average of
the distances of the soon-to-be-joined clusters to the other cluster. For example, if clusters 1 and 3 are to be joined into a new cluster,
say 1*, then the distance from 1* to cluster 4 is the average of the distances from 1 to 4 and 3 to 4.
• Ward's linkage − the distance between two clusters is the sum of squared deviations from points to centroids. The objective of Ward's
linkage is to minimize the within-cluster sum of squares. It tends to produce clusters with similar numbers of variables, but it is
sensitive to outliers. In Ward's linkage, the distance between two clusters can be larger than dmax, the maximum value in the original
distance matrix. If this happens, the similarity will be negative.

Cluster K-Means
Summary

Use Cluster K-Means to cluster observations into groups when the groups are initially unknown. This procedure uses non-hierarchical
clustering of observations. K-means clustering works best when sufficient information is available to make good starting cluster
designations.
For example, you can use Cluster K-Means to:
• Group people based on physical fitness level when you suspect that people fall into three categories
• Cluster population data with known characteristics from a large database
Use Cluster K-Means when you have:
• Good starting points for clusters
• Multiple measurements on items or subjects
• No missing data

Data Description

As a business analyst, you want to classify 22 successful small-to-medium size manufacturing companies into meaningful groups for
future analyses.
You suspect that the data fall into three groups:
• Established companies − in existence for over 15 years, over 130 clients, a strong return on investments, and strong sales
• Mid-growth companies − in existence for over 10 years, over 100 clients, a good return on investments, and good sales
• Young companies − in existence for less than 10 years, less than 100 clients, a good return on investments, and good sales

Cluster K-Means: Topics


Summary
Final partition
Clusters and centroids
Cluster K-Means
Final Partition − Clusters and Centroids

Minitab summarizes each cluster in the final partition by:


• Number of observations
• Within cluster sum of squares
• Average distance from observation to the cluster centroid
• Maximum distance of the observation to the cluster centroid
In general, a cluster with a small sum of squares is more compact than one with a large sum of squares.
The cluster centroids are the vectors of variable means for the observations in the clusters and are used as cluster midpoints.
The distances between cluster centroids show the distances between the formed clusters. These numbers are not very informative by
themselves, but you can compare the cluster distances to see how different the clusters are from each other. The nearest cluster is the
one which has the smallest Euclidean distance between the observation and the centroid of the cluster.

Example Output

Standardized Variables

Final Partition

Number of clusters: 3

            Number of      Within cluster   Average distance   Maximum distance
            observations   sum of squares   from centroid      from centroid
Cluster1     4              1.593            0.578              0.884
Cluster2     8              8.736            0.964              1.656
Cluster3    10             12.921            1.093              1.463

Cluster Centroids

Grand
Variable Cluster1 Cluster2 Cluster3 centroid
Clients 1.2318 0.5225 -0.9108 0.0000
Rate of return 1.2942 0.2217 -0.6950 0.0000
Sales 1.1866 0.5157 -0.8872 0.0000
Years 1.2030 0.5479 -0.9195 0.0000

Distances Between Cluster Centroids

Cluster1 Cluster2 Cluster3


Cluster1 0.0000 1.5915 4.1658
Cluster2 1.5915 0.0000 2.6488
Cluster3 4.1658 2.6488 0.0000

Interpretation

For the manufacturing company data, K-means clustering classified the 22 companies as 4 established companies, 8 mid-growth
companies, and 10 young companies. Cluster 1 (established companies) has the smallest within-cluster sum of squares (1.593), probably
because there are only 4 observations in this cluster.

Initializing Cluster K-Means


K-means clustering begins with a grouping of observations into a predefined number of clusters.
1 Minitab evaluates each observation, moving it into the nearest cluster. The nearest cluster is the one that has the smallest Euclidean
distance between the observation and the centroid of the cluster.
2 When a cluster changes, by losing or gaining an observation, Minitab recalculates the cluster centroid.
3 This process is repeated until no more observations can be moved into a different cluster. At this point, all observations are in their
nearest cluster according to the criterion listed above.
Unlike hierarchical clustering of observations, K-means clustering can split two observations into separate clusters after they have
been grouped together.

K-means procedures work best when you provide good starting points for clusters.
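
One way to supply starting points in practice is to pass initial centroids explicitly. The sketch below uses scikit-learn in Python as an illustration (Minitab's algorithm may differ in details); the data and the seed rows chosen as starting centroids are hypothetical.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical measurements on 22 companies (for example Clients, Rate of return, Sales, Years).
rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(22, 4)))

# Good starting points: one representative company from each suspected group.
init_centroids = X[[0, 5, 12]]         # assumed row indices of typical companies

km = KMeans(n_clusters=3, init=init_centroids, n_init=1).fit(X)
print(km.labels_)                       # final cluster membership
print(km.cluster_centers_)              # cluster centroids (vectors of variable means)
print(km.inertia_)                      # total within-cluster sum of squares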

Outliers
The presence of outliers (unusually large or small values) in your data can affect the clustering of observations. The clusters are typically
larger when outliers are not removed, and the resulting solution is less distinct. If you have a large data set, consider deleting outliers
from the Cluster K-Means analysis.

Discriminant Analysis
Summary

Use Discriminant Analysis to classify observations into two or more groups if you have a sample with known groups. You can also use a
discriminant analysis to investigate how variables contribute to group separation and to place objects or individuals into defined groups;
for example, to classify three species of birds based on wingspan, color, and diet.
Discriminant analysis is similar to multiple regression because both techniques use two or more predictor variables and a single response
variable. However, in discriminant analysis, the response variable is categorical. Examples of categorical variables are gender, color, and
operational status.
You can perform a linear or quadratic discriminant analysis:
• Use a linear analysis when you assume the covariance matrices are equal for all groups.
• Use a quadratic analysis when you assume the covariance matrices are not equal for all groups.
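
As a rough illustration of the two options (in Python with scikit-learn, not Minitab), the sketch below fits both a linear and a quadratic rule to hypothetical data.

import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

# Hypothetical scores for 3 groups of 60 students, with different group means.
rng = np.random.default_rng(2)
y = np.repeat([1, 2, 3], 60)
X = rng.normal(size=(180, 2)) + np.repeat([[4.0, 2.0], [2.0, 1.0], [0.0, 0.0]], 60, axis=0)

lda = LinearDiscriminantAnalysis().fit(X, y)      # assumes equal covariance matrices
qda = QuadraticDiscriminantAnalysis().fit(X, y)   # allows a separate covariance matrix per group

print(lda.predict(X[:3]))
print(qda.predict(X[:3]))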

Data Description

High school administrators assign each student to one of three educational tracks:
• 1 − for above average students who can learn independently and have strong math and language skills
• 2 − for average students who learn best with a moderate amount of teacher attention and have average math and language skills
• 3 − for students who require substantial interaction with the teacher and have weak math and language skills
An intelligence test and motivation assessment were administered to 60 students from each track. School officials want to know if the
students' intelligence test and motivation assessment scores accurately classify student placement.


Discriminant Analysis : Topics


Summary
Classification
Without cross-validation
With cross-validation
Distance and discriminant functions
Squared distance between groups
Linear discriminant function for groups
Group and pooled statistics
Descriptive statistics
Covariance matrices
Summary
Summary of classified observations
Discriminant Analysis
Classification − Without Cross-Validation

In the summary of the classification results, Minitab classifies the observations by:
• The group in which they are placed
• The group in which they are predicted to be placed
Minitab provides the following statistics for each true group :
• Total N − the number of observations in each true group
• N Correct − the number of observations correctly placed in each true group
• Proportion − the proportion of observations correctly placed in each true group
N, N Correct, and Proportion Correct are also shown for all groups.
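
The per-group and overall statistics follow directly from the counts in the classification table. As a check, the short Python sketch below recomputes them from the counts shown in the example output below.

import numpy as np

# Counts copied from the example output below (rows: put-into group, columns: true group).
put_into = np.array([[59,  5,  0],
                     [ 1, 53,  3],
                     [ 0,  2, 57]])

total_n = put_into.sum(axis=0)                      # Total N for each true group
n_correct = np.diag(put_into)                       # N Correct for each true group
print(total_n)                                      # [60 60 60]
print(np.round(n_correct / total_n, 3))             # [0.983 0.883 0.95 ]
print(round(n_correct.sum() / total_n.sum(), 3))    # 0.939 overall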

Example Output

Linear Method for Response: Track

Predictors: IQ Score, Motivation

Group 1 2 3
Count 60 60 60

Summary of classification
True Group
Put into Group 1 2 3
1 59 5 0
2 1 53 3
3 0 2 57
Total N 60 60 60
N correct 59 53 57
Proportion 0.983 0.883 0.950

N = 180 N Correct = 169 Proportion Correct = 0.939

Interpretation

For the education data:


• Group 1 has a correct placement rate of 98.3% (0.983)
• Group 2 has a correct placement rate of 88.3% (0.883)
• Group 3 has a correct placement rate of 95.0% (0.950)
Classifying group 2 students presents the most problems. Overall, 169 out of 180 students, or 93.9% (0.939), are correctly placed.

Discriminant Analysis
Classification − With Cross-Validation

The discriminant analysis output includes a summary of the classification results with cross-validation. Cross-validation compensates for
an optimistic error rate. Therefore, the cross-validation classification results are more conservative than classification without
cross-validation.
As in the regular classification situation, observations are classified by:
• The group in which they are placed
• The group in which they are predicted to be placed
Minitab provides the following statistics for each true group :


• Total N − the number of observations in each true group


• N Correct − the number of observations correctly placed in each true group
• Proportion − the proportion of observations correctly placed in each true group
N, N Correct, and Proportion Correct are also shown for all groups.
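
A leave-one-out cross-validated classification table can be sketched in Python as follows; this is an illustration with hypothetical data, not Minitab's implementation.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import confusion_matrix

# Hypothetical scores for 3 groups of 60 students.
rng = np.random.default_rng(3)
y = np.repeat([1, 2, 3], 60)
X = rng.normal(size=(180, 2)) + np.repeat([[4.0, 2.0], [2.0, 1.0], [0.0, 0.0]], 60, axis=0)

# Each student is held out in turn, the rule is refit on the other 179 students,
# and the held-out student is then classified with that refit rule.
xval_pred = cross_val_predict(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
cm = confusion_matrix(y, xval_pred, labels=[1, 2, 3])
print(cm.T)                                   # rows: put-into group, columns: true group
print(round(np.diag(cm).sum() / len(y), 3))   # overall proportion correct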

Example Output

Summary of Classification with Cross-validation

True Group
Put into Group 1 2 3
1 59 5 0
2 1 52 3
3 0 3 57
Total N 60 60 60
N correct 59 52 57
Proportion 0.983 0.867 0.950

N = 180 N Correct = 168 Proportion Correct = 0.933

Interpretation

For the education data:


• Group 1 has a correct placement rate of 98.3% (0.983)
• Group 2 has a correct placement rate of 86.7% (0.867)
• Group 3 has a correct placement rate of 95.0% (0.950)
Classifying group 2 students presents the most problems.
Overall, 168 out of 180 students, or 93.3% (0.933), are correctly placed.

Discriminant Analysis
Distance and Discriminant Functions − Squared Distance Between Groups

An observation is classified into a group if the squared distance (also called the Mahalanobis distance) of the observation to the group
center (mean) is the minimum. For linear discriminant analysis, an assumption is made that covariance matrices are equal for all groups.
The table displays the distance between groups. Linear discriminant analysis has the property of symmetric squared distance, meaning
that the linear discriminant function of group i evaluated with the mean of group j is equal to the linear discriminant function of group j
evaluated with the mean of group i.
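
As a rough check in Python (an illustration, not Minitab's code), the squared Mahalanobis distance between two group means can be computed from the pooled covariance matrix and the group means reported later in this section.

import numpy as np

# Pooled covariance matrix and group means copied from the output later in this section.
pooled_cov = np.array([[65.759,  4.730],
                       [ 4.730,  8.964]])
mean_g1 = np.array([127.37, 53.600])    # IQ Score, Motivation means for group 1
mean_g3 = np.array([ 78.25, 40.150])    # group 3

# Squared Mahalanobis distance between the two group means (symmetric in the groups).
diff = mean_g1 - mean_g3
d2 = float(diff @ np.linalg.solve(pooled_cov, diff))
print(round(d2, 1))   # about 48.1, matching the 48.0911 in the table below up to rounding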

Example Output

Squared Distance Between Groups

1 2 3
1 0.0000 12.9853 48.0911
2 12.9853 0.0000 11.3197
3 48.0911 11.3197 0.0000

Interpretation

For the education data, the greatest distance is between groups 1 and 3 (48.0911). The difference between groups 1 and 2 is 12.9853,
and the difference between groups 2 and 3 is 11.3197.

Discriminant Analysis
Distance and Discriminant Functions − Linear Discriminant Function for Groups

The linear discriminant scores for each group correspond to the regression coefficients in multiple regression analyses. In general, you
can fit a linear equation of the type:
Group = b0 + b1x1 + b2x2 + ... + bmxm
where:
• b0 is the constant
• b1 through bm are the estimated regression coefficients
• x1... xm are the predictors
The linear discriminant functions discriminate between groups. For example, when you have three groups, Minitab estimates a function
for discriminating between:
• Group 1 and groups 2 and 3
• Group 2 and groups 1 and 3


• Group 3 and groups 1 and 2


The groups with the largest linear discriminant function, or regression coefficients, contribute most to the classification of observations.
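
Under the equal-covariance assumption, the coefficients and constant for each group can be reconstructed from the pooled covariance matrix and the group means. The Python sketch below (an illustration, not Minitab's code) reproduces the group 1 column of the table below from values reported later in this section.

import numpy as np

# Pooled covariance matrix and group 1 means, copied from the output later in this section.
pooled_cov = np.array([[65.759,  4.730],
                       [ 4.730,  8.964]])
mean_g1 = np.array([127.37, 53.600])    # IQ Score, Motivation

# Coefficients: inverse(pooled covariance) * group mean.
# Constant: -0.5 * mean' * inverse(pooled covariance) * mean
# (a ln(prior probability) term is added when prior probabilities are specified).
coef = np.linalg.solve(pooled_cov, mean_g1)
const = -0.5 * mean_g1 @ coef
print(np.round(coef, 2))   # approximately [1.57 5.15], as in the table below
print(round(const, 2))     # approximately -237.85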

Example Output

Linear Discriminant Function for Groups

1 2 3
Constant -237.85 -170.58 -115.65
IQ Score 1.57 1.19 0.90
Motivation 5.15 4.66 4.00

Interpretation

For the education data, group 1 has the largest linear discriminant function coefficients (for example, 1.57 for IQ Score), indicating that
group 1 contributes more than group 2 or 3 to the classification of group membership.

Discriminant Analysis
Group and Pooled Statistics − Descriptive Statistics

The descriptive statistic summary displays the pooled mean and group means, and the pooled standard deviation and group standard
deviations.

Example Output

Pooled Mean and Means for Group

Variable      Pooled Mean   Group 1   Group 2   Group 3
IQ Score           102.08    127.37    100.62     78.25
Motivation         47.056    53.600    47.417    40.150

Pooled StDev and StDevs for Group

Variable      Pooled StDev  Group 1   Group 2   Group 3
IQ Score             8.109    8.308     9.266     6.511
Motivation           2.994    2.409     3.243     3.251

Interpretation

For the education intelligence test data:


• Group 1 has the highest mean score (127.37)
• Group 2 has the second highest mean (100.62)
• Group 3 has the lowest mean (78.25)
The overall intelligence test mean for all groups was 102.08.
The pooled standard deviation for the intelligence test for all groups is 8.109. The standard deviations for groups 1 through 3 are: 8.308,
9.266, and 6.511.
For the education motivation assessment data:
• Group 1 has the highest mean score (53.600)
• Group 2 has the second highest mean (47.417)
• Group 3 has the lowest mean (40.150)
The overall motivation mean for all groups was 47.056.
The pooled standard deviation for the motivation assessment for all groups is 2.994. The standard deviations for groups 1 through 3 are:
2.409, 3.243, and 3.251.
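
As a quick check of the pooled values, the pooled standard deviation is the square root of the pooled variance, sum((n_i − 1) s_i²) / (N − k); with 60 students per group this reduces to the average of the group variances. A short Python sketch using the intelligence test values above:

import numpy as np

# Group standard deviations for the intelligence test, copied from the output above.
group_sd = np.array([8.308, 9.266, 6.511])
n = np.array([60, 60, 60])

# Pooled variance = sum((n_i - 1) * s_i^2) / (N - k); pooled stdev is its square root.
pooled_var = ((n - 1) * group_sd**2).sum() / (n.sum() - len(n))
print(round(float(np.sqrt(pooled_var)), 3))   # approximately 8.109, matching the output above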

Discriminant Analysis
Group and Pooled Statistics − Covariance Matrices

The covariance tables display the pooled covariance matrix and the covariance matrices for each group. Covariance is a measure of
linear association among observations.

Example Output

Pooled Covariance Matrix

IQ Score Motivation
IQ Score 65.759
Motivation 4.730 8.964

Covariance matrix for Group 1


IQ Score Motivation
IQ Score 69.0158
Motivation 0.6915 5.8034

Covariance matrix for Group 2

IQ Score Motivation
IQ Score 85.8675
Motivation 4.6031 10.5184

Covariance matrix for Group 3

IQ Score Motivation
IQ Score 42.3941
Motivation 8.8941 10.5703

Interpretation

For the education data, the IQ Score variances (the diagonal entries of the covariance matrices) for groups 1 through 3 are 69.0158, 85.8675, and 42.3941. The pooled variance is 65.759.

Discriminant Analysis
Summary of Classified Observations

The summary of classified observations provides the following information:


• Obs − observation number for each observation (the symbols ** denote a misclassified observation)
• True Group − actual group in which the observation is classified
• Pred Group − predicted group for each observation
• X-val Group − predicted group for each observation based on the cross-validation procedure
• Squared Distance Pred and Squared Distance X-val − the squared distance from the observation to each group's mean, based on results without and with
cross-validation
• Probability Pred and Probability X-val − predicted probabilities of a student being placed in each group based on results without and
with cross-validation
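
With equal prior probabilities, the probability columns are proportional to exp(−0.5 × squared distance) for each group. The short Python sketch below (an illustration, assuming equal priors) reproduces the Pred probabilities reported for observation 4 in the output below.

import numpy as np

# Squared distances (Pred column) for observation 4, copied from the output below.
d2 = np.array([3.524, 3.028, 25.579])

# Posterior probability of each group, assuming equal prior probabilities.
w = np.exp(-0.5 * d2)
print(np.round(w / w.sum(), 2))   # approximately [0.44 0.56 0.  ], as in the table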

Example Output

Summary of Classified Observations

     True   Pred   X-val          Squared Distance       Probability
Obs  Group  Group  Group  Group   Pred       X-val       Pred   X-val
1 1 1 1 1 1.257 1.302 0.99 0.99
2 10.155 10.102 0.01 0.01
3 42.782 42.564 0.00 0.00
2 1 1 1 1 1.098 1.136 1.00 1.00
2 21.379 21.388 0.00 0.00
3 63.521 63.554 0.00 0.00
3 1 1 1 1 1.257 1.302 0.99 0.99
2 10.155 10.102 0.01 0.01
3 42.782 42.564 0.00 0.00
4** 1 2 2 1 3.524 3.699 0.44 0.42
2 3.028 3.072 0.56 0.58
3 25.579 25.960 0.00 0.00

--------------------------------------------------------------

178 3 3 3 1 40.644 40.465 0.00 0.00
2 9.234 9.182 0.02 0.02
3 1.531 1.589 0.98 0.98
179 3 3 3 1 39.0226 38.9067 0.00 0.00
2 7.3604 7.3357 0.03 0.03
3 0.5249 0.5414 0.97 0.97
180 3 3 3 1 38.1331 38.0429 0.00 0.00
2 6.7162 6.7011 0.05 0.05
3 0.6069 0.6263 0.95 0.95

Note: Table truncated for space.

Interpretation

For the education data, the summary of classified observations indicates the group in which each observation is predicted to belong. For
example, school officials placed student 4 into group 1, but the analysis predicts that this student belongs in group 2, so the observation is flagged (**) as misclassified.


Linear versus quadratic discriminant analysis


You can perform linear or quadratic discriminant analyses:
• Use a linear analysis when you assume the covariance matrices are equal for all groups.
• Use a quadratic analysis when you assume the covariance matrices are not equal for all groups.
For linear discriminant analysis, an observation is classified into a group if the squared distance (also called the Mahalanobis distance) of
the observation to the group center (mean) is the minimum. An assumption is made that covariance matrices are equal for all groups. The
unique part of the squared distance formula for each group is called the linear discriminant function. For any observation, the group with
the smallest squared distance has the largest linear discriminant function and the observation is then classified into this group.
Linear discriminant analysis has the property of symmetric squared distance: the linear discriminant function of group i evaluated with the
mean of group j is equal to the linear discriminant function of group j evaluated with the mean of group i.
With a quadratic discriminant analysis, no assumption that the groups have equal covariance matrices exists. As with a linear discriminant
analysis, an observation is classified into the group that has the smallest squared distance. However, the squared distance does not
simplify into a linear function, hence the name quadratic discriminant analysis.
Unlike linear distance, quadratic distance is not symmetric. In other words, the quadratic discriminant function of group i evaluated with
the mean of group j is not equal to the quadratic discriminant function of group j evaluated with the mean of group i. In the Minitab results,
quadratic distance is called the generalized squared distance. If the determinant of the sample group covariance matrix is less than one,
the generalized squared distance can be negative.
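
A minimal Python sketch of the generalized squared distance used in quadratic discriminant analysis (prior probabilities are omitted for simplicity; when they are specified, a −2 ln(prior) term is included as well). The example point x is hypothetical; the group 1 covariance values are those shown earlier in this section.

import numpy as np

def generalized_squared_distance(x, mean_i, cov_i):
    # (x - m_i)' * inverse(S_i) * (x - m_i) + ln(det(S_i)).
    # The ln(det(S_i)) term is why the value can be negative when the determinant
    # of the sample group covariance matrix is less than one.
    diff = np.asarray(x, dtype=float) - np.asarray(mean_i, dtype=float)
    return float(diff @ np.linalg.solve(cov_i, diff) + np.log(np.linalg.det(cov_i)))

# Example with the group 1 covariance matrix and means shown earlier.
cov_g1 = np.array([[69.0158, 0.6915], [0.6915, 5.8034]])
print(round(generalized_squared_distance([120, 50], [127.37, 53.600], cov_g1), 2))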

Cross-validation
Cross-validation is used to compensate for an optimistic apparent error rate. The apparent error rate is the percent of misclassified
observations. This number tends to be optimistic because the classified data are the same data used to build the classification function.
The cross-validation routine works by omitting each observation one at a time, recalculating the classification function using the remaining
data, and then classifying the omitted observation. Computation takes approximately four times longer with this procedure. When
cross-validation is performed, Minitab displays an additional summary table.
As an alternative to cross-validation, you can calculate a more realistic error rate by splitting your data into two parts. Use one part to
create the discriminant function, and the other part as a validation set. Predict group membership for the validation set and calculate the
error rate as the percent of these data that are misclassified.
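
A sketch of this split-sample approach in Python (illustrative only, with hypothetical data): fit the rule on one half and estimate the error rate on the held-out half.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Hypothetical scores for 3 groups of 60 students.
rng = np.random.default_rng(4)
y = np.repeat([1, 2, 3], 60)
X = rng.normal(size=(180, 2)) + np.repeat([[4.0, 2.0], [2.0, 1.0], [0.0, 0.0]], 60, axis=0)

# Use one part to build the discriminant function and the other as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, stratify=y, random_state=1)
rule = LinearDiscriminantAnalysis().fit(X_train, y_train)

error_rate = np.mean(rule.predict(X_val) != y_val)   # proportion of validation cases misclassified
print(round(float(error_rate), 3))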

Predicting with discriminant analysis


You can use a discriminant analysis to calculate the discriminant functions from observations with known groups. When new observations
are made, you can use the discriminant function to predict which group they belong to.
If the explanatory variables do not follow a multivariate normal distribution with equal covariance matrices for each level of the response,
standard discriminant analysis procedures are statistically inconsistent. In such cases, logistic regression gives more accurate
results.

Prior probabilities
Sometimes you know the probability of an observation belonging to a group prior to conducting a discriminant analysis. For example, if
you are classifying the buyers of a particular car, you may already know that 60% of purchasers are male and 40% are female. If you
know or can estimate these probabilities, a discriminant analysis can use these prior probabilities in calculating the posterior probabilities
(the probabilities of assigning observations to groups given the data).
With the assumption that the data have a normal distribution, the linear discriminant function is increased by ln(pi), where pi is the prior
probability of group i. Because observations are assigned to groups according to the smallest generalized distance, or equivalently the
largest linear discriminant function, the effect is to increase the posterior probabilities for a group with a high prior probability.
Specifying prior probabilities can greatly affect the accuracy of your results. Investigate whether the unequal proportions across groups
reflect a real difference in the true population or whether the difference is a result of sampling error.
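
In terms of the earlier linear discriminant function table, specifying priors amounts to adding ln(p_i) to each group's constant. The Python sketch below illustrates this with the constants from that table and purely hypothetical prior probabilities.

import numpy as np

# Constants from the linear discriminant function table shown earlier.
constants = np.array([-237.85, -170.58, -115.65])

# Hypothetical prior probabilities for groups 1, 2, and 3.
priors = np.array([0.2, 0.3, 0.5])

# Each group's constant is increased by ln(p_i), which raises the posterior
# probability of groups with a high prior probability.
adjusted = constants + np.log(priors)
print(np.round(adjusted, 2))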
