Minitab Statguide Multivariate
Principal Components
Summary
Use Principal Components to form a smaller number of uncorrelated variables. The goal of principal components analysis is to explain the
maximum amount of variance with the fewest principal components.
For example:
• You record information on 10 socioeconomic variables and you want to reduce the variables into a smaller number of components to
more easily analyze the data.
• You want to analyze the customer responses to several attributes of a new product in order to form a smaller number of uncorrelated
variables that are easier to interpret.
Principal components analysis is commonly used as one step in a series of analyses. For example, you can use Principal Components to
reduce your data and avoid multicollinearity or when you have too many predictors relative to the number of observations.
A principal components analysis often uncovers unsuspected relationships, allowing you to interpret the data in a new way.
You can perform principal components analysis when you have one sample and several variables are measured on each sampling unit.
Data Description
A bank requires eight pieces of information from loan applicants: income, education level, age, length of time at current residence,
length of time with current employer, savings, debt, and number of credit cards. A bank administrator wants to analyze this information for
reporting purposes.
Principal Components
Eigenanalysis − Eigenvalues
Example Output
Interpretation
Principal Components
Eigenanalysis − Coefficients
The principal components are the linear combinations of the original variables that account for the variance in the data. The maximum
number of components extracted always equals the number of variables. The eigenvectors, which comprise the coefficients
corresponding to each variable, are used to calculate the principal component scores. The coefficients indicate the relative weight of
each variable in the component. The larger the absolute value of the coefficient, the more important the corresponding variable is in
constructing the component.
Note You must standardize the variables to obtain the correct component score.
Example Output
Interpretation
For the loan applicant data, the first principal component's scores are computed from the original data using the coefficients listed under
PC1:
PC1 = 0.314 Income + 0.237 Education + 0.484 Age + 0.466 Residence + 0.459 Employ + 0.404 Savings − 0.067 Debt − 0.123 Credit cards
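The following Python sketch (Minitab performs this calculation for you, so this is illustrative only) shows how such a score is computed. The applicant rows are hypothetical; the coefficients are the PC1 values from the output above.

    import numpy as np

    # Hypothetical applicant data: one row per applicant, columns ordered as
    # Income, Education, Age, Residence, Employ, Savings, Debt, Credit cards.
    X = np.array([
        [35000.0, 14, 42, 8, 12, 5000, 9000, 3],
        [61000.0, 18, 35, 3, 6, 12000, 4000, 5],
        [48000.0, 16, 51, 11, 14, 8000, 2000, 2],
    ])

    # PC1 coefficients from the eigenanalysis output above.
    pc1_coef = np.array([0.314, 0.237, 0.484, 0.466, 0.459, 0.404, -0.067, -0.123])

    # Standardize each variable (see the note above), then take the linear
    # combination to obtain each applicant's first principal component score.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    pc1_scores = Z @ pc1_coef
    print(pc1_scores)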
The interpretation of the principal components is subjective and requires knowledge of the data:
• Age (0.484), Residence (0.466), Employ (0.459), and Savings (0.404) have large positive loadings on component 1, so label this
component Applicant Background.
• Debt (−0.585) and Credit cards (−0.452) have large negative loadings on component 2, so label this component Credit History.
• Income (−0.676) and Education (−0.401) have large negative loadings on component 3, so label this component Academic and Income Qualifications.
Principal Components
Graphs − Scree Plot
The scree, or eigenvalue, plot provides one method for determining the number of principal components. The scree plot displays the
component number versus the corresponding eigenvalue. The eigenvalues of the correlation matrix equal the variances of the principal
components; therefore, choose the number of components based on the size of the eigenvalues.
The ideal pattern is a steep curve, followed by a bend and then a straight line. Retain those components in the steep curve before the first
point that starts the line trend. In practice, you may have difficulty interpreting a scree plot. Use your knowledge of the data and the
results from the other methods of selecting components to help decide the number of components to retain.
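As an illustration outside Minitab, a scree plot is straightforward to draw from the eigenvalues; the values below are hypothetical stand-ins for an eigenanalysis output.

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical eigenvalues of the correlation matrix, largest first.
    eigenvalues = np.array([3.5, 1.0, 0.8, 0.5, 0.4, 0.3, 0.3, 0.2])

    # Component number versus eigenvalue; look for the bend that separates
    # the steep curve from the flat tail.
    plt.plot(np.arange(1, len(eigenvalues) + 1), eigenvalues, marker="o")
    plt.xlabel("Component number")
    plt.ylabel("Eigenvalue")
    plt.title("Scree plot")
    plt.show()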
Example Output
Interpretation
For the loan applicant data, you can conclude that the first three principal components account for most of the total variability in the data
(given by the eigenvalues). The remaining principal components account for a very small proportion of the variability (close to zero) and
are probably unimportant.
Principal Components
Graphs − Score Plot
The score plot graphs the second principal component scores versus the first principal component scores. If the first two components
account for most of the variance in the data, you can use the score plot to assess the data structure and detect clusters, outliers, and
trends. The plot may reveal groupings of points, which may indicate two or more separate distributions in the data. If the data follow a
normal distribution and no outliers are present, the points are randomly distributed around zero.
To create plots for other components, store the scores and use Graph > Scatterplot.
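Outside Minitab, the equivalent of storing the scores and plotting them can be sketched as follows, reusing the hypothetical data matrix X from the earlier sketch.

    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import scale

    # Standardize, extract the components, and store the scores.
    scores = PCA().fit_transform(scale(X))

    # Score plot: second component scores versus first component scores.
    # Any other pair of components can be plotted the same way.
    plt.scatter(scores[:, 0], scores[:, 1])
    plt.xlabel("First component")
    plt.ylabel("Second component")
    plt.show()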
Example Output
Interpretation
For the loan applicant data, the point in the lower right hand corner may be an outlier. Investigate this point further.
Principal Components
Graphs − Loading Plot
The loading plot provides information about the loadings of the first two principal components.
Example Output
Interpretation
Principal Components
Graphs − Biplot
The biplot overlays the score and loading plots. Use the biplot to assess the data structure and loadings on one graph. The second
principal component scores are plotted versus the first principal component scores. The loadings for these two principal components are
plotted on the same graph.
Example Output
Interpretation
Multivariate analysis
Use multivariate analysis to:
• Understand or reduce the data dimension by analyzing the data covariance structure.
− Principal Components − use to reduce the data into a smaller number of components
− Factor Analysis − use to describe the covariance among variables in terms of a few underlying factors
• Assign group membership.
− Discriminant Analysis − use to classify observations into two or more groups when groups are known
− Cluster Observations − use to classify similar observations into groups when groups are initially unknown
− Cluster Variables − use to classify similar variables into groups when groups are initially unknown
− Cluster K-Means − use to classify similar observations into groups when groups are initially unknown but good starting points for
clusters are available
Because Minitab does not provide tests of significance for multivariate procedures, interpreting the results is somewhat subjective.
However, you can make informed conclusions if you are familiar with the data.
Multivariate analysis of variance (MANOVA) is another case of multivariate analysis. MANOVA allows you to compare the means of
variables across groups. See the ANOVA StatGuide for information on both Balanced and General MANOVA techniques.
Factor Analysis
Summary
Use Factor Analysis to summarize the data covariance structure into a smaller number of dimensions. The emphasis in factor analysis is
to describe the covariance among variables in terms of a few underlying unobservable random quantities, or factors.
For example, you may want to analyze:
• Customer responses on several attributes of a new product to form fewer uncorrelated factors that are easier to interpret.
• Test scores in different subject areas, looking for correlations among the variables that may indicate the existence of factors.
You can perform factor analysis when you have one sample and several variables are measured on each sampling unit.
Data Description
Job applicants were measured on 12 different characteristics: academic record, appearance, communication, company fit, experience,
job fit, letter of interest, likeability, organization, potential, résumé, and self-confidence. After conducting a principal components analysis,
you speculate that five factors may fit the data well. You want to conduct a factor analysis using a maximum likelihood extraction to
determine what factors underlie the data.
Factor Analysis
Loadings and Communalities − Unrotated
Use factor analysis to determine the underlying factors responsible for correlations in the data.
The unrotated factor analysis displays:
• Loadings − represent how much a factor explains a variable. High loadings (positive or negative) indicate that the factor strongly
influences the variable. Low loadings (positive or negative) indicate that the factor has a weak influence on the variable. Examine the
loading pattern to determine on which factor each variable loads. Some variables may load on multiple factors. In the unrotated factor
loading table, the loadings are often difficult to interpret.
• Communality − each variable's proportion of variability that is explained by the factors. The closer the communality is to 1, the better
the variable is explained by the factors. You can decide to add a factor if it contributes to the fit of certain variables.
• Variance − variability in the data explained by each factor. Variance equals the eigenvalue if you use principal components to extract
factors.
• %Var − proportion of variability in the data explained by each factor.
After you select the number of factors, try different rotations so you can more easily interpret the factor loadings. Look at the rotated
solution to interpret the factors.
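For readers who want to reproduce these quantities outside Minitab, the following sketch computes loadings, communalities, and explained variance with scikit-learn's maximum likelihood factor analysis; X is assumed to hold the 12 applicant variables.

    from sklearn.decomposition import FactorAnalysis
    from sklearn.preprocessing import scale

    fa = FactorAnalysis(n_components=5)   # maximum likelihood extraction
    fa.fit(scale(X))

    loadings = fa.components_.T           # rows = variables, columns = factors

    # Communality: each variable's variance explained by the factors
    # (row sums of squared loadings, for standardized variables).
    communality = (loadings ** 2).sum(axis=1)

    # Variance explained by each factor, and the proportion of the total.
    variance = (loadings ** 2).sum(axis=0)
    pct_var = variance / loadings.shape[0]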
Example Output
Interpretation
For the job applicant data, 5 factors were extracted from the 12 variables. All variables are well represented by the 5 chosen factors,
given that the corresponding communalities are generally high. For example, 0.789, or 78.9%, of the variability in Academic Record is
explained by the 5 factors. Also, the 5 chosen factors explain most of the total data variation (0.818 or 81.8%).
Factor Analysis
Loadings and Communalities − Rotated
Factor rotation simplifies the loading structure, allowing you to more easily interpret the factor loadings. There are four methods to
orthogonally rotate the initial factor loadings:
• Equimax − maximizes the variance of squared loadings within both variables and factors.
• Varimax − maximizes the variance of squared loadings within factors (i.e., simplifies the columns of the loading matrix); the most widely
used rotation method. This method attempts to make the loadings either large or small to ease interpretation.
• Quartimax − maximizes the variance of squared loadings within variables (i.e., simplifies the rows of the loading matrix).
• Orthomax − a family of rotations that includes the above three, depending on the value of the parameter gamma (0−1).
All methods simplify the loading structure. However, one method may not work best in all cases. You may want to try different rotations
and use the one that produces the most interpretable results.
The percent of the total variability explained by the factors does not change with rotation and the communalities remain the same. But,
after rotation, the factors are more evenly balanced in the percent of variability that they account for (compare the % Var rows from the
unrotated and rotated output).
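The orthogonal rotations above all belong to the orthomax family, which can be sketched in a few lines of NumPy; gamma = 1 gives varimax and gamma = 0 gives quartimax. The loadings matrix (variables × factors) is assumed to come from the unrotated solution, as in the earlier sketch.

    import numpy as np

    def orthomax(L, gamma=1.0, max_iter=50, tol=1e-6):
        """Orthogonally rotate a loading matrix L (variables x factors)."""
        p, k = L.shape
        R = np.eye(k)                 # accumulated rotation matrix
        d = 0.0
        for _ in range(max_iter):
            Lam = L @ R
            # Gradient of the orthomax criterion; the SVD projects it back
            # onto the set of orthogonal rotation matrices.
            u, s, vt = np.linalg.svd(
                L.T @ (Lam**3 - (gamma / p) * Lam @ np.diag((Lam**2).sum(axis=0)))
            )
            R = u @ vt
            d_new = s.sum()
            if d_new < d * (1 + tol):
                break
            d = d_new
        return L @ R

    rotated = orthomax(loadings)      # varimax-rotated loadings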
Example Output
Varimax Rotation
Interpretation
For the job applicant data, the varimax rotation was performed. You can now interpret the factors more easily. For example:
• Academic record (0.815), Experience (0.694), and Potential (0.785) have large positive loadings on factor 1, so label this factor
qualifications.
• Appearance (0.644), Likeability (0.703), and Self-confidence (0.877) have large positive loadings on factor 2, so label this factor
personal qualities.
• Communication (0.815) and Organization (0.864) have large positive loadings on factor 3, so label this factor work skills.
• Letter (0.805) and Resume (0.899) have large positive loadings on factor 4, so label this factor writing skills.
• Company Fit (−0.729) and Job Fit (−0.810) have large negative loadings on factor 5, so label this factor company and job fit.
Factor Analysis
Loadings and Communalities − Sorted Rotated
The sorted rotated table contains the same content as the rotated solution. Because the factor loadings are sorted by size, they are
easier to read in the sorted rotated table.
Minitab sorts by the maximum absolute loading for any factor. Minitab sorts the loadings so that the high loadings for all factors form a
diagonal, making the loadings easier to read.
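A small pandas sketch reproduces the sorting idea: order the variables by the factor on which each has its largest absolute loading. Here variable_names is a hypothetical list of the 12 variable names, and rotated is the rotated loading matrix from the earlier sketch.

    import pandas as pd

    loadings_df = pd.DataFrame(
        rotated, index=variable_names,
        columns=[f"Factor{i + 1}" for i in range(rotated.shape[1])],
    )

    # Sort rows by the factor carrying each variable's largest absolute
    # loading, so the high loadings line up along a diagonal.
    order = loadings_df.abs().values.argmax(axis=1).argsort(kind="stable")
    print(loadings_df.iloc[order].round(3))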
Example Output
Interpretation
For the job applicant data, you can easily see what variables load on each factor:
• Academic record (0.815), Experience (0.694), and Potential (0.785) have large positive loadings on factor 1, so label this factor
Qualifications.
• Appearance (0.644), Likeability (0.703), and Self-confidence (0.877) have large positive loadings on factor 2, so label this factor
Personal Qualities.
• Communication (0.815) and Organization (0.864) have large positive loadings on factor 3, so label this factor Work Skills.
• Letter (0.805) and Resume (0.899) have large positive loadings on factor 4, so label this factor Writing Skills.
• Company Fit (−0.729) and Job Fit (−0.810) have large negative loadings on factor 5, so label this factor Company and Job Fit.
Factor Analysis
Coefficients
The coefficients are used to calculate the factor scores. Minitab calculates factor scores by multiplying the factor score coefficients (listed
under Factor1, Factor2, and so on) by your data after they have been scaled and centered by subtracting the means.
Use factor scores to:
• Examine the behavior of observations
• Serve as inputs to another analysis, such as regression or MANOVA
Note You must standardize the variables to obtain the correct factor score.
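In code, the score computation is a single matrix product once the data are standardized; coef stands for the hypothetical variables × factors matrix of factor score coefficients from the table.

    from sklearn.preprocessing import scale

    # Standardize the data first (see the note above), then multiply by the
    # factor score coefficients: one score per factor per subject.
    factor_scores = scale(X) @ coef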
Example Output
Interpretation
For the job applicant data, the first factor scores are computed from the original data using the coefficients listed under Factor1:
Factor 1 = 0.482 Academic record + 0.028 Appearance − 0.004 Communication − 0.081 Company Fit + 0.209 Experience − 0.143 Job Fit − 0.030 Letter − 0.065 Likeability − 0.113 Organization + 0.594 Potential − 0.104 Resume − 0.119 Self-confidence
For the job applicant data, the factor score coefficient pattern matches the loading pattern. For example, for factor 1 the coefficients with
the highest absolute values (Academic record, Experience, and Potential) match the three variables that load on factor 1.
Factor Analysis
Graphs − Scree Plot
Use the scree, or eigenvalue, plot (a graph of the factors versus the corresponding eigenvalues) to provide visual information about the
factors. The eigenvalues of the correlation matrix equal the variances of the factors in the unrotated solution.
From this plot, you can determine how well the chosen number of factors fits the data.
Example Output
Interpretation
For the job applicant data, you can conclude that the first five factors account for most of the total variability in the data (given by the
eigenvalues). The remaining factors account for a very small proportion of the variability (close to zero) and are likely unimportant.
Factor Analysis
Graphs − Score Plot
The score plot graphs the second factor scores versus the first factor scores. The plot of the factors provides checks on the assumption of
normality and reveals outliers. If the data are normal and no outliers are present, the score plot shows the points randomly distributed
around zero.
When the first two factors account for most of the variance, you can use the score plot to visually assess the structure of your data.
To create plots for other factors, store the scores and use Graph > Scatterplot.
Example Output
Interpretation
For the job applicant data, the data appear normal and no outliers are apparent.
Factor Analysis
Graphs − Loading Plot
After selecting the number of factors, try different rotations so you can more easily interpret the factor loadings. The loading plot provides
information about the loadings of the first two factors.
Example Output
Interpretation
For the job applicant data, the varimax rotation was performed. You can now interpret the first two factors more easily:
• Academic record, Experience, and Potential have large positive loadings on factor 1, so label this factor Qualifications.
• Appearance, Likeability, and Self-confidence have large positive loadings on factor 2, so label this factor Personal Qualities.
Factor Analysis
Graphs − Biplot
The biplot overlays the score and loading plots. Use the biplot to assess the data structure and loadings on one graph. The second
factor scores are plotted versus the first factor scores. The loadings for these two factors are plotted on the same graph.
Example Output
Interpretation
For the job applicant data, the varimax rotation was performed. You can now interpret the first two factors more easily.
• The data appear normal and no outliers are apparent
• Academic record, Experience, and Potential have large positive loadings on factor 1, so label this factor Qualifications.
• Appearance, Likeability, and Self-confidence have large positive loadings on factor 2, so label this factor Personal qualities.
Extraction methods
Minitab offers two extraction methods for factor analysis: principal components and maximum likelihood. When performing factor analysis:
• If the factors and the errors obtained after fitting the factor model are assumed to follow a normal distribution, use the maximum
likelihood method to obtain maximum likelihood estimates of the factor loadings.
• If the factors and errors obtained after fitting the factor model are not assumed to follow a normal distribution, use the principal
components method.
Item Analysis
Summary
Use Item Analysis to evaluate whether the items in a survey or test assess the same skill or characteristic. Item analysis is commonly
used in the social sciences, education, and the service quality industry.
Data Description
A restaurant manager wants to assess customer satisfaction with the following three questions:
Item 1 – How satisfied are you with our services?
Item 2 – How likely are you to visit us again?
Item Analysis
Test Statistics − Correlation Matrix
Use the correlation matrix to assess the strength and direction of the relationship between two items or variables. Items that measure the
same construct should have high, positive correlation values. If the items are not highly correlated, then the items may be ambiguous,
difficult to understand, or measure different constructs.
Often, variables with correlation values greater than 0.7 are considered highly correlated. However, your correlation benchmark value will
depend on subject area knowledge and the number of items in your analysis.
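Outside Minitab, the same matrix is one call away; responses is assumed to be a pandas DataFrame with one column per item and one row per respondent.

    import pandas as pd

    # Pearson correlation between every pair of items.
    print(responses.corr().round(3))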
Example Output
Correlation Matrix
Item 1 Item 2
Item 2 0.903
Item 3 0.867 0.864
Interpretation
For the customer service data, Item 1 and Item 2 have a correlation of 0.903. All the items are highly correlated to each other.
Item Analysis
Test Statistics − Item and Total Statistics
The tabulated statistics table summarizes information about each item and the total for all items:
• Total count – Number of observations.
• Mean – Sum of all observations divided by the total count.
• StDev – Measure of dispersion analogous to the average distance (independent of direction) of each observation from the mean.
Example Output
Variable Total Count Mean StDev
Item 1 50 3.1600 1.2675
Item 2 50 2.8400 1.3607
Item 3 50 2.9400 1.3463
Total 50 8.9400 3.8087
Interpretation
For the customer service data, Item 1 has 50 observations, the mean response is 3.1600, and the standard deviation is 1.2675.
Item Analysis
Test Statistics − Cronbach's Alpha
Cronbach's alpha is a measure of internal consistency used to assess how reliably multiple items in a survey or test assess the same
skill or characteristic. Values range between 0 and 1. If Cronbach's alpha is low, then the items may not reliably measure a single
construct. Typically, a value of 0.7 or higher is considered good. However, the benchmark value varies by subject area and the number of items.
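Cronbach's alpha is simple to compute directly from its definition, as this sketch shows: alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score).

    import numpy as np

    def cronbach_alpha(items):
        """items: 2-D array, rows = respondents, columns = items."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)        # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # e.g., cronbach_alpha(responses.values) for the survey data above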
Example Output
Interpretation
For the customer service data, the overall Cronbach's alpha is 0.9550. The value is greater than the common benchmark of 0.7 and
suggests that the items are measuring the same construct.
Item Analysis
Test Statistics − Omitted Item Statistics
Use the omitted item statistics table to assess whether removing an item from the analysis improves the internal consistency:
• Adj Total Mean – Mean of the total after omitting an item from the set.
• Adj Total StDev – Standard deviation of the total after omitting an item from the set.
• Item-Adj Total Corr – Correlation between the scores of one omitted item and the total scores of all other items. In practice, values
range between 0 and 1. A higher value suggests that the omitted item measures the same construct as the other items.
• Squared Multiple Corr – Coefficient of determination (R²) when the omitted item is regressed on the remaining items. Values range
between 0 and 1. A higher value suggests that the omitted item measures the same construct as the other items.
• Cronbach's Alpha – Cronbach's alpha calculated after an item is omitted from the analysis. Fairly constant values for all omitted items
suggest that all items measure the same construct. An increase in Cronbach's alpha for a specific omitted item suggests that it does
not measure the same construct.
If an omitted item has a low Item-Adj Total Corr, a low Squared Multiple Corr, and Cronbach's alpha substantially increases, then you
might consider removing it from the set.
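The omitted item statistics can be reproduced by dropping each item in turn, as sketched below (reusing the cronbach_alpha function from the previous sketch).

    import numpy as np

    data = np.asarray(responses, dtype=float)   # rows = respondents
    for i in range(data.shape[1]):
        omitted = data[:, i]
        rest = np.delete(data, i, axis=1)
        # Item-Adj Total Corr: the omitted item against the total of the rest.
        r = np.corrcoef(omitted, rest.sum(axis=1))[0, 1]
        # Cronbach's alpha recomputed without the omitted item.
        print(f"Item {i + 1}: item-adj total corr = {r:.4f}, "
              f"alpha = {cronbach_alpha(rest):.4f}")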
Example Output
Interpretation
For the customer service data, the item-adjusted total correlations when items 1, 2, and 3 are omitted are 0.9166, 0.9134, and 0.8870,
respectively.
The squared multiple correlations when items 1, 2, and 3 are omitted are 0.8447, 0.8413, and 0.7869, respectively.
Cronbach's alpha when items 1, 2, and 3 are omitted is 0.9268, 0.9277, and 0.9476, respectively.
The Item-Adj Total Corr and Squared Multiple Corr values are consistently high, and the Cronbach's alpha values do not differ substantially.
Collectively, the evidence suggests that all items measure the same construct.
Item Analysis
Graphs − Matrix Plot
Use the plot to visually assess the relationship between every combination of items or variables.
Example Output
Interpretation
For the customer service data, the plot suggests that all the items have a linear and positive relationship.
Cluster Observations
Summary
Use the cluster observations procedure to classify similar observations into groups, when the groups are initially not known. Cluster
observations uses a hierarchical clustering procedure.
For example, you can:
• Make measurements on 5 nutritional characteristics of 12 breakfast cereal brands, then group the cereal brands that have similar
characteristics
• Record information about characteristics of recreation parks in the United States, then group the parks by their similarities
You can perform the clustering of observations analysis when:
• You want to sort observations into two or more categories
• You do not have missing data
Data Description
You are currently developing a new piece of sporting equipment and you want to test different groups on the equipment's ease of use.
You have data on gender, height, weight, and handedness of 20 people, and you want to group the available people by their similarities.
Cluster Observations
Amalgamation Steps
At each amalgamation step, two clusters are joined until only one cluster remains. The amalgamation table shows the clusters joined,
their similarity level, distance between them, and other characteristics of each newly formed cluster. In Minitab, your choice of linkage
method and distance measure will greatly influence the clustering outcome.
You should look at the similarity level and distance between the joined clusters to choose the number of clusters for the final partition of
data. The step prior to where the values change abruptly may determine the number of clusters for the final partition. For the final
partition, you want reasonably large similarity levels and reasonably small distances between the joined clusters.
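A SciPy sketch of the amalgamation, under one particular choice of linkage and distance (complete linkage on Euclidean distances of standardized data); X is assumed to hold the 20 people × 4 variables.

    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import pdist
    from scipy.stats import zscore

    # Standardize first because the variables are in different units.
    Z = zscore(X, ddof=1)

    # Each row of `steps` records one amalgamation step:
    # [cluster i, cluster j, distance between them, size of the new cluster].
    steps = linkage(pdist(Z, metric="euclidean"), method="complete")
    print(steps)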
Example Output
Step  Number of clusters  Similarity level  Distance level  Clusters joined  New cluster  Number of obs. in new cluster
12 8 81.4039 0.89027 2 17 2 5
13 7 79.8185 0.96617 6 11 6 5
14 6 78.7534 1.01716 4 12 4 3
15 5 66.2112 1.61760 2 5 2 7
16 4 62.0036 1.81904 1 6 1 7
17 3 41.0474 2.82229 1 4 1 10
18 2 40.1718 2.86421 2 7 2 10
19 1 0.0000 4.78739 1 2 1 20
Interpretation
Cluster Observations
Final Partition
Example Output
Number of clusters: 4
            Number of     Within cluster   Average distance   Maximum distance
            observations  sum of squares   from centroid      from centroid
Cluster1    7             3.25713          0.612540           1.12081
Cluster2    7             2.72247          0.581390           0.95186
Cluster3    3             0.55977          0.398964           0.54907
Cluster4    3             0.37116          0.326533           0.48848
Cluster Centroids
Variable Cluster1 Cluster2 Cluster3 Cluster4 Grand centroid
Gender 0.97468 -0.97468 0.97468 -0.97468 -0.0000000
Height -1.00352 1.01283 -0.37277 0.35105 -0.0000000
Weight -0.90672 0.93927 -0.86797 0.79203 -0.0000000
Handedness 0.63808 0.63808 -1.48885 -1.48885 -0.0000000
Interpretation
For the sports data, four clusters are evident in the final partition. The within-cluster sums of squares are reasonably small; the sum of
squares is largest for the first cluster (3.25713). You should further analyze your data to see if the grouping makes sense.
Cluster Observations
Graphs − Dendrogram
The dendrogram displays the groups formed by clustering of observations and their similarity levels. You can also display the distance
level on the y-axis. You should analyze the data to see if the classification makes sense.
You can label observations using a column of case labels to help interpret the clusters in the dendrogram.
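SciPy can draw the same kind of dendrogram from the linkage steps computed earlier; case_labels is a hypothetical list of names or IDs for the 20 people.

    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import dendrogram

    # Label the leaves with case labels to help interpret the clusters.
    dendrogram(steps, labels=case_labels)
    plt.ylabel("Distance")
    plt.show()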
Example Output
Interpretation
The final partition for the sports data contains four clusters of the 20 people, classified by similarities.
Standardizing variables
When the variables are in different units, you should standardize all variables to minimize the effect of scale differences. Minitab
standardizes all variables by subtracting the means and dividing by the standard deviation before calculating the distance matrix. When
you standardize variables, the grand centroid is 0 for all clusters.
Distance measures
With cluster observations, Minitab offers the following distance measures (a computational sketch follows the list):
• Euclidean distance − a standard mathematical measure of distance (the square root of the sum of squared differences).
• Pearson distance − the square root of the sum of squared distances divided by the variances. This distance gives a standardized
measure of distance.
• Manhattan distance − the sum of absolute differences, so that outliers receive less weight than they would if the Euclidean method
were used.
• Squared Euclidean and squared Pearson distances − the square of the Euclidean and Pearson distances, respectively. Therefore, the
distances that are large under the Euclidean and Pearson methods are even larger under the squared Euclidean and squared
Pearson distances.
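The sketch below computes pairwise distances under each of these measures with SciPy; note that SciPy's standardized Euclidean metric plays the role of the Pearson distance described above.

    from scipy.spatial.distance import pdist

    d_euclid = pdist(X, metric="euclidean")
    d_manhattan = pdist(X, metric="cityblock")    # sum of absolute differences
    d_sq_euclid = pdist(X, metric="sqeuclidean")  # squared Euclidean
    d_pearson = pdist(X, metric="seuclidean")     # variance-standardized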
Cluster Variables
Summary
Use the cluster variables procedure to classify variables into groups when the groups are initially not known. The primary reason to
cluster variables is to reduce the number of variables. Cluster variables uses a hierarchical clustering procedure.
For example, you can:
• Record information about the characteristics of recreation parks in the United States, then group the similar variables to reduce the
data.
• Conduct a study to determine the effects of a change in environment on blood pressure. You can record different physical
measurements of the subjects, analyze the variables for possible similar characteristics, and combine the variables to reduce the
data.
You can perform the clustering of variables analysis when:
• You want to sort variables into two or more categories
• You do not have missing data
Data Description
You conduct a study to determine the effects of the number of media outlets and universities and the literacy rate on the college admissions
of the population. For 10 cities around the world, you count the number of newspaper copies, radios, and television sets per 1,000
people. You also determine the literacy rate and whether a university is located in each city. You want to reduce the number of variables
by combining variables based on similar characteristics.
Cluster Variables
Amalgamation Steps
At each amalgamation step, two clusters of variables are joined until only one cluster remains. The amalgamation table shows the clusters
joined, their similarity level, distance between them, and other characteristics of each newly formed cluster. In Minitab, your choice of
linkage method and distance measure will greatly influence the clustering outcome.
You should look at the similarity level and distance between the joined clusters to choose the number of clusters for the final partition of
data. The step prior to where the values change abruptly may determine the number of clusters for the final partition. For the final
partition, you want reasonably large similarity levels and reasonably small distances between the joined clusters.
Example Output
Interpretation
For the college admissions data, the amalgamation steps show that the similarity level changes first from 93.97 to 93.15, then changes
abruptly from 93.1548 to 87.3150 at steps 2 and 3 (from 3 clusters to 2). This change could indicate that 3 clusters are reasonably
sufficient for the final partition, assuming that this grouping makes sense for the data.
Cluster Variables
Final Partition
If you request a final partition, you receive a list of the variables in each cluster.
Example Output
Final Partition
Cluster 1
Newspaper
Cluster 2
Radios TV Sets
Cluster 3
Literacy Rate University
Interpretation
Cluster Variables
Graphs − Dendrogram
The dendrogram displays the groups formed by clustering of variables, and their similarity levels. You can also display distance levels on
the y-axis. You should analyze your data to see if the classification makes sense.
Example Output
Interpretation
For the college admissions data, the final partition contains three clusters classified by similarities:
Linkage methods
• Complete linkage or "furthest neighbor" − the distance between two clusters is the maximum distance between a variable in one
cluster and a variable in the other cluster. This method ensures that all variables in a cluster are within a maximum distance and tends
to produce clusters with similar diameters. The results can be sensitive to outliers.
• Median linkage − the distance between two clusters is the median distance between a variable in one cluster and a variable in the
other cluster. This technique uses the median rather than the mean, thus downweighting the influence of outliers.
• McQuitty's linkage − when two clusters are joined, the distance of the new cluster to any other cluster is calculated as the average of
the distances of the soon-to-be-joined clusters to the other cluster. For example, if clusters 1 and 3 are to be joined into a new cluster,
say 1*, then the distance from 1* to cluster 4 is the average of the distances from 1 to 4 and 3 to 4.
• Ward's linkage − the distance between two clusters is the sum of squared deviations from points to centroids. The objective of Ward's
linkage is to minimize the within-cluster sum of squares. It tends to produce clusters with similar numbers of variables, but it is
sensitive to outliers. In Ward's linkage, the distance between two clusters can be larger than dmax, the maximum value in the original
distance matrix. If this happens, the similarity will be negative.
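With SciPy, the linkage methods above correspond to the method names 'complete', 'median', 'weighted' (McQuitty), and 'ward'; this sketch runs the same condensed distance matrix through each rule for comparison.

    from scipy.cluster.hierarchy import linkage
    from scipy.spatial.distance import pdist

    d = pdist(Z, metric="euclidean")   # condensed Euclidean distances, as earlier
    for method in ("complete", "median", "weighted", "ward"):
        steps = linkage(d, method=method)
        print(method, "final merge distance:", round(steps[-1, 2], 3))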
Cluster K-Means
Summary
Use Cluster K-Means to cluster observations into groups when the groups are initially unknown. This procedure uses non-hierarchical
clustering of observations. K-means clustering works best when sufficient information is available to make good starting cluster
designations.
For example, you can use Cluster K-Means to:
• Group people based on physical fitness level when you suspect that people fall into three categories
• Cluster population data with known characteristics from a large database
Use Cluster K-Means when you have:
• Good starting points for clusters
• Multiple measurements on items or subjects
• No missing data
Data Description
As a business analyst, you want to classify 22 successful small-to-medium size manufacturing companies into meaningful groups for
future analyses.
You suspect that the data fall into three groups:
• Established companies − in existence for over 15 years, over 130 clients, a strong return on investments, and strong sales
• Mid-growth companies − in existence for over 10 years, over 100 clients, a good return on investments, and good sales
• Young companies − in existence for less than 10 years, less than 100 clients, a good return on investments, and good sales
Example Output
Standardized Variables
Final Partition
Number of clusters: 3
Cluster Centroids
Grand
Variable Cluster1 Cluster2 Cluster3 centroid
Clients 1.2318 0.5225 -0.9108 0.0000
Rate of return 1.2942 0.2217 -0.6950 0.0000
Sales 1.1866 0.5157 -0.8872 0.0000
Years 1.2030 0.5479 -0.9195 0.0000
Interpretation
For the manufacturing company data, K-means clustering classified the 22 companies as 4 established companies, 8 mid-growth
companies, and 10 young companies. Cluster 1 (established companies) has the smallest within-cluster sum of squares (1.593), probably
because there are only 4 observations in this cluster.
K-means procedures work best when you provide good starting points for clusters.
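A scikit-learn sketch of K-means with user-supplied starting points; the rows chosen for starts are hypothetical representatives of the three suspected groups.

    from scipy.stats import zscore
    from sklearn.cluster import KMeans

    Z = zscore(X, ddof=1)             # standardized company data

    # One (standardized) row per suspected group as the initial centroids.
    starts = Z[[0, 7, 15]]

    km = KMeans(n_clusters=3, init=starts, n_init=1).fit(Z)
    print(km.labels_)                 # cluster membership for each company
    print(km.cluster_centers_)        # final cluster centroids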
Outliers
The presence of outliers (unusually large or small values) in your data can affect the clustering of observations. The clusters are typically
larger when outliers are not removed, and the resulting solution is less clear. If you have a large data set, consider deleting outliers
from the Cluster K-Means analysis.
Discriminant Analysis
Summary
Use Discriminant Analysis to classify observations into two or more groups if you have a sample with known groups. You can also use a
discriminant analysis to investigate how variables contribute to group separation and to place objects or individuals into defined groups;
for example, to classify three species of birds based on wingspan, color, and diet.
Discriminant analysis is similar to multiple regression because both techniques use two or more predictor variables and a single response
variable. However, in discriminant analysis, the response variable is categorical. Examples of categorical variables are gender, color, and
operational status.
You can perform a linear or quadratic discriminant analysis (see the sketch after this list):
• Use a linear analysis when you assume the covariance matrices are equal for all groups.
• Use a quadratic analysis when you assume the covariance matrices are not equal for all groups.
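Both variants are available in scikit-learn, as this sketch shows; X holds the predictors, y the known group labels, and X_new is a hypothetical set of new observations.

    from sklearn.discriminant_analysis import (
        LinearDiscriminantAnalysis,
        QuadraticDiscriminantAnalysis,
    )

    lda = LinearDiscriminantAnalysis().fit(X, y)      # equal covariance matrices
    qda = QuadraticDiscriminantAnalysis().fit(X, y)   # group-specific covariances
    print(lda.predict(X_new))                         # classify new observations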
Data Description
High school administrators assign each student to one of three educational tracks:
• 1 − for above average students who can learn independently and have strong math and language skills
• 2 − for average students who learn best with a moderate amount of teacher attention and have average math and language skills
• 3 − for students who require substantial interaction with the teacher and have weak math and language skills
An intelligence test and motivation assessment were administered to 60 students from each track. School officials want to know if the
students' intelligence test and motivation assessment scores accurately classify student placement.
Discriminant Analysis
Classification − Summary of Classification
In the summary of the classification results, Minitab classifies the observations by:
• The group in which they are placed
• The group in which they are predicted to be placed
Minitab provides the following statistics for each true group:
• Total N − the number of observations in each true group
• N Correct − the number of observations correctly placed in each true group
• Proportion − the proportion of observations correctly placed in each true group
N, N Correct, and Proportion Correct are also shown for all groups.
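These statistics fall directly out of the classification table, as the sketch below shows; pred is a hypothetical vector of predicted groups for the observations in y.

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Transpose so rows = put into group and columns = true group,
    # matching Minitab's layout.
    table = confusion_matrix(y, pred).T
    total_n = table.sum(axis=0)       # Total N per true group
    n_correct = np.diag(table)        # N Correct per true group
    print(n_correct / total_n)        # Proportion per true group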
Example Output
Group 1 2 3
Count 60 60 60
Summary of classification
True Group
Put into Group 1 2 3
1 59 5 0
2 1 53 3
3 0 2 57
Total N 60 60 60
N correct 59 53 57
Proportion 0.983 0.883 0.950
Interpretation
Discriminant Analysis
Classification − With Cross-Validation
The discriminant analysis output includes a summary of the classification results with cross-validation. Cross-validation compensates for
an optimistic error rate. Therefore, the cross-validation classification results are more conservative than classification without cross-
validation.
As in the regular classification situation, observations are classified by:
• The group in which they are placed
• The group in which they are predicted to be placed
Minitab provides the following statistics for each true group:
Example Output
True Group
Put into Group 1 2 3
1 59 5 0
2 1 52 3
3 0 3 57
Total N 60 60 60
N correct 59 52 57
Proportion 0.983 0.867 0.950
Interpretation
Discriminant Analysis
Distance and Discriminant Functions − Squared Distance Between Groups
An observation is classified into a group if the squared distance (also called the Mahalanobis distance) of the observation to the group
center (mean) is the minimum. For linear discriminant analysis, an assumption is made that covariance matrices are equal for all groups.
The table displays the distance between groups. Linear discriminant analysis has the property of symmetric squared distance, meaning
that the linear discriminant function of group i evaluated with the mean of group j is equal to the linear discriminant function of group j
evaluated with the mean of group i.
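The squared distance itself is easy to compute; this sketch assumes the group means and the pooled covariance matrix have already been estimated from the data.

    import numpy as np

    def squared_distance(m_i, m_j, pooled_cov):
        """Squared Mahalanobis distance between two group means."""
        diff = m_i - m_j
        return diff @ np.linalg.solve(pooled_cov, diff)

    # e.g., squared_distance(mean_1, mean_3, pooled_cov) for groups 1 and 3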
Example Output
1 2 3
1 0.0000 12.9853 48.0911
2 12.9853 0.0000 11.3197
3 48.0911 11.3197 0.0000
Interpretation
For the education data, the greatest distance is between groups 1 and 3 (48.0911). The distance between groups 1 and 2 is 12.9853,
and the distance between groups 2 and 3 is 11.3197.
Discriminant Analysis
Distance and Discriminant Functions − Linear Discriminant Function for Groups
The linear discriminant function for each group corresponds to the regression coefficients in a multiple regression analysis. In general, you
can fit a linear equation of the type:
Group = b0 + b1x1 + b2x2 + ... + bmxm
where:
• b0 is the constant
• b1 through bm are the estimated regression coefficients
• x1... xm are the predictors
The linear discriminant functions discriminate between groups. For example, when you have three groups, Minitab estimates a function
for discriminating between:
• Group 1 and groups 2 and 3
• Group 2 and groups 1 and 3
• Group 3 and groups 1 and 2
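Classification then amounts to evaluating each group's function and taking the largest value. The sketch below uses the constants and coefficients from the example output that follows; the test scores are hypothetical.

    import numpy as np

    const = np.array([-237.85, -170.58, -115.65])
    coef = np.array([[1.57, 1.19, 0.90],    # IQ score row
                     [5.15, 4.66, 4.00]])   # motivation row

    def classify(iq, motivation):
        # One linear discriminant score per group; assign to the largest.
        scores = const + iq * coef[0] + motivation * coef[1]
        return scores.argmax() + 1          # groups are numbered 1 to 3

    print(classify(iq=120, motivation=55))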
Example Output
1 2 3
Constant -237.85 -170.58 -115.65
IQ Score 1.57 1.19 0.90
Motivation 5.15 4.66 4.00
Interpretation
For the education data, group 1 has the highest linear discriminant function (1.57), indicating that group 1 contributes more than group 2
or 3 to the classification of group membership.
Discriminant Analysis
Group and Pooled Statistics − Descriptive Statistics
The descriptive statistics summary displays the pooled mean and group means, and the pooled standard deviation and group standard
deviations.
Example Output
Interpretation
Discriminant Analysis
Group and Pooled Statistics − Covariance Matrices
The covariance tables display the pooled covariance matrix and the covariance matrices for each group. Covariance is a measure of
linear association among observations.
Example Output
Pooled Covariance Matrix
IQ Score Motivation
IQ Score 65.759
Motivation 4.730 8.964
Covariance Matrix for Group 1
IQ Score Motivation
IQ Score 69.0158
Motivation 0.6915 5.8034
Covariance Matrix for Group 2
IQ Score Motivation
IQ Score 85.8675
Motivation 4.6031 10.5184
Covariance Matrix for Group 3
IQ Score Motivation
IQ Score 42.3941
Motivation 8.8941 10.5703
Interpretation
For the education data, the covariances for groups 1 through 3 are 69.0158, 85.8675, and 42.3941. The pooled covariance is 65.759.
Discriminant Analysis
Summary of Classified Observations
Example Output
Interpretation
For the education data, the summary of classified observations indicates in which group the observation should have been placed. For
example, school officials should have placed student 4 into group 2, but they misclassified this student into group 1.
Cross-validation
Cross-validation is used to compensate for an optimistic apparent error rate. The apparent error rate is the percent of misclassified
observations. This number tends to be optimistic because the classified data are the same data used to build the classification function.
The cross-validation routine works by omitting each observation one at a time, recalculating the classification function using the remaining
data, and then classifying the omitted observation. Computation takes approximately four times longer with this procedure. When
cross-validation is performed, Minitab displays an additional summary table.
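In scikit-learn terms, the routine is leave-one-out cross-validation, which can be sketched as follows for a linear discriminant classifier.

    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    # Each observation is classified by a function fit on all the others.
    pred_cv = cross_val_predict(LinearDiscriminantAnalysis(), X, y,
                                cv=LeaveOneOut())
    print((pred_cv != y).mean())    # cross-validated misclassification rate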
As an alternative to cross-validation, you can calculate a more realistic error rate by splitting your data into two parts. Use one part to
create the discriminant function, and the other part as a validation set. Predict group membership for the validation set and calculate the
error rate as the percent of these data that are misclassified.
Prior probabilities
Sometimes you know the probability of an observation belonging to a group prior to conducting a discriminant analysis. For example, if
you are classifying the buyers of a particular car, you may already know that 60% of purchasers are male and 40% are female. If you
know or can estimate these probabilities, a discriminant analysis can use these prior probabilities in calculating the posterior probabilities
(the probabilities of assigning observations to groups given the data).
With the assumption that the data have a normal distribution, the linear discriminant function is increased by ln(pi), where pi is the prior
probability of group i. Because observations are assigned to groups according to the smallest generalized distance, or equivalently the
largest linear discriminant function, the effect is to increase the posterior probabilities for a group with a high prior probability.
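For illustration, scikit-learn's linear discriminant analysis accepts prior probabilities directly, as in this sketch with the 60%/40% car-buyer example above.

    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    # Known priors shift the posterior probabilities toward the more
    # common group, as described above.
    lda = LinearDiscriminantAnalysis(priors=[0.6, 0.4]).fit(X, y)
    print(lda.predict_proba(X_new))   # posterior probabilities given the data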
Specifying prior probabilities can greatly affect the accuracy of your results. Investigate whether the unequal proportions across groups
reflect a real difference in the true population or whether the difference is a result of sampling error.