Lecture Notes
on
Compositional Data Analysis
V. Pawlowsky-Glahn
J. J. Egozcue
R. Tolosana-Delgado
2007
Prof. Dr. Vera Pawlowsky-Glahn
Catedrática de Universidad (Full professor)
University of Girona
Dept. of Computer Science and Applied Mathematics
Campus Montilivi — P-1, E-17071 Girona, Spain
vera.pawlowsky@udg.edu
These notes have been prepared as support to a short course on compositional data
analysis. Their aim is to transmit the basic concepts and skills for simple applications,
thus setting the premises for more advanced projects. One should be aware that frequent
updates will be required in the near future, as the theory presented here is a field of
active research.
The notes are based both on the monograph by John Aitchison, Statistical analysis
of compositional data (1986), and on recent developments that complement the theory
developed there, mainly those by Aitchison, 1997; Barceló-Vidal et al., 2001; Billheimer
et al., 2001; Pawlowsky-Glahn and Egozcue, 2001, 2002; Aitchison et al., 2002; Egozcue
et al., 2003; Pawlowsky-Glahn, 2003; Egozcue and Pawlowsky-Glahn, 2005. To avoid
constant references to mentioned documents, only complementary references will be
given within the text.
Readers should be aware that a thorough understanding of compositional data
analysis requires a good knowledge of standard univariate statistics, basic linear
algebra and calculus, complemented with an introduction to applied multivariate
statistical analysis. The specific subjects of interest in multivariate statistics in real space can
be learned in parallel from standard textbooks, like for instance Krzanowski (1988)
and Krzanowski and Marriott (1994) (in English), Fahrmeir and Hamerle (1984) (in
German), or Peña (2002) (in Spanish). The intended audience thus ranges from advanced
students in applied sciences to practitioners.
Concerning notation, it is important to note that, to conform to the standard practice
of recording samples as a matrix where each row is a sample and each column is a
variate, vectors will be considered as row vectors, to make the transfer from theoretical
concepts to practical computations easier.
Most chapters end with a list of exercises. They are formulated in such a way that
they have to be solved using appropriate software. A user-friendly, MS-Excel-based
freeware package to facilitate this task can be downloaded from the web at the following address:
http://ima.udg.edu/Recerca/EIO/inici eng.html
There is also a set of subroutines for Matlab, developed mainly by John Aitchison,
which can be obtained from him or from any member of the compositional data analysis
group at the University of Girona. Finally, those interested in working with R (or S-plus)
may either use the set of functions “mixeR” by Bren (2003), or the full-fledged package
“compositions” by van den Boogaart and Tolosana-Delgado (2005).
Contents

Preface

1 Introduction

4 Coordinate representation
  4.1 Introduction
  4.2 Compositional observations in real space
  4.3 Generating systems
  4.4 Orthonormal coordinates
  4.5 Working in coordinates
  4.6 Additive log-ratio coordinates
  4.7 Simplicial matrix notation
  4.8 Exercises

7 Statistical inference
  7.1 Testing hypothesis about two groups
  7.2 Probability and confidence regions for compositional data
  7.3 Exercises

8 Compositional processes
  8.1 Linear processes: exponential growth or decay of mass
  8.2 Complementary processes
  8.3 Mixture process
  8.4 Linear regression with compositional response
  8.5 Principal component analysis

References

Appendices
Introduction
The pre-1960 phase rode on the crest of the developmental wave of stan-
dard multivariate statistical analysis, an appropriate form of analysis for
the investigation of problems with real sample spaces. Despite the obvious
fact that a compositional vector—with components the proportions of some
whole—is subject to a constant-sum constraint, and so is entirely different
from the unconstrained vector of standard unconstrained multivariate sta-
tistical analysis, scientists and statisticians alike seemed almost to delight in
applying all the intricacies of standard multivariate analysis, in particular
correlation analysis, to compositional vectors. We know that Karl Pearson,
in his definitive 1897 paper on spurious correlations, had pointed out the
pitfalls of interpretation of such activity, but it was not until around 1960
that specific condemnation of such an approach emerged.
In the second phase, the primary critic of the application of standard
multivariate analysis to compositional data was the geologist Felix Chayes
(1960), whose main criticism was in the interpretation of product-moment
correlation between components of a geochemical composition, with negative
bias the distorting factor from the viewpoint of any sensible interpretation.
For this problem of negative bias, often referred to as the closure problem,
Sarmanov and Vistelius (1959) supplemented the Chayes criticism in geolog-
ical applications and Mosimann (1962) drew the attention of biologists to it.
However, even researchers aware of the problem, instead of working towards an appro-
priate methodology, adopted what can only be described as a pathological
The notes presented here correspond to the fourth phase. They aim to summarise
the state of the art in the staying-in-the-simplex approach. Therefore, the first
part will be devoted to the algebraic-geometric structure of the simplex, which we call
Aitchison geometry.
Chapter 2
Sample space
However, this definition does not include compositions in e.g. meq/L. Therefore, a more
general definition, together with its interpretation, is given in Section 2.2.
Definition 2.3. For any vector of D real positive components

z = [z1, z2, . . . , zD] ∈ R^D_+,

the closure of z is defined as

C(z) = [ κ·z1 / Σ_{i=1}^{D} zi , κ·z2 / Σ_{i=1}^{D} zi , . . . , κ·zD / Σ_{i=1}^{D} zi ].
The result is the same vector rescaled so that the sum of its components is κ. This
operation is required for a formal definition of subcomposition. Note that κ depends
on the units of measurement: usual values are 1 (proportions), 100 (%), 106 (ppm) and
109 (ppb).
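As an illustration, the closure operation can be coded in a few lines. The Python function below is a sketch; its name and interface are ours, not taken from any of the packages mentioned in the Preface.

```python
def closure(z, kappa=1.0):
    """Rescale a vector of positive components so that they sum to kappa."""
    if any(zi <= 0 for zi in z):
        raise ValueError("all components must be strictly positive")
    total = sum(z)
    return [kappa * zi / total for zi in z]

# The same ray in R^D_+ always closes to the same composition:
x_mg = [530.0, 760.0, 140.0]   # e.g. masses in mg
x_g = [0.53, 0.76, 0.14]       # the same masses expressed in g
print(closure(x_mg, kappa=100))  # percentages
print(closure(x_g, kappa=100))   # identical percentages
```

Changing the units only moves the vector along its ray, so both inputs close to the same composition.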
Definition 2.4. Given a composition x, a subcomposition xs with s parts is obtained
applying the closure operation to a subvector [xi1 , xi2 , . . . , xis ] of x. Subindexes i1 , . . . , is
tell us which parts are selected in the subcomposition, not necessarily the first s ones.
Very often, compositions contain many variables; e.g., the major oxide bulk compo-
sition of igneous rocks has around 10 components, and these are but a few of the total
possible. Nevertheless, one seldom represents the full sample. In fact, most of the
applied literature on compositional data analysis (mainly in geology) restricts its fig-
ures to 3-part (sub)compositions. For 3 parts, the simplex is an equilateral triangle, as
the one represented in Figure 2.1 left, with vertices at A = [κ, 0, 0], B = [0, κ, 0] and
C = [0, 0, κ]. But this is commonly visualized in the form of a ternary diagram, which
is an equivalent representation. A ternary diagram is an equilateral triangle such
that a generic sample p = [pa , pb , pc ] will plot at a distance pa from the opposite side
of vertex A, at a distance pb from the opposite side of vertex B, and at a distance pc
from the opposite side of vertex C, as shown in Figure 2.1 right. The triplet [pa , pb , pc ]
is commonly called the barycentric coordinates of p, easily interpretable but useless in
plotting (plotting them would yield the three-dimensional left-hand plot of Figure 2.1).
What is needed (to get the right-hand plot of the same figure) is the expression of the
2.2 Principles of compositional analysis
• F was unchanged, but 25 gr Q were depleted from the sandstone and at the same
time 2 gr R were added (for instance, because Q was better cemented in the
sandstone, thus it tends to form coarser grains),
The first two cases yield final masses of [53, 76, 14] gr and [28, 41, 8] gr, respectively. In a
purely compositional data set, we do not know whether mass was added or subtracted
in passing from the sandstone to the sand. Thus, we cannot decide which of these cases really oc-
curred. Without further (non-compositional) information, there is no way to distin-
guish between [53, 76, 14] gr and [28, 41, 8] gr, as we only have the value of the sand
composition after closure. Closure is a projection of any point in the positive orthant
of D-dimensional real space onto the simplex. All points on a ray starting at the
origin (e.g., [53, 76, 14] and [28, 41, 8]) are projected onto the same point of S D (e.g.,
[37, 53, 10] %). We say that the ray is an equivalence class and the point on S^D a
representative of the class: Figure 2.2 shows this relationship. Moreover, if we change the
Figure 2.2: Representation of the compositional equivalence relationship. A represents the original
sandstone composition, B the final sand composition, F the amount of each part if feldspar was added
to the system (first hypothesis), and Q the amount of each part if quartz was depleted from the system
(second hypothesis). Note that the points B, Q and F are compositionally equivalent.
units of our data (for instance, from % to ppm), we simply multiply all our points by
the constant of change of units, moving them along their rays to the intersections with
another triangle, parallel to the plotted one.
Definition 2.5. Two vectors of D positive real components x, y ∈ R^D_+ (xi, yi > 0 for all
i = 1, 2, . . . , D) are compositionally equivalent if there exists a positive scalar λ ∈ R+
such that x = λ · y or, equivalently, C(x) = C(y).
It is highly reasonable to ask our analyses to yield the same result, independently of
the value of λ. This is what Aitchison (1986) called scale invariance:
Definition 2.6. A function f (·) is scale-invariant if for any positive real value λ ∈ R+
and for any composition x ∈ S^D , the function satisfies f (λx) = f (x), i.e. it yields the
same result for all compositionally equivalent vectors.
This can only be achieved if f (·) is a function only of log-ratios of the parts in x
(equivalently, of ratios of parts) (Aitchison, 1997; Barceló-Vidal et al., 2001).
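The contrast between scale-invariant and non-scale-invariant functions can be shown with a small Python sketch; both statistics below are hypothetical examples, chosen only to make the effect of λ visible.

```python
import math

def ratio_stat(x):
    """Scale-invariant: a log-ratio of two parts (illustrative choice)."""
    return math.log(x[0] / x[1])

def mean_part(x):
    """Not scale-invariant: its value depends on the units of measurement."""
    return sum(x) / len(x)

x = [37.0, 53.0, 10.0]          # a composition in %
x_ppm = [1e4 * xi for xi in x]  # the same composition in ppm

print(abs(ratio_stat(x) - ratio_stat(x_ppm)))  # ~0: f(lambda x) = f(x)
print(mean_part(x), mean_part(x_ppm))          # differ by the factor lambda
```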
• The distance measured between two full compositions must be greater than (or at
least equal to) the distance between them when considering any subcomposition.
This particular behaviour of the distance is called subcompositional dominance.
Exercise 2.4 shows that the Euclidean distance between compositional vectors
does not fulfill this condition, and is thus ill-suited to measure distance between
compositions.
• If we erase a non-informative part, our results should not change; for instance, if
we have available hydrogeochemical data from a source, and we are interested in
classifying the kind of rocks that the water washed, we will mostly use the relations
between some major oxides and ions (SO4²⁻, HCO3⁻, Cl⁻, to mention a few), and
we should get the same results working with meq/L (including implicitly the water
content), or in weight percent of the ions of interest.
2.3 Exercises
Exercise 2.1. If data have been measured in ppm, what is the value of the constant
κ?
Exercise 2.2. Plot a ternary diagram using different values for the constant sum κ.
Exercise 2.3. Verify that the data in Table 2.1 satisfy the conditions for being composi-
tional. Plot them in a ternary diagram.
Table 2.1.

sample      1      2      3      4      5      6      7      8      9     10
x1      79.07  31.74  18.61  49.51  29.22  21.99  11.74  24.47   5.14  15.54
x2      12.83  56.69  72.05  15.11  52.36  59.91  65.04  52.53  38.39  57.34
x3       8.10  11.57   9.34  35.38  18.42  18.10  23.22  23.00  56.47  27.11

sample     11     12     13     14     15     16     17     18     19     20
x1      57.17  52.25  77.40  10.54  46.14  16.29  32.27  40.73  49.29  61.49
x2       3.81  23.73   9.13  20.34  15.97  69.18  36.20  47.41  42.74   7.63
x3      39.02  24.02  13.47  69.12  37.89  14.53  31.53  11.86   7.97  30.88
Exercise 2.4. Compute the Euclidean distance between the first two vectors of Table
2.1. Imagine we originally measured a fourth variable x4, which was constant for all
samples and equal to 5%. Take the first two vectors, close them to sum up to 95%, add
the fourth variable to them (so that they sum up to 100%), and compute the Euclidean
distance between the closed vectors. If the Euclidean distance is subcompositionally
dominant, the distance measured in 4 parts must be greater than or equal to the distance
measured in the 3-part subcomposition.
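A possible Python sketch of this exercise, using the first two samples of Table 2.1 (the helper names are ours):

```python
import math

def euclid(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def closure(z, kappa):
    total = sum(z)
    return [kappa * zi / total for zi in z]

v1 = [79.07, 12.83, 8.10]   # sample 1 of Table 2.1
v2 = [31.74, 56.69, 11.57]  # sample 2 of Table 2.1

d3 = euclid(v1, v2)  # distance within the 3-part subcomposition

# Add a constant fourth part of 5 %: close the three parts to 95 %.
w1 = closure(v1, 95) + [5.0]
w2 = closure(v2, 95) + [5.0]
d4 = euclid(w1, w2)  # distance within the full 4-part composition

print(d3, d4)
```

Here d4 turns out smaller than d3, so the Euclidean distance fails the dominance requirement.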
Chapter 3
Aitchison geometry
Intuitively we would say that the difference between [5, 65, 30] and [10, 60, 30] is not
the same as the difference between [50, 20, 30] and [55, 15, 30]. The Euclidean distance
between them is certainly the same, as there is a difference of 5 units both between
the first and the second components, but in the first case the proportion in the first
component is doubled, while in the second case the relative increase is about 10%, and
this relative difference seems more adequate to describe compositional variability.
This is not the only reason for discarding Euclidean geometry as a proper tool for
analysing compositional data. Problems might appear in many situations, like those
where results end up outside the sample space, e.g. when translating compositional vec-
tors, or computing joint confidence regions for random compositions under assumptions
of normality, or using hexagonal confidence regions. This last case is paradigmatic, as
such hexagons are often naively cut when they lie partly outside the ternary diagram,
without regard to any probability adjustment. These kinds of problems are not
just theoretical: they are practical and interpretative.
What is needed is a sensible geometry to work with compositional data. In the
simplex, things appear not as simple as we feel they are in real space, but it is possible
to find a way of working in it that is completely analogous. First of all, we can define
two operations which give the simplex a vector space structure. The first one is the
perturbation operation, which is analogous to addition in real space, the second one is
the power transformation, which is analogous to multiplication by a scalar in real space.
Both require in their definition the closure operation; recall that closure is nothing else
but the projection of a vector with positive components onto the simplex. Second, we
can obtain a linear vector space structure, and thus a geometry, on the simplex. We just
add an inner product, a norm and a distance to the previous definitions. With the inner
product we can project compositions onto particular directions, check for orthogonality
and determine angles between compositional vectors; with the norm we can compute
the length of a composition; the possibilities of a distance should be clear. With all
together we can operate in the simplex in the same way as we operate in real space.
x ⊕ y = C [x1 y1 , x2 y2 , . . . , xD yD ] .
1. commutative property: x ⊕ y = y ⊕ x;
2. associative property: x ⊕ (y ⊕ z) = (x ⊕ y) ⊕ z;
3. neutral element: n = C [1, 1, . . . , 1] = [1/D, 1/D, . . . , 1/D];
n is the barycenter of the simplex and is unique;
3.3 Inner product, norm and distance
Figure 3.1: Left: Perturbation of initial compositions (◦) by p = [0.1, 0.1, 0.8] resulting in compo-
sitions (⋆). Right: Power transformation of compositions (⋆) by α = 0.2 resulting in compositions
(◦).
4. inverse of x: x⁻¹ = C [x1⁻¹, x2⁻¹, . . . , xD⁻¹]; thus, x ⊕ x⁻¹ = n. By analogy with
standard operations in real space, we will write x ⊕ y⁻¹ = x ⊖ y.
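The two operations and the inverse can be sketched in Python (the helper names are ours); closure appears in each definition, as the text requires.

```python
def closure(z):
    total = sum(z)
    return [zi / total for zi in z]

def perturb(x, y):
    """x (+) y = C[x1*y1, ..., xD*yD]"""
    return closure([xi * yi for xi, yi in zip(x, y)])

def power(alpha, x):
    """alpha (.) x = C[x1**alpha, ..., xD**alpha]"""
    return closure([xi ** alpha for xi in x])

def inverse(x):
    """x^(-1) = C[1/x1, ..., 1/xD]"""
    return closure([1.0 / xi for xi in x])

x = closure([0.7, 0.4, 0.8])
n = closure([1.0, 1.0, 1.0])   # neutral element, the barycenter
z = perturb(x, inverse(x))     # x (+) x^(-1)
print(z)                       # the barycenter [1/3, 1/3, 1/3]
```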
Property 3.2. The power transformation satisfies the properties of an external product.
For x, y ∈ S D , α, β ∈ R it holds
1. associative property: α ⊙ (β ⊙ x) = (α · β) ⊙ x;
Note that the closure operation cancels out any constant and, thus, the closure
constant itself is not important from a mathematical point of view. This fact allows
us to omit the closure in intermediate steps of any computation without problem. It
has also important implications for practical reasons, as shall be seen during simplicial
principal component analysis. We can express this property for z ∈ R^D_+ and x ∈ S^D as

x ⊕ (α ⊙ z) = x ⊕ (α ⊙ C(z)). (3.1)
Nevertheless, one should be always aware that the closure constant is very important
for the correct interpretation of the units of the problem at hand. Therefore, controlling
for the right units should be the last step in any analysis.
In practice, alternative but equivalent expressions of the inner product, norm and
distance may be useful. Two possible alternatives of the inner product follow:

⟨x, y⟩_a = (1/D) Σ_{i<j} ln(x_i/x_j) ln(y_i/y_j)
         = Σ_{i=1}^{D} ln x_i ln y_i − (1/D) Σ_{j=1}^{D} ln x_j Σ_{k=1}^{D} ln y_k ,

where the notation Σ_{i<j} means Σ_{i=1}^{D−1} Σ_{j=i+1}^{D}.
To refer to the properties of (S^D, ⊕, ⊙) as a Euclidean linear vector space, we shall
talk globally about the Aitchison geometry on the simplex, and in particular about the
Aitchison distance, norm and inner product. Note that in mathematical textbooks
such a linear vector space is called either a real Euclidean space or a finite-dimensional real
Hilbert space.
The algebraic-geometric structure of S D satisfies standard properties, like compati-
bility of the distance with perturbation and power transformation, i.e.
da (p ⊕ x, p ⊕ y) = da (x, y), da (α ⊙ x, α ⊙ y) = |α|da(x, y),
for any x, y, p ∈ S D and α ∈ R. For a discussion of these and other properties, see
(Billheimer et al., 2001) or (Pawlowsky-Glahn and Egozcue, 2001). For a comparison
with other measures of difference obtained as restrictions of distances in RD to S D , see
(Martín-Fernández et al., 1998; Martín-Fernández et al., 1999; Aitchison et al., 2000;
Martín-Fernández, 2001). The Aitchison distance is subcompositionally coherent, as
this set of operations induces the same linear vector space structure in the subspace cor-
responding to the subcomposition. Finally, the distance is subcompositionally dominant,
as shown in Exercise 3.7.
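The pairwise log-ratio form of the inner product gives a direct way to compute the Aitchison distance. The sketch below (helper names ours) checks compatibility with perturbation and subcompositional dominance on a 2-part subcomposition:

```python
import math

def closure(z):
    total = sum(z)
    return [zi / total for zi in z]

def perturb(x, y):
    return closure([xi * yi for xi, yi in zip(x, y)])

def a_dist(x, y):
    """Aitchison distance via pairwise log-ratios."""
    D = len(x)
    s = 0.0
    for i in range(D - 1):
        for j in range(i + 1, D):
            s += (math.log(x[i] / x[j]) - math.log(y[i] / y[j])) ** 2
    return math.sqrt(s / D)

x = closure([0.7, 0.4, 0.8])
y = closure([0.2, 0.8, 0.1])
p = closure([0.1, 0.1, 0.8])

d_full = a_dist(x, y)
d_perturbed = a_dist(perturb(p, x), perturb(p, y))   # compatibility with perturbation
d_sub = a_dist(closure(x[:2]), closure(y[:2]))       # a 2-part subcomposition
print(d_full, d_perturbed, d_sub)
```

Perturbing both compositions leaves the distance unchanged, and the subcompositional distance never exceeds the full one.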
3.4 Geometric figures
Figure 3.2: Orthogonal grids of compositional lines in S^3, equally spaced, 1 unit in Aitchison distance
(Def. 3.5). The grid on the right is rotated 45° with respect to the grid on the left.
in a ternary diagram, forming a square, orthogonal grid of side equal to one Aitchison
distance unit. Recall that parallel lines have the same leading vector, but different
starting points, like for instance y1 = x1 ⊕ (α ⊙ x) and y2 = x2 ⊕ (α ⊙ x), while
orthogonal lines are those for which the inner product of the leading vectors is zero,
i.e., for y1 = x0 ⊕ (α1 ⊙ x1 ) and y2 = x0 ⊕ (α2 ⊙ x2 ), with x0 their intersection point
and x1 , x2 the corresponding leading vectors, it holds hx1 , x2 ia = 0. Thus, orthogonal
means here that the inner product given in Definition 3.3 of the leading vectors of two
lines, one of each family, is zero, and one Aitchison distance unit is measured by the
distance given in Definition 3.5.
Once we have a well defined geometry, it is straightforward to define any geometric
figure we might be interested in, like for instance circles, ellipses, or rhomboids, as
illustrated in Figure 3.3.
3.5 Exercises
Exercise 3.1. Consider the two vectors [0.7, 0.4, 0.8] and [0.2, 0.8, 0.1]. Perturb one
vector by the other with and without previous closure. Is there any difference?
Exercise 3.2. Perturb each sample of the data set given in Table 2.1 with x1 =
C [0.7, 0.4, 0.8] and plot the initial and the resulting perturbed data set. What do you
observe?
Figure 3.3: Circles and ellipses (left) and perturbation of a segment (right) in S^3.
Exercise 3.3. Apply the power transformation with α ranging from −3 to +3 in steps
of 0.5 to x1 = C [0.7, 0.4, 0.8] and plot the resulting set of compositions. Join them by
a line. What do you observe?
Exercise 3.4. Perturb the compositions obtained in Ex. 3.3 by x2 = C [0.2, 0.8, 0.1].
What is the result?
Exercise 3.5. Compute the Aitchison inner product of x1 = C [0.7, 0.4, 0.8] and x2 =
C [0.2, 0.8, 0.1]. Are they orthogonal?
Exercise 3.6. Compute the Aitchison norm of x1 = C [0.7, 0.4, 0.8] and call it a. Apply
to x1 the power transformation α ⊙ x1 with α = 1/a. Compute the Aitchison norm of
the resulting composition. How do you interpret the result?
Exercise 3.7. Re-do Exercise 2.4, but using the Aitchison distance given in Definition
3.5. Is it subcompositionally dominant?
Exercise 3.8. In a 2-part composition x = [x1 , x2 ], simplify the formula for the Aitchi-
son distance, taking x2 = 1 − x1 (so, using κ = 1). Use it to plot 7 equally-spaced
points in the segment (0, 1) = S 2 , from x1 = 0.014 to x1 = 0.986.
Exercise 3.9. In a mineral assemblage, several radioactive isotopes have been mea-
sured, obtaining [238U, 232Th, 40K] = [150, 30, 110] ppm. Which will be the composition
after Δt = 10^9 years? And after another Δt years? Which was the composition Δt
years ago? And Δt years before that? Close these 5 compositions and represent them
in a ternary diagram. What do you see? Could you write the evolution as an equation?
(Half-life disintegration periods: [238U, 232Th, 40K] = [4.468, 14.05, 1.277] · 10^9 years.)
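A numerical sketch of Exercise 3.9, assuming each parent isotope's mass decays as m · 2^(−t/half-life) and using negative t for compositions in the past:

```python
half_life = {"U238": 4.468e9, "Th232": 14.05e9, "K40": 1.277e9}  # years
m0 = {"U238": 150.0, "Th232": 30.0, "K40": 110.0}                # ppm

def composition_after(m0, half_life, t):
    """Decay each mass by 2**(-t / half_life), then close to 100 %."""
    masses = {k: m0[k] * 2.0 ** (-t / half_life[k]) for k in m0}
    total = sum(masses.values())
    return {k: 100.0 * v / total for k, v in masses.items()}

# the five compositions asked for: -2*dt, -dt, 0, +dt, +2*dt with dt = 1e9 years
for t in (-2e9, -1e9, 0.0, 1e9, 2e9):
    print(t, composition_after(m0, half_life, t))
```

Plotted in a ternary diagram, these five points fall on a compositional line: each step of Δt is the same perturbation applied again.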
Chapter 4
Coordinate representation
4.1 Introduction
J. Aitchison (1986) used the fact that for compositional data size is irrelevant —as
interest lies in relative proportions of the components measured— to introduce trans-
formations based on ratios, the essential ones being the additive log-ratio transformation
(alr) and the centred log-ratio transformation (clr). Then, he applied classical statistical
analysis to the transformed observations, using the alr transformation for modelling, and
the clr transformation for those techniques based on a metric. The underlying reason
was that the alr transformation does not preserve distances, whereas the clr transfor-
mation preserves distances but leads to a singular covariance matrix. In mathematical
terms, we say that the alr transformation is an isomorphism, but not an isometry, while
the clr transformation is an isometry, and thus also an isomorphism, but between S^D
and a subspace of R^D, leading to degenerate distributions. Thus, Aitchison's approach
opened up a rigorous strategy, but care had to be applied when using either of the two
transformations.
Using the Euclidean vector space structure, it is possible to give an algebraic-
geometric foundation to his approach, and it is possible to go even a step further.
Within this framework, a transformation of coefficients is equivalent to expressing obser-
vations in a different coordinate system. We are used to working in an orthogonal system,
known as a Cartesian coordinate system; we know how to change coordinates within
this system and how to rotate axes. But neither the clr nor the alr transformation
can be directly associated with an orthogonal coordinate system in the simplex, a fact
that led Egozcue et al. (2003) to define a new transformation, called ilr (for isometric
logratio) transformation, which is an isometry between S^D and R^{D−1}, thus avoiding
the drawbacks of both the alr and the clr. The ilr actually stands for the association
of coordinates with compositions in an orthonormal system in general, and this is the
framework we are going to present here, together with a particular kind of coordinates,
named balances because of their usefulness for modelling and interpretation.
and this is the way we are used to interpreting it. The problem is that the set of vectors
{e1, e2, . . . , eD} is neither a generating system nor a basis with respect to the vector
space structure of S^D defined in Chapter 3. In fact, not every combination of coefficients
gives an element of S^D (negative and zero values are not allowed), and the ei do not
belong to the simplex as defined in Equation (2.1). Nevertheless, in many cases it
is interesting to express results in terms of compositions, so that interpretations are
feasible in usual units, and therefore one of our purposes is to find a way to state
statistically rigorous results in this coordinate system.
where in each wi the number e is placed in the i-th column, and the operation exp(·) is
assumed to operate component-wise on a vector. In fact, taking into account Equation
(3.1) and the usual rules of precedence for operations in a vector space, i.e., first the
external operation, ⊙, and afterwards the internal operation, ⊕, any vector x ∈ S^D can
be written

x = ⊕_{i=1}^{D} (ln x_i ⊙ w_i)
  = (ln x_1 ⊙ [e, 1, . . . , 1]) ⊕ (ln x_2 ⊙ [1, e, . . . , 1]) ⊕ · · · ⊕ (ln x_D ⊙ [1, 1, . . . , e]).
It is known that the coefficients with respect to a generating system are not unique;
thus, the following equivalent expression can be used as well,

x = ⊕_{i=1}^{D} (ln(x_i/g(x)) ⊙ w_i)
  = (ln(x_1/g(x)) ⊙ [e, 1, . . . , 1]) ⊕ · · · ⊕ (ln(x_D/g(x)) ⊙ [1, 1, . . . , e]),
where

g(x) = (∏_{i=1}^{D} x_i)^{1/D} = exp( (1/D) Σ_{i=1}^{D} ln x_i ),
is the geometric mean of the components of the composition. One recognises in the coef-
ficients of this second expression the centred logratio transformation defined by Aitchi-
son (1986). Note that we could indeed replace the denominator by any constant. This
non-uniqueness is consistent with the concept of compositions as equivalence classes
(Barceló-Vidal et al., 2001).
We will denote by clr the transformation that gives the expression of a composition
in centred logratio coefficients
clr(x) = [ ln(x_1/g(x)), ln(x_2/g(x)), . . . , ln(x_D/g(x)) ] = ξ. (4.3)
The inverse transformation, which gives us the coefficients in the canonical basis of real
space, is then
clr−1 (ξ) = C [exp(ξ1 ), exp(ξ2 ), . . . , exp(ξD )] = x. (4.4)
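Equations (4.3) and (4.4) translate directly into code; a minimal Python sketch, with function names of our own choosing:

```python
import math

def closure(z):
    total = sum(z)
    return [zi / total for zi in z]

def clr(x):
    g = math.exp(sum(math.log(xi) for xi in x) / len(x))  # geometric mean g(x)
    return [math.log(xi / g) for xi in x]

def clr_inv(xi):
    return closure([math.exp(v) for v in xi])

x = closure([0.53, 0.37, 0.10])
coef = clr(x)
print(sum(coef))      # ~0: the clr coefficients sum to zero
print(clr_inv(coef))  # recovers x
```

The zero-sum constraint mentioned below is visible in the first printed value.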
The centred logratio transformation is symmetrical in the components, but the price is
a new constraint on the transformed sample: the sum of the components has to be zero.
This means that the transformed sample will lie on a plane that goes through the origin
of R^D and is orthogonal to the vector of ones [1, 1, . . . , 1]. But, more importantly,
it means also that for random compositions the covariance matrix of ξ is singular, i.e.
the determinant is zero. Certainly, generalised inverses can be used in this context when
necessary, but not all statistical packages are designed for it and problems might arise
during computation. Furthermore, clr coefficients are not subcompositionally coherent,
because the geometric mean of the parts of a subcomposition g(xs ) is not necessarily
equal to that of the full composition, and thus the clr coefficients are in general not the
same.
A formal definition of the clr coefficients is the following.
Definition 4.1. For a composition x ∈ S D , the clr coefficients are the components of
ξ = [ξ1 , ξ2 , . . . , ξD ] = clr(x), the unique vector satisfying
x = clr⁻¹(ξ) = C(exp(ξ)),   Σ_{i=1}^{D} ξ_i = 0.
Although the clr coefficients are not coordinates with respect to a basis of the sim-
plex, they have very important properties. Among them the translation of operations
and metrics from the simplex into the real space deserves special attention. Denote or-
dinary distance, norm and inner product in RD−1 by d(·, ·), k · k, and h·, ·i respectively.
The following property holds.
Property 4.1. Consider xk ∈ S D and real constants α, β; then
It implies that the (D − 1, D)-matrix Ψ satisfies ΨΨ′ = I_{D−1}, where I_{D−1} is the identity
matrix of dimension D − 1. When the product of these matrices is reversed,
Ψ′Ψ = I_D − (1/D) 1′_D 1_D, with I_D the identity matrix of dimension D, and 1_D a D-
row-vector of ones; note this is a matrix of rank D − 1. The compositions of the basis
are recovered from Ψ by applying clr⁻¹ to each row of the matrix. Recall that these rows of
Ψ also add up to 0, because they are clr coefficients (see Definition 4.1).
Once an orthonormal basis has been chosen, a composition x ∈ S D is expressed as
x = ⊕_{i=1}^{D−1} (x*_i ⊙ e_i),   x*_i = ⟨x, e_i⟩_a , (4.6)

where x* = [x*_1, x*_2, . . . , x*_{D−1}] is the vector of coordinates of x with respect to the
selected basis. The function ilr : S^D → R^{D−1}, assigning the coordinates x* to x, has
A suitable algorithm to recover x from its coordinates x∗ consists of the following steps:
(i) construct the clr-matrix of the basis, Ψ; (ii) carry out the matrix product x∗ Ψ; and
(iii) apply clr−1 to obtain x.
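The three steps can be sketched in Python for D = 3, with an assumed orthonormal basis given by its clr-matrix Ψ; any matrix with orthonormal, zero-sum rows would do.

```python
import math

def closure(z):
    total = sum(z)
    return [zi / total for zi in z]

def clr(x):
    g = math.exp(sum(math.log(xi) for xi in x) / len(x))
    return [math.log(xi / g) for xi in x]

def clr_inv(xi):
    return closure([math.exp(v) for v in xi])

# clr-matrix Psi of an orthonormal basis of S^3 (rows sum to 0, Psi Psi' = I)
s2, s6 = math.sqrt(2.0), math.sqrt(6.0)
Psi = [[1/s2, -1/s2, 0.0],
       [1/s6, 1/s6, -2/s6]]

def ilr(x):
    """Coordinates with respect to the basis: x* = clr(x) . Psi'"""
    c = clr(x)
    return [sum(ci * pi for ci, pi in zip(c, row)) for row in Psi]

def ilr_inv(xstar):
    """Steps (i)-(iii) above: multiply x* by Psi, then apply clr_inv."""
    prod = [sum(xstar[k] * Psi[k][j] for k in range(len(Psi)))
            for j in range(len(Psi[0]))]
    return clr_inv(prod)

x = closure([0.53, 0.37, 0.10])
xstar = ilr(x)
print(xstar)
print(ilr_inv(xstar))  # recovers x
```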
There are several ways to define orthonormal bases in the simplex. The main criterion
for the selection of an orthonormal basis is that it enhances the interpretability of
the representation in coordinates. For instance, when performing principal component
analysis an orthogonal basis is selected so that the first coordinate (principal component)
represents the direction of maximum variability, etc. Particular cases deserving our
attention are those bases linked to a sequential binary partition of the compositional
vector (Egozcue and Pawlowsky-Glahn, 2005). The main interest of such bases is that
they are easily interpreted in terms of grouped parts of the composition. The Cartesian
coordinates of a composition in such a basis are called balances, and the compositions of
the basis are called balancing elements. A sequential binary partition is a hierarchy of the parts of
a composition. In the first order of the hierarchy, all parts are split into two groups. In
Table 4.1: Example of sign matrix, used to encode a sequential binary partition and build an
orthonormal basis. The lower part of the table shows the matrix Ψ of the basis.
order      x1        x2        x3       x4       x5        x6       r  s
1          +1        +1        −1       −1       +1        +1       4  2
2          +1        −1         0        0       −1        −1       1  3
3           0        +1         0        0       −1        −1       1  2
4           0         0         0        0       +1        −1       1  1
5           0         0        −1       +1        0         0       1  1

1       +1/√12    +1/√12    −1/√3    −1/√3    +1/√12    +1/√12
2       +√3/2     −1/√12      0        0      −1/√12    −1/√12
3         0       +√(2/3)     0        0      −1/√6     −1/√6
4         0         0         0        0      +1/√2     −1/√2
5         0         0       −1/√2    +1/√2      0         0
the following steps, each group is in turn split into two groups, and the process continues
until all groups have a single part, as illustrated in Table 4.1. For each order of the
partition, one can define the balance between the two sub-groups formed at that level:
if i1, i2, . . . , ir are the r parts of the first sub-group (coded by +1), and j1, j2, . . . , js the
s parts of the second (coded by −1), the balance is defined as the normalised logratio
of the geometric mean of each group of parts:

b = √(rs/(r+s)) · ln[ (x_{i1} x_{i2} · · · x_{ir})^{1/r} / (x_{j1} x_{j2} · · · x_{js})^{1/s} ].
This means that, for the i-th balance, the parts receive a weight of either

a+ = +(1/r)·√(rs/(r+s)) ,   a− = −(1/s)·√(rs/(r+s)) ,   or a0 = 0, (4.10)

a+ for those in the numerator, a− for those in the denominator, and a0 for those not
involved in that splitting. The balance is then

b_i = Σ_{j=1}^{D} a_{ij} ln x_j ,
where aij equals a+ if the code, at the i-th order partition, is +1 for the j-th part; the
value is a− if the code is −1; and a0 = 0 if the code is null, using the values of r and s
at the i-th order partition. Note that the matrix with entries aij is just the matrix Ψ,
as shown in the lower part of Table 4.1.
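The weights (4.10) and the balances can be computed directly from a sign matrix; a Python sketch (function names ours) using the partition of Table 4.1:

```python
import math

def psi_from_signs(signs):
    """Build the matrix Psi (rows = balancing elements in clr form)
    from the sign matrix of a sequential binary partition."""
    psi = []
    for row in signs:
        r = sum(1 for c in row if c == +1)
        s = sum(1 for c in row if c == -1)
        coef = math.sqrt(r * s / (r + s))
        psi.append([coef / r if c == +1 else (-coef / s if c == -1 else 0.0)
                    for c in row])
    return psi

def balances(x, psi):
    """b_i = sum_j a_ij * ln(x_j)"""
    logx = [math.log(xi) for xi in x]
    return [sum(a * lx for a, lx in zip(row, logx)) for row in psi]

# the sign matrix of Table 4.1
signs = [[+1, +1, -1, -1, +1, +1],
         [+1, -1,  0,  0, -1, -1],
         [ 0, +1,  0,  0, -1, -1],
         [ 0,  0,  0,  0, +1, -1],
         [ 0,  0, -1, +1,  0,  0]]
psi = psi_from_signs(signs)
print(balances([0.2, 0.1, 0.3, 0.1, 0.2, 0.1], psi))
```

The rows of the resulting Ψ are orthonormal, so the balances are the ilr coordinates with respect to this basis.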
Example 4.1. In Egozcue et al. (2003) an orthonormal basis of the simplex was
obtained using a Gram-Schmidt technique. It corresponds to the sequential binary
partition shown in Table 4.2. The main feature is that the entries of the Ψ matrix can
Table 4.2: Example of sign matrix for D = 5, used to encode a sequential binary partition in a
standard way. The lower part of the table shows the matrix Ψ of the basis.
level   x1       x2       x3       x4       x5       r   s
1       +1       +1       +1       +1       −1       4   1
2       +1       +1       +1       −1        0       3   1
3       +1       +1       −1        0        0       2   1
4       +1       −1        0        0        0       1   1

1     +1/√20   +1/√20   +1/√20   +1/√20   −2/√5
2     +1/√12   +1/√12   +1/√12   −√3/2     0
3     +1/√6    +1/√6    −√(2/3)   0        0
4     +1/√2    −1/√2     0        0        0
be easily expressed in closed form:

\Psi_{ij} = +\sqrt{\frac{1}{(D-i)(D-i+1)}} \quad \text{for } j \le D-i , \qquad
\Psi_{ij} = -\sqrt{\frac{D-i}{D-i+1}} \quad \text{for } j = D-i+1 ,

and Ψij = 0 otherwise. This matrix is closely related to the so-called Helmert matrices.
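These closed-form entries can be coded directly and checked for orthonormality. The following is an illustrative sketch (our code; the helper name is ours, numpy assumed):

```python
import numpy as np

def default_ilr_basis(D):
    """Psi for the sequential binary partition of Table 4.2, for D parts:
    row i has +sqrt(1/((D-i)(D-i+1))) for j <= D-i,
    -sqrt((D-i)/(D-i+1)) at j = D-i+1, and zeros elsewhere."""
    psi = np.zeros((D - 1, D))
    for i in range(1, D):
        k = D - i
        psi[i - 1, :k] = np.sqrt(1.0 / (k * (k + 1)))   # positive entries
        psi[i - 1, k] = -np.sqrt(k / (k + 1.0))          # single negative entry
    return psi

Psi = default_ilr_basis(5)   # reproduces the lower part of Table 4.2
```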
The interpretation of balances relies on some of their properties. The first one is the expression itself, especially when using geometric means in the numerator and denominator, as in

b = \sqrt{\frac{rs}{r+s}}\, \ln \frac{(x_1 \cdots x_r)^{1/r}}{(x_{r+1} \cdots x_D)^{1/s}} .
The geometric means are central values of the parts in each group; their ratio measures the relative weight of each group; the logarithm provides the appropriate scale; and the square-root coefficient is a normalising constant that allows numerical comparison of different balances. A positive balance means that, in (geometric) mean, the group of parts in the numerator has more weight in the composition than the group in the denominator (and conversely for negative balances).
A second interpretative element is related to the intuitive idea of balance. Imagine
that in an election, the parties have been divided into two groups, the left and the right
wing ones (there is more than one party in each wing). If a newspaper reports only the percentages within each group, you are unable to know which wing, and obviously which party, has won the election. You would then ask for the balance between the two wings as the information needed to complete the picture of the election. The balance, as defined here, provides exactly this information. The
balance is the remaining relative information about the elections once the information
within the two wings has been removed. To be more precise, assume that the parties
are six and the composition of the votes is x ∈ S 6 ; assume the left wing contested
with 4 parties represented by the group of parts {x1 , x2 , x5 , x6 } and only two parties
correspond to the right wing {x3 , x4 }. Consider the sequential binary partition in Table
4.1. The first partition just separates the two wings and thus the balance informs
us about the equilibrium between the two wings. If one leaves out this balance, the remaining balances inform us only about the left wing (balances 2, 3 and 4) or only about the right wing (balance 5). Therefore, retaining only balance 5 is equivalent to knowing the relative information within the subcomposition called right wing. Similarly, balances
2, 3 and 4 only inform about what happened within the left wing. The conclusion is
that the balance 1, the forgotten information in the journal, does not inform us about
relations within the two wings: it only conveys information about the balance between
the two groups representing the wings.
Many questions can be stated which can be handled easily using the balances. For
instance, suppose we are interested in the relationships between the parties within
the left wing and, consequently, we want to remove the information within the right
wing. A traditional approach to this is to remove parts x3 and x4 and then close the
remaining subcomposition. However, this is equivalent to projecting the composition of 6 parts orthogonally onto the subspace associated with the left wing, which is easily done by setting b5 = 0. If we do so, the projected composition obtained is

x_{\mathrm{proj}} = \mathcal{C}\left[ x_1, x_2, (x_3 x_4)^{1/2}, (x_3 x_4)^{1/2}, x_5, x_6 \right] ,
i.e. each part in the right wing has been substituted by the geometric mean within
the right wing. This composition still has the information on the left-right balance,
b1 . If we are also interested in removing it (b1 = 0) the remaining information will be
only that within the left-wing subcomposition which is represented by the orthogonal
projection

x_{\mathrm{proj}} = \mathcal{C}\left[ x_1, x_2, g, g, x_5, x_6 \right] ,

with g = g(x_1, x_2, x_5, x_6) = (x_1 x_2 x_5 x_6)^{1/4}. The conclusion is that balances can be very
useful to project compositions onto special subspaces just by retaining some balances
and making other ones null.
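The projection obtained by zeroing a balance can be sketched numerically. In this illustrative Python fragment (our code; helper names are ours, numpy assumed) we use the basis of Table 4.1 and the votes of the first district of Table 4.3, set b5 = 0, and recover the composition in which x3 and x4 are replaced by their geometric mean:

```python
import numpy as np

def closure(x):
    return x / x.sum()

def contrast_matrix(signs):
    """Psi from a sign matrix, coefficients as in Eq. (4.10)."""
    signs = np.asarray(signs, dtype=float)
    psi = np.zeros_like(signs)
    for i, row in enumerate(signs):
        r, s = np.sum(row == 1), np.sum(row == -1)
        norm = np.sqrt(r * s / (r + s))
        psi[i, row == 1] = norm / r
        psi[i, row == -1] = -norm / s
    return psi

def ilr(x, psi):
    return psi @ np.log(x)

def ilr_inv(b, psi):
    return closure(np.exp(psi.T @ b))

Psi = contrast_matrix([[+1, +1, -1, -1, +1, +1],
                       [+1, -1,  0,  0, -1, -1],
                       [ 0, +1,  0,  0, -1, -1],
                       [ 0,  0,  0,  0, +1, -1],
                       [ 0,  0, -1, +1,  0,  0]])

votes = closure(np.array([10.0, 223.0, 534.0, 23.0, 154.0, 161.0]))
b = ilr(votes, Psi)
b[4] = 0.0                     # b5 = 0: remove information within the right wing
projected = ilr_inv(b, Psi)    # x3, x4 both become sqrt(x3 * x4), reclosed
```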
4.5 Working in coordinates
and we can think about perturbation as having the same properties in the simplex
as translation has in real space, and of the power transformation as having the same
properties as multiplication.
Furthermore,
da (x, y) = d(h(x), h(y)) = d(x∗ , y∗ ),
where d stands for the usual Euclidean distance in real space. This means that, when
performing analysis of compositional data, results that could be obtained using com-
positions and the Aitchison geometry are exactly the same as those obtained using the
coordinates of the compositions and using the ordinary Euclidean geometry. This latter
possibility reduces the computations to the ordinary operations in real spaces thus fa-
cilitating the applied procedures. The duality of the representation of compositions, in
the simplex and by coordinates, introduces a rich framework where both representations
can be interpreted to extract conclusions from the analysis (see Figures 4.1, 4.2, 4.3,
and 4.4 for illustration). The price is that the basis used for representation should be carefully selected to enhance interpretation.
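The isometry between the two representations can be checked numerically. An illustrative sketch (our code; helper names are ours, numpy assumed): the Aitchison distance, computed here through clr vectors, coincides with the ordinary Euclidean distance between ilr coordinates for any orthonormal basis.

```python
import numpy as np

def closure(x):
    return x / x.sum()

def clr(x):
    lx = np.log(x)
    return lx - lx.mean()

def aitchison_distance(x, y):
    """d_a(x, y) as the Euclidean distance between clr vectors."""
    return np.linalg.norm(clr(x) - clr(y))

def ilr(x, psi):
    return psi @ np.log(x)

# One possible orthonormal basis for D = 3 (rows are clr coefficients)
Psi = np.array([[1/np.sqrt(6),  1/np.sqrt(6), -np.sqrt(2/3)],
                [1/np.sqrt(2), -1/np.sqrt(2),  0.0]])

x = closure(np.array([0.2, 0.3, 0.5]))
y = closure(np.array([0.6, 0.1, 0.3]))

d_simplex = aitchison_distance(x, y)                    # d_a(x, y)
d_coords = np.linalg.norm(ilr(x, Psi) - ilr(y, Psi))    # d(x*, y*)
```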
Working in coordinates can also be done in a blind way, just selecting a default basis and coordinates and, once the results in coordinates are obtained, translating them back into the simplex for interpretation. This blind strategy, although acceptable, hides from the analyst features of the analysis that may be relevant. For instance, when
detecting a linear dependence of compositional data on an external covariate, data can
be expressed in coordinates and then the dependence estimated using standard linear
regression. Back in the simplex, data can be plotted with the estimated regression line
in a ternary diagram. The procedure is completely acceptable but the visual picture of
the residuals and a possible non-linear trend in them can be hidden or distorted in the
ternary diagram. A plot of the fitted line and the data in coordinates may reveal new
interpretable features.
[Figures 4.1–4.3: examples plotted both in the ternary diagram of (x1, x2, x3) and in the corresponding real coordinates.]
There is one thing that is crucial in the proposed approach: no zero values are al-
lowed, as neither division by zero is admissible, nor taking the logarithm of zero. We
are not going to discuss this subject here. Methods on how to approach the prob-
lem have been discussed by Aitchison (1986), Fry et al. (1996), Martín-Fernández et al. (2000), Martín-Fernández (2001), Aitchison and Kay (2003), Bacon-Shone (2003), Martín-Fernández et al. (2003a) and Martín-Fernández et al. (2003b).
In fact, they are coordinates in an oblique basis, something that affects distances if
the usual Euclidean distance is computed from the alr coordinates. This approach is
frequent in many applied sciences and should be avoided (see for example (Albarède,
1995), p. 42).
a \odot V = [a_1, a_2, \ldots, a_\ell] \odot
\begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1D} \\
v_{21} & v_{22} & \cdots & v_{2D} \\
\vdots & \vdots &        & \vdots \\
v_{\ell 1} & v_{\ell 2} & \cdots & v_{\ell D}
\end{bmatrix}
= \bigoplus_{i=1}^{\ell} a_i \odot v_i .
In coordinates this simplicial matrix product takes the form of a linear combination of
the coordinate vectors. In fact, if h is the function assigning the coordinates,
h(a \odot V) = h\left( \bigoplus_{i=1}^{\ell} a_i \odot v_i \right) = \sum_{i=1}^{\ell} a_i\, h(v_i) .
Substituting clr(x) by its expression as a function of the logarithms of parts, the com-
position y is

y = \mathcal{C}\left[\, \prod_{j=1}^{D} x_j^{a_{j1}},\; \prod_{j=1}^{D} x_j^{a_{j2}},\; \ldots,\; \prod_{j=1}^{D} x_j^{a_{jD}} \right] ,
which, taking into account that products and powers match the definitions of ⊕ and ⊙,
deserves the definition
y = x ◦ A = x ◦ (Ψ′ A∗ Ψ) , (4.13)
where ◦ is the perturbation-matrix product representing an endomorphism in the sim-
plex. This matrix product in the simplex should not be confused with that defined
between a vector of scalars and a matrix of compositions and denoted by ⊙.
An important conclusion is that endomorphisms in the simplex are represented by
matrices with a peculiar structure given by A = Ψ′ A∗ Ψ, which have some remarkable
properties:
(d) the identity endomorphism corresponds to A∗ = ID−1 , the identity in RD−1 , and
to A = Ψ′ Ψ = ID − (1/D)1′D 1D , where ID is the identity (D, D)-matrix, and 1D
is a row vector of ones.
where A0 satisfies the above conditions, ~ei = [0, 0, . . . , 1, . . . , 0, 0] is the i-th row-vector
in the canonical basis of RD , and ci , dj are arbitrary constants. Each additive term
in this expression adds a constant row or column, being the remaining entries null. A
simple development proves that A∗ = ΨAΨ′ = ΨA0 Ψ′ . This means that x◦A = x◦A0 ,
i.e. A and A0 define the same linear transformation in the simplex. To obtain A0 from A, first compute A∗ = ΨAΨ′ and then compute

A_0 = \Psi' A^{*} \Psi = \Psi' \Psi A \Psi' \Psi ,

where the second member is the required computation and the third member shows that the computation is equivalent to adding constant rows and columns to A.
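A numerical sketch of these facts (our illustrative code, assuming numpy and one possible row/column convention): an endomorphism given in coordinates by A∗ is applied by mapping to coordinates, multiplying, and mapping back, and property (d) is checked directly.

```python
import numpy as np

def closure(x):
    return x / x.sum()

def default_ilr_basis(D):
    """Psi of the basis of Table 4.2; rows are clr coefficients."""
    psi = np.zeros((D - 1, D))
    for i in range(1, D):
        k = D - i
        psi[i - 1, :k] = np.sqrt(1.0 / (k * (k + 1)))
        psi[i - 1, k] = -np.sqrt(k / (k + 1.0))
    return psi

def endomorphism(x, A_star, psi):
    """x -> ilr^{-1}(A* ilr(x)): endomorphism with coordinate matrix A*."""
    b = psi @ np.log(x)                     # coordinates x*
    return closure(np.exp(psi.T @ (A_star @ b)))

D = 4
Psi = default_ilr_basis(D)
x = closure(np.array([0.1, 0.2, 0.3, 0.4]))

# Identity endomorphism: A* = I_{D-1} leaves x unchanged, and
# Psi' Psi = I_D - (1/D) 1'1 (property (d)).
y = endomorphism(x, np.eye(D - 1), Psi)
```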
4.8 Exercises
Exercise 4.1. Consider the data set given in Table 2.1. Compute the clr coefficients (Eq. 4.3) of the compositions with no zeros. Verify that the sum of the transformed
components equals zero.
Exercise 4.2. Using the sign matrix of Table 4.1 and Equation (4.10), compute the
coefficients for each part at each level. Arrange them in a 6 × 5 matrix. Which are the
vectors of this basis?
Using the binary partition of Table 4.1 and Eq. (4.9), compute its 5 balances. Compare
with what you obtained in the preceding exercise.
Exercise 4.6. Six parties have contested elections. In five districts they have obtained
the votes in Table 4.3. Parties are divided into left (L) and right (R) wings. Is there
some relationship between the L-R balance and the relative votes of R1-R2? Select
an adequate sequential binary partition to analyse this question and obtain the cor-
responding balance coordinates. Find the correlation matrix of the balances and give
an interpretation of the two most correlated balances. Compute the distances between the five districts; which are the two districts with the maximum and the minimum inter-distance? Are you able to distinguish some cluster of districts?
Exercise 4.7. Consider the data set given in Table 2.1. Check the data for zeros.
Apply the alr transformation to compositions with no zeros. Plot the transformed data
in R2 .
Exercise 4.8. Consider the data set given in table 2.1 and take the components in a
different order. Apply the alr transformation to compositions with no zeros. Plot the
transformed data in R2 . Compare the result with those obtained in Exercise 4.7.
Table 4.3: Votes obtained by the six parties in five districts (Exercise 4.6).

      L1     L2     R1     R2     L3     L4
d1    10    223    534     23    154    161
d2    43    154    338     43    120    123
d3     3     78     29    702    265    110
d4     5    107     58    598    123     92
d5    17     91    112    487     80     90
Exercise 4.9. Consider the data set given in table 2.1. Apply the ilr transformation
to compositions with no zeros. Plot the transformed data in R2 . Compare the result
with the scatterplots obtained in exercises 4.7 and 4.8 using the alr transformation.
Exercise 4.10. Compute the alr and ilr coordinates, as well as the clr coefficients of
the 6-part composition
Exercise 4.11. Consider the 6-part composition of the preceding exercise. Using the
binary partition of Table 4.1 and Equation (4.9), compute its 5 balances. Compare with
the results of the preceding exercise.
Chapter 5
Exploratory data analysis
In this chapter we are going to address the first steps that should be performed when-
ever the study of a compositional data set X is initiated. Essentially, these steps are
five. They consist in (1) computing descriptive statistics, i.e. the centre and variation
matrix of a data set, as well as its total variability; (2) centring the data set for a better
visualisation of subcompositions in ternary diagrams; (3) looking at the biplot of the
data set to discover patterns; (4) defining an appropriate representation in orthonormal
coordinates and computing the corresponding coordinates; and (5) computing the summary statistics of the coordinates and representing the results in a balance-dendrogram.
In general, the last two steps will be based on a particular sequential binary partition,
defined either a priori or as a result of the insights provided by the preceding three steps.
The last step consists of a graphical representation of the sequential binary partition,
including a graphical and numerical summary of descriptive statistics of the associated
coordinates.
Before starting, let us make some general considerations. The first thing in standard
statistical analysis is to check the data set for errors, and we assume this part has been
already done using standard procedures (e.g. using the minimum and maximum of each
component to check whether the values are within an acceptable range). Another, quite
different thing is to check the data set for outliers, a point that is outside the scope
of this short-course. See Barceló et al. (1994, 1996) for details. Recall that outliers
can be considered as such only with respect to a given distribution. Furthermore, we
assume there are no zeros in our samples. Zeros require specific techniques (Fry et al., 1996; Martín-Fernández et al., 2000; Martín-Fernández, 2001; Aitchison and Kay, 2003; Bacon-Shone, 2003; Martín-Fernández et al., 2003a,b) and
will be addressed in future editions of this short course.
34 Chapter 5. Exploratory data analysis
Thus, tij stands for the usual experimental variance of the log-ratio of parts i and j,
while t∗ij stands for the usual experimental variance of the normalised log-ratio of parts
i and j, so that the log ratio is a balance.
Note that

t^{*}_{ij} = \operatorname{var}\left( \frac{1}{\sqrt{2}} \ln \frac{x_i}{x_j} \right) = \frac{1}{2}\, t_{ij} ,

and thus T∗ = (1/2) T. Normalised variations have squared Aitchison distance units (see Figure 3.3).
5.3 Centring and scaling 35
By definition, T and T∗ are symmetric and their diagonal will contain only zeros.
Furthermore, neither the total variance nor any single entry in both variation matrices
depend on the constant κ associated with the sample space S D , as constants cancel out
when taking ratios. Consequently, rescaling has no effect. These statistics have further
connections. From their definition, it is clear that the total variation summarises the
variation matrix in a single quantity, both in the normalised and non-normalised version,
and it is possible (and natural) to define it because all parts in a composition share a
common scale (it is by no means so straightforward to define a total variation for a
pressure-temperature random vector, for instance). Conversely, the variation matrix,
again in both versions, explains how the total variation is split among the parts (or
better, among all log-ratios).
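These statistics can be sketched in a few lines of Python (our illustrative code; helper names are ours, numpy assumed):

```python
import numpy as np

def variation_matrix(X, normalised=False):
    """T (or T* if normalised) for an n x D data matrix X:
    t_ij = var(ln(x_i / x_j)) over the n samples (divisor n)."""
    L = np.log(np.asarray(X, dtype=float))
    D = L.shape[1]
    T = np.zeros((D, D))
    for i in range(D):
        for j in range(D):
            T[i, j] = np.var(L[:, i] - L[:, j])
    return T / 2 if normalised else T

def total_variance(X):
    """totvar = (1/(2D)) * sum of all entries of T; equals the trace of the
    clr covariance matrix."""
    T = variation_matrix(X)
    return T.sum() / (2 * np.asarray(X).shape[1])

X = np.array([[48.29, 2.33, 11.48],
              [48.83, 2.47, 12.38],
              [45.61, 1.70,  8.33]])   # any positive data
T = variation_matrix(X)
```

Both T and the total variance are unchanged when the rows are closed or rescaled, since constants cancel in the ratios.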
Figure 5.1: Simulated data set before (left) and after (right) centring.
with the same relative contribution of each log-ratio in the variation array. This is a significant difference with respect to conventional standardisation: with real vectors, the relative contribution of each variable is an artifact of its units, and most usually should be ignored; in contrast, in compositional vectors, all parts share the same “units”, and their relative contribution to the total variation is rich information.
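Centring as in Figure 5.1 amounts to perturbing every sample by the inverse of the centre. An illustrative sketch (our code; helper names and the simulated data are ours, numpy assumed):

```python
import numpy as np

def closure(X):
    return X / X.sum(axis=-1, keepdims=True)

def centre(X):
    """Closed geometric mean of the rows: the centre g of the data set."""
    return closure(np.exp(np.log(X).mean(axis=0)))

def centre_dataset(X):
    """Perturb every sample by the inverse of the centre."""
    return closure(X / centre(X))

rng = np.random.default_rng(42)           # hypothetical simulated data
X = closure(rng.lognormal(size=(25, 3)))
Xc = centre_dataset(X)
```

After centring, the centre of Xc is the barycentre (1/D, ..., 1/D) of the simplex.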
where s is the rank of Z and the singular values k1 , k2 , . . . , ks are in descending order of magnitude. Usually s = D − 1. Both matrices L and M are orthonormal. The rank-2 approximation is then obtained by simply substituting all singular values with index larger than 2 by zero, thus keeping

Y = \begin{bmatrix} l_1 & l_2 \end{bmatrix}
\begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}
\begin{bmatrix} m_1' \\ m_2' \end{bmatrix}
= \begin{bmatrix} \ell_{11} & \ell_{21} \\ \ell_{12} & \ell_{22} \\ \vdots & \vdots \\ \ell_{1n} & \ell_{2n} \end{bmatrix}
\begin{bmatrix} k_1 & 0 \\ 0 & k_2 \end{bmatrix}
\begin{bmatrix} m_{11} & m_{12} & \cdots & m_{1D} \\ m_{21} & m_{22} & \cdots & m_{2D} \end{bmatrix} .
It must be clear from the above aspects of interpretation that the fundamental
elements of a compositional biplot are the links, not the rays as in the case of variation
diagrams for unconstrained multivariate data. The complete constellation of links,
by specifying all the relative variances, informs about the compositional covariance
structure and provides hints about subcompositional variability and independence. It
is also obvious that interpretation of the biplot is concerned with its internal geometry
and would, for example, be unaffected by any rotation or indeed mirror-imaging of the
diagram. For an illustration, see Section 5.6.
For some applications of biplots to compositional data in a variety of geological
contexts see (Aitchison, 1990), and for a deeper insight into biplots of compositional
data, with applications in other disciplines and extensions to conditional biplots, see
(Aitchison and Greenacre, 2002).
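As a sketch of the computation behind a compositional biplot (our illustrative code, with hypothetical simulated data, numpy assumed): double-centre the clr-transformed data, take the SVD, and keep the first two singular values.

```python
import numpy as np

def clr_data(X):
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)   # clr: centre each row

rng = np.random.default_rng(0)                 # hypothetical data set
X = rng.lognormal(size=(17, 5))
X = X / X.sum(axis=1, keepdims=True)

Z = clr_data(X)
Z = Z - Z.mean(axis=0, keepdims=True)          # double-centred clr matrix

U, k, Vt = np.linalg.svd(Z, full_matrices=False)
Y = (U[:, :2] * k[:2]) @ Vt[:2]                # rank-2 approximation
explained = (k[:2] ** 2).sum() / (k ** 2).sum()  # variance shown in the biplot
```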
with Ψ the matrix whose rows contain the clr coefficients of the orthonormal basis
chosen (see Section 4.4 for its construction); g the centre of the dataset as defined in
Definition 5.1, and T∗ the normalised variation matrix as introduced in Definition 5.2.
There is a graphical representation, with the specific aim of representing a system of
coordinates based on a sequential binary partition: the CoDa- or balance-dendrogram
(Egozcue and Pawlowsky-Glahn, 2006; Pawlowsky-Glahn and Egozcue, 2006; Thió-
Henestrosa et al., 2007). A balance-dendrogram is the joint representation of the fol-
lowing elements:
1. a sequential binary partition, in the form of a tree structure;
5.6 Illustration
We are going to use, both for illustration and for the exercises, the data set X given
in table 5.1. They correspond to 17 samples of chemical analysis of rocks from Kilauea
Iki lava lake, Hawaii, published by Richter and Moore (1966) and cited by Rollinson
(1995).
Originally, 14 parts had been registered, but H2 O+ and H2 O− have been omitted
because of the large amount of zeros. CO2 has been kept in the table, to call attention
upon parts with some zeros, but has been omitted from the study precisely because of
the zeros. This is the strategy to follow if the part is not essential in the characterisation
of the phenomenon under study. If the part is essential and the proportion of zeros is
high, then we are dealing with two populations, one characterised by zeros in that
component and the other by non-zero values. If the part is essential and the proportion
of zeros is small, then we can look for imputation techniques, as explained in the beginning of this chapter.
The centre of this data set is
g = (48.57, 2.35, 11.23, 1.84, 9.91, 0.18, 13.74, 9.65, 1.82, 0.48, 0.22) ,
the total variance is totvar[X] = 0.3275 and the normalised variation matrix T∗ is given
in Table 5.2.
[Figure 5.2: the zoomed panel labels the elements sample min, Q1, Q2, Q3, sample max, range lower limit, range upper limit and coordinate mean.]
Figure 5.2: Illustration of elements included in a balance-dendrogram. The left subfigure represents
a full dendrogram, and the right figure is a zoomed part, corresponding to the balance of (FeO,Fe2 O3 )
against TiO2 .
The biplot (Fig. 5.3) shows an essentially two-dimensional pattern of variability, two sets of parts that cluster together, A = [TiO2, Al2O3, CaO, Na2O, P2O5] and B = [SiO2, FeO, MnO], and a set of one-dimensional relationships between parts.
The two dimensional pattern of variability is supported by the fact that the first two
axes of the biplot reproduce about 90% of the total variance, as captured in the scree
plot in Fig. 5.3, left. The orthogonality of the link between Fe2 O3 and FeO (i.e., the
oxidation state) with the link between MgO and any of the parts in set A might help
in finding an explanation for this behaviour and in decomposing the global pattern into
two independent processes.
Concerning the two sets of parts we can observe short links between them and, at
the same time, that the variances of the corresponding log-ratios (see the normalised
variation matrix T∗ , Table 5.2) are very close to zero. Consequently, we can say that
they are essentially redundant, and that some of them could be either grouped to a
single part or simply omitted. In both cases the dimensionality of the problem would
be reduced.
Another aspect to be considered is the diverse patterns of one-dimensional variability
Table 5.1: Chemical analysis of rocks from Kilauea Iki lava lake, Hawaii
SiO2 TiO2 Al2 O3 Fe2 O3 FeO MnO MgO CaO Na2 O K2 O P2 O5 CO2
48.29 2.33 11.48 1.59 10.03 0.18 13.58 9.85 1.90 0.44 0.23 0.01
48.83 2.47 12.38 2.15 9.41 0.17 11.08 10.64 2.02 0.47 0.24 0.00
45.61 1.70 8.33 2.12 10.02 0.17 23.06 6.98 1.33 0.32 0.16 0.00
45.50 1.54 8.17 1.60 10.44 0.17 23.87 6.79 1.28 0.31 0.15 0.00
49.27 3.30 12.10 1.77 9.89 0.17 10.46 9.65 2.25 0.65 0.30 0.00
46.53 1.99 9.49 2.16 9.79 0.18 19.28 8.18 1.54 0.38 0.18 0.11
48.12 2.34 11.43 2.26 9.46 0.18 13.65 9.87 1.89 0.46 0.22 0.04
47.93 2.32 11.18 2.46 9.36 0.18 14.33 9.64 1.86 0.45 0.21 0.02
46.96 2.01 9.90 2.13 9.72 0.18 18.31 8.58 1.58 0.37 0.19 0.00
49.16 2.73 12.54 1.83 10.02 0.18 10.05 10.55 2.09 0.56 0.26 0.00
48.41 2.47 11.80 2.81 8.91 0.18 12.52 10.18 1.93 0.48 0.23 0.00
47.90 2.24 11.17 2.41 9.36 0.18 14.64 9.58 1.82 0.41 0.21 0.01
48.45 2.35 11.64 1.04 10.37 0.18 13.23 10.13 1.89 0.45 0.23 0.00
48.98 2.48 12.05 1.39 10.17 0.18 11.18 10.83 1.73 0.80 0.24 0.01
48.74 2.44 11.60 1.38 10.18 0.18 12.35 10.45 1.67 0.79 0.23 0.01
49.61 3.03 12.91 1.60 9.68 0.17 8.84 10.96 2.24 0.55 0.27 0.01
49.20 2.50 12.32 1.26 10.13 0.18 10.51 11.05 2.02 0.48 0.23 0.01
that can be observed. Examples that can be visualised in a ternary diagram are Fe2 O3 ,
K2 O and any of the parts in set A, or MgO with any of the parts in set A and any of
the parts in set B. Let us select one of those subcompositions, e.g. Fe2 O3 , K2 O and
Na2 O. After closure, the samples plot in a ternary diagram as shown in Figure 5.4 and
we recognise the expected trend and two outliers corresponding to samples 14 and 15,
which require further explanation. Regarding the trend itself, notice that it is in fact a
line of isoproportion Na2 O/K2 O: thus we can conclude that the ratio of these two parts
is independent of the amount of Fe2 O3 .
As a last step, we compute the conventional descriptive statistics of the orthonormal
coordinates in a specific reference system (either a priori chosen or derived from the
previous steps). In this case, due to our knowledge of the typical geochemistry and
mineralogy of basaltic rocks, we choose a priori the set of balances of Table 5.3, where
the resulting balance will be interpreted as
Table 5.2: Normalised variation matrix of data given in table 5.1. For simplicity, only the upper
triangle is represented, omitting the first column and last row.
var((1/√2) ln(xi/xj))   TiO2   Al2O3   Fe2O3   FeO   MnO   MgO   CaO   Na2O   K2O   P2O5
SiO2 0.012 0.006 0.036 0.001 0.001 0.046 0.007 0.009 0.029 0.011
TiO2 0.003 0.058 0.019 0.016 0.103 0.005 0.002 0.015 0.000
Al2 O3 0.050 0.011 0.008 0.084 0.000 0.002 0.017 0.002
Fe2 O3 0.044 0.035 0.053 0.054 0.050 0.093 0.059
FeO 0.001 0.038 0.012 0.015 0.034 0.017
MnO 0.040 0.009 0.012 0.033 0.015
MgO 0.086 0.092 0.130 0.100
CaO 0.003 0.016 0.004
Na2 O 0.024 0.002
K2 O 0.014
Table 5.3: A possible sequential binary partition for the data set of table 5.1.
balance SiO2 TiO2 Al2 O3 Fe2 O3 FeO MnO MgO CaO Na2 O K2 O P2 O5
v1 0 0 0 +1 -1 0 0 0 0 0 0
v2 +1 0 -1 0 0 0 0 0 0 0 0
v3 0 +1 0 0 0 0 0 0 0 0 -1
v4 +1 -1 +1 0 0 0 0 0 0 0 -1
v5 0 0 0 0 0 0 0 +1 -1 0 0
v6 0 0 0 0 0 0 0 +1 +1 -1 0
v7 0 0 0 0 0 +1 -1 0 0 0 0
v8 0 0 0 +1 +1 -1 -1 0 0 0 0
v9 0 0 0 +1 +1 +1 +1 -1 -1 -1 0
v10 +1 +1 +1 -1 -1 -1 -1 -1 -1 -1 +1
10. importance of cation oxides (those filling the crystalline structure of minerals)
against frame oxides (those forming that structure, mainly Al and Si).
Figure 5.3: Biplot of data of Table 5.1 (right), and scree plot of the variances of all principal
components (left), with indication of cumulative explained variance.
with CaO (as they would probably come from different rock types, e.g. siliciclastic
against carbonate).
Using the sequential binary partition given in Table 5.3, Figure 5.5 represents the
balance-dendrogram of the sample, within the range (−3, +3). This range translates for
two-part compositions to proportions of (1.4, 98.6)%; i.e. if we look at the balance MgO–MnO, the variance bar is placed at the lower extreme of the balance axis, which implies that in this subcomposition MgO represents on average more than 98%, and MnO less than 2%. Looking at the lengths of the several variance bars, one easily finds that the balances P2O5–TiO2 and SiO2–Al2O3 are almost constant, as their bars are very short and their box-plots extremely narrow. Again, the balance between the subcompositions (P2O5, TiO2) vs. (SiO2, Al2O3) does not display any box-plot, meaning that it is above +3 (thus, the second group of parts represents more than 98% with respect to the first group). The distribution between K2O, Na2O and CaO tells us that Na2O and CaO keep a fairly constant ratio (thus, we should interpret that there are no strong variations in the plagioclase composition), and the ratio of these two against K2O is also fairly constant, with the exception of some values below the first quartile (probably, a single value with a particularly high K2O content). The other balances are well
equilibrated (in particular, see how centred is the proxy balance between feldspar and
Figure 5.4: Plot of subcomposition (Fe2 O3 ,K2 O,Na2 O). Left: before centring. Right: after centring.
Figure 5.5: Balance-dendrogram of data from Table 5.1 using the balances of Table 5.3.
Once the marginal empirical distributions of the balances have been analysed, we can
use the biplot to explore their relations (Figure 5.6), and the conventional covariance
or correlation matrices (Table 5.4). From these, we can see, for instance:
• The constant behaviour of v3 (balance TiO2–P2O5), with a variance below 10^{-4}, and, to a lesser degree, of v5 (anorthite-albite relation, or balance CaO–Na2O).
• The orthogonality of the pairs of rays v1-v2, v1-v4, v1-v7, and v6-v8, suggests
the lack of correlation of their respective balances, confirmed by Table 5.4, where
correlations of less than ±0.3 are reported. In particular, the pair v6-v8 has a
correlation of −0.029. These facts would imply that silica saturation (v2), the
Figure 5.6: Biplot of data of table 5.1 expressed in the balance coordinate system of Table 5.3
(right), and scree plot of the variances of all principal components (left), with indication of cumulative
explained variance. Compare with Figure 5.3, in particular: the scree plot, the configuration of data
points, and the links between the variables related to balances v1, v2, v3, v5 and v7.
presence of heavy minerals (v4) and the MnO-MgO balance (v7) are uncorrelated
with the oxidation state (v1); and that the type of feldspars (v6) is unrelated to
the type of mafic minerals (v8).
• The balances v9 and v10 are opposite, and their correlation is −0.936, imply-
ing that the ratio mafic oxides/feldspar oxides is high when the ratio Silica-
Alumina/cation oxides is low, i.e. mafics are poorer in Silica and Alumina.
A final comment regarding balance descriptive statistics: since the balances are
chosen for their interpretability, we are no longer just “describing” patterns here.
Balance statistics represent a step further towards modelling: all our conclusions in
these last three points heavily depend on the preliminary interpretation (=“model”) of
the computed balances.
5.7 Exercises
Exercise 5.1. This exercise intends to illustrate the problems of classical statistics when
applied to compositional data. Using the data given in Table 5.1, compute the classical
Table 5.4: Covariance (lower triangle) and correlation (upper triangle) matrices of balances
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
v1 0.047 0.120 0.341 0.111 -0.283 0.358 -0.212 0.557 0.423 -0.387
v2 0.002 0.006 -0.125 0.788 0.077 0.234 -0.979 -0.695 0.920 -0.899
v3 0.002 -0.000 0.000 -0.345 -0.380 0.018 0.181 0.423 -0.091 0.141
v4 0.003 0.007 -0.001 0.012 0.461 0.365 -0.832 -0.663 0.821 -0.882
v5 -0.004 0.000 -0.000 0.003 0.003 -0.450 -0.087 -0.385 -0.029 -0.275
v6 0.013 0.003 0.000 0.007 -0.004 0.027 -0.328 -0.029 0.505 -0.243
v7 -0.009 -0.016 0.001 -0.019 -0.001 -0.011 0.042 0.668 -0.961 0.936
v8 0.018 -0.008 0.001 -0.011 -0.003 -0.001 0.021 0.023 -0.483 0.516
v9 0.032 0.025 -0.001 0.031 -0.001 0.029 -0.069 -0.026 0.123 -0.936
v10 -0.015 -0.013 0.001 -0.017 -0.003 -0.007 0.035 0.014 -0.059 0.032
correlation coefficients between the following pairs of parts: (MnO vs. CaO), (FeO vs.
Na2 O), (MgO vs. FeO) and (MgO vs. Fe2 O3 ). Now ignore the structural oxides Al2 O3
and SiO2 from the data set, reclose the remaining variables, and recompute the same
correlation coefficients as above. Compare the results. Compare the correlation matrix
between the feldspar-constituent parts (CaO,Na2 O,K2 O), as obtained from the original
data set, and after reclosing this 3-part subcomposition.
Exercise 5.2. For the data given in Table 2.1 compute and plot the centre with the
samples in a ternary diagram. Compute the total variance and the variation matrix.
Exercise 5.3. Perturb the data given in table 2.1 with the inverse of the centre. Com-
pute the centre of the perturbed data set and plot it with the samples in a ternary
diagram. Compute the total variance and the variation matrix. Compare your results
numerically and graphically with those obtained in exercise 5.2.
Exercise 5.4. Make a biplot of the data given in Table 2.1 and give an interpretation.
Exercise 5.5. Figure 5.3 shows the biplot of the data given in Table 5.1. How would
you interpret the different patterns that can be observed in it?
Exercise 5.6. Select 3-part subcompositions that behave in a particular way in Figure
5.3 and plot them in a ternary diagram. Do they reproduce properties mentioned in
the previous description?
Exercise 5.7. Do a scatter plot of the log-ratios
1 K2 O 1 Fe2 O3
√ ln against √ ln ,
2 MgO 2 FeO
identifying each point. Compare with the biplot. Compute the total variance of the
subcomposition (K2 O,MgO,Fe2 O3 ,FeO) and compare it with the total variance of the
full data set.
Exercise 5.8. How would you recast the data in table 5.1 from mass proportion of
oxides (as they are) to molar proportions? You may need the following molar weights.
Any idea of how to do that with a perturbation?
SiO2 TiO2 Al2 O3 Fe2 O3 FeO MnO
60.085 79.899 101.961 159.692 71.846 70.937
Exercise 5.9. Re-do all the descriptive analysis (and the related exercises) with the
Kilauea data set expressed in molar proportions. Compare the results.
Exercise 5.10. Compute the vector of arithmetic means of the ilr transformed data
from table 2.1. Apply the ilr−1 backtransformation and compare it with the centre.
Exercise 5.11. Take the parts of the compositions in table 2.1 in a different order.
Compute the vector of arithmetic means of the ilr transformed sample. Apply the ilr−1
backtransformation. Compare the result with the previous one.
Exercise 5.12. Centre the data set of table 2.1. Compute the vector of arithmetic
means of the ilr transformed data. What do you obtain?
Exercise 5.13. Compute the covariance matrix of the ilr transformed data set of table
2.1 before and after perturbation with the inverse of the centre. Compare both matrices.
Chapter 6
Distributions on the simplex
The usual way to pursue any statistical analysis after an exhaustive exploratory analysis
consists in assuming and testing distributional assumptions for our random phenomena.
This can be easily done for compositional data, as the linear vector space structure of
the simplex allows us to express observations with respect to an orthonormal basis, a
property that guarantees the proper application of standard statistical methods. The
only thing that has to be done is to perform any standard analysis on orthonormal
coefficients and to interpret the results in terms of coefficients of the orthonormal basis.
Once obtained, the inverse can be used to get the same results in terms of the canonical
basis of RD (i.e. as compositions summing up to a constant value). The justification of
this approach lies in the fact that standard mathematical statistics relies on real analysis,
and real analysis is performed on the coefficients with respect to an orthonormal basis
in a linear vector space, as discussed by Pawlowsky-Glahn (2003).
There are other ways to justify this approach coming from the side of measure theory
and the definition of density function as the Radon-Nikodým derivative of a probability
measure (Eaton, 1983), but they would divert us too far from practical applications.
Given that most multivariate techniques rely on the assumption of multivariate
normality, we will concentrate on the expression of this distribution in the context of
random compositions and address briefly other possibilities.
Maximum likelihood estimates will be used, which are the vector of arithmetic means x̄∗ for
the vector of expected values, and the sample covariance matrix Sx∗ with the sample
size n as divisor. Remember that, in the case of compositional data, the estimates
are computed using the orthonormal coordinates x∗ of the data and not the original
measurements.
As we have considered coordinates x∗ , we will obtain results in terms of coefficients
of x∗ coordinates. To obtain them in terms of the canonical basis of RD we have
to backtransform whatever we compute by using the inverse transformation h−1 (x∗ ).
In particular, we can backtransform the arithmetic mean x̄∗ , which is an adequate
measure of central tendency for data which follow reasonably well a multivariate normal
distribution. But h−1 (x̄∗ ) = g, the centre of a compositional data set introduced in
Definition 5.1, which is an unbiased, minimum variance estimator of the expected value
of a random composition (Pawlowsky-Glahn and Egozcue, 2002). Also, as stated in
Aitchison (2002), g is an estimate of C [exp(E[ln(x)])], which is the theoretical definition
of the closed geometric mean, thus justifying its use.
Out of the large number of published tests, for x∗ ∈ RD−1 , Aitchison selected the
Anderson-Darling, Cramér-von Mises, and Watson forms for testing hypotheses on
samples coming from a uniform distribution. We repeat them here for the sake of
completeness, but only in a synthetic form. For clarity we follow the approach used in
(Pawlowsky-Glahn and Buccianti, 2002) and present each case separately; in (Aitchi-
son, 1986) an integrated approach can be found, in which the orthonormal basis selected
for the analysis comes from the singular value decomposition of the data set.
The idea behind the approach is to compute statistics which under the initial hy-
pothesis should follow a uniform distribution in each of the following three cases:
1. Compute the maximum likelihood estimates of the expected value and the vari-
ance:
   µ̂i = (1/n) ∑_{r=1}^{n} x∗ri ,    σ̂i² = (1/n) ∑_{r=1}^{n} (x∗ri − µ̂i )² .
2. Obtain from the corresponding tables, or using a computer built-in function, the
   values

      Φ((x∗ri − µ̂i )/σ̂i ) = zr ,   r = 1, 2, ..., n,

   where Φ(·) is the N (0, 1) cumulative distribution function.
7. Compare the results with the critical values in table 6.1. The null hypothesis
will be rejected whenever the test statistic lies in the critical region for a given
significance level, i.e. it has a value that is larger than the value given in the table.
The underlying idea is that if the observations are indeed normally distributed, then
the z(r) should be approximately the order statistics of a uniform distribution over the
interval (0, 1). The tests make such comparisons, making due allowance for the fact that
the mean and the variance are estimated. Note that to follow the van den Boogaart and
Tolosana-Delgado (2007) approach, one should simply apply this schema to all pair-wise
log-ratios, y = log(xi /xj ), with i < j, instead of to the x∗ coordinates of the observations.
A visual representation of each test can be given in the form of a plot in the unit
square of the z(r) against the associated order statistic (2r − 1)/(2n), r = 1, 2, ..., n, of
the uniform distribution (a PP plot). Conformity with normality on S D corresponds to
a pattern of points along the diagonal of the square.
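The marginal scheme above can be sketched as follows. The sample is hypothetical, and the Anderson-Darling statistic is given in its basic form, without the small-sample modification for estimated parameters whose critical values appear in the tables:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function, via erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def marginal_uniform_scores(x):
    """Ordered z_(r) = Phi((x - mu_hat)/sigma_hat), with maximum
    likelihood estimates (divisor n), as in the text."""
    n = len(x)
    mu = sum(x) / n
    sigma = math.sqrt(sum((xi - mu) ** 2 for xi in x) / n)
    return sorted(phi((xi - mu) / sigma) for xi in x)

def anderson_darling(z):
    """Anderson-Darling statistic for ordered uniform scores z_(r)."""
    n = len(z)
    s = sum((2 * r - 1) * (math.log(z[r - 1]) + math.log(1.0 - z[n - r]))
            for r in range(1, n + 1))
    return -n - s / n

# Hypothetical sample of one ilr coordinate:
x = [0.12, -0.33, 0.41, 0.05, -0.18, 0.27, -0.02, 0.09, -0.44, 0.21]
z = marginal_uniform_scores(x)
A2 = anderson_darling(z)
# PP plot pairs for visual appraisal: (z[r-1], (2r-1)/(2n))
```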
3. Compute the radian angles θ̂r required to rotate the ur -axis anticlockwise about
   the origin to reach the points (ur , vr ). If arctan(t) denotes the angle between −π/2
   and π/2 whose tangent is t, then

      θ̂r = arctan(vr /ur ) + (1 − sgn(ur )) π/2 + (1 + sgn(ur ))(1 − sgn(vr )) π/2 ,

   so that θ̂r lies in [0, 2π).
Table 6.2: Critical values for the bivariate angle test statistics.
4. Rearrange the values of θ̂r /(2π) in ascending order of magnitude to obtain the
ordered values z(r) .
8. Compare the results with the critical values in Table 6.2. The null hypothesis
will be rejected whenever the test statistic lies in the critical region for a given
significance level, i.e. it has a value that is larger than the value given in the table.
The same representation as mentioned in the previous section can be used for visual
appraisal of conformity with the hypothesis tested.
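Step 3's piecewise arctan expression is the angle of the point (ur , vr ) reduced to [0, 2π), so a sketch can rely on atan2. The coordinate pairs below are hypothetical:

```python
import math

def theta_hat(u, v):
    """Angle in [0, 2*pi) of the point (u, v); equivalent to the
    piecewise arctan formula of step 3."""
    return math.atan2(v, u) % (2.0 * math.pi)

def angle_scores(pairs):
    """Step 4: ordered values of theta_hat/(2*pi), to be tested for
    uniformity on (0, 1)."""
    return sorted(theta_hat(u, v) / (2.0 * math.pi) for u, v in pairs)

# Hypothetical standardised coordinate pairs (u_r, v_r):
pairs = [(1.0, 0.3), (-0.7, 0.9), (-0.5, -1.2), (0.8, -0.4)]
z = angle_scores(pairs)
```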
1. Compute the maximum likelihood estimates for the vector of expected values and
for the covariance matrix, as described in the previous tests.
8. Compare the results with the critical values in table 6.3. The null hypothesis
will be rejected whenever the test statistic lies in the critical region for a given
significance level, i.e. it has a value that is larger than the value given in the table.
6.4 Exercises
Exercise 6.1. Test the hypothesis of normality of the marginals of the ilr transformed
sample of table 2.1.
Exercise 6.2. Test the bivariate normality of each variable pair (x∗i , x∗j ), i < j, of the
ilr transformed sample of table 2.1.
Exercise 6.3. Test the variables of the ilr transformed sample of table 2.1 for joint
normality.
Chapter 7
Statistical inference
3. the centres are the same, but the covariance structure is different;
Note that if we accept the first hypothesis, it makes no sense to test the second or the
third; the same happens for the second with respect to the third, although these two
are exchangeable. This can be considered as a lattice structure in which we go from the
bottom or lowest level to the top or highest level until we accept one hypothesis. At
that point it makes no sense to test further hypotheses and it is advisable to stop.
To perform tests on these hypotheses, we are going to use coordinates x∗ and to
assume that each group follows a multivariate normal distribution. For the parameters
of the two multivariate normal distributions, the four hypotheses are expressed, in the
same order as above, as follows:
1. the vectors of expected values and the covariance matrices are the same:
µ1 = µ2 and Σ1 = Σ2 ;
2. the covariance matrices are the same, but not the vectors of expected values:
µ1 ≠ µ2 and Σ1 = Σ2 ;
3. the vectors of expected values are the same, but not the covariance matrices:
µ1 = µ2 and Σ1 ≠ Σ2 ;
4. neither the vectors of expected values, nor the covariance matrices are the same:
µ1 ≠ µ2 and Σ1 ≠ Σ2 .
The last hypothesis is called the model, and the other hypotheses will be confronted
with it, to see which one is more plausible. In other words, for each test, the model will
be the alternative hypothesis H1 .
For each single case we can use either unbiased or maximum likelihood estimates of
the parameters. Under assumptions of multivariate normality, they are identical for the
expected values and have a different divisor of the covariance matrix (the sample size
n in the maximum likelihood approach, and n − 1 in the unbiased case). Here develop-
ments will be presented in terms of maximum likelihood estimates, as those have been
used in the previous chapter. Note that estimators change under each of the possible
hypotheses, so each case will be presented separately. The following developments are
based on Aitchison (1986, p. 153-158) and Krzanowski (1988, p. 323-329), although
for a complete theoretical proof see Mardia et al. (1979, section 5.5.3). The primary
computations from the coordinates, h(x1 ) = x∗1 , of the n1 samples in one group, and
h(x2 ) = x∗2 , of the n2 samples in the other group, are
1. the separate sample estimates

      Σ̂p = (n1 Σ̂1 + n2 Σ̂2 )/(n1 + n2 ) ,    (7.1)
To test the different hypothesis, we will use the generalised likelihood ratio test,
which is based on the following principles: consider the maximised likelihood function
for data x∗ under the null hypothesis, L0 (x∗ ) and under the model with no restrictions
(case 4), Lm (x∗ ). The test statistic is then R(x∗ ) = Lm (x∗ )/L0 (x∗ ): the larger its
value, the more reluctant we shall be to accept the null hypothesis. In some cases the
exact distribution of this statistic is known. In those cases where it is not known,
we shall use Wilks' asymptotic approximation: under the null hypothesis, which places
c constraints on the parameters, the test statistic Q(x∗ ) = 2 ln(R(x∗ )) is distributed
approximately as χ2 (c). For the cases to be studied, the approximate generalised ratio
test statistic then takes the form:
   Q0m (x∗ ) = n1 ln(|Σ̂10 |/|Σ̂1m |) + n2 ln(|Σ̂20 |/|Σ̂2m |) ,

to be compared against the upper percentage points of the χ²((1/2)D(D − 1))
distribution.
2. Equality of covariance structure with different centres: The null hypothesis is that
   µ1 ≠ µ2 and Σ1 = Σ2 , thus we need the estimates of µ1 , µ2 and of the common
   covariance matrix Σ = Σ1 = Σ2 , which are Σ̂p for Σ under the null hypothesis
   and Σ̂i for Σi , i = 1, 2, under the model, resulting in a test statistic

      Q2vs4 (x∗ ) = n1 ln(|Σ̂p |/|Σ̂1 |) + n2 ln(|Σ̂p |/|Σ̂2 |) ∼ χ²((1/2)(D − 1)(D − 2)) .
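As a sketch of this case, the statistic can be computed directly from two coordinate samples; the data below are synthetic, and the comparison against χ² percentage points is left to the reader:

```python
import numpy as np

def ml_cov(X):
    """Maximum-likelihood covariance (divisor n) of coordinates X (n x d)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / X.shape[0]

def q_2vs4(X1, X2):
    """Generalised likelihood-ratio statistic for equal covariance
    structure with different centres (hypothesis 2 vs the model)."""
    n1, n2 = len(X1), len(X2)
    S1, S2 = ml_cov(X1), ml_cov(X2)
    Sp = (n1 * S1 + n2 * S2) / (n1 + n2)   # pooled estimate (7.1)
    return (n1 * np.log(np.linalg.det(Sp) / np.linalg.det(S1))
            + n2 * np.log(np.linalg.det(Sp) / np.linalg.det(S2)))

# Synthetic ilr coordinates (D = 3 parts, hence d = 2 coordinates):
rng = np.random.default_rng(0)
X1 = rng.normal(size=(30, 2))
X2 = rng.normal(loc=0.5, size=(25, 2))
Q = q_2vs4(X1, X2)   # compare with upper chi-square percentage points
```

By concavity of the log-determinant, Q is always non-negative.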
(c) compute the variances of each group with respect to the common mean:
2. we do not know the mean vector and variance matrix of the random vector, and
want to plot a confidence region for its mean using a sample of size n,
3. we do not know the mean vector and variance matrix of the random vector, and
want to plot a probability region (incorporating our uncertainty).
In the first case, if a random vector x∗ follows a multivariate normal distribution
with known parameters µ and Σ, then
with κ = c (D − 1)/(n − D + 1). But (x̄∗ − µ)Σ̂−1 (x̄∗ − µ)′ = κ (constant) defines a
(1 − α)100% confidence region centred at x̄∗ in RD−1 , and consequently ξ = h−1 (µ)
defines a (1 − α)100% confidence region around the centre in the simplex.
Finally, in the third case, one should actually use the multivariate Student-Siegel
predictive distribution: a new value of x∗ will have as density
   f (x∗ |data) ∝ [1 + (n − 1)−1 (1 − 1/n)(x∗ − x̄∗ ) · Σ̂−1 · (x∗ − x̄∗ )′ ]−n/2 .
This distribution is unfortunately not commonly tabulated, and it is only available in
some specific packages. On the other side, if n is large with respect to D, the differences
between the first and third options are negligible.
Note that for D = 3, D − 1 = 2 and we have an ellipse in real space, in any of the
first two cases: the only difference between them is how the constant κ is computed.
The parameterisation equations in polar coordinates, which are necessary to plot these
ellipses, are given in Appendix B.
7.3 Exercises
Exercise 7.1. Divide the sample of Table 5.1 into two groups (at your will) and perform
the different tests on the centres and covariance structures.
Exercise 7.2. Compute and plot a confidence region for the ilr transformed mean of
the data from table 2.1 in R2 .
Exercise 7.3. Transform the confidence region of exercise 7.2 back into the ternary
diagram using ilr−1 .
Exercise 7.4. Compute and plot a 90% probability region for the ilr transformed data
of table 2.1 in R2 , together with the sample. Use the chi square distribution.
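For Exercise 7.4 (and, with a different κ, for the confidence region of Exercise 7.2), the boundary of the elliptic region can be traced from the Cholesky factor of the covariance matrix. The mean and covariance below are hypothetical placeholders, not estimates from table 2.1; for 2 degrees of freedom the chi-square quantile has the closed form −2 ln α:

```python
import numpy as np

def ellipse_points(mean, cov, kappa, m=100):
    """Points on the boundary {x : (x - mean) cov^{-1} (x - mean)' = kappa},
    obtained by mapping the unit circle through the Cholesky factor."""
    L = np.linalg.cholesky(cov)
    t = np.linspace(0.0, 2.0 * np.pi, m)
    circle = np.vstack([np.cos(t), np.sin(t)])   # unit circle
    return (mean[:, None] + np.sqrt(kappa) * (L @ circle)).T

# 90% probability region with 2 ilr coordinates:
alpha = 0.10
kappa = -2.0 * np.log(alpha)   # chi-square quantile, 2 degrees of freedom

mean = np.array([0.2, -0.1])                     # hypothetical values
cov = np.array([[0.30, 0.05], [0.05, 0.20]])
pts = ellipse_points(mean, cov, kappa)
```

Every point of `pts` satisfies the quadratic form exactly, since the Cholesky map sends the unit circle onto the ellipse.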
Exercise 7.5. For each of the four hypotheses in section 7.1, compute the number of
parameters to be estimated if the composition has D parts. The fourth hypothesis
needs more parameters than the other three. How many, with respect to each of the
three simpler hypothesis? Compare with the degrees of freedom of the χ2 distributions
of page 59.
Chapter 8
Compositional processes
Compositions can evolve depending on an external parameter like space, time,
temperature, pressure, global economic conditions and many others. The external
parameter may be continuous or discrete. In general, the evolution is expressed as a
function x(t), where t represents the external variable and the image is a composition in
S D . In order to model compositional processes, the study of simple models appearing
in practice is very important. However, apparently complicated behaviours represented
in ternary diagrams may be close to linear processes in the simplex. The main challenge
is frequently to identify compositional processes from available data. This is done using
a variety of techniques that depend on the data, the selected model of the process and
the prior knowledge about them. Next subsections present three simple examples of
such processes. The most important is the linear process, which follows a straight-line
in the simplex. Other frequent processes are complementary processes and mixtures.
In order to identify the models, two standard techniques are presented: regression and
principal component analysis in the simplex. The first one is adequate when
compositional data are complemented with some external covariates. In contrast,
principal component analysis tries to identify directions of maximum variability of the
data, i.e. a linear process in the simplex with some unobserved covariate.
where a straight-line is identified: x(0) is a point on the line taken as the origin; p is a
constant vector representing the direction of the line; and t is a parameter with values
on the real line (positive or negative).
The linear character of this process is enhanced when it is represented using
coordinates. Select a basis in S D , for instance, using balances determined by a
sequential binary partition, and denote the coordinates u(t) = ilr(x(t)), q = ilr(p).
The model for the coordinates is then
u(t) = u(0) + t · q , (8.3)
a typical expression of a straight-line in RD−1 . The processes that follow a straight-line
in the simplex are more general than those represented by Equations (8.2) and (8.3),
because replacing the parameter t by any function φ(t) in the expression still produces
images on the same straight-line.
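The straight-line model of Equations (8.2) and (8.3) can be checked numerically: perturbing x(0) by the t-th power of p gives ilr coordinates whose successive differences are constant. The starting composition and direction below are hypothetical:

```python
import math

def closure(x):
    s = sum(x)
    return [xi / s for xi in x]

def perturb(x, y):   # x (+) y
    return closure([a * b for a, b in zip(x, y)])

def power(t, x):     # t (.) x
    return closure([a ** t for a in x])

def ilr3(x):
    """Balances of Table 8.1: u1 = (1/sqrt 6) ln(x1 x2 / x3^2),
    u2 = (1/sqrt 2) ln(x1 / x2)."""
    return (math.log(x[0] * x[1] / x[2] ** 2) / math.sqrt(6),
            math.log(x[0] / x[1]) / math.sqrt(2))

x0 = closure([0.1, 0.6, 0.3])   # hypothetical starting composition
p = closure([2.0, 1.0, 0.5])    # hypothetical direction of the process
process = [perturb(x0, power(t, p)) for t in range(6)]
u = [ilr3(x) for x in process]
# successive coordinate differences are constant: u(t) = u(0) + t q
```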
Table 8.1: Sequential binary partition and balance-coordinates used in the example growth of bacteria
population

order   x1   x2   x3   balance-coord.
1       +1   +1   −1   u1 = (1/√6) ln(x1 x2 /x3²)
2       +1   −1    0   u2 = (1/√2) ln(x1 /x2 )
The process of growth is shown in Figure 8.1, both in a ternary diagram (left) and in
the plane of the selected coordinates (right). Using coordinates it is easy to identify that
the process corresponds to a straight-line in the simplex. Figure 8.2 shows the evolution
Figure 8.1: Growth of 3 species of bacteria in 5 units of time. Left: ternary diagram; the axes used
are shown (thin lines). Right: process in coordinates.
Figure 8.2: Growth of 3 species of bacteria in 5 units of time. Evolution of per unit of mass for each
species. Left: per unit of mass. Right: cumulated per unit of mass; x1 , lower band; x2 , intermediate
band; x3 upper band. Note the inversion of abundances of species 1 and 3.
of the process in time in two usual plots: the one on the left shows the evolution of each
part-component in per unit; on the right, the same evolution is presented as parts adding
up to one in a cumulative form. Normally, the graph on the left is more understandable
from the point of view of evolution.
Example 8.2 (washing process). A liquid reservoir of constant volume V receives an
input of liquid at a rate Q (volume per unit time) and, after very active mixing inside
the reservoir, an equivalent output Q is released. At time t = 0, volumes (or masses)
x1 (0), x2 (0), x3 (0) of three contaminants are stirred into the reservoir. The contaminant
species are assumed non-reactive. Attention is paid to the relative content of the three
species at the output in time. The output concentration is proportional to the mass in
the reservoir (Albarède, 1995, p. 346),

   xi (t) = xi (0) · exp(−t/(V /Q)) ,   i = 1, 2, 3 .
Example 8.3 (one radioactive isotope). Consider the radioactive isotope x1 which is
transformed into the non-radioactive isotope x3 , while the element x2 remains unaltered.
This situation, with λ1 < 0, corresponds to
that is mass preserving. The group of parts behaving linearly is {x1 , x2 }, and a com-
plementary group is {x3 }. Table 8.2 shows parameters of the model and Figures 8.3
and 8.4 show different aspects of the compositional process from t = 0 to t = 10.
Table 8.2: Parameters for Example 8.3: one radioactive isotope. Disintegration rate is ln 2 times
the inverse of the half-lifetime. Time units are arbitrary. The lower part of the table represents the
sequential binary partition used to define the balance-coordinates.
parameter x1 x2 x3
disintegration rate 0.5 0.0 0.0
initial mass 1.0 0.4 0.5
balance 1 +1 +1 −1
balance 2 +1 −1 0
Figure 8.3: Disintegration of one isotope x1 into x3 in 10 units of time. Left: ternary diagram; the
axes used are shown (thin lines). Right: process in coordinates. The mass in the system is constant
and the mass of x2 is constant.
A first inspection of the figures reveals that the process appears as a segment in the
ternary diagram (Fig. 8.3, left). This fact is essentially due to the constant mass of x2
in a conservative system, thus appearing as a constant per-unit. In Figure 8.3, right, the
evolution of the coordinates shows that the process is not linear; however, except for
initial times, the process may be approximated by a linear one. The linear or non-linear
Figure 8.4: Disintegration of one isotope x1 into x3 in 10 units of time. Evolution of per unit of
mass for each species. Left: per unit of mass. Right: cumulated per unit of mass; x1 , lower band; x2 ,
intermediate band; x3 upper band. Note the inversion of abundances of species 1 and 3.
character of the process is hardly detected in Figure 8.4, which shows the evolution in
time of the composition.
Example 8.4 (three radioactive isotopes). Consider three radioactive isotopes that
we identify with a linear group of parts, {x1 , x2 , x3 }. The disintegrated mass of x1
is distributed on the non-radioactive parts {x4 , x5 , x6 } (complementary group). The
whole disintegrated mass from x2 and x3 is assigned to x5 and x6 , respectively. The
Table 8.3: Parameters for Example 8.4: three radioactive isotopes. Disintegration rate is ln 2 times
the inverse of the half-lifetime. Time units are arbitrary. The middle part of the table corresponds to
the coefficients aij indicating the part of the mass from xi component transformed into the xj . Note
they add to one and the system is mass conservative. The lower part of the table shows the sequential
binary partition to define the balance coordinates.
parameter x1 x2 x3 x4 x5 x6
disintegration rate 0.2 0.04 0.4 0.0 0.0 0.0
initial mass 30.0 50.0 13.0 1.0 1.2 0.7
mass from x1 0.0 0.0 0.0 0.7 0.2 0.1
mass from x2 0.0 0.0 0.0 0.0 1.0 0.0
mass from x3 0.0 0.0 0.0 0.0 0.0 1.0
balance 1 +1 +1 +1 −1 −1 −1
balance 2 +1 +1 −1 0 0 0
balance 3 +1 −1 0 0 0 0
balance 4 0 0 0 +1 +1 −1
balance 5 0 0 0 +1 −1 0
Figure 8.5: Disintegration of three isotopes x1 , x2 , x3 . Disintegration products are masses added to
x4 , x5 , x6 in 20 units of time. Left: evolution of per unit of mass of x4 , x5 , x6 . Right: x4 , x5 , x6 process
in coordinates; a loop and a double point are revealed.
Figure 8.6: Ternary diagram of the subcomposition x4 , x5 , x6 .
values of the parameters considered are shown in Table 8.3. Figure 8.5 (left) shows
the evolution of the subcomposition of the complementary group in 20 time units; no
special conclusion can be drawn from it. In contrast, Figure 8.5 (right), showing the
evolution of the coordinates of the subcomposition, reveals a loop in the evolution with
a double point (the process passes twice through this compositional point); although
less clearly, the same fact can be observed in the ternary diagram in Figure 8.6. This
is quite surprising and involved behaviour despite the very simple character of the
complementary process. Changing the parameters of the process one can obtain simpler
behaviours, for instance without double points or exhibiting less curvature. However,
these processes only present at most one double point or a single bend point; the
branches far from these points are suitable for a linear approximation.
Example 8.5 (washing process (continued)). Consider the washing process. Let us
assume that the liquid is water with density equal to one, and define the mass of water
x0 (t) = V − ∑i xi (t), which may be considered as a complementary process. The mass
concentration at the output is the closure of the four components, with the closure
constant proportional to V . The compositional process is not a straight-line in the
simplex, because the new balance now needed to represent the process is
Exercise 8.3. In the washing process example, set x1 (0) = 1., x2 (0) = 2., x3 (0) = 3.,
V = 100., Q = 5.. Find the sequential binary partition used in the example. Plot the
evolution in time of the coordinates and mass concentrations including the water x0 (t).
Plot, in a ternary diagram, the evolution of the subcomposition x0 , x1 , x2 .
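A minimal simulation of this setting (a sketch, not a worked solution of the exercise): all contaminants decay at the same rate t/(V /Q), so their three-part subcomposition is constant in time, and only the balance against the water x0 (t) evolves:

```python
import math

V, Q = 100.0, 5.0
x_init = [1.0, 2.0, 3.0]   # x1(0), x2(0), x3(0)

def closure(x):
    s = sum(x)
    return [xi / s for xi in x]

def state(t):
    """Four-part composition (water, x1, x2, x3) at time t."""
    xs = [xi * math.exp(-t / (V / Q)) for xi in x_init]
    water = V - sum(xs)   # mass of water, density equal to one
    return closure([water] + xs)

comp_start, comp_late = state(0.0), state(50.0)
# the contaminant subcomposition is constant (equal decay rates),
# while the 4-part composition including water is not:
sub_start = closure(comp_start[1:])
sub_late = closure(comp_late[1:])
```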
where z(t) is the process of the concentration in the first container. Note that x, y, z
are considered closed to 1. The final composition in the first container is
   z1 = (m1 x + m2 y)/(m1 + m2 )    (8.4)
The mixture process can be alternatively expressed as mixture of the initial and final
compositions (often called end-points):
for some function of time, α(t), where, to fit the physical statement of the process,
0 ≤ α ≤ 1. But there is no problem in assuming that α may take values on the whole
real-line.
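A sketch of the end-point form of the mixture, with hypothetical end-point compositions; α = 0 returns z0 and α = 1 returns z1:

```python
def closure(x):
    s = sum(x)
    return [xi / s for xi in x]

def mixture(z0, z1, a):
    """Mass mixture of two end-point compositions: (1 - a) z0 + a z1,
    closed; the linear combination acts on masses, not log-ratios."""
    return closure([(1.0 - a) * p + a * q for p, q in zip(z0, z1)])

z0 = closure([0.70, 0.25, 0.05])   # hypothetical end-points
z1 = closure([0.10, 0.30, 0.60])
path = [mixture(z0, z1, a / 10.0) for a in range(11)]
```

Plotting `path` in ilr coordinates would show the curved trace (M) of Figure 8.7, as opposed to the perturbation-linear path (P).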
which, being a simple system, is not linear in the unknowns. Note that (8.5) involves
masses or volumes and, therefore, it is not a purely compositional equation. This
situation always occurs in mixture processes. Figure 8.7 shows the process of mixing (M)
both in a ternary diagram (left) and in the balance-coordinates u1 = 6−1/2 ln(z1 z2 /z3 ),
u2 = 2−1/2 ln(z1 /z2 ) (right). Fig. 8.7 also shows a perturbation-linear process, i.e. a
Figure 8.7: Two processes going from z0 to z1 . (M) mixture process; (P) linear perturbation
process. Representation in the ternary diagram, left; using balance-coordinates u1 = 6−1/2 ln(z1 z2 /z3 ),
u2 = 2−1/2 ln(z1 /z2 ), right.
unit per second with volume composition x = C[80, 2, 18] gets into the box. After a
complete mixing there is an output whose flow equals Q with the volume composition
x(t) at the time t. Model the evolution of the volumes of the three components in
the container using ordinary linear differential equations and solve them (Hint: these
equations are easily found in textbooks, e.g. Albarède (1995)[p. 345–350]). Are you
able to plot the curve for the output composition x(t) in the simplex without using the
solution of the differential equations? Is it a mixture?
where t = [t0 , t1 , . . . , tr ] are real covariates, identified as the parameters of the
curve or surface; the first parameter is defined as the constant t0 = 1, as assumed for the
observations. The compositional coefficients of the model, β j ∈ S D , are to be estimated
from the data. The model (8.6) is very general and takes different forms depending on
how the covariates tj are defined. For instance, defining tj = t^j , with t a covariate, the
model is a polynomial; in particular, if r = 1, it is a straight-line in the simplex (8.2).
The most popular fitting method of the model (8.6) is the least-squares deviation
criterion. As the response x(t) is compositional, it is natural to measure deviations
also in the simplex using the concepts of the Aitchison geometry. The deviation of the
model (8.6) from the data is defined as x̂(ti ) ⊖ xi and its size by the Aitchison norm,
‖x̂(ti ) ⊖ xi ‖²a = d²a (x̂(ti ), xi ). The target function (sum of squared errors, SSE) is

   SSE = ∑_{i=1}^{n} ‖x̂(ti ) ⊖ xi ‖²a ,
where ‖ · ‖ is the norm of a real vector. The last right-hand member of (8.8) has been
obtained permuting the order of the sums on the components of the vectors and on
the data. All sums in (8.8) are non-negative and, therefore, the minimisation of SSE
implies the minimisation of each term of the sum in k,
   SSEk = ∑_{i=1}^{n} |x̂∗k (ti ) − x∗ik |² ,   k = 1, 2, . . . , D − 1 .   (8.9)
That is, the fitting of the compositional model (8.6) reduces to the D − 1 ordinary
least-squares problems in (8.7).
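The reduction to D − 1 ordinary least-squares problems can be sketched as follows; the contrast matrix, covariate values and compositions are hypothetical:

```python
import numpy as np

# Orthonormal contrast matrix of two balances for D = 3 parts
PSI = np.array([[1.0, 1.0, -2.0], [1.0, -1.0, 0.0]])
PSI = PSI / np.linalg.norm(PSI, axis=1, keepdims=True)

def ilr(X):
    # rows of PSI sum to zero, so the closure constant cancels
    return np.log(X) @ PSI.T

def ilr_inv(U):
    Y = np.exp(U @ PSI)
    return Y / Y.sum(axis=1, keepdims=True)

def fit_line(t, X):
    """Least-squares fit of x(t) = beta0 (+) t (.) beta1, done
    coordinate-wise as in (8.9)."""
    A = np.column_stack([np.ones_like(t), t])
    B, *_ = np.linalg.lstsq(A, ilr(X), rcond=None)
    return B   # row 0: ilr(beta0); row 1: ilr(beta1)

# Hypothetical compositional responses with covariate t:
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.array([[0.60, 0.30, 0.10], [0.50, 0.33, 0.17], [0.42, 0.36, 0.22],
              [0.34, 0.38, 0.28], [0.27, 0.39, 0.34]])
B = fit_line(t, X)
X_hat = ilr_inv(np.column_stack([np.ones_like(t), t]) @ B)   # fitted curve
```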
Example 8.7 (Vulnerability of a system). A system is subjected to external actions.
The response of the system to such actions is frequently a major concern in engineering.
For instance, the system may be a dike under the action of ocean-wave storms; the
response may be the level of service of the dike after one event. In a simplified scenario,
three responses of the system may be considered: θ1 , service; θ2 , damage; θ3 collapse.
The dike can be designed for a design action, e.g. wave-height, d, ranging 3 ≤ d ≤ 20
(metres wave-height). Actions are parameterised by some wave-height of the storm, h,
also ranging 3 ≤ h ≤ 20 (metres wave-height). Vulnerability of the system is described
by the conditional probabilities
the conditional probabilities
   pk (d, h) = P[θk |d, h] ,   k = 1, 2, 3 = D ,    ∑_{k=1}^{D} pk (d, h) = 1 ,
where, for any d, h, p(d, h) = [p1 (d, h), p2 (d, h), p3 (d, h)] ∈ S 3 . In practice, p(d, h) is
only approximately known for a limited number of values p(di , hi ), i = 1, . . . , n. The
whole model of vulnerability can be expressed as a regression model
p̂(d, h) = β 0 ⊕ (d ⊙ β 1 ) ⊕ (h ⊙ β 2 ) , (8.10)
Figure 8.8: Vulnerability models obtained by regression in the simplex from the data in the Table
8.4. Horizontal axis: incident wave-height in m. Vertical axis: probability of the output response.
Shown designs are 3.5, 6.0, 8.5, 11.0, 13.5, 16.0 (m design wave-height).
Figure 8.9: Vulnerability models in Figure 8.8 in coordinates (left) and in the ternary diagram (right).
Design 3.5 (circles); 16.0 (thick line).
sample: service probabilities decrease as the level of action increases and conversely for
collapse. This changes smoothly for increasing design level. Despite the challenging
shapes of these curves describing the vulnerability, they come from a linear model as
can be seen in Figure 8.9 (left). In Figure 8.9 (right) these straight-lines in the simplex
are shown in a ternary diagram. In these cases, the regression model has shown its
smoothing capabilities.
Exercise 8.6 (sand-silt-clay from a lake). Consider the data in Table 8.5. They are
sand-silt-clay compositions from an Arctic lake taken at different depths (adapted from
Coakley and Rust (1968) and cited in Aitchison (1986)). The goal is to check whether
there is some trend in the composition related to the depth. Particularly, using the
standard hypothesis testing in regression, check the constant and the straight-line mod-
els
x̂(t) = β 0 , x̂(t) = β 0 ⊕ (t ⊙ β 1 ) ,
with t = depth. Plot both models, the fitted model and the residuals, in coordinates
and in the ternary diagram.
Figure 8.10: Principal components in S 3 . Left: before centring. Right: after centring
describes the trend reflected by the centred sample, and g ⊕ α ⊙ a1 , with g the centre
of the sample, describes the trend reflected in the non-centred data set. The evolution
of the proportion per unit volume of each part, as described by the first principal
component, is reflected in Figure 8.11 left, while the cumulative proportion is reflected
in Figure 8.11 right.
To interpret a trend we can use Equation (3.1), which allows us to rescale the vector
a1 assuming whatever is convenient according to the process under study, e.g. that one
part is stable. Assumptions can be made only on one part, as the interrelationship with
Figure 8.11: Evolution of proportions as described by the first principal component. Left: propor-
tions. Right: cumulated proportions.
the other parts conditions their value. A representation of the result is also possible, as
can be seen in Figure 8.12. The component assumed to be stable, K2 O, has a constant,
Figure 8.12: Interpretation of a principal component in S 2 under the assumption of stability of K2O.
unit perturbation coefficient, and we see that, under this assumption and within the
range of variation of the observations, Na2 O shows only a very small increase, which is
hardly perceptible, while Fe2 O3 shows a considerable increase compared to the other
two. In other words, one possible explanation for the observed pattern of variability is
that Fe2 O3 varies significantly, while the other two parts remain stable.
The graph gives even more information: the relative behaviour will be preserved
under any assumption. Thus, if the assumption is that K2 O increases (decreases), then
Na2 O will show the same behaviour as K2 O, while Fe2 O3 will always change from below
to above.
Note that, although we can represent a perturbation process described by a PC in a
ternary diagram only for three parts, we can easily extend the representation in Figure
8.12 to as many parts as we might be interested in.
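A sketch of the whole scheme (centring, clr transformation, SVD, and the trend g ⊕ α ⊙ a1) on a hypothetical 3-part data set:

```python
import numpy as np

def closure(X):
    return X / X.sum(axis=1, keepdims=True)

def clr(X):
    L = np.log(X)
    return L - L.mean(axis=1, keepdims=True)

def clr_inv(Z):
    return closure(np.exp(Z))

# Hypothetical 3-part data set:
X = closure(np.array([[0.50, 0.30, 0.20], [0.40, 0.35, 0.25],
                      [0.30, 0.40, 0.30], [0.25, 0.42, 0.33],
                      [0.20, 0.45, 0.35]]))

g = clr_inv(clr(X).mean(axis=0, keepdims=True))[0]   # centre of the data
Xc = closure(X / g)                                  # centred data: x (-) g

# principal components: SVD of the clr-transformed centred data
U, s, Vt = np.linalg.svd(clr(Xc), full_matrices=False)
a1 = clr_inv(Vt[:1])[0]   # first principal direction, as a composition

# the modelled trend through the data: g (+) alpha (.) a1
trend = [closure((g * a1 ** alpha)[None, :])[0]
         for alpha in np.linspace(-2.0, 2.0, 5)]
```

Plotting the parts of `trend` against α reproduces, for this toy data set, the kind of evolution shown in Figure 8.11.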
Table 8.4: Assumed vulnerability for a dike with only three outputs or responses. Probability values
of the response θk conditional to values of design d and level of the storm h.
Table 8.5: Sand, silt, clay composition of sediment samples at different water depths in an Arctic
lake.
sample no. sand silt clay depth (m) sample no. sand silt clay depth (m)
1 77.5 19.5 3.0 10.4 21 9.5 53.5 37.0 47.1
2 71.9 24.9 3.2 11.7 22 17.1 48.0 34.9 48.4
3 50.7 36.1 13.2 12.8 23 10.5 55.4 34.1 49.4
4 52.2 40.9 6.9 13.0 24 4.8 54.7 40.5 49.5
5 70.0 26.5 3.5 15.7 25 2.6 45.2 52.2 59.2
6 66.5 32.2 1.3 16.3 26 11.4 52.7 35.9 60.1
7 43.1 55.3 1.6 18.0 27 6.7 46.9 46.4 61.7
8 53.4 36.8 9.8 18.7 28 6.9 49.7 43.4 62.4
9 15.5 54.4 30.1 20.7 29 4.0 44.9 51.1 69.3
10 31.7 41.5 26.8 22.1 30 7.4 51.6 41.0 73.6
11 65.7 27.8 6.5 22.4 31 4.8 49.5 45.7 74.4
12 70.4 29.0 0.6 24.4 32 4.5 48.5 47.0 78.5
13 17.4 53.6 29.0 25.8 33 6.6 52.1 41.3 82.9
14 10.6 69.8 19.6 32.5 34 6.7 47.3 46.0 87.7
15 38.2 43.1 18.7 33.6 35 7.4 45.6 47.0 88.1
16 10.8 52.7 36.5 36.8 36 6.0 48.9 45.1 90.4
17 18.4 50.7 30.9 37.8 37 6.3 53.8 39.9 90.6
18 4.6 47.4 48.0 36.9 38 2.5 48.0 49.5 97.7
19 15.6 50.4 34.0 42.2 39 2.0 47.8 50.2 103.7
20 31.9 45.1 23.0 47.0
Bibliography
study of surface waters of a Mediterranean river. Water Research 39 (7), 1404–
1414.
Pawlowsky-Glahn, V. (2003). Statistical modelling on coordinates. See Thió-
Henestrosa and Martı́n-Fernández (2003).
Pawlowsky-Glahn, V. and A. Buccianti (2002). Visualization and modeling of sub-
populations of compositional data: statistical methods illustrated by means of
geochemical data from fumarolic fluids. International Journal of Earth Sciences
(Geologische Rundschau) 91 (2), 357–368.
Pawlowsky-Glahn, V. and J. Egozcue (2006). Análisis de datos composicionales con el
coda-dendrograma. In J. Sicilia-Rodrı́guez, C. González-Martı́n, M. A. González-
Sierra, and D. Alcaide (Eds.), Actas del XXIX Congreso de la Sociedad de Es-
tadı́stica e Investigación Operativa (SEIO’06), pp. 39–40. Sociedad de Estadı́stica
e Investigación Operativa, Tenerife (ES), CD-ROM.
Pawlowsky-Glahn, V. and J. J. Egozcue (2001). Geometric approach to statistical
analysis on the simplex. Stochastic Environmental Research and Risk Assessment
(SERRA) 15 (5), 384–398.
Pawlowsky-Glahn, V. and J. J. Egozcue (2002). BLU estimators and compositional
data. Mathematical Geology 34 (3), 259–274.
Pearson, K. (1897). Mathematical contributions to the theory of evolution. On a form
of spurious correlation which may arise when indices are used in the measurement
of organs. Proceedings of the Royal Society of London LX, 489–502.
Peña, D. (2002). Análisis de datos multivariantes. McGraw Hill. 539 p.
Richter, D. H. and J. G. Moore (1966). Petrology of the Kilauea Iki lava lake, Hawaii.
U.S. Geol. Surv. Prof. Paper 537-B, B1-B26, cited in (Rollinson, 1995).
Rollinson, H. R. (1995). Using geochemical data: Evaluation, presentation, interpre-
tation. Longman Geochemistry Series, Longman Group Ltd., Essex (UK). 352
p.
Sarmanov, O. V. and A. B. Vistelius (1959). On the correlation of percentage values.
Doklady of the Academy of Sciences of the USSR – Earth Sciences Section 126,
22–25.
Solano-Acosta, W. and P. K. Dutta (2005). Unexpected trend in the compositional
maturity of second-cycle sand. Sedimentary Geology 178 (3-4), 275–283.
Thió-Henestrosa, S., J. J. Egozcue, V. Pawlowsky-Glahn, L. O. Kovács, and
G. Kovács (2007). Balance-dendrogram. A new routine of CoDaPack. Computers
& Geosciences.
A. The ternary diagram
Denote the three vertices of the ternary diagram counter-clockwise from the upper
vertex as A, B and C (see Figure 13). The scale of the plot is arbitrary and a unitary
Figure 13: Plot of the frame of a ternary diagram, with vertices A = [u0 + 0.5, v0 + √3/2] (top), B = [u0, v0] (lower left) and C = [u0 + 1, v0] (lower right). The shift plotting coordinates are [u0, v0] = [0.2, 0.2], and the length of the side is 1.
equilateral triangle can be chosen. Assume that [u0, v0] are the plotting coordinates of the B vertex. The C vertex is then C = [u0 + 1, v0]. The vertex A has abscissa u0 + 0.5, and its height above v0 follows from the Pythagorean theorem: h^2 = 1^2 - 0.5^2 = 3/4, i.e. h = √3/2. Then, the vertex A = [u0 + 0.5, v0 + √3/2]. These are the vertices of the triangle shown in Figure 13, where the origin has been shifted to [u0, v0] in order to center the plot. The figure is obtained by plotting the segments AB, BC, CA.
To plot a sample point x = [x1, x2, x3], closed to a constant κ, the corresponding plotting coordinates [u, v] are needed. They are obtained as a convex linear combination of the plotting coordinates of the vertices,
\[
[u, v] = \frac{1}{\kappa}\,(x_1 A + x_2 B + x_3 C),
\]
with
\[
A = [u_0 + 0.5,\; v_0 + \sqrt{3}/2], \qquad B = [u_0, v_0], \qquad C = [u_0 + 1, v_0].
\]
Note that the coefficients of the convex linear combination must sum to 1, which is achieved by dividing by κ. Deformed ternary diagrams can be obtained simply by changing the plotting coordinates of the vertices while maintaining the convex linear combination.
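The convex combination above is one line of code. A minimal Python sketch (the function name and default shift [u0, v0] = [0.2, 0.2] follow Figure 13 but are otherwise arbitrary):

```python
import numpy as np

def ternary_coordinates(x, u0=0.2, v0=0.2, kappa=1.0):
    """Plotting coordinates [u, v] of a composition x = [x1, x2, x3] closed to
    kappa, as the convex linear combination (x1*A + x2*B + x3*C)/kappa."""
    x = np.asarray(x, dtype=float)
    A = np.array([u0 + 0.5, v0 + np.sqrt(3.0) / 2.0])  # upper vertex
    B = np.array([u0, v0])                              # lower-left vertex
    C = np.array([u0 + 1.0, v0])                        # lower-right vertex
    return (x[0] * A + x[1] * B + x[2] * C) / kappa
```

For example, `ternary_coordinates([1, 0, 0])` returns the A vertex, and the barycentre [1/3, 1/3, 1/3] maps to the horizontal centre of the triangle, u = u0 + 0.5.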
B. Parametrisation of an elliptic region
To plot an ellipse in R2, and to plot its backtransform in the ternary diagram, we need to give the plotting program a sequence of points that it can join by a smooth curve. This requires the points to be in a certain order, so that they can be joined consecutively. The way to do this is to use polar coordinates, which allow us to generate a consecutive sequence of angles following the border of the ellipse in one direction. The degree of approximation of the ellipse depends on the number of points used in the discretisation.
The algorithm is based on the following reasoning. Imagine an ellipse located in R2 with principal axes not parallel to the axes of the Cartesian coordinate system. What we have to do to express it in polar coordinates is (a) translate the ellipse to the origin; (b) rotate it in such a way that the principal axes of the ellipse coincide with the axes of the coordinate system; (c) stretch the axis corresponding to the shorter principal axis in such a way that the ellipse becomes a circle in the new coordinate system; (d) transform the coordinates into polar coordinates using the simple expressions x∗ = r cos θ, y∗ = r sin θ; (e) undo all the previous steps in inverse order to obtain the expression of the original equation in terms of the polar coordinates. Although this might sound tedious and complicated, in fact we have results from matrix theory which tell us that this procedure can be reduced to a problem of eigenvalues and eigenvectors.
In fact, any symmetric matrix can be decomposed into the matrix product QΛQ′, where Λ is the diagonal matrix of eigenvalues and Q is the matrix of orthonormal eigenvectors associated with them. For Q we have Q′ = Q^{-1} and therefore (Q′)^{-1} = Q. This can be applied to either the first or the second option of the last section.
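This decomposition is directly available in numerical libraries. A quick NumPy check of S = QΛQ′ and Q′ = Q^{-1}, for an arbitrary (hypothetical) symmetric matrix S:

```python
import numpy as np

# hypothetical symmetric matrix standing in for a covariance matrix
S = np.array([[2.0, 0.8],
              [0.8, 1.0]])
lam, Q = np.linalg.eigh(S)     # eigenvalues and orthonormal eigenvectors

orthonormal = np.allclose(Q.T @ Q, np.eye(2))           # Q' = Q^{-1}
reconstructs = np.allclose(Q @ np.diag(lam) @ Q.T, S)   # S = Q Lambda Q'
```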
In general, we are interested in ellipses whose matrix is related to the sample covariance matrix Σ̂, particularly to its inverse. We have Σ̂^{-1} = QΛ^{-1}Q′ and, substituting into the equation of the ellipse (7.5), (7.6):
\[
(\bar{x}^* - \mu)\, Q\Lambda^{-1}Q' \,(\bar{x}^* - \mu)'
= \left( Q'(\bar{x}^* - \mu)' \right)' \Lambda^{-1} \left( Q'(\bar{x}^* - \mu)' \right) = \kappa ,
\]
where x̄∗ is the estimated centre or mean and µ describes the ellipse. The vector Q′(x̄∗ − µ)′ corresponds to a rotation in real space such that the new coordinate axes are precisely the eigenvectors. Given that Λ is a diagonal matrix, the next step consists in writing Λ^{-1} = Λ^{-1/2} Λ^{-1/2}, and we get:
\[
\left( Q'(\bar{x}^* - \mu)' \right)' \Lambda^{-1/2}\Lambda^{-1/2} \left( Q'(\bar{x}^* - \mu)' \right)
= \left( \Lambda^{-1/2} Q'(\bar{x}^* - \mu)' \right)' \left( \Lambda^{-1/2} Q'(\bar{x}^* - \mu)' \right) = \kappa .
\]
This transformation is equivalent to a re-scaling of the basis vectors in such a way that the ellipse becomes a circle of radius \(\sqrt{\kappa}\), which is easy to express in polar coordinates:
\[
\Lambda^{-1/2} Q'(\bar{x}^* - \mu)' = \begin{bmatrix} \sqrt{\kappa}\cos\theta \\ \sqrt{\kappa}\sin\theta \end{bmatrix},
\qquad \text{or} \qquad
(\bar{x}^* - \mu)' = Q\Lambda^{1/2} \begin{bmatrix} \sqrt{\kappa}\cos\theta \\ \sqrt{\kappa}\sin\theta \end{bmatrix}.
\]
The parametrisation that we are looking for is thus given by:
\[
\mu' = (\bar{x}^*)' - Q\Lambda^{1/2} \begin{bmatrix} \sqrt{\kappa}\cos\theta \\ \sqrt{\kappa}\sin\theta \end{bmatrix}.
\]
Note that QΛ^{1/2} is a square root of Σ̂ and can be replaced by the upper triangular matrix U of the Cholesky decomposition of Σ̂:
\[
\hat{\Sigma} = Q\Lambda^{1/2}\Lambda^{1/2}Q' = (Q\Lambda^{1/2})(\Lambda^{1/2}Q') = UL ;
\]
thus, from Σ̂ = UL and L = U′ we get the condition
\[
\begin{bmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{bmatrix}
\begin{bmatrix} u_{11} & 0 \\ u_{12} & u_{22} \end{bmatrix}
= \begin{bmatrix} \hat{\Sigma}_{11} & \hat{\Sigma}_{12} \\ \hat{\Sigma}_{12} & \hat{\Sigma}_{22} \end{bmatrix},
\]
which implies
\[
u_{22} = \sqrt{\hat{\Sigma}_{22}}, \qquad
u_{12} = \frac{\hat{\Sigma}_{12}}{\sqrt{\hat{\Sigma}_{22}}}, \qquad
u_{11} = \sqrt{\frac{\hat{\Sigma}_{11}\hat{\Sigma}_{22} - \hat{\Sigma}_{12}^2}{\hat{\Sigma}_{22}}}
       = \sqrt{\frac{|\hat{\Sigma}|}{\hat{\Sigma}_{22}}},
\]
and for each component of the vector µ we obtain:
\[
\mu_1 = \bar{x}_1^* - \sqrt{\frac{|\hat{\Sigma}|}{\hat{\Sigma}_{22}}}\,\sqrt{\kappa}\cos\theta
      - \frac{\hat{\Sigma}_{12}}{\sqrt{\hat{\Sigma}_{22}}}\,\sqrt{\kappa}\sin\theta, \qquad
\mu_2 = \bar{x}_2^* - \sqrt{\hat{\Sigma}_{22}}\,\sqrt{\kappa}\sin\theta .
\]
The points describing the ellipse in the simplex are ilr^{-1}(µ) (see Section 4.4). The procedures described apply to the three cases studied in Section 7.2, using the appropriate covariance matrix Σ̂. Finally, recall that κ is obtained as a quantile of a chi-square distribution.
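The whole parametrisation fits in a few lines of NumPy. The sketch below uses the eigendecomposition square root QΛ^{1/2}, which traces the same ellipse as the Cholesky factor; the centre, covariance matrix and κ are hypothetical placeholders for the quantities estimated from the data:

```python
import numpy as np

def ellipse_points(centre, Sigma, kappa, n=100):
    """Discretise the ellipse (centre - mu) Sigma^{-1} (centre - mu)' = kappa,
    returning an (n, 2) array of points mu ordered along the border."""
    lam, Q = np.linalg.eigh(Sigma)                 # Sigma = Q diag(lam) Q'
    theta = np.linspace(0.0, 2.0 * np.pi, n)
    circle = np.sqrt(kappa) * np.vstack([np.cos(theta), np.sin(theta)])
    # mu' = (centre)' - Q Lambda^{1/2} [sqrt(kappa) cos(theta); sqrt(kappa) sin(theta)]
    return (np.asarray(centre, dtype=float)[:, None]
            - Q @ np.diag(np.sqrt(lam)) @ circle).T

# hypothetical centre and sample covariance matrix in ilr coordinates
centre = np.array([0.5, -0.3])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.5]])
kappa = 5.99   # roughly the 95% quantile of a chi-square with 2 d.f.
pts = ellipse_points(centre, Sigma, kappa)
```

Backtransforming each row of `pts` with ilr^{-1} gives the points of the ellipse in the ternary diagram.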