Abstract: This Paper Considers Three Conceptions of Musical Distance (Or
Dmitri Tymoczko
310 Woolworth Center, Princeton University, Princeton, NJ 08544.
1 Introduction
We begin with voice-leading spaces that make use of the log-frequency metric.¹
Pitches here are represented by the logarithms of their fundamental frequencies, with
distance measured according to the usual metric on ℝ; pitches are therefore close if
they are near each other on the piano keyboard. A point in ℝⁿ represents an ordered
series of pitches. Distance in this higher-dimensional space can be interpreted
as the aggregate distance moved by a collection of musical voices in passing from
one chord to another. (We can think of this, roughly, as the aggregate physical
distance traveled by the fingers on the piano keyboard.)
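As a concrete illustration of the log-frequency metric, here is a minimal Python sketch. It assumes MIDI-style numbering (A4 = 440 Hz = 69, one unit per equal-tempered semitone) and measures the aggregate size of a voice leading with the taxicab (sum of absolute motions) metric; both choices, and the helper names, are illustrative assumptions rather than the paper's own definitions.

```python
import math


def log_frequency(freq_hz, ref_hz=440.0, ref_pitch=69.0):
    """Map a frequency to a point on R, measured in semitones.

    Assumption: MIDI-style numbering, with A4 = 440 Hz = 69.
    """
    return ref_pitch + 12 * math.log2(freq_hz / ref_hz)


def voice_leading_size(chord_a, chord_b):
    """Aggregate distance moved by corresponding voices (taxicab metric on R^n).

    Voice i of chord_a moves to voice i of chord_b.
    """
    if len(chord_a) != len(chord_b):
        raise ValueError("chords must have the same number of voices")
    return sum(abs(b - a) for a, b in zip(chord_a, chord_b))


print(log_frequency(220.0))  # A3 -> 57.0
# (C4, E4, G4) -> (C4, F4, A4): the voices move 0, 1, and 2 semitones.
print(voice_leading_size([60, 64, 67], [60, 65, 69]))  # 3
```

Pitches an octave apart differ by 12 units, so nearby keys on the piano are nearby points on ℝ.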
By disregarding information (such as the octave or order of a group of notes), we fold
ℝⁿ into a non-Euclidean quotient space, or orbifold. (For example, imposing octave
equivalence transforms ℝⁿ into the n-torus Tⁿ, while transpositional equivalence
transforms ℝⁿ into ℝⁿ⁻¹, orthogonally projecting points onto the hyperplane whose
coordinates sum to zero.) Points in the resulting orbifolds represent equivalence
classes of musical objects (such as chords or set classes), while generalized line
¹ For more on these spaces, see Callender 2004, Tymoczko 2006, and Callender, Quinn, and
Tymoczko 2008.
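The quotient operations described in the introduction can each be sketched as a simple operation on a point of ℝⁿ. The following Python sketch is illustrative only: it assumes 12 semitones per octave, takes sorted order as a canonical representative for permutation equivalence, and does not model the orbifold's singular points or how the operations interact when combined. The function names are hypothetical.

```python
def octave_reduce(pitches, octave=12.0):
    """Octave equivalence: send each coordinate to the circle R/12Z (giving the torus T^n)."""
    return [p % octave for p in pitches]


def forget_order(pitches):
    """Permutation equivalence: choose sorted order as a canonical representative."""
    return sorted(pitches)


def project_transposition(pitches):
    """Transpositional equivalence: orthogonal projection onto the hyperplane
    whose coordinates sum to zero (subtract the mean from every coordinate)."""
    mean = sum(pitches) / len(pitches)
    return [p - mean for p in pitches]


# Two voicings of a C major triad reduce to the same octave-free, unordered chord.
print(forget_order(octave_reduce([72, 64, 55])))  # [0.0, 4.0, 7.0]
# C major and D major triads project to the same point.
print([round(v, 3) for v in project_transposition([60, 64, 67])])  # [-3.667, 0.333, 3.333]
print([round(v, 3) for v in project_transposition([62, 66, 69])])  # [-3.667, 0.333, 3.333]
```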
[Figure 1 image: two-note chords arranged on a Möbius strip, with labels marking the unison, minor second, major second, minor third, major third, and perfect fourth.]
Fig. 1. The Möbius strip representing voice-leading relations among two-note chords.
Let's now turn to a very different sort of model, the Tonnetz and related structures,
which I will describe generically as tuning lattices. These models are typically
discrete, with adjacent points on a particular axis being separated by the same
interval. The leftmost lattice in Figure 3 shows the most familiar of these, where the
two axes represent acoustically pure perfect fifths and major thirds. (One can imagine
a third axis, representing either the octave or the acoustical seventh, projecting
outward from the paper.) The model asserts that the pitch G4 has an acoustic affinity
to both C4 (its underfifth) and D5 (its overfifth), as well as to E♭4 and B4 (its
underthird and overthird, respectively). The lattice thus encodes a fundamentally
different notion of musical distance than the earlier voice-leading models: whereas A3
and A♭3 are very close in log-frequency space, they are four steps apart on our tuning
lattice. Furthermore, whereas chords (or, more generally, musical objects) are
represented by points in the voice-leading spaces, they are represented by polytopes
in the lattices.³
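The gap between lattice distance and log-frequency distance can be made concrete with a small Python sketch. It assumes a two-dimensional lattice of pure fifths (3/2) and pure major thirds (5/4), ignores the octave axis by reducing everything to pitch classes measured in cents, and counts lattice distance as the number of single-axis steps; the helper names are hypothetical, and the example uses the syntonic comma (81/80).

```python
import math

FIFTH = 3 / 2  # one step along the horizontal axis
THIRD = 5 / 4  # one step along the vertical axis


def lattice_to_cents(fifths, thirds):
    """Pitch class, in cents above the origin, of the lattice point (fifths, thirds)."""
    cents = 1200 * (fifths * math.log2(FIFTH) + thirds * math.log2(THIRD))
    return cents % 1200


def lattice_steps(p, q):
    """Number of single-axis steps between two lattice points (taxicab distance)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def pc_cents_distance(c1, c2):
    """Shortest distance between two pitch classes on the log-frequency circle, in cents."""
    d = abs(c1 - c2) % 1200
    return min(d, 1200 - d)


# Take C as the origin. The Pythagorean E (four fifths up) and the just E
# (one major third up) are only a syntonic comma apart in log-frequency space...
pyth_e, just_e = (4, 0), (0, 1)
print(round(pc_cents_distance(lattice_to_cents(*pyth_e), lattice_to_cents(*just_e)), 1))  # 21.5
# ...but they are five steps apart on the lattice.
print(lattice_steps(pyth_e, just_e))  # 5
```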
Finally, there are measures of musical distance that rely on chords' shared interval
content. From this point of view, the chords {C, C♯, E, F♯} and {C, D♭, E♭, G}
resemble one another, since they are nontrivially homometric, or Z-related: that is,
they share the same collection of pairwise distances between their notes. (For
instance, both contain exactly one pair that is one semitone apart, exactly one pair that
is two semitones apart, and so on.) However, these chords are not particularly close
in either of the two models considered previously. It is not intuitively obvious that
this notion of similarity produces any particular geometrical space. But Ian Quinn
has shown that one can use the discrete Fourier transform to generate (in the familiar
equal-tempered case) a six-dimensional quality space in which chords that share the
same interval content are represented by the same point.⁴ We will explore the details
shortly.

² The adjective "generalized" indicates that these line segments may pass through one of the
space's singular points, giving rise to mathematical complications.
³ For a modern introduction to the Tonnetz, see Cohn 1997, 1998, and 1999.
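The Z-relation between {C, C♯, E, F♯} = {0, 1, 4, 6} and {C, D♭, E♭, G} = {0, 1, 3, 7} can be checked directly. The Python sketch below, with hypothetical helper names, computes each set's interval-class vector and the magnitudes of its discrete Fourier transform, which are assumed here to be the six coordinates of the quality space (the sign convention in the exponent is also an assumption and does not affect magnitudes). Because the DFT magnitudes depend only on a set's interval content, homometric sets land on the same point.

```python
import cmath
import itertools
import math


def interval_class_vector(pcs, edo=12):
    """Count unordered pairs of notes in each interval class 1..edo//2."""
    counts = [0] * (edo // 2)
    for a, b in itertools.combinations(pcs, 2):
        d = (b - a) % edo
        ic = min(d, edo - d)
        if ic:  # ignore doubled notes (interval class 0)
            counts[ic - 1] += 1
    return counts


def fourier_magnitudes(pcs, edo=12):
    """Magnitudes of the DFT components 1..edo//2 of a pitch-class set."""
    return [
        abs(sum(cmath.exp(-2j * math.pi * k * p / edo) for p in pcs))
        for k in range(1, edo // 2 + 1)
    ]


X = [0, 1, 4, 6]  # {C, C#, E, F#}
Y = [0, 1, 3, 7]  # {C, Db, Eb, G}
print(interval_class_vector(X))  # [1, 1, 1, 1, 1, 1]
print(interval_class_vector(Y))  # [1, 1, 1, 1, 1, 1]
print([round(m, 3) for m in fourier_magnitudes(X)])  # [1.414, 2.0, 1.414, 2.0, 1.414, 2.0]
print([round(m, 3) for m in fourier_magnitudes(Y)])  # [1.414, 2.0, 1.414, 2.0, 1.414, 2.0]
```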
Fig. 2. The cone representing voice-leading relations among three-note transpositional set
classes.
Left lattice (chromatic Tonnetz):
A3    E4    B4    F♯5   C♯6
F3    C4    G4    D5    A5
D♭3   A♭3   E♭4   B♭4   F5
Right lattice (diatonic version):
A3    E4    B4    F5    C6
F3    C4    G4    D5    A5
D3    A3    E4    B4    F5
Fig. 3. Two discrete tuning lattices. On the left, the chromatic Tonnetz, where horizontally
adjacent notes are linked by acoustically pure fifths, while vertically adjacent notes are linked
by acoustically pure major thirds. On the right, a version of the structure that uses diatonic
intervals.
Clearly, these three musical models are very different, and it would be somewhat
surprising if there were close connections between them. But we will soon see
that this is in fact the case.
[Figure 4 image: (left) a chain of diatonic fifths running through the Möbius strip; (right) the same dyads arranged in a circle.]
Fig. 4. (left) The most efficient voice leadings between diatonic fifths form a chain that runs
through the center of the Möbius strip from Figure 1. (right) These voice leadings form an
abstract circle, in which adjacent dyads are related by three-step diatonic transposition and are
linked by single-step voice leading.
[Figure 5 image: (left) a chain of diatonic triads running through the three-note chord orbifold; (right) the same triads arranged in a circle.]
Fig. 5. (left) The most efficient voice leadings between diatonic triads form a chain that runs
through the center of the orbifold representing three-note chords. (right) These voice leadings
form an abstract circle, in which adjacent triads are linked by single-step voice leading. Note
that here, adjacent triads are related by transposition by two diatonic steps.
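Both circles can be checked by brute force. In this Python sketch (helper names hypothetical), diatonic pitch classes are numbered C = 0, D = 1, ..., B = 6, voice-leading size is counted in scale steps using the taxicab metric, and the search tries every pairing of voices. It reports which diatonic transpositions of a chord can be reached by moving a single voice by a single step.

```python
import itertools


def transpose(chord, steps, modulus=7):
    """Diatonic transposition by a number of scale steps."""
    return [(p + steps) % modulus for p in chord]


def min_voice_leading(a, b, modulus=7):
    """Smallest total motion, in scale steps, over all pairings of the voices of a with b."""
    best = None
    for perm in itertools.permutations(b):
        size = sum(min((y - x) % modulus, (x - y) % modulus) for x, y in zip(a, perm))
        if best is None or size < best:
            best = size
    return best


def single_step_transpositions(chord, modulus=7):
    """Transpositions (in steps) reachable by moving one voice by one step."""
    return [t for t in range(1, modulus)
            if min_voice_leading(chord, transpose(chord, t, modulus), modulus) == 1]


print(single_step_transpositions([0, 4]))     # fifth {C, G}: [3, 4], i.e. up or down three steps
print(single_step_transpositions([0, 2, 4]))  # triad {C, E, G}: [2, 5], i.e. up or down two steps
```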
Fig. 6. Major, minor, and augmented triads as they appear in the orbifold representing three-note chords. Here, triads are particularly close to their major-third transpositions.
[Figure 7 image: diatonic scales, written in pitch-class notation such as {024579e}, arranged in a circle labeled with key names.]
Fig. 7. Fifth-related diatonic scales form a chain that runs through the center of the seven-dimensional orbifold representing seven-note chords. It is structurally analogous to the circles
in Figures 4 and 5.
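The same brute-force idea extends to the seven-note scales of Figure 7. This sketch (again with hypothetical helper names, and with the taxicab metric in semitones as an assumption) measures the minimal voice leading from the C major scale to each scale k fifths away around the circle of fifths. The sizes grow by one semitone per fifth, so fifth-related scales are the closest neighbors. Trying all 5,040 voice pairings is wasteful in principle but fast enough for seven notes.

```python
import itertools

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]


def transpose(chord, semitones):
    """Chromatic transposition, returned in sorted order."""
    return sorted((p + semitones) % 12 for p in chord)


def min_voice_leading(a, b, modulus=12):
    """Smallest total semitone motion over all pairings of the voices of a with b."""
    best = None
    for perm in itertools.permutations(b):
        size = sum(min((y - x) % modulus, (x - y) % modulus) for x, y in zip(a, perm))
        if best is None or size < best:
            best = size
    return best


# Scales k fifths (7k semitones) away from C major: C, G, D, A, E, B, F#.
sizes = [min_voice_leading(C_MAJOR, transpose(C_MAJOR, 7 * k)) for k in range(7)]
print(sizes)  # [0, 1, 2, 3, 4, 5, 6]
```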
Correlation    Major   Minor
Bach           .96     .95
Haydn          .93     .91
Mozart         .91     .91
Beethoven      .96     .96
Fig. 8. Correlations between modulation frequency and voice-leading distances among scales,
in Bach's Well-Tempered Clavier and the piano sonatas of Haydn, Mozart, and Beethoven.
The very high correlations suggest that composers typically modulate between keys whose
associated scales can be linked by efficient voice leading.
Similar points could potentially be made about the prevalence, in functionally tonal music, of
root progressions by perfect fifth. It may be that the diatonic circle of thirds shown in Figure
5 provides a more perspicuous model of functional harmony than do more traditional fifth-based representations.
⁷ See Cohn 1997.
⁸ This is not true of the voice-leading spaces considered earlier: for example, in three-note
chord space {C, D, F} is not particularly close to {F, A♭, B♭}.
[Figure 10 image: a Tonnetz drawn with non-orthogonal axes; numbered triangles mark the triads discussed in the caption.]
Fig. 10. On the Tonnetz, F major (triangle 3) is closer to C major (triangle 1) than F minor
(triangle 4) is. In actual music, however, F minor frequently appears as a passing chord
between F major and C major. Note that, unlike in Figure 3, I have here used a Tonnetz in
which the axes are not orthogonal; this difference is merely orthographical, however.
⁹ In general, the notion of closeness needs to be spelled out carefully, since chords can
contain notes that are very far apart on the lattice. In the cases we are concerned with, chords
occupy a small region of the tuning lattice, and the notion of closeness is fairly
straightforward.
¹⁰ See Tymoczko 2006 and 2008. The point is relatively obvious when one thinks
geometrically: the two chords divide the pitch-class circle nearly evenly into the same
number of pieces; hence, if any two of their notes are close, then each note of one chord is
near some note of the other.
theory, which considers triangles sharing an edge to be one unit apart and which
decomposes larger distances into sequences of one-unit moves.) Yet it takes only two
semitones of total motion to move from C major to F minor, and three to move from
C major to F major. (This is precisely why F minor often appears as a passing chord
between F major and C major.) The Tonnetz thus depicts F major as being closer to C
major than F minor is, even though contrapuntally the opposite is true. This means
we cannot use the figure to explain the ubiquitous nineteenth-century IV–iv–I
progression, in which the two-semitone motion 6̂→5̂ is broken into two single-semitone motions, 6̂→♭6̂→5̂.
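The contrast between the two distances can be checked mechanically. In this Python sketch, Tonnetz distance is taken to be the number of neo-Riemannian P, L, and R moves (each of which crosses a shared edge between triangles) needed to get from one triad to another, found by breadth-first search; voice-leading distance is the smallest total semitone motion over all pairings of voices. Representing triads as (root, quality) pairs, and all helper names, are illustrative assumptions.

```python
import itertools
from collections import deque

MAJOR, MINOR = "maj", "min"


def triad_pcs(triad):
    """Pitch classes of a (root, quality) triad, with C = 0."""
    root, quality = triad
    third = 4 if quality == MAJOR else 3
    return [root % 12, (root + third) % 12, (root + 7) % 12]


def plr_neighbors(triad):
    """Triads sharing an edge (two common tones) with the given triad: P, R, L."""
    root, quality = triad
    if quality == MAJOR:
        return [(root, MINOR), ((root + 9) % 12, MINOR), ((root + 4) % 12, MINOR)]
    return [(root, MAJOR), ((root + 3) % 12, MAJOR), ((root + 8) % 12, MAJOR)]


def tonnetz_distance(start, goal):
    """Fewest edge-sharing moves from start to goal (breadth-first search)."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        triad = queue.popleft()
        if triad == goal:
            return steps[triad]
        for nxt in plr_neighbors(triad):
            if nxt not in steps:
                steps[nxt] = steps[triad] + 1
                queue.append(nxt)
    return None


def voice_leading_size(a, b):
    """Smallest total semitone motion over all pairings of the voices of a with b."""
    best = None
    for perm in itertools.permutations(triad_pcs(b)):
        size = sum(min((y - x) % 12, (x - y) % 12) for x, y in zip(triad_pcs(a), perm))
        if best is None or size < best:
            best = size
    return best


C_MAJOR, F_MAJOR, F_MINOR = (0, MAJOR), (5, MAJOR), (5, MINOR)
for target in (F_MAJOR, F_MINOR):
    print(target, tonnetz_distance(C_MAJOR, target), voice_leading_size(C_MAJOR, target))
# (5, 'maj') 2 3
# (5, 'min') 3 2
```

The two measures rank F major and F minor in opposite orders relative to C major.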
One way to put the point is that while adjacencies on the Tonnetz reflect voice-leading facts, other relationships do not. As Cohn has emphasized, two major or
minor triads share an edge if they can be linked by parsimonious voice leading in
which a single voice moves by one or two semitones. Thus, if we are interested in
this particular kind of voice leading, then the Tonnetz provides an accurate and useful
model. However, there is no analogous characterization of larger distances in the
space. In other words, we do not get a recognizable notion of voice-leading distance
by decomposing voice leadings into sequences of parsimonious moves: as we have
seen, (F, A, C)→(E, G, C) can be decomposed into two parsimonious moves, while it
takes three to represent (F, A♭, C)→(E, G, C); yet intuitively the first voice leading
should be larger than the second. The deep issue here is that it is problematic to assert
that parsimonious voice leadings are always smaller than non-parsimonious voice leadings: by asserting that (C, E, A)→(C, E, G) is smaller than (C, F, A♭)→(C, E,
G), the theorist runs afoul of what Tymoczko calls the "distribution constraint," known
to mathematicians as the submajorization partial order.¹¹ Tymoczko argues that
violations of the distribution constraint invariably produce distance measures that
violate our intuitions about voice leading; the problem with larger distances on the
Tonnetz would seem to illustrate this more general claim.
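The distribution constraint can be stated as a check on the displacement vectors of two voice leadings. In this Python sketch (helper names hypothetical), one voice leading is weakly submajorized by another if, after sorting the absolute displacements in decreasing order, each running total is no larger than the other's. The example compares the two voice leadings from the paragraph above; the taxicab, Euclidean, and maximum norms all respect the resulting order, whereas counting parsimonious Tonnetz moves ranks them the other way.

```python
import math


def weakly_submajorized(x, y):
    """True if the absolute displacements of x are weakly submajorized by those of y.

    Sort both in decreasing order; every running total for x must be <= the
    corresponding running total for y.
    """
    if len(x) != len(y):
        raise ValueError("voice leadings must have the same number of voices")
    xs = sorted((abs(v) for v in x), reverse=True)
    ys = sorted((abs(v) for v in y), reverse=True)
    total_x = total_y = 0
    for a, b in zip(xs, ys):
        total_x += a
        total_y += b
        if total_x > total_y:
            return False
    return True


def lp_size(v, p):
    """Size of a displacement vector under the Lp norm (p = math.inf gives the maximum)."""
    if p == math.inf:
        return max(abs(a) for a in v)
    return sum(abs(a) ** p for a in v) ** (1 / p)


PARSIMONIOUS = [0, 0, -2]  # (C, E, A) -> (C, E, G): one voice moves two semitones
SPREAD = [0, -1, -1]       # (C, F, Ab) -> (C, E, G): two voices move one semitone each

print(weakly_submajorized(SPREAD, PARSIMONIOUS))  # True
print(weakly_submajorized(PARSIMONIOUS, SPREAD))  # False
for p in (1, 2, math.inf):
    print(p, lp_size(SPREAD, p), lp_size(PARSIMONIOUS, p))
```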
Nevertheless, the fact remains that the two kinds of distance are roughly consistent:
for major and minor triads, the correlation between Tonnetz distance and voice-leading distance is a reasonably high .79.¹² Furthermore, since Tymoczko's
distribution constraint is not intuitively obvious, unwary theorists might well think
that they could consistently declare the parsimonious voice leading (C, E, G)→(C,
E, A) to be smaller than the non-parsimonious (C, E, G)→(C♯, E, G♯). (Indeed, the
very meaning of the term "parsimonious" suggests that some theorists have in fact
done so.) Consequently, Tonnetz distances might well appear, at first or even second
blush, to reflect some reasonable notion of voice-leading distance; and this in turn
could lead the theorist to conclude that the Tonnetz provides a generally applicable
tool for investigating triadic voice leading. I have argued that we should resist this
conclusion: if we use the Tonnetz to model chromatic music, then Schubert's major-third juxtapositions will seem very different from his habit of interposing F minor
between F major and C major, since the first can be readily explained using the
Tonnetz whereas the second cannot.¹³ The danger, therefore, is that we might find
ourselves drawing unnecessary distinctions between these two cases, particularly if
we mistakenly assume the Tonnetz is a fully faithful model of voice-leading
relationships.

¹¹ See Tymoczko 2006, and Hall and Tymoczko 2007. Metrics that violate the distribution
constraint have counterintuitive consequences, such as preferring crossed voice leadings to
their uncrossed alternatives. Here, the claim that A minor is closer to C major than F minor is
leads to the F minor/F major problem discussed in Figure 10.
¹² Here I use the L1, or taxicab, metric. The correlation between Tonnetz distances and the
number of common tones is an even higher .9.
either {0}, {0, 4}, or {0, 4, 8}, and so on. Figure 11 shows the location of the subsets
of the n-note perfectly even chord, as they appear in the orbifold representing three-note set classes, for values of n ranging from 1 to 6.¹⁷ Associated with each graph is one
of the six Fourier components. For any three-note set class, the magnitude of its nth
Fourier component is a decreasing function of the distance to the nearest of these
marked points: for instance, the magnitude of the third Fourier component (FC3)
decreases the farther one is from the nearest of {0}, {0, 4}, and {0, 4, 8}. Thus,
chords in the shaded region of Figure 12 will tend to have a relatively large FC3,
while those in the unshaded region will have a smaller FC3. Figure 13 shows that this
relationship is very nearly linear for twelve-tone equal-tempered trichords.
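The relationship between FC3 and voice-leading distance can be explored numerically. This Python sketch computes, for every transpositional class of twelve-tone equal-tempered trichords, the FC3 magnitude and the Euclidean size of the smallest voice leading to the nearest three-voice multiset drawn from {0, 4, 8}, then reports their Pearson correlation. It assumes the distance is minimized over voice pairings, per-voice octave shifts of at most one octave, and a continuous transposition (removed by subtracting the mean displacement); these choices and the helper names are assumptions of the sketch, so its numbers need not match the figures reported here exactly.

```python
import cmath
import itertools
import math

EDO = 12
# Every three-voice multiset drawn from {0, 4, 8}: (0, 0, 0), (0, 0, 4), ..., (0, 4, 8), ...
PROTOTYPES = list(itertools.combinations_with_replacement((0, 4, 8), 3))


def fourier_magnitude(pcs, k, edo=EDO):
    """Magnitude of the k-th DFT component of a pitch-class multiset."""
    return abs(sum(cmath.exp(-2j * math.pi * k * p / edo) for p in pcs))


def set_class_distance(x, y, edo=EDO):
    """Euclidean size of the smallest voice leading from x to a transposition of y.

    Assumption: minimize over voice pairings, per-voice octave shifts of at
    most one octave, and a continuous transposition (removed by subtracting
    the mean displacement, i.e. projecting onto the sum-zero hyperplane).
    """
    best = math.inf
    for perm in itertools.permutations(y):
        # Displacement of each voice, reduced to the range [-edo/2, edo/2).
        base = [((b - a + edo / 2) % edo) - edo / 2 for a, b in zip(x, perm)]
        for shifts in itertools.product((-edo, 0, edo), repeat=len(x)):
            d = [v + s for v, s in zip(base, shifts)]
            mean = sum(d) / len(d)
            best = min(best, math.sqrt(sum((v - mean) ** 2 for v in d)))
    return best


def tn_classes(card, edo=EDO):
    """One representative (the least transposition, as a sorted tuple) per Tn-class."""
    reps = set()
    for combo in itertools.combinations(range(edo), card):
        reps.add(min(tuple(sorted((p - t) % edo for p in combo)) for t in range(edo)))
    return sorted(reps)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)


trichords = tn_classes(3)
fc3 = [fourier_magnitude(t, 3) for t in trichords]
dist = [min(set_class_distance(t, p) for p in PROTOTYPES) for t in trichords]
print(len(trichords))  # 19 transpositional set classes
print(round(pearson(dist, fc3), 3))  # strongly negative
```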
Fig. 11. The magnitude of a set class's nth Fourier component is approximately linearly related
to the size of the minimal voice leading to the nearest subset of the perfectly even n-note chord,
shown here as dark spheres.
¹⁷ See Callender 2004; Tymoczko 2006; and Callender, Quinn, and Tymoczko 2008.
Fig. 12. Chords in the shaded region will have a large FC3 component, since they are near
subsets of {0, 4, 8}. Those in the unshaded region will have a smaller FC3 component.
[Figure 13 image: scatter plot of FC3 magnitude against voice-leading distance, with points labeled by trichord type (e.g. 000, 012, 014, 037, 048) and the fitted line y = −1.38x + 3.16.]
Fig. 13. For trichords, the equation FC3 = −1.38VL + 3.16 relates the third Fourier
component to the Euclidean size of the minimal voice leading to the nearest subset of {0, 4, 8}.
Table 1. Correlations between voice-leading distances and Fourier magnitudes.
              FC1    FC2    FC3    FC4    FC5    FC6
Dyads         -.97   -.96   -.97   -1     -.97   -1*
Trichords     -.98   -.97   -.97   -.98   -.98   -1*
Tetrachords   -.96   -.96   -.95   -.98   -.96   -1*
Pentachords   -.96   -.96   -.95   -.98   -.96   -1*
Hexachords    -.96   -.96   -.95   -.96   -.96   -1*
Septachords   -.96   -.96   -.96   -.97   -.96   -1*
Octachords    -.96   -.96   -.95   -.98   -.96   -1*
Nonachords    -.96   -.96   -.96   -.98   -.96   -1*
Decachords    -.96   -.96   -.96   -.98   -.96   -1*
leading size in very finely quantized pitch-class space. Since 48-tone equal
temperament is so finely quantized, these numbers are approximately valid for
continuous, unquantized pitch-class space.¹⁸
Table 2. Correlations between voice-leading distances and Fourier magnitudes in 48-tone equal
temperament.
              FC1
Trichords     -.99
Tetrachords   -.97
Pentachords   -.97
Hexachords    -.96
Explaining these correlations, though not very difficult, is beyond the scope of this
paper. From our perspective, the important question is whether we should measure
chord quality using the Fourier transform or voice leading.¹⁹ In particular, the issue is
whether the Fourier components model the musical intuitions we want to model: as
we have seen, the Fourier transform requires us to measure a chord's harmonic
quality in terms of its distance from all the subsets of the perfectly even n-note
chord. But we might sometimes wish to employ a different set of harmonic
prototypes. For instance, Figure 14 uses a chord's distance from the augmented triad
to measure the trichordal set classes' "augmentedness." Unlike Fourier analysis, this
purely voice-leading-based method does not consider the tripled unison or doubled
major third to be particularly augmented-like; hence, set classes like {0, 1, 4} do
not score particularly highly on this index of augmentedness. This example
dramatizes the fact that, when using voice leading, we are free to choose any set of
harmonic prototypes, rather than accepting those the Fourier transform imposes on us.
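The contrast shows up in a two-chord example. This Python sketch measures "augmentedness" purely as Euclidean voice-leading distance from the augmented triad {0, 4, 8}, minimizing over voice pairings, per-voice octave shifts of at most one octave, and a continuous transposition (assumptions of the sketch; helper names are hypothetical). The FC3 magnitudes of {0, 1, 4} and {0, 3, 7} are identical (√5 in both cases), yet {0, 3, 7} lies much closer to the augmented triad than {0, 1, 4} does, because {0, 1, 4} owes its large FC3 to its proximity to the doubled major third {0, 0, 4}.

```python
import cmath
import itertools
import math

EDO = 12
AUGMENTED = (0, 4, 8)


def fourier_magnitude(pcs, k, edo=EDO):
    """Magnitude of the k-th DFT component of a pitch-class multiset."""
    return abs(sum(cmath.exp(-2j * math.pi * k * p / edo) for p in pcs))


def set_class_distance(x, y, edo=EDO):
    """Euclidean size of the smallest voice leading from x to a transposition of y.

    Assumption: minimize over voice pairings, per-voice octave shifts of at
    most one octave, and a continuous transposition (removed by subtracting
    the mean displacement).
    """
    best = math.inf
    for perm in itertools.permutations(y):
        base = [((b - a + edo / 2) % edo) - edo / 2 for a, b in zip(x, perm)]
        for shifts in itertools.product((-edo, 0, edo), repeat=len(x)):
            d = [v + s for v, s in zip(base, shifts)]
            mean = sum(d) / len(d)
            best = min(best, math.sqrt(sum((v - mean) ** 2 for v in d)))
    return best


for chord in [(0, 1, 4), (0, 3, 7)]:
    fc3 = fourier_magnitude(chord, 3)
    augmentedness_distance = set_class_distance(chord, AUGMENTED)
    print(chord, round(fc3, 3), round(augmentedness_distance, 3))
# (0, 1, 4) 2.236 2.944
# (0, 3, 7) 2.236 0.816
```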
Fig. 14. The mathematics of the Fourier transform requires that we conceive of chord quality
in terms of the distance to all subsets of the perfectly even n-note chord (left). Purely voice-leading-based conceptions instead allow us to choose our harmonic prototypes freely (right).
Thus we can use voice leading to model a chord's augmentedness in terms of its distance from
the augmented triad, but not the tripled unison {0, 0, 0} or the doubled major third {0, 0, 4}.
¹⁸ It would be possible, though beyond the scope of this paper, to calculate this correlation
analytically. It is also possible to use statistical methods for higher-cardinality chords. A
large collection of randomly generated 24- and 100-tone chords in continuous space
produced correlations of .95 and .94, respectively.
¹⁹ See Robinson 2006 and Straus 2007 for related discussion.
5 Conclusion
The approximate consistency between our three models is in one sense good news:
since they are closely related, it may not matter much (at least in practical terms)
which we choose. We can perhaps use a tuning lattice such as the Tonnetz to
represent voice-leading relationships, as long as we are interested in gross contrasts
("near" vs. "far") rather than fine quantitative differences ("3 steps away" vs. "2 steps
away"). Similarly, we can perhaps use voice-leading spaces to approximate the
results of the Fourier analysis, as long as we are interested in modeling generic
harmonic intuitions ("very fifthy" vs. "not very fifthy") rather than exploring very
fine differences among Fourier magnitudes.
However, if we want to be more principled, then we need to be more careful. The
resemblances among our models mean that it might be possible to inadvertently use
one sort of structure to discuss properties that are more directly modeled by another.
And indeed, the recent history of music theory displays some fascinating (and very
fruitful) imprecision about this issue. It is striking that Douthett and Steinbach, who
first described several of the lattices found in the center of the voice-leading
orbifolds (including the one in Figure 6), explicitly presented their work as generalizing the
familiar Tonnetz.²⁰ Their lattices, rather than depicting parsimonious voice leading
among major and minor triads, displayed single-semitone voice leadings among a
wide range of sonorities; and as a result of this seemingly small difference, they
created models in which all distances can be interpreted as representing voice-leading
size. However, this difference only became apparent after it was understood how to
embed their discrete structures in the continuous geometrical figures described at the
beginning of this paper. Thus the continuous voice-leading spaces evolved out of the
Tonnetz, by way of Douthett and Steinbach's discrete lattices, even though the
structures now appear to be fundamentally different. Related points could be made
about Quinn's quality space, whose connection to the voice-leading spaces took
several years (and the work of several authors) to clarify.
There is, of course, nothing wrong with this: knowledge progresses slowly and
fitfully. But the preceding investigation suggests that we may need to be precise
about which model is appropriate for which music-theoretical purpose. I have tried to
show that the issues here are complicated and subtle: the mere fact that tonal pieces
modulate by fifth does not, for example, require us to use a tuning lattice in which
fifths are smaller than semitones. Likewise, there may be close connections between
voice-leading spaces and the Fourier transform, even though the latter associates Z-related
chords while the former does not. The present paper can thus be considered a
down payment toward a more extended inquiry, one that attempts to determine the
relative strengths and weaknesses of our three similar-yet-different conceptions of
musical distance.
²⁰ The same is true of Tymoczko 2004, which uses the term "generalized Tonnetz" to describe
another set of lattices appearing in the voice-leading spaces.
References
Callender, Clifton. 2004. Continuous Transformations. Music Theory Online 10.3.
Callender, Clifton. 2007. Continuous Harmonic Spaces. Unpublished.
Callender, Clifton, Quinn, Ian, and Tymoczko, Dmitri. 2008. Generalized Voice-Leading Spaces. Science 320: 346-348.
Cohn, Richard. 1991. Properties and Generability of Transpositionally Invariant
Sets. Journal of Music Theory 35: 1-32.
Cohn, Richard. 1996. Maximally Smooth Cycles, Hexatonic Systems, and the Analysis of
Late-Romantic Triadic Progressions. Music Analysis 15.1: 9-40.
Cohn, Richard. 1997. Neo-Riemannian Operations, Parsimonious Trichords, and Their
Tonnetz Representations. Journal of Music Theory 41.1: 1-66.
Cohn, Richard. 1998. Introduction to Neo-Riemannian Theory: A Survey and a Historical
Perspective. Journal of Music Theory 42.2: 167-180.
Cohn, Richard. 1999. As Wonderful as Star Clusters: Instruments for Gazing at Tonality in
Schubert. Nineteenth-Century Music 22.3: 213-232.
Douthett, Jack and Steinbach, Peter. 1998. Parsimonious Graphs: a Study in
Parsimony, Contextual Transformations, and Modes of Limited Transposition.
Journal of Music Theory 42.2: 241-263.
Hall, Rachel and Tymoczko, Dmitri. 2007. Poverty and polyphony: a connection
between music and economics. In Bridges: Mathematical Connections in Art,
Music, and Science. R. Sarhanghi, ed., Donostia, Spain.
Hoffman, Justin. 2007. On Pitch-class set cartography. Unpublished.
Lewin, David. 1959. Re: Intervallic Relations between Two Collections of Notes.
Journal of Music Theory 3: 298-301.
Lewin, David. 2001. Special Cases of the Interval Function between Pitch-Class Sets X and
Y. Journal of Music Theory 45: 1-29.
Quinn, Ian. 2006. General Equal-Tempered Harmony (Introduction and Part I).
Perspectives of New Music 44.2: 114-158.
Quinn, Ian. 2007. General Equal-Tempered Harmony (Parts II and III). Perspectives of
New Music 45.1: 4-63.
Robinson, Thomas. 2006. The End of Similarity? Semitonal Offset as Similarity
Measure. Paper presented at the annual meeting of the Music Theory
Society of New York State, Saratoga Springs, NY.
Straus, Joseph. 2003. Uniformity, Balance, and Smoothness in Atonal Voice
Leading. Music Theory Spectrum 25.2: 305-352.
Straus, Joseph. 2007. Voice Leading in Set-Class Space. Journal of Music Theory 49.1: 45-108.
Tymoczko, Dmitri. 2004. Scale Networks in Debussy. Journal of Music Theory
48.2: 215-292.
Tymoczko, Dmitri. 2006. The Geometry of Musical Chords. Science 313: 72-74.
Tymoczko, Dmitri. 2008. Scale Theory, Serial Theory, and Voice Leading. Music Analysis
27.1: 1-49.