Thesis MacRitchie
Jennifer Macritchie
The University of Sheffield
All content following this page was uploaded by Jennifer Macritchie on 29 July 2015.
Jennifer MacRitchie
16th February 2011
Department of Electronics
& Electrical Engineering
University of Glasgow
© Jennifer MacRitchie, 2010
Abstract
The differences between a musical score and an instance of that music in a
performance communicate a performer’s view of the information contained in that
score.
The main hypothesis in this thesis is that by measuring quantifiable parameters
such as tempo, dynamics and motion from live performance, a performer’s
interpretation of musical structure can be detected. This will be tested for pieces
for which the structure is explicit and obvious, and then used to discover musical
structure from looking at patterns of aural and visual performance parameters in
performances of more ambiguously structured pieces.
This thesis is in two strands. The first part covers the acquisition of multi-
modal parameters in piano performance. This will explore current technologies
in acquiring MIDI information such as accurate onset timings and key velocities
as well as motion tracking systems for measuring general body movements. A
new cheap, portable and accurate system for tracking the intricacies of pianists’
finger movement is described as well as methods and tools available for analysis
and visualisation of musical data. The second strand of this thesis will explore
uses of these capture systems in empirically measuring performance parameters
to elucidate musical structure. Two experiments follow which test the hypothe-
sis of detecting musical structure from parameters such as tempo, dynamics and
movement, before using these patterns as a basis for discovering structure in per-
formances of the finale of Chopin’s B flat minor sonata.
Body movement is discovered as an indicator of phrasing boundaries, which
when combined with the measured aural parameters provides interpretations of
the performed music. Phrasing boundaries are identified correctly for the control
piece (Chopin’s Prelude in A major Op.28, No.7) and consequently for the first
test piece (Chopin’s Prelude in B minor Op.28 No.6). The subsequent experiment
identifies performers’ style of phrase endings through performances of the control
piece and tests them against patterns found in the second test piece (Chopin’s B
Flat minor Sonata Finale). Five out of the six performers confirm the musicological
hypothesis that bar 5 is not the entry of a new theme but the continuation of the
theme beginning in bar 1.
Acknowledgements
I would like to thank my supervisors Dr Nick Bailey, Prof Graham Hair and
Dr John Williamson for their invaluable assistance, guidance and support. Spe-
cial thanks also go to Stuart Pullinger and Douglas McGilvray, whose specially
designed tools and software were pivotal in conducting this research and whose
help in the design of these experiments was crucial. Special thanks also to Bry-
ony Buck, who provided an opportunity for us both to conduct some truly inter-
disciplinary research and also to Tom O’Hara for his technical assistance over the
years. I would also like to thank my colleagues at the Centre for Music Technol-
ogy, Bill Evans, Ben Hillman and Graham Percival for their help. This research has
been EPSRC funded.
To Annie, Doug and Elisa for their endless coffees and support as well as a dif-
ferent perspective on musical analysis, to Lynsey and Maria for lunches of reprieve
and finally to my close friends outwith the department and my family without
whose support, none of this would have been possible.
“I can do all things through Christ who strengthens me.” Philippians 4:13
“The function of music is to release us from the tyranny of conscious thought.”
Sir Thomas Beecham (1879-1961)
Contents
1 Introduction 15
1.1 The performer as analyst . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.2 Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.3 Overview of capturing and storage technologies . . . . . . . . . . . 21
1.4 Summary of chapters . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2 Performance Analysis 25
2.1 Music Performance Theory . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Performance Analysis Studies . . . . . . . . . . . . . . . . . . . . . . 28
2.3 Music and Movement . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Research into Gesture . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.1 Storing Musical Data . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.2 Storing Gestural Information . . . . . . . . . . . . . . . . . . 50
3.5 Visualising Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5.1 Performance Worm . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5.2 Sonic Visualiser . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5.3 Summarising Video . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.4 Motion History Key-frame Displays . . . . . . . . . . . . . . 53
3.5.5 Motiongrams . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.6 Visualisation with the score . . . . . . . . . . . . . . . . . . . 54
4 FingerDance 57
4.1 System Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 Marker Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3 Heuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 Occlusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.5 3D estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.6 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.7 System Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.8 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7 Musical Stimuli 86
7.1 Chopin’s Prelude in A major op.28 No.7 . . . . . . . . . . . . . . . . 86
7.2 Chopin’s Prelude in B minor op.28 No.6 . . . . . . . . . . . . . . . . 89
7.3 Chopin’s B Flat Minor Sonata op.35 Finale Movement . . . . . . . . 92
10 Discussion 191
List of Figures
7.3 Analysis of Chopin’s B minor Prelude Op.28 No.6 . . . . . . . . . . 90
7.4 Bisesi and Parncutt’s Accent Analysis of Chopin’s B minor Prelude
op.28 No.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.5 Score of Chopin’s B flat minor Sonata . . . . . . . . . . . . . . . . . 93
8.30 Various Raw Marker Data Plotted Against Phrase Boundaries for
Performer 1, Prelude 6 . . . . . . . . . . . . . . . . . . . . . . . . . . 121
8.31 Various Raw Marker Data Plotted Against Phrase Boundaries for
Performer 2, Prelude 6 . . . . . . . . . . . . . . . . . . . . . . . . . . 124
8.32 Various Raw Marker Data Plotted Against Phrase Boundaries for
Performer 3, Prelude 6 . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.33 Motion, Tempo and Dynamics for Performer 1, Prelude 7 . . . . . . 127
8.34 Motion, Tempo and Dynamics for Performer 2, Prelude 7 . . . . . . 128
8.35 Motion, Tempo and Dynamics for Performer 3, Prelude 7 . . . . . . 129
8.36 Box-plots for all Nine Performers, Prelude 7 . . . . . . . . . . . . . . 130
8.37 Scatter Plot for Performer 1, Prelude 7 . . . . . . . . . . . . . . . . . 131
8.38 Annotated Score for Performer 1, Prelude 7 . . . . . . . . . . . . . . 132
8.39 Scatter Plot for Performer 2, Prelude 7 . . . . . . . . . . . . . . . . . 133
8.40 Annotated Score for Performer 2, Prelude 7 . . . . . . . . . . . . . . 134
8.41 Scatter Plot for Performer 3, Prelude 7 . . . . . . . . . . . . . . . . . 135
8.42 Annotated Score for Performer 3, Prelude 7 . . . . . . . . . . . . . . 135
8.43 Motion, Tempo and Dynamics for Performer 1, Prelude 6 . . . . . . 136
8.44 Motion, Tempo and Dynamics for Performer 2, Prelude 6 . . . . . . 137
8.45 Motion, Tempo and Dynamics for Performer 3, Prelude 6 . . . . . . 137
8.46 Box-plots for all Nine Performers, Prelude 6 . . . . . . . . . . . . . . 139
8.47 Scatter Plot for Performer 1, Prelude 6 . . . . . . . . . . . . . . . . . 140
8.48 Annotated Score for Performer 1, Prelude 6 . . . . . . . . . . . . . . 141
8.49 Scatter Plot for Performer 2, Prelude 6 . . . . . . . . . . . . . . . . . 142
8.50 Annotated Score for Performer 2, Prelude 6 . . . . . . . . . . . . . . 143
8.51 Scatter Plot for Performer 3, Prelude 6 . . . . . . . . . . . . . . . . . 144
8.52 Annotated Score for Performer 3, Prelude 6 . . . . . . . . . . . . . . 145
9.1 Wrist Motion, Tempo and Dynamics for Martin Jones, Prelude in A
Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
9.2 Thumb Motion, Tempo and Dynamics for Martin Jones, Prelude in
A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
9.3 Database Results for Martin Jones, Prelude in A Major, Page 1 . . . 157
9.4 Database Results for Martin Jones, Prelude in A Major, Page 2 . . . 158
9.5 Wrist Motion, Tempo and Dynamics for Martin Jones performing
the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
9.6 Thumb Motion, Tempo and Dynamics for Martin Jones performing
the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
9.7 Database Results for Martin Jones, B Flat minor Sonata finale Page 1 161
9.8 Database Results for Martin Jones, B Flat minor Sonata finale Page 2 162
9.9 Wrist Motion, Tempo and Dynamics for Jessica Chan, Prelude in
A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
9.10 Thumb Motion, Tempo and Dynamics for Jessica Chan, Prelude in
A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.11 Wrist Motion, Tempo and Dynamics for Jessica Chan, performing
the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
9.12 Thumb Motion, Tempo and Dynamics for Jessica Chan, performing
the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
9.13 Database Results for Jessica Chan, B Flat minor Sonata finale Page 1 167
9.14 Database Results for Jessica Chan, B Flat minor Sonata finale Page 2 168
9.15 Wrist Motion, Tempo and Dynamics for Lauren Hibberd, Prelude in
A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
9.16 Thumb Motion, Tempo and Dynamics for Lauren Hibberd, Prelude
in A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
9.17 Wrist Motion, Tempo and Dynamics for Lauren Hibberd, perform-
ing the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
9.18 Thumb Motion, Tempo and Dynamics for Lauren Hibberd, perform-
ing the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
9.19 Database Results for Lauren Hibberd, B Flat minor Sonata finale
Page 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
9.20 Database Results for Lauren Hibberd, B Flat minor Sonata finale
Page 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
9.21 Box-plots for all Six Performers, performing Chopin’s Prelude in A
major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
9.22 Box-plots for all Six Performers, performing Chopin’s B flat minor
finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
9.23 Scatter Plot for Martin Jones performing the Chopin finale . . . . . 179
9.24 Annotated Score for Martin Jones performing the Chopin finale . . 180
9.25 Scatter Plot for Jessica Chan performing the Chopin finale . . . . . 181
9.26 Annotated Score for Jessica Chan performing the Chopin finale . . 182
9.27 Scatter Plot for Lauren Hibberd performing the Chopin finale . . . 183
9.28 Annotated Score for Lauren Hibberd performing the Chopin finale 184
9.29 Finger Curvature, Tempo and Dynamics for Jessica Chan, perform-
ing the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
9.30 Finger Curvature, Tempo and Dynamics for Martin Jones, perform-
ing the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
9.31 Finger Curvature, Tempo and Dynamics for Lauren Hibberd, per-
forming the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . 187
11.1 Loadings for the First Six Principal Components, Performer 1, Pre-
lude 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
11.2 Loadings for the First Six Principal Components, Performer 2, Pre-
lude 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
11.3 Loadings for the First Six Principal Components, Performer 3, Pre-
lude 7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
11.4 Loadings for the First Six Principal Components, Performer 1, Pre-
lude 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
11.5 Loadings for the First Six Principal Components, Performer 2, Pre-
lude 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
11.6 Loadings for the First Six Principal Components, Performer 3, Pre-
lude 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
11.7 Weighted Principal Components for Performer 4, Prelude 7 . . . . . 219
11.8 Weighted Principal Components for Performer 5, Prelude 7 . . . . . 220
11.9 Weighted Principal Components for Performer 6, Prelude 7 . . . . . 220
11.10 Weighted Principal Components for Performer 7, Prelude 7 . . . . . 221
11.11 Weighted Principal Components for Performer 8, Prelude 7 . . . . . 221
11.12 Weighted Principal Components for Performer 9, Prelude 7 . . . . . 222
11.13 Weighted Principal Components for Performer 4, Prelude 6 . . . . . 222
11.14 Weighted Principal Components for Performer 5, Prelude 6 . . . . . 223
11.15 Weighted Principal Components for Performer 6, Prelude 6 . . . . . 223
11.16 Weighted Principal Components for Performer 7, Prelude 6 . . . . . 224
11.17 Weighted Principal Components for Performer 8, Prelude 6 . . . . . 224
11.18 Weighted Principal Components for Performer 9, Prelude 6 . . . . . 225
11.19 Motion, Tempo and Dynamics for Performer 4, Prelude 7 . . . . . . 226
11.20 Motion, Tempo and Dynamics for Performer 5, Prelude 7 . . . . . . 227
11.21 Motion, Tempo and Dynamics for Performer 6, Prelude 7 . . . . . . 227
11.22 Motion, Tempo and Dynamics for Performer 7, Prelude 7 . . . . . . 228
11.23 Motion, Tempo and Dynamics for Performer 8, Prelude 7 . . . . . . 228
11.24 Motion, Tempo and Dynamics for Performer 9, Prelude 7 . . . . . . 229
11.25 Motion, Tempo and Dynamics for Performer 4, Prelude 6 . . . . . . 229
11.26 Motion, Tempo and Dynamics for Performer 5, Prelude 6 . . . . . . 230
11.27 Motion, Tempo and Dynamics for Performer 6, Prelude 6 . . . . . . 230
11.28 Motion, Tempo and Dynamics for Performer 7, Prelude 6 . . . . . . 231
11.29 Motion, Tempo and Dynamics for Performer 8, Prelude 6 . . . . . . 231
11.30 Motion, Tempo and Dynamics for Performer 9, Prelude 6 . . . . . . 232
11.31 Wrist Motion, Tempo and Dynamics for Carlisle Frank, Prelude in
A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
11.32 Thumb Motion, Tempo and Dynamics for Carlisle Frank, Prelude in
A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
11.33 Wrist Motion, Tempo and Dynamics for Carlisle Frank, performing
the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
11.34 Thumb Motion, Tempo and Dynamics for Carlisle Frank, performing
the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
11.35 Wrist Motion, Tempo and Dynamics for Fali Pavri, Prelude in A Major 237
11.36 Thumb Motion, Tempo and Dynamics for Fali Pavri, Prelude in
A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
11.37 Wrist Motion, Tempo and Dynamics for Fali Pavri, performing the
Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
11.38 Thumb Motion, Tempo and Dynamics for Fali Pavri, performing the
Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
11.39 Wrist Motion, Tempo and Dynamics for Simon Coverdale, Prelude
in A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
11.40 Thumb Motion, Tempo and Dynamics for Simon Coverdale, Prelude
in A Major . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
11.41 Wrist Motion, Tempo and Dynamics for Simon Coverdale, performing
the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
11.42 Thumb Motion, Tempo and Dynamics for Simon Coverdale, performing
the Chopin finale . . . . . . . . . . . . . . . . . . . . . . . . 244
11.43 Database Results for Carlisle Frank, B Flat minor Sonata finale Page 1 246
11.44 Database Results for Carlisle Frank, B Flat minor Sonata finale Page 2 247
11.45 Database Results for Fali Pavri, B Flat minor Sonata finale Page 1 . 248
11.46 Database Results for Fali Pavri, B Flat minor Sonata finale Page 2 . 249
11.47 Database Results for Simon Coverdale, B Flat minor Sonata finale
Page 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
11.48 Database Results for Simon Coverdale, B Flat minor Sonata finale
Page 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
List of Tables
List of Algorithms
Chapter 1
Introduction
about what is to come in the following music. The opening motif as seen in Fig-
ure 1.1 is repeated and changed several times throughout the first movement. This
is information that can be seen by just looking at the score. However, there ex-
ist many different performances of this symphony by several different orchestras.
Conductors can spend many hours in rehearsals focusing on the opening bars,
changing the stress of the rhythm, the tempo, the dynamics, the balance of instru-
ments and many other parameters as the performance of these opening bars sets
the tone for the entire performance. This suggests that performance nuances carry
certain information about the music being performed. However, because performers
can use many different expressive features to convey essentially the same structural
feature, the relationship between performance and analysis of a piece of music
is not always straightforward. Referring back to performances of the Beethoven
symphony, these opening few bars can vary considerably across orchestras
and conductors depending on their own personal interpretation.
So by measuring the quantifiable differences between score and performance
in the use of parameters such as tempo and dynamics, the structural information
being communicated through the performance could essentially be measured. The
question then arises as to whether audio recordings can be used to measure these
communicated features accurately as opposed to a live performance.
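As a minimal sketch of what measuring the difference between score and performance can look like for tempo, the Python fragment below derives a local tempo curve from performed onset times and the notated beat positions of the same notes. The function names, the input layout and the choice of beats per minute are assumptions made for illustration; they are not taken from the capture systems described in this thesis.

```python
def local_tempo(score_beats, onset_times):
    """Return a list of (beat, bpm) pairs, one per pair of consecutive notes.

    score_beats -- notated position of each note, in beats (assumed input)
    onset_times -- performed onset of the same notes, in seconds (assumed input)
    """
    if len(score_beats) != len(onset_times):
        raise ValueError("score and performance must describe the same notes")
    tempi = []
    for i in range(1, len(score_beats)):
        beat_span = score_beats[i] - score_beats[i - 1]
        time_span = onset_times[i] - onset_times[i - 1]
        if beat_span <= 0 or time_span <= 0:
            raise ValueError("onsets must be strictly increasing")
        # Beats covered per second, scaled to beats per minute.
        tempi.append((score_beats[i - 1], 60.0 * beat_span / time_span))
    return tempi


def tempo_deviation(tempi):
    """Express each local tempo as a ratio of the mean local tempo."""
    mean_bpm = sum(bpm for _, bpm in tempi) / len(tempi)
    return [(beat, bpm / mean_bpm) for beat, bpm in tempi]


if __name__ == "__main__":
    # Four notes one beat apart; the last inter-onset interval is stretched.
    curve = local_tempo([0, 1, 2, 3], [0.0, 0.5, 1.0, 2.0])
    for beat, ratio in tempo_deviation(curve):
        print(f"beat {beat}: {ratio:.2f} x mean tempo")
```

A ratio below 1 marks a passage played slower than the performance average. The same pattern could be applied to key velocities to obtain a dynamics curve, under the same assumptions about the input.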
Despite most music being listened to on MP3 players and iPods in recent times,
live performances of music regardless of genre are still widely in demand and well
attended. Reasons for this have been documented in a survey of listeners’ habits
with the results showing that audiences prefer live performances when possible.
The most popular answers were ’atmosphere’ or ’ambiance’ whilst in second place
was the response ’communicating with performers’ [120]. Research exploring the
visual element of music suggests that performers’ physical movements have an
impact on what is communicated to the audience [123]. Classical pianist Glenn
Gould was renowned for his strange posture and erratic movements during per-
formances, both in front of audiences and in more secluded environments, sug-
gesting that his gestures were not simply used for communicative purposes but
contained an entirely expressive purpose related to the music [38]. In other mu-
sical genres, for example in jazz performances, musicians’ movements are related
to a wide variety of musical causes such as ’groove’, classified as relating to the
beat of the music. The classification of meaningful movements, i.e. physical gestures,
and current research in the area are discussed in Chapter 2.3. Another
example of the link between music and gestures lies with Bobby McFerrin, now a
famous improvising beat-boxer, who uses his voice to emulate a number of different
instruments when he performs. Videos of his performances show movements that
resemble the strumming of a guitar, or finger movements on the microphone that
look like a trumpet player pressing
down valves. Recently at the World Science Festival 2009 as part of the Notes
and Neurons talks [9], he made the audience intuitively sing notes of a penta-
tonic scale by simply jumping at different points on the stage. This link between
movement and music is also explored in Chapter 2.3. Seeing as visual movements
can be important in conveying the meaning behind the music being performed,
when measuring quantifiable parameters to detect the performer’s interpretation,
motion is a factor which should be included, particularly when examining a live
performance.
1.1 The performer as analyst
In Western classical music, the performer provides the medium through which
the composer’s ideas can be conveyed to a live audience. A performance therefore
requires a demonstration of understanding of the piece by the performer. Whilst
learning new pieces, performers will refer to previous recordings as well as look-
ing at the score and various existing analyses, to determine the function of each
section and how best to convey what is going on in the piece. Throughout practice
sessions, this ’analysis’ of the piece by the performer will be refined by experi-
menting with different sounds and different uses of dynamics and tempi [104].
The idea that a performer acts as an analyst in this situation is echoed by Cook,
Lester and Barolsky [29, 71, 14].
Berry states that
so many ’correct’ or ’authentic’ performances can exist even though they may
be completely different from each other. Based on this, it becomes interesting to
look at places where performers agree in their interpretations of a given piece and
equally interesting to examine the places where performers disagree or diverge.
Performance traditions, or places where performers agree on certain aspects of
the music, can change over time. Bach is now played in a completely different fashion
from the way it was over a hundred years ago. Performance interpretations can change
completely from generation to generation despite the notated music remaining the
same. This is different for music that is not based on notated scores: some folk
music, for example, is passed down aurally, and although the structure remains the
same the notes and rhythms can be entirely different. For Western classical music, the score
becomes useful as a starting point for each performer as the differences between
interpretations and the notated score can be examined. Although it is not an en-
tirely explicit document, the score contains information on structure and arguably
emotive qualities of the music [119].
1.2 Aims
Concerning performers as analysts, the work discussed in this thesis focuses
on the main research question:
• Can musical phrasing structure be detected from multi-modal performance
parameters?
Aim 1: To design capture systems, storage and visualisation formats that allow
accurate and robust methods of recording live performances and display the
results in a form useful to musicological analysis
Aim 2: To determine whether structure can be deduced from the empirical anal-
ysis of multi-modal performance parameters
(a) Designing acquisition systems for piano performance that record as much in-
formation as possible from a performance, are relatively un-intrusive and pro-
vide comfortable surroundings for the performer so they can accurately recre-
ate a typical concert-setting performance.
(b) Using these different types of systems in experiments which analyse the rela-
tionship between body movement and phrasing structure. This will be achieved
by recording pianists performing Chopin preludes with differing structural
layouts.
(c) Analysing these gestural cues in conjunction with audible parameters such as
tempo and dynamics again in relation to the phrasing structure of the pieces.
that can be recorded in a performance and so to avoid an overload in data process-
ing, we must choose which data is necessary to record and how. This influences
which systems are used in the overall design.
1.4 Summary of chapters
To clarify how these two main areas of research will be addressed, this thesis
is divided into three main parts: Background, Developing multi-modal capture tools
and systems, and Experiments and Results.
Experiments and Results
Chapter 7 analyses the music being used as stimuli for the experiments in terms of
its phrasing structure. The music for the first experiment is chosen specifically
to test whether explicit structures can be detected through performance analysis,
and the second set of music tests whether structure can be discovered in more
ambiguous pieces.
Chapter 8 outlines the experiment for detecting musical structure. Results are
analysed in terms of relating body movement to musical structure and then look-
ing at multi-modal cues for phrasing boundaries.
Chapter 9 outlines the experiment for discovering musical structure. Results
are analysed in terms of low-level parameters like inter-onset intervals, keypress
durations, finger curvature and sound amplitude in relation to accents and phras-
ing.
Chapter 10 presents a discussion of results along with recommendations for
further work both in designing performance recording systems as well as analysing
performance data. The main conclusions of this work are presented in Chapter 11.
Part I
Chapter 2
Performance Analysis
Performance analysis techniques are used for a variety of different purposes. These
can be large scale studies of parameters, such as examining expressive timing
across several performances of a particular piece to make comments on the gen-
eral usage of tempo fluctuations for expressive purposes. Other studies involve
analysing particular interpretations of a composition to determine how one per-
former has created this interpretation by manipulating factors such as timing, dy-
namics, articulation and timbre.
This next section will cover uses of performance analysis by examining vari-
ous pieces of theoretical and empirical research, identifying the requirements and
considerations necessary for a system designed to discover musical structure from
performance data.
many of these points are.
In cases of ambiguity, the comparison of different performances is crucial in
order to identify not just the correspondences between points of change in differ-
ent performances but the extent and limits of the degrees of change. Identifying
the manipulation of performance nuances across several performances of the se-
lected Chopin pieces in Chapter 7 will provide the necessary tools to highlight the
communication of structural boundaries in more ambiguous compositions.
In these instances, using computers to complement musical analysis has been
the next logical step. These methods do not attempt to implement the processes of
a traditional music analyst, but are used to assist and in some cases, extend existing
analyses. Lindstedt’s work on computer-assisted analysis of the finale of Chopin’s
B Flat minor sonata [74], using the score-processing toolkit Humdrum [60], searched
for melodic and harmonic patterns in an attempt to clarify structural form. Lindstedt
considers formal analyses such as those by Rosen [106], Tuchowski, Kholopov
and Leichtentritt [122], which diverge widely in their views of the function of the
first four bars. Some place these bars as an introduction to a theme beginning at
bar 5, whilst in other readings the initial theme begins at bar 1. As the results
of the computer analysis disclose only a general indication of the form, and no
more detail than the formal analyses discussed previously, Lindstedt suggests that
a thorough analysis of the musical structure may be acquired through combining
the score analysis with performance analysis.
The proposed research aims to do just this, by comparing traditional analy-
ses to the suggested analyses provided by measurement of certain performance
parameters. Combinations of tempo, loudness and movement will supply a po-
tential segmentation of each piece performed, as is required as an initial step in
traditional analyses.
It is suggested [29, 71, 14] that just as musical analysis informs performance,
a performer acts as a musical analyst. The performer’s “analysis” occurs during
practice [104], where each part of the music is re-considered and re-shaped as the
performer’s appreciation of each cadence in the context of the whole composition
develops. This suggests that the analysis of performance information can empha-
sise higher-level compositional issues that may not be obvious through traditional
analysis methods. Of particular interest in performance analysis are the deviations or
differences between the notated score and the actual performances. Early research
found that performers did not reproduce the notations on the score mechanically
but that there was a deliberate manipulation of timing and dynamics added to
what was explicitly written [108]. These were found not to be completely unrelated
to the score but instead appeared to emphasise certain points. Todd [121] provided
theoretical support describing a model of expressive timing which linked expres-
sive devices such as rubato to key features of the musical structure e.g. cadences.
As well as structural information, scores can contain implied emotions or moods
as suggested by Thompson and Robitaille [119]. This research suggests that it is
not just deviations from the score that should be considered, as this is expected
from a human performance, but the similarities and differences between several
performances.
Performance in itself then does contain a mixture of structural information and
implied information about moods evident in the music. These parameters are not
entirely separable, just as musical parameters such as pitch, rhythm, timing etc.
should be considered as interacting and not entirely separable as Clark [26] states:
So in performance analysis, the context (being the score) must be examined when
considering the audio, the audio when considering the visual and the performer’s
views on structure when considering their analysed interpretation.
Palmer’s review on music performance research [86] expands on Kendall and
Carterette’s model of performance which encompasses the coding of the com-
poser’s ideas (the score), the recoding of these ideas by the performer (the interpre-
tation) and the decoding of these ideas by the audience. The score can represent
pitch and duration quite explicitly but information on structure, such as groupings
is only implied and instruction as to precise articulation is often virtually absent.
These ambiguities allow the performer a certain amount of interpretative freedom
and this interpretation includes the performer’s ideas on the musical composition.
The encoding part of this communication process is modelled for the performer
as the production of audio and visual cues originating from the notated
score. This can be seen in Figure 2.1. The performer uses movement to play
the instrument and produce these sounds. Feedback is used by the performer to
constantly monitor what is being produced in terms of audio and to an extent
visual content.
[Figure 2.1: diagram with the labels Score, Performer, Audio and Movement]
Parncutt believes that expression in piano performance can be explained by
immanent accents present in the score [90], resulting in performed accents such as
expressive differences in timing and dynamic stress. He labels certain accents as
belonging to time, such as grouping and metrical accents, and others as dependent
on pitch, e.g. melodic or harmonic accents, along with reductional accents, which
fall along the lines of Schenkerian reductions of the score. This lower level accent
structure is something that will be investigated after detecting higher level phrasing
structures or, as Parncutt defines them, grouping accents. Drake and Palmer
investigated the interaction and independence of these accents in the presence of other
accents [42]. Rhythmic and grouping accents remain constant whereas melodic
accents tend to change in the presence of other accents. These represent the low-
est level of a hierarchical structure [70]. Taking the hierarchical importance of the
phrase as a factor, the relationship between expressive timing and musical struc-
ture has been documented such that the amount of rubato used reflects the hier-
archical importance of the phrasing boundary. From this we expect that the more
important the boundary e.g. the most important being the end of the piece itself,
the larger the rubato will be. This phrase-final lengthening [121] is an example
of how mid-level parameters such as tempo can provide clues as to the structure
of the music. Establishing that theoretically, the score implies certain expressive-
ness by the performer, I aim to examine how we can use performance parameters
resulting from the expressive interpretation to locate or suggest structure.
different passages of interest. He says of the expressive performance parameters:
[27]. This demonstrates how structural context is extremely important when con-
sidering the different performance parameters across various performances.
Another example of examining in-depth a single interpretation of a piece is
the study of Martha Argerich’s distinct performance of Chopin’s E minor prelude
op.28 no.4 [109]. Senn studied the initial four bars of the piece, attempting to discover
which structural features in the score inspired Argerich’s particular interpretation.
A particular point of interest is the end of the first four-bar phrase: where
one would expect a traditional ritardando, Argerich instead produces a mid-bar
ritardando and then gains speed at the end of the bar. Senn explains this by treating
the last note not as the end of the first phrase but as the beginning of the
next phrase, hence the acceleration. This is one example of using performance data
in an effort to provide a segmentation of the score. However, while much infor-
mation can be gleaned from single interpretations, it is also necessary to examine
large numbers of performers to suggest patterns of timing or dynamics in relation
to structure.
An example of larger scale studies involving a number of performers comes
from Repp’s analysis of expressive timing patterns in graduate piano performances
of Schumann’s Träumerei [98]. This study compared the students’ patterns to
previously collected timing patterns of professional performers. The patterns were
largely comparable across the two groups, however, principal components analy-
sis showed the student timing patterns to be largely undeviating from each other
whereas the professionals had the more divergent patterns of expressive timing.
Timing profiles across the group were largely repeatable on repeated recordings
when performers were asked for the same interpretation each time. That the students
played with timing profiles remarkably similar to those of the experts is interesting.
Despite differences between pianists’ profiles suggesting that individuality plays a
part in each performance, it is proposed that there is also a high similarity between
performances. Other conclusions from the timing data concern the accelerations
in the lead up to the melodic peak in each phrase, which are noted as sharing a
certain parabolic fit to the shaping of each melodic gesture.
In a similar study, this time with performances of a Debussy Prelude [99], simi-
lar results were found on the whole, suggesting that the similarity between student
and expert pianists’ timing profiles is the result of trained musicians being able to
easily interpret the timing implied by the structure of the notated music. Repp’s
studies argue that when evaluating expressive timing, it is not the absolute devia-
tion from the score that should be considered but the deviation from a performance
norm. Some amount of expressive timing and dynamics will always be expected
in any performance. Points of agreement and departure between the timing of
individual performances would therefore be more interesting to examine.
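The idea of measuring deviation from a performance norm rather than from the score can be sketched as follows. This is a minimal illustration only: the inter-onset interval (IOI) values are invented, the function names are hypothetical, and taking the norm as the plain mean across performances is an assumption rather than the procedure used in the cited studies.

```python
from statistics import mean

def performance_norm(profiles):
    """Average IOI at each score position across all performances."""
    return [mean(position) for position in zip(*profiles)]

def deviations_from_norm(profiles):
    """Each performance's IOI minus the group norm at the same score position."""
    norm = performance_norm(profiles)
    return [[ioi - n for ioi, n in zip(profile, norm)] for profile in profiles]

# Hypothetical IOIs (seconds) for three performances of the same five-note phrase.
profiles = [
    [0.50, 0.52, 0.55, 0.60, 0.80],
    [0.48, 0.50, 0.56, 0.64, 0.90],
    [0.52, 0.54, 0.54, 0.62, 0.70],
]

norm = performance_norm(profiles)
residuals = deviations_from_norm(profiles)

for index, residual in enumerate(residuals):
    print(index, [round(r, 3) for r in residual])
```

All three invented performances slow towards the end of the phrase, so that shared lengthening ends up in the norm, and the residuals show only where an individual performer departs from the group.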
Repp’s extensive study of over 100 audio recordings of performances of Chopin’s
Etude in E major examined expressive timing and dynamics respectively in the
initial measures of the piece [102, 100]. Principal components analysis was used
in both cases to determine timing strategies and profiles of expressive dynamics.
Repp discovered that although there is infinite potential for different performances,
actual performances tend to be realised within constraints of what is accurate
or authentic for the piece. Sampling such a large number of performances, it
was found that within these limits, clusters of performances do not exist suggest-
ing that different timing profiles are not necessarily the result of different structural
interpretations. The produced principal components were therefore considered as
ways of expressing the same structural features through different timing profiles.
No significant relationship between timing and dynamics was found suggesting
that a greater level of freedom is found by performers when forming their expressive
shape of each phrase. The grand average profiles of timing and dynamics
showed an unexpected positive correlation, but this was
mainly due to the nature of the composition where the accompaniment is played
fast and softly. This is another case where Eric Clarke’s consideration of the struc-
ture of the music i.e. the context is particularly important. Correlating with just
the melodic notes, the negative correlation produced was extremely low and not
significant, suggesting timing and dynamics are relatively independent. The main
conclusions from these studies imply that performers may have more freedom
in their use of dynamics than in their use of expressive timing, as the latter is governed
by certain constraints in defining what is acceptable. The different uses of these
expressive parameters also imply that instead of different structural interpretations,
these different profiles are ways of expressing the same structure.
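The effect of musical context on a timing-dynamics correlation can be illustrated with a small sketch. The note data below are invented, with a fast, soft accompaniment under a slower, louder melody. The `pearson` helper is a plain textbook implementation and not the analysis used in the cited studies.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical notes: (inter-onset interval in s, MIDI key velocity, is_melody).
notes = [
    (0.60, 70, True), (0.15, 40, False), (0.14, 42, False), (0.66, 64, True),
    (0.16, 38, False), (0.72, 76, True), (0.15, 41, False), (0.80, 68, True),
]

ioi_all = [n[0] for n in notes]
vel_all = [n[1] for n in notes]
ioi_melody = [n[0] for n in notes if n[2]]
vel_melody = [n[1] for n in notes if n[2]]

# Across all notes the melody/accompaniment contrast dominates the coefficient.
r_all = pearson(ioi_all, vel_all)
# Within the melody line alone the relationship is much weaker.
r_melody = pearson(ioi_melody, vel_melody)
print(round(r_all, 2), round(r_melody, 2))
```

In this toy data the strong overall correlation comes almost entirely from the difference between the two textural layers, which is the kind of contextual effect described above.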
Investigations concerning other keyboard instruments include Gingras and col-
leagues’ studies recording 16 organists performing a Bach fugue on a MIDI pre-
pared organ exploring the emphasis of phrasing through expressive timing [45].
The performers’ traditional analyses of the piece were also used as a point for
comparison. The largest measured tempo fluctuations coincided with the agreed
structural boundaries whilst others coincided with features that were not related
to the phrasing subdivisions. Again a high similarity between timing profiles was
found. An interesting point arising from this study was the non-significant corre-
lation between the performers’ formal written analyses and the analyses resulting
from their timing profiles. The authors acknowledge that this may be because the
written task encouraged the performers to note structural analysis as they had
been taught through formal music analysis classes instead of the phrasing they
performed in this particular piece. This study provides a point to note when col-
lecting performers’ ideas on phrasing segmentation as their written analyses may
not be exactly the same as what they perform.
The individuality of performers through different timing profiles can be mea-
sured by looking purely at the expressive timing data in studies such as those by
Grachten and Widmer [57]. By measuring the final ritardandi through inter-onset
interval deviations from a performance norm, a classifier determines whether the
residual data can supply clues to the performer identification in performer pairs.
This theme of identifying clues about performance from measured performance
data is extended to searching for clues about musical structure through patterns
in aural performance parameters. Examining repeated timing patterns in performances
of Chopin’s Etude Op.10 No.3 [112] through pattern matching and Functional
Data Analysis, Spiro et al. suggest a number of motivations including structural
and motivic features. However, they note that repetitions expected by looking
at the score are not necessarily echoed in a performance. Also, timing patterns
seem to be more salient when the performer uses a range of expressive timing dur-
ing the piece. Full phrasing reconstruction is attempted through pattern finding in
the tempo and loudness curves [56]. Repeated musical structures are searched for
in unsegmented data of audio recordings of Schumann’s Träumerei. Correlations
between tempo and dynamic values are used as a basis for the pattern finding al-
gorithm. Similar musical structures are identified with some success for this one
piece representing a first step in phrase reconstruction.
The final ritardando in performances of the same piece is examined in terms of
visualising the implied motion from expressive timing. First and second-order
phase-plane representations are used to visualise the changes in timing across
three performances. The segmentation of the final ritardando into three motifs
is clear from the curves in the plot. There exist many kinematic models of expres-
sive timing in performance [59, 43] which work on the basis of music (and tempo)
being closely related to motion. For a complete review on the studies involving
keyboard and other instruments, analysis of aural parameters and the study of
motor programs and kinematic models see [44, 86]. The role that motion plays in
a musical performance is examined next, explaining why this thesis looks at physical
motion as a visual performance parameter that is as informative for studies of
musical structure as the aural parameters of timing and dynamics.
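The phase-plane representation of expressive timing mentioned above can be sketched with finite differences. This is an illustration under stated assumptions: the tempo values are invented, the function names are hypothetical, and reading the first-order plot as tempo against its rate of change and the second-order plot as rate of change against acceleration is one common interpretation, not necessarily the one used in the cited study (which may also smooth the curve first).

```python
def derivative(values, times):
    """Central finite differences, falling back to one-sided differences at the ends."""
    last = len(values) - 1
    result = []
    for i in range(len(values)):
        lo, hi = max(i - 1, 0), min(i + 1, last)
        result.append((values[hi] - values[lo]) / (times[hi] - times[lo]))
    return result

def phase_planes(tempo, times):
    """Return (first_order, second_order) lists of points for plotting."""
    velocity = derivative(tempo, times)          # change of tempo per beat
    acceleration = derivative(velocity, times)   # change of that change per beat
    first_order = list(zip(tempo, velocity))
    second_order = list(zip(velocity, acceleration))
    return first_order, second_order

# Hypothetical final ritardando: tempo in beats per minute at successive beats.
beats = [0, 1, 2, 3, 4, 5, 6]
tempo = [100, 98, 94, 88, 80, 68, 50]

first_order, second_order = phase_planes(tempo, beats)
for point in second_order:
    print(point)
```

Plotting either list of points as a curve gives a trajectory whose loops and cusps can then be inspected for segmentation.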
ment can also be useful in terms of emphasising structure, particularly in gen-
res of music where notated forms of music are not common. An example of this
is African folk music where songs are traditionally passed down aurally. In this
case, the role of the body is emphasised, particularly in communication with other
performers and particularly highlighting rhythmic properties of the song.
The body can also be used as a way for performers and audiences to ‘feel’ the
music. Embodied cognition theory, particularly when applied to music [69], considers
the full body as having an important role in the experience of music. This
falls into line with Alexander’s view of the mind and body being inseparably one
unit. Embodied music cognition regards both performer and perceiver as subjects,
as audiences have been seen to respond through movement to the music being
performed [48] and to strongly associate sonorous objects with movement.
After determining various reasons why movement is produced in performance,
it is interesting to look at how this is manifested corporeally in musical examples.
Movement in classical piano performance appears to be completely personal and
there is a range of famous performers who incorporate different physical styles
when playing. Arthur Rubinstein is one example of a performer who plays with
visible effortlessness and barely moves from the centre of the piano. Glenn
Gould, on the other hand, has been caricatured almost as an ogre over the piano,
hunched over the keys and moving around with vigour and energy. One question
to ask when considering physical gesture and its relationship to music is whether
different styles of movement can be attributed to the same musical feature, much
like the differences in performers’ use of parameters such as tempo and dynamics
can convey the same musical feature.
So with movement coming into the foreground of theories to do with how per-
formers play and audiences perceive music, examining motion in performance
becomes crucial when researching how performers encode information from the
score. The next section looks at current empirical studies involving motion and its
relationship to the audible parameters produced in performance.
of performance gestures of the pianist Glenn Gould, Delalande proposes a three
level structure of gestures ranging from functional to abstract [38]. The first level
comprises effective gestures, which are necessary for playing the instrument, i.e. bowing,
blowing, pressing keys etc. Accompanist gestures are those movements which are
associated with effective gestures, i.e. elbow and chest movements which are used
to help the performer articulate a particular sound. The final level is figurative
gestures, which are visually perceived by the audience but seem to have no correlation
with the actual production of the sound. Existing gesture taxonomies for
music are based on this three-tiered structure [21]. Several classifications on ges-
ture are also listed in [62]. The definition of gesture used in this thesis alludes to
physical motions made by the performer that carry meaning. The research in this
thesis aims to explore how musical structure factors into these gestures, and whether
this information is produced visibly in accompanist gestures.
Davidson and colleagues [35] have explored various purposes for physical ges-
ture in performance, mainly the communication between performers, conveying
personal issues. It was found that between performers, features such as accents are
used to communicate with each other and physical gestures provide the anticipa-
tion to these accents. This may be a reason as to why performers watch each other
for visual cues. Jane Ginsborg also investigated the use of gestures and move-
ments in the rehearsal of singer-pianist duos [46]. Gestures were used for keeping
time, coordinating entries and also highlighting particular expressive points. Duos
who were familiar with each other and had similar levels of expertise used a wider range
of gestures than unfamiliar or unbalanced partnerships. From the
many different functions and purposes gestures in performance may have, this
thesis focusses on those made in solo performance, eliminating the communica-
tive purpose between other performers. I aim to discover gestures in piano per-
formance from full body movements to intricate fingering details which provide
some information or link to expressive features of the music.
On studying expression in musical performance Eric Clarke states
[27]. Clarke and Davidson’s study into movement in piano performance [25] identified
different types of head movement in their relationship to the aural parameters
measured from the recorded MIDI. Although body sway was not clarified as
being directly related to phrasing structure, the authors acknowledged that nei-
ther is it random. Further exploring and quantifying these relationships between
movement and sound in performance through multi-modal recordings, Camurri
et al. recorded information from repeated performances of a Scriabin etude by a
professional pianist to see how movement, tempo and dynamics conveyed emo-
tional information [22]. Movement was measured using the EyesWeb software
(explained further in Section 3.3.5) in terms of openness or contractedness of the
performer’s posture. Ratings of emotional intensity by audience judges were also
gathered to assess the communication of this emotional intention. Correlations
between pairs of parameters were used to judge which agreed for each bar. The
results highlighted inter-onset intervals, key velocities, movement velocity and the
openness and contractedness of posture. This study highlighted specific param-
eters that may contain emotional information on the music, that is relayed effec-
tively to an audience. Jane Davidson’s extensive work on body movement in mu-
sical performance has also established that information about intent and structure
amongst other cues such as communication between performers, can be conveyed
from performer to audience [32]. Point-light displays of ’deadpan’, ’standard’ and
’exaggerated’ performances were presented to audience judges who were asked to
rate the level of intended expressivity. When varying the level of expressive intent
from ’deadpan’ through to ’expressive’, pianists changed the amplitude of their
movements suggesting a link between movement and expression. This was also
perceived accurately by audiences judging the expressive intent from videos of
the performances. Subjects who were not given the visual information performed
more poorly than those with the visual stimuli, suggesting that the presence of these visual
gestures enables accurate communication of information on intent. It was
also discovered that performance intentions were more detectable from the upper
torso movements than those of the hands [33] implying that audiences use the full
body gesture to make their judgements rather than more localised gestures from
the hands. Further work used 2D tracking of such movements made during piano
performance to quantify the relationship between movement size and expression
[34]. Results showed that the more exaggerated the performance intention, the
more exaggerated the amplitude of movement. Other studies on the visual com-
munication of intent include [111, 31]. Establishing a link between performance
intent and performer gesture, we now look to see if more intricate details of struc-
ture can be contained in such movements and aim to quantify more deeply the
relationship between visual and aural gestures.
Wanderley states that although we are not entirely certain as to why accom-
panist gestures are performed, it is evident that they exist frequently in perfor-
mances [126] and are repeatable at the same points in the score across several
performances by the same performer. The three-level typology in this paper is
based again on Delalande’s theory as stated earlier. Performers played a selection
of pieces, including Stravinsky’s three pieces for clarinet and the first movement
of the Brahms 1st clarinet sonata in standard, expressive and immobilised perfor-
mances. The Optotrak system was used (see Section 3.3) to collect data from the
bell and mouthpiece of the clarinet as well as the performer’s head, arms and legs.
Further analysis of this movement data was conducted for the opening of a solo
clarinet piece by Stravinsky, that lacked certain rhythmic accents that may influence
movement such as in the Brahms sonata [125]. Movement data analysis was
influenced by recordings taken from a digital video camera and was calculated and
registered as a Total Amount of Movement value by using frame by frame subtrac-
tion. These results were time-warped to allow comparison across performers. An
interesting result from this research was the influence of performer movement on
keeping rhythm and timing which led to hypotheses about the role of continual
movement in phrasing/musical motion. However, it was also clear that movements
became restricted in very fast, technical passages and increased in
easier passages. It was noted in particular that performers moved a
lot at phrasing boundaries. There were many different performer styles of move-
ment and although there were some similarities, there were significant differences
in what parts of the body they used to move. Some would sway their heads whilst
others moved their waist and shoulders. From observational analysis they con-
cluded that these movements were related to patterns of tension and release in
phrasing. Bell movements were not always related to phrasing but in this case
appeared to be more rhythmical. Other performers who hardly moved within a
phrase would use large movements to perform a phrase-end gesture. Other cor-
relations between movement and the musical properties of the score were inves-
tigated in Rodger [105] where performers were recorded through motion capture
and audio from different stages in learning a piece of music. Principal components
analysis was used to analyse body motion and this movement was correlated with
both melodic contour and dynamics. Results showed that the correlation between
movement and melody increased as performers progressed through the learning
process of the piece. This suggests that as the performers develop the interpretation
of the music through practice, movement becomes a more integral part of
the performance, becoming more highly related to what is being played. These
studies of a few performers suggested some generalities about how clarinettists
use gesture. Different patterns of movement were found across the performers
although at phrase endings, most would perform some sort of phrase-end ges-
ture. Movements also appear to be correlated to the melodic contour of the piece
being performed. However, as these are instrumentalists who have the freedom
to move not only themselves but their instrument, it is interesting to analyse the
movements of the actual clarinet further. Bell movement in clarinet performance is
further explored regarding its relationship with rubato [89]. Intensity values taken
from the audio were found to be correlated with the melodic contour of the piece
(in this case it was the Adagio from Mozart’s A major clarinet concerto) and not
with bell height as might be expected. The bell movement however, was related to
sound properties and appeared to be related to phrasing.
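The frame-by-frame subtraction behind a Total Amount of Movement value, as described earlier in this discussion, can be sketched as below. The source does not give the exact definition, so counting pixels whose grey level changes by more than a threshold is an assumption, as are the function names, the threshold value and the tiny invented frames.

```python
def amount_of_movement(previous_frame, frame, threshold=10):
    """Count pixels whose grey level changed by more than `threshold`."""
    return sum(
        1
        for previous_row, row in zip(previous_frame, frame)
        for before, after in zip(previous_row, row)
        if abs(after - before) > threshold
    )

def movement_series(frames, threshold=10):
    """One movement value per pair of consecutive frames."""
    return [
        amount_of_movement(a, b, threshold)
        for a, b in zip(frames, frames[1:])
    ]

# Hypothetical 2x2 greyscale frames (values 0-255) from a video recording.
frames = [
    [[0, 0], [0, 0]],
    [[0, 50], [0, 0]],
    [[40, 50], [60, 5]],
]

series = movement_series(frames)
print(series)
```

A series like this has one value per frame transition, so it would still need time-warping to a common score position before performers could be compared.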
From these extensive studies on clarinet performance it is inferred that move-
ment of both the full body and the movement of the clarinet is related to what is
being performed in cases of melody and phrasing. Movement in clarinet perfor-
mances has also been used to study the effect these gestures have on the perception
of certain aspects of the music. It has also been shown that visual information ap-
pears to aid perception of musical information from performances. An example of
this is seen in the work of Vines et al. [124]. This study used one of the clarinet
performances recorded for Wanderley’s experiments, showing it to thirty musically
trained audience participants. Audience judges in a between-subjects design
were shown different modes of presentation (audio-only, visual-only and audio-
visual) and asked to make real-time judgements on the phrasing structure and
emotional intensity of performances. Functional data analysis techniques were
used to examine the underlying factors changing over time. The combination of
aural and visual information appeared to be the most accurate when determining
phrasing and intensity. Further analysis on bodily gestures concluded that motion
sequences were slightly longer than the duration of the musical
phrase being performed. They proposed that the contour of this movement over
time might also correlate with the phrasing contour. Further work into the cross-
modal interaction in perception used two different performances and repeated the
experiment [123]. Again they saw gestures extending the sense of phrasing for
participants. Also, the visual modality proved to be full of information relating to
the phrasing structure, as much as the auditory stream. They also found anticipa-
tory movements at the beginning of phrases which cued the beginning of a phrase
for the perceivers earlier than in the purely audio presentations. This effect is also
true for co-articulation gestures in speech which precede sounds [83]. Extending
the analysis even further in [84], performers were asked to play an excerpt from
the Brahms sonata without piano accompaniment and were given no performance
directions this time. Motion capture was performed through the Vicon motion
capture system. The kinematic displays produced from the Vicon captures were
modified by ’freezing’ certain parts of the body, changing the movement ampli-
tudes or showing the movement in reverse order. These performances were used
to analyse how ancillary gestures affected perceivers’ views on tension, intensity,
fluency and professionalism. Results suggested that freezing the motions of body
parts did not affect the perception of these musical values and so it could be sug-
gested that general body movement communicates more information efficiently.
These multiple studies in clarinet performance concerning gesture production
and perception identify areas from which this research could benefit in terms of
analysing motion in pianists for musical structure. Large differences in movement
between performers are evident; however, there are similarities in the points
within the performance at which these movements occur. We return to studies on piano
performance but now particularly with an emphasis on relating movement to
sound and structure.
Thompson and Luck’s recordings of movement in piano performance noted
that having subjects repeat performances in multiple recording sessions had lit-
tle real-world effect on the amount of movement used in the performance. They
look at the head and shoulders as ancillary gestures as they are more removed
from physically producing the sound, whereas data from the fingers and wrists are
more involved in sound production. When asking performers to vary their levels
of expression from ’deadpan’ to ’expressive’ they noticed a change in amplitude of
movement much like the previous clarinet studies and Davidson’s piano studies.
On further examining the link between movement and audio, it was discovered
that movements sometimes predicted features of the audio stream [117, 118]. Work
by Shoda [110] looked into this temporal relationship between body movement
and temporal expression, finding that in fast-tempo pieces movement appeared to
be in synchronisation with expressive timing, whereas slow-tempo pieces showed
lags between the movement and audible expression.
Delving deeper into more intricate movements in both clarinet and piano per-
formance, finger motion is examined in reference to its relationship to the acous-
tical outcome of the sound. Palmer and Dalla Bella’s investigation into the motor
movements of pianists’ fingers as they played fast passages saw a surprising re-
sult in higher amplitude movements for faster repetitions of the musical excerpt
[88]. These motor movements were then analysed for the effect they have on the
way the musical passage is performed [87]. Clarinettists were used in this partic-
ular study as they do not use finger height to change the loudness of the sound
produced. This eliminates the possibility of increased tempo passages in piano
performance requiring a louder dynamic and therefore possibly higher ampli-
tude finger movements. Again a relationship between faster passages of music
and higher finger height was seen despite this not having an effect on the loudness
of the sound produced. They propose that these movements are governed
by biomechanical constraints in finger movement as well as musical considera-
tions. However, more studies across different instruments are suggested as ways
to separate which movements are a response to music instead of biomechanics.
Other studies have investigated the use of tactile information at the fingertips to
enable performers to control the accuracy of their timing [51, 53]. Differences in
pianists’ touch at different tempi were evaluated by extracting landmarks in motion
such as the key-bottom landmark and the maximum finger height preceding performed
notes [52]. Results showed that a different ‘touch’ was used at faster tempi
than at slower tempi. The musical extracts used in these studies were designed
specifically for ’fast’ or ’moderate’ performances and in order to manipulate cer-
tain fingering combinations. Although these results show differences in pianists’
touch, for the research executed in this thesis, I aim to look at fast passages of mu-
sic where particular notes may be accented for structural reasons. Differences of
pianists’ touch within a certain passage such as this may provide clues as to how
the music is being interpreted.
2.5 Summary
From the various theories and empirical studies examined in this chapter, it is
suggested that performers are free, within certain constraints, to use expressive
parameters such as tempo, dynamics, timbre, articulation and motion to empha-
sise structural and emotional aspects of the music. Timing and dynamics profiles
across groups of pianists are remarkably similar, however, different strategies for
these parameters can be used to express the same structural features. Gestures
within performance appear to be largely idiosyncratic although some similarities
are evident. Finger motion also seems to have a relationship to certain properties
of the music being performed although this needs to be more widely investigated
across instrumentation.
The results of these studies pose a number of questions which the experiments
in this thesis will aim to answer:
1. Despite the idiosyncratic nature of motion profiles across pianists, are there
commonalities which exist in occurrence with features in the phrasing struc-
ture?
3. Are there commonalities, therefore, between the tempo, dynamics and mo-
tion patterns of performers which suggest a link to phrasing?
Chapter 3
This chapter will detail some of the available capture technologies and analysis
techniques for audio, MIDI and video with reference to piano performance. This
is by no means a comprehensive list of all the available technologies but is meant to
serve as an example of the range of products and applications that exist, identify-
ing the advantages and disadvantages of each. A further review of data acquisition
techniques in music performance can be seen in Goebl et al. [50].
3.1 Audio
Performance analysis has, until recently, been mainly concerned with the analysis
of audio recordings from famous pianists, due in part to the wide availability
of data. Measuring parameters such as dynamics and expressive timing can be
beneficial in this way, but when comparing two performances together, the differ-
ences in how they were recorded become a factor, particularly for the intensity of
the sound wave.
In the experiments detailed in Part III, audio is recorded through a stereo mi-
crophone setup, connected to a laptop computer via a Tascam Audio Interface.
This data is transported to the application Ardour via the Jack Audio Client which
has a low I/O latency of around 46.4 ms in this particular case.
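A buffering latency of this size follows directly from the buffer settings and the sample rate. The settings used in the experiments are not stated here, so the values below (1024 frames per period, 2 periods, 44.1 kHz) are an assumption: they are one common configuration that happens to give roughly this figure. The function name is hypothetical.

```python
def buffer_latency_ms(frames_per_period, periods, sample_rate_hz):
    """Latency contributed by the audio buffer, in milliseconds."""
    return 1000.0 * frames_per_period * periods / sample_rate_hz

# Assumed settings, not taken from the text: 1024 frames x 2 periods at 44.1 kHz.
latency = buffer_latency_ms(1024, 2, 44100)
print(f"{latency:.1f} ms")
```

Halving the period size under the same assumptions would halve the latency, at the cost of a higher risk of buffer underruns during recording.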
Once the audio has been recorded, there exist many tools for audio analysis.
Examples of these are libxtract [18], aubio [16] and other audio feature extraction
libraries that attempt to estimate note onsets, tempo and other lower level features
such as Mel Frequency Cepstral Coefficients and spectral densities etc. The
disadvantage with using only audio recordings is particularly in the accuracy of
these note onset and tempo estimations. For instruments such as the piano, onsets
within a chord cannot be separated. This is something that can be overcome by
recording MIDI information. However, the audio data provides information that
cannot be recorded from MIDI information alone, such as the effects of pedalling,
the exact duration of notes (also influenced by pedalling and the acoustic
effects of the performance space) and also for instruments in which onset informa-
tion cannot be measured any other way.
3.2 MIDI
Information transported through the MIDI protocol can be collected in various
ways, particularly for keyboard instruments. A number of devices can be used as
external retrofits, including the Moog Piano Bar [5], which has a recommended
retail price of approximately $1495¹. This device uses infra-red beams to detect
depression of the keys. Internal retrofits such as the TFT MIDI Record system
place a strip of carbon-coated plastic underneath the keys to record onsets and
velocities through changes in resistance, and also use a sensor to detect the
onset and offset of the sustain pedal. Retail prices for an internal retrofit
such as this start from 1130 Euros²; however, the extra cost of installing the
device must also be accounted for. There are also factory-installed pianos from
Yamaha and Bösendorfer that include optical sensors for the keys and pedals.
Retail prices for the Yamaha Disklavier range from £25,000 for the basic system
to £35,000 for the more advanced system³.
The factory-installed series is limited by its price and lack of portability,
issues that the internal retrofit devices are expected to solve. These, however,
still require modification of the actual piano, which involves specialised
installation and reduces portability somewhat, whereas the external retrofit
devices are the most portable and sit slightly above the keys of any piano.
Internal retrofits would, however, avoid the issues arising from the space the
external device takes up at the back of the keyboard. Interviews with the
professional pianists in the experiments in Chapter 9 highlighted opinions that
the factory-installed pianos had a different ‘feel’. It is possible that,
regardless of whether the response of the piano was changed by these optical
devices, the issue is a psychological one for performers, and that seeing a
device such as the Moog Piano Bar sitting above a normal piano they had used in
previous concerts helped to make them more comfortable.

¹ RRP taken from http://www.moogmusic.com/newsarch.phpcat_id=24 on 05/04/10
² RRP taken from a local distributor in London on 23/06/10
³ RRP taken from a local distributor in Glasgow on 05/04/10
Table 3.1 shows a direct comparison between these three types of device. The
most portable of the three, the Moog Piano Bar, will be used for the experiments
in Part III.
Table 3.1: Comparison of MIDI Capturing Devices

| Device | Type | Measurements | Price | Advantages/Disadvantages |
|---|---|---|---|---|
| Moog Piano Bar | External retrofit | Key onset, offset, onset velocity, pedal depression | $1495 | Adv: very portable and cheap. Disadv: no release velocity or pedal angle, bar sits above keys so reduces playing space by 1cm |
| TFT MIDI record strip | Internal retrofit | Key onset, offset, onset velocity, pedal depression | 1130 euros | Adv: cheap, could be portable, and sits underneath the keys. Disadv: no release velocity or pedal angle and requires specialised setup |
| Yamaha Disklavier | Factory installed | Key onset, offset, onset velocity, release velocity, pedal angle | £25000-35000 | Adv: all measurements required, technology underneath the keys. Disadv: expensive and stationary |
Moog Piano Bar is the best device for use in the multi-modal capture experiments.
3.3.1 Accelerometers
Active markers such as accelerometers can be used to determine acceleration
patterns in body movement. Accelerometers are available on chips with additional
gyroscopes, and positional information can be calculated by integrating the
measured acceleration vectors. An example of such a device is the IMU 6 Degrees
of Freedom v2. This device consists of three iMEMS gyroscopes with a Freescale
three-axis accelerometer and costs approximately $124.95⁴. Prices increase with
rises in bandwidth and sensitivity. Accelerometers can now be bought with
wireless capability, but as portable as these small devices are, there are still
limitations in placing them on pianists’ fingers without causing interference.
Therefore, these devices are more suited to measuring general body movement.
Other disadvantages include errors that can arise from bias and sensitivity
drifts with temperature. If the accelerometer output is integrated to obtain
positional information, such errors are integrated twice, so the resulting
position error grows with the square of the elapsed time.

⁴ RRP from SparkFun Electronics
http://www.sparkfun.com/commerce/product_info.phpproducts_id=9184 on 23/06/10
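The growth of this positional error can be illustrated with a minimal sketch. It assumes a stationary sensor whose output carries a small constant bias; the bias value, the sampling rate and the function name are hypothetical choices made for illustration and are not taken from the device's data sheet.

```python
def integrate_twice(accel_samples, dt):
    """Integrate acceleration twice (simple Euler steps) to obtain position."""
    velocity = 0.0
    position = 0.0
    positions = []
    for a in accel_samples:
        velocity += a * dt          # first integration: acceleration -> velocity
        position += velocity * dt   # second integration: velocity -> position
        positions.append(position)
    return positions


# Hypothetical values: a sensor at rest reporting a constant bias.
bias = 0.02                 # m/s^2, assumed bias error
dt = 0.01                   # seconds per sample (100 Hz, assumed)
samples = [bias] * 1000     # 10 seconds of biased readings

drift = integrate_twice(samples, dt)
error_at_5s = drift[499]    # position error after 5 seconds
error_at_10s = drift[999]   # position error after 10 seconds
```

Doubling the elapsed time roughly quadruples the position error, even though the sensor has not moved at all.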
quire the cameras to be calibrated for noise before each use. The Vicon system
allows creation of body models which specify how each marker connects to the
others and once the Vicon VST model is created it can be used several times. For
recording each subject, a stationary capture must be taken for the system to recog-
nise the programmed model.
Post-recording reconstruction involves frame-by-frame viewing, although there
is functionality for filling in gaps later. However, as the software is
proprietary, these estimation algorithms are not available to view and so cannot
be evaluated. As the data could only be extracted from the Vicon program through
the proprietary format, it was necessary to export to ASCII files and then
process text files containing very long lists of numbers. Other software
packages exist to allow full analysis of the recordings, including biomechanical
calculations; however, these come at an additional price.
The highly accurate measurement of several markers on the human body makes this
system highly desirable for use in performance analysis experiments, despite its
limitations in price and portability. The experiment in Chapter 8, which
analyses upper body movement in piano performance, utilises this capture system.
3.3.5 EyesWeb
EyesWeb [23] is an image processing system with a graphical user interface which
allows users to create their own analysis of captured video images using various
algorithms found in the OpenCV image processing library [11]. Designed for
full-body motion in music and dance (particularly dance), it contains a number
of analysis techniques such as motion history images, which provide a
visualisation of motion over time in a single snapshot [36]. EyesWeb XMI also
provides the functionality to convert between several layers and data types.
Users can select various functions as blocks and connect the input and output to
other functions, as well as being able to write their own processing blocks. It
is a free and open source software (FOSS) application and requires only images
from a video file or a live camera as input. As well as functions for overall
body motion, algorithms for finger tracking have been assessed [19], from Hough
transforms to tracking with coloured markers. The accuracy of coloured markers
and their ability to work with complex backgrounds, such as a piano keyboard
under changing light conditions, far outweigh the benefits of the other
algorithms assessed. The application, however, currently runs only on the
Windows operating system.
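The idea of a motion history image can be sketched in a few lines. This is a minimal illustration of the general technique, in which recently moving pixels are bright and older motion fades, and is not EyesWeb's own implementation; the function name, the decay length `tau` and the difference threshold are assumptions.

```python
def update_motion_history(mhi, prev_frame, frame, tau, threshold):
    """Return a new motion history image from two consecutive greyscale frames.

    Pixels that changed by more than `threshold` are set to `tau`;
    all other pixels decay by one step towards zero.
    """
    updated = []
    for mhi_row, prev_row, row in zip(mhi, prev_frame, frame):
        new_row = []
        for value, before, now in zip(mhi_row, prev_row, row):
            if abs(now - before) > threshold:
                new_row.append(tau)                 # fresh motion
            else:
                new_row.append(max(0, value - 1))   # older motion fades
        updated.append(new_row)
    return updated


# A bright pixel moving left to right across a 1 x 4 image (toy data).
frames = [[[255, 0, 0, 0]], [[0, 255, 0, 0]], [[0, 0, 255, 0]], [[0, 0, 0, 255]]]
mhi = [[0, 0, 0, 0]]
for prev_frame, frame in zip(frames, frames[1:]):
    mhi = update_motion_history(mhi, prev_frame, frame, tau=3, threshold=50)
```

After the last frame the history reads `[[1, 2, 3, 3]]`: the values increase along the direction of travel, so the path of the motion is visible in a single image.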
3.3.6 A portable, low cost, accurate motion capture system for pianists’ fingers
Although many of the motion capture methods detailed in Table 3.2 offer
advantages of portability and high accuracy, few combine these with low cost and
a design that considers the distraction caused to the performer. A solution to
this lies in image processing systems. The image processing techniques explored
above can track general hand shapes in performance, although they do not allow
the intricate measurement of each joint of each finger. A specially designed
image processing system with passive coloured markers for each joint of each
finger is described in Chapter 4.
Table 3.2: Comparison of Motion Capture Devices

| System | Active/Passive | Wireless | Error | Sampling Freq | Resolution | Price | Advantages/Disadvantages |
|---|---|---|---|---|---|---|---|
| Accelerometer and Gyroscope | Active | Not fully | Sensitivity drift ±0.03%/°C, output signal at zero ±2mg/°C and gyro noise 0.05°/s/√Hz | 11kHz internal | ±3g | $124.95 | Adv: direct application to body. Disadv: drift and bias error |
| Optotrak | Active | No | 0.1mm | 4600Hz | 0.01mm | $70000 | Adv: portable, never loses position of markers. Disadv: wired markers |
| Vicon | Passive | Yes | 0.5/0.6 pixel | 120Hz | 0.3mm | $110000 | Adv: 3D motion capture, wireless markers. Disadv: stationary, can experience noise and occlusion |
| EyesWeb | Passive | Yes | Depends on camera | Depends on camera | Depends on camera | Free open source software | Adv: used with any input camera. Disadv: can experience noise and occlusion, graphical user interface not good for overcomplicated programs |
mats for representing score information, MusicXML [54], an XML-based tool, has
proved the most popular. Storing precise performance data such as timing
alongside recorded audio is straightforward through the use of a simple audio
editor such as Audacity [2], which supports tagging audio files and can read and
write these tags as text files. Storing performance data in alignment with score
information, however, requires a fully integrated infrastructure that can
support a more sophisticated level of data processing. Amongst existing
solutions are the Music Encoding Initiative (MEI) [93] and Performance Markup
Language (PML) [7]. MEI’s main aim is to "a) provide a standardised universal
XML encoding format for music content (and its accompanying metadata) and b)
facilitate interchange of the encoded data". MEI represents score as well as
analytical data, and also has the ability to time-stamp objects in various time
codes. However, the semantics associated with these time-stamped objects are
fairly trivial, and the performance data is not given an explicit, separate
representation.
A solution to this lies in the development of Performance Markup Language
(PML), which stores the performance data in a separate hierarchy from the
musical score data, with the two linked to each other by note IDs (see Chapter 6).
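The principle of two separate hierarchies linked by note IDs can be sketched with plain data structures. The field names and values below are hypothetical and do not reproduce the PML schema; the sketch only shows how a shared identifier allows performance measurements to be joined to score notes without the two being stored together.

```python
# Score hierarchy: what is written on the page (hypothetical fields).
score_notes = {
    "n1": {"pitch": "C4", "bar": 1, "beat": 1.0},
    "n2": {"pitch": "E4", "bar": 1, "beat": 2.0},
}

# Performance hierarchy: what was measured, referring to notes by ID only.
performance_events = [
    {"note_id": "n1", "onset_s": 0.512, "velocity": 64},
    {"note_id": "n2", "onset_s": 1.034, "velocity": 71},
]


def align(score, performance):
    """Join each performance event with the score note it refers to."""
    return [{**score[event["note_id"]], **event} for event in performance]


aligned = align(score_notes, performance_events)
```

Each entry of `aligned` then carries both the notated position (bar and beat) and the measured onset and key velocity of the same note.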
Figure 3.1: Example Structure of Units and Channels in a GMS file
3.5.1 Performance Worm
The Performance Worm, created by Langner and Goebl [66] and later used by
Dixon [40], plots a 2D graph of dynamics versus tempo in the form of an
animation for each performance. Using an audio signal as input, the dynamics are
measured by taking the sound pressure level and the pulse is extracted using the
beat-tracking system BeatRoot [39]. The musical timing of the notes relative to
their expected time and duration can then be calculated. A circle is plotted for
each point in time (depending on the frequency of occurrence of notes within the
excerpt), with the colour fading as time progresses, so that the path of these
circles gives the user an idea of how the tempo and dynamics change over a
period of time. In the most recent version of the application, the bar number of
the music being played is displayed within the most recently plotted circle, and
major boundaries such as the end of an excerpt are identified by large black
circles within the plotted path. This is an extremely useful tool for comparing
patterns in performers’ use of tempo and dynamics within a piece, and users can
see distinct styles of performance producing different paths. Unfortunately,
there is no direct visualisation of the music being played or a continuous sense
of time, except that the picture of the worm moves about the screen in synchrony
with the audio output. The resultant graphs of dynamics versus tempo allow easy
comparison of two different performers playing the same piece, and so it is a
good visualisation and analysis tool for comparing the performance styles of
famous artists. However, it does not provide useful information about the
particular performance itself in terms of the musical score.
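The data behind such a tempo and dynamics trajectory can be sketched as follows. This is not the Performance Worm's implementation: it assumes that beat times (in seconds) and a loudness value at each beat are already available, and the function names and toy values are hypothetical.

```python
def tempo_curve(beat_times):
    """Tempo in beats per minute for each interval between consecutive beats."""
    return [60.0 / (later - earlier)
            for earlier, later in zip(beat_times, beat_times[1:])]


def trajectory(beat_times, loudness):
    """Pair each inter-beat tempo with the loudness measured at the later beat."""
    return list(zip(tempo_curve(beat_times), loudness[1:]))


# Toy data: the last beat arrives late, i.e. the performer slows down.
beats = [0.0, 0.5, 1.0, 1.6]
levels = [60.0, 62.0, 65.0, 63.0]   # assumed loudness values at each beat
points = trajectory(beats, levels)
```

Plotting `points` in order, with older points faded, would give the kind of path described above; here the final point moves from 120 towards 100 beats per minute as the loudness drops.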
so they can be used as raw data. This application is beneficial to audio analysts
particularly in the Music Information Retrieval community. However, there is no
direct view of the actual notes.
3.5.5 Motiongrams
Motion data is usually too extensive to plot in a single two-dimensional graph,
and so, in an effort to visualise overall motion, and particularly longer
sequences of motion, Jensenius has created a number of tools that can be used
much as spectrograms are used to look at audio files [61]. Motiongrams analyse
the differences between frames and take the mean of the rows and columns,
displaying the results on a continuous graph. This can be visualised in
synchronisation with the spectrum of the recorded audio. The tool allows the
user to identify particular points of interest in the audio spectrum and the
video motiongram for further analysis.
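The frame-differencing and averaging steps described above can be sketched as follows. This is a simplified illustration on small greyscale frames held as nested lists, not Jensenius's implementation; the function names are assumptions.

```python
def frame_difference(prev_frame, frame):
    """Absolute pixel-wise difference between two greyscale frames."""
    return [[abs(now - before) for before, now in zip(prev_row, row)]
            for prev_row, row in zip(prev_frame, frame)]


def motiongram_slices(frames):
    """For each consecutive frame pair, return (row_means, column_means).

    Stacking the row means over time gives a motiongram showing vertical
    position of motion; stacking the column means shows horizontal position.
    """
    slices = []
    for prev_frame, frame in zip(frames, frames[1:]):
        diff = frame_difference(prev_frame, frame)
        row_means = [sum(row) / len(row) for row in diff]
        column_means = [sum(col) / len(col) for col in zip(*diff)]
        slices.append((row_means, column_means))
    return slices


# Toy data: only the top-left pixel changes between the two frames.
frames = [[[0, 0], [0, 0]], [[10, 0], [0, 0]]]
slices = motiongram_slices(frames)
```

Each frame pair is reduced to one short vector per axis, which is what allows a long recording to be displayed as a single continuous image.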
3.5.6 Visualisation with the score
Although these visualisation methods detailed in Table 3.3 all help to give an in-
stant impression of the audio or video performances such that they are distin-
guishable between performers, they all lack a direct relation to the score or a rep-
resentation of the notes being played. A representation involving both the score
notes and the performance data would be of great use to performance analysts.
The specially developed Pullinger Database (see Section 6.2) presents a method
for displaying performance metadata of any kind above the notes on a score, al-
lowing direct analysis and obvious relationships to be determined between the
performance measurements and the notes being performed.
Table 3.3: Comparison of Visualisation Tools

| Application | Input | Measurements | Advantages |
|---|---|---|---|
| Performance Worm | Audio | Tempo and dynamics | Good for quickly comparing performance styles |
| Sonic Visualiser | Audio | Low-level audio descriptors and tempo estimations | Good for comparing details of differences in two performances of same piece |
| Key-Frame Displays | Video | Displays images of video | Good for overview of video |
| Motion History Key-Frame Displays | Video | Motion trajectories in time | Good for analysis of motion and overview of video |
| Motiongrams | Audio and Video | Audio spectrogram and video motiongram | Good for comparison of motion details compared to details in audio stream |
Part II
Chapter 4
FingerDance
From the review of available motion capture technologies in Chapter 3, a need
has been identified for a system specifically designed for tracking finger
movements in musical performance. This system should be cheap and portable, as
well as being as unintrusive as possible to the performance. FingerDance is a
specially designed, open source, image-processing-based motion capture system
for tracking pianists’ fingers. It is designed for use with a single camera with
a fast frame rate, placed with an aerial view of the piano keyboard. This camera
captures images containing passive paint markers applied directly to the
performer’s fingers. This chapter explains the setup of the FingerDance system
and the algorithms behind the identification and tracking of the hands.
The width of a pixel at this height is 1.1mm and so the error in detection is ap-
proximately 0.55mm. The calculated angular resolution of the camera is 0.076◦ , as
seen in Figure 4.1. The black box represents the camera and the light squares rep-
resent the pixels at 1.1mm width. The angle is calculated by the simple equation
\[ \tan^{-1}\left(\frac{1.1}{830}\right) = 0.076^{\circ} \tag{4.1} \]
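The arithmetic of Equation 4.1 can be checked directly. The sketch below uses only the figures given in the text (a pixel width of 1.1mm at a camera height of 830mm); the variable names are introduced here for illustration.

```python
import math

pixel_width_mm = 1.1       # width of one pixel at keyboard level
camera_height_mm = 830.0   # height of the camera above the keyboard

# Angle subtended by one pixel, as in Equation 4.1.
angular_resolution_deg = math.degrees(math.atan(pixel_width_mm / camera_height_mm))

# A marker edge can fall anywhere within a pixel, so the detection
# error is taken as half a pixel width.
detection_error_mm = pixel_width_mm / 2
```

Rounded to three decimal places, `angular_resolution_deg` is 0.076, and `detection_error_mm` is 0.55.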
A stereo setup of these same cameras would allow for depth detection in the
image, however, a change in one step of the angular resolution in each camera
at the mid-point between the stereo pair would result in a change in depth of as
much as approximately 9.2mm, as seen in Figure 4.2. The darker square represents
a pixel closer in depth to the camera than the lighter square. This calculated error
does not account for extra error that would occur if the cameras do not have an
external sync.
Figure 4.2: Stereo camera setup showing difference in height for one step of angu-
lar resolution
As the error in depth calculation is so high in a stereo setup, particularly when
considering the small differences in height between each joint of the finger, a
monocular system is preferred. Using a monocular setup is cheaper and uses
less processing power to capture the raw images, hence making the system more
portable. Depth can be estimated from the 2D image reference markers as seen in
Section 4.5.
The raw images from the camera are captured through the open-source appli-
cation Coriander [10], which allows manipulation of the image parameters includ-
ing frame size, gain and packet size, and stores these images appended as a raw
video file. The raw video files are encoded using mencoder [4] and dumped into
an avi container with the video coded as lossless jpeg frames. This format is cho-
sen so that the videos obtained are compatible with the image processing library of
functions used to program the detection software, the Intel OpenCV library [11].
There is capability for the system to be real-time, as the OpenCV functions can also
grab images live from a connected camera. However, to avoid stressing the laptop
with high processing requirements during capture and to ensure the system is as
portable as possible, all image processing is done post-recording.
Once the markers have been tracked, the output data is stored as a GMS (Gesture
Motion Signal) file. The structure of these storage files was explained in
Section 3.4.2.
Figure 4.3: Placement of Hand Markers, Plotted as Yellow Dots
age Processing library and the bolt-on OpenCV blob extraction library [11]. The
software reads in the avi video files and processes them frame by frame. For the
first frame, the user is required to click on the markers in the order of the
hand model structure so that a reference frame can be stored before tracking
commences. Each frame is passed through colour threshold filters, yellow for the
left hand and cyan for the right hand. These two sets of binary images are then
submitted to the blob detection algorithm, which scans each raster image line by
line and records connected regions of similar colour. This process can be seen
starting from the captured image in Figure 4.4, which is passed through colour
thresholding for the yellow left hand markers to give the thresholded image in
Figure 4.5. This binary image is then submitted for blob detection, the results
of which are presented in Figure 4.6. The blob detection algorithm searches only
for blobs of a certain area in order to minimise noise. This process is repeated
for the right hand markers. Each detected set of blobs is stored in a C++ vector
to be compared with the coordinates of the detected markers from the previous
frame. A simple correlation algorithm determines which detected blobs are likely
to be the new position of each of the hand markers. On an average frame, the
thresholding and blob detection functions tend to split a marker, which has an
average size of 67 pixels, into two or three distinct blobs and calculate the
centre of each. It is this centre which is recorded as the blob’s location in
the frame. An extra function is included which calculates the distance between
each pair of registered blobs and combines blobs whose centres are less than 10
pixels apart. This counteracts the splitting of a marker into several blobs by
the previous functions. The error introduced by thresholding and blob detection
when calculating the centre of each blob in an average frame is one pixel in
both the x and y directions, i.e. 1.1mm in each direction. As the function that
calculates the centre of the blob discretises to approximately 1 pixel, the
worst case error is calculated by simply adding the blob and camera errors
together (1.1mm + 0.55mm). This gives a total error of 1.65mm.
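The blob detection and merging steps can be sketched as follows. This is an illustrative stand-in written in plain Python, not the OpenCV blob extraction library used by FingerDance: it assumes the frame has already been colour-thresholded into a binary mask, uses 4-connectivity, and merges blobs greedily with an area-weighted centre. The function names, the `min_area` parameter and the toy mask are assumptions; only the 10 pixel merging distance comes from the text.

```python
import math


def find_blobs(mask, min_area=1):
    """Label 4-connected regions of True pixels; return (cx, cy, area) per blob."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):              # scan the raster line by line
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:               # flood fill one connected region
                y, x = stack.pop()
                pixels.append((x, y))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if len(pixels) >= min_area:   # ignore regions too small to be a marker
                cx = sum(p[0] for p in pixels) / len(pixels)
                cy = sum(p[1] for p in pixels) / len(pixels)
                blobs.append((cx, cy, len(pixels)))
    return blobs


def merge_close_blobs(blobs, max_dist=10.0):
    """Combine blobs whose centres are closer than max_dist pixels."""
    merged = []
    for cx, cy, area in blobs:
        for i, (mx, my, marea) in enumerate(merged):
            if math.hypot(cx - mx, cy - my) < max_dist:
                total = marea + area
                merged[i] = ((mx * marea + cx * area) / total,
                             (my * marea + cy * area) / total, total)
                break
        else:
            merged.append((cx, cy, area))
    return merged


# Toy mask: one marker split into two fragments, plus a second distant marker.
mask = [[False] * 20 for _ in range(3)]
for col in (0, 1, 3, 4):
    mask[0][col] = True
for col in (18, 19):
    mask[2][col] = True

blobs = find_blobs(mask)
markers = merge_close_blobs(blobs)
```

In the toy mask three blobs are detected, and the two fragments of the split marker are recombined into a single centre, leaving two markers.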
Even at frame rates above 50 frames per second, pianists’ finger movements
are rapid enough to require further remedial action over and above the basic blob
Figure 4.7: Tracked Markers Results
4.3 Heuristics
These heuristics are programmed from a list of constraints, informed by
Rijpkema’s model of human hand constraints [103] with some additions to account
for the extra constraints in the context of piano performance.
Basic constraints that are incorporated into the program include the position in
x and y coordinates of each finger on each hand, where x is the distance along
the width of the keyboard and y is the distance from the top of the frame. The
distances between the base wrist points and each of the other markers can also
be used to group the markers into the metacarpophalangeal joints and the two
sets of interphalangeal joints. Two examples of basic constraints are therefore
as follows:
1. The distances between the metacarpophalangeal points and the wrists are
unlikely to be larger than the distances between the proximal interphalangeal
joints and the wrists. These are in turn unlikely to be larger than the
distances between the distal interphalangeal joints and the wrists. Using this
simple rule, the points can be easily separated into groups of joints. This rule
is set out in pseudo code in Algorithm 1, where i is the distance between each
detected marker and the nearest wrist marker.
pophalangeal points of the left hand’s first finger will be larger than the sec-
ond finger and so on for the third and fourth fingers. The opposite can be
considered true for the right hand. This rule for the left hand is set out in
pseudo code in Algorithm 2. For the group of detected markers, the x coordinates
are evaluated to order the group from highest to lowest value. The marker with
the highest value of x is removed from the original vector and put into another
vector, orderedgroup. This is repeated for the next highest value of x and so on
until all the markers have been put into the orderedgroup vector. The elements
of orderedgroup are then assigned to the first, second, third and fourth fingers
respectively.
3. The angle between the proximal interphalangeal and the metacarpophalangeal
joints is unlikely to differ greatly from the angle between the wrist point and
the metacarpophalangeal joint. The same rule is applied to the angle between the
distal interphalangeal and proximal interphalangeal joints. This rule is set out
in pseudo code in Algorithm 3. When the angle detected is larger than the
maximum angle, it is assumed that two adjacent markers have been wrongly
labelled and so their labels are swapped.
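The three rules can be sketched together for the left hand as follows. This is an illustrative reading of the constraints described above and does not reproduce Algorithms 1 to 3; the function names, the 40 degree maximum angle and the marker coordinates are hypothetical, and markers are assumed to be (x, y) pixel positions with a single wrist reference point.

```python
import math


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def group_by_wrist_distance(markers, wrist, fingers=4):
    """Rule 1: nearest markers to the wrist are MCP joints, then PIP, then DIP."""
    ordered = sorted(markers, key=lambda m: distance(m, wrist))
    return (ordered[:fingers], ordered[fingers:2 * fingers],
            ordered[2 * fingers:3 * fingers])


def order_left_hand(group):
    """Rule 2 (left hand): the highest x belongs to the first finger, and so on."""
    return sorted(group, key=lambda m: m[0], reverse=True)


def direction(a, b):
    """Direction of the segment a -> b in degrees."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))


def bend(a, b, c):
    """Absolute change of direction between segments a -> b and b -> c."""
    change = direction(b, c) - direction(a, b)
    return abs((change + 180.0) % 360.0 - 180.0)


def fix_swapped_labels(wrist, inner, outer, max_angle=40.0):
    """Rule 3: if a finger bends implausibly, swap its label with its neighbour."""
    outer = list(outer)
    for i in range(len(outer) - 1):
        if bend(wrist, inner[i], outer[i]) > max_angle:
            outer[i], outer[i + 1] = outer[i + 1], outer[i]
    return outer


# Hypothetical unlabelled detections for four fingers of the left hand.
wrist = (50, 100)
detected = [(30, 45), (65, 70), (58, 30), (45, 70), (70, 45), (28, 30),
            (55, 70), (43, 45), (72, 30), (35, 70), (57, 45), (42, 30)]

mcp, pip, dip = group_by_wrist_distance(detected, wrist)
mcp, pip, dip = order_left_hand(mcp), order_left_hand(pip), order_left_hand(dip)

# Simulate a tracking error that exchanges two adjacent PIP labels.
mislabelled = [pip[1], pip[0], pip[2], pip[3]]
repaired = fix_swapped_labels(wrist, mcp, mislabelled)
```

In this example the exchanged labels produce a direction change of about 44 degrees at the first finger, which exceeds the assumed maximum, so the two labels are swapped back.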
The benefits of these constraints on the tracking system were calculated by
assessing the percentage of markers correctly identified in a series of three
frames at a few different points within the test video. The test video was taken
from one of the performance videos recorded in the experiment in Chapter 9.
These benefits were assessed at three different levels. The first was a basic
system using only blob detection; the second was an improved system which
incorporated basic heuristics to improve the rate of tracking; the third was a
more advanced system using the full set of heuristics and blob tracking. Results
show that the basic system has a tracking accuracy of 63%. Relative to this, the
improved system has a 23% increase in accuracy whilst the final system has a 40%
increase (63% × 1.40 ≈ 88%), bringing the total tracking accuracy to
approximately 88%.
This accuracy was judged for when all points were available to track and not
occluded from view as can sometimes happen in piano performance. The estima-
tion of occluded points is dealt with in the next section.
4.4 Occlusion
A significant difficulty in hand tracking is occlusion. This happens regularly
in piano performance, where the pitch ranges of the notes for the two hands
overlap, or in passages that require fingering patterns which place the thumb
underneath the other fingers. The software can estimate the position of any
“lost” markers by calculating the average transformation between frames of the
other points in the point set. The affine transforms of each marker are
determined using the scaling, rotation and translation matrices below:
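\[
\begin{bmatrix} X_{scaled} \\ Y_{scaled} \\ 1 \end{bmatrix} =
\begin{bmatrix} Scale_x & 0 & 0 \\ 0 & Scale_y & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
\]
\[
\begin{bmatrix} X_{rotated} \\ Y_{rotated} \\ 1 \end{bmatrix} =
\begin{bmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
\]
\[
\begin{bmatrix} X_{translated} \\ Y_{translated} \\ 1 \end{bmatrix} =
\begin{bmatrix} 1 & 0 & D_x \\ 0 & 1 & D_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
\]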
This system requires the full detection of all markers in a test frame before
tracking can begin, as the estimation algorithm calculates the new position based
on the marker’s last tracked position. Future work will calculate the motion vec-
tors of each point, so that the software can predict occlusion and estimate the lost
marker’s position using the transformation matrices above.
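The use of these matrices can be sketched in homogeneous coordinates. The matrix constructors below follow the three forms given above; the estimation function is a deliberately simplified assumption that uses only the average translation of the visible markers, whereas the software's own estimation also draws on scaling and rotation. All function names are introduced here for illustration.

```python
import math


def scaling(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def translation(dx, dy):
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices; matmul(a, b) applies b first, then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transform(matrix, point):
    """Apply a 3x3 homogeneous transform to a 2D point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


def estimate_lost_marker(previous, current, last_position):
    """Move the lost marker by the average displacement of the visible markers.

    `previous` and `current` hold the same visible markers in two consecutive
    frames; `last_position` is where the lost marker was last tracked.
    """
    n = len(previous)
    dx = sum(c[0] - p[0] for p, c in zip(previous, current)) / n
    dy = sum(c[1] - p[1] for p, c in zip(previous, current)) / n
    return transform(translation(dx, dy), last_position)


# Two visible markers both move by (2, 1); the occluded marker is assumed to follow.
estimate = estimate_lost_marker([(0, 0), (10, 0)], [(2, 1), (12, 1)], (5, 5))
```

Because the estimate starts from the marker's last tracked position, it depends on every marker having been detected at least once, which is the requirement stated above.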
A unique advantage of this software is that it allows a high degree of user in-
tervention, so that any wrongly assigned markers can be corrected, and estimation
points can be approved or changed. The software also has functions to allow the
re-opening of existing files, allowing users to go back and change stored values.
Having tracked and estimated the positions of all the markers, we can now
consider estimating the depth of each marker.
4.5 3D estimation
By using monocular images to track movement, the z position of the markers has
to be calculated from reference points in the 2D image. 3D images could be
captured by a stereo camera array; however, the resolution for two cameras at
83cm above the keyboard does not improve sufficiently to justify the extra
expense of another camera or the computational load of capturing raw images from
a second camera in synchrony. In an effort to produce a stable system that is
cheap, portable and accurate, only one camera is used. However, the
disadvantages of such a system arise when the exact angles of the fingers must
be measured for any purpose that cannot settle for an estimation of the z axis.
The hand model for the pianists has been designed with several reference mark-
ers on the base of the hand to allow 3D estimation. Calculating a range of distances
between the markers of the models, the z axis can be estimated by examining the
difference of these distances between frames. The distances calculated are seen
in Figure 4.8. Although these distances will be different for each person anatomi-
cally, as long as an initial frame is recorded that contains both hands laid flat on the
keyboard, the z axis can be accurately estimated through the use of trigonometry.
Distance A is calculated between the two base wrist points, distance B is cal-
culated between the left base wrist point and the first finger metacarpophalangeal
point. Distance C is calculated between the right base wrist point and the fourth
finger metacarpophalangeal point and distance D is calculated between the first
and fourth metacarpophalangeal points. The distance from the thumb metacar-
pophalangeal point and the left base wrist point is distance H. Distances Fthumb
and F1 to F4 are calculated for each finger as the distance between its metacar-
pophalangeal and proximal points. Distances G1 to G4 are calculated for each
finger as the distance between its distal and proximal points.
Figure 4.9: Changes in Hand Distances for Different Orientations. Image (a)
shows the hand distances as a reference frame, (b) shows the hand moving away
from the camera, (c) shows the hand tilting away from the keyboard, (d) shows
the hand tilting towards the keyboard, (e) shows the hand tilting to the right
and (f) shows the hand tilting to the left.

Considering the view of the camera, we can consider how these distances change
with changes in depth, as seen in Figure 4.9. The first image shows the four
distances A, B, C and D at a flat level. As the hand is level and approaches the
camera, i.e. rises away from the keyboard, distances A, B, C and D will all
increase. This is viewed in image (b). Equally, as the hand is level and moves
away from the camera, i.e. towards the keyboard, these distances will decrease.
As the hand tilts forward and the wrist rises towards the camera, distance A
increases whilst all other distances decrease. This is seen in image (c), with
the opposite seen in image (d). As the hand tilts to the right, distances A, B
and D will decrease; however, distance C will either increase or stay the same.
This is presented in image (e). As the hand tilts to the left, distances A, C
and D will decrease; however, distance B will either increase or stay the same.
This is presented in image (f). For each of the fingers, distances F and G will
decrease as the finger is curved and increase as it is flattened. Considering
the thumb separately, distance H will increase as the thumb moves towards the
camera, and decrease as it moves towards the keyboard. Using these observations,
estimations of depth for each joint can be devised as follows. As the hand has
several degrees of freedom, depth for the wrist and metacarpophalangeal joints
is calculated by using the average of the nearest two applicable distances in
the x and y directions. This then accounts for tilt in the x and y directions.
For all depth estimations, the new measurements of distance are compared to the
initial frame zero:
Metacarpophalangeal joints
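Wrist positions
Left wrist:
\[
z = f \times \frac{\tan\left(\dfrac{A_{t=0} + B_{t=0}}{2} \times \dfrac{\theta}{2}\right)_{p}}{\dfrac{A_{t} + B_{t}}{2}} \tag{4.7}
\]
Right wrist:
\[
z = f \times \frac{\tan\left(\dfrac{A_{t=0} + C_{t=0}}{2} \times \dfrac{\theta}{2}\right)_{p}}{\dfrac{A_{t} + C_{t}}{2}} \tag{4.8}
\]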
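The principle of comparing current distances with those of the initial frame can be sketched under a simple pinhole assumption. The sketch below is not a restatement of Equations 4.7 and 4.8: it is an independent illustration which assumes that the hand stays level, that every pixel subtends the angular resolution of Equation 4.1, and that the reference frame is taken with the hand at keyboard level, 830mm from the camera. The function and constant names are hypothetical.

```python
import math

CAMERA_HEIGHT_MM = 830.0
# Angle subtended by one pixel (Equation 4.1), in radians.
ANGLE_PER_PIXEL = math.atan(1.1 / CAMERA_HEIGHT_MM)


def camera_distance(ref_a, ref_b, cur_a, cur_b,
                    f=CAMERA_HEIGHT_MM, theta=ANGLE_PER_PIXEL):
    """Estimate the distance of a joint from the camera in millimetres.

    ref_a, ref_b: two marker distances (pixels) in the flat reference frame.
    cur_a, cur_b: the same two distances (pixels) in the current frame.
    A physical length that looks larger in the image must be nearer the camera.
    """
    ref = (ref_a + ref_b) / 2.0
    cur = (cur_a + cur_b) / 2.0
    return f * math.tan(ref * theta / 2.0) / math.tan(cur * theta / 2.0)


unchanged = camera_distance(100, 100, 100, 100)   # same apparent size
closer = camera_distance(100, 100, 200, 200)      # appears twice as large
further = camera_distance(100, 100, 50, 50)       # appears half as large
```

With unchanged distances the estimate stays at the camera height, and a hand that appears twice as large is placed at roughly half that distance, in agreement with the observation above that the distances increase as the hand approaches the camera.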
4.6 Storage
Once 3D estimation is completed, the tracked information is stored in GMS files
(see Section 3.4.2) which are structured in scenes, units, channels and tracks. For
purposes of the FingerDance software, each scene consists of two units corre-
sponding to each separate hand. Each unit then consists of 16 channels which
represent the 16 markers on each hand. Each channel consists of three tracks to
store the (x,y,z) coordinate of the marker, as required by the GMS file. This means
that the retrieved geometrical data from the image processing software needs to
be arranged in the same format to be read in to the GMS file. The data for each
frame is stored as a list of numbers with the offset element number for each track
recorded. When reading the GMS files, the offsets are used for each frame to locate
the correct marker position.
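The arrangement of units, channels and tracks can be sketched as a flat list with computed offsets. The layout below is a hypothetical illustration of the idea and is not the binary GMS format itself; the ordering of values within a frame and the function names are assumptions.

```python
UNITS = 2       # one unit per hand
CHANNELS = 16   # one channel per marker
TRACKS = 3      # x, y and z coordinate of each marker


def track_offset(unit, channel, track):
    """Index of one coordinate within a flattened frame (assumed ordering)."""
    return (unit * CHANNELS + channel) * TRACKS + track


def pack_frame(hands):
    """Flatten hands[unit][channel] = (x, y, z) into one list of numbers."""
    frame = [0.0] * (UNITS * CHANNELS * TRACKS)
    for unit, markers in enumerate(hands):
        for channel, coords in enumerate(markers):
            for track, value in enumerate(coords):
                frame[track_offset(unit, channel, track)] = value
    return frame


def read_marker(frame, unit, channel):
    """Recover the (x, y, z) position of one marker using its offset."""
    start = track_offset(unit, channel, 0)
    return tuple(frame[start:start + TRACKS])


# Toy frame: every marker at the origin except marker 5 of the right hand.
hands = [[(0.0, 0.0, 0.0)] * CHANNELS for _ in range(UNITS)]
hands[1][5] = (120.0, 44.0, 3.5)
frame = pack_frame(hands)
```

Each frame therefore holds 2 × 16 × 3 = 96 numbers, and a marker is located by offset rather than by searching.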
on more restrictive constraints like those of Guan et al. [58]. These constraints are
also based on Rijpkema’s model but define a set of relationships between the an-
gles of each finger. Occlusion could be improved by also calculating the velocity
and direction of each point as it reaches occlusion to better estimate the correct
position. Finally, 3D estimation can be improved by deriving a stronger algorithm
that incorporates the angular relationships between each finger much like the im-
provements that can be made to the heuristics.
4.8 Applications
In conclusion, a motion capture system has been described that is cheap,
portable, accurate and unintrusive to performance. It is specifically designed
to track finger motion in piano performance and also allows a great deal of user
control over its estimation algorithms.
In its current version, this software can be used for a variety of purposes. Be-
ing able to track accurate positional information of the hands in piano performance
can help to answer pedagogical questions on hand movement, identifying expres-
sive movements and note accents. Investigating how finger curvature affects the
acoustic sound in amplitude and in timbre is also possible by analysing the dis-
tances between the joint markers.
Future extensions for the software include incorporating models for other types
of musical performance e.g. guitar playing and also being able to track fingering
patterns by storing the position of the keys.
Chapter 5
This chapter describes the design of two full multi-modal capture systems us-
ing some of the commercially available capture technologies described in Chap-
ter 3, as well as the specially designed finger motion capture system described in
Chapter 4. These two different systems are required due to differing needs in mo-
tion capture. The Vicon incorporated system captures full upper body movement
whilst the FingerDance incorporated system captures intricate measures of finger
movement.
The two systems also demonstrate a number of advantages of using each type
of motion capture technology. The Vicon system is entirely stationary and has
been used solely within the University of Glasgow Psychology Department. The
FingerDance system, however, is entirely portable, fitting on top of any 88-key
piano, and has been used at the University of Glasgow, Napier University, the
Royal Northern College of Music, Manchester and the Royal College of Music,
London.
Self-reporting is included as part of the methodology for both systems, taking
place immediately after the recordings. This enables the capture of each pianist’s
thoughts on their performance, to be used as extra information to inform future
data analysis.
metres. Retro-reflective markers are attached to the subject either directly onto the
skin or applied with velcro to a specialised jacket and cap. Using triangulation,
the system records accurate 3D positions of each marker at a rate of 120 frames
per second. One of the limitations of using the Vicon system is that it is com-
pletely stationary and therefore, only keyboard instruments that are portable into
the capture volume can be used. When recording performances, the pianists will
play on a Roland RD-150 weighted keyboard.
Audio is amplified from the keyboard via a Peavey KA/6 Keyboard Amplifier,
and is recorded into a laptop computer via the Tascam US122 Audio Interface.
This same audio is sent to the analogue card of the Vicon master computer in
synchrony with the motion capture recordings. These two audio recordings are used
to synchronise the MIDI recordings with the motion capture data.
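One common way to align two recordings of the same audio is to find the lag that maximises their cross-correlation. The sketch below illustrates that idea only; the text does not state which alignment method was actually used, and the function name `best_lag`, the toy signals and the sample rate are all assumptions made for illustration.

```python
def best_lag(ref, other, max_lag):
    """Return the lag (in samples) by which `other` trails `ref`.

    Brute-force cross-correlation: for each candidate lag, sum the
    products of overlapping samples and keep the lag with the largest sum.
    """
    def score(lag):
        total = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                total += r * other[j]
        return total

    return max(range(-max_lag, max_lag + 1), key=score)

# Toy signals: the same pulse, arriving three samples later in `other`.
ref = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
other = [0, 0, 0, 0, 0, 1, 3, 1, 0, 0]

SAMPLE_RATE = 44100  # assumed sample rate, not taken from the text
lag = best_lag(ref, other, max_lag=5)
offset_seconds = lag / SAMPLE_RATE
```

The resulting offset would then be applied to the MIDI timestamps so that they share a time base with the motion capture frames.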
The MIDI out jack on the keyboard allows us to capture MIDI directly. This is
transported to the computer via the Tascam audio interface. The Jack Audio Connection Kit
is used to transport audio from the driver to the application Ardour [1] and also
to transport the MIDI data to the MIDI sequencer Rosegarden [8]. Jack also allows
synchronisation between the audio recording workstation and the MIDI recording
software.
To retain a record of the images of the performance, a separate video is recorded
by a Sony Handycam video camera placed in an ’audience perspective’. Figure 5.2
shows the setup for this system through the view of the ’audience perspective’
video.
This system will be used to record audio and MIDI as well as capturing full
body motion of the pianists to answer particular questions on the relationships
Figure 5.2: Audience Perspective of Vicon System Recordings
Figure 5.3: System Architecture for FingerDance Incorporated System
designed apparatus with stands at either end of the keyboard. This apparatus is
seen in Figure 5.4. Figure 5.4(a) presents a side view of the apparatus, showing
how the light is suspended over the keyboard, whilst Figure 5.4(b) shows the
construction of the adjustable poles at either side of the apparatus, allowing the light
to be suspended at heights from 122.5 cm up to 189 cm. Figure 5.4(c) shows how this
apparatus is then placed in front of a concert grand piano. When in use, the appa-
ratus is moved so that either side of the stand sits just in front of the keyboard.
The full configuration of the system along with two photographic lights and
diffuser umbrellas providing normal lighting is shown in Figure 5.5.
Audio is recorded through a Beyerdynamic MCE82N(C) stereo condenser mi-
crophone placed a few feet from the open lid of the grand piano. This is connected
via a balanced XLR lead through the Tascam USB audio interface to laptop com-
puter A. Audio is transported from the driver to the audio application Ardour via
the Jack Audio Connection Kit, which also provides synchronisation between the MIDI
sequencing software and the audio recording software.
MIDI is recorded through the Moog Piano Bar, also via the Tascam USB audio
interface to laptop computer A. The two sensors that make up the Piano Bar
are connected to the control module, which converts the signals into MIDI
protocol. The Moog bar must be calibrated against the piano on which it is placed, with
lights above each of the keys indicating whether the bar is sitting too high above
or too close to the keys.
A Sony Handycam video camera is set up on a tripod with full view of the
performer and the piano to record an ’audience view’ of the performance.
(a) Side view of the whole apparatus
(b) Enlarged view of the adjustable poles
(c) Front view of UV apparatus in place in front of the piano
Figure 5.6: Audience Perspective of FingerDance system recordings
This system will be used to record audio, MIDI and motion of the pianists’ fin-
gers to enable us to answer questions on the relationships between finger move-
ment and acoustic sound, as well as their relationship with musical structure.
ferent rule set to perform traditional segmentation instead of marking down the
segmentation they had performed [45].
A general open interview on the performer’s views of motion in performance
takes place after the audiovisual segmentation exercise. The basic questions that
are asked of each performer are:
• How do you express structural features like the ones you have marked on
the score?
Results from these interviews can help in interpreting the numerical perfor-
mance analysis both in the motion differences between performers as well as the
segmentation of the pieces of music.
Chapter 6
6.1.1 PML
Performance Markup Language (PML), developed by Douglas McGilvray at the
Centre for Music Technology in the University of Glasgow [82], was specifically
designed to accommodate the mark-up of performance information alongside the
score. PML is a specification which can be used to extend XML-based score
representations such as MusicXML. Analytical, performance and score information
are separated into different hierarchies. Since MEI represents these domains in a
single hierarchy, which is based on the requirements of the features of the musical
score, it is a less elegant solution for the representation of other data which
may be non-isomorphic with the score. For example, one would not expect the
repeated portion of a da capo aria to be performed the same way the second time.
The performance data in a PML file is stored at the end of the MusicXML note list
and IDs link aligned performed notes to score notes. This allows more than one
performance note to be aligned to one score note.
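To illustrate how these ID links can be read back, the following sketch groups performed events by the score note to which they are aligned. It is an illustration only: the element and attribute names follow the PML fragments shown in this chapter, but the third event (`pnote3`) and the helper name `events_by_score_note` are hypothetical, added to show two performed notes sharing one score note.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Small, closed PML-style fragment. pnote3 is invented for illustration.
PML_FRAGMENT = """
<pml>
  <performance>
    <perfpart part="P1">
      <event id="pnote1"><onset>4.90729</onset><end>5.79063</end>
        <midi>64</midi><velocity>36</velocity>
        <align note="note3">correct</align></event>
      <event id="pnote2"><onset>5.65937</onset><end>5.91979</end>
        <midi>40</midi><velocity>26</velocity>
        <align note="note13">correct</align></event>
      <event id="pnote3"><onset>5.70000</onset><end>5.90000</end>
        <midi>40</midi><velocity>20</velocity>
        <align note="note13">correct</align></event>
    </perfpart>
  </performance>
</pml>
"""

def events_by_score_note(pml_text):
    """Map each score note ID to the (event ID, onset) pairs aligned to it."""
    root = ET.fromstring(pml_text.strip())
    grouped = defaultdict(list)
    for event in root.iter("event"):
        align = event.find("align")
        if align is None:
            continue  # unaligned performance event
        grouped[align.get("note")].append(
            (event.get("id"), float(event.findtext("onset")))
        )
    return dict(grouped)

alignments = events_by_score_note(PML_FRAGMENT)
```

Here `alignments["note13"]` holds two performed events, which is the one-to-many case described above.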
Conversion into PML begins with MusicXML versions of the musical score and
MIDI performance recordings. Several steps are taken to store the separate files of
information and create links between the score and performance data. This in-
cludes a matching algorithm which uses Dynamic Time Warping to find the opti-
mal mapping between score and performance.
• musicxml2pml - The MusicXML file of the score is converted into the struc-
ture of a PML file.
The PML file at this point shows the two separate hierarchies for score notes and
performance notes. This can be seen in the file fragments in Figure 6.1.
The PML file at this point now contains links in the performance part which
identify the score note with which each performed note is associated. This can be seen in the
code fragment in Figure 6.2.
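The matching step can be illustrated with a basic Dynamic Time Warping routine. This is a sketch under stated assumptions rather than the matcher used in the PML tools: the local cost (absolute difference in MIDI pitch), the function name `dtw_path` and the example pitch sequences are all chosen here for illustration.

```python
def dtw_path(score, perf):
    """Align two pitch sequences with classic Dynamic Time Warping.

    Returns the total alignment cost and the list of (score index,
    performance index) pairs on the optimal path.
    """
    n, m = len(score), len(perf)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(score[i - 1] - perf[j - 1])  # assumed cost function
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

score_pitches = [64, 66, 68, 69]
perf_pitches = [64, 66, 66, 68, 69]  # the performer sounds one note twice
total_cost, path = dtw_path(score_pitches, perf_pitches)
```

In this example the second score note is paired with two performed notes, which is the situation the `align` links are designed to record.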
Other formats of performance data can be added such as audio files. Function-
ality for adding different gesture formats is currently in development.
<pml>
<score-partwise>
<work>
</work>
<identification>
</identification>
<part-list>
.
.
.
</part-list>
<part id="P1">
<measure number="1">
<print>
<staff-layout>
<number>2</number>
<staff-distance>70</staff-distance>
</staff-layout>
</print>
<attributes>
<divisions>8</divisions>
<key>
<fifths>3</fifths>
</key>
<time>
<beats>3</beats>
<beat-type>4</beat-type>
</time>
<clef number="1">
<sign>G</sign>
<line>2</line>
</clef>
<staves>2</staves>
<clef number="2">
<sign>F</sign>
<line>4</line>
</clef> </attributes>
<note id="note1">
<rest/>
<duration>8</duration>
<voice>1</voice>
<type>quarter</type>
<staff>1</staff>
<starttime>0</starttime>
</note>
.
.
.
<barline location="right">
<bar-style>light-heavy</bar-style>
</barline> </measure>
</part>
</score-partwise>
<performance>
<perfpart part="P1">
<event id="pnote1">
<onset>4.90729</onset>
<end>5.79063</end>
<midi>64</midi>
<velocity>36</velocity>
</event>
.
.
.
<performance>
<perfpart part="P1">
<event id="pnote1">
<onset>4.90729</onset>
<end>5.79063</end>
<midi>64</midi>
<velocity>36</velocity>
<align note="note3">correct</align></event>
<event id="pnote2">
<onset>5.65937</onset>
<end>5.91979</end>
<midi>40</midi>
<velocity>26</velocity>
<align note="note13">correct</align></event>
in the score and showing the inter-onset interval information for the performance
of these. After the query is sent to the database, a document is created and then
populated with the results of the query using the LilyPond typesetter [3]. The results
of this query are shown in Figure 6.3.
Since this technology allows easy comparison of different performance values
with the notated musical notes, both within and between performances, it will be
used in the experiments in Part III.
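As a small worked example of the kind of value such a query returns, inter-onset intervals (IOIs) can be computed directly from performed onset times. The sketch below is illustrative only: the first two onsets are taken from the PML fragments shown earlier, the third onset is hypothetical, and the function name is an assumption.

```python
def inter_onset_intervals(onsets):
    """Return the time between successive note onsets, in seconds."""
    ordered = sorted(onsets)
    return [round(later - earlier, 5)
            for earlier, later in zip(ordered, ordered[1:])]

# First two onsets from the PML fragments; the third is invented.
onsets = [4.90729, 5.65937, 5.85937]
iois = inter_onset_intervals(onsets)
```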
2
1+M3 1+M2 1+m6 1+P5 1+m3 1+D5 1+M3 1+P5 1+m3 1+M2 0+m7 0+m6 0+A4 0+M6 0+P5 0+M7
0.21 0.18 0.20 0.19 0.18 0.19 0.21 0.19 0.21 0.18 0.20 0.19 0.16 0.22 0.22 0.16
1+M3 1+m6 1+m3 1+M3 1+m3 0+m7 0+A4
0.40 0.40 0.35 0.40 0.38 0.34 0.44
5
0+M6 0+m7 1+P4 2+P1 1+M3 1+M2 1+m6 1+P5 1+M3 1+P5 1+M3 1+P5
0.20 0.38 0.36 0.64 0.18 0.16 0.25 0.16 0.25 0.19 0.14 0.25
1+P5 0+m7 0+m6 1+P4 1+m6 1+D5 2+P1 1+M3 1+m6 1+M3 1+M3
0.39 0.40 0.35 0.22 0.12 0.24 0.20 0.16 0.21
1+m3 1+P5 1+m3 1+D5 1+M3 1+P5 1+m6 2+P1 2+m3 2+P1 2+m2 2+m3 2+P5 1+M6 2+P5 2+P4
0.21 0.16 0.21 0.17 0.20 0.21 0.19 0.19 0.21 0.22 0.20 0.21 0.44 0.21
1+m3 1+m3 1+M3 1+m6 2+m3 2+P5 1+M6 2+P5
0.24 0.35 0.43 0.43 0.44 0.37
Figure 6.3: Example of a database-produced result for a query on dissonant notes and
IOIs
Part III
This part describes experiments that have been conducted using the method-
ologies and tools explained in Part II. These experiments have been designed in
order to answer the musicological questions posed in the introductory chapter
of how to elucidate musical structure from multi-modal performance data. This
also explores how physical gestures ranging from large scale body movements
to intricate finger movements align with the performer’s interpretive choices and
whether these can be used as indicators of structural features.
The first set of experiments explores the relationship between general body
movement and phrasing structure, and uses this in tandem with audible parameters
to examine the multi-modal changes taking place at these structural boundaries.
The second experiment uses these relationships to discover structural features
where there is a certain ambiguity in traditional score-based analyses. The
musical compositions performed by the pianists in each experiment are chosen
specifically to expose these relationships between performance and score in a West-
ern classical music context.
Chapter 7
Musical Stimuli
Three Chopin pieces are used as the musical stimuli for these experiments: Prelude
in A major Op.28 No.7, Prelude in B minor Op.28 No.6 and the finale movement
of the Sonata in B flat minor Op.35. The two preludes come from the same Op.28
set, which is standard repertoire for pianists. A number of analyses of the
preludes also exist, and they tend to produce coinciding views on their structure.
These are therefore ideal pieces with which to explore the roles of aural and visual
parameters in conveying structure. The finale of the sonata, however, is a piece that
can encourage completely divergent views on its structure. For this reason, it is used
as a test piece for using performance parameters to discover musical structure.
In both sets of experiments detailed in Part III, the Prelude in A major No.7 is
used as a control piece. This chapter provides traditional analyses of each piece
from which the investigations into ‘performed’ structure can proceed.
piece is used as a control piece in each of the experiments in Part III.
As mentioned in Chapter 2, Bisesi and Parncutt’s accent analysis of this Prelude
is included for reference as Figure 7.2.
Figure 7.1: Phrasing Analysis of Chopin’s A major Prelude op.28 No.7, with blue
marks for sectional boundaries and red marks for phrase groupings
Figure 7.2: Bisesi and Parncutt’s Accent Analysis of Chopin’s A major Prelude
op.28 No.7. Taken from Erica Bisesi and Richard Parncutt, Private Communica-
tion. This figure represents a preliminary stage of the analysis by the authors and
has been presented by Erica Bisesi at the Opening Ceremony of the Centre for Sys-
tematic Musicology - University of Graz, held on 15th October 2009, and is part
of her Lise Meitner Research Project M 1186-N23 sponsored by FWF, Austria. Per-
mission to reproduce this figure has been granted by the authors.
7.2 Chopin’s Prelude in B minor op.28 No.6
Prelude No.6 in B minor can be segmented into three sections from bars 1-8, bars 9-
22 and a coda section from bars 23-26. In the first section we see the representation
of an ‘extended idea’. As seen in Figure 7.3, Chopin begins with a two-bar motif in
B minor. This motif is repeated with a slightly higher pitch range in the next two
bars. The first part of the motif is repeated again for a third time and then expands
into a four bar phrase ending at bar 8, the first sectional boundary. The second
section represents an expansion of this idea. At bar 9, the original two-bar motif
is repeated with the next expansion moving into C major. A new four bar phrase
is introduced at bar 15, answered by the consequent four bar phrase arriving at
the tonic at bar 22, producing the second sectional boundary. The piece concludes
with a short coda in B minor in its final phrase.¹
Again, as mentioned in Chapter 2, Bisesi and Parncutt’s accent analysis of this
Prelude is included for reference as Figure 7.4.
In the experiments in Chapter 8, this piece is used in combination with the con-
trol piece to examine how visual gestures relate to phrasing boundaries, and also
how tempo, dynamics and motion patterns can be used to detect musical struc-
ture. The first three phrases of this Prelude show an extension of the original two-
bar phrase. This can be compared structurally against the rhythmically repeating
two-bar phrases of Prelude 7 in A major.
¹ This analysis of Chopin’s Prelude Op.28 No.6 is combined from V. Kofi Agawu, ‘Concepts of
Closure and Chopin’s Opus 28’, Music Theory Spectrum 9:1-17, 1987 [65], and comments made by
Jennifer MacRitchie, University of Glasgow, and David Lewis and Christophe Rhodes, Goldsmiths,
University of London.
Figure 7.3: Phrasing Analysis of Chopin’s B minor Prelude op.28 No.6 with blue
marks for sectional boundaries and red marks for phrase groupings
Figure 7.4: Bisesi and Parncutt’s Accent Analysis of Chopin’s B minor Prelude
op.28 No.6. This figure is taken from Bisesi and Parncutt (2010), An Accent-Based
Approach to Automatic Rendering of Piano Performance [15]. This figure is repro-
duced with the authors’ permission.
7.3 Chopin’s B Flat Minor Sonata op.35 Finale Movement
The finale of Chopin’s B Flat Minor Piano Sonata Op.35, the first 8 bars of which
can be seen in Figure 7.5, has been referred to as "a wild child, unique and well-
nigh indescribable"[116]. A short piece written for the most part in octaves, this
rhythmically unrelenting and binary sonata form composition has confounded tra-
ditional approaches to its analysis.
The existing written literature on this particular piece is very sparse with com-
ments being both anecdotal and impressionistic, probably due to the problematic
nature of the composition. Only Charles Rosen [106] has written an extensive es-
say and most of his statements are very non-committal, even though his authority
as a pianist prompts us to take them seriously. For our purposes, this problem-
atic nature of the work makes the data more suitable for objective, quantitative
methods.
Rosen’s analysis of the piece sets the first four bars as the introduction in the
dominant key of B flat minor, with the chromatic main theme entering in bar 5.
After bar 8, there is a transition section where the harmony of the chromaticism
gradually defines the dominant of the relative major key. A new theme set in D flat
major enters at bar 23 and is repeated an octave higher at bar 27. The recapitulation
begins at bar 39 by literally repeating the first eight bars of the composition and
then expanding the recapitulation of the following bars with parts of the transition
and the second theme, moving towards a cadence.
Another viewpoint on the segmentation of this piece comes from Michael Talbot
[115]. His segmentation of the finale is seen in Table 7.1. Contrary to Rosen’s
view that the first four bars are set as an introduction, Talbot determines the first
eight bars as the first theme.
Further different analyses are summarised by Lindstedt’s work on segmenting
the finale using computer analysis [74]. One of the first arguable points is the entry
of the first theme and establishing whether the first four bars are an introduction.
These traditional analyses are taken as a starting point in the following investiga-
tion in Chapter 9. From examining patterns of tempo, dynamics and motion at
phrasing boundaries in the control piece, the performer’s interpretation of struc-
ture in the finale can be highlighted and points of agreement and departure across
performers can be examined.
Features of this piece which make it ideal for computational analysis are its
constant rhythms, as every single bar except the final few consists of twelve quavers.
Any differences in rhythm will therefore be entirely due to the performer’s
manipulation of inter-onset intervals, keypress durations and so on. The right-hand
melody is also perfectly replicated an octave below in the left hand, and so chord
separation and melody lead are not an issue.
Figure 7.5: Chopin’s B flat minor sonata op.35 finale movement measures 1-8
As previously stated, all pieces of music analysed in this Chapter will be used
in combinations for experiments in Chapters 8 and 9.
Table 7.1: Talbot’s Analysis of Chopin’s B Flat minor Sonata Finale Movement Op.35
Bars     Key          Description               Comment
1-8      b flat       first theme               establishing tonic
9-22     modulating   transition                chromatically unstable
23-30    d flat       second theme              diatonic
31-38    modulating   retransition              sequential progressions
39-46    b flat       first theme               reprise of bars 1-8
47-56    modulating   transition/second theme   based on bars 9-30
57-75    b flat       coda                      largely diatonic
Chapter 8
expand the duration of notes in later phrases to form a different rhythm. These are
Chopin’s Prelude in A major Op.28 No.7 and Chopin’s Prelude in B minor Op.28
No.6 respectively whose analyses can be seen in Chapter 7. The first piece is used
as a scientific control. These pieces satisfy a number of criteria:
• Short pieces are preferred as the Vicon motion capture system works opti-
mally with short recordings.
• The genre of the music may have an effect on the musical gestures used to
express the performer’s interpretation and so Western romantic style pieces
are used.
• The Chopin Preludes Op.28 are a widely known and performed set of
repertoire with many analyses and recordings available. The structure of
these preludes is quite clear, with the existing analyses widely agreeing.
Differences in interpretation therefore arise from the hierarchical importance of
the boundaries and not the position of the phrasing boundaries themselves.
Figure 8.1: Upper body model markers
Figure 8.2: Upper body model Marker Definitions
Hypothesis 8.1 Regardless of the subjective and personal nature of physical gesture
in relation to musical structure, there will exist an underlying pat-
tern that is related to phrasing and is common across all performers.
Hypothesis 8.2 The underlying motion profile of the performer related to phrasing
will be the same across pieces.
Examination of these parameters will then consider the actual values, in par-
ticular the maxima and minima:
Hypothesis 8.4 Where combinations of global maxima and minima occur in both
aural and visual streams of data, these will be related to the most
important structural features of the composition.
First I will examine how gestures relate to the phrasing structure of each piece
to establish that there is a relationship between movement and structure. I will
then consider the multi-modal parameters to examine how visual gestures and
aural gestures interact all within the context of the phrasing. This analysis is an
extension of the analysis performed in [78, 77].
Principal Components Analysis allows us to retrieve a comparable motion norm
for each performer. It can also calculate details of relationships between the several
markers. If all marker trajectories are similar to each other, only one significant
principal component will emerge. The percentage of variance explained by the first
PC shows whether there is commonality between the patterns of motion in each
marker. A high percentage indicates high commonality: for example, 64% shows
considerable commonality between markers but still leaves some room for
alternative patterns. We can then see how each marker correlates with these
principal components by looking at the loadings, which are a measure of
correlation between each marker and the PCs. If any markers appear to be leading
the motion of the rest of the body, we can expect high loadings for a few markers
and low loadings for the rest. Each principal component may be considered a
‘motion profile’, and so calculating a weighted sum of the components gives us a
better estimate of overall motion. Reduced-dimension curves such as these are good at expressing a general
overview but inevitably lose some semantics of the actual movement being per-
formed and so after considering PCA results for each performer, each individual
marker is then also examined for reference to phrases, measures and beats.
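The idea of a dominant first component can be illustrated with a small, dependency-free sketch. It uses power iteration on the covariance matrix to recover only the first principal component, which is a simplification of a full PCA and is not the implementation used for the results in this chapter. The marker data are invented, and the returned coefficient vector is used here as a stand-in for the loadings.

```python
import math

def first_principal_component(frames, iterations=200):
    """Return (coefficients, fraction of variance explained) for the first PC.

    `frames` is a list of time frames, each a list of marker coordinates.
    Power iteration is used instead of a full eigendecomposition.
    """
    n, d = len(frames), len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(d)]
    centred = [[f[k] - means[k] for k in range(d)] for f in frames]
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    total_variance = sum(cov[k][k] for k in range(d))
    return v, eigenvalue / total_variance

# Invented data: two markers move together, a third moves independently.
base = [0, 1, 2, 3, 2, 1, 0, 1]
frames = [[x, 2 * x, float(i % 2 == 0)] for i, x in enumerate(base)]
loadings, explained = first_principal_component(frames)
```

Because the first two invented markers share a trajectory, the first component explains most of the variance and both receive large coefficients, while the independent third marker receives a small one.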
Each performer’s principal component score was mapped against the timings
of each phrasing boundary to determine if there was a pattern of movement for
each phrase. Three pianists have been chosen to demonstrate the spread of results
concisely. These pianists were chosen according to their ability, their standard
deviation and variance of movement calculated for intra-performance data on a
few selected markers, and also their views on movement during a performance.
The pianists’ self-reports also conveyed a wide range of views on the role of movement in
performance, with some branding movements beyond the sound-producing ones as
completely unnecessary and something they tried to limit, whilst others felt it vital
to move in order to ‘feel’ the music they were performing. Although physical gestures
in performance can be classified either as movements necessary to the actual
sound production or as movements that are related to the music but not necessary for
the actual sound (i.e. ancillary) [21], it is acknowledged that gestures may still be
multi-functional. The performers chosen to display a range of results also reflect
these varying opinions on the role of movement in performance. Performer 1 is a
highly trained amateur pianist and had a small standard deviation of movement.
Performer 2 is a conservatory trained postgraduate student and had a large stan-
dard deviation of movement, and Performer 3 is a music undergraduate student
and had a mid-range standard deviation. The results from the other performers
can be seen in the appendices. Normalization of results allows the movements to
be correlated with phrase structure independent of differences in amplitude. The
arrows in each graph indicate the point in time where the last note of each phrase
ends in the audio stream.
the correlation between each marker and the resultant PCA scores, there did not
appear to be any single prevalent marker causing the most variance in motion.
The PCA curves are a result of the variances in a combination of several markers
and these differ slightly for each pianist. The top ten correlations between the
first two principal components and the body markers are seen in the two tables
following each graph with the expanded full list of loadings for Performers 1, 2
and 3 seen in Appendix A. The full list of loadings for every pianist highlighting
the top correlations between the first two principal component scores and each
marker are also included in the appendices.
Figure 8.3: First Two Principal Components of Movement for Performer 1, Prelude 7,
the first component accounting for 49% variance and the second component
accounting for 23.1% variance, with blue vertical lines representing phrasing
boundaries as in the audio recording
Marker PC1 PC2 PC3 PC4 PC5 PC6
T10:X 0.12 0.02 0.2 -0.02 0.08 -0.05
LUPA:X 0.12 -0.05 0.15 -0.11 0.09 0.02
LUPB:X 0.14 -0.01 0.05 -0.09 0.06 0.02
LUPC:X 0.14 0.04 0.1 0.01 -0.01 -0.01
LELB:X 0.14 0.08 -0.02 0.04 -0.05 0
LMEP:X 0.15 0.03 -0.01 -0.03 0.01 0.03
LFRA:X 0.14 0.08 -0.01 0.04 -0.06 0.02
RUPC:X 0.11 0.13 0.08 -0.1 -0.06 0.03
RFHD:X 0.12 0.06 0.14 0.06 -0.1 -0.08
LFHD:X 0.12 -0.01 0.2 -0.01 0 -0.03
Figure 8.4: Top Ten Loadings for the First Principal Component, Performer 1, Pre-
lude 7
Figure 8.5: Top Ten Loadings for the Second Principal Component, Performer 1,
Prelude 7
Figure 8.6: First Two Principal Components of Movement for Performer 2, Pre-
lude 7, the first component accounting for 36.8% variance and the second compo-
nent accounting for 28% variance, with blue vertical lines representing phrasing
boundaries as in the audio recording
Figure 8.7: Top Ten Loadings for the First Principal Component, Performer 2, Pre-
lude 7
Marker PC1 PC2 PC3 PC4 PC5 PC6
T10:Z 0.01 0.18 -0.09 0.03 -0.07 0
LWRA:Z -0.03 0.15 0.07 -0.19 0.15 -0.03
LWRB:Z -0.02 0.15 0.07 -0.2 0.14 -0.05
LFRA:Z 0.03 0.13 0.06 -0.25 0.13 -0.09
LFIN:Z -0.04 0.16 0.08 -0.16 0.1 0.02
RWRA:Z -0.04 0.13 0.16 0 0.12 -0.03
RWRB:Z -0.04 0.13 0.17 0.01 0.12 -0.01
RFIN:Y 0.08 0.13 0.07 0.09 0.07 0.2
RFIN:Z -0.03 0.13 0.14 -0.04 0.13 -0.02
RBHD:Y 0.13 0.12 -0.03 -0.04 -0.09 0
Figure 8.8: Top Ten Loadings for the Second Principal Component, Performer 2,
Prelude 7
Figure 8.9: First Two Principal Components of Movement for Performer 3, Pre-
lude 7, the first component accounting for 41.3% variance and the second compo-
nent accounting for 25% variance, with blue vertical lines representing phrasing
boundaries as in the audio recording
Marker PC1 PC2 PC3 PC4 PC5 PC6
C7:Y 0.16 0.01 -0.08 0.02 0.01 0.01
T10:Y 0.16 -0.01 -0.09 -0.02 0.01 0.01
CLAV:Y 0.16 0.03 -0.05 0.03 0 0.01
STRN:Y 0.16 0.03 -0.02 0.02 -0.02 0.02
LSHO:Y 0.16 0.02 -0.05 0.04 -0.01 0
RSHO:Y 0.16 0.01 -0.09 0.02 0.02 -0.01
RUPA:Y 0.17 0 -0.05 0.01 0.03 0.05
RUPB:Y 0.16 -0.01 -0.01 0 0.03 0.12
RUPC:Y 0.16 0 -0.01 0.01 0.05 0.09
RFHD:Y 0.16 0.02 -0.07 0.03 0.01 0.01
Figure 8.10: Top Ten Loadings for the First Principal Component, Performer 3,
Prelude 7
Figure 8.11: Top Ten Loadings for the Second Principal Component, Performer 3,
Prelude 7
within the piece (between end 5 and end 6). The highest loadings for Performer 1,
as seen in Figures 8.4 and 8.5, relate to movements in the upper arms and the head
predominantly in the x axis for the first component, and the chest, upper arms and
head predominantly in the y axis for the second component.
The second performer’s loadings, as seen in Figures 8.7 and 8.8, relate to move-
ments in the upper arms and chest predominantly in the y axis for the first com-
ponent, and the wrists and fingers predominantly in the z axis for the second com-
ponent. The first principal component (seen in Figure 8.6) indicates a pattern fol-
lowing the phrasing boundaries, with an exception to this occurring before the
end of phrase 6 where the curve is split into two. Suggestions for this occurrence
can be found in literature referring to action-chunking [47] where the gesture for
a long length of phrase can be split into sections. The second component displays
more noise, potentially related to the beats within the phrases. Again the global
maximum occurs near the ending of phrase 6 at the harmonic arrival, however the
global minimum occurs at the ending of phrase 3.
A similar pattern can be seen for the two principal components of performer
3 (seen in Figure 8.9), where the first component relates highly to phrasing and
displays the same split curve in phrase 6, whereas the second component is noisier,
potentially echoing the beats within the phrases. The loadings, seen in Figures 8.10
and 8.11, refer to movements in the chest, shoulders and upper arms predominantly
in the y axis for the first component, and the head and wrists in both the y
and z axes for the second component.
In an effort to produce a comparable measure of general motion between performers,
the weighted values of the first six principal component scores for each performer
are summed to produce a motion norm accounting for more than 90% of the
variance in movement. The weightings are calculated from the percentage variance
of each component over the full dataset. Each motion norm has been resampled to
10,000 points, warping the timing of each performance so that results between
performers can be directly compared. The distance between
each audio phrase boundary is 0.1, and quoted means and standard deviations are
calculated for the distances between the peaks of the motion trajectory and their
corresponding phrase boundaries. These are measured by finding the local maximum
for each phrase, using a sliding window. The first three performers’ graphs are
shown in Figures 8.12, 8.13 and 8.14 whilst the remaining six pianists’ graphs are
included in Appendix B.
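The procedure described above can be sketched in three steps: a variance-weighted sum of component scores, a piecewise-linear time warp that gives every phrase the same number of samples, and a per-phrase peak search. This is a simplified illustration under several assumptions: the weights are taken as percentage variance divided by 100, the peak is the maximum within each phrase rather than the result of a sliding window, negative offsets mean the peak falls before the phrase-end boundary, and all function names and data are hypothetical.

```python
def motion_norm(pc_scores, variance_pct):
    """Weighted sum of component scores; pc_scores[k][t] is PC k at frame t."""
    weights = [v / 100.0 for v in variance_pct]  # assumed weighting scheme
    return [sum(w * pc[t] for w, pc in zip(weights, pc_scores))
            for t in range(len(pc_scores[0]))]

def interpolate(times, values, t):
    """Linear interpolation of `values` (sampled at `times`) at time t."""
    for k in range(len(times) - 1):
        if times[k] <= t <= times[k + 1]:
            span = times[k + 1] - times[k]
            return values[k] + (values[k + 1] - values[k]) * (t - times[k]) / span
    return values[-1]

def warp_to_phrases(times, values, boundaries, points_per_phrase):
    """Resample so that every phrase spans the same number of points."""
    warped = []
    for start, end in zip(boundaries, boundaries[1:]):
        for k in range(points_per_phrase):
            t = start + (end - start) * k / points_per_phrase
            warped.append(interpolate(times, values, t))
    return warped

def peak_offsets(warped, points_per_phrase, phrase_width=0.1):
    """Signed distance from each phrase's peak to its end boundary."""
    offsets = []
    for p in range(len(warped) // points_per_phrase):
        segment = warped[p * points_per_phrase:(p + 1) * points_per_phrase]
        peak = segment.index(max(segment))
        offsets.append((peak - points_per_phrase) * phrase_width / points_per_phrase)
    return offsets

# Invented example: two phrases of unequal length (0-2 s and 2-6 s).
norm = motion_norm([[1.0, 2.0], [0.0, 4.0]], [50.0, 25.0])
times = [0, 1, 2, 3, 4, 5, 6]
values = [0, 1, 0, 1, 2, 1, 0]
warped = warp_to_phrases(times, values, boundaries=[0, 2, 6], points_per_phrase=4)
offsets = peak_offsets(warped, points_per_phrase=4)
```

After warping, both invented phrases occupy four samples each, so their peaks can be compared on a common axis even though the second phrase lasts twice as long in real time.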
Figure 8.12 for Performer 1, at first glance shows no real pattern, however a
Figure 8.12: Weighted Combination of First Six Principal Components for Per-
former 1, Prelude 7, accounting for 94.1% variance, plotted in Warped Time
Figure 8.13: Weighted Combination of First Six Principal Components for Per-
former 2, Prelude 7, accounting for 91.8% variance, plotted in Warped Time
Figure 8.14: Weighted Combination of First Six Principal Components for Per-
former 3, Prelude 7, accounting for 90.4% variance, plotted in Warped Time
peak always occurs with a phrasing boundary suggesting some underlying pat-
tern (mean = 0.0286, s.d.= 0.026). A large dip occurs at the end of phrase 6, coin-
ciding with the harmonic arrival point. Performer 2’s results in Figure 8.13 instead
show a very clear pattern of motion with phrasing (mean = -0.046, s.d. = 0.0217).
The highest point in the motion norm occurs again at the harmonic arrival point.
Finally Figure 8.14 showing results for Performer 3 again shows a clear pattern
with the highest point occurring at the end of phrase 6. However, this reflects the
split gesture seen in the results of the first two principal components.
Despite being a good measure of general motion, dimension-reduction methods such as
PCA can discard some of the semantics that individual markers’ motion graphs
can show. For this reason, the motion of a few particular markers is observed,
chosen from those which correlate most highly with the first principal components.
The plots for the y axis markers for Performer 1 as seen in Figure 8.15(d), Fig-
ure 8.15(e) and Figure 8.15(f) look extremely similar despite being located in dif-
ferent parts of the body. These markers show a trajectory with 8 peaks within the
boundaries of the 8 phrases of Prelude 7. The markers plotted for the x axis in
Figure 8.15(b) and Figure 8.15(c) show a similar pattern to each other with peaks
beginning at each of the phrases. Interestingly the x axis plot for the head marker
in Figure 8.15(a) looks entirely different, yet still exhibiting a peak in the motion
norm within each of the phrases.
Performer 2’s plots of singular markers for the y axis as seen in Figure 8.16(a),
(a) Head (x axis) : RFHD(x) (b) Left upper arm (x axis): LUPB(x)
Figure 8.15: Various Raw Marker Data Plotted Against Phrase Boundaries for Per-
former 1, Prelude 7
Figure 8.16(b) and Figure 8.16(c) are again remarkably similar to each other, im-
plying a full upper body movement along the y axis. One marker from the torso
as seen in Figure 8.16(d) plotting the z axis movement shows a pattern throughout
the 8 phrases albeit not as pronounced as those markers plotted for the y axis. The
plot for Performer 2’s wrist z axis as seen in Figure 8.16(e) shows peaks at the start
of each phrase, when the left hand plays the first bass note of each phrase and the
first chord. The subsequent chords show less movement in the z axis, implying
that the first two notes are given more stress. The right finger plot in
Figure 8.16(f) shows a peak at the end of each phrase; however, this is due to
the nature of the composition, as the performer needs to lift the right hand to
prepare for the next phrase.
(a) Clavicle (y axis): CLAV(y) (b) Left upper arm (y axis): LUPB(y)
(e) Left wrist (z axis): LWRB(z) (f) Right finger (z axis): RFIN(z)
Figure 8.16: Various Raw Marker Data Plotted Against Phrase Boundaries for Per-
former 2, Prelude 7
The y axis plots for Figure 8.17(a), Figure 8.17(b) and Figure 8.17(c) for Per-
former 3 again are remarkably similar in pattern to each other, showing a repeat-
ing trajectory for each phrase. The x-axis plots seen in Figure 8.17(d), Figure 8.17(e)
and Figure 8.17(f) are not as similar to each other as the y-axis plots but again show
patterns for the 8 phrases. Differences at this point lie between the left and right
arm markers. This is most likely due to the different rhythms and pitches they are
required to play.
(a) C7 (y axis): C7(y) (b) Left shoulder (y axis): LSHO(y)
(c) Right upper arm (y axis): RUPB(y) (d) Left wrist (x axis): LWRA(x)
(e) Right upper arm (x axis): RUPB(x) (f) Right wrist (x axis): RWRB(x)
Figure 8.17: Various Raw Marker Data Plotted Against Phrase Boundaries for Per-
former 3, Prelude 7
8.2.2 Prelude in B minor No.6
The initial two-bar motif in prelude 6 is in the left hand melody marked in the score
seen in Chapter 7. This motif is varied in the subsequent phrases, first in pitch for
the second phrase, then also in rhythm for the third phrase ending at bar 8. Phrase
4 repeats the opening motif and Phrase 5 ends with a modulation into C major.
These first five phrases represent an agreement in performers’ interpretations and
traditional analyses of this prelude. From phrase 6 onwards, performers held di-
verging views on the structure of the piece. The measured means and standard
deviations of distance between motion peak and phrase boundary are therefore
taken for the first five phrases only.
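The measurement described above can be sketched as follows. This is a minimal example under stated assumptions rather than the procedure actually used: it takes the largest motion-norm value within each phrase as the motion peak, measures its signed distance to the nearest phrase boundary, and summarises the distances by their mean and standard deviation. The function name `peak_offsets`, the sign convention and the toy data are hypothetical.

```python
from statistics import mean, stdev

def peak_offsets(times, motion, boundaries):
    """Signed distance from each phrase's motion peak to the nearest boundary.

    times, motion: equal-length sequences sampling the motion norm.
    boundaries: ascending phrase boundary times (start of first phrase included).
    Assumption: negative values mean the peak precedes the boundary.
    """
    offsets = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        window = [(m, t) for t, m in zip(times, motion) if start <= t <= end]
        if not window:
            continue  # no samples fall inside this phrase
        _, peak_time = max(window)  # sample with the largest motion value
        nearest = min(boundaries, key=lambda b: abs(peak_time - b))
        offsets.append(peak_time - nearest)
    return offsets

# Toy data: two phrases, with boundaries at t = 0, 4 and 8.
times = list(range(9))
motion = [0, 5, 2, 1, 1, 0, 1, 6, 2]
boundaries = [0, 4, 8]
offsets = peak_offsets(times, motion, boundaries)
summary = (mean(offsets), stdev(offsets))
```

A small standard deviation across phrases would indicate that the peak falls at a consistent position relative to the boundary, which is the property the reported means and standard deviations are intended to capture.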
Observing Performer 1’s results for Prelude 6 (seen in Figure 8.18) and consid-
ering the first five phrases, a pattern of phrasing is reflected by the first compo-
nent. The global maximum occurs at the expansion of the motif in phrase 3 which
represents a climax in this particular section. Loadings for performer 1, seen in
Figures 8.19 and 8.20 identify correlations in movement of the head, upper arms
and chest predominantly in the y axis for the first component and movements of
the wrists and fingers predominantly in z axis for the second component.
Performer 2’s main loadings seen in Figures 8.22 and 8.23 reflect movements of
the upper arms and chest for both the x and y axes for the first component, and the
chest, right wrists and fingers for both the y and z axes for the second component.
The graphs of the two components (seen in Figure 8.21) are highly similar to each
other except for a slight drag in the second component. An anomaly occurs at the end
of phrase 3 where there appears to be an extra peak in the second component. The
global maximum can again be seen at the start of the phrase expansion in phrase
3.
As a contrast, the first two principal components for Performer 3, seen in Fig-
ure 8.24, appear to be in opposition to each other, yet still in relation with the
occurrence of the phrasing boundaries. Again the global maximum is seen at the
end of phrase 2, beginning of phrase 3 where the motif is first expanded in rhythm.
Loadings can be seen in Figures 8.25 and 8.26 reflecting movements in the head
and chest predominantly in the y axis for the first component, and movements in
the elbows and wrists predominantly in the x axis for the second component.
When these principal components are combined into the weighted combination
described in the previous section, we can see clear patterns of phrasing for
each of the three pianists examined. These patterns are again repeated for phrases
of similar rhythm, although it is interesting to note the differences when compared
Figure 8.18: First Two Principal Components of Movement for Performer 1, Pre-
lude 6, the first component accounting for 35.3% variance and the second com-
ponent accounting for 34.4% variance, with blue vertical lines representing the
performer’s interpretation of phrasing boundaries as in the audio recording
Figure 8.19: Top Ten Loadings for the First Principal Component, Performer 1,
Prelude 6
Marker      PC1    PC2    PC3    PC4    PC5    PC6
C7:Z      -0.01   0.15   0.05   0.05    0.2  -0.16
LUPC:Z     0.03   0.15   0.15  -0.09    0.1   0.09
LWRA:Z    -0.01   0.16   0.06  -0.08  -0.13   0.06
LWRB:Z    -0.03   0.16   0.06  -0.11   -0.1   0.07
LFRA:Z    -0.02   0.15   0.12  -0.16   0.06   0.06
LFIN:Z    -0.05   0.16   0.03  -0.07  -0.15   0.07
RWRA:Z    -0.05   0.16   0.05   0.03  -0.13  -0.11
RWRB:Z    -0.06   0.16   0.04   0.04  -0.14  -0.08
RFRA:Z    -0.09   0.14   0.07   0.11  -0.08  -0.06
RFIN:Z    -0.05   0.16   0.02  -0.01  -0.19      0
Figure 8.20: Top Ten Loadings for the Second Principal Component, Performer 1,
Prelude 6
Figure 8.21: First Two Principal Components of Movement for Performer 2, Pre-
lude 6, the first component accounting for 47.4% variance and the second com-
ponent accounting for 15.9% variance, with blue vertical lines representing the
performer’s interpretation of phrasing boundaries as in the audio recording
Marker      PC1    PC2    PC3    PC4    PC5    PC6
T10:Y      0.14   0.09  -0.06  -0.02  -0.01  -0.05
CLAV:X     0.14  -0.09   0.09  -0.04   0.04  -0.03
STRN:Y     0.14   0.12  -0.02   0.03  -0.03   0.01
LUPA:Y     0.14   0.11  -0.03   0.06      0  -0.03
LUPB:Y     0.15   0.09  -0.02   0.06      0  -0.04
RSHO:X     0.14  -0.07    0.1  -0.04   0.03  -0.01
RSHO:Y     0.14    0.1  -0.09  -0.03  -0.02  -0.03
RUPA:X     0.14  -0.07   0.08   0.03  -0.03    0.1
RUPA:Y     0.14    0.1  -0.06  -0.02  -0.02  -0.04
RUPC:X     0.14  -0.07   0.06   0.05  -0.05   0.13
Figure 8.22: Top Ten Loadings for the First Principal Component, Performer 2,
Prelude 6
Figure 8.23: Top Ten Loadings for the Second Principal Component, Performer 2,
Prelude 6
Figure 8.24: First Two Principal Components of Movement for Performer 3, Pre-
lude 6, the first component accounting for 40.6% variance and the second com-
ponent accounting for 21.2% variance, with blue vertical lines representing the
performer’s interpretation of phrasing boundaries as in the audio recording
Figure 8.25: Top Ten Loadings for the First Principal Component, Performer 3,
Prelude 6
Marker      PC1    PC2    PC3    PC4    PC5    PC6
LELB:X     0.09   0.11   0.04  -0.26   0.19   0.01
LWRA:X     0.12   0.11   0.02   -0.2  -0.04  -0.05
LWRB:X     0.11   0.12   0.02  -0.23  -0.01  -0.01
LFRA:X      0.1   0.11   0.04  -0.25   0.15  -0.01
RELB:X     0.07   0.15   0.13   0.13  -0.07  -0.17
RMEP:X     0.07   0.14   0.14   0.13  -0.05  -0.19
RWRA:X     0.09   0.14    0.1   0.13   -0.1  -0.15
RWRB:X     0.08   0.16    0.1   0.11  -0.14  -0.12
RFRA:X     0.07   0.15   0.12   0.12  -0.09  -0.16
RFIN:X     0.09   0.16   0.09   0.12  -0.14  -0.13
Figure 8.26: Top Ten Loadings for the Second Principal Component, Performer 3,
Prelude 6
Figure 8.27: Weighted Combination of First Six Principal Components for Per-
former 1, Prelude 6, accounting for 93.7% variance, plotted in Warped Time
Figure 8.28: Weighted Combination of First Six Principal Components for Per-
former 2, Prelude 6, accounting for 93.2% variance, plotted in Warped Time
Figure 8.29: Weighted Combination of First Six Principal Components for Per-
former 3, Prelude 6, accounting for 91% variance, plotted in Warped Time
nature of these phrases, both sets being two bars in length, and phrase 2 being a
rhythmic replica of phrase 1 in each prelude with changes solely in melody and
harmony. The results of this are shown in Table 8.1. Despite some correlations
showing results above 0.8 with a significance of p<0.01, this is not repeated for
the correlation for the same performer in the next phrase of each prelude. Other
correlations are either extremely low or not significant. From this we can reject
Hypothesis 8.2 as performers’ motion profiles appear to differ depending on what
piece they are performing.
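The phrase-by-phrase comparison of motion profiles can be illustrated with a short sketch. Under the assumption that two phrase segments of the motion norm are first resampled to a common length and then compared with a Pearson correlation coefficient, the code below computes that coefficient using only the standard library. The helper names and the data are hypothetical, and the significance test reported in Table 8.1 is not reproduced here.

```python
import math

def resample(values, n):
    """Linearly resample a sequence to n (>= 2) evenly spaced points."""
    last = len(values) - 1
    out = []
    for i in range(n):
        pos = i * last / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        out.append(values[lo] + (values[hi] - values[lo]) * (pos - lo))
    return out

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy motion profiles for the same phrase in two performances,
# sampled at different rates but with the same overall shape.
phrase_a = [0.0, 1.0, 3.0, 1.0, 0.0]
phrase_b = [0.0, 0.5, 1.0, 2.0, 3.0, 2.0, 1.0, 0.5, 0.0]
r = pearson_r(phrase_a, resample(phrase_b, len(phrase_a)))
r_reversed = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```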
As principal components analysis is useful in reducing the number of dimen-
sions of data but often loses the semantics of what the condensed data actually
represents, it is advantageous to examine the components of each marker trajec-
tory separately so as to better understand their semantics.
The three plots for y axis markers for Performer 1 seen in Figure 8.30 show a
(a) Clavicle (y axis): CLAV(y) (b) Head (y axis): LFHD(y)
(c) Right shoulder (y axis): RSHO(y) (d) Left wrist (z axis): LWRB(z)
(e) Right wrist (z axis): RWRB(z) (f) Right finger (z axis): RFIN(z)
Figure 8.30: Various Raw Marker Data Plotted Against Phrase Boundaries for Per-
former 1, Prelude 6
note production, do show certain patterns that can be attributed to producing a
phrasing contour.
Performer 3’s graphs shown in Figure 8.32 again show the markers for the
upper body in the y axis moving simultaneously in the same direction which is
similar to the principal components motion norm displayed previously. The x axis
movements of the right elbow and right wrist, seen in Figure 8.32(e) and
Figure 8.32(f), reflect each other and show patterns involving peaks in the
trajectories at the beginning of phrases. The x axis of the left elbow, seen in
Figure 8.32(d), does not show an entirely clear pattern of phrasing but places
peaks in the trajectory at certain
points in the music, notably at the beginning of phrase 3, being the climax of the
first section of this piece with the highest pitch repetition of the original two-bar
motif and expansion into four bars.
8.2.3 Conclusions
By examining movement of nine performers across two Chopin Preludes, it is
demonstrated that each pianist’s movement is entirely subjective and personal.
No two performers appear to move in exactly the same way for any one piece of
music. However, there appears to be an underlying pattern within these gestures
that relates to phrasing structure. The results from the principal components
analysis for Prelude 7 show clear patterns between the calculated motion norm and
the phrasing boundaries, indicating that Hypothesis 8.1 is correct. Local maxima in
the motion norm are consistent across phrases in their distance from the phrasing
boundary suggesting that with repeated phrases, performers will reliably produce
the same overall motion. This is reflected by the trajectories shown by plotting
the raw marker data for the highest correlated markers indicated by the loadings.
The loadings for each performer show that the movement cannot be attributed to
any singular marker but instead a combination of many from different parts of
the body. Marker trajectories particularly for the y axis (along the length of the
keyboard) reflect the phrases dictated by traditional analysis. Also we see that
markers in the head, upper torso and shoulders tend to reflect the phrasing struc-
ture more clearly, whereas markers getting closer to the elbows and wrists will
show the beats of each performed note, due to the necessary gestures required for
actual note production. Interestingly, for each performer, their loadings do not
stay consistent between pieces of music. Their calculated motion norm trajectories
as well as the trajectories for the raw marker data are also different between pieces,
suggesting that gesture is not used in the same way across pieces, but may have
qualities influenced by rhythm and pitch. This rejects hypothesis 8.2. The identical
rhythm in the phrases of Prelude 7 helps highlight gestures being produced in a
situation where rhythm is controlled. Despite the identical nature of these phrases
in rhythm, each performer’s gesture for each phrase is not entirely identical sug-
gesting that variables such as pitch and harmony contribute to gesture production.
Examining performers’ gestures for Prelude 6 up until the end of phrase 5, we
again see patterns developing between phrases with some slight differences,
particularly at the expansion of the original motif in phrase 3. Some pianists expand
their gesture to cover the entire phrase, whereas others produce almost two
peaks within a gesture, thereby sub-chunking the movement.
Overall gesture appears to be a good identifier of phrasing structure across
these two pieces despite the pattern within each performer not being consistent. It
will now be examined how aural parameters contribute to the phrasing contour of
the piece and how these interact with gesture at important points in the structure.
(a) Clavicle (x axis): CLAV(y) (b) Sternum (y axis): STRN(y)
(e) Right wrist (y axis): RWRB(y) (f) Right wrist (z axis): RWRB(z)
Figure 8.31: Various Raw Marker Data Plotted Against Phrase Boundaries for Per-
former 2, Prelude 6
(a) Head (y axis): RFHD(y) (b) Clavicle (y axis): CLAV(y)
(c) Left upper arm (y axis): LUPA(y) (d) Left elbow (x axis): LELB(x)
(e) Right elbow (x axis): RELB(x) (f) Right wrist (x axis): RWRB(x)
Figure 8.32: Various Raw Marker Data Plotted Against Phrase Boundaries for Per-
former 3, Prelude 6
8.3 Multi-Modal Analysis
To explore how aural parameters work with visual parameters to convey structure
in musical performances, two parameters taken from the audio and MIDI data are
examined alongside movement. The movement parameter is taken from the weighted
combinations of principal components describing the overall body movement,
explored in Section 8.2. To estimate tempo, the MIDI notes were matched to the
MusicXML score notes as part of the process of creating Performance Markup
Language files (see Section 6.1.1). These files were uploaded to the database
(described in Section 6.2) and queried to calculate inter-onset intervals (IOIs)
for each matched note. Each of these values was normalised to a crotchet beat and
scaled by 60 to give an estimate in beats per minute. Outliers in tempo were
removed for semiquavers following dotted quavers: performers habitually elongated
the semiquaver of this pair in comparison to the other notes, which was considered
to be a stylistic point. For this reason, these particular semiquavers were
excluded from the tempo calculations for both Prelude 7 and Prelude 6. Dynamics,
or loudness, was estimated by calculating the RMS amplitude of the audio signal
using a short Python script.
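The two aural measurements can be sketched as follows. This is an illustrative reconstruction, not the script used in this work: it assumes onset times in seconds, score positions expressed in crotchet beats, and a frame-based RMS calculation whose frame and hop sizes are arbitrary. The function names `tempo_bpm` and `rms_amplitude` are hypothetical.

```python
import math

def tempo_bpm(onsets, score_beats):
    """Estimate tempo from inter-onset intervals (IOIs).

    onsets: performed onset times in seconds (assumed unit).
    score_beats: score position of each matched note, in crotchet beats.
    Returns (onset_time, bpm) pairs placed at the first note of each pair.
    """
    result = []
    for i in range(len(onsets) - 1):
        ioi = onsets[i + 1] - onsets[i]
        beats = score_beats[i + 1] - score_beats[i]
        if ioi <= 0 or beats <= 0:
            continue  # skip simultaneous notes such as chords
        seconds_per_crotchet = ioi / beats  # normalise the IOI to a crotchet
        result.append((onsets[i], 60.0 / seconds_per_crotchet))
    return result

def rms_amplitude(samples, frame_size, hop):
    """Root-mean-square amplitude of successive frames of an audio signal."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        frames.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return frames

# Toy example: crotchet, quaver, crotchet, with the last interval played slower.
tempi = tempo_bpm([0.0, 0.5, 0.75, 1.75], [0, 1, 1.5, 2.5])
rms_frames = rms_amplitude([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0], 4, 4)
```

Placing each estimate at the onset of the first note of the pair matches the plotting convention used for the tempo curves in this chapter.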
Again three of the nine pianists are taken as examples to examine the spread of
data. Each performer’s audio, MIDI and video data is plotted against the phrase
boundaries as they occur in the audio stream. Tempo estimations are plotted at
the note onset of the first of the pair of notes used for calculating the inter-onset
interval. The graphs of the remaining six pianists are seen in Appendix C.
Figure 8.33: Motion, Tempo and Dynamics for Performer 1, Prelude 7 with blue
vertical lines representing phrasing boundaries as noted in the recorded audio
this performance are during phrase 6 where the harmonic arrival occurs (between
end 5 and end 6 on the graph) where the tempo measured reaches a global mini-
mum and the rms amplitude instead of reducing throughout the phrase stays at a
constant level. This also aligns with a global minimum in the motion norm.
Performer 2’s multi-modal parameters, as seen in Figure 8.34, highlight particular
points in the piece such as the halfway point at the end of phrase 4 where we
see a global maximum in the tempo calculation. This occurs directly after a large
dip in loudness. Another point of interest occurs at the global maximum in the
motion norm at the end of phrase 6 at the harmonic arrival, which corresponds
with a global minimum in tempo.
Performer 3’s graph seen in Figure 8.35 again shows a global minimum in
tempo occurring alongside a global maximum in motion at the harmonic arrival
in phrase 6. Looking at the whole graph we can also see a reflection of the tempo
curve in the motion norm.
Figure 8.34: Motion, Tempo and Dynamics for Performer 2, Prelude 7 with blue
vertical lines representing phrasing boundaries as noted in the recorded audio
Observing these traits across all performers, a direct comparison can be made
by warping each stream of data with respect to the occurrence of phrase boundaries
in the audio stream. Distances to the local minima for the dynamics and tempo
curves were extracted for each phrase boundary. Distances to the local maxima
were extracted for the motion curves. Two-way ANOVAs showed a significant effect
of performer on motion norm (F=12.07, p<0.001), a significant effect of performer
on dynamics (F=6.26, p<0.05) and a significant effect of phrase number on tempo
(F=11.43, p<0.001). Other effects were not significant. This suggests that
performers have a distinct style of motion and diminuendo. Although performers may
not differ from one another in their use of ritardando, its use varies between
phrases.
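One simple way to warp a data stream with respect to phrase boundaries is a piecewise-linear mapping in which the k-th boundary is sent to the value k and time is stretched linearly within each phrase. The sketch below shows this mapping; it is an assumption made for illustration, since the warping algorithm actually used is not specified in this section, and the function name `warp_time` is hypothetical.

```python
from bisect import bisect_right

def warp_time(t, boundaries):
    """Map a time t onto phrase units: boundary k maps to k, linear in between.

    boundaries: ascending phrase boundary times in the audio stream.
    Times outside the first and last boundary are clamped.
    """
    if t <= boundaries[0]:
        return 0.0
    if t >= boundaries[-1]:
        return float(len(boundaries) - 1)
    k = bisect_right(boundaries, t) - 1  # index of the phrase containing t
    span = boundaries[k + 1] - boundaries[k]
    return k + (t - boundaries[k]) / span

# Toy example: three phrases of unequal duration (4 s, 2 s and 6 s).
boundaries = [0.0, 4.0, 6.0, 12.0]
warped = [warp_time(t, boundaries) for t in (2.0, 4.0, 5.0, 9.0, 12.0)]
```

After warping, the midpoint of every phrase lies at k + 0.5 regardless of how long each performer took over it, so curves from different performers can be compared position by position.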
Observing these three graphs we can see patterns of motion, tempo and loud-
ness which occur within each phrase and extremes of these datasets correspond-
ing to points of interest within the piece such as the end of a section or a particular
point of notice in harmony. To discern whether the extremes in the measured pa-
rameters correspond to important points in musical structure, the measurements
are sampled at the point of each note onset for the previously calculated IOI, mo-
tion norm and rms amplitude.
Box-plots showing the spread of data for each parameter can be seen in Fig-
ure 8.36 for tempo, motion norm and rms amplitude respectively. Measurements
for tempo and rms amplitude and motion are normalised between 0 and 1 for each
performer. Each box-plot shows a red line for the median of the data, and the sur-
rounding box shows the first and third quartiles. The extremes of the data not
considered to be outliers are identified by the whiskers of each box, with the out-
liers marked as red crosses. From these box-plots we can also view the preferences
or style of each performer in their use of tempo, dynamics and motion. A thin
box with many outliers suggests that the performer uses a very small range of a
certain parameter throughout the majority of the piece, reserving the extremes for
a few specific points. A large box covering most of the data range suggests that
the performer uses a larger spread of the parameter throughout the piece.
Figure 8.35: Motion, Tempo and Dynamics for Performer 3, Prelude 7 with blue
vertical lines representing phrasing boundaries as noted in the recorded audio
These box-plots show the spread of each parameter for each performer to be
very different to each other demonstrating that each pianist has a particular style
of expressing the notes of the piece. Also as not every box suggests a normal
distribution, there appear to be underlying patterns skewed to certain values. The
first two performers show a fairly normal distribution for motion and dynamics
whereas the tempo is slightly skewed. The third performer shows skewed values
for all three parameters.
Calculating the 5th and 95th percentiles for each parameter for each performer,
the extremes of the filtered data below the 5th and above the 95th percentile are
extracted and compared against their occurrence within the score. Hypothesis 8.4
states that the extremes in tempo, dynamics and motion are where the most im-
portant notes of the piece occur. For each of the three example performers, in
Figures 8.37, 8.39 and 8.41, a scatter plot exhibits the spread of extracted data for
each of the three parameters as red crosses.
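The extraction of extremes can be sketched as below. This is a minimal example assuming min-max normalisation to the range 0 to 1 and a linearly interpolated percentile; the exact percentile definition used for the results in this chapter is not stated, and the function names are hypothetical.

```python
def normalise(values):
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input: no spread to normalise
    return [(v - lo) / (hi - lo) for v in values]

def percentile(values, p):
    """Percentile with linear interpolation between neighbouring sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def extremes(values, low=5, high=95):
    """Indices of samples below the low percentile or above the high percentile."""
    lo_cut = percentile(values, low)
    hi_cut = percentile(values, high)
    return [i for i, v in enumerate(values) if v < lo_cut or v > hi_cut]

# Toy example: 101 evenly spaced values, one per note onset.
values = list(range(101))
extreme_indices = extremes(values)
scaled = normalise([2, 4, 6])
```

Each returned index identifies a note onset, which is what allows the extremes to be translated back onto the score.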
The extremes of the motion data are plotted over the top of the dataset as blue
markers, the extremes of the tempo data as green markers and the extremes of the
dynamics data as pink markers. These are translated onto a score of the piece with
corresponding colours, as seen in Figures 8.38, 8.40 and 8.42.
From the scatter plot of data for Performer 1 seen in Figure 8.37, we can see
that most data points lie in a cluster in the middle of the graph, however, the few
[Figure 8.36 panels: box-plots of Tempo, Motion and Dynamics (Dyn') for each performer, each on a vertical axis from 0 to 1]
Figure 8.36: Box-plots for all Nine Performers measuring Tempo, Motion norm
and Dynamics used in Performances of Prelude 7
Figure 8.37: Scatter Plot Showing Extremes in Tempo, Dynamics and Motion for
Performer 1, Prelude 7
outliers identified by the pink, green and blue markers indicate particular places
of interest. Some points show duplicates of extremes, with a blue marker occur-
ring at the same place as a pink box, showing a point where the motion has been
varied to a global maximum or minimum at the same point where dynamics have
been varied to a global maximum or minimum. Another point to note is clusters
such as the maxima in tempo denoted by green markers to the right hand side of
the scatter plot, which seem to occur with high values in motion norm, suggesting
that the pianist ties in fast tempi with higher values of their motion profile. To
see how these extremes lie on top of the structural boundaries of the music, these
maxima and minima are plotted on top of the original score. From the translated
score image identifying the outliers in each of the parameters for performer 1 in
Figure 8.38, we can see points of interest particularly at the beginning of the piece
which shows combinations of extremes from motion and dynamics. Also at the
harmonic arrival we can see extremes of tempo and dynamics leading up to the end of
the phrase in bar 12, which is also characterised by extremes in the motion norm.
The end of the piece also sees a combination of parameters in their extremes and
bar 9 which marks the beginning of the second half of the piece with a repetition
of the original phrase sees extremes in both tempo and dynamics. Particular notes
within phrases being accented by extremes of these parameters include the second
beat of bars 1, 3, 9, 11, 13 and 15. These correspond to melodic accents within each
phrase as marked out in Parncutt’s theory of accents in piano performance [91]
(see also Chapter 7). Performer 1’s particular accents correspond to the first pair
of phrases, the first phrase of the second section, the harmonic arrival and the last
pair of phrases.
[Figure 8.38: score of Prelude 7 for Performer 1, annotated with D, T and M markers for extremes in dynamics, tempo and motion; image and caption not recovered]
Observing the scatter plot for Performer 2’s data during their performance of
Prelude 7 in Figure 8.39, we see the extremes in dynamics denoted by pink marks,
lie within the mid range of tempo values, something that challenges the general
theory that faster tempi more often than not result in higher dynamics. For the
other extremes, we see a spread of data; however, there are quite a few points
where an extreme in motion (blue) coincides with an extreme in dynamics (pink).
For Performer 2’s score plot in Figure 8.40, we can see these combinations of ex-
tremes occurring at the end of section 1 in the fourth phrase at bar 7 and the be-
ginning of section 2 in the fifth phrase at bar 9. Again the harmonic arrival and the
end of the piece are characterised by extremes in motion and tempo and extremes
in all three parameters respectively. Another point of interest is noted at end of
phrase 2 and beginning of phrase 3 at bar 4. This is marked by an extreme in dy-
Figure 8.39: Scatter Plot Showing Extremes in Tempo, Dynamics and Motion for
Performer 2, Prelude 7
[Figure 8.40: score of Prelude 7 for Performer 2, annotated with D, T and M markers for extremes in dynamics, tempo and motion; image and caption not recovered]
the beginning and end of the piece. Some performers also pick out the halfway
point of the piece at the end of phrase 4. These most important structural fea-
tures tend to be characterised by a combination of extremes in the aural and visual
stream.
Figure 8.41: Scatter Plot Showing Extremes in Tempo, Dynamics and Motion for
Performer 3, Prelude 7
[Figure 8.42: score of Prelude 7 for Performer 3, annotated with D, T and M markers for extremes in dynamics, tempo and motion; image and caption not recovered]
8.3.2 Prelude No.6 in B minor
Again for this Prelude as in Section 8.2.2 the analysis will refer solely to the first
five phrases as marked out in traditional analysis in Chapter 7. These five phrases
represent an agreement among performers with an added split in the middle of
phrase 5 where the piece modulates into C major. The multi-modal graphs for each
performer will show their own interpretation of phrasing boundaries marked out
by a blue vertical line identifying their occurrence in the audio stream.
Figure 8.43: Motion, Tempo and Dynamics for Performer 1, Prelude 6 with blue
vertical lines representing the performer’s interpretation of phrasing boundaries
as in the recorded audio
Performer 1’s graph of multi-modal parameters is seen in Figure 8.43. Particular
points of interest include the global maximum in tempo at the end of phrase 3
followed by a local minimum which marks the end of the first section of the piece.
The motion trajectory as analysed in the previous section shows distinct patterns
between phrases.
Global maxima in the motion norm for Performer 2’s multi-modal graph seen
in Figure 8.44 appear to correspond to the global maxima in tempo for each phrase.
This is also reflected in the rms amplitude measurement.
Performer 3’s graph of multi-modal parameters seen in Figure 8.45 shows a
peak within phrase 3 in the motion norm which is echoed in the dynamics and
tempo measurements.
Figure 8.44: Motion, Tempo and Dynamics for Performer 2, Prelude 6 with blue
vertical lines representing the performer’s interpretation of phrasing boundaries
as in the recorded audio
Figure 8.45: Motion, Tempo and Dynamics for Performer 3, Prelude 6 with blue
vertical lines representing the performer’s interpretation of phrasing boundaries
as in the recorded audio
Again these streams of data are re-sampled with a time-warping algorithm
which takes the phrasing boundaries into account. Two-way ANOVAs performed
on the data for the first five phrases showed a significant effect of performer on
motion norm (F=8.27, p<0.05) and of phrase number on dynamics (F=4.81, p<0.05).
No other significant effects were found. Again we see that each performer uses
motion differently on their approach to phrasing boundaries but are consistent
across phrases. The effect of phrasing on dynamics could be a product of the struc-
ture of the phrases, as phrase 3 is expanded into 4 bars instead of the original 2.
Stronger effects may be noted if the performers’ interpretations were in agreement
allowing extraction of data at all phrase boundaries.
The measurements of extremes of each parameter performed for Prelude 7 are
repeated for this prelude. The resulting box plots are shown in Figure 8.46. The
spread of data for all nine performers is remarkably similar.
Comparing these to the spread of data seen in the box-plots in Figure 8.36 for
performances of Prelude 7, we can see some similarities for each performer be-
tween their data sets from each prelude. This suggests that although the use of
parameters for highlighting particular features can change across pieces, perform-
ers tend to use the same spread of tempi, dynamics and motion, implying that
they each have a certain style.
Again, the results of extracting the extremes of data below the 5th percentile
and above the 95th percentile in the spread of data are plotted in scatter plots
seen in Figures 8.47, 8.49 and 8.51. These extremes are examined to see how they
correspond to the structure of the music being performed.
Observing the scatter plot of Performer 1’s data (seen in Figure 8.47) from the
performance of Prelude 6, we can see extremes in motion occurring with both
maxima and minima in tempo and dynamics. This is different to the interaction
of parameters noted for the same performer’s Prelude 7 (see Figure 8.37). This
is another suggestion that performers use these parameters differently for
different pieces. Also noted are a number of combinations of extremes, particularly
motion (blue) and dynamics (pink). From the translated score image in Figure 8.48
identifying the outliers in each of the parameters for Performer 1, we can see points
of interest particularly at the second beat of each phrase, marked by a crotchet in the
left hand melody. This is in line with Parncutt’s analysis of the prelude for melodic
accents which appear to be marked here by combinations of extremes in dynamics,
motion and tempo (see Chapter 7).
The scatter plot for Performer 2, as seen in Figure 8.49, shows a spread of mo-
tion extremes throughout values of tempo and dynamics, however the minima in
the dynamic range appear to coincide with low values of motion norm whilst the
[Figure 8.46 panels: box-plots of Tempo, Motion and Dynamics (Dyn') for each performer, each on a vertical axis from 0 to 1]
Figure 8.46: Box-plots for all Nine Performers measuring Tempo, Motion norm
and Dynamics used in Performances of Prelude 6
Figure 8.47: Scatter Plot Showing Extremes in Tempo, Dynamics and Motion for
Performer 1, Prelude 6
maxima in the dynamic range coincide with high values of motion norm. Again, as
for this performer’s Prelude 7, the motion and dynamic data tend to occur across a
spread of tempi, not limited to low or high values. Performer 2’s score plot in
Figure 8.50 shows the greatest number of combined extremes at the beginning of the
piece and at the beginning of phrase 3 in bar 5. These align with Parncutt’s grouping
accents, which mark out the beginning and end of phrases. A large cluster of tempo
extremes is seen at the end of this first section in bar 8. The beginning of phrase
5 at bar 11 also sees a combination of tempo and motion extremes marking the
modulation into C major.
Performer 3’s scatter plot of data seen in Figure 8.51, shows distinct groupings
of maxima in motion norm occurring at high values of rms amplitude, and minima
of motion norm occurring at low values of rms amplitude. This is slightly differ-
ent to the spread of data found in the same performer’s interpretation of Prelude
7 (as seen in Figure 8.41). These values are slightly skewed for tempo as well with
the lower extremes in motion and dynamics occurring in the bottom half of the
tempo range, and the higher extremes occurring in the top half. Contrary to this
performer’s use of parameters in Prelude 7, there appear to be more parameter
extremes occurring simultaneously with one another. Performer 3’s score plot as
seen in Figure 8.52, shows a particular cluster of extremes in motion and tempo at
bar 7 which in Parncutt’s theory contains a cluster of melodic accents. The end of
phrase 3 at bar 8 is marked by a cluster of dynamics and tempo extremes. Following
this, the melodic accents on beat 2 of each phrase are marked by either dynamics
and tempo or dynamics and motion. Bar 13 onwards marks a cluster of motion and
tempo extremes as the piece modulates into C major.
[Figure 8.48: score of Prelude 6 for Performer 1, annotated with D, T and M markers for extremes in dynamics, tempo and motion; image and caption not recovered]
Figure 8.49: Scatter Plot Showing Extremes in Tempo, Dynamics and Motion for
Performer 2, Prelude 6
[Figure 8.50: score of Prelude 6 for Performer 2, annotated with D, T and M markers for extremes in dynamics, tempo and motion; image and caption not recovered]
Figure 8.51: Scatter Plot Showing Extremes in Tempo, Dynamics and Motion for
Performer 3, Prelude 6
[Figure 8.52: score of Prelude 6 for Performer 3, annotated with D, T and M markers for extremes in dynamics, tempo and motion; image and caption not recovered]
8.4 Conclusions
At the beginning of this chapter, four hypotheses were set out suggesting how per-
formers manipulated parameters such as tempo, dynamics and overall motion in
accordance with phrasing structure of the music being performed. Hypothesis 8.1
stated that regardless of the subjective and personal nature of physical gesture in
relation to musical structure, there would exist an underlying pattern that was re-
lated to phrasing and was common across all performers. Hypothesis 8.2 stated
that the underlying motion profile of the performer related to phrasing would be
the same across pieces. Hypothesis 8.3 stated that when investigating the role of
gesture in multi-modal detection of phrasing, a combination of aural and visual
parameters would provide the most accurate indicator of phrasing and following
on from this, Hypothesis 8.4 stated that where combinations of global maxima and
minima occurred in both aural and visual streams of data, these would be related
to the most important structural features of the composition.
From the gestural motion studies conducted in the earlier part of this chapter,
it was shown that despite the idiosyncratic nature of the performers’ gestures in
performances of both preludes, the underlying motion norm suggested the same
phrasing structure. This was confirmed by measuring the local maxima of the
motion profile between phrases for each pianist. These local maxima occurred re-
liably at the same point for each phrase for each performer. These patterns were
evident across all performers despite their background and ideas on movement
within performance. This confirms Hypothesis 8.1. When each performer’s
motion profiles are correlated across their performances of the two Preludes,
few pairs show a high correlation and some are even negatively correlated.
This suggests that the motion profile for each performer changes depending on
which piece they are performing, so Hypothesis 8.2 is rejected. The change may
be due to differences in rhythm, melody or harmony; however, as the
rhythmically repeating phrases of Prelude 7 tend to produce similar motion patterns
for each performer, motion may be strongly linked to rhythm.
Moving on to the results of the multi-modal analysis, structural information appears to
be intrinsic in pianists’ use of both aural and visual parameters within their
performances. From the box plots of motion norm, dynamics and tempo for
each performer and each piece, we can see that the spread of these parameters is
not consistent across performers but is similar across pieces. This suggests that
each performer has a particular style of playing. This is reinforced by the two-way
ANOVAs performed on the distances between the local maxima and minima and
the nearest phrase boundary which present a significant effect of performer on
motion for both pieces and for dynamics in Prelude 7. Significant effects of phrase
number on tempo for Prelude 7 and tempo for Prelude 6 suggest that performers’
use of these parameters at the ends of phrases is dependent on the position within
the score. As each performer’s ‘style’ is different and the use of these parameters
can vary according to the position in the score, it becomes apparent through
observation of the multi-modal graphs that a combination of parameters indicates
phrasing boundaries. The clearest example of this is at the harmonic arrival
between phrases 5 and 6, where global maxima and minima in motion, dynamics
and tempo coincide. This suggests that Hypothesis 8.3 is correct. When examining
the maxima and minima of the dataset and their occurrence in the musical score,
it is clear that performers tend to use combinations of these extremes at important
points in structure, suggesting that Hypothesis 8.4 is correct. These
extremes in motion, dynamics and tempo occur at particular accents of harmony,
melody and rhythm set out by Parncutt.
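The distance measure that feeds the ANOVAs above can be illustrated with a minimal sketch. It assumes that a parameter stream is available as a plain list of values and that phrase boundaries are given as a sorted list of times; the function names are hypothetical and are not taken from the analysis software used in this thesis.

```python
from bisect import bisect_left


def local_extrema(values):
    """Return (maxima, minima) as lists of indices of strict local extrema."""
    maxima, minima = [], []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] > values[i + 1]:
            maxima.append(i)
        elif values[i - 1] > values[i] < values[i + 1]:
            minima.append(i)
    return maxima, minima


def distance_to_nearest_boundary(t, boundaries):
    """Absolute distance from time t to the closest entry of sorted boundaries."""
    pos = bisect_left(boundaries, t)
    # Only the boundary just before and just after t can be the nearest one.
    candidates = boundaries[max(pos - 1, 0):pos + 1]
    return min(abs(t - b) for b in candidates)
```

Each extremum time can then be passed to `distance_to_nearest_boundary` to obtain one distance per extremum, grouped by performer and phrase number for the statistical test.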
In conclusion, structural information can be elucidated from examining certain
performance parameters. The continuous multi-modal streams form patterns for
each of the phrases and the extremes of the performer’s use of tempo, dynamics
and motion identify the most important structural features.
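The idea that coinciding extremes in several streams mark structural features can be sketched as a simple clustering of extreme times. This is an illustration only: the tolerance `window` and the dictionary layout are assumptions, and the analysis in this chapter was carried out by observation of the multi-modal graphs rather than by this procedure.

```python
def coincident_extremes(extreme_times, window):
    """Group extremes from several streams that fall within `window` seconds.

    extreme_times maps a stream name (e.g. "tempo") to a list of times.
    Returns (start_time, set_of_stream_names) for every cluster that
    involves at least two different streams.
    """
    events = sorted(
        (t, name) for name, times in extreme_times.items() for t in times
    )
    clusters, current = [], []
    for t, name in events:
        # Start a new cluster once we are too far from the cluster's first event.
        if current and t - current[0][0] > window:
            clusters.append(current)
            current = []
        current.append((t, name))
    if current:
        clusters.append(current)
    result = []
    for cluster in clusters:
        names = {name for _, name in cluster}
        if len(names) >= 2:
            result.append((cluster[0][0], names))
    return result
```

Clusters that contain all three streams would correspond to the combined tempo, dynamics and motion extremes discussed above.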
Chapter 9
In the previous chapter, results suggested that there are underlying patterns of
physical gesture across all performers and that these could be used to identify
phrasing boundaries. In combination with aural parameters of tempo and dy-
namics, clues concerning the hierarchical phrasing structure can be detected. This
experiment is designed to indicate whether phrasing structure can be predicted
purely from patterns of performance parameters, particularly for pieces of music
where the structure is not so explicit. Within this experiment I also explore the
role of finger gesture in piano performance and whether enhanced movements
can be related to specific accents. This follows on from the exploration of accents
illustrated in Chapter 8.3.
Six professional pianists were recorded performing Chopin’s Prelude in A ma-
jor (Op.28 No.7) and the finale of Chopin’s B flat minor sonata (Op.35). These
recordings were captured through audio, MIDI and finger motion analysis. This
chapter analyses the recordings taken with the multi-modal system described in
Chapter 9.1, building on the methodology described in [76] and extending the
preliminary results published in [79, 80]. The pianists are directed to perform both
Chopin pieces as they would in a normal concert situation. The six pianists
are a mixture of lecturers in piano studies and postgraduate
students at the following music conservatories: the Royal Scottish Academy of
Music and Drama, Glasgow (Fali Pavri and Carlisle Beresford Anderson Frank),
Napier University, Edinburgh (Simon Coverdale), Royal College of Music, London
(Jessica Chan), Royal Northern College of Music, Manchester (Lauren Hibberd)
and Royal Academy of Music, London (Martin Jones). In order for structure to
be discovered in cases where no a priori information on phrasing is available, this
experiment requires the performer to have concrete ideas on the finale and
experience of performing the whole sonata. For this reason, professional pianists who
have these pieces as part of their performing repertoire are used.
The pianists are also recorded for their interpretation of Chopin’s Prelude in A
major to provide a set of control data. For this piece, we can establish how each
performer uses aural and visual parameters at the phrasing boundaries. As these
phrasing boundaries for the finale are not explicitly known, the analysis will focus
on the note level with the first two phrases of the prelude and the first five bars of
the finale.
9.1 Method
As in Chapter 8, these multi-modal parameters of tempo, dynamics and motion
will initially be measured for an explicitly structured piece, Chopin’s Prelude in
A major op.28 no.7, which contains a rhythmically repeating two bar phrase (see
Chapter 7.1). These techniques will then be used in identifying performers’ inter-
pretation of an ambiguously structured piece, Chopin’s B flat minor sonata op.35
finale movement, the opening bars of which are shown in Chapter 7.3. Despite the
diverse opinions as to its analysis, this finale is still a widely performed piece as
part of the B flat minor sonata.
Each performer is recorded using the data capture system described in [79]
and in Chapter 5.2. This system captures audio, MIDI and video data whilst en-
suring minimum disturbance to the performer. The audio data is acquired through
the open source application Ardour whilst the MIDI data is captured through the
Moog Piano bar device [5] into the application Rosegarden. As the finale is gen-
erally performed at fast tempi and is also technically difficult, it is expected that
performers’ full body movements will be restricted [84] and so the motion anal-
ysis focusses entirely on hand movements. Each knuckle and joint of each hand
is detected as an x,y coordinate with the z coordinate estimated from the 3D algo-
rithms in Chapter 4. A post-recording self-report was conducted for each of the
performers, providing us with their own interpretations and comments on each
piece. The hypotheses for this experiment follow on from the results in Chapter 8.
It is expected that by examining patterns of finger motion, tempo and dynamics
for performances of Chopin’s Prelude in A major, phrasing boundaries will be dis-
covered in performances of the finale. As this is a subjective measurement, the
stated hypotheses that follow will be more observations on the analysis of these
parameters.
Hypothesis 9.1 Trajectories of finger motion in the x, y and z axes will reflect
expressive accents within the phrase.
Hypothesis 9.2 It is expected that wrist motion that reflects movements toward the
soundboard of the keyboard and movements toward the key-bed will
produce high values of rms amplitude.
Tempo and dynamics information are extracted from the MIDI and audio data
streams respectively. The MIDI data is processed to create matched PML files (see
Chapter 6.1.1 for this process) where each performed note is aligned to a score
note through the use of note IDs. The PML file is then submitted to the database
designed in [94]. This data is then queried for inter-onset intervals (IOIs) between
matched notes and keypress durations and returns a text file with this information
for each note. A separate query produces these IOIs and keypress durations as
bars plotted above notes in a score produced using the music typesetting program
Lilypond [3]. This is particularly useful in fast pieces where a normal time graph
may lose the intricacies of measurements for each note. The calculated IOIs are
converted into an estimation of tempo by normalising each note to a crotchet or
dotted crotchet beat, depending on the time signature of the piece, and dividing
60 by the resulting beat duration in seconds to give a beats per minute (bpm) value.
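The conversion from an inter-onset interval to a local tempo value can be sketched as follows. This is a minimal illustration under the assumption that the score duration of each note is known as a fraction of the beat; the function and parameter names are hypothetical and do not come from the database queries described above.

```python
def ioi_to_bpm(ioi_seconds, note_fraction_of_beat):
    """Convert one inter-onset interval to a local tempo estimate in bpm.

    note_fraction_of_beat is the score duration of the note as a fraction
    of the beat: 1.0 for a crotchet when the beat is a crotchet, 1/3 for a
    quaver when the beat is a dotted crotchet.
    """
    if ioi_seconds <= 0 or note_fraction_of_beat <= 0:
        raise ValueError("interval and note fraction must be positive")
    # Normalise the interval to the duration of one full beat.
    beat_seconds = ioi_seconds / note_fraction_of_beat
    return 60.0 / beat_seconds
```

For example, a crotchet lasting 0.5 s in a crotchet-beat metre gives 120 bpm, while a quaver lasting 0.1 s under a dotted crotchet beat gives roughly 200 bpm.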
Despite the availability of onset and offset pedal information from the MIDI
bar, the pedal markings are not included in the displays of keypress durations. It
was decided that as the pedal information is only present for when the sustain
pedal is fully depressed, it may not be of much use as professional pianists use a
range of pedal angles to alter the sound. Observations of the spectra of notes with
pedal on and off are analysed in [67]. The keypress durations are therefore not
exactly a measure of the length of time that a particular note is audible, but rather
an indication of how long the key is held down, as an estimate of articulation.
This will provide certain clues on the accenting of particular notes.
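A keypress duration as described here is simply the time between a key going down and the same key coming up. The sketch below assumes the MIDI stream has already been reduced to a time-ordered list of `(time, kind, pitch)` tuples; this layout is an assumption for illustration and is not the PML or database format.

```python
def keypress_durations(events):
    """Pair note-on and note-off events into (pitch, onset, duration) tuples.

    events is a time-ordered list of (time_seconds, kind, pitch), where kind
    is "on" or "off". A repeated "on" for a key that is already down is
    ignored, and an "off" without a matching "on" is skipped.
    """
    down = {}
    notes = []
    for time, kind, pitch in events:
        if kind == "on":
            down.setdefault(pitch, time)
        elif kind == "off" and pitch in down:
            onset = down.pop(pitch)
            notes.append((pitch, onset, time - onset))
    # Report notes in order of onset rather than order of release.
    return sorted(notes, key=lambda note: note[1])
```

Because pedal information is excluded, these durations describe how long the key was held rather than how long the note sounded.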
Dynamics are estimated by taking a simple measure of the rms amplitude of
the audio signal. This was preferred over the key velocity value of each note, as the
rms amplitude provides a better estimation of the overall loudness, which was
judged to be the more important quantity when considering how the performer
communicates phrasing structure.
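The rms amplitude measure can be sketched as a windowed root mean square over the audio samples. The window and hop lengths are free parameters here, since the values used for the recordings are not stated in this section.

```python
import math


def rms_amplitude(samples, window, hop):
    """Return the RMS of successive windows of `window` samples, `hop` apart."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    values = []
    # Only complete windows are measured; a trailing partial window is dropped.
    for start in range(0, len(samples) - window + 1, hop):
        frame = samples[start:start + window]
        values.append(math.sqrt(sum(s * s for s in frame) / window))
    return values
```

Each returned value summarises the overall signal level in one window, so simultaneous notes contribute together, which is the property that makes it a loudness estimate rather than a per-note value.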
The motion data is extracted as x, y, z coordinates for each marker. As the hand
has such a high number of degrees of freedom, it was decided not to condense the
data using principal components analysis as in the previous chapter, but instead
to examine particular markers of interest individually. As audio and video results
are not yet stored within the PML representation (although this is in development),
they are linked with the performed music by using the open source program Au-
dacity to manually label the bars of the piece from the audio recording.
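The manual bar labels can be linked to any time-stamped stream once they are exported. The sketch below assumes the labels have been exported from Audacity as a plain label file with one tab-separated `start`, `end`, `label` entry per line and no additional spectral-selection lines; the helper names are hypothetical.

```python
def parse_labels(text):
    """Parse tab-separated 'start<TAB>end<TAB>label' lines into tuples."""
    labels = []
    for line in text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        labels.append((float(parts[0]), float(parts[1]), parts[2]))
    return sorted(labels)


def label_at(labels, t):
    """Return the label of the last marker at or before time t, or None."""
    current = None
    for start, _end, name in labels:
        if start <= t:
            current = name
        else:
            break
    return current
```

With the labels parsed, every audio or motion sample at time `t` can be assigned to a bar with `label_at(labels, t)`.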
Particular issues arising from the methodology occurred in both the audio and
video recordings. An unforeseen issue with the Moog Piano bar arose when recording
the full range of MIDI notes through one channel. By default, the device splits
the keyboard into two channels at the D flat below middle C, and
so for the first three recordings (those of Fali Pavri, Simon Coverdale and Carlisle
Frank) only a subset of the MIDI notes was recorded. This was corrected
for the later recordings of Jessica Chan, Martin Jones and Lauren Hibberd.
Placing the camera directly above the keyboard appeared to capture
the most information about the hand movement. However, in two
cases, those of Simon Coverdale and Jessica Chan, the performer moved their head
over parts of the hands, obscuring them from the camera. The motion data was
estimated for these particular cases and would be improved with better estimation
algorithms as detailed in Chapter 4. For the following results, the examples taken
are those of Lauren Hibberd, Martin Jones and Jessica Chan. The results for the
other three performers can be seen in Appendix D.
Tempo and dynamics information are plotted against the phrasing boundaries
in the same form as for the previous experiment. As the fingers of the hand can
move largely independently of one another (with some biomechanical constraints),
PCA was not considered useful for examining finger motion during
performance. Instead, a few markers on each hand will be examined in isolation,
enabling the exploration of how finger movement contributes to the overall
phrasing of the piece. Considering the finger motion data, the x axis relates to
movement along the length of the keyboard, the y axis is movement towards and
away from the keyboard and the z axis relates to the height estimation. These
three axes are heavily influenced by the arrangement of notes being played and
height needed in preparation to physically play each note. These factors are all
closely related to sound production. However, what becomes apparent from these
measurements are products such as specific accents and groupings of notes, which
contribute to the performer’s interpretation. Larger body movements in fast pieces
such as these are few and far between, so it is expected that these small measure-
ments will provide the most information.
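Examining a marker in isolation amounts to pulling out one coordinate of one marker across all video frames. The sketch below assumes each frame is a dictionary from marker name to an `(x, y, z)` tuple; the marker name `RH_wrist` and this layout are hypothetical and are not the output format of the tracking system.

```python
AXES = {"x": 0, "y": 1, "z": 2}


def marker_axis(frames, marker, axis):
    """Return the time series of one coordinate of one marker.

    frames is a list of dicts mapping marker name -> (x, y, z). Frames in
    which the marker is missing (for example when it is occluded) yield None.
    """
    index = AXES[axis]
    return [
        frame[marker][index] if marker in frame else None
        for frame in frames
    ]
```

Keeping missing frames as `None` rather than dropping them preserves the alignment between the motion series and the audio and MIDI time lines.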
9.2 Results
Each performer is examined first for their performance of the Prelude and then
for the Finale. For a concise spread of results, three performers out of the six are
examined with the remaining graphs attached in Appendix D. It can be observed
from performances of the Prelude how performers employ expressive techniques
to communicate structural information such as phrase endings. Multi-modal
parameters are displayed as stacked graphs for overall comparison of aural and
visual cues. For these graphs, tempo is plotted as an estimation of beats per minute
extracted from the MIDI data, dynamics are presented as the rms
amplitude of the audio signal, and the movements of the left wrist marker and the
thumb’s metacarpophalangeal marker for each hand are plotted as examples of
general hand movement. For these movement graphs, the y axis reflects movement
towards and away from the keyboard, with the y value increasing as the marker
moves further away from the keyboard. The x axis reflects movement from the
left to the right of the keyboard, the x value increasing as the marker moves to
the right. The z axis estimate reflects movement in height between the key-bed and
the camera, with the z value increasing as the marker moves away from the
camera. This value is an estimate subject to noise, a limitation of the system
design (see Chapter 4.5), which aims to provide a lightweight, low-cost image
capture application. It should therefore be taken as an indication of height changes
rather than a strict measurement. It will nevertheless provide valuable information
as to how finger height changes throughout each phrase and in relation to other
audible parameters.
9.2.1 Martin Jones
[Figure: Martin Jones performing Chopin Prelude in A major. Stacked panels against Time (s) with LH and RH traces: wrist x axis movement (pixels), wrist y axis movement (pixels), wrist z axis estimate (mm), rms amplitude, tempo (bpm); markers at Start and Bars 1 to 4.]
Figure 9.1: Wrist Motion, Tempo and Dynamics for Martin Jones, Prelude in A
Major, with the first phrase running from the first blue vertical line until two-thirds
through bar 2 and the second phrase running from this point until two-thirds
through bar 4.
hands. In the z axis, in both hands, there is an increase in distance away from
the camera, towards the key-bed, towards the middle of the phrase. This phrase
runs from the start arrow to two-thirds through the second bar and also shows a
decrease in distance towards the end. This suggests that the hands are shaping
the phrase with the emphasis being on the middle of bar one. The three chords
at the end of the phrase seem to be played with decreasing height, which would
suggest the chords are being played with a lighter touch, and we would expect
smaller measures of rms amplitude for each subsequent chord. This is seen in the
measurements of dynamics underneath.
The results of the Lilypond typeset graphic produced from the database are
displayed in Figure 9.3, showing both IOIs and keypress durations in columns
underneath each matched note. The first column represents the IOI data and the
second column represents the keypress duration data. In this particular case, as
the Prelude is short and relatively simple in harmony and structure, we do not
glean much more information from this representation than noted in the previous
time graphs, and so this figure is included purely for interest as it is still a better
representation considering each note.
This alternative representation, although useful for scrutiny of every single note,
does not provide much more information than the original graphs in this case. It
may, however, be useful for comparing performers on a note-by-note
basis. For the rest of the examples, the database figures will only be shown
for performances of the finale. The remaining three pianists’ database results for
the finale are seen in Appendix E.
For Martin Jones’ performance of the finale, the wrist motion is displayed
alongside tempo and dynamics in Figure 9.5. Again, the x axis can be considered
a representation of pitch and, to an extent, the y axis represents the playing of
black notes, as the hand is generally moved into the piano to allow the performer
to reach the note. This is not exclusive, however, as we see a difference between
the first two bars in the y axis even though the construction of each bar in terms
of black and white notes is similar. Interestingly, the left and right hand do not
show the same pattern of movement, as would be expected if the y axis movement
were entirely dependent on the position of the white and black notes in the
score. In this axis, the right hand shows a repeated movement spanning the first
four bars, which suggests the twelve quavers in each bar are separated into groups
of six; this is also reflected in the tempo patterns. The z axis of wrist movement
shows peaks (which reflect a greater distance away from the camera, and thus a
movement towards the key-bed) at the beginning of bar 2 and in the first and
second halves of bar 3. Dynamics show a clear separation in the middle of bar 3 and
again a dip at the beginning of bar 5.
Observing the thumb motion in Figure 9.6, where the right hand thumb plays what
some analyses consider the most accented notes in each bar, we can
see a pattern in the y axis similar to the wrist motion in Figure 9.5, where the dips
correspond to the notes being played in the right hand. The left hand thumb plays
different notes to the right hand and so exhibits a different pattern of motion,
suggesting this accenting may be true for this particular performance.
Focussing more on the aural parameters and the particular note accents, the
database result of the IOIs and keypress durations is presented in Figures 9.7
and 9.8. This shows a particularly elongated note for the F at the very beginning of
the piece, with the B flat in the 3rd bar also held down for considerably longer than
the notes that follow. This elongated note coincides with the emphasised movement
seen in Figure 9.5. This note elongation is not imitated at the beginning of
bar 5, suggesting the performer is not making an effort to distinguish this bar from
the previous notes. The approach to this potential boundary is not characterised
by notable fluctuations in tempo, however, there is a slight diminuendo at the end
of bar 4.
For Martin Jones’ performance of the finale, we can infer that bar 5 is not
marked particularly as a phrasing boundary but simply a continuation, particu-
larly as his measurements from the prelude appear to highlight the start of new
phrases with all three measured parameters. Attention is drawn to bar 3, where
particular accents in movement and tempo are most likely a result of the change in
composition where each group of six quavers are now different pitches as opposed
to the repetition in pitch of six quaver groups seen at the beginning.
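A grouping of the quavers into sixes would show up as a recurring lengthening at particular positions within the bar. One way to look for this, sketched here as an illustration rather than as the method used for the figures, is to average the IOIs at each note position across bars. The sketch assumes a single stream of notes with the same number of notes in every bar.

```python
def mean_ioi_by_position(iois, notes_per_bar):
    """Average inter-onset interval at each note position within the bar."""
    sums = [0.0] * notes_per_bar
    counts = [0] * notes_per_bar
    for i, ioi in enumerate(iois):
        position = i % notes_per_bar
        sums[position] += ioi
        counts[position] += 1
    # Positions that never occur (very short input) are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]
```

With twelve quavers per bar, consistently longer averages at positions 0 and 6 would be consistent with the grouping into sixes described above.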
[Figure: Martin Jones performing Chopin Prelude in A major. Stacked panels against Time (s) with LH and RH traces: thumb x axis movement (pixels), thumb y axis movement (pixels), thumb z axis estimate (mm), rms amplitude, tempo (bpm); markers at Start and Bars 1 to 4.]
Figure 9.2: Thumb Motion, Tempo and Dynamics for Martin Jones, Prelude in
A Major, with the first phrase running from the first blue vertical line until two-thirds
through bar 2 and the second phrase running from this point until two-thirds
through bar 4.
[Figure: Lilypond-typeset score annotated with per-note inter-onset intervals and keypress durations.]
Figure 9.3: Database Results Page 1 for Martin Jones, Prelude in A Major, the first
row of columns detailing inter-onset intervals and the second row of columns
detailing the keypress durations
[Figure: Lilypond-typeset score annotated with per-note inter-onset intervals and keypress durations.]
Figure 9.4: Database Results Page 2 for Martin Jones, Prelude in A Major, the first
row of columns detailing inter-onset intervals and the second row of columns
detailing the keypress durations
[Figure: Martin Jones performing Chopin finale op.35. Stacked panels against Time (s) with LH and RH traces: wrist x axis movement (pixels), wrist y axis movement (pixels), wrist z axis estimate (mm), rms amplitude, tempo (bpm); markers at Start and Bars 2 to 5.]
Figure 9.5: Wrist Motion, Tempo and Dynamics for Martin Jones performing the
Chopin finale
[Figure: Martin Jones performing Chopin finale op.35. Stacked panels against Time (s) with LH and RH traces: thumb x axis movement (pixels), thumb y axis movement (pixels), thumb z axis estimate (mm), rms amplitude, tempo (bpm); markers at Start and Bars 2 to 5.]
Figure 9.6: Thumb Motion, Tempo and Dynamics for Martin Jones performing the
Chopin finale
[Figure: Lilypond-typeset score annotated with per-note inter-onset intervals and keypress durations.]
Figure 9.7: Database Results Page 1 for Martin Jones, B Flat minor Sonata finale, the
first row of columns detailing inter-onset intervals and the second row of columns
detailing the keypress durations
[Figure: Lilypond-typeset score annotated with per-note inter-onset intervals and keypress durations.]
Figure 9.8: Database Results Page 2 for Martin Jones, B Flat minor Sonata finale, the
first row of columns detailing inter-onset intervals and the second row of columns
detailing the keypress durations
9.2.2 Jessica Chan
Considering another set of performances, Jessica Chan’s Prelude performance is
plotted in Figure 9.9. A similar pattern in the x axis of the wrist motion reflects the
pitch of the phrase. Jessica’s wrists move more frequently towards and away from
the keyboard, with each chord being shaped by the movement of the hand. From
this we can see how the performer ’releases’ each chord by movements away from
the keyboard and towards the camera in height. A general decrease in dynamics is
seen throughout the phrase (from the beginning of the piece until two thirds of the
way through bar 2) with a dynamic peak on the metrical accent on the first beat of
every two bars.
[Figure: Jessica Chan performing Chopin Prelude in A major. Stacked panels against Time (s) with LH and RH traces: wrist x axis movement (pixels), wrist y axis movement (pixels), wrist z axis estimate (mm), rms amplitude, tempo (bpm).]
Figure 9.9: Wrist Motion, Tempo and Dynamics for Jessica Chan, Prelude in A
Major, with the first phrase running from the first blue vertical line until two-thirds
through bar 2 and the second phrase running from this point until two-thirds
through bar 4.
The thumb motion data suffered from the thumb being obscured by the head during
recording, but the data displayed in Figure 9.10 still shows a pattern where each chord
is accompanied by a movement in the hand which may be the performer ‘releasing the chord’.
The biggest thumb height fluctuation is seen at the beginning of each phrase (at
the start and at the end of bar 2) despite the thumb not being responsible for producing
the first note of each phrase.
[Figure: Jessica Chan performing Chopin Prelude in A major. Stacked panels against Time (s) with LH and RH traces: thumb x axis movement (pixels), thumb y axis movement (pixels), thumb z axis estimate (mm), rms amplitude, tempo (bpm).]
Figure 9.10: Thumb Motion, Tempo and Dynamics for Jessica Chan, Prelude in
A Major, with the first phrase running from the first blue vertical line until two-thirds
through bar 2 and the second phrase running from this point until two-thirds
through bar 4.
Jessica Chan’s performance of the finale is shown for these same parameters in
Figure 9.11 which, much like Martin Jones’ tempo estimations, suggests a grouping
of the notes into sixes. The dynamics here reflect this grouping to a certain extent,
with peaks in bar 2 and bar 3 and a large peak just before bar 5. The grouping
is demonstrated physically by the y axis movement. An interesting point in the z
axis movement occurs simultaneously with the peak in rms amplitude
around the E flat of the 4th bar, which is also the highest pitch in the
piece so far. Another point occurs before this in the z axis, where there is a
decrease in distance from the camera coinciding with a shift away from the keyboard
in the y axis. This could possibly be a product of a fingering change resulting in a
quick lift away from the keyboard.
[Figure: Jessica Chan performing Chopin finale op.35. Stacked panels against Time (s) with LH and RH traces: wrist x axis movement (pixels), wrist y axis movement (pixels), wrist z axis estimate (mm), rms amplitude, tempo (bpm).]
Figure 9.11: Wrist Motion, Tempo and Dynamics for Jessica Chan, performing the
Chopin finale
The thumb motion, seen in Figure 9.12, shows large increases in distance from
the camera, moving towards the key-bed in the first bar, particularly in the right
hand. This pattern does not continue, suggesting that this particular emphasis is
just for the opening bar of the phrase. Another peak in distance in the z axis is
seen in the second half of bar 3, much like Martin Jones’ emphasis of this change
in composition.
Delving into the note level of the aural parameters, the database result for
Jessica Chan is displayed in Figures 9.13 and 9.14. We observe again a slight
elongation in IOI and keypress duration for the first note in the piece, but not as
pronounced as in Martin Jones’ performance. This coincides with the emphasised
movements seen in the first bar for the thumb movement in Figure 9.12. The E flat
in the fourth bar is also held for a particularly long time, as reflected in the previous
movement and dynamics analysis. However, no specific accents appear to occur
at the beginning of bar 5.
[Figure: Jessica Chan performing Chopin finale op.35. Stacked panels against Time (s) with LH and RH traces: thumb x axis movement (pixels), thumb y axis movement (pixels), thumb z axis estimate (mm), rms amplitude, tempo (bpm).]
Figure 9.12: Thumb Motion, Tempo and Dynamics for Jessica Chan, performing
the Chopin finale
Relating this information back to her performance of the Prelude, we see a large
fluctuation in dynamics and movement near the end of bar 4 in the finale which
would suggest the end of a phrase, however, this does not appear to be charac-
terised by the same movement in tempo. We can infer from this that although bar
5 has not been ’marked’ by the performer as a definite phrasing boundary, it is still
recognised as a juncture where the notes experience a change in composition and
key, much like the change midway through bar 3.
[Figure: Lilypond-typeset score annotated with per-note inter-onset intervals and keypress durations.]
Figure 9.13: Database Results Page 1 for Jessica Chan, B Flat minor Sonata finale,
the first row of columns detailing inter-onset intervals and the second row
of columns detailing the keypress durations
[Figure 9.14: score excerpt not reproduced; the extracted value rows are kept below in their original order.]
0.09 0.10 0.12 0.11 0.07 0.10 0.09 0.08 0.13 0.06 0.10 0.12 0.09 0.07 0.09 0.11 0.07 0.13 0.06 0.08 0.11 0.12 0.10 0.14
0.13 0.11 0.16 0.12 0.11 0.12 0.09 0.11 0.11 0.07 0.12 0.08 0.11 0.08 0.18 0.11 0.09 0.14 0.07 0.13 0.10 0.10 0.16 0.14
0.09 0.10 0.13 0.10 0.09 0.12 0.07 0.07 0.11 0.07 0.11 0.14 0.10 0.06 0.10 0.12 0.06 0.11 0.08 0.07 0.12 0.11 0.10 0.14
0.13 0.08 0.13 0.11 0.14 0.12 0.12 0.08 0.08 0.06 0.19 0.12 0.13 0.06 0.12 0.13 0.06 0.12 0.09 0.08 0.09 0.10 0.08 0.22
0.14 0.10 0.09 0.09 0.12 0.10 0.06 0.11 0.10 0.06 0.10 0.09 0.08 0.09 0.10 0.10 0.09 0.07 0.07 0.11 0.08 0.09 0.08 0.12
0.13 0.13 0.16 0.12 0.14 0.07 0.06 0.07 0.08 0.11 0.07 0.08 0.10 0.11 0.10 0.07 0.08 0.14 0.07 0.09 0.06 0.10 0.10 0.14
0.19 0.07 0.07 0.08 0.13 0.03 0.10 0.14 0.08 0.07 0.10 0.11 0.08 0.08 0.11 0.07 0.08 0.11 0.05 0.10 0.10 0.06 0.08 0.13
0.19 0.11 0.10 0.09 0.10 0.04 0.09 0.09 0.08 0.06 0.09 0.06 0.09 0.10 0.11 0.07 0.07 0.10 0.06 0.08 0.08 0.07 0.10 0.08
Figure 9.14: Database Results Page 2 for Jessica Chan, B Flat minor Sonata finale,
the first row of columns detailing inter-onset intervals and the second row
of columns detailing the keypress durations
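The inter-onset intervals and keypress durations tabulated in these database results can be derived from note onset and release times. The following is a minimal Python sketch, not the database implementation used in this thesis: the function name timing_features, the (onset, release) tuple layout and the example times are hypothetical, and the per-quaver tempo of 60/IOI is an assumption made for illustration.

```python
from typing import List, Optional, Tuple, Dict


def timing_features(
    notes: List[Tuple[float, float]],
) -> List[Dict[str, Optional[float]]]:
    """Compute IOI, keypress duration and local tempo for one melodic line.

    `notes` holds (onset_s, release_s) pairs sorted by onset time.
    The IOI of a note is the time from its onset to the next onset, so the
    last note has no IOI. Tempo is taken here as 60 / IOI (an assumption).
    """
    rows: List[Dict[str, Optional[float]]] = []
    for i, (onset, release) in enumerate(notes):
        duration = release - onset
        ioi: Optional[float] = None
        tempo: Optional[float] = None
        if i + 1 < len(notes):
            ioi = notes[i + 1][0] - onset
            if ioi > 0:
                tempo = 60.0 / ioi
        rows.append({"ioi": ioi, "duration": duration, "tempo": tempo})
    return rows


if __name__ == "__main__":
    # Hypothetical onset/release times in seconds, for illustration only.
    example = [(0.00, 0.32), (0.30, 0.46), (0.39, 0.70)]
    for row in timing_features(example):
        print(row)
```

A keypress duration longer than the IOI, as in the first hypothetical note above, indicates that the key is still held when the next note begins.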
9.2.3 Lauren Hibberd
The final example of results is taken from Lauren Hibberd’s performances.
Figure 9.15 shows only slight fluctuations in tempo for each phrase, increasing
slightly halfway through bar 1 and decreasing slightly for the
end of the phrase two thirds of the way through bar 2. The dynamics increase from the
beginning of each phrase and do not show any overall decreases apart from
those characteristic of the piano action. The wrist movement shows the ‘releasing’
action of each chord, much like the performance of Jessica Chan. An increase in
distance away from the camera at the beginning of bar 1 in the left hand could
reflect the metrical accent of the first beat of that bar, an accent which occurs again
at the beginning of bar 3. The general shape of the wrist movement of the left hand
in the z axis reflects this phrase shaping, with the accent on the first beat followed
by the three chords played at the same height.
The thumb motion displayed in Figure 9.16 shows a slightly different pattern
in the y axis movement, with a movement into the keyboard at the last chord of
phrase 1, two thirds of the way through bar 2. This also occurs in the left hand at
the beginning of bars 1 and 3, where the metrical accents of the phrase occur. Slight
decreases in height above the normal can be seen at these accents as well. Height
fluctuations in the left hand show particular emphasis at these metrical accents as
well as on the last chord of the phrase.
Observing this pattern of performance parameters for Lauren Hibberd’s performance
of the finale, as seen in Figure 9.17, we can see a general crescendo in
dynamics towards bar 5. The tempo fluctuations again appear to group each bar
of quavers into sixes. This grouping is reflected in the y axis movement;
however, this pronounced shaping ceases at bar 4, where the left hand decreases at
one point near the start of the bar and the right hand decreases at another point
near the end of the bar. Both of these occur simultaneously with increases in the
rms amplitude. Wrist movements in the z axis in bars 1-2 also reflect this grouping,
with a particular increase in height towards the key-bed halfway through bar 4
coinciding with a peak in the rms amplitude.
The thumb motion for the finale, as seen in Figure 9.18, shows similar results to
the wrist motion in the y axis, with three distinct peaks towards and away from
the piano in bar 3. The thumb height also reflects the grouping movements in bars
1 and 2.
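The rms amplitude used as the dynamics measure in these graphs can be computed frame by frame from the audio samples. A minimal sketch follows, assuming a mono signal held as a sequence of sample values; frame_rms, the frame length and the hop size are hypothetical choices for illustration and are not the settings used for the recordings in this thesis.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Root-mean-square amplitude of successive frames of a mono signal.

    Frames of `frame_len` samples start every `hop` samples; a trailing
    partial frame is ignored.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    values: List[float] = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        values.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return values


if __name__ == "__main__":
    # A silent frame followed by a louder one (hypothetical sample values).
    print(frame_rms([0, 0, 4, 4], frame_len=2, hop=2))
```

Shorter frames follow individual note attacks more closely, while longer frames give a smoother contour that is easier to compare with phrase-level changes in tempo and movement.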
[Figure 9.15 plot, "Lauren Hibberd performing Chopin Prelude in A major", LH and RH traces: wrist x axis movement (pixels), wrist y axis movement (pixels), wrist z axis estimate (mm), rms amplitude, tempo (bpm); time axis 10 to 17 s.]
Figure 9.15: Wrist Motion, Tempo and Dynamics for Lauren Hibberd, Prelude in
A Major, with the first phrase running from the first blue vertical line until
two-thirds through bar 2 and the second phrase running from this point until
two-thirds through bar 4.
[Figure 9.16 plot, "Lauren Hibberd performing Chopin Prelude in A major", LH and RH traces: thumb x axis movement (pixels), thumb y axis movement (pixels), thumb z axis estimate (mm), rms amplitude, tempo (bpm); time axis 10 to 17 s.]
Figure 9.16: Thumb Motion, Tempo and Dynamics for Lauren Hibberd, Prelude
in A Major, with the first phrase running from the first blue vertical line until
two-thirds through bar 2 and the second phrase running from this point until
two-thirds through bar 4.
[Figure 9.17 plot, "Lauren Hibberd performing Chopin finale op.35", LH and RH traces: wrist x axis movement (pixels), wrist y axis movement (pixels), wrist z axis estimate (mm), rms amplitude, tempo (bpm); time axis 16.5 to 21.5 s.]
Figure 9.17: Wrist Motion, Tempo and Dynamics for Lauren Hibberd, performing
the Chopin finale
[Figure 9.18 plot, "Lauren Hibberd performing Chopin finale op.35", LH and RH traces: thumb x axis movement (pixels), thumb y axis movement (pixels), thumb z axis estimate (mm), rms amplitude, tempo (bpm); time axis 16.5 to 21.5 s.]
Figure 9.18: Thumb Motion, Tempo and Dynamics for Lauren Hibberd, performing
the Chopin finale
The database results in Figures 9.19 and 9.20 shed light on the previous graph
findings by displaying increased IOIs and keypress durations at the 3rd and 4th
quaver of each group of six. This does not fluctuate much on the approach to bar
5. From these parameters we could infer again that this performer does not mark
bar 5 as a definitive phrasing boundary.
These results were similar for most performances, one exception being Simon
Coverdale, whose elongated notes in bar 5, matched with a ritardando and
diminuendo on the approach, suggest the presence of a phrasing boundary. Fali
Pavri and Carlisle Frank showed increases in dynamics much like Lauren
Hibberd’s performance; however, these were not matched by similar fluctuations in
tempo. This in-depth note analysis presented by the database, alongside graphs of
3D motion in the pianists’ fingers, has allowed us to observe the particularly accented
or ‘stressed’ notes in an effort to elucidate the structure being performed.
Analysis of movement in the wrist and thumb markers has indicated particular
accents on notes in performances of both the prelude and the finale, confirming
hypothesis 9.1. Measurements of height appear to relate to the dynamics of the
resultant notes in the performance; however, as the z axis is an estimate, we cannot
confirm hypothesis 9.2 outright. From these continuous results of movement,
tempo and dynamics, the maxima and minima of each dataset are examined for
their position in the score.
[Figure 9.19: score excerpt and its annotation marks not reproduced; the extracted value rows are kept below in their original order.]
0.12 0.07 0.11 0.17 0.09 0.13 0.13 0.09 0.13 0.16 0.11 0.14 0.14 0.13 0.10 0.12 0.16 0.13 0.13 0.12 0.10 0.11 0.15 0.10
0.17 0.16 0.23 0.19 0.05 0.10 0.13 0.14 0.21 0.18 0.10 0.04 0.08 0.10 0.14 0.16 0.18 0.12 0.11 0.11 0.14 0.09 0.19 0.05
0.15 0.11 0.07 0.13 0.11 0.17 0.11 0.10 0.09 0.17 0.08 0.17 0.14 0.14 0.11 0.10 0.14 0.14 0.14 0.16 0.08 0.11 0.09 0.14
0.22 0.16 0.19 0.12 0.09 0.14 0.14 0.17 0.27 0.22 0.09 0.06 0.19 0.24 0.20 0.19 0.14 0.14 0.16 0.19 0.09 0.18 0.11 0.05
0.16 0.10 0.11 0.15 0.10 0.14 0.11 0.11 0.14 0.13 0.09 0.14 0.13 0.11 0.14 0.11 0.09 0.15 0.11 0.14 0.11 0.14
0.19 0.15 0.14 0.14 0.10 0.05 0.13 0.21 0.17 0.13 0.10 0.15 0.14 0.21 0.16 0.13 0.11 0.07 0.19 0.22 0.18 0.23 0.13
0.15 0.08 0.12 0.18 0.08 0.14 0.11 0.12 0.15 0.14 0.09 0.11 0.12 0.11 0.15 0.14 0.07 0.16 0.11 0.18 0.08 0.15
0.21 0.21 0.21 0.21 0.08 0.05 0.20 0.26 0.17 0.11 0.09 0.13 0.23 0.29 0.17 0.14 0.08 0.07 0.18 0.24 0.11 0.16 0.09
Figure 9.19: Database Results Page 1 for Lauren Hibberd, B Flat minor Sonata
finale, the first row of columns detailing inter-onset intervals and the second row
of columns detailing the keypress durations
[Figure 9.20: score excerpt and its annotation marks not reproduced; the extracted value rows are kept below in their original order.]
0.10 0.13 0.15 0.11 0.10 0.16 0.12 0.09 0.13 0.12 0.11 0.13 0.12 0.09 0.14 0.14 0.09 0.16 0.08 0.12 0.15 0.11 0.12 0.09
0.12 0.24 0.15 0.11 0.15 0.16 0.11 0.11 0.10 0.14 0.15 0.14 0.11 0.10 0.14 0.14 0.10 0.15 0.09 0.16 0.15 0.12 0.10 0.11
0.12 0.08 0.15 0.14 0.12 0.10 0.09 0.11 0.15 0.09 0.15 0.17 0.10 0.10 0.13 0.16 0.08 0.03 0.12 0.12
0.19 0.17 0.14 0.13 0.13 0.09 0.09 0.12 0.13 0.06 0.15 0.16 0.14 0.12 0.14 0.18 0.08 0.14 0.33 0.16 0.08 0.12
0.14 0.12 0.11 0.15 0.14 0.14 0.08 0.15 0.12 0.11 0.13 0.13 0.09 0.13 0.15 0.12 0.14 0.12 0.09 0.14 0.14 0.09 0.12 0.13
0.15 0.12 0.15 0.17 0.16 0.12 0.09 0.15 0.09 0.06 0.19 0.10 0.11 0.18 0.15 0.12 0.12 0.08 0.11 0.15 0.09 0.09 0.14 0.21
0.12 0.11 0.12 0.13 0.14 0.12 0.09 0.17 0.12 0.10 0.17 0.12 0.09 0.12 0.15 0.11 0.13 0.12 0.11 0.14 0.13 0.07 0.15 0.13
0.15 0.10 0.13 0.16 0.18 0.08 0.08 0.13 0.06 0.07 0.15 0.07 0.12 0.15 0.26 0.09 0.11 0.11 0.15 0.14 0.09 0.08 0.16 0.14
Figure 9.20: Database Results Page 2 for Lauren Hibberd, B Flat minor Sonata
finale, the first row of columns detailing inter-onset intervals and the second row
of columns detailing the keypress durations
from the keyboard. Plotting these results as box-plots for each performer
allows observation of each performer’s use of these parameters throughout each
piece. Each measurement is normalised for each performer.
The tempo for performances of the prelude does not fluctuate widely, as
each performer has a fairly limited spread of results, as seen in Figure 9.21. Motion
is generally varied more than tempo and dynamics; however, Martin
Jones and Simon Coverdale appear to have a skewed distribution. For dynamics,
Martin Jones and Lauren Hibberd have a more normal distribution compared
to the other performers. Performances of the finale show a more similar
use of performance parameters across performers, as seen in Figure 9.22. Notably,
one would expect the prelude to have far more expressive movement, which may
be true considering the release of notes, but this is not captured in this extracted
dataset.
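The per-performer normalisation and the quantities drawn in a box-plot can be illustrated as follows. This is a minimal sketch under stated assumptions: the thesis does not specify the normalisation here, so min-max scaling is assumed (division by the mean or maximum would be an alternative), and the function names and example values are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def normalise(values: Sequence[float]) -> List[float]:
    """Min-max scale one performer's measurements to the range 0 to 1.

    Assumed normalisation; constant input is mapped to zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def five_number_summary(values: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Minimum, lower quartile, median, upper quartile and maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


if __name__ == "__main__":
    # Hypothetical tempo readings (bpm) for one performer.
    tempo = [40, 50, 60, 70, 80]
    scaled = normalise(tempo)
    print(scaled)
    print(five_number_summary(scaled))
```

A skewed distribution of the kind noted for some performers would show up as a median lying well away from the midpoint between the two quartiles.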
[Figure 9.21 box-plots: one row of three panels (Tempo, Motion, Dynamics) for each of CFrank, FPavri, JChan, LHibberd, MJones and SCoverdale.]
Figure 9.21: Box-plots for all Six Performers measuring Tempo, Motion norm and
Dynamics used in Performances of Chopin’s A major Prelude
[Figure 9.22 box-plots: one row of three panels (Tempo, Motion, Dynamics) for each of CFrank, FPavri, JChan, LHibberd, MJones and SCoverdale.]
Figure 9.22: Box-plots for all Six Performers measuring Tempo, Motion norm and
Dynamics used in Performances of Chopin’s B flat minor sonata finale
To show how these parameters are used in accenting
particular notes, the outliers in each dataset, below the 5th percentile or
above the 95th percentile, are highlighted in the scatter plots in Figures
9.23, 9.25 and 9.27, and then plotted at the appropriate place in the score
in Figures 9.24, 9.26 and 9.28. These measurements are only taken for the first five
bars of the finale.
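The selection of extremes below the 5th and above the 95th percentile can be sketched as follows. This is an illustrative Python example rather than the analysis code used in this thesis: the function names are hypothetical, and linear interpolation between ranked values is an assumed percentile definition.

```python
import math
from typing import Dict, Sequence


def percentile(ordered: Sequence[float], p: float) -> float:
    """Linearly interpolated percentile of an already sorted sequence."""
    if not ordered:
        raise ValueError("percentile of empty data")
    rank = (p / 100.0) * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def extremes(values: Sequence[float], low: float = 5.0, high: float = 95.0) -> Dict[int, str]:
    """Map the index of each extreme value to "min" or "max".

    A value is flagged "min" if it lies below the `low` percentile and
    "max" if it lies above the `high` percentile of the same dataset.
    """
    ordered = sorted(values)
    lo = percentile(ordered, low)
    hi = percentile(ordered, high)
    flagged: Dict[int, str] = {}
    for index, value in enumerate(values):
        if value < lo:
            flagged[index] = "min"
        elif value > hi:
            flagged[index] = "max"
    return flagged


if __name__ == "__main__":
    # Hypothetical per-note values; indices stand for note positions.
    print(extremes(list(range(1, 101))))
```

Because the returned keys are positions in the input sequence, they can be mapped back to note positions in the score when the values are stored in score order.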
Figure 9.23: Scatter Plot Showing Extremes in Tempo, Dynamics and Motion for
Martin Jones performing the Chopin finale
The scatter plot for Martin Jones’ performance of the finale (seen in Figure 9.23)
shows quite a few combinations in motion (blue) and dynamics (pink) as well as
tempo (green) and dynamics (blue). These translate into the annotated score shown
in Figure 9.24 by highlighting the halfway point in each bar, particularly in the left
hand. These accents appear to be rhythmical rather than entirely
structural.
Jessica Chan’s performance is characterised by the scatter plot shown in Figure
9.25. An interesting point to note is the location of the minima and maxima of
the motion parameter. The minima tend to occur in the lower half of the tempo
range, whilst the maxima appear to occur within the upper half. These maxima
are also characterised by larger rms amplitude values than the minima. These
measurements may reflect the grouping wrist and tempo movements seen in the
earlier time graphs. The location of these maxima and minima in correspondence
with the musical score is seen in Figure 9.26. Again the beginning of the finale is
well accented across tempo, dynamics and motion, with further combinations of
[Figure 9.24: annotated score not reproduced.]
Figure 9.24: Annotated Score for Martin Jones’ performance of the Chopin finale,
noting extremes in tempo (T), dynamics (D) and motion (M)
Figure 9.25: Scatter Plot Showing Extremes in Tempo, Dynamics and Motion for
Jessica Chan performing the Chopin finale
[Figure 9.26: annotated score not reproduced.]
Figure 9.26: Annotated Score for Jessica Chan’s performance of the Chopin finale,
noting extremes in tempo (T), dynamics (D) and motion (M)
Figure 9.27: Scatter Plot Showing Extremes in Tempo, Dynamics and Motion for
Lauren Hibberd performing the Chopin finale
[Figure 9.28: annotated score not reproduced.]
Figure 9.28: Annotated Score for Lauren Hibberd’s performance of the Chopin
finale, noting extremes in tempo (T), dynamics (D) and motion (M)
9.3 Exploring Finger Curvature
An advantage of using the finger motion capture system is that we can also examine
the curvature of the fingers as they are used to play each note. For this particular
question, the curvature of the thumb and the first finger is examined for the finale.
These are calculated as distances between the x, y coordinates of the
metacarpophalangeal and the proximal phalanx for the thumb and first finger, and the
distance from the proximal to the distal phalanx of the first finger. In the graphs
for each performer (Figure 9.29 for Jessica Chan, Figure 9.30 for Martin Jones
and Figure 9.31 for Lauren Hibberd), an increase in each of the three curvature
graphs indicates that the finger is becoming flatter and more parallel to the keyboard. A
decrease indicates that the finger is becoming more curved.
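The three curvature measures described above are distances between pairs of tracked marker positions in the image plane. A minimal sketch is given below, assuming each marker is available as an (x, y) pixel coordinate for a single video frame; the function and key names are hypothetical and do not come from the tracking system itself.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def marker_distance(a: Point, b: Point) -> float:
    """Euclidean distance in pixels between two marker positions."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def curvature_measures(
    thumb_meta: Point,
    thumb_prox: Point,
    finger_meta: Point,
    finger_prox: Point,
    finger_dist: Point,
) -> Dict[str, float]:
    """Return the three projected distances used as curvature measures.

    A larger distance means the segment appears longer from above,
    i.e. the finger is flatter; a smaller distance means more curved.
    """
    return {
        "thumb_meta_prox": marker_distance(thumb_meta, thumb_prox),
        "finger_meta_prox": marker_distance(finger_meta, finger_prox),
        "finger_prox_dist": marker_distance(finger_prox, finger_dist),
    }


if __name__ == "__main__":
    # Hypothetical pixel coordinates for one frame.
    print(curvature_measures((0, 0), (30, 40), (100, 0), (100, 50), (106, 58)))
```

Since the measure is a projected length, it depends on the distance of the hand from the camera as well as on curvature, so comparisons are most meaningful within a single performance.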
[Figure 9.29 plot, "Jessica Chan performing Chopin finale op.35", LH and RH traces: 1st finger prox-dist curvature (pixels), 1st finger meta-prox curvature (pixels), thumb meta-prox curvature (pixels), rms amplitude, tempo (bpm); time axis 22 to 27 s.]
Figure 9.29: Finger Curvature, Tempo and Dynamics for Jessica Chan, performing
the Chopin finale
Jessica Chan demonstrates a style of playing in which she moves her hands
in each of the three axes beyond the movement required to physically
play each note. As seen in the previous graphs of the coordinates of the
wrist markers, a ‘releasing’ action is seen often in the prelude, and this is carried
into the finale despite the dramatically different tempo. For this performance, as
seen in Figure 9.29, the thumb curvature characterises this movement in the first
bar, where a repeating pattern is seen for the twelve quavers, separating into two
groups of six.
[Figure 9.30 plot, "Martin Jones performing Chopin finale op.35", LH and RH traces: 1st finger prox-dist curvature (pixels), 1st finger meta-prox curvature (pixels), thumb meta-prox curvature (pixels), rms amplitude, tempo (bpm); bar markers at Start, Bar 2, Bar 3, Bar 4 and Bar 5; time axis 15 to 20 s.]
Figure 9.30: Finger Curvature, Tempo and Dynamics for Martin Jones, performing
the Chopin finale
In contrast, Martin Jones keeps his fingers flat whilst playing the first bar, which
is demonstrated in Figure 9.30 by negligible differences in curvature. The thumb is
kept mainly flat for the next few bars, whilst the curvature for the first finger shows
clearly where notes are performed using this particular finger. The differences in
curvature for these performed notes are negligible, suggesting that he uses his first
finger in the same way for each note. The use of a flat thumb and a curved first finger
suggests Martin Jones may be using the right hand thumb to emphasise the first
and fifth quaver in each group of six as an underlying melody.
[Figure 9.31 plot, "Lauren Hibberd performing Chopin finale op.35", LH and RH traces: 1st finger prox-dist curvature (pixels), 1st finger meta-prox curvature (pixels), thumb meta-prox curvature (pixels), rms amplitude, tempo (bpm); time axis 16.5 to 21.5 s.]
Figure 9.31: Finger Curvature, Tempo and Dynamics for Lauren Hibberd, performing
the Chopin finale
Lauren Hibberd is another performer who keeps her fingers relatively flat whilst
performing the finale, which can again be seen in the curvature plotted in Figure
9.31. The fingers remain fairly flat throughout the piece, whereas the tip of the
first finger, between the proximal and distal interphalangeal, changes in curvature
where the notes need to be performed. An interesting point to note is that the
curvature of the fingers remains constant throughout the crescendo in amplitude
of the sound wave, suggesting that curvature is not a direct factor in the loudness of
each note.
These results demonstrate the ability of the finger tracking system to glean information
on performer playing styles and also structural information intended
by the performer. The differences between performers are clearly visible in the
changes in curvature for each finger. Further investigation would involve the
curvature of each of the fingers and attempt to align it to the performed notes.
9.4 Conclusions
Quantitative measurements of aural and visual parameters in performances of
both Chopin’s Prelude in A major Op.28 No.7 and the finale of his B flat minor sonata
Op.35 reveal structural information from the manipulation of tempo, dynamics
and finger movement. This is used to analyse a point of disagreement amongst
traditional analyses on the importance of bar 5 in the finale, as either a continuation
of the initial theme starting at bar 1 or the beginning of a new phrase, marking the
first four bars as simply an introduction.
The beginning of this chapter again stated some hypotheses relating to how
performers use the parameters of tempo, dynamics and finger motion to project
structural ideas. Hypothesis 9.1 stated that trajectories of finger motion in the x, y
and z axes would reflect expressive accents within the phrase, and Hypothesis 9.2
stated that wrist motion reflecting movements toward the
soundboard of the keyboard, and movements toward the key-bed, would produce
high values of rms amplitude.
Continuous measurement and display of these parameters against time allows
closer observation of fluctuations at particular structural points in each piece.
From the multi-modal graphs of wrist motion, dynamics and tempo, trajectories of
the y and z axis components reveal information about note groupings and general
phrasing. For all recorded performances of the finale, it is evident that each performer
groups the quavers into sixes. For particular performances such as Martin
Jones’, the change in composition halfway through bar 3, where the pitch changes
every six quavers instead of every twelve as in bars 1-2, is marked by accents in tempo and
dynamics. This confirms that Hypothesis 9.1 is correct. From comparing performances
of the prelude and the finale, results show that five out of the six performers
suggest that there is a boundary at bar 5; however, it is not a highly important
one in terms of structure.
Hypothesis 9.2 appears to be rejected, as the observed wrist
movements towards the keyboard do not seem to coincide with increases
in rms amplitude. As the estimations for the z axis were not entirely accurate,
but more a reflection of the height movement of each finger, a confirmation of the
hypothesis in this respect would have been speculative. However, further investigation
is warranted into the expressive movements of fingers throughout piano
performance and their close relationship with audio parameters.
From statistical analysis we see that the rarest values for each parameter
occur at specific points in the phrasing structure, which, when applied to
performances of the finale, mark bar 5 as a change in the composition, but not a
complete change in phrase as would be expected for the introduction of a new
theme. Comparing the results from these different types of analysis confirms the
interpretation of phrasing structure.
The methodology used shows that very intricate details regarding how per-
formers play each note can be extracted from performances and used to indicate
structure even in pieces where the structure is ambiguous.
Improvements to this system could be made in the alignment of the video
parameters to the audio stream. The raw output video from the capture camera
could be altered to include time-stamping, allowing more accurate alignment of
gestures to notes. By viewing the curvature of each finger, automatic detection of
notes being played could be programmed in order to better align the gesture with
the beginning and end of each note. The small-scale analysis performed here
could be run for the entire piece, for many more performers and with many more pieces
used for control. In comparison with the statistical analysis for particular accented
notes, structure can be more easily detected. Predicting structure in unknown
pieces is therefore possible.
Chapter 10
Discussion
approach with the subjective musical context. This has proved particularly im-
portant considering the subjectiveness with which fluctuations in aural and visual
parameters are produced by the performers. As quoted in Chapter 2.1, Eric Clarke
observes that fluctuations in tempo can be used for different purposes depending
on the structure of the piece. Visual parameters are also affected by the movements
necessary for basic note production. Separating movements in terms of function
is complicated considering some gestures may be multi-functional. Therefore, the
analysis has aimed to not discard any information which may pertain to the motor
movements required for note production, but to include them in the analysis. In
terms of phrasing, this would lessen the chance of movements lining up exactly
with these larger chunks of notes instead of individual notes or chords, and so the
results found will likely be more accurate.
When dealing with the multi-modal streams of data produced by the recordings,
care is taken when choosing methods of measuring the relationships between
them. Many statistical tests determine whether data are related in direct ways,
such as tempo increasing when dynamics increase. However, with the
subjective nature of performance, and with the manipulation of parameters changing
differently depending on the structural function, these factors will not always
change in the same way over time. To compensate for this, the tests used include
determining how regularly troughs in motion norm occur close to a phrasing
boundary, and more emphasis has been placed on graphs of the multi-modal
parameters plotted in time. What is required are methods of analysis which can
take into account the large variability of each of the parameters whilst still recognising
that there are certain fixed parameters such as pitch and structure.
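A test of how regularly troughs in motion norm occur close to a phrasing boundary might be sketched as follows. This is an illustrative Python example under stated assumptions, not the test procedure used in this thesis: the function names, the strict local-minimum definition of a trough and the tolerance window are all hypothetical choices.

```python
from typing import List, Sequence


def local_minima(series: Sequence[float]) -> List[int]:
    """Indices of samples strictly lower than both of their neighbours."""
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i] < series[i - 1] and series[i] < series[i + 1]
    ]


def boundary_hit_rate(
    times: Sequence[float],
    motion_norm: Sequence[float],
    boundaries: Sequence[float],
    tolerance_s: float,
) -> float:
    """Fraction of phrase boundaries with a motion trough within tolerance.

    `times` and `motion_norm` are parallel sequences; `boundaries` holds
    the boundary times in seconds taken from the score analysis.
    """
    if not boundaries:
        return 0.0
    trough_times = [times[i] for i in local_minima(motion_norm)]
    hits = sum(
        1
        for b in boundaries
        if any(abs(t - b) <= tolerance_s for t in trough_times)
    )
    return hits / len(boundaries)


if __name__ == "__main__":
    # Hypothetical motion norm sampled once per second.
    times = [0, 1, 2, 3, 4, 5, 6]
    motion = [5, 3, 5, 6, 2, 6, 7]
    print(boundary_hit_rate(times, motion, [1.2, 4.1, 6.0], tolerance_s=0.5))
```

A high rate is only informative when troughs are sparse relative to the tolerance window, so in practice it would need comparing against the rate obtained for randomly placed boundaries.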
An issue in including the measurement of physical gestures alongside aural
parameters occurs in the alignment with the notes from the original score. As
gestures are multi-functional, and often can be for necessary purposes as well as
for expression, it is difficult to separate these from purely expressive gestures. For
this reason, it is also difficult to align gestures to singular notes. This can make
direct comparison throughout aural and visual domains complicated.
Drawing back from analysis, a bigger question to ask is whether performers actually
intend the manipulation of parameters such as tempo, dynamics and movement
as a communication of musical structure for themselves and/or the audience.
In an attempt to examine the differences between phrasing boundaries highlighted
by changes in performance parameters and phrasing boundaries identified
by audience judges, videos created from the first experiment were used as
stimuli and audience judges were asked to denote phrase shaping by moving a
slider. There seemed to be very little difference in boundaries for performances of
the Prelude in A major Op.28 No.7, but this could be partially explained by its strict,
explicit structure. We cannot tell exactly how these audience judges are making
their judgements, particularly in audio-only presentations. Detecting phrasing
computationally then becomes more a question of how structure is reflected by
these parameters than of trying to imitate how audiences perceive it.
The experiments in this thesis are a unique comparison between various com-
posed pieces. A control piece has been used in an effort to benchmark variations
in each performer’s style of playing. Examination of solely the control piece has
demonstrated various styles of performer playing even when conveying the exact
same structure. This has been confirmed by multi-modal explorations of tempo,
dynamics and movement trajectories which show decreases and increases in vary-
ing combinations at phrasing boundaries. This, along with the examination of the
extremes of these trajectories at their corresponding occurrence within the score
shows the fastest/slowest tempi reserved for particularly important structural
points for the example pianists. These example performance measurements reflect
completely different performances of the same piece, yet when looking at these
particularly highlighted points in the score, there are many agreements. These results
coincide with suggestions by Repp [100] that different expressive strategies
are not necessarily produced by different structural interpretations, and this has
been seen within the research in this thesis to extend to physical gestures.
The comparison between two pieces in Chapter 8 shows that certain elements
of performer style are carried over from the control piece, yet there are also many
differences. The two pieces are composed by Chopin in a similar style with similar
rhythmic repetition, albeit dissimilar rhythms. Differences are evident in motion
norm between pieces for the same performer, with the leading markers also changing
between pieces. This suggests the rhythmic make-up of the phrase may have
far more influence on the motion trajectory, something previously stated by Wanderley
[125]. Similarities still occur, with motion peaks and troughs occurring at
phrasing boundaries, but for the longer, expanded phrases in Prelude 6, sub-chunking
is sometimes present. In both of these pieces, it is evident that whilst
performer movement style can be widely differing, the underlying motion norms
conform to the same structure, confirming Hypothesis 8.1. Using this method of
multi-modal analysis for tempo, dynamics and motion, all nine performers in this
experiment produce a similar structural interpretation of eight phrases for Prelude 7
and the first five phrases in Prelude 6. The second part of Prelude 6 cannot
be subjected to direct comparison between performers. Progressing from these
results, further studies could include examining these different interpretations;
however, a method of accurately determining these interpretations from the performers
themselves must first be developed. Within these phrasing shapes of tempo,
dynamics and motion, there are different sub-shapes which may be reliant on how
each performer is accenting the notes within the phrase. Examples of sub-phrase
analysis are seen in the statistical analysis of the extremes of each parameter. From
deducing their position on the score, all nine performers use local maxima and
minima to determine the accents within each phrase. Particular points of interest
are characterised by extremes in all three parameters. Further exploration into
other parameters such as articulation and timbre would be expected to produce
similar results. These suggest that each performer draws our attention to interesting
points in structure in a form that is comparable across a number of pieces.
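Locating the points characterised by extremes in more than one parameter amounts to intersecting the sets of notes flagged in each stream. The following minimal sketch assumes that the extremes of each parameter have already been reduced to sets of note indices; the function name, the single-letter labels and the example indices are hypothetical.

```python
from typing import Dict, List, Set


def combined_extremes(
    flags: Dict[str, Set[int]], min_streams: int = 2
) -> Dict[int, List[str]]:
    """Find notes flagged as extreme in at least `min_streams` parameters.

    `flags` maps a parameter label (e.g. "T", "D", "M") to the set of
    note indices at which that parameter takes an extreme value.
    Returns note index -> alphabetically sorted list of parameter labels.
    """
    by_note: Dict[int, List[str]] = {}
    for label in sorted(flags):
        for note_index in flags[label]:
            by_note.setdefault(note_index, []).append(label)
    return {
        note_index: labels
        for note_index, labels in sorted(by_note.items())
        if len(labels) >= min_streams
    }


if __name__ == "__main__":
    # Hypothetical note indices for tempo, dynamics and motion extremes.
    flags = {"T": {0, 5}, "D": {0, 5, 9}, "M": {0, 12}}
    print(combined_extremes(flags))
    print(combined_extremes(flags, min_streams=3))
```

Setting min_streams to the number of parameters isolates the notes at which every stream is extreme, which correspond to the particular points of interest described above.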
The exploration into interpretations of the finale in Chapter 9 uses more intricate
detail to determine the importance of each note as it is played in the overall
picture of the opening bars, but works on the theory that performer ‘styles’ can
be used to discover structure. This in a sense could be done for the second half
of Prelude 6 in the previous experiment. For each of the six pianists, wrist and
thumb motion is examined in all three axes alongside tempo and dynamics. This
produces results which can determine phrase shaping and even note ‘groupings’.
Also evident from performances of the prelude is the ‘releasing’ action with which
pianists tend to play loud chords. Accents in note duration and inter-onset interval
are apparent at the beginning of the finale; however, in most of the performances,
we do not see these accents repeated at the beginning of bar 5. This suggests that
bar 5 is regarded not as the entry of a new theme but as the continuation of the theme
beginning at bar 1. These results demonstrate a method of detecting structure
purely from performance parameters that could be used without a priori understanding
of the musical structure itself.
Finally, the study of finger curvature enabled by the use of the finger tracking
system FingerDance 4 produces results which correctly identify the style of ‘touch’
used by each performer in performances of the prelude and the finale. The example
of Lauren Hibberd and Martin Jones, who both use a flat-fingered approach to
the finale, against Jessica Chan’s more curved approach immediately allows us to
examine the differences between different touches and the different accents they
produce.
Chapter 11
Final Conclusions
The two main aims set out at the start of this thesis were:
Aim 1: to design capture systems, storage and visualisation formats that allow accurate
and robust methods of recording live performances and display the
results in a useful way for musicological analysis.
Aim 2: to determine whether structure can be elucidated from the empirical analysis
of multi-modal performance parameters.
The first aim was satisfied in the first half of the thesis, which detailed the
design of multi-modal systems from a selection of proprietary products as well as
specially designed ones. To satisfy the need for a cheap and portable motion
capture system, Chapter 4 detailed an accurate finger motion capture system
which operates with the least disturbance to the performer. This uses UV paint
dots as passive markers in an image-processing-based system. The system
estimates 3D position to within an error margin of 1.66 mm and can also
provide information on finger curvature. Chapters 5 and 6 demonstrate how the
movement capture system, alongside other multi-modal capture systems, can be
used to produce and store multi-modal information and display queries above a
musical score, in a format designed to be directly useful to musicological
analysts.
The experiments that followed in Section III used these tools to highlight how
structure can be detected and in some cases predicted from the fluctuations in
performance parameter data, thus satisfying the second main aim of the thesis.
Within these experiments, the following hypotheses were made:
Hypothesis 8.1 Regardless of the subjective and personal nature of physical ges-
ture in relation to musical structure, there will exist an underlying pattern
that is related to phrasing and is common across all performers.
Hypothesis 8.2 The underlying motion profile of the performer related to phras-
ing will be the same across pieces.
Hypothesis 8.4 Where combinations of global maxima and minima occur in both
aural and visual streams of data, these will be related to the most important
structural features of the composition.
Hypothesis 9.1 Trajectories of finger motion in the x, y and z axes will reflect
expressive accents within the phrase.
Hypothesis 9.2 It is expected that wrist motion that reflects movements toward
the soundboard of the keyboard and movements toward the key-bed will
produce high values of rms amplitude.
From the gestural motion studies conducted on performances of two Chopin
Preludes in Chapter 8, it was shown that despite the idiosyncratic nature of the
performers’ gestures in performances of both preludes, the underlying motion
norm suggested the same phrasing structure. This was confirmed by measuring
the local maxima of the motion profile between phrases for each pianist. These
local maxima occurred reliably at the same point in each phrase for each performer.
These patterns were evident across all performers regardless of their background and
ideas on movement within performance. This confirms Hypothesis 8.1. When each
performer’s motion profiles are correlated across their performances of the two
Preludes, few show a high correlation, and some are even negatively correlated.
This suggests that the motion profile of each performer changes depending on
which piece they are performing, so Hypothesis 8.2 is rejected. The differences
may be due to changes in rhythm, melody or harmony; however, as the
rhythmically repeating phrases of Prelude 7 tend to produce similar motion
patterns for each performer, motion may be
highly linked to rhythm.
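The cross-piece comparison behind Hypothesis 8.2 can be illustrated with a minimal
sketch. It assumes that a performer's motion profile has been resampled to the same
number of points for both Preludes and that a Pearson correlation coefficient is the
measure used; the function name and the profile values are hypothetical and are not
taken from the thesis data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (dev_x * dev_y)


# Hypothetical, time-normalised motion profiles for one performer (illustrative only).
prelude6_profile = [0.2, 0.5, 0.9, 0.4, 0.1, 0.3]
prelude7_profile = [0.8, 0.4, 0.1, 0.5, 0.9, 0.6]

r = pearson(prelude6_profile, prelude7_profile)
print(f"correlation across pieces: {r:.2f}")
```

A value near 1 would indicate that the performer moves in the same way in both pieces;
values near zero or below it, as in this made-up example, correspond to the
piece-dependent profiles described above.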
As each performer’s ‘style’ is different and the use of these parameters can
vary according to the position in the score, it becomes apparent through
observation of the multi-modal graphs that a combination of parameters indicates
phrasing boundaries. The clearest example is at the harmonic arrival
between phrases 5 and 6, where global maxima and minima in motion, dynamics and
tempo coincide. This suggests that Hypothesis 8.3 is correct. When examining the
maxima and minima of the dataset and their occurrence in the musical score, it
is clear that performers tend to use combinations of these extremes at important
points in structure, suggesting that Hypothesis 8.4 is correct. The locations of these
extremes in motion, dynamics and tempo coincide with particular accents of harmony,
melody and rhythm set out by Parncutt.
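The idea of extremes coinciding across streams can be sketched as follows. This is
only an illustration under stated assumptions: the three streams are taken to be
sampled at the same score positions, an extreme is taken to be the single global
maximum or minimum of each stream, and the tolerance of one position is arbitrary.
The function names and the data are hypothetical.

```python
def extrema_indices(series):
    """Return (index of the global maximum, index of the global minimum)."""
    positions = range(len(series))
    i_max = max(positions, key=series.__getitem__)
    i_min = min(positions, key=series.__getitem__)
    return i_max, i_min


def coinciding_extrema(streams, tolerance=1):
    """Positions at which every stream has a global extreme within `tolerance`."""
    extremes = {name: extrema_indices(values) for name, values in streams.items()}
    # Every extreme of every stream is a candidate position.
    candidates = sorted({i for pair in extremes.values() for i in pair})
    return [
        c for c in candidates
        if all(any(abs(c - i) <= tolerance for i in pair) for pair in extremes.values())
    ]


# Hypothetical values at six successive score positions (illustrative only).
streams = {
    "motion": [0.2, 0.4, 0.3, 0.9, 0.5, 0.1],
    "dynamics": [50, 55, 60, 80, 65, 52],
    "tempo": [100, 98, 96, 70, 90, 104],
}

print(coinciding_extrema(streams))
```

In this made-up data the motion and dynamics maxima and the tempo minimum all fall at
position 3, which is the kind of coincidence described for the boundary between
phrases 5 and 6.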
Using this knowledge to then try and predict musical structure from perfor-
mance nuances, the second experiment analyses professional performances of Chopin’s
B flat minor sonata op.35 finale movement. Looking at intricate finger movement
(as the piece is performed at the fastest limits of technical ability), we can see pat-
terns of how notes are grouped and accented. When added to information on
tempo and dynamics, this provides an interpretation from which we can glean
structural issues such as the interpretation of bar 5 as not the introduction of a
new theme but the continuation of the main theme introduced at bar 1.
Measurements of wrist movement in the x, y and z axes throughout performances
of the finale show certain accents, defined by peaks and troughs, that occur
simultaneously with accents in tempo and dynamics. When located on the score
of the performance, these appear at points which reflect particular harmonic and
structural changes. This confirms Hypothesis 9.1. Measured movements of the
wrist towards and away from the keyboard do not necessarily coincide with
increases in rms amplitude and, as the z axis is an estimate, Hypothesis 9.2 cannot
be confirmed. These measurements of motion, tempo and dynamics provide
insight into the structural choices of the six professional performers when
considering the finale, and allow conclusions to be drawn on the particularly ambiguous
boundary of bar 5, something which cannot be achieved by traditional score
analysis alone.
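For reference, the rms amplitude used as the dynamics measure can be computed over
short windows of an audio signal. The sketch below is a generic illustration rather
than the procedure used in this thesis; the window and hop sizes, the function names
and the sample values are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty block of samples."""
    if not samples:
        raise ValueError("cannot take the rms of an empty block")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def windowed_rms(signal, window, hop):
    """rms amplitude of successive windows of `window` samples, `hop` samples apart."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    return [
        rms(signal[start:start + window])
        for start in range(0, len(signal) - window + 1, hop)
    ]


# Hypothetical signal: a quiet block followed by a louder block (illustrative only).
signal = [1, -1, 1, -1, 2, -2, 2, -2]
print(windowed_rms(signal, window=4, hop=4))
```

Plotting such values against time alongside the wrist trajectories gives the kind of
comparison referred to in Hypothesis 9.2.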
Such research into how performers highlight structure with these parameters
has major benefits for piano pedagogy and implications for computational meth-
ods of detecting structure such as in the field of music information retrieval.
As noted in Section 5, many developments have been noted for these
systems and for the multi-modal analysis techniques. The most pertinent of these,
I believe, are the development of the finger tracking system and the alignment
of physical gestures with aural parameters, for both analysis and visualisation, in
order to investigate more thoroughly the role of physical gestures in music performance.
A long debate has been waged between scientists and musicians over how the way
the finger strikes the key affects the resulting sound. Contrary to the belief that
the only variable can be key velocity, a direct measure of the force applied to the
key, pianists claim that the shape of the hand, i.e. flat versus curved fingers, alters
not just the loudness but also the timbre. The FingerDance software in its developed
form could be pivotal in answering these questions alongside physical modelling
of the piano itself.
The development of methods such as these for automatically detecting structure
must be cultivated in a way which respects the context of the music being
analysed and the subjectivity of the performances. A highly integrated approach to
computational methods is required, one which constantly refers to musicians’
interpretations and analyses of structure. Only in this way will automatic detection be
completely valid in all disciplines, and be useful in performing functions
pertaining to the analysis of music.
As well as determining that musical structure can be measured from quantifi-
able expressive parameters, this study has further implications for assisting com-
putational music analysis as well as music information retrieval. Implications for
piano pedagogy arise from relating body movement to underlying musical struc-
ture as well as the study of the relationship between finger curvature and the re-
sultant acoustic sound. Examining this first step in the communication of musical
information from composer through the performer to the audience can also reveal
what is conveyed in a musical performance so we can ultimately understand what
is being perceived and how.
Bibliography
[10] Coriander - gui for firewire camera control and capture, 2010.
http://damien.douxchamps.net/ieee1394/coriander/index.php.
[14] D. G. Barolsky. The performer as analyst. Music Theory Online, 13 No.1, 2007.
[15] E. Bisesi and R. Parncutt. An accent-based approach to automatic rendering
of piano performance. In W. Goebl, editor, Proceedings of the Second Vienna
Talk on Music Acoustics, Vienna, 19-21 September, 2010, pages 26–30. Vienna:
Institute of Musical Acoustics (Wiener Klangstil), 2010.
[20] A.-M. Burns and M. M. Wanderley. Visual methods for the retrieval of gui-
tarist fingering. In N. Schnell, F. Bevilacqua, M. J. Lyons, and A. Tanaka,
editors, NIME, pages 196–199. IRCAM - Centre Pompidou in collaboration
with Sorbonne University, 2006.
[25] E. Clarke and J. Davidson. Composition-Performance-Reception : Studies in the
Creative Process in Music, chapter The Body in performance, pages 74–92.
Ashgate Publishing Ltd, 1998.
[26] E. F. Clarke. Mind the gap: formal structures and psychological processes in
music. Contemporary Music Review, 3 part 1:1–13, 1989.
[29] N. Cook. Structure and performance timing in Bach’s C major Prelude (WTC I):
An empirical study. Music Analysis, 6 No.3:257–272, 1987.
[34] J. W. Davidson. Qualitative insights into the use of expressive body move-
ment in solo piano performance: a case study approach. Psychology of Music,
35(3):381–401, 2007.
[35] J. W. Davidson and J. S. Correia. The Science and Psychology of Music Perfor-
mance, chapter Body Movement, pages 237–250. Oxford University Press,
2002.
[37] P. de Alcantara. Indirect Procedures : A Musician’s Guide to the Alexander Tech-
nique. Oxford University Press, 1997.
[46] J. Ginsborg and E. King. Gestures and glances: The effects of familiarity
and expertise on singers’ and pianists’ bodily movements in ensemble re-
hearsals. In Proceedings of the 7th Triennial Conference of European Society for
the Cognitive Sciences of Music (ESCOM 2009) Jyvaskyla, Finland, 2009.
[49] W. Goebl, R. Bresin, and A. Galembo. The piano action as the performer’s
interface: Timing properties. In Proceedings of the Stockholm Music Acoustics
Conference, 2003.
[51] W. Goebl and C. Palmer. Tactile feedback and timing accuracy in piano per-
formance. Experimental Brain Research, 2007.
[52] W. Goebl and C. Palmer. Finger motion in piano performance: Touch and
tempo. In Proceedings of the International Symposium on Performance Science,
2009.
[53] W. Goebl and C. Palmer. Synchronization of timing and motion among per-
forming musicians. Music Perception, 26(5):427–438, 2009.
[57] M. Grachten and G. Widmer. Who is who in the end? recognizing pianists
by their final ritardandi. In Proceedings of the International Society for Music
Information Retrieval, 2009.
[58] H. Guan, C.-S. Chua, and Y.-K. Ho. 3d hand pose retrieval from a single
2d image. In International Conference on Image Processing 2001, Proceedings,
volume 1, pages 157–160, 2001.
[59] H. Honing. The final ritard: On music, motion, and kinematic models. Com-
puter Music Journal, 27:3:66–72, 2003.
[60] D. Huron. Music information processing using the humdrum toolkit: Con-
cepts, examples, and lessons. Computer Music Journal, 26(2):11–26, 2002.
[65] V. Kofi Agawu. Concepts of closure and Chopin’s opus 28. Music Theory
Spectrum, 9:1–17, 1987.
[69] M. Leman. Embodied Music Cognition and Mediation Technology. MIT Press,
2008.
[70] F. Lerdahl and R. Jackendoff. A Generative Theory of Tonal Music. MIT Press,
1983.
[72] C.-C. Lien and C.-L. Huang. Model-based articulated hand motion tracking
for gesture recognition. Image and Vision Computing, 16:121–134, 1998.
[73] E. Lin, A. Cassidy, D. Hook, A. Baliga, and T. Chen. Hand tracking using
spatial gesture modeling as visual feedback. Multimodal Interfaces,2002. pro-
ceedings. Fourth IEEE International Conference, pages 197–202, 2002.
[81] R. A. d. S. Marranita. Visual tracking of articulated objects: An application to
the human hand. Master’s thesis, Universidade tecnica de lisboa : Instituto
superior tecnico, 2005.
[84] M. Nusseck and M. Wanderley. Music and motion - how music-related an-
cillary body movements contribute to the experience of music. Music Percep-
tion, 26:4:335–353, 2009.
[88] C. Palmer and S. Dalla Bella. Movement amplitude and tempo change in
piano performance. Journal of Acoustical Society of America, 115:5, 2004.
[92] K. Pearson. On the lines and planes of closest fit to systems of points in
space. Philosophical Magazine, 2:6:559–572, 1901.
[93] R. Perry. The music encoding initiative (mei). Proceedings of the First Interna-
tional Conference on Musical Applications Using XML, pages p55–59, 2002.
[94] S. Pullinger. A System for the Analysis of Musical Data. PhD thesis, University
of Glasgow, 2010.
[96] S. Pullinger, D. McGilvray, and N. Bailey. Music and gesture file: Perfor-
mance visualisation, analysis, storage and exchange. In Proceedings of the
International Computer Music Conference, 2008.
[97] J. Ramsay and B. Silverman. Functional Data Analysis. Springer, New York,
2005.
Musical Body: Gesture, Representation and Ergonomics in Musical Performance,
London, UK, 2009.
[112] N. Spiro, N. Gold, and J. Rink. Performance motives: Analysis and compar-
ison of performance timing repetitions using pattern matching and formal
concept analysis. In Proceedings of International Symposium on Performance
Science, 2007.
[114] H. Suzuki. Spectrum analysis and tone quality evaluation of piano tone with
hard and soft touches. Acoustical Science and Technology, 28(1):1–6, 2007.
[115] M. Talbot. The Finale in Western Instrumental Music. Oxford University Press,
2001.
[116] R. Taruskin. The Oxford History of Western Music, Vol. 3: The 19th Century.
Oxford University Press, 2005.
[117] M. Thompson and G. Luck. Effect of pianists’ expressive intention on
amount and type of body movement. In Proceedings of the 10th International
conference on music perception and cognition, 2008.
[120] G. Todd. Listener habits and choices–and their implications for public per-
formance venues. Journal of Sound and Vibration, 239:589–606, 2001.
[127] Welch. Motion tracking: No silver bullet, but a respectable arsenal. IEEE
Computer Graphics and Applications, 22 :6:24–38, 2002.
[128] H. Wold. Multivariate Analysis, chapter Estimation of principal components
and related models by iterative least squares, pages 391–420. Academic
Press, NY, 1966.
Appendices
Appendix A
Expanded loadings tables for three
performers from Chapter 8
Figure 11.1: Loadings for the First Six Principal Components, Performer 1, Pre-
lude 7 with top ten loadings in the first component highlighted in red and the
second component in blue
Figure 11.2: Loadings for the First Six Principal Components, Performer 2, Pre-
lude 7 with top ten loadings in the first component highlighted in red and the
second component in blue
Marker PC1 PC2 PC3 PC4 PC5 PC6
C7:X 0.15 0 0.05 -0.09 0.06 -0.08
C7:Y 0.16 0.01 -0.08 0.02 0.01 0.01
C7:Z 0.1 -0.12 -0.08 -0.01 -0.07 -0.23
T10:X 0.15 0.07 0.03 -0.07 0.02 0.03
T10:Y 0.16 -0.01 -0.09 -0.02 0.01 0.01
T10:Z 0.11 -0.05 -0.15 0.08 -0.14 -0.11
CLAV:X 0.15 0.06 0.09 -0.06 0.04 -0.07
CLAV:Y 0.16 0.03 -0.05 0.03 0 0.01
CLAV:Z 0.06 -0.16 0.01 -0.11 -0.01 -0.28
STRN:X 0.13 0.12 0.11 -0.02 -0.02 0.03
STRN:Y 0.16 0.03 -0.02 0.02 -0.02 0.02
STRN:Z 0.07 -0.16 0.02 -0.1 0.02 -0.25
LSHO:X 0.15 -0.03 0.01 -0.1 0.1 -0.04
LSHO:Y 0.16 0.02 -0.05 0.04 -0.01 0
LSHO:Z 0.13 -0.1 -0.04 0.02 0.01 -0.2
LUPA:X 0.1 -0.04 -0.05 -0.25 0.21 0.01
LUPA:Y 0.16 0.05 -0.04 0.05 -0.07 -0.01
LUPA:Z 0.13 -0.12 0.02 0.09 0.06 -0.08
LUPB:X 0.05 0.03 -0.11 -0.34 0.18 0.02
LUPB:Y 0.16 0.06 -0.05 0.03 -0.08 -0.02
LUPB:Z 0.13 -0.11 0.02 0.1 0.05 -0.08
LUPC:X 0.09 0.09 -0.08 -0.27 0.08 -0.02
LUPC:Y 0.15 0.08 -0.03 0.07 -0.14 -0.03
LUPC:Z 0.11 -0.12 0.07 0.14 0.07 -0.08
LELB:X -0.01 0.12 -0.14 -0.3 0.03 -0.03
LELB:Y 0.14 0.09 -0.05 0.03 -0.15 -0.02
LELB:Z 0.11 -0.12 0.06 0.13 0.04 -0.06
LMEP:X 0 0.09 -0.1 -0.33 0.13 0.01
LMEP:Y 0.14 0.09 -0.03 0.07 -0.17 -0.03
LMEP:Z 0.1 -0.14 0.07 0.13 0.09 -0.03
LWRA:X 0.05 0.16 -0.08 -0.09 -0.22 0.02
LWRA:Y 0.13 0.08 -0.04 0.08 -0.21 0
LWRA:Z 0.04 -0.18 0.04 0.06 0.11 0.09
LWRB:X 0.03 0.17 -0.09 -0.15 -0.15 -0.01
LWRB:Y 0.13 0.07 -0.04 0.09 -0.22 0
LWRB:Z 0.06 -0.18 0.05 0.09 0.08 0.1
LFRA:X 0 0.14 -0.12 -0.28 0.02 -0.01
LFRA:Y 0.13 0.09 -0.04 0.08 -0.2 -0.02
LFRA:Z 0.09 -0.16 0.07 0.12 0.09 0.01
LFIN:X 0.05 0.16 -0.08 -0.09 -0.21 0.01
LFIN:Y 0.13 0.07 -0.03 0.09 -0.23 0.01
LFIN:Z 0.03 -0.17 0.02 0.03 0.1 0.12
RSHO:X 0.12 0.08 0.19 -0.06 0.05 -0.09
RSHO:Y 0.16 0.01 -0.09 0.02 0.02 -0.01
RSHO:Z -0.05 -0.13 0.06 -0.13 -0.14 -0.27
RUPA:X 0.07 0.13 0.23 -0.01 0.11 -0.06
RUPA:Y 0.17 0 -0.05 0.01 0.03 0.05
RUPA:Z -0.02 -0.14 0.15 -0.15 -0.17 0.02
RUPB:X 0.02 0.17 0.15 0.05 0.16 -0.11
RUPB:Y 0.16 -0.01 -0.01 0 0.03 0.12
RUPB:Z -0.01 -0.13 0.16 -0.14 -0.2 0.01
RUPC:X 0.05 0.15 0.22 0 0.11 -0.06
RUPC:Y 0.16 0 -0.01 0.01 0.05 0.09
RUPC:Z 0.02 -0.16 0.13 -0.16 -0.2 0.04
RELB:X 0 0.17 0.13 0.07 0.16 -0.11
RELB:Y 0.15 -0.02 0.05 -0.01 0.06 0.19
RELB:Z 0.01 -0.08 -0.02 -0.05 -0.09 0.03
RMEP:X 0.01 0.17 0.18 0.05 0.15 -0.07
RMEP:Y 0.15 -0.01 0.04 -0.01 0.06 0.17
RMEP:Z 0.03 -0.15 0.15 -0.14 -0.16 0.13
RWRA:X 0.01 0.14 0.25 0 -0.02 -0.03
RWRA:Y 0.14 -0.01 0.05 -0.02 0.08 0.19
RWRA:Z 0.05 -0.12 0.17 -0.12 -0.14 0.15
RWRB:X 0 0.16 0.23 0.02 0 -0.06
RWRB:Y 0.14 -0.02 0.01 -0.01 0.11 0.19
RWRB:Z 0.04 -0.13 0.18 -0.13 -0.12 0.17
RFRA:X 0 0.17 0.19 0.04 0.09 -0.07
RFRA:Y 0.15 -0.02 0.04 -0.01 0.08 0.19
RFRA:Z 0.04 -0.14 0.18 -0.14 -0.15 0.16
RFIN:X 0.01 0.14 0.25 0 -0.04 -0.04
RFIN:Y 0.14 -0.02 0.01 -0.02 0.12 0.19
RFIN:Z 0.03 -0.1 0.21 -0.13 -0.11 0.16
RFHD:X 0.06 0.1 0.19 -0.02 -0.1 -0.13
RFHD:Y 0.16 0.02 -0.07 0.03 0.01 0.01
RFHD:Z -0.03 -0.13 0.09 -0.05 -0.07 -0.19
LFHD:X 0.14 0.02 0.09 -0.09 0.04 -0.19
LFHD:Y 0.15 0.06 -0.04 0.04 -0.04 -0.01
LFHD:Z 0.04 -0.16 -0.03 -0.08 0.01 -0.25
LBHD:X 0.13 -0.08 -0.01 -0.1 0.11 -0.09
LBHD:Y 0.16 0.02 -0.08 0.02 0.02 -0.02
LBHD:Z 0.08 -0.15 -0.12 -0.06 0.07 -0.11
RBHD:X 0.11 0.03 0.18 -0.07 -0.03 -0.12
RBHD:Y 0.16 -0.02 -0.1 0 0.05 0
RBHD:Z 0.01 -0.18 -0.01 -0.03 -0.02 -0.08
Figure 11.3: Loadings for the First Six Principal Components, Performer 3, Pre-
lude 7 with top ten loadings in the first component highlighted in red and the
second component in blue
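How the highlighted loadings might be selected from such a table can be sketched as
follows. The thesis states only that the top ten loadings are highlighted, so ranking
by absolute value is an assumption here. The rows are copied from the PC1 and PC2
columns of the table above for Performer 3, Prelude 7, and only eight of them are used
to keep the example short; the function name is hypothetical.

```python
# (marker, PC1, PC2) rows copied from the Performer 3, Prelude 7 table above.
ROWS = [
    ("C7:X", 0.15, 0.0),
    ("C7:Y", 0.16, 0.01),
    ("C7:Z", 0.1, -0.12),
    ("LWRA:X", 0.05, 0.16),
    ("LWRA:Z", 0.04, -0.18),
    ("RUPA:Y", 0.17, 0.0),
    ("RELB:X", 0.0, 0.17),
    ("RBHD:Z", 0.01, -0.18),
]


def top_loadings(rows, component, n):
    """Markers with the n largest absolute loadings (component 0 = PC1, 1 = PC2)."""
    # Assumption: "top" means largest magnitude, regardless of sign.
    ranked = sorted(rows, key=lambda row: abs(row[1 + component]), reverse=True)
    return [row[0] for row in ranked[:n]]


print(top_loadings(ROWS, 0, 3))  # strongest contributors to PC1 in this subset
print(top_loadings(ROWS, 1, 3))  # strongest contributors to PC2 in this subset
```

Within this subset the first component is dominated by y-axis markers, whereas the
second draws on x- and z-axis markers of the arms and head.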
Marker PC1 PC2 PC3 PC4 PC5 PC6
C7:X 0.07 -0.06 0.25 0.02 -0.09 0.04
C7:Y 0.16 0.08 -0.02 -0.05 -0.03 -0.09
C7:Z -0.01 0.15 0.05 0.05 0.2 -0.16
T10:X 0.06 -0.12 0.18 0.03 -0.17 0.05
T10:Y 0.15 0.1 -0.03 0.02 -0.04 -0.15
T10:Z 0.01 0.14 -0.16 -0.01 0.15 -0.05
CLAV:X 0.1 -0.07 0.21 0.03 -0.11 0.07
CLAV:Y 0.17 0.07 -0.02 -0.02 -0.01 -0.04
CLAV:Z -0.02 0.1 0.21 0.09 0.14 -0.09
STRN:X 0.08 -0.11 0.13 0.03 -0.2 0.13
STRN:Y 0.16 0.07 -0.03 0.05 -0.01 -0.04
STRN:Z -0.01 -0.01 0.26 0.11 0.02 0.01
LSHO:X 0.01 -0.06 0.27 0.01 -0.13 -0.02
LSHO:Y 0.16 0.08 -0.02 -0.04 -0.01 -0.03
LSHO:Z 0.08 0.13 0.11 -0.06 0.11 -0.09
LUPA:X -0.04 -0.12 0.19 0 -0.14 -0.18
LUPA:Y 0.16 0.08 -0.02 0.01 -0.03 0.02
LUPA:Z 0.06 0.13 0.13 -0.15 0.16 -0.07
LUPB:X -0.02 -0.16 0.08 0.02 -0.13 -0.28
LUPB:Y 0.16 0.09 -0.02 0.05 -0.07 0.04
LUPB:Z 0.09 0.11 0.1 -0.17 0.21 -0.14
LUPC:X 0.04 -0.16 0.1 0.08 -0.11 -0.19
LUPC:Y 0.16 0.08 -0.01 0.08 -0.06 0.11
LUPC:Z 0.03 0.15 0.15 -0.09 0.1 0.09
LELB:X 0.05 -0.15 -0.03 0.1 -0.07 -0.26
LELB:Y 0.14 0.08 -0.02 0.14 -0.16 0.13
LELB:Z 0.06 0.12 0.14 -0.12 0.23 0.03
LMEP:X 0.02 -0.16 -0.02 0.03 -0.09 -0.32
LMEP:Y 0.15 0.08 -0.01 0.14 -0.1 0.15
LMEP:Z 0 0.12 0.16 -0.19 0.19 0.02
LWRA:X 0.12 -0.1 -0.01 0.19 0.07 -0.08
LWRA:Y 0.15 0.07 0 0.18 -0.04 0.17
LWRA:Z -0.01 0.16 0.06 -0.08 -0.13 0.06
LWRB:X 0.11 -0.13 -0.01 0.16 0.07 -0.13
LWRB:Y 0.15 0.06 0 0.19 -0.02 0.17
LWRB:Z -0.03 0.16 0.06 -0.11 -0.1 0.07
LFRA:X 0.06 -0.16 -0.02 0.09 -0.03 -0.26
LFRA:Y 0.15 0.07 -0.01 0.16 -0.06 0.17
LFRA:Z -0.02 0.15 0.12 -0.16 0.06 0.06
LFIN:X 0.12 -0.11 -0.01 0.18 0.09 -0.08
LFIN:Y 0.15 0.06 0.01 0.17 0.01 0.19
LFIN:Z -0.05 0.16 0.03 -0.07 -0.15 0.07
RSHO:X 0.13 -0.06 0.17 0.02 -0.06 0.1
RSHO:Y 0.17 0.07 -0.03 -0.04 -0.02 -0.08
RSHO:Z -0.14 0.08 0.06 0.18 0.11 -0.02
RUPA:X 0.14 -0.09 0.08 0 0.01 0.1
RUPA:Y 0.16 0.08 -0.03 0 -0.01 -0.1
RUPA:Z -0.14 0.07 0.05 0.22 0.09 -0.04
RUPB:X 0.12 -0.13 0.01 -0.01 0.07 0.09
RUPB:Y 0.15 0.1 -0.02 0.06 -0.01 -0.12
RUPB:Z -0.14 0.04 0.02 0.24 0.16 -0.05
RUPC:X 0.13 -0.12 0.04 0 0.06 0.08
RUPC:Y 0.16 0.07 -0.05 0.02 0.03 -0.09
RUPC:Z -0.1 0.12 0.12 0.16 -0.04 -0.02
RELB:X 0.11 -0.14 -0.04 -0.02 0.13 0.06
RELB:Y 0.14 0.09 -0.04 0.12 0.04 -0.12
RELB:Z -0.12 0.08 0.09 0.22 0.05 -0.04
RMEP:X 0.11 -0.13 -0.03 -0.02 0.12 0.07
RMEP:Y 0.14 0.09 -0.05 0.09 0.04 -0.11
RMEP:Z -0.11 0.11 0.09 0.19 -0.01 -0.04
RWRA:X 0.11 -0.13 -0.03 -0.02 0.13 0.04
RWRA:Y 0.08 0.13 -0.07 0.12 0 -0.07
RWRA:Z -0.05 0.16 0.05 0.03 -0.13 -0.11
RWRB:X 0.1 -0.14 -0.03 -0.01 0.14 0.05
RWRB:Y 0.07 0.14 -0.07 0.11 -0.02 -0.06
RWRB:Z -0.06 0.16 0.04 0.04 -0.14 -0.08
RFRA:X 0.11 -0.13 -0.04 -0.03 0.12 0.06
RFRA:Y 0.13 0.1 -0.06 0.1 0.03 -0.1
RFRA:Z -0.09 0.14 0.07 0.11 -0.08 -0.06
RFIN:X 0.1 -0.14 -0.04 -0.03 0.13 0.06
RFIN:Y 0.05 0.14 -0.07 0.1 -0.05 -0.05
RFIN:Z -0.05 0.16 0.02 -0.01 -0.19 0
RFHD:X 0.08 -0.09 0.18 0.04 0.11 0.06
RFHD:Y 0.16 0.08 0 -0.08 -0.03 -0.06
RFHD:Z -0.07 -0.02 0.15 0.23 0.27 0.07
LFHD:X 0.07 -0.07 0.24 0 0.01 0.02
LFHD:Y 0.17 0.03 0 -0.06 0.06 -0.04
LFHD:Z 0.01 0.01 0.24 -0.02 0.2 -0.1
LBHD:X 0.04 0 0.26 -0.06 -0.16 -0.02
LBHD:Y 0.17 0.06 0 -0.09 0 -0.08
LBHD:Z 0.04 0.11 0.17 -0.12 0.01 -0.18
RBHD:X 0.08 -0.07 0.23 0.02 0.02 0.06
RBHD:Y 0.13 0.11 0 -0.11 -0.11 -0.1
RBHD:Z -0.1 0.07 0.03 0.26 0.1 0.09
Figure 11.4: Loadings for the First Six Principal Components, Performer 1, Pre-
lude 6 with top ten loadings in the first component highlighted in red and the
second component in blue
Marker PC1 PC2 PC3 PC4 PC5 PC6
C7:X 0.14 -0.09 0.1 -0.08 0.04 -0.03
C7:Y 0.14 0.11 -0.1 -0.04 -0.03 -0.02
C7:Z 0.08 0.17 0.09 -0.08 0.04 0.16
T10:X 0.13 -0.13 0.06 -0.05 0.03 -0.08
T10:Y 0.14 0.09 -0.06 -0.02 -0.01 -0.05
T10:Z -0.07 0.2 -0.12 0 -0.04 0.12
CLAV:X 0.14 -0.09 0.09 -0.04 0.04 -0.03
CLAV:Y 0.14 0.11 -0.07 -0.01 -0.03 0
CLAV:Z 0.1 0.05 0.2 -0.11 0.08 0.15
STRN:X 0.14 -0.11 0.05 0 0.02 -0.07
STRN:Y 0.14 0.12 -0.02 0.03 -0.03 0.01
STRN:Z 0.1 0.02 0.22 -0.1 0.09 0.14
LSHO:X 0.13 -0.12 0.07 -0.09 0.07 -0.05
LSHO:Y 0.14 0.11 -0.08 -0.02 -0.02 0
LSHO:Z 0.11 0.14 -0.01 -0.14 0.06 0.13
LUPA:X 0.12 -0.14 0.06 -0.11 0.02 -0.08
LUPA:Y 0.14 0.11 -0.03 0.06 0 -0.03
LUPA:Z 0.12 0.07 -0.05 -0.13 0.14 0.1
LUPB:X 0.11 -0.13 0.07 -0.16 -0.05 -0.08
LUPB:Y 0.15 0.09 -0.02 0.06 0 -0.04
LUPB:Z 0.1 0.11 -0.1 -0.17 0.07 0.14
LUPC:X 0.13 -0.11 0.09 -0.09 -0.05 -0.11
LUPC:Y 0.13 0.1 0.02 0.13 0.01 -0.06
LUPC:Z 0.13 0.01 -0.03 -0.07 0.22 0.07
LELB:X 0.06 -0.02 0.13 -0.11 -0.27 -0.13
LELB:Y 0.13 0.04 0.05 0.16 0.04 -0.12
LELB:Z 0.12 0.02 -0.08 -0.06 0.21 0.06
LMEP:X 0.06 -0.06 0.09 -0.17 -0.25 -0.1
LMEP:Y 0.12 0.06 0.07 0.18 0.03 -0.11
LMEP:Z 0.11 0.01 -0.11 -0.11 0.2 0.1
LWRA:X 0.11 -0.08 0.02 0.2 0.07 -0.15
LWRA:Y 0.07 0.14 0.11 0.2 -0.04 -0.08
LWRA:Z 0.05 -0.06 -0.13 0.04 0.32 0
LWRB:X 0.11 -0.1 0.03 0.16 0.05 -0.16
LWRB:Y 0.08 0.11 0.09 0.23 0.01 -0.09
LWRB:Z 0.05 -0.05 -0.13 0.03 0.33 0.01
LFRA:X 0.11 -0.1 0.08 -0.01 -0.14 -0.16
LFRA:Y 0.1 0.1 0.09 0.22 0.02 -0.1
LFRA:Z 0.07 -0.03 -0.13 -0.03 0.31 0.05
LFIN:X 0.11 -0.1 0 0.19 0.09 -0.14
LFIN:Y 0.07 0.12 0.1 0.23 0 -0.08
LFIN:Z 0.04 -0.06 -0.13 0.04 0.32 0
RSHO:X 0.14 -0.07 0.1 -0.04 0.03 -0.01
RSHO:Y 0.14 0.1 -0.09 -0.03 -0.02 -0.03
RSHO:Z -0.09 0.07 0.21 0.06 0.08 0.19
RUPA:X 0.14 -0.07 0.08 0.03 -0.03 0.1
RUPA:Y 0.14 0.1 -0.06 -0.02 -0.02 -0.04
RUPA:Z -0.11 0.05 0.18 0.1 0.06 0.14
RUPB:X 0.13 -0.08 0.01 0.09 -0.09 0.21
RUPB:Y 0.14 0.11 -0.03 0 -0.02 -0.04
RUPB:Z -0.11 0.05 0.17 0.1 0.05 0.17
RUPC:X 0.14 -0.07 0.06 0.05 -0.05 0.13
RUPC:Y 0.14 0.13 -0.06 0 -0.04 0.01
RUPC:Z -0.07 0.06 0.26 0 0.15 -0.01
RELB:X 0.1 -0.1 -0.03 0.13 -0.12 0.27
RELB:Y 0.11 0.14 0.01 0.04 -0.03 0
RELB:Z -0.08 0.05 0.24 0.02 0.13 0.02
RMEP:X 0.1 -0.06 -0.02 0.15 -0.13 0.31
RMEP:Y 0.11 0.16 -0.01 0.03 -0.04 0.01
RMEP:Z -0.1 0.06 0.22 0.02 0.11 0.02
RWRA:X 0.11 -0.1 -0.04 0.13 -0.11 0.21
RWRA:Y 0.04 0.2 0.07 0.09 0.01 -0.09
RWRA:Z -0.08 0.19 0.05 -0.08 -0.01 0.05
RWRB:X 0.11 -0.11 -0.04 0.15 -0.1 0.22
RWRB:Y 0.04 0.21 0.07 0.08 0 -0.09
RWRB:Z -0.08 0.19 0.06 -0.08 -0.01 0.04
RFRA:X 0.1 -0.05 -0.03 0.14 -0.14 0.3
RFRA:Y 0.09 0.2 0.01 0.04 -0.04 -0.01
RFRA:Z -0.1 0.17 0.12 -0.06 0.03 0.04
RFIN:X 0.11 -0.12 -0.04 0.14 -0.1 0.22
RFIN:Y 0 0.2 0.08 0.06 0.03 -0.1
RFIN:Z -0.08 0.19 0.05 -0.08 -0.03 0.08
RFHD:X 0.13 -0.08 0.13 -0.09 0.02 -0.02
RFHD:Y 0.12 0.12 -0.13 -0.08 -0.07 -0.02
RFHD:Z 0.09 -0.01 0.25 -0.01 0.08 0.05
LFHD:X 0.13 -0.07 0.13 -0.12 0.01 -0.02
LFHD:Y 0.12 0.1 -0.12 -0.06 -0.06 -0.03
LFHD:Z 0.08 0.01 0.2 -0.19 -0.02 0.04
LBHD:X 0.12 -0.07 0.13 -0.14 0.01 -0.01
LBHD:Y 0.12 0.11 -0.13 -0.08 -0.07 -0.02
LBHD:Z 0.05 0.1 0.08 -0.26 -0.09 0.07
RBHD:X 0.13 -0.07 0.13 -0.1 0.02 -0.02
RBHD:Y 0.11 0.13 -0.13 -0.1 -0.08 -0.01
RBHD:Z 0.08 0.02 0.22 0.09 0.1 0.08
Figure 11.5: Loadings for the First Six Principal Components, Performer 2, Pre-
lude 6 with top ten loadings in the first component highlighted in red and the
second component in blue
Marker PC1 PC2 PC3 PC4 PC5 PC6
C7:X 0.13 -0.04 0.16 -0.01 0.03 0
C7:Y 0.15 -0.06 -0.08 0.01 0.05 0.06
C7:Z 0.02 -0.17 0.11 -0.13 -0.14 0.09
T10:X 0.14 0.02 0.13 0.02 0.09 0.01
T10:Y 0.15 -0.08 -0.08 -0.01 0.06 0.05
T10:Z -0.01 -0.09 -0.18 -0.03 -0.1 0.21
CLAV:X 0.14 0 0.15 0.01 0.02 -0.01
CLAV:Y 0.16 -0.05 -0.08 0.01 0.01 0.04
CLAV:Z 0.01 -0.14 0.17 -0.13 -0.13 0.01
STRN:X 0.15 0.05 0.11 0.03 0.04 0
STRN:Y 0.16 -0.05 -0.08 0 -0.02 0.03
STRN:Z 0.03 -0.15 0.18 -0.11 -0.09 0
LSHO:X 0.13 -0.05 0.16 -0.01 0.09 0
LSHO:Y 0.16 -0.05 -0.08 0.03 0 0.06
LSHO:Z 0.09 -0.17 0.1 0 -0.04 0.09
LUPA:X 0.12 -0.02 0.15 -0.09 0.22 0
LUPA:Y 0.16 -0.04 -0.08 0 -0.04 0.04
LUPA:Z 0.06 -0.18 0.08 0.14 -0.04 0.11
LUPB:X 0.08 -0.03 -0.08 -0.03 0.04 -0.03
LUPB:Y 0.15 -0.03 -0.11 0 -0.11 -0.01
LUPB:Z 0.04 -0.03 0.18 -0.01 0.04 0.07
LUPC:X 0.11 0.04 0.11 -0.18 0.26 -0.01
LUPC:Y 0.16 -0.04 -0.09 -0.03 -0.04 0.01
LUPC:Z 0.07 -0.17 0.07 0.14 -0.06 0.15
LELB:X 0.09 0.11 0.04 -0.26 0.19 0.01
LELB:Y 0.15 -0.01 -0.1 -0.07 -0.08 -0.06
LELB:Z 0.05 -0.18 0.08 0.17 -0.1 0.14
LMEP:X 0.1 0.08 0.07 -0.23 0.24 0
LMEP:Y 0.15 -0.02 -0.08 -0.04 -0.13 -0.02
LMEP:Z 0.03 -0.18 0.09 0.21 -0.06 0.12
LWRA:X 0.12 0.11 0.02 -0.2 -0.04 -0.05
LWRA:Y 0.15 -0.01 -0.08 -0.04 -0.14 -0.11
LWRA:Z -0.06 -0.17 0.03 0.16 0.13 -0.18
LWRB:X 0.11 0.12 0.02 -0.23 -0.01 -0.01
LWRB:Y 0.14 -0.02 -0.08 -0.03 -0.15 -0.11
LWRB:Z -0.04 -0.18 0.04 0.18 0.12 -0.16
LFRA:X 0.1 0.11 0.04 -0.25 0.15 -0.01
LFRA:Y 0.15 -0.01 -0.08 -0.04 -0.14 -0.07
LFRA:Z 0 -0.19 0.08 0.23 0.01 0.02
LFIN:X 0.12 0.11 0.02 -0.19 -0.06 -0.02
LFIN:Y 0.14 -0.02 -0.07 -0.02 -0.16 -0.12
LFIN:Z -0.05 -0.16 0.01 0.15 0.2 -0.17
RSHO:X 0.13 0.01 0.16 0.01 -0.02 -0.04
RSHO:Y 0.16 -0.05 -0.07 0.01 0.04 0.06
RSHO:Z -0.1 -0.07 0.11 -0.16 -0.19 0
RUPA:X 0.12 0.06 0.17 0.06 -0.03 -0.11
RUPA:Y 0.15 -0.07 -0.09 0.02 0.03 0.02
RUPA:Z -0.12 -0.11 0.04 -0.16 -0.15 -0.1
RUPB:X 0.11 0.08 0.16 0.08 -0.03 -0.13
RUPB:Y 0.15 -0.07 -0.1 0.04 0.03 -0.05
RUPB:Z -0.08 -0.16 0.04 -0.2 -0.1 -0.03
RUPC:X -0.03 -0.03 -0.03 -0.03 -0.13 -0.39
RUPC:Y 0.14 -0.06 -0.04 0.03 0.02 -0.12
RUPC:Z 0.05 0.05 0.03 0.04 0.1 0.32
RELB:X 0.07 0.15 0.13 0.13 -0.07 -0.17
RELB:Y 0.12 -0.09 -0.12 0.03 0.01 -0.17
RELB:Z -0.09 -0.15 0.01 -0.2 -0.11 -0.06
RMEP:X 0.07 0.14 0.14 0.13 -0.05 -0.19
RMEP:Y 0.13 -0.09 -0.12 0.04 0.03 -0.15
RMEP:Z -0.09 -0.16 0 -0.17 -0.06 -0.11
RWRA:X 0.09 0.14 0.1 0.13 -0.1 -0.15
RWRA:Y 0.05 -0.09 -0.21 -0.05 -0.02 -0.09
RWRA:Z -0.05 -0.17 0.03 -0.04 0.24 -0.15
RWRB:X 0.08 0.16 0.1 0.11 -0.14 -0.12
RWRB:Y 0.05 -0.09 -0.21 -0.05 -0.02 -0.1
RWRB:Z -0.06 -0.17 0.04 -0.04 0.23 -0.17
RFRA:X 0.07 0.15 0.12 0.12 -0.09 -0.16
RFRA:Y 0.1 -0.1 -0.16 0.01 0.01 -0.15
RFRA:Z -0.08 -0.18 0.02 -0.12 0.08 -0.15
RFIN:X 0.09 0.16 0.09 0.12 -0.14 -0.13
RFIN:Y 0.02 -0.09 -0.21 -0.07 -0.03 -0.04
RFIN:Z -0.05 -0.16 0.02 0 0.26 -0.2
RFHD:X 0.1 -0.06 0.16 -0.05 -0.08 0.04
RFHD:Y 0.16 -0.06 -0.06 0.02 0.05 0.07
RFHD:Z -0.02 -0.16 0.1 -0.13 -0.2 0.03
LFHD:X 0.14 -0.05 0.14 -0.02 0 0.05
LFHD:Y 0.15 -0.06 -0.06 0.02 0.03 0.09
LFHD:Z 0.05 -0.18 0.11 -0.09 -0.1 0.06
LBHD:X 0.14 -0.05 0.12 0 0.05 0.02
LBHD:Y 0.16 -0.05 -0.07 0.02 0.06 0.06
LBHD:Z 0.04 -0.16 0.17 -0.04 0.01 -0.01
RBHD:X 0.11 -0.06 0.17 -0.03 -0.03 0.03
RBHD:Y 0.16 -0.05 -0.06 0.03 0.08 0.05
RBHD:Z -0.06 -0.13 0.14 -0.1 -0.15 -0.07
Figure 11.6: Loadings for the First Six Principal Components, Performer 3, Pre-
lude 6 with top ten loadings in the first component highlighted in red and the
second component in blue
Appendix B
Extra weighted principal components
graphs from Chapter 8
Figure 11.8: Weighted Principal Components for Performer 5, Prelude 7
Figure 11.10: Weighted Principal Components for Performer 7, Prelude 7
Figure 11.12: Weighted Principal Components for Performer 9, Prelude 7
Figure 11.14: Weighted Principal Components for Performer 5, Prelude 6
Figure 11.16: Weighted Principal Components for Performer 7, Prelude 6
Figure 11.18: Weighted Principal Components for Performer 9, Prelude 6
Appendix C
Extra multi-modal graphs from
Chapter 8
Figure 11.20: Motion, Tempo and Dynamics for Performer 5, Prelude 7
Figure 11.22: Motion, Tempo and Dynamics for Performer 7, Prelude 7
Figure 11.24: Motion, Tempo and Dynamics for Performer 9, Prelude 7
Figure 11.26: Motion, Tempo and Dynamics for Performer 5, Prelude 6
Figure 11.28: Motion, Tempo and Dynamics for Performer 7, Prelude 6
Figure 11.30: Motion, Tempo and Dynamics for Performer 9, Prelude 6
Appendix D
Extra multi-modal graphs from
Chapter 9
[Plot: wrist x-axis and y-axis movement (pixels) and z-axis estimate (mm) for LH and RH, RMS amplitude, and tempo (bpm) against time (s), bars 1 to 4 marked.]
Figure 11.31: Wrist Motion, Tempo and Dynamics for Carlisle Frank, Prelude in
A Major
Carlisle Frank performing Chopin Prelude in A major
[Plot: thumb x-axis and y-axis movement (pixels) and z-axis estimate (mm) for LH and RH, RMS amplitude, and tempo (bpm) against time (s), bars 1 to 4 marked.]
Figure 11.32: Thumb Motion, Tempo and Dynamics for Carlisle Frank, Prelude in
A Major
Carlisle Frank performing Chopin finale op.35
[Plot: wrist x-axis and y-axis movement (pixels) and z-axis movement (mm) for LH and RH, RMS amplitude, and tempo (bpm) against time (s).]
Figure 11.33: Wrist Motion, Tempo and Dynamics for Carlisle Frank, performing
the Chopin finale
Carlisle Frank performing Chopin finale op.35
[Plot: thumb x-axis and y-axis movement (pixels) and z-axis estimate (mm) for LH and RH, RMS amplitude, and tempo (bpm) against time (s).]
Figure 11.34: Thumb Motion, Tempo and Dynamics for Carlisle Frank, performing
the Chopin finale
Fali Pavri performing Chopin Prelude in A major
[Plot: wrist x-axis and y-axis movement (pixels) and z-axis estimate (mm) for LH and RH, RMS amplitude, and tempo (bpm) against time (s), bars 1 to 4 marked.]
Figure 11.35: Wrist Motion, Tempo and Dynamics for Fali Pavri, Prelude in
A Major
237
Fali Pavri performing Chopin Prelude in A major
[Plot panels: thumb movement for LH and RH on the x axis (pixels), y axis (pixels) and z axis estimate (mm); RMS amplitude; tempo (bpm); bar markers Start to Bar 4; time axis 14 to 21 s.]
Figure 11.36: Thumb Motion, Tempo and Dynamics for Fali Pavri, Prelude in A Major
Fali Pavri performing Chopin finale op.35
[Plot panels: wrist movement for LH and RH on the x axis (pixels), y axis (pixels) and z axis estimate (mm); RMS amplitude; time axis 14 to 19 s.]
Figure 11.37: Wrist Motion, Tempo and Dynamics for Fali Pavri, performing the Chopin finale
Fali Pavri performing Chopin finale op.35
[Plot panels: thumb movement for LH and RH on the x axis (pixels), y axis (pixels) and z axis estimate (mm); RMS amplitude; time axis 14 to 19 s.]
Figure 11.38: Thumb Motion, Tempo and Dynamics for Fali Pavri, performing the Chopin finale
Simon Coverdale performing Chopin Prelude in A major
[Plot panels: wrist movement for LH and RH on the x axis (pixels), y axis (pixels) and z axis estimate (mm); RMS amplitude; tempo (bpm); bar markers Start to Bar 4; time axis 8 to 15 s.]
Figure 11.39: Wrist Motion, Tempo and Dynamics for Simon Coverdale, Prelude in A Major
Simon Coverdale performing Chopin Prelude in A major
[Plot panels: thumb movement for LH and RH on the x axis (pixels), y axis (pixels) and z axis estimate (mm); RMS amplitude; tempo (bpm); bar markers Start to Bar 4; time axis 8 to 15 s.]
Figure 11.40: Thumb Motion, Tempo and Dynamics for Simon Coverdale, Prelude in A Major
Simon Coverdale performing Chopin finale op.35
[Plot panels: wrist movement for LH and RH on the x axis (pixels), y axis (pixels) and z axis estimate (mm); RMS amplitude; tempo (bpm); bar markers Start and Bar 2 to Bar 5; time axis 8 to 13 s.]
Figure 11.41: Wrist Motion, Tempo and Dynamics for Simon Coverdale, performing the Chopin finale
Simon Coverdale performing Chopin finale op.35
[Plot panels: thumb movement for LH and RH on the x axis (pixels), y axis (pixels) and z axis estimate (mm); RMS amplitude; tempo (bpm); bar markers Start and Bar 2 to Bar 5; time axis 8 to 13 s.]
Figure 11.42: Thumb Motion, Tempo and Dynamics for Simon Coverdale, performing the Chopin finale
Appendix E
Extra finale database results from Chapter 9
12
8
0.15 0.06 0.09 0.15 0.11 0.10 0.12 0.07 0.08 0.13 0.08 0.16 0.08 0.09 0.08 0.11 0.11 0.11 0.10 0.07 0.08 0.09 0.11 0.11
0.17 0.17 0.27 0.28 0.08 0.12 0.09 0.10 0.23 0.28 0.08 0.08 0.12 0.10 0.20 0.24 0.10 0.09 0.08 0.11 0.14 0.26 0.13 0.06
12
8
0.21 0.24 0.11 0.16
3
0.11 0.08 0.11 0.13 0.08 0.10 0.09 0.10 0.14 0.08 0.05 0.12 0.11 0.09 0.13 0.08 0.06 0.13 0.09 0.14 0.06 0.14
0.12 0.13 0.21 0.25 0.07 0.06 0.12 0.22 0.22 0.21 0.07 0.14 0.14 0.21 0.22 0.19 0.07 0.09 0.18 0.16 0.15 0.25 0.07
0.08 0.10 0.10 0.10 0.14 0.11 0.09 0.11 0.11 0.10 0.11 0.08 0.13 0.10 0.04
0.15 0.11 0.19 0.14 0.11 0.11 0.11 0.16 0.14 0.10 0.11 0.16 0.14 0.09 0.11 0.14 0.16 0.09
Figure 11.43: Database Results Page 1 for Carlisle Frank, B flat minor Sonata finale. The first row of columns details the inter-onset intervals and the second row of columns details the keypress durations.
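As a reading aid for the two quantities tabulated in these database figures, the following is a minimal sketch of how inter-onset intervals and keypress durations could be derived from key-down and key-up times. The `Note` class, the function names and the sample timings are hypothetical illustrations, not the capture system or database code used in this thesis, and the units (seconds) are an assumption.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical note record; times are assumed to be in seconds."""
    onset: float    # key-down time
    release: float  # key-up time


def inter_onset_intervals(notes):
    """Time between the onsets of consecutive notes, in onset order."""
    onsets = sorted(n.onset for n in notes)
    return [round(b - a, 2) for a, b in zip(onsets, onsets[1:])]


def keypress_durations(notes):
    """Time each key is held down (release minus onset), in onset order."""
    ordered = sorted(notes, key=lambda n: n.onset)
    return [round(n.release - n.onset, 2) for n in ordered]


# Illustrative timings only; these are not taken from any performance.
example = [Note(0.00, 0.08), Note(0.10, 0.22), Note(0.25, 0.30)]
iois = inter_onset_intervals(example)
durations = keypress_durations(example)
```

A sequence of N notes gives N - 1 inter-onset intervals but N keypress durations, so the two rows need not have the same number of entries.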
2
5
0.10 0.11 0.10 0.12 0.06 0.12 0.12 0.06 0.13 0.07 0.10 0.11 0.12 0.05 0.11 0.13 0.08 0.15 0.05 0.08 0.15 0.09 0.10 0.09
0.20 0.23 0.11 0.15 0.09 0.14 0.26 0.09 0.13 0.09 0.15 0.13 0.11 0.07 0.15 0.23 0.12 0.15 0.08 0.13 0.19 0.11 0.11 0.15
0.08 0.08 0.16 0.08 0.09 0.10 0.11 0.09 0.12 0.09 0.11 0.10 0.08 0.08 0.11 0.13 0.04 0.11 0.09 0.12 0.12 0.12 0.07 0.09
0.11 0.11 0.14 0.08 0.16 0.10 0.07 0.08 0.12 0.07 0.12 0.08 0.17 0.09 0.13 0.16 0.07 0.10 0.20 0.15 0.15 0.14 0.07 0.13
7
0.09 0.12 0.07 0.11 0.14 0.08 0.08 0.12 0.10 0.07 0.12 0.11 0.09 0.08 0.13 0.10 0.10 0.09 0.07 0.09 0.13 0.08 0.09 0.11
0.11 0.14 0.16 0.14 0.15 0.09 0.31 0.14 0.09 0.10 0.14 0.09 0.12 0.11 0.16 0.11 0.09 0.13 0.11 0.12 0.15 0.14 0.12 0.28
0.10 0.10 0.08 0.10 0.13 0.08 0.10 0.13 0.09 0.08 0.08 0.12 0.09 0.09 0.12 0.09 0.10 0.10 0.08 0.12 0.09 0.10 0.10 0.10
0.13 0.10 0.11 0.11 0.11 0.09 0.08 0.11 0.08 0.15 0.10 0.09 0.13 0.10 0.10 0.08 0.12 0.17 0.19 0.29 0.11 0.10 0.10 0.10
Figure 11.44: Database Results Page 2 for Carlisle Frank, B flat minor Sonata finale. The first row of columns details the inter-onset intervals and the second row of columns details the keypress durations.
12
8
0.34 0.13 0.14 0.10 0.10 0.13 0.14 0.09 0.11 0.09 0.09 0.13 0.13 0.08 0.10 0.07 0.12 0.10 0.05 0.13 0.06 0.07 0.10 0.09
0.61 0.34 0.37 0.19 0.06 0.09 0.33 0.27 0.26 0.17 0.05 0.07 0.25 0.10 0.21 0.33 0.07 0.07 0.13 0.23 0.12 0.14 0.08 0.08
12
8
0.10 0.09 0.09 0.07
3
0.14 0.08 0.08 0.09 0.09 0.07 0.10 0.08 0.13 0.08 0.04 0.10 0.08 0.09 0.09 0.07 0.04 0.11 0.08 0.09 0.07 0.08
0.08 0.08 0.11 0.17 0.06 0.05 0.09 0.11 0.11 0.10 0.09 0.08 0.10 0.15 0.11 0.11 0.05 0.07 0.10 0.11 0.11 0.20 0.06
0.08 0.10 0.04 0.09 0.10 0.09 0.04 0.12 0.07
0.06 0.06 0.07 0.07 0.07 0.11 0.07 0.09 0.07 0.07 0.06 0.09 0.06 0.07
Figure 11.45: Database Results Page 1 for Fali Pavri, B flat minor Sonata finale. The first row of columns details the inter-onset intervals and the second row of columns details the keypress durations.
2
5
0.06 0.10 0.10 0.08 0.07 0.10 0.09 0.04 0.11 0.07 0.10 0.12 0.07 0.06 0.08 0.12 0.08 0.09 0.07 0.08 0.12 0.08 0.13 0.06
0.09 0.12 0.07 0.06 0.07 0.13 0.17 0.11 0.11 0.11 0.12 0.07 0.07 0.06 0.08 0.11 0.07 0.09 0.08 0.09 0.10 0.11 0.09 0.14
0.08 0.08 0.13 0.11 0.06 0.08 0.12 0.06 0.06 0.12 0.05 0.12 0.14 0.04 0.09 0.09 0.05 0.11 0.10 0.01 0.17 0.09
0.10 0.07 0.07 0.06 0.08 0.09 0.15 0.09 0.09 0.08 0.08 0.08 0.11 0.10 0.08 0.08 0.06 0.10 0.11 0.14 0.12 0.06 0.08
7
0.10 0.08 0.06 0.09 0.10 0.05 0.07 0.08 0.09 0.07 0.12 0.08 0.07 0.07 0.10 0.09 0.10 0.09 0.06 0.12 0.08 0.07 0.09 0.12
0.08 0.08 0.09 0.09 0.10 0.08 0.15 0.10 0.06 0.08 0.23 0.08 0.08 0.11 0.13 0.12 0.10 0.12 0.08 0.08 0.07 0.08 0.09 0.24
0.12 0.07 0.08 0.07 0.08 0.11 0.10 0.08 0.09 0.08 0.09 0.10 0.08 0.07 0.10 0.11
0.14 0.09 0.13 0.08 0.15 0.10 0.09 0.06 0.10 0.08 0.11 0.06 0.07 0.10 0.08 0.07 0.10 0.09 0.09 0.08
Figure 11.46: Database Results Page 2 for Fali Pavri, B flat minor Sonata finale. The first row of columns details the inter-onset intervals and the second row of columns details the keypress durations.
12
8
0.13 0.13 0.16 0.14 0.14 0.10 0.11 0.12 0.11 0.13 0.12 0.16 0.08 0.13 0.11 0.10 0.16 0.15 0.11 0.12 0.13 0.08 0.15 0.12
0.28 0.17 0.20 0.12 0.07 0.08 0.30 0.18 0.15 0.12 0.07 0.05 0.20 0.16 0.13 0.11 0.08 0.08 0.15 0.20 0.14 0.13 0.08 0.12
12
8
0.12 0.09 0.09 0.10
3
0.12 0.12 0.10 0.13 0.13 0.13 0.12 0.13 0.11 0.10 0.10 0.14 0.12 0.14 0.09 0.11 0.11 0.17 0.11 0.18 0.07 0.15
0.06 0.11 0.11 0.13 0.06 0.03 0.16 0.08 0.10 0.15 0.06 0.08 0.07 0.08 0.09 0.13 0.08 0.06 0.16 0.20 0.18 0.11 0.10
0.11 0.11 0.12 0.15 0.14 0.09 0.13 0.15 0.14 0.12 0.15 0.11
0.12 0.06 0.08 0.08 0.11 0.06 0.07 0.25 0.17 0.07 0.05 0.14 0.08 0.07 0.12 0.07
Figure 11.47: Database Results Page 1 for Simon Coverdale, B flat minor Sonata finale. The first row of columns details the inter-onset intervals and the second row of columns details the keypress durations.
2
5
0.16 0.16 0.12 0.12 0.10 0.13 0.14 0.09 0.13 0.09 0.14 0.13 0.12 0.10 0.11 0.14 0.09 0.11 0.10 0.10 0.09 0.12 0.13 0.16
0.24 0.24 0.10 0.13 0.11 0.09 0.14 0.13 0.09 0.11 0.20 0.08 0.13 0.07 0.12 0.09 0.13 0.11 0.07 0.13 0.12 0.15 0.16 0.08
0.20 0.17 0.15 0.16 0.06 0.13 0.12 0.10 0.12 0.11 0.13 0.13 0.10 0.15 0.08 0.12 0.11 0.15
0.17 0.09 0.09 0.08 0.07 0.08 0.10 0.07 0.06 0.08 0.09 0.10 0.09 0.25 0.09 0.07 0.10 0.11 0.07 0.15 0.05
7
0.10 0.13 0.09 0.14 0.13 0.11 0.09 0.11 0.12 0.09 0.14 0.10 0.11 0.09 0.10 0.12 0.13 0.06 0.11 0.16 0.04 0.08 0.13
0.10 0.16 0.10 0.11 0.12 0.08 0.13 0.13 0.08 0.08 0.15 0.05 0.13 0.07 0.08 0.06 0.09 0.08 0.13 0.11 0.13 0.08 0.10 0.21
0.11 0.12 0.11 0.11 0.09 0.08 0.12 0.13 0.14 0.09 0.16 0.09 0.08 0.11 0.14 0.11 0.07 0.12 0.11 0.11 0.12 0.06 0.11 0.16
0.11 0.10 0.06 0.06 0.16 0.07 0.07 0.08 0.06 0.04 0.10 0.06 0.09 0.07 0.07 0.08 0.06 0.10 0.10 0.07 0.06 0.07 0.09 0.07
Figure 11.48: Database Results Page 2 for Simon Coverdale, B flat minor Sonata finale. The first row of columns details the inter-onset intervals and the second row of columns details the keypress durations.
Appendix F
Publications Arising From Work Described Herein
MacRitchie, J., Hair, G., Bailey, N. J. and Pullinger, S. (2008) “Extracting Musical
Structure from Multi-Modal Performance Analysis.” Conference for Interdisciplinary
Musicology, Thessaloniki, Greece.
MacRitchie, J., Buck, B. and Bailey, N. J. (2009) “Gestural Communication: Linking
the Multi-Modal Analysis of Performance to Perception of Musical Structure.”
International Symposium on Performance Science, Auckland, New Zealand.
MacRitchie, J., Bailey, N. J. and Hair, G. (2010) “Highlighting Structural Issues
in Piano Performance with Optical Finger Tracking.” International Conference
on Music and Gesture, Montreal, Canada.
Buck, B., Bailey, B., MacRitchie, J. and Parncutt, R. (2010) “Performers’ Body Motion
and Phrase Structure: The role of velocity magnitude.” International Conference
on Music and Gesture, Montreal, Canada.