Using A Pitch Detector For Onset Detection

This document describes a pitch-based algorithm for detecting note onsets in monophonic instrumental music. It uses a pitch detector to extract the fundamental frequency over time. The pitch track is then processed to suppress vibrato and clean up errors. Onsets are identified as transitions between stable pitches. The algorithm shows improved performance over energy-based detectors for material with slow attacks and vibrato, such as the human voice. Further cues are needed for a complete solution, but this method provides a useful component for note event analysis.

USING A PITCH DETECTOR FOR ONSET DETECTION

Nick Collins
University of Cambridge
Centre for Music and Science
11 West Road, Cambridge, CB3 9DP, UK
nc272@cam.ac.uk

ABSTRACT

A segmentation strategy is explored for monophonic instrumental pitched non-percussive material (PNP) which proceeds from the assertion that human-like event analysis can be founded on a notion of stable pitch percept. A constant-Q pitch detector following the work of Brown and Puckette provides pitch tracks which are post processed in such a way as to identify likely transitions between notes. A core part of this preparation of the pitch detector signal is an algorithm for vibrato suppression. An evaluation task is undertaken on slow attack and high vibrato PNP source files with human annotated onsets, exemplars of a difficult case in monophonic source segmentation. The pitch track onset detection algorithm shows an improvement over the previous best performing algorithm from a recent comparison study of onset detectors. Whilst further timbral cues must play a part in a general solution, the method shows promise as a component of a note event analysis system.

Keywords: onset detection, pitch detection, segmentation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. © 2005 Queen Mary, University of London

1 INTRODUCTION

A recent paper (Collins, 2005) compared a number of musical onset detection functions with respect to onset detection performance on sets of non-pitched percussive (NPP) and pitched non-percussive (PNP) sound files. Whilst many algorithms performed successfully at the NPP task, with few false positives for a large number of correct detections, the ability of the same algorithms to parse the PNP set was substantially reduced. The most successful attempt was that of the phase deviation algorithm (Bello et al., 2004), which uses a measure of the change of instantaneous frequency. It was proposed that this success could be linked to the use of stable pitch cues as a segmentation feature, a tactic also highlighted by Tristan Jehan in his event analysis/synthesis work (Jehan, 2004). Fundamental frequency trails have been segmentation features in work by teams from IRCAM (Rossignol et al., 1999b,a) and Universitat Pompeu Fabra (Gómez et al., 2003b,a). Whilst many signal attributes, particularly timbral descriptors, may contribute to onset detection and event parsing (Handel, 1995; Yost and Sheft, 1993; Moore, 1997), the use of a central pitch percept is investigated in this paper as one component of a plausible strategy, and a significant one for the source material tackled herein.

In this paper I attempt to explore the basis of an improved onset detection algorithm for pitched material which uses the stability of a pitch percept as the defining property of a sound event. In order to obtain a clean detection signal, the output of a pitch detection algorithm is processed in various ways, including by the suppression of vibrato, following Rossignol et al. (1999b). The choice of pitch detection algorithm is open, but the specific detector considered in this paper is Brown and Puckette’s constant Q transform pitch tracker (Brown and Puckette, 1993).

The material with which I am concerned provides the hardest case of monophonic onset detection, consisting of musical sounds with slow attacks and containing vibrato, such as the singing voice (Saitou et al., 2002). Vibrato associated frequency and amplitude modulation poses problems for traditional energy based onset detectors, which tend to record many false positives as they follow the typically 4-7 Hz oscillation. For such material, the sought after performance is a segmentation as a human auditor would perceive sound events. Better than human listener performance, as possible for some high speed percussive sequences via non-real-time digital editing or by algorithm (Collins, 2005), is unlikely.

The applications of such an algorithm are multifold. Onset detection is a frontend to beat induction algorithms (Klapuri et al., 2004), empowers segmentation for rhythmic analysis and event manipulation both online and offline (Jehan, 2004; Brossier et al., 2004), and provides a basis for automatically collating event databases for compositional and information retrieval applications (Rossignol et al., 1999b; Schwarz, 2003). Extraction of note event locations from an audio signal is a necessary component of automatic transcription, and the vibrato suppression investigated here may assist clear f0 estimation. For music information retrieval, the ‘query by humming’ approach requires the parsing of monophonic vocal melodies from audio signal alone.

2 ALGORITHM OUTLINE

Figure 1 gives an overview of the detection algorithm and the associated signal features based on the extracted fundamental frequency f0. The following subsections will address successive stages of the onset detector.

Figure 1: Overview of the algorithm

2.1 Pitch Detection

Brown and Puckette (1993) describe an efficient FFT based pitch detection algorithm which cross correlates a harmonic template with a constant Q spectrum in a search for the best fitting fundamental frequency f0. The form of the template is devised so as to minimise octave errors; the template consists of the first 11 harmonics, weighted from 1.0 to 0.6. A further stage evaluates phase change in the winning FFT bin to get a more accurate value for the pitch unconstrained by the limited bin resolution. Since the full details are given in their papers (Brown and Puckette, 1992, 1993) and my implementation follows that work, I shall avoid a fuller discussion of this pitch detection method. Alternative pitch detection algorithms may easily be placed as front-ends to the analysis system now to be described.

The 4096 point FFT driving the pitch detector was run with a step size of 512 samples, for a frame rate of around 86 Hz (all the audio signals involved had 44100 Hz sampling rate). The pitch detector output was taken from 150-2000 Hz, with values outside this range shifted by octave steps into this compass, and values outside 22050 Hz sent to 1 Hz, where they are easily cleaned up with the algorithm next described.

A post processing stage was added to clean up some small blips in the signal, consisting of momentary octave errors and rogue outliers. Whilst a jump to an octave which is then maintained could indicate a true octave leap in the music, some obvious short-term octave errors were seen, with lengths of one or two frames. The original Brown/Puckette algorithm also occasionally created some strange values during otherwise relatively stable held pitches. The pseudocode in figure 3 reveals the tactic employed to clean up these short-term errors. The MATLAB indexing convention of counting from 1 is used. The two tests check against the ratio of an equal tempered semitone.

Figure 2 demonstrates the application of the algorithm on a signal which has out of bound pitches and instantaneous errors against the general trend.

Figure 2: The upper f0 track is cleaned up and the result is the lower track

It is convenient to transform the fundamental frequency track to pitch in semitones prior to vibrato suppression, as a musically normalised representation. An arbitrary reference point is selected such that 0 Hz is transformed to 0 semitones.

    p = 12 \log_2((f + 440)/440)    (1)

2.2 Vibrato Suppression

The f0 track is perturbed by vibrato, and this can be identified as the chief cause of the noise on that signal which disrupts its use in segmentation. Rossignol et al. (1999b) noted this in their event segmentation paper, and sketched a vibrato suppression algorithm. Herrera and Bonada (1998) have also outlined both frequency domain and time domain vibrato suppression methods within the context of the SMS (Spectral Modeling Synthesis) framework, using an FFT to isolate 6-7 Hz vibrato by analysing peaks in the frequency domain before suppression and IFFT resynthesis, and in the time domain, a 10 Hz high pass filter on a 200 ms window. These methods require the mean around which a vibrato fluctuates to be identified before application, and utilise fixed windows. Rossignol et al. (1999a) also expand upon a selection of methods for suppression; I followed the ‘minima-maxima detection’ method, in common with Rossignol et al. (1999b), as the most plausible for my purposes.

postprocessing(arg input)
for jj= 2 to 7 {                          % distance between the two anchor frames
  for ii= 1 to (length(input)-jj) {
    testratio= input(ii)/input(ii+jj);
    % anchors must agree to within an equal tempered semitone
    if testratio < 1.059 AND testratio > 0.945 {
      for kk=1 to (jj-1) {                % frames strictly between the anchors
        mid = (input(ii)+input(ii+jj))*0.5;
        testratio2= input(ii+kk)/mid;
        if testratio2 > 1.059 OR testratio2 < 0.945
          input(ii+kk) = mid;             % replace the outlying frame
      }
    }
  }
}
output=input;

Figure 3: Pseudocode for the outlier removal algorithm

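
The outlier removal of figure 3 and the semitone mapping of equation (1) can be sketched in Python as follows. This is a minimal illustration rather than the implementation used for the evaluation: the function names `remove_outliers` and `hz_to_semitones` are hypothetical, indices count from 0 instead of 1, every f0 value is assumed to be positive, and the inner test and the replacement are both assumed to act on the frame lying between the two anchor frames.

```python
import math


def remove_outliers(f0, max_gap=7, semitone=1.059, inv_semitone=0.945):
    """Replace short-lived outliers in an f0 track (Hz), after figure 3.

    Assumes every value is positive (out-of-range values are sent to 1 Hz
    in the text, never to 0). Indices are 0-based here.
    """
    out = list(f0)
    for jj in range(2, max_gap + 1):             # distance between anchor frames
        for ii in range(len(out) - jj):
            ratio = out[ii] / out[ii + jj]
            if inv_semitone < ratio < semitone:  # anchors agree within a semitone
                mid = 0.5 * (out[ii] + out[ii + jj])
                for kk in range(1, jj):          # frames strictly between anchors
                    ratio2 = out[ii + kk] / mid
                    if ratio2 > semitone or ratio2 < inv_semitone:
                        out[ii + kk] = mid       # assumed target: in-between frame
    return out


def hz_to_semitones(f):
    """Equation (1): maps 0 Hz to 0 semitones."""
    return 12.0 * math.log2((f + 440.0) / 440.0)


track = [440.0, 440.0, 880.0, 440.0, 440.0]      # one-frame octave error
cleaned = remove_outliers(track)
pitches = [hz_to_semitones(f) for f in cleaned]
```

A maintained octave leap is left alone by this sketch, because no pair of frames within a semitone of each other then encloses a deviating frame; only blips shorter than the largest anchor distance are overwritten.
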

Attempts to implement the Rossignol et al. (1999b) algorithm, however, were somewhat thwarted by the question of the best windowing strategy to use; their algorithm is underspecified. A vibrato suppression algorithm is described here which is inspired by their work but makes explicit how the search for regions of vibrato will take place, and uses some variation in the criteria for a vibrato detection and substituting value, along with variable window size to encompass vibrato regions.

Vibrato removal proceeds in windows of 300 ms, with a step size of 100 ms. If the difference of the maximum and minimum value of the input within this window is less than 1.5 semitones, a search for vibrato ensues. All maxima and minima within the (open) window range form a list of extrema. Lists of differences in time and in amplitude of the extrema are taken, and the variances of these lists calculated. Note that this is different to Rossignol et al. (1999b), where the maxima and minima lists are considered separately. The quantity p_extrema is calculated as the proportion of the time differences between extrema that fall within the vibrato range of 0.025 to 0.175 seconds, corresponding to 2.86 to 20 Hz frequency modulation. A vibrato is detected when p_extrema is large and the variances are sufficiently small.

Given a vibrato detected in a window, the window is now gradually extended so as to take in the whole duration of this vibrato; this ensures that the corrections are not piecemeal, which would give rise to some erroneous fluctuations. A number of conditions are checked as the window is incrementally widened, so as not to confuse a vibrato with a jump to a new pitch. The mean of the input has been precalculated in 21 frame segments centred on each point. This mean provides a guide as to the centre point of any vibrato oscillation; if this mean changes during the window extension, it is likely that a new note event has commenced. This test was particularly important in cases of singing where the magnitude of vibrato on one tone could encompass the smaller vibrato magnitude on a succeeding tone. Secondly, the window is only extended where no value departs more than a semitone from the mean of the extrema list. The correction is applied, replacing all values in the window with the mean of the extrema list. After suppressing a vibrato, the search for vibrato recommences with the window positioned at the next frame unaffected by the changes.

Figure 4: Vibrato suppression for an ascending arpeggiated violin signal. The FFT frames are on the abscissae, pitch in semitones or a 0/1 flag for the ordinate

Figure 4 shows an example where the vibrato suppression works effectively. The top part of the figure shows the input, the centre marks areas where vibrato was detected and shows the length of the windows after extension, and the bottom shows the vibrato suppressed output. Figure 5 shows a less clean case where the suppression does not remove all the frequency modulation. The heuristic algorithm given in this paper could likely be extended via such tactics as a cross correlation search for matches to sinusoidal variation exhaustively through appropriate frequencies or by further rules based on a study of instrumental vibrato. It works well enough, however, for evaluation purposes herein.

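
The per-window vibrato test described in this section can be sketched in Python as below. It covers only the initial detection in a single window, not the window extension or the substitution of values. The function names are hypothetical, and so are the thresholds `p_min`, `max_var_time` and `max_var_amp`: the text only states that p_extrema must be large and the variances sufficiently small. Requiring at least three extrema, so that two differences exist, is a further assumption.

```python
import math
from statistics import pvariance

FRAME_RATE = 44100 / 512          # about 86 frames per second, as in section 2.1


def extrema_indices(window):
    """Indices of strict local maxima and minima inside the open window."""
    idx = []
    for n in range(1, len(window) - 1):
        prev, cur, nxt = window[n - 1], window[n], window[n + 1]
        if (cur > prev and cur > nxt) or (cur < prev and cur < nxt):
            idx.append(n)
    return idx


def vibrato_features(window, frame_rate=FRAME_RATE):
    """Return (p_extrema, var_time, var_amp), or None if no search is warranted."""
    if max(window) - min(window) >= 1.5:      # range test, in semitones
        return None
    ext = extrema_indices(window)
    if len(ext) < 3:                          # assumption: need two differences
        return None
    pairs = list(zip(ext, ext[1:]))
    dt = [(b - a) / frame_rate for a, b in pairs]            # seconds
    damp = [abs(window[b] - window[a]) for a, b in pairs]    # semitones
    p_extrema = sum(0.025 <= d <= 0.175 for d in dt) / len(dt)
    return p_extrema, pvariance(dt), pvariance(damp)


def is_vibrato(window, p_min=0.8, max_var_time=1e-3, max_var_amp=0.05):
    """Decision rule with hypothetical thresholds (none are given in the text)."""
    feats = vibrato_features(window)
    if feats is None:
        return False
    p_extrema, var_time, var_amp = feats
    return p_extrema >= p_min and var_time <= max_var_time and var_amp <= max_var_amp


# A 300 ms window (26 frames) of 6 Hz vibrato, 0.3 semitones deep, and a
# window containing a leap of three semitones.
vib = [0.3 * math.sin(2 * math.pi * 6 * n / FRAME_RATE) for n in range(26)]
step = [0.0] * 13 + [3.0] * 13
```

With these example inputs the sinusoidal window passes the test and the leap is rejected by the 1.5 semitone range check before any extrema are examined.
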
Figure 5: Vibrato suppression for a solo soprano signal. The FFT frames are on the abscissae, pitch in semitones or a 0/1 flag for the ordinate

2.3 Assessing Peaks of Instability

Given the vibrato suppressed pitch tracks, note events must be distinguished by jumps of pitch. A procedure is applied to rate the strength of changes in the pitch track p over time.

    df(i) = \sum_{j=1}^{8} \min(|p(i) - p(i+j)|, 2)    (2)

The min operator disregards the size of changes greater than a tone to avoid overly biasing the output detection function df based on the size of leap between notes involved. Figure 6 demonstrates df for a soprano signal.

Figure 6: The upper cleaned and vibrato suppressed pitch track is converted to a detection function

Because changes are sought out, cues for multiple note events in a row of the same pitch are the most difficult case to spot (particularly questionable is the case of smooth transitions between same pitch notes: how little energy drop can a player get away with?). It is assumed that note onsets should show some slight perturbation in pitch, though the pitch integration area is around 90 ms in the FFT. The pitch track test may have to be combined with other features, to be described next. However, one interesting case, that is not particularly well dealt with by the vibrato suppression stage at the present time, is that the end and restart of a vibrato itself may indicate a transition between successive notes.

2.4 Correction for Signal Power

Because the detection function did not take account of signal power, onsets would often appear at the very tails of events, for events which end in silence. To counteract this, a multiplier was introduced based on the signal power immediately following a given frame. A basic temporal integration was carried out, taking a weighted sum over 10 frames, and compressing to 1 for all reasonably large values. Small values under 0.01 of the maximum power were left unaffected and downweighted troublesome points in the pitch detector based detection function.

2.5 Peak Picking

A detection function must yield onset locations via some peak picking process. Bello et al. (2004) provide an adaptive peak picking algorithm based on a median filter on a moving window. Their peak picker was used as a common stage in the evaluation, following (Collins, 2005; Bello et al., 2004), and the algorithm is not discussed further here.

3 EVALUATION

3.1 Procedure

An evaluation of the pitch detection based onset detector was carried out using the same methodology as previous comparative studies of onset detection effectiveness (Collins, 2005; Bello et al., 2004). Pitched non-percussive (PNP) soundfiles originally prepared and annotated by Juan Bello formed the test set. 11 source files were selected, containing 129 onsets, comprising slow attack and high vibrato sounds from strings and voices. The onsets were sparse in relatively long sound files, providing a great challenge; with amplitude modulation associated with vibrato, it is unsurprising that loudness based detection functions fared so poorly in Collins (2005). The tolerance for matches between algorithm and hand-marked onsets was set at a very tolerant 100 ms, though this window was small compared to the average distance between note events.

The pitch track onset detection function was compared to the phase deviation detection function with a common adaptive peak picking stage. The peak picker has a parameter δ which acts like an adaptive threshold; this was varied between -0.1 and 0.53 in steps of 0.01, giving 64 runs on the test set for each detection function. A Receiver Operating Characteristics curve was drawn out as δ is varied. This ROC curve is given in figure 7. The closest points to the top left corner indicate the better performance, with many correct detections for few false positives.

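
For reference, the pitch track detection function evaluated here (equation (2) of section 2.3) can be sketched in Python as follows. The function name and its parameters are hypothetical, and skipping the terms that would read past the end of the track is an assumption, since the treatment of the final frames is not stated. Neither the 7 frame delay nor the signal power multiplier of section 2.4 is included.

```python
def detection_function(p, horizon=8, cap=2.0):
    """Equation (2): summed, capped pitch change over the next `horizon` frames.

    `p` is the cleaned, vibrato suppressed pitch track in semitones. Terms
    that would index past the end of the track are skipped (an assumption).
    """
    df = []
    for i in range(len(p)):
        total = 0.0
        for j in range(1, horizon + 1):
            if i + j < len(p):
                # Changes larger than a tone all count as 2 semitones.
                total += min(abs(p[i] - p[i + j]), cap)
        df.append(total)
    return df


# A held note followed by a leap of four semitones at frame 10.
track = [60.0] * 10 + [64.0] * 10
df = detection_function(track)
```

In this example the function rises over the eight frames before the leap and peaks at the last frame of the first note, which is the kind of peak the adaptive peak picker is then asked to locate.
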
Figure 7: ROC curve of false positives against correct detections comparing phase deviation and pitch track onset detector functions over varying δ

Table 1: PNP test set comparison of detection functions with Bello et al. (2004) peak picker

detection function                         score (eqn 4)   CDR     onsets detected (%)   false positives (%)   best δ
1. pitch track detection function          42.6            -17     58.1                  36.4                  0.13
2. phase deviation (Bello et al., 2004)    32.8            -36.4   45.0                  37.0                  0.13

Results for the best δ for each algorithm are given in table 1 with ratings with respect to two measures of performance. Liu et al. (2003)’s Correct Detection Ratio (CDR) is described by the equation:

    CDR = \frac{total - missing - spurious}{total} \times 100\%    (3)

but is not constrained, however, to return values between 0 and 100. I therefore also introduce an evaluation formula from Dixon (2001), originally used for the assessment of beat tracking algorithm performance, as an alternative scoring mechanism, combining matches m, false positives F+ (spurious) and false negatives F− (missing).

    score = \frac{m}{m + F^{-} + F^{+}} \times 100\%    (4)

The denominator includes the term for the number of onsets in the trial n as m + F−. These measures are the same as in (Collins, 2005).

3.2 Discussion

A small advance is shown by the pitch detection based onset detector, its performance being marginally better than the phase deviation and by extension all the energy based detection functions considered in (Collins, 2005). The success of a pitch detection cue gives corroborative evidence that note events defined by stable pitch percept are a plausible segmentation strategy. The fact that vibrato had to be suppressed for effective performance shows the importance of higher level feature extraction in human segmentation. As noted above, the onset and offset of a vibrato may be a feature that helps to segment successive notes of the same pitch. It might even be speculated that the appearance of vibrato in long notes can be linked to a human desire for stimulation over time, for the confound given by vibrato and associated amplitude modulation (often at 4-7 Hz) is comparable to new amplitude cued events at the same rate. The central pitch around which the vibrato oscillates maintains the identity of a single note event.

Various problems with the evaluation task were noted, which may have underrated the performance of the pitch detector. First, the annotations were at their most subjective for this type of note event; as Leveau et al. (2004) note, the annotation task involves some variability in decisions between human experts, particularly for complex polyphonic music and instruments with slow attacks. However, at the time of writing, the Bello database provided a larger test set (11 as opposed to 5 files), and the Leveau database could not be made to function properly within MATLAB.

Human pitch perception shows different time resolution capabilities to the computer pitch tracker used herein. Whilst the qualitative agreement of onset locations with the hand marked ones was much more impressive for the stable pitch detector than the phase deviation (for example, figure 8), these would often be early with respect to the human marked positions (though could also appear late). To compensate somewhat, a delay of 7 frames had been introduced in the detection function for the comparison test. The time resolution of the new onset detection algorithm is dependent on the lower time resolution of the pitch detection algorithm, with a 4096 point FFT (pitch detection accuracy degrades with a shorter window); the phase deviation was much less susceptible to this problem, based on a 1024 point FFT. Localisation could perhaps be improved by zero padded FFTs for the pitch detector, parallel time domain autocorrelation and timbrally motivated onset detection (differentiating transient regions from smooth wherever possible) and remains an area for further investigation.

The selection of the test set also played a role. When onsets are sparse, false positives count for proportionally more over the run. A combination of sound files requiring many onsets to be detected and those with sparse onsets is a difficult combination, for onset detectors built to risk more will score very poorly on the sparse regions. It can be speculated that additional contextual clues due to timbre and musical convention are utilised by human listeners to focus their event detection strategy. An onset detection algorithm which performed well for both NPP and PNP material would most likely require some switching mechanism based on the recognition of instrument and playing style. The evocation of a pitch percept and the detection of vibrato cues may provide knowledge for deciding the event segmentation tactic.

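
The two measures of section 3.1, equations (3) and (4), are simple to compute once detections have been matched to annotations. The Python sketch below is illustrative only: the greedy one-to-one matching rule in `match_onsets` is an assumption (the text specifies the 100 ms tolerance but not the pairing procedure), the function names are hypothetical, and the onset times are invented for the example rather than taken from the test set.

```python
def match_onsets(detected, annotated, tolerance=0.1):
    """Greedy one-to-one matching within `tolerance` seconds (assumed rule).

    Returns (matches, false_negatives, false_positives).
    """
    detected = sorted(detected)
    used = [False] * len(detected)
    matches = 0
    for t in sorted(annotated):
        best = None
        for k, d in enumerate(detected):
            if used[k] or abs(d - t) > tolerance:
                continue
            if best is None or abs(d - t) < abs(detected[best] - t):
                best = k
        if best is not None:
            used[best] = True
            matches += 1
    return matches, len(annotated) - matches, len(detected) - matches


def cdr(total, missing, spurious):
    """Equation (3); negative when missing + spurious exceeds total."""
    return 100.0 * (total - missing - spurious) / total


def dixon_score(m, f_neg, f_pos):
    """Equation (4); lies between 0 and 100."""
    return 100.0 * m / (m + f_neg + f_pos)


annotated = [1.0, 2.0, 3.0]               # hypothetical onset times in seconds
detected = [1.05, 2.5, 3.02, 3.3]
m, f_neg, f_pos = match_onsets(detected, annotated)
```

Because spurious detections are subtracted in the numerator of equation (3), a file with few annotated onsets and several false positives can drive the CDR to zero or below, whereas the score of equation (4) stays within 0 to 100.
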
Whilst the pitch discrimination capabilities of humans are much more refined than a semitone, a semitone has been used above as a practical working value for the size of pitch changes, as opposed to vibrato. In fact, the order of vibrato can approach that of note events, and some tighter heuristics for the vibrato suppression which take into account the nature of the vibrato percept may need to be applied.

General improvements may arise from investigating computational auditory models, for the goal on such musical material as targeted in this paper is to match a human auditor’s segmentation. A better pitch detection algorithm as a frontend to event segmentation may be one modeled more thoroughly on neural coding of periodicity, with realistic pitch reaction time and stability characteristics. For example, a perceptually plausible pitch detector is proposed by Slaney and Lyon (1990).

Figure 8: Comparison of pitch detector (middle) and phase deviation (bottom) on a violin signal. The top shows the source signal with onsets marked: those on the top line show the human annotation, above the middle those due to the pitch detector algorithm and below the phase deviation

It is likely that human auditors use instrument recognition cues to decide on a segmentation strategy. Prior knowledge of instrument timbre and associated playing conventions provides situations where human segmentation may continue to outperform machine in the near future.

For the determination, given arbitrary material, of the best algorithm to use, a computer program might assess the stability of pitch cues (amount of fluctuation) and general inharmonicity to decide if pitched material is being targeted. Attack time cues through the file may distinguish whether to apply a combined pitch and amplitude algorithm or a pure pitch algorithm for slow attacks, and how to deal with confounds from the recognition of the specific shape of vibrato or other playing conventions (on which much further work might be done).

In testing the algorithm, it was found that the quality of pitch detection tracks was worse for lower register instruments, as for double bass or bass voice. This could be traced to inadequacies in the constant Q pitch detector for tracking fundamentals below around 150 Hz. False matches to higher harmonics could skew the pitch tracks and the algorithm consistently gave the worst detection scores for such cases. Leaving these troublesome sound files out of the test set led to much improved performance. On a reduced test set of 6 files, the algorithm then achieved 58.7% correct detections for 21.4% false positives (Dixon score of 48.3, CDR 1.3) as opposed to 45.3% correct to 38.2% false positives (Dixon score 32.8, CDR -37.3) for the phase deviation.

4 CONCLUSIONS

In this paper, a pitch detection algorithm was adapted for an onset detection task on pitched non-percussive source material. This often slow attacking and vibrato-ridden monophonic music provides a challenging case for event segmentation. A very high correct identification to low false positive rate is yet to be exhibited commensurate with the success rates on the easier NPP task, but the tactic introduced shows some promise for the PNP task. It is the most promising of detection functions assessed so far, particularly by qualitative comparison of results from the new detector with that of the phase deviation algorithm.

ACKNOWLEDGEMENTS

Thanks are due to Juan Bello for providing the evaluation test set, and to four anonymous ISMIR reviewers for their helpful comments.

REFERENCES

J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and S. B. Sandler. A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 2004.
P. Brossier, J. P. Bello, and M. D. Plumbley. Real-time temporal segmentation of note objects in music signals. In Proc. Int. Computer Music Conference, 2004.
J. C. Brown and M. S. Puckette. An efficient algorithm for the calculation of a constant Q transform. J. Acoust. Soc. Am., 92(5):2698–701, November 1992.
J. C. Brown and M. S. Puckette. A high-resolution fundamental frequency determination based on phase changes of the Fourier transform. J. Acoust. Soc. Am., 94(2):662–7, 1993.
N. Collins. A comparison of sound onset detection algorithms with emphasis on psychoacoustically motivated detection functions. In AES Convention 118, Barcelona, May 28-31 2005.
S. Dixon. Automatic extraction of tempo and beat from expressive performances. Journal of New Music Research, 30(1):39–58, 2001.
E. Gómez, M. Grachten, X. Amatriain, and J. Arcos. Melodic characterization of monophonic recordings for expressive tempo transformations. In Proceedings of Stockholm Music Acoustics Conference 2003, Stockholm, Sweden, 2003a.
E. Gómez, A. Klapuri, and B. Meudic. Melody description and extraction in the context of music content processing. Journal of New Music Research, 32(1), 2003b.
S. Handel. Timbre perception and auditory object identification. In Moore (1995), pages 425–61.
P. Herrera and J. Bonada. Vibrato extraction and parameterization in the spectral modeling synthesis framework. In Proc. Digital Audio Effects Workshop (DAFx), Barcelona, 1998.
T. Jehan. Event-synchronous music analysis/synthesis. In Proc. Digital Audio Effects Workshop (DAFx), Naples, Italy, Oct. 2004.
A. P. Klapuri, A. J. Eronen, and J. T. Astola. Analysis of the meter of acoustic musical signals. IEEE Trans. Speech and Audio Processing, forthcoming, 2004.
P. Leveau, L. Daudet, and G. Richard. Methodology and tools for the evaluation of automatic onset detection algorithms in music. In Proc. Int. Symp. on Music Information Retrieval, 2004.
R. Liu, N. Griffith, J. Walker, and P. Murphy. Time domain note average energy based music onset detection. In Proceedings of the Stockholm Music Acoustics Conference, Stockholm, Sweden, August 2003.
B. C. J. Moore, editor. Hearing. Academic Press, San Diego, CA, 1995.
B. C. J. Moore. An Introduction to the Psychology of Hearing. Academic Press, San Diego, CA, 1997.
S. Rossignol, P. Depalle, J. Soumagne, X. Rodet, and J. Collette. Vibrato: Detection, estimation, extraction and modification. In Proc. Digital Audio Effects Workshop (DAFx), 1999a.
S. Rossignol, X. Rodet, J. Soumagne, J.-L. Collette, and P. Depalle. Automatic characterisation of musical signals: Feature extraction and temporal segmentation. Journal of New Music Research, 28(4):281–95, 1999b.
T. Saitou, M. Unoki, and M. Akagi. Extraction of f0 dynamic characteristics and development of f0 control model in singing voice. In Proc. of the 2002 Int. Conf. on Auditory Display, Kyoto, Japan, July 2002.
D. Schwarz. New developments in data-driven concatenative sound synthesis. In Proc. Int. Computer Music Conference, 2003.
M. Slaney and R. F. Lyon. A perceptual pitch detector. In Proc. ICASSP, pages 357–60, 1990.
W. A. Yost and S. Sheft. Auditory perception. In W. A. Yost, A. N. Popper, and R. R. Fay, editors, Human Psychophysics, pages 193–236. Springer, New York, 1993.