This paper presents new frequency-domain voice modification techniques that combine the high quality usually obtained by time-domain techniques such as TD-PSOLA with the flexibility provided by the frequency-domain representation. The technique only works for monophonic sources (single speaker), and relies on (possibly online) pitch detection. Based on the pitch, and according to the desired pitch and formant modifications, individual harmonics are selected and shifted to new locations in the spectrum. The harmonic phases are updated according to a pitch-based method that aims to achieve time-domain shape invariance, thereby reducing or eliminating the usual artifacts associated with frequency-domain and sinusoidal-based voice modification techniques. The result is a fairly inexpensive, flexible algorithm that matches the quality of time-domain techniques while providing vastly improved flexibility in the array of available modifications.
Additive synthesis is a powerful tool for the analysis/modification/synthesis of complex audio or speech signals. However, the cost of wavetable sinusoidal synthesis can become prohibitive for large numbers of sinusoids (more than a few hundred). In that case, techniques based on the inverse Fourier transform offer an attractive alternative, being 200-300% more efficient than wavetable synthesis depending on the number of sinusoids. This paper presents an improved technique based on the concatenation of short-term signals obtained by inverse Fourier transforms. In contrast to the standard overlap-add technique, the new algorithm requires synthesizing sinusoids in the frequency domain whose time-domain amplitudes vary linearly within the synthesis frame. The technique is shown to achieve higher quality than the standard overlap-add technique, at the cost of a small increase in computation.
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics 2005, Nov 16, 2005
A class of simple active noise-canceling algorithms relies on negative feedback techniques [S.J. Elliott et al., 1993 and S.M. Kuo et al., 1999]. In such systems, a microphone captures the combination of the incoming noise and the "anti-noise" emitted by a loudspeaker, and that signal is filtered, amplified, and phase-inverted, then sent back to the loudspeaker. Most commercially available active noise-reduction headphones use negative feedback systems, because they can easily be implemented using analog components, are relatively simple, and can achieve fairly good performance. Designing the filter used in the feedback loop is not simple, though: the filter must have a high gain in the frequency region where significant noise attenuation is desirable, but must guarantee that the closed-loop system remains stable. These two constraints run against one another, and to this author's knowledge, no optimal design procedure has yet been proposed for such filters. This paper presents an algorithm for designing optimal FIR filters (and near-optimal IIR filters) to be used in negative-feedback active noise-canceling systems. The algorithm uses the cepstral domain to constrain the filter in both magnitude and phase, and a linear-programming optimization technique to achieve the maximum noise attenuation while maintaining stability.
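The tension between loop gain and stability described in this abstract can be illustrated numerically. The sketch below is not the paper's cepstral/linear-programming design procedure; it only evaluates a given FIR loop filter. It assumes, purely for illustration, that the loudspeaker-to-microphone path is a pure delay of `plant_delay` samples. The function name `closed_loop_report` and its parameters are hypothetical.

```python
import numpy as np

def closed_loop_report(fir_taps, plant_delay, num_freqs=512):
    """Evaluate a negative-feedback loop with FIR filter H(z) and an
    ASSUMED pure-delay plant z^-d (d = plant_delay >= 1).

    The residual noise at the microphone is d(n) filtered by 1 / (1 + z^-d H(z)),
    so attenuation in dB is 20*log10|1 + z^-d H(z)| on the unit circle.
    """
    b = np.asarray(fir_taps, dtype=float)
    # Coefficients of 1 + z^-d H(z) in increasing powers of z^-1.
    char_poly = np.zeros(plant_delay + len(b))
    char_poly[0] = 1.0
    char_poly[plant_delay:] += b
    # Closed-loop poles are the roots of this polynomial (in z).
    poles = np.roots(char_poly)
    stable = bool(np.all(np.abs(poles) < 1.0))
    # Evaluate 1 + L(e^{jw}) on a grid from 0 to pi (Nyquist).
    w = np.linspace(0.0, np.pi, num_freqs)
    k = np.arange(len(char_poly))
    response = np.exp(-1j * np.outer(w, k)) @ char_poly
    attenuation_db = 20.0 * np.log10(np.abs(response))
    return stable, w, attenuation_db

stable, w, att = closed_loop_report([0.5], plant_delay=1)
print(stable, round(att[0], 2), round(att[-1], 2))
```

With a single tap of 0.5 and a one-sample delay, the loop is stable and attenuates noise by about 3.5 dB at DC, but amplifies it by about 6 dB at Nyquist; raising the tap to 2.0 moves the pole outside the unit circle. Attenuation gained in one band is paid for elsewhere, which is why the filter design is a constrained optimization.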
This work was carried out at IRCAM (Institut de Recherche et Coordination Acoustique/Musique) in collaboration with the acoustics laboratory of ENST, under the supervision of Messrs. Chaigne and Rodet. The general framework of the study is additive musical synthesis. Starting from a recording of a real musical instrument (piano, timpani, ...), the aim is to recover the full set of additive synthesis parameters and their evolution over time, so as to obtain the best possible synthesis of the original signal. For percussive musical instruments, the physical model of sound production (vibration of strings, bars, or plates) suggests using Prony's method to analyze the sinusoidal components that make up the original signal. The principle of this method is to model the signal as a sum of damped sinusoids whose parameters are to be determined. Applying it to instrumental signals raises a number of specific problems that must be solved: frequency beating (for example in piano and gong sounds), and problems related to the large number of sinusoids and to the damped components that are very audible in the attack of the signal. The method proves very effective for the class of percussive sounds, and yields very precise tracking of the evolution of frequencies and amplitudes over time.
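The damped-sinusoid model at the core of Prony's method can be sketched in a few lines. This is a minimal textbook least-squares variant, assuming a noise-free signal and a known model order; it is not the analysis system developed in the thesis, and the function name `prony_damped_sinusoids` is hypothetical.

```python
import numpy as np

def prony_damped_sinusoids(x, order):
    """Fit x[n] ~ sum_i amps[i] * poles[i]**n (textbook least-squares Prony).

    Returns per-component frequency (cycles/sample), damping (log-magnitude
    per sample, negative for decay), and complex amplitude.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Step 1: linear prediction, x[m] + a1*x[m-1] + ... + ap*x[m-p] = 0.
    rows = np.array([x[m - order:m][::-1] for m in range(order, n)])
    a, *_ = np.linalg.lstsq(rows, -x[order:], rcond=None)
    # Step 2: the poles are the roots of z^p + a1*z^(p-1) + ... + ap.
    poles = np.roots(np.concatenate(([1.0], a)))
    # Step 3: complex amplitudes from a Vandermonde least-squares fit.
    vander = np.vander(poles, N=n, increasing=True).T  # shape (n, order)
    amps, *_ = np.linalg.lstsq(vander, x.astype(complex), rcond=None)
    freqs = np.angle(poles) / (2.0 * np.pi)
    damps = np.log(np.abs(poles))
    return freqs, damps, amps

# One real damped sinusoid corresponds to a conjugate pole pair (order 2).
n = np.arange(200)
x = np.exp(-0.01 * n) * np.cos(2 * np.pi * 0.1 * n)
freqs, damps, amps = prony_damped_sinusoids(x, order=2)
print(freqs, damps, np.abs(amps))
```

Each real damped sinusoid needs two complex poles, so the model order is twice the number of partials. With noisy recordings or closely spaced (beating) partials, the plain least-squares fit above becomes fragile, which is the kind of difficulty the abstract refers to.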
Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA '99), Feb 1, 1999
The phase vocoder is usually presented as a high-quality solution for time-scale modification of signals. Its main advantages over the cheaper time-domain techniques include the high quality of the output for a wide range of input signals (speech, music, noise), and the ability to perform very large factor modifications (e.g., four-fold time-stretching or more). In this paper, we present two applications that require such extreme modification factors. We call the first one pitch-preserving audio scrubbing, in which a user can move a pointer along an audio track and hear the sound at the corresponding location without any pitch alteration. Because the user controls the playback location (and therefore the playback speed), and can very well stop at a given location, the required time-scale modification can involve a very large factor. The second application consists of synchronizing an audio stream to a video stream while avoiding pitch alteration. For extreme slow-motion playback, the time-scaling operation required to preserve the pitch can also involve a very large factor. We address theoretical and practical issues related to pitch-preserving synchronization of an audio track. Techniques are discussed to allow freezing time in the phase vocoder and to avoid problems associated with very large factor modifications.
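The idea of letting the playback location drive a phase vocoder, including holding it still, can be sketched as follows. This is a basic phase vocoder under stated assumptions, not the techniques of the paper: each output frame takes its magnitude from a user-chosen analysis position and advances its phase by the phase difference between two analysis frames one hop apart at that position. The function name `pv_resynthesize` and the `positions` parameter are hypothetical.

```python
import numpy as np

def pv_resynthesize(x, positions, n_fft=1024, hop=256):
    """Resynthesize one output frame per entry of `positions` (sample indices
    into x). Repeating the same position freezes time without changing pitch.
    Each position must satisfy pos + hop + n_fft <= len(x).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    out = np.zeros(len(positions) * hop + n_fft)
    norm = np.zeros_like(out)
    phase = None
    for t, pos in enumerate(positions):
        pos = int(pos)
        f0 = np.fft.rfft(window * x[pos:pos + n_fft])
        f1 = np.fft.rfft(window * x[pos + hop:pos + hop + n_fft])
        if phase is None:
            phase = np.angle(f0)  # start from the analysis phase
        frame = np.fft.irfft(np.abs(f0) * np.exp(1j * phase), n=n_fft)
        out[t * hop:t * hop + n_fft] += window * frame
        norm[t * hop:t * hop + n_fft] += window ** 2
        # Phase advance over one synthesis hop, measured at this position.
        phase = phase + np.angle(f1) - np.angle(f0)
    return out / np.maximum(norm, 1e-8)

fs = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
frozen = pv_resynthesize(tone, [2000] * 40)  # hold the pointer at sample 2000
```

Because the phase increment is measured over the same hop used for synthesis, it stays meaningful even when the analysis position does not move, so a frozen pointer yields a sustained sound at the original pitch. Slowly increasing positions give slow-motion playback in the same way.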
Acoustics, Speech, and Signal Processing, 1988. ICASSP-88., 1988 International Conference on
We consider the problem of segmenting an audio signal into characteristic regions based on feature-set similarities. In the proposed approach, a feature-space representation of the signal is generated; sequences of these feature-space samples are then aggregated into clusters corresponding to distinct signal regions. The algorithm consists of using linear discriminant analysis (LDA) to condition the feature space and dynamic programming (DP) to identify data clusters. We consider the design of the dynamic program cost functions; we are able to derive effective cost functions without relying on significant prior information about the structure of the expected data clusters. We demonstrate the application of the LDA-DP segmentation algorithm to speech/music discrimination. Experimental results are given and discussed.
International Conference on Acoustics, Speech, and Signal Processing, 1989
A theoretical aspect of Prony's algorithm is presented, and the choice of analysis parameters and the achieved spectral resolution are briefly discussed. The analysis system itself is described, including frequency detection, component matching, and amplitude estimation. The problems connected with the analysis of complex signals are explored and some solutions proposed and developed. Three examples of musical sound analysis are
The International Series in Engineering and Computer Science, 2002
7 TIME AND PITCH SCALE MODIFICATION OF AUDIO SIGNALS Jean Laroche ... Data compression: Time-scale modification has also been studied for the purpose of data compression for communications or storage [Makhoul and El-Jaroudi, 1986]. ...
Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 1993
This paper deals with source-filter models of percussive instruments. A 'multi-channel excitation/filter model' is presented in which a single excitation is used to generate several sounds, for example six piano tones belonging to the same octave. Techniques for estimating the model parameters are presented and applied to the sound of a real piano. Our experiments demonstrate that it is possible to calculate a single excitation signal which, when fed into different filters, generates very accurate synthetic tones. Finally, a low-cost synthesis method is proposed that can be used to generate natural-sounding percussive tones.
The phase-vocoder is a well-known tool for the frequency domain processing of speech or audio signals, with applications such as time compression or expansion, pitch-scale modification, noise reduction, etc. In the context of time-scale or pitch-scale modification, the phase-vocoder is usually considered to yield high quality results, especially when large modification factors are used on polyphonic or non-pitched signals. However, the phase-vocoder is also known for an artifact that plagues its output, and has been described in the literature as either “phasiness”, “reverberation”, or “loss of presence”. Research has been devoted to understanding and reducing this artifact, and solutions have been proposed which either significantly improve the quality of the output at the cost of a very high additional computation time, or are inexpensive but only marginally effective. This paper examines the problem of phasiness in the context of time-scale modification of signals, and presents t...
The phase-vocoder is a well-established tool for time-scaling and pitch shifting speech and audio signals. Its theory is now well understood and improvements have been proposed to reduce artifacts commonly encountered when time-expanding signals by large factors [1, 2, 3, 4]. In the literature, the phase-vocoder has been described primarily as a tool for time-scaling rather than pitch shifting, the latter usually being achieved by a combination of time-scaling and sampling rate conversion [5, 6]. This article focuses mainly on pitch-scale modification of speech and audio signals, and discusses the drawbacks of the standard time-scale/resampling technique. Two alternative techniques are presented that significantly reduce the complexity and computational cost, while offering dramatically extended capabilities. In particular, the new techniques, which operate solely in the frequency domain, enable chorusing, harmonizing and non-standard frequency modifications such as partial-stret...
Papers by Jean Laroche