This paper presents a method for automatic music transcription applied to audio recordings of a cappella performances with multiple singers. We propose a system for multi-pitch detection and voice assignment that integrates an acoustic model and a music language model. The acoustic model performs spectrogram decomposition, extending probabilistic latent component analysis (PLCA) using a six-dimensional dictionary with pre-extracted log-spectral templates. The music language model performs voice separation and assignment using hidden Markov models that apply musicological assumptions. By integrating the two models, the system is able to detect multiple concurrent pitches in polyphonic vocal music and assign each detected pitch to a specific voice type such as soprano, alto, tenor or bass (SATB). We compare our system against multiple baselines, achieving state-of-the-art results for both multi-pitch detection and voice assignment on a dataset of Bach chorales and another of barbershop quartets. We also present an additional evaluation of our system using varied pitch tolerance levels to investigate its performance at 20-cent pitch resolution.
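As a small illustration of what a pitch tolerance in cents means, the sketch below converts a frequency ratio to cents (1200 cents per octave) and checks whether an estimated pitch falls within a given tolerance of a reference. The function names and the 20-cent default are assumptions made for illustration; this is not the evaluation code used in the paper.

```python
import math


def cents_between(f_est_hz: float, f_ref_hz: float) -> float:
    """Signed distance from the reference to the estimate, in cents."""
    if f_est_hz <= 0 or f_ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    # One octave (a frequency ratio of 2) corresponds to 1200 cents.
    return 1200.0 * math.log2(f_est_hz / f_ref_hz)


def within_tolerance(f_est_hz: float, f_ref_hz: float, tol_cents: float = 20.0) -> bool:
    """True if the estimate lies within +/- tol_cents of the reference."""
    return abs(cents_between(f_est_hz, f_ref_hz)) <= tol_cents
```

Under this check, an estimate of 442 Hz against a 440 Hz reference (roughly 8 cents sharp) would count as correct at a 20-cent tolerance, while 446 Hz (roughly 23 cents sharp) would not.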
Music is a complex multimodal medium experienced not only via sounds but also through body movement. Musical instruments can be seen as technological objects coupled with a repertoire of gestures. We present technical and conceptual issues related to the digital representation and mediation of body movement in musical performance. The paper reports on a case study of a musical performance where motion sensor technologies tracked the movements of the musicians while they played their instruments. Motion data were used to control the electronic elements of the piece in real time. It is suggested that computable motion descriptors and machine learning techniques are useful tools for interpreting motion data in a meaningful manner. However, qualitative insights regarding how human body movement is understood and experienced are necessary to inform further development of motion-capture technologies for expressive purposes. Thus, musical performances provide an effective test bed for new modalities of human–computer interaction.
This paper describes the implementation of gestural mapping strategies for performance with a traditional musical instrument and electronics. The approach adopted is informed by embodied music cognition and functional categories of musical gestures. Within this framework, gestures are not seen as means of control subordinated to the resulting musical sounds but rather as significant elements contributing to the formation of musical meaning similarly to auditory features. Moreover, the ecological knowledge of the gestural repertoire of the instrument is taken into account as it defines the action-sound relationships between the instrument and the performer and contributes to form expectations in the listeners. Subsequently, mapping strategies from a case study of electric guitar performance will be illustrated describing what motivated the choice of a multimodal motion capture system and how different solutions have been adopted considering both gestural meaning formation and technical constraints.
This paper presents a note-by-note approach for automatic solfège assessment. The proposed system uses melodic transcription techniques to extract the sung notes from the audio signal, and the sequence of melodic segments is subsequently processed by a two-stage algorithm. In the first stage, an aggregation process is introduced to perform the temporal alignment between the transcribed melody and the music score (ground truth). This stage implicitly aggregates and links the best combination of the extracted melodic segments with the expected note in the ground truth. In the second stage, a statistical method is used to evaluate the accuracy of each detected sung note. The technique is implemented using a Bayesian classifier, which is trained using an audio dataset containing individual scores provided by a committee of expert listeners. These individual scores were measured at each musical note, regarding the pitch, onset, and offset accuracy. Experimental results indicate that the classification scheme is suitable to be used as an assessment tool, providing useful feedback to the student.
This work describes a new approach to gesture mapping in a performance with a traditional musical instrument and live electronics inspired by theories of embodied music cognition (EMC) and musical gestures. Considerations on EMC and how gestures affect the experience of music inform different mapping strategies. Our intent is to enhance the expressiveness and the liveness of performance by tracking gestures via a multimodal motion capture system and to use motion data to control several features of the music. We then describe an application of this approach to a performance with electric guitar and live electronics, focusing both on aspects of meaning formation and motion capturing.
Depending on the prosodic choices of the reader, using various acoustic parameters to apply emphasis, the meaning of the text may change. In songs, where the text and music work together, the prosody ensures coherence between the intentions of both languages, reducing possible ambiguities. This paper presents a generative system for automatic composition of melodies from lyrics using the prosody of the Portuguese language. This approach is divided into two principal stages. First, the prosody information is extracted from the reading of the lyrics and the captured audio is aligned with the text. The inflection lines from the reader's expressive intonation, together with the temporal alignment, are used to generate a probabilistic set of note transitions. A chain of constraints is applied sequentially in order to define musical scale, harmonic field, prosody and stylistic properties. The second stage focuses on the rhythmical structure. A set of rules based on the Portuguese prosody is used to generate compatible rhythm structures along the text. These two stages generate songs ensuring the correct prosody, which is the main goal of the system. Tests were performed to evaluate the capability of the algorithm to generate songs without prosody errors, and to assess how pleasant the machine-generated songs are in comparison with songs written by human composers using the same set of texts. The results showed that the proposed system can create pleasant songs while keeping a low number of prosody errors.
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014
This paper explores a simple yet effective way to generate temporally coherent disparity maps from binocular video sequences based on kinematic constraints. Given the disparity map at a certain frame, the proposed approach computes the set of possible disparity values for each pixel in the subsequent frame, assuming a maximum displacement constraint (in world coordinates) allowed for each object. These disparity sets are then used to guide the stereo matching procedure in the subsequent frame, generating a temporally coherent disparity map. Experimental results indicate that the proposed approach produces temporally coherent disparity maps comparable to or better than competitive methods.
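One simplified way to read the kinematic constraint: in a rectified stereo pair, disparity and depth are related by d = f·B / Z, so a bound on how far a point can move along the depth axis between two frames gives a bound on its next disparity. The sketch below makes that single-pixel, depth-only assumption and ignores lateral motion; the function name and parameters are hypothetical and do not come from the paper.

```python
import math


def disparity_range(d_prev: float, focal_px: float, baseline_m: float,
                    max_disp_m: float) -> tuple[float, float]:
    """Bounds on the next-frame disparity of a point seen at disparity d_prev.

    Assumes a rectified pair (d = focal_px * baseline_m / Z) and that the
    point moves at most max_disp_m along the depth axis between frames.
    """
    if d_prev <= 0:
        raise ValueError("previous disparity must be positive")
    fb = focal_px * baseline_m
    z_prev = fb / d_prev              # depth implied by the previous disparity
    z_far = z_prev + max_disp_m       # farthest admissible depth
    z_near = z_prev - max_disp_m      # nearest admissible depth
    d_min = fb / z_far
    # If the point could reach the camera plane, disparity is unbounded above.
    d_max = fb / z_near if z_near > 0 else math.inf
    return d_min, d_max
```

A matcher could then restrict its search for that pixel to disparities inside the returned interval, which is one plausible way such sets might guide the stereo matching step.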
A machine system is designed to analyze the musical aspects during the live performance, allowing an interactive and dynamic flow of new expressions and also opening new compositional forms and multimodal methods.
The focus of this approach is to measure the expressiveness from distinct characters during the performance of the musical piece while decisions are made by the machine.
This multimodal approach is motivated by the Three Chamber Micro Songs composition, where measures of loudness and body motion are used by the algorithm to choose a particular musical end.
Body movement and embodied knowledge play an important part in how we express and understand music. The gestures of a musician playing an instrument are part of a shared knowledge that contributes to musical expressivity by building expectations and influencing perception. In this study, we investigate the extent to which the movement vocabulary of violin performance is part of the embodied knowledge of individuals with no experience in playing the instrument. We asked people who cannot play the violin to mime a performance along an audio excerpt recorded by an expert. They do so by using a silent violin, specifically modified to be more accessible to neophytes. Preliminary motion data analyses suggest that, despite the individuality of each performance, there is a certain consistency among participants in terms of overall rhythmic resonance with the music and movement in response to melodic phrasing. Individualities and commonalities are then analysed using Functional Principal Component Analysis.
Proceedings of the 2014 International Workshop on Movement and Computing - MOCO '14, 2014
This paper describes the implementation of gestural mapping strategies for performance with a traditional musical instrument and electronics. The approach adopted is informed by embodied music cognition and functional categories of musical gestures. Within this framework, gestures are not seen as means of control subordinated to the resulting musical sounds but rather as significant elements contributing to the formation of musical meaning similarly to auditory features. Moreover, the ecological knowledge of the gestural repertoire of the instrument is taken into account as it defines the action-sound relationships between the instrument and the performer and contributes to form expectations in the listeners. Subsequently, mapping strategies from a case study of electric guitar performance will be illustrated describing what motivated the choice of a multimodal motion capture system and how different solutions have been adopted considering both gestural meaning formation and technical constraints.
This work describes a new approach to gesture mapping in a performance with a traditional musical instrument and live electronics inspired by theories of embodied music cognition (EMC) and musical gestures. Considerations on EMC and how gestures affect the experience of music inform different mapping strategies. Our intent is to enhance the expressiveness and the liveness of performance by tracking gestures via a multimodal motion capture system and to use motion data to control several features of the music. We then describe an application of this approach to a performance with electric guitar and live electronics, focusing both on aspects of meaning formation and motion capturing.
In Proceedings of 13th International Conference on Systems, Signals and Image Processing (2006), 2006
This paper proposes a new technique for parallelogram detection using the Tiled Hough Transform. Initially, the edge image is partitioned into rectangular regions (tiles), and the Hough Transform is computed for each tile. Peaks of the Hough image are extracted, and a parallelogram is detected when four extracted peaks satisfy certain geometric conditions. Then, adjacent tiles are grouped
An important problem in image processing is edge detection. This problem is particularly difficult for noisy images containing low-contrast edges, because noise typically introduces false edges. In this paper, a new pre-processing technique for simultaneous image denoising and edge enhancement based on wavelets is proposed. Some experimental results indicate that the proposed method improves the performance of existing edge detection techniques.
Computerized systems for e-learning and entertainment have been created in different areas of knowledge. This work presents a system designed to track and evaluate hand movements for conducting musical measures (binary, ternary and quaternary metrics), identifying for the user the maintenance of tempo and the correctness of hand patterns. The main goal of this work is to aid the study of musical rhythm for beginners, not focusing on conducting styles. The system was developed with computer vision algorithms to detect movements of the hand, and a finite-state machine was used to recognize the patterns. Feedback for the user is given through visual information on the screen. The accuracy of the results is verified by an external observer (a conducting specialist), with satisfactory results.
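To give a rough idea of how a finite-state machine can recognize a conducting pattern, the sketch below treats each metric as a fixed cycle of hand-stroke directions and advances one state per matching stroke. The direction labels, the stroke sequences, the class name, and the error-recovery rule are all assumptions chosen for illustration; the abstract does not specify the states or transitions of the actual system.

```python
# Hypothetical stroke cycles per metric (illustrative, not from the paper).
PATTERNS = {
    "binary": ("down", "up"),
    "ternary": ("down", "right", "up"),
    "quaternary": ("down", "left", "right", "up"),
}


class BeatPatternFSM:
    """Tracks progress through one metric's stroke cycle."""

    def __init__(self, metric: str) -> None:
        self.pattern = PATTERNS[metric]
        self.state = 0       # index of the stroke expected next
        self.completed = 0   # number of full measures recognized
        self.errors = 0      # number of unexpected strokes

    def step(self, direction: str) -> bool:
        """Consume one detected stroke; return True if it was the expected one."""
        if direction == self.pattern[self.state]:
            self.state = (self.state + 1) % len(self.pattern)
            if self.state == 0:
                self.completed += 1
            return True
        self.errors += 1
        # Recovery rule (an assumption): a stray downbeat restarts the measure,
        # anything else sends the machine back to waiting for a downbeat.
        self.state = 1 if direction == self.pattern[0] else 0
        return False
```

Feeding the ternary machine the strokes down, right, up, down, up would count one completed measure and one error, which is the kind of signal that could drive on-screen feedback about pattern correctness.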
Papers by Rodrigo Schramm