
    Antonio Bonafonte

    In this paper, an extension of n-grams is proposed. In this extension, the memory of the model (n) is not fixed a priori. Instead, large memories are first accepted, and merging criteria are then applied to reduce complexity and to ensure reliable estimates. The results show that the perplexity obtained with x-grams is lower than that of n-grams. Furthermore, the
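    Perplexity, the figure of merit in this comparison, can be illustrated with a toy example. The sketch below is a hypothetical illustration only: it compares a fixed-memory bigram model with add-one smoothing against a memoryless uniform model on a made-up corpus. It does not implement x-grams or the paper's merging criteria, and the names `train_bigram` and `perplexity` are introduced here for illustration.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Add-one (Laplace) smoothed bigram model estimated from one token sequence.
    Returns a function prob(prev, w) = P(w | prev)."""
    vocab_size = len(set(tokens))
    history_counts = Counter(tokens[:-1])                 # c(prev) as a history
    pair_counts = Counter(zip(tokens[:-1], tokens[1:]))   # c(prev, w)

    def prob(prev, w):
        return (pair_counts[(prev, w)] + 1) / (history_counts[prev] + vocab_size)
    return prob

def perplexity(prob, tokens):
    """PP = exp(-(1/N) * sum_i log P(w_i | w_{i-1})), N = number of predicted tokens."""
    pairs = list(zip(tokens[:-1], tokens[1:]))
    log_sum = sum(math.log(prob(prev, w)) for prev, w in pairs)
    return math.exp(-log_sum / len(pairs))

# Hypothetical toy corpus with sentence-boundary markers.
corpus = "<s> a b a b a b </s>".split()
bigram = train_bigram(corpus)
uniform = lambda prev, w: 1 / len(set(corpus))  # memoryless baseline

print(perplexity(bigram, corpus), perplexity(uniform, corpus))
```

    A model whose memory captures real regularities assigns higher probability to the observed tokens and therefore obtains lower perplexity; the uniform model's perplexity equals the vocabulary size. Evaluating on the training text, as done here for brevity, is optimistic; real comparisons use held-out data.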
    Towards Robust Glottal Source Modeling. Javier Pérez, Antonio Bonafonte. Department of Signal Theory and Communication, TALP Research Center, Technical University of Catalonia (UPC), Barcelona, Spain. {javierp,antonio}@gps.tsc.upc.edu ...
    Applying a recently presented text-independent speech alignment technique based on unit selection to the training of a voice conversion system suggested that the more training data was available, the less speaker-specific information was learned. This paradoxical effect contradicts experience from other corpus-based applications such as speech recognition or synthesis, where performance usually improves as the amount of data increases.
    There are many exhaustive studies on models of segmental duration. The aim of this paper is to evaluate some of the properties reported in the literature and to compare factorial and sum-of-products models against a list-like approach for Catalan, as a basis for a more exhaustive study of duration in this language. Sum-of-products
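    Sum-of-products duration models are commonly written, following van Santen, as DUR(f) = sum over terms i of the product over factors j in term i of S_ij(f_j): a purely multiplicative model is the special case with a single product term, and an additive model is the case where every term has one factor. The sketch below evaluates that general form; the feature names and all scale values are hypothetical and not estimated from any Catalan data.

```python
from math import prod

# One product term maps feature name -> {feature level: scale}.
Term = dict[str, dict[str, float]]

def sum_of_products(features: dict[str, str], terms: list[Term]) -> float:
    """Predicted duration = sum over terms of the product of that term's factor scales."""
    return sum(
        prod(table[features[name]] for name, table in term.items())
        for term in terms
    )

# Hypothetical factor tables (illustrative values only).
phone = {"a": 80.0, "s": 60.0}            # intrinsic duration in ms
stress_mult = {"stressed": 1.2, "unstressed": 1.0}
final_mult = {"final": 1.5, "medial": 1.0}
stress_add = {"stressed": 15.0, "unstressed": 0.0}   # ms
final_add = {"final": 30.0, "medial": 0.0}           # ms

multiplicative = [{"phone": phone, "stress": stress_mult, "position": final_mult}]
additive = [{"phone": phone}, {"stress": stress_add}, {"position": final_add}]
mixed = [{"phone": phone, "stress": stress_mult}, {"position": final_add}]

features = {"phone": "a", "stress": "stressed", "position": "final"}
for name, model in [("multiplicative", multiplicative), ("additive", additive), ("mixed", mixed)]:
    print(name, sum_of_products(features, model))
```

    Fitting such a model means estimating the scale tables from observed durations; the structure of the terms (which factors multiply and which add) is the modeling choice that distinguishes the variants.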
    Abstract: This article presents two new systems for segmenting speech into phonemes. The first is based on acoustic clustering prior to an alignment by dynamic programming, and the second on a specific correction of the boundaries by means of a ...
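    Alignment by dynamic programming, as used in the first system, can be sketched as a Viterbi forced alignment in which each phone of the known transcription occupies one or more consecutive frames. This is a generic sketch, not either of the paper's systems: it assumes per-frame log-likelihoods are already available, and the `loglik` matrix and the `align` function are hypothetical.

```python
import math

def align(loglik):
    """Viterbi forced alignment with one state per phone.
    loglik[t][k]: log-likelihood of frame t under the k-th phone of the transcription.
    Returns the phone index of each frame (monotonic, every phone >= 1 frame)."""
    T, K = len(loglik), len(loglik[0])
    if T < K:
        raise ValueError("need at least one frame per phone")
    NEG = -math.inf
    score = [[NEG] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    score[0][0] = loglik[0][0]            # the first frame must belong to the first phone
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]                       # remain in phone k
            move = score[t - 1][k - 1] if k > 0 else NEG  # enter phone k from k-1
            if stay >= move:
                score[t][k], back[t][k] = stay + loglik[t][k], k
            else:
                score[t][k], back[t][k] = move + loglik[t][k], k - 1
    # Backtrack from the last phone at the last frame.
    k = K - 1
    path = [k]
    for t in range(T - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    path.reverse()
    return path

def boundaries(path):
    """Frame indices where a new phone starts (excluding frame 0)."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]

# Hypothetical scores: 6 frames, 3 phones.
scores = [[0, -5, -5], [0, -5, -5], [-5, 0, -5],
          [-5, 0, -5], [-5, 0, -5], [-5, -5, 0]]
path = align(scores)
print(path, boundaries(path))
```

    The boundary-correction idea of the second system would then operate on the output of `boundaries`, for example by re-scoring a small window around each boundary; how the paper does this is not stated in the excerpt.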
    Unit-selection techniques represent the state of the art in speech synthesis. Building new voices requires automatic segmentation of the speech databases. The databases may already contain errors, and the segmentation process may introduce more. High-quality systems require a significant effort to find and correct these segmentation errors. Phonetic transcription is crucial and is one of the manually supervised
    Hidden Markov modeling (HMM) techniques have been applied successfully to speech recognition. However, it has been claimed [1]-[5] that a major weakness of HMMs is that their state duration probability density functions (SDPDF) are exponential, which is not appropriate for speech signals. To cope with this deficiency, some authors have proposed modeling the state duration explicitly.
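    The claim about exponential durations follows from a standard derivation. Assuming a discrete-time HMM in which state i has self-transition probability a_ii (notation introduced here, not taken from the paper), the probability of remaining in state i for exactly d frames is geometric, the discrete counterpart of the exponential:

```latex
p_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots
\qquad\qquad
\mathbb{E}[d] = \sum_{d \ge 1} d\, p_i(d) = \frac{1}{1 - a_{ii}}
```

    Since p_i(d+1)/p_i(d) = a_ii < 1, the distribution decreases monotonically and its most likely value is always d = 1, whereas observed phone durations typically have a mode well above one frame.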
    Abstract: Much of the research on voice morphing, also called voice conversion (VC), has been carried out in the field of vocal tract mapping. Studies have shown that the vocal tract parameters carry the most relevant part of the information about speaker ...
    ... The selected context-dependent phones are those mphones and right-context-dependent phones that appear more than 100 times in the acoustic training data. ... 4. Evaluation of SETHOS [flattened table, column labels not recoverable; values: 91.3 / 75.8, 91.9 / 77.5, 91.3 / 76.0] ...
    In this paper, the occupancy of the HMM states is modeled by means of a Markov chain. A linear estimator is introduced to compute the probabilities of the Markov chain. The distribution functions (DF) accurately represent the observed data. Representing the DF as a Markov chain allows the use of standard HMM recognizers. The increase in complexity is negligible in
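    Why a Markov chain can represent a non-exponential duration distribution can be shown by propagating occupancy probabilities frame by frame through a left-to-right chain of sub-states, each with its own self-loop. This sketch only computes the duration distribution implied by given self-loop probabilities; the paper's linear estimator for those probabilities is not reproduced, and the function name `chain_duration_pmf` is hypothetical.

```python
def chain_duration_pmf(self_loops, max_d):
    """Duration distribution of a left-to-right chain of sub-states.
    self_loops[i]: probability of staying in sub-state i for one more frame.
    Returns pmf where pmf[d - 1] = P(total duration == d) for d = 1..max_d."""
    n = len(self_loops)
    occ = [0.0] * n
    occ[0] = 1.0  # during the first frame we are in sub-state 0
    pmf = []
    for _ in range(max_d):
        # Leaving the chain after this frame = being in the last sub-state and not looping.
        pmf.append(occ[-1] * (1.0 - self_loops[-1]))
        nxt = [0.0] * n
        for i in range(n):
            nxt[i] += occ[i] * self_loops[i]                 # stay in sub-state i
            if i + 1 < n:
                nxt[i + 1] += occ[i] * (1.0 - self_loops[i])  # advance to sub-state i+1
        occ = nxt
    return pmf

# One sub-state: geometric. Two sub-states: P(1) = 0, so the mode moves away from d = 1.
print([round(p, 3) for p in chain_duration_pmf([0.6], 6)])
print([round(p, 3) for p in chain_duration_pmf([0.6, 0.6], 6)])
```

    With n sub-states, P(d) = 0 for d < n, so the mode can no longer sit at d = 1. Because the chain is itself made of ordinary HMM states and transitions, it can replace a single state without changing the recognizer's decoding algorithm.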
    Synthesis quality is influenced by many factors, among which the correctness of the grapheme-to-phoneme (g2p) conversion is one of the most crucial. Automatic letter-to-sound systems have been at the center of attention for the last decade. One of the most effective and promising methods has proved to be the so-called "pronunciation by analogy" method, which relies on analogy in the grapheme context to derive the correct pronunciation of a new word from parts of similar words present in the dictionary. This paper aims to develop this method further. Novel scoring strategies for determining the best pronunciations are proposed, yielding a word error rate reduction of 1.5-2.5 percent. A detailed analysis shows that one of the new strategies consistently outperforms the others. The results are compared with other g2p methods on the same data.
    This paper presents the baseline text-to-speech system developed at UPC (Ogmios), together with our recent work on speech prosody generation and the procedures for creating high-quality language resources for speech synthesis. These contributions have been evaluated within the TC-STAR European project, which focuses on speech-to-speech translation. Several of the presented contributions have been developed in order to adapt the TTS component
    ... of IEEE Conf. on Computer Vision, Puerto Rico, 1997. [9] C. Padgett, G. Cottrell, Identifying emotion in static face images, in Proc. of the 2nd Joint Symp. on Neural Computation, Vol. 5, pp. 91-101, La Jolla, CA, University of California, San Diego. ...
    ... As training material we have used task-independent, phonetically balanced sentences uttered by 680 speakers from four dialectal zones, containing over 236,000 phonemes. This corpus comprises five and a half hours of continuous speech. ...
    ABSTRACT In the literature, many intonation models are trained using parameters extracted sentence by sentence from contours interpolated over the unvoiced segments. This may introduce a bias in the final parameters and reduce the generalization of the model due to ...
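    The interpolation step referred to here can be sketched as linear interpolation of F0 across unvoiced frames. This assumes unvoiced frames are marked with F0 = 0 and that leading and trailing unvoiced runs copy the nearest voiced value; both conventions, and the function name `interpolate_f0`, are assumptions for illustration, not the paper's procedure.

```python
def interpolate_f0(f0):
    """Fill unvoiced frames (f0 == 0) by linear interpolation between the nearest
    voiced neighbours; leading/trailing unvoiced runs copy the nearest voiced value."""
    voiced = [i for i, v in enumerate(f0) if v > 0]
    out = list(f0)
    if not voiced:
        return out  # nothing to interpolate from
    for i in range(voiced[0]):                      # leading gap
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):        # trailing gap
        out[i] = f0[voiced[-1]]
    for a, b in zip(voiced, voiced[1:]):            # interior gaps
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (1 - w) * f0[a] + w * f0[b]
    return out

# Hypothetical contour in Hz, 0 = unvoiced.
contour = [0, 100, 0, 0, 130, 0]
print(interpolate_f0(contour))
```

    Every value written into an unvoiced frame is synthetic rather than measured, so a model fitted to the interpolated contour also fits these invented points, which is the source of the bias the abstract describes.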
