This paper introduces “Paralingua”, a new speech corpus created within a larger ongoing project whose primary aim was to develop a speaker recognition and identification system for forensics. The corpus was designed for the analysis of selected paralinguistic features in continuous speech and for preliminary examination of the vocal display of affective states. The recorded and annotated data include conversational speech in the form of task-oriented dialogues, emotional utterances (realized as emotion portrayals), and an acted court scene. As reference material, each speaker also read a short text.
The Bonn Open Synthesis System (BOSS) is open-source software for unit selection speech synthesis that has been used for the generation of high-quality German and Dutch speech. This article presents ongoing research and development aimed at adapting BOSS to the Polish language. In the first section, the origins and workings of the unit selection method for speech synthesis are explained. Section two details the structure of the Polish corpus and its segmental and prosodic annotation. The next section focuses on the implementation of Polish TTS modules in the BOSS architecture (duration prediction and cost function) and the steps involved in preparing a new speech corpus for BOSS.
This paper reports on work directed at creating a framework for the investigation of affective states and paralinguistic features in speech and their role in describing individual features of human voices. The work was carried out within a research and development project whose primary aim was to develop a speaker recognition and identification system, mainly for forensic applications. The paper describes the methods and preliminary results of an examination of the choice of lexical means, vocal communication of affective states, and voice quality features using the "Paralingua" corpus, and introduces "Annotation System", a novel tool designed specifically for the annotation of paralinguistic features.
This paper proposes a framework of F0 contour generation and segmental duration modeling for application in BOSS, a unit-selection speech synthesis system for Polish. We describe the design of the F0 and duration modeling modules and emphasize the role of prosodic features (related to stress, pitch accent and phrase) in these two tasks.
This paper presents an overview of the Polish Speech Database for taking dictation of legal texts, created for the purpose of an LVCSR system for Polish within the framework of the Polish Platform for Homeland Security (PPBW). Basic information about the design of the database is provided as well as the applied method of the text corpora construction and
The paper provides an overview of the Polish Speech Database for taking dictation of legal texts, created for the purpose of an LVCSR system for Polish. It presents background information about the design of the database and the requirements coming from its future uses. The applied method of the text corpora construction is presented as well as
This paper presents data on the most common syllable patterns in Polish based on corpora of approximately 40 minutes of read speech as well as on a word list of nearly 700 000 items. First, the results of statistical analysis concerning the frequency of occurrence for the possible syllable patterns in Polish are described. Then, chosen problems connected with segmental
This paper presents the results of acoustic modeling used in a Large Vocabulary Continuous Speech Recognition (LVCSR) system designed with the use of a phonetically controlled large vocabulary corpus. Evaluation experiments showed that relatively good speech recognition results may be obtained with adequate training material, taking into account: a) the presence of lexical stress; b) speech styles (a variety of segmental and prosodic structures, varying degrees of spontaneity of speech, various pronunciation variants and dialects); c) the influence of the sound level and environmental noise. Moreover, the article includes information about the speech corpus structure and an outline of the design of the speech recognition system.
This paper presents the results of the pilot survey of the acoustic models obtained from the Polish Speech Database for taking dictation of legal texts, created for the needs of the first LVCSR system for Polish (JURISDIC). Additionally, background information about the design of the database is presented along with a description of the applied methods of corpus construction and current statistics of the database contents.
The study presents experimental data on Polish vowel durations in consonantal contexts gathered to test prosodic hypotheses. An attempt is made to verify the significance of the process of balancing V-to-V durations in a dynamical model of speech rhythm applied to Polish. We report on the results of a controlled experiment followed by a query of a
Bulletin of the Polish Academy of Sciences: Technical Sciences, 2000
The Bonn Open Synthesis System (BOSS) is open-source software for unit selection speech synthesis that has been used for the generation of high-quality German and Dutch speech. This article presents ongoing research and development aimed at adapting BOSS to the Polish language. In the first section, the origins and workings of the unit selection method for speech synthesis are explained. Section two details the structure of the Polish corpus and its segmental and prosodic annotation. The subsequent sections focus on the implementation of Polish TTS modules in the BOSS architecture (duration prediction and cost function) and the steps involved in preparing a new speech corpus for BOSS.
In this paper, some measures of cross-modal interactions are proposed and implemented in the analysis of a multimodal corpus of task-oriented dialogues. The corpus includes multilevel annotations of speakers' verbal and gestural behaviour, e.g., hand gestures, gaze direction, utterance content or intonational phrasing. A moving time-window approach is adopted to analyse changes in the communicative behaviour of dialogue participants over time. The study is focused on how gestures and speech of the Instruction Giver influence the speech of the Instruction Follower in the course of dialogue.
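As a rough illustration of what a moving time-window analysis over time-aligned annotations can look like, the Python sketch below counts how many annotated events of one type overlap each window. Everything in it is an assumption made for illustration and is not taken from the paper: the `(start, end, label)` annotation format, the function name `windowed_counts`, and the 10 s window with a 5 s step.

```python
from typing import List, Tuple

# Hypothetical annotation format (an assumption): (start_s, end_s, label)
Annotation = Tuple[float, float, str]


def windowed_counts(
    annotations: List[Annotation],
    label: str,
    total_duration: float,
    window: float = 10.0,
    step: float = 5.0,
) -> List[Tuple[float, int]]:
    """Return (window_start, count) pairs for a window sliding over the dialogue.

    An annotation is counted in a window if it carries the requested label
    and its time span overlaps the half-open interval [start, start + window).
    """
    counts = []
    start = 0.0
    while start < total_duration:
        end = start + window
        n = sum(
            1
            for (s, e, lab) in annotations
            if lab == label and s < end and e > start
        )
        counts.append((start, n))
        start += step
    return counts


# Toy data: three gestures and one gaze event in a 20-second dialogue.
example = [
    (1.0, 2.0, "gesture"),
    (6.0, 7.0, "gesture"),
    (12.0, 13.0, "gesture"),
    (3.0, 4.0, "gaze"),
]
gesture_counts = windowed_counts(example, "gesture", total_duration=20.0)
```

Running the same function once per annotation tier and per speaker yields parallel time series, which is one simple way to compare, for example, one participant's gesture rate with the other participant's speech activity over the course of a dialogue.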
Prosody of uncertainty
Two experiments were carried out in order to find how global prosodic parameters of Polish utterances influenced their classification by Polish native speakers in the dimension of “certainty.” In Experiment 1, subjects listened to a set of resynthesized and manipulated utterances; in Experiment 2, subjects listened to prime-stimulus pairs where the prime was a manipulated and resynthesized pitch trace of a real utterance while the stimulus was a regular utterance. In Experiment 1, global pitch height, pitch range and speech tempo of the stimulus were manipulated. In Experiment 2, only the pitch range and average pitch height of the prime were manipulated, as primes did not have a segmental structure. The group of subjects consisted of sixteen students, predominantly female. It was shown that utterances produced in a lower voice and at slower speech rates were perceived as showing a higher degree of certainty, while those produced at a higher pitch were mostly categorized as showing a low degree of certainty. In the second experiment, it was observed that the pitch contour of the prime did not significantly influence perception of the stimulus. Most of our results are consistent with those found in the literature for other languages. However, there are also differences that may be attributed to culture-dependent aspects of paralinguistic prosody. Further research is necessary to explore and explain those discrepancies.