Katarzyna Klessa | Adam Mickiewicz University in Poznań - Academia.edu

Katarzyna Klessa

Adam Mickiewicz University in Poznań, Faculty of Modern Languages and Literature, Adjunct

Address: www.katarzyna.klessa.pl

Papers

Klessa, K., Wagner, A., Oleśkowicz-Popiel, M., Karpiński, M. 2013. "Paralingua" - a new speech corpus for the studies of paralinguistic features. Procedia - Social and Behavioral Sciences. Vol. 95. pp. 48-58. Elsevier. ISSN: 1877-0428

by Katarzyna Klessa, Maciej Karpiński, and Magdalena Oleśkowicz-Popiel

This paper introduces “Paralingua” - a new speech corpus created within a larger ongoing project whose primary aim was to develop a speaker recognition and identification system for forensics. The present corpus was designed for the purpose of analysis of selected paralinguistic features in continuous speech and for preliminary examination of the vocal display of affective states. The recorded (and annotated) data include conversational speech in the form of task-oriented dialogues, emotional utterances (realized as emotion portrayals), and an acted court scene. As a reference material, a short read text was provided by each of the speakers.

The design of Polish Speech Corpus for Unit Selection Speech Synthesis

The Bonn Open Synthesis System (BOSS) is open-source software for unit selection speech synthesis that has been used for the generation of high-quality German and Dutch speech. This article presents ongoing research and development aimed at adapting BOSS to the Polish language. In the first section, the origins and workings of the unit selection method for speech synthesis are explained. Section two details the structure of the Polish corpus and its segmental and prosodic annotation. The next section focuses on the implementation of Polish TTS modules in BOSS architecture (duration prediction and cost function) and the steps involved in preparing a new speech corpus for BOSS.

Is laboratory evoked infant directed speech significantly different from adult directed speech?

Using "Paralingua" database for investigation of affective states and paralinguistic features

This paper reports on the work directed at creating a framework for investigation of affective states and paralinguistic features in speech and their role in describing individual features of human voices. The work was carried out within a research-development project whose primary aim was to develop a speaker recognition and identification system mainly for forensic applications. The present paper describes the methods and preliminary results of examination of the choice of lexical means, vocal communication of affective states and voice quality features using the "Paralingua" corpus, and introduces "Annotation System" - a novel tool designed specifically for annotation of paralinguistic features.

F0 contour and segmental duration modeling using prosodic features

This paper proposes a framework of F0 contour generation and segmental duration modeling for application in a unit-selection speech synthesis system for Polish – BOSS. We describe the design of the F0 and duration modeling modules and emphasize the role of prosodic features (related to stress, pitch accent and phrase) in these two tasks.

Structure and annotation of Polish LVCSR speech database

Annual Conference of the International Speech Communication Association, 2009

LVCSR Speech Database - JURISDIC

The paper presents an overview of the Polish Speech Database for taking dictation of legal texts, created for the purpose of an LVCSR system for Polish within the framework of the Polish Platform for Homeland Security (PPBW). Basic information about the design of the database is provided as well as the applied method of the text corpora construction and

JURISDIC: Polish Speech Database for Taking Dictation of Legal Texts

Language Resources and Evaluation, 2008

The paper provides an overview of the Polish Speech Database for taking dictation of legal texts, created for the purpose of LVCSR system for Polish. It presents background information about the design of the database and the requirements coming from its future uses. The applied method of the text corpora construction is presented as well as

A Study of Chosen Temporal Relations within Syllable Structure in Polish (Analiza wybranych związków iloczasowych zachodzących w obrębie sylaby w języku polskim)

This paper presents data on the most common syllable patterns in Polish based on corpora of approximately 40 minutes of read speech as well as on a word list of nearly 700 000 items. First, the results of statistical analysis concerning the frequency of occurrence for the possible syllable patterns in Polish are described. Then, chosen problems connected with segmental

Development of large vocabulary continuous speech recognition using phonetically structured speech corpus

This paper presents the results of acoustic modeling used in a Large Vocabulary Continuous Speech Recognition (LVCSR) system designed with the use of a phonetically controlled large vocabulary corpus. Evaluation experiments showed that relatively good speech recognition results may be obtained with adequate training material, taking into account: a) the presence of lexical stress; b) speech styles (a variety of segmental and prosodic structures, varying degrees of spontaneity of speech, various pronunciation variants and dialects); c) the influence of the sound level and environmental noise. Moreover, the article includes information about the speech corpus structure and an outline of the design of the speech recognition system.

First evaluation of Polish LVCSR acoustic models obtained from the JURISDIC database

This paper presents the results of the pilot survey of the acoustic models obtained from the Polish Speech Database for taking dictation of legal texts, created for the needs of the first LVCSR system for Polish (JURISDIC). Additionally, background information about the design of the database is presented along with the description of the applied methods of the corpus construction and current statistics of the database contents.

A Preliminary Study of Temporal Adaptation in Polish VC Groups

The study presents experimental data on Polish vowel durations in consonantal contexts gathered to test prosodic hypotheses. An attempt is made to verify the significance of the process of balancing V-to-V durations in a dynamical model of speech rhythm applied to Polish. We report on the results of a controlled experiment followed by a query of a

Development and evaluation of Polish speech corpus for unit selection speech synthesis systems

A preliminary study of temporal adaptation in Polish VC groups

Polish unit selection speech synthesis with BOSS: extensions and speech corpora

International Journal of Speech Technology, 2010

Implementation of Polish speech synthesis for the BOSS system

Bulletin of the Polish Academy of Sciences: Technical Sciences, 2000

The Bonn Open Synthesis System (BOSS) is open-source software for unit selection speech synthesis that has been used for the generation of high-quality German and Dutch speech. This article presents ongoing research and development aimed at adapting BOSS to the Polish language. In the first section, the origins and workings of the unit selection method for speech synthesis are explained. Section two details the structure of the Polish corpus and its segmental and prosodic annotation. The subsequent sections focus on the implementation of Polish TTS modules in the BOSS architecture (duration prediction and cost function) and the steps involved in preparing a new speech corpus for BOSS.

Paralingua – A New Speech Corpus for the Studies of Paralinguistic Features

Procedia - Social and Behavioral Sciences, 2013

This paper introduces “Paralingua” - a new speech corpus created within a larger ongoing project whose primary aim was to develop a speaker recognition and identification system for forensics. The present corpus was designed for the purpose of analysis of selected paralinguistic features in continuous speech and for preliminary examination of the vocal display of affective states. The recorded (and annotated) data include conversational speech in the form of task-oriented dialogues, emotional utterances (realized as emotion portrayals), and an acted court scene. As a reference material, a short read text was provided by each of the speakers.

Czoska, A., Klessa, K., Karpiński, M., Jarmołowicz-Nowikow, E. 2015. Prosody and gesture in dialogue: Cross-modal interactions. Proceedings of GESPIN 2015 Conference, Nantes, France, pp. 83-88.

by Agnieszka Czoska, Maciej Karpiński, and Katarzyna Klessa

In this paper, some measures of cross-modal interactions are proposed and implemented in the analysis of a multimodal corpus of task-oriented dialogues. The corpus includes multilevel annotations of speakers' verbal and gestural behaviour, e.g., hand gestures, gaze direction, utterance content or intonational phrasing. A moving time-window approach is adopted to analyse changes in the communicative behaviour of dialogue participants over time. The study is focused on how gestures and speech of the Instruction Giver influence the speech of the Instruction Follower in the course of dialogue.

Karpiński, M., Klessa, K. 2015. Prozodia niepewności. (Prosody of uncertainty) [in:] Sens i brzmienie. Warszawa: Wydawnictwo UKSW.

by Katarzyna Klessa and Maciej Karpiński

(to be published in Proceedings of ProSem 2014)

Prosody of uncertainty
Two experiments were carried out in order to find how global prosodic parameters of Polish utterances influenced their classification by Polish native speakers in the dimension of “certainty.” In Experiment 1, subjects listened to a set of resynthesized and manipulated utterances. In Experiment 2, subjects listened to prime-stimulus pairs where the prime was a manipulated and resynthesized pitch trace of a real utterance while the stimulus was a regular utterance. In Experiment 1, global pitch height, pitch range as well as speech tempo of the stimulus were manipulated. In Experiment 2, only pitch range and average pitch height of the prime were manipulated, as primes did not have a segmental structure. The group of subjects consisted of sixteen students, predominantly female. It was shown that utterances produced in a lower voice and at slower speech rates were perceived as showing a higher degree of certainty, while those produced in a higher pitch were mostly categorized as showing a low degree of certainty. In the second experiment, it was observed that the pitch contour of the prime did not influence perception of the stimulus in a significant way. Most of our results are consistent with those found in the literature for other languages. However, there are also differences that may be attributed to culture-dependent aspects of paralinguistic prosody. Further research is necessary to explore and explain those discrepancies.
