WOZ acoustic data collection for interactive TV

Brutti, Alessio; Cristoforetti, Luca; Kellermann, Walter; Marquardt, Lutz; Omologo, Maurizio

doi:10.1007/s10579-010-9116-x

Published: 11 February 2010

Language Resources and Evaluation, Volume 44, pages 205–219 (2010)

Alessio Brutti¹, Luca Cristoforetti¹, Walter Kellermann², Lutz Marquardt² & Maurizio Omologo¹

Abstract

This paper describes a multichannel acoustic data collection recorded under the European DICIT project during Wizard of Oz (WOZ) experiments carried out at the FAU and FBK-irst laboratories. The application of interest in DICIT is a distant-talking interface for controlling interactive TV in a typical living room with many interfering devices. The objective of the experiments was to collect a database supporting efficient development and tuning of acoustic processing algorithms for signal enhancement. In DICIT, techniques for sound source localization, multichannel acoustic echo cancellation, blind source separation, speech activity detection, speaker identification and verification, and beamforming are combined to reduce as far as possible the impairments of the user's speech that are typical of distant-talking interfaces. The collected database made it possible to simulate a realistic scenario at a preliminary stage and to tailor the algorithms involved to the observed user behaviors. To match the project requirements, the WOZ experiments were recorded in three languages: English, German and Italian. Besides the user inputs, the database also contains non-speech acoustic events, room impulse response measurements, and video data, the latter used to compute the three-dimensional position of each subject. Sessions were manually transcribed and segmented at the word level, with specific labels also introduced for acoustic events.

Notes

For further details see http://dicit.fbk.eu.
For further details see http://chil.server.de.
See http://eca.cx/ecasound/index.php for details.
See http://winlirc.sourceforge.net/ for details.
See http://trans.sourceforge.net/en/presentation.php for details.

Acknowledgments

This work was partially funded by the Commission of the European Community, Information Society Technologies (IST), FP6 IST-034624, under DICIT.

Author information

Authors and Affiliations

Fondazione Bruno Kessler (FBK)–irst, Via Sommarive 18, 38123, Povo (TN), Italy
Alessio Brutti, Luca Cristoforetti & Maurizio Omologo
Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg (FAU), Cauerstr. 7, 91058, Erlangen, Germany
Walter Kellermann & Lutz Marquardt

Corresponding author

Correspondence to Luca Cristoforetti.

About this article

Cite this article

Brutti, A., Cristoforetti, L., Kellermann, W. et al. WOZ acoustic data collection for interactive TV. Lang Resources & Evaluation 44, 205–219 (2010). https://doi.org/10.1007/s10579-010-9116-x

Issue Date: September 2010
