We present a corpus of time-aligned spoken data of Wikipedia articles as well as the pipeline that makes it possible to generate such corpora for many languages. There are initiatives to create and sustain spoken Wikipedia versions in many languages; hence the data is freely available, grows over time, and can be used for automatic corpus creation. Our pipeline automatically downloads and aligns this data. The resulting German corpus currently totals 293h of audio, of which we align 71h in full sentences and another 86h of sentences with some missing words. The English corpus consists of 287h, of which we align 27h in full sentences and 157h with some missing words. Results are publicly available.
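The corpus statistics above distinguish sentences aligned in full from sentences with some missing words. A minimal sketch of that bookkeeping step, using hypothetical data structures (not the released pipeline's actual code): each sentence is a list of words with their aligned time spans, where `None` marks a word the aligner could not place.

```python
# Classify aligned sentences: a sentence counts as "full" only if every
# word received a time alignment; otherwise it is "partial" (some words
# missing) or "failed" (nothing aligned at all).

def classify_sentence(word_alignments):
    """word_alignments: list of (word, start, end) tuples; start/end are
    None when the aligner could not place the word."""
    aligned = [w for w in word_alignments if w[1] is not None and w[2] is not None]
    if word_alignments and len(aligned) == len(word_alignments):
        return "full"
    if aligned:
        return "partial"
    return "failed"

def corpus_stats(sentences):
    """Tally aligned audio duration (in seconds) per category."""
    stats = {"full": 0.0, "partial": 0.0, "failed": 0.0}
    for sent in sentences:
        spans = [(s, e) for _, s, e in sent if s is not None and e is not None]
        duration = sum(e - s for s, e in spans)
        stats[classify_sentence(sent)] += duration
    return stats
```

Summing such per-category durations over all articles yields hour totals like the 71h/86h split reported for German.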
Companion of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, 2021
In this paper, we present a study in which a robot initiates interactions with people passing by in an in-the-wild scenario. The robot dynamically adapts the loudness of its voice to the distance of the person approached, thus indicating whom it is talking to. It furthermore tracks people based on body orientation and eye gaze, and autonomously adapts the text produced based on people's distance. Our study shows that the adaptation of the loudness of its voice is perceived as personalization by the participants, and that the likelihood that they stop by and interact with the robot increases when the robot incrementally adjusts its behavior.
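The abstract does not state which distance-to-loudness mapping the robot uses. One common baseline, sketched here purely as an assumption, is the free-field inverse-square law: the level arriving at a listener drops by roughly 6 dB per doubling of distance, so keeping the perceived level constant means raising the output gain by 20·log10(d/d_ref).

```python
import math

def output_gain_db(distance_m, ref_distance_m=1.0):
    """Gain (in dB) to add to a reference output level so that the level
    arriving at a listener at distance_m roughly matches what a listener
    at ref_distance_m would hear, assuming free-field inverse-square
    attenuation (~6 dB per doubling of distance). Hypothetical helper,
    not the paper's implementation."""
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(distance_m / ref_distance_m)
```

Under this assumption, addressing a person 2 m away calls for about +6 dB relative to the 1 m reference, and a person closer than the reference gets a negative gain, i.e. a quieter voice.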
10th International Conference on Speech Prosody 2020, 2020
Listeners typically provide feedback while listening to a speaker in conversation and thereby engage in the co-construction of the interaction. We analyze the influence of the listener on the speaker by investigating how the listener's verbal feedback signals help in modeling the speaker's language. We find that feedback from the listener may help in modeling the speaker's language, whether through the listener's feedback as transcribed or through the acoustic signal directly. We find the largest positive effects at the ends of sentences as well as at mid-utterance pauses, but also effects indicating that we successfully model elaborations of ongoing utterances that may result from the presence or absence of listener feedback.
Robots should appropriately give reasons for their actions when these actions affect a human's action or goal space. Communicating reasons may help the human understand the robot's intents and may initiate joint action, i.e., accepting the robot's goals and cooperating on the robot's actions. However, to be efficient, the communication of reasons should be limited to what is necessary rather than aiming for completeness, conforming to the Gricean Maxim of Quantity. Furthermore, what is necessary only becomes apparent as the situation evolves; hence, for seamless interaction, ongoing utterances must be adapted as they happen. We present a system that flexibly gives reasons in a reduced setting in which the robot needs to intrude into a human's personal space in order to reach its goal.
Abstract of paper 0582 presented at the Digital Humanities Conference 2019 (DH2019), Utrecht, the Netherlands, 9–12 July 2019.
Current-day spoken dialogue systems are tedious to interact with (Ward et al. 2005). Their naturalness and (measurable) quality of interaction can be improved through incremental (step-by-step) processing schemes that enable dialogue systems to interact continuously (Baumann 2013). However, incremental models have not yet adequately addressed the challenge of joint decision making and optimization of hypotheses across the multitude of components within a modularized system in real time, mostly because their data flows follow simple pipeline approaches. Ad-hoc integration of modules fails completely for distributed systems, which are preferred in robotics, for research systems, and in mobile applications. This shortcoming prevents incremental spoken dialogue systems from leveraging their full potential. This project proposes to design and implement an architecture for concurrent, distributed incremental processing and knowledge representation for spoken dialogue in which components share their un...
Spoken dialogue systems benefit from incremental processing. Incremental spoken dialogue systems (in which processing at all levels already begins while the input is still being produced) reach the goal faster and are rated better by users than non-incremental spoken dialogue systems (in which speech recognition and the subsequent modules only begin processing once the input is complete, or once the preceding module has finished its processing). [1],[14] In my doctoral research, I work on incrementalizing speech recognition and understanding for spoken dialogue systems. It has been shown many times that prosody can contribute, among other things, to speech recognition (ASR) [2], disfluency detection [13], end-of-turn detection [9] and prediction [4], and parsing [11]. Most of these applications, however, are carried out offline at experimental scale, i.e., the prosodic features and categories are computed in separate runs...
Our paper focuses on the computational analysis of "readout poetry" (German: Hördichtung), recordings of poets reading their own work, with regard to the most important type of this genre, modern "sound poetry" (German: Lautdichtung). Whereas "readout poetry" often uses normal words and sentences, "sound poetry", developed by Dadaist poets such as Hugo Ball and Kurt Schwitters or concrete poets such as Ernst Jandl, Oskar Pastior, or Bob Cobbing, combines the "microparticles of the human voice", like the segments in Ernst Jandl's sound poem "schtzngrmm" ("schtzngrmm / schtzngrmm / tttt / tttt / grrrmmmmm / tttt / sch / tzngrmm"). Within the genre of sound poetry, there are two main forms: lettristic and syllabic decomposition. A short anecdote illustrates the difference: the Dadaist Raoul Hausmann developed lettristic sound poetry in his early Dadaist poem "fmsbw" from 1918. This is said to have inspired his successor Schwitters, whose famous "Ursonate" [The Sona...
Proceedings of the Conference on Mensch und Computer, 2020
We analyze the addressee detection task for complexity-identical dialog, for both human conversation and device-directed speech. Our recurrent neural model performs at least as well as humans, who have problems with this task, including native speakers, who benefit from the relevant linguistic skills. We perform ablation experiments on the features used by our model and show that fundamental frequency variation is the single most relevant feature class. We therefore conclude that future systems can detect whether they are addressed based only on speech prosody, which does not (or only to a very limited extent) reveal the content of conversations not intended for the system.
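The ablation result singles out fundamental frequency (F0) variation as the most informative feature class. As a hedged illustration of what such a feature could look like (the paper's actual feature extraction is not shown here), one can summarize an utterance's pitch track by the spread of its voiced frames; pitch trackers commonly emit 0.0 for unvoiced frames, which must be excluded first.

```python
import statistics

def f0_variation(f0_track):
    """Summarize fundamental-frequency variation over an utterance.
    f0_track: per-frame F0 values in Hz, with 0.0 (or None) marking
    unvoiced frames. Returns (mean, stdev) over voiced frames only.
    Hypothetical feature sketch, not the paper's implementation."""
    voiced = [f for f in f0_track if f]  # drop unvoiced (0.0/None) frames
    if len(voiced) < 2:
        return (voiced[0] if voiced else 0.0, 0.0)
    return (statistics.mean(voiced), statistics.stdev(voiced))
```

A monotone utterance then yields a near-zero standard deviation, while a prosodically lively one yields a large value; a classifier can consume such summaries per utterance.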