- research-article, November 2024
Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard
Proceedings of the ACM on Human-Computer Interaction (PACMHCI), Volume 8, Issue CSCW2, Article No.: 482, Pages 1–22
https://doi.org/10.1145/3687021
Anonymity is a powerful component of many participatory media platforms that can afford people greater freedom of expression and protection from external coercion and interference. However, it can be difficult to effectively implement on platforms that ...
- short-paper, November 2024
Benchmarking Speech-Driven Gesture Generation Models for Generalization to Unseen Voices and Noisy Environments
- Johsac Isbac Gomez Sanchez,
- Kevin Adier Inofuente Colque,
- Leonardo Boulitreau de Menezes Martins Marques,
- Paula Dornhofer Paro Costa,
- Rodolfo Luis Tonoli
ICMI Companion '24: Companion Proceedings of the 26th International Conference on Multimodal Interaction, Pages 170–174
https://doi.org/10.1145/3686215.3688823
Speech-driven gesture generation models enhance robot gestures and control avatars in virtual environments by synchronizing gestures with speech prosody. However, state-of-the-art models are trained on a limited number of speakers, with audios typically ...
- research-article, October 2024
Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier
MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 6765–6773
https://doi.org/10.1145/3664647.3681345
Generative AI technologies, including text-to-speech (TTS) and voice conversion (VC), frequently become indistinguishable from genuine samples, posing challenges for individuals in discerning between real and synthetic content. This indistinguishability ...
- research-article, June 2024
Retrieval-Augmented Audio Deepfake Detection
ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval, Pages 376–384
https://doi.org/10.1145/3652583.3658086
With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) ...
- research-article, October 2023
Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 8443–8452
https://doi.org/10.1145/3581783.3613825
This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a newly coming target speaker, solely relying on a single ...
- research-article, October 2023
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 184–192
https://doi.org/10.1145/3581783.3613800
Voice conversion as the style transfer task applied to speech, refers to converting one person's speech into a new speech that sounds like another person's. Up to now, there has been a lot of research devoted to better implementation of VC tasks. However,...
- research-article, October 2023
Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion
MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 8674–8685
https://doi.org/10.1145/3581783.3612333
Voice conversion (VC), as a voice style transfer technology, is becoming increasingly prevalent while raising serious concerns about its illegal use. Proactively tracing the origins of VC-generated speeches, i.e., speaker traceability, can prevent the ...
- research-article, March 2023
High Quality and Similarity One-Shot Voice Conversion Using End-to-End Model
CSAI '22: Proceedings of the 2022 6th International Conference on Computer Science and Artificial Intelligence, Pages 284–288
https://doi.org/10.1145/3577530.3577575
Voice Conversion (VC) is becoming increasingly popular in speech synthesis applications. Most methods focus on many-to-many VC which can not be used for unseen speakers. One-shot (any-to-any) VC allows the source and the target speakers can be both ...
- research-article, October 2022
Improving Spoofing Capability for End-to-end Any-to-many Voice Conversion
DDAM '22: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, Pages 93–100
https://doi.org/10.1145/3552466.3556532
Audio deep synthesis techniques have been able to generate high-quality speech whose authenticity is difficult for humans to recognize. Meanwhile, many anti-spoofing systems have been developed to capture artifacts in the synthesized speech that are ...
- research-article, October 2022
Cloning and Conversion of an Arbitrary Voice Using Generative Flows
Automation and Remote Control (ARCO), Volume 83, Issue 10, Pages 1555–1566
https://doi.org/10.1134/S00051179220100083
To improve the quality of generated speech signals, this paper proposes a method for taking into account time-varying information about the speaker. Using this technique, the system synthesizes more natural speech with a voice similar to the given ...
- short-paper, December 2021
Improving Model Stability and Training Efficiency in Fast, High Quality Expressive Voice Conversion System
ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Pages 75–79
https://doi.org/10.1145/3461615.3491106
Voice conversion (VC) systems have made significant progress owing to advanced deep learning methods. Current research is not only concerned with high-quality and fast audio synthesis, but also richer expressiveness. The most popular VC system was ...
- research-article, October 2021
Digital Speech Makeup: Voice Conversion Based Altered Auditory Feedback for Transforming Self-Representation
ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction, Pages 159–167
https://doi.org/10.1145/3462244.3479934
Makeup (i.e., cosmetics) has long been used to transform not only one’s appearance but also their self-representation. Previous studies have demonstrated that visual transformations can induce a variety of effects on self-representation. Herein, we ...
- research-article, October 2021
Face-based Voice Conversion: Learning the Voice behind a Face
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 496–505
https://doi.org/10.1145/3474085.3475198
Zero-shot voice conversion (VC) trained by non-parallel data has gained a lot of attention in recent years. Previous methods usually extract speaker embeddings from audios and use them for converting the voices into different voice styles. Since there ...
- research-article, October 2021
TACR-Net: Editing on Deep Video and Voice Portraits
MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 478–486
https://doi.org/10.1145/3474085.3475196
Utilizing an arbitrary speech clip to edit the mouth of the portrait in the target video is a novel yet challenging task. Despite impressive results have been achieved, there are still three limitations in the existing methods: 1) since the acoustic ...
- demonstration, October 2019
Development of a Real-time Bionic Voice Generation System based on Statistical Excitation Prediction
ASSETS '19: Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility, Pages 655–657
https://doi.org/10.1145/3308561.3354591
Despite the emergent progress in many fields of bionics, larynx amputees still lack a functional Bionic Voice source to overcome their voice disability. We have established the Pneumatic Bionic Voice (PBV) as a promising technology to generate a voice ...
- poster, October 2019
TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication
UIST '19 Adjunct: Adjunct Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, Pages 33–35
https://doi.org/10.1145/3332167.3357106
Despite promising initial studies, a speaker's original voice can cause problems when it comes to the application of real-time voice conversion (data-driven speaker conversion) technology in our daily lives, specifically in our near-field communication, ...
- short-paper, March 2019
Post-laryngectomy interaction restoration system
IUI '19 Companion: Companion Proceedings of the 24th International Conference on Intelligent User Interfaces, Pages 163–164
https://doi.org/10.1145/3308557.3308731
Laryngectomy is a surgical treatment that leads to voice loss. Novel approach to regaining communicative function after the operation is presented. In contrast to most technologies using artificial intelligence to restore the voice, it focuses on ...
- research-article, January 2019
VoiCon: a Matlab GUI-based tool for voice conversion applications
International Journal of Computer Applications in Technology (IJCAT), Volume 61, Issue 3, Pages 207–219
https://doi.org/10.1504/ijcat.2019.102854
Voice conversion finds applications in a wide variety of areas such as customisation of text to speech systems, voice editing and dubbing, voice restoration systems, etc. in addition to its initial applications of speaker conversion and conversion of ...
- research-article, November 2018
Hidebehind: Enjoy Voice Input with Voiceprint Unclonability and Anonymity
SenSys '18: Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems, Pages 82–94
https://doi.org/10.1145/3274783.3274855
We are speeding toward a not-too-distant future when we can perform human-computer interaction using solely our voice. Speech recognition is the key technology that powers voice input, and it is usually outsourced to the cloud for the best performance. ...
- survey, July 2018
Voice Disguise in Automatic Speaker Recognition
ACM Computing Surveys (CSUR), Volume 51, Issue 4, Article No.: 68, Pages 1–22
https://doi.org/10.1145/3195832
Humans are able to identify other people’s voices even in voice disguise conditions. However, we are not immune to all voice changes when trying to identify people from voice. Likewise, automatic speaker recognition systems can also be deceived by voice ...