Keyword: voice conversion : Search

research-article

Open Access

Anonymization of Voices in Spaces for Civic Dialogue: Measuring Impact on Empathy, Trust, and Feeling Heard

Proceedings of the ACM on Human-Computer Interaction (PACMHCI), Volume 8, Issue CSCW2, Article No.: 482, Pages 1–22, https://doi.org/10.1145/3687021

Anonymity is a powerful component of many participatory media platforms that can afford people greater freedom of expression and protection from external coercion and interference. However, it can be difficult to effectively implement on platforms that ...

short-paper

Free

Benchmarking Speech-Driven Gesture Generation Models for Generalization to Unseen Voices and Noisy Environments

ICMI Companion '24: Companion Proceedings of the 26th International Conference on Multimodal Interaction, Pages 170–174, https://doi.org/10.1145/3686215.3688823

Speech-driven gesture generation models enhance robot gestures and control avatars in virtual environments by synchronizing gestures with speech prosody. However, state-of-the-art models are trained on a limited number of speakers, with audios typically ...

research-article

Free

Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia, Pages 6765–6773, https://doi.org/10.1145/3664647.3681345

Generative AI technologies, including text-to-speech (TTS) and voice conversion (VC), frequently become indistinguishable from genuine samples, posing challenges for individuals in discerning between real and synthetic content. This indistinguishability ...

research-article

Retrieval-Augmented Audio Deepfake Detection

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval, Pages 376–384, https://doi.org/10.1145/3652583.3658086

With recent advances in speech synthesis including text-to-speech (TTS) and voice conversion (VC) systems enabling the generation of ultra-realistic audio deepfakes, there is growing concern about their potential misuse. However, most deepfake (DF) ...

research-article

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 8443–8452, https://doi.org/10.1145/3581783.3613825

This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a newly coming target speaker, solely relying on a single ...

research-article

PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 184–192, https://doi.org/10.1145/3581783.3613800

Voice conversion as the style transfer task applied to speech, refers to converting one person's speech into a new speech that sounds like another person's. Up to now, there has been a lot of research devoted to better implementation of VC tasks. However,...

research-article

Who is Speaking Actually? Robust and Versatile Speaker Traceability for Voice Conversion

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Pages 8674–8685, https://doi.org/10.1145/3581783.3612333

Voice conversion (VC), as a voice style transfer technology, is becoming increasingly prevalent while raising serious concerns about its illegal use. Proactively tracing the origins of VC-generated speeches, i.e., speaker traceability, can prevent the ...

research-article

High Quality and Similarity One-Shot Voice Conversion Using End-to-End Model

CSAI '22: Proceedings of the 2022 6th International Conference on Computer Science and Artificial Intelligence, Pages 284–288, https://doi.org/10.1145/3577530.3577575

Voice Conversion (VC) is becoming increasingly popular in speech synthesis applications. Most methods focus on many-to-many VC which can not be used for unseen speakers. One-shot (any-to-any) VC allows the source and the target speakers can be both ...

research-article

Open Access

Improving Spoofing Capability for End-to-end Any-to-many Voice Conversion

DDAM '22: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, Pages 93–100, https://doi.org/10.1145/3552466.3556532

Audio deep synthesis techniques have been able to generate high-quality speech whose authenticity is difficult for humans to recognize. Meanwhile, many anti-spoofing systems have been developed to capture artifacts in the synthesized speech that are ...

research-article

Cloning and Conversion of an Arbitrary Voice Using Generative Flows

D. S. Obukhov

Automation and Remote Control (ARCO), Volume 83, Issue 10, Pages 1555–1566, https://doi.org/10.1134/S00051179220100083

To improve the quality of generated speech signals, this paper proposes a method for taking into account time-varying information about the speaker. Using this technique, the system synthesizes more natural speech with a voice similar to the given ...

short-paper

Improving Model Stability and Training Efficiency in Fast, High Quality Expressive Voice Conversion System

ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Pages 75–79, https://doi.org/10.1145/3461615.3491106

Voice conversion (VC) systems have made significant progress owing to advanced deep learning methods. Current research is not only concerned with high-quality and fast audio synthesis, but also richer expressiveness. The most popular VC system was ...

research-article

Digital Speech Makeup: Voice Conversion Based Altered Auditory Feedback for Transforming Self-Representation

ICMI '21: Proceedings of the 2021 International Conference on Multimodal Interaction, Pages 159–167, https://doi.org/10.1145/3462244.3479934

Makeup (i.e., cosmetics) has long been used to transform not only one’s appearance but also their self-representation. Previous studies have demonstrated that visual transformations can induce a variety of effects on self-representation. Herein, we ...

research-article

Face-based Voice Conversion: Learning the Voice behind a Face

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 496–505, https://doi.org/10.1145/3474085.3475198

Zero-shot voice conversion (VC) trained by non-parallel data has gained a lot of attention in recent years. Previous methods usually extract speaker embeddings from audios and use them for converting the voices into different voice styles. Since there ...

research-article

TACR-Net: Editing on Deep Video and Voice Portraits

MM '21: Proceedings of the 29th ACM International Conference on Multimedia, Pages 478–486, https://doi.org/10.1145/3474085.3475196

Utilizing an arbitrary speech clip to edit the mouth of the portrait in the target video is a novel yet challenging task. Despite impressive results have been achieved, there are still three limitations in the existing methods: 1) since the acoustic ...

demonstration

Public Access

Development of a Real-time Bionic Voice Generation System based on Statistical Excitation Prediction

ASSETS '19: Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility, Pages 655–657, https://doi.org/10.1145/3308561.3354591

Despite the emergent progress in many fields of bionics, larynx amputees still lack a functional Bionic Voice source to overcome their voice disability. We have established the Pneumatic Bionic Voice (PBV) as a promising technology to generate a voice ...

poster

TransVoice: Real-Time Voice Conversion for Augmenting Near-Field Speech Communication

UIST '19 Adjunct: Adjunct Proceedings of the 32nd Annual ACM Symposium on User Interface Software and Technology, Pages 33–35, https://doi.org/10.1145/3332167.3357106

Despite promising initial studies, a speaker's original voice can cause problems when it comes to the application of real-time voice conversion (data-driven speaker conversion) technology in our daily lives, specifically in our near-field communication, ...

short-paper

Post-laryngectomy interaction restoration system

IUI '19 Companion: Companion Proceedings of the 24th International Conference on Intelligent User Interfaces, Pages 163–164, https://doi.org/10.1145/3308557.3308731

Laryngectomy is a surgical treatment that leads to voice loss. Novel approach to regaining communicative function after the operation is presented. In contrast to most technologies using artificial intelligence to restore the voice, it focuses on ...

research-article

VoiCon: a Matlab GUI-based tool for voice conversion applications

International Journal of Computer Applications in Technology (IJCAT), Volume 61, Issue 3, Pages 207–219, https://doi.org/10.1504/ijcat.2019.102854

Voice conversion finds applications in a wide variety of areas such as customisation of text to speech systems, voice editing and dubbing, voice restoration systems, etc. in addition to its initial applications of speaker conversion and conversion of ...

research-article

Public Access

Hidebehind: Enjoy Voice Input with Voiceprint Unclonability and Anonymity

SenSys '18: Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems, Pages 82–94, https://doi.org/10.1145/3274783.3274855

We are speeding toward a not-too-distant future when we can perform human-computer interaction using solely our voice. Speech recognition is the key technology that powers voice input, and it is usually outsourced to the cloud for the best performance. ...

survey

Voice Disguise in Automatic Speaker Recognition

Mireia Farrús

ACM Computing Surveys (CSUR), Volume 51, Issue 4, Article No.: 68, Pages 1–22, https://doi.org/10.1145/3195832

Humans are able to identify other people’s voices even in voice disguise conditions. However, we are not immune to all voice changes when trying to identify people from voice. Likewise, automatic speaker recognition systems can also be deceived by voice ...
