
Showing 1–32 of 32 results for author: Reiss, J

Searching in archive cs.
  1. arXiv:2407.08889  [pdf, other]

    eess.AS cs.SD

    Diff-MST: Differentiable Mixing Style Transfer

    Authors: Soumya Sai Vanka, Christian Steinmetz, Jean-Baptiste Rolland, Joshua Reiss, George Fazekas

    Abstract: Mixing style transfer automates the generation of a multitrack mix for a given set of tracks by inferring production attributes from a reference song. However, existing systems for mixing style transfer are limited in that they often operate only on a fixed number of tracks, introduce artifacts, and produce mixes in an end-to-end fashion, without grounding in traditional audio effects, prohibiting…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to be published at the Proceedings of the 25th International Society for Music Information Retrieval Conference 2024

  2. arXiv:2405.20064  [pdf, other]

    eess.AS cs.SD

    1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem

    Authors: Mingjie Chen, Hezhao Zhang, Yuanchao Li, Jiachen Luo, Wen Wu, Ziyang Ma, Peter Bell, Catherine Lai, Joshua Reiss, Lin Wang, Philip C. Woodland, Xie Chen, Huy Phan, Thomas Hain

    Abstract: Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, resulting in those sometimes being ignored or frequently misclassified. Previous work has utilised class weighted loss for t…

    Submitted 30 May, 2024; originally announced May 2024.

  3. arXiv:2404.17821  [pdf]

    cs.SD cs.MM eess.AS

    An automatic mixing speech enhancement system for multi-track audio

    Authors: Xiaojing Liu, Angeliki Mourgela, Hongwei Ai, Joshua D. Reiss

    Abstract: We propose a speech enhancement system for multitrack audio. The system will minimize auditory masking while allowing one to hear multiple simultaneous speakers. The system can be used in multiple communication scenarios e.g., teleconferencing, invoice gaming, and live streaming. The ITU-R BS.1387 Perceptual Evaluation of Audio Quality (PEAQ) model is used to evaluate the amount of masking in the…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 5 pages

  4. arXiv:2404.07970  [pdf, other]

    eess.AS cs.LG cs.SD

    Differentiable All-pole Filters for Time-varying Audio Systems

    Authors: Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas

    Abstract: Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous wo…

    Submitted 18 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted at DAFx 2024

  5. arXiv:2310.15247  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis

    Authors: Marco Comunità, Riccardo F. Gramaccioni, Emilian Postolache, Emanuele Rodolà, Danilo Comminiello, Joshua D. Reiss

    Abstract: Sound design involves creatively selecting, recording, and editing sound effects for various media like cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no refer…

    Submitted 23 October, 2023; originally announced October 2023.

  6. arXiv:2310.11364  [pdf, other]

    cs.SD eess.AS

    High-Fidelity Noise Reduction with Differentiable Signal Processing

    Authors: Christian J. Steinmetz, Thomas Walther, Joshua D. Reiss

    Abstract: Noise reduction techniques based on deep learning have demonstrated impressive performance in enhancing the overall quality of recorded speech. While these approaches are highly performant, their application in audio engineering can be limited due to a number of factors. These include operation only on speech without support for music, lack of real-time capability, lack of interpretable control pa…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted for publication at the 155th Convention of the Audio Engineering Society

  7. arXiv:2309.14761  [pdf, other]

    eess.AS cs.SD

    Optimization Techniques for a Physical Model of Human Vocalisation

    Authors: Mateo Cámara, Zhiyuan Xu, Yisu Zong, José Luis Blanco, Joshua D. Reiss

    Abstract: We present a non-supervised approach to optimize and evaluate the synthesis of non-speech audio effects from a speech production model. We use the Pink Trombone synthesizer as a case study of a simplified production model of the vocal tract to target non-speech human audio signals --yawnings. We selected and optimized the control parameters of the synthesizer to minimize the difference between rea…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to DAFx 2023

  8. arXiv:2308.16177  [pdf, other]

    cs.SD eess.AS

    General Purpose Audio Effect Removal

    Authors: Matthew Rice, Christian J. Steinmetz, George Fazekas, Joshua D. Reiss

    Abstract: Although the design and application of audio effects is well understood, the inverse problem of removing these effects is significantly more challenging and far less studied. Recently, deep learning has been applied to audio effect removal; however, existing approaches have focused on narrow formulations considering only one effect or source type at a time. In realistic scenarios, multiple effects…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

  9. arXiv:2307.04702  [pdf, other]

    cs.SD eess.AS

    Vocal Tract Area Estimation by Gradient Descent

    Authors: David Südholt, Mateo Cámara, Zhiyuan Xu, Joshua D. Reiss

    Abstract: Articulatory features can provide interpretable and flexible controls for the synthesis of human vocalizations by allowing the user to directly modify parameters like vocal strain or lip position. To make this manipulation through resynthesis possible, we need to estimate the features that result in a desired vocalization directly from audio recordings. In this work, we propose a white-box optimiz…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Accepted to DAFx 2023

  10. arXiv:2305.13262  [pdf, other]

    cs.SD cs.LG eess.AS

    Modulation Extraction for LFO-driven Audio Effects

    Authors: Christopher Mitcheltree, Christian J. Steinmetz, Marco Comunità, Joshua D. Reiss

    Abstract: Low frequency oscillator (LFO) driven audio effects such as phaser, flanger, and chorus, modify an input signal using time-varying filters and delays, resulting in characteristic sweeping or widening effects. It has been shown that these effects can be modeled using neural networks when conditioned with the ground truth LFO signal. However, in most cases, the LFO signal is not accessible and measu…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to DAFx 2023. Listening samples and plugins can be found at https://christhetree.github.io/mod_extraction/

  11. arXiv:2302.02447  [pdf, other]

    eess.AS cs.SD

    Cross-modal fusion techniques for utterance-level emotion recognition from text and speech

    Authors: Jiachen Luo, Huy Phan, Joshua Reiss

    Abstract: Multimodal emotion recognition (MER) is a fundamental complex research problem due to the uncertainty of human emotional expression and the heterogeneity gap between different modalities. Audio and text modalities are particularly important for a human participant in understanding emotions. Although many successful attempts have been made to design multimodal representations for MER, there still exist m…

    Submitted 5 February, 2023; originally announced February 2023.

    Comments: 6 pages, 2 figures

  12. arXiv:2302.02419  [pdf, other]

    cs.CL cs.SD eess.AS

    Deep learning of segment-level feature representation for speech emotion recognition in conversations

    Authors: Jiachen Luo, Huy Phan, Joshua Reiss

    Abstract: Accurately detecting emotions in conversation is a necessary yet challenging task due to the complexity of emotions and dynamics in dialogues. The emotional state of a speaker can be influenced by many different factors, such as interlocutor stimulus, dialogue scene, and topic. In this work, we propose a conversational speech emotion recognition method to deal with capturing attentive contextual d…

    Submitted 5 February, 2023; originally announced February 2023.

    Comments: 6 pages, 4 figures

  13. arXiv:2211.00497  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Modelling black-box audio effects with time-varying feature modulation

    Authors: Marco Comunità, Christian J. Steinmetz, Huy Phan, Joshua D. Reiss

    Abstract: Deep learning approaches for black-box modelling of audio effects have shown promise, however, the majority of existing work focuses on nonlinear effects with behaviour on relatively short time-scales, such as guitar amplifiers and distortion. While recurrent and convolutional architectures can theoretically be extended to capture behaviour at longer time scales, we show that simply scaling the wi…

    Submitted 9 May, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

  14. arXiv:2207.08759  [pdf, other]

    cs.SD eess.AS

    Style Transfer of Audio Effects with Differentiable Signal Processing

    Authors: Christian J. Steinmetz, Nicholas J. Bryan, Joshua D. Reiss

    Abstract: We present a framework that can impose the audio effects and production style from one recording to another by example with the goal of simplifying the audio production process. We train a deep neural network to analyze an input recording and a style reference recording, and predict the control parameters of audio effects used to render the output. In contrast to past work, we integrate audio effe…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: Preprint. To appear in the Journal of the Audio Engineering Society

  15. arXiv:2204.08026  [pdf, other]

    cs.SD eess.AS eess.SP

    Advances in Thunder Sound Synthesis

    Authors: Eva Fineberg, Jack Walters, Joshua Reiss

    Abstract: A recent comparative study evaluated all known thunder synthesis techniques in terms of their perceptual realness. The findings concluded that none of the synthesised audio extracts seemed as realistic as the genuine phenomenon. The work presented herein is motivated by those findings, and attempts to create a synthesised sound effect of thunder indistinguishable from a real recording. The techniq…

    Submitted 17 April, 2022; originally announced April 2022.

    Comments: 9 pages, 6 figures, conference paper accepted to the AES Europe Spring 2022 Audio Engineering 152nd Convention

  16. arXiv:2112.02926  [pdf, other]

    eess.AS cs.SD

    Steerable discovery of neural audio effects

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Applications of deep learning for audio effects often focus on modeling analog effects or learning to control effects to emulate a trained audio engineer. However, deep learning approaches also have the potential to expand creativity through neural audio effects that enable new sound transformations. While recent work demonstrated that neural networks with random weights produce compelling audio e…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: Accepted to NeurIPS 2021 Workshop on Machine Learning for Creativity and Design

  17. arXiv:2110.09605  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks

    Authors: Marco Comunità, Huy Phan, Joshua D. Reiss

    Abstract: Footsteps are among the most ubiquitous sound effects in multimedia applications. There is substantial research into understanding the acoustic features and developing synthesis models for footstep sound effects. In this paper, we present a first attempt at adopting neural synthesis for this task. We implemented two GAN-based architectures and compared the results with real recordings as well as s…

    Submitted 10 December, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

  18. arXiv:2110.03691  [pdf, other]

    eess.SP cs.LG cs.SD eess.AS

    Direct design of biquad filter cascades with deep learning by sampling random polynomials

    Authors: Joseph T. Colonel, Christian J. Steinmetz, Marcus Michelen, Joshua D. Reiss

    Abstract: Designing infinite impulse response filters to match an arbitrary magnitude response requires specialized techniques. Methods like modified Yule-Walker are relatively efficient, but may not be sufficiently accurate in matching high order responses. On the other hand, iterative optimization techniques often enable superior performance, but come at the cost of longer run-times and are sensitive to i…

    Submitted 16 February, 2022; v1 submitted 7 October, 2021; originally announced October 2021.

    Comments: Accepted to ICASSP 2022

  19. arXiv:2110.01436  [pdf, other]

    eess.AS cs.SD

    WaveBeat: End-to-end beat and downbeat tracking in the time domain

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Deep learning approaches for beat and downbeat tracking have brought advancements. However, these approaches continue to rely on hand-crafted, subsampled spectral features as input, restricting the information available to the model. In this work, we propose WaveBeat, an end-to-end approach for joint beat and downbeat tracking operating directly on waveforms. This method forgoes engineered spectra…

    Submitted 4 October, 2021; originally announced October 2021.

    Comments: To appear at the 151st AES Convention

  20. arXiv:2102.06200  [pdf, other]

    eess.AS cs.SD

    Efficient neural networks for real-time modeling of analog dynamic range compression

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: Deep learning approaches have demonstrated success in modeling analog audio effects. Nevertheless, challenges remain in modeling more complex effects that involve time-varying nonlinear elements, such as dynamic range compressors. Existing neural network approaches for modeling compression either ignore the device parameters, do not attain sufficient accuracy, or otherwise require large noncausal…

    Submitted 15 April, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

    Comments: Updated and will appear at 152nd AES Convention (note title change)

  21. arXiv:2012.03216  [pdf, other]

    cs.SD cs.LG eess.AS

    Guitar Effects Recognition and Parameter Estimation with Convolutional Neural Networks

    Authors: Marco Comunità, Dan Stowell, Joshua D. Reiss

    Abstract: Despite the popularity of guitar effects, there is very little existing research on classification and parameter estimation of specific plugins or effect units from guitar recordings. In this paper, convolutional neural networks were used for classification and parameter estimation for 13 overdrive, distortion and fuzz guitar effects. A novel dataset of processed electric guitar samples was assemb…

    Submitted 6 December, 2020; originally announced December 2020.

    Journal ref: JAES Volume 69 Issue 7/8 pp. 594-604; July 2021

  22. arXiv:2011.05016  [pdf, other]

    physics.flu-dyn cs.CE math.NA

    Wavelet Adaptive Proper Orthogonal Decomposition for Large Scale Flow Data

    Authors: Philipp Krah, Thomas Engels, Kai Schneider, Julius Reiss

    Abstract: The proper orthogonal decomposition (POD) is a powerful classical tool in fluid mechanics used, for instance, for model reduction and extraction of coherent flow features. However, its applicability to high-resolution data, as produced by three-dimensional direct numerical simulations, is limited owing to its computational complexity. Here, we propose a wavelet-based adaptive version of the POD (t…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: The algorithm can be found as a post-processing tool in the open source software package wabbit (https://github.com/adaptive-cfd/WABBIT). Please note that this paper is a working paper and has not been reviewed yet. It was submitted to ACOM Journal on the 10th of November 2020.

  23. arXiv:2010.13158  [pdf, other]

    physics.geo-ph cs.SD eess.AS

    A "DIY" data acquisition system for acoustic field measurements under harsh conditions

    Authors: Steffen Büchholz, Mathias Lemke, Julius Reiss, Jörn Sesterhenn

    Abstract: Monitoring active volcanos is an ongoing and important task helping to understand and predict volcanic eruptions. In recent years, analysing the acoustic properties of eruptions became more relevant. We present an inexpensive, lightweight, portable, easy to use and modular acoustic data acquisition system for field measurements that can record data with up to 100 kHz. The system is based on a Rasp…

    Submitted 25 October, 2020; originally announced October 2020.

    Comments: 9 figures at the end

  24. arXiv:2010.04237  [pdf, other]

    eess.AS cs.SD

    Randomized Overdrive Neural Networks

    Authors: Christian J. Steinmetz, Joshua D. Reiss

    Abstract: By processing audio signals in the time-domain with randomly weighted temporal convolutional networks (TCNs), we uncover a wide range of novel, yet controllable overdrive effects. We discover that architectural aspects, such as the depth of the network, the kernel size, the number of channels, the activation function, as well as the weight initialization, all have a clear impact on the sonic chara…

    Submitted 4 August, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

    Comments: Updating project URL. Now https://csteinmetz1.github.io/ronn

  25. arXiv:1910.10105  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Modeling plate and spring reverberation using a DSP-informed deep neural network

    Authors: Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss

    Abstract: Plate and spring reverberators are electromechanical systems first used and researched as means to substitute real room reverberation. Nowadays they are often used in music production for aesthetic reasons due to their particular sonic characteristics. The modeling of these audio processors and their perceptual qualities is difficult since they use mechanical elements together with analog electron…

    Submitted 17 April, 2020; v1 submitted 22 October, 2019; originally announced October 2019.

    Comments: Presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Barcelona, Spain, May 2020. Source code, dataset, audio examples and more detailed diagrams: https://mchijmma.github.io/modeling-plate-spring-reverb/

  26. arXiv:1905.06148  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    A general-purpose deep learning approach to model time-varying audio effects

    Authors: Marco A. Martínez Ramírez, Emmanouil Benetos, Joshua D. Reiss

    Abstract: Audio processors whose parameters are modified periodically over time are often referred to as time-varying or modulation based audio effects. Most existing methods for modeling these types of effect units are often optimized to a very specific circuit and cannot be efficiently generalized to other time-varying effects. Based on convolutional and recurrent neural networks, we propose a deep learning a…

    Submitted 21 June, 2019; v1 submitted 15 May, 2019; originally announced May 2019.

    Comments: audio files: https://mchijmma.github.io/modeling-time-varying/

  27. arXiv:1901.11436  [pdf, other]

    stat.ML cs.LG cs.SD eess.AS eess.SP

    End-to-End Probabilistic Inference for Nonstationary Audio Analysis

    Authors: William J. Wilkinson, Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, Arno Solin

    Abstract: A typical audio signal processing pipeline includes multiple disjoint analysis stages, including calculation of a time-frequency representation followed by spectrogram-based feature analysis. We show how time-frequency analysis and nonnegative matrix factorisation can be jointly formulated as a spectral mixture Gaussian process model with nonstationary priors over the amplitude variance parameters…

    Submitted 27 April, 2019; v1 submitted 31 January, 2019; originally announced January 2019.

    Comments: Accepted to the Thirty-sixth International Conference on Machine Learning (ICML) 2019

  28. arXiv:1811.02489  [pdf, other]

    eess.SP cs.LG cs.SD eess.AS stat.ML

    Unifying Probabilistic Models for Time-Frequency Analysis

    Authors: William J. Wilkinson, Michael Riis Andersen, Joshua D. Reiss, Dan Stowell, Arno Solin

    Abstract: In audio signal processing, probabilistic time-frequency models have many benefits over their non-probabilistic counterparts. They adapt to the incoming signal, quantify uncertainty, and measure correlation between the signal's amplitude and phase information, making time domain resynthesis straightforward. However, these models are still not widely used since they come at a high computational cos…

    Submitted 12 February, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2019

  29. arXiv:1810.06603  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Modeling of nonlinear audio effects with end-to-end deep neural networks

    Authors: Marco A. Martínez Ramírez, Joshua D. Reiss

    Abstract: In the context of music production, distortion effects are mainly used for aesthetic reasons and are usually applied to electric musical instruments. Most existing methods for nonlinear modeling are often either simplified or optimized to a very specific circuit. In this work, we investigate deep learning architectures for audio processing and we aim to find a general purpose end-to-end deep neura…

    Submitted 6 March, 2019; v1 submitted 15 October, 2018; originally announced October 2018.

    Comments: Presented at the 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, UK, May 2019

  30. arXiv:1803.11154  [pdf, other]

    eess.IV cs.SD eess.AS

    An empirical approach to the relationship between emotion and music production quality

    Authors: David Ronan, Joshua D. Reiss, Hatice Gunes

    Abstract: In music production, the role of the mix engineer is to take recorded music and convey the expressed emotions as professionally sounding as possible. We investigated the relationship between music production quality and musically induced and perceived emotions. A listening test was performed where 10 critical listeners and 10 non-critical listeners evaluated 10 songs. There were two mixes of each…

    Submitted 29 March, 2018; originally announced March 2018.

    Comments: 12 Pages

  31. arXiv:1803.09960   

    eess.AS cs.SD

    Automatic Minimisation of Masking in Multitrack Audio using Subgroups

    Authors: David Ronan, Zheng Ma, Paul Mc Namara, Hatice Gunes, Joshua D. Reiss

    Abstract: The iterative process of masking minimisation when mixing multitrack audio is a challenging optimisation problem, in part due to the complexity and non-linearity of auditory perception. In this article, we first propose a multitrack masking metric inspired by the MPEG psychoacoustic model. We investigate different audio processing techniques to manipulate the frequency and dynamic characteristics…

    Submitted 5 January, 2021; v1 submitted 27 March, 2018; originally announced March 2018.

    Comments: Need to resolve ownership of intellectual property

  32. arXiv:1802.00680  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    A Generative Model for Natural Sounds Based on Latent Force Modelling

    Authors: William J. Wilkinson, Joshua D. Reiss, Dan Stowell

    Abstract: Recent advances in analysis of subband amplitude envelopes of natural sounds have resulted in convincing synthesis, showing subband amplitudes to be a crucial component of perception. Probabilistic latent variable analysis is particularly revealing, but existing approaches don't incorporate prior knowledge about the physical behaviour of amplitude envelopes, such as exponential decay and feedback.…

    Submitted 27 March, 2019; v1 submitted 2 February, 2018; originally announced February 2018.

    Comments: 10 pages, 5 figures