
Showing 1–13 of 13 results for author: Schlüter, J

Searching in archive cs.
  1. arXiv:2409.09546  [pdf, other]

    eess.AS cs.SD

    Effective Pre-Training of Audio Transformers for Sound Event Detection

    Authors: Florian Schmid, Tobias Morocutti, Francesco Foscarin, Jan Schlüter, Paul Primus, Gerhard Widmer

    Abstract: We propose a pre-training pipeline for audio spectrogram transformers for frame-level sound event detection tasks. On top of common pre-training steps, we add a meticulously designed training routine on AudioSet frame-level annotations. This includes a balanced sampler, aggressive data augmentation, and ensemble knowledge distillation. For five transformers, we obtain a substantial performance imp…

    Submitted 14 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP'25. Source code available: https://github.com/fschmid56/PretrainedSED

  2. arXiv:2409.07220  [pdf, other]

    cs.CV

    Watchlist Challenge: 3rd Open-set Face Detection and Identification

    Authors: Furkan Kasım, Terrance E. Boult, Rensso Mora, Bernardo Biesseck, Rafael Ribeiro, Jan Schlueter, Tomáš Repák, Rafael Henrique Vareto, David Menotti, William Robson Schwartz, Manuel Günther

    Abstract: In the current landscape of biometrics and surveillance, the ability to accurately recognize faces in uncontrolled settings is paramount. The Watchlist Challenge addresses this critical need by focusing on face detection and open-set identification in real-world surveillance scenarios. This paper presents a comprehensive evaluation of participating algorithms, using the enhanced UnConstrained Coll…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: Accepted for presentation at IJCB 2024

  3. arXiv:2407.21658  [pdf, other]

    cs.SD cs.LG eess.AS

    Beat this! Accurate beat tracking without DBN postprocessing

    Authors: Francesco Foscarin, Jan Schlüter, Gerhard Widmer

    Abstract: We propose a system for tracking beats and downbeats with two objectives: generality across a diverse music range, and high accuracy. We achieve generality by training on multiple datasets -- including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations -- and by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing, whi…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR), 2024

  4. arXiv:2211.13956  [pdf, other]

    cs.SD cs.LG eess.AS

    Learning General Audio Representations with Large-Scale Training of Patchout Audio Transformers

    Authors: Khaled Koutini, Shahed Masoudian, Florian Schmid, Hamid Eghbal-zadeh, Jan Schlüter, Gerhard Widmer

    Abstract: The success of supervised deep learning methods is largely due to their ability to learn relevant features from raw data. Deep Neural Networks (DNNs) trained on large-scale datasets are capable of capturing a diverse set of features, and learning a representation that can generalize onto unseen tasks and datasets that are from the same domain. Hence, these models can be used as powerful feature ex…

    Submitted 2 March, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: Will appear in HEAR: Holistic Evaluation of Audio Representations, Proceedings of Machine Learning Research (PMLR) 166. Source code: https://github.com/kkoutini/passt_hear21

    Journal ref: Proceedings of Machine Learning Research v166 (2022) 65-89

  5. arXiv:2208.08706  [pdf, other]

    cs.SD cs.LG eess.AS

    Musika! Fast Infinite Waveform Music Generation

    Authors: Marco Pasini, Jan Schlüter

    Abstract: Fast and user-controllable music generation could enable novel ways of composing or performing music. However, state-of-the-art music generation systems require large amounts of data and computational resources for training, and are slow at inference. This makes them impractical for real-time interactive use. In this work, we introduce Musika, a music generation system that can be trained on hundr…

    Submitted 18 August, 2022; originally announced August 2022.

    Comments: Accepted at ISMIR 2022

  6. arXiv:2207.05508  [pdf, other]

    cs.SD cs.LG eess.AS

    EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use

    Authors: Jan Schlüter, Gerald Gutenbrunner

    Abstract: In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy Normalization (PCEN), has shown promising results, but is computationally expensive. With inhomogeneous convolution kernel sizes and strides, and by replacing PCEN w…

    Submitted 12 July, 2022; originally announced July 2022.

    Comments: Accepted at EUSIPCO 2022. Code at https://github.com/CPJKU/EfficientLEAF

  7. Efficient Training of Audio Transformers with Patchout

    Authors: Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, Gerhard Widmer

    Abstract: The great success of transformer-based models in natural language processing (NLP) has led to various attempts at adapting these architectures to other domains such as vision and audio. Recent work has shown that transformers can outperform Convolutional Neural Networks (CNNs) on vision and audio tasks. However, one of the main shortcomings of transformer models, compared to the well-established C…

    Submitted 29 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Submitted to Interspeech 2022. Source code: https://github.com/kkoutini/PaSST

  8. arXiv:2107.08933  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Over-Parameterization and Generalization in Audio Classification

    Authors: Khaled Koutini, Hamid Eghbal-zadeh, Florian Henkel, Jan Schlüter, Gerhard Widmer

    Abstract: Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while generally exhibiting very good generalization capabilities, CNNs are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: Presented at the ICML 2021 Workshop on Overparameterization: Pitfalls & Opportunities

  9. arXiv:1905.00078  [pdf, other]

    cs.SD eess.AS stat.ML

    Deep Learning for Audio Signal Processing

    Authors: Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-yiin Chang, Tara Sainath

    Abstract: Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fer…

    Submitted 25 May, 2019; v1 submitted 30 April, 2019; originally announced May 2019.

    Comments: 15 pages, 2 pdf figures

    ACM Class: I.2.6; H.5.1

    Journal ref: Journal of Selected Topics of Signal Processing 14, No. 8 (2019)

  10. arXiv:1812.11901  [pdf, other]

    cs.CV

    Large-Scale Object Detection of Images from Network Cameras in Variable Ambient Lighting Conditions

    Authors: Caleb Tung, Matthew R. Kelleher, Ryan J. Schlueter, Binhan Xu, Yung-Hsiang Lu, George K. Thiruvathukal, Yen-Kuang Chen, Yang Lu

    Abstract: Computer vision relies on labeled datasets for training and evaluation in detecting and recognizing objects. The popular computer vision program, YOLO ("You Only Look Once"), has been shown to accurately detect objects in many major image datasets. However, the images found in those datasets, are independent of one another and cannot be used to test YOLO's consistency at detecting the same object…

    Submitted 31 December, 2018; originally announced December 2018.

    Comments: Submitted to MIPR 2019 (Accepted)

  11. End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss

    Authors: Matthias Dorfer, Jan Schlüter, Andreu Vall, Filip Korzeniowski, Gerhard Widmer

    Abstract: Cross-modality retrieval encompasses retrieval tasks where the fetched items are of a different type than the search query, e.g., retrieving pictures relevant to a given text query. The state-of-the-art approach to cross-modality retrieval relies on learning a joint embedding space of the two modalities, where items from either modality are retrieved using nearest-neighbor search. In this work, we…

    Submitted 16 April, 2018; v1 submitted 19 May, 2017; originally announced May 2017.

    Comments: Preliminary version of a paper published in the International Journal of Multimedia Information Retrieval

  12. arXiv:1605.07008  [pdf, ps, other]

    cs.SD

    madmom: a new Python Audio and Music Signal Processing Library

    Authors: Sebastian Böck, Filip Korzeniowski, Jan Schlüter, Florian Krebs, Gerhard Widmer

    Abstract: In this paper, we present madmom, an open-source audio processing and music information retrieval (MIR) library written in Python. madmom features a concise, NumPy-compatible, object oriented design with simple calling conventions and sensible default values for all parameters, which facilitates fast prototyping of MIR applications. Prototypes can be seamlessly converted into callable processing p…

    Submitted 23 May, 2016; originally announced May 2016.

    ACM Class: H.5.5

  13. arXiv:1605.02688  [pdf, other]

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, mu…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures