
Showing 1–32 of 32 results for author: Das, R K

Searching in archive cs.
  1. arXiv:2407.03661  [pdf, other]

    eess.AS cs.SD

    Configurable DOA Estimation using Incremental Learning

    Authors: Yang Xiao, Rohan Kumar Das

    Abstract: This study introduces a progressive neural network (PNN) model for direction of arrival (DOA) estimation, DOA-PNN, addressing the challenge due to catastrophic forgetting in adapting dynamic acoustic environments. While traditional methods such as GCC, MUSIC, and SRP-PHAT are effective in static settings, they perform worse in noisy, reverberant conditions. Deep learning models, particularly CNNs,…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE WS 2024

  2. arXiv:2407.03657  [pdf, other]

    eess.AS cs.SD

    UCIL: An Unsupervised Class Incremental Learning Approach for Sound Event Detection

    Authors: Yang Xiao, Rohan Kumar Das

    Abstract: This work explores class-incremental learning (CIL) for sound event detection (SED), advancing adaptability towards real-world scenarios. CIL's success in domains like computer vision inspired our SED-tailored method, addressing the unique challenges of diverse and complex audio environments. Our approach employs an independent unsupervised learning framework with a distillation loss function to i…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE WS 2024

  3. arXiv:2407.03656  [pdf, other]

    eess.AS cs.SD

    WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System

    Authors: Yang Xiao, Rohan Kumar Das

    Abstract: This work aims to advance sound event detection (SED) research by presenting a new large language model (LLM)-powered dataset namely wild domestic environment sound event detection (WildDESED). It is crafted as an extension to the original DESED dataset to reflect diverse acoustic variability and complex noises in home settings. We leveraged LLMs to generate eight different domestic scenarios base…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Submitted to DCASE WS 2024

  4. arXiv:2407.00291  [pdf, other]

    eess.AS cs.SD

    FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels

    Authors: Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

    Abstract: This report presents the systems developed and submitted by Fortemedia Singapore (FMSG) and Joint Laboratory of Environmental Sound Sensing (JLESS) for DCASE 2024 Task 4. The task focuses on recognizing event classes and their time boundaries, given that multiple events can be present and may overlap in an audio recording. The novelty this year is a dataset with two sources, making it challenging…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Technical report for DCASE 2024 Challenge Task 4

  5. arXiv:2406.02483  [pdf, other]

    eess.AS cs.AI cs.SD

    How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?

    Authors: Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li

    Abstract: Partially manipulating a sentence can greatly change its meaning. Recent work shows that countermeasures (CMs) trained on partially spoofed audio can effectively detect such spoofing. However, the current understanding of the decision-making process of CMs is limited. We utilize Grad-CAM and introduce a quantitative analysis metric to interpret CMs' decisions. We find that CMs prioritize the artif…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  6. arXiv:2404.17280  [pdf, other]

    cs.SD eess.AS

    Device Feature based on Graph Fourier Transformation with Logarithmic Processing For Detection of Replay Speech Attacks

    Authors: Mingrui He, Longting Xu, Han Wang, Mingjun Zhang, Rohan Kumar Das

    Abstract: The most common spoofing attacks on automatic speaker verification systems are replay speech attacks. Detection of replay speech heavily relies on replay configuration information. Previous studies have shown that graph Fourier transform-derived features can effectively detect replay speech but ignore device and environmental noise effects. In this work, we propose a new feature, the graph frequen…

    Submitted 26 April, 2024; originally announced April 2024.

  7. arXiv:2404.09342  [pdf, other]

    cs.CV cs.SD eess.AS

    Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

    Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

    Abstract: The advancements of technology have led to the use of multimodal systems in various real-world applications. Among them, the audio-visual systems are one of the widely used multimodal systems. In the recent years, associating face and voice of a person has gained attention due to presence of unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2…

    Submitted 22 July, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: ACM Multimedia Conference - Grand Challenge

  8. arXiv:2402.02781  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Dual Knowledge Distillation for Efficient Sound Event Detection

    Authors: Yang Xiao, Rohan Kumar Das

    Abstract: Sound event detection (SED) is essential for recognizing specific sounds and their temporal locations within acoustic signals. This becomes challenging particularly for on-device applications, where computational resources are limited. To address this issue, we introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems in this work. Our proposed dua…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024 (Deep Neural Network Model Compression Workshop)

  9. arXiv:2401.07944  [pdf, ps, other]

    cs.CL

    SemEval-2017 Task 4: Sentiment Analysis in Twitter using BERT

    Authors: Rupak Kumar Das, Dr. Ted Pedersen

    Abstract: This paper uses the BERT model, which is a transformer-based architecture, to solve task 4A, English Language, Sentiment Analysis in Twitter of SemEval2017. BERT is a very powerful large language model for classification tasks when the amount of training data is small. For this experiment, we have used the BERT(BASE) model, which has 12 hidden layers. This model provides better accuracy, precision…

    Submitted 19 June, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

  10. arXiv:2401.00959  [pdf, other]

    cs.HC

    Creating an Intelligent Dementia-Friendly Living Space: A Feasibility Study Integrating Assistive Robotics, Wearable Sensors, and Spatial Technology

    Authors: Arshia A Khan, Rupak Kumar Das, Anna Martin, Dale Dowling, Rana Imtiaz

    Abstract: This study investigates the integration of assistive therapeutic robotics, wearable sensors, and spatial sensors within an intelligent environment tailored for dementia care. The feasibility study aims to assess the collective impact of these technologies in enhancing care giving by seamlessly integrating supportive technology in the background. The wearable sensors track physiological data, while…

    Submitted 1 January, 2024; originally announced January 2024.

  11. Future Industrial Applications: Exploring LPWAN-Driven IoT Protocols

    Authors: Mahbubul Islam, Hossain Md. Mubashshir Jamil, Samiul Ahsan Pranto, Rupak Kumar Das, Al Amin, Arshia Khan

    Abstract: The Internet of Things (IoT) will bring about the next industrial revolution in Industry 4.0. The communication aspect of IoT devices is one of the most critical factors in choosing the suitable device for the suitable usage. So far, the IoT physical layer communication challenges have been met with various communications protocols that provide varying strengths and weaknesses. Moreover, most of t…

    Submitted 19 January, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Report number: s24082509

    Journal ref: Sensors 2024, 24, 2509

  12. arXiv:2305.15901  [pdf, other]

    cs.LG

    Consistent Optimal Transport with Empirical Conditional Measures

    Authors: Piyushi Manupriya, Rachit Keerti Das, Sayantan Biswas, Saketha Nath Jagarlapudi

    Abstract: Given samples from two joint distributions, we consider the problem of Optimal Transportation (OT) between them when conditioned on a common variable. We focus on the general setting where the conditioned variable may be continuous, and the marginals of this variable in the two joint distributions may not be the same. In such settings, standard OT variants cannot be employed, and novel estimation…

    Submitted 10 June, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  13. arXiv:2211.01091  [pdf, ps, other]

    eess.AS cs.AI cs.SD

    I4U System Description for NIST SRE'20 CTS Challenge

    Authors: Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Deldago, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, et al. (1 additional author not shown)

    Abstract: This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge. The I4U's submission was resulted from active collaboration among researchers across eight research teams - I$^2$R (Singapore), UEF (Finland), VALPT (Italy, Spain), NEC (Japan), THUEE (China), LIA (France), NUS (Singapore), INRIA (France) and TJU (C…

    Submitted 2 November, 2022; originally announced November 2022.

    Comments: SRE 2021, NIST Speaker Recognition Evaluation Workshop, CTS Speaker Recognition Challenge, 14-12 December 2021

  14. arXiv:2210.15385  [pdf, other]

    eess.AS cs.SD eess.SP

    Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

    Authors: Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

    Abstract: We study a novel neural architecture and its training strategies of speaker encoder for speaker recognition without using any identity labels. The speaker encoder is trained to extract a fixed-size speaker embedding from a spoken utterance of various length. Contrastive learning is a typical self-supervised learning technique. However, the quality of the speaker encoder depends very much on the sa…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: 13 pages

  15. arXiv:2202.01624  [pdf, other]

    cs.SD cs.CL eess.AS eess.SP

    MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

    Authors: Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li

    Abstract: The time delay neural network (TDNN) represents one of the state-of-the-art of neural solutions to text-independent speaker verification. However, they require a large number of filters to capture the speaker characteristics at any local frequency region. In addition, the performance of such systems may degrade under short utterance scenarios. To address these issues, we propose a multi-scale freq…

    Submitted 15 February, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: Accepted by ICASSP 2022

  16. arXiv:2112.04573  [pdf]

    cs.DL cs.AI cs.LG

    Application of Artificial Intelligence and Machine Learning in Libraries: A Systematic Review

    Authors: Rajesh Kumar Das, Mohammad Sharif Ul Islam

    Abstract: As the concept and implementation of cutting-edge technologies like artificial intelligence and machine learning has become relevant, academics, researchers and information professionals involve research in this area. The objective of this systematic literature review is to provide a synthesis of empirical studies exploring application of artificial intelligence and machine learning in libraries.…

    Submitted 6 December, 2021; originally announced December 2021.

  17. arXiv:2110.00797  [pdf, other]

    eess.AS cs.SD

    Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

    Authors: Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

    Abstract: The automatic recognition of pathological speech, particularly from children with any articulatory impairment, is a challenging task due to various reasons. The lack of available domain specific data is one such obstacle that hinders its usage for different speech-based applications targeting pathological speakers. In line with the challenge, in this work, we investigate a few data augmentation te…

    Submitted 2 October, 2021; originally announced October 2021.

  18. arXiv:2109.08007  [pdf, other]

    cs.MM cs.SD eess.AS

    Graph Fourier Transform based Audio Zero-watermarking

    Authors: Longting Xu, Daiyu Huang, Syed Faham Ali Zaidi, Abdul Rauf, Rohan Kumar Das

    Abstract: The frequent exchange of multimedia information in the present era projects an increasing demand for copyright protection. In this work, we propose a novel audio zero-watermarking technology based on graph Fourier transform for enhancing the robustness with respect to copyright protection. In this approach, the combined shift operator is used to construct the graph signal, upon which the graph Fou…

    Submitted 16 September, 2021; originally announced September 2021.

  19. arXiv:2107.06592  [pdf, other]

    eess.AS cs.SD eess.IV

    Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection

    Authors: Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li

    Abstract: Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers. The successful ASD depends on accurate interpretation of short-term and long-term audio and visual information, as well as audio-visual interaction. Unlike the prior work where systems make decision instantaneously using short-term features, we propose a novel framework, named TalkNet, that ma…

    Submitted 25 July, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: ACM Multimedia 2021

  20. arXiv:2010.03909  [pdf, other]

    eess.AS cs.SD

    Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech

    Authors: Biswajit Dev Sarma, Rohan Kumar Das

    Abstract: Emotional state of a speaker is found to have significant effect in speech production, which can deviate speech from that arising from neutral state. This makes identifying speakers with different emotions a challenging task as generally the speaker models are trained using neutral speech. In this work, we propose to overcome this problem by creation of emotion invariant speaker embedding. We lear…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: Accepted for publication in APSIPA ASC 2020

  21. arXiv:2010.03907  [pdf, ps, other]

    eess.AS cs.SD

    Classification of Speech with and without Face Mask using Acoustic Features

    Authors: Rohan Kumar Das, Haizhou Li

    Abstract: The understanding and interpretation of speech can be affected by various external factors. The use of face masks is one such factors that can create obstruction to speech while communicating. This may lead to degradation of speech processing and affect humans perceptually. Knowing whether a speaker wears a mask may be useful for modeling speech for different applications. With this motivation, fi…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: Accepted for publication in APSIPA ASC 2020

  22. arXiv:2010.03905  [pdf, other]

    eess.AS cs.SD

    HLT-NUS Submission for NIST 2019 Multimedia Speaker Recognition Evaluation

    Authors: Rohan Kumar Das, Ruijie Tao, Jichen Yang, Wei Rao, Cheng Yu, Haizhou Li

    Abstract: This work describes the speaker verification system developed by Human Language Technology Laboratory, National University of Singapore (HLT-NUS) for 2019 NIST Multimedia Speaker Recognition Evaluation (SRE). The multimedia research has gained attention to a wide range of applications and speaker recognition is no exception to it. In contrast to the previous NIST SREs, the latest edition focuses o…

    Submitted 8 October, 2020; originally announced October 2020.

    Comments: Accepted for publication in APSIPA ASC 2020

  23. arXiv:2009.03554  [pdf, other]

    eess.AS cs.SD

    Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions

    Authors: Rohan Kumar Das, Tomi Kinnunen, Wen-Chin Huang, Zhenhua Ling, Junichi Yamagishi, Yi Zhao, Xiaohai Tian, Tomoki Toda

    Abstract: The Voice Conversion Challenge 2020 is the third edition under its flagship that promotes intra-lingual semiparallel and cross-lingual voice conversion (VC). While the primary evaluation of the challenge submissions was done through crowd-sourced listening tests, we also performed an objective assessment of the submitted systems. The aim of the objective assessment is to provide complementary perf…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: Submitted to ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

  24. arXiv:2008.12527  [pdf, other]

    eess.AS cs.SD

    Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

    Authors: Yi Zhao, Wen-Chin Huang, Xiaohai Tian, Junichi Yamagishi, Rohan Kumar Das, Tomi Kinnunen, Zhenhua Ling, Tomoki Toda

    Abstract: The voice conversion challenge is a bi-annual scientific event held to compare and understand different voice conversion (VC) systems built on a common dataset. In 2020, we organized the third edition of the challenge and constructed and distributed a new database for two tasks, intra-lingual semi-parallel and cross-lingual VC. After a two-month challenge period, we received 33 submissions, includ…

    Submitted 28 August, 2020; originally announced August 2020.

    Comments: Submitted to ISCA Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

  25. arXiv:2008.08901  [pdf, other]

    eess.AS cs.CL cs.SD eess.SP

    Speaker-Utterance Dual Attention for Speaker and Utterance Verification

    Authors: Tianchi Liu, Rohan Kumar Das, Maulik Madhavi, Shengmei Shen, Haizhou Li

    Abstract: In this paper, we study a novel technique that exploits the interaction between speaker traits and linguistic content to improve both speaker verification and utterance verification performance. We implement an idea of speaker-utterance dual attention (SUDA) in a unified neural network. The dual attention refers to an attention mechanism for the two tasks of speaker and utterance verification. The…

    Submitted 20 August, 2020; originally announced August 2020.

    Comments: Accepted by Interspeech 2020

  26. arXiv:2005.08046  [pdf, other]

    eess.AS cs.SD

    The INTERSPEECH 2020 Far-Field Speaker Verification Challenge

    Authors: Xiaoyi Qin, Ming Li, Hui Bu, Wei Rao, Rohan Kumar Das, Shrikanth Narayanan, Haizhou Li

    Abstract: The INTERSPEECH 2020 Far-Field Speaker Verification Challenge (FFSVC 2020) addresses three different research problems under well-defined conditions: far-field text-dependent speaker verification from single microphone array, far-field text-independent speaker verification from single microphone array, and far-field text-dependent speaker verification from distributed microphone arrays. All three…

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: Submitted to INTERSPEECH 2020

  27. arXiv:2004.08849  [pdf, other]

    eess.AS cs.CR

    The Attacker's Perspective on Automatic Speaker Verification: An Overview

    Authors: Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li

    Abstract: Security of automatic speaker verification (ASV) systems is compromised by various spoofing attacks. While many types of non-proactive attacks (and their defenses) have been studied in the past, attacker's perspective on ASV, represents a far less explored direction. It can potentially help to identify the weakest parts of ASV systems and be used to develop attacker-aware systems. We present an ov…

    Submitted 19 April, 2020; originally announced April 2020.

    Comments: 5 pages, 1 figure, Submitted to Interspeech 2020

  28. arXiv:2002.00387  [pdf, other]

    cs.SD eess.AS

    The FFSVC 2020 Evaluation Plan

    Authors: Xiaoyi Qin, Ming Li, Hui Bu, Rohan Kumar Das, Wei Rao, Shrikanth Narayanan, Haizhou Li

    Abstract: The Far-Field Speaker Verification Challenge 2020 (FFSVC20) is designed to boost the speaker verification research with special focus on far-field distributed microphone arrays under noisy conditions in real scenarios. The objectives of this challenge are to: 1) benchmark the current speech verification technology under this challenging condition, 2) promote the development of new ideas and techno…

    Submitted 4 February, 2020; v1 submitted 2 February, 2020; originally announced February 2020.

  29. arXiv:1904.07386  [pdf, other]

    eess.AS cs.CL cs.SD

    I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

    Authors: Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, et al. (21 additional authors not shown)

    Abstract: The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of I4U consortium into NIST SRE series of evaluation. The primary objective of the current paper is to summarize the res…

    Submitted 15 April, 2019; originally announced April 2019.

    Comments: 5 pages

  30. arXiv:1809.06798  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Generative x-vectors for text-independent speaker verification

    Authors: Longting Xu, Rohan Kumar Das, Emre Yılmaz, Jichen Yang, Haizhou Li

    Abstract: Speaker verification (SV) systems using deep neural network embeddings, so-called the x-vector systems, are becoming popular due to its good performance superior to the i-vector systems. The fusion of these systems provides improved performance benefiting both from the discriminatively trained x-vectors and generative i-vectors capturing distinct speaker characteristics. In this paper, we propose…

    Submitted 17 September, 2018; originally announced September 2018.

    Comments: Accepted for publication at SLT 2018

  31. arXiv:1403.2508  [pdf, other]

    cs.DC

    Heuristic-based Optimal Resource Provisioning in Application-centric Cloud

    Authors: Sunirmal Khatua, Preetam K. Sur, Rajib K. Das, Nandini Mukherjee

    Abstract: Cloud Service Providers (CSPs) adapt different pricing models for their offered services. Some of the models are suitable for short term requirement while others may be suitable for the Cloud Service User's (CSU) long term requirement. In this paper, we look at the problem of finding the amount of resources to be reserved to satisfy the CSU's long term demands with the aim of minimizing the total…

    Submitted 11 March, 2014; originally announced March 2014.

  32. arXiv:1310.7376  [pdf, ps, other]

    cs.DC

    Eccentricity of the nodes of OTIS-cube and Enhanced OTIS-cube

    Authors: Rajib K Das

    Abstract: In this paper we have classified the nodes of OTIS-cube based on their eccentricities. OTIS (optical transpose interconnection system) is a large scale optoelectronic computer architecture, proposed in \cite{KMKE92}, that benefit from both optical and electronic technologies. We show that radius and diameter of OTIS-$Q_n$ is $n+1$ and $2n+1$ respectively. We also show that average eccentricity of…

    Submitted 28 October, 2013; originally announced October 2013.