Showing 1–50 of 1,143 results for author: Wang, X

Searching in archive eess.
  1. arXiv:2409.02444  [pdf, other]

    cs.RO eess.SY

    USV-AUV Collaboration Framework for Underwater Tasks under Extreme Sea Conditions

    Authors: Jingzehua Xu, Guanwen Xie, Xinqi Wang, Yiyuan Yang, Shuai Zhang

    Abstract: Autonomous underwater vehicles (AUVs) are valuable for ocean exploration due to their flexibility and ability to carry communication and detection units. Nevertheless, AUVs alone often face challenges in harsh and extreme sea conditions. This study introduces an unmanned surface vehicle (USV)-AUV collaboration framework, which includes high-precision multi-AUV positioning using USV path planning vi…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02336  [pdf, other]

    eess.SY

    Comparative Analysis of Learning-Based Methods for Transient Stability Assessment

    Authors: Xingjian Wu, Xiaoting Wang, Xiaozhe Wang, Peter E. Caines, Jingyu Liu

    Abstract: Transient stability and critical clearing time (CCT) are important concepts in power system protection and control. This paper explores and compares various learning-based methods for predicting CCT under uncertainties arising from renewable generation, loads, and contingencies. Specifically, we introduce new definitions of transient stability (B-stability) and CCT from an engineering perspective. Fo…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted for presentation at the 56th North American Power Symposium (NAPS)

  3. Clutter Suppression, Time-Frequency Synchronization, and Sensing Parameter Association in Asynchronous Perceptive Vehicular Networks

    Authors: Xiao-Yang Wang, Shaoshi Yang, Jianhua Zhang, Christos Masouros, Ping Zhang

    Abstract: Significant challenges remain for realizing precise positioning and velocity estimation in perceptive vehicular networks (PVN) enabled by the emerging integrated sensing and communication technology. First, the complicated wireless propagation environment generates undesired clutter, which degrades the vehicular sensing performance and increases the computational complexity. Second, in practical PVN,…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 18 pages, 13 figures, 3 tables, accepted for publication in IEEE Journal on Selected Areas in Communications, vol. 42, no. 10, Oct. 2024

  4. Windowing Optimization for Fingerprint-Spectrum-Based Passive Sensing in Perceptive Mobile Networks

    Authors: Xiao-Yang Wang, Shaoshi Yang, Hou-Yu Zhai, Christos Masouros, J. Andrew Zhang

    Abstract: Perceptive mobile networks (PMN) have been widely recognized as a pivotal pillar for the sixth generation (6G) mobile communication systems. However, the asynchronicity between transmitters and receivers results in velocity and range ambiguity, which seriously degrades the sensing performance. To mitigate the ambiguity, carrier frequency offset (CFO) and time offset (TO) synchronizations have been…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 16 pages, 12 figures, accepted for publication in IEEE Transactions on Communications, Aug. 2024

  5. arXiv:2409.00753  [pdf, other]

    cs.LG eess.SY

    Generalized Multi-hop Traffic Pressure for Heterogeneous Traffic Perimeter Control

    Authors: Xiaocan Li, Xiaoyu Wang, Ilia Smirnov, Scott Sanner, Baher Abdulhai

    Abstract: Perimeter control prevents loss of traffic network capacity due to congestion in urban areas. Homogeneous perimeter control allows all access points to a protected region to have the same maximal permitted inflow. However, homogeneous perimeter control performs poorly when the congestion in the protected region is heterogeneous (e.g., imbalanced demand) since the homogeneous perimeter control does…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 21 pages main body, 12 figures, journal paper

  6. arXiv:2409.00481  [pdf, other]

    eess.AS cs.SD

    DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module

    Authors: Xinyu Wang, Qian Wang

    Abstract: Speech recognition is the technology that enables machines to interpret and process human speech, converting spoken language into text or commands. This technology is essential for applications such as virtual assistants, transcription services, and communication tools. The Audio-Visual Speech Recognition (AVSR) model enhances traditional speech recognition, particularly in noisy environments, by…

    Submitted 31 August, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP 2025

  7. arXiv:2409.00387  [pdf, other]

    eess.AS cs.SD

    Progressive Residual Extraction based Pre-training for Speech Representation Learning

    Authors: Tianrui Wang, Jin Li, Ziyang Ma, Rui Cao, Xie Chen, Longbiao Wang, Meng Ge, Xiaobao Wang, Yuguang Wang, Jianwu Dang, Nyima Tashi

    Abstract: Self-supervised learning (SSL) has garnered significant attention in speech processing, excelling in linguistic tasks such as speech recognition. However, jointly improving the performance of pre-trained models on various downstream tasks, each requiring different speech information, poses significant challenges. To this end, we propose a progressive residual extraction based self-supervised l…

    Submitted 31 August, 2024; originally announced September 2024.

  8. arXiv:2409.00130  [pdf]

    eess.SP cs.AI cs.LG

    Mirror contrastive loss based sliding window transformer for subject-independent motor imagery based EEG signal recognition

    Authors: Jing Luo, Qi Mao, Weiwei Shi, Zhenghao Shi, Xiaofan Wang, Xiaofeng Lu, Xinhong Hei

    Abstract: While deep learning models have been extensively utilized in motor imagery based EEG signal recognition, they often operate as black boxes. Motivated by neurological findings indicating that the mental imagery of left or right-hand movement induces event-related desynchronization (ERD) in the contralateral sensorimotor area of the brain, we propose a Mirror Contrastive Loss based Sliding Window Tr…

    Submitted 29 August, 2024; originally announced September 2024.

    Comments: This paper has been accepted by the Fourth International Workshop on Human Brain and Artificial Intelligence, joint workshop of the 33rd International Joint Conference on Artificial Intelligence, Jeju Island, South Korea, from August 3rd to August 9th, 2024

  9. arXiv:2409.00036  [pdf, other]

    cs.IT cs.LG cs.MA eess.SY

    GNN-Empowered Effective Partial Observation MARL Method for AoI Management in Multi-UAV Network

    Authors: Yuhao Pan, Xiucheng Wang, Zhiyao Xu, Nan Cheng, Wenchao Xu, Jun-jie Zhang

    Abstract: Unmanned Aerial Vehicles (UAVs), due to their low cost and high flexibility, have been widely used in various scenarios to enhance network performance. However, the optimization of UAV trajectories in unknown areas or areas without sufficient prior information still faces challenges related to poor planning performance and low distributed execution. These challenges arise when UAVs rely solely on…

    Submitted 17 August, 2024; originally announced September 2024.

  10. arXiv:2408.16707  [pdf, other]

    cs.LG eess.SP

    Enhanced forecasting of stock prices based on variational mode decomposition, PatchTST, and adaptive scale-weighted layer

    Authors: Xiaorui Xue, Shaofang Li, Xiaonan Wang

    Abstract: The significant fluctuations in stock index prices in recent years highlight the critical need for accurate forecasting to guide investment and financial strategies. This study introduces a novel composite forecasting framework that integrates variational mode decomposition (VMD), PatchTST, and adaptive scale-weighted layer (ASWL) to address these challenges. Utilizing datasets of four major stock…

    Submitted 29 August, 2024; originally announced August 2024.

  11. A Control Theoretic Approach to Simultaneously Estimate Average Value of Time and Determine Dynamic Price for High-occupancy Toll Lanes

    Authors: Xuting Wang, Wen-Long Jin, Yafeng Yin

    Abstract: The dynamic pricing problem of a freeway corridor with high-occupancy toll (HOT) lanes was formulated and solved based on a point queue abstraction of the traffic system [Yin and Lou, 2009]. However, existing pricing strategies cannot guarantee that the closed-loop system converges to the optimal state, in which the HOT lanes' capacity is fully utilized but there is no queue on the HOT lanes, and…

    Submitted 28 August, 2024; originally announced August 2024.

    Comments: 34 pages, 16 figures

    Journal ref: IEEE Trans. on ITS, 22(11):7293-7305, 2021

  12. arXiv:2408.15490  [pdf, ps, other]

    eess.SP

    Symbiotic Sensing and Communication: Framework and Beamforming Design

    Authors: Fanghao Xia, Zesong Fei, Xinyi Wang, Weijie Yuan, Qingqing Wu, Yuanwei Liu, Tony Q. S. Quek

    Abstract: In this paper, we propose a novel symbiotic sensing and communication (SSAC) framework, comprising a base station (BS) and a passive sensing node. In particular, the BS transmits communication waveforms to serve vehicle users (VUEs), while the sensing node is employed to execute sensing tasks based on the echoes in a bistatic manner, thereby avoiding the issue of self-interference. Besides the weak…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 16 pages, 11 figures, submitted to IEEE journals for possible publication

  13. arXiv:2408.15481  [pdf, ps, other]

    eess.SP

    Joint Offloading and Beamforming Design in Integrating Sensing, Communication, and Computing Systems: A Distributed Approach

    Authors: Peng Liu, Zesong Fei, Xinyi Wang, Jingxuan Huang, Jie Hu, J. Andrew Zhang

    Abstract: When applying integrated sensing and communications (ISAC) in future mobile networks, many sensing tasks have low latency requirements, preferably being implemented at terminals. However, terminals often have limited computing capabilities and energy supply. In this paper, we investigate the effectiveness of leveraging the advanced computing capabilities of mobile edge computing (MEC) servers and…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: 15 pages, 12 figures, submitted to IEEE journals for possible publication

  14. Stable dynamic pricing scheme independent of lane-choice models for high-occupancy-toll lanes

    Authors: Wen-Long Jin, Xuting Wang, Yingyan Lou

    Abstract: A stable dynamic pricing scheme is essential to guarantee the desired performance of high-occupancy-toll (HOT) lanes, where single-occupancy vehicles (SOVs) can pay a price to use the HOT lanes. But existing methods apply either to only one type of lane-choice model with unknown parameters or to different types of lane-choice models with known parameters. In this study we present a new dynamic p…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 24 pages, 10 figures

    Journal ref: Transportation Research Part B, 140:64-78, 2020

  15. arXiv:2408.14460  [pdf, other]

    eess.SP cs.NI

    Cloud-Based Federation Framework and Prototype for Open, Scalable, and Shared Access to NextG and IoT Testbeds

    Authors: Maxwell McManus, Tenzin Rinchen, Annoy Dey, Sumanth Thota, Zhaoxi Zhang, Jiangqi Hu, Xi Wang, Mingyue Ji, Nicholas Mastronarde, Elizabeth Serena Bentley, Michael Medley, Zhangyu Guan

    Abstract: In this work, we present a new federation framework for UnionLabs, an innovative cloud-based resource-sharing infrastructure designed for next-generation (NextG) and Internet of Things (IoT) over-the-air (OTA) experiments. The framework aims to reduce the federation complexity for testbed developers by automating tedious backend operations, thereby providing scalable federation and remote access…

    Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  16. arXiv:2408.14066  [pdf, other]

    cs.SD cs.CR eess.AS

    A Preliminary Case Study on Long-Form In-the-Wild Audio Spoofing Detection

    Authors: Xuechen Liu, Xin Wang, Junichi Yamagishi

    Abstract: Audio spoofing detection has become increasingly important due to the rise in real-world cases. Current spoofing detectors, referred to as spoofing countermeasures (CM), are mainly trained and focused on audio waveforms with a single speaker and short duration. This study explores spoofing detection in more realistic scenarios, where the audio is long in duration and features multiple speakers and…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: Accepted to the 23rd International Conference of the Biometrics Special Interest Group (BIOSIG 2024). Copyright might be transferred, in such case the current version may be replaced

  17. arXiv:2408.13978  [pdf, other]

    eess.IV cs.CV

    Histology Virtual Staining with Mask-Guided Adversarial Transfer Learning for Tertiary Lymphoid Structure Detection

    Authors: Qiuli Wang, Yongxu Liu, Li Ma, Xianqi Wang, Wei Chen, Xiaohong Yao

    Abstract: Histological Tertiary Lymphoid Structures (TLSs) are increasingly recognized for their correlation with the efficacy of immunotherapy in various solid tumors. Traditionally, the identification and characterization of TLSs rely on immunohistochemistry (IHC) staining techniques, utilizing markers such as CD20 for B cells. Despite the specificity of IHC, Hematoxylin-Eosin (H&E) staining offers a more…

    Submitted 25 August, 2024; originally announced August 2024.

    Comments: 8 pages, 8 figures

  18. arXiv:2408.12829  [pdf, other]

    cs.LG cs.SD eess.AS

    Uncertainty-Aware Mean Opinion Score Prediction

    Authors: Hui Wang, Shiwan Zhao, Jiaming Zhou, Xiguang Zheng, Haoqin Sun, Xuechen Wang, Yong Qin

    Abstract: Mean Opinion Score (MOS) prediction has made significant progress in specific domains. However, the unstable performance of MOS prediction models across diverse samples presents ongoing challenges in the practical application of these systems. In this paper, we point out that the absence of uncertainty modeling is a significant limitation hindering MOS prediction systems from applying to the real…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: Accepted by Interspeech 2024, oral

  19. arXiv:2408.11295  [pdf, ps, other]

    eess.SP

    Channel Modeling Framework for Both Communications and Bistatic Sensing Under 3GPP Standard

    Authors: Chenhao Luo, Aimin Tang, Fei Gao, Jianguo Liu, Xudong Wang

    Abstract: Integrated sensing and communications (ISAC) is considered a promising technology in the B5G/6G networks. The channel model is essential for an ISAC system to evaluate the communication and sensing performance. Most existing channel modeling studies focus on the monostatic ISAC channel. In this paper, the channel modeling framework for bistatic ISAC is considered. The proposed channel modeling fra…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Journal of Selected Areas in Sensors. Part of this work was presented at VTC-Spring 2024

  20. arXiv:2408.10853  [pdf, other]

    cs.SD cs.AI eess.AS

    Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

    Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

    Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based a…

    Submitted 20 August, 2024; originally announced August 2024.

  21. arXiv:2408.10852  [pdf, other]

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t…

    Submitted 20 August, 2024; originally announced August 2024.

  22. arXiv:2408.10849  [pdf, other]

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  23. arXiv:2408.10287  [pdf]

    physics.optics cs.AI eess.IV

    Recognizing Beam Profiles from Silicon Photonics Gratings using Transformer Model

    Authors: Yu Dian Lim, Hong Yu Li, Simon Chun Kiat Goh, Xiangyu Wang, Peng Zhao, Chuan Seng Tan

    Abstract: Over the past decade, there has been extensive work in developing integrated silicon photonics (SiPh) gratings for the optical addressing of trapped ion qubits in the ion trap quantum computing community. However, when viewing beam profiles from infrared (IR) cameras, it is often difficult to determine the corresponding heights where the beam profiles are located. In this work, we developed transf…

    Submitted 22 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  24. arXiv:2408.09920  [pdf, other]

    cs.CV cs.MM eess.IV

    Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement

    Authors: Kang Xiao, Xu Wang, Yulin He, Baoliang Chen, Xuelin Shen

    Abstract: Full-reference image quality assessment (FR-IQA) models generally operate by measuring the visual differences between a degraded image and its reference. However, existing FR-IQA models including both the classical ones (e.g., PSNR and SSIM) and deep-learning based measures (e.g., LPIPS and DISTS) still exhibit limitations in capturing the full perception characteristics of the human visual system (HV…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 6 pages, 5 figures, accepted by ICME2024

  25. arXiv:2408.09731  [pdf, other]

    eess.IV cs.CV

    Reconstruct Spine CT from Biplanar X-Rays via Diffusion Learning

    Authors: Zhi Qiao, Xuhui Liu, Xiaopeng Wang, Runkun Liu, Xiantong Zhen, Pei Dong, Zhen Qian

    Abstract: Intraoperative CT imaging serves as a crucial resource for surgical guidance; however, it may not always be readily accessible or practical to implement. In scenarios where CT imaging is not an option, reconstructing CT scans from X-rays can offer a viable alternative. In this paper, we introduce an innovative method for 3D CT reconstruction utilizing biplanar X-rays. Distinct from previous resear…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  26. arXiv:2408.09491  [pdf, other]

    cs.SD eess.AS

    A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

    Authors: Yangze Li, Xiong Wang, Songjun Cao, Yike Zhang, Long Ma, Lei Xie

    Abstract: Audio-LLM introduces audio modality into a large language model (LLM) to enable a powerful LLM to recognize, understand, and generate audio. However, during speech recognition in noisy environments, we observed the presence of illusions and repetition issues in audio-LLM, leading to substitution and insertion errors. This paper proposes a transcription prompt-based audio-LLM by introducing an ASR…

    Submitted 18 August, 2024; originally announced August 2024.

  27. arXiv:2408.09367  [pdf, other]

    eess.IV cs.CV

    Improving Lung Cancer Diagnosis and Survival Prediction with Deep Learning and CT Imaging

    Authors: Xiawei Wang, James Sharpnack, Thomas C. M. Lee

    Abstract: Lung cancer is a major cause of cancer-related deaths, and early diagnosis and treatment are crucial for improving patients' survival outcomes. In this paper, we propose to employ convolutional neural networks to model the non-linear relationship between the risk of lung cancer and the lungs' morphology revealed in the CT images. We apply a mini-batched loss that extends the Cox proportional hazar…

    Submitted 18 August, 2024; originally announced August 2024.

  28. arXiv:2408.09300  [pdf, other]

    eess.AS cs.CR cs.LG cs.SD

    Malacopula: adversarial automatic speaker verification attacks using a neural-based generalised Hammerstein model

    Authors: Massimiliano Todisco, Michele Panariello, Xin Wang, Héctor Delgado, Kong Aik Lee, Nicholas Evans

    Abstract: We present Malacopula, a neural-based generalised Hammerstein model designed to introduce adversarial perturbations to spoofed speech utterances so that they better deceive automatic speaker verification (ASV) systems. Using non-linear processes to modify speech utterances, Malacopula enhances the effectiveness of spoofing attacks. The model comprises parallel branches of polynomial functions foll…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted at ASVspoof Workshop 2024

  29. arXiv:2408.09265  [pdf, other]

    cs.CR cs.LG cs.NI eess.SY

    ByCAN: Reverse Engineering Controller Area Network (CAN) Messages from Bit to Byte Level

    Authors: Xiaojie Lin, Baihe Ma, Xu Wang, Guangsheng Yu, Ying He, Ren Ping Liu, Wei Ni

    Abstract: As the primary standard protocol for modern cars, the Controller Area Network (CAN) is a critical research target for automotive cybersecurity threats and autonomous applications. As the decoding specification of CAN is a proprietary black-box maintained by Original Equipment Manufacturers (OEMs), conducting related research and industry developments can be challenging without a comprehensive unde…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Internet of Things Journal, 15 pages, 5 figures, 6 tables

  30. arXiv:2408.08881  [pdf, other]

    eess.IV cs.AI cs.CV

    U-MedSAM: Uncertainty-aware MedSAM for Medical Image Segmentation

    Authors: Xin Wang, Xiaoyu Liu, Peng Huang, Pu Huang, Shu Hu, Hongtu Zhu

    Abstract: Medical Image Foundation Models have proven to be powerful tools for mask prediction across various datasets. However, accurately assessing the uncertainty of their predictions remains a significant challenge. To address this, we propose a new model, U-MedSAM, which integrates the MedSAM model with an uncertainty-aware loss function and the Sharpness-Aware Minimization (SharpMin) optimizer. The un…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.17496

  31. arXiv:2408.08849  [pdf, other]

    eess.SP

    ECG-Chat: A Large ECG-Language Model for Cardiac Disease Diagnosis

    Authors: Yubao Zhao, Tian Zhang, Xu Wang, Puyu Han, Tong Chen, Linlin Huang, Youzhu Jin, Jiaju Kang

    Abstract: The success of Multimodal Large Language Models (MLLMs) in the medical auxiliary field shows great potential, allowing patients to engage in conversations using physiological signal data. However, general MLLMs perform poorly in cardiac disease diagnosis, particularly in the integration of ECG data analysis and long-text medical report generation, mainly due to the complexity of ECG data analysis…

    Submitted 16 August, 2024; originally announced August 2024.

  32. arXiv:2408.08739  [pdf, other]

    eess.AS cs.AI cs.SD

    ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale

    Authors: Xin Wang, Hector Delgado, Hemlata Tak, Jee-weon Jung, Hye-jin Shim, Massimiliano Todisco, Ivan Kukanov, Xuechen Liu, Md Sahidullah, Tomi Kinnunen, Nicholas Evans, Kong Aik Lee, Junichi Yamagishi

    Abstract: ASVspoof 5 is the fifth edition in a series of challenges that promote the study of speech spoofing and deepfake attacks, and the design of detection solutions. Compared to previous challenges, the ASVspoof 5 database is built from crowdsourced data collected from a vastly greater number of speakers in diverse acoustic conditions. Attacks, also crowdsourced, are generated and tested using surrogat…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 8 pages, ASVspoof 5 Workshop (Interspeech2024 Satellite)

  33. arXiv:2408.08593  [pdf, other]

    cs.LG eess.SY

    RadioDiff: An Effective Generative Diffusion Model for Sampling-Free Dynamic Radio Map Construction

    Authors: Xiucheng Wang, Keda Tao, Nan Cheng, Zhisheng Yin, Zan Li, Yuan Zhang, Xuemin Shen

    Abstract: Radio map (RM) is a promising technology that can obtain pathloss based on only location, which is significant for 6G network applications to reduce the communication costs for pathloss estimation. However, traditional RM construction is either computationally intensive or depends on costly sampling-based pathloss measurements. Although the neural network (NN)-based method can efficientl…

    Submitted 16 August, 2024; originally announced August 2024.

  34. arXiv:2408.08567  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    S$^3$Attention: Improving Long Sequence Attention with Smoothed Skeleton Sketching

    Authors: Xue Wang, Tian Zhou, Jianqing Zhu, Jialin Liu, Kun Yuan, Tao Yao, Wotao Yin, Rong Jin, HanQin Cai

    Abstract: Attention based models have achieved many remarkable breakthroughs in numerous applications. However, the quadratic complexity of Attention makes the vanilla Attention based models hard to apply to long sequence tasks. Various improved Attention structures are proposed to reduce the computation cost by inducing low rankness and approximating the whole sequence by sub-sequences. The most challengin…

    Submitted 23 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

  35. Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training

    Authors: Yiming Li, Zhifang Guo, Xiangdong Wang, Hong Liu

    Abstract: Recent advances have been witnessed in audio-language joint learning, such as CLAP, which shows much success in multi-modal understanding tasks. These models usually aggregate uni-modal local representations, namely frame or word features, into global ones, on which the contrastive loss is employed to reach coarse-grained cross-modal alignment. However, frame-level correspondence with texts may be…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: ACM MM 2024 (Oral)

  36. arXiv:2408.06922  [pdf, other]

    cs.SD cs.AI eess.AS

    Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge

    Authors: Yuankun Xie, Xiaopeng Wang, Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Haonan Cheng, Long Ye

    Abstract: ASVspoof5, the fifth edition of the ASVspoof series, is one of the largest global audio security challenges. It aims to advance the development of countermeasures (CMs) that discriminate between bona fide and spoofed speech utterances. In this paper, we focus on addressing the problem of open-domain audio deepfake detection, which corresponds directly to the ASVspoof5 Track1 open condition. At first, we compre…

    Submitted 13 August, 2024; originally announced August 2024.

  37. arXiv:2408.05928  [pdf, other]

    cs.SD eess.AS

    Adapting General Disentanglement-Based Speaker Anonymization for Enhanced Emotion Preservation

    Authors: Xiaoxiao Miao, Yuxiang Zhang, Xin Wang, Natalia Tomashenko, Donny Cheng Lock Soh, Ian Mcloughlin

    Abstract: A general disentanglement-based speaker anonymization system typically separates speech into content, speaker, and prosody features using individual encoders. This paper explores how to adapt such a system when a new speech attribute, for example, emotion, needs to be preserved to a greater extent. While existing systems are good at anonymizing speaker embeddings, they are not designed to preserve…

    Submitted 12 August, 2024; originally announced August 2024.

  38. arXiv:2408.05877  [pdf, other]

    eess.IV

    Toward Pedestrian Head Tracking: A Benchmark Dataset and an Information Fusion Network

    Authors: Kailai Sun, Xinwei Wang, Shaobo Liu, Qianchuan Zhao, Gao Huang, Chang Liu

    Abstract: Pedestrian detection and tracking in crowded video sequences have a wide range of applications, including autonomous driving, robot navigation and pedestrian flow surveillance. However, detecting and tracking pedestrians in high-density crowds faces many challenges, including intra-class occlusions, complex motions, and diverse poses. Although deep learning models have achieved remarkable progress…

    Submitted 11 August, 2024; originally announced August 2024.

  39. Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI

    Authors: Lei Zhou, Yuzhong Zhang, Jiadong Zhang, Xuejun Qian, Chen Gong, Kun Sun, Zhongxiang Ding, Xing Wang, Zhenhui Li, Zaiyi Liu, Dinggang Shen

    Abstract: Automated breast tumor segmentation on the basis of dynamic contrast-enhancement magnetic resonance imaging (DCE-MRI) has shown great promise in clinical practice, particularly for identifying the presence of breast disease. However, accurate segmentation of breast tumor is a challenging task, often necessitating the development of complex networks. To strike an optimal trade-off between computati…

    Submitted 11 August, 2024; originally announced August 2024.

    Journal ref: IEEE Transactions on Medical Imaging, 2024

  40. arXiv:2408.04755  [pdf, other]

    cs.SE eess.SY

    Automation Configuration in Smart Home Systems: Challenges and Opportunities

    Authors: Sheik Murad Hassan Anik, Xinghua Gao, Hao Zhong, Xiaoyin Wang, Na Meng

    Abstract: With the innovation of smart devices and the internet of things (IoT), smart homes have become prevalent. People tend to transform residences into smart homes by customizing off-the-shelf smart home platforms, instead of creating IoT systems from scratch. Among the alternatives, Home Assistant (HA) is one of the most popular platforms. It allows end-users (i.e., home residents) to smartify homes by (S1)…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 13 pages, 3 figures, 3 tables, 10 listings

  41. arXiv:2408.04300  [pdf, other]

    eess.IV cs.CV

    An Explainable Non-local Network for COVID-19 Diagnosis

    Authors: Jingfu Yang, Peng Huang, Jing Hu, Shu Hu, Siwei Lyu, Xin Wang, Jun Guo, Xi Wu

    Abstract: The CNN has achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) to classify CT images, including COVID-19, common pneumonia, and normal cases, to perform rapid and explainable COVID-19 diagnosis. We built a deep residual 3D attention non-local network that could achieve end-to-end training. The…

    Submitted 8 August, 2024; originally announced August 2024.

  42. arXiv:2408.04063  [pdf, other]

    eess.SY

    From Black Box to Clarity: AI-Powered Smart Grid Optimization with Kolmogorov-Arnold Networks

    Authors: Xiaoting Wang, Yuzhuo Li, Yunwei Li, Gregory Kish

    Abstract: This work is the first to adopt Kolmogorov-Arnold Networks (KAN), a recent breakthrough in artificial intelligence, for smart grid optimization. To fully leverage KAN's interpretability, a general framework is proposed that accounts for complex uncertainties. The stochastic optimal power flow problem in hybrid AC/DC systems is chosen as a particularly challenging case study for demonstrating the effectivenes…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted in Late Breaking Research Publications in 2024 IEEE Energy Conversion Congress and Exposition (ECCE)

  43. arXiv:2408.03885  [pdf, other]

    cs.CV eess.IV

    Global-Local Progressive Integration Network for Blind Image Quality Assessment

    Authors: Xiaoqi Wang, Yun Zhang

    Abstract: Vision transformers (ViTs) excel in computer vision at modeling long-range dependencies, yet they face two key challenges in image quality assessment (IQA): discarding fine details during patch embedding, and requiring extensive training data due to a lack of inductive biases. In this study, we propose a Global-Local progressive INTegration network for IQA, called GlintIQA, to address these issues throu…

    Submitted 7 August, 2024; originally announced August 2024.

  44. arXiv:2408.02178  [pdf, other]

    eess.AS cs.SD

    StreamVoice+: Evolving into End-to-end Streaming Zero-shot Voice Conversion

    Authors: Zhichao Wang, Yuanzhe Chen, Xinsheng Wang, Lei Xie, Yuping Wang

    Abstract: StreamVoice has recently pushed the boundaries of zero-shot voice conversion (VC) in the streaming domain. It uses a streamable language model (LM) with a context-aware approach to convert semantic features from automatic speech recognition (ASR) into acoustic features with the desired speaker timbre. Despite its innovations, StreamVoice faces challenges due to its dependency on a streaming ASR wi…

    Submitted 4 August, 2024; originally announced August 2024.

  45. arXiv:2408.00325  [pdf, other]

    cs.SD eess.AS

    Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition

    Authors: Haoqin Sun, Shiwan Zhao, Xiangyu Kong, Xuechen Wang, Hui Wang, Jiaming Zhou, Yong Qin

    Abstract: Recognizing emotions from speech is a daunting task due to the subtlety and ambiguity of expressions. Traditional speech emotion recognition (SER) systems, which typically rely on a single, precise emotion label, struggle with this complexity. Therefore, modeling the inherent ambiguity of emotions is an urgent problem. In this paper, we propose an iterative prototype refinement framework (IPR) f…

    Submitted 1 August, 2024; originally announced August 2024.

  46. arXiv:2407.20280  [pdf, other]

    eess.SP

    Movable Frequency Diverse Array-Assisted Covert Communication With Multiple Wardens

    Authors: Zihao Cheng, Jiangbo Si, Zan Li, Pengpeng Liu, Xiaoting Wang, Naofal Al-Dhahir

    Abstract: The frequency diverse array (FDA) is highly promising for improving covert communication performance by adjusting the frequency of each antenna at the transmitter. However, in cases with multiple wardens and highly correlated channels, FDA is limited by its frequency constraint and cannot provide satisfactory covert performance. In this paper, we propose a novel movable FDA (MFDA) a…

    Submitted 25 July, 2024; originally announced July 2024.

  47. arXiv:2407.20111  [pdf, other]

    cs.SD eess.AS eess.SP

    Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning

    Authors: Yikang Wang, Xingming Wang, Hiromitsu Nishizaki, Ming Li

    Abstract: Current research in synthesized speech detection primarily focuses on the generalization of detection systems to unknown spoofing methods of noise-free speech. However, anti-spoofing countermeasure (CM) systems often perform considerably worse in more challenging scenarios, such as those involving noise and reverberation. To address the problem of enhancing the robustness of CM syste…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 29 pages, 4 figures, Journal Papers

  48. arXiv:2407.19841  [pdf, other]

    eess.SP cs.AR

    RRAM-Based Bio-Inspired Circuits for Mobile Epileptic Correlation Extraction and Seizure Prediction

    Authors: Hao Wang, Lingfeng Zhang, Erjia Xiao, Xin Wang, Zhongrui Wang, Renjing Xu

    Abstract: Non-invasive mobile electroencephalography (EEG) acquisition systems have been used for long-term monitoring of seizures, yet they suffer from limited battery life. Resistive random access memory (RRAM) is widely used in computing-in-memory (CIM) systems, which offers an ideal platform for reducing the computational energy consumption of seizure prediction algorithms, potentially solving the en…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 7 pages, 5 figures

  49. arXiv:2407.19178  [pdf, other]

    cs.CV eess.SP

    Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection

    Authors: Jiahao Wang, Mingxuan Li, Haichen Luo, Jinguo Zhu, Aijun Yang, Mingzhe Rong, Xiaohua Wang

    Abstract: Power transmission line inspection has made notable progress in the past few years, primarily due to the integration of deep learning technology. However, current inspection approaches still struggle with generalization and intelligence, which restricts their further applicability. In this paper, we introduce Power-LLaVA, the first large language and vision assista…

    Submitted 27 July, 2024; originally announced July 2024.

  50. arXiv:2407.18054  [pdf, other]

    eess.IV cs.CV

    LKCell: Efficient Cell Nuclei Instance Segmentation with Large Convolution Kernels

    Authors: Ziwei Cui, Jingfeng Yao, Lunbin Zeng, Juan Yang, Wenyu Liu, Xinggang Wang

    Abstract: The segmentation of cell nuclei in tissue images stained with hematoxylin and eosin (H&E) is essential for various clinical applications and analyses. Due to the complex characteristics of cellular morphology, a large receptive field is considered crucial for generating high-quality segmentation. However, previous methods face challenges in achieving a balance between the receptiv…

    Submitted 25 July, 2024; originally announced July 2024.