Showing 1–50 of 77 results for author: Xin, Y

Searching in archive cs. Search in all archives.
  1. arXiv:2406.13035  [pdf, other]

    cs.CL

    D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models

    Authors: Zhongwei Wan, Xinjian Wu, Yu Zhang, Yi Xin, Chaofan Tao, Zhihong Zhu, Xin Wang, Siqi Luo, Jing Xiong, Mi Zhang

    Abstract: Efficient inference in Large Language Models (LLMs) is impeded by the growing memory demands of key-value (KV) caching, especially for longer sequences. Traditional KV cache eviction strategies, which prioritize less critical KV-pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce Dynamic Discrimi…

    Submitted 23 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  2. arXiv:2406.07476  [pdf, other]

    cs.CV cs.CL

    VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

    Authors: Zesen Cheng, Sicong Leng, Hang Zhang, Yifei Xin, Xin Li, Guanzheng Chen, Yongxin Zhu, Wenqi Zhang, Ziyang Luo, Deli Zhao, Lidong Bing

    Abstract: In this paper, we present the VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks. Building upon its predecessor, VideoLLaMA 2 incorporates a tailor-made Spatial-Temporal Convolution (STC) connector, which effectively captures the intricate spatial and temporal dynamics of video data…

    Submitted 17 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: ZC, SL, HZ, YX, and XL contributed equally to this project

  3. arXiv:2406.02632  [pdf, other]

    cs.CR cs.LG cs.NI

    Redefining DDoS Attack Detection Using A Dual-Space Prototypical Network-Based Approach

    Authors: Fernando Martinez, Mariyam Mapkar, Ali Alfatemi, Mohamed Rahouti, Yufeng Xin, Kaiqi Xiong, Nasir Ghani

    Abstract: Distributed Denial of Service (DDoS) attacks pose an increasingly substantial cybersecurity threat to organizations across the globe. In this paper, we introduce a new deep learning-based technique for detecting DDoS attacks, a paramount cybersecurity challenge with evolving complexity and scale. Specifically, we propose a new dual-space prototypical network that leverages a unique dual-space loss…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 9 pages, The 33rd International Conference on Computer Communications and Networks (ICCCN 2024)

  4. arXiv:2405.15330  [pdf, other]

    cs.CV cs.LG

    Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model

    Authors: Mingyang Yi, Aoxue Li, Yi Xin, Zhenguo Li

    Abstract: Recently, the strong latent Diffusion Probabilistic Model (DPM) has been applied to high-quality Text-to-Image (T2I) generation (e.g., Stable Diffusion), by injecting the encoded target text prompt into the gradually denoised diffusion image generator. Despite the success of DPM in practice, the mechanism behind it remains to be explored. To fill this blank, we begin by examining the intermediate…

    Submitted 24 May, 2024; originally announced May 2024.

  5. arXiv:2405.14700  [pdf, other]

    cs.CV

    Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference

    Authors: Ting Liu, Xuyang Liu, Liangtao Shi, Zunnan Xu, Siteng Huang, Yi Xin, Quanjun Yin

    Abstract: Parameter-efficient fine-tuning (PEFT) has emerged as a popular approach for adapting pre-trained Vision Transformer (ViT) models to downstream applications. While current PEFT methods achieve parameter efficiency, they overlook GPU memory and time efficiency during both fine-tuning and inference, due to the repeated computation of redundant tokens in the ViT architecture. This falls short of prac…

    Submitted 23 May, 2024; originally announced May 2024.

  6. arXiv:2405.00579  [pdf, other]

    cs.GT

    LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game

    Authors: Jianfeng Lu, Yue Chen, Shuqin Cao, Longbiao Chen, Wei Wang, Yun Xin

    Abstract: Although Hierarchical Federated Learning (HFL) utilizes edge servers (ESs) to alleviate communication burdens, its model performance will be degraded by non-IID data and limited communication resources. Current works often assume that data is uniformly distributed, which however contradicts the heterogeneity of IoT. Solutions of additional model training to check the data distribution inevitably i…

    Submitted 1 May, 2024; originally announced May 2024.

  7. arXiv:2405.00456  [pdf, other]

    cs.LG cs.AI

    Counterfactual Explanations for Deep Learning-Based Traffic Forecasting

    Authors: Rushan Wang, Yanan Xin, Yatao Zhang, Fernando Perez-Cruz, Martin Raubal

    Abstract: Deep learning models are widely used in traffic forecasting and have achieved state-of-the-art prediction accuracy. However, the black-box nature of those models makes the results difficult to interpret by users. This study aims to leverage an Explainable AI approach, counterfactual explanations, to enhance the explainability and usability of deep learning-based traffic forecasting models. Specifi…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 24 pages

  8. arXiv:2403.08133  [pdf, other]

    eess.SP cs.AI cs.IT

    Physics-Inspired Deep Learning Anti-Aliasing Framework in Efficient Channel State Feedback

    Authors: Yu-Chien Lin, Yan Xin, Ta-Sung Lee, Charlie Zhang, Zhi Ding

    Abstract: Acquiring downlink channel state information (CSI) at the base station is vital for optimizing performance in massive Multiple input multiple output (MIMO) Frequency-Division Duplexing (FDD) systems. While deep learning architectures have been successful in facilitating UE-side CSI feedback and gNB-side recovery, the undersampling issue prior to CSI feedback is often overlooked. This issue, which…

    Submitted 12 March, 2024; originally announced March 2024.

  9. arXiv:2402.07485  [pdf, other]

    cs.SD eess.AS

    MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning

    Authors: Hang Zhao, Yifei Xin, Zhesong Yu, Bilei Zhu, Lu Lu, Zejun Ma

    Abstract: In the realm of audio-language pre-training (ALP), the challenge of achieving cross-modal alignment is significant. Moreover, the integration of audio inputs with diverse distributions and task variations poses challenges in developing generic audio-language models. In this study, we present MINT, a novel ALP framework boosting audio-language models through multi-target pre-training and instructio…

    Submitted 11 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  10. arXiv:2402.02242  [pdf, other]

    cs.CV cs.LG

    Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey

    Authors: Yi Xin, Siqi Luo, Haodi Zhou, Junlong Du, Xiaohong Liu, Yue Fan, Qing Li, Yuntao Du

    Abstract: Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability across various downstream vision tasks. However, with state-of-the-art PVMs growing to billions or even trillions of parameters, the standard full fine-tuning paradigm is becoming unsustainable due to high computational and storage demands. In response, researchers are exploring parameter-efficient fine-tuning…

    Submitted 8 February, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

    Comments: 9 pages, 3 figures, 2 tables

  11. arXiv:2401.15953  [pdf, other]

    cs.SD eess.AS

    Masked Audio Modeling with CLAP and Multi-Objective Learning

    Authors: Yifei Xin, Xiulian Peng, Yan Lu

    Abstract: Most existing masked audio modeling (MAM) methods learn audio representations by masking and reconstructing local spectrogram patches. However, the reconstruction loss mainly accounts for the signal-level quality of the reconstructed spectrogram and is still limited in extracting high-level audio semantics. In this paper, we propose to enhance the semantic modeling of MAM by distilling cross-modal…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted by Interspeech2023

  12. arXiv:2401.04332  [pdf, other]

    cs.CV math.AT

    Flexible filtrations for multiparameter persistent homology detect digital images

    Authors: Jiaxing He, Bingzhe Hou, Tieru Wu, Yue Xin

    Abstract: Two important problems in the field of Topological Data Analysis are defining practical multifiltrations on objects and showing ability of TDA to detect the geometry. Motivated by the problems, we construct three multifiltrations named multi-GENEO, multi-DGENEO and mix-GENEO, and prove the stability of both the interleaving distance and multiparameter persistence landscape of multi-GENEO with respe…

    Submitted 1 April, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  13. arXiv:2401.03116  [pdf, other]

    cs.CR cs.LG

    Advancing DDoS Attack Detection: A Synergistic Approach Using Deep Residual Neural Networks and Synthetic Oversampling

    Authors: Ali Alfatemi, Mohamed Rahouti, Ruhul Amin, Sarah ALJamal, Kaiqi Xiong, Yufeng Xin

    Abstract: Distributed Denial of Service (DDoS) attacks pose a significant threat to the stability and reliability of online systems. Effective and early detection of such attacks is pivotal for safeguarding the integrity of networks. In this work, we introduce an enhanced approach for DDoS attack detection by leveraging the capabilities of Deep Residual Neural Networks (ResNets) coupled with synthetic overs…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: 8 pages, 3 figures

  14. arXiv:2312.08733  [pdf, other]

    cs.CV

    VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding

    Authors: Yi Xin, Junlong Du, Qiang Wang, Zhiwen Lin, Ke Yan

    Abstract: Large-scale pre-trained models have achieved remarkable success in various computer vision tasks. A standard approach to leverage these models is to fine-tune all model parameters for downstream tasks, which poses challenges in terms of computational and storage costs. Recently, inspired by Natural Language Processing (NLP), parameter-efficient transfer learning has been successfully applied to vi…

    Submitted 15 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI2024

  15. arXiv:2312.08636  [pdf, other]

    cs.CV

    MmAP : Multi-modal Alignment Prompt for Cross-domain Multi-task Learning

    Authors: Yi Xin, Junlong Du, Qiang Wang, Ke Yan, Shouhong Ding

    Abstract: Multi-Task Learning (MTL) is designed to train multiple correlated tasks simultaneously, thereby enhancing the performance of individual tasks. Typically, a multi-task network structure consists of a shared backbone and task-specific decoders. However, the complexity of the decoders increases with the number of tasks. To tackle this challenge, we integrate the decoder-free vision-language model CL…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  16. arXiv:2312.00006  [pdf, other]

    cs.CR cs.AI

    Enhancing ML-Based DoS Attack Detection Through Combinatorial Fusion Analysis

    Authors: Evans Owusu, Mohamed Rahouti, D. Frank Hsu, Kaiqi Xiong, Yufeng Xin

    Abstract: Mitigating Denial-of-Service (DoS) attacks is vital for online service security and availability. While machine learning (ML) models are used for DoS attack detection, new strategies are needed to enhance their performance. We suggest an innovative method, combinatorial fusion, which combines multiple ML models using advanced algorithms. This includes score and rank combinations, weighted techniqu…

    Submitted 1 October, 2023; originally announced December 2023.

    Comments: 6 pages, 3 figures, IEEE CNS

  17. arXiv:2311.11749  [pdf, other]

    physics.soc-ph cs.LG cs.SI

    Revealing behavioral impact on mobility prediction networks through causal interventions

    Authors: Ye Hong, Yanan Xin, Simon Dirmeier, Fernando Perez-Cruz, Martin Raubal

    Abstract: Deep neural networks are increasingly utilized in mobility prediction tasks, yet their intricate internal workings pose challenges for interpretability, especially in comprehending how various aspects of mobility behavior affect predictions. This study introduces a causal intervention framework to assess the impact of mobility-related factors on neural networks designed for next location predictio…

    Submitted 18 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 31 pages, 6 figures

  18. arXiv:2311.09732  [pdf, other]

    cs.CL cs.AI

    Source Prompt: Coordinated Pre-training of Language Models on Diverse Corpora from Multiple Sources

    Authors: Yipei Xu, Dakuan Lu, Jiaqing Liang, Xintao Wang, Yipeng Geng, Yingsi Xin, Hengkui Wu, Ken Chen, ruiji zhang, Yanghua Xiao

    Abstract: Pre-trained language models (PLMs) have established the new paradigm in the field of NLP. For more powerful PLMs, one of the most popular and successful ways is to continuously scale up sizes of the models and the pre-training corpora. These large corpora are generally obtained by converging smaller ones from multiple sources, they are thus growing increasingly diverse. However, the side-effects of…

    Submitted 16 November, 2023; originally announced November 2023.

  19. arXiv:2311.07349  [pdf, other]

    cs.CY cs.CE

    Vehicle-to-grid for car sharing -- A simulation study for 2030

    Authors: Nina Wiedemann, Yanan Xin, Vasco Medici, Lorenzo Nespoli, Esra Suel, Martin Raubal

    Abstract: The proliferation of car sharing services in recent years presents a promising avenue for advancing sustainable transportation. Beyond merely reducing car ownership rates, these systems can play a pivotal role in bolstering grid stability through the provision of ancillary services via vehicle-to-grid (V2G) technologies - a facet that has received limited attention in previous research. In this st…

    Submitted 13 November, 2023; originally announced November 2023.

  20. arXiv:2311.00377  [pdf, other]

    cs.LG stat.AP

    Uncertainty quantification and out-of-distribution detection using surjective normalizing flows

    Authors: Simon Dirmeier, Ye Hong, Yanan Xin, Fernando Perez-Cruz

    Abstract: Reliable quantification of epistemic and aleatoric uncertainty is of crucial importance in applications where models are trained in one environment but applied to multiple different environments, often seen in real-world applications for example, in climate science or mobility analysis. We propose a simple approach using surjective normalizing flows to identify out-of-distribution data sets in dee…

    Submitted 1 November, 2023; originally announced November 2023.

  21. arXiv:2310.17661  [pdf, other]

    eess.SP cs.NI

    An Overview on IEEE 802.11bf: WLAN Sensing

    Authors: Rui Du, Haocheng Hua, Hailiang Xie, Xianxin Song, Zhonghao Lyu, Mengshi Hu, Narengerile, Yan Xin, Stephen McCann, Michael Montemurro, Tony Xiao Han, Jie Xu

    Abstract: With recent advancements, the wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, the WLANs standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent requirements for emerging sensing applications.…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 31 pages, 25 figures, this is a significant updated version of arXiv:2207.04859

  22. arXiv:2310.08732  [pdf, other]

    cs.LG cs.CR

    Provably Robust Cost-Sensitive Learning via Randomized Smoothing

    Authors: Yuan Xin, Michael Backes, Xiao Zhang

    Abstract: We study the problem of robust learning against adversarial perturbations under cost-sensitive scenarios, where the potential harm of different types of misclassifications is encoded in a cost matrix. Existing approaches are either empirical and cannot certify robustness or suffer from inherent scalability issues. In this work, we investigate whether randomized smoothing, a scalable framework for…

    Submitted 30 May, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: 19 pages, 9 tables, 5 figures

  23. Soil Image Segmentation Based on Mask R-CNN

    Authors: Yida Chen, Kang Liu, Yi Xin, Xinru Zhao

    Abstract: The complex background in the soil image collected in the field natural environment will affect the subsequent soil image recognition based on machine vision. Segmenting the soil center area from the soil image can eliminate the influence of the complex background, which is an important preprocessing work for subsequent soil image recognition. For the first time, the deep learning method was appli…

    Submitted 2 September, 2023; originally announced September 2023.

    Comments: 4 pages, 5 figures, Published in 2023 3rd International Conference on Consumer Electronics and Computer Engineering

    Journal ref: 2023 3rd International Conference on Consumer Electronics and Computer Engineering (ICCECE)

  24. arXiv:2308.15216  [pdf, other]

    cs.CV

    On-the-Fly Guidance Training for Medical Image Registration

    Authors: Yicheng Chen, Shengxiang Ji, Yuelin Xin, Kun Han, Xiaohui Xie

    Abstract: This research explores a novel approach in the realm of learning-based image registration, addressing the limitations inherent in weakly-supervised and unsupervised methods. Weakly-supervised techniques depend heavily on scarce labeled data, while unsupervised strategies rely on indirect measures of accuracy through image similarity. Notably, traditional supervised learning is not utilized due to…

    Submitted 22 December, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: 12 pages, 10 figures, 4 tables

    ACM Class: I.4.9

  25. arXiv:2308.07470  [pdf, other]

    cs.DC cs.LG

    Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling

    Authors: Lequn Chen, Weixin Deng, Anirudh Canumalla, Yu Xin, Danyang Zhuo, Matthai Philipose, Arvind Krishnamurthy

    Abstract: Having large batch sizes is one of the most critical aspects of increasing the accelerator efficiency and the performance of DNN model inference. However, existing model serving systems cannot achieve adequate batch sizes while meeting latency objectives as these systems eagerly dispatch requests to accelerators to minimize the accelerator idle time. We propose Symphony, a DNN serving system that…

    Submitted 28 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

  26. arXiv:2308.00210  [pdf, other]

    cs.CV

    Scene Separation & Data Selection: Temporal Segmentation Algorithm for Real-Time Video Stream Analysis

    Authors: Yuelin Xin, Zihan Zhou, Yuxuan Xia

    Abstract: We present 2SDS (Scene Separation and Data Selection algorithm), a temporal segmentation algorithm used in real-time video stream interpretation. It complements CNN-based models to make use of temporal information in videos. 2SDS can detect the change between scenes in a video stream by comparing the image difference between two frames. It separates a video into segments (scenes), and by combinin…

    Submitted 31 July, 2023; originally announced August 2023.

    Comments: 5 pages, 4 figures, at IJCAI-ECAI 2022 workshop, First International Workshop on Spatio-Temporal Reasoning and Learning, July 24, 2022, Vienna, Austria

    ACM Class: I.4.8

    Journal ref: CEUR.Workshop.Proceedings.2022.Vol-3190.paper2

  27. arXiv:2307.15344  [pdf, other]

    cs.SD eess.AS

    Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions

    Authors: Yifei Xin, Yuexian Zou

    Abstract: Most existing audio-text retrieval (ATR) methods focus on constructing contrastive pairs between whole audio clips and complete caption sentences, while ignoring fine-grained cross-modal relationships, e.g., short segments and phrases or frames and words. In this paper, we introduce a hierarchical cross-modal interaction (HCI) method for ATR by simultaneously exploring clip-sentence, segment-phras…

    Submitted 28 July, 2023; originally announced July 2023.

    Comments: Accepted by Interspeech2023

  28. arXiv:2306.11556  [pdf, other]

    cs.CV cs.GR

    NeRF synthesis with shading guidance

    Authors: Chenbin Li, Yu Xin, Gaoyi Liu, Xiang Zeng, Ligang Liu

    Abstract: The emerging Neural Radiance Field (NeRF) shows great potential in representing 3D scenes, which can render photo-realistic images from novel view with only sparse views given. However, utilizing NeRF to reconstruct real-world scenes requires images from different viewpoints, which limits its practical application. This problem can be even more pronounced for large scenes. In this paper, we introd…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: 16 pages, 16 figures, accepted by CAD/Graphics 2023(poster)

  29. arXiv:2306.11351  [pdf, other]

    cs.AR

    A Versatility-Performance Balanced Hardware Architecture for Scene Text Detection

    Authors: Yao Xin, Guoming Tang, Donglong Chen, Rumin Zhang, Teng Liang, Ray C. C. Cheung, Cetin Kaya Koc

    Abstract: Detecting and extracting textual information from natural scene images needs Scene Text Detection (STD) algorithms. Fully Convolutional Neural Networks (FCNs) are usually utilized as the backbone model to extract features in these instance segmentation based STD algorithms. FCNs naturally come with high computational complexity. Furthermore, to keep up with the growing variety of models, flexible…

    Submitted 20 June, 2023; originally announced June 2023.

  30. arXiv:2306.08648  [pdf, other]

    cs.CV cs.RO

    SimpleMapping: Real-Time Visual-Inertial Dense Mapping with Deep Multi-View Stereo

    Authors: Yingye Xin, Xingxing Zuo, Dongyue Lu, Stefan Leutenegger

    Abstract: We present a real-time visual-inertial dense mapping method capable of performing incremental 3D mesh reconstruction with high quality using only sequential monocular images and inertial measurement unit (IMU) readings. 6-DoF camera poses are estimated by a robust feature-based visual-inertial odometry (VIO), which also generates noisy sparse 3D map points as a by-product. We propose a sparse poin…

    Submitted 27 August, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

  31. arXiv:2305.09953  [pdf, other]

    cs.IT eess.SP

    Low Complexity Detection of Spatial Modulation Aided OTFS in Doubly-Selective Channels

    Authors: Zeping Sui, Hongming Zhang, Yu Xin, Tong Bao, Lie-Liang Yang, Lajos Hanzo

    Abstract: A spatial modulation-aided orthogonal time frequency space (SM-OTFS) scheme is proposed for high-Doppler scenarios, which relies on a low-complexity distance-based detection algorithm. We first derive the delay-Doppler (DD) domain input-output relationship of our SM-OTFS system by exploiting an SM mapper, followed by characterizing the doubly-selective channels considered. Then we propose a distan…

    Submitted 17 May, 2023; originally announced May 2023.

  32. Spatially-Aware Car-Sharing Demand Prediction

    Authors: Dominik J. Mühlematter, Nina Wiedemann, Yanan Xin, Martin Raubal

    Abstract: In recent years, car-sharing services have emerged as viable alternatives to private individual mobility, promising more sustainable and resource-efficient, but still comfortable transportation. Research on short-term prediction and optimization methods has improved operations and fleet control of car-sharing services; however, long-term projections and spatial analysis are sparse in the literatur…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: 16 pages, 6 figures

    Journal ref: Journal of Transport Geography 114 (2024)

  33. arXiv:2303.05681  [pdf, other]

    cs.SD eess.AS

    Improving Text-Audio Retrieval by Text-aware Attention Pooling and Prior Matrix Revised Loss

    Authors: Yifei Xin, Dongchao Yang, Yuexian Zou

    Abstract: In text-audio retrieval (TAR) tasks, due to the heterogeneity of contents between text and audio, the semantic information contained in the text is only similar to certain frames within the audio. Yet, existing works aggregate the entire audio without considering the text, such as mean-pooling over the frames, which is likely to encode misleading audio information not described in the given text.…

    Submitted 30 March, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

  34. arXiv:2303.05678  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Weakly Supervised Sound Event Detection with Causal Intervention

    Authors: Yifei Xin, Dongchao Yang, Fan Cui, Yujun Wang, Yuexian Zou

    Abstract: Existing weakly supervised sound event detection (WSSED) work has not explored both types of co-occurrences simultaneously, i.e., some sound events often co-occur, and their occurrences are usually accompanied by specific background sounds, so they would be inevitably entangled, causing misclassification and biased localization results with only clip-level supervision. To tackle this issue, we fir…

    Submitted 9 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP2023

  35. arXiv:2302.11558  [pdf, other]

    cs.SD eess.AS

    Improving Speech Enhancement via Event-based Query

    Authors: Yifei Xin, Xiulian Peng, Yan Lu

    Abstract: Existing deep learning based speech enhancement (SE) methods either use blind end-to-end training or explicitly incorporate speaker embedding or phonetic information into the SE network to enhance speech quality. In this paper, we perceive speech and noises as different types of sound events and propose an event-based query method for SE. Specifically, representative speech embeddings that can dis…

    Submitted 24 February, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP2023

  36. arXiv:2302.09432  [pdf, other]

    cs.CL

    BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark

    Authors: Dakuan Lu, Hengkui Wu, Jiaqing Liang, Yipei Xu, Qianyu He, Yipeng Geng, Mengkun Han, Yingsi Xin, Yanghua Xiao

    Abstract: To advance Chinese financial natural language processing (NLP), we introduce BBT-FinT5, a new Chinese financial pre-training language model based on the T5 model. To support this effort, we have built BBT-FinCorpus, a large-scale financial corpus with approximately 300GB of raw text from four different sources. In general domain NLP, comprehensive benchmarks like GLUE and SuperGLUE have driven sig…

    Submitted 26 February, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Changed author order

  37. Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities

    Authors: Moritz Neun, Christian Eichenberger, Yanan Xin, Cheng Fu, Nina Wiedemann, Henry Martin, Martin Tomko, Lukas Ambühl, Luca Hermes, Michael Kopp

    Abstract: Traffic analysis is crucial for urban operations and planning, while the availability of dense urban traffic data beyond loop detectors is still scarce. We present a large-scale floating vehicle dataset of per-street segment traffic information, Metropolitan Segment Traffic Speeds from Massive Floating Car Data in 10 Cities (MeTS-10), available for 10 global cities with a 15-minute resolution for…

    Submitted 31 August, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems (T-ITS), DOI: https://doi.org/10.1109/TITS.2023.3291737

    Journal ref: IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2023

  38. arXiv:2302.05406  [pdf, other]

    cs.CL

    Adversarial Transformer Language Models for Contextual Commonsense Inference

    Authors: Pedro Colon-Hernandez, Henry Lieberman, Yida Xin, Claire Yin, Cynthia Breazeal, Peter Chin

    Abstract: Contextualized or discourse aware commonsense inference is the task of generating coherent commonsense assertions (i.e., facts) from a given story, and a particular sentence from that story. Some problems with the task are: lack of controllability for topics of the inferred facts; lack of commonsense knowledge during training; and, possibly, hallucinated or false facts. In this work, we utilize a…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Submitted to Semantic Web Journal special edition. https://semantic-web-journal.org/content/adversarial-transformer-language-models-contextual-commonsense-inference-1

  39. arXiv:2211.13391  [pdf]

    cs.ET cond-mat.dis-nn

    Electrical Tunable Spintronic Neuron with Trainable Activation Function

    Authors: Yue Xin, Kang Zhou, Xuanyao Fong, Yumeng Yang, Shenghua Gao, Zhifeng Zhu

    Abstract: Spintronic devices have been widely studied for the hardware realization of artificial neurons. The stochastic switching of magnetic tunnel junction driven by the spin torque is commonly used to produce the sigmoid activation function. However, the shape of the activation function in previous studies is fixed during the training of neural network. This restricts the updating of weights and results…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: 26 pages, 9 figures

  40. arXiv:2210.16754  [pdf, other]

    cs.LG cs.AI cs.CY cs.NE

    Mitigating Unfairness via Evolutionary Multi-objective Ensemble Learning

    Authors: Zhang Qingquan, Liu Jialin, Zhang Zeqi, Wen Junyi, Mao Bifei, Yao Xin

    Abstract: In the literature of mitigating unfairness in machine learning, many fairness measures are designed to evaluate predictions of learning models and also utilised to guide the training of fair models. It has been theoretically and empirically shown that there exist conflicts and inconsistencies among accuracy and multiple fairness measures. Optimising one or several fairness measures may sacrifice o…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: 15 pages

  41. Vision Paper: Causal Inference for Interpretable and Robust Machine Learning in Mobility Analysis

    Authors: Yanan Xin, Natasa Tagasovska, Fernando Perez-Cruz, Martin Raubal

    Abstract: Artificial intelligence (AI) is revolutionizing many areas of our lives, leading a new era of technological advancement. Particularly, the transportation sector would benefit from the progress in AI and advance the development of intelligent transportation systems. Building intelligent transportation systems requires an intricate combination of artificial intelligence and mobility analysis. The pa…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: accepted by ACM SIGSPATIAL 2022 Conference

    ACM Class: I.2; J.2

  42. arXiv:2209.01620  [pdf, other]

    cs.CV

    MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition

    Authors: Yunhao Wang, Huixin Sun, Xiaodi Wang, Bin Zhang, Chao Li, Ying Xin, Baochang Zhang, Errui Ding, Shumin Han

    Abstract: Vision Transformer and its variants have demonstrated great potential in various computer vision tasks. But conventional vision transformers often focus on global dependency at a coarse level, which suffer from a learning challenge on global relationships and fine-grained representation at a token level. In this paper, we introduce Multi-scale Attention Fusion into transformer (MAFormer), which ex…

    Submitted 31 August, 2022; originally announced September 2022.

    Comments: 9 pages, 2 figures

  43. arXiv:2207.04859  [pdf, ps, other]

    cs.NI eess.SP

    An Overview on IEEE 802.11bf: WLAN Sensing

    Authors: Rui Du, Hailiang Xie, Mengshi Hu, Narengerile, Yan Xin, Stephen McCann, Michael Montemurro, Tony Xiao Han, Jie Xu

    Abstract: With recent advancements, the wireless local area network (WLAN) or wireless fidelity (Wi-Fi) technology has been successfully utilized to realize sensing functionalities such as detection, localization, and recognition. However, the WLANs standards are developed mainly for the purpose of communication, and thus may not be able to meet the stringent sensing requirements in emerging applications. T…

    Submitted 11 July, 2022; originally announced July 2022.

  44. arXiv:2207.02102  [pdf, other]

    cs.NI

    Data Integrity Error Localization in Networked Systems with Missing Data

    Authors: Yufeng Xin, Shih-Wen Fu, Anirban Mandal, Ryan Tanaka, Mats Rynge, Karan Vahi, Ewa Deelman

    Abstract: Most recent network failure diagnosis systems focused on data center networks where complex measurement systems can be deployed to derive routing information and ensure network coverage in order to achieve accurate and fast fault localization. In this paper, we target wide-area networks that support data-intensive distributed applications. We first present a new multi-output prediction model that…

    Submitted 5 July, 2022; originally announced July 2022.

    Comments: Six pages

  45. arXiv:2206.11599  [pdf, other]

    eess.IV cs.CV

    Universal Learned Image Compression With Low Computational Cost

    Authors: Bowen Li, Yao Xin, Youneng Bao, Fanyang Meng, Yongsheng Liang, Wen Tan

    Abstract: Recently, learned image compression methods have developed rapidly and exhibited excellent rate-distortion performance when compared to traditional standards, such as JPEG, JPEG2000 and BPG. However, the learning-based methods suffer from high computational costs, which is not beneficial for deployment on devices with limited resources. To this end, we propose shift-addition parallel modules (SAPM…

    Submitted 23 June, 2022; originally announced June 2022.

    Comments: 5 pages

  46. arXiv:2205.15130  [pdf, other]

    cs.LG cs.AI cs.CV

    Why Adversarial Training of ReLU Networks Is Difficult?

    Authors: Xu Cheng, Hao Zhang, Yue Xin, Wen Shen, Jie Ren, Quanshi Zhang

    Abstract: This paper mathematically derives an analytic solution of the adversarial perturbation on a ReLU network, and theoretically explains the difficulty of adversarial training. Specifically, we formulate the dynamics of the adversarial perturbation generated by the multi-step attack, which shows that the adversarial perturbation tends to strengthen eigenvectors corresponding to a few top-ranked eigenv…

    Submitted 30 May, 2022; originally announced May 2022.

  47. arXiv:2202.03026  [pdf, other]

    cs.CV

    Context Autoencoder for Self-Supervised Representation Learning

    Authors: Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

    Abstract: We present a novel masked image modeling (MIM) approach, context autoencoder (CAE), for self-supervised representation pretraining. We pretrain an encoder by making predictions in the encoded representation space. The pretraining tasks include two tasks: masked representation prediction - predict the representations for the masked patches, and masked patch reconstruction - reconstruct the masked p…

    Submitted 10 August, 2023; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: Accepted by International Journal of Computer Vision (IJCV)

  48. arXiv:2111.01544  [pdf]

    eess.IV cs.CV physics.med-ph

    Comprehensive and Clinically Accurate Head and Neck Organs at Risk Delineation via Stratified Deep Learning: A Large-scale Multi-Institutional Study

    Authors: Dazhou Guo, Jia Ge, Xianghua Ye, Senxiang Yan, Yi Xin, Yuchen Song, Bing-shen Huang, Tsung-Min Hung, Zhuotun Zhu, Ling Peng, Yanping Ren, Rui Liu, Gong Zhang, Mengyuan Mao, Xiaohua Chen, Zhongjie Lu, Wenxiang Li, Yuzhen Chen, Lingyun Huang, Jing Xiao, Adam P. Harrison, Le Lu, Chien-Yu Lin, Dakai Jin, Tsung-Ying Ho

    Abstract: Accurate organ at risk (OAR) segmentation is critical to reduce the radiotherapy post-treatment complications. Consensus guidelines recommend a set of more than 40 OARs in the head and neck (H&N) region, however, due to the predictable prohibitive labor-cost of this task, most institutions choose a substantially simplified protocol by delineating a smaller subset of OARs and neglecting the dose di…

    Submitted 1 November, 2021; originally announced November 2021.

  49. arXiv:2110.05280  [pdf]

    cs.CV

    Multi-institutional Validation of Two-Streamed Deep Learning Method for Automated Delineation of Esophageal Gross Tumor Volume using planning-CT and FDG-PETCT

    Authors: Xianghua Ye, Dazhou Guo, Chen-kan Tseng, Jia Ge, Tsung-Min Hung, Ping-Ching Pai, Yanping Ren, Lu Zheng, Xinli Zhu, Ling Peng, Ying Chen, Xiaohua Chen, Chen-Yu Chou, Danni Chen, Jiaze Yu, Yuzhen Chen, Feiran Jiao, Yi Xin, Lingyun Huang, Guotong Xie, Jing Xiao, Le Lu, Senxiang Yan, Dakai Jin, Tsung-Ying Ho

    Abstract: Background: The current clinical workflow for esophageal gross tumor volume (GTV) contouring relies on manual delineation of high labor-costs and interuser variability. Purpose: To validate the clinical applicability of a deep learning (DL) multi-modality esophageal GTV contouring model, developed at 1 institution whereas tested at multiple ones. Methods and Materials: We collected 606 esophageal…

    Submitted 11 October, 2021; originally announced October 2021.

    Comments: 36 pages, 10 figures

  50. RetroGAN: A Cyclic Post-Specialization System for Improving Out-of-Knowledge and Rare Word Representations

    Authors: Pedro Colon-Hernandez, Yida Xin, Henry Lieberman, Catherine Havasi, Cynthia Breazeal, Peter Chin

    Abstract: Retrofitting is a technique used to move word vectors closer together or further apart in their space to reflect their relationships in a Knowledge Base (KB). However, retrofitting only works on concepts that are present in that KB. RetroGAN uses a pair of Generative Adversarial Networks (GANs) to learn a one-to-one mapping between concepts and their retrofitted counterparts. It applies that mappi…

    Submitted 29 August, 2021; originally announced August 2021.