
Showing 1–50 of 593 results for author: Gu, Y

Searching in archive cs.
  1. arXiv:2407.20495  [pdf, other]

    eess.IV cs.CV

    Enhancing Quantitative Image Synthesis through Pretraining and Resolution Scaling for Bone Mineral Density Estimation from a Plain X-ray Image

    Authors: Yi Gu, Yoshito Otake, Keisuke Uemura, Masaki Takao, Mazen Soufi, Seiji Okada, Nobuhiko Sugano, Hugues Talbot, Yoshinobu Sato

    Abstract: While most vision tasks are essentially visual in nature (for recognition), some important tasks, especially in the medical field, also require quantitative analysis (for quantification) using quantitative images. Unlike in visual analysis, pixel values in quantitative images correspond to physical metrics measured by specific devices (e.g., a depth image). However, recent work has shown that it i…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: SASHIMI, 2024 (MICCAI workshop). 13 pages, 3 figures

  2. arXiv:2407.18157  [pdf, other]

    cs.CR cs.DB

    Enhanced Privacy Bound for Shuffle Model with Personalized Privacy

    Authors: Yixuan Liu, Yuhan Liu, Li Xiong, Yujie Gu, Hong Chen

    Abstract: The shuffle model of Differential Privacy (DP) is an enhanced privacy protocol which introduces an intermediate trusted server between local users and a central data curator. It significantly amplifies the central DP guarantee by anonymizing and shuffling the local randomized data. Yet, deriving a tight privacy bound is challenging due to its complicated randomization protocol. While most existing…

    Submitted 25 July, 2024; originally announced July 2024.

  3. arXiv:2407.14198  [pdf]

    cs.CV eess.IV

    Double-Shot 3D Shape Measurement with a Dual-Branch Network

    Authors: Mingyang Lei, Jingfan Fan, Long Shao, Hong Song, Deqiang Xiao, Danni Ai, Tianyu Fu, Ying Gu, Jian Yang

    Abstract: The structured light (SL)-based 3D measurement techniques with deep learning have been widely studied, among which speckle projection profilometry (SPP) and fringe projection profilometry (FPP) are two popular methods. However, they generally use a single projection pattern for reconstruction, resulting in fringe order ambiguity or poor reconstruction accuracy. To alleviate these problems, we prop…

    Submitted 19 July, 2024; originally announced July 2024.

  4. arXiv:2407.11044  [pdf, other]

    cs.LG cs.AI

    Generalizing soft actor-critic algorithms to discrete action spaces

    Authors: Le Zhang, Yong Gu, Xin Zhao, Yanshuo Zhang, Shu Zhao, Yifei Jin, Xinxin Wu

    Abstract: ATARI is a suite of video games used by reinforcement learning (RL) researchers to test the effectiveness of the learning algorithm. Receiving only the raw pixels and the game score, the agent learns to develop sophisticated strategies, even to a level comparable to that of a professional human games tester. Ideally, we also want an agent requiring very few interactions with the environment. Previous co…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Chinese Conference on Pattern Recognition and Computer Vision (PRCV) 2024. GitHub Repo https://github.com/lezhang-thu/bigger-better-faster-SAC

  5. arXiv:2407.08555  [pdf, other]

    eess.IV cs.CV

    SLoRD: Structural Low-Rank Descriptors for Shape Consistency in Vertebrae Segmentation

    Authors: Xin You, Yixin Lou, Minghui Zhang, Chuyan Zhang, Jie Yang, Yun Gu

    Abstract: Automatic and precise segmentation of vertebrae from CT images is crucial for various clinical applications. However, due to a lack of explicit and strict constraints, existing methods, especially single-stage methods, still suffer from the challenge of intra-vertebrae segmentation inconsistency, which refers to multiple label predictions inside a single vertebra. For multi-stage methods, ver…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Under review

  6. arXiv:2407.07464  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Video-to-Audio Generation with Hidden Alignment

    Authors: Manjie Xu, Chenxing Li, Yong Ren, Rilin Chen, Yu Gu, Wei Liang, Dong Yu

    Abstract: Generating semantically and temporally aligned audio content in accordance with video input has become a focal point for researchers, particularly following the remarkable breakthrough in text-to-video generation. In this work, we aim to offer insights into the video-to-audio generation paradigm, focusing on three crucial aspects: vision encoders, auxiliary embeddings, and data augmentation techni…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: https://sites.google.com/view/vta-ldm

  7. arXiv:2407.06566  [pdf, other]

    cs.CV

    Robust and Explainable Framework to Address Data Scarcity in Diagnostic Imaging

    Authors: Zehui Zhao, Laith Alzubaidi, Jinglan Zhang, Ye Duan, Usman Naseem, Yuantong Gu

    Abstract: Deep learning has significantly advanced automatic medical diagnostics and freed up human resources to reduce clinical pressure, yet the persistent challenge of data scarcity in this area hampers its further improvements and applications. To address this gap, we introduce a novel ensemble framework called 'Efficient Transfer and Self-supervised Learning based Ensemble Framework'…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 64 pages, 20 figures

  8. arXiv:2407.05592  [pdf, other]

    cs.CV

    An Experimental Comparison of Transfer Learning against Self-supervised Learning

    Authors: Zehui Zhao, Laith Alzubaidi, Jinglan Zhang, Ye Duan, Usman Naseem, Yuantong Gu

    Abstract: Recently, transfer learning and self-supervised learning have gained significant attention within the medical field due to their ability to mitigate the challenges posed by limited data availability, improve model generalisation, and reduce computational expenses. Transfer learning and self-supervised learning hold immense potential for advancing medical research. However, it is crucial to recogni…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 27 pages, 11 figures

  9. arXiv:2407.05505  [pdf, other]

    eess.IV cs.CV

    Dynamic Position Transformation and Boundary Refinement Network for Left Atrial Segmentation

    Authors: Fangqiang Xu, Wenxuan Tu, Fan Feng, Malitha Gunawardhana, Jiayuan Yang, Yun Gu, Jichao Zhao

    Abstract: Left atrial (LA) segmentation is a crucial technique for irregular heartbeat (i.e., atrial fibrillation) diagnosis. Most current methods for LA segmentation strictly assume that the input data is acquired using object-oriented center cropping, while this assumption may not always hold in practice due to the high cost of manual object annotation. Random cropping is a straightforward data pre-proces…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024 conference

  10. arXiv:2407.05407  [pdf, other]

    cs.SD cs.AI eess.AS

    CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

    Authors: Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan

    Abstract: Recent years have witnessed large language model (LLM) based text-to-speech (TTS) entering the mainstream due to its high naturalness and zero-shot capacity. In this paradigm, speech signals are discretized into token sequences, which are modeled by an LLM with text as prompts and reconstructed by a token-based vocoder to waveforms. Obviously, speech tokens play a critical role…

    Submitted 9 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: work in progress. arXiv admin note: substantial text overlap with arXiv:2407.04051

  11. arXiv:2407.05361  [pdf, other]

    eess.AS cs.CL

    Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

    Authors: Haorui He, Zengqiang Shang, Chaoren Wang, Xuyuan Li, Yicheng Gu, Hua Hua, Liwei Liu, Chen Yang, Jiaqi Li, Peiyang Shi, Yuancheng Wang, Kai Chen, Pengyuan Zhang, Zhizheng Wu

    Abstract: Recently, speech generation models have made significant progress by using large-scale training data. However, the research community struggles to produce highly spontaneous and human-like speech due to the lack of large-scale, diverse, and spontaneous speech data. This paper presents Emilia, the first multilingual speech generation dataset from in-the-wild speech data, and Emilia-Pipe, the first op…

    Submitted 12 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Fix typos

  12. arXiv:2407.04693  [pdf, other]

    cs.CL cs.AI

    ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

    Authors: Yuzhe Gu, Ziwei Ji, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

    Abstract: Large language models (LLMs) exhibit hallucinations in long-form question-answering tasks across various domains and wide applications. Current hallucination detection and mitigation datasets are limited in domains and sizes, which struggle to scale due to prohibitive labor costs and insufficient reliability of existing hallucination annotators. To facilitate the scalable oversight of LLM hallucin…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 9 pages

  13. arXiv:2407.04051  [pdf, other]

    cs.SD cs.AI eess.AS

    FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

    Authors: Keyu An, Qian Chen, Chong Deng, Zhihao Du, Changfeng Gao, Zhifu Gao, Yue Gu, Ting He, Hangrui Hu, Kai Hu, Shengpeng Ji, Yabin Li, Zerui Li, Heng Lu, Haoneng Luo, Xiang Lv, Bin Ma, Ziyang Ma, Chongjia Ni, Changhe Song, Jiaqi Shi, Xian Shi, Hao Wang, Wen Wang, Yuxuan Wang, et al. (8 additional authors not shown)

    Abstract: This report introduces FunAudioLLM, a model family designed to enhance natural voice interactions between humans and large language models (LLMs). At its core are two innovative models: SenseVoice, which handles multilingual speech recognition, emotion recognition, and audio event detection; and CosyVoice, which facilitates natural speech generation with control over multiple languages, timbre, sp…

    Submitted 10 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress. Authors are listed in alphabetical order by family name

  14. arXiv:2407.01494  [pdf, other]

    cs.CV cs.SD eess.AS

    FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

    Authors: Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen

    Abstract: We study Neural Foley, the automatic generation of high-quality sound effects synchronizing with videos, enabling an immersive audio-visual experience. Despite its wide range of applications, existing approaches encounter limitations when it comes to simultaneously synthesizing high-quality and video-aligned (i.e., semantically relevant and temporally synchronized) sounds. To overcome these limitations…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project page: https://foleycrafter.github.io/

  15. arXiv:2407.00569  [pdf, other]

    cs.CV cs.AI cs.CL

    Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models

    Authors: Weihong Zhong, Xiaocheng Feng, Liang Zhao, Qiming Li, Lei Huang, Yuxuan Gu, Weitao Ma, Yuan Xu, Bing Qin

    Abstract: Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is that during multimodal interaction, the generated hallucinations could influence the LVLMs' subsequent generation. Thus, we raise a question: When presented with a query relevant to the previously generated hallucination, w…

    Submitted 28 July, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Main Conference. 21 pages, 20 figures

  16. arXiv:2406.19774  [pdf, other]

    cs.CL

    Direct Preference Knowledge Distillation for Large Language Models

    Authors: Yixing Li, Yuxian Gu, Li Dong, Dequan Wang, Yu Cheng, Furu Wei

    Abstract: In the field of large language models (LLMs), Knowledge Distillation (KD) is a critical technique for transferring capabilities from teacher models to student models. However, existing KD methods face limitations and challenges in distillation of LLMs, including efficiency and insufficient measurement capabilities of traditional KL divergence. It is shown that LLMs can serve as an implicit reward…

    Submitted 28 June, 2024; originally announced June 2024.

  17. arXiv:2406.16062  [pdf, other]

    cs.NE

    Towards Biologically Plausible Computing: A Comprehensive Comparison

    Authors: Changze Lv, Yufei Gu, Zhengkang Guo, Zhibo Xu, Yixin Wu, Feiran Zhang, Tianyuan Shi, Zhenghua Wang, Ruicheng Yin, Yu Shang, Siqi Zhong, Xiaohua Wang, Muling Wu, Wenhao Liu, Tianlong Li, Jianhao Zhu, Cenyuan Zhang, Zixuan Ling, Xiaoqing Zheng

    Abstract: Backpropagation is a cornerstone algorithm in training neural networks for supervised learning, which uses a gradient descent method to update network weights by minimizing the discrepancy between actual and desired outputs. Despite its pivotal role in propelling deep learning advancements, the biological plausibility of backpropagation is questioned due to its requirements for weight symmetry, gl…

    Submitted 23 June, 2024; originally announced June 2024.

  18. arXiv:2406.14491  [pdf, other]

    cs.CL

    Instruction Pre-Training: Language Models are Supervised Multitask Learners

    Authors: Daixuan Cheng, Yuxian Gu, Shaohan Huang, Junyu Bi, Minlie Huang, Furu Wei

    Abstract: Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds significant promise, as scaling it in the post-training stage trends towards better generalization. In this paper, we explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augment…

    Submitted 20 June, 2024; originally announced June 2024.

  19. arXiv:2406.11115  [pdf, other]

    cs.CL

    Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification

    Authors: Letian Peng, Yi Gu, Chengyu Dong, Zihan Wang, Jingbo Shang

    Abstract: For extremely weakly supervised text classification, pioneering research generates pseudo labels by mining texts similar to the class names from the raw corpus, which may end up with very limited or even no samples for the minority classes. Recent works have started to generate the relevant texts by prompting LLMs using the class names or definitions; however, there is a high risk that LLMs cannot gene…

    Submitted 16 June, 2024; originally announced June 2024.

  20. arXiv:2406.10563  [pdf, other]

    cs.LG cs.AI cs.CR

    Privacy-Preserving Heterogeneous Federated Learning for Sensitive Healthcare Data

    Authors: Yukai Xu, Jingfeng Zhang, Yujie Gu

    Abstract: In the realm of healthcare where decentralized facilities are prevalent, machine learning faces two major challenges concerning the protection of data and models. The data-level challenge concerns the data privacy leakage when centralizing data with sensitive personal information. While the model-level challenge arises from the heterogeneity of local models, which need to be collaboratively traine…

    Submitted 4 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: Accepted to the 2024 IEEE Conference on Artificial Intelligence (IEEE CAI 2024)

  21. arXiv:2406.09618  [pdf, other]

    cs.CL cs.AI cs.IR cs.SD eess.AS

    Multi-Modal Retrieval For Large Language Model Based Speech Recognition

    Authors: Jari Kolehmainen, Aditya Gourav, Prashanth Gurunath Shivakumar, Yile Gu, Ankur Gandhe, Ariya Rastrow, Grant Strimel, Ivan Bulyko

    Abstract: Retrieval is a widely adopted approach for improving language models leveraging external information. As the field moves towards multi-modal large language models, it is important to extend the pure text based methods to incorporate other modalities in retrieval as well for applications across the wide spectrum of machine learning tasks and data types. In this work, we propose multi-modal retrieva…

    Submitted 13 June, 2024; originally announced June 2024.

  22. arXiv:2406.09455  [pdf, other]

    cs.CV cs.AI cs.CL

    Pandora: Towards General World Model with Natural Language Actions and Video States

    Authors: Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Website: https://world-model.maitrix.org/

  23. arXiv:2406.08756  [pdf, other]

    cs.DC cs.LG

    Optimizing Large Model Training through Overlapped Activation Recomputation

    Authors: Ping Chen, Wenjie Zhang, Shuibing He, Yingjie Gu, Zhuwei Peng, Kexin Huang, Xuan Zhan, Weijian Chen, Yi Zheng, Zhefeng Wang, Yanlong Yin, Gang Chen

    Abstract: Large model training has been using recomputation to alleviate the memory pressure and pipelining to exploit the parallelism of data, tensor, and devices. The existing recomputation approaches may incur up to 40% overhead when training real-world models, e.g., the GPT model with 22B parameters. This is because they are executed on demand in the critical training path. In this paper, we design a ne…

    Submitted 27 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: 13 pages

  24. arXiv:2406.08446  [pdf, other]

    cs.CL cs.AI

    OLMES: A Standard for Language Model Evaluations

    Authors: Yuling Gu, Oyvind Tafjord, Bailey Kuehl, Dany Haddad, Jesse Dodge, Hannaneh Hajishirzi

    Abstract: Progress in AI is often demonstrated by new models claiming improved performance on tasks measuring model capabilities. Evaluating language models in particular is challenging, as small changes to how a model is evaluated on a task can lead to large changes in measured performance. There is no common standard setup, so different models are evaluated on the same tasks in different ways, leading to…

    Submitted 12 June, 2024; originally announced June 2024.

  25. arXiv:2406.07430  [pdf, other]

    cs.CL cs.MM

    Learning Domain-Invariant Features for Out-of-Context News Detection

    Authors: Yimeng Gu, Mengqi Zhang, Ignacio Castro, Shu Wu, Gareth Tyson

    Abstract: Multimodal out-of-context news is a common type of misinformation on online media platforms. This involves posting a caption, alongside an invalid out-of-context news image. Reflecting its importance, researchers have developed models to detect such misinformation. However, a common limitation of these models is that they only consider the scenario where pre-labeled data is available for each doma…

    Submitted 11 June, 2024; originally announced June 2024.

  26. arXiv:2406.06959  [pdf, other]

    cs.LG cs.AI

    Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems

    Authors: Jiawei Zhang, Jiaxin Zhuang, Cheng Jin, Gen Li, Yuantao Gu

    Abstract: The recent emergence of diffusion models has significantly advanced the precision of learnable priors, presenting innovative avenues for addressing inverse problems. Since inverse problems inherently entail maximum a posteriori estimation, previous works have endeavored to integrate diffusion priors into the optimization frameworks. However, prevailing optimization-based inverse algorithms primari…

    Submitted 11 June, 2024; originally announced June 2024.

  27. arXiv:2406.06382  [pdf, other]

    cs.CV cs.CL cs.LG

    Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization

    Authors: Yi Gu, Zhendong Wang, Yueqin Yin, Yujia Xie, Mingyuan Zhou

    Abstract: Aligning large language models with human preferences has emerged as a critical focus in language modeling research. Yet, integrating preference learning into Text-to-Image (T2I) generative models is still relatively uncharted territory. The Diffusion-DPO technique made initial strides by employing pairwise preference learning in diffusion models tailored for specific text prompts. We introduce Di…

    Submitted 10 June, 2024; originally announced June 2024.

  28. arXiv:2406.05325  [pdf, other]

    eess.AS cs.SD

    LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance

    Authors: Shihao Chen, Yu Gu, Jie Zhang, Na Li, Rilin Chen, Liping Chen, Lirong Dai

    Abstract: Any-to-any singing voice conversion (SVC) is an interesting audio editing technique, aiming to convert the singing voice of one singer into that of another, given only a few seconds of singing data. However, during the conversion process, the issue of timbre leakage is inevitable: the converted singing voice still sounds like the original singer's voice. To tackle this, we propose a latent diffusi…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  29. arXiv:2406.02744  [pdf, other]

    cs.CR cs.LG

    DPDR: Gradient Decomposition and Reconstruction for Differentially Private Deep Learning

    Authors: Yixuan Liu, Li Xiong, Yuhan Liu, Yujie Gu, Ruixuan Liu, Hong Chen

    Abstract: Differentially Private Stochastic Gradient Descent (DP-SGD) is a prominent paradigm for preserving privacy in deep learning. It ensures privacy by perturbing gradients with random noise calibrated to their entire norm at each training step. However, this perturbation suffers from sub-optimal performance: it repeatedly wastes privacy budget on the general converging direction shared among gradie…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 14 pages

  30. arXiv:2406.01549  [pdf, other]

    cs.CL cs.AI

    An Information Bottleneck Perspective for Effective Noise Filtering on Retrieval-Augmented Generation

    Authors: Kun Zhu, Xiaocheng Feng, Xiyuan Du, Yuxuan Gu, Weijiang Yu, Haotian Wang, Qianglong Chen, Zheng Chu, Jingchang Chen, Bing Qin

    Abstract: Retrieval-augmented generation integrates the capabilities of large language models with relevant information retrieved from an extensive corpus, yet encounters challenges when confronted with real-world noisy data. One recent solution is to train a filter module to find relevant content, but it only achieves suboptimal noise compression. In this paper, we propose to introduce the information bottlenec…

    Submitted 4 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024

  31. arXiv:2405.20315  [pdf, other]

    cs.CL cs.AI

    ANAH: Analytical Annotation of Hallucinations in Large Language Models

    Authors: Ziwei Ji, Yuzhe Gu, Wenwei Zhang, Chengqi Lyu, Dahua Lin, Kai Chen

    Abstract: Reducing the 'hallucination' problem of Large Language Models (LLMs) is crucial for their wide applications. A comprehensive and fine-grained measurement of the hallucination is the first key step for the governance of this issue but is under-explored in the community. Thus, we present ANAH, a bilingual dataset that offers ANalytical Annotation of…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by ACL 2024

  32. arXiv:2405.17881  [pdf, other]

    cs.LG

    Crystal-LSBO: Automated Design of De Novo Crystals with Latent Space Bayesian Optimization

    Authors: Onur Boyar, Yanheng Gu, Yuji Tanaka, Shunsuke Tonogai, Tomoya Itakura, Ichiro Takeuchi

    Abstract: Generative modeling of crystal structures is significantly challenged by the complexity of input data, which constrains the ability of these models to explore and discover novel crystals. This complexity often confines de novo design methodologies to merely small perturbations of known crystals and hampers the effective application of advanced optimization techniques. One such optimization techniq…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: 10 pages, 5 figures

  33. arXiv:2405.17100  [pdf, other]

    cs.CR cs.SD eess.AS

    Sok: Comprehensive Security Overview, Challenges, and Future Directions of Voice-Controlled Systems

    Authors: Haozhe Xu, Cong Wu, Yangyang Gu, Xingcan Shang, Jing Chen, Kun He, Ruiying Du

    Abstract: The integration of Voice Control Systems (VCS) into smart devices and their growing presence in daily life accentuate the importance of their security. Current research has uncovered numerous vulnerabilities in VCS, presenting significant risks to user privacy and security. However, a cohesive and systematic examination of these vulnerabilities and the corresponding solutions is still absent. This…

    Submitted 27 May, 2024; originally announced May 2024.

  34. arXiv:2405.16363  [pdf, other]

    cs.IR cs.AI

    LLMs for User Interest Exploration in Large-scale Recommendation Systems

    Authors: Jianling Wang, Haokai Lu, Yifan Liu, He Ma, Yueqi Wang, Yang Gu, Shuzhou Zhang, Ningren Han, Shuchao Bi, Lexi Baugher, Ed Chi, Minmin Chen

    Abstract: Traditional recommendation systems are subject to a strong feedback loop by learning from and reinforcing past user-item interactions, which in turn limits the discovery of novel user interests. To address this, we introduce a hybrid hierarchical framework combining Large Language Models (LLMs) and classic recommendation models for user interest exploration. The framework controls the interfacing…

    Submitted 7 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  35. arXiv:2405.15711  [pdf, other]

    cs.RO

    An Adaptive Framework for Manipulator Skill Reproduction in Dynamic Environments

    Authors: Ryan Donald, Brendan Hertel, Stephen Misenti, Yan Gu, Reza Azadeh

    Abstract: Robot skill learning and execution in uncertain and dynamic environments is a challenging task. This paper proposes an adaptive framework that combines Learning from Demonstration (LfD), environment state prediction, and high-level decision making. Proactive adaptation prevents the need for reactive adaptation, which lags behind changes in the environment rather than anticipating them. We propose…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Paper accepted at Ubiquitous Robots 2024 held at New York University on June 24 to June 27, 2024

  36. arXiv:2405.14831  [pdf, other]

    cs.CL cs.AI

    HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

    Authors: Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, Yu Su

    Abstract: In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite the impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integra…

    Submitted 23 May, 2024; originally announced May 2024.

  37. arXiv:2405.12971  [pdf, other]

    cs.CV

    BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once

    Authors: Theodore Zhao, Yu Gu, Jianwei Yang, Naoto Usuyama, Ho Hin Lee, Tristan Naumann, Jianfeng Gao, Angela Crabtree, Jacob Abel, Christine Moung-Wen, Brian Piening, Carlo Bifulco, Mu Wei, Hoifung Poon, Sheng Wang

    Abstract: Biomedical image analysis is fundamental for biomedical discovery in cell biology, pathology, radiology, and many other biomedical domains. Holistic image analysis comprises interdependent subtasks such as segmentation, detection, and recognition of relevant objects. Here, we propose BiomedParse, a biomedical foundation model for image parsing that can jointly conduct segmentation, detection, an…

    Submitted 4 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: Project page: https://aka.ms/biomedparse-project

  38. arXiv:2405.12850  [pdf, other]

    cs.CV

    Weakly supervised alignment and registration of MR-CT for cervical cancer radiotherapy

    Authors: Jjahao Zhang, Yin Gu, Deyu Sun, Yuhua Gao, Ming Gao, Ming Cui, Teng Zhang, He Ma

    Abstract: Cervical cancer is one of the leading causes of death in women, and brachytherapy is currently the primary treatment method. However, it is important to precisely define the extent of paracervical tissue invasion to improve cancer diagnosis and treatment options. The fusion of the information characteristics of both computed tomography (CT) and magnetic resonance imaging (MRI) modalities may be use…

    Submitted 21 May, 2024; originally announced May 2024.

  39. arXiv:2405.11976  [pdf, other]

    cs.CV

    Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays

    Authors: Zhichao Sun, Yuliang Gu, Yepeng Liu, Zerui Zhang, Zhou Zhao, Yongchao Xu

    Abstract: Anomaly detection in chest X-rays is a critical task. Most methods mainly model the distribution of normal images, and then regard significant deviation from the normal distribution as an anomaly. Recently, CLIP-based methods, pre-trained on a large number of medical images, have shown impressive performance on zero/few-shot downstream tasks. In this paper, we aim to explore the potential of CLIP-based m…

    Submitted 19 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: MICCAI 2024 Early Accept

  40. arXiv:2405.11868  [pdf, other]

    cs.LG cs.AI cs.CE cs.IR cs.SI

    Towards Graph Contrastive Learning: A Survey and Beyond

    Authors: Wei Ju, Yifan Wang, Yifang Qin, Zhengyang Mao, Zhiping Xiao, Junyu Luo, Junwei Yang, Yiyang Gu, Dongjie Wang, Qingqing Long, Siyu Yi, Xiao Luo, Ming Zhang

    Abstract: In recent years, deep learning on graphs has achieved remarkable success in various domains. However, the reliance on annotated graph data remains a significant bottleneck due to its prohibitive cost and time-intensive nature. To address this challenge, self-supervised learning (SSL) on graphs has gained increasing attention and has made significant progress. SSL enables machine learning models to…

    Submitted 20 May, 2024; originally announced May 2024.

  41. arXiv:2405.10481  [pdf, other]

    cs.LG cs.AI

    Multi-Evidence based Fact Verification via A Confidential Graph Neural Network

    Authors: Yuqing Lan, Zhenghao Liu, Yu Gu, Xiaoyuan Yi, Xiaohua Li, Liner Yang, Ge Yu

    Abstract: Fact verification tasks aim to identify the integrity of textual contents according to the truthful corpus. Existing fact verification models usually build a fully connected reasoning graph, which regards claim-evidence pairs as nodes and connects them with edges. They employ the graph to propagate the semantics of the nodes. Nevertheless, the noisy nodes usually propagate their semantics via the…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 12 pages

  42. arXiv:2405.07679  [pdf, other]

    cs.LG

    Class-wise Activation Unravelling the Engima of Deep Double Descent

    Authors: Yufei Gu

    Abstract: Double descent is a counter-intuitive phenomenon in machine learning, and researchers have observed it in various models and tasks. While some theoretical explanations have been proposed for this phenomenon in specific contexts, a widely accepted theory of the mechanism behind it in deep learning has yet to be established. In this study, we revisited the phenomenon o…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.13572

  43. arXiv:2405.06003  [pdf, ps, other]

    stat.ML cs.LG

    Binary Hypothesis Testing for Softmax Models and Leverage Score Models

    Authors: Yeqi Gao, Yuzhou Gu, Zhao Song

    Abstract: Softmax distributions are widely used in machine learning, including Large Language Models (LLMs) where the attention unit uses softmax distributions. We abstract the attention unit as the softmax model, where given a vector input, the model produces an output drawn from the softmax distribution (which depends on the vector input). We consider the fundamental problem of binary hypothesis testing i…

    Submitted 9 May, 2024; originally announced May 2024.
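
    The softmax model described above can be illustrated with a short sketch. The abstract only says the output distribution depends on the input vector x; here we assume, purely for illustration, that x is mapped to logits by a hypothetical parameter matrix A, so the output index is drawn from softmax(Ax). Under this assumption, the two hypotheses in a binary test would correspond to two different matrices A. The function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax: p_i = exp(z_i) / sum_j exp(z_j)."""
    m = max(z)                                  # shift by the max logit to avoid overflow
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_model_sample(A: Sequence[Sequence[float]], x: Sequence[float],
                         rng: random.Random) -> int:
    """Draw one outcome index from softmax(A x).

    A is a hypothetical parameter matrix (an assumption for illustration);
    the abstract only states that the distribution depends on the input x.
    """
    logits = [sum(a * xj for a, xj in zip(row, x)) for row in A]  # A x
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

    Subtracting the maximum logit before exponentiating leaves the distribution unchanged and prevents overflow for large inputs.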

  44. arXiv:2405.05633  [pdf, other]

    cs.DC

    HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions

    Authors: Jiabin Chen, Fei Xu, Yikun Gu, Li Chen, Fangming Liu, Zhi Zhou

    Abstract: Deep Neural Network (DNN) inference on serverless functions is gaining prominence due to its potential for substantial budget savings. Existing works on serverless DNN inference solely optimize batching requests from one application with a single Service Level Objective (SLO) on CPU functions. However, production serverless DNN inference traces indicate that the request arrival rate of application…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 10 pages, 14 figures, accepted by IWQoS 2024

  45. arXiv:2405.04773  [pdf, other]

    cs.LG cs.AI cs.IR cs.SI

    Hypergraph-enhanced Dual Semi-supervised Graph Classification

    Authors: Wei Ju, Zhengyang Mao, Siyu Yi, Yifang Qin, Yiyang Gu, Zhiping Xiao, Yifan Wang, Xiao Luo, Ming Zhang

    Abstract: In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. Despite the promising capability of graph neural networks (GNNs), they typically require a large number of costly labeled graphs, while a wealth of unlabeled graphs fail to be effectively utilized. Moreove…

    Submitted 28 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  46. arXiv:2405.04715  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Causality Pursuit from Heterogeneous Environments via Neural Adversarial Invariance Learning

    Authors: Yihong Gu, Cong Fang, Peter Bühlmann, Jianqing Fan

    Abstract: Pursuing causality from data is a fundamental problem in scientific discovery, treatment intervention, and transfer learning. This paper introduces a novel algorithmic method for addressing nonparametric invariance and causality learning in regression models across multiple environments, where the joint distribution of response variables and covariates varies, but the conditional expectations of o…

    Submitted 30 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 48 pages, 7 figures with appendix

    MSC Class: 62G08

  47. arXiv:2405.02760  [pdf, other]

    cs.CE cs.SI

    GTFS2STN: Analyzing GTFS Transit Data by Generating Spatiotemporal Transit Network

    Authors: Diyi Liu, Jing Guo, Yangsong Gu, Meredith King, Lee D. Han, Candace Brakewood

    Abstract: GTFS, the General Transit Feed Specification, is an open standard format for recording transit information, used by thousands of transit agencies across the world. By converting a static GTFS transit network into a spatiotemporal network connecting bus stops over space and time, a preliminary tool named GTFS2STN is implemented to analyze the accessibility of the transit system. Furthermore, a simple app…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures

  48. arXiv:2405.00526  [pdf, other]

    cs.CR

    JNI Global References Are Still Vulnerable: Attacks and Defenses

    Authors: Yi He, Yuan Zhou, Yacong Gu, Purui Su, Qi Li, Yajin Zhou, Yong Jiang

    Abstract: System services and resources in Android are accessed through IPC-based mechanisms. Previous research has demonstrated that they are vulnerable to denial-of-service (DoS) attacks. For instance, the JNI global reference (JGR), which is widely used by system services, can be exhausted to force a system reboot (hence the name JGRE attack). Even though the Android team tries to fix the pro…

    Submitted 1 May, 2024; originally announced May 2024.

  49. arXiv:2404.19441  [pdf, other]

    cs.SD eess.AS

    ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers

    Authors: Yuzhe Gu, Enmao Diao

    Abstract: Neural speech codecs aim to compress input signals into minimal bits while maintaining content quality in a low-latency manner. However, existing codecs often trade computational complexity for reconstruction performance. These codecs primarily use convolutional blocks for feature transformation layers, which are not inherently suited for capturing the local redundancies in speech signals. To comp…

    Submitted 21 June, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  50. arXiv:2404.19019  [pdf, other]

    cs.DS cs.DC

    Optimal Parallel Algorithms for Dendrogram Computation and Single-Linkage Clustering

    Authors: Laxman Dhulipala, Xiaojun Dong, Kishen N Gowda, Yan Gu

    Abstract: Computing a Single-Linkage Dendrogram (SLD) is a key step in the classic single-linkage hierarchical clustering algorithm. Given an input edge-weighted tree $T$, the SLD of $T$ is a binary dendrogram that summarizes the $n-1$ clusterings obtained by contracting the edges of $T$ in order of weight. Existing algorithms for computing the SLD all require $\Omega(n\log n)$ work where $n = |T|$. Furthermore,…

    Submitted 12 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: To appear at SPAA 2024
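
    The definition of the SLD in the abstract above can be made concrete with a minimal sequential sketch. This is not the paper's parallel algorithm: it sorts the tree edges by weight and merges clusters with union-find, which takes O(n log n) work, the bound the abstract attributes to existing algorithms. The function name, the parent-array output format, and breaking weight ties by input order are our own assumptions.

```python
from typing import List, Tuple


def single_linkage_dendrogram(n: int, edges: List[Tuple[int, int, float]]) -> List[int]:
    """Sequential SLD sketch (hypothetical helper, not the paper's algorithm).

    Leaves are tree vertices 0..n-1; internal dendrogram nodes are numbered
    n..2n-2 in merge order. Returns the dendrogram parent of every node,
    with -1 for the root.
    """
    assert len(edges) == n - 1, "input must be a tree with n - 1 edges"
    uf = list(range(n))              # union-find parent pointers over vertices
    cluster_node = list(range(n))    # dendrogram node for each union-find root
    parent = [-1] * (2 * n - 1)      # dendrogram parent pointers

    def find(x: int) -> int:
        while uf[x] != x:
            uf[x] = uf[uf[x]]        # path halving
            x = uf[x]
        return x

    next_id = n
    # Contract edges in nondecreasing weight order; sorted() is stable,
    # so equal weights keep their input order (assumed tie-breaking).
    for u, v, _w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)    # distinct roots, since T has no cycles
        parent[cluster_node[ru]] = next_id
        parent[cluster_node[rv]] = next_id
        uf[rv] = ru                  # merge the two clusters
        cluster_node[ru] = next_id   # the new internal node now represents them
        next_id += 1
    return parent
```

    For a path 0-1-2-3 whose edges (0,1), (1,2), (2,3) have weights 3, 1, 2, the call returns [6, 4, 4, 5, 5, 6, -1]: leaves 1 and 2 merge first into node 4, then node 4 and leaf 3 merge into node 5, and finally leaf 0 and node 5 merge into the root 6.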