Showing 1–50 of 1,290 results for author: Song, Y

Searching in archive cs.

  1. arXiv:2407.20564  [pdf, other]

    cs.CL

    CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge

    Authors: Tianshi Zheng, Jiaxin Bai, Yicheng Wang, Tianqing Fang, Yue Guo, Yauwai Yim, Yangqiu Song

    Abstract: While large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks by acquiring rich factual knowledge from their broad training data, their ability to synthesize and logically reason with this knowledge in complex ways remains underexplored. In this work, we present a systematic evaluation of state-of-the-art LLMs' complex logical reasoni…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 9 pages

  2. arXiv:2407.19740  [pdf, other]

    cs.CL cs.AI

    KNOWCOMP POKEMON Team at DialAM-2024: A Two-Stage Pipeline for Detecting Relations in Dialogical Argument Mining

    Authors: Zihao Zheng, Zhaowei Wang, Qing Zong, Yangqiu Song

    Abstract: Dialogical Argument Mining (DialAM) is an important branch of Argument Mining (AM). DialAM-2024 is a shared task focusing on dialogical argument mining, which requires us to identify argumentative relations and illocutionary relations among proposition nodes and locution nodes. To accomplish this, we propose a two-stage pipeline, which includes the Two-Step S-Node Prediction Model in Stage 1 and the…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Published at the 11th Workshop on Argument Mining

  3. GradCraft: Elevating Multi-task Recommendations through Holistic Gradient Crafting

    Authors: Yimeng Bai, Yang Zhang, Fuli Feng, Jing Lu, Xiaoxue Zang, Chenyi Lei, Yang Song

    Abstract: Recommender systems require the simultaneous optimization of multiple objectives to accurately model user interests, necessitating the application of multi-task learning methods. However, existing multi-task learning methods in recommendations overlook the specific characteristics of recommendation scenarios, falling short in achieving proper gradient balance. To address this challenge, we set the…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD'24

    ACM Class: H.3.3; H.3.5

  4. arXiv:2407.18939  [pdf]

    cs.CY cs.AI

    Promoting AI Competencies for Medical Students: A Scoping Review on Frameworks, Programs, and Tools

    Authors: Yingbo Ma, Yukyeong Song, Jeremy A. Balch, Yuanfang Ren, Divya Vellanki, Zhenhong Hu, Meghan Brennan, Suraj Kolla, Ziyuan Guan, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Parisa Rashidi, Tyler J. Loftus, Azra Bihorac, Benjamin Shickel

    Abstract: As more clinical workflows continue to be augmented by artificial intelligence (AI), AI literacy among physicians will become a critical requirement for ensuring safe and ethical AI-enabled patient care. Despite the evolving importance of AI in healthcare, the extent to which it has been adopted into traditional and often-overloaded medical curricula is currently unknown. In a scoping review of 1,…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 25 pages, 2 figures, 3 tables

  5. arXiv:2407.17842  [pdf, other]

    cs.LG cs.AI

    On the Opportunities of (Re)-Exploring Atmospheric Science by Foundation Models: A Case Study

    Authors: Lujia Zhang, Hanzhe Cui, Yurong Song, Chenyue Li, Binhang Yuan, Mengqian Lu

    Abstract: Most state-of-the-art AI applications in atmospheric science are based on classic deep learning approaches. However, such approaches cannot automatically integrate multiple complicated procedures to construct an intelligent agent, since each functionality is enabled by a separate model learned from independent climate datasets. The emergence of foundation models, especially multimodal foundation m…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 28 pages, 12 figures

  6. arXiv:2407.17630  [pdf, other]

    cs.CV

    Revising the Problem of Partial Labels from the Perspective of CNNs' Robustness

    Authors: Xin Zhang, Yuqi Song, Wyatt McCurdy, Xiaofeng Wang, Fei Zuo

    Abstract: Convolutional neural networks (CNNs) have gained increasing popularity and versatility in recent decades, finding applications in diverse domains. These remarkable achievements are greatly attributed to the support of extensive datasets with precise labels. However, annotating image datasets is intricate and complex, particularly in the case of multi-label datasets. Hence, the concept of partial-l…

    Submitted 24 July, 2024; originally announced July 2024.

  7. arXiv:2407.17398  [pdf, other]

    cs.CV

    3D Question Answering for City Scene Understanding

    Authors: Penglei Sun, Yaoxian Song, Xiang Liu, Xiaofei Yang, Qiang Wang, Tiefeng Li, Yang Yang, Xiaowen Chu

    Abstract: 3D multimodal question answering (MQA) plays a crucial role in scene understanding by enabling intelligent agents to comprehend their surroundings in 3D environments. While existing research has primarily focused on indoor household tasks and outdoor roadside autonomous driving tasks, there has been limited exploration of city-level scene understanding tasks. Furthermore, existing research faces c…

    Submitted 24 July, 2024; originally announced July 2024.

  8. arXiv:2407.16741  [pdf, other]

    cs.SE cs.AI cs.CL

    OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

    Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig

    Abstract: Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenD…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/OpenDevin/OpenDevin

  9. TWIN V2: Scaling Ultra-Long User Behavior Sequence Modeling for Enhanced CTR Prediction at Kuaishou

    Authors: Zihua Si, Lin Guan, ZhongXiang Sun, Xiaoxue Zang, Jing Lu, Yiqun Hui, Xingchao Cao, Zeyu Yang, Yichen Zheng, Dewei Leng, Kai Zheng, Chenbin Zhang, Yanan Niu, Yang Song, Kun Gai

    Abstract: The significance of modeling long-term user interests for CTR prediction tasks in large-scale recommendation systems is progressively gaining attention among researchers and practitioners. Existing work, such as SIM and TWIN, typically employs a two-stage approach to model long-term user behavior sequences for efficiency concerns. The first stage rapidly retrieves a subset of sequences related to…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM 2024

  10. arXiv:2407.16151  [pdf, other]

    cs.RO

    Optimal camera-robot pose estimation in linear time from points and lines

    Authors: Guangyang Zeng, Biqiang Mu, Qingcheng Zeng, Yuchen Song, Chulin Dai, Guodong Shi, Junfeng Wu

    Abstract: Camera pose estimation is a fundamental problem in robotics. This paper focuses on two issues of interest: First, point and line features have complementary advantages, and it is of great value to design a uniform algorithm that can fuse them effectively; Second, with the development of modern front-end techniques, a large number of features can exist in a single image, which presents a potential…

    Submitted 22 July, 2024; originally announced July 2024.

  11. arXiv:2407.15873  [pdf, other]

    cs.LG cs.AI

    CRMSP: A Semi-supervised Approach for Key Information Extraction with Class-Rebalancing and Merged Semantic Pseudo-Labeling

    Authors: Qi Zhang, Yonghong Song, Pengcheng Guo, Yangyang Hui

    Abstract: There is a growing demand in the field of KIE (Key Information Extraction) to apply semi-supervised learning to save manpower and costs, as training document data using fully-supervised methods requires labor-intensive manual annotation. The main challenges of applying SSL in the KIE are (1) underestimation of the confidence of tail classes in the long-tailed distribution and (2) difficulty in ach…

    Submitted 19 July, 2024; originally announced July 2024.

  12. arXiv:2407.15617  [pdf, other]

    cs.CV cs.AI

    Norface: Improving Facial Expression Analysis by Identity Normalization

    Authors: Hanwei Liu, Rudong An, Zhimeng Zhang, Bowen Ma, Wei Zhang, Yan Song, Yujing Hu, Wei Chen, Yu Ding

    Abstract: Facial Expression Analysis remains a challenging task due to unexpected task-irrelevant noise, such as identity, head pose, and background. To address this issue, this paper proposes a novel framework, called Norface, that is unified for both Action Unit (AU) analysis and Facial Emotion Recognition (FER) tasks. Norface consists of a normalization network and a classification network. First, the ca…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  13. arXiv:2407.15176  [pdf, other]

    cs.CL cs.AI

    Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope

    Authors: Xiaoran Liu, Qipeng Guo, Yuerong Song, Zhigeng Liu, Kai Lv, Hang Yan, Linlin Li, Qun Liu, Xipeng Qiu

    Abstract: The maximum supported context length is a critical bottleneck limiting the practical application of the Large Language Model (LLM). Although existing length extrapolation methods can extend the context of LLMs to millions of tokens, these methods all have an explicit upper bound. In this work, we propose LongCache, a training-free approach that enables LLM to support an infinite context with finit…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 8 pages, 7 figures

  14. arXiv:2407.15132  [pdf]

    q-bio.NC cs.LG

    Deep multimodal saliency parcellation of cerebellar pathways: linking microstructure and individual function through explainable multitask learning

    Authors: Ari Tchetchenian, Leo Zekelman, Yuqian Chen, Jarrett Rushmore, Fan Zhang, Edward H. Yeterian, Nikos Makris, Yogesh Rathi, Erik Meijering, Yang Song, Lauren J. O'Donnell

    Abstract: Parcellation of human cerebellar pathways is essential for advancing our understanding of the human brain. Existing diffusion MRI tractography parcellation methods have been successful in defining major cerebellar fibre tracts, while relying solely on fibre tract structure. However, each fibre tract may relay information related to multiple cognitive and motor functions of the cerebellum. Hence, i…

    Submitted 21 July, 2024; originally announced July 2024.

  15. arXiv:2407.14883  [pdf, other]

    eess.SY cs.AI cs.NE

    Inferring Ingrained Remote Information in AC Power Flows Using Neuromorphic Modality Regime

    Authors: Xiaoguang Diao, Yubo Song, Subham Sahoo

    Abstract: In this paper, we infer ingrained remote information in AC power flows using spiking neural network (SNN) as edge processors for efficient coordination of power electronic converters. This work unifies power and information as a means of data normalization using a multi-modal regime in the form of spikes using energy-efficient neuromorphic processing and semantics theory. Firstly, we organize the…

    Submitted 20 July, 2024; originally announced July 2024.

  16. arXiv:2407.14078  [pdf, other]

    cs.CV

    Stable-Hair: Real-World Hair Transfer via Diffusion Model

    Authors: Yuxuan Zhang, Qing Zhang, Yiren Song, Jiaming Liu

    Abstract: Current hair transfer methods struggle to handle diverse and intricate hairstyles, thus limiting their applicability in real-world scenarios. In this paper, we propose a novel diffusion-based hair transfer framework, named Stable-Hair, which robustly transfers a wide range of real-world hairstyles onto user-provided faces for virtual hair try-on. To achieve this goal, our Stable-Hair fram…

    Submitted 19 July, 2024; originally announced July 2024.

  17. arXiv:2407.14053  [pdf, other]

    cs.GR cs.CV

    DirectL: Efficient Radiance Fields Rendering for 3D Light Field Displays

    Authors: Zongyuan Yang, Baolin Liu, Yingde Song, Yongping Xiong, Lan Yi, Zhaohe Zhang, Xunbo Yu

    Abstract: Autostereoscopic display, despite decades of development, has not achieved extensive application, primarily due to the daunting challenge of 3D content creation for non-specialists. The emergence of Radiance Field as an innovative 3D representation has markedly revolutionized the domains of 3D reconstruction and generation. This technology greatly simplifies 3D content creation for common users, b…

    Submitted 19 July, 2024; originally announced July 2024.

  18. arXiv:2407.13068  [pdf, other]

    cs.LG cs.CR

    Krait: A Backdoor Attack Against Graph Prompt Tuning

    Authors: Ying Song, Rita Singh, Balaji Palanisamy

    Abstract: Graph prompt tuning has emerged as a promising paradigm to effectively transfer general graph knowledge from pre-trained models to various downstream tasks, particularly in few-shot contexts. However, its susceptibility to backdoor attacks, where adversaries insert triggers to manipulate outcomes, raises a critical concern. We conduct the first study to investigate such vulnerability, revealing th…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Previously submitted to CCS on 04/29

  19. arXiv:2407.11930  [pdf, other]

    cs.CL

    Fine-grained Hallucination Detection and Mitigation in Long-form Question Answering

    Authors: Rachneet Sachdeva, Yixiao Song, Mohit Iyyer, Iryna Gurevych

    Abstract: Long-form question answering (LFQA) aims to provide thorough and in-depth answers to complex questions, enhancing comprehension. However, such detailed responses are prone to hallucinations and factual inconsistencies, challenging their faithful evaluation. This work introduces HaluQuestQA, the first hallucination dataset with localized error annotations for human-written and model-generated LFQA…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Code and data are available: https://github.com/UKPLab/arxiv2024-lfqa-hallucination

  20. arXiv:2407.11435  [pdf, other]

    q-bio.GN cs.LG stat.ML

    Genomic Language Models: Opportunities and Challenges

    Authors: Gonzalo Benegas, Chengzhong Ye, Carlos Albors, Jianan Canal Li, Yun S. Song

    Abstract: Large language models (LLMs) are having transformative impacts across a wide range of scientific fields, particularly in the biomedical sciences. Just as the goal of Natural Language Processing is to understand sequences of words, a major objective in biology is to understand biological sequences. Genomic Language Models (gLMs), which are LLMs trained on DNA sequences, have the potential to signif…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Review article; 25 pages, 3 figures, 1 table

    MSC Class: 92-08; 92B20; 68T50; 68T07

  21. arXiv:2407.10804  [pdf, other]

    cs.CL

    Mix-CPT: A Domain Adaptation Framework via Decoupling Knowledge Learning and Format Alignment

    Authors: Jinhao Jiang, Junyi Li, Wayne Xin Zhao, Yang Song, Tao Zhang, Ji-Rong Wen

    Abstract: Adapting general large language models (LLMs) to specialized domains presents great challenges due to varied data distributions. This adaptation typically requires continual pre-training on massive domain-specific corpora to facilitate knowledge memorization, followed by training to apply this knowledge following human instructions and preferences. However, this method may result in inefficient kn…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: LLM, CPT, knowledge learning, format alignment; work in progress

  22. arXiv:2407.10648  [pdf, other]

    cs.RO

    Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

    Authors: Yuang Zhang, Yu Hu, Yunlong Song, Danping Zou, Weiyao Lin

    Abstract: Swarm navigation in cluttered environments is a grand challenge in robotics. This work combines deep learning with first-principle physics through differentiable simulation to enable autonomous navigation of multiple aerial robots through complex environments at high speed. Our approach optimizes a neural network control policy directly by backpropagating loss gradients through the robot simulatio…

    Submitted 15 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  23. arXiv:2407.10484  [pdf, other]

    cs.CV cs.LG

    Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry

    Authors: Ziheng Chen, Yue Song, Xiao-Jun Wu, Gaowen Liu, Nicu Sebe

    Abstract: Global Covariance Pooling (GCP) has been demonstrated to improve the performance of Deep Neural Networks (DNNs) by exploiting second-order statistics of high-level representations. GCP typically performs classification of the covariance matrices by applying matrix function normalization, such as matrix logarithm or power, followed by a Euclidean classifier. However, covariance matrices inherently…

    Submitted 20 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 24 pages, 3 figures

  24. arXiv:2407.10457  [pdf, other]

    cs.CL cs.AI

    The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

    Authors: Yifan Song, Guoyin Wang, Sujian Li, Bill Yuchen Lin

    Abstract: Current evaluations of large language models (LLMs) often overlook non-determinism, typically focusing on a single output per example. This limits our understanding of LLM performance variability in real-world applications. Our study addresses this issue by exploring key questions about the performance differences between greedy decoding and sampling, identifying benchmarks' consistency regarding…

    Submitted 15 July, 2024; originally announced July 2024.

  25. arXiv:2407.08883  [pdf]

    cs.CV

    TractGraphFormer: Anatomically Informed Hybrid Graph CNN-Transformer Network for Classification from Diffusion MRI Tractography

    Authors: Yuqian Chen, Fan Zhang, Meng Wang, Leo R. Zekelman, Suheyla Cetin-Karayumak, Tengfei Xue, Chaoyi Zhang, Yang Song, Nikos Makris, Yogesh Rathi, Weidong Cai, Lauren J. O'Donnell

    Abstract: The relationship between brain connections and non-imaging phenotypes is increasingly studied using deep neural networks. However, the local and global properties of the brain's white matter networks are often overlooked in convolutional network design. We introduce TractGraphFormer, a hybrid Graph CNN-Transformer deep learning framework tailored for diffusion MRI tractography. This model leverage…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 23 pages, 4 figures

  26. arXiv:2407.08010  [pdf]

    cs.LG cs.NE

    A New Self-organizing Interval Type-2 Fuzzy Neural Network for Multi-Step Time Series Prediction

    Authors: Fulong Yao, Wanqing Zhao, Matthew Forshaw, Yang Song

    Abstract: This paper proposes a new self-organizing interval type-2 fuzzy neural network with multiple outputs (SOIT2FNN-MO) for multi-step time series prediction. Differing from the traditional six-layer IT2FNN, a nine-layer network is developed to improve prediction accuracy, uncertainty handling and model interpretability. First, a new co-antecedent layer and a modified consequent layer are devised to im…

    Submitted 10 July, 2024; originally announced July 2024.

  27. arXiv:2407.06157  [pdf, other]

    cs.CV cs.AI

    Temporal Grounding of Activities using Multimodal Large Language Models

    Authors: Young Chol Song

    Abstract: Temporal grounding of activities, the identification of specific time intervals of actions within a larger event context, is a critical task in video understanding. Recent advancements in multimodal large language models (LLMs) offer new opportunities for enhancing temporal reasoning capabilities. In this paper, we evaluate the effectiveness of combining image-based and text-based large language m…

    Submitted 30 May, 2024; originally announced July 2024.

  28. arXiv:2407.04604  [pdf, other]

    cs.CV

    PartCraft: Crafting Creative Objects by Parts

    Authors: Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

    Abstract: This paper propels creative control in generative visual AI by allowing users to "select". Departing from traditional text or sketch-based methods, we for the first time allow users to choose visual concepts by parts for their creative endeavors. The outcome is fine-grained generation that precisely captures selected visual concepts, ensuring a holistically faithful and plausible result. To achiev…

    Submitted 8 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. arXiv admin note: substantial text overlap with arXiv:2311.15477

  29. arXiv:2407.03893  [pdf, other]

    cs.CV cs.AI

    Do Generalised Classifiers really work on Human Drawn Sketches?

    Authors: Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song

    Abstract: This paper, for the first time, marries large foundation models with human sketch understanding. We demonstrate what this brings -- a paradigm shift in terms of generalised sketch representation learning (e.g., classification). This generalisation happens on two fronts: (i) generalisation across unknown categories (i.e., open-set), and (ii) generalisation traversing abstraction levels (i.e., good…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  30. arXiv:2407.02846  [pdf, other]

    cs.CV

    Multi-Task Domain Adaptation for Language Grounding with 3D Objects

    Authors: Penglei Sun, Yaoxian Song, Xinglin Pan, Peijie Dong, Xiaofei Yang, Qiang Wang, Zhixu Li, Tiefeng Li, Xiaowen Chu

    Abstract: The existing works on object-level language grounding with 3D objects mostly focus on improving performance by utilizing the off-the-shelf pre-trained models to capture features, such as viewpoint selection or geometric priors. However, they have failed to consider exploring the cross-modal representation of language-vision alignment in the cross-domain field. To answer this problem, we propose a…

    Submitted 5 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  31. arXiv:2407.02607  [pdf, other]

    math.DG cs.LG math.MG

    Product Geometries on Cholesky Manifolds with Applications to SPD Manifolds

    Authors: Ziheng Chen, Yue Song, Xiao-Jun Wu, Nicu Sebe

    Abstract: This paper presents two new metrics on the Symmetric Positive Definite (SPD) manifold via the Cholesky manifold, i.e., the space of lower triangular matrices with positive diagonal elements. We first unveil that the existing popular Riemannian metric on the Cholesky manifold can be generally characterized as the product metric of a Euclidean metric and a Riemannian metric on the space of n-dimensi…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

    MSC Class: 47A64; 26E60; 53C22; 15B48; 58D17; 53C20; 58B20

  32. arXiv:2407.01810  [pdf, other]

    cs.CV

    Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval

    Authors: Aneeshan Sain, Pinaki Nath Chowdhury, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song

    Abstract: In this paper, we delve into the intricate dynamics of Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) by addressing a critical yet overlooked aspect -- the choice of viewpoint during sketch creation. Unlike photo systems that seamlessly handle diverse views through extensive datasets, sketch systems, with limited data collected from fixed perspectives, face challenges. Our pilot study, employ…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted in European Conference on Computer Vision (ECCV) 2024

  33. arXiv:2407.01568  [pdf, other]

    cs.RO

    Agile Robotics: Optimal Control, Reinforcement Learning, and Differentiable Simulation

    Authors: Yunlong Song, Davide Scaramuzza

    Abstract: Control systems are at the core of every real-world robot. They are deployed in an ever-increasing number of applications, ranging from autonomous racing and search-and-rescue missions to industrial inspections and space exploration. To achieve peak performance, certain tasks require pushing the robot to its maximum agility. How can we design control algorithms that enhance the agility of autonomo…

    Submitted 25 May, 2024; originally announced July 2024.

    Comments: This abstract has been accepted for the Robotics: Science and Systems (RSS) Pioneers Workshop, 2024

  34. arXiv:2407.01521  [pdf, other]

    cs.LG cs.AI cs.CV

    Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing

    Authors: Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, Yang Song

    Abstract: Diffusion models have recently achieved success in solving Bayesian inverse problems with learned data priors. Current methods build on top of the diffusion sampling process, where each denoising step makes small modifications to samples from the previous step. However, this process struggles to correct errors from earlier sampling steps, leading to worse performance in complicated nonlinear inver…

    Submitted 1 July, 2024; originally announced July 2024.

  35. arXiv:2407.01336  [pdf, other]

    cs.IT eess.SP

    Compressed Sensing Inspired User Acquisition for Downlink Integrated Sensing and Communication Transmissions

    Authors: Yi Song, Fernando Pedraza, Shuangyang Li, Siyao Li, Han Yu, Giuseppe Caire

    Abstract: This paper investigates radar-assisted user acquisition for downlink multi-user multiple-input multiple-output (MIMO) transmission using Orthogonal Frequency Division Multiplexing (OFDM) signals. Specifically, we formulate a concise mathematical model for the user acquisition problem, where each user is characterized by its delay and beamspace response. Therefore, we propose a two-stage method for…

    Submitted 1 July, 2024; originally announced July 2024.

  36. arXiv:2407.00225  [pdf, other]

    cs.SE

    Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation

    Authors: Wendkûuni C. Ouédraogo, Kader Kaboré, Haoye Tian, Yewei Song, Anil Koyuncu, Jacques Klein, David Lo, Tegawendé F. Bissyandé

    Abstract: Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints. Automated test generation techniques have emerged to address this, but often lack readability and require developer intervention. Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation. However…

    Submitted 28 June, 2024; originally announced July 2024.

  37. arXiv:2406.19598  [pdf, other]

    cs.CL

    Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

    Authors: Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan

    Abstract: Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utili…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 14 pages, 5 figures

  38. arXiv:2406.19560  [pdf, other]

    cs.CV cs.LG eess.IV

    Cost-efficient Active Illumination Camera For Hyper-spectral Reconstruction

    Authors: Yuxuan Zhang, T. M. Sazzad, Yangyang Song, Spencer J. Chang, Ritesh Chowdhry, Tomas Mejia, Anna Hampton, Shelby Kucharski, Stefan Gerber, Barry Tillman, Marcio F. R. Resende, William M. Hammond, Chris H. Wilson, Alina Zare, Sanjeev J. Koppal

    Abstract: Hyper-spectral imaging has recently gained increasing attention for use in different applications, including agricultural investigation, ground tracking, remote sensing and many others. However, the high cost, large physical size and complicated operation process stop hyperspectral cameras from being employed for various applications and research fields. In this paper, we introduce a cost-efficient…

    Submitted 27 June, 2024; originally announced June 2024.

  39. arXiv:2406.19276  [pdf, other]

    cs.CL

    VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation

    Authors: Yixiao Song, Yekyung Kim, Mohit Iyyer

    Abstract: Existing metrics for evaluating the factuality of long-form text, such as FACTSCORE (Min et al., 2023) and SAFE (Wei et al., 2024), decompose an input text into "atomic claims" and verify each against a knowledge base like Wikipedia. These metrics are not suitable for most generation tasks because they assume that every claim is verifiable (i.e., can plausibly be proven true or false). We address…

    Submitted 27 June, 2024; originally announced June 2024.

  40. arXiv:2406.19048  [pdf, other]

    cs.CV cs.AI

    BiCo-Fusion: Bidirectional Complementary LiDAR-Camera Fusion for Semantic- and Spatial-Aware 3D Object Detection

    Authors: Yang Song, Lin Wang

    Abstract: 3D object detection is an important task that has been widely applied in autonomous driving. Recently, fusing multi-modal inputs, i.e., LiDAR and camera data, to perform this task has become a new trend. Existing methods, however, either ignore the sparsity of LiDAR features or fail to preserve the original spatial structure of LiDAR and the semantic density of camera features simultaneously due t…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  41. arXiv:2406.15752  [pdf, other]

    eess.AS cs.AI cs.CL

    TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers

    Authors: Yakun Song, Zhuo Chen, Xiaofei Wang, Ziyang Ma, Guanrou Yang, Xie Chen

    Abstract: Neural codec language model (LM) has demonstrated strong capability in zero-shot text-to-speech (TTS) synthesis. However, the codec LM often suffers from limitations in inference speed and stability, due to its auto-regressive nature and implicit alignment between text and audio. In this work, to handle these challenges, we introduce a new variant of neural codec LM, namely TacoLM. Specifically, T…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: INTERSPEECH 2024

  42. arXiv:2406.15727  [pdf, other]

    eess.IV cs.CV

    Semi-supervised variational autoencoder for cell feature extraction in multiplexed immunofluorescence images

    Authors: Piumi Sandarenu, Julia Chen, Iveta Slapetova, Lois Browne, Peter H. Graham, Alexander Swarbrick, Ewan K. A. Millar, Yang Song, Erik Meijering

    Abstract: Advancements in digital imaging technologies have sparked increased interest in using multiplexed immunofluorescence (mIF) images to visualise and identify the interactions between specific immunophenotypes with the tumour microenvironment at the cellular level. Current state-of-the-art multiplexed immunofluorescence image analysis pipelines depend on cell feature representations characterised by…

    Submitted 27 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

  43. arXiv:2406.14875  [pdf, other]

    cs.SD eess.AS

    GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech

    Authors: Wenbin Wang, Yang Song, Sanjay Jha

    Abstract: This paper introduces GLOBE, a high-quality English corpus with worldwide accents, specifically designed to address the limitations of current zero-shot speaker adaptive Text-to-Speech (TTS) systems that exhibit poor generalizability in adapting to speakers with accents. Compared to commonly used English corpora, such as LibriTTS and VCTK, GLOBE is unique in its inclusion of utterances from 23,519…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024, 4 pages, 3 figures

  44. arXiv:2406.14503  [pdf, other]

    cs.CL

    Overview of the CAIL 2023 Argument Mining Track

    Authors: Jingcong Liang, Junlong Wang, Xinyu Zhai, Yungui Zhuang, Yiyang Zheng, Xin Xu, Xiandong Ran, Xiaozheng Dong, Honghui Rong, Yanlun Liu, Hao Chen, Yuhan Wei, Donghai Li, Jiajie Peng, Xuanjing Huang, Chongde Shi, Yansong Feng, Yun Song, Zhongyu Wei

    Abstract: We give a detailed overview of the CAIL 2023 Argument Mining Track, one of the Chinese AI and Law Challenge (CAIL) 2023 tracks. The main goal of the track is to identify and extract interacting argument pairs in trial dialogs. It mainly uses summarized judgment documents but can also refer to trial recordings. The track consists of two stages, and we introduce the tasks designed for each stage; we…

    Submitted 20 June, 2024; originally announced June 2024.

  45. arXiv:2406.14326  [pdf, other]

    cs.CL

    medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs

    Authors: Mingyi Jia, Junwen Duan, Yan Song, Jianxin Wang

    Abstract: Electronic Medical Records (EMRs), while integral to modern healthcare, present challenges for clinical reasoning and diagnosis due to their complexity and information redundancy. To address this, we proposed medIKAL (Integrating Knowledge Graphs as Assistants of LLMs), a framework that combines Large Language Models (LLMs) with knowledge graphs (KGs) to enhance diagnostic capabilities. medIKAL as…

    Submitted 20 June, 2024; originally announced June 2024.

  46. arXiv:2406.13944  [pdf, other]

    math.ST cs.LG stat.ME stat.ML

    Generalization error of min-norm interpolators in transfer learning

    Authors: Yanke Song, Sohom Bhattacharya, Pragya Sur

    Abstract: This paper establishes the generalization error of pooled min-$\ell_2$-norm interpolation in transfer learning where data from diverse distributions are available. Min-norm interpolators emerge naturally as implicit regularized limits of modern machine learning algorithms. Previous work characterized their out-of-distribution risk when samples from the test distribution are unavailable during trai…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 53 pages, 2 figures

  47. arXiv:2406.13873  [pdf, other]

    cs.AI

    A Pure Transformer Pretraining Framework on Text-attributed Graphs

    Authors: Yu Song, Haitao Mao, Jiachen Xiao, Jingzhe Liu, Zhikai Chen, Wei Jin, Carl Yang, Jiliang Tang, Hui Liu

    Abstract: Pretraining plays a pivotal role in acquiring generalized knowledge from large-scale data, achieving remarkable successes as evidenced by large models in CV and NLP. However, progress in the graph domain remains limited due to fundamental challenges such as feature heterogeneity and structural heterogeneity. Recently, increasing efforts have been made to enhance node feature quality with Large Lan…

    Submitted 19 June, 2024; originally announced June 2024.

  48. arXiv:2406.12913  [pdf, other]

    cs.LG cs.AI

    T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation

    Authors: Lihuan Li, Hao Xue, Yang Song, Flora Salim

    Abstract: Trajectory similarity computation is an essential technique for analyzing moving patterns of spatial data across various applications such as traffic management, wildlife tracking, and location-based services. Modern methods often apply deep learning techniques to approximate heuristic metrics but struggle to learn more robust and generalized representations from the vast amounts of unlabeled traj…

    Submitted 13 June, 2024; originally announced June 2024.

  49. arXiv:2406.12403  [pdf, other]

    cs.CL cs.AI

    PDSS: A Privacy-Preserving Framework for Step-by-Step Distillation of Large Language Models

    Authors: Tao Fan, Yan Kang, Weijing Chen, Hanlin Gu, Yuanfeng Song, Lixin Fan, Kai Chen, Qiang Yang

    Abstract: In the context of real-world applications, leveraging large language models (LLMs) for domain-specific tasks often faces two major challenges: domain-specific knowledge privacy and constrained resources. To address these issues, we propose PDSS, a privacy-preserving framework for step-by-step distillation of LLMs. PDSS works on a server-client architecture, wherein client transmits perturbed promp…

    Submitted 18 June, 2024; originally announced June 2024.

  50. arXiv:2406.11828  [pdf, other]

    cs.LG stat.ML

    Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

    Authors: Kazusato Oko, Yujin Song, Taiji Suzuki, Denny Wu

    Abstract: We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x) = \frac{1}{\sqrt{M}}\sum_{m=1}^M f_m(\langle x, v_m\rangle)$, where $f_1,f_2,...,f_M:\mathbb{R}\to\mathbb{R}$ are nonlinear link functions of single-index models (ridge functions) with diverse and near-orthogonal index features $\{v_m\}_{m=1}^M$,…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: COLT 2024