
Showing 1–50 of 309 results for author: Yuan, C

Searching in archive cs.
  1. arXiv:2406.11934 [pdf, other]

    cs.LG cs.AI cs.CE cs.HC

    Bridging Design Gaps: A Parametric Data Completion Approach With Graph Guided Diffusion Models

    Authors: Rui Zhou, Chenyang Yuan, Frank Permenter, Yanxia Zhang, Nikos Arechiga, Matt Klenk, Faez Ahmed

    Abstract: This study introduces a generative imputation model leveraging graph attention networks and tabular diffusion models for completing missing parametric data in engineering designs. This model functions as an AI design co-pilot, providing multiple design options for incomplete designs, which we demonstrate using the bicycle design CAD dataset. Through comparative evaluations, we demonstrate that our…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: IDETC 2024 Accepted

  2. arXiv:2406.11391 [pdf, other]

    cs.LG

    P-TA: Using Proximal Policy Optimization to Enhance Tabular Data Augmentation via Large Language Models

    Authors: Shuo Yang, Chenchen Yuan, Yao Rong, Felix Steinbauer, Gjergji Kasneci

    Abstract: A multitude of industries depend on accurate and reasonable tabular data augmentation for their business processes. Contemporary methodologies in generating tabular data revolve around utilizing Generative Adversarial Networks (GAN) or fine-tuning Large Language Models (LLM). However, GAN-based approaches are documented to produce samples with common-sense errors attributed to the absence of exter…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: The paper was accepted by findings of ACL 2024

  3. arXiv:2406.11328 [pdf, other]

    cs.CL

    Are Large Language Models True Healthcare Jacks-of-All-Trades? Benchmarking Across Health Professions Beyond Physician Exams

    Authors: Zheheng Luo, Chenhan Yuan, Qianqian Xie, Sophia Ananiadou

    Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated their potential in delivering accurate answers to questions about world knowledge. Despite this, existing benchmarks for evaluating LLMs in healthcare predominantly focus on medical doctors, leaving other critical healthcare professions underrepresented. To fill this research gap, we introduce the Examinations for Medical Person…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 15 pages, 4 figures

  4. arXiv:2405.17708 [pdf, other]

    cs.LG cs.AI stat.ML

    OPERA: Automatic Offline Policy Evaluation with Re-weighted Aggregates of Multiple Estimators

    Authors: Allen Nie, Yash Chandak, Christina J. Yuan, Anirudhan Badrinath, Yannis Flet-Berliac, Emma Brunskill

    Abstract: Offline policy evaluation (OPE) allows us to evaluate and estimate a new sequential decision-making policy's performance by leveraging historical interaction data collected from other policies. Evaluating a new policy online without a confident estimate of its performance can lead to costly, unsafe, or hazardous outcomes, especially in education and healthcare. Several OPE estimators have been pro…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 22 pages

  5. arXiv:2405.16560 [pdf, other]

    cs.LG

    Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

    Authors: Yongxian Wei, Zixuan Hu, Li Shen, Zhenyi Wang, Yu Li, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data, enabling the rapid adaptation to new unseen tasks. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts. In this paper, we empirically and theoretically identify and analyze the mode…

    Submitted 26 May, 2024; originally announced May 2024.

  6. arXiv:2405.15925 [pdf]

    eess.IV cs.CV cs.LG

    MUCM-Net: A Mamba Powered UCM-Net for Skin Lesion Segmentation

    Authors: Chunyu Yuan, Dongfang Zhao, Sos S. Agaian

    Abstract: Skin lesion segmentation is key for early skin cancer detection. Challenges in automatic segmentation from dermoscopic images include variations in color, texture, and artifacts of indistinct lesion boundaries. Deep learning methods like CNNs and U-Net have shown promise in addressing these issues. To further aid early diagnosis, especially on mobile devices with limited computing power, we presen…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 11 pages, 8 figures, journal paper (under review)

  7. arXiv:2405.12484 [pdf, other]

    cs.GR

    Meta-Homogenization for Knitwear Simulation

    Authors: Chun Yuan, Kui Wu, Haoyang Shi, Lei Lan, Yuxing Qiu, Cem Yuksel, Huamin Wang, Chenfanfu Jiang, Yin Yang

    Abstract: This paper presents meta-homogenization, a spatially varying homogenization scheme for knitwear simulation. We are motivated by the observation that macro-scale fabric dynamics is strongly correlated with its underlying knitting patterns. Therefore, homogenization towards a single material is less effective when the knitting is complex and non-repetitive. Our method tackles this challenge by homog…

    Submitted 23 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  8. arXiv:2405.11921 [pdf, other]

    cs.CV

    MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections

    Authors: Jiayue Liu, Xiao Tang, Freeman Cheng, Roy Yang, Zhihao Li, Jianzhuang Liu, Yi Huang, Jiaqi Lin, Shiyong Liu, Xiaofei Wu, Songcen Xu, Chun Yuan

    Abstract: 3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting.…

    Submitted 20 May, 2024; originally announced May 2024.

  9. arXiv:2405.11732 [pdf]

    cs.CV physics.med-ph

    Quality assurance of organs-at-risk delineation in radiotherapy

    Authors: Yihao Zhao, Cuiyun Yuan, Ying Liang, Yang Li, Chunxia Li, Man Zhao, Jun Hu, Wei Liu, Chenbin Liu

    Abstract: The delineation of tumor target and organs-at-risk is critical in the radiotherapy treatment planning. Automatic segmentation can be used to reduce the physician workload and improve the consistency. However, the quality assurance of the automatic segmentation is still an unmet need in clinical practice. The patient data used in our study was a standardized dataset from AAPM Thoracic Auto-Segmenta…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 3 tables

    MSC Class: 68T07; ACM Class: I.4.9

  10. arXiv:2405.08816 [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  11. arXiv:2405.00988 [pdf, other]

    cs.CL cs.LG

    Context-Aware Clustering using Large Language Models

    Authors: Sindhu Tipirneni, Ravinarayana Adkathimar, Nurendra Choudhary, Gaurush Hiranandani, Rana Ali Amjad, Vassilis N. Ioannidis, Changhe Yuan, Chandan K. Reddy

    Abstract: Despite the remarkable success of Large Language Models (LLMs) in text understanding and generation, their potential for text clustering tasks remains underexplored. We observed that powerful closed-source LLMs provide good quality clusterings of entity sets but are not scalable due to the massive compute power required and the associated costs. Thus, we propose CACTUS (Context-Aware ClusTering wi…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 16 pages

    ACM Class: I.2.7; I.2.m

  12. arXiv:2405.00984 [pdf, other]

    cs.LG cs.CV

    FREE: Faster and Better Data-Free Meta-Learning

    Authors: Yongxian Wei, Zixuan Hu, Zhenyi Wang, Li Shen, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data, presenting practical benefits in contexts constrained by data privacy concerns. Current DFML methods primarily focus on the data recovery from these pre-trained models. However, they suffer from slow recovery speed and overlook gaps inherent in heterogeneous pre-tra…

    Submitted 1 May, 2024; originally announced May 2024.

  13. arXiv:2405.00797 [pdf, other]

    cs.RO cs.CV

    ADM: Accelerated Diffusion Model via Estimated Priors for Robust Motion Prediction under Uncertainties

    Authors: Jiahui Li, Tianle Shen, Zekai Gu, Jiawei Sun, Chengran Yuan, Yuhang Han, Shuo Sun, Marcelo H. Ang Jr

    Abstract: Motion prediction is a challenging problem in autonomous driving as it demands the system to comprehend stochastic dynamics and the multi-modal nature of real-world agent interactions. Diffusion models have recently risen to prominence, and have proven particularly effective in pedestrian motion prediction tasks. However, the significant time consumption and sensitivity to noise have limited the r…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures

  14. arXiv:2404.16880 [pdf, other]

    q-bio.QM cs.AI cs.CL

    Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

    Authors: Yikun Zhang, Geyan Ye, Chaohao Yuan, Bo Han, Long-Kai Huang, Jianhua Yao, Wei Liu, Yu Rong

    Abstract: Molecule-and-text cross-modal representation learning has emerged as a promising direction for enhancing the quality of molecular representation, thereby improving performance in various scientific fields, including drug discovery and materials science. Existing studies adopt a global alignment approach to learn the knowledge from different modalities. These global alignment approaches fail to cap…

    Submitted 23 April, 2024; originally announced April 2024.

  15. arXiv:2404.16866 [pdf, other]

    q-bio.QM cs.AI cs.LG

    Functional Protein Design with Local Domain Alignment

    Authors: Chaohao Yuan, Songyou Li, Geyan Ye, Yikun Zhang, Long-Kai Huang, Wenbing Huang, Wei Liu, Jianhua Yao, Yu Rong

    Abstract: The core challenge of de novo protein design lies in creating proteins with specific functions or properties, guided by certain conditions. Current models explore to generate protein using structural and evolutionary guidance, which only provide indirect conditions concerning functions and properties. However, textual annotations of proteins, especially the annotations for protein domains, which d…

    Submitted 27 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  16. arXiv:2404.13230 [pdf, other]

    cs.IT math.CO

    Random Gabidulin Codes Achieve List Decoding Capacity in the Rank Metric

    Authors: Zeyu Guo, Chaoping Xing, Chen Yuan, Zihan Zhang

    Abstract: Gabidulin codes, serving as the rank-metric counterpart of Reed-Solomon codes, constitute an important class of maximum rank distance (MRD) codes. However, unlike the fruitful positive results about the list decoding of Reed-Solomon codes, results concerning the list decodability of Gabidulin codes in the rank metric are all negative so far. For example, in contrast to Reed-Solomon codes, which ar…

    Submitted 19 April, 2024; originally announced April 2024.

  17. arXiv:2404.10688 [pdf, other]

    cs.CV cs.LG

    Efficient Conditional Diffusion Model with Probability Flow Sampling for Image Super-resolution

    Authors: Yutao Yuan, Chun Yuan

    Abstract: Image super-resolution is a fundamentally ill-posed problem because multiple valid high-resolution images exist for one low-resolution image. Super-resolution methods based on diffusion probabilistic models can deal with the ill-posed nature by learning the distribution of high-resolution images conditioned on low-resolution images, avoiding the problem of blurry images in PSNR-oriented methods. H…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: AAAI 2024

  18. arXiv:2404.10295 [pdf, other]

    cs.RO

    ControlMTR: Control-Guided Motion Transformer with Scene-Compliant Intention Points for Feasible Motion Prediction

    Authors: Jiawei Sun, Chengran Yuan, Shuo Sun, Shanze Wang, Yuhang Han, Shuailei Ma, Zefan Huang, Anthony Wong, Keng Peng Tee, Marcelo H. Ang Jr

    Abstract: The ability to accurately predict feasible multimodal future trajectories of surrounding traffic participants is crucial for behavior planning in autonomous vehicles. The Motion Transformer (MTR), a state-of-the-art motion prediction method, alleviated mode collapse and instability during training and enhanced overall prediction performance by replacing conventional dense future endpoints with a s…

    Submitted 17 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  19. arXiv:2404.08001 [pdf, other]

    hep-ph cs.AI cs.CL cs.LG hep-ex physics.comp-ph

    Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics

    Authors: Zhengde Zhang, Yiyu Zhang, Haodong Yao, Jianwen Luo, Rui Zhao, Bo Huang, Jiameng Zhao, Yipu Liao, Ke Li, Lina Zhao, Jun Cao, Fazhi Qi, Changzheng Yuan

    Abstract: Large Language Models (LLMs) are undergoing a period of rapid updates and changes, with state-of-the-art (SOTA) model frequently being replaced. When applying LLMs to a specific scientific field, it's challenging to acquire unique domain knowledge while keeping the model itself advanced. To address this challenge, a sophisticated large language model system named as Xiwu has been developed, allowi…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

    ACM Class: I.2.7

  20. arXiv:2403.19272 [pdf, other]

    cs.GR

    Mil2: Efficient Cloth Simulation Using Non-distance Barriers and Subspace Reuse

    Authors: Lei Lan, Zixuan Lu, Jingyi Long, Chun Yuan, Xuan Li, Xiaowei He, Huamin Wang, Chenfanfu Jiang, Yin Yang

    Abstract: Mil2 pushes the performance of high-resolution cloth simulation, making the simulation interactive (in milliseconds) for models with one million degrees of freedom (DOFs) while keeping every triangle untangled. The guarantee of being penetration-free is inspired by the interior-point method, which converts the inequality constraints to barrier potentials. Nevertheless, we propose a major overhaul…

    Submitted 23 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  21. arXiv:2403.16368 [pdf, other]

    cs.CV

    Distilling Semantic Priors from SAM to Efficient Image Restoration Models

    Authors: Quan Zhang, Xiaoyu Liu, Wei Li, Hanting Chen, Junchao Liu, Jie Hu, Zhiwei Xiong, Chun Yuan, Yunhe Wang

    Abstract: In image restoration (IR), leveraging semantic priors from segmentation models has been a common approach to improve performance. The recent segment anything model (SAM) has emerged as a powerful tool for extracting advanced semantic priors to enhance IR tasks. However, the computational cost of SAM is prohibitive for IR, compared to existing smaller IR models. The incorporation of SAM for extract…

    Submitted 2 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  22. arXiv:2403.12957 [pdf, other]

    cs.CV

    GVGEN: Text-to-3D Generation with Volumetric Representation

    Authors: Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He

    Abstract: In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address these shortcomings, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques: (1) Structured Vo…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: project page: https://gvgen.github.io/

  23. arXiv:2403.06249 [pdf, other]

    cs.CE cs.CL

    No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks

    Authors: Gang Hu, Ke Qin, Chenhan Yuan, Min Peng, Alejandro Lopez-Lira, Benyou Wang, Sophia Ananiadou, Wanlong Yu, Jimin Huang, Qianqian Xie

    Abstract: While the progression of Large Language Models (LLMs) has notably propelled financial analysis, their application has largely been confined to singular language realms, leaving untapped the potential of bilingual Chinese-English capacity. To bridge this chasm, we introduce ICE-PIXIU, seamlessly amalgamating the ICE-INTENT model and ICE-FLARE benchmark for bilingual financial analysis. ICE-PIXIU un…

    Submitted 16 April, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: 24 pages, 5 figures, 12 tables, including Appendix

  24. arXiv:2403.04993 [pdf, other]

    cs.CV

    PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts

    Authors: Zewen Chen, Haina Qin, Juan Wang, Chunfeng Yuan, Bing Li, Weiming Hu, Liang Wang

    Abstract: Due to the diversity of assessment requirements in various application scenarios for the IQA task, existing IQA methods struggle to directly adapt to these varied requirements after training. Thus, when facing new requirements, a typical approach is fine-tuning these models on datasets specifically created for those requirements. However, it is time-consuming to establish IQA datasets. In this wor…

    Submitted 7 March, 2024; originally announced March 2024.

  25. arXiv:2403.01241 [pdf, other]

    cs.CL cs.AI

    IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact

    Authors: Ruikang Liu, Haoli Bai, Haokun Lin, Yuening Li, Han Gao, Zhengzhuo Xu, Lu Hou, Jun Yao, Chun Yuan

    Abstract: Large language models (LLMs) excel in natural language processing but demand intensive computation. To mitigate this, various quantization methods have been explored, yet they compromise LLM performance. This paper unveils a previously overlooked type of outliers in LLMs. Such outliers are found to allocate most of the attention scores on initial tokens of input, termed as pivot tokens, which are…

    Submitted 25 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: Accepted by ACL 2024 findings

  26. arXiv:2403.00987 [pdf, other]

    cs.MA cs.RO eess.SY

    Composite Distributed Learning and Synchronization of Nonlinear Multi-Agent Systems with Complete Uncertain Dynamics

    Authors: Emadodin Jandaghi, Dalton L. Stein, Adam Hoburg, Paolo Stegagno, Mingxi Zhou, Chengzhi Yuan

    Abstract: This paper addresses the problem of composite synchronization and learning control in a network of multi-agent robotic manipulator systems with heterogeneous nonlinear uncertainties under a leader-follower framework. A novel two-layer distributed adaptive learning control strategy is introduced, comprising a first-layer distributed cooperative estimator and a second-layer decentralized determinist…

    Submitted 9 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  27. arXiv:2403.00249 [pdf, other]

    cs.CV

    Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training

    Authors: Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

    Abstract: In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment. However, in most existing methods, the reconstruction targets for MIM lack high-level semantics, and text is not sufficiently involved in masked modeling. These two drawbacks limit the effect of MIM in facilitating cross-modal semantic alignment. In this work, we…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  28. arXiv:2402.19231 [pdf, other]

    cs.CV cs.RO

    CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition

    Authors: Feng Lu, Xiangyuan Lan, Lijun Zhang, Dongmei Jiang, Yaowei Wang, Chun Yuan

    Abstract: Over the past decade, most methods in visual place recognition (VPR) have used neural networks to produce feature representations. These networks typically produce a global representation of a place image using only this image itself and neglect the cross-image variations (e.g. viewpoint and illumination), which limits their robustness in challenging scenes. In this paper, we propose a robust glob…

    Submitted 1 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR2024

  29. arXiv:2402.18439 [pdf, other]

    cs.CL cs.AI

    Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication

    Authors: Weize Chen, Chenfei Yuan, Jiarui Yuan, Yusheng Su, Chen Qian, Cheng Yang, Ruobing Xie, Zhiyuan Liu, Maosong Sun

    Abstract: Natural language (NL) has long been the predominant format for human cognition and communication, and by extension, has been similarly pivotal in the development and application of Large Language Models (LLMs). Yet, besides NL, LLMs have seen various non-NL formats during pre-training, such as code and logical expression. NL's status as the optimal format for LLMs, particularly in single-LLM reaso…

    Submitted 18 June, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Code release at https://github.com/thunlp/AutoForm

  30. arXiv:2402.16769 [pdf, other]

    cs.CV

    Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval

    Authors: Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu

    Abstract: In video-text retrieval, most existing methods adopt the dual-encoder architecture for fast retrieval, which employs two individual encoders to extract global latent representations for videos and texts. However, they face challenges in capturing fine-grained semantic concepts. In this work, we propose the UNIFY framework, which learns lexicon representations to capture fine-grained semantics and…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted to LREC-COLING 2024

  31. Deep Homography Estimation for Visual Place Recognition

    Authors: Feng Lu, Shuting Dong, Lijun Zhang, Bingxi Liu, Xiangyuan Lan, Dongmei Jiang, Chun Yuan

    Abstract: Visual place recognition (VPR) is a fundamental task for many applications such as robot localization and augmented reality. Recently, the hierarchical VPR methods have received considerable attention due to the trade-off between accuracy and efficiency. They usually first use global features to retrieve the candidate images, then verify the spatial consistency of matched local features for re-ran…

    Submitted 18 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI2024

    Journal ref: AAAI 2024

  32. arXiv:2402.14505 [pdf, other]

    cs.CV cs.AI

    Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition

    Authors: Feng Lu, Lijun Zhang, Xiangyuan Lan, Shuting Dong, Yaowei Wang, Chun Yuan

    Abstract: Recent studies show that vision models pre-trained in generic visual learning tasks with large-scale data can provide useful feature representations for a wide range of visual perception problems. However, few attempts have been made to exploit pre-trained foundation models in visual place recognition (VPR). Due to the inherent difference in training objectives and data between the tasks of model…

    Submitted 3 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: ICLR2024

  33. arXiv:2402.14259 [pdf, other]

    cs.CL cs.AI cs.LG

    Word-Sequence Entropy: Towards Uncertainty Estimation in Free-Form Medical Question Answering Applications and Beyond

    Authors: Zhiyuan Wang, Jinhao Duan, Chenxi Yuan, Qingyu Chen, Tianlong Chen, Huaxiu Yao, Yue Zhang, Ren Wang, Kaidi Xu, Xiaoshuang Shi

    Abstract: Uncertainty estimation plays a pivotal role in ensuring the reliability of safety-critical human-AI interaction systems, particularly in the medical domain. However, a general method for quantifying the uncertainty of free-form answers has yet to be established in open-ended medical question-answering (QA) tasks, where irrelevant words and sequences with limited semantic information can be the pri…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 18 pages

  34. arXiv:2402.12659 [pdf, other]

    cs.CL cs.AI cs.CE

    FinBen: A Holistic Financial Benchmark for Large Language Models

    Authors: Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu, et al. (9 additional authors not shown)

    Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of comprehensive evaluation benchmarks, the rapid development of LLMs, and the complexity of financial tasks. In this paper, we introduce FinBen, the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks, covering seven critical…

    Submitted 18 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 26 pages, 11 figures

  35. arXiv:2402.11533 [pdf, ps, other]

    cs.IT

    Randomness-Efficient Constructions of Capacity-Achieving List-Decodable Codes

    Authors: Jonathan Mosheiff, Nicolas Resch, Kuo Shang, Chen Yuan

    Abstract: We wish to generate list-decodable codes over small alphabets using as little randomness as possible. Specifically, we hope to generate codes achieving what we term the Elias bound, which means that they are $(ρ,L)$-list-decodable with rate $R \geq 1-h(ρ)-O(1/L)$. A long line of work shows that uniformly random linear codes (RLCs) achieve the Elias bound: hence, we know $O(n^2)$ random bits suffic…

    Submitted 15 May, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  36. arXiv:2402.07405 [pdf, other]

    cs.CL

    Dólares or Dollars? Unraveling the Bilingual Prowess of Financial LLMs Between Spanish and English

    Authors: Xiao Zhang, Ruoyu Xiang, Chenhan Yuan, Duanyu Feng, Weiguang Han, Alejandro Lopez-Lira, Xiao-Yang Liu, Sophia Ananiadou, Min Peng, Jimin Huang, Qianqian Xie

    Abstract: Despite Spanish's pivotal role in the global finance industry, a pronounced gap exists in Spanish financial natural language processing (NLP) and application studies compared to English, especially in the era of large language models (LLMs). To bridge this gap, we unveil Toisón de Oro, the first bilingual framework that establishes instruction datasets, finetuned LLMs, and evaluation benchmark for…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: 10 pages, 2 figures

  37. arXiv:2402.02379 [pdf, other]

    cs.CL

    Rethinking the Evaluation of Pre-trained Text-and-Layout Models from an Entity-Centric Perspective

    Authors: Chong Zhang, Yixi Zhao, Chenshu Yuan, Yi Tu, Ya Guo, Qi Zhang

    Abstract: Recently developed pre-trained text-and-layout models (PTLMs) have shown remarkable success in multiple information extraction tasks on visually-rich documents. However, the prevailing evaluation pipeline may not be sufficiently robust for assessing the information extraction ability of PTLMs, due to inadequate annotations within the benchmarks. Therefore, we claim the necessary standards for an i…

    Submitted 4 February, 2024; originally announced February 2024.

  38. arXiv:2401.17868 [pdf, other]

    cs.CV cs.LG

    Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model

    Authors: Zihan Zhong, Zhiqiang Tang, Tong He, Haoyang Fang, Chun Yuan

    Abstract: The Segment Anything Model (SAM) stands as a foundational framework for image segmentation. While it exhibits remarkable zero-shot generalization in typical scenarios, its advantage diminishes when applied to specialized domains like medical imagery and remote sensing. To address this limitation, this paper introduces Conv-LoRA, a simple yet effective parameter-efficient fine-tuning approach. By i…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: Accepted at ICLR 2024 Conference

  39. arXiv:2401.11439 [pdf, other]

    cs.RO cs.AI cs.CV

    General Flow as Foundation Affordance for Scalable Robot Learning

    Authors: Chengbo Yuan, Chuan Wen, Tong Zhang, Yang Gao

    Abstract: We address the challenge of acquiring real-world manipulation skills with a scalable framework. Inspired by the success of large-scale auto-regressive prediction in Large Language Models (LLMs), we hold the belief that identifying an appropriate prediction target capable of leveraging large-scale datasets is crucial for achieving efficient and universal learning. Therefore, we propose to utilize fl…

    Submitted 21 January, 2024; originally announced January 2024.

  40. arXiv:2401.11123 [pdf, other]

    cs.CV

    Uncertainty-aware Bridge based Mobile-Former Network for Event-based Pattern Recognition

    Authors: Haoxiang Yang, Chengguo Yuan, Yabin Zhu, Lan Chen, Xiao Wang, Jin Tang

    Abstract: The mainstream human activity recognition (HAR) algorithms are developed based on RGB cameras, which are easily influenced by low-quality images (e.g., low illumination, motion blur). Meanwhile, the privacy protection issue caused by ultra-high definition (HD) RGB cameras aroused more and more people's attention. Inspired by the success of event cameras which perform better on high dynamic range,…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: Short Paper. arXiv admin note: text overlap with arXiv:2306.05239

  41. arXiv:2401.10511 [pdf, other]

    cs.CV

    GMC-IQA: Exploiting Global-correlation and Mean-opinion Consistency for No-reference Image Quality Assessment

    Authors: Zewen Chen, Juan Wang, Bing Li, Chunfeng Yuan, Weiming Hu, Junxian Liu, Peng Li, Yan Wang, Youqun Zhang, Congxuan Zhang

    Abstract: Due to the subjective nature of image quality assessment (IQA), assessing which image has better quality among a sequence of images is more reliable than assigning an absolute mean opinion score for an image. Thus, IQA models are evaluated by global correlation consistency (GCC) metrics like PLCC and SROCC, rather than mean opinion consistency (MOC) metrics like MAE and MSE. However, most existing…

    Submitted 19 January, 2024; originally announced January 2024.

  42. arXiv:2401.10222  [pdf, other]

    cs.CV cs.AI

    Supervised Fine-tuning in turn Improves Visual Foundation Models

    Authors: Xiaohu Jiang, Yixiao Ge, Yuying Ge, Dachuan Shi, Chun Yuan, Ying Shan

    Abstract: Image-text training like CLIP has dominated the pretraining of vision foundation models in recent years. Subsequent efforts have been made to introduce region-level visual learning into CLIP's pretraining but face scalability challenges due to the lack of large-scale region-level datasets. Drawing inspiration from supervised fine-tuning (SFT) in natural language processing such as instruction tuni…

    Submitted 11 April, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 23 pages, 3 figures, Project page: https://github.com/TencentARC/ViSFT/tree/main

  43. arXiv:2401.08478  [pdf, other]

    cs.LG cs.AI

    Solving Continual Offline Reinforcement Learning with Decision Transformer

    Authors: Kaixin Huang, Li Shen, Chen Zhao, Chun Yuan, Dacheng Tao

    Abstract: Continuous offline reinforcement learning (CORL) combines continuous and offline reinforcement learning, enabling agents to learn multiple tasks from static datasets without forgetting prior tasks. However, CORL faces challenges in balancing stability and plasticity. Existing methods, employing Actor-Critic structures and experience replay (ER), suffer from distribution shifts, low efficiency, and…

    Submitted 7 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 11 pages, 6 figures

  44. arXiv:2401.02141  [pdf, other]

    cs.CV

    Bayesian Intrinsic Groupwise Image Registration: Unsupervised Disentanglement of Anatomy and Geometry

    Authors: Xinzhe Luo, Xin Wang, Linda Shapiro, Chun Yuan, Jianfeng Feng, Xiahai Zhuang

    Abstract: This article presents a general Bayesian learning framework for multi-modal groupwise registration on medical images. The method builds on probabilistic modelling of the image generative process, where the underlying common anatomy and geometric variations of the observed images are explicitly disentangled as latent variables. Thus, groupwise registration is achieved through the solution to Bayesi…

    Submitted 4 January, 2024; originally announced January 2024.

  45. arXiv:2312.15915  [pdf, other]

    cs.CV

    ChartBench: A Benchmark for Complex Visual Reasoning in Charts

    Authors: Zhengzhuo Xu, Sinan Du, Yiyan Qi, Chengjin Xu, Chun Yuan, Jian Guo

    Abstract: Multimodal Large Language Models (MLLMs) have shown impressive capabilities in image understanding and generation. However, current benchmarks fail to accurately evaluate the chart comprehension of MLLMs due to limited chart types and inappropriate metrics. To address this, we propose ChartBench, a comprehensive benchmark designed to assess chart comprehension and data reliability through complex…

    Submitted 18 June, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  46. arXiv:2312.15720  [pdf, other]

    cs.CV

    Set Prediction Guided by Semantic Concepts for Diverse Video Captioning

    Authors: Yifan Lu, Ziqi Zhang, Chunfeng Yuan, Peng Li, Yan Wang, Bing Li, Weiming Hu

    Abstract: Diverse video captioning aims to generate a set of sentences to describe the given video in various aspects. Mainstream methods are trained with independent pairs of a video and a caption from its ground-truth set without exploiting the intra-set relationship, resulting in low diversity of generated captions. Different from them, we formulate diverse captioning into a semantic-concept-guided set p…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024

  47. arXiv:2312.10247  [pdf, other]

    cs.CR cs.DS

    Secure and Accurate Summation of Many Floating-Point Numbers

    Authors: Marina Blanton, Michael T. Goodrich, Chen Yuan

    Abstract: Motivated by the importance of floating-point computations, we study the problem of securely and accurately summing many floating-point numbers. Prior work has focused on security absent accuracy or accuracy absent security, whereas our approach achieves both of them. Specifically, we show how to implement floating-point superaccumulators using secure multi-party computation techniques, so that a…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Corrected version of the paper published at PETS 2023

    Journal ref: Proceedings on Privacy Enhancing Technologies (PoPETs), Vol. 2023, No. 3, pp. 432-445, 2023

  48. arXiv:2312.03594  [pdf, other]

    cs.CV

    A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

    Authors: Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, Kai Chen

    Abstract: Achieving high-quality versatile image inpainting, where user-specified regions are filled with plausible content according to user intent, presents a significant challenge. Existing methods face difficulties in simultaneously addressing context-aware image inpainting and text-guided object inpainting due to the distinct optimal training strategies required. To overcome this challenge, we introduc…

    Submitted 11 December, 2023; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Project page with code: https://powerpaint.github.io/

  49. arXiv:2311.14756  [pdf, other]

    cs.LG cs.AI

    Task-Distributionally Robust Data-Free Meta-Learning

    Authors: Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Baoyuan Wu, Chun Yuan, Dacheng Tao

    Abstract: Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. Existing inversion-based DFML methods construct pseudo tasks from a learnable dataset, which is inversely generated from the pre-trained model pool. For the first time, we reveal two major challenges hindering their practical deployments: Task…

    Submitted 23 November, 2023; originally announced November 2023.

  50. arXiv:2311.12772  [pdf, other]

    cs.PL quant-ph

    The T-Complexity Costs of Error Correction for Control Flow in Quantum Computation

    Authors: Charles Yuan, Michael Carbin

    Abstract: Numerous quantum algorithms require the use of quantum error correction to overcome the intrinsic unreliability of physical qubits. However, error correction imposes a unique performance bottleneck, known as T-complexity, that can make an implementation of an algorithm as a quantum program run more slowly than on idealized hardware. In this work, we identify that programming abstractions for contr…

    Submitted 8 April, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: 22 pages, 17 figures. v2: camera-ready version of paper

    Journal ref: Proc. ACM Program. Lang., Vol. 8, No. PLDI, Article 167. Publication date: June 2024