Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 838 results for author: Lin, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.03917  [pdf, other

    cs.CV

    Timestep-Aware Correction for Quantized Diffusion Models

    Authors: Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang

    Abstract: Diffusion models have marked a significant breakthrough in the synthesis of semantically coherent images. However, their extensive noise estimation networks and the iterative generation process limit their wider application, particularly on resource-constrained platforms like mobile devices. Existing post-training quantization (PTQ) methods have managed to compress diffusion models to low precisio… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  2. arXiv:2407.02327  [pdf, other

    cs.LG cs.DC

    QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

    Authors: Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu

    Abstract: A number of production deep learning clusters have attempted to explore inference hardware for DNN training, at the off-peak serving hours with many inference GPUs idling. Conducting DNN training with a combination of heterogeneous training and inference GPUs, known as hybrid device training, presents considerable challenges due to disparities in compute capability and significant differences in m… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: IPDPS 24

  3. arXiv:2407.00614  [pdf, other

    cs.RO cs.CV eess.IV

    Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

    Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

    Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we pr… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

  4. arXiv:2407.00114  [pdf, other

    cs.LG cs.AI cs.CL

    OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

    Authors: Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

    Abstract: We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for open-world instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimod… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  5. arXiv:2406.20098  [pdf, other

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-t… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Website at https://mbzuai-llm.github.io/webpage2code/

  6. arXiv:2406.19598  [pdf, other

    cs.CL

    Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

    Authors: Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan

    Abstract: Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions.Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging.In this paper, for LLMs utili… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 14 pages, 5 figures

  7. arXiv:2406.19392  [pdf, other

    cs.CV

    ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

    Authors: Jr-Jen Chen, Yu-Chien Liao, Hsi-Che Lin, Yu-Chu Yu, Yen-Chun Chen, Yu-Chiang Frank Wang

    Abstract: We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on reasoning across time, i.e. human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning, requiring advanced understanding of cause-and-effect relationships across vi… ▽ More

    Submitted 2 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://rextime.github.io/

  8. arXiv:2406.18591  [pdf, other

    cs.CV cs.AI cs.LG

    Composition Vision-Language Understanding via Segment and Depth Anything Model

    Authors: Mingxiao Huo, Pengliang Ji, Haotian Lin, Junchen Liu, Yixiao Wang, Yijun Chen

    Abstract: We introduce a pioneering unified library that leverages depth anything, segment anything models to augment neural comprehension in language-vision model zero-shot understanding. This library synergizes the capabilities of the Depth Anything Model (DAM), Segment Anything Model (SAM), and GPT-4V, enhancing multimodal tasks such as vision-question-answering (VQA) and composition reasoning. Through t… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  9. arXiv:2406.15881  [pdf, other

    cs.LG cs.AI

    Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers

    Authors: Krzysztof Choromanski, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Han Lin, Avinava Dubey, Tamas Sarlos, Snigdha Chaturvedi

    Abstract: We present a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular low displacement rank) for integrating tensor fields defined on weighted trees. Several applications of the resulting fast tree-field integrators (FTFIs) are presented, including (a) approximation of graph metrics with tree metrics, (b) graph classification, (c) modeling on meshes, an… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Preprint. Comments welcome

  10. arXiv:2406.13231  [pdf, other

    cs.DS

    Tight Lower Bounds for Directed Cut Sparsification and Distributed Min-Cut

    Authors: Yu Cheng, Max Li, Honghao Lin, Zi-Yi Tai, David P. Woodruff, Jason Zhang

    Abstract: In this paper, we consider two fundamental cut approximation problems on large graphs. We prove new lower bounds for both problems that are optimal up to logarithmic factors. The first problem is to approximate cuts in balanced directed graphs. In this problem, the goal is to build a data structure that $(1 \pm ε)$-approximates cut values in graphs with $n$ vertices. For arbitrary directed graph… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  11. arXiv:2406.12718  [pdf, other

    cs.CV cs.AI cs.CL

    AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

    Authors: Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, QianYing Wang, Guang Dai, Ping Chen, Shijian Lu

    Abstract: Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) are facing a prevalent problem with object hallucinations, where the generated textual responses are inconsistent with ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of objec… ▽ More

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  12. arXiv:2406.12386  [pdf, other

    cs.CL

    IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

    Authors: Qiyao Wang, Jianguo Huang, Shule Lu, Yuan Lin, Kan Xu, Liang Yang, Hongfei Lin

    Abstract: The rapid development of Large Language Models (LLMs) in vertical domains, including intellectual property (IP), lacks a specific evaluation benchmark for assessing their understanding, application, and reasoning abilities. To fill this gap, we introduce IPEval, the first evaluation benchmark tailored for IP agency and consulting tasks. IPEval comprises 2657 multiple-choice questions across four m… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  13. arXiv:2406.12221  [pdf, other

    cs.CL

    On-Policy Fine-grained Knowledge Feedback for Hallucination Mitigation

    Authors: Xueru Wen, Xinyu Lu, Xinyan Guan, Yaojie Lu, Hongyu Lin, Ben He, Xianpei Han, Le Sun

    Abstract: Hallucination occurs when large language models (LLMs) exhibit behavior that deviates from the boundaries of their knowledge during the response generation process. Previous learning-based methods focus on detecting knowledge boundaries and finetuning models with instance-level feedback, but they suffer from inaccurate signals due to off-policy data sampling and coarse-grained feedback. In this pa… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  14. arXiv:2406.12017  [pdf, other

    stat.ML cs.LG stat.CO

    Sparsity-Constraint Optimization via Splicing Iteration

    Authors: Zezhi Wang, Jin Zhu, Junxian Zhu, Borui Tang, Hongmei Lin, Xueqin Wang

    Abstract: Sparsity-constraint optimization has wide applicability in signal processing, statistics, and machine learning. Existing fast algorithms must burdensomely tune parameters, such as the step size or the implementation of precise stop criteria, which may be challenging to determine in practice. To address this issue, we develop an algorithm named Sparsity-Constraint Optimization via sPlicing itEratio… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 34 pages

  15. arXiv:2406.11514  [pdf, other

    cs.CL

    Counterfactual Debating with Preset Stances for Hallucination Elimination of LLMs

    Authors: Yi Fang, Moxin Li, Wenjie Wang, Hui Lin, Fuli Feng

    Abstract: Large Language Models (LLMs) excel in various natural language processing tasks but struggle with hallucination issues. Existing solutions have considered utilizing LLMs' inherent reasoning abilities to alleviate hallucination, such as self-correction and diverse sampling methods. However, these methods often overtrust LLMs' initial answers due to inherent biases. The key to alleviating this issue… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  16. arXiv:2406.11288  [pdf, other

    cs.CL cs.CV

    MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models

    Authors: Shengkang Wang, Hongzhan Lin, Ziyang Luo, Zhen Ye, Guang Chen, Jing Ma

    Abstract: Large vision-language models (LVLMs) have significantly improved multimodal reasoning tasks, such as visual question answering and image captioning. These models embed multimodal facts within their parameters, rather than relying on external knowledge bases to store factual information explicitly. However, the content discerned by LVLMs may deviate from actual facts due to inherent bias or incorre… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 22 pages, 8 figures

  17. arXiv:2406.10840  [pdf, other

    cs.LG cs.AI q-bio.BM

    CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

    Authors: Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Zicheng Liu, Siyuan Li, Cheng Tan, Zhifeng Gao, Stan Z. Li

    Abstract: Structure-based drug design (SBDD) aims to generate potential drugs that can bind to a target protein and is greatly expedited by the aid of AI techniques in generative models. However, a lack of systematic understanding persists due to the diverse settings, complex implementation, difficult reproducibility, and task singularity. Firstly, the absence of standardization can lead to unfair compariso… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 9 pages main context

  18. arXiv:2406.10280  [pdf, other

    cs.CR cs.CL cs.LG

    Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries

    Authors: Yu-Hsiang Huang, Yuche Tsai, Hsiang Hsiao, Hong-Yi Lin, Shou-De Lin

    Abstract: This study investigates the privacy risks associated with text embeddings, focusing on the scenario where attackers cannot access the original embedding model. Contrary to previous research requiring direct model access, we explore a more realistic threat model by developing a transfer attack method. This approach uses a surrogate model to mimic the victim model's behavior, allowing the attacker t… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 Main Conference

  19. arXiv:2406.08102  [pdf, other

    cs.CV

    Adversarial Patch for 3D Local Feature Extractor

    Authors: Yu Wen Pao, Li Chang Lai, Hong-Yi Lin

    Abstract: Local feature extractors are the cornerstone of many computer vision tasks. However, their vulnerability to adversarial attacks can significantly compromise their effectiveness. This paper discusses approaches to attack sophisticated local feature extraction algorithms and models to achieve two distinct goals: (1) forcing a match between originally non-matching image regions, and (2) preventing a… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  20. arXiv:2406.07540  [pdf, other

    cs.CV cs.LG

    Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

    Authors: Kuan Heng Lin, Sicheng Mo, Ben Klingher, Fangzhou Mu, Bolei Zhou

    Abstract: Recent controllable generation approaches such as FreeControl and Diffusion Self-guidance bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules. However, these methods optimize the latent embedding for each type of score function with longer diffusion steps, making the generation process time-consuming and limiting their flexib… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures, see project page at https://genforce.github.io/ctrl-x

  21. arXiv:2406.06858  [pdf, other

    cs.LG cs.DC

    FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

    Authors: Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu

    Abstract: Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models typically require training and inference to be distributed. Tensor parallelism is a common technique partitioning computation of an operation or layer across devices to overcome the memory capacity limitation of a single processor, and/or to accelerate computation… ▽ More

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  22. arXiv:2406.05862  [pdf, other

    cs.CL cs.AI cs.CV

    II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

    Authors: Ziqiang Liu, Feiteng Fang, Xi Feng, Xinrun Du, Chenhao Zhang, Zekun Wang, Yuelin Bai, Qixuan Zhao, Liyang Fan, Chengguang Gan, Hongquan Lin, Jiaming Li, Yuansheng Ni, Haihong Wu, Yaswanth Narsupalli, Zhigang Zheng, Chengming Li, Xiping Hu, Ruifeng Xu, Xiaojun Chen, Min Yang, Jiaheng Liu, Ruibo Liu, Wenhao Huang, Ge Zhang , et al. (1 additional authors not shown)

    Abstract: The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap,… ▽ More

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 100 pages, 82 figures, add citations

  23. arXiv:2406.05135  [pdf

    cs.RO math.OC

    Smart Navigation System for Parking Assignment at Large Events: Incorporating Heterogeneous Driver Characteristics

    Authors: Xi Cheng, Gaofeng Su, Siyuan Feng, Ke Liu, Chen Zhu, Hui Lin, Jilin Song, Jianan Chen

    Abstract: Parking challenges escalate significantly during large events such as concerts or sports games, yet few studies address dynamic parking lot assignments for such occasions. This paper introduces a smart navigation system designed to optimize parking assignments swiftly during large events, utilizing a mixed search algorithm that accounts for the heterogeneous characteristics of drivers. We conducte… ▽ More

    Submitted 14 May, 2024; originally announced June 2024.

  24. arXiv:2406.04997  [pdf, ps, other

    eess.AS cs.LG

    On the social bias of speech self-supervised models

    Authors: Yi-Cheng Lin, Tzu-Quan Lin, Hsi-Che Lin, Andy T. Liu, Hung-yi Lee

    Abstract: Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet the biased outcomes, especially affecting marginalized groups, raise significant concerns. Social bias refers to the phenomenon where algorithms potentially amplify disparate properties between social groups present in the data used for training. Bias in SSL models can perpetuate injustice by au… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  25. arXiv:2406.04584  [pdf, other

    cs.LG cs.AI cs.CV

    CLoG: Benchmarking Continual Learning of Image Generation Models

    Authors: Haotian Zhang, Junting Zhou, Haowei Lin, Hang Ye, Jianhua Zhu, Zihao Wang, Liangcai Gao, Yizhou Wang, Yitao Liang

    Abstract: Continual Learning (CL) poses a significant challenge in Artificial Intelligence, aiming to mirror the human ability to incrementally acquire knowledge and skills. While extensive research has focused on CL within the context of classification tasks, the advent of increasingly powerful generative models necessitates the exploration of Continual Learning of Generative models (CLoG). This paper advo… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  26. arXiv:2406.02903  [pdf, other

    cs.CL

    Open Grounded Planning: Challenges and Benchmark Construction

    Authors: Shiguang Guo, Ziliang Deng, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun

    Abstract: The emergence of large language models (LLMs) has increasingly drawn attention to the use of LLMs for human-like planning. Existing work on LLM-based planning either focuses on leveraging the inherent language generation capabilities of LLMs to produce free-style plans, or employs reinforcement learning approaches to learn decision-making for a limited set of actions within restricted environments… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accept to ACL 2024 main conference

  27. arXiv:2406.02143  [pdf, other

    cs.CL

    Reinforcement Tuning for Detecting Stances and Debunking Rumors Jointly with Large Language Models

    Authors: Ruichao Yang, Wei Gao, Jing Ma, Hongzhan Lin, Bo Wang

    Abstract: Learning multi-task models for jointly detecting stance and verifying rumors poses challenges due to the need for training data of stance at post level and rumor veracity at claim level, which are difficult to obtain. To address this issue, we leverage large language models (LLMs) as the foundation annotators for the joint stance detection (SD) and rumor verification (RV) tasks, dubbed as JSDRV. W… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (Findings)

  28. arXiv:2406.01721  [pdf, other

    cs.CL

    Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs

    Authors: Haokun Lin, Haobo Xu, Yichen Wu, Jingzhi Cui, Yingtao Zhang, Linzhan Mou, Linqi Song, Zhenan Sun, Ying Wei

    Abstract: Quantizing large language models (LLMs) presents significant challenges, primarily due to outlier activations that compromise the efficiency of low-bit representation. Traditional approaches mainly focus on solving Normal Outliers-activations with consistently high magnitudes across all tokens. However, these techniques falter when dealing with Massive Outliers, which are significantly higher in v… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 26 pages, 13 figures

  29. arXiv:2406.01252  [pdf, other

    cs.CL cs.AI stat.ML

    Towards Scalable Automated Alignment of LLMs: A Survey

    Authors: Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, Bowen Yu

    Abstract: Alignment is the most critical step in building large language models (LLMs) that meet human needs. With the rapid development of LLMs gradually surpassing human capabilities, traditional alignment methods based on human-annotation are increasingly unable to meet the scalability demands. Therefore, there is an urgent need to explore new sources of automated alignment signals and technical approach… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  30. arXiv:2406.00983  [pdf, other

    cs.CL cs.AI

    Take its Essence, Discard its Dross! Debiasing for Toxic Language Detection via Counterfactual Causal Effect

    Authors: Junyu Lu, Bo Xu, Xiaokun Zhang, Kaiyuan Liu, Dongyu Zhang, Liang Yang, Hongfei Lin

    Abstract: Current methods of toxic language detection (TLD) typically rely on specific tokens to conduct decisions, which makes them suffer from lexical bias, leading to inferior performance and generalization. Lexical bias has both "useful" and "misleading" impacts on understanding toxicity. Unfortunately, instead of distinguishing between these impacts, current debiasing methods typically eliminate them i… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  31. arXiv:2406.00706  [pdf, other

    cs.RO

    MINER-RRT*: A Hierarchical and Fast Trajectory Planning Framework in 3D Cluttered Environments

    Authors: Pengyu Wang, Jiawei Tang, Hin Wang Lin, Fan Zhang, Chaoqun Wang, Jiankun Wang, Ling Shi, Max Q. -H. Meng

    Abstract: Trajectory planning for quadrotors in cluttered environments has been challenging in recent years. While many trajectory planning frameworks have been successful, there still exists potential for improvements, particularly in enhancing the speed of generating efficient trajectories. In this paper, we present a novel hierarchical trajectory planning framework to reduce computational time and memory… ▽ More

    Submitted 14 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  32. arXiv:2405.20583  [pdf, other

    cs.CG math.AT

    The Gestalt Computational Model

    Authors: Yu Chen, Hongwei Lin, Jiacong Yan

    Abstract: Widely employed in cognitive psychology, Gestalt theory elucidates basic principles in visual perception, but meanwhile presents significant challenges for computation. The advancement of artificial intelligence requires the emulation of human cognitive behavior, for which Gestalt theory serves as a fundamental framework describing human visual cognitive behavior. In this paper, we utilize persist… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  33. arXiv:2405.20580  [pdf, other

    cs.GR

    Topology-Aware Blending Method for Implicit Heterogeneous Porous Model Design

    Authors: Depeng Gao, Yang Gao, Yuanzhi Zhang, Hongwei Lin

    Abstract: Porous structures are materials consisting of minuscule pores, where the microstructure morphology significantly impacts their macroscopic properties. Integrating different porous structures through a blending method is indispensable to cater to diverse functional regions in heterogeneous models. Previous studies on blending methods for porous structures have mainly focused on controlling the… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  34. arXiv:2405.19682  [pdf, other

    cs.CV

    Fully Test-Time Adaptation for Monocular 3D Object Detection

    Authors: Hongbin Lin, Yifan Zhang, Shuaicheng Niu, Shuguang Cui, Zhen Li

    Abstract: Monocular 3D object detection (Mono 3Det) aims to identify 3D objects from a single RGB image. However, existing methods often assume training and test data follow the same distribution, which may not hold in real-world test scenarios. To address the out-of-distribution (OOD) problems, we explore a new adaptation paradigm for Mono 3Det, termed Fully Test-time Adaptation. It aims to adapt a well-tr… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  35. arXiv:2405.19595  [pdf

    cs.CV

    The RSNA Abdominal Traumatic Injury CT (RATIC) Dataset

    Authors: Jeffrey D. Rudie, Hui-Ming Lin, Robyn L. Ball, Sabeena Jalal, Luciano M. Prevedello, Savvas Nicolaou, Brett S. Marinelli, Adam E. Flanders, Kirti Magudia, George Shih, Melissa A. Davis, John Mongan, Peter D. Chang, Ferco H. Berger, Sebastiaan Hermans, Meng Law, Tyler Richards, Jan-Peter Grunz, Andreas Steven Kunz, Shobhit Mathur, Sandro Galea-Soler, Andrew D. Chung, Saif Afat, Chin-Chi Kuo, Layal Aweidah , et al. (15 additional authors not shown)

    Abstract: The RSNA Abdominal Traumatic Injury CT (RATIC) dataset is the largest publicly available collection of adult abdominal CT studies annotated for traumatic injuries. This dataset includes 4,274 studies from 23 institutions across 14 countries. The dataset is freely available for non-commercial use via Kaggle at https://www.kaggle.com/competitions/rsna-2023-abdominal-trauma-detection. Created for the… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 40 pages, 2 figures, 3 tables

  36. arXiv:2405.19226  [pdf, other

    cs.CV cs.MM

    ContextBLIP: Doubly Contextual Alignment for Contrastive Image Retrieval from Linguistically Complex Descriptions

    Authors: Honglin Lin, Siyu Li, Guoshun Nan, Chaoyue Tang, Xueting Wang, Jingxin Xu, Rong Yankai, Zhili Zhou, Yutong Gao, Qimei Cui, Xiaofeng Tao

    Abstract: Image retrieval from contextual descriptions (IRCD) aims to identify an image within a set of minimally contrastive candidates based on linguistically complex text. Despite the success of VLMs, they still significantly lag behind human performance in IRCD. The main challenges lie in aligning key contextual cues in two modalities, where these subtle cues are concealed in tiny areas of multiple cont… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted in ACL 2024 Findings

  37. arXiv:2405.18435  [pdf, other

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag , et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de… ▽ More

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  38. arXiv:2405.18113  [pdf, other

    cs.CL cs.AI

    Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting

    Authors: Hongda Sun, Hongzhan Lin, Haiyu Yan, Chen Zhu, Yang Song, Xin Gao, Shuo Shang, Rui Yan

    Abstract: The emergence of online recruitment services has revolutionized the traditional landscape of job seeking and recruitment, necessitating the development of high-quality industrial applications to improve person-job fitting. Existing methods generally rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them. Inspired by the powerful role-pla… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  39. arXiv:2405.17940  [pdf, other

    cs.RO cs.AI

    World Models for General Surgical Grasping

    Authors: Hongbin Lin, Bin Li, Chun Wai Wong, Juan Rojas, Xiangyu Chu, Kwok Wai Samuel Au

    Abstract: Intelligent vision control systems for surgical robots should adapt to unknown and diverse objects while being robust to system disturbances. Previous methods did not meet these requirements due to mainly relying on pose estimation and feature tracking. We propose a world-model-based deep reinforcement learning framework "Grasp Anything for Surgery" (GAS), that learns a pixel-level visuomotor poli… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

    Journal ref: Robotics: Science and Systems 2024

  40. arXiv:2405.17031  [pdf, other

    cs.LG

    Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning

    Authors: Haoxin Lin, Yu-Yan Xu, Yihao Sun, Zhilong Zhang, Yi-Chen Li, Chengxing Jia, Junyin Ye, Jiaji Zhang, Yang Yu

    Abstract: Model-based methods in reinforcement learning offer a promising approach to enhance data efficiency by facilitating policy exploration within a dynamics model. However, accurately predicting sequential steps in the dynamics model remains a challenge due to the bootstrapping prediction, which attributes the next state to the prediction of the current state. This leads to accumulated errors during m… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  41. arXiv:2405.16884  [pdf, other

    cs.CL cs.DB

    Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching

    Authors: Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xuanang Chen, Xianpei Han, Hao Wang, Zhenyu Zeng, Le Sun

    Abstract: Entity matching (EM) is a critical step in entity resolution (ER). Recently, entity matching based on large language models (LLMs) has shown great promise. However, current LLM-based entity matching approaches typically follow a binary matching paradigm that ignores the global consistency between record relationships. In this paper, we investigate various methodologies for LLM-based entity matchin… ▽ More

    Submitted 23 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Code is available at https://github.com/tshu-w/LLM4EM

  42. arXiv:2405.15032  [pdf, other

    cs.CL

    Aya 23: Open Weight Releases to Further Multilingual Progress

    Authors: Viraat Aryabumi, John Dang, Dwarak Talupuru, Saurabh Dash, David Cairuz, Hangyu Lin, Bharat Venkitesh, Madeline Smith, Jon Ander Campos, Yi Chern Tan, Kelly Marchisio, Max Bartolo, Sebastian Ruder, Acyr Locatelli, Julia Kreutzer, Nick Frosst, Aidan Gomez, Phil Blunsom, Marzieh Fadaee, Ahmet Üstün, Sara Hooker

    Abstract: This technical report introduces Aya 23, a family of multilingual language models. Aya 23 builds on the recent release of the Aya model (Üstün et al., 2024), focusing on pairing a highly performant pre-trained model with the recently released Aya collection (Singh et al., 2024). The result is a powerful multilingual large language model serving 23 languages, expanding state-of-art language modelin… ▽ More

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  43. arXiv:2405.14591  [pdf, other

    cs.CL

    Base of RoPE Bounds Context Length

    Authors: Xin Men, Mingyu Xu, Bingning Wang, Qingyu Zhang, Hongyu Lin, Xianpei Han, Weipeng Chen

    Abstract: Position embedding is a core component of current Large Language Models (LLMs). Rotary position embedding (RoPE), a technique that encodes the position information with a rotation matrix, has been the de facto choice for position embedding in many LLMs, such as the Llama series. RoPE has been further utilized to extend long context capability, which is roughly based on adjusting the \textit{base}… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 17 pages

  44. arXiv:2405.12357  [pdf

    eess.IV cs.CV

    Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI

    Authors: Di Xu, Xin Miao, Hengjie Liu, Jessica E. Scholey, Wensha Yang, Mary Feng, Michael Ohliger, Hui Lin, Yi Lao, Yang Yang, Ke Sheng

    Abstract: Purpose: 4D MRI with high spatiotemporal resolution is desired for image-guided liver radiotherapy. Acquiring densely sampling k-space data is time-consuming. Accelerated acquisition with sparse samples is desirable but often causes degraded image quality or long reconstruction time. We propose the Reconstruct Paired Conditional Generative Adversarial Network (Re-Con-GAN) to shorten the 4D MRI rec… ▽ More

    Submitted 20 May, 2024; originally announced May 2024.

  45. arXiv:2405.11724  [pdf, other

    cs.CL cs.AI cs.CR cs.IR

    Token-wise Influential Training Data Retrieval for Large Language Models

    Authors: Huawei Lin, Jikai Long, Zhaozhuo Xu, Weijie Zhao

    Abstract: Given a Large Language Model (LLM) generation, how can we identify which training data led to this generation? In this paper, we proposed RapidIn, a scalable framework adapting to LLMs for estimating the influence of each training data. The proposed framework consists of two stages: caching and retrieval. First, we compress the gradient vectors by over 200,000x, allowing them to be cached on disk… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024

  46. arXiv:2405.11284  [pdf, other

    cs.AI stat.OT

    The Logic of Counterfactuals and the Epistemology of Causal Inference

    Authors: Hanti Lin

    Abstract: The 2021 Nobel Prize in Economics recognized a theory of causal inference, which deserves more attention from philosophers. To that end, I develop a dialectic that extends the Lewis-Stalnaker debate on a logical principle called Conditional Excluded Middle (CEM). I first play the good cop for CEM, and give a new argument for it: a Quine-Putnam indispensability argument based on the Nobel-Prize win… ▽ More

    Submitted 18 May, 2024; originally announced May 2024.

  47. arXiv:2405.10348  [pdf, other

    q-bio.QM cs.AI cs.LG

    Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning

    Authors: Lirong Wu, Yijun Tian, Haitao Lin, Yufei Huang, Siyuan Li, Nitesh V Chawla, Stan Z. Li

    Abstract: Protein-protein bindings play a key role in a variety of fundamental biological processes, and thus predicting the effects of amino acid mutations on protein-protein binding is crucial. To tackle the scarcity of annotated mutation data, pre-training with massive unlabeled data has emerged as a promising solution. However, this process faces a series of challenges: (1) complex higher-order dependen… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  48. arXiv:2405.10121  [pdf, other

    cs.CL cs.MM

    Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation

    Authors: Bo Zhang, Hui Ma, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin

    Abstract: Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Under Review

  49. Dynamic GNNs for Precise Seizure Detection and Classification from EEG Data

    Authors: Arash Hajisafi, Haowen Lin, Yao-Yi Chiang, Cyrus Shahabi

    Abstract: Diagnosing epilepsy requires accurate seizure detection and classification, but traditional manual EEG signal analysis is resource-intensive. Meanwhile, automated algorithms often overlook EEG's geometric and semantic properties critical for interpreting brain activity. This paper introduces NeuroGNN, a dynamic Graph Neural Network (GNN) framework that captures the dynamic interplay between the EE… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: This preprint has not undergone any post-submission improvements or corrections. The Version of Record of this contribution is published in the proceedings of the 28th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2024), Taipei, Taiwan, May 7-10, 2024, and is available online at https://doi.org/10.1007/978-981-97-2238-9_16

    Journal ref: Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'24). Singapore: Springer Nature Singapore, 2024

  50. arXiv:2405.08298  [pdf, other

    cs.LG

    Deep Reinforcement Learning for Real-Time Ground Delay Program Revision and Corresponding Flight Delay Assignments

    Authors: Ke Liu, Fan Hu, Hui Lin, Xi Cheng, Jianan Chen, Jilin Song, Siyuan Feng, Gaofeng Su, Chen Zhu

    Abstract: This paper explores the optimization of Ground Delay Programs (GDP), a prevalent Traffic Management Initiative used in Air Traffic Management (ATM) to reconcile capacity and demand discrepancies at airports. Employing Reinforcement Learning (RL) to manage the inherent uncertainties in the national airspace system-such as weather variability, fluctuating flight demands, and airport arrival rates-we… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.