
Showing 1–50 of 324 results for author: Lin, B

Searching in archive cs.
  1. arXiv:2409.01658  [pdf, other]

    cs.CL

    From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning

    Authors: Wei Chen, Zhen Huang, Liang Xie, Binbin Lin, Houqiang Li, Le Lu, Xinmei Tian, Deng Cai, Yonggang Zhang, Wenxiao Wan, Xu Shen, Jieping Ye

    Abstract: Large Language Models (LLMs) tend to prioritize adherence to user prompts over providing veracious responses, leading to the sycophancy issue. When challenged by users, LLMs tend to admit mistakes and provide inaccurate responses even if they initially provided the correct answer. Recent works propose to employ supervised fine-tuning (SFT) to mitigate the sycophancy issue, while it typically leads…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by ICML 2024

  2. arXiv:2409.01199  [pdf, other]

    cs.CV eess.IV

    OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model

    Authors: Liuhan Chen, Zongjian Li, Bin Lin, Bin Zhu, Qian Wang, Shenghai Yuan, Xing Zhou, Xinghua Cheng, Li Yuan

    Abstract: Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial preceding component of Latent Video Diffusion Models (LVDMs). With the same reconstruction quality, the more sufficient the VAE's compression for videos is, the more efficient the LVDMs are. However, most LVDMs utilize 2D image VAE, whose compression for videos is only in the spatial dimension and often ign…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: https://github.com/PKU-YuanGroup/Open-Sora-Plan

  3. arXiv:2408.15101  [pdf, other]

    cs.CV cs.AI

    MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders

    Authors: Baijiong Lin, Weisen Jiang, Pengguang Chen, Shu Liu, Ying-Cong Chen

    Abstract: Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring with a Mamba-based decoder. It contains two…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2407.02228

  4. arXiv:2408.11948  [pdf, other]

    q-bio.NC cs.LG math.GT

    Topological Representational Similarity Analysis in Brains and Beyond

    Authors: Baihan Lin

    Abstract: Understanding how the brain represents and processes information is crucial for advancing neuroscience and artificial intelligence. Representational similarity analysis (RSA) has been instrumental in characterizing neural representations, but traditional RSA relies solely on geometric properties, overlooking crucial topological information. This thesis introduces Topological RSA (tRSA), a novel fr…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Thesis defended by Baihan Lin (bl2681@columbia.edu) in 2023 for PhD in Computational Neuroscience at Columbia University; unifies and extends work from PNAS, WWW, CCN, ISMB, BIBM etc. (arXiv:2309.11028, 2203.05488, 1906.09264, 2204.14048, 1810.02923)

  5. arXiv:2408.09144  [pdf, other]

    cs.CV

    SSNeRF: Sparse View Semi-supervised Neural Radiance Fields with Augmentation

    Authors: Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan

    Abstract: Sparse view NeRF is challenging because limited input images lead to an under constrained optimization problem for volume rendering. Existing methods address this issue by relying on supplementary information, such as depth maps. However, generating this supplementary information accurately remains problematic and often leads to NeRF producing images with undesired artifacts. To address these arti…

    Submitted 17 August, 2024; originally announced August 2024.

  6. arXiv:2408.07758  [pdf]

    cs.RO

    RAVE Checklist: Recommendations for Overcoming Challenges in Retrospective Safety Studies of Automated Driving Systems

    Authors: John M. Scanlon, Eric R. Teoh, David G. Kidd, Kristofer D. Kusano, Jonas Bärgman, Geoffrey Chi-Johnston, Luigi Di Lillo, Francesca Favaro, Carol Flannagan, Henrik Liers, Bonnie Lin, Magdalena Lindman, Shane McLaughlin, Miguel Perez, Trent Victor

    Abstract: The public, regulators, and domain experts alike seek to understand the effect of deployed SAE level 4 automated driving system (ADS) technologies on safety. The recent expansion of ADS technology deployments is paving the way for early stage safety impact evaluations, whereby the observational data from both an ADS and a representative benchmark fleet are compared to quantify safety performance.…

    Submitted 14 August, 2024; originally announced August 2024.

  7. arXiv:2407.19548  [pdf, other]

    cs.CV

    Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle

    Authors: Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Wangbo Yu, Chaoran Feng, Yatian Pang, Bin Lin, Li Yuan

    Abstract: Recent 3D large reconstruction models typically employ a two-stage process, including first generate multi-view images by a multi-view diffusion model, and then utilize a feed-forward model to reconstruct images to 3D content. However, multi-view diffusion models often produce low-quality and inconsistent images, adversely affecting the quality of the final 3D reconstruction. To address this issue,…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Project page: https://pku-yuangroup.github.io/Cycle3D/

  8. arXiv:2407.19056  [pdf, other]

    cs.CL

    OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation

    Authors: Zilong Wang, Yuedong Cui, Li Zhong, Zimin Zhang, Da Yin, Bill Yuchen Lin, Jingbo Shang

    Abstract: Office automation significantly enhances human productivity by automatically finishing routine tasks in the workflow. Beyond the basic information extraction studied in much of the prior document AI literature, the office automation research should be extended to more realistic office tasks which require to integrate various information sources in the office system and produce outputs through a se…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Preprint

  9. arXiv:2407.14872  [pdf, other]

    cs.CV cs.RO

    Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts

    Authors: Yanting Yang, Minghao Chen, Qibo Qiu, Jiahao Wu, Wenxiao Wang, Binbin Lin, Ziyu Guan, Xiaofei He

    Abstract: For a general-purpose robot to operate in reality, executing a broad range of instructions across various environments is imperative. Central to the reinforcement learning and planning for such robotic agents is a generalizable reward function. Recent advances in vision-language models, such as CLIP, have shown remarkable performance in the domain of deep learning, paving the way for open-domain v…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 camera-ready

  10. arXiv:2407.10457  [pdf, other]

    cs.CL cs.AI

    The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

    Authors: Yifan Song, Guoyin Wang, Sujian Li, Bill Yuchen Lin

    Abstract: Current evaluations of large language models (LLMs) often overlook non-determinism, typically focusing on a single output per example. This limits our understanding of LLM performance variability in real-world applications. Our study addresses this issue by exploring key questions about the performance differences between greedy decoding and sampling, identifying benchmarks' consistency regarding…

    Submitted 15 July, 2024; originally announced July 2024.

  11. arXiv:2407.09992  [pdf, other]

    cs.MM

    TOP:A New Target-Audience Oriented Content Paraphrase Task

    Authors: Boda Lin, Jiaxin Shi, Haolong Yan, Binghao Tang, Xiaocheng Gong, Si Li

    Abstract: Recommendation systems usually recommend the existing contents to different users. However, in comparison to static recommendation methods, a recommendation logic that dynamically adjusts based on user interest preferences may potentially attract a larger user base. Thus, we consider paraphrasing existing content based on the interests of the users to modify the content to better align with the pr…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 8 pages

  12. arXiv:2407.09787  [pdf, other]

    cs.CV

    Semi-supervised 3D Object Detection with PatchTeacher and PillarMix

    Authors: Xiaopei Wu, Liang Peng, Liang Xie, Yuenan Hou, Binbin Lin, Xiaoshui Huang, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Semi-supervised learning aims to leverage numerous unlabeled data to improve the model performance. Current semi-supervised 3D object detection methods typically use a teacher to generate pseudo labels for a student, and the quality of the pseudo labels is essential for the final performance. In this paper, we propose PatchTeacher, which focuses on partial scene 3D object detection to provide high…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by AAAI 2024

  13. arXiv:2407.09751  [pdf, other]

    cs.CV

    TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

    Authors: Xiaopei Wu, Yuenan Hou, Xiaoshui Huang, Binbin Lin, Tong He, Xinge Zhu, Yuexin Ma, Boxi Wu, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Training deep models for LiDAR semantic segmentation is challenging due to the inherent sparsity of point clouds. Utilizing temporal data is a natural remedy against the sparsity problem as it makes the input signal denser. However, previous multi-frame fusion algorithms fall short in utilizing sufficient temporal information due to the memory constraint, and they also ignore the informative tempo…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  14. arXiv:2407.05890  [pdf, other]

    cs.RO cs.CL

    Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation

    Authors: Jiaqi Chen, Bingqian Lin, Xinmin Liu, Lin Ma, Xiaodan Liang, Kwan-Yee K. Wong

    Abstract: LLM-based agents have demonstrated impressive zero-shot performance in vision-language navigation (VLN) task. However, existing LLM-based methods often focus only on solving high-level task planning by selecting nodes in predefined navigation graphs for movements, overlooking low-level control in navigation scenarios. To bridge this gap, we propose AO-Planner, a novel Affordances-Oriented Planner…

    Submitted 20 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  15. arXiv:2407.02228  [pdf, other]

    cs.CV cs.AI

    MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

    Authors: Baijiong Lin, Weisen Jiang, Pengguang Chen, Yu Zhang, Shu Liu, Ying-Cong Chen

    Abstract: Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-t…

    Submitted 14 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  16. arXiv:2406.18495  [pdf, other]

    cs.CL

    WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

    Authors: Seungju Han, Kavel Rao, Allyson Ettinger, Liwei Jiang, Bill Yuchen Lin, Nathan Lambert, Yejin Choi, Nouha Dziri

    Abstract: We introduce WildGuard -- an open, light-weight moderation tool for LLM safety that achieves three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model responses, and (3) determining model refusal rate. Together, WildGuard serves the increasing needs for automatic safety moderation and evaluation of LLM interactions, providing a one-stop tool with enhanced a…

    Submitted 9 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: First two authors contributed equally. Third and fourth authors contributed equally

  17. arXiv:2406.12935  [pdf, other]

    cs.CR cs.AI cs.LG

    ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

    Authors: Fengqing Jiang, Zhangchen Xu, Luyao Niu, Bill Yuchen Lin, Radha Poovendran

    Abstract: Large language models (LLMs) are expected to follow instructions from users and engage in conversations. Techniques to enhance LLMs' instruction-following capabilities typically fine-tune them using data structured according to a predefined chat template. Although chat templates are shown to be effective in optimizing LLM performance, their impact on safety alignment of LLMs has been less understo…

    Submitted 16 June, 2024; originally announced June 2024.

  18. arXiv:2406.11069  [pdf, other]

    cs.CV cs.AI cs.CL

    WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

    Authors: Yujie Lu, Dongfu Jiang, Wenhu Chen, William Yang Wang, Yejin Choi, Bill Yuchen Lin

    Abstract: Recent breakthroughs in vision-language models (VLMs) emphasize the necessity of benchmarking human preferences in real-world multimodal interactions. To address this gap, we launched WildVision-Arena (WV-Arena), an online platform that collects human preferences to evaluate VLMs. We curated WV-Bench by selecting 500 high-quality samples from 8,000 user submissions in WV-Arena. WV-Bench uses GPT-4…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: link: https://hf.co/spaces/WildVision/vision-arena

  19. arXiv:2406.08464  [pdf, other]

    cs.CL cs.AI

    Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

    Authors: Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin

    Abstract: High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Link: https://magpie-align.github.io/

  20. arXiv:2406.05761  [pdf, other]

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  21. arXiv:2406.04770  [pdf, other]

    cs.CL cs.AI

    WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

    Authors: Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi

    Abstract: We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully selected from over one million human-chatbot conversation logs. For automated evaluation with WildBench, we have developed two metrics, WB-Reward and WB-Score, which are computable using advanced LLMs su…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Link: https://hf.co/spaces/allenai/WildBench

  22. arXiv:2406.04325  [pdf, other]

    cs.CV

    ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

    Authors: Lin Chen, Xilin Wei, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Bin Lin, Zhenyu Tang, Li Yuan, Yu Qiao, Dahua Lin, Feng Zhao, Jiaqi Wang

    Abstract: We present the ShareGPT4Video series, aiming to facilitate the video understanding of large video-language models (LVLMs) and the video generation of text-to-video models (T2VMs) via dense and precise captions. The series comprises: 1) ShareGPT4Video, 40K GPT4V annotated dense captions of videos with various lengths and sources, developed through carefully designed data filtering and annotating st…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://sharegpt4video.github.io/

  23. arXiv:2405.19327  [pdf, other]

    cs.CL cs.AI cs.LG

    MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

    Authors: Ge Zhang, Scott Qu, Jiaheng Liu, Chenchen Zhang, Chenghua Lin, Chou Leuang Yu, Danny Pan, Esther Cheng, Jie Liu, Qunshu Lin, Raven Yuan, Tuney Zheng, Wei Pang, Xinrun Du, Yiming Liang, Yinghao Ma, Yizhi Li, Ziyang Ma, Bill Lin, Emmanouil Benetos, Huan Yang, Junting Zhou, Kaijing Ma, Minghao Liu, Morry Niu, et al. (20 additional authors not shown)

    Abstract: Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparabl…

    Submitted 10 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: https://map-neo.github.io/

  24. Correctable Landmark Discovery via Large Models for Vision-Language Navigation

    Authors: Bingqian Lin, Yunshuang Nie, Ziming Wei, Yi Zhu, Hang Xu, Shikui Ma, Jianzhuang Liu, Xiaodan Liang

    Abstract: Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes, since they learn from limited navigation data and lack s…

    Submitted 5 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Accepted by TPAMI 2024

  25. arXiv:2405.16247  [pdf, other]

    cs.AI cs.CL

    AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning

    Authors: Minghao Chen, Yihang Li, Yanting Yang, Shiyu Yu, Binbin Lin, Xiaofei He

    Abstract: Large Language Models (LLM) based agents have shown promise in autonomously completing tasks across various domains, e.g., robotics, games, and web navigation. However, these agents typically require elaborate design and expert prompts to solve tasks in specific domains, which limits their adaptability. We introduce AutoManual, a framework enabling LLM agents to autonomously build their understand…

    Submitted 29 July, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  26. arXiv:2405.07925  [pdf, other]

    cs.LG cs.AI cs.DC

    Stable Diffusion-based Data Augmentation for Federated Learning with Non-IID Data

    Authors: Mahdi Morafah, Matthias Reisser, Bill Lin, Christos Louizos

    Abstract: The proliferation of edge devices has brought Federated Learning (FL) to the forefront as a promising paradigm for decentralized and collaborative model training while preserving the privacy of clients' data. However, FL struggles with a significant performance reduction and poor convergence when confronted with Non-Independent and Identically Distributed (Non-IID) data distributions among partici…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: International Workshop on Federated Foundation Models for the Web 2024 (FL@FM-TheWebConf'24)

  27. arXiv:2405.05060  [pdf, other]

    cs.CL

    Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models

    Authors: Aylin Gunal, Baihan Lin, Djallel Bouneffouf

    Abstract: Given the increasing demand for mental health assistance, artificial intelligence (AI), particularly large language models (LLMs), may be valuable for integration into automated clinical support systems. In this work, we leverage a decision transformer architecture for topic recommendation in counseling conversations between patients and mental health professionals. The architecture is utilized fo…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 5 pages excluding references, 3 figures; accepted at Clinical NLP Workshop @ NAACL 2024

  28. arXiv:2405.01535  [pdf, other]

    cs.CL

    Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

    Authors: Seungone Kim, Juyoung Suk, Shayne Longpre, Bill Yuchen Lin, Jamin Shin, Sean Welleck, Graham Neubig, Moontae Lee, Kyungjae Lee, Minjoon Seo

    Abstract: Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs. However, concerns including transparency, controllability, and affordability strongly motivate the development of open-source LMs specialized in evaluations. On the other hand, existing open evaluator LMs exhibit critical shortcomings: 1) they issue scores that significantly diverge from those ass…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Work in Progress

  29. arXiv:2404.19384  [pdf, other]

    cs.CV cs.AI

    Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection

    Authors: Zhanwei Zhang, Minghao Chen, Shuai Xiao, Liang Peng, Hengjia Li, Binbin Lin, Ping Li, Wenxiao Wang, Boxi Wu, Deng Cai

    Abstract: Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previ…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  30. arXiv:2404.19330  [pdf, other]

    cs.CV cs.AI

    G2LTraj: A Global-to-Local Generation Approach for Trajectory Prediction

    Authors: Zhanwei Zhang, Zishuo Hua, Minghao Chen, Wei Lu, Binbin Lin, Deng Cai, Wenxiao Wang

    Abstract: Predicting future trajectories of traffic agents accurately holds substantial importance in various applications such as autonomous driving. Previous methods commonly infer all future steps of an agent either recursively or simultaneously. However, the recursive strategy suffers from the accumulated error, while the simultaneous strategy overlooks the constraints among future steps, resulting in k…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  31. arXiv:2404.17900  [pdf, other]

    cs.CV

    Unsupervised Anomaly Detection via Masked Diffusion Posterior Sampling

    Authors: Di Wu, Shicai Fan, Xue Zhou, Li Yu, Yuzhong Deng, Jianxiao Zou, Baihong Lin

    Abstract: Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image re…

    Submitted 27 April, 2024; originally announced April 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  32. arXiv:2404.14755  [pdf, other]

    cs.MM cs.AI cs.CV cs.HC

    SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models

    Authors: Bo Lin, Yingjing Xu, Xuanwen Bao, Zhou Zhao, Zuyong Zhang, Zhouyang Wang, Jie Zhang, Shuiguang Deng, Jianwei Yin

    Abstract: With the continuous advancement of vision language models (VLMs) technology, remarkable research achievements have emerged in the dermatology field, the fourth most prevalent human disease category. However, despite these advancements, VLM still faces "hallucination" in dermatological diagnosis, and due to the inherent complexity of dermatological conditions, existing tools offer relatively limite…

    Submitted 23 April, 2024; originally announced April 2024.

  33. arXiv:2404.10199  [pdf, other]

    cs.CL cs.AI

    CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting

    Authors: Huihan Li, Liwei Jiang, Jena D. Hwang, Hyunwoo Kim, Sebastin Santy, Taylor Sorensen, Bill Yuchen Lin, Nouha Dziri, Xiang Ren, Yejin Choi

    Abstract: As the utilization of large language models (LLMs) has proliferated world-wide, it is crucial for them to have adequate knowledge and fair representation for diverse global cultures. In this work, we uncover culture perceptions of three SOTA models on 110 countries and regions on 8 culture-related topics through culture-conditioned generations, and extract symbols from these generations that are a…

    Submitted 20 August, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  34. arXiv:2404.09619  [pdf, other]

    cs.CV cs.AI

    UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark

    Authors: Zhaokun Zhou, Qiulin Wang, Bin Lin, Yiwei Su, Rui Chen, Xin Tao, Amin Zheng, Li Yuan, Pengfei Wan, Di Zhang

    Abstract: As an alternative to expensive expert evaluation, Image Aesthetic Assessment (IAA) stands out as a crucial task in computer vision. However, traditional IAA methods are typically constrained to a single data source or task, restricting the universality and broader application. In this work, to better align with human aesthetics, we propose a Unified Multi-modal Image Aesthetic Assessment (UNIAA) f…

    Submitted 15 April, 2024; originally announced April 2024.

  35. arXiv:2404.08870  [pdf, other]

    cs.CC

    Almost Optimal Time Lower Bound for Approximating Parameterized Clique, CSP, and More, under ETH

    Authors: Venkatesan Guruswami, Bingkai Lin, Xuandi Ren, Yican Sun, Kewen Wu

    Abstract: The Parameterized Inapproximability Hypothesis (PIH), which is an analog of the PCP theorem in parameterized complexity, asserts that, there is a constant $\varepsilon> 0$ such that for any computable function $f:\mathbb{N}\to\mathbb{N}$, no $f(k)\cdot n^{O(1)}$-time algorithm can, on input a $k$-variable CSP instance with domain size $n$, find an assignment satisfying $1-\varepsilon$ fraction of…

    Submitted 11 June, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  36. arXiv:2404.06206  [pdf, other]

    physics.comp-ph cs.LG math.NA

    Deep Learning Method for Computing Committor Functions with Adaptive Sampling

    Authors: Bo Lin, Weiqing Ren

    Abstract: The committor function is a central object for quantifying the transitions between metastable states of dynamical systems. Recently, a number of computational methods based on deep neural networks have been developed for computing the high-dimensional committor function. The success of the methods relies on sampling adequate data for the transition, which still is a challenging task for complex sy…

    Submitted 9 April, 2024; originally announced April 2024.

  37. arXiv:2404.05955  [pdf, other]

    cs.CL cs.AI

    VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?

    Authors: Junpeng Liu, Yifan Song, Bill Yuchen Lin, Wai Lam, Graham Neubig, Yuanzhi Li, Xiang Yue

    Abstract: Multimodal Large Language models (MLLMs) have shown promise in web-related tasks, but evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks. Existing benchmarks are either designed for general multimodal tasks, failing to capture the unique characteristics of web pages, or focus on end-to-end web agent tasks, unable to measure fine-grained a…

    Submitted 8 April, 2024; originally announced April 2024.

  38. arXiv:2404.05905  [pdf, other]

    physics.comp-ph cs.LG math.NA stat.ML

    Computing Transition Pathways for the Study of Rare Events Using Deep Reinforcement Learning

    Authors: Bo Lin, Yangzheng Zhong, Weiqing Ren

    Abstract: Understanding the transition events between metastable states in complex systems is an important subject in the fields of computational physics, chemistry and biology. The transition pathway plays an important role in characterizing the mechanism underlying the transition, for example, in the study of conformational changes of bio-molecules. In fact, computing the transition pathway is a challengi…

    Submitted 8 April, 2024; originally announced April 2024.

  39. arXiv:2404.05300  [pdf, other]

    cs.CV

    Texture Classification Network Integrating Adaptive Wavelet Transform

    Authors: Su-Xi Yu, Jing-Yuan He, Yi Wang, Yu-Jiao Cai, Jun Yang, Bo Lin, Wei-Bin Yang, Jian Ruan

    Abstract: Graves' disease is a common condition that is diagnosed clinically by determining the smoothness of the thyroid texture and its morphology in ultrasound images. Currently, the most widely used approach for the automated diagnosis of Graves' disease utilizes Convolutional Neural Networks (CNNs) for both feature extraction and classification. However, these methods demonstrate limited efficacy in ca…

    Submitted 8 April, 2024; originally announced April 2024.

  40. arXiv:2404.05014  [pdf, other]

    cs.CV

    MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

    Authors: Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo

    Abstract: Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions. A largely overlooked problem in T2V is that existing models have not adequately encoded physical knowledge of the real world, thus generated videos tend to have limited motion and poor variations. In this paper, we propose MagicTime, a m…

    Submitted 7 April, 2024; originally announced April 2024.

  41. arXiv:2404.04966  [pdf, other]

    cs.SE

    Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis

    Authors: Chen Yang, Junjie Chen, Bin Lin, Jianyi Zhou, Ziqi Wang

    Abstract: Automatic test generation plays a critical role in software quality assurance. While the recent advances in Search-Based Software Testing (SBST) and Large Language Models (LLMs) have shown promise in generating useful tests, these techniques still struggle to cover certain branches. Reaching these hard-to-cover branches usually requires constructing complex objects and resolving intricate inter-pr…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 11 pages, 4 figures

  42. arXiv:2403.13787  [pdf, other]

    cs.LG

    RewardBench: Evaluating Reward Models for Language Modeling

    Authors: Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi

    Abstract: Reward models (RMs) are at the crux of successfully using RLHF to align pretrained models to human preferences, yet there has been relatively little study that focuses on evaluation of those models. Evaluating reward models presents an opportunity to understand the opaque technologies used for alignment of language models and which values are embedded in them. Resources for reward model training a…

    Submitted 8 June, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 44 pages, 19 figures, 12 tables

  43. arXiv:2403.09973  [pdf, other]

    cs.CV

    Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience

    Authors: Xiaohang Yu, Zhengxian Yang, Shi Pan, Yuqi Han, Haoxiang Wang, Jun Zhang, Shi Yan, Borong Lin, Lei Yang, Tao Yu, Lu Fang

    Abstract: We have built a custom mobile multi-camera large-space dense light field capture system, which provides a series of high-quality and sufficiently dense light field images for various scenarios. Our aim is to contribute to the development of popular 3D scene reconstruction algorithms such as IBRNet, NeRF, and 3D Gaussian Splatting. More importantly, the collected dataset, which is much denser than…

    Submitted 14 March, 2024; originally announced March 2024.

  44. arXiv:2403.07408  [pdf, other]

    cs.CV

    NightHaze: Nighttime Image Dehazing via Self-Prior Learning

    Authors: Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Robby T. Tan

    Abstract: Masked autoencoder (MAE) shows that severe augmentation during training produces robust representations for high-level tasks. This paper brings the MAE-like framework to nighttime image enhancement, demonstrating that severe augmentation during training produces strong network priors that are resilient to real-world night haze degradations. We propose a novel nighttime image dehazing method with s…

    Submitted 12 March, 2024; originally announced March 2024.

  45. arXiv:2403.07376  [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning

    Authors: Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang

    Abstract: Vision-and-Language Navigation (VLN), as a crucial research problem of Embodied AI, requires an embodied agent to navigate through complex 3D environments following natural language instructions. Recent research has highlighted the promising capacity of large language models (LLMs) in VLN by improving navigational reasoning accuracy and interpretability. However, their predominant use in an offlin…

    Submitted 12 March, 2024; originally announced March 2024.

  46. Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

    Authors: Bingqian Lin, Yanxin Long, Yi Zhu, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Liang Lin

    Abstract: Vision-and-language navigation (VLN) asks an agent to follow a given language instruction to navigate through a real 3D environment. Despite significant advances, conventional VLN agents are trained typically under disturbance-free environments and may easily fail in real-world scenarios, since they are unaware of how to deal with various possible disturbances, such as sudden obstacles or human in…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by TPAMI 2023

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI, 2023)

  47. arXiv:2403.02502  [pdf, other]

    cs.CL cs.AI cs.LG

    Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents

    Authors: Yifan Song, Da Yin, Xiang Yue, Jie Huang, Sujian Li, Bill Yuchen Lin

    Abstract: Large Language Models (LLMs) have become integral components in various autonomous agent systems. In this study, we present an exploration-based trajectory optimization approach, referred to as ETO. This learning method is designed to enhance the performance of open LLM agents. Contrary to previous studies that exclusively train on successful expert trajectories, our method allows agents to learn…

    Submitted 10 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted to ACL 2024 Main Conference; Camera Ready

  48. arXiv:2403.00103  [pdf, other]

    cs.LG cs.AR

    On Robustness and Generalization of ML-Based Congestion Predictors to Valid and Imperceptible Perturbations

    Authors: Chester Holtz, Yucheng Wang, Chung-Kuan Cheng, Bill Lin

    Abstract: There is substantial interest in the use of machine learning (ML)-based techniques throughout the electronic computer-aided design (CAD) flow, particularly methods based on deep learning. However, while deep learning methods have achieved state-of-the-art performance in several applications, recent work has demonstrated that neural networks are generally vulnerable to small, carefully chosen pertu…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 7 pages, 7 figures

  49. arXiv:2402.18826  [pdf, ps, other]

    cs.CY cs.AI cs.HC q-bio.NC

    The Machine Can't Replace the Human Heart

    Authors: Baihan Lin

    Abstract: What is the true heart of mental healthcare -- innovation or humanity? Can virtual therapy ever replicate the profound human bonds where healing arises? As artificial intelligence and immersive technologies promise expanded access, safeguards must ensure technologies remain supplementary tools guided by providers' wisdom. Implementation requires nuance balancing efficiency and empathy. If consciou…

    Submitted 29 February, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  50. arXiv:2402.15610  [pdf, other]

    cs.CL

    Selective "Selective Prediction": Reducing Unnecessary Abstention in Vision-Language Reasoning

    Authors: Tejas Srinivasan, Jack Hessel, Tanmay Gupta, Bill Yuchen Lin, Yejin Choi, Jesse Thomason, Khyathi Raghavi Chandu

    Abstract: Selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain. However, when deploying a vision-language system with low tolerance for inaccurate predictions, selective prediction may be over-cautious and abstain too frequently, even on many correct predictions. We introduce ReCoVERR, an inference-time algorithm to…

    Submitted 12 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Accepted to ACL Findings 2024