Showing 1–50 of 1,363 results for author: Sun, J

Searching in archive cs.
  1. arXiv:2406.13908  [pdf, other]

    cs.RO

    A Decision-Making GPT Model Augmented with Entropy Regularization for Autonomous Vehicles

    Authors: Jiaqi Liu, Shiyu Fang, Xuekai Liu, Lulu Guo, Peng Hang, Jian Sun

    Abstract: In the domain of autonomous vehicles (AVs), decision-making is a critical factor that significantly influences the efficacy of autonomous navigation. As the field progresses, the enhancement of decision-making capabilities in complex environments has become a central area of research within data-driven methodologies. Despite notable advances, existing learning-based decision-making strategies in a…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, et al. (32 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.11161  [pdf, other]

    cs.AI cs.MM

    Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

    Authors: Zebang Cheng, Zhi-Qi Cheng, Jun-Yan He, Jingdong Sun, Kai Wang, Yuxiang Lin, Zheng Lian, Xiaojiang Peng, Alexander Hauptmann

    Abstract: Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing su…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 37 pages, 12 figures, Project: https://github.com/ZebangCheng/Emotion-LLaMA, Demo: https://huggingface.co/spaces/ZebangCheng/Emotion-LLaMA

  4. arXiv:2406.11045  [pdf, other]

    cs.LG math.NA

    Kolmogorov Arnold Informed neural network: A physics-informed deep learning framework for solving PDEs based on Kolmogorov Arnold Networks

    Authors: Yizheng Wang, Jia Sun, Jinshuai Bai, Cosmin Anitescu, Mohammad Sadegh Eshaghi, Xiaoying Zhuang, Timon Rabczuk, Yinghua Liu

    Abstract: AI for partial differential equations (PDEs) has garnered significant attention, particularly with the emergence of Physics-informed neural networks (PINNs). The recent advent of Kolmogorov-Arnold Network (KAN) indicates that there is potential to revisit and enhance the previously MLP-based PINNs. Compared to MLPs, KANs offer interpretability and require fewer parameters. PDEs can be described in…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.10292  [pdf, other]

    cs.AI cs.CL cs.LG

    Automatically Labeling $200B Life-Saving Datasets: A Large Clinical Trial Outcome Benchmark

    Authors: Chufan Gao, Jathurshan Pradeepkumar, Trisha Das, Shivashankar Thati, Jimeng Sun

    Abstract: The global cost of drug discovery and development exceeds $200 billion annually. The main results of drug discovery and development are the outcomes of clinical trials, which directly influence the regulatory approval of new drug candidates and ultimately affect patient outcomes. Despite their significance, large-scale, high-quality clinical trial outcome data are not readily available to the publ…

    Submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2406.09701  [pdf, other]

    cs.SE

    Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models

    Authors: Qiheng Mao, Zhenhao Li, Xing Hu, Kui Liu, Xin Xia, Jianling Sun

    Abstract: Software vulnerabilities pose significant risks to the security and integrity of software systems. Prior studies have proposed a series of approaches to vulnerability detection using deep learning or pre-trained models. However, there is still a lack of vulnerability's detailed explanation for understanding apart from detecting its occurrence. Recently, large language models (LLMs) have shown a re…

    Submitted 14 June, 2024; originally announced June 2024.

  7. arXiv:2406.09469  [pdf, other]

    cs.DB

    Conformance Testing of Relational DBMS Against SQL Specifications

    Authors: Shuang Liu, Chenglin Tian, Jun Sun, Ruifeng Wang, Wei Lu, Yongxin Zhao, Yinxing Xue, Junjie Wang, Xiaoyong Du

    Abstract: A Relational Database Management System (RDBMS) is one of the fundamental software that supports a wide range of applications, making it critical to identify bugs within these systems. There has been active research on testing RDBMS, most of which employ crash or use metamorphic relations as the oracle. Although existing approaches can detect bugs in RDBMS, they are far from comprehensively evalua…

    Submitted 13 June, 2024; originally announced June 2024.

  8. arXiv:2406.08743  [pdf, other]

    cs.LG

    Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner

    Authors: Tong Nie, Guoyang Qin, Wei Ma, Jian Sun

    Abstract: This is the conference version of our paper: Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted by the Conference in Emerging Technologies in Transportation Systems (TRC-30). arXiv admin note: substantial text overlap with arXiv:2405.03185

  9. arXiv:2406.07520  [pdf, other]

    cs.CV cs.AI cs.GR

    Neural Gaffer: Relighting Any Object via Diffusion

    Authors: Haian Jin, Yuan Li, Fujun Luan, Yuanbo Xiangli, Sai Bi, Kai Zhang, Zexiang Xu, Jin Sun, Noah Snavely

    Abstract: Single-image relighting is a challenging task that involves reasoning about the complex interplay between geometry, materials, and lighting. Many prior methods either support only specific categories of images, such as portraits, or require special capture conditions, like using a flashlight. Alternatively, some methods explicitly decompose a scene into intrinsic components, such as normals and BR…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Project Website: https://neural-gaffer.github.io

  10. arXiv:2406.07189  [pdf, other]

    cs.CV

    RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker

    Authors: Yunfeng Li, Bo Wang, Jiuran Sun, Xueyi Wu, Ye Li

    Abstract: Vision camera and sonar are naturally complementary in the underwater environment. Combining the information from two modalities will promote better observation of underwater targets. However, this problem has not received sufficient attention in previous research. Therefore, this paper introduces a new challenging RGB-Sonar (RGB-S) tracking task and investigates how to achieve efficient tracking…

    Submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.06592  [pdf, other]

    cs.CL cs.LG

    Improve Mathematical Reasoning in Language Models by Automated Process Supervision

    Authors: Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

    Abstract: Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a leng…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 5 figures, 1 table

  12. arXiv:2406.06007  [pdf, other]

    cs.LG cs.CL cs.CV cs.CY

    CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

    Authors: Peng Xia, Ze Chen, Juanxi Tian, Yangrui Gong, Ruibo Hou, Yue Xu, Zhenbang Wu, Zhiyuan Fan, Yiyang Zhou, Kangyu Zhu, Wenhao Zheng, Zhaoyang Wang, Xiao Wang, Xuchao Zhang, Chetan Bansal, Marc Niethammer, Junzhou Huang, Hongtu Zhu, Yun Li, Jimeng Sun, Zongyuan Ge, Gang Li, James Zou, Huaxiu Yao

    Abstract: Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustworthiness of Med-LVLMs remains unverified, posing significant risks for future model deployment. In this paper, we introduce CARES and aim to comprehen…

    Submitted 10 June, 2024; originally announced June 2024.

  13. arXiv:2406.05906  [pdf, other]

    cs.CL cs.AI

    TTM-RE: Memory-Augmented Document-Level Relation Extraction

    Authors: Chufan Gao, Xuan Wang, Jimeng Sun

    Abstract: Document-level relation extraction aims to categorize the association between any two entities within a document. We find that previous methods for document-level relation extraction are ineffective in exploiting the full potential of large amounts of training data with varied noise levels. For example, in the ReDocRED benchmark dataset, state-of-the-art methods trained on the large-scale, lower-q…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted in ACL 2024 Main

  14. arXiv:2406.04844  [pdf, other]

    cs.CV

    Multi-Granularity Language-Guided Multi-Object Tracking

    Authors: Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

    Abstract: Most existing multi-object tracking methods typically learn visual tracking features via maximizing dis-similarities of different instances and minimizing similarities of the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely based on visual information is challenging especially in case of environmental interference such as o…

    Submitted 7 June, 2024; originally announced June 2024.

  15. arXiv:2406.04273  [pdf, other]

    cs.CV cs.AI

    ELFS: Enhancing Label-Free Coreset Selection via Clustering-based Pseudo-Labeling

    Authors: Haizhong Zheng, Elisa Tsai, Yifu Lu, Jiachen Sun, Brian R. Bartoldson, Bhavya Kailkhura, Atul Prakash

    Abstract: High-quality human-annotated data is crucial for modern deep learning pipelines, yet the human annotation process is both costly and time-consuming. Given a constrained human labeling budget, selecting an informative and representative data subset for labeling can significantly reduce human annotation effort. Well-performing state-of-the-art (SOTA) coreset selection methods require ground-truth la…

    Submitted 6 June, 2024; originally announced June 2024.

  16. arXiv:2406.03736  [pdf, other]

    cs.LG cs.CL

    Your Absorbing Discrete Diffusion Secretly Models the Conditional Distributions of Clean Data

    Authors: Jingyang Ou, Shen Nie, Kaiwen Xue, Fengqi Zhu, Jiacheng Sun, Zhenguo Li, Chongxuan Li

    Abstract: Discrete diffusion models with absorbing processes have shown promise in language modeling. The key quantities to be estimated are the ratios between the marginal probabilities of two transitive states at all timesteps, called the concrete score. In this paper, we reveal that the concrete score in absorbing diffusion can be expressed as conditional probabilities of clean data, multiplied by a time…

    Submitted 6 June, 2024; originally announced June 2024.

  17. arXiv:2406.02860  [pdf, other]

    cs.RO

    Towards Interactive Autonomous Vehicle Testing: Vehicle-Under-Test-Centered Traffic Simulation

    Authors: Yiru Liu, Xiaocong Zhao, Jian Sun

    Abstract: The simulation-based testing is essential for safely implementing autonomous vehicles (AVs) on roads, necessitating simulated traffic environments that dynamically interact with the Vehicle Under Test (VUT). This study introduces a VUT-Centered environmental Dynamics Inference (VCDI) model for realistic, interactive, and diverse background traffic simulation. VCDI is built on a Transformer-based t…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 7 pages, 4 figures

  18. arXiv:2406.02559  [pdf, other]

    cs.CV

    ShadowRefiner: Towards Mask-free Shadow Removal via Fast Fourier Transformer

    Authors: Wei Dong, Han Zhou, Yuqiong Tian, Jingke Sun, Xiaohong Liu, Guangtao Zhai, Jun Chen

    Abstract: Shadow-affected images often exhibit pronounced spatial discrepancies in color and illumination, consequently degrading various vision applications including object detection and segmentation systems. To effectively eliminate shadows in real-world images while preserving intricate details and producing visually compelling outcomes, we introduce a mask-free Shadow Removal and Refinement network (Sh…

    Submitted 17 April, 2024; originally announced June 2024.

    Comments: Accepted by CVPR workshop 2024 (NTIRE 2024)

  19. arXiv:2406.02370  [pdf, other]

    cs.RO

    Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning

    Authors: Jiaxu Wang, Ziyi Zhang, Qiang Zhang, Jia Li, Jingkai Sun, Mingyuan Sun, Junhao He, Renjing Xu

    Abstract: Latent scene representation plays a significant role in training reinforcement learning (RL) agents. To obtain good latent vectors describing the scenes, recent works incorporate the 3D-aware latent-conditioned NeRF pipeline into scene representation learning. However, these NeRF-related methods struggle to perceive 3D structural information due to the inefficient dense sampling in volumetric rend…

    Submitted 9 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  20. arXiv:2406.02310  [pdf, other]

    cs.LG

    Disentangled Representation via Variational AutoEncoder for Continuous Treatment Effect Estimation

    Authors: Ruijing Cui, Jianbin Sun, Bingyu He, Kewei Yang, Bingfeng Ge

    Abstract: Continuous treatment effect estimation holds significant practical importance across various decision-making and assessment domains, such as healthcare and the military. However, current methods for estimating dose-response curves hinge on balancing the entire representation by treating all covariates as confounding variables. Although various approaches disentangle covariates into different facto…

    Submitted 4 June, 2024; originally announced June 2024.

  21. arXiv:2406.02252  [pdf, other]

    cs.DC

    Exploring the Efficiency of Renewable Energy-based Modular Data Centers at Scale

    Authors: Jinghan Sun, Zibo Gong, Anup Agarwal, Shadi Noghabi, Ranveer Chandra, Marc Snir, Jian Huang

    Abstract: Modular data centers (MDCs) that can be placed right at the energy farms and powered mostly by renewable energy, are proven to be a flexible and effective approach to lowering the carbon footprint of data centers. However, the main challenge of using renewable energy is the high variability of power produced, which implies large volatility in powering computing resources at MDCs, and degraded appl…

    Submitted 4 June, 2024; originally announced June 2024.

  22. arXiv:2406.01960  [pdf, other]

    cs.LG cs.AI

    Certifiably Byzantine-Robust Federated Conformal Prediction

    Authors: Mintong Kang, Zhen Lin, Jimeng Sun, Cao Xiao, Bo Li

    Abstract: Conformal prediction has shown impressive capacity in constructing statistically rigorous prediction sets for machine learning models with exchangeable data samples. The siloed datasets, coupled with the escalating privacy concerns related to local data sharing, have inspired recent innovations extending conformal prediction into federated environments with distributed data samples. However, this…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  23. arXiv:2406.01806  [pdf, other]

    cs.CL cs.AI

    Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation

    Authors: Zhen Lin, Shubhendu Trivedi, Jimeng Sun

    Abstract: The advent of large language models (LLMs) has dramatically advanced the state-of-the-art in numerous natural language generation tasks. For LLMs to be applied reliably, it is essential to have an accurate measure of their confidence. Currently, the most commonly used confidence score function is the likelihood of the generated sequence, which, however, conflates semantic and syntactic components.…

    Submitted 3 June, 2024; originally announced June 2024.

  24. arXiv:2406.01799  [pdf, other]

    cs.LG math.OC stat.ML

    Online Control in Population Dynamics

    Authors: Noah Golowich, Elad Hazan, Zhou Lu, Dhruv Rohatgi, Y. Jennifer Sun

    Abstract: The study of population dynamics originated with early sociological works but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics. Most studies on population dynamics focus on the problem of prediction rather than control. Existing mathematical models for control in population dynamics are often restricted to specific, noise-free dynamics,…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  25. arXiv:2406.01304  [pdf, other]

    cs.CL cs.AI cs.SE

    CodeR: Issue Resolving with Multi-Agent and Task Graphs

    Authors: Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang

    Abstract: GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issue…

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: https://github.com/NL2Code/CodeR

  26. arXiv:2406.00992  [pdf, other]

    cs.SE

    Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis

    Authors: Fengjie Li, Jiajun Jiang, Jiajun Sun, Hongyu Zhang

    Abstract: Automated Program Repair (APR) has garnered significant attention due to its potential to streamline the bug repair process for human developers. Recently, LLM-based APR methods have shown promise in repairing real-world bugs. However, existing APR methods often utilize patches generated by LLMs without further optimization, resulting in reduced effectiveness due to the lack of program-specific kn…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 12 pages, 4 figures

  27. arXiv:2406.00281  [pdf, other]

    cs.LG cs.AI

    Cross-Table Pretraining towards a Universal Function Space for Heterogeneous Tabular Data

    Authors: Jintai Chen, Zhen Lin, Qiyuan Chen, Jimeng Sun

    Abstract: Tabular data from different tables exhibit significant diversity due to varied definitions and types of features, as well as complex inter-feature and feature-target relationships. Cross-dataset pretraining, which learns reusable patterns from upstream data to support downstream tasks, have shown notable success in various fields. Yet, when applied to tabular data prediction, this paradigm faces c…

    Submitted 31 May, 2024; originally announced June 2024.

  28. arXiv:2405.20711  [pdf, other]

    cs.CV

    Revisiting Mutual Information Maximization for Generalized Category Discovery

    Authors: Zhaorui Tan, Chengrui Zhang, Xi Yang, Jie Sun, Kaizhu Huang

    Abstract: Generalized category discovery presents a challenge in a realistic scenario, which requires the model's generalization ability to recognize unlabeled samples from known and unknown categories. This paper revisits the challenge of generalized category discovery through the lens of information maximization (InfoMax) with a probabilistic parametric classifier. Our findings reveal that ensuring indepe…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Preprint version

  29. arXiv:2405.19686  [pdf, other]

    cs.AI

    Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback

    Authors: Jingwei Sun, Zhixu Du, Yiran Chen

    Abstract: Large language models (LLMs) have demonstrated remarkable proficiency in a range of natural language processing tasks. Once deployed, LLMs encounter users with personalized factual knowledge, and such personalized knowledge is consistently reflected through users' interactions with the LLMs. To enhance user experience, real-time model personalization is essential, allowing LLMs to adapt user-speci…

    Submitted 30 May, 2024; originally announced May 2024.

  30. arXiv:2405.18166  [pdf, other]

    cs.AI

    Defending Large Language Models Against Jailbreak Attacks via Layer-specific Editing

    Authors: Wei Zhao, Zhe Li, Yige Li, Ye Zhang, Jun Sun

    Abstract: Large language models (LLMs) are increasingly being adopted in a wide range of real-world applications. Despite their impressive performance, recent studies have shown that LLMs are vulnerable to deliberately crafted adversarial prompts even when aligned via Reinforcement Learning from Human Feedback or supervised fine-tuning. While existing defense methods focus on either detecting harmful prompt…

    Submitted 14 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  31. arXiv:2405.17405  [pdf, other]

    cs.CV

    Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer

    Authors: Ruizhi Shao, Youxin Pang, Zerong Zheng, Jingxiang Sun, Yebin Liu

    Abstract: We present a novel approach for generating high-quality, spatio-temporally coherent human videos from a single image under arbitrary viewpoints. Our framework combines the strengths of U-Nets for accurate condition injection and diffusion transformers for capturing global correlations across viewpoints and time. The core is a cascaded 4D transformer architecture that factorizes attention across vi…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Our project website is https://human4dit.github.io

  32. arXiv:2405.17216  [pdf, other]

    cs.LG cs.AI cs.LO stat.ML

    Autoformalizing Euclidean Geometry

    Authors: Logan Murphy, Kaiyu Yang, Jialiang Sun, Zhaoyu Li, Anima Anandkumar, Xujie Si

    Abstract: Autoformalization involves automatically translating informal math into formal theorems and proofs that are machine-verifiable. Euclidean geometry provides an interesting and controllable domain for studying autoformalization. In this paper, we introduce a neuro-symbolic framework for autoformalizing Euclidean geometry, which combines domain knowledge, SMT solvers, and large language models (LLMs)…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024. The first two authors contributed equally

  33. arXiv:2405.16749  [pdf, other]

    cs.LG cs.CV

    DMPlug: A Plug-in Method for Solving Inverse Problems with Diffusion Models

    Authors: Hengkang Wang, Xu Zhang, Taihui Li, Yuxiang Wan, Tiancong Chen, Ju Sun

    Abstract: Pretrained diffusion models (DMs) have recently been popularly used in solving inverse problems (IPs). The existing methods mostly interleave iterative steps in the reverse diffusion process and iterative steps to bring the iterates closer to satisfying the measurement constraint. However, such interleaving methods struggle to produce final results that look like natural objects of interest (i.e.,…

    Submitted 26 May, 2024; originally announced May 2024.

  34. arXiv:2405.16412  [pdf, other]

    cs.CL cs.LG

    KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge

    Authors: Pengcheng Jiang, Lang Cao, Cao Xiao, Parminder Bhatia, Jimeng Sun, Jiawei Han

    Abstract: Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery. While existing methods typically focus either on training KGE models solely based on graph structure or fine-tuning pre-trained language models with classification data in KG, KG-FIT leverages LLM-gu…

    Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  35. arXiv:2405.15302  [pdf, other]

    cs.AI cs.CL cs.LG

    Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

    Authors: Zhiwei Wang, Yunji Wang, Zhongwang Zhang, Zhangchen Zhou, Hui Jin, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Yaoyu Zhang, Zhi-Qin John Xu

    Abstract: Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving. Investigating the internal reasoning mechanisms of these models can help us design better model architectures and training strategies, ultimately enhancing their reasoning capabilities. In this study, we examine the matching mechanism employed by Transformer for multi-step reasonin…

    Submitted 24 May, 2024; originally announced May 2024.

  36. Analog or Digital In-memory Computing? Benchmarking through Quantitative Modeling

    Authors: Jiacong Sun, Pouya Houshmand, Marian Verhelst

    Abstract: In-Memory Computing (IMC) has emerged as a promising paradigm for energy-efficient, throughput-efficient and area-efficient machine learning at the edge. However, the differences in hardware architectures, array dimensions, and fabrication technologies among published IMC realizations have made it difficult to grasp their relative strengths. Moreover, previous studies have primarily focused on exp…

    Submitted 23 May, 2024; originally announced May 2024.

  37. arXiv:2405.14959  [pdf, other]

    cs.CV cs.AI

    EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting

    Authors: Jiaxu Wang, Junhao He, Ziyi Zhang, Mingyuan Sun, Jingkai Sun, Renjing Xu

    Abstract: Event cameras offer promising advantages such as high dynamic range and low latency, making them well-suited for challenging lighting conditions and fast-moving scenarios. However, reconstructing 3D scenes from raw event streams is difficult because event data is sparse and does not carry absolute color information. To release its potential in 3D reconstruction, we propose the first event-based ge…

    Submitted 3 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  38. arXiv:2405.14923  [pdf, other]

    cs.LG

    How Does Bayes Error Limit Probabilistic Robust Accuracy

    Authors: Ruihan Zhang, Jun Sun

    Abstract: Adversarial examples pose a security threat to many critical systems built on neural networks. Given that deterministic robustness often comes with significantly reduced accuracy, probabilistic robustness (i.e., the probability of having the same label with a vicinity is $\ge 1-κ$) has been proposed as a promising way of achieving robustness whilst maintaining accuracy. However, existing training…

    Submitted 23 May, 2024; originally announced May 2024.

  39. arXiv:2405.14870  [pdf, other]

    cs.CV cs.RO

    An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

    Authors: Jiahao Sun, Chunmei Qing, Xiang Xu, Lingdong Kong, Youquan Liu, Li Li, Chenming Zhu, Jingwei Zhang, Zeqi Xiao, Runnan Chen, Tai Wang, Wenwei Zhang, Kai Chen

    Abstract: In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments. Traditional approaches often rely on disparate, standalone codebases, hindering unified advancements and fair benchmarking across models. To address these challenges, we introduce MMDetection3D-lidarseg, a comprehensive toolbox designed for the efficient tra…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: Preprint; 17 pages, 4 figures, 7 tables; Code at https://github.com/open-mmlab/mmdetection3d

  40. arXiv:2405.14781  [pdf, other]

    cs.CR cs.AI

    Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning

    Authors: Nay Myat Min, Long H. Pham, Jun Sun

    Abstract: The application of deep neural network models in various security-critical applications has raised significant security concerns, particularly the risk of backdoor attacks. Neural backdoors pose a serious security threat as they allow attackers to maliciously alter model behavior. While many defenses have been explored, existing approaches are often bounded by model-specific constraints, or necess…

    Submitted 23 May, 2024; originally announced May 2024.

  41. arXiv:2405.14295  [pdf, other]

    cs.CV

    Focus Anywhere for Fine-grained Multi-page Document Understanding

    Authors: Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang

    Abstract: Modern LVLMs still struggle to achieve fine-grained document understanding, such as OCR/translation/caption for regions of interest to the user, tasks that require the context of the entire page, or even multiple pages. Accordingly, this paper proposes Fox, an effective pipeline, hybrid data, and tuning strategy, that catalyzes LVLMs to focus anywhere on single/multi-page documents. We introduce a…

    Submitted 23 May, 2024; originally announced May 2024.

  42. arXiv:2405.14278  [pdf, other]

    cs.CV

    SCMix: Stochastic Compound Mixing for Open Compound Domain Adaptation in Semantic Segmentation

    Authors: Kai Yao, Zhaorui Tan, Zixian Su, Xi Yang, Jie Sun, Kaizhu Huang

    Abstract: Open compound domain adaptation (OCDA) aims to transfer knowledge from a labeled source domain to a mix of unlabeled homogeneous compound target domains while generalizing to open unseen domains. Existing OCDA methods solve the intra-domain gaps by a divide-and-conquer strategy, which divides the problem into several individual and parallel domain adaptation (DA) tasks. Such approaches often conta…

    Submitted 23 May, 2024; originally announced May 2024.

  43. arXiv:2405.14125  [pdf, other]

    cs.AI cs.CL

    ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation

    Authors: Jingnan Zheng, Han Wang, An Zhang, Tai D. Nguyen, Jun Sun, Tat-Seng Chua

    Abstract: Large Language Models (LLMs) can elicit unintended and even harmful content when misaligned with human values, posing severe risks to users and society. To mitigate these risks, current evaluation benchmarks predominantly employ expert-designed contextual scenarios to assess how well LLMs align with human values. However, the labor-intensive nature of these benchmarks limits their test scope, hind…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  44. arXiv:2405.11739  [pdf]

    cs.LG cs.AI cs.CY

    Contactless Polysomnography: What Radio Waves Tell Us about Sleep

    Authors: Hao He, Chao Li, Wolfgang Ganglberger, Kaileigh Gallagher, Rumen Hristov, Michail Ouroutzoglou, Haoqi Sun, Jimeng Sun, Brandon Westover, Dina Katabi

    Abstract: The ability to assess sleep at home, capture sleep stages, and detect the occurrence of apnea (without on-body sensors) simply by analyzing the radio waves bouncing off people's bodies while they sleep is quite powerful. Such a capability would allow for longitudinal data collection in patients' homes, informing our understanding of sleep and its interaction with various diseases and their therape…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: The first two authors contributed equally to this work

  45. arXiv:2405.11547  [pdf, other]

    stat.ML cs.CR cs.LG

    Certified Robust Accuracy of Neural Networks Are Bounded due to Bayes Errors

    Authors: Ruihan Zhang, Jun Sun

    Abstract: Adversarial examples pose a security threat to many critical systems built on neural networks. While certified training improves robustness, it also decreases accuracy noticeably. Despite various proposals for addressing this issue, the significant accuracy drop remains. More importantly, it is not clear whether there is a certain fundamental limit on achieving robustness whilst maintaining accura…

    Submitted 20 June, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

    Comments: accepted by CAV 2024

  46. arXiv:2405.10674  [pdf, other]

    cs.CV cs.AI

    From Sora What We Can See: A Survey of Text-to-Video Generation

    Authors: Rui Sun, Yumin Zhang, Tejal Shah, Jiahao Sun, Shuoying Zhang, Wenqi Li, Haoran Duan, Bo Wei, Rajiv Ranjan

    Abstract: With impressive achievements made, artificial intelligence is on the path forward to artificial general intelligence. Sora, developed by OpenAI, which is capable of minute-level world-simulative abilities can be considered as a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that need to be resolved. In this survey, we embark fr…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: A comprehensive list of text-to-video generation studies in this survey is available at https://github.com/soraw-ai/Awesome-Text-to-Video-Generation

  47. arXiv:2405.10529  [pdf, other]

    cs.CV cs.AI

    Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors

    Authors: Jiachen Sun, Changsheng Wang, Jiongxiao Wang, Yiwei Zhang, Chaowei Xiao

    Abstract: Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and intera…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 15 pages

    ACM Class: I.2.7; I.4

  48. arXiv:2405.09006  [pdf, other]

    cs.CV cs.CL

    Spatial Semantic Recurrent Mining for Referring Image Segmentation

    Authors: Jiaxing Yang, Lihe Zhang, Jiayu Sun, Huchuan Lu

    Abstract: Referring Image Segmentation (RIS) consistently requires language and appearance semantics to more understand each other. The need becomes acute especially under hard situations. To achieve, existing works tend to resort to various trans-representing mechanisms to directly feed forward language semantic along main RGB branch, which however will result in referent distribution weakly-mined in space…

    Submitted 14 May, 2024; originally announced May 2024.

  49. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang , et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  50. arXiv:2405.08651  [pdf, other]

    cs.DC

    BeACONS: A Blockchain-enabled Authentication and Communications Network for Scalable IoV

    Authors: Qi Shi, Jingyi Sun, Hanwei Fu, Peizhe Fu, Jiayuan Ma, Hao Xu, Erwu Liu

    Abstract: This paper introduces a novel blockchain-enabled authentication and communications network for scalable Internet of Vehicles, which aims to bolster security and confidentiality, diminish communications latency, and reduce dependence on centralised infrastructures like Certificate Authorities and Public Key Infrastructures by leveraging Blockchain-enabled Domain Name Services and Blockchain-enabled…

    Submitted 14 May, 2024; originally announced May 2024.