
Showing 1–50 of 1,559 results for author: Huang, X

Searching in archive cs. Search in all archives.
  1. arXiv:2409.02608  [pdf, other]

    cs.CV

    A Medical Multimodal Large Language Model for Pediatric Pneumonia

    Authors: Weiwei Tian, Xinyu Huang, Tianhao Cheng, Wen He, Jinwu Fang, Rui Feng, Daoying Geng, Xiaobo Zhang

    Abstract: Pediatric pneumonia is the leading cause of death among children under five years worldwide, imposing a substantial burden on affected families. Currently, there are three significant hurdles in diagnosing and treating pediatric pneumonia. Firstly, pediatric pneumonia shares similar symptoms with other respiratory diseases, making rapid and accurate differential diagnosis challenging. Secondly, pr…

    Submitted 4 September, 2024; originally announced September 2024.

    Comments: 18 pages, 10 figures

  2. arXiv:2409.02465  [pdf, other]

    cs.CL

    DetectiveQA: Evaluating Long-Context Reasoning on Detective Novels

    Authors: Zhe Xu, Jiasheng Ye, Xiangyang Liu, Tianxiang Sun, Xiaoran Liu, Qipeng Guo, Linlin Li, Qun Liu, Xuanjing Huang, Xipeng Qiu

    Abstract: With the rapid advancement of Large Language Models (LLMs), long-context information understanding and processing have become a hot topic in academia and industry. However, benchmarks for evaluating the ability of LLMs to handle long-context information do not seem to have kept pace with the development of LLMs. Despite the emergence of various long-context evaluation benchmarks, the types of capa…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.02438  [pdf, other]

    cs.CV

    Non-target Divergence Hypothesis: Toward Understanding Domain Gaps in Cross-Modal Knowledge Distillation

    Authors: Yilong Chen, Zongyi Xu, Xiaoshui Huang, Shanshan Zhao, Xinqi Jiang, Xinyu Gao, Xinbo Gao

    Abstract: Compared to single-modal knowledge distillation, cross-modal knowledge distillation faces more severe challenges due to domain gaps between modalities. Although various methods have proposed various solutions to overcome these challenges, there is still limited research on how domain gaps affect cross-modal knowledge distillation. This paper provides an in-depth analysis and evaluation of this iss…

    Submitted 4 September, 2024; originally announced September 2024.

  4. arXiv:2409.02418  [pdf, other]

    cs.CV

    MOSMOS: Multi-organ segmentation facilitated by medical report supervision

    Authors: Weiwei Tian, Xinyu Huang, Junlin Hou, Caiyue Ren, Longquan Jiang, Rui-Wei Zhao, Gang Jin, Yuejie Zhang, Daoying Geng

    Abstract: Owing to a large amount of multi-modal data in modern medical systems, such as medical images and reports, Medical Vision-Language Pre-training (Med-VLP) has demonstrated incredible achievements in coarse-grained downstream tasks (i.e., medical classification, retrieval, and visual question answering). However, the problem of transferring knowledge learned from Med-VLP to fine-grained multi-organ…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 14 pages, 7 figures

  5. arXiv:2409.00968  [pdf, other]

    math.OC cs.AI cs.LG

    Solving Integrated Process Planning and Scheduling Problem via Graph Neural Network Based Deep Reinforcement Learning

    Authors: Hongpei Li, Han Zhang, Ziyan He, Yunkai Jia, Bo Jiang, Xiang Huang, Dongdong Ge

    Abstract: The Integrated Process Planning and Scheduling (IPPS) problem combines process route planning and shop scheduling to achieve high efficiency in manufacturing and maximize resource utilization, which is crucial for modern manufacturing systems. Traditional methods using Mixed Integer Linear Programming (MILP) and heuristic algorithms can not well balance solution quality and speed when solving IPPS…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 24 pages, 13 figures

  6. arXiv:2409.00920  [pdf, other]

    cs.LG cs.AI cs.CL

    ToolACE: Winning the Points of LLM Function Calling

    Authors: Weiwen Liu, Xu Huang, Xingshan Zeng, Xinlong Hao, Shuai Yu, Dexun Li, Shuai Wang, Weinan Gan, Zhengying Liu, Yuanqing Yu, Zezhong Wang, Yuxian Wang, Wu Ning, Yutai Hou, Bin Wang, Chuhan Wu, Xinzhi Wang, Yong Liu, Yasheng Wang, Duyu Tang, Dandan Tu, Lifeng Shang, Xin Jiang, Ruiming Tang, Defu Lian, et al. (2 additional authors not shown)

    Abstract: Function calling significantly extends the application boundary of large language models, where high-quality and diverse training data is critical for unlocking this capability. However, real function-calling data is quite challenging to collect and annotate, while synthetic data generated by existing pipelines tends to lack coverage and accuracy. In this paper, we present ToolACE, an automatic ag…

    Submitted 1 September, 2024; originally announced September 2024.

    Comments: 21 pages, 22 figures

  7. arXiv:2409.00590  [pdf, other]

    cs.CV

    COMOGen: A Controllable Text-to-3D Multi-object Generation Framework

    Authors: Shaorong Sun, Shuchao Pang, Yazhou Yao, Xiaoshui Huang

    Abstract: The controllability of 3D object generation methods is achieved through input text. Existing text-to-3D object generation methods primarily focus on generating a single object based on a single object description. However, these methods often face challenges in producing results that accurately correspond to our desired positions when the input text involves multiple objects. To address the issue…

    Submitted 31 August, 2024; originally announced September 2024.

  8. arXiv:2408.16809  [pdf, other]

    cs.CV cs.CL cs.MM

    See or Guess: Counterfactually Regularized Image Captioning

    Authors: Qian Cao, Xu Chen, Ruihua Song, Xiting Wang, Xinting Huang, Yuchen Ren

    Abstract: Image captioning, which generates natural language descriptions of the visual information in an image, is a crucial task in vision-language research. Previous models have typically addressed this task by aligning the generative capabilities of machines with human intelligence through statistical fitting of existing datasets. While effective for normal images, they may struggle to accurately descri…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  9. arXiv:2408.15276  [pdf, other]

    cs.CV cs.AI cs.LG

    A Survey of Deep Learning for Group-level Emotion Recognition

    Authors: Xiaohua Huang, Jinke Xu, Wenming Zheng, Qirong Mao, Abhinav Dhall

    Abstract: With the advancement of artificial intelligence (AI) technology, group-level emotion recognition (GER) has emerged as an important area in analyzing human behavior. Early GER methods are primarily relied on handcrafted features. However, with the proliferation of Deep Learning (DL) techniques and their remarkable success in diverse tasks, neural networks have garnered increasing interest in GER. U…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 16 pages, 2 figures

  10. arXiv:2408.14874  [pdf, other]

    cs.CL

    Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data

    Authors: Han Xia, Songyang Gao, Qiming Ge, Zhiheng Xi, Qi Zhang, Xuanjing Huang

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has proven effective in aligning large language models with human intentions, yet it often relies on complex methodologies like Proximal Policy Optimization (PPO) that require extensive hyper-parameter tuning and present challenges in sample efficiency and stability. In this paper, we introduce Inverse-Q*, an innovative framework that transcends tr…

    Submitted 29 August, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

  11. arXiv:2408.14086  [pdf, other]

    cs.GT cs.LG

    ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners

    Authors: Xiangge Huang, Jingyuan Li, Jiaqing Xie

    Abstract: With the constraint of a no regret follower, will the players in a two-player Stackelberg game still reach Stackelberg equilibrium? We first show when the follower strategy is either reward-average or transform-reward-average, the two players can always get the Stackelberg Equilibrium. Then, we extend that the players can achieve the Stackelberg equilibrium in the two-player game under the no regr…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures. Technical Report

  12. arXiv:2408.14084  [pdf, other]

    cs.CV cs.MM

    HABD: a houma alliance book ancient handwritten character recognition database

    Authors: Xiaoyu Yuan, Xiaohua Huang, Zibo Zhang, Yabo Sun

    Abstract: The Houma Alliance Book, one of history's earliest calligraphic examples, was unearthed in the 1970s. These artifacts were meticulously organized, reproduced, and copied by the Shanxi Provincial Institute of Cultural Relics. However, because of their ancient origins and severe ink erosion, identifying characters in the Houma Alliance Book is challenging, necessitating the use of digital technology…

    Submitted 26 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  13. arXiv:2408.13454  [pdf, other]

    cs.CV

    AdaOcc: Adaptive-Resolution Occupancy Prediction

    Authors: Chao Chen, Ruoyu Wang, Yuliang Guo, Cheng Zhao, Xinyu Huang, Chen Feng, Liu Ren

    Abstract: Autonomous driving in complex urban scenarios requires 3D perception to be both comprehensive and precise. Traditional 3D perception methods focus on object detection, resulting in sparse representations that lack environmental detail. Recent approaches estimate 3D occupancy around vehicles for a more comprehensive scene representation. However, dense 3D occupancy prediction increases computationa…

    Submitted 23 August, 2024; originally announced August 2024.

  14. arXiv:2408.13452  [pdf, other]

    cs.LG

    Data Augmentation for Continual RL via Adversarial Gradient Episodic Memory

    Authors: Sihao Wu, Xingyu Zhao, Xiaowei Huang

    Abstract: Data efficiency of learning, which plays a key role in the Reinforcement Learning (RL) training process, becomes even more important in continual RL with sequential environments. In continual RL, the learner interacts with non-stationary, sequential tasks and is required to learn new tasks without forgetting previous knowledge. However, there is little work on implementing data augmentation for co…

    Submitted 26 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  15. Multivariate Time-Series Anomaly Detection based on Enhancing Graph Attention Networks with Topological Analysis

    Authors: Zhe Liu, Xiang Huang, Jingyun Zhang, Zhifeng Hao, Li Sun, Hao Peng

    Abstract: Unsupervised anomaly detection in time series is essential in industrial applications, as it significantly reduces the need for manual intervention. Multivariate time series pose a complex challenge due to their feature and temporal dimensions. Traditional methods use Graph Neural Networks (GNNs) or Transformers to analyze spatial while RNNs to model temporal dependencies. These methods focus narr…

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, to be published in CIKM 2024

  16. arXiv:2408.11957  [pdf, other]

    cs.GR

    Bimodal Visualization of Industrial X-Ray and Neutron Computed Tomography Data

    Authors: Xuan Huang, Haichao Miao, Hyojin Kim, Andrew Townsend, Kyle Champley, Joseph Tringe, Valerio Pascucci, Peer-Timo Bremer

    Abstract: Advanced manufacturing creates increasingly complex objects with material compositions that are often difficult to characterize by a single modality. Our collaborating domain scientists are going beyond traditional methods by employing both X-ray and neutron computed tomography to obtain complementary representations expected to better resolve material boundaries. However, the use of two modalitie…

    Submitted 21 August, 2024; originally announced August 2024.

  17. arXiv:2408.11831  [pdf]

    cs.HC

    Web-based Visualization and Analytics of Petascale data: Equity as a Tide that Lifts All Boats

    Authors: Aashish Panta, Xuan Huang, Nina McCurdy, David Ellsworth, Amy Gooch, Giorgio Scorzelli, Hector Torres, Patrice Klein, Gustavo Ovando-Montejo, Valerio Pascucci

    Abstract: Scientists generate petabytes of data daily to help uncover environmental trends or behaviors that are hard to predict. For example, understanding climate simulations based on the long-term average of temperature, precipitation, and other environmental variables is essential to predicting and establishing root causes of future undesirable scenarios and assessing possible mitigation strategies. Whi…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 11 pages, 12 figures, conference paper

  18. arXiv:2408.11396  [pdf, other]

    cs.CL

    MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing

    Authors: Hao Zhou, Zhijun Wang, Shujian Huang, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Weihua Luo, Jiajun Chen

    Abstract: Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic forgetting of the ability of original languages. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, ind…

    Submitted 21 August, 2024; originally announced August 2024.

  19. arXiv:2408.10497  [pdf, other]

    cs.CL cs.AI

    QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention

    Authors: Yihang Wang, Xu Huang, Bowen Tian, Yixing Fan, Jiafeng Guo

    Abstract: Generative LLM have achieved significant success in various industrial tasks and can effectively adapt to vertical domains and downstream tasks through ICL. However, with tasks becoming increasingly complex, the context length required by ICL is also getting longer, and two significant issues arise: (i) The excessively long context leads to high costs and inference delays. (ii) A substantial amoun…

    Submitted 19 August, 2024; originally announced August 2024.

  20. arXiv:2408.10404  [pdf, other]

    cs.CV eess.IV eess.SP

    Parallel Processing of Point Cloud Ground Segmentation for Mechanical and Solid-State LiDARs

    Authors: Xiao Zhang, Zhanhong Huang, Garcia Gonzalez Antony, Witek Jachimczyk, Xinming Huang

    Abstract: In this study, we introduce a novel parallel processing framework for real-time point cloud ground segmentation on FPGA platforms, aimed at adapting LiDAR algorithms to the evolving landscape from mechanical to solid-state LiDAR (SSL) technologies. Focusing on the ground segmentation task, we explore parallel processing techniques on existing approaches and adapt them to real-world SSL data handli…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 5 pages

  21. arXiv:2408.09110  [pdf, other]

    cs.CV

    Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community

    Authors: Jiancheng Pan, Yanxing Liu, Yuqian Fu, Muyuan Ma, Jiaohao Li, Danda Pani Paudel, Luc Van Gool, Xiaomeng Huang

    Abstract: Object detection, particularly open-vocabulary object detection, plays a crucial role in Earth sciences, such as environmental monitoring, natural disaster assessment, and land-use planning. However, existing open-vocabulary detectors, primarily trained on natural-world images, struggle to generalize to remote sensing images due to a significant data domain gap. Thus, this paper aims to advance th…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  22. arXiv:2408.08959  [pdf, other]

    cs.AI cs.CL

    Adaptive Guardrails For Large Language Models via Trust Modeling and In-Context Learning

    Authors: Jinwei Hu, Yi Dong, Xiaowei Huang

    Abstract: Guardrails have become an integral part of Large language models (LLMs), by moderating harmful or toxic response in order to maintain LLMs' alignment to human expectations. However, the existing guardrail methods do not consider different needs and access rights of individual users, and treat all the users with the same rule. This study introduces an adaptive guardrail mechanism, supported by trus…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Under Review

  23. arXiv:2408.08711  [pdf, ps, other]

    cs.GT

    Weighted Envy-free Allocation with Subsidy

    Authors: Haris Aziz, Xin Huang, Kei Kimura, Indrajit Saha, Zhaohong Sun, Mashbat Suzuki, Makoto Yokoo

    Abstract: We consider the problem of fair allocation with subsidy when agents have weighted entitlements. After highlighting several important differences from the unweighted cases, we present several results concerning weighted envy-freeability including general characterizations, algorithms for achieving and testing weighted envy-freeability, lower and upper bounds for worst case subsidy for non-wasteful…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 20 pages, 1 Table

  24. arXiv:2408.08640  [pdf, other]

    cs.CL

    Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning

    Authors: Wenwen Zhuang, Xin Huang, Xiantao Zhang, Jin Zeng

    Abstract: Multimodal Large Language Models (MLLMs) excel in solving text-based mathematical problems, but they struggle with mathematical diagrams since they are primarily trained on natural scene images. For humans, visual aids generally enhance problem-solving, but MLLMs perform worse as information shifts from textual to visual modality. This decline is mainly due to their shortcomings in aligning images…

    Submitted 16 August, 2024; originally announced August 2024.

  25. arXiv:2408.08341  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Exploring Latent Space for Generating Peptide Analogs Using Protein Language Models

    Authors: Po-Yu Liang, Xueting Huang, Tibo Duran, Andrew J. Wiemer, Jun Bai

    Abstract: Generating peptides with desired properties is crucial for drug discovery and biotechnology. Traditional sequence-based and structure-based methods often require extensive datasets, which limits their effectiveness. In this study, we proposed a novel method that utilized autoencoder shaped models to explore the protein embedding space, and generate novel peptide analogs by leveraging protein langu…

    Submitted 15 August, 2024; originally announced August 2024.

  26. arXiv:2408.07543  [pdf, other]

    cs.CV cs.CL

    MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark

    Authors: Minxuan Zhou, Hao Liang, Tianpeng Li, Zhiyu Wu, Mingan Lin, Linzhuang Sun, Yaqi Zhou, Yan Zhang, Xiaoqin Huang, Yicong Chen, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchma…

    Submitted 23 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  27. arXiv:2408.07481  [pdf, other]

    cs.CV

    DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency

    Authors: Xiaojing Zhong, Xinyi Huang, Xiaofeng Yang, Guosheng Lin, Qingyao Wu

    Abstract: Diffusion models usher a new era of video editing, flexibly manipulating the video contents with text prompts. Despite the widespread application demand in editing human-centered videos, these models face significant challenges in handling complex objects like humans. In this paper, we introduce DeCo, a novel video editing framework specifically designed to treat humans and the background as separ…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: European Conference on Computer Vision

  28. arXiv:2408.06604  [pdf, other]

    cs.CV

    MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers

    Authors: Zichao Dong, Yilin Zhang, Xufeng Huang, Hang Ji, Zhan Shi, Xin Zhan, Junbo Chen

    Abstract: We introduce a novel MV-DETR pipeline which is effective while efficient transformer based detection method. Given input RGBD data, we notice that there are super strong pretraining weights for RGB data while less effective works for depth related data. First and foremost, we argue that geometry and texture cues are both of vital importance while could be encoded separately. Secondly, we find tha…

    Submitted 12 August, 2024; originally announced August 2024.

  29. arXiv:2408.06567  [pdf, other]

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention…

    Submitted 12 August, 2024; originally announced August 2024.

  30. arXiv:2408.06082  [pdf, ps, other]

    cs.SE

    AutoCheck: Automatically Identifying Variables for Checkpointing by Data Dependency Analysis

    Authors: Xiang Fu, Weiping Zhang, Xin Huang, Shiman Meng, Wubiao Xu, Luanzheng Guo, Kento Sato

    Abstract: Checkpoint/Restart (C/R) has been widely deployed in numerous HPC systems, Clouds, and industrial data centers, which are typically operated by system engineers. Nevertheless, there is no existing approach that helps system engineers without domain expertise, and domain scientists without system fault tolerance knowledge identify those critical variables accounted for correct application execution…

    Submitted 15 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 11 pages, 7 figures, 4 tables

  31. arXiv:2408.06030  [pdf, other]

    cs.RO

    Developing Smart MAVs for Autonomous Inspection in GPS-denied Constructions

    Authors: Paoqiang Pan, Kewei Hu, Xiao Huang, Wei Ying, Xiaoxuan Xie, Yue Ma, Naizhong Zhang, Hanwen Kang

    Abstract: Smart Micro Aerial Vehicles (MAVs) have transformed infrastructure inspection by enabling efficient, high-resolution monitoring at various stages of construction, including hard-to-reach areas. Traditional manual operation of drones in GPS-denied environments, such as industrial facilities and infrastructure, is labour-intensive, tedious and prone to error. This study presents an innovative framew…

    Submitted 12 August, 2024; originally announced August 2024.

  32. arXiv:2408.05575  [pdf, other]

    cs.AI cs.GT

    In-Context Exploiter for Extensive-Form Games

    Authors: Shuxin Li, Chang Yang, Youzhi Zhang, Pengdeng Li, Xinrun Wang, Xiao Huang, Hau Chan, Bo An

    Abstract: Nash equilibrium (NE) is a widely adopted solution concept in game theory due to its stability property. However, we observe that the NE strategy might not always yield the best results, especially against opponents who do not adhere to NE strategies. Based on this observation, we pose a new game-solving question: Can we learn a model that can exploit any, even NE, opponent to maximize their own u…

    Submitted 10 August, 2024; originally announced August 2024.

  33. arXiv:2408.05456  [pdf, other]

    cs.CL

    Path-LLM: A Shortest-Path-based LLM Learning for Unified Graph Representation

    Authors: Wenbo Shang, Xuliang Zhu, Xin Huang

    Abstract: Unified graph representation learning aims to produce node embeddings, which can be applied to multiple downstream applications. However, existing studies based on graph neural networks and language models either suffer from the limitations of numerous training needed toward specific downstream predictions or have shallow semantic features. In this work, we propose a novel Path-LLM model to learn…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures

  34. arXiv:2408.05452  [pdf, other]

    cs.CV cs.RO

    EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

    Authors: Junjie Jiang, Hao Zhuang, Xinjie Huang, Delei Kong, Zheng Fang

    Abstract: Event cameras have the potential to revolutionize the field of robot vision, particularly in areas like stereo disparity estimation, owing to their high temporal resolution and high dynamic range. Many studies use deep learning for event camera stereo disparity estimation. However, these methods fail to fully exploit the temporal information in the event stream to acquire clear event representatio…

    Submitted 10 August, 2024; originally announced August 2024.

  35. arXiv:2408.03877  [pdf, other]

    cs.LG cs.AI

    Knowledge Probing for Graph Representation Learning

    Authors: Mingyu Zhao, Xingyu Huang, Ziyu Lyu, Yanlin Wang, Lixin Cui, Lu Bai

    Abstract: Graph learning methods have been extensively applied in diverse application areas. However, what kind of inherent graph properties e.g. graph proximity, graph structural information has been encoded into graph representation learning for downstream tasks is still under-explored. In this paper, we propose a novel graph probing framework (GraphProbe) to investigate and interpret whether the family o…

    Submitted 7 August, 2024; originally announced August 2024.

  36. arXiv:2408.03677  [pdf, other]

    cs.CV

    L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

    Authors: Xun Huang, Ziyu Xu, Hai Wu, Jinlong Wang, Qiming Xia, Yan Xia, Jonathan Li, Kyle Gao, Chenglu Wen, Cheng Wang

    Abstract: LiDAR-based vision systems are integral for 3D object detection, which is crucial for autonomous navigation. However, they suffer from performance degradation in adverse weather conditions due to the quality deterioration of LiDAR point clouds. Fusing LiDAR with the weather-robust 4D radar sensor is expected to solve this problem. However, the fusion of LiDAR and 4D radar is challenging because th…

    Submitted 30 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  37. arXiv:2408.02914  [pdf, other]

    cs.HC

    VirtualNexus: Enhancing 360-Degree Video AR/VR Collaboration with Environment Cutouts and Virtual Replicas

    Authors: Xincheng Huang, Michael Yin, Ziyi Xia, Robert Xiao

    Abstract: Asymmetric AR/VR collaboration systems bring a remote VR user to a local AR user's physical environment, allowing them to communicate and work within a shared virtual/physical space. Such systems often display the remote environment through 3D reconstructions or 360-degree videos. While 360-degree cameras stream an environment in higher quality, they lack spatial information, making them less inte…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 12 pages, 10 figures, to be published in The 37th Annual ACM Symposium on User Interface Software and Technology (UIST'24)

  38. arXiv:2408.02404  [pdf, other]

    cs.IR

    Feedback Reciprocal Graph Collaborative Filtering

    Authors: Weijun Chen, Yuanchen Bei, Qijie Shen, Hao Chen, Xiao Huang, Feiran Huang

    Abstract: Collaborative filtering on user-item interaction graphs has achieved success in the industrial recommendation. However, recommending users' truly fascinated items poses a seesaw dilemma for collaborative filtering models learned from the interaction graph. On the one hand, not all items that users interact with are equally appealing. Some items are genuinely fascinating to users, while others are…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 9 pages, accepted by CIKM 2024

  39. arXiv:2408.01471  [pdf, other]

    cs.CV cs.RO

    Enhancing Online Road Network Perception and Reasoning with Standard Definition Maps

    Authors: Hengyuan Zhang, David Paz, Yuliang Guo, Arun Das, Xinyu Huang, Karsten Haug, Henrik I. Christensen, Liu Ren

    Abstract: Autonomous driving for urban and highway driving applications often requires High Definition (HD) maps to generate a navigation plan. Nevertheless, various challenges arise when generating and maintaining HD maps at scale. While recent online mapping methods have started to emerge, their performance especially for longer ranges is limited by heavy occlusion in dynamic environments. With these cons…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  40. arXiv:2408.00662  [pdf, other]

    cs.CL cs.LG

    Aligning Multiple Knowledge Graphs in a Single Pass

    Authors: Yaming Yang, Zhe Wang, Ziyu Guan, Wei Zhao, Weigang Lu, Xinyan Huang

    Abstract: Entity alignment (EA) is to identify equivalent entities across different knowledge graphs (KGs), which can help fuse these KGs into a more comprehensive one. Previous EA methods mainly focus on aligning a pair of KGs, and to the best of our knowledge, no existing EA method considers aligning multiple (more than two) KGs. To fill this research gap, in this work, we study a novel problem of alignin…

    Submitted 1 August, 2024; originally announced August 2024.

  41. arXiv:2407.21781  [pdf, other]

    cs.RO

    Berkeley Humanoid: A Research Platform for Learning-based Control

    Authors: Qiayuan Liao, Bike Zhang, Xuanyu Huang, Xiaoyu Huang, Zhongyu Li, Koushil Sreenath

    Abstract: We introduce Berkeley Humanoid, a reliable and low-cost mid-scale humanoid research platform for learning-based control. Our lightweight, in-house-built robot is designed specifically for learning algorithms with low simulation complexity, anthropomorphic motion, and high reliability against falls. The robot's narrow sim-to-real gap enables agile and robust locomotion across various terrains in ou…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 12 pages, 9 figures

  42. arXiv:2407.21693  [pdf, other]

    cs.AI

    TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

    Authors: Ming Zhang, Caishuang Huang, Yilong Wu, Shichun Liu, Huiyuan Zheng, Yurui Dong, Yujiong Shen, Shihan Dou, Jun Zhao, Junjie Ye, Qi Zhang, Tao Gui, Xuanjing Huang

    Abstract: Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information collection. How to utilize TOD accurately, efficiently and effectively for information collection has always been a critical and challenging task. Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning, and can signif…

    Submitted 7 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  43. arXiv:2407.21669  [pdf, other]

    cs.CL cs.LG

    Synth-Empathy: Towards High-Quality Synthetic Empathy Data

    Authors: Hao Liang, Linzhuang Sun, Jingxuan Wei, Xijie Huang, Linkun Sun, Bihui Yu, Conghui He, Wentao Zhang

    Abstract: In recent years, with the rapid advancements in large language models (LLMs), achieving excellent empathetic response capabilities has become a crucial prerequisite. Consequently, managing and understanding empathetic datasets have gained increasing significance. However, empathetic data are typically human-labeled, leading to insufficient datasets and wasted human labor. In this work, we present…

    Submitted 10 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2407.01937

  44. arXiv:2407.20756  [pdf, other]

    cs.CV cs.CL

    SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models

    Authors: Zheng Liu, Hao Liang, Xijie Huang, Wentao Xiong, Qinhan Yu, Linzhuang Sun, Chong Chen, Conghui He, Bin Cui, Wentao Zhang

    Abstract: Recently, with the rise of web images, managing and understanding large-scale image datasets has become increasingly important. Vision Large Language Models (VLLMs) have recently emerged due to their robust vision-understanding capabilities. However, training these models requires vast amounts of data, posing challenges to efficiency, effectiveness, data quality, and privacy. In this paper, we int…

    Submitted 10 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  45. arXiv:2407.20157  [pdf, other]

    cs.AI

    rLLM: Relational Table Learning with LLMs

    Authors: Weichen Li, Xiaotong Huang, Jianwu Zheng, Zheng Wang, Chaokun Wang, Li Pan, Jianhua Li

    Abstract: We introduce rLLM (relationLLM), a PyTorch library designed for Relational Table Learning (RTL) with Large Language Models (LLMs). The core idea is to decompose state-of-the-art Graph Neural Networks, LLMs, and Table Neural Networks into standardized modules, to enable the fast construction of novel RTL-type models in a simple "combine, align, and co-train" manner. To illustrate the usage of rLLM,…

    Submitted 29 July, 2024; originally announced July 2024.

  46. arXiv:2407.20057  [pdf]

    physics.ao-ph cs.LG stat.AP

    Reconstructing Global Daily CO2 Emissions via Machine Learning

    Authors: Tao Li, Lixing Wang, Zihan Qiu, Philippe Ciais, Taochun Sun, Matthew W. Jones, Robbie M. Andrew, Glen P. Peters, Piyu ke, Xiaoting Huang, Robert B. Jackson, Zhu Liu

    Abstract: High temporal resolution CO2 emission data are crucial for understanding the drivers of emission changes, however, current emission dataset is only available on a yearly basis. Here, we extended a global daily CO2 emissions dataset backwards in time to 1970 using machine learning algorithm, which was trained to predict historical daily emissions on national scales based on relationships between da…

    Submitted 29 July, 2024; originally announced July 2024.

  47. arXiv:2407.19412  [pdf, other]

    cs.AI

    Identity-Driven Hierarchical Role-Playing Agents

    Authors: Libo Sun, Siyuan Wang, Xuanjing Huang, Zhongyu Wei

    Abstract: Utilizing large language models (LLMs) to achieve role-playing has gained great attention recently. The primary implementation methods include leveraging refined prompts and fine-tuning on role-specific datasets. However, these methods suffer from insufficient precision and limited flexibility respectively. To achieve a balance between flexibility and precision, we construct a Hierarchical Identit…

    Submitted 28 July, 2024; originally announced July 2024.

  48. arXiv:2407.18426  [pdf, other]

    physics.geo-ph cs.LG

    Diffusion-based subsurface multiphysics monitoring and forecasting

    Authors: Xinquan Huang, Fu Wang, Tariq Alkhalifah

    Abstract: Carbon capture and storage (CCS) plays a crucial role in mitigating greenhouse gas emissions, particularly from industrial outputs. Using seismic monitoring can aid in an accurate and robust monitoring system to ensure the effectiveness of CCS and mitigate associated risks. However, conventional seismic wave equation-based approaches are computationally demanding, which hinders real-time applicati…

    Submitted 4 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  49. arXiv:2407.17802  [pdf, other]

    cs.IR

    Sample Enrichment via Temporary Operations on Subsequences for Sequential Recommendation

    Authors: Shu Chen, Jinwei Luo, Weike Pan, Jiangxing Yu, Xin Huang, Zhong Ming

    Abstract: Sequential recommendation leverages interaction sequences to predict forthcoming user behaviors, crucial for crafting personalized recommendations. However, the true preferences of a user are inherently complex and high-dimensional, while the observed data is merely a simplified and low-dimensional projection of the rich preferences, which often leads to prevalent issues like data sparsity and ina…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 12 pages, 6 figures

  50. arXiv:2407.17638  [pdf]

    cs.CL

    Time Matters: Examine Temporal Effects on Biomedical Language Models

    Authors: Weisi Liu, Zhe He, Xiaolei Huang

    Abstract: Time roots in applying language models for biomedical applications: models are trained on historical data and will be deployed for new or future data, which may vary from training data. While increasing biomedical tasks have employed state-of-the-art language models, there are very few studies have examined temporal effects on biomedical models when data usually shifts across development and deplo…

    Submitted 11 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted to AMIA 2024 Annual Symposium