
Showing 1–50 of 295 results for author: Luo, L

Searching in archive cs.
  1. arXiv:2409.02601  [pdf, other]

    cs.CY

    ChatGPT vs Social Surveys: Probing the Objective and Subjective Human Society

    Authors: Muzhi Zhou, Lu Yu, Xiaomin Geng, Lan Luo

    Abstract: The extent to which Large Language Models (LLMs) can simulate the data-generating process for social surveys remains unclear. Current research has not thoroughly assessed potential biases in the sociodemographic population represented within the language model's framework. Additionally, the subjective worlds of LLMs often show inconsistencies in how closely their responses match those of groups of…

    Submitted 4 September, 2024; originally announced September 2024.

  2. arXiv:2409.02581  [pdf, other]

    cs.CV

    Object Gaussian for Monocular 6D Pose Estimation from Sparse Views

    Authors: Luqing Luo, Shichu Sun, Jiangang Yang, Linfang Zheng, Jinwei Du, Jian Liu

    Abstract: Monocular object pose estimation, as a pivotal task in computer vision and robotics, heavily depends on accurate 2D-3D correspondences, which often demand costly CAD models that may not be readily available. Object 3D reconstruction methods offer an alternative, among which recent advancements in 3D Gaussian Splatting (3DGS) afford a compelling potential. Yet its performance still suffers and tend…

    Submitted 4 September, 2024; originally announced September 2024.

  3. arXiv:2409.01686  [pdf, other]

    cs.CV

    Frequency-Spatial Entanglement Learning for Camouflaged Object Detection

    Authors: Yanguang Sun, Chunyan Xu, Jian Yang, Hanyu Xuan, Lei Luo

    Abstract: Camouflaged object detection has attracted a lot of attention in computer vision. The main challenge lies in the high degree of similarity between camouflaged objects and their surroundings in the spatial domain, making identification difficult. Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design, bu…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted at ECCV 2024

  4. arXiv:2408.14873  [pdf, other]

    cs.RO math.NA math.OC

    Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation

    Authors: Haozhe Lou, Yurong Liu, Yike Pan, Yiran Geng, Jianteng Chen, Wenlong Ma, Chenglong Li, Lin Wang, Hengzhen Feng, Lu Shi, Liyi Luo, Yongliang Shi

    Abstract: Real2Sim2Real plays a critical role in robotic arm control and reinforcement learning, yet bridging this gap remains a significant challenge due to the complex physical properties of robots and the objects they manipulate. Existing methods lack a comprehensive solution to accurately reconstruct real-world objects with spatial representations and their associated physics attributes. We propose a…

    Submitted 27 August, 2024; originally announced August 2024.

  5. arXiv:2408.12606  [pdf, other]

    cs.CV cs.AI

    Towards Non-invasive and Personalized Management of Breast Cancer Patients from Multiparametric MRI via A Large Mixture-of-Modality-Experts Model

    Authors: Luyang Luo, Mingxiang Wu, Mei Li, Yi Xin, Qiong Wang, Varut Vardhanabhuti, Winnie CW Chu, Zhenhui Li, Juan Zhou, Pranav Rajpurkar, Hao Chen

    Abstract: Breast magnetic resonance imaging (MRI) is the imaging technique with the highest sensitivity for detecting breast cancer and is routinely used for women at high risk. Despite the comprehensive multiparametric protocol of breast MRI, existing artificial intelligence-based studies predominantly rely on single sequences and have limited validation. Here we report a large mixture-of-modality-experts…

    Submitted 1 September, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 27 pages, 8 figures, 10 tables

  6. arXiv:2408.07037  [pdf, other]

    cs.CV cs.AI

    PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology

    Authors: Xiaomin Wu, Rui Xu, Pengchen Wei, Wenkang Qin, Peixiang Huang, Ziheng Li, Lin Luo

    Abstract: Pathological diagnosis remains the definitive standard for identifying tumors. The rise of multimodal large models has simplified the process of integrating image analysis with textual descriptions. Despite this advancement, the substantial costs associated with training and deploying these complex multimodal models, together with a scarcity of high-quality training datasets, create a significant…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 10 pages, 2 figures

  7. arXiv:2408.05160  [pdf, other]

    cs.LG

    Federated Hypergraph Learning with Hyperedge Completion

    Authors: Linfeng Luo, Fengxiao Tang, Xiyu Liu, Zhiqi Guo, Zihao Qiu, Ming Zhao

    Abstract: Hypergraph neural networks enhance conventional graph neural networks by capturing high-order relationships among nodes, which proves vital in data-rich environments where interactions are not merely pairwise. As data complexity and interconnectivity grow, it is common for graph-structured data to be split and stored in a distributed manner, underscoring the necessity of federated learning on subg…

    Submitted 9 August, 2024; originally announced August 2024.

  8. arXiv:2408.03867  [pdf, other]

    cs.CV

    Surgformer: Surgical Transformer with Hierarchical Temporal Attention for Surgical Phase Recognition

    Authors: Shu Yang, Luyang Luo, Qiong Wang, Hao Chen

    Abstract: Existing state-of-the-art methods for surgical phase recognition either rely on the extraction of spatial-temporal features at a short-range temporal resolution or adopt the sequential extraction of the spatial and temporal features across the entire temporal resolution. However, these methods have limitations in modeling spatial-temporal dependency and addressing spatial-temporal redundancy: 1) T…

    Submitted 7 August, 2024; originally announced August 2024.

  9. PRISM: PRogressive dependency maxImization for Scale-invariant image Matching

    Authors: Xudong Cai, Yongcai Wang, Lun Luo, Minhang Wang, Deying Li, Jintao Xu, Weihao Gu, Rui Ai

    Abstract: Image matching aims at identifying corresponding points between a pair of images. Currently, detector-free methods have shown impressive performance in challenging scenarios, thanks to their capability of generating dense matches and global receptive field. However, performing feature interaction and proposing matches across the entire image is unnecessary, because not all image regions contribute…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 15 pages, 8 figures, ACM MM 2024. Supplementary materials are included

  10. arXiv:2408.01841  [pdf, other]

    cs.RO

    BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles

    Authors: Lun Luo, Si-Yuan Cao, Xiaorui Li, Jintao Xu, Rui Ai, Zhu Yu, Xieyuanli Chen

    Abstract: This article introduces BEVPlace++, a novel, fast, and robust LiDAR global localization method for unmanned ground vehicles. It uses lightweight convolutional neural networks (CNNs) on Bird's Eye View (BEV) image-like representations of LiDAR data to achieve accurate global localization through place recognition followed by 3-DoF pose estimation. Our detailed analyses reveal an interesting fact th…

    Submitted 9 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: Under review

  11. arXiv:2408.01784  [pdf, other]

    cs.IR

    Graph Stochastic Neural Process for Inductive Few-shot Knowledge Graph Completion

    Authors: Zicheng Zhao, Linhao Luo, Shirui Pan, Chengqi Zhang, Chen Gong

    Abstract: Knowledge graphs (KGs) store enormous facts as relationships between entities. Due to the long-tailed distribution of relations and the incompleteness of KGs, there is growing interest in few-shot knowledge graph completion (FKGC). Existing FKGC methods often assume the existence of all entities in KGs, which may not be practical since new relations and entities can emerge over time. Therefore, we…

    Submitted 3 August, 2024; originally announced August 2024.

  12. arXiv:2408.01481  [pdf]

    cs.CV cs.HC cs.LG

    Using a CNN Model to Assess Visual Artwork's Creativity

    Authors: Zhehan Zhang, Meihua Qian, Li Luo, Ripon Saha, Qianyi Gao, Xinxin Song

    Abstract: Assessing artistic creativity has long challenged researchers, with traditional methods proving time-consuming. Recent studies have applied machine learning to evaluate creativity in drawings, but not paintings. Our research addresses this gap by developing a CNN model to automatically assess the creativity of human paintings. Using a dataset of six hundred paintings by professionals and children,…

    Submitted 16 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

  13. arXiv:2407.21770  [pdf, other]

    cs.AI cs.LG

    MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

    Authors: Xi Victoria Lin, Akshat Shrivastava, Liang Luo, Srinivasan Iyer, Mike Lewis, Gargi Ghosh, Luke Zettlemoyer, Armen Aghajanyan

    Abstract: We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adap…

    Submitted 12 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: v2 -> update related work section v3 -> fix spelling

  14. arXiv:2407.20455  [pdf, other]

    cs.CV

    Learning Feature-Preserving Portrait Editing from Generated Pairs

    Authors: Bowei Chen, Tiancheng Zhi, Peihao Zhu, Shen Sang, Jing Liu, Linjie Luo

    Abstract: Portrait editing is challenging for existing techniques due to difficulties in preserving subject features like identity. In this paper, we propose a training-based method leveraging auto-generated paired data to learn desired editing while ensuring the preservation of unchanged subject features. Specifically, we design a data generation process to create reasonably good training pairs for desired…

    Submitted 29 July, 2024; originally announced July 2024.

  15. arXiv:2407.18249  [pdf, other]

    cs.CV

    Trajectory-aligned Space-time Tokens for Few-shot Action Recognition

    Authors: Pulkit Kumar, Namitha Padmanabhan, Luke Luo, Sai Saketh Rambhatla, Abhinav Shrivastava

    Abstract: We propose a simple yet effective approach for few-shot action recognition, emphasizing the disentanglement of motion and appearance representations. By harnessing recent progress in tracking, specifically point trajectories and self-supervised representation learning, we build trajectory-aligned tokens (TATs) that capture motion and appearance information. This approach significantly reduces the…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  16. arXiv:2407.11442  [pdf, other]

    cs.AI cs.CY cs.HC

    EARN Fairness: Explaining, Asking, Reviewing and Negotiating Artificial Intelligence Fairness Metrics Among Stakeholders

    Authors: Lin Luo, Yuri Nakao, Mathieu Chollet, Hiroya Inakoshi, Simone Stumpf

    Abstract: Numerous fairness metrics have been proposed and employed by artificial intelligence (AI) experts to quantitatively measure bias and define fairness in AI models. Recognizing the need to accommodate stakeholders' diverse fairness understandings, efforts are underway to solicit their input. However, conveying AI fairness metrics to stakeholders without AI expertise, capturing their personal prefere…

    Submitted 16 July, 2024; originally announced July 2024.

  17. arXiv:2407.08148  [pdf, other]

    cs.CV

    SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

    Authors: Runmin Zhang, Jun Ma, Si-Yuan Cao, Lun Luo, Beinan Yu, Shu-Jie Chen, Junwei Li, Hui-Liang Shen

    Abstract: We propose a novel unsupervised cross-modal homography estimation framework based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection, namely SCPNet. The concept of intra-modal self-supervised learning is first presented to facilitate the unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent fe…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  18. arXiv:2407.05467  [pdf, other]

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba, et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

  19. arXiv:2407.03195  [pdf, other]

    math.OC cs.LG

    Incremental Gauss–Newton Methods with Superlinear Convergence Rates

    Authors: Zhiling Zhou, Zhuanghua Liu, Chengchang Liu, Luo Luo

    Abstract: This paper addresses the challenge of solving large-scale nonlinear equations with Hölder continuous Jacobians. We introduce a novel Incremental Gauss–Newton (IGN) method within explicit superlinear convergence rate, which outperforms existing methods that only achieve linear convergence rate. In particular, we formulate our problem by the nonlinear least squares with finite-sum structure, and ou…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 37 pages, 9 figures

  20. arXiv:2407.02356  [pdf, other]

    eess.IV cs.CV cs.LG

    Enable the Right to be Forgotten with Federated Client Unlearning in Medical Imaging

    Authors: Zhipeng Deng, Luyang Luo, Hao Chen

    Abstract: The right to be forgotten, as stated in most data regulations, poses an underexplored challenge in federated learning (FL), leading to the development of federated unlearning (FU). However, current FU approaches often face trade-offs between efficiency, model performance, forgetting efficacy, and privacy preservation. In this paper, we delve into the paradigm of Federated Client Unlearning (FCU) t…

    Submitted 2 July, 2024; originally announced July 2024.

  21. arXiv:2406.19611  [pdf, other]

    q-bio.QM cs.AI

    Multimodal Data Integration for Precision Oncology: Challenges and Future Directions

    Authors: Huajun Zhou, Fengtao Zhou, Chenyu Zhao, Yingxue Xu, Luyang Luo, Hao Chen

    Abstract: The essence of precision oncology lies in its commitment to tailor targeted treatments and care measures to each patient based on the individual characteristics of the tumor. The inherent heterogeneity of tumors necessitates gathering information from diverse data sources to provide valuable insights from various perspectives, fostering a holistic comprehension of the tumor. Over the past decade,…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 15 pages, 4 figures

  22. arXiv:2406.15859  [pdf, other]

    cs.IR cs.AI

    LLM-Powered Explanations: Unraveling Recommendations Through Subgraph Reasoning

    Authors: Guangsi Shi, Xiaofeng Deng, Linhao Luo, Lijuan Xia, Lei Bao, Bei Ye, Fei Du, Shirui Pan, Yuxiao Li

    Abstract: Recommender systems are pivotal in enhancing user experiences across various web applications by analyzing the complicated relationships between users and items. Knowledge graphs (KGs) have been widely used to enhance the performance of recommender systems. However, KGs are known to be noisy and incomplete, which are hard to provide reliable explanations for recommendation results. An explainable r…

    Submitted 29 June, 2024; v1 submitted 22 June, 2024; originally announced June 2024.

  23. arXiv:2406.11802  [pdf, other]

    cs.CV

    PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models

    Authors: Fanqing Meng, Wenqi Shao, Lixin Luo, Yahong Wang, Yiran Chen, Quanfeng Lu, Yue Yang, Tianshuo Yang, Kaipeng Zhang, Yu Qiao, Ping Luo

    Abstract: Text-to-image (T2I) models have made substantial progress in generating images from textual prompts. However, they frequently fail to produce images consistent with physical commonsense, a vital capability for applications in world simulation and everyday tasks. Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal know…

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.08305  [pdf, other]

    cs.NI eess.SP

    Large Language Model(LLM) assisted End-to-End Network Health Management based on Multi-Scale Semanticization

    Authors: Fengxiao Tang, Xiaonan Wang, Xun Yuan, Linfeng Luo, Ming Zhao, Nei Kato

    Abstract: Network device and system health management is the foundation of modern network operations and maintenance. Traditional health management methods, relying on expert identification or simple rule-based algorithms, struggle to cope with the dynamic heterogeneous networks (DHNs) environment. Moreover, current state-of-the-art distributed anomaly detection methods, which utilize specific machine learn…

    Submitted 12 June, 2024; originally announced June 2024.

  25. arXiv:2406.06592  [pdf, other]

    cs.CL cs.LG

    Improve Mathematical Reasoning in Language Models by Automated Process Supervision

    Authors: Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

    Abstract: Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a leng…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 18 pages, 5 figures, 1 table

  26. arXiv:2405.18284  [pdf, other]

    stat.ML cs.LG

    Adaptive debiased SGD in high-dimensional GLMs with streaming data

    Authors: Ruijian Han, Lan Luo, Yuanhang Luo, Yuanyuan Lin, Jian Huang

    Abstract: Online statistical inference facilitates real-time analysis of sequentially collected data, making it different from traditional methods that rely on static datasets. This paper introduces a novel approach to online inference in high-dimensional generalized linear models, where we update regression coefficient estimates and their standard errors upon each new data arrival. In contrast to existing…

    Submitted 1 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: 37 pages, 4 figures

  27. arXiv:2405.16178  [pdf, other]

    cs.CL

    Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

    Authors: Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, Bang Liu, Jindong Chen

    Abstract: Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse R…

    Submitted 25 May, 2024; originally announced May 2024.

  28. arXiv:2405.16126  [pdf, other]

    math.OC cs.LG

    Near-Optimal Distributed Minimax Optimization under the Second-Order Similarity

    Authors: Qihao Zhou, Haishan Ye, Luo Luo

    Abstract: This paper considers the distributed convex-concave minimax optimization under the second-order similarity. We propose stochastic variance-reduced optimistic gradient sliding (SVOGS) method, which takes the advantage of the finite-sum structure in the objective by involving the mini-batch client sampling and variance reduction. We prove SVOGS can achieve the $\varepsilon$-duality gap within commun…

    Submitted 25 May, 2024; originally announced May 2024.

  29. arXiv:2405.14608  [pdf, other]

    cs.LG cs.AI

    ShapeFormer: Shapelet Transformer for Multivariate Time Series Classification

    Authors: Xuan-May Le, Ling Luo, Uwe Aickelin, Minh-Tuan Tran

    Abstract: Multivariate time series classification (MTSC) has attracted significant research attention due to its diverse real-world applications. Recently, exploiting transformers for MTSC has achieved state-of-the-art performance. However, existing methods focus on generic features, providing a comprehensive understanding of data, but they ignore class-specific features crucial for learning the representat…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: Accepted at KDD 2024

  30. arXiv:2405.14206  [pdf, other]

    cs.CV

    LG-VQ: Language-Guided Codebook Learning

    Authors: Guotao Liang, Baoquan Zhang, Yaowei Wang, Xutao Li, Yunming Ye, Huaibin Wang, Chuyao Luo, Kola Ye, linfeng Luo

    Abstract: Vector quantization (VQ) is a key technique in high-resolution and high-fidelity image synthesis, which aims to learn a codebook to encode an image with a sequence of discrete codes and then generate an image in an auto-regression manner. Although existing methods have shown superior performance, most methods prefer to learn a single-modal codebook (e.g., image), resulting in suboptimal per…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: None

  31. arXiv:2405.14170  [pdf, other]

    cs.AI cs.CL

    Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning

    Authors: Jiapu Wang, Kai Sun, Linhao Luo, Wei Wei, Yongli Hu, Alan Wee-Chung Liew, Shirui Pan, Baocai Yin

    Abstract: Temporal Knowledge Graph Reasoning (TKGR) is the process of utilizing temporal information to capture complex relations within a Temporal Knowledge Graph (TKG) to infer new knowledge. Conventional methods in TKGR typically depend on deep learning algorithms or temporal logical rules. However, deep learning-based TKGRs often lack interpretability, whereas rule-based TKGRs struggle to effectively le…

    Submitted 23 May, 2024; originally announced May 2024.

  32. arXiv:2405.13675  [pdf, other]

    cs.CV

    Context and Geometry Aware Voxel Transformer for Semantic Scene Completion

    Authors: Zhu Yu, Runming Zhang, Jiacheng Ying, Junchen Yu, Xiaohai Hu, Lun Luo, Siyuan Cao, Huiliang Shen

    Abstract: Vision-based Semantic Scene Completion (SSC) has gained much attention due to its widespread applications in various 3D perception tasks. Existing sparse-to-dense approaches typically employ shared context-independent queries across various input images, which fails to capture distinctions among them as the focal regions of different inputs vary and may result in undirected feature aggregation of…

    Submitted 22 May, 2024; originally announced May 2024.

  33. arXiv:2405.08816  [pdf, other]

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c…

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  34. arXiv:2405.02608  [pdf, other]

    cs.CV cs.AI cs.RO

    UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model

    Authors: Shuai Yuan, Lei Luo, Zhuo Hui, Can Pu, Xiaoyu Xiang, Rakesh Ranjan, Denis Demandolx

    Abstract: Traditional unsupervised optical flow methods are vulnerable to occlusions and motion boundaries due to lack of object-level information. Therefore, we propose UnSAMFlow, an unsupervised flow network that also leverages object information from the latest foundation model Segment Anything Model (SAM). We first include a self-supervised semantic augmentation module tailored to SAM masks. We also ana…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024. Code is available at https://github.com/facebookresearch/UnSAMFlow

  35. arXiv:2404.19025  [pdf]

    cs.SE cs.CR

    Unsupervised Binary Code Translation with Application to Code Similarity Detection and Vulnerability Discovery

    Authors: Iftakhar Ahmad, Lannan Luo

    Abstract: Binary code analysis has immense importance in the research domain of software security. Today, software is very often compiled for various Instruction Set Architectures (ISAs). As a result, cross-architecture binary code analysis has become an emerging problem. Recently, deep learning-based binary analysis has shown promising success. It is widely known that training a deep learning model require…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: conference

    Journal ref: The 2023 Conference on Empirical Methods in Natural Language Processing. 2023

  36. arXiv:2404.15580  [pdf, other]

    cs.CV

    MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis

    Authors: Jiaxin Zhuang, Linshan Wu, Qiong Wang, Varut Vardhanabhuti, Lin Luo, Hao Chen

    Abstract: The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Mask AutoEncoder (MAE) for feature pre-training can further unleash the potential of ViT on various medical vision tasks. However, due to large spatial sizes with much higher dimensions of 3D medical images, the lack of hierarchical design for MAE may hinder the per…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: submitted to journal

  37. arXiv:2404.14209  [pdf]

    cs.CL

    EnzChemRED, a rich enzyme chemistry relation extraction dataset

    Authors: Po-Ting Lai, Elisabeth Coudert, Lucila Aimo, Kristian Axelsen, Lionel Breuza, Edouard de Castro, Marc Feuermann, Anne Morgat, Lucille Pourcel, Ivo Pedruzzi, Sylvain Poux, Nicole Redaschi, Catherine Rivoire, Anastasia Sveshnikova, Chih-Hsuan Wei, Robert Leaman, Ling Luo, Zhiyong Lu, Alan Bridge

    Abstract: Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) metho…

    Submitted 22 April, 2024; originally announced April 2024.

  38. arXiv:2404.13692  [pdf, other]

    cs.CV

    A sustainable development perspective on urban-scale roof greening priorities and benefits

    Authors: Jie Shao, Wei Yao, Lei Luo, Linzhou Zeng, Zhiyi He, Puzuo Wang, Huadong Guo

    Abstract: Greenspaces are tightly linked to human well-being. Yet, rapid urbanization has exacerbated greenspace exposure inequality and declining human life quality. Roof greening has been recognized as an effective strategy to mitigate these negative impacts. Understanding priorities and benefits is crucial to promoting green roofs. Here, using geospatial big data, we conduct an urban-scale assessment of…

    Submitted 21 April, 2024; originally announced April 2024.

  39. Accelerating Geo-distributed Machine Learning with Network-Aware Adaptive Tree and Auxiliary Route

    Authors: Zonghang Li, Wenjiao Feng, Weibo Cai, Hongfang Yu, Long Luo, Gang Sun, Hongyang Du, Dusit Niyato

    Abstract: Distributed machine learning is becoming increasingly popular for geo-distributed data analytics, facilitating the collaborative analysis of data scattered across data centers in different regions. This paradigm eliminates the need for centralizing sensitive raw data in one location but faces the significant challenge of high parameter synchronization delays, which stems from the constraints of ba…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 20 figures

    MSC Class: 68T99

    ACM Class: I.2.11; C.2.4

  40. arXiv:2404.03149  [pdf, other]

    cs.RO

    Design and Evaluation of a Compact 3D End-effector Assistive Robot for Adaptive Arm Support

    Authors: Sibo Yang, Lincong Luo, Wei Chuan Law, Youlong Wang, Lei Li, Wei Tech Ang

    Abstract: We developed a 3D end-effector type of upper limb assistive robot, named as Assistive Robotic Arm Extender (ARAE), that provides transparency movement and adaptive arm support control to achieve home-based therapy and training in the real environment. The proposed system composes five degrees of freedom, including three active motors and two passive joints at the end-effector module. The core stru…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 11 pages

  41. arXiv:2404.01693  [pdf, other]

    cs.LG

    HeMeNet: Heterogeneous Multichannel Equivariant Network for Protein Multitask Learning

    Authors: Rong Han, Wenbing Huang, Lingxiao Luo, Xinyan Han, Jiaming Shen, Zhiqiang Zhang, Jun Zhou, Ting Chen

    Abstract: Understanding and leveraging the 3D structures of proteins is central to a variety of biological and drug discovery tasks. While deep learning has been applied successfully for structure-based protein function prediction tasks, current methods usually employ distinct training for each task. However, each of the tasks is of small size, and such a single-task strategy hinders the models' performance…

    Submitted 2 April, 2024; originally announced April 2024.

  42. arXiv:2404.00729  [pdf, other]

    eess.SY cs.LG

    Nonparametric End-to-End Probabilistic Forecasting of Distributed Generation Outputs Considering Missing Data Imputation

    Authors: Minghui Chen, Zichao Meng, Yanping Liu, Longbo Luo, Ye Guo, Kang Wang

    Abstract: In this paper, we introduce a nonparametric end-to-end method for probabilistic forecasting of distributed renewable generation outputs while including missing data imputation. Firstly, we employ a nonparametric probabilistic forecast model utilizing the long short-term memory (LSTM) network to model the probability distributions of distributed renewable generations' outputs. Secondly, we design a…

    Submitted 31 March, 2024; originally announced April 2024.

  43. arXiv:2403.19919  [pdf, other]

    cs.CV

    Diff-Reg v1: Diffusion Matching Model for Registration Problem

    Authors: Qianliang Wu, Haobo Jiang, Lei Luo, Jun Li, Yaqing Ding, Jin Xie, Jian Yang

    Abstract: Establishing reliable correspondences is essential for registration tasks such as 3D and 2D3D registration. Existing methods commonly leverage geometric or semantic point features to generate potential correspondences. However, these features may face challenges such as large deformation, scale inconsistency, and ambiguous matching problems (e.g., symmetry). Additionally, many previous methods, wh…

    Submitted 24 July, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.00436

  44. arXiv:2403.18762  [pdf, other]

    cs.CV cs.AI cs.RO

    ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

    Authors: Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen

    Abstract: Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition that retrieves images from a point-cloud database remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 8 pages, 11 figures, conference

  45. arXiv:2403.16386  [pdf, other]

    cs.CV cs.AI

    Dia-LLaMA: Towards Large Language Model-driven CT Report Generation

    Authors: Zhixuan Chen, Luyang Luo, Yequan Bie, Hao Chen

    Abstract: Medical report generation has achieved remarkable advancements yet still faces several challenges. First, the inherent imbalance in the distribution of normal and abnormal cases may lead models to exhibit a biased focus on normal samples, resulting in unreliable diagnoses. Second, the frequent occurrence of common template sentences in the reports may overwhelm the critical abnormal…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 10 pages

  46. arXiv:2403.15931  [pdf, other]

    cs.CV cs.AI

    X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention

    Authors: You Xie, Hongyi Xu, Guoxian Song, Chao Wang, Yichun Shi, Linjie Luo

    Abstract: We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation. Specifically, given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions along with wide-range head movements. As its core, we leverage the gen…

    Submitted 25 July, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: SIGGRAPH 2024

  47. arXiv:2403.13512  [pdf, other]

    cs.CV cs.AI

    Scale Decoupled Distillation

    Authors: Shicai Wei, Chunbo Luo, Yang Luo

    Abstract: Logit knowledge distillation has attracted increasing attention in recent studies due to its practicality. However, it often suffers from inferior performance compared to feature knowledge distillation. In this paper, we argue that existing logit-based methods may be sub-optimal since they only leverage the global logit output that couples multiple semantic knowledge. This may transfer ambiguous knowled…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024; 10 pages, 6 figures

  48. arXiv:2403.12451  [pdf, other]

    cs.AI

    End-to-End Neuro-Symbolic Reinforcement Learning with Textual Explanations

    Authors: Lirui Luo, Guoxi Zhang, Hongming Xu, Yaodong Yang, Cong Fang, Qing Li

    Abstract: Neuro-symbolic reinforcement learning (NS-RL) has emerged as a promising paradigm for explainable decision-making, characterized by the interpretability of symbolic policies. NS-RL entails structured state representations for tasks with visual observations, but previous methods cannot refine the structured states with rewards due to a lack of efficiency. Accessibility also remains an issue, as ext…

    Submitted 13 June, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: ICML 2024. Project page: https://ins-rl.github.io/

  49. arXiv:2403.10758  [pdf]

    cs.CL

    Rules still work for Open Information Extraction

    Authors: Jialin Hua, Liangqing Luo, Weiying Ping, Yan Liao, Chunhai Tao, Xuewen Lub

    Abstract: Open information extraction (OIE) aims to extract surface relations and their corresponding arguments from natural language text, irrespective of domain. This paper presents an innovative OIE model, APRCOIE, tailored for Chinese text. Diverging from previous models, our model generates extraction patterns autonomously. The model defines a new pattern form for Chinese OIE and proposes an automated…

    Submitted 15 March, 2024; originally announced March 2024.

  50. arXiv:2403.09410  [pdf, other]

    cs.CV cs.AI

    XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization

    Authors: Yequan Bie, Luyang Luo, Zhixuan Chen, Hao Chen

    Abstract: Utilizing potent representations of the large vision-language models (VLMs) to accomplish various downstream tasks has attracted increasing attention. Within this research field, soft prompt learning has become a representative approach for efficiently adapting VLMs, such as CLIP, to tasks like image classification. However, most existing prompt learning methods learn text tokens that are unexplain…

    Submitted 14 March, 2024; originally announced March 2024.