Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 422 results for author: Huang, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.13498  [pdf, other

    cs.LG

    Rethinking State Disentanglement in Causal Reinforcement Learning

    Authors: Haiyao Cao, Zhen Zhang, Panpan Cai, Yuhang Liu, Jinan Zou, Ehsan Abbasnejad, Biwei Huang, Mingming Gong, Anton van den Hengel, Javen Qinfeng Shi

    Abstract: One of the significant challenges in reinforcement learning (RL) when dealing with noise is estimating latent states from observations. Causality provides rigorous theoretical support for ensuring that the underlying states can be uniquely recovered through identifiability. Consequently, some existing work focuses on establishing identifiability from a causal perspective to aid in the design of al… ▽ More

    Submitted 24 August, 2024; originally announced August 2024.

  2. arXiv:2408.13316  [pdf, other

    quant-ph cs.ET

    QuCLEAR: Clifford Extraction and Absorption for Significant Reduction in Quantum Circuit Size

    Authors: Ji Liu, Alvin Gonzales, Benchen Huang, Zain Hamid Saleem, Paul Hovland

    Abstract: Quantum computing carries significant potential for addressing practical problems. However, currently available quantum devices suffer from noisy quantum gates, which degrade the fidelity of executed quantum circuits. Therefore, quantum circuit optimization is crucial for obtaining useful results. In this paper, we present QuCLEAR, a compilation framework designed to optimize quantum circuits. QuC… ▽ More

    Submitted 23 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures, 2 tables

  3. arXiv:2408.13126  [pdf, other

    cs.CV

    CathAction: A Benchmark for Endovascular Intervention Understanding

    Authors: Baoru Huang, Tuan Vo, Chayun Kongtongvattana, Giulio Dagnino, Dennis Kundrat, Wenqiang Chi, Mohamed Abdelaziz, Trevor Kwok, Tudor Jianu, Tuong Do, Hieu Le, Minh Nguyen, Hoan Nguyen, Erman Tjiputra, Quang Tran, Jianyang Xie, Yanda Meng, Binod Bhattarai, Zhaorui Tan, Hongbin Liu, Hong Seng Gan, Wei Wang, Xi Yang, Qiufeng Wang, Jionglong Su , et al. (13 additional authors not shown)

    Abstract: Real-time visual feedback from catheterization analysis is crucial for enhancing surgical safety and efficiency during endovascular interventions. However, existing datasets are often limited to specific tasks, small scale, and lack the comprehensive annotations necessary for broader endovascular intervention understanding. To tackle these limitations, we introduce CathAction, a large-scale datase… ▽ More

    Submitted 30 August, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: 10 pages. Webpage: https://airvlab.github.io/cathaction/

  4. arXiv:2408.10821  [pdf, other

    cs.CV

    Constructing a High Temporal Resolution Global Lakes Dataset via Swin-Unet with Applications to Area Prediction

    Authors: Yutian Han, Baoxiang Huang, He Gao

    Abstract: Lakes provide a wide range of valuable ecosystem services, such as water supply, biodiversity habitats, and carbon sequestration. However, lakes are increasingly threatened by climate change and human activities. Therefore, continuous global monitoring of lake dynamics is crucial, but remains challenging on a large scale. The recently developed Global Lakes Area Database (GLAKES) has mapped over 3… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.10657  [pdf, other

    cs.CR cs.AI

    ETGuard: Malicious Encrypted Traffic Detection in Blockchain-based Power Grid Systems

    Authors: Peng Zhou, Yongdong Liu, Lixun Ma, Weiye Zhang, Haohan Tan, Zhenguang Liu, Butian Huang

    Abstract: The escalating prevalence of encryption protocols has led to a concomitant surge in the number of malicious attacks that hide in encrypted traffic. Power grid systems, as fundamental infrastructure, are becoming prime targets for such attacks. Conventional methods for detecting malicious encrypted packets typically use a static pre-trained model. We observe that these methods are not well-suited f… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.08946  [pdf, other

    cs.CY

    Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges

    Authors: Baixiang Huang, Canyu Chen, Kai Shu

    Abstract: Accurate attribution of authorship is crucial for maintaining the integrity of digital content, improving forensic investigations, and mitigating the risks of misinformation and plagiarism. Addressing the imperative need for proper authorship attribution is essential to uphold the credibility and accountability of authentic authorship. The rapid advancements of Large Language Models (LLMs) have bl… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 12 pages for the main paper. More resources and a curated list of papers are available and regularly updated at https://llm-authorship.github.io

  7. arXiv:2408.08815  [pdf, other

    cs.LG

    An Empirical Examination of Balancing Strategy for Counterfactual Estimation on Time Series

    Authors: Qiang Huang, Chuizheng Meng, Defu Cao, Biwei Huang, Yi Chang, Yan Liu

    Abstract: Counterfactual estimation from observations represents a critical endeavor in numerous application fields, such as healthcare and finance, with the primary challenge being the mitigation of treatment bias. The balancing strategy aimed at reducing covariate disparities between different treatment groups serves as a universal solution. However, when it comes to the time series data, the effectivenes… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: ICML 2024 Carema Ready Version. 20 Pages, 12 Figures, 10 Tables

  8. arXiv:2408.07605  [pdf, other

    cs.CV

    Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving

    Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

    Abstract: The field of autonomous driving increasingly demands high-quality annotated video training data. In this paper, we propose Panacea+, a powerful and universally applicable framework for generating video data in driving scenes. Built upon the foundation of our previous work, Panacea, Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Project page: https://panacea-ad.github.io/. arXiv admin note: text overlap with arXiv:2311.16813

  9. arXiv:2408.07471  [pdf, other

    cs.CL

    Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

    Authors: Yuxin Jiang, Bo Huang, Yufei Wang, Xingshan Zeng, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Wei Wang

    Abstract: Direct preference optimization (DPO), a widely adopted offline preference optimization algorithm, aims to align large language models (LLMs) with human-desired behaviors using pairwise preference data. However, the winning response and the losing response within pairwise data are generated isolatedly, leading to weak correlations between them as well as suboptimal alignment performance. To address… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 18 pages, 8 figures, 8 tables, working in progress

  10. GeoFormer: Learning Point Cloud Completion with Tri-Plane Integrated Transformer

    Authors: Jinpeng Yu, Binbin Huang, Yuxuan Zhang, Huaxia Li, Xu Tang, Shenghua Gao

    Abstract: Point cloud completion aims to recover accurate global geometry and preserve fine-grained local details from partial point clouds. Conventional methods typically predict unseen points directly from 3D point cloud coordinates or use self-projected multi-view depth maps to ease this task. However, these gray-scale depth maps cannot reach multi-view consistency, consequently restricting the performan… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: accepted by the 32nd ACM International Conference on Multimedia (MM'24)

  11. arXiv:2408.05457  [pdf, other

    cs.CL cs.AI

    Investigating Instruction Tuning Large Language Models on Graphs

    Authors: Kerui Zhu, Bo-Wei Huang, Bowen Jin, Yizhu Jiao, Ming Zhong, Kevin Chang, Shou-De Lin, Jiawei Han

    Abstract: Inspired by the recent advancements of Large Language Models (LLMs) in NLP tasks, there's growing interest in applying LLMs to graph-related tasks. This study delves into the capabilities of instruction-following LLMs for engaging with real-world graphs, aiming to offer empirical insights into how LLMs can effectively interact with graphs and generalize across graph tasks. We begin by constructing… ▽ More

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  12. arXiv:2408.00114  [pdf, other

    cs.AI

    Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs

    Authors: Kewei Cheng, Jingfeng Yang, Haoming Jiang, Zhengyang Wang, Binxuan Huang, Ruirui Li, Shiyang Li, Zheng Li, Yifan Gao, Xian Li, Bing Yin, Yizhou Sun

    Abstract: Reasoning encompasses two typical types: deductive reasoning and inductive reasoning. Despite extensive research into the reasoning capabilities of Large Language Models (LLMs), most studies have failed to rigorously differentiate between inductive and deductive reasoning, leading to a blending of the two. This raises an essential question: In LLM reasoning, which poses a greater challenge - deduc… ▽ More

    Submitted 6 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  13. arXiv:2407.21783  [pdf, other

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang , et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical… ▽ More

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  14. arXiv:2407.20651   

    cs.LG

    Towards Generalizable Reinforcement Learning via Causality-Guided Self-Adaptive Representations

    Authors: Yupei Yang, Biwei Huang, Fan Feng, Xinyue Wang, Shikui Tu, Lei Xu

    Abstract: General intelligence requires quick adaption across tasks. While existing reinforcement learning (RL) methods have made progress in generalization, they typically assume only distribution changes between source and target domains. In this paper, we explore a wider range of scenarios where both the distribution and environment spaces may change. For example, in Atari games, we train agents to gener… ▽ More

    Submitted 31 July, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: This paper was submitted to NeurIPS24. According to the reviews, there are some mistakes in the Theorems in this papers. Moreover, we will choose some other environments for experiments, which means that it takes at least months to update/rewrite the Experiment & Appendix Sections. So we need to withdraw this paper for major revision

  15. arXiv:2407.20521  [pdf, ps, other

    math.DS cs.SC

    Integrability and Linearizability of a Family of Three-Dimensional Polynomial Systems

    Authors: Bo Huang, Ivan Mastev, Valery Romanovski

    Abstract: We investigate the local integrability and linearizability of a family of three-dimensional polynomial systems with the matrix of the linear approximation having the eigenvalues $1, ζ, ζ^2 $, where $ζ$ is a primitive cubic root of unity. We establish a criterion for the convergence of the Poincaré--Dulac normal form of the systems and examine the relationship between the normal form and integrabil… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  16. arXiv:2407.20506  [pdf, other

    cs.LG cs.AI

    Boosting Efficiency in Task-Agnostic Exploration through Causal Knowledge

    Authors: Yupei Yang, Biwei Huang, Shikui Tu, Lei Xu

    Abstract: The effectiveness of model training heavily relies on the quality of available training resources. However, budget constraints often impose limitations on data collection efforts. To tackle this challenge, we introduce causal exploration in this paper, a strategy that leverages the underlying causal knowledge for both data collection and model training. We, in particular, focus on enhancing the sa… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: This paper was accepted by IJCAI'24

  17. arXiv:2407.20224  [pdf, other

    cs.CL

    Can Editing LLMs Inject Harm?

    Authors: Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu

    Abstract: Knowledge editing has been increasingly adopted to correct the false or outdated knowledge in Large Language Models (LLMs). Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation wi… ▽ More

    Submitted 16 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. 9 pages for main paper, 36 pages including appendix. The code, results, dataset for this paper and more resources are on the project website: https://llm-editing.github.io

  18. arXiv:2407.19877  [pdf, other

    cs.RO cs.CV

    Language-driven Grasp Detection with Mask-guided Attention

    Authors: Tuan Van Vo, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

    Abstract: Grasp detection is an essential task in robotics with various industrial applications. However, traditional methods often struggle with occlusions and do not utilize language for grasping. Incorporating natural language into grasp detection remains a challenging task and largely unexplored. To address this gap, we propose a new method for language-driven grasp detection with mask-guided attention… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted at IROS 2024

  19. arXiv:2407.19102  [pdf, ps, other

    cs.CC

    The Computational Complexity of Factored Graphs

    Authors: Shreya Gupta, Boyang Huang, Russell Impagliazzo, Stanley Woo, Christopher Ye

    Abstract: Computational complexity is traditionally measured with respect to input size. For graphs, this is typically the number of vertices (or edges) of the graph. However, for large graphs even explicitly representing the graph could be prohibitively expensive. Instead, graphs with enough structure could admit more succinct representations. A number of previous works have considered various succinct rep… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

  20. Effect of Duration and Delay on the Identifiability of VR Motion

    Authors: Mark Roman Miller, Vivek Nair, Eugy Han, Cyan DeVeaux, Christian Rack, Rui Wang, Brandon Huang, Marc Erich Latoschik, James F. O'Brien, Jeremy N. Bailenson

    Abstract: Social virtual reality is an emerging medium of communication. In this medium, a user's avatar (virtual representation) is controlled by the tracked motion of the user's headset and hand controllers. This tracked motion is a rich data stream that can leak characteristics of the user or can be effectively matched to previously-identified data to identify a user. To better understand the boundaries… ▽ More

    Submitted 26 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: 6 pages, 2 figures, presented at the SePAR workshop (Security and Privacy in Mixed, Augmented, and Virtual Realities), co-located with WoWMoM 2024. arXiv admin note: text overlap with arXiv:2303.01430

  21. Effect of Data Degradation on Motion Re-Identification

    Authors: Vivek Nair, Mark Roman Miller, Rui Wang, Brandon Huang, Christian Rack, Marc Erich Latoschik, James F. O'Brien

    Abstract: The use of virtual and augmented reality devices is increasing, but these sensor-rich devices pose risks to privacy. The ability to track a user's motion and infer the identity or characteristics of the user poses a privacy risk that has received significant attention. Existing deep-network-based defenses against this risk, however, require significant amounts of training data and have not yet bee… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 6 pages, 4 figures, presented at the SePAR (Security and Privacy in Mixed, Virtual, and Augmented Realities) workshop, co-located with WoWMoM 2024 in Perth, Australia

  22. arXiv:2407.17967  [pdf, other

    cs.RO cs.CV

    Lightweight Language-driven Grasp Detection using Conditional Consistency Model

    Authors: Nghia Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

    Abstract: Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. In this work, we present a new approach for language-driven grasp detection that leverages the concept of lightweight diffusion models to achieve fast inference time. By integrating diffusion processes with grasping prompts in natural language, our method can effectively encode v… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted at IROS 2024

  23. arXiv:2407.16975  [pdf, other

    cs.LG stat.ME

    On the Parameter Identifiability of Partially Observed Linear Causal Models

    Authors: Xinshuai Dong, Ignavier Ng, Biwei Huang, Yuewen Sun, Songyao Jin, Roberto Legaspi, Peter Spirtes, Kun Zhang

    Abstract: Linear causal models are important tools for modeling causal dependencies and yet in practice, only a subset of the variables can be observed. In this paper, we examine the parameter identifiability of these models by investigating whether the edge coefficients can be recovered given the causal structure and partially observed data. Our setting is more general than that of prior research - we allo… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  24. arXiv:2407.16131  [pdf, other

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    Crystals with Transformers on Graphs, for Prediction of Unconventional Crystal Material Properties and the Benchmark

    Authors: Hongyi Wang, Ji Sun, Jinzhe Liang, Li Zhai, Zitian Tang, Zijian Li, Wei Zhai, Xusheng Wang, Weihao Gao, Sheng Gong, Bolong Huang, Hua Zhang

    Abstract: The ionic bonding across the lattice and ordered microscopic structures endow crystals with unique symmetry and determine their macroscopic properties. Unconventional crystals, in particular, exhibit non-traditional lattice structures or possess exotic physical properties, making them intriguing subjects for investigation. Therefore, to accurately predict the physical and chemical properties of cr… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  25. arXiv:2407.15212  [pdf, other

    cs.CV cs.GR

    Surfel-based Gaussian Inverse Rendering for Fast and Relightable Dynamic Human Reconstruction from Monocular Video

    Authors: Yiqun Zhao, Chenming Wu, Binbin Huang, Yihao Zhi, Chen Zhao, Jingdong Wang, Shenghua Gao

    Abstract: Efficient and accurate reconstruction of a relightable, dynamic clothed human avatar from a monocular video is crucial for the entertainment industry. This paper introduces the Surfel-based Gaussian Inverse Avatar (SGIA) method, which introduces efficient training and rendering for relightable dynamic human reconstruction. SGIA advances previous Gaussian Avatar methods by comprehensively modeling… ▽ More

    Submitted 23 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: Under Review; Project Page: https://GS-IA.github.io

  26. arXiv:2407.13842  [pdf, other

    cs.RO cs.CV

    Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance

    Authors: Toan Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Quan Vuong, Ngan Le, Thieu Vo, Anh Nguyen

    Abstract: 6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering effective collaboration between robots and users in complex 3D environments. In this paper, we present a new approach for language-driven 6-DoF grasp detection i… ▽ More

    Submitted 25 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  27. arXiv:2407.12470  [pdf, other

    cs.CL

    Continual Learning for Temporal-Sensitive Question Answering

    Authors: Wanqi Yang, Yunqiu Xu, Yanda Li, Kunze Wang, Binbin Huang, Ling Chen

    Abstract: In this study, we explore an emerging research area of Continual Learning for Temporal Sensitive Question Answering (CLTSQA). Previous research has primarily focused on Temporal Sensitive Question Answering (TSQA), often overlooking the unpredictable nature of future events. In real-world applications, it's crucial for models to continually acquire knowledge over time, rather than relying on a sta… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCNN 2024

  28. arXiv:2407.12105  [pdf, other

    cs.RO cs.HC

    AeroHaptix: A Wearable Vibrotactile Feedback System for Enhancing Collision Avoidance in UAV Teleoperation

    Authors: Bingjian Huang, Zhecheng Wang, Qilong Cheng, Siyi Ren, Hanfeng Cai, Antonio Alvarez Valdivia, Karthik Mahadevan, Daniel Wigdor

    Abstract: Haptic feedback enhances collision avoidance by providing directional obstacle information to operators in unmanned aerial vehicle (UAV) teleoperation. However, such feedback is often rendered via haptic joysticks, which are unfamiliar to UAV operators and limited to single-directional force feedback. Additionally, the direct coupling of the input device and the feedback method diminishes the oper… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  29. arXiv:2407.10545  [pdf, other

    cs.LG cs.AI cs.CV

    Efficient Continual Learning with Low Memory Footprint For Edge Device

    Authors: Zeqing Wang, Fei Cheng, Kangye Ji, Bohu Huang

    Abstract: Continual learning(CL) is a useful technique to acquire dynamic knowledge continually. Although powerful cloud platforms can fully exert the ability of CL,e.g., customized recommendation systems, similar personalized requirements for edge devices are almost disregarded. This phenomenon stems from the huge resource overhead involved in training neural networks and overcoming the forgetting problem… ▽ More

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  30. arXiv:2407.10132  [pdf, other

    cs.LG stat.ME

    Optimal Kernel Choice for Score Function-based Causal Discovery

    Authors: Wenjie Wang, Biwei Huang, Feng Liu, Xinge You, Tongliang Liu, Kun Zhang, Mingming Gong

    Abstract: Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appr… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML2024

  31. arXiv:2407.08942  [pdf

    cs.IR cs.AI

    A Neural Matrix Decomposition Recommender System Model based on the Multimodal Large Language Model

    Authors: Ao Xiang, Bingjie Huang, Xinyu Guo, Haowei Yang, Tianyao Zheng

    Abstract: Recommendation systems have become an important solution to information search problems. This article proposes a neural matrix factorization recommendation system model based on the multimodal large language model called BoNMF. This model combines BoBERTa's powerful capabilities in natural language processing, ViT in computer in vision, and neural matrix decomposition technology. By capturing the… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

  32. arXiv:2407.01050  [pdf, other

    cs.RO cs.AI

    Evolutionary Morphology Towards Overconstrained Locomotion via Large-Scale, Multi-Terrain Deep Reinforcement Learning

    Authors: Yenan Chen, Chuye Zhang, Pengxi Gu, Jianuo Qiu, Jiayi Yin, Nuofan Qiu, Guojing Huang, Bangchao Huang, Zishang Zhang, Hui Deng, Wei Zhang, Fang Wan, Chaoyang Song

    Abstract: While the animals' Fin-to-Limb evolution has been well-researched in biology, such morphological transformation remains under-adopted in the modern design of advanced robotic limbs. This paper investigates a novel class of overconstrained locomotion from a design and learning perspective inspired by evolutionary morphology, aiming to integrate the concept of `intelligent design under constraints'… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 13 pages, 5 figures, Accepted and Presented at ReMAR2024

  33. arXiv:2407.00944  [pdf

    cs.CV

    Diffusion Transformer Model With Compact Prior for Low-dose PET Reconstruction

    Authors: Bin Huang, Xubiao Liu, Lei Fang, Qiegen Liu, Bingxuan Li

    Abstract: Positron emission tomography (PET) is an advanced medical imaging technique that plays a crucial role in non-invasive clinical diagnosis. However, while reducing radiation exposure through low-dose PET scans is beneficial for patient safety, it often results in insufficient statistical data. This scarcity of data poses significant challenges for accurately reconstructing high-quality images, which… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  34. arXiv:2406.19617  [pdf, ps, other

    cs.LG cs.IT math.OC

    Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity

    Authors: Qian Yu, Yining Wang, Baihe Huang, Qi Lei, Jason D. Lee

    Abstract: Optimization of convex functions under stochastic zeroth-order feedback has been a major and challenging question in online learning. In this work, we consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries. We provide the first tight characterization for the rate of the mi… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

  35. arXiv:2406.15334  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

    Authors: Brandon Huang, Chancharik Mitra, Assaf Arbelle, Leonid Karlinsky, Trevor Darrell, Roei Herzig

    Abstract: The recent success of interleaved Large Multimodal Models (LMMs) in few-shot learning suggests that in-context learning (ICL) with many examples can be promising for learning new tasks. However, this many-shot multimodal ICL setting has one crucial problem: it is fundamentally limited by the model's context length set at pretraining. The problem is especially prominent in the multimodal domain, wh… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  36. arXiv:2406.14103  [pdf, other

    cs.AI

    Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation

    Authors: Yanwei Zheng, Shaopu Feng, Bowen Huang, Changrui Li, Xiao Zhang, Dongxiao Yu

    Abstract: The task that requires an agent to navigate to a given object through only visual observation is called visual object navigation (VON). The main bottlenecks of VON are strategies exploration and prior knowledge exploitation. Traditional strategies exploration ignores the differences of searching and navigating stages, using the same reward in two stages, which reduces navigation performance and tr… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  37. arXiv:2406.13317  [pdf, other

    cs.CV

    M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

    Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

    Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  38. arXiv:2406.13219  [pdf, other

    cs.CV cs.CL

    MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency

    Authors: Junzhe Zhang, Huixuan Zhang, Xunjian Yin, Baizhou Huang, Xu Zhang, Xinyu Hu, Xiaojun Wan

    Abstract: Multimodal large language models (MLLMs) are prone to non-factual or outdated knowledge issues, which can manifest as misreading and misrecognition errors due to the complexity of multimodal knowledge. Previous benchmarks have not systematically analyzed the performance of editing methods in correcting these two error types. To better represent and correct these errors, we decompose multimodal kno… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  39. arXiv:2406.09489  [pdf, other

    cs.CV

    Language-driven Grasp Detection

    Authors: An Dinh Vuong, Minh Nhat Vu, Baoru Huang, Nghia Nguyen, Hieu Le, Thieu Vo, Anh Nguyen

    Abstract: Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language-driven grasp detection dataset featuring 1M samp… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 19 pages. Accepted to CVPR24

  40. arXiv:2406.09317  [pdf, other

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, Jinming Guo, Xiaolin Chen, Jingcheng Wang, Yih Chung Tham , et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. To RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources… ▽ More

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  41. arXiv:2406.07558  [pdf, other

    cs.CY cs.AI cs.CV

    An AI-Enabled Framework Within Reach for Enhancing Healthcare Sustainability and Fairness

    Authors: Bin Huang, Changchen Zhao, Zimeng Liu, Shenda Hong, Baochang Zhang, Hao Lu, Zhijun Liu, Wenjin Wang, Hui Liu

    Abstract: Good health and well-being is among key issues in the United Nations 2030 Sustainable Development Goals. The rising prevalence of large-scale infectious diseases and the accelerated aging of the global population are driving the transformation of healthcare technologies. In this context, establishing large-scale public health datasets, developing medical models, and creating decision-making system… ▽ More

    Submitted 1 August, 2024; v1 submitted 21 April, 2024; originally announced June 2024.

    Comments: 16 pages, 5 figures

  42. arXiv:2406.07275  [pdf, other

    cs.AI

    DCA-Bench: A Benchmark for Dataset Curation Agents

    Authors: Benhao Huang, Yingzhuo Yu, Jin Huang, Xingjian Zhang, Jiaqi Ma

    Abstract: The quality of datasets plays an increasingly crucial role in the research and development of modern artificial intelligence (AI). Despite the proliferation of open dataset platforms nowadays, data quality issues, such as insufficient documentation, inaccurate annotations, and ethical concerns, remain common in datasets widely used in AI. Furthermore, these issues are often subtle and difficult to… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  43. arXiv:2406.06050  [pdf, other

    cs.CV

    Generalizable Human Gaussians from Single-View Image

    Authors: Jinnan Chen, Chen Li, Jianfeng Zhang, Hanlin Chen, Buzhen Huang, Gim Hee Lee

    Abstract: In this work, we tackle the task of learning generalizable 3D human Gaussians from a single image. The main challenge for this task is to recover detailed geometry and appearance, especially for the unobserved regions. To this end, we propose single-view generalizable Human Gaussian model (HGM), a diffusion-guided framework for 3D human modeling from a single image. We design a diffusion-based coa… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

  44. arXiv:2406.03836  [pdf, other

    cs.CR cs.AI

    Proactive Detection of Physical Inter-rule Vulnerabilities in IoT Services Using a Deep Learning Approach

    Authors: Bing Huang, Chen Chen, Kwok-Yan Lam, Fuqun Huang

    Abstract: Emerging Internet of Things (IoT) platforms provide sophisticated capabilities to automate IoT services by enabling occupants to create trigger-action rules. Multiple trigger-action rules can physically interact with each other via shared environment channels, such as temperature, humidity, and illumination. We refer to inter-rule interactions via shared environment channels as a physical inter-ru… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE ICWS 2024 Workshop

  45. arXiv:2406.00519  [pdf, other

    cs.LG cs.AI stat.ML

    Learning Discrete Concepts in Latent Hierarchical Models

    Authors: Lingjing Kong, Guangyi Chen, Biwei Huang, Eric P. Xing, Yuejie Chi, Kun Zhang

    Abstract: Learning concepts from natural high-dimensional data (e.g., images) holds potential in building human-aligned and interpretable machine learning models. Despite its encouraging prospect, formalization and theoretical insights into this crucial task are still lacking. In this work, we formalize concepts as discrete latent causal variables that are related via a hierarchical causal model that encode… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  46. arXiv:2406.00492  [pdf, other

    eess.IV cs.CV cs.LG

    SAM-VMNet: Deep Neural Networks For Coronary Angiography Vessel Segmentation

    Authors: Xueying Zeng, Baixiang Huang, Yu Luo, Guangyu Wei, Songyan He, Yushuang Shao

    Abstract: Coronary artery disease (CAD) is one of the most prevalent diseases in the cardiovascular field and one of the major contributors to death worldwide. Computed Tomography Angiography (CTA) images are regarded as the authoritative standard for the diagnosis of coronary artery disease, and by performing vessel segmentation and stenosis detection on CTA images, physicians are able to diagnose coronary… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

  47. arXiv:2405.17167  [pdf

    eess.IV cs.CV

    Partitioned Hankel-based Diffusion Models for Few-shot Low-dose CT Reconstruction

    Authors: Wenhao Zhang, Bin Huang, Shuyue Chen, Xiaoling Xu, Weiwen Wu, Qiegen Liu

    Abstract: Low-dose computed tomography (LDCT) plays a vital role in clinical applications by mitigating radiation risks. Nevertheless, reducing radiation doses significantly degrades image quality. Concurrently, common deep learning methods demand extensive data, posing concerns about privacy, cost, and time constraints. Consequently, we propose a few-shot low-dose CT reconstruction method using Partitioned… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  48. arXiv:2405.17137  [pdf, other

    cs.CV

    Jump-teaching: Ultra Efficient and Robust Learning with Noisy Label

    Authors: Kangye Ji, Fei Cheng, Zeqing Wang, Bohu Huang

    Abstract: Sample selection is the most straightforward technique to combat label noise, aiming to distinguish mislabeled samples during training and avoid the degradation of the robustness of the model. In the workflow, $\textit{selecting possibly clean data}$ and $\textit{model update}$ are iterative. However, their interplay and intrinsic characteristics hinder the robustness and efficiency of learning wi… ▽ More

    Submitted 27 August, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  49. arXiv:2405.17132  [pdf, other

    cs.LG

    Your decision path does matter in pre-training industrial recommenders with multi-source behaviors

    Authors: Chunjing Gan, Binbin Hu, Bo Huang, Ziqi Liu, Jian Ma, Zhiqiang Zhang, Wenliang Zhong, Jun Zhou

    Abstract: Online service platforms offering a wide range of services through miniapps have become crucial for users who visit these platforms with clear intentions to find services they are interested in. Aiming at effective content delivery, cross-domain recommendation are introduced to learn high-quality representations by transferring behaviors from data-rich scenarios. However, these methods overlook th… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  50. arXiv:2405.15165  [pdf, other

    cs.CL cs.AI cs.SE

    A Solution-based LLM API-using Methodology for Academic Information Seeking

    Authors: Yuanchun Wang, Jifan Yu, Zijun Yao, Jing Zhang, Yuyang Xie, Shangqing Tu, Yiyang Fu, Youhe Feng, Jinkai Zhang, Jingyao Zhang, Bowen Huang, Yuanyao Li, Huihui Yuan, Lei Hou, Juanzi Li, Jie Tang

    Abstract: Applying large language models (LLMs) for academic API usage shows promise in reducing researchers' academic information seeking efforts. However, current LLM API-using methods struggle with complex API coupling commonly encountered in academic queries. To address this, we introduce SoAy, a solution-based LLM API-using methodology for academic information seeking. It uses code with a solution as t… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 22 pages, 13 figures