
Showing 1–50 of 132 results for author: Hao, Z

Searching in archive cs.
  1. arXiv:2407.15273  [pdf, other]

    cs.LG cs.AI

    Unifying Invariant and Variant Features for Graph Out-of-Distribution via Probability of Necessity and Sufficiency

    Authors: Xuexin Chen, Ruichu Cai, Kaitao Zheng, Zhifan Jiang, Zhengting Huang, Zhifeng Hao, Zijian Li

    Abstract: Graph Out-of-Distribution (OOD), requiring that models trained on biased data generalize to the unseen test data, has considerable real-world applications. One of the most mainstream methods is to extract the invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, these solutions might lead to the loss or redundancy of semantic subgraphs a…

    Submitted 21 July, 2024; originally announced July 2024.

  2. arXiv:2407.11578  [pdf, other]

    cs.CV eess.IV

    UP-Diff: Latent Diffusion Model for Remote Sensing Urban Prediction

    Authors: Zeyu Wang, Zecheng Hao, Jingyu Lin, Yuchao Feng, Yufei Guo

    Abstract: This study introduces a novel Remote Sensing (RS) Urban Prediction (UP) task focused on future urban planning, which aims to forecast urban layouts by utilizing information from existing urban layouts and planned change maps. To address the proposed RS UP task, we propose UP-Diff, which leverages a Latent Diffusion Model (LDM) to capture position-aware embeddings of pre-change urban layouts and pla…

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  3. arXiv:2407.05232  [pdf, other]

    cs.LG

    PAPM: A Physics-aware Proxy Model for Process Systems

    Authors: Pengwei Liu, Zhongkai Hao, Xingyu Ren, Hangjie Yuan, Jiayang Ren, Dong Ni

    Abstract: In the context of proxy modeling for process systems, traditional data-driven deep learning approaches frequently encounter significant challenges, such as substantial training costs induced by large amounts of data, and limited generalization capabilities. As a promising alternative, physics-aware models incorporate partial physics knowledge to ameliorate these challenges. Although demonstrating…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  4. arXiv:2407.04064  [pdf, other]

    cs.RO

    Collision Avoidance for Multiple UAVs in Unknown Scenarios with Causal Representation Disentanglement

    Authors: Jiafan Zhuang, Zihao Xia, Gaofei Han, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning tasks for multi-UAV systems. However, existing DRL-based methods often suffer from performance degradation when tackling unseen scenarios, since the non-causal factors in visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, i.e.,…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  5. arXiv:2407.04056  [pdf, other]

    cs.RO

    Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

    Authors: Jiafan Zhuang, Gaofei Han, Zihao Xia, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: In unseen and complex outdoor environments, collision avoidance navigation for unmanned aerial vehicle (UAV) swarms presents a challenging problem. It requires UAVs to navigate through various obstacles and complex backgrounds. Existing collision avoidance navigation methods based on deep reinforcement learning show promising performance but suffer from poor generalization abilities, resulting in…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  6. arXiv:2406.19195  [pdf, other]

    cs.LG cs.AI

    Estimating Long-term Heterogeneous Dose-response Curve: Generalization Bound Leveraging Optimal Transport Weights

    Authors: Zeqin Yang, Weilin Chen, Ruichu Cai, Yuguang Yan, Zhifeng Hao, Zhipeng Yu, Zhichao Zou, Zhen Peng, Jiecheng Guo

    Abstract: Long-term causal effect estimation is a significant but challenging problem in many applications. Existing methods rely on ideal assumptions to estimate long-term average effects, e.g., no unobserved confounders or a binary treatment, while in numerous real-world applications, these assumptions could be violated and average effects are unable to provide individual-level suggestions. In this paper, we…

    Submitted 27 June, 2024; originally announced June 2024.

  7. arXiv:2406.15166  [pdf, other]

    cond-mat.soft cs.GR

    Inverse Design of Planar Clamped-Free Elastic Rods from Noisy Data

    Authors: Dezhong Tong, Zhuonan Hao, Weicheng Huang

    Abstract: Slender structures, such as rods, often exhibit large nonlinear geometrical deformations even under moderate external forces (e.g., gravity). This characteristic results in a rich variety of morphological changes, making them appealing for engineering design and applications, such as soft robots, submarine cables, decorative knots, and more. Prior studies have demonstrated that the natural shape o…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 21 pages, 9 figures

  8. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.

  9. arXiv:2406.11050  [pdf, other]

    cs.CL cs.AI

    A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners

    Authors: Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth

    Abstract: This study introduces a hypothesis-testing framework to assess whether large language models (LLMs) possess genuine reasoning abilities or primarily depend on token bias. We go beyond evaluating LLMs on accuracy; rather, we aim to investigate their token bias in solving logical reasoning tasks. Specifically, we develop carefully controlled synthetic datasets, featuring conjunction fallacy and syll…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Code is open-sourced at https://github.com/bowen-upenn/llm_token_bias

  10. arXiv:2406.10881  [pdf, other]

    cs.CL

    Teaching Large Language Models to Express Knowledge Boundary from Their Own Signals

    Authors: Lida Chen, Zujie Liang, Xintao Wang, Jiaqing Liang, Yanghua Xiao, Feng Wei, Jinglei Chen, Zhenghong Hao, Bing Han, Wei Wang

    Abstract: Large language models (LLMs) have achieved great success, but their occasional content fabrication, or hallucination, limits their practical application. Hallucination arises because LLMs struggle to admit ignorance due to inadequate training on knowledge boundaries. We consider it a limitation of LLMs that they cannot accurately express their knowledge boundary, answering questions they know while a…

    Submitted 16 June, 2024; originally announced June 2024.

  11. arXiv:2406.07020  [pdf, other]

    cs.LG

    Learning Discrete Latent Variable Structures with Tensor Rank Conditions

    Authors: Zhengming Chen, Ruichu Cai, Feng Xie, Jie Qiao, Anpeng Wu, Zijian Li, Zhifeng Hao, Kun Zhang

    Abstract: Unobserved discrete data are ubiquitous in many scientific disciplines, and how to learn the causal structure of these latent variables is crucial for uncovering data patterns. Most studies focus on the linear latent variable model or impose strict constraints on latent structures, which fail to address cases in discrete data involving non-linear relationships or complex latent structures. To achi…

    Submitted 11 June, 2024; originally announced June 2024.

  12. arXiv:2406.02902  [pdf, other]

    cs.CL

    S²GSL: Incorporating Segment to Syntactic Enhanced Graph Structure Learning for Aspect-based Sentiment Analysis

    Authors: Bingfeng Chen, Qihan Ouyang, Yongqi Luo, Boyan Xu, Ruichu Cai, Zhifeng Hao

    Abstract: Previous graph-based approaches in Aspect-based Sentiment Analysis (ABSA) have demonstrated impressive performance by utilizing graph neural networks and attention mechanisms to learn structures of static dependency trees and dynamic latent trees. However, incorporating both semantic and syntactic information simultaneously within complex global structures can introduce irrelevant contexts and synt…

    Submitted 7 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (main)

  13. arXiv:2406.02147  [pdf, other]

    cs.CV

    UA-Track: Uncertainty-Aware End-to-End 3D Multi-Object Tracking

    Authors: Lijun Zhou, Tao Tang, Pengkun Hao, Zihang He, Kalok Ho, Shuo Gu, Wenbo Hou, Zhihui Hao, Haiyang Sun, Kun Zhan, Peng Jia, Xianpeng Lang, Xiaodan Liang

    Abstract: 3D multiple object tracking (MOT) plays a crucial role in autonomous driving perception. Recent end-to-end query-based trackers simultaneously detect and track objects, which have shown promising potential for the 3D MOT task. However, existing methods overlook the uncertainty issue, which refers to the lack of precise confidence about the state and location of tracked objects. Uncertainty arises…

    Submitted 4 June, 2024; originally announced June 2024.

  14. arXiv:2405.20355  [pdf, other]

    cs.NE cs.CR cs.CV cs.LG

    Enhancing Adversarial Robustness in SNNs with Sparse Gradients

    Authors: Yujia Liu, Tong Bu, Jianhao Ding, Zecheng Hao, Tiejun Huang, Zhaofei Yu

    Abstract: Spiking Neural Networks (SNNs) have attracted great attention for their energy-efficient operations and biologically inspired structures, offering potential advantages over Artificial Neural Networks (ANNs) in terms of energy efficiency and interpretability. Nonetheless, similar to ANNs, the robustness of SNNs remains a challenge, especially when facing adversarial attacks. Existing techniques, wh…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: accepted by ICML 2024

  15. arXiv:2405.17509  [pdf, other]

    cs.LG

    Reference Neural Operators: Learning the Smooth Dependence of Solutions of PDEs on Geometric Deformations

    Authors: Ze Cheng, Zhongkai Hao, Xiaoqiang Wang, Jianing Huang, Youjia Wu, Xudan Liu, Yiru Zhao, Songming Liu, Hang Su

    Abstract: For partial differential equations on domains of arbitrary shapes, existing works of neural operators attempt to learn a mapping from geometries to solutions. It often requires a large dataset of geometry-solution pairs in order to obtain a sufficiently accurate neural operator. However, for many industrial applications, e.g., engineering design optimization, it can be prohibitive to satisfy the r…

    Submitted 27 May, 2024; originally announced May 2024.

  16. arXiv:2405.16083  [pdf, other]

    cs.LG

    From Orthogonality to Dependency: Learning Disentangled Representation for Multi-Modal Time-Series Sensing Signals

    Authors: Ruichu Cai, Zhifang Jiang, Zijian Li, Weilin Chen, Xuexin Chen, Zhifeng Hao, Yifan Shen, Guangyi Chen, Kun Zhang

    Abstract: Existing methods for multi-modal time series representation learning aim to disentangle the modality-shared and modality-specific latent variables. Although achieving notable performances on downstream tasks, they usually assume an orthogonal latent space. However, the modality-specific and modality-shared latent variables might be dependent in real-world scenarios. Therefore, we propose a general…

    Submitted 25 May, 2024; originally announced May 2024.

  17. arXiv:2405.14073  [pdf, other]

    cs.LG

    PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning

    Authors: Chengyang Ying, Zhongkai Hao, Xinning Zhou, Xuezhou Xu, Hang Su, Xingxing Zhang, Jun Zhu

    Abstract: Designing generalizable agents capable of adapting to diverse embodiments has attracted significant attention in Reinforcement Learning (RL), which is critical for deploying RL agents in various real-world applications. Previous Cross-Embodiment RL approaches have focused on transferring knowledge across embodiments within specific tasks. These methods often result in knowledge tightly coupled with…

    Submitted 22 May, 2024; originally announced May 2024.

  18. arXiv:2405.11225  [pdf, other]

    cs.SI cs.AI

    SeBot: Structural Entropy Guided Multi-View Contrastive Learning for Social Bot Detection

    Authors: Yingguang Yang, Qi Wu, Buyun He, Hao Peng, Renyu Yang, Zhifeng Hao, Yong Liao

    Abstract: Recent advancements in social bot detection have been driven by the adoption of Graph Neural Networks. The social graph, constructed from social network interactions, contains benign and bot accounts that influence each other. However, previous graph-based detection methods that follow the transductive message-passing paradigm may not fully utilize hidden graph information and are vulnerable to ad…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: KDD 2024

  19. arXiv:2405.10748  [pdf, other]

    cs.CV

    Deep Data Consistency: a Fast and Robust Diffusion Model-based Solver for Inverse Problems

    Authors: Hanyu Chen, Zhixiu Hao, Liying Xiao

    Abstract: Diffusion models have become a successful approach for solving various image inverse problems by providing a powerful diffusion prior. Many studies tried to combine the measurement into diffusion by score function replacement, matrix decomposition, or optimization algorithms, but it is hard to balance the data consistency and realness. The slow sampling speed is also a main obstacle to its wide ap…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Code: https://github.com/Hanyu-Chen373/DeepDataConsistency

  20. arXiv:2405.07096  [pdf, other]

    cs.SI cs.IT

    Multi-Relational Structural Entropy

    Authors: Yuwei Cao, Hao Peng, Angsheng Li, Chenyu You, Zhifeng Hao, Philip S Yu

    Abstract: Structural Entropy (SE) measures the structural information contained in a graph. Minimizing or maximizing SE helps to reveal or obscure the intrinsic structural patterns underlying graphs in an interpretable manner, finding applications in various tasks driven by networked data. However, SE ignores the heterogeneity inherent in the graph relations, which is ubiquitous in modern networks. In this…

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted to UAI 2024

  21. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  22. arXiv:2405.04114  [pdf, other]

    cs.LG cs.AI

    Acceleration Algorithms in GNNs: A Survey

    Authors: Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui

    Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address the critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the resear…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 9 pages, 3 figures

  23. arXiv:2405.03342  [pdf, other]

    cs.LG

    Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning

    Authors: Weilin Chen, Ruichu Cai, Zeqin Yang, Jie Qiao, Yuguang Yan, Zijian Li, Zhifeng Hao

    Abstract: Causal effect estimation under networked interference is an important but challenging problem. Available parametric methods are limited in their model space, while previous semiparametric methods, e.g., leveraging neural networks to fit only one single nuisance function, may still encounter misspecification problems under networked interference without appropriate assumptions on the data generatio…

    Submitted 5 July, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  24. arXiv:2405.02291  [pdf, other]

    cs.RO

    Bundling and Tumbling in Bacterial-inspired Bi-flagellated Soft Robots for Attitude Adjustment

    Authors: Zhuonan Hao, Siddharth Zalavadia, Mohammad Khalid Jawed

    Abstract: We create a mechanism inspired by bacterial swimmers, featuring two flexible flagella with individual control over rotation speed and direction in viscous fluid environments. Using readily available materials, we design and fabricate silicone-based helical flagella. To simulate the robot's motion, we develop a physics-based computational tool, drawing inspiration from computer graphics. The framew…

    Submitted 19 January, 2024; originally announced May 2024.

  25. Unsupervised Social Bot Detection via Structural Information Theory

    Authors: Hao Peng, Jingyun Zhang, Xiang Huang, Zhifeng Hao, Angsheng Li, Zhengtao Yu, Philip S. Yu

    Abstract: Research on social bot detection plays a crucial role in maintaining the order and reliability of information dissemination while increasing trust in social interactions. The current mainstream social bot detection models rely on black-box neural network technology, e.g., Graph Neural Network, Transformer, etc., which lacks interpretability. In this work, we present UnDBot, a novel unsupervised, i…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 42 pages, 12 figures, accepted for publication in Transactions on Information Systems

  26. arXiv:2404.11202  [pdf, other]

    cs.CV

    GhostNetV3: Exploring the Training Strategies for Compact Models

    Authors: Zhenhua Liu, Zhiwei Hao, Kai Han, Yehui Tang, Yunhe Wang

    Abstract: Compact neural networks are specially designed for applications on edge devices with faster inference speed yet modest performance. However, training strategies of compact models are currently borrowed from those of conventional models, which ignores their difference in model capacity and thus may impede the performance of compact models. In this paper, by systematically investigating the impact o…

    Submitted 21 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  27. arXiv:2404.08450  [pdf, other]

    cs.CV

    Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

    Authors: Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu

    Abstract: Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods are considered to integrate a model that simultaneously addresses both physical and digital attacks, implying the necessity to dev…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages with 6 figures, Accepted by CVPRW 2024

  28. arXiv:2403.16523  [pdf, other]

    stat.ML cs.AI cs.LG

    Causal Discovery from Poisson Branching Structural Causal Model Using High-Order Cumulant with Path Analysis

    Authors: Jie Qiao, Yu Xiang, Zhengming Chen, Ruichu Cai, Zhifeng Hao

    Abstract: Count data naturally arise in many fields, such as finance, neuroscience, and epidemiology, and discovering causal structure among count data is a crucial task in various scientific and industrial scenarios. One of the most common characteristics of count data is the inherent branching structure described by a binomial thinning operator and an independent Poisson distribution that captures both br…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI-2024

  29. arXiv:2403.16353  [pdf, other]

    cs.IT eess.SP

    Energy-Efficient Hybrid Beamforming with Dynamic On-off Control for Integrated Sensing, Communications, and Powering

    Authors: Zeyu Hao, Yuan Fang, Xianghao Yu, Jie Xu, Ling Qiu, Lexi Xu, Shuguang Cui

    Abstract: This paper investigates the energy-efficient hybrid beamforming design for a multi-functional integrated sensing, communications, and powering (ISCAP) system. In this system, a base station (BS) with a hybrid analog-digital (HAD) architecture sends unified wireless signals to communicate with multiple information receivers (IRs), sense multiple point targets, and wirelessly charge multiple energy…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 13 pages, 6 figures, submitted to IEEE Transactions on Communications

  30. arXiv:2403.14302  [pdf, other]

    cs.NE cs.CV cs.LG

    SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

    Authors: Xinyu Shi, Zecheng Hao, Zhaofei Yu

    Abstract: The remarkable success of Vision Transformers in Artificial Neural Networks (ANNs) has led to a growing interest in incorporating the self-attention mechanism and transformer-based architecture into Spiking Neural Networks (SNNs). While existing methods propose spiking self-attention mechanisms that are compatible with SNNs, they lack reasonable scaling methods, and the overall architectures propo…

    Submitted 28 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: To be published in the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  31. arXiv:2403.09355  [pdf, other]

    eess.IV cs.CV

    Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction

    Authors: Hanyu Chen, Zhixiu Hao, Lin Guo, Liying Xiao

    Abstract: Sparse-view Computed Tomography (CT) image reconstruction is a promising approach to reduce radiation exposure, but it inevitably leads to image degradation. Although diffusion model-based approaches are computationally expensive and suffer from the training-sampling discrepancy, they provide a potential solution to the problem. This study introduces a novel Cascaded Diffusion with Discrepancy Mit…

    Submitted 14 March, 2024; originally announced March 2024.

  32. arXiv:2403.03542  [pdf, other]

    cs.LG math.NA

    DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

    Authors: Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, Jun Zhu

    Abstract: Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is largely in its infancy due to the inherent complexity and diversity of partial differential equation (PDE) data, such as long trajectories, multiple scales and varying dimensions. In this paper, we present a new auto-regressive denoising pre-training s…

    Submitted 6 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  33. arXiv:2402.17133  [pdf, other]

    cs.CV

    SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution

    Authors: Chengcheng Wang, Zhiwei Hao, Yehui Tang, Jianyuan Guo, Yujie Yang, Kai Han, Yunhe Wang

    Abstract: Diffusion-based super-resolution (SR) models have recently garnered significant attention due to their potent restoration capabilities. But conventional diffusion models perform noise sampling from a single distribution, constraining their ability to handle real-world scenes and complex textures across semantic regions. With the success of segment anything model (SAM), generating sufficiently fine…

    Submitted 26 February, 2024; originally announced February 2024.

  34. arXiv:2402.15819  [pdf, other]

    cs.IR cs.LG

    Debiased Model-based Interactive Recommendation

    Authors: Zijian Li, Ruichu Cai, Haiqin Huang, Sili Zhang, Yuguang Yan, Zhifeng Hao, Zhenghua Dong

    Abstract: Existing model-based interactive recommendation systems are trained by querying a world model to capture the user preference, but learning the world model from historical logged data will easily suffer from bias issues such as popularity bias and sampling bias. This is why some debiased methods have been proposed recently. However, two essential drawbacks still remain: 1) ignoring the dynamics of…

    Submitted 24 February, 2024; originally announced February 2024.

  35. arXiv:2402.09165  [pdf, other]

    cs.LG

    Unifying Invariance and Spuriousity for Graph Out-of-Distribution via Probability of Necessity and Sufficiency

    Authors: Xuexin Chen, Ruichu Cai, Kaitao Zheng, Zhifan Jiang, Zhengting Huang, Zhifeng Hao, Zijian Li

    Abstract: Graph Out-of-Distribution (OOD), requiring that models trained on biased data generalize to the unseen test data, has a massive number of real-world applications. One of the most mainstream methods is to extract the invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, these solutions might lead to the loss or redundancy of semantic subgraph an…

    Submitted 14 February, 2024; originally announced February 2024.

  36. arXiv:2402.08845  [pdf, other]

    cs.LG stat.ME

    Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

    Authors: Xuexin Chen, Ruichu Cai, Zhengting Huang, Yuxuan Zhu, Julien Horwood, Zhifeng Hao, Zijian Li, Jose Miguel Hernandez-Lobato

    Abstract: We investigate the problem of explainability for machine learning models, focusing on Feature Attribution Methods (FAMs) that evaluate feature importance through perturbation tests. Despite their utility, FAMs struggle to distinguish the contributions of different features, when their prediction changes are similar after perturbation. To enhance FAMs' discriminative power, we introduce Feature Att…

    Submitted 4 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted in the Proceedings of the 41st International Conference on Machine Learning (ICML2024)

  37. arXiv:2402.04869  [pdf, other]

    cs.LG cs.AI

    Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy

    Authors: Ruichu Cai, Siyang Huang, Jie Qiao, Wei Chen, Yan Zeng, Keli Zhang, Fuchun Sun, Yang Yu, Zhifeng Hao

    Abstract: As a key component to intuitive cognition and reasoning solutions in human intelligence, causal knowledge provides great potential for reinforcement learning (RL) agents' interpretability towards decision-making by helping reduce the searching space. However, there is still a considerable gap in discovering and incorporating causality into RL, which hinders the rapid development of causal RL. In t…

    Submitted 7 February, 2024; originally announced February 2024.

  38. arXiv:2402.04841  [pdf, other]

    cs.CV

    Data-efficient Large Vision Models through Sequential Autoregression

    Authors: Jianyuan Guo, Zhiwei Hao, Chengcheng Wang, Yehui Tang, Han Wu, Han Hu, Kai Han, Chang Xu

    Abstract: Training general-purpose vision models on purely sequential visual data, eschewing linguistic inputs, has heralded a new frontier in visual understanding. These models are intended to not only comprehend but also seamlessly transition to out-of-domain tasks. However, current endeavors are hamstrung by an over-reliance on colossal models, exemplified by models with upwards of 3B parameters, and the ne…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: 15 pages

    Journal ref: ICML 2024

  39. arXiv:2402.02316  [pdf, other]

    cs.LG cs.CV

    Your Diffusion Model is Secretly a Certifiably Robust Classifier

    Authors: Huanran Chen, Yinpeng Dong, Shitong Shao, Zhongkai Hao, Xiao Yang, Hang Su, Jun Zhu

    Abstract: Diffusion models are recently employed as generative classifiers for robust classification. However, a comprehensive theoretical understanding of the robustness of diffusion classifiers is still lacking, leading us to question whether they will be vulnerable to future stronger attacks. In this study, we propose a new family of diffusion classifiers, named Noised Diffusion Classifiers (NDCs), that…

    Submitted 13 February, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  40. arXiv:2402.00531  [pdf, other]

    cs.LG math.NA

    Preconditioning for Physics-Informed Neural Networks

    Authors: Songming Liu, Chang Su, Jiachen Yao, Zhongkai Hao, Hang Su, Youjia Wu, Jun Zhu

    Abstract: Physics-informed neural networks (PINNs) have shown promise in solving various partial differential equations (PDEs). However, training pathologies have negatively affected the convergence and prediction accuracy of PINNs, which further limits their practical applications. In this paper, we propose to use condition number as a metric to diagnose and mitigate the pathologies in PINNs. Inspired by c…

    Submitted 1 February, 2024; originally announced February 2024.

  41. arXiv:2402.00411  [pdf, other]

    cs.NE cs.AI cs.CV

    LM-HT SNN: Enhancing the Performance of SNN to ANN Counterpart through Learnable Multi-hierarchical Threshold Model

    Authors: Zecheng Hao, Xinyu Shi, Zhiyu Pan, Yujia Liu, Zhaofei Yu, Tiejun Huang

    Abstract: Compared to traditional Artificial Neural Network (ANN), Spiking Neural Network (SNN) has garnered widespread academic interest for its intrinsic ability to transmit information in a more biologically inspired and energy-efficient manner. However, despite previous efforts to optimize the learning gradients and model structure of SNNs through various methods, SNNs still lag behind ANNs in terms of pe…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 15 pages, 2 figures

  42. arXiv:2401.12471  [pdf, ps, other]

    cs.CV

    Zero Shot Open-ended Video Inference

    Authors: Ee Yeo Keat, Zhang Hao, Alexander Matyasko, Basura Fernando

    Abstract: Zero-shot open-ended inference on untrimmed videos poses a significant challenge, especially when no annotated data is utilized to guide the inference direction. In this work, we aim to address this underexplored domain by introducing an adaptable framework that efficiently combines a frozen vision-language (VL) model and an off-the-shelf large language model (LLM) for conducting zero-shot…

    Submitted 22 January, 2024; originally announced January 2024.

  43. arXiv:2401.09516  [pdf, other]

    cs.LG cs.AI math.NA

    Accelerating Data Generation for Neural Operators via Krylov Subspace Recycling

    Authors: Hong Wang, Zhongkai Hao, Jie Wang, Zijie Geng, Zhen Wang, Bin Li, Feng Wu

    Abstract: Learning neural operators for solving partial differential equations (PDEs) has attracted great attention due to its high inference efficiency. However, training such operators requires generating a substantial amount of labeled data, i.e., PDE problems together with their solutions. The data generation process is exceptionally time-consuming, as it involves solving numerous systems of linear equa…

    Submitted 19 March, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  44. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling laws described in previous literature present varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  45. arXiv:2401.02682  [pdf, other]

    cs.LG cs.SI

    Homophily-Related: Adaptive Hybrid Graph Filter for Multi-View Graph Clustering

    Authors: Zichen Wen, Yawen Ling, Yazhou Ren, Tianyi Wu, Jianpeng Chen, Xiaorong Pu, Zhifeng Hao, Lifang He

    Abstract: Recently, there has been a growing focus on graph data, and multi-view graph clustering has become a popular area of research. Most existing methods are only applicable to homophilous graphs, yet much real-world graph data can hardly satisfy the homophily assumption, under which connected nodes tend to belong to the same class. Several studies have pointed out that the poor perform…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI 2024

  46. arXiv:2312.13628  [pdf, other]

    cs.LG

    Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples

    Authors: Ruichu Cai, Yuxuan Zhu, Jie Qiao, Zefeng Liang, Furui Liu, Zhifeng Hao

    Abstract: Deep neural networks (DNNs) have been shown to be vulnerable to well-crafted adversarial examples, which are generated through either well-conceived $\mathcal{L}_p$-norm restricted attacks or unrestricted attacks. Nevertheless, the majority of those approaches assume that adversaries can modify any features as they wish, and neglect the causal generating process of the data, which is unreaso…

    Submitted 26 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI-2024

  47. arXiv:2312.12206  [pdf, other]

    cs.LG cs.AI stat.ME

    Identification of Causal Structure in the Presence of Missing Data with Additive Noise Model

    Authors: Jie Qiao, Zhengming Chen, Jianhua Yu, Ruichu Cai, Zhifeng Hao

    Abstract: Missing data are an unavoidable complication frequently encountered in many causal discovery tasks. When the missingness process depends on the missing values themselves (known as self-masking missingness), the recovery of the joint distribution becomes unattainable, and detecting the presence of such self-masking missingness remains a perplexing challenge. Consequently, due to the inability to reconstr…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI-2024

  48. arXiv:2312.11934  [pdf, other]

    cs.LG cs.AI stat.ME

    Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants

    Authors: Wei Chen, Zhiyi Huang, Ruichu Cai, Zhifeng Hao, Kun Zhang

    Abstract: Causal discovery with latent variables is a crucial but challenging task. Despite the emergence of numerous methods aimed at addressing this challenge, they cannot fully identify the structure in which two observed variables are influenced by one latent variable and there may also be a directed edge between them. Interestingly, we notice that this structure can be identified through the utilization o…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  49. arXiv:2312.11152  [pdf, other]

    cs.CL cs.AI

    Prompt Based Tri-Channel Graph Convolution Neural Network for Aspect Sentiment Triplet Extraction

    Authors: Kun Peng, Lei Jiang, Hao Peng, Rui Liu, Zhengtao Yu, Jiaqian Ren, Zhifeng Hao, Philip S. Yu

    Abstract: Aspect Sentiment Triplet Extraction (ASTE) is an emerging task that extracts triplets, each consisting of an aspect, an opinion, and a sentiment, from a given sentence. Recent studies tend to address this task with a table-filling paradigm, wherein word relations are encoded in a two-dimensional table, and the process involves classifying all the individual cells to extract triplets. However, these studies ignore…

    Submitted 24 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted at the SIAM International Conference on Data Mining (SDM24)

  50. arXiv:2312.06578  [pdf, other]

    cs.LG

    Multi-class Support Vector Machine with Maximizing Minimum Margin

    Authors: Feiping Nie, Zhezheng Hao, Rong Wang

    Abstract: Support Vector Machine (SVM) stands out as a prominent machine learning technique widely applied in practical pattern recognition tasks. It achieves binary classification by maximizing the "margin", which represents the minimum distance between instances and the decision boundary. Although many efforts have been dedicated to extending SVM to the multi-class case through strategies such as one versus…

    Submitted 14 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.