
Showing 1–50 of 414 results for author: Zheng, J

Searching in archive cs.
  1. arXiv:2406.13880  [pdf, other]

    cs.CR

    Privacy-Preserving ECG Data Analysis with Differential Privacy: A Literature Review and A Case Study

    Authors: Arin Ghazarian, Jianwei Zheng, Cyril Rakovski

    Abstract: Differential privacy has become the preeminent technique to protect the privacy of individuals in a database while allowing useful results from data analysis to be shared. Notably, it guarantees the amount of privacy loss in the worst-case scenario. Although many theoretical research papers have been published, practical real-life application of differential privacy demands estimating several impo…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.12376  [pdf, other]

    cs.CR

    DCS Chain: A Flexible Private Blockchain System

    Authors: Jianwu Zheng, Siyuan Zhao, Zheng Wang, Li Pan, Jianhua Li

    Abstract: Blockchain technology has seen tremendous development over the past few years. Despite the emergence of numerous blockchain systems, they all suffer from various limitations, which can all be attributed to the fundamental issue posed by the DCS trilemma. In light of this, this work introduces a novel private blockchain system named DCS Chain. The core idea is to quantify the DCS metrics and dynami…

    Submitted 19 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2406.11369  [pdf, other]

    cs.CG cs.DS

    Approximation Algorithms for Smallest Intersecting Balls

    Authors: Jiaqi Zheng, Tiow-Seng Tan

    Abstract: We study a general smallest intersecting ball problem and its soft-margin variant in high-dimensional Euclidean spaces, which only require the input objects to be compact and convex. These two problems link and unify a series of fundamental problems in computational geometry and machine learning, including smallest enclosing ball, polytope distance, intersection radius, $\ell_1$-loss support vecto…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.09321  [pdf, other]

    cs.CR cs.AI cs.CL

    JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models

    Authors: Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang

    Abstract: Jailbreak attacks aim to induce Large Language Models (LLMs) to generate harmful responses for forbidden instructions, presenting severe misuse threats to LLMs. Up to now, research into jailbreak attacks and defenses is emerging, however, there is (surprisingly) no consensus on how to evaluate whether a jailbreak attempt is successful. In other words, the methods to assess the harmfulness of an LL…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Our code is available at https://github.com/ThuCCSLab/JailbreakEval

  5. arXiv:2406.08079  [pdf, other]

    cs.CV

    A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

    Authors: Lixian Zhang, Yi Zhao, Runmin Dong, Jinxiao Zhang, Shuai Yuan, Shilei Cao, Mengxuan Chen, Juepeng Zheng, Weijia Li, Wei Liu, Wayne Zhang, Litong Feng, Haohuan Fu

    Abstract: Vast amounts of remote sensing (RS) data provide Earth observations across multiple dimensions, encompassing critical spatial, temporal, and spectral information which is essential for addressing global-scale challenges such as land use monitoring, disaster prevention, and environmental change mitigation. Despite various pre-training methods tailored to the characteristics of RS data, a key limita…

    Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2406.06391  [pdf, other]

    cs.LG cs.CL

    Towards Lifelong Learning of Large Language Models: A Survey

    Authors: Junhao Zheng, Shengjie Qiu, Chengming Shi, Qianli Ma

    Abstract: As the applications of large language models (LLMs) expand across diverse fields, the ability of these models to adapt to ongoing changes in data, tasks, and user preferences becomes crucial. Traditional training methods, relying on static datasets, are increasingly inadequate for coping with the dynamic nature of real-world information. Lifelong learning, also known as continual or incremental le…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 37 pages

  7. arXiv:2406.05982  [pdf]

    eess.IV cs.LG physics.med-ph

    Artificial Intelligence for Neuro MRI Acquisition: A Review

    Authors: Hongjia Yang, Guanhua Wang, Ziyu Li, Haoxiang Li, Jialan Zheng, Yuxin Hu, Xiaozhi Cao, Congyu Liao, Huihui Ye, Qiyuan Tian

    Abstract: Magnetic resonance imaging (MRI) has significantly benefited from the resurgence of artificial intelligence (AI). By leveraging AI's capabilities in large-scale optimization and pattern recognition, innovative methods are transforming the MRI acquisition workflow, including planning, sequence design, and correction of acquisition artifacts. These emerging algorithms demonstrate substantial potenti…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Submitted to MAGMA for review

  8. arXiv:2406.05849  [pdf, other]

    cs.RO

    MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps

    Authors: Jianhao Zheng, Daniel Barath, Marc Pollefeys, Iro Armeni

    Abstract: Creating 3D semantic reconstructions of environments is fundamental to many applications, especially when related to autonomous agent operation (e.g., goal-oriented navigation or object interaction and manipulation). Commonly, 3D semantic reconstruction systems capture the entire scene in the same level of detail. However, certain tasks (e.g., object interaction) require a fine-grained and high-re…

    Submitted 9 June, 2024; originally announced June 2024.

  9. arXiv:2406.05766  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment

    Authors: Zijia Song, Zelin Zang, Yelin Wang, Guozheng Yang, Jiangbin Zheng, Kaicheng Yu, Wanyu Chen, Stan Z. Li

    Abstract: Multimodal fusion breaks through the barriers between diverse modalities and has already yielded numerous impressive performances. However, in various specialized fields, it is struggling to obtain sufficient alignment data for the training process, which seriously limits the use of previously elegant models. Thus, semi-supervised learning attempts to achieve multimodal alignment with fewer matche…

    Submitted 9 June, 2024; originally announced June 2024.

  10. arXiv:2406.02605  [pdf, other]

    cs.CR cs.AI cs.CV cs.LG

    A Novel Defense Against Poisoning Attacks on Federated Learning: LayerCAM Augmented with Autoencoder

    Authors: Jingjing Zheng, Xin Yuan, Kai Li, Wei Ni, Eduardo Tovar, Jon Crowcroft

    Abstract: Recent attacks on federated learning (FL) can introduce malicious model updates that circumvent widely adopted Euclidean distance-based detection methods. This paper proposes a novel defense strategy, referred to as LayerCAM-AE, designed to counteract model poisoning in federated learning. The LayerCAM-AE puts forth a new Layer Class Activation Mapping (LayerCAM) integrated with an autoencoder (AE…

    Submitted 2 June, 2024; originally announced June 2024.

  11. arXiv:2405.19783  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    Instruction-Guided Visual Masking

    Authors: Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan

    Abstract: Instruction following is crucial in contemporary LLM. However, when extended to multimodal setting, it often suffers from misalignment between specific textual instruction and targeted local region of an image. To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with d…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: preprint, 21 pages

  12. arXiv:2405.19055  [pdf, other]

    cs.CV

    FUSU: A Multi-temporal-source Land Use Change Segmentation Dataset for Fine-grained Urban Semantic Understanding

    Authors: Shuai Yuan, Guancong Lin, Lixian Zhang, Runmin Dong, Jinxiao Zhang, Shuang Chen, Juepeng Zheng, Jie Wang, Haohuan Fu

    Abstract: Fine urban change segmentation using multi-temporal remote sensing images is essential for understanding human-environment interactions in urban areas. Although there have been advances in high-quality land cover datasets that reveal the physical features of urban landscapes, the lack of fine-grained land use datasets hinders a deeper understanding of how human activities are distributed across th…

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.19009  [pdf, other]

    cs.CV

    Enhancing Vision-Language Model with Unmasked Token Alignment

    Authors: Jihao Liu, Jinliang Zheng, Boxiao Liu, Yu Liu, Hongsheng Li

    Abstract: Contrastive pre-training on image-text pairs, exemplified by CLIP, becomes a standard technique for learning multi-modal visual-language representations. Although CLIP has demonstrated remarkable performance, training it from scratch on noisy web-scale datasets is computationally demanding. On the other hand, mask-then-predict pre-training approaches, like Masked Image Modeling (MIM), offer effici…

    Submitted 14 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by TMLR; Code and models are available at https://github.com/jihaonew/UTA

  14. arXiv:2405.18326  [pdf, other]

    cs.CV

    VITON-DiT: Learning In-the-Wild Video Try-On from Human Dance Videos via Diffusion Transformers

    Authors: Jun Zheng, Fuwei Zhao, Youjiang Xu, Xin Dong, Xiaodan Liang

    Abstract: Video try-on stands as a promising area for its tremendous real-world potential. Prior works are limited to transferring product clothing images onto person videos with simple poses and backgrounds, while underperforming on casually captured videos. Recently, Sora revealed the scalability of Diffusion Transformer (DiT) in generating lifelike videos featuring real-world scenarios. Inspired by this,…

    Submitted 7 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Project Page: https://zhengjun-ai.github.io/viton-dit-page/

  15. arXiv:2405.14701  [pdf, other]

    cs.CV cs.AI

    High Fidelity Scene Text Synthesis

    Authors: Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin

    Abstract: Scene text synthesis involves rendering specified texts onto arbitrary images. Current methods typically formulate this task in an end-to-end manner but lack effective character-level guidance during training. Besides, their text encoders, pre-trained on a single font type, struggle to adapt to the diverse font styles encountered in practical applications. Consequently, these methods suffer from c…

    Submitted 23 May, 2024; originally announced May 2024.

  16. arXiv:2405.14293  [pdf, other]

    cs.GT

    Sybil-Proof Mechanism for Information Propagation with Budgets

    Authors: Junjie Zheng, Xu Ge, Bin Li, Dengji Zhao

    Abstract: This paper examines the problem of distributing rewards on social networks to improve the efficiency of crowdsourcing tasks for sponsors. To complete the tasks efficiently, we aim to design reward mechanisms that incentivize early-joining agents to invite more participants to the tasks. Nonetheless, participants could potentially engage in strategic behaviors, e.g., not inviting others to the task…

    Submitted 23 May, 2024; originally announced May 2024.

  17. arXiv:2405.14125  [pdf, other]

    cs.AI cs.CL

    ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation

    Authors: Jingnan Zheng, Han Wang, An Zhang, Tai D. Nguyen, Jun Sun, Tat-Seng Chua

    Abstract: Large Language Models (LLMs) can elicit unintended and even harmful content when misaligned with human values, posing severe risks to users and society. To mitigate these risks, current evaluation benchmarks predominantly employ expert-designed contextual scenarios to assess how well LLMs align with human values. However, the labor-intensive nature of these benchmarks limits their test scope, hind…

    Submitted 24 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  18. arXiv:2405.12110  [pdf, other]

    cs.CV

    CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization

    Authors: Jiawei Zhang, Jiahe Li, Xiaohan Yu, Lei Huang, Lin Gu, Jin Zheng, Xiao Bai

    Abstract: 3D Gaussian Splatting (3DGS) creates a radiance field consisting of 3D Gaussians to represent a scene. With sparse training views, 3DGS easily suffers from overfitting, negatively impacting the reconstruction quality. This paper introduces a new co-regularization perspective for improving sparse-view 3DGS. When training two 3D Gaussian radiance fields with the same sparse views of a scene, we obse…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Project page: https://jiaw-z.github.io/CoR-GS/

  19. arXiv:2405.10812  [pdf, other]

    q-bio.GN cs.AI

    VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling

    Authors: Siyuan Li, Zedong Wang, Zicheng Liu, Di Wu, Cheng Tan, Jiangbin Zheng, Yufei Huang, Stan Z. Li

    Abstract: Similar to natural language models, pre-trained genome language models are proposed to capture the underlying intricacies within genomes with unsupervised sequence modeling. They have become essential tools for researchers and practitioners in biology. However, the hand-crafted tokenization policies used in these models may not encode the most discriminative patterns from the limited vocabulary of…

    Submitted 2 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: ICML 2024. Preprint V2 with 17 pages and 5 figures

  20. arXiv:2405.09157  [pdf, other]

    math.OC cs.CG cs.DC cs.DS

    A Primal-Dual Framework for Symmetric Cone Programming

    Authors: Jiaqi Zheng, Antonios Varvitsiotis, Tiow-Seng Tan, Wayne Lin

    Abstract: In this paper, we introduce a primal-dual algorithmic framework for solving Symmetric Cone Programs (SCPs), a versatile optimization model that unifies and extends Linear, Second-Order Cone (SOCP), and Semidefinite Programming (SDP). Our work generalizes the primal-dual framework for SDPs introduced by Arora and Kale, leveraging a recent extension of the Multiplicative Weights Update method (MWU)…

    Submitted 15 May, 2024; originally announced May 2024.

  21. Vehicles Swarm Intelligence: Cooperation in both Longitudinal and Lateral Dimensions

    Authors: Jia Hu, Nuoheng Zhang, Haoran Wang, Tenglong Jiang, Junnian Zheng, Feilong Liu

    Abstract: Longitudinal-only platooning methods are facing great challenges on running mobility, since they may be impeded by slow-moving vehicles from time to time. To address this issue, this paper proposes a vehicles swarming method coupled both longitudinal and lateral cooperation. The proposed method bears the following contributions: i) enhancing driving mobility by swarming like a bee colony; ii) ensu…

    Submitted 13 May, 2024; originally announced May 2024.

  22. arXiv:2405.07801  [pdf, other]

    cs.CV

    Deep Learning-Based Object Pose Estimation: A Comprehensive Survey

    Authors: Jian Liu, Wei Sun, Hui Yang, Zhiwen Zeng, Chongpei Liu, Jin Zheng, Xingyu Liu, Hossein Rahmani, Nicu Sebe, Ajmal Mian

    Abstract: Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior accuracy and robustness, have increasingly supplanted conventional algorithms reliant on engineered point pair features. Nevertheless, several challenges persist in contemporary methods, including their dependen…

    Submitted 31 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: 27 pages, 7 figures

  23. arXiv:2405.04834  [pdf, other]

    cs.CV

    FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation

    Authors: Xuehai He, Jian Zheng, Jacob Zhiyuan Fang, Robinson Piramuthu, Mohit Bansal, Vicente Ordonez, Gunnar A Sigurdsson, Nanyun Peng, Xin Eric Wang

    Abstract: Controllable text-to-image (T2I) diffusion models generate images conditioned on both text prompts and semantic inputs of other modalities like edge maps. Nevertheless, current controllable T2I methods commonly face challenges related to efficiency and faithfulness, especially when conditioning on multiple inputs from either the same or diverse modalities. In this paper, we propose a novel Flexibl…

    Submitted 21 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  24. arXiv:2405.03176  [pdf, other]

    cs.NE

    FIMP-HGA: A Novel Approach to Addressing the Partitioning Min-Max Weighted Matching Problem

    Authors: Yuxuan Wang, Jiongzhi Zheng, Jinyao Xie, Kun He

    Abstract: The Partitioning Min-Max Weighted Matching (PMMWM) problem, being a practical NP-hard problem, integrates the task of partitioning the vertices of a bipartite graph into disjoint sets of limited size with the classical Maximum-Weight Perfect Matching (MPWM) problem. Initially introduced in 2015, the state-of-the-art method for addressing PMMWM is the MP$_{\text{LS}}$. In this paper, we present a n…

    Submitted 6 May, 2024; originally announced May 2024.

  25. arXiv:2405.00186  [pdf]

    cs.AI cs.DB cs.IR

    Credentials in the Occupation Ontology

    Authors: John Beverley, Robin McGill, Sam Smith, Jie Zheng, Giacomo De Colle, Finn Wilson, Matthew Diller, William D. Duncan, William R. Hogan, Yongqun He

    Abstract: The term credential encompasses educational certificates, degrees, certifications, and government-issued licenses. An occupational credential is a verification of an individual's qualification or competence issued by a third party with relevant authority. Job seekers often leverage such credentials as evidence that desired qualifications are satisfied by their holders. Many U.S. education and workf…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 11

  26. arXiv:2404.17835  [pdf, other]

    cs.CL

    VANER: Leveraging Large Language Model for Versatile and Adaptive Biomedical Named Entity Recognition

    Authors: Junyi Biana, Weiqi Zhai, Xiaodi Huang, Jiaxuan Zheng, Shanfeng Zhu

    Abstract: Prevalent solution for BioNER involves using representation learning techniques coupled with sequence labeling. However, such methods are inherently task-specific, demonstrate poor generalizability, and often require dedicated model for each dataset. To leverage the versatile capabilities of recently remarkable large language models (LLMs), several endeavors have explored generative approaches to…

    Submitted 27 April, 2024; originally announced April 2024.

  27. arXiv:2404.16627  [pdf, other]

    cs.CL

    Incorporating Lexical and Syntactic Knowledge for Unsupervised Cross-Lingual Transfer

    Authors: Jianyu Zheng, Fengfei Fan, Jianquan Li

    Abstract: Unsupervised cross-lingual transfer involves transferring knowledge between languages without explicit supervision. Although numerous studies have been conducted to improve performance in such tasks by focusing on cross-lingual knowledge, particularly lexical and syntactic knowledge, current approaches are limited as they only incorporate syntactic or lexical information. Since each type of inform…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted at LREC-Coling 2024

  28. arXiv:2404.15264  [pdf, other]

    cs.CV

    TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting

    Authors: Jiahe Li, Jiawei Zhang, Xiao Bai, Jin Zheng, Xin Ning, Jun Zhou, Lin Gu

    Abstract: Radiance fields have demonstrated impressive performance in synthesizing lifelike 3D talking heads. However, due to the difficulty in fitting steep appearance changes, the prevailing paradigm that presents facial motions by directly modifying point appearance may lead to distortions in dynamic regions. To tackle this challenge, we introduce TalkingGaussian, a deformation-based radiance fields fram…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Project page: https://fictionarry.github.io/TalkingGaussian/

  29. arXiv:2404.15042  [pdf, other]

    cs.CR cs.AI

    Leverage Variational Graph Representation For Model Poisoning on Federated Learning

    Authors: Kai Li, Xin Yuan, Jingjing Zheng, Wei Ni, Falko Dressler, Abbas Jamalipour

    Abstract: This paper puts forth a new training data-untethered model poisoning (MP) attack on federated learning (FL). The new MP attack extends an adversarial variational graph autoencoder (VGAE) to create malicious local models based solely on the benign local models overheard without any access to the training data of FL. Such an advancement leads to the VGAE-MP attack that is not only efficacious but al…

    Submitted 24 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: 12 pages, 8 figures, 2 tables

  30. arXiv:2404.15014  [pdf, other]

    cs.CV

    OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving

    Authors: Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

    Abstract: Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation perception problem. These discriminative methods focus on learning the mapping between the inputs and occupancy map in a single step, lacking the ability to gradually refine the occupancy map and the reasonable scene imaginative capacity to complete the local regions somewhere.…

    Submitted 23 April, 2024; originally announced April 2024.

  31. arXiv:2404.13544  [pdf, other]

    cs.CR

    Faster Post-Quantum TLS 1.3 Based on ML-KEM: Implementation and Assessment

    Authors: Jieyu Zheng, Haoliang Zhu, Yifan Dong, Zhenyu Song, Zhenhao Zhang, Yafang Yang, Yunlei Zhao

    Abstract: TLS is extensively utilized for secure data transmission over networks. However, with the advent of quantum computers, the security of TLS based on traditional public-key cryptography is under threat. To counter quantum threats, it is imperative to integrate post-quantum algorithms into TLS. Most PQ-TLS research focuses on integration and evaluation, but few studies address the improvement of PQ-T…

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: update the title

  32. arXiv:2404.12675  [pdf, other]

    cs.CR

    ESPM-D: Efficient Sparse Polynomial Multiplication for Dilithium on ARM Cortex-M4 and Apple M2

    Authors: Jieyu Zheng, Hong Zhang, Le Tian, Zhuo Zhang, Hanyu Wei, Zhiwei Chu, Yafang Yang, Yunlei Zhao

    Abstract: Dilithium is a lattice-based digital signature scheme standardized by the NIST post-quantum cryptography (PQC) project. In this study, we focus on developing efficient sparse polynomial multiplication implementations of Dilithium for ARM Cortex-M4 and Apple M2, which are both based on the ARM architecture. The ARM Cortex-M4 is commonly utilized in resource-constrained devices such as sensors. Conv…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 19 pages, 1 figure

  33. arXiv:2404.10253  [pdf, other]

    cs.DC

    Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

    Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu, et al. (16 additional authors not shown)

    Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries t…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 18 pages, 13 figures

  34. arXiv:2404.09997  [pdf, other]

    cs.NE

    An Efficient Evolutionary Algorithm for Diversified Top-k (Weight) Clique Search Problems

    Authors: Jiongzhi Zheng, Jinghui Xue, Kun He, Chu-Min Li, Yanli Liu

    Abstract: In many real-world problems and applications, finding only a single element, even though the best, among all possible candidates, cannot fully meet the requirements. We may wish to have a collection where each individual is not only outstanding but also distinctive. Diversified Top-k (DTk) problems are a kind of combinatorial optimization problem for finding such a promising collection of multiple…

    Submitted 19 January, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures, 4 tables

  35. arXiv:2404.09993  [pdf, other]

    cs.CV

    No More Ambiguity in 360° Room Layout via Bi-Layout Estimation

    Authors: Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng, Wei Wang, Albert Y. C. Chen, Min Sun, Cheng-Hao Kuo, Ming-Hsuan Yang

    Abstract: Inherent ambiguity in layout annotations poses significant challenges to developing accurate 360° room layout estimation models. To address this issue, we propose a novel Bi-Layout model capable of predicting two distinct layout types. One stops at ambiguous regions, while the other extends to encompass all visible areas. Our model employs two global context embeddings, where each embedding is des…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, Project page: https://liagm.github.io/Bi_Layout/

  36. arXiv:2404.09502  [pdf, other]

    cs.CV

    SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction

    Authors: Pin Tang, Zhongdao Wang, Guoqing Wang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

    Abstract: Vision-based perception for autonomous driving requires an explicit modeling of a 3D space, where 2D latent representations are mapped and subsequent 3D operators are applied. However, operating on dense latent spaces introduces a cubic time and space complexity, which limits scalability in terms of perception range or spatial resolution. Existing approaches compress the dense representation using…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, accepted by CVPR 2024

    Journal ref: IEEE Conference on Computer Vision and Pattern Recognition 2024 (CVPR 2024)

  37. arXiv:2404.07603  [pdf, other]

    cs.CV

    GLID: Pre-training a Generalist Encoder-Decoder Vision Model

    Authors: Jihao Liu, Jinliang Zheng, Yu Liu, Hongsheng Li

    Abstract: This paper proposes a GeneraLIst encoder-Decoder (GLID) pre-training method for better handling various downstream computer vision tasks. While self-supervised pre-training approaches, e.g., Masked Autoencoder, have shown success in transfer learning, task-specific sub-architectures are still required to be appended for different downstream tasks, which cannot enjoy the benefits of large-scale pre…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  38. arXiv:2404.06809  [pdf, other]

    cs.CL

    Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation

    Authors: Ruotong Pan, Boxi Cao, Hongyu Lin, Xianpei Han, Jia Zheng, Sirui Wang, Xunliang Cai, Le Sun

    Abstract: The rapid development of large language models has led to the widespread adoption of Retrieval-Augmented Generation (RAG), which integrates external knowledge to alleviate knowledge bottlenecks and mitigate hallucinations. However, the existing RAG paradigm inevitably suffers from the impact of flawed information introduced during the retrieval phase, thereby diminishing the reliability and corre…

    Submitted 8 May, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Our code, benchmark, and models are available at https://github.com/panruotong/CAG

  39. arXiv:2404.06211  [pdf, other]

    cs.CV

    Unified Physical-Digital Attack Detection Challenge

    Authors: Haocheng Yuan, Ajian Liu, Junze Zheng, Jun Wan, Jiankang Deng, Sergio Escalera, Hugo Jair Escalante, Isabelle Guyon, Zhen Lei

    Abstract: Face Anti-Spoofing (FAS) is crucial to safeguard Face Recognition (FR) Systems. In real-world scenarios, FRs are confronted with both physical and digital attacks. However, existing algorithms often address only one type of attack at a time, which poses significant limitations in real-world scenarios where FR systems face hybrid physical-digital threats. To facilitate the research of Unified Attac…

    Submitted 18 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: 11 pages, 10 figures

  40. arXiv:2404.05105  [pdf, other]

    cs.CV

    VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module

    Authors: Ziyang Wang, Jian-Qing Zheng, Chao Ma, Tao Guo

    Abstract: Image registration, a critical process in medical imaging, involves aligning different sets of medical imaging data into a single unified coordinate system. Deep learning networks, such as the Convolutional Neural Network (CNN)-based VoxelMorph, Vision Transformer (ViT)-based TransMorph, and State Space Model (SSM)-based MambaMorph, have demonstrated effective performance in this domain. The recen…

    Submitted 14 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

  41. arXiv:2404.04936  [pdf, other]

    cs.CV

    Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models

    Authors: Weiwei Cao, Jianpeng Zhang, Yingda Xia, Tony C. W. Mok, Zi Li, Xianghua Ye, Le Lu, Jian Zheng, Yuxing Tang, Ling Zhang

    Abstract: Radiologists highly desire fully automated versatile AI for medical imaging interpretation. However, the lack of extensively annotated large-scale multi-disease datasets has hindered the achievement of this goal. In this paper, we explore the feasibility of leveraging language as a naturally high-quality supervision for chest CT imaging. In light of the limited availability of image-report pairs,…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  42. arXiv:2404.04823  [pdf, other]

    cs.CV

    3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

    Authors: Weijia Li, Haote Yang, Zhenghao Hu, Juepeng Zheng, Gui-Song Xia, Conghui He

    Abstract: 3D building reconstruction from monocular remote sensing images is an important and challenging research problem that has received increasing attention in recent years, owing to its low cost of data acquisition and availability for large-scale applications. However, existing methods rely on expensive 3D-annotated samples for fully-supervised training, restricting their application to large-scale c…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  43. arXiv:2404.03248  [pdf, other]

    cs.CV

    Learning Transferable Negative Prompts for Out-of-Distribution Detection

    Authors: Tianqi Li, Guansong Pang, Xiao Bai, Wenjun Miao, Jin Zheng

    Abstract: Existing prompt learning methods have shown certain capabilities in Out-of-Distribution (OOD) detection, but the lack of OOD images in the target dataset in their training can lead to mismatches between OOD images and In-Distribution (ID) categories, resulting in a high false positive rate. To address this issue, we introduce a novel OOD detection method, named 'NegPrompt', to learn a set of negat…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  44. Fragmented Moments, Balanced Choices: How Do People Make Use of Their Waiting Time?

    Authors: Jian Zheng, Ge Gao

    Abstract: Everyone spends some time waiting every day. HCI research has developed tools for boosting productivity while waiting. However, little is known about how people naturally spend their waiting time. We conducted an experience sampling study with 21 working adults who used a mobile app to report their daily waiting time activities over two weeks. The aim of this study is to understand the activities…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 14 pages. 6 figures. Published at ACM CHI'24

    ACM Class: H.5.m

  45. arXiv:2404.02686  [pdf, other]

    cs.CV

    Design2Cloth: 3D Cloth Generation from 2D Masks

    Authors: Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou

    Abstract: In recent years, there has been a significant shift in the field of digital avatar research, towards modeling, animating and reconstructing clothed human representations, as a key step towards creating realistic avatars. However, current 3D cloth generation methods are garment specific or trained completely on synthetic data, hence lacking fine details and realism. In this work, we make a step tow…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024, Project page: https://jiali-zheng.github.io/Design2Cloth/

  46. arXiv:2404.01139  [pdf, other]

    cs.CV

    Structured Initialization for Attention in Vision Transformers

    Authors: Jianqiao Zheng, Xueqian Li, Simon Lucey

    Abstract: The training of vision transformer (ViT) networks on small-scale datasets poses a significant challenge. By contrast, convolutional neural networks (CNNs) have an architectural inductive bias enabling them to perform well on such problems. In this paper, we argue that the architectural bias inherent to CNNs can be reinterpreted as an initialization bias within ViT. This insight is significant as i…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 20 pages, 5 figures, 8 tables

  47. arXiv:2404.00451  [pdf, other]

    cs.RO

    Thin-Shell Object Manipulations With Differentiable Physics Simulations

    Authors: Yian Wang, Juntian Zheng, Zhehuan Chen, Zhou Xian, Gu Zhang, Chao Liu, Chuang Gan

    Abstract: In this work, we aim to teach robots to manipulate various thin-shell materials. Prior works studying thin-shell object manipulation mostly rely on heuristic policies or learn policies from real-world video demonstrations, and only focus on limited material types and tasks (e.g., cloth unfolding). However, these approaches face significant challenges when extended to a wider variety of thin-shell…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: ICLR 2024

  48. arXiv:2403.19615  [pdf, other]

    cs.CV

    SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing

    Authors: Xiaowei Song, Jv Zheng, Shiran Yuan, Huan-ang Gao, Jingwei Zhao, Xiang He, Weihao Gu, Hao Zhao

    Abstract: In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting needs modifying the training procedure of Gaussian splatting, our method functions at test-time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-alisin…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Project page: https://kevinsong729.github.io/project-pages/SA-GS/ Code: https://github.com/zsy1987/SA-GS

  49. arXiv:2403.17460  [pdf, other]

    eess.IV cs.CV

    Building Bridges across Spatial and Temporal Resolutions: Reference-Based Super-Resolution via Change Priors and Conditional Diffusion Model

    Authors: Runmin Dong, Shuai Yuan, Bin Luo, Mengxuan Chen, Jinxiao Zhang, Lixian Zhang, Weijia Li, Juepeng Zheng, Haohuan Fu

    Abstract: Reference-based super-resolution (RefSR) has the potential to build bridges across spatial and temporal resolutions of remote sensing images. However, existing RefSR methods are limited by the faithfulness of content reconstruction and the effectiveness of texture transfer in large scaling factors. Conditional diffusion models have opened up new opportunities for generating realistic high-resoluti…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  50. arXiv:2403.17301  [pdf, other]

    cs.CV cs.CR

    Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving

    Authors: Junhao Zheng, Chenhao Lin, Jiahao Sun, Zhengyu Zhao, Qian Li, Chao Shen

    Abstract: Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks. Previous physical attacks against MDE models rely on 2D adversarial patches, so they only affect a small, localized region in the MDE map but fail under various viewpoints. To address these limitations, we propose 3D Depth Fool (3D$^2$Fool), the first 3…

    Submitted 27 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024