Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 271 results for author: Xie, C

Searching in archive cs. Search in all archives.
.
  1. Large Kernel Distillation Network for Efficient Single Image Super-Resolution

    Authors: Chengxing Xie, Xiaoming Zhang, Linze Li, Haiteng Meng, Tianlin Zhang, Tianrui Li, Xiaole Zhao

    Abstract: Efficient and lightweight single-image super-resolution (SISR) has achieved remarkable performance in recent years. One effective approach is the use of large kernel designs, which have been shown to improve the performance of SISR models while reducing their computational requirements. However, current state-of-the-art (SOTA) models still face problems such as high computational costs. To address… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted to CVPR workshop 2023

  2. arXiv:2407.13181  [pdf, other

    cs.CV

    Training-Free Large Model Priors for Multiple-in-One Image Restoration

    Authors: Xuanhua He, Lang Li, Yingying Wang, Hui Zheng, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

    Abstract: Image restoration aims to reconstruct the latent clear images from their degraded versions. Despite the notable achievement, existing methods predominantly focus on handling specific degradation types and thus require specialized models, impeding real-world applications in dynamic degradation scenarios. To address this issue, we propose Large Model Driven Image Restoration framework (LMDIR), a nov… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  3. arXiv:2407.09274  [pdf, other

    cs.LG cs.AI q-bio.BM

    Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX

    Authors: Zhiyuan Chen, Tianhao Chen, Chenggang Xie, Yang Xue, Xiaonan Zhang, Jingbo Zhou, Xiaomin Fang

    Abstract: Proteins are fundamental components of biological systems and can be represented through various modalities, including sequences, structures, and textual descriptions. Despite the advances in deep learning and scientific large language models (LLMs) for protein research, current methodologies predominantly focus on limited specialized tasks -- often predicting one protein modality from another. Th… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  4. arXiv:2407.03314  [pdf, other

    cs.CV cs.CL cs.DB

    BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

    Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

    Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimu… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  5. arXiv:2406.17309  [pdf, other

    cs.CV

    Zero-Shot Long-Form Video Understanding through Screenplay

    Authors: Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

    Abstract: The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into textual screenplay representations. Unlike pr… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Highest Score Award to the CVPR'2024 LOVEU Track 1 Challenge

  6. arXiv:2406.16338  [pdf, other

    cs.CV

    VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

    Authors: Yuxuan Wang, Yueqian Wang, Dongyan Zhao, Cihang Xie, Zilong Zheng

    Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have extended their capabilities to video understanding. Yet, these models are often plagued by "hallucinations", where irrelevant or nonsensical content is generated, deviating from the actual video context. This work introduces VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language model… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  7. arXiv:2406.16135  [pdf, other

    cs.CL cs.LG

    Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

    Authors: Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang

    Abstract: Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks. We observe that while these models show promising surface-level crosslingual abilities on machine translation… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  8. arXiv:2406.10744  [pdf, other

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  9. arXiv:2406.09187  [pdf, other

    cs.LG

    GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

    Authors: Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Carl Yang, Dawn Song, Bo Li

    Abstract: The rapid advancement of large language models (LLMs) has catalyzed the deployment of LLM-powered agents across numerous applications, raising new concerns regarding their safety and trustworthiness. Existing methods for enhancing the safety of LLMs are not directly transferable to LLM-powered agents due to their diverse objectives and output modalities. In this paper, we propose GuardAgent, the f… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.08899  [pdf, other

    physics.soc-ph cs.SI

    ESND: An Embedding-based Framework for Signed Network Dismantling

    Authors: Chenwei Xie, Chuang Liu, Cong Li, Xiu-Xiu Zhan, Xiang Li

    Abstract: Network dismantling aims to maximize the disintegration of a network by removing a specific set of nodes or edges and is applied to various tasks in diverse domains, such as cracking down on crime organizations, delaying the propagation of rumors, and blocking the transmission of viruses. Most of the current network dismantling methods are tailored for unsigned networks, which only consider the co… ▽ More

    Submitted 21 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  11. arXiv:2406.08478  [pdf, other

    cs.CV cs.CL

    What If We Recaption Billions of Web Images with LLaMA-3?

    Authors: Xianhang Li, Haoqin Tu, Mude Hui, Zeyu Wang, Bingchen Zhao, Junfei Xiao, Sucheng Ren, Jieru Mei, Qing Liu, Huangjie Zheng, Yuyin Zhou, Cihang Xie

    Abstract: Web-crawled image-text pairs are inherently noisy. Prior studies demonstrate that semantically aligning and enriching textual descriptions of these pairs can significantly enhance model training across various vision-language tasks, particularly text-to-image generation. However, large-scale investigations in this area remain predominantly closed-source. Our paper aims to bridge this community eff… ▽ More

    Submitted 18 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

    Comments: First five authors contributed equally

  12. arXiv:2406.07537  [pdf, other

    cs.CV

    Autoregressive Pretraining with Mamba in Vision

    Authors: Sucheng Ren, Xianhang Li, Haoqin Tu, Feng Wang, Fangxun Shu, Lei Zhang, Jieru Mei, Linjie Yang, Peng Wang, Heng Wang, Alan Yuille, Cihang Xie

    Abstract: The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability can be significantly enhanced through autoregressive pretraining, a direction not previously explored. Efficiency-wise, the autoregressive nature can well capitalize on the Mamba's unidirectional recurrent structur… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  13. arXiv:2406.07112  [pdf, ps, other

    cs.IT

    Linear Codes from Projective Linear Anticodes Revisited

    Authors: Hao Chen, Conghui Xie

    Abstract: An anticode ${\bf C} \subset {\bf F}_q^n$ with the diameter $δ$ is a code in ${\bf F}_q^n$ such that the distance between any two distinct codewords in ${\bf C}$ is at most $δ$. The famous Erdös-Kleitman bound for a binary anticode ${\bf C}$ of the length $n$ and the diameter $δ$ asserts that $$|{\bf C}| \leq Σ_{i=0}^{\fracδ{2}} \displaystyle{n \choose i}.$$ In this paper, we give an antiGriesmer… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 38 pages, submitted

  14. arXiv:2405.20299  [pdf, other

    cs.CV

    Scaling White-Box Transformers for Vision

    Authors: Jinrui Yang, Xianhang Li, Druv Pai, Yuyin Zhou, Yi Ma, Yaodong Yu, Cihang Xie

    Abstract: CRATE, a white-box transformer architecture designed to learn compressed and sparse representations, offers an intriguing alternative to standard vision transformers (ViTs) due to its inherent mathematical interpretability. Despite extensive investigations into the scaling behaviors of language and vision transformers, the scalability of CRATE remains an open question which this paper aims to addr… ▽ More

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: project page: https://rayjryang.github.io/CRATE-alpha/

  15. arXiv:2405.18756  [pdf, other

    cs.LG cs.AI cs.CV stat.AP stat.ML

    Provable Contrastive Continual Learning

    Authors: Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang

    Abstract: Continual learning requires learning incremental tasks with dynamic data distributions. So far, it has been observed that employing a combination of contrastive loss and distillation loss for training in continual learning yields strong performance. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  16. arXiv:2405.18208  [pdf, other

    cs.AI cs.CL cs.LG

    A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models

    Authors: Chengxing Xie, Difan Zou

    Abstract: Recent studies have highlighted their proficiency in some simple tasks like writing and coding through various reasoning strategies. However, LLM agents still struggle with tasks that require comprehensive planning, a process that challenges current models and remains a critical research issue. In this study, we concentrate on travel planning, a Multi-Phases planning problem, that involves multipl… ▽ More

    Submitted 28 May, 2024; originally announced May 2024.

  17. arXiv:2405.15388  [pdf, other

    cs.AI cs.RO

    Language-Driven Interactive Traffic Trajectory Generation

    Authors: Junkai Xia, Chenxin Xu, Qingyao Xu, Chen Xie, Yanfeng Wang, Siheng Chen

    Abstract: Realistic trajectory generation with natural language control is pivotal for advancing autonomous vehicle technology. However, previous methods focus on individual traffic participant trajectory generation, thus failing to account for the complexity of interactive traffic dynamics. In this work, we propose InteractTraj, the first language-driven traffic trajectory generator that can generate inter… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  18. arXiv:2405.15160  [pdf, other

    cs.CV

    ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

    Authors: Sucheng Ren, Hongru Zhu, Chen Wei, Yijiang Li, Alan Yuille, Cihang Xie

    Abstract: This paper presents a new self-supervised video representation learning framework, ARVideo, which autoregressively predicts the next video token in a tailored sequence order. Two key designs are included. First, we organize autoregressive video tokens into clusters that span both spatially and temporally, thereby enabling a richer aggregation of contextual information compared to the standard spat… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  19. arXiv:2405.14858  [pdf, other

    cs.CV

    Mamba-R: Vision Mamba ALSO Needs Registers

    Authors: Feng Wang, Jiahao Wang, Sucheng Ren, Guoyizhe Wei, Jieru Mei, Wei Shao, Yuyin Zhou, Alan Yuille, Cihang Xie

    Abstract: Similar to Vision Transformers, this paper identifies artifacts also present within the feature maps of Vision Mamba. These artifacts, corresponding to high-norm tokens emerging in low-information background areas of images, appear much more severe in Vision Mamba -- they exist prevalently even with the tiny-sized model and activate extensively across background regions. To mitigate this issue, we… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  20. arXiv:2405.08284  [pdf

    econ.EM cs.LG stat.AP

    Predicting NVIDIA's Next-Day Stock Price: A Comparative Analysis of LSTM, MLP, ARIMA, and ARIMA-GARCH Models

    Authors: Yiluan Xing, Chao Yan, Cathy Chang Xie

    Abstract: Forecasting stock prices remains a considerable challenge in financial markets, bearing significant implications for investors, traders, and financial institutions. Amid the ongoing AI revolution, NVIDIA has emerged as a key player driving innovation across various sectors. Given its prominence, we chose NVIDIA as the subject of our study.

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures, 2 tables, conference paper

  21. arXiv:2405.06929  [pdf, other

    cs.CV

    PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition

    Authors: Shenglin He, Xiaoyang Qu, Jiguang Wan, Guokuan Li, Changsheng Xie, Jianzong Wang

    Abstract: Recognizing human actions from point cloud sequence has attracted tremendous attention from both academia and industry due to its wide applications. However, most previous studies on point cloud action recognition typically require complex networks to extract intra-frame spatial features and inter-frame temporal features, resulting in an excessive number of redundant computations. This leads to hi… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  22. arXiv:2404.18438  [pdf, ps, other

    cs.IT

    Two classes of constacyclic codes with a square-root-like lower bound

    Authors: Tingfang Chen, Zhonghua Sun, Conghui Xie, Hao Chen, Cunsheng Ding

    Abstract: Constacyclic codes over finite fields are an important class of linear codes as they contain distance-optimal codes and linear codes with best known parameters. They are interesting in theory and practice, as they have the constacyclic structure. In this paper, an infinite class of $q$-ary negacyclic codes of length $(q^m-1)/2$ and an infinite class of $q$-ary constacyclic codes of length… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  23. arXiv:2404.14693  [pdf, other

    cs.CR cs.CV eess.IV

    Double Privacy Guard: Robust Traceable Adversarial Watermarking against Face Recognition

    Authors: Yunming Zhang, Dengpan Ye, Sipeng Shen, Caiyun Xie, Ziyi Liu, Jiacheng Deng, Long Tang

    Abstract: The wide deployment of Face Recognition (FR) systems poses risks of privacy leakage. One countermeasure to address this issue is adversarial attacks, which deceive malicious FR searches but simultaneously interfere the normal identity verification of trusted authorizers. In this paper, we propose the first Double Privacy Guard (DPG) scheme based on traceable adversarial watermarking. DPG employs a… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  24. arXiv:2404.09990  [pdf, other

    cs.CV cs.AI

    HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

    Authors: Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie

    Abstract: This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200,000 edits. Unlike prior approaches relying on attribute guidance or human feedback on building datasets, we devise a scalable data collection pipeline leveraging advanced foundation models, namely GPT-4V and DALL-E 3. To ensure its high quality, diverse examples are first collected online, expande… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Project Page: https://thefllood.github.io/HQEdit_web

  25. arXiv:2404.08197  [pdf, other

    cs.CV

    Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

    Authors: Zichao Li, Cihang Xie, Ekin Dogus Cubuk

    Abstract: This paper investigates the performance of the Contrastive Language-Image Pre-training (CLIP) when scaled down to limited computation budgets. We explore CLIP along three dimensions: data, architecture, and training strategies. With regards to data, we demonstrate the significance of high-quality training data and show that a smaller dataset of high-quality data can outperform a larger dataset wit… ▽ More

    Submitted 15 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  26. arXiv:2404.07103  [pdf, other

    cs.CL cs.IR cs.LG

    Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs

    Authors: Bowen Jin, Chulin Xie, Jiawei Zhang, Kashob Kumar Roy, Yu Zhang, Zheng Li, Ruirui Li, Xianfeng Tang, Suhang Wang, Yu Meng, Jiawei Han

    Abstract: Large language models (LLMs), while exhibiting exceptional performance, suffer from hallucinations, especially on knowledge-intensive tasks. Existing works propose to augment LLMs with individual text units retrieved from external knowledge corpora to alleviate the issue. However, in many domains, texts are interconnected (e.g., academic papers in a bibliographic graph are linked by citations and… ▽ More

    Submitted 15 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: 21 pages. Code: https://github.com/PeterGriffinJin/Graph-CoT

  27. arXiv:2404.02478  [pdf, other

    cs.LG cs.AI

    FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning

    Authors: Rishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, Aviv Shamsian

    Abstract: Standard federated learning approaches suffer when client data distributions have sufficient heterogeneity. Recent methods addressed the client data heterogeneity issue via personalized federated learning (PFL) - a class of FL algorithms aiming to personalize learned global knowledge to better suit the clients' local data distributions. Existing PFL methods usually decouple global updates in deep… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Published in CVPR 2024

  28. arXiv:2403.15839  [pdf, other

    cs.LG cs.DB cs.DC

    TablePuppet: A Generic Framework for Relational Federated Learning

    Authors: Lijie Xu, Chulin Xie, Yiran Guo, Gustavo Alonso, Bo Li, Guoliang Li, Wei Wang, Wentao Wu, Ce Zhang

    Abstract: Current federated learning (FL) approaches view decentralized training data as a single table, divided among participants either horizontally (by rows) or vertically (by columns). However, these approaches are inadequate for handling distributed relational tables across databases. This scenario requires intricate SQL operations like joins and unions to obtain the training data, which is either cos… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 14 pages, 8 figures

  29. arXiv:2403.15735  [pdf, other

    eess.IV cs.CV

    3D-TransUNet for Brain Metastases Segmentation in the BraTS2023 Challenge

    Authors: Siwei Yang, Xianhang Li, Jieru Mei, Jieneng Chen, Cihang Xie, Yuyin Zhou

    Abstract: Segmenting brain tumors is complex due to their diverse appearances and scales. Brain metastases, the most common type of brain tumor, are a frequent complication of cancer. Therefore, an effective segmentation model for brain metastases must adeptly capture local intricacies to delineate small tumor regions while also integrating global context to understand broader scan features. The TransUNet m… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

  30. arXiv:2403.15447  [pdf, other

    cs.CL cs.AI

    Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

    Authors: Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, Kelsey Lieberman, James Diffenderfer, Brian Bartoldson, Ajay Jaiswal, Kaidi Xu, Bhavya Kailkhura, Dan Hendrycks, Dawn Song, Zhangyang Wang, Bo Li

    Abstract: Compressing high-capability Large Language Models (LLMs) has emerged as a favored strategy for resource-efficient inferences. While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected. This study conducts the first, thorough evaluation o… ▽ More

    Submitted 4 June, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to ICML'24

  31. arXiv:2403.13064  [pdf, other

    cs.CV

    SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model

    Authors: Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas

    Abstract: We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers & LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: see project page, https://projectaria.com/scenescript

  32. arXiv:2403.10803  [pdf, other

    cs.LG cs.AI cs.CV

    Enhancing Out-of-Distribution Detection with Multitesting-based Layer-wise Feature Fusion

    Authors: Jiawei Li, Sitong Li, Shanshan Wang, Yicheng Zeng, Falong Tan, Chuanlong Xie

    Abstract: Deploying machine learning in open environments presents the challenge of encountering diverse test inputs that differ significantly from the training data. These out-of-distribution samples may exhibit shifts in local or global features compared to the training distribution. The machine learning (ML) community has responded with a number of methods aimed at distinguishing anomalous inputs from or… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

  33. arXiv:2403.01749  [pdf, other

    cs.CL

    Differentially Private Synthetic Data via Foundation Model APIs 2: Text

    Authors: Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee, Bo Li, Sergey Yekhanin

    Abstract: Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalab… ▽ More

    Submitted 23 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: ICML'24 Spotlight

  34. arXiv:2402.15627  [pdf, other

    cs.LG cs.DC

    MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

    Authors: Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao , et al. (7 additional authors not shown)

    Abstract: We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPUs. Training LLMs at this scale brings unprecedented challenges to training efficiency and stability. We take a full-stack approach that co-designs the algorithmic and system components across model bl… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  35. arXiv:2402.12192  [pdf, other

    cs.CV

    Pan-Mamba: Effective pan-sharpening with State Space Model

    Authors: Xuanhua He, Ke Cao, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

    Abstract: Pan-sharpening involves integrating information from low-resolution multi-spectral and high-resolution panchromatic images to generate high-resolution multi-spectral counterparts. While recent advancements in the state space model, particularly the efficient long-range dependency modeling achieved by Mamba, have revolutionized computer vision community, its untapped potential in pan-sharpening mot… ▽ More

    Submitted 8 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  36. arXiv:2402.09404  [pdf, other

    cs.CL cs.AI cs.LG

    AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability

    Authors: Siwei Yang, Bingchen Zhao, Cihang Xie

    Abstract: This paper introduces AQA-Bench, a novel benchmark to assess the sequential reasoning capabilities of large language models (LLMs) in algorithmic contexts, such as depth-first search (DFS). The key feature of our evaluation benchmark lies in its interactive evaluation protocol -- for example, in DFS, the availability of each node's connected edge is contingent upon the model's traversal to that no… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

  37. arXiv:2402.04559  [pdf, other

    cs.AI cs.CL cs.HC

    Can Large Language Model Agents Simulate Human Trust Behaviors?

    Authors: Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Kai Shu, Adel Bibi, Ziniu Hu, Philip Torr, Bernard Ghanem, Guohao Li

    Abstract: Large Language Model (LLM) agents have been increasingly adopted as simulation tools to model humans in applications such as social science. However, one fundamental question remains: can LLM agents really simulate human behaviors? In this paper, we focus on one of the most critical behaviors in human interactions, trust, and aim to investigate whether or not LLM agents can simulate human trust be… ▽ More

    Submitted 10 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: The first two authors contributed equally. Project website: https://www.camel-ai.org/research/agent-trust

  38. arXiv:2402.02853  [pdf, ps, other

    cs.IT

    Repeated-Root Cyclic Codes with Optimal Parameters or Best Parameters Known

    Authors: Hao Chen, Conghui Xie, Cunsheng Ding

    Abstract: Cyclic codes are the most studied subclass of linear codes and widely used in data storage and communication systems. Many cyclic codes have optimal parameters or the best parameters known. They are divided into simple-root cyclic codes and repeated-root cyclic codes. Although there are a huge number of references on cyclic codes, few of them are on repeated-root cyclic codes. Hence, repeated-root… ▽ More

    Submitted 22 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: 27 pages

  39. arXiv:2402.01226  [pdf, other

    cs.LG cs.AR

    HW-SW Optimization of DNNs for Privacy-preserving People Counting on Low-resolution Infrared Arrays

    Authors: Matteo Risso, Chen Xie, Francesco Daghero, Alessio Burrello, Seyedmorteza Mollaei, Marco Castellano, Enrico Macii, Massimo Poncino, Daniele Jahier Pagliari

    Abstract: Low-resolution infrared (IR) array sensors enable people counting applications such as monitoring the occupancy of spaces and people flows while preserving privacy and minimizing energy consumption. Deep Neural Networks (DNNs) have been shown to be well-suited to process these sensor data in an accurate and efficient manner. Nevertheless, the space of DNNs' architectures is huge and its manual exp… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted for publication in the DATE 2024 conference IEEE

  40. arXiv:2401.17895  [pdf, other

    cs.CV cs.AI cs.GR

    ReplaceAnything3D:Text-Guided 3D Scene Editing with Compositional Neural Radiance Fields

    Authors: Edward Bartrum, Thu Nguyen-Phuoc, Chris Xie, Zhengqin Li, Numair Khan, Armen Avetisyan, Douglas Lanman, Lei Xiao

    Abstract: We introduce ReplaceAnything3D model (RAM3D), a novel text-guided 3D scene editing method that enables the replacement of specific objects within a scene. Given multi-view images of a scene, a text prompt describing the object to replace, and a text prompt describing the new object, our Erase-and-Replace approach can effectively swap objects in the scene with newly generated content while maintain… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: For our project page, see https://replaceanything3d.github.io/

  41. arXiv:2401.17268  [pdf, other

    cs.CL cs.AI cs.LG

    Weaver: Foundation Models for Creative Writing

    Authors: Tiannan Wang, Jiamin Chen, Qingrui Jia, Shuai Wang, Ruoyu Fang, Huilin Wang, Zhaowei Gao, Chunzhao Xie, Chuou Xu, Jihong Dai, Yibin Liu, Jialong Wu, Shengwei Ding, Long Li, Zhiwei Huang, Xinle Deng, Teng Yu, Gangan Ma, Han Xiao, Zixin Chen, Danjun Xiang, Yunxia Wang, Yuanyuan Zhu, Yi Xiao, Jing Wang , et al. (21 additional authors not shown)

    Abstract: This work introduces Weaver, our first family of large language models (LLMs) dedicated to content creation. Weaver is pre-trained on a carefully selected corpus that focuses on improving the writing capabilities of large language models. We then fine-tune Weaver for creative and professional writing purposes and align it to the preference of professional writers using a suit of novel methods for… ▽ More

    Submitted 30 January, 2024; originally announced January 2024.

  42. arXiv:2401.15708  [pdf, other

    cs.CV

    Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with Prototypical Embedding

    Authors: Jianxiang Lu, Cong Xie, Hui Guo

    Abstract: As large-scale text-to-image generation models have made remarkable progress in the field of text-to-image generation, many fine-tuning methods have been proposed. However, these models often struggle with novel objects, especially with one-shot scenarios. Our proposed method aims to address the challenges of generalizability and fidelity in an object-driven way, using only a single input image an… ▽ More

    Submitted 28 January, 2024; originally announced January 2024.

  43. arXiv:2401.10642  [pdf, other

    cs.SI cs.AI

    Fast Butterfly-Core Community Search For Large Labeled Graphs

    Authors: JiaYi Du, Yinghao Wu, Wei Ai, Tao Meng, CanHao Xie, KeQin Li

    Abstract: Community Search (CS) aims to identify densely interconnected subgraphs corresponding to query vertices within a graph. However, existing heterogeneous graph-based community search methods need help identifying cross-group communities and suffer from efficiency issues, making them unsuitable for large graphs. This paper presents a fast community search model based on the Butterfly-Core Community (… ▽ More

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 8 pages, 8 figures

  44. arXiv:2401.10641  [pdf, other

    cs.SI cs.AI

    An Effective Index for Truss-based Community Search on Large Directed Graphs

    Authors: Wei Ai, CanHao Xie, Tao Meng, Yinghao Wu, KeQin Li

    Abstract: Community search is a derivative of community detection that enables online and personalized discovery of communities and has found extensive applications in massive real-world networks. Recently, there needs to be more focus on the community search issue within directed graphs, even though substantial research has been carried out on undirected graphs. The recently proposed D-truss model has achi… ▽ More

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 8 pages, 8figures

  45. arXiv:2401.06445  [pdf, other

    physics.soc-ph cs.SI

    Directed network comparison using motifs

    Authors: Chenwei Xie, Qiao Ke, Haoyu Chen, Chuang Liu, Xiu-Xiu Zhan

    Abstract: Analyzing and characterizing the differences between networks is a fundamental and challenging problem in network science. Previously, most network comparison methods that rely on topological properties have been restricted to measuring differences between two undirected networks. However, many networks, such as biological networks, social networks, and transportation networks, exhibit inherent di… ▽ More

    Submitted 12 January, 2024; originally announced January 2024.

  46. arXiv:2401.04885  [pdf, ps, other

    cs.IT

    Cyclic and Negacyclic Sum-Rank Codes

    Authors: Hao Chen, Cunsheng Ding, Zhiqiang Cheng, Conghui Xie

    Abstract: Sum-rank codes have known applications in the multishot network coding, the distributed storage and the construction of space-time codes. U. Martínez-Peñas introduced the cyclic-skew-cyclic sum-rank codes and proposed the BCH bound on the cyclic-skew-cyclic sum-rank codes in his paper published in IEEE Trans. Inf. Theory, vol. 67, no. 8, 2021. Afterwards, many sum-rank BCH codes with lower bounds… ▽ More

    Submitted 8 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  47. arXiv:2401.04727  [pdf, other

    cs.CV

    Revisiting Adversarial Training at Scale

    Authors: Zeyu Wang, Xianhang Li, Hongru Zhu, Cihang Xie

    Abstract: The machine learning community has witnessed a drastic change in the training pipeline, pivoted by those ''foundation models'' with unprecedented scales. However, the field of adversarial training is lagging behind, predominantly centered around small model sizes like ResNet-50, and tiny and low-resolution datasets like CIFAR-10. To bridge this transformation gap, this paper provides a modern re-e… ▽ More

    Submitted 21 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted by CVPR 2024

  48. arXiv:2401.02931  [pdf, other

    cs.CV

    SPFormer: Enhancing Vision Transformer with Superpixel Representation

    Authors: Jieru Mei, Liang-Chieh Chen, Alan Yuille, Cihang Xie

    Abstract: In this work, we introduce SPFormer, a novel Vision Transformer enhanced by superpixel representation. Addressing the limitations of traditional Vision Transformers' fixed-size, non-adaptive patch partitioning, SPFormer employs superpixels that adapt to the image's content. This approach divides the image into irregular, semantically coherent regions, effectively capturing intricate details and ap… ▽ More

    Submitted 5 January, 2024; originally announced January 2024.

  49. arXiv:2401.02161  [pdf, other

    cs.CV

    Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain

    Authors: Xuanhua He, Tao Hu, Guoli Wang, Zejin Wang, Run Wang, Qian Zhang, Keyu Yan, Ziyi Chen, Rui Li, Chenjun Xie, Jie Zhang, Man Zhou

    Abstract: RAW to sRGB mapping, which aims to convert RAW images from smartphones into RGB form equivalent to that of Digital Single-Lens Reflex (DSLR) cameras, has become an important area of research. However, current methods often ignore the difference between cell phone RAW images and DSLR camera RGB images, a difference that goes beyond the color matrix and extends to spatial structure due to resolution… ▽ More

    Submitted 4 January, 2024; originally announced January 2024.

  50. arXiv:2401.02151  [pdf, other

    cs.CV

    Frequency-Adaptive Pan-Sharpening with Mixture of Experts

    Authors: Xuanhua He, Keyu Yan, Rui Li, Chengjun Xie, Jie Zhang, Man Zhou

    Abstract: Pan-sharpening involves reconstructing missing high-frequency information in multi-spectral images with low spatial resolution, using a higher-resolution panchromatic image as guidance. Although the inborn connection with frequency domain, existing pan-sharpening research has not almost investigated the potential solution upon frequency domain. To this end, we propose a novel Frequency Adaptive Mi… ▽ More

    Submitted 4 January, 2024; originally announced January 2024.