Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 974 results for author: Tian, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.04656  [pdf, other

    cs.DC cs.LG

    Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement

    Authors: Yongji Wu, Wenjie Qu, Tianyang Tao, Zhuang Wang, Wei Bai, Zhuohao Li, Yuan Tian, Jiaheng Zhang, Matthew Lentz, Danyang Zhuo

    Abstract: Sparsely-activated Mixture-of-Experts (MoE) architecture has increasingly been adopted to further scale large language models (LLMs) due to its sub-linear scaling for computation costs. However, frequent failures still pose significant challenges as training scales. The cost of even a single failure is significant, as all GPUs need to wait idle until the failure is resolved, potentially losing con… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  2. arXiv:2407.04169  [pdf, other

    cs.CV cs.CR

    Solutions to Deepfakes: Can Camera Hardware, Cryptography, and Deep Learning Verify Real Images?

    Authors: Alexander Vilesov, Yuan Tian, Nader Sehatbakhsh, Achuta Kadambi

    Abstract: The exponential progress in generative AI poses serious implications for the credibility of all real images and videos. There will exist a point in the future where 1) digital content produced by generative AI will be indistinguishable from those created by cameras, 2) high-quality generative algorithms will be accessible to anyone, and 3) the ratio of all synthetic to real images will be large. I… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  3. arXiv:2407.03736  [pdf, other

    cs.SD cs.CV cs.LG cs.MM eess.AS

    Semantic Grouping Network for Audio Source Separation

    Authors: Shentong Mo, Yapeng Tian

    Abstract: Recently, audio-visual separation approaches have taken advantage of the natural synchronization between the two modalities to boost audio source separation performance. They extracted high-level semantics from visual inputs as the guidance to help disentangle sound representation for individual sources. Can we directly learn to disentangle the individual semantics from the sound itself? The dilem… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  4. arXiv:2407.00676  [pdf, other

    cs.CV

    Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation

    Authors: Yuchuan Tian, Jianhong Han, Hanting Chen, Yuanyuan Xi, Guoyang Zhang, Jie Hu, Chao Xu, Yunhe Wang

    Abstract: Due to the unaffordable size and intensive computation costs of low-level vision models, All-in-One models that are designed to address a handful of low-level vision tasks simultaneously have been popular. However, existing All-in-One models are limited in terms of the range of tasks and performance. To overcome these limitations, we propose Instruct-IPT -- an All-in-One Image Processing Transform… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 15 pages, 4 figures

  5. arXiv:2407.00617  [pdf, other

    cs.LG cs.AI cs.CL cs.GT

    Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning

    Authors: Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu

    Abstract: Reinforcement Learning with Human Feedback (RLHF) has achieved great success in aligning large language models (LLMs) with human preferences. Prevalent RLHF approaches are reward-based, following the Bradley-Terry (BT) model assumption, which may not fully capture the complexity of human preferences. In this paper, we explore RLHF under a general preference framework and approach it from a game-th… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  6. arXiv:2407.00320  [pdf, other

    cs.CL cs.AI cs.LG

    LiteSearch: Efficacious Tree Search for LLM

    Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Dian Yu, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: Recent research suggests that tree search algorithms (e.g. Monte Carlo Tree Search) can dramatically boost LLM performance on complex mathematical reasoning tasks. However, they often require more than 10 times the computational resources of greedy decoding due to wasteful search strategies, making them difficult to be deployed in practical applications. This study introduces a novel guided tree s… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  7. arXiv:2406.19640  [pdf, other

    cs.CV

    Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

    Authors: Quanmin Liang, Zhilin Huang, Xiawu Zheng, Feidiao Yang, Jun Peng, Kai Huang, Yonghong Tian

    Abstract: Current Event Stream Super-Resolution (ESR) methods overlook the redundant and complementary information present in positive and negative events within the event stream, employing a direct mixing approach for super-resolution, which may lead to detail loss and inefficiency. To address these issues, we propose an efficient Recursive Multi-Branch Information Fusion Network (RMFNet) that separates po… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  8. arXiv:2406.18845  [pdf, other

    cs.CV cs.AI cs.NE

    Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition

    Authors: Lan Chen, Dong Li, Xiao Wang, Pengpeng Shao, Wei Zhang, Yaowei Wang, Yonghong Tian, Jin Tang

    Abstract: Existing event stream-based pattern recognition models usually represent the event stream as the point cloud, voxel, image, etc., and design various deep neural networks to learn their features. Although considerable results can be achieved in simple cases, however, the model performance may be limited by monotonous modality expressions, sub-optimal fusion, and readout mechanisms. In this paper, w… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: In Peer Review, Journal Extension of PRCV 2023

  9. arXiv:2406.15034  [pdf, other

    cs.CV

    SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition

    Authors: Liutao Yu, Liwei Huang, Chenlin Zhou, Han Zhang, Zhengyu Ma, Huihui Zhou, Yonghong Tian

    Abstract: Video action recognition (VAR) plays crucial roles in various domains such as surveillance, healthcare, and industrial automation, making it highly significant for the society. Consequently, it has long been a research spot in the computer vision field. As artificial neural networks (ANNs) are flourishing, convolution neural networks (CNNs), including 2D-CNNs and 3D-CNNs, as well as variants of th… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI 2024 workshop - Human Brain and Artificial Intelligence

  10. arXiv:2406.13452  [pdf, other

    quant-ph cs.DM cs.SI physics.comp-ph physics.soc-ph

    Quantum Networks: from Multipartite Entanglement to Hypergraph Immersion

    Authors: Yu Tian, Yuefei Liu, Xiangyi Meng

    Abstract: Multipartite entanglement, a higher-order interaction unique to quantum information, offers various advantages over bipartite entanglement in quantum network (QN) applications. Establishing multipartite entanglement across remote parties in QN requires entanglement routing, which irreversibly transforms the QN topology at the cost of existing entanglement links. Here, we address the question of wh… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures, 1 table

  11. arXiv:2406.11838  [pdf, other

    cs.CV

    Autoregressive Image Generation without Vector Quantization

    Authors: Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, Kaiming He

    Abstract: Conventional wisdom holds that autoregressive models for image generation are typically accompanied by vector-quantized tokens. We observe that while a discrete-valued space can facilitate representing a categorical distribution, it is not a necessity for autoregressive modeling. In this work, we propose to model the per-token probability distribution using a diffusion procedure, which allows us t… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Tech report

  12. arXiv:2406.11327  [pdf, other

    cs.CV

    ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding

    Authors: Tianren Ma, Lingxi Xie, Yunjie Tian, Boyu Yang, Yuan Zhang, David Doermann, Qixiang Ye

    Abstract: An essential topic for multimodal large language models (MLLMs) is aligning vision and language concepts at a finer level. In particular, we devote efforts to encoding visual referential information for tasks such as referring and grounding. Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location, bringing extra burdens in tra… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://github.com/martian422/ClawMachine

  13. arXiv:2406.09907  [pdf, other

    cs.SI math.DS math.SP physics.soc-ph

    Balance with Memory in Signed Networks via Mittag-Leffler Matrix Functions

    Authors: Yu Tian, Ernesto Estrada

    Abstract: Structural balance is an important characteristic of graphs/networks where edges can be positive or negative, with direct impact on the study of real-world complex systems. When a network is not structurally balanced, it is important to know how much balance still exists in it. Although several measures have been proposed to characterize the degree of balance, the use of matrix functions of the si… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 25 pages, 8 figures, 4 tables

    MSC Class: 05C22; 05C38; 05C50; 37E25; 91D30; 94C15

  14. arXiv:2406.08034  [pdf, other

    physics.soc-ph cs.LG cs.SI math.DS

    Strong and Weak Random Walks on Signed Networks

    Authors: Shazia'Ayn Babul, Yu Tian, Renaud Lambiotte

    Abstract: Random walks play an important role in probing the structure of complex networks. On traditional networks, they can be used to extract community structure, understand node centrality, perform link prediction, or capture the similarity between nodes. On signed networks, where the edge weights can be either positive or negative, it is non-trivial to design a random walk which can be used to extract… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  15. arXiv:2406.07944  [pdf, other

    cs.SE cs.AI

    DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis

    Authors: Meiziniu Li, Dongze Li, Jianmeng Liu, Jialun Cao, Yongqiang Tian, Shing-Chi Cheung

    Abstract: Testing is a major approach to ensuring the quality of deep learning (DL) libraries. Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction. However, these techniques are limited in finding implementations that offer the same functionality and generating diverse test inputs for differential testing. This paper introduces DLLens, a novel dif… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

    ACM Class: D.2.5; I.2.5

  16. arXiv:2406.07686  [pdf, other

    cs.CV

    AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

    Authors: Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian

    Abstract: Recent Diffusion Transformers (DiTs) have shown impressive capabilities in generating high-quality single-modality content, including images, videos, and audio. However, it is still under-explored whether the transformer-based diffuser can efficiently denoise the Gaussian noises towards superb multimodal content creation. To bridge this gap, we introduce AV-DiT, a novel and efficient audio-visual… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  17. arXiv:2406.06326  [pdf, other

    cs.CL

    Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching

    Authors: Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Yipeng Zhang, Haitao Mi, Helen Meng

    Abstract: Large language models (LLMs) often struggle to provide up-to-date information due to their one-time training and the constantly evolving nature of the world. To keep LLMs current, existing approaches typically involve continued pre-training on new documents. However, they frequently face difficulties in extracting stored knowledge. Motivated by the remarkable success of the Feynman Technique in ef… ▽ More

    Submitted 15 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 30 pages

  18. arXiv:2406.06087  [pdf, other

    cs.CV

    GAIA: Rethinking Action Quality Assessment for AI-Generated Videos

    Authors: Zijian Chen, Wei Sun, Yuan Tian, Jun Jia, Zicheng Zhang, Jiarui Wang, Ru Huang, Xiongkuo Min, Guangtao Zhai, Wenjun Zhang

    Abstract: Assessing action quality is both imperative and challenging due to its significant impact on the quality of AI-generated videos, further complicated by the inherently ambiguous nature of actions within AI-generated video (AIGV). Current action quality assessment (AQA) algorithms predominantly focus on actions from real specific scenarios and are pre-trained with normative action features, thus ren… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 28 pages, 13 figures

  19. arXiv:2406.05392  [pdf, other

    cs.CL cs.AI cs.CY cs.LG

    Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas

    Authors: Chengyuan Deng, Yiqun Duan, Xin Jin, Heng Chang, Yijun Tian, Han Liu, Henry Peng Zou, Yiqiao Jin, Yijia Xiao, Yichen Wang, Shenghao Wu, Zongxing Xie, Kuofeng Gao, Sihong He, Jun Zhuang, Lu Cheng, Haohan Wang

    Abstract: Large Language Models (LLMs) have achieved unparalleled success across diverse language modeling tasks in recent years. However, this progress has also intensified ethical concerns, impacting the deployment of LLMs in everyday contexts. This paper provides a comprehensive survey of ethical challenges associated with LLMs, from longstanding issues such as copyright infringement, systematic bias, an… ▽ More

    Submitted 8 June, 2024; originally announced June 2024.

  20. arXiv:2406.05317  [pdf, other

    cs.LG cs.CL

    LoCoCo: Dropping In Convolutions for Long Context Compression

    Authors: Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen

    Abstract: This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for Long Context Compression (LoCoCo). LoCoCo employs only a fixed-size Key-Value (KV) cache, and can enhance efficiency in both inference and fine-tuning stages. Diverging from prior methods that selectively drop KV pairs based on heur… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  21. arXiv:2406.04930  [pdf, other

    cs.CV cs.MM cs.SD eess.AS

    MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers

    Authors: Tanvir Mahmud, Shentong Mo, Yapeng Tian, Diana Marculescu

    Abstract: Recent advances in pre-trained vision transformers have shown promise in parameter-efficient audio-visual learning without audio pre-training. However, few studies have investigated effective methods for aligning multimodal features in parameter-efficient audio-visual transformers. In this paper, we propose MA-AVT, a new parameter-efficient audio-visual transformer employing deep modality alignmen… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted in Efficient Deep Learning for Computer Vision CVPR Workshop 2024

  22. arXiv:2406.04765  [pdf, other

    cs.CV cs.MM

    SMC++: Masked Learning of Unsupervised Video Semantic Compression

    Authors: Yuan Tian, Guo Lu, Guangtao Zhai

    Abstract: Most video compression methods focus on human visual perception, neglecting semantic preservation. This leads to severe semantic loss during the compression, hampering downstream video analysis tasks. In this paper, we propose a Masked Video Modeling (MVM)-powered compression framework that particularly preserves video semantics, by jointly mining and compressing the semantics in a self-supervised… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  23. arXiv:2406.04277  [pdf, other

    cs.CV

    VideoTetris: Towards Compositional Text-to-Video Generation

    Authors: Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui

    Abstract: Diffusion models have demonstrated great success in text-to-video (T2V) generation. However, existing methods may face challenges when handling complex (long) video generation scenarios that involve multiple objects or dynamic changes in object numbers. To address these limitations, we propose VideoTetris, a novel framework that enables compositional T2V generation. Specifically, we propose spatio… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/YangLing0818/VideoTetris

  24. arXiv:2406.02559  [pdf, other

    cs.CV

    ShadowRefiner: Towards Mask-free Shadow Removal via Fast Fourier Transformer

    Authors: Wei Dong, Han Zhou, Yuqiong Tian, Jingke Sun, Xiaohong Liu, Guangtao Zhai, Jun Chen

    Abstract: Shadow-affected images often exhibit pronounced spatial discrepancies in color and illumination, consequently degrading various vision applications including object detection and segmentation systems. To effectively eliminate shadows in real-world images while preserving intricate details and producing visually compelling outcomes, we introduce a mask-free Shadow Removal and Refinement network (Sh… ▽ More

    Submitted 2 July, 2024; v1 submitted 17 April, 2024; originally announced June 2024.

    Comments: Accepted by CVPR workshop 2024 (NTIRE 2024); Corrected references

  25. arXiv:2406.02554  [pdf, other

    eess.AS cs.AI cs.CL cs.CV cs.LG cs.MM

    Hear Me, See Me, Understand Me: Audio-Visual Autism Behavior Recognition

    Authors: Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian

    Abstract: In this article, we introduce a novel problem of audio-visual autism behavior recognition, which includes social behavior recognition, an essential aspect previously omitted in AI-assisted autism screening research. We define the task at hand as one that is audio-visual autism behavior recognition, which uses audio and visual cues, including any speech present in the audio, to recognize autism-rel… ▽ More

    Submitted 22 March, 2024; originally announced June 2024.

  26. arXiv:2406.01914  [pdf, other

    cs.CV cs.AI cs.CL

    HPE-CogVLM: New Head Pose Grounding Task Exploration on Vision Language Model

    Authors: Yu Tian, Tianqi Shao, Tsukasa Demizu, Xuyang Wu, Hsin-Tai Wu

    Abstract: Head pose estimation (HPE) task requires a sophisticated understanding of 3D spatial relationships and precise numerical output of yaw, pitch, and roll Euler angles. Previous HPE studies are mainly based on Non-large language models (Non-LLMs), which rely on close-up human heads cropped from the full image as inputs and lack robustness in real-world scenario. In this paper, we present a novel fram… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  27. arXiv:2406.01331  [pdf, other

    cs.IT eess.SP

    Performance Trade-off of Integrated Sensing and Communications for Multi-User Backscatter Systems

    Authors: Yuanming Tian, Dan Wang, Chuan Huang, Wei Zhang

    Abstract: This paper studies the performance trade-off in a multi-user backscatter communication (BackCom) system for integrated sensing and communications (ISAC), where the multi-antenna ISAC transmitter sends excitation signals to power multiple single-antenna passive backscatter devices (BD), and the multi-antenna ISAC receiver performs joint sensing (localization) and communication tasks based on the ba… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  28. arXiv:2406.00258  [pdf, other

    cs.CV cs.AI

    Artemis: Towards Referential Understanding in Complex Videos

    Authors: Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie, Tianren Ma, Pengyu Yan, David Doermann, Qixiang Ye, Yunjie Tian

    Abstract: Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fell short in referential understanding scenarios such as video-based referring. In this paper, we present Artemis, an MLLM that pushes video-based referential understanding to a finer level. Given a video, Artemis receives a natural-language quest… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 19 pages, 14 figures. Code and data are available at https://github.com/qiujihao19/Artemis

  29. arXiv:2406.00180  [pdf, other

    cs.SE

    An Empirical Study of Developers' Challenges in Implementing Workflows as Code: A Case Study on Apache Airflow

    Authors: Jerin Yasmin, Jiale Wang, Yuan Tian, Bram Adams

    Abstract: The Workflows as Code paradigm is becoming increasingly essential to streamline the design and management of complex processes within data-intensive software systems. These systems require robust capabilities to process, analyze, and extract insights from large datasets. Workflow orchestration platforms such as Apache Airflow are pivotal in meeting these needs, as they effectively support the impl… ▽ More

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: This is the preprint version of a paper that has been submitted to the Journal of Systems and Software

  30. arXiv:2405.20224  [pdf, other

    cs.CV

    EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images

    Authors: Wangbo Yu, Chaoran Feng, Jiye Tang, Xu Jia, Li Yuan, Yonghong Tian

    Abstract: 3D Gaussian Splatting (3D-GS) has demonstrated exceptional capabilities in 3D scene reconstruction and novel view synthesis. However, its training heavily depends on high-quality, sharp images and accurate camera poses. Fulfilling these requirements can be challenging in non-ideal real-world scenarios, where motion-blurred images are commonly encountered in high-speed moving cameras or low-light e… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project Page: https://drexubery.github.io/EvaGaussians/

  31. arXiv:2405.19668  [pdf, other

    cs.CV

    AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization

    Authors: Jiawei Chen, Xiao Yang, Zhengwei Fang, Yu Tian, Yinpeng Dong, Zhaoxia Yin, Hang Su

    Abstract: Despite the widespread application of large language models (LLMs) across various tasks, recent studies indicate that they are susceptible to jailbreak attacks, which can render their defense mechanisms ineffective. However, previous jailbreak research has frequently been constrained by limited universality, suboptimal efficiency, and a reliance on manual crafting. In response, we rethink the appr… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Under review

  32. arXiv:2405.17782  [pdf, other

    cs.LG cs.CY

    Post-Fair Federated Learning: Achieving Group and Community Fairness in Federated Learning via Post-processing

    Authors: Yuying Duan, Yijun Tian, Nitesh Chawla, Michael Lemmon

    Abstract: Federated Learning (FL) is a distributed machine learning framework in which a set of local communities collaboratively learn a shared global model while retaining all training data locally within each community. Two notions of fairness have recently emerged as important issues for federated learning: group fairness and community fairness. Group fairness requires that a model's decisions do not fa… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  33. arXiv:2405.16555  [pdf, other

    cs.CV

    vHeat: Building Vision Models upon Heat Conduction

    Authors: Zhaozhi Wang, Yue Liu, Yunfan Liu, Hongtian Yu, Yaowei Wang, Qixiang Ye, Yunjie Tian

    Abstract: A fundamental problem in learning robust and expressive visual representations lies in efficiently estimating the spatial relationships of visual semantics throughout the entire image. In this study, we propose vHeat, a novel vision backbone model that simultaneously achieves both high computational efficiency and global receptive field. The essential idea, inspired by the physical principle of he… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 18 pages, 10 figures, 9 tables

  34. arXiv:2405.16466  [pdf, other

    cs.NE

    High-Performance Temporal Reversible Spiking Neural Networks with $O(L)$ Training Memory and $O(1)$ Inference Cost

    Authors: JiaKui Hu, Man Yao, Xuerui Qiu, Yuhong Chou, Yuxuan Cai, Ning Qiao, Yonghong Tian, Bo XU, Guoqi Li

    Abstract: Multi-timestep simulation of brain-inspired Spiking Neural Networks (SNNs) boost memory requirements during training and increase inference energy cost. Current training methods cannot simultaneously solve both training and inference dilemmas. This work proposes a novel Temporal Reversible architecture for SNNs (T-RevSNN) to jointly address the training and inference challenges by altering the for… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML2024

  35. arXiv:2405.16406  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    SpinQuant: LLM quantization with learned rotations

    Authors: Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort

    Abstract: Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Recent findings suggest that rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a… ▽ More

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  36. arXiv:2405.16234  [pdf, other

    cs.CV

    Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities

    Authors: Shiyu Xia, Junyu Xiong, Haoyu Dong, Jianbo Zhao, Yuzhang Tian, Mengyu Zhou, Yeye He, Shi Han, Dongmei Zhang

    Abstract: This paper explores capabilities of Vision Language Models on spreadsheet comprehension. We propose three self-supervised challenges with corresponding evaluation metrics to comprehensively evaluate VLMs on Optical Character Recognition (OCR), spatial perception, and visual format recognition. Additionally, we utilize the spreadsheet table detection task to assess the overall performance of VLMs b… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  37. arXiv:2405.15881  [pdf, other

    cs.CV cs.AI cs.LG

    Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation

    Authors: Shentong Mo, Yapeng Tian

    Abstract: In recent developments, the Mamba architecture, known for its selective state space approach, has shown potential in the efficient modeling of long sequences. However, its application in image generation remains underexplored. Traditional diffusion transformers (DiT), which utilize self-attention blocks, are effective but their computational complexity scales quadratically with the input length, l… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  38. arXiv:2405.15755  [pdf, other

    cs.CV

    ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking

    Authors: Xudong Han, Nobuyuki Oishi, Yueying Tian, Elif Ucurum, Rupert Young, Chris Chatwin, Philip Birch

    Abstract: Many Multi-Object Tracking (MOT) approaches exploit motion information to associate all the detected objects across frames. However, many methods that rely on filtering-based algorithms, such as the Kalman Filter, often work well in linear motion scenarios but struggle to accurately predict the locations of objects undergoing complex and non-linear movements. To tackle these scenarios, we propose… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 16 pages, 7 figures

  39. arXiv:2405.15571  [pdf, other

    cs.HC

    RCInvestigator: Towards Better Investigation of Anomaly Root Causes in Cloud Computing Systems

    Authors: Shuhan Liu, Yunfan Zhou, Lu Ying, Yuan Tian, Jue Zhang, Shandan Zhou, Weiwei Cui, Qingwei Lin, Thomas Moscibroda, Haidong Zhang, Di Weng, Yingcai Wu

    Abstract: Finding the root causes of anomalies in cloud computing systems quickly is crucial to ensure availability and efficiency since accurate root causes can guide engineers to take appropriate actions to address the anomalies and maintain customer satisfaction. However, it is difficult to investigate and identify the root causes based on large-scale and high-dimension monitoring data collected from com… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  40. arXiv:2405.14036  [pdf, other

    cs.CR

    Remote Keylogging Attacks in Multi-user VR Applications

    Authors: Zihao Su, Kunlin Cai, Reuben Beeler, Lukas Dresel, Allan Garcia, Ilya Grishchenko, Yuan Tian, Christopher Kruegel, Giovanni Vigna

    Abstract: As Virtual Reality (VR) applications grow in popularity, they have bridged distances and brought users closer together. However, with this growth, there have been increasing concerns about security and privacy, especially related to the motion data used to create immersive experiences. In this study, we highlight a significant security threat in multi-user VR applications, which are applications t… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted for Usenix 2024

  41. arXiv:2405.13874  [pdf, other

    cs.CV

    Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching

    Authors: Hongkai Chen, Zixin Luo, Yurun Tian, Xuyang Bai, Ziyu Wang, Lei Zhou, Mingmin Zhen, Tian Fang, David McKinnon, Yanghai Tsin, Long Quan

    Abstract: Identifying robust and accurate correspondences across images is a fundamental problem in computer vision that enables various downstream tasks. Recent semi-dense matching methods emphasize the effectiveness of fusing relevant cross-view information through Transformer. In this paper, we propose several improvements upon this paradigm. Firstly, we introduce affine-based local attention to model cr… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR2024 Image Matching Workshop

  42. arXiv:2405.10718  [pdf, other

    cs.CV cs.CL

    SignLLM: Sign Languages Production Large Language Models

    Authors: Sen Fang, Lei Wang, Ce Zheng, Yapeng Tian, Chen Chen

    Abstract: In this paper, we introduce the first comprehensive multilingual sign language dataset named Prompt2Sign, which builds from public data including American Sign Language (ASL) and seven others. Our dataset transforms a vast array of videos into a streamlined, model-friendly format, optimized for training with translation models like seq2seq and text2text. Building on this new dataset, we propose Si… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 33 pages, website at https://signllm.github.io/

  43. arXiv:2405.10348  [pdf, other

    q-bio.QM cs.AI cs.LG

    Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning

    Authors: Lirong Wu, Yijun Tian, Haitao Lin, Yufei Huang, Siyuan Li, Nitesh V Chawla, Stan Z. Li

    Abstract: Protein-protein bindings play a key role in a variety of fundamental biological processes, and thus predicting the effects of amino acid mutations on protein-protein binding is crucial. To tackle the scarcity of annotated mutation data, pre-training with massive unlabeled data has emerged as a promising solution. However, this process faces a series of challenges: (1) complex higher-order dependen… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

  44. arXiv:2405.09291  [pdf, other

    cs.CV cs.AI eess.IV

    Sensitivity Decouple Learning for Image Compression Artifacts Reduction

    Authors: Li Ma, Yifan Zhao, Peixi Peng, Yonghong Tian

    Abstract: With the benefit of deep learning techniques, recent researches have made significant progress in image compression artifacts reduction. Despite their improved performances, prevailing methods only focus on learning a mapping from the compressed image to the original one but ignore the intrinsic attributes of the given compressed images, which greatly harms the performance of downstream parsing ta… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted by Transactions on Image Processing

  45. arXiv:2405.08935  [pdf, other

    cs.RO

    Function based sim-to-real learning for shape control of deformable free-form surfaces

    Authors: Yingjun Tian, Guoxin Fang, Renbo Su, Weiming Wang, Simeon Gill, Andrew Weightman, Charlie C. L. Wang

    Abstract: For the shape control of deformable free-form surfaces, simulation plays a crucial role in establishing the mapping between the actuation parameters and the deformed shapes. The differentiation of this forward kinematic mapping is usually employed to solve the inverse kinematic problem for determining the actuation parameters that can realize a target shape. However, the free-form surfaces obtaine… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  46. arXiv:2405.08816  [pdf, other

    cs.CV cs.RO

    The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

    Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang , et al. (66 additional authors not shown)

    Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that c… ▽ More

    Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

  47. arXiv:2405.07429  [pdf, other

    cs.RO

    JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation

    Authors: Xubo Luo, Xue Wan, Yixing Gao, Yaolin Tian, Wei Zhang, Leizheng Shu

    Abstract: Unmanned aerial vehicles (UAVs) visual localization in planetary aims to estimate the absolute pose of the UAV in the world coordinate system through satellite maps and images captured by on-board cameras. However, since planetary scenes often lack significant landmarks and there are modal differences between satellite maps and UAV images, the accuracy and real-time performance of UAV positioning… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 8 pages

  48. arXiv:2405.07155  [pdf, other

    cs.CV

    Enhancing Multi-modal Learning: Meta-learned Cross-modal Knowledge Distillation for Handling Missing Modalities

    Authors: Hu Wang, Congbo Ma, Yuyuan Liu, Yuanhong Chen, Yu Tian, Jodie Avery, Louise Hull, Gustavo Carneiro

    Abstract: In multi-modal learning, some modalities are more influential than others, and their absence can have a significant impact on classification/segmentation accuracy. Hence, an important research question is if it is possible for trained multi-modal models to have high accuracy even when influential modalities are absent from the input data. In this paper, we propose a novel approach called Meta-lear… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

  49. arXiv:2405.06806  [pdf, other

    cs.SE

    An Empirical Study on the Effectiveness of Large Language Models for SATD Identification and Classification

    Authors: Mohammad Sadegh Sheikhaei, Yuan Tian, Shaowei Wang, Bowen Xu

    Abstract: Self-Admitted Technical Debt (SATD), a concept highlighting sub-optimal choices in software development documented in code comments or other project resources, poses challenges in the maintainability and evolution of software systems. Large language models (LLMs) have demonstrated significant effectiveness across a broad range of software tasks, especially in software text generation tasks. Noneth… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: This is the preprint version of a paper that has been submitted to Empirical Software Engineering

    ACM Class: D.2; I.2

  50. arXiv:2405.06672  [pdf, other

    stat.ML cs.LG math.PR physics.data-an stat.CO

    Liouville Flow Importance Sampler

    Authors: Yifeng Tian, Nishant Panda, Yen Ting Lin

    Abstract: We present the Liouville Flow Importance Sampler (LFIS), an innovative flow-based model for generating samples from unnormalized density functions. LFIS learns a time-dependent velocity field that deterministically transports samples from a simple initial distribution to a complex target distribution, guided by a prescribed path of annealed distributions. The training of LFIS utilizes a unique met… ▽ More

    Submitted 9 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: 25 pages, 7 figures, 15 tables. Submitted to and accepted by the 41th International Conference on Machine Learning (Vienna, Austria)

    Report number: LA-UR-24-21091