Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Showing 1–50 of 248 results for author: Hu, T

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00783  [pdf, other

    cs.CV cs.AI

    Diffusion Models and Representation Learning: A Survey

    Authors: Michael Fuest, Pingchuan Ma, Ming Gui, Johannes S. Fischer, Vincent Tao Hu, Bjorn Ommer

    Abstract: Diffusion Models are popular generative modeling methods in various vision tasks, attracting significant attention. They can be considered a unique instance of self-supervised learning methods due to their independence from label annotation. This survey explores the interplay between diffusion models and representation learning. It provides an overview of diffusion models' essential aspects, inclu… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Github Repo: https://github.com/dongzhuoyao/Diffusion-Representation-Learning-Survey-Taxonomy

  2. arXiv:2406.18311  [pdf, other

    cs.LG

    Online Learning of Multiple Tasks and Their Relationships : Testing on Spam Email Data and EEG Signals Recorded in Construction Fields

    Authors: Yixin Jin, Wenjing Zhou, Meiqi Wang, Meng Li, Xintao Li, Tianyu Hu, Xingyuan Bu

    Abstract: This paper examines an online multi-task learning (OMTL) method, which processes data sequentially to predict labels across related tasks. The framework learns task weights and their relatedness concurrently. Unlike previous models that assumed static task relatedness, our approach treats tasks as initially independent, updating their relatedness iteratively using newly calculated weight vectors.… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2406.11657  [pdf, other

    cs.CL cs.CY

    Can LLM be a Personalized Judge?

    Authors: Yijiang River Dong, Tiancheng Hu, Nigel Collier

    Abstract: Ensuring that large language models (LLMs) reflect diverse user values and preferences is crucial as their user bases expand globally. It is therefore encouraging to see the growing interest in LLM personalization within the research community. However, current works often rely on the LLM-as-a-Judge approach for evaluation without thoroughly examining its validity. In this paper, we investigate th… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Our code is available at https://github.com/dong-river/Personalized-Judge

  4. arXiv:2406.11096  [pdf, other

    cs.CL

    The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

    Authors: Bolei Ma, Xinpeng Wang, Tiancheng Hu, Anna-Carolina Haensch, Michael A. Hedderich, Barbara Plank, Frauke Kreuter

    Abstract: Recent advances in Large Language Models (LLMs) have sparked wide interest in validating and comprehending the human-like cognitive-behavioral traits LLMs may have. These cognitive-behavioral traits include typically Attitudes, Opinions, Values (AOV). However, measuring AOV embedded within LLMs remains opaque, and different evaluation methods may yield different results. This has led to a lack of… ▽ More

    Submitted 1 July, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.09794  [pdf, other

    cs.CV

    SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis

    Authors: Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L. Rosin, Yu-Kun Lai

    Abstract: SVG (Scalable Vector Graphics) is a widely used graphics format that possesses excellent scalability and editability. Image vectorization, which aims to convert raster images to SVGs, is an important yet challenging problem in computer vision and graphics. Existing image vectorization methods either suffer from low reconstruction accuracy for complex images or require long computation time. To add… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: CVPR 2024

  6. arXiv:2406.07811  [pdf, other

    cs.NE cs.AI cs.LG

    Evolutionary Computation and Explainable AI: A Roadmap to Transparent Intelligent Systems

    Authors: Ryan Zhou, Jaume Bacardit, Alexander Brownlee, Stefano Cagnoni, Martin Fyvie, Giovanni Iacca, John McCall, Niki van Stein, David Walker, Ting Hu

    Abstract: AI methods are finding an increasing number of applications, but their often black-box nature has raised concerns about accountability and trust. The field of explainable artificial intelligence (XAI) has emerged in response to the need for human understanding of AI models. Evolutionary computation (EC), as a family of powerful optimization and learning tools, has significant potential to contribu… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 29 pages, 4 figures. arXiv admin note: substantial text overlap with arXiv:2306.14786

  7. arXiv:2406.02847  [pdf, other

    cs.LG stat.ML

    Exact Conversion of In-Context Learning to Model Weights in Linearized-Attention Transformers

    Authors: Brian K Chen, Tianyang Hu, Hui Jin, Hwee Kuan Lee, Kenji Kawaguchi

    Abstract: In-Context Learning (ICL) has been a powerful emergent property of large language models that has attracted increasing attention in recent years. In contrast to regular gradient-based learning, ICL is highly interpretable and does not require parameter updates. In this paper, we show that, for linearized transformer networks, ICL can be made explicit and permanent through the inclusion of bias ter… ▽ More

    Submitted 6 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  8. arXiv:2405.15660  [pdf, other

    cs.CV

    Low-Light Video Enhancement via Spatial-Temporal Consistent Illumination and Reflection Decomposition

    Authors: Xiaogang Xu, Kun Zhou, Tao Hu, Ruixing Wang, Hujun Bao

    Abstract: Low-Light Video Enhancement (LLVE) seeks to restore dynamic and static scenes plagued by severe invisibility and noise. One critical aspect is formulating a consistency constraint specifically for temporal-spatial illumination and appearance enhanced versions, a dimension overlooked in existing methods. In this paper, we present an innovative video Retinex-based decomposition strategy that operate… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  9. arXiv:2405.15302  [pdf, other

    cs.AI cs.CL cs.LG

    Towards Understanding How Transformer Perform Multi-step Reasoning with Matching Operation

    Authors: Zhiwei Wang, Yunji Wang, Zhongwang Zhang, Zhangchen Zhou, Hui Jin, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Yaoyu Zhang, Zhi-Qin John Xu

    Abstract: Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving. Investigating the internal reasoning mechanisms of these models can help us design better model architectures and training strategies, ultimately enhancing their reasoning capabilities. In this study, we examine the matching mechanism employed by Transformer for multi-step reasonin… ▽ More

    Submitted 24 May, 2024; originally announced May 2024.

  10. arXiv:2405.14616  [pdf, other

    cs.LG cs.AI

    TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

    Authors: Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, Jun Zhou

    Abstract: Time series forecasting is widely used in extensive applications, such as traffic planning and weather forecasting. However, real-world time series usually present intricate temporal variations, making forecasting extremely challenging. Going beyond the mainstream paradigms of plain decomposition and multiperiodicity analysis, we analyze temporal variations in a novel view of multiscale-mixing, wh… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  11. arXiv:2405.10989  [pdf, other

    cs.LG cs.AI cs.CL cs.CR

    Learnable Privacy Neurons Localization in Language Models

    Authors: Ruizhe Chen, Tianxiang Hu, Yang Feng, Zuozhu Liu

    Abstract: Concerns regarding Large Language Models (LLMs) to memorize and disclose private information, particularly Personally Identifiable Information (PII), become prominent within the community. Many efforts have been made to mitigate the privacy risks. However, the mechanism through which LLMs memorize PII remains poorly understood. To bridge this gap, we introduce a pioneering method for pinpointing P… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: ACL 2024 main conference

  12. arXiv:2405.10861  [pdf, other

    cs.CL cs.AI cs.CY

    Tailoring Vaccine Messaging with Common-Ground Opinions

    Authors: Rickard Stureborg, Sanxing Chen, Ruoyu Xie, Aayushi Patel, Christopher Li, Chloe Qinyu Zhu, Tingnan Hu, Jun Yang, Bhuwan Dhingra

    Abstract: One way to personalize chatbot interactions is by establishing common ground with the intended reader. A domain where establishing mutual understanding could be particularly impactful is vaccine concerns and misinformation. Vaccine interventions are forms of messaging which aim to answer concerns expressed about vaccination. Tailoring responses in this domain is difficult, since opinions often hav… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: NAACL Findings 2024

    MSC Class: 68T50 (Primary) 68T01; 68T37; 91F20 (Secondary) ACM Class: I.2; I.2.7; I.7

  13. arXiv:2404.16233  [pdf, other

    cs.LG cs.AI

    AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

    Authors: Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

    Abstract: AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite… ▽ More

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted at AutoML 2024 Conference

  14. arXiv:2404.15789  [pdf, other

    cs.CV

    MotionMaster: Training-free Camera Motion Transfer For Video Generation

    Authors: Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma

    Abstract: The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate sub… ▽ More

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  15. arXiv:2404.15275  [pdf, other

    cs.CV

    ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

    Authors: Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Jie Zhang

    Abstract: Generating high-fidelity human video with specified identities has attracted significant attention in the content generation community. However, existing techniques struggle to strike a balance between training efficiency and identity preservation, either requiring tedious case-by-case fine-tuning or usually missing identity details in the video generation process. In this study, we present \textb… ▽ More

    Submitted 25 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: Project Page: https://id-animator.github.io/

  16. arXiv:2404.14678  [pdf, other

    cs.CV

    3DBench: A Scalable 3D Benchmark and Instruction-Tuning Dataset

    Authors: Junjie Zhang, Tianci Hu, Xiaoshui Huang, Yongshun Gong, Dan Zeng

    Abstract: Evaluating the performance of Multi-modal Large Language Models (MLLMs), integrating both point cloud and language, presents significant challenges. The lack of a comprehensive assessment hampers determining whether these models truly represent advancements, thereby impeding further progress in the field. Current evaluations heavily rely on classification and caption tasks, falling short in provid… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  17. arXiv:2404.14329  [pdf, other

    cs.CV

    X-Ray: A Sequential 3D Representation For Generation

    Authors: Tao Hu, Wenhang Ge, Yuyang Zhao, Gim Hee Lee

    Abstract: We introduce X-Ray, a novel 3D sequential representation inspired by the penetrability of x-ray scans. X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images. Our method utilizes ray casting from the camera center to capture geometric and textured details, including depth, normal, and color, across all intersected s… ▽ More

    Submitted 1 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  18. arXiv:2404.14132  [pdf, other

    cs.CV eess.IV

    CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

    Authors: Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024 Workshop, Code: https://github.com/CalvinYang0/CRNet

  19. Revolutionizing student course selection: Exploring the application prospects and challenges of blockchain token voting technology

    Authors: Tiansu Hu, Yuzhao Song, Linjing Zhang, Xiaoya Zhou

    Abstract: This paper explores the utilization of blockchain token voting technology in student course selection systems. The current course selection systems face various issues, which can be mitigated through the implementation of blockchain technology. The advantages of blockchain technology, including consensus mechanisms and smart contracts, are discussed in detail. The token voting mechanism, encompass… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures, presented at the 5th International Conference on Computing and Data Science, 2023. Open access article distributed under CC BY license. DOI: 10.54254/2755-2721/18/20230971

    ACM Class: H.1.2; K.4.1

    Journal ref: Applied and Computational Engineering, Vol. 18, Pages 96-101, Published 23 October 2023

  20. arXiv:2404.13537  [pdf, other

    eess.IV cs.CV

    Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

    Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul… ▽ More

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

  21. arXiv:2404.12624  [pdf, other

    cs.RO cs.CV

    Dragtraffic: A Non-Expert Interactive and Point-Based Controllable Traffic Scene Generation Framework

    Authors: Sheng Wang, Ge Sun, Fulong Ma, Tianshuai Hu, Yongkang Song, Lei Zhu, Ming Liu

    Abstract: The evaluation and training of autonomous driving systems require diverse and scalable corner cases. However, most existing scene generation methods lack controllability, accuracy, and versatility, resulting in unsatisfactory generation results. To address this problem, we propose Dragtraffic, a generalized, point-based, and controllable traffic scene generation framework based on conditional diff… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  22. arXiv:2404.10642  [pdf, other

    cs.CL cs.LG

    Self-playing Adversarial Language Game Enhances LLM Reasoning

    Authors: Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du

    Abstract: We explore the self-play training procedure of large language models (LLMs) in a two-player adversarial language game called Adversarial Taboo. In this game, an attacker and a defender communicate around a target word only visible to the attacker. The attacker aims to induce the defender to speak the target word unconsciously, while the defender tries to infer the target word from the attacker's u… ▽ More

    Submitted 23 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Preprint

  23. arXiv:2404.09135  [pdf, ps, other

    cs.CL

    Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions

    Authors: Taojun Hu, Xiao-Hua Zhou

    Abstract: Natural Language Processing (NLP) is witnessing a remarkable breakthrough driven by the success of Large Language Models (LLMs). LLMs have gained significant attention across academia and industry for their versatile applications in text generation, question answering, and text summarization. As the landscape of NLP evolves with an increasing number of domain-specific LLMs employing diverse techni… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

  24. arXiv:2404.03663  [pdf, other

    cs.NE cs.CV

    Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips

    Authors: Man Yao, Jiakui Hu, Tianxiang Hu, Yifan Xu, Zhaokun Zhou, Yonghong Tian, Bo Xu, Guoqi Li

    Abstract: Neuromorphic computing, which exploits Spiking Neural Networks (SNNs) on neuromorphic chips, is a promising energy-efficient alternative to traditional AI. CNN-based SNNs are the current mainstream of neuromorphic computing. By contrast, no neuromorphic chips are designed especially for Transformer-based SNNs, which have just emerged, and their performance is only on par with CNN-based SNNs, offer… ▽ More

    Submitted 15 February, 2024; originally announced April 2024.

    Comments: Accepted by ICLR2024. Code and Model: https://github.com/BICLab/Spike-Driven-Transformer-V2

  25. arXiv:2404.01655  [pdf, other

    cs.CV

    FashionEngine: Interactive 3D Human Generation and Editing via Multimodal Controls

    Authors: Tao Hu, Fangzhou Hong, Zhaoxi Chen, Ziwei Liu

    Abstract: We present FashionEngine, an interactive 3D human generation and editing system that creates 3D digital humans via user-friendly multimodal controls such as natural languages, visual perceptions, and hand-drawing sketches. FashionEngine automates the 3D human production with three key components: 1) A pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space… ▽ More

    Submitted 20 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Project Page: https://taohuumd.github.io/projects/FashionEngine

  26. arXiv:2404.01241  [pdf, other

    cs.CV

    StructLDM: Structured Latent Diffusion for 3D Human Generation

    Authors: Tao Hu, Fangzhou Hong, Ziwei Liu

    Abstract: Recent 3D human generative models have achieved remarkable progress by learning 3D-aware GANs from 2D images. However, existing 3D human generative methods model humans in a compact 1D latent space, ignoring the articulated structure and semantics of human body topology. In this paper, we explore more expressive and higher-dimensional latent space for 3D human modeling and propose StructLDM, a dif… ▽ More

    Submitted 2 July, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to ECCV 2024. Project page: https://taohuumd.github.io/projects/StructLDM/

  27. arXiv:2404.01225  [pdf, other

    cs.CV

    SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

    Authors: Tao Hu, Fangzhou Hong, Ziwei Liu

    Abstract: Dynamic human rendering from video sequences has achieved remarkable progress by formulating the rendering as a mapping from static poses to human images. However, existing methods focus on the human appearance reconstruction of every single frame while the temporal motion relations are not fully explored. In this paper, we propose a new 4D motion modeling paradigm, SurMo, that jointly models the… ▽ More

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Project Page: https://taohuumd.github.io/projects/SurMo/

  28. arXiv:2404.00849  [pdf, other

    cs.CV

    Generating Content for HDR Deghosting from Frequency View

    Authors: Tao Hu, Qingsen Yan, Yuankai Qi, Yanning Zhang

    Abstract: Recovering ghost-free High Dynamic Range (HDR) images from multiple Low Dynamic Range (LDR) images becomes challenging when the LDR images exhibit saturation and significant motion. Recent Diffusion Models (DMs) have been introduced in HDR imaging field, demonstrating promising performance, particularly in achieving visually perceptible results compared to previous DNN-based methods. However, DMs… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024

  29. arXiv:2404.00750  [pdf, other

    cs.CL cs.CY

    Can Language Models Recognize Convincing Arguments?

    Authors: Paula Rescala, Manoel Horta Ribeiro, Tiancheng Hu, Robert West

    Abstract: The remarkable and ever-increasing capabilities of Large Language Models (LLMs) have raised concerns about their potential misuse for creating personalized, convincing misinformation and propaganda. To gain insights into LLMs' persuasive capabilities without directly engaging in experimentation with humans, we propose studying their performance on the related task of detecting convincing arguments… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

  30. arXiv:2403.19425  [pdf, ps, other

    eess.IV cs.CV

    A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge

    Authors: Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, David Robben, Diana M. Sima, Vincenzo Anania, Arne Brys, James A. Meakin, Anne Mickan, Gabriel Broocks, Christian Heitkamp, Shengbo Gao, Kongming Liang, Ziji Zhang, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Pooya Ashtari, Sabine Van Huffel , et al. (33 additional authors not shown)

    Abstract: Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemi… ▽ More

    Submitted 3 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  31. arXiv:2403.17064  [pdf, other

    cs.CV cs.AI cs.LG

    Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

    Authors: Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, Björn Ommer

    Abstract: In recent years, advances in text-to-image (T2I) diffusion models have substantially elevated the quality of their generated images. However, achieving fine-grained control over attributes remains a challenge due to the limitations of natural language prompts (such as no continuous set of intermediate descriptions existing between ``person'' and ``old person''). Even though many methods were intro… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Project page: https://compvis.github.io/attribute-control

  32. arXiv:2403.16880  [pdf, other

    cs.RO

    DHP-Mapping: A Dense Panoptic Mapping System with Hierarchical World Representation and Label Optimization Techniques

    Authors: Tianshuai Hu, Jianhao Jiao, Yucheng Xu, Hongji Liu, Sheng Wang, Ming Liu

    Abstract: Maps provide robots with crucial environmental knowledge, thereby enabling them to perform interactive tasks effectively. Easily accessing accurate abstract-to-detailed geometric and semantic concepts from maps is crucial for robots to make informed and efficient decisions. To comprehensively model the environment and effectively manage the map data structure, we propose DHP-Mapping, a dense mappi… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Submit to IROS 2024. Project website https://github.com/hutslib/DHP-Mapping

  33. arXiv:2403.13802  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    ZigMa: A DiT-style Zigzag Mamba Diffusion Model

    Authors: Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Björn Ommer

    Abstract: The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures. In this study, we aim to leverage the long sequence modeling capability of a State-Space Model called Mamba to extend its applicability to visual data generation. Firstly, we identify a critical oversight in most current Mamba-based vision methods, namely the la… ▽ More

    Submitted 1 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Project Page: https://taohu.me/zigma/

  34. arXiv:2403.13788  [pdf, other

    cs.CV

    DepthFM: Fast Monocular Depth Estimation with Flow Matching

    Authors: Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer

    Abstract: Monocular depth estimation is crucial for numerous downstream vision tasks and applications. Current discriminative approaches to this problem are limited due to blurry artifacts, while state-of-the-art generative methods suffer from slow sampling due to their SDE nature. Rather than starting from noise, we seek a direct mapping from input image to depth map. We observe that this can be effectivel… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  35. arXiv:2403.07585  [pdf, other

    cs.DC cs.LG

    Communication Optimization for Distributed Training: Architecture, Advances, and Opportunities

    Authors: Yunze Wei, Tianshuo Hu, Cong Liang, Yong Cui

    Abstract: The past few years have witnessed the flourishing of large-scale deep neural network models with ever-growing parameter numbers. Training such large-scale models typically requires massive memory and computing resources that exceed those of a single GPU, necessitating distributed training. As GPU performance has rapidly evolved in recent years, computation time has shrunk, thereby increasing the p… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  36. arXiv:2403.06793  [pdf, other

    cs.CV

    Boosting Image Restoration via Priors from Pre-trained Models

    Authors: Xiaogang Xu, Shu Kong, Tao Hu, Zhe Liu, Hujun Bao

    Abstract: Pre-trained models with large-scale training data, such as CLIP and Stable Diffusion, have demonstrated remarkable performance in various high-level computer vision tasks such as image understanding and generation from language descriptions. Yet, their potential for low-level tasks such as image restoration remains relatively unexplored. In this paper, we explore such models to enhance image resto… ▽ More

    Submitted 19 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  37. arXiv:2402.17376  [pdf, other

    cs.CV cs.AI cs.LG

    Accelerating Diffusion Sampling with Optimized Time Steps

    Authors: Shuchen Xue, Zhaoqiang Liu, Fei Chen, Shifeng Zhang, Tianyang Hu, Enze Xie, Zhenguo Li

    Abstract: Diffusion probabilistic models (DPMs) have shown remarkable performance in high-resolution image synthesis, but their sampling efficiency is still to be desired due to the typically large number of sampling steps. Recent advancements in high-order numerical ODE solvers for DPMs have enabled the generation of high-quality images with much fewer sampling steps. While this is a significant developmen… ▽ More

    Submitted 1 July, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: CVPR 2024

  38. arXiv:2402.16305  [pdf, other

    cs.LG cs.AI

    Referee Can Play: An Alternative Approach to Conditional Generation via Model Inversion

    Authors: Xuantong Liu, Tianyang Hu, Wenjia Wang, Kenji Kawaguchi, Yuan Yao

    Abstract: As a dominant force in text-to-image generation tasks, Diffusion Probabilistic Models (DPMs) face a critical challenge in controllability, struggling to adhere strictly to complex, multi-faceted instructions. In this work, we aim to address this alignment challenge for conditional generation tasks. First, we provide an alternative view of state-of-the-art DPMs as a way of inverting advanced Vision… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  39. arXiv:2402.15170  [pdf, other

    cs.LG cs.AI

    The Surprising Effectiveness of Skip-Tuning in Diffusion Sampling

    Authors: Jiajun Ma, Shuchen Xue, Tianyang Hu, Wenjia Wang, Zhaoqiang Liu, Zhenguo Li, Zhi-Ming Ma, Kenji Kawaguchi

    Abstract: With the incorporation of the UNet architecture, diffusion probabilistic models have become a dominant force in image generation tasks. One key design in UNet is the skip connections between the encoder and decoder blocks. Although skip connections have been shown to improve training stability and model performance, we reveal that such shortcuts can be a limiting factor for the complexity of the t… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  40. arXiv:2402.13572  [pdf, other

    cs.LG cs.AI math.NA

    On the Expressive Power of a Variant of the Looped Transformer

    Authors: Yihang Gao, Chuanyang Zheng, Enze Xie, Han Shi, Tianyang Hu, Yu Li, Michael K. Ng, Zhenguo Li, Zhaoqiang Liu

    Abstract: Besides natural language processing, transformers exhibit extraordinary performance in solving broader applications, including scientific computing and computer vision. Previous works try to explain this from the expressive power and capability perspectives that standard transformers are capable of performing some algorithms. To empower transformers with algorithmic capabilities and motivated by t… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

  41. arXiv:2402.10821  [pdf, other

    cs.CV

    Training Class-Imbalanced Diffusion Model Via Overlap Optimization

    Authors: Divin Yan, Lu Qi, Vincent Tao Hu, Ming-Hsuan Yang, Meng Tang

    Abstract: Diffusion models have made significant advances recently in high-quality image synthesis and related tasks. However, diffusion models trained on real-world datasets, which often follow long-tailed distributions, yield inferior fidelity for tail classes. Deep generative models, including diffusion models, are biased towards classes with abundant training images. To address the observed appearance o… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Technique Report

  42. arXiv:2402.10811  [pdf, other

    cs.CL cs.CY

    Quantifying the Persona Effect in LLM Simulations

    Authors: Tiancheng Hu, Nigel Collier

    Abstract: Large language models (LLMs) have shown remarkable promise in simulating human language and behavior. This study investigates how integrating persona variables-demographic, social, and behavioral factors-impacts LLMs' ability to simulate diverse perspectives. We find that persona variables account for <10% variance in annotations in existing subjective NLP datasets. Nonetheless, incorporating pers… ▽ More

    Submitted 17 June, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Main

  43. arXiv:2402.05375  [pdf, other

    cs.CV

    Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models

    Authors: Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang

    Abstract: The success of recent text-to-image diffusion models is largely due to their capacity to be guided by a complex text prompt, which enables users to precisely describe the desired content. However, these models struggle to effectively suppress the generation of undesired content, which is explicitly requested to be omitted from the generated image in the prompt. In this paper, we analyze how to man… ▽ More

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: ICLR 2024. Our code is available in https://github.com/sen-mao/SuppressEOT

  44. arXiv:2401.16190  [pdf

    q-bio.QM cs.AI

    AI prediction of cardiovascular events using opportunistic epicardial adipose tissue assessments from CT calcium score

    Authors: Tao Hu, Joshua Freeze, Prerna Singh, Justin Kim, Yingnan Song, Hao Wu, Juhwan Lee, Sadeer Al-Kindi, Sanjay Rajagopalan, David L. Wilson, Ammar Hoori

    Abstract: Background: Recent studies have used basic epicardial adipose tissue (EAT) assessments (e.g., volume and mean HU) to predict risk of atherosclerosis-related, major adverse cardiovascular events (MACE). Objectives: Create novel, hand-crafted EAT features, 'fat-omics', to capture the pathophysiology of EAT and improve MACE prediction. Methods: We segmented EAT using a previously-validated deep learn… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 7 pages, 1 central illustration, 6 figures, 5 tables

  45. arXiv:2401.15554  [pdf

    cs.CV

    Pericoronary adipose tissue feature analysis in CT calcium score images with comparison to coronary CTA

    Authors: Yingnan Song, Hao Wu, Juhwan Lee, Justin Kim, Ammar Hoori, Tao Hu, Vladislav Zimin, Mohamed Makhlouf, Sadeer Al-Kindi, Sanjay Rajagopalan, Chun-Ho Yun, Chung-Lieh Hung, David L. Wilson

    Abstract: We investigated the feasibility and advantages of using non-contrast CT calcium score (CTCS) images to assess pericoronary adipose tissue (PCAT) and its association with major adverse cardiovascular events (MACE). PCAT features from coronary CTA (CCTA) have been shown to be associated with cardiovascular risk but are potentially confounded by iodine. If PCAT in CTCS images can be similarly analyze… ▽ More

    Submitted 27 January, 2024; originally announced January 2024.

    Comments: 24 pages,10 figures

  46. arXiv:2401.11731  [pdf, other

    cs.NI cs.AI cs.LG

    Fast and Scalable Network Slicing by Integrating Deep Learning with Lagrangian Methods

    Authors: Tianlun Hu, Qi Liao, Qiang Liu, Antonio Massaro, Georg Carle

    Abstract: Network slicing is a key technique in 5G and beyond for efficiently supporting diverse services. Many network slicing solutions rely on deep learning to manage complex and high-dimensional resource allocation problems. However, deep learning models suffer limited generalization and adaptability to dynamic slicing configurations. In this paper, we propose a novel framework that integrates constrain… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: 6 pages, 5 figures, IEEE Global Communications Conference 2023

  47. arXiv:2401.11257  [pdf, other

    cs.MA cs.AI

    Measuring Policy Distance for Multi-Agent Reinforcement Learning

    Authors: Tianyi Hu, Zhiqiang Pu, Xiaolin Ai, Tenghai Qiu, Jianqiang Yi

    Abstract: Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Currently, many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, there remains a lack of a general metric to quantify policy differences among agents. Such a metric would not only facilitate the evaluation of the… ▽ More

    Submitted 28 January, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: 9 pages, 6 figures

  48. arXiv:2401.02161  [pdf, other

    cs.CV

    Enhancing RAW-to-sRGB with Decoupled Style Structure in Fourier Domain

    Authors: Xuanhua He, Tao Hu, Guoli Wang, Zejin Wang, Run Wang, Qian Zhang, Keyu Yan, Ziyi Chen, Rui Li, Chenjun Xie, Jie Zhang, Man Zhou

    Abstract: RAW to sRGB mapping, which aims to convert RAW images from smartphones into RGB form equivalent to that of Digital Single-Lens Reflex (DSLR) cameras, has become an important area of research. However, current methods often ignore the difference between cell phone RAW images and DSLR camera RGB images, a difference that goes beyond the color matrix and extends to spatial structure due to resolution… ▽ More

    Submitted 4 January, 2024; originally announced January 2024.

  49. arXiv:2312.10825  [pdf, other

    cs.CV cs.LG

    Latent Space Editing in Transformer-Based Flow Matching

    Authors: Vincent Tao Hu, David W Zhang, Pascal Mettes, Meng Tang, Deli Zhao, Cees G. M. Snoek

    Abstract: This paper strives for image editing via generative models. Flow Matching is an emerging generative modeling technique that offers the advantage of simple and efficient training. Simultaneously, a new transformer-based U-ViT has recently been proposed to replace the commonly used UNet for better scalability and performance in generative modeling. Hence, Flow Matching with a transformer backbone of… ▽ More

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: AAAI 2024 with Appendix

  50. arXiv:2312.09608  [pdf, other

    cs.CV

    Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models

    Authors: Senmao Li, Taihang Hu, Fahad Shahbaz Khan, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang

    Abstract: One of the key components within diffusion models is the UNet for noise prediction. While several works have explored basic properties of the UNet decoder, its encoder largely remains unexplored. In this work, we conduct the first comprehensive study of the UNet encoder. We empirically analyze the encoder features and provide insights to important questions regarding their changes at the inference… ▽ More

    Submitted 15 December, 2023; originally announced December 2023.