
Showing 1–6 of 6 results for author: Baumann, S A

  1. arXiv:2405.07913  [pdf, other]

    cs.CV

    CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models

    Authors: Nick Stracke, Stefan Andreas Baumann, Joshua M. Susskind, Miguel Angel Bautista, Björn Ommer

    Abstract: Text-to-image generative models have become a prominent and powerful tool that excels at generating high-resolution realistic images. However, guiding the generative process of these models to consider detailed forms of conditioning reflecting style and/or structure information remains an open problem. In this paper, we present LoRAdapter, an approach that unifies both style and structure conditio…

    Submitted 13 May, 2024; originally announced May 2024.

  2. arXiv:2403.17064  [pdf, other]

    cs.CV cs.AI cs.LG

    Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions

    Authors: Stefan Andreas Baumann, Felix Krause, Michael Neumayr, Nick Stracke, Vincent Tao Hu, Björn Ommer

    Abstract: In recent years, advances in text-to-image (T2I) diffusion models have substantially elevated the quality of their generated images. However, achieving fine-grained control over attributes remains a challenge due to the limitations of natural language prompts (such as no continuous set of intermediate descriptions existing between ``person'' and ``old person''). Even though many methods were intro…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Project page: https://compvis.github.io/attribute-control

  3. arXiv:2403.13802  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    ZigMa: A DiT-style Zigzag Mamba Diffusion Model

    Authors: Vincent Tao Hu, Stefan Andreas Baumann, Ming Gui, Olga Grebenkova, Pingchuan Ma, Johannes Fischer, Björn Ommer

    Abstract: The diffusion model has long been plagued by scalability and quadratic complexity issues, especially within transformer-based structures. In this study, we aim to leverage the long sequence modeling capability of a State-Space Model called Mamba to extend its applicability to visual data generation. Firstly, we identify a critical oversight in most current Mamba-based vision methods, namely the la…

    Submitted 1 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: Project Page: https://taohu.me/zigma/

  4. arXiv:2403.13788  [pdf, other]

    cs.CV

    DepthFM: Fast Monocular Depth Estimation with Flow Matching

    Authors: Ming Gui, Johannes S. Fischer, Ulrich Prestel, Pingchuan Ma, Dmytro Kotovenko, Olga Grebenkova, Stefan Andreas Baumann, Vincent Tao Hu, Björn Ommer

    Abstract: Monocular depth estimation is crucial for numerous downstream vision tasks and applications. Current discriminative approaches to this problem are limited due to blurry artifacts, while state-of-the-art generative methods suffer from slow sampling due to their SDE nature. Rather than starting from noise, we seek a direct mapping from input image to depth map. We observe that this can be effectivel…

    Submitted 20 March, 2024; originally announced March 2024.

  5. arXiv:2401.11605  [pdf, other]

    cs.CV cs.AI cs.LG

    Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

    Authors: Katherine Crowson, Stefan Andreas Baumann, Alex Birch, Tanishq Mathew Abraham, Daniel Z. Kaplan, Enrico Shippole

    Abstract: We present the Hourglass Diffusion Transformer (HDiT), an image generative model that exhibits linear scaling with pixel count, supporting training at high-resolution (e.g. $1024 \times 1024$) directly in pixel-space. Building on the Transformer architecture, which is known to scale to billions of parameters, it bridges the gap between the efficiency of convolutional U-Nets and the scalability of…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 20 pages, 13 figures, project page and code available at https://crowsonkb.github.io/hourglass-diffusion-transformers/

  6. arXiv:2312.07360  [pdf, other]

    cs.CV

    Boosting Latent Diffusion with Flow Matching

    Authors: Johannes S. Fischer, Ming Gui, Pingchuan Ma, Nick Stracke, Stefan A. Baumann, Björn Ommer

    Abstract: Recently, there has been tremendous progress in visual synthesis and the underlying generative models. Here, diffusion models (DMs) stand out particularly, but lately, flow matching (FM) has also garnered considerable interest. While DMs excel in providing diverse images, they suffer from long training and slow generation. With latent diffusion, these issues are only partially alleviated. Converse…

    Submitted 28 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.