DOI: 10.1145/3586183.3606777

Research Article

PromptPaint: Steering Text-to-Image Generation Through Paint Medium-like Interactions

Published: 29 October 2023
    Abstract

    While diffusion-based text-to-image (T2I) models provide a simple and powerful way to generate images, guiding this generation remains a challenge. For concepts that are difficult to describe through language, users may struggle to create prompts. Moreover, many of these models are built as end-to-end systems, lacking support for iterative shaping of the image. In response, we introduce PromptPaint, which combines T2I generation with interactions that model how we use colored paints. PromptPaint allows users to go beyond language to mix prompts that express challenging concepts. Just as we iteratively tune colors through layered placements of paint on a physical canvas, PromptPaint similarly allows users to apply different prompts to different canvas areas and times of the generative process. Through a set of studies, we characterize different approaches for mixing prompts, design trade-offs, and socio-technical challenges for generative models. With PromptPaint we provide insight into future steerable generative tools.
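
    The prompt mixing the abstract describes can be made concrete with a small sketch. The code below is a hypothetical illustration, not the paper's implementation: it linearly interpolates the CLIP text embeddings of two prompts and conditions a Stable Diffusion run on the blend via the Hugging Face diffusers library. The checkpoint name, the two prompts, and the mixing ratio alpha are all assumptions chosen for the example.

        # Hypothetical sketch of prompt mixing by interpolating CLIP text
        # embeddings (not the PromptPaint implementation).
        import torch
        from diffusers import StableDiffusionPipeline

        pipe = StableDiffusionPipeline.from_pretrained(
            "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
            torch_dtype=torch.float16,
        ).to("cuda")

        def encode(prompt: str) -> torch.Tensor:
            # Tokenize, then run the CLIP text encoder; returns the
            # per-token embedding tensor the denoiser is conditioned on.
            ids = pipe.tokenizer(
                prompt,
                padding="max_length",
                max_length=pipe.tokenizer.model_max_length,
                truncation=True,
                return_tensors="pt",
            ).input_ids.to(pipe.device)
            return pipe.text_encoder(ids)[0]

        # Blend two concepts like two paints; alpha is the mixing ratio.
        alpha = 0.4
        mixed = (alpha * encode("a serene watercolor landscape")
                 + (1.0 - alpha) * encode("a dense cyberpunk cityscape"))

        # Condition generation on the mixed embedding instead of a string.
        image = pipe(prompt_embeds=mixed).images[0]
        image.save("mixed.png")

    Sweeping alpha between 0 and 1 would trace a family of images shading from one concept to the other, one way to read the paper's paint-like mixing metaphor. Applying different prompts to different canvas regions or denoising steps, as the abstract describes, would additionally require masks and a custom denoising loop rather than a single pipeline call.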

    Supplementary Material

    ZIP File (Supplementary materials - Prompts and Results - Google Docs-compressed_1.pdf.zip)
    Prompts and results from the user study.
    ZIP File (3606777.zip)
    Supplemental File




    Published In

    UIST '23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology
    October 2023
    1825 pages
    ISBN:9798400701320
    DOI:10.1145/3586183
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 October 2023


    Author Tags

    1. generative model
    2. painting interactions
    3. text-to-image generation

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    UIST '23

    Acceptance Rates

    Overall Acceptance Rate 842 of 3,967 submissions, 21%



    Article Metrics

    • Downloads (last 12 months): 1,116
    • Downloads (last 6 weeks): 91

    Reflects downloads up to 09 Aug 2024.


    Cited By

    • (2024) ID.8: Co-Creating Visual Stories with Generative AI. ACM Transactions on Interactive Intelligent Systems 14, 3, 1–29. https://doi.org/10.1145/3672277. Online publication date: 15-Jun-2024.
    • (2024) DesignPrompt: Using Multimodal Interaction for Design Exploration with Generative AI. Proceedings of the 2024 ACM Designing Interactive Systems Conference, 804–818. https://doi.org/10.1145/3643834.3661588. Online publication date: 1-Jul-2024.
    • (2024) Advancing GUI for Generative AI: Charting the Design Space of Human-AI Interactions through Task Creativity and Complexity. Companion Proceedings of the 29th International Conference on Intelligent User Interfaces, 140–143. https://doi.org/10.1145/3640544.3645241. Online publication date: 18-Mar-2024.
    • (2024) Homogenization Effects of Large Language Models on Human Creative Ideation. Proceedings of the 16th Conference on Creativity & Cognition, 413–425. https://doi.org/10.1145/3635636.3656204. Online publication date: 23-Jun-2024.
    • (2024) Evaluating Creativity Support Tools via Homogenization Analysis. Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, 1–7. https://doi.org/10.1145/3613905.3651088. Online publication date: 11-May-2024.
    • (2024) Is It AI or Is It Me? Understanding Users' Prompt Journey with Text-to-Image Generative AI Tools. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1–13. https://doi.org/10.1145/3613904.3642861. Online publication date: 11-May-2024.
    • (2024) GenQuery: Supporting Expressive Visual Search with Generative Models. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1–19. https://doi.org/10.1145/3613904.3642847. Online publication date: 11-May-2024.
    • (2024) PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1–21. https://doi.org/10.1145/3613904.3642803. Online publication date: 11-May-2024.
    • (2024) CreativeConnect: Supporting Reference Recombination for Graphic Design Ideation with Generative AI. Proceedings of the CHI Conference on Human Factors in Computing Systems, 1–25. https://doi.org/10.1145/3613904.3642794. Online publication date: 11-May-2024.
    • (2024) CUPID: Contextual Understanding of Prompt-conditioned Image Distributions. Computer Graphics Forum 43, 3. https://doi.org/10.1111/cgf.15086. Online publication date: 10-Jun-2024.
