DOI: 10.1145/3613904.3642824

PlantoGraphy: Incorporating Iterative Design Process into Generative Artificial Intelligence for Landscape Rendering

Published: 11 May 2024

Abstract

    Landscape renderings are realistic images of landscape sites that allow stakeholders to better perceive and evaluate design ideas. While recent advances in generative artificial intelligence (GAI) enable the automated generation of landscape renderings, end-to-end methods are not compatible with common design processes, leading to insufficient alignment with design idealizations and limited cohesion of iterative landscape design. Informed by a formative study of design requirements, we present PlantoGraphy, an iterative design system that allows interactive configuration of GAI models to accommodate human-centered design practice. It incorporates a two-stage pipeline: first, a concretization module transforms conceptual ideas into concrete scene layouts with a domain-oriented large language model; second, an illustration module converts scene layouts into realistic landscape renderings with a layout-guided diffusion model fine-tuned through Low-Rank Adaptation (LoRA). PlantoGraphy has undergone a series of performance evaluations and user studies, demonstrating its effectiveness in landscape rendering generation and the high recognition of its interactive functionality.
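    The two-stage pipeline described above can be sketched as a minimal Python skeleton. All names here (`concretize`, `illustrate`, `SceneElement`) are hypothetical and illustrative only; they do not reflect PlantoGraphy's actual implementation or API, and the model calls are stubbed out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical sketch of the paper's two-stage pipeline; function and
# class names are illustrative, not the authors' actual API.

@dataclass
class SceneElement:
    species: str                              # plant species placed in the scene
    bbox: Tuple[float, float, float, float]   # normalized (x, y, w, h)

@dataclass
class SceneLayout:
    elements: List[SceneElement] = field(default_factory=list)

def concretize(concept: str) -> SceneLayout:
    """Stage 1 (concretization): per the abstract, a domain-oriented large
    language model turns a conceptual idea into a concrete scene layout.
    Stubbed here with a fixed two-element layout."""
    return SceneLayout(elements=[
        SceneElement("pine", (0.10, 0.20, 0.30, 0.60)),
        SceneElement("maple", (0.60, 0.30, 0.25, 0.50)),
    ])

def illustrate(layout: SceneLayout) -> str:
    """Stage 2 (illustration): per the abstract, a layout-guided diffusion
    model fine-tuned via LoRA renders the layout into a realistic image.
    Stubbed here as a text description of what would be rendered."""
    return "rendering of " + ", ".join(e.species for e in layout.elements)

# The split into two stages means the layout can be inspected and revised
# before any rendering happens, supporting iterative design.
layout = concretize("a quiet courtyard garden")
print(illustrate(layout))
```

    Separating the stages is what makes the pipeline iterative: a designer can edit the intermediate `SceneLayout` and re-run only the illustration step.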

    Supplemental Material

    MP4 File - Video Presentation
    Video Presentation
    MP4 File - A video preview of our work
    The video briefly demonstrates our work, including background, research question, system design and operational demonstration.
    ZIP File - The plant image examples used for LoRA training
    The image examples used for LoRA training cover 23 distinct plant species. The dataset should include diverse representations of each species, with variations in camera angle, size, and spatial arrangement.


          Published In

          CHI '24: Proceedings of the CHI Conference on Human Factors in Computing Systems
          May 2024
          18961 pages
          ISBN:9798400703300
          DOI:10.1145/3613904

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. Landscape rendering
          2. generative artificial intelligence
          3. large language model
          4. scene graph

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Data Availability

          The plant image examples used for LoRA training: the examples cover 23 distinct plant species, with diverse representations of each species varying in camera angle, size, and spatial arrangement. https://dl.acm.org/doi/10.1145/3613904.3642824#pn5741-supplemental-material-1.zip

          Funding Sources

          • Guangzhou Basic and Applied Basic Research Foundation

          Conference

          CHI '24

          Acceptance Rates

          Overall Acceptance Rate 6,199 of 26,314 submissions, 24%


