Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Open access

Word-As-Image for Semantic Typography

Published: 26 July 2023 Publication History

Abstract

A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually pleasing and legible manner. We rely on the remarkable ability of recent large pretrained language-vision models to distill textual concepts visually. We target simple, concise, black-and-white designs that convey the semantics clearly. We deliberately do not change the color or texture of the letters and do not use embellishments. Our method optimizes the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. We incorporate additional loss terms to ensure the legibility of the text and the preservation of the style of the font. We show high quality and engaging results on numerous examples and compare to alternative techniques. Code and demo will be available at our project page.

Supplementary Material

ZIP File (papers_301-supplemental.zip)
supplemental material.
MP4 File (papers_301_VOD.mp4)
presentation

References

[1]
Tomer Amit, Tal Shaharbany, Eliya Nachmani, and Lior Wolf. 2021. SegDiff: Image Segmentation with Diffusion Probabilistic Models.
[2]
Omri Avrahami, Dani Lischinski, and Ohad Fried. 2022. Blended Diffusion for Text-Driven Editing of Natural Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 18208--18218.
[3]
Samaneh Azadi, Matthew Fisher, Vladimir G. Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell. 2018. Multi-Content GAN for Few-Shot Font Style Transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Salt Lake City, UT, USA, 7564--7573.
[4]
Elena Balashova, Amit H. Bermano, Vladimir G. Kim, Stephen DiVerdi, Aaron Hertzmann, and Thomas Funkhouser. 2019. Learning a Stroke-Based Representation for Fonts. Computer Graphics Forum 38, 1 (2019), 429--442.
[5]
Brad Barber and Hannu Huhdanpaa. 1995. QHull. The Geometry Center, University of Minnesota, http://www.geom.umn.edu/software/qhull (1995).
[6]
Daniel Berio, Frederic Fol Leymarie, Paul Asente, and Jose Echevarria. 2022. StrokeStyles: Stroke-Based Segmentation and Stylization of Fonts. ACM Trans. Graph. 41, 3, Article 28 (apr 2022), 21 pages.
[7]
Neill DF Campbell and Jan Kautz. 2014. Learning a Manifold of Fonts. ACM Transactions on Graphics (TOG) 33, 4 (2014). Article no. 91.
[8]
Hila Chefer, Shir Gur, and Lior Wolf. 2021. Transformer Interpretability Beyond Attention Visualization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 782--791.
[9]
Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. 2021. ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models. CoRR abs/2108.02938 (2021). arXiv:2108.02938 https://arxiv.org/abs/2108.02938
[10]
Boris Delaunay et al. 1934. Sur la sphere vide. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskii i Estestvennyka Nauk 7, 793--800 (1934), 1--2.
[11]
Noa Fish, Lilach Perry, Amit Bermano, and Daniel Cohen-Or. 2020. SketchPatch: Sketch Stylization via Seamless Patch-Level Synthesis. ACM Trans. Graph. 39, 6, Article 227 (nov 2020), 14 pages.
[12]
Kevin Frans, Lisa B Soros, and Olaf Witkowski. 2021. Clipdraw: Exploring text-to-drawing synthesis through language-image encoders. arXiv preprint arXiv:2106.14843 (2021).
[13]
Freepik. 2010. Freepik Designs. https://www.freepik.com/
[14]
FreeType. 2009. FreeType library. https://freetype.org/
[15]
Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. 2022. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion.
[16]
Rinon Gal, Moab Arar, Yuval Atzmon, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. 2023. Designing an Encoder for Fast Personalization of Text-to-Image Models.
[17]
David Ha and Douglas Eck. 2018. A Neural Representation of Sketch Drawings. In Sixth International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1704.03477.
[18]
Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2022. Prompt-to-prompt image editing with cross attention control. (2022).
[19]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. CoRR abs/2006.11239 (2020). arXiv:2006.11239 https://arxiv.org/abs/2006.11239
[20]
Kai Hormann and Günther Greiner. 2000. MIPS: An efficient global parametrization method. Technical Report. Erlangen-Nuernberg Univ (Germany) Computer Graphics Group.
[21]
Adobe Systems Inc. 1990. Adobe Type 1 Font Format. Addison Wesley Publishing Company.
[22]
Ajay Jain, Amber Xie, and Pieter Abbeel. 2022. VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models. arXiv preprint arXiv:2211.11319 (2022).
[23]
Yue Jiang, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. 2019. SCFont: Structure-Guided Chinese Font Generation via Deep Stacked Networks. Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (Jul. 2019), 4015--4022.
[24]
Ji Lee. 2011. Word As Image. Adams Media, London.
[25]
Tzu-Mao Li, Michal Lukáč, Gharbi Michaël, and Jonathan Ragan-Kelley. 2020. Differentiable Vector Graphics Rasterization for Editing and Learning. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 39, 6 (2020), 193:1--193:15.
[26]
Zhouhui Lian, Bo Zhao, Xudong Chen, and Jianguo Xiao. 2018. EasyFont: A style learning-based system to easily build your large-scale handwriting fonts. ACM Transactions on Graphics (TOG) 38, 1 (2018), 1--18.
[27]
Raphael Gontijo Lopes, David Ha, Douglas Eck, and Jonathon Shlens. 2019. A Learned Representation for Scalable Vector Graphics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).
[28]
Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. 2022. Towards Layer-wise Image Vectorization.
[29]
Wendong Mao, Shuai Yang, Huihong Shi, Jiaying Liu, and Zhongfeng Wang. 2022. Intelligent Typography: Artistic Text Style Transfer for Complex Texture and Structure. IEEE Transactions on Multimedia (2022), 1--15.
[30]
Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. 2022. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations. In International Conference on Learning Representations.
[31]
Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. 2022. Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures.
[32]
Oscar Michel, Roi Bar-On, Richard Liu, Sagie Benaim, and Rana Hanocka. 2021. Text2Mesh: Text-Driven Neural Stylization for Meshes. arXiv preprint arXiv:2112.03221 (2021).
[33]
Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021).
[34]
Laurence Penney. 1996. A History of TrueType. https://www.truetype-typography.com/.
[35]
Huy Quoc Phan, Hongbo Fu, and Antoni B Chan. 2015. Flexyfont: Learning Transferring Rules for Flexible Typeface Synthesis. Computer Graphics Forum 34, 7 (2015), 245--256.
[36]
Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. 2022. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988 (2022).
[37]
Lakshman Prasad. 1997. Morphological analysis of shapes. CNLS newsletter 139, 1 (1997), 1997--07.
[38]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. CoRR abs/2103.00020 (2021). arXiv:2103.00020 https://arxiv.org/abs/2103.00020
[39]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022).
[40]
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv:2112.10752 [cs.CV]
[41]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention. Springer, 234--241.
[42]
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. 2022. DreamBooth: Fine Tuning Text-to-image Diffusion Models for Subject-Driven Generation. (2022).
[43]
Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding.
[44]
Kunpeng Song, Ligong Han, Bingchen Liu, Dimitris Metaxas, and Ahmed Elgammal. 2022. Diffusion Guided Domain Adaptation of Image Generators.
[45]
Rapee Suveeranont and Takeo Igarashi. 2010. Example-Based Automatic Font Generation. In Smart Graphics. Number LNCS 6133 in Lecture Notes in Computer Science. 127--138.
[46]
Purva Tendulkar, Kalpesh Krishna, Ramprasaath R. Selvaraju, and Devi Parikh. 2019. Trick or TReAT: Thematic Reinforcement for Artistic Typography.
[47]
Guy Tevet, Brian Gordon, Amir Hertz, Amit H Bermano, and Daniel Cohen-Or. 2022. Motionclip: Exposing human motion generation to clip space. In Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXII. Springer, 358--374.
[48]
Yingtao Tian and David Ha. 2021. Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Conceptst. arXiv:2109.08857 [cs.NE]
[49]
Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. 2022a. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation.
[50]
Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. 2022b. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation.
[51]
Yael Vinker, Yuval Alaluf, Daniel Cohen-Or, and Ariel Shamir. 2022a. CLIPascene: Scene Sketching with Different Types and Levels of Abstraction.
[52]
Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Christian Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, and Ariel Shamir. 2022b. CLIPasso: Semantically-Aware Object Sketching. ACM Trans. Graph. 41, 4, Article 86 (jul 2022), 11 pages.
[53]
Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, and Thomas Wolf. 2022. Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers.
[54]
Wenjing Wang, Jiaying Liu, Shuai Yang, and Zongming Guo. 2019. Typography With Decor: Intelligent Text Style Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55]
Yizhi Wang and Zhouhui Lian. 2021. DeepVecFont: Synthesizing High-Quality Vector Fonts via Dual-Modality Learning. ACM Transactions on Graphics 40, 6 (Dec. 2021), 1--15.
[56]
Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei Xiong, Hongwen Kang, and Zhouhui Lian. 2022. Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2436--2445.
[57]
Jie Xu and Craig S. Kaplan. 2007. Calligraphic Packing. In Proceedings of Graphics Interface 2007 on - GI '07. ACM Press, Montreal, Canada, 43.
[58]
Shuai Yang, Jiaying Liu, Zhouhui Lian, and Zongming Guo. 2017. Awesome Typography: Statistics-Based Text Effects Transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59]
Shuai Yang, Jiaying Liu, Wenhan Yang, and Zongming Guo. 2018. Context-Aware Un-supervised Text Stylization. In Proceedings of the 26th ACM International Conference on Multimedia (Seoul, Republic of Korea) (MM '18). Association for Computing Machinery, New York, NY, USA, 1688--1696.
[60]
Shuai Yang, Zhangyang Wang, and Jiaying Liu. 2022. Shape-Matching GAN++: Scale Controllable Dynamic Artistic Text Style Transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 7 (2022), 3807--3820.
[61]
Junsong Zhang, Yu Wang, Weiyi Xiao, and Zhenshan Luo. 2017. Synthesizing Ornamental Typefaces: Synthesizing Ornamental Typefaces. Computer Graphics Forum 36, 1 (Jan. 2017), 64--75.
[62]
Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, and Hongsheng Li. 2021. PointCLIP: Point Cloud Understanding by CLIP.
[63]
Changqing Zou, Junjie Cao, Warunika Ranaweera, Ibraheem Alhashim, Ping Tan, Alla Sheffer, and Hao Zhang. 2016. Legible Compact Calligrams. ACM Transactions on Graphics 35, 4 (July 2016), 1--12.
[64]
Ju Jia Zou, Hung-Hsin Chang, and Hong Yan. 2001. Shape skeletonization by identifying discrete local symmetries. Pattern Recognition 34, 10 (2001), 1895--1905.

Cited By

View all
  • (2024)Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion ModelsApplied Sciences10.3390/app1408333814:8(3338)Online publication date: 15-Apr-2024
  • (2024)Fabricable 3D Wire ArtACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657453(1-11)Online publication date: 13-Jul-2024
  • (2024)The Chosen One: Consistent Characters in Text-to-Image Diffusion ModelsACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657430(1-12)Online publication date: 13-Jul-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Graphics
ACM Transactions on Graphics  Volume 42, Issue 4
August 2023
1912 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3609020
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 July 2023
Published in TOG Volume 42, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. semantic typography
  2. stable diffusion
  3. fonts
  4. SVG

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)896
  • Downloads (Last 6 weeks)145
Reflects downloads up to 15 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion ModelsApplied Sciences10.3390/app1408333814:8(3338)Online publication date: 15-Apr-2024
  • (2024)Fabricable 3D Wire ArtACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657453(1-11)Online publication date: 13-Jul-2024
  • (2024)The Chosen One: Consistent Characters in Text-to-Image Diffusion ModelsACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657430(1-12)Online publication date: 13-Jul-2024
  • (2024)TypeDance: Creating Semantic Typographic Logos from Image through Personalized GenerationProceedings of the 2024 CHI Conference on Human Factors in Computing Systems10.1145/3613904.3642185(1-18)Online publication date: 11-May-2024
  • (2024)Sketch Video SynthesisComputer Graphics Forum10.1111/cgf.1504443:2Online publication date: 30-Apr-2024
  • (2024)FontCLIP: A Semantic Typography Visual‐Language Model for Multilingual Font ApplicationsComputer Graphics Forum10.1111/cgf.1504343:2Online publication date: 30-Apr-2024
  • (2024)Legit: Text Legibility For User-Generated Media2024 IEEE International Conference on Image Processing (ICIP)10.1109/ICIP51287.2024.10647498(1152-1158)Online publication date: 27-Oct-2024
  • (2024)Posterior Distillation Sampling2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.01268(13352-13361)Online publication date: 16-Jun-2024
  • (2024)SVGDreamer: Text Guided SVG Generation with Diffusion Model2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00435(4546-4555)Online publication date: 16-Jun-2024
  • (2024)Breathing Life Into Sketches Using Text-to-Video Priors2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00414(4325-4336)Online publication date: 16-Jun-2024
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media