research-article

Open access

Word-As-Image for Semantic Typography

Authors:

Daniel Cohen-Or,

Ariel ShamirAuthors Info & Claims

ACM Transactions on Graphics (TOG), Volume 42, Issue 4

Article No.: 151, Pages 1 - 11

https://doi.org/10.1145/3592123

Published: 26 July 2023 Publication History

Abstract

A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually pleasing and legible manner. We rely on the remarkable ability of recent large pretrained language-vision models to distill textual concepts visually. We target simple, concise, black-and-white designs that convey the semantics clearly. We deliberately do not change the color or texture of the letters and do not use embellishments. Our method optimizes the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. We incorporate additional loss terms to ensure the legibility of the text and the preservation of the style of the font. We show high quality and engaging results on numerous examples and compare to alternative techniques. Code and demo will be available at our project page.

Supplementary Material

ZIP File (papers_301-supplemental.zip)

supplemental material.

Download
11.16 MB

MP4 File (papers_301_VOD.mp4)

presentation

Download
165.69 MB

References

[1]

Tomer Amit, Tal Shaharbany, Eliya Nachmani, and Lior Wolf. 2021. SegDiff: Image Segmentation with Diffusion Probabilistic Models.

[2]

Omri Avrahami, Dani Lischinski, and Ohad Fried. 2022. Blended Diffusion for Text-Driven Editing of Natural Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 18208--18218.

[3]

Samaneh Azadi, Matthew Fisher, Vladimir G. Kim, Zhaowen Wang, Eli Shechtman, and Trevor Darrell. 2018. Multi-Content GAN for Few-Shot Font Style Transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Salt Lake City, UT, USA, 7564--7573.

[4]

Elena Balashova, Amit H. Bermano, Vladimir G. Kim, Stephen DiVerdi, Aaron Hertzmann, and Thomas Funkhouser. 2019. Learning a Stroke-Based Representation for Fonts. Computer Graphics Forum 38, 1 (2019), 429--442.

[5]

Brad Barber and Hannu Huhdanpaa. 1995. QHull. The Geometry Center, University of Minnesota, http://www.geom.umn.edu/software/qhull (1995).

[6]

Daniel Berio, Frederic Fol Leymarie, Paul Asente, and Jose Echevarria. 2022. StrokeStyles: Stroke-Based Segmentation and Stylization of Fonts. ACM Trans. Graph. 41, 3, Article 28 (apr 2022), 21 pages.

Digital Library

[7]

Neill DF Campbell and Jan Kautz. 2014. Learning a Manifold of Fonts. ACM Transactions on Graphics (TOG) 33, 4 (2014). Article no. 91.

Digital Library

[8]

Hila Chefer, Shir Gur, and Lior Wolf. 2021. Transformer Interpretability Beyond Attention Visualization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 782--791.

[9]

Jooyoung Choi, Sungwon Kim, Yonghyun Jeong, Youngjune Gwon, and Sungroh Yoon. 2021. ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models. CoRR abs/2108.02938 (2021). arXiv:2108.02938 https://arxiv.org/abs/2108.02938

[10]

Boris Delaunay et al. 1934. Sur la sphere vide. Izv. Akad. Nauk SSSR, Otdelenie Matematicheskii i Estestvennyka Nauk 7, 793--800 (1934), 1--2.

[11]

Noa Fish, Lilach Perry, Amit Bermano, and Daniel Cohen-Or. 2020. SketchPatch: Sketch Stylization via Seamless Patch-Level Synthesis. ACM Trans. Graph. 39, 6, Article 227 (nov 2020), 14 pages.

Digital Library

[12]

Kevin Frans, Lisa B Soros, and Olaf Witkowski. 2021. Clipdraw: Exploring text-to-drawing synthesis through language-image encoders. arXiv preprint arXiv:2106.14843 (2021).

[13]

Freepik. 2010. Freepik Designs. https://www.freepik.com/

[14]

FreeType. 2009. FreeType library. https://freetype.org/

[15]

Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. 2022. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion.

[16]

Rinon Gal, Moab Arar, Yuval Atzmon, Amit H. Bermano, Gal Chechik, and Daniel Cohen-Or. 2023. Designing an Encoder for Fast Personalization of Text-to-Image Models.

[17]

David Ha and Douglas Eck. 2018. A Neural Representation of Sketch Drawings. In Sixth International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1704.03477.

[18]

Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. 2022. Prompt-to-prompt image editing with cross attention control. (2022).

[19]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising Diffusion Probabilistic Models. CoRR abs/2006.11239 (2020). arXiv:2006.11239 https://arxiv.org/abs/2006.11239

[20]

Kai Hormann and Günther Greiner. 2000. MIPS: An efficient global parametrization method. Technical Report. Erlangen-Nuernberg Univ (Germany) Computer Graphics Group.

[21]

Adobe Systems Inc. 1990. Adobe Type 1 Font Format. Addison Wesley Publishing Company.

[22]

Ajay Jain, Amber Xie, and Pieter Abbeel. 2022. VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models. arXiv preprint arXiv:2211.11319 (2022).

[23]

Yue Jiang, Zhouhui Lian, Yingmin Tang, and Jianguo Xiao. 2019. SCFont: Structure-Guided Chinese Font Generation via Deep Stacked Networks. Proceedings of the AAAI Conference on Artificial Intelligence 33, 01 (Jul. 2019), 4015--4022.

Digital Library

[24]

Ji Lee. 2011. Word As Image. Adams Media, London.

[25]

Tzu-Mao Li, Michal Lukáč, Gharbi Michaël, and Jonathan Ragan-Kelley. 2020. Differentiable Vector Graphics Rasterization for Editing and Learning. ACM Trans. Graph. (Proc. SIGGRAPH Asia) 39, 6 (2020), 193:1--193:15.

[26]

Zhouhui Lian, Bo Zhao, Xudong Chen, and Jianguo Xiao. 2018. EasyFont: A style learning-based system to easily build your large-scale handwriting fonts. ACM Transactions on Graphics (TOG) 38, 1 (2018), 1--18.

Digital Library

[27]

Raphael Gontijo Lopes, David Ha, Douglas Eck, and Jonathon Shlens. 2019. A Learned Representation for Scalable Vector Graphics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV).

[28]

Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. 2022. Towards Layer-wise Image Vectorization.

[29]

Wendong Mao, Shuai Yang, Huihong Shi, Jiaying Liu, and Zhongfeng Wang. 2022. Intelligent Typography: Artistic Text Style Transfer for Complex Texture and Structure. IEEE Transactions on Multimedia (2022), 1--15.

Digital Library

[30]

Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. 2022. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations. In International Conference on Learning Representations.

[31]

Gal Metzer, Elad Richardson, Or Patashnik, Raja Giryes, and Daniel Cohen-Or. 2022. Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures.

[32]

Oscar Michel, Roi Bar-On, Richard Liu, Sagie Benaim, and Rana Hanocka. 2021. Text2Mesh: Text-Driven Neural Stylization for Meshes. arXiv preprint arXiv:2112.03221 (2021).

[33]

Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. 2021. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741 (2021).

[34]

Laurence Penney. 1996. A History of TrueType. https://www.truetype-typography.com/.

[35]

Huy Quoc Phan, Hongbo Fu, and Antoni B Chan. 2015. Flexyfont: Learning Transferring Rules for Flexible Typeface Synthesis. Computer Graphics Forum 34, 7 (2015), 245--256.

Digital Library

[36]

Ben Poole, Ajay Jain, Jonathan T Barron, and Ben Mildenhall. 2022. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988 (2022).

[37]

Lakshman Prasad. 1997. Morphological analysis of shapes. CNLS newsletter 139, 1 (1997), 1997--07.

[38]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning Transferable Visual Models From Natural Language Supervision. CoRR abs/2103.00020 (2021). arXiv:2103.00020 https://arxiv.org/abs/2103.00020

[39]

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125 (2022).

[40]

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. 2021. High-Resolution Image Synthesis with Latent Diffusion Models. arXiv:2112.10752 [cs.CV]

[41]

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention. Springer, 234--241.

[42]

Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, and Kfir Aberman. 2022. DreamBooth: Fine Tuning Text-to-image Diffusion Models for Subject-Driven Generation. (2022).

[43]

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho, David J Fleet, and Mohammad Norouzi. 2022. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding.

[44]

Kunpeng Song, Ligong Han, Bingchen Liu, Dimitris Metaxas, and Ahmed Elgammal. 2022. Diffusion Guided Domain Adaptation of Image Generators.

[45]

Rapee Suveeranont and Takeo Igarashi. 2010. Example-Based Automatic Font Generation. In Smart Graphics. Number LNCS 6133 in Lecture Notes in Computer Science. 127--138.

[46]

Purva Tendulkar, Kalpesh Krishna, Ramprasaath R. Selvaraju, and Devi Parikh. 2019. Trick or TReAT: Thematic Reinforcement for Artistic Typography.

[47]

Guy Tevet, Brian Gordon, Amir Hertz, Amit H Bermano, and Daniel Cohen-Or. 2022. Motionclip: Exposing human motion generation to clip space. In Computer Vision-ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23--27, 2022, Proceedings, Part XXII. Springer, 358--374.

[48]

Yingtao Tian and David Ha. 2021. Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Conceptst. arXiv:2109.08857 [cs.NE]

[49]

Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. 2022a. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation.

[50]

Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. 2022b. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation.

[51]

Yael Vinker, Yuval Alaluf, Daniel Cohen-Or, and Ariel Shamir. 2022a. CLIPascene: Scene Sketching with Different Types and Levels of Abstraction.

[52]

Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Christian Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, and Ariel Shamir. 2022b. CLIPasso: Semantically-Aware Object Sketching. ACM Trans. Graph. 41, 4, Article 86 (jul 2022), 11 pages.

Digital Library

[53]

Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, and Thomas Wolf. 2022. Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers.

[54]

Wenjing Wang, Jiaying Liu, Shuai Yang, and Zongming Guo. 2019. Typography With Decor: Intelligent Text Style Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55]

Yizhi Wang and Zhouhui Lian. 2021. DeepVecFont: Synthesizing High-Quality Vector Fonts via Dual-Modality Learning. ACM Transactions on Graphics 40, 6 (Dec. 2021), 1--15.

Digital Library

[56]

Yizhi Wang, Guo Pu, Wenhan Luo, Yexin Wang, Pengfei Xiong, Hongwen Kang, and Zhouhui Lian. 2022. Aesthetic Text Logo Synthesis via Content-Aware Layout Inferring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2436--2445.

[57]

Jie Xu and Craig S. Kaplan. 2007. Calligraphic Packing. In Proceedings of Graphics Interface 2007 on - GI '07. ACM Press, Montreal, Canada, 43.

Digital Library

[58]

Shuai Yang, Jiaying Liu, Zhouhui Lian, and Zongming Guo. 2017. Awesome Typography: Statistics-Based Text Effects Transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]

Shuai Yang, Jiaying Liu, Wenhan Yang, and Zongming Guo. 2018. Context-Aware Un-supervised Text Stylization. In Proceedings of the 26th ACM International Conference on Multimedia (Seoul, Republic of Korea) (MM '18). Association for Computing Machinery, New York, NY, USA, 1688--1696.

Digital Library

[60]

Shuai Yang, Zhangyang Wang, and Jiaying Liu. 2022. Shape-Matching GAN++: Scale Controllable Dynamic Artistic Text Style Transfer. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 7 (2022), 3807--3820.

[61]

Junsong Zhang, Yu Wang, Weiyi Xiao, and Zhenshan Luo. 2017. Synthesizing Ornamental Typefaces: Synthesizing Ornamental Typefaces. Computer Graphics Forum 36, 1 (Jan. 2017), 64--75.

Digital Library

[62]

Renrui Zhang, Ziyu Guo, Wei Zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, and Hongsheng Li. 2021. PointCLIP: Point Cloud Understanding by CLIP.

[63]

Changqing Zou, Junjie Cao, Warunika Ranaweera, Ibraheem Alhashim, Ping Tan, Alla Sheffer, and Hao Zhang. 2016. Legible Compact Calligrams. ACM Transactions on Graphics 35, 4 (July 2016), 1--12.

Digital Library

[64]

Ju Jia Zou, Hung-Hsin Chang, and Hong Yan. 2001. Shape skeletonization by identifying discrete local symmetries. Pattern Recognition 34, 10 (2001), 1895--1905.

Cited By

Zhang GQian YDeng JCai X(2024)Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion ModelsApplied Sciences10.3390/app1408333814:8(3338)Online publication date: 15-Apr-2024
https://doi.org/10.3390/app14083338
Tojo KShamir ABickel BUmetani N(2024)Fabricable 3D Wire ArtACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657453(1-11)Online publication date: 13-Jul-2024
https://dl.acm.org/doi/10.1145/3641519.3657453
Avrahami OHertz AVinker YArar MFruchter SFried OCohen-Or DLischinski D(2024)The Chosen One: Consistent Characters in Text-to-Image Diffusion ModelsACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657430(1-12)Online publication date: 13-Jul-2024
https://dl.acm.org/doi/10.1145/3641519.3657430
Show More Cited By

Index Terms

Word-As-Image for Semantic Typography
1. Computing methodologies

Recommendations

Two template matching approaches to Arabic, Amharic and Latin isolated characters recognition

With the establishment of commercial OCR systems for Latin text, recent research efforts have been directed at the design of recognition systems for non-Latin scripts, such as Japanese, Cyrillic, Chinese, Hindi, Tibetan, and in particular Arabic. The ...
Automatic script identification from images using cluster-based templates
ICDAR '95: Proceedings of the Third International Conference on Document Analysis and Recognition (Volume 1) - Volume 1

We describe a system that automatically identifies the script used in documents stored electronically in image form. The system can learn to distinguish any number of scripts. It develops a set of representative symbols (templates) for each script by ...
Automatic generation of large-scale handwriting fonts via style learning
SA '16: SIGGRAPH ASIA 2016 Technical Briefs

Generating personal handwriting fonts with large amounts of characters is a boring and time-consuming task. Take Chinese fonts as an example, the official standard GB18030-2000 for commercial font products contains 27533 simplified Chinese characters. ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Graphics

ACM Transactions on Graphics Volume 42, Issue 4

August 2023

1912 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/3609020

Issue’s Table of Contents

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 July 2023

Published in TOG Volume 42, Issue 4

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

22
Total Citations
View Citations
1,016
Total Downloads

Downloads (Last 12 months)896
Downloads (Last 6 weeks)145

Reflects downloads up to 15 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhang GQian YDeng JCai X(2024)Inv-ReVersion: Enhanced Relation Inversion Based on Text-to-Image Diffusion ModelsApplied Sciences10.3390/app1408333814:8(3338)Online publication date: 15-Apr-2024
https://doi.org/10.3390/app14083338
Tojo KShamir ABickel BUmetani N(2024)Fabricable 3D Wire ArtACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657453(1-11)Online publication date: 13-Jul-2024
https://dl.acm.org/doi/10.1145/3641519.3657453
Avrahami OHertz AVinker YArar MFruchter SFried OCohen-Or DLischinski D(2024)The Chosen One: Consistent Characters in Text-to-Image Diffusion ModelsACM SIGGRAPH 2024 Conference Papers10.1145/3641519.3657430(1-12)Online publication date: 13-Jul-2024
https://dl.acm.org/doi/10.1145/3641519.3657430
Xiao SWang LMa XZeng W(2024)TypeDance: Creating Semantic Typographic Logos from Image through Personalized GenerationProceedings of the 2024 CHI Conference on Human Factors in Computing Systems10.1145/3613904.3642185(1-18)Online publication date: 11-May-2024
https://dl.acm.org/doi/10.1145/3613904.3642185
Zheng YCun XXia MPun C(2024)Sketch Video SynthesisComputer Graphics Forum10.1111/cgf.1504443:2Online publication date: 30-Apr-2024
https://doi.org/10.1111/cgf.15044
Tatsukawa YShen IQi AKoyama YIgarashi TShamir A(2024)FontCLIP: A Semantic Typography Visual‐Language Model for Multilingual Font ApplicationsComputer Graphics Forum10.1111/cgf.1504343:2Online publication date: 30-Apr-2024
https://doi.org/10.1111/cgf.15043
Mandal MBirkbeck NAdsumilli BBovik A(2024)Legit: Text Legibility For User-Generated Media2024 IEEE International Conference on Image Processing (ICIP)10.1109/ICIP51287.2024.10647498(1152-1158)Online publication date: 27-Oct-2024
https://doi.org/10.1109/ICIP51287.2024.10647498
Koo JPark CSung M(2024)Posterior Distillation Sampling2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.01268(13352-13361)Online publication date: 16-Jun-2024
https://doi.org/10.1109/CVPR52733.2024.01268
Xing XZhou HWang CZhang JXu DYu Q(2024)SVGDreamer: Text Guided SVG Generation with Diffusion Model2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00435(4546-4555)Online publication date: 16-Jun-2024
https://doi.org/10.1109/CVPR52733.2024.00435
Gal RVinker YAlaluf YBermano ACohen-Or DShamir AChechik G(2024)Breathing Life Into Sketches Using Text-to-Video Priors2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)10.1109/CVPR52733.2024.00414(4325-4336)Online publication date: 16-Jun-2024
https://doi.org/10.1109/CVPR52733.2024.00414
Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents