DOI: 10.1145/3343031.3350929

Editing Text in the Wild

Published: 15 October 2019

Abstract

In this paper, we are interested in editing text in natural images: replacing or modifying a word in a source image with another one while maintaining a realistic look. This task is challenging because the styles of both the background and the text must be preserved so that the edited image is visually indistinguishable from the source image. Specifically, we propose an end-to-end trainable style retention network (SRNet) that consists of three modules: a text conversion module, a background inpainting module, and a fusion module. The text conversion module changes the text content of the source image into the target text while keeping the original text style. The background inpainting module erases the original text and fills the text region with appropriate texture. The fusion module combines the information from the former two modules and generates the edited text image. To our knowledge, this work is the first attempt to edit text in natural images at the word level. Both visual effects and quantitative results on synthetic and real-world data (ICDAR 2013) fully confirm the importance and necessity of the modular decomposition. We also conduct extensive experiments to validate the usefulness of our method in various real-world applications, such as text image synthesis, augmented reality (AR) translation, and information hiding.
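The abstract describes a three-way modular decomposition. The sketch below is our own illustration of that data flow, not the authors' code: every function body is a hypothetical placeholder (the real modules are learned networks), and only the way the fusion module consumes the outputs of the other two is depicted.

```python
import numpy as np

# Illustrative sketch of SRNet's module decomposition (assumed structure,
# not the published implementation). Each module is stubbed out; the real
# modules are trained convolutional networks.

def text_conversion_module(source_img: np.ndarray, target_text: str) -> np.ndarray:
    """Placeholder: would render `target_text` in the source word's style.
    Here it just produces a foreground canvas of the same shape."""
    return np.zeros_like(source_img)

def background_inpainting_module(source_img: np.ndarray) -> np.ndarray:
    """Placeholder: would erase the original text and fill the region
    with plausible background texture. Here it copies the input."""
    return source_img.copy()

def fusion_module(foreground: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Placeholder: would learn to composite the styled text onto the
    inpainted background. Here it is a naive element-wise blend."""
    return 0.5 * foreground + 0.5 * background

def edit_text(source_img: np.ndarray, target_text: str) -> np.ndarray:
    """End-to-end pipeline: convert text, inpaint background, fuse."""
    fg = text_conversion_module(source_img, target_text)
    bg = background_inpainting_module(source_img)
    return fusion_module(fg, bg)

img = np.random.rand(64, 256, 3)  # H x W x C word image
out = edit_text(img, "HELLO")
assert out.shape == img.shape      # edited image matches the source size
```

The point of the decomposition, as the abstract argues, is that text style and background texture are preserved by separate specialists before a dedicated fusion step composites them.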



Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia
October 2019
2794 pages
ISBN:9781450368896
DOI:10.1145/3343031
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. gan
  2. text editing
  3. text erasure
  4. text synthesis

Qualifiers

  • Research-article

Conference

MM '19

Acceptance Rates

MM '19 Paper Acceptance Rate: 252 of 936 submissions (27%)
Overall Acceptance Rate: 2,145 of 8,556 submissions (25%)

