DOI: 10.1145/3581783.3612059
research-article

PBFormer: Capturing Complex Scene Text Shape with Polynomial Band Transformer

Published: 27 October 2023

    Abstract

    We present PBFormer, an efficient yet powerful scene text detector that unifies the transformer with a novel text shape representation, the Polynomial Band (PB). The representation uses four polynomial curves to fit a text's top, bottom, left, and right sides, so it can capture text with a complex shape by varying the polynomial coefficients. PB has appealing features compared with conventional representations: 1) It can model different curvatures with a fixed number of parameters, while polygon-points-based methods need a different number of points. 2) It can distinguish adjacent or overlapping texts because they have clearly different curve coefficients, while segmentation-based or points-based methods suffer from adhesive spatial positions. PBFormer combines the PB with the transformer, which can directly generate smooth text contours sampled from the predicted curves without interpolation. A parameter-free cross-scale pixel attention (CPA) module is employed to highlight the feature map of a suitable scale while suppressing the other feature maps. This simple operation helps detect small-scale texts and is compatible with the one-stage DETR framework, which needs no NMS postprocessing. Furthermore, PBFormer is trained with a shape-contained loss, which not only enforces the piecewise alignment between the ground-truth and predicted curves but also makes the curves' positions and shapes consistent with each other. Without bells and whistles such as text pre-training, our method is superior to the previous state-of-the-art text detectors on arbitrary-shaped text datasets. Code will be made public.
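
    The idea of reading a contour directly off four polynomial sides can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact parameterization, so the code assumes the top and bottom sides are written as y = p(x) and the left and right sides as x = p(y), each with a hypothetical (coefficients, start, end) triple. The names `horner`, `sample_side`, and `sample_contour` are illustrative only.

```python
def horner(coeffs, t):
    """Evaluate c0 + c1*t + c2*t**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def sample_side(coeffs, start, end, n, along_x=True):
    """Sample n evenly spaced points on one side of the band.

    Assumed parameterization (not taken from the paper):
      along_x=True  -> y = p(x), x runs from start to end (top/bottom)
      along_x=False -> x = p(y), y runs from start to end (left/right)
    """
    points = []
    for i in range(n):
        t = start + (end - start) * i / (n - 1)
        v = horner(coeffs, t)
        points.append((t, v) if along_x else (v, t))
    return points


def sample_contour(top, bottom, left, right, n=5):
    """Join the four sampled sides into one closed contour.

    Each argument is a hypothetical (coeffs, start, end) triple.
    Order: top, then right, then bottom, then left.
    """
    return (
        sample_side(*top, n, along_x=True)
        + sample_side(*right, n, along_x=False)
        + sample_side(*bottom, n, along_x=True)
        + sample_side(*left, n, along_x=False)
    )


# A curved text band: the top side is y = 0.25 * (x - 4)**2 and the
# bottom side is the same curve shifted down by 2 (image coordinates).
contour = sample_contour(
    top=([4.0, -2.0, 0.25], 0.0, 8.0),
    bottom=([6.0, -2.0, 0.25], 8.0, 0.0),
    left=([0.0], 6.0, 4.0),
    right=([8.0], 4.0, 6.0),
    n=5,
)
print(len(contour), contour[:3])
```

    Raising `n` gives a denser contour from the same coefficients, which is the sense in which a fixed number of parameters can describe shapes of different curvature without interpolating between annotated points.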

    Supplementary Material

    MP4 File (1636-video.mp4)
    Presentation Video

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. detection transformer
    2. polynomial regression
    3. scene text detection
    4. scene text representation

    Qualifiers

    • Research-article

    Conference

    MM '23
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 995 of 4,171 submissions, 24%

    Upcoming Conference

    MM '24
    The 32nd ACM International Conference on Multimedia
    October 28 - November 1, 2024
    Melbourne, VIC, Australia

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 51
    • Downloads (Last 12 months): 51
    • Downloads (Last 6 weeks): 4
    Reflects downloads up to 28 Jul 2024
