Abstract
An objective and fair evaluation metric is fundamental to scene text detection and recognition research. Existing metrics cannot properly handle the one-to-many and many-to-one matchings that arise naturally from inconsistent bounding box granularity. They also rely on thresholds to match ground truth and detection boxes, which leads to unstable matching results. In this paper, we propose a novel End-to-end Evaluation Metric (EEM) to tackle these problems. EEM handles one-to-many and many-to-one matching cases more reasonably and is threshold-free. We design a simple yet effective method to find matching groups among the ground truth and detection boxes in an image. We further employ a label merging method and use normalized scores to evaluate end-to-end text recognition methods more fairly. We conduct extensive experiments on the ICDAR2015 and RCTW datasets and on a new general OCR dataset covering 17 categories of real-life scenes. Experimental results demonstrate the effectiveness and fairness of the proposed evaluation metric.
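The abstract names three ingredients: threshold-free matching groups, label merging, and normalized scores. The sketch below illustrates how such a pipeline could fit together; it is not the paper's algorithm, whose details are not given here. It assumes axis-aligned boxes `(x1, y1, x2, y2)`, groups boxes as connected components of a "shares any area" graph, merges labels left to right, and scores each group with one minus the normalized edit distance. All function names are hypothetical. A group with no box on one side gets an empty-string label on that side.

```python
from itertools import product

def overlaps(a, b):
    """True if two axis-aligned boxes (x1, y1, x2, y2) share positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])

def matching_groups(gt, dt):
    """gt, dt: lists of (box, label). Returns a list of (gt_indices, dt_indices).

    Assumption: a group is a connected component of the bipartite graph that
    links every GT box to every DT box it overlaps, with no IoU threshold.
    """
    parent = list(range(len(gt) + len(dt)))  # GT node i -> i, DT node j -> len(gt) + j

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in product(range(len(gt)), range(len(dt))):
        if overlaps(gt[i][0], dt[j][0]):
            parent[find(i)] = find(len(gt) + j)

    groups = {}
    for node in range(len(parent)):
        gt_ids, dt_ids = groups.setdefault(find(node), ([], []))
        if node < len(gt):
            gt_ids.append(node)
        else:
            dt_ids.append(node - len(gt))
    return list(groups.values())

def merged_label(items, indices):
    """Concatenate labels left to right (an assumed reading order)."""
    ordered = sorted(indices, key=lambda i: items[i][0][0])
    return "".join(items[i][1] for i in ordered)

def edit_distance(s, t):
    """Levenshtein distance with a rolling one-row table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def group_score(gt_label, dt_label):
    """Assumed normalized score in [0, 1]: 1 - edit distance / longer length."""
    longest = max(len(gt_label), len(dt_label))
    return 1.0 if longest == 0 else 1.0 - edit_distance(gt_label, dt_label) / longest

# One GT line split into two detections, plus one spurious detection.
gt = [((0, 0, 100, 20), "HELLOWORLD")]
dt = [((0, 0, 48, 20), "HELLO"), ((52, 0, 100, 20), "WORLD"), ((200, 200, 240, 220), "X")]
for gt_ids, dt_ids in matching_groups(gt, dt):
    g, d = merged_label(gt, gt_ids), merged_label(dt, dt_ids)
    print(repr(g), repr(d), group_score(g, d))
```

In this toy case the split detection is merged back into one label and scored against the full ground truth line, rather than being penalized as two failed one-to-one matches, while the spurious detection forms its own group with an empty ground truth label.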
Notes
1. If a matching group contains no ground truth (GT) or detection (DT) box, the corresponding GT or DT label is an empty string.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Hao, J. et al. (2021). EEM: An End-to-end Evaluation Metric for Scene Text Detection and Recognition. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12824. Springer, Cham. https://doi.org/10.1007/978-3-030-86337-1_7
DOI: https://doi.org/10.1007/978-3-030-86337-1_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86336-4
Online ISBN: 978-3-030-86337-1
eBook Packages: Computer Science (R0)