Contextual transformer sequence-based recognition network for medical examination reports

Wan, Honglin; Zhong, Zongfeng; Li, Tianping; Zhang, Huaxiang; Sun, Jiande

doi:10.1007/s10489-022-04420-4

Published: 31 December 2022

Applied Intelligence, Volume 53, pages 17363–17380 (2023)

Honglin Wan¹, Zongfeng Zhong¹, Tianping Li¹, Huaxiang Zhang² & Jiande Sun² (ORCID: orcid.org/0000-0001-6157-2051)

Abstract

The automatic recognition of the medical examination report table (MERT) has received increasing attention in recent years, as it is an essential step toward intelligent healthcare and medical treatment. However, table prediction still faces challenges in practical use. In this paper, a recognition network (CoT_SRN) for medical examination reports is proposed to improve the recognition accuracy of MERT structure and to reconstruct the table image as a spreadsheet. The network is based on a contextual transformer sequence and consists of a CoT encoder and an SRN decoder. In the encoder, a CNN backbone built on the Contextual Transformer (CoT) proposed in this paper extracts structural features from the MERT image. In the decoder, an attention head with a gated recurrent unit (GRU) recognizes the feature sequence to obtain the cell locations and the table structure, which is represented in a structured language. In addition, MERT structure labels are defined in a character-level HTML format and are used in training the table structure recognition. The proposed method achieves competitive tree-edit-distance-based similarity (TEDS) scores on English datasets, such as PubTabNet and SciTSR, and on Chinese datasets, such as the Chinese medical document dataset (CMDD). This indicates that CoT_SRN maintains good performance across multi-language MERT structure recognition. The method is also verified on practical examples with folds and small-angle deflection. The experimental results show that it is promising for practical application.
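The abstract does not spell out the label vocabulary, so the following is only a minimal sketch of what a character-level HTML structure label might look like, assuming that cell text is discarded and that only table tags and span attributes are kept. The names `StructureExtractor`, `structure_tokens`, `char_level_label` and the sample table are hypothetical and are not taken from the paper.

```python
from html.parser import HTMLParser

# Assumption: only these tags and attributes describe table structure.
STRUCTURE_TAGS = {"table", "thead", "tbody", "tr", "td", "th"}
SPAN_ATTRS = ("rowspan", "colspan")


class StructureExtractor(HTMLParser):
    """Collect table-structure tags and ignore all cell text."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag not in STRUCTURE_TAGS:
            return
        # Keep only rowspan/colspan, since they change the cell grid.
        spans = "".join(f' {k}="{v}"' for k, v in attrs if k in SPAN_ATTRS)
        self.tokens.append(f"<{tag}{spans}>")

    def handle_endtag(self, tag):
        if tag in STRUCTURE_TAGS:
            self.tokens.append(f"</{tag}>")


def structure_tokens(html):
    """Return the table structure as a list of tag-level tokens."""
    parser = StructureExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens


def char_level_label(html):
    """Return the same structure as a sequence of single characters."""
    return list("".join(structure_tokens(html)))


# Hypothetical two-row report table with a merged header cell.
example = (
    '<table><tr><td colspan="2">Blood test</td></tr>'
    "<tr><td>WBC</td><td>6.1</td></tr></table>"
)

if __name__ == "__main__":
    print(structure_tokens(example))
    print(len(char_level_label(example)), "characters in the label")
```

A decoder trained on such labels would emit one symbol per step, and the predicted sequence could then be joined back into HTML and compared with the ground truth, for instance with a tree-based score such as TEDS.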

Data availability

The datasets generated and analysed during the current study are not publicly available because their release has not been authorized by the partner organization, but they are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported in part by Scientific Research Leader Studio of Jinan (No. 2021GXRC081), in part by Joint Project for Smart Computing of Shandong Natural Science Foundation (ZR2020LZH015), and in part by Taishan Scholar Project of Shandong, China (No. ts20190924).

Author information

Honglin Wan and Zongfeng Zhong contributed equally to this work.

Authors and Affiliations

School of Physics and Electronics, Shandong Normal University, Jinan, Shandong, China
Honglin Wan, Zongfeng Zhong & Tianping Li
School of Information Science and Engineering, Shandong Normal University, Jinan, 250358, China
Huaxiang Zhang & Jiande Sun

Corresponding author

Correspondence to Jiande Sun.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wan, H., Zhong, Z., Li, T. et al. Contextual transformer sequence-based recognition network for medical examination reports. Appl Intell 53, 17363–17380 (2023). https://doi.org/10.1007/s10489-022-04420-4

Accepted: 19 December 2022
Published: 31 December 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s10489-022-04420-4
