A Multiclass Imbalanced Dataset Classification of Symbols from Piping and Instrumentation Diagrams

  • Conference paper
Document Analysis and Recognition - ICDAR 2024 (ICDAR 2024)

Abstract

Engineering diagrams provide a rich source of information and are widely used across different industries. Recent years have seen growing research interest in developing solutions for processing and analysing these diagrams using a wide range of image-processing and computer vision techniques. In this paper, we first present a new multiclass imbalanced dataset of symbols extracted from Piping and Instrumentation Diagrams (P&IDs). The dataset contains 7,728 instances representing 48 different types of engineering symbols and is considered the first of its kind in the research community. Second, we present a new method for handling multiclass imbalanced classification based on class decomposition by means of unsupervised machine learning methods. Experiments using Convolutional Neural Networks showed that class decomposition significantly improves classification performance without causing the information loss incurred by other data sampling approaches to class imbalance.
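To make the class-decomposition idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes k-means as the unsupervised method (the abstract does not name one) and uses hypothetical names throughout: the functions `kmeans` and `decompose`, the parameters `k` and `min_size`, and the toy feature vectors. Every class with at least `min_size` samples is split into `k` sub-classes, which reduces the size of the largest class without discarding any sample.

```python
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means (assumed clustering method); returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Recompute each centroid as the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def decompose(X, y, k=2, min_size=6):
    """Split every class with at least min_size samples into k sub-classes.

    Returns the new labels and a map from each new label to its original class,
    so predictions made on sub-classes can be folded back to the original classes.
    """
    new_y = list(y)
    parent = {}
    for label, n in Counter(y).items():
        if n < min_size:
            parent[label] = label  # small classes are left untouched
            continue
        idx = [i for i, lab in enumerate(y) if lab == label]
        clusters = kmeans([X[i] for i in idx], k)
        for i, c in zip(idx, clusters):
            sub = f"{label}_c{c}"
            new_y[i] = sub
            parent[sub] = label
    return new_y, parent


# Toy data (hypothetical): 8 "valve" samples in two groups, 2 "sensor" samples.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
     (9.0, 0.0), (9.1, 0.0)]
y = ["valve"] * 8 + ["sensor"] * 2

new_y, parent = decompose(X, y)
print(Counter(y))      # class sizes before decomposition
print(Counter(new_y))  # class sizes after decomposition
```

A classifier would then be trained on `new_y`, and each predicted sub-class mapped back through `parent` before evaluation. In this toy case the largest class shrinks from 8 samples to 4 while all 10 samples are retained.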

Supported by DNV.

Notes

  1. https://github.com/carlosfmorenog/CDSMOTE-NONBIN-Symbols

Author information

Corresponding author

Correspondence to Carlos Francisco Moreno-García.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Jamieson, L., Moreno-García, C.F., Elyan, E. (2024). A Multiclass Imbalanced Dataset Classification of Symbols from Piping and Instrumentation Diagrams. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14804. Springer, Cham. https://doi.org/10.1007/978-3-031-70533-5_1

  • DOI: https://doi.org/10.1007/978-3-031-70533-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70532-8

  • Online ISBN: 978-3-031-70533-5

  • eBook Packages: Computer Science, Computer Science (R0)
