
Knowledge-driven description synthesis for floor plan interpretation

Published: 01 June 2021

Abstract

Image captioning is a widely studied problem in AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and architectural design. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone cannot capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making them difficult to use in real-time scenarios. This paper proposes two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images. Both models exploit modern deep neural networks for visual feature extraction and text generation; they differ in how they take input from the floor plan image. The DSIC model uses only visual features extracted automatically by a deep neural network, while the TBDG model additionally learns from textual cues extracted from the input floor plan images together with their paragraph descriptions. The keywords generated in TBDG, interpreted in the context of the paragraphs, make it more robust on general floor plan images. Experiments were carried out on a large-scale, publicly available dataset and compared against state-of-the-art techniques to demonstrate the superiority of the proposed models.
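The abstract contrasts the two models mainly by their inputs: DSIC decodes from visual features alone, while TBDG additionally conditions on textual cues extracted from the plan. As a rough illustration, below is a minimal, hypothetical sketch of such a two-pathway setup in PyTorch. All module names, dimensions, and layer choices are assumptions made for illustration only, not the architecture from the paper.

```python
# Hypothetical sketch of the two input pathways the abstract contrasts:
# DSIC-style (visual features only) vs. TBDG-style (visual features plus
# keyword cues extracted from the plan). Names and sizes are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models


class VisualEncoder(nn.Module):
    """CNN backbone that turns a floor plan image into a feature vector."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any CNN backbone would do
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop FC head
        self.proj = nn.Linear(backbone.fc.in_features, feat_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(images).flatten(1)  # (B, 512)
        return self.proj(feats)              # (B, feat_dim)


class DSICStyleDecoder(nn.Module):
    """RNN decoder conditioned only on the visual feature (DSIC-style input)."""

    def __init__(self, vocab: int, feat_dim: int = 512, hid: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, img_feat: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Prepend the image feature as the first "word" of the sequence.
        seq = torch.cat([img_feat.unsqueeze(1), self.embed(tokens)], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)  # (B, T+1, vocab)


class TBDGStyleDecoder(nn.Module):
    """Transformer decoder that also attends over keyword embeddings
    extracted from the plan (TBDG-style input)."""

    def __init__(self, vocab: int, d_model: int = 512, heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerDecoderLayer(d_model, heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, img_feat: torch.Tensor, keyword_ids: torch.Tensor,
                tokens: torch.Tensor) -> torch.Tensor:
        # Memory = visual feature + embedded keywords (e.g. room labels),
        # so textual cues can influence the generated paragraph.
        memory = torch.cat([img_feat.unsqueeze(1), self.embed(keyword_ids)], dim=1)
        hidden = self.decoder(self.embed(tokens), memory)
        return self.out(hidden)  # (B, T, vocab)
```

Under the same assumptions, a batch of images passed through VisualEncoder yields (B, 512) features; DSICStyleDecoder consumes those alone, while TBDGStyleDecoder also cross-attends over embedded keyword IDs, which is the structural difference the abstract highlights between the two models.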


Cited By

  • FVCap: An Approach to Understand Scanned Floor Plan Images Using Deep Learning and its Applications. SN Computer Science 5(4) (2024). https://doi.org/10.1007/s42979-024-02708-5
  • Automatic Extraction and Linkage between Textual and Spatial Data for Architectural Heritage. Journal on Computing and Cultural Heritage 16(3), 1-19 (2023). https://doi.org/10.1145/3586158
  • Split it Up: Allocentric Descriptions of Indoor Maps for People with Visual Impairments. In: Computers Helping People with Special Needs, pp. 102-109 (2022). https://doi.org/10.1007/978-3-031-08648-9_13

      Published In

      International Journal on Document Analysis and Recognition, Volume 24, Issue 1-2
      June 2021, 139 pages
      ISSN: 1433-2833
      EISSN: 1433-2825

      Publisher

      Springer-Verlag, Berlin, Heidelberg

      Publication History

      Published: 01 June 2021
      Accepted: 11 April 2021
      Revision received: 31 March 2021
      Received: 10 January 2020

      Author Tags

      1. Floor plan
      2. Captioning
      3. Evaluation
      4. Language modeling

      Qualifiers

      • Research-article
