
Knowledge-driven description synthesis for floor plan interpretation

Published: 01 June 2021

Abstract

Image captioning is a widely studied problem in AI. Caption generation from floor plan images has applications in indoor path planning, real estate, and architectural design. Several methods have been explored in the literature for generating captions or semi-structured descriptions from floor plan images. Since a caption alone cannot capture fine-grained details, researchers have also proposed generating descriptive paragraphs from images. However, these descriptions have a rigid structure and lack flexibility, making them difficult to use in real-time scenarios. This paper proposes two models, description synthesis from image cue (DSIC) and transformer-based description generation (TBDG), for text generation from floor plan images. Both models exploit modern deep neural networks for visual feature extraction and text generation; they differ in how they take input from the floor plan image. The DSIC model uses only visual features extracted automatically by a deep neural network, while the TBDG model additionally learns from textual cues extracted from the input floor plan images together with their paragraph descriptions. The keywords generated in TBDG, interpreted in the context of the paragraphs, make it more robust on general floor plan images. Experiments were carried out on a large-scale, publicly available dataset and compared against state-of-the-art techniques to demonstrate the superiority of the proposed models.
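The abstract contrasts the two models mainly by their inputs: DSIC decodes from visual features alone, while TBDG additionally conditions on textual cues extracted from the plan. As a rough illustration, below is a minimal, hypothetical sketch of such a two-pathway setup in PyTorch. All module names, dimensions, and layer choices are assumptions made for illustration only, not the architecture from the paper.

```python
# Hypothetical sketch of the two input pathways the abstract contrasts:
# DSIC-style (visual features only) vs. TBDG-style (visual features plus
# keyword cues extracted from the plan). Names and sizes are illustrative.
import torch
import torch.nn as nn
import torchvision.models as models


class VisualEncoder(nn.Module):
    """CNN backbone that turns a floor plan image into a feature vector."""

    def __init__(self, feat_dim: int = 512):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any CNN backbone would do
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop FC head
        self.proj = nn.Linear(backbone.fc.in_features, feat_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.cnn(images).flatten(1)  # (B, 512)
        return self.proj(feats)              # (B, feat_dim)


class DSICStyleDecoder(nn.Module):
    """RNN decoder conditioned only on the visual feature (DSIC-style input)."""

    def __init__(self, vocab: int, feat_dim: int = 512, hid: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab, feat_dim)
        self.lstm = nn.LSTM(feat_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, img_feat: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # Prepend the image feature as the first "word" of the sequence.
        seq = torch.cat([img_feat.unsqueeze(1), self.embed(tokens)], dim=1)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)  # (B, T+1, vocab)


class TBDGStyleDecoder(nn.Module):
    """Transformer decoder that also attends over keyword embeddings
    extracted from the plan (TBDG-style input)."""

    def __init__(self, vocab: int, d_model: int = 512, heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerDecoderLayer(d_model, heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab)

    def forward(self, img_feat: torch.Tensor, keyword_ids: torch.Tensor,
                tokens: torch.Tensor) -> torch.Tensor:
        # Memory = visual feature + embedded keywords (e.g. room labels),
        # so textual cues can influence the generated paragraph.
        memory = torch.cat([img_feat.unsqueeze(1), self.embed(keyword_ids)], dim=1)
        hidden = self.decoder(self.embed(tokens), memory)
        return self.out(hidden)  # (B, T, vocab)
```

Under the same assumptions, a batch of images passed through VisualEncoder yields (B, 512) features; DSICStyleDecoder consumes those alone, while TBDGStyleDecoder also cross-attends over embedded keyword IDs, which is the structural difference the abstract highlights between the two models.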


Cited By

  • FVCap: An Approach to Understand Scanned Floor Plan Images Using Deep Learning and its Applications. SN Computer Science 5(4) (2024). https://doi.org/10.1007/s42979-024-02708-5
  • Automatic Extraction and Linkage between Textual and Spatial Data for Architectural Heritage. Journal on Computing and Cultural Heritage 16(3), 1-19 (2023). https://doi.org/10.1145/3586158
  • Split it Up: Allocentric Descriptions of Indoor Maps for People with Visual Impairments. In: Computers Helping People with Special Needs, pp. 102-109 (2022). https://doi.org/10.1007/978-3-031-08648-9_13

      Published In

      International Journal on Document Analysis and Recognition, Volume 24, Issue 1-2
      June 2021, 139 pages
      ISSN: 1433-2833
      EISSN: 1433-2825

      Publisher

      Springer-Verlag, Berlin, Heidelberg

      Publication History

      Published: 01 June 2021
      Accepted: 11 April 2021
      Revision received: 31 March 2021
      Received: 10 January 2020

      Author Tags

      1. Floor plan
      2. Captioning
      3. Evaluation
      4. Language modeling

      Qualifiers

      • Research-article
