
Video Question Answering via Knowledge-based Progressive Spatial-Temporal Attention Network

Published: 03 July 2019

Abstract

Visual Question Answering (VQA) is a challenging task that has gained increasing attention from both the computer vision and natural language processing communities in recent years. Given a question in natural language, a VQA system automatically generates the answer according to the referenced visual content. Although there has recently been much interest in this topic, existing work on visual question answering focuses mainly on single static images, which represent only a small part of the dynamic, sequential visual data in the real world. Its natural extension, video question answering (VideoQA), remains less explored. Because of the inherent temporal structure of video, ImageQA approaches cannot be applied effectively to video question answering. In this article, we not only take the spatial and temporal dimensions of video content into account but also employ an external knowledge base to improve the answering ability of the network. Specifically, we propose a knowledge-based progressive spatial-temporal attention network to tackle this problem. We obtain both object and region features of the video frames from a region proposal network. The knowledge representation is generated by a word-level attention mechanism over the comment information of each object extracted from DBpedia. We then develop a question-knowledge-guided progressive spatial-temporal attention network that learns a joint video representation for the video question answering task. We also construct a large-scale video question answering dataset. Extensive experiments on two different datasets validate the effectiveness of our method.
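The spatial-temporal attention idea described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all names (`spatial_temporal_attention`, `frames`, `query`) are assumptions, and the query here stands in for the fused question-knowledge representation. A query vector first attends over region features within each frame (spatial attention), and the resulting frame vectors are then attended over time (temporal attention) to produce a single video representation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_temporal_attention(frames, query):
    """Sketch of question-guided spatial-then-temporal attention.

    frames: (T, R, D) array of R region features per frame for T frames.
    query:  (D,) fused question/knowledge vector (hypothetical; the paper's
            actual scoring functions are learned, not plain dot products).
    Returns the pooled video representation and both attention maps.
    """
    spatial_scores = frames @ query                 # (T, R) region relevance
    alpha = softmax(spatial_scores, axis=1)         # spatial attention per frame
    frame_repr = (alpha[..., None] * frames).sum(axis=1)  # (T, D) frame vectors
    temporal_scores = frame_repr @ query            # (T,) frame relevance
    beta = softmax(temporal_scores, axis=0)         # temporal attention
    video_repr = (beta[:, None] * frame_repr).sum(axis=0)  # (D,) video vector
    return video_repr, alpha, beta
```

In the paper's "progressive" design the question and knowledge representations guide several such attention passes in sequence; the single pass above only illustrates the spatial-then-temporal ordering.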




Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 2s
Special Section on Cross-Media Analysis for Visual Question Answering, Special Section on Big Data, Machine Learning and AI Technologies for Art and Design and Special Section on MMSys/NOSSDAV 2018
April 2019
381 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3343360

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 03 July 2019
Accepted: 01 February 2019
Revised: 01 January 2019
Received: 01 June 2018
Published in TOMM Volume 15, Issue 2s


Author Tags

  1. Video question answering
  2. attention
  3. knowledge
  4. spatial-temporal

Qualifiers

  • Research-article
  • Research
  • Refereed


Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 4
Reflects downloads up to 04 Oct 2024


Cited By

  • (2024) Video question answering via traffic knowledge database and question classification. Multimedia Systems 30:1. DOI: 10.1007/s00530-023-01240-5. Online: 16 Jan 2024.
  • (2023) Hierarchical Synergy-Enhanced Multimodal Relational Network for Video Question Answering. ACM Trans. Multimedia Comput. Commun. Appl. 20:4, 1-22. DOI: 10.1145/3630101. Online: 11 Dec 2023.
  • (2023) Cross-modality Multiple Relations Learning for Knowledge-based Visual Question Answering. ACM Trans. Multimedia Comput. Commun. Appl. 20:3, 1-22. DOI: 10.1145/3618301. Online: 23 Oct 2023.
  • (2023) Transformer-Based Visual Grounding with Cross-Modality Interaction. ACM Trans. Multimedia Comput. Commun. Appl. 19:6, 1-19. DOI: 10.1145/3587251. Online: 9 Mar 2023.
  • (2023) Visual Paraphrase Generation with Key Information Retained. ACM Trans. Multimedia Comput. Commun. Appl. 19:6, 1-19. DOI: 10.1145/3585010. Online: 30 May 2023.
  • (2023) Fine-Grained Text-to-Video Temporal Grounding from Coarse Boundary. ACM Trans. Multimedia Comput. Commun. Appl. 19:5, 1-21. DOI: 10.1145/3579825. Online: 16 Mar 2023.
  • (2022) Advanced Models for Video Question Answering. In Visual Question Answering, 135-143. DOI: 10.1007/978-981-19-0964-1_9. Online: 13 May 2022.
  • (2022) Spatio-temporal Data Sources Integration with Ontology for Road Accidents Analysis. In Business Information Systems Workshops, 251-262. DOI: 10.1007/978-3-031-04216-4_23. Online: 6 Apr 2022.
  • (2022) Estimation and Aggregation Method of Open Data Sources for Road Accident Analysis. In Intelligent Systems Design and Applications, 1025-1034. DOI: 10.1007/978-3-030-96308-8_95. Online: 27 Mar 2022.
  • (2021) Assessment Formation of Open Data Sources During Their Aggregation For Analyzing Road Accidents. In Proceedings of the 30th Conference of Open Innovations Association FRUCT, 239-245. DOI: 10.23919/FRUCT53335.2021.9599968. Online: 27 Oct 2021.
