Abstract
Question generation (QG) has been well studied for text and images but has never been studied for video, a popular form of multimedia in practice. In this paper, we propose a new task, video question generation, and adopt an encoder-decoder framework to address it. Because more than one question can be asked about each video, and each question can belong to a different type, we use the question type to guide the generation process. Specifically, we propose a novel type-conditional temporal-spatial attention mechanism that captures the information required by each question type from the video content at different time steps. Experiments show that our models outperform the baseline and that our type-conditional attention module captures the required information precisely. To the best of our knowledge, we are the first to apply an end-to-end model to video question generation.
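The abstract does not specify how the type-conditional temporal-spatial attention is computed, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' model. It assumes a video is given as per-frame lists of region feature vectors, that the question type is an embedding added to the decoder state to form the attention query, and that dot-product scoring is used. The function name `type_conditional_attention` and all inputs are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    """Weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def type_conditional_attention(video, decoder_state, type_embedding):
    """Hypothetical sketch: attend over regions, then over frames.

    video: list of frames, each a list of region feature vectors.
    decoder_state: decoder hidden state at the current time step.
    type_embedding: embedding of the question type (assumed to have
        the same dimension as the decoder state).
    """
    # Assumption: the query is the decoder state shifted by the type embedding.
    query = [h + q for h, q in zip(decoder_state, type_embedding)]

    frame_vectors = []
    spatial_weights = []
    for regions in video:
        # Spatial attention: weight the regions within one frame.
        weights = softmax([dot(region, query) for region in regions])
        spatial_weights.append(weights)
        frame_vectors.append(weighted_sum(weights, regions))

    # Temporal attention: weight the attended frame vectors.
    temporal_weights = softmax([dot(frame, query) for frame in frame_vectors])
    context = weighted_sum(temporal_weights, frame_vectors)
    return context, temporal_weights, spatial_weights


# Toy input: 3 frames, 2 regions per frame, 2-dimensional features.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.2, 0.8]],
    [[0.0, 0.0], [1.0, 1.0]],
]
state = [0.1, 0.2]
type_a = [1.0, 0.0]  # hypothetical embedding for one question type
type_b = [0.0, 1.0]  # hypothetical embedding for another question type

ctx_a, temporal_a, spatial_a = type_conditional_attention(video, state, type_a)
ctx_b, temporal_b, spatial_b = type_conditional_attention(video, state, type_b)
```

In this toy setup the same decoder state produces different context vectors for the two type embeddings, which is the behaviour the abstract attributes to conditioning on question type. Calling the function again with an updated decoder state would give a different context at each decoding step.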
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Huang, S., Hu, S., Yan, B. (2019). Watch and Ask: Video Question Generation. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Lecture Notes in Computer Science, vol 11955. Springer, Cham. https://doi.org/10.1007/978-3-030-36718-3_18
DOI: https://doi.org/10.1007/978-3-030-36718-3_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36717-6
Online ISBN: 978-3-030-36718-3
eBook Packages: Computer Science, Computer Science (R0)