
A Survey of Joint Intent Detection and Slot Filling Models in Natural Language Understanding

Published: 23 December 2022
Abstract

Intent classification, which identifies the speaker's intention, and slot filling, which labels each token with a semantic type, are critical tasks in natural language understanding. Traditionally, the two tasks have been addressed independently. More recently, joint models that address the two tasks together have achieved state-of-the-art performance on each task and have shown that a strong relationship exists between them. In this survey, we bring coverage of methods up to 2021, including the many applications of deep learning in the field. Beyond a technological survey, we examine the issues that arise in the joint task and the approaches designed to address them. We also cover datasets, evaluation metrics, and experiment design, and summarize reported performance on the standard datasets.
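To make the two tasks concrete, the following sketch (illustrative only, not taken from the survey) shows what a joint model's output looks like for a single utterance: one intent label for the whole sentence, and one BIO-scheme slot tag per token. The utterance, intent name, and slot labels are hypothetical, written in the style of the ATIS benchmark.

```python
# One utterance, tokenized. Slot filling assigns a tag to every token;
# intent classification assigns a single label to the whole utterance.
utterance = "show flights from boston to denver".split()

# Hypothetical gold annotation in ATIS style (labels are illustrative).
intent = "atis_flight"
slots = ["O", "O", "O", "B-fromloc.city_name", "O", "B-toloc.city_name"]

# Slot filling is a sequence-labeling task: exactly one tag per token.
assert len(slots) == len(utterance)

for token, tag in zip(utterance, slots):
    print(f"{token:10s} {tag}")
print("intent:", intent)
```

A joint model produces both outputs from a shared encoding of the utterance, which is how the correlation between the two tasks (e.g., a flight intent making city-name slots more likely) can be exploited.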

    References

    [1]
    Frédéric Béchet and Christian Raymond. 2018. Is ATIS too shallow to go deeper for benchmarking spoken language understanding models? In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’18). ISCA, 1–5.
    [2]
    Valentina Bellomaria, Giuseppe Castellucci, Andrea Favalli, and Raniero Romagnoli. 2019. Almawave-SLU: A new dataset for SLU in Italian. In Proceedings of the 6th Italian Conference on Computational Linguistics. AILC.
    [3]
    Aditya Bhargava, Asli Celikyilmaz, Dilek Hakkani-Tür, and Ruhi Sarikaya. 2013. Easy contextual intent prediction and slot detection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 8337–8341.
    [4]
    Anmol Bhasin, Bharatram Natarajan, Gaurav Mathur, Joo Hyuk Jeon, and Jun-Seong Kim. 2019. Unified parallel intent and slot prediction with cross fusion and slot masking. In Natural Language Processing and Information Systems, Elisabeth Métais, Farid Meziane, Sunil Vadera, Vijayan Sugumaran, and Mohamad Saraee (Eds.). Springer, Cham, 277–285.
    [5]
    Anmol Bhasin, Bharatram Natarajan, Gaurav Mathur, and Himanshu Mangla. 2020. Parallel intent and slot prediction using MLB fusion. In Proceedings of the 14th International Conference on Semantic Computing (ICSC’20). IEEE, San Diego, 217–220.
    [6]
    Hemanthage S. Bhathiya and Uthayasanker Thayasivam. 2020. Meta learning for few-shot joint intent detection and slot-filling. In Proceedings of the 5th International Conference on Machine Learning Technologies (ICMLT’20). Association for Computing Machinery, New York, NY, 86–92.
    [7]
    Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26, C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger (Eds.). Curran Associates, Inc., Lake Tahoe, CA, 2787–2795.
    [8]
    Giuseppe Castellucci, Valentina Bellomaria, Andrea Favalli, and Raniero Romagnoli. 2019. Multi-lingual intent detection and slot filling in a joint BERT-based model. arxiv:1907.02884. Retrieved from https://arxiv.org/abs/1907.02884.
    [9]
    Asli Celikyilmaz and Dilek Hakkani-Tur. 2012. A joint model for discovery of aspects in utterances. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, 330–338.
    [10]
    Mengyang Chen, Jin Zeng, and Jie Lou. 2019. A self-attention joint model for spoken language understanding in situational dialog applications. arxiv:1905.11393. Retrieved from https://arxiv.org/abs/1905.11393.
    [11]
    Qian Chen, Zhu Zhuo, and Wen Wang. 2019. BERT for joint intent classification and slot filling. arxiv:1902.10909. Retrieved from https://arxiv.org/abs/1902.10909.
    [12]
    Sixuan Chen and Shuai Yu. 2019. WAIS: Word attention for joint intent detection and slot filling. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. AAAI Press, 9927–9928.
    [13]
    Yun-Nung Chen, Dilek Hakanni-Tür, Gokhan Tur, Asli Celikyilmaz, Jianfeng Guo, and Li Deng. 2016. Syntax or semantics? Knowledge-guided joint semantic frame parsing. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT’16). IEEE, 348–355.
    [14]
    Lizhi Cheng, Weijia Jia, and Wenmian Yang. 2021. An effective non-autoregressive model for spoken language understanding. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM’21). Association for Computing Machinery, New York, NY, 241–250.
    [15]
    Lizhi Cheng, Wenmian Yang, and Weijia Jia. 2021. A result based portable framework for spoken language understanding. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME’21). IEEE, Los Alamitos, CA, 1–6.
    [16]
    Alice Coucke, Alaa Saade, Adrien Ball, Théodore Bluche, Alexandre Caulier, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, Maël Primet, and Joseph Dureau. 2018. Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces. arxiv:1805.10190. Retrieved from https://arxiv.org/abs/1805.10190.
    [17]
    Slawomir Dadas, Jaroslaw Protasiewicz, and Witold Pedrycz. 2019. A deep learning model with data enrichment for intent detection and slot filling. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics. IEEE, 3012–3018.
    [18]
    Fatima Daha and Saniika Hewavitharana. 2019. Deep neural architecture with character embedding for semantic frame detection. In Proceedings of the IEEE 13th International Conference on Semantic Computing (ICSC’19). IEEE, 302–307.
    [19]
    Deborah A. Dahl, Madeleine Bates, Michael Brown, William Fisher, Kate Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky, and Elizabeth Shriberg. 1994. Expanding the scope of the ATIS task: The ATIS-3 corpus. In Proceedings of the Workshop on Human Language Technology. Association for Computational Linguistics, 43–48.
    [20]
    Anoop Deoras and Ruhi Sarikaya. 2013. Deep belief network based semantic taggers for spoken language understanding. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’13). ISCA, 2713–2717.
    [21]
    Jan Deriu, Alvaro Rodrigo, Arantxa Otegi, Guillermo Echegoyen, Sophie Rosset, Eneko Agirre, and Mark Cieliebak. 2020. Survey on evaluation methods for dialogue systems. Artif. Intell. Rev. 54 (2021), 755–810.
    [22]
    Haihong E, Peiqing Niu, Zhongfu Chen, and Meina Song. 2019. A novel bi-directional interrelated model for joint intent detection and slot filling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 5467–5471.
    [23]
    Mauajama Firdaus, Shobhit Bhatnagar, Asif Ekbal, and Pushpak Bhattacharyya. 2018. A deep learning based multi-task ensemble model for intent detection and slot filling in spoken language understanding. In Neural Information Processing, Long Cheng, Andrew Chi Sing Leung, and Seiichi Ozawa (Eds.). Springer, Cham, 647–658.
    [24]
    Mauajama Firdaus, Shobhit Bhatnagar, Asif Ekbal, and Pushpak Bhattacharyya. 2018. Intent detection for spoken language understanding using a deep ensemble model. In Lecture Notes in Computer Science. Springer, Cham, 629–642.
    [25]
    Mauajama Firdaus, Hitesh Golchha, Asif Ekbal, and Pushpak Bhattacharyya. 2020. A deep multi-task model for dialogue act classification, intent detection and slot filling. Cogn. Comput. 13 (2020), 626–645.
    [26]
    Mauajama Firdaus, Ankit Kumar, Asif Ekbal, and Pushpak Bhattacharyya. 2019. A multi-task hierarchical approach for intent detection and slot filling. Knowl.-Bas. Syst. 183 (2019), 104846.
    [27]
    Rashmi Gangadharaiah and Balakrishnan Narayanaswamy. 2019. Joint multiple intent detection and slot labeling for goal-oriented dialog. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 564–569.
    [28]
    Chih-Wen Goo, Guang Gao, Yun-Kai Hsu, Chih-Li Huo, Tsung-Chieh Chen, Keng-Wei Hsu, and Yun-Nung Chen. 2018. Slot-gated modeling for joint slot filling and intent prediction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, 753–757.
    [29]
    Daniel Guo, Gokhan Tur, Wen-tau Yih, and Geoffrey Zweig. 2014. Joint semantic utterance classification and slot filling with recursive neural networks. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT’14). IEEE, 554–559.
    [30]
    Arshit Gupta, John Hewitt, and Katrin Kirchhoff. 2019. Simple, fast, accurate intent classification and slot labeling for goal-oriented dialogue systems. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue. Association for Computational Linguistics, 46–55.
    [31]
    Arshit Gupta, Peng Zhang, Garima Lalwani, and Mona Diab. 2019. CASA-NLU: Context-aware self-attentive natural language understanding for task-oriented chatbots. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 1285–1290.
    [32]
    Dilek Hakkani-Tür, Gökhan Tür, Asli Celikyilmaz, Yun-Nung Chen, Jianfeng Gao, Li Deng, and Ye-Yi Wang. 2016. Multi-domain joint semantic frame parsing using bi-directional RNN-LSTM. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’16). ISCA, 715–719.
    [33]
    Soyeon Caren Han, Siqu Long, Huichun Li, Henry Weld, and Josiah Poon. 2021. Bi-directional joint neural networks for intent classification and slot filling. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’21). ISCA, 4743–4747.
    [34]
    Ting He, Xiaohong Xu, Yating Wu, Huazhen Wang, and Jian Chen. 2021. Multitask learning with knowledge base for joint intent detection and slot filling. Appl. Sci. 11, 11 (2021).
    [35]
    Charles T. Hemphill, John J. Godfrey, and George R. Doddington. 1990. The ATIS spoken language systems pilot corpus. In Proceedings of the Workshop on Speech and Natural Language (HLT’90). Association for Computational Linguistics, 96–101.
    [36]
    Lynette Hirschman. 1992. Multi-site data collection for a spoken language corpus—MAD COW. In Proceedings of the 2nd International Conference on Spoken Language Processing (ICSLP’92). International Speech Communication Association, 903–906.
    [37]
    Lixian Hou, Yanling Li, Chengcheng Li, and Min Lin. 2019. Review of research on task-oriented spoken language understanding. J. Phys.: Conf. Ser. 1267 (July2019), 012023.
    [38]
    Zhiqi Huang, Fenglin Liu, Peilin Zhou, and Yuexian Zou. 2021. Sentiment injected iteratively co-interactive network for spoken language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’21). IEEE, 7488–7492.
    [39]
    Yanfei Hui, Jianzong Wang, Ning Cheng, Fengying Yu, Tianbo Wu, and Jing Xiao. 2021. Joint intent detection and slot filling based on continual learning model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’21). IEEE, 7643–7647.
    [40]
    Minwoo Jeong and Gary Geunbae Lee. 2008. Triangular-chain conditional random fields. IEEE Trans. Aud. Speech Lang. Process. 16, 7 (2008), 1287–1302.
    [41]
    Sangkeun Jung, Jinsik Lee, and Jiwon Kim. 2018. Learning to embed semantic correspondence for natural language understanding. In Proceedings of the 22nd Conference on Computational Natural Language Learning. Association for Computational Linguistics, 131–140.
    [42]
    Young-Bum Kim, Sungjin Lee, and Karl Stratos. 2017. ONENET: Joint domain, intent, slot prediction for spoken language understanding. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU’17). IEEE, 547–553.
    [43]
    Jason Krone, Yi Zhang, and Mona Diab. 2020. Learning to classify intents and slot labels given a handful of examples. In Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI. Association for Computational Linguistics, 96–108.
    [44]
    Gakuto Kurata, Bing Xiang, Bowen Zhou, and Mo Yu. 2016. Leveraging sentence-level information with encoder LSTM for semantic slot filling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2077–2083.
    [45]
    Jihwan Lee, Dongchan Kim, Ruhi Sarikaya, and Young-Bum Kim. 2018. Coupled representation learning for domains, intents and slots in spoken language understanding. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT’18). IEEE, 714–719.
    [46]
    Michał Lew, Aleksander Obuchowski, and Monika Kutyła. 2021. Improving intent detection accuracy through token level labeling. In Proceedings of the 3rd Conference on Language, Data and Knowledge (LDK’21),Open Access Series in Informatics,Vol. 93, Dagmar Gromann, Gilles Sérasset, Thierry Declerck, John P. McCrae, Jorge Gracia, Julia Bosque-Gil, Fernando Bobillo, and Barbara Heinisch (Eds.). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 30:1–30:11.
    [47]
    Changliang Li, Cunliang Kong, and Yan Zhao. 2018. A joint multi-task learning framework for spoken language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’18). IEEE, Calgary, Canada, 6054–6058.
    [48]
    Changliang Li, Liang Li, and Ji Qi. 2018. A self-attentive model with gate mechanism for spoken language understanding. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 3824–3833.
    [49]
    Changliang Li, Yan Zhao, and Dong Yu. 2019. Conditional joint model for spoken dialogue system. In Proceedings of the International Conference on Cognitive Computing (ICCC’19), Ruifeng Xu, Jianzong Wang, and Liang-Jie Zhang (Eds.). Springer, Cham, 26–36.
    [50]
    Haoran Li, Abhinav Arora, Shuohui Chen, Anchit Gupta, Sonal Gupta, and Yashar Mehdad. 2021. MTOP: A comprehensive multilingual task-oriented semantic parsing benchmark. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Association for Computational Linguistics, 2950–2962.
    [51]
    Shang-Wen Li, Jason Krone, Shuyan Dong, Yi Zhang, and Yaser Al-Onaizan. 2021. Meta learning to classify intent and slot labels with noisy few-shot examples. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT’21). IEEE, 1004–1011.
    [52]
    Bing Liu and Ian Lane. 2016. Attention-based recurrent neural network models for joint intent detection and slot filling. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’16). ISCA, 685–689.
    [53]
    Bing Liu and Ian Lane. 2016. Joint online spoken language understanding and language modeling with recurrent neural networks. In Proceedings of the Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL’16). Association for Computational Linguistics, 22–30.
    [54]
    Jiao Liu, Yanling Li, and Min Lin. 2019. Review of intent detection methods in the human-machine dialogue system. J. Phys.: Conf. Ser. 1267 (July2019), 012059.
    [55]
    Yijin Liu, Fandong Meng, Jinchao Zhang, Jie Zhou, Yufeng Chen, and Jinan Xu. 2019. CM-net: A novel collaborative memory network for spoken language understanding. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 1051–1060.
    [56]
    Zihan Liu, Jamin Shin, Yan Xu, Genta Indra Winata, Peng Xu, Andrea Madotto, and Pascale Fung. 2019. Zero-shot cross-lingual dialogue systems with transferable latent variables. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 1297–1303.
    [57]
    Cedric Lothritz, Kevin Allix, Bertrand Lebichot, Lisa Veiber, Tegawendé F. Bissyandé, and Jacques Klein. 2021. Comparing multilingual and multiple monolingual models for intent classification and slot filling. In Natural Language Processing and Information Systems, Elisabeth Métais, Farid Meziane, Helmut Horacek, and Epaminondas Kapetanios (Eds.). Springer, Cham, 367–375.
    [58]
    Samuel Louvan and Bernardo Magnini. 2019. Leveraging non-conversational tasks for low resource slot filling: Does it help? In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue. Association for Computational Linguistics, 85–91.
    [59]
    Mingbo Ma, Kai Zhao, Liang Huang, Bing Xiang, and Bowen Zhou. 2017. Jointly trained sequential labeling and classification by sparse attention neural networks. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’17). ISCA, 3334–3338.
    [60]
    Françoise Mairesse, Milica Gasic, Filip Jurcicek, Simon Keizer, Blaise Thomson, Kai Yu, and Steve Young. 2009. Spoken language understanding from unaligned data using discriminative classification models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 4749–4752.
    [61]
    Alaa Mohasseb, Mohamed Bader-El-Den, and Mihaela Cocea. 2018. Classification of factoid questions intent using grammatical features. ICT Express 4, 4 (December2018), 239–242.
    [62]
    Mahdi Namazifar, Alexandros Papangelis, Gokhan Tur, and Dilek Hakkani-Tür. 2021. Language model is all you need: Natural language understanding as question answering. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’21). IEEE, 7803–7807.
    [63]
    Pin Ni, Yuming Li, Gangmin Li, and Victor Chang. 2020. Natural language understanding approaches based on joint task of intent detection and slot filling for IoT voice interaction. Neural Comput. Appl. 32 (2020), 16149–16166.
    [64]
    Jingcheng Niu and Gerald Penn. 2019. Rationally reappraising ATIS-based dialogue systems. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 5503–5507.
    [65]
    Eda Okur, Shachi H. Kumar, Saurav Sahay, Asli Arslan Esme, and Lama Nachman. 2019. Natural Language Interactions in Autonomous Vehicles: Intent Detection and Slot Filling from Passenger Utterances. arxiv:1904.10500 [cs.CL]. Retrieved from https://arxiv.org/abs/org/1904.10500.
    [66]
    David S. Pallett, Nancy L. Dahlgren, Jonathan G. Fiscus, William M. Fisher, John S. Garofolo, and Brett C. Tjaden. 1992. DARPA february 1992 ATIS benchmark test results. In Proceedings of the Workshop on Speech and Natural Language (HLT’91). Association for Computational Linguistics, 15–27.
    [67]
    Lingfeng Pan, Yi Zhang, Feiliang Ren, Yining Hou, Yan Li, Xiaobo Liang, and Yongkang Liu. 2018. A multiple utterances based neural network model for joint intent detection and slot filling. In Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS’18). CEUR-WS.org, 25–33.
    [68]
    Shiva Pentyala, Mengwen Liu, and Markus Dreyer. 2019. Multi-task networks with universe, group, and task feature learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 820–830.
    [69]
    Libo Qin, Wanxiang Che, Yangming Li, Haoyang Wen, and Ting Liu. 2019. A stack-propagation framework with token-level intent detection for spoken language understanding. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19), Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). Association for Computational Linguistics, 2078–2087.
    [70]
    Libo Qin, Tailu Liu, Wanxiang Che, Bingbing Kang, Sendong Zhao, and Ting Liu. 2021. A co-interactive transformer for joint slot filling and intent detection. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’21). IEEE, 8193–8197.
    [71]
    Libo Qin, Minheng Ni, Yue Zhang, and Wanxiang Che. 2020. CoSDA-ML: Multi-lingual code-switching data augmentation for zero-shot cross-lingual NLP. In Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI’20), Christian Bessiere (Ed.). International Joint Conferences on Artificial Intelligence Organization, 3853–3860.
    [72]
    Libo Qin, Xiao Xu, Wanxiang Che, and Ting Liu. 2020. AGIF: An adaptive graph-interactive framework for joint multiple intent detection and slot filling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). Association for Computational Linguistics, 1807–1816.
    [73]
    Suman Ravuri and Andreas Stolcke. 2015. Recurrent neural network and LSTM models for lexical utterance classification. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’15). ISCA, 135–139.
    [74]
    Avik Ray, Yilin Shen, and Hongxia Jin. 2018. Robust spoken language understanding via paraphrasing. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’18). ISCA, 3454–3458.
    [75]
    Avik Ray, Yilin Shen, and Hongxia Jin. 2019. Iterative delexicalization for improved spoken language understanding. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’19). ISCA, 1183–1187.
    [76]
    Christian Raymond and Giuseppe Riccardi. 2007. Generative and discriminative algorithms for spoken language understanding. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’07). ISCA, 1605–1608.
    [77]
    Evgeniia Razumovskaia, Goran Glavaš, Olga Majewska, Edoardo M. Ponti, Anna Korhonen, and Ivan Vulić. 2021. Crossing the conversational chasm: A primer on natural language processing for multilingual task-oriented dialogue systems. arxiv:2104.08570. Retrieved from https://arxiv.org/abs/2104.08570.
    [78]
    Fuji Ren and Siyuan Xue. 2020. Intention detection based on siamese neural network with triplet loss. IEEE Access 8 (2020), 82242–82254.
    [79]
    Sebastian Schuster, Sonal Gupta, Rushin Shah, and Mike Lewis. 2019. Cross-lingual transfer learning for multilingual task oriented dialog. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics, 3795–3805.
    [80]
    Iulian Vlad Serban, Ryan Lowe, Peter Henderson, Laurent Charlin, and Joelle Pineau. 2018. A survey of available corpora for building data-driven dialogue systems: The journal version. Dialog. Discourse 9, 1 (2018), 1–49.
    [81]
    Yilin Shen, Wenhu Chen, and Hongxia Jin. 2019. Interpreting and improving deep neural SLU models via vocabulary importance. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’19). ISCA, 1328–1332.
    [82]
    Yilin Shen, Xiangyu Zeng, and Hongxia Jin. 2019. A progressive model to enable continual learning for semantic slot filling. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP’19). Association for Computational Linguistics, 1279–1284.
    [83]
    Yilin Shen, Xiangyu Zeng, Yu Wang, and Hongxia Jin. 2018. User information augmented semantic frame parsing using progressive neural networks. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’18), B. Yegnanarayana (Ed.). ISCA, 3464–3468.
    [84]
    Yangyang Shi, Kaisheng Yao, Hu Chen, Yi-Cheng Pan, Mei-Yuh Hwang, and Baolin Peng. 2015. Contextual spoken language understanding using recurrent neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’15). IEEE, 5271–5275.
    [85]
    Aditya Siddhant, Anuj Kumar Goyal, and Angeliki Metallinou. 2019. Unsupervised transfer learning for spoken language understanding in intelligent agents. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. AAAI Press, 4959–4966.
    [86]
    Ieva Staliūnaitė and Ignacio Iacobacci. 2020. Auxiliary capsules for natural language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’20). IEEE, 8154–8158.
    [87]
    Rui Sun, Lu Rao, and Xingfa Zhou. 2021. Bidirectional information transfer scheme for joint intent detection and slot filling. In Proceedings of the 17th International Conference on Computational Intelligence and Security (CIS’21). IEEE, 333–337.
    [88]
    Yik-Cheung Tam, Yangyang Shi, Hunk Chen, and Mei-Yuh Hwang. 2015. RNN-based labeled data generation for spoken language understanding. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’15). ISCA, 125–129.
    [89]
    Hao Tang, Donghong Ji, and Qiji Zhou. 2020. End-to-end masked graph-based CRF for joint slot filling and intent detection. Neurocomputing 413 (2020), 348–359.
    [90]
    Shimin Tao, Ying Qin, Yimeng Chen, Chunning Du, Haifeng Sun, Weibin Meng, Yanghua Xiao, Jiaxin Guo, Chang Su, Minghan Wang, Min Zhang, Yuxia Wang, and Hao Yang. 2021. Incorporating complete syntactical knowledge for spoken language understanding. In Knowledge Graph and Semantic Computing: Knowledge Graph Empowers New Infrastructure Construction, Bing Qin, Zhi Jin, Haofen Wang, Jeff Pan, Yongbin Liu, and Bo An (Eds.). Springer Singapore, Singapore, 145–156.
    [91]
    Quynh Ngoc Thi Do and Judith Gaspers. 2019. Cross-lingual transfer learning for spoken language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’19). IEEE, 5956–5960.
    [92]
    Gokhan Tur, Asli Celikyilmaz, Xiaodong He, Dilek Hakkani-Tür, and Li Deng. 2018. Deep learning in conversational language understanding. In Deep Learning in Natural Language Processing, Li Deng and Yang Liu (Eds.). Springer Singapore, Singapore, 23–48.
    [93]
    Gokhan Tur and Renato De Mori. 2011. Spoken Language Understanding: Systems for Extracting Semantic Information from Speech. John Wiley & Sons, New York, NY.
    [94]
    Gokhan Tur, Dilek Hakkani-Tür, and Larry Heck. 2010. What is left to be understood in ATIS? In Proceedings of the IEEE Spoken Language Technology Workshop. IEEE, 19–24.
    [95]
    Gokhan Tur, Dilek Hakkani-Tur, Larry Heck, and S. Parthasarathy. 2011. Sentence simplification for spoken language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’11). IEEE, 5628–5631.
    [96]
    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30. Curran Associates, Inc., 5998–6008.
    [97]
    Thang Vu, Pankaj Gupta, Heike Adel, and Hinrich Schütze. 2016. Bi-directional recurrent neural network with ranking loss for spoken language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’16). IEEE, 6060–6064.
    [98]
    Congrui Wang, Zhen Huang, and Minghao Hu. 2020. SASGBC: Improving sequence labeling performance for joint learning of slot filling and intent detection. In Proceedings of the 6th International Conference on Computing and Data Engineering (ICCDE’20). Association for Computing Machinery, New York, NY, 29–33.
    [99]
    Xiaojie Wang and Caixia Yuan. 2016. Recent advances on human-computer dialogue. CAAI Trans. Intell. Technol. 1, 4 (October2016), 303–312.
    [100]
    Yu Wang, Yue Deng, Yilin Shen, and Hongxia Jin. 2020. A new concept of multiple neural networks structure using convex combination. IEEE Trans. Neural Netw. Learn. Syst. 31, 11 (2020), 9539–9546.
    [101]
    Yufan Wang, Tingting He, Rui Fan, Wenji Zhou, and Xinhui Tu. 2019. Effective utilization of external knowledge and history context in multi-turn spoken language understanding model. In Proceedings of the IEEE International Conference on Big Data (Big Data’19). IEEE, Los Angeles, USA, 960–967.
    [102]
    Yu Wang, Yilin Shen, and Hongxia Jin. 2018. A bi-model based RNN semantic frame parsing model for intent detection and slot filling. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). Association for Computational Linguistics, 309–314.
    [103]
    Yufan Wang, Li Tang, and Tingting He. 2018. Attention-based CNN-BLSTM networks for joint intent detection and slot filling. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, Maosong Sun, Ting Liu, Xiaojie Wang, Zhiyuan Liu, and Yang Liu (Eds.). Springer, Cham, 250–261.
    [104]
    Ye-Yi Wang. 2010. Strategies for statistical spoken language understanding with small amount of data—An empirical study. In Proceedings of the Conference of the International Speech Communication Association (INTERSPEECH’10). ISCA, 2498–2501.
    [105]
    Henry Weld, Guanghao Huang, Jean Lee, Tongshu Zhang, Kunze Wang, Xinghong Guo, Siqu Long, Josiah Poon, and Caren Han. 2021. CONDA: A CONtextual dual-annotated dataset for in-game toxicity understanding and detection. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Association for Computational Linguistics, 2406–2416.
    [106]
    Liyun Wen, Xiaojie Wang, Zhenjiang Dong, and Hong Chen. 2018. Jointly modeling intent identification and slot filling with contextual and hierarchical information. In Natural Language Processing and Chinese Computing, Xuanjing Huang, Jing Jiang, Dongyan Zhao, Yansong Feng, and Yu Hong (Eds.). Springer, Cham, 3–15.
    [107]
    Cong Xu, Qing Li, Dezheng Zhang, Jiarui Cui, Zhenqi Sun, and Hao Zhou. 2020. A model with length-variable attention for spoken language understanding. Neurocomputing 379 (2020), 197–202.
    [108]
    Puyang Xu and Ruhi Sarikaya. 2013. Convolutional neural network based triangular CRF for joint intent detection and slot filling. In Proceedings of the Workshop on Automatic Speech Recognition and Understanding. IEEE, 78–83.
    [109]
    Weijia Xu, Batool Haider, and Saab Mansour. 2020. End-to-end slot alignment and recognition for cross-lingual NLU. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP’20). Association for Computational Linguistics, 5052–5063.
    [110]
    Xuesong Yang, Yun-Nung Chen, Dilek Hakkani-Tür, Paul Crook, Xiujun Li, Jianfeng Gao, and Li Deng. 2017. End-to-end joint learning of natural language understanding and dialogue manager. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’17). IEEE, 5690–5694.
    [111]
    Kaisheng Yao, Baolin Peng, Yu Zhang, Dong Yu, Geoffrey Zweig, and Yangyang Shi. 2014. Spoken language understanding using long short-term memory neural networks. In Proceedings of the IEEE Spoken Language Technology Workshop (SLT’14). IEEE, 189–194.
    [112]
    Kaisheng Yao, Baolin Peng, Geoffrey Zweig, Dong Yu, Xiaolong Li, and Feng Gao. 2014. Recurrent conditional random field for language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’14). IEEE, 4077–4081.
    [113]
    Dong Yu, Shizhen Wang, and Li Deng. 2010. Sequential labeling using deep-structured conditional random fields. IEEE Journal of Selected Topics in Signal Processing 4, 6 (2010), 965–973.
    [114]
    Shuai Yu, Lei Shen, Pengcheng Zhu, and Jiansong Chen. 2018. ACJIS: A novel attentive cross approach for joint intent detection and slot filling. In Proceedings of the International Joint Conference on Neural Networks (IJCNN’18). IEEE, 1–7.
    [115]
    Yulan He and Steve Young. 2003. A data-driven spoken language understanding system. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE, 583–588.
    [116]
    Chenwei Zhang, Wei Fan, Nan Du, and Philip S. Yu. 2016. Mining user intentions from medical queries: A neural network based heterogeneous jointly modeling approach. In Proceedings of the 25th International Conference on World Wide Web (WWW’16). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE, 1373–1384.
    [117]
    Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, and Philip Yu. 2019. Joint slot filling and intent detection via capsule neural networks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 5259–5267.
    [118]
    Dongjie Zhang, Zheng Fang, Yanan Cao, Yanbing Liu, Xiaojun Chen, and Jianlong Tan. 2018. Attention-based RNN model for joint extraction of intent and word slot based on a tagging strategy. In Proceedings of the International Conference on Artificial Neural Networks (ICANN’18), Věra Kůrková, Yannis Manolopoulos, Barbara Hammer, Lazaros Iliadis, and Ilias Maglogiannis (Eds.). Springer, Cham, 178–188.
    [119]
    Linhao Zhang, Dehong Ma, Xiaodong Zhang, Xiaohui Yan, and Hou-Feng Wang. 2020. Graph LSTM with context-gated mechanism for spoken language understanding. In Proceedings of the AAAI Annual Conference on Artificial Intelligence (AAAI’20). AAAI Press, 9539–9546.
    [120]
    Linhao Zhang and Houfeng Wang. 2019. Using bidirectional transformer-CRF for spoken language understanding. In Natural Language Processing and Chinese Computing, Jie Tang, Min-Yen Kan, Dongyan Zhao, Sujian Li, and Hongying Zan (Eds.). Springer, Cham, 130–141.
    [121]
    Shuyou Zhang, Junjie Jiang, Zaixing He, Xinyue Zhao, and Jinhui Fang. 2019. A novel slot-gated model combined with a key verb context feature for task request understanding by service robots. IEEE Access 7 (2019), 105937–105947.
    [122]
    Xiaodong Zhang and Houfeng Wang. 2016. A joint model of intent determination and slot filling for spoken language understanding. In Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI’16). AAAI Press, 2993–2999.
    [123]
    Zhichang Zhang, Zhenwen Zhang, Haoyuan Chen, and Zhiman Zhang. 2019. A joint learning framework with BERT for spoken language understanding. IEEE Access 7 (2019), 168849–168858.
    [124]
    Xinlu Zhao, E. Haihong, and Meina Song. 2018. A joint model based on CNN-LSTMs in dialogue understanding. In Proceedings of the International Conference on Information Systems and Computer Aided Education (ICISCAE’18). IEEE, 471–475.
    [125]
    Yang Zheng, Yongkang Liu, and John H. L. Hansen. 2017. Intent detection and semantic parsing for navigation dialogue language processing. In Proceedings of the IEEE 20th International Conference on Intelligent Transportation Systems (ITSC’17). IEEE, 1–6.
    [126]
    Peilin Zhou, Zhiqi Huang, Fenglin Liu, and Yuexian Zou. 2021. PIN: A novel parallel interactive network for spoken language understanding. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR’21). IEEE, Los Alamitos, CA, 2950–2957.
    [127]
    Qianrong Zhou, Liyun Wen, Xiaojie Wang, Long Ma, and Yue Wang. 2016. A hierarchical LSTM model for joint tasks. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, Maosong Sun, Xuanjing Huang, Hongfei Lin, Zhiyuan Liu, and Yang Liu (Eds.). Springer, Cham, 324–335.
    [128]
    Su Zhu and Kai Yu. 2017. Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’17). IEEE, 5675–5679.



      Published In

      ACM Computing Surveys, Volume 55, Issue 8
      August 2023, 789 pages
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/3567473

      Publisher

      Association for Computing Machinery, New York, NY, United States

      Publication History

      Published: 23 December 2022
      Online AM: 09 July 2022
      Accepted: 20 June 2022
      Revised: 06 June 2022
      Received: 04 October 2021
      Published in CSUR Volume 55, Issue 8

      Author Tags

      1. Intent detection
      2. slot labelling
      3. natural language understanding

      Qualifiers

      • Survey
      • Refereed


      Cited By

      • (2024) Natural Language Understanding for Navigation of Service Robots in Low-Resource Domains and Languages: Scenarios in Spanish and Nahuatl. Mathematics 12, 8, 1136. DOI: 10.3390/math12081136. Online publication date: 10-Apr-2024
      • (2024) Conversational Agents for Energy Awareness and Efficiency: A Survey. Electronics 13, 2, 401. DOI: 10.3390/electronics13020401. Online publication date: 18-Jan-2024
      • (2024) Towards Building Condition-Based Cross-Modality Intention-Aware Human-AI Cooperation under VR Environment. In Proceedings of the CHI Conference on Human Factors in Computing Systems, 1–13. DOI: 10.1145/3613904.3642360. Online publication date: 11-May-2024
      • (2024) Improving Open Intent Detection via Triplet-Contrastive Learning and Adaptive Boundary. IEEE Transactions on Consumer Electronics 70, 1, 2806–2816. DOI: 10.1109/TCE.2024.3363896. Online publication date: Feb-2024
      • (2024) Sequence Labeling as Non-Autoregressive Dual-Query Set Generation. IEEE/ACM Transactions on Audio, Speech and Language Processing 32, 1546–1558. DOI: 10.1109/TASLP.2024.3358053. Online publication date: 5-Feb-2024
      • (2024) An Adapter-Based Unified Model for Multiple Spoken Language Processing Tasks. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 10676–10680. DOI: 10.1109/ICASSP48485.2024.10448240. Online publication date: 14-Apr-2024
      • (2024) JPIS: A Joint Model for Profile-Based Intent Detection and Slot Filling with Slot-to-Intent Attention. In ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 10446–10450. DOI: 10.1109/ICASSP48485.2024.10446353. Online publication date: 14-Apr-2024
      • (2024) Requirements elicitation and response generation for conversational services. Applied Intelligence 54, 7, 5576–5592. DOI: 10.1007/s10489-024-05454-6. Online publication date: 23-Apr-2024
      • (2024) CEA-Net: A co-interactive external attention network for joint intent detection and slot filling. Neural Computing and Applications. DOI: 10.1007/s00521-024-09733-8. Online publication date: 22-Apr-2024
      • (2024) Large Language Models for Data Extraction in Slot-Filling Tasks. In System Dependability - Theory and Applications, 1–18. DOI: 10.1007/978-3-031-61857-4_1. Online publication date: 14-Jun-2024
