CookDial: a dataset for task-oriented dialogs grounded in procedural documents

Applied Intelligence

Abstract

This work presents a new dialog dataset, CookDial, that facilitates research on task-oriented dialog systems with procedural knowledge understanding. The corpus contains 260 human-to-human task-oriented dialogs in which an agent, given a recipe document, guides the user to cook a dish. Dialogs in CookDial exhibit two unique features: (i) procedural alignment between the dialog flow and supporting document; (ii) complex agent decision-making that involves segmenting long sentences, paraphrasing hard instructions and resolving coreference in the dialog context. In addition, we identify three challenging (sub)tasks in the assumed task-oriented dialog system: (1) User Question Understanding, (2) Agent Action Frame Prediction, and (3) Agent Response Generation. For each of these tasks, we develop a neural baseline model, which we evaluate on the CookDial dataset. We publicly release the CookDial dataset, comprising rich annotations of both dialogs and recipe documents, to stimulate further research on domain-specific document-grounded dialog systems.

Notes

  1. https://github.com/YiweiJiang2015/RISeC

  2. https://responsivevoice.org/

  3. We performed vertical normalization on each cell by dividing its frequency by the sum of all the cell frequencies in the same column.

  4. By default, all FFNNs in this work are composed of 1 hidden layer activated by the GELU function and 1 output layer.
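
The column-wise ("vertical") normalization described in note 3 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `normalize_columns`, the handling of all-zero columns, and the toy counts are assumptions made for the example.

```python
from typing import List


def normalize_columns(counts: List[List[float]]) -> List[List[float]]:
    """Divide each cell by the sum of all cells in the same column.

    Columns that sum to zero are returned as zeros (an assumption; the
    source does not say how such columns are treated).
    """
    if not counts:
        return []
    n_cols = len(counts[0])
    col_sums = [sum(row[j] for row in counts) for j in range(n_cols)]
    return [
        [row[j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n_cols)]
        for row in counts
    ]


# Toy frequency table with made-up numbers; rows and columns are arbitrary categories.
toy_counts = [[2, 1], [6, 3]]
normalized = normalize_columns(toy_counts)
```

After this step every non-empty column sums to 1, so each cell can be read as a share of its column.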

References

  1. Gunasekara C, Kim S, D’Haro LF et al (2021) Overview of the ninth dialog system technology challenge: DSTC9. In: Proceedings of the DSTC workshop at AAAI, Online

  2. Wen TH, Vandyke D, Mrkšić N, Gašić M, Rojas-Barahona LM, Su PH, Ultes S, Young S (2017) A network-based end-to-end trainable task-oriented dialogue system. In: Proceedings of EACL, Valencia, pp 438–449. https://aclanthology.org/E17-1042

  3. Budzianowski P, Wen TH, Tseng BH, Casanueva I, Ultes S, Ramadan O, Gasic M (2018) MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. In: Proceedings of EMNLP, Brussels, pp 5016–5026. https://doi.org/10.18653/v1/D18-1547

  4. Rastogi A, Zang X, Sunkara S, Gupta R, Khaitan P (2020) Towards scalable multi-domain conversational agents: the schema-guided dialogue dataset. In: Proceedings of AAAI, vol 34. New York, pp 8689–8696. https://doi.org/10.1609/aaai.v34i05.6394

  5. Rajpurkar P, Jia R, Liang P (2018) Know what you don’t know: Unanswerable questions for SQuAD. In: Proceedings of ACL, vol 2. Melbourne, pp 784–789. https://doi.org/10.18653/v1/P18-2124

  6. Zhou H, Zheng C, Huang K, Huang M, Zhu X (2020) KdConv: A Chinese multi-domain dialogue dataset towards multi-turn knowledge-driven conversation. In: Proceedings of ACL, Online, pp 7098–7108. https://doi.org/10.18653/v1/2020.acl-main.635

  7. Reddy S, Chen D, Manning CD (2019) CoQA: a conversational question answering challenge. Transactions of the Association for Computational Linguistics 7:249–266. https://doi.org/10.1162/tacl_a_00266

  8. Choi E, He H, Iyyer M, Yatskar M, Yih WT, Choi Y, Liang P, Zettlemoyer L (2018) QuAC: question answering in context. In: Proceedings of EMNLP, Brussels, pp 2174–2184. https://doi.org/10.18653/v1/D18-1241

  9. Campos JA, Otegi A, Soroa A, Deriu J, Cieliebak M, Agirre E (2020) DoQA - accessing domain-specific FAQs via conversational QA. In: Proceedings of ACL, Online, pp 7302–7314. https://doi.org/10.18653/v1/2020.acl-main.652

  10. Saeidi M, Bartolo M, Lewis P, Singh S, Rocktäschel T, Sheldon M, Bouchard G, Riedel S (2018) Interpretation of natural language rules in conversational machine reading. In: Proceedings of EMNLP, Brussels, pp 2087–2097. https://doi.org/10.18653/v1/D18-1233

  11. Feng S, Wan H, Gunasekara C, Patel S, Joshi S, Lastras L (2020) Doc2Dial: a goal-oriented document-grounded dialogue dataset. In: Proceedings of EMNLP, Online, pp 8118–8128. https://doi.org/10.18653/v1/2020.emnlp-main.652

  12. Raghu D, Agarwal S, Joshi S, Mausam (2021) End-to-end learning of flowchart grounded task-oriented dialogs. In: Proceedings of EMNLP, Online and Punta Cana, Dominican Republic, pp 4348–4366. https://doi.org/10.18653/v1/2021.emnlp-main.357

  13. Jiang Y, Zaporojets K, Deleu J, Demeester T, Develder C (2020) Recipe instruction semantics corpus (RISeC): resolving semantic structure and zero anaphora in recipes. In: Proceedings of AACL, Online and Suzhou, China, pp 821–826. https://aclanthology.org/2020.aacl-main.82

  14. Burtsev M, Chuklin A, Kiseleva J, Borisov A (2017) Search-oriented conversational AI (SCAI). In: Proceedings of ACM SIGIR ICTIR, Amsterdam, The Netherlands, pp 333–334. https://doi.org/10.1145/3121050.3121111

  15. Henderson M, Thomson B, Williams J (2014) The third dialog state tracking challenge. In: Proceedings of the SLT workshop at IEEE, pp 324–329

  16. Wen TH, Vandyke D, Mrkšić N, Gašić M, Rojas-Barahona LM, Su PH, Ultes S, Young S (2017) A network-based end-to-end trainable task-oriented dialogue system. In: Proceedings of EACL, vol 1. Valencia, Spain, pp 438–449. https://aclanthology.org/E17-1042

  17. El Asri L, Schulz H, Sharma S, Zumer J, Harris J, Fine E, Mehrotra R, Suleman K (2017) Frames: a corpus for adding memory to goal-oriented dialogue systems. In: Proceedings of SIGDIAL, Saarbrücken, Germany, pp 207–219. https://doi.org/10.18653/v1/W17-5526

  18. Kollar T, Berry D, Stuart L, Owczarzak K, Chung T, Mathias L, Kayser M, Snow B, Matsoukas S (2018) The Alexa meaning representation language. In: Proceedings of NAACL, vol 3. New Orleans - Louisiana, pp 177–184. https://doi.org/10.18653/v1/N18-3022

  19. Gupta S, Shah R, Mohit M, Kumar A, Lewis M (2018) Semantic parsing for task oriented dialog using hierarchical representations. In: Proceedings of EMNLP, Brussels, Belgium, pp 2787–2792. https://doi.org/10.18653/v1/D18-1300

  20. Aghajanyan A, Maillard J, Shrivastava A, Diedrick K, Haeger M, Li H, Mehdad Y, Stoyanov V, Kumar A, Lewis M, Gupta S (2020) Conversational semantic parsing. In: Proceedings of EMNLP, Online, pp 5026–5035. https://doi.org/10.18653/v1/2020.emnlp-main.408

  21. Bunt H, Petukhova V, Traum D, Alexandersson J (2017) Dialogue act annotation with the ISO 24617-2 standard. Springer, Cham, pp 109–135. https://doi.org/10.1007/978-3-319-42816-1_6

  22. Qu C, Yang L, Qiu M, Zhang Y, Chen C, Croft W, Iyyer M (2019) Attentive history selection for conversational question answering. In: Proceedings of CIKM, Beijing, China, pp 1391–1400. https://doi.org/10.1145/3357384.3357905

  23. Zaheer M, Guruganesh G, Dubey KA, Ainslie J, Alberti C, Ontanon S, Pham P, Ravula A, Wang Q, Yang L, Ahmed A (2020) Big bird: transformers for longer sequences. In: Proceedings of NeurIPS, vol 33. Online, pp 17283–17297

  24. Sutton C, McCallum A (2012) An introduction to conditional random fields. Foundations and Trends in Machine Learning 4:267–373. https://doi.org/10.1561/2200000013

  25. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(140):1–67

  26. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Brew J (2020) Huggingface: Transformers: State-of-the-art natural language processing. In: Proceedings of EMNLP: system demonstrations, Online, pp 38–45. https://doi.org/10.18653/v1/2020.emnlp-demos.6

  27. Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: Proceedings of ICLR, Vancouver, BC, Canada. https://openreview.net/forum?id=Bkg6RiCqY7

Acknowledgements

We thank Maarten De Raedt and Amir Hadifar for their insightful suggestions in the initial data collection. The first author is supported by China Scholarship Council (No. 201906020194) and Bijzonder Onderzoeksfonds (BOF) van Universiteit Gent (No. 01SC0618). This research also receives funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme.

Author information

Corresponding author

Correspondence to Yiwei Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Experiment settings

All the transformer modules in our models are implemented with the Huggingface library [26]. We conducted the experiments on a single Nvidia Tesla V100 (32 GB) card. For all the tasks, we use the AdamW optimizer [27]. For both Task I and Task II, we use two different learning rates depending on the layer to accelerate convergence: (i) $10^{-5}$ for the layers within the BigBird encoder; (ii) $10^{-3}$ for the top classifier layers (FFNNs and CRF). For Task III, the learning rate for all the layers is set to $3 \times 10^{-4}$. The batch size is set to 8. The hidden size of all the FFNN layers is 128, except for the intent classifier layer in Task I (64). Dropout is set to 0.2 during fine-tuning where needed.
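
The layer-dependent learning rates for Task I and Task II can be expressed as optimizer parameter groups. The sketch below is only an illustration under stated assumptions, not the authors' training code: the helper `build_param_groups`, the `"encoder."` name prefix, and the toy parameter names are all hypothetical, and plain strings stand in for real tensors so that the example runs without any deep-learning library.

```python
from typing import Dict, Iterable, List, Tuple


def build_param_groups(
    named_params: Iterable[Tuple[str, object]],
    encoder_prefix: str = "encoder.",  # hypothetical naming convention
    encoder_lr: float = 1e-5,          # layers within the encoder
    head_lr: float = 1e-3,             # top classifier layers (FFNNs and CRF)
) -> List[Dict[str, object]]:
    """Split parameters into an encoder group and a classifier-head group."""
    encoder, head = [], []
    for name, param in named_params:
        if name.startswith(encoder_prefix):
            encoder.append(param)
        else:
            head.append(param)
    return [
        {"params": encoder, "lr": encoder_lr},
        {"params": head, "lr": head_lr},
    ]


# Stand-in parameters (plain strings) with made-up names.
toy_params = [
    ("encoder.layer.0.weight", "w0"),
    ("ffnn.weight", "w1"),
    ("crf.transitions", "t"),
]
groups = build_param_groups(toy_params)
```

In PyTorch, a list of dictionaries in this shape (built from `model.named_parameters()`) can be passed to `torch.optim.AdamW`, which applies each group's own `lr`. Which modules count as "encoder" depends on how the model is defined, so the prefix test above would need adapting.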

Appendix B: User intent and agent act annotations

Tables B.1 and B.2 explain how we annotate the user intents and agent acts, respectively. For each intent or agent act, we also provide an annotation example, except for a few, i.e., other and repeat.

Table B.1 Annotation scheme for the user intents
Table B.2 Annotation scheme for the agent acts

About this article

Cite this article

Jiang, Y., Zaporojets, K., Deleu, J. et al. CookDial: a dataset for task-oriented dialogs grounded in procedural documents. Appl Intell 53, 4748–4766 (2023). https://doi.org/10.1007/s10489-022-03692-0

  • DOI: https://doi.org/10.1007/s10489-022-03692-0
