CookDial: a dataset for task-oriented dialogs grounded in procedural documents

Jiang, Yiwei; Zaporojets, Klim; Deleu, Johannes; Demeester, Thomas; Develder, Chris

doi:10.1007/s10489-022-03692-0


Published: 15 June 2022

Applied Intelligence, Volume 53, pages 4748–4766 (2023)

Yiwei Jiang¹ (ORCID: orcid.org/0000-0002-4906-7308), Klim Zaporojets¹, Johannes Deleu¹, Thomas Demeester¹ & Chris Develder¹


Abstract

This work presents a new dialog dataset, CookDial, that facilitates research on task-oriented dialog systems with procedural knowledge understanding. The corpus contains 260 human-to-human task-oriented dialogs in which an agent, given a recipe document, guides the user to cook a dish. Dialogs in CookDial exhibit two unique features: (i) procedural alignment between the dialog flow and supporting document; (ii) complex agent decision-making that involves segmenting long sentences, paraphrasing hard instructions and resolving coreference in the dialog context. In addition, we identify three challenging (sub)tasks in the assumed task-oriented dialog system: (1) User Question Understanding, (2) Agent Action Frame Prediction, and (3) Agent Response Generation. For each of these tasks, we develop a neural baseline model, which we evaluate on the CookDial dataset. We publicly release the CookDial dataset, comprising rich annotations of both dialogs and recipe documents, to stimulate further research on domain-specific document-grounded dialog systems.


Notes

https://github.com/YiweiJiang2015/RISeC
https://responsivevoice.org/
We performed vertical normalization on each cell by dividing its frequency by the sum of all the cell frequencies in the same column.
By default, all FFNNs in this work are composed of 1 hidden layer activated by the GELU function and 1 output layer.
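The vertical normalization described in the third note can be illustrated with a minimal sketch. The function name and the toy frequency table below are hypothetical and are not taken from the paper; the sketch only assumes that the table is a list of rows and that a column whose frequencies sum to zero should stay at zero.

```python
def normalize_columns(matrix):
    """Divide each cell by the sum of all cells in the same column.

    Assumption (not from the paper): a column that sums to zero is
    left as all zeros instead of raising a division error.
    """
    n_cols = len(matrix[0]) if matrix else 0
    col_sums = [sum(row[j] for row in matrix) for j in range(n_cols)]
    return [
        [row[j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n_cols)]
        for row in matrix
    ]


# Hypothetical frequency table: rows and columns are arbitrary categories.
counts = [
    [2, 1],
    [6, 3],
]
normalized = normalize_columns(counts)
```

After normalization, every non-empty column sums to 1, so each cell can be read as a share of its column.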

References

Gunasekara C, Kim S, D’Haro LF et al (2021) Overview of the ninth dialog system technology challenge: DSTC9. In: Proceedings of the DSTC workshop at AAAI, Online
Wen TH, Vandyke D, Mrkšić N, Gašić M, Rojas-Barahona LM, Su PH, Ultes S, Young S (2017) A network-based end-to-end trainable task-oriented dialogue system. In: Proceedings of EACL, Valencia, pp 438–449. https://aclanthology.org/E17-1042
Budzianowski P, Wen TH, Tseng BH, Casanueva I, Ultes S, Ramadan O, Gasic M (2018) Multiwoz - a large-scale multi-domain wizard-of-oz dataset for task-oriented dialogue modelling. In: Proceedings of EMNLP, Brussels, pp 5016–5026. https://doi.org/10.18653/v1/D18-1547
Rastogi A, Zang X, Sunkara S, Gupta R, Khaitan P (2020) Towards scalable multi-domain conversational agents: the schema-guided dialogue dataset. In: Proceedings of AAAI, vol 34. New York, pp 8689–8696. https://doi.org/10.1609/aaai.v34i05.6394
Rajpurkar P, Jia R, Liang P (2018) Know what you don’t know: Unanswerable questions for SQuAD. In: Proceedings of ACL, vol 2. Melbourne, pp 784–789. https://doi.org/10.18653/v1/P18-2124
Zhou H, Zheng C, Huang K, Huang M, Zhu X (2020) KdConv: A Chinese multi-domain dialogue dataset towards multi-turn knowledge-driven conversation. In: Proceedings of ACL, Online, pp 7098–7108. https://doi.org/10.18653/v1/2020.acl-main.635
Reddy S, Chen D, Manning CD (2019) CoQA: a conversational question answering challenge. Transactions of the Association for Computational Linguistics 7:249–266. https://doi.org/10.1162/tacl_a_00266
Choi E, He H, Iyyer M, Yatskar M, Yih WT, Choi Y, Liang P, Zettlemoyer L (2018) QuAC: question answering in context. In: Proceedings of EMNLP, Brussels, pp 2174–2184. https://doi.org/10.18653/v1/D18-1241
Campos JA, Otegi A, Soroa A, Deriu J, Cieliebak M, Agirre E (2020) DoQA - accessing domain-specific FAQs via conversational QA. In: Proceedings of ACL, Online, pp 7302–7314. https://doi.org/10.18653/v1/2020.acl-main.652
Saeidi M, Bartolo M, Lewis P, Singh S, Rocktäschel T, Sheldon M, Bouchard G, Riedel S (2018) Interpretation of natural language rules in conversational machine reading. In: Proceedings of EMNLP, Brussels, pp 2087–2097. https://doi.org/10.18653/v1/D18-1233
Feng S, Wan H, Gunasekara C, Patel S, Joshi S, Lastras L (2020) Doc2Dial: a goal-oriented document-grounded dialogue dataset. In: Proceedings of EMNLP, Online, pp 8118–8128. https://doi.org/10.18653/v1/2020.emnlp-main.652
Raghu D, Agarwal S, Joshi S, Mausam (2021) End-to-end learning of flowchart grounded task-oriented dialogs. In: Proceedings of EMNLP, Online and Punta Cana, Dominican Republic, pp 4348–4366. https://doi.org/10.18653/v1/2021.emnlp-main.357
Jiang Y, Zaporojets K, Deleu J, Demeester T, Develder C (2020) Recipe instruction semantics corpus (RISeC): resolving semantic structure and zero anaphora in recipes. In: Proceedings of AACL, Online and Suzhou, China, pp 821–826. https://aclanthology.org/2020.aacl-main.82
Burtsev M, Chuklin A, Kiseleva J, Borisov A (2017) Search-oriented conversational AI (SCAI). In: Proceedings of ACM SIGIR ICTIR, Amsterdam, The Netherlands, pp 333–334. https://doi.org/10.1145/3121050.3121111
Henderson M, Thomson B, Williams J (2014) The third dialog state tracking challenge. In: Proceedings of the SLT workshop at IEEE, pp 324–329
Wen TH, Vandyke D, Mrkšić N, Gašić M, Rojas-Barahona LM, Su PH, Ultes S, Young S (2017) A network-based end-to-end trainable task-oriented dialogue system. In: Proceedings of EACL, vol 1. Valencia, Spain, pp 438–449. https://aclanthology.org/E17-1042
El Asri L, Schulz H, Sharma S, Zumer J, Harris J, Fine E, Mehrotra R, Suleman K (2017) Frames: a corpus for adding memory to goal-oriented dialogue systems. In: Proceedings of SIGDIAL, Saarbrücken, Germany, pp 207–219. https://doi.org/10.18653/v1/W17-5526
Kollar T, Berry D, Stuart L, Owczarzak K, Chung T, Mathias L, Kayser M, Snow B, Matsoukas S (2018) The Alexa meaning representation language. In: Proceedings of NAACL, vol 3. New Orleans - Louisiana, pp 177–184. https://doi.org/10.18653/v1/N18-3022
Gupta S, Shah R, Mohit M, Kumar A, Lewis M (2018) Semantic parsing for task oriented dialog using hierarchical representations. In: Proceedings of EMNLP, Brussels, Belgium, pp 2787–2792. https://doi.org/10.18653/v1/D18-1300
Aghajanyan A, Maillard J, Shrivastava A, Diedrick K, Haeger M, Li H, Mehdad Y, Stoyanov V, Kumar A, Lewis M, Gupta S (2020) Conversational semantic parsing. In: Proceedings of EMNLP, Online, pp 5026–5035. https://doi.org/10.18653/v1/2020.emnlp-main.408
Bunt H, Petukhova V, Traum D, Alexandersson J (2017) Dialogue act annotation with the ISO 24617-2 standard. Springer, Cham, pp 109–135. https://doi.org/10.1007/978-3-319-42816-1_6
Qu C, Yang L, Qiu M, Zhang Y, Chen C, Croft W, Iyyer M (2019) Attentive history selection for conversational question answering. In: Proceedings of CIKM, Beijing, China, pp 1391–1400. https://doi.org/10.1145/3357384.3357905
Zaheer M, Guruganesh G, Dubey KA, Ainslie J, Alberti C, Ontanon S, Pham P, Ravula A, Wang Q, Yang L, Ahmed A (2020) Big bird: transformers for longer sequences. In: Proceedings of NeurIPS, vol 33. Online, pp 17283–17297
Sutton C, McCallum A (2012) An introduction to conditional random fields. Foundations and Trends in Machine Learning 4:267–373. https://doi.org/10.1561/2200000013
Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(140):1–67
Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz M, Brew J (2020) Huggingface: Transformers: State-of-the-art natural language processing. In: Proceedings of EMNLP: system demonstrations, Online, pp 38–45. https://doi.org/10.18653/v1/2020.emnlp-demos.6
Loshchilov I, Hutter F (2019) Decoupled weight decay regularization. In: Proceedings of ICLR, Vancouver, BC, Canada. https://openreview.net/forum?id=Bkg6RiCqY7


Acknowledgements

We thank Maarten De Raedt and Amir Hadifar for their insightful suggestions during the initial data collection. The first author is supported by the China Scholarship Council (No. 201906020194) and the Bijzonder Onderzoeksfonds (BOF) van Universiteit Gent (No. 01SC0618). This research also received funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme.

Author information

Authors and Affiliations

IDLab, Ghent University – imec, Technologiepark Zwijnaarde 126, 9052, Ghent, Belgium
Yiwei Jiang, Klim Zaporojets, Johannes Deleu, Thomas Demeester & Chris Develder


Corresponding author

Correspondence to Yiwei Jiang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Experiment settings

All transformer modules in our models are implemented with the Huggingface library [26]. We conducted the experiments on a single NVIDIA Tesla V100 (32 GB) card. For all tasks, we use the AdamW optimizer [27]. For both Task I and Task II, we use two different learning rates, depending on the layer, to accelerate convergence: (i) 10⁻⁵ for the layers within the BigBird encoder; (ii) 10⁻³ for the top classifier layers (FFNNs and CRF). For Task III, the learning rate for all layers is set to 3 × 10⁻⁴. The batch size is set to 8. The hidden size of all FFNN layers is 128, except for the intent classifier layer in Task I (64). Where dropout is applied during fine-tuning, its rate is set to 0.2.
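The layer-dependent learning rates for Task I and Task II can be expressed as optimizer parameter groups. The following is a minimal sketch, not the authors' implementation: the helper name `build_param_groups` and the `"encoder."` name prefix are assumptions made for illustration, and the toy parameter list stands in for a real model so that the example runs without any deep learning library.

```python
ENCODER_LR = 1e-5     # layers within the BigBird encoder (Tasks I and II)
CLASSIFIER_LR = 1e-3  # top classifier layers: FFNNs and CRF (Tasks I and II)


def build_param_groups(named_params, encoder_prefix="encoder."):
    """Split (name, parameter) pairs into two groups with separate learning rates.

    Assumption (hypothetical naming): encoder parameters are those whose
    name starts with `encoder_prefix`; everything else is a classifier layer.
    """
    encoder_params, classifier_params = [], []
    for name, param in named_params:
        if name.startswith(encoder_prefix):
            encoder_params.append(param)
        else:
            classifier_params.append(param)
    return [
        {"params": encoder_params, "lr": ENCODER_LR},
        {"params": classifier_params, "lr": CLASSIFIER_LR},
    ]


# Toy stand-in for a model's named parameters (strings instead of tensors).
toy_params = [
    ("encoder.layer.0.weight", "w0"),
    ("encoder.layer.0.bias", "b0"),
    ("intent_ffnn.weight", "w1"),
    ("crf.transitions", "t"),
]
groups = build_param_groups(toy_params)
```

With PyTorch, a list of this shape built from `model.named_parameters()` could be passed to `torch.optim.AdamW`, which accepts per-group options such as `lr`. For Task III a single group with a learning rate of 3e-4 would suffice, since all layers share the same rate.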

Appendix B: User intent and agent act annotations

Tables B.1 and B.2 explain how we annotate the user intents and agent acts, respectively. For each intent or agent act, we also provide an annotation example, except for a few, i.e., other and repeat.

Table B.1 Annotation scheme for the user intents


Table B.2 Annotation scheme for the agent acts



About this article

Cite this article

Jiang, Y., Zaporojets, K., Deleu, J. et al. CookDial: a dataset for task-oriented dialogs grounded in procedural documents. Appl Intell 53, 4748–4766 (2023). https://doi.org/10.1007/s10489-022-03692-0


Accepted: 28 April 2022
Published: 15 June 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s10489-022-03692-0
