short-paper

Fine-Grained and Complex Food Entity Recognition Benchmark for Ingredient Substitution

Authors:

Agnieszka Lawrynowicz,

Anna Wróblewska,

Agnieszka Kaliska,

Maciej Pawlowski,

Dawid Wiśniewski,

Witold Sosnowski,

Jakub Dutkiewicz

K-CAP '23: Proceedings of the 12th Knowledge Capture Conference 2023

Pages 25–29

https://doi.org/10.1145/3587259.3627543

Published: 05 December 2023

Abstract

Food computing is a fast-growing, innovative area of knowledge extraction. However, benchmarks for information extraction from semi-structured data in this domain are scarce, especially for more complex relations. In this paper, we introduce a benchmark for extracting complex entities to support ingredient substitution. First, we present TASTEset, a new dataset for fine-grained recognition of food entities in culinary recipes. Second, on top of the fine-grained entity mentions, we provide carefully prepared complex entity annotations for substitution. We share the dataset and the tasks to encourage progress on deeper and more complex information extraction from recipes.
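To make the two annotation layers concrete, here is a minimal Python sketch, assuming character-offset span annotations and a simple whitespace tokenizer. The label names (QUANTITY, UNIT, PHYSICAL_QUALITY, FOOD, SUBSTITUTABLE_INGREDIENT) and the helpers `Span`, `spans_to_bio`, and `group_complex` are hypothetical illustrations, not the TASTEset schema or its released tooling. The sketch converts fine-grained spans into token-level BIO tags for sequence labeling and then collects the fine-grained mentions nested inside one complex entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labeled character span; end offset is exclusive (hypothetical format)."""
    start: int
    end: int
    label: str


def tokenize(text: str) -> list[tuple[str, int, int]]:
    """Split on whitespace and keep each token's character offsets."""
    tokens = []
    pos = 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        tokens.append((tok, start, end))
        pos = end
    return tokens


def spans_to_bio(text: str, spans: list[Span]) -> list[tuple[str, str]]:
    """Convert fine-grained entity spans into token-level BIO tags."""
    tagged = []
    for tok, start, end in tokenize(text):
        tag = "O"
        for span in spans:
            # A token belongs to a span only if it lies entirely inside it.
            if span.start <= start and end <= span.end:
                prefix = "B-" if start == span.start else "I-"
                tag = prefix + span.label
                break
        tagged.append((tok, tag))
    return tagged


def group_complex(spans: list[Span], complex_span: Span) -> list[Span]:
    """Collect the fine-grained mentions nested inside a complex entity."""
    return [s for s in spans
            if complex_span.start <= s.start and s.end <= complex_span.end]


# Example ingredient line with hypothetical fine-grained labels.
line = "1 tbsp extra virgin olive oil"
spans = [
    Span(0, 1, "QUANTITY"),
    Span(2, 6, "UNIT"),
    Span(7, 19, "PHYSICAL_QUALITY"),
    Span(20, 29, "FOOD"),
]
# Hypothetical complex entity covering the whole substitutable ingredient.
complex_entity = Span(0, 29, "SUBSTITUTABLE_INGREDIENT")

for token, tag in spans_to_bio(line, spans):
    print(f"{token}\t{tag}")

parts = {s.label: line[s.start:s.end] for s in group_complex(spans, complex_entity)}
print(parts)
```

Grouping the nested mentions under one complex span lets a downstream substitution step treat "1 tbsp extra virgin olive oil" as a unit rather than replacing "olive oil" in isolation; how TASTEset actually structures its complex annotations is not specified in this abstract.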

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Knowledge representation and reasoning
      1. Semantic networks
    2. Natural language processing
      1. Language resources
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Information extraction

Published In

K-CAP '23: Proceedings of the 12th Knowledge Capture Conference 2023

December 2023

270 pages

ISBN:9798400701412

DOI:10.1145/3587259

Editors:
Brent Venable, University of West Florida and Institute for Human and Machine Cognition, Pensacola, FL, USA
Daniel Garijo, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
Brian Jalaian, University of West Florida and Institute for Human and Machine Cognition, Pensacola, FL, USA

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGAI: ACM Special Interest Group on Artificial Intelligence

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 December 2023

Qualifiers

Short-paper
Research
Refereed limited

Funding Sources

Norway Grants 2014-2021 via the National Centre for Research and Development

Conference

K-CAP '23

Sponsor:

SIGAI

K-CAP '23: Knowledge Capture Conference 2023

December 5 - 7, 2023

Pensacola, FL, USA

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%

Bibliometrics

Total Citations: 0
Total Downloads: 74
Downloads (last 12 months): 36
Downloads (last 6 weeks): 1

Reflects downloads up to 01 Feb 2025