DOI: 10.1145/3587259.3627543
short-paper

Fine-Grained and Complex Food Entity Recognition Benchmark for Ingredient Substitution

Published: 05 December 2023

Abstract

Food computing is rapidly growing into an innovative area of knowledge extraction. However, benchmarks for information extraction from semi-structured data, especially those involving more complex relations, are scarce in this domain. In this paper, we introduce a benchmark for extracting complex entities in support of ingredient substitution tasks. First, we present a new dataset, called TASTEset, for fine-grained recognition of food entities in culinary recipes. Second, we provide complex entity annotations for substitution on top of the fine-grained entity mentions, which we carefully prepared. We share the dataset and the tasks to encourage progress on more in-depth and complex information extraction from recipes.


Published In

K-CAP '23: Proceedings of the 12th Knowledge Capture Conference 2023
December 2023
270 pages
ISBN:9798400701412
DOI:10.1145/3587259
Editors: Brent Venable, Daniel Garijo, Brian Jalaian

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Funding Sources

  • Norway Grants 2014-2021 via the National Centre for Research and Development

Conference

K-CAP '23: Knowledge Capture Conference 2023
December 5 - 7, 2023
Pensacola, FL, USA

Acceptance Rates

Overall Acceptance Rate 55 of 198 submissions, 28%
