Toward a Human-in-the-Loop Approach to Create Training Datasets for RDF Lexicalisation

Barbato, Jessica Amianto; Cremaschi, Marco; Rula, Anisa; Maurino, Andrea

doi:10.1007/978-3-031-47721-8_6

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 822)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

Datasets that align natural language with Knowledge Graphs are fundamental to a wide variety of Natural Language Processing and Generation tasks. Current state-of-the-art aligned datasets, however, are limited in size and cover few domains, and their quality is difficult to evaluate. To address these issues, we introduce SEALIon, a tool that extracts RDF triples from natural language text corpora using a human-in-the-loop approach. We present first results obtained with SEALIon’s approach, paving the way for further research on human-in-the-loop triple extraction.

Notes

1. https://www.mturk.com/
2. https://appen.com/
3. Version 0.1 of the tool is available at https://sealion.ml/. The source code can be downloaded from the Git repository https://bitbucket.org/disco_unimib/sealion/.
4. https://commoncrawl.org/
5. Available at https://gitlab.com/shimorina/webnlg-datas.
6. https://github.com/google-research/text-to-text-transfer-transformer
7. https://spacy.io/
8. https://stanfordnlp.github.io/CoreNLP/
9. https://www.dbpedia-spotlight.org/api
10. https://lamapi.ml/
11. https://stanfordnlp.github.io/CoreNLP/sutime.html
12. https://relatedwords.org/
13. https://wordnet.princeton.edu/

Author information

Authors and Affiliations

University of Milano-Bicocca, Milan, Italy
Jessica Amianto Barbato, Marco Cremaschi & Andrea Maurino
University of Brescia, Brescia, Italy
Anisa Rula

Corresponding author

Correspondence to Jessica Amianto Barbato.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai

About this paper

Cite this paper

Barbato, J.A., Cremaschi, M., Rula, A., Maurino, A. (2024). Toward a Human-in-the-Loop Approach to Create Training Datasets for RDF Lexicalisation. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-031-47721-8_6

DOI: https://doi.org/10.1007/978-3-031-47721-8_6
Published: 10 January 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47720-1
Online ISBN: 978-3-031-47721-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
