Toward a Human-in-the-Loop Approach to Create Training Datasets for RDF Lexicalisation

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 822)

Abstract

Datasets that align natural language with Knowledge Graphs are fundamental to a wide variety of Natural Language Processing and Generation tasks. Current state-of-the-art aligned datasets, however, are limited in size and cover few domains, and their quality is difficult to evaluate. To address these issues, we introduce SEALIon, a tool that extracts RDF triples from natural language text corpora using a human-in-the-loop approach. We present the first results obtained with SEALIon's approach, paving the way for further research on human-in-the-loop triple extraction.

Notes

  1. https://www.mturk.com/

  2. https://appen.com/

  3. Version 0.1 of the tool is available at https://sealion.ml/. The source code can be downloaded from the Git repository https://bitbucket.org/disco_unimib/sealion/.

  4. https://commoncrawl.org/

  5. Available at https://gitlab.com/shimorina/webnlg-datas.

  6. https://github.com/google-research/text-to-text-transfer-transformer

  7. https://spacy.io/

  8. https://stanfordnlp.github.io/CoreNLP/

  9. https://www.dbpedia-spotlight.org/api

  10. https://lamapi.ml/

  11. https://stanfordnlp.github.io/CoreNLP/sutime.html

  12. https://relatedwords.org/

  13. https://wordnet.princeton.edu/

Author information

Corresponding author

Correspondence to Jessica Amianto Barbato.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Barbato, J.A., Cremaschi, M., Rula, A., Maurino, A. (2024). Toward a Human-in-the-Loop Approach to Create Training Datasets for RDF Lexicalisation. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 822. Springer, Cham. https://doi.org/10.1007/978-3-031-47721-8_6
