DOI: 10.1007/978-3-030-41505-1_35

CrossOIE: Cross-Lingual Classifier for Open Information Extraction

Published: 02 March 2020

Abstract

Open information extraction (Open IE) is the task of extracting open-domain assertions from natural language sentences. Because datasets and tools for this task are scarce in languages other than English, it has recently been proposed that multilingual resources can be used to improve Open IE methods for different languages. In this work, we present CrossOIE, a publicly available multilingual relation tuple validity classifier that scores the extractions of Open IE systems by their estimated quality. It can be used to improve Open IE systems and to assist in the creation of Open IE benchmarks for different languages. Experiments show that our model, trained on a small corpus in English, Spanish, and Portuguese, can trade recall for up to a 27% improvement in precision. This result was also achieved in a zero-shot scenario, demonstrating successful knowledge transfer across languages.
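The recall-for-precision trade described in the abstract can be illustrated with a minimal sketch. It assumes that each extraction carries a validity score in [0, 1] and that extractions below a chosen threshold are discarded. The `ScoredExtraction` type, the `precision_recall` helper, the example tuples, and their scores are all hypothetical and serve only as illustration; they are not CrossOIE's actual interface, model, or data.

```python
from dataclasses import dataclass


@dataclass
class ScoredExtraction:
    """A relation tuple with a hypothetical validity score and a gold label."""
    arg1: str
    rel: str
    arg2: str
    score: float    # assumed classifier output in [0, 1]
    is_valid: bool  # illustrative gold annotation


def precision_recall(extractions, threshold):
    """Keep extractions scoring at or above the threshold, then measure
    precision and recall of the kept set against the gold labels."""
    kept = [e for e in extractions if e.score >= threshold]
    total_valid = sum(e.is_valid for e in extractions)
    kept_valid = sum(e.is_valid for e in kept)
    precision = kept_valid / len(kept) if kept else 0.0
    recall = kept_valid / total_valid if total_valid else 0.0
    return precision, recall


# Made-up extractions and scores, for illustration only.
EXAMPLES = [
    ScoredExtraction("Lisbon", "is the capital of", "Portugal", 0.95, True),
    ScoredExtraction("Madrid", "is in", "Spain", 0.80, True),
    ScoredExtraction("the river", "crosses", "the city", 0.55, True),
    ScoredExtraction("Portugal", "is the capital", "of", 0.40, False),
    ScoredExtraction("in Spain", "Madrid", "is", 0.20, False),
]

if __name__ == "__main__":
    for threshold in (0.0, 0.5, 0.6):
        p, r = precision_recall(EXAMPLES, threshold)
        print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

Raising the threshold removes low-scoring tuples, which tends to raise precision while dropping some valid extractions. On this toy data, the strictest of the three thresholds keeps only correct tuples but loses one valid extraction.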

Cited By

  • (2022) PortNOIE: A Neural Framework for Open Information Extraction for the Portuguese Language. Computational Processing of the Portuguese Language, pp. 243-255. DOI: 10.1007/978-3-030-98305-5_23. Online publication date: 21-Mar-2022
  • (2020) Cross-Lingual Public Opinion Tracing Based on Blockchain Technology. Web Information Systems and Applications, pp. 607-617. DOI: 10.1007/978-3-030-60029-7_54. Online publication date: 23-Sep-2020

Published In

Computational Processing of the Portuguese Language: 14th International Conference, PROPOR 2020, Evora, Portugal, March 2–4, 2020, Proceedings
Mar 2020
431 pages
ISBN: 978-3-030-41504-4
DOI: 10.1007/978-3-030-41505-1

Publisher

Springer-Verlag, Berlin, Heidelberg

Author Tags

1. Open information extraction
2. Cross-lingual
3. Multilingual
