DOI: 10.1145/3155133.3155171
research-article

Vietnamese Open Information Extraction

Published: 07 December 2017

Abstract

Open information extraction (OIE) is the process of automatically extracting relations and their arguments from text without restricting the search to predefined relations. In recent years, several OIE systems have been built for English, but none exists for Vietnamese. In this paper, we propose an OIE method for Vietnamese that uses a clause-based approach. We exploit Vietnamese dependency parsing together with grammatical clauses, striving to consider all possible relations in a sentence. Clause types are identified from the grammatical functions of their constituents, and the propositions of each clause are taken as extractable relations. The resulting system, vnOIE, is the first OIE system for Vietnamese; it generates open relations and their arguments from Vietnamese text, scales well, and is domain independent. Experimental results show that vnOIE achieves a precision of 83.71%.
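
To make the clause-based idea concrete, here is a minimal sketch of how a subject-verb-object proposition might be read off a dependency parse. It is not the vnOIE implementation: the `Token` structure, the `extract_svo` helper, the UD-style labels `nsubj` and `obj`, and the hand-written parse of the toy sentence "Nam đọc sách" ("Nam reads books") are all assumptions made for illustration. A real system would take its parse from a Vietnamese dependency parser and would handle many more clause types.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    form: str    # surface word
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency label (UD-style names, assumed here)


def extract_svo(tokens):
    """Return (subject, verb, object) triples for simple SVO clauses."""
    triples = []
    for verb in tokens:
        # Collect the direct dependents of the candidate verb.
        deps = [t for t in tokens if t.head == verb.idx]
        subjects = [t.form for t in deps if t.deprel == "nsubj"]
        objects = [t.form for t in deps if t.deprel == "obj"]
        # A clause yields a proposition only if it has both arguments.
        for s in subjects:
            for o in objects:
                triples.append((s, verb.form, o))
    return triples


# Hand-written parse of a toy sentence (hypothetical, not parser output).
sentence = [
    Token(1, "Nam", 2, "nsubj"),
    Token(2, "đọc", 0, "root"),
    Token(3, "sách", 2, "obj"),
]

print(extract_svo(sentence))  # [('Nam', 'đọc', 'sách')]
```

The sketch covers only one clause pattern. Treating each clause type as a separate pattern over grammatical functions is what lets this style of extraction stay independent of any fixed relation vocabulary.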

Cited By

  • (2022) BERT-TRIPLE: Using BERT for Extracting Triples from Vietnamese Sentences. 2022 RIVF International Conference on Computing and Communication Technologies (RIVF), pp. 179-184. DOI: 10.1109/RIVF55975.2022.10013791. Online publication date: 20-Dec-2022
  • (2021) Building a Vietnamese question answering system based on knowledge graph and distributed CNN. Neural Computing and Applications 33:21, pp. 14887-14907. DOI: 10.1007/s00521-021-06126-z. Online publication date: 1-Nov-2021
  • (2020) Extracting triples from Vietnamese text to create knowledge graph. 2020 12th International Conference on Knowledge and Systems Engineering (KSE), pp. 219-223. DOI: 10.1109/KSE50997.2020.9287471. Online publication date: 12-Nov-2020

Information & Contributors

Information

Published In

SoICT '17: Proceedings of the 8th International Symposium on Information and Communication Technology
December 2017
486 pages
ISBN:9781450353281
DOI:10.1145/3155133
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • SOICT: School of Information and Communication Technology - HUST
  • NAFOSTED: The National Foundation for Science and Technology Development

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Open information extraction
  2. Vietnamese dependency parsing
  3. relation extraction

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SoICT 2017

Acceptance Rates

Overall Acceptance Rate 147 of 318 submissions, 46%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 2
Reflects downloads up to 10 Nov 2024.

Citations

Cited By

  • (2019) Multilingual Open Information Extraction: Challenges and Opportunities. Information 10:7, article 228. DOI: 10.3390/info10070228. Online publication date: 2-Jul-2019
