Summary of the paper

Title: Building Universal Dependency Treebanks in Korean
Authors: Jayeol Chun, Na-Rae Han, Jena D. Hwang, and Jinho D. Choi
Abstract: This paper presents three Korean treebanks consisting of dependency trees that are derived from existing treebanks (the Google UD Treebank, the Penn Korean Treebank, and the Kaist Treebank) and pseudo-annotated according to the latest guidelines from the Universal Dependencies (UD) project. The Korean portion of the Google UD Treebank is re-tokenized to match the morpheme-level annotation suggested by the other corpora, and systematically assessed for errors. Phrase structure trees in the Penn Korean Treebank and the Kaist Treebank are automatically converted into dependency trees using head-finding rules and linguistic heuristics. Additionally, part-of-speech tags in all treebanks are converted into the UD tagset. In total, more than 38K dependency trees are generated, comprising a coherent set of dependency relations for over half a million tokens. To the best of our knowledge, this is the first time that these Korean corpora have been analyzed together and transformed into dependency trees following the latest UD guidelines, version 2.
Topics: Corpus (Creation, Annotation, Etc.), Lexicon, Lexical Database, Grammar And Syntax
Full paper: Building Universal Dependency Treebanks in Korean
Bibtex: @InProceedings{CHUN18.378,
  author = {Jayeol Chun and Na-Rae Han and Jena D. Hwang and Jinho D. Choi},
  title = "{Building Universal Dependency Treebanks in Korean}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
}
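
The abstract mentions converting phrase structure trees into dependency trees with head-finding rules. The general idea can be sketched as follows. This is a minimal illustration and not the authors' implementation: the labels (S, VP, NP, N, V), the HEAD_RULES table, and all function names are hypothetical, and the rightmost-child fallback simply assumes a head-final language such as Korean. Dependency relation labels and the paper's linguistic heuristics are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: str                                  # phrase label or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                  # set only on leaves
    index: int = -1                             # 1-based token position (leaves)


# Hypothetical head rules: search direction plus child labels in priority order.
HEAD_RULES: Dict[str, Tuple[str, List[str]]] = {
    "S": ("right", ["VP"]),
    "VP": ("right", ["V", "VP"]),
    "NP": ("right", ["N", "NP"]),
}


def number_leaves(root: Node) -> None:
    """Assign 1-based positions to leaves in left-to-right order."""
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        if not node.children:
            counter += 1
            node.index = counter
        for child in node.children:
            visit(child)

    visit(root)


def find_head_child(node: Node) -> Node:
    """Pick the head child using HEAD_RULES, else the first child in search order."""
    direction, priorities = HEAD_RULES.get(node.label, ("right", []))
    ordered = node.children[::-1] if direction == "right" else node.children
    for label in priorities:
        for child in ordered:
            if child.label == label:
                return child
    return ordered[0]


def to_dependencies(root: Node) -> Dict[int, int]:
    """Return a map from each token position to its head position (0 = root)."""
    number_leaves(root)
    heads: Dict[int, int] = {}

    def lexical_head(node: Node) -> Node:
        if not node.children:
            return node
        head_child = find_head_child(node)
        head_leaf = lexical_head(head_child)
        # Every non-head child attaches its own lexical head to this phrase's head.
        for child in node.children:
            if child is not head_child:
                heads[lexical_head(child).index] = head_leaf.index
        return head_leaf

    heads[lexical_head(root).index] = 0
    return heads


# Toy tree for "나는 책을 읽었다" ("I read a book"), with word-level tokens.
tree = Node("S", [
    Node("NP", [Node("N", word="나는")]),
    Node("VP", [
        Node("NP", [Node("N", word="책을")]),
        Node("V", word="읽었다"),
    ]),
])
deps = to_dependencies(tree)
```

In this toy example both nouns attach to the verb, and the verb becomes the root. A real conversion would also need to assign UD relation labels and map the source part-of-speech tags to the UD tagset.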