
Summary of the paper

Title: Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content
Authors: Vilelmini Sosoni, Katia Lida Kermanidis, Maria Stasimioti, Thanasis Naskos, Eirini Takoulidou, Menno Van Zaanen, Sheila Castilho, Panayota Georgakopoulou, Valia Kordoni and Markus Egg
Abstract: The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course material, ranging from course forum text to subtitles of online video lectures, that has been developed via large-scale crowdsourcing. The English source text is manually translated into 11 European and BRIC languages using the CrowdFlower platform. During the process several challenges arose which mainly involved the in-domain text genre, the large text volume, the idiosyncrasies of each target language, the limitations of the crowdsourcing platform, as well as the quality assurance and workflow issues of the crowdsourcing process. The corpus constitutes a product of the EU-funded TraMOOC project and is utilised in the project in order to train, tune and test machine translation engines.
Topics: Crowdsourcing, Corpus (Creation, Annotation, Etc.), LR National/International Projects, Infrastructural/Policy Issues
Full paper: Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content
Bibtex:
@InProceedings{SOSONI18.677,
  author = {Vilelmini Sosoni and Katia Lida Kermanidis and Maria Stasimioti and Thanasis Naskos and Eirini Takoulidou and Menno Van Zaanen and Sheila Castilho and Panayota Georgakopoulou and Valia Kordoni and Markus Egg},
  title = "{Translation Crowdsourcing: Creating a Multilingual Corpus of Online Educational Content}",
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018},
  month = {May 7-12, 2018},
  address = {Miyazaki, Japan},
  editor = {Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis and Takenobu Tokunaga},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {979-10-95546-00-9},
  language = {english}
}