Coping with Problems of Unicoded Traditional Mongolian

Wang, Boli; Shi, Xiaodong; Chen, Yidong

doi:10.1007/978-3-319-47674-2_11


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10035)



Abstract

The Unicode encoding of traditional Mongolian has a serious flaw: several pairs of vowels that share the same glyphs but differ in pronunciation are assigned different code points. We expose the severity of the problem with examples from our Mongolian corpus and propose two ways to alleviate it: first, a publicly available Mongolian input method that helps users choose the correct encoding; second, a normalization method that addresses the data sparseness caused by the resulting proliferation of homographs. Experiments on search engines and statistical machine translation show that our methods are effective.
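The abstract's first remedy is an input method that steers users toward the correct encoding. The chapter's actual IME design is not reproduced here; the sketch below only assumes one plausible approach: expand what the user typed into every string that renders with the same glyphs, then keep the candidate that a frequency lexicon knows best. The `SAME_GLYPH` table (limited to the O/U and OE/UE pairs), the `lexicon` argument and both function names are hypothetical.

```python
from itertools import product

# Hypothetical table: code points assumed to render with the same glyph.
# Only two pairs are shown; a real table would be larger.
SAME_GLYPH = {
    "\u1823": ("\u1823", "\u1824"),  # MONGOLIAN LETTER O / MONGOLIAN LETTER U
    "\u1824": ("\u1823", "\u1824"),
    "\u1825": ("\u1825", "\u1826"),  # MONGOLIAN LETTER OE / MONGOLIAN LETTER UE
    "\u1826": ("\u1825", "\u1826"),
}


def encoding_candidates(typed: str) -> list[str]:
    """Return every string that renders like `typed` under SAME_GLYPH."""
    options = [SAME_GLYPH.get(ch, (ch,)) for ch in typed]
    return ["".join(combo) for combo in product(*options)]


def choose_encoding(typed: str, lexicon: dict[str, int]) -> str:
    """Pick the candidate with the highest lexicon frequency.

    `lexicon` is a hypothetical word -> corpus-frequency map. If no
    candidate is known, the input is returned unchanged.
    """
    best, best_freq = typed, 0
    for cand in encoding_candidates(typed):
        freq = lexicon.get(cand, 0)
        if freq > best_freq:
            best, best_freq = cand, freq
    return best
```

The candidate count doubles with every ambiguous letter, so a production IME would more likely prune candidates with a trie or a language model than enumerate them exhaustively.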

Notes

1.
The authors could scarcely find a Unicoded traditional Mongolian website in 2014, although things began to change in 2015.
2.
.
3.
.
4.
To emphasize that the letter ang is a single code point, we transliterate it as n̅g̅. It is often pronounced [ŋ].
5.
http://cloudtranslation.cc/corpus_minority.html.
6.
Prof. Garudi of Inner Mongolia Normal University, personal communication.
7.
Personal communication.
8.
http://search.cloudtranslation.cc/.
9.
http://uread.superfection.com/software/uread_mongolian.rar.
10.
The letters o and u are regarded as equivalent letters. We collected 22 such equivalent pairs, which form the letter equivalence table. Note that equivalent letters do not always have the same glyphs in all positions; e.g. some letters are equivalent only in medial positions.
11.
We rely on the Microsoft Uniscribe engine to generate the correct glyphs. However, as [1] points out, even Microsoft's engine fails to generate some of the correct glyphs.
12.
13.
Google's powerful engine can index PDF files, while ours cannot yet.
14.
For this experiment, we indexed only two popular Mongolian websites: www.mgyxw.net and mgl.nmg.gov.cn.
15.
http://cloudtranslation.cc/mt.
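Note 10 describes a letter equivalence table in which some pairs are equivalent only at certain word positions. A minimal sketch of normalization driven by such a table follows, assuming each entry maps a code point to a class representative plus the positions where the equivalence holds. The two entries (O to U, OE to UE) are illustrative only; the chapter's 22-pair table is not reproduced, and `EQUIVALENCE_TABLE`, `word_position`, `normalize_word` and `normalize_text` are hypothetical names. Position is computed naively from character index and ignores format controls such as free variation selectors.

```python
# Hypothetical subset of a letter equivalence table. Each source code point maps
# to (class representative, set of word positions where the pair is equivalent).
ALL_POSITIONS = frozenset({"isolated", "initial", "medial", "final"})

EQUIVALENCE_TABLE = {
    "\u1823": ("\u1824", ALL_POSITIONS),  # MONGOLIAN LETTER O  -> MONGOLIAN LETTER U
    "\u1825": ("\u1826", ALL_POSITIONS),  # MONGOLIAN LETTER OE -> MONGOLIAN LETTER UE
}


def word_position(index: int, length: int) -> str:
    """Classify a character's position in a word (format controls are not skipped)."""
    if length == 1:
        return "isolated"
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def normalize_word(word: str, table: dict = EQUIVALENCE_TABLE) -> str:
    """Replace each letter by its class representative where the table allows it."""
    out = []
    for i, ch in enumerate(word):
        entry = table.get(ch)
        if entry is not None and word_position(i, len(word)) in entry[1]:
            out.append(entry[0])
        else:
            out.append(ch)
    return "".join(out)


def normalize_text(text: str, table: dict = EQUIVALENCE_TABLE) -> str:
    """Normalize each space-separated word; splitting and joining on ' ' keeps spacing intact."""
    return " ".join(normalize_word(w, table) for w in text.split(" "))
```

For search, applying `normalize_text` to both indexed documents and queries lets a query typed with O match a page encoded with U. A medial-only pair would simply be entered with `frozenset({"medial"})` as its position set.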

References

[1] Batjargal, B., Khaltarkhuu, G., Kimura, F., Maeda, A.: A study of traditional Mongolian script encodings and rendering: use of unicode in OpenType fonts. Int. J. Asian Lang. Proc. 21(1), 23–44 (2011)
[2] Chinggaltai: A Grammar of the Mongol Language. Frederick Ungar Publishing Co., New York (1963)
[3] Choijinzhab: Mongolian Encoding. Inner Mongolia University Press, Hohhot (2000). (in Chinese: 确精扎布: 蒙古文编码. 内蒙古大学出版社, 呼和浩特) http://www.babelstone.co.uk/Mongolian/MGWBM.html
[4] Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching in pattern recognition. Int. J. Pattern Recogn. Artif. Intell. 18(03), 265–298 (2004)
[5] Daoerji, F., Fengshan, B., Huijuan, W.U.: Research on Mongolian input method in unicode. J. Chin. Inf. Process. 24(6), 120–124+128 (2010). (in Chinese)
[6] Goldsmith, J.: Vowel harmony in Khalkha Mongolian, Yaka, Finnish and Hungarian. Phonology 2(01), 253–275 (1985)
[7] MünggeGal: Menksoft Mongolian IME. http://www.menksoft.com/
[8] Ochir, Wang, G.F.: Corpus and Mongolian inputting methods. In: International Conference on Chinese Computing 2005, Singapore (2005)
[9] Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics (2002)
[10] Poppe, N.: Grammar of Written Mongolian. Otto Harrassowitz Verlag, Wiesbaden (1974)
[11] The Unicode Consortium: The Unicode Standard, Version 8.0.0 (2015). http://www.unicode.org/versions/Unicode8.0.0/


Acknowledgements

This work is partially supported by the Research Fund for the Doctoral Program of Higher Education of China (No. 20130121110040), the National High-Tech R&D Program of China (No. 2012BAH14F03), and the Special Fund Project of the Ministry of Education of China (Intelligent Conversion System from Simplified to Traditional Chinese Characters). We thank Dr. Yanlong He for kindly providing the Mongolian-Chinese test corpus for the statistical machine translation experiment.

Author information

Authors and Affiliations

Department of Cognitive Science, Xiamen University, Xiamen, China
Boli Wang, Xiaodong Shi & Yidong Chen
Collaborative Innovation Center for Peaceful Development of Cross-Strait Relations, Xiamen University, Xiamen, China
Xiaodong Shi
Fujian Province Key Laboratory for Brain-Inspired Computing, Xiamen University, Xiamen, China
Xiaodong Shi


Corresponding author

Correspondence to Xiaodong Shi.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Fudan University, Shanghai, China
Xuanjing Huang
Dalian University of Technology, Dalian, China
Hongfei Lin
Tsinghua University, Beijing, China
Zhiyuan Liu
Tsinghua University, Beijing, China
Yang Liu


About this paper

Cite this paper

Wang, B., Shi, X., Chen, Y. (2016). Coping with Problems of Unicoded Traditional Mongolian. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_11


DOI: https://doi.org/10.1007/978-3-319-47674-2_11
Published: 10 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science, Computer Science (R0)
