Automatic Expansion of Abbreviations in Chinese News Text

Fu, Guohong; Luke, Kang-Kwong; Zhou, GuoDong; Xu, Ruifeng

doi:10.1007/11880592_42

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4182)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

This paper presents an n-gram-based approach to Chinese abbreviation expansion. We distinguish reduced abbreviations from non-reduced abbreviations, which are created by elimination or generalization. For a reduced abbreviation, a mapping table is compiled that maps each short-word in it to a set of long-words, and a bigram-based Viterbi algorithm is then applied to decode an appropriate combination of long-words as its full-form. For a non-reduced abbreviation, a dictionary of non-reduced abbreviation/full-form pairs is used to generate expansion candidates, and a disambiguation technique based on bigram word segmentation is then employed to select a proper expansion. Evaluation on an abbreviation-expanded corpus built from the PKU corpus shows that the proposed system achieves, on average, a recall of 82.9% and a precision of 85.5% across different types of abbreviations in Chinese news text.
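
The decoding step for reduced abbreviations can be illustrated with a minimal sketch. It is not the authors' implementation: the mapping table, the bigram probabilities, the probability floor for unseen bigrams, and the function names `log_p` and `expand` are all hypothetical, and the abbreviation is assumed to be already split into short-words. The sketch only shows how a bigram Viterbi search might pick one long-word per short-word.

```python
import math

# Hypothetical mapping table: each short-word maps to candidate long-words.
MAPPING = {
    "北": ["北京", "北方"],
    "大": ["大学", "大会"],
}

# Hypothetical bigram probabilities P(word | prev); "<s>" marks the start.
BIGRAM = {
    ("<s>", "北京"): 0.6,
    ("<s>", "北方"): 0.4,
    ("北京", "大学"): 0.5,
    ("北京", "大会"): 0.1,
    ("北方", "大学"): 0.2,
    ("北方", "大会"): 0.1,
}

# Crude floor for unseen bigrams (an assumption, not a real smoothing scheme).
FLOOR = 1e-6


def log_p(prev, word):
    """Log-probability of `word` following `prev` under the toy bigram model."""
    return math.log(BIGRAM.get((prev, word), FLOOR))


def expand(short_words):
    """Return the highest-scoring long-word sequence for a list of short-words."""
    # best[w] = (log-probability, path) of the best path ending in long-word w
    best = {"<s>": (0.0, [])}
    for sw in short_words:
        # Short-words missing from the table pass through unchanged.
        candidates = MAPPING.get(sw, [sw])
        step = {}
        for cand in candidates:
            # Viterbi recursion: keep only the best predecessor for each candidate.
            score, path = max(
                ((s + log_p(prev, cand), p) for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            step[cand] = (score, path + [cand])
        best = step
    return max(best.values(), key=lambda t: t[0])[1]


print("".join(expand(["北", "大"])))  # prints 北京大学
```

With these toy numbers, the path 北京 followed by 大学 scores 0.6 × 0.5 = 0.3, which beats every other combination, so the abbreviation 北大 expands to 北京大学. Working in log space keeps the scores numerically stable for longer abbreviations.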

Author information

Authors and Affiliations

Department of Linguistics, The University of Hong Kong, Hong Kong
Guohong Fu & Kang-Kwong Luke
School of Computer Science and Technology, Suzhou University, 215006, China
GuoDong Zhou
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
Ruifeng Xu

Editor information

Editors and Affiliations

Department of Computer Science, National University of Singapore, 3 Science Drive 2, 117543, Singapore
Hwee Tou Ng
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, 119613, Singapore
Mun-Kew Leong
Department of Computer Science, School of Computing, National University of Singapore, 117543, Singapore
Min-Yen Kan
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, P.O. Box, 119613, Singapore
Donghong Ji

About this paper

Cite this paper

Fu, G., Luke, KK., Zhou, G., Xu, R. (2006). Automatic Expansion of Abbreviations in Chinese News Text. In: Ng, H.T., Leong, MK., Kan, MY., Ji, D. (eds) Information Retrieval Technology. AIRS 2006. Lecture Notes in Computer Science, vol 4182. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880592_42

DOI: https://doi.org/10.1007/11880592_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45780-0
Online ISBN: 978-3-540-46237-8
eBook Packages: Computer Science; Computer Science (R0)
