
Variant Chinese Domain Name Resolution

Published: 01 November 2008

Abstract

Many efforts have been made in recent years to lower the linguistic barriers that non-native English speakers face in accessing the Internet. Internet standard RFC 3490, referred to as IDNA (Internationalizing Domain Names in Applications), addresses access to IDNs (Internationalized Domain Names) written in a much broader range of scripts than the original ASCII. However, the use of character variants that have similar appearances and/or interpretations could create confusion. A variant IDL (Internationalized Domain Label), derived from an IDL by replacing some of its characters with their variants, should match the original IDL; likewise, a variant IDN should match the original IDN. RFC 3743, referred to as the JET (Joint Engineering Team) Guidelines, suggests that zone administrators model this equivalence as an atomic IDL package. When an IDL is registered, an IDL package is created that contains its variant IDLs, generated according to the zone-specific Language Variant Tables (LVTs). In addition to the registered IDL, the name holder can ask the domain registry to activate some of the variant IDLs, either free of charge or for an extra fee. The activated variant IDLs are stored in the zone files and thus become resolvable. However, this approach does not scale when a large number of variant IDLs must be activated.
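The scalability problem comes from the size of an IDL package. The Python sketch below assumes a package is the full set of per-character substitutions and uses a tiny hypothetical variant table (`TOY_LVT`, not a real TWNIC or CNNIC LVT); the function names are illustrative only. It enumerates a package with `itertools.product` and computes its size.

```python
from itertools import product
from typing import Dict, List, Set

# Hypothetical toy variant table (assumption): each character maps to the set
# of its variants, itself included. A real zone would use its registry's LVT.
TOY_LVT: Dict[str, Set[str]] = {
    "台": {"台", "臺"}, "臺": {"台", "臺"},
    "灣": {"灣", "湾"}, "湾": {"灣", "湾"},
    "網": {"網", "网"}, "网": {"網", "网"},
    "絡": {"絡", "络"}, "络": {"絡", "络"},
}


def variant_idls(idl: str) -> List[str]:
    """Enumerate an IDL package: every label obtained by replacing each
    character with one of its variants (characters absent from the table stay)."""
    choices = [sorted(TOY_LVT.get(ch, {ch})) for ch in idl]
    return ["".join(combo) for combo in product(*choices)]


def package_size(idl: str) -> int:
    """Package size is the product of the per-character variant counts."""
    size = 1
    for ch in idl:
        size *= len(TOY_LVT.get(ch, {ch}))
    return size
```

Here `package_size("台灣網絡")` is 2 × 2 × 2 × 2 = 16. In general, a label of n characters with k variants each has k^n variant IDLs, so storing every activated variant in the zone file grows multiplicatively with label length.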
In this article, the authors present a resolution protocol that resolves variant IDLs into the registered IDL, specifically for Han character variants. Two Han characters are said to be variants of each other if they have the same meaning and the same pronunciation; Han character variants also usually look similar. It is not uncommon for a Chinese IDL to have a large number of variant IDLs. The proposed protocol introduces a new RR (resource record) type, denoted VarIdx RR, that associates a variant expression, which enumerates a set of variant IDLs, with the registered IDL. The label of a VarIdx RR, called the variant index, is assigned by an indexing function designed to give the same value to every variant IDL enumerated by the variant expression. When one of the variant IDLs is accessed, an Internet application can compute its variant index, look up the VarIdx RRs, and resolve the variant IDL into the registered IDL.
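The abstract does not give the indexing function or the syntax of a variant expression, so this Python sketch fills both in with assumptions. Characters are folded to a canonical member of a hypothetical variant class (`VARIANT_CLASSES`), and the folded label is hashed into an ASCII label (`variant_index`). A variant expression is modeled as one allowed-character set per position, in the spirit of a POSIX bracket expression. `ZONE`, `add_varidx`, and `resolve` simulate publishing and querying VarIdx records in memory. They are not a real DNS API and not the authors' implementation.

```python
import hashlib
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical variant classes (assumption): each set groups Han characters
# treated as variants of one another. A real zone would derive them from its LVT.
VARIANT_CLASSES: List[Set[str]] = [
    {"台", "臺"},
    {"灣", "湾"},
    {"網", "网"},
    {"絡", "络"},
]

# Canonical representative of each class: the smallest code point (arbitrary choice).
CANONICAL: Dict[str, str] = {ch: min(cls) for cls in VARIANT_CLASSES for ch in cls}


def fold(label: str) -> str:
    """Han character folding: map every character to its class representative."""
    return "".join(CANONICAL.get(ch, ch) for ch in label)


def variant_index(label: str) -> str:
    """Hypothetical indexing function: all variants of a label fold to the same
    string, so hashing the folded form gives them the same ASCII index label."""
    digest = hashlib.sha1(fold(label).encode("utf-8")).hexdigest()
    return "vi-" + digest[:16]


# Assumed variant-expression form: one set of allowed characters per position.
Expression = List[Set[str]]


def matches(expression: Expression, label: str) -> bool:
    """True if `label` is one of the variant IDLs the expression enumerates."""
    return len(label) == len(expression) and all(
        ch in allowed for ch, allowed in zip(label, expression)
    )


# In-memory stand-in for the zone: index label -> [(expression, registered IDL)].
ZONE: Dict[str, List[Tuple[Expression, str]]] = {}


def add_varidx(registered: str, expression: Expression) -> None:
    """Publish one simulated VarIdx record for the registered IDL."""
    ZONE.setdefault(variant_index(registered), []).append((expression, registered))


def resolve(label: str) -> Optional[str]:
    """Compute the variant index, look up the records, return the registered IDL."""
    for expression, registered in ZONE.get(variant_index(label), []):
        if matches(expression, label):  # filters out unrelated index collisions
            return registered
    return None
```

After `add_varidx("台灣", [{"台", "臺"}, {"灣", "湾"}])`, `resolve("臺湾")` returns `"台灣"` and `resolve("台北")` returns `None`. A single record covers all four variants in this toy table. Narrowing a set in the expression, for example `[{"台", "臺"}, {"灣"}]`, would activate only a subset. Hashing keeps the index within DNS label length limits. That is a choice made for this sketch, not necessarily the paper's.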
The authors examine two sets of Chinese IDLs registered with TWNIC and CNNIC, respectively. The results show that, for a registered Chinese IDL, a very small number of VarIdx RRs (usually one or two) suffice to activate all of its variant IDLs. The authors also present a Web redirection service that employs the proposed resolution protocol to redirect a URL addressed by a variant IDN to the URL addressed by the registered IDN. The experimental results show that the proposed protocol successfully resolves variant IDNs into the registered IDNs.
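The abstract does not describe how the redirection service is built. This Python sketch assumes it answers a request for a variant IDN with an HTTP 301 response whose Location carries the registered IDN in ASCII-compatible (Punycode) form. It also assumes the variant is the leftmost label of the host. `redirect_for` and its `resolve_label` callback are hypothetical: any function that maps a variant label to its registered label, or returns `None`, can be passed in.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def redirect_for(
    url: str, resolve_label: Callable[[str], Optional[str]]
) -> Optional[Tuple[int, str]]:
    """Return (status, Location) redirecting a variant-IDN URL to the registered
    IDN, or None when no redirect is needed. Assumes the variant is the leftmost
    label of the host; user info in the URL is not preserved in this sketch."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        return None
    first, dot, rest = host.partition(".")
    registered = resolve_label(first)
    if registered is None or registered == first:
        return None  # not an activated variant, or already the registered IDL
    # A URI (and thus a Location value) is ASCII-only, so convert the host with
    # Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490).
    ascii_host = (registered + dot + rest).encode("idna").decode("ascii")
    if parts.port is not None:
        ascii_host += ":" + str(parts.port)
    location = urlunsplit(
        (parts.scheme, ascii_host, parts.path, parts.query, parts.fragment)
    )
    return 301, location
```

With a stub resolver `{"臺湾": "台灣"}.get`, a request for `http://臺湾.tw/index.html?q=1` yields status 301 and a Location whose host is the Punycode form of `台灣.tw`, with the path and query unchanged. Whether a permanent (301) or temporary redirect is more appropriate is itself an assumption.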

References

[1] China Internet Network Information Center (CNNIC). Retrieved from http://www.cnnic.net.cn.
[2] China Internet Network Information Center (CNNIC). 2005. IANA IDN Languages Table: CN Chinese Character Table.
[3] Costello, A. M. 2003. Punycode: A bootstring encoding of Unicode for internationalized domain names in applications (IDNA). RFC 3492.
[4] Crawford, M. 1999. Non-terminal DNS name redirection. RFC 2672.
[5] Danzig, P. B., Obraczka, K., and Kumar, A. 1992. An analysis of wide-area name server traffic: A study of the domain name system. In Proceedings of the Annual Conference of the ACM SIGCOMM on Communications Architectures and Protocols (SIGCOMM'92), 281–292.
[6] Faltstrom, P., Hoffman, P., and Costello, A. M. 2003. Internationalizing domain names in applications (IDNA). RFC 3490.
[7] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., and Berners-Lee, T. 1997. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2068.
[8] Hanyu Da Zidian Editorial Committee. 1986. Hanyu Da Zidian (Great Dictionary of Chinese Characters). Sichuan Cishu Publishing, Chengdu, China. ISBN 780-543-001-2.
[9] Hoffman, P. and Blanchet, M. 2002. Preparation of internationalized strings (stringprep). RFC 3454.
[10] Hoffman, P. and Blanchet, M. 2003. Nameprep: A stringprep profile for internationalized domain names. RFC 3491.
[11] IEEE Standard 1003.2-1992. IEEE standard for information technology -- Portable Operating System Interface (POSIX) - Part 2: Shell and utilities, vol. 1.
[12] IETF Internationalized Domain Names Working Group. Retrieved from http://www.ietf.org/html.charters/idn-charter.html.
[13] Klensin, J. 2004. A search-based access model for the DNS. Internet Draft.
[14] Konishi, K., Huang, K., Qian, H., and Ko, Y. 2004. Joint engineering team (JET) guidelines for internationalized domain names (IDN) registration and administration for Chinese, Japanese, and Korean. RFC 3743.
[15] Lampson, B. W. 1985. Designing a global name service. In Proceedings of the 4th Annual ACM Symposium on Principles of Distributed Computing (PODC'85), 1–10.
[16] Lee, X. D., Hsu, N. W., Chen, E., and Sun, G. N. 2001. Traditional and simplified Chinese conversion. Internet Draft.
[17] Lin, J. W., Ho, J. M., Tseng, L. M., and Lai, F. 2006. IDN server proxy architecture for internationalized domain name resolution and experiences with providing Web services. ACM Trans. Internet Technol. 6, 1.
[18] Mockapetris, P. 1987. Domain names: Concepts and facilities (RFC 1034) and Domain names: Implementation and specification (RFC 1035). STD 13.
[19] Secunia Stay Secure. 2005. Retrieved from http://secunia.com/multiple_browsers_idn_spoofing_test/.
[20] Seng, J., Yoney, A. Y., Huang, K., and Kim, K. 2001. Han ideograph (CJK) for internationalized domain names. Internet Draft.
[21] State Council of the People's Republic of China. 1986. A complete set of simplified Chinese characters.
[22] The Unicode Consortium. 2002. Unihan database, version 3.2. Retrieved from ftp://ftp.unicode.org/Public/UNIDATA/Unihan.txt.
[23] Tseng, L. M., Ho, J. M., Qian, H., and Huang, K. 2001. Internationalized domain names and unique identifiers/names. Internet Draft.
[24] TWNIC. 2005. IANA IDN Language Table: TW Chinese Character Table.
[25] Zhang, Y., Chen, T., et al. 1989. The KangXi dictionary. ISBN 962-231-006-0.

Cited By

  • (2012) A method for DNS names identical resolution. 7th International Conference on Communications and Networking in China, pp. 30–34. DOI: 10.1109/ChinaCom.2012.6417443. Online publication date: Aug-2012
  • (2011) An auxiliary unicode Han character lookup service based on glyph shape similarity. 2011 11th International Symposium on Communications & Information Technologies (ISCIT), pp. 489–492. DOI: 10.1109/ISCIT.2011.6092155. Online publication date: Oct-2011

Information

Published In

ACM Transactions on Asian Language Information Processing, Volume 7, Issue 4
November 2008
81 pages
ISSN: 1530-0226
EISSN: 1558-3430
DOI: 10.1145/1450295
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in TALIP Volume 7, Issue 4


Author Tags

  1. Han character folding
  2. Han character variant
  3. IDN spoof
  4. conversion between traditional Chinese and simplified Chinese
  5. internationalized domain name
  6. localization

Qualifiers

  • Research-article
  • Research
  • Refereed


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 17 Oct 2024
