L2/06-041
Unified Tai Script for Unicode
Ngo Trung Viet
Institute of Information Technology
Vietnamese Academy of Science and Technology
&
Jim Brase
SIL International
January 30, 2006
Referred to as Viet Thai, row AA in the latest Roadmap
See also: http://www.evertype.com/standards/tai/viet-thai-comparisons.pdf
Sociolinguistic Background
The Tai script is used by four Tai languages spoken primarily in northwestern Vietnam,
northern Laos, and central Thailand—Tai Daeng (also Red Tai or Tai Rouge), Tai Dam (Black
Tai or Tai Noir), Tai Dón (White Tai or Tai Blanc), and Thai Song (Lao Song or Lao Song
Dam). The Thai Song of Thailand are geographically removed from, but linguistically related
to the Tai people of Vietnam and Laos. There are also populations in Australia, China, France,
and the United States. The script is related to other Thai scripts used throughout Southeast
Asia.
The Ethnologue estimates the total population of the four languages, across all countries, at 1.5
million. (Tai Daeng 165,000, Tai Dam 764,000, Tai Dón 490,000, Thai Song 32,000.)
The degree of usage of the script varies from community to community. It has been widely
used by the Tai Dam community in the United States. There is a desire to introduce it into
formal education in Vietnam (Cam Trong 2005). On the other hand, it is not known whether it
is in current use by the Thai Song, and the only dated document available from Laos is a 30-year-old Tai Daeng manuscript.
The Traditional Script vs. the Unified Alphabet
Anyone attempting to establish a standard for writing the Tai script must cope with the great
diversity between communities in the traditional form of the script. Cam Trong (2005) lists
eight dialects of the script for Vietnam alone. Other dialects exist in Laos and Thailand.
The Vietnamese government has attempted to establish a standard for the Tai script, called the Thống Nhất, or Unified Alphabet, in an anonymous 1961 paper (Các Mẫu Tự Thái Ở Miền Tây Bắc Việt Nam). This standard will be referred to as the Unified Alphabet in the remainder of this paper. The most recent revision of the Unified Alphabet is found in Cam Trong (2005).
Not everyone has had the opportunity to learn the Unified Alphabet. This includes the elderly, who learned to read and write before the Unified Alphabet was introduced, and Tai communities outside of Vietnam, including the Tai Daeng of Laos, the Thai Song of Thailand, the Tai Dam communities in Laos, the United States, and France, and many smaller communities. Thus, it is my desire to include the traditional forms of the script as well as the Unified Alphabet in this proposal.
The Tai Writing System
Basic Features
The Tai scripts share many features common to most Thai alphabets:
• They are written left to right. (One variation of the script, Tai Do, is written vertically, but
is beyond the scope of this study.)
• There is a double set of initial consonants, one for high tone class and one for low tone
class.
• In the traditional form, vowel marks can be placed before, after, above, or below the
syllable’s initial consonant, depending on the vowel. Vowel digraphs are common. In the
Unified Alphabet, the diacritic vowels have been replaced with spacing vowels.
Tone Classes and Tone Marks
In the Tai scripts each consonant has two forms. The high form of the initial consonant
indicates that the syllable uses tone 1, 2, or 3. The low form of the initial consonant indicates
that the syllable uses tone 4, 5, or 6. (Tai Daeng has only five tones, but the practice is
similar.)
Traditionally, these scripts did not use any further marking for tone, and the reader had to
determine the tone from the context. In recent times, however, several groups have introduced
tone marks into Tai writing. The Tai Heritage font (Tai Dam) borrowed tone marks from Lao,
and these are now widely used by the Tai Dam community in the U.S. The Song Petburi font
(Thai Song) includes Thai style tone marks, which are identical to the Lao. The Unified Tai
Alphabet invented a new set of spacing tone marks which are placed at the end of the syllable.
Aam and Aanu (1974) present a unique set of diacritic tone marks for Tai Daeng.
When combined with the consonant class, two tone marks are sufficient to unambiguously
mark the tone. Thus, some authors mark tone in Tai Dam as follows:
                        no mark   first mark   second mark
high class consonant    tone 1    tone 2       tone 3
low class consonant     tone 4    tone 5       tone 6
Note, however, that checked syllables (those ending /p/, /t/, /k/, or /ʔ/) are restricted to tones 2
and 5, and that no marking other than the consonant class is necessary for those syllables.
The practice for the other languages would be similar to that for Tai Dam.
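The marking scheme above amounts to a small lookup from consonant class and tone mark to tone number. The sketch below is illustrative only: it assumes the two tone marks can be labelled by their column in the table (the hypothetical constants FIRST_MARK and SECOND_MARK), and it applies the checked-syllable restriction by ignoring any mark on a checked syllable.

```python
from enum import Enum


class ConsonantClass(Enum):
    HIGH = "high"   # initial consonant written in its high-class form
    LOW = "low"     # initial consonant written in its low-class form


# Hypothetical labels for the marks, in the column order of the tone table:
# no mark, then the two tone marks.
NO_MARK, FIRST_MARK, SECOND_MARK = 0, 1, 2


def tai_dam_tone(cls: ConsonantClass, mark: int = NO_MARK, checked: bool = False) -> int:
    """Return the Tai Dam tone number (1-6) implied by consonant class and tone mark.

    Checked syllables (ending in /p/, /t/, /k/ or /ʔ/) only take tones 2 and 5,
    so for them the consonant class alone decides and any mark is ignored.
    """
    if mark not in (NO_MARK, FIRST_MARK, SECOND_MARK):
        raise ValueError(f"unknown tone mark: {mark!r}")
    base = 1 if cls is ConsonantClass.HIGH else 4
    if checked:
        return base + 1
    return base + mark
```

For example, tai_dam_tone(ConsonantClass.LOW, SECOND_MARK) gives tone 6, and any checked syllable with a low-class initial gives tone 5 whatever mark it carries.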
Final Consonants
In written form, the high-tone class symbols for ‘b’ (b)¹ and ‘d’ (d) are used for syllable-final /p/ and /t/, as is the practice in all Thai scripts. This usage should not mislead one into thinking that oral /b/ and /d/ occur syllable-finally.
The high-tone class symbol for ‘k’ (k) is used for both final /k/ and final /ʔ/.
The low-tone class symbols are used for writing final /j/ (J) and the final nasals, /m/ (M), /n/
(N), and /ŋ/ (). Low-tone /v/ (V) is used for final /w/.
There are a number of exceptions to the above rules, in the form of Vowel + Final Consonant ligatures. These vary from region to region, but the ones with the broadest usage are the ligatures for /-aj/ (ꭣ◌), /-am/ (◌ꭥ), /-an/ (◌ꭤ), and /-əw/ (ꭢ◌). The ligature /-at/ ( ) is limited to some dialects of Tai Dón.
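The regular final-consonant spellings described above can be summarized as a table from written letter to spoken final. A minimal sketch, assuming letters are identified by shorthand forms of the proposed names (e.g. "BO HIGH" for UNIFIED TAI LETTER BO HIGH, "YO LOW" for the low-class letter used for final /j/); the Vowel + Final Consonant ligatures are deliberately left out, since their forms vary by region.

```python
# Shorthand keys stand for proposed names, e.g. "BO HIGH" = UNIFIED TAI LETTER BO HIGH.
FINAL_VALUES: dict[str, tuple[str, ...]] = {
    "BO HIGH": ("p",),
    "DO HIGH": ("t",),
    "KO HIGH": ("k", "ʔ"),  # one letter covers both finals; spelling does not distinguish them
    "YO LOW": ("j",),
    "MO LOW": ("m",),
    "NO LOW": ("n",),
    "NGO LOW": ("ŋ",),
    "VO LOW": ("w",),
}

# Finals that make a syllable checked (restricted to tones 2 and 5).
CHECKED_FINALS = frozenset({"p", "t", "k", "ʔ"})


def final_values(letter: str) -> tuple[str, ...]:
    """Possible spoken finals for a letter written in syllable-final position."""
    try:
        return FINAL_VALUES[letter]
    except KeyError:
        raise ValueError(f"{letter!r} is not a regular final consonant") from None


def closes_checked_syllable(letter: str) -> bool:
    """True if every reading of the final letter is a stop, i.e. the syllable is checked."""
    return all(value in CHECKED_FINALS for value in final_values(letter))
```

Keeping the ambiguous KO HIGH as a tuple of readings, rather than a single value, reflects that the script itself does not distinguish final /k/ from final /ʔ/.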
Word Spacing and Baseline
Traditional Tai writing does not use space between words. Thus, a line breaking algorithm will
have to be developed to accommodate the oldest forms of the script.
In the last 20 years the Tai Dam community in the U.S. has adopted the practice of using word
spacing, although the spaces are usually narrower than for Latin alphabets. A trilingual
pamphlet published by the Hanoi National University in 1999, Giới Thiệu Chương Trình Thái
Học Việt Nam, shows spacing between words in the Tai script. (See Figure 1 in Script
Samples, below.)
The Tai Daeng sample (Figure 2, Script Samples) has clear spacing between words. This is a
surprise, as the manuscript appears to be rather old.
Tai scripts usually use a bottom baseline. But the Tai Daeng manuscript in Figure 2, written on
lined paper, again surprises us with a center baseline.
Sort Order
The Tai scripts do not have an established standard for sorting. Sequences have sometimes been borrowed from neighboring languages. Baccam et al. (1989) use an order borrowed from Lao. On the other hand, Cam Trong (2005) preferred an order based on the Vietnamese alphabet (the Quốc Ngữ). These will be discussed further below.
Key Issues
Is it sufficient to encode only the Unified Alphabet?
This is the most crucial question to be answered. My conclusion is “No, it is not sufficient,”
for the following reasons.
1. As noted above, not everyone can read the Unified Alphabet. Some communities will try to continue using their traditional form of the script.
2. One possible solution is to encode only the Unified Alphabet, and then to make language-specific fonts for each of the languages which reflect their traditional form. Thus, a Tai Dón person would use a Tai Dón font, and a Tai Dam person would use a Tai Dam font, but they would have the same encoding. However, this would result in an encoding that is interpreted by the font, which defeats the purpose of Unicode.
Therefore, I have concluded that it is necessary to encode the Unified Alphabet plus any characters that are required by the traditional forms of the script.
¹ The samples of the symbols shown under Final Consonants, except for the /-at/ ligature, are from the Tai Heritage font (Tai Dam). Forms may vary across script dialects.
Should Tai Daeng be included?
Only about 50% of the Tai Daeng character set corresponds to the character sets of the other languages. Should it be included as part of the Unified Tai Script, or should it be encoded as its own script?
Although it has many unique characters, the basic form and mechanics of the script are similar to those of the other languages. Furthermore, encoding Tai Daeng as a separate script would require duplicating the characters it shares with the other languages. Consequently, Tai Daeng should be considered part of the Unified Tai Script.
Should Thai Song be included?
The contrast between the styles of Thai Song writing and Tai Dam writing is quite stark. Yet when the stylistic differences are set aside, the underlying forms of many of the characters are similar. Therefore, at this time Thai Song should be considered part of the Unified Tai Script.
Unfortunately, only a limited amount of data was available for evaluation for this proposal.
Some new data has recently become available, but has not yet been analyzed. It is possible that
additional analysis will lead to different conclusions about the Thai Song writing.
Character Order and Sort Order
As noted above under The Tai Writing System, the Tai script does not have an established sort
order. The two best options are an order derived from Lao or one derived from the Quốc Ngữ.
The advantage of an order based on the Quốc Ngữ is that the majority of the users of the Tai script live in an area where Vietnamese is the dominant language. Communication between the Tai dialects and Vietnamese would thus be enhanced. This would help to encourage the teaching of the Tai script in the schools, an important consideration.
The advantage of a Lao-based order is that Tai and Lao scripts are from the same family.
This matter will require additional discussion. The order of the currently suggested character chart is based on Lao.
Bibliography
____. “Các Mẫu Tự Thái Ở Miền Tây Bắc Việt Nam.” Internet:
http://www.evertype.com/standards/tai/viet-thai-samples.pdf
____. Conflict and Relocation. http://www.seasite.niu.edu/tai/TaiDam/article/a4.htm.
____. 1999. Giới Thiệu Chương Trình Thái Học Việt Nam. (LJ kN HN Vd nM
mJ su Pi N; Introduction for Vietnam Programme of Thai Studies). Hanoi National
University, Hanoi.
____. Khhãm Kháo Đi Chảu Dê-su Seo Lũng Ók Mác Tẻm.
____. Sip Hoc Chau Thai. (Journal de l'Association A.R.T.F.) Orly, France. Cited by
http://www.evertype.com/standards/tai/viet-thai-samples.pdf.
____. Song Petburi font. Internet: http://www.seasite.niu.edu/tai/TaiDam/index.htm.
____. Untitled, undated manuscript in Tai Daeng.
Aam and Aanu. 1974. “bɛ:p hiə:n4 a:n2 lɛʔ khiən1 naŋ1 sɨ:1 taj4 dɛŋ1” (“Learning to Read and Write Tai Daeng”). Vientiane.
Baccam Don, Baccam Faluang, Baccam Hung, Dorothy Fippinger. 1989. Tai Dam – English,
English – Tai Dam Vocabulary Book. Summer Institute of Linguistics.
Cam Trong. 2005. “Thai Scripts in Vietnam,” in Workshop on the Preservation and
Digitization of Tai Scripts. Hanoi, Vietnam.
Điêu Chính Nhìm and Jean Donaldson. 1970. Păp San Khhãm Pák Tãy-Keo-Eng (Ngũ-Vụng
Thái-Việt-Anh, Tai-Vietnamese-English Vocabulary). Saigon.
Donaldson, Jean and Jerold A. Edmondson. 1997. “A Preliminary Examination of Tai Tac,” in
Comparative Kadai, The Tai Branch, edited by Jerold A. Edmondson and David B. Solnit.
The Summer Institute of Linguistics and The University of Texas at Arlington. pp. 235-266.
Everson, Michael. Proposal for the Universal Character Set. Internet:
http://www.evertype.com/standards/tai/viet-thai-comparisons.pdf
Ferlus, Michel. 1988. “Langues et écritures en asie du sud-est,” The 21st International
Conference on Sino-Tibetan Languages and Linguistics, University of Lund, Sweden.
Ferlus, Michel. 1999. “Les dialectes et les écritures des Tai (Thai) du Nghệ An (Vietnam),”
Treizièmes journées de linguistique d'asie orientale. Centre de recherches linguistiques sur
l'asie orientale (EHESS-CNRS), 105 bd raspail, 75006 Paris.
Ferlus, Michel. 2003. “L’intérêt linguistique du Hưng Hóa Ký Lược de Phạm Thận Duật,” Dix-septièmes Journées de Linguistique d’Asie Orientale, Centre de Recherches Linguistiques sur l’Asie Orientale (EHESS-CNRS).
Finot, Louis. 1917. “Recherches sur la littérature laotienne,” Bulletin de l’Ecole Française d’Extrême-Orient. 17(5).
Fippinger, Jay and Dorothy. 1970. “Black Tai Phonemes with reference to White Tai.”
Anthropological Linguistics 12.3: 83-97.
Gedney, William J. 1989. “A Comparative Sketch of White, Black and Red Tai.” Selected Papers on Comparative Tai Studies, Michigan Papers on South and Southeast Asia no. 29, Center for South and Southeast Asia Studies, The University of Michigan.
Lafont, Pierre-Bernard. 1962. “Les écritures ’Tay du Laos,” Bulletin de l’Ecole Française d’Extrême-Orient.
Lo Văn Mươi (L v MJ). 1966. Ép Sư ˈTáy Piên Peng. (b s N , Learning
the Revised Tai Alphabet.)
Marcus, Russell. 1970. English-Lao, Lao-English Dictionary. Charles E. Tuttle Company.
Martini, François. 1954. “Romanisation des parlers ‘Tay du Nord Vietnam.” Bulletin de
l’Ecole Française d’extrême-orient.
Minot, Lieutenant. 1933. "Dictionnaire Français - Thay Blanc." Mường Té.
Minot, Georges. 1940. "Dictionnaire Tẵy Blanc-Français." Bulletin de l'Ecole Française d'Extrême-Orient, t. XL.
Ngo Trung Viet. 2005. “ICT for Thai Ethnic Culture and Education Multilingual Projects,” in
Workshop on the Preservation and Digitization of Tai Scripts. Hanoi, Vietnam.
Phan Anh Dung and Ngo Trung Viet. 2005. “Technical Design, Software for Inputting and
Displaying Vietnam Thai Scripts,” in Workshop on the Preservation and Digitization of Tai
Scripts. Hanoi, Vietnam.
Robert, R. 1941. Notes sur les Tay Dèng de Lang Chánh (Thanh-Hoá, Annam). Institut Indochinois pour l'Etude de L'Homme, mémoire n° 1. Hanoi: Imprimerie d'Extrême-Orient.
SIL. Tai Heritage font. Published by SIL International. Internet:
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=SILTD_home
Whitehouse, Ruth. 1975. Phonemic Write-up, Lao Song Language. Unpublished paper.
ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS
FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646
Please fill all the sections A, B and C below.
Please read Principles and Procedures Document (P & P) from http://www.dkuug.dk/JTC1/SC2/WG2/docs/principles.html for
guidelines and details before filling this form.
Please ensure you are using the latest Form from http://www.dkuug.dk/JTC1/SC2/WG2/docs/summaryform.html .
See also http://www.dkuug.dk/JTC1/SC2/WG2/docs/roadmaps.html for latest Roadmaps.
A. Administrative
1. Title:
Unified Tai (referred to as Viet Thai, row AA in the latest Roadmap)
see also: http://www.evertype.com/standards/tai/viet-thai-comparisons.pdf
2. Requester's name:
Ngo Trung Viet, Institute of Information Technology, VAST
Jim Brase, SIL International
3. Requester type (Member body/Liaison/Individual contribution):
Individual contribution
4. Submission date:
February 1, 2006
5. Requester's reference (if applicable):
6. Choose one of the following:
This is a complete proposal:
yes
(or) More information will be provided later:
B. Technical – General
1. Choose one of the following:
a. This proposal is for a new script (set of characters):
yes
Proposed name of script:
Unified Tai
b. The proposal is for addition of character(s) to an existing block:
no
Name of the existing block:
2. Number of characters in proposal:
124
3. Proposed category (select one from below - see section 2.2 of P&P document):
A-Contemporary
X B.1-Specialized (small collection)
B.2-Specialized (large collection)
C-Major extinct
D-Attested extinct
E-Minor extinct
F-Archaic Hieroglyphic or Ideographic
G-Obscure or questionable usage symbols
4. Proposed Level of Implementation (1, 2 or 3) (see Annex K in P&P document):
3
Is a rationale provided for the choice?
If Yes, reference:
5. Is a repertoire including character names provided?
yes
a. If YES, are the names in accordance with the “character naming guidelines” in Annex L of P&P document?
yes
b. Are the character shapes attached in a legible form suitable for review?
Shapes currently provided are a mixture of styles. We are waiting for a type designer to provide a uniform font.
6. Who will provide the appropriate computerized font (ordered preference: True Type, or PostScript format) for
publishing the standard?
SIL International
If available now, identify source(s) for the font (include address, e-mail, ftp-site, etc.) and indicate the tools
used:
7. References:
a. Are references (to other character sets, dictionaries, descriptive texts etc.) provided?
yes
b. Are published examples of use (such as samples from newspapers, magazines, or other sources)
of proposed characters attached?
8. Special encoding issues:
Does the proposal address other aspects of character data processing (if applicable) such as input,
presentation, sorting, searching, indexing, transliteration etc. (if yes please enclose information)?
9. Additional Information:
Submitters are invited to provide any additional information about Properties of the proposed Character(s) or Script
that will assist in correct understanding of and correct linguistic processing of the proposed character(s) or script.
Examples of such properties are: Casing information, Numeric information, Currency information, Display behaviour
information such as line breaks, widths etc., Combining behaviour, Spacing behaviour, Directional behaviour, Default
Collation behaviour, relevance in Mark Up contexts, Compatibility equivalence and other Unicode normalization
related information. See the Unicode standard at http://www.unicode.org for such information on other scripts. Also
see http://www.unicode.org/Public/UNIDATA/UCD.html and associated Unicode Technical Reports for information
needed for consideration by the Unicode Technical Committee for inclusion in the Unicode Standard.
C. Technical - Justification
1. Has this proposal for addition of character(s) been submitted before?
no
If YES explain
2. Has contact been made to members of the user community (for example: National Body, user groups of the script or characters, other experts, etc.)?
yes
If YES, with whom?
Private individuals from Tai Dam community in United States. Contact with community in Vietnam through Dr. Ngo Trung Viet
If YES, available relevant documents:
3. Information on the user community for the proposed characters (for example: size, demographics, information technology use, or publishing use) is included?
yes
Reference:
4. The context of use for the proposed characters (type of use; common or rare)
common
Reference:
5. Are the proposed characters in current use by the user community?
yes
If YES, where? Reference:
Vietnam and United States. Uncertain about Laos and Thailand
6. After giving due considerations to the principles in the P&P document must the proposed characters be entirely in the BMP?
yes
If YES, is a rationale provided?
If YES, reference:
7. Should the proposed characters be kept together in a contiguous range (rather than being scattered)?
yes
8. Can any of the proposed characters be considered a presentation form of an existing character or character sequence?
no
If YES, is a rationale for its inclusion provided?
If YES, reference:
9. Can any of the proposed characters be encoded using a composed character sequence of either existing characters or other proposed characters?
yes--ligatures
If YES, is a rationale for its inclusion provided?
yes
If YES, reference:
10. Can any of the proposed character(s) be considered to be similar (in appearance or function) to an existing character?
no
If YES, is a rationale for its inclusion provided?
If YES, reference:
11. Does the proposal include use of combining characters and/or use of composite sequences?
yes
If YES, is a rationale for such use provided?
Combining characters are an inherent part of the writing system
If YES, reference:
Is a list of composite sequences and their corresponding glyph images (graphic symbols) provided?
none
If YES, reference:
12. Does the proposal contain characters with any special properties such as control function or similar semantics?
no
If YES, describe in detail (include attachment if necessary)
13. Does the proposal contain any Ideographic compatibility character(s)?
no
If YES, is the equivalent corresponding unified ideographic character(s) identified?
If YES, reference:
[Code chart: UNIFIED TAI, columns xx0–xx7, rows 0–F]
Names Table
Consonants & Symbols—Unified Tai Alphabet
xx00 UNIFIED TAI LETTER KO HIGH
xx01 UNIFIED TAI LETTER KO LOW
xx02 UNIFIED TAI LETTER KHO HIGH
xx03 UNIFIED TAI LETTER KHO LOW
xx04 UNIFIED TAI LETTER KHHO HIGH
xx05 UNIFIED TAI LETTER KHHO LOW
xx06 UNIFIED TAI LETTER GO HIGH
xx07 UNIFIED TAI LETTER GO LOW
xx08 UNIFIED TAI LETTER NGO HIGH
xx09 UNIFIED TAI LETTER NGO LOW
xx0A UNIFIED TAI LETTER CO HIGH
xx0B UNIFIED TAI LETTER CO LOW
xx0C UNIFIED TAI LETTER CHO HIGH
xx0D UNIFIED TAI LETTER CHO LOW
xx0E UNIFIED TAI LETTER SO HIGH
xx0F UNIFIED TAI LETTER SO LOW
xx10 UNIFIED TAI LETTER NHO HIGH
xx11 UNIFIED TAI LETTER NHO LOW
xx12 UNIFIED TAI LETTER DO HIGH
xx13 UNIFIED TAI LETTER DO LOW
xx14 UNIFIED TAI LETTER TO HIGH
xx15 UNIFIED TAI LETTER TO LOW
xx16 UNIFIED TAI LETTER THO HIGH
xx17 UNIFIED TAI LETTER THO LOW
xx18 UNIFIED TAI LETTER NO HIGH
xx19 UNIFIED TAI LETTER NO LOW
xx1A UNIFIED TAI LETTER BO HIGH
xx1B UNIFIED TAI LETTER BO LOW
xx1C UNIFIED TAI LETTER PO HIGH
xx1D UNIFIED TAI LETTER PO LOW
xx1E UNIFIED TAI LETTER PHO HIGH
xx1F UNIFIED TAI LETTER PHO LOW
xx20 UNIFIED TAI LETTER FO HIGH
xx21 UNIFIED TAI LETTER FO LOW
xx22 UNIFIED TAI LETTER MO HIGH
xx23 UNIFIED TAI LETTER MO LOW
xx24 UNIFIED TAI LETTER YO HIGH
xx25 UNIFIED TAI LETTER YO LOW
xx26 UNIFIED TAI LETTER RO HIGH
xx27 UNIFIED TAI LETTER RO LOW
xx28 UNIFIED TAI LETTER LO HIGH
xx29 UNIFIED TAI LETTER LO LOW
xx2A UNIFIED TAI LETTER VO HIGH
xx2B UNIFIED TAI LETTER VO LOW
xx2C UNIFIED TAI LETTER HO HIGH
xx2D UNIFIED TAI LETTER HO LOW
xx2E UNIFIED TAI LETTER O HIGH
xx2F UNIFIED TAI LETTER O LOW
xx30 UNIFIED TAI SYMBOL KON (Person)
xx31 UNIFIED TAI SYMBOL NEUNG (One)
xx32 UNIFIED TAI SYMBOL SAM (Repetition)
Consonants & Symbols—Additions used by two or more languages
xx33 UNIFIED TAI LETTER ALTERNATE KO LOW
xx34 UNIFIED TAI LETTER ALTERNATE CO LOW
xx35 UNIFIED TAI LETTER ALTERNATE SO HIGH
xx36 UNIFIED TAI LETTER ALTERNATE NHO LOW
xx37 UNIFIED TAI LETTER TAI DAENG NHO LOW
xx38 UNIFIED TAI LETTER ALTERNATE THO LOW
xx39 UNIFIED TAI LETTER ALTERNATE FO LOW
xx3A UNIFIED TAI LETTER ALTERNATE YO HIGH
xx3B UNIFIED TAI LETTER ALTERNATE YO LOW
xx3C UNIFIED TAI LETTER ALTERNATE LO LOW
Consonants & Symbols—Tai Daeng additions
xx3D UNIFIED TAI LETTER TAI DAENG KO LOW
xx3E UNIFIED TAI LETTER TAI DAENG KO ALTERNATE
xx3F UNIFIED TAI LIGATURE TAI DAENG KN
xx40 UNIFIED TAI LIGATURE TAI DAENG KW
xx41 UNIFIED TAI LETTER TAI DAENG KHO HIGH
xx42 UNIFIED TAI LETTER TAI DAENG NGO HIGH
xx43 UNIFIED TAI LETTER TAI DAENG NGO LOW
xx44 UNIFIED TAI LETTER TAI DAENG SO LOW
xx45 UNIFIED TAI LETTER TAI DAENG NHO HIGH
xx46 UNIFIED TAI LETTER TAI DAENG DO HIGH
xx47 UNIFIED TAI LETTER TAI DAENG PO LOW
xx48 UNIFIED TAI LETTER TAI DAENG FO ALTERNATE
xx49 UNIFIED TAI LETTER TAI DAENG YO
xx4A UNIFIED TAI LIGATURE TAI DAENG HO YO
xx4B UNIFIED TAI LETTER TAI DAENG VO HIGH
xx4C UNIFIED TAI LETTER TAI DAENG VO LOW
Consonants & Symbols—Tai Dam additions
xx4D UNIFIED TAI LETTER TAI DAM THO LOW
Consonants & Symbols—Tai Don additions
xx4E UNIFIED TAI LETTER TAI DON KO LOW
xx4F UNIFIED TAI LETTER TAI DON NGO HIGH
xx50 UNIFIED TAI LETTER TAI DON SO LOW
xx51 UNIFIED TAI LETTER TAI DON DO HIGH
xx52 UNIFIED TAI LETTER TAI DON FO LOW
xx53 UNIFIED TAI LETTER TAI DON MO HIGH
Consonants & Symbols—Thai Song additions
xx54 UNIFIED TAI LETTER THAI SONG KHO HIGH
Vowels & Tones—Unified Tai Alphabet
xx55 UNIFIED TAI VOWEL COMBINING A
xx56 UNIFIED TAI VOWEL SPACING A
xx57 UNIFIED TAI VOWEL AA
xx58 UNIFIED TAI VOWEL RAISED A
xx59 UNIFIED TAI VOWEL COMBINING I
xx5A UNIFIED TAI VOWEL SPACING I
xx5B UNIFIED TAI VOWEL COMBINING UE
xx5C UNIFIED TAI VOWEL SPACING UE
xx5D UNIFIED TAI VOWEL COMBINING U
xx5E UNIFIED TAI VOWEL SPACING U
xx5F UNIFIED TAI VOWEL SPACING E
xx60 UNIFIED TAI VOWEL EH
xx61 UNIFIED TAI VOWEL O
xx62 UNIFIED TAI VOWEL UH
xx63 UNIFIED TAI VOWEL COMBINING IA
xx64 UNIFIED TAI VOWEL SPACING IA
xx65 UNIFIED TAI VOWEL UEA
xx66 UNIFIED TAI VOWEL UA
xx67 UNIFIED TAI VOWEL UHW
xx68 UNIFIED TAI VOWEL AY
xx69 UNIFIED TAI VOWEL AN
xx6A UNIFIED TAI VOWEL AM
xx6B UNIFIED TAI TONE COMBINING MAI EK
xx6C UNIFIED TAI TONE SPACING MAI EK
xx6D UNIFIED TAI TONE COMBINING MAI THO
xx6E UNIFIED TAI TONE SPACING MAI THO
Vowels & Tones—Tai Daeng additions
xx6F UNIFIED TAI VOWEL TAI DAENG A
xx70 UNIFIED TAI VOWEL TAI DAENG II
xx71 UNIFIED TAI VOWEL TAI DAENG UE
xx72 UNIFIED TAI VOWEL TAI DAENG UUE
xx73 UNIFIED TAI VOWEL TAI DAENG U
xx74 UNIFIED TAI VOWEL TAI DAENG UU
xx75 UNIFIED TAI VOWEL TAI DAENG EE
xx76 UNIFIED TAI VOWEL TAI DAENG SHORT O
xx77 UNIFIED TAI VOWEL TAI DAENG UUA
xx78 UNIFIED TAI VOWEL TAI DAENG SHORT UH
Vowels & Tones—Tai Don additions
xx79 UNIFIED TAI VOWEL TAI DON A
xx7A UNIFIED TAI VOWEL TAI DON AT
xx7B UNIFIED TAI VOWEL LOW TONE AA
Character Properties
Code        Unicode Character Name                Gen Cat  Comb Class  Bidi Cat  Mirr’d
xx00..xx2F                                        Lo       0           L         N
xx30        UNIFIED TAI SYMBOL KON (Person)                0           L
xx31        UNIFIED TAI SYMBOL NEUNG (One)                 0           L
xx32        UNIFIED TAI SYMBOL SAM (Repetition)            0           L         N
xx33..xx54                                        Lo       0           L         N
xx55        UNIFIED TAI VOWEL COMBINING A         Mn       230         NSM       N
xx56        UNIFIED TAI VOWEL SPACING A           Lo       0           L         N
xx57        UNIFIED TAI VOWEL AA                  Lo       0           L         N
xx58        UNIFIED TAI VOWEL RAISED A            Lo       0           L         N
xx59        UNIFIED TAI VOWEL COMBINING I         Mn       230         NSM       N
xx5A        UNIFIED TAI VOWEL SPACING I           Lo       0           L         N
xx5B        UNIFIED TAI VOWEL COMBINING UE        Mn       230         NSM       N
xx5C        UNIFIED TAI VOWEL SPACING UE          Lo       0           L         N
xx5D        UNIFIED TAI VOWEL COMBINING U         Mn       220         NSM       N
xx5E        UNIFIED TAI VOWEL SPACING U           Lo       0           L         N
xx5F        UNIFIED TAI VOWEL SPACING E           Lo       0           L         N
xx60        UNIFIED TAI VOWEL EH                  Lo       0           L         N
xx61        UNIFIED TAI VOWEL O                   Lo       0           L         N
xx62        UNIFIED TAI VOWEL UH                  Lo       0           L         N
xx63        UNIFIED TAI VOWEL COMBINING IA        Mn       230         NSM       N
xx64        UNIFIED TAI VOWEL SPACING IA          Lo       0           L         N
xx65        UNIFIED TAI VOWEL UEA                 Lo       0           L         N
xx66        UNIFIED TAI VOWEL UA                  Lo       0           L         N
xx67        UNIFIED TAI VOWEL UHW                 Lo       0           L         N
xx68        UNIFIED TAI VOWEL AY                  Lo       0           L         N
xx69        UNIFIED TAI VOWEL AN                  Lo       0           L         N
xx6A        UNIFIED TAI VOWEL AM                  Mn       230         NSM       N
xx6B        UNIFIED TAI TONE COMBINING MAI EK     Mn       230         NSM       N
xx6C        UNIFIED TAI TONE SPACING MAI EK       Lo       0           L         N
xx6D        UNIFIED TAI TONE COMBINING MAI THO    Mn       230         NSM       N
xx6E        UNIFIED TAI TONE SPACING MAI THO      Lo       0           L         N
xx6F        UNIFIED TAI VOWEL TAI DAENG A         Lo       0           L         N
xx70        UNIFIED TAI VOWEL TAI DAENG II        Mn       230         NSM       N
xx71        UNIFIED TAI VOWEL TAI DAENG UE        Mn       230         NSM       N
xx72        UNIFIED TAI VOWEL TAI DAENG UUE       Mn       230         NSM       N
xx73        UNIFIED TAI VOWEL TAI DAENG U         Mn       220         NSM       N
xx74        UNIFIED TAI VOWEL TAI DAENG UU        Mn       220         NSM       N
xx75        UNIFIED TAI VOWEL TAI DAENG EE        Mn       230         NSM       N
xx76        UNIFIED TAI VOWEL TAI DAENG SHORT O   Mn       230         NSM       N
xx77        UNIFIED TAI VOWEL TAI DAENG UUA       Lo       0           L         N
xx78        UNIFIED TAI VOWEL TAI DAENG SHORT UH  Lo       0           L         N
xx79        UNIFIED TAI VOWEL TAI DON A           Mn       230         NSM       N
xx7A        UNIFIED TAI VOWEL TAI DON AT          Lo       0           L         N
xx7B        UNIFIED TAI VOWEL LOW TONE AA         Mn       220         NSM       N
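The columns of the properties table follow the fields of the Unicode Character Database file UnicodeData.txt, which spells the combining-mark category Mn. As a sketch of how one row would become a UCD record, the function below formats a single line. The block start U+AA00 (a reading of "row AA" in the roadmap) is a hypothetical placeholder, since the proposal gives only offsets such as xx55, and fields this sketch has no value for (decomposition, numeric values, Unicode 1.0 name, comment, case mappings) are emitted empty.

```python
# Hypothetical block start; the proposal itself gives only offsets such as xx55.
DEFAULT_BLOCK_START = 0xAA00


def unicode_data_line(offset: int, name: str, gen_cat: str, comb_class: int,
                      bidi: str, mirrored: bool = False,
                      block_start: int = DEFAULT_BLOCK_START) -> str:
    """Format one UnicodeData.txt-style record (15 semicolon-separated fields)."""
    fields = [
        f"{block_start + offset:04X}",  # 0  code point
        name,                           # 1  character name
        gen_cat,                        # 2  General_Category, e.g. Lo or Mn
        str(comb_class),                # 3  Canonical_Combining_Class
        bidi,                           # 4  Bidi_Class
        "", "", "", "",                 # 5-8  decomposition, decimal, digit, numeric
        "Y" if mirrored else "N",       # 9  Bidi_Mirrored
        "", "",                         # 10-11 Unicode 1.0 name, ISO comment
        "", "", "",                     # 12-14 simple upper/lower/title case mappings
    ]
    return ";".join(fields)
```

For instance, unicode_data_line(0x55, "UNIFIED TAI VOWEL COMBINING A", "Mn", 230, "NSM") produces a record with the code point, name, Mn, 230, NSM and N in their UCD positions.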
Sort Order—Lao Based
The following description is an initial attempt to define a sort order based on Lao. It is adapted from the orders used by Baccam et al. (1989) for Tai Dam and Marcus (1970) for Lao. The primary difference between this description and the order used by Baccam is the addition of the aspirated stops from Tai Dón, the ‘g’ and ‘r’ characters, and the vowel length contrast from Tai Daeng. We will look to Marcus for guidance on how to make those adjustments.
Consideration of Word and Syllable Structure
The best information is available for Tai Dam. The information given here for Tai Dam is thought to
be representative of the other languages, unless explicitly noted.
There are two syllable patterns in Tai Dam: CV and CVC. When sorting, the segments of the syllable
are considered in their spoken order, not their written order. Thus, when comparing bN ‘moon,
month’, to another word, first compare the b (/b/) to the initial consonant of the other word. Next,
compare (/ɨə/) to the vowel of the other word. Third, compare N (/n/) to the final consonant of the
other word. Compare the tones last of all.
In Thai Song, the syllable can have a very limited range of initial consonant clusters. It is not clear at
this time how those clusters should be sorted.
Tai Dam is almost exclusively monosyllabic. A very small number of words have an unstressed initial syllable. That syllable uses a mid-central vowel even though it is written with an ‘’ (/a:/). E.g. kt ‘even if’, t ‘eye’. For a set of words with any given initial consonant, those with two syllables sort before those with only one. Effectively, the unstressed vowel of the initial syllable is considered to precede all other vowels.
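The comparison procedure above can be sketched as a sort key. In this sketch every segment already carries a hypothetical integer weight, syllables are stored in spoken order rather than written order, tones are compared only after all segments, and the unstressed vowel of a minor syllable gets a weight below every real vowel, so two-syllable words sort before one-syllable words with the same initial.

```python
from dataclasses import dataclass
from typing import Sequence

OPEN = 0          # hypothetical weight for "no final consonant"
MINOR_VOWEL = -1  # hypothetical weight for the unstressed vowel of a minor syllable


@dataclass(frozen=True)
class Syllable:
    """One syllable in spoken order; all weights are hypothetical integers."""
    initial: int
    vowel: int          # weight of the vowel, wherever it is written around the consonant
    final: int = OPEN
    tone: int = 0       # tone number; 0 assumed for a minor syllable


def sort_key(word: Sequence[Syllable]) -> tuple:
    """Compare initial, vowel and final syllable by syllable; compare tones last of all."""
    segments = tuple((s.initial, s.vowel, s.final) for s in word)
    tones = tuple(s.tone for s in word)
    return (segments, tones)
```

A list of words can then be ordered with sorted(words, key=sort_key); two words that differ only in tone are separated by the second element of the key.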
Consonant order
In general, the consonants in Thai languages are sorted according to the point of articulation, starting
at the back of the mouth and moving to the front. A few residue characters are often tacked on at the
end. This rule leads to the order shown for the Unified Alphabet in the code chart, from xx00 to xx2F.
The symbols ꬬ (/kon4/) and ꬭ (/nɨŋ5/) are sorted as though the words were spelled out. The symbol
ꬮ (Repetition) has no sort order value.
Two considerations arise as to the sort order for the traditional forms of the script. First, those
consonants that are added between xx33 and xx54 for the traditional writing have the same sort order
as the ones from the Unified Alphabet that they correspond to. E.g. UNIFIED TAI LETTER ALTERNATE
SO HIGH (xx35, ꬱ) has the same sorting value as UNIFIED TAI LETTER SO HIGH (xx0E, ꬌ).
Second, if the sort order is always according to the point of articulation, then the order becomes
language dependent. E.g. in Tai Dam, UNIFIED TAI LETTER PO LOW has the orthographic value /p/
low. Thus the sort order for it remains unchanged from the default of the Unified Alphabet. But in
Tai Dón, UNIFIED TAI LETTER PO LOW has the orthographic value /m/ low. Therefore it would sort
after the UNIFIED TAI LETTER TAI DON MO HIGH.
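A sketch of the two considerations above, assuming primary weights are taken from the Unified Alphabet's code-chart order (xx00 to xx2F in the Names Table) and that letters are identified by shorthand names. Only the ALTERNATE SO HIGH and Tai Dón PO LOW cases come from the text; the other equivalences and the language tags "tai_dam" and "tai_don" are hypothetical.

```python
# Unified Alphabet consonants in code-chart order (offsets xx00-xx2F);
# the default sort weight is the position in this list.
UNIFIED_ORDER = [
    "KO HIGH", "KO LOW", "KHO HIGH", "KHO LOW", "KHHO HIGH", "KHHO LOW",
    "GO HIGH", "GO LOW", "NGO HIGH", "NGO LOW", "CO HIGH", "CO LOW",
    "CHO HIGH", "CHO LOW", "SO HIGH", "SO LOW", "NHO HIGH", "NHO LOW",
    "DO HIGH", "DO LOW", "TO HIGH", "TO LOW", "THO HIGH", "THO LOW",
    "NO HIGH", "NO LOW", "BO HIGH", "BO LOW", "PO HIGH", "PO LOW",
    "PHO HIGH", "PHO LOW", "FO HIGH", "FO LOW", "MO HIGH", "MO LOW",
    "YO HIGH", "YO LOW", "RO HIGH", "RO LOW", "LO HIGH", "LO LOW",
    "VO HIGH", "VO LOW", "HO HIGH", "HO LOW", "O HIGH", "O LOW",
]

# Traditional-form additions sort like the Unified Alphabet letter they correspond to.
# ALTERNATE SO HIGH is the example from the text; the other entries are assumptions.
EQUIVALENT_TO = {
    "ALTERNATE SO HIGH": "SO HIGH",
    "ALTERNATE KO LOW": "KO LOW",
    "TAI DON MO HIGH": "MO HIGH",
}

# Language-specific orthographic values (hypothetical language tags).
ORTHOGRAPHIC_VALUE = {
    "tai_don": {"PO LOW": "MO LOW"},  # Tai Dón PO LOW has the value /m/ low
}


def consonant_weight(letter: str, language: str = "tai_dam") -> int:
    """Primary sort weight of a consonant letter in the given language."""
    letter = ORTHOGRAPHIC_VALUE.get(language, {}).get(letter, letter)
    letter = EQUIVALENT_TO.get(letter, letter)
    return UNIFIED_ORDER.index(letter)  # raises ValueError for an unknown letter
```

Resolving the language-specific value before the alternate-form equivalence keeps the two concerns separate: the first decides which sound a letter writes, the second which Unified Alphabet letter shares its weight.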
Labialized Consonants and Consonant Clusters
In Baccam (1989), words with a labialized consonant were sorted after all words with the
corresponding unlabialized consonant. It may be best to handle the Thai Song consonant clusters in a
similar fashion.
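One way to realize the Baccam ordering is to compare a "medial" field immediately after the initial consonant, so that every word with a plain initial precedes every word whose initial is labialized, whatever their vowels. The field name and weights below are hypothetical, and treating Thai Song clusters the same way is only the tentative suggestion above, not a settled rule.

```python
from typing import NamedTuple

NO_MEDIAL = 0  # hypothetical weight: no labialization and no second cluster member


class WordKey(NamedTuple):
    """Sketch of a sort key; NamedTuple compares field by field, in declaration order."""
    initial: int              # weight of the initial consonant
    medial: int = NO_MEDIAL   # e.g. a labializing VO LOW, or (assumed) a Thai Song cluster member
    vowel: int = 0
    final: int = 0
    tone: int = 0
```

With this layout a plain-initial word with a late vowel still sorts before a labialized word with an early vowel, because the medial field is compared first.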
Vowel order
The order shown in the character chart is an approximation of the vowel order, but leaves out many
digraph vowels. A more complete order is shown in this chart. The vowel + final consonant
ligatures are treated as vowels for sorting. As with the consonants, the orthographic value assigned to
the characters affects the sort order.
This chart shows the Unified Alphabet, Tai Dam, and Tai Daeng. More study is needed for Tai Dón
and Thai Song.
[Vowel order chart. Columns: Unified Alphabet (spacing vowels); Tai Dam traditional form (combining vowels); Tai Daeng traditional form (long & short vowels); IPA representation.]
Line Breaking
This is an initial draft of the line breaking rules for the Unified Tai Script. These rules apply when a
text does not have inter-word spacing, which would be the case with the oldest tradition of the script.
1. A line break can always occur before or after the characters:
• UNIFIED TAI SYMBOL KON
• UNIFIED TAI SYMBOL NEUNG
• UNIFIED TAI SYMBOL SAM
2. A break can always occur before a vowel which is written in front of the initial consonant. These vowels include:
• UNIFIED TAI VOWEL SPACING E
• UNIFIED TAI VOWEL EH
• UNIFIED TAI VOWEL O
• UNIFIED TAI VOWEL UH
• UNIFIED TAI VOWEL UEA
• UNIFIED TAI VOWEL UHW
• UNIFIED TAI VOWEL AY
• UNIFIED TAI VOWEL TAI DAENG SHORT UH
3. A break can always occur after a Vowel + Final Consonant ligature which is written after the initial consonant. These ligatures include:
• UNIFIED TAI VOWEL AN
• UNIFIED TAI VOWEL AM
• UNIFIED TAI VOWEL TAI DON AT
• UNIFIED TAI VOWEL LOW TONE AA (occurs only in open syllables)
4. a) A break can occur before a consonant providing:
(1) The break will not split a labialized velar consonant. That is, if the consonant is a UNIFIED TAI LETTER VO LOW or UNIFIED TAI LETTER TAI DAENG VO LOW, it must not be preceded by a velar consonant:
o UNIFIED TAI LETTER KO HIGH
o UNIFIED TAI LETTER KO LOW
o UNIFIED TAI LETTER KHO HIGH
o UNIFIED TAI LETTER KHO LOW
o UNIFIED TAI LETTER KHHO HIGH
o UNIFIED TAI LETTER KHHO LOW
o UNIFIED TAI LETTER NGO HIGH
o UNIFIED TAI LETTER NGO LOW
o UNIFIED TAI LETTER ALTERNATE KO LOW
o UNIFIED TAI LETTER TAI DAENG KO LOW
o UNIFIED TAI LETTER TAI DAENG KO ALTERNATE
o UNIFIED TAI LETTER TAI DAENG KHO HIGH
o UNIFIED TAI LETTER TAI DAENG NGO HIGH
o UNIFIED TAI LETTER TAI DAENG NGO LOW
o UNIFIED TAI LETTER TAI DON KO LOW
o UNIFIED TAI LETTER TAI DON NGO HIGH
o UNIFIED TAI LETTER THAI SONG KHO HIGH
(2) None of the vowels listed in rule 2 occur before it.
b) and one of the following vowels or tones occurs after it:
• UNIFIED TAI VOWEL COMBINING A
• UNIFIED TAI VOWEL SPACING A
• UNIFIED TAI VOWEL AA
• UNIFIED TAI VOWEL RAISED A
• UNIFIED TAI VOWEL COMBINING I
• UNIFIED TAI VOWEL SPACING I
• UNIFIED TAI VOWEL COMBINING UE
• UNIFIED TAI VOWEL SPACING UE
• UNIFIED TAI VOWEL COMBINING U
• UNIFIED TAI VOWEL SPACING U
• UNIFIED TAI VOWEL COMBINING IA
• UNIFIED TAI VOWEL SPACING IA
• UNIFIED TAI VOWEL UA
• UNIFIED TAI VOWEL AN
• UNIFIED TAI VOWEL AM
• UNIFIED TAI TONE COMBINING MAI EK
• UNIFIED TAI TONE SPACING MAI EK
• UNIFIED TAI TONE COMBINING MAI THO
• UNIFIED TAI TONE SPACING MAI THO
• UNIFIED TAI VOWEL TAI DAENG A
• UNIFIED TAI VOWEL TAI DAENG II
• UNIFIED TAI VOWEL TAI DAENG UE
• UNIFIED TAI VOWEL TAI DAENG UUE
• UNIFIED TAI VOWEL TAI DAENG U
• UNIFIED TAI VOWEL TAI DAENG UU
• UNIFIED TAI VOWEL TAI DAENG EE
• UNIFIED TAI VOWEL TAI DAENG SHORT O
• UNIFIED TAI VOWEL TAI DAENG UUA
• UNIFIED TAI VOWEL TAI DON A
• UNIFIED TAI VOWEL TAI DON AT
• UNIFIED TAI VOWEL LOW TONE AA
Additional study is needed to determine whether these rules are accurate and adequate.
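The draft rules can be prototyped over a sequence of proposed character names. The sketch below is one reading of rules 1 to 4, not an implementation of the Unicode line breaking algorithm: it assumes that any name beginning UNIFIED TAI LETTER or UNIFIED TAI LIGATURE is a consonant, and, as an interpretation of rule 4b, it looks past a labializing VO LOW to find the vowel that follows a velar.

```python
from typing import Sequence

PREFIX = "UNIFIED TAI "


def _names(*suffixes: str) -> frozenset[str]:
    """Expand short suffixes into full proposed character names."""
    return frozenset(PREFIX + s for s in suffixes)


# Rule 1: a break may fall on either side of these symbols.
SYMBOLS = _names("SYMBOL KON", "SYMBOL NEUNG", "SYMBOL SAM")

# Rule 2: vowels written in front of the initial consonant.
PREPOSED_VOWELS = _names(
    "VOWEL SPACING E", "VOWEL EH", "VOWEL O", "VOWEL UH", "VOWEL UEA",
    "VOWEL UHW", "VOWEL AY", "VOWEL TAI DAENG SHORT UH",
)

# Rule 3: Vowel + Final Consonant ligatures written after the initial consonant.
FINAL_LIGATURES = _names("VOWEL AN", "VOWEL AM", "VOWEL TAI DON AT", "VOWEL LOW TONE AA")

# Rule 4a(1): velars that form a labialized consonant with a following VO LOW.
VELARS = _names(
    "LETTER KO HIGH", "LETTER KO LOW", "LETTER KHO HIGH", "LETTER KHO LOW",
    "LETTER KHHO HIGH", "LETTER KHHO LOW", "LETTER NGO HIGH", "LETTER NGO LOW",
    "LETTER ALTERNATE KO LOW", "LETTER TAI DAENG KO LOW",
    "LETTER TAI DAENG KO ALTERNATE", "LETTER TAI DAENG KHO HIGH",
    "LETTER TAI DAENG NGO HIGH", "LETTER TAI DAENG NGO LOW",
    "LETTER TAI DON KO LOW", "LETTER TAI DON NGO HIGH", "LETTER THAI SONG KHO HIGH",
)
LABIAL_VO = _names("LETTER VO LOW", "LETTER TAI DAENG VO LOW")

# Rule 4b: one of these vowels or tones must follow the consonant.
FOLLOWING_MARKS = _names(
    "VOWEL COMBINING A", "VOWEL SPACING A", "VOWEL AA", "VOWEL RAISED A",
    "VOWEL COMBINING I", "VOWEL SPACING I", "VOWEL COMBINING UE",
    "VOWEL SPACING UE", "VOWEL COMBINING U", "VOWEL SPACING U",
    "VOWEL COMBINING IA", "VOWEL SPACING IA", "VOWEL UA", "VOWEL AN",
    "VOWEL AM", "TONE COMBINING MAI EK", "TONE SPACING MAI EK",
    "TONE COMBINING MAI THO", "TONE SPACING MAI THO",
    "VOWEL TAI DAENG A", "VOWEL TAI DAENG II", "VOWEL TAI DAENG UE",
    "VOWEL TAI DAENG UUE", "VOWEL TAI DAENG U", "VOWEL TAI DAENG UU",
    "VOWEL TAI DAENG EE", "VOWEL TAI DAENG SHORT O", "VOWEL TAI DAENG UUA",
    "VOWEL TAI DON A", "VOWEL TAI DON AT", "VOWEL LOW TONE AA",
)


def is_consonant(name: str) -> bool:
    # Assumption: consonant letters and consonant ligatures are recognized by name prefix.
    return name.startswith((PREFIX + "LETTER ", PREFIX + "LIGATURE "))


def break_opportunities(text: Sequence[str]) -> list[int]:
    """Return each index i (0 < i < len(text)) where a line may break before text[i]."""
    breaks: set[int] = set()
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur in SYMBOLS or prev in SYMBOLS:               # rule 1
            breaks.add(i)
        elif cur in PREPOSED_VOWELS:                        # rule 2
            breaks.add(i)
        elif prev in FINAL_LIGATURES:                       # rule 3
            breaks.add(i)
        elif is_consonant(cur):                             # rule 4
            splits_cluster = cur in LABIAL_VO and prev in VELARS   # 4a(1)
            after_preposed = prev in PREPOSED_VOWELS               # 4a(2)
            nxt = i + 1
            # Interpretation of 4b: for a velar + VO LOW cluster, look past the VO LOW.
            if cur in VELARS and nxt < len(text) and text[nxt] in LABIAL_VO:
                nxt += 1
            followed_by_mark = nxt < len(text) and text[nxt] in FOLLOWING_MARKS
            if not splits_cluster and not after_preposed and followed_by_mark:
                breaks.add(i)
    return sorted(breaks)
```

Because the input is a list of character names rather than code points, the sketch stays independent of the still-unassigned block address.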
Script Samples
Figure 1—From Giới Thiệu Chương Trình Thái Học Việt Nam, 1999. Note the interword spacing.
Figure 2—Untitled, undated manuscript in Tai Daeng. Note the word spacing and
center baseline.
Figure 3—From Baccam et al., p. 13.
Figure 4—From Khhãm Kháo Đi Chảu Dê-su Seo Lũng Ók Mác Tẻm, 1983.