
KPS 9566: Difference between revisions

From Wikipedia, the free encyclopedia
cleanup
 
(18 intermediate revisions by 9 users not shown)
Line 1: Line 1:
{{Short description|North Korean character set}}
{{Short description|North Korean character set}}
{{Very long|date=August 2024}}
{{Infobox character encoding
{{Infobox character encoding
| name = KPS 9566
| name = KPS 9566
Line 16: Line 17:
'''KPS 9566''' ("''DPRK Standard Korean Graphic Character Set for Information Interchange''")<ref name="lunde2009"/> is a [[North Korea]]n standard specifying a character encoding for the [[Chosŏn'gŭl]] (Hangul) writing system used for the [[Korean language]]. The edition of 1997 specified an [[ISO 2022]]-compliant 94&times;94 two-byte [[coded character set]]. Subsequent editions have added additional encoded characters outside of the 94&times;94 plane, in a manner comparable to [[Unified Hangul Code|UHC]] or [[GBK (character encoding)|GBK]].<ref name="utc-L2-18-011"/>
'''KPS 9566''' ("''DPRK Standard Korean Graphic Character Set for Information Interchange''")<ref name="lunde2009"/> is a [[North Korea]]n standard specifying a character encoding for the [[Chosŏn'gŭl]] (Hangul) writing system used for the [[Korean language]]. The edition of 1997 specified an [[ISO 2022]]-compliant 94&times;94 two-byte [[coded character set]]. Subsequent editions have added additional encoded characters outside of the 94&times;94 plane, in a manner comparable to [[Unified Hangul Code|UHC]] or [[GBK (character encoding)|GBK]].<ref name="utc-L2-18-011"/>


KPS 9566 differs in approach from [[KS X 1001]], its [[South Korea]]n counterpart, in using a different ordering of chosŏn'gŭl,<ref name="wg2-n2231"/> in encoding explicit vertical presentation forms of punctuation, in not encoding duplicate [[hanja]] for multiple readings, and in including several characters specific to the North Korean political system, including special encodings for the names of the country's past and present leaders ([[Kim Il Sung]], [[Kim Jong Il]] and [[Kim Jong Un]]).<ref name="kps9566txt"/><ref name="lunde2009"/><ref name="utc-L2-18-011"/><ref name="lundeduplicates">{{cite web |url=https://ccjktype.fonts.adobe.com/2019/03/four-of-a-kind.html |title=Four of a Kind: KS X 1001 & KPS 9566 |last=Lunde |first=Ken |author-link=Ken Lunde |work=CJK Type Blog |publisher=[[Adobe Inc]] |date=2019-03-25}}</ref>
KPS 9566 differs in approach from [[KS X 1001]], its [[South Korea]]n counterpart, in using a different ordering of Chosŏn'gŭl,<ref name="wg2-n2231"/> in encoding explicit vertical presentation forms of punctuation, in not encoding duplicate [[Hanja]] for multiple readings, and in including several characters specific to the North Korean political system, including special encodings for the names of the country's past and present leaders ([[Kim Il Sung]], [[Kim Jong Il]] and [[Kim Jong Un]]).<ref name="kps9566txt"/><ref name="lunde2009"/><ref name="utc-L2-18-011"/><ref name="lundeduplicates">{{cite web |url=https://ccjktype.fonts.adobe.com/2019/03/four-of-a-kind.html |title=Four of a Kind: KS X 1001 & KPS 9566 |last=Lunde |first=Ken |author-link=Ken Lunde |work=CJK Type Blog |publisher=[[Adobe Inc]] |date=2019-03-25}}</ref>


Although KPS 9566 was the original source of several characters added to [[Unicode]],<ref name="ewellflags"/> not all KPS 9566 characters have Unicode equivalents. Those which do not are mapped to similar Unicode characters or to the [[Private Use Area]].<ref name="westtilde" />
Although KPS 9566 was the original source of several characters added to [[Unicode]],<ref name="ewellflags"/> not all KPS 9566 characters have Unicode equivalents. Those which do not are mapped to similar Unicode characters or to the [[Private Use Area]].<ref name="westtilde" />
Line 82: Line 83:
|pages=242–255 }}</ref>
|pages=242–255 }}</ref>


Although the Korean writing system includes individual symbols ([[Hangul consonant and vowel tables|jamo]]) for consonants and vowels, serving as an [[alphabet]], Korean text is properly typeset with these symbols composed into blocks for each syllable. Wansung code included individual Korean syllable blocks separately, treating them as a large set of characters similarly to [[hanja]],<ref name="shin" /> and was first defined by the third edition of the South Korean standard KS C 5601. The first edition had defined an encoding of individual jamo which allowed syllable blocks to be encoded as sequences, which was named [[KS X 1001#1974|N-byte Hangul]], and had not been adopted as widely as intended.<ref name="Hwang">{{citation|mode=cs1|title=The Social Shaping of ICTs Standards: A Case of National Coded Character Set Standards Controversy in Korea|first=Jinsang|last=Hwang|date=2005|publisher=University of Edinburgh|url=https://www.era.lib.ed.ac.uk/bitstream/handle/1842/12253/Hwang2005.pdf}}</ref><ref name="cjkinf336">{{citation|mode=cs1|section=3.3.6: N-byte Hangul|url=https://ccjktype.fonts.adobe.com/wp-content/uploads/2013/09/cjk_inf.txt|last=Lunde|first=Ken|author-link=Ken Lunde|title=CJK.INF Version 1.9|date=1995-12-18}}</ref>
Although the Korean writing system includes individual symbols ([[Hangul consonant and vowel tables|jamo]]) for consonants and vowels, serving as an [[alphabet]], Korean text is properly typeset with these symbols composed into blocks for each syllable. Wansung code included individual Korean syllable blocks separately, treating them as a large set of characters similarly to [[Hanja]],<ref name="shin" /> and was first defined by the third edition of the South Korean standard KS C 5601. The first edition had defined an encoding of individual jamo which allowed syllable blocks to be encoded as sequences, which was named [[KS X 1001#1974|N-byte Hangul]], and had not been adopted as widely as intended.<ref name="Hwang">{{citation|mode=cs1|title=The Social Shaping of ICTs Standards: A Case of National Coded Character Set Standards Controversy in Korea|first=Jinsang|last=Hwang|date=2005|publisher=University of Edinburgh|url=https://www.era.lib.ed.ac.uk/bitstream/handle/1842/12253/Hwang2005.pdf}}</ref><ref name="cjkinf336">{{citation|mode=cs1|section=3.3.6: N-byte Hangul|url=https://ccjktype.fonts.adobe.com/wp-content/uploads/2013/09/cjk_inf.txt|last=Lunde|first=Ken|author-link=Ken Lunde|title=CJK.INF Version 1.9|date=1995-12-18}}</ref>


Wansung code did not encode all possible modern Korean syllables, only a selection of the 2350 most common,<ref name="lunde2009"/> although it allowed them to be specified using combining sequences, which often were not supported.<ref name="shin">{{cite web | url=http://stason.org/TULARC/languages/korean/8-What-are-KS-X-1001-KS-C-5601-and-other-Hangul-codes.html | title=What are KS X 1001(KS C 5601) and other Hangul codes? | work=Hangul & Internet in Korea FAQ | last=Shin | first=Jungshik}}</ref> An alternative encoding, also South Korean, named [[Johab]] did, and served as a competitor to Wansung for some time.<ref name="Hwang" /> [[Unified Hangul Code]] (UHC), introduced by Microsoft with [[Windows 95]], extended EUC-KR, allowing the use of invalid EUC double-byte codes to represent all other syllables available in Johab.<ref name="shin"/> A similar approach was taken by the Mainland Chinese [[GBK (character encoding)|GBK]] encoding, extending [[GB 2312]] with support for Traditional Chinese and for less common Chinese characters by encoding them to double-byte codes invalid in [[EUC-CN]].<ref name="lunde2009-othercjksets" />
Wansung code did not encode all possible modern Korean syllables, only a selection of the 2350 most common,<ref name="lunde2009"/> although it allowed them to be specified using combining sequences, which often were not supported.<ref name="shin">{{cite web | url=http://stason.org/TULARC/languages/korean/8-What-are-KS-X-1001-KS-C-5601-and-other-Hangul-codes.html | title=What are KS X 1001(KS C 5601) and other Hangul codes? | work=Hangul & Internet in Korea FAQ | last=Shin | first=Jungshik}}</ref> An alternative encoding, also South Korean, named [[Johab]] did, and served as a competitor to Wansung for some time.<ref name="Hwang" /> [[Unified Hangul Code]] (UHC), introduced by Microsoft with [[Windows 95]], extended EUC-KR, allowing the use of invalid EUC double-byte codes to represent all other syllables available in Johab.<ref name="shin"/> A similar approach was taken by the Mainland Chinese [[GBK (character encoding)|GBK]] encoding, extending [[GB 2312]] with support for Traditional Chinese and for less common Chinese characters by encoding them to double-byte codes invalid in [[EUC-CN]].<ref name="lunde2009-othercjksets" />


South Korea was not the only country developing an ISO 2022 DBCS for Korean: the Mainland Chinese [[GB 12052]] was published in 1989. This was not closely related to Wansung code, although it also included composed syllables. Instead, it corresponded to GB 2312 with Korean syllables (and 94 [[hanja]]) replacing the Chinese characters, except for the inclusion of a dollar sign in place of a yuan sign. It may have been developed for use by the Korean minority in north-eastern China.<ref name="lunde2009"/>
South Korea was not the only country developing an ISO 2022 DBCS for Korean: the Mainland Chinese [[GB 12052]] was published in 1989. This was not closely related to Wansung code, although it also included composed syllables. Instead, it corresponded to GB 2312 with Korean syllables (and 94 [[Hanja]]) replacing the Chinese characters, except for the inclusion of a dollar sign in place of a yuan sign. It was developed for use by the Korean minority in north-eastern China.<ref name="lunde2009"/>


Likewise, North Korea developed KPS 9566. Although North Korea and South Korea both use Korean Chosŏn'gŭl (Hangul) as their primary writing system, they use different [[lexicographical order]]s.<ref name="wg2-n2246"/> Hence, character ordering differs between Wansung code and KPS 9566.<ref name="wg2-n2231"/>
Likewise, North Korea developed KPS 9566. Although North Korea and South Korea both use Korean Chosŏn'gŭl (Hangul) as their primary writing system, they use different [[lexicographical order]]s.<ref name="wg2-n2246"/> Hence, character ordering differs between Wansung code and KPS 9566.<ref name="wg2-n2231"/>
Line 95: Line 96:


== Design ==
== Design ==
In principle, KPS 9566 is similar to the Wansung character set defined by the [[South Korea]]n [[KS X 1001]] standard, although the two are not compatible. Both encode a section of punctuation, symbols, [[Hangul consonant and vowel tables|jamo]], [[kana]] and alphabetical characters, followed by a subset of the possible modern chosŏn'gŭl syllables, followed by a section of [[hanja]].<ref name="lunde2009"/> However, KPS 9566 uses a different ordering of jamo and syllables to conform with North Korean [[lexicographical order]]ing standards.<ref name="wg2-n2231"/> KPS 9566 also includes 28 explicitly rotated punctuation characters for vertical typography, which KS X 1001 does not, and encodes each hanja only once, whereas KS X 1001 encodes several hanja with multiple readings multiple times.<ref name="lunde2009">
In principle, KPS 9566 is similar to the Wansung character set defined by the [[South Korea]]n [[KS X 1001]] standard, although the two are not compatible. Both encode a section of punctuation, symbols, [[Hangul consonant and vowel tables|jamo]], [[kana]] and alphabetical characters, followed by a subset of the possible modern Chosŏn'gŭl syllables, followed by a section of [[Hanja]].<ref name="lunde2009"/> However, KPS 9566 uses a different ordering of jamo and syllables to conform with North Korean [[lexicographical order]]ing standards.<ref name="wg2-n2231"/> KPS 9566 also includes 28 explicitly rotated punctuation characters for vertical typography, which KS X 1001 does not, and encodes each Hanja only once, whereas KS X 1001 encodes several Hanja with multiple readings multiple times.<ref name="lunde2009">
{{cite book
{{cite book
|title=CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing
|title=CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing
Line 108: Line 109:
|pages=148–151 }}</ref>
|pages=148–151 }}</ref>


KPS 9566-97 encodes a total of 2679 chosŏn'gŭl syllables and 4653 hanja. This provides better coverage than the 2350 syllables encoded by Wansung code: for instance, the 똠 character used in the name of {{ill|똠방각하|ko}}, a noted Korean literary work, does not have an assigned Wansung codepoint, but has one (38-02) in KPS 9566.<ref name="lunde2009" /> The hanja section includes 4652 characters from the [[CJK Unified Ideographs (Unicode block)|Unified Repertoire and Ordering]] and one from [[CJK Unified Ideographs Extension A]]. The entirety of row 15, the latter half of row 44 (after the syllables block) and the latter half of row 94 (after the hanja block) may be used for user-defined purposes.<ref name="ir202" /><ref name="lunde2009" />
KPS 9566-97 encodes a total of 2679 Chosŏn'gŭl syllables and 4653 Hanja. This provides better coverage than the 2350 syllables encoded by Wansung code: for instance, the 똠 character used in the name of {{lang|ko|[[:ko:똠방각하|똠방각하]]}}, a noted Korean literary work, does not have an assigned Wansung codepoint, but has one (38-02) in KPS 9566.<ref name="lunde2009" /> The Hanja section includes 4652 characters from the [[CJK Unified Ideographs (Unicode block)|Unified Repertoire and Ordering]] and one from [[CJK Unified Ideographs Extension A]]. The entirety of row 15, the latter half of row 44 (after the syllables block) and the latter half of row 94 (after the Hanja block) may be used for user-defined purposes.<ref name="ir202" /><ref name="lunde2009" />
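Row-cell code points such as 38-02 can be turned into byte values with the usual ISO 2022 arithmetic for a 94&times;94 set. The following is a minimal sketch of that conversion, not an excerpt from the standard: the function names are hypothetical, and the 8-bit form (adding 0xA0 to each number, as in EUC-style encodings) is assumed here to be how the plane is invoked, alongside the 7-bit form (adding 0x20).

```python
from typing import Tuple


def parse_row_cell(text: str) -> Tuple[int, int]:
    """Split a row-cell string such as '38-02' into (row, cell) integers."""
    row, cell = text.split("-")
    return int(row), int(cell)


def row_cell_to_bytes(row: int, cell: int, eight_bit: bool = True) -> bytes:
    """Convert a 94x94 row-cell pair to its two-byte code.

    Assumption: the 8-bit form adds 0xA0 to each number (EUC-style, GR area)
    and the 7-bit form adds 0x20 (GL area), as is usual for ISO 2022 sets.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in the range 1..94")
    offset = 0xA0 if eight_bit else 0x20
    return bytes([row + offset, cell + offset])


# Code point 38-02, cited above for the syllable used in the work's title.
print(row_cell_to_bytes(*parse_row_cell("38-02")).hex())
```

Under these assumptions, 38-02 corresponds to the bytes 0xC6 0xA2 in the 8-bit form and 0x46 0x22 in the 7-bit form.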


KPS 9566 is especially distinguished by its inclusion of several special characters from North Korean political life. Specifically, it includes the hammer, sickle and brush emblem of the [[Workers' Party of Korea]], both uncircled and circled<ref name="westtilde"/> (code points 12-01 and 12-02),<ref name="ir202"/> and two groups of three special-purpose characters which spell out the names of the North Korean leaders [[Kim Il Sung]] ({{lang|ko|김일성}}) and [[Kim Jong Il]] ({{lang|ko|김정일}}) in a special decorative font (code points 04-72 to 04-74 and 04-75 to 04-77, respectively).<ref name="lunde1999">
KPS 9566 is especially distinguished by its inclusion of several special characters from North Korean political life. Specifically, it includes the hammer, sickle and brush emblem of the [[Workers' Party of Korea]], both uncircled and circled<ref name="westtilde"/> (code points 12-01 and 12-02),<ref name="ir202"/> and two groups of three special-purpose characters which spell out the names of the North Korean leaders [[Kim Il Sung]] ({{lang|ko|김일성}}) and [[Kim Jong Il]] ({{lang|ko|김정일}}) in a special decorative font (code points 04-72 to 04-74 and 04-75 to 04-77, respectively).<ref name="lunde1999">
Line 127: Line 128:


=== KPS 10721 ===
=== KPS 10721 ===
North Korea also developed a second character set, KPS 10721 "''Code of the supplementary Korean Hanja Set for Information Interchange''", which was published in 2000. KPS 10721 encodes a set of at least 19469 hanja<ref name="lunde2009"/> additional to those included in KPS 9566. {{as of|2009}}, these did not all have mappings to Unicode, but included 10358 from the [[CJK Unified Ideographs (Unicode block)|Unified Repertoire and Ordering]], 3187 from [[CJK Unified Ideographs Extension A]] and 107 from [[CJK Compatibility Ideographs]] (all in the [[Basic Multilingual Plane]]), as well as 5767 from [[CJK Unified Ideographs Extension B]] and 50 from [[CJK Compatibility Ideographs Supplement]] (in the [[Supplementary Ideographic Plane]]).<ref name="lunde2009"/> All KPS 9566 hanja are also included in KPS 10721,<ref name="utc-l2-22-238"/> which uses a different encoding structure, unrelated to ISO 2022.
North Korea also developed a second character set, KPS 10721 "''Code of the supplementary Korean Hanja Set for Information Interchange''", which was published in 2000. KPS 10721 encodes a set of at least 19469 Hanja<ref name="lunde2009"/> additional to those included in KPS 9566. {{as of|2009}}, these did not all have mappings to Unicode, but included 10358 from the [[CJK Unified Ideographs (Unicode block)|Unified Repertoire and Ordering]], 3187 from [[CJK Unified Ideographs Extension A]] and 107 from [[CJK Compatibility Ideographs]] (all in the [[Basic Multilingual Plane]]), as well as 5767 from [[CJK Unified Ideographs Extension B]] and 50 from [[CJK Compatibility Ideographs Supplement]] (in the [[Supplementary Ideographic Plane]]).<ref name="lunde2009"/> All KPS 9566 Hanja are also included in KPS 10721,<ref name="utc-l2-22-238"/> which uses a different encoding structure, unrelated to ISO 2022.


Besides the mapping of these hanja (excluding those also in KPS 9566)<ref name="utc-l2-22-238"/> to Unicode, little was known about the KPS 10721 standard outside of North Korea<ref name="lunde2009"/><ref name="lundeduplicates"/> prior to 2022. North Korean reference glyphs were provided for only a subset of these hanja in the Unicode code charts, due to a lack of suitable font data available to the Unicode Consortium.<ref>{{cite web |url=https://www.unicode.org/faq/han_cjk.html#22 |archive-url=https://web.archive.org/web/20221004002836/https://www.unicode.org/faq/han_cjk.html#22 |archive-date=2022-10-04 |url-status=unfit<!--Section has since been removed from page.--> |title=Q: Why are DPRK (North Korean == kIRG_KPSource) glyphs missing from some CJK code charts? |work=FAQ - Chinese and Japanese |publisher=[[Unicode Consortium]] |first=Richard |last=Cook}}</ref><ref name="utc-l2-22-238">{{cite web |url=https://www.unicode.org/L2/L2022/22238-kirg-kp-source-glyphs.pdf |id=[[Unicode Technical Committee|UTC]] L2/22-238 |title=Proposal to consider adding CodeCharts support for kIRG_KPSource representative glyphs in Unicode |first1=Yi |last1=Bai |first2=CheonHyeong |last2=Sim |date=2022-10-16}}</ref> Unicode hanja characters with KPS 9566 or KPS 10721 sources are nonetheless cross-referenced to their KPS codes in the [[Unihan]] database with the key <code>kIRG_KPSource</code>; the Unihan source codes use "KP0" to refer to KPS 9566 and "KP1" for KPS 10721.<ref>{{cite web |url=https://www.unicode.org/reports/tr38/tr38-29.html#kIRG_KPSource |at=kIRG_KPSource |title=Unicode Han Database (Unihan) |id=Unicode Standard Annex #38 |date=2020-03-05 |first1=John H. |last1=Jenkins |first2=Richard |last2=Cook |first3=Ken |last3=Lunde |author-link3=Ken Lunde}}</ref>
Besides the mapping of these Hanja (excluding those also in KPS 9566)<ref name="utc-l2-22-238"/> to Unicode, little was known about the KPS 10721 standard outside of North Korea<ref name="lunde2009"/><ref name="lundeduplicates"/> prior to 2022. North Korean reference glyphs were provided for only a subset of these Hanja in the Unicode code charts, due to a lack of suitable font data available to the Unicode Consortium.<ref>{{cite web |url=https://www.unicode.org/faq/han_cjk.html#22 |archive-url=https://web.archive.org/web/20221004002836/https://www.unicode.org/faq/han_cjk.html#22 |archive-date=2022-10-04 |url-status=unfit<!--Section has since been removed from page.--> |title=Q: Why are DPRK (North Korean == kIRG_KPSource) glyphs missing from some CJK code charts? |work=FAQ - Chinese and Japanese |publisher=[[Unicode Consortium]] |first=Richard |last=Cook}}</ref><ref name="utc-l2-22-238">{{cite web |url=https://www.unicode.org/L2/L2022/22238-kirg-kp-source-glyphs.pdf |id=[[Unicode Technical Committee|UTC]] L2/22-238 |title=Proposal to consider adding CodeCharts support for kIRG_KPSource representative glyphs in Unicode |first1=Yi |last1=Bai |first2=CheonHyeong |last2=Sim |date=2022-10-16}}</ref> Unicode Hanja characters with KPS 9566 or KPS 10721 sources are nonetheless cross-referenced to their KPS codes in the [[Unihan]] database with the key <code>kIRG_KPSource</code>; the Unihan source codes use "KP0" to refer to KPS 9566 and "KP1" for KPS 10721.<ref>{{cite web |url=https://www.unicode.org/reports/tr38/tr38-29.html#kIRG_KPSource |at=kIRG_KPSource |title=Unicode Han Database (Unihan) |id=Unicode Standard Annex #38 |date=2020-03-05 |first1=John H. |last1=Jenkins |first2=Richard |last2=Cook |first3=Ken |last3=Lunde |author-link3=Ken Lunde}}</ref>
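The "KP0" and "KP1" prefixes make it straightforward to tell which standard a <code>kIRG_KPSource</code> value refers to. The sketch below illustrates one way to split such a value; it assumes the value consists of the prefix, a hyphen and four uppercase hexadecimal digits, and both the helper name and the sample values are hypothetical rather than taken from the Unihan data.

```python
import re

# Assumed value syntax: "KP0" or "KP1", a hyphen, then four hex digits.
_KP_SOURCE = re.compile(r"^(KP[01])-([0-9A-F]{4})$")

# Prefix meanings as described in the text above.
_STANDARDS = {"KP0": "KPS 9566", "KP1": "KPS 10721"}


def parse_kp_source(value: str):
    """Return (standard name, numeric code) for a kIRG_KPSource value."""
    match = _KP_SOURCE.match(value)
    if match is None:
        raise ValueError(f"not a recognised kIRG_KPSource value: {value!r}")
    prefix, code = match.groups()
    return _STANDARDS[prefix], int(code, 16)


# Hypothetical sample values, used only to exercise the parser.
for sample in ("KP0-FCD6", "KP1-3451"):
    print(sample, "->", parse_kp_source(sample))
```

How the four-digit code relates to the row-cell notation of KPS 9566 is not addressed by this sketch.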


In 2022, a hanja font was isolated from the North Korean Okpyon [[Android (operating system)|Android]] app, which was used to correct some errors in the KPS-10721-to-Unicode mapping data and to supply new North Korean reference glyphs for the Unicode code charts; while doing so, the mappings of KPS 9566 hanja to KPS 10721 were also deduced.<ref name="utc-l2-22-238"/><ref>{{cite web |title=KPS 10721:2000 (Unicode KP1源) 文件重构 (修订版) |lang=zh-Hans |date=2022-06-19 |first=CheonHyeong |last=Sim |url=http://cheonhyeong.com/PDF/KP1-reconstitution.pdf}}</ref> The existing reference glyphs were updated in Unicode 15 in September 2022,<ref>For example: {{cite web |url=https://unicode.org/charts/PDF/Unicode-15.0/U150-F900.pdf#page=12 |title=CJK Compatibility Ideographs (§ DPRK compatibility ideographs |work=Unicode 15.0 Versioned Charts (delta charts) |publisher=[[Unicode Consortium]] |year=2022}}</ref> while the Unicode Consortium's CJK and Unihan Group recommended in November 2022 that the [[Unicode Technical Committee]] include the additional reference glyphs in the next version of Unicode,<ref>{{cite web |url=https://www.unicode.org/L2/L2022/22247-cjk-unihan-group-utc173.pdf#page=50 |title=35) L2/22-238: Proposal to consider adding CodeCharts support for kIRG_KPSource representative glyphs |work=CJK & Unihan Group Recommendations for UTC #173 Meeting |id=[[Unicode Technical Committee|UTC]] L2/22-247 |last=Lunde |first=Ken |author-link=Ken Lunde |date=2022-11-01}}</ref> to be included in Unicode 15.1 in September 2023.<ref>{{cite web |url=https://www.unicode.org/L2/L2023/23058-irgn2599-activity-rept.pdf |title=US/Unicode Activity Report for IRG #60 |last=Lunde |first=Ken |author-link=Ken Lunde |date=2023-02-07 |id=[[Unicode Technical Committee|UTC]] L2/23-058, [[ISO/IEC JTC 1/SC 2|ISO/IEC JTC1/SC2]]/WG2/[[Ideographic Research Group|IRG]] N2599}}</ref>
In 2022, a Hanja font was isolated from the North Korean Okpyon [[Android (operating system)|Android]] app, which was used to correct some errors in the KPS-10721-to-Unicode mapping data and to supply new North Korean reference glyphs for the Unicode code charts; while doing so, the mappings of KPS 9566 Hanja to KPS 10721 were also deduced.<ref name="utc-l2-22-238"/><ref>{{cite web |title=KPS 10721:2000 (Unicode KP1源) 文件重构 (修订版) |lang=zh-Hans |date=2022-06-19 |first=CheonHyeong |last=Sim |url=http://cheonhyeong.com/PDF/KP1-reconstitution.pdf}}</ref> The existing reference glyphs were updated in Unicode 15 in September 2022,<ref>For example: {{cite web |url=https://unicode.org/charts/PDF/Unicode-15.0/U150-F900.pdf#page=12 |title=CJK Compatibility Ideographs (§ DPRK compatibility ideographs |work=Unicode 15.0 Versioned Charts (delta charts) |publisher=[[Unicode Consortium]] |year=2022}}</ref> while the Unicode Consortium's CJK and Unihan Group recommended in November 2022 that the [[Unicode Technical Committee]] include the additional reference glyphs in the next version of Unicode,<ref>{{cite web |url=https://www.unicode.org/L2/L2022/22247-cjk-unihan-group-utc173.pdf#page=50 |title=35) L2/22-238: Proposal to consider adding CodeCharts support for kIRG_KPSource representative glyphs |work=CJK & Unihan Group Recommendations for UTC #173 Meeting |id=[[Unicode Technical Committee|UTC]] L2/22-247 |last=Lunde |first=Ken |author-link=Ken Lunde |date=2022-11-01}}</ref> to be included in Unicode 15.1 in September 2023.<ref>{{cite web |url=https://www.unicode.org/L2/L2023/23058-irgn2599-activity-rept.pdf |title=US/Unicode Activity Report for IRG #60 |last=Lunde |first=Ken |author-link=Ken Lunde |date=2023-02-07 |id=[[Unicode Technical Committee|UTC]] L2/23-058, [[ISO/IEC JTC 1/SC 2|ISO/IEC JTC1/SC2]]/WG2/[[Ideographic Research Group|IRG]] N2599}}</ref>


== Documentation and relationship to Unicode ==
== Documentation and relationship to Unicode ==
Line 144: Line 145:
In July 2000, the North Korean body wrote to WG2, accusing them of developing both versions of the Unicode encoding for Korean on the basis of South Korean proposals only, without consulting North Korea, and of putting the commercial interests of companies and fears of international confusion above respect for North Korea's sovereignty. The letter stated that North Korea would regard further refusal to change the name and order of the Korean characters in Unicode as an insult to its sovereign dignity and as compromising the [[ISO]]'s claims to impartiality. They reiterated their demand for WG2 and Unicode to "correct" the order of the Korean characters, and to "correct" the names "Hangul Jamo" and "Hangul Syllable" to "Korean Alphabet" and "Korean Syllable".<ref name="wg2-n2231">{{cite web |url=https://unicode.org/wg2/docs/n2231.pdf |title=DPRK letter on character names and ordering in 10646-1: 2000 |date=2000-07-05 |last=Cho |first=Chun-Hui |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2231}}</ref>
In July 2000, the North Korean body wrote to WG2, accusing them of developing both versions of the Unicode encoding for Korean on the basis of South Korean proposals only, without consulting North Korea, and of putting the commercial interests of companies and fears of international confusion above respect for North Korea's sovereignty. The letter stated that North Korea would regard further refusal to change the name and order of the Korean characters in Unicode as an insult to its sovereign dignity and as compromising the [[ISO]]'s claims to impartiality. They reiterated their demand for WG2 and Unicode to "correct" the order of the Korean characters, and to "correct" the names "Hangul Jamo" and "Hangul Syllable" to "Korean Alphabet" and "Korean Syllable".<ref name="wg2-n2231">{{cite web |url=https://unicode.org/wg2/docs/n2231.pdf |title=DPRK letter on character names and ordering in 10646-1: 2000 |date=2000-07-05 |last=Cho |first=Chun-Hui |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2231}}</ref>


In August 2000, the North Korean national body submitted a more detailed version of their requests in a series of five consecutive proposals. These requested the addition of 14 additional jamo characters,<ref name="wg2-n2243">{{cite web |url=http://unicode.org/wg2/docs/n2243.pdf |title=Proposal for the addition of 14 Korean alphabets to ISO/IEC 10646-1 |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2243 |date=2000-08-10}}</ref> the addition of 82 symbol characters,<ref name="wg2-n2244">{{cite web |url=http://unicode.org/wg2/docs/n2244.pdf |title=Proposal for the addition of 82 symbols to ISO/IEC 10646-1 |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2244 |date=2000-08-10}}</ref> and the use of the term "Korean alphabet" instead of "Hangul",<ref name="wg2-n2245">{{cite web |url=http://unicode.org/wg2/docs/n2245.pdf |title=Proposal to change the existing name of Korean characters in ISO/IEC 10646-1 |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2245 |date=2000-08-10}}</ref> provided supporting evidence for the North Korean collation order,<ref name="wg2-n2246">{{cite web |url=http://unicode.org/wg2/docs/n2246.pdf |title=Evidence for arrangement of Korean characters proposed by CSK |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2246 |date=2000-08-10}}</ref> and requested addition of the North Korean hanja repertoire.<ref name="wg2-n2247">{{cite web |url=http://unicode.org/wg2/docs/n2247.pdf |title=Proposal to add the hanja column of D. P. R. of Korea in ISO/IEC 10646-1 (14938 ideographs to CJK Unified Ideographs and 3181 ideographs to its Extention &#91;sic&#93; A) |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2247 |date=2000-08-10}}</ref> These proposals were discussed in two meetings between North Korean, [[South Korea]]n, Swedish and other WG2 representatives in September 2000, in which the North Korean body was asked to provide manuscript evidence for the additional jamo characters, to resubmit their symbols proposal with symbols which had already been accepted into Unicode removed, and to consider using [[ISO/IEC 14651]], then at final draft stage, for collation purposes.<ref name="wg2-n2282">{{cite web |url=http://unicode.org/wg2/docs/n2282.doc |date=2000-09-21 |author=Korean script ad hoc group |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2282 |title=Report of the meeting of the Korean script ad hoc group}}</ref>
In August 2000, the North Korean national body submitted a more detailed version of their requests in a series of five consecutive proposals. These requested the addition of 14 additional jamo characters,<ref name="wg2-n2243">{{cite web |url=http://unicode.org/wg2/docs/n2243.pdf |title=Proposal for the addition of 14 Korean alphabets to ISO/IEC 10646-1 |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2243 |date=2000-08-10}}</ref> the addition of 82 symbol characters,<ref name="wg2-n2244">{{cite web |url=http://unicode.org/wg2/docs/n2244.pdf |title=Proposal for the addition of 82 symbols to ISO/IEC 10646-1 |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2244 |date=2000-08-10}}</ref> and the use of the term "Korean alphabet" instead of "Hangul",<ref name="wg2-n2245">{{cite web |url=http://unicode.org/wg2/docs/n2245.pdf |title=Proposal to change the existing name of Korean characters in ISO/IEC 10646-1 |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2245 |date=2000-08-10}}</ref> provided supporting evidence for the North Korean collation order,<ref name="wg2-n2246">{{cite web |url=http://unicode.org/wg2/docs/n2246.pdf |title=Evidence for arrangement of Korean characters proposed by CSK |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2246 |date=2000-08-10}}</ref> and requested addition of the North Korean Hanja repertoire.<ref name="wg2-n2247">{{cite web |url=http://unicode.org/wg2/docs/n2247.pdf |title=Proposal to add the Hanja column of D. P. R. of Korea in ISO/IEC 10646-1 (14938 ideographs to CJK Unified Ideographs and 3181 ideographs to its Extention &#91;sic&#93; A) |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2247 |date=2000-08-10}}</ref> These proposals were discussed in two meetings between North Korean, [[South Korea]]n, Swedish and other WG2 representatives in September 2000, in which the North Korean body was asked to provide manuscript evidence for the additional jamo characters, to resubmit their symbols proposal with symbols which had already been accepted into Unicode removed, and to consider using [[ISO/IEC 14651]], then at final draft stage, for collation purposes.<ref name="wg2-n2282">{{cite web |url=http://unicode.org/wg2/docs/n2282.doc |date=2000-09-21 |author=Korean script ad hoc group |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2282 |title=Report of the meeting of the Korean script ad hoc group}}</ref>


In September 2001, the North Korean national body submitted a revised series of proposals requesting the addition of several KPS 9566 and KPS 10721 characters, including 70 symbol characters, to Unicode.<ref name="wg2-n2374"/><ref name="wg2-n2375">{{citation|mode=cs1 |url=https://unicode.org/wg2/docs/n2375.pdf |title=Proposal to add the 160 Compatibility Hanja code table of D P R of Korea into CJK Compatibility Ideographs |date=2001-09-03 |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2375}}</ref> In this version of the proposal, a section of document excerpts demonstrating use of several characters and short explanations of their purpose was included. The [[Workers' Party of Korea]] symbol was named the "Hammer and Sickle and Brush",<ref name="wg2-n2374"/> renamed from "Mark of the Workers' Party of Korea" in earlier versions of the proposal,<ref name="wg2-n2244" /> and justified as being used as an identifying symbol on maps.<ref name="wg2-n2374"/> As justification for the proposed characters for leaders' names, they explained that the leaders' names often appear with a different size and font weight in North Korean publications for the purpose of emphasis.<ref name="wg2-n2374"/> A follow-up by South Korean WG2 representatives requested evidence, names in Korean and justifications for adding certain of these characters, and noted that non-emphasised versions of the characters for the leaders' names already existed.<ref name="wg2-n2390" /> A meeting of North and South Korean representatives from WG2 was convened in October 2001, which recommended 47 of the symbol characters for adding to Unicode, and suggested that the leaders' names and WPK symbols be raised for further discussion by WG2.<ref name="wg2-n2392">{{citation |mode=cs1 |url=https://unicode.org/wg2/docs/n2392.pdf |title=A Report of Korean Script ad hoc group meeting on Oct. 15, 2001 |date=2001-10-16 |author=Korean Script ad hoc group |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2392, UTC L2/01-388 |access-date=2020-04-29 |archive-date=2020-08-03 |archive-url=https://web.archive.org/web/20200803022727/https://unicode.org/wg2/docs/n2392.pdf |url-status=dead }}</ref>
In September 2001, the North Korean national body submitted a revised series of proposals requesting the addition of several KPS 9566 and KPS 10721 characters, including 70 symbol characters, to Unicode.<ref name="wg2-n2374"/><ref name="wg2-n2375">{{citation|mode=cs1 |url=https://unicode.org/wg2/docs/n2375.pdf |title=Proposal to add the 160 Compatibility Hanja code table of D P R of Korea into CJK Compatibility Ideographs |date=2001-09-03 |author=Committee for Standardization of the D P R of Korea (CSK) |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2375}}</ref> In this version of the proposal, a section of document excerpts demonstrating use of several characters and short explanations of their purpose was included. The [[Workers' Party of Korea]] symbol was named the "Hammer and Sickle and Brush",<ref name="wg2-n2374"/> renamed from "Mark of the Workers' Party of Korea" in earlier versions of the proposal,<ref name="wg2-n2244" /> and justified as being used as an identifying symbol on maps.<ref name="wg2-n2374"/> As justification for the proposed characters for leaders' names, they explained that the leaders' names often appear with a different size and font weight in North Korean publications for the purpose of emphasis.<ref name="wg2-n2374"/> A follow-up by South Korean WG2 representatives requested evidence, names in Korean and justifications for adding certain of these characters, and noted that non-emphasised versions of the characters for the leaders' names already existed.<ref name="wg2-n2390" /> A meeting of North and South Korean representatives from WG2 was convened in October 2001, which recommended 47 of the symbol characters for adding to Unicode, and suggested that the leaders' names and WPK symbols be raised for further discussion by WG2.<ref name="wg2-n2392">{{citation |mode=cs1 |url=https://unicode.org/wg2/docs/n2392.pdf |title=A Report of Korean Script ad hoc group meeting on Oct. 
15, 2001 |date=2001-10-16 |author=Korean Script ad hoc group |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2392, UTC L2/01-388 |access-date=2020-04-29 |archive-date=2020-08-03 |archive-url=https://web.archive.org/web/20200803022727/https://unicode.org/wg2/docs/n2392.pdf |url-status=dead }}</ref>
A subsequent feedback document from February 2002 regarding the North Korean proposed additions requested that the "tea" symbol for a [[tea house]] be accepted as a more general "hot beverage" symbol, equating it with symbols used in guidebooks to denote hot or non-alcoholic beverages. It also recommended that the reference glyph for the existing codepoint for an umbrella without rain be modified to harmonise with the proposed reference glyph for the umbrella with rain, equating them to the "keep dry" symbols used on packaging, and raised the question of which lightning bolt and high voltage warning symbols in existing symbol collections could be unified with the proposed "high voltage" character.<ref name="utc-L2-02-102">{{cite web |url=https://www.unicode.org/L2/L2002/02102-n2417-dprk.pdf |first=Asmus |last=Freytag |date=2002-02-13 |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2417, UTC L2/02-102 |title=Notes on proposed Symbols from DPRK}}</ref> All three of these characters were accepted into Unicode in version 4.0.<ref name="emojipediaU4.0" /> It also recommended that the horizontal-barred fractions and the left-up pointing scissors be encoded using a [[Variation Selectors|variation selector]], since the scissors did not accompany a differently-oriented pair of scissors, and since the existing Unicode fraction codepoints unified the skewed and horizontal forms.<ref name="utc-L2-02-102"/>


In November 2002, the South Korean body published a set of three-way tables mapping characters between the KPS 9566, KS X 1001 (as EUC-KR) and ISO/IEC 10646 standards as they existed in 2000. These tables had been prepared without input from North Korea.<ref name="wg2n2564">{{cite web |url=https://unicode.org/wg2/docs/n2564.pdf |last=Kim |first=Kyongsok |title=National Body Position: 3-way cross-reference tables - KS X 1001, KPS 9566, and UCS |date=2002-11-30 |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2564}} [Note: updated links for tables accompanying document: [http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/ks2kp_ucs-v09.txt] {{Webarchive|url=https://web.archive.org/web/20210403091457/http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/ks2kp_ucs-v09.txt |date=2021-04-03 }} [http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/kp2ks_ucs-v09.txt] {{Webarchive|url=https://web.archive.org/web/20210403091419/http://asadal.pusan.ac.kr/%7Egimgs0/hangeul/code/3xreftbl/kp2ks_ucs-v09.txt |date=2021-04-03 }}</ref>
In November 2002, the South Korean body published a set of three-way tables mapping characters between the KPS 9566, KS X 1001 (as EUC-KR) and ISO/IEC 10646 standards as they existed in 2000. These tables had been prepared without input from North Korea.<ref name="wg2n2564">{{cite web |last=Kim |first=Kyongsok |date=2002-11-30 |title=National Body Position: 3-way cross-reference tables - KS X 1001, KPS 9566, and UCS |url=https://unicode.org/wg2/docs/n2564.pdf |id=[[ISO/IEC JTC 1/SC 2]]/WG 2 N2564}} [Note: updated links for tables accompanying document: [http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/ks2kp_ucs-v09.txt] {{Webarchive|url=https://web.archive.org/web/20210403091457/http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/ks2kp_ucs-v09.txt|date=2021-04-03}} [http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/kp2ks_ucs-v09.txt] {{Webarchive|url=https://web.archive.org/web/20210403091419/http://asadal.pusan.ac.kr/%7Egimgs0/hangeul/code/3xreftbl/kp2ks_ucs-v09.txt|date=2021-04-03}}</ref>


In August 2004, a pair of mapping tables between KPS 9566-2003 and [[Unicode]] were submitted to the [[OpenOffice.org]] project by an individual using the name "ooprojlover", who stated that they represented the updated version of the KPS 9566 standard and requested that support be added.<ref name="openoffice" /> These files mapped the characters unavailable in Unicode to the [[Private Use Area]], and included additional encoded forms for other syllable blocks outside of the main ISO-IR-202 plane. A mapping table was later published by the [[Unicode Consortium]] in 2011, based on this mapping data but with errors corrected with reference to the ISO-IR chart.<ref name="kps9566txt"/>


Copies of [[Red Star OS]] 3.0 include fonts for a more recent edition of KPS 9566, appearing to be KPS 9566-2011. The mapping table used by Red Star OS internally has been successfully extracted. Besides adding [[Kim Jong Un]] to the list of leaders, KPS 9566-2011 amends the mappings of certain vertical forms compared to the 2003 mappings (taking advantage of the [[Vertical Forms]] block added in Unicode 4.1), and also includes several additional hanja and symbols encoded outside of the ISO-IR-202 plane. Several of these additional symbols are also mapped to the Private Use Area; however, their identity is not known, since no names or reference glyphs for those characters are known outside of North Korea.<ref name="utc-L2-18-011">{{cite web |last=Chung |first=Jaemin |url=https://www.unicode.org/L2/L2018/18011-info-kps9566-2011.pdf |id=UTC L2/18-011 |title=Information on the most recent version of KPS 9566 (KPS 9566-2011?) |date=2018-01-05}}</ref>
Copies of [[Red Star OS]] 3.0 include fonts for a more recent edition of KPS 9566, appearing to be KPS 9566-2011. The mapping table used by Red Star OS internally has been successfully extracted. Besides adding [[Kim Jong Un]] to the list of leaders, KPS 9566-2011 amends the mappings of certain vertical forms compared to the 2003 mappings (taking advantage of the [[Vertical Forms]] block added in Unicode 4.1), and also includes several additional Hanja and symbols encoded outside of the ISO-IR-202 plane. Several of these additional symbols are also mapped to the Private Use Area; however, their identity is not known, since no names or reference glyphs for those characters are known outside of North Korea.<ref name="utc-L2-18-011">{{cite web |last=Chung |first=Jaemin |url=https://www.unicode.org/L2/L2018/18011-info-kps9566-2011.pdf |id=UTC L2/18-011 |title=Information on the most recent version of KPS 9566 (KPS 9566-2011?) |date=2018-01-05}}</ref>


=== Impact on Unicode today ===
The documented mappings between KPS 9566 and Unicode for the 2003<ref name="openoffice" /><ref name="kps9566txt"/> and 2011<ref name="utc-L2-18-011" /> editions of KPS 9566 use an encoding resembling an adaptation of [[Unified Hangul Code]] (UHC) to encode KPS 9566 rather than Wansung code, with their updated versions of the ISO-IR-202 plane being encoded using pairs of bytes between 0xA1 and 0xFE, and with other two-byte codes used for syllables not present in ISO-IR-202. The order of the extended syllables follows usual KPS 9566 order. Similarly to UHC, they use lead bytes 0x81 and above, and trail bytes from the ranges 0x41&ndash;0x5A, 0x61&ndash;0x7A and 0x81&ndash;0xFE, excluding the range 0xA1&ndash;0xFE if the lead byte is 0xA1 or above.<ref name="utc-L2-18-011" />
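
The byte ranges described above can be sketched as a small classifier. This is only an illustration of the stated ranges, not code taken from any KPS 9566 implementation: the function names and the three labels are hypothetical, and the upper bound of 0xFE for lead bytes is an assumption (the text only says "0x81 and above").

```python
def trail_in_extension_ranges(lead: int, trail: int) -> bool:
    """Check a trail byte against the UHC-style ranges described in the text."""
    in_letter_ranges = 0x41 <= trail <= 0x5A or 0x61 <= trail <= 0x7A
    # For lead bytes 0xA1 and above, trail bytes 0xA1..0xFE belong to the
    # main plane, so the upper extension range stops at 0xA0.
    upper = 0xA0 if lead >= 0xA1 else 0xFE
    return in_letter_ranges or 0x81 <= trail <= upper


def classify_pair(lead: int, trail: int) -> str:
    """Return 'main', 'extended' or 'invalid' (hypothetical labels)."""
    if 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE:
        return "main"  # updated ISO-IR-202 plane
    # Assumption: lead bytes run from 0x81 up to 0xFE.
    if 0x81 <= lead <= 0xFE and trail_in_extension_ranges(lead, trail):
        return "extended"
    return "invalid"
```

For example, `classify_pair(0xE5, 0xA9)` falls in the main plane, while `classify_pair(0x81, 0x41)` falls in the extended area. The sketch does not model one-off exceptions such as the code 0xAEFF used in the 2003 mapping.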


The 2011 edition also includes several additional hanja and symbols encoded outside of the ISO-IR-202 plane, after the range used for the extended syllable blocks.<ref name="utc-L2-18-011" /> This approach is similar to that taken by [[GBK (character encoding)|GBK]], but with the trail bytes remaining in the UHC-style ranges: like the extended syllables with lead bytes 0xA1 and above, these all use the trail byte ranges 0x41&ndash;0x5A, 0x61&ndash;0x7A and 0x81&ndash;0xA0. Extended hanja are encoded with lead bytes between 0xC8 and 0xDC, extended symbols are encoded using lead bytes between 0xE0 and 0xEA, and extended codes with lead bytes between 0xEC and 0xFE are mapped, without gaps, to the [[Private Use Area]]<ref name="utc-L2-18-011" /> (compare the user-defined ranges in GBK). Several of the characters in the extended symbols section and three in the hanja section are also mapped to the Unicode Private Use Area; unlike the PUA-mapped symbols in the main ISO-IR-202 plane, the identity of these characters is unknown.<ref name="utc-L2-18-011" />
The 2011 edition also includes several additional Hanja and symbols encoded outside of the ISO-IR-202 plane, after the range used for the extended syllable blocks.<ref name="utc-L2-18-011" /> This approach is similar to that taken by [[GBK (character encoding)|GBK]], but with the trail bytes remaining in the UHC-style ranges: like the extended syllables with lead bytes 0xA1 and above, these all use the trail byte ranges 0x41&ndash;0x5A, 0x61&ndash;0x7A and 0x81&ndash;0xA0. Extended Hanja are encoded with lead bytes between 0xC8 and 0xDC, extended symbols are encoded using lead bytes between 0xE0 and 0xEA, and extended codes with lead bytes between 0xEC and 0xFE are mapped, without gaps, to the [[Private Use Area]]<ref name="utc-L2-18-011" /> (compare the user-defined ranges in GBK). Several of the characters in the extended symbols section and three in the Hanja section are also mapped to the Unicode Private Use Area; unlike the PUA-mapped symbols in the main ISO-IR-202 plane, the identity of these characters is unknown.<ref name="utc-L2-18-011" />
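
As a rough sketch of the layout just described, the lead-byte ranges of the 2011 extension area can be expressed as a lookup. The names `EXTENSION_REGIONS` and `extension_region` and the region labels are hypothetical; lead bytes that the text does not assign (for example 0xDD to 0xDF) are simply reported as `None` here, which is an assumption rather than something the sources state.

```python
from typing import Optional

# (first lead byte, last lead byte, label) as given in the paragraph above.
EXTENSION_REGIONS = [
    (0xC8, 0xDC, "extended hanja"),
    (0xE0, 0xEA, "extended symbols"),
    (0xEC, 0xFE, "private use"),
]


def extension_region(lead: int, trail: int) -> Optional[str]:
    """Name the 2011 extension region for a two-byte code, if any."""
    # Trail bytes stay within the UHC-style ranges below 0xA1.
    trail_ok = (
        0x41 <= trail <= 0x5A
        or 0x61 <= trail <= 0x7A
        or 0x81 <= trail <= 0xA0
    )
    if not trail_ok:
        return None
    for first, last, label in EXTENSION_REGIONS:
        if first <= lead <= last:
            return label
    return None
```

With these ranges, the code 0xE98E (where the 2011 edition places the halfwidth yen sign) is reported as an extended symbol, whereas a main-plane code such as 0xE9A9 is not part of any extension region.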


== Lead byte ==
This chart details the overall layout of the main plane of the KPS 9566 character set by lead byte.<ref name="ir202"/> For lead bytes used for characters other than composed chosŏn'gŭl syllables or hanja, links are provided to charts on this page listing the characters encoded under that lead byte. For lead bytes used for hanja, links are provided to the appropriate section of [[Wiktionary]]'s hanja index.
This chart details the overall layout of the main plane of the KPS 9566 character set by lead byte.<ref name="ir202"/> For lead bytes used for characters other than composed Chosŏn'gŭl syllables or Hanja, links are provided to charts on this page listing the characters encoded under that lead byte. For lead bytes used for Hanja, links are provided to the appropriate section of [[Wiktionary]]'s Hanja index.


Where two hexadecimal numbers are given, the value below 0x7F is used in a 7-bit encoding,{{efn|For instance, the headings of the ISO-IR-202 chart show 7-bit binary codes, as well as kuten/hang-yol codes, for the characters.<ref name="ir202"/>}} and the larger value (between 0xA1 and 0xFE) is used in an 8-bit [[Extended Unix Code|EUC]]-style encoding.<ref name="lunde2009-euc"/> The extended [[Unified Hangul Code|UHC]]-style 8-bit encodings defined by the 2003 edition onwards likewise use the larger byte values, between 0xA1 and 0xFE inclusive, for the main ISO-IR-202-based plane.<ref name="kps9566txt"/><ref name="utc-L2-18-011" />
|{{chset-cell1|u=2756|2-80 U+2756 BLACK DIAMOND MINUS WHITE X|[[❖]]}}
|{{chset-cell1|2-81 U+F10D Private Use|{{lang|ko-KP|[[File:KPS 9566 02-81.svg|12px|alt=]]}}|fn={{efn|name=triangle|1=Mapped to Private Use Area, shown here using an image.}}{{efn|[[x-mac-korean|Mac OS Korean]] (HangulTalk), an encoding of [[KS X 1001|Wansung code]] plus extension sets, encodes a visually similar character at 0xA79B,<ref name="lunde2009appE">{{citation|mode=cs1 |title=Appendix E: Vendor Character Set Standards |work=CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing |last=Lunde |first=Ken |authorlink=Ken Lunde |year=2009 |edition=2nd |publisher=[[O'Reilly Media|O'Reilly]] |location=[[Sebastopol, CA]] |isbn=978-0-596-51447-1 |url=https://resources.oreilly.com/examples/9780596514471/blob/master/cjkvip2e-appE.pdf}}</ref> which Apple maps to the Unicode sequence U+25B4+20E4 (▴⃤).<ref>{{cite web |url=https://unicode.org/Public/MAPPINGS/VENDORS/APPLE/KOREAN.TXT |author=Apple |author-link=Apple, Inc |title=Map (external version) from Mac OS Korean encoding to Unicode 3.2 and later |date=2005-04-05 |publisher=[[Unicode Consortium]]}}</ref> There is no documented use of this mapping for the KPS 9566 character, however.}}|style=background:#FEE}}
|{{chset-cell1|u=1CC81|2-82 U+1CC81 STRIPED UP-POINTING TRIANGLE|[[File:KPS 9566 02-82.svg|12px|alt=𜲁]]|fn={{efn|Accepted for inclusion in Unicode 16.0.<ref>{{cite web |url=https://www.unicode.org/charts/PDF/Unicode-16.0/U160-1CC00.pdf |access-date=2024-05-27 |title=Symbols for Legacy Computing Supplement |work=DRAFT The Unicode Standard, Version 16.0 BETA REVIEW |institution=[[Unicode Consortium]]}}</ref>}}}}
|{{chset-cell1|2-82 U+F10E Private Use|{{lang|ko-KP|[[File:KPS 9566 02-82.svg|12px|alt=]]}}|fn={{efn|name=triangle}}|style=background:#FEE}}
|{{chset-cell1|2-83 U+F10F Private Use|{{lang|ko-KP|[[File:KPS 9566 02-83.svg|12px|alt=]]}}|fn={{efn|name=triangle}}|style=background:#FEE}}
|{{chset-cell1|2-84 U+F110 Private Use|{{lang|ko-KP|[[File:KPS 9566 02-84.svg|12px|alt=]]}}|fn={{efn|name=triangle}}|style=background:#FEE}}
This row is omitted from the mapping for the 2011 edition of the standard,<ref name="utc-L2-18-011" /> indicating it may have been removed at some point after the 2003 edition. The halfwidth yen sign is instead encoded at [[#ext0xE9|0xE98E]] in the 2011 edition.<ref name="utc-L2-18-011" />


The [[required space]] would fall outside of the 94-character range, colliding with the area used for extended chosŏn'gŭl syllables when a [[Unified Hangul Code|UHC]]-style encoding is used (specifically, with the syllable 쁲),<ref name="kps9566txt" /> and is omitted. Although the [[ÿ|y with trema]] also falls outside the 94-character range, and the trail byte 0xFF is otherwise unused, the code 0xAEFF is mapped to it in KPS 9566-2003.<ref name="kps9566txt" />
The [[required space]] would fall outside of the 94-character range, colliding with the area used for extended Chosŏn'gŭl syllables when a [[Unified Hangul Code|UHC]]-style encoding is used (specifically, with the syllable 쁲),<ref name="kps9566txt" /> and is omitted. Although the [[ÿ|y with trema]] also falls outside the 94-character range, and the trail byte 0xFF is otherwise unused, the code 0xAEFF is mapped to it in KPS 9566-2003.<ref name="kps9566txt" />


{|{{chset-table-header1|KPS 9566-2003 (prefixed with 0x2E/0xAE)}}
{|{{chset-table-header1|KPS 9566-2003 (prefixed with 0x2E/0xAE)}}
|colspan="15" style="background:#DDD;font-size:85%"|(user-defined area)
|{{chset-cell1|||style=background:#DDD}}
|}

===Statistics by jamo===
{|
| style="vertical-align:top"|
; Initial consonants
{| class="wikitable sortable"
!Jamo!!Count
|-
|ㄱ||186
|-
|ㄴ||157
|-
|ㄷ||151
|-
|ㄹ||144
|-
|ㅁ||151
|-
|ㅂ||151
|-
|ㅅ||177
|-
|ㅈ||155
|-
|ㅊ||125
|-
|ㅋ||122
|-
|ㅌ||114
|-
|ㅍ||113
|-
|ㅎ||153
|-
|ㄲ||138
|-
|ㄸ||104
|-
|ㅃ||85
|-
|ㅆ||119
|-
|ㅉ||111
|-
|ㅇ||223
|-
!Total!!2679
|}
| style="vertical-align:top"|
; Vowels
{| class="wikitable sortable"
!Jamo!!Count
|-
|ㅏ||255
|-
|ㅑ||101
|-
|ㅓ||232
|-
|ㅕ||144
|-
|ㅗ||200
|-
|ㅛ||80
|-
|ㅜ||184
|-
|ㅠ||98
|-
|ㅡ||185
|-
|ㅣ||195
|-
|ㅐ||176
|-
|ㅒ||30
|-
|ㅔ||168
|-
|ㅖ||52
|-
|ㅚ||115
|-
|ㅟ||111
|-
|ㅢ||59
|-
|ㅘ||102
|-
|ㅝ||76
|-
|ㅙ||58
|-
|ㅞ||58
|-
!Total!!2679
|}
| style="vertical-align:top"|
; Final consonants
{| class="wikitable sortable"
!Jamo!!Count
|-
|(none)||391
|-
|ㄱ||226
|-
|ㄳ||7
|-
|ㄴ||317
|-
|ㄵ||3
|-
|ㄶ||10
|-
|ㄷ||51
|-
|ㄹ||288
|-
|ㄺ||26
|-
|ㄻ||50
|-
|ㄼ||11
|-
|ㄽ||3
|-
|ㄾ||5
|-
|ㄿ||4
|-
|ㅀ||15
|-
|ㅁ||250
|-
|ㅂ||233
|-
|ㅄ||3
|-
|ㅅ||224
|-
|ㅇ||264
|-
|ㅈ||29
|-
|ㅊ||18
|-
|ㅋ||5
|-
|ㅌ||31
|-
|ㅍ||40
|-
|ㅎ||26
|-
|ㄲ||16
|-
|ㅆ||133
|-
!Total!!2679
|}
|}
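
Counts like those in the tables above can be reproduced for any list of precomposed syllables by using the standard arithmetic layout of the Unicode Hangul Syllables block (19 initials, 21 vowels and 28 final slots starting at U+AC00). The following is a minimal sketch under that assumption; the function names are hypothetical, the jamo are listed in Unicode order rather than in North Korean order, and the syllable list for KPS 9566 itself is not included.

```python
from collections import Counter
from typing import Iterable, Tuple

# Compatibility jamo in the order used by the Unicode Hangul Syllables block.
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
FINALS = ["(none)"] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(syllable: str) -> Tuple[str, str, str]:
    """Split one precomposed syllable into (initial, vowel, final)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(index, 21 * 28)
    vowel, final = divmod(rest, 28)
    return INITIALS[initial], VOWELS[vowel], FINALS[final]


def tally_jamo(syllables: Iterable[str]) -> Tuple[Counter, Counter, Counter]:
    """Count initials, vowels and finals over a collection of syllables."""
    initials, vowels, finals = Counter(), Counter(), Counter()
    for syllable in syllables:
        i, v, f = decompose(syllable)
        initials[i] += 1
        vowels[v] += 1
        finals[f] += 1
    return initials, vowels, finals
```

Feeding the function the syllables of the main plane, taken from one of the mapping tables, should give three counters whose totals are each equal to the number of syllables, as in the tables above.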


{{hatnote|See [[wikt:Appendix:Korean Hanja by KPS 9566 hangyol code|Appendix:Korean Hanja by KPS 9566 hangyol code]] on Wiktionary.}}


The hanja at 69-09 (0xE5A9) is mapped to U+676E {{linktext|杮|lang=zh}} ([[wood shavings]]) in all documented tables; characters are, however, ordered according to their readings, from which it appears that it is intended to be U+67FF {{linktext|柿|lang=zh}} ([[persimmon]]) instead.<ref>{{cite web |last=Chung |first=Jaemin |url=https://www.unicode.org/L2/L2021/21059-irgn2479-mapping.pdf |id=UTC L2/21-059 |title=KP0-E5A9 should be mapped to U+67FF instead of U+676E |date=2021-03-17}}</ref>
The Hanja at 69-09 (0xE5A9) is mapped to U+676E {{linktext|杮|lang=zh}} in all documented tables; characters are, however, ordered according to their readings, from which it appears that it is intended to be U+67FF {{linktext|柿|lang=zh}} instead.<ref>{{cite web |last=Chung |first=Jaemin |url=https://www.unicode.org/L2/L2021/21059-irgn2479-mapping.pdf |id=UTC L2/21-059 |title=KP0-E5A9 should be mapped to U+67FF instead of U+676E |date=2021-03-17}}</ref>


== Extended non-syllable, non-hanja sets in KPS 9566:2011 ==
== Extended non-syllable, non-Hanja sets in KPS 9566-2011 ==
Following are charts for the non-syllable, non-hanja section of KPS 9566-2011 outside of the main plane.<ref name="utc-L2-18-011" />
Following are charts for the non-syllable, non-Hanja section of KPS 9566-2011 outside of the main plane.<ref name="utc-L2-18-011" />


=== {{anchor|ext0xE0}}Extension set 0xE0 (symbols and pictographs) ===


== External links ==
* [https://www.itscj-ipsj.jp/ir/202.pdf KPS 9566-97 code table] {{Webarchive|url=https://web.archive.org/web/20220310001705/https://www.itscj-ipsj.jp/ir/202.pdf |date=2022-03-10 }} from [[ISO-IR]] registry
* [https://itscj.ipsj.or.jp/ir/202.pdf KPS 9566-97 code table] from [[ISO-IR]] registry
* [http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/kp2ks_ucs-v09.txt Three-way mappings between EUC-KP (KPS 9566), EUC-KR and Unicode as of 2000] {{Webarchive|url=https://web.archive.org/web/20210403091419/http://asadal.pusan.ac.kr/%7Egimgs0/hangeul/code/3xreftbl/kp2ks_ucs-v09.txt |date=2021-04-03 }} (file in EUC-KR; note typographical error mapping 0xA1BA to {{unichar|3090}} rather than {{unichar|309C}})
* [http://asadal.pusan.ac.kr/~gimgs0/hangeul/code/3xreftbl/kp2ks_ucs-v09.txt Three-way mappings between EUC-KP (KPS 9566), EUC-KR and Unicode as of 2000] {{Webarchive|url=https://web.archive.org/web/20210403091419/http://asadal.pusan.ac.kr/%7Egimgs0/hangeul/code/3xreftbl/kp2ks_ucs-v09.txt |date=2021-04-03 }} (file in EUC-KR; note error mapping 0xB8F0 to {{unichar|BCD2}} instead of {{unichar|BCD3}})
* [https://unicode.org/Public/MAPPINGS/VENDORS/MISC/KPS9566.TXT KPS 9566-2003 to Unicode mapping]
* [https://www.unicode.org/L2/L2018/18011-info-kps9566-2011.pdf KPS 9566-2011 code table and mapping] reverse engineered from [[Red Star OS]]

Latest revision as of 17:34, 15 November 2024

KPS 9566
Alias(es): ISO-IR-202 (1997 version)
Standard: KPS 9566
Current status: Used only in North Korea.

KPS 9566 ("DPRK Standard Korean Graphic Character Set for Information Interchange")[2] is a North Korean standard specifying a character encoding for the Chosŏn'gŭl (Hangul) writing system used for the Korean language. The edition of 1997 specified an ISO 2022-compliant 94×94 two-byte coded character set. Subsequent editions have added additional encoded characters outside of the 94×94 plane, in a manner comparable to UHC or GBK.[3]

KPS 9566 differs in approach from KS X 1001, its South Korean counterpart, in using a different ordering of Chosŏn'gŭl,[4] in encoding explicit vertical presentation forms of punctuation, in not encoding duplicate Hanja for multiple readings, and in including several characters specific to the North Korean political system, including special encodings for the names of the country's past and present leaders (Kim Il Sung, Kim Jong Il and Kim Jong Un).[1][2][3][5]

Although KPS 9566 was the original source of several characters added to Unicode,[6] not all KPS 9566 characters have Unicode equivalents. Those which do not are mapped to similar Unicode characters or to the Private Use Area.[7]
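
To illustrate the distinction, a mapping can be split into entries that land on ordinary Unicode characters and entries that land in the Private Use Area. The sketch below uses Python's standard `unicodedata` module; the sample dictionary is a small hand-picked excerpt of row 2 as charted on this page, and the names `SAMPLE_MAPPING` and `split_by_private_use` are hypothetical.

```python
import unicodedata

# Hang-yol code -> Unicode code point, excerpted from the row 2 chart.
SAMPLE_MAPPING = {
    "2-80": 0x2756,  # BLACK DIAMOND MINUS WHITE X
    "2-81": 0xF10D,  # Private Use
    "2-83": 0xF10F,  # Private Use
    "2-84": 0xF110,  # Private Use
}


def is_private_use(code_point: int) -> bool:
    """True when the code point has general category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"


def split_by_private_use(mapping):
    """Return (standard, private) dictionaries for a code -> code point map."""
    standard = {k: v for k, v in mapping.items() if not is_private_use(v)}
    private = {k: v for k, v in mapping.items() if is_private_use(v)}
    return standard, private
```

Text containing the private-use code points is only interpretable with a font or agreement that assigns them the KPS 9566 meaning, which is why such mappings are tracked separately here.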

Background and other standards


The ASCII character set originated in the United States in 1963, and was revised in 1967 to the form it has today.[8] ASCII also became accepted as an international standard in 1967, becoming ECMA-6,[8] designated ISO/IEC 646 by the International Organization for Standardization.[9] It is presently designated ANSI X3.4-1986 and ISO 646:1991.[10] ASCII was a 7-bit, single-byte encoding including 94 graphical characters, the space, and 33 control codes, which provided basic support for representing American English text as a series of bytes.[8][10]

The next edition of ISO 646, published in 1972, revised the standard to introduce the concept of national versions of the code, allowing countries to replace a few less commonly used codes with their own required characters. At the same time, work on defining extension mechanisms for ASCII was underway, with the intention of being applicable to both 7-bit and 8-bit environments. This was completed in 1973 and published as JIS X 0202, ECMA-35 and ISO 2022.[11] ISO 2022 specifies mechanisms for using single-byte and multiple-byte character sets with a certain structure in both 7-bit and 8-bit environments, and for declaring and switching between them in a standard fashion using shift codes and escape sequences.[12]

Countries in East Asia, due to using large repertoires of Chinese characters, introduced standardised double-byte encodings (DBCS) for their writing systems, since the number of characters representable in a single-byte code was not sufficient. In an ISO 2022 compliant DBCS, every character can be represented with two ASCII printing character bytes; the location of a character can be referenced by these byte values, or by two numbers from 1 to 94 (a kuten), equal to the respective bytes minus 32.[13] The first registered ISO 2022 compliant DBCS, and the first East Asian DBCS to be established as a national standard, was the first edition of JIS X 0208 (Japan), published in 1978.[14][15] This was followed by GB 2312 (Mainland China) in 1980, and by Wansung code (South Korea; first designated KS C 5601-1987) in 1987.[16][15] Big5 (Taiwan), defined in 1984, did not follow the ISO 2022 structure.[16] When used in an 8-bit (rather than 7-bit) environment, GB 2312 and Wansung code were usually used with the eighth bit set, with ASCII or a similar SBCS used with the eighth bit unset; these encoding schemes are known as EUC-CN and EUC-KR, respectively.[17]
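
The relationship between a kuten pair and its byte values can be made concrete with a short sketch. It follows the arithmetic stated above (byte = number + 32, with the eighth bit set in EUC-style 8-bit use); the function names are hypothetical and the code is illustrative rather than taken from any encoder.

```python
from typing import Tuple


def kuten_to_bytes(row: int, cell: int, eight_bit: bool = True) -> bytes:
    """Convert a 1-based (row, cell) pair to its two-byte code."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    # 7-bit form: each byte is the kuten number plus 32, giving 0x21..0x7E.
    lead, trail = row + 0x20, cell + 0x20
    if eight_bit:
        # EUC-style form: the same bytes with the eighth bit set (0xA1..0xFE).
        lead, trail = lead | 0x80, trail | 0x80
    return bytes([lead, trail])


def bytes_to_kuten(code: bytes) -> Tuple[int, int]:
    """Recover (row, cell) from either the 7-bit or the 8-bit form."""
    if len(code) != 2 or (code[0] ^ code[1]) & 0x80:
        raise ValueError("expected two bytes with matching eighth bits")
    lead, trail = code[0] & 0x7F, code[1] & 0x7F
    if not (0x21 <= lead <= 0x7E and 0x21 <= trail <= 0x7E):
        raise ValueError("bytes fall outside the 94x94 plane")
    return lead - 0x20, trail - 0x20
```

As a worked case, the hanja at 69-09 in KPS 9566 corresponds to the 8-bit code 0xE5A9, since 69 + 32 = 0x65 and 9 + 32 = 0x29, and setting the eighth bit of each gives 0xE5 and 0xA9.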

Although the Korean writing system includes individual symbols (jamo) for consonants and vowels, serving as an alphabet, Korean text is properly typeset with these symbols composed into blocks for each syllable. Wansung code included individual Korean syllable blocks separately, treating them as a large set of characters similarly to Hanja,[18] and was first defined by the third edition of the South Korean standard KS C 5601. The first edition had defined an encoding of individual jamo which allowed syllable blocks to be encoded as sequences, which was named N-byte Hangul, and had not been adopted as widely as intended.[19][20]

Wansung code did not encode all possible modern Korean syllables, only a selection of the 2350 most common,[2] although it allowed the remainder to be specified using combining sequences, which often were not supported.[18] An alternative South Korean encoding named Johab did encode all of them, and served as a competitor to Wansung for some time.[19] Unified Hangul Code (UHC), introduced by Microsoft with Windows 95, extended EUC-KR by using double-byte codes that are invalid in EUC-KR to represent all of the other syllables available in Johab.[18] A similar approach was taken by the Mainland Chinese GBK encoding, which extended GB 2312 with support for Traditional Chinese and for less common Chinese characters by assigning them double-byte codes that are invalid in EUC-CN.[16]

South Korea was not the only country developing an ISO 2022 DBCS for Korean: the Mainland Chinese GB 12052 was published in 1989. This was not closely related to Wansung code, although it also included composed syllables. Instead, it corresponded to GB 2312 with Korean syllables (and 94 Hanja) replacing the Chinese characters, except for the inclusion of a dollar sign in place of a yuan sign. It was developed for use by the Korean minority in north-eastern China.[2]

Likewise, North Korea developed KPS 9566. Although North Korea and South Korea both use Korean Chosŏn'gŭl (Hangul) as their primary writing system, they use different lexicographical orders.[21] Hence, character ordering differs between Wansung code and KPS 9566.[4]

KPS 9566 has undergone several revisions, including editions of 1997 and 2003,[22] mainly to enhance compatibility with Unicode. These are commonly indicated by specifying the year (e.g. KPS 9566-97, 9566-2003). The current edition as of the release of Red Star OS 3.0 appears to be KPS 9566-2011, which adds Kim Jong Un to the list of leaders.[3] The publicly available code chart for the 1997 edition of KPS 9566 shows an ISO 2022 94×94 plane.[23] The more recent editions, from what sources of information are available outside of North Korea itself, appear to define additional allocations outside of the EUC plane (similarly to GBK or UHC).[3]

Due to the interoperability issues arising from the use of multiple national standards and platform- or font-specific proprietary character encodings, the Unicode standard was developed with the intent of allowing all representable text to be interchanged in a single, universal format. The first edition of Unicode was published in 1991 and 1992,[24] and ISO/IEC 10646 was established in sync with Unicode in 1993.[25] Unicode formats are preferred for international use on the World Wide Web, where legacy character encodings are treated as partial encodings of Unicode by means of mapping files.[26][27]

Design


In principle, KPS 9566 is similar to the Wansung character set defined by the South Korean KS X 1001 standard, although the two are not compatible. Both encode a section of punctuation, symbols, jamo, kana and alphabetical characters, followed by a subset of the possible modern Chosŏn'gŭl syllables, followed by a section of Hanja.[2] However, KPS 9566 uses a different ordering of jamo and syllables to conform with North Korean lexicographical ordering standards.[4] KPS 9566 also includes 28 explicitly rotated punctuation characters for vertical typography, which KS X 1001 does not, and encodes each Hanja only once, whereas KS X 1001 encodes several Hanja with multiple readings multiple times.[2]

KPS 9566-97 encodes a total of 2679 Chosŏn'gŭl syllables and 4653 Hanja. This provides better coverage than the 2350 syllables encoded by Wansung code: for instance, the 똠 character used in the name of 똠방각하, a noted Korean literary work, does not have an assigned Wansung codepoint, but has one (38-02) in KPS 9566.[2] The Hanja section includes 4652 characters from the Unified Repertoire and Ordering and one from CJK Unified Ideographs Extension A. The entirety of row 15, the latter half of row 44 (after the syllables block) and the latter half of row 94 (after the Hanja block) may be used for user-defined purposes.[23][2]

KPS 9566 is especially distinguished by its inclusion of several special characters from North Korean political life. Specifically, it includes the hammer, sickle and brush emblem of the Workers' Party of Korea, both uncircled and circled[7] (code points 12-01 and 12-02),[23] and two groups of three special-purpose characters which spell out the names of the North Korean leaders Kim Il Sung (김일성) and Kim Jong Il (김정일) in a special decorative font (code points 04-72 to 04-74 and 04-75 to 04-77, respectively).[28] The syllables for Kim and Il, which are identical in the spelling of both names, are encoded twice. KPS 9566-2011 additionally includes the name of Kim Jong Un (김정은) as code points 04-78 to 04-80.[3][5]
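As a worked example, the row-cell code points quoted above can be converted to their EUC-style 8-bit byte pairs by adding 0xA0 to each number. The sketch below assumes the "row-cell" decimal notation used in this section; the helper names `parse_code_point` and `euc_bytes` are hypothetical.

```python
def parse_code_point(notation: str) -> tuple[int, int]:
    """Parse a 'row-cell' string such as '04-72' into two integers."""
    row_text, cell_text = notation.split("-")
    row, cell = int(row_text), int(cell_text)
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError(f"{notation!r} is outside the 94x94 plane")
    return row, cell


def euc_bytes(notation: str) -> bytes:
    """Return the EUC-style 8-bit byte pair for a 'row-cell' code point."""
    row, cell = parse_code_point(notation)
    return bytes([0xA0 + row, 0xA0 + cell])


# First and last code point of each group named in the text above.
# The Kim Jong Un range applies to KPS 9566-2011 only.
SPECIAL_RANGES = {
    "Workers' Party of Korea emblem": ("12-01", "12-02"),
    "Kim Il Sung": ("04-72", "04-74"),
    "Kim Jong Il": ("04-75", "04-77"),
    "Kim Jong Un": ("04-78", "04-80"),
}

for label, (first, last) in SPECIAL_RANGES.items():
    print(f"{label}: {euc_bytes(first).hex()} to {euc_bytes(last).hex()}")
```

By the same arithmetic, the syllable 똠 at 38-02 would occupy the byte pair 0xC6 0xA2 in an 8-bit encoding.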

Due to these special characters, there is currently no full round-trip compatibility between KPS 9566 and Unicode, unless unsupported characters are mapped to the Private Use Area.[1]

KPS 10721


North Korea also developed a second character set, KPS 10721 "Code of the supplementary Korean Hanja Set for Information Interchange", which was published in 2000. KPS 10721 encodes a set of at least 19469 Hanja[2] additional to those included in KPS 9566. As of 2009, these did not all have mappings to Unicode, but included 10358 from the Unified Repertoire and Ordering, 3187 from CJK Unified Ideographs Extension A and 107 from CJK Compatibility Ideographs (all in the Basic Multilingual Plane), as well as 5767 from CJK Unified Ideographs Extension B and 50 from CJK Compatibility Ideographs Supplement (in the Supplementary Ideographic Plane).[2] All KPS 9566 Hanja are also included in KPS 10721,[29] which uses a different encoding structure, unrelated to ISO 2022.

Besides the mapping of these Hanja (excluding those also in KPS 9566)[29] to Unicode, little was known about the KPS 10721 standard outside of North Korea[2][5] prior to 2022. North Korean reference glyphs were provided for only a subset of these Hanja in the Unicode code charts, due to a lack of suitable font data available to the Unicode Consortium.[30][29] Unicode Hanja characters with KPS 9566 or KPS 10721 sources are nonetheless cross-referenced to their KPS codes in the Unihan database with the key kIRG_KPSource; the Unihan source codes use "KP0" to refer to KPS 9566 and "KP1" for KPS 10721.[31]
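The "KP0" and "KP1" prefixes can be used to tell the two source standards apart when reading kIRG_KPSource values. The sketch below assumes that each value consists of the prefix, a hyphen and a hexadecimal code; the function name `describe_kp_source` is hypothetical, and the sample value in the usage note is made up for illustration rather than copied from the Unihan database.

```python
# Prefixes as described in the text above.
SOURCE_STANDARDS = {"KP0": "KPS 9566", "KP1": "KPS 10721"}


def describe_kp_source(value: str) -> tuple[str, int]:
    """Split a kIRG_KPSource value into (standard name, numeric code)."""
    prefix, _, code_text = value.partition("-")
    if prefix not in SOURCE_STANDARDS:
        raise ValueError(f"unrecognised source prefix in {value!r}")
    # Assumption: the part after the hyphen is a hexadecimal code.
    return SOURCE_STANDARDS[prefix], int(code_text, 16)
```

For instance, `describe_kp_source("KP0-D0A1")` would return `("KPS 9566", 0xD0A1)` for that illustrative value.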

In 2022, a Hanja font was isolated from the North Korean Okpyon Android app, which was used to correct some errors in the KPS-10721-to-Unicode mapping data and to supply new North Korean reference glyphs for the Unicode code charts; while doing so, the mappings of KPS 9566 Hanja to KPS 10721 were also deduced.[29][32] The existing reference glyphs were updated in Unicode 15 in September 2022,[33] while the Unicode Consortium's CJK and Unihan Group recommended in November 2022 that the Unicode Technical Committee include the additional reference glyphs in the next version of Unicode,[34] to be included in Unicode 15.1 in September 2023.[35]

Documentation and relationship to Unicode


Unicode's initial coverage of Korean syllables, added in version 1.0, was based on Wansung code. In Unicode version 2.0, a new block of Korean syllables (the present Hangul Syllables block) was added, based on the syllable repertoire available in Johab, and the previous block was deleted (it is now occupied by CJK Unified Ideographs Extension A). This was done under the assumption that no Unicode-encoded Korean data existed yet, but became known as the "Korean mess", and the responsible committees pledged not to make such an incompatible change in the future,[36] a pledge codified by the Unicode Stability Policy.[37]

The code chart for KPS 9566-97, published April 1997,[2] was submitted to the ISO International Register of Coded Character Sets for registration for use with ISO/IEC 2022. It was registered in June 1998 with the number ISO-IR-202. This code chart is publicly available from the Information Processing Society of Japan.[23]

In August 1999, the North Korean national body submitted a document to WG2 (ISO/IEC JTC 1/SC 2 Working Group 2), the ISO body responsible for ISO/IEC 10646, the international standard corresponding to Unicode. This document requested the addition of the KPS 9566 codes to the existing cross-references from the CJK Unified Ideographs charts, the addition of 80 symbol characters from KPS 9566 which did not have existing Unicode mappings, a resolution to the difference in collation order between KPS 9566 and Unicode (due to the order of the characters in Unicode following the South Korean encodings) and the addition of 8 combining jamo. It also requested for WG2 to edit the existing Unicode character and block names to use the term "Korean character" rather than "Hangul".[38] An expanded version of this proposal, broken into several documents, was submitted as a work item in December 1999.[39]

A detailed response was submitted by the Swedish representative in March 2000, opposing several of the points and elaborating on Sweden's vote against the proposal. This response stated that changing the encoding of the Korean characters again would cause major disruption, even more so than the first time, which was done when comparatively few implementations existed, but which in retrospect should not have been done. It explained that few or no languages can be collated correctly by code point value, and that a tailoring for the Unicode Collation Algorithm or ISO/IEC 14651 (then being drafted) should be used for that purpose, and that normative names of characters already assigned cannot be changed, due to the stability policy, although non-normative translations to other languages can be employed. It suggested that a machine-readable mapping file between Unicode and KPS 9566 could be provided by the North Korean body itself, and would be more useful than a printed cross-reference in the standard document. Regarding the proposed additional characters, the response stated that characters which would have compatibility decompositions in Unicode should not be added and that logos, including those of political parties, and special characters for names of particular people should not be added.[40]

In July 2000, the North Korean body wrote to WG2, accusing them of developing both versions of the Unicode encoding for Korean on the basis of South Korean proposals only, without consulting North Korea, accusing them of putting the commercial interests of companies and fears of international confusion over respect for North Korea's sovereignty, and stating that North Korea would regard further refusal to change the name and order of the Korean characters in Unicode as an insult to their sovereign dignity and as compromising the ISO's claims to impartiality. They re-iterated their demand for WG2 and Unicode to "correct" the order of the Korean characters, and to "correct" the names "Hangul Jamo" and "Hangul Syllable" to "Korean Alphabet" and "Korean Syllable".[4]

In August 2000, the North Korean national body submitted a more detailed version of their requests in a series of five consecutive proposals. These requested the addition of 14 additional jamo characters,[41] the addition of 82 symbol characters,[42] and the use of the term "Korean alphabet" instead of "Hangul",[43] provided supporting evidence for the North Korean collation order,[21] and requested addition of the North Korean Hanja repertoire.[44] These proposals were discussed in two meetings between North Korean, South Korean, Swedish and other WG2 representatives in September 2000, in which the North Korean body was asked to provide manuscript evidence for the additional jamo characters, to resubmit their symbols proposal with symbols which had already been accepted into Unicode removed, and to consider using ISO/IEC 14651, then at final draft stage, for collation purposes.[45]

In September 2001, the North Korean national body submitted a revised series of proposals requesting the addition of several KPS 9566 and KPS 10721 characters, including 70 symbol characters, to Unicode.[46][47] In this version of the proposal, a section of document excerpts demonstrating use of several characters and short explanations of their purpose was included. The Workers' Party of Korea symbol was named the "Hammer and Sickle and Brush",[46] renamed from "Mark of the Workers' Party of Korea" in earlier versions of the proposal,[42] and justified as being used as an identifying symbol on maps.[46] As justification for the proposed characters for leaders' names, they explained that the leaders' names often appear with a different size and font weight in North Korean publications for the purpose of emphasis.[46] A follow-up by South Korean WG2 representatives requested evidence, names in Korean and justifications for adding certain of these characters, and noted that non-emphasised versions of the characters for the leaders' names already existed.[48] A meeting of North and South Korean representatives from WG2 was convened in October 2001, which recommended 47 of the symbol characters for adding to Unicode, and suggested that the leaders' names and WPK symbols be raised for further discussion by WG2.[49]

A subsequent feedback document from February 2002 regarding the North Korean proposed additions requested that the "tea" symbol for a tea house be accepted as a more general "hot beverage" symbol, equating it with symbols used in guidebooks to denote hot or non-alcoholic beverages. It also recommended that the reference glyph for the existing codepoint for an umbrella without rain be modified to harmonise with the proposed reference glyph for the umbrella with rain, equating them to the "keep dry" symbols used on packaging, and raised the question of which lightning bolt and high voltage warning symbols in existing symbol collections could be unified with the proposed "high voltage" character.[50] All three of these characters were accepted into Unicode in version 4.0.[51] It also recommended that the horizontal-barred fractions and the left-up pointing scissors be encoded using a variation selector, since the scissors did not accompany a differently-oriented pair of scissors, and since the existing Unicode fraction codepoints unified the skewed and horizontal forms.[50]

In November 2002, the South Korean body published a set of three-way tables mapping characters between the KPS 9566, KS X 1001 (as EUC-KR) and ISO/IEC 10646 standards as they existed in 2000. These tables had been prepared without input from North Korea.[52]

In August 2004, a pair of mapping tables between KPS 9566-2003 and Unicode were submitted to the OpenOffice.org project by an individual using the name "ooprojlover", who stated that they represented the updated version of the KPS 9566 standard and requested that support be added.[22] These files mapped the characters unavailable in Unicode to the Private Use Area, and included additional encoded forms for other syllable blocks outside of the main ISO-IR-202 plane. A mapping table was later published by the Unicode Consortium in 2011, based on this mapping data but with errors corrected with reference to the ISO-IR chart.[1]

Copies of Red Star OS 3.0 include fonts for a more recent edition of KPS 9566, appearing to be KPS 9566-2011. The mapping table used by Red Star OS internally has been successfully extracted. Besides adding Kim Jong Un to the list of leaders, KPS 9566-2011 amends the mappings of certain vertical forms compared to the 2003 mappings (taking advantage of the Vertical Forms block added in Unicode 4.1), and also includes several additional Hanja and symbols encoded outside of the ISO-IR-202 plane. Several of these additional symbols are also mapped to the Private Use Area; however, their identity is not known, since no names or reference glyphs for those characters are known outside of North Korea.[3]

Impact on Unicode today


Several current Unicode characters were added to Unicode 4.0 as a result of the North Korean proposals, although not always at the original proposed codepoints. These include HOT BEVERAGE (☕, proposed as TEA SYMBOL), which was proposed as a map symbol for marking a tea house, and the flag symbols WHITE FLAG (⚐) and BLACK FLAG (⚑), which were proposed as map symbols for sites of battles and military victories.[6] These characters were proposed for the provisional code points U+270A, U+268E and U+268F respectively,[49] but encoded at the final code points U+2615, U+2690 and U+2691 respectively.[53] They also include a series of directional bold arrows in the range U+2B05 through U+2B0D[49] (excluding a rightward arrow, which was mapped to an existing character in the Dingbats block),[54] which were added at the same code points for which they were proposed, except that the north-east and north-west arrows were swapped compared to the proposal.[55]

Other pictographic characters which were included in the North Korean proposal include the umbrella with raindrops (☔), the lightning bolt for high voltage (⚡) and the warning triangle (⚠).[49] Following some discussion about which other high voltage symbol glyphs in use represented the same character as the one from the North Korean proposal,[50] and which glyph would be best to include for it in the Unicode code chart,[56] and following modification of the code chart glyph of the existing umbrella character without rain (U+2602, ☂) to harmonise with the new umbrella with raindrops from the North Korean proposal,[50][58] these characters were also added in Unicode 4.0, at the same time as the flags and the beverage symbol.[51][53][56] Although proposed for the provisional code points U+2618, U+267F and U+267E,[49] they were given the final code points U+2614, U+26A1 and U+26A0 respectively.[53]

Of these characters, the hot beverage, umbrella with raindrops, lightning bolt and warning triangle, and the upward, downward and leftward arrows were subsequently selected as mappings from the Japanese cellular emoji sets,[59] making a total of seven current Unicode emoji which were originally added to Unicode at the request of North Korea. The umbrella with raindrops and the upward, downward and leftward arrows were also unified with characters from the ARIB extensions used in Japanese broadcasting,[60] which include several characters now classified as emoji,[61] and were mapped to Unicode in Unicode 5.2.[62] However, the pair of white and black flags used as emoji or in emoji regional and identity flag sequences is a different, "waving" set added in Unicode 7.0 (U+1F3F3 🏳 and U+1F3F4 🏴),[63][64] not the North Korean pair.

As of 2018, several KPS 9566 characters remained unmapped to Unicode. These include the WPK symbol, four triangular marks, a leftward-pointing pair of scissors (excluded on the rationale that contrastive use with the rightward scissors in the Dingbats block had not been demonstrated), an upward-pointing manicule in a circle, vertical presentation forms of punctuation marks, variants of closing brackets incorporating full stops, horizontal-barred variants of vulgar fractions encoded separately from their slanted versions, and the leaders' names.[65]

A Japanese postal mark with a downward pointing triangle was included in KPS 9566-97 but removed in KPS 9566-2003[1] after the North Korean body had withdrawn it from their Unicode proposal for review[66] in response to requests from the South Korean body for evidence of the symbol's use in North Korea.[48] This mark was re-proposed in 2018 on the basis of KPS 9566 compatibility, and identified as an electrical conformity mark used in Japan prior to its replacement by the PSE diamond.[67] It was added to Unicode in version 13.0, published in 2020.

Encoded forms


The 1997 edition of KPS 9566 was registered with the International Register of Coded Character Sets for Use with Escape Sequences as ISO-IR-202,[23] and can therefore be encoded using ISO/IEC 2022. It is a 94ⁿ multiple-byte G-set, i.e. if it is used in a 7-bit ISO 2022 code (analogous to ISO-2022-JP or ISO-2022-KR), characters will be encoded with pairs of bytes between 0x21 and 0x7E when in the appropriate mode.

The documented mappings between KPS 9566 and Unicode for the 2003[22][1] and 2011[3] editions of KPS 9566 use an encoding resembling an adaptation of Unified Hangul Code (UHC) to encode KPS 9566 rather than Wansung code, with their updated versions of the ISO-IR-202 plane being encoded using pairs of bytes between 0xA1 and 0xFE, and with other two-byte codes used for syllables not present in ISO-IR-202. The order of the extended syllables follows usual KPS 9566 order. Similarly to UHC, they use lead bytes 0x81 and above, and trail bytes from the ranges 0x41–0x5A, 0x61–0x7A and 0x81–0xFE, excluding the range 0xA1–0xFE if the lead byte is 0xA1 or above.[3]
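The byte ranges above can be expressed as a structural check on a two-byte code. The following sketch only tests whether a byte pair falls in the ranges described; it does not know which codes are actually assigned. The function name `classify` and the upper limit of 0xFE for lead bytes are assumptions made for illustration.

```python
def _in_any(value: int, ranges: tuple[tuple[int, int], ...]) -> bool:
    """Return True if value lies in any inclusive (low, high) range."""
    return any(low <= value <= high for low, high in ranges)


# Trail bytes for extension codes whose lead byte is below 0xA1.
EXT_TRAIL_LOW_LEAD = ((0x41, 0x5A), (0x61, 0x7A), (0x81, 0xFE))
# With lead bytes 0xA1 and above, trail bytes 0xA1-0xFE belong to the main plane.
EXT_TRAIL_HIGH_LEAD = ((0x41, 0x5A), (0x61, 0x7A), (0x81, 0xA0))


def classify(lead: int, trail: int) -> str:
    """Classify a byte pair as 'main plane', 'extension' or 'invalid'."""
    if not 0x81 <= lead <= 0xFE:  # assumed upper limit for lead bytes
        return "invalid"
    if lead >= 0xA1 and 0xA1 <= trail <= 0xFE:
        return "main plane"
    ranges = EXT_TRAIL_HIGH_LEAD if lead >= 0xA1 else EXT_TRAIL_LOW_LEAD
    return "extension" if _in_any(trail, ranges) else "invalid"
```

With this check, 0xA4 0xE8 is classified as a main plane code, while 0x81 0x41 and 0xA1 0x41 are both classified as extension codes.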

The 2011 edition also includes several additional Hanja and symbols encoded outside of the ISO-IR-202 plane, after the range used for the extended syllable blocks.[3] This approach is similar to that taken by GBK, but with the trail bytes remaining in the UHC-style ranges: like the extended syllables with lead bytes 0xA1 and above, these all use the trail byte ranges 0x41–0x5A, 0x61–0x7A and 0x81–0xA0. Extended Hanja are encoded with lead bytes between 0xC8 and 0xDC, extended symbols are encoded using lead bytes between 0xE0 and 0xEA, and extended codes with lead bytes between 0xEC and 0xFE are mapped, without gaps, to the Private Use Area[3] (compare the user-defined ranges in GBK). Several of the characters in the extended symbols section and three in the Hanja section are also mapped to the Unicode Private Use Area; unlike the PUA-mapped symbols in the main ISO-IR-202 plane, the identity of these characters is unknown.[3]

Lead byte


This chart details the overall layout of the main plane of the KPS 9566 character set by lead byte.[23] For lead bytes used for characters other than composed Chosŏn'gŭl syllables or Hanja, links are provided to charts on this page listing the characters encoded under that lead byte. For lead bytes used for Hanja, links are provided to the appropriate section of Wiktionary's Hanja index.

Where two hexadecimal numbers are given, the value below 0x7F is used in a 7-bit encoding,[a] and the larger value (between 0xA1 and 0xFE) is used in an 8-bit EUC-style encoding.[17] The extended UHC-style 8-bit encodings defined by the 2003 edition onwards likewise use the larger byte values, between 0xA1 and 0xFE inclusive, for the main ISO-IR-202-based plane.[1][3]

KPS 9566 (lead bytes)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax SP[b] 1-_ 2-_ 3-_ 4-_ 5-_ 6-_ 7-_ 8-_ 9-_ 10-_ 11-_ 12-_ 13-_ 14-_ 15-_
3x/Bx 16-_ 17-_ 18-_ 19-_ 20-_ 21-_ 22-_ 23-_ 24-_ 25-_ 26-_ 27-_ 28-_ 29-_ 30-_ 31-_
4x/Cx 32-_ 33-_ 34-_ 35-_ 36-_ 37-_ 38-_ 39-_ 40-_ 41-_ 42-_ 43-_ 44-_ 45-_ 46-_ 47-_
5x/Dx 48-_ 49-_ 50-_ 51-_ 52-_ 53-_ 54-_ 55-_ 56-_ 57-_ 58-_ 59-_ 60-_ 61-_ 62-_ 63-_
6x/Ex 64-_ 65-_ 66-_ 67-_ 68-_ 69-_ 70-_ 71-_ 72-_ 73-_ 74-_ 75-_ 76-_ 77-_ 78-_ 79-_
7x/Fx 80-_ 81-_ 82-_ 83-_ 84-_ 85-_ 86-_ 87-_ 88-_ 89-_ 90-_ 91-_ 92-_ 93-_ 94-_ DEL[b]

Non-Hanja, non-composed sets in the main plane


Character set 0x21/0xA1 (row number 1, punctuation and vertical forms)


This set contains common sentence punctuation such as brackets, quotation marks, commas and so forth, as well as presentation forms for use in vertical writing. ASCII punctuation (highlighted) is shown below mapped to Basic Latin codepoints (consistent with articles on other CJK character sets, such as KS X 1001 or JIS X 0208), but is mapped to the Halfwidth and Fullwidth Forms block when used in an encoding which combines KPS 9566 with ASCII (as defined by, for example, the 2003 edition).[1]

Compared to the 2003 mapping, the 2011 mapping changes the Unicode mappings of three vertical presentation forms to take advantage of the Vertical Forms block introduced with Unicode 4.1.[3]

KPS 9566 (prefixed with 0x21/0xA1)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax IDSP
3000

3001

3002
,
002C
.
002E
·
00B7
:
003A
;
003B
?
003F
!
0021

2025

2026
~[c]
007E

3003

2015
3x/Bx [d]
2010
_
005F
[e]
FFE3
/
002F
\
005C
|
007C

2225
 ∕ 
2215

2216

309B

309C
´
00B4
`
0060
¨
00A8
^
005E
ˇ
02C7
4x/Cx ˙
02D9
ʼ/ ˚/ ˊ/
22EE
ⸯ[f]
2018

2019

201C

201D
(
0028
)
0029

3014

3015
[
005B
]
005D
5x/Dx {
007B
}
007D

3008

3009

300A

300B

300C

300D

300E

300F

3010

3011
.)[g] .⟫[g]
201A

201B
6x/Ex
201E

201F

FE35

FE36

FE39

FE3A

FE47

FE48

FE37

FE38
︿
FE3F

FE40

FE3D

FE3E

FE41

FE42
7x/Fx
FE43

FE44

FE3B

FE3C
  ASCII punctuation, may also be mapped to the Halfwidth and Fullwidth Forms block.
  Mapped to Private Use Area, shown simulated.

Character set 0x22/0xA2 (row number 2, symbols and operators)


This set includes mathematical operators, and some other symbols such as the ampersand, pilcrow, musical note and so forth. ASCII punctuation (highlighted) is shown below mapped to Basic Latin codepoints (consistent with articles on other CJK character sets), but is mapped to the Halfwidth and Fullwidth Forms block when used in an encoding which combines KPS 9566 with ASCII.[1]

Several triangular "road mark" symbols denoting upcoming mountains or inclines ahead or to one side are included in this row, but not presently included in Unicode. They are mapped to the Private Use Area.[46]

KPS 9566 (prefixed with 0x22/0xA2)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax +
002B
-
002D
±
00B1
×
00D7
÷
00F7
=
003D

2260
<
003C
>
003E

2266

2267

221E

2234

2642

2640
3x/Bx
2220

22A5

2312

2202

2207

2261

2252

2248

226A

226B

221A

223D

221D

2235

222B

222C
4x/Cx
222E

2208

220B

2286

2287

2282

2283

2209

220C

2288

2289

2284

2285

222A

2229

2227
5x/Dx
2228
[e]
FFE2

21D2

21D4

2200

2203

2211
#
0023
&
0026
*
002A
@
0040
§
00A7

203B

2606

2605

25CB
6x/Ex
25CF

25CE

25C7

25C6

25A1

25A0

25B3

25B2

25BD

25BC

25B7

25C1

25B6

25C0
[h]
2218
[i]
2219
7x/Fx
2756
[j][k] 𜲁[l]
1CC81
[j] [j]
2690

2691

266F

266D

266A

2020

2021

00B6

2295

2296
  ASCII punctuation, may also be mapped to the Halfwidth and Fullwidth Forms block.
  Mapped to Private Use Area, shown simulated.

Character set 0x23/0xA3 (row number 3, digits and Roman)


This set includes a subset of ASCII, minus punctuation and symbols, comprising western Arabic numerals and both cases of the Basic Latin alphabet. Compare row 3 of JIS X 0208, which this row exactly matches. Compare and contrast row 3 of KS X 1001 and GB 2312, which include their entire national variants of ISO 646 in this row, rather than only the alphanumeric subset.

The characters in this row are shown below mapped to Basic Latin codepoints (consistent with articles on the other character sets), but are mapped to the Halfwidth and Fullwidth Forms block when used in an encoding which combines KPS 9566 with ASCII.[1]
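Because the 7-bit trail byte of each character in this row equals its ASCII value, both mappings can be computed arithmetically. The sketch below relies on the fact that the fullwidth forms U+FF01 to U+FF5E mirror U+0021 to U+007E at a fixed offset of 0xFEE0; the function name `decode_row3` is hypothetical.

```python
FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E mirror U+0021..U+007E


def decode_row3(trail: int, fullwidth: bool = False) -> str:
    """Decode a row 3 trail byte (7-bit or 8-bit form) to a character."""
    value = trail & 0x7F  # clear the eighth bit if present
    basic = chr(value)
    if not basic.isalnum():
        # Row 3 holds only the digits and the two cases of the Latin alphabet.
        raise ValueError(f"no row 3 character at trail byte {trail:#04x}")
    return chr(value + FULLWIDTH_OFFSET) if fullwidth else basic
```

For example, the trail byte 0xC1 decodes to "A" (U+0041), or to the fullwidth "Ａ" (U+FF21) when `fullwidth=True`.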

KPS 9566 (prefixed with 0x23/0xA3)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax
3x/Bx 0
0030
1
0031
2
0032
3
0033
4
0034
5
0035
6
0036
7
0037
8
0038
9
0039
4x/Cx A
0041
B
0042
C
0043
D
0044
E
0045
F
0046
G
0047
H
0048
I
0049
J
004A
K
004B
L
004C
M
004D
N
004E
O
004F
5x/Dx P
0050
Q
0051
R
0052
S
0053
T
0054
U
0055
V
0056
W
0057
X
0058
Y
0059
Z
005A
6x/Ex a
0061
b
0062
c
0063
d
0064
e
0065
f
0066
g
0067
h
0068
i
0069
j
006A
k
006B
l
006C
m
006D
n
006E
o
006F
7x/Fx p
0070
q
0071
r
0072
s
0073
t
0074
u
0075
v
0076
w
0077
x
0078
y
0079
z
007A

Character set 0x24/0xA4 (row number 4, Chosŏn'gŭl jamo and leaders' names)


This set contains Chosŏn'gŭl jamo, as well as special encodings for the names of (as of 2003) the North Korean leaders Kim Il Sung and Kim Jong Il. The name of Kim Jong Un is also included as of the 2011 edition.[3] Compare with row 4 of KS X 1001.

The jamo in this row which exist in the Unicode Hangul Compatibility Jamo block (which contains the position-independent characters mapped from KS X 1001) are mapped to that block. The obsolete jamo distinguishing palatalised sibilants map to the position-specific characters in the Hangul Jamo block.[1] Conversely, not all of the obsolete jamo encoded by KS X 1001 are encoded in the main plane of KPS 9566. In the 2011 edition of KPS 9566, some of the other historic jamo from KS X 1001 are included outside of the main plane, with the lead byte 0xEA.[3]

The special encodings of the leaders' names are not present in Unicode and are mapped to the Private Use Area. They are shown below simulated with markup.

KPS 9566 (prefixed with 0x24/0xA4)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax
3131

3134

3137

3139

3141

3142

3145

3147

3148

314A

314B

314C

314D

314E

3132
3x/Bx
3138

3143

3146

3149

314F

3151

3153

3155

3157

315B

315C

3160

3161

3163

3150

3152
4x/Cx
3154

3156

315A

315F

3162

3158

315D

3159

315E

3133

3135

3136

313A

313B

313C

313D
5x/Dx
313E

313F

3140

3144

317F

3181

3186

318D

113C

113D

113E

113F

114E

114F

1150

1151
6x/Ex
1154

1155
[m] [m] [m] [m] [m] [m] [m] [m]
7x/Fx [m]
  Mapped to Private Use Area, shown simulated.

Character set 0x25/0xA5 (row number 5, Cyrillic)


This set includes both cases of 33 letters from the Cyrillic script, sufficient to write the modern Russian alphabet and Bulgarian alphabet, although other forms of Cyrillic require additional letters.[71]

Compare row 12 of KS X 1001 and row 7 of JIS X 0208, which use the same layout (but in a different row).
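Since Ё is placed in alphabetical position rather than in Unicode order, this row is most simply decoded with a lookup string. The sketch below assumes the layout shown in the chart for this row, with capitals in cells 1 to 33 and small letters in cells 49 to 81; the function name `decode_row5` is hypothetical.

```python
# The 33 capital letters in the order used by this row.
CYRILLIC_UPPER = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
CYRILLIC_LOWER = CYRILLIC_UPPER.lower()

LOWER_FIRST_CELL = 49  # trail byte 0x51 in a 7-bit code


def decode_row5(trail: int) -> str:
    """Decode a row 5 trail byte (7-bit or 8-bit form) to a Cyrillic letter."""
    cell = (trail & 0x7F) - 0x20  # 1-based cell number
    if 1 <= cell <= len(CYRILLIC_UPPER):
        return CYRILLIC_UPPER[cell - 1]
    if LOWER_FIRST_CELL <= cell < LOWER_FIRST_CELL + len(CYRILLIC_LOWER):
        return CYRILLIC_LOWER[cell - LOWER_FIRST_CELL]
    raise ValueError(f"no row 5 character at trail byte {trail:#04x}")
```

Under this layout, trail byte 0x27 decodes to Ё and trail byte 0x71 to я.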

KPS 9566 (prefixed with 0x25/0xA5)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax А
0410
Б
0411
В
0412
Г
0413
Д
0414
Е
0415
Ё
0401
Ж
0416
З
0417
И
0418
Й
0419
К
041A
Л
041B
М
041C
Н
041D
3x/Bx О
041E
П
041F
Р
0420
С
0421
Т
0422
У
0423
Ф
0424
Х
0425
Ц
0426
Ч
0427
Ш
0428
Щ
0429
Ъ
042A
Ы
042B
Ь
042C
Э
042D
4x/Cx Ю
042E
Я
042F
5x/Dx а
0430
б
0431
в
0432
г
0433
д
0434
е
0435
ё
0451
ж
0436
з
0437
и
0438
й
0439
к
043A
л
043B
м
043C
н
043D
6x/Ex о
043E
п
043F
р
0440
с
0441
т
0442
у
0443
ф
0444
х
0445
ц
0446
ч
0447
ш
0448
щ
0449
ъ
044A
ы
044B
ь
044C
э
044D
7x/Fx ю
044E
я
044F

Character set 0x26/0xA6 (row number 6, Greek letters and Roman numerals)


This set contains Roman numerals and basic support for the Greek alphabet, without diacritics or the final sigma.

Compare and contrast row 5 of KS X 1001 (which uses the same characters but in a different layout and a different row) and row 6 of JIS X 0208 (which uses the same layout for the Greek letters, but without the Roman numerals).

KPS 9566 (prefixed with 0x26/0xA6)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax Α
0391
Β
0392
Γ
0393
Δ
0394
Ε
0395
Ζ
0396
Η
0397
Θ
0398
Ι
0399
Κ
039A
Λ
039B
Μ
039C
Ν
039D
Ξ
039E
Ο
039F
3x/Bx Π
03A0
Ρ
03A1
Σ
03A3
Τ
03A4
Υ
03A5
Φ
03A6
Χ
03A7
Ψ
03A8
Ω
03A9
4x/Cx α
03B1
β
03B2
γ
03B3
δ
03B4
ε
03B5
ζ
03B6
η
03B7
θ
03B8
ι
03B9
κ
03BA
λ
03BB
μ
03BC
ν
03BD
ξ
03BE
ο
03BF
5x/Dx π
03C0
ρ
03C1
σ
03C3
τ
03C4
υ
03C5
φ
03C6
χ
03C7
ψ
03C8
ω
03C9
6x/Ex
2160

2161

2162

2163

2164

2165

2166

2167

2168

2169
7x/Fx
2170

2171

2172

2173

2174

2175

2176

2177

2178

2179

Character set 0x27/0xA7 (row number 7, encircled, superscript, subscript, fractions)


Several circled numbers in this row were mapped to Unicode incorrectly in the 2003 edition, due to using non-final proposed code points.[1] They were corrected in the 2011 edition.[3]

KPS 9566 (prefixed with 0x27/0xA7)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax
2460

2461

2462

2463

2464

2465

2466

2467

2468

2469

246A

246B

246C

246D

246E
3x/Bx
246F

2470

2471

2472

2473

3251

3252

3253

3254

3255

3256

3257

3258

3259

325A
4x/Cx
3260

3261

3262

3263

3264

3265

3266

3267

3268

3269

326A

326B

326C

326D
5x/Dx
326E

326F

3270

3271

3272

3273

3274

3275

3276

3277

3278

3279

327A

327B
6x/Ex
2070
¹
00B9
²
00B2
³
00B3

2074

2075

2076

2077

2078

2079
½
00BD

2153

2154
¼
00BC
¾
00BE
7x/Fx
2080

2081

2082

2083

2084

2085

2086

2087

2088

2089
1/2[n] 1/3[n] 2/3[n] 1/4[n] 3/4[n]
  Mapped to Private Use Area, shown simulated.

Character set 0x28/0xA8 (row number 8, unit, quantity and currency symbols)


This set contains symbols for units of measure and currency. Those present in ASCII (highlighted) are shown below mapped to Basic Latin codepoints (consistent with articles on other CJK character sets), but are mapped to the Halfwidth and Fullwidth Forms block when used in an encoding which combines KPS 9566 with ASCII.[1]

The Kelvin sign was replaced with a euro sign in the 2003 edition.[1] The 2011 edition includes an alternative encoding of the Kelvin sign at 0xE988.[3]

Compare and contrast with the repertoire of unit symbols included in row 7 of KS X 1001.

KPS 9566 (prefixed with 0x28/0xA8)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax °
00B0

2032

2033

2103

2109
/[o]
FFE6
$
0024
[e]
FFE0
[e]
FFE1
[e]
FFE5
%
0025

2030

212B

33C4
3x/Bx
33A1

33A5

339D

33A0

33A4

339C

339F

33A3

3377

3378

3379

339E

33A2

33A6

3399

339A
4x/Cx
339B

33A7

33A8

338D

338E

338F

33B4

33B5

33B6

33B7

33B8

33B9

3380

3381

3382

3383
5x/Dx
3384

33BA

33BB

33BC

33BD

33BE

33BF

2126

33C0

33C1

3390

3391

3392

3393

3394

33DE
6x/Ex
33DF

33B0

33B1

33B2

33B3

338A

338B

338C

33A9

33AA

33AB

33AC

2113

3395

3396

3397
7x/Fx
3398

33FF

3388

3389

33AD

33AE

33AF

32CC

33DD

33C8

32CD

32CE

33D6

33CB

33CA
  ASCII punctuation, may also be mapped to the Halfwidth and Fullwidth Forms block.

Character set 0x29/0xA9 (row number 9, box drawing)

KPS 9566 (prefixed with 0x29/0xA9)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax
2500

2502

250C

2510

2518

2514

251C

252C

2524

2534

253C

2501

2503

250F

2513
3x/Bx
251B

2517

2523

2533

252B

253B

254B

2520

252F

2528

2537

253F

251D

2530

2525

2538
4x/Cx
2542

2512

2511

251A

2519

2516

2515

250E

250D

251E

251F

2521

2522

2526

2527

2529
5x/Dx
252A

252D

252E

2531

2532

2535

2536

2539

253A

253D

253E

2540

2541

2543

2544

2545
6x/Ex
2546

2547

2548

2549

254A
7x/Fx

Character set 0x2A/0xAA (row number 10, Hiragana)

This row contains Hiragana for use in the Japanese language.

Compare row 10 of KS X 1001, which uses the same layout, and row 4 of JIS X 0208, which uses the same layout but in a different row.

KPS 9566 (prefixed with 0x2A/0xAA)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax
3041

3042

3043

3044

3045

3046

3047

3048

3049

304A

304B

304C

304D

304E

304F
3x/Bx
3050

3051

3052

3053

3054

3055

3056

3057

3058

3059

305A

305B

305C

305D

305E

305F
4x/Cx
3060

3061

3062

3063

3064

3065

3066

3067

3068

3069

306A

306B

306C

306D

306E

306F
5x/Dx
3070

3071

3072

3073

3074

3075

3076

3077

3078

3079

307A

307B

307C

307D

307E

307F
6x/Ex
3080

3081

3082

3083

3084

3085

3086

3087

3088

3089

308A

308B

308C

308D

308E

308F
7x/Fx
3090

3091

3092

3093

Character set 0x2B/0xAB (row number 11, Katakana)

This row contains Katakana for use in the Japanese language. However, the Japanese long vowel mark, which is used in katakana text and included in row 1 of JIS X 0208, is not included (as is also the case in GB 2312 and KS X 1001),[72] although KPS 9566-2011 includes it outside of the main plane, at 0xEA48.[3]

Compare row 11 of KS X 1001, which uses the same layout, and row 5 of JIS X 0208, which uses the same layout but in a different row.

KPS 9566 (prefixed with 0x2B/0xAB)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax
30A1

30A2

30A3

30A4

30A5

30A6

30A7

30A8

30A9

30AA

30AB

30AC

30AD

30AE

30AF
3x/Bx
30B0

30B1

30B2

30B3

30B4

30B5

30B6

30B7

30B8

30B9

30BA

30BB

30BC

30BD

30BE

30BF
4x/Cx
30C0

30C1

30C2

30C3

30C4

30C5

30C6

30C7

30C8

30C9

30CA

30CB

30CC

30CD

30CE

30CF
5x/Dx
30D0

30D1

30D2

30D3

30D4

30D5

30D6

30D7

30D8

30D9

30DA

30DB

30DC

30DD

30DE

30DF
6x/Ex
30E0

30E1

30E2

30E3

30E4

30E5

30E6

30E7

30E8

30E9

30EA

30EB

30EC

30ED

30EE

30EF
7x/Fx
30F0

30F1

30F2

30F3

30F4

30F5

30F6

Character set 0x2C/0xAC (row number 12, miscellaneous symbols and arrows)

For the purpose of mapping this row to Unicode, the bold rightward arrow was unified with the bold rightward arrow from Zapf Dingbats (U+27A1),[54] although earlier tables (which lacked mappings for the other bold arrows) had instead unified it with U+279E, a slightly different Zapf Dingbats character.[52] Since corresponding arrows in other directions were not included in the Dingbats block, additional arrows were encoded between U+2B05 and U+2B0D for compatibility with KPS 9566. These were incorporated into the Unicode code charts using the reference glyphs proposed by the North Korean national body, while U+27A1 retained its reference glyph based on Zapf Dingbats.[54] These arrows (U+2B05 through U+2B07, plus U+27A1) were chosen in Unicode 6.0 as the mappings for some of the arrow characters in cellular emoji sets.[59] Subsequently, during the addition of the Wingdings 3 repertoire in Unicode 7.0, the Unicode coverage of arrow characters was reviewed, resulting in an additional rightward arrow being added at U+2B95 with the intent of harmonising with characters U+2B05 through U+2B0D (in text presentation), since changing the reference glyph for the Zapf Dingbats character was not considered appropriate.[54]

In earlier editions of KPS 9566, such as the 1997 edition, this row included both the simple Japanese-style postal mark (〒) and a version in a downward-pointing triangle,[46][23] which was proposed by the North Korean national body for addition to Unicode alongside the other missing KPS 9566 characters.[46] A response by a South Korean representative asked, among other things, for evidence of the symbol's use in North Korea. It noted that the Japanese-style postal mark is not used in South Korea, which uses a circled 우 (i.e. ㉾) for a similar purpose, and enquired whether a Japanese-style postal mark was in use in North Korea.[48] A subsequent meeting was held to discuss this proposal, attended by North and South Korean WG2 representatives; the meeting report notes that the North Korean body had decided to review the character before discussing it further, and therefore did not recommend it for consideration by WG2 as a whole.[66] The postal mark triangle was subsequently removed from KPS 9566 in 2003, leaving only the unenclosed postal mark.[1]

The postal mark triangle was eventually added to Unicode in version 13.0, both for compatibility with the legacy KPS 9566-97 character, and subsequent to the mark being identified as a symbol which had been used for certification for electrical appliances in Japan (as a predecessor to the PSE diamond).[67]

Certain KPS 9566 characters in this row, namely two forms of the emblem of the Workers' Party of Korea, a pair of scissors pointing in a different direction to those in the Dingbats block, and a circled upward-pointing manicule, remain mapped to the Private Use Area.[1]

The north-east and north-west white arrows had their Unicode mappings swapped in the 2003 edition.[1] This was corrected in the 2011 edition mappings.[3]

KPS 9566 (prefixed with 0x2C/0xAC)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax [p] [p]
235F

2600

2602
☔︎
2614

2601

2744
⚡︎
26A1

26A0

2116

2192

2190

2191

2193
3x/Bx
2197

2196

2198

2199

2194

2195

21E8

21E6

21E7

21E9

2B00

2B01

2B02

2B03

2B04

21F3
4x/Cx [q]
27A1

2B05

2B06

2B07

2B08

2B09

2B0A

2B0B

2B0C

2B0D

2663

2665

2660

2666

3012
[r]
2B97
5x/Dx
260F

260E

23CE
[s]
261E
[t] [u] ☕︎
2615

327C

327D

321D

321E

33C7

32CF

3250

2121

213B
6x/Ex
337A
®
00AE
7x/Fx
  Mapped to Private Use Area, shown simulated.

Character set 0x2E/0xAE (row number 14, Latin-1 subset)

The characters in this set were not present in the 1997 version of the character set, but were added in the 2003 version.[1] They constitute a subset of the Latin-1 Supplement block of Unicode (equivalent to the upper half of the ISO 8859-1 (Latin-1) character set). This includes accented Roman letters and symbols. Some of the symbols which were already included are omitted, while some others are duplicated as halfwidth counterparts to the earlier fullwidth forms: for example, the not sign (¬, U+00AC) is represented as 0xAEAC, while its fullwidth form (￢, U+FFE2) is represented as 0xA2D1 (in row 2).[1]

This row is omitted from the mapping for the 2011 edition of the standard,[3] indicating it may have been removed at some point after the 2003 edition. The halfwidth yen sign is instead encoded at 0xE98E in the 2011 edition.[3]

The required space (the no-break space, U+00A0) would need the trail byte 0xA0, which falls outside of the 94-character range and would collide with the area used for extended Chosŏn'gŭl syllables when a UHC-style encoding is used (specifically, with the syllable 쁲),[1] so it is omitted. The y with trema (ÿ, U+00FF) likewise falls outside the 94-character range, but the trail byte 0xFF is otherwise unused, and the code 0xAEFF is mapped to it in KPS 9566-2003.[1]

KPS 9566-2003 (prefixed with 0x2E/0xAE)
0 1 2 3 4 5 6 7 8 9 A B C D E F
2x/Ax ¡
00A1
¢
00A2
£
00A3
¤
00A4
¥
00A5
¦
00A6
©
00A9
ª
00AA
«
00AB
¬
00AC
SHY
00AD
¯
00AF
3x/Bx µ
00B5
¸
00B8
º
00BA
»
00BB
¿
00BF
4x/Cx À
00C0
Á
00C1
Â
00C2
Ã
00C3
Ä
00C4
Å
00C5
Æ
00C6
Ç
00C7
È
00C8
É
00C9
Ê
00CA
Ë
00CB
Ì
00CC
Í
00CD
Î
00CE
Ï
00CF
5x/Dx Ð
00D0
Ñ
00D1
Ò
00D2
Ó
00D3
Ô
00D4
Õ
00D5
Ö
00D6
Ø
00D8
Ù
00D9
Ú
00DA
Û
00DB
Ü
00DC
Ý
00DD
Þ
00DE
ß
00DF
6x/Ex à
00E0
á
00E1
â
00E2
ã
00E3
ä
00E4
å
00E5
æ
00E6
ç
00E7
è
00E8
é
00E9
ê
00EA
ë
00EB
ì
00EC
í
00ED
î
00EE
ï
00EF
7x/Fx ð
00F0
ñ
00F1
ò
00F2
ó
00F3
ô
00F4
õ
00F5
ö
00F6
ø
00F8
ù
00F9
ú
00FA
û
00FB
ü
00FC
ý
00FD
þ
00FE
ÿ
00FF

Precomposed Chosŏn'gŭl sets (rows number 16 through 44)

Precomposed Chosŏn'gŭl syllable clusters are allocated code points in a continuous sorted block between code points 16-01 and 44-47 inclusive. Not all possible clusters are allocated code points.[73] Compare the different ordering and availability in KS X 1001.
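
The row-cell notation used here can be converted to byte values with a short Python sketch. It relies on the convention visible in the section headings of this article (row 8 corresponds to 0x28/0xA8), i.e. adding 0x20 to the row and cell numbers for the 7-bit form and 0xA0 for the 8-bit form. The function names are hypothetical.

```python
def row_cell_to_bytes(row: int, cell: int):
    """Return the (7-bit, 8-bit) two-byte codes for a row-cell position."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    gl = ((row + 0x20) << 8) | (cell + 0x20)  # 7-bit (GL) form
    gr = gl | 0x8080                          # 8-bit (GR) form, high bits set
    return gl, gr


def positions_between(first, last):
    """Count positions from first to last inclusive, both given as (row, cell)."""
    return (last[0] - first[0]) * 94 + (last[1] - first[1]) + 1


print([hex(code) for code in row_cell_to_bytes(16, 1)])
print([hex(code) for code in row_cell_to_bytes(44, 47)])
print(positions_between((16, 1), (44, 47)))
```

The inclusive count from 16-01 to 44-47 comes to 2679, which agrees with the totals given under "Statistics by jamo".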

The encoded form documented for KPS 9566-2003 encodes the KPS 9566 plane on GR (0xA1-0xFE) and additionally encodes the remaining syllable clusters using lead bytes in the range 0x80-0xC2 and trail bytes in the ranges 0x41-0x5A, 0x61-0x7A and 0x81-0xFE (where at most one byte is in the range 0xA1-0xFE),[1] similarly to Unified Hangul Code but with the omitted clusters from and sorting order of KPS 9566, not KS X 1001.
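
The byte ranges described above can be expressed as a small Python classifier. This is a sketch of the 2003 encoded form only, using the ranges exactly as stated in the paragraph; it does not account for the extension sets of the 2011 edition, and the function name and return labels are hypothetical.

```python
def classify_pair(lead: int, trail: int) -> str:
    """Classify a two-byte sequence under the KPS 9566-2003 encoded form."""
    lead_in_gr = 0xA1 <= lead <= 0xFE
    trail_in_gr = 0xA1 <= trail <= 0xFE

    # Both bytes in GR: a position in the main 94x94 plane.
    if lead_in_gr and trail_in_gr:
        return "main plane"

    # Extended syllables: lead 0x80-0xC2, trail in one of three ranges.
    # The main-plane case has already returned, so at most one byte
    # can be in 0xA1-0xFE by the time this check runs.
    lead_ok = 0x80 <= lead <= 0xC2
    trail_ok = (
        0x41 <= trail <= 0x5A
        or 0x61 <= trail <= 0x7A
        or 0x81 <= trail <= 0xFE
    )
    if lead_ok and trail_ok:
        return "extended syllable"

    return "other"
```

For example, `classify_pair(0xB0, 0xA1)` falls in the main plane, whereas `classify_pair(0xB0, 0x41)` is treated as an extended syllable because only the lead byte is in the GR range.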

KPS 9566 (precomposed Chosŏn'gŭl syllables)
0 1 2 3 4 5 6 7 8 9 A B C D E F
302x/B0Ax
AC00

AC01

AC04

AC07

AC08

AC09

AC0A

AC10

AC11

AC12

AC13

AC15

AC16

AC17

AC19
303x/B0Bx
AC1A

AC1B

AC14

AC38

AC39

AC3C

AC40

AC48

AC4B

AC4D

AC70

AC71

AC74

AC77

AC78

AC79
304x/B0Cx
AC7A

AC80

AC81

AC83

AC85

AC86

AC89

AC8A

AC8B

AC84

ACA8

ACA9

ACAC

ACAF

ACB0

ACB8
305x/B0Dx
ACB9

ACBB

ACBD

ACC1

ACAA

ACBC

ACE0

ACE1

ACE4

ACE7

ACE8

ACEA

ACEC

ACEF

ACF0

ACF1
306x/B0Ex
ACF3

ACF5

ACF6

ACFA

AD50

AD54

AD58

AD61

AD63

AD6C

AD6D

AD70

AD73

AD74

AD75

AD76
307x/B0Fx
AD7B

AD7C

AD7D
굿
AD7F

AD81

AD82

ADDC

ADE0

ADE4

ADEC

ADF1

ADF8

ADF9

ADFC
귿
ADFF
312x/B1Ax
AE00

AE01

AE07

AE08

AE09

AE0B

AE0D

AE30

AE31

AE34

AE37

AE38

AE3A

AE40

AE41
313x/B1Bx
AE43

AE45

AE46

AE47

AE49

AE4A

AC1C

AC1D

AC20

AC24

AC2C

AC2D

AC2F

AC31

AC30

AC54
314x/B1Cx
AC58

AC5C

AC8C

AC8D

AC90

AC94

AC9C

AC9D

AC9F

ACA1

ACA0

ACC4

ACC8

ACCC

ACD5

ACD7
315x/B1Dx
AD34

AD35

AD38

AD3C

AD44

AD45

AD47

AD49

AD48

ADC0

ADC1

ADC4

ADC8

ADD0

ADD1

ADD3
316x/B1Ex
AE14

ACFC

ACFD

AD00

AD03

AD04

AD06

AD0C

AD0D

AD0F

AD11

AD10

AD88

AD89

AD8C

AD90
317x/B1Fx
AD98

AD9D

AD9C

AD18

AD19

AD1C

AD20

AD29

AD2D

AD2C

ADA4

ADA5

ADB7

B098

B099
322x/B2Ax
B09B

B09C

B09F

B0A0

B0A1

B0A2

B0A8

B0A9

B0AB

B0AD

B0AE

B0AF

B0B1

B0B3

B09A
323x/B2Bx
B0AC

B0D0

B0D1

B0D4

B0D8

B0E0

B0E1

B0E5

B108

B109

B10B

B10C

B110

B112

B113

B118
324x/B2Cx
B119

B11B

B11D

B122

B123

B10A

B11C

B140

B141

B144

B148

B150

B151

B153

B155

B158
325x/B2Dx
B154

B178

B179

B17C

B180

B182

B188

B189

B18B

B18D

B192

B193

B1E8

B1E9

B1EC

B1F0
326x/B2Ex
B1F8

B1F9

B1FB

B1FD

B204

B205

B208

B20B

B20C

B214

B215

B217

B219

B21E

B274

B275
327x/B2Fx
B278

B27C

B284

B285

B289

B290

B291

B294

B298

B299

B29A

B2A0

B2A1

B2A3

B2A5
332x/B3Ax
B2A6

B2AA

B2C8

B2C9

B2CC

B2D0

B2D2

B2D8

B2D9

B2DB

B2DD

B2E2

B0B4

B0B5

B0B8
333x/B3Bx
B0BC

B0C4

B0C5

B0C7

B0C9

B0C8

B0EC

B124

B125

B128

B12C

B134

B135

B137

B139

B138
334x/B3Cx
B15C

B160

B1CC

B1D0

B1D4

B1DC

B1DD

B1DF

B258

B25C

B260

B268

B269

B26D

B2AC

B2B0
335x/B3Dx
B2B4

B2BC

B2C1

B194

B198

B19C

B1A7

B1A8

B220

B228

B233

B234

B1B0

B23C

B2E4

B2E5
336x/B3Ex
B2E8

B2EB

B2EC

B2ED

B2EE

B2EF

B2F2

B2F3

B2F4

B2F5

B2F7

B2F9

B2FA

B2FB

B2FE

B2FF
337x/B3Fx
B2E6

B2F8

B31C

B354

B355

B358

B35B

B35C

B35E

B35F

B364

B365

B367

B369

B36B
342x/B4Ax
B36E

B36F

B356

B368

B38C

B390

B394

B3A1

B3A0

B3C4

B3C5

B3C8

B3CB

B3CC

B3CE
343x/B4Bx
B3D0

B3D4

B3D5

B3D7

B3D9

B3DB

B3DD

B434

B450

B451

B454

B458

B460

B461

B463

B465
344x/B4Cx
B4C0

B4C4

B4C8

B4D0

B4D5

B4DC

B4DD

B4E0

B4E3

B4E4

B4E5

B4E6

B4E7

B4EC

B4ED

B4EF
345x/B4Dx
B4F1

B514

B515

B518

B51B

B51C

B524

B525

B527

B529

B52A

B52E

B528

B300

B301

B304
346x/B4Ex
B308

B310

B311

B313

B315

B314

B338

B370

B371

B374

B377

B378

B380

B381

B383

B385
347x/B4Fx
B384

B3A8

B3AC

B418

B41C

B420

B428

B429

B42B

B42D

B42C

B4A4

B4A5

B4A8

B4AC
352x/B5Ax
B4B4

B4B5

B4B7

B4B9

B4F8

B4FC

B500

B509

B50D

B3E0

B3E4

B3E8

B46C

B470

B474
353x/B5Bx
B47C

B47F

B480

B3FC

B400

B404

B410

B488

B49D

B77C

B77D

B780

B784

B78C

B78D

B78F
354x/B5Cx
B791

B792

B796

B797

B790

B7B4

B7B5

B7B8

B7BC

B7C4

B7C5

B7C7

B7C9

B7EC

B7ED

B7F0
355x/B5Dx
B7F4

B7FC

B7FD

B7FF

B801

B806

B807

B800

B824

B825

B828

B82C

B834

B835

B837

B839
356x/B5Ex
B838

B85C

B85D

B860

B864

B86C

B86D

B86F

B871

B876

B8CC

B8D0

B8D4

B8DC

B8DD

B8DF
357x/B5Fx
B8E1

B8E8

B8E9

B8EC

B8F0

B8F8

B8F9

B8FB

B8FD

B958

B959

B95C

B960

B968

B969
362x/B6Ax
B96B

B96D

B974

B975

B978

B97C

B984

B985

B987

B989

B98A

B98D

B98E

B9AC

B9AD
363x/B6Bx
B9B0

B9B4

B9BC

B9BD
릿
B9BF

B9C1

B9C6

B798

B799

B79C

B7A0

B7A8

B7A9

B7AB

B7AD

B7AC
364x/B6Cx
B7D0

B808

B809

B80C

B810

B818

B819

B81B

B81D

B81C

B840

B844

B848

B851

B853

B8B0
365x/B6Dx
B8B4

B8B8

B8C0

B8C1

B8C3

B8C5

B8C4

B93C

B93D

B940

B944

B94C

B94F

B951

B990

B994
366x/B6Ex
B998

B9A0

B878

B87C

B889

B88D

B904

B918

B894

B8A8

B920

B9C8

B9C9

B9CC

B9CE

B9CF
367x/B6Fx
B9D0

B9D1

B9D2

B9D8

B9D9

B9DB

B9DD

B9DE

B9DF

B9E1

B9E3

BA00

BA01

BA04

BA08
372x/B7Ax
BA10

BA15

BA38

BA39

BA3C

BA40

BA41

BA42

BA48

BA49

BA4B

BA4D

BA4E

BA53

BA4C
373x/B7Bx
BA70

BA71

BA74

BA78

BA80

BA81

BA83

BA85

BA87

BA84

BAA8

BAA9

BAAB

BAAC

BAAF

BAB0
374x/B7Cx
BAB2

BAB8

BAB9

BABB

BABD

BAC3

BB18

BB1C

BB20

BB29

BB2B

BB34

BB35

BB38

BB3B

BB3C
375x/B7Dx
BB3D

BB3E

BB44

BB45

BB47

BB49

BB4D

BB4F

BB36

BBA4

BBA5

BBA8

BBAC

BBB4

BBB7

BBB9
376x/B7Ex
BBC0

BBC4

BBC8

BBD0

BBD1

BBD3

BBD5

BBF8

BBF9

BBFC
믿
BBFF

BC00

BC02

BC08

BC09

BC0B
377x/B7Fx
BC0D

BC0F

BC11

BC0C

B9E4

B9E5

B9E8

B9EC

B9F4

B9F5

B9F7

B9F9

B9FA

B9F8

BA1C
382x/B8Ax
BA54

BA55

BA58

BA5C

BA64

BA65

BA67

BA69

BA68

BA8C

BA90

BAFC

BB00

BB04

BB0C
383x/B8Bx
BB0D

BB0F

BB11

BB88

BB8C

BB90

BBDC

BBE0

BBEC

BAC4

BAC8

BAD9

BAD8

BB50

BB54

BB58
384x/B8Cx
BB60

BB61

BB63

BB64

BAE0

BB6C

BC14

BC15

BC17

BC18

BC1B

BC1C

BC1D

BC1E

BC1F

BC24
385x/B8Dx
BC25

BC27

BC29

BC2D

BC16

BC4C

BC4D

BC50

BC5C

BC5D

BC84

BC85

BC88

BC8B

BC8C

BC8D
386x/B8Ex
BC8E

BC94

BC95

BC97

BC99

BC9A

BC9C

BC98

BCBC

BCBD

BCC0

BCC4

BCCC

BCCD

BCCF

BCD1
387x/B8Fx
BCD3

BCD5

BCD0

BCF4

BCF5

BCF8

BCFC

BD04

BD05

BD07

BD09

BD0F

BCF6

BD64

BD68
392x/B9Ax
BD6C

BD80

BD81

BD84

BD87

BD88

BD89

BD8A

BD90

BD91

BD93

BD95

BD99

BD9A

BDF0
393x/B9Bx
BDF4

BDF8

BE00

BE01

BE03

BE05

BE0C

BE0D

BE10

BE14

BE1C

BE1D

BE1F

BE21

BE44

BE45
394x/B9Cx
BE48

BE4C

BE4E

BE54

BE55

BE57

BE59

BE5A

BE5B

BC30

BC31

BC34

BC37

BC38

BC40

BC41
395x/B9Dx
BC43

BC45

BC49

BC44

BC68

BCA0

BCA1

BCA4

BCA7

BCA8

BCB0

BCB1

BCB3

BCB5

BCB4

BCD8
396x/B9Ex
BCDC

BD48

BD49

BD4C

BD50

BD58

BD59

BD5C

BDD4

BDD5

BDD8

BDDC

BDE9

BE28

BE2C

BE30
397x/B9Fx
BE3D

BD10

BD14

BD21

BD23

BD24

BD9C

BDA4

BDAF

BDB4

BDB0

BD2C

BD30

BD40

BDB8
3A2x/BAAx
C0AC

C0AD

C0AF

C0B0

C0B3

C0B4

C0B5

C0B6

C0BC

C0BD

C0BF

C0C1

C0C5

C0C0

C0E4
3A3x/BABx
C0E5

C0E8

C0EC

C0F4

C0F5

C0F7

C0F9

C11C

C11D

C11F

C120

C123

C124

C126

C127

C12C
3A4x/BACx
C12D

C12F

C131

C136

C11E

C130

C154

C155

C158

C15C

C164

C165

C167

C169

C168

C18C
3A5x/BADx
C18D

C190

C193

C194

C196

C19C

C19D

C19F

C1A1

C1A5

C18E

C1FC

C1FD

C200

C204

C20C
3A6x/BAEx
C20D

C20F

C211

C218

C219

C21C

C21F

C220

C228

C229

C22B

C22D

C22F

C231

C232

C288
3A7x/BAFx
C289

C28C

C290

C298

C299

C29B

C29D

C2A4

C2A5

C2A8

C2AC

C2AD

C2B2

C2B3

C2B4
3B2x/BBAx
C2B5

C2B7

C2B9

C2DC

C2DD

C2E0

C2E3

C2E4

C2EB

C2EC

C2ED

C2EF

C2F1

C2F6

C0C8
3B3x/BBBx
C0C9

C0CC

C0D0

C0D8

C0D9

C0DB

C0DD

C0DC

C100

C104

C108

C110

C115

C138

C139

C13C
3B4x/BBCx
C140

C148

C149

C14B

C14D

C151

C152

C14C

C170

C174

C178

C185

C1E0

C1E1

C1E4

C1E8
3B5x/BBDx
C1F0

C1F1

C1F3

C1F5

C1F4

C26C

C26D

C270

C274

C27C

C27D

C27F

C281

C2C0

C2C4

C1A8
3B6x/BBEx
C1A9

C1AC

C1B0

C1BB

C1BD

C234

C248

C1C4

C1C8

C1CC

C1D4

C1D7

C1D8

C250

C251

C254
3B7x/BBFx
C258

C260

C261

C265

C790

C791

C794

C796

C797

C798

C79A

C7A0

C7A1

C7A3

C7A5
3C2x/BCAx
C7A6

C7A4

C7C8

C7C9

C7CC

C7CE

C7D0

C7D8

C7D9

C7DD

C800

C801

C804

C808

C80A
3C3x/BCBx
C810

C811

C813

C815

C816

C814

C838

C839

C83C

C840

C848

C849

C84B

C84D

C84C

C870
3C4x/BCCx
C871

C874

C878

C87A

C880

C881

C883

C885

C886

C887

C88B

C8E0

C8E1

C8E4

C8E8

C8F0
3C5x/BCDx
C8F5

C8FC

C8FD

C900

C904

C905

C906

C90C

C90D

C90F

C911

C96C

C970

C974

C97C

C981
3C6x/BCEx
C988

C989

C98C

C990

C998

C999

C99B

C99D

C9C0

C9C1

C9C4

C9C7

C9C8

C9CA

C9D0

C9D1
3C7x/BCFx
C9D3

C9D5

C9D6

C9D9

C9DA

C7AC

C7AD

C7B0

C7B4

C7BC

C7BD

C7BF

C7C1

C7C0

C7E4
3D2x/BDAx
C7E8

C7EC

C81C

C81D

C820

C824

C82C

C82D

C82F

C831

C836

C830

C854

C858

C85C
3D3x/BDBx
C8C4

C8C8

C8CC

C8D4

C8D5

C8D7

C8D9

C8D8

C950

C951

C954

C957

C958

C960

C961

C963
3D4x/BDCx
C9A4

C88C

C88D

C890

C894

C89D

C89F

C8A1

C918

C92C

C8A8

C8BD

C8BC

C934

C938

C93C
3D5x/BDDx
C944

C945

C948

CC28

CC29

CC2C

CC2E

CC30

CC38

CC39

CC3B

CC3D

CC3E

CC3C

CC60

CC64
3D6x/BDEx
CC66

CC68

CC70

CC71

CC75

CC98

CC99

CC9C

CCA0

CCA8

CCA9

CCAB

CCAD

CCAC

CCD0

CCD1
3D7x/BDFx
CCD4

CCD8

CCE4

CD08

CD09

CD0C

CD10

CD18

CD19

CD1B

CD1D

CD78

CD7C

CD80

CD88
3E2x/BEAx
CD94

CD95

CD98

CD9B

CD9C

CDA4

CDA5

CDA7

CDA9

CE04

CE08

CE0C

CE14

CE19

CE20
3E3x/BEBx
CE21

CE24

CE28

CE30

CE31

CE33

CE35

CE58

CE59

CE5C

CE5F

CE60

CE61

CE68

CE69

CE6B
3E4x/BECx
CE6D

CC44

CC45

CC48

CC4C

CC54

CC55

CC57

CC59

CC58

CC7C

CCB4

CCB5

CCB8

CCBC

CCC4
3E5x/BEDx
CCC5

CCC7

CCC9

CCC8

CCEC

CCF0

CD01

CD5C

CD60

CD64

CD6C

CD6D

CD6F

CD71

CDE8

CDEC
3E6x/BEEx
CDF0

CDF8

CDF9

CDFB

CDFD

CE3C

CD24

CD25

CD28

CD2C

CD39

CDB0

CDC3

CDC4

CD40

CD44
3E7x/BEFx
CDCC

CDD0

CE74

CE75

CE78

CE7C

CE84

CE85

CE87

CE89

CE8E

CE88

CEAC

CEAD

CEB0
3F2x/BFAx
CEBC

CEBD

CEC1

CEE4

CEE5

CEE8

CEEB

CEEC

CEF4

CEF5

CEF7

CEF9

CEFD

CEFE

CEF8
3F3x/BFBx
CF1C

CF20

CF24

CF2C

CF2D

CF2F

CF31

CF30

CF54

CF55

CF58

CF5C

CF64

CF65

CF67

CF69
3F4x/BFCx
CFC4

CFE0

CFE1

CFE4

CFE8

CFF0

CFF1

CFF3

CFF5

D050

D054

D058

D060

D06C

D06D

D070
3F5x/BFDx
D074

D07C

D07D

D081

D0A4

D0A5

D0A8

D0AC

D0B4

D0B5

D0B7

D0B9

D0BE

CE90

CE91

CE94
3F6x/BFEx
CE98

CEA0

CEA1

CEA3

CEA5

CEAA

CEA4

CEC8

CF00

CF01

CF04

CF08

CF10

CF11

CF13

CF15
3F7x/BFFx
CF38

CFA8

CFB0

D034

D035

D038

D03C

D044

D045

D047

D049

D088

CF70

CF71

CF74
402x/C0Ax
CF78

CF80

CF85

CFFC
퀀
D000

D004

D011

CF8C

CF90

CF94

CFA1

D018

D019

D020

D02D
403x/C0Bx
D0C0

D0C1

D0C4

D0C8

D0C9

D0D0

D0D1

D0D3

D0D5

D0DA

D0D4

D0F8

D0FC

D10D

D130

D131
404x/C0Cx
D134

D138

D13A

D140

D141

D143

D145

D144

D168

D16C

D17C

D1A0

D1A1

D1A4

D1A8

D1B0
405x/C0Dx
D1B1

D1B3

D1B5

D1BA

D210

D22C

D22D

D230

D234

D23C

D23D

D23F

D241

D29C

D2A0

D2A4
406x/C0Ex
D2AC

D2B1

D2B8

D2B9

D2BC

D2BF

D2C0

D2C2

D2C8

D2C9

D2CB

D2CD

D2F0

D2F1

D2F4

D2F8
407x/C0Fx
D300

D301

D303

D305

D0DC

D0DD

D0E0

D0E4

D0EC

D0ED

D0EF

D0F1

D0F6

D0F0

D114
412x/C1Ax
D14C

D14D

D150

D154

D15C

D15D

D15F

D161

D166

D184

D188

D1F4

D1F8

D207

D209
413x/C1Bx
D280

D281

D284

D288

D290

D291

D295

D2D4

D2D8

D2DC

D2E4

D2E5

D1BC

D1C0

D248

D25C
414x/C1Cx
D1D8

D264

D268

D278

D30C

D30D

D310

D314

D316

D31C

D31D

D31F

D321

D325

D30E

D320
415x/C1Dx
D344

D345

D37C

D37D

D380

D384

D38C

D38D

D38F

D391

D390

D3B4

D3B5

D3B8

D3BC

D3C4
416x/C1Ex
D3C5

D3C7

D3C9

D3C8

D3EC

D3ED

D3F0

D3F4

D3FC

D3FD

D3FF

D401

D45C

D460

D464

D46D
417x/C1Fx
D46F

D478

D479

D47C

D47F

D480

D482

D488

D489

D48B

D48D

D4E8

D4EC

D4F0

D4F8
422x/C2Ax
D4FB

D4FD

D504

D508

D50C

D514

D515

D517

D519

D53C

D53D

D540

D544

D54C

D54D
423x/C2Bx
D54F

D551

D328

D329

D32C

D330

D338

D339

D33B

D33D

D33C

D360

D398

D399

D39C

D3A0
424x/C2Cx
D3A8

D3A9

D3AB

D3AD

D3B2

D3D0

D3D4

D3D8

D3E1

D3E3

D440

D444

D4CC

D4D0

D4D4

D4DC
425x/C2Dx
D4DF

D520

D524

D408

D41D

D494

D4A9

D558

D559

D55C

D560

D565

D568

D569

D56B

D56D
426x/C2Ex
D590

D5A5

D5C8

D5C9

D5CC

D5D0

D5D2

D5D5

D5D7

D5D8

D5D9

D5DB

D5DD

D600

D601

D604
427x/C2Fx
D608

D610

D611

D613

D615

D614

D638

D639

D63C

D63F

D640

D645

D648

D649

D64B
432x/C3Ax
D64D

D651

D6A8

D6AC

D6B0

D6B9

D6BB

D6C4

D6C5

D6C8

D6CC

D6D1

D6D4

D6D5

D6D7
433x/C3Bx
D6D9

D734

D735

D738

D73C

D744

D747

D749

D750

D751

D754

D756

D757

D758

D759

D75D
434x/C3Cx
D760

D761

D763

D765

D769

D788

D789

D78C

D790

D798

D799

D79B

D79D

D574

D575

D578
435x/C3Dx
D57C

D584

D585

D587

D589

D588

D5AC

D5E4

D5E5

D5E8

D5EC

D5F4

D5F5

D5F7

D5F9

D5F8
436x/C3Ex
D61C

D620

D624

D62D

D68C

D68D

D690

D694

D69D

D69F

D6A1

D718

D719

D71C

D720

D728
437x/C3Fx
D729

D72B

D72D

D76C

D770

D774

D77C

D77D

D781

D654

D655

D658

D65C

D664

D665
442x/C4Ax
D667

D669

D6E0

D6E1

D6E4

D6E8

D6F0

D6F5

D670

D671

D674

D683

D685

D684

D6FC
443x/C4Bx
D6FD

D700

D704

D711

AE4C

AE4D

AE50

AE53

AE54

AE56

AE5C

AE5D

AE5F

AE61

AE65

AE4E
444x/C4Cx
AE60

AE84

AE85

AE88

AE8C

AEBC

AEBD

AEC0

AEC4

AECC

AECD

AECF

AED1

AEBE

AED0

AEF4
445x/C4Dx
AEF8

AEFC

AF07

AF0D

AF08

AF2C

AF2D

AF30

AF31

AF32

AF34

AF3C

AF3D
꼿
AF3F

AF41

AF42
446x/C4Ex
AF43

AF9C

AFB8

AFB9

AFBC
꾿
AFBF

AFC0

AFC7

AFC8

AFC9

AFCB

AFCD

AFCE

B028

B044

B045
447x/C4Fx
B048

B04A

B04C

B04E

B053

B054

B055

B057

B059

B05D

B07C

B07D

B080

B084

B08C
452x/C5Ax
B08D

B08F

B091

AE68

AE69

AE6C

AE70

AE78

AE79

AE7B

AE7D

AE7C

AEA0

AED8

AED9
453x/C5Bx
AEDC

AEE0

AEE8

AEE9

AEEB

AEED

AEEC

AF10

AF80

AF81

AF84

AF88

AF90

AF91

AF95

B00C
454x/C5Cx
B010

B014

B01C

B01D

B021

AF48

AF49

AF4C

AF50

AF5B

AF5D

AF5C

AFD4

AFD8

AFDC

AFE5
455x/C5Dx
AFE7

AFE9

AFE8

AF64

AF65

AF68

AF6C

AF79

AFF0

AFF1

AFF4

AFF8
뀀
B000

B001

B005

B004
456x/C5Ex
B530

B531

B534

B538

B53F

B540

B541

B543

B545

B54B

B532

B544

B568

B570

B5A0

B5A1
457x/C5Fx
B5A4

B5A8

B5AA

B5AB

B5B0

B5B1

B5B3

B5B5

B5BB

B5B4

B5D8

B5EC

B610

B611

B614
462x/C6Ax
B618

B620

B621

B623

B625

B680

B69C

B69D

B6A0

B6A4

B6AB

B6AC

B6AD

B6B1

B70C
463x/C6Bx
B728

B729

B72C

B72F

B730

B738

B739

B73B

B73D

B760

B761

B764

B768

B770

B771

B773
464x/C6Cx
B775

B54C

B54D

B550

B554

B55C

B55D

B55F

B561

B560

B5BC

B5BD

B5C0

B5C4

B5CC

B5CD
465x/C6Dx
B5CF

B5D1

B5D0

B664

B668

B6F0

B6F4

B6F8

B700

B701

B705

B744

B745

B748

B74C

B754
466x/C6Ex
B755

B759

B62C

B630

B634

B6B8

B6CC

B648

B649

B6D4

BE60

BE61

BE64

BE68

BE6A

BE70
467x/C6Fx
BE71

BE73

BE75

BE7B

BE74

BE98

BE99

BE9C

BEA8

BED0

BED1

BED4

BED7

BED8

BEE0
472x/C7Ax
BEE3

BEE5

BEE4

BF08

BF09

BF18

BF19

BF1B

BF1D

BF1C

BF40

BF41

BF44

BF48

BF50
473x/C7Bx
BF51

BF53

BF55

BFB0

BFC5

BFCC

BFCD

BFD0

BFD4

BFDC

BFDD

BFDF

BFE1

C03C

C051

C058
474x/C7Cx
C05C

C060

C068

C069

C090

C091

C094

C098

C0A0

C0A1

C0A3

C0A5

BE7C

BE7D

BE80

BE84
475x/C7Dx
BE8C

BE8D

BE8F

BE91

BE90

BEB4

BEEC

BEED

BEF0

BEF4

BEFC

BF01

BF94

C020

C074

BF5C
476x/C7Ex
BFE8

C2F8

C2F9

C2FB

C2FC

C300

C308

C309

C30B

C30D

C313

C30C

C330

C334

C338

C345
477x/C7Fx
C368

C369

C36C

C370

C372

C378

C379

C37B

C37D

C36A

C37C

C3A0

C3D8

C3D9

C3DC
482x/C8Ax
C3DF

C3E0

C3E2

C3E8

C3E9

C3EB

C3ED

C448

C44C

C450

C458

C45D

C464

C465

C468
483x/C8Bx
C46C

C474

C475

C479

C4D4

C4D8

C4E7

C4E9

C4F0

C4F1

C4F4

C4F8

C4FA

C4FF

C500

C501
484x/C8Cx
C505

C528

C529

C52C

C52F

C530

C538

C539

C53B

C53D

C53C

C314

C315

C318

C31C

C324
485x/C8Dx
C325

C327

C329

C328

C34C

C384

C385

C388

C38C

C394

C395

C399

C3BC

C3C0

C42C

C42D
486x/C8Ex
C430

C434

C43C

C43D

C440

C4B8

C4BC

C50C

C510

C514

C51C

C3F4

C3F5

C3F8

C3FC

C407
487x/C8Fx
C409

C408

C480

C494

C410

C411

C424

C49C

C4A0

C4AD

C9DC

C9DD

C9E0

C9E2

C9E4
492x/C9Ax
C9E7

C9EC

C9ED

C9EF

C9F1

C9F0

CA14

CA18

CA24

CA29

CA4C

CA4D

CA50

CA54

CA57
493x/C9Bx
CA5C

CA5D

CA5F

CA61

CA60

CA84

CA98

CABC

CABD

CAC0

CAC4

CACC

CACD

CACF

CAD1

CAD2
494x/C9Cx
CAD3

CAD7

CB2C

CB30

CB3C

CB41

CB48

CB49

CB4C

CB50

CB58

CB59

CB5B

CB5D

CBB8

CBC0
495x/C9Dx
CBD4

CBD5

CBD8

CBDC

CBE4

CBE7

CBE9

CBEA

CC0C

CC0D

CC10

CC14

CC1C

CC1D

CC1F

CC21
496x/C9Ex
CC22

CC26

CC27

C9F8

C9F9

C9FC

CA00

CA08

CA09

CA0B

CA0D

CA0C

CA30

CA34

CA68

CA69
497x/C9Fx
CA6C

CA70

CA78

CA79

CA7D

CAA0

CB10

CB14

CB18

CB20

CB21

CB24

CB9C

CBF0

CBF4
4A2x/CAAx
CAD8

CAD9

CADC

CAE0

CAED

CAEC

CB64

CB79

CB78

CAF4

CB08

CB80

C544

C545

C548
4A3x/CABx
C549

C54A

C54C

C54D

C54E

C552

C553

C554

C555

C557

C559

C55D

C55E

C55F

C558

C57C
4A4x/CACx
C57D

C580

C583

C584

C587

C58C

C58D

C58F

C591

C595

C597

C590

C5B4

C5B5

C5B8

C5B9
4A5x/CADx
C5BB

C5BC

C5BD

C5BE

C5C4

C5C5

C5C6

C5C7

C5C9

C5CA

C5CC

C5CE

C5CF

C5C8

C5EC

C5ED
4A6x/CAEx
C5F0

C5F3

C5F4

C5F6

C5F7

C5FC

C5FD

C5FE

C5FF

C601

C605

C606

C607

C5EE

C600

C624
4A7x/CAFx
C625

C628

C62C

C62D

C62E

C630

C633

C634

C635

C637

C639

C63B

C63E

C694

C695
4B2x/CBAx
C698

C69C

C6A4

C6A5

C6A7

C6A9

C6B0

C6B1

C6B4

C6B8

C6B9

C6BA

C6C0

C6C1

C6C3
4B3x/CBBx
C6C5

C720

C721

C724

C728

C730

C731

C733

C735

C737

C73C

C73D

C740

C743

C744

C745
4B4x/CBCx
C74A

C74C

C74D

C74F

C751

C752

C753

C754

C755

C756

C757

C774

C775

C778

C77C

C77D
4B5x/CBDx
C77E

C783

C784

C785

C787

C789

C78A

C78E

C788

C560

C561

C564

C568

C570

C571

C573
4B6x/CBEx
C575

C574

C598

C59C

C5A0

C5A9

C5D0

C5D1

C5D4

C5D8

C5E0

C5E1

C5E3

C5E5

C5E4

C608
4B7x/CBFx
C60C

C610

C618

C619

C61B

C61D

C61C

C678

C679

C67C

C680

C688

C689

C68B

C68D
4C2x/CCAx
C704

C705

C708

C70C

C714

C715

C717

C719

C758

C75C

C760

C768

C76B

C640

C641
4C3x/CCBx
C644

C647

C648

C650

C651

C653

C655

C654

C6CC

C6CD

C6D0

C6D4

C6DC

C6DD

C6DF

C6E1
4C4x/CCCx
C6E0

C65C

C65D

C660

C66C

C66F

C671

C6E8

C6E9

C6EC

C6F0

C6F8

C6F9

C6FB

C6FD

C701
4C5x/CCDx (user-defined area)
4C6x/CCEx (user-defined area)
4C7x/CCFx (user-defined area)

Statistics by jamo

Initial consonants
Jamo Count
186
157
151
144
151
151
177
155
125
122
114
113
153
138
104
85
119
111
223
Total 2679
Vowels
Jamo Count
255
101
232
144
200
80
184
98
185
195
176
30
168
52
115
111
59
102
76
58
58
Total 2679
Final consonants
Jamo Count
(none) 391
226
7
317
3
10
51
288
26
50
11
3
5
4
15
250
233
3
224
264
29
18
5
31
40
26
16
133
Total 2679
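
Tallies of this kind could be reproduced from the Unicode mappings in the chart above with the standard arithmetic decomposition of Hangul syllables, sketched below in Python. The function names are hypothetical, and the indices follow the Unicode (South Korean) jamo ordering, which differs from the North Korean ordering used by KPS 9566, so the resulting rows would need reordering to match the tables.

```python
from collections import Counter

S_BASE = 0xAC00              # first precomposed syllable in Unicode
V_COUNT = 21                 # number of vowels
T_COUNT = 28                 # number of finals, including "no final"
N_COUNT = V_COUNT * T_COUNT  # 588 syllables per initial consonant


def decompose(code_point: int):
    """Return (initial, vowel, final) indices for a Hangul syllable."""
    if not 0xAC00 <= code_point <= 0xD7A3:
        raise ValueError("not a precomposed Hangul syllable")
    s_index = code_point - S_BASE
    return s_index // N_COUNT, (s_index % N_COUNT) // T_COUNT, s_index % T_COUNT


def tally(code_points):
    """Count how often each initial, vowel and final index occurs."""
    initials, vowels, finals = Counter(), Counter(), Counter()
    for cp in code_points:
        initial, vowel, final = decompose(cp)
        initials[initial] += 1
        vowels[vowel] += 1
        finals[final] += 1
    return initials, vowels, finals
```

Feeding `tally` the full list of syllable code points from the chart should give three counters that each sum to the table total, since every syllable contributes exactly one initial, one vowel and one final (possibly empty).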

Hanja sets (rows number 45 through 94)

The Hanja at 69-09 (0xE5A9) is mapped to U+676E in all documented tables. However, characters are ordered according to their readings, from which it appears that the intended character is U+67FF instead.[74]

Extended non-syllable, non-Hanja sets in KPS 9566-2011

The following charts cover the non-syllable, non-Hanja section of KPS 9566-2011 outside of the main plane.[3]

Extension set 0xE0 (symbols and pictographs)

KPS 9566-2011 (prefixed with 0xE0)
0 1 2 3 4 5 6 7 8 9 A B C D E F
4x
25D1

2298

2709

261B

261E

270C
5x
270D

270F

270E

2710

2713

2714

22A1

2394
6x
2299
7x ⚓︎
2693

263C
8x
25C9
9x
272A

272F

272C

272B

272E

272D

2730

2729
  Not in Unicode, mapped to Private Use area

Extension sets 0xE1, 0xE2, 0xE3 (unknown)

All characters in these extension sets map to the private use area. Their purpose is unknown.[3]

Extension set 0xE4 (arrows)

This set includes several arrows, most of them rightward, which map to the Unicode Dingbats block and elsewhere.[3]

KPS 9566-2011 (prefixed with 0xE4)
0 1 2 3 4 5 6 7 8 9 A B C D E F
4x
2794

2798

2799

279A

279B

279C

279D

279F

27A0

27A2

27A3

27A4

27A5

27A6

27A7
5x
27A8

27A9

27AA

27AB

27AC

27AD

27AE

27AF

27B1

27B2

27B3
6x
27B4

27B5

27B6

27B7

27B8

27B9

27BA

27BB

27BE

27BC

27BD
7x
8x
27F7

21CC

296B

296C

21D0

27F9
9x
  Not in Unicode, mapped to Private Use area

Extension set 0xE5 (Roman superscripts and subscripts)

This row includes several lowercase Roman superscripts with trail bytes corresponding to their uppercase ASCII equivalents, and lowercase Roman subscripts with trail bytes corresponding to their lowercase ASCII equivalents.[3]
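
The correspondence described above can be illustrated in Python using the compatibility decompositions in the Unicode Character Database, which record the base letter of each superscript or subscript. This is a sketch of the stated pattern only: the actual cell positions were not checked against the mapping file, characters mapped to the Private Use Area are not handled, and the function name is hypothetical.

```python
import unicodedata


def expected_trail_byte(ch: str) -> int:
    """Trail byte implied by the pattern: uppercase ASCII for superscripts,
    lowercase ASCII for subscripts."""
    parts = unicodedata.decomposition(ch).split()
    if len(parts) != 2 or parts[0] not in ("<super>", "<sub>"):
        raise ValueError("not a single-letter superscript or subscript")
    base = chr(int(parts[1], 16))
    if not "a" <= base <= "z":
        raise ValueError("base character is not a lowercase Roman letter")
    if parts[0] == "<super>":
        return ord(base.upper())
    return ord(base)


# U+1D43 is a superscript a, U+2090 is a subscript a.
for ch in ("\u1d43", "\u2090"):
    print(f"U+{ord(ch):04X} -> 0xE5{expected_trail_byte(ch):02X}")
```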

KPS 9566-2011 (prefixed with 0xE5)
0 1 2 3 4 5 6 7 8 9 A B C D E F
4x
1D43

1D47

1D9C

1D48

1D49

1DA0

1D4D

1D4F

1D50

207F

1D52
5x
1D56

1D57

1D58

1D5B
6x
2090

2091

1D62

2C7C

2092
7x
1D63

1D64

1D65

2093
8x
9x
  Not in Unicode, mapped to Private Use area

Extension set 0xE6 (Greek and symbol superscripts and subscripts)

KPS 9566-2011 (prefixed with 0xE6)
0 1 2 3 4 5 6 7 8 9 A B C D E F
4x
1D45

1D5D

1D5E

1D5F

1D4B
ᶿ
1DBF

1DB9
5x
1D60

1D61
6x
1D66

1D67
7x
1D68

1D69

1D6A
8x
207A

207B
9x
208A

208B
  Not in Unicode, mapped to Private Use area

Extension set 0xE7 (further list markers)

KPS 9566-2011 (prefixed with 0xE7)
0 1 2 3 4 5 6 7 8 9 A B C D E F
4x
325B

325C

325D

325E

325F

32B1

32B2

32B3

32B4

32B5

32B6

32B7

32B8

32B9

32BA
5x
32BB

32BC

32BD

32BE

32BF
6x
7x
8x
9x
  Not in Unicode, mapped to Private Use area

Extension set 0xE8

All characters in this extension set map to the private use area, except 0xE884 which maps to U+FE30 PRESENTATION FORM FOR VERTICAL TWO DOT LEADER.[3]

Extension set 0xE9 (additional symbols and punctuation)

This set contains playing card suit symbols, various miscellaneous symbols, and halfwidth counterparts for some of the currency symbols in row 8. The Kelvin sign is also included,[3] having been replaced in row 8 by the euro sign.[1]

KPS 9566-2011 (prefixed with 0xE9)
0 1 2 3 4 5 6 7 8 9 A B C D E F
4x
2205

2297

3013

2667

2661

2664

2662

25EF

29BE
5x
6x
7x
8x
212A

20A9

20A4
¥
00A5
9x
  Not in Unicode, mapped to Private Use area

Extension set 0xEA (Japanese punctuation and additional jamo)

This set contains several punctuation marks used in Japan, and some characters from the Hangul Compatibility Jamo Unicode block which are not already included in row 4.[3] These comprise some of the jamo characters present in KS X 1001 but previously absent from KPS 9566.

KPS 9566-2011 (prefixed with 0xEA)
0 1 2 3 4 5 6 7 8 9 A B C D E F
4x
30FD

30FE

309D

309E

3005

3006

3007

30FC
5x
6x
3165

316D

3171

3172

3173

3174

3175

3176

3177

3178

3179

317A

317B

317D

317E
7x
3180

3184

3185

3187

3188

3189

318A

318B

318C

119E

318E
8x
9x
  Not in Unicode, mapped to Private Use area

Footnotes

  1. ^ For instance, the headings of the ISO-IR-202 chart show 7-bit binary codes, as well as kuten/hang-yol codes, for the characters.[23]
  2. ^ a b As an ISO 2022-compatible 94ⁿ-character set, the plain space and delete character are always available as single-byte codes at 0x20 and 0x7F (not 0xA0 and 0xFF) respectively.
  3. ^ Or U+223C TILDE OPERATOR.[52]
  4. ^ Other mappings use U+00AD SOFT HYPHEN, to match KS X 1001 01-09.[52]
  5. ^ a b c d e A halfwidth counterpart of this character is present in row 14; this one is specifically a fullwidth character.
  6. ^ A vertical form of the tilde dash. The mapping file provided by the Unicode Consortium acknowledges by-name mapping to U+2E2F,[1] which is used by Red Star OS,[7] but notes that the Unicode character is intended for a significantly different character (a spacing vertical-tilde high diacritic) and also lists the mapping U+F104 (in the Private Use Area),[1] based on mapping data which had been submitted to the OpenOffice.org project in 2004.[22] Shown here using an image.
  7. ^ a b A character combining a period with a closing bracket, mapped to Private Use Area, shown here substituted.
  8. ^ Or U+25E6 WHITE BULLET.[52]
  9. ^ Or U+2022 BULLET.[52]
  10. ^ a b c Mapped to Private Use Area, shown here using an image.
  11. ^ Mac OS Korean (HangulTalk), an encoding of Wansung code plus extension sets, encodes a visually similar character at 0xA79B,[68] which Apple maps to the Unicode sequence U+25B4+20E4 (▴⃤).[69] There is no documented use of this mapping for the KPS 9566 character, however.
  12. ^ Accepted for inclusion in Unicode 16.0.[70]
  13. ^ a b c d e f g h i An emboldened/emphasised character from the name of a North Korean leader, mapped to Private Use Area, shown here simulated with markup.
  14. ^ a b c d e Form of a fraction with a horizontal bar and vertical arrangement, mapped to Private Use Area, shown here simulated.
  15. ^ Degrees Kelvin in the 1997 version (some versions of the code chart include a degree sign in the unit symbol); euro sign as of the 2003 version.
  16. ^ a b Emblem of the Workers' Party of Korea, mapped to Private Use Area, shown here using an image.
  17. ^ Or U+279E HEAVY TRIANGLE-HEADED RIGHTWARDS ARROW or U+2B95 RIGHTWARDS BLACK ARROW: see text.
  18. ^ Listed in 1997 version charts and in Unicode proposal N2374 from 2001. Removed in 2003 version.
  19. ^ Mapped to U+261E (☞) in the 2003 edition.[1] The 2011 edition instead maps it to the Private Use Area character U+F13B.[3] The reference glyph is a backhand manicule,[23][3] i.e. matching U+1F449 (👉︎). Compare 0xE04D in KPS 9566-2011.
  20. ^ Circled upward-pointing manicule, mapped to Private Use Area,[1] shown here using an image. One possible non-PUA mapping would be to the sequence U+1F446+20DD (👆︎⃝).[7]
  21. ^ Up-left pointing scissors, mapped to Private Use Area, shown here using an image.

References

  1. ^ a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab "KPS 9566-2003 to Unicode". Unicode Consortium.
  2. ^ a b c d e f g h i j k l Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 148–151. ISBN 978-0-596-51447-1.
  3. ^ a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad Chung, Jaemin (2018-01-05). "Information on the most recent version of KPS 9566 (KPS 9566-2011?)" (PDF). UTC L2/18-011.
  4. ^ a b c d Cho, Chun-Hui (2000-07-05). "DPRK letter on character names and ordering in 10646-1: 2000" (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2231.
  5. ^ a b c Lunde, Ken (2019-03-25). "Four of a Kind: KS X 1001 & KPS 9566". CJK Type Blog. Adobe Inc.
  6. ^ a b Ewell, Doug (2002-08-15). "Re: Scripts in Unicode 4.0". Unicode Mail List Archive.
  7. ^ a b c d West, Andrew (2015-05-29). "KPS 9566 mappings (was Re: Arrow dingbats)". Unicode Mailing List Archive.
  8. ^ a b c Jennings, Thomas Daniel (2020-03-17) [1999]. "An annotated history of some character codes or ASCII: American Standard Code for Information Infiltration". Sensitive research (SR-IX). Archived from the original on 2016-05-22. Retrieved 2020-03-17.
  9. ^ "Standard ECMA-6: 7-bit Coded Character Set". Ecma International.
  10. ^ a b Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. p. 89. ISBN 978-0-596-51447-1.
  11. ^ ECMA/TC 1 (1973). "Brief History". 7-bit Input/Output Coded Character Set (PDF) (4th ed.). ECMA. ECMA-6:1973.
  12. ^ ECMA (1994). Character Code Structure and Extension Techniques (PDF) (6th ed.). ECMA-35:1994.
  13. ^ Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 19–20, 581–582. ISBN 978-0-596-51447-1.
  14. ^ Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 84–85. ISBN 978-0-596-51447-1.
  15. ^ a b "2.4: Multiple byte graphic character sets". International Register of Coded Character Sets to be Used With Escape Sequences (ISO-IR) (PDF). ITSCJ/IPSJ. p. 14.
  16. ^ a b c Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 94–147. ISBN 978-0-596-51447-1.
  17. ^ a b Lunde, Ken (2009). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. pp. 242–255. ISBN 978-0-596-51447-1.
  18. ^ a b c Shin, Jungshik. "What are KS X 1001(KS C 5601) and other Hangul codes?". Hangul & Internet in Korea FAQ.
  19. ^ a b Hwang, Jinsang (2005). The Social Shaping of ICTs Standards: A Case of National Coded Character Set Standards Controversy in Korea (PDF). University of Edinburgh.
  20. ^ Lunde, Ken (1995-12-18). "3.3.6: N-byte Hangul". CJK.INF Version 1.9.
  21. ^ a b Committee for Standardization of the D P R of Korea (CSK) (2000-08-10). "Evidence for arrangement of Korean characters proposed by CSK" (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2246.
  22. ^ a b c d "Conversion tables between KPS 9566-2003(N. Korean) & Unicode". Apache OpenOffice (AOO) Bugzilla. 2004-08-27.
  23. ^ a b c d e f g h i Committee for Standardization of D. P. R. of Korea (1998-06-22). DPRK Standard Korean Graphic Character Set for Information Interchange (PDF). ITSCJ/IPSJ. ISO-IR-202.
  24. ^ Unicode Consortium. "History of Unicode Release and Publication Dates".
  25. ^ West, Andrew (2019-06-17) [2007-06-05]. "Unicode and ISO/IEC 10646".
  26. ^ Murata, Makoto (14 April 2000). "XML Japanese Profile". W3C Notes. W3C.
  27. ^ van Kesteren, Anne. Encoding Standard. WHATWG.
  28. ^ Lunde, Ken (1999). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing. Sebastopol, CA: O'Reilly. p. 116. ISBN 1-56592-224-7.
  29. ^ a b c d Bai, Yi; Sim, CheonHyeong (2022-10-16). "Proposal to consider adding CodeCharts support for kIRG_KPSource representative glyphs in Unicode" (PDF). UTC L2/22-238.
  30. ^ Cook, Richard. "Q: Why are DPRK (North Korean == kIRG_KPSource) glyphs missing from some CJK code charts?". FAQ - Chinese and Japanese. Unicode Consortium. Archived from the original on 2022-10-04.
  31. ^ Jenkins, John H.; Cook, Richard; Lunde, Ken (2020-03-05). "Unicode Han Database (Unihan)". kIRG_KPSource. Unicode Standard Annex #38.
  32. ^ Sim, CheonHyeong (2022-06-19). "KPS 10721:2000 (Unicode KP1源) 文件重构 (修订版)" (PDF) (in Simplified Chinese).
  33. ^ For example: "CJK Compatibility Ideographs (§ DPRK compatibility ideographs)" (PDF). Unicode 15.0 Versioned Charts (delta charts). Unicode Consortium. 2022.
  34. ^ Lunde, Ken (2022-11-01). "35) L2/22-238: Proposal to consider adding CodeCharts support for kIRG_KPSource representative glyphs" (PDF). CJK & Unihan Group Recommendations for UTC #173 Meeting. UTC L2/22-247.
  35. ^ Lunde, Ken (2023-02-07). "US/Unicode Activity Report for IRG #60" (PDF). UTC L2/23-058, ISO/IEC JTC1/SC2/WG2/IRG N2599.
  36. ^ Yergeau, F. (1998). UTF-8, a transformation format of ISO 10646. IETF. doi:10.17487/rfc2279. RFC 2279.
  37. ^ "Unicode Character Encoding Stability Policies". Unicode Consortium. 2017-06-23.
  38. ^ Jo, Chun-Hui (1999-08-10). "Amendment of the part containing the Korean characters in ISO/IEC 10646-1:1998 amendment 5" (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2056.
  39. ^ "New Work item proposal (NP) for an amendment of the Korean part of ISO/IEC 10646-1:1993". 1999-12-07. L2/99-380, ISO/IEC JTC 1 N5999.
  40. ^ Karlsson, Kent (2000-03-02). "Comments on DPRK New Work Item proposal on Korean characters". ISO/IEC JTC 1/SC 2/WG 2 N2167.
  41. ^ Committee for Standardization of the D P R of Korea (CSK) (2000-08-10). "Proposal for the addition of 14 Korean alphabets to ISO/IEC 10646-1" (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2243.
  42. ^ a b Committee for Standardization of the D P R of Korea (CSK) (2000-08-10). "Proposal for the addition of 82 symbols to ISO/IEC 10646-1" (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2244.
  43. ^ Committee for Standardization of the D P R of Korea (CSK) (2000-08-10). "Proposal to change the existing name of Korean characters in ISO/IEC 10646-1" (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2245.
  44. ^ Committee for Standardization of the D P R of Korea (CSK) (2000-08-10). "Proposal to add the Hanja column of D. P. R. of Korea in ISO/IEC 10646-1 (14938 ideographs to CJK Unified Ideographs and 3181 ideographs to its Extention [sic] A)" (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2247.
  45. ^ Korean script ad hoc group (2000-09-21). "Report of the meeting of the Korean script ad hoc group". ISO/IEC JTC 1/SC 2/WG 2 N2282.
  46. ^ a b c d e f g Committee for Standardization of the D P R of Korea (CSK) (2001-09-03). Proposal to add of 70 symbols to ISO/IEC 10646-1:2000 (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2374.
  47. ^ Committee for Standardization of the D P R of Korea (CSK) (2001-09-03). Proposal to add the 160 Compatibility Hanja code table of D P R of Korea into CJK Compatibility Ideographs (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2375.
  48. ^ a b c Gim, Gyeongseog (2001-10-13). ROK's Comments about DPRK's proposal, WG2 N 2374, to add 70 symbols to ISO/IEC 10646-1:2000 (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2390.
  49. ^ a b c d e Korean Script ad hoc group (2001-10-16). A Report of Korean Script ad hoc group meeting on Oct. 15, 2001 (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2392, UTC L2/01-388. Archived from the original (PDF) on 2020-08-03. Retrieved 2020-04-29.
  50. ^ a b c d Freytag, Asmus (2002-02-13). "Notes on proposed Symbols from DPRK" (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2417, UTC L2/02-102.
  51. ^ a b Emojipedia. "Unicode 4.0 Emoji". Emojipedia.
  52. ^ a b c d e f Kim, Kyongsok (2002-11-30). "National Body Position: 3-way cross-reference tables - KS X 1001, KPS 9566, and UCS" (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2564. [Note: updated links for the tables accompanying the document: [1], [2], both archived 2021-04-03 at the Wayback Machine.]
  53. ^ a b c d "Miscellaneous Symbols" (PDF). Unicode 4.0.0 Delta Code Charts. Unicode Consortium.
  54. ^ a b c d Whistler, Ken (2015-05-28). "Re: Arrow dingbats". Unicode Mail List Archive.
  55. ^ "Miscellaneous Symbols and Arrows" (PDF). Unicode 4.0.0 Delta Code Charts. Unicode Consortium.
  56. ^ a b Overington, William (2003-02-24). "Unicode 4.0 beta characters".
  57. ^ "Miscellaneous Symbols" (PDF). Unicode 3.2.0 Delta Code Charts. Unicode Consortium.
  58. ^ The Unicode 4.0 code chart shows the modified glyph,[53] whereas the Unicode 3.2 code chart shows the previous glyph.[57]
  59. ^ a b Scherer, Markus; Davis, Mark; Momoi, Kat; Tong, Darick; Kida, Yasuo; Edberg, Peter. "Emoji Symbols: Background Data—Background data for Proposal for Encoding Emoji Symbols" (PDF). UTC L2/10-132.
  60. ^ Suignard, Michel (2007-09-18). "Japanese TV Symbols" (PDF). UTC L2/07-391, ISO/IEC JTC 1/SC 2/WG 2 N3341.
  61. ^ Unicode Consortium (2020). "Emoji Versions & Sources, v13.0".
  62. ^ Emojipedia. "Unicode 5.2 Emoji List". Emojipedia.
  63. ^ Emojipedia. "Waving White Flag Emoji". Emojipedia.
  64. ^ Emojipedia. "Waving Black Flag Emoji". Emojipedia.
  65. ^ Marín Silva, Eduardo (2018). Proposal to reconsider compatibility symbols and punctuation used in the DPRK (PDF). UTC L2/18-004.
  66. ^ a b Korean Script ad hoc group (2001-10-16). A Report of Korean Script ad hoc group meeting on Oct. 15, 2001 (PDF). ISO/IEC JTC 1/SC 2/WG 2 N2392, UTC L2/01-388. Archived from the original (PDF) on 2020-08-03. Retrieved 2020-04-29. D P R of Korea suggested that they would review this character more carefully before it is discussed again at Korean Script ad hoc group or WG2.
  67. ^ a b Marín Silva, Eduardo (2018). Proposal to encode: SYMBOL FOR TYPE A ELECTRONICS (PDF). UTC L2/18-184R.
  68. ^ Lunde, Ken (2009). "Appendix E: Vendor Character Set Standards" (PDF). CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. ISBN 978-0-596-51447-1.
  69. ^ Apple (2005-04-05). "Map (external version) from Mac OS Korean encoding to Unicode 3.2 and later". Unicode Consortium.
  70. ^ "Symbols for Legacy Computing Supplement" (PDF). DRAFT The Unicode Standard, Version 16.0 BETA REVIEW. Unicode Consortium. Retrieved 2024-05-27.
  71. ^ Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03.
  72. ^ Lunde, Ken (2009). "Seemingly Missing Characters". CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing (2nd ed.). Sebastopol, CA: O'Reilly. p. 180. ISBN 978-0-596-51447-1.
  73. ^ This table is generated from KPS9566.TXT.[1]
  74. ^ Chung, Jaemin (2021-03-17). "KP0-E5A9 should be mapped to U+67FF instead of U+676E" (PDF). UTC L2/21-059.