Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Editing UTF-32

You are not logged in. Your IP address will be publicly visible if you make any edits. If you log in or create an account, your edits will be attributed to a username, among other benefits.
Content that violates any copyrights will be deleted. Encyclopedic content must be verifiable through citations to reliable sources.
Latest revision Your text
Line 7: Line 7:


== History ==
== History ==
The original [[ISO/IEC 10646]] standard defines a 32-bit ''encoding form'' called '''UCS-4''', in which each code point in the [[Universal Character Set]] (UCS) is represented by a 31-bit value from 0 to 0x7FFFFFFF (the sign bit was unused and zero). In November 2003, Unicode was restricted by RFC&nbsp;3629 to match the constraints of the [[UTF-16]] encoding: explicitly prohibiting code points greater than U+10FFFF (and also the high and low surrogates U+D800 through U+DFFF). This limited subset defines UTF-32.<ref>{{Cite web|title=Publicly Available Standards - ISO/IEC 10646:2020 |url=https://standards.iso.org/ittf/PubliclyAvailableStandards/index.html|access-date=2021-10-12|website=ISO Standards |quote=Clause 9.4: "Because surrogate code points are not UCS scalar values, UTF-32 code units in the range 0000 D800-0000 DFFF are ill-formed". Clause 4.57: "[UCS codespace] <!--doubt this is in any source, or then redundant previously: codespace--> consisting of the integers from 0 to 10 FFFF (hexadecimal)".<!--in an older(?) source like: "uses the UCS codespace which consists of the integers from 0 to 10FFFF."[http://std.dkuug.dk/JTC1/sc2/WG2/docs/n3967.zip/FCD-10646-00-Main.pdf]--> Clause 4.58: "[UCS scalar value] any UCS code point except high-surrogate and low-surrogate code points".}}</ref><ref name="4_or_3_bytes">{{Cite web |title=Mapping codepoints to Unicode encoding forms |url=https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-AppendixA |first1=Peter |last1=Constable |date=2001-06-13 |access-date=2022-10-03 |website=Computers and Writing Systems - SIL International }}</ref> Although the ISO standard had (as of 1998 in Unicode 2.1) "reserved for private use" 0xE00000 to 0xFFFFFF, and 0x60000000 to 0x7FFFFFFF<ref>{{Cite web |title=Annex B - The Universal Character Set (UCS) |url=http://std.dkuug.dk/cen/tc304/guidecharactersets/guideannexb.html |access-date=2022-10-03 |website=DKUUG Standardizing |url-status=live |archive-url=https://web.archive.org/web/20220122081513/http://std.dkuug.dk/cen/tc304/guidecharactersets/guideannexb.html |archive-date= Jan 22, 2022 }}</ref> these areas were removed in later versions. Because the Principles and Procedures document of [[ISO/IEC JTC 1/SC 2]] Working Group 2 states that all future assignments of code points will be constrained to the Unicode range, UTF-32 will be able to represent all UCS code points and UTF-32 and UCS-4 are identical.<ref>{{Cite book |title=The Unicode Standard, version 6.0 |date=February 2011 |publisher=[[Unicode Consortium]] |isbn=978-1-936213-01-6 |location=Mountain View, CA |pages=573 |chapter=C.2 Encoding Forms in ISO/IEC 10646 |quote=It [UCS-4] is now treated simply as a synonym for UTF-32, and is considered the canonical form for representation of characters in 10646. |chapter-url=https://www.unicode.org/versions/Unicode6.0.0/appC.pdf}}</ref>
The original [[ISO/IEC 10646]] standard defines a 32-bit ''encoding form'' called '''UCS-4''', in which each code point in the [[Universal Character Set]] (UCS) is represented by a 31-bit value from 0 to 0x7FFFFFFF (the sign bit was unused and zero). In November 2003, Unicode was restricted by RFC&nbsp;3629 to match the constraints of the [[UTF-16]] encoding: explicitly prohibiting code points greater than U+10FFFF (and also the high and low surrogates U+D800 through U+DFFF). This limited subset defines UTF-32.<ref>{{Cite web|title=Publicly Available Standards - ISO/IEC 10646:2020 |url=https://standards.iso.org/ittf/PubliclyAvailableStandards/index.html|access-date=2021-10-12|website=ISO Standards |quote=Clause 9.4: "Because surrogate code points are not UCS scalar values, UTF-32 code units in the range 0000 D800-0000 DFFF are ill-formed". Clause 4.57: "[UCS codespace] <!--doubt this is in any source, or then redundant previously: codespace--> consisting of the integers from 0 to 10 FFFF (hexadecimal)".<!--in an older(?) source like: "uses the UCS codespace which consists of the integers from 0 to 10FFFF."[http://std.dkuug.dk/JTC1/sc2/WG2/docs/n3967.zip/FCD-10646-00-Main.pdf]--> Clause 4.58: "[UCS scalar value] any UCS code point except high-surrogate and low-surrogate code points".}}</ref><ref name="4_or_3_bytes">{{Cite web |title=Mapping codepoints to Unicode encoding forms |url=https://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IWS-AppendixA |first1=Peter |last1=Constable |date=2001-06-13 |access-date=2022-10-03 |website=Computers and Writing Systems - SIL International }}</ref> Although the ISO standard had (as of 1998 in Unicode 2.1) "reserved for private use" 0xE00000 to 0xFFFFFF, and 0x60000000 to 0x7FFFFFFF<ref>{{Cite web |title=Annex B - The Universal Character Set (UCS) |url=http://std.dkuug.dk/cen/tc304/guidecharactersets/guideannexb.html |access-date=2022-10-03 |website=DKUUG Standardizing |url-status=live |archive-url=https://web.archive.org/web/20220122081513/http://std.dkuug.dk/cen/tc304/guidecharactersets/guideannexb.html |archive-date= Jan 22, 2022 }}</ref> these areas were removed in later versions. Because the Principles and Procedures document of [[ISO/IEC JTC 1/SC 2]] Working Group 2 states that all future assignments of code points will be constrained to the Unicode range, UTF-32 will be able to represent all UCS code points and UTF-32 and UCS-4 are identical.


== Utility of fixed width ==
== Utility of fixed width ==
By publishing changes, you agree to the Terms of Use, and you irrevocably agree to release your contribution under the CC BY-SA 4.0 License and the GFDL. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel Editing help (opens in new window)

Copy and paste: – — ° ′ ″ ≈ ≠ ≤ ≥ ± − × ÷ ← → · §   Cite your sources: <ref></ref>


{{}}   {{{}}}   |   []   [[]]   [[Category:]]   #REDIRECT [[]]   &nbsp;   <s></s>   <sup></sup>   <sub></sub>   <code></code>   <pre></pre>   <blockquote></blockquote>   <ref></ref> <ref name="" />   {{Reflist}}   <references />   <includeonly></includeonly>   <noinclude></noinclude>   {{DEFAULTSORT:}}   <nowiki></nowiki>   <!-- -->   <span class="plainlinks"></span>


Symbols: ~ | ¡ ¿ † ‡ ↔ ↑ ↓ • ¶   # ∞   ‹› «»   ¤ ₳ ฿ ₵ ¢ ₡ ₢ $ ₫ ₯ € ₠ ₣ ƒ ₴ ₭ ₤ ℳ ₥ ₦ № ₧ ₰ £ ៛ ₨ ₪ ৳ ₮ ₩ ¥   ♠ ♣ ♥ ♦   𝄫 ♭ ♮ ♯ 𝄪   © ® ™
Latin: A a Á á À à  â Ä ä Ǎ ǎ Ă ă Ā ā à ã Å å Ą ą Æ æ Ǣ ǣ   B b   C c Ć ć Ċ ċ Ĉ ĉ Č č Ç ç   D d Ď ď Đ đ Ḍ ḍ Ð ð   E e É é È è Ė ė Ê ê Ë ë Ě ě Ĕ ĕ Ē ē Ẽ ẽ Ę ę Ẹ ẹ Ɛ ɛ Ǝ ǝ Ə ə   F f   G g Ġ ġ Ĝ ĝ Ğ ğ Ģ ģ   H h Ĥ ĥ Ħ ħ Ḥ ḥ   I i İ ı Í í Ì ì Î î Ï ï Ǐ ǐ Ĭ ĭ Ī ī Ĩ ĩ Į į Ị ị   J j Ĵ ĵ   K k Ķ ķ   L l Ĺ ĺ Ŀ ŀ Ľ ľ Ļ ļ Ł ł Ḷ ḷ Ḹ ḹ   M m Ṃ ṃ   N n Ń ń Ň ň Ñ ñ Ņ ņ Ṇ ṇ Ŋ ŋ   O o Ó ó Ò ò Ô ô Ö ö Ǒ ǒ Ŏ ŏ Ō ō Õ õ Ǫ ǫ Ọ ọ Ő ő Ø ø Œ œ   Ɔ ɔ   P p   Q q   R r Ŕ ŕ Ř ř Ŗ ŗ Ṛ ṛ Ṝ ṝ   S s Ś ś Ŝ ŝ Š š Ş ş Ș ș Ṣ ṣ ß   T t Ť ť Ţ ţ Ț ț Ṭ ṭ Þ þ   U u Ú ú Ù ù Û û Ü ü Ǔ ǔ Ŭ ŭ Ū ū Ũ ũ Ů ů Ų ų Ụ ụ Ű ű Ǘ ǘ Ǜ ǜ Ǚ ǚ Ǖ ǖ   V v   W w Ŵ ŵ   X x   Y y Ý ý Ŷ ŷ Ÿ ÿ Ỹ ỹ Ȳ ȳ   Z z Ź ź Ż ż Ž ž   ß Ð ð Þ þ Ŋ ŋ Ə ə
Greek: Ά ά Έ έ Ή ή Ί ί Ό ό Ύ ύ Ώ ώ   Α α Β β Γ γ Δ δ   Ε ε Ζ ζ Η η Θ θ   Ι ι Κ κ Λ λ Μ μ   Ν ν Ξ ξ Ο ο Π π   Ρ ρ Σ σ ς Τ τ Υ υ   Φ φ Χ χ Ψ ψ Ω ω   {{Polytonic|}}
Cyrillic: А а Б б В в Г г   Ґ ґ Ѓ ѓ Д д Ђ ђ   Е е Ё ё Є є Ж ж   З з Ѕ ѕ И и І і   Ї ї Й й Ј ј К к   Ќ ќ Л л Љ љ М м   Н н Њ њ О о П п   Р р С с Т т Ћ ћ   У у Ў ў Ф ф Х х   Ц ц Ч ч Џ џ Ш ш   Щ щ Ъ ъ Ы ы Ь ь   Э э Ю ю Я я   ́
IPA: t̪ d̪ ʈ ɖ ɟ ɡ ɢ ʡ ʔ   ɸ β θ ð ʃ ʒ ɕ ʑ ʂ ʐ ç ʝ ɣ χ ʁ ħ ʕ ʜ ʢ ɦ   ɱ ɳ ɲ ŋ ɴ   ʋ ɹ ɻ ɰ   ʙ ⱱ ʀ ɾ ɽ   ɫ ɬ ɮ ɺ ɭ ʎ ʟ   ɥ ʍ ɧ   ʼ   ɓ ɗ ʄ ɠ ʛ   ʘ ǀ ǃ ǂ ǁ   ɨ ʉ ɯ   ɪ ʏ ʊ   ø ɘ ɵ ɤ   ə ɚ   ɛ œ ɜ ɝ ɞ ʌ ɔ   æ   ɐ ɶ ɑ ɒ   ʰ ʱ ʷ ʲ ˠ ˤ ⁿ ˡ   ˈ ˌ ː ˑ ̪   {{IPA|}}

Wikidata entities used in this page

  • UTF-32: Sitelink, Title, Description: en

Pages transcluded onto the current version of this page (help):