<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" ""> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <meta http-equiv="Content-Language" content="en-us"> <meta name="keywords" content="Unicode Standard"> <title>Unicode 16.0.0</title> <link rel="stylesheet" type="text/css" href=""> <style type="text/css"> :root { /* The following variable is used to control display of a (pre-release) "Status" paragraph at the top of the page. The element is displayed during alpha and beta periods, but hidden for the release. Prior to release, set to: --status_display: block; For the release, set to: --status_display: none; */ --status_display: none; } .status { background-color: aquamarine; border: 1px solid black; padding: 2px; display: var(--status_display); } .nobullet { list-style-type: none; } </style> </head> <body> <table width="100%" cellpadding="0" cellspacing="0" border="0"> <!-- BEGIN HEADER BAR --> <tr> <td colspan="2"> <table width="100%" border="0" cellpadding="0" cellspacing="0"> <tr> <td class="icon" style="width:38px; height:35px"> <a href=""> <img border="0" src="" align="middle" alt="[Unicode]" width="34" height="33"></a> </td> <td class="icon" style="vertical-align:middle"> <a class="bar"> </a> <a class="bar" href=""><font size="3">Unicode 16.0.0</font></a> </td> <td class="bar"> <a href="" class="bar">Tech Site</a> | <a href="" class="bar">Site Map</a> | <a href="" class="bar">Search </a> </td> </tr> </table> </td> </tr> <tr> <td colspan="2" class="gray"> </td> </tr> <!-- END HEADER BAR --> <tr> <!-- BEGIN CONTENTS --> <td valign="top" class="contents"> <blockquote> <h1>Unicode® 16.0.0</h1> <h3 align="center" style="margin-top:1em">2024 September 10 (<a href="">Announcement</a>)</h3> <blockquote class="status"><strong>STATUS:</strong> This is a preliminary draft page for an upcoming release. Some details may be missing or incorrect, and some links may be wrong or broken. 
<!--During the alpha review period, errors are expected and feedback is not necessary.--> During the beta review period, feedback on errors will be helpful and appreciated.</blockquote> <p>This page summarizes the important changes for the Unicode Standard, Version 16.0.0. This version supersedes all previous versions of the Unicode Standard.</p> <blockquote> <ul class="nobullet"> <li>A. <a href="#Summary">Summary</a></li> <li>B. <a href="#Technical_Overview">Technical Overview</a> <ul class="nobullet"> <li><a href="#Core_Specification">Core Specification</a></li> <li><a href="#Code_Charts">Code Charts</a></li> <li><a href="#Radical_Stroke">Han Radical-Stroke Indices</a></li> <li><a href="#Annexes">Unicode Standard Annexes</a></li> <li><a href="#UCD">Unicode Character Database</a></li> <li><a href="#Version_References">Version References</a></li> <li><a href="#Errata">Errata</a></li> </ul> </li> <li>C. <a href="#Stability_Policy">Stability Policy Update</a></li> <li>D. <a href="#Character_Additions">Textual Changes and Character Additions</a></li> <li>E. <a href="#Conformance_Changes">Conformance Changes</a></li> <li>F. <a href="#Database_Changes">Changes in the Unicode Character Database</a></li> <li>G. <a href="#UAX_Changes">Changes in the Unicode Standard Annexes</a></li> <li>H. <a href="#UTS_Changes">Changes in Synchronized Unicode Technical Standards</a></li> <li>I. <a href="#Components">List of Components</a></li> <li>M. <a href="#Migration">Implications for Migration</a></li> </ul> </blockquote> <h2><a id="Summary"></a>A. Summary</h2> <p>Unicode 16.0 adds 5185 characters, for a total of 154,998 characters. 
The new additions include seven new scripts: </p> <ul> <li>Garay is a modern-use script from West Africa.</li> <li>Gurung Khema, Kirat Rai, Ol Onal and Sunuwar are four modern-use scripts from Northeast India and Nepal.</li> <li>Todhri is an historic script used for Albanian.</li> <li>Tulu-Tigalari is an historic script from Southwest India.</li> </ul> <p>Other character additions include seven new emoji characters plus 3,995 additional Egyptian Hieroglyphs and over 700 symbols from legacy computing environments.</p> <p>In addition to new characters, new “Moji Jōhō Kiban” (文字情報基盤) Japanese source references have been added for over 36,000 CJK unified ideographs. These are reflected in the code charts for virtually all CJK unified ideograph blocks by additional representative glyphs in the “J” column.</p> <h3>New Data Files for Unicode 16.0</h3> <ul> <li>DoNotEmit.txt. This is a new file that collects information about characters or character sequences that should not be emitted or generated in newly authored text and for which a suitable alternative sequence exists. This data could be used by applications such as input methods or autocorrect.</li> <li>Unikemet.txt. This data file provides property and other character information in support of Egyptian hieroglyphs.</li> </ul> <h3>Synchronization</h3> <p>Several other important Unicode specifications have been updated for Version 16.0. The following four Unicode Technical Standards are versioned in synchrony with the Unicode Standard, because their data files cover the same repertoire. 
All have been updated to Version 16.0:</p> <div align="center"> <table class="subtle"> <tr> <th>Specification</th> <th>Scope</th> <th>Data Files</th> </tr> <tr> <td><a href="">UTS #10, Unicode Collation Algorithm</a></td> <td>Sorting Unicode text</td> <td><a href="">UCA data</a></td> </tr> <tr> <td><a href="">UTS #39, Unicode Security Mechanisms</a></td> <td>Reducing Unicode spoofing</td> <td><a href="">Security data</a></td> </tr> <tr> <td><a href="">UTS #46, Unicode IDNA Compatibility Processing</a></td> <td>Compatible processing of non-ASCII URLs</td> <td><a href="">IDNA data</a><br> <a href="">IDNA 2008 derived data</a></td> </tr> <tr> <td><a href="">UTS #51, Unicode Emoji</a></td> <td>Emoji and their behavior</td> <td><a href="">Emoji data</a></td> </tr> </table> </div> <p>Some of the changes in Version 16.0 and associated Unicode Technical Standards may require modifications to implementations. For more information, see the migration and modification sections of UTS #10, UTS #39, UTS #46, and UTS #51.</p> <p>See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.</p> <p>See the following resource links for general information about Unicode versions and other information about the Unicode Standard and other publications of the Unicode Consortium.</p> <ul> <li> <a href="">Archive of Unicode Versions</a></li> <li> <a href="">About Versions</a></li> <li> <a href="">Glossary of Unicode Terms</a></li> <li> <a href="">References for the Unicode Standard</a></li> <li><a href="">Unicode Acknowledgements</a></li> <li> <a href="">Technical Reports</a></li> <li> <a href="">Unicode Emoji</a></li> </ul> <h2><a id="Technical_Overview"></a>B. 
Technical Overview</h2> <p>Version 16.0 of the Unicode Standard consists of:</p> <ul> <li>The core specification</li> <li>The code charts (delta and archival) for this version</li> <li>The Unicode Standard Annexes</li> <li>The Unicode Character Database (UCD)</li> </ul> <p>The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters. The Unicode Standard Annexes supply detailed normative information about particular aspects of the standard. The Unicode Character Database supplies normative and informative data for implementers to allow them to implement the Unicode Standard.</p> <h3><a id="Core_Specification"></a>Core Specification</h3> <p>The core specification for Version 16.0 is available for browsing online as <a href="">per-chapter web pages</a>. Because the full table of contents for the core specification is provided, with interactive links, no separate bookmarks page is provided for this release, nor are separate chapter links provided directly in this summary page for the Unicode Standard. Anchors for chapters, sections, tables, and figures in the core specification are shown with the convention of a "#" in the left margin of the heading or caption. Those anchors can be clicked on to provide custom bookmarks to any particular portion of the text, down to the level of subsections. Numbering of sections has been extended down to the subsection level, as well, to improve referenceability of precise content.</p> <p>The HTML version of the core specification is authoritative. However, for convenience of reference, an archival version of the core specification is also available as a <a href="">single pdf</a> (12 MB).</p> <h3><a id="Code_Charts"></a>Code Charts</h3> <p>Several sets of code charts are available. 
They serve different purposes:</p> <div align="center"> <table class="simple"> <tr> <th>Chart Type</th> <th>Description</th> </tr> <tr> <td nowrap><a href="">Latest Code Charts</a></td> <td>These charts are always the most current code charts available, and may be updated at any time. The charts are organized by scripts and blocks for easy reference. An online <a href="">index by character name</a> is also provided. <!-- The <a href="">Tableaux des caractères</a> provides a French translation of these latest code charts. --></td> </tr> <tr> <td nowrap><a href="">Delta Code Charts</a></td> <td>These charts show the new blocks and any blocks in which characters were added specifically for Unicode 16.0.0. The new characters and any major updates to the representative glyphs are visually highlighted in these charts.</td> </tr> <tr> <td nowrap><a href="">Archival Code Charts</a></td> <td>These charts contain the entire set of characters, names and representative glyphs at the time of publication of Unicode 16.0.0. <!-- A <a href="">French translation</a> of the archival code charts is also available for this version. --></td> </tr> </table> </div> <p>The delta and archival code charts are a stable part of this release of the Unicode Standard. They will never be updated.</p> <h3><a id="Radical_Stroke"></a>Han Radical-Stroke Indices</h3> <p>There are a number of radical-stroke indices available to assist in the lookup of Han ideographs in the code charts.</p> <div align="center"> <table class="simple"> <tr> <th>Index Type</th> <th>Description</th> </tr> <tr> <td><a href="">Interactive</a></td> <td>An interactive CJK character lookup page that supports lookup either by code point or by radical and stroke values.</td> </tr> <tr> <td nowrap><a href="">IICore</a> (3.8 MB)</td> <td>A static radical-stroke index PDF file limited to only the <a href="">IICore</a> repertoire. 
(This RS index is seldom updated.)</td> </tr> <tr> <td nowrap><a href="">Unihan Core 2020</a> (8.2 MB)</td> <td>A static radical-stroke index PDF file limited to only the <a href="">Unihan Core 2020</a> repertoire. (This RS index is seldom updated.)</td> </tr> <tr> <td nowrap><a href="">Complete</a> (43 MB)</td> <td>A static radical-stroke index PDF file that covers the entire CJK ideograph repertoire for Unicode 16.0.</td> </tr> <tr> <td nowrap><a href="">Complete</a></td> <td>A static data file that corresponds to the complete radical-stroke index for Unicode 16.0.</td> </tr> </table> </div> <p>The complete radical-stroke index is a stable part of this release of the Unicode Standard. It will never be updated.</p> <h3><a id="Annexes"></a>Unicode Standard Annexes</h3> <blockquote class="status"><strong>STATUS:</strong> During the alpha review and beta review periods, links to individual UAXes (or UTSes) point to the proposed update for that document, if any. If no proposed update has been posted for the document, links point to the last published version of the document, for reference.</blockquote> <p>Links to the individual Unicode Standard Annexes for this version are available in <a href="#Components">Section I, List of Components</a> below. The summary list of significant changes in the content of each Unicode Standard Annex for Version 16.0 can be found in <a href="#UAX_Changes">Section G, Changes in the Unicode Standard Annexes</a> below.</p> <h3><a id="UCD"></a>Unicode Character Database</h3> <!-- <blockquote class="status"><strong>STATUS:</strong> During the alpha review period, some of the data files may not be posted. Later, during beta review, a complete set of consistent data files will be posted, including data files associated with various Unicode Technical Standards.</blockquote> --> <p><a href="">Data files</a> for Version 16.0 of the Unicode Character Database are available. 
The ReadMe.txt in that directory provides a roadmap to the functions of the various subdirectories. Detailed documentation about the data files can be found in <a href="">UAX #44, Unicode Character Database</a>. <a href="">Zipped versions</a> of the UCD for bulk download are available, as well.</p> <h3><a id="Version_References"></a>Version References</h3> <p>Version 16.0.0 of the Unicode Standard should be referenced as:</p> <blockquote> <p>The Unicode Consortium. <em>The Unicode Standard, Version 16.0.0</em>, (South San Francisco: The Unicode Consortium, 2024. ISBN 978-1-936213-34-4)<br> <a href=""></a></p> </blockquote> <p>The terms “Version 16.0” or “Unicode 16.0” are abbreviations for the full version reference, Version 16.0.0.</p> <p>The citation and permalink for the latest published version of the Unicode Standard is:</p> <blockquote> <p>The Unicode Consortium. <em>The Unicode Standard</em>.<br> <a href=""></a></p> </blockquote> <p>A complete specification of the contributory files for Unicode 16.0 is found below in <a href="#Components">Section I, List of Components</a>. For examples of how to cite particular portions of the Unicode Standard, see also the <a href="">Reference Examples</a>.</p> <h3><a id="Errata"></a>Errata</h3> <p>Errata incorporated into Unicode 16.0 are listed by date in a <a href="erratafixed.html">separate table</a>. For corrigenda and errata after the release of Unicode 16.0, see the list of current <a href="">Updates and Errata</a>.</p> <h2><a id="Stability_Policy"></a>C. Stability Policy Update</h2> <p>No significant updates to the <a href="">Character Encoding Stability Policies</a> have occurred in the interval since the last release of the Unicode Standard.</p> <h2><a id="Character_Additions"></a>D. Textual Changes and Character Additions</h2> <p> Changes in the Unicode Standard Annexes are listed in <a href="#UAX_Changes">Section G</a>.</p> <h3>Character Assignment Overview</h3> <p>5185 characters have been added. 
Most character additions are in new blocks, but there are also character additions to a number of existing blocks. For details, see the <a href="">delta code charts</a>.</p> <h3>New Blocks</h3> <!-- <p>There are no new blocks defined in Version 16.0.</p> --> <p>The following blocks are newly defined in Version 16.0:</p> <div align="center"> <table class="simple"> <tr> <th>Range</th> <th>Block Name</th> </tr> <tr> <td style = "text-align:right">105C0..105FF</td> <td>Todhri</td> </tr> <tr> <td style = "text-align:right">10D40..10D8F</td> <td>Garay</td> </tr> <tr> <td style = "text-align:right">11380..113FF</td> <td>Tulu-Tigalari</td> </tr> <tr> <td style = "text-align:right">116D0..116FF</td> <td>Myanmar Extended-C</td> </tr> <tr> <td style = "text-align:right">11BC0..11BFF</td> <td>Sunuwar</td> </tr> <tr> <td style = "text-align:right">13460..1355F</td> <td>Egyptian Hieroglyphs Extended-A</td> </tr> <tr> <td style = "text-align:right">16100..1613F</td> <td>Gurung Khema</td> </tr> <tr> <td style = "text-align:right">16D40..16D7F</td> <td>Kirat Rai</td> </tr> <tr> <td style = "text-align:right">1CC00..1CEBF</td> <td>Symbols for Legacy Computing Supplement</td> </tr> <tr> <td style = "text-align:right">1E5D0..1E5FF</td> <td>Ol Onal</td> </tr> </table> </div> <h2><a id="Conformance_Changes"></a>E. Conformance Changes</h2> <p>There are no new conformance requirements for the core specification in Unicode 16.0.</p> <h2><a id="Database_Changes"></a>F. Changes in the Unicode Character Database</h2> <p>The detailed listing of all changes to the contributory data files of the Unicode Character Database for Version 16.0 can be found in <a href="">UAX #44, Unicode Character Database</a>. The changes listed there include character additions and property revisions to existing characters that will affect implementations. 
Some of the important impacts on implementations migrating from earlier versions of the standard are highlighted in <a href="#Migration">Section M</a>.</p> <h2><a id="UAX_Changes"></a>G. Changes in the Unicode Standard Annexes</h2> <p>In Version 16.0, some of the Unicode Standard Annexes have had significant revisions. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UAX, linked directly from the following list of UAXes.</p> <div align="center"> <table class="subtle"> <tr> <th nowrap>Unicode Standard Annex</th> <th>Changes</th> </tr> <tr> <td> <a href="">UAX #9</a><br>Unicode Bidirectional Algorithm </td> <td>Textual clarification was added to Section 3.3.2, Explicit Levels and Directions.</td> </tr> <tr> <td> <a href="">UAX #11</a><br>East Asian Width</td> <td>The summary was updated. ED7 in Section 4 was updated. Section 4.3 was added to explain that variation sequences can be considered when resolving ambiguous width. The last bullet in Section 5 was removed. The East_Asian_Width property value of some characters was changed from N to W.</td> </tr> <tr> <td> <a href="">UAX #14</a><br>Unicode Line Breaking Algorithm</td> <td>The rules for line breaking of numbers, hyphens, and Simplified Chinese quotation marks were improved, bringing UAX #14 into alignment with the ICU implementation.</td> </tr> <tr> <td> <a href="">UAX #15</a><br>Unicode Normalization Forms </td> <td>A new section was added regarding normalization contexts that require care in optimization. 
The conformance clause UAX15-C4 was clarified by explicit reference to the previously implied Stream-Safe Text Process.</td> </tr> <tr> <td> <a href="">UAX #24</a><br>Unicode Script Property </td> <td>Documentation was added regarding the change in order of entries in ScriptExtensions.txt.</td> </tr> <tr> <td> <a href="">UAX #29</a><br>Unicode Text Segmentation</td> <td>The definition of GCB=V was updated to include Kirat Rai vowels. The description of rules GB6 through GB8 was updated.</td> </tr> <tr> <td> <a href="">UAX #31</a><br>Unicode Identifiers and Syntax </td> <td>A clarification was added that NFD must be applied before toNFKC_Casefold in order to correctly meet requirements UAX31-R4 and UAX31-R5 with NFKC and full case folding. A reference to definition D147 of the Unicode Standard was added.</td> </tr> <tr> <td> <a href="">UAX #34</a><br>Unicode Named Character Sequences</td> <td><font color="#767676">No significant changes in this version.</font></td> </tr> <tr> <td> <a href="">UAX #38</a><br>Unicode Han Database (Unihan)</td> <td>The relationship between the Equivalent_Unified_Ideograph property and the Unihan database was clarified. The sorting algorithm examples were updated. A reference to the new RSIndex.txt data file was added. The delimiter of the kAccountingNumeric property was updated. Two new provisional properties, kFanqie and kZhuang, were added. The provisional kFrequency property was removed. The syntax and description of the kIRG_GSource and kPhonetic properties were updated. The description of the kPrimaryNumeric property was updated. 
The syntax and description of the kRSUnicode property were updated to accommodate a second non-Chinese simplified radical.</td> </tr> <tr> <td> <a href="">UAX #41</a><br>Common References for Unicode Standard Annexes</td> <td>All references were updated for Unicode 16.0.</td> </tr> <tr> <td> <a href="">UAX #42</a><br>Unicode Character Database in XML</td> <td>New code point attributes, values, and patterns were added for Unicode 16.0.</td> </tr> <tr> <td><a href="">UAX #44</a><br> Unicode Character Database </td> <td>The documentation was updated to describe the changes to the UCD for Version 16.0. Documentation was added for the new property Modifier_Combining_Mark. A clarification was added regarding the derivation of Numeric_Value from various Unihan properties. The definition of Indic_Conjunct_Break was updated for correctness. Clarifying text was added regarding stability issues for aliases.</td> </tr> <tr> <td> <a href="">UAX #45</a><br> U-Source Ideographs</td> <td><font color="#767676">No significant changes in this version.</font></td> </tr> <tr> <td> <a href="">UAX #50</a><br> Unicode Vertical Text Layout</td> <td>Section 3.2.4 and Table 2 were added to explain the tailoring of fullwidth quotation marks.</td> </tr> <tr> <td> <a href="">UAX #53</a><br> Unicode Arabic Mark Rendering</td> <td>This specification was changed from a UTR to a UAX for Unicode 16.0. The image for Example 3 was corrected. An implementation note was added after the description of the algorithm. A section was added for U+10EFC ARABIC COMBINING ALEF OVERLAY.</td> </tr> <tr> <td> <a href="">UAX #57</a><br> Unicode Egyptian Hieroglyph Database (Unikemet)</td> <td>This UAX is new for Unicode 16.0, and describes the Unikemet.txt data file for the UCD.</td> </tr> </table> </div> <h2><a id="UTS_Changes"></a>H. 
Changes in Synchronized Unicode Technical Standards</h2> <p>There are also significant revisions in the Unicode Technical Standards whose versions are synchronized with the Unicode Standard. The most important of these changes are listed below. For the full details of all changes, see the Modifications section of each UTS, linked directly from the following list of UTSes.</p> <div align="center"> <table class="subtle"> <tr> <th nowrap>Unicode Technical Standard</th> <th>Changes</th> </tr> <tr> <td><a href="">UTS #10</a><br>Unicode Collation Algorithm</td> <td>Table 18 in Appendix B was extended to include a CTT Name column. Text was added to Appendix B to enable ISO/IEC 14651 to refer to the CTT tables published (starting with Unicode 16.0) on the Unicode website. A note was added to Section 10.1.3, Implicit Weights, explaining how the CTT for ISO/IEC 14651 uses the implicit weight calculated in Table 16.</td> </tr> <tr> <td><a href="">UTS #39</a><br>Unicode Security Mechanisms</td> <td>The definitions of <i>skeleton</i> and <i>confusable</i> were updated.</td> </tr> <tr> <td><a href="">UTS #46</a><br>Unicode IDNA Compatibility Processing</td> <td>The handling of UseSTD3ASCIIRules was simplified. The derivation of the Base Valid Set was updated, along with the derivation of the base exclusion set. Section 7 was removed.</td> </tr> <tr> <td><a href="">UTS #51</a><br>Unicode Emoji</td> <td>All references were updated for Unicode 16.0.</td> </tr> </table> </div> <h2><a id="Components"></a>I. List of Components</h2> <p>This section lists the components of Version 16.0.0 of the Unicode Standard. 
The version numbering and the role of each component are explained in <a href=""> Versions of The Unicode Standard</a>.</p> <table> <tr> <th class="noborder" align="left">Core Specification</th> </tr> <tr> <td class="noborder"><a href="">Authoritative HTML</a></td> </tr> <tr> <td class="noborder"> Archival PDF: <a href="">UnicodeStandard-16.0.pdf</a> (size: 14 MB)</td> </tr> <tr> <th class="noborder" align="left">Code Charts and Radical-Stroke Index</th> </tr> <tr> <td class="noborder"> <a href="">Code Charts</a> (size: 110 MB)<br> <a href="">Radical-Stroke Index</a> (size: 44 MB)<br> <a href="">Radical-Stroke Index data</a></td> </tr> <tr> <th class="noborder" align="left"> <a href="">Unicode Standard Annexes</a></th> </tr> <tr> <td> <a href=""> UAX #9: Unicode Bidirectional Algorithm</a><br> <a href=""> UAX #11: East Asian Width</a><br> <a href=""> UAX #14: Unicode Line Breaking Algorithm</a><br> <a href=""> UAX #15: Unicode Normalization Forms</a><br> <a href=""> UAX #24: Unicode Script Property</a><br> <a href=""> UAX #29: Unicode Text Segmentation</a><br> <a href=""> UAX #31: Unicode Identifiers and Syntax</a><br> <a href=""> UAX #34: Unicode Named Character Sequences</a><br> <a href=""> UAX #38: Unicode Han Database (Unihan)</a><br> <a href=""> UAX #41: Common References for Unicode Standard Annexes</a><br> <a href=""> UAX #42: Unicode Character Database in XML</a><br> <a href=""> UAX #44: Unicode Character Database</a><br> <a href=""> UAX #45: U-Source Ideographs</a><br> <a href=""> UAX #50: Unicode Vertical Text Layout</a><br> <a href=""> UAX #53: Unicode Arabic Mark Rendering</a><br> <a href=""> UAX #57: Unicode Egyptian Hieroglyph Database (Unikemet)</a> </td> </tr> <tr> <th class="noborder" align="left">Unicode Character Database</th> </tr> <tr> <td class="noborder"> <a href=""></a></td> </tr> <tr> <th class="noborder" align="left">Documentation</th> </tr> <tr> <td class="noborder"> <a href="">Index.txt</a></td> </tr> <tr> <td class="noborder"> <a 
href="">NamesList.html</a></td> </tr> <tr> <td class="noborder"> <a href="">ReadMe.txt</a></td> </tr> <tr> <th class="noborder" align="left">Core Data</th> </tr> <tr> <td class="noborder"> <a href="">ArabicShaping.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">BidiBrackets.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">BidiMirroring.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">Blocks.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">CJKRadicals.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">CompositionExclusions.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DoNotEmit.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">EastAsianWidth.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">EmojiSources.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">EquivalentUnifiedIdeograph.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">HangulSyllableType.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">IndicPositionalCategory.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">IndicSyllabicCategory.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">Jamo.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">LineBreak.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">NameAliases.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">NamedSequences.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">NamedSequencesProv.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">NamesList.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">NormalizationCorrections.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">NushuSources.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">PropertyAliases.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">PropertyValueAliases.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">PropList.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">Scripts.txt</a></td> </tr> <tr> <td class="noborder"> <a 
href="">ScriptExtensions.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">SpecialCasing.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">StandardizedVariants.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">TangutSources.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">UnicodeData.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">Unikemet.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">VerticalOrientation.txt</a></td> </tr> <tr> <th class="noborder" align="left"> Unihan Database (<a href=""></a>)</th> </tr> <tr> <td class="noborder">Unihan_DictionaryIndices.txt</td> </tr> <tr> <td class="noborder">Unihan_DictionaryLikeData.txt</td> </tr> <tr> <td class="noborder">Unihan_IRGSources.txt</td> </tr> <tr> <td class="noborder">Unihan_NumericValues.txt</td> </tr> <tr> <td class="noborder">Unihan_OtherMappings.txt</td> </tr> <tr> <td class="noborder">Unihan_RadicalStrokeCounts.txt</td> </tr> <tr> <td class="noborder">Unihan_Readings.txt</td> </tr> <tr> <td class="noborder">Unihan_Variants.txt</td> </tr> <tr> <th class="noborder" align="left">Data for UAX #45</th> </tr> <tr> <td class="noborder"> <a href="">USourceData.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">USourceGlyphs.pdf</a></td> </tr> <tr> <td class="noborder"> <a href="">USourceRSChart.pdf</a></td> </tr> <tr> <th class="noborder" align="left">Derived Data</th> </tr> <tr> <td class="noborder"> <a href="">CaseFolding.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedAge.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedCoreProperties.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedNormalizationProps.txt</a></td> </tr> <tr> <th class="noborder" align="left">Extracted Data</th> </tr> <tr> <td class="noborder"> <a href="">DerivedBidiClass.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedBinaryProperties.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedCombiningClass.txt</a></td> </tr> 
<tr> <td class="noborder"> <a href="">DerivedDecompositionType.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedEastAsianWidth.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedGeneralCategory.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedJoiningGroup.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedJoiningType.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedLineBreak.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedName.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedNumericType.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">DerivedNumericValues.txt</a></td> </tr> <tr> <th class="noborder" align="left">Conformance Test Data</th> </tr> <tr> <td class="noborder"> <a href="">BidiCharacterTest.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">BidiTest.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">NormalizationTest.txt</a></td> </tr> <tr> <th class="noborder" align="left">Auxiliary Data for UAX #14 and UAX #29</th> </tr> <tr> <td class="noborder"> <a href="">GraphemeBreakProperty.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">GraphemeBreakTest.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">LineBreakTest.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">SentenceBreakProperty.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">SentenceBreakTest.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">WordBreakProperty.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">WordBreakTest.txt</a></td> </tr> <tr> <th class="noborder" align="left">Documentation for Auxiliary Data</th> </tr> <tr> <td class="noborder"> <a href="">GraphemeBreakTest.html</a></td> </tr> <tr> <td class="noborder"> <a href="">LineBreakTest.html</a></td> </tr> <tr> <td class="noborder"> <a href="">SentenceBreakTest.html</a></td> </tr> <tr> <td class="noborder"> <a href="">WordBreakTest.html</a></td> </tr> <tr> <th class="noborder" 
align="left">Emoji Data</th> </tr> <tr> <td class="noborder"> <a href="">emoji-data.txt</a></td> </tr> <tr> <td class="noborder"> <a href="">emoji-variation-sequences.txt</a></td> </tr> </table> <h2><a id="Migration"></a>M. Implications for Migration</h2> <p>There are a significant number of changes in Unicode 16.0 which may impact implementations upgrading to Version 16.0 from earlier versions of the standard. The most important of these are listed and explained here, to help focus on the issues most likely to cause unexpected trouble during upgrades.</p> <h3>Core Specification Changes</h3> <p>The core specification has been completely revamped for Unicode 16.0.0. The text has all been converted to HTML, and has been deployed on a self-contained subsite. The text is no longer published as per-chapter pdf files, but prior bookmarked links into the chapter files resolve correctly to the new per-chapter HTML files. An archival pdf version of the entire core specification has been produced for this release, and looks and behaves very similarly to the corresponding archival pdf files for prior releases.</p> <h3>Script-related Changes</h3> <p>There are seven new scripts encoded in Unicode 16.0. Some of these scripts, such as Tulu-Tigalari, have complex layout.</p> <p>There are 3,995 additional Egyptian hieroglyphs, particularly in support of Ptolemaic texts. There is also a new data file, Unikemet.txt, with source data, function, and phonetic information for hieroglyphs, including the previously encoded repertoire. See the new <a href="">UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet)</a> for details.</p> <h3>General Character Property Issues</h3> <ul> <li>Starting with U+11F5A KAWI SIGN NUKTA in Unicode 16.0, newly encoded nukta characters use Canonical_Combining_Class (ccc) 0 or positional ccc values such as 220 or 230. Nukta characters encoded in earlier versions typically, but not always, use ccc=7. 
Software that needs to identify nuktas in Brahmic scripts should check for Indic_Syllabic_Category=Nukta.</li> <li>The format of the ScriptExtensions.txt data file has changed for 16.0. Each individual entry is formatted as before, but the overall order of entries has been changed to code point order.</li> </ul> <h3>Normalization Behavior</h3> <p>Several characters have been added in Unicode 16.0 that have subtle implications for certain optimizations of normalization. These do not change the normalization algorithm, but have implications for the derivation and use of Quick_Check properties for optimization of normalization form detection. See <a href="">UAX #15</a> for details.</p> <h3>Segmentation</h3> <p>Line breaking behavior has changed for U+2018 LEFT SINGLE QUOTATION MARK and similar directional quotation marks in specific East Asian contexts, to correct issues in Simplified Chinese line breaking. Other rules have also changed to better align the specification with the behavior of ICU. See <a href="">UAX #14</a> for details.</p> <p>There has also been a change to the Grapheme_Cluster_Break property data, extending the use of GCB=V to apply to certain non-Hangul vowels, and in particular to Kirat Rai vowels. This change refines grapheme cluster segmentation in such cases, while respecting normalization requirements and canonical equivalence. Implementations should take note that GCB=V and HST=V are no longer coextensive. See <a href="">UAX #29</a> for details.</p> <h3>Numeric Property Issues</h3> <p>There are eight new sets of decimal digits added in Unicode 16.0. Five of these sets are for newly encoded scripts: Garay, Sunuwar, Gurung Khema, Kirat Rai, and Ol Onal. Two are additional region-specific digit sets for the Myanmar script. Finally, there is one additional set, consisting of stylistically outlined digits, intended for support of legacy computer symbol sets for terminal emulations. 
Implementations that handle numeric values and numeric formatting should take these new sets into account.</p> <h3>CJK/Unihan Changes</h3> <ul> <li>Some kRSUnicode values now include triple-apostrophe radicals.</li> <li>One old provisional property has been removed.</li> <li>Two new provisional properties have been added.</li> </ul> <p>See <a href="">UAX #38</a>, Unicode Han Database (Unihan), for further details on these changes, especially Section 4.2, <i>Listing by Date of Addition to the Unicode Standard</i>, and Section 4.3, <i>Listing by Location within</i>. UAX #38 also has updated regex values for three Unihan properties. For the changes associated with the triple-apostrophe radicals, see:</p> <ul> <li>UAX #38: <a href="">kRSUnicode</a></li> <li>UAX #38: Section 3.6, <a href=""><i>Radical-Stroke Counts</i></a></li> <li>UAX #38: Section 2.1.2, <a href=""><i>Sorting Algorithm Used by the Radical-Stroke Charts</i></a></li> </ul> <h3>Standardized Variation Sequences</h3> <ul> <li>Four unused Egyptian hieroglyph variation sequences have been removed from the data. Ten new sequences have been added to handle various rotations of previously encoded hieroglyphs.</li> <li>Eight variation sequences have been added for curly quotation marks (U+2018, U+2019, U+201C, U+201D) to handle full-width layout considerations in Chinese text.</li> </ul> <h3>UTS #46 (IDNA) Changes</h3> <p>There have been a number of changes to the specification, generally to align it better with current practice and to simplify transitional features that are no longer needed.</p> <ul> <li>The text of UTS #46 has been changed to simplify the base exclusion set and adjust the derivation of the mappings in IdnaMappingTable.txt. Previously, the base exclusion set had been derived from differences between IDNA2003 data and the principles of UTS #46. 
It is no longer necessary to disallow characters on the basis of differences from IDNA2003, so the base exclusion has been radically simplified.</li> <li>The handling of UseSTD3ASCIIRules has been simplified. Conditional data involving disallowed_STD3_* Status values has been replaced with simple checking for a subset of ASCII characters in the Validity Criteria. This simplifies the data format and data lookup, makes standard UseSTD3ASCIIRules=true handling consistent with custom UseSTD3ASCIIRules, and avoids unnecessarily disallowing certain labels that contain disallowed_STD3_mapped characters but which do not contain non-LDH ASCII characters when the mappings are applied.</li> <li>In Section 4, Processing, if the label starts with “xn--”, and the conversion from Punycode yields either an empty label or an all-ASCII label, then an error is now recorded, consistent with IDNA2008.</li> <li>In the test data file, there is a small addition to the syntax: "" means an empty string. There are also other test data corrections and improvements. For details see Section 8, Conformance Testing, Migration.</li> </ul> <h3>Changes to Code Charts</h3> <ul> <li>There are a number of Han glyph updates, particularly for CJK Unified Ideographs Extension B.</li> <li>Other glyph updates are listed explicitly in the <a href="">delta charts index page</a>.</li> <li>There are also a very large number of J-Source (Japanese) additions to the CJK charts. These extensions are not individually highlighted in the code charts.</li> <li>The two code charts for Egyptian hieroglyphs contain extensive functional and phonetic information derived from the new data file, Unikemet.txt.</li> </ul> <h3>Collation-related Changes</h3> <p>A significant new change for DUCET in Unicode 16.0 involves moving the non-decimal digits to sort after the main decimal digits. 
This change greatly reduces the superfluous differences between DUCET and the CLDR base tailoring of DUCET.</p> <p>There has also been a small fix to correct the ordering for U+312C BOPOMOFO LETTER GN.</p> <h3>Emoji Changes</h3> <p>For details about emoji changes, see the Unicode 16.0 <a href="">emoji charts</a> and <a href="">Emoji Recently Added, v16.0</a>.</p> </blockquote> <hr width="50%"> <div align="center"> <center> <table cellspacing="0" cellpadding="0" border="0"> <tr> <td><a href=""> <img src="" border="0" alt="Access to Copyright and terms of use" width="216" height="50"></a></td> </tr> </table> </center> </div> </td> </tr> </table> </body> </html>