
Orthography development in relation to Unicode

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title>Orthography development in relation to Unicode</title> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <meta name="keywords" content="unicode, orthography, development, standards"> <link rel="stylesheet" href="/cms/assets/misc/css/default.css" type="text/css"> <link rel="stylesheet" href="/cms/sites/nrsi/themes/default/_css/default.css" type="text/css"> <style type="text/css"> <!-- A.GlobalNavLink, A.GlobalNavLink:visited { color: #FFFF00; font-size: smaller; font-weight: bold; } --> </style> <!-- 2023-05-25 PKM Added for Google Analytics 4 --> <!-- Google tag (gtag.js) --> <script async src="https://www.googletagmanager.com/gtag/js?id=G-FVXRGR2Q9V"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-FVXRGR2Q9V'); </script> <title>Orthography development in relation to Unicode</title> </head> <body style="padding:0; margin:0"> <style> .archive_notice { /* box-shadow: black 0pt 4pt 20px -8px inset; */ display: block; background-color: orange; font-size: 12pt; font-style: normal; font-weight: lighter; line-height: 100%; padding: 5pt; text-align: center; width: auto; } form { display: none } .webform::before { content: "Forms are disabled on this static version of the site."; display: block; width: fit-content; } </style> <div class="archive_notice"> This is an archive of the original scripts.sil.org site, preserved as a historical reference. Some of the content is outdated. 
Please consult our other sites for more current information: <a href="https://software.sil.org">software.sil.org</a>, <a href="https://scriptsource.org">ScriptSource</a>, <a href="https://silnrsi.github.io/FDBP/">FDBP</a>, and <a href="https://silnrsi.github.io/silfontdev/">silfontdev</a> </div> <table width="100%" height="100%" border="0" cellspacing="0" cellpadding="0"> <tr> <td style="background: #0068a6; padding-left:20; padding-top:10; white-space:nowrap;" width="110" valign="top"> <p><a href="http://www.sil.org/"> <!-- <img src="/cms/sites/nrsi/themes/default/_media/SIL_logo_left_column.gif" width="86" height="80" border="0"> --> <img src="/cms/sites/nrsi/themes/default/_media/SIL_Logo_TM_Blue_2014.png" width="85" height="95" border="0" alt=""> </a><br><br></p> <p class="Cat1"><a class="Cat1" href="/cms/scripts/page.php%3Fid%3Dhome%26site_id%3Dnrsi.html">Home</a></p> <p class="Cat1"><a class="Cat1" href="/cms/scripts/page.php%3Fid%3Dcontactus%26site_id%3Dnrsi.html">Contact Us</a></p> <p class="Cat1"><a class="Cat1" href="/cms/scripts/page.php%3Fid%3Dgeneral%26site_id%3Dnrsi.html">General</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dbabel%26site_id%3Dnrsi.html">Initiative B@bel</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dwsi_guidelines%26site_id%3Dnrsi.html">WSI Guidelines</a></p> <p class="Cat1"><a class="Cat1" href="/cms/scripts/page.php%3Fid%3Dencoding%26site_id%3Dnrsi.html">Encoding</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dencodingprinciples%26site_id%3Dnrsi.html">Principles</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dunicode%26site_id%3Dnrsi.html">Unicode</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dunicodetraining%26site_id%3Dnrsi.html">Training</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dunicodetutorials%26site_id%3Dnrsi.html">Tutorials</a></p> <p class="Cat3"><a 
class="Cat3" href="/cms/scripts/page.php%3Fid%3Dunicodepua%26site_id%3Dnrsi.html">PUA</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dconversion%26site_id%3Dnrsi.html">Conversion</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dencconvres%26site_id%3Dnrsi.html">Resources</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dconversionutilities%26site_id%3Dnrsi.html">Utilities</a></p> <p class="Cat4"><a class="Cat4" href="/cms/scripts/page.php%3Fid%3Dteckit%26site_id%3Dnrsi.html">TECkit</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dconversionmaps%26site_id%3Dnrsi.html">Maps</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dencodingresources%26site_id%3Dnrsi.html">Resources</a></p> <p class="Cat1"><a class="Cat1" href="/cms/scripts/page.php%3Fid%3Dinput%26site_id%3Dnrsi.html">Input</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dinputprinciples%26site_id%3Dnrsi.html">Principles</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dinpututilities%26site_id%3Dnrsi.html">Utilities</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dinputtutorials%26site_id%3Dnrsi.html">Tutorials</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dinputresources%26site_id%3Dnrsi.html">Resources</a></p> <p class="Cat1"><a class="Cat1" href="/cms/scripts/page.php%3Fid%3Dtypedesign%26site_id%3Dnrsi.html">Type Design</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dtypedesignprinciples%26site_id%3Dnrsi.html">Principles</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dfontdesigntools%26site_id%3Dnrsi.html">Design Tools</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Dfontformats%26site_id%3Dnrsi.html">Formats</a></p> <p class="Cat2"><a class="Cat2" 
href="/cms/scripts/page.php%3Fid%3Dtypedesignresources%26site_id%3Dnrsi.html">Resources</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dfontdownloads%26site_id%3Dnrsi.html">Font Downloads</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dfontdownloadsgentium%26site_id%3Dnrsi.html">Gentium</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dfontdownloadsdoulos%26site_id%3Dnrsi.html">Doulos</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dfontdownloadsipa%26site_id%3Dnrsi.html">IPA</a></p> <p class="Cat1"><a class="Cat1" href="/cms/scripts/page.php%3Fid%3Drendering%26site_id%3Dnrsi.html">Rendering</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Drenderingprinciples%26site_id%3Dnrsi.html">Principles</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Drenderingtechnologies%26site_id%3Dnrsi.html">Technologies</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Drenderingopentype%26site_id%3Dnrsi.html">OpenType</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Drenderinggraphite%26site_id%3Dnrsi.html">Graphite</a></p> <p class="Cat2"><a class="Cat2" href="/cms/scripts/page.php%3Fid%3Drenderingresources%26site_id%3Dnrsi.html">Resources</a></p> <p class="Cat3"><a class="Cat3" href="/cms/scripts/page.php%3Fid%3Dfontfaq%26site_id%3Dnrsi.html">Font FAQ</a></p> <p class="Cat1"><a class="Cat1" href="/cms/scripts/page.php%3Fid%3Dlinks%26site_id%3Dnrsi.html">Links</a></p> <p class="Cat1"><a class="Cat1" href="/cms/scripts/page.php%3Fid%3Dglossary%26site_id%3Dnrsi.html">Glossary</a></p> <br> </td> <td valign="top" style="padding:0" xwidth="650"> <div style="background: #6699CC url(/cms/sites/nrsi/themes/default/_media/home_banner_gradient.gif) no-repeat right; padding:0 0 0 25; height:36px; margin:0; color:#FFFFFF;"> <p style="font-family:Times New Roman; font-size:25px; color:#FFFFFF; 
padding:10 0 0 0; margin:0 0 0 0">Computers & Writing Systems</p> </div> <div style="padding:0 0 0 0; background-color:#000000; color:#FFFFFF"> <table width='100%'> <tr> <td style="padding: 0 0 0 25px"><a class="GlobalNavLink" href="http://www.sil.org/">SIL HOME</a> | <a class="GlobalNavLink" href="https://software.sil.org/products/">SIL SOFTWARE</a> | <a class="GlobalNavLink" href="/support.html">SUPPORT</a> | <a class="GlobalNavLink" href="https://www.givedirect.org/donate/?cid=13536">DONATE</a> | <a class="GlobalNavLink" href="/privacy-policy.html">PRIVACY POLICY</a> </td> <td align='right' width='20%'> <script async src="https://cse.google.com/cse.js?cx=0760bf09a6bff4b0c"></script><style>.gsc-control-cse {padding: 0.6em; min-width: 10em; width: 18em; max-width: 20em} form.gsc-search-box {display: unset;}</style><div class="gcse-search"></div> </td> </tr> </table> </div> <div style="padding:0 25 25 25"> <p class='CategoryPath'>You are here: <a class='CategoryPath' href='/cms/scripts/page.php%3Fid%3Dencoding%26site_id%3Dnrsi.html'>Encoding</a> &gt; <a class='CategoryPath' href='/cms/scripts/page.php%3Fid%3Dunicode%26site_id%3Dnrsi.html'>Unicode</a><br> Short URL: <a href='/orthographydev.html'>https://scripts.sil.org/OrthographyDev</a></p> <!-- --> <!-- <div class='Warning' > <p class='Warning_heading' > Site unavailability </p> <p> Due to essential repairs, this website may be unavailable at times during September 6 (Tue) and 7 (Wed). We apologize for the inconvenience. </p> </div> --> <h1>Orthography development in relation to Unicode </h1> <p> <span class='author_date_hits'>Lorna A. Priest, 2004-11-18</span></p><a name='develop'></a> <p>It is out of our scope to give complete guidelines for developing an orthography. However, we would like to give you a process to work through from a <a href='page.php%3Fid%3Dglossary%26site_id%3Dnrsi.html#U'>Unicode</a> perspective.</p> <p>In designing a writing system, one must decide what symbols will be used and how. 
Here we list the Unicode factors that should be taken into account. At least two sets of standards should be considered when developing an orthography: </p> <ul class='dListUnordered'> <li><span class='KeyTerm'>National/local alphabet</span> (local or national standards may have higher priority than phonetic considerations) and </li> <li><a href='http://www.unicode.org' target='_blank'><img src='/cms/assets/icons/offsite_link.png'>&nbsp;The Unicode Standard</a>. </li> </ul> <p>Our concern here is with the Unicode Standard.</p> <div class='Note'><p class='Note_heading'>Note</p><p>The crucial question now is, “Does the character being considered already exist in the Unicode Standard?”</p> </div><p>The Unicode Consortium will not readily accept the addition of randomly created characters. If the character under consideration does not already exist in Unicode, you should seriously consider an alternative that does. Using a character that is not already in the Unicode Standard leads to the following problems:</p> <ul class='dListUnordered'> <li>Standard fonts will not contain that character</li> <li>Diacritics will not be placed properly on that character</li> <li>Words may be broken in inappropriate places</li> </ul> <p>If the character you want to use exists in the Unicode Standard, then there are some further issues to think about. </p> <a name='cs'></a> <p>In developing an orthography (see <a href='/cms/scripts/page.php%3Fid%3Dexcelunicodedata%26site_id%3Dnrsi.html'>Unicode Character Properties Excel and LibreOffice Calc spreadsheet</a>),</p> <ul class='dListUnordered'> <li>Consideration should be given to the following <span class='Em'>for each character</span>: <ul class='dListUnordered'> <li>What are the character’s properties? 
<ul class='dListUnordered'> <li>If it is a lower case letter it should have a <span class='KeyTerm'>Ll</span> (Letter, Lowercase) category. <ul class='dListUnordered'> <li>Make sure there is already a matching upper case letter</li> </ul> </li> <li>If it is an upper case letter it should have a <span class='KeyTerm'>Lu</span> (Letter, Uppercase) category.</li> <li>Some letters do not have case (no upper and lower case variants). For these, consideration should be given to: <ul class='dListUnordered'> <li>Unless it has “case”, the character you choose should have a <span class='KeyTerm'>Lm</span> (Letter, Modifier) category. </li> <li>If the script you are working with does not use case at all, characters will have a <span class='KeyTerm'>Lo</span> (Letter, Other), <span class='KeyTerm'>Mc</span> (Mark, Spacing Combining), or <span class='KeyTerm'>Mn</span> (Mark, Nonspacing) category.</li> <li>Some people have been creative in using <span class='KeyTerm'>punctuation marks</span> for marking tone, glottal stops and other features that are properly part of a word. <span class='Highlight'>Do not consider doing this!</span> If punctuation marks are used, they will not be considered part of the word. You might find what you need in this block: <a href='http://www.unicode.org/charts/PDF/U02B0.pdf' target='_blank'><img src='/cms/assets/icons/offsite_link.png'>&nbsp;Spacing Modifier Letters</a>. However, not all of these are considered word-forming (for instance, some of these have a character property of <span class='KeyTerm'>Sk</span> “Symbol, Modifier”). Check the character properties (see <a href='/cms/scripts/page.php%3Fid%3Dexcelunicodedata%26site_id%3Dnrsi.html'>Unicode Character Properties Excel and LibreOffice Calc spreadsheet</a>) of any symbol you are considering using.</li> </ul> </li> </ul> </li> <li>What is the form of capital letters? 
If you are uncertain, look at “Upper Case equivalent” and “Lower Case equivalent” in <a href='/cms/scripts/page.php%3Fid%3Dexcelunicodedata%26site_id%3Dnrsi.html'>Unicode Character Properties Excel and LibreOffice Calc spreadsheet</a>. <ul class='dListUnordered'> <li>Make sure each lower case letter is paired with its associated upper case letter. You will run into problems if you choose pairs that are not associated with each other in Unicode (see <a href='page.php%3Fid%3Dencodingfaq%26site_id%3Dnrsi.html#schwa'>“How do I know which version of the schwa to use?”</a>).</li> </ul> </li> </ul> </li> <li>How do you represent tone? <ul class='dListUnordered'> <li>Diacritics &mdash; these are found in the <a href='http://www.unicode.org/charts/PDF/U0300.pdf' target='_blank'><img src='/cms/assets/icons/offsite_link.png'>&nbsp;Combining Diacritical Marks</a> and <a href='http://www.unicode.org/charts/PDF/U1DC0.pdf' target='_blank'><img src='/cms/assets/icons/offsite_link.png'>&nbsp;Combining Diacritical Marks Supplement</a> blocks of the Unicode Standard. Do not use <span class='USV'>U+0334</span>..<span class='USV'>U+0338</span> and <span class='USV'>U+0340</span>..<span class='USV'>U+0345</span>. </li> <li>Special letters &mdash; follow the guidelines for <a href='#cs'>character properties</a>.</li> <li>Punctuation &mdash; Tone is generally considered part of a word, but a punctuation mark used to mark tone will not be treated as part of the word. Follow the guidelines listed under <a href='#cs'>punctuation</a>.</li> <li>Numbers &mdash; “Normal” superscript numbers can be used (U+00B9, U+00B2, U+00B3, U+2074, U+2075, U+2076, U+2077, U+2078, U+2079, U+2070 represent <sup>1</sup>,<sup>2</sup>,<sup>3</sup>,<sup>4</sup>,<sup>5</sup>,<sup>6</sup>,<sup>7</sup>,<sup>8</sup>,<sup>9</sup>,<sup>0</sup> respectively). </li> </ul> </li> <li>How do you represent a word boundary? 
Unless you are using a non-Roman script, you can probably count on this being simply a space, and it will not be an issue in the orthography.</li> </ul> <a name='65c13229'></a> <h2>Other Useful Resources</h2> <ul class='dListUnordered'> <li><a href='http://www.unicode.org/notes/tn19/' target='_blank'><img src='/cms/assets/icons/offsite_link.png'>&nbsp;Unicode Technical Note #19 (Recommendations for Creating New Orthographies)</a></li> <li><a href='https://www.sil.org/resources/archives/7830' target='_blank'><img src='/cms/assets/icons/offsite_link.png'>&nbsp;Factors in Designing Effective Orthographies for Unwritten Languages</a></li> <li><a href='/cms/scripts/page.php%3Fid%3Dutconvertq2%26site_id%3Dnrsi.html'>Is Unicode ready for you?</a></li> <li><a href='/cms/scripts/page.php%3Fid%3Dencodingfaq%26site_id%3Dnrsi.html'>How do I encode...?</a></li> <li><a href='/cms/scripts/page.php%3Fid%3Dwp-encoding%26site_id%3Dnrsi.html'>Presentations and working papers in the area of encoding</a></li> </ul> <br><hr clear='all'><p>Note: the opinions expressed in submitted contributions below do not necessarily reflect the opinions of our website.</p><hr> <hr> <p><small>© 2003-2024 <a href='http://www.sil.org/' target='_blank'>SIL International</a>, all rights reserved, unless otherwise noted elsewhere on this page.<br> Provided by SIL's Writing Systems Technology team (formerly known as NRSI). Read our <a href="/privacy-policy.html">Privacy Policy</a>. <a href='/support.html'>Contact us here.</a></small></p> </div> </td> </table> </body> </html>
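
The per-character checks described on this page (general category, case pairing, and the combining marks to avoid for tone) can be sketched with Python's standard unicodedata module. This is only an illustrative sketch, not part of the original page or of the spreadsheet it links to: the helper name check_character, the WORD_FORMING set, and the warning texts are assumptions made for the example, and the results depend on the Unicode version bundled with your Python build.

```python
import unicodedata

# Assumption: treat the letter and mark categories named on this page
# (Lu, Ll, Lm, Lo, Mn, Mc) as the ones suitable for word-forming characters.
WORD_FORMING = {"Lu", "Ll", "Lm", "Lo", "Mn", "Mc"}

# Combining marks the page says not to use: U+0334..U+0338 and U+0340..U+0345.
AVOIDED_MARKS = set(range(0x0334, 0x0339)) | set(range(0x0340, 0x0346))


def check_character(ch):
    """Return a list of warnings for one candidate character (hypothetical helper)."""
    warnings = []
    category = unicodedata.category(ch)
    if category not in WORD_FORMING:
        warnings.append(f"category {category} is not a letter or mark category")
    # A cased letter should map to a different character in the other case.
    if category == "Ll" and ch.upper() == ch:
        warnings.append("no upper case equivalent")
    if category == "Lu" and ch.lower() == ch:
        warnings.append("no lower case equivalent")
    if ord(ch) in AVOIDED_MARKS:
        warnings.append("combining mark listed as one to avoid")
    return warnings


if __name__ == "__main__":
    # Candidates: a, schwa, modifier letter apostrophe, ASCII apostrophe,
    # modifier letter left arrowhead, combining grave tone mark.
    for ch in ["a", "\u0259", "\u02bc", "'", "\u02c2", "\u0340"]:
        name = unicodedata.name(ch, "<unnamed>")
        print(f"U+{ord(ch):04X} {name}: {check_character(ch) or 'ok'}")
```

A character that comes back with no warnings still needs checking against national or local alphabet standards, which this sketch does not cover. Note also that the superscript digits mentioned above have category No, so this particular check would flag them even though the page allows them for tone.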
