<!DOCTYPE html> <html lang="en-GB"> <head> <meta charset="utf-8"/> <title>Tamil orthography notes</title> <link rel="stylesheet" href="../../shared/style/docs.css" /> <link rel="stylesheet" href="../common28/local.css" /> <link rel="stylesheet" href="ta.css" /> <script src="../../shared/fontdb/db.js"> </script> <script src="../../shared/fontdb/managefonts.js"></script> <script src="../../shared/scriptdb/taml.js"> </script> <script src="ta-details.html"> </script> <script src="../../shared/code/boilerplate.js"></script> <script src="../../shared/code/toc_2020.js"></script> <script src="../../shared/code/show_codepoints.js"></script> <script src="../../shared/code/scriptGroups.js"></script> <script src="../common28/functions.js"></script> <script src="../common28/transliterate.js"></script> <script src="../common28/prompts.js"> </script> <script src="../glossary/glossary.js"> </script> <script src="../common28/egcode.js"></script> <script src="../common28/references.js"></script> <script src="refs.js"></script> <script src="ta-langdata.js"></script> <script src="ta-examples.js"></script> <script src="ta-translit.js"> </script> <script src="../../shared/manifests/taml-ta.js"></script> </head> <body class="useBlockExamples"> <header> <div id="header-boilerplate"></div> <script>document.getElementById('header-boilerplate').innerHTML = bp_header('../../shared/images/world.gif','docs');</script> </header> <h1>Tamil (draft) <div class="subhead" style="margin-inline-start: .5em;">Tamil</div></h1> <aside class="sidebar"> <nav> <h2 class="notoc flush"><a id="tochead">Contents</a></h2> <div id="toc"><!-- placeholder --></div> </nav> <div id="fontsetting" class="closed"> </div> </aside> <p id="status">Updated <!-- #BeginDate format:Sw1 -->13 December, 2022<!-- #EndDate --> <span id="versionTop"></span> </p> <div id="intro"> <p>This page brings together basic information about the Tamil script and its use for the Tamil language. 
It aims to provide a brief, descriptive summary of the modern, printed orthography and typographic features, and to advise how to write Tamil using Unicode.</p> <p>Tamil has a fairly complicated set of rules and variations on pronunciation, and the writing system abstracts away from the detail. Phonetic transcriptions on this page should be treated as an approximate guide, only. Many are more phonemic than phonetic, and there may be variations depending on the source of the transcription. For example, the symbol <span class="ipa">a</span> represents a set of central sounds which may be written <span class="ipa">a</span>, <span class="ipa">ə</span>, or <span class="ipa">ʌ</span> in more detailed transcriptions.</p> <p id="usage"></p> <script>addUsageAdvice()</script> </div> <section id="sample"> <h2>Sample</h2> <p class="instructions noprint">Select part of this sample text to show a list of characters, with links to more details.<br>Change size: <input id="fontFSSizeSlider" type="range" min="12" max="100" step="1" value="28" oninput="document.getElementById('freeText').style.fontSize = this.value+'px'; document.getElementById('sizeFSIndicator').textContent=this.value+'px'"> <span id="sizeFSIndicator">28px</span></p> <div id="freeText" dir="ltr" lang="ta" data-target="c" data-base="/scripts/tamil/block" data-source="http://unicode.org/udhr/translations.html"> <p title="UDHR, article 1" onMouseUp="showtext('freeText')">உறுப்புரை 1 மனிதப் பிறிவியினர் சகலரும் சுதந்திரமாகவே பிறக்கின்றனர்; அவர்கள் மதிப்பிலும், உரிமைகளிலும் சமமானவர்கள், அவர்கள் நியாயத்தையும் மனச்சாட்சியையும் இயற்பண்பாகப் பெற்றவர்கள். 
அவர்கள் ஒருவருடனொருவர் சகோதர உணர்வுப் பாங்கில் நடந்துகொள்ளல் வேண்டும்.</p> <p title="UDHR, article 2" onMouseUp="showtext('freeText')">உறுப்புரை 2 இனம், நிறம், பால், மொழி, மதம், அரசியல் அல்லது வேறு அபிப்பிராயமுடைமை, தேசிய அல்லது சமூகத் தோற்றம், ஆதனம், பிறப்பு அல்லது பிற அந்தஸ்து என்பன போன்ற எத்தகைய வேறுபாடுமின்றி, இப்பிரகடனத்தில் தரப்பட்டுள்ள எல்லா உரிமைகளுக்கும் சுதந்திரங்களுக்கும் எல்லோரும் உரித்துடையவராவர். மேலும், எவரும் அவருக்குரித்துள்ள நாட்டின் அல்லது ஆள்புலத்தின் அரசியல், நியாயாதிக்க அல்லது நாட்டிடை அந்தஸ்தின் அடிப்படையில் — அது தனியாட்சி நாடாக, நம்பிக்கைப் பொறுப்பு நாடாக, தன்னாட்சியற்ற நாடாக அல்லது இறைமை வேறேதேனும் வகையில் மட்டப்படுத்தப்பட்ட நாடாக இருப்பினுஞ்சரி — வேறுபாடெதுவும் காட்டப்படுதலாகாது.</p> <!-- Source: http://unicode.org/udhr/translations.html --> </div> </section> <section id="history"> <h2>Usage &amp; history</h2> <p>The Tamil script is used for writing the Tamil language, a Dravidian language spoken by over 65,500,000 people in India, Sri Lanka, Singapore, Malaysia and Mauritius. Tamil is an official language in the south Indian state of Tamil Nadu as well as in Sri Lanka and Singapore. The script is also used to write the liturgical language Sanskrit, using consonants and diacritics not represented in the Tamil alphabet. Certain minority languages such as Saurashtra, Badaga, Irula, and Paniya are also written in the Tamil script.</p> <p><span class="charExample" translate="no"><span class="ex" lang="ta">தமிழ் அரிச்சுவடி</span> <span class="ipa">t̪ɐmɨɻ ˈɐɾit͡ɕːuʋəɽi</span></span></p> <p>An old Tamil script derived from Brahmi dates back to the Ashokan period. However, it differs in various significant ways from the modern script, which evolved from a new script created in the 6th century under the Pallava dynasty. It took around 500 years for this new script to spread throughout the Tamil regions. 
Orthographic reform in the 19th and 20th centuries simplified and regularised the script, removing many ligated forms, to facilitate typesetting.</p> <p class="instructions">Sources: <a href="http://scriptsource.org/scr/Taml">Scriptsource</a>, <a class="more" href="http://en.wikipedia.org/wiki/Tamil_script">Wikipedia</a>.</p> <section id="variants"> <h3>Orthographic development &amp; variants</h3> <p>The script was reformed in the 19th century to make it easier to typeset, and again in the 20th. The advent of printing also brought back the use of the pulli to denote consonants without an inherent vowel, which had become rare because it was difficult to write on palm leaves.<tt>ws</tt></p> <p>In 1978, in an attempt to simplify the script, the government of Tamil Nadu proposed the reform of certain letters and syllables. See <a class="secref">writing_styles</a> for details.</p> <p>These reforms only spread in India and the digital world, whereas Sri Lanka, Singapore, Malaysia, Mauritius, Reunion and other Tamil-speaking regions continue to use the traditional syllables.<tt>wss</tt></p> </section> </section> <div id="features"> <table> <tbody id="featureTableBody"> </tbody> </table> <p class="ctlink"><a href="../featurelist/">See the comparison table</a></p> </div> <section id="type"> <h2>Basic features</h2> <p> The Tamil script is an <a class="termref" href="../glossary/index.html#abugida">abugida</a>, ie. consonants carry an inherent vowel sound that is overridden using vowel signs or killed using a <a class="termref" href="../glossary/index.html#virama">virama</a>. See the table to the right for a brief overview of features for the modern Tamil orthography.</p> <p>The Tamil script is written horizontally, left to right.</p> <p>Words are separated by spaces.</p> <p>There are fewer consonants than in other Indic scripts. Tamil has no aspirated consonant letters, and symbols are allocated on a phonemic rather than a phonetic basis. 
This means that <span lang="ta" >க</span>, for example, may be pronounced as the allophones <span class="ipa">k</span> <span class="ipa">ɡ</span> <span class="ipa">x</span> <span class="ipa">ɣ</span> or <span class="ipa">h</span>, according to where it appears relative to other sounds in a word, but its pronunciation doesn't change the word.</p> <p>The 18 consonant letters used for pure Tamil words are supplemented by 6 more Grantha consonant signs which are used for English and Sanskrit loan words. Repertoire extensions for 4 more non-native sounds are achieved by applying the <span class="name">āytam</span> diacritic to characters. ❯ <a class="secref" title="Read more.">consonants</a></p> <p>Consonant clusters are indicated using the visible <span class="name">puḷḷi</span> dot (the virama) to indicate that no vowel follows a consonant. Exceptions to the rule are 2 ligated forms (shown just below). ❯ <a class="secref" title="Read more.">clusters</a><span class="charExample" translate="no"><span class="ex" lang="ta">க்ஷ</span> <span class="trans">k͓ʂ</span> <span class="ipa">kʃʌ</span></span><span class="charExample" translate="no"><span class="ex" lang="ta">ஶ்ரீ</span> <span class="trans">ʃ͓ɾī</span> <span class="ipa">ʃri</span></span></p> <p>Word-initial clusters do not appear in Tamil. Syllable-/word-final consonants are just written using ordinary consonants with the <span class="name">puḷḷi</span> overhead, eg.</p> <p><span class="eg" lang="ta">தமிழ்</span></p> <p>The Tamil orthography has an <a class="termref" href="../glossary/index.html#inherentvowel">inherent vowel</a>, and represents vowels using 11 <a class="termref" href="../glossary/index.html#vowelsign">vowel signs</a>, including 3 <a class="termref" href="../glossary/index.html#prebase">pre-base glyphs</a> and 3 <a class="termref" href="../glossary/index.html#circumgraph">circumgraphs</a>. All circumgraphs can be decomposed. 
All vowel signs are combining marks, and are stored after the base character. ❯ <a class="secref" title="Read more.">vowels</a></p> <p>There are 12 <a class="termref" href="../glossary/index.html#independentvowel">independent vowels</a>, one for each vowel sound, including the inherent vowel, and these are used to write all <a class="termref" href="../glossary/index.html#standalonevowel">standalone vowel</a> sounds. ❯ <a class="secref" title="Read more.">standalone</a></p> <p>The only <a class="termref" href="../glossary/index.html#compositevowel">composite vowels</a> are those created by decomposition of the circumgraphs, and involve 2 glyphs, one on each side of the base consonant(s). ❯ <a class="secref" title="Read more.">composite_vowels</a></p> <p>Tamil is diglossic: the classic form is preferred for writing and public speaking, and is mostly standard across the Tamil-speaking regions; the colloquial, spoken form differs widely from the written.</p> <p>There can also be differences in letter shapes and other typographic approaches between the Tamil used in India and that used in places like Singapore and Malaysia (and even Sri Lanka).</p> </section> <section id="index"> <h2>Character index</h2> <div id="index_intro"></div> <section id="index_letters"> <h3>Letters</h3> <details> <summary class="instructions">Show</summary> <section id="index_letters_consonants"> <h4>Basic consonants</h4> <figure class="characterBox noindex indexline" data-cols="" data-links="">ப␣த␣ச␣ட␣க␣ம␣ந␣ன␣ண␣ஞ␣ங␣வ␣ர␣ற␣ழ␣ல␣ள␣ய</figure> </section> <section id="index_letters_extended"> <h4>Grantha consonants</h4> <figure class="characterBox noindex indexline" data-cols="" data-links="">ஜ␣ஸ␣ஶ␣ஷ␣ஹ</figure> </section> <section id="index_letters_independent"> <h4>Independent vowels</h4> <figure class="characterBox noindex indexline" data-cols="trans" data-links="">இ␣ஈ␣உ␣ஊ␣எ␣ஏ␣ஒ␣ஓ␣அ␣ஆ␣ஐ␣ஔ</figure> </section> <section id="index_letters_other"> <h4>Other</h4> <figure class="characterBox noindex 
indexline" data-cols="" data-links="#visarga,#symbols">ஃ␣ௐ</figure> </section> </details> </section> <section id="index_cchars"> <h3>Combining marks</h3> <details> <summary class="instructions">Show</summary> <section id="index_cchars_vowels"> <h4>Vowel signs</h4> <figure class="characterBox noindex indexline" data-cols="" data-links="">ி␣ீ␣ு␣ூ␣ெ␣ே␣ொ␣ோ␣ா␣ை␣ௌ</figure> </section> <section id="index_cchars_other"> <h4>Other</h4> <figure class="characterBox noindex indexline" data-cols="" data-links="#suppression,#circumgraphs">்␣ௗ</figure> </section> <section id="index_cchars_unused"> <h4>Not used for Tamil</h4> <figure class="deprecatedBox noindex indexline" data-cols="" data-links="">ஂ␣𑌻</figure> </section> </details> </section> <section id="index_numbers"> <h3>Numbers</h3> <details> <summary class="instructions">Show</summary> <section id="index_numbers_unused"> <h4>Not used for modern Tamil</h4> <figure class="characterBox noindex indexline" data-cols="" data-links="">௦␣௧␣௨␣௩␣௪␣௫␣௬␣௭␣௮␣௯␣௰␣௱␣௲</figure> </section> </details> </section> <section id="index_punctuation"> <h3>Punctuation</h3> <details> <summary class="instructions">Show</summary> <figure class="characterBox noindex indexline" data-cols="" data-links="#quotations,#quotations,#quotations,#quotations,">“␣”␣‘␣’</figure> <figure class="characterBox noindex indexline" data-cols="" data-links="#phrase,#phrase,">।␣॥</figure> <section id="index_punctuation_ascii"> <h4>ASCII</h4> <figure class="characterBox noindex indexline" data-cols="" data-links="#phrase,#bracketing,#bracketing,#phrase,#phrase,#phrase,#phrase,#phrase,">!␣(␣)␣,␣.␣:␣;␣?</figure> </section> <!--section id="index_punctuation_cldr"> <h4>CLDR additions</h4> <figure class="auxiliaryBox noindex indexline" data-cols="" data-links="">§␣–␣—␣†␣‡␣…␣′␣″</figure> </section--> </details> </section> <section id="index_symbols"> <h3>Symbols</h3> <details> <summary class="instructions">Show</summary> <figure class="characterBox noindex indexline" 
data-links="#currency,#currency" data-cols="">௹␣₹</figure> <section id="index_symbols_unused"> <h4>Not used for modern Tamil</h4> <figure class="characterBox noindex indexline" data-cols="" data-links="">௳␣௴␣௵␣௶␣௷␣௸␣௺␣₨</figure></section> </details> </section> </section> <div id="showTranscriptions"> <details> <summary>Items to show in lists</summary> <div><label>Codepoint <input type="checkbox" onChange="toggleTranscription('listUnum', this.checked)" checked/></label></div> <div><label>IPA <input type="checkbox" onChange="toggleTranscription('listIPA', this.checked)" checked/></label></div> <div><label>ISO 15919 <input type="checkbox" onChange="toggleTranscription('listTransc', this.checked)"/></label></div> <div> <label>Transliteration <input type="checkbox" onChange="toggleTranscription('listTrans', this.checked)" /></label></div> </details> </div> <section id="structure"> <h2>Structure</h2> <p>Tamil has a very restricted set of consonant clusters, and no word-initial clusters.<tt>wp</tt> Geminated consonants, however, are common.</p> <p>Some consonants cannot begin a word (eg. <span class="ipa">ɾ ɻ l</span>) and others cannot appear at the end.<tt>ws,#Basic_consonants</tt></p> </section> <section id="phonology"> <h2>Phonology</h2> <p class="instructions">Click on the sounds to reveal locations in this document where they are mentioned.</p> <p class="instructions">Phones in a lighter colour are non-native or allophones. Source <a href="https://en.wikipedia.org/wiki/Tamil_phonology">Wikipedia</a>.</p> <section id="phonemesV"> <h3>Vowel sounds</h3> <section id="plain_vowels"> <h4>Plain vowels</h4> <svg xmlns="http://www.w3.org/2000/svg" viewBox="-50 -20 500 240" class="ipaSVG"> <defs> <filter x="0" y="0" width="1" height="1" id="solid"> <feFlood flood-color="white"/> <feComposite in="SourceGraphic" operator="xor"/> </filter> </defs> <!--image xlink:href="images/vowelgrid.gif" alt="Les caractères sont associés à des octets." 
x="0" y="0" width="400" height="200"></image--> <use xlink:href="#arrowRight" x="155" y="25"></use> <line x1="40" y1="10" x2="240" y2="10" stroke="#ccc" /> <line x1="240" y1="10" x2="240" y2="190" stroke="#ccc" /> <line x1="115" y1="190" x2="240" y2="190" stroke="#ccc" /> <line x1="40" y1="10" x2="115" y2="190" stroke="#ccc" /> <line x1="65" y1="70" x2="240" y2="70" stroke="#ccc" /> <line x1="90" y1="130" x2="240" y2="130" stroke="#ccc" /> <line x1="140" y1="10" x2="177" y2="190" stroke="#ccc" /> <!-- close --> <circle cx="40" cy="10" r="3" fill="#ccc"/> <circle cx="140" cy="10" r="3" fill="#ccc"/> <circle cx="240" cy="10" r="3" fill="#ccc"/> <!-- mid close --> <circle cx="65" cy="70" r="3" fill="#ccc"/> <circle cx="152.5" cy="70" r="3" fill="#ccc"/> <circle cx="240" cy="70" r="3" fill="#ccc"/> <!-- mid open --> <circle cx="90" cy="130" r="3" fill="#ccc"/> <circle cx="165" cy="130" r="3" fill="#ccc"/> <circle cx="240" cy="130" r="3" fill="#ccc"/> <!-- close --> <circle cx="115" cy="190" r="3" fill="#ccc"/> <circle cx="177" cy="190" r="3" fill="#ccc"/> <circle cx="240" cy="190" r="3" fill="#ccc"/> <!-- close --> <text x="32" y="15" text-anchor="end"> <tspan class="ipa">i</tspan> <tspan class="ipa">iː</tspan> </text> <text x="250" y="15"> <tspan class="ipa">u</tspan> <tspan class="ipa">uː</tspan> </text> <!-- close mid --> <text x="57" y="75" text-anchor="end"> <tspan class="ipa">e</tspan> <tspan class="ipa">eː</tspan> </text> <text x="250" y="75"> <tspan class="ipa">o</tspan> <tspan class="ipa">oː</tspan> </text> <!-- open --> <text x="106" y="195" text-anchor="end"> <tspan class="ipa">a</tspan> <tspan class="ipa">aː</tspan> </text> </svg> <table class="ipaTable" style="display: none"> <thead> <tr> <th>  </th> <th>Front</th> <th>Central</th> <th>Back </th> </tr> </thead> <tbody> <tr> <th>Close </th> <td> <span class="ipa">i iː</span> </td> <td> </td> <td> <span class="ipa">u uː</span></td> </tr> <tr> <th>Close-mid </th> <td> <span class="ipa">e eː</span></td> 
<td><span class="ipa">ʌ</span></td> <td> <span class="ipa">o oː</span></td> </tr> <tr> <th>Open </th> <td> </td> <td><span class="ipa">aː</span><span class="ex" lang="lo"></span></td> <td> </td> </tr> </tbody> </table> <p>Vowel length is distinctive, with long vowels being about twice the length of short vowels.</p> <p>Vowel quality can vary depending on the adjacent sounds. A good number of such variations are described in Comrie<tt class="more">b</tt>.</p> </section> <section id="diphthongs"> <h4>Diphthongs</h4> <svg xmlns="http://www.w3.org/2000/svg" viewBox="-50 -20 500 240" class="ipaSVG"> <defs> <filter x="0" y="0" width="1" height="1" id="solid2"> <feFlood flood-color="white"/> <feComposite in="SourceGraphic" operator="xor"/> </filter> </defs> <!--image xlink:href="images/vowelgrid.gif" alt="Les caractères sont associés à des octets." x="0" y="0" width="400" height="200"></image--> <use xlink:href="#arrowRight" x="155" y="25"></use> <line x1="40" y1="10" x2="240" y2="10" stroke="#ccc" /> <line x1="240" y1="10" x2="240" y2="190" stroke="#ccc" /> <line x1="115" y1="190" x2="240" y2="190" stroke="#ccc" /> <line x1="40" y1="10" x2="115" y2="190" stroke="#ccc" /> <line x1="65" y1="70" x2="240" y2="70" stroke="#ccc" /> <line x1="90" y1="130" x2="240" y2="130" stroke="#ccc" /> <line x1="140" y1="10" x2="177" y2="190" stroke="#ccc" /> <!-- close --> <circle cx="40" cy="10" r="3" fill="#ccc"/> <circle cx="140" cy="10" r="3" fill="#ccc"/> <circle cx="240" cy="10" r="3" fill="#ccc"/> <!-- mid close --> <circle cx="65" cy="70" r="3" fill="#ccc"/> <circle cx="152.5" cy="70" r="3" fill="#ccc"/> <circle cx="240" cy="70" r="3" fill="#ccc"/> <!-- mid open --> <circle cx="90" cy="130" r="3" fill="#ccc"/> <circle cx="165" cy="130" r="3" fill="#ccc"/> <circle cx="240" cy="130" r="3" fill="#ccc"/> <!-- close --> <circle cx="115" cy="190" r="3" fill="#ccc"/> <circle cx="177" cy="190" r="3" fill="#ccc"/> <circle cx="240" cy="190" r="3" fill="#ccc"/> <!-- open --> <text x="106" y="195" 
text-anchor="end"> <tspan class="ipa">aɪ</tspan> <tspan class="ipa">aʊ</tspan> </text> </svg> <table class="ipaTable" style="display: none"> <thead> <tr> <th>  </th> <th>Front</th> <th>Central</th> <th>Back </th> </tr> </thead> <tbody> <tr> <th>Close </th> <td> </td> <td> </td> <td> </td> </tr> <tr> <th>Close-mid </th> <td> </td> <td> </td> <td> </td> </tr> <tr> <th>Open </th> <td> </td> <td><span class="ipa">ai̯ aʊ̯</span></td> <td> </td> </tr> </tbody> </table> <p>Use of <span class="ipa">aʊ</span> is restricted to a few lexical items.<tt>wp</tt></p> </section> </section> <section id="phonemesC"> <h3>Consonant sounds</h3> <table class="ipaTable"> <thead> <tr> <th></th> <th>labial </th> <th>dental </th> <th>alveolar </th> <th>post-<br> alveolar</th> <th>retroflex</th> <th>palatal </th> <th>velar </th> <th>glottal </th> </tr> </thead> <tbody> <tr> <th>stop</th> <td><span class="ipa">p</span> <span class="allophone">b</span></td> <td><span class="ipa">t̪</span> <span class="allophone">d̪</span></td> <td> </td> <td> </td> <td><span class="ipa">ʈ</span> <span class="allophone">ɖ</span></td> <td> </td> <td><span class="ipa">k</span> <span class="allophone">ɡ</span></td> <td> </td> </tr> <tr> <th>affricate</th> <td> </td> <td> </td> <td> </td> <td><span class="ipa">t͡ʃ</span> <span class="allophone">d͡ʒ</span></td> <td> </td> <td> </td> <td> </td> <td> </td> </tr> <tr> <th>fricative </th> <td><span class="allophone">f</span><br/><span class="allophone">β</span></td> <td><span class="allophone">ð</span></td> <td><span class="ipa">s</span> <span class="allophone">z</span></td> <td><span class="allophone">ʃ</span> <span class="allophone">ʒ</span></td> <td><span class="allophone">ʂ</span></td> <td> </td> <td><span class="allophone">x</span> <span class="allophone">ɣ</span></td> <td><span class="allophone">h</span></td> </tr> <tr> <th>nasal</th> <td><span class="ipa">m</span></td> <td><span class="ipa">n̪</span></td> <td><span class="ipa">n</span></td> <td> </td> <td><span 
class="ipa">ɳ</span></td> <td><span class="ipa">ɲ</span></td> <td><span class="ipa">ŋ</span></td> <td></td> </tr> <tr> <th>approximant</th> <td><span class="ipa">ʋ</span></td> <td> </td> <td><span class="ipa">l</span></td> <td> </td> <td><span class="ipa">ɻ</span> <span class="ipa">ɭ</span></td> <td><span class="ipa">j</span></td> <td> </td> <td></td> </tr> <tr> <th>trill/flap</th> <td> </td> <td> </td> <td><span class="ipa">r</span> <span class="ipa">ɾ</span></td> <td> </td> <td><span class="allophone">ɽ</span></td> <td></td> <td></td> <td></td> </tr> <tr> <th></th> <td class="mouth"><a href="../common28/mouth/mouth_labial.png" target="_blank"><img src="../common28/mouth/mouth_small_labial.png" alt=" "/></a></td> <td class="mouth"><a href="../common28/mouth/mouth_dental.png" target="_blank"><img src="../common28/mouth/mouth_small_dental.png" alt=" "/></a></td> <td class="mouth"><a href="../common28/mouth/mouth_alveolar.png" target="_blank"><img src="../common28/mouth/mouth_small_alveolar.png" alt=" "/></a></td> <td class="mouth"><a href="../common28/mouth/mouth_postalveolar.png" target="_blank"><img src="../common28/mouth/mouth_small_postavleolar.png" alt=" "/></a></td> <td class="mouth"><a href="../common28/mouth/mouth_retroflex.png" target="_blank"><img src="../common28/mouth/mouth_small_retroflex.png" alt=" "/></a></td> <td class="mouth"><a href="../common28/mouth/mouth_palatal.png" target="_blank"><img src="../common28/mouth/mouth_small_palatal.png" alt=" "/></a></td> <td class="mouth"><a href="../common28/mouth/mouth_velar.png" target="_blank"><img src="../common28/mouth/mouth_small_velar.png" alt=" "/></a></td> <td class="mouth"><a href="../common28/mouth/mouth_glottal.png" target="_blank"><img src="../common28/mouth/mouth_small_glottal.png" alt=" "/></a></td> </tr> </tbody> </table> </section> </section> <section id="vowels"> <h2>Vowels</h2> <div id="vowel_description"> <p>The Tamil orthography has an <a class="termref" 
href="../glossary/index.html#inherentvowel">inherent vowel</a>, and represents vowels using 11 <a class="termref" href="../glossary/index.html#vowelsign">vowel signs</a>, including 3 <a class="termref" href="../glossary/index.html#prebase">pre-base glyphs</a> and 3 <a class="termref" href="../glossary/index.html#circumgraph">circumgraphs</a>. All circumgraphs can be decomposed. All vowel signs are combining marks, and are stored after the base character.</p> <p>There are 12 <a class="termref" href="../glossary/index.html#independentvowel">independent vowels</a>, one for each vowel sound, including the inherent vowel, and these are used to write all <a class="termref" href="../glossary/index.html#standalonevowel">standalone vowel</a> sounds.</p> <p>The only <a class="termref" href="../glossary/index.html#compositevowel">composite vowels</a> are those created by decomposition of the circumgraphs, and involve 2 glyphs, one on each side of the base consonant(s).</p> </div> <section id="inherent"> <h3>Inherent vowel</h3> <p id="def-independentvowel" class="explanatoryintro definitionStub"></p> <p><span class="ipa">a</span> following a consonant is not written, but is seen as an inherent part of the consonant letter, so <span class="ipa">ka</span> is written using just the consonant letter, eg.</p> <p><span class="charExample inline" translate="no"><span class="ex" lang="ta" style="font-size:3rem;">க</span> <span class="ipa">ka</span></span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a>]</p> <p class="note">Daniels<tt>d</tt> describes the inherent vowel as <span class="ipa">ʌ</span>, though not consistently.</p> </section> <section id="vowelsigns"> <h3>Vowel signs</h3> <p id="def-vowelsign" class="explanatoryintro definitionStub"></p> <p>Non-inherent vowel sounds that follow a consonant are represented using <span class="name">vowel signs</span>, eg.</p> <p><span class="charExample inline" translate="no"><span class="ex" 
lang="ta" style="font-size:3rem;">கீ</span> <span class="ipa">kiː</span></span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a> + <a href="/scripts/tamil/block#char0BC0"><span class="uname">U+0BC0 TAMIL VOWEL SIGN II</span></a>]</p> <p>Tamil vowel signs are all combining characters. In principle, a single Unicode character is used for the vowel sign on each base consonant, even if the glyphs of that vowel sign appear on both sides of the base consonant (but see also <a class="secref">encoding_ce</a> for decomposed forms). All vowel signs are typed and stored after the base consonant, and the glyph rendering process takes care of the positioning at display time.</p> <p>For consonant clusters that are indicated using a visible pulli, the vowel sign is associated with the consonant that immediately precedes it phonetically. For example, a vowel sign that is rendered to the left of the base will appear <em>between</em> the consonants in the cluster (see <a class="secref">prebase_vowels</a>). And where a vowel sign is composed of components that are rendered before and after the base, they will surround just that consonant (see <a class="secref">composite_vowels</a>).</p> <p>However, for the few consonant clusters that are indicated using conjunct forms, vowel signs are arranged <em>around the conjunct</em>. A vowel sign that is rendered to the left of the base will appear before all the consonants in the syllable onset (see <a class="secref">prebase_vowels</a>). And where a vowel sign is composed of components that are rendered before and after the base, they will surround the conjunct (see <a class="secref">composite_vowels</a>).</p> <p>All but one of the vowel signs are <em>spacing</em> combining characters, ie. 
they expand the text width when applied to a consonant.</p> <p>Although modern Tamil uses fewer <em>conjunct</em> ligatures than most other Indic scripts, many ligatures are still needed for a Tamil font, mostly for combinations of base consonant and vowel sign. See <a class="sectionref">vowelligation</a>.</p> </section> <section id="combiningvowels"> <h3>Combining marks used for vowels</h3> <p>Tamil uses the following dedicated combining marks for vowels.</p> <figure class="characterBox auto" data-cols="ipa,trans,transc">ி␣ீ␣ு␣ூ␣ெ␣ே␣ொ␣ோ␣ ␣ை␣ௌ</figure> <p>The <span class="trans">u</span> and <span class="trans">ū</span> vowel signs, and to some extent the <span class="trans">i</span> and <span class="trans">ī</span> signs, tend to form ligatures with the base consonant. See <a class="sectionref">vowelligation</a>.</p> </section> <section id="prebase_vowels"> <h3>Pre-base vowel signs</h3> <figure class="characterBox auto small" data-cols="ipa,trans,transc">ெ␣ே␣ை</figure> <figure id="fig_prebase" class="sideCaption"> <img src="images/fig_prebase.svg" style="height:3rem" alt="எங்கே" class="ex" lang="ta" data-notes="Noto Serif Tamil 60px"> <div> <figcaption>In a consonant cluster, a pre-base vowel glyph immediately precedes the consonant it is pronounced after.</figcaption> <details><summary>details</summary><p><span class="eg" lang="ta">எங்கே</span></p></details> </div> </figure> <!--figure id="fig_prebase" class="sideCaption"> <img src="images/fig_prebase.png" alt="எங்கே" class="ex" lang="ta" data-notes="Noto Serif Tamil 60px"> <figcaption>In a consonant cluster, a pre-base vowel glyph immediately precedes the consonant it is pronounced after.</figcaption> </figure--> <p>Three vowel signs appear to the left of the base consonant letter or conjunct, eg. <span class="eg" lang="ta">கெடு</span></p> <p>These are combining marks that are always stored after the base consonant. 
The rendering process places the glyph before the base consonant.</p> <p>Because modern Tamil usually indicates consonant clusters with a visible virama, pre-base vowel signs normally appear before the consonant that immediately precedes them phonetically, eg. <span class="eg" lang="ta">எங்கே</span> </p> <p>However, in versions of the orthography that include conjunct forms, the pre-base vowel appears before the whole consonant cluster at the beginning of the orthographic syllable.</p> </section> <section id="circumgraphs"> <h3>Circumgraphs</h3> <p id="def-circumgraph" class="explanatoryintro definitionStub"></p> <figure class="characterBox auto small" data-cols="ipa,trans,transc">ொ␣ோ␣ௌ</figure> <figure id="fig_circumgraph" class="sideCaption"> <img src="images/fig_circumgraph.svg" style="height:3rem" alt="மீக்கோள்" class="ex" lang="ta" data-notes="Noto Serif Tamil 60px >72ppi"> <div> <figcaption>In a consonant cluster, a circumgraph vowel sign surrounds just the consonant it is pronounced after.</figcaption> <details><summary>details</summary><p><span class="eg" lang="ta">மீக்கோள்</span></p></details> </div> </figure> <!--figure id="fig_circumgraph" class="sideCaption"> <img src="images/fig_circumgraph.png" alt="மீக்கோள்" class="ex" lang="ta" data-notes="Noto Serif Tamil 60px >72ppi"> <figcaption>In a consonant cluster, a circumgraph vowel sign surrounds just the consonant it is pronounced after.</figcaption> </figure--> <p>Three vowels are each written with a single combining character that has visually separate parts, which appear on opposite sides of the consonant onset, eg. <span class="eg" lang="ta">கொடு</span></p> <p>As with pre-base vowel signs, in Tamil consonant clusters the circumgraph normally surrounds only the consonant that phonetically precedes it. 
In the few cases where it is pronounced after a cluster that is rendered as a conjunct, it surrounds the whole conjunct.</p> <p>These circumgraphs have canonically equivalent decomposed forms (see <a class="secref">encoding_ce</a>).</p> </section> <section id="composite_vowels"> <h3>Composite vowels</h3> <p id="def-compositevowel" class="explanatoryintro definitionStub"></p> <p>Composite vowels only occur in Tamil when the circumgraphs just mentioned are decomposed. When they do occur, both combining marks must be typed and stored after the consonant base and in the visual order (see <a class="secref">encoding_ce</a>). Click on the following example to see the (de)composition.</p> <p><span class="charExample" translate="no"><span class="ex" lang="ta">கொடு</span> <span class="ipa">koɖu</span> <span class="meaning">to give</span></span></p> <details> <summary class="instructions" style="margin-block-start: 3em;margin-block-end: 4em;">Show details about vowel glyph positioning.</summary> <p>The following list summarises where vowel signs are positioned around a base consonant to produce vowels, and how many instances of that pattern there are.</p> <ul> <li>3 pre-base, eg. <span class="charExample" translate="no"><span class="ex" lang="ta">கெ</span> <span class="trans">ke</span></span></li> <li>4 post-base, eg. <span class="charExample" translate="no"><span class="ex" lang="ta">கூ</span> <span class="trans">kū</span></span></li> <li>1 superscript, eg. <span class="charExample" translate="no"><span class="ex" lang="ta">கீ</span> <span class="trans">kī</span></span></li> <li>3 pre+post-base, eg. <span class="charExample" translate="no"><span class="ex" lang="ta">கௌ</span> <span class="trans">kʌʷ</span></span></li> </ul> <p>However, some of the vowel signs are tightly integrated with the consonant shape. 
See <a class="secref">vowelligation</a>.</p> </details> </section> <section id="standalone"> <h3>Standalone vowels</h3> <p id="def-standalonevowel" class="explanatoryintro definitionStub"></p> <p>Tamil represents syllable-initial vowels using a set of independent vowel letters.</p> <figure class="characterBox auto" data-cols="ipa,trans,transc">இ␣ஈ␣உ␣ஊ␣எ␣ஏ␣ஒ␣ஓ␣அ␣ஆ␣ ␣ஐ␣ஔ</figure> <p>Independent vowel forms used to be used at the beginning of metrical groups, but now they are used at the beginning of a word, eg. </p> <p><span class="eg" lang="ta">இந்த</span></p> <p>They are also used internally to represent 'overlong' vowel sounds, eg. compare <span class="charExample" translate="no"><span class="ex" lang="ta">பெரீய</span> <span class="trans">peɾīy</span> <span class="transc">perīya</span> <span class="meaning">really big</span></span><span class="charExample" translate="no"><span class="ex" lang="ta">பெரீஇஇய</span> <span class="trans">peɾīịịy</span> <span class="transc">perīiiya</span> <span class="meaning">reeeeally big</span></span></p> </section> <section id="novowel"> <h3>Consonants with no following vowel</h3> <figure class="characterBox auto" data-cols="">்</figure> <p>Tamil uses <span class="codepoint" translate="no"><span lang="ta">்</span> [<a href="/scripts/tamil/block#char0BCD"><span class="uname">U+0BCD TAMIL SIGN VIRAMA</span></a>]</span> (called <span class="name">puḷḷi</span> in Tamil) to kill the inherent vowel after a consonant, eg.<span class="codepoint noindex" translate="no"><span lang="ta"> க்</span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a> + <a href="/scripts/tamil/block#char0BCD"><span class="uname">U+0BCD TAMIL SIGN VIRAMA</span></a>]</span> explicitly represents just the sound <span class="ipa">k</span>.</p> <p>The <span class="name">puḷḷi</span> may be rendered as a dot, or as a small, open circle.</p> <p>The <span class="name">puḷḷi</span> tends to be visible anywhere a vowel is dropped. 
For example, unlike in Devanagari, it is used at the end of a word if there is no final vowel, eg. <span class="charExample" translate="no"><span class="ex" lang="ta">மனிதப்</span> <span class="trans">mnitp</span> <span class="ipa">mənid̪əp</span> <span class="meaning">human</span></span></p> <p>The <span class="name">puḷḷi</span> is also used to form conjuncts, although there are normally only 2 of those in modern Tamil (see <a class="secref">clusters</a>).</p> <p class="btw"><span class="codepoint" translate="no"><span lang="ta">ஂ</span> [<a href="/scripts/tamil/block#char0B82"><span class="uname">U+0B82 TAMIL SIGN ANUSVARA</span></a>]</span> is not used for Tamil. Nor should it be used as a graphical variant of the <span class="name">puḷḷi</span>.<tt>s</tt></p> </section> <section id="vowel_mappings"> <h3>Vowel sounds to characters</h3> <p class="instructions">This section maps Tamil vowel sounds to common graphemes in the Tamil orthography, grouped according to whether they are dependent ( <mark>d</mark> ) or standalone, ie. independent ( <mark>i</mark> ), forms. Click on a grapheme to find other mentions on this page (links appear at the bottom of the page). 
Click on the character name to see examples and for detailed descriptions of the character(s) shown.</p> <section id="plain_vowel_map"> <h4>Plain vowels</h4> <div id="mapv_high" class="map"> <div class="mapItem"> <div class="phone"><span class="ipa">i</span></div> <div class="posn">d</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ி</span> [<a href="/scripts/tamil/block#char0BBF"><span class="uname">U+0BBF TAMIL VOWEL SIGN I</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">இ</span> [<a href="/scripts/tamil/block#char0B87"><span class="uname">U+0B87 TAMIL LETTER I</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">iː</span></div> <div class="posn">d</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ீ</span> [<a href="/scripts/tamil/block#char0BC0"><span class="uname">U+0BC0 TAMIL VOWEL SIGN II</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஈ</span> [<a href="/scripts/tamil/block#char0B88"><span class="uname">U+0B88 TAMIL LETTER II</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">u</span></div> <div class="posn">d</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ு</span> [<a href="/scripts/tamil/block#char0BC1"><span class="uname">U+0BC1 TAMIL VOWEL SIGN U</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">உ</span> [<a href="/scripts/tamil/block#char0B89"><span class="uname">U+0B89 TAMIL LETTER U</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">uː</span></div> <div class="posn">d</div> <div> 
<p><span class="codepoint" translate="no"><span lang="ta">ூ</span> [<a href="/scripts/tamil/block#char0BC2"><span class="uname">U+0BC2 TAMIL VOWEL SIGN UU</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஊ</span> [<a href="/scripts/tamil/block#char0B8A"><span class="uname">U+0B8A TAMIL LETTER UU</span></a>]</span></p> </div> </div> </div> <div id="mapv_hmid" class="map"> <div class="mapItem"> <div class="phone"><span class="ipa">e</span></div> <div class="posn">d</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ெ</span> [<a href="/scripts/tamil/block#char0BC6"><span class="uname">U+0BC6 TAMIL VOWEL SIGN E</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">எ</span> [<a href="/scripts/tamil/block#char0B8E"><span class="uname">U+0B8E TAMIL LETTER E</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">eː</span></div> <div class="posn">d</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ே</span> [<a href="/scripts/tamil/block#char0BC7"><span class="uname">U+0BC7 TAMIL VOWEL SIGN EE</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஏ</span> [<a href="/scripts/tamil/block#char0B8F"><span class="uname">U+0B8F TAMIL LETTER EE</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">o</span></div> <div class="posn">d</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ொ</span> [<a href="/scripts/tamil/block#char0BCA"><span class="uname">U+0BCA TAMIL VOWEL SIGN O</span></a>]</span></p> <p><span class="codepoint" translate="no"><span lang="ta">ொ</span> [<a 
href="/scripts/tamil/block#char0BC6"><span class="uname">U+0BC6 TAMIL VOWEL SIGN E</span></a> + <a href="/scripts/tamil/block#char0BBE"><span class="uname">U+0BBE TAMIL VOWEL SIGN AA</span></a>]</span> when decomposed (not recommended) </p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஒ</span> [<a href="/scripts/tamil/block#char0B92"><span class="uname">U+0B92 TAMIL LETTER O</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">oː</span></div> <div class="posn">d</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ோ</span> [<a href="/scripts/tamil/block#char0BCB"><span class="uname">U+0BCB TAMIL VOWEL SIGN OO</span></a>]</span></p> <p><span class="codepoint" translate="no"><span lang="ta">ோ</span> [<a href="/scripts/tamil/block#char0BC7"><span class="uname">U+0BC7 TAMIL VOWEL SIGN EE</span></a> + <a href="/scripts/tamil/block#char0BBE"><span class="uname">U+0BBE TAMIL VOWEL SIGN AA</span></a>]</span> when decomposed (not recommended) </p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஓ</span> [<a href="/scripts/tamil/block#char0B93"><span class="uname">U+0B93 TAMIL LETTER OO</span></a>]</span></p> </div> </div> </div> <div id="mapv_low" class="map"> <div class="mapItem"> <div class="phone"><span class="ipa">a</span></div> <div class="posn">d</div> <div> <p><a href="#inherentvowel">Inherent vowel</a></p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">அ</span> [<a href="/scripts/tamil/block#char0B85"><span class="uname">U+0B85 TAMIL LETTER A</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">aː</span></div> <div class="posn">d</div> <div> <p><span 
class="codepoint" translate="no"><span lang="ta">ா</span> [<a href="/scripts/tamil/block#char0BBE"><span class="uname">U+0BBE TAMIL VOWEL SIGN AA</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஆ</span> [<a href="/scripts/tamil/block#char0B86"><span class="uname">U+0B86 TAMIL LETTER AA</span></a>]</span></p> </div> </div> </div> </section> <section id="diphthong_map"> <h4>Diphthongs</h4> <div id="mapv_low2" class="map"> <div class="mapItem"> <div class="phone"><span class="ipa">aɪ</span></div> <div class="posn">d</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ை</span> [<a href="/scripts/tamil/block#char0BC8"><span class="uname">U+0BC8 TAMIL VOWEL SIGN AI</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஐ</span> [<a href="/scripts/tamil/block#char0B90"><span class="uname">U+0B90 TAMIL LETTER AI</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">aʊ</span></div> <div class="posn">d</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ௌ</span> [<a href="/scripts/tamil/block#char0BCC"><span class="uname">U+0BCC TAMIL VOWEL SIGN AU</span></a>]</span></p> <p><span class="codepoint" translate="no"><span lang="ta">ௌ</span> [<a href="/scripts/tamil/block#char0BC6"><span class="uname">U+0BC6 TAMIL VOWEL SIGN E</span></a> + <a href="/scripts/tamil/block#char0BD7"><span class="uname">U+0BD7 TAMIL AU LENGTH MARK</span></a>]</span> when decomposed (not recommended)</p> </div> </div> <div class="mapItem"> <div class="phone"> </div> <div class="posn">i</div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஔ</span> [<a href="/scripts/tamil/block#char0B94"><span class="uname">U+0B94 TAMIL LETTER AU</span></a>]</span></p> <p><span 
class="codepoint" translate="no"><span lang="ta">ஔ</span> [<a href="/scripts/tamil/block#char0B92"><span class="uname">U+0B92 TAMIL LETTER O</span></a> + <a href="/scripts/tamil/block#char0BD7"><span class="uname">U+0BD7 TAMIL AU LENGTH MARK</span></a>]</span> when decomposed (not recommended)</p> </div> </div> </div> <p style="font-size: 60%;text-align: end; line-height: 1;">Sources: <a href="https://en.wikipedia.org/wiki/Help:IPA/Tamil">Wikipedia</a>, <a href="http://anunaadam.appspot.com/">Anunaadam</a>, and Google Translate.</p> </section> </section> </section> <section id="consonants"> <h2>Consonants</h2> <p>There are fewer consonants than in other Indic scripts. Tamil has no aspirated consonant letters, and symbols are allocated on a phonemic basis, rather than a phonetic one. This means that <span lang="ta" >க</span>, for example, may be pronounced as the allophones <span class="ipa">k</span> <span class="ipa">ɡ</span> <span class="ipa">x</span> <span class="ipa">ɣ</span> or <span class="ipa">h</span>, according to where it appears relative to other sounds in a word, but its pronunciation doesn't change the word.</p> <p>The <mark>18 consonant letters</mark> used for pure Tamil words are supplemented by <mark>6 more Grantha</mark> consonant signs which are used for English and Sanskrit loan words. <mark>Repertoire extensions</mark> for 4 more non-native sounds are achieved by applying the <span class="name">āytam</span> diacritic to characters.<a class="secref" title="Read more."></a></p> <p><mark>Consonant clusters</mark> are indicated using the visible <span class="name">puḷḷi</span> dot to indicate that no vowel follows a consonant. There are <mark>2 ligated forms</mark> which are exceptions to the rule (shown just below).</p> <p>Word-initial clusters do not appear in Tamil. 
Syllable-/word-final consonants are just written using ordinary consonants with the <span class="name">puḷḷi</span> overhead.<span class="eg" lang="ta"></span></p> <section id="basicC"> <h3>Basic consonants</h3> <p>The basic consonant sounds of the standard Tamil alphabet are represented by the following characters. Note that there are no consonants dedicated only to voiced stops or to fricative sounds.</p> <p>This list uses hyphens to provide information about the context in which allophonic variants are used (see <a class="figref">fig_allophone_table</a>).</p> <figure class="characterBox auto" data-cols="ipa,trans,transc">ப␣த␣ச␣ட␣க</figure> <figure class="characterBox auto" data-cols="ipa,trans,transc">ம␣ந␣ன␣ண␣ஞ␣ங</figure> <figure class="characterBox auto" data-cols="ipa,trans,transc">வ␣ர␣ற␣ழ␣ல␣ள␣ய</figure> <section id="allophonic_variants"> <h4>Allophonic variants for Tamil plosives</h4> <p>The Tamil writing system only represents phonemic differences. The sounds in parentheses in the chart are allophonic variations or sounds used for foreign words. Allophonic variants are not usually indicated in Latin transcriptions.</p> <p>Plosives are unvoiced if they occur word-initially or doubled. Elsewhere they are voiced, with a few becoming fricatives intervocalically. 
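</p> <p>As a rough illustration of the voicing rule just described, the following Python sketch looks up a plosive's realisation from its context, using the values in the Wikipedia table reproduced in this section (<a class="figref">fig_allophone_table</a>). The names <code>context_of</code> and <code>realise</code> are hypothetical, and the context test is a simplifying assumption: it only inspects neighbouring code points, treats any plosive that follows a vowel and carries a vowel as intervocalic, and gives up on clusters the table does not cover.</p>
<pre>
```python
VIRAMA = "\u0BCD"  # puḷḷi
NASALS = set("\u0BAE\u0BA8\u0BA9\u0BA3\u0B9E\u0B99")  # ம ந ன ண ஞ ங

# Realisations copied from the plosive table in this section.
# None marks a cell that the table shows as a dash.
PLOSIVES = {
    "\u0BAA": {"initial": "p", "geminate": "pː", "intervocalic": "β~w", "post-nasal": "b"},     # ப
    "\u0BA4": {"initial": "t̪", "geminate": "t̪ː", "intervocalic": "ð", "post-nasal": "d̪"},      # த
    "\u0BB1": {"initial": None, "geminate": "tːr", "intervocalic": "r", "post-nasal": "(d)r"},  # ற
    "\u0B9F": {"initial": None, "geminate": "ʈː", "intervocalic": "ɽ", "post-nasal": "ɖ"},      # ட
    "\u0B9A": {"initial": "tɕ~s", "geminate": "tːɕ", "intervocalic": "s", "post-nasal": "dʑ"},  # ச
    "\u0B95": {"initial": "k", "geminate": "kː", "intervocalic": "x~∅", "post-nasal": "ɡ"},     # க
}


def context_of(word, i):
    """Classify the consonant at index i of word (assumed heuristic).

    Returns one of the table's column names, or None when the consonant
    sits in a context the table does not describe (another cluster, or a
    final consonant carrying a puḷḷi).
    """
    letter = word[i]
    if i == 0:
        return "initial"
    before = word[max(i - 2, 0):i]   # up to two code points before
    after = word[i + 1:i + 3]        # up to two code points after
    if before == letter + VIRAMA or after == VIRAMA + letter:
        return "geminate"
    if len(before) == 2 and before[0] in NASALS and before[1] == VIRAMA:
        return "post-nasal"
    if before.endswith(VIRAMA) or after.startswith(VIRAMA):
        return None
    return "intervocalic"


def realise(word, i):
    """Return the table's realisation for the plosive at index i, or None."""
    row = PLOSIVES.get(word[i])
    context = context_of(word, i)
    if row is None or context is None:
        return None
    return row[context]
```
</pre>
<p>Under these assumptions, <code>context_of("எங்கே", 3)</code> classifies the <span lang="ta">க</span> as post-nasal, and both halves of the doubled <span lang="ta">ப</span> in <span class="eg" lang="ta">தீர்ப்பு</span> are classified as geminate. A real implementation would need a fuller account of Tamil phonology than this lookup provides.</p> <p>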
Nasals and approximants are always voiced.</p> <p>Wikipedia provides the following useful table for the realisation of the plosive sounds in context.</p> <figure id="fig_allophone_table"> <table> <thead> <tr> <th> </th> <th>Letter</th> <th>Initial</th> <th>Geminate</th> <th>Intervocalic</th> <th>Post-nasal</th> </tr> </thead> <tbody> <tr> <th>Labial</th> <td><span class="charExample" translate="no"><span class="ex" lang="ta">ப</span></span></td> <td><span class="ipa">p</span></td> <td><span class="ipa">pː</span></td> <td><span class="ipa">β</span>~<span class="ipa">w</span></td> <td><span class="ipa">b</span></td> </tr> <tr> <th>Dental</th> <td><span class="charExample" translate="no"><span class="ex" lang="ta">த</span></span></td> <td><span class="ipa">t̪</span></td> <td><span class="ipa">t̪ː</span></td> <td><span class="ipa">ð</span></td> <td><span class="ipa">d̪</span></td> </tr> <tr> <th>Alveolar</th> <td><span class="charExample" translate="no"><span class="ex" lang="ta">ற</span></span></td> <td>—</td> <td><span class="ipa">tːr</span></td> <td><span class="ipa">r</span></td> <td><span class="ipa">(d)r</span></td> </tr> <tr> <th>Retroflex</th> <td><span class="charExample" translate="no"><span class="ex" lang="ta">ட</span></span></td> <td>—</td> <td><span class="ipa">ʈː</span></td> <td><span class="ipa">ɽ</span></td> <td><span class="ipa">ɖ</span></td> </tr> <tr> <th>Palatal</th> <td><span class="charExample" translate="no"><span class="ex" lang="ta">ச</span></span></td> <td><span class="ipa">tɕ~s</span></td> <td><span class="ipa">tːɕ</span></td> <td><span class="ipa">s</span></td> <td><span class="ipa">dʑ</span></td> </tr> <tr> <th>Velar</th> <td><span class="charExample" translate="no"><span class="ex" lang="ta">க</span></span></td> <td><span class="ipa">k</span></td> <td><span class="ipa">kː</span></td> <td><span class="ipa">x</span>~<span class="ipa">∅</span></td> <td><span class="ipa">ɡ</span></td> </tr> </tbody> </table> <figcaption>Allophonic variants 
for Tamil plosives.<tt>wp</tt></figcaption> </figure> <p>The consonants are classified into three categories: <dfn>vallinam</dfn> (hard consonants), <dfn>mellinam</dfn> (soft consonants, including all nasals), and <dfn>idayinam</dfn> (medium consonants). These categories are important for the rules of pronunciation.</p> <p class="btw">The mapping of consonants, in particular the plosives, to phonetic sounds is particularly varied for an indic script. These rules for the pronunciation of consonants for the written form of Tamil make for complementary distribution. However, the rules break down to varying degrees when dealing with Sanskrit loan words and the colloquial spoken form of Tamil (particularly in northern areas). For more read <a href="http://en.wikipedia.org/wiki/Tamil_phonology">Tamil phonology</a> and <a href="#ref_krishnamurthi">Krishnamurthi<sup>23-28</sup></a>.</p> </section> </section> <section id="extendedC"> <h3>Grantha consonants</h3> <p>Because the core set of Tamil consonants is quite a lot smaller than that of most indic scripts, Tamil adds additional letters from the <a href="../../uniview/?block=grantha">Grantha</a> script to cover sounds in Sanskrit and English, and complete the basic consonant set.</p> <figure class="characterBox auto" data-cols="ipa,trans,transc">ஜ␣ஸ␣ஶ␣ஷ␣ஹ␣க்ஷ</figure> <p>The last item in the list just above is actually a cluster of two consonants, but is viewed as a single letter of the alphabet.</p> <p><span class="codepoint" translate="no"><span lang="ta">ஶ</span> [<a href="/scripts/tamil/block#char0BB6"><span class="uname">U+0BB6 TAMIL LETTER SHA</span></a>]</span> is not commonly used, except in the <span class="trans">ʃɾī</span> ligature <span class="ex" xml:lang="ta" lang="ta">ஶ்ரீ</span>. 
See <a class="secref">shri</a>.</p> </section> <section id="aytham"> <h3>Repertoire extension using <span class="name">āytam</span></h3> <figure class="characterBox auto" data-cols="trans">ஃ</figure> <p>For compatibility with modern communication, Tamil presses into service <span class="codepoint"><span lang="ta">ஃ</span> <a href="/scripts/tamil/block#char0B83">[<span class="uname">U+0B83 TAMIL SIGN VISARGA</span>]</a></span> (called <span class="name">āytam</span>, and not actually a visarga) to produce fricative sounds from stops.</p> <figure class="otherBox auto" data-cols="ipa,trans,transc">ஃப␣ஃஜ␣ஃஸ␣ஃக</figure> <p>Examples: <span class="eg" lang="ta">ஃபீசு</span> <span class="eg" lang="ta">ஃஜிரொக்ஸ்</span> <span class="charExample" translate="no"><span class="ex" lang="ta">செங்கிஸ் ஃகான்</span> <span class="trans">ceŋ͓kis͓ ˑkɑ̄n͓</span> <span class="ipa">t͡ʃɛŋgɪs xɑːn</span> <span class="meaning">Genghis Khan</span></span></p> <p>Note that a vowel sign can occur between the visarga and the other consonant – ie. the two are not treated as an indivisible unit, eg. <span class="eg" lang="ta">ஃபோரியர்</span> </p> <p>(The Unicode name <span class="uname">VISARGA</span> was applied in error. At one point, the Unicode Standard also treated this as a combining character, but that has also since been rectified.)</p> </section> <section id="allophonic_transcriptions"> <h3>Other extension mechanisms</h3> <section id="superscript_extension"> <h4>Superscript numbers</h4> <p>The Unicode Standard describes a method of extension that uses superscript or subscript digits, particularly to represent missing letters in transcriptions of languages such as Sanskrit and Saurashtra. Each number represents the sound that is unvoiced, unvoiced-aspirated, voiced, or voiced-aspirated, respectively, eg. 
<span lang="ta" >ப¹</span> = <span class="trans">pa</span>, <span lang="ta" >ப²</span> = <span class="trans">pha</span>, <span lang="ta" >ப³</span> = <span class="trans">ba</span>, and <span lang="ta" >ப⁴</span> = <span class="trans">bha</span>.<tt>u</tt></p> <figure id="allophonic_transcription" class="sideCaption"> <div lang="ta" class="ex" style="font-size: 1.5rem; line-height: 1.4;" data-source="https://tipitaka.org/taml/cscd/e0802n.nrf1.xml">ட²பெத்வா அட்ட² ஸரே ஸேஸா அக்க²ரா ககாராத³யோ நிக்³க³ஹிதந்தா ப்³யஞ்ஜனா நாம ஹொந்தி.</div> <figcaption>Example of superscript numbers being used for allophone disambiguation.</figcaption> </figure> </section> <section id="grantha_extension"> <h4>Grantha script</h4> <p>The Grantha script is often also used by Tamil speakers to write Sanskrit because Grantha contains needed consonants, conjunct forms, and signs.<tt>u</tt></p> </section> <section id="minority_extension"> <h4>Minority languages</h4> <p class="btw">A number of minority languages use a <span class="name">nukta</span> symbol to identify sounds for that language. The shape of the <span class="name">nukta</span> can vary. It always appears below the character, and the shape is most commonly a single dot below the letter (as for Chetti), a small open circle (as in Betta Kurumba), or 2 dots side-by-side (as in Irula). The code to use for such <span class="name">nuktas</span> is <span class="codepoint" translate="no"><span lang="ta">𑌻</span> [<a href="/scripts/tamil/block#char1133B"><span class="uname">U+1133B COMBINING BINDU BELOW</span></a>]</span>.</p> </section> </section> <section id="finals"> <h3>Final consonants</h3> <p>Syllable-/word-final consonants in Tamil are just written using ordinary consonants with the <span class="name">puḷḷi</span> overhead, eg. 
<span class="eg" lang="ta">தமிழ்</span> </p> </section> <section id="clusters"> <h3>Consonant clusters</h3> <p id="def-consonantcluster" class="explanatoryintro definitionStub"></p> <p id="def-conjunct" class="explanatoryintro definitionStub"></p> <p>Rather than using conjunct glyphs like most other indic scripts, Tamil normally represents consonant clusters using the <span class="name">puḷḷi</span> dot over the character(s) that are not followed by a vowel, eg. <span class="eg" lang="ta">தீர்ப்பு</span> </p> <p>There are two common exceptions in modern Tamil orthography, which are conjunct forms: <span class="charExample" translate="no"><span class="ex" lang="ta">க்ஷ</span> <span class="trans">k͓ʂ</span> <span class="ipa">kʃ</span></span><span class="charExample" translate="no"><span class="ex" lang="ta">ஶ்ரீ</span> <span class="trans">ʃɾī</span> <span class="ipa">ʃri</span></span></p> <p>The <span class="ipa">ʃri</span> combination only occurs with the vowel <span class="ipa">i</span>, but can be composed of two different sequences of consonants.</p> <section id="shri"> <h4>Representation of shrī</h4> <p>The syllable <span class="charExampleInline"><span class="ipa">ʃri</span></span> can be written with two different initial letters: <span class="codepoint" translate="no"><span lang="ta">ஶ</span> [<a href="/scripts/tamil/block#char0BB6"><span class="uname">U+0BB6 TAMIL LETTER SHA</span></a>]</span> (ie. <span class="charExampleInline" translate="no"><span class="ex" lang="ta">ஶ்ரீ</span> <span class="trans">ʃɾī</span></span>) or <span class="codepoint" translate="no"><span lang="ta">ஸ</span> [<a href="/scripts/tamil/block#char0BB8"><span class="uname">U+0BB8 TAMIL LETTER SA</span></a>]</span> (ie. <span class="charExampleInline" translate="no"><span class="ex" lang="ta">ஸ்ரீ</span> <span class="trans">s͓ɾī</span></span>). The result looks identical. 
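</p> <p>Because the two spellings look the same but are stored as different code point sequences, a plain string comparison or search will not match one against the other. The following Python sketch shows one way an application might fold the two spellings together before comparing text. It assumes that the sequence beginning with <span class="uname">U+0BB6 TAMIL LETTER SHA</span> is the preferred target, and the names <code>fold_shri</code> and <code>same_ignoring_shri_spelling</code> are hypothetical helpers rather than part of any library.</p>
<pre>
```python
SHA = "\u0BB6"       # TAMIL LETTER SHA
SA = "\u0BB8"        # TAMIL LETTER SA
RA = "\u0BB0"        # TAMIL LETTER RA
VIRAMA = "\u0BCD"    # TAMIL SIGN VIRAMA (puḷḷi)
VOWEL_II = "\u0BC0"  # TAMIL VOWEL SIGN II

SHRI_WITH_SHA = SHA + VIRAMA + RA + VOWEL_II
SHRI_WITH_SA = SA + VIRAMA + RA + VOWEL_II


def fold_shri(text):
    """Rewrite the SA-based spelling of shrī as the SHA-based spelling.

    Assumption: the SHA-based sequence is the one to keep. All other
    text, including other uses of SA, is left untouched.
    """
    return text.replace(SHRI_WITH_SA, SHRI_WITH_SHA)


def same_ignoring_shri_spelling(a, b):
    """Compare two strings, treating both spellings of shrī as equal."""
    return fold_shri(a) == fold_shri(b)
```
</pre>
<p>The two sequences are not canonically equivalent, so Unicode normalization alone does not unify them; any folding of this kind has to be applied explicitly, and only for matching, not as a silent rewrite of stored text.</p> <p>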
Since 2005, the Unicode Consortium has recommended use of the former, but both are still in wide circulation, so Unicode 12 recommends that both be treated as equivalent sequences.<tt>u</tt></p> </section> </section> <section id="consonant_mappings"> <h3>Consonant sounds to characters</h3> <p class="instructions">This section maps Tamil consonant sounds to common graphemes in the Tamil orthography. Click on a grapheme to find other mentions on this page (links appear at the bottom of the page). Click on the character name to see examples and for detailed descriptions of the character(s) shown.</p> <section id="stops"> <h4>Stops</h4> <div id="map_plosives" class="map"> <div class="mapItem"> <div class="phone"><span class="ipa">p</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ப</span> [<a href="/scripts/tamil/block#char0BAA"><span class="uname">U+0BAA TAMIL LETTER PA</span></a>]</span> when initial, or geminated.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">b</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ப</span> [<a href="/scripts/tamil/block#char0BAA"><span class="uname">U+0BAA TAMIL LETTER PA</span></a>]</span> when between vowels, or after a nasal.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">t̪</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">த</span> [<a href="/scripts/tamil/block#char0BA4"><span class="uname">U+0BA4 TAMIL LETTER TA</span></a>]</span> when initial, or geminated, or after a stop.</p> <p><span class="codepoint" translate="no"><span lang="ta">ற</span> [<a href="/scripts/tamil/block#char0BB1"><span class="uname">U+0BB1 TAMIL LETTER RRA</span></a>]</span> inserted when this letter is geminated after a nasal.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">d̪</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">த</span> [<a 
href="/scripts/tamil/block#char0BA4"><span class="uname">U+0BA4 TAMIL LETTER TA</span></a>]</span> occurs after a nasal.</p> <p><span class="codepoint" translate="no"><span lang="ta">ற</span> [<a href="/scripts/tamil/block#char0BB1"><span class="uname">U+0BB1 TAMIL LETTER RRA</span></a>]</span> inserted when this letter follows a nasal.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ʈ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ட</span> [<a href="/scripts/tamil/block#char0B9F"><span class="uname">U+0B9F TAMIL LETTER TTA</span></a>]</span> when geminated.</p> <p>Not used initially.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ɖ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ட</span> [<a href="/scripts/tamil/block#char0B9F"><span class="uname">U+0B9F TAMIL LETTER TTA</span></a>]</span> after a nasal, or between vowels.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">k</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">க</span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a>]</span> when initial, geminated, or in a cluster.</p> <p>Also, the first part of <span class="codepoint" translate="no"><span lang="ta">க்ஷ</span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a> + <a href="/scripts/tamil/block#char0BCD"><span class="uname">U+0BCD TAMIL SIGN VIRAMA</span></a> + <a href="/scripts/tamil/block#char0BB7"><span class="uname">U+0BB7 TAMIL LETTER SSA</span></a>]</span>. 
<span class="ipa">kʂ</span>.</p> <p>And part of <span class="codepoint" translate="no"><span lang="ta">ஃஸ</span> [<a href="/scripts/tamil/block#char0B83"><span class="uname">U+0B83 TAMIL SIGN VISARGA</span></a> + <a href="/scripts/tamil/block#char0BB8"><span class="uname">U+0BB8 TAMIL LETTER SA</span></a>]</span> <span class="ipa">ks</span> for foreign words.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">g</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">க</span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a>]</span> when between vowels, or after a nasal.</p> <p> Not found word-initially.</p> </div> </div> </div> </section> <section id="affricates"> <h4>Affricates</h4> <div id="map_affricates" class="map"> <div class="mapItem"> <div class="phone"><span class="ipa">t͡ʃ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ச</span> [<a href="/scripts/tamil/block#char0B9A"><span class="uname">U+0B9A TAMIL LETTER CA</span></a>]</span> when initial, geminated, or following a stop consonant.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">d͡ʒ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ச</span> [<a href="/scripts/tamil/block#char0B9A"><span class="uname">U+0B9A TAMIL LETTER CA</span></a>]</span> when following a nasal.</p> <p><span class="codepoint" translate="no"><span lang="ta">ஜ</span> [<a href="/scripts/tamil/block#char0B9C"><span class="uname">U+0B9C TAMIL LETTER JA</span></a>]</span> (grantha consonant).</p> </div> </div> </div> </section> <section id="fricatives"> <h4>Fricatives</h4> <div id="map_fricatives" class="map"> <div class="mapItem"> <div class="phone"><span class="ipa">f</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஃப</span> [<a href="/scripts/tamil/block#char0B83"><span class="uname">U+0B83 TAMIL SIGN VISARGA</span></a> + <a 
href="/scripts/tamil/block#char0BAA"><span class="uname">U+0BAA TAMIL LETTER PA</span></a>]</span> for foreign words.</p> <p><span class="codepoint" translate="no"><span lang="ta">ப</span> [<a href="/scripts/tamil/block#char0BAA"><span class="uname">U+0BAA TAMIL LETTER PA</span></a>]</span> for some foreign words. </p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">β</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ப</span> [<a href="/scripts/tamil/block#char0BAA"><span class="uname">U+0BAA TAMIL LETTER PA</span></a>]</span> between vowels.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ð</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">த</span> [<a href="/scripts/tamil/block#char0BA4"><span class="uname">U+0BA4 TAMIL LETTER TA</span></a>]</span> between vowels.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">s</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ச</span> [<a href="/scripts/tamil/block#char0B9A"><span class="uname">U+0B9A TAMIL LETTER CA</span></a>]</span> between vowels, or at the beginning of some words.</p> <p><span class="codepoint" translate="no"><span lang="ta">ஸ</span> [<a href="/scripts/tamil/block#char0BB8"><span class="uname">U+0BB8 TAMIL LETTER SA</span></a>]</span> (grantha consonant).</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">z</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஜ</span> [<a href="/scripts/tamil/block#char0B9C"><span class="uname">U+0B9C TAMIL LETTER JA</span></a>]</span> for borrowed words.</p> <p><span class="codepoint" translate="no"><span lang="ta">ஃஜ</span> [<a href="/scripts/tamil/block#char0B83"><span class="uname">U+0B83 TAMIL SIGN VISARGA</span></a> + <a href="/scripts/tamil/block#char0B9C"><span class="uname">U+0B9C TAMIL LETTER JA</span></a>]</span> for foreign words.</p> </div> </div> 
<div class="mapItem"> <div class="phone"><span class="ipa">ʃ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஶ</span> [<a href="/scripts/tamil/block#char0BB6"><span class="uname">U+0BB6 TAMIL LETTER SHA</span></a>]</span> (grantha consonant).</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ʂ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஷ</span> [<a href="/scripts/tamil/block#char0BB7"><span class="uname">U+0BB7 TAMIL LETTER SSA</span></a>]</span> (grantha consonant).</p> <p>Also, as the second part of <span class="codepoint" translate="no"><span lang="ta">க்ஷ</span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a> + <a href="/scripts/tamil/block#char0BCD"><span class="uname">U+0BCD TAMIL SIGN VIRAMA</span></a> + <a href="/scripts/tamil/block#char0BB7"><span class="uname">U+0BB7 TAMIL LETTER SSA</span></a>]</span>.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">x</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">க</span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a>]</span> between vowels.</p> <p><span class="codepoint" translate="no"><span lang="ta">ஃக</span> [<a href="/scripts/tamil/block#char0B83"><span class="uname">U+0B83 TAMIL SIGN VISARGA</span></a> + <a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a>]</span> for foreign words.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ɣ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">க</span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a>]</span> between vowels.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">h</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஹ</span> [<a 
href="/scripts/tamil/block#char0BB9"><span class="uname">U+0BB9 TAMIL LETTER HA</span></a>]</span> (grantha consonant).</p> <p><span class="codepoint" translate="no"><span lang="ta">க</span> [<a href="/scripts/tamil/block#char0B95"><span class="uname">U+0B95 TAMIL LETTER KA</span></a>]</span> between vowels only in colloquial speech.</p> </div> </div> </div> </section> <section id="nasals"> <h4>Nasals</h4> <div id="map_nasals" class="map"> <div class="mapItem"> <div class="phone"><span class="ipa">m</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ம</span> [<a href="/scripts/tamil/block#char0BAE"><span class="uname">U+0BAE TAMIL LETTER MA</span></a>]</span> when initial, geminated, in a cluster, or finally.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">n̪</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ந</span> [<a href="/scripts/tamil/block#char0BA8"><span class="uname">U+0BA8 TAMIL LETTER NA</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">n</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ன</span> [<a href="/scripts/tamil/block#char0BA9"><span class="uname">U+0BA9 TAMIL LETTER NNNA</span></a>]</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ɳ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ண</span> [<a href="/scripts/tamil/block#char0BA3"><span class="uname">U+0BA3 TAMIL LETTER NNA</span></a>]</span> when in a cluster, or geminated.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ɲ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ஞ</span> [<a href="/scripts/tamil/block#char0B9E"><span class="uname">U+0B9E TAMIL LETTER NYA</span></a>]</span> when initial, geminated, or in a cluster.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ŋ</span></div> <div> 
<p><span class="codepoint" translate="no"><span lang="ta">ங</span> [<a href="/scripts/tamil/block#char0B99"><span class="uname">U+0B99 TAMIL LETTER NGA</span></a>]</span> when in a cluster, or geminated.</p> </div> </div> </div> </section> <section id="sonorants"> <h4>Other</h4> <div id="map_approximants" class="map"> <div class="mapItem"> <div class="phone"><span class="ipa">ʋ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">வ</span> [<a href="/scripts/tamil/block#char0BB5"><span class="uname">U+0BB5 TAMIL LETTER VA</span></a>]</span> initially, or when geminated. </p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">r</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ற</span> [<a href="/scripts/tamil/block#char0BB1"><span class="uname">U+0BB1 TAMIL LETTER RRA</span></a>]</span> <span class="ednote">check all this out பற்றினார், பன்றி pət̺t̺ʳɪnɑːr , pənd̺ʳɪˑ</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ɾ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ர</span> [<a href="/scripts/tamil/block#char0BB0"><span class="uname">U+0BB0 TAMIL LETTER RA</span></a>]</span><span class="ednote">check out வந்தார்/ʋṅ͓tāɾ͓/ʋən̪d̪ɑːr/came</span> <span class="ednote">aslo ?</span></p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ɻ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ழ</span> [<a href="/scripts/tamil/block#char0BB4"><span class="uname">U+0BB4 TAMIL LETTER LLLA</span></a>]</span> between vowels, or when geminated.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ɽ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ழ</span> [<a href="/scripts/tamil/block#char0BB4"><span class="uname">U+0BB4 TAMIL LETTER LLLA</span></a>]</span> between vowels.</p> </div> </div> <div class="mapItem"> <div class="phone"><span 
class="ipa">l</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ல</span> [<a href="/scripts/tamil/block#char0BB2"><span class="uname">U+0BB2 TAMIL LETTER LA</span></a>]</span> when between vowels, geminated, or final. </p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">ɭ</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ள</span> [<a href="/scripts/tamil/block#char0BB3"><span class="uname">U+0BB3 TAMIL LETTER LLA</span></a>]</span> when between vowels, geminated, or final.</p> <p>Doesn't occur in word-initial position.</p> </div> </div> <div class="mapItem"> <div class="phone"><span class="ipa">j</span></div> <div> <p><span class="codepoint" translate="no"><span lang="ta">ய</span> [<a href="/scripts/tamil/block#char0BAF"><span class="uname">U+0BAF TAMIL LETTER YA</span></a>]</span> when initial, or geminated. </p> </div> </div> </div> </section> <p style="font-size: 60%;text-align: end; line-height: 1;">Sources: <a href="https://en.wikipedia.org/wiki/Help:IPA/Tamil">Wikipedia</a>, <a href="http://anunaadam.appspot.com/">Anunaadam</a>, and Google Translate.</p> </section> </section> <section id="encoding"> <h2>Encoding choices</h2> <p>Tamil is a script where different sequences of Unicode characters may produce the same visual result. 
Here we look at those related to vowels.</p> <section id="encoding_ce"> <h3>Canonical equivalence</h3> <p>Three of the circumgraphs can be written as a single character, or as two characters.</p> <table class="comparison"> <thead> <tr> <th scope="col">Precomposed</th> <th scope="col">Decomposed</th> </tr> </thead> <tbody> <tr> <td><span class="codepoint" translate="no"><span lang="ta">ொ</span> [<a href="/scripts/tamil/block#char0BCA"><span class="uname">U+0BCA TAMIL VOWEL SIGN O</span></a>]</span></td> <td><span class="codepoint" translate="no"><span lang="ta">ொ</span> [<a href="/scripts/tamil/block#char0BC6"><span class="uname">U+0BC6 TAMIL VOWEL SIGN E</span></a> + <a href="/scripts/tamil/block#char0BBE"><span class="uname">U+0BBE TAMIL VOWEL SIGN AA</span></a>]</span></td> </tr> <tr> <td><span class="codepoint" translate="no"><span lang="ta">ோ</span> [<a href="/scripts/tamil/block#char0BCB"><span class="uname">U+0BCB TAMIL VOWEL SIGN OO</span></a>]</span></td> <td><span class="codepoint" translate="no"><span lang="ta">ோ</span> [<a href="/scripts/tamil/block#char0BC7"><span class="uname">U+0BC7 TAMIL VOWEL SIGN EE</span></a> + <a href="/scripts/tamil/block#char0BBE"><span class="uname">U+0BBE TAMIL VOWEL SIGN AA</span></a>]</span></td> </tr> <tr> <td><span class="codepoint" translate="no"><span lang="ta">ௌ</span> [<a href="/scripts/tamil/block#char0BCC"><span class="uname">U+0BCC TAMIL VOWEL SIGN AU</span></a>]</span></td> <td><span class="codepoint" translate="no"><span lang="ta">ௌ</span> [<a href="/scripts/tamil/block#char0BC6"><span class="uname">U+0BC6 TAMIL VOWEL SIGN E</span></a> + <a href="/scripts/tamil/block#char0BD7"><span class="uname">U+0BD7 TAMIL AU LENGTH MARK</span></a>]</span></td> </tr> </tbody> </table> <p class="info">The single code point per vowel sign is preferred, however the parts are separated in <span class="termref" href="../glossary/#nfd">Unicode Normalisation Form D</span> (NFD), and recomposed in <span class="termref" 
href="../glossary/#nfc">Unicode Normalisation Form C</span> (NFC), so both approaches are canonically equivalent.</p> <p class="info"><span class="codepoint" translate="no"><span lang="ta">ௗ</span> [<a href="/scripts/tamil/block#char0BD7"><span class="uname">U+0BD7 TAMIL AU LENGTH MARK</span></a>]</span> is never used alone.</p> <p>Whichever approach is used, the vowel signs must be typed and stored after the consonant characters they surround, and in left to right order.</p> <p>One of the independent vowels can also be written as a single character, or as two.</p> <table class="comparison"> <thead> <tr> <th scope="col">Precomposed</th> <th scope="col">Decomposed</th> </tr> </thead> <tbody> <tr> <td><span class="codepoint" translate="no"><bdi lang="ta">ஔ</bdi> [<a href="/scripts/tamil/block#char0B94"><span class="uname">U+0B94 TAMIL LETTER AU</span></a>]</span> </td> <td><span class="codepoint" translate="no"><bdi lang="ta">ஔ</bdi> [<a href="/scripts/tamil/block#char0B92"><span class="uname">U+0B92 TAMIL LETTER O</span></a> + <a href="/scripts/tamil/block#char0BD7"><span class="uname">U+0BD7 TAMIL AU LENGTH MARK</span></a>]</span> </td> </tr> </tbody> </table> <p class="info">The precomposed character decomposes in NFD, and re-forms again in NFC. It is generally recommended to use the precomposed character.</p> </section> <!--section id="deprecated"> <h3>Deprecated characters</h3> <p>The followin</p> </section--> <section id="charorder"> <h3>Code point sequences</h3> <p>The following indicates the expected ordering of Unicode characters within a Tamil combining character sequence. The labels are those used for the Unicode <a target="_blank" href="../apps/listindic/index.html?chars=பதசடகமநனணஞஙவரறழலளயஇஈஉஊஎஏஒஓஅஆஐஔிீுூெேொோாைௌ்ௗ“”‘’!().:;?௹₹ஜஸஶஷஹஃௐ।॥">Indic Syllabic Categories</a>. 
Follow the links to see what characters are represented by a given label.</p> <ol> <li><a href="../apps/listindic/index.html?chars=பதசடகமநனணஞஙவரறழலளயஜஸஶஷஹ" target="_blank">Consonant</a> (23) <bdi class="grammar">|</bdi> <a href="../apps/listindic/index.html?chars=இஈஉஊஎஏஒஓஅஆஐஔ" target="_blank">Vowel_Independent</a> (12)</li> <li><a href="../apps/listindic/index.html?chars=்" target="_blank">Virama</a> <bdi class="grammar">|</bdi> <a href="../apps/listindic/index.html?chars=ிீுூெேொோாைௌௗ" target="_blank">Vowel_Dependent</a> (12)</li> </ol> <p class="info">When a base consonant is followed by 2 vowel signs (ie. in decomposed text with circumgraphs) the code point for the glyph that appears to the left of the base when rendered should be the first after the base code point. Otherwise, the sequence will not recompose.</p> </section> </section> <section id="namedchar"> <h2>Named character sequences</h2> <p>Tamil speakers tend to think of grapheme clusters containing consonant+vowel as a single entity. In some cases, people want to process Tamil using these grapheme clusters as a single unit.</p> <p>To assist with this Unicode provides named character sequences that apply standardised names to whole syllables. These can then be mapped to the private use area for applications wanting to work with Tamil in this way. For normal Tamil data interchange, however, the standard codepoints should be used.<tt>u</tt></p> </section> <section id="symbols"> <h2>Symbol</h2> <p>OM is a religious concept found in all three major religions born in India viz. Hinduism, Jainism and Buddhism. 
<span class="codepoint"><span lang="ta">ௐ</span> [<a href="/scripts/tamil/block#char0BD0"><span class="uname">U+0BD0 TAMIL OM</span></a>]</span> is widely used in Hindu religious texts, in temple publications, and on neon shop signboards.</p> <figure class="auxiliaryBox auto" data-cols="" data-notes="om">ௐ</figure> </section> <section id="otherletter"> <h2>Tamil supplement</h2> <p>Unicode version 12 added the <a target="_blank" href="https://r12a.github.io/uniview/?block=Tamil_Supplement">Tamil Supplement block</a>. This contains numbers, symbols, and one punctuation mark that are not normally used in modern Tamil, although a few are sometimes used in traditional formats, such as wedding invitations.<tt>u</tt></p> <p>The number characters are for fractions, and the symbols include measures of grain, old currency symbols, symbols of weight, length, and area, agricultural symbols, clerical symbols, and other symbols and abbreviations. The punctuation mark indicates the end of a text.</p> <p>For more information see Sharma<tt>ss</tt>.</p> </section> <section id="numbers"> <h2>Numbers, dates, currency, etc</h2> <p>There is a set of Tamil digits, but modern Tamil text uses Western digits. </p> <p>The CLDR standard-decimal pattern is <code>#,##,##0.###</code>. 
The standard-percent pattern is <code>#,##,##0%</code>.<tt>c</tt></p> <p>An interesting feature of large numbers written in India is that, above the lowest three digits, commas separate groups of two digits rather than three (even when using European digits).</p> <figure class="sideCaption"> <div class="large"><span class="ex" lang="ta" style="color:#333;">20,00,000</span></div> <figcaption>Two million, written with Indian comma separators.</figcaption> </figure> <section id="archaic_digits"> <h3>Archaic digits & symbols</h3> <div class="btw"> <p>The Tamil digits can be used as a standard decimal counting system, but older versions of the Tamil system had no zero and inserted characters to indicate tens, hundreds, and thousands.</p> <figure class="archaicBox auto" data-cols="trans,transc">௦␣௧␣௨␣௩␣௪␣௫␣௬␣௭␣௮␣௯␣௰␣௱␣௲</figure> <p>For a description of the algorithm, see <a href="http://www.w3.org/TR/predefined-counter-styles/#tamil-styles">Predefined Counter Styles</a> and <a href="http://unicode.org/notes/tn21/tamil_numbers.pdf">Unicode Technical Note #21</a>. You can experiment with this using the <a href="/apps/counterconverter/">Counter styles converter</a> tool (select Tamil, Ancient).</p> <figure class="sideCaption"> <div lang="ta" class="ex large" style="color:#333;">௲௨௱௩௰௪</div> <figcaption>The number 1,234 using the old Tamil numbering system.</figcaption> </figure> <p>The following signs were formerly used with numbers.</p> <figure class="archaicBox auto" data-cols="" data-notes="number,debit,credit,quantity,as above">௺␣௶␣௷␣௳␣௸</figure> </div> </section> <section id="currency"> <h3>Currency</h3> <figure class="characterBox auto small" data-cols="">₹␣௹</figure> <p>The CLDR standard format for currency is <code>¤ #,##,##0.00</code>, and the Indian currency symbol is <span class="codepoint" translate="no"><span lang="ta">₹</span> [<a href="/scripts/tamil/block#char20B9"><span class="uname">U+20B9 INDIAN RUPEE SIGN</span></a>]</span><tt>c</tt>. 
The latter sign was introduced by the Indian government in 2010.</p> <p><span class="codepoint"><span lang="ta">௹</span> [<a href="/scripts/tamil/block#char0BF9"><span class="uname">U+0BF9 TAMIL RUPEE SIGN</span></a>]</span> is the Tamil rupee sign.</p> <figure class="sideCaption"> <div lang="ta" class="ex large" style="color:#333;">௹. 6,000</div> <figcaption>The Tamil rupee sign used to indicate a sum of 6,000 rupees.</figcaption> </figure> <p class="btw">The Indian rupee sign is distinct from <span class="codepoint" translate="no"><span lang="ta">₨</span> [<a href="/scripts/currencysymbols/block#char20A8"><span class="uname">U+20A8 RUPEE SIGN</span></a>]</span>, which is an older symbol not formally tied to any particular currency.<tt>u</tt> Follow that link for more information about the rupee.</p> </section> <section id="dates"> <h3>Dates</h3> <div class="btw"> <p>The following signs were formerly used for dates.</p> <figure class="archaicBox auto" data-cols="" data-notes="day,month,year,time">௳␣௴␣௵␣ள</figure> </div> </section> </section> <section id="direction"> <h2>Text direction</h2> <p>The Tamil script is written horizontally, left to right.</p> <p><a id="showBidiClass" target="_blank" href="tbd">Show default <code class="kw" translate="no">bidi_class</code> properties for characters used by the modern Tamil orthography.</a></p> </section> <section id="shaping"> <h2>Glyph shaping & positioning</h2> <p class="instructions">This section brings together information about the following topics: <span id="writingstylesInline">writing styles</span>; <span id="cursiveInline">cursive text</span>; <span id="gsubInline">context-based shaping</span>; <span id="gposInline">context-based positioning</span>; <span id="baselinesInline">baselines, line height, etc.</span>; <span id="fontstyleInline">font styles</span>; <span id="transformsInline">case & other character transforms</span>.</p> <p class="instructions">You can experiment with examples using the <a 
href="/pickers/tamil/">Tamil character app</a>.</p> <p>Tamil printed text is not cursive, and has no special requirements for baseline alignment between mixed scripts or in general.</p> <p>The orthography has no case distinction, and no special transforms are needed to convert between characters.</p> <section id="writing_styles"> <h3>Font styles</h3> <p>In 1978, in an attempt to simplify the script, the government of Tamil Nadu proposed the reform of certain letters and syllables. See <a class="figref">fig_1978_reform</a> for a list of changes that were adopted. In all cases this is just a font change, rather than a change to the underlying code points.</p> <figure id="fig_1978_reform" class="sideCaption"> <img alt="னா,றா,ணா,ணை,லை,னை,ளை,ணொ,றொ,னொ,ணோ,றோ,னோ" class="ex" lang="ta" src="images/1978_reforms.png" style="max-width:20rem;"> <figcaption>Proposed reforms of 1978<tt>m</tt>.</figcaption></figure> <p>The fonts used in this page use the new forms.</p> <p>Two more proposed changes, which <em>would</em> have changed the code points used, have not been widely adopted: <span class="codepoint" translate="no"><bdi lang="ta">அய்</bdi> [<a href="/scripts/tamil/block#char0B85"><span class="uname">U+0B85 TAMIL LETTER A</span></a> + <a href="/scripts/tamil/block#char0BAF"><span class="uname">U+0BAF TAMIL LETTER YA</span></a> + <a href="/scripts/tamil/block#char0BCD"><span class="uname">U+0BCD TAMIL SIGN VIRAMA</span></a>]</span> for <span class="codepoint" translate="no"><bdi lang="ta">ஐ</bdi> [<a href="/scripts/tamil/block#char0B90"><span class="uname">U+0B90 TAMIL LETTER AI</span></a>]</span>; and <span class="codepoint" translate="no"><bdi lang="ta">அவ்</bdi> [<a href="/scripts/tamil/block#char0B85"><span class="uname">U+0B85 TAMIL LETTER A</span></a> + <a href="/scripts/tamil/block#char0BB5"><span class="uname">U+0BB5 TAMIL LETTER VA</span></a> + <a href="/scripts/tamil/block#char0BCD"><span class="uname">U+0BCD TAMIL SIGN VIRAMA</span></a>]</span> for <span 
class="codepoint" translate="no"><bdi lang="ta">ஔ</bdi> [<a href="/scripts/tamil/block#char0B94"><span class="uname">U+0B94 TAMIL LETTER AU</span></a>]</span>.</p> <!--p>These reforms only spread in India and the digital world, whereas Sri Lanka, Singapore, Malaysia, Mauritius, Reunion and other Tamil speaking regions continue to use the traditional syllables.<tt>wss</tt></p--> </section> <section id="context"> <h3>Context-based shaping & positioning</h3> <section id="vowelligation"> <h4>Vowel ligatures</h4> <p>Vowel signs for <span class="ipa">u</span> and <span class="ipa">uː</span>, and to some extent <span class="ipa">i</span> and <span class="ipa">iː</span>, produce significantly different, ligated shapes as they combine with the base consonant. The figure below shows the various alternative shapes produced by <span class="codepoint" translate="no"><span lang="ta">ு</span> [<a href="/scripts/tamil/block#char0BC1"><span class="uname">U+0BC1 TAMIL VOWEL SIGN U</span></a>]</span> when combined with different base characters.</p> <style> #consvowellig td { font-size: 2rem; line-height: 1.4; text-align: center; } </style> <figure id="consvowellig" class="sideCaption"> <table style="margin: auto;"> <tbody> <tr> <th>Base consonant</th> <th>Combination</th> </tr> <tr> <td><span class="ex" lang="ta">க</span></td> <td><span class="ex" lang="ta">கு</span></td> </tr> <tr> <td><span class="ex" lang="ta">ச</span></td> <td><span class="ex" lang="ta">சு</span></td> </tr> <tr> <td><span class="ex" lang="ta">ஞ</span></td> <td><span class="ex" lang="ta">ஞு</span></td> </tr> <tr> <td><span class="ex" lang="ta">ட</span></td> <td><span class="ex" lang="ta">டு</span></td> </tr> <tr> <td><span class="ex" lang="ta">ஜ</span></td> <td><span class="ex" lang="ta">ஜு</span></td> </tr> </tbody> </table> <figcaption>Ligatures with <span class="codepoint" translate="no"><span lang="ta">ு</span> [<a href="/scripts/tamil/block#char0BC1"><span class="uname">U+0BC1 TAMIL VOWEL SIGN 
U</span></a>]</span>.</figcaption> </figure> <p>Besides these significant transformations, special shaping is used to ensure a clean join between the consonant and vowel, eg. <span class="charExample" translate="no"><span class="ex" lang="ta">லி</span> <span class="transc">li</span></span></p> <p>In the following sequence the vowel sign is stretched slightly to fit the shape of the consonant.<span class="charExample" translate="no"><span class="ex" lang="ta">ஷி</span> <span class="transc">ʂi</span></span></p> <p><a href="../apps/vowel_signs/index.html?preset=taml" target="_blank">Show a table of all consonant+vowel combinations.</a></p> </section> <section id="shaping_ra"> <h4>Shaping RA</h4> <p>In certain contexts or fonts, <span class="codepoint" translate="no"><span lang="ta">ர</span> [<a href="/scripts/tamil/block#char0BB0"><span class="uname">U+0BB0 TAMIL LETTER RA</span></a>]</span> may look identical to <span class="codepoint" translate="no"><span lang="ta">ா</span> [<a href="/scripts/tamil/block#char0BBE"><span class="uname">U+0BBE TAMIL VOWEL SIGN AA</span></a>]</span>, or it may have a short tail in others. These letters looked the same in old manuscripts, especially palm leaves, and in early printed materials. The stroke was introduced by Father Beschi to differentiate the two, but only if it didn't have a vowel sign or pulli attached, so ரா, ரெ, ரோ, etc. carried a stroke, but not ர், ரி and ரீ. This approach is still followed, particularly in India, but in Malaysia and Singapore, there is a government regulation requiring the use of the form with a bottom stroke in all contexts. People are comfortable with both forms and will hardly notice the difference.<tt>m</tt> </p> <p><a class="figref">fig_ra_variants</a> shows a text that distinguishes between the two variant glyphs. Compare the items circled in red. 
The orange circle indicates a vowel sign that would be ambiguous if one was not expecting a tail for the consonant.</p> <figure id="fig_ra_variants" class="sideCaption"> <img alt="Newspaper clipping" src="images/ra_variants.jpeg" style="width: 20em;" data-source="https://github.com/w3c/iip/issues/58"/> <figcaption>Variant forms of <span class="codepoint" translate="no"><span lang="ta">ர</span> [<a href="/scripts/tamil/block#char0BB0"><span class="uname">U+0BB0 TAMIL LETTER RA</span></a>]</span>.</figcaption></figure> </section> <section id="gpos"> <h4>Context-based positioning</h4> <p class="observation"><i>Observation:</i> Tamil consonants tend to all be the same height, and so the vertical positioning of the pulli tends to be the same. Otherwise, apart from the vowel signs and pulli, Tamil doesn't really have combining characters.</p> <p class="observation"><i>Observation:</i> The only time Tamil has multiple combining marks attached to the same base character is when decomposed multi-part vowel signs are used, see <a class="secref">vowelsigns</a>.</p> </section> </section> <section id="fontstyle"> <h3>Font styling & weight</h3> <p>Italics and bold are not traditional features of Tamil text.<tt>i,#h_segmentation</tt></p> <p>Some fonts have upright glyphs, whereas others have slightly slanted glyphs.</p> <p class="observation"><i>Observation:</i> <a href="images/oblique.jpeg" target="_blank">Panels</a> of text in a Tamil newspaper that uses fonts that are more slanted than normal. Could this be an italic font face? Note that all the body text of the panel uses that font. There appear to be no instances where italic-looking fonts are applied to inline text. Other fonts used for the body text in other articles tended to also have a slight lean, though not as much. 
The verticals in headings tend to be upright.</p> </section> </section> <section id="graphemes"> <h2>Graphemes</h2> <p id="def-grapheme" class="explanatoryintro definitionStub"></p> <p id="def-orthographicsyllable" class="explanatoryintro definitionStub"></p> <p>Grapheme clusters can be used most of the time to segment Tamil words. However, in modern Tamil, 3 character sequences form conjuncts which should not be broken during edit operations such as letter-spacing, first-letter highlighting, and in-word line breaking. For those operations one needs to segment the text using <em>orthographic syllables</em>, which string grapheme clusters together with <span class="codepoint" translate="no"><bdi lang="ta">்</bdi> [<a href="/scripts/tamil/block#char0BCD"><span class="uname">U+0BCD TAMIL SIGN VIRAMA</span></a>]</span>, which has an Indic Syllabic Category of <code class="kw" translate="no">Virama</code>.</p> <p>Since there is only one Tamil virama, modern Tamil needs to interpret the virama (pulli) in two different ways for segmentation: (1) as a simple vowel-killer, and (2) as a conjunct initiator. </p> <p>Tamil also segments text in a third way for <a href="#justification">justification</a>.</p> <section id="graphemeC"> <h3>Grapheme clusters</h3> <p><code>Base Combining_mark*</code></p> <p>Combining marks may include one of the following types of character.</p> <ol> <li><a href="../apps/listindic/index.html?chars=ிீுூெேொோாைௌௗ">Dependent vowel</a>s (see <a class="secref">combiningvowels</a>)</li> <li><a href="../apps/listindic/index.html?chars=்">Virama</a> (<span class="name">pulli</span>) (see <a class="secref">clusters</a> and <a class="secref">novowel</a>)</li> </ol> <p>Combining marks may occur after a consonant base. 
There is usually only one vowel sign component per base consonant; in decomposed text, however, circumgraphs are represented by 2 combining characters.</p> <p>The following examples show a variety of grapheme clusters:</p> <p class="instructions">Click on the text version of these words to see more detail about the composition.</p> <table class="noBorderTable alignStartTable" style="margin-inline-start: 10%;"> <tr><td><img src="images/g_road_street.svg" alt="" style="height:3rem;"></td><td><span class="eg" lang="ta">சாலை</span></td></tr> <tr><td><img src="images/g_cut.svg" alt="" style="height:3rem;"></td><td><span class="eg" lang="ta">அரி</span></td></tr> <tr><td><img src="images/g_there.svg" alt="" style="height:3rem;"></td><td><span class="eg" lang="ta">அங்கே</span></td></tr> <tr><td><img src="images/g_fineness.svg" alt="" style="height:3.3rem;"></td><td><span class="eg" lang="ta">அஞ்ஞானம்</span></td></tr> <tr><td><img src="images/g_stalk.svg" alt="" style="height:3.3rem;"></td><td><span class="eg" lang="ta">உலர்ந்தம்</span></td></tr> </table> <p>In Tamil the <span class="name">aytham</span>, <span class="codepoint" translate="no"><bdi lang="ta">ஃ</bdi> [<a href="/scripts/tamil/block#char0B83"><span class="uname">U+0B83 TAMIL SIGN VISARGA</span></a>]</span>, has the general category of <code class="kw" translate="no">Other_Letter</code> and Indic syllabic category of <code class="kw" translate="no">Modifying_Letter</code>. 
It is its own typographic unit.</p> </section> <section id="orthographicS"> <h3>Larger typographic units</h3> <p><code>(Consonant Pulli)* Grapheme_cluster</code></p> <p>Because a virama (with the exceptions described below) is a visible vowel-killer and doesn't create conjunct forms, it can usually be treated as just another combining mark and segmentation can break after it.</p> <p>However, modern Tamil uses <a class="termref" href="../glossary/index.html#conjunct">conjunct</a> forms for <a href="#clusters">two sound sequences</a> (though these can be written using 3 code point sequences). The sequences are <span class="charExampleInline" translate="no"><span class="ex" lang="ta">க்ஷ</span> <span class="transc">kṣa</span></span>, and <span class="charExampleInline" translate="no"><span class="ex" lang="ta">ஶ்ரீ</span> <span class="transc">śrī</span></span> or <span class="charExampleInline" translate="no"><span class="ex" lang="ta">ஸ்ரீ</span> <span class="transc">srī</span></span> (the latter two being alternate ways of writing the same sound). These sequences should not be broken during segmentation. Note that the 'shri' sequence must include the <span class="trans">i</span> vowel to produce a conjunct. With a different vowel, the sequence of characters is displayed using a visual <span class="name">pulli</span>, eg. <span class="charExample" translate="no"><span class="ex" lang="ta">இஶ்ரேல்</span> <span class="transc">iśrēl</span> <span class="meaning">Israel</span></span></p> <p>Editorial operations that change the visual appearance of the text, such as letter-spacing, first-letter highlighting, in-word line-breaking, and justification, should never split conjunct forms apart. For this reason, an alternative way of segmenting graphemes is needed. This may not apply, however, for some other operations such as cursor movement or backwards delete.</p> <p>Where conjuncts appear, a typographic unit contains multiple grapheme clusters. 
The non-final grapheme clusters all end with <span class="codepoint" translate="no"><bdi lang="ta">்</bdi> [<a href="/scripts/tamil/block#char0BCD"><span class="uname">U+0BCD TAMIL SIGN VIRAMA</span></a>]</span>, and the final grapheme cluster begins with a consonant. However, as mentioned, <em>this only applies for 3 character sequences in modern Tamil</em>.</p> <p>This difference in segmentation arises from a reinterpretation of the effect of the virama – no special, alternative virama character is available. So this behaviour needs to be triggered by special rules.</p> <p>The following are examples.</p> <p class="instructions">Click on the text version of these words to see more detail about the composition.</p> <table class="noBorderTable alignStartTable" style="margin-inline-start: 10%;"> <tr><td><img src="images/g_lakh.svg" alt="" style="height: 3.3rem;"></td><td><span class="eg" lang="ta">லக்ஷம்</span></td></tr> <tr><td><img src="images/g_weakness.svg" alt="" style="height: 3.3rem;"></td><td><span class="eg" lang="ta">க்ஷீணம்</span></td></tr> <tr><td><img src="images/g_shri.svg" alt="" style="height: 3.3rem;"></td><td><span class="eg" lang="ta" dir="ltr">ஶ்ரீ</span></td></tr> </table> <p>Treatment as conjuncts rather than grapheme clusters can also affect vowel sign positioning. An illustration of this can be seen when a consonant cluster is associated with a vowel sign component that is displayed to the left of the base. For example, observe the placement of the pre-base vowel in <a class="figref">fig_virama_seg</a>. In the conjunct form on the left, the vowel sign surrounds the whole conjunct. 
If the sequence is not rendered as a conjunct, as in the second example, the pre-base glyph precedes the <span class="uname">SA</span>, not the <span class="uname">KA</span>.</p> <figure id="fig_virama_seg"> <img src="images/fig_virama_seg.svg" alt="க்ஷொ க்ஸொ" class="ex" lang="ta" data-notes="40px Noto Serif Tamil" style="height: 4rem; max-width: 40rem;"> <figcaption>Placement of pre-base vowel glyphs.<br><span class="instructions">Click to see the change in composition.</span></figcaption> </figure> </section> <section id="glyphSeg"> <h3>Glyph-based typographic units</h3> <p>When Tamil is justified, space is added around graphemes, but in many cases it is also added <em>within</em> grapheme clusters. The code point sequences are not affected, though. See <a class="secref">justification</a>.</p> <figure id="fig_partridge_graphemes" class="sideCaption"> <img alt="கௌதாரி" class="ex" lang="ta" src="images/partridge.png" data-source=""> <figcaption>Tamil glyph separation. The vowel signs are coloured, and on the top line the grapheme-cluster boundaries are shown with thin vertical lines. The first vowel sign is a circumgraph (ie. a single code point that renders glyphs on more than one side of the base). The bottom line shows how this word would be expanded to fill a line.</figcaption> </figure> </section> <section id="webSegmentation"> <h3>Browser behaviour</h3> <div style="padding: 3rem; padding-block: 0;"> <p><span class="leadin">Test in your browser.</span> <span class="instructions">The word on the left contains a conjunct form. On the right is a virama that doesn't form a conjunct. First, the text is displayed in a contenteditable paragraph, then in a textarea. 
Results are reported for Gecko (Firefox), Blink (Chrome), and WebKit (Safari) on a Mac.</span></p> <p class="graphemeTest" contenteditable><span class="ex" lang="ta">க்ஷீணம்</span> <span class="ex" lang="ta">தோசிக்கொக்கு</span></p> <div class="graphemeTextarea"><textarea class="ex" lang="ta">க்ஷீணம் தோசிக்கொக்கு</textarea></div> </div> <p><span class="leadin">Cursor movement.</span> <span class="instructions">Move the cursor through the text.</span><br> Gecko steps through the whole text using grapheme clusters. The cursor visually stops in the middle of the conjunct sequence. Blink does the same, however the cursor appears to skip to the end of the conjunct sequence and you have to hit the cursor key again (with no apparent movement) to actually clear it. WebKit treats all sequences with a virama as a single unit, which is inappropriate for the second word.</p> <p><span class="leadin">Selection.</span> <span class="instructions">Place the cursor next to a character and hold down shift while pressing an arrow key.</span><br> The behaviour is the same as for cursor movement. This has the effect of sometimes appearing to highlight backwards in Blink.</p> <p><span class="leadin">Deletion.</span> Forward deletion works in the same way as cursor movement. The backspace key deletes code point by code point.</p> <p><span class="leadin">Line-break.</span><span class="instructions">See <a href="https://w3c.github.io/i18n-tests/exploratory/line_breaking/int-line-break?lang=ta&fontSize=36&width=820&lineBreak=anywhere&wordBreak=normal&text=xxxxx%20%E0%AE%A4%E0%AF%8B%E0%AE%9A%E0%AE%BF%E0%AE%95%E0%AF%8D%E0%AE%95%E0%AF%8A%E0%AE%95%E0%AF%8D%E0%AE%95%E0%AF%81%20%E0%AE%95%E0%AF%8D%E0%AE%B7%E0%AF%80%E0%AE%A3%E0%AE%AE%E0%AF%8D&a=&i=Change%20the%20width%20of%20the%20box%20slowly%20to%20see%20how%20the%20text%20wraps." target="_blank">this test</a>. 
The CSS sets the value of the <code class="kw" translate="no">line-break</code> property to <code class="kw" translate="no">anywhere</code>. Change the size of the box to slowly move the line break point.</span><br> Gecko wraps at grapheme cluster boundaries but wraps the conjunct as a single unit. Blink and WebKit wrap everything at grapheme cluster boundaries, which has the effect of breaking the conjunct in half at the end of a line.</p> </section> </section> <!--section id="segmentsOLD"> <h2>Typographic units</h2> <p>Tamil principally uses word boundaries for <a href="#linebreak">line-breaking</a> and basic <a href="#justification">justification</a>, but uses grapheme boundaries for other operations that work at the sub-word level.</p> <p>Phrase, sentence, and section delimiters are described in <a class="secref">phrase</a>. </p> <p><strong>This section is still undergoing research and development.</strong></p> <section id="graphemes"> <h3>Grapheme boundaries</h3> <p id="def-grapheme" class="explanatoryintro definitionStub"></p> <p id="def-ccs" class="explanatoryintro definitionStub"></p> <p>Modern Tamil needs to interpret the virama (pulli) in two different ways for segmentation. 
It also segments text in a third way for <a href="#justification">justification</a>.</p> <section id="basicSeg"> <h4>Basic typographic units</h4> <p><code>Base (Combining_mark)*</code></p> <p>Most of the time, in modern Tamil, a typographic unit is equivalent to a single CCS.</p> <p>The base is generally a consonant, however independent vowels also constitute a base (though they are not followed by combining marks).</p> <p>In the case of Tamil, <mark>sequences that include a syllable nucleus</mark> only include one type of combining mark: <a href="#vowelsigns">vowel signs</a>.</p> <p> Examples: </p> <table class="noBorderTable alignStartTable" style="margin-inline-start: 10%;"> <tr><td><img src="images/s_road_street.png" alt=""></td><td><span class="eg" lang="ta">சாலை</span></td></tr> <tr><td><img src="images/s_cut.png" alt=""></td><td><span class="eg" lang="ta">அரி</span></td></tr> </table> <p><mark>Syllable codas and vowel-less consonants in clusters</mark> are written using a sequence of consonant letter followed by a visible <a href="#suppression">virama</a> (<span class="name">pulli</span>), eg.</p> <table class="noBorderTable alignStartTable" style="margin-inline-start: 10%;"> <tr><td><img src="images/s_there.png" alt=""></td><td><span class="eg" lang="ta">அங்கே</span></td></tr> <tr><td><img src="images/s_fineness.png" alt=""></td><td><span class="eg" lang="ta">அஞ்ஞானம்</span></td></tr> <tr> <td><img src="images/s_stalk.png" alt=""></td> <td><span class="eg" lang="ta">உலர்ந்தம்</span></td> </tr> </table> <p>In Tamil the <span class="name">aytham</span>, <span class="codepoint" translate="no"><bdi lang="ta">ஃ</bdi> [<a href="/scripts/tamil/block#char0B83"><span class="uname">U+0B83 TAMIL SIGN VISARGA</span></a>]</span>, has the general category of Other_Letter and Indic syllabic category of Modifying_Letter. 
It is its own typographic unit.</p> </section> <section id="conjSeg"> <h4>Conjunct typographic units</h4> <p><code>Base Virama Base (Combining_mark)*</code></p> <p>Modern Tamil uses conjunct forms for only <a href="#clusters">two sound sequences</a> (though these can be written using 3 code point sequences). This kind of typographic unit is appropriate when, and only when, text with these conjuncts needs to be segmented. </p> <p>The exceptions are the sequences <span class="charExampleInline" translate="no"><span class="ex" lang="ta">க்ஷ</span> <span class="trans">k͓ʂ</span></span>, and <span class="charExampleInline" translate="no"><span class="ex" lang="ta">ஶ்ரீ</span> <span class="trans">ʃ͓ɾī</span></span> / <span class="charExampleInline" translate="no"><span class="ex" lang="ta">ஸ்ரீ</span> <span class="trans">s͓ɾī</span></span> (the latter two being alternative ways of writing the same sound). These sequences should not be broken during segmentation. Note that the 'shri' sequence must include the <span class="trans">i</span> vowel to produce a conjunct. With a different vowel, the sequence of characters is displayed using a visual <span class="name">pulli</span>, eg. <span class="charExample" translate="no"><span class="ex" lang="ta">இஶ்ரேல்</span> <span class="trans">ịʃ͓ɾēl͓</span> <span class="meaning">Israel</span></span> </p> <p>Because <mark> conjunct forms</mark> are never split apart visually, the whole conjunct form is incorporated into a typographic unit. 
The consonants that make up the cluster are all encoded with a following <span class="name">virama</span>, however in these cases the typographic unit doesn't stop there.</p> <table class="noBorderTable alignStartTable" style="margin-inline-start: 10%;"> <tr><td><img src="images/s_lakh.png" alt=""></td><td><span class="eg" lang="ta">லக்ஷம்</span></td></tr> <tr><td><img src="images/s_weakness.png" alt=""></td><td><span class="eg" lang="ta">க்ஷீணம்</span></td></tr> <tr><td><img src="images/s_shri.png" alt=""></td><td><span class="eg" lang="ta" dir="ltr">ஶ்ரீ</span></td></tr> </table> <p>This difference in segmentation arises from a reinterpretation of the effect of the virama – no special, alternative virama character is available. So this behaviour needs to be triggered by special rules.</p> </section> <section id="grapheme_clusters"> <h4>Grapheme clusters</h4> <p><a href="../glossary/index.html#graphemecluster">Unicode grapheme clusters</a> correspond to the normal (non-conjunct) typographic units used in Tamil.</p> <p> The kind of typographic unit that includes conjuncts <em>cannot</em> be realised using Unicode grapheme clusters because they create breaks after a <span class="name">pulli</span>, rather than including the following consonants.</p> </section> <section id="glyphSeg"> <h4>Glyph-based typographic units</h4> <p>When Tamil is justified, space is added around graphemes, but in many cases it is also added <em>within</em> the normal basic typographic units. The code point sequences are not affected, though. See <a class="secref">justification</a>.</p> </section> </section--> <section id="inline"> <h2>Punctuation & inline features</h2> <section id="word"> <h3>Word boundaries</h3> <p id="def-wordboundary" class="explanatoryintro definitionStub"></p> <p>Words are separated by spaces. 
Tamil words are often quite long because they bind together multiple morphemes.</p> <p>Word-level segmentation is used for <a href="#linebreak">line-breaking</a>.</p> </section> <section id="phrase"> <h3>Phrase & section boundaries</h3> <figure class="characterBox auto small" data-cols="">,␣;␣:␣.␣?␣!␣।␣॥</figure> <p>Western punctuation is used generally. There are no punctuation marks in the Tamil Unicode block.</p> <p> For separators at the sentence level and below, the following are used in Tamil language text.</p> <table class="character_list"> <tbody> <tr> <th scope="row">phrase</th> <td> <p><span class="codepoint" translate="no"><span lang="ta">,</span> [<a href="/scripts/devanagari/block#char002C"><span class="uname">U+002C COMMA</span></a>]</span></p> <p><span class="codepoint" translate="no"><span lang="ta">;</span> [<a href="/scripts/devanagari/block#char003B"><span class="uname">U+003B SEMICOLON</span></a>]</span></p> <p><span class="codepoint" translate="no"><span lang="ta">:</span> [<a href="/scripts/devanagari/block#char003A"><span class="uname">U+003A COLON</span></a>]</span></p></td> </tr> <tr> <th scope="row">sentence</th> <td> <p><span class="codepoint" translate="no"><span lang="ta">.</span> [<a href="/scripts/devanagari/block#char002E"><span class="uname">U+002E FULL STOP</span></a>]</span></p> <p><span class="codepoint" translate="no"><span lang="ta">?</span> [<a href="/scripts/devanagari/block#char003F"><span class="uname">U+003F QUESTION MARK</span></a>]</span></p> <p><span class="codepoint" translate="no"><span lang="ta">!</span> [<a href="/scripts/devanagari/block#char0021"><span class="uname">U+0021 EXCLAMATION MARK</span></a>]</span> </p> <span class="codepoint"><span lang="ta">।</span> <a href="/scripts/devanagari/block#char0964">[<span class="uname">U+0964 DEVANAGARI DANDA</span>]</a></span> </td> </tr> <tr> <th scope="row">section</th> <td><span class="codepoint"><span lang="ta">॥</span> [<a 
href="/scripts/devanagari/block#char0965"><span class="uname">U+0965 DEVANAGARI DOUBLE DANDA</span></a>]</span> (occasionally)</td> </tr> </tbody> </table> <p>The danda and double danda are sometimes used. They are punctuation marks in the Devanagari block that are also used for several other scripts.<tt>u</tt></p> <!--figure id="fig_punctuation"> <div> <img src="images/fig_punctuation.png" alt="அவர் பதிலளித்தார்: “நான் இதைவிட ஒரு படி மேலான அழகான ஒன்றைத் தேர்ந்தெடுத்துள்ளேன்!” “யார்?” “லா பாவர்ட்டா (வறுமை)”: இதற்குப் பிறகு தன்னிடம் இருந்த அனைத்தையும் அவர் துறந்தார். நிலப்பகுதி வழியாக ஒரு யாசகனாகத் திரிந்தார்." class="ex" lang="ta" data-notes="Noto Serif Tamil 24px >72ppi"> <details><summary>translation</summary><p>He replied: “I have chosen something one step more beautiful!” "Who?" “La Pavarta (Poverty)”: After this he gave up everything he had. Wandered through the land as a Yasaka.</p></details> </div> <figcaption>Various types of punctuation in Tamil.</figcaption> </figure--> <figure id="fig_punctuation" class="sideCaption"> <img src="images/fig_punctuation.png" alt="அவர் பதிலளித்தார்: “நான் இதைவிட ஒரு படி மேலான அழகான ஒன்றைத் தேர்ந்தெடுத்துள்ளேன்!” “யார்?” “லா பாவர்ட்டா (வறுமை)”: இதற்குப் பிறகு தன்னிடம் இருந்த அனைத்தையும் அவர் துறந்தார். நிலப்பகுதி வழியாக ஒரு யாசகனாகத் திரிந்தார்." class="ex" lang="ta" data-notes="Noto Serif Tamil 24px >72ppi"> <div> <figcaption>Various types of punctuation in Tamil.</figcaption> <details><summary>translation</summary><p>He replied: “I have chosen something one step more beautiful!” "Who?" “La Pavarta (Poverty)”: After this he gave up everything he had. 
Wandered through the land as a Yasaka.</p></details> </div> </figure> </section> <section id="bracketing"> <h3>Bracketed text</h3> <figure class="characterBox auto small" data-cols="">(␣)</figure> <p>Tamil commonly uses ASCII parentheses to insert parenthetical information into text.</p> <table class="character_list"> <thead> <tr> <th> </th> <th scope="row">start</th> <th scope="row">end</th> </tr> </thead> <tbody> <tr> <th>standard</th> <td><p><span class="codepoint" translate="no"><span lang="ta">(</span> [<a href="/scripts/devanagari/block#char0028"><span class="uname">U+0028 LEFT PARENTHESIS</span></a>]</span></p></td> <td><p><span class="codepoint" translate="no"><span lang="ta">)</span> [<a href="/scripts/devanagari/block#char0029"><span class="uname">U+0029 RIGHT PARENTHESIS</span></a>]</span></p></td> </tr> </tbody> </table> </section> <section id="quotations"> <h3>Quotations & citations</h3> <figure class="characterBox auto small" data-cols="">“␣”␣‘␣’</figure> <p>Tamil texts use quotation marks around quotations. 
Of course, due to keyboard design, quotations may also be surrounded by ASCII double and single quote marks.</p> <table class="character_list"> <thead> <tr> <th> </th> <th scope="row">start</th> <th scope="row">end</th> </tr> </thead> <tbody> <tr> <th scope="row">default</th> <td><p><span class="codepoint" translate="no"><span lang="ta">“</span> [<a href="/scripts/devanagari/block#char201C"><span class="uname">U+201C LEFT DOUBLE QUOTATION MARK</span></a>]</span></p></td> <td><span class="codepoint" translate="no"><span lang="ta">”</span> [<a href="/scripts/devanagari/block#char201D"><span class="uname">U+201D RIGHT DOUBLE QUOTATION MARK</span></a>]</span></td> </tr> <tr> <th scope="row">nested</th> <td><p><span class="codepoint" translate="no"><span lang="ta">‘</span> [<a href="/scripts/devanagari/block#char2018"><span class="uname">U+2018 LEFT SINGLE QUOTATION MARK</span></a>]</span></p></td> <td><span class="codepoint" translate="no"><span lang="ta">’</span> [<a href="/scripts/devanagari/block#char2019"><span class="uname">U+2019 RIGHT SINGLE QUOTATION MARK</span></a>]</span></td> </tr> </tbody> </table> <p>The default quote marks for Tamil are <span class="codepoint" translate="no"><span lang="ta">“</span> [<a href="/scripts/devanagari/block#char201C"><span class="uname">U+201C LEFT DOUBLE QUOTATION MARK</span></a>]</span> at the start, and <span class="codepoint" translate="no"><span lang="ta">”</span> [<a href="/scripts/devanagari/block#char201D"><span class="uname">U+201D RIGHT DOUBLE QUOTATION MARK</span></a>]</span> at the end.<tt>c</tt></p> <p>When an additional quote is embedded within the first, the quote marks are <span class="codepoint" translate="no"><span lang="ta">‘</span> [<a href="/scripts/devanagari/block#char2018"><span class="uname">U+2018 LEFT SINGLE QUOTATION MARK</span></a>]</span> and <span class="codepoint" translate="no"><span lang="ta">’</span> [<a href="/scripts/devanagari/block#char2019"><span class="uname">U+2019 RIGHT SINGLE 
QUOTATION MARK</span></a>]</span>.<tt>c</tt></p> </section> <section id="emphasis"> <h3>Emphasis</h3> <p class="prompts">tbd</p> <p>Underlining is not a traditional feature of Tamil text.<tt>i,#text_decoration</tt></p> <p>One way to express emphasis is to elongate vowel sounds using extra independent vowels, eg. compare <span class="charExample" translate="no"><span class="ex" lang="ta">பெரீய</span> <span class="trans">peɾīy</span> <span class="transc">(perīya)</span> <span class="meaning">really big</span></span><span class="charExample" translate="no"><span class="ex" lang="ta">பெரீஇஇய</span> <span class="trans">peɾīịịy</span> <span class="transc">(perīiiya)</span> <span class="meaning">reeeeally big</span></span></p> </section> <section id="abbrev"> <h3>Abbreviation, ellipsis & repetition</h3> <p class="prompts">tbd</p> </section> <section id="inlinenotes"> <h3>Inline notes & annotations</h3> <p class="prompts">tbd</p> </section> <section id="otherpunctuation"> <h3>Other punctuation</h3> <p class="prompts">tbd</p> </section> <section id="otherinline"> <h3>Other inline text decoration</h3> <p class="prompts">tbd</p> </section> </section> <section id="para"> <h2>Line & paragraph layout</h2> <section id="linebreak"> <h3>Line breaking & hyphenation</h3> <p>The primary break points for Tamil are word boundaries. However, Tamil is an agglutinative language and Tamil words can be long. This can lead to large gaps during justification, and sometimes to words that are longer than the available column width, so it is desirable to also hyphenate words.</p> <section id="hyphenation"> <h4>Hyphenation</h4> <p id="def-hyphenation" class="explanatoryintro definitionStub"></p> <p>Because Tamil words are long, hyphenation is useful during layout, but their complexity makes it difficult to do.</p> <p>Hyphenation must take place at syllable boundaries. 
A hyphen is not usually added at the end of the line when a word is hyphenated.<tt>st</tt></p> <figure id="fig_hyphenation_ta" class="sideCaption"> <p><img alt="Newspaper clipping" src="images/hyphenation_ta.png" data-source="https://github.com/w3c/iip/issues/58"/></p> <figcaption>Text from the Tamil newspaper, Daily Thanthi, showing hyphenated words with yellow highlighting.</figcaption> </figure> <p>Prabhakar<tt>p</tt> proposes the following rules: single characters should be avoided at the start or end of a line, especially characters with nukta at line start, and a word with 5 characters that includes 3 consecutive consonants should not be split. He says that, because Tamil is highly inflexional, morphological or pattern-based approaches are needed rather than simple dictionary lookup.</p> </section> <section id="linedge"> <h4>Line-edge rules</h4> <p>As in almost all writing systems, certain punctuation characters should not appear at the end or the start of a line. The Unicode line-break properties help applications decide whether a character should appear at the start or end of a line.</p> <p><a id="showLinebreaks" target="_blank" href="tbd">Show line-breaking properties for characters in the modern Tamil orthography.</a></p> <p>The following list gives examples of typical behaviours for some of the characters used in modern Tamil. Context may affect the behaviour of some of these and other characters.</p> <p class="instructions">Click/tap on the Tamil characters to show what they are.</p> <ul> <li><span class="ex list" lang="ta">“ ‘ (</span> should not be the last character on a line.</li> <li><span class="ex list" lang="ta">” ’ ) . , ; ! ? 
। ॥ %</span> should not begin a new line.</li> <li><span class="ex list" lang="ta">௹ ₹</span> should be kept with any number, even if separated by a space or parenthesis.</li> </ul> <p>Line breaking should not move a danda or double danda to the beginning of a new line, even if it is preceded by a space character.</p> </section> </section> <section id="justification"> <h3>Text alignment & justification</h3> <section id="justifn"> <h4>Justification</h4> <p>Tamil usually adjusts inter-word spacing in order to justify text on a line, however there are situations where words are stretched, too. In those cases, some special rules apply.</p> <p>Justification can be helped significantly by hyphenating the text (see <a class="secref">hyphenation</a>).</p> <p>Especially in narrow columns of text where text is not hyphenated, large empty spaces can appear if only one or a small number of words will fit on a line. </p> <figure id="fig_justification_gaps" class="sideCaption"> <p><img alt="Newspaper clipping" src="images/justification_gaps.png" data-source="https://github.com/w3c/iip/issues/58"/></p> <figcaption>A narrow column in newsprint with large gaps between justified words on a line.</figcaption> </figure> <p>On a line with few words and large inter-word spacing, justification can be improved by adjusting the width of the words themselves on the affected lines.<tt>g58,#issuecomment-561995889</tt> <a class="figref">fig_justification_in_newsprint</a> shows examples of inter-character space being compacted and stretched to minimise inter-word gaps.</p> <figure id="fig_justification_in_newsprint" class="sideCaption"> <img alt="Newspaper clipping" src="images/justification_in_newsprint.png" data-source="https://github.com/w3c/iip/issues/58"/> <figcaption>Newspaper clipping showing inter-character compaction and stretching.</figcaption> </figure> <p>If only a single word fits on a line, it is common practice to stretch the word so that it fills the line, eg. 
the word <span class="charExampleInline" translate="no"><span class="ex" lang="ta">பெரும்பான்மை</span> <span class="trans">peɾum͓pɑ̄n͓mʌʲ</span> <span class="transc">perumpāṉmai</span> <span class="meaning">majority</span></span> in <a class="figref">fig_justification_one_word</a>.</p> <figure id="fig_justification_one_word" class="sideCaption"> <p><img alt="Newspaper clipping" src="images/justification_one_word.png" data-source="https://github.com/w3c/type-samples/issues/76"/></p> <figcaption>Examples of a single word stretched to fill a line.</figcaption> </figure> <p>However, it is important to note that, for Tamil, any stretching is applied evenly between <em>each non-connected glyph</em> across the line, regardless of whether the separated glyphs are part of a syllabic cluster, or even a single code point. This is not inter-character spacing, but rather inter-glyph spacing. The spacing doesn't occur between glyphs or code points that are ligated (touching), nor between a non-spacing mark and its base.</p> <p><a class="figref">fig_partridge</a> illustrates how this stretching is based on glyphs, and is independent of the underlying code points.</p> <figure id="fig_partridge" class="sideCaption"> <img alt="கௌதாரி" class="ex" lang="ta" src="images/partridge.png" data-source=""> <figcaption>Tamil glyph separation. The vowel signs are coloured, and on the top line the grapheme-cluster boundaries are shown with thin vertical lines. The first vowel sign is a circumgraph (ie. a single code point that renders glyphs on more than one side of the base). 
The bottom line shows how this word would be expanded to fill a line.</figcaption> </figure> <p>Note the following:</p> <ol> <li>The sequence of three items on the far left is actually composed of only two code points, <span class="codepoint" translate="no"><span lang="ta">க</span> [<span class="uname">U+0B95 TAMIL LETTER KA</span>]</span> followed by the circumgraph <span class="codepoint" translate="no"><span lang="ta"> ௌ</span> [<span class="uname">U+0BCC TAMIL VOWEL SIGN AU</span>]</span>. Notice that there are spaces between the base consonant and <em>both</em> glyphs that make up the vowel sign. </li> <li>To its immediate right, the base character and combining mark that make up the middle syllable have been split apart, so the units are codepoints rather than grapheme clusters.</li> <li>The last grapheme cluster (on the right) is kept intact, because the vowel sign is joined to the base consonant.</li> </ol> <p>Examples of other consonant-vowel combinations that are not separated include <span class="charExample inline" translate="no"><span class="ex" lang="ta">ஜு</span> <span class="transc">ʤu</span></span> and non-spacing marks such as pulli, eg. <span class="ex" lang="ta">ங்</span>.</p> <p class="observation"><em>Observation:</em> Where the word being stretched across a whole line is the last word in the sentence, it appears that the sentence-final punctuation also participates in the letter-spacing (see <a class="figref">fig_justification_full_stop</a>).</p> <figure id="fig_justification_full_stop"> <p><img alt="Newspaper clipping" src="images/justification_full_stop.png"/></p> <figcaption>Full stop participating in the letter-spacing when a single word is stretched to fill a line.</figcaption> </figure> </section> <section id="indents"> <h4>Paragraph indents</h4> <p>Paragraph features are the same as in English. 
Paragraphs can start with or without indents.<tt>g26</tt></p> </section> </section> <section id="spacing"> <h3>Text spacing</h3> <p class="prompts">tbd</p> <p>This section looks at ways in which spacing is applied between characters over and above that which is introduced during justification.</p> </section> <section id="baselines"> <h3>Baselines, line height, etc.</h3> <p>Tamil uses the so-called 'alphabetic' baseline, which is the same as for Latin and many other scripts.</p> <p>To give an approximate idea, <a class="figref">fig_baselines</a> compares Latin and Tamil glyphs from Noto Serif fonts. The basic height of Tamil letters is the same as the Latin x-height, however extenders extend slightly beyond the Latin ascenders and descenders, creating a need for slightly larger line heights.</p> <figure id="fig_baselines"> <img src="images/fig_baselines.png" alt="Hhqxக்ஞிஒஶ்ரீஇஜொ" class="ex" lang="ta" data-notes="Noto Serif Tamil 130px" style="max-width: 40rem;"> <img src="images/fig_baselines_sans.png" alt="Hhqxக்ஞிஒஶ்ரீஇஜொ" class="ex" lang="ta" data-notes="Noto Sans Tamil 130px" style="max-width: 40rem;"> <figcaption>Font metrics for Latin text compared with Tamil glyphs in the Noto Serif Tamil (top) and Noto Sans Tamil (bottom) fonts.</figcaption> </figure> <p><a class="figref">fig_baselines_other</a> shows similar comparisons for the Tamil MN and Latha fonts. 
The basic height of the Latha font is set to the Latin ascender height, but the overall height of the Tamil glyphs is not much taller than the other fonts.</p> <figure id="fig_baselines_other"> <img src="images/fig_baselines_mn.png" alt="Hhqxக்ஞிஒஶ்ரீஇஜொனை" class="ex" lang="ta" data-notes="Tamil MN 130px" style="max-width: 40rem;"> <img src="images/fig_baselines_latha.png" alt="Hhqxக்ஞிஒஶ்ரீஇஜொனை" class="ex" lang="ta" data-notes="Latha 116px" style="max-width: 40rem;"> <figcaption>Latin font metrics compared with Tamil glyphs in the Tamil MN (top) and Latha (bottom) fonts.</figcaption> </figure> </section> <section id="lists"> <h3>Counters, lists, etc.</h3> <p class="instructions">You can experiment with counter styles using the <a target="_blank" href="../../app-counters/">Counter styles converter</a>. Patterns for using these styles in CSS can be found in <a href="https://www.w3.org/TR/predefined-counter-styles/#tamil-styles">Ready-made Counter Styles</a>, and we use the names of those patterns here to refer to the various styles.</p> <p>Counters are used to number lists, chapter headings, etc.</p> <p>Tamil commonly uses western numbering systems for lists, however, Tamil also has a native numeric style and an archaic additive style.</p> <section id="cs_numeric"> <h4>Numeric</h4> <p>The <span class="kw" translate="no">tamil</span> numeric style for Tamil is decimal-based and uses the digits below.</p> <figure class="characterBox auto" data-cols="trans,transc">௦␣௧␣௨␣௩␣௪␣௫␣௬␣௭␣௮␣௯</figure> <p>Examples:</p> <figure class="auto counterstylesBox noindex" data-cols="" data-notes="1,2,3,4,11,22,33,44,111,222,333,444">௧␣௨␣௩␣௪␣௧௧␣௨௨␣௩௩␣௪௪␣௧௧௧␣௨௨௨␣௩௩௩␣௪௪௪</figure> </section> <section id="cs_additive"> <h4>Additive</h4> <div class="btw"> <p>The <span class="kw" translate="no">ancient-tamil</span> additive style uses the numbers below. It is specified for a range between 1 and 9,999. 
</p> <figure class="characterBox auto" data-notes="9000,8000,7000,6000,5000,4000,3000,2000,1000,900,800,700,600,500,400,300,200,100,90,80,70,60,50,40,30,20,10,9,8,7,6,5,4,3,2,1" data-cols="">௯௲␣௮௲␣௭௲␣௬௲␣௫௲␣௪௲␣௩௲␣௨௲␣௲␣௯௱␣௮௱␣௭௱␣௬௱␣௫௱␣௪௱␣௩௱␣௨௱␣௱␣௯௰␣௮௰␣௭௰␣௬௰␣௫௰␣௪௰␣௩௰␣௨௰␣௰␣௯␣௮␣௭␣௬␣௫␣௪␣௩␣௨␣௧</figure> <p>Examples:</p> <figure class="auto counterstylesBox noindex" data-cols="" data-notes="1,2,3,4,11,22,33,44,111,222,333,444">௧␣௨␣௩␣௪␣௰௧␣௨௰௨␣௩௰௩␣௪௰௪␣௱௰௧␣௨௱௨௰௨␣௩௱௩௰௩␣௪௱௪௰௪</figure> </div> </section> <section id="cs_alphabetic"> <h4>Alphabetic</h4> <p class="observation"><span class="leadin">Observation:</span> Alphabetic counters are seen, but are not very common.<tt>g57</tt> See <a href="https://github.com/w3c/type-samples/issues/75">an example</a>. It is not clear whether the counters extend beyond the vowel range.</p> </section> </section> <section id="initials"> <h3>Styling initials</h3> <p>It is possible to find the first letter in a paragraph styled in a distinctive way – usually larger and dropping down from the top of the first line. Some rules for positioning south Indian scripts are proposed by [<a href="http://w3c.github.io/ilreq/#h_scripts_without_hanging_baseline">ilreq</a>].</p> <p>Initials should not just include the first character on the line, but should include any associated combining characters. If the first character is the beginning of the sequences <span class="charExampleInline" translate="no"><span class="ex" lang="ta">க்ஷ</span> <span class="trans">k͓ʂ</span></span>, and <span class="charExampleInline" translate="no"><span class="ex" lang="ta">ஶ்ரீ</span> <span class="trans">ʃ͓ɾī</span></span>/<span class="charExampleInline" translate="no"><span class="ex" lang="ta">ஸ்ரீ</span> <span class="trans">s͓ɾī</span></span>, all of the characters making up the conjunct should be included in the styling. 
See an example of a highlighted syllable in <a class="figref">fig_drop_caps</a>.</p> <p>Any punctuation such as opening quotes and opening parentheses should also be included in the initial styling.</p> <p>Indian languages generally use the drop style or a boxed letter. <a href="https://www.w3.org/TR/css-inline-3/#initial-letter-wrapping">Contour-filling</a> is not needed for Indian text.<tt>i</tt></p> <figure id="fig_drop_caps" class="sideCaption"> <div> <img alt="Examples of dropped highlights" src="images/drop_cap_wide.png" data-source="UDHR"/> <img alt="Examples of dropped highlights" src="images/drop_cap_low.png" data-source="UDHR"/> </div> <figcaption>Two example paragraphs showing dropped highlighted initials.</figcaption> </figure> <p>For the drop style, the alphabetic baseline of the highlighted letter(s) should match the bottom of the row that determines the size of the highlighted letter(s). In both examples in <a class="figref">fig_drop_caps</a> the highlighted text is set to 3 lines in height. In the second example, the highlighted text descends below the baseline, so an extra line is cleared to accommodate it. The tall vowel sign in the first example rises slightly higher than the normal character height, and slightly exceeds the height of the first line of text.</p> <p>The exact positioning of the normal character height relative to the characters in the rest of the first line needs further research. The examples in <a class="figref">fig_drop_caps</a> show the default result for the Safari browser.</p> <p>For the <a href="https://www.w3.org/TR/dpub-latinreq/#raised-caps-and-sunken-caps">raised</a> style of initial, the highlighted characters sit on the same baseline as the first line of the paragraph, but rise above it (see <a class="figref">fig_raised_initial</a>). 
The inter-line spacing needs to accommodate any descenders.</p> <figure id="fig_raised_initial" class="sideCaption"> <p><img alt="Example of raised highlight" src="images/raised_initial.png" data-source="UDHR"/></p> <figcaption> Example of a raised highlighted initial.</figcaption> </figure> <p>Another common approach in Indic text is to create a box around the enlarged letter(s), often with a background colour. In this case the box dimensions are associated with the other lines in the paragraph, and the highlighted letters float within the box.</p> </section> </section> <section id="page"> <h2>Page & book layout</h2> <p class="instructions">This section is for any features that are specific to Tamil and that relate to the following topics: <span id="generallayoutInline">general page layout & progression</span>; <span id="gridsInline">grids & tables</span>; <span id="footnotesInline">notes, footnotes, etc</span>; <span id="interactionInline">forms & user interaction</span>; <span id="headersInline">page numbering, running headers, etc</span>. 
</p> <!--section id="generallayout"> <h3>General page layout & progression</h3> <p class="prompts">tbd</p> </section> <section id="grids"> <h3>Grids & tables</h3> <p class="prompts">tbd</p> </section> <section id="footnotes"> <h3>Notes, footnotes, etc</h3> <p class="prompts">tbd</p> </section> <section id="interaction"> <h3>Forms & user interaction</h3> <p class="prompts">tbd</p> </section> <section id="headers"> <h3>Page numbering, running headers, etc</h3> <p class="prompts">tbd</p> </section--> </section> <section id="online_samples"> <h2>Online resources</h2> <ol> <li><a href="https://unicode.org/udhr/d/udhr_tam.html">Universal Declaration of Human Rights - Tamil</a></li> <li><a href="https://en.wikipedia.org/wiki/List_of_Newspapers_in_Chennai">List of Newspapers in Chennai</a></li> </ol> </section> <section id="refs"> <h2>References</h2> </section> <section id="acknowledgments"> <h2>Acknowledgments</h2> <p>Many thanks to Muthu Nedumaran for reviewing this material and sending suggestions.</p> </section> <div id="analytics"></div> <script>addAnalytics()</script> <div id="phoneLinks"></div> <div id="panel" style="display:none"><div style="display:block"> </div><p style="text-align:right;"></p></div> <div class="smallprint"><span id="version"></span></div> <script> addPageFeatures() </script> </body> </html>