<!DOCTYPE html>
<html lang="en-US">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="shortcut icon" type="image/png" href="favicon.png?">
<!-- Begin Jekyll SEO tag v2.8.0 -->
<title>Transcription | GiellaLT</title>
<meta name="generator" content="Jekyll v3.10.0" />
<meta property="og:title" content="Transcription" />
<meta property="og:locale" content="en_US" />
<meta name="description" content="GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document." />
<meta property="og:description" content="GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document." />
<link rel="canonical" href="https://giellalt.github.io/transcriptions/" />
<meta property="og:url" content="https://giellalt.github.io/transcriptions/" />
<meta property="og:site_name" content="GiellaLT" />
<meta property="og:type" content="website" />
<meta name="twitter:card" content="summary" />
<meta property="twitter:title" content="Transcription" />
<script type="application/ld+json">
{"@context":"https://schema.org","@type":"WebPage","description":"GiellaLT provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more about Why. See also How to get started and our Privacy document.","headline":"Transcription","url":"https://giellalt.github.io/transcriptions/"}</script>
<!-- End Jekyll SEO tag -->
<link rel="stylesheet" href="/assets/css/style.css?v=e3934bfb8bf14d6d971a919e715913840e05ca06">
<!--[if lt IE 9]>
<script src="https://cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv.min.js"></script>
<![endif]-->
</head>
<body>
<div class="wrapper">
<header>
<h1><a href="https://giellalt.github.io/">GiellaLT</a></h1>
<p><a href="/AboutGiellaLT.html">GiellaLT</a> provides an infrastructure for rule-based language technology aimed at minority and indigenous languages, and streamlines building anything from keyboards to speech technology. Read more <a href="https://indigenous-langtech.uit.no">about Why</a>. See also <a href="/infra/GettingStarted.html"><b>How to get started</b></a> and our <a href="/Personvern.html">Privacy document</a>.</p>
<p><form method="get" action="https://www.google.com/search"> <input type="hidden" name="sitesearch" value="https://giellalt.github.io/" /> <input type="text" name="q" maxlength="255" placeholder="Search site with Google" /> </form></p>
<p class="view"><a href="https://github.com/giellalt">View GiellaLT on GitHub</a></p>
<div id="toc">
<h2 class="tocheader">Page Content</h2>
<ul class="left_toc" id="left_toc"><li><a href="#overview">Overview</a></li><li><a href="#testing">Testing</a><ul><li><a href="#commands">Commands</a></li><li><a href="#documents-for-testing">Documents for testing</a></li></ul></li><li><a href="#phonetics">Phonetics</a></li><li><a href="#spell-relax">Spell relax</a></li></ul>
</div>
</header>
<section>
<h1 id="transcription">Transcription</h1>
<p>The infrastructure includes several finite-state transducers (FSTs) for transcribing one text string into another.</p>
<h2 id="overview">Overview</h2>
<p>The folder <code class="language-plaintext highlighter-rouge">lang-xxx/src/transcriptions/</code> contains the setup for converting various number and symbol representations to
their text representation. The source files in the directory are:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>transcriptor-abbrevs2text.lexc <span class="c"># for abbreviations</span>
transcriptor-clock-digit2text.lexc <span class="c"># for time expressions</span>
transcriptor-date-digit2text.lexc <span class="c"># for dates</span>
transcriptor-numbers-digit2text.lexc <span class="c"># for cardinals and ordinals</span>
</code></pre></div></div>
<p>Each <code class="language-plaintext highlighter-rouge">lexc</code> file is compiled into two transducers, shown here with <code class="language-plaintext highlighter-rouge">clock</code> as an example:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>transcriptor-clock-digit2text.lexc
<span class="o">[</span>...]
transcriptor-clock-digit2text.filtered.lookup.hfstol
transcriptor-clock-text2digit.filtered.lookup.hfstol
</code></pre></div></div>
<p>The direction (from digit to text or vice versa) is shown in the filename.</p>
<h2 id="testing">Testing</h2>
<h3 id="commands">Commands</h3>
<p>Here are some resources for testing the transcriptors.
You may transcribe the numbers 1 to 100 as follows (change the arguments to <code class="language-plaintext highlighter-rouge">seq</code> to suit what you want to test):</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">seq </span>1 100 | <span class="se">\</span>
hfst-lookup <span class="nt">-q</span> src/transcriptions/transcriptor-numbers-digit2text.filtered.lookup.hfstol
</code></pre></div></div>
<p>Then you may check the output against the normative analyser:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">seq </span>1 100 | <span class="se">\</span>
hfst-lookup <span class="nt">-q</span> src/transcriptions/transcriptor-numbers-digit2text.filtered.lookup.hfstol | <span class="se">\</span>
<span class="nb">cut</span> <span class="nt">-f2</span> | <span class="se">\</span>
<span class="nb">cut</span> <span class="nt">-c1-</span> | <span class="se">\</span>
<span class="nb">grep</span> <span class="nt">-v</span> <span class="s1">'^$'</span> | <span class="se">\</span>
hfst-lookup <span class="nt">-q</span> src/analyser-gt-norm.hfstol
</code></pre></div></div>
<h3 id="documents-for-testing">Documents for testing</h3>
<p>There are ready-made files for all numeral formats:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$GTHOME</span>/ped/doc/common/numratesting/cardinal
<span class="nv">$GTHOME</span>/ped/doc/common/numratesting/clock
<span class="nv">$GTHOME</span>/ped/doc/common/numratesting/date
<span class="nv">$GTHOME</span>/ped/doc/common/numratesting/ordinal
</code></pre></div></div>
<p>You can then test with these files (shown here with <code class="language-plaintext highlighter-rouge">clock</code> as an example):</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cat</span> <span
class="nv">$GTHOME</span>/ped/doc/common/numratesting/clock | <span class="se">\</span>
hfst-lookup src/transcriptions/transcriptor-clock-digit2text.filtered.lookup.hfstol
</code></pre></div></div>
<p>(If you don’t have GTHOME, the files are <a href="https://gtsvn.uit.no/langtech/trunk/ped/doc/common/numratesting/">here</a>.)</p>
<h2 id="phonetics">Phonetics</h2>
<p>The folder <code class="language-plaintext highlighter-rouge">lang-xxx/src/phonetics/</code> contains the setup for text-to-IPA transcription.</p>
<h2 id="spell-relax">Spell relax</h2>
<p>The folder <code class="language-plaintext highlighter-rouge">lang-xxx/src/orthography/</code> contains files for translating sloppy writing and non-standard encoding to standard forms.</p>
</section>
<footer>
<p><small>Hosted on GitHub Pages &mdash; Theme by <a href="https://github.com/orderedlist">orderedlist</a></small></p>
</footer>
</div>
<script src="/assets/js/scale.fix.js"></script>
</body>
</html>
