The Sanskrit Heritage Site

<!doctype html> <html> <head> <meta charset="utf-8"> <title>The Sanskrit Heritage Site</title> <meta name="author" content="Gérard Huet"> <meta property="dc:datecopyrighted" content="2025"> <meta property="dc:rightsholder" content="Gérard Huet"> <meta name ="keywords" content="india,dictionary,indology,sanskrit,lexicography,linguistics,indo-european,dictionnaire,sanscrit,panini,indology,linguistics"> <meta name="description" content="This site provides tools for Sanskrit processing: dictionary search, morphology generation and analysis, segmentation, tagging and parsing."> <link rel="shortcut icon" href="IMAGES/favicon.ico"> <link rel="apple-touch-icon" href="IMAGES/touch-icon-iphone-60x60.png"> <link rel="apple-touch-icon" sizes="60x60" href="IMAGES/touch-icon-ipad-76x76.png"> <link rel="apple-touch-icon" sizes="114x114" href="IMAGES/touch-icon-iphone-retina-120x120.png"> <link rel="apple-touch-icon" sizes="144x144" href="IMAGES/touch-icon-ipad-retina-152x152.png"> <link rel="stylesheet" type="text/css" href="DICO/style.css" media="screen,tv"> </head> <body class="pink_back"> <!-- Pale_rose --> <table class="body"> <tr><td> <h1 class="title">The Sanskrit Heritage Site <br> <a href="IMAGES/Yantra.jpg"> <img src="IMAGES/smallyantra.gif" alt="Shri Yantra"> </a> </h1> <h3 class="c3">Version 3.65 [2025-02-25] (fr)<br> <img src="/cgi-bin/Count.cgi?df=sanskrit.dat&amp;ft=4" alt="counter"/><br/> <!-- (since 01/09/2003) --> </h3> <p> Welcome to the Sanskrit Heritage site. It provides various services for the computational treatment of Sanskrit. <p> The first service is dictionary access. The dictionary is a hypertext structure giving access to the Sanskrit lexicon, given with grammatical information. There are currently two versions of the dictionary. <br> The first one is the original Heritage Sanskrit-French dictionary, that serves as morphology generator, and is thus fully equipped with grammatical tools. 
Furthermore, it offers rich encyclopedic content about Indian culture. You may also download a printable PDF version of this dictionary, as explained below. A fully hypertext version in the <a href="goldendict.html">Goldendict</a> format is also available. <p> The second lexicon is a digital version of the Monier-Williams Sanskrit-English dictionary, a much more complete lexicon for the Sanskrit language. It derives from Thomas Malten's digitization of the Monier-Williams at K&ouml;ln University, turned into an XML databank by Jim Funderburk, and finally adapted to the HTML Heritage look and feel by Pawan Goyal. The Sanskrit Heritage dictionary is thus mirrored in the Monier-Williams, which allows compatibility of the grammatical tools. <p> The dictionary defaults to Heritage when you access the standard entry page "sanskrit.inria.fr", and to Monier-Williams if you instead invoke <a href="https://sanskrit.inria.fr/index.en.html"> "https://sanskrit.inria.fr/index.en.html"</a>. Each dictionary is accessible separately by its search page, respectively <a href="DICO/index.fr.html"> <strong><span class="red">Sanskrit Heritage</span></strong></a> and <a href="DICO/index.en.html"><strong><span class="red">Monier-Williams</span></strong></a>. </p> <p> This site offers a number of linguistic services for the Sanskrit language, such as a <a href="DICO/reader.html">Sanskrit Reader</a> that parses Sanskrit text under various formats into Sanskrit banks of tagged hypertext. Various phonological and morphological tools are also provided. </p> <p> Please visit the <a href="manual.html">Reference manual</a> to learn how to use the various facilities. </p> <p> <!--<strong><span class="red">!!! 
NEW !!!</span></strong>--> An elementary Sanskrit course for beginners, using the site resources, may be found in its French version <a href="COURSE/Lessons/lesson1.fr.html">ici</a> and in its English version <a href="COURSE/Lessons/lesson1.en.html">here.</a> It allows you to read and understand a simple text, extracted from the Vikramacarita story. An updated version of the English lesson is available <a href="COURSE/Lessons/HST_1.html">here.</a> <br> </p> <p> <strong><span class="red">!!! NEW !!!</span></strong> The story of Nala, taken from the 3rd book of the Mahābhārata, is now available in the Corpus repository, section Nala. Its French translation is available <a href="COURSE/Lessons/lesson2.fr.html">here</a> as the second lesson on using our tools. </p> <p> A more extensive course on using Hypertext Sanskrit Tools, developed jointly with the Sanskrit Studies Department of the University of Hyderabad, was taught remotely in Spring 2024. Its video content may be accessed <a href="https://sanskrit.uohyd.ac.in/IOE-2024/lecture_notes.html">here.</a> </p> <h2 class="b2"> Sanskrit Heritage dictionary in book form </h2> You may download the Heritage dictionary as a PDF document from <a href="Heritage.pdf">PDF</a>. This document is readable through Acrobat Reader, a well-known piece of software from Adobe, freely available on the Internet. Since the document is rather large, you have to allow for some delay in loading its 5 MB. This is still an ongoing effort: lexical acquisition quickly makes this document obsolete, since it grows with each version. <br> The Sanskrit Heritage dictionary is also available in an older version, in an ebook format, usable with the Babylon, Stardict or Goldendict software. Please visit the <a href="goldendict.html">Golden Sanskrit Heritage</a> page. 
<h2 class="b2"> Multilingual hyper-text dictionary </h2> <h3 class="b3"> Interactive browsing </h3> <p> The dictionary may be accessed through an indexing engine: <a href="DICO/index.fr.html"> <strong><span class="red">Héritage du sanskrit</span></strong></a> for the French dictionary, or <a href="DICO/index.en.html"> <strong><span class="red">Monier-Williams</span></strong></a> for the English one. </p> <p> Your browser must be HTML5 compliant, and for proper viewing of Sanskrit text you must have installed on your system OpenType fonts for roman transliteration with diacritics, and for devan&#257;gar&#299;. A Unicode-compliant font for devan&#257;gar&#299; with proper ligatures is Apple's Devanagari MT for Macintosh OS X stations. For Windows users, installation of the font 'Arial Unicode MS' is advised for proper rendering. </p> <!-- OBSOLETE <p> You may have to fiddle with the controls of your browser, so that the font declarations from the dictionary pages get precedence over the standard selection, and thus encoding is specified as Unicode compliant (UTF-8 encoding). </p> --> <p> Note that many words are given with their etymology as hypertext links. You may thus navigate from a word to its morphological components, down to its roots. Also, the gender declarations of the main entries are mouse-sensitive, and give you direct access to the relevant declension table. Similarly, the present class mark of the verbal roots gives access to the conjugation schemes. Also for verb entries, preverbs lead you to the correspondingly prefixed derived verbs. </p> <p> All these grammatical tools, originally developed for the Heritage dictionary, are being progressively extended to the Monier-Williams dictionary. Thus our HTML Monier-Williams offers similar declension and conjugation facilities. 
</p> <h3 class="b3"> Sanskrit made easy </h3> <p> If you want to search for a Sanskrit word without knowing its exact transliteration, go to section "Sanskrit made easy" of the index page, which allows you to search for words without knowing precise diacritics usage. For instance, search Vishnou, Siva, or the grammarian Panini. This interface is limited for the moment to the Sanskrit Heritage dictionary. </p> <h2 class="b2"> Sanskrit Grammarian <br> <img src="IMAGES/panini.jpg" alt="Panini"> </h2> <p> This interface gives the declension tables for Sanskrit substantives. Try out this <a href="DICO/grammar.html">declension engine</a> by submitting Sanskrit stems with intended gender. The same transliteration conventions as for the dictionary index apply. For instance, submit "deva" with gender Mas, or (assuming Velthuis transliteration) "devii" with gender Fem, or "brahman" with gender Neu. The fourth button, labeled "Any", may be used for the words which take their gender from the context, such as deictic personal pronouns ("aham", "tvad"), or numeral words such as "dva", "tri", etc. </p> <p> A conjugation engine for roots is also available. It handles the full present system: present indicative, imperfect, imperative and optative, as well as the passive present system, the perfect, the aorist and the future. Participial stems, absolutives and infinitives are listed as well. Some secondary conjugations (causative, intensive, desiderative) are also generated, for the full present and future systems. Try out this <a href="DICO/grammar.html#roots">conjugation engine</a> with data such as "bhuu" 1, "as" 2, "m.rj" 2, "han" 2, "haa" 3, "hu" 3, "daa" 4, "su" 5, "p.r" 6, "yuj" 7, "k.r" 8, "j~naa" 9, "cur" 10, "namas" 11. You may cascade by generating declensions of the generated participial stems. </p> <p> A word of caution is called for here. 
The only safe way to get correct inflected forms is to enter the stem and its morphological parameters consistently with their specification in the Heritage dictionary. This is especially true of roots, since they appear under various names in different Sanskrit grammars. For instance, root hū is called hū, hvā or hve according to various grammarians. Another problem is homophony. When two items have the same phonetic realization, their respective lexemes are disambiguated by an integer index, which is specific to the lexicon. Thus there are three roots named mā in the Sanskrit Heritage dictionary. They are addressed respectively (in Velthuis transliteration) as maa#1, maa#3 and maa#4. If you ask for the conjugated forms of maa in present classes 2 or 3, the system will guess you mean maa#1 (to measure). But if you mean maa#3 (to mow) or maa#4 (to exchange), you have to explicitly enter the disambiguated stem maa#3 or maa#4. Entering an arbitrary stem and arbitrary morphology parameters may yield unpredictable results or error messages. </p> <h2 class="b2"> Lemmatizer </h2> <p> Conversely, a <a href="DICO/index.html#stemmer">lemmatizer</a> attempts to tag inflected words. Try for instance (in Velthuis format) "devaat", "jagmivaan", "a.s.tau" (selecting Noun) or "apibat", "akaar.siit", "dudoha", "vaahyate" (selecting Verb). This lemmatizer knows about inflected forms of derived stems in some secondary derivations. For instance, "darzayi.syati" is found as conjugated form: { ca. fut. ac. sg. 3 }[dṛś_1], "dariid.rzyate" yields { int. pr. md. sg. 3 }[dṛś_1], "did.rk.sate" yields { des. pr. md. sg. 3 }[dṛś_1] and "bibhik.se" yields { des. pft. md. sg. 3 | des. pft. md. sg. 1 }[bhaj]. Please note the multitag notation of this ambiguous form. </p> Other lexical categories are available, such as Part for participles. For instance, "bhikṣitavyānām" (selecting IAST transliteration and the Part lexical category) yields { g. pl. f. | g. pl. n. | g. pl. m. }[bhikṣitavya { des. pfp. 
[3] }[bhaj]]. <p> The various grammatical abbreviations used in these lemmas are available <a href="abrevs.pdf">here</a>. N.B. Do not attempt to lemmatize verbal forms with preverbs: this will not work, since the lemmatizer only knows how to invert root forms. Lemmatizing more complex forms is possible through the Sanskrit Reader interface, as explained in the manual. <h2 class="b2"> Morphology </h2> A dictionary of inflected forms of Sanskrit words is provided in XML form under various transliteration schemes. Please visit the <a href="xml.html">Sanskrit linguistic resources page</a>. <!-- OBSOLETE This resource may now be downloaded as a git repository, using command:<br/> <span class="Green"> git clone https://gitlab.inria.fr/huet/Heritage_Resources.git </span> --> <a id="reader"></a> <h2 class="b2"> Sanskrit Reader </h2> <p> The main tool provided by this site is a <span class="Green">Sanskrit Reader</span> that allows machine-assisted analysis of Sanskrit sentences, that is, segmentation (including sandhi viccheda), morphological tagging, and several parsers. Please consult the <a href="manual.html">Reference manual</a> to learn how to use these tools. <!-- Try our interactive <a href="DICO/reader.html">Sanskrit Reader</a>. It is able to segment simple sentences. Try for instance to segment "tryambaka.myajaamahesugandhi.mpu.s.tivardhanam" (we assume Velthuis transliteration here). Then push the "Tagging" button and get the fully tagged sentence. You will see two segmentations, one with an identified compound form "tri-ambakam", the second with a compounded segment "tryambakam". Note that each segment is indicated with a lemma giving its stem and the set of morphological parameters that may generate the segment form from its stem. The stem is hyperlinked to the dictionary of choice. </p><p> Note also that segments are separated by phonological information in the shape of a sandhi rule, justifying correct obtention of the original sentence by successive sandhi application. 
For instance, solution 1 explains the compound "tryambakam" as the sandhi of segments "tri" and "ambakam" by rule‹<span class="Magenta">i</span><span class="Green">|</span><span class="Magenta">a</span><span class="Blue"> → </span><span class="Red">ya</span>›. </p> <p> The reader may be helped by inserting blanks in the input at word junction. For instance, the above mantra may be entered as "tryambaka.m yajaamahe sugandhi.m pu.s.tivardhanam". But compounds should stay in one piece. Spaces are also needed for hiatus, in sentences such as: "tacchrutvaasa~njaya uvaaca". </p> <p> Many options are provided in the menu of the Reader page. For instance, clicking on the Unsandhied button we may present text in <i>padapāṭha</i> form, where each chunk is in terminal sandhi form. For instance "tryambakam yajaamahe sugandhim pu.s.tivardhanam". </p> <p> Two strengths of the Reader are provided. The Simplified mode, offered as a default, does not recognize vocatives. The Complete mode is more powerful, using the full range of participles of verbs, privative compounds, etc. It may however return so many solutions that listing all solutions is impractical, and other facilities must be used. </p> <p> The grammar used to recognize sentences is explained as a local automaton state transition graph <a href="IMAGES/lexer17.jpg">Lexer automaton</a>. This is actually a simplification of the segmenter automaton control. A simpler one, close to the Simplified mode of the reader, is <a href="IMAGES/lexer10.jpg">Simplified automaton</a>. A fuller one, close to the Complete mode of the reader, is <a href="IMAGES/lexer40.jpg">Complete automaton</a>. The color codes of these diagrams explain the output conventions of the tags. </p> <p> In these diagrams, transparent nodes are non generative, and colored nodes correspond to the lexical categories recognized by the lemmatizer. 
The category Auxi is the subset of Verb consisting of conjugated forms of roots "k.r", "as" and "bhuu" used as auxiliaries in periphrastic constructions. Pv denotes sequences of preverbs. </p> <h2 class="b2"> Sanskrit Parser </h2> <p> Two parsers are currently in use with the Sanskrit Heritage Platform. One is a shallow parser, available using the "Parsing" button, which appears when there are not too many remaining solutions. </p> <p> It is naive, but may be of use for beginners. For instance, try "dvaadazabhirvar.sairvyaakara.na.mzruuyate", checking the "Parsing" button. It returns a unique solution among the 8 possible segmentations. </p> <p> Each solution returned with the parser is marked with a green check sign, which may be pressed to get the semantic analysis of the sentence in terms of roles (<i>kāraka</i>). </p><p> The parser recognizes sentences. It may be made to recognize nominal phrases, provided one presses the "Optional topic" button with the intended gender. You may for instance analyze the compound: "pravaran.rpamuku.tama.nimariicima~njariicayacarcitacara.nayugala.h" as a masculine nominal. Alternatively, one can ask to recognize this form as a single word, by pressing "Word" rather than the default "Sentence" text category. When breaking the text with spaces, the Word mode allows to recognize texts given in <i>padapāṭha</i> fashion. It is also possible to recognize sequences of chunks in final sandhi form separated by spaces, where sandhi will be assumed to be undone between the chunks, by specifying the "Unsandhied" mode in the reader interface. </p> <p> Another dependency parser is under development at University of Hyderabad; it may be accessed from the Heritage segmenter, seen as a plugin. More documentation on these facilities are described in the <a href="manual.html">Reference manual</a>. 
<h2 class="b2"> Sanskrit Tagger </h2> <p> The semantic analysis may be still ambiguous, since a given segment may be decorated by several morphological categories. All interpretations are presented under the role matrix, sorted by increasing penalty. Check for your favorite interpretation in this list, and select it by clicking on its green heart symbol. The system will return the corresponding unambiguously tagged sentence, as a page which you may save on your own station. Iterating this process allows you to progressively tag a Sanskrit text with the Sanskrit reader assistance. </p> <p> Alternatively, you may select the ambiguous morphology choices, each being provided with a selection button. Selections are chosen by default at the first choice, but you may override this default and choose manually e.g. the genders of nominals. When your choice is finalized, just click on the "Submit" button and you will get the corresponding deterministically tagged sentence. This tool is useful for semi-manual corpus annotation. </p> <h2 class="b2"> Summary mode <span class="Red"></span> </h2> <p> Now that you are more familiar with using the various modes of the Reader on small Sanskrit sentences, it is time to try to analyse more complex sentences. Obviously the listing of all solutions is out of the question with long sentences in Complete mode. A new visual interface is offered for semi-automatic segmentation. This new Summary mode is actually now proposed as a default in the Reader page. </p> <p> Try for instance "satya.mbruuyaatpriya.mbruuyaannabruuyaatsatyamapriya.mpriya.mcanaan.rtambruuyaade.sadharma.h sanaatana.h". The display presents a summary of the union of all solutions, as a chart of segments aligned on their respective input contribution. You see at the right end the segment <i>sanātanas</i> proposed first, on top of a forest of smaller words combinations. Click on the green check sign below it. 
The check sign becomes blue, and the forest of irrelevant combinations vanishes. Do the same under the satyam segments, then under apriyam, all segments presented as top candidates. Now choose the particle ca (and thus na), and the blue pronominal segment eṣa. Now only one choice remains, between brūyāt and brūyām. Clicking on the first one will finish the job. Indeed only one solution remains, as indicated by the "Unique Solution" link. Clicking on its check sign, you are now viewing the same output as given by the Reader in Tagging mode, but constrained to use only segments checked in the Summary. </p> <h2 class="b2"> Other Sanskrit Resources </h2> <p> We have on on-going cooperation with the Department of Sanskrit Studies of the University of Hyderabad and the Computer Science Department of the Indian Institute of Technology at Kharagpur on computational linguistics for Sanskrit. A joint research team has been formed, cooperating with scholars from the Sanskrit Library. This team is actively developing cooperating multi-platform Web services. </p> <p> In october 2007 we organized the First International Sanskrit Computational Linguistics Symposium. Please visit the <a href= "Symposium/">Symposium Site</a>. This was followed by the Second Symposium in may 2008 at <a href="http://sanskritlibrary.org/Symposium/">Brown University</a>, by a third one in january 2009 at <a href="http://sanskrit.uohyd.ernet.in/Symposium/">Hyderabad University</a>, a fourth one in december 2010 at <a href="http://sanskrit.jnu.ac.in/conf/4iscls/index.jsp">JNU</a>. a fifth one in january 2013 at <a href="http://sites.google.com/site/5isclc2013/">IIT Bombay</a>. </p> A workshop on <a href="http://sanskrit.uohyd.ernet.in/Symposium/">Bridging the gap between Sanskrit computational linguistics tools and management of Sanskrit digital libraries</a> was organized in December 2016 at Banaras Hindu University, at the occasion of the ICON 2016 conference. 
The computational tools for Sanskrit developed at University of Hyderabad are available here as a <a href= "~anusaaraka/">Mirror Site</a>. <div class="center"> <img src="IMAGES/yinyang.gif" alt="Yinyang"> </div> --> <h2 class="b2"><img src="IMAGES/JoeCaml.png" alt="Cool Joe Caml"> The Zen Library</h2> <p> This site reflects an ongoing project of Sanskrit processing on a comprehensive software platform. The project is based on a structured lexicographic database, compiled from the Sanskrit Heritage dictionary, and on the Zen computational linguistics toolkit. This toolkit is a library of programs implemented in the <a href="http://ocaml.org">Objective Caml</a> programming language. The Zen library and its documentation are available as free software under the GNU Lesser General Public License (LGPL) from the <a href="https://gitlab.inria.fr/huet/Zen.git">Zen GitLab site.</a> </p> <!-- Forum closed Please visit the <a href="http://sanskrit.inria.fr/zf/">Zen Forum</a> for announcements and discussions concerning the ZEN toolkit. --> <!-- Portal legacy closed <h2 class="b2"><img src="IMAGES/ganesh.jpg" alt="Ganesh"> The Sanskrit Portal</h2> Please visit our <a href="portal.html">Sanskrit Portal</a> to find links to other Sanskrit resources. --> <p> <!-- If you are reading this from a mirror site, don't forget to regularly update this server with the development Git site "https://gitlab.inria.fr/huet/Heritage_Platform". --> <!-- Credits for art work on old versions - OBSOLETE <h2 class="b2"><img src="IMAGES/om1.jpg" alt="Om"> Artwork credits</h2> <span class="green">Orissan artwork at this site courtesy of Shauraj Rath. © Screenex, Bhubaneshwar, Ekamra, Orissa. All rights reserved. </span><br> <span class="green">Wallpaper om images courtesy of <a href="http://www.vishvarupa.com/aum-om-omkara-pranava.html">Vishvarupa.com</a>. </span><br> <span class="green">Ganesh wallpaper courtesy of <a href="http://www.math-info.univ-paris5.fr/~patte/">François Patte</a>. 
</span><br> <span class="green">Shri Yantra design © <a href="MAGES/Yantra.jpg">Gérard Huet</a> 1990.<br> </span> --> </td></tr> </table> <!-- body --> <table class="pad60"> <!--padding for bandeau --> <tr><td></td></tr></table> <div class="enpied"> <table class="bandeau"><tr><td> <a href="http://ocaml.org"> <img src="IMAGES/icon_ocaml.png" alt="Objective Caml" height="50"></a> </td><td> <table class="center"> <tr><td> <a href="index.html"><strong>Top</strong></a> | <a href="DICO/index.fr.html"><strong>Index</strong></a> | <!-- <a href="DICO/index.fr.html#stemmer"><strong>Stemmer</strong></a> | --> <a href="DICO/grammar.fr.html"><strong>Grammar</strong></a> | <a href="DICO/sandhi.fr.html"><strong>Sandhi</strong></a> | <a href="DICO/reader.fr.html"><strong>Reader</strong></a> | <a href="DICO/corpus.fr.html"><b>Corpus</b></a> <!-- | OBS <a href="faq.fr.html"><strong>Help</strong></a> | <a href="portal.fr.html"><strong>Portal</strong></a> --> </td></tr><tr><td>© Gérard Huet 1994-2025</td></tr></table></td><td> <a href="http://www.inria.fr/"> <img src="IMAGES/logo_inria.png" alt="Logo Inria" height="50"></a> <br></td></tr></table></div> </body> </html>
