<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html> <head> <title>The Xapian Project : Features</title> <link rel="stylesheet" type="text/css" media="print" href="print.css"> <link rel="icon" href="/apple-touch-icon-precomposed.png" sizes="180x180"> <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"> <link rel="icon" href="/favicon.ico" type="image/x-icon"> <meta name="viewport" content="width=device-width, initial-scale=1"> <style type="text/css" media="screen">  </style> <link rel="canonical" href="https://xapian.org/features"> </head> <body> <div id="Content"> <h1>Features</h1> <p>Noteworthy features of Xapian include:</p> <ul> <li> Free Software/Open Source - licensed under the GPL. <li> Supports Unicode 9.0 (including codepoints beyond the BMP), and stores indexed text in UTF-8. <li> <a HREF="https://trac.xapian.org/wiki/SupportedPlatforms">Highly portable</a> - runs on Android, Linux, macOS, FreeBSD, NetBSD, OpenBSD, Solaris, HP-UX, AIX and probably other Unix platforms; as well as Microsoft Windows. <li> Written in C++, with bindings allowing use from <a href="/docs/bindings/">many other languages</a>. <li> Ranked search (so the most relevant documents are more likely to come near the top of the results list) with built-in support for multiple models from the Probabilistic, Divergence from Randomness, and Language Modelling families of weighting models. Custom user-supplied weighting models are also supported. <li> Relevance feedback - given one or more documents, Xapian can suggest the most relevant index terms to expand a query, suggest related documents, categorise documents, etc. <li> Phrase and proximity searching - users can search for words occurring in an exact phrase or within a specified number of words, either in a specified order, or in any order. <li> Full range of structured boolean search operators ("stock NOT market", etc). 
The results of the boolean search are ranked by the weighting model, and boolean filters can also be applied (which don't themselves contribute to a document's weight). <li> Supports stemming of search terms (e.g. a search for "football" would match documents which mention "footballs" or "footballer"). This helps to find relevant documents which might otherwise be missed. <a href="http://snowballstem.org/">Snowball stemmers</a> are currently included for Arabic, Armenian, Basque, Catalan, Danish, Dutch, English, Finnish, French, German, Hungarian, Indonesian, Irish, Italian, Lithuanian, Nepali, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish, Tamil and Turkish. <li> Wildcard search is supported (e.g. "xap*"). <li> Synonyms are supported, both explicitly (e.g. "~cash") and as an automatic form of query expansion. <li> Snippets can be dynamically generated from matching documents, with matching words, phrases and wildcards highlighted. <li> Xapian can suggest spelling corrections for user-supplied queries. This is based on words which occur in the data being indexed, so it works even for words which wouldn't be found in a dictionary (e.g. "xapian" would be suggested as a correction for "xapain"). <li> <a href="/docs/facets">Faceted search</a> is supported. <li> Supports database files > 2GB - essential for <A HREF="docs/scalability.html">scaling to large document collections</A>. <li> Platform-independent data formats - you can build a database on one machine and search it on another. <li> Allows simultaneous update and searching. New documents become searchable right away. 
</ul> <p>As well as the library, we supply a number of small example programs, and a larger application - an indexing and CGI search application called Omega:</p> <ul> <li> The indexer supplied can index HTML, PHP, PDF, PostScript, LibreOffice/OpenOffice/StarOffice, OpenDocument, Microsoft Word/Excel/PowerPoint/Publisher/Visio/Works/XPS, Microsoft Outlook saved messages, Apple iWork, WordPerfect, AbiWord, RTF, DVI, Perl POD documentation, CSV, SVG, reStructuredText, Markdown, MAFF, MHTML, Atom feeds, dejavu, RFC822 mail messages (.eml), vCard, RPM packages, Debian packages, and plain text. Adding support for indexing other formats is easy where conversion filters are available. <li> You can also index data from any SQL or other RDBMS supported by the <A HREF="http://dbi.perl.org/">Perl DBI module</A>. That includes MySQL, PostgreSQL, SQLite, Oracle, DB2, MS SQL, LDAP, and ODBC. <li> CGI search front-end supplied with highly customisable appearance. This can also be customised to output results in JSON, XML or CSV, which is useful if you just want raw search results which you can process in your own page layout code for dynamically generated pages, or for integrating search into an AJAX front-end. </ul> </div> <div id="Menu"> <a href=".">home</a><br> <span id="current">features</span><br> <a href="history">history</a><br> <a href="lists">mailing lists</a><br> <a href="docs/">docs</a><br> <a href="users">current users</a><br> <a href="sponsors">sponsors</a><br> <a href="support">commercial support</a><br> <a href="download">download</a><br> <a href="bleeding">bleeding edge</a><br> <a href="bugs">bugs</a><br> <a href="contact">contact us</a><br> <a href="search">search this website</a><br> <form method="GET" action="search"><div><input type="search" name="P" size="14"></div></form> </div> </body> </html>