<!DOCTYPE html> <html lang="en"><head><meta charset="utf-8"><link rel=canonical href='https://https://www.regular-expressions.info//pcre.html'><title>PCRE Open Source Library for Perl Compatible Regular Expressions</title> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="author" content="Jan Goyvaerts"> <meta name="description" content="PCRE (Perl Compatible Regular Expressions) is an open source library written in C that allows developers to add regular expression support to their applications. The library is compatible with a great number of C compilers and operating systems."> <meta name="keywords" content="pcre, c, cpp, open, source, free, library, lib"> <link rel=stylesheet href="regex.css" type="text/css"><script src="theme.js" type="text/javascript"></script><link rel="alternate" type="application/rss+xml" title="New at Regular-Expressions.info" href="updates.xml"> </head> <body bgcolor=white text=black> <div id=top></div> <div id=btntop><div id=btngrid><a href="quickstart.html" target="_top"><div>Quick Start</div></a><a href="tutorial.html" target="_top"><div>Tutorial</div></a><a href="tools.html" target="_top"><div>Tools & Languages</div></a><a href="examples.html" target="_top"><div>Examples</div></a><a href="refflavors.html" target="_top"><div>Reference</div></a><a href="books.html" target="_top"><div>Book Reviews</div></a></div></div> <div id=contents><div id=side> <TABLE CLASS=side CELLSPACING=0 CELLPADDING=4><TR><TD CLASS=sideheader>Regex Tools</TD></TR><TR><TD><A HREF="grep.html" TARGET=_top>grep</A></TD></TR><TR><TD><A HREF="powergrep.html" TARGET=_top>PowerGREP</A></TD></TR><TR><TD><A HREF="regexbuddy.html" TARGET=_top>RegexBuddy</A></TD></TR><TR><TD><A HREF="regexmagic.html" TARGET=_top>RegexMagic</A></TD></TR> </TABLE><TABLE CLASS=side CELLSPACING=0 CELLPADDING=4><TR><TD CLASS=sideheader>General Applications</TD></TR><TR><TD><A HREF="editpadlite.html" TARGET=_top>EditPad Lite</A></TD></TR><TR><TD><A 
HREF="editpadpro.html" TARGET=_top>EditPad Pro</A></TD></TR> </TABLE><TABLE CLASS=side CELLSPACING=0 CELLPADDING=4><TR><TD CLASS=sideheader>Languages & Libraries</TD></TR><TR><TD><A HREF="boost.html" TARGET=_top>Boost</A></TD></TR><TR><TD><A HREF="delphi.html" TARGET=_top>Delphi</A></TD></TR><TR><TD><A HREF="gnu.html" TARGET=_top>GNU (Linux)</A></TD></TR><TR><TD><A HREF="groovy.html" TARGET=_top>Groovy</A></TD></TR><TR><TD><A HREF="java.html" TARGET=_top>Java</A></TD></TR><TR><TD><A HREF="javascript.html" TARGET=_top>JavaScript</A></TD></TR><TR><TD><A HREF="dotnet.html" TARGET=_top>.NET</A></TD></TR><TR><TD><A HREF="pcre.html" TARGET=_top>PCRE (C/C++)</A></TD></TR><TR><TD><A HREF="pcre2.html" TARGET=_top>PCRE2 (C/C++)</A></TD></TR><TR><TD><A HREF="perl.html" TARGET=_top>Perl</A></TD></TR><TR><TD><A HREF="php.html" TARGET=_top>PHP</A></TD></TR><TR><TD><A HREF="posix.html" TARGET=_top>POSIX</A></TD></TR><TR><TD><A HREF="powershell.html" TARGET=_top>PowerShell</A></TD></TR><TR><TD><A HREF="python.html" TARGET=_top>Python</A></TD></TR><TR><TD><A HREF="rlanguage.html" TARGET=_top>R</A></TD></TR><TR><TD><A HREF="ruby.html" TARGET=_top>Ruby</A></TD></TR><TR><TD><A HREF="stdregex.html" TARGET=_top>std::regex</A></TD></TR><TR><TD><A HREF="tcl.html" TARGET=_top>Tcl</A></TD></TR><TR><TD><A HREF="vbscript.html" TARGET=_top>VBScript</A></TD></TR><TR><TD><A HREF="vb.html" TARGET=_top>Visual Basic 6</A></TD></TR><TR><TD><A HREF="wxwidgets.html" TARGET=_top>wxWidgets</A></TD></TR><TR><TD><A HREF="xml.html" TARGET=_top>XML Schema</A></TD></TR><TR><TD><A HREF="realbasic.html" TARGET=_top>Xojo</A></TD></TR><TR><TD><A HREF="xpath.html" TARGET=_top>XQuery & XPath</A></TD></TR><TR><TD><A HREF="xregexp.html" TARGET=_top>XRegExp</A></TD></TR> </TABLE><TABLE CLASS=side CELLSPACING=0 CELLPADDING=4><TR><TD CLASS=sideheader>Databases</TD></TR><TR><TD><A HREF="mysql.html" TARGET=_top>MySQL</A></TD></TR><TR><TD><A HREF="oracle.html" TARGET=_top>Oracle</A></TD></TR><TR><TD><A 
HREF="postgresql.html" TARGET=_top>PostgreSQL</A></TD></TR> </TABLE><TABLE CLASS=side CELLSPACING=0 CELLPADDING=4><TR><TD CLASS=sideheader>More on This Site</TD></TR><TR><TD><A HREF="index.html" TARGET=_top>Introduction</A></TD></TR><TR><TD><A HREF="quickstart.html" TARGET=_top>Regular Expressions Quick Start</A></TD></TR><TR><TD><A HREF="tutorial.html" TARGET=_top>Regular Expressions Tutorial</A></TD></TR><TR><TD><A HREF="replacetutorial.html" TARGET=_top>Replacement Strings Tutorial</A></TD></TR><TR><TD><A HREF="tools.html" TARGET=_top>Applications and Languages</A></TD></TR><TR><TD><A HREF="examples.html" TARGET=_top>Regular Expressions Examples</A></TD></TR><TR><TD><A HREF="refflavors.html" TARGET=_top>Regular Expressions Reference</A></TD></TR><TR><TD><A HREF="refreplace.html" TARGET=_top>Replacement Strings Reference</A></TD></TR><TR><TD><A HREF="books.html" TARGET=_top>Book Reviews</A></TD></TR><TR><TD><A HREF="print.html" TARGET=_top>Printable PDF</A></TD></TR><TR><TD><A HREF="about.html" TARGET=_top>About This Site</A></TD></TR><TR><TD><A HREF="updates.html" TARGET=_top>RSS Feed & Blog</A></TD></TR></TABLE></DIV><div class=bodytext><div class=topad><A HREF="https://www.regexbuddy.com/pcre.html" TARGET="_top"><picture><source media="(max-width: 370px)" srcset="ads/320/rxbpcre100.png 1x, ads/320/rxbpcre150.png 1.5x, ads/320/rxbpcre200.png 2x, ads/320/rxbpcre250.png 2.5x, ads/320/rxbpcre300.png 3x, ads/320/rxbpcre350.png 3.5x, ads/320/rxbpcre400.png 4x"><source media="(max-width: 500px)" srcset="ads/360/rxbpcre100.png 1x, ads/360/rxbpcre150.png 1.5x, ads/360/rxbpcre200.png 2x, ads/360/rxbpcre250.png 2.5x, ads/360/rxbpcre300.png 3x, ads/360/rxbpcre350.png 3.5x, ads/360/rxbpcre400.png 4x"><source media="(max-width: 660px)" srcset="ads/480/rxbpcre100.png 1x, ads/480/rxbpcre150.png 1.5x, ads/480/rxbpcre200.png 2x, ads/480/rxbpcre250.png 2.5x, ads/480/rxbpcre300.png 3x, ads/480/rxbpcre350.png 3.5x, ads/480/rxbpcre400.png 4x"><source media="(max-width: 747px)" 
srcset="ads/640/rxbpcre100.png 1x, ads/640/rxbpcre150.png 1.5x, ads/640/rxbpcre200.png 2x, ads/640/rxbpcre250.png 2.5x, ads/640/rxbpcre300.png 3x, ads/640/rxbpcre350.png 3.5x, ads/640/rxbpcre400.png 4x"><img src="ads/728/rxbpcre100.png" srcset="ads/728/rxbpcre100.png 1x, ads/728/rxbpcre125.png 1.25x, ads/728/rxbpcre150.png 1.5x, ads/728/rxbpcre175.png 1.75x, ads/728/rxbpcre200.png 2x, ads/728/rxbpcre250.png 2.5x, ads/728/rxbpcre300.png 3x, ads/728/rxbpcre350.png 3.5x, ads/728/rxbpcre400.png 4x" alt="RegexBuddy—The best regex editor and tester for PCRE users!"></picture></A></div> <div class=bulb><h1>The PCRE Open Source Regex Library</h1><script type="text/javascript">showbulb();</script></div> <p>PCRE is short for Perl Compatible Regular Expressions. It is the name of an open source library written in C by Philip Hazel. The library is compatible with a great number of C compilers and operating systems. Many people have derived libraries from PCRE to make it compatible with other programming languages. The regex features included with <A HREF="php.html" TARGET="_top">PHP</A> (prior to 7.3.0), <A HREF="delphi.html" TARGET="_top">Delphi</A>, <A HREF="rlanguage.html" TARGET="_top">R</A> (prior to 4.0.0), and <A HREF="realbasic.html" TARGET="_top">Xojo (REALbasic)</A> are all based on PCRE. The library is also included with many Linux distributions as a shared .so library and a .h header file.</p> <p>Though PCRE claims to be Perl-compatible, there are more than enough differences between contemporary versions of Perl and PCRE to consider them distinct regex flavors. Recent versions of Perl have even copied features from PCRE that PCRE had copied from other programming languages before Perl had them, in an attempt to make Perl more PCRE-compatible. Today PCRE is used more widely than Perl because PCRE is part of so many libraries and applications.</p> <p>Philip Hazel has recently released a new library called <A HREF="pcre2.html" TARGET="_top">PCRE2</A>. 
The first PCRE2 release was given version number 10.00 to make a clear break with the previous PCRE 8.36. Future PCRE releases will be limited to bug fixes. New features will go into PCRE2 only. If you’re taking on a new development project, you should consider using PCRE2 instead of PCRE. But for existing projects that already use PCRE, it’s probably best to stick with PCRE. Moving from PCRE to PCRE2 requires significant changes to your source code (but not to your regular expressions).</p> <p>You can find more information about PCRE and PCRE2 at <a href="https://www.pcre.org/">https://www.pcre.org/</a>.</p> <h2>Using PCRE</h2> <p>Using PCRE is very straightforward. Before you can use a regular expression, it needs to be converted into a binary format for improved efficiency. To do this, simply call pcre_compile() passing your regular expression as a null-terminated string. The function returns a pointer to the binary format. You cannot do anything with the result except pass it to the other pcre functions.</p> <p>To use the regular expression, call pcre_exec() passing the pointer returned by pcre_compile(), the character array you want to search through, and the number of characters in the array (which need not be null-terminated). You also need to pass a pointer to an array of integers where pcre_exec() stores the results, as well as the length of the array expressed in integers. The length of the array should equal the number of <A HREF="brackets.html" TARGET="_top">capturing groups</A> you want to support, plus one (for the entire regex match), multiplied by three (!). The function returns -1 if no match could be found. Otherwise, it returns the number of capturing groups filled plus one. If there are more groups than fit into the array, it returns 0. The first two integers in the array with results contain the start of the regex match (counting bytes from the start of the subject array) and the offset of the first byte after the regex match, respectively. 
The following pairs of integers contain the start and end offsets of the capturing groups. So array[n*2] is the start of capturing group n, and array[n*2+1] is the offset of the first byte after capturing group n, with capturing group 0 being the entire regex match. The length of a group is the difference between the two offsets.</p> <p>When you are done with a regular expression, call pcre_free() with the pointer returned by pcre_compile() to prevent memory leaks.</p> <p>The original PCRE library only supports regex matching, a job it does rather well. It provides no support for search-and-replace, splitting of strings, etc. This may not seem like a major issue because you can easily do these things in your own code. The unfortunate consequence, however, is that all the programming languages and libraries that use PCRE for regex matching have their own replacement text syntax and their own idiosyncrasies when splitting strings. The new <A HREF="pcre2.html" TARGET="_top">PCRE2</A> library does support search-and-replace.</p> <a name="supportucp"></a><h2>Compiling PCRE with Unicode Support</h2> <p>By default, PCRE compiles without <A HREF="unicode.html" TARGET="_top">Unicode</A> support. If you try to use <tt>\p</tt>, <tt>\P</tt> or <tt>\X</tt> in your regular expressions, PCRE will complain it was compiled without Unicode support.</p> <p>To compile PCRE with Unicode support, you need to define the SUPPORT_UTF8 and SUPPORT_UCP conditional defines. If PCRE’s configuration script works on your system, you can easily do this by running <tt>./configure --enable-unicode-properties</tt> before running <tt>make</tt>. The <A HREF="tutorial.html" TARGET="_top">regular expressions tutorial</A> on this website assumes that you’ve compiled PCRE with these options and that all other options are set to their defaults.</p> <a name="callout"></a><h2>PCRE Callout</h2> <p>A feature unique to PCRE is the “callout”. 
If you put <TT CLASS=syntax><SPAN CLASS="regexmeta">(?C1)</SPAN></TT> through <TT CLASS=syntax><SPAN CLASS="regexmeta">(?C255)</SPAN></TT> anywhere in your regex, PCRE calls the <tt>pcre_callout</tt> function when it reaches the callout during the match attempt.</p> <h2>UTF-8, UTF-16, and UTF-32</h2> <p>By default, PCRE works with 8-bit strings, where each character is one byte. You can pass PCRE_UTF8 as the second parameter to pcre_compile() (possibly combined with other options using binary or) to tell PCRE to interpret your regular expression as a UTF-8 string. When you do this, pcre_exec() automatically interprets the subject string using UTF-8 as well.</p> <p>If you have PCRE 8.30 or later, you can enable UTF-16 support by passing <tt>--enable-pcre16</tt> to the <tt>configure</tt> script before running <tt>make</tt>. Then you can pass PCRE_UTF16 to pcre16_compile() and then do the matching with pcre16_exec() if your regular expression and subject strings are stored as UTF-16. UTF-16 uses two bytes for code points up to U+FFFF, and four bytes for higher code points. In Visual C++, wchar_t strings use UTF-16. It’s important to make sure that you do not mix the pcre_ and pcre16_ functions. The PCRE_UTF8 and PCRE_UTF16 constants are actually the same. You need to use the pcre16_ functions to get the UTF-16 version.</p> <p>If you have PCRE 8.32 or later, you can enable UTF-32 support by passing <tt>--enable-pcre32</tt> to the <tt>configure</tt> script before running <tt>make</tt>. Then you can pass PCRE_UTF32 to pcre32_compile() and then do the matching with pcre32_exec() if your regular expression and subject strings are stored as UTF-32. UTF-32 uses four bytes per character and is common for in-memory Unicode strings on Linux. It’s important to make sure that you do not mix the pcre32_ functions with the pcre16_ or pcre_ sets. Again, the PCRE_UTF8 and PCRE_UTF32 constants are the same. 
You need to use the pcre32_ functions to get the UTF-32 version.</p> <div id=cntmobi><p>| <a href='quickstart.html'>Quick Start</a> | <a href='tutorial.html'>Tutorial</a> | <a href='tools.html'>Tools & Languages</a> | <a href='examples.html'>Examples</a> | <a href='refflavors.html'>Reference</a> | <a href='books.html'>Book Reviews</a> |</p><p>| <a href='grep.html'>grep</a> | <a href='powergrep.html'>PowerGREP</a> | <a href='regexbuddy.html'>RegexBuddy</a> | <a href='regexmagic.html'>RegexMagic</a> |</p><p>| <a href='editpadlite.html'>EditPad Lite</a> | <a href='editpadpro.html'>EditPad Pro</a> |</p><p>| <a href='boost.html'>Boost</a> | <a href='delphi.html'>Delphi</a> | <a href='gnu.html'>GNU (Linux)</a> | <a href='groovy.html'>Groovy</a> | <a href='java.html'>Java</a> | <a href='javascript.html'>JavaScript</a> | <a href='dotnet.html'>.NET</a> | <a href='pcre.html'>PCRE (C/C++)</a> | <a href='pcre2.html'>PCRE2 (C/C++)</a> | <a href='perl.html'>Perl</a> | <a href='php.html'>PHP</a> | <a href='posix.html'>POSIX</a> | <a href='powershell.html'>PowerShell</a> | <a href='python.html'>Python</a> | <a href='rlanguage.html'>R</a> | <a href='ruby.html'>Ruby</a> | <a href='stdregex.html'>std::regex</a> | <a href='tcl.html'>Tcl</a> | <a href='vbscript.html'>VBScript</a> | <a href='vb.html'>Visual Basic 6</a> | <a href='wxwidgets.html'>wxWidgets</a> | <a href='xml.html'>XML Schema</a> | <a href='realbasic.html'>Xojo</a> | <a href='xpath.html'>XQuery & XPath</a> | <a href='xregexp.html'>XRegExp</a> |</p><p>| <a href='mysql.html'>MySQL</a> | <a href='oracle.html'>Oracle</a> | <a href='postgresql.html'>PostgreSQL</a> |</p></div> <div id=copyright> <P CLASS=copyright>Page URL: <A HREF="https://www.regular-expressions.info/pcre.html" TARGET="_top">https://www.regular-expressions.info/pcre.html</A><BR> Page last updated: 24 August 2021<BR> Site last updated: 06 November 2024<BR> Copyright © 2003-2024 Jan Goyvaerts. All rights reserved.</P> </div> </div> </div> </body></html>