

<!DOCTYPE html> <html lang="en"><head><meta charset="utf-8"><link rel=canonical href='https://www.regular-expressions.info/named.html'><title>Regex Tutorial - Named Capturing Groups - Backreference Names</title> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="author" content="Jan Goyvaerts"> <meta name="description" content=""> <meta name="keywords" content=""> <link rel=stylesheet href="regex.css" type="text/css"><script src="theme.js" type="text/javascript"></script><link rel="alternate" type="application/rss+xml" title="New at Regular-Expressions.info" href="updates.xml"> </head> <body bgcolor=white text=black> <div id=top></div> <div id=btntop><div id=btngrid><a href="quickstart.html" target="_top"><div>Quick&nbsp;Start</div></a><a href="tutorial.html" target="_top"><div>Tutorial</div></a><a href="tools.html" target="_top"><div>Tools&nbsp;&amp;&nbsp;Languages</div></a><a href="examples.html" target="_top"><div>Examples</div></a><a href="refflavors.html" target="_top"><div>Reference</div></a><a href="books.html" target="_top"><div>Book&nbsp;Reviews</div></a></div></div> <div id=contents><div id=side> <TABLE CLASS=side CELLSPACING=0 CELLPADDING=4><TR><TD CLASS=sideheader>Regex Tutorial</TD></TR><TR><TD><A HREF="tutorial.html" TARGET=_top>Introduction</A></TD></TR><TR><TD><A HREF="tutorialcnt.html" TARGET=_top>Table of Contents</A></TD></TR><TR><TD><A HREF="characters.html" TARGET=_top>Special Characters</A></TD></TR><TR><TD><A HREF="nonprint.html" TARGET=_top>Non-Printable Characters</A></TD></TR><TR><TD><A HREF="engine.html" TARGET=_top>Regex Engine Internals</A></TD></TR><TR><TD><A HREF="charclass.html" TARGET=_top>Character Classes</A></TD></TR><TR><TD><A HREF="charclasssubtract.html" TARGET=_top>Character Class Subtraction</A></TD></TR><TR><TD><A HREF="charclassintersect.html" TARGET=_top>Character Class Intersection</A></TD></TR><TR><TD><A HREF="shorthand.html" TARGET=_top>Shorthand Character 
Classes</A></TD></TR><TR><TD><A HREF="dot.html" TARGET=_top>Dot</A></TD></TR><TR><TD><A HREF="anchors.html" TARGET=_top>Anchors</A></TD></TR><TR><TD><A HREF="wordboundaries.html" TARGET=_top>Word Boundaries</A></TD></TR><TR><TD><A HREF="alternation.html" TARGET=_top>Alternation</A></TD></TR><TR><TD><A HREF="optional.html" TARGET=_top>Optional Items</A></TD></TR><TR><TD><A HREF="repeat.html" TARGET=_top>Repetition</A></TD></TR><TR><TD><A HREF="brackets.html" TARGET=_top>Grouping &amp; Capturing</A></TD></TR><TR><TD><A HREF="backref.html" TARGET=_top>Backreferences</A></TD></TR><TR><TD><A HREF="backref2.html" TARGET=_top>Backreferences, part 2</A></TD></TR><TR><TD><A HREF="named.html" TARGET=_top>Named Groups</A></TD></TR><TR><TD><A HREF="backrefrel.html" TARGET=_top>Relative Backreferences</A></TD></TR><TR><TD><A HREF="branchreset.html" TARGET=_top>Branch Reset Groups</A></TD></TR><TR><TD><A HREF="freespacing.html" TARGET=_top>Free-Spacing &amp; Comments</A></TD></TR><TR><TD><A HREF="unicode.html" TARGET=_top>Unicode</A></TD></TR><TR><TD><A HREF="modifiers.html" TARGET=_top>Mode Modifiers</A></TD></TR><TR><TD><A HREF="atomic.html" TARGET=_top>Atomic Grouping</A></TD></TR><TR><TD><A HREF="possessive.html" TARGET=_top>Possessive Quantifiers</A></TD></TR><TR><TD><A HREF="lookaround.html" TARGET=_top>Lookahead &amp; Lookbehind</A></TD></TR><TR><TD><A HREF="lookaround2.html" TARGET=_top>Lookaround, part 2</A></TD></TR><TR><TD><A HREF="keep.html" TARGET=_top>Keep Text out of The Match</A></TD></TR><TR><TD><A HREF="conditional.html" TARGET=_top>Conditionals</A></TD></TR><TR><TD><A HREF="balancing.html" TARGET=_top>Balancing Groups</A></TD></TR><TR><TD><A HREF="recurse.html" TARGET=_top>Recursion</A></TD></TR><TR><TD><A HREF="subroutine.html" TARGET=_top>Subroutines</A></TD></TR><TR><TD><A HREF="recurseinfinite.html" TARGET=_top>Infinite Recursion</A></TD></TR><TR><TD><A HREF="recurserepeat.html" TARGET=_top>Recursion &amp; Quantifiers</A></TD></TR><TR><TD><A 
HREF="recursecapture.html" TARGET=_top>Recursion &amp; Capturing</A></TD></TR><TR><TD><A HREF="recursebackref.html" TARGET=_top>Recursion &amp; Backreferences</A></TD></TR><TR><TD><A HREF="recursebacktrack.html" TARGET=_top>Recursion &amp; Backtracking</A></TD></TR><TR><TD><A HREF="posixbrackets.html" TARGET=_top>POSIX Bracket Expressions</A></TD></TR><TR><TD><A HREF="zerolength.html" TARGET=_top>Zero-Length Matches</A></TD></TR><TR><TD><A HREF="continue.html" TARGET=_top>Continuing Matches</A></TD></TR> </TABLE><TABLE CLASS=side CELLSPACING=0 CELLPADDING=4><TR><TD CLASS=sideheader>More on This Site</TD></TR><TR><TD><A HREF="index.html" TARGET=_top>Introduction</A></TD></TR><TR><TD><A HREF="quickstart.html" TARGET=_top>Regular Expressions Quick Start</A></TD></TR><TR><TD><A HREF="tutorial.html" TARGET=_top>Regular Expressions Tutorial</A></TD></TR><TR><TD><A HREF="replacetutorial.html" TARGET=_top>Replacement Strings Tutorial</A></TD></TR><TR><TD><A HREF="tools.html" TARGET=_top>Applications and Languages</A></TD></TR><TR><TD><A HREF="examples.html" TARGET=_top>Regular Expressions Examples</A></TD></TR><TR><TD><A HREF="refflavors.html" TARGET=_top>Regular Expressions Reference</A></TD></TR><TR><TD><A HREF="refreplace.html" TARGET=_top>Replacement Strings Reference</A></TD></TR><TR><TD><A HREF="books.html" TARGET=_top>Book Reviews</A></TD></TR><TR><TD><A HREF="print.html" TARGET=_top>Printable PDF</A></TD></TR><TR><TD><A HREF="about.html" TARGET=_top>About This Site</A></TD></TR><TR><TD><A HREF="updates.html" TARGET=_top>RSS Feed &amp; Blog</A></TD></TR></TABLE></DIV><div class=bodytext><div class=bulb><h1>Named Capturing Groups and Backreferences</h1><script type="text/javascript">showbulb();</script></div> <p>Nearly all modern regular expression engines support <A HREF="brackets.html" TARGET="_top">numbered capturing groups</A> and <A HREF="backref.html" TARGET="_top">numbered backreferences</A>. Long regular expressions with lots of groups and backreferences may be hard to read. They can be particularly difficult to maintain as adding or removing a capturing group in the middle of the regex upsets the numbers of all the groups that follow the added or removed group.</p> <p><A HREF="python.html" TARGET="_top">Python’s re module</A> was the first to offer a solution: named capturing groups and named backreferences. 
<TT CLASS=syntax><SPAN CLASS="regexnest1">(?P&lt;name&gt;</SPAN><SPAN CLASS="regexplain">group</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT> captures the match of <TT CLASS=syntax><SPAN CLASS="regexplain">group</SPAN></TT> into the backreference “name”. <TT CLASS=syntax><SPAN CLASS="regexnest1">name</SPAN></TT> must be an alphanumeric sequence starting with a letter. <TT CLASS=syntax><SPAN CLASS="regexplain">group</SPAN></TT> can be any regular expression. You can reference the contents of the group with the named backreference <TT CLASS=syntax><SPAN CLASS="regexspecial">(?P=name)</SPAN></TT>. The question mark, P, angle brackets, and equals signs are all part of the syntax. Though the syntax for the named backreference uses parentheses, it’s just a backreference that doesn’t do any capturing or grouping. The <A HREF="backref.html" TARGET="_top">HTML tags example</A> can be written as <TT CLASS=syntax><SPAN CLASS="regexplain">&lt;</SPAN><SPAN CLASS="regexnest1">(?P&lt;tag&gt;</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">A-Z</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">A-Z</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexspecial">\b</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccliteral">&gt;</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">&gt;</SPAN><SPAN CLASS="regexspecial">.</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexspecial">?</SPAN><SPAN CLASS="regexplain">&lt;/</SPAN><SPAN CLASS="regexspecial">(?P=tag)</SPAN><SPAN CLASS="regexplain">&gt;</SPAN></TT>.</p> <p><A HREF="dotnet.html" TARGET="_top">.NET</A> also supports named capture. 
Microsoft’s developers invented their own syntax, rather than follow the one pioneered by Python and copied by PCRE (the only two regex engines that supported named capture at that time). <TT CLASS=syntax><SPAN CLASS="regexnest1">(?&lt;name&gt;</SPAN><SPAN CLASS="regexplain">group</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT> or <TT CLASS=syntax><SPAN CLASS="regexnest1">(?'name'</SPAN><SPAN CLASS="regexplain">group</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT> captures the match of <TT CLASS=syntax><SPAN CLASS="regexplain">group</SPAN></TT> into the backreference “name”. The named backreference is <TT CLASS=syntax><SPAN CLASS="regexspecial">\k&lt;name&gt;</SPAN></TT> or <TT CLASS=syntax><SPAN CLASS="regexspecial">\k'name'</SPAN></TT>. Compared with Python, there is no P in the syntax for named groups. The syntax for named backreferences is more similar to that of numbered backreferences than to what Python uses. You can use single quotes or angle brackets around the name. This makes absolutely no difference in the regex. You can use both styles interchangeably. The syntax using angle brackets is preferable in programming languages that use single quotes to delimit strings, while the syntax using single quotes is preferable when adding your regex to an XML file, as this minimizes the amount of escaping you have to do to format your regex as a literal string or as XML content.</p> <p>Because Python and .NET introduced their own syntax, we refer to these two variants as the “Python syntax” and the “.NET syntax” for named capture and named backreferences. Today, many other regex flavors have copied this syntax.</p> <p><A HREF="perl.html" TARGET="_top">Perl 5.10</A> added support for both the Python and .NET syntax for named capture and backreferences. It also adds two more syntactic variants for named backreferences: <TT CLASS=syntax><SPAN CLASS="regexspecial">\k{one}</SPAN></TT> and <TT CLASS=syntax><SPAN CLASS="regexspecial">\g{two}</SPAN></TT>. 
There’s no difference between the five syntaxes for named backreferences in Perl. All can be used interchangeably. In the replacement text, you can interpolate the variable <tt class=code>$+{name}</tt> to insert the text matched by a named capturing group.</p> <p><A HREF="pcre.html" TARGET="_top">PCRE 7.2</A> and later support all the syntax for named capture and backreferences that Perl 5.10 supports. Old versions of PCRE supported the Python syntax, even though that was not “Perl-compatible” at the time. Languages like <A HREF="php.html" TARGET="_top">PHP</A>, <A HREF="delphi.html" TARGET="_top">Delphi</A>, and <A HREF="rlanguage.html" TARGET="_top">R</A> that implement their regex support using PCRE also support all this syntax. Unfortunately, neither PHP nor R supports named references in the replacement text. You’ll have to use numbered references to the named groups. PCRE does not support search-and-replace at all.</p> <p><A HREF="java.html" TARGET="_top">Java 7</A> and <A HREF="xregexp.html" TARGET="_top">XRegExp</A> copied the .NET syntax, but only the variant with angle brackets. <A HREF="ruby.html" TARGET="_top">Ruby 1.9</A> supports both variants of the .NET syntax. The <A HREF="jgsoft.html" TARGET="_top">JGsoft flavor</A> supports the Python syntax and both variants of the .NET syntax.</p> <a name="boost"></a><p><A HREF="boost.html" TARGET="_top">Boost 1.42</A> and later support named capturing groups using the .NET syntax with angle brackets or quotes and named backreferences using the <tt>\g</tt> syntax with curly braces from Perl 5.10. Boost 1.47 additionally supports backreferences using the <tt>\k</tt> syntax with angle brackets and quotes from .NET. Boost 1.47 allowed these variants to multiply. Boost 1.47 allows named and numbered backreferences to be specified with <tt>\g</tt> or <tt>\k</tt> and with curly braces, angle brackets, or quotes. 
So Boost 1.47 and later have six variations of the backreference syntax on top of the basic <tt>\1</tt> syntax. This puts Boost in conflict with Ruby, PCRE, PHP, R, and JGsoft which treat <tt>\g</tt> with angle brackets or quotes as a <A HREF="subroutine.html" TARGET="_top">subroutine call</A>.</p> <a name="number"></a><h2>Numbers for Named Capturing Groups</h2> <p>Mixing named and numbered capturing groups is not recommended because flavors are inconsistent in how the groups are numbered. If a group doesn’t need to have a name, make it non-capturing using the <TT CLASS=syntax><SPAN CLASS="regexnest1">(?:</SPAN><SPAN CLASS="regexplain">group</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT> syntax. In .NET you can make all unnamed groups non-capturing by setting <tt class=code>RegexOptions.ExplicitCapture</tt>. In <A HREF="delphi.html" TARGET="_top">Delphi</A>, set <tt>roExplicitCapture</tt>. With <A HREF="xregexp.html" TARGET="_top">XRegExp</A>, use the <tt>/n</tt> flag. <A HREF="perl.html" TARGET="_top">Perl</A> supports <tt>/n</tt> starting with Perl 5.22. With <A HREF="pcre.html" TARGET="_top">PCRE</A>, set <tt>PCRE_NO_AUTO_CAPTURE</tt>. The <A HREF="jgsoft.html" TARGET="_top">JGsoft flavor</A> and .NET support the <TT CLASS=syntax><SPAN CLASS="regexmeta">(?</SPAN><SPAN CLASS="regexmeta">n</SPAN><SPAN CLASS="regexmeta">)</SPAN></TT> <A HREF="modifiers.html" TARGET="_top">mode modifier</A>. If you make all unnamed groups non-capturing, you can skip this section and save yourself a headache.</p> <p>Most flavors number both named and unnamed capturing groups by counting their opening parentheses from left to right. Adding a named capturing group to an existing regex still upsets the numbers of the unnamed groups. In .NET, however, unnamed capturing groups are assigned numbers first, counting their opening parentheses from left to right, skipping all named groups. 
After that, named groups are assigned the numbers that follow by counting the opening parentheses of the named groups from left to right.</p> <p>The <A HREF="jgsoft.html" TARGET="_top">JGsoft regex engine</A> copied the Python and the .NET syntax at a time when only Python and PCRE used the Python syntax, and only .NET used the .NET syntax. Therefore it also copied the numbering behavior of both Python and .NET, so that regexes intended for Python and .NET would keep their behavior. It numbers Python-style named groups along unnamed ones, like Python does. It numbers .NET-style named groups afterward, like .NET does. These rules apply even when you mix both styles in the same regex.</p> <p>As an example, the regex <TT CLASS=syntax><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexplain">a</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexnest1">(?P&lt;x&gt;</SPAN><SPAN CLASS="regexplain">b</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexplain">c</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexnest1">(?P&lt;y&gt;</SPAN><SPAN CLASS="regexplain">d</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT> matches <tt class=match>abcd</tt> as expected. If you do a search-and-replace with this regex and the replacement <TT CLASS=syntax><SPAN CLASS="regexspecial">\1</SPAN><SPAN CLASS="regexspecial">\2</SPAN><SPAN CLASS="regexspecial">\3</SPAN><SPAN CLASS="regexspecial">\4</SPAN></TT> or <TT CLASS=syntax><SPAN CLASS="regexspecial">$1</SPAN><SPAN CLASS="regexspecial">$2</SPAN><SPAN CLASS="regexspecial">$3</SPAN><SPAN CLASS="regexspecial">$4</SPAN></TT> (depending on the flavor), you will get <tt class=string>abcd</tt>. All four groups were numbered from left to right, from one till four.</p> <p>Things are a bit more complicated with .NET. 
The regex <TT CLASS=syntax><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexplain">a</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexnest1">(?&lt;x&gt;</SPAN><SPAN CLASS="regexplain">b</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexplain">c</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexnest1">(?&lt;y&gt;</SPAN><SPAN CLASS="regexplain">d</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT> again matches <tt class=match>abcd</tt>. However, if you do a search-and-replace with <tt class=string>$1$2$3$4</tt> as the replacement, you will get <tt class=string>acbd</tt>. First, the unnamed groups <TT CLASS=syntax><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexplain">a</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT> and <TT CLASS=syntax><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexplain">c</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT> got the numbers 1 and 2. Then the named groups “x” and “y” got the numbers 3 and 4.</p> <p>In all other flavors that copied the .NET syntax the regex <TT CLASS=syntax><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexplain">a</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexnest1">(?&lt;x&gt;</SPAN><SPAN CLASS="regexplain">b</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexplain">c</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexnest1">(?&lt;y&gt;</SPAN><SPAN CLASS="regexplain">d</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT> still matches <tt class=match>abcd</tt>. But in all those flavors, except the JGsoft flavor, the replacement <TT CLASS=syntax><SPAN CLASS="regexspecial">\1</SPAN><SPAN CLASS="regexspecial">\2</SPAN><SPAN CLASS="regexspecial">\3</SPAN><SPAN CLASS="regexspecial">\4</SPAN></TT> or <TT CLASS=syntax><SPAN CLASS="regexspecial">$1</SPAN><SPAN CLASS="regexspecial">$2</SPAN><SPAN CLASS="regexspecial">$3</SPAN><SPAN CLASS="regexspecial">$4</SPAN></TT> (depending on the flavor) gets you <tt class=string>abcd</tt>. 
All four groups were numbered from left to right.</p> <p>In <A HREF="powergrep.html" TARGET="_top">PowerGREP</A>, which uses the JGsoft flavor, named capturing groups play a special role. Groups with the same name are shared between all regular expressions and replacement texts in the same PowerGREP action. This allows text captured by a named capturing group in one part of the action to be referenced in a later part of the action. Because of this, PowerGREP does not allow numbered references to named capturing groups at all. When mixing named and numbered groups in a regex, the numbered groups are still numbered following the Python and .NET rules, like the JGsoft flavor always does.</p> <a name="duplicate"></a><h2>Multiple Groups with The Same Name</h2> <p>The <A HREF="dotnet.html" TARGET="_top">.NET framework</A> and the <A HREF="jgsoft.html" TARGET="_top">JGsoft flavor</A> allow multiple groups in the regular expression to have the same name. All groups with the same name share the same storage for the text they match. Thus, a backreference to that name matches the text that was matched by the group with that name that most recently captured something. A reference to the name in the replacement text inserts the text matched by the group with that name that was the last one to capture something.</p> <p><A HREF="perl.html" TARGET="_top">Perl</A> and <A HREF="ruby.html" TARGET="_top">Ruby</A> also allow groups with the same name. But these flavors only use smoke and mirrors to make it look like all the groups with the same name act as one. In reality, the groups are separate. In Perl, a backreference matches the text captured by the leftmost group in the regex with that name that matched something. In Ruby, a backreference matches the text captured by any of the groups with that name. 
Backtracking makes Ruby try all the groups.</p> <p>So in Perl and Ruby, you can only meaningfully use groups with the same name if they are in separate alternatives in the regex, so that only one of the groups with that name could ever capture any text. Then backreferences to that group sensibly match the text captured by the group.</p> <p>For example, if you want to match “a” followed by a digit 0..5, or “b” followed by a digit 4..7, and you only care about the digit, you could use the regex <TT CLASS=syntax><SPAN CLASS="regexplain">a</SPAN><SPAN CLASS="regexnest1">(?&lt;digit&gt;</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-5</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexspecial">|</SPAN><SPAN CLASS="regexplain">b</SPAN><SPAN CLASS="regexnest1">(?&lt;digit&gt;</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">4-7</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexnest1">)</SPAN></TT>. In these four flavors, the group named “digit” will then give you the digit 0..7 that was matched, regardless of the letter. If you want this match to be followed by c and the exact same digit, you could use <TT CLASS=syntax><SPAN CLASS="regexnest1">(?:</SPAN><SPAN CLASS="regexplain">a</SPAN><SPAN CLASS="regexnest2">(?&lt;digit&gt;</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-5</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexnest2">)</SPAN><SPAN CLASS="regexnest1">|</SPAN><SPAN CLASS="regexplain">b</SPAN><SPAN CLASS="regexnest2">(?&lt;digit&gt;</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">4-7</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexnest2">)</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexplain">c</SPAN><SPAN CLASS="regexspecial">\k&lt;digit&gt;</SPAN></TT></p> <p>PCRE does not allow duplicate named groups by default. 
PCRE 6.7 and later allow them if you turn on that option or use the <A HREF="modifiers.html" TARGET="_top">mode modifier</A> <TT CLASS=syntax><SPAN CLASS="regexmeta">(?</SPAN><SPAN CLASS="regexmeta">J</SPAN><SPAN CLASS="regexmeta">)</SPAN></TT>. But prior to PCRE 8.36 that wasn’t very useful as backreferences always pointed to the first capturing group with that name in the regex regardless of whether it participated in the match. Starting with PCRE 8.36 (and thus PHP 5.6.9 and R 3.1.3) and also in PCRE2, backreferences point to the first group with that name that actually participated in the match. Though PCRE and Perl handle duplicate groups in opposite directions the end result is the same if you follow the advice to only use groups with the same name in separate alternatives.</p> <p>Boost allows duplicate named groups. Prior to Boost 1.47 that wasn’t useful as backreferences always pointed to the last group with that name that appears before the backreference in the regex. In Boost 1.47 and later backreferences point to the first group with that name that actually participated in the match just like in PCRE 8.36 and later.</p> <p>Python, Java, and XRegExp 3 do not allow multiple groups to use the same name. Doing so will give a regex compilation error. 
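</p> <p>In Python, this compilation error can be reproduced with a few lines. The following is a minimal sketch rather than a definitive recipe: the helper names <tt class=code>compiles</tt> and <tt class=code>digit_of</tt> and the group names <tt class=code>d1</tt> and <tt class=code>d2</tt> are made up for this illustration. It checks that the duplicate-name regex from earlier in this section, rewritten in the Python syntax, is rejected by the <tt class=code>re</tt> module, and shows one possible workaround that gives each alternative its own group name.</p>

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Python reports a duplicate group name as a compilation error.
        return False


# Two groups named "digit" in separate alternatives: rejected by Python.
duplicate = r"a(?P<digit>[0-5])|b(?P<digit>[4-7])"

# Possible workaround (an assumption of this sketch, not from the text above):
# use distinct names and pick whichever group took part in the match.
distinct = r"a(?P<d1>[0-5])|b(?P<d2>[4-7])"


def digit_of(text):
    """Return the matched digit as a string, or None if text does not match."""
    m = re.fullmatch(distinct, text)
    if m is None:
        return None
    # A group that did not participate yields None, so "or" selects the other.
    return m.group("d1") or m.group("d2")


print(compiles(duplicate))  # False
print(compiles(distinct))   # True
print(digit_of("b6"))       # 6
```

<p>The workaround trades the shared name for a small amount of code after the match. Which group participated has to be decided in the calling code, since Python’s <tt class=code>re</tt> module has no built-in way to merge the two groups.</p> <p>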
XRegExp 2 allowed them, but did not handle them correctly.</p> <p>In Perl 5.10, PCRE 8.00, PHP 5.2.14, and Boost 1.42 (or later versions of these) it is best to use a <A HREF="branchreset.html" TARGET="_top">branch reset group</A> when you want groups in different alternatives to have the same name, as in <TT CLASS=syntax><SPAN CLASS="regexnest1">(?|</SPAN><SPAN CLASS="regexplain">a</SPAN><SPAN CLASS="regexnest2">(?&lt;digit&gt;</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-5</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexnest2">)</SPAN><SPAN CLASS="regexnest1">|</SPAN><SPAN CLASS="regexplain">b</SPAN><SPAN CLASS="regexnest2">(?&lt;digit&gt;</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">4-7</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexnest2">)</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexplain">c</SPAN><SPAN CLASS="regexspecial">\k&lt;digit&gt;</SPAN></TT>. With this special syntax&mdash;group opened with <TT CLASS=syntax><SPAN CLASS="regexnest1">(?|</SPAN></TT> instead of <TT CLASS=syntax><SPAN CLASS="regexnest1">(?:</SPAN></TT>&mdash;the two groups named “digit” really are one and the same group. Then backreferences to that group are always handled correctly and consistently between these flavors. 
(Older versions of PCRE and PHP may support branch reset groups, but don’t correctly handle duplicate names in branch reset groups.)</p> <div id=copyright> <P CLASS=copyright>Page URL: <A HREF="https://www.regular-expressions.info/named.html" TARGET="_top">https://www.regular-expressions.info/named.html</A><BR> Page last updated: 12 August 2021<BR> Site last updated: 06 November 2024<BR> Copyright &copy; 2003-2024 Jan Goyvaerts. All rights reserved.</P> </div> </div> </div> </body></html>
