<!DOCTYPE html> <html lang="en"><head><meta charset="utf-8"><link rel=canonical href='https://www.regular-expressions.info/charclasssubtract.html'><title>Character Class Subtraction in Regular Expressions</title> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="author" content="Jan Goyvaerts"> <meta name="description" content="Explains character class subtraction in regular expressions"> <meta name="keywords" content=""> <link rel=stylesheet href="regex.css" type="text/css"><script src="theme.js" type="text/javascript"></script><link rel="alternate" type="application/rss+xml" title="New at Regular-Expressions.info" href="updates.xml"> </head> <body bgcolor=white text=black> <div id=top></div> <div id=btntop><div id=btngrid><a href="quickstart.html" target="_top"><div>Quick Start</div></a><a href="tutorial.html" target="_top"><div>Tutorial</div></a><a href="tools.html" target="_top"><div>Tools & Languages</div></a><a href="examples.html" target="_top"><div>Examples</div></a><a href="refflavors.html" target="_top"><div>Reference</div></a><a href="books.html" target="_top"><div>Book Reviews</div></a></div></div> <div id=contents><div id=side> <TABLE CLASS=side CELLSPACING=0 CELLPADDING=4><TR><TD CLASS=sideheader>Regex Tutorial</TD></TR><TR><TD><A HREF="tutorial.html" TARGET=_top>Introduction</A></TD></TR><TR><TD><A HREF="tutorialcnt.html" TARGET=_top>Table of Contents</A></TD></TR><TR><TD><A HREF="characters.html" TARGET=_top>Special Characters</A></TD></TR><TR><TD><A HREF="nonprint.html" TARGET=_top>Non-Printable Characters</A></TD></TR><TR><TD><A HREF="engine.html" TARGET=_top>Regex Engine Internals</A></TD></TR><TR><TD><A HREF="charclass.html" TARGET=_top>Character Classes</A></TD></TR><TR><TD><A HREF="charclasssubtract.html" TARGET=_top>Character Class Subtraction</A></TD></TR><TR><TD><A HREF="charclassintersect.html" TARGET=_top>Character Class Intersection</A></TD></TR><TR><TD><A HREF="shorthand.html" 
TARGET=_top>Shorthand Character Classes</A></TD></TR><TR><TD><A HREF="dot.html" TARGET=_top>Dot</A></TD></TR><TR><TD><A HREF="anchors.html" TARGET=_top>Anchors</A></TD></TR><TR><TD><A HREF="wordboundaries.html" TARGET=_top>Word Boundaries</A></TD></TR><TR><TD><A HREF="alternation.html" TARGET=_top>Alternation</A></TD></TR><TR><TD><A HREF="optional.html" TARGET=_top>Optional Items</A></TD></TR><TR><TD><A HREF="repeat.html" TARGET=_top>Repetition</A></TD></TR><TR><TD><A HREF="brackets.html" TARGET=_top>Grouping & Capturing</A></TD></TR><TR><TD><A HREF="backref.html" TARGET=_top>Backreferences</A></TD></TR><TR><TD><A HREF="backref2.html" TARGET=_top>Backreferences, part 2</A></TD></TR><TR><TD><A HREF="named.html" TARGET=_top>Named Groups</A></TD></TR><TR><TD><A HREF="backrefrel.html" TARGET=_top>Relative Backreferences</A></TD></TR><TR><TD><A HREF="branchreset.html" TARGET=_top>Branch Reset Groups</A></TD></TR><TR><TD><A HREF="freespacing.html" TARGET=_top>Free-Spacing & Comments</A></TD></TR><TR><TD><A HREF="unicode.html" TARGET=_top>Unicode</A></TD></TR><TR><TD><A HREF="modifiers.html" TARGET=_top>Mode Modifiers</A></TD></TR><TR><TD><A HREF="atomic.html" TARGET=_top>Atomic Grouping</A></TD></TR><TR><TD><A HREF="possessive.html" TARGET=_top>Possessive Quantifiers</A></TD></TR><TR><TD><A HREF="lookaround.html" TARGET=_top>Lookahead & Lookbehind</A></TD></TR><TR><TD><A HREF="lookaround2.html" TARGET=_top>Lookaround, part 2</A></TD></TR><TR><TD><A HREF="keep.html" TARGET=_top>Keep Text out of The Match</A></TD></TR><TR><TD><A HREF="conditional.html" TARGET=_top>Conditionals</A></TD></TR><TR><TD><A HREF="balancing.html" TARGET=_top>Balancing Groups</A></TD></TR><TR><TD><A HREF="recurse.html" TARGET=_top>Recursion</A></TD></TR><TR><TD><A HREF="subroutine.html" TARGET=_top>Subroutines</A></TD></TR><TR><TD><A HREF="recurseinfinite.html" TARGET=_top>Infinite Recursion</A></TD></TR><TR><TD><A HREF="recurserepeat.html" TARGET=_top>Recursion & 
Quantifiers</A></TD></TR><TR><TD><A HREF="recursecapture.html" TARGET=_top>Recursion & Capturing</A></TD></TR><TR><TD><A HREF="recursebackref.html" TARGET=_top>Recursion & Backreferences</A></TD></TR><TR><TD><A HREF="recursebacktrack.html" TARGET=_top>Recursion & Backtracking</A></TD></TR><TR><TD><A HREF="posixbrackets.html" TARGET=_top>POSIX Bracket Expressions</A></TD></TR><TR><TD><A HREF="zerolength.html" TARGET=_top>Zero-Length Matches</A></TD></TR><TR><TD><A HREF="continue.html" TARGET=_top>Continuing Matches</A></TD></TR> </TABLE><TABLE CLASS=side CELLSPACING=0 CELLPADDING=4><TR><TD CLASS=sideheader>More on This Site</TD></TR><TR><TD><A HREF="index.html" TARGET=_top>Introduction</A></TD></TR><TR><TD><A HREF="quickstart.html" TARGET=_top>Regular Expressions Quick Start</A></TD></TR><TR><TD><A HREF="tutorial.html" TARGET=_top>Regular Expressions Tutorial</A></TD></TR><TR><TD><A HREF="replacetutorial.html" TARGET=_top>Replacement Strings Tutorial</A></TD></TR><TR><TD><A HREF="tools.html" TARGET=_top>Applications and Languages</A></TD></TR><TR><TD><A HREF="examples.html" TARGET=_top>Regular Expressions Examples</A></TD></TR><TR><TD><A HREF="refflavors.html" TARGET=_top>Regular Expressions Reference</A></TD></TR><TR><TD><A HREF="refreplace.html" TARGET=_top>Replacement Strings Reference</A></TD></TR><TR><TD><A HREF="books.html" TARGET=_top>Book Reviews</A></TD></TR><TR><TD><A HREF="print.html" TARGET=_top>Printable PDF</A></TD></TR><TR><TD><A HREF="about.html" TARGET=_top>About This Site</A></TD></TR><TR><TD><A HREF="updates.html" TARGET=_top>RSS Feed & Blog</A></TD></TR></TABLE></DIV><div class=bodytext>
<a name="subtract"></a><div class=bulb><h1>Character Class Subtraction</h1><script type="text/javascript">showbulb();</script></div> <p>Character class subtraction is supported by the <A HREF="xml.html" TARGET="_top">XML Schema</A>, <A HREF="xpath.html" TARGET="_top">XPath</A>, <A HREF="dotnet.html" TARGET="_top">.NET</A> (version 2.0 and later), and <A HREF="jgsoft.html" TARGET="_top">JGsoft</A> regex flavors. It makes it easy to match any single character present in one list (the character class), but not present in another list (the subtracted class). 
The syntax for this is <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccliteral">class</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccliteral">subtract</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>. If the character after a hyphen is an opening bracket, these flavors interpret the hyphen as the subtraction operator rather than the range operator. You can use the full character class syntax within the subtracted character class.</p> <p>The character class <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccliteral">aeiuo</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> matches a single lowercase letter that is not a vowel. In other words, it matches a single consonant. Without character class subtraction or <A HREF="charclassintersect.html" TARGET="_top">intersection</A>, the only way to do this with a single character class would be to list all consonants: <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">b-d</SPAN><SPAN CLASS="regexccrange">f-h</SPAN><SPAN CLASS="regexccrange">j-n</SPAN><SPAN CLASS="regexccrange">p-t</SPAN><SPAN CLASS="regexccrange">v-z</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>.</p> <p>The character class <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\p{Nd}</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccspecial">\p{IsThai}</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> matches any single Thai digit. The base class <tt>\p{Nd}</tt> matches any Unicode decimal digit. All non-Thai characters are subtracted from that class. 
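</p>
<p>One way to check the consonant and Thai examples is to model each class as a Python set, with set difference standing in for <tt>-[subtract]</tt>. This is only a model of what the classes match, not how any regex engine implements subtraction, and the <tt>char_range</tt> helper is hypothetical. Python's <tt>re</tt> module has no subtraction operator, so the sketch also uses a negative <A HREF="lookaround.html" TARGET="_top">lookahead</A>, <tt>(?![aeiuo])[a-z]</tt>, as a stand-in for <tt>[a-z-[aeiuo]]</tt>. For the Thai example it assumes that <tt>\p{IsThai}</tt> is the Thai block U+0E00..U+0E7F and only looks at the Basic Multilingual Plane.</p>

```python
import re
import unicodedata


def char_range(first: str, last: str) -> set:
    """Hypothetical helper: every character from first to last inclusive."""
    return {chr(cp) for cp in range(ord(first), ord(last) + 1)}


# [a-z-[aeiuo]]: the base class minus the subtracted class
consonants = char_range("a", "z") - set("aeiuo")

# [b-df-hj-np-tv-z]: the same consonants, listed without subtraction
spelled_out = (char_range("b", "d") | char_range("f", "h") | char_range("j", "n")
               | char_range("p", "t") | char_range("v", "z"))

# re has no subtraction; a negative lookahead before the base class emulates it
emulated = re.compile(r"(?![aeiuo])[a-z]")
matched = {c for c in char_range("a", "z") if emulated.fullmatch(c)}

# Assumptions: only the Basic Multilingual Plane is modeled, and \p{IsThai}
# is taken to be the Thai block U+0E00..U+0E7F.
universe = {chr(cp) for cp in range(0x10000)}
digits = {c for c in universe if unicodedata.category(c) == "Nd"}  # \p{Nd}
thai = char_range("\u0e00", "\u0e7f")                              # \p{IsThai}

# [\p{Nd}-[^\p{IsThai}]]: remove every non-Thai character from the digits
thai_digits = digits - (universe - thai)
# [\p{IsThai}-[^\p{Nd}]]: remove every non-digit from the Thai characters
thai_digits_too = thai - (universe - digits)

print(consonants == spelled_out == matched)           # True
print(thai_digits == thai_digits_too)                 # True
print(thai_digits == char_range("\u0e50", "\u0e59"))  # True
```

<p>Subtracting the complement of a set is the same as intersecting with that set, which is why the variants using <tt>^</tt> and <tt>\P</tt> in this section all describe the same ten Thai digits.</p>
<p>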
<TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\p{Nd}</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccspecial">\P{IsThai}</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> does the same. <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\p{IsThai}</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccspecial">\p{Nd}</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> and <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\p{IsThai}</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccspecial">\P{Nd}</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> also match a single Thai digit by subtracting all non-digits from the Thai characters.</p> <h2>Nested Character Class Subtraction</h2> <p>Since you can use the full character class syntax within the subtracted character class, you can subtract a class from the class being subtracted. 
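</p>
<p>Because nested subtraction is evaluated from the innermost class outward, it helps to see what a left-to-right reading would produce instead. The Python sketch below models <tt>[0-9-[0-6-[0-3]]]</tt> and the rewritten <tt>[0-9a-f-[4-6]]</tt> as sets. Like any set model, it shows what the classes match, not how an engine matches them, and the <tt>char_range</tt> helper is hypothetical.</p>

```python
def char_range(first: str, last: str) -> set:
    """Hypothetical helper: every character from first to last inclusive."""
    return {chr(cp) for cp in range(ord(first), ord(last) + 1)}


# [0-9-[0-6-[0-3]]]: the innermost subtraction happens first
inner = char_range("0", "6") - char_range("0", "3")  # 0-6 minus 0-3 leaves 4-6
nested = char_range("0", "9") - inner                # 0-9 minus 4-6

# Reading it left to right instead gives a different result,
# which is not what the flavors that support subtraction do
left_to_right = (char_range("0", "9") - char_range("0", "6")) - char_range("0", "3")

# [0-9a-f-[4-6]]: 4-6 is removed from the union of 0-9 and a-f
hex_minus = (char_range("0", "9") | char_range("a", "f")) - char_range("4", "6")

print("".join(sorted(nested)))         # 0123789
print("".join(sorted(left_to_right)))  # 789
print("".join(sorted(hex_minus)))      # 0123789abcdef
```

<p>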
<TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccrange">0-6</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccrange">0-3</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> first subtracts <tt>0-3</tt> from <tt>0-6</tt>, yielding <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccrange">4-6</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>, or <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-3</SPAN><SPAN CLASS="regexccrange">7-9</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>, which matches any character in the string <tt class=string>0123789</tt>.</p> <p>The class subtraction must always be the last element in the character class. <tt>[0-9-[4-6]a-f]</tt> is not a valid regular expression. It should be rewritten as <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccrange">a-f</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccrange">4-6</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>. The subtraction applies to the whole base class. For example, <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\p{Ll}</SPAN><SPAN CLASS="regexccspecial">\p{Lu}</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccspecial">\p{IsBasicLatin}</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> matches any uppercase or lowercase Unicode letter that is not an ASCII letter. <tt>\p{IsBasicLatin}</tt> is subtracted from the combination of <tt>\p{Ll}\p{Lu}</tt> rather than from <tt>\p{Lu}</tt> alone. 
This regex does not match any of the characters in <tt class=string>abc</tt>.</p> <p>While you can use nested character class subtraction, you cannot subtract two classes sequentially. To subtract ASCII characters and Greek characters from a class with all Unicode letters, combine the ASCII and Greek characters into one class and subtract that, as in <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\p{L}</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccspecial">\p{IsBasicLatin}</SPAN><SPAN CLASS="regexccspecial">\p{IsGreek}</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>.</p> <h2>Negation Takes Precedence over Subtraction</h2> <p>The character class <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccliteral">1234</SPAN><SPAN CLASS="regexccopen">-[</SPAN><SPAN CLASS="regexccliteral">3456</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> is both negated and subtracted from. In all flavors that support character class subtraction, the base class is negated before it is subtracted from. This class should be read as “(not 1234) minus 3456”. Thus this character class matches any character other than the digits 1, 2, 3, 4, 5, and 6.</p> <h2>Notational Compatibility with Other Regex Flavors</h2> <p>Note that a regex like <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN CLASS="regexccliteral">-[aeiuo</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexplain">]</SPAN></TT> does not cause any errors in most regex flavors that do not support character class subtraction. But it won’t match what you intended either. In most flavors, this regex consists of a character class followed by a literal <TT CLASS=syntax><SPAN CLASS="regexplain">]</SPAN></TT>. The character class matches a character that is either in the range a-z, or a hyphen, or an opening bracket, or a vowel. 
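</p>
<p>Python's <tt>re</tt> module is one such flavor. The short sketch below, assuming Python 3, shows that it reads <tt>[a-z-[aeiuo]]</tt> as a character class followed by a literal <tt>]</tt>, so a lone consonant does not match, while a vowel followed by <tt>]</tt> does.</p>

```python
import re

# No subtraction in Python's re: this is the class [a-z\-\[aeiuo]
# followed by a literal ]
pattern = re.compile(r"[a-z-[aeiuo]]")

print(bool(pattern.fullmatch("b")))    # False: the literal ] is missing
print(bool(pattern.fullmatch("b]")))   # True
print(bool(pattern.fullmatch("a]")))   # True: the vowels were not removed
print(bool(pattern.fullmatch("-]")))   # True: the hyphen is a literal
print(bool(pattern.fullmatch("[]")))   # True: so is the opening bracket

# A hyphen right after a range is also a literal in [a-z-_]
loose = re.compile(r"[a-z-_]")
print([bool(loose.fullmatch(c)) for c in ("q", "-", "_", "Q")])  # [True, True, True, False]
```

<p>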
Since the a-z range and the vowels are redundant, you could write this character class as <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN CLASS="regexccliteral">-[</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> or <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccliteral">-[</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> in Perl. A hyphen after a range is treated as a literal character, just like a hyphen immediately after the opening bracket. This is true in the XML, .NET and JGsoft flavors too. <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN CLASS="regexccliteral">-_</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> matches a lowercase letter, a hyphen or an underscore in these flavors.</p> <p>Strictly speaking, this means that the character class subtraction syntax is incompatible with Perl and the majority of other regex flavors. But in practice there’s no difference. Using non-alphanumeric characters in character class ranges is very bad practice because it relies on the order of characters in the ASCII character table. That makes the regular expression hard to understand for the programmer who inherits your work. While <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">A-[</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> would match any uppercase letter or an opening square bracket in Perl, this regex is much clearer when written as <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">A-Z</SPAN><SPAN CLASS="regexccliteral">[</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>. 
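</p>
<p>Python's <tt>re</tt> module treats <tt>[A-[]</tt> the same way Perl does, so it can be used to check this claim. The sketch below, assuming Python 3, compares which ASCII characters each of the two classes matches.</p>

```python
import re

# Like Perl, Python's re reads [A-[] as the range A (U+0041) through [ (U+005B)
cryptic = re.compile(r"[A-[]")
clear = re.compile(r"[A-Z[]")

ascii_chars = [chr(cp) for cp in range(128)]
cryptic_set = {c for c in ascii_chars if cryptic.fullmatch(c)}
clear_set = {c for c in ascii_chars if clear.fullmatch(c)}

print(cryptic_set == clear_set)      # True
print(len(clear_set))                # 27: A through Z plus [
print(hex(ord("Z")), hex(ord("[")))  # 0x5a 0x5b: [ sits right after Z
```

<p>Both classes match the same 27 characters only because <tt>[</tt> (U+005B) comes right after <tt>Z</tt> (U+005A) in ASCII. The explicit form states its intent without relying on that order.</p>
<p>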
The former regex would cause an error with the XML, .NET and JGsoft flavors, because they interpret <tt>-[]</tt> as an empty subtracted class, leaving an unbalanced <tt>[</tt>.</p> <div id=cntmobi><p>| <a href='quickstart.html'>Quick Start</a> | <a href='tutorial.html'>Tutorial</a> | <a href='tools.html'>Tools & Languages</a> | <a href='examples.html'>Examples</a> | <a href='refflavors.html'>Reference</a> | <a href='books.html'>Book Reviews</a> |</p><p>| <a href='tutorial.html'>Introduction</a> | <a href='tutorialcnt.html'>Table of Contents</a> | <a href='characters.html'>Special Characters</a> | <a href='nonprint.html'>Non-Printable Characters</a> | <a href='engine.html'>Regex Engine Internals</a> | <a href='charclass.html'>Character Classes</a> | <a href='charclasssubtract.html'>Character Class Subtraction</a> | <a href='charclassintersect.html'>Character Class Intersection</a> | <a href='shorthand.html'>Shorthand Character Classes</a> | <a href='dot.html'>Dot</a> | <a href='anchors.html'>Anchors</a> | <a href='wordboundaries.html'>Word Boundaries</a> | <a href='alternation.html'>Alternation</a> | <a href='optional.html'>Optional Items</a> | <a href='repeat.html'>Repetition</a> | <a href='brackets.html'>Grouping & Capturing</a> | <a href='backref.html'>Backreferences</a> | <a href='backref2.html'>Backreferences, part 2</a> | <a href='named.html'>Named Groups</a> | <a href='backrefrel.html'>Relative Backreferences</a> | <a href='branchreset.html'>Branch Reset Groups</a> | <a href='freespacing.html'>Free-Spacing & Comments</a> | <a href='unicode.html'>Unicode</a> | <a href='modifiers.html'>Mode Modifiers</a> | <a href='atomic.html'>Atomic Grouping</a> | <a href='possessive.html'>Possessive Quantifiers</a> | <a href='lookaround.html'>Lookahead & Lookbehind</a> | <a href='lookaround2.html'>Lookaround, part 2</a> | <a href='keep.html'>Keep Text out of The Match</a> | <a href='conditional.html'>Conditionals</a> | <a href='balancing.html'>Balancing Groups</a> | <a 
href='recurse.html'>Recursion</a> | <a href='subroutine.html'>Subroutines</a> | <a href='recurseinfinite.html'>Infinite Recursion</a> | <a href='recurserepeat.html'>Recursion & Quantifiers</a> | <a href='recursecapture.html'>Recursion & Capturing</a> | <a href='recursebackref.html'>Recursion & Backreferences</a> | <a href='recursebacktrack.html'>Recursion & Backtracking</a> | <a href='posixbrackets.html'>POSIX Bracket Expressions</a> | <a href='zerolength.html'>Zero-Length Matches</a> | <a href='continue.html'>Continuing Matches</a> |</p></div> <div id=copyright> <P CLASS=copyright>Page URL: <A HREF="https://www.regular-expressions.info/charclasssubtract.html" TARGET="_top">https://www.regular-expressions.info/charclasssubtract.html</A><BR> Page last updated: 22 November 2019<BR> Site last updated: 06 November 2024<BR> Copyright © 2003-2024 Jan Goyvaerts. All rights reserved.</P> </div> </div> </div> </body></html>