<!DOCTYPE html> <html lang="en"><head><meta charset="utf-8"><link rel=canonical href='https://www.regular-expressions.info/shorthand.html'><title>Regexp Tutorial - Shorthand Character Classes</title> <meta name="viewport" content="width=device-width, initial-scale=1"> <meta name="author" content="Jan Goyvaerts"> <meta name="description" content="In a regular expression, shorthand character classes match a single character from a predefined set of characters."> <meta name="keywords" content=""> <link rel=stylesheet href="regex.css" type="text/css"><script src="theme.js" type="text/javascript"></script><link rel="alternate" type="application/rss+xml" title="New at Regular-Expressions.info" href="updates.xml"> </head> <body bgcolor=white text=black> <div id=top></div> <div id=contents><div class=bodytext> <div class=bulb><h1>Shorthand Character Classes</h1><script type="text/javascript">showbulb();</script></div> <p>Since certain character classes are used often, a series of shorthand character classes are available. <TT CLASS=syntax><SPAN CLASS="regexspecial">\d</SPAN></TT> is short for <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>. In most flavors that support Unicode, <TT CLASS=syntax><SPAN CLASS="regexspecial">\d</SPAN></TT> includes all digits from all scripts. Notable exceptions are <A HREF="java.html" TARGET="_top">Java</A>, <A HREF="javascript.html" TARGET="_top">JavaScript</A>, and <A HREF="pcre.html" TARGET="_top">PCRE</A>. 
These flavors support Unicode but match only ASCII digits with <TT CLASS=syntax><SPAN CLASS="regexspecial">\d</SPAN></TT>.</p> <p><TT CLASS=syntax><SPAN CLASS="regexspecial">\w</SPAN></TT> stands for “word character”. It always matches the ASCII characters <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">A-Z</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccliteral">_</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>. Notice the inclusion of the underscore and digits. In most flavors that support Unicode, <TT CLASS=syntax><SPAN CLASS="regexspecial">\w</SPAN></TT> includes many characters from other scripts. There is a lot of inconsistency about which characters are actually included. Letters and digits from alphabetic scripts and ideographs are generally included. Connector punctuation other than the underscore, and numeric symbols that aren’t digits, may or may not be included. <A HREF="xml.html" TARGET="_top">XML Schema</A> and <A HREF="xpath.html" TARGET="_top">XPath</A> even include all symbols in <TT CLASS=syntax><SPAN CLASS="regexspecial">\w</SPAN></TT>. Again, <A HREF="java.html" TARGET="_top">Java</A>, <A HREF="javascript.html" TARGET="_top">JavaScript</A>, and <A HREF="pcre.html" TARGET="_top">PCRE</A> match only ASCII characters with <TT CLASS=syntax><SPAN CLASS="regexspecial">\w</SPAN></TT>.</p> <p><TT CLASS=syntax><SPAN CLASS="regexspecial">\s</SPAN></TT> stands for “whitespace character”. Again, which characters this actually includes depends on the regex flavor. In all flavors discussed in this tutorial, it includes <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccliteral"> </SPAN><SPAN CLASS="regexccspecial">\t</SPAN><SPAN CLASS="regexccspecial">\r</SPAN><SPAN CLASS="regexccspecial">\n</SPAN><SPAN CLASS="regexccspecial">\f</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>. 
That is: <TT CLASS=syntax><SPAN CLASS="regexspecial">\s</SPAN></TT> matches a space, a tab, a carriage return, a line feed, or a form feed. Most flavors also include the vertical tab, with <A HREF="perl.html" TARGET="_top">Perl</A> (prior to version 5.18) and <A HREF="pcre.html" TARGET="_top">PCRE</A> (prior to version 8.34) being notable exceptions. In flavors that support Unicode, <TT CLASS=syntax><SPAN CLASS="regexspecial">\s</SPAN></TT> normally includes all characters from the Unicode “separator” category. <A HREF="java.html" TARGET="_top">Java</A> and <A HREF="pcre.html" TARGET="_top">PCRE</A> are exceptions once again. But <A HREF="javascript.html" TARGET="_top">JavaScript</A> does match all Unicode whitespace with <TT CLASS=syntax><SPAN CLASS="regexspecial">\s</SPAN></TT>.</p> <p>Shorthand character classes can be used both inside and outside the square brackets. <TT CLASS=syntax><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">\d</SPAN></TT> matches a whitespace character followed by a digit. <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\s</SPAN><SPAN CLASS="regexccspecial">\d</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> matches a single character that is either whitespace or a digit. When applied to <tt class=string>1 + 2 = 3</tt>, the former regex matches <tt class=match> 2</tt> (space two), while the latter matches <tt class=match>1</tt> (one). 
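</p> <p>As a minimal sketch of that difference, the snippet below runs both regexes against the same string. Python's <tt>re</tt> module is only an assumed stand-in here, and the variable names are illustrative; for this plain ASCII input, other flavors should give the same two matches.</p>
<pre>
```python
import re

text = "1 + 2 = 3"

# Sequence: one whitespace character followed by one digit.
sequence = re.search(r"\s\d", text)

# Character class: a single character that is whitespace or a digit.
single = re.search(r"[\s\d]", text)

print(repr(sequence.group()))  # ' 2'
print(repr(single.group()))    # '1'
```
</pre>
<p>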
<TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\d</SPAN><SPAN CLASS="regexccrange">a-f</SPAN><SPAN CLASS="regexccrange">A-F</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> matches a hexadecimal digit, and is equivalent to <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccrange">a-f</SPAN><SPAN CLASS="regexccrange">A-F</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> if your flavor only matches ASCII characters with <TT CLASS=syntax><SPAN CLASS="regexspecial">\d</SPAN></TT>.</p> <a name="negated"></a><h2>Negated Shorthand Character Classes</h2> <p>The above three shorthands also have negated versions. <TT CLASS=syntax><SPAN CLASS="regexspecial">\D</SPAN></TT> is the same as <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccspecial">\d</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>, <TT CLASS=syntax><SPAN CLASS="regexspecial">\W</SPAN></TT> is short for <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccspecial">\w</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> and <TT CLASS=syntax><SPAN CLASS="regexspecial">\S</SPAN></TT> is the equivalent of <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccspecial">\s</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>.</p> <p>Be careful when using the negated shorthands inside square brackets. <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\D</SPAN><SPAN CLASS="regexccspecial">\S</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> is <em>not</em> the same as <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccspecial">\d</SPAN><SPAN CLASS="regexccspecial">\s</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>. The latter matches any character that is neither a digit nor whitespace. 
It matches <tt class=match>x</tt>, but not <tt class=string>8</tt>. The former, however, matches any character that is either not a digit or not whitespace. Because no digit is whitespace and no whitespace character is a digit, every character satisfies at least one of those two conditions, so <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\D</SPAN><SPAN CLASS="regexccspecial">\S</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> matches any character: digit, whitespace, or otherwise.</p> <a name="more"></a><h2>More Shorthand Character Classes</h2> <p>While support for <TT CLASS=syntax><SPAN CLASS="regexspecial">\d</SPAN></TT>, <TT CLASS=syntax><SPAN CLASS="regexspecial">\s</SPAN></TT>, and <TT CLASS=syntax><SPAN CLASS="regexspecial">\w</SPAN></TT> is quite universal, there are some regex flavors that support additional shorthand character classes. <A HREF="perl.html" TARGET="_top">Perl</A> 5.10 introduced <TT CLASS=syntax><SPAN CLASS="regexspecial">\h</SPAN></TT> and <TT CLASS=syntax><SPAN CLASS="regexspecial">\v</SPAN></TT>. <TT CLASS=syntax><SPAN CLASS="regexspecial">\h</SPAN></TT> matches “horizontal whitespace”, which includes the tab and all characters in the “space separator” Unicode category. It is the same as <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\t</SPAN><SPAN CLASS="regexccspecial">\p{Zs}</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>. <TT CLASS=syntax><SPAN CLASS="regexspecial">\v</SPAN></TT> matches “vertical whitespace”, which includes all characters treated as line breaks in the Unicode standard. 
It is the same as <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\n</SPAN><SPAN CLASS="regexccspecial">\cK</SPAN><SPAN CLASS="regexccspecial">\f</SPAN><SPAN CLASS="regexccspecial">\r</SPAN><SPAN CLASS="regexccspecial">\x85</SPAN><SPAN CLASS="regexccspecial">\x{2028}</SPAN><SPAN CLASS="regexccspecial">\x{2029}</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>.</p> <p><A HREF="pcre.html" TARGET="_top">PCRE</A> also supports <TT CLASS=syntax><SPAN CLASS="regexspecial">\h</SPAN></TT> and <TT CLASS=syntax><SPAN CLASS="regexspecial">\v</SPAN></TT> starting with version 7.2. <A HREF="php.html" TARGET="_top">PHP</A> does as of version 5.2.2, <A HREF="java.html" TARGET="_top">Java</A> as of version 8, and the <A HREF="jgsoft.html" TARGET="_top">JGsoft engine</A> as of version 2.</p> <p>If your flavor supports <TT CLASS=syntax><SPAN CLASS="regexspecial">\h</SPAN></TT> and <TT CLASS=syntax><SPAN CLASS="regexspecial">\v</SPAN></TT> then you should definitely use them instead of <TT CLASS=syntax><SPAN CLASS="regexspecial">\s</SPAN></TT> whenever you want to match only one type of whitespace. Using <TT CLASS=syntax><SPAN CLASS="regexspecial">\h</SPAN></TT> instead of <TT CLASS=syntax><SPAN CLASS="regexspecial">\s</SPAN></TT> to match spaces and tabs makes sure your regex match doesn’t accidentally spill into the next line.</p> <p>In many other regex flavors, <TT CLASS=syntax><SPAN CLASS="regexspecial">\v</SPAN></TT> matches only the <A HREF="nonprint.html" TARGET="_top">vertical tab</A> character. Perl, PCRE, and PHP never supported this, so they were free to give <TT CLASS=syntax><SPAN CLASS="regexspecial">\v</SPAN></TT> a different meaning. Java 4 to 7 and JGsoft V1 did use <TT CLASS=syntax><SPAN CLASS="regexspecial">\v</SPAN></TT> to match only the vertical tab. Java 8 and JGsoft V2 changed the meaning of this token anyway. The vertical tab is also a vertical whitespace character. 
To avoid confusion, the above paragraph uses <TT CLASS=syntax><SPAN CLASS="regexspecial">\cK</SPAN></TT> to represent the vertical tab.</p> <p><A HREF="boost.html" TARGET="_top">Boost</A> supports <TT CLASS=syntax><SPAN CLASS="regexspecial">\h</SPAN></TT> starting with version 1.42. Boost 1.42 and later support <TT CLASS=syntax><SPAN CLASS="regexspecial">\v</SPAN></TT> as a shorthand only outside character classes. <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">\v</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> matches only the vertical tab in Boost.</p> <p><A HREF="ruby.html" TARGET="_top">Ruby</A> 1.9 and later have their own version of <TT CLASS=syntax><SPAN CLASS="regexspecial">\h</SPAN></TT>. It matches a single hexadecimal digit just like <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccrange">a-f</SPAN><SPAN CLASS="regexccrange">A-F</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT>. <TT CLASS=syntax><SPAN CLASS="regexspecial">\v</SPAN></TT> is a vertical tab in Ruby.</p> <a name="xml"></a><h2>XML Character Classes</h2> <p><A HREF="xml.html" TARGET="_top">XML Schema</A>, <A HREF="xpath.html" TARGET="_top">XPath</A>, and <a href="jgsoft.html#v2">JGsoft V2</a> regular expressions support four more shorthands that aren’t supported by any other regular expression flavors. <TT CLASS=syntax><SPAN CLASS="regexspecial">\i</SPAN></TT> matches any character that may be the first character of an XML name. <TT CLASS=syntax><SPAN CLASS="regexspecial">\c</SPAN></TT> matches any character that may occur after the first character in an XML name. <TT CLASS=syntax><SPAN CLASS="regexspecial">\I</SPAN></TT> and <TT CLASS=syntax><SPAN CLASS="regexspecial">\C</SPAN></TT> are the respective negated shorthands. 
Note that the <TT CLASS=syntax><SPAN CLASS="regexspecial">\c</SPAN></TT> shorthand syntax conflicts with the <A HREF="nonprint.html" TARGET="_top">control character</A> syntax used in many other regex flavors.</p> <p>You can use these four shorthands both inside and outside character classes using the bracket notation. They’re very useful for validating XML references and values in your XML schemas. The regular expression <TT CLASS=syntax><SPAN CLASS="regexspecial">\i</SPAN><SPAN CLASS="regexspecial">\c</SPAN><SPAN CLASS="regexspecial">*</SPAN></TT> matches an XML name like <tt class=match>xml:schema</tt>.</p> <p>The regex <TT CLASS=syntax><SPAN CLASS="regexplain"><</SPAN><SPAN CLASS="regexspecial">\i</SPAN><SPAN CLASS="regexspecial">\c</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">></SPAN></TT> matches an opening XML tag without any attributes. <TT CLASS=syntax><SPAN CLASS="regexplain"></</SPAN><SPAN CLASS="regexspecial">\i</SPAN><SPAN CLASS="regexspecial">\c</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">></SPAN></TT> matches any closing tag. 
<TT CLASS=syntax><SPAN CLASS="regexplain"><</SPAN><SPAN CLASS="regexspecial">\i</SPAN><SPAN CLASS="regexspecial">\c</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">+</SPAN><SPAN CLASS="regexspecial">\i</SPAN><SPAN CLASS="regexspecial">\c</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">=</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexnest2">(</SPAN><SPAN CLASS="regexplain">"</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccliteral">"</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">"</SPAN><SPAN CLASS="regexnest2">|</SPAN><SPAN CLASS="regexplain">'</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccliteral">'</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">'</SPAN><SPAN CLASS="regexnest2">)</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">></SPAN></TT> matches an opening tag with any number of attributes. 
Putting it all together, <TT CLASS=syntax><SPAN CLASS="regexplain"><</SPAN><SPAN CLASS="regexnest1">(</SPAN><SPAN CLASS="regexspecial">\i</SPAN><SPAN CLASS="regexspecial">\c</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexnest2">(</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">+</SPAN><SPAN CLASS="regexspecial">\i</SPAN><SPAN CLASS="regexspecial">\c</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">=</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexnest3">(</SPAN><SPAN CLASS="regexplain">"</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccliteral">"</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">"</SPAN><SPAN CLASS="regexnest3">|</SPAN><SPAN CLASS="regexplain">'</SPAN><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccspecial">^</SPAN><SPAN CLASS="regexccliteral">'</SPAN><SPAN CLASS="regexccopen">]</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">'</SPAN><SPAN CLASS="regexnest3">)</SPAN><SPAN CLASS="regexnest2">)</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexnest1">|</SPAN><SPAN CLASS="regexplain">/</SPAN><SPAN CLASS="regexspecial">\i</SPAN><SPAN CLASS="regexspecial">\c</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexnest1">)</SPAN><SPAN CLASS="regexspecial">\s</SPAN><SPAN CLASS="regexspecial">*</SPAN><SPAN CLASS="regexplain">></SPAN></TT> matches either an opening tag with attributes or a closing tag.</p> <p>No other regex flavors discussed in this tutorial support XML character classes. 
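</p> <p>Python's <tt>re</tt> module is one flavor without <tt>\i</tt> and <tt>\c</tt>. As a rough sketch, assuming names restricted to plain ASCII, the XML name regex <tt>\i\c*</tt> can be approximated with ordinary character classes. The names <tt>NAME_START</tt>, <tt>NAME_CHAR</tt>, and <tt>is_xml_name</tt> are hypothetical and exist only for this illustration.</p>
<pre>
```python
import re

# ASCII-only stand-ins for the XML shorthands (an assumption of this sketch).
NAME_START = r"[_:A-Za-z]"       # substitute for \i
NAME_CHAR = r"[-._:A-Za-z0-9]"   # substitute for \c

# Equivalent of \i\c* under the ASCII-only assumption.
xml_name = re.compile(NAME_START + NAME_CHAR + "*")


def is_xml_name(candidate):
    """Return True if the whole string is an ASCII XML name."""
    return xml_name.fullmatch(candidate) is not None


print(is_xml_name("xml:schema"))  # True
print(is_xml_name("2nd"))         # False: a digit cannot start a name
```
</pre>
<p>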
If your XML files are plain ASCII, you can use <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccliteral">_:</SPAN><SPAN CLASS="regexccrange">A-Z</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> for <TT CLASS=syntax><SPAN CLASS="regexspecial">\i</SPAN></TT> and <TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccliteral">-._:</SPAN><SPAN CLASS="regexccrange">A-Z</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT> for <TT CLASS=syntax><SPAN CLASS="regexspecial">\c</SPAN></TT>. If you want to allow all Unicode characters that the XML standard allows, then you will end up with some pretty long regexes. Instead of <TT CLASS=syntax><SPAN CLASS="regexspecial">\i</SPAN></TT> you would use:</p> <p><TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccliteral">:</SPAN><SPAN CLASS="regexccrange">A-Z</SPAN><SPAN CLASS="regexccliteral">_</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN CLASS="regexccrange">\u00C0-\u00D6</SPAN><SPAN CLASS="regexccrange">\u00D8-\u00F6</SPAN><SPAN CLASS="regexccrange">\u00F8-\u02FF</SPAN><SPAN CLASS="regexccrange">\u0370-\u037D</SPAN><SPAN CLASS="regexccrange">\u037F-\u1FFF</SPAN><SPAN CLASS="regexccrange">\u200C-\u200D</SPAN><SPAN CLASS="regexccrange">\u2070-\u218F</SPAN><SPAN CLASS="regexccrange">\u2C00-\u2FEF</SPAN><SPAN CLASS="regexccrange">\u3001-\uD7FF</SPAN><SPAN CLASS="regexccrange">\uF900-\uFDCF</SPAN><SPAN CLASS="regexccrange">\uFDF0-\uFFFD</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT></p> <p>Instead of <TT CLASS=syntax><SPAN CLASS="regexspecial">\c</SPAN></TT> you would use:</p> <p><TT CLASS=syntax><SPAN CLASS="regexccopen">[</SPAN><SPAN CLASS="regexccliteral">-.</SPAN><SPAN CLASS="regexccrange">0-9</SPAN><SPAN CLASS="regexccliteral">:</SPAN><SPAN CLASS="regexccrange">A-Z</SPAN><SPAN CLASS="regexccliteral">_</SPAN><SPAN CLASS="regexccrange">a-z</SPAN><SPAN 
CLASS="regexccspecial">\u00B7</SPAN><SPAN CLASS="regexccrange">\u00C0-\u00D6</SPAN><SPAN CLASS="regexccrange">\u00D8-\u00F6</SPAN><SPAN CLASS="regexccrange">\u00F8-\u037D</SPAN><SPAN CLASS="regexccrange">\u037F-\u1FFF</SPAN><SPAN CLASS="regexccrange">\u200C-\u200D</SPAN><SPAN CLASS="regexccspecial">\u203F</SPAN><SPAN CLASS="regexccspecial">\u2040</SPAN><SPAN CLASS="regexccrange">\u2070-\u218F</SPAN><SPAN CLASS="regexccrange">\u2C00-\u2FEF</SPAN><SPAN CLASS="regexccrange">\u3001-\uD7FF</SPAN><SPAN CLASS="regexccrange">\uF900-\uFDCF</SPAN><SPAN CLASS="regexccrange">\uFDF0-\uFFFD</SPAN><SPAN CLASS="regexccopen">]</SPAN></TT></p> <div id=copyright> <P CLASS=copyright>Page URL: <A HREF="https://www.regular-expressions.info/shorthand.html" TARGET="_top">https://www.regular-expressions.info/shorthand.html</A><BR> Page last updated: 21 May 2024<BR> Site last updated: 06 November 2024<BR> Copyright © 2003-2024 Jan Goyvaerts. All rights reserved.</P> </div> </div> </div> </body></html>