When using xpressive, the first thing you'll do is create a basic_regex<> object.
The feature that really sets xpressive apart from other C/C++ regular expression libraries is the ability to author a regular expression using C++ expressions. xpressive achieves this through operator overloading, using a technique called expression templates to embed a mini-language dedicated to pattern matching within C++. These "static regexes" have many advantages over their string-based brethren. In particular, static regexes:
Since we compose static regexes using C++ expressions, we are constrained by the rules for legal C++ expressions. Unfortunately, that means that "classic" regular expression syntax cannot always be mapped cleanly into C++. Rather, we map the regex constructs, picking new syntax that is legal C++.
You create a static regex by assigning one to an object of type basic_regex<>. For instance, the following defines a regex that can be used to find patterns in objects of type std::string:
        
sregex re = '$' >> +_d >> '.' >> _d >> _d;
Assignment works similarly.
          In static regexes, character and string literals match themselves. For
          instance, in the regex above, '$'
          and '.' match the characters
          '$' and '.'
          respectively. Don't be confused by the fact that $ and
          . are meta-characters in Perl. In xpressive, literals
          always represent themselves.
        
When using literals in static regexes, you must take care that at least one operand is not a literal. For instance, the following are not valid regexes:
sregex re1 = 'a' >> 'b'; // ERROR!
sregex re2 = +'a';       // ERROR!
          The two operands to the binary >>
          operator are both literals, and the operand of the unary + operator is also a literal, so these statements
          will call the native C++ binary right-shift and unary plus operators, respectively.
          That's not what we want. To get operator overloading to kick in, at least
          one operand must be a user-defined type. We can use xpressive's as_xpr()
          helper function to "taint" an expression with regex-ness, forcing
          operator overloading to find the correct operators. The two regexes above
          should be written as:
        
sregex re1 = as_xpr('a') >> 'b'; // OK
sregex re2 = +as_xpr('a');       // OK
          As you've probably already noticed, sub-expressions in static regexes must
          be separated by the sequencing operator, >>.
          You can read this operator as "followed by".
        
// Match an 'a' followed by a digit
sregex re = 'a' >> _d;
          Alternation works just as it does in Perl with the |
          operator. You can read this operator as "or". For example:
        
// match a digit character or a word character one or more times
sregex re = +( _d | _w );
          In Perl, parentheses () have
          special meaning. They group, but as a side-effect they also create back-references
          like $1 and $2. In C++, parentheses
          only group -- there is no way to give them side-effects. To get the same
          effect, we use the special s1,
          s2, etc. tokens. Assigning
          to one creates a back-reference. You can then use the back-reference later
          in your expression, like using \1 and \2
          in Perl. For example, consider the following regex, which finds matching
          HTML tags:
        
"<(\\w+)>.*?</\\1>"
In static xpressive, this would be:
'<' >> (s1= +_w) >> '>' >> -*_ >> "</" >> s1 >> '>'
          Notice how you capture a back-reference by assigning to s1,
          and then you use s1 later
          in the pattern to find the matching end tag.
        
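| Tip |
|---|
| Grouping without capturing a back-reference: in xpressive, plain parentheses () used without s1 group a sub-expression but do not capture it. That is the equivalent of Perl's (?:) non-capturing grouping construct. |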
          Perl lets you make part of your regular expression case-insensitive by
          using the (?i:) pattern modifier. xpressive also has
          a case-insensitivity pattern modifier, called icase.
          You can use it as follows:
        
sregex re = "this" >> icase( "that" );
          In this regular expression, "this"
          will be matched exactly, but "that"
          will be matched irrespective of case.
        
          Case-insensitive regular expressions raise the issue of internationalization:
          how should case-insensitive character comparisons be evaluated? Also, many
          character classes are locale-specific. Which characters are matched by
          digit and which are matched
          by alpha? The answer depends
          on the std::locale object the regular expression
          object is using. By default, all regular expression objects use the global
          locale. You can override the default by using the imbue() pattern modifier, as follows:
        
std::locale my_locale = /* initialize a std::locale object */;
sregex re = imbue( my_locale )( +alpha >> +digit );
          This regular expression will evaluate alpha
          and digit according to
          my_locale. See the section
          on Localization
          and Regex Traits for more information about how to customize the
          behavior of your regexes.
        
The table below lists the familiar regex constructs and their equivalents in static xpressive.
Table 1.4. Perl syntax vs. Static xpressive syntax
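| Perl | Static xpressive | Meaning |
|---|---|---|
| `.` | `_` | any character (assuming Perl's /s modifier). |
| `ab` | `a >> b` | sequencing of a and b sub-expressions. |
| `a\|b` | `a \| b` | alternation of a and b sub-expressions. |
| `(a)` | `(s1= a)` | group and capture a back-reference. |
| `(?:a)` | `(a)` | group and do not capture a back-reference. |
| `\1` | `s1` | a previously captured back-reference. |
| `a*` | `*a` | zero or more times, greedy. |
| `a+` | `+a` | one or more times, greedy. |
| `a?` | `!a` | zero or one time, greedy. |
| `a{n,m}` | `repeat<n,m>(a)` | between n and m times, greedy. |
| `a*?` | `-*a` | zero or more times, non-greedy. |
| `a+?` | `-+a` | one or more times, non-greedy. |
| `a??` | `-!a` | zero or one time, non-greedy. |
| `a{n,m}?` | `-repeat<n,m>(a)` | between n and m times, non-greedy. |
| `^` | `bos` | beginning of sequence assertion. |
| `$` | `eos` | end of sequence assertion. |
| `\b` | `_b` | word boundary assertion. |
| `\B` | `~_b` | not word boundary assertion. |
| `\n` | `_n` | literal newline. |
| `.` | `~_n` | any character except a literal newline (without Perl's /s modifier). |
| `\r?\n\|\r` | `_ln` | logical newline. |
| `[^\r\n]` | `~_ln` | any single character not a logical newline. |
| `\w` | `_w` | a word character, equivalent to `set[alnum \| '_']`. |
| `\W` | `~_w` | not a word character, equivalent to `~set[alnum \| '_']`. |
| `\d` | `_d` | a digit character. |
| `\D` | `~_d` | not a digit character. |
| `\s` | `_s` | a space character. |
| `\S` | `~_s` | not a space character. |
| `[:alnum:]` | `alnum` | an alpha-numeric character. |
| `[:alpha:]` | `alpha` | an alphabetic character. |
| `[:blank:]` | `blank` | a horizontal white-space character. |
| `[:cntrl:]` | `cntrl` | a control character. |
| `[:digit:]` | `digit` | a digit character. |
| `[:graph:]` | `graph` | a graphable character. |
| `[:lower:]` | `lower` | a lower-case character. |
| `[:print:]` | `print` | a printing character. |
| `[:punct:]` | `punct` | a punctuation character. |
| `[:space:]` | `space` | a white-space character. |
| `[:upper:]` | `upper` | an upper-case character. |
| `[:xdigit:]` | `xdigit` | a hexadecimal digit character. |
| `[0-9]` | `range('0','9')` | characters in range '0' through '9'. |
| `[abc]` | `as_xpr('a') \| 'b' \| 'c'` | characters 'a', 'b', or 'c'. |
| `[abc]` | `(set= 'a','b','c')` | same as above |
| `[0-9abc]` | `set[ range('0','9') \| 'a' \| 'b' \| 'c' ]` | characters 'a', 'b', 'c' or in range '0' through '9'. |
| `[0-9abc]` | `set[ range('0','9') \| (set= 'a','b','c') ]` | same as above |
| `[^abc]` | `~(set= 'a','b','c')` | not characters 'a', 'b', or 'c'. |
| `(?i:stuff)` | `icase(stuff)` | match stuff disregarding case. |
| `(?>stuff)` | `keep(stuff)` | independent sub-expression, match stuff and turn off backtracking. |
| `(?=stuff)` | `before(stuff)` | positive look-ahead assertion, match if before stuff but don't include stuff in the match. |
| `(?!stuff)` | `~before(stuff)` | negative look-ahead assertion, match if not before stuff. |
| `(?<=stuff)` | `after(stuff)` | positive look-behind assertion, match if after stuff but don't include stuff in the match. (stuff must be constant-width.) |
| `(?<!stuff)` | `~after(stuff)` | negative look-behind assertion, match if not after stuff. (stuff must be constant-width.) |
| `(?P<name>stuff)` | `mark_tag name(n); ... (name= stuff)` | Create a named capture. |
| `(?P=name)` | `mark_tag name(n); ... name` | Refer back to a previously created named capture. |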