This section describes how to use xpressive to accomplish text manipulation and parsing tasks. If you are looking for detailed information regarding specific components in xpressive, check the Reference section.
xpressive is a regular expression template library. Regular expressions (regexes) can be written as strings that are parsed dynamically at runtime (dynamic regexes), or as expression templates[9] that are parsed at compile-time (static regexes). Dynamic regexes have the advantage that they can be accepted from the user as input at runtime or read from an initialization file. Static regexes have several advantages. Since they are C++ expressions instead of strings, they can be syntax-checked at compile-time. Also, they can naturally refer to code and data elsewhere in your program, giving you the ability to call back into your code from within a regex match. Finally, since they are statically bound, the compiler can generate faster code for static regexes.
xpressive's dual nature is unique and powerful. Static xpressive is a bit like the Spirit Parser Framework. Like Spirit, you can build grammars with static regexes using expression templates. (Unlike Spirit, xpressive does exhaustive backtracking, trying every possibility to find a match for your pattern.) Dynamic xpressive is a bit like Boost.Regex. In fact, xpressive's interface should be familiar to anyone who has used Boost.Regex. xpressive's innovation comes from allowing you to mix and match static and dynamic regexes in the same program, and even in the same expression! You can embed a dynamic regex in a static regex, or vice versa, and the embedded regex will participate fully in the search, back-tracking as needed to make the match succeed.
Enough theory. Let's have a look at Hello World, xpressive style:
    #include <iostream>
    #include <string>
    #include <boost/xpressive/xpressive.hpp>

    using namespace boost::xpressive;

    int main()
    {
        std::string hello( "hello world!" );

        sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
        smatch what;

        if( regex_match( hello, what, rex ) )
        {
            std::cout << what[0] << '\n'; // whole match
            std::cout << what[1] << '\n'; // first capture
            std::cout << what[2] << '\n'; // second capture
        }

        return 0;
    }
This program outputs the following:
    hello world!
    hello
    world
        The first thing you'll notice about the code is that all the types in xpressive
        live in the boost::xpressive namespace.
      
| ![[Note]](../../../doc/src/images/note.png) | Note | 
|---|---|
| Most of the rest of the examples in this document will leave off the using namespace boost::xpressive; directive. Just pretend it's there. | 
Next, you'll notice the type of the regular expression object is sregex. If you are familiar with Boost.Regex, this is different from what you are used to. The "s" in "sregex" stands for "string", indicating that this regex can be used to find patterns in std::string objects. I'll discuss this difference and its implications in detail later.
      
Notice how the regex object is initialized:
sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
To create a regular expression object from a string, you must call a factory method such as basic_regex<>::compile(). The same regular expression can also be written as an expression template:
sregex rex = (s1= +_w) >> ' ' >> (s2= +_w) >> '!';
This describes the same regular expression, except it uses the domain-specific embedded language defined by static xpressive.
        As you can see, static regexes have a syntax that is noticeably different
        than standard Perl syntax. That is because we are constrained by C++'s syntax.
        The biggest difference is the use of >>
        to mean "followed by". For instance, in Perl you can just put sub-expressions
        next to each other:
      
abc
But in C++, there must be an operator separating sub-expressions:
a >> b >> c
        In Perl, parentheses () have
        special meaning. They group, but as a side-effect they also create back-references
        like $1 and $2. In C++, there is no
        way to overload parentheses to give them side-effects. To get the same effect,
        we use the special s1, s2, etc. tokens. Assign to one to create
        a back-reference (known as a sub-match in xpressive).
      
        You'll also notice that the one-or-more repetition operator + has moved from postfix to prefix position.
        That's because C++ doesn't have a postfix +
        operator. So:
      
"\\w+"
is the same as:
+_w
We'll cover all the other differences later.
There are two ways to get xpressive. The first and simplest is to download the latest version of Boost. Just go to http://sf.net/projects/boost and follow the “Download” link.
The second way is by directly accessing the Boost Subversion repository. Just go to http://svn.boost.org/trac/boost/ and follow the instructions there for anonymous Subversion access. The version in Boost Subversion is unstable.
        Xpressive is a header-only template library, which means you don't need to
        alter your build scripts or link to any separate lib file to use it. All
        you need to do is #include <boost/xpressive/xpressive.hpp>.
        If you are only using static regexes, you can improve compile times by only
        including xpressive_static.hpp. Likewise,
        you can include xpressive_dynamic.hpp if
        you only plan on using dynamic regexes.
      
        If you would also like to use semantic actions or custom assertions with
        your static regexes, you will need to additionally include regex_actions.hpp.
      
Xpressive requires Boost version 1.34.1 or higher.
Currently, Boost.Xpressive is known to work on the following compilers:
Check the latest test results at Boost's Regression Results Page.
| ![[Note]](../../../doc/src/images/note.png) | Note | 
|---|---|
| Please send any questions, comments and bug reports to eric <at> boost-consulting <dot> com. | 
You don't need to know much to start being productive with xpressive. Let's begin with the nickel tour of the types and algorithms xpressive provides.
Table 42.1. xpressive's Tool-Box
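| Tool | Description | 
|---|---|
| basic_regex<> | Contains a compiled regular expression. | 
| match_results<>, sub_match<> | match_results<> holds the results of a regex_match() or regex_search() operation. It acts like a vector of sub_match<> objects, each of which is a pair of iterators marking one sub-expression. | 
| regex_match() | Checks to see if a string matches a regex. For regex_match() to succeed, the whole string must match the regex, from beginning to end. | 
| regex_search() | Searches a string to find a sub-string that matches the regex. | 
| regex_replace() | Given an input string, a regex, and a substitution string, builds a new string by replacing the parts of the input that match the regex with the substitution. | 
| regex_iterator<> | An STL-compatible iterator that makes it easy to find all the places in a string that match a regex. Dereferencing a regex_iterator<> returns a match_results<>. | 
| regex_token_iterator<> | Like regex_iterator<>, except that dereferencing a regex_token_iterator<> returns a sub_match<>. | 
| regex_compiler<> | A factory for basic_regex<> objects. | 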
Now that you know a bit about the tools xpressive provides, you can pick the right tool for you by answering two questions: what iterator type will you use to traverse your data, and what do you want to do to your data?
Most of the classes in xpressive are templates that are parameterized on the iterator type. xpressive defines some common typedefs to make the job of choosing the right types easier. You can use the table below to find the right types based on the type of your iterator.
Table 42.2. xpressive Typedefs vs. Iterator Types
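|  | std::string::const_iterator | char const * | std::wstring::const_iterator | wchar_t const * | 
|---|---|---|---|---|
| basic_regex<> | sregex | cregex | wsregex | wcregex | 
| match_results<> | smatch | cmatch | wsmatch | wcmatch | 
| regex_compiler<> | sregex_compiler | cregex_compiler | wsregex_compiler | wcregex_compiler | 
| regex_iterator<> | sregex_iterator | cregex_iterator | wsregex_iterator | wcregex_iterator | 
| regex_token_iterator<> | sregex_token_iterator | cregex_token_iterator | wsregex_token_iterator | wcregex_token_iterator | 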
        You should notice the systematic naming convention. Many of these types are
        used together, so the naming convention helps you to use them consistently.
        For instance, if you have an sregex,
        you should also be using an smatch.
      
If you are not using one of those four iterator types, then you can use the templates directly and specify your iterator type.
Do you want to find a pattern once? Many times? Search and replace? xpressive has tools for all that and more. Below is a quick reference:
Table 42.3. Tasks and Tools
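| To do this ... | Use this ... | 
|---|---|
| See if a whole string matches a regex | The regex_match() algorithm | 
| See if a string contains a sub-string that matches a regex | The regex_search() algorithm | 
| Replace all sub-strings that match a regex | The regex_replace() algorithm | 
| Find all the sub-strings that match a regex and step through them one at a time | The regex_iterator<> class | 
| Split a string into tokens that each match a regex | The regex_token_iterator<> class | 
| Split a string using a regex as a delimiter | The regex_token_iterator<> class | 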
These algorithms and classes are described in excruciating detail in the Reference section.
| ![[Tip]](../../../doc/src/images/tip.png) | Tip | 
|---|---|
| Try clicking on a task in the table above to see a complete example program that uses xpressive to solve that particular task. | 
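When using xpressive, the first thing you'll do is create a basic_regex<> object. This section covers building a regular expression in the two dialects xpressive supports: static and dynamic.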
The feature that really sets xpressive apart from other C/C++ regular expression libraries is the ability to author a regular expression using C++ expressions. xpressive achieves this through operator overloading, using a technique called expression templates to embed a mini-language dedicated to pattern matching within C++. These "static regexes" have many advantages over their string-based brethren. In particular, static regexes can be syntax-checked at compile-time, can naturally refer to code and data elsewhere in your program, and are statically bound, which lets the compiler generate faster code.
Since we compose static regexes using C++ expressions, we are constrained by the rules for legal C++ expressions. Unfortunately, that means that "classic" regular expression syntax cannot always be mapped cleanly into C++. Rather, we map the regex constructs, picking new syntax that is legal C++.
You create a static regex by assigning one to an object of type basic_regex<>. For instance, the following defines a regex that can be used to find patterns in objects of type std::string:
        
sregex re = '$' >> +_d >> '.' >> _d >> _d;
Assignment works similarly.
          In static regexes, character and string literals match themselves. For
          instance, in the regex above, '$'
          and '.' match the characters
          '$' and '.'
          respectively. Don't be confused by the fact that $ and
          . are meta-characters in Perl. In xpressive, literals
          always represent themselves.
        
When using literals in static regexes, you must take care that at least one operand is not a literal. For instance, the following are not valid regexes:
    sregex re1 = 'a' >> 'b';         // ERROR!
    sregex re2 = +'a';               // ERROR!
          The two operands to the binary >>
          operator are both literals, and the operand of the unary + operator is also a literal, so these statements
          will call the native C++ binary right-shift and unary plus operators, respectively.
          That's not what we want. To get operator overloading to kick in, at least
          one operand must be a user-defined type. We can use xpressive's as_xpr()
          helper function to "taint" an expression with regex-ness, forcing
          operator overloading to find the correct operators. The two regexes above
          should be written as:
        
    sregex re1 = as_xpr('a') >> 'b'; // OK
    sregex re2 = +as_xpr('a');       // OK
          As you've probably already noticed, sub-expressions in static regexes must
          be separated by the sequencing operator, >>.
          You can read this operator as "followed by".
        
    // Match an 'a' followed by a digit
    sregex re = 'a' >> _d;
          Alternation works just as it does in Perl with the |
          operator. You can read this operator as "or". For example:
        
    // match a digit character or a word character one or more times
    sregex re = +( _d | _w );
          In Perl, parentheses () have
          special meaning. They group, but as a side-effect they also create back-references
          like $1 and $2. In C++, parentheses
          only group -- there is no way to give them side-effects. To get the same
          effect, we use the special s1,
          s2, etc. tokens. Assigning
          to one creates a back-reference. You can then use the back-reference later
          in your expression, like using \1 and \2
          in Perl. For example, consider the following regex, which finds matching
          HTML tags:
        
"<(\\w+)>.*?</\\1>"
In static xpressive, this would be:
'<' >> (s1= +_w) >> '>' >> -*_ >> "</" >> s1 >> '>'
          Notice how you capture a back-reference by assigning to s1,
          and then you use s1 later
          in the pattern to find the matching end tag.
        
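| ![[Tip]](../../../doc/src/images/tip.png) | Tip | 
|---|---|
| Grouping without capturing a back-reference: in xpressive, plain parentheses () only group, so using them without s1 is the equivalent of Perl's (?:) non-capturing grouping construct. | 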
          Perl lets you make part of your regular expression case-insensitive by
          using the (?i:) pattern modifier. xpressive also has
          a case-insensitivity pattern modifier, called icase.
          You can use it as follows:
        
sregex re = "this" >> icase( "that" );
          In this regular expression, "this"
          will be matched exactly, but "that"
          will be matched irrespective of case.
        
          Case-insensitive regular expressions raise the issue of internationalization:
          how should case-insensitive character comparisons be evaluated? Also, many
          character classes are locale-specific. Which characters are matched by
          digit and which are matched
          by alpha? The answer depends
          on the std::locale object the regular expression
          object is using. By default, all regular expression objects use the global
          locale. You can override the default by using the imbue() pattern modifier, as follows:
        
    std::locale my_locale = /* initialize a std::locale object */;
    sregex re = imbue( my_locale )( +alpha >> +digit );
          This regular expression will evaluate alpha
          and digit according to
          my_locale. See the section
          on Localization
          and Regex Traits for more information about how to customize the
          behavior of your regexes.
        
The table below lists the familiar regex constructs and their equivalents in static xpressive.
Table 42.4. Perl syntax vs. Static xpressive syntax
| Perl | Static xpressive | Meaning | 
|---|---|---|
| . | _ | any character (assuming Perl's /s modifier). | 
| ab | a >> b | sequencing of a and b sub-expressions. | 
| a\|b | a \| b | alternation of a and b sub-expressions. | 
| (a) | (s1= a) | group and capture a back-reference. | 
| (?:a) | (a) | group and do not capture a back-reference. | 
| \1 | s1 | a previously captured back-reference. | 
| a* | *a | zero or more times, greedy. | 
| a+ | +a | one or more times, greedy. | 
| a? | !a | zero or one time, greedy. | 
| a{n,m} | repeat<n,m>(a) | between n and m times, greedy. | 
| a*? | -*a | zero or more times, non-greedy. | 
| a+? | -+a | one or more times, non-greedy. | 
| a?? | -!a | zero or one time, non-greedy. | 
| a{n,m}? | -repeat<n,m>(a) | between n and m times, non-greedy. | 
| ^ | bos | beginning of sequence assertion. | 
| $ | eos | end of sequence assertion. | 
| \b | _b | word boundary assertion. | 
| \B | ~_b | not word boundary assertion. | 
| \n | _n | literal newline. | 
| . | ~_n | any character except a literal newline (without Perl's /s modifier). | 
| \r?\n\|\r | _ln | logical newline. | 
| [^\r\n] | ~_ln | any single character not a logical newline. | 
| \w | _w | a word character, equivalent to set[alnum \| '_']. | 
| \W | ~_w | not a word character, equivalent to ~set[alnum \| '_']. | 
| \d | _d | a digit character. | 
| \D | ~_d | not a digit character. | 
| \s | _s | a space character. | 
| \S | ~_s | not a space character. | 
| [:alnum:] | alnum | an alpha-numeric character. | 
| [:alpha:] | alpha | an alphabetic character. | 
| [:blank:] | blank | a horizontal white-space character. | 
| [:cntrl:] | cntrl | a control character. | 
| [:digit:] | digit | a digit character. | 
| [:graph:] | graph | a graphable character. | 
| [:lower:] | lower | a lower-case character. | 
| [:print:] | print | a printing character. | 
| [:punct:] | punct | a punctuation character. | 
| [:space:] | space | a white-space character. | 
| [:upper:] | upper | an upper-case character. | 
| [:xdigit:] | xdigit | a hexadecimal digit character. | 
| [0-9] | range('0','9') | characters in range '0' through '9'. | 
| [abc] | as_xpr('a') \| 'b' \| 'c' | characters 'a', 'b', or 'c'. | 
| [abc] | (set= 'a','b','c') | same as above | 
| [0-9abc] | set[ range('0','9') \| 'a' \| 'b' \| 'c' ] | characters 'a', 'b', 'c' or in range '0' through '9'. | 
| [0-9abc] | set[ range('0','9') \| (set= 'a','b','c') ] | same as above | 
| [^abc] | ~(set= 'a','b','c') | not characters 'a', 'b', or 'c'. | 
| (?i:stuff) | icase(stuff) | match stuff disregarding case. | 
| (?>stuff) | keep(stuff) | independent sub-expression, match stuff and turn off backtracking. | 
| (?=stuff) | before(stuff) | positive look-ahead assertion, match if before stuff but don't include stuff in the match. | 
| (?!stuff) | ~before(stuff) | negative look-ahead assertion, match if not before stuff. | 
| (?<=stuff) | after(stuff) | positive look-behind assertion, match if after stuff but don't include stuff in the match. (stuff must be constant-width.) | 
| (?<!stuff) | ~after(stuff) | negative look-behind assertion, match if not after stuff. (stuff must be constant-width.) | 
| (?P<name>stuff) | mark_tag name(n); ... (name= stuff) | Create a named capture. | 
| (?P=name) | mark_tag name(n); ... name | Refer back to a previously created named capture. | 
          
        
Static regexes are dandy, but sometimes you need something a bit more ... dynamic. Imagine you are developing a text editor with a regex search/replace feature. You need to accept a regular expression from the end user as input at run-time. There should be a way to parse a string into a regular expression. That's what xpressive's dynamic regexes are for. They are built from the same core components as their static counterparts, but they are late-bound so you can specify them at run-time.
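There are two ways to create a dynamic regex: with the basic_regex<>::compile() function or with the regex_compiler<> class template. Use basic_regex<>::compile() if you want the default locale. Use regex_compiler<> if you need to specify a different locale.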
          Here is an example of using basic_regex<>::compile():
        
sregex re = sregex::compile( "this|that", regex_constants::icase );
Here is the same example using regex_compiler<>:

    sregex_compiler compiler;
    sregex re = compiler.compile( "this|that", regex_constants::icase );

basic_regex<>::compile() is implemented in terms of regex_compiler<>.
Since the dynamic syntax is not constrained by the rules for valid C++ expressions, we are free to use familiar syntax for dynamic regexes. For this reason, the syntax used by xpressive for dynamic regexes follows the lead set by John Maddock's proposal to add regular expressions to the Standard Library. It is essentially the syntax standardized by ECMAScript, with minor changes in support of internationalization.
Since the syntax is documented exhaustively elsewhere, I will simply refer you to the existing standards, rather than duplicate the specification here.
As with static regexes, dynamic regexes support internationalization by allowing you to specify a different std::locale. To do this, you must use regex_compiler<>. The regex_compiler<> class has an imbue() function. After you have imbued a regex_compiler<> object with a custom std::locale, all regex objects compiled by that regex_compiler<> will use that locale. For example:

    std::locale my_locale = /* initialize your locale object here */;
    sregex_compiler compiler;
    compiler.imbue( my_locale );
    sregex re = compiler.compile( "\\w+|\\d+" );
          This regex will use my_locale
          when evaluating the intrinsic character sets "\\w"
          and "\\d".
        
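Once you have created a regex object, you can use the regex_match() and regex_search() algorithms to find patterns in strings. If you are familiar with how regex_match() and regex_search() work in Boost.Regex, xpressive's versions work the same way.

The regex_match() algorithm checks to see if a regex matches a given input.

| ![[Warning]](../../../doc/src/images/warning.png) | Warning | 
|---|---|
| The regex_match() algorithm will only report success if the regex matches the whole input, from beginning to end. If the regex matches only a part of the input, regex_match() will return false. If you want to search through the string looking for sub-strings that the regex matches, use the regex_search() algorithm. | 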
        The input can be a bidirectional range such as std::string,
        a C-style null-terminated string or a pair of iterators. In all cases, the
        type of the iterator used to traverse the input sequence must match the iterator
        type used to declare the regex object. (You can use the table in the Quick
        Start to find the correct regex type for your iterator.)
      
    cregex cre = +_w;  // this regex can match C-style strings
    sregex sre = +_w;  // this regex can match std::strings

    if( regex_match( "hello", cre ) )              // OK
        { /*...*/ }

    if( regex_match( std::string("hello"), sre ) ) // OK
        { /*...*/ }

    if( regex_match( "hello", sre ) )              // ERROR! iterator mis-match!
        { /*...*/ }
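The regex_match() algorithm optionally accepts a match_results<> struct as an out parameter. If given, the regex_match() algorithm fills in the match_results<> struct with information about which parts of the regex matched which parts of the input.

    cmatch what;
    cregex cre = +(s1= _w);

    // store the results of the regex_match in "what"
    if( regex_match( "hello", what, cre ) )
    {
        std::cout << what[1] << '\n'; // prints "o"
    }

The regex_match() algorithm also optionally accepts a match_flag_type bitmask. With match_flag_type, you can control certain aspects of how the match is evaluated. See the match_flag_type reference for a complete list of the flags and their meanings.

    std::string str("hello");
    sregex sre = bol >> +_w;

    // match_not_bol means that "bol" should not match at [begin,begin)
    if( regex_match( str.begin(), str.end(), sre, regex_constants::match_not_bol ) )
    {
        // should never get here!!!
    }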
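See the complete example program that shows how to use regex_match(), and check the regex_match() reference for a complete list of the available overloads.

Use regex_search() when you want to know if an input sequence contains a sub-sequence that a regex matches. regex_search() will try to match the regex at the beginning of the input sequence and scan forward in the sequence until it either finds a match or exhausts the sequence.

In all other regards, regex_search() behaves like regex_match() (see above). In particular, it can operate on a bidirectional range such as std::string, C-style null-terminated strings or iterator ranges. The same care must be taken to ensure that the iterator type of your regex matches the iterator type of your input sequence. As with regex_match(), you can optionally provide a match_results<> struct to receive the results of the search, and a match_flag_type bitmask to control how the match is evaluated.

See the complete example program that shows how to use regex_search(), and check the regex_search() reference for a complete list of the available overloads.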
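Sometimes, it is not enough to know simply whether a regex_match() or regex_search() was successful or not. If you pass an object of type match_results<> to regex_match() or regex_search(), then after the algorithm has completed successfully the match_results<> will contain extra information about which sub-expressions in the regex matched which parts of the input. In Perl, these sub-sequences are called back-references, and they are stored in the variables $1, $2, etc. In xpressive, they are objects of type sub_match<>, and they are stored in the match_results<> structure, which acts as a vector of sub_match<> objects.

So, you've passed a match_results<> object to a regex algorithm, and the algorithm has succeeded. Now you want to examine the results. Most of what you'll be doing with the match_results<> object is indexing into it to access its internally stored sub_match<> objects, but there are a few other things you can do with a match_results<> object besides.

The table below shows how to access the information stored in a match_results<> object named what.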
      
Table 42.5. match_results<> Accessors
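| Accessor | Effects | 
|---|---|
| what.size() | Returns the number of sub-matches, which is always greater than zero after a successful match because the full match is stored in the zero-th sub-match. | 
| what[n] | Returns the n-th sub-match. | 
| what.length(n) | Returns the length of the n-th sub-match. Same as what[n].length(). | 
| what.position(n) | Returns the offset into the input sequence at which the n-th sub-match begins. | 
| what.str(n) | Returns a std::basic_string<> constructed from the n-th sub-match. Same as what[n].str(). | 
| what.prefix() | Returns a sub_match<> object which represents the sub-sequence from the beginning of the input sequence to the start of the full match. | 
| what.suffix() | Returns a sub_match<> object which represents the sub-sequence from the end of the full match to the end of the input sequence. | 
| what.regex_id() | Returns the regex_id of the basic_regex<> object that was last used with this match_results<> object. | 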
There is more you can do with the match_results<> object than is covered here.

When you index into a match_results<> object, you get back a sub_match<> object. A sub_match<> is basically a pair of iterators. It is defined like this:

    template< class BidirectionalIterator >
    struct sub_match
        : std::pair< BidirectionalIterator, BidirectionalIterator >
    {
        bool matched;
        // ...
    };

Since it inherits publicly from std::pair<>, sub_match<> has first and second data members of type BidirectionalIterator. These are the beginning and end of the sub-sequence this sub_match<> represents. sub_match<> also has a Boolean matched data member, which is true if this sub_match<> participated in the full match.
The following table shows how you might access the information stored in a sub_match<> object called sub.
      
Table 42.6. sub_match<> Accessors
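| Accessor | Effects | 
|---|---|
| sub.length() | Returns the length of the sub-match. Same as std::distance(sub.first,sub.second). | 
| sub.str() | Returns a std::basic_string<> constructed from the sub-match. Same as std::basic_string<char_type>(sub.first,sub.second). | 
| sub.compare(str) | Performs a string comparison between the sub-match and str, where str can be a std::basic_string<>, a C-style null-terminated string, or another sub-match. Same as sub.str().compare(str). | 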
 Results Invalidation
      
Results are stored as iterators into the input sequence. Anything which invalidates the input sequence will invalidate the match results. For instance, if you match a std::string object, the results are only valid until your next call to a non-const member function of that std::string object. After that, the results held by the match_results<> object are invalid.
Regular expressions are not only good for searching text; they're good at manipulating it. And one of the most common text manipulation tasks is search-and-replace. xpressive provides the regex_replace() algorithm for doing search-and-replace.

Performing search-and-replace using regex_replace() is simple. All you need is an input sequence, a regex object, and a format string or a formatter object. There are several versions of the regex_replace() algorithm. Some accept the input sequence as a bidirectional container such as std::string and return the result in a new container of the same type. Others accept the input as a null-terminated string and return a std::string. Still others accept the input sequence as a pair of iterators and write the result into an output iterator. The substitution may be specified as a string with format sequences or as a formatter object. Below are some simple examples of using string-based substitutions.
      
    std::string input("This is his face");
    sregex re = as_xpr("his");                // find all occurrences of "his" ...
    std::string format("her");                // ... and replace them with "her"

    // use the version of regex_replace() that operates on strings
    std::string output = regex_replace( input, re, format );
    std::cout << output << '\n';

    // use the version of regex_replace() that operates on iterators
    std::ostream_iterator< char > out_iter( std::cout );
    regex_replace( out_iter, input.begin(), input.end(), re, format );
The above program prints out the following:
    Ther is her face
    Ther is her face
        Notice that all the occurrences of "his"
        have been replaced with "her".
      
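See the complete example program that shows how to use regex_replace(), and check the regex_replace() reference for a complete list of the available overloads.

The regex_replace() algorithm takes an optional bitmask parameter to control the formatting. The possible values of the bitmask are: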
Table 42.7. Format Flags
| Flag | Meaning | 
|---|---|
| format_default | Recognize the ECMA-262 format sequences (see below). | 
| format_first_only | Only replace the first match, not all of them. | 
| format_no_copy | Don't copy the parts of the input sequence that didn't match the regex to the output sequence. | 
| format_literal | Treat the format string as a literal; that is, don't recognize any escape sequences. | 
| format_perl | Recognize the Perl format sequences (see below). | 
| format_sed | Recognize the sed format sequences (see below). | 
| format_all | In addition to the Perl format sequences, recognize some Boost-specific format sequences. | 
        These flags live in the xpressive::regex_constants
        namespace. If the substitution parameter is a function object instead of
        a string, the flags format_literal,
        format_perl, format_sed, and format_all
        are ignored.
      
When you haven't specified a substitution string dialect with one of the format flags above, you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows the escape sequences recognized in ECMA-262 mode.
Table 42.8. Format Escape Sequences
| Escape Sequence | Meaning | 
|---|---|
| $1, $2, etc. | the corresponding sub-match | 
| $& | the full match | 
| $` | the match prefix | 
| $' | the match suffix | 
| $$ | a literal '$' character | 
        Any other sequence beginning with '$'
        simply represents itself. For example, if the format string were "$a" then "$a"
        would be inserted into the output sequence.
      
When specifying the format_sed flag to regex_replace(), the following escape sequences are recognized:
Table 42.9. Sed Format Escape Sequences
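| Escape Sequence | Meaning | 
|---|---|
| \1, \2, etc. | The corresponding sub-match | 
| & | the full match | 
| \a | A literal '\a' | 
| \e | A literal char_type(27) | 
| \f | A literal '\f' | 
| \n | A literal '\n' | 
| \r | A literal '\r' | 
| \t | A literal '\t' | 
| \v | A literal '\v' | 
| \xFF | A literal char_type(0xFF), where F is any hex digit | 
| \x{FFFF} | A literal char_type(0xFFFF), where F is any hex digit | 
| \cX | The control character X | 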
When specifying the format_perl flag to regex_replace(), the following escape sequences are recognized:
Table 42.10. Perl Format Escape Sequences
| Escape Sequence | Meaning | 
|---|---|
| $1, $2, etc. | the corresponding sub-match | 
| $& | the full match | 
| $` | the match prefix | 
| $' | the match suffix | 
| $$ | a literal '$' character | 
| \a | A literal '\a' | 
| \e | A literal char_type(27) | 
| \f | A literal '\f' | 
| \n | A literal '\n' | 
| \r | A literal '\r' | 
| \t | A literal '\t' | 
| \v | A literal '\v' | 
| \xFF | A literal char_type(0xFF), where F is any hex digit | 
| \x{FFFF} | A literal char_type(0xFFFF), where F is any hex digit | 
| \cX | The control character X | 
| \l | Make the next character lowercase | 
| \L | Make the rest of the substitution lowercase until the next \E | 
| \u | Make the next character uppercase | 
| \U | Make the rest of the substitution uppercase until the next \E | 
| \E | Terminate \L or \U | 
| \1, \2, etc. | The corresponding sub-match | 
| \g<name> | The named backref name | 
When specifying the format_all flag to regex_replace(), the escape sequences recognized are the same as those above for format_perl. In addition, conditional expressions of the following form are recognized:
      
?Ntrue-expression:false-expression
        where N is a decimal digit representing a sub-match.
        If the corresponding sub-match participated in the full match, then the substitution
        is true-expression. Otherwise, it is false-expression.
        In this mode, you can use parens () for grouping. If you
        want a literal paren, you must escape it as \(.
      
        Format strings are not always expressive enough for all your text substitution
        needs. Consider the simple example of wanting to map input strings to output
        strings, as you may want to do with environment variables. Rather than a
        format string, for this you would use a formatter object.
        Consider the following code, which finds embedded environment variables of
        the form "$(XYZ)" and
        computes the substitution string by looking up the environment variable in
        a map.
      
    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    std::map<std::string, std::string> env;

    std::string const &format_fun(smatch const &what)
    {
        return env[what[1].str()];
    }

    int main()
    {
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        // replace strings like "$(XYZ)" with the result of env["XYZ"]
        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, format_fun);
        std::cout << output << std::endl;
    }
In this case, we use a function, format_fun(), to compute the substitution string on the fly. It accepts a match_results<> object which contains the results of the current match. format_fun() uses the first submatch as a key into the global env map. The above code displays:
      
"this" has the value "that"
The formatter need not be an ordinary function. It may be an object of class type. And rather than return a string, it may accept an output iterator into which it writes the substitution. Consider the following, which is functionally equivalent to the above.
    #include <algorithm>   // std::copy
    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    struct formatter
    {
        typedef std::map<std::string, std::string> env_map;
        env_map env;

        // Writes the replacement for the current match into out.
        template<typename Out>
        Out operator()(smatch const &what, Out out) const
        {
            env_map::const_iterator where = env.find(what[1]);
            if(where != env.end())
            {
                std::string const &sub = where->second;
                out = std::copy(sub.begin(), sub.end(), out);
            }
            return out;
        }
    };

    int main()
    {
        formatter fmt;
        fmt.env["X"] = "this";
        fmt.env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, fmt);
        std::cout << output << std::endl;
    }
The formatter must be a callable object (a function or a function object) that has one of three possible signatures, detailed in the table below. For the table, fmt is a function pointer or function object, what is a match_results<> object, out is an OutputIterator, and flags is a value of regex_constants::match_flag_type:
      
Table 42.11. Formatter Signatures
| Formatter Invocation | Return Type | Semantics | 
|---|---|---|
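| fmt(what) | Range of characters (e.g. std::string) or null-terminated string | The string matched by the regex is replaced with the string returned by the formatter. |
| fmt(what, out) | OutputIterator | The formatter writes the replacement string into out and returns out. |
| fmt(what, out, flags) | OutputIterator | The formatter writes the replacement string into out and returns out. The flags parameter is the value of the match flags passed to the regex_replace() algorithm. |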
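In addition to format strings and formatter objects, regex_replace() also accepts formatter expressions. A formatter expression is a lambda expression that generates a string, written with the same syntax as semantic actions. The environment-variable example above, which uses regex_replace() to substitute strings, is repeated here using a formatter expression: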
    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    using namespace boost::xpressive;

    int main()
    {
        std::map<std::string, std::string> env;
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, ref(env)[s1]);
        std::cout << output << std::endl;
    }
        In the above, the formatter expression is ref(env)[s1]. This
        means to use the value of the first submatch, s1,
        as a key into the env map.
        The purpose of xpressive::ref()
        here is to make the reference to the env
        local variable lazy so that the index operation is deferred
        until we know what to replace s1
        with.
      
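regex_token_iterator<> splits an input sequence into tokens. This section describes how to use the highly configurable regex_token_iterator<> to chop up input sequences.

You initialize a regex_token_iterator<> with an input sequence, a regex, and some optional configuration parameters. The regex_token_iterator<> uses regex_search() to find the first place in the sequence that the regex matches. When dereferenced, the regex_token_iterator<> returns a token in the form of a std::basic_string<>. Which string it returns depends on the configuration parameters. By default it returns a string corresponding to the full match, but it could also return a string corresponding to a particular marked sub-expression, or even the part of the sequence that didn't match. When you increment the regex_token_iterator<>, it moves to the next token. Which token comes next also depends on the configuration parameters: it could be a different marked sub-expression in the current match, part or all of the next match, or a part of the sequence that didn't match.

As you can see, regex_token_iterator<> can do a lot. The examples below should make its behavior clear.

This example uses regex_token_iterator<> to chop a sequence into a series of tokens consisting of words.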
    std::string input("This is his face");
    sregex re = +_w;                      // find a word

    // iterate over all the words in the input
    sregex_token_iterator begin( input.begin(), input.end(), re ), end;

    // write all the words to std::cout
    std::ostream_iterator< std::string > out_iter( std::cout, "\n" );
    std::copy( begin, end, out_iter );

This program displays the following:

    This
    is
    his
    face
This example also uses regex_token_iterator<> to chop a sequence into a series of tokens consisting of words, but it uses the regex as a delimiter. When we pass -1 as the last parameter to the regex_token_iterator<> constructor, it instructs the token iterator to consider as tokens those parts of the input that didn't match the regex.

    std::string input("This is his face");
    sregex re = +_s;                      // find white space

    // iterate over all non-white space in the input. Note the -1 below:
    sregex_token_iterator begin( input.begin(), input.end(), re, -1 ), end;

    // write all the words to std::cout
    std::ostream_iterator< std::string > out_iter( std::cout, "\n" );
    std::copy( begin, end, out_iter );

This program displays the following:

    This
    is
    his
    face
This example also uses regex_token_iterator<>, this time to chop a sequence containing a bunch of dates into a series of tokens consisting of just the years. When we pass a positive integer N as the last parameter to the regex_token_iterator<> constructor, it instructs the token iterator to consider as tokens only the N-th marked sub-expression of each match.

    std::string input("01/02/2003 blahblah 04/23/1999 blahblah 11/13/1981");
    sregex re = sregex::compile("(\\d{2})/(\\d{2})/(\\d{4})"); // find a date

    // iterate over all the years in the input. Note the 3 below,
    // corresponding to the 3rd sub-expression:
    sregex_token_iterator begin( input.begin(), input.end(), re, 3 ), end;

    // write all the years to std::cout
    std::ostream_iterator< std::string > out_iter( std::cout, "\n" );
    std::copy( begin, end, out_iter );

This program displays the following:

    2003
    1999
    1981
This example is like the previous one, except that instead of tokenizing just the years, this program turns the days, months and years into tokens. When we pass an array of integers {I,J,...} as the last parameter to the regex_token_iterator<> constructor, it instructs the token iterator to consider as tokens the I-th, J-th, etc. marked sub-expression of each match.

    std::string input("01/02/2003 blahblah 04/23/1999 blahblah 11/13/1981");
    sregex re = sregex::compile("(\\d{2})/(\\d{2})/(\\d{4})"); // find a date

    // iterate over the days, months and years in the input
    int const sub_matches[] = { 2, 1, 3 }; // day, month, year
    sregex_token_iterator begin( input.begin(), input.end(), re, sub_matches ), end;

    // write all the tokens to std::cout
    std::ostream_iterator< std::string > out_iter( std::cout, "\n" );
    std::copy( begin, end, out_iter );

This program displays the following:

    02
    01
    2003
    23
    04
    1999
    13
    11
    1981

The sub_matches array instructs the regex_token_iterator<> to first take the value of the 2nd sub-match, then the 1st sub-match, and finally the 3rd. Incrementing the iterator again instructs it to use regex_search() again to find the next match. At that point, the process repeats: the token iterator takes the value of the 2nd sub-match, then the 1st, and so on.
For complicated regular expressions, dealing with numbered captures can be a pain. Counting left parentheses to figure out which capture to reference is no fun. Less fun is the fact that merely editing a regular expression could cause a capture to be assigned a new number, invalidating code that refers back to it by the old number.
Other regular expression engines solve this problem with a feature called named captures. This feature allows you to assign a name to a capture, and to refer back to the capture by name rather than by number. Xpressive also supports named captures, both in dynamic and in static regexes.
        For dynamic regular expressions, xpressive follows the lead of other popular
        regex engines with the syntax of named captures. You can create a named capture
        with "(?P<xxx>...)"
        and refer back to that capture with "(?P=xxx)".
        Here, for instance, is a regular expression that creates a named capture
        and refers back to it:
      
    // Create a named capture called "char" that matches a single
    // character and refer back to that capture by name.
    sregex rx = sregex::compile("(?P<char>.)(?P=char)");
The effect of the above regular expression is to find the first doubled character.
Once you have executed a match or search operation using a regex with named captures, you can access the named capture through the match_results<> object using the capture's name.

    std::string str("tweet");
    sregex rx = sregex::compile("(?P<char>.)(?P=char)");
    smatch what;
    if(regex_search(str, what, rx))
    {
        std::cout << "char = " << what["char"] << std::endl;
    }
The above code displays:
char = e
        You can also refer back to a named capture from within a substitution string.
        The syntax for that is "\\g<xxx>".
        Below is some code that demonstrates how to use named captures when doing
        string substitution.
      
    std::string str("tweet");
    sregex rx = sregex::compile("(?P<char>.)(?P=char)");
    str = regex_replace(str, rx, "**\\g<char>**", regex_constants::format_perl);
    std::cout << str << std::endl;
        Notice that you have to specify format_perl
        when using named captures. Only the perl syntax recognizes the "\\g<xxx>" syntax. The above
        code displays:
      
tw**e**t
If you're using static regular expressions, creating and using named captures is even easier. You can use the mark_tag type to create a variable that you can use like s1, s2 and friends, but with a name that is more meaningful. Below is how the above example would look using static regexes:

    mark_tag char_(1); // char_ is now a synonym for s1
    sregex rx = (char_= _) >> char_;

After a match operation, you can use the mark_tag to index into the match_results<> to access the named capture:

    std::string str("tweet");
    mark_tag char_(1);
    sregex rx = (char_= _) >> char_;
    smatch what;
    if(regex_search(str, what, rx))
    {
        std::cout << what[char_] << std::endl;
    }

The above code displays:

    e

When doing string substitutions with regex_replace(), you can use named captures to create format expressions as below:

    std::string str("tweet");
    mark_tag char_(1);
    sregex rx = (char_= _) >> char_;
    str = regex_replace(str, rx, "**" + char_ + "**");
    std::cout << str << std::endl;
The above code displays:
tw**e**t
| ![[Note]](../../../doc/src/images/note.png) | Note | 
|---|---|
| You need to include <boost/xpressive/regex_actions.hpp> to use format expressions. |
One of the key benefits of representing regexes as C++ expressions is the ability to easily refer to other C++ code and data from within the regex. This enables programming idioms that are not possible with other regular expression libraries. Of particular note is the ability for one regex to refer to another regex, allowing you to build grammars out of regular expressions. This section describes how to embed one regex in another by value and by reference, how regex objects behave when they refer to other regexes, and how to access the tree of results after a successful parse.
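The basic_regex<> object has value semantics. When a regex object appears on the right-hand side in the definition of another regex, it is as if the regex were embedded by value; that is, a copy of the nested regex is stored by the enclosing regex. The inner regex is invoked by the outer regex during pattern matching. The inner regex participates fully in the match, back-tracking as needed to make the match succeed.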
Consider a text editor that has a regex-find feature with a whole-word option. You can implement this with xpressive as follows:
    find_dialog dlg;
    if( dialog_ok == dlg.do_modal() )
    {
        std::string pattern = dlg.get_text();          // the pattern the user entered
        bool whole_word = dlg.whole_word.is_checked(); // did the user select the whole-word option?

        sregex re = sregex::compile( pattern );        // try to compile the pattern

        if( whole_word )
        {
            // wrap the regex in begin-word / end-word assertions
            re = bow >> re >> eow;
        }

        // ... use re ...
    }

Look closely at this line:

    // wrap the regex in begin-word / end-word assertions
    re = bow >> re >> eow;
This line creates a new regex that embeds the old regex by value. Then, the new regex is assigned back to the original regex. Since a copy of the old regex was made on the right-hand side, this works as you might expect: the new regex has the behavior of the old regex wrapped in begin- and end-word assertions.
| ![[Note]](../../../doc/src/images/note.png) | Note | 
|---|---|
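| Note that re = bow >> re >> eow does not define a recursive regular expression, since regex objects embed by value by default. The next section shows how to define a recursive regular expression by embedding a regex by reference. |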
If you want to be able to build recursive regular expressions and context-free grammars, embedding a regex by value is not enough. You need to be able to make your regular expressions self-referential. Most regular expression engines don't give you that power, but xpressive does.
| ![[Tip]](../../../doc/src/images/tip.png) | Tip | 
|---|---|
| The theoretical computer scientists out there will correctly point out that a self-referential regular expression is not "regular", so in the strict sense, xpressive isn't really a regular expression engine at all. But as Larry Wall once said, "the term [regular expression] has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here." | 
        Consider the following code, which uses the by_ref() helper to define a recursive regular expression
        that matches balanced, nested parentheses:
      
    sregex parentheses;
    parentheses                          // A balanced set of parentheses ...
        = '('                            // is an opening parenthesis ...
            >>                           // followed by ...
             *(                          // zero or more ...
                keep( +~(set='(',')') )  // of a bunch of things that are not parentheses ...
              |                          // or ...
                by_ref(parentheses)      // a balanced set of parentheses
              )                          //   (ooh, recursion!) ...
            >>                           // followed by ...
          ')'                            // a closing parenthesis
        ;
        Matching balanced, nested tags is an important text processing task, and
        it is one that "classic" regular expressions cannot do. The by_ref()
        helper makes it possible. It allows one regex object to be embedded in another
        by reference. Since the right-hand side holds parentheses by reference, assigning the
        right-hand side back to parentheses
        creates a cycle, which will execute recursively.
      
Once we allow self-reference in our regular expressions, the genie is out of the bottle and all manner of fun things are possible. In particular, we can now build grammars out of regular expressions. Let's have a look at the text-book grammar example: the humble calculator.
    sregex group, factor, term, expression;

    group       = '(' >> by_ref(expression) >> ')';
    factor      = +_d | group;
    term        = factor >> *(('*' >> factor) | ('/' >> factor));
    expression  = term >> *(('+' >> term) | ('-' >> term));
        The regex expression defined
        above does something rather remarkable for a regular expression: it matches
        mathematical expressions. For example, if the input string were "foo 9*(10+3) bar", this pattern
        would match "9*(10+3)".
        It only matches well-formed mathematical expressions, where the parentheses
        are balanced and the infix operators have two arguments each. Don't try this
        with just any regular expression engine!
      
        Let's take a closer look at this regular expression grammar. Notice that
        it is cyclic: expression
        is implemented in terms of term,
        which is implemented in terms of factor,
        which is implemented in terms of group,
        which is implemented in terms of expression,
        closing the loop. In general, the way to define a cyclic grammar is to forward-declare
        the regex objects and embed by reference those regular expressions that have
        not yet been initialized. In the above grammar, there is only one place where
        we need to reference a regex object that has not yet been initialized: the
        definition of group. In that
        place, we use by_ref()
        to embed expression by reference.
        In all other places, it is sufficient to embed the other regex objects by
        value, since they have already been initialized and their values will not
        change.
      
| ![[Tip]](../../../doc/src/images/tip.png) | Tip | 
|---|---|
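| Embed by value if possible. In general, prefer embedding regular expressions by value rather than by reference. Value semantics are simpler and make your grammars easier to reason about. |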
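Using regex_compiler<>, you can also build grammars out of dynamic regular expressions. You do that by creating named regexes and referring to other regexes by name. Each regex_compiler<> instance keeps a mapping from names to the regexes that have been created with it.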
        You can create a named dynamic regex by prefacing your regex with "(?$name=)", where name
        is the name of the regex. You can refer to a named regex from another regex
        with "(?$name)". The
        named regex does not need to exist yet at the time it is referenced in another
        regex, but it must exist by the time you use the regex.
      
Below is a code fragment that uses dynamic regex grammars to implement the calculator example from above.
    using namespace boost::xpressive;
    using namespace regex_constants;

    sregex expr;

    {
        sregex_compiler compiler;
        syntax_option_type x = ignore_white_space;

        compiler.compile("(? $group = ) \\( (? $expr ) \\) ", x);
        compiler.compile("(? $factor = ) \\d+ | (? $group ) ", x);
        compiler.compile("(? $term = ) (? $factor )"
                         " ( \\* (? $factor ) | / (? $factor ) )* ", x);
        expr = compiler.compile("(? $expr = ) (? $term )"
                                " ( \\+ (? $term ) | - (? $term ) )* ", x);
    }

    std::string str("foo 9*(10+3) bar");
    smatch what;

    if(regex_search(str, what, expr))
    {
        // This prints "9*(10+3)":
        std::cout << what[0] << std::endl;
    }
As with static regex grammars, nested regex invocations create nested match results (see Nested Results below). The result is a complete parse tree for the string that matched. Unlike static regexes, dynamic regexes are always embedded by reference, not by value.
The calculator examples above raise a number of very complicated memory-management issues. Each of the four regex objects refers to the others, some directly and some indirectly, some by value and some by reference. What if we were to return one of them from a function and let the others go out of scope? What becomes of the references? The answer is that the regex objects are internally reference counted, such that they keep their referenced regex objects alive as long as they need them. So passing a regex object by value is never a problem, even if it refers to other regex objects that have gone out of scope.
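Those of you who have dealt with reference counting are probably familiar with its Achilles heel: cyclic references. If regex objects are reference counted, what happens to cycles like the one created in the calculator examples? Are they leaked? The answer is no, they are not leaked. The basic_regex<> object has reference tracking code that ensures that even cyclic regex grammars are cleaned up when the last external reference goes away. So you can create cyclic grammars, pass your regex objects around, and copy them freely without leaking or leaving dangling references.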
Nested regular expressions raise the issue of sub-match scoping. If both the inner and outer regex write to and read from the same sub-match vector, chaos would ensue. The inner regex would stomp on the sub-matches written by the outer regex. For example, what does this do?
    sregex inner = sregex::compile( "(.)\\1" );
    sregex outer = (s1= _) >> inner >> s1;
The author probably didn't intend for the inner regex to overwrite the sub-match written by the outer regex. The problem is particularly acute when the inner regex is accepted from the user as input. The author has no way of knowing whether the inner regex will stomp the sub-match vector or not. This is clearly not acceptable.
        Instead, what actually happens is that each invocation of a nested regex
        gets its own scope. Sub-matches belong to that scope. That is, each nested
        regex invocation gets its own copy of the sub-match vector to play with,
        so there is no way for an inner regex to stomp on the sub-matches of an outer
        regex. So, for example, the regex outer
        defined above would match "ABBA",
        as it should.
      
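If nested regexes have their own sub-matches, there should be a way to access them after a successful match. In fact, there is. After a regex_match() or regex_search(), the match_results<> struct behaves like the head of a tree of nested results. The match_results<> class provides a nested_results() member function that returns an ordered sequence of match_results<> structures, representing the results of the nested regexes. The order of the nested results is the same as the order in which the nested regex objects matched.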
Take as an example the regex for balanced, nested parentheses we saw earlier:
    sregex parentheses;
    parentheses = '(' >> *( keep( +~(set='(',')') ) | by_ref(parentheses) ) >> ')';

    smatch what;
    std::string str( "blah blah( a(b)c (c(e)f (g)h )i (j)6 )blah" );

    if( regex_search( str, what, parentheses ) )
    {
        // display the whole match
        std::cout << what[0] << '\n';

        // display the nested results
        std::for_each(
            what.nested_results().begin(),
            what.nested_results().end(),
            output_nested_results() );
    }
This program displays the following:
( a(b)c (c(e)f (g)h )i (j)6 )
    (b)
    (c(e)f (g)h )
        (e)
        (g)
    (j)
Here you can see how the results are nested and that they are stored in the order in which they are found.
| ![[Tip]](../../../doc/src/images/tip.png) | Tip | 
|---|---|
| See the definition of output_nested_results in the Examples section. | 
        Sometimes a regex will have several nested regex objects, and you want to
        know which result corresponds to which regex object. That's where basic_regex<>::regex_id()
        and match_results<>::regex_id()
        come in handy. When iterating over the nested results, you can compare the
        regex id from the results to the id of the regex object you're interested
        in.
      
        To make this a bit easier, xpressive provides a predicate to make it simple
        to iterate over just the results that correspond to a certain nested regex.
        It is called regex_id_filter_predicate,
        and it is intended to be used with Boost.Iterator.
        You can use it as follows:
      
    sregex name = +alpha;
    sregex integer = +_d;
    sregex re = *( *_s >> ( name | integer ) );

    smatch what;
    std::string str( "marsha 123 jan 456 cindy 789" );

    if( regex_match( str, what, re ) )
    {
        smatch::nested_results_type::const_iterator begin = what.nested_results().begin();
        smatch::nested_results_type::const_iterator end   = what.nested_results().end();

        // declare filter predicates to select just the names or the integers
        sregex_id_filter_predicate name_id( name.regex_id() );
        sregex_id_filter_predicate integer_id( integer.regex_id() );

        // iterate over only the results from the name regex
        std::for_each(
            boost::make_filter_iterator( name_id, begin, end ),
            boost::make_filter_iterator( name_id, end, end ),
            output_result
            );

        std::cout << '\n';

        // iterate over only the results from the integer regex
        std::for_each(
            boost::make_filter_iterator( integer_id, begin, end ),
            boost::make_filter_iterator( integer_id, end, end ),
            output_result
            );
    }

where output_result is a simple function that takes a smatch and displays the full match. Notice how we use the regex_id_filter_predicate together with basic_regex<>::regex_id() and boost::make_filter_iterator() from Boost.Iterator to select only those results corresponding to a particular nested regex. This program displays the following:

    marsha
    jan
    cindy

    123
    456
    789
        Imagine you want to parse an input string and build a std::map<>
        from it. For something like that, matching a regular expression isn't enough.
        You want to do something when parts of your regular
        expression match. Xpressive lets you attach semantic actions to parts of
        your static regular expressions. This section shows you how.
      
        Consider the following code, which uses xpressive's semantic actions to parse
        a string of word/integer pairs and stuffs them into a std::map<>.
        It is described below.
      
    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    using namespace boost::xpressive;

    int main()
    {
        std::map<std::string, int> result;
        std::string str("aaa=>1 bbb=>23 ccc=>456");

        // Match a word and an integer, separated by =>,
        // and then stuff the result into a std::map<>
        sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
            [ ref(result)[s1] = as<int>(s2) ];

        // Match one or more word/integer pairs, separated
        // by whitespace.
        sregex rx = pair >> *(+_s >> pair);

        if(regex_match(str, rx))
        {
            std::cout << result["aaa"] << '\n';
            std::cout << result["bbb"] << '\n';
            std::cout << result["ccc"] << '\n';
        }

        return 0;
    }

This program prints the following:

    1
    23
    456
The regular expression pair has two parts: the pattern and the action. The pattern says to match a word, capturing it in sub-match 1, and an integer, capturing it in sub-match 2, separated by "=>". The action is the part in square brackets: [ ref(result)[s1] = as<int>(s2) ]. It says to take sub-match 1 and use it to index into the result map, and assign to it the result of converting sub-match 2 to an integer.
      
| ![[Note]](../../../doc/src/images/note.png) | Note | 
|---|---|
| To use semantic actions with your static regexes, you must #include <boost/xpressive/regex_actions.hpp> |
How does this work? Just like the rest of the static regular expression, the part between brackets is an expression template. It encodes the action and executes it later. The expression ref(result) creates a lazy reference to the result object. The larger expression ref(result)[s1] is a lazy map index operation. Later, when this action is executed, s1 gets replaced with the first sub_match<>. Likewise, when as<int>(s2) gets executed, s2 is replaced with the second sub_match<>. The as<> action converts its argument to the requested type using Boost.Lexical_cast. The effect of the whole action is to insert a new word/integer pair into the map.
      
| ![[Note]](../../../doc/src/images/note.png) | Note | 
|---|---|
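| There is an important difference between the function boost::ref() in <boost/ref.hpp> and boost::xpressive::ref() in <boost/xpressive/regex_actions.hpp>. The first returns a plain reference_wrapper<>, which behaves in many respects like an ordinary reference. By contrast, boost::xpressive::ref() returns a lazy reference that you can use in expressions that are executed lazily. That is why we can say ref(result)[s1], even though result doesn't have an operator[] that would accept s1. |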
        In addition to the sub-match placeholders s1,
        s2, etc., you can also use
        the placeholder _ within
        an action to refer back to the string matched by the sub-expression to which
        the action is attached. For instance, you can use the following regex to
        match a bunch of digits, interpret them as an integer and assign the result
        to a local variable:
      
    int i = 0;
    // Here, _ refers back to all the
    // characters matched by (+_d)
    sregex rex = (+_d)[ ref(i) = as<int>(_) ];
What does it mean, exactly, to attach an action to part of a regular expression and perform a match? When does the action execute? If the action is part of a repeated sub-expression, does the action execute once or many times? And if the sub-expression initially matches, but ultimately fails because the rest of the regular expression fails to match, is the action executed at all?
The answer is that by default, actions are executed lazily. When a sub-expression matches a string, its action is placed on a queue, along with the current values of any sub-matches to which the action refers. If the match algorithm must backtrack, actions are popped off the queue as necessary. Only after the entire regex has matched successfully are the actions actually executed. They are executed all at once, in the order in which they were added to the queue, as the last step before regex_match() returns.
For example, consider the following regex that increments a counter whenever it finds a digit.
    int i = 0;
    std::string str("1!2!3?");
    // count the exciting digits, but not the
    // questionable ones.
    sregex rex = +( _d [ ++ref(i) ] >> '!' );
    regex_search(str, rex);
    assert( i == 2 );
        The action ++ref(i)
        is queued three times: once for each found digit. But it is only executed
        twice: once for each digit that precedes a '!'
        character. When the '?' character
        is encountered, the match algorithm backtracks, removing the final action
        from the queue.
      
When you want semantic actions to execute immediately, you can wrap the sub-expression containing the action in a keep(). keep() turns off back-tracking for its sub-expression, but it also causes any actions queued by the sub-expression to execute at the end of the keep(). It is as if the sub-expression in the keep() were compiled into an independent regex object, and matching the keep() is like a separate invocation of regex_search(). It matches characters and executes actions but never backtracks or unwinds. For example, imagine the above example had been written as follows:
      
    int i = 0;
    std::string str("1!2!3?");
    // count all the digits.
    sregex rex = +( keep( _d [ ++ref(i) ] ) >> '!' );
    regex_search(str, rex);
    assert( i == 3 );
        We have wrapped the sub-expression _d
        [ ++ref(i) ] in keep().
        Now, whenever this regex matches a digit, the action will be queued and then
        immediately executed before we try to match a '!'
        character. In this case, the action executes three times.
      
| ![[Note]](../../../doc/src/images/note.png) | Note | 
|---|---|
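| Like keep(), actions within before() and after() are also executed early, when their sub-expressions have matched. |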
So far, we've seen how to write semantic actions consisting of variables and operators. But what if you want to be able to call a function from a semantic action? Xpressive provides a mechanism to do this.
        The first step is to define a function object type. Here, for instance, is
        a function object type that calls push() on its argument:
      
    struct push_impl
    {
        // Result type, needed for tr1::result_of
        typedef void result_type;

        template<typename Sequence, typename Value>
        void operator()(Sequence &seq, Value const &val) const
        {
            seq.push(val);
        }
    };
        The next step is to use xpressive's function<> template to define a function object
        named push:
      
    // Global "push" function object.
    function<push_impl>::type const push = {{}};
        The initialization looks a bit odd, but this is because push
        is being statically initialized. That means it doesn't need to be constructed
        at runtime. We can use push
        in semantic actions as follows:
      
    std::stack<int> ints;
    // Match digits, cast them to an int
    // and push it on the stack.
    sregex rex = (+_d)[push(ref(ints), as<int>(_))];
You'll notice that doing it this way causes member function invocations to look like ordinary function invocations. You can choose to write your semantic action in a different way that makes it look a bit more like a member function call:
    sregex rex = (+_d)[ref(ints)->*push(as<int>(_))];
        Xpressive recognizes the use of the ->*
        and treats this expression exactly the same as the one above.
      
When your function object must return a type that depends on its arguments, you can use a result<> member template instead of the result_type typedef. Here, for example, is a first function object that returns the first member of a std::pair<> or sub_match<>:

    // Function object that returns the
    // first element of a pair.
    struct first_impl
    {
        template<typename Sig> struct result {};

        template<typename This, typename Pair>
        struct result<This(Pair)>
        {
            typedef typename remove_reference<Pair>
                ::type::first_type type;
        };

        template<typename Pair>
        typename Pair::first_type
        operator()(Pair const &p) const
        {
            return p.first;
        }
    };

    // OK, use as first(s1) to get the begin iterator
    // of the sub-match referred to by s1.
    function<first_impl>::type const first = {{}};
As we've seen in the examples above, we can refer to local variables within an action using xpressive::ref().
        Any such variables are held by reference by the regular expression, and care
        should be taken to avoid letting those references dangle. For instance, in
        the following code, the reference to i
        is left to dangle when bad_voodoo() returns:
      
    sregex bad_voodoo()
    {
        int i = 0;
        sregex rex = +( _d [ ++ref(i) ] >> '!' );
        // ERROR! rex refers by reference to a local
        // variable, which will dangle after bad_voodoo()
        // returns.
        return rex;
    }
When writing semantic actions, it is your responsibility to make sure that all the references do not dangle. One way to do that would be to make the variables shared pointers that are held by the regex by value.
    sregex good_voodoo(boost::shared_ptr<int> pi)
    {
        // Use val() to hold the shared_ptr by value:
        sregex rex = +( _d [ ++*val(pi) ] >> '!' );
        // OK, rex holds a reference count to the integer.
        return rex;
    }
        In the above code, we use xpressive::val()
        to hold the shared pointer by value. That's not normally necessary because
        local variables appearing in actions are held by value by default, but in
        this case, it is necessary. Had we written the action as ++*pi, it would have executed immediately.
        That's because ++*pi
        is not an expression template, but ++*val(pi) is.
      
        It can be tedious to wrap all your variables in ref() and val() in your semantic actions. Xpressive provides
        the reference<>
        and value<>
        templates to make things easier. The following table shows the equivalencies:
      
Table 42.12. reference<> and value<>
| This ... | ... is equivalent to this ... |
|---|---|
| int i = 0; sregex rex = +( _d [ ++ref(i) ] >> '!' ); | int i = 0; reference<int> ri(i); sregex rex = +( _d [ ++ri ] >> '!' ); |
| boost::shared_ptr<int> pi(new int(0)); sregex rex = +( _d [ ++*val(pi) ] >> '!' ); | boost::shared_ptr<int> pi(new int(0)); value<boost::shared_ptr<int> > vpi(pi); sregex rex = +( _d [ ++*vpi ] >> '!' ); |
        As you can see, when using reference<>, you need to first declare a local
        variable and then declare a reference<> to it. These two steps can be combined
        into one using local<>.
      
Table 42.13. local<> vs. reference<>
| This ... | ... is equivalent to this ... |
|---|---|
| local<int> i(0); sregex rex = +( _d [ ++i ] >> '!' ); | int i = 0; reference<int> ri(i); sregex rex = +( _d [ ++ri ] >> '!' ); |
        We can use local<>
        to rewrite the above example as follows:
      
local<int> i(0);
std::string str("1!2!3?");
// count the exciting digits, but not the
// questionable ones.
sregex rex = +( _d [ ++i ] >> '!' );
regex_search(str, rex);
assert( i.get() == 2 );
        Notice that we use local<>::get() to access the value of the local variable.
        Also, beware that local<>
        can be used to create a dangling reference, just as reference<> can.
      
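        In the beginning of this section, we used a regex with a semantic action
        to parse a string of word/integer pairs and stuff them into a std::map<>. That required that the map and the
        regex be defined together and used before either could go out of scope. What
        if we wanted to define the regex once and use it to fill lots of different
        maps? We would rather pass the map into the regex_match() algorithm than
        embed a reference to it directly in the regex object. To do that, we define
        a placeholder and use it in the semantic action instead of the map itself,
        then bind it to an actual map when we call the algorithm: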
// Define a placeholder for a map object:
placeholder<std::map<std::string, int> > _map;

// Match a word and an integer, separated by =>,
// and then stuff the result into a std::map<>
sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
    [ _map[s1] = as<int>(s2) ];

// Match one or more word/integer pairs, separated
// by whitespace.
sregex rx = pair >> *(+_s >> pair);

// The string to parse
std::string str("aaa=>1 bbb=>23 ccc=>456");

// Here is the actual map to fill in:
std::map<std::string, int> result;

// Bind the _map placeholder to the actual map
smatch what;
what.let( _map = result );

// Execute the match and fill in result map
if(regex_match(str, what, rx))
{
    std::cout << result["aaa"] << '\n';
    std::cout << result["bbb"] << '\n';
    std::cout << result["ccc"] << '\n';
}
This program displays:
1
23
456
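        We use placeholder<>
        here to define _map, which
        stands in for a std::map<>
        variable. We can use the placeholder in the semantic action as if it were
        a map. Then, we define a match_results<> object and bind an actual map to
        the placeholder with "what.let( _map = result );". The regex_match() call
        behaves as if the placeholder in the semantic action had been replaced with
        a reference to result.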
      
| Note |
|---|
| Placeholders in semantic actions are not actually replaced at runtime with references to variables. The regex object is never mutated in any way during any of the regex algorithms, so they are safe to use in multiple threads. |
        The syntax for late-bound action arguments is a little different if you are
        using regex_iterator<> or regex_token_iterator<>. The regex iterators accept
        an extra constructor parameter for specifying the argument bindings. There
        is a let() function that you can use to bind variables
        to their placeholders. The following code demonstrates how.
      
// Define a placeholder for a map object:
placeholder<std::map<std::string, int> > _map;

// Match a word and an integer, separated by =>,
// and then stuff the result into a std::map<>
sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
    [ _map[s1] = as<int>(s2) ];

// The string to parse
std::string str("aaa=>1 bbb=>23 ccc=>456");

// Here is the actual map to fill in:
std::map<std::string, int> result;

// Create a regex_iterator to find all the matches
sregex_iterator it(str.begin(), str.end(), pair, let(_map=result));
sregex_iterator end;

// step through all the matches, and fill in
// the result map
while(it != end)
    ++it;

std::cout << result["aaa"] << '\n';
std::cout << result["bbb"] << '\n';
std::cout << result["ccc"] << '\n';
This program displays:
1
23
456
        You are probably already familiar with regular expression assertions.
        In Perl, some examples are the ^ and $
        assertions, which you can use to match the beginning and end of a string,
        respectively. Xpressive lets you define your own assertions. A custom assertion
        is a condition which must be true at a point in the match in order for the
        match to succeed. You can check a custom assertion with xpressive's check()
        function.
There are a couple of ways to define a custom assertion. The simplest is to use a function object. Let's say that you want to ensure that a sub-expression matches a sub-string that is either 3 or 6 characters long. The following struct defines such a predicate:
// A predicate that is true IFF a sub-match is
// either 3 or 6 characters long.
struct three_or_six
{
    bool operator()(ssub_match const &sub) const
    {
        return sub.length() == 3 || sub.length() == 6;
    }
};
You can use this predicate within a regular expression as follows:
// match words of 3 characters or 6 characters.
sregex rx = (bow >> +_w >> eow)[ check(three_or_six()) ] ;
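        The above regular expression will find whole words that are either 3 or 6
        characters long. The three_or_six
        predicate accepts a sub_match<> that refers back to the part of the string
        matched by the sub-expression to which the custom assertion is attached.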
| Note |
|---|
| The custom assertion participates in determining whether the match succeeds or fails. Unlike actions, which execute lazily, custom assertions execute immediately while the regex engine is searching for a match. |
Custom assertions can also be defined inline using the same syntax as for semantic actions. Below is the same custom assertion written inline:
// match words of 3 characters or 6 characters.
sregex rx = (bow >> +_w >> eow)[ check(length(_)==3 || length(_)==6) ] ;
        In the above, length()
        is a lazy function that calls the length() member function of its argument, and _ is a placeholder that receives the sub_match.
      
Once you get the hang of writing custom assertions inline, they can be very powerful. For example, you can write a regular expression that only matches valid dates (for some suitably liberal definition of the term “valid”).
int const days_per_month[] =
    {31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

mark_tag month(1), day(2);

// find a valid date of the form month/day/year.
sregex date =
    (
        // Month must be between 1 and 12 inclusive
        (month= _d >> !_d)
            [ check(as<int>(_) >= 1 && as<int>(_) <= 12) ]
    >>  '/'
        // Day must be between 1 and 31 inclusive
    >>  (day= _d >> !_d)
            [ check(as<int>(_) >= 1 && as<int>(_) <= 31) ]
    >>  '/'
        // Only consider years between 1970 and 2038
    >>  (_d >> _d >> _d >> _d)
            [ check(as<int>(_) >= 1970 && as<int>(_) <= 2038) ]
    )
    // Ensure the month actually has that many days!
    [ check( ref(days_per_month)[as<int>(month)-1] >= as<int>(day) ) ]
;

smatch what;
std::string str("99/99/9999 2/30/2006 2/28/2006");

if(regex_search(str, what, date))
{
    std::cout << what[0] << std::endl;
}
The above program prints out the following:
2/28/2006
        Notice how the inline custom assertions are used to range-check the values
        for the month, day and year. The regular expression doesn't match "99/99/9999" or "2/30/2006"
        because they are not valid dates. (There is no 99th month, and February doesn't
        have 30 days.)
      
        Symbol tables can be built into xpressive regular expressions with just a
        std::map<>.
        The map keys are the strings to be matched and the map values are the data
        to be returned to your semantic action. Xpressive attributes, named a1, a2,
        through a9, hold the value
        corresponding to a matching key so that it can be used in a semantic action.
        A default value can be specified for an attribute if a symbol is not found.
      
        An xpressive symbol table is just a std::map<>,
        where the key is a string type and the value can be anything. For example,
        the following regular expression matches a key from map1 and assigns the
        corresponding value to the attribute a1.
        Then, in the semantic action, it assigns the value stored in attribute a1 to an integer result.
      
int result;
std::map<std::string, int> map1;
// ... (fill the map)
sregex rx = ( a1 = map1 ) [ ref(result) = a1 ];
Consider the following example code, which translates number names into integers. It is described below.
#include <map>
#include <string>
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
using namespace boost::xpressive;

int main()
{
    std::map<std::string, int> number_map;
    number_map["one"] = 1;
    number_map["two"] = 2;
    number_map["three"] = 3;

    // Match a string from number_map
    // and store the integer value in 'result'
    // if not found, store -1 in 'result'
    int result = 0;
    cregex rx = ((a1 = number_map ) | *_)
        [ ref(result) = (a1 | -1)];

    regex_match("three", rx);
    std::cout << result << '\n';
    regex_match("two", rx);
    std::cout << result << '\n';
    regex_match("stuff", rx);
    std::cout << result << '\n';
    return 0;
}
This program prints the following:
3
2
-1
        First the program builds a number map, with number names as string keys and
        the corresponding integers as values. Then it constructs a static regular
        expression using an attribute a1
        to represent the result of the symbol table lookup. In the semantic action,
        the attribute is assigned to an integer variable result.
        If the symbol was not found, a default value of -1 is assigned to result.
        A wildcard, *_,
        makes sure the regex matches even if the symbol is not found.
      
        A more complete version of this example can be found in libs/xpressive/example/numbers.cpp[10]. It translates number names up to "nine hundred ninety nine
        million nine hundred ninety nine thousand nine hundred ninety nine"
        along with some special number names like "dozen".
      
        Symbol table matches are case sensitive by default, but they can be made
        case-insensitive by enclosing the expression in icase().
      
        Up to nine attributes can be used in a regular expression. They are named
        a1, a2,
        ..., a9 in the boost::xpressive namespace. The attribute type
        is the same as the second component of the map that is assigned to it. A
        default value for an attribute can be specified in a semantic action with
        the syntax (a1 | default-value).
        Attributes are properly scoped, so you can do crazy things like: ( (a1=sym1)
        >> (a1=sym2)[ref(x)=a1] )[ref(y)=a1]. The
        inner semantic action sees the inner a1,
        and the outer semantic action sees the outer one. They can even have different
        types.
      
| Note |
|---|
| Xpressive builds a hidden ternary search trie from the map so it can search quickly. If BOOST_DISABLE_THREADS is defined, the hidden ternary search trie "self adjusts", so after each search it restructures itself to improve the efficiency of future searches based on the frequency of previous searches. |
        Matching a regular expression against a string often requires locale-dependent
        information. For example, how are case-insensitive comparisons performed?
        The locale-sensitive behavior is captured in a traits class. xpressive provides
        three traits class templates: cpp_regex_traits<>, c_regex_traits<> and null_regex_traits<>. The first wraps a std::locale,
        the second wraps the global C locale, and the third is a stub traits type
        for use when searching non-character data. All traits templates conform to
        the Regex
        Traits Concept.
      
        By default, xpressive uses cpp_regex_traits<> for all patterns. This causes all
        regex objects to use the global std::locale.
        If you compile with BOOST_XPRESSIVE_USE_C_TRAITS
        defined, then xpressive will use c_regex_traits<> by default.
      
        To create a dynamic regex that uses a custom traits object, you must use
        regex_compiler<>. The basic steps are shown in the following example:
// Declare a regex_compiler that uses the global C locale
regex_compiler<char const *, c_regex_traits<char> > crxcomp;
cregex crx = crxcomp.compile( "\\w+" );

// Declare a regex_compiler that uses a custom std::locale
std::locale loc = /* ... create a locale here ... */;
regex_compiler<char const *, cpp_regex_traits<char> > cpprxcomp(loc);
cregex cpprx = cpprxcomp.compile( "\\w+" );
        The regex_compiler objects
        act as regex factories. Once they have been imbued with a locale, every regex
        object they create will use that locale.
      
        If you want a particular static regex to use a different set of traits, you
        can use the special imbue() pattern modifier. For instance:
      
// Define a regex that uses the global C locale
c_regex_traits<char> ctraits;
sregex crx = imbue(ctraits)( +_w );

// Define a regex that uses a customized std::locale
std::locale loc = /* ... create a locale here ... */;
cpp_regex_traits<char> cpptraits(loc);
sregex cpprx1 = imbue(cpptraits)( +_w );

// A shorthand for above
sregex cpprx2 = imbue(loc)( +_w );
        The imbue()
        pattern modifier must wrap the entire pattern. It is an error to imbue only part of a static regex. For
        example:
      
// ERROR! Cannot imbue() only part of a regex
sregex error = _w >> imbue(loc)( _w );
null_regex_traits
      
        With xpressive static regexes, you are not limited to searching for patterns
        in character sequences. You can search for patterns in raw bytes, integers,
        or anything that conforms to the Char
        Concept. The null_regex_traits<> makes it simple. It is a stub implementation
        of the Regex
        Traits Concept. It recognizes no character classes and does no case-sensitive
        mappings.
      
        For example, with null_regex_traits<>, you can write a static regex to
        find a pattern in a sequence of integers as follows:
      
// some integral data to search
int const data[] = {0, 1, 2, 3, 4, 5, 6};

// create a null_regex_traits<> object for searching integers ...
null_regex_traits<int> nul;

// imbue a regex object with the null_regex_traits ...
basic_regex<int const *> rex = imbue(nul)(1 >> +((set= 2,3) | 4) >> 5);
match_results<int const *> what;

// search for the pattern in the array of integers ...
regex_search(data, data + 7, what, rex);

assert(what[0].matched);
assert(*what[0].first == 1);
assert(*what[0].second == 6);
Squeeze the most performance out of xpressive with these tips and tricks.
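        Compiling a regex (dynamic or static) is far more expensive
        than executing a match or search. If you have the option, prefer to compile
        a pattern into a basic_regex<> object once and reuse it rather than
        recreating it over and over.
        Since basic_regex<> objects are not mutated by any of the regex algorithms,
        they are thread-safe once their initialization (and that of any grammars of
        which they are members) completes. The easiest way to reuse your patterns
        is to make your basic_regex<> objects "static const".
Reuse match_results<> Objects
        The match_results<> object caches dynamically allocated memory. For this
        reason, it is far better to reuse the same match_results<> object if you
        have to do many regex searches.
        Caveat: match_results<> objects are not thread-safe, so do not reuse them
        across threads.
Prefer Algorithms That Take A match_results<> Object
        This is a corollary to the previous tip. If you are doing multiple searches,
        you should prefer the regex algorithms that accept a match_results<> object
        over the ones that don't, and you should reuse the same match_results<>
        object each time. If you don't provide a match_results<> object, a temporary
        one is created for you and discarded when the algorithm returns, so any
        memory it cached has to be reallocated the next time.
        xpressive provides overloads of the regex_match() and regex_search()
        algorithms that operate on C-style null-terminated strings. You should
        prefer the overloads that take iterator ranges. When you pass a
        null-terminated string to a regex algorithm, the end iterator is calculated
        immediately by calling strlen. If you already
        know the length of the string, you can avoid this overhead by calling the
        regex algorithms with a [begin, end)
        pair.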
      
On average, static regexes execute about 10 to 15% faster than their dynamic counterparts. It's worth familiarizing yourself with the static regex dialect.
syntax_option_type::optimize
      
        The optimize flag tells the
        regex compiler to spend some extra time analyzing the pattern. It can cause
        some patterns to execute faster, but it increases the time to compile the
        pattern, and often increases the amount of memory consumed by the pattern.
        If you plan to reuse your pattern, optimize
        is usually a win. If you will only use the pattern once, don't use optimize.
      
Keep the following tips in mind to avoid stepping in potholes with xpressive.
With static regexes, you can create grammars by nesting regexes inside one another. When compiling the outer regex, both the outer and inner regex objects, and all the regex objects to which they refer either directly or indirectly, are modified. For this reason, it's dangerous for global regex objects to participate in grammars. It's best to build regex grammars from a single thread. Once built, the resulting regex grammar can be executed from multiple threads without problems.
        This is a pitfall common to many regular expression engines. Some patterns
        can cause exponentially bad performance. Often these patterns involve one
        quantified term nested within another quantifier, such as "(a*)*", although in many cases,
        the problem is harder to spot. Beware of patterns that have nested quantifiers.
      
        If type BidiIterT is used
        as a template argument to basic_regex<>, then CharT is iterator_traits<BidiIterT>::value_type. Type CharT
        must have a trivial default constructor, copy constructor, assignment operator,
        and destructor. In addition, the following requirements must be met for objects
        c of type CharT,
        c1 and c2
        of type CharT const,
        and i of type int:
      
Table 42.14. CharT Requirements
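| Expression | Return type | Assertion / Note / Pre- / Post-condition |
|---|---|---|
| CharT c | CharT | Default constructor (must be trivial). |
| CharT c(c1) | CharT | Copy constructor (must be trivial). |
| c1 = c2 | CharT | Assignment operator (must be trivial). |
| c1 == c2 | bool | true if c1 has the same value as c2. |
| c1 != c2 | bool | true if c1 and c2 are not equal. |
| c1 < c2 | bool | true if the value of c1 is less than c2. |
| c1 > c2 | bool | true if the value of c1 is greater than c2. |
| c1 <= c2 | bool | true if c1 is less than or equal to c2. |
| c1 >= c2 | bool | true if c1 is greater than or equal to c2. |
| intmax_t i = c1 | int | CharT must be convertible to an integral type. |
| CharT c(i); | CharT | CharT must be constructible from an integral type. |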
        In the following table X
        denotes a traits class defining types and functions for the character container
        type CharT; u is an object of type X;
        v is an object of type const X;
        p is a value of type const CharT*; I1
        and I2 are Input Iterators;
        c is a value of type const CharT;
        s is an object of type X::string_type;
        cs is an object of type
        const X::string_type;
        b is a value of type bool; i
        is a value of type int; F1 and F2
        are values of type const CharT*; loc
        is an object of type X::locale_type; and ch
        is an object of const char.
      
Table 42.15. Traits Requirements
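| Expression | Return type | Assertion / Note |
|---|---|---|
| X::char_type | CharT | The character container type used in the implementation of class template basic_regex<>. |
| X::string_type | std::basic_string<CharT> or std::vector<CharT> | |
| X::locale_type | Implementation defined | A copy constructible type that represents the locale used by the traits class. |
| X::char_class_type | Implementation defined | A bitmask type representing a particular character classification. Multiple values of this type can be bitwise-or'ed together to obtain a new valid value. |
| X::hash(c) | unsigned char | Yields a value between 0 and UCHAR_MAX inclusive. |
| v.widen(ch) | CharT | Widens the specified char and returns the resulting CharT. |
| v.in_range(r1, r2, c) | bool | For any characters r1 and r2, returns true if r1 <= c && c <= r2. Requires that r1 <= r2. |
| v.in_range_nocase(r1, r2, c) | bool | For characters r1 and r2, returns true if there is some character d for which v.translate_nocase(d) == v.translate_nocase(c) and r1 <= d && d <= r2. Requires that r1 <= r2. |
| v.translate(c) | X::char_type | Returns a character such that for any character d that is to be considered equivalent to c then v.translate(c) == v.translate(d). |
| v.translate_nocase(c) | X::char_type | For all characters C that are to be considered equivalent to c when comparisons are to be performed without regard to case, then v.translate_nocase(c) == v.translate_nocase(C). |
| v.transform(F1, F2) | X::string_type | Returns a sort key for the character sequence designated by the iterator range [F1, F2) such that if the character sequence [G1, G2) sorts before the character sequence [H1, H2) then v.transform(G1, G2) < v.transform(H1, H2). |
| v.transform_primary(F1, F2) | X::string_type | Returns a sort key for the character sequence designated by the iterator range [F1, F2) such that if the character sequence [G1, G2) sorts before the character sequence [H1, H2) when character case is not considered then v.transform_primary(G1, G2) < v.transform_primary(H1, H2). |
| v.lookup_classname(F1, F2) | X::char_class_type | Converts the character sequence designated by the iterator range [F1, F2) into a bitmask type that can subsequently be passed to isctype. Returns 0 if the character sequence is not the name of a character class recognized by X. |
| v.lookup_collatename(F1, F2) | X::string_type | Returns a sequence of characters that represents the collating element consisting of the character sequence designated by the iterator range [F1, F2). Returns an empty string if the character sequence is not a valid collating element. |
| v.isctype(c, v.lookup_classname(F1, F2)) | bool | Returns true if character c is a member of the character class designated by the iterator range [F1, F2), false otherwise. |
| v.value(c, i) | int | Returns the value represented by the digit c in base i if the character c is a valid digit in base i; otherwise returns -1. |
| u.imbue(loc) | X::locale_type | Imbues u with the locale loc, returns the previous locale used by u. |
| v.getloc() | X::locale_type | Returns the current locale used by v. |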
This section is adapted from the equivalent page in the Boost.Regex documentation and from the proposal to add regular expressions to the Standard Library.
        Below you can find six complete sample programs. 
      
This is the example from the Introduction. It is reproduced here for your convenience.
#include <iostream>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
    std::string hello( "hello world!" );

    sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
    smatch what;

    if( regex_match( hello, what, rex ) )
    {
        std::cout << what[0] << '\n'; // whole match
        std::cout << what[1] << '\n'; // first capture
        std::cout << what[2] << '\n'; // second capture
    }

    return 0;
}
This program outputs the following:
hello world!
hello
world
        Notice in this example how we use custom mark_tags
        to make the pattern more readable. We can use the mark_tags
        later to index into the match_results<>.
#include <iostream>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
    char const *str = "I was born on 5/30/1973 at 7am.";

    // define some custom mark_tags with names more meaningful than s1, s2, etc.
    mark_tag day(1), month(2), year(3), delim(4);

    // this regex finds a date
    cregex date = (month= repeat<1,2>(_d))           // find the month ...
               >> (delim= (set= '/','-'))            // followed by a delimiter ...
               >> (day=   repeat<1,2>(_d)) >> delim  // and a day followed by the same delimiter ...
               >> (year=  repeat<1,2>(_d >> _d));    // and the year.

    cmatch what;

    if( regex_search( str, what, date ) )
    {
        std::cout << what[0]     << '\n'; // whole match
        std::cout << what[day]   << '\n'; // the day
        std::cout << what[month] << '\n'; // the month
        std::cout << what[year]  << '\n'; // the year
        std::cout << what[delim] << '\n'; // the delimiter
    }

    return 0;
}
This program outputs the following:
5/30/1973
30
5
1973
/
The following program finds dates in a string and marks them up with pseudo-HTML.
#include <iostream>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
    std::string str( "I was born on 5/30/1973 at 7am." );

    // essentially the same regex as in the previous example, but using a dynamic regex
    sregex date = sregex::compile( "(\\d{1,2})([/-])(\\d{1,2})\\2((?:\\d{2}){1,2})" );

    // As in Perl, $& is a reference to the sub-string that matched the regex
    std::string format( "<date>$&</date>" );

    str = regex_replace( str, date, format );
    std::cout << str << '\n';

    return 0;
}
This program outputs the following:
I was born on <date>5/30/1973</date> at 7am.
        The following program finds the words in a wide-character string. It uses
        wsregex_iterator. Notice
        that dereferencing a wsregex_iterator
        yields a wsmatch object.
      
#include <iostream>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
    std::wstring str( L"This is his face." );

    // find a whole word
    wsregex token = +alnum;

    wsregex_iterator cur( str.begin(), str.end(), token );
    wsregex_iterator end;

    for( ; cur != end; ++cur )
    {
        wsmatch const &what = *cur;
        std::wcout << what[0] << L'\n';
    }

    return 0;
}
This program outputs the following:
This
is
his
face
        The following program finds race times in a string and displays first the
        minutes and then the seconds. It uses regex_token_iterator<>.
#include <iostream>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
    std::string str( "Eric: 4:40, Karl: 3:35, Francesca: 2:32" );

    // find a race time
    sregex time = sregex::compile( "(\\d):(\\d\\d)" );

    // for each match, the token iterator should first take the value of
    // the first marked sub-expression followed by the value of the second
    // marked sub-expression
    int const subs[] = { 1, 2 };

    sregex_token_iterator cur( str.begin(), str.end(), time, subs );
    sregex_token_iterator end;

    for( ; cur != end; ++cur )
    {
        std::cout << *cur << '\n';
    }

    return 0;
}
This program outputs the following:
4
40
3
35
2
32
        The following program takes some text that has been marked up with HTML and
        strips out the mark-up. It uses a regex that matches an HTML tag and a
        regex_token_iterator<> that returns the parts of the string that do not
        match the regex.
#include <iostream>
#include <boost/xpressive/xpressive.hpp>

using namespace boost::xpressive;

int main()
{
    std::string str( "Now <bold>is the time <i>for all good men</i> to come to the aid of their</bold> country." );

    // find a HTML tag
    sregex html = '<' >> optional('/') >> +_w >> '>';

    // the -1 below directs the token iterator to display the parts of
    // the string that did NOT match the regular expression.
    sregex_token_iterator cur( str.begin(), str.end(), html, -1 );
    sregex_token_iterator end;

    for( ; cur != end; ++cur )
    {
        std::cout << '{' << *cur << '}';
    }
    std::cout << '\n';

    return 0;
}
This program outputs the following:
{Now }{is the time }{for all good men}{ to come to the aid of their}{ country.}
Here is a helper class to demonstrate how you might display a tree of nested results:
#include <algorithm>
#include <iostream>
#include <iterator>

// Displays nested results to std::cout with indenting
struct output_nested_results
{
    int tabs_;

    output_nested_results( int tabs = 0 )
        : tabs_( tabs )
    {
    }

    template< typename BidiIterT >
    void operator ()( match_results< BidiIterT > const &what ) const
    {
        // first, do some indenting
        typedef typename std::iterator_traits< BidiIterT >::value_type char_type;
        char_type space_ch = char_type(' ');
        std::fill_n( std::ostream_iterator<char_type>( std::cout ), tabs_ * 4, space_ch );

        // output the match
        std::cout << what[0] << '\n';

        // output any nested matches
        std::for_each(
            what.nested_results().begin(),
            what.nested_results().end(),
            output_nested_results( tabs_ + 1 ) );
    }
};