GREP — Find Regular Expressions in Files
Quick Start for Release 8.0
Program Dated 4 May 2005 / This Document Dated 4 May 2005
Copyright © 1986–2005 Stan Brown, Oak Road Systems
Summary: GREP searches named input files, or the standard input, and displays lines that match one or more patterns called regular expressions or regexes. GREP can also search binary files and display records or buffers that contain matches. This Quick Start is your overview of GREP.
These documents are sometimes revised between software releases — you may want to check for revisions at <http://oakroadsystems.com/sharware/grep.htm>.
The DOS filter FIND is useful for finding a given string in one or more files. But what if you want to find the word "the" in caps or lower case, without also finding "other", "There", "then", and so on? You don't really want to search for a specific string. Rather, what you're looking for is a regular expression or regex, namely "the" preceded and followed by something other than a letter. GREP to the rescue!
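To make the idea concrete, here is a minimal sketch in Python rather than GREP. It assumes Python's `re` module as a stand-in; the lookaround syntax used here is Python's and is not claimed to be what GREP accepts. The sample strings are invented for illustration.

```python
import re

# "the" with no letter immediately before or after it, in caps or lower case.
# The (?<!...) and (?!...) lookarounds are Python syntax (an assumption of this
# sketch), used only to express "something other than a letter on each side".
WORD_THE = re.compile(r"(?<![A-Za-z])the(?![A-Za-z])", re.IGNORECASE)

def has_word_the(line):
    """Return True if the line contains "the" as a separate word."""
    return WORD_THE.search(line) is not None

samples = ["The end", "on the mat", "other", "There", "then"]
hits = [s for s in samples if has_word_the(s)]
print(hits)
```

A plain string search for "the" would have reported all five samples; the pattern keeps only the lines where "the" stands alone.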
GREP takes one or more regexes, matches them against the input files, and displays the hits.
Oak Road Systems GREP combines most features of UNIX grep, egrep, and fgrep. GREP has many other advantages over FIND besides using regular expressions. Indeed, customers have cited some of these as features they couldn't find in competing GREPs:
The 16-bit version, GREP16, runs under DOS 2.0 or higher, including a DOS box under any version of Windows. The 32-bit version, GREP32, requires a DOS box (or "command prompt") under Windows 95, Windows NT, or any later Windows.
The two executables operate the same and have the same features, except that you need GREP32 for long filenames, for extended regexes, and for character mapping. If you typically run GREP in a DOS box ("command prompt") under Windows 95 or NT or later, GREP32 is the one you want.
There's no special installation procedure. Simply move GREP16.EXE, GREP32.EXE, or both to any convenient directory in your path.
An interactive program tour is included as file TOUR.BAT; just type TOUR after unZIPping the archive.
You may wish to rename the executable you use more often to the simpler GREP.EXE. All the examples in this GREP Quick Start assume you've done that. Otherwise, just substitute GREP16 or GREP32 wherever you see GREP in the examples.
Starting with release 7.5, a Quick Reference Card is included as an MS-Word file, GREPQRC.DOC. It's suitable for printing in 8½×11 or A4 format.
GREP is shareware. You are encouraged to "try before you buy" with the free download.
If you use GREP past a 30-day evaluation period, you must register and pay for it. Please see the file LICENSE.TXT for full details, including support and warranty information.
The unregistered evaluation version displays a registration reminder when you run it, and a request for feedback at the end.
Warning for batch files: About once per hundred runs, the unregistered version prompts you to press a key to continue execution. GREP works just fine in batch files, but you need to be at your computer when running unregistered GREP so that you can answer that prompt. If you like GREP enough to put it into batch files that run unattended, you like it enough to register it!
When you register, you get the registered version with these benefits:
There's no special uninstall procedure; simply delete the GREP files. GREP doesn't write any secret files or modify the Windows registry.
The basic GREP command form is
grep options regex inputfiles
(You can also run GREP from the Windows desktop; see the GREP Manual.)
Options are listed later in this GREP Quick Start and are fully explained in the GREP Manual.
regex is a string or a special pattern-matching string called a regular expression or regex. Regex patterns are listed later in this GREP Quick Start and are explained in detail in the GREP Manual. (A regex is normally required on the command line; however, if you use the /F option, one or more regexes are taken from a file or the keyboard instead of the command line.)
You can specify inputfiles on the command line; otherwise GREP reads the standard input.
As with any command, you can redirect or pipe inputs or output. GREP can return a useful value in ERRORLEVEL, as explained in the GREP Manual.
Here are two simple examples. First,
grep /I pic[t\s] \proj\*.cob
examines every COBOL source file in the root-level PROJ directory and displays every line that contains a picture clause ("pic" followed by either "t" or a space) in caps or lower case (the /I option). Adding the /S option
grep /I /S pic[t\s] \*.cob
examines every COBOL source file in all directories on the current disk.
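The pattern in these two commands can be tried out in Python as a rough stand-in. This sketch assumes that `re.IGNORECASE` plays the role of the /I option and that Python's `\s` (any whitespace) is at least as broad as GREP's; the COBOL lines are invented for illustration.

```python
import re

# Same pattern text as the command-line example; re.IGNORECASE stands in for /I.
# Assumption: Python's \s matches any whitespace character, not only a space.
PIC = re.compile(r"pic[t\s]", re.IGNORECASE)

lines = [
    "05 AMOUNT PIC 9(5)V99.",       # "PIC" followed by a space
    "05 NAME   picture x(20).",     # "pic" followed by "t"
    "* not a clause: epic.",        # "pic" followed by a period
    "MOVE TOPIC-CODE TO OUT-REC.",  # "PIC" followed by a hyphen
]
hits = [ln for ln in lines if PIC.search(ln)]
for ln in hits:
    print(ln)
```

Only the first two lines are reported, because only there is "pic" followed by "t" or whitespace.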
For a summary of operating instructions, type
grep /? | more
Since the help text is over 100 lines long, you might prefer to redirect it to your printer or a file:
grep /? >prn:
GREP scans either named input files or the standard input — the standard input can be a named file, a pipe, or the keyboard.
Named input files provide the greatest flexibility. They can be read as text or binary, and you can search subdirectory trees.
GREP32 can use long filenames; GREP16 requires short (8.3) filenames.
GREP expands any wildcards in named input files. Not only DOS-style * and ?, but UNIX-style [...] can be used. For instance, "c:\My Documents\[abc]*doc" tells GREP to read every file in the indicated directory whose name starts with A, B, or C and ends with DOC (including ".DOC"). Please see Named Input Files in the GREP Manual for complete rules.
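As a rough illustration of the `[abc]*doc` wildcard, the sketch below uses Python's `fnmatch` module, which happens to support `*`, `?`, and `[...]`. It is only an approximation: GREP's own expansion rules are the ones in the GREP Manual, and the filenames are hypothetical.

```python
from fnmatch import fnmatchcase

def matches_wildcard(filename, pattern):
    """Case-insensitive wildcard test, imitating DOS's case-blind filenames."""
    return fnmatchcase(filename.lower(), pattern.lower())

# Hypothetical directory listing, invented for this example.
names = ["Alpha.DOC", "budget.doc", "Chapter1doc", "delta.doc", "abc.txt"]
selected = [n for n in names if matches_wildcard(n, "[abc]*doc")]
print(selected)
```

Note that "Chapter1doc" is selected even though it has no ".doc" extension, which mirrors the remark that the name need only end with DOC.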
You can use the /X option to exclude some files or groups of files from consideration. For instance, if you want all 2001 reports except December, you might specify something like
grep [options] [regex] *2001* -x*dec2001*
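The include-then-exclude logic can be sketched in Python as follows. This is an assumption about the general idea only, not GREP's implementation; `select_files` and the report names are hypothetical.

```python
from fnmatch import fnmatchcase

def select_files(names, include, exclude):
    """Keep names that match `include` unless they also match `exclude`."""
    keep = []
    for name in names:
        low = name.lower()  # compare case-insensitively, as DOS does
        if fnmatchcase(low, include) and not fnmatchcase(low, exclude):
            keep.append(name)
    return keep

# Hypothetical report files, invented for this example.
reports = ["sales_nov2001.txt", "sales_dec2001.txt", "Q1-2001.rpt", "sales_dec2002.txt"]
chosen = select_files(reports, "*2001*", "*dec2001*")
print(chosen)
```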
If you have many named input files, you may want to store the list in a file; see the /@ option.
If you set the /S option, GREP searches not only the files indicated on the command line, but also the same-named files in subdirectories.
(The /S option is fully functional in the registered version, and searches all the way to the bottom of a directory tree. In the unregistered evaluation version, GREP searches the named or implied directories and all directories immediately below them, but no further in any one execution. You can either make multiple runs, or register GREP for the convenience of searching the entire directory tree.)
For example, with the command
grep /S regex \hazax* *.c g:\mumble\*.htm
GREP examines all files on the entire current drive whose names start with "hazax"; then it looks at all C source files in the current directory and all subdirectories under it; finally it looks at all HTML files in directory "g:\mumble" and all subdirectories under it.
Perhaps a more realistic example: you have a document about Vandelay Industries somewhere on your disk, but you can't remember where. You can find it this way:
grep /S Vandelay \*
or:
grep /S Vandelay \*.*
(Both * and *.* select all files; see Wildcard Expansion in the GREP Manual.) You might want to add the /I option if you can't remember how "Vandelay" was capitalized.
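Conceptually, a subdirectory search walks the tree, tests each filename against the wildcard, and scans the matching files. The Python sketch below shows that idea under stated assumptions: `search_tree` is a hypothetical helper, it does a plain substring search rather than a regex search, and it is not how GREP itself is implemented.

```python
import os
from fnmatch import fnmatchcase

def search_tree(root, name_pattern, needle, ignore_case=False):
    """Yield (path, line_number, line) for each line containing `needle`.

    Walks `root` and every directory below it, looking only at files whose
    names match `name_pattern`.
    """
    if ignore_case:
        needle = needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if not fnmatchcase(filename.lower(), name_pattern.lower()):
                continue
            path = os.path.join(dirpath, filename)
            # errors="replace" keeps the scan going on undecodable bytes.
            with open(path, "r", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    text = line.lower() if ignore_case else line
                    if needle in text:
                        yield path, number, line.rstrip("\n")
```

For instance, `list(search_tree(".", "*", "vandelay", ignore_case=True))` would list matching lines under the current directory, roughly in the spirit of adding /I to the command.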
If you don't specify any named input files, GREP takes its input from the standard input. That can mean any of these three sources:
input redirected from a single file (DOS doesn't allow wildcards):
grep [options] [regex] <inputfile
another command's output piped into GREP for further processing:
other-command | grep [options] [regex]
keyboard input (GREP prompts you):
grep [options] [regex]
Example:
tracert oakroadsystems.com | grep 123
tells GREP to read the tracert command's output and display any lines that contain the string "123".
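The filtering step in such a pipe can be sketched in Python. The sample text below is invented and only loosely resembles tracert output; `filter_lines` is a hypothetical helper, and in a real pipe the stream would be `sys.stdin`.

```python
import io

def filter_lines(stream, needle):
    """Return the lines from `stream` that contain `needle`, without newlines."""
    return [line.rstrip("\n") for line in stream if needle in line]

# Invented sample text standing in for another command's piped output.
fake_output = io.StringIO(
    "  1    10 ms  192.0.2.1\n"
    "  2    12 ms  198.51.100.123\n"
    "  3    20 ms  203.0.113.7\n"
)
hits = filter_lines(fake_output, "123")
print(hits)
```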
GREP was originally written with plain text files in mind, but you can also use it quite well with binary files like word-processing files, databases, and executable programs. GREP not only reads binary files differently, it also adjusts the display format for matches.
DOS doesn't mark a file as text or binary; the program that reads the file just has to know. GREP "knows" files are binary when you tell it via the /R2 or /R3 option; otherwise it treats input files as text. Use the /R3 option when you don't know any details of the internal structure of the binary file; please see Binary Files and Text Files in the GREP Manual for much more about binary files.
Registered users can use the /R-1 or /R-2 option to have GREP examine each file and decide whether it's text or free-form binary; please see the /R option in the GREP Manual for details. If you have the registered version, I recommend /R-1.
Normally, GREP displays hits on your screen. "Hits" are the text lines, binary records, or binary buffers that contain matches for the regex(es). As part of the output, GREP displays the file path and name as a header above the group of hits from that file. You can use various options to display abbreviated or expanded forms of hits or to suppress those headers, move them to the lines with the hits, or display headers even for files that had no hits.
You can also redirect GREP's output into a file or pipe GREP's output to another command (even another GREP command). To redirect GREP output, follow the DOS rules and put one of these at the end of the GREP command line:
>> reportfile
appends GREP's output to an existing file, or creates the file and writes to it if it doesn't exist.
> reportfile
overwrites an existing file with GREP's output, or creates the file and writes to it if it doesn't exist.
| other-command
pipes GREP's output to the standard input stream of the other command.
You can pipe or redirect output regardless of whether input was piped or redirected.
Only the hits (and file path\name headers, if present) are redirected by the above syntax. Errors and warning messages are still sent to the standard error stream. That is usually your screen, though some OSes or shell replacements let you redirect error output. For example, in 4DOS and 4NT type help piping or help redirection for information.
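The separation of hits from messages is the usual two-stream arrangement, which the following Python sketch illustrates. The `report` function and the message text are hypothetical; the point is only that redirecting the standard output leaves the error stream untouched.

```python
import sys

def report(hits, warnings, out=sys.stdout, err=sys.stderr):
    """Write hits to one stream and warnings to a separate one."""
    for hit in hits:
        print(hit, file=out)
    for warning in warnings:
        print("warning:", warning, file=err)

if __name__ == "__main__":
    # With output redirected to a file, only the first line lands in the file;
    # the warning still appears on the screen.
    report(["line one"], ["cannot open x.txt"])
```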
The /D option lets you create extra debugging output and send it to a named file or the standard error output.
Each description is hyperlinked to the full description in the GREP Manual.
On the command line, options can appear anywhere, before or after the regex and the input files. All options are processed before any files are read.
You have a lot of freedom about how you enter options: use a leading hyphen or slash, use upper- or lower-case letters, leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the /P3 option and /B option:
/p3 -b /b/P3 /p3B -B/P3 -P3 -b
This GREP Quick Start always uses capital letters for the options, to make it easier to distinguish letter l and figure 1.
For clarity, you should always use a hyphen or slash before the numeric /0 option, /1 option, or /3 option. Example: /E0 means the /E option with a value of 0, but /E/0 means the /E option with no value specified, followed by the /0 option.
Registered users who use certain options frequently can put them in the ORS_GREP environment variable. Use the SET command in the c:\config.sys file (if present) or on the command line:
set ORS_GREP=options...
You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.
Example: If you prefer to have GREP sense the type of each file (/R-1 option) and you prefer UNIX-style output (/U option) with line numbers (/N option), then you want to set the environment variable as
set ORS_GREP=/R-1UN
The GREP Manual gives more information about the environment variable, including instructions for overriding a particular stored option on the command line.
A regular expression or regex is a pattern of characters to compare to lines, records, or buffers from one or more input files. GREP reports a hit if the input contains a match with the pattern in the regex.
A regex can be a simple text string, like mother, or something more complex. (If you want to search only for simple strings, use the /E0 option and ignore all this regex stuff.)
Example 1: If you want both the English and the American spellings of the word "grey/gray", use
gr[ea]y
as your regex. (See Example 5 for "colour/color".)
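A quick check of this character class, using Python's `re` module as a stand-in (character classes behave the same way there for this simple case):

```python
import re

GREY = re.compile(r"gr[ea]y")

words = ["grey", "gray", "groy", "greay", "greyhound"]
hits = [w for w in words if GREY.search(w)]
print(hits)
```

"greyhound" is reported as well, because the pattern only has to match somewhere in the input, not the whole of it.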
Example 2: The basic regex for any word starting with "moth" is
moth[a-z]*
which is the letters "moth" followed by any number of letters a through z. Yes, that regex does match "moth" itself: see * or + for Repetition in the GREP Manual.
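The sketch below, again in Python as a stand-in, shows how far the pattern reaches in each word. GREP displays whole matching lines; `findall` is used here only to expose the matched text.

```python
import re

MOTH = re.compile(r"moth[a-z]*")

text = "a moth, my mother, and some mothballs"
found = MOTH.findall(text)
print(found)
```

The first item is "moth" on its own, confirming that `*` allows zero repetitions.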
Example 3: A word in double quotes would be matched by
\"[a-z]+\"
Read that regex as "a double quote mark, followed by one or more letters, followed by another double quote mark." (You need the backslashes \ to tell most flavors of DOS to pass the quote marks forward to GREP. See Quotes in a Regex in the GREP Manual.)
Example 4: A U.S. local telephone number has the basic regex
[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]
That signifies three digits, followed by a hyphen, followed by four digits. (You could express it more simply with an extended regex: [0-9]{3}-[0-9]{4} or even \d{3}-\d{4}.)
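The three spellings can be compared side by side in Python. This sketch assumes Python's `re` as a stand-in; `re.ASCII` is added so that `\d` means exactly `[0-9]`, and the sample strings are invented.

```python
import re

BASIC = re.compile(r"[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")
COUNTED = re.compile(r"[0-9]{3}-[0-9]{4}")
SHORTHAND = re.compile(r"\d{3}-\d{4}", re.ASCII)  # ASCII digits only

samples = ["call 555-1212 today", "55-1212", "5551212", "ext. 12345-67890"]
results = [
    [bool(p.search(s)) for p in (BASIC, COUNTED, SHORTHAND)]
    for s in samples
]
for sample, row in zip(samples, results):
    print(sample, row)
```

All three patterns agree on every sample. The last sample matches too, since "345-6789" sits inside the longer digit runs; the pattern does not require the number to stand alone.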
Example 5: Matching both the American and English spellings, "color/colour", is easy with GREP32: specify an extended regex (with the /E2 option) of
colou?r
GREP16 doesn't support extended regexes, so you could either use colou*r (which would also match the non-words colouur, colouuuuur, etc.), or else use the /F- option and enter color and colour as two regexes.
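The difference between the two patterns shows up as soon as a doubled "u" appears. A small Python comparison, using `re` as a stand-in for both regex flavors:

```python
import re

OPTIONAL_U = re.compile(r"colou?r")  # zero or one "u"
STAR_U = re.compile(r"colou*r")      # zero or more "u"

words = ["color", "colour", "colouur"]
optional_hits = [w for w in words if OPTIONAL_U.search(w)]
star_hits = [w for w in words if STAR_U.search(w)]
print(optional_hits)
print(star_hits)
```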
From the examples you can see that a regex is essentially a string of characters with a bunch of operators thrown in to express possibilities like "any of these characters" and "repeated". Here's a quick summary of the characters that have special meaning in a regex; note that some work in any regex and others only in an extended regex (/E2 option). Each one is hyperlinked to the section of the GREP Manual where you'll find a full description.
character | which regexes? | description |
---|---|---|
Characters with special meaning outside square brackets: | ||
. period | any | matches any character |
* asterisk | any | matches 0 or more occurrences of the preceding |
+ plus sign | any | matches 1 or more occurrences of the preceding |
? question mark | extended | matches 0 or 1 occurrence of the preceding |
[ left square bracket | any | start a character class, e.g. [abcde] to match any one of a, b, c, d, e |
^ caret | any | match start of line in text mode or start of record in binary mode |
$ dollar sign | any | match end of line in text mode or end of record in binary mode |
\ backslash | any | treat any of the listed special characters as normal |
\ backslash | extended | (1) character types like \w for a word character; (2) simple assertions like \b for a word boundary; (3) back references to parenthesized subexpressions; (4) character encoding for odd characters like \x3c for < |
{ left brace | extended | repetition count, e.g. {3,} for three or more occurrences of the preceding |
| vertical bar | extended | alternatives, e.g. mother|father to match "mother" or "father" |
(...) parentheses or round brackets | extended | subexpressions, e.g. ( )+ to match one or more occurrences of " " |
Characters with special meaning inside square brackets: | ||
] right square bracket | any | end the character class |
- minus sign or hyphen | any | character range, e.g. [a-z] to match any lower-case English letter |
^ caret | any | negate the character class, e.g. [^a-z] to match any character except a lower-case English letter |
\ backslash | any | treat the next character as normal |
\ backslash | extended | character encoding |
[: left square bracket followed by colon | extended | introduce a named character class, e.g. [[:punct:]0-9] for any punctuation character or a digit |
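Several of the operators in the table can be exercised with a few Python checks. This is a sketch using Python's `re` as a stand-in, so it covers only operators that Python writes the same way; named classes such as `[[:punct:]]` are left out because Python's `re` does not support them. The test strings are invented.

```python
import re

# Each entry: (pattern, sample input, whether a match is expected).
checks = {
    "alternation": (r"mother|father", "my father's house", True),
    "repetition count": (r"[0-9]{3,}", "room 42", False),
    "negated class": (r"[^a-z]", "abc", False),
    "start of line": (r"^GREP", "GREP searches", True),
    "end of line": (r"files$", "named input files", True),
    "escaped period": (r"3\.14", "3x14", False),
}

outcome = {
    name: bool(re.search(pattern, sample))
    for name, (pattern, sample, _expected) in checks.items()
}
for name, matched in outcome.items():
    print(name, matched)
```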