Rexx Language Constructs
In this chapter, the concept and syntax of REXX clauses are explained. At the end of the chapter there is a section describing how Regina differs from standard REXX as described in the first part of the chapter.
A program in the REXX language consists of clauses, which are divided into four groups: null clauses, commands, assignments, and instructions. The three latter groups (commands, assignments, and instructions) are collectively referred to as statements. This does not match the terminology in [TRL2], where "instruction" is equivalent to what is known here as "statement", and "keyword instruction" is equivalent to what is known here as "instruction". However, I find the terminology used here simpler and less confusing.
Incidentally, the terminology used here matches [DANEY].
A clause is defined as all non-clause-delimiters (i.e. blanks and tokens) up to and including a clause delimiter. A clause delimiter can be:
An end-of-line, unless it lies within a comment. An end-of-line within a constant string is considered a syntax error {6}.
A semicolon character that is not within a comment or constant string.
A colon character, provided that the sequence of tokens leading up to it consists of a single symbol and whitespace. If a sequence of two symbol tokens is followed by a colon, then this implies SYNTAX condition {13}.
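The three kinds of clause delimiter listed above can be seen together in a small sketch. The variable names and the label name greet are only illustrative.

```rexx
/* Three clauses on one line: two semicolons and the end-of-line delimit them */
a = 1; b = 2; say a + b          /* says 3 */

/* A single symbol followed by a colon forms a label clause of its own, */
/* so an instruction may follow the label on the same line.             */
call greet
exit

greet: say 'hello'
   return
```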
Some systems have the ability to store a text file having a last line unterminated by an end-of-line character sequence. In general, this applies to systems that use an explicit end-of-line character sequence to mark the end of a line, e.g. Unix and MS-DOS systems. Under these systems, if the last line is unterminated, it will strictly speaking not be a clause, since a clause must include its terminating clause delimiter. However, some interpreters are likely to regard the end-of-file as a clause delimiter too. The functionality of INTERPRET gives some weight to this interpretation. But other systems may ignore that last, unterminated line, or maybe issue a syntax error. (However, there is no SYNTAX condition number adequately covering this situation.)
Example: Binary transferring files
Suppose a REXX program is stored on an MS-DOS machine. Then, an end-of-line sequence is marked in the file as the two characters carriage return and newline. If this file is transferred to a Unix system, then only newline marks the end-of-line. For this to work, the file must be transferred as a text file. If it is (incorrectly) transferred as a binary file, the result is that on the Unix system, each line seems to contain a trailing carriage return character. In an editor, it might look like this:
say 'hello world'^M
say 'that''s it'^M
This will probably raise SYNTAX condition {13}.
Null clauses are clauses that consist of only whitespace, or comments, or both, in addition to the terminating clause delimiter. These clauses are ignored when interpreting the code, except for one situation: null clauses containing at least one comment are traced when appropriate. Null clauses not containing any comments are ignored in every respect.
Example: Tracing comments
The tracing of comments may be a major problem, depending on the context. There are basically two strategies for large comments: either box multiple lines as a single comment, or make the text on each line an independent comment, as shown below:
trace all
/*
This is a single, large comment, which spans multiple lines.
Such comments are often used at the start of a subroutine or
similar, in order to describe both the interface to and the
functionality of the function.
*/
/* This is also a large comment, but it is written as multiple */
/* comments, each on its own line. Thus, this is several clauses */
/* while the comment above is a single comment. */
During tracing, the first of these will be displayed as one large comment, and during interactive tracing, it will only pause once. The second will be displayed as multiple lines, and will make several pauses during interactive tracing. An interpreter may solve this situation in several ways; the main objective must be to display the comments nicely to the programmer debugging the code. Preferably, the code is shown in a fashion that resembles how it is entered in the file.
If a label is defined more than once, the first definition is used and the rest are ignored. Multiply defined labels do not raise a SYNTAX condition.
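A minimal sketch of a label that is defined twice, assuming the behaviour described above. The label name greet is hypothetical.

```rexx
call greet                       /* finds the first GREET label */
exit

greet: say 'first definition'
   return

greet: say 'second definition'   /* never reached by the CALL above */
   return
```

Running this should print only the line from the first definition, and no SYNTAX condition is raised for the duplicate.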
A null clause is not a statement. In some situations, like after the THEN subclause, only a statement may come. If a null clause is provided there, it is ignored, and the next statement is used instead.
Consider the following code:
parse pull foo
if foo=2 then
   say 'foo is 2'
else
   /* do nothing */
say 'that''s it'
This will not work the way the indentation indicates, since the comment in this example is not a statement. Thus, the ELSE reads beyond the comment, and connects to the SAY instruction, which becomes the ELSE part. (That was probably not what the programmer intended.) This code will say that's it only when foo is different from 2. A separate instruction, NOP, has been provided to fill the need that the comment in the code fragment above fails to fill.
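A sketch of the same fragment with NOP in the ELSE part. Because NOP is a real statement, the final SAY is no longer absorbed by the ELSE.

```rexx
parse pull foo
if foo = 2 then
   say 'foo is 2'
else
   nop                 /* a true statement that does nothing */
say 'that''s it'       /* now executed whatever the value of foo */
```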
Example: Trailing comments
The effect that comments are not statements can be exploited when documenting the program, and simultaneously making the program faster. Consider the following two loops:
sum = 0
do i=1 to 10
/* sum 1 2 3 ... 8 9 10 */
sum = sum + i
end
sum = 0
do i=1 to 10
sum = sum + i /* sum 1 2 3 ... 8 9 10 */
end
In the first loop, there are two clauses, while the second loop contains only one clause, because the comment is appended to an already existing clause. During execution, the interpreter has to spend time ignoring the null clause in the first loop, while the second loop avoids this problem (assuming tracing is not enabled). Thus, the second loop is faster, although only insignificantly faster for small loops. Of course, the comment could have been taken out of the loop, which would be as fast as the second version above.
Assignments are clauses where the first token is a symbol and the second token is the equal sign (=). This definition opens the way for some curious effects. Consider the following clauses:
a == b
This is not a command, but an assignment of the expression = b to the variable a. Of course, the expression is illegal (=b) and will trigger a SYNTAX condition for syntax error {35}. TRL2 defines the operator == as consisting of two tokens. Thus, in the first of these examples, the second token is =, the third token is also =, while the fourth token is b.
3 = 5
This is an assignment of the value 5 to the symbol 3, but since this is not a variable symbol, this is an illegal assignment, and will trigger the SYNTAX condition for syntax error {31}.
"hello" = foo
This is not an assignment, since the first token in the clause is not a symbol. Instead, this becomes a command.
arg =(foo) bar
The fourth statement is a valid assignment, which will space-concatenate the two variable symbols foo and bar, and assign the result to the variable symbol arg. It is specifically not an ARG instruction, even though it might look like one. If you need an ARG instruction whose template starts with an absolute indirect positional pattern, use the PARSE UPPER ARG instruction instead, or prepend a dot in front of the template.
An assignment can assign a value to a simple variable, a stem variable or a compound variable. When assigning to a stem variable, all possible variable symbols having that stem are assigned the value. Note specifically that this is not like setting a default; it is a one-time multiple assignment.
Example: Multiple assignment
The difference between REXX's multiple assignment and a default value can be seen from the following code:
foo. = 'bar'
foo.1 = 'baz'
drop foo.1
say foo.1 /* says "FOO.1" */
Here, the SAY instruction writes out FOO.1, not bar. During the DROP instruction, the variable FOO.1 regains its original, uninitialized value FOO.1, not the value of its stem variable FOO., i.e. bar, because stem assignments do not set up a default.
Example: Emulating a default value
If you want to set the compound variable to the value of its stem variable, if the stem is initialized, then you may use the following code:
if symbol('foo.') = 'VAR' then   /* SYMBOL() returns VAR, LIT or BAD, not a boolean */
   foo.1 = foo.
else
   drop foo.1
In this example, the FOO.1 variable is set to the value of its stem if the stem currently is assigned a value. Else, the FOO.1 variable is dropped.
However, this is probably not exactly the same, since the internal storage of the computer is likely to store variables like FOO.2 and FOO.3 only implicitly (after all, it can not explicitly store every compound having FOO. as stem). After the assignment of the value of FOO. to FOO.1, the FOO.1 compound variable is likely to be explicitly stored in the interpreter.
There is no way you can discover the difference, but the effects are often that more memory is used, and some functionality that dumps all variables may dump FOO.1 but not FOO.2 (which is inconsistent). See section RexxVariablePool.
Example: Space considerations
Even more strange are the effects of the following short example:
foo. = 'bar'
drop foo.1
Although apparently very simple, there is no way that an interpreter can release all memory referring to FOO.1. After all, FOO.1 has a different value than FOO.2, FOO.3, etc., so the interpreter must store information that tells it that FOO.1 has the uninitialized value.
These considerations may seem like nit-picking, but they will matter if you drop lots of compound variables for a stem which has previously received a value. Some programming idioms do this, so be aware. If you can do without assigning to the stem variable, then it is possible for the interpreter to regain all memory used for that stem's compound variables.
In this section, all instructions in standard REXX are described.
Extensions are listed later in this chapter.
First some notes on the terminology. What is called an instruction in this document is equivalent to a "unit" of clauses. That is, each instruction can consist of one or more clauses. For instance, the SAY instruction is always a single clause, but the IF instruction is a multi-clause instruction. Consider the following script, which consists of several clauses:
if a=b then
say 'hello'
else
say 'bye'
Further, the THEN or ELSE parts of this instruction might consist of a DO/END pair, in which case the IF instruction might consist of a virtually unlimited number of clauses.
Then, some notes on the syntax diagrams used in the following descriptions of the instructions. The rules applying to these diagrams can be listed as:
Anything written in courier font in the syntax diagrams indicates that it should occur as-is in the REXX program. Whenever something is written in italic font, it means that the term should be substituted for another value, expression, or terms.
Anything contained within a matching pair of square brackets ([...]) is optional, and may be left out.
Whenever a pair of curly braces is used, it contains two or more subclauses that are separated by the vertical bar (|). It means that the curly braces will be substituted for one of the subclauses it contains.
Whenever the ellipsis (...) is used, it indicates that the immediately preceding subclause may be repeated zero or more times. The scope of the ellipsis is limited to the contents of a set of square brackets or curly braces, if it occurs there.
Whenever the vertical bar | is used in any of the syntax diagrams, it means that either the term to the left, or the term to the right can be used, but not both, and at least one of them must be used. This "operator" is associative (can be used in sequence), and it has lower priority than the square brackets (the scope of the vertical bar located within a pair of square brackets or curly braces is limited to the text within those square brackets or curly braces).
Whenever a semicolon (;) is used in the syntax diagram, it indicates that a clause separator must be present at that point. It may either be a semicolon character, or an end-of-line.
Whenever the syntax diagram is spread out over more lines, it means that any of the lines can be used, but that the individual lines are mutually exclusive. Consider the syntax:
SAY symbol
    string
This is equivalent to the syntax:
SAY [symbol | string ]
Because in the first of these two syntaxes, the SAY part may be continued at either line.
Sometimes the syntax of an instruction is so complex that parts of the syntax have been extracted, and are shown below in their expanded state. The following is an example of how this looks:
SAY something TO someone
something : HI
            HELLO
            BYE
someone : THE BOSS
          YOUR NEIGHBOR
You can generally identify these situations by the fact that they come a bit below the real syntax diagram, and that they contain a colon character after the name of the term to be expanded.
In the syntax diagrams, some generic names have been used for the various parts, in order to indicate common attributes for the term. For instance, whenever a term in the syntax diagrams is called expr, it means that any valid REXX expression may occur instead of that term. The most common such names are:
condition
Indicates that the subclause can be any of the names of the conditions, e.g. SYNTAX, NOVALUE, HALT, etc.
expr
Indicates that the subclause can be any valid REXX expression, and will in general be evaluated as normal during execution.
statement
Indicates that extra clauses may be inserted into the instruction, and that exactly one of them must be a true statement.
string
Indicates that the subclause is a constant string, i.e. either enclosed by single quotes ('...') or double quotes ("...").
symbol
Indicates that the subclause is a single symbol. In general, whenever symbol is used as the name for a subclause, it means that the symbol will not automatically be expanded to the value of the symbol. But instead, some operation is performed on the name of the symbol.
template
Indicates that the subclause is a parsing template. The exact syntax of this is explained in a chapter on tracing, to be written later.
In addition to this, variants may also exist. These variants will have an extra letter or number appended to the name of the subclause, and are used for distinguishing between two or more subclauses having the same "type" in one syntax diagram. In the case of other names for the subclauses, these are explained in the description of the instruction.
ADDRESS [ environment [ command ] [ redirection ] ] ;
[ [ VALUE ] expression [ redirection ] ] ;
and redirection has one of the forms:
WITH INPUT standard_redir [ OUTPUT out_redir ] [ ERROR out_redir ] ;
WITH INPUT standard_redir [ ERROR out_redir ] [ OUTPUT out_redir ] ;
WITH OUTPUT out_redir [ INPUT standard_redir ] [ ERROR out_redir ] ;
WITH OUTPUT out_redir [ ERROR out_redir ] [ INPUT standard_redir ] ;
WITH ERROR out_redir [ INPUT standard_redir ] [ OUTPUT out_redir ] ;
WITH ERROR out_redir [ OUTPUT out_redir ] [ INPUT standard_redir ] ;
standard_redir is defined as:
NORMAL ;
[ STREAM | STEM | LIFO | FIFO ] symbol ;
and out_redir is defined as:
NORMAL ;
[ APPEND | REPLACE ] [ STREAM | STEM | LIFO | FIFO ] symbol ;
We will discuss redirection later.
The ADDRESS instruction controls where commands to an external environment are sent. If both environment and command are specified, the given command will be executed in the given environment. The effect is the same as issuing an expression to be executed as a command (see section Commands), except that the environment in which it is to be executed can be explicitly specified in the ADDRESS clause. In this case, the special variable RC will be set as usual, and the ERROR or FAILURE conditions might be raised, as for normal commands.
In other words: All normal commands are ADDRESS statements with a suppressed keyword and environment.
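The equivalence between a plain command and an explicit ADDRESS clause can be sketched as follows. This assumes Regina's SYSTEM environment and that an echo command is available from the operating system's command interpreter.

```rexx
address system                 /* make SYSTEM the current environment */
'echo Hello'                   /* plain command: sent to the current environment */
say 'return code:' rc

address system 'echo Hello'    /* same command, keyword and environment spelled out */
say 'return code:' rc
```

Both commands should behave the same way, and RC is set after each of them.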
The environment term must be a symbol or a literal string. If it is a symbol, its "name" is used, i.e. it is not tail substituted or swapped for a variable value. The command and expression terms can be any REXX expression. For example,
SYSTEM='PATH'
ADDRESS SYSTEM "echo Hello"
is equivalent to a plain
ADDRESS SYSTEM "echo Hello"
or
ADDRESS "SYSTEM" "echo Hello"
for the external echo command.
A symbol specified as an environment name isn't case-sensitive, whereas a string must match the case. Builtin environments are always uppercased.
REXX maintains a list of environments; the size of this list is at least two. If you select a new environment, it will be put in the front of this list. Note that if command is specified, the contents of the environment stack are not changed. If you omit command, environment will always be put in the front of the list of environments. Regina has an infinite list and never pushes out any entry. Possible values are listed below. If you supply a command with the ADDRESS statement, the environment is interpreted as a temporary change for just this command.
What happens if you specify an environment that is already in the list, is not completely defined. Strictly speaking, you should end up with both entries in the list pointing to the same environment, but some implementations will probably handle this by reordering the list, leaving the selected environment in the front. This is Regina's behaviour. Every environment exists only once. The redirection command below always changes the behaviour of one -- the given -- environment. You can imagine a set of playing cards in your hand. The operation is to draw one card by name and put it to the front.
If you do not specify any subkeywords or parameters to ADDRESS, the effect is to swap the two first entries in the list of environments. Consequently, executing ADDRESS multiple times will toggle between two environments.
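The toggling effect can be observed with the ADDRESS() built-in function, which returns the name of the current environment. This is a minimal sketch; the name shown by the second SAY depends on which environment was current at startup (SYSTEM in Regina, according to this chapter).

```rexx
address command     /* COMMAND becomes the first entry in the list */
say address()       /* says COMMAND */

address             /* swap the two first entries */
say address()       /* says the environment that was current before */

address             /* swap back */
say address()       /* says COMMAND */
```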
The second syntax form of ADDRESS is a special case of the first form with command omitted. If the first token after ADDRESS is VALUE, then the rest of the clause is taken to be an expression, naming the environment which is to be made the current environment. Using VALUE makes it possible to circumvent the restriction that the name of the new environment must be a symbol or literal string. However, you can not combine both VALUE and command in a single clause.
Example: Examples of the ADDRESS instruction
Let's look at some examples; they can sometimes be combined with a redirection:
ADDRESS COMMAND
ADDRESS SYSTEM 'copy' fromfile tofile
ADDRESS system
ADDRESS VALUE newenv
ADDRESS
ADDRESS (oldenv)
The first of these sets the environment COMMAND as the current environment.
The second performs the command 'copy' in the environment SYSTEM, using the values of the symbols fromfile and tofile as parameters. Note that this will not set SYSTEM as the current environment.
The third example sets SYSTEM as the current environment (it will be automatically converted to upper case).
The fourth example sets as the current environment the contents of the symbol newenv, pushing SYSTEM down one level in the stack.
The fifth clause swaps the two uppermost entries on the stack; SYSTEM ends up at the top, pushing the environment specified in newenv below it.
The sixth clause is equivalent to the fourth example, but is not allowed by ANSI. Since Regina 3.0 this style is deprecated and can't be used if OPTIONS STRICT_ANSI is in effect. Again, avoid this kind of ADDRESS statement style, and use the VALUE version instead.
Example: The VALUE subkeyword
Let us look a bit closer at the last example. Note the differences between the two clauses:
ADDRESS ENV
ADDRESS VALUE ENV
The first of these sets the current default environment to ENV, while the second sets it to the value of the symbol ENV.
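The difference can be made visible with the ADDRESS() built-in function. The sketch below assumes an interpreter that, like Regina, lets you select any name as the current environment as long as no command is sent to it.

```rexx
env = 'COMMAND'

address env          /* the symbol is taken literally */
say address()        /* says ENV */

address value env    /* the expression is evaluated first */
say address()        /* says COMMAND */
```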
If you are still confused, Don't Panic; the syntax of ADDRESS is somewhat bizarre, and you should not put too much effort into learning all aspects of it. Just make sure that you understand how to use it in simple situations. Chances are that you will not have use for its more complicated variants for quite some time.
Then, what names are legal as environments? Well, that is implementation-specific, but some names seem to be in common use. The name COMMAND is sometimes used to refer to an environment that sends the command to the operating system. Likewise, the name of the operating system is often used for this (CMS, UNIX, etc.). You have to consult the implementation specific documentation for more information about this. Actually, there are not really any restrictions on what constitutes a legal environment name (even the nullstring is legal). Some interpreters will allow you to select anything as the current environment; and if it is an illegal name, the interpreter will complain only when the environment is actually used. Other implementations may not allow you to select an invalid environment name at all.
Regina allows every name as an environment name. Regina gives an error message about wrong names only when the name is used. The error string looks somewhat strange if Regina is used as a separate program, since the extension of the environment name space is only useful when running as part of a program which extends the standard names.
Regina uses three kinds of environments. Some have alias names. The environment names are:
SYSTEM
alias OS2ENVIRONMENT
alias ENVIRONMENT
This is the default environment which is selected at startup. The standard operating system command line interpreter will be loaded to execute the commands. You can use the builtin commands of the command line interpreter, often called shell, or any other program which the command line interpreter can find and load.
COMMAND
alias CMD
This environment loads the named program directly. You have to supply a path if this is needed for the current operating system to load the program. You can't use builtin shell functionality like system redirections like you can with SYSTEM. Regina's redirections are more powerful and work in either environment.
PATH
This works like the environment COMMAND but Regina uses the standard operating system search rules for programs. This is done by searching through the items of the PATH system-environment variable in most operating systems.
The definition of REXX says nothing about which environment is preselected when you invoke the interpreter, although TRL defines that one environment is automatically preselected when starting up a REXX script. Note that there is no NONE environment in standard REXX, i.e. an environment that ignores commands, but some interpreters implement the TRACE setting ??? to accomplish this. Regina uses the environment SYSTEM as the preselected environment as mentioned above. More implementation specific details can be found in the section implementation specific documentation for Regina.
The list of environments will be saved across subroutine calls; so the effect of any ADDRESS clauses in a subroutine will cease upon return from the subroutine.
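A short sketch of this scoping rule, using a hypothetical internal subroutine named sub:

```rexx
address system
call sub
say address()        /* says SYSTEM: the change made in SUB is gone */
exit

sub:
   address command   /* only in effect until RETURN */
   say address()     /* says COMMAND */
   return
```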
ADDRESS Redirections
ANSI defines redirections for the ADDRESS statement. This feature has been missing from Regina until version 3.0; although you have had the chance to redirect input and output by using LIFO> and >FIFO modifiers on command strings.
These command modifiers still exist and have a higher precedence than the ANSI defined redirections. Note that LIFO and FIFO can be used by the newer redirection system. But, first of all, some examples show the usage of ADDRESS redirections.
ADDRESS SYSTEM "sort" WITH INPUT STEM names. OUTPUT STEM names.
ADDRESS SYSTEM "myprog" WITH INPUT STEM somefood. OUTPUT STREAM prg.out ERROR STEM oops.
ADDRESS PATH WITH INPUT FIFO '' OUTPUT NORMAL
ADDRESS SYSTEM WITH INPUT FIFO '' OUTPUT FIFO '' ERROR NORMAL
ADDRESS SYSTEM "fgrep 'bongo'" WITH INPUT STREAM feeder
The first command instructs the default command line interpreter to call the program called sort. The input for the command is read from the stem names. (note the trailing period) and the output is sent back to the same stem variable after the command terminates. Thus, instead of implementing a fast sort algorithm for a stem yourself, you can simply call a program that does the sort.
A program called myprog is called in the second case. The input is fetched from the stem somefood. (again note the trailing period), and the standard output of the program is redirected to the stream called PRG.OUT (note it is uppercased using standard Rexx rules). Any generated error messages via the standard error stream are redirected to the stem called oops. Note the problematic PRG.OUT. You have to use a symbol and can't use strings.
In the third example, the redirection behaviour of the environment PATH is changed for all future uses. The input for all commands addressed to this environment is fetched from the standard stack in FIFO order. After each call the stack will be flushed. The output is sent to the default output stream, which is the current console in most cases. The behaviour for error messages is not changed.
The fourth example allows pipes between commands in the environment SYSTEM for all future uses. The input is fetched from the default stack and sent to the default stack after each command. The stack itself is flushed in between. Each executed program will write to something which is the input to the next called command. The error redirection is set or set back to the initial behaviour of writing to the standard error stream.
The fifth example relates to the fourth. The default stack has to be filled with something initially. This is done by the redirection to the stream FEEDER while writing the output of the fgrep command to the default FIFO as declared in example four. After this, a single line with a simple sort command will sort the output of fgrep and place it in the default stack. You can fetch the final output of your pipe cascade by reading the stack contents. This statement overwrites some of the rules of the fourth example temporarily.
You can see the powerful possibilities of the redirection command. The disadvantage is the loss of a direct overview of what happens after a permanent redirection command has executed.
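The first example above can be turned into a complete sketch. It assumes a sort program that the SYSTEM environment can find, and the stem contents are made up for illustration. The exact ordering depends on the sort program and its locale settings.

```rexx
/* Fill the stem; names.0 holds the number of lines */
names.0 = 3
names.1 = 'cherry'
names.2 = 'apple'
names.3 = 'banana'

/* Feed the stem to sort and read the result back into the same stem */
address system 'sort' with input stem names. output stem names.

do i = 1 to names.0
   say names.i
end
```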
It is now time to show you all the rules and semantics of the redirection.
Rules for the redirection by the keyword WITH of the ADDRESS statement:
Every environment has its own default redirection set.
Every redirection set consists of three independent redirection streams: standard input (INPUT), standard output (OUTPUT) and standard error (ERROR). Users with some experience of Unix, DOS, Windows or OS/2 may remember the redirection commands of the command line interpreter, which can redirect each of these streams, too. This is nearly the same.
Each redirection stream starts with the program-startup streams given to REXX when invoking the interpreter. These can be reset to the startup default by specifying the argument NORMAL for each redirection stream.
The sequence of the redirection streams is irrelevant.
You can specify each stream only once per statement.
Redirections can be intermixed. This means you can let both the OUTPUT and the ERROR redirection point to the same "thing". The data from the different channels will be put to the assigned "thing" as they arrive. ANSI's point of view isn't very clear at this point: it states that the output should be kept separate for files and put together after the called program has finished, while the data shall be mixed at once when using stems. Regina always mixes the fetched data at once.
Redirections from and to the same source/destination try to keep the data consistent. If the INPUT/OUTPUT pair or the INPUT/ERROR pair points to the same destination, the content of the input or output channel is buffered so that writing to the output won't overwrite the input.
Each redirection stream is specified by its name (e.g. INPUT), a redirection processor (e.g. STREAM) and a destination symbol (e.g. OUT_FN) following the rules of the redirection processor. This means that you have to enter a dot after a symbol name for a stem, or any symbol for the rest of the processors, in which case the content of the symbol is used as for normal variables.
Both OUTPUT and ERROR streams can replace or append the data to the destination. Simply append either APPEND or REPLACE immediately after the OUTPUT or ERROR keywords. REPLACE is the default.
The destination is checked or cleared prior to the execution of the command.
ANSI defines two redirection processors: STEM and STREAM. The processors LIFO and FIFO are allowed extensions to the standard.
The processor STEM uses the content of the symbol destination.0 to access the count of the currently accessible lines. destination is the given destination name, of course. destination.0 must be filled with a whole, non-negative number in terms of the DATATYPE builtin function. Each of n lines can be addressed by appending the whole numbers one to n to the stem. Example: STEM foo. is given, FOO.0 contains 3. This indicates three content lines. They are the contents of the symbols FOO.1 and FOO.2 and FOO.3 .
The processor STREAM uses the content of the symbol destination to use a stream as known in the STREAM builtin function. The usage is nearly equivalent to the commands LINEIN destination or LINEOUT destination for accessing the contents of the file. An empty variable (content set to the empty string) as the content of the destination is allowed and indicates the default input, output or error streams given to the REXX program. This is equivalent to the NORMAL keyword.
The processor LIFO uses the content of the symbol destination as a queue name. New lines are pushed in last-in, first-out order to the queue. An empty destination string is allowed and describes the default queue. Lines are fetched from the queue if this processor is used for the INPUT stream.
The processor FIFO uses the content of the symbol destination as a queue name. New lines are pushed in first-in, first-out order to the queue. An empty destination string is allowed and describes the default queue. Lines are fetched from the queue if this processor is used for the INPUT stream.
On INPUT, all the data in the input stream is read up to either the end of the input data or until the called process terminates. The latter one may be determined after feeding up the input stream of the called process with unused data. Thus, there is no way to say if data is used or not. This isn't a problem with STEMs. But all file related sequential access objects including LIFOs and FIFOs may have lost data between two calls. Imagine an input file (STREAM) with three lines:
One line
DELIMITER
Second line
and furthermore two processes p1 and p2 called WITH INPUT STREAM f with f containing the three lines above. p1 reads lines up until a line containing DELIMITER and p2 processes the rest. It is very likely that the second process won't fetch any line because the stream may be processed by REXX, and REXX may have put one or more lines ahead into the feeder pipe to the process. This might or might not happen. It is implementation dependent and Regina shows this behaviour. The input object is checked for existence and if it is properly set up before the command is started.
In short: INPUT may or may not use the entire input.
Both OUTPUT and ERROR objects are checked for being properly set up just before the command starts. REPLACE is implemented as a deletion just before the command starts. Note that ANSI doesn't force STEM lines to be dropped in case of a replacement. A big stem with thousands of lines will still exist after a replacement operation if the called command doesn't produce any output. Just destination.0 is set to 0.
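The REPLACE and APPEND rules for a stem destination can be sketched as below. This assumes Regina 3.0 or later and an echo command reachable through SYSTEM. The stem name out. is hypothetical, and out.0 is initialised explicitly so that the stem is properly set up before the first command.

```rexx
out.0 = 0                                           /* properly set up, empty stem */

address system 'echo first'  with output stem out.         /* REPLACE is the default */
address system 'echo second' with output append stem out.  /* keep earlier lines */

do i = 1 to out.0
   say i':' out.i
end
```

If the second command used REPLACE instead, only the output of that command would be left in the stem.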
The redirection of commands is a mystery to many people and it will continue to be. You can thank all the people who designed stacks, queues, pipelines and all the little helper utilities of a witch's kitchen of process management.
ARG [ template ] ;
The ARG instruction will parse the argument strings at the current procedural level into the template. Parsing will be performed in upper case mode. This clause is equivalent to:
PARSE UPPER ARG [ template ] ;
For more information, see the PARSE instruction. Note that this is the only situation where a multistring template is relevant.
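The upper case translation can be seen in a short sketch. The routine name show and the variable names are made up for this illustration.

```rexx
/* ARG translates the argument to upper case; PARSE ARG does not. */
call show 'Hello World'
exit

show:
   arg first rest              /* same as: parse upper arg first rest */
   parse arg word1 word2       /* case is preserved here              */
   say first rest              /* HELLO WORLD */
   say word1 word2             /* Hello World */
   return
```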
Example: Beware assignments
The similarity between ARG and PARSE UPPER ARG has one exception. Suppose the PARSE UPPER ARG has an absolute positional pattern as the first element in the template, like:
parse upper arg =(foo) bar
This is not equivalent to an ARG instruction, because the ARG instruction would become an assignment. A simple trick to avoid this problem is just to prepend a placeholder period (.) to the pattern, so that the equal sign (=) is no longer the second token in the new ARG instruction. Also, unless the absolute positional pattern is indirect, the equal sign can be removed without changing the meaning of the statement.
CALL routine [ parameter ] [, [ parameter ] ... ] ;
CALL { ON | OFF } condition [ NAME label ] ;
The CALL instruction invokes a subroutine, named by routine, which can be internal, built-in, or external; the three repositories of functions are searched for routine in that order. The token routine must be either a literal string or a symbol (which is taken literally). However, if routine is a literal string, the pool of internal subroutines is not searched. Note that some interpreters may have additional repositories of labels to search.
In a CALL instruction, each parameter is evaluated, strictly in order from left to right, and passed as an argument to the subroutine. A parameter might be left out (i.e. an empty argument), which is not the same as passing the nullstring as argument.
Users often confuse a parameter which is the nullstring with leaving out the parameter. However, these are two very different situations. Consider the following calls to the built-in function TRANSLATE():
say translate('abcDEF') /* says ABCDEF */
say translate('abcDEF',"") /* says '      ' (six blanks) */
say translate('abcDEF',,"") /* says abcDEF */
The TRANSLATE() function is able to distinguish between receiving the nullstring (i.e. a defined string having zero length) and the situation where a parameter was not specified (i.e. the undefined string). Since TRANSLATE() is one of the few functions where the parameters' default values are very different from the nullstring, the distinction becomes very visible.
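An internal routine can make the same distinction with the ARG() built-in function and its O (omitted) option. The following is a small sketch; the routine name check is made up for this illustration.

```rexx
/* Distinguish an omitted argument from a nullstring argument. */
call check 'a', , ''
exit

check:
   do n = 1 to arg()
      if arg(n, 'O') then
         say 'argument' n 'was omitted'
      else
         say 'argument' n 'exists and has length' length(arg(n))
   end
   return
```

Here the second argument is reported as omitted, while the third exists and has length 0.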
Breakage Alert!!
Prior to Version 3.1 of Regina, the following syntactical use of the CALL instruction was valid: call myfunc('abcDEF',,"") This syntax is not allowed by ANSI, and use of this syntax will now result in Error 37.1.
For the CALL instruction, watch out for interference with line continuation. If there are trailing commas, they might be interpreted as line continuation. If a CALL instruction uses line continuation between two parameters, two commas are needed: one to separate the parameters, and one to denote line continuation.
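A short sketch of the two-comma rule. The routine name show is made up for this illustration.

```rexx
/* Two commas end the first line: the first one separates the   */
/* parameters, and the second one continues the clause.         */
call show 'first',,
          'second'
exit

show:
   say arg() 'arguments:' arg(1) arg(2)   /* 2 arguments: first second */
   return
```

With only one trailing comma, that comma would be consumed as the continuation character, and the routine would receive the single argument first second.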
A number of settings are stored across internal subroutine calls. An internal subroutine will inherit the values in effect when the call is made, and the settings are restored on exit from the subroutine. These settings are:
Condition traps, see chapter Conditions.
Current trapped condition, see section CTS.
NUMERIC settings, see section Numeric.
ADDRESS environments, see section Address.
TRACE mode, see section Trace and chapter [not yet written].
The elapsed time clock, see section Time.
Also, the OPTIONS settings may or may not be restored, depending on the implementation. Further, a number of other things may be saved across internal subroutines. The effect on variables is controlled by the PROCEDURE instruction in the subroutine itself. The state of all DO-loops will be preserved during subroutine calls.
Example: Subroutines and trace settings
Subroutines can not be used to set various settings like trace settings, NUMERIC settings, etc. Thus, the following code will not work as intended:
say digits() /* says 9, maybe */
call inc_digits
say digits() /* still says 9 */
exit
inc_digits:
numeric digits digits() + 1
return
The programmer probably wanted to call a routine which incremented the precision of arithmetic operations. However, since the setting of NUMERIC DIGITS is saved across subroutine calls, the new value set in inc_digits is lost at return from that routine. Thus, in order to work correctly, the NUMERIC instruction must be located in the main routine itself.
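One possible rewrite, given as a sketch only: the subroutine computes the new precision, but the NUMERIC instruction itself is executed in the main routine. The function name next_digits is made up for this illustration.

```rexx
say digits()                   /* 9 unless changed earlier            */
numeric digits next_digits()   /* NUMERIC is executed at main level   */
say digits()                   /* one more than the value shown above */
exit

next_digits:
   return digits() + 1
```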
Built-in subroutines will have no effect on the settings, except for explicitly defined side effects. Nor will external subroutines change the settings. For all practical purposes, an external subroutine is conceptually equivalent to reinvoking the interpreter in a totally separated process.
If the name of the subroutine is specified by a literal string, then the name will be used as-is; it will not be converted to upper case. This is important because a routine which contains lower case letters can only be invoked by using a literal string as the routine name in the CALL instruction.
Example: Labels are literals
Labels are literal, which means that they are neither tail-substituted nor substituted for the value of the variable. Further, this also means that the setting of NUMERIC DIGITS has no influence on the selection of labels, even when the labels are numeric symbols. Consider the following code:
call 654.32
exit
654.321:
say here
return
654.32:
say there
return
In this example, the second of the two subroutines is always chosen, independent of the setting of NUMERIC DIGITS. Assuming that NUMERIC DIGITS is set to 5, then the number 654.321 is converted to 654.32, but that does not affect labels. Nor would a statement CALL 6.5432E2 call the second label, even though the numeric value of that symbol is equal to that of one of the labels.
The called subroutines may or may not return data to the caller. In the calling routine, the special variable RESULT will be set to the return value or dropped, depending on whether any data was returned or not. Thus, the CALL instruction is equivalent to calling the routine as a function, and assigning the return value to RESULT, except when the routine does not return data.
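The effect on RESULT can be checked with the SYMBOL() built-in function, as in this small sketch. The routine names are made up for this illustration.

```rexx
call with_value
say symbol('RESULT') result     /* VAR 42 */
call without_value
say symbol('RESULT')            /* LIT, since RESULT was dropped */
exit

with_value:
   return 42

without_value:
   return
```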
In REXX, recursive routines are allowed. A minimum number of 100 nested internal and external subroutine invocations, and support for a minimum of 10 parameters for each call are required by REXX. See chapter Limits for more information concerning implementation limits.
When the token following CALL is either ON or OFF, the CALL instruction is not used for calling a subroutine, but for setting up condition traps. In this case, the third token of the clause must be the name of a condition, whose setup is to be changed.
If the second token was ON, then there can be either three or five tokens. If the five token version is used, then the fourth token must be NAME and the fifth token is taken to be the symbolic name of a label, which is the condition handler. This name can be either a constant string, or a symbol, which is taken literally. When OFF is used, the named condition trap is turned off.
Note that the ON and OFF forms of the CALL instruction were introduced in TRL2. Thus, they are not likely to be present on older interpreters. More information about conditions and condition traps is given in chapter Conditions.
DO [ repetitor ] [ conditional ] ;
   [ clauses ]
END [ symbol ] ;
repetitor   := symbol = expri [ TO exprt ] [ BY exprb ] [ FOR exprf ]
             | exprr
             | FOREVER
conditional := WHILE exprw
             | UNTIL expru
The DO/END instruction is the instruction used for looping and grouping several statements into one block. This is a multi-clause instruction.
The simplest case is when there is no repetitor or conditional, in which case it works like BEGIN/END in Pascal or {...} in C. I.e. it groups zero or more REXX clauses into one conceptual statement.
The repetitor subclause controls the control variable of the loop, or the number of repetitions. The exprr subclause may specify a certain number of repetitions, or you may use FOREVER to go on looping forever.
If you specify the control variable symbol, it must be a variable symbol, and it will get the initial value expri at the start of the loop. At the start of each iteration, including the first, it will be checked whether it has reached the value specified by exprt. At the end of each iteration the value exprb is added to the control variable. The loop will terminate after at most exprf iterations. Note that all these expressions are evaluated only once, before the loop is entered for the first iteration.
You may also specify UNTIL or WHILE, which take a boolean expression. WHILE is checked before each iteration, immediately after the check for the maximum number of iterations has been performed. UNTIL is checked after each iteration, immediately before the control variable is incremented. It is not possible to specify both UNTIL and WHILE in the same DO instruction.
The FOREVER keyword is only needed when there is no conditional, and the repetitor would also be empty if FOREVER was not specified. Actually, you could rewrite this as DO WHILE 1. The two forms are equivalent, except for tracing output.
The subclauses TO, BY, and FOR may come in any order, and their expressions are evaluated in the order in which they occur. However, the initial assignment must always come first. Their order may affect your program if these expressions have any side effects. However, this is seldom a problem, since it is quite intuitive. Note that the counting of iterations, if the FOR subclause has been specified, is never affected by the setting of NUMERIC DIGITS.
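A few small loops, given as a sketch, show how the TO, BY, and FOR subclauses limit the iterations:

```rexx
do i = 1 to 10 by 3            /* i takes the values 1, 4, 7, 10 */
   say 'i =' i
end
do j = 1 by 2 for 3            /* j takes the values 1, 3, 5 */
   say 'j =' j
end
do k = 10 to 1 by -4           /* k takes the values 10, 6, 2 */
   say 'k =' k
end
```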
Example: Evaluation order
What may prove a real trap, is that although the value to which the control variable is set is evaluated before any other expressions in the repetitor, it is assigned to the control variable after all expressions in the repetitor have been evaluated.
The following code illustrates this problem:
ctrl = 1
do ctrl=f(2) by f(3) to f(5)
call f 6
end
call f 7
exit
f:
say 'ctrl='ctrl 'arg='arg(1)
return arg(1)
This code produces the following output:
ctrl=1 arg=2
ctrl=1 arg=3
ctrl=1 arg=5
ctrl=2 arg=6
ctrl=5 arg=6
ctrl=8 arg=7
Make sure you understand why the program produces this output. Failure to understand this may give you a surprise later, when you happen to write a complex DO-instruction, and do not get the expected result.
If the TO expression is omitted, there is no checking for an upper bound of the expression. If the BY subclause is omitted, then the default increment of 1 is used. If the FOR subclause is omitted, then there is no checking for a maximum number of iterations.
Example: Loop convergence
For the reasons just explained, the instruction:
do ctrl=1
nop /* and other statements */
end
will start with CTRL being 1, and then iterate through 2, 3, 4, ..., and never terminate except by LEAVE, RETURN, SIGNAL, or EXIT.
Although similar constructs in other languages typically provoke an overflow at some point, something "strange" happens in REXX. Whenever the value of ctrl becomes too large, the incrementation of that variable produces a result that is identical to the old value of ctrl. For NUMERIC DIGITS set to 9, this happens when ctrl becomes 1.00000000E+9. When adding 1 to this number, the result is still 1.00000000E+9. Thus, the loop "converges" at that value.
If the value of NUMERIC DIGITS is 1, then it will "converge" at 10, or 1E+1 which is the "correct" way of writing that number under NUMERIC DIGITS 1. You can in general disregard loop "convergence", because it will only occur in very rare situations.
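The following sketch makes the "convergence" observable under NUMERIC DIGITS 1. The variable prev and the strict comparison used to stop the loop are additions made for this illustration only.

```rexx
numeric digits 1
prev = ''
do ctrl = 1
   if ctrl == prev then leave  /* value did not change: converged */
   say ctrl
   prev = ctrl
end
say 'converged at' ctrl        /* 1E+1, following the description above */
```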
Example: Difference between UNTIL and WHILE
One frequent misunderstanding is that the WHILE and UNTIL subclauses of the DO/END instruction are equivalent, except that WHILE is checked before the first iteration, while UNTIL is first checked before the second iteration.
This may be so in other languages, but not in REXX. Because of the order in which the parts of the loop are performed, there are other differences. Consider the following code:
count = 1
do i=1 while count \= 5
count = count + 1
end
say i count
count = 1
do i=1 until count=5
count = count + 1
end
say i count
After the first loop, the numbers 5 and 5 are written out, while after the second loop, the numbers 4 and 5 are written out. The reason is that a WHILE clause is checked after the control variable of the loop has been incremented, but an UNTIL expression is checked before the incrementation.
A loop can be terminated in several ways. A RETURN or EXIT instruction terminates all active loops in the procedure levels terminated. Further, a SIGNAL instruction transferring control (i.e. neither a SIGNAL ON nor SIGNAL OFF) terminates all loops at the current procedural level. This applies even to "implicit" SIGNAL instructions, i.e. when triggering a condition handler by the method of SIGNAL. A LEAVE instruction terminates one or more loops. Last but not least, a loop can terminate itself, when it has reached its specified stop conditions.
Note that the SIGNAL instruction also terminates non-repetitive loops (or rather: DO/END pairs), thus after a SIGNAL instruction, you must not execute an END instruction without having executed its corresponding DO first (and after the SIGNAL instruction). However, as long as you stay away from the ENDs, it is all right according to TRL to execute code within a loop without having properly activated the loop itself.
Note that on exit from a loop, the value of the control variable has been incremented once after the last iteration of the loop, if the loop was terminated by the WHILE expression, by exceeding the number of max iterations, or if the control variable exceeded the stop value. However, the control variable has the value of the last iteration if the loop was terminated by the UNTIL expression, or by an instruction inside the loop (e.g. LEAVE, SIGNAL, etc.).
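A short sketch of the value of the control variable after each kind of termination:

```rexx
do i = 1 to 3
end
say i                          /* 4: incremented past the TO value */
do j = 1 to 3
   if j = 3 then leave
end
say j                          /* 3: LEAVE keeps the current value */
do k = 1 until k = 3
end
say k                          /* 3: UNTIL is tested before the increment */
```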
The following algorithm in REXX code shows the execution of a DO instruction, assuming that expri, exprt, exprb, exprf, exprw, expru, and symbol have been taken from the syntax diagram of DO.
   @expri = expri
   @exprt = exprt
   @exprb = exprb
   @exprf = exprf
   @iters = 0
   symbol = @expri
start_of_loop:
   if symbol > @exprt then signal after_loop   /* assumes a non-negative @exprb */
   if @iters >= @exprf then signal after_loop  /* at most @exprf iterations */
   if \exprw then signal after_loop            /* WHILE must hold to continue */
   @iters = @iters + 1
   instructions
end_of_loop:
   if expru then signal after_loop             /* UNTIL ends the loop when true */
   symbol = symbol + @exprb
   signal start_of_loop
after_loop:
Some notes are in order for this algorithm. First, it uses the SIGNAL instruction, which is defined to terminate all active loops. This aspect of the SIGNAL instruction has been ignored for the purpose of illustrating the DO, and consequently, the code shown above is not suitable for nested loops. Further, the order of the first four statements should be identical to the order in the corresponding subclauses in the repetitor. The code has also ignored that the WHILE and the UNTIL subclauses can not be used in the same DO instruction. And in addition, all variables starting with the at sign (@) are assumed to be internal variables, private to this particular loop. Within instructions, a LEAVE instruction is equivalent to signal after_loop, while an ITERATE instruction is equivalent to signal end_of_loop.
DROP symbol [ symbol ... ] ;
The DROP instruction makes the named variables uninitialized, i.e. the same state that they had at the startup of the program. The list of variable names is processed strictly from left to right and dropped in that order. Consequently, if one of the variables to be dropped is used in a tail of another, then the order might be significant. E.g. the following two DROP instructions are not equivalent:
bar = 'a'
drop bar foo.bar /* drops 'BAR' and 'FOO.BAR' */
bar = 'a'
drop foo.bar bar /* drops 'FOO.a' and 'BAR' */
The variable terms can be either a variable symbol or a symbol enclosed in parentheses. The former form is first tail-substituted, and then taken as the literal name of the symbol to be dropped. The result names the variable to drop. In the latter form, the value of the variable symbol inside the parentheses is retrieved and taken as a space separated list of symbols. Each of these symbols is tail-substituted (if relevant); and the result is taken as the literal name of a variable to be dropped. However, this process is not recursive, so that the list of names referred to indirectly can not itself contain parentheses. Note that the second form was introduced in TRL2, mainly in order to make INTERPRET unnecessary.
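A minimal sketch of the indirect form. The variable names are made up for this illustration.

```rexx
a = 1; b = 2; c = 3
list = 'a c'
drop (list)                    /* drops A and C, but neither LIST nor B */
say symbol('A') symbol('B') symbol('C') symbol('LIST')
/* says LIT VAR LIT VAR */
```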
In general, things contained in parentheses can be any valid REXX expression, but this does not apply to the DROP, PARSE, and PROCEDURE instructions.
Example: Dropping compound variables
Note a potential problem for compound variables: when a stem variable is set, it will not set a default value, rather it will assign "all possible variables" in that stem collection at once. So dropping a compound variable in a stem collection for which the stem variable has been set, will set that compound variable to the original uninitialized value; not the value of the stem variable. See section Assign for further notes on assignments. To illustrate consider the code:
foo. = 'default'
drop baz bar foo.bar
say foo.bar foo.baz /* says 'FOO.BAR default' */
In this example, the SAY instruction writes out the value of the two compound variables FOO.BAR and FOO.BAZ. When performing tail-substitution for these, the interpreter finds that both BAR and BAZ are uninitialized. Further, FOO.BAR has also been made uninitialized, while FOO.BAZ has the value assigned to it in the assignment to the stem variable.
Example: Tail-substitution in DROP
For instance, suppose that the variable FOO has the value bar. After being dropped, FOO will have its uninitialized value, which is the same as its name: FOO. If the variable to be dropped is a stem variable, then both the stem variable and all compound variables of that stem become uninitialized.
bar = 123
drop foo.bar /* drops 'FOO.123' */
Technically, it should be noted that some operations involving dropping of compound variables can be very space consuming. Even though the standard does not operate with the term "default value" for the value assigned to a stem variable, that is the way in which it is most likely to be implemented. When a stem is assigned a value, and some of its compound variables are dropped afterwards, then the interpreter must use memory to store references to the variables dropped. This might seem counterintuitive at first, since dropping ought to release memory, not allocate more.
There is a parallel between DROP and PROCEDURE EXPOSE. However, there is one important difference: although PROCEDURE EXPOSE will expose the name of a variable enclosed in parentheses before starting to expose the symbols that variable refers to, this is not so for DROP. If DROP had mimicked the behavior of PROCEDURE EXPOSE in this matter, then the whole purpose of indirect specification of variables in DROP would have been defeated.
Dropping a variable which does not have a value is not an error. There is no upper limit on the number of variables that can be dropped in one DROP clause, other than restrictions on the clause length. If an exposed variable is dropped, the variable in the caller is dropped, but the variable remains exposed. If it is then assigned a new value, the value is assigned to the variable in the caller routine.
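The behaviour of an exposed variable that is dropped and then assigned again can be sketched as follows. The routine name reset and the variable total are made up for this illustration.

```rexx
total = 10
call reset
say total                      /* 5: the caller's variable was assigned */
exit

reset: procedure expose total
   drop total                  /* drops the caller's TOTAL  */
   total = 5                   /* TOTAL is still exposed    */
   return
```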
EXIT [ expr ] ;
Terminates the REXX program, and optionally returns the expression expr to the caller. If specified, expr can be any string. In some systems, there are restrictions on the range of valid values for expr. Often the return expression must be an integer, or even a non-negative integer. This is not really a restriction on the REXX language itself, but a restriction in the environment in which the interpreter operates; check the system dependent documentation for more information.
If expr is omitted, nothing will be returned to the caller. Under some circumstances that is not legal, and might be handled as an error or a default value might be used. The EXIT instruction behaves differently in a "program" than in an external subroutine. In a "program", it returns control to the caller, e.g. the operating system command interpreter, while in an external routine, it returns control to the calling REXX script, independent of the level of nesting inside the external routine being terminated.
Situation | RETURN | EXIT
At the main level of the program | Exits program | Exits program
At an internal subroutine level of the program | Exits subroutine, and returns to caller | Exits program
At the main level of an external subroutine | Exits the external subroutine | Exits the external subroutine
At a subroutine level within an external subroutine | Exits the subroutine, returning to calling routine within external subroutine script | Exits the external subroutine
Actions of RETURN and EXIT Instructions
If terminating an external routine (i.e. returning to the calling REXX script), any legal REXX string value is allowed as a return value. It is also possible to return no value at all, and in both cases, this information is successfully transmitted back to the calling routine. In the case of a function call (as opposed to a subroutine call), returning no value will raise SYNTAX condition {44}. The table above describes the actions taken by the EXIT and RETURN instructions in various situations.
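For internal subroutines, the difference between RETURN and EXIT can be sketched as follows. The routine names helper and stopper are made up for this illustration.

```rexx
call helper
say 'back in main'             /* reached, because helper used RETURN */
call stopper
say 'never shown'              /* EXIT in stopper ended the program   */
exit

helper:
   return

stopper:
   exit 0
```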
IF expr [;] THEN [;] statement
[ ELSE [;] statement ]
This is a normal if-construct. First the boolean expression expr is evaluated, and its value must be either 0 or 1 (everything else is a syntax error which raises SYNTAX condition number {34}). Then, the statement following either THEN or ELSE is executed, depending on whether expr was 1 or 0, respectively.
Note that there must come a statement after THEN and ELSE. It is not allowed to put just a null-clause (i.e. a comment or a label) there. If you want the THEN or ELSE part to be empty, use the NOP instruction. Also note that you can not directly put more than one statement after THEN or ELSE; you have to package them in a DO-END pair to make them a single, conceptual statement.
After THEN, after ELSE, and before THEN, you might put one or more clause delimiters (newlines or semicolons), but these are not required. The ELSE part is not required either; if it is omitted, no code is executed if expr is false (evaluates to 0). Note that there must also be a statement separator before ELSE, since that statement must be terminated. This also applies to the statement after ELSE. However, since statement includes a trailing clause delimiter itself, this is not explicitly shown in the syntax diagram.
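A small sketch showing NOP as an empty THEN part and a DO/END pair grouping several statements after ELSE. It assumes that the argument, if given, is a number.

```rexx
parse arg n
if n = '' then n = 0           /* default when no argument is given */
if n < 0 then
   nop                         /* deliberately empty THEN part      */
else do                        /* DO/END groups several statements  */
   say 'n is' n
   say 'n squared is' n * n
end
```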
Example: Dangling ELSE
Note the case of the "dangling" ELSE. If an ELSE part can correctly be thought of as belonging to more than one IF/THEN instruction pair, it will be parsed as belonging to the closest (i.e. innermost) IF instruction:
parse pull foo bar
if foo then
   if bar then
      say 'foo and bar are true'
   else
      say 'one or both are false'
In this code, the ELSE instruction is nested to the innermost IF, i.e. to IF BAR THEN.
INTERPRET expr ;
The INTERPRET instruction is used to dynamically build and execute REXX instructions during run-time. First, it evaluates the expression expr, and then parses and interprets the result as a (possibly empty) list of REXX instructions to be executed. For instance:
foo = 'hello, world'
interpret 'say "'foo'!"'
executes the statement SAY "hello, world!" after having evaluated the expression following INTERPRET. This example shows several important aspects of INTERPRET. Firstly, it's very easy to get confused by the levels of quotes, and a bit of caution should be taken to nest the quotes correctly. Secondly, the use of INTERPRET does not exactly improve readability.
Also, INTERPRET will probably increase execution time considerably if put inside loops, since the interpreter may be forced to reparse the source code for each iteration. Many optimizing REXX interpreters (and in particular REXX compilers) have little or no support for INTERPRET. Since virtually anything can happen inside it, it is hard to optimize, and it often invalidates assumptions in other parts of the script, forcing the interpreter to ignore other possible optimizations. Thus, you should avoid INTERPRET when speed is at a premium.
There are some restrictions on which statements can be inside an INTERPRET statement. Firstly, labels cannot occur there. TRL states that they are not allowed, but you may find that in some implementations labels occurring there will not affect the label symbol table of the program being run. Consider the statement:
interpret 'signal there; there: say hallo'
there:
This statement transfers control to the label THERE in the program, never to the THERE label inside the expression of the INTERPRET instruction. Equivalently, any SIGNAL to a label THERE elsewhere in the program never transfers control to the label inside the INTERPRET instruction. However, labels are strictly speaking not allowed inside INTERPRET strings.
Example: Self-modifying Program
There is an idea for a self-modifying program in REXX which is basically like this:
string = ''
do i=1 to sourceline()
string = string ';' sourceline(i)
end
string = transform( string )
interpret string
exit
transform: procedure
parse arg string
/* do some transformation on the argument */
return string
Unfortunately, there are several reasons why this program will not work in REXX, and it may be instructive to investigate why. Firstly, it uses the label TRANSFORM, which is not allowed in the argument to INTERPRET. The INTERPRET will thus refer to the TRANSFORM routine of the "outermost" invocation, not the one "in" the INTERPRET string.
Secondly, the program does not take line continuations into account. Worse, the SOURCELINE() built-in function refers to the data of the main program, even inside the code executed by the INTERPRET instruction. Thirdly, the program will never end, as it will nest itself up to an implementation-dependent limit for the maximum number of nested INTERPRET instructions.
In order to make this idea work better, temporary files should be used.
On the other hand, loops and other multi-clause instructions, like IF and SELECT, can occur inside an INTERPRET expression, but only if the whole instruction is there; you can not start a structured instruction inside an INTERPRET instruction and end it outside, or vice-versa. However, the instruction SIGNAL is allowed even if the label is not in the interpreted string. Also, the instructions ITERATE and LEAVE are allowed in an INTERPRET, even when they refer to a loop that is external to the interpreted string.
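Both cases can be sketched briefly. The behaviour shown for LEAVE follows the description above; interpreters with limited INTERPRET support may differ.

```rexx
/* A complete loop inside the interpreted string is allowed. */
interpret 'do i = 1 to 3; say i; end'
/* LEAVE may refer to a loop outside the interpreted string. */
do j = 1 to 5
   interpret 'if j = 3 then leave j'
end
say j                          /* 3 */
```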
Most of the time, INTERPRET is not needed, although it can yield compact and interesting code. If you do not strictly need INTERPRET, you should consider not using it, for reasons of compatibility, speed, and readability. Many of the traditional uses of INTERPRET have been replaced by other mechanisms in order to decrease the necessity of INTERPRET; e.g. indirect specification of variables in EXPOSE and DROP, the improved VALUE() built-in function, and indirect specification of patterns in templates.
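As a sketch of one such replacement, the following updates a variable whose name is held in another variable, first with INTERPRET and then with the VALUE() built-in function. The variable names are made up for this illustration.

```rexx
name = 'COUNTER'
counter = 41
/* With INTERPRET: build and execute an assignment clause. */
interpret name '=' name '+ 1'
/* With VALUE(): read and set the variable named by NAME.  */
call value name, value(name) + 1
say counter                    /* 43 */
```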
Only semicolon (;) is allowed as a clause delimiter in the string interpreted by an INTERPRET instruction. The colon of labels can not be used, since labels are not allowed. Nor do specific end-of-line character sequences have any defined meaning there. However, most interpreters probably allow the end-of-line character sequence of the host operating system as alternative clause delimiters. It is interesting to note that in the context of the INTERPRET instruction, an implicit, trailing clause delimiter is always appended to the string to be interpreted.
ITERATE [ symbol ] ;
The ITERATE instruction will iterate the innermost, active loop in which the ITERATE instruction is located. If symbol is specified, it will iterate the innermost, active loop having symbol as control variable. The simple DO/END statement without a repetitor and conditional is not affected by ITERATE. All active multiclause structures (DO, SELECT, and IF) within the loop being iterated are terminated.
The effect of an ITERATE is to immediately transfer control to the END statement of the affected loop, so that the next (if any) iteration of the loop can be started. It only affects loops on the current procedural level. All actions normally associated with the end of an iteration are performed.
Note that symbol must be specified literally; i.e. tail substitution is not performed for compound variables. So if the control variable in the DO instruction is FOO.BAR, then symbol must use FOO.BAR if it is to refer to the control variable, no matter the value of the BAR variable.
Also note that ITERATE (and LEAVE) are means of transferring control in the program, and therefore they are related to SIGNAL, but they do not have the effect of automatically terminating all active loops on the current procedural level, which SIGNAL has.
Two types of errors can occur. Either symbol does not refer to any loop active at the current procedural level; or (if symbol is not specified) there is no active loop at the current procedural level. Both errors are reported as SYNTAX condition {28}.
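A short sketch of ITERATE naming the control variable of an outer loop:

```rexx
do i = 1 to 3
   do j = 1 to 3
      if j = 2 then iterate i  /* ends the inner loop, continues with next i */
      say i j
   end
end
/* says: 1 1, then 2 1, then 3 1 */
```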
LEAVE [ symbol ] ;
This statement terminates the innermost, active loop. If symbol is specified, it terminates the innermost, active loop having symbol as control variable. As for scope, syntax, errors, and functionality, it is identical to ITERATE, except that LEAVE terminates the loop, while ITERATE lets the loop start on the next normal iteration. None of the actions normally associated with the normal end of an iteration of a loop are performed for a LEAVE instruction.
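A short sketch of LEAVE naming the control variable of an outer loop, which terminates both loops at once:

```rexx
do i = 1 to 3
   do j = 1 to 3
      if i * j = 4 then leave i   /* terminates both loops */
      say i j
   end
end
say 'stopped at' i j              /* stopped at 2 2 */
```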
Example: Iterating a simple DO/END
ITERATE and LEAVE have no effect on a simple DO/END group. In order to circumvent this, a simple DO/END can be rewritten like this:
if foo then do until 1
   say 'This is a simple DO/END group'
   say 'but it can be terminated by'
   iterate
   say 'iterate or leave'
end
This shows how ITERATE has been used to terminate what for all practical purposes is a simple DO/END group. Either ITERATE or LEAVE can be used for this purpose, although LEAVE is perhaps marginally faster.
NOP ;
The NOP instruction is the "no operation" statement; it does nothing. Actually, that is not totally true, since the NOP instruction is a "real" statement (and a placeholder), as opposed to null clauses. I've only seen this used in two circumstances.
After any THEN or ELSE keyword, where a statement is required, when the programmer wants an empty THEN or ELSE part. By the way, this is the intended use of NOP. Note that you can not use a null clause there (label, comment, or empty lines), since these are not parsed as "independent" statements.
I have seen it used as "trace-bait". That is, when you start interactive trace, the statement immediately after the TRACE instruction will be executed before you receive interactive control. If you don't want that to happen (or maybe the TRACE instruction was the last in the program), you need to add an extra dummy statement. However, in this context, labels and comments can be used, too.
NUMERIC DIGITS [ expr ] ;
NUMERIC FORM [ SCIENTIFIC | ENGINEERING | [ VALUE ] expr ] ;
NUMERIC FUZZ [ expr ] ;
REXX has an unusual form of arithmetic. Most programming languages use integer and floating point arithmetic, where numbers are coded as bits in the computer's native memory words. However, REXX uses floating point arithmetic of arbitrary precision, which operates on strings representing the numbers. Although much slower, this approach gives lots of interesting functionality. Unless number-crunching is your task, the extra time spent by the interpreter is generally quite acceptable and often almost unnoticeable.
The NUMERIC statement is used to control most aspects of arithmetic operations. It has three distinct forms: DIGITS, FORM and FUZZ; which to choose is given by the second token in the instruction:
DIGITS
Is used to set the number of significant digits in arithmetic operations. The initial value is 9, which is also the default value if expr is not specified. Large values for DIGITS tend to slow down some arithmetic operations considerably. If specified, expr must be a positive integer.
FUZZ
Is used in numeric comparisons, and its initial and default value is 0. Normally, two numbers must have identical numeric values for a number of their most significant digits in order to be considered equal. How many digits are considered is determined by DIGITS. If DIGITS is 4, then 12345 and 12346 are equal, but not 12345 and 12356. However, when FUZZ is non-zero, then only the DIGITS minus FUZZ most significant digits are checked. E.g. if DIGITS is 4 and FUZZ is 2, then 1234 and 1245 are equal, but not 1234 and 1345.
The value for FUZZ must be a non-negative integer, and less than the value of DIGITS. FUZZ is seldom used, but is useful when you want to make comparisons less influenced by inaccuracies. Note that using values of FUZZ that are close to DIGITS may give highly surprising results.
FORM
Is used to set the form in which exponential numbers are written. It can be set to either SCIENTIFIC or ENGINEERING. The former uses a mantissa in the range 1.000... to 9.999..., and an exponent which can be any integer; while the latter uses a mantissa in the range 1.000... to 999.999..., and an exponent which is divisible by 3. The initial and default setting is SCIENTIFIC. The subkeyword FORM may be followed by the subkeyword SCIENTIFIC or ENGINEERING, or by the subkeyword VALUE. In the latter case, the rest of the statement is considered an expression, which must evaluate to either SCIENTIFIC or ENGINEERING. However, if the first token of the expression following VALUE is neither a symbol nor a literal string, then the VALUE subkeyword can be omitted.
The setting of FORM never affects the decision about whether to choose exponential form or normal floating point form; it only affects the appearance of the exponential form once that form has been selected.
Many things can be said about the usefulness of FUZZ. My impression is that it is seldom used in REXX programs. One problem is that it only addresses relative inaccuracy: i.e. that the smaller value must be within a certain range, which is determined by a percentage of the larger value. Often one needs absolute inaccuracy, e.g. two measurements are equal if their difference is less than a certain absolute threshold.
Example: Simulating relative accuracy with absolute accuracy
As explained above, REXX arithmetic has only relative accuracy. To obtain absolute accuracy, one can use the following trick:
numeric fuzz 3
if a=b then
say 'relative accuracy'
if abs(a-b)<=500 then
say 'absolute accuracy'
In the first IF instruction, if A is 100,000, then the range of values for B which makes the expression true is 99,500-100,499, i.e. an inaccuracy of about +-500. If A has the value 10,000,000, then B must be within the range 9,950,000-10,049,999; i.e. an inaccuracy of about +-50,000.
However, in the second IF instruction, assuming A is 100,000, the expression becomes true for values of B in the range 99,500-100,500. Assuming that A is 10,000,000, the expression becomes true for values of B in the range 9,999,500-10,000,500.
The effect is largely to force an absolute accuracy for the second example, no matter what the values of A and B are. This transformation has taken place since an arithmetic subtraction is not affected by the NUMERIC FUZZ, only numeric comparison operations. Thus, the effect of NUMERIC FUZZ on the implicit subtraction in the operation = in the first IF has been removed by making the subtraction explicit.
Note that there are some minor differences in how numbers are rounded, but this can be fixed by transforming the expression into something more complex.
To retrieve the values set for NUMERIC, you can use the built-in functions DIGITS(), FORM(), and FUZZ(). These values are saved across subroutine calls and restored upon return.
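The following is a minimal sketch of setting and querying the NUMERIC values, and of how they are restored when a subroutine returns. The label SUB and the numbers used are chosen only for illustration:

```rexx
/* Sketch: effect of NUMERIC settings, and querying them */
say digits() fuzz() form()        /* initial settings: 9 0 SCIENTIFIC */
numeric digits 5
say 1/3                           /* 0.33333 */
numeric form engineering
say 12345 * 10000000000           /* 123.45E+12 */
call sub
say digits() form()               /* 5 ENGINEERING: restored upon return */
exit

sub:
   numeric digits 12              /* affects this routine only */
   numeric form scientific
   say 1/3                        /* 0.333333333333 */
   return
```

Note that the product is written in exponential form only because it needs more digits before the decimal point than DIGITS allows. FORM merely decides which exponential form is used.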
OPTIONS expr ;
The OPTIONS instruction is used to set various interpreter-specific options. Its typical uses are to select certain REXX dialects, enable optimizations (e.g. time versus memory considerations), etc. No standard dictates what may follow the OPTIONS keyword, except that it should be a valid REXX expression, which is evaluated. Currently, no specific options are required by any standard.
The contents of expr are supposed to be word based, and it is the intention that more than one option can be specified in one OPTIONS instruction. REXX interpreters are specifically instructed to ignore OPTIONS words which they do not recognize. That way, a program can use run-time options for one interpreter, without making other interpreters trip when they see those options. An example of OPTIONS might be:
OPTIONS 4.00 NATIVE_FLOAT
This instruction might tell the interpreter to start enforcing language level 4.00, and to use native floating point numbers instead of the REXX arbitrary precision arithmetic. On the other hand, it might also be completely ignored by the interpreter.
It is uncertain whether modes selected by OPTIONS will be saved across subroutine calls. Refer to implementation-specific documentation for information about this.
Example: Drawback of OPTIONS
Unfortunately, the processing of the OPTIONS instruction has a drawback. Since an interpreter is instructed to ignore option-settings that it does not understand, it may ignore options which are essential for further processing of the program. Continuing might cause a fatal error later, although the behavior that would most precisely point out the problem is a complaint about the non-supported OPTION setting. Consider:
options 'cms_bifs'
pos = find( haystack, needle )
If this code fragment is run on an interpreter that does not support the cms_bifs option setting, then the OPTIONS instruction may still seem to have been executed correctly. However, the second clause will generally crash, since the FIND() function is still not available. Even though the real problem is in the first line, the error message is reported for the second line.
PARSE [ UPPER ] type [ template ] ;
   type = { ARG | LINEIN | PULL | SOURCE | VERSION |
            VALUE [ expr ] WITH | VAR symbol }
The PARSE instruction takes one or more source strings, and then parses them using the template for directions. The process of parsing is one where parts of a source string are extracted and stored in variables. Exactly which parts are extracted is determined by the patterns in the template. A complete description of parsing is given in chapter [not yet written].
Which strings are to be the source of the parsing is defined by the type subclause, which can be any of:
ARG.
The data to use as the source during the parsing is the argument strings given at the invocation of this procedure level. Note that this is the only case where the source may consist of multiple strings.
LINEIN.
Makes the PARSE instruction read a line from the standard input stream, as if the LINEIN() built-in function had been called. It uses the contents of that line (after stripping off end-of-line characters, if necessary) as the source for the parsing. This may raise the NOTREADY condition if problems occurred during the read.
PULL.
Retrieves as the source string for the parsing the topmost line from the stack. If the stack is empty, the default action for reading an empty stack is taken. That is, it will read a whole line from the standard input stream, strip off any end-of-line characters (if necessary), and use that string as the source.
SOURCE.
The source string for the parsing is a string containing information about how this invocation of the REXX interpreter was started. This information will not change during the execution of a REXX script. The format of the string is:
system invocation filename
Here, the first space-separated word (system) is a single word describing the platform on which the system is running. Often, this is the name of the operating system. The second word describes how the script was invoked. TRL2 suggests that invocation could be COMMAND, FUNCTION, or SUBROUTINE, but notes that this may be specific to VM/CMS.
Everything after the second word is implementation-dependent. It is indicated that it should refer to the name of the REXX script, but the format is not specified. In practice, the format will differ because the format of file names differs between various operating systems. Also, the part after the second word might contain other types of information. Refer to the implementation-specific notes for exact information.
VALUE expr WITH.
This form will evaluate expr and use the result of that evaluation as the source string to be parsed. The token WITH may not occur inside expr, since it is a reserved subkeyword in this context.
VAR symbol.
This form uses the current value of the named variable symbol (after tail-substitution) as the source string to be parsed. The variable may be any variable symbol. If the variable is uninitialized, then a NOVALUE condition will be raised.
VERSION.
This format resembles SOURCE, but it contains information about the version of REXX that the interpreter supports. The string contains five words, and has the following format:
language level date month year
Where language is the name of the language supported by the REXX interpreter. This may seem like overkill, since the language is REXX, but there may be various different dialects of REXX. The word can be just about anything, except for two restrictions: the first four letters should be REXX (in upper case), and the word should not contain any periods. [TRL2] indicates that the remainder of the word (after the fourth character) can be used to identify the implementation.
The second word is the REXX language level supported by the interpreter. Note that this is not the same as the version of the interpreter, although several implementations make this mistake. Strictly speaking, neither [TRL1] nor [TRL2] defines the format of this word, but a numeric format is strongly suggested.
The last three words (date, month, and year) make up the date part of the string. This is the release date of the interpreter, in the default format of the DATE() built-in function.
Much confusion seems to be related to the second word of PARSE VERSION. It describes the language level, which is not the same as the version number of the interpreter. In fact, most interpreters have a version numbering which is independent of the REXX language level. Unfortunately, several interpreters make the mistake of using this field for their own version number. This is very unfortunate for two reasons: first, it is incorrect, and second, it makes it difficult to determine which REXX language level the interpreter is supposed to support.
Chances are that you can find the interpreter version number in PARSE SOURCE or the first word of PARSE VERSION.
The format of the REXX language level is not rigidly defined, but TRL1 corresponds to the language level 3.50, while TRL2 corresponds to the language level 4.00. Both implicitly indicate that the language level description is a number, and state that an implementation less than a certain number "may be assumed to indicate a subset" of that language level. However, this must not be taken too literally, since language level 3.50 has at least two features which are missing in language level 4.00 (the Scan trace setting, and the PROCEDURE instruction that is not forced to be the first instruction in a subroutine). [TRH:PRICE] gives a very good overview of the varying functionality of different language levels of REXX up to level 4.00.
With the release of the ANSI REXX Standard [ANSI] in 1996, the REXX language is now rigidly defined. The language level of ANSI REXX is 5.00. Regina is attempting to keep pace with the ANSI Standard. It includes some features of language level 5.00, such as date and time conversions in the DATE() and TIME() BIFs, plus the new BIFs COUNTSTR() and CHANGESTR(). Regina does not supply a complete set of multiple-level error messages as defined in the ANSI Standard, nor the extensions to ADDRESS, so it does not comply with language level 5.00, but currently is a hybrid between 4.00 and 5.00. Thus PARSE VERSION will return 4.xx :-)
Note that even though the information of the PARSE SOURCE is constant throughout the execution of a REXX script, this is not necessarily correct for the PARSE VERSION. If your interpreter supports multiple language levels (e.g. through the OPTIONS instruction), then it will have to change the contents of the PARSE VERSION string in order to comply with different language levels. To some extent, this may also apply to PARSE SOURCE, since it may have to comply with several implementation-specific standards.
After the source string has been selected by the type subclause in the PARSE instruction, this string is parsed into the template. The functionality of templates is common for the PARSE, ARG and PULL instructions, and is further explained in chapter [not yet written].
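As a rough illustration of how the type subclause selects the source string, consider the sketch below. All variable names, the label SHOW, and the sample strings are invented for the example. What PARSE SOURCE and PARSE VERSION print depends on the interpreter, so no output is shown for them:

```rexx
/* Sketch: selecting the source string for PARSE */
parse value 'alpha beta gamma' with first second rest
say first '/' second '/' rest          /* alpha / beta / gamma */

line = 'Hello brave new world'
parse upper var line w1 w2 tail
say w1 w2 '|' tail                     /* HELLO BRAVE | NEW WORLD */

call show 'one two', 'three'           /* two argument strings */

parse source system invocation filename
say 'platform:' system', invoked as:' invocation
parse version language level date
say 'language:' language', level:' level', released:' date
exit

show:
   parse arg a b, c                    /* one template per argument string */
   say a '|' b '|' c                   /* one | two | three */
   return
```

The comma in the PARSE ARG template moves on to the next argument string, which is possible only for ARG since it is the only type with more than one source string.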
PROCEDURE [ EXPOSE [ varref [ varref ... ] ] ] ;
varref = { symbol | ( symbol ) }
The PROCEDURE instruction is used by REXX subroutines in order to control how variables are shared among routines. The simplest use is without any parameters; then all future references to variables in that subroutine refer to local variables. If there is no PROCEDURE instruction in a subroutine, then all variable references in that subroutine refer to variables in the calling routine's name space.
If the EXPOSE subkeyword is specified too, then any references to the variables in the list following EXPOSE refer not to local variables, but to variables in the name space of the calling routine.
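A minimal sketch of the difference between an exposed and a local variable follows. The routine name BUMP and the variable names are hypothetical:

```rexx
/* Sketch: one exposed variable, one local variable */
count = 0
total = 100
call bump
say count total                 /* 1 100 */
exit

bump: procedure expose count
   count = count + 1            /* refers to the caller's COUNT */
   total = 'local'              /* a local TOTAL; the caller's is untouched */
   return
```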
Example: Dynamic execution of PROCEDURE
This definition allows some strange effects. Consider the following code:
call testing
testing:
say foo
procedure expose bar
say foo
Here, the first reference to FOO is to the variable FOO in the caller routine's name space, while the second reference to FOO is to a local variable in the called routine's name space. This is difficult to parse statically, since the names to expose (and even when to expose them) are determined dynamically during run-time. Note that this use of PROCEDURE is allowed in [TRL1], but not in [TRL2].
Several restrictions have been imposed on PROCEDURE in [TRL2] in order to simplify the execution of PROCEDURE (and in particular, to ease the implementation of optimizing interpreters and compilers).
The first restriction, to which all REXX interpreters adhere as far as I know, is that each invocation of a subroutine (i.e. not the main program) may execute PROCEDURE at most once. Both TRL1 and TRL2 contain this restriction. However, more than one PROCEDURE instruction may exist "in" each routine, as long as at most one is executed at each invocation of the subroutine.
The second restriction is that the PROCEDURE instruction must be the first statement in the subroutine. This restriction was introduced between REXX language level 3.50 and 4.00, but several level 4.00 interpreters may not enforce it, since there is no breakage when allowing it.
There are several important consequences of this second restriction:
(1) it implicitly includes the first restriction listed above, since only one instruction can be the first; (2) it prohibits selecting one of several possible PROCEDURE instructions; (3) it prohibits using the same variable name twice; first as an exposed and then as a local variable, as indicated in the example above; (4) it prohibits the customary use of PROCEDURE and INTERPRET, where the latter is used to create a level of indirectness for the PROCEDURE instruction. This particular use can be exemplified by:
testing:
interpret 'procedure expose' bar
where BAR holds a list of variable names which are to be exposed. However, in order to make this functionality available without having to resort to INTERPRET, which is generally considered "bad" programming style, new functionality has been added to PROCEDURE between language levels 3.50 and 4.00. If one of the variables in the list of variables is enclosed in parentheses, that means indirection. Then the following happens: (1) the variable enclosed in parentheses is exposed; (2) the value of that variable is read, and its contents are taken to be a space-separated list of variable names; and (3) all these variable names are exposed strictly in order from left to right.
Example: Indirect exposing
Consider the following example:
testing:
procedure expose foo (bar) baz
Assuming that the variable BAR holds the value one two, then the variables exposed are the following: FOO, BAR, ONE, TWO, BAZ, in that order. In particular, note that the variable BAR is exposed immediately before the variables which it names are exposed.
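A runnable variant of this example might look like the sketch below. It assumes an interpreter that supports the parenthesized form (language level 4.00); the values assigned and the extra variable HIDDEN are invented for the demonstration:

```rexx
/* Sketch: indirect exposing through the list held in BAR */
foo = 'F'; one = 1; two = 2; baz = 'Z'; hidden = 'H'
bar = 'one two'
call testing
exit

testing:
   procedure expose foo (bar) baz
   say foo '|' bar '|' one two '|' baz    /* F | one two | 1 2 | Z */
   say hidden                             /* HIDDEN: not exposed, so unset here */
   return
```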
Example: Order of exposing
There is another fine point about exposing: the variables are hidden immediately after the EXPOSE subkeyword, so they are not initially available when the variable list is processed. Consider the following code:
testing:
procedure expose bar foo.bar foo.baz baz
which exposes variables in the order specified. If the variable BAR holds the value 123, then FOO.123 is exposed as the second item, since BAR is visible after having already been exposed as the first item. On the other hand, the third item will always expose the variable FOO.BAZ, no matter what the value of BAZ is in the caller, since the BAZ variable becomes visible only after it has been used in the third item. Therefore, the order in which variables are exposed is important. So, if a compound variable is used inside parentheses in a PROCEDURE instruction, then any simple symbols needed for tail substitution must previously have been explicitly exposed. Compare this to the DROP instruction.
What exactly is exposing? Well, the best description is to say that it makes all future uses (within that procedural level) of a particular variable name refer to the variable in the calling routine rather than in the local subroutine. The implication of this is that even if it is dropped or it has never been set, an exposed variable will still refer to the variable in the calling routine. Another important thing is that it is the tail-substituted variable name that is exposed. So if you expose FOO.BAR, and BAR has the value 123, then only FOO.123 is exposed, and continues to be so, even if BAR later changes its value to e.g. 234.
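The point about tail substitution can be illustrated with the following sketch, where the routine name SUB and the values are made up. The comments describe the behavior that follows from the description above:

```rexx
/* Sketch: the tail-substituted name is what gets exposed */
bar = 123
foo.123 = 'a'
foo.234 = 'b'
call sub
exit

sub: procedure expose bar foo.bar
   say foo.bar       /* a: FOO.123 was exposed                        */
   bar = 234         /* changes the caller's BAR, which is exposed    */
   say foo.bar       /* FOO.234: this name was never exposed, so the  */
                     /* local variable is unset                       */
   return
```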
Example: Global variables
A problem lying in wait for new REXX users is the fact that exposing a variable only exposes it to the calling routine. Therefore, it is incorrect to speak of global variables, since the variable might be local to the calling routine. To illustrate, consider the following code:
foo = 'bar'
call sub1
call sub2
exit
sub1: procedure expose foo
say foo /* first says 'bar', then 'FOO' */
return
sub2: procedure
say foo /* says 'FOO' */
call sub1
return
Here, the first subroutine call in the "main" program writes out bar, since the variable FOO in SUB1 refers to the FOO variable in the main program's (i.e. its caller routine's) name space. During the second call from the main program, SUB2 writes out FOO, since the variable is not exposed. However, SUB2 calls SUB1, which exposes FOO, but that subroutine also writes out FOO. The reason for this is that EXPOSE works on the run-time nesting of routines, not on the typographical structure of the code. So the PROCEDURE in SUB1 (on its second invocation) exposes FOO to SUB2, not to the main program as typography might falsely indicate.
The often confusing consequence of the run-time binding of variable names is that an exposed variable of SUB1 can be bound to different global variables, depending on where it was called from. This differs from most compiled languages, which bind their variables independently of where a subroutine is called from. In turn, the consequence of this is that REXX has severe problems storing a persistent, static variable which is needed by one subroutine only. A subroutine needing such a variable (e.g. a count variable which is incremented each time the subroutine is called) must either use an operating system command, or all subroutines calling that subroutine (and their calling routines, etc.) must expose the variable. The first of these solutions is very inelegant and non-standard, while the second is at best troublesome and at worst seriously limits the maximum practical size of a REXX program. There are hopes that the VALUE() built-in function will fix this in future standards of REXX.
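The second approach is often made more bearable by collecting such values in one stem and exposing that stem in every routine. This is only a convention, not something required by the language; the stem name GLOBALS. and the routine names below are hypothetical:

```rexx
/* Sketch: a "static" counter kept in a stem exposed by every routine */
globals.counter = 0
call outer
call outer
say globals.counter              /* 2 */
exit

outer: procedure expose globals.
   call tick                     /* OUTER must expose the stem for TICK's sake */
   return

tick: procedure expose globals.
   globals.counter = globals.counter + 1
   return
```

If OUTER omitted the EXPOSE, the stem seen by TICK would be a variable local to OUTER, and the counter in the main program would never change.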
Another important drawback of PROCEDURE is that it only works for internal subroutines; for external subroutines it either does not work, or PROCEDURE may not even be allowed on the main level of the external subroutine. However, in internal subroutines inside the external subroutines, PROCEDURE is allowed, and works as usual.
PULL [ template ] ;
This statement takes a line from the top of the stack and parses it into the variables in the template. It will also translate the contents of the line to uppercase.
This statement is equivalent to PARSE UPPER PULL [ template ], with the same exception as explained for the ARG instruction. See chapter [not yet written] for a description of parsing and chapter Stack for a discussion of the stack.
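The uppercase translation can be seen by comparing PULL with PARSE PULL in a small sketch (the sample string is arbitrary):

```rexx
/* Sketch: PULL translates to uppercase, PARSE PULL does not */
push 'Hello World'
pull a b
say a b                /* HELLO WORLD */

push 'Hello World'
parse pull a b
say a b                /* Hello World */
```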
PUSH [ expr ] ;
The PUSH instruction will add a string to the stack. The string added will either be the result of the expr, or the nullstring if expr is not specified.
The string will be added to the top of the stack (LIFO), i.e. it will be the first line normally extracted from the stack. For a thorough discussion of the stack and the methods of manipulating it, see chapter Stack.
QUEUE [ expr ] ;
The QUEUE instruction is identical to the PUSH instruction, except for the position in the stack where the new line is inserted. While PUSH puts the line on the "top" of the stack, the QUEUE instruction inserts it at the bottom of the stack (FIFO), or at the bottom of the topmost buffer, if buffers are used.
For further information, refer to documentation for the PUSH instruction, and see chapter Stack for general information about the stack.
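The difference in ordering can be sketched as follows, assuming the stack is empty when the fragment starts and that no buffers are in use:

```rexx
/* Sketch: PUSH adds at the top, QUEUE adds at the bottom */
queue 'first queued'
push  'pushed'
queue 'last queued'
do while queued() > 0
   parse pull line
   say line            /* pushed, then first queued, then last queued */
end
```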
RETURN [ expr ] ;
The RETURN instruction is used to terminate the current procedure level, and return control to a level above. When RETURN is executed inside one or more nesting constructs, i.e. DO, IF, WHEN, or OTHERWISE, then the nesting constructs (in the procedural levels being terminated) are terminated too.
Optionally, an expression can be specified as an argument to the RETURN instruction, and the string resulting from evaluating this expression will be the return value from the procedure level terminated to the caller procedure level. Only a single value can be returned. When RETURN is executed with no argument, no return value is returned to the caller, and then a SYNTAX condition {44} is raised if the subroutine was invoked as a function.
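As a small sketch, the same routine can be invoked as a function or with CALL; in the latter case the return value is available in the special variable RESULT. The routine name SQUARE is invented for the example:

```rexx
/* Sketch: returning a value to a function call and to a CALL */
say square(7)          /* 49 */
call square 3
say result             /* 9 */
exit

square:
   return arg(1) * arg(1)
```

Had SQUARE ended with a bare RETURN, the CALL would still have worked, while the function invocation would have raised SYNTAX condition {44}.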
Example: Multiple entry points
A routine can have multiple exit points, i.e. a procedure can be terminated by any of several RETURN instructions. A routine can also have multiple entry points, i.e. several routine entry points can be terminated by the same RETURN instruction. However, this is rarer than having multiple exit points, because it is generally perceived that it creates less structured and readable code. Consider the following code:
call foo
call bar
call baz
exit
foo:
if datatype(name, 'w') then
drop name
signal baz
bar:
name = 'foo'
baz:
if symbol('name')== 'VAR' then
say 'NAME currently has the value' name
else
say 'NAME is currently an unset variable'
return
Although this is hardly a very practical example, it shows how the main bulk of a routine can be used together with three different entry points. The main part of the routine is the IF statement having two SAY statements. It can be invoked by calling FOO, BAR, or BAZ.
There are several restrictions to this approach. For instance, the PROCEDURE statement becomes cumbersome, but not impossible, to use.
Also note that when a routine has multiple exit points, it may choose to return a return value only at some of those exit points.
When a routine is located at the very end of a source file, there is an implicit RETURN instruction after the last explicit clause. However, according to good programming practice, you should avoid taking advantage of this feature, because it can create problems later if you append new routines to the source file and forget to change the implied RETURN to an explicit one.
If the current procedure level is the main level of either the program or an external subroutine, then a RETURN instruction is equivalent to an EXIT instruction, i.e. it will terminate the REXX program or the external routine. The table in the Exit section shows the actions of both the RETURN and EXIT instructions depending on the context in which they occur.
The SAY Instruction
SAY [ expr ] ;
Evaluates the expression expr, and prints the resulting string on the standard output stream. If expr is not specified, the nullstring is used instead. After the string has been written, an implementation-specific action is taken in order to produce an end-of-line.
The SAY instruction is roughly equivalent to
call lineout , expr
The differences are that there is no way of determining whether the printing was successfully completed if SAY is used, and the special variable RESULT is never set when executing a SAY instruction. Besides, the effect of omitting expr is different. In SAA API, the RXSIOSAY subfunction of the RXSIO exit handler is able to trap a SAY instruction, but not a call to the LINEOUT() built-in function. Further, the NOTREADY condition is never raised for a SAY instruction.
SELECT ; whenpart [ whenpart ... ] [ OTHERWISE [;]
[ statement ... ] ] END ;
whenpart : WHEN expr [;] THEN [;] statement
This instruction is used for general purpose, nested IF structures. Although it has certain similarities with CASE in Pascal and switch in C, it is in some respects very different from these. An example of the general use of the SELECT instruction is:
select
when expr1 then statement1
when expr2 then do
statement2a
statement2b
end
when expr3 then statement3
otherwise
ostatement1
ostatement2
end
When the SELECT instruction is executed, the next statement after the SELECT statement must be a WHEN statement. The expression immediately following the WHEN token is evaluated, and must result in a valid boolean value. If it is true (i.e. 1), the statement following the THEN token matching the WHEN is executed, and afterwards, control is transferred to the instruction following the END token matching the SELECT instruction. This is not completely true, since an instruction may transfer control elsewhere, and thus implicitly terminate the SELECT instruction; e.g. LEAVE, EXIT, ITERATE, SIGNAL, or RETURN or a condition trapped by method SIGNAL.
If the expression of the first WHEN is not true (i.e. 0), then the next statement must be either another WHEN or an OTHERWISE statement. In the former case, the process explained above is iterated. In the latter case, the clauses following the OTHERWISE up to the END statement are interpreted.
A SYNTAX condition {7} is raised if there is no OTHERWISE statement and none of the WHEN expressions evaluates to true. In general this can only be detected during runtime. However, if one of the WHENs is selected, the absence of an OTHERWISE is not considered an error.
By the nature of the SELECT instruction, the WHENs are tested in the sequence they occur in the source. If more than one WHEN has an expression that evaluates to true, the first one encountered is selected.
If the programmer wants to associate more than one statement with a WHEN statement, a DO/END pair must be used to enclose the statements, to make them one statement conceptually. However, zero, one, or more statements may be put after the OTHERWISE without having to enclose them in a DO/END pair. The clause delimiter is optional after OTHERWISE, and before and after THEN.
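A concrete sketch (with an arbitrary value for N) shows that only the first WHEN whose expression is true is selected, even though the later ones would be true as well:

```rexx
/* Sketch: WHENs are tested in order; the first true one wins */
n = 15
select
   when n // 15 = 0 then say 'FizzBuzz'
   when n // 5 = 0 then say 'Buzz'
   when n // 3 = 0 then say 'Fizz'
   otherwise say n
end
/* prints FizzBuzz, although 15 is divisible by 5 and by 3 too */
```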
Example: Writing SELECT as IF
Although CASE in Pascal and switch in C are in general table-driven (they check an integer constant and jump directly to the correct case, based on the value of the constant), SELECT in REXX is not. It is just a shorthand notation for nested IF instructions. Thus a SELECT instruction can always be written as a set of nested IF statements; but for very large SELECT statements, the corresponding nested IF structure may be too deeply nested for the interpreter to handle.
The following code shows how the SELECT statement shown above can be written as a nested IF structure:
if expr1 then statement1
else if expr2 then do
   statement2a
   statement2b
end
else if expr3 then statement3
else do
   ostatement1
   ostatement2
end
SIGNAL { string | symbol } ;
SIGNAL [ VALUE ] expr ;
SIGNAL { ON | OFF } condition [ NAME { string | symbol } ] ;
The SIGNAL instruction is used for two purposes: (a) to transfer control to a named label in the program, and (b) to set up a named condition trap.
The first form in the syntax definition transfers control to the named label, which must exist somewhere in the program; if it does not exist, a SYNTAX condition {16} is raised. If the label is defined more than once, the first definition is used. The parameter can be either a symbol (which is taken literally) or a string. If it is a string, then be sure that the case of the string matches the case of the label where it is defined. In practice, labels are in upper case, so the string should contain only uppercase letters too, and no space characters.
The second form of the syntax is used if the second token of the instruction is VALUE. Then, the rest of the instruction is taken as a general REXX expression, whose result after evaluation is taken to be the name of the label to transfer control to. This form is really just a special case of the first form, where the programmer is allowed to specify the label as an expression. Note that if the start of expr is such that it cannot be misinterpreted as the first form (i.e. the first token of expr is neither a string nor a symbol), then the VALUE subkeyword can be omitted.
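The second form might be used as in the sketch below, where the label names and the variable TARGET are hypothetical. TRANSLATE() is used to uppercase the computed name so that it matches the label:

```rexx
/* Sketch: computing the label to jump to */
target = 'second'
signal value translate(target)     /* jumps to the label SECOND */
say 'never reached'

first:
   say 'at FIRST'
   exit

second:
   say 'at SECOND, jumped from line' sigl
   exit
```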
Example: Transferring control to inside a loop
When the control of execution is transferred by a SIGNAL instruction, all active loops at the current procedural level are terminated, i.e. they cannot be continued later, although they can of course be reentered from the normal start. The consequence of this is that the following code is illegal:
do forever
signal there
there:
nop
end
The fact that the jump is altogether within the loop does not prevent the loop from being terminated. Thus, after the jump to the label inside the loop, an attempt is made to execute the END instruction, which will result in a SYNTAX condition {10}. However, if control is transferred out of the loop after the label, but before the END, then it would be legal, i.e. the following is legal:
do forever
signal there
there:
nop
signal after
end
after:
This is legal, simply because the END instruction is never seen during this script. Although both TRL1 and TRL2 allow this construct, it will probably be disallowed in ANSI.
Just as loops are terminated by a SIGNAL instruction, SELECT and IF instructions are also terminated. Thus, it is illegal to jump to a location within a block of statements contained in a WHEN, OTHERWISE, or IF instruction, unless the control is transferred out of the block before the execution reaches the end of the block.
Whenever execution is transferred during a SIGNAL instruction, the special variable SIGL is set to the line number of the line containing the SIGNAL instruction, before the control is transferred. If this instruction extends over several lines, it refers to the first of these. Note that even blanks are part of a clause, so if the instruction starts with a line continuation, the real line of the instruction is different from the line where the instruction keyword is located.
The third form of syntax is used when the second token in the instruction is either ON or OFF. In both cases, the third token in the instruction must be the name of a condition (as a constant string or a symbol, which is taken literally), and the setup of that condition trap is changed. If the second token is OFF, then the trap of the named condition is disabled.
If the second token is ON, then the trap of the named condition is enabled. Further, in this situation two more tokens may be allowed in the instruction: the first must be NAME and the second must be the name of a label (either as a constant string or a symbol, which is taken literally). If the five token form is used, then the label of the condition handler is set to the named label, else the name of the condition handler is set to the default, which is identical to the name of the condition itself.
Note that the NAME subclause of the SIGNAL instruction was a new construct in TRL2, and is not a part of TRL1. Thus, older interpreters may not support it.
Example: Naming condition traps
Note that the default value for the condition handler (if the NAME subclause is not specified) is the name of the condition, not the condition handler from the previous time the condition was enabled. Thus, after the following code, the name of the condition handler for the condition SYNTAX is SYNTAX, not FOOBAR:
signal on syntax name foobar
signal on syntax
Example: Named condition traps in TRL1
A common problem when porting REXX code from a TRL2 interpreter to a TRL1 interpreter is that explicitly named condition traps are not supported. There are ways to circumvent this, like:
syntax_name = 'SYNTAX_HANDLER'
signal on syntax
if 1 + 2 then /* will generate SYNTAX condition */
nop
syntax:
oldsigl = sigl
signal value translate(syntax_name)
syntax_handler:
say 'condition at line' oldsigl 'is being handled...'
exit
Here, a "global" variable is used to store the name of the real condition handler, in the absence of a field for this in the interpreter. This works fine, but there are some problems: the variable SYNTAX_NAME must be exposed everywhere, in order to be available at all times. It would be far better if this value could be stored somewhere from which it could be retrieved from any part of the script, no matter the current state of the call-stack. This can be fixed with programs like GLOBALV under VM/CMS and putenv under Unix.
Another problem is that this destroys the possibility of setting up the condition handler with the default handler name. However, to circumvent this, add a new DEFAULT_SYNTAX_HANDLER label which becomes the new name for the old SYNTAX label.
Further information about conditions and condition traps is given in chapter Conditions.
TRACE [ number | setting | [ VALUE ] expr ] ;
setting = A | C | E | F | I | L | N | O | R | S
The TRACE instruction is used to set a tracing mode. Depending on the current mode, various levels of debugging information are displayed for the programmer. Interactive tracing is also allowed, where the user can re-execute clauses, change values of variables, or in general, execute REXX code interactively between the statements of the REXX script.
If setting is not specified, then the default value N is assumed. If the second token after TRACE is VALUE, then the remaining part of the clause is interpreted as an expression, whose value is used as the trace setting. Otherwise, if the second token is either a string or a symbol, then it is taken as the trace setting; a symbol is taken literally. In all other circumstances, whatever follows the token TRACE is taken to be an expression, whose value is the trace setting.
If a parameter is given to the TRACE instruction, and the second token in the instruction is not VALUE, then there must be only one token after TRACE, and it must be either a constant string or a symbol (which is taken literally). The value of this token can be either a whole number or a trace setting.
If it is a whole number and the number is positive, then the number specifies how many interactive pauses to skip. This assumes interactive tracing; if interactive tracing is not enabled, this TRACE instruction is ignored. If the parameter is a negative whole number, then tracing is turned off temporarily for a number of clauses determined by the absolute value of the number.
If the second token is a symbol or string, but not a whole number, then it is taken to be one of the settings below. It may optionally be preceded by one or more question mark (?) characters. Of the rest of the token, only the first letter matters; this letter is translated to upper case, and must be one of the following:
[A]
(All) Traces all clauses before execution.
[C]
(Commands) Traces all command clauses before execution.
[E]
(Errors) Traces any command that would raise the ERROR condition (whether enabled or not) after execution. Both the command clause and the return value are traced.
[F]
(Failures) Traces any command that would raise the FAILURE condition (whether enabled or not) after execution. Both the command clause and the return value are traced.
[I]
(Intermediate) Traces not only all clauses, but also traces all evaluation of expressions; even intermediate results. This is the most detailed level of tracing.
[L]
(Labels) Traces the name of any label clause executed; whether the label was jumped to or not.
[N]
(Normal or Negative) This is the same as the Failure setting.
[O]
(Off) Turns off all tracing.
[R]
(Results) Traces all clauses and the results of evaluating expressions. However, intermediate expressions are not traced.
The Errors and Failures settings are not influenced by whether the ERROR or FAILURE conditions are enabled or not. These TRACE settings will trace the command and return value after the command has been executed, but before the respective condition is raised.
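The different ways of writing a trace setting can be sketched as follows. The variable names are illustrative only, and the trace output itself is not shown because its exact layout depends on the interpreter:

```rexx
/* Selecting trace settings as a symbol, a string and an expression */
trace results            /* symbol, taken literally; only the R matters */
x = 3 * 4
trace 'O'                /* constant string: turns tracing off          */
level = 'i'
trace value level        /* expression form: the value of LEVEL is used */
y = x + 1
trace off                /* same effect as TRACE 'O'                    */
```

The VALUE form is the one to use when the setting is computed at run time, since a bare symbol after TRACE is never evaluated as a variable.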
The levels of tracing might be set up graphically, as in the figure below. An arrow indicates that the setting pointed to is a superset of the setting pointed from.
    /-> Failures -> Errors -> Commands --\
Off                                       \
    \-----> Labels -----------------------> All -> Results -> Intermediate
Hierarchy of TRACE settings
According to this figure, Intermediate is a superset of Results, which is a superset of All. Further, All is a superset of both Commands and Labels. Commands is a superset of Errors, which is a superset of Failures. Both Failures and Labels are supersets of Off. Strictly speaking, Commands is not a superset of Errors, since Errors traces after the command, while Commands traces before the command.
Scan is not part of this diagram, since it provides a completely different tracing functionality. Note that Scan is part of TRL1, but was removed in TRL2. It is not likely to be part of newer REXX interpreters.
UPPER symbol [ symbol [ symbol [...] ] ] ;
The UPPER instruction is used to translate the contents of one or more variables to uppercase. The variables are translated in sequence from left to right.
Each symbol is separated by one or more blanks.
While it is more convenient and probably faster than individual calls to TRANSLATE, UPPER is not part of the ANSI standard and is not common in other interpreters, so it should be avoided in portable code. It is provided to ease porting of programs from CMS.
Only simple and compound symbols can be specified. Specification of a stem variable results in an error.
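The following sketch contrasts UPPER with the portable TRANSLATE built-in function. The variable names are made up for the example, and the first half assumes an interpreter, such as Regina, that accepts the UPPER instruction:

```rexx
/* UPPER translates the listed variables in place, left to right */
first = 'regina'
second = 'rexx'
upper first second
say first second                   /* REGINA REXX */

/* Portable alternative: one TRANSLATE call per variable */
third = 'interpreter'
third = translate(third)
say third                          /* INTERPRETER */
```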
An operator represents an operation to be carried out between two terms, such as division. There are 5 types of operators in the Rexx Language: Arithmetic, Assignment, Comparative, Concatenation, and Logical Operators. Each is described in further detail below.
Arithmetic operators can be applied to numeric constants and Rexx variables that evaluate to valid Rexx numbers. The following operators are listed in decreasing order of precedence. The two prefix operators share one precedence level, as do the multiplication and division operators (*, /, %, //) and the binary + and - operators; operators of equal precedence are evaluated from left to right.
- Unary prefix. Same as 0 - number.
+ Unary prefix. Same as 0 + number.
** Power
* Multiply
/ Divide
% Integer divide. Divide and return the integer part of the division.
// Remainder divide. Divide and return the remainder of the division.
+ Add
- Subtract.
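A few expressions make the precedence rules and the difference between the three division operators concrete. This is only a sketch; the results in the comments assume the default NUMERIC settings:

```rexx
/* Arithmetic operators and their precedence */
say 7 / 2          /* 3.5  ordinary division                      */
say 7 % 2          /* 3    integer part of the division           */
say 7 // 2         /* 1    remainder of the division              */
say -7 // 2        /* -1   the remainder keeps the dividend's sign */
say 1 + 2 * 3      /* 7    multiplication binds tighter than add  */
say 2 ** 3 * 2     /* 16   power binds tighter than multiply      */
say -2 ** 2        /* 4    prefix minus binds tighter than power  */
```

The last line is the one that most often surprises programmers coming from other languages, where the same expression yields -4.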
Assignment operators are a means to change the value of a variable. Rexx only has one assignment operator.
= Assign the value on the right side of the "=" to the variable on the left.
The Rexx comparative operators compare two terms and return the logical value 1 if the result of the comparison is true, or 0 if the result of the comparison is false. The non-strict comparative operators ignore leading and trailing blanks for string comparisons, and leading zeros for numeric comparisons. Numeric comparisons are made if both terms to be compared are valid Rexx numbers; otherwise a string comparison is done. String comparisons are case sensitive, and the shorter of the two strings is padded with blanks on the right.
The following lists the non-strict comparative operators.
= Equal
\=, ^= Not equal
> Greater than.
< Less than.
>= Greater than or equal.
<= Less than or equal
<>, >< Greater than or less than. Same as Not equal.
The following lists the strict comparative operators. For two strings to be considered equal when using the strict equal comparative operator, both strings must be the same length.
== Strictly equal
\==, ^== Strictly not equal.
>> Strictly greater than.
<< Strictly less than.
>>= Strictly greater than or equal.
<<= Strictly less than or equal.
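The difference between the non-strict and the strict operators is easiest to see side by side. The following sketch prints 1 for true and 0 for false:

```rexx
/* Non-strict versus strict comparison */
say ('abc' = ' abc ')      /* 1  leading and trailing blanks ignored   */
say ('abc' == ' abc ')     /* 0  strict: the lengths differ            */
say ('1.0' = '1')          /* 1  both are numbers: numeric comparison  */
say ('1.0' == '1')         /* 0  strict: compared character by character */
say ('007' = '7')          /* 1  leading zeros ignored                 */
say ('abc' = 'ABC')        /* 0  string comparison is case sensitive   */
say ('abc' << 'abd')       /* 1  strict character-by-character order   */
```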
The concatenation operators combine two strings to form one, by appending the second string to the right side of the first. The Rexx concatenation operators are:
(blank) Concatenation of strings with one space between them.
(abuttal) Concatenation of strings with no intervening space.
|| Concatenation of strings with no intervening space.
Examples:
a = 'abc'; b = 'def'
Say a b        /* results in 'abc def' */
Say a || b     /* results in 'abcdef' */
Say a'xyz'     /* results in 'abcxyz' */
Logical operators work with the Rexx strings 1 and 0, usually as a result of a comparative operator. These operators likewise return only logical TRUE (1) or logical FALSE (0).
& And Returns 1 if both terms are 1.
| Inclusive or Returns 1 if either term is 1.
&& Exclusive or Returns 1 if either term is 1 but NOT both terms.
\ Logical not Reverses the result; 0 becomes 1 and 1 becomes 0.
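A short sketch of the four logical operators applied to the results of comparisons; the variables A and B exist only for this example:

```rexx
/* Logical operators combine the 1/0 results of comparisons */
a = 5
b = 10
say (a < b) & (b < 20)     /* 1  both terms are true              */
say (a > b) | (b = 10)     /* 1  at least one term is true        */
say (a < b) && (b = 10)    /* 0  both are true, so exclusive or fails */
say \(a < b)               /* 0  negation of a true term          */
```

Parentheses are used here for readability; the comparative operators already bind more tightly than the logical ones.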
OPTIONS settings
Are saved across subroutines, just like other pieces of information such as condition settings, NUMERIC settings, etc. See chapter Options for more information about OPTIONS settings.
Return value
To the program that called Regina is limited to being an integer, when this is required by the operating systems. All current implementations are for operating systems that require this.
Default return value
From a REXX program is 0 under most systems, specifically Unix, OS/2, MS-DOS. Here, VMS deviates, since it uses 1 as the default return value. Using 0 under VMS tends to make VMS issue a warning saying that no error occurred.
Transferring control into a loop
Works fine in Regina, as long as no END, THEN, ELSE, WHEN, or OTHERWISE instructions are executed afterwards; unless the normal entrypoint for the construct has been executed after the transfer of control.
PARSE SOURCE information
PARSE VERSION information
Last line of source code
Is implicitly taken to be terminated by an end-of-line sequence in Regina, even if such a sequence is not present in the source code of the REXX script. This applies only to source code. Also, the end-of-string in INTERPRET strings is taken to be implicitly terminated by an end-of-line character sequence.
Moving code from MS-DOS to Unix
Is simplified by Regina, since it will accept the MS-DOS type end of line sequences as valid. I.e. any Ctrl-M in front of a Ctrl-J in the source file is ignored on Unix systems by Regina. This applies only to source code.
Labels in INTERPRET
Are handled by Regina in the following way: a label can occur inside an INTERPRET string, but it is ignored, and can never be jumped to in a SIGNAL or CALL instruction.
Most people have problems invoking external programs. This section shows the basic rules, and some tricks to let you use Regina and other Rexx interpreters successfully.
Every call to an external program is executed by an implicit ADDRESS statement.
'echo Hello planet'
is equivalent to
ADDRESS currentenvironment 'echo Hello planet'
The default environment is SYSTEM in Regina and many other Rexx interpreters.
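The equivalence can be checked with the ADDRESS built-in function, which returns the name of the current environment. This is a minimal sketch that assumes an `echo` command is available on the system:

```rexx
/* The implicit and the explicit form of issuing a command */
say 'Current environment:' address()
'echo Hello planet'                    /* sent to the current environment */
address system 'echo Hello planet'     /* the environment named explicitly */
say 'Return code of the last command:' rc
```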
Every ADDRESS environment has its own purpose and advantages. It is a good idea to use ADDRESS in front of each command: it makes clear to every reader what happens, and it lets you choose the best environment for the command.
The SYSTEM environment is the all-purpose solution for every command. The command is passed to the current command interpreter. It is generally the best option for most commands, but it has some disadvantages:
You don't have control over the different interpreters. You can get ugly errors in Windows NT, 2000, XP or in Unix variants if you don't know how the interpreter interprets your command.
You have some trouble passing special characters to the command. Have you ever tried to pass a ">" sign to a command? You won't get what you expect if you don't know how to quote it to bypass the interpreter.
You invoke a separate program just to invoke another program. It costs time and memory usage. Choosing another environment may lead to a quicker and safer execution.
Use SYSTEM if you want to use the pipelines and redirections of the interpreter or if you want to use a builtin command of the shell. "echo" is a builtin command in command interpreters. Also, a Unix pipeline of commands like "prog1 | prog2 | prog3" cannot be expressed more concisely in Regina.
PATH is the right ADDRESS environment if you know the called program's name but not where it is on disk. One example is "sort" on many systems.
Since Regina has ANSI's extremely useful ADDRESS WITH technique, you can very effectively sort queue contents or stem leaves by:
ADDRESS PATH 'sort' WITH INPUT STEM unsort. OUTPUT STEM sort.
You let Regina find the program 'sort' (or SORT.EXE if you use Windows) and get the fastest way to run it. You don't have to bother about the current command interpreter, because Regina acts as one. You can pass every character you want and Regina does its best to let it appear in the called program.
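A complete version of the sort example might look like the sketch below. The stem names and the data are invented for illustration, the input stem needs its element count in the .0 tail, and the order of the output naturally depends on the 'sort' program found in the path:

```rexx
/* Sort the leaves of a stem through an external 'sort' program */
unsort.1 = 'pear'
unsort.2 = 'apple'
unsort.3 = 'mango'
unsort.0 = 3                           /* number of input lines */

address path 'sort' with input stem unsort. output stem sorted.

do i = 1 to sorted.0                   /* sorted.0 holds the line count */
  say sorted.i
end
```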
CMD is a special variant of PATH. It acts like PATH, but you have to pass the fully qualified filename of the program to execute. You usually use this if you want to use a distinct version of a program, e.g. when several versions are installed and you want to choose one.
Another purpose is to call a program which isn't listed in the path and you don't want to change the path's environment variable by the builtin function VALUE.
The operating system and runtime system decide what program is looked for if you omit the path component; the current directory is used in most cases.
Use the REXX or REGINA environment if you want to execute a Rexx program in a separate instance of the interpreter. Whereas a normal CALL on an external program will run the external Rexx program in the current instance of Regina, this allows the external Rexx program to run in a new, independent instance of Regina.
Use it if:
The called interpreter is unstable and a crash in it must not affect the current execution. A common situation is an external program library that you bind with RxFuncAdd. Such a library can crash or terminate the interpreter; the calling interpreter won't be affected by this termination.
You want to take advantage of the powerful ADDRESS WITH redirection. The general mechanism to communicate with external scripts is a queue, but you don't have this in cases where you want to pass error messages in a different way or if you use a script which wasn't originally designed to use queues.
You want to reuse the current interpreter and take advantage of the second point. You may have different Regina interpreters installed and want to use just the current one, even if it isn't in your path. Regina tries to load the current interpreter a second time if you use this ADDRESS environment. You should get "rexx" if you use ADDRESS REXX, and "regina" if you use ADDRESS REGINA. Regina also attempts to load the same executable that the current instance was started from, but not every system passes enough information to Regina to find its own executable in all cases.
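As a rough sketch of the second point, the fragment below runs a script in a separate interpreter instance and collects its output and error messages in two stems. The script name child.rexx is purely hypothetical, and the fragment assumes that the command string is handed to the new interpreter as the name of the program to run:

```rexx
/* Run a hypothetical script in a separate interpreter instance */
address rexx 'child.rexx' with output stem out. error stem err.

say 'The child wrote' out.0 'line(s) of output'
do i = 1 to out.0
  say '  out:' out.i
end
do i = 1 to err.0
  say '  err:' err.i
end
```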
Redirection of a program's input and/or output is in general relatively predictable on most operating systems; however, mention must be made of behaviour specific to the Windows platform.
Windows and to a lesser degree OS/2, have techniques to hide windows, to start programs in separate windows and other cool features. Florian did some significant testing of this on all different Windows platforms and there is bad news. There is no consistent mechanism to start external programs without error and full control. Sounds strange, is strange. We have the options to:
use the interpreter (ADDRESS SYSTEM) or not (ADDRESS PATH or CMD)
start GUI or text mode programs
choose the interpreter (CMD.EXE or COMMAND.COM)
The main goal was to start GUIs separately and text mode programs under the control of the caller (GUI or text). Regina can be part of a GUI process and must be treated as a GUI in this case. Most people get upset with console windows popping up showing nothing.
Some combinations of the interpreter, the target programs, and the options we can pass along to the system lead to programs that fail to start, fail to stop, or crash. Or we may lose control through broken communication with the subprocess (ADDRESS WITH...).
So we had to choose either to let programs run safely or to let programs run pretty. Blame the guys who designed Windows, not the Regina crew!
So, if you have a DOS graphical extension known as Windows 95, Windows 98 or Windows Millennium, you will get console windows popping up if a text mode program is run from a GUI program. We are sorry for this; we can't change it.
Systems with a 32-bit startup kernel, known as Windows NT, Windows 2000 and Windows XP, will hide the console windows when starting a text mode program from a GUI program.
ATTENTION: Your programs might crash or you may lose control either of the called program or of Regina if you change the interpreter inside your Rexx program. Never use
CALL VALUE 'COMSPEC', something, 'SYSTEM'
in your program if you don't know the consequences! Unpredictable behaviour is likely to occur; use at your own peril!
Many language interpreters provide a mechanism where code executed within that interpreter is limited to affecting the environment of the interpreter and cannot change the external environment in which the interpreter runs.
Restricted mode is used in situations where you need to guarantee that the author of a Rexx program is unable to affect the user's environment.
Situations where a restricted mode is applicable include using Regina as a database procedural language or as a language plugin for a Web browser.
Features of Regina that are disabled in restricted mode are:
LINEOUT, CHAROUT, POPEN, RXFUNCADD BIFs
"OPEN WRITE", "OPEN BOTH" subcommands of STREAM BIF
The "built-in" environments, e.g. SYSTEM, CMD or PATH, of the ADDRESS instruction
Setting the value of a variable in the external environment with VALUE BIF.
Calling external functions
To run Regina in restricted mode, start the Regina interpreter from the command line with the '-r' switch or, when using the Rexx SAA API, OR RXRESTRICTED into the CallType parameter of the RexxStart() function.
Regina provides native language support in the following ways:
Error messages can be displayed in a user-selectable native language.
All native language error messages are contained in binary files (*.mtb) that are built with the Regina executables from source files (*.mts).
The mechanism Regina uses to determine what native language to use to display error messages depends on the operating system.
On EPOC32, the language is supplied when installing; the selected language is contained in default.mtb. On all other platforms, Regina uses environment variables if you want to use a language other than English.
The English language messages are built into the interpreter for two reasons:
to satisfy the ANSI requirement that error messages can be obtained in English using the ERRORTEXT BIF and specifying a value of 'S' for argument 2.
to serve as a fallback when no native language support is available
To specify a native language, up to 2 environment variables are used.
REGINA_LANG environment variable is set to a two-character ISO 639 language abbreviation as defined in the following table.
REGINA_LANG | Language | Translation By
de | German | Florian Grosse-Coosmann
es | Spanish | Pablo Garcia-Abia
no | Norwegian | Vidar Tysse
pt | Portuguese | Susana and Brian Carpenter, Josie Medeiros
(to get your name in this table, contact the maintainer with the language you wish to support)
If REGINA_LANG is not set, the default is en. The case of the value is irrelevant; EN is the same as en.
REGINA_LANG_DIR is required if Regina does not know where the language files will be at runtime.
Any binary distribution that includes an installation routine (RPM, Windows InstallShield or EPOC32) will set the location of the .mtb files automatically. Similarly, building and installing Regina on Unix-like platforms using the configure; make install combination will also set the location automatically. All other platforms will require this environment variable to be set manually.
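From inside a script, the effect of these settings can be inspected with a sketch like the one below. It reads REGINA_LANG through the VALUE built-in function and prints the text of one error message both in the selected language and in English. Error number 40 is picked arbitrarily, and an empty result from VALUE is assumed to mean that the variable is not set:

```rexx
/* Show the selected message language and one sample message */
lang = value('REGINA_LANG', , 'SYSTEM')
if lang = '' then
  lang = 'en (default)'
say 'Message language:' lang
say 'Native text  :' errortext(40)
say 'English text :' errortext(40, 'S')
```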
Table of Contents
Rexx Language Constructs
1 Definitions
2 Null clauses
3 Commands
3.1 Assignments
4 Instructions
4.1 The ADDRESS Instruction
4.2 The ARG Instruction
4.3 The CALL Instruction
4.4 The DO/END Instruction
4.5 The DROP Instruction
4.6 The EXIT Instruction
4.7 The IF/THEN/ELSE Instruction
4.8 The INTERPRET Instruction
4.9 The ITERATE Instruction
4.10 The LEAVE Instruction
4.11 The NOP Instruction
4.12 The NUMERIC Instruction
4.13 The OPTIONS Instruction
4.14 The PARSE Instruction
4.15 The PROCEDURE Instruction
4.16 The PULL Instruction
4.17 The PUSH Instruction
4.18 The QUEUE Instruction
4.19 The RETURN Instruction
4.20 The SELECT/WHEN/OTHERWISE Instruction
4.21 The SIGNAL Instruction
4.22 The TRACE Instruction
4.23 The UPPER Instruction
5 Operators
5.1 Arithmetic Operators
5.2 Assignment Operators
5.3 Comparative Operators
5.4 Concatenation Operators
5.5 Logical Operators
6 Implementation-Specific Information
6.1 Miscellaneous
6.2 Implementation of the ADDRESS environment
6.2.1 SYSTEM aka ENVIRONMENT aka OS2ENVIRONMENT
6.2.2 PATH
6.2.3 CMD aka COMMAND
6.2.4 REXX or REGINA
6.3 ADDRESS WITH on Windows
6.4 Regina Restricted Mode
6.5 Native Language Support
6.5.1 Error Messages
6.5.2 Implementation