Stream Input and Output
And the streams thereof shall be turned into pitch
Isaiah 34:9
For every one that asketh receiveth;
and he that seeketh findeth;
and to him that knocketh it shall be opened.
Matthew 7:8
This chapter treats the topic of input from and output to streams using the built-in functions. An overview of the other parts of the input/output (I/O) system is also given but not discussed in detail. At the end of the chapter there are sections containing implementation-specific information for this topic.
Stream I/O is a problem area for languages like REXX. They try to maintain compatibility across all platforms (i.e. to be non-system-specific), but the basic I/O capabilities differ between systems, so the simplest way to achieve compatibility is to include only a minimal, common subset of the functionality of all platforms. With respect to the functionality of the interface to their surrounding environment, non-system-specific script languages like REXX are inherently inferior to system-specific script languages, which are hardwired to particular operating systems and can benefit from all their features.
Although REXX formally has its own I/O constructs, on some platforms it is common to perform most or all I/O through operating system commands rather than in REXX itself. This is how it was originally done under VM/CMS, which was one of the earliest implementations and which did not support REXX's I/O constructs. There, the EXECIO program and the stack (among other methods) are used to transfer data to and from a REXX program.
Later, the built-in functions for stream I/O gained territory, but lots of implementations still rely on special purpose programs for doing I/O. The general recommendation to REXX programmers is to use the built-in functions instead of special purpose programs whenever possible; that is the only way to make compatible programs.
REXX regards a stream as a sequence of characters, conceptually equivalent to what a user might type at the keyboard. Note that a stream is not generally equivalent to a file. [MCGH:DICT] defines a file as "a collection of related records treated as a unit," while [OX:CDICT] defines it as "Information held on backing store [...] in order (a) to enable it to persist beyond the time of execution of a single job and/or (b) to overcome space limitations in main memory." A stream is defined by [OX:CDICT] as "a flow of data characterized by relative long duration and constant rate."
Thus, a file has a flavor of persistence, while a stream has a flavor of sequence and transience. For a stream, data read earlier may already be gone, and data not yet read may not yet exist; consider, for instance, input typed at a keyboard or the output of a program. Even though much of the REXX literature uses these two terms interchangeably (and, after all, there is some overlap), you should bear in mind that there is a difference between them.
In this documentation, the term "file" means "a collection of persistent data on secondary storage, to which random access and multiple retrieval are allowed." The term "stream" means a sequential flow of data from a file or from a sequential device like a terminal, tape, or the output of a program. The term stream is also used in its strict REXX meaning: a handle to/from which a flow of data can be written/read.
REXX I/O is very simple, and this short crash course is probably all you need on a first reading of this chapter. Note, however, that it jumps a bit ahead of the rest of the chapter.
To read a line from a stream, use the LINEIN() built-in function, which returns the data read. To write to a stream, use the LINEOUT() built-in function, and supply the data to be written as the second parameter. For both operations, give the name of the stream as the first parameter. Some small examples:
contents = linein( 'myfile.txt' )
call lineout 'yourfile.txt', 'Data to be written'
The first of these reads a line from the stream myfile.txt, while the second writes a line to the stream yourfile.txt. Both calls operate on lines, and they use a system-specific end-of-line marker as the delimiter between lines. The marker is appended to any data written out, and stripped from any data read.
Opening a stream in REXX is generally done automatically, so you can usually ignore it in your programs. Another useful operation is repositioning to a particular line:
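As a minimal sketch of these two functions working together, the following copies one stream to another line by line. It also uses LINES(), described later in this chapter, to detect the end of the input, and a LINEOUT() call without data, which many implementations treat as a close (also discussed later). The stream names copy_in.txt and copy_out.txt are hypothetical, and the sketch assumes neither file exists beforehand, since LINEOUT() appends to an existing file.

```rexx
/* Sketch: copy a stream line by line.                             */
/* The stream names are hypothetical; neither file should exist    */
/* beforehand, because LINEOUT() appends to an existing file.      */
src = 'copy_in.txt'
dst = 'copy_out.txt'

/* Create a small input stream so that the sketch is self-contained. */
call lineout src, 'alpha'
call lineout src, 'beta'
call lineout src                 /* close (or at least flush) the input */

count = 0
do while lines(src) > 0          /* LINES() is described later on */
   call lineout dst, linein(src) /* EOL stripped on read, added on write */
   count = count + 1
end
call lineout dst                 /* close (or at least flush) the output */
say count 'lines copied'
```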
call linein 'myfile.txt', 12, 0
call lineout 'yourfile.txt',, 13
The first of these sets the current read position to the start of line 12 of the stream; the second sets the current write position to the start of line 13. Note that the second parameter is empty, which means that no data is to be written. Also note that the current read and write positions are two independent entities; setting one does not affect the other.
The built-in functions CHARIN() and CHAROUT() are similar to the ones just described, except that they are character-oriented, i.e. the end-of-line delimiter is not treated as a special character.
Examples of use are:
say charin( 'myfile.txt', 10 )
call charout 'logfile', 'some data'
Here, the first example reads 10 characters, starting at the current read position, while the second writes the nine characters of "some data" to the stream, without an end-of-line marker afterwards.
It is possible to reposition character-wise too, some examples are:
call charin 'myfile', 8, 0
call charout 'foofile',, 10
These two clauses reposition the current read position of myfile to the 8th character and the current write position of foofile to the 10th character, respectively.
Unlike most programming languages, REXX does not use file handles; the name of the stream is also in general the handle (although some implementations add an extra level of indirection). You must supply the name to all I/O functions operating on a stream. However, internally, the REXX interpreter is likely to use the native file pointers of the operating system, in order to improve speed. The name specified can generally be the name of an operating system file, a device name, or a special stream name supported by your implementation.
The format of the stream name is very dependent upon your operating system. For portability, you should avoid specifying it as a literal string in each I/O call; instead, set a variable to the stream name, and use that variable when calling I/O functions. This reduces the number of places you need to make changes if you port the program to another system. Unfortunately, this approach increases the need for PROCEDURE EXPOSE, since the variable containing the file's name must be available to all routines using file I/O for that particular file, and to all their non-common ancestors.
Example: Specifying file names
The following code illustrates a portability problem related to the naming of streams. The variable filename is set to the name of the stream operated on in the function call.
filename = '/tmp/MyFile.Txt'
say ' first line is' linein( filename )
say 'second line is' linein( filename )
say ' third line is' linein( filename )
Suppose this script, which looks like it is written for Unix, is moved to a VMS machine. Then, the stream name might be something like SYS$TEMP:MYFILE.TXT, but you only need to change the script at one particular point: the assignment to the variable filename; as opposed to three places if the stream name is hard-coded in each of the three calls to LINEIN().
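The same idea carries over to subroutines. Below is a minimal sketch of the PROCEDURE EXPOSE pattern mentioned above: the stream name lives in one variable, and every routine that writes to the stream exposes that variable. The routine name logmsg and the file name app.log are hypothetical; porting the script would mean changing only the assignment to logfile.

```rexx
/* Sketch: one variable holds the stream name; routines expose it. */
/* The routine name LOGMSG and the file name are hypothetical.     */
logfile = 'app.log'
call logmsg 'starting'
call logmsg 'finished'
call lineout logfile             /* close (or at least flush) the log */
exit

logmsg: procedure expose logfile
   call lineout logfile, date() time() arg(1)
   return
```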
If the stream name is omitted from the built-in I/O functions, a default stream is used: input functions use the default input stream, while output functions use the default output stream. These are implicit references to the default input and output streams, but unfortunately, there is no standard way to explicitly refer to these two streams. And consequently, there is no standard way to refer to the default input or output stream in the built-in function STREAM().
However, most implementations allow you to access the default streams explicitly through a name, perhaps the null string or something like stdin and stdout. Refer to the implementation-specific documentation for details.
Also note that standard REXX does not support the concept of a default error stream. On operating systems supporting this, it can probably be accessed through a special name; see system-specific information. The same applies for other special streams.
The default input stream is sometimes also called the "standard input stream," "default input device," "standard input," or just "stdin."
The use of stream names instead of stream descriptors or handles is deeply rooted in the REXX philosophy: data structures are text strings carrying information, rather than opaque data blocks in an internal, binary format. This opens up some intriguing possibilities. Under some operating systems, a file can be referred to by many names. For instance, under Unix, a file can be referred to as foobar, ./foobar and ././foobar. All of these name the same file, although a REXX interpreter is likely to treat them as three different streams, because the names themselves differ. On the other hand, nothing prevents an interpreter from discovering that these names refer to the same stream and treating them as equivalent (apart from the extra processing time this costs). Under Unix, the problem is not confined to the use of ./ in file names; hard links and symbolic links can produce similar effects, too.
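Omitting the stream name is enough to use the default streams, so a simple filter needs no implementation-specific names at all. The following minimal sketch numbers each line read from the default input stream and writes it to the default output stream; note the empty first argument to LINEOUT(). You might run it with the default input redirected from a file, with something like rexx number.rexx < input.txt (the script name is hypothetical, and the command that starts the interpreter depends on your installation).

```rexx
/* Sketch: number the lines of the default input stream.          */
/* No stream names are given, so the default streams are used.    */
n = 0
do while lines() > 0             /* default input stream */
   n = n + 1
   call lineout , right(n, 4) linein()   /* default output stream */
end
```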
Example: Internal file handles
Suppose you start reading from a stream, which is connected to a file called foo. You read the first line of foo, then you issue a command, in order to rename foo to bar. Then, you try to read the next line from foo. The REXX program for doing this under Unix looks something like:
signal on notready
line1 = linein( 'foo' )
'mv foo bar'
line2 = linein( 'foo' )
In theory, the file foo no longer exists during the second call, so the second read should raise the NOTREADY condition. However, a REXX interpreter is likely to have opened the stream already, so it performs the read on the file descriptor of the open file. It is probably not going to check whether the file exists before each I/O operation (that would require a lot of extra checking). Under most operating systems, renaming a file does not invalidate existing file descriptors. Consequently, the interpreter is likely to continue reading from the original file, even though its name has changed from foo to bar.
Example: Unix temporary files
On some systems, you can delete a file, and still read from and write to the stream connected to that file. This technique is shown in the following Unix specific code:
tmpfile = '/tmp/myfile'
call lineout tmpfile, ''
call lineout tmpfile,, 1
'rm' tmpfile
call lineout tmpfile, 'This is the first line'
Under Unix, this technique is often used to create temporary files: the file is guaranteed to be deleted when it is closed, no matter how your program terminates. Unix deletes a file whenever there are no more references to it, and it is irrelevant whether a reference comes from the file system or from an open descriptor in a user process. After the rm command, the only reference to the file is held by the REXX interpreter, so when the interpreter closes the stream or terminates, the file is deleted.
Example: Files in different directories
Here is yet another example of how using the filename directly in the stream I/O functions may give strange effects. Suppose you are using a system that has hierarchical directories, and you have a function CHDIR() which sets a current directory; then consider the following code:
call chdir '../dir1'
call lineout 'foobar', 'written to foobar while in dir1'
call chdir '../dir2'
call lineout 'foobar', 'written to foobar while in dir2'
Since the file is implicitly opened while you are in the directory dir1, the name foobar refers to a file located there. After changing the directory to dir2, it may seem logical that the second call to LINEOUT() operates on a file in dir2, but that may not be the case: the interpreter may still be writing to the stream it opened in dir1. Considering that these clauses may be a great number of lines apart, that REXX has no standard way of closing files, and that REXX has only one file table (i.e. open files are not local to subroutines), this can cause considerable astonishment in complex REXX scripts.
Whether an implementation treats ././foo and ./foo as different streams is system-dependent; that applies to the effects of renaming or deleting the file while reading or writing, too. See your interpreter's system-specific documentation.
Most of the effects shown in the examples above are due to insufficient isolation between the filename of the operating system and the file handle in the REXX program. Whenever a file can be explicitly opened and bound to a file handle, you should do that in order to decrease the possibilities for strange side effects.
Interpreters that allow this method generally have an OPEN() function that takes the name of the file to open as a parameter, and returns a string that uniquely identifies that open file within the current context, e.g. an index into a table of open files. Later, this index can be used instead of the filename.
Some implementations allow only this indirect naming scheme, while others may allow a mix between direct and indirect naming. The latter is likely to create some problems, since some strings are likely to be both valid direct and indirect file ids.
REXX knows two different types of streams: persistent and transient. They differ conceptually in the way they can be operated, which is dictated by the way they are stored. But there is no difference in the data you can read from or write to them (i.e. both can be used for character- or line-wise data), and both are read and written using the same functions.
[Persistent streams]
(often referred to just as "files") are conceptually stored on permanent storage in the computer (e.g. a disk), as an array of characters. Random access to and repeated retrieval of any part of the stream are allowed for persistent streams. Typical examples of persistent streams are normal operating system files.
[Transient streams]
are typically not available for random access or repeated retrieval, either because they are not stored permanently, but generated on the fly and read as a sequence of data, or because they are available from sequential storage (e.g. magnetic tape) where random access is difficult or impossible. Typical examples of transient streams are devices like keyboards, printers, communication interfaces, pipelines, etc.
REXX does not allow any repositioning on transient streams; such operations are not conceptually meaningful; a transient stream must be treated sequentially. It is possible to treat a persistent stream as a transient stream, but not vice versa. Thus, some implementations may allow you to open a persistent stream as transient. This may be useful for files to which you have only append access, i.e. writes can only be performed at the end of file. Whether you can open a stream in a particular mode, or change the mode of a stream already open depends on your implementation.
Example: Determining stream type
Unfortunately, there is no standard way to determine whether a given file is persistent or transient. You may try to reposition in the file, and assume that the file is persistent if the repositioning succeeds, as in the following code:
streamtype: procedure
signal on notready
call linein arg(1), 1, 0
return 'persistent' /* unless file is empty */
notready:
return 'transient'
Although the idea in this code is correct, there are unfortunately a few problems. First, the NOTREADY condition can be raised by things other than trying to reposition a transient stream, e.g. by any repositioning of the current read position in an empty file, or if you have write access only. Second, your implementation may not have NOTREADY, or it may not use it in this situation.
The best method is to use a STREAM() function, if one is available. Unfortunately, that is not very compatible, since no standard stream commands are defined.
In most programming languages, opening a file is the process of binding a file (given by a file name) to an internal handle. REXX is a bit special, since conceptually, it does not use stream handles, just stream names. Therefore, the stream name is itself also the stream handle, and the process of opening streams becomes apparently redundant. However, note that a number of implementations allow explicit opening, and some even require it.
REXX may open streams "on demand" when they are used for the first time. However, this behavior is not defined in TRL, which says the act of opening the stream is not a part of REXX [TRL2]. This might be interpreted as open-on-demand or that some system-specific program must be executed to open a stream.
Although an open-on-demand feature is very practical, there are situations where you need to open streams in particular modes. Thus, most systems have facilities for explicitly opening a file. Some REXX interpreters may require you to perform some implementation-specific operation before accessing streams, but most are likely to just open them the first time they are referred to in an I/O operation.
There are two main approaches to explicit opening of streams. The first uses a non-standard built-in function normally called OPEN(), which generally takes the name of the file to open as the first parameter, and often the mode as the second parameter. The second approach is similar, but uses the standard built-in function STREAM() with a Command option.
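The second approach might look like the following minimal sketch. The command strings 'OPEN READ' and 'CLOSE', and the 'READY:' result on success, are what Regina and ooRexx use; they are not part of standard REXX, so treat them as assumptions and check your implementation's documentation. The stream name explicit.txt is hypothetical.

```rexx
/* Sketch: explicit open/close through STREAM() commands.          */
/* 'OPEN READ', 'CLOSE' and the 'READY:' result are implementation */
/* specific (Regina, ooRexx); the stream name is hypothetical.     */
file = 'explicit.txt'
call lineout file, 'hello'       /* create the stream implicitly */
call lineout file                /* close (or at least flush) it */

status = stream(file, 'C', 'OPEN READ')
if status = 'READY:' then do
   say linein(file)              /* reads the first line */
   call stream file, 'C', 'CLOSE'
end
else
   say 'could not open' file':' status
```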
Example: Not closing files
Since there are no open or close operations, a REXX interpreter never knows when to close a stream, unless explicitly told to. It can never predict when a particular stream will be used next, so it has to keep the current read and write positions in case the stream is used again. Therefore, you should always close streams when you have finished using them. Failing to do so fills the interpreter with data about unneeded streams and, more seriously, may fill the file table of your process or system. As a rule, any REXX script that uses more than a couple of streams should close every stream after use, in order to minimize the number of simultaneously open streams. Thus, the following code might eventually crash some REXX interpreters:
do i=1 to 300
call lineout 'file.'||i, 'this is file number' i
end
A REXX interpreter might try to defend itself against this sort of open-many-close-none programming using various techniques; this may lead to other strange effects. However, the main responsibility for avoiding it lies with you, the REXX script programmer.
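A minimal sketch of a more defensive version of the loop above closes each stream as soon as it has been written. It relies on the LINEOUT() convention described later in this chapter: a call with neither data nor position, which SAA defines as closing the stream where the environment supports it, and which TRL2 allows implementations to treat as a close or a flush.

```rexx
/* Sketch: write 300 files, closing each stream right after use.   */
do i = 1 to 300
   name = 'file.'i
   call lineout name, 'this is file number' i
   call lineout name             /* close (or at least flush) it now */
end
```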
Note that if a stream is already open for reading, and you start writing to it, your implementation may have to reopen it in order to open for both reading and writing. There are mainly two strategies for handling this. Either the old file is closed, and then reopened in the new mode, which may leave you with read and write access to another file. Or a new file handle is opened for the new mode, which may leave you with read and write access to two different files.
These are real-world problems which are not treated by the ideal description of TRL. A good implementation should detect these situations and raise NOTREADY.
As already mentioned, REXX does not have an explicit way of opening a stream. Nor does it have an explicit way of closing a stream. There is one semi-standard method: If you call LINEOUT(), but omit both the data to be written and the new current write position, then the implementation is defined to set the current write position to the end-of-file. Furthermore, it is allowed by TRL to do something "magic" in addition. It is not explicitly defined what this magic is, but TRL suggests that it may be closing the stream, flushing the stream, or committing changes done previously to the stream.
In SAA, the definition is strengthened to state that the "magic" is closing, provided that the environment supports that operation.
A similar operation can be performed by calling CHAROUT() with neither data nor a new position. However, in this case, both TRL and SAA leave it entirely up to the implementation whether or not the file is closed. One may wonder whether SAA's change to LINEOUT() with respect to TRL was meant to apply to CHAROUT() too, and was simply forgotten.
TRL2 does not indicate that LINEIN() or CHARIN() can be used to close a stream. Thus, the closest one gets to a standard way of closing input files is to call LINEOUT(), although it is conceptually suspect to call an output routine for an input file. The historical reason for this omission is perhaps that flushing output files is vital, while the concept of flushing is irrelevant for input files; flushing is an important part of closing a file, which explains why closing is only indicated for output files.
Thus, the statement:
call lineout 'myfile.txt'
might be used to close the stream myfile.txt in some implementations. However, it is not guaranteed to close the stream, so you cannot depend on this for scripts of maximum portability, but it's better than nothing. However, note that if it closes the stream, then also the current read position is affected. If it merely flushes the stream, then only the current write position is likely to be affected.
Basically, the built-in REXX library offers two strategies of reading and writing streams: line-wise and character-wise. When reading line-wise, the underlying storage method of the stream must contain information which describes where each line starts and ends.
Some file systems store this information as one or more special characters, while others structure the file as a number of records, each containing a single line. This introduces a subtle point: even though a stream foo returns the same data when read by LINEIN() on two different machines, the data read from foo by CHARIN() may differ between the same two machines, and vice versa. This is because the end-of-line markers can vary between the two operating systems.
Example: Character-wise handling of EOL
Suppose a text file contains the following three lines (ASCII character set is assumed):
first
second
third
and you first read it line-wise and then character-wise. Assume the following program:
file = 'DATAFILE'
foo = ''
do i=1 while chars(file)>0
foo = foo || c2x(charin(file))' '
end
say foo
When the file is read line-wise, the output is identical on all machines, i.e. the three lines shown above. However, the character-wise reading will be dependent on your operating system and its file system, thus, the output might e.g. be any of:
66 69 72 73 74 73 65 63 6F 6E 64 74 68 69 72 64
66 69 72 73 74 0A 73 65 63 6F 6E 64 0A 74 68 69 72 64 0A
66 69 72 73 74 0D 0A 73 65 63 6F 6E 64 0D 0A 74 68 69 72 64 0D 0A
If the machine uses records to store the lines, the first output may be the result; here, only the data in the lines of the file is returned. In the other two outputs, the 0A and 0D 0A bytes are generated by the end-of-line character sequences; all other bytes come from the actual line contents.
The second output is typical for Unix machines. They use the ASCII newline (line feed) character as the line separator, and that character is read immediately after each line. The last output is typical for MS-DOS, where the line separator is a carriage return followed by a line feed (ASCII '0D'x and '0A'x).
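Because character-wise reading exposes the end-of-line sequence, a program can make a rough guess at a file's convention. The following is a minimal sketch, assuming a non-empty persistent stream that has not been read yet (so that CHARS() returns its full size); the stream name eol_demo.txt is hypothetical. On a record-based file system no separator may appear at all, which is what the last branch reports.

```rexx
/* Sketch: guess the end-of-line convention from the raw characters. */
/* The stream name is hypothetical; the stream must be non-empty     */
/* and unread, so that CHARS() covers the whole stream.              */
file = 'eol_demo.txt'
call lineout file, 'first'       /* written with the native EOL */
call lineout file                /* close (or at least flush) */

data = charin(file, 1, chars(file))
select
   when pos('0D0A'x, data) > 0 then eol = 'CR LF (MS-DOS style)'
   when pos('0A'x, data) > 0   then eol = 'LF (Unix style)'
   otherwise                        eol = 'none found (record based?)'
end
say file 'uses' eol
```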
For maximum portability, the line-wise built-in functions (LINEIN(), LINEOUT() and LINES()) should only be used for line-wise streams. And the character-wise built-in functions (CHARIN(), CHAROUT() and CHARS()) should only be used for character-wise data. You should in general be very careful when mixing character- and line-wise data in a single stream; it does work, but may easily lead to portability problems.
The difference between character- and line-wise streams is roughly equivalent to the difference between binary and text streams, but the two concepts are not totally equivalent. In a binary file, the data read is the actual data stored in the file, while in a text file, the character sequences used for denoting end-of-line and end-of-file markers may be translated to actions or other characters during reading.
The end-of-file marker may be implemented differently on different systems. On some systems, this marker is only implicitly present at the end-of-file, which is calculated from the file size (e.g. Unix). Other systems may put a character signifying end-of-file at the end (or even in the middle) of the file (e.g. <Ctrl-Z> for MS-DOS). Since these concepts vary between operating systems, interpreters should handle each concept according to the customs of the operating system. Check the implementation-specific documentation for further information. In any case, if the interpreter treats a particular character as end-of-file, then it only gives special treatment to this character during line-wise operations. During character-wise operations, no characters have special meanings.
Four built-in functions provide line- and character-oriented stream reading and writing capabilities: CHARIN(), CHAROUT(), LINEIN(), LINEOUT().
[CHARIN()]
is a built-in function that takes up to three parameters, which are all optional: the name of the stream to read from, the start point, and the number of characters to read. The stream name defaults to the default input stream, the start point defaults to the current read position, the number of characters to read defaults to one character. Leave out the second parameter in order to avoid all repositioning. During execution, data is read from the stream specified, and returned as the return value.
[LINEIN()]
is a built-in function that also takes three parameters, which correspond to the parameters of CHARIN(). However, if the second parameter is specified, it refers to a line position rather than a character position, i.e. to the position of the first character of that line. Further, the third parameter can only be 0 or 1, and gives the number of lines to read; i.e. you cannot read more than one line in each call. The line read is returned by the function, or the null string if no reading was requested.
[LINEOUT()]
is a built-in function that also takes three parameters. The first is the name of the stream to write to, and defaults to the default output stream. The second parameter is the data to be written to the stream; if it is not specified, no writing occurs. The third parameter is a line-oriented position in the stream; if it is specified, the current write position is moved there before the data (if any) is written. If data is written, an end-of-line character sequence is appended to it in the output stream.
[CHAROUT()]
is a built-in function that is used to write characters to a file. It is identical to LINEOUT(), except that the third parameter refers to a character position, instead of a line position. The second difference is that an end-of-line character sequence is not appended at the end of the data written.
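Since CHARIN() and CHAROUT() give no special meaning to end-of-line characters, they can copy a stream byte for byte. The minimal sketch below copies in blocks of 4096 characters; the block size and the stream names are arbitrary, hypothetical choices. When fewer characters remain than requested, CHARIN() returns what is available (and raises NOTREADY, which is not trapped here).

```rexx
/* Sketch: byte-for-byte copy with CHARIN()/CHAROUT() in blocks.   */
/* Stream names and block size are hypothetical choices.           */
src = 'blk_in.txt'
dst = 'blk_out.bin'
call lineout src, 'some text'    /* create a small input stream */
call lineout src                 /* close (or at least flush) it */

do while chars(src) > 0
   call charout dst, charin(src, , 4096)   /* EOL characters copied as data */
end
call charout dst                 /* implementation may close or flush */
```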
Example: Counting lines, words, and characters
The following REXX program emulates the core functionality of the wc program under Unix. It counts the number of lines, words, and characters in a file given as the first argument.
file = arg(1)
parse value 0 0 0 with lines words chars
do while lines(file)>0
line = linein(file)
lines = lines + 1
words = words + words(line)
chars = chars + length(line)
end
say 'lines='lines 'words='words 'chars='chars
There are some problems. For instance, the end-of-line characters are not counted, and a last improperly terminated line is not counted either.
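One way to include the end-of-line characters is to take the character count from CHARS() before any reading starts, when it still covers the whole stream. The following minimal sketch does that; it assumes a persistent stream (for transient streams CHARS() need not return an exact count) and, for its self-test, that the hypothetical file wc_demo.txt does not exist beforehand. It also uses variable names that differ from the built-in function names, purely for readability.

```rexx
/* Sketch: wc-style counts, with end-of-line characters included.  */
/* The stream name is hypothetical and assumed not to exist yet.   */
file = 'wc_demo.txt'
call lineout file, 'one two'
call lineout file, 'three'
call lineout file                /* close (or at least flush) */

nchars = chars(file)             /* taken before reading: whole stream */
nlines = 0
nwords = 0
do while lines(file) > 0
   line = linein(file)
   nlines = nlines + 1
   nwords = nwords + words(line)
end
say 'lines='nlines 'words='nwords 'chars='nchars
```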
Standard REXX does not have any call that returns the current position in a stream. Instead, it provides two calls that return the amount of data remaining in a stream: the built-in functions LINES() and CHARS().
The LINES() built-in function returns the number of complete lines left on the stream given as its first parameter. The term "complete lines" does not really matter much, since an implementation can assume the end-of-file to implicitly mean an end-of-line.
The CHARS() built-in function returns the number of characters left in the stream given as its first parameter.
This is one of the concepts where REXX I/O does not map very well to C I/O and vice versa. While REXX reports the amount of data from the current read position to the end of the stream, C reports the amount of data from the start of the file to the current position. Further, the REXX method only works for input streams, while the C method works for both input and output files. On the other hand, C has no basic constructs for counting the remaining lines of a file or repositioning to a particular line.
Example: Retrieving current position
So, how does one find the current position in a file, when only allowed to do normal repositioning? The trick is to reposition twice, as shown in the code below.
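ftell: procedure
parse arg filename
now = chars(filename) /* characters left to read */
call charin filename, 1, 0 /* reposition to the start */
total = chars(filename) /* total size of the stream */
call charin filename, total-now+1, 0 /* back to the original position */
return total-now+1 /* the current read position */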
Unfortunately, there are many potential problems with this code. First, it only works for input files, since there is no equivalent to CHARS() for output files. Second, if the file is empty, none of the repositioning works, since it is illegal to reposition at or after the end-of-file for input files, and in an empty file the end-of-file is the first position. Third, if the current read position is at the end-of-file (e.g. all characters have been read), it will not work, for similar reasons. And fourth, it only works for persistent files, since transient files do not allow repositioning.
Example: Improved ftell function
An improved version of the code for the ftell routine (given above), which tries to handle these problems is:
ftell: procedure
parse arg filename
signal on notready name not_persist
now = chars(filename) /* characters left to read */
signal on notready name is_empty
call charin filename, 1, 0 /* reposition to the start */
total = chars(filename) /* total size of the stream */
if now>0 then
call charin filename, total-now+1, 0 /* back to the original position */
else if total>0 then
call charin filename, 1, total /* read everything to end at EOF again */
else
nop /* empty file, should have raised NOTREADY */
return total-now+1
not_persist: say filename 'is not persistent'; return 0
is_empty: say filename 'is empty'; return 0
The same method can be used for line-oriented I/O too, in order to return the current line number of an input file. However, a potential problem in that case is that the routine leaves the stream repositioned at the start of the current line, even if it was initially positioned to the middle of a line. In addition, the line-oriented version of this ftell routine may prove to be fairly inefficient, since the interpreter may have to scan the whole file twice for end-of-line character sequences.
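A minimal sketch of that line-oriented variant follows. It assumes a persistent stream whose read position is not at the end-of-file, and it calls LINES() with the 'C' option (from ANSI REXX, supported by Regina and ooRexx, but not part of TRL2) to get an exact count of the remaining lines, since some interpreters otherwise return only 0 or 1. The routine name linepos and the stream name are hypothetical.

```rexx
/* Sketch: line-oriented counterpart of ftell.                      */
/* LINEPOS and the stream name are hypothetical; LINES(name, 'C')   */
/* is an ANSI REXX option, not part of TRL2.                        */
file = 'lnum_demo.txt'
call lineout file, 'a'
call lineout file, 'b'
call lineout file, 'c'
call lineout file                /* close (or at least flush) */
call linein file                 /* consume line 1 */
say 'next line to read is' linepos(file)
exit

linepos: procedure
parse arg filename
now = lines(filename, 'C')       /* complete lines left to read */
call linein filename, 1, 0       /* move to the first line */
total = lines(filename, 'C')     /* total number of lines */
call linein filename, total-now+1, 0   /* back to the original line */
return total-now+1
```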
REXX supports two strategies for reading and writing streams, character-wise and line-wise; this section describes how a program can reposition the current positions for each of these strategies. Note that positioning is only allowed for persistent streams.
For each open file, there is a current read position or a current write position, depending on whether the file is opened for reading or writing. If the file is opened for reading and writing simultaneously, it has both a current read position and a current write position, and the two are independent and in general different. A position within a file is the sequence number of the byte or line that will be read or written in the next such operation.
Note that REXX starts numbering at one, not zero. Therefore, the first character and the first line of a stream are both numbered one. This differs from several other programming languages, which start numbering at zero.
Just after a stream has been opened, the initial value of the current read position is the first character in the stream, while the current write position is the end-of-file, i.e. the position just after the last character in the stream. Then, reading will return the first character (or line) in the stream, and writing will append a new character (or line) to the stream.
These initial values for the current read and write positions are the default values. Depending on your REXX implementation, other mechanisms for explicitly opening streams (e.g. through the STREAM() built-in function) may be provided, and may set other initial values for these positions. See the implementation-specific documentation for further information.
When setting the current read position, it must be set to the position of an existing character in the stream, i.e. a positive value not greater than the total number of characters in the stream. In particular, it is illegal to set the current read position to the position immediately after the last character in the stream, although this is legal in many other programming languages and operating systems, where it is known as "seeking to the end-of-file".
When setting the current write position, it too must be set to the position of an existing character in the stream. In addition, and unlike the current read position, the current write position may also be set to the position immediately following the last character in the stream. This is known as "positioning at the end-of-file", and it is the initial value for the current write position when a stream is opened. Note that you are not allowed to reposition the current write position any further beyond the end-of-file (which would create a "hole" in the stream), even though this is allowed in many other languages and operating systems.
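The two rules above amount to simple range checks. The following REXX sketch is only an illustration: SIZE and P are hypothetical variables standing for the number of characters in a persistent stream and a requested position, and no stream is actually opened.

```rexx
/* Sketch: which positions are legal in a persistent stream.      */
/* SIZE and P are hypothetical; no stream is opened here.         */
size = 10                              /* characters in the stream  */
p    = size + 1                        /* just after the last char  */
readok  = (p >= 1 & p <= size)         /* read: existing char only  */
writeok = (p >= 1 & p <= size + 1)     /* write: may also be at EOF */
size = 0                               /* now an empty stream       */
emptyread = (1 <= size)                /* position 1 is not readable */
say 'read at EOF:' readok '  write at EOF:' writeok '  read empty:' emptyread
```

The empty-stream case is the reason why, in an empty stream, no read positioning is possible at all.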
Depending on your operating system and REXX interpreter, repositioning to after the end-of-file may be allowed as an extension, although it is illegal according to TRL2. You should avoid this technique if you wish to write portable programs.
REXX only keeps one current read position and one current write position for each stream. So both line-wise and character-wise reading as well as positioning of the current read position will operate on the same current read position, and similarly for the current write position.
When repositioning line-wise, the current read or write position (whichever is being repositioned) is set to the first character of the line positioned at. However, if you position character-wise so that the current read position is in the middle of a line in the file, a subsequent call to LINEIN() will read from (and including) the current position until the next end-of-line marker. Thus, LINEIN() might under some circumstances return only the last part of a line. Similarly, if the current write position has been positioned in the middle of an existing line by character-wise positioning, and LINEOUT() is called, then the line written out becomes the last part of the line stored in the stream.
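The effect of character-wise positioning on a later LINEIN() can be imitated on an in-memory string. This sketch assumes a one-character end-of-line (LF, '0a'x) and only models the rule just described; it does not open a real stream, and CURPOS is a hypothetical stand-in for the interpreter's current read position.

```rexx
/* Sketch: what LINEIN() returns after CHARIN() has left the current */
/* read position in the middle of a line. The stream is modelled as  */
/* a string with LF end-of-lines; no real stream is involved.        */
nl     = '0a'x
data   = 'first line'nl'second line'nl
curpos = 4                          /* as after: call charin file, 4, 0 */
eol    = pos(nl, data, curpos)      /* next end-of-line at/after CURPOS */
line   = substr(data, curpos, eol - curpos)
curpos = eol + 1                    /* the end-of-line is consumed too  */
say '['line']'                      /* only the tail of the first line  */
```

The next simulated "line read" would then start at the beginning of the second line, just as a real LINEIN() would.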
Note that if you want to reposition the current write position using a line count, the stream may have to be open for read, too. This is because the interpreter may have to read the contents of the stream in order to find where the lines start and end. Depending on your operating system, this may even apply if you reposition using character count.
Example: Repositioning in empty files
Since the current read position must be at an existing character in the stream, it is impossible to reposition in or read from an empty stream. Consider the following code:
filename = '/tmp/testing'
call lineout filename,, 1 /* assuming truncation */
call linein filename, 1, 0
One might believe that this would set the current read and write positions to the start of the stream. However, assume that the LINEOUT() call truncates the file, so that it is zero bytes long. Then, the last call can never be legal, since there is no byte in the file at which it is possible to position the current read position. Therefore, a NOTREADY condition is probably raised.
Example: Relative repositioning
It is rather difficult to reposition a current read or write position relative to the current position. The only way to do this within the definition of the standard is to keep a counter which tells you the current position. That is, if you want to move the current read position five lines backwards, you must do it like this:
filename = '/tmp/data'
say linein(filename, 10); linenum = 11   /* LINENUM = next line to be read */
do while random(100)>3
   say linein(filename); linenum = linenum+1
end
call linein filename, linenum-5, 0; linenum = linenum-5   /* five lines back */
Here, the variable linenum is updated each time the current read position is altered. This may not seem too difficult, and in most cases it is not. However, it is nearly impossible to do this in the general case, since you must keep account of both line numbers and character numbers. Setting one may invalidate the other: consider the situation where you want to reposition the current read position to the 10th character before the 100th line in the stream. Except by mixing line-wise and character-wise I/O (which can have strange effects), this is nearly impossible. When repositioning character-wise, the line number count is invalidated, and vice versa.
The "only" proper way of handling this is to allow one or more (non-standard) STREAM() built-in function operations that return the current character and line count of the stream in the interpreter.
Example: Destroying linecount
This example shows how overwriting text in the middle of a file can destroy the line count. In the following code, we assume that the file foobar exists and contains ten lines, which are "first line", "second line", etc. up to "tenth line". We also assume that the end-of-line marker is a single character, and that LINEOUT() does not truncate the stream. Then consider the following code:
filename = 'foobar'
say linein(filename, 5) /* says 'fifth line' */
say linein(filename) /* says 'sixth line' */
say linein(filename) /* says 'seventh line' */
call lineout filename, 'This is a very long line', 5
say linein(filename, 5) /* says 'This is a very long line' */
say linein(filename) /* says 'enth line' */
say linein(filename) /* says 'eighth line' */
As you can see from the output of this example, the call to LINEOUT() writes a long line that overwrites the fifth and sixth lines completely, and the seventh line partially. Afterwards, the sixth line is the remaining part of the old seventh line, the new seventh line is the old eighth line, and so on.
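To see where the broken line boundaries come from, the example can be replayed on an in-memory copy of foobar using OVERLAY(). This is a sketch that makes the assumptions explicit: a one-character end-of-line (LF) and no truncation by LINEOUT() (Regina, for instance, truncates by default while its LineOutTrunc extension is on). The file itself is never touched.

```rexx
/* Sketch: replay the overwrite on a string copy of foobar.          */
/* Assumes LF end-of-lines and no truncation; no file is touched.    */
nl = '0a'x
names = 'first second third fourth fifth sixth seventh eighth ninth tenth'
data = ''
do i = 1 to words(names)
   data = data || word(names, i) 'line' || nl
end
start = 1                        /* find where line 5 starts:        */
do 4                             /* skip four end-of-line characters */
   start = pos(nl, data, start) + 1
end
data = overlay('This is a very long line'nl, data, start)
n = 0                            /* split the result back into lines */
do while data \== ''
   parse var data piece (nl) data
   n = n + 1
   line.n = piece
end
say line.5; say line.6; say line.7
```

The 25 characters written (24 plus the end-of-line) cover the 22 characters of the fifth and sixth lines and the first three characters of the seventh, which is why the file now holds nine lines.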
TRL2 contains two important improvements over TRL1 in the area of handling errors in stream I/O: the NOTREADY condition and the STREAM() built-in function. The NOTREADY condition is raised whenever a stream I/O operation did not succeed. The STREAM() function is used to retrieve status information about a particular stream or to execute a particular operation for a stream.
You can discover that an error occurred during an I/O operation in one of the following ways: a) it may trigger a SYNTAX condition; b) it may trigger a NOTREADY condition; or c) it may just not return the data it was supposed to. There is no clear border between the situations that should trigger SYNTAX and those that should trigger NOTREADY. Errors in parameters to the I/O functions, like a negative start position, clearly call for a SYNTAX condition, while reading off the end-of-file equally clearly calls for a NOTREADY condition. In between lie more uncertain situations, like trying to position the current write position after the end-of-file, trying to read a non-existent file, or using an illegal file name.
Some situations are likely to be differently handled in various implementations, but you can assume that they are handled as either SYNTAX or NOTREADY. Defensive, portable programming requires you to check for both. Unfortunately, NOTREADY is not allowed in TRL1, so you have to avoid that condition if you want maximum compatibility. And due to the very lax restrictions on implementations, you should always perform very strict verification on all data returned from any file I/O built-in function.
If neither is trapped, SYNTAX will terminate the program while NOTREADY will be ignored, so the implementor's decision about which of these to use may even depend on the severity of the problem (i.e. if the problem is small, raising SYNTAX may be a little too strict). Personally, I think SYNTAX should be raised in this context only if the value of a parameter is outside its valid range for all contexts in which the function might be called.
Example: General NOTREADY condition handler
Under TRL2 the "correct" way to handle NOTREADY conditions and errors from I/O operations is unfortunately very complex. It is shown in this example, in order to demonstrate the procedure:
myfile = 'MYFILE.DAT'
signal on syntax name syn_handler
call on notready name IO_handler
do i=1 to 10 until res\=0
   res = lineout(myfile, 'line #'i)   /* 0 means the line was written */
   if (res\=0) then
      say 'Call to LINEOUT() didn''t manage to write out data'
end
exit
IO_handler:
syn_handler:
   file = condition('D')
   say condition('C') 'raised for file' file 'at line' sigl':'
   say '   ' sourceline(sigl)
   say '   State='stream(file,'S') 'reason:' stream(file,'D')
   call lineout condition('D')        /* try to close */
   if condition('C')=='SYNTAX' then
      exit 1
   else
      return
Note the double checking in this example: first the condition handler is set up to trap any NOTREADY conditions, and then the return code from LINEOUT() is checked for each call.
As you can see, there is not really that much information that you can retrieve about what went wrong. Some systems may have additional sources from which you can get information, e.g. special commands for the STREAM() built-in function, but these are non-standard and should be avoided when writing compatible programs.
This section describes some of the common traps and pitfalls of REXX I/O.
TRL is rather relaxed in specifying which parts of the I/O system an interpreter must implement. It recognizes that operating systems differ, and that some details must be left to the implementor if REXX is to be implemented effectively. The parts of the REXX I/O subsystem where implementations are allowed to differ are:
The functions LINES() and CHARS() are not required to return the number of lines or characters left in a stream. TRL says that if it is impossible or difficult to calculate the numbers, these functions may return 1 unless it is absolutely certain that there are no more data left. This leads to some rather kludgy programming techniques.
Implementations are allowed to ignore closing streams, since TRL does not specify a way to do this. Often, the closing of streams is implemented as a command, which only makes it more incompatible.
Check the implementation-specific documentation before using the function LINEOUT(file) for closing files.
Differences in how implementations close and flush files can make a REXX script that works under one implementation crash under another, so this feature is of very limited value if you are trying to write portable programs.
TRL says that because the operating system environments will differ a lot, and an efficient and useful interpreter is the most important goal, implementations are allowed to deviate from the standard in any respect necessary in the domain of I/O [TRL2]. Thus, you should never assume anything about the I/O system, as the "rules" listed in TRL are only advisory.
The section above lists some areas where the standard allows implementations to differ. In an ideal world, those would be the only traps you need to look out for, but unfortunately, the world is not ideal. There are several areas where the requirements set by the standard are quite demanding, and where implementations are likely to deviate from the standard.
These areas are:
Repositioning the current read position at the end-of-file, or either position beyond it, may be allowed. On some systems, prohibiting this would require a lot of checking, so some implementations will probably skip that check. At least for some operating systems, repositioning after the end-of-file is a useful feature.
Under Unix, it can be used to create a dynamically sized random-access file: do not bother about how much space is allocated for the file, just position to the correct "slot" and write the data there. If the data file is sparse, holes might occur in the file; that is, parts of the file which have not been written, which read as all zeros, and which are therefore not stored on disk.
Some implementations use the same position for both the current read position and the current write position. To work around such implementations, whenever you do a read and the previous operation was a write (or vice versa), it may prove useful to reposition the current read (or write) position explicitly.
There might be a maximum linesize for your REXX interpreter. At least the 50Kb limit on string length may apply.
Handling the situation where another program writes data to a file which is used by the REXX interpreter for reading.
Because of the large differences between various operating systems, REXX allows some fuzz in the implementation of the LINES() and CHARS() built-in functions. Sometimes, it is difficult to calculate the number of lines or characters in a stream, generally because the storage format of the file requires a linear search through the whole stream to determine that number. Thus, REXX allows an implementation to return the value 1 for any situation where the real number is difficult or impossible to determine. Effectively, an implementation can restrict the return values of these two functions to just 0 and 1.
Many operating systems store lines using a special end-of-line character sequence. For these systems, it is very time-consuming to count the number of lines in a file, as the file must be scanned for such character sequences. Thus, it is very tempting for an implementor to return the value 1 for any situation where there are more than zero lines left.
A similar situation arises for the number of characters left, although this number is more commonly known, so there is generally a better chance that CHARS() returns the true number of characters left than that LINES() returns the true number of lines left.
However, you can be fairly sure that if an implementation returns a number greater than 1, then that number is the real number of lines (or characters) left in the stream. Likewise, if the number returned is 0, then there are no lines (or characters) left to be read in the stream. But if the number is 1, then you will never know until you have tried.
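The three cases can be written as a small SELECT. In this sketch the sample values in SAMPLES stand in for results returned by LINES() (or CHARS()); no stream is read, and the wording in MEANING. is merely illustrative.

```rexx
/* Sketch: interpreting a LINES() (or CHARS()) result.              */
/* SAMPLES holds stand-in results; no stream is actually queried.   */
samples = '0 1 5'
do i = 1 to words(samples)
   n = word(samples, i)
   select
      when n = 0 then meaning.i = 'no more lines'
      when n = 1 then meaning.i = 'maybe more; only reading will tell'
      otherwise meaning.i = 'exactly' n 'lines left'
   end
   say n '->' meaning.i
end
```

Only the value 1 is ambiguous, which is why a reading loop should still be prepared for NOTREADY even after LINES() returned a non-zero value.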
Example: File reading idiom
This example shows a common idiom for reading all contents of a file into REXX variables using the LINES() and LINEIN() built-in functions.
i = 1
signal on notready
lleft = lines(file)
do while lleft>0
do i=i to i+lleft-1   /* read exactly LLEFT lines */
lines.i = linein(file)
end
lleft = lines(file)
end
notready:
lines.0 = i-1
Here, the two nested loops iterate over all the data to be read. The inner loop reads all data currently known to be available, while the outer loop checks for more available data. Implementations whose LINES() returns only 0 and 1 will generally iterate the outer loop many times, while implementations that return the "true" number from LINES() generally iterate the outer loop only once.
There is only one place in this code that LINEIN() is called. The I variable is incremented at only one place, and the variable LINES.0 is set in one clause, too. Some redundancy can be removed by setting the WHILE expression to:
do while word(value('lleft',lines(file)) lleft,2)>0
The two assignments to the LLEFT variable must then be removed. This may look more complicated, but it reduces the number of clauses calling LINES() from two to one. However, it is less certain that this second solution is more efficient, since calling the VALUE() built-in function can be slower than a "normal" variable reference.
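The WHILE expression above relies on VALUE() returning the old value of a variable while assigning the new one, and on the terms of the expression being evaluated from left to right. A minimal sketch of that behavior, with arbitrary values chosen for illustration:

```rexx
/* Sketch: VALUE(name, newvalue) returns the OLD value and then     */
/* assigns the new one, so the later reference to LLEFT in the same */
/* expression already sees the new value.                           */
lleft = 7
both  = value('lleft', 3) lleft    /* old value, blank, new value */
say both
say word(both, 2)                  /* the new value, as in the loop */
```

This is exactly why word 2 of the concatenation in the WHILE clause is the fresh result from LINES().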
How to handle the last line in a stream is sometimes a problem. If you use a system that stores end-of-lines as special character sequences, and the last part of the data of a stream is an unterminated line, then what is returned when you try to read that part of data?
There are three possible solutions. First, the end-of-file itself may be interpreted as an implicit end-of-line, in which case the partial line is returned as if it were properly terminated. Second, the NOTREADY condition may be raised, since the end-of-file was encountered during reading. Third, if there is any chance of additional data being appended, the interpreter may wait until such data are available. The second and third approaches are suitable for persistent and transient files, respectively.
The first approach is sometimes encountered. It has some problems though. If the end of a stream contains the data ABC<NL>XYZ, then it might return the string XYZ as the last line of the stream. However, suppose the last line was an empty line, then the last part of the stream would be: ABC<NL>. Few would argue that there is any line in this stream after the line ABC. Thus, the decision whether the end-of-file is an implicit end-of-line depends on whether the would-be last line has zero length or not.
A pragmatic solution is to treat the end-of-file as an implicit end-of-line only if the characters immediately in front of it are not an explicit end-of-line character sequence.
However, TRL gives some indications that an end-of-file is not an implicit end-of-line. It says that LINES() returns the number of complete lines left, and that LINEIN() returns a complete line. On the other hand, the end-of-line sequence is not rigidly defined by TRL, so an implementor is almost free to define end-of-line in just about any terms that are comfortable. Thus, the last line of a stream may be a source of problems if it is not explicitly terminated by an end-of-line.
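The pragmatic rule can be sketched by splitting stream contents, modelled here as strings with LF ('0a'x) end-of-lines, with PARSE: text after the last end-of-line counts as a line only if it is not empty. This is an illustration on strings, not how any particular interpreter implements LINEIN(); CASE. and COUNT. are hypothetical names.

```rexx
/* Sketch: EOF acts as an implicit end-of-line only when the data   */
/* does not already end with an explicit end-of-line (LF here).     */
nl = '0a'x
case.1 = 'ABC'nl'XYZ'          /* last line not terminated */
case.2 = 'ABC'nl               /* properly terminated      */
do c = 1 to 2
   data = case.c
   count.c = 0
   do while data \== ''        /* stop when nothing is left after EOL */
      parse var data piece (nl) data
      count.c = count.c + 1
   end
   say 'case' c':' count.c 'line(s)'
end
```

The unterminated case yields the two lines ABC and XYZ, while the terminated case yields only ABC, matching the argument in the text.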
This section lists some of the other parts of REXX and the environments around REXX that may be considered a part of the I/O system.
[Stack.]
The stack can be used to communicate with external environments. On the REXX side, the interface to the stack is the instructions PUSH, PULL, PARSE PULL, and QUEUE, and the built-in function QUEUED(). These can be used to communicate with external programs by storing data to be transferred on the stack.
[The STREAM() built-in function.]
This function is used to control various aspects of the files manipulated with the other standard I/O functions. The standard says very little about this function, and leaves it up to the implementor to specify the rest. Operations like opening, closing, truncating, and changing modes are typically provided through this function, in implementation-specific ways.
[The SAY instruction.]
The SAY instruction can be used to write data to the default output stream. If you use redirection, you can indirectly use it to write data to a file.
[The ADDRESS instruction.]
The ADDRESS instruction and commands can be used to operate on files, depending on the power of your host environments and operating system.
[The VALUE() built-in function.]
The function VALUE(), when used with three parameters, can be used to communicate with external host environments and the operating system. However, this depends on the implementation of your interpreter.
[SAA API.]
The SAA API provides several operations that can be used to communicate between processes. In general, SAA API allows you to perform the operations listed above from a binary program written in a language other than REXX.
And of course, I/O is performed whenever a REXX program or external function is started.
This section describes some implementations of stream I/O in REXX. Unfortunately, this has become a very large section, reflecting the fact that stream I/O is an area of many system-specific solutions.
In addition, the variations within this topic are rather large. Regina implements a set of functions that is very close to that of TRL2. At the other extreme are ARexx and BRexx, whose functions closely resemble the standard I/O library of the C programming language.
Regina implements stream I/O in a fashion that closely resembles how it is described in TRL2. The following list gives the relevant system-specific information.
[Names for standard streams.]
Regina uses <stdout> and <stdin> as names for the standard output and input streams. Note that the angle brackets are part of the names. You may also access the standard error stream (on systems supporting this stream) under the name <stderr>. In addition, the null string is taken to be equivalent to an empty first parameter in the I/O-related built-in functions.
[Implicit opening.]
Regina implicitly opens any file whenever it is first used.
If the first operation is a read, the file is opened in read-only mode. If the first operation is a write, it is opened in read-write mode; if the read-write opening does not succeed, the file is opened in write-only mode instead. If the file exists, the opening is non-destructive, i.e. the file is not truncated or overwritten when opened; otherwise, it is created if opened in read-write mode.
If you name a file currently open in read-only mode in a write operation, Regina closes the file, and reopens it in read-write mode. The only exception is when you call LINEOUT() with both second and third arguments unspecified, which always closes a file, both for reading and writing. Similarly, if the file was opened in write-only mode, and you use it in a read operation, Regina closes and reopens in read-write mode.
This implicit reopening is enabled by default. You can turn it off by unsetting the extension ExplicitOpen.
[Separate current positions.]
The environment in which Regina operates (ANSI C and POSIX) does not allow separate read and write positions, but only supplies one position for both operations. Regina handles this by maintaining the two positions internally, and moving the "real" current position back and forth depending on whether a read or a write operation is next.
[Swapping out file descriptors.]
In order to defend itself against "open-many-close-none" programming, Regina tries to "swap out" files that have been unused for some time. Assume that your operating system limits Regina to 100 simultaneously open files; when you try to open your 101st file, Regina closes the least recently used stream, and recycles its descriptor for the new file. You can enable or disable this recycling with the SwapFilePtr extension.
During this recycling, Regina only closes the file in the operating system, but retains all vital information about the file itself. If you re-access the file later, Regina reopens it, and positions the current read and write positions at the correct (i.e. previous) positions. This introduces some uncertainties into stream processing. Renaming a file affects it only if it gets swapped out. Since the swap operation is something the users do not see, it can cause some strange effects.
Regina will not allow a transient stream to be swapped out, since such streams are often connected to some sort of active partner at the other end, and closing the file might kill the partner or make it impossible to re-establish the stream. So only persistent files are swapped out, which means that you can still fill the file table in Regina by opening many transient streams.
[Explicit opening and closing.]
Regina allows streams to be explicitly opened or closed through the use of the built-in function STREAM(). The exact syntax of this function is described in the description of the STREAM() built-in function. Old versions of Regina supported two non-standard built-in functions OPEN() and CLOSE() for these operations. These functions are still supported for compatibility reasons, but might be removed in future releases. Their availability is controlled by the OpenBif and CloseBif extensions.
[Truncation after writing lines.]
If you reposition the current write position line-wise to the middle of a file, Regina truncates the file at the new position. This happens whether or not data is written during the LINEOUT() call. Without this truncation, the file might contain half a line, some lines might disappear, and the line count would in general be disrupted. This behavior is controlled by the LineOutTrunc extension, which is turned on by default.
Unfortunately, the operation of truncating a file is not part of POSIX, and it might not exist on all systems, so on some rare systems, this truncating will not occur. In order to be able to truncate a file, your machine must have the ftruncate() system call in C. If you don't have this, the truncating functionality is not available.
[Caching info on lines left.]
When Regina executes the built-in function LINES() for a persistent stream, it caches the number of lines left as an attribute to the stream. In subsequent calls to LINEIN(), this number is updated, so that subsequent calls to LINES() can retrieve the cached number instead of having to re-scan the rest of the stream, provided that the number is still valid. Some operations will invalidate the count: repositioning the current read position; reading using the character oriented I/O, i.e. CHARIN(); and any write operation by the same interpreter on the stream. Ideally, any write operation should invalidate the count, but that might require a large overhead before any operation, in order to check whether the file has been written to by other programs.
This functionality can be controlled by the extension called CacheLineNo, which is turned on by default. Note that if you turn that off, you can experience a serious decrease in performance.
The following extra built-in functions relating to stream I/O are defined in Regina. They are provided for extra support and compatibility with other systems. Their support may be discontinued in later versions, and they are likely to be moved to a library of extra support.
CLOSE(streamid)
Closes the stream named by streamid. This stream must have been opened earlier, either implicitly or by a call to the OPEN function. The function returns 1 if there was any file to close, and 0 if the file was not open. Note that the return value does not indicate whether the closing was successful. You can use the extension named CloseBif with the OPTIONS instruction to select or remove this function. This function is now obsolete; instead, you should use:
STREAM( streamid, 'Command', 'CLOSE' )
CLOSE(myfile)         | 1 | if stream was open
CLOSE('NOSUCHFILE')   | 0 | if stream didn't exist
OPEN(streamid,access)
Opens the stream named streamid with the access access. If access is not specified, the access R will be used. access may be one of the following characters. Only the first character of the access is needed.
[R]
(Read) Open for read access. The file pointer will be positioned at the start of the file, and only read operations are allowed.
[W]
(Write) Open for write access and position the current write position at the end of the file. An error is returned if it was not possible to get appropriate access.
The return value from this function is either 1 or 0, depending on whether the named stream is in opened state after the operation has been performed.
Note that if you open the files "foobar" and "./foobar" they will point to the same physical file, but Regina interprets them as two different streams, and will open an internal file descriptor for each one. If you try to open an already open stream, using the same name, it will have no effect.
You can use the extension OpenBif with the OPTIONS instruction to control the availability of this function. This function is now obsolete, but is still kept for compatibility with other interpreters and older versions of Regina. Instead, with Regina you should use:
STREAM( streamid, 'C', 'READ'|'WRITE'|'APPEND'|'UPDATE' )
OPEN(myfile, 'write')   | 1 | maybe, if successful
OPEN(passwd, 'Write')   | 0 | maybe, if no write access
OPEN('DATA', 'READ')    | 1 | maybe, if successful
This section lists the functionality not yet in Regina, but which is intended to be added later. Most of these are fixes to problems, compatibility modes, etc.
[Indirect naming of streams.]
Currently, streams are named directly, which is convenient. However, there are a few problems: for instance, it is difficult to write to a file whose name is <stdout>, simply because that is a reserved name. To fix this, an indirect naming scheme will be provided through the STREAM() built-in function. The functionality will resemble the OPEN() built-in function of ARexx.
[Consistency in file handle swapping.]
When a file handle is swapped out in order to avoid filling the system file table, very little consistency checking is currently performed. At least, vital information about the file should be retained, such as the inode and file system on Unix machines, as retrieved by the fstat() call. When the file is swapped in again, this information must be checked against the file that is reopened. If there is a mismatch, NOTREADY should be raised. Similarly, when a file is reopened because a new access mode is requested, the same checking should be performed.
[Files with holes.]
Regina will be changed to allow it to generate files with holes on systems where this is relevant. Although standard REXX does not allow this, it is a very common programming idiom on certain systems, and should be allowed. It will, however, be controllable through an extension called SparseFiles.
ARexx differs considerably from standard REXX with respect to stream I/O. In fact, none of the standard stream functionality of REXX is available in ARexx. Instead, a completely distinct set of functions is used. The differences are so big that it is useless to describe ARexx stream I/O in terms of standard REXX stream I/O, and everything said so far in this chapter is irrelevant for ARexx. Therefore, we explain the ARexx functionality from scratch.
All in all, the ARexx file I/O interface resembles the functions of the Standard C I/O library, probably because ARexx is written in C, and the ARexx I/O functions are "just" interfaces to the underlying C functions. You may want to consult the documentation for the ANSI C I/O library as described in [ANSIC], [KR], and [PJPlauger].
ARexx uses a two level naming scheme for streams. The file names are bound to a stream name using the OPEN() built-in function. In all other I/O functions, only the stream name is used.
OPEN(name,filename[,mode])
You use the OPEN() built-in function to open a stream connected to a file called filename in AmigaDOS. In subsequent I/O calls, you refer to the stream as name. These two names can be different.
The name parameter cannot already be in use by another stream. If so, the OPEN() function fails. Note that the name parameter is case-sensitive. The filename parameter is not strictly case-sensitive: the case used when creating a new file is preserved, but when referring to an existing file, the name is case-insensitive. This is the usual behavior of AmigaDOS.
If any of the other I/O operations uses a stream name that has not been properly opened using OPEN(), that operation fails, because ARexx has no auto-open-on-demand feature.
The optional parameter mode can be any of Read, Write, or Append. The mode Read opens an existing file and sets the current position to the start of the file. The mode Append is identical to Read, but sets the current position to the end-of-file. The mode Write creates a new file, i.e. if a file with that name already exists, it is deleted and a new file is created. Thus, with Write you always start with an empty file. Note that the terms "read," "write," and "append" are only remotely connected to the mode in which the file is opened. Both reading and writing are allowed for all of these three modes; the mode names only reflect the typical operations of these modes.
The result from OPEN() is a boolean value, which is 1 if a file by the specified name was successfully opened during the OPEN() call, and 0 otherwise.
The number of simultaneously open files is not a problem, because AmigaDOS allocates file handles dynamically, so the number is limited only by the available memory. One system managed 2000 simultaneously open files during a test.
OPEN('infile', 'work:DataFile') | 1 | if successful
OPEN('work', 'RAM:FooBar', 'Read') | 0 | if the file didn't exist
OPEN('output', 'TmpFile', 'W') | 1 | (re)creates the file
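As a minimal sketch of the open-and-check idiom, the ARexx fragment below assumes an AmigaDOS system where the RAM: device exists; the file name RAM:demo.txt and the stream names out and log are arbitrary examples, not names ARexx requires. It creates the file in Write mode, then reopens it in Append mode so that the second line lands after the first.

```rexx
/* ARexx: create a file, then add to it in Append mode.        */
/* RAM:demo.txt and the stream names are arbitrary examples.   */
if ~open('out', 'RAM:demo.txt', 'Write') then do
   say 'Cannot create RAM:demo.txt'
   exit 10
end
call writeln 'out', 'first line'
call close 'out'

/* Append: same file, current position set to the end-of-file */
if ~open('log', 'RAM:demo.txt', 'Append') then do
   say 'Cannot reopen RAM:demo.txt'
   exit 10
end
call writeln 'log', 'second line'
call close 'log'
```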
CLOSE(name)
You use the CLOSE() built-in function to close a stream. The parameter name must match the first parameter in a call to OPEN() earlier in the same program, and must refer to an open stream. The return value is a boolean value that reflects whether there was a file to close (but not whether it was successfully closed).
CLOSE('infile') | 1 | if the stream was previously open
CLOSE('outfile') | 0 | if the stream wasn't previously open
WRITELN(name,string)
The WRITELN() function writes the contents of string as a line to the stream name. The name parameter must match the value of the first parameter in an earlier call to OPEN(), and must refer to an open stream. The data written is all the characters in string immediately followed by the newline character (ASCII <Ctrl-J> for AmigaDOS).
The return value is the number of characters written, including the terminating newline. Thus, a return value of 0 indicates that nothing was written, while a value which is one more than the number of characters in string indicates that all data was successfully written to the stream.
When a line is written in the middle of a stream, the old contents are overwritten, but the stream is not truncated; there is no way to truncate a stream with the ARexx built-in functions. This overwriting can leave partial lines in the stream.
WRITELN('tmp', 'Hello, world!') | 14 | if successful
WRITELN('work', 'Hi there') | 0 | nothing was written
WRITELN('tmp', 'Hi there') | 5 | partially successful
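Since WRITELN() reports success only through its return value, a careful program compares that value with LENGTH(string) + 1. A sketch under the same assumptions as before (AmigaDOS RAM: device; the file name, stream name, and stem lines. are arbitrary examples):

```rexx
/* ARexx: write several lines and verify each WRITELN() result */
call open 'out', 'RAM:lines.txt', 'Write'
lines.1 = 'alpha'; lines.2 = 'beta'; lines.3 = 'gamma'
failed = 0
do i = 1 to 3
   /* success: all characters plus the terminating newline */
   if writeln('out', lines.i) ~= length(lines.i) + 1 then do
      say 'Short write on line' i
      failed = failed + 1
   end
end
call close 'out'
```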
WRITECH(name,string)
The WRITECH() function is identical to WRITELN(), except that the terminating newline character is not added to the data written out. Thus, WRITELN() is suitable for line-wise output, while WRITECH() is useful for character-wise output.
WRITECH('tmp', 'Hello, world!') | 13 | if successful
WRITECH('work', 'Hi there') | 0 | nothing was written
WRITECH('tmp', 'Hi there') | 5 | partially successful
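Because WRITECH() adds no newline, a line can be assembled from pieces and terminated afterwards; writing an empty string with WRITELN() outputs just the newline, so it should return 1. A sketch, again with an arbitrary file name:

```rexx
/* ARexx: build one line from several pieces */
call open 'out', 'RAM:row.txt', 'Write'
total = 0
do i = 1 to 5
   total = total + writech('out', i || ' ')  /* '1 ', '2 ', ... no newline */
end
total = total + writeln('out', '')           /* terminate the line: 1 char */
call close 'out'
/* the file holds '1 2 3 4 5 ' and a newline; total is 11 */
```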
READLN(name)
The READLN() function reads a line of data from the stream referred to by name. The parameter name must match the first parameter of an earlier call to OPEN(), i.e. it must be an open stream.
The return value is a string of characters which corresponds to the characters in the stream from and including the current position forward to the first subsequent newline character found. If no newline character is found, the end-of-file is implicitly interpreted as a newline and the end-of-file state is set. However, the data returned to the user never contains the terminating end-of-line.
To distinguish between the case where the last line of the stream was implicitly terminated by the end-of-file and the case where it was explicitly terminated by an end-of-line character sequence, use the EOF() built-in function, which returns 1 in the former case and 0 in the latter.
ARexx limits the length of a line that can be read in one call to READLN(): if the line in the stream is longer than 1000 characters, only the first 1000 characters are returned. The rest of the line can be read by additional READLN() and READCH() calls. Note that whenever READLN() returns a string of exactly 1000 characters, no terminating end-of-line was found, and READLN() must be called again to read the rest of the line.
READLN('tmp') | Hello world! | maybe
READLN('work') | '' | maybe, if unsuccessful
READCH(name[,length])
The READCH() built-in function reads characters from the stream named by the parameter name, which must correspond to the first parameter in a previous call to OPEN(). The number of characters read is given by length, which must be a non-negative integer. The default value of length is 1.
The value returned is the data read, which has the length corresponding to the length parameter if no errors occurred.
ARexx limits the length of a string that can be read in one call to READCH() to 65535 bytes, which is the maximum size of an ARexx string.
READCH('tmp',3) | Hel | maybe
READCH('tmp') | l | maybe
READCH('tmp',6) | o worl | maybe
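READCH() with a length argument suits block-wise copying, as long as each block stays below the 65535-byte string limit. The sketch below uses arbitrary file names and a block size of 4096 chosen only for illustration; it first creates a 10000-byte sample file and then copies it. It assumes, as the description above implies, that a READCH() call which reaches the end-of-file returns the shorter data that was available and puts the stream in end-of-file mode.

```rexx
/* ARexx: copy a file in 4096-byte blocks with READCH()/WRITECH() */
call open 'src', 'RAM:source', 'Write'       /* make a sample file   */
call writech 'src', copies('x', 10000)
call close 'src'

call open 'in',  'RAM:source', 'Read'
call open 'out', 'RAM:copy',   'Write'
copied = 0
do until eof('in')
   chunk = readch('in', 4096)                /* shorter near the end */
   copied = copied + writech('out', chunk)
end
call close 'in'
call close 'out'
/* copied is 10000: two full blocks and one block of 1808 bytes */
```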
EOF(name)
The EOF() built-in function tests to see whether the end-of-file has been seen on the stream specified by name, which must be an open stream, i.e. the first parameter in a previous call to OPEN().
The return value is 1 if the stream is in end-of-file mode, i.e. if a read operation (either READLN() or READCH()) has seen the end-of-file during its operation. However, reading the last character of the stream does not put the stream in end-of-file mode; you must try to read at least one character past the last character. If the stream is not in end-of-file mode, the return value is 0.
Once the stream is in end-of-file mode, it stays there until SEEK() is called. No read or write operation can clear the end-of-file mode; only SEEK() can (or closing and reopening the stream).
EOF('tmp') | 0 | maybe
EOF('work') | 1 | maybe
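Since the end-of-file mode is entered only after an attempt to read past the last character, a character-by-character loop should test EOF() right after each READCH() and before using the character. A sketch that counts the characters of a three-character sample file (the file name is an arbitrary example):

```rexx
/* ARexx: count characters, testing EOF() after every read */
call open 'f', 'RAM:three', 'Write'
call writech 'f', 'abc'
call close 'f'

call open 'f', 'RAM:three', 'Read'
n = 0
do forever
   c = readch('f')          /* reading 'c', the last one, does not set EOF */
   if eof('f') then leave   /* only the fourth READCH() sets it            */
   n = n + 1
end
call close 'f'
/* n is 3 */
```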
SEEK(name,offset[,mode])
The SEEK() built-in function repositions the current position of the stream specified by the parameter name, which must refer to an open stream, i.e. to the first parameter of a previous call to OPEN(). The current position is set to the byte referred to by the parameter offset. Note that the file is addressed zero-based, so the first byte in the file is numbered 0. The value returned is the new current position after the seek operation, measured from the beginning of the file (as in Beginning mode).
If an attempt is made to set the current position past the end-of-file or before the beginning of the file, the current position is not moved, and the old current position is returned. Note that it is legal to position at the end-of-file, i.e. immediately after the last character of the file: if a file contains 12 characters, the valid range for the new current position is 0-12.
The last parameter, mode, can take any of the values Beginning, Current, or End. It specifies the base of the seek, i.e. whether offset is relative to the first byte, the old current position, or the end-of-file position. For instance, for a 20-byte file with current position 3, offset 7 with base Beginning is equivalent to offset -13 with base End and to offset 4 with base Current. Only the first character of the mode parameter is examined; the rest of that parameter is ignored. If mode is omitted, Current is used.
SEEK('tmp', 12, 'B') | 12 | if successful
SEEK('tmp', -4, 'Begin') | 12 | if previously at 12
SEEK('tmp', -10, 'E') | 20 | if the length is 30
SEEK('tmp', 5) | 17 | if previously at 12
SEEK('tmp', 5, 'Celcius') | 17 | only the first character in mode matters
SEEK('tmp', 0, 'B') | 0 | always to the start of the file
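Because SEEK() returns the new position, seeking 0 bytes relative to End yields the length of the file, and seeking 0 bytes relative to Current reports the current position without moving it. A sketch, using an arbitrary file name and a 30-byte sample file:

```rexx
/* ARexx: find the length of a file without losing the position */
call open 'f', 'RAM:thirty', 'Write'
call writech 'f', copies('-', 30)
call seek 'f', 12, 'B'

here = seek('f', 0, 'C')     /* 12: offset 0 from Current does not move   */
size = seek('f', 0, 'E')     /* 30: the end-of-file position is the size  */
back = seek('f', here, 'B')  /* 12: return to the saved position          */
call close 'f'
```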
Now that the functionality has been explained, here are the main conceptual differences from standard REXX:
[Current position.]
ARexx does not distinguish between a current read position and a current write position, but uses a common current position for both reading and writing. Further, this current position (as it is called in this documentation) can be set to any byte within the file, and to the end-of-file position. Note that the current position is zero-based.
[Indirect naming.]
The stream I/O operations in ARexx do not take the name of the file as a parameter. Instead, ARexx uses an indirect naming scheme: the OPEN() built-in function binds a REXX stream name to a named file in the AmigaDOS operating system, and afterwards only the REXX stream name is used in the other stream I/O functions operating on that file.
[Special stream names.]
There are two special stream names in ARexx, STDIN and STDOUT, which refer to the standard input and the standard output. With respect to the indirect naming scheme, these are not file names but names of streams that are already open; i.e. they can be used in the stream I/O operations other than OPEN(). For some reason, it is possible to close STDIN but not STDOUT.
[NOTREADY not supported.]
ARexx has no NOTREADY condition. Instead, you must detect errors by calling EOF() and checking the return value of each I/O operation.
[Other things missing.]
In ARexx, all files must be explicitly opened. There is no way to reposition line-wise, except for reading lines and keeping a count yourself.
Of course, ARexx also has a lot of functionality which is not part of standard REXX, like relative repositioning, explicit opening, an end-of-file indicator, etc. This functionality is described above with the extended built-in functions, and is of less interest here.
When an ARexx script has opened a file in Write mode, other ARexx scripts are not allowed to access that file. However, if the file is opened in Read or Append mode, then other ARexx scripts can open the file too, and the same state of the contents of the file is seen by all scripts.
Note that it is difficult to translate between standard REXX stream I/O and ARexx stream I/O. The main problem (other than functionality missing in one of the systems) is the handling of the end-of-file. In standard REXX, the end-of-file is detected by checking whether there is more data left, while in ARexx one checks whether the end-of-file has been read. The following is a common standard REXX idiom:
do while lines('file')>0   /* for each line available */
   say linein('file')      /* process it */
end
In ARexx this becomes:
tmp = readln('file')       /* attempt to read first line */
do while ~eof('file')      /* if EOF was not seen */
   say tmp                 /* process line */
   tmp = readln('file')    /* attempt to read next line */
end
It is hard to translate mechanically between them, because standard REXX lacks an EOF() built-in function and ARexx lacks a LINES() built-in function.
Note that in the ARexx example, an improperly terminated last line is not processed, since READLN() searches for an end-of-line character sequence. In the last iteration, tmp is set to the last, unterminated line, but EOF() returns 1 as well, so the loop ends before that line is processed. To process it too, use the expression ~eof('file') | tmp ~== '' as the condition of the DO WHILE loop.
The limit of 1000 characters for READLN() means that a generic line reading routine in ARexx must be similar to this:
readline: procedure
   parse arg name                /* stream name bound by OPEN() */
   line = ''
   do until length(tmpline) < 1000
      tmpline = readln(name)     /* returns at most 1000 characters */
      line = line || tmpline
   end
   return line
This routine calls READLN() until it returns a string shorter than 1000 characters. End-of-file checking is not needed, since READLN() returns an empty string at the end-of-stream.
BRexx provides a set of I/O functions closely related to the I/O library of the C programming language. In fact, the C library documentation is worth consulting for in-depth information on this functionality.
BRexx uses a two-level naming scheme: in REXX, streams are referred to by a stream handle, which is an integer; in the operating system, files are referred to by a file name, which is a normal string. The function OPEN() is used to bind a file name to a stream handle. However, BRexx I/O functions generally accept either a file name or a stream handle as a reference, and open the file if appropriate. If the name of a file is an integer that can be interpreted as a file descriptor number, it is interpreted as a descriptor rather than as a name. Whenever you use BRexx and want to write robust code, always use OPEN() and the descriptor.
If a file is opened by specifying its name in an I/O operation other than OPEN(), and the name is an integer only one or two higher than the highest current file descriptor, strange things may happen.
Five special streams are defined, with the pseudo file names <STDIN>, <STDOUT>, <STDERR>, <STDAUX>, and <STDPRN>, which are assigned the predefined stream handles 0 to 4, respectively. They refer to the default input, default output, default error output, default auxiliary output, and printer output. The last two generally refer to the COM1: and LPT1: devices under MS-DOS. Either upper or lower case letters can be used when referring to these five special names.
Note, however, that if any of these five special files is closed, it cannot be reopened: opening the same name again just gives a normal file with a name such as <STDOUT>.
There are a few things to watch out for with the special files. I/O involving <STDAUX> and <STDPRN> can cause the Abort, Retry, Ignore message to be shown once for each character that is read or written, which is tedious to answer with R or I if the string is long. If A is answered, BRexx terminates.
Never write data to file descriptor 0 (<STDIN>); apparently, the data just disappears. Likewise, never read data from file descriptors 1 and 2 (<STDOUT> and <STDERR>): the former seems to terminate the program, while the latter apparently just returns the nullstring. Also be careful when reading from file descriptors 3 and 4, since your program may hang if no data is available.
OPEN(file,mode)
The OPEN() built-in function opens a file named by file, in mode mode, and returns an integer which is the number of the stream handle assigned to the file. In general, the stream handle is a non-negative integer, where 0 to 4 are pre-defined for the default streams. If an error occurred during the open operation, the value -1 is returned.
The mode parameter specifies the mode in which the file is opened. It consists of two parts: the access mode and the file mode. The access mode is a single character, which can be r for read, w for write, or a for append; in addition, the + character can be appended to open the file for both reading and writing. The file mode is one optional character, which can be t for text files or b for binary files; t is the default.
The following combinations of access mode and + are possible:
r | non-destructive open for reading only
w | destructive open for writing only
a | non-destructive open in append-only mode: only write operations are allowed, and all writes are performed at the end-of-file
r+ | non-destructive open for reading and writing
w+ | destructive open for reading and writing
a+ | non-destructive open in append-update mode: reading is allowed anywhere, but writing only at the end-of-file
Destructive mode means that the file is truncated to zero length when opened.
In addition, the b and t characters can be appended in order to open the file in binary or text mode.
These modes are the same as in C, although the t mode character is not strictly part of ANSI C. Note that r, w, and a are mutually exclusive, but exactly one of them must be present. The + is optional, but if present, it must come immediately after r, w, or a. The t and b modes are optional and mutually exclusive; the default is t. If present, t or b must be the last character in the mode string.
open('myfile','w') | 7 | perhaps
open('no.such.file','r') | -1 | if non-existent
open('c:tmp','r+b') | 6 | perhaps
If two file descriptors are opened to the same file, only the most recently opened one works. However, if that descriptor is closed, the earlier one starts working again. There may be other strange effects too, so try to avoid reopening a file that is already open.
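In BRexx, the handle returned by OPEN() should be compared with -1 before it is used. A sketch; report.txt is an arbitrary file name, and the error message is sent to handle 2, which is <STDERR>:

```rexx
/* BRexx: open a file and stop with a message if that fails */
fh = open('report.txt', 'w')               /* text mode is the default */
if fh = -1 then do
   call write 2, 'Cannot create report.txt', 'nl'  /* 2 is <STDERR> */
   exit 1
end
call write fh, 'Report', 'nl'              /* written as a line */
call close fh
```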
CLOSE(file)
The CLOSE() built-in function closes a file that is open. The parameter file can be either a stream handle returned from OPEN() or the name of a file which has been opened (for instance, when you do not know its stream handle).
The return value of this function seems to be the nullstring in all cases.
close(6) | '' | if open
close(7) | '' | if not open
close('foobar') | '' | perhaps
EOF(file)
The EOF() built-in function checks the end-of-file state for the stream given by file, which can be either a stream descriptor or a file name. The value returned is 1 if the end-of-file status is set for the stream, and 0 if it is cleared. In addition, the value -1 is returned if an error occurred, for instance if the file is not open.
The end-of-file indicator is set whenever an attempt was made to read at least one character past the last character of the file. Note that reading the last character itself will not set the end-of-file condition.
eof(foo) | 0 | if not at eof
eof('8') | 1 | if at eof
eof('no.such.file') | -1 | if the file isn't open
READ([file][,length])
The READ() built-in function reads data from the file referred to by the file parameter, which can be either a file name or a stream descriptor. If it is a file name, and that file is not currently open, then BRexx opens the file in mode rt. The default value of the first parameter is the default input stream. The data is read from and including the current position.
If the length parameter is not specified, a whole line is read, i.e. reading forward to and including the first end-of-line sequence; however, the end-of-line sequence itself is not returned. If the length parameter is specified, it must be a non-negative integer, and specifies the number of characters to read.
The data returned is the data read, except that if length is not specified, the terminating end-of-line sequence is stripped off. If the last line of a file is not terminated by an end-of-line character sequence, the end-of-file is implicitly interpreted as an end-of-line. However, in this case the end-of-file state is entered, since the end-of-stream was found while looking for an end-of-line.
read('foo') | one line | reads a complete line
read('foo',5) | anoth | reads part of a line
read(6) | er line | using a file descriptor
read() | hello there | perhaps; reads a line from the default input stream
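A BRexx line-reading loop has the same shape as the ARexx one: read first, then test the end-of-file state. The sketch below (arbitrary file name) leaves the loop only when EOF() is nonzero and the line read is empty, so an unterminated last line is still processed and the error value -1 also stops the loop. It assumes, which is not documented above, that READ() returns the nullstring once the end-of-file has been reached.

```rexx
/* BRexx: read every line, including an unterminated last line */
fh = open('lines.txt', 'w')
call write fh, 'one', 'nl'
call write fh, 'two'                 /* no end-of-line after the last line */
call close fh

fh = open('lines.txt', 'r')
count = 0
do forever
   line = read(fh)                   /* whole line, end-of-line stripped */
   if eof(fh) \= 0 & line == '' then leave  /* end-of-file (1) or error (-1) */
   count = count + 1
   say line
end
call close fh
/* count is 2 */
```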
WRITE([file][,[string][,dummy]])
The WRITE() built-in function writes a string of data to the stream specified by the file parameter, or to the default output stream if file is omitted. If specified, file can be either a file name or a stream descriptor. If it is a file name, and that file is not already open, it is opened in wt mode.
The data written is specified by the string parameter.
The return value is an integer, which is the number of bytes written during the operation. If the file is opened in text mode, all ASCII newline characters are translated into ASCII CRLF character sequences. However, the number returned is not affected by this translation; it is the same in text and binary mode. Unfortunately, write errors are seldom trapped, so the number returned is generally the number of characters that were supposed to be written, whether or not they were actually written.
If a third parameter is specified, the data is written as a line, i.e. followed by the end-of-line sequence; otherwise, the data is written as-is, without any end-of-line sequence. Note that in BRexx, the third parameter is considered present if at least the comma in front of it (the second comma) is present. This is somewhat inconsistent with how the ARG() built-in function normally treats omitted arguments. The value of the third parameter is always ignored; only its presence matters.
If the second parameter is omitted, only an end-of-line sequence is written, whether or not the third parameter is present.
write('bar','data') | 4 | writes four bytes
write('bar','data','nl') | 4+?? | writes a line
write('bar','data',) | 4+?? | same as previous
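The sketch below exercises the three forms of WRITE() described above and reads the result back; forms.txt is an arbitrary file name.

```rexx
/* BRexx: the three forms of WRITE() */
fh = open('forms.txt', 'w')
call write fh, 'abc'          /* 'abc', no end-of-line                  */
call write fh, 'def', 'x'     /* 'def' plus end-of-line; 'x' is ignored */
call write fh, , 'x'          /* string omitted: only an end-of-line    */
call close fh

fh = open('forms.txt', 'r')
first  = read(fh)             /* 'abcdef'       */
second = read(fh)             /* '' (empty line) */
call close fh
```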
SEEK(file[,[offset][,origin]])
The SEEK() built-in function moves the current position to a location in the file referred to by file. The parameter file can be either a file name (which must already be open) or a stream descriptor. This function does not implicitly open files that are not currently open.
The parameter offset gives the new location in the stream and must be an integer; it defaults to zero. Note that the addressing of bytes within the stream is zero-based.
The third parameter can be any of TOF, CUR, or EOF, and sets the reference point from which the offset is reckoned. The three strings refer to top-of-file, current position, and end-of-file, and either upper or lower case can be used. The default value is ???.
The return value of this function is the new absolute position in the file after the seek operation has been performed.
The SEEK() function has one more very important role. Whenever a file opened for both reading and writing has been used in a read operation and is to be used in a write operation next (or vice versa), a call to SEEK() must be made between the two I/O calls. In other words, after a read, only seeking and reading may occur; after a write, only seeking and writing may occur; and after a seek, reading, writing, and seeking may all occur.
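A sketch of this rule on a stream opened in r+ mode; update.txt is an arbitrary file name. The SEEK() calls with offset 0 relative to cur exist only to satisfy the rule, while the one relative to tof also rewinds the stream.

```rexx
/* BRexx: switch between reading and writing on an r+ stream */
fh = open('update.txt', 'w')
call write fh, 'header', 'nl'
call write fh, 'AAAA', 'nl'
call close fh

fh = open('update.txt', 'r+')
head = read(fh)               /* 'header'                             */
call seek fh, 0, 'cur'        /* required between a read and a write  */
call write fh, 'BB'           /* overwrites the first two A's         */
call seek fh, 0, 'tof'        /* required between a write and a read  */
head  = read(fh)              /* 'header' again                       */
after = read(fh)              /* 'BBAA'                               */
call close fh
```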
Under the MS-DOS operating system, the end-of-line character sequence is <CR><LF>, while in C, the end-of-line sequence is only <LF>. This can cause some very strange effects.
When an MS-DOS file is opened for read in text mode by BRexx, all <CR><LF> character sequences in file data are translated to <LF> when transferred into the C program. Further, BRexx, which is a C program, interprets <LF> as an end-of-line character sequence. However, if the file is opened in binary mode, then the first translation from <CR><LF> in the file to <LF> into the C program is not performed. Consequently, if a file that really is a text file is opened as a binary file and read line-wise, all lines would appear to have a trailing <CR> character.
Similarly, an <LF> written by the C program is translated to <CR><LF> in the file whenever the file is opened in text mode. When the file is opened in binary mode, all data is transferred without any alteration. Thus, lines written to a file opened for writing in binary mode end with only <LF>, not <CR><LF>, and if the file is later opened as a text file, this is not recognized as an end-of-line sequence.
Example: Differing end-of-lines
Here is an example of how an incorrect choice of file mode can corrupt data. Assume BRexx running under MS-DOS, which uses <CR><LF> as the end-of-line sequence in text files, with the C library translating this to <LF> in the file I/O interface. Consider the following code:
file = open('testfile.dat', 'wt')        /* text mode */
call write file, '45464748'x, 'dummy'    /* i.e. 'EFGH' plus end-of-line */
call write file, '65666768'x, 'dummy'    /* i.e. 'efgh' plus end-of-line */
call close file
file = open('testfile.dat', 'rb')        /* binary mode */
say c2x(read(file))                      /* says '454647480D' */
say c2x(read(file))                      /* says '656667680D' */
call close file
Here, two lines of four characters each are written to the file, but two lines of five characters each are read back. The reason is simply that the writing was done in text mode, so the end-of-line character sequence was <CR><LF>, while the reading was done in binary mode, where the end-of-line character sequence is just <LF>. Thus, the <CR> preceding the <LF> is taken to be part of the line during the read.
To avoid this, be very careful about using the correct mode when opening files. Failure to do so will almost certainly give strange effects.
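If a text file has to be read through a binary-mode handle anyway, the stray <CR> can be removed from each line. A BRexx sketch; crlf.txt is an arbitrary file name. The check makes the code harmless on systems whose text files end lines with <LF> only.

```rexx
/* BRexx: read a text-mode line through a binary-mode handle */
fh = open('crlf.txt', 'wt')
call write fh, 'EFGH', 'nl'          /* stored as 'EFGH' <CR><LF> under MS-DOS */
call close fh

fh = open('crlf.txt', 'rb')
line = read(fh)                      /* 'EFGH' || '0D'x under MS-DOS */
if right(line, 1) == '0D'x then      /* drop the stray <CR>, if any  */
   line = left(line, length(line) - 1)
call close fh
/* line is 'EFGH' whether or not the system uses <CR><LF> */
```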
Table of Contents
Stream Input and Output
1 Background and Historical Remarks
2 REXX's Notion of a Stream
3 Short Crash-Course
4 Naming Streams
5 Persistent and Transient Streams
6 Opening a Stream
7 Closing a Stream
8 Character-wise and Line-wise I/O
9 Reading and Writing
10 Determining the Current Position
11 Positioning Within a File
12 Errors: Discovery, Handling, and Recovery
13 Common Differences and Problems with Stream I/O
13.1 Where Implementations are Allowed to Differ
13.2 Where Implementations might Differ anyway
13.3 LINES() and CHARS() are Inaccurate
13.4 The Last Line of a Stream
13.5 Other Parts of the I/O System
13.6 Implementation-Specific Information
13.7 Stream I/O in Regina 0.07a
13.8 Functionality to be Implemented Later
13.9 Stream I/O in ARexx 1.15
13.10 Main Differences from Standard REXX
13.11 Stream I/O in BRexx 1.0b
13.12 Problems with Binary and Text Modes