-h- readme.txt	Wed Jul 24 11:49:28 1985	USER$A:[MINOW.LZ]README.TXT;1
This is a rewrite of the Unix compress utility.  It is *not*
switch-compatible with Unix compress, however it is (almost)
file-compatible (when compiled on Unix, or when "export" mode
is selected on VMS Version 4).

The advantages of this version are as follows:

1. Compress and decompress are separate programs, simplifying the
   problems of the small system implementor.  Both run on an
   unmapped PDP-11 (with a maximum of 12 bits).

   The command interface is just

	lzcomp input compressed_output
	lzdcmp compressed_input output

   Input files are not deleted.

2. The compression algorithm and I/O design is intended to simplify
   embedding the programs (as subroutines) in some other task.
   (for example, in a database package.)

3. On non-Unix systems, the I/O design should be significantly
   faster.  It should be slightly faster on Unix.

The only disadvantage is that, as noted, it is not command (option)
compatible with Unix compress.  Also, some periferal functionality
(such as the deletion of input files and the output file naming
conventions) have not been implemented.

On Unix (i.e., in "export" mode), the compressed data file is
identical to the Unix file, *except* that lzcomp writes two
CLEAR codes in a row to signal end-of-file (and lzdcmp treats
two CLEAR codes in a row as signalling end-of-file).

lzcomp and lzdcmp have been added to the Decus C distribution.

Martin Minow
decvax!minow,
minow%rex.dec@decwrl.arpa

-h- lzcmp1.c	Wed Jul 24 11:49:28 1985	USER$A:[MINOW.LZ]LZCMP1.C;21
/*
 *		lzcomp [-options] infile outfile
 */
#ifdef	DOCUMENTATION

title	lzcomp	File Compression
index		File compression

synopsis
	.s.nf
	lzcomp [-options] [infile [outfile]]
	.s.f
description

	lzcomp implements the Lempel-Ziv file compression algorithm.
	(Files compressed by lzcomp are uncompressed by lzdcmp.)
	It operates by finding common substrings and replaces them
	with a variable-size code.  This is deterministic, and
	can be done with a single pass over the file.  Thus,
	the decompression procedure needs no input table, but
	can track the way the table was built.

	Options may be given in either case.
	.lm +8
	.p -8
	-B	Input file is "binary", not "human readable text".
	This is necessary on Dec operating systems, such as VMS and
	RSX-11M, that treat these files differently.  (Note that binary
	support is rudamentary and probably insufficient as yet.)
	(On VMS version 4, this is ignored unless the -x option is
	specified or the input file is record-oriented.)
	.p -8
	-M bits	Write using the specified number of bits in the
	code -- necessary for big machines making files for little
	machines.  For example, if compressing a file on VMS
	which is to be read on a PDP-11, you should select -M 12.
	.p -8
	-V [n]	Verbose if specified.  If a value is specified,
	it will enable debugging code (if compiled in).
	.p -8
	-X [n]	"Export" -- write a file format that can be read by
	other operating systems.  Only the bytes in the file are copied;
	file attributes are not preserved.  If specified, the value
	determines the level of compatiblity.  If not specified,
	or specified with an explicit value of zero, and lzcomp is
	running on Vax/VMS version 4 under VaxC and the input file
	is a disk or magtape file (block-oriented), a VMS-private output
	format is used which is incompatible with the Unix compress
	utility, but which preserves VMS file attributes.  -X may
	take on the following values:
	.lm +4.s
	.i -4;#0##Choose VMS private format.  See restrictions below.
	.i -4;#1##Compatible with Unix compress version 3.0:
	this is the default if -x is given without a value.
	.i -4;#2##As above, but supress "block compression"
	.i -4;#3##Supress block compression and do not output
	a compress header block.  This is for compatiblity
	with a quite early version of Unix compress (and requires
	conditional-compilation to use).
	.lm -4.s
	Note that the -B (binary) option is ignored unless
	the input file is "record-oriented", such as a terminal
	or mailbox.
	.lm -8.s
	The other two arguments are the input and output
	filenames respectively.  Redirection is supported,
	however, the output must be a disk/tape file.

	The file format is almost identical to the current
	Unix implementation of compress (V4.0).  Files written
	by Unix compress should be readable by lzdcmp.  Files
	written by lzcomp in export (-x) format will be
	readable by Unix compress (except that lzcomp outputs
	two "clear" codes to mark EOF.  A patch to Unix
	compress is available.)

VMS Restrictions

	VMS Private mode stores the true name and attributes
	of the input file into the compressed file and lzdcmp
	restores the attributes (and filename if requested).
	The following restrictions apply -- they may be lifted
	in the future as they are primarily due to the author's
	lack of understanding of the intricacies of of VMS I/O:

	    All files must be stored on disk.
	    The lzcomp output file must be specified directly.

	Also, for all usage on VMS, the compressed file must
	be written to, and read from disk.

LZW compression algorithm

	This section is abstracted from Terry Welch's article
	referenced below.  The algorithm builds a string
	translation table that maps substrings in the input
	into fixed-length codes.  The compress algorithm may
	be described as follows:

	  1. Initialize table to contain single-character
	     strings.
	  2. Read the first character.  Set <w> (the prefix
	     string) to that character.
	  3. (step): Read next input character, K.
 	  4. If at end of file, output code(<w>); exit.
	  5. If <w>K is in the string table:
		Set <w> to <w>K; goto step 3.
	  6. Else <w>K is not in the string table.
		Output code(<w>);
		Put <w>K into the string table;
		Set <w> to K; Goto step 3.

	"At each execution of the basic step an acceptable input
	string <w> has been parsed off.  The next character K is
	read and the extended string <w>K is tested to see if it
	exists in the string table.  If it is there, then the
	extended string becomes the parsed string <w> and the
	step is repeated.  If <w>K is not in the string table,
	then it is entered, the code for the successfully
	parsed string <w> is put out as comprssed data, the
	character K becomes the beginning of the next string,
	and the step is repeated."

	The decompression algorithm translates each received
	code into a prefix string and extension [suffix] character.
	The extension character is stored (in a push-down stack),
	and the prefix translated again, until the prefix is a
	single character, which completes decompression of this
	code.  The entire code is then output by popping the
	stack.

	"An update to the string table is made for each code received
	(except the first one).  When a code has been translated,
	its final character is used as the extension character,
	combined with the prior string, to add a new string to
	the string table.  This new string is assigned a unique
	code value, which is the same code that the compressor
	assigned to that string.  In this way, the decompressor
	incrementally reconstructs the same string table that
	the decompressor used.... Unfortunately ... [the algorithm]
	does not work for an abnormal case.

	The abnormal case occurs whenever an input character string
	contains the sequence K<w>K<w>K, where K<w> already
	appears in the compressor string table."

	The decompression algorithm, augmented to handle
	the abnormal case, is as follows:

	  1. Read first input code;
	     Store in CODE and OLDcode;
	     With CODE = code(K), output(K);  FINchar = K;
	  2. Read next code to CODE; INcode = CODE;
	     If at end of file, exit;
	  3. If CODE not in string table (special case) then
		Output(FINchar);
		CODE = OLDcode;
		INcode = code(OLDcode, FINchar);
	
	  4. If CODE == code(<w>K) then
		Push K onto the stack;
		CODE == code(<w>);
		Goto 4.

	  5. If CODE == code(K) then
		Output K;
		FINchar = K;

	  6. While stack not empty
		Output top of stack;
		Pop stack;

	  7. Put OLDcode,K into the string table.
	     OLDcode = INcode;
	     Goto 2.

	The algorithm as implemented here introduces two additional
	complications.

	The actual codes are transmitted using a variable-length
	encoding.  The lowest-level routines increase the number
	of bits in the code when the largest possible code is
	transmitted.

	Periodically, the algorithm checks that compression is
	still increasing.  If the ratio of input bytes to output
	bytes decreases, the entire process is reset.  This can
	happen if the characteristics of the input file change.

VMS Private File Structure

	In VMS Private mode, the compressed data file contains
	a variable-length (but compressed) file header with the
	file "attributes" needed by the operating system to
 	construct the file.  This allows the decompression
	program to recreate the file in its original format,
	which is essential if ISAM databases are compressed.

	The overall file format is as follows:
	.lm +8
	.p -8
	LZ_SOH	"start of header" signal (this value cannot appear
	in user data).

	A variable-length data record (maximum 256 bytes)
	containing the header name, followed by whitespace, followed
	by header-specific information.  In this case, the name
	record will contain the string "vms$attributes" followed
	by the number of bytes in the attribute data block.
	(I assume that the name record will consist of a facility
	name, such as "vms", followed by a dollar sign, followed
	by a facility-unique word.)
	.p -8
	LZ_EOR	Signals "end of record".

	This is followed by a VMS file attributes record (generated
	by a VMS system library	routine).
	.p -8
	LZ_ETX	Signals "end of segment".
	.p -8
	ST_STX	Signals "start of text" (i.e., start of data file).

	This is followed by the user data file.
	.p -8
	LZ_ETX	Signals "end of segment"
	.p -8
	LZ_ETX	Two in a row signals "end of file".
	.s.lm -8
	Note that this format can easily be extended to include
	trailer records (with file counts and checksums) and/or
	multiple data files in one compressed file.

	Note also that the LZ_CLEAR code may appear in headers
	or data files to cause the decompression program to
	"readapt" to the characteristics of the input data.
	LZ_STX and LZ_SOH reset the compression algorithm.
	LZ_EOR does not.

Authors

	The algorithm is from "A Technique for High Performance
	Data Compression."  Terry A. Welch. IEEE Computer Vol 17,
	No. 6 (June 1984), pp 8-19.

	This revision is by Martin Minow.

	Unix Compress authors are as follows:
	.s.nf
	Spencer W. Thomas	(decvax!harpo!utah-cs!utah-gr!thomas)
	Jim McKie		(decvax!mcvax!jim)
	Steve Davies		(decvax!vax135!petsd!peora!srd)
	Ken Turkowski		(decvax!decwrl!turtlevax!ken)
	James A. Woods		(decvax!ihnp4!ames!jaw)
	Joe Orost		(decvax!vax135!petsd!joe)
	.s.f

#endif

/*
 * Compatible with compress.c, v3.0 84/11/27
 */

/*)BUILD
 *		$(PROGRAM) = lzcomp
 *		$(INCLUDE) = lz.h
 *		$(FILES) = { lzcmp1.c lzcmp2.c lzcmp3.c lzio.c lzvio.c }
 */

#include	"lz.h"


#ifdef unix
#include <sys/types.h>
#include <sys/stat.h>
#endif

/*
 * These global parameters are written to the compressed file.
 * The decompressor needs them.  The initialized values are defaults
 * and are modified by command line arguments.
 */
short		maxbits = BITS;		/* settable max # bits/code	*/
code_int maxmaxcode = 1 << BITS; 	/* Totally largest code	*/
code_int	hsize = HSIZE;		/* Actual hash table size	*/

/*
 * Flags (command line arguments) to control compression.
 */
#if VMS_V4
flag		export = 0;		/* Assume vms "private" mode	*/
#else
flag		export = 1;		/* Assume Unix compatible mode	*/
#endif
flag		block_compress = TRUE;	/* Assume block compression	*/
flag		binary = FALSE;		/* Reading text file if FALSE	*/
flag		noheader = FALSE;	/* No magic header if TRUE	*/
flag		verbose = VERBOSE_DEFAULT; /* Non-zero for status/debug	*/
flag		background = FALSE;	/* TRUE (Unix) if detached	*/
readonly flag	is_compress = TRUE;	/* for lzio.c (needed?)		*/
long		fsize;			/* Input file size in bytes	*/
char		*infilename = NULL;	/* For error printouts		*/
char		*outfilename = NULL;	/* For openoutput and errors	*/
int		firstcode;		/* First code after internals	*/
count_int	tot_incount = 0;	/* Total number of input bytes	*/
count_int	tot_outcount = 0;	/* Total number of output codes	*/
extern count_int in_count;
extern count_int out_count;
static long	start_time;		/* Time we started (in msec)	*/
extern long	cputime();		/* Returns process time in msec	*/
STREAM		instream;
STREAM		outstream;
char_type	inbuffer[MAXIO];
char_type	outbuffer[MAXIO];
static STREAM	mem_stream;
jmp_buf		failure;
#if VMS_V4
#include types
#include stat
#include descrip
#ifndef FDLSTUFF
#define FDLSTUFF char
#endif
FDLSTUFF	*fdl_input;
FDLSTUFF	*fdl_output;
static struct dsc$descriptor fdl_descriptor;
#endif

main(argc, argv)
int		argc;
char		*argv[];
/*
 * Compress mainline
 */
{
#ifndef	decus
	/*
	 * background is TRUE if running detached from the command terminal.
	 */
	background = (signal(SIGINT, SIG_IGN) == SIG_IGN) ? TRUE : FALSE;
	if (!background)
	    background = !isatty(fileno(stderr));
	if (!background) {
	    if (verbose > 0)
		signal(SIGINT, abort);
	    else {
		signal(SIGINT, interrupt);
		signal(SIGSEGV, address_error);
	    }
	}
#endif
	if (setjmp(failure) == 0) {
	    setup(argc, argv);		/* Command line parameters	*/
	    openinput();		/* Open input, set instream	*/
	    getfilesize();		/* Get input file size		*/
	    gethashsize();		/* Get actual hash table size	*/
	    initialize();		/* Set maxbits and the like	*/
	    openoutput();		/* Open output file		*/
	    if (verbose > 0)
		start_time = cputime();
	    put_magic_header();
	    init_compress(TRUE);
	    compress(&instream);
#if VMS_V4
	    if (export == 0) {
		outputcode((code_int) LZ_ETX);
		outputcode((code_int) LZ_ETX);
		fdl_close(fdl_input);
	    }
	    else
#endif
	    if (block_compress) {
		outputcode((code_int) LZ_CLEAR);
		outputcode((code_int) LZ_CLEAR);
	    }
	    outputcode((code_int) -1);		/* Flush output buffers	*/
#if VMS_V4
	    if (export == 0)
		fdl_close(fdl_output);
	    else {
		fclose(stdout);
	    }
#else
	    fclose(stdout);
#endif
	    if (verbose > 0) {
		start_time = cputime() - start_time;
		tot_incount += in_count;
		tot_outcount += out_count;
		fprintf(stderr, "%ld chars in, %ld bytes out, ",
		    tot_incount, tot_outcount);
		if (tot_outcount > 0) {
		    divout("compression ratio: ",
			(long) tot_incount, (long) tot_outcount, "");
		    divout(" (",
			((long) tot_incount - (long) tot_outcount) * 100,
			(long) tot_incount, "%)\n");
		}
		fprintf(stderr,
		    "%ld.%02ld seconds (process time) for compression.\n",
		    start_time / 1000L, (start_time % 1000L) / 10L);
		if (start_time > 0) {
		    divout("  ", (long) tot_incount * 10L,
			(start_time + 50L) / 100L,
			" input bytes per second.\n");
		}
	    }
	    exit(IO_SUCCESS);
	}
	else {
	    fprintf(stderr, "Error when compressing \"%s\" to \"%s\"\n",
		(infilename  == NULL) ? 
		    "<input file unknown>" : infilename,
		(outfilename == NULL) ?
		    "<output file unknown>" : outfilename);
	    if (errno != 0)
		perror("lzcomp fatal error");
	    exit(IO_ERROR);
	}
}

divout(leader, numer, denom, trailer)
char		*leader;
long		numer;
long		denom;
char		*trailer;
/*
 * Print numer/denom without floating point on small machines.
 */
{
	fprintf(stderr, "%s%ld.%02ld%s",
	    leader, numer / denom, ((numer % denom) * 100L) / denom, trailer);
}

static
initialize()
/*
 * Mung some global values.
 */
{
	if (maxbits < INIT_BITS)	/* maxbits is set by the -M 	*/
	    maxbits = INIT_BITS;	/* option.  Make sure it's	*/
	if (maxbits > BITS)		/* within a reasonable range	*/
	    maxbits = BITS;
	maxmaxcode = 1 << maxbits;	/* Truly biggest code		*/
	if (export == 0)
	    firstcode = LZ_FIRST;	/* VMS private			*/
	else if (block_compress)
	    firstcode = LZ_CLEAR + 1;	/* Default			*/
	else
	    firstcode = 256;		/* Backwards compatible		*/
}

put_magic_header()
/*
 * Write the magic header bits.
 */
{
#ifndef COMPATIBLE
	if (export && !noheader) {
	    PUT(HEAD1_MAGIC, &outstream);
	    PUT(HEAD2_MAGIC, &outstream);
	    PUT(maxbits | ((block_compress) ? BLOCK_MASK : 0),
		&outstream);
	}
#if VMS_V4
	else if (export == 0) {
	    char		text[256];
	    /*
	     * VMS private mode (with attribute block)
	     */
	    PUT(HEAD1_MAGIC, &outstream);
	    PUT(VMS_HEAD2_MAGIC, &outstream);
	    PUT((char) (maxbits | BLOCK_MASK), &outstream);
	    PUT(firstcode - 0x100, &outstream);
	    init_compress();
	    outputcode(LZ_SOH);
#if DEBUG
	    if (strlen(ATT_NAME) != ATT_SIZE) {
		fprintf("\"%s\", expected %d, got %d\n",
		    ATT_NAME, ATT_SIZE, strlen(ATT_NAME));
	    }
#endif
	    sprintf(text, "%s%d;", ATT_NAME, fdl_descriptor.dsc$w_length);
	    mem_compress(text, strlen(text));
	    outputcode(LZ_EOR);
	    mem_compress(fdl_descriptor.dsc$a_pointer,
			 fdl_descriptor.dsc$w_length);
	    fdl_free(&fdl_descriptor);
	    outputcode(LZ_ETX);
	    outputcode(LZ_STX);
	}
#endif
#endif
}

mem_compress(datum, length)
char_type	*datum;
int		length;
/*
 * Compress from memory
 */
{
	mem_stream.bp = mem_stream.bstart = datum;
	mem_stream.bsize = length;
	mem_stream.bend = datum + length;
	mem_stream.func = lz_eof;
	compress(&mem_stream);
}

/*
 * This routine is used to tune the hash table size according to
 * the file size.  If the filesize is unknown, fsize should be
 * set to zero.
 */

typedef struct TUNETAB {
    long	fsize;
    code_int	hsize;
} TUNETAB;

static readonly TUNETAB tunetab[] = {
#if HSIZE > 5003
    {	1 << 12,	 5003	},
#endif
#if HSIZE > 9001
    {	1 << 13,	 9001	},
#endif
#if HSIZE > 18013
    {	1 << 14,	18013	},
#endif
#if HSIZE > 35023
    {	1 << 15,	35023	},
    {	  47000,	50021	},
#endif
    {	      0,	    0	},
};

static
gethashsize()
/*
 * Tune the hash table parameters for small files.
 * We don't have a good way to find the file size on vms V3.
 * fsize is set to zero if we can't find it.
 */
{
	register TUNETAB	*tunep;

	hsize = HSIZE;
	if (fsize > 0) {
	    for (tunep = tunetab; tunep->fsize != 0; tunep++) {
		if (fsize < tunep->fsize) {
		    hsize = tunep->hsize;
		    break;
		}
	    }
	}
}

static
getfilesize()
/*
 * Set fsize to the input filesize (in bytes) if possible.
 * Magic for all operating systems.
 */
{
#ifdef	rsx
	extern char	f_efbk;	/* F.EFBK -- highest block in file	*/
#define	fdb(p,offset)	(stdin->io_fdb[((int) &p + offset)] & 0xFF)
#define efbk(offset)	fdb(f_efbk, offset)
	extern char	f_rtyp;	/* F.RTYP -- Record type		*/
	extern char	f_ratt;	/* F.RATT -- Record attributes		*/
	/*
	 * Note: Block number is stored high-order word first.
	 */
	fsize = efbk(2)
	    + (efbk(3) << 8)
	    + (efbk(0) << 16)
	    + (efbk(1) << 24);
	fsize *= 512;
#endif
#ifdef	rt11
	fsize = stdin->io_size;		/* Set by Decus C		*/
	fsize *= 512;
#endif
#ifdef	vms
#if VMS_V4
	struct stat	statbuf;

	fsize = 0;
	if (export != 0) {
	    if (fstat(fileno(stdin), &statbuf) == 0)
		fsize = (long) statbuf.st_size;
	}
	else {
	    fsize = (long) fdl_fsize(fdl_input);
	}
#else
	fsize = 0;				/* Can't find filesize	*/
#endif
#endif
#ifdef	unix
	struct stat	statbuf;

	fsize = 0;
	if (fstat(fileno(stdin), &statbuf) == 0)
	    fsize = (long) statbuf.st_size;
#endif
}

static readonly char *helptext[] = {
	"The following options are valid:",
	"-B\tBinary file (important on VMS/RSX, ignored on Unix)",
	"-M val\tExplicitly set the maximum number of code bits",
	"-V val\tPrint status information (or debugging) to stderr",
	"-X val\tSet export (compatiblity) mode:",
#if VMS_V4
	"  -X 0\tExplicitly choose VMS Private mode",
#endif
	"  -X 1\t(default if -X specified, output format is compatible",
	      "\twith Unix compress V3.0",
	"  -X 2\tCompatible with Unix compress 3.0, block compression",
	      "\tsupressed.",
#ifdef COMPATIBLE
	"  -X 3No header (file is readable by old compress)",
#endif
	NULL,
};

static
setup(argc, argv)
int		argc;
char		*argv[];
/*
 * Get parameters and open files.  Exit fatally on errors.
 */
{
	register char	*ap;
	register int	c;
	char		**hp;
	auto int	i;
	int		j;

#ifdef	vms
	argc = getredirection(argc, argv);
#endif
	for (i = j = 1; i < argc; i++) {
	    ap = argv[i];
	    if (*ap++ != '-' || *ap == EOS)	/* Filename?		*/
		argv[j++] = argv[i];		/* Just copy it		*/
	    else {
		while ((c = *ap++) != EOS) {
		    if (islower(c))
			c = toupper(c);
		    switch (c) {
		    case 'B':
			binary = TRUE;
			break;

		    case 'M':
			maxbits = getvalue(ap, &i, argv);
			if (maxbits < MIN_BITS) {
			    fprintf(stderr, "Illegal -M value\n");
			    goto usage;
			}
			break;

		    case 'V':
			verbose = getvalue(ap, &i, argv);
			break;

		    case 'X':
			export = getvalue(ap, &i, argv);
			if (export < 0 || export > 3) {
			    fprintf(stderr, "Illegal -X value: %d\n", export);
			    goto usage;
			}
			block_compress = "\1\1\0\0"[export];
			noheader       = "\0\0\0\1"[export];
			export         = "\0\1\1\1"[export];
			break;

		    default:
			fprintf(stderr, "Unknown option '%c' in \"%s\"\n",
				*ap, argv[i]);
usage:			for (hp = helptext; *hp != NULL; hp++)
			    fprintf(stderr, "%s\n", *hp);
			FAIL("usage");
		    }				/* Switch on options	*/
		}				/* Everything for -xxx	*/
	    }					/* If -option		*/
	}					/* For all argc's	*/
	/*  infilename = NULL; */		/* Set "stdin"  signal	*/
	/* outfilename = NULL; */		/* Set "stdout" signal	*/
	switch (j) {				/* Any file arguments?	*/
	case 3:					/* both files given	*/
	    if (!streq(argv[2], "-"))		/* But - means stdout	*/
		outfilename = argv[2];
	case 2:					/* Input file given	*/
	    if (!streq(argv[1], "-")) {
		infilename = argv[1];
	    }
	    break;

	case 0:					/* None!		*/
	case 1:					/* No file arguments	*/
	    break;

	default:
	    fprintf(stderr, "Too many file arguments\n");
	    FAIL("too many files");
	}
}

static int
getvalue(ap, ip, argv)
register char		*ap;
int			*ip;
char			*argv[];
/*
 * Compile a "value".  We are currently scanning *ap, part of argv[*ip].
 * The following are possible:
 *	-x123		return (123) and set *ap to EOS so the caller
 *	ap^		cycles to the next argument.
 *
 *	-x 123		*ap == EOS and argv[*ip + 1][0] is a digit.
 *			return (123) and increment *i to skip over the
 *			next argument.
 *
 *	-xy or -x y	return(1), don't touch *ap or *ip.
 *
 * Note that the default for "flag option without value" is 1.  This
 * can only cause a problem for the -M option where the value is
 * mandatory.  However, the result of 1 is illegal as it is less
 * than INIT_BITS.
 */
{
	register int	result;
	register int	i;

	i = *ip + 1;
	if (isdigit(*ap)) {
	    result = atoi(ap);
	    *ap = EOS;
	}
	else if (*ap == EOS
	      && argv[i] != NULL
	      && isdigit(argv[i][0])) {
	    result = atoi(argv[i]);
	    *ip = i;
	}
	else {
	    result = 1;
	}
	return (result);
}

openinput()
{
#ifdef decus
	if (infilename == NULL) {
	    infilename = malloc(256 + 1);
	    fgetname(stdin, infilename);
	    infilename = realloc(infilename, strlen(infilename) + 1);
	}
	else {
	    if (freopen(infilename, (binary) ? "rn" : "r", stdin) == NULL) {
		perror(infilename);
		FAIL("can't reopen input");
	    }
	}
#else
#ifdef vms
#if VMS_V4
	if (export == 0) {
	    char		*fname;
	    char		filename[256];

	    if ((fname = infilename) == NULL) {
		fgetname(stdin, filename);
		fname = filename;
	    }
	    if ((fdl_input = fdl_open(fname, &fdl_descriptor)) == NULL) {
		if ((fdl_status & 01) == 0) {
		    fdl_message(NULL, fname);
		    FAIL("can't fdl_open");
		}
		fprintf(stderr,
		    "Cannot open \"%s\" in vms private format,", fname);
		fprintf(stderr, " trying export format.\n");
		export = TRUE;
		goto try_export;
	    }
	    fclose(stdin);
	    stdin = NULL;
	    infilename = malloc(256 + 1);
	    infilename = realloc(fname, strlen(fname) + 1);
	    if (verbose > 1) {
		fprintf(stderr, "FDL information for \"%s\"\n", filename);
		fdl_dump(&fdl_descriptor, stderr);
	    }
	    goto opened;
	}
try_export:
#endif
	if (infilename == NULL) {
	    infilename = malloc(256 + 1);
	    fgetname(stdin, infilename);
	    infilename = realloc(infilename, strlen(infilename) + 1);
	}
	else {
#if VMS_V4
	    if ((stdin = freopen(infilename, "r", stdin)) == NULL) {
#else
	    if (freopen(infilename, "r", stdin) == NULL) {
#endif
		perror(infilename);
		exit(IO_ERROR);
	    }
	}
#else
	if (infilename == NULL)
	    infilename = "stdin";
	else {
	    if (freopen(infilename, "r", stdin) == NULL) {
		perror(infilename);
		exit(IO_ERROR);
	    }		    
	}
#endif
#endif
opened:	instream.bp = instream.bend = NULL;
	instream.bstart = inbuffer;
	instream.bsize = sizeof inbuffer;
	instream.func = lz_fill;
}

openoutput()
/*
 * Open the output file (after the input file has been opened).
 * if outfilename == NULL, it's already open on stdout.
 */
{
	if (outfilename == NULL) {
#if VMS_V4
#if 0					/* The following doesn't work	*/
	    outfilename = malloc(256 + 1);
	    fgetname(stdout, outfilename);
	    outfilename = realloc(outfilename, strlen(outfilename) + 1);
	    if (export == 0) {
		fclose(stdout);
		stdout = NULL;		/* Can't do terminal test below	*/
		if ((fdl_output = fdl_create(NULL, outfilename)) == NULL) {
		    if ((fdl_status & 01) == 0)
			fdl_message(NULL, outfilename);
		    fprintf(stderr, "Can't create \"%s\"\n", outfilename);
		    FAIL("can't fdl_create");
		}
	    }
#else
	    fprintf(stderr,
		"Restriction: The output file must be specified.\n");
	    FAIL("can't redirect on VMS V4");
#endif
#else
#ifdef	vms
	    outfilename = malloc(256 + 1);
	    fgetname(stdout, outfilename);
	    outfilename = realloc(outfilename, strlen(outfilename) + 1);
#else
#ifdef decus
	    outfilename = malloc(256 + 1);
	    fgetname(stdout, outfilename);
	    outfilename = realloc(outfilename, strlen(outfilename) + 1);
#else
	    outfilename = "<stdout>";
#endif
#endif
#endif
	}
	else {
#if VMS_V4
	    if (export == 0) {
		fclose(stdout);
		stdout = NULL;		/* Can't do terminal test below	*/
		if ((fdl_output = fdl_create(NULL, outfilename)) == NULL) {
		    if ((fdl_status & 01) == 0)
			fdl_message(NULL, outfilename);
		    fprintf(stderr,
			"Can't create \"%s\" (VMS private)\n", outfilename);
		    FAIL("can't fdl_create");
		}
	    }
	    else {
		if (freopen(outfilename, "w", stdout) == NULL) {
		    perror(outfilename);
		    FAIL("can't create");
		}
	    }
#else
#ifdef decus
	    if (freopen(outfilename, "wn", stdout) == NULL) {
		perror(outfilename);
		FAIL("can't create");
	    }
#else
	    if (freopen(outfilename, "w", stdout) == NULL) {
		perror(outfilename);
		FAIL("can't create");
	    }
#endif
#endif
	}
	if (stdout != NULL && isatty(fileno(stdout))) {
	    fprintf(stderr, "%s: is a terminal.  We object.\n",
		outfilename);
	    FAIL("can't create");
	}
	outstream.bp = outstream.bstart = outbuffer;
	outstream.bend = outbuffer + sizeof outbuffer;
	outstream.bsize = sizeof outbuffer;
	outstream.func = lz_flush;
}

-h- lzcmp2.c	Wed Jul 24 11:49:28 1985	USER$A:[MINOW.LZ]LZCMP2.C;8
/*
 *		l z c m p 2 . c
 *
 * Actually do compression.  Terminology (and algorithm):
 *
 * Assume the input string is "abcd", we have just processed "ab" and
 * read 'c'.  At this point, a "prefix code" will be assigned to "ab".
 * Search in the prefix:character memory (either the "fast memory" or
 * the hash-code table) for the code followed by this character.  If
 * found, assign the code found to the "prefix code" and read the
 * next character.  If not found, output the current prefix code,
 * generate a new prefix code and store "old_prefix:char" in the
 * table with "new_prefix" as its definition.
 *
 * Naming conventions:
 *   code	a variable containing a prefix code
 *   c or char	a variable containing a character
 *
 * There are three tables that are searched (dependent on compile-time
 * and execution time considerations):
 *   fast	Direct table-lookup -- requires a huge amount of physical
 *		(non-paged) memory, but is very fast.
 *   hash	Hash-coded table-lookup.
 *   cache	A "look-ahead" cache for the hash table that optimizes
 *		searching for the most frequent character.  This considerably
 *		speeds up processing for raster-images (for example) at
 *		a modest amount of memory.
 * Structures are used to hold the actual tables to simplify organization
 * of the program.
 *
 * Subroutines:
 *    compress()	performs data compression on an input datastream.
 *    init_compress()	called by the output routine to clear tables.
 */

#include	"lz.h"

/*
 * General variables
 * Cleared by init_compress on a "hard initialization"
 * outputcode() in lzcmp3.c refers to next_code.
 */

long int	in_count;		/* Length of input		*/
long int	out_count;		/* Bytes written to output file	*/
static flag	first_clear = TRUE;	/* Don't zero first time	*/
code_int	next_code;		/* Next output code		*/
static count_int checkpoint = CHECK_GAP; /* When to test ratio again	*/
static long	ratio = 0;		/* Ratio for last segment	*/

/*
 * These global parameters are set by mainline code.  Unchanged here.
 */
extern short	maxbits;		/* Settable max # bits/code	*/
extern short	block_compress;		/* For old-style compatibility	*/
extern code_int	maxmaxcode;		/* Actual maximum output code	*/
extern long	tot_incount;		/* Total input count		*/
extern long	tot_outcount;		/* Total output count		*/
extern code_int	hsize;			/* Actual hash table size	*/

#ifdef XENIX_16
static count_int htab0[8192];
static count_int htab1[8192];
static count_int htab2[8192];
static count_int htab3[8192];
static count_int htab4[8192];
static count_int htab5[8192];
static count_int htab6[8192];
static count_int htab7[8192];
static count_int htab8[HSIZE - 65536];
static count_int *hashtab[9] = {
    htab0, htab1, htab2, htab3, htab4, htab5, htab6, htab7, htab8
};

static U_short code0[16384];
static U_short code1[16384];
static U_short code2[16384];
static U_short code3[16384];
static U_short code4[HSIZE - 65536];
static U_short *codetab[5] = {
    code0, code1, code3, code3, code4
}

#define HASH(i)		(hashtab[((unsigned) (i)) >> 13][(i) & 0x1FFF])
#define CODE(i)		(codetab[((unsigned) (i)) >> 14][(i) & 0x3FFF])

#else
count_int	hashtab[HSIZE];
U_short		codetab[HSIZE];

#define HASH(i)		hashtab[i]
#define CODE(i)		codetab[i]
#endif

/*
 * compress a datastream
 *
 * Algorithm:  on large machines, for maxbits <= FBITS, use fast direct table
 * lookup on the prefix code / next character combination.  For smaller code
 * size, use open addressing modular division double hashing (no chaining), ala
 * Knuth vol. 3, sec. 6.4 Algorithm D, along with G. Knott's relatively-prime
 * secondary probe.  Do block compression with an adaptive reset, whereby the
 * code table is cleared when the compression ratio decreases, but after the
 * table fills.  The variable-length output codes are re-sized at this point,
 * and a special LZ_CLEAR code is generated for the decompressor.  For the
 * megamemory version, the sparse array is cleared indirectly through a
 * "shadow" output code history.  Late additions: for the hashing code,
 * construct the table according to file size for noticeable speed improvement
 * on small files.  Also detect and cache codes associated with the most
 * common character to bypass hash calculation on these codes (a characteristic
 * of highly-compressable raster images).  Please direct questions about this
 * implementation to ames!jaw.
 */

compress(in)
STREAM		*in;		/* Input stream structure		*/
/*
 * Compress driver.  Global fsize is the size of the entire datastream
 * (from LZ_STX or LZ_SOH to the terminating LZ_ETX).  You must
 * force a reinitialization -- by calling outputcode() with a new header --
 * if size is changed.  If the "newer" output format is chosen (with
 * data streams delimited by LZ_SOH/LZ_STX, init_compress will be
 * called automatically.  Otherwise, you must call init_compress(TRUE)
 * before calling compress() for the first time.
 */
{
	register long		hash_code;	/* What we look for	*/
	register code_int	i;		/* Index into vectors	*/
	register int		c;		/* Current input char	*/
	register code_int	code;		/* Substring code	*/
	register int		displacement;	/* For secondary hash	*/
	register code_int	hsize_reg;	/* Size of hash table	*/
	register int		hshift;		/* For xor hasher	*/

	if ((code = GET(in)) == EOF)
	    return;
	in_count++;
	hsize_reg = hsize;
	/*
	 * Set hash code range bound
	 */
	hshift = 0;
	for (hash_code = (long) hsize; hash_code < 65536L; hash_code <<= 1)
	    hshift++;
	hshift = 8 - hshift;
	while ((c = GET(in)) != (unsigned) EOF) {
	    in_count++;
	    hash_code = (long) (((long) c << maxbits) + code);
	    i = (c << hshift) ^ code;		/* XOR hashing		*/
	    if (HASH(i) == hash_code) {		/* Found at first slot?	*/
		code = CODE(i);
		continue;
	    }
	    else if ((long) HASH(i) < 0)	/* empty slot		*/
		goto nomatch;
	    displacement = hsize_reg - i;	/* secondary hash	*/
	    if (i == 0)
		displacement = 1;
probe:
	    if ((i -= displacement) < 0)	/* Wrap around?		*/
		i += hsize_reg;
	    if (HASH(i) == hash_code) {		/* Found in hash table?	*/
		code = CODE(i);			/* Set new prefix code	*/
		continue;			/* Read next input char	*/
	    }
	    else if ((long) HASH(i) > 0)	/* If slot is occupied	*/
		goto probe;			/* Look somewhere else	*/
nomatch:
	    /*
	     * Output the current prefix and designate a new prefix.
	     * If the input character was the "hog", save it in the
	     * look-ahead cache table.  Then, save in the hash table.
	     */
	    outputcode((code_int) code);	/* No match, put prefix	*/
#if SIGNED_COMPARE_SLOW
	    if ((unsigned) next_code < (unsigned) maxmaxcode) {
#else
	    if (next_code < maxmaxcode) {
#endif
		CODE(i) = next_code++;		/* code -> hashtable	*/
		HASH(i) = hash_code;
	    }
	    else if (block_compress
		  && (count_int) in_count >= checkpoint) {
		clear();
	    }
	    code = c;				/* Start new substring	*/
	}
	/*
	 * At EOF, put out the final code.
	 */
	outputcode((code_int) code);
}

clear()
/*
 * Check the compression ratio to see whether it is going up
 * or staying the same.  If it is going down, the internal
 * statistics of the file have changed, so clear out our
 * tables and start over.  Inform the decompressor of the
 * change by sending out a LZ_CLEAR code.
 */
{
	register long int	rat;

	checkpoint = in_count + CHECK_GAP;
#if DEBUG
	if (verbose > 2) {
	    divout("at clear() test",  in_count, out_count, "");
	    fprintf(stderr, ", ratio at entry: %ld.%02ld, gap %d\n",
		rat / 256L, ((rat & 255L) * 100L) / 256L, CHECK_GAP);
	}
#endif
	if (in_count > 0x007FFFFL) {		/* Shift will overflow	*/
	    rat = out_count >> 8;
	    if (rat == 0)
		rat = 0x7FFFFFFFL;
	    else {
		rat = in_count / rat;
	    }
	}
	else {
	    rat = (in_count << 8) / out_count;
	}
	if (rat > ratio)
	    ratio = rat;
	else {
#if DEBUG
	    if (verbose > 0) {
		fprintf(stderr, "Resetting compression, in %ld, out %ld\n",
		    in_count, out_count);
		fprintf(stderr, "Old ratio: %ld == (%ld.%02ld)",
		    ratio, ratio / 256L, ((ratio & 255L) * 100L) / 256L);
		fprintf(stderr, ", test ratio: %ld = (%ld.%02ld), gap %d\n",
		    rat, rat / 256L, ((rat & 255L) * 100L) / 256L, CHECK_GAP);
	    }
#endif
	    outputcode((code_int) LZ_CLEAR);	/* Calls init_compress	*/
	}
}

init_compress(full_init)
flag		full_init;	/* TRUE for full initialization		*/
/*
 * Clear the tables.  Called by outputcode() on LZ_SOH, LZ_STX
 * (full_init TRUE) or on LZ_CLEAR (full_init FALSE).
 * init_compress() is not called on LZ_EOR.
 */
{
#ifdef XENIX_16
	register count_int	*hp;
	register int		n;
	register int		j;
	register code_int	k;

	k = hsize;
	for (j = 0; k > 0; k -= 8192) {
	    i = (k < 8192) ? k : 8192;
	    hp = hashtab[j++];
	    n = i >> 4;
	    switch (i & 15) {
	    case 15:	*hp++ = -1;
	    case 14:	*hp++ = -1;
	    case 13:	*hp++ = -1;
	    case 12:	*hp++ = -1;
	    case 11:	*hp++ = -1;
	    case 10:	*hp++ = -1;
	    case  9:	*hp++ = -1;
	    case  8:	*hp++ = -1;
	    case  7:	*hp++ = -1;
	    case  6:	*hp++ = -1;
	    case  5:	*hp++ = -1;
	    case  4:	*hp++ = -1;
	    case  3:	*hp++ = -1;
	    case  2:	*hp++ = -1;
	    case  1:	*hp++ = -1;
	    }
	    while (--n >= 0) {
		*hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
		*hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
		*hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
		*hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
	   }
	}
#else
	register count_int	*hp;
	register code_int	n;

	hp = &hashtab[0];
	n = hsize >> 4;			/* divide by 16			*/
	switch (hsize & 15) {
	case 15:	*hp++ = -1;
	case 14:	*hp++ = -1;
	case 13:	*hp++ = -1;
	case 12:	*hp++ = -1;
	case 11:	*hp++ = -1;
	case 10:	*hp++ = -1;
	case  9:	*hp++ = -1;
	case  8:	*hp++ = -1;
	case  7:	*hp++ = -1;
	case  6:	*hp++ = -1;
	case  5:	*hp++ = -1;
	case  4:	*hp++ = -1;
	case  3:	*hp++ = -1;
	case  2:	*hp++ = -1;
	case  1:	*hp++ = -1;
	}
	while (--n >= 0) {
	    *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
	    *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
	    *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
	    *hp++ = -1; *hp++ = -1; *hp++ = -1; *hp++ = -1;
	}
#endif
	if (full_init) {
	    tot_incount += in_count;
	    tot_outcount += out_count;
	    in_count = 0;
	    out_count = 0;
	    ratio = 0;
	}
	first_clear = FALSE;
	next_code = firstcode;
}
-h- lzcmp3.c	Wed Jul 24 11:49:28 1985	USER$A:[MINOW.LZ]LZCMP3.C;10
/*
 * 			l z c m p 3 . c
 * Output a given code.
 */
#include "lz.h"

extern STREAM	outstream;
extern code_int	next_code;
extern code_int	maxmaxcode;		/* Actual maximum output code	*/
extern short	maxbits;
extern count_int out_count;

static char_type buf[BITS];
static int	offset;
static short	n_bits = INIT_BITS;	/* # of bits in compressed file	*/
static short	n_bits8 = INIT_BITS << 3;
static code_int	maxcode = MAXCODE(INIT_BITS);

#if !vax_asm
static readonly char_type lmask[9] = {
    0xFF, 0xFE, 0xFC, 0xF8, 0xF0, 0xE0, 0xC0, 0x80, 0x00
};
static readonly char_type rmask[9] = {
    0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F, 0xFF
};
#endif

#if DEBUG
extern int	col;
static int	todump;
#endif

outputcode(code)
code_int  code;
/*
 * Output the given code.
 * Inputs:
 * 	code:	A n_bits-bit integer.  If == -1, then EOF.  This assumes
 *		that n_bits <= (long)wordsize - 1.
 * Note: if not in "export" mode, the following values are special:
 *	LZ_CLEAR	(also in export mode if block_compress TRUE)
 *			(soft) clear out compress tables and reset the
 *			number of bits per code to the minimum.
 *	LZ_SOH, LZ_STX	(hard) clear out compress tables and reset as above.
 *	LZ_ETX, LZ_EOR	force out the current output segment, analogous
 *			to fflush.
 *			
 * Outputs:
 * 	Outputs code to the file.  If the codespace has filled
 *	(next_code >= (1 << n_bits), increase n_bits.
 *	If LZ_CLEAR, LZ_SOH, or LZ_STX is seen, reset n_bits
 *	to the initial value and call init_compress to reset
 *	the lookup and cache tables.
 *
 * Assumptions:
 *	Output chars are 8 bits long.  This is deeply hardwired
 *	into the algorithm.  It is independent, however, of the
 *	size of the input data.
 *
 * Algorithm:
 * 	Maintain a BITS character long buffer (so that 8 codes will
 *	fit in it exactly).  Use the VAX insv instruction to insert each
 *	code in turn.  When the buffer fills up empty it and start over.
 */
{
	/*
	 * On the VAX (Unix), it is important to have the register declarations
	 * in exactly the order given, or the asm will break.
	 */
	register int	r_off, bits;
	register char_type	*bp;
#if !vax_asm
	register code_int	r_code;
#endif

	r_off = offset;
	bits = n_bits;
	bp = buf;
	if (code >= 0) {
	    /*
	     * Not at EOF, add the code
	     */
#if DEBUG
	    if (verbose > 3) {
		fprintf(stderr, "%c%5d %5d",
		    ((col += 12) >= 72) ? (col = 0, '\n') : ' ',
		    code, next_code);
		if (code >= LZ_CLEAR && code < firstcode) {
		    fprintf(stderr, " = %s", lz_names[code - LZ_CLEAR]);
		    col = 74;
		}
	    }
#endif
#if vax_asm
	    /*
	     * VAX DEPENDENT!! Implementation on other machines may be
	     * difficult.
	     *
	     * Translation: Insert BITS bits from the argument starting at
	     * offset bits from the beginning of buf.
	     */
	    0;					/* C compiler bug ??	*/
	    asm("insv	4(ap),r11,r10,(r9)");
#else
	    /*
	     * WARNING: byte/bit numbering on the vax is simulated
	     * by the following code
	     */
	    bp += (r_off >> 3);			/* -> first output slot	*/
	    r_off &= 7;
	    /*
	     * Since code is always >= 8 bits, only need to mask the first
	     * hunk on the left.
	     */
	    r_code = code;
	    *bp = (*bp & rmask[r_off]) | (r_code << r_off) & lmask[r_off];
	    bp++;
	    bits -= (8 - r_off);
	    r_code >>= 8 - r_off;
	    /*
	     * Get any 8 bit parts in the middle ( <= 1 for up to 16 bits).
	     */
	    if (bits >= 8) {
		*bp++ = r_code;
		r_code >>= 8;
		bits -= 8;
	    }
	    if (bits != 0)				/* Last bits. */
		*bp = r_code;
#endif
	    offset += n_bits;
	    if (offset == n_bits8) {
		out_count += n_bits;
		lz_putbuf(buf, n_bits, &outstream);
#if DEBUG
		if (todump > 0) {
		    dumphex(buf, n_bits, stderr);
		    todump -= n_bits;
		}
#endif
		offset = 0;
	    }
	    /*
	     * If the next entry is going to be too big for the code size,
	     * then increase it, if possible.  Note:
	     *     !export			firstcode == LZ_FIRST
	     *	   export && block_compress	firstcode == LZ_CLEAR + 1
	     *	   export && !block_compress	firstcode == LZ_CLEAR
	     */
	    if (next_code > maxcode) {
		if (offset > 0) {
		    lz_putbuf(buf, n_bits, &outstream);
		    out_count += n_bits;
		    offset = 0;
#if DEBUG
		    if (todump > 0) {
			dumphex(buf, n_bits, stderr);
			todump -= n_bits;
		    }
#endif
		}
		n_bits++;			/* Need more bits	*/
		n_bits8 += (1 << 3);
		if (n_bits == maxbits)
		    maxcode = maxmaxcode;
		else
		    maxcode = MAXCODE(n_bits);
#if DEBUG
		if (verbose > 2) {
		    fprintf(stderr,
			"%snext_code %d, change to %d bits max %d",
			(col > 0) ? "\n" : "", next_code,
			n_bits, maxcode);
		    col = 74;
		}
#endif
	    }
	    if (code >= LZ_CLEAR && code < firstcode) {
		switch (code) {
		case LZ_SOH:
		case LZ_STX:
		case LZ_CLEAR:
		    if (offset > 0) {
			lz_putbuf(buf, n_bits, &outstream);
			out_count += n_bits;
			offset = 0;
#if DEBUG
			if (todump > 0) {
			    dumphex(buf, n_bits, stderr);
			    todump -= n_bits;
			}
#endif
		    }
		    n_bits = INIT_BITS;		/* Reset codes		*/
		    n_bits8 = INIT_BITS << 3;
		    maxcode = MAXCODE(INIT_BITS);
		    init_compress(code != LZ_CLEAR);
#if DEBUG
		    if (verbose > 2) {
			fprintf(stderr,
		    "\n(%s) Change to %d bits, maxcode %d, next_code = %d",
			    lz_names[code - LZ_CLEAR],
			    n_bits, maxcode, next_code);
			col = 74;
			todump = 32;
		    }
#endif
		    break;

		case LZ_EOR:
		case LZ_ETX:			/* Just written out	*/
		    break;

		default:
		    abort();			/* Can't happen		*/
		}
	    }
	}
	else {
	    /*
	     * At EOF, write the rest of the buffer.
	     */
	    if ((r_off = offset) > 0) {
		r_off += 7;
		r_off >>= 3;
		lz_putbuf(buf, r_off, &outstream);
		out_count += r_off;
#if DEBUG
		if (todump > 0) {
		    dumphex(buf, r_off, stderr);
		    todump -= r_off;
		}
#endif
	    }
	    offset = 0;
	    lz_flush(&outstream);		/* Flush output buffer	*/
#if DEBUG
	    if (verbose > 3 || todump > 0) {
		fprintf(stderr, "\n*EOF*\n");
		col = 0;
	    }
#endif
	}
}