Mirror 2.9 Reference Manual

Lee McLoughlin

and

Zoë Leech

1 June 1998
lmjm@icparc.ic.ac.uk
zl@icparc.ic.ac.uk

Introduction

Mirror is a package written in Perl that uses the FTP protocol to duplicate a directory hierarchy between the machine it is run on and a remote host. It avoids copying files unnecessarily by comparing the file time-stamps and file sizes before transferring. Amongst other things, it can optionally rename, compress, gzip, and split files.

Mirror was written by Lee McLoughlin <lmjm@icparc.ic.ac.uk> for use by archive maintainers but can be used by anyone wanting to transfer a lot of files via FTP.  Although originally only available on Un*x, from version 2.9 mirror will also run on Wind*ws 95 and Wind*ws NT.
 

The latest version of mirror can always be found at either:

The latest version of this guide can always be found at:

Description

Mirror is called in one of two ways (see also mirror master):

  mirror [flags] -g site:path
  mirror [flags] package-files

The first method is used to retrieve a remote file or directory into the current directory. If you are mirroring a directory it is best either to end the pathname in a slash ('/'), as this makes the remote recursive listing smaller, or to use the -r flag to suppress recursion (see -g below). The mirror.defaults file is not used.

In the second method given above, a minimal number of arguments is required and mirror is controlled by keyword=value lines read from the package files. If a file named mirror.defaults is found in either the directory containing the mirror executable or in the PERLLIB path, then it is loaded before any of the package-files. mirror.defaults normally just contains the package of keyword settings called defaults that is used to provide common defaults for all package-files. If no mirror.defaults file is found, the default settings built into mirror are used.

Each package-file is read in turn, looking for named packages.  If the package is not named defaults, then mirror will perform the following steps.

If mirror is already connected to a site other than the target site, it will disconnect from that site.  It then changes to the given local directory, creating it if necessary, and scans it to get the details of the local files that are already there.  Mirror then attempts to connect to the remote site's FTP daemon. It will then log in using the given remote_user and remote_password.  The remote directory is then scanned. Mirror does this by changing to the remote directory (remote_dir) and running the FTP LIST command, passing the flags_recursive or flags_nonrecursive options depending on the value of recursive.  Alternatively a file containing the directory listing may be retrieved (see ls_lR_file and local_ls_lR_file). Each remote pathname will have any required mappings performed on it to create a local pathname. Then any checks specified by the exclude_patt, max_days, get_newer and get_size_change keywords are applied to names of files or symlinks. max_days, get_newer and get_size_change are not applied to directories.  This creates a list of all required remote files and the local pathnames to store them in.
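The selection checks described above can be pictured with a short Python sketch. It is an illustration only, not mirror's own Perl code: the names Entry and wanted are hypothetical, and the order in which the checks are applied is an assumption. Directories, name mappings and get_missing are not modelled.

```python
import re
import time
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """Hypothetical record for one file: size in bytes, mtime in epoch seconds."""
    size: int
    mtime: float


def wanted(path: str, remote: Entry, local: Optional[Entry], *,
           get_patt: str = ".", exclude_patt: Optional[str] = None,
           max_days: int = 0, get_newer: bool = True,
           get_size_change: bool = True, now: Optional[float] = None) -> bool:
    """Return True if the remote file should be fetched (assumed check order)."""
    now = time.time() if now is None else now
    if not re.search(get_patt, path):
        return False                      # not selected by get_patt
    if exclude_patt and re.search(exclude_patt, path):
        return False                      # explicitly excluded
    if max_days > 0 and now - remote.mtime > max_days * 86400:
        return False                      # older than max_days: ignored
    if local is None:
        return True                       # no local copy yet
    if get_newer and remote.mtime > local.mtime:
        return True                       # remote copy is more recent
    if get_size_change and remote.size != local.size:
        return True                       # sizes differ
    return False
```

With the built-in defaults shown in the Keywords section (get_patt of ".", no exclude_patt, max_days of 0), a file is fetched when it is missing locally, newer remotely, or different in size.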

Local versions of all required directories are then created.  Then all required files are fetched from the remote site into their local pathnames. This is done by retrieving the file into a temporary file in the target directory. The transfer is normally done in binary mode (see vms_xfer_text).  If required the temporary file may be compressed, gzip'ed or split. The file's time-stamps are reset to match those of the remote file.  Finally the temporary file is renamed to have the correct name.
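The fetch-then-rename sequence above can be sketched in Python as follows. This is a sketch under stated assumptions, not mirror's implementation: store_atomically is a hypothetical name, the data is assumed to have been retrieved already, and the temporary name uses the .in. prefix described under Temporary Filenames. Compression and splitting are left out.

```python
import os
import tempfile


def store_atomically(data: bytes, target: str, remote_mtime: float) -> None:
    """Write data to a temporary file beside target, then rename it into place."""
    directory, name = os.path.split(target)
    tmp = os.path.join(directory, ".in." + name)   # temporary file in the target directory
    with open(tmp, "wb") as fh:                     # binary mode, as for a binary transfer
        fh.write(data)
    os.utime(tmp, (remote_mtime, remote_mtime))     # reset time-stamps to the remote values
    os.replace(tmp, target)                         # finally give the file its correct name


# Small demonstration in a scratch directory.
demo_dir = tempfile.mkdtemp()
demo_target = os.path.join(demo_dir, "README")
store_atomically(b"hello\n", demo_target, 886000000.0)
```

Because the rename is the last step, a reader of the local directory never sees a half-written file under its final name.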

Once all files have been transferred, any required symbolic links are created (where supported by your operating system) and any unnecessary pathnames in the mirror are deleted.

Unless an internal failure is detected, any error will cause the current package to be skipped and the next one tried.

Mirror can handle symbolic links but not hard links. It does not duplicate owner or group information as usually this is meaningless over a network (but see user and group). If you require any of these options and you are on Un*x use rdist(1) instead.

Mirror was written to mirror remote Un*x archives, but has grown (like Topsy).

Flags

Although mirror has a large number of command line flags most should only really be used when doing a very simple mirror as a one-time event.  If you intend to maintain a mirror area it is much better to put all the details into a mirror package file and then run mirror on that file.

The only flags you should use often are -n and, if you like to see what mirror is up to, -d.
 
-d  Enable debugging. If this argument is given more than once (e.g. -d -d) the debugging level will increase. Currently the maximum useful level is four.
-n  Do nothing except compare local and remote directories; no file transfers are done. Sets debug level to two, so that you are shown a trace of what would be done.
-g site:path  Get all files matching path, which is a regexp, on the given site. If path matches .*/.+ (e.g. /fred or /fred/bloggs) then the part up to the last / is the name of the directory and everything after the last / is the pattern of filenames to get. If path ends with / then it is the name of a directory and all its contents are retrieved.  One note of caution: if you use host:/fred, a full directory listing of / on the remote host will be done. If all you wanted was the contents of the directory /fred then specify host:/fred/.
-p package  When using multiple package files only mirror the given package. This option may be given multiple times, in which case all the given packages will be mirrored. Without this option, all packages will be mirrored. The package argument following -p is a regexp that is matched against the package names.
-R package  Similar to -p but skips all packages until it reaches the given package. Useful for restarting failed mirror runs from where they left off.
-F  Use temporary dbm files for the information about files. This is useful if you mirror a very large directory.  See the variable use_files.
-r  Equivalent to -k recursive=false
-v  Print the version details of mirror and exit.
-T  Do not do any file transfers just force the time-stamps of any local files to be reset to be the same as the remote files. Normally only used when initialising a mirror that already contains files retrieved another way (e.g. from CDROM).
-Ufilename  Record all files transferred by mirror into the given filename. Remember that mirror changes into local_dir to do its work, so it should be a full pathname. If no filename is given, it defaults to upload_log.day.month.year.
-k key=value  Override any default key/value.  See below.
-m  Equivalent to -k mode_copy=true 
-t  Equivalent to -k text_mode=true
-f  Equivalent to -k force=true
-s site  Equivalent to -k site=site
-u user  Equivalent to -k remote_user=user. You are then prompted for a password, with echo turned off. The password is used as the remote_password.
-L  Just generate a pretty printed version of the input and exit.
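The way the -g argument is divided into site, directory and filename pattern can be sketched in Python. The function name split_get_arg is hypothetical, and the handling of a path with no slash at all is an assumption, since the description above does not cover that case.

```python
def split_get_arg(arg: str):
    """Split a -g style 'site:path' into (site, directory, filename pattern)."""
    site, _, path = arg.partition(":")
    directory, slash, pattern = path.rpartition("/")
    if slash and pattern:
        # path looks like .*/.+ : list the directory, fetch names matching pattern
        return site, directory + "/", pattern
    # path ends with / (or has no / at all, an assumed case): fetch everything
    return site, path, "."
```

Under this reading host:/fred yields the directory / with the pattern fred, which is why the whole of / gets listed, while host:/fred/ yields the directory /fred/ with a match-everything pattern.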

Package Files

Each group of keywords defines how to mirror a particular package and should begin with a unique package line. The package name is used in report generation and by the -p argument, so pick something mnemonic. The minimum needed for each package is package, site, remote_dir and local_dir. On finding a package line, all the default values are reset to the values from the defaults package (or to the built-in values if defaults has not been set).  A package ends at either the next package statement or at the end of file.

Package files are parsed as a series of statements. Blank lines and lines beginning with a hash are ignored. Each statement is of the form

  keyword=value

or

  keyword+value

You can add whitespace before the keyword and the equals/plus. Everything immediately following the equals/plus is the value, including any leading or trailing whitespace. The equals version sets the keyword to this value, while the plus version concatenates the value onto the end of the existing value (normally set in the defaults package).

A statement can be continued over multiple lines by ending every line except the last with an ampersand ('&'). The line following the ampersand is appended to the current line with all leading whitespace removed.
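The parsing rules above can be made concrete with a small Python sketch. It is a hypothetical reader, not mirror's Perl parser: the names join_continuations and parse_packages are invented for illustration, and it assumes that a comment line may be indented (as in the samples in this manual) and that continuations are joined before comments are skipped.

```python
import re

# keyword, then '=' (set) or '+' (append), then the value taken verbatim
STATEMENT = re.compile(r"^\s*(\w+)\s*([=+])(.*)$")


def join_continuations(lines):
    """Merge each line ending in '&' with the next line, minus its leading whitespace."""
    joined, pending = [], None
    for line in lines:
        if pending is not None:
            line = pending + line.lstrip()
            pending = None
        if line.endswith("&"):
            pending = line[:-1]
        else:
            joined.append(line)
    if pending is not None:
        joined.append(pending)
    return joined


def parse_packages(text, builtin=None):
    """Return {package name: settings}; the defaults package seeds later packages."""
    defaults = dict(builtin or {})
    packages, current = {}, None
    for line in join_continuations(text.splitlines()):
        if not line.strip() or line.lstrip().startswith("#"):
            continue                       # blank line or comment
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"not a statement: {line!r}")
        key, op, value = match.groups()
        if key == "package":
            if value == "defaults":
                current = defaults         # later packages start from these values
            else:
                current = dict(defaults, package=value)
                packages[value] = current
            continue
        if current is None:
            raise ValueError("keyword found before any package line")
        current[key] = value if op == "=" else current.get(key, "") + value
    return packages
```

For instance, a defaults package setting local_dir=/public/ followed by a package containing local_dir+gnu gives that package a local_dir of /public/gnu.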

Although there are a lot of keywords that can be set, the built-in defaults will handle most cases. Normally only package, site, remote_dir and local_dir need to be set.

Setting Defaults

If the package name is defaults, then no site is contacted, but the default values given for any keywords are changed. Normally all the defaults are in the file mirror.defaults which will be automatically loaded before any package files (see Description).
# Sample mirror.defaults
package=defaults
        # The LOCAL hostname - if not the same as `hostname` returns
        # (I advertise the name sunsite.org.uk but the machine is
        #  really swallow.doc.ic.ac.uk.)
        hostname=sunsite.org.uk
        # Keep all local_dirs relative to here
        local_dir=/public/
        remote_password=wizards@sunsite.org.uk

Keywords

The following is a list of all the available keywords and the default values built into mirror.  To change these defaults it is usually best to change your mirror.defaults file.
 
The keywords are grouped into the following sections:  
Required Keywords 
keyword default Description
package none A name for the package to be mirrored.  Should be different from all other package names you use.
site none Hostname or IP address of the remote site to mirror from.
remote_dir none Remote directory to mirror. See also recurse_hard.
local_dir none Local directory.
 
FTP Related 
keyword default Description
remote_user anonymous Username to use at remote site.
remote_password localuser@localhostname Password to use at remote site.  Note: localuser will be your user name and localhostname will be the name of the local machine (if it can be found, see hostname).
remote_account none Account name/password to use at remote site, after logging in anonymously (for systems that require it).
remote_group none If present set the remote 'site group'. 
remote_gpass none If present set the remote 'site gpass'. 
timeout 40 Timeout FTP requests after this many seconds. 
failed_gets_excl none Regexp of error messages to skip reporting, when the FTP GET command fails.  (E.g. permission denied.)
ftp_port 21 Port number of remote FTP daemon. 
proxy false Set to true to use proxy FTP service. 
proxy_ftp_port 4514 Port number of proxy-service FTP daemon. This value should be changed depending on which proxy library you are using. 
proxy_gateway internet-gateway Name of proxy-service, may also be supplied by the environment variable INTERNET_HOST
using_socks false Set to true if you are using a SOCKS version of Perl
passive_ftp false Set to true if you want to use the PASV extension of the FTP protocol. Especially useful with firewalls, other proxy FTP servers, and the variable using_socks
retry_call true If initial connect fails, retry ONCE after ONE minute. This is to handle sites which reverse lookup the incoming host but sometimes timeout on the first attempt. 
disconnect false Disconnect from remote site at end of package.  Normally only disconnects if the next package specifies a different site.  (Some sites will not let you change to certain directories except when first connecting in.)
remote_idle none If set try and set the remote idle timer to this.
 
File Copying 
keyword default Description
get_patt . Regexp of remote pathnames to retrieve.
exclude_patt none Regexp of remote pathnames to ignore.
local_ignore none Regexp of local pathnames to ignore. Useful to skip restricted local directories.
get_newer true Get the remote file if it is more recent than the local file.
get_size_change true Get the file if the size is different from local. If the file is to be compressed after being fetched get_size_change is automatically set to false.
make_bad_symlinks false If true, symlinks will be made to invalid (non-existent) pathnames. (In older versions of mirror this defaulted to true.)
follow_local_symlinks none Regexp of pathnames of local symbolic links.  Rather than treating them as symlinks the target files or directories they reference are used instead. This makes local symlinks invisible to mirror.
get_missing true Really get files. When set to false, only deletions and symlinking will be done. Used to delete expired files older than max_days without retrieving older files.
get_file true Get files.  If set to false mirror will try to put files.
text_mode false If true, all files are transferred in TEXT mode. Un*x prefers binary so that is the default.
strip_cr false Strip carriage returns from any file as it is retrieved.
vms_keep_versions true When mirroring VMS files, keep the version numbers. If false, the versions are stripped off and only the base filenames are kept.
vms_xfer_text (readme|info|listing|\.c)$ Pattern of VMS files to transfer in TEXT mode (case insensitive).
name_mappings none Remote to local pathname mappings (a Perl substitute command, e.g. s:old:new:).
external_mapping none Specifies a file that should contain a Perl module called extmap containing at least a function called map.  This function is used as the name_mappings function.
update_local false Set get_patt to be all the files and directories already present in local_dir.
max_days 0 If >0, ignore files older than this many days.  Any ignored files will not be transferred or deleted.
max_size 0 If >0, do not transfer any files any larger than this many bytes.
chmod true By default try and set the file attributes (e.g. time-stamps) of the copied file.  If false do not set attributes. 
 
Local File Attributes 
keyword default Description
user none User name or uid to give to local pathnames.
group none Group name or gid to give to local pathnames.
mode_copy false Flag indicating if we need to copy the file/dir modes.  If this is false then file_mode and dir_mode will be used instead.
file_mode 0444 Mode to give files created locally if mode_copy is false.
dir_mode 0755 Mode to give directories created locally if mode_copy is false.
force false If true, all files will be transferred regardless of the results from size or time-stamp comparisons.
umask 07000 Do not create setuid files by default (see chmod(1) on Un*x).
use_timelocal true Time-stamp files to local time zone. If false, the time zone is set to GMT (older versions of mirror had a bug setting all files to GMT).
force_times yes Force local times to match remote times.
 
File Deletion 
keyword default Description
do_deletes false Delete destination files if not in source tree.
delete_patt . Regexp of local pathnames to check for deletions. Names that are not matched are not checked. The match by delete_excl is done to all files selected by this pattern.
delete_get_patt false Set delete_patt to be get_patt.
delete_excl none Regexp of local pathnames that mirror will not delete.
max_delete_files 10% If this is set to just a number and there are more than this many files to delete, do not delete just warn. If this is set to number% and the percentage of files that would be deleted is greater than the number, do not delete just warn.
max_delete_dirs 10% As max_delete_files except applies to directories.
save_deletes false Instead of deleting local files move them into save_dir
save_dir Old Where local files no longer on remote site are moved to.  Either begins with / or is relative to local_dir.  Only used when save_deletes is true.
store_remote_listing none Local pathname where remote listings are kept. Useful if you have a slow network or want to perform several operations on the same package without retrieving the index every time.
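The two forms accepted by max_delete_files and max_delete_dirs (a plain number or number%) can be illustrated with a short Python sketch. The function name deletion_allowed is hypothetical, and taking the percentage relative to the total number of local files is an assumption; the table above does not say what the percentage is measured against.

```python
def deletion_allowed(limit: str, to_delete: int, total: int) -> bool:
    """Return True if deleting to_delete items stays within the limit.

    limit is either a count such as "20" or a percentage such as "10%".
    total is the number of items the percentage is taken of (an assumption).
    """
    limit = limit.strip()
    if limit.endswith("%"):
        if total == 0:
            return True                       # nothing present, nothing to protect
        return 100.0 * to_delete / total <= float(limit[:-1])
    return to_delete <= int(limit)
```

When the function returns False, the behaviour described above is to warn and leave the files in place.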
 
File Compression 
keyword default Description
compress_patt none Regexp of files to compress before storing locally. See get_size_change.
compress_excl \.(z|gz)$ Regexp of files not to compress (case insensitive).
compress_prog compress Program to compress files. If set to the word compress or gzip, the full pathname for the program and correct compress_suffix will automatically be set. When using gzip, level -9 is used. Note that compress_suffix can be reset to a non-standard value by setting it after compress_prog.
compress_suffix none Character(s) the compress program appends to files. If compress_prog is compress, this defaults to .Z. If compress_prog is gzip, this defaults to .gz.
compress_conv_patt (\.Z|\.taz)$ If compress_prog is gzip, files matching this pattern are uncompressed and gzip'ed before storing locally. Compression conversion is only meant to do compress to gzip conversion.
compress_conv_expr s/\.Z$/\.gz/; s/\.taz$/\.tgz/ Perl expression to convert suffix from compress to gzip style. Change .Z to .gz and .taz to .tgz.
compress_size_floor 0 Do not compress files smaller than this size, in bytes.
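The effect of the default compress_conv_patt and compress_conv_expr values on a filename can be shown with a Python equivalent of the two Perl substitutions. This is only a sketch of the renaming step: convert_suffix is a hypothetical name, and the actual uncompressing and gzip'ing is not shown.

```python
import re

# Python rendering of the default compress_conv_patt
COMPRESS_CONV_PATT = re.compile(r"(\.Z|\.taz)$")


def convert_suffix(name: str) -> str:
    """Apply the default compress_conv_expr renaming: .Z to .gz and .taz to .tgz."""
    if not COMPRESS_CONV_PATT.search(name):
        return name                          # not selected for conversion
    name = re.sub(r"\.Z$", ".gz", name)
    return re.sub(r"\.taz$", ".tgz", name)
```

A name such as mirror.tar.Z becomes mirror.tar.gz, while a name that is already gzip style is returned unchanged.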
 
File Splitting 
keyword default Description
split_max 0 If >0 and the size of the file is greater than this many bytes, the file is split up to be stored locally (filename must also match split_patt).  The name of the file being split up is used as the directory name and each part is stored in a file called part1, part2... in that directory.
split_patt none Regexp of remote pathnames to split up before storing locally.
split_chunk 102400 Size, in bytes, of chunks to split files into.
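The layout produced by splitting can be sketched in Python. This is an illustration under assumptions, not mirror's code: store_split is a hypothetical name, the file contents are assumed to be in memory already, and the split_patt check is left out.

```python
import os
import tempfile


def store_split(data: bytes, target: str, split_max: int = 0,
                split_chunk: int = 102400):
    """Store data at target, splitting it into part1, part2... if it exceeds split_max."""
    if split_max <= 0 or len(data) <= split_max:
        with open(target, "wb") as fh:        # small enough: store as a single file
            fh.write(data)
        return [target]
    os.makedirs(target, exist_ok=True)        # the filename becomes a directory name
    parts = []
    for number, start in enumerate(range(0, len(data), split_chunk), start=1):
        part = os.path.join(target, f"part{number}")
        with open(part, "wb") as fh:
            fh.write(data[start:start + split_chunk])
        parts.append(part)
    return parts


# Small demonstration: 250 bytes, split above 200 bytes into 100-byte chunks.
demo_dir = tempfile.mkdtemp()
demo_parts = store_split(b"x" * 250, os.path.join(demo_dir, "big.tar"),
                         split_max=200, split_chunk=100)
```

In the demonstration the directory big.tar ends up holding part1 and part2 of 100 bytes each and part3 of 50 bytes.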
 
Directory Listings 
keyword default Description
remote_fs unix File store type. Currently can be one of unix, dls, netware, vms, dosftp, macos, dosish, lsparse and infomac. See the Filestores section for more details.
ls_lR_file none Remote file containing ls-lR (result of running ls -lR on that machine), otherwise run remote ls command.
local_ls_lR_file none Local file containing ls-lR, otherwise use remote ls_lR_file. This is useful when first mirroring a large package.
recursive true Mirror both the contents of local_dir and sub directories of local_dir.
recurse_hard false Generate remote ls by doing CWD and ls for each sub directory. In this case remote_dir must be absolute (begin with a /) not relative. Use the CWD command in FTP to find the path for the start of the remote archive area. (Not available if remote_fs is VMS.)
flags_recursive -lRat Flags to send to remote ls to do a recursive listing.
flags_nonrecursive -lat Flags to send to remote ls to do a non-recursive listing.
ls_fix_mappings none Edit pathnames in remote directory listings (a Perl substitute command, e.g. s:/usr/spool/pub:/:).
 
Logging 
keyword default Description
update_log none Filename, relative to local_dir, where mirror will write a report of all it does to maintain a package.
mail_to none Mail a log of the work done to this comma separated list of addresses (currently only supported on Un*x).
mail_prog none Program called to send to the mail_to list. May be passed the argument mail_subject. Defaults to mailx, Mail, or mail. (Not supported under Wind*ws.)
mail_subject -s "mirror update" This can contain $keyword.  These will be replaced by the current value for that keyword (e.g.: -s "mirror update: $package")
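The $keyword replacement described for mail_subject can be sketched in Python. The function name expand_keywords is hypothetical, and leaving an unknown $name untouched is an assumption rather than documented behaviour.

```python
import re


def expand_keywords(template: str, settings: dict) -> str:
    """Replace each $keyword in template with its current value from settings."""
    def substitute(match):
        # Unknown keywords are left as they are (an assumption).
        return str(settings.get(match.group(1), match.group(0)))
    return re.sub(r"\$(\w+)", substitute, template)
```

Given the settings of a package named gnu, the template -s "mirror update: $package" expands to -s "mirror update: gnu".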
 
Special 
keyword default Description
hostname none Mirror automatically skips packages whose site variable matches this host. Defaults to the local hostname.  This is normally only ever set in the defaults package.  Useful if you are sharing mirror package files with others.
comment none Used in reports.
use_files false Put the associative arrays that mirror uses into temporary files (currently only supported on Un*x).  The files are created in /var/tmp with names: local_map and remote_map.  The suffixes will depend on which DBM library was set as default when Perl was installed on your machine.
interactive false A non-batch transfer. Implied by -g flag.
skip none If set causes this package to be skipped.  The value is reported as the reason for skipping.
verbose false Verbose messages.
algorithm 0 Sets the basic algorithm that mirror uses. 

Algorithm=0 mirrors an entire site at a time.  This is very friendly on the remote site as it uses few of its resources.  However it can chew up a lot of memory on the local machine. 

Algorithm=1 mirrors a site directory-by-directory.  Should ONLY be used for true mirrors (i.e. no differences between this mirror copy and the original). This uses up a lot less local resources. However it is very unfriendly to the remote site as it requires the remote site to run an ls command for each directory mirrored.  Mirror will only "see" the one directory it is mirroring, so it will not know that files outside this directory exist; symlinks pointing outside this directory are therefore considered bad, see make_bad_symlinks.  Deletions are done on a directory by directory basis so be extra careful about the settings of max_delete_files and max_delete_dirs.  get_patt is applied to just the filename in this directory not the full path, as are other name checks. You will almost certainly need to set remote_dir to be an absolute pathname (beginning with /).

local_dir_check false If true and the local_dir does not exist, skip this package.  By default the local_dir will be created if it does not already exist.

Filestores

Mirror uses the remote directory listing to work out what files are available. Mirror was originally targeted at connecting to Un*x FTP daemons using a standard ls command. To use a Un*x host with a non-standard ls or a non Un*x host it is necessary to set the remote_fs variable to match the kind of directory listing that will be returned. There is some interaction between remote_fs and other variables, in particular flags_nonrecursive, recurse_hard and get_size_change. The following sections show examples of the results of running the FTP DIR command on the various kinds of archive and recommendations for related variables. With some unusual archive set-ups you may have to vary from the recommended variable settings.

remote_fs=unix

total 65
-rw-r--r-- 1 nobody nobody   2245 Jan 28 20:06 README
-rw-r--r-- 1 nobody nobody  45881 Jan 29 19:13 mirror.html
This is the default and you should not normally have to reset any other related variables.
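To show what working out the available files from such a listing involves, here is a Python sketch that picks apart one line of the Un*x style listing above. It is not mirror's parser: LS_LINE and parse_unix_ls_line are hypothetical names, and the pattern assumes the exact column layout of the sample (link count, owner, group, size, date, name). Real listings vary, for example by omitting the group column or by showing symlink targets.

```python
import re

# One regexp for the sample layout: mode, links, owner, group, size, date, name
LS_LINE = re.compile(
    r"^(?P<mode>[-dl][-rwxsStT]{9})\s+"
    r"(?P<links>\d+)\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+"
    r"(?P<size>\d+)\s+"
    r"(?P<date>[A-Z][a-z]{2}\s+\d{1,2}\s+[\d:]{4,5})\s+"
    r"(?P<name>.+)$"
)


def parse_unix_ls_line(line: str):
    """Return a dict describing one listing line, or None if it is not a file entry."""
    match = LS_LINE.match(line)
    if match is None:
        return None                          # e.g. the 'total 65' header line
    entry = match.groupdict()
    entry["size"] = int(entry["size"])
    entry["is_dir"] = entry["mode"].startswith("d")
    return entry
```

Lines that do not fit the layout, such as the total line, are simply skipped by returning None.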

remote_fs=dls

00index.txt      189916  
0readme            5793  
1_x/                  =  OS/2 1.x-specific files
This is an ls variant used on some Un*x archives. It provides descriptions of known items in the listing. Set flags_recursive to -dtR.

remote_fs=netware

- [R----F--] jrd                  1646       May 07 21:43    index
d [R----F--] jrd                   512       Sep 09 10:52    netwire
d [R----F--] jrd                   512       Sep 02 01:31    pktdrvr
d [RWCE-F--] jrd                   512       Sep 04 10:55    incoming
or
-[R----F--] 1 jrd                  1646       May 07 21:43    index
d[R----F--] 1 jrd                   512       Sep 09 10:52    netwire
d[R----F--] 1 jrd                   512       Sep 02 01:31    pktdrvr
This is used by Novell archives. Set recurse_hard to true and set flags_nonrecursive to be nothing. See also remote_dir.

remote_fs=dosftp

00-index.txt  6,471 13:54  7/20/93   alabama.txt   1,246 23:29  5/08/97
alaska.txt      873 23:29  5/08/92   alberta.txt   2,162 23:29  5/08/97
dosftp is for an FTP daemon on D*S boxes. Set recurse_hard to true and set flags_nonrecursive to nothing. See also remote_dir.

remote_fs=macos

-------r--      0      127   127 Aug 27 13:53 !Gopher Links
drwxrwxr-x          folder    32 Sep  9 16:30 FAQ
drwxrwx-wx          folder     0 Sep  9 09:59 incoming
macos is for one of the Macintosh FTP daemon variants. Although the output is similar to Un*x, the unix remote_fs type cannot cope with it because there are three file sizes for each file. Set recurse_hard to true, flags_nonrecursive to nothing, get_size_change to false and compress_patt to nothing (this last setting is due to the unusual file names upsetting the shell used to run compress). See also remote_dir.

remote_fs=vms

USERS:[ANONYMOUS.PUBLIC]

1-README.FIRST;13     9  14-JUN-1993 13:09 [ANONYMOUS] (RWE,RWE,RE,RE)
PALTER.DIR;1          1  18-JAN-1993 11:56 [ANONYMOUS] (RWE,RWE,RE,RE)
PRESS-RELEASES.DIR;1
                      1  11-AUG-1992 20:05 [ANONYMOUS] (RWE,RWE,,)
alternatively:
[VMSSERV.FILES]ALARM.DIR;1      1/3          5-MAR-1993 18:09
[VMSSERV.FILES]ALARM.TXT;1      1/3          4-FEB-1993 12:20
Set flags_recursive to '[...]' and get_size_change to false. recurse_hard is not available with VMS. See also the vms_keep_versions and vms_xfer_text variables.
 

remote_fs=infomac

-r     1974 Jul 21 00:06 00readme.txt
lr        3 Sep  8 08:34 AntiVirus -> vir
This is a special case just meant to handle the sumex-aim.stanford.edu info-mac directory listing stored on that archive in help/all-files. recurse_hard should be set to true.

remote_fs=dosish

This is for a D*S/Wind*ws FTP server with faintly DOS-like output:
03-04-94  08:45PM       <DIR>          .
03-04-94  08:45PM       <DIR>          ..
03-04-94  09:58AM                 9718 Conduit
03-04-94  09:59AM                 8745 Eve
recurse_hard should be set to true and flags_nonrecursive to nothing.

remote_fs=lsparse

Allow reparsing of the listing generated by mirror with debugging turned to a high level. Meant only for mirror wizards.

Examples

Here is the mirror.defaults file from the archive on sunsite.org.uk:
# This is the default mirror settings used by my site:
# sunsite.org.uk (193.63.255.4)

package=defaults
        # The LOCAL hostname - if not the same as `hostname`
        # (I advertise the name sunsite.org.uk but the machine is
        #  really swallow.sunsite.org.uk)
        hostname=sunsite.org.uk
        # Keep all local_dirs relative to here
        local_dir=/public/Mirrors
        remote_password=wizards@sunsite.org.uk
        mail_to=
        # Don't mirror file modes.  Set all dirs/files to these
        dir_mode=0755
        file_mode=0444
        # By default, files are owned by root.zero
        user=0
        group=0
#       # Keep a log file in each updated directory
#       update_log=.mirror
        update_log=
        # Don't overwrite my mirror log with the remote one.
        # Don't retrieve any of their mirror temporary files.
        # Don't touch anything whose name begins with a space!
        # nor any FSP or gopher files...
        exclude_patt=(^|/)(\.mirror$|\.in\..*\.$|MIRROR.LOG|#.*#|\.FSP|\.cache|\.zipped|lost\+found/|\ )
        # Try to compress everything
        compress_patt=.
        compress_prog=compress
        # Don't compress information files, files that don't benefit from
        # being compressed, files that tell ftpd, gopher, wais... to do things,
        # the sources for compression programs...
        # (Note this is the only regexp that is case insensitive.)
        compress_excl+|^\.notar$|-z|\.gz$|\.taz$|\.tar.Z|\.arc$|\.zip$|\.lzh$|\.zoo$|\.exe$|\.lha$|\.zom$|\.gif$|\.jpeg$|\.jpg$|\.mpeg$|\.au$|read.*me|index|\.message|info|faq|gzip|compress
        # Don't delete own mirror log or any .notar files (incl in subdirs)
        delete_excl=(^|/)\.(mirror|notar)$
        # Ignore any local readme files
        local_ignore=README.doc.ic
        # Automatically delete local copies of files that the
        # remote site has zapped
        do_deletes=true
Here are some sample package descriptions:
package=gnu
        comment=Powerful and free Un*x utilities
        site=prep.ai.mit.edu
        remote_dir=/pub/gnu
        # Local_dir+ causes gnu to be appended to the default local_dir
        # so making /public/gnu
        local_dir+gnu
        exclude_patt+|^ListArchives/|^lost\+found/|^scheme-7.0/|^\.history
        # I tend to only keep the latest couple of versions of things
        # this stops mirror from retrieving the older versions I've removed
        max_days=30
        do_deletes=false

package=X11R6
        comment=X Windows (windowing graphics system for Un*x)
        site=ftp.x.org
        remote_dir=/pub/R6
        local_dir+ftp.x.org/pub/R6
        # This is a local symlink to the free-for-all contrib area
        # and is mirrored elsewhere
        local_ignore=^contrib$
        # Don't compress a thing.  It is already compressed 
        # but doesn't look it.
        compress_patt=


# THIS IS JUST A TEST
package=test vms site
        site=vmsbox.somewhere.ac.uk
        local_dir=/tmp/copy4
        remote_dir=vmsserv/files
        remote_fs=vms
        # Must do these settings for VMS
        flags_recursive=[...]
        get_size_change=false

# and on, and on ...

Temporary Filenames

By default when mirror creates a temporary filename it takes the real filename and puts .in. at the start.
If your system limits the length of a filename a lot (some older Un*xes were limited to 14 characters) then look for:
  LIMITED NAMELEN
which is about 75% of the way through mirror.pl, for a note on how to reduce temporary filename length.  I only know of one site using this.

Regular Expressions

This is a short explanation of regular expressions.  For a more comprehensive guide see the Perl manual pages or the O'Reilly book "Mastering Regular Expressions".

A regular expression, or regexp, is a way of matching patterns in text strings.  For example the regexp:

  ^s

would match any string that begins with an s.  The ^ is a special character that means beginning of string.  There are a number of specials possible in a regexp; everything that is not special is taken as a literal character, such as the s in the example above.  To turn off a special character put a backslash, \, in front of it.  This only affects the special character immediately following it.

A word of warning: although regexps look very similar to Un*x shell (and D*S COMMAND) wildcards, there are differences.  For example, Un*x and D*S would both treat *.ZIP as any filename ending in .ZIP, but *.ZIP as a regular expression is an error!  The * is a special that must follow something (see below).

Regexp Specials

^ beginning of string
$ end of string
. any character
[r] a range of characters, either as a list (abcef) or as a hyphen-separated range (a-f)
[^r] anything not in the given list or range
(p1|p2|p3...) patterns p1 or p2 or p3 ... (the patterns may be specials)
* zero or more of the preceding item (which may be a special)
+ one or more of the preceding item (which may be a special)
\d any digit (same as [0-9])
\D any non-digit (same as [^0-9])
\s any whitespace character
\S any non-whitespace character

Regexp Examples

abc matches abc, also xxxabcyyy but not xabbcy 
^abc$ matches only abc
a.*z matches a, then any string, then z; e.g. asdkjfhaksdjfhz
index.html matches index.html and also indexXhtml and index/html (. matches any character)
index\.html matches index.html (the backslash stops . matching any character)
[rR][eE][aA][dD][mM][eE] matches readme, Readme, README ...
\.(gz|Z)$ matches strings ending in .gz or .Z
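The examples above can be checked mechanically. The sketch below uses Python's re module rather than Perl, on the assumption (true for these basic specials) that the two behave alike; the names EXAMPLES, check and is_valid are hypothetical. The last two entries illustrate why a literal + in a name such as lost+found needs a backslash.

```python
import re

# (regexp, text, whether the regexp is expected to match somewhere in text)
EXAMPLES = [
    (r"abc", "xxxabcyyy", True),
    (r"abc", "xabbcy", False),
    (r"^abc$", "abc", True),
    (r"^abc$", "xabc", False),
    (r"a.*z", "asdkjfhaksdjfhz", True),
    (r"index.html", "indexXhtml", True),
    (r"index\.html", "indexXhtml", False),
    (r"[rR][eE][aA][dD][mM][eE]", "Readme", True),
    (r"\.(gz|Z)$", "mirror.tar.gz", True),
    (r"\.(gz|Z)$", "mirror.zip", False),
    (r"lost+found", "lost+found", False),   # + means one or more of the t
    (r"lost\+found", "lost+found", True),   # backslash makes + literal
]


def check(examples) -> bool:
    """Return True if every regexp matches (or fails to match) as expected."""
    return all(bool(re.search(patt, text)) == expected
               for patt, text, expected in examples)


def is_valid(pattern: str) -> bool:
    """Return True if pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

The shell wildcard *.ZIP is rejected by is_valid because the * has nothing to repeat, whereas the regexp \.ZIP$ expresses the intended meaning.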

Hints

When adding a new package, first test it by running mirror with the -n option.

If you are adding to an existing archive that was not created by mirror (perhaps you copied the files from a CDROM) then it is usually best to force the time-stamps of the existing local files so time comparisons with the remote files show the files as identical (see -T).

Try and keep all packages that are being retrieved from the same site together in the same package file. That way mirror will only have to login once.

Remember that all regexps are Perl regular expressions.

If the remote site contains symlinks that you want to "flatten out" into the corresponding files, then do this by changing the flags passed to the remote ls (either flags_recursive or flags_nonrecursive) to include L.  First test this by trying an ls -lRatL on the remote site under the FTP command to check whether the remote filestore has any symlink loops.  These cause ls to go into an infinite loop; if this happens you will have to talk to the manager of the remote area about removing them.

If you are mirroring a very large site that changes infrequently, add max_days=7 to the settings after it is initially mirrored. That way mirror will only have to consider recent files when updating. Then once a week, or whenever necessary, call mirror with -k max_days=0 to force a full update.

If you don't want to compress anything from the remote site, the easiest way is to set compress_patt to nothing (compress_patt=).

If you want to run a command at the end of mirroring a package, a useful trick is to set the mail_prog variable to the program name and mail_to to its arguments.

For netware, dosftp, macos and VMS you should normally set remote_dir to be the home directory of the remote FTP daemon. Connect manually and, before changing directory, use the pwd command to find where home is. If you are only mirroring part of the tree, give the full pathname including this home directory at the start.

macos names can sometimes contain characters that make it hard to pass them through Un*x shells. Since compressing files is done via a shell it would be best to turn off compression with compress_patt=

macos files seem to always change size when transferred, in either binary or text mode, so it would be best to set get_size_change=false

Netiquette

If you are going to mirror a remote site, please obey any restrictions that the site administrators place on access. You can generally find the restrictions on connecting to the archive using the standard FTP command. Any restrictions are normally given as a login banner or in a (hopefully) obvious file.

Here are what I hope are some good general rules:

You should probably get permission from the remote site before setting up a mirror of it.  Some sites require detailed logs.  Unauthorised mirrors would take traffic from the site generating the logs and so ruin their statistics.  There may also be SERIOUS LEGAL REASONS why mirrors are unwanted.

Only mirror a site well outside the working hours of both the local and remote sites.

It is probably unfriendly to try to mirror a remote site more than once a day.

Before trying to mirror a remote site, try to find the packages you want in local archives, as no one will be pleased if you soak up a lot of network bandwidth needlessly.

If you have a local archive, then tell people about it so they don't have to waste bandwidth and CPU at the remote site.

Do remember to check your package-files from time to time in case the remote archive has changed its access restrictions.
 

Bugs

Some of the netiquette guidelines should be enforced.

Should be able to cope with links as well as symlinks.

Suffers from creeping featurism. (Actually more like galloping featurism!)

If you are using Perl 4 (Perl 5 users skip this):

There seems to be a problem with older versions of Perl that causes mirror to fail with the message 'fstype unix unknown'. If you experience this then please upgrade your Perl to 5.004 or better.
 

Remember!

Objects in a mirror are closer than you think!

Author

Mirror was written by Lee McLoughlin <lmjm@icparc.ic.ac.uk>. It uses a heavily rewritten and extended version of the ftp.pl package originally by Alan R. Martello <al@ee.pitt.edu>, which uses lchat.pl, which is based on the chat2.pl package by Randal L. Schwartz <merlyn@ora.com>.

Special thanks to the following people for patches, comments and other suggestions that have helped to improve mirror. If I have omitted anyone, please contact me.

Zoë Leech <zl@icparc.ic.ac.uk>
James Revell <revell@uunet.uu.net>
Chris Myers <chris@wugate.wustl.edu>
Amos Shapira <amoss@cs.huji.ac.il>
Paul A Vixie <vixie@pa.dec.com>
Jonathan Kamens <jik@pit-manager.mit.edu>
Christian Andretzky <casys@otto.mb3.tu-chemnitz.de>
Kean Stump <kean@ucs.orst.edu>
Anita Eijs <anita@hermes.bouw.tno.nl>
Simon E Sperro <S.E.Sperro@gdr.bath.ac.uk>
Aaron Wohl <aw0g+@andrew.cmu.edu>
Michael Meissner <meissner@osf.org>
Michael Graff <explorer@iastate.edu>
Bradley Rhoades <us267388@mail.mmmg.com>
Edwards Reed <eer@cinops.xerox.com>
Joachim Schrod <schrod@iti.informatik.th-darmstadt.de>
David Woodgate <David.Woodgate@mel.dit.csiro.au>
Pieter Immelman <pi@itu1.sun.ac.za>
Jost Krieger <x920031@bus072.rz.ruhr-uni-bochum.de>
Erez Zadok <ezk@cs.columbia.edu>
 

Copyright

Mirror, both the software and all the accompanying documentation including this document, is under the following copyright.

Copyright © 1990 - 1998 Lee McLoughlin

Permission to use, copy, and distribute this software and its documentation for any purpose with or without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation.

Permission to modify the software is granted, but not the right to distribute the modified code. Modifications are to be distributed as patches to released version.

This software is provided "as is" without express or implied warranty.