WWWOFFLE - World Wide Web Offline Explorer - Version 2.7a
=========================================================

The WWWOFFLE programs simplify World Wide Web browsing from computers
that use intermittent (dial-up) connections to the internet.


Description
-----------

The WWWOFFLE server is a proxy web server with special features for use
with dial-up internet links.  This means that it is possible to browse
web pages and read them without having to remain connected.

Basic Features
    - Caching of HTTP, FTP and finger protocols.
    - Allows the 'GET', 'HEAD', 'POST' and 'PUT' HTTP methods.
    - Interactive or command line control of online/offline/autodial
      status.
    - Highly configurable.
    - Low maintenance, start/stop and online/offline status can be
      automated.

While Online
    - Caching of pages that are viewed for later review.
    - Conditional fetching to only get pages that have changed.
        - Based on expiration date, time since last fetched or once
          per session.
    - Non cached support for SSL (Secure Socket Layer e.g. https).
    - Can be used with one or more external proxies based on web page.
    - Control which pages cannot be accessed.
        - Allow replacement of blocked pages.
    - Control which pages are not to be stored in the cache.
    - Requests compressed pages from web servers (compile time option).

While Offline
    - Can be configured to use dial-on-demand for pages that are not
      cached.
    - Selection of pages to download next time online
        - Using normal browser to follow links.
        - Command line interface to select pages for downloading.
    - Control which pages can be requested when offline.
    - Provides non-cached access to intranet servers.

Automated Download
    - Downloading of specified pages non-interactively.
    - Options to automatically fetch objects in requested pages
        - Understands various types of pages
            - HTML 4.0, Java classes, VRML (partial), XML (partial).
        - Options to fetch different classes of objects
            - Images, Stylesheets, Frames, Scripts, Java or other
              objects.
        - Option to not fetch webbug images (images of 1 pixel square).
    - Automatically follows links for pages that have been moved.
    - Can monitor pages at regular intervals to fetch those that have
      changed.
    - Recursive fetching
        - To specified depth.
        - On any host or limited to same server or same directory.
        - Chosen from command line or from browser.
        - Control over which links can be fetched recursively.

Convenience
    - Optional information footer on HTML pages showing date cached
      and options.
    - Options to modify HTML pages
        - Remove scripts.
        - Remove Java applets.
        - Remove stylesheets.
        - Remove shockwave flash animations.
        - Indicate cached and uncached links.
        - Remove the blink tag.
        - Remove refresh tags.
        - Remove links to pages that are in the DontGet list.
        - Remove inline frames (iframes) that are in the DontGet list.
        - Replace images that are in the DontGet list.
        - Replace webbug images (images of 1 pixel square).
        - Demoronise HTML character sets.
        - Stop animated GIFs.
    - Automatic proxy configuration for Netscape.
    - Searchable cache with the addition of the ht://Dig, mnoGoSearch
      (UdmSearch) or Namazu programs.
    - Built in simple web-server for local pages.
    - Timeouts to stop proxy lockups
        - DNS name lookups.
        - Remote server connection.
        - Data transfer.
    - Continue or stop downloads interrupted by client.
        - Based on file size or fraction downloaded.
    - Purging of pages from cache
        - Based on URL matching.
        - To keep the cache size below a specified limit.
        - To keep the free disk space above a specified limit.
        - Interactive or command line control.
    - Compression of cached pages based on age.
    - Provides compressed pages to web browser (compile time option).

Indexes
    - Multiple indexes of pages stored in cache
        - Servers for each protocol (http, ftp ...).
        - Pages on each server.
        - Pages waiting to be fetched.
        - Pages requested last time offline.
        - Pages fetched last time online.
        - Pages monitored on a regular basis.
    - Configurable indexes
        - Sorted by name, date, server domain name, type of file.
        - Options to delete, refresh or monitor pages.
        - Selection of complete list of pages or hide un-interesting
          pages.

Security
    - Works with pages that require basic username/password
      authentication.
    - Automates proxy authentication for external proxies that require
      it.
    - Control over access to the proxy
        - Defaults to local host access only.
        - Host access configured by hostname or IP address.
        - Optional proxy authentication for user level access control.
    - Optional password control for proxy management functions.
    - Can censor incoming and outgoing HTTP headers to maintain user
      privacy.

Configuration
    - All options controlled using a configuration file.
    - Interactive web page to allow editing of the configuration file.
    - User customisable error and information pages.


Changes
-------

Since version 2.7:

Bug Fixes:
    Ensure that the -put or -post options to wwwoffle have one URL.
    Fix IPv6 checking (configure fails if IPv6 not available).
    Fix conditional request problem (304 reply for non-conditional
      requests).
    Make the socket binding errors less confusing.
    Fix requesting of compressed data.
    Handle NULL strings in FTP code and parsing requests.
    Speed up wildcard matching of '/*' paths.
    When search script fails give an error not a blank page.
    The content-length header is not removed unless compression is
      being used.
    Fix core dump with configuration page adding first item to
      DontGet/DontCache section.
    Preserve cache file timestamps when compressing them.
    Handle relative URLs that start with '//'.
    Fix Solaris compilation problem with statfs/statvfs.
    Bug fix for failure to censor some headers.
    Remove the 'alt' attribute from disabled images when modifying HTML.

New Features:
    Re-instate the old configuration editing web pages due to user
      demand.
    Allow wildcards to have more than two '*' in them.
    The upgrade-config.pl script warns about URL-SPECs with path='/'
      not '/*'.

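
To illustrate the '//' case from the bug fixes listed above: a relative
URL that starts with '//' keeps the scheme of the page it appears on
but names a different host.  The following is a minimal Python sketch
of that resolution rule using the standard library; it is not
WWWOFFLE's own code, and the host names and the resolve_link helper
are hypothetical.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, link: str) -> str:
    """Resolve a link found in a page against that page's URL."""
    return urljoin(page_url, link)


# Hypothetical page and links, for illustration only.
page = "http://www.example.com/docs/index.html"

# '//host/path' keeps the scheme but replaces the host and path.
network_path = resolve_link(page, "//other.example.org/pics/logo.png")

# '/path' keeps the scheme and host but replaces the path.
absolute_path = resolve_link(page, "/pics/logo.png")

# 'path' is resolved relative to the directory of the page.
relative_path = resolve_link(page, "pics/logo.png")

print(network_path)
print(absolute_path)
print(relative_path)
```

The three links differ only in their leading slashes, which is why a
proxy that treats '//' as an ordinary path would request the wrong
server.
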

Availability
------------

Version 2.7a uploaded, but may not be available yet

FTP server:  ftp://ftp.ibiblio.org/pub/Linux/apps/www/servers/wwwoffle-2.7a.tgz
FTP server:  ftp://ftp.demon.co.uk/pub/unix/httpd/wwwoffle-2.7a.tgz
Web page:    http://www.gedanken.demon.co.uk/wwwoffle/


Author & Copyright
------------------

This program is copyright Andrew M. Bishop 1996,97,98,99,2000,01,02
(amb@gedanken.demon.co.uk) and distributed under GPL.

email: amb@gedanken.demon.co.uk [Please put wwwoffle in the subject line]