Browser configuration
You have to configure your web browser to use WebCleaner as a proxy.
Netscape/Mozilla
Select Edit -> Preferences -> Advanced -> Proxies.
Activate Manual proxy configuration.
Under HTTP Proxy enter localhost, the Port is 8080.
Under HTTPS Proxy enter localhost, the Port is 8080.
Under No Proxy for enter localhost, 127.0.0.1.
Click Ok to use your new settings.
Firefox
Select Edit -> Preferences -> General -> Connection Settings.
Activate Manual proxy configuration.
Under HTTP Proxy enter localhost, the Port is 8080.
Under SSL Proxy enter localhost, the Port is 8080.
Under No Proxy for enter localhost, 127.0.0.1.
Click Ok to use your new settings.
Internet Explorer
Select Tools -> Internet Options -> Connections.
Click on LAN Settings. If you have a dialup connection to the internet, select your dialup connection and click on Settings.
Activate Use a proxy server.
If activated, deactivate Bypass proxy server for local addresses.
Click on Advanced.
Under HTTP enter localhost, the Port is 8080.
Under Secure enter localhost, the Port is 8080.
Click Ok to use your new settings.
Opera 8
Select Tools -> Preferences -> Advanced -> Network -> Proxy servers.
Activate HTTP and enter localhost, the Port is 8080.
Activate HTTPS and enter localhost, the Port is 8080.
Activate Enable HTTP 1.1 for proxies.
Activate Do not use proxy on the addresses below and enter localhost, 127.0.0.1.
Click Ok to use your new settings.
Konqueror (KDE)
Select Settings -> Configure Konqueror -> Proxy.
Activate Manually specify the proxy settings and select its Setup. In the new window enter localhost as hostname and 8080 as port number for both HTTP and HTTPS.
Under Exceptions add both localhost and 127.0.0.1 with the New button.
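The same settings can also be used from a script, for example to check that the proxy is reachable. The following is a minimal sketch using Python's standard urllib.request module; the function name build_proxy_opener is made up for this illustration, and the host and port are the values used in the browser settings above.

```python
import urllib.request

# Values taken from the browser instructions above.
PROXY_HOST = "localhost"
PROXY_PORT = 8080


def build_proxy_opener(host=PROXY_HOST, port=PROXY_PORT):
    """Return an opener that sends HTTP and HTTPS requests through the proxy.

    build_proxy_opener is a hypothetical helper name, not part of WebCleaner.
    """
    proxy_url = f"http://{host}:{port}"
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)
```

With the proxy running, build_proxy_opener().open("http://example.com/") should fetch the page through the proxy.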
Proxy filter modules
WebCleaner uses a modular filter design that allows
a lot of flexibility for different uses.
Each module has a list of MIME types and a list of the
request/response stages it applies to. Each module can
be further customized by separate rules in the filter configuration.
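As an illustration of how a MIME type list and a stage list can decide whether a module handles a piece of data, here is a minimal sketch. The FilterModule class and its fields are hypothetical and do not reflect WebCleaner's internal API; wildcard matching with fnmatch is likewise an assumption.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class FilterModule:
    """Hypothetical description of a filter module (not WebCleaner's API)."""
    name: str
    mime_patterns: tuple  # e.g. ("text/*",); use ("*",) for "all"
    stages: tuple         # e.g. ("response content body",)

    def applies_to(self, mime_type, stage):
        """Return True if the module handles this MIME type at this stage."""
        if stage not in self.stages:
            return False
        mime_type = mime_type.lower()
        return any(fnmatchcase(mime_type, p) for p in self.mime_patterns)


compress = FilterModule(
    name="Compress",
    mime_patterns=("text/*", "application/pdf", "image/x-portable-*map"),
    stages=("response content body",),
)
```

Here compress.applies_to("text/html", "response content body") is true, while the same MIME type at the "request URL" stage is not handled.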
Name | Description | Requirements | Configuration rules |
---|---|---|---|
BinaryCharFilter | Replace illegal binary characters in HTML code, such as the quote characters often found in Microsoft pages. | MIME types: text/html. HTTP stages: response content body. | None |
Blocker | Block or allow specific sites by URL name. Before matching a URL, the hostname and path are unquoted to avoid spoofing attacks. | MIME types: all. HTTP stages: request URL. | Block, Allow |
Compress | Compression of documents with good compression ratio like HTML, WAV, etc. | MIME types: text/*, application/postscript, application/pdf, application/x-dvi, audio/basic, audio/midi, audio/x-wav, image/x-portable-*map, x-world/x-vrml. HTTP stages: response content body. | None |
GifImage | Deanimates GIFs and removes all unwanted GIF image extensions (for example GIF comments). | MIME types: image/gif. HTTP stages: response content body. | None |
Header | Add, modify and delete HTTP headers of request and response. | MIME types: all. HTTP stages: request and response headers. | Header |
HtmlRewriter | Parse HTML code and rewrite single tags, attributes and values. Execute and filter JavaScript. Parse and filter content-rated pages. Filter HTML comments. | MIME types: text/html. HTTP stages: response content body. | Javascript, Nocomments, Rating, Htmlrewrite |
ImageReducer | Convert images to low quality JPEG files to reduce bandwidth. | Software: the Python Imaging Library (PIL) must be installed. MIME types: all image types supported by the Python Imaging Library (as of version 1.1.5: jpeg, png, gif, bmp, pcx, tiff, xbm, xpm). HTTP stages: response content body. | None |
ImageSize | Remove images with certain width and/or height. | Software: the Python Imaging Library (PIL) must be installed. MIME types: all image types supported by the Python Imaging Library (as of version 1.1.5: jpeg, png, gif, bmp, pcx, tiff, xbm, xpm). HTTP stages: response content body. | Image |
Rating | Parse and evaluate content rating data. | MIME types: all. HTTP stages: response headers. | Rating |
Replacer | Replace regular expressions in data streams. | MIME types: text/html, (text\|application)/javascript. HTTP stages: response content body. | Replace |
VirusFilter | Scan all data with the ClamAV virus scanner. For performance reasons there is a maximum size of 4 MB. If an object exceeds that size the proxy gives an error. | Software: the ClamAV virus scanner must be installed on the proxy host. MIME types: text/html. HTTP stages: response content body. | Antivirus |
XmlRewriter | Parse XML code and rewrite single tags, attributes and values. It can also filter embedded HTML content, which often occurs in RSS feeds. | MIME types: text/html. HTTP stages: response content body. | Htmlrewrite, Xmlrewrite |
Filter configuration rules
Htmlrewrite
Matching
An HTML rewrite rule applies to one specified HTML tag and
can replace (or delete, if the replacement data is empty) parts of the tag or the
complete tag. The tag name is a case-insensitive string.
If attributes are given, they must match too before the rule applies.
Action
If no replacement is given, the specified tag
part is removed; otherwise it is replaced.
Back references to matched subgroups can be specified in the replacement
string with a backslash and the subgroup number (i.e. \1, \2, ...).
What it does when the replacement is foo:

replace part | before | after |
---|---|---|
tag | <blink>text</blink> | footextfoo |
tagname | <blink>text</blink> | <foo>text</foo> |
enclosed | <blink>text</blink> | <blink>foo</blink> |
attr | <a href="bla">..</a> | <a foo>..</a> |
attrval | <a href="bla">..</a> | <a href="foo">..</a> |
complete | <a href="bla">..</a> | foo |
If you specified zero or more than one attribute to match, 'attr' and 'attrval' replace the first occurring or matching attribute, or nothing.
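The table above can be reproduced with a small sketch. This is only an illustration of the six replace parts on simple, non-nested elements using regular expressions; it is not WebCleaner's parser, it ignores attribute matching and back references, and the function name rewrite and its parameters are made up for this example.

```python
import re

# Matches one simple, non-nested element such as <a href="bla">..</a>.
ELEMENT = re.compile(
    r"<(?P<name>\w+)(?P<attrs>[^>]*)>(?P<body>.*?)</(?P=name)>",
    re.S | re.I,
)
# Matches one double-quoted attribute such as href="bla".
ATTRIBUTE = re.compile(r'(?P<key>[\w-]+)="(?P<value>[^"]*)"')


def rewrite(html, tag, part, replacement):
    """Apply one replace part to every element named `tag` (hypothetical helper)."""
    def substitute(match):
        name = match.group("name")
        attrs = match.group("attrs")
        body = match.group("body")
        if name.lower() != tag.lower():  # tag names are case insensitive
            return match.group(0)
        if part == "tag":        # replace the start and end tag
            return f"{replacement}{body}{replacement}"
        if part == "tagname":    # replace only the tag name
            return f"<{replacement}{attrs}>{body}</{replacement}>"
        if part == "enclosed":   # replace the enclosed content
            return f"<{name}{attrs}>{replacement}</{name}>"
        if part == "attr":       # replace the first attribute
            new_attrs = ATTRIBUTE.sub(lambda a: replacement, attrs, count=1)
            return f"<{name}{new_attrs}>{body}</{name}>"
        if part == "attrval":    # replace the first attribute's value
            new_attrs = ATTRIBUTE.sub(
                lambda a: f'{a.group("key")}="{replacement}"', attrs, count=1
            )
            return f"<{name}{new_attrs}>{body}</{name}>"
        if part == "complete":   # replace the whole element
            return replacement
        raise ValueError(f"unknown replace part: {part}")

    return ELEMENT.sub(substitute, html)
```

For example, rewrite("<blink>text</blink>", "blink", "tagname", "foo") gives <foo>text</foo>, and an empty replacement with the part "complete" deletes the element.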
Xmlrewrite
Selector
An XML rewrite rule applies to one specific XML tag
and can replace (or delete) parts of or the complete tag.
The selector is a simplified XPath expression of the form (/tag)+
where a tag is of the form name([attr=val(,attr=val)*])?
(the attribute list in square brackets is optional).
Tag names, attributes and values are case sensitive.
Example: /rss/channel/item/description selects the <description> XML tag in an RSS news feed.
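To make the selector grammar concrete, the following sketch splits a selector into its tag steps. It is an illustration under the assumption that tag names contain no slash or bracket characters and that attribute values contain no comma or closing bracket; parse_selector is a made-up name and not a WebCleaner function.

```python
import re

# One step of the selector: /name or /name[attr=val,attr=val]
STEP = re.compile(r"/(?P<name>[^/\[\]]+)(?:\[(?P<attrs>[^\]]*)\])?")


def parse_selector(selector):
    """Parse '(/tag)+' into a list of (name, {attr: val}) tuples."""
    steps = []
    pos = 0
    while pos < len(selector):
        match = STEP.match(selector, pos)
        if match is None:
            raise ValueError(f"invalid selector at position {pos}")
        attrs = {}
        if match.group("attrs") is not None:
            for pair in match.group("attrs").split(","):
                key, sep, value = pair.partition("=")
                if not key or not sep:
                    raise ValueError(f"invalid attribute: {pair!r}")
                attrs[key] = value  # names and values are case sensitive
        steps.append((match.group("name"), attrs))
        pos = match.end()
    if not steps:
        raise ValueError("selector must contain at least one tag")
    return steps
```

With this sketch, the example selector above parses into four steps: rss, channel, item and description, each with an empty attribute dictionary.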
Action
Defined replacement types:

replace type | replace value | action |
---|---|---|
rsshtml | unused | Assumes all text content inside the XML tag is HTML. Only allows certain HTML tags, and filters the HTML data with the Htmlrewrite rules. |
remove | unused | Removes the complete selected XML tag and its content. |
Replace
Replace regular expressions in HTML or JavaScript pages.
Block
A block rule specifies regular expressions for URLs
which must be blocked.
The replacement URL specifies the URL to show when the block matches. If none
is given, a default block message is shown.
Back references to matched subgroups can be specified in the replacement
URL with a backslash and the subgroup number (i.e. \1, \2, ...).
Blockdomains
Block a list of domains. The domain list is stored in an external compressed file.
Blockurls
Block a list of URLs. The URL list is stored in an external compressed file.
Allow
An allow rule specifies regular expressions for URLs which must be allowed, even if a matching block rule exists.
Allowdomains
Allow a list of domains. The domain list is stored in an external compressed file.
Allowurls
Allow a list of URLs. The URL list is stored in an external compressed file.
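The interplay of block and allow rules can be sketched as follows. This is a simplified illustration, not WebCleaner's code: the function check_url, the rule representation and the DEFAULT_BLOCK_MESSAGE placeholder are assumptions, and the whole URL is unquoted here rather than only the hostname and path.

```python
import re
from urllib.parse import unquote

# Hypothetical placeholder for the default block message.
DEFAULT_BLOCK_MESSAGE = "blocked"


def check_url(url, block_rules, allow_rules=()):
    """Return None if the URL is allowed, else the replacement to show.

    block_rules is a sequence of (pattern, replacement_url) pairs;
    allow_rules is a sequence of patterns.
    """
    url = unquote(url)  # undo %XX quoting before matching
    # Allow rules win, even if a block rule also matches.
    if any(re.search(pattern, url) for pattern in allow_rules):
        return None
    for pattern, replacement in block_rules:
        match = re.search(pattern, url)
        if match:
            if replacement:
                # Expand back references such as \1 in the replacement URL.
                return match.expand(replacement)
            return DEFAULT_BLOCK_MESSAGE
    return None
```

A block rule such as (r"^http://ads\.(\w+)\.example/", r"http://localhost/blocked?site=\1") would then redirect http://ads.foo.example/banner.gif to http://localhost/blocked?site=foo, unless an allow rule matches the same URL.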
Header
Modify HTTP headers. If the replacement value is empty, the header is deleted; otherwise it is replaced, or added if it did not exist before.
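The delete/replace/add behaviour can be sketched in a few lines. The function apply_header_rules and the dictionary representation of headers and rules are assumptions made for this illustration; header names are compared case-insensitively, as HTTP requires.

```python
def apply_header_rules(headers, rules):
    """Return a copy of `headers` with the rules applied.

    headers and rules both map header names to values (hypothetical format).
    An empty rule value deletes the header; any other value replaces an
    existing header or adds a new one.
    """
    result = dict(headers)
    for rule_name, value in rules.items():
        # Drop any existing header with the same name, ignoring case.
        for existing in [h for h in result if h.lower() == rule_name.lower()]:
            del result[existing]
        if value:
            result[rule_name] = value
    return result
```

For instance, the rules {"Referer": "", "User-Agent": "Y"} remove any Referer header and set User-Agent to Y.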
Image
Block images with a certain size by replacing them with a transparent 1x1 image.
Javascript
Execute and filter JavaScript (JS) in HTML pages using the integrated SpiderMonkey JS engine. The filter deletes popups and places dynamic content emitted with document.write() into the HTML file.
Nocomments
Remove comments from HTML source. Comments inside <script> or <style> tags are not removed.
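A rough sketch of this behaviour with a regular expression is shown below. It is only an approximation for well-formed input and not the HTML parser WebCleaner uses; strip_comments is a made-up name.

```python
import re

# Either a complete <script>/<style> element (kept as is) or an HTML comment.
SCRIPT_STYLE_OR_COMMENT = re.compile(
    r"(<(script|style)\b.*?</\2\s*>)|<!--.*?-->",
    re.S | re.I,
)


def strip_comments(html):
    """Remove HTML comments except those inside <script> or <style>."""
    # group(1) is set only for script/style elements; comments become "".
    return SCRIPT_STYLE_OR_COMMENT.sub(lambda m: m.group(1) or "", html)
```

Because the script and style alternative is tried first at each position, comments inside those elements are consumed as part of the element and left untouched.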
Rating
One activated Rating rule enables the content rating system in WebCleaner. Several distinct content rating services, including the one defined by WebCleaner itself, can be configured.
Antivirus
One activated Antivirus rule enables virus filtering for the VirusFilter module.