This panel displays the following components:
- Tags & Attributes of Embedded Web Files
combo: It contains the HTML (or other markup) tags, together with the tag attributes expected to hold the URLs of embedded web files. Each such URL is adjusted to refer to the corresponding imported file. A conditional attribute may be appended to a tag; see Note1 below.
- Accepted File Extensions of Embedded Web Files
combo: It contains the list of file name extensions accepted for the embedded files of web pages.
- Suffix of Directory to Contain Embedded Web Files
text field: (Disabled) It contains the suffix appended to the name of each sub-directory that holds the embedded web files of a specific web page. The sub-directory is named after the web page file name, followed by this suffix.
- Search Expressions
combo: It contains the list of Search Expressions with their parameters; each one specifies the endpoints or tags of an HTML section to be removed. The search expressions are applied during an Optimize operation if the File > Search Expression Enabled menu item is checked. They are used to remove Web bugs, spying scripts, and advertising that trigger Internet access. This feature is new in version 1.1. See Note2 below.
Note1: Some HTML tags (ex. the LINK tag) can play different roles; the role is determined by a specific attribute, or even by its absence. The possible formats and their meanings are:
- <TAG+attrib=value : The tag TAG must contain the attribute attrib, and its value must be value (ex. <LINK+type=text/css)
- <TAG+attrib : The tag TAG must contain the attribute attrib, with any value (ex. <LINK+type)
- <TAG-attrib : The tag TAG must NOT contain the attribute attrib (ex. <LINK-type)
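The three formats above form a small condition language. The Python sketch below parses such a condition and tests it against the attributes of one start tag. It is an illustration under stated assumptions, not the program's own code: the names `parse_condition` and `tag_matches` are hypothetical, tag and attribute names are compared case-insensitively, and attribute values are also compared case-insensitively (the text does not say how values are compared).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical grammar for Note1: "<TAG", optionally followed by
# "+attrib", "+attrib=value" or "-attrib".
SPEC_RE = re.compile(
    r"<(?P<tag>[A-Za-z][A-Za-z0-9]*)"
    r"(?:(?P<op>[+-])(?P<attr>[^=]+)(?:=(?P<value>.*))?)?$"
)


@dataclass
class TagCondition:
    tag: str                # lower-cased tag name, e.g. "link"
    op: Optional[str]       # "+", "-", or None when no condition is appended
    attr: Optional[str]     # lower-cased attribute name
    value: Optional[str]    # required value, or None for "any value"


def parse_condition(spec: str) -> TagCondition:
    m = SPEC_RE.match(spec)
    # "<TAG-attrib=value" is not one of the documented formats.
    if not m or (m["op"] == "-" and m["value"] is not None):
        raise ValueError(f"unrecognized tag format: {spec!r}")
    attr = m["attr"].lower() if m["attr"] else None
    return TagCondition(m["tag"].lower(), m["op"], attr, m["value"])


def tag_matches(cond: TagCondition, tag: str, attrs: dict[str, Optional[str]]) -> bool:
    """Check one start tag; attrs maps lower-cased names to values (None if valueless)."""
    if tag.lower() != cond.tag:
        return False
    if cond.op is None:            # plain "<TAG": no condition
        return True
    present = cond.attr in attrs
    if cond.op == "-":             # "<TAG-attrib": attribute must be absent
        return not present
    if cond.value is None:         # "<TAG+attrib": any value will do
        return present
    # "<TAG+attrib=value": assumed case-insensitive value comparison
    actual = attrs.get(cond.attr)
    return actual is not None and actual.lower() == cond.value.lower()


link_css = parse_condition("<LINK+type=text/css")
print(tag_matches(link_css, "LINK", {"rel": "stylesheet", "type": "text/css"}))  # True
print(tag_matches(parse_condition("<LINK-type"), "link", {"type": "text/css"}))  # False
```

The attribute dictionary has the shape produced by `dict(attrs)` inside `html.parser.HTMLParser.handle_starttag`, which lower-cases tag and attribute names and reports valueless attributes as `None`.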
Note2: A search expression is preceded by a priority parameter and may be followed by a modifier parameter; the parts are separated by spaces.
- A priority parameter: It contains a single character that determines the execution priority; the smaller the number, the higher the priority. For example, an expression with a priority parameter of "0" is executed before one with a priority parameter of "2".
- The search expression: It contains one "any" wildcard, with an endpoint search word chain (usually a tag) on each side of it (ex. in SRC="http://*", SRC="http:// is the left word chain and " is the right word chain; in this example, each chain contains only one word or character). A word chain is a sequence of words (or characters) separated by whitespace wildcard characters (ex. instead of the single word SRC="http://, you can specify the three-word chain SRC^=^"http://, which contains the words SRC, = and "http://). In the searched text, consecutive words of a chain may be separated by zero or more whitespace characters. This explanation is only a summary; the complete documentation is in the SearchExpressionSyntax.htm file. Note: The search words are always case-insensitive.
- An optional modifier parameter: It contains 0 to 3 characters. If it contains 1 or 3 characters, the expression is disabled. If it contains 2 or 3 characters, the first character specifies the whitespace wildcard and the second specifies the "any" wildcard. If this parameter is absent (i.e. 0 characters), the expression is enabled and the default wildcards are used. The default wildcards are:
- ^ as the whitespace wildcard character
- * as the "any" wildcard character
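The parameter rules of Note2 can be modelled in a few lines of Python. This is a hypothetical sketch, not the program's implementation: it assumes that the expression itself contains no literal spaces, that the "any" wildcard matches as little text as possible (non-greedy) and may span line breaks, and that expressions with equal priority keep their list order. The names `parse_line`, `compile_expression` and `optimize` are illustrative only.

```python
import re
from dataclasses import dataclass

DEFAULT_SPACE = "^"  # default whitespace wildcard
DEFAULT_ANY = "*"    # default "any" wildcard


@dataclass
class SearchExpression:
    priority: str
    expression: str
    enabled: bool
    space_wildcard: str
    any_wildcard: str


def parse_line(line: str) -> SearchExpression:
    """Split '<priority> <expression> [<modifier>]' into its parts.

    Assumes the expression contains no literal spaces (whitespace is
    written with the whitespace wildcard instead).
    """
    parts = line.split()
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 fields: {line!r}")
    priority, expression = parts[0], parts[1]
    modifier = parts[2] if len(parts) == 3 else ""
    if len(priority) != 1 or len(modifier) > 3:
        raise ValueError(f"bad priority or modifier: {line!r}")
    enabled = len(modifier) not in (1, 3)  # 1 or 3 characters disable it
    if len(modifier) >= 2:                 # 2 or 3 characters redefine wildcards
        space, any_ = modifier[0], modifier[1]
    else:
        space, any_ = DEFAULT_SPACE, DEFAULT_ANY
    return SearchExpression(priority, expression, enabled, space, any_)


def compile_expression(expr: SearchExpression) -> re.Pattern:
    """Turn 'left-chain<any>right-chain' into a case-insensitive regex."""
    left, sep, right = expr.expression.partition(expr.any_wildcard)
    if not sep or expr.any_wildcard in right:
        raise ValueError("expression must contain exactly one 'any' wildcard")

    def chain(words: str) -> str:
        # Words of a chain may be separated by zero or more whitespace chars.
        return r"\s*".join(re.escape(w) for w in words.split(expr.space_wildcard))

    # Assumption: "any" is non-greedy and may span line breaks.
    return re.compile(chain(left) + ".*?" + chain(right), re.IGNORECASE | re.DOTALL)


def optimize(html: str, lines: list[str]) -> str:
    """Apply the enabled expressions in priority order, deleting each match."""
    exprs = sorted((parse_line(line) for line in lines), key=lambda e: e.priority)
    for expr in exprs:
        if expr.enabled:
            html = compile_expression(expr).sub("", html)
    return html
```

Deleting the whole match removes the start and end word chains along with the text between them, which is the behaviour the examples below describe.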
Search Expression Examples
Example 1
2 <SCRIPT*</SCRIPT>
This expression is enabled and is executed after those with a priority of 0 or 1. When executed, it searches the HTML files for any sequence of characters that starts with <SCRIPT and ends with </SCRIPT>, in any letter case (ex. <script and </script>). The found text is then deleted, including the start and end word chains themselves.
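Example 1 can be reproduced with an ordinary regular expression to see exactly what gets deleted. The sketch assumes that the "any" wildcard is non-greedy; the text does not state this, but with a greedy match the expression would also delete any ordinary content lying between the first <SCRIPT and the last </SCRIPT> of a page. The sample page and its URLs are made up for illustration.

```python
import re

# Example 1 ("2 <SCRIPT*</SCRIPT>") written as a regular expression.
# Assumptions: the "any" wildcard is non-greedy (.*?) and may cross line
# breaks (re.DOTALL); matching is case-insensitive, as the text states.
EXAMPLE_1 = re.compile(r"<SCRIPT.*?</SCRIPT>", re.IGNORECASE | re.DOTALL)

page = (
    "<p>Hello</p>\n"
    '<script src="http://tracker.example/t.js"></script>\n'
    "<p>World</p>\n"
    "<SCRIPT>\nnew Image().src = 'http://ads.example/p.gif';\n</SCRIPT>\n"
)

# Each script block is deleted together with its <script ...> and </script> tags.
cleaned = EXAMPLE_1.sub("", page)
print(repr(cleaned))  # '<p>Hello</p>\n\n<p>World</p>\n\n'
```

Replacing `.*?` with a greedy `.*` would make a single match run from the first script to the last one, taking the `<p>World</p>` paragraph with it.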
Example 2
4 SRC_=_"http://+" _+
This expression is enabled and is executed after the previous one (its priority is 4; the previous one's is 2). Its whitespace wildcard is _ and its "any" wildcard is +. It searches for any text sequence that starts with SRC, zero or more whitespace characters, =, zero or more whitespace characters and "http://, and that ends with the " character. The found text is then deleted, including the start and end word chains themselves.
For example, in each of the following tags, the above expression finds the text from SRC (or src) through the closing quote of the URL:
- <IMG SRC="http://bla.com/ad.jpg" height=12>
- <IMG SRC ="HTTP://blaxx.com/add.gif" width=62>
- <IMG src
= "http://jazz.org/bar.txt" height=15>
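The matches listed above can be checked with a hand-translated regular expression: each whitespace wildcard _ becomes \s* (zero or more whitespace characters, line breaks included) and the "any" wildcard + becomes a non-greedy .*?. This translation is an assumption about the semantics, not the program's own code.

```python
import re

# Hand translation of Example 2 (an assumption, not the program's code):
#   SRC _ = _ "http://  +  "   ->   SRC \s* = \s* "http:// .*? "
EXAMPLE_2 = re.compile(r'SRC\s*=\s*"http://.*?"', re.IGNORECASE | re.DOTALL)

# The three sample tags from the text; the third one breaks the line after src.
SAMPLES = [
    '<IMG SRC="http://bla.com/ad.jpg" height=12>',
    '<IMG SRC ="HTTP://blaxx.com/add.gif" width=62>',
    '<IMG src\n= "http://jazz.org/bar.txt" height=15>',
]

for tag in SAMPLES:
    found = EXAMPLE_2.search(tag).group(0)  # the text the expression finds
    cleaned = EXAMPLE_2.sub("", tag)        # the tag after deletion
    print(repr(found), "->", repr(cleaned))
```

After deletion each tag keeps only its remaining attributes, e.g. `<IMG  height=12>`, so the browser no longer fetches the remote file.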