| Interface | Description |
|---|---|
| ParseFilter | Extension point for DOM-based parsers. |
| Parser | A parser for content generated by a Protocol implementation. |
| ParseStatusCodes | |

| Class | Description |
|---|---|
| HTMLMetaTags | This class holds the information about HTML "meta" tags extracted from a page. |
| NutchSitemapParse | |
| NutchSitemapParser | |
| Outlink | |
| OutlinkExtractor | Extracts Outlinks / URLs from plain text using regular expressions. |
| Parse | |
| ParseFilters | Creates and caches ParseFilter implementing plugins. |
| ParsePluginList | This class represents a natural ordering for which parsing plugin should get called for a particular mimeType. |
| ParsePluginsReader | A reader to load the information stored in the $NUTCH_HOME/conf/parse-plugins.xml file. |
| ParserChecker | Parser checker, useful for testing parsers. |
| ParserFactory | Creates and caches Parser plugins. |
| ParserJob | |
| ParserJob.ParserMapper | |
| ParseStatusUtils | |
| ParseUtil | |
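
The OutlinkExtractor entry above describes pulling URLs out of plain text with regular expressions. The following is a minimal, self-contained sketch of that general idea using only java.util.regex. It is not the Nutch implementation: the class name PlainTextLinkSketch, the method extractUrls, and the URL pattern are all assumptions made for illustration, and the real class may use a different pattern and return Outlink objects rather than strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Nutch: collects URL-like strings from plain text. */
class PlainTextLinkSketch {

    // Assumed, deliberately simple pattern: an http, https, or ftp scheme followed
    // by any run of characters that are not whitespace, quotes, or angle brackets.
    private static final Pattern URL_PATTERN =
        Pattern.compile("(?:https?|ftp)://[^\\s\"'<>]+", Pattern.CASE_INSENSITIVE);

    /** Returns every match of URL_PATTERN in the text, in order of appearance. */
    static List<String> extractUrls(String plainText) {
        List<String> urls = new ArrayList<>();
        if (plainText == null) {
            return urls; // nothing to scan
        }
        Matcher matcher = URL_PATTERN.matcher(plainText);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "See http://nutch.apache.org/ and https://example.org/docs for details";
        for (String url : extractUrls(text)) {
            System.out.println(url);
        }
    }
}
```

A pattern this simple will also capture punctuation that directly follows a URL in running text, so a production extractor would likely need extra trimming or validation.
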

| Enum | Description |
|---|---|
| ParseUtil.ChangeFrequency | |

| Exception | Description |
|---|---|
| ParseException | |
| ParserNotFound | |

Parse interface and related classes.

Copyright © 2019 The Apache Software Foundation