| Package | Description |
|---|---|
| org.apache.nutch.parse | The Parse interface and related classes. |
| org.apache.nutch.parse.html | An HTML document parsing plugin. |
| org.apache.nutch.parse.js | Parser and parse filter plugin to extract all (possible) links from JavaScript files and embedded JavaScript code snippets. |
| org.apache.nutch.parse.tika | Parse various document formats with help of Apache Tika. |
| Modifier and Type | Method and Description |
|---|---|
| Parser | ParserFactory.getParserById(java.lang.String id) Returns the Parser instance whose extension ID matches the specified id. |
| Parser[] | ParserFactory.getParsers(java.lang.String contentType, java.lang.String url) Returns an array of Parsers for a given content type. |
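
The two lookups listed above, one by extension ID and one by content type, can be illustrated with a small self-contained sketch. It does not use or reproduce the Nutch API: `ParserLookupSketch`, `SketchParser`, `SketchRegistry`, the `sketch-*` IDs, the content-type mappings, and the use of `IllegalArgumentException` for a failed lookup are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-ins only: none of these types are Nutch classes.
class ParserLookupSketch {

    // Hypothetical parser descriptor: an extension ID and the content types it accepts.
    static final class SketchParser {
        final String extensionId;
        final List<String> contentTypes;

        SketchParser(String extensionId, String... contentTypes) {
            this.extensionId = extensionId;
            this.contentTypes = Arrays.asList(contentTypes);
        }
    }

    // Hypothetical registry mirroring the two lookup shapes in the table above.
    static final class SketchRegistry {
        // Insertion order is kept so results come back in registration order.
        private final Map<String, SketchParser> byId = new LinkedHashMap<>();

        void register(SketchParser parser) {
            byId.put(parser.extensionId, parser);
        }

        // Lookup by extension ID; the exception type is an assumption of this sketch.
        SketchParser getParserById(String id) {
            SketchParser parser = byId.get(id);
            if (parser == null) {
                throw new IllegalArgumentException("no parser with extension ID " + id);
            }
            return parser;
        }

        // Lookup by content type; the url is only used in the error message here.
        SketchParser[] getParsers(String contentType, String url) {
            List<SketchParser> matches = new ArrayList<>();
            for (SketchParser parser : byId.values()) {
                if (parser.contentTypes.contains(contentType)) {
                    matches.add(parser);
                }
            }
            if (matches.isEmpty()) {
                throw new IllegalArgumentException("no parser for " + contentType + " at " + url);
            }
            return matches.toArray(new SketchParser[0]);
        }
    }

    // Builds a small registry with made-up IDs and content-type mappings.
    static SketchRegistry demoRegistry() {
        SketchRegistry registry = new SketchRegistry();
        registry.register(new SketchParser("sketch-html", "text/html"));
        registry.register(new SketchParser("sketch-js", "application/javascript"));
        registry.register(new SketchParser("sketch-tika", "text/html", "application/pdf"));
        return registry;
    }

    public static void main(String[] args) {
        SketchRegistry registry = demoRegistry();
        System.out.println(registry.getParserById("sketch-js").extensionId);
        for (SketchParser parser : registry.getParsers("text/html", "http://example.com/")) {
            System.out.println(parser.extensionId);
        }
    }
}
```

The by-ID lookup yields a single parser, while the content-type lookup returns an array because more than one registered parser may accept the same type. How the real ParserFactory orders its results or reports a miss is not stated in the table, so the sketch makes its own choices on both points.
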
| Modifier and Type | Class and Description |
|---|---|
| class | HtmlParser |
| Modifier and Type | Class and Description |
|---|---|
| class | JSParseFilter This class is a heuristic link extractor for JavaScript files and code snippets. |
| Modifier and Type | Class and Description |
|---|---|
| class | TikaParser Wrapper for Tika parsers. |
Copyright © 2019 The Apache Software Foundation