| Package | Description |
|---|---|
| org.apache.nutch.parse | The Parse interface and related classes. |
| org.apache.nutch.parse.html | An HTML document parsing plugin. |
| org.apache.nutch.parse.tika | Parse various document formats with help of Apache Tika. |
| Modifier and Type | Method and Description |
|---|---|
| Outlink[] | Parse.getOutlinks() |
| static Outlink[] | OutlinkExtractor.getOutlinks(java.lang.String plainText, Configuration conf) Extracts Outlink from given plain text. |
| static Outlink[] | OutlinkExtractor.getOutlinks(java.lang.String plainText, java.lang.String anchor, Configuration conf) Extracts Outlink from given plain text and adds anchor to the extracted Outlinks. |
| static Outlink | Outlink.read(java.io.DataInput in) |
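
As a rough illustration of what extracting links from plain text and attaching a shared anchor can look like, here is a minimal sketch that uses only the JDK. It does not call Nutch: `PlainTextLinks`, its nested `Link` class, and the URL pattern are hypothetical stand-ins chosen for this example, and the regular expression is a deliberately simplified assumption, not the one `OutlinkExtractor` uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Apache Nutch.
final class PlainTextLinks {

    // Hypothetical stand-in for an outlink: a target URL plus anchor text.
    static final class Link {
        final String toUrl;
        final String anchor;

        Link(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }
    }

    // Simplified assumption: match http(s) URLs up to whitespace, quotes or angle brackets.
    private static final Pattern URL_PATTERN =
            Pattern.compile("\\bhttps?://[^\\s<>\"']+", Pattern.CASE_INSENSITIVE);

    // Scans the text for URLs and pairs every match with the same anchor text.
    static List<Link> extract(String plainText, String anchor) {
        List<Link> links = new ArrayList<>();
        if (plainText == null) {
            return links;
        }
        Matcher matcher = URL_PATTERN.matcher(plainText);
        while (matcher.find()) {
            // Drop sentence punctuation that directly follows the URL.
            String url = matcher.group().replaceAll("[.,;:!?)]+$", "");
            links.add(new Link(url, anchor));
        }
        return links;
    }

    public static void main(String[] args) {
        String text = "See http://nutch.apache.org/ and https://tika.apache.org.";
        for (Link link : extract(text, "docs")) {
            System.out.println(link.toUrl + " [" + link.anchor + "]");
        }
    }
}
```

The two-argument `extract` mirrors the shape of the overload that takes an anchor; passing an empty string would approximate the overload without one.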
| Modifier and Type | Method and Description |
|---|---|
| java.util.Map<Outlink,Metadata> | NutchSitemapParse.getOutlinkMap() |
| Modifier and Type | Method and Description |
|---|---|
| void | Parse.setOutlinks(Outlink[] outlinks) |
| Modifier and Type | Method and Description |
|---|---|
| void | NutchSitemapParse.setOutlinks(java.util.Map<Outlink,Metadata> outlinkMap) |
| Constructor and Description |
|---|
| Parse(java.lang.String text, java.lang.String title, Outlink[] outlinks, ParseStatus parseStatus) |
| Constructor and Description |
|---|
| NutchSitemapParse(java.util.Map<Outlink,Metadata> outlinkMap, ParseStatus parseStatus) |
| Modifier and Type | Method and Description |
|---|---|
| void | DOMContentUtils.getOutlinks(java.net.URL base, java.util.ArrayList<Outlink> outlinks, org.w3c.dom.Node node) This method finds all anchors below the supplied DOM node, and creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList. |
| Modifier and Type | Method and Description |
|---|---|
| void | DOMContentUtils.getOutlinks(java.net.URL base, java.util.ArrayList<Outlink> outlinks, org.w3c.dom.Node node) This method finds all anchors below the supplied DOM node, and creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList. |
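
The behaviour described for `DOMContentUtils.getOutlinks` (walk the DOM below a node, resolve each anchor against a base URL, append the result to a caller-supplied list) can be sketched with the JDK alone. This is an assumption-laden illustration, not the Nutch implementation: `DomLinks`, `Link`, `collect` and `fromXhtml` are hypothetical names, only `a` elements with an `href` attribute are handled, and the input is parsed as well-formed XHTML rather than real-world HTML.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of Apache Nutch.
final class DomLinks {

    // Hypothetical stand-in for an outlink: resolved target URL plus anchor text.
    static final class Link {
        final String toUrl;
        final String anchor;

        Link(String toUrl, String anchor) {
            this.toUrl = toUrl;
            this.anchor = anchor;
        }
    }

    // Depth-first walk: records every <a href="..."> at or below the node, in document order.
    static void collect(URL base, List<Link> outlinks, Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "a".equalsIgnoreCase(node.getNodeName())) {
            String href = ((Element) node).getAttribute("href");
            if (!href.isEmpty()) {
                try {
                    // Resolve the (possibly relative) href against the base URL.
                    String target = base.toURI().resolve(href).toString();
                    outlinks.add(new Link(target, node.getTextContent().trim()));
                } catch (URISyntaxException | IllegalArgumentException e) {
                    // Skip links that cannot be resolved; a real crawler might log them.
                }
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(base, outlinks, children.item(i));
        }
    }

    // Convenience wrapper (an assumption of this sketch): parse XHTML text and collect its links.
    static List<Link> fromXhtml(String xhtml, String baseUrl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            List<Link> outlinks = new ArrayList<>();
            collect(URI.create(baseUrl).toURL(), outlinks, doc.getDocumentElement());
            return outlinks;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse or resolve input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p><a href=\"b.html\">B</a></p>"
                + "<a href=\"../a.html\"> A </a><a>no href</a></body></html>";
        for (Link link : fromXhtml(page, "http://example.com/dir/page.html")) {
            System.out.println(link.toUrl + " [" + link.anchor + "]");
        }
    }
}
```

Passing the output list in as a parameter, as the documented signature does, lets a caller accumulate links from several subtrees into one collection.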
Copyright © 2019 The Apache Software Foundation