public class OutlinkExtractor
extends java.lang.Object
Extracts Outlinks / URLs from plain text using regular expressions.

| Constructor and Description |
|---|
| OutlinkExtractor() |

| Modifier and Type | Method and Description |
|---|---|
| static Outlink[] | getOutlinks(java.lang.String plainText, Configuration conf) Extracts Outlinks from the given plain text. |
| static Outlink[] | getOutlinks(java.lang.String plainText, java.lang.String anchor, Configuration conf) Extracts Outlinks from the given plain text and adds anchor to the extracted Outlinks. |
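
To illustrate the general idea of regular-expression based link extraction from plain text, here is a minimal, self-contained sketch. It is not the code or the pattern used by OutlinkExtractor: the class name `PlainTextLinkSketch`, the method `extractUrls`, and the simplified URL pattern are all assumptions made for this example, and it returns plain strings rather than Outlink objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Apache Nutch.
public class PlainTextLinkSketch {

    // Simplified pattern (an assumption, not the expression Nutch uses):
    // a scheme, "://", then any run of characters that are not whitespace,
    // angle brackets, or quotes.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\b(?:https?|ftp)://[^\\s<>\"']+", Pattern.CASE_INSENSITIVE);

    // Returns every URL-like substring found in plainText, in order of appearance.
    static List<String> extractUrls(String plainText) {
        List<String> urls = new ArrayList<>();
        if (plainText == null) {
            return urls;
        }
        Matcher matcher = URL_PATTERN.matcher(plainText);
        while (matcher.find()) {
            // Drop sentence punctuation that trails the URL in running text.
            String url = matcher.group().replaceAll("[.,;:!?)]+$", "");
            urls.add(url);
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "See http://nutch.apache.org/ and https://example.com/a?b=1.";
        for (String url : extractUrls(text)) {
            System.out.println(url);
        }
    }
}
```

The sketch scans the input once with a single compiled pattern. As with any regex-based extractor, it is meant for genuine plain text; feeding it binary or markup-heavy input is outside what it is designed for.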

public static Outlink[] getOutlinks(java.lang.String plainText, Configuration conf)

Extracts Outlinks from the given plain text. Applying this method to input that is not plain text can result in extremely long runtimes for pathological cases (PostScript is a known example).

Parameters:
plainText - the plain text from which URLs should be extracted.
Returns:
Array of Outlinks found within plainText

public static Outlink[] getOutlinks(java.lang.String plainText, java.lang.String anchor, Configuration conf)

Extracts Outlinks from the given plain text and adds anchor to the extracted Outlinks.

Parameters:
plainText - the plain text from which URLs should be extracted.
anchor - the anchor of the URL
Returns:
Array of Outlinks found within plainText

Copyright © 2019 The Apache Software Foundation