public class IndexUtil
extends java.lang.Object
| Constructor and Description |
|---|
| IndexUtil(Configuration conf) |
| Modifier and Type | Method and Description |
|---|---|
| NutchDocument | index(java.lang.String key, WebPage page) Indexes a WebPage, adding the following fields. id: the default uniqueKey for the NutchDocument. digest: identifies a page (like a unique ID) and is used to remove duplicates during the dedup procedure. |
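
The role of the digest field in duplicate removal can be illustrated with a minimal, self-contained sketch. It does not use Nutch's own classes: `DigestDedupSketch`, `md5Hex`, and `dedup` are hypothetical names introduced only for illustration, and the sketch assumes the digest is a plain MD5 hash of the page text, which may differ from how Nutch's signature implementations compute it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the Nutch API.
public class DigestDedupSketch {

    /** Returns the MD5 digest of the given text as a lowercase hex string. */
    public static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to treat the byte as unsigned before formatting.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Keeps the first text seen for each digest and drops later duplicates. */
    public static List<String> dedup(List<String> pageTexts) {
        Set<String> seenDigests = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String text : pageTexts) {
            // Set.add returns false when the digest was already present.
            if (seenDigests.add(md5Hex(text))) {
                unique.add(text);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList("hello world", "another page", "hello world");
        System.out.println(dedup(pages)); // prints [hello world, another page]
    }
}
```

Comparing digests rather than full page text keeps the duplicate check cheap, at the cost of only catching pages whose hashed content is byte-for-byte identical.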
public IndexUtil(Configuration conf)
public NutchDocument index(java.lang.String key, WebPage page)
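Index a WebPage. The following fields are added:
id: default uniqueKey for the NutchDocument.
digest: identifies a page (like a unique ID) and is used to remove duplicates during the dedup procedure. It is computed by a signature implementation such as MD5Signature or TextProfileSignature.
Parameters:
key - The key of the page (reversed url).
page - The WebPage.

Copyright © 2019 The Apache Software Foundation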