public class SolrDeleteDuplicates extends Reducer<Text,SolrDeleteDuplicates.SolrRecord,Text,SolrDeleteDuplicates.SolrRecord> implements Tool
Deletes duplicate documents from a Solr index. Documents are represented as
SolrDeleteDuplicates.SolrRecord instances (which contain id, boost and timestamp).
SolrDeleteDuplicates.SolrRecords with the same digest are grouped together.
Of the documents that share a digest, all are deleted except the one with the
highest score (boost field). If two or more documents have the same score, the
document with the latest timestamp is kept, and every other document is
deleted from the Solr index.

| Modifier and Type | Class and Description |
|---|---|
| static class | SolrDeleteDuplicates.SolrInputFormat |
| static class | SolrDeleteDuplicates.SolrInputSplit |
| static class | SolrDeleteDuplicates.SolrRecord |
| static class | SolrDeleteDuplicates.SolrRecordReader |
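
The selection rule in the class description (keep the highest boost, break ties by the latest timestamp, delete the rest) can be sketched in plain Java. This is a minimal illustration, not the class's actual implementation: `DedupSketch`, `Rec`, `KEEP_ORDER` and `idsToDelete` are hypothetical names, the field types are assumptions, and the sketch runs in memory rather than against Hadoop or Solr.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these are hypothetical stand-ins,
// not the Nutch classes documented on this page.
class DedupSketch {

    // Stand-in for SolrRecord: id, boost and timestamp as named in the
    // description, plus the digest used as the grouping key.
    static final class Rec {
        final String id;
        final String digest;
        final float boost;   // assumed numeric score
        final long tstamp;   // assumed epoch-style timestamp

        Rec(String id, String digest, float boost, long tstamp) {
            this.id = id;
            this.digest = digest;
            this.boost = boost;
            this.tstamp = tstamp;
        }
    }

    // Higher boost ranks higher; equal boosts are ranked by later timestamp.
    static final Comparator<Rec> KEEP_ORDER =
        Comparator.<Rec>comparingDouble(r -> r.boost)
                  .thenComparingLong(r -> r.tstamp);

    // Returns the ids to delete: every record in a digest group except the
    // one ranking highest under KEEP_ORDER. On a full tie the first record
    // seen is kept (an assumption; the description does not cover this case).
    static List<String> idsToDelete(List<Rec> records) {
        Map<String, List<Rec>> byDigest = new LinkedHashMap<>();
        for (Rec r : records) {
            byDigest.computeIfAbsent(r.digest, k -> new ArrayList<>()).add(r);
        }
        List<String> deletes = new ArrayList<>();
        for (List<Rec> group : byDigest.values()) {
            Rec keep = group.get(0);
            for (Rec r : group) {
                if (KEEP_ORDER.compare(r, keep) > 0) {
                    keep = r;
                }
            }
            for (Rec r : group) {
                if (r != keep) {
                    deletes.add(r.id);
                }
            }
        }
        return deletes;
    }

    public static void main(String[] args) {
        List<Rec> recs = new ArrayList<>();
        recs.add(new Rec("a", "d1", 1.0f, 100L));
        recs.add(new Rec("b", "d1", 2.0f, 50L));  // highest boost in d1
        recs.add(new Rec("c", "d2", 1.0f, 10L));
        recs.add(new Rec("d", "d2", 1.0f, 20L));  // same boost as c, newer
        System.out.println(idsToDelete(recs));    // prints [a, c]
    }
}
```

In the sample data, `b` survives in group `d1` because of its higher boost even though it is older, while `d` survives in group `d2` only through the timestamp tie-break.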

Nested classes/interfaces inherited from class Reducer: Reducer.Context

| Constructor and Description |
|---|
| SolrDeleteDuplicates() |

| Modifier and Type | Method and Description |
|---|---|
| void | cleanup(Reducer.Context context) |
| boolean | dedup(java.lang.String solrUrl) |
| Configuration | getConf() |
| static void | main(java.lang.String[] args) |
| void | reduce(Text key, java.lang.Iterable<SolrDeleteDuplicates.SolrRecord> values, Reducer.Context context) |
| int | run(java.lang.String[] args) |
| void | setConf(Configuration conf) |
| void | setup(Reducer.Context job) |

public Configuration getConf()
Specified by: getConf in interface Configurable

public void setConf(Configuration conf)
Specified by: setConf in interface Configurable

public void setup(Reducer.Context job) throws java.io.IOException
Overrides: setup in class Reducer<Text,SolrDeleteDuplicates.SolrRecord,Text,SolrDeleteDuplicates.SolrRecord>
Throws: java.io.IOException

public void cleanup(Reducer.Context context) throws java.io.IOException
Overrides: cleanup in class Reducer<Text,SolrDeleteDuplicates.SolrRecord,Text,SolrDeleteDuplicates.SolrRecord>
Throws: java.io.IOException

public void reduce(Text key, java.lang.Iterable<SolrDeleteDuplicates.SolrRecord> values, Reducer.Context context) throws java.io.IOException
Overrides: reduce in class Reducer<Text,SolrDeleteDuplicates.SolrRecord,Text,SolrDeleteDuplicates.SolrRecord>
Throws: java.io.IOException

public boolean dedup(java.lang.String solrUrl) throws java.io.IOException, java.lang.InterruptedException, java.lang.ClassNotFoundException
Throws: java.io.IOException, java.lang.InterruptedException, java.lang.ClassNotFoundException

public int run(java.lang.String[] args) throws java.io.IOException, java.lang.InterruptedException, java.lang.ClassNotFoundException

public static void main(java.lang.String[] args) throws java.lang.Exception
Throws: java.lang.Exception

Copyright © 2019 The Apache Software Foundation