Class CrawlLogExtractionMapper
- java.lang.Object
  - org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
    - dk.netarkivet.viewerproxy.webinterface.hadoop.CrawlLogExtractionMapper

public class CrawlLogExtractionMapper extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
Hadoop Mapper for extracting crawl log lines from metadata files. Expects the Configuration provided for the job to have a regex set, which is used to filter for relevant lines. If no regex is set, an all-matching regex is used instead, so every line is kept.
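
The filtering rule described above can be sketched in plain Java, without any Hadoop dependency. This is a minimal illustration under stated assumptions, not the class's actual implementation: the names `CrawlLogFilterSketch`, `MATCH_ALL`, `compileOrMatchAll` and `filterLines` are hypothetical, the Configuration lookup is left out because its key is not documented here, and matching the whole line with `matches()` rather than searching within it is also an assumption.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CrawlLogFilterSketch {
    // Hypothetical fallback: a pattern that accepts any single line.
    static final String MATCH_ALL = ".*";

    // Compile the supplied regex, or fall back to the all-matching
    // pattern when none was set (null or empty).
    static Pattern compileOrMatchAll(String regex) {
        boolean unset = (regex == null || regex.isEmpty());
        return Pattern.compile(unset ? MATCH_ALL : regex);
    }

    // Keep only the lines that the pattern matches in full.
    static List<String> filterLines(List<String> lines, String regex) {
        Pattern pattern = compileOrMatchAll(regex);
        return lines.stream()
                .filter(line -> pattern.matcher(line).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> lines = List.of(
                "200 http://example.org/index.html",
                "404 http://other.net/missing");

        // No regex set: every line passes the filter.
        System.out.println(filterLines(lines, null));

        // Regex set: only lines mentioning example.org pass.
        System.out.println(filterLines(lines, ".*example\\.org.*"));
    }
}
```

Falling back to a match-all pattern keeps the filtering code path identical whether or not a regex was configured, so no separate unfiltered branch is needed.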
Constructor Summary
- CrawlLogExtractionMapper()
-
Method Summary
Modifier and Type: protected void
Method: map(org.apache.hadoop.io.LongWritable linenumber, org.apache.hadoop.io.Text archiveFilePath, org.apache.hadoop.mapreduce.Mapper.Context context)
Description: Mapping method.

Method Detail
-
map
protected void map(org.apache.hadoop.io.LongWritable linenumber, org.apache.hadoop.io.Text archiveFilePath, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException
Mapping method.
- Overrides:
map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
- Parameters:
linenumber - The line number. Is ignored.
archiveFilePath - The path to the archive file.
context - Context used for writing output.
- Throws:
IOException - If it fails to generate the CDX indexes.
InterruptedException