Class MetadataCDXMapper
- java.lang.Object
  - org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
    - dk.netarkivet.viewerproxy.webinterface.hadoop.MetadataCDXMapper
public class MetadataCDXMapper extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
Hadoop Mapper for creating CDX indexes for metadata files through the GUI application's QA pages. The input is a key (not used) and a Text line, which should be the path to the archive file. The output is an exit code (not used), and the generated CDX lines.
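The input/output contract described above can be illustrated without a Hadoop cluster. The following is a minimal sketch only, not the actual implementation: `MapperContractSketch`, `LineSink`, and the placeholder line format are hypothetical, `LineSink` stands in for `Mapper.Context`, and reading the path from the local file system and passing the file name as `archiveName` are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class MapperContractSketch {

    /** Hypothetical stand-in for Mapper.Context: only values are written, mirroring the NullWritable output key. */
    interface LineSink {
        void write(String cdxLine) throws IOException;
    }

    /**
     * Hypothetical stand-in for index(): emits one placeholder line per non-empty
     * input line. Real CDX extraction parses archive records; this does not.
     */
    static List<String> index(InputStream archiveInputStream, String archiveName) throws IOException {
        String content = new String(archiveInputStream.readAllBytes(), StandardCharsets.UTF_8);
        List<String> lines = new ArrayList<>();
        for (String record : content.split("\n")) {
            String trimmed = record.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed + " " + archiveName);
            }
        }
        return lines;
    }

    /** Mirrors the shape of map(): the key is ignored, the value is a path to an archive file. */
    static void map(long lineNumber, String archiveFilePath, LineSink sink) throws IOException {
        // Assumption: the path is readable from the local file system.
        Path path = Paths.get(archiveFilePath.trim());
        try (InputStream in = Files.newInputStream(path)) {
            for (String cdxLine : index(in, path.getFileName().toString())) {
                sink.write(cdxLine); // one output value per generated line
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("metadata-sketch", ".warc");
        Files.write(tmp, "record-one\nrecord-two\n".getBytes(StandardCharsets.UTF_8));
        map(0L, tmp.toString(), System.out::println);
        Files.delete(tmp);
    }
}
```

Each input line names one archive file, so a single call to `map` can produce many output lines, all written without a meaningful key.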
Constructor Summary
- MetadataCDXMapper()
Method Summary
- List<String> index(InputStream archiveInputStream, String archiveName, org.apache.hadoop.mapreduce.Mapper.Context context)
  Extracts CDX lines from an input stream representing a metadata archive file.
- protected void map(org.apache.hadoop.io.LongWritable linenumber, org.apache.hadoop.io.Text archiveFilePath, org.apache.hadoop.mapreduce.Mapper.Context context)
  Mapping method.
Method Detail
map
protected void map(org.apache.hadoop.io.LongWritable linenumber, org.apache.hadoop.io.Text archiveFilePath, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException, InterruptedException
Mapping method.
- Overrides:
  map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
- Parameters:
  linenumber - The line number. Is ignored.
  archiveFilePath - The path to the archive file.
  context - Context used for writing output.
- Throws:
  IOException - If it fails to generate the CDX indexes.
  InterruptedException
index
public List<String> index(InputStream archiveInputStream, String archiveName, org.apache.hadoop.mapreduce.Mapper.Context context) throws IOException
Extracts CDX lines from an input stream representing a metadata archive file.
- Parameters:
  archiveInputStream - The input stream the archive file is read from.
- Returns:
  A list of the CDX lines for the records in the archive file.
- Throws:
  IOException