Class MetadataCDXMapper


  • public class MetadataCDXMapper
    extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
    Hadoop Mapper that creates CDX indexes for metadata files, used by the GUI application's QA pages. The input is a key (not used) and a Text line, which should be the path to the archive file. The output is a NullWritable key (not used) and the generated CDX lines.
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper

        org.apache.hadoop.mapreduce.Mapper.Context
    • Constructor Detail

      • MetadataCDXMapper

        public MetadataCDXMapper()
    • Method Detail

      • map

        protected void map(org.apache.hadoop.io.LongWritable linenumber,
                           org.apache.hadoop.io.Text archiveFilePath,
                           org.apache.hadoop.mapreduce.Mapper.Context context)
                    throws IOException,
                           InterruptedException
        Mapping method. Generates the CDX lines for the archive file at the given path and writes them to the context.
        Overrides:
        map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
        Parameters:
        linenumber - The line number. It is ignored.
        archiveFilePath - The path to the archive file.
        context - Context used for writing output.
        Throws:
        IOException - If it fails to generate the CDX indexes.
        InterruptedException
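
The contract described for map can be sketched in plain Java without the Hadoop classes. This is only an illustration under stated assumptions: MapContractSketch, StreamOpener, Indexer and the Consumer used in place of Mapper.Context are hypothetical stand-ins, and deriving the archive name from the last path segment is a guess, not documented behavior of MetadataCDXMapper.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java sketch of the map contract. None of these names come from the real class.
class MapContractSketch {

    /** Hypothetical stand-in for opening the archive file (Hadoop would use a FileSystem). */
    interface StreamOpener {
        InputStream open(String path) throws IOException;
    }

    /** Hypothetical stand-in for the indexing step that produces CDX lines. */
    interface Indexer {
        List<String> index(InputStream archive, String archiveName) throws IOException;
    }

    /**
     * Ignores the key, treats the value as the path to an archive file,
     * and hands every generated CDX line to the output sink.
     */
    static void map(long lineNumber, String archiveFilePath, StreamOpener opener,
                    Indexer indexer, Consumer<String> output) throws IOException {
        String path = archiveFilePath.trim();
        // Assumption: the archive name is the last segment of the path.
        String archiveName = path.substring(path.lastIndexOf('/') + 1);
        try (InputStream in = opener.open(path)) {
            for (String cdxLine : indexer.index(in, archiveName)) {
                output.accept(cdxLine); // the real mapper would write (NullWritable, Text)
            }
        }
    }

    /** Runs the sketch with an empty in-memory "archive" and a fake indexer. */
    static List<String> demo() {
        List<String> written = new ArrayList<>();
        StreamOpener opener = path -> new ByteArrayInputStream(new byte[0]);
        Indexer indexer = (in, name) -> Arrays.asList(name + " cdx-line-1", name + " cdx-line-2");
        try {
            map(42L, "/some/dir/toy-metadata-1.warc", opener, indexer, written::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The line number passed in plays no role in the result, which mirrors the "ignored" key in the parameter list above.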
      • index

        public List<String> index(InputStream archiveInputStream,
                                  String archiveName,
                                  org.apache.hadoop.mapreduce.Mapper.Context context)
                           throws IOException
        Extracts CDX lines from an input stream representing a metadata archive file.
        Parameters:
        archiveInputStream - The input stream the archive file is read from.
        Returns:
        A list of the CDX lines for the records in the archive file.
        Throws:
        IOException
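
To make the shape of index concrete (a stream and an archive name in, a list of CDX lines out), here is a toy sketch. It does not parse real archive files: the one-record-per-line input format and the field layout of the output lines are invented for illustration, IndexSketch is a hypothetical class, and the context parameter of the real method is left out.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: the record format and the CDX-like field layout are invented.
class IndexSketch {

    /**
     * Reads one toy record per line ("url timestamp mimetype") and returns one
     * CDX-like line per record, with the record's byte offset and the archive
     * name appended. Assumes '\n' line terminators when computing offsets.
     */
    static List<String> index(InputStream archiveInputStream, String archiveName)
            throws IOException {
        List<String> cdxLines = new ArrayList<>();
        long offset = 0;
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(archiveInputStream, StandardCharsets.UTF_8));
        String record;
        while ((record = reader.readLine()) != null) {
            if (!record.isEmpty()) {
                cdxLines.add(record + " " + offset + " " + archiveName);
            }
            // Advance past the record and its '\n' terminator.
            offset += record.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return cdxLines;
    }

    /** Indexes a two-record in-memory toy archive. */
    static List<String> demo() {
        String toyArchive = "metadata://a 20200101000000 text/plain\n"
                + "metadata://b 20200101000005 text/xml\n";
        InputStream in = new ByteArrayInputStream(toyArchive.getBytes(StandardCharsets.UTF_8));
        try {
            return index(in, "toy-metadata-1.warc");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Returning the lines as a list, rather than writing them straight to the output, keeps the indexing step testable on its own with an in-memory stream.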