Class MetadataCDXMapper


  • public class MetadataCDXMapper
    extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text, org.apache.hadoop.io.NullWritable, org.apache.hadoop.io.Text>
    Hadoop Mapper that creates CDX indexes for metadata files, for use by the GUI application's QA pages. The input is a key (not used) and a Text line, which should be the path to the archive file. The output is a NullWritable key (not used) and the generated CDX lines as Text.
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper

        org.apache.hadoop.mapreduce.Mapper.Context
    • Method Summary

      Modifier and Type Method Description
      java.util.List<java.lang.String> index(java.io.InputStream archiveInputStream, java.lang.String archiveName)
      Extracts CDX lines from an input stream representing a metadata archive file.
      protected void map(org.apache.hadoop.io.LongWritable linenumber, org.apache.hadoop.io.Text archiveFilePath, org.apache.hadoop.mapreduce.Mapper.Context context)
      Mapping method.
      • Methods inherited from class org.apache.hadoop.mapreduce.Mapper

        cleanup, run, setup
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • map

        protected void map(org.apache.hadoop.io.LongWritable linenumber,
                           org.apache.hadoop.io.Text archiveFilePath,
                           org.apache.hadoop.mapreduce.Mapper.Context context)
                    throws java.io.IOException,
                           java.lang.InterruptedException
        Mapping method. Generates the CDX lines for the archive file at the given path and writes them to the context.
        Overrides:
        map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text, org.apache.hadoop.io.NullWritable, org.apache.hadoop.io.Text>
        Parameters:
        linenumber - The line number of the input line. Ignored.
        archiveFilePath - The path to the archive file.
        context - Context used for writing output.
        Throws:
        java.io.IOException - If it fails to generate the CDX indexes.
        java.lang.InterruptedException
      • index

        public java.util.List<java.lang.String> index(java.io.InputStream archiveInputStream,
                                                      java.lang.String archiveName)
                                               throws java.io.IOException
        Extracts CDX lines from an input stream representing a metadata archive file.
        Parameters:
        archiveInputStream - The input stream the archive file is read from.
        archiveName - The name of the archive file.
        Returns:
        A list of the CDX lines for the records in the archive file.
        Throws:
        java.io.IOException
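
The calling pattern for index can be sketched as follows. This is only an illustration under stated assumptions: CdxIndexer is a hypothetical stand-in interface that copies the documented signature of index, and STUB is an invented implementation that emits a placeholder line rather than a real CDX line, so that the example runs without Hadoop or the real mapper on the classpath. An instance of MetadataCDXMapper would presumably take the place of the stub.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class IndexUsageSketch {

    /** Hypothetical stand-in that mirrors the documented signature of index. */
    interface CdxIndexer {
        List<String> index(InputStream archiveInputStream, String archiveName) throws IOException;
    }

    /**
     * Stub, NOT the real indexer: returns a single placeholder line made of
     * the archive name and the number of bytes read. Real CDX lines look different.
     */
    static final CdxIndexer STUB = (in, name) -> {
        int size = in.readAllBytes().length; // consume the whole stream (Java 9+)
        List<String> lines = new ArrayList<>();
        lines.add(name + " " + size);
        return lines;
    };

    /** Opens a stream over the given bytes, indexes it, and always closes the stream. */
    static List<String> indexBytes(CdxIndexer indexer, byte[] content, String archiveName)
            throws IOException {
        try (InputStream in = new ByteArrayInputStream(content)) {
            return indexer.index(in, archiveName);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes and file name; a real caller would open the metadata archive file.
        byte[] content = "not a real archive".getBytes(StandardCharsets.UTF_8);
        for (String line : indexBytes(STUB, content, "example-metadata-1.warc")) {
            System.out.println(line);
        }
    }
}
```

The try-with-resources block reflects the declared IOException: the caller owns the stream and closes it whether or not indexing succeeds.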