Class GetMetadataMapper


  • public class GetMetadataMapper
    extends org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
    Hadoop Mapper for extracting metadata entries from metadata files.
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.Mapper

        org.apache.hadoop.mapreduce.Mapper.Context
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static java.lang.String MIME_PATTERN  
      static java.lang.String URL_PATTERN  
    • Method Summary

      Modifier and Type Method Description
      protected void map(org.apache.hadoop.io.LongWritable lineNumber, org.apache.hadoop.io.Text filePath, org.apache.hadoop.mapreduce.Mapper.Context context)
      Mapping method. Processes the metadata file at the given path and writes the extracted entries to the context.
      protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
      Overrides the default Hadoop Mapper setup method.
      • Methods inherited from class org.apache.hadoop.mapreduce.Mapper

        cleanup, run
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • setup

        protected void setup(org.apache.hadoop.mapreduce.Mapper.Context context)
                      throws java.io.IOException,
                             java.lang.InterruptedException
        Overrides the default Hadoop Mapper setup method. Initializes the patterns used for matching the metadata records.
        Overrides:
        setup in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
        Parameters:
        context - The job context. Used for getting the provided Configuration.
        Throws:
        java.io.IOException - Thrown by the superclass's setup method.
        java.lang.InterruptedException - Thrown by the superclass's setup method.
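
        The setup step described above (read settings once, compile the matching patterns once per task) can be illustrated without a Hadoop dependency. This is only a sketch under stated assumptions: a plain Map stands in for the Hadoop Configuration, and the keys, default regular expressions, and the class name PatternSetupSketch are all hypothetical. The actual values of MIME_PATTERN and URL_PATTERN are not documented on this page.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Stand-alone sketch of a setup step; every name and regex here is hypothetical.
class PatternSetupSketch {
    // Hypothetical configuration keys (not taken from GetMetadataMapper).
    static final String URL_KEY = "example.url.pattern";
    static final String MIME_KEY = "example.mime.pattern";

    // Hypothetical fallback regexes for lines such as "url: ..." and "mime: ...".
    static final String DEFAULT_URL_REGEX = "^url:\\s*(\\S+)$";
    static final String DEFAULT_MIME_REGEX = "^mime:\\s*(\\S+/\\S+)$";

    Pattern urlPattern;
    Pattern mimePattern;

    // Plays the role of setup(Context): look up the settings, then compile
    // each pattern once so that later per-record work can reuse it.
    void setup(Map<String, String> conf) {
        urlPattern = Pattern.compile(conf.getOrDefault(URL_KEY, DEFAULT_URL_REGEX));
        mimePattern = Pattern.compile(conf.getOrDefault(MIME_KEY, DEFAULT_MIME_REGEX));
    }
}
```

        Compiling in the setup step rather than per record avoids repeating the regex compilation for every input line.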
      • map

        protected void map(org.apache.hadoop.io.LongWritable lineNumber,
                           org.apache.hadoop.io.Text filePath,
                           org.apache.hadoop.mapreduce.Mapper.Context context)
        Mapping method. Processes the metadata file at the given path and writes the extracted entries to the context.
        Overrides:
        map in class org.apache.hadoop.mapreduce.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.NullWritable,org.apache.hadoop.io.Text>
        Parameters:
        lineNumber - The current line number of the input file (ignored).
        filePath - The path to the input file.
        context - Context used for writing output.
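
        The parameter roles above (an ignored key, a file path as the value, and a sink for output) can be sketched in plain Java. This is not the class's actual implementation: the record layout ("url: ..." followed by "mime: ..."), the tab-separated output, and the name MetadataMapSketch are assumptions made for illustration, and a Consumer<String> stands in for the Hadoop Context.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-alone sketch of a map step; record format and names are hypothetical.
class MetadataMapSketch {
    // Hypothetical line formats for a metadata file.
    private static final Pattern URL = Pattern.compile("^url:\\s*(\\S+)$");
    private static final Pattern MIME = Pattern.compile("^mime:\\s*(\\S+)$");

    // lineNumber is accepted only to mirror the documented signature; it is unused.
    static void map(long lineNumber, String filePath, Consumer<String> out)
            throws IOException {
        String url = null;
        for (String line : Files.readAllLines(Paths.get(filePath))) {
            Matcher u = URL.matcher(line);
            if (u.matches()) {
                url = u.group(1); // remember the URL until its MIME type appears
                continue;
            }
            Matcher m = MIME.matcher(line);
            if (m.matches() && url != null) {
                // Emit one value per entry; in Hadoop the key would be NullWritable.
                out.accept(url + "\t" + m.group(1));
                url = null;
            }
        }
    }
}
```

        A caller could collect the output with a list, for example `MetadataMapSketch.map(0L, path, results::add)`, where `path` and `results` are the caller's own variables.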