Class MetadataExtractionStrategy

  • All Implemented Interfaces:
    HadoopJobStrategy

    public class MetadataExtractionStrategy
    extends java.lang.Object
    implements HadoopJobStrategy
    Strategy supplied to a HadoopJob for extracting selected content from metadata files that match specific URL and MIME patterns. The mapper expects these patterns to be set in the Configuration before use; if they are not set, it falls back to patterns that match everything. This type of job is the Hadoop counterpart to running GetMetadataArchiveBatchJob.
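    The fallback described above can be sketched as follows. This is a minimal illustration, not the mapper's actual code: a plain Map stands in for the Hadoop Configuration, and the key names, the class name, and the isSelected helper are all hypothetical.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative only: a Map stands in for Hadoop's Configuration, and the
// key names below are hypothetical, not the keys the real mapper reads.
final class PatternFallbackSketch {
    static final String URL_PATTERN_KEY = "example.url.pattern";   // hypothetical key
    static final String MIME_PATTERN_KEY = "example.mime.pattern"; // hypothetical key
    static final String MATCH_ALL = ".*";

    // Returns the configured pattern, or an all-matching one when the key is unset.
    static Pattern patternOrMatchAll(Map<String, String> conf, String key) {
        return Pattern.compile(conf.getOrDefault(key, MATCH_ALL));
    }

    // A record is selected only if both its URL and its MIME type match.
    static boolean isSelected(Map<String, String> conf, String url, String mimeType) {
        return patternOrMatchAll(conf, URL_PATTERN_KEY).matcher(url).matches()
            && patternOrMatchAll(conf, MIME_PATTERN_KEY).matcher(mimeType).matches();
    }

    public static void main(String[] args) {
        Map<String, String> unset = Map.of();
        Map<String, String> cdxOnly = Map.of(MIME_PATTERN_KEY, "application/x-cdx");
        // With no patterns set, every record is selected.
        System.out.println(isSelected(unset, "metadata://example/crawl.log", "text/plain"));
        // With a MIME pattern set, records of other MIME types are skipped.
        System.out.println(isSelected(cdxOnly, "metadata://example/crawl.log", "text/plain"));
    }
}
```

    Defaulting to an all-matching pattern means a job that is submitted without any patterns extracts from every record rather than failing.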
    • Constructor Summary

      Constructors 
      Constructor Description
      MetadataExtractionStrategy(long jobID, org.apache.hadoop.fs.FileSystem fileSystem)
      Creates a strategy for the given job ID and Hadoop FileSystem.
    • Method Summary

      Modifier and Type Method Description
      org.apache.hadoop.fs.Path createJobInputFile(java.util.UUID uuid)
      Creates the job input file, deriving its name from a UUID.
      org.apache.hadoop.fs.Path createJobOutputDir(java.util.UUID uuid)
      Creates the job output directory, deriving its name from a UUID.
      java.lang.String getJobType()
      Returns a string specifying which kind of job is being run.
      int runJob(org.apache.hadoop.fs.Path jobInputFile, org.apache.hadoop.fs.Path jobOutputDir)
      Runs a Hadoop job (HadoopJobTool) according to the specification of the used strategy.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • MetadataExtractionStrategy

        public MetadataExtractionStrategy(long jobID,
                                          org.apache.hadoop.fs.FileSystem fileSystem)
        Creates a strategy for the given job ID and Hadoop FileSystem.
        Parameters:
        jobID - The ID for the job.
        fileSystem - The Hadoop FileSystem used.
    • Method Detail

      • runJob

        public int runJob(org.apache.hadoop.fs.Path jobInputFile,
                          org.apache.hadoop.fs.Path jobOutputDir)
        Description copied from interface: HadoopJobStrategy
        Runs a Hadoop job (HadoopJobTool) according to the specification of the used strategy.
        Specified by:
        runJob in interface HadoopJobStrategy
        Parameters:
        jobInputFile - The Path specifying the job's input file.
        jobOutputDir - The Path specifying the job's output directory.
        Returns:
        An exit code for the job.
      • createJobInputFile

        public org.apache.hadoop.fs.Path createJobInputFile(java.util.UUID uuid)
        Description copied from interface: HadoopJobStrategy
        Creates the job input file, deriving its name from a UUID.
        Specified by:
        createJobInputFile in interface HadoopJobStrategy
        Parameters:
        uuid - The UUID to create a unique name from.
        Returns:
        Path specifying where the input file is located.
      • createJobOutputDir

        public org.apache.hadoop.fs.Path createJobOutputDir(java.util.UUID uuid)
        Description copied from interface: HadoopJobStrategy
        Creates the job output directory, deriving its name from a UUID.
        Specified by:
        createJobOutputDir in interface HadoopJobStrategy
        Parameters:
        uuid - The UUID to create a unique name from.
        Returns:
        Path specifying where the output directory is located.
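    The three methods above are typically called in sequence. The sketch below shows one plausible order under stated assumptions: JobStrategySketch is a hypothetical stand-in that mirrors the documented method signatures, java.nio.file.Path replaces org.apache.hadoop.fs.Path so the example compiles without Hadoop, and the toy strategy only builds path names without touching a file system or running a real job. Treating an exit code of 0 as success is the usual convention and is assumed here.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

// Hypothetical stand-in mirroring the documented HadoopJobStrategy methods.
// java.nio.file.Path replaces org.apache.hadoop.fs.Path for this sketch.
interface JobStrategySketch {
    Path createJobInputFile(UUID uuid);
    Path createJobOutputDir(UUID uuid);
    int runJob(Path jobInputFile, Path jobOutputDir);
    String getJobType();
}

final class StrategyWorkflowSketch {
    // Hypothetical driver: derive both paths from one UUID, then run the job.
    static int runWith(JobStrategySketch strategy, UUID uuid) {
        Path input = strategy.createJobInputFile(uuid);
        Path output = strategy.createJobOutputDir(uuid);
        return strategy.runJob(input, output);
    }

    // Toy strategy for illustration; it creates no files and runs no job.
    static JobStrategySketch toyStrategy() {
        return new JobStrategySketch() {
            @Override
            public Path createJobInputFile(UUID uuid) {
                return Paths.get("input", uuid.toString());
            }

            @Override
            public Path createJobOutputDir(UUID uuid) {
                return Paths.get("output", uuid.toString());
            }

            @Override
            public int runJob(Path jobInputFile, Path jobOutputDir) {
                return 0; // assumed convention: 0 means success
            }

            @Override
            public String getJobType() {
                return "TOY_JOB"; // hypothetical job type label
            }
        };
    }

    public static void main(String[] args) {
        JobStrategySketch strategy = toyStrategy();
        int exitCode = runWith(strategy, UUID.randomUUID());
        System.out.println(strategy.getJobType() + " finished with exit code " + exitCode);
    }
}
```

    Using the same UUID for both the input file and the output directory keeps the two paths of one job easy to correlate.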