Class Reporting


  • public class Reporting
    extends java.lang.Object
    Methods for generating the batch results needed by the QA pages.
    • Method Summary

      Modifier and Type Method Description
      static java.io.File getCrawlLogForDomainInJob(java.lang.String domain, long jobid)
      Submit a batch job to extract the part of a crawl log that is associated with the given domain and job.
      static java.io.File getCrawlLoglinesMatchingRegexp(long jobid, java.lang.String regexp)
      Return any crawl log lines for the given jobid that match the given regular expression.
      static java.util.List<java.lang.String> getFilesForJob(long jobid, java.lang.String harvestprefix)
      Retrieve a list of all files uploaded for a given harvest job.
      static java.util.List<CDXRecord> getMetadataCDXRecordsForJob(long jobid)
      Depending on settings, submits either a Hadoop job or a batch job to generate CDX records for all metadata files for a job, and returns the results in a list.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • getFilesForJob

        public static java.util.List<java.lang.String> getFilesForJob(long jobid,
                                                                      java.lang.String harvestprefix)
        Retrieve a list of all files uploaded for a given harvest job. For installations that use batch, this is done via a batch job; for Hadoop-based installations, it is done via an implementation of dk.netarkivet.common.utils.service.FileResolver.
        Parameters:
        jobid - the job for which files are required
        harvestprefix - the prefix for the (w)arc datafiles for this job as determined by the implementation of ArchiveFileNaming used in the installation
        Returns:
        a list of filenames
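Whichever backend performs the lookup (a batch job or a FileResolver), the result is the set of files whose names carry the job's harvest prefix. The prefix-matching step can be sketched in memory as below. This is a minimal illustration, assuming the job's archive filenames all begin with harvestprefix; HarvestFileFilter and filesForPrefix are hypothetical names, not part of this API, and the real method does not receive the candidate filenames as a list.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: an in-memory stand-in for the remote lookup that
// getFilesForJob performs via a batch job or a FileResolver implementation.
final class HarvestFileFilter {
    private HarvestFileFilter() {}

    /**
     * Returns the names in allFiles that start with harvestprefix, in their
     * original order. Assumes the prefix alone identifies the job's files.
     */
    static List<String> filesForPrefix(List<String> allFiles, String harvestprefix) {
        if (harvestprefix == null || harvestprefix.isEmpty()) {
            throw new IllegalArgumentException("harvestprefix must be non-empty");
        }
        return allFiles.stream()
                .filter(name -> name.startsWith(harvestprefix))
                .collect(Collectors.toList());
    }
}
```

Rejecting an empty prefix matters here: an empty string is a prefix of every filename, so it would silently return the whole archive.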
      • getMetadataCDXRecordsForJob

        public static java.util.List<CDXRecord> getMetadataCDXRecordsForJob(long jobid)
        Depending on settings, submits either a Hadoop job or a batch job to generate CDX records for all metadata files for a job, and returns the results in a list.
        Parameters:
        jobid - The job to get cdx for.
        Returns:
        A list of cdx records.
        Throws:
        ArgumentNotValid - If jobid is 0 or negative.
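The settings-dependent choice between a Hadoop job and a batch job, together with the documented precondition on jobid, can be sketched as a simple dispatch. This is a minimal sketch under stated assumptions: the boolean useHadoop stands in for the installation setting, the two LongFunction producers are hypothetical placeholders for the real jobs, plain CDX lines (String) replace CDXRecord objects, and IllegalArgumentException is used in place of ArgumentNotValid.

```java
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical sketch of settings-based dispatch; the real method reads its
// configuration from the installation's settings and returns CDXRecord objects.
final class MetadataCdxDispatch {
    private final boolean useHadoop;                    // stands in for the real setting
    private final LongFunction<List<String>> hadoopJob; // hypothetical Hadoop-backed producer
    private final LongFunction<List<String>> batchJob;  // hypothetical batch-backed producer

    MetadataCdxDispatch(boolean useHadoop,
                        LongFunction<List<String>> hadoopJob,
                        LongFunction<List<String>> batchJob) {
        this.useHadoop = useHadoop;
        this.hadoopJob = hadoopJob;
        this.batchJob = batchJob;
    }

    List<String> cdxLinesForJob(long jobid) {
        // Mirrors the documented precondition: jobid 0 or negative is rejected.
        // IllegalArgumentException stands in for ArgumentNotValid.
        if (jobid <= 0) {
            throw new IllegalArgumentException("jobid must be positive, was " + jobid);
        }
        return useHadoop ? hadoopJob.apply(jobid) : batchJob.apply(jobid);
    }
}
```

Injecting both producers keeps the validation and dispatch logic testable without a running cluster or batch system.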
      • getCrawlLogForDomainInJob

        public static java.io.File getCrawlLogForDomainInJob(java.lang.String domain,
                                                             long jobid)
        Submit a batch job to extract the part of a crawl log that is associated with the given domain and job.
        Parameters:
        domain - The domain to get crawl.log lines for.
        jobid - The jobid to get crawl.log lines for.
        Returns:
        A file containing the crawl.log lines. This file is temporary and should be deleted after use.
        Throws:
        ArgumentNotValid - On negative jobids, or if domain is null or the empty string.
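Because the returned file is temporary and should be deleted after use, a caller typically reads it and deletes it in a finally block so the file is removed even if reading fails. The sketch below shows that pattern only. TempCrawlLogReader and readAndDelete are hypothetical names, and the UTF-8 encoding of the file is an assumption not stated in this documentation.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;

// Hypothetical helper showing how a caller might consume the temporary file
// returned by getCrawlLogForDomainInJob and then delete it, as the Javadoc asks.
final class TempCrawlLogReader {
    private TempCrawlLogReader() {}

    /** Reads every line of crawlLog (assumed UTF-8), then deletes the file even if reading fails. */
    static List<String> readAndDelete(File crawlLog) throws IOException {
        try {
            return Files.readAllLines(crawlLog.toPath(), StandardCharsets.UTF_8);
        } finally {
            // Runs after a successful read and after a failed one alike.
            Files.deleteIfExists(crawlLog.toPath());
        }
    }
}
```

Reading the whole file into memory suits modest per-domain extracts; for very large crawl logs, streaming with Files.lines inside a try-with-resources block would be the safer choice.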
      • getCrawlLoglinesMatchingRegexp

        public static java.io.File getCrawlLoglinesMatchingRegexp(long jobid,
                                                                  java.lang.String regexp)
        Return any crawl log lines for the given jobid that match the given regular expression.
        Parameters:
        jobid - The ID of the job whose crawl log is searched.
        regexp - The regular expression that returned lines must match.
        Returns:
        a File with the matching lines.
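The filtering this method performs can be illustrated locally: compile the expression once, keep the lines it matches, and write them to a temporary file to mirror the File return type. This is a minimal sketch, not the method's implementation; CrawlLogGrep and its methods are hypothetical, the lines are passed in rather than fetched for a jobid, and the choice of Matcher.find (match anywhere in the line) is an assumption, since this documentation does not say whether the real job requires a whole-line match.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical local version of the filtering step. The real method runs it
// over the job's crawl log remotely and returns the result as a File.
final class CrawlLogGrep {
    private CrawlLogGrep() {}

    /** Keeps lines in which regexp matches anywhere (Matcher.find, an assumption). */
    static List<String> matchingLines(List<String> crawlLogLines, String regexp) {
        Pattern pattern = Pattern.compile(regexp); // compile once, reuse for every line
        return crawlLogLines.stream()
                .filter(line -> pattern.matcher(line).find())
                .collect(Collectors.toList());
    }

    /** Writes the matching lines (UTF-8) to a new temporary file; the caller should delete it. */
    static File matchingLinesToTempFile(List<String> crawlLogLines, String regexp) throws IOException {
        File out = File.createTempFile("crawllog-matches", ".txt");
        Files.write(out.toPath(), matchingLines(crawlLogLines, regexp), StandardCharsets.UTF_8);
        return out;
    }
}
```

If whole-line semantics were required instead, replacing find() with matches() would anchor the expression at both ends of each line.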