Class Reporting


  • public class Reporting
    extends Object
    Methods for generating the batch results needed by the QA pages.
    • Method Detail

      • getFilesForJob

        public static List<String> getFilesForJob(long jobid,
                                                  String harvestprefix)
        Retrieves a list of all files uploaded for a given harvest job. Installations that use batch obtain the list via a batch job; Hadoop-based installations obtain it via an implementation of dk.netarkivet.common.utils.service.FileResolver.
        Parameters:
        jobid - the job for which files are required
        harvestprefix - the prefix of the (W)ARC data files for this job, as determined by the ArchiveFileNaming implementation used in the installation
        Returns:
        a list of filenames
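
As a rough illustration of what selecting files by harvestprefix means, the sketch below filters a list of names by prefix. It does not reproduce how Reporting resolves files through batch or FileResolver. The class FilesForJobSketch, the method filterByPrefix, and the sample filenames are all hypothetical, and real filenames depend on the ArchiveFileNaming implementation in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilesForJobSketch {
    /** Keeps only the names that start with the given harvest prefix. */
    static List<String> filterByPrefix(List<String> allFiles, String harvestPrefix) {
        List<String> matches = new ArrayList<>();
        for (String name : allFiles) {
            if (name.startsWith(harvestPrefix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical filenames; the real naming scheme is installation-specific.
        List<String> files = Arrays.asList(
                "42-117-a.warc", "42-117-metadata-1.warc", "43-118-a.warc");
        System.out.println(filterByPrefix(files, "42-117"));
    }
}
```

With the sample data, only the two names beginning with "42-117" are kept.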
      • getMetadataCDXRecordsForJob

        public static List<CDXRecord> getMetadataCDXRecordsForJob(long jobid)
        Depending on settings, submits either a Hadoop job or a batch job to generate CDX records for all metadata files of a job, and returns the results as a list.
        Parameters:
        jobid - The job to get cdx for.
        Returns:
        A list of cdx records.
        Throws:
        ArgumentNotValid - If jobid is 0 or negative.
      • getCrawlLogForDomainInJob

        public static File getCrawlLogForDomainInJob(String domain,
                                                     long jobid)
        Submit a batch job to extract the part of a crawl log that is associated with the given domain and job.
        Parameters:
        domain - The domain to get crawl.log-lines for.
        jobid - The jobid to get the crawl.log-lines for.
        Returns:
        A file containing the crawl.log lines. This file is temporary, and should be deleted after use.
        Throws:
        ArgumentNotValid - On negative jobids, or if domain is null or the empty string.
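
Because the returned file is temporary and should be deleted after use, a caller might wrap the read in try/finally. The sketch below shows one way to do that. It uses a locally created temporary file as a stand-in for the value returned by getCrawlLogForDomainInJob; the class TempCrawlLogSketch and the method readAndDelete are hypothetical helpers, not part of Reporting.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

public class TempCrawlLogSketch {
    /** Reads all lines from the file, then deletes it, even if reading fails. */
    static List<String> readAndDelete(File tempFile) throws IOException {
        try {
            return Files.readAllLines(tempFile.toPath(), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(tempFile.toPath());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the File that getCrawlLogForDomainInJob would return.
        File crawlLog = File.createTempFile("crawllog", ".txt");
        Files.write(crawlLog.toPath(), Arrays.asList("line one", "line two"),
                StandardCharsets.UTF_8);

        List<String> lines = readAndDelete(crawlLog);
        System.out.println(lines.size() + " lines read, file still exists: "
                + crawlLog.exists());
    }
}
```

Putting the delete in a finally block means the temporary file is removed whether or not the read succeeds.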
      • getCrawlLoglinesMatchingRegexp

        public static File getCrawlLoglinesMatchingRegexp(long jobid,
                                                          String regexp)
        Returns the crawl log lines for the given job that match the given regular expression.
        Parameters:
        jobid - The ID of the job whose crawl log is searched.
        regexp - The regular expression to match crawl log lines against.
        Returns:
        a File with the matching lines.
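
The kind of line selection described above can be sketched locally with java.util.regex. This example assumes a line is selected only when the whole line matches the expression (Matcher.matches()); the description above does not say whether a partial match would also count. The class CrawlLogRegexSketch, the method linesMatching, and the simplified sample lines are hypothetical and do not reflect the real crawl.log format.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class CrawlLogRegexSketch {
    /** Returns the lines that match the regular expression in full. */
    static List<String> linesMatching(List<String> lines, String regexp) {
        Pattern pattern = Pattern.compile(regexp);
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            // Assumption: whole-line match, not a substring search.
            if (pattern.matcher(line).matches()) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Simplified, made-up log lines for illustration only.
        List<String> lines = Arrays.asList(
                "200 1234 http://example.org/index.html",
                "404 0 http://example.org/missing.png",
                "200 567 http://example.com/");
        for (String line : linesMatching(lines, "404 .*")) {
            System.out.println(line);
        }
    }
}
```

Under the whole-line assumption, an expression meant to match text anywhere in a line needs leading and trailing `.*`.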
      • getCrawlLogLinesMatchingDomain

        public static File getCrawlLogLinesMatchingDomain(long jobID,
                                                          String domain)