dk.netarkivet.viewerproxy.webinterface
Class Reporting

java.lang.Object
  extended by dk.netarkivet.viewerproxy.webinterface.Reporting

public class Reporting
extends java.lang.Object

Methods for generating the batch results needed by the QA pages.


Field Summary
(package private) static java.lang.String archivefile_suffix
          The suffix for the data arc/warc files produced by Heritrix.
(package private) static java.lang.String metadatafile_suffix
          The suffix for the data arc/warc metadata file created by NetarchiveSuite.
 
Method Summary
static java.io.File getCrawlLogForDomainInJob(java.lang.String domain, int jobid)
          Submit a batch job to extract the part of a crawl log that is associated with the given domain and job.
static java.io.File getCrawlLoglinesMatchingRegexp(int jobid, java.lang.String regexp)
          Return the crawl log lines for a given job ID that match the given regular expression.
static java.util.List<java.lang.String> getFilesForJob(int jobid, java.lang.String harvestprefix)
          Submit a batch job to list all files for a job, and report result in a sorted list.
static java.util.List<CDXRecord> getMetadataCDXRecordsForJob(long jobid)
          Submit a batch job to generate CDX records for all metadata files for a job, and report the result in a list.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

archivefile_suffix

static final java.lang.String archivefile_suffix
The suffix for the data arc/warc files produced by Heritrix.

See Also:
Constant Field Values

metadatafile_suffix

static final java.lang.String metadatafile_suffix
The suffix for the data arc/warc metadata file created by NetarchiveSuite.

See Also:
Constant Field Values
Method Detail

getFilesForJob

public static java.util.List<java.lang.String> getFilesForJob(int jobid,
                                                              java.lang.String harvestprefix)
Submit a batch job to list all files for a job, and report result in a sorted list.

Parameters:
jobid - The job to get files for.
harvestprefix - The harvest prefix for the files produced by Heritrix.
Returns:
A sorted list of files.
Throws:
ArgumentNotValid - If jobid is 0 or negative.
IOFailure - On trouble generating the file list.
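
The documented contract (a positive job ID and a sorted result) can be illustrated with a small local stand-in. This is only a sketch: the class FileListSketch, the method sortedFilesForJob, the batchOutput parameter and the sample file names are hypothetical, IllegalArgumentException stands in for ArgumentNotValid, and selecting files by "name starts with the harvest prefix" is an assumption, not a statement about how NetarchiveSuite selects files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FileListSketch {
    /**
     * Hypothetical stand-in, not the NetarchiveSuite implementation.
     * Keeps the names that start with the prefix and returns them sorted.
     */
    static List<String> sortedFilesForJob(int jobid, String harvestprefix,
                                          List<String> batchOutput) {
        if (jobid <= 0) {
            // The real method is documented to throw ArgumentNotValid here.
            throw new IllegalArgumentException("jobid must be positive, was " + jobid);
        }
        List<String> result = new ArrayList<>();
        for (String name : batchOutput) {
            // Assumption of this sketch: files are selected by prefix match.
            if (name.startsWith(harvestprefix)) {
                result.add(name);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Made-up file names; batch output may arrive in any order.
        List<String> unsorted = Arrays.asList("42-7-b.warc", "99-1-a.warc", "42-7-a.warc");
        System.out.println(sortedFilesForJob(42, "42-7", unsorted));
    }
}
```

Sorting after filtering keeps the result independent of the order in which the batch job happens to report the files.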

getMetadataCDXRecordsForJob

public static java.util.List<CDXRecord> getMetadataCDXRecordsForJob(long jobid)
Submit a batch job to generate CDX records for all metadata files for a job, and report the result in a list.

Parameters:
jobid - The job to get CDX records for.
Returns:
A list of CDX records.
Throws:
ArgumentNotValid - If jobid is 0 or negative.
IOFailure - On trouble generating the CDX records.

getCrawlLogForDomainInJob

public static java.io.File getCrawlLogForDomainInJob(java.lang.String domain,
                                                     int jobid)
Submit a batch job to extract the part of a crawl log that is associated with the given domain and job.

Parameters:
domain - The domain to get crawl.log-lines for.
jobid - The jobid to get the crawl.log-lines for.
Returns:
A file containing the crawl.log lines. This file is temporary, and should be deleted after use.
Throws:
ArgumentNotValid - If jobid is negative, or if domain is null or the empty string.
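
Because the returned file is temporary, the caller is responsible for deleting it. One way to make sure that happens even when reading fails is a try/finally block, sketched below. The class TempCrawlLogSketch and the method readAndDelete are hypothetical helpers, and the file created in main is only a stand-in for the File that getCrawlLogForDomainInJob would return; UTF-8 is assumed as the encoding.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

class TempCrawlLogSketch {
    /** Reads all lines from a temporary result file, then deletes it even if reading fails. */
    static List<String> readAndDelete(File tempResult) throws IOException {
        try {
            return Files.readAllLines(tempResult.toPath(), StandardCharsets.UTF_8);
        } finally {
            // The caller owns the temporary file and must clean it up.
            Files.deleteIfExists(tempResult.toPath());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the File returned by getCrawlLogForDomainInJob.
        File fake = File.createTempFile("crawllog", ".txt");
        Files.write(fake.toPath(), Arrays.asList("line one", "line two"),
                StandardCharsets.UTF_8);

        List<String> lines = readAndDelete(fake);
        System.out.println(lines.size() + " lines read, file still exists: " + fake.exists());
    }
}
```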

getCrawlLoglinesMatchingRegexp

public static java.io.File getCrawlLoglinesMatchingRegexp(int jobid,
                                                          java.lang.String regexp)
Return the crawl log lines for a given job ID that match the given regular expression.

Parameters:
jobid - The ID of the job whose crawl log is searched.
regexp - A regular expression that the returned crawl log lines must match.
Returns:
A file containing the matching lines.
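
The kind of filtering described here can be sketched locally with java.util.regex. The class CrawlLogRegexSketch and the method linesMatching are hypothetical, the sample lines are simplified and are not real Heritrix crawl log entries, and the sketch assumes the whole line must match the expression (Matcher.matches()); the documentation above does not say whether a partial match is enough.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class CrawlLogRegexSketch {
    /** Returns the lines that match the expression, in their original order. */
    static List<String> linesMatching(List<String> crawlLogLines, String regexp) {
        Pattern pattern = Pattern.compile(regexp);
        List<String> matches = new ArrayList<>();
        for (String line : crawlLogLines) {
            // Assumption of this sketch: the whole line must match.
            if (pattern.matcher(line).matches()) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Simplified, made-up lines for illustration only.
        List<String> lines = Arrays.asList(
                "200 http://example.org/a",
                "404 http://example.org/b");
        System.out.println(linesMatching(lines, "404 .*"));
    }
}
```

Under the whole-line assumption, a pattern meant to find a substring needs to be wrapped as ".*substring.*".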