dk.netarkivet.viewerproxy.reporting
Class HarvestedUrlsForDomainBatchJob

java.lang.Object
  extended by dk.netarkivet.common.utils.batch.FileBatchJob
      extended by dk.netarkivet.common.utils.arc.ARCBatchJob
          extended by dk.netarkivet.viewerproxy.reporting.HarvestedUrlsForDomainBatchJob
All Implemented Interfaces:
java.io.Serializable

public class HarvestedUrlsForDomainBatchJob
extends ARCBatchJob

Batch job that extracts the lines referring to a specific domain from a crawl log. The job should be restricted to run only on the metadata files of a single harvest job, using FileBatchJob.processOnlyFilesMatching(String).

See Also:
Serialized Form
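
The restriction to one job's metadata files is expressed as a filename pattern. The sketch below is a minimal, self-contained illustration of building and checking such a pattern. It does not use the NetarchiveSuite classes. The naming scheme "<jobId>-metadata-<n>.arc", the class MetadataFilePatternSketch and both of its methods are assumptions made for illustration, not part of this API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class MetadataFilePatternSketch {

    /**
     * Builds a regular expression for the metadata files of one job.
     * ASSUMPTION: metadata files are named "<jobId>-metadata-<n>.arc".
     */
    public static String metadataPatternForJob(long jobId) {
        return jobId + "-metadata-[0-9]+\\.arc";
    }

    /** Keeps only the names that match the regex as a whole. */
    public static List<String> select(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return names.stream()
                .filter(name -> pattern.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "42-metadata-1.arc",          // metadata file of job 42
                "142-metadata-1.arc",         // metadata file of another job
                "42-117-20090101-00001.arc"); // ordinary harvest file
        System.out.println(select(files, metadataPatternForJob(42)));
    }
}
```

Under the same assumptions, the string returned by metadataPatternForJob is the kind of value that would be passed to FileBatchJob.processOnlyFilesMatching(String) before the job is submitted. Whole-name matching is used here so that job 42 does not also pick up the files of job 142.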

Nested Class Summary
 
Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
FileBatchJob.ExceptionOccurrence
 
Field Summary
(package private)  java.lang.String domain
          The domain to extract crawl.log lines for.
 
Fields inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob
noOfRecordsProcessed
 
Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed
 
Constructor Summary
HarvestedUrlsForDomainBatchJob(java.lang.String domain)
          Initialise the batch job.
 
Method Summary
 void finish(java.io.OutputStream os)
          Does nothing; no finishing is needed.
 ARCBatchFilter getFilter()
          Returns a BatchFilter object which restricts the set of ARC records in the archive on which this batch job is performed.
 void initialize(java.io.OutputStream os)
          Does nothing; no initialisation is needed.
 void processRecord(org.archive.io.arc.ARCRecord record, java.io.OutputStream os)
          Processes a crawl log record, writing the lines that concern the given domain to the result.
 java.lang.String toString()
          Human-readable representation of this instance.
 
Methods inherited from class dk.netarkivet.common.utils.arc.ARCBatchJob
getExceptionArray, handleException, noOfRecordsProcessed, processFile
 
Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

domain

final java.lang.String domain
The domain to extract crawl.log lines for.

Constructor Detail

HarvestedUrlsForDomainBatchJob

public HarvestedUrlsForDomainBatchJob(java.lang.String domain)
Initialise the batch job.

Parameters:
domain - The domain to get crawl.log lines for.

Method Detail

initialize

public void initialize(java.io.OutputStream os)
Does nothing; no initialisation is needed.

Specified by:
initialize in class ARCBatchJob
Parameters:
os - Not used.

getFilter

public ARCBatchFilter getFilter()
Description copied from class: ARCBatchJob
Returns a BatchFilter object which restricts the set of ARC records in the archive on which this batch job is performed. The default value is a neutral filter which allows all records.

Overrides:
getFilter in class ARCBatchJob
Returns:
A filter telling which records should be given to processRecord().
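
The idea of a record filter can be sketched without the NetarchiveSuite types. In the minimal example below a filter is modelled as a predicate over record URLs: one neutral filter that allows everything, and one that accepts only crawl log records. The class RecordFilterSketch, its constants and the URL prefix are hypothetical stand-ins for illustration. The actual ARCBatchFilter API and the real crawl log record URL are not shown on this page.

```java
import java.util.function.Predicate;

public class RecordFilterSketch {

    /** Neutral filter: every record is passed on for processing. */
    public static final Predicate<String> NO_FILTER = url -> true;

    /** HYPOTHETICAL prefix marking crawl log records in a metadata file. */
    public static final String CRAWL_LOG_PREFIX =
            "metadata://example.org/crawl/logs/crawl.log";

    /** Accepts only records whose URL starts with the crawl log prefix. */
    public static final Predicate<String> ONLY_CRAWL_LOG =
            url -> url != null && url.startsWith(CRAWL_LOG_PREFIX);

    public static void main(String[] args) {
        String crawlLog = CRAWL_LOG_PREFIX + "?jobid=42";
        String other = "metadata://example.org/crawl/reports/seeds-report.txt";
        System.out.println(NO_FILTER.test(other));         // true
        System.out.println(ONLY_CRAWL_LOG.test(crawlLog)); // true
        System.out.println(ONLY_CRAWL_LOG.test(other));    // false
    }
}
```

A job that overrides getFilter() with something like the second predicate would see only crawl log records in processRecord(), which keeps the per-record logic free of type checks.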

processRecord

public void processRecord(org.archive.io.arc.ARCRecord record,
                          java.io.OutputStream os)
Processes a crawl log record, writing the lines that concern the given domain to the result.

Specified by:
processRecord in class ARCBatchJob
Parameters:
record - The record to process.
os - The output stream for the result.
Throws:
ArgumentNotValid - on null parameters
IOFailure - on trouble processing the record.
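
The per-record work can be illustrated with a self-contained sketch that reads crawl log lines from a stream and copies those belonging to a domain to an output stream. This is not the NetarchiveSuite implementation. It assumes the usual Heritrix crawl.log layout in which the fetched URL is the fourth whitespace-separated field, and it uses a simple host suffix check as a stand-in for real domain extraction. The class CrawlLogDomainFilterSketch and all of its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class CrawlLogDomainFilterSketch {

    /** True if the host is the domain itself or one of its subdomains. */
    public static boolean belongsToDomain(String host, String domain) {
        return host != null
                && (host.equals(domain) || host.endsWith("." + domain));
    }

    /**
     * Returns the host of the URL in the fourth field of a crawl log line,
     * or null if the line is too short or the URL has no parsable host.
     * ASSUMPTION: the URL is the fourth whitespace-separated field.
     */
    public static String hostOfLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 4) {
            return null;
        }
        try {
            return URI.create(fields[3]).getHost();
        } catch (IllegalArgumentException e) {
            return null; // malformed URL: treat as not matching
        }
    }

    /** Copies the lines of the record that concern the domain to os. */
    public static void copyLinesForDomain(InputStream record, OutputStream os,
            String domain) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(record, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            if (belongsToDomain(hostOfLine(line), domain)) {
                os.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    /** Convenience wrapper working on strings, handy for trying it out. */
    public static String filter(String crawlLog, String domain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyLinesForDomain(new ByteArrayInputStream(
                    crawlLog.getBytes(StandardCharsets.UTF_8)), out, domain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

For example, given one line for http://www.example.org/ and one for http://notexample.org/a, calling filter(log, "example.org") keeps only the first line, because the suffix check requires a dot before the domain name.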

finish

public void finish(java.io.OutputStream os)
Does nothing; no finishing is needed.

Specified by:
finish in class ARCBatchJob
Parameters:
os - Not used.

toString

public java.lang.String toString()
Human-readable representation of this instance.

Overrides:
toString in class java.lang.Object
Returns:
The class content.