dk.netarkivet.harvester.datamodel
Class HeritrixTemplate

java.lang.Object
  extended by dk.netarkivet.harvester.datamodel.HeritrixTemplate

public class HeritrixTemplate
extends java.lang.Object

Class encapsulating the Heritrix order.xml. Enables verification that a dom4j Document obeys the constraints required by our software, specifically the Job class. The class assumes the type of order.xml used to configure Heritrix version 1.10+. Information about the Heritrix crawler and its processes and modules can be found in the Heritrix developer and user manuals at http://crawler.archive.org


Field Summary
static java.lang.String ARCHIVER_PATH_XPATH
          Xpath to check that all templates use the same archiver path, Constants.ARCDIRECTORY_NAME.
static java.lang.String BALANCE_REPLENISH_AMOUNT_XPATH
          Xpath needed by Job.editOrderXML_maxObjectsPerDomain().
static java.lang.String DECIDERULES_MAP_XPATH
          Xpath needed by Job.editOrderXML_crawlerTraps().
static java.lang.String DECIDINGSCOPE_XPATH
          Xpath to check that all templates use the DecidingScope.
static java.lang.String DEDUPLICATOR_XPATH
          Xpath for the deduplicator node in order.xml documents.
static java.lang.String ERROR_PENALTY_AMOUNT_XPATH
          Xpath needed by Job.editOrderXML_maxObjectsPerDomain().
static java.lang.String GROUP_MAX_ALL_KB_XPATH
          Xpath needed by Job.editOrderXML_maxBytesPerDomain().
static java.lang.String GROUP_MAX_FETCH_SUCCESS_XPATH
          Xpath needed by Job.editOrderXML_maxObjectsPerDomain().
static java.lang.String HERITRIX_FROM_XPATH
          Xpath checked by Heritrix for correct mail address.
static java.lang.String HERITRIX_USER_AGENT_XPATH
          Xpath checked by Heritrix for correct user-agent field in requests.
static java.lang.String QUEUE_TOTAL_BUDGET_XPATH
          Xpath needed by Job.editOrderXML_maxObjectsPerDomain().
 
Constructor Summary
HeritrixTemplate(org.dom4j.Document doc)
          Alternate constructor, which always verifies the given document.
HeritrixTemplate(org.dom4j.Document doc, boolean verify)
          Constructor for HeritrixTemplate class.
 
Method Summary
 org.dom4j.Document getTemplate()
          Returns the template.
 java.lang.String getXML()
          Return HeritrixTemplate as XML.
 boolean isVerified()
          Has Template been verified?
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

GROUP_MAX_ALL_KB_XPATH

public static final java.lang.String GROUP_MAX_ALL_KB_XPATH
Xpath needed by Job.editOrderXML_maxBytesPerDomain().

See Also:
Constant Field Values

GROUP_MAX_FETCH_SUCCESS_XPATH

public static final java.lang.String GROUP_MAX_FETCH_SUCCESS_XPATH
Xpath needed by Job.editOrderXML_maxObjectsPerDomain().

See Also:
Constant Field Values

QUEUE_TOTAL_BUDGET_XPATH

public static final java.lang.String QUEUE_TOTAL_BUDGET_XPATH
Xpath needed by Job.editOrderXML_maxObjectsPerDomain().

See Also:
Constant Field Values

ERROR_PENALTY_AMOUNT_XPATH

public static final java.lang.String ERROR_PENALTY_AMOUNT_XPATH
Xpath needed by Job.editOrderXML_maxObjectsPerDomain().

See Also:
Constant Field Values

BALANCE_REPLENISH_AMOUNT_XPATH

public static final java.lang.String BALANCE_REPLENISH_AMOUNT_XPATH
Xpath needed by Job.editOrderXML_maxObjectsPerDomain().

See Also:
Constant Field Values

DECIDERULES_MAP_XPATH

public static final java.lang.String DECIDERULES_MAP_XPATH
Xpath needed by Job.editOrderXML_crawlerTraps().

See Also:
Constant Field Values

HERITRIX_USER_AGENT_XPATH

public static final java.lang.String HERITRIX_USER_AGENT_XPATH
Xpath checked by Heritrix for correct user-agent field in requests.

See Also:
Constant Field Values

HERITRIX_FROM_XPATH

public static final java.lang.String HERITRIX_FROM_XPATH
Xpath checked by Heritrix for correct mail address.

See Also:
Constant Field Values

DECIDINGSCOPE_XPATH

public static final java.lang.String DECIDINGSCOPE_XPATH
Xpath to check that all templates use the DecidingScope.


DEDUPLICATOR_XPATH

public static final java.lang.String DEDUPLICATOR_XPATH
Xpath for the deduplicator node in order.xml documents.

See Also:
Constant Field Values

ARCHIVER_PATH_XPATH

public static final java.lang.String ARCHIVER_PATH_XPATH
Xpath to check that all templates use the same archiver path, Constants.ARCDIRECTORY_NAME. The archiver path tells Heritrix which directory to write its ARC files to.

See Also:
Constant Field Values

Constructor Detail

HeritrixTemplate

public HeritrixTemplate(org.dom4j.Document doc,
                        boolean verify)
Constructor for HeritrixTemplate class.

Parameters:
doc - the order.xml
verify - if true, verifies that the given dom4j Document contains the elements required by our software.
Throws:
ArgumentNotValid - if doc is null, or verify is true and doc does not obey the constraints required by our software.
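
The verify-on-construction contract described above can be illustrated with a minimal sketch. It is not the HeritrixTemplate implementation: it uses the JDK's built-in DOM and XPath APIs instead of dom4j, throws IllegalArgumentException in place of ArgumentNotValid, and the class name VerifiedTemplateSketch and the paths in REQUIRED_XPATHS are hypothetical, since the values of the *_XPATH constants are not listed on this page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustrative stand-in for a template wrapper that verifies on construction. */
final class VerifiedTemplateSketch {
    // Hypothetical required paths (assumption); the real constants' values
    // are not shown in this documentation.
    static final String[] REQUIRED_XPATHS = {
        "/crawl-order/meta/name",
        "/crawl-order/controller"
    };

    private final Document template;
    private final boolean verified;

    /** Rejects a null document; if verify is true, also rejects a document lacking a required node. */
    VerifiedTemplateSketch(Document doc, boolean verify) {
        if (doc == null) {
            throw new IllegalArgumentException("doc must not be null");
        }
        if (verify) {
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (String required : REQUIRED_XPATHS) {
                if (!exists(xpath, required, doc)) {
                    throw new IllegalArgumentException("Template lacks node: " + required);
                }
            }
        }
        this.template = doc;
        this.verified = verify;
    }

    /** Mirrors the one-argument constructor: always verifies. */
    VerifiedTemplateSketch(Document doc) {
        this(doc, true);
    }

    // A node-set converts to boolean true exactly when it is non-empty.
    private static boolean exists(XPath xpath, String expr, Document doc) {
        try {
            return (Boolean) xpath.evaluate(expr, doc, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    Document getTemplate() {
        return template;
    }

    boolean isVerified() {
        return verified;
    }

    /** Helper for the sketch: parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

In this sketch, passing verify as false skips the XPath checks entirely, so isVerified() reports false even for a document that would have passed them.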

HeritrixTemplate

public HeritrixTemplate(org.dom4j.Document doc)
Alternate constructor, which always verifies the given document.

Parameters:
doc - the order.xml

Method Detail

getTemplate

public org.dom4j.Document getTemplate()
Returns the template.

Returns:
the template

isVerified

public boolean isVerified()
Has Template been verified?

Returns:
true, if verified on construction, otherwise false

getXML

public java.lang.String getXML()
Return HeritrixTemplate as XML.

Returns:
HeritrixTemplate as XML
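
To illustrate what returning a template as XML can look like, here is a minimal sketch that serializes a DOM document to a string. It is an assumption-laden stand-in, not the getXML() implementation: it uses the JDK's javax.xml.transform API on an org.w3c.dom.Document rather than dom4j, and the class name TemplateXmlSketch and the sample element names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustrative helper that turns a DOM document into an XML string. */
final class TemplateXmlSketch {

    /** Serializes the document with an identity transform, omitting the XML declaration. */
    static String toXml(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    /** Helper for the sketch: parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

With dom4j itself, the asXML() method on a Document offers a comparable one-call serialization; whether getXML() relies on it is not stated here.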