dk.netarkivet.harvester.datamodel
Class HeritrixTemplate

java.lang.Object
  extended by dk.netarkivet.harvester.datamodel.HeritrixTemplate

public class HeritrixTemplate
extends java.lang.Object

Class encapsulating the Heritrix order.xml. Enables verification that a dom4j Document obeys the constraints required by our software, specifically the Job class. The class assumes the type of order.xml used to configure Heritrix version 1.10 and later. Information about the Heritrix crawler and its processes and modules can be found in the Heritrix developer and user manuals at http://crawler.archive.org


Field Summary
static java.lang.String ARCHIVER_PATH_XPATH
          Xpath used to check that all templates use the same archiver path, Constants.ARCDIRECTORY_NAME.
static java.lang.String DECIDERULES_MAP_XPATH
          Xpath needed by Job.editOrderXML_crawlerTraps().
static java.lang.String DECIDINGSCOPE_XPATH
          Xpath used to check that all templates use the DecidingScope.
static java.lang.String DEDUPLICATOR_XPATH
          Xpath for the deduplicator node in order.xml documents.
static java.lang.String GROUP_MAX_ALL_KB_XPATH
          Xpath needed by Job.editOrderXML_maxBytesPerDomain().
static java.lang.String GROUP_MAX_FETCH_SUCCESS_XPATH
          Xpath needed by Job.editOrderXML_maxObjectsPerDomain().
static java.lang.String HERITRIX_FROM_XPATH
          Xpath checked by Heritrix for correct mail address.
static java.lang.String HERITRIX_USER_AGENT_XPATH
          Xpath checked by Heritrix for correct user-agent field in requests.
static java.lang.String MAXTIMESEC_PATH_XPATH
          Xpath used to check that all templates have the max-time-sec attribute.
static java.lang.String QUEUE_TOTAL_BUDGET_XPATH
          Xpath needed by Job.editOrderXML_maxObjectsPerDomain().
static java.lang.String QUOTA_ENFORCER_ENABLED_XPATH
          Xpath needed by Job.editOrderXML_maxBytesPerDomain().
 
Constructor Summary
HeritrixTemplate(org.dom4j.Document doc)
          Alternate constructor, which always verifies the given document.
HeritrixTemplate(org.dom4j.Document doc, boolean verify)
          Constructor for HeritrixTemplate class.
 
Method Summary
 org.dom4j.Document getTemplate()
          Returns the template.
 java.lang.String getXML()
          Return HeritrixTemplate as XML.
 boolean isVerified()
          Has Template been verified?
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

QUOTA_ENFORCER_ENABLED_XPATH

public static final java.lang.String QUOTA_ENFORCER_ENABLED_XPATH
Xpath needed by Job.editOrderXML_maxBytesPerDomain().

See Also:
Constant Field Values

GROUP_MAX_ALL_KB_XPATH

public static final java.lang.String GROUP_MAX_ALL_KB_XPATH
Xpath needed by Job.editOrderXML_maxBytesPerDomain().

See Also:
Constant Field Values

GROUP_MAX_FETCH_SUCCESS_XPATH

public static final java.lang.String GROUP_MAX_FETCH_SUCCESS_XPATH
Xpath needed by Job.editOrderXML_maxObjectsPerDomain().

See Also:
Constant Field Values

QUEUE_TOTAL_BUDGET_XPATH

public static final java.lang.String QUEUE_TOTAL_BUDGET_XPATH
Xpath needed by Job.editOrderXML_maxObjectsPerDomain().

See Also:
Constant Field Values

DECIDERULES_MAP_XPATH

public static final java.lang.String DECIDERULES_MAP_XPATH
Xpath needed by Job.editOrderXML_crawlerTraps().

See Also:
Constant Field Values

HERITRIX_USER_AGENT_XPATH

public static final java.lang.String HERITRIX_USER_AGENT_XPATH
Xpath checked by Heritrix for correct user-agent field in requests.

See Also:
Constant Field Values

HERITRIX_FROM_XPATH

public static final java.lang.String HERITRIX_FROM_XPATH
Xpath checked by Heritrix for correct mail address.

See Also:
Constant Field Values

DECIDINGSCOPE_XPATH

public static final java.lang.String DECIDINGSCOPE_XPATH
Xpath used to check that all templates use the DecidingScope.


DEDUPLICATOR_XPATH

public static final java.lang.String DEDUPLICATOR_XPATH
Xpath for the deduplicator node in order.xml documents.

See Also:
Constant Field Values

ARCHIVER_PATH_XPATH

public static final java.lang.String ARCHIVER_PATH_XPATH
Xpath used to check that all templates use the same archiver path, Constants.ARCDIRECTORY_NAME. The archiver path tells Heritrix which directory to write its ARC files to.

See Also:
Constant Field Values

MAXTIMESEC_PATH_XPATH

public static final java.lang.String MAXTIMESEC_PATH_XPATH
Xpath used to check that all templates have the max-time-sec attribute.

See Also:
Constant Field Values

Constructor Detail

HeritrixTemplate

public HeritrixTemplate(org.dom4j.Document doc,
                        boolean verify)
Constructor for HeritrixTemplate class.

Parameters:
doc - the order.xml
verify - if true, verifies that the given dom4j Document contains the elements required by our software.
Throws:
ArgumentNotValid - if doc is null, or verify is true and doc does not obey the constraints required by our software.
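
The kind of check that verification implies can be sketched as follows: evaluate a set of XPath expressions against the document and report those that select no node. This is only an illustration under stated assumptions, not the actual implementation. It uses the JDK's javax.xml.xpath API rather than dom4j, and the class name, method names and XPath strings are hypothetical; they are not the values of the constants documented on this page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class OrderXmlCheckSketch {

    // Hypothetical XPaths, for illustration only. They are NOT the values
    // of the HeritrixTemplate constants.
    public static final String[] REQUIRED_XPATHS = {
        "/crawl-order/controller/map[@name='http-headers']/string[@name='user-agent']",
        "/crawl-order/controller/map[@name='http-headers']/string[@name='from']"
    };

    // Parses an XML string into a W3C DOM document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the required XPaths that select no node in the document.
    public static List<String> missingNodes(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> missing = new ArrayList<>();
        for (String expr : REQUIRED_XPATHS) {
            Node node = (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
            if (node == null) {
                missing.add(expr);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        // A toy document that has a user-agent entry but no from entry.
        String xml = "<crawl-order><controller><map name='http-headers'>"
                + "<string name='user-agent'>example-agent</string>"
                + "</map></controller></crawl-order>";
        for (String expr : missingNodes(parse(xml))) {
            System.out.println("Missing node for XPath: " + expr);
        }
    }
}
```

A verifying constructor could reject the document when such a list is non-empty; whether HeritrixTemplate works this way internally is an assumption.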

HeritrixTemplate

public HeritrixTemplate(org.dom4j.Document doc)
Alternate constructor, which always verifies the given document.

Parameters:
doc - the order.xml

Method Detail

getTemplate

public org.dom4j.Document getTemplate()
Returns the template.

Returns:
the template

isVerified

public boolean isVerified()
Has Template been verified?

Returns:
true, if verified on construction, otherwise false

getXML

public java.lang.String getXML()
Return HeritrixTemplate as XML.

Returns:
HeritrixTemplate as XML
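
The contract described on this page (two constructors, a verified flag and XML serialization) can be sketched with a JDK-only stand-in class. Everything in it is an assumption made for illustration: the class name TemplateContractSketch is hypothetical, IllegalArgumentException stands in for ArgumentNotValid, W3C DOM stands in for dom4j, and the root-element test is a placeholder for the real constraints, which are not listed here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TemplateContractSketch {
    private final Document template;
    private final boolean verified;

    // Mirrors HeritrixTemplate(doc, verify): rejects null, and rejects an
    // invalid document only when verification is requested.
    public TemplateContractSketch(Document doc, boolean verify) {
        if (doc == null) {
            throw new IllegalArgumentException("doc must not be null");
        }
        // Placeholder check only; the real constraints are assumed to be richer.
        if (verify && !"crawl-order".equals(doc.getDocumentElement().getNodeName())) {
            throw new IllegalArgumentException("doc does not look like an order.xml");
        }
        this.template = doc;
        this.verified = verify;
    }

    // Mirrors HeritrixTemplate(doc): always verifies.
    public TemplateContractSketch(Document doc) {
        this(doc, true);
    }

    public Document getTemplate() {
        return template;
    }

    // True only if the document was verified on construction.
    public boolean isVerified() {
        return verified;
    }

    // Serializes the wrapped document to an XML string.
    public String getXML() throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(template), new StreamResult(out));
        return out.toString();
    }

    // Helper for the demo: parses an XML string into a DOM document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        TemplateContractSketch t = new TemplateContractSketch(parse("<crawl-order/>"));
        System.out.println("verified: " + t.isVerified());
        System.out.println(t.getXML());
    }
}
```

Passing false for verify skips the check, so the same wrapper can hold a document that has not been validated, and isVerified() lets callers distinguish the two cases.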