java.lang.Object
  dk.netarkivet.harvester.datamodel.HeritrixTemplate
public class HeritrixTemplate
Class encapsulating the Heritrix order.xml. It enables verification that a dom4j Document obeys the constraints required by our software, specifically the Job class. The class assumes the type of order.xml used to configure Heritrix version 1.10+. Information about the Heritrix crawler and its processes and modules can be found in the Heritrix developer and user manuals at http://crawler.archive.org
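The verification described above can be pictured as evaluating a set of XPath expressions against the order.xml and reporting those that select nothing. The following is a minimal sketch of that idea, not the NetarchiveSuite implementation: it uses the JDK's own DOM and XPath classes instead of dom4j so that it runs without extra libraries, and the class name `OrderXmlCheckSketch`, the method `missingNodes` and the two XPath expressions are assumptions made for illustration (the actual values of the `*_XPATH` constants are not given in this documentation).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustrative stand-in; not the real HeritrixTemplate verification. */
class OrderXmlCheckSketch {

    // Hypothetical expressions; the real *_XPATH values are not documented here.
    static final String[] REQUIRED_XPATHS = {
        "/crawl-order/controller/map[@name='http-headers']/string[@name='user-agent']",
        "/crawl-order/controller/map[@name='http-headers']/string[@name='from']",
    };

    /** Parses the XML text and returns the required XPaths that select no node. */
    static List<String> missingNodes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> missing = new ArrayList<>();
            for (String expr : REQUIRED_XPATHS) {
                // XPathConstants.NODE yields null when nothing matches.
                Node node = (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
                if (node == null) {
                    missing.add(expr);
                }
            }
            return missing;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not check order.xml", e);
        }
    }
}
```

An empty result means every required node was found. A caller could turn a non-empty result into an exception, which matches the documented behaviour of rejecting documents that do not obey the constraints.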
Field Summary | |
---|---|
static java.lang.String |
ARCHIVER_PATH_XPATH
Xpath to check that all templates use the same archiver path, Constants.ARCDIRECTORY_NAME. |
static java.lang.String |
DECIDERULES_MAP_XPATH
Xpath needed by Job.editOrderXML_crawlerTraps(). |
static java.lang.String |
DECIDINGSCOPE_XPATH
Xpath to check that all templates use the DecidingScope. |
static java.lang.String |
DEDUPLICATOR_XPATH
Xpath for the deduplicator node in order.xml documents. |
static java.lang.String |
GROUP_MAX_ALL_KB_XPATH
Xpath needed by Job.editOrderXML_maxBytesPerDomain(). |
static java.lang.String |
GROUP_MAX_FETCH_SUCCESS_XPATH
Xpath needed by Job.editOrderXML_maxObjectsPerDomain(). |
static java.lang.String |
HERITRIX_FROM_XPATH
Xpath checked by Heritrix for correct mail address. |
static java.lang.String |
HERITRIX_USER_AGENT_XPATH
Xpath checked by Heritrix for correct user-agent field in requests. |
static java.lang.String |
MAXTIMESEC_PATH_XPATH
Xpath to check that all templates have the max-time-sec attribute. |
static java.lang.String |
QUEUE_TOTAL_BUDGET_XPATH
Xpath needed by Job.editOrderXML_maxObjectsPerDomain(). |
static java.lang.String |
QUOTA_ENFORCER_ENABLED_XPATH
Xpath needed by Job.editOrderXML_maxBytesPerDomain(). |
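Several of the constants above are described as being needed by the `Job.editOrderXML_*` methods. One plausible use, sketched here under stated assumptions, is to select a single node in the order.xml by XPath and overwrite its text. The code uses the JDK's DOM and XPath classes rather than dom4j; the class `XPathEditSketch`, its methods and the expression `MAX_KB_XPATH` are hypothetical, and nothing here is taken from the Job class itself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustrative only; Job's real editing code is not shown in this documentation. */
class XPathEditSketch {

    // Hypothetical expression standing in for a constant such as GROUP_MAX_ALL_KB_XPATH.
    static final String MAX_KB_XPATH =
            "/crawl-order/controller/long[@name='group-max-all-kb']";

    /** Parses XML text into a JDK DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Sets the text of the node selected by expr; fails if no node matches. */
    static void setValue(Document doc, String expr, String value) {
        Node node = selectNode(doc, expr);
        if (node == null) {
            throw new IllegalStateException("No node matches " + expr);
        }
        node.setTextContent(value);
    }

    /** Returns the text of the node selected by expr, or null if none matches. */
    static String getValue(Document doc, String expr) {
        Node node = selectNode(doc, expr);
        return node == null ? null : node.getTextContent();
    }

    private static Node selectNode(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }
}
```

Failing loudly when the node is missing is a deliberate choice in this sketch: a template that lacks the node would otherwise be left silently unedited.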
Constructor Summary | |
---|---|
HeritrixTemplate(org.dom4j.Document doc)
Alternate constructor, which always verifies the given document. |
|
HeritrixTemplate(org.dom4j.Document doc,
boolean verify)
Constructor for HeritrixTemplate class. |
Method Summary | |
---|---|
org.dom4j.Document |
getTemplate()
Returns the template. |
java.lang.String |
getXML()
Returns the HeritrixTemplate as XML. |
boolean |
isVerified()
Has Template been verified? |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final java.lang.String QUOTA_ENFORCER_ENABLED_XPATH
public static final java.lang.String GROUP_MAX_ALL_KB_XPATH
public static final java.lang.String GROUP_MAX_FETCH_SUCCESS_XPATH
public static final java.lang.String QUEUE_TOTAL_BUDGET_XPATH
public static final java.lang.String DECIDERULES_MAP_XPATH
public static final java.lang.String HERITRIX_USER_AGENT_XPATH
public static final java.lang.String HERITRIX_FROM_XPATH
public static final java.lang.String DECIDINGSCOPE_XPATH
public static final java.lang.String DEDUPLICATOR_XPATH
public static final java.lang.String ARCHIVER_PATH_XPATH
Xpath to check that all templates use the same archiver path, Constants.ARCDIRECTORY_NAME. The archive path tells Heritrix to which directory it shall write its arc files.
public static final java.lang.String MAXTIMESEC_PATH_XPATH
Constructor Detail |
---|
public HeritrixTemplate(org.dom4j.Document doc, boolean verify)
Constructor for HeritrixTemplate class.
Parameters:
doc - the order.xml
verify - If true, verifies if the given dom4j Document contains the elements required by our software.
Throws:
ArgumentNotValid - if doc is null, or verify is true and doc does not obey the constraints required by our software.
public HeritrixTemplate(org.dom4j.Document doc)
Alternate constructor, which always verifies the given document.
Parameters:
doc -
Method Detail |
---|
public org.dom4j.Document getTemplate()
public boolean isVerified()
public java.lang.String getXML()
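The constructor and accessor contract documented above can be mirrored in a small stand-in class. This is a sketch under assumptions, not the real class: it wraps a JDK `org.w3c.dom.Document` instead of a dom4j Document, throws `IllegalArgumentException` where the real class throws `ArgumentNotValid`, uses a placeholder validity check (root element named `crawl-order`), and assumes that `isVerified()` simply reports whether verification was requested at construction. All names in it are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustrative stand-in for the documented contract; not the real class. */
class TemplateContractSketch {
    private final Document template;
    private final boolean verified;

    /** Rejects a null document, and an invalid one when verify is true. */
    TemplateContractSketch(Document doc, boolean verify) {
        if (doc == null) {
            throw new IllegalArgumentException("doc must not be null");
        }
        if (verify && !looksValid(doc)) {
            throw new IllegalArgumentException("doc fails the required checks");
        }
        this.template = doc;
        this.verified = verify; // assumption: flag records that checks ran
    }

    /** Alternate constructor that always verifies. */
    TemplateContractSketch(Document doc) {
        this(doc, true);
    }

    // Placeholder check (assumption): the root element is named crawl-order.
    private static boolean looksValid(Document doc) {
        return doc.getDocumentElement() != null
                && "crawl-order".equals(doc.getDocumentElement().getNodeName());
    }

    Document getTemplate() { return template; }

    boolean isVerified() { return verified; }

    /** Serializes the wrapped document to XML text without a declaration. */
    String getXML() {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(template), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize template", e);
        }
    }

    /** Helper for examples: parses XML text into a JDK DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }
}
```

Passing `verify` as false in this sketch accepts any non-null document and leaves `isVerified()` false, so callers can tell an unchecked template from a checked one.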