Class H1HeritrixTemplate
- java.lang.Object
  - dk.netarkivet.harvester.datamodel.HeritrixTemplate
    - dk.netarkivet.harvester.datamodel.H1HeritrixTemplate
All Implemented Interfaces: Serializable
public class H1HeritrixTemplate extends HeritrixTemplate implements Serializable
Class encapsulating the Heritrix order.xml. Enables verification that a dom4j Document obeys the constraints required by our software, specifically by the Job class. The class assumes the type of order.xml used to configure Heritrix version 1.10+. Information about the Heritrix crawler, its processes and its modules can be found in the Heritrix developer and user manuals at http://crawler.archive.org.
See Also: Serialized Form
Field Summary
- static String ARC_ARCHIVER_PATH_XPATH: XPath used to check that all templates use the same ARC archiver path, Constants.ARCDIRECTORY_NAME.
- static String ARCHIVEFILE_PREFIX_XPATH: XPath for the arcfile 'prefix' in the order.xml.
- static String ARCS_ENABLED_XPATH
- static String ARCSDIR_XPATH: XPath for the ARCs dir in the order.xml.
- static String ARCWRITERPROCESSOR_XPATH
- static String DECIDERULES_ACCEPT_IF_PREREQUISITE_XPATH: XPath needed by Job.editOrderXML_crawlerTraps().
- static String DECIDERULES_MAP_XPATH: XPath needed by Job.editOrderXML_crawlerTraps().
- static String DECIDINGSCOPE_XPATH: XPath used to check that all templates use the DecidingScope.
- static String DEDUPLICATOR_ENABLED: XPath for the boolean telling whether the deduplicator is enabled in order.xml documents.
- static String DEDUPLICATOR_INDEX_LOCATION_XPATH: XPath for the deduplicator index directory node in order.xml documents.
- static String DEDUPLICATOR_XPATH: XPath for the deduplicator node in order.xml documents.
- static String DISK_PATH_XPATH: XPath for the 'disk-path' in the order.xml.
- static String GROUP_MAX_ALL_KB_XPATH: XPath needed by Job.editOrderXML_maxBytesPerDomain().
- static String GROUP_MAX_FETCH_SUCCESS_XPATH: XPath needed by Job.editOrderXML_maxObjectsPerDomain().
- static String HERITRIX_FROM_XPATH: XPath checked by Heritrix for a correct mail address.
- static String HERITRIX_USER_AGENT_XPATH: XPath checked by Heritrix for a correct user-agent field in requests.
- static String MAXTIMESEC_PATH_XPATH: XPath used to check that all templates have the max-time-sec attribute.
- static String METADATA_ITEMS_XPATH: XPath for the WARC metadata in the order.xml.
- static String QUEUE_TOTAL_BUDGET_XPATH: XPath needed by Job.editOrderXML_maxObjectsPerDomain().
- static String QUOTA_ENFORCER_ENABLED_XPATH: XPath needed by Job.editOrderXML_maxBytesPerDomain().
- static String SEEDS_FILE_XPATH: XPath for the 'seedsfile' in the order.xml.
- static String WARC_ARCHIVER_PATH_XPATH: XPath used to check that all templates use the same WARC archiver path, Constants.WARCDIRECTORY_NAME.
- static String WARCS_ENABLED_XPATH: XPath for the WARCs dir in the order.xml.
- static String WARCS_SKIP_IDENTICAL_DIGESTS_XPATH
- static String WARCS_WRITE_METADATA_OUTLINKS_XPATH
- static String WARCS_WRITE_METADATA_XPATH
- static String WARCS_WRITE_REQUESTS_XPATH
- static String WARCS_WRITE_REVISIT_FOR_IDENTICAL_DIGESTS_XPATH
- static String WARCS_WRITE_REVISIT_FOR_NOT_MODIFIED_XPATH
- static String WARCSDIR_XPATH: XPath for the WARCs dir in the order.xml.
- static String WARCWRITERPROCESSOR_XPATH
Fields inherited from class dk.netarkivet.harvester.datamodel.HeritrixTemplate
HARVESTINFO_AUDIENCE, HARVESTINFO_CHANNEL, HARVESTINFO_HARVESTFILENAMEPREFIX, HARVESTINFO_HARVESTNUM, HARVESTINFO_JOBID, HARVESTINFO_JOBSUBMITDATE, HARVESTINFO_MAXBYTESPERDOMAIN, HARVESTINFO_MAXOBJECTSPERDOMAIN, HARVESTINFO_OPERATOR, HARVESTINFO_ORDERXMLDESCRIPTION, HARVESTINFO_ORDERXMLNAME, HARVESTINFO_ORDERXMLUPDATEDATE, HARVESTINFO_ORIGHARVESTDEFINITIONCOMMENTS, HARVESTINFO_ORIGHARVESTDEFINITIONID, HARVESTINFO_ORIGHARVESTDEFINITIONNAME, HARVESTINFO_PERFORMER, HARVESTINFO_SCHEDULENAME, HARVESTINFO_VERSION, HARVESTINFO_VERSION_NUMBER, template_id
Constructor Summary
- H1HeritrixTemplate(long template_id, String templateAsString)
- H1HeritrixTemplate(org.dom4j.Document doc): Alternate constructor, which always verifies the given document.
- H1HeritrixTemplate(org.dom4j.Document doc, boolean verify): Constructor for the HeritrixTemplate class.
Method Summary
- void configureQuotaEnforcer(boolean maxObjectsIsSetByQuotaEnforcer, long forceMaxBytesPerDomain, long forceMaxObjectsPerDomain): Activates or deactivates the quota enforcer, depending on the budget definition.
- static void editOrderXML_configureQuotaEnforcer(org.dom4j.Document orderXMLdoc, boolean maxObjectsIsSetByQuotaEnforcer, long forceMaxBytesPerDomain, long forceMaxObjectsPerDomain): Activates or deactivates the quota enforcer, depending on the budget definition.
- static void editOrderXML_maxObjectsPerDomain(org.dom4j.Document orderXMLdoc, long forceMaxObjectsPerDomain, boolean maxObjectsIsSetByQuotaEnforcer): Auxiliary method to modify the orderXMLdoc Document with respect to the maximum number of objects to be retrieved per domain.
- static void editOrderXMLAddCrawlerTraps(org.dom4j.Document orderXMLdoc, String elementName, List<String> crawlerTraps): Adds a list of crawler traps under a given element name.
- void enableOrDisableDeduplication(boolean enabled)
- Long getMaxBytesPerDomain()
- Long getMaxObjectsPerDomain()
- org.dom4j.Document getTemplate(): Returns the template.
- String getText(): Only available for H1 templates.
- String getXML(): Returns the HeritrixTemplate as XML.
- boolean hasContent()
- void insertAttributes(List<EAV.AttributeAndType> attributesAndTypes): Tries to insert the given list of attributes into the template.
- void insertCrawlerTraps(String elementName, List<String> crawlerTraps): Adds a list of crawler traps under a given element name.
- void insertUmbrabean(String jobName, String rabbitMQUrl, String limitSearchRegEx): Inserts all necessary umbra-related beans into this template.
- void insertWarcInfoMetadata(Job ajob, String origHarvestdefinitionName, String origHarvestdefinitionComments, String scheduleName, String performer): Adds settings to the WARCWriterProcessor so that it can generate a proper WARCINFO record.
- boolean IsDeduplicationEnabled(): Returns true if the template file has deduplication enabled.
- boolean isValid()
- boolean isVerified(): Has the template been verified?
- void removeDeduplicatorIfPresent(): Tries to remove the deduplicator, if present in the template.
- void setArchiveFilePrefix(String archiveFilePrefix)
- void setArchiveFormat(String archiveFormat): Makes sure that Heritrix will archive its data in the chosen archiveFormat.
- void setDeduplicationIndexLocation(String absolutePath)
- void setDiskPath(String absolutePath)
- void setMaxBytesPerDomain(Long forceMaxBytesPerDomain): Auxiliary method to modify the orderXMLdoc Document with respect to the maximum number of bytes to retrieve per domain.
- void setMaxJobRunningTime(Long maxJobRunningTimeSecondsL): Sets the maximum running time for the harvest.
- void setMaxObjectsPerDomain(Long maxobjectsL)
- void setRecoverlogNode(File recoverlogGzFile)
- void setSeedsFilePath(String absolutePath)
- void writeTemplate(OutputStream os)
- void writeTemplate(javax.servlet.jsp.JspWriter out)
- void writeToFile(File orderXmlFile)
Methods inherited from class dk.netarkivet.harvester.datamodel.HeritrixTemplate
editOrderXMLAddPerDomainCrawlerTraps, getTemplateFromString, isActive, read, read, setIsActive
Field Detail

QUOTA_ENFORCER_ENABLED_XPATH
public static final String QUOTA_ENFORCER_ENABLED_XPATH
XPath needed by Job.editOrderXML_maxBytesPerDomain().
See Also: Constant Field Values

GROUP_MAX_ALL_KB_XPATH
public static final String GROUP_MAX_ALL_KB_XPATH
XPath needed by Job.editOrderXML_maxBytesPerDomain().
See Also: Constant Field Values

GROUP_MAX_FETCH_SUCCESS_XPATH
public static final String GROUP_MAX_FETCH_SUCCESS_XPATH
XPath needed by Job.editOrderXML_maxObjectsPerDomain().
See Also: Constant Field Values

QUEUE_TOTAL_BUDGET_XPATH
public static final String QUEUE_TOTAL_BUDGET_XPATH
XPath needed by Job.editOrderXML_maxObjectsPerDomain().
See Also: Constant Field Values

DECIDERULES_MAP_XPATH
public static final String DECIDERULES_MAP_XPATH
XPath needed by Job.editOrderXML_crawlerTraps().
See Also: Constant Field Values

DECIDERULES_ACCEPT_IF_PREREQUISITE_XPATH
public static final String DECIDERULES_ACCEPT_IF_PREREQUISITE_XPATH
XPath needed by Job.editOrderXML_crawlerTraps().
See Also: Constant Field Values

HERITRIX_USER_AGENT_XPATH
public static final String HERITRIX_USER_AGENT_XPATH
XPath checked by Heritrix for a correct user-agent field in requests.
See Also: Constant Field Values

HERITRIX_FROM_XPATH
public static final String HERITRIX_FROM_XPATH
XPath checked by Heritrix for a correct mail address.
See Also: Constant Field Values

DECIDINGSCOPE_XPATH
public static final String DECIDINGSCOPE_XPATH
XPath used to check that all templates use the DecidingScope.
See Also: Constant Field Values

DEDUPLICATOR_XPATH
public static final String DEDUPLICATOR_XPATH
XPath for the deduplicator node in order.xml documents.
See Also: Constant Field Values

ARC_ARCHIVER_PATH_XPATH
public static final String ARC_ARCHIVER_PATH_XPATH
XPath used to check that all templates use the same ARC archiver path, Constants.ARCDIRECTORY_NAME. The archive path tells Heritrix which directory to write its ARC files to.
See Also: Constant Field Values

WARC_ARCHIVER_PATH_XPATH
public static final String WARC_ARCHIVER_PATH_XPATH
XPath used to check that all templates use the same WARC archiver path, Constants.WARCDIRECTORY_NAME. The archive path tells Heritrix which directory to write its WARC files to.
See Also: Constant Field Values

DEDUPLICATOR_INDEX_LOCATION_XPATH
public static final String DEDUPLICATOR_INDEX_LOCATION_XPATH
XPath for the deduplicator index directory node in order.xml documents.
See Also: Constant Field Values

DEDUPLICATOR_ENABLED
public static final String DEDUPLICATOR_ENABLED
XPath for the boolean telling whether the deduplicator is enabled in order.xml documents.
See Also: Constant Field Values

DISK_PATH_XPATH
public static final String DISK_PATH_XPATH
XPath for the 'disk-path' in the order.xml.
See Also: Constant Field Values

ARCHIVEFILE_PREFIX_XPATH
public static final String ARCHIVEFILE_PREFIX_XPATH
XPath for the arcfile 'prefix' in the order.xml.
See Also: Constant Field Values

ARCSDIR_XPATH
public static final String ARCSDIR_XPATH
XPath for the ARCs dir in the order.xml.
See Also: Constant Field Values

WARCWRITERPROCESSOR_XPATH
public static final String WARCWRITERPROCESSOR_XPATH
See Also: Constant Field Values

ARCWRITERPROCESSOR_XPATH
public static final String ARCWRITERPROCESSOR_XPATH
See Also: Constant Field Values

WARCSDIR_XPATH
public static final String WARCSDIR_XPATH
XPath for the WARCs dir in the order.xml.
See Also: Constant Field Values

SEEDS_FILE_XPATH
public static final String SEEDS_FILE_XPATH
XPath for the 'seedsfile' in the order.xml.
See Also: Constant Field Values

ARCS_ENABLED_XPATH
public static final String ARCS_ENABLED_XPATH
See Also: Constant Field Values

WARCS_ENABLED_XPATH
public static final String WARCS_ENABLED_XPATH
XPath for the WARCs dir in the order.xml.
See Also: Constant Field Values

WARCS_WRITE_REQUESTS_XPATH
public static final String WARCS_WRITE_REQUESTS_XPATH
See Also: Constant Field Values

WARCS_WRITE_METADATA_XPATH
public static final String WARCS_WRITE_METADATA_XPATH
See Also: Constant Field Values

WARCS_WRITE_METADATA_OUTLINKS_XPATH
public static final String WARCS_WRITE_METADATA_OUTLINKS_XPATH
See Also: Constant Field Values

WARCS_SKIP_IDENTICAL_DIGESTS_XPATH
public static final String WARCS_SKIP_IDENTICAL_DIGESTS_XPATH
See Also: Constant Field Values

WARCS_WRITE_REVISIT_FOR_IDENTICAL_DIGESTS_XPATH
public static final String WARCS_WRITE_REVISIT_FOR_IDENTICAL_DIGESTS_XPATH
See Also: Constant Field Values

WARCS_WRITE_REVISIT_FOR_NOT_MODIFIED_XPATH
public static final String WARCS_WRITE_REVISIT_FOR_NOT_MODIFIED_XPATH
See Also: Constant Field Values

METADATA_ITEMS_XPATH
public static final String METADATA_ITEMS_XPATH
XPath for the WARC metadata in the order.xml.
See Also: Constant Field Values

MAXTIMESEC_PATH_XPATH
public static final String MAXTIMESEC_PATH_XPATH
XPath used to check that all templates have the max-time-sec attribute.
See Also: Constant Field Values
Constructor Detail

H1HeritrixTemplate
public H1HeritrixTemplate(org.dom4j.Document doc, boolean verify)
Constructor for the HeritrixTemplate class.
Parameters:
- doc: the order.xml
- verify: if true, verifies that the given dom4j Document contains the elements required by our software.
Throws:
- ArgumentNotValid: if doc is null, or if verify is true and doc does not obey the constraints required by our software.
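
The verify flag can be pictured as a check that each required XPath matches at least one node. The sketch below is a hypothetical illustration, not the NetarchiveSuite implementation: it uses the JDK's org.w3c.dom and javax.xml.xpath APIs instead of dom4j, the class TemplateVerifierSketch and the list of required paths are invented for the example, and IllegalArgumentException stands in for ArgumentNotValid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical sketch of the verify-on-construction contract. */
final class TemplateVerifierSketch {

    /** Parses an order.xml string; checked parser exceptions are wrapped for brevity. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
    }

    /** Returns every XPath expression in requiredXPaths that matches no node in doc. */
    static List<String> missingPaths(Document doc, List<String> requiredXPaths) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> missing = new ArrayList<>();
        for (String expr : requiredXPaths) {
            try {
                // With XPathConstants.NODE, an empty match yields null.
                if (xpath.evaluate(expr, doc, XPathConstants.NODE) == null) {
                    missing.add(expr);
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Invalid XPath: " + expr, e);
            }
        }
        return missing;
    }

    /** Rejects a null doc and, when verify is true, a doc lacking any required node. */
    static Document checked(Document doc, boolean verify, List<String> requiredXPaths) {
        if (doc == null) {
            throw new IllegalArgumentException("doc must not be null");
        }
        if (verify) {
            List<String> missing = missingPaths(doc, requiredXPaths);
            if (!missing.isEmpty()) {
                throw new IllegalArgumentException("Template lacks required nodes: " + missing);
            }
        }
        return doc;
    }
}
```

In the real class the required paths would presumably come from constants such as DECIDINGSCOPE_XPATH and MAXTIMESEC_PATH_XPATH, whose values are listed under Constant Field Values rather than on this page.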

H1HeritrixTemplate
public H1HeritrixTemplate(org.dom4j.Document doc)
Alternate constructor, which always verifies the given document.
Parameters:
- doc: the order.xml

H1HeritrixTemplate
public H1HeritrixTemplate(long template_id, String templateAsString) throws org.dom4j.DocumentException
Throws:
- org.dom4j.DocumentException
Method Detail

getTemplate
public org.dom4j.Document getTemplate()
Returns the template.
Returns:
- the template

isVerified
public boolean isVerified()
Has the template been verified?
Returns:
- true if the template was verified on construction, otherwise false

getXML
public String getXML()
Returns the HeritrixTemplate as XML.
Specified by: getXML in class HeritrixTemplate
Returns:
- the HeritrixTemplate as XML

editOrderXMLAddCrawlerTraps
public static void editOrderXMLAddCrawlerTraps(org.dom4j.Document orderXMLdoc, String elementName, List<String> crawlerTraps)
Adds a list of crawler traps under the given element name. It is used to add both per-domain traps and global traps.
Parameters:
- elementName: the name of the added element.
- crawlerTraps: a list of crawler trap regular expressions to add to this job.
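
A minimal sketch of what adding crawler traps under a named element could look like, assuming the Heritrix 1 convention of a MatchesListRegExpDecideRule holding a regexp-list of strings. The element layout, the REJECT and OR settings, the example rule name and the class CrawlerTrapSketch are all assumptions for illustration. The sketch uses the JDK DOM API rather than dom4j and does not locate the decide-rules map itself, which the real code presumably finds via DECIDERULES_MAP_XPATH.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: appends one reject rule listing crawler-trap regexes. */
final class CrawlerTrapSketch {

    /** Appends a rule named elementName to decideRulesMap and returns it. */
    static Element addCrawlerTraps(Element decideRulesMap, String elementName, List<String> crawlerTraps) {
        Document doc = decideRulesMap.getOwnerDocument();
        Element rule = doc.createElement("newObject");
        rule.setAttribute("name", elementName);
        // Assumed Heritrix 1 rule class; not confirmed by this page.
        rule.setAttribute("class", "org.archive.crawler.deciderules.MatchesListRegExpDecideRule");
        rule.appendChild(namedString(doc, "decision", "REJECT"));
        rule.appendChild(namedString(doc, "list-logic", "OR"));
        Element list = doc.createElement("stringList");
        list.setAttribute("name", "regexp-list");
        for (String regex : crawlerTraps) {
            Element entry = doc.createElement("string");
            entry.setTextContent(regex);
            list.appendChild(entry);
        }
        rule.appendChild(list);
        decideRulesMap.appendChild(rule);
        return rule;
    }

    /** Builds <string name="...">value</string>. */
    private static Element namedString(Document doc, String name, String value) {
        Element e = doc.createElement("string");
        e.setAttribute("name", name);
        e.setTextContent(value);
        return e;
    }

    /** Creates a stand-alone map element for trying out the sketch. */
    static Element newRulesMap() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element map = doc.createElement("map");
            map.setAttribute("name", "rules");
            doc.appendChild(map);
            return map;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```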

editOrderXML_maxObjectsPerDomain
public static void editOrderXML_maxObjectsPerDomain(org.dom4j.Document orderXMLdoc, long forceMaxObjectsPerDomain, boolean maxObjectsIsSetByQuotaEnforcer)
Auxiliary method to modify the orderXMLdoc Document with respect to the maximum number of objects to be retrieved per domain. This method updates the 'group-max-fetch-success' element of the QuotaEnforcer pre-fetch processor node (org.archive.crawler.frontier.BdbFrontier) with the value of the argument forceMaxObjectsPerDomain.
Parameters:
- orderXMLdoc
- forceMaxObjectsPerDomain: the maximum number of objects to retrieve per domain, or 0 for no limit.
Throws:
- PermissionDenied: if unable to replace the frontier node of the orderXMLdoc Document
- IOFailure: if the group-max-fetch-success element is not found in the orderXml.
TODO: The group-max-fetch-success check should also be performed in TemplateDAO.create and TemplateDAO.update.
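
The field summary lists both GROUP_MAX_FETCH_SUCCESS_XPATH and QUEUE_TOTAL_BUDGET_XPATH as needed by this method, and the quota-enforcer documentation says the object limit is held in one of those two places. The hypothetical sketch below picks the target element from maxObjectsIsSetByQuotaEnforcer under that reading. The XPath strings are invented stand-ins, not the real constant values, and the JDK DOM API replaces dom4j.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch of writing a per-domain object limit into an order.xml. */
final class MaxObjectsSketch {
    // Invented stand-ins for GROUP_MAX_FETCH_SUCCESS_XPATH and QUEUE_TOTAL_BUDGET_XPATH.
    static final String GROUP_MAX_FETCH_SUCCESS = "//long[@name='group-max-fetch-success']";
    static final String QUEUE_TOTAL_BUDGET = "//long[@name='queue-total-budget']";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
    }

    /** Returns the first node matching expr, or null if there is none. */
    static Node find(Document doc, String expr) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes maxObjects to the element selected by maxObjectsIsSetByQuotaEnforcer. */
    static void setMaxObjects(Document doc, long maxObjects, boolean maxObjectsIsSetByQuotaEnforcer) {
        String expr = maxObjectsIsSetByQuotaEnforcer ? GROUP_MAX_FETCH_SUCCESS : QUEUE_TOTAL_BUDGET;
        Node node = find(doc, expr);
        if (node == null) {
            // The documented method reports a missing element as IOFailure.
            throw new IllegalStateException("Element not found: " + expr);
        }
        node.setTextContent(Long.toString(maxObjects));
    }
}
```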

editOrderXML_configureQuotaEnforcer
public static void editOrderXML_configureQuotaEnforcer(org.dom4j.Document orderXMLdoc, boolean maxObjectsIsSetByQuotaEnforcer, long forceMaxBytesPerDomain, long forceMaxObjectsPerDomain)
Activates or deactivates the quota enforcer, depending on the budget definition. The object limit can be defined either by the queue-total-budget property or by the quota enforcer; which one is used is determined by the value of the argument maxObjectsIsSetByQuotaEnforcer. The quota enforcer is therefore set as follows:
- If the object limit is not set by the quota enforcer, the enforcer is disabled only if there is no byte limit.
- If the object limit is set by the quota enforcer, the enforcer is enabled whenever either a byte limit or an object limit is set.
Parameters:
- orderXMLdoc: the template to modify
- maxObjectsIsSetByQuotaEnforcer: decides whether the object limit is set by the quota enforcer.
- forceMaxBytesPerDomain: the maximum number of bytes per domain to enforce (may be no limit)
- forceMaxObjectsPerDomain: the maximum number of objects per domain to enforce (may be no limit)
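
The two rules above reduce to a small boolean decision. This sketch assumes -1 as the "no limit" sentinel for both arguments; NO_LIMIT and QuotaEnforcerSketch are hypothetical names. Note that editOrderXML_maxObjectsPerDomain documents 0 as its no-limit value, so the real constants should be checked before relying on this choice.

```java
/** Hypothetical sketch of the quota-enforcer on/off decision. */
final class QuotaEnforcerSketch {
    /** Assumed "no limit" sentinel; the real code uses its own constants. */
    static final long NO_LIMIT = -1L;

    static boolean quotaEnforcerEnabled(boolean maxObjectsIsSetByQuotaEnforcer,
                                        long maxBytesPerDomain, long maxObjectsPerDomain) {
        boolean hasByteLimit = maxBytesPerDomain != NO_LIMIT;
        if (!maxObjectsIsSetByQuotaEnforcer) {
            // Objects are limited via queue-total-budget, so the enforcer
            // is only needed when there is a byte limit.
            return hasByteLimit;
        }
        // The enforcer also carries the object limit, so either limit enables it.
        return hasByteLimit || maxObjectsPerDomain != NO_LIMIT;
    }
}
```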

isValid
public boolean isValid()
Specified by: isValid in class HeritrixTemplate
Returns:
- true if the template is valid, otherwise false

configureQuotaEnforcer
public void configureQuotaEnforcer(boolean maxObjectsIsSetByQuotaEnforcer, long forceMaxBytesPerDomain, long forceMaxObjectsPerDomain)
Description copied from class: HeritrixTemplate
Activates or deactivates the quota enforcer, depending on the budget definition. The object limit can be defined either by the queue-total-budget property or by the quota enforcer; which one is used is determined by the value of the argument maxObjectsIsSetByQuotaEnforcer. The quota enforcer is therefore set as follows:
- If the object limit is not set by the quota enforcer, the enforcer is disabled only if there is no byte limit.
- If the object limit is set by the quota enforcer, the enforcer is enabled whenever either a byte limit or an object limit is set.
Specified by: configureQuotaEnforcer in class HeritrixTemplate
Parameters:
- maxObjectsIsSetByQuotaEnforcer: decides whether the object limit is set by the quota enforcer.
- forceMaxBytesPerDomain: the maximum number of bytes per domain to enforce (may be no limit)
- forceMaxObjectsPerDomain: the maximum number of objects per domain to enforce (may be no limit)

setMaxBytesPerDomain
public void setMaxBytesPerDomain(Long forceMaxBytesPerDomain)
Auxiliary method to modify the orderXMLdoc Document with respect to the maximum number of bytes to retrieve per domain. This method updates the 'group-max-all-kb' element of the 'QuotaEnforcer' node, which in turn is a subelement of the 'pre-fetch-processors' node, with the value of the argument forceMaxBytesPerDomain.
Specified by: setMaxBytesPerDomain in class HeritrixTemplate
Parameters:
- forceMaxBytesPerDomain: the maximum number of bytes to retrieve per domain, or -1 for no limit. Note that the number is divided by 1024 before being inserted into the orderXml, as Heritrix expects KB.
Throws:
- PermissionDenied: if unable to replace the QuotaEnforcer node of the orderXMLdoc Document
- IOFailure: if the group-max-all-kb element cannot be found.
TODO: This group-max-all-kb check should also be performed in TemplateDAO.create and TemplateDAO.update.
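
The KB conversion is where a sentinel can be lost: in Java, -1 / 1024 evaluates to 0, which would turn "no limit" into a zero limit. The sketch below, using the hypothetical helper ByteLimitSketch.toKilobytes, keeps -1 unchanged and truncates everything else; whether the real method treats -1 this way is an assumption.

```java
/** Hypothetical sketch of the bytes-to-KB step behind setMaxBytesPerDomain. */
final class ByteLimitSketch {
    /** The -1 "no limit" value documented for forceMaxBytesPerDomain. */
    static final long NO_LIMIT = -1L;

    /** Converts a byte limit to KB; integer division truncates toward zero. */
    static long toKilobytes(long maxBytesPerDomain) {
        if (maxBytesPerDomain == NO_LIMIT) {
            // -1 / 1024 == 0 in Java, so the sentinel is passed through explicitly.
            return NO_LIMIT;
        }
        return maxBytesPerDomain / 1024;
    }
}
```

One consequence of truncation is that any positive limit below 1024 bytes becomes 0 KB.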

getMaxBytesPerDomain
public Long getMaxBytesPerDomain()
Specified by: getMaxBytesPerDomain in class HeritrixTemplate

setMaxObjectsPerDomain
public void setMaxObjectsPerDomain(Long maxobjectsL)
Specified by: setMaxObjectsPerDomain in class HeritrixTemplate

getMaxObjectsPerDomain
public Long getMaxObjectsPerDomain()
Specified by: getMaxObjectsPerDomain in class HeritrixTemplate

IsDeduplicationEnabled
public boolean IsDeduplicationEnabled()
Returns true if the template file has deduplication enabled.
Specified by: IsDeduplicationEnabled in class HeritrixTemplate
Returns:
- true if the Deduplicator is enabled.
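
A sketch of reading and toggling the deduplicator's enabled flag with the JDK XPath API, covering IsDeduplicationEnabled and enableOrDisableDeduplication together. ENABLED_XPATH is a hypothetical stand-in for DEDUPLICATOR_ENABLED, and treating a template without a deduplicator as "not enabled" (and leaving it untouched when toggling) is an assumption, not documented behaviour.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch of the deduplicator's enabled flag. */
final class DeduplicationSketch {
    // Invented stand-in for the DEDUPLICATOR_ENABLED constant.
    static final String ENABLED_XPATH = "//newObject[@name='DeDuplicator']/boolean[@name='enabled']";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
    }

    private static Node enabledNode(Document doc) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(ENABLED_XPATH, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True only if the flag exists and reads "true" (case-insensitive). */
    static boolean isDeduplicationEnabled(Document doc) {
        Node node = enabledNode(doc);
        return node != null && Boolean.parseBoolean(node.getTextContent().trim());
    }

    /** Sets the flag when the deduplicator is present; otherwise does nothing. */
    static void enableOrDisableDeduplication(Document doc, boolean enabled) {
        Node node = enabledNode(doc);
        if (node != null) {
            node.setTextContent(Boolean.toString(enabled));
        }
    }
}
```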

setArchiveFormat
public void setArchiveFormat(String archiveFormat)
Description copied from class: HeritrixTemplate
Makes sure that Heritrix will archive its data in the chosen archiveFormat.
Specified by: setArchiveFormat in class HeritrixTemplate
Parameters:
- archiveFormat: the chosen archive format ('arc' and 'warc' are supported)
Throws:
- ArgumentNotValid: if the chosen archiveFormat is not supported.
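
The validation half of setArchiveFormat can be sketched as a mapping from the format string to two writer flags, which presumably correspond to ARCS_ENABLED_XPATH and WARCS_ENABLED_XPATH. ArchiveFormatSketch and WriterFlags are hypothetical, and IllegalArgumentException stands in for ArgumentNotValid.

```java
/** Hypothetical sketch of the archive-format check behind setArchiveFormat. */
final class ArchiveFormatSketch {

    /** Which writer processor should be enabled for a given format. */
    static final class WriterFlags {
        final boolean arcEnabled;
        final boolean warcEnabled;

        WriterFlags(boolean arcEnabled, boolean warcEnabled) {
            this.arcEnabled = arcEnabled;
            this.warcEnabled = warcEnabled;
        }
    }

    /** Maps 'arc' or 'warc' to writer flags; anything else, including null, is rejected. */
    static WriterFlags forFormat(String archiveFormat) {
        if ("arc".equals(archiveFormat)) {
            return new WriterFlags(true, false);
        }
        if ("warc".equals(archiveFormat)) {
            return new WriterFlags(false, true);
        }
        // IllegalArgumentException stands in for NetarchiveSuite's ArgumentNotValid.
        throw new IllegalArgumentException("Unsupported archive format: " + archiveFormat);
    }
}
```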

setMaxJobRunningTime
public void setMaxJobRunningTime(Long maxJobRunningTimeSecondsL)
Description copied from class: HeritrixTemplate
Sets the maximum running time for the harvest.
Specified by: setMaxJobRunningTime in class HeritrixTemplate
Parameters:
- maxJobRunningTimeSecondsL: limits the harvest to this number of seconds
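
Setting the limit amounts to rewriting one element's text. In this sketch MAX_TIME_SEC_XPATH is a hypothetical stand-in for MAXTIMESEC_PATH_XPATH and the JDK DOM API replaces dom4j. A missing element is rejected, since the field documentation says all templates are expected to have the max-time-sec attribute.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical sketch of writing the harvest time limit. */
final class MaxRunningTimeSketch {
    // Invented stand-in for the MAXTIMESEC_PATH_XPATH constant.
    static final String MAX_TIME_SEC_XPATH = "//long[@name='max-time-sec']";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
    }

    static void setMaxJobRunningTime(Document doc, long seconds) {
        Node node;
        try {
            node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(MAX_TIME_SEC_XPATH, doc, XPathConstants.NODE);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
        if (node == null) {
            // Templates are expected to contain this element.
            throw new IllegalArgumentException("Template has no max-time-sec element");
        }
        node.setTextContent(Long.toString(seconds));
    }
}
```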

writeTemplate
public void writeTemplate(OutputStream os) throws IOException, ArgumentNotValid
Specified by: writeTemplate in class HeritrixTemplate
Throws:
- IOException
- ArgumentNotValid
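
writeTemplate(OutputStream) serializes the order.xml. A minimal sketch using the JDK's identity Transformer follows; the class WriteTemplateSketch, the UTF-8 and indentation settings, and the use of IllegalArgumentException in place of ArgumentNotValid are assumptions, and the real method works on a dom4j Document.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical sketch of serializing an order.xml to a stream. */
final class WriteTemplateSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed XML document", e);
        }
    }

    static void writeTemplate(Document doc, OutputStream os) throws IOException {
        if (os == null) {
            throw new IllegalArgumentException("os must not be null");
        }
        try {
            // The identity transform copies the DOM tree to the stream as XML text.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.transform(new DOMSource(doc), new StreamResult(os));
        } catch (TransformerException e) {
            throw new IOException("Could not serialize template", e);
        }
        os.flush();
    }
}
```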

getText
public String getText()
Only available for H1 templates.
Returns:
- the template as a String.

insertCrawlerTraps
public void insertCrawlerTraps(String elementName, List<String> crawlerTraps)
Description copied from class: HeritrixTemplate
Adds a list of crawler traps under the given element name. It is used to add both per-domain traps and global traps.
Specified by: insertCrawlerTraps in class HeritrixTemplate
Parameters:
- elementName: the name of the added element.
- crawlerTraps: a list of crawler trap regular expressions to add to this job.

hasContent
public boolean hasContent()
Specified by: hasContent in class HeritrixTemplate

writeToFile
public void writeToFile(File orderXmlFile)
Specified by: writeToFile in class HeritrixTemplate

setRecoverlogNode
public void setRecoverlogNode(File recoverlogGzFile)
Specified by: setRecoverlogNode in class HeritrixTemplate

setDeduplicationIndexLocation
public void setDeduplicationIndexLocation(String absolutePath)
Specified by: setDeduplicationIndexLocation in class HeritrixTemplate

setSeedsFilePath
public void setSeedsFilePath(String absolutePath)
Specified by: setSeedsFilePath in class HeritrixTemplate

setArchiveFilePrefix
public void setArchiveFilePrefix(String archiveFilePrefix)
Specified by: setArchiveFilePrefix in class HeritrixTemplate

setDiskPath
public void setDiskPath(String absolutePath)
Specified by: setDiskPath in class HeritrixTemplate

removeDeduplicatorIfPresent
public void removeDeduplicatorIfPresent()
Description copied from class: HeritrixTemplate
Tries to remove the deduplicator, if present in the template.
Specified by: removeDeduplicatorIfPresent in class HeritrixTemplate

enableOrDisableDeduplication
public void enableOrDisableDeduplication(boolean enabled)
Specified by: enableOrDisableDeduplication in class HeritrixTemplate

insertWarcInfoMetadata
public void insertWarcInfoMetadata(Job ajob, String origHarvestdefinitionName, String origHarvestdefinitionComments, String scheduleName, String performer)
Description copied from class: HeritrixTemplate
Adds settings to the WARCWriterProcessor so that it can generate a proper WARCINFO record.
Specified by: insertWarcInfoMetadata in class HeritrixTemplate
Parameters:
- ajob: a HarvestJob
- origHarvestdefinitionName: the name of the harvest definition behind this job
- scheduleName: the name of the schedule used (null if the job is not a selective harvest)
- performer: the name of the organisation or person doing this harvest

insertAttributes
public void insertAttributes(List<EAV.AttributeAndType> attributesAndTypes)
Description copied from class: HeritrixTemplate
Tries to insert the given list of attributes into the template.
Specified by: insertAttributes in class HeritrixTemplate

writeTemplate
public void writeTemplate(javax.servlet.jsp.JspWriter out) throws IOFailure
Specified by: writeTemplate in class HeritrixTemplate
Throws:
- IOFailure

insertUmbrabean
public void insertUmbrabean(String jobName, String rabbitMQUrl, String limitSearchRegEx)
Description copied from class: HeritrixTemplate
Inserts all necessary umbra-related beans into this template.
Specified by: insertUmbrabean in class HeritrixTemplate
Parameters:
- jobName: a String representing the job; it must be unique within this NAS environment for all time
- rabbitMQUrl: the URL of the RabbitMQ socket connection (amqp://) to which Umbra requests are to be sent
- limitSearchRegEx: the regular expression used to limit the Heritrix search path of URLs to be sent to Umbra.