java.lang.Object
  javax.management.Attribute
    org.archive.crawler.settings.Type
      org.archive.crawler.settings.ComplexType
        org.archive.crawler.settings.ModuleType
          org.archive.crawler.framework.Processor
            org.archive.crawler.extractor.Extractor
              dk.netarkivet.harvester.harvesting.extractor.ExtractorOAI
public class ExtractorOAI
This is a link extractor for use with Heritrix. It finds the resumptionToken
in the response to an OAI-PMH listMetadata query and constructs the link for
the next page of results. This extractor does not extract any other links, so
if the OAI metadata contains additional URLs, an additional extractor should
be used for them. Typically this means that the extractor chain in the order
template will end:
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
---|
org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator |
Field Summary | |
---|---|
(package private) org.apache.commons.logging.Log | log: The class logger. |
Fields inherited from class org.archive.crawler.framework.Processor |
---|
ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules |
Fields inherited from class org.archive.crawler.settings.ComplexType |
---|
definition, definitionMap |
Constructor Summary | |
---|---|
ExtractorOAI(java.lang.String name) | Constructor for this extractor. |
Method Summary | |
---|---|
protected void | extract(org.archive.crawler.datamodel.CrawlURI curi): Perform the link extraction on the current crawl URI. |
boolean | processXml(org.archive.crawler.datamodel.CrawlURI curi, java.lang.CharSequence cs): Searches for a resumption token and adds a link if one is found. |
java.lang.String | report(): Return a report from this processor. |
Methods inherited from class org.archive.crawler.extractor.Extractor |
---|
innerProcess |
Methods inherited from class org.archive.crawler.framework.Processor |
---|
checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn |
Methods inherited from class org.archive.crawler.settings.ModuleType |
---|
addElement, listUsedFiles |
Methods inherited from class org.archive.crawler.settings.ComplexType |
---|
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute |
Methods inherited from class org.archive.crawler.settings.Type |
---|
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
---|
getName, hashCode |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
final org.apache.commons.logging.Log log
Constructor Detail |
---|
public ExtractorOAI(java.lang.String name)
name - the name of this extractor
Method Detail |
---|
protected void extract(org.archive.crawler.datamodel.CrawlURI curi)
extract in class org.archive.crawler.extractor.Extractor
curi - the CrawlURI from which to extract the link.
public boolean processXml(org.archive.crawler.datamodel.CrawlURI curi, java.lang.CharSequence cs)
curi - the CrawlURI.
cs - the character sequence in which to search.
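The resumption-token search described for processXml can be illustrated with a minimal, standalone sketch. It is not the ExtractorOAI implementation and uses no Heritrix classes: the class name ResumptionTokenSketch, the method nextPageUrl, and the regular expressions are hypothetical. It assumes the next request reuses the verb of the current request and carries only the verb and resumptionToken arguments, as OAI-PMH requires for resumed list requests.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of ExtractorOAI or Heritrix. */
final class ResumptionTokenSketch {
    // Matches <resumptionToken ...>TOKEN</resumptionToken> with optional attributes.
    // An empty element (the last page of a result set) does not match.
    private static final Pattern TOKEN = Pattern.compile(
            "<resumptionToken(?:\\s[^>]*)?>\\s*([^<\\s]+)\\s*</resumptionToken>");

    // Picks the verb argument out of the current request URL.
    private static final Pattern VERB = Pattern.compile("[?&]verb=([^&#]+)");

    /**
     * Returns the URL of the next result page, or an empty Optional when the
     * response has no non-empty resumption token or the URL has no verb.
     */
    static Optional<String> nextPageUrl(String currentUrl, CharSequence xml) {
        Matcher token = TOKEN.matcher(xml);
        Matcher verb = VERB.matcher(currentUrl);
        if (!token.find() || !verb.find()) {
            return Optional.empty();
        }
        // Drop the old query string; only verb and resumptionToken are sent.
        int query = currentUrl.indexOf('?');
        String base = query >= 0 ? currentUrl.substring(0, query) : currentUrl;
        String encoded = URLEncoder.encode(token.group(1), StandardCharsets.UTF_8);
        return Optional.of(base + "?verb=" + verb.group(1) + "&resumptionToken=" + encoded);
    }
}
```

In a real extractor the returned URL would be added to the CrawlURI as an outlink rather than returned to the caller; how ExtractorOAI does this is not shown on this page.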
public java.lang.String report()
report in class org.archive.crawler.framework.Processor