ExtractorOAI

dk.netarkivet.harvester.harvesting.extractor
Class ExtractorOAI

java.lang.Object
  javax.management.Attribute
      org.archive.crawler.settings.Type
          org.archive.crawler.settings.ComplexType
              org.archive.crawler.settings.ModuleType
                  org.archive.crawler.framework.Processor
                      org.archive.crawler.extractor.Extractor
                          dk.netarkivet.harvester.harvesting.extractor.ExtractorOAI

All Implemented Interfaces: java.io.Serializable, javax.management.DynamicMBean

public class ExtractorOAI
extends org.archive.crawler.extractor.Extractor

This is a link extractor for use with Heritrix. It finds the resumptionToken in the response to an OAI-PMH listMetadata query and constructs the link to the next page of results. It does not extract any other links, so if the OAI metadata contains additional URLs, a further extractor should be used for them. Typically this means that the extractor chain in the order template ends with two enabled extractors: this one, followed by the extractor that handles the remaining links.
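The link construction described above can be illustrated with a minimal sketch. It is not the code of this class. The class name OaiNextPageSketch, the method nextPageUrl and its parameters are hypothetical, and the example URL is made up. It assumes the next page is requested from the same base URL with only the verb and the resumptionToken as arguments, which is how OAI-PMH defines a resumed request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ExtractorOAI or Heritrix.
final class OaiNextPageSketch {

    /**
     * Builds the URL of the next result page.
     *
     * @param currentUrl      URL of the page just fetched (may carry a query string)
     * @param verb            the OAI-PMH verb of the original request
     * @param resumptionToken the token found in the response
     */
    static String nextPageUrl(String currentUrl, String verb, String resumptionToken) {
        // Keep only the base URL: a resumed request must not repeat
        // the other arguments of the original query.
        int queryStart = currentUrl.indexOf('?');
        String baseUrl = queryStart >= 0 ? currentUrl.substring(0, queryStart) : currentUrl;

        // Tokens are opaque and may contain reserved characters, so encode them.
        String encodedToken = URLEncoder.encode(resumptionToken, StandardCharsets.UTF_8);

        return baseUrl + "?verb=" + verb + "&resumptionToken=" + encodedToken;
    }
}
```

For example, calling nextPageUrl("http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc", "ListRecords", "abc/123") drops the metadataPrefix argument and percent-encodes the slash in the token.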

See Also: Serialized Form

Nested Class Summary

Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
`org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator`

Field Summary
`(package private) org.apache.commons.logging.Log`	`log` The class logger.

Fields inherited from class org.archive.crawler.framework.Processor
`ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules`

Fields inherited from class org.archive.crawler.settings.ComplexType
`definition, definitionMap`

Constructor Summary
`ExtractorOAI(java.lang.String name)` Constructor for this extractor.

Method Summary
`protected void`	`extract(org.archive.crawler.datamodel.CrawlURI curi)` Perform the link extraction on the current crawl uri.
`boolean`	`processXml(org.archive.crawler.datamodel.CrawlURI curi, java.lang.CharSequence cs)` Searches for resumption token and adds link if it is found.
`java.lang.String`	`report()` Return a report from this processor.

Methods inherited from class org.archive.crawler.extractor.Extractor
`innerProcess`

Methods inherited from class org.archive.crawler.framework.Processor
`checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn`

Methods inherited from class org.archive.crawler.settings.ModuleType
`addElement, listUsedFiles`

Methods inherited from class org.archive.crawler.settings.ComplexType
`addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute`

Methods inherited from class org.archive.crawler.settings.Type
`addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient`

Methods inherited from class javax.management.Attribute
`getName, hashCode`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, wait, wait, wait`

Field Detail

log

final org.apache.commons.logging.Log log

The class logger.

Constructor Detail

ExtractorOAI

public ExtractorOAI(java.lang.String name)

Constructor for this extractor.

Parameters: name - the name of this extractor

Method Detail

extract

protected void extract(org.archive.crawler.datamodel.CrawlURI curi)

Perform the link extraction on the current CrawlURI. This method does not call linkExtractorFinished() on the current CrawlURI, so subsequent extractors in the chain can still find more links.

Specified by: extract in class org.archive.crawler.extractor.Extractor

Parameters: curi - the CrawlURI from which to extract the link.

processXml

public boolean processXml(org.archive.crawler.datamodel.CrawlURI curi,
                          java.lang.CharSequence cs)

Searches for resumption token and adds link if it is found. Returns true iff a link is added.

Parameters: curi - the CrawlURI; cs - the character sequence in which to search.
Returns: true iff a resumptionToken is found and a link added.
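The search step can be sketched as a regular-expression match over the character sequence. This is an illustration under stated assumptions, not the implementation of processXml. The class ResumptionTokenSketch, the method findToken and the pattern are all hypothetical, and the pattern assumes an unprefixed resumptionToken element.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ExtractorOAI or Heritrix.
final class ResumptionTokenSketch {

    // Matches <resumptionToken ...>TOKEN</resumptionToken> and captures TOKEN.
    // Attributes on the element are allowed; an empty element does not match.
    private static final Pattern TOKEN_PATTERN = Pattern.compile(
            "<resumptionToken(?:\\s[^>]*)?>\\s*([^<\\s]+)\\s*</resumptionToken>");

    /** Returns the first non-empty resumption token in cs, if there is one. */
    static Optional<String> findToken(CharSequence cs) {
        Matcher matcher = TOKEN_PATTERN.matcher(cs);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

An empty resumptionToken element marks the last page of an OAI-PMH result list, so the sketch returns an empty Optional in that case. That corresponds to a return value of false, with no link added.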

report

public java.lang.String report()

Return a report from this processor.

Overrides: report in class org.archive.crawler.framework.Processor

Returns: the report.
