public class ExtractorOAI extends org.archive.crawler.extractor.Extractor
Modifier and Type | Field and Description |
---|---|
static String | EXTENDED_RESUMPTION_TOKEN_MATCH: Regular expression matching the extended resumptionToken, i.e. a resumptionToken element that carries attributes. |
static String | SIMPLE_RESUMPTION_TOKEN_MATCH: Regular expression matching the simple resumptionToken, i.e. a resumptionToken element without attributes. |
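The actual values of SIMPLE_RESUMPTION_TOKEN_MATCH and EXTENDED_RESUMPTION_TOKEN_MATCH are not reproduced on this page. The following is a minimal sketch, assuming the simple form is a bare `<resumptionToken>TOKEN</resumptionToken>` element and the extended form carries attributes such as `cursor` and `completeListSize` (both defined by OAI-PMH 2.0). The patterns and the `findToken` helper are hypothetical stand-ins, not the class's own constants.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResumptionTokenRegexSketch {
    // Hypothetical stand-in for SIMPLE_RESUMPTION_TOKEN_MATCH:
    // <resumptionToken>TOKEN</resumptionToken>, no attributes.
    static final Pattern SIMPLE =
            Pattern.compile("<resumptionToken>\\s*([^<]+?)\\s*</resumptionToken>");

    // Hypothetical stand-in for EXTENDED_RESUMPTION_TOKEN_MATCH: the element
    // carries attributes, e.g. cursor="0" completeListSize="250".
    static final Pattern EXTENDED =
            Pattern.compile("<resumptionToken\\s+[^>]*>\\s*([^<]+?)\\s*</resumptionToken>");

    /** Returns the token text, or null if no non-empty token is present. */
    static String findToken(CharSequence xml) {
        for (Pattern p : new Pattern[] {SIMPLE, EXTENDED}) {
            Matcher m = p.matcher(xml);
            if (m.find()) {
                return m.group(1);
            }
        }
        // No token, or an empty one (which OAI-PMH uses to mark the final
        // page of a list), means there is nothing more to request.
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findToken("<resumptionToken>abc123</resumptionToken>"));
        System.out.println(findToken(
                "<resumptionToken cursor=\"0\" completeListSize=\"250\">xyz</resumptionToken>"));
        System.out.println(findToken("<resumptionToken completeListSize=\"250\" cursor=\"200\"/>"));
    }
}
```

Trying the simple pattern first and falling back to the extended one mirrors the split into two constants; a single pattern with an optional attribute group would work equally well.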
Constructor and Description |
---|
ExtractorOAI(String name): Constructor for this extractor. |
Modifier and Type | Method and Description |
---|---|
protected void | extract(org.archive.crawler.datamodel.CrawlURI curi): Perform the link extraction on the current crawl URI. |
boolean | processXml(org.archive.crawler.datamodel.CrawlURI curi, CharSequence cs): Searches for a resumption token and adds a link if one is found. |
String | report(): Return a report from this processor. |
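The summary states only that processXml adds a link when a token is found, not how that link is built. The following is a sketch, assuming the harvester follows the OAI-PMH 2.0 rule that resumptionToken is an exclusive argument, so the follow-up request keeps just the verb and the token. `OaiNextRequestSketch` and `nextRequest` are hypothetical names; the sketch uses plain strings instead of Heritrix's CrawlURI and needs Java 10 or later for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class OaiNextRequestSketch {
    /**
     * Hypothetical helper: builds the follow-up request for an incomplete
     * OAI-PMH list. resumptionToken is an exclusive argument in OAI-PMH 2.0,
     * so only verb is carried over; metadataPrefix, set, from and until are dropped.
     */
    static String nextRequest(String requestUrl, String token) {
        int q = requestUrl.indexOf('?');
        if (q < 0) {
            throw new IllegalArgumentException("no query string in " + requestUrl);
        }
        String base = requestUrl.substring(0, q);
        String verb = null;
        for (String pair : requestUrl.substring(q + 1).split("&")) {
            if (pair.startsWith("verb=")) {
                verb = pair.substring("verb=".length());
            }
        }
        if (verb == null) {
            throw new IllegalArgumentException("no verb in " + requestUrl);
        }
        // The token is opaque to the harvester, so percent-encode it.
        return base + "?verb=" + verb + "&resumptionToken="
                + URLEncoder.encode(token, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // prints http://example.org/oai?verb=ListRecords&resumptionToken=abc%2F123
        System.out.println(nextRequest(
                "http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc",
                "abc/123"));
    }
}
```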
Methods inherited from superclasses: checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isEnabled, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn
Methods inherited from superclasses: addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
public static final String SIMPLE_RESUMPTION_TOKEN_MATCH
public static final String EXTENDED_RESUMPTION_TOKEN_MATCH
public ExtractorOAI(String name)
Constructor for this extractor.
Parameters: name - the name of this extractor.
protected void extract(org.archive.crawler.datamodel.CrawlURI curi)
Perform the link extraction on the current crawl URI.
Specified by: extract in class org.archive.crawler.extractor.Extractor
Parameters: curi - the CrawlURI from which to extract the link.
public boolean processXml(org.archive.crawler.datamodel.CrawlURI curi, CharSequence cs)
Searches for a resumption token and adds a link if one is found.
Parameters: curi - the CrawlURI.
cs - the character sequence in which to search.
Copyright © 2005–2015 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.