public class ExtractorOAI extends org.archive.modules.extractor.ContentExtractor
Modifier and Type | Field and Description |
---|---|
static String | EXTENDED_RESUMPTION_TOKEN_MATCH: Regular expression matching the extended resumptionToken, i.e. the element with attributes. |
static String | SIMPLE_RESUMPTION_TOKEN_MATCH: Regular expression matching the simple resumptionToken, i.e. the element without attributes. |
Constructor and Description |
---|
ExtractorOAI(): Constructor for this extractor. |
Modifier and Type | Method and Description |
---|---|
protected boolean | innerExtract(org.archive.modules.CrawlURI curi): Performs the link extraction on the current CrawlURI. |
boolean | processXml(org.archive.modules.CrawlURI curi, CharSequence cs): Searches for a resumption token and adds a link if one is found. |
String | report(): Returns a report from this processor. |
protected boolean | shouldExtract(org.archive.modules.CrawlURI curi) |
add, addOutlink, addOutlink, addRelativeToBase, addRelativeToVia, fromCheckpointJson, getExtractorParameters, getLoggerModule, innerProcess, logUriError, setExtractorParameters, setLoggerModule, toCheckpointJson
doCheckpoint, finishCheckpoint, flattenVia, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop
public static final String SIMPLE_RESUMPTION_TOKEN_MATCH
public static final String EXTENDED_RESUMPTION_TOKEN_MATCH
public ExtractorOAI()
protected boolean innerExtract(org.archive.modules.CrawlURI curi)
innerExtract
in class org.archive.modules.extractor.ContentExtractor
curi - the CrawlURI from which to extract the link.
public boolean processXml(org.archive.modules.CrawlURI curi, CharSequence cs)
curi - the CrawlURI.
cs - the character sequence in which to search.
public String report()
report
in class org.archive.modules.extractor.Extractor
protected boolean shouldExtract(org.archive.modules.CrawlURI curi)
shouldExtract
in class org.archive.modules.extractor.ContentExtractor
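The behaviour described for processXml (search the response for a resumption token and, if one is found, add a link for the next page) can be illustrated with a small standalone sketch. This is not the ExtractorOAI implementation: the class name ResumptionTokenSketch, the methods findToken and nextRequest, and the TOKEN pattern are all hypothetical, and the pattern is not the value of SIMPLE_RESUMPTION_TOKEN_MATCH or EXTENDED_RESUMPTION_TOKEN_MATCH, which this page does not show. The sketch only assumes the OAI-PMH convention that a follow-up request carries the same verb plus a resumptionToken argument.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResumptionTokenSketch {

    // Hypothetical pattern (an assumption, not the constants of ExtractorOAI).
    // It accepts both the simple form <resumptionToken>abc</resumptionToken>
    // and the extended form that carries attributes on the opening tag.
    // An empty or self-closing element (end of list) does not match.
    private static final Pattern TOKEN = Pattern.compile(
            "<resumptionToken(?:\\s[^>]*)?>\\s*([^<\\s]+)\\s*</resumptionToken>");

    /** Returns the first non-empty resumption token found in the XML, if any. */
    static Optional<String> findToken(CharSequence xml) {
        Matcher m = TOKEN.matcher(xml);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /**
     * Builds the URL of the follow-up request, or returns empty when the
     * response carries no token. XML entities in the token are not unescaped
     * here; a real extractor would need to handle them.
     */
    static Optional<String> nextRequest(String requestUrl, String verb, CharSequence xml) {
        return findToken(xml).map(token -> {
            // Drop the old query string; the token replaces the other arguments.
            int q = requestUrl.indexOf('?');
            String base = q < 0 ? requestUrl : requestUrl.substring(0, q);
            // URLEncoder.encode(String, Charset) requires Java 10 or later.
            return base + "?verb=" + verb + "&resumptionToken="
                    + URLEncoder.encode(token, StandardCharsets.UTF_8);
        });
    }

    public static void main(String[] args) {
        String xml = "<ListRecords><record/>"
                + "<resumptionToken cursor=\"0\">abc123</resumptionToken></ListRecords>";
        System.out.println(
                nextRequest("http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc",
                        "ListRecords", xml)
                        .orElse("(no further pages)"));
    }
}
```

In a crawler, the URL returned by nextRequest would be queued as an outlink of the current URI, so that the harvest continues until a response arrives without a token.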
Copyright © 2005–2018 The Royal Danish Library, the National Library of France and the Austrian National Library. All rights reserved.