Class NasWARCProcessor
- java.lang.Object
  - org.archive.modules.Processor
    - org.archive.modules.writer.WriterPoolProcessor
      - org.archive.modules.writer.BaseWARCWriterProcessor
        - org.archive.modules.writer.WARCWriterProcessor
          - dk.netarkivet.harvester.harvesting.NasWARCProcessor
- All Implemented Interfaces:
  org.archive.checkpointing.Checkpointable, org.archive.io.warc.WARCWriterPoolSettings, org.archive.io.WriterPoolSettings, org.archive.spring.HasKeyedProperties, org.springframework.beans.factory.Aware, org.springframework.beans.factory.BeanNameAware, org.springframework.context.Lifecycle
public class NasWARCProcessor extends org.archive.modules.writer.WARCWriterProcessor
Custom NAS WARCWriterProcessor that adds NetarchiveSuite metadata to the WARCInfo records written by Heritrix, simply by extending org.archive.modules.writer.WARCWriterProcessor. This was not possible in H1.
- Author:
  svc
-
-
Field Summary
Fields

| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected Map<String,String> | metadataMap | metadata items. |
Fields inherited from class org.archive.modules.writer.BaseWARCWriterProcessor
generator, stats, urlsWritten
-
Fields inherited from class org.archive.modules.writer.WriterPoolProcessor
ANNOTATION_UNWRITTEN, compress, directory, frequentFlushes, maxFileSizeBytes, maxTotalBytesToWrite, maxWaitForIdleMs, poolMaxActive, prefix, serverCache, skipIdenticalDigests, startNewFilesOnCheckpoint, storePaths, template, writeBufferSize
-
-
Constructor Summary
Constructors

| Constructor | Description |
| --- | --- |
| NasWARCProcessor() | |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| Map<String,String> | getFormItems() | |
| List<String> | getMetadata() | |
| boolean | getWriteMetadataOutlinks() | |
| void | setMetadataItems(Map<String,String> metadataItems) | |
| void | setWriteMetadataOutlinks(boolean writeMetadataOutlinks) | |
| protected URI | writeMetadata(org.archive.io.warc.WARCWriter w, String timestamp, URI baseid, org.archive.modules.CrawlURI curi, org.archive.util.anvl.ANVLRecord namedFields) | Overrides the default writeMetadata method to control whether outlinks are written to the metadata record. |
Methods inherited from class org.archive.modules.writer.WARCWriterProcessor
fromCheckpointJson, getWriteMetadata, getWriteRequests, innerProcessResult, qualifyRecordID, saveHeader, setWriteMetadata, setWriteRequests, setWriteRevisitForIdenticalDigests, setWriteRevisitForNotModified, toCheckpointJson, write, writeDnsRecords, writeFtpControlConversation, writeFtpRecords, writeHttpRecords, writeRequest, writeResource, writeResponse, writeRevisit, writeRevisit, writeWhoisRecords
-
Methods inherited from class org.archive.modules.writer.BaseWARCWriterProcessor
addIfNotBlank, addStats, copyStats, getDefaultMaxFileSize, getDefaultStorePaths, getRecordID, getRecordIDGenerator, getStats, report, setRecordIDGenerator, setupPool, updateMetadataAfterWrite
-
Methods inherited from class org.archive.modules.writer.WriterPoolProcessor
calcOutputDirs, checkBytesWritten, copyForwardWriteTagIfDupe, doCheckpoint, getCompress, getDirectory, getFrequentFlushes, getHostAddress, getMaxFileSizeBytes, getMaxTotalBytesToWrite, getMaxWaitForIdleMs, getMetadataProvider, getPool, getPoolMaxActive, getPrefix, getSerialNo, getServerCache, getSkipIdenticalDigests, getStartNewFilesOnCheckpoint, getStorePaths, getTemplate, getTotalBytesWritten, getWriteBufferSize, innerProcess, innerRejectProcess, setCompress, setDirectory, setFrequentFlushes, setMaxFileSizeBytes, setMaxTotalBytesToWrite, setMaxWaitForIdleMs, setMetadataProvider, setPool, setPoolMaxActive, setPrefix, setServerCache, setSkipIdenticalDigests, setStartNewFilesOnCheckpoint, setStorePaths, setTemplate, setTotalBytesWritten, setWriteBufferSize, shouldProcess, shouldWrite, start, stop
-
Methods inherited from class org.archive.modules.Processor
finishCheckpoint, flattenVia, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, isRunning, isSuccess, process, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, startCheckpoint
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
-
-
-
Method Detail
-
getWriteMetadataOutlinks
public boolean getWriteMetadataOutlinks()
-
setWriteMetadataOutlinks
public void setWriteMetadataOutlinks(boolean writeMetadataOutlinks)
-
getMetadata
public List<String> getMetadata()
- Specified by:
  getMetadata in interface org.archive.io.WriterPoolSettings
- Overrides:
  getMetadata in class org.archive.modules.writer.BaseWARCWriterProcessor
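As a rough illustration of how setMetadataItems and getMetadata might fit together, the following self-contained sketch models a writer whose metadata lines are extended with extra key/value items. The classes BaseWriterSketch and NasLikeWriterSketch, the placeholder `software` line, the item names, and the `key: value` line format are all assumptions made for this example. They are stand-ins, not the Heritrix or NetarchiveSuite classes, and the real getMetadata may assemble its output differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WarcInfoMetadataSketch {

    /** Hypothetical stand-in for the parent writer; not a Heritrix class. */
    public static class BaseWriterSketch {
        public List<String> getMetadata() {
            List<String> lines = new ArrayList<>();
            lines.add("software: example-crawler"); // placeholder value (assumption)
            return lines;
        }
    }

    /** Hypothetical subclass that appends extra items, mirroring the idea of metadataMap. */
    public static class NasLikeWriterSketch extends BaseWriterSketch {
        protected Map<String, String> metadataMap = new LinkedHashMap<>();

        public void setMetadataItems(Map<String, String> metadataItems) {
            // Copy defensively and keep insertion order for stable output.
            this.metadataMap = new LinkedHashMap<>(metadataItems);
        }

        @Override
        public List<String> getMetadata() {
            // Start from the parent's lines, then append one line per extra item.
            List<String> lines = new ArrayList<>(super.getMetadata());
            for (Map.Entry<String, String> e : metadataMap.entrySet()) {
                lines.add(e.getKey() + ": " + e.getValue());
            }
            return lines;
        }
    }

    public static void main(String[] args) {
        NasLikeWriterSketch writer = new NasLikeWriterSketch();
        Map<String, String> items = new LinkedHashMap<>();
        items.put("exampleJobId", "42");          // hypothetical item names
        items.put("exampleHarvestName", "demo");
        writer.setMetadataItems(items);
        writer.getMetadata().forEach(System.out::println);
    }
}
```

In a real crawl the items would more likely be supplied through the bean configuration than set in code; the main method here only exercises the sketch.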
-
writeMetadata
protected URI writeMetadata(org.archive.io.warc.WARCWriter w, String timestamp, URI baseid, org.archive.modules.CrawlURI curi, org.archive.util.anvl.ANVLRecord namedFields) throws IOException
Overrides the default writeMetadata method to control whether outlinks are written to the metadata record.
- Overrides:
  writeMetadata in class org.archive.modules.writer.WARCWriterProcessor
- Throws:
  IOException
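To make the intent of this override more concrete, here is a minimal sketch of a metadata builder that emits outlink lines only when a flag is set. OutlinkToggleSketch, buildMetadataBody, the `via` and `outlink` labels as used here, and the default value of `false` are assumptions for this illustration. The actual method works with WARCWriter, CrawlURI and ANVLRecord and may differ in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutlinkToggleSketch {

    // Default chosen for the example only (assumption).
    private boolean writeMetadataOutlinks = false;

    public boolean getWriteMetadataOutlinks() {
        return writeMetadataOutlinks;
    }

    public void setWriteMetadataOutlinks(boolean writeMetadataOutlinks) {
        this.writeMetadataOutlinks = writeMetadataOutlinks;
    }

    /**
     * Hypothetical helper: builds the lines of a metadata record body.
     * Outlinks are included only when the flag is enabled.
     */
    public List<String> buildMetadataBody(String via, List<String> outlinks) {
        List<String> lines = new ArrayList<>();
        lines.add("via: " + via);
        if (writeMetadataOutlinks) {
            for (String link : outlinks) {
                lines.add("outlink: " + link);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        OutlinkToggleSketch sketch = new OutlinkToggleSketch();
        List<String> links = Arrays.asList("http://example.org/a", "http://example.org/b");

        // Flag off: only the "via" line is produced.
        System.out.println(sketch.buildMetadataBody("http://example.org/", links));

        // Flag on: one "outlink" line per discovered link is appended.
        sketch.setWriteMetadataOutlinks(true);
        System.out.println(sketch.buildMetadataBody("http://example.org/", links));
    }
}
```

Leaving outlinks out keeps metadata records small on link-heavy pages, at the cost of not recording the discovered links in the WARC file itself.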
-
-