Class NasWARCProcessor
- java.lang.Object
  - org.archive.modules.Processor
    - org.archive.modules.writer.WriterPoolProcessor
      - org.archive.modules.writer.BaseWARCWriterProcessor
        - org.archive.modules.writer.WARCWriterProcessor
          - dk.netarkivet.harvester.harvesting.NasWARCProcessor
All Implemented Interfaces:
org.archive.checkpointing.Checkpointable, org.archive.io.warc.WARCWriterPoolSettings, org.archive.io.WriterPoolSettings, org.archive.spring.HasKeyedProperties, org.springframework.beans.factory.Aware, org.springframework.beans.factory.BeanNameAware, org.springframework.context.Lifecycle
public class NasWARCProcessor extends org.archive.modules.writer.WARCWriterProcessor
Custom NAS WARCWriterProcessor that adds NetarchiveSuite metadata to the warcinfo records written by Heritrix, simply by extending org.archive.modules.writer.WARCWriterProcessor. This was not possible in Heritrix 1 (H1).
Author:
svc
Field Summary
protected java.util.Map<java.lang.String,java.lang.String> metadataMap
    Metadata items.
Fields inherited from class org.archive.modules.writer.BaseWARCWriterProcessor
generator, stats, urlsWritten
Fields inherited from class org.archive.modules.writer.WriterPoolProcessor
ANNOTATION_UNWRITTEN, compress, directory, frequentFlushes, maxFileSizeBytes, maxTotalBytesToWrite, maxWaitForIdleMs, poolMaxActive, prefix, serverCache, skipIdenticalDigests, startNewFilesOnCheckpoint, storePaths, template, writeBufferSize
Constructor Summary
NasWARCProcessor()
Method Summary
java.util.Map<java.lang.String,java.lang.String> getFormItems()
java.util.List<java.lang.String> getMetadata()
boolean getWriteMetadataOutlinks()
void setMetadataItems(java.util.Map<java.lang.String,java.lang.String> metadataItems)
void setWriteMetadataOutlinks(boolean writeMetadataOutlinks)
protected java.net.URI writeMetadata(org.archive.io.warc.WARCWriter w, java.lang.String timestamp, java.net.URI baseid, org.archive.modules.CrawlURI curi, org.archive.util.anvl.ANVLRecord namedFields)
    Modifies the default writeMetadata method to control whether or not outlinks are written to the metadata record.
Methods inherited from class org.archive.modules.writer.WARCWriterProcessor
fromCheckpointJson, getWriteMetadata, getWriteRequests, innerProcessResult, qualifyRecordID, saveHeader, setWriteMetadata, setWriteRequests, setWriteRevisitForIdenticalDigests, setWriteRevisitForNotModified, toCheckpointJson, write, writeDnsRecords, writeFtpControlConversation, writeFtpRecords, writeHttpRecords, writeRequest, writeResource, writeResponse, writeRevisit, writeRevisit, writeWhoisRecords
Methods inherited from class org.archive.modules.writer.BaseWARCWriterProcessor
addIfNotBlank, addStats, copyStats, getDefaultMaxFileSize, getDefaultStorePaths, getRecordID, getRecordIDGenerator, getStats, report, setRecordIDGenerator, setupPool, updateMetadataAfterWrite
Methods inherited from class org.archive.modules.writer.WriterPoolProcessor
calcOutputDirs, checkBytesWritten, copyForwardWriteTagIfDupe, doCheckpoint, getCompress, getDirectory, getFrequentFlushes, getHostAddress, getMaxFileSizeBytes, getMaxTotalBytesToWrite, getMaxWaitForIdleMs, getMetadataProvider, getPool, getPoolMaxActive, getPrefix, getSerialNo, getServerCache, getSkipIdenticalDigests, getStartNewFilesOnCheckpoint, getStorePaths, getTemplate, getTotalBytesWritten, getWriteBufferSize, innerProcess, innerRejectProcess, setCompress, setDirectory, setFrequentFlushes, setMaxFileSizeBytes, setMaxTotalBytesToWrite, setMaxWaitForIdleMs, setMetadataProvider, setPool, setPoolMaxActive, setPrefix, setServerCache, setSkipIdenticalDigests, setStartNewFilesOnCheckpoint, setStorePaths, setTemplate, setTotalBytesWritten, setWriteBufferSize, shouldProcess, shouldWrite, start, stop
Methods inherited from class org.archive.modules.Processor
finishCheckpoint, flattenVia, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, isRunning, isSuccess, process, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, startCheckpoint
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
metadataMap
protected java.util.Map<java.lang.String,java.lang.String> metadataMap
Metadata items. Add to the WARCProcessor bean as
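The following is a minimal Java stand-in for how the metadata items might be set and read back. It is not the NetarchiveSuite code and uses neither Heritrix nor Spring; it assumes that setMetadataItems stores its argument in metadataMap and that getFormItems exposes those items. Under standard JavaBeans naming, the setter corresponds to a Spring bean property called metadataItems. The class name MetadataItemsSketch and the keys jobId and harvestNum are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in: mimics only the setMetadataItems/getFormItems pair
// documented for NasWARCProcessor; it has no Heritrix or Spring dependency.
class MetadataItemsSketch {
    // Mirrors the protected metadataMap field. LinkedHashMap keeps the
    // configured order (an assumption made for this sketch).
    protected Map<String, String> metadataMap = new LinkedHashMap<>();

    // The method a Spring "metadataItems" property injection would call.
    // Copies the map so later changes by the caller have no effect.
    public void setMetadataItems(Map<String, String> metadataItems) {
        this.metadataMap = new LinkedHashMap<>(metadataItems);
    }

    // Assumed behaviour: expose the configured items as a read-only view.
    public Map<String, String> getFormItems() {
        return Collections.unmodifiableMap(metadataMap);
    }

    public static void main(String[] args) {
        Map<String, String> items = new LinkedHashMap<>();
        items.put("jobId", "42");      // hypothetical key and value
        items.put("harvestNum", "0");  // hypothetical key and value
        MetadataItemsSketch processor = new MetadataItemsSketch();
        processor.setMetadataItems(items);
        System.out.println(processor.getFormItems()); // {jobId=42, harvestNum=0}
    }
}
```

The defensive copy and read-only view are design choices of the sketch, not documented behaviour of the real class.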
Constructor Detail
NasWARCProcessor
public NasWARCProcessor()
Method Detail
getWriteMetadataOutlinks
public boolean getWriteMetadataOutlinks()
setWriteMetadataOutlinks
public void setWriteMetadataOutlinks(boolean writeMetadataOutlinks)
getFormItems
public java.util.Map<java.lang.String,java.lang.String> getFormItems()
setMetadataItems
public void setMetadataItems(java.util.Map<java.lang.String,java.lang.String> metadataItems)
getMetadata
public java.util.List<java.lang.String> getMetadata()
Specified by:
getMetadata in interface org.archive.io.WriterPoolSettings
Overrides:
getMetadata in class org.archive.modules.writer.BaseWARCWriterProcessor
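The override pattern behind getMetadata can be sketched with two hypothetical stand-in classes: a base class that supplies generic warcinfo fields and a subclass that keeps them and appends the configured metadata items as extra "name: value" lines, the line shape used by the application/warc-fields content of warcinfo records. This is an illustration under assumptions, not the actual implementation: the real BaseWARCWriterProcessor builds its own fields, and treating each list element as one field line is a simplification. BaseWarcInfoSketch, NasWarcInfoSketch and the field values are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical subclass: keeps the inherited warcinfo fields and appends
// the configured metadata items, mirroring the idea of the override.
class NasWarcInfoSketch extends BaseWarcInfoSketch {
    private final Map<String, String> metadataMap;

    NasWarcInfoSketch(Map<String, String> metadataMap) {
        this.metadataMap = new LinkedHashMap<>(metadataMap);
    }

    @Override
    public List<String> getMetadata() {
        // Start from whatever the base class produces.
        List<String> fields = new ArrayList<>(super.getMetadata());
        // Append each configured item as a "name: value" line.
        for (Map.Entry<String, String> e : metadataMap.entrySet()) {
            fields.add(e.getKey() + ": " + e.getValue());
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> items = new LinkedHashMap<>();
        items.put("jobId", "42"); // hypothetical item
        // Prints the two base lines followed by "jobId: 42".
        for (String line : new NasWarcInfoSketch(items).getMetadata()) {
            System.out.println(line);
        }
    }
}

// Hypothetical stand-in for the Heritrix base class and its default fields.
class BaseWarcInfoSketch {
    public List<String> getMetadata() {
        return Arrays.asList("software: ExampleCrawler/1.0", "format: WARC File Format 1.0");
    }
}
```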
writeMetadata
protected java.net.URI writeMetadata(org.archive.io.warc.WARCWriter w, java.lang.String timestamp, java.net.URI baseid, org.archive.modules.CrawlURI curi, org.archive.util.anvl.ANVLRecord namedFields) throws java.io.IOException
Modifies the default writeMetadata method to control whether or not outlinks are written to the metadata record.
Overrides:
writeMetadata in class org.archive.modules.writer.WARCWriterProcessor
Throws:
java.io.IOException
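The outlink switch can be modelled as follows. This is a hypothetical sketch, not the actual override: the real method receives a WARCWriter, a CrawlURI and an ANVLRecord, which are replaced here by plain strings and lists, and it returns a record URI rather than a list of lines. The assumption is that getWriteMetadataOutlinks decides whether outlink lines are emitted; the via:/outlink: line format, the default value true, and the names OutlinkToggleSketch and buildMetadataFields are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: models only the outlink toggle, not the real
// WARCWriter / CrawlURI / ANVLRecord plumbing of writeMetadata.
class OutlinkToggleSketch {
    // Mirrors the writeMetadataOutlinks property; the default is arbitrary here.
    private boolean writeMetadataOutlinks = true;

    public boolean getWriteMetadataOutlinks() { return writeMetadataOutlinks; }

    public void setWriteMetadataOutlinks(boolean writeMetadataOutlinks) {
        this.writeMetadataOutlinks = writeMetadataOutlinks;
    }

    // Builds simplified "name: value" lines for one URI's metadata record.
    List<String> buildMetadataFields(String via, List<String> outlinks) {
        List<String> fields = new ArrayList<>();
        if (via != null) {
            fields.add("via: " + via);
        }
        // Outlink lines are added only when the property is enabled.
        if (writeMetadataOutlinks) {
            for (String link : outlinks) {
                fields.add("outlink: " + link);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        OutlinkToggleSketch p = new OutlinkToggleSketch();
        List<String> links = Arrays.asList("http://example.org/a", "http://example.org/b");
        System.out.println(p.buildMetadataFields("http://example.org/", links).size()); // 3
        p.setWriteMetadataOutlinks(false);
        System.out.println(p.buildMetadataFields("http://example.org/", links).size()); // 1
    }
}
```

Keeping the toggle as a plain boolean property makes it straightforward to switch from a bean definition without subclassing again.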