Class NasWARCProcessor

  • All Implemented Interfaces:
    org.archive.checkpointing.Checkpointable, org.archive.io.warc.WARCWriterPoolSettings, org.archive.io.WriterPoolSettings, org.archive.spring.HasKeyedProperties, org.springframework.beans.factory.Aware, org.springframework.beans.factory.BeanNameAware, org.springframework.context.Lifecycle

    public class NasWARCProcessor
    extends org.archive.modules.writer.WARCWriterProcessor
    Custom NAS WARCWriterProcessor that adds NetarchiveSuite metadata to the WARCInfo records written by Heritrix, simply by extending org.archive.modules.writer.WARCWriterProcessor. This was not possible in Heritrix 1 (H1).
    Author:
    svc
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected java.util.Map<java.lang.String,java.lang.String> metadataMap
      Metadata items.
      • Fields inherited from class org.archive.modules.writer.BaseWARCWriterProcessor

        generator, stats, urlsWritten
      • Fields inherited from class org.archive.modules.writer.WriterPoolProcessor

        ANNOTATION_UNWRITTEN, compress, directory, frequentFlushes, maxFileSizeBytes, maxTotalBytesToWrite, maxWaitForIdleMs, poolMaxActive, prefix, serverCache, skipIdenticalDigests, startNewFilesOnCheckpoint, storePaths, template, writeBufferSize
      • Fields inherited from class org.archive.modules.Processor

        beanName, isRunning, kp, recoveryCheckpoint, uriCount
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      java.util.Map<java.lang.String,java.lang.String> getFormItems()
      java.util.List<java.lang.String> getMetadata()  
      boolean getWriteMetadataOutlinks()  
      void setMetadataItems(java.util.Map<java.lang.String,java.lang.String> metadataItems)
      void setWriteMetadataOutlinks(boolean writeMetadataOutlinks)
      protected java.net.URI writeMetadata(org.archive.io.warc.WARCWriter w, java.lang.String timestamp, java.net.URI baseid, org.archive.modules.CrawlURI curi, org.archive.util.anvl.ANVLRecord namedFields)
      Overrides the default writeMetadata method to control whether outlinks are written in the metadata.
      • Methods inherited from class org.archive.modules.writer.WARCWriterProcessor

        fromCheckpointJson, getWriteMetadata, getWriteRequests, innerProcessResult, qualifyRecordID, saveHeader, setWriteMetadata, setWriteRequests, setWriteRevisitForIdenticalDigests, setWriteRevisitForNotModified, toCheckpointJson, write, writeDnsRecords, writeFtpControlConversation, writeFtpRecords, writeHttpRecords, writeRequest, writeResource, writeResponse, writeRevisit, writeRevisit, writeWhoisRecords
      • Methods inherited from class org.archive.modules.writer.BaseWARCWriterProcessor

        addIfNotBlank, addStats, copyStats, getDefaultMaxFileSize, getDefaultStorePaths, getRecordID, getRecordIDGenerator, getStats, report, setRecordIDGenerator, setupPool, updateMetadataAfterWrite
      • Methods inherited from class org.archive.modules.writer.WriterPoolProcessor

        calcOutputDirs, checkBytesWritten, copyForwardWriteTagIfDupe, doCheckpoint, getCompress, getDirectory, getFrequentFlushes, getHostAddress, getMaxFileSizeBytes, getMaxTotalBytesToWrite, getMaxWaitForIdleMs, getMetadataProvider, getPool, getPoolMaxActive, getPrefix, getSerialNo, getServerCache, getSkipIdenticalDigests, getStartNewFilesOnCheckpoint, getStorePaths, getTemplate, getTotalBytesWritten, getWriteBufferSize, innerProcess, innerRejectProcess, setCompress, setDirectory, setFrequentFlushes, setMaxFileSizeBytes, setMaxTotalBytesToWrite, setMaxWaitForIdleMs, setMetadataProvider, setPool, setPoolMaxActive, setPrefix, setServerCache, setSkipIdenticalDigests, setStartNewFilesOnCheckpoint, setStorePaths, setTemplate, setTotalBytesWritten, setWriteBufferSize, shouldProcess, shouldWrite, start, stop
      • Methods inherited from class org.archive.modules.Processor

        finishCheckpoint, flattenVia, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, isRunning, isSuccess, process, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, startCheckpoint
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface org.archive.checkpointing.Checkpointable

        finishCheckpoint, setRecoveryCheckpoint, startCheckpoint
      • Methods inherited from interface org.springframework.context.Lifecycle

        isRunning
      • Methods inherited from interface org.archive.io.warc.WARCWriterPoolSettings

        getRecordIDGenerator
      • Methods inherited from interface org.archive.io.WriterPoolSettings

        calcOutputDirs, getCompress, getFrequentFlushes, getMaxFileSizeBytes, getPrefix, getTemplate, getWriteBufferSize
    • Field Detail

      • metadataMap

        protected java.util.Map<java.lang.String,java.lang.String> metadataMap
        Metadata items. Add to the WARCProcessor bean as ...
    • Method Detail

      • getFormItems

        public java.util.Map<java.lang.String,java.lang.String> getFormItems()
      • setMetadataItems

        public void setMetadataItems(java.util.Map<java.lang.String,java.lang.String> metadataItems)
      • getMetadata

        public java.util.List<java.lang.String> getMetadata()
        Specified by:
        getMetadata in interface org.archive.io.WriterPoolSettings
        Overrides:
        getMetadata in class org.archive.modules.writer.BaseWARCWriterProcessor
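
The relationship between setMetadataItems and getMetadata can be illustrated with a minimal sketch. This is not the NetarchiveSuite implementation: it assumes that the configured key/value items end up as "name: value" lines in the warcinfo payload. The class MetadataItemsSketch, the method toInfoLines, and any keys passed to it are hypothetical, and the sketch does not call any Heritrix class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; it neither extends nor uses
// org.archive.modules.writer.WARCWriterProcessor.
final class MetadataItemsSketch {
    // Mirrors the documented field type: Map<String,String>.
    private Map<String, String> metadataMap = new LinkedHashMap<>();

    // Same shape as the documented setter; stores a defensive copy.
    void setMetadataItems(Map<String, String> metadataItems) {
        this.metadataMap = new LinkedHashMap<>(metadataItems);
    }

    // Assumption: each non-blank item becomes one "name: value" line,
    // in insertion order. Blank or null values are skipped.
    List<String> toInfoLines() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : metadataMap.entrySet()) {
            String value = e.getValue();
            if (value != null && !value.trim().isEmpty()) {
                lines.add(e.getKey() + ": " + value);
            }
        }
        return lines;
    }
}
```

Skipping blank values in the sketch follows the spirit of the inherited addIfNotBlank helper listed above; whether NasWARCProcessor applies it to every item is not stated on this page.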
      • writeMetadata

        protected java.net.URI writeMetadata(org.archive.io.warc.WARCWriter w,
                                             java.lang.String timestamp,
                                             java.net.URI baseid,
                                             org.archive.modules.CrawlURI curi,
                                             org.archive.util.anvl.ANVLRecord namedFields)
                                      throws java.io.IOException
        Overrides the default writeMetadata method to control whether outlinks are written in the metadata.
        Overrides:
        writeMetadata in class org.archive.modules.writer.WARCWriterProcessor
        Throws:
        java.io.IOException
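
The effect of the writeMetadataOutlinks flag on writeMetadata can be pictured with a small sketch. It is a simplified model under stated assumptions, not the actual override: the class OutlinkToggleSketch, the method metadataLines, the "outlink: " line prefix, and the default value of false are all hypothetical, and no WARCWriter, CrawlURI, or ANVLRecord is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a flag-gated metadata body; illustration only.
final class OutlinkToggleSketch {
    // Assumption: outlinks are omitted unless explicitly enabled.
    private boolean writeMetadataOutlinks = false;

    boolean getWriteMetadataOutlinks() {
        return writeMetadataOutlinks;
    }

    void setWriteMetadataOutlinks(boolean writeMetadataOutlinks) {
        this.writeMetadataOutlinks = writeMetadataOutlinks;
    }

    // Returns the lines of a metadata body: the base fields always,
    // followed by one line per outlink only when the flag is set.
    List<String> metadataLines(List<String> baseFields, List<String> outlinks) {
        List<String> lines = new ArrayList<>(baseFields);
        if (writeMetadataOutlinks) {
            for (String link : outlinks) {
                lines.add("outlink: " + link);
            }
        }
        return lines;
    }
}
```

With the flag left off, a URI with many outlinks yields a metadata body containing only the base fields, which is the kind of size reduction such a toggle would typically be used for.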