Details
- Improvement
- Resolution: Unresolved
- Minor
- None
- BNF
Description
Currently heritrix3 writes WARC records conformant with the WARC 1.0 standard.
Browsing through the WARCWriterProcessor that we are currently extending in NetarchiveSuite (org.archive.modules.writer.WARCWriterProcessor), it seems to use the WARCWriter and WARCConstants classes from the archive-commons package, which hardwire the WARC version to 1.0.
Imports in the WARCWriterProcessor from the archive-commons package:
import org.archive.io.warc.WARCRecordInfo;
import org.archive.io.warc.WARCWriter;
import org.archive.io.warc.WARCWriterPool;
import org.archive.io.warc.WARCWriterPoolSettings;
import static org.archive.format.warc.WARCConstants
The easiest way to implement WARC 1.1 would be to copy the above classes into the folder:
https://github.com/netarchivesuite/netarchivesuite/tree/master/harvester/heritrix3/heritrix3-extensions/src/main/java
and then adapt the copied WARCConstants to be WARC 1.1 compliant instead.
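As a rough illustration of the kind of change involved, the sketch below shows a local constants class that emits the WARC 1.1 version line. The class name, field names, and helper method are hypothetical and chosen for illustration only; they are not confirmed to match the fields in org.archive.format.warc.WARCConstants, and a real adaptation would also need to cover any other differences between WARC 1.0 and 1.1.

```java
// Hypothetical sketch: a locally copied constants class adapted to WARC 1.1.
// All names here are assumptions for illustration, not the actual fields of
// org.archive.format.warc.WARCConstants.
final class Warc11ConstantsSketch {

    // Prefix that starts the version line of every WARC record.
    static final String WARC_MAGIC = "WARC/";

    // The only value that changes compared with a 1.0 constants class.
    static final String WARC_VERSION = "1.1";

    // Full version identifier written at the top of each record header.
    static final String WARC_ID = WARC_MAGIC + WARC_VERSION;

    // WARC header lines are terminated by CRLF.
    static final String CRLF = "\r\n";

    private Warc11ConstantsSketch() {
        // Constants holder; not meant to be instantiated.
    }

    // Builds the first line of a record header, e.g. "WARC/1.1" plus CRLF.
    static String versionLine() {
        return WARC_ID + CRLF;
    }
}
```

A copied writer class would then refer to these local constants instead of the ones imported from archive-commons, so that the records it writes start with the 1.1 version line.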
Issue Links
- was spawned by: NAS-2290 Generate revisit records (Closed)