Class WARCUtils

  • public class WARCUtils
    extends java.lang.Object
    Various utilities for working with WARC records. Some of the code is borrowed from Wayback; see org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.
    • Field Summary

      Modifier and Type Field Description
      protected static org.slf4j.Logger log
      Logger used by this class.
    • Constructor Summary

      Constructor Description
    • Method Summary

      Modifier and Type Method Description
      static createWARCWriter( newFile)
      Create a new WARCWriter that writes to the WARC file newFile.
      static java.lang.String getRecordType( record)
      Find out what type of WARC record this is.
      static void insertWARCFile( warcFile, writer)
      Insert the contents of a WARC file into another WARC file.
      static boolean isWarc(java.lang.String filename)
      Check if the given filename represents a WARC file.
      static byte[] readWARCRecord( record)
      Read the contents (payload) of a WARC record into a byte array.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • log

        protected static final org.slf4j.Logger log
        Logger used by this class.
    • Method Detail

      • createWARCWriter

        public static createWARCWriter( newFile)
        Create a new WARCWriter that writes to the WARC file newFile.
        Parameters:
        newFile - the WARC file that the WARCWriter writes to.
        Returns:
        a new WARCWriter that writes to the WARC file newFile.
      • insertWARCFile

        public static void insertWARCFile( warcFile, writer)
        Insert the contents of a WARC file into another WARC file.
        Parameters:
        warcFile - A WARC file to read.
        writer - A place to write the WARC records.
        Throws:
        IOFailure - if there are problems reading the file.
      • readWARCRecord

        public static byte[] readWARCRecord( record)
                                     throws IOFailure
        Read the contents (payload) of a WARC record into a byte array.
        Parameters:
        record - A WARC record to read from. After reading, the WARC record will no longer have its own data available for reading.
        Returns:
        A byte array containing the payload of the WARC record. Note that the size of the payload is calculated by subtracting the contentBegin value from the length of the record (both values are taken from the record header).
        Throws:
        IOFailure - If there is an error reading the data, or if the record is longer than Integer.MAX_VALUE (since a Java array cannot be larger than that).
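        The size calculation and the Integer.MAX_VALUE limit described above can be sketched as follows. This is a minimal illustration and not the actual implementation: the class WarcPayloadSketch and the method readPayload are hypothetical names, the stream is assumed to be positioned at the first payload byte, and a plain java.io.IOException stands in for IOFailure.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; not part of WARCUtils.
final class WarcPayloadSketch {

    /**
     * Reads (recordLength - contentBegin) bytes from the stream.
     * Assumption: 'in' is already positioned at the start of the payload.
     */
    static byte[] readPayload(InputStream in, long recordLength, long contentBegin)
            throws IOException {
        long payloadLength = recordLength - contentBegin;
        // A Java array is indexed by int, so larger payloads cannot be returned.
        if (payloadLength < 0 || payloadLength > Integer.MAX_VALUE) {
            throw new IOException("Payload length out of range: " + payloadLength);
        }
        byte[] payload = new byte[(int) payloadLength];
        int offset = 0;
        // read() may return fewer bytes than requested, so loop until full.
        while (offset < payload.length) {
            int n = in.read(payload, offset, payload.length - offset);
            if (n < 0) {
                throw new EOFException("Stream ended after " + offset + " bytes");
            }
            offset += n;
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40});
        // Record length 10 and contentBegin 6 give a 4-byte payload.
        byte[] payload = readPayload(in, 10, 6);
        System.out.println(payload.length);
    }
}
```

        The loop is needed because a single InputStream.read call is not guaranteed to fill the buffer.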
      • getRecordType

        public static java.lang.String getRecordType( record)
        Find out what type of WARC record this is.
        Parameters:
        record - a given WARCRecord
        Returns:
        the type of the WARCRecord as a String.
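        In the WARC format, the record type is carried in the mandatory WARC-Type named field of the record header (for example warcinfo, response, request or metadata). The sketch below illustrates that lookup on a plain map of header fields. It is only an assumption about how such a lookup could work: the class WarcTypeSketch, the method recordType and the use of a java.util.Map in place of a WARCRecord are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for the header of a WARCRecord.
final class WarcTypeSketch {

    /** Name of the header field that holds the record type in the WARC format. */
    static final String TYPE_FIELD = "WARC-Type";

    /** Returns the WARC-Type value, or throws if the field is absent. */
    static String recordType(Map<String, String> headerFields) {
        String type = headerFields.get(TYPE_FIELD);
        if (type == null) {
            throw new IllegalArgumentException("Header has no " + TYPE_FIELD + " field");
        }
        return type;
    }

    public static void main(String[] args) {
        Map<String, String> header = new HashMap<>();
        header.put("WARC-Type", "response");
        header.put("Content-Length", "1024");
        System.out.println(recordType(header));
    }
}
```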
      • isWarc

        public static boolean isWarc(java.lang.String filename)
        Check if the given filename represents a WARC file.
        Parameters:
        filename - A given filename
        Returns:
        true if the filename ends with .warc or .warc.gz
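        The suffix rule above can be sketched in a few lines. This is a minimal stand-in, not the library's code: the class WarcNameCheck and the method looksLikeWarc are hypothetical names, and the sketch assumes a case-sensitive match and treats null as "not a WARC file", neither of which is stated by the documentation.

```java
// Hypothetical sketch; not part of WARCUtils.
final class WarcNameCheck {

    /** True if the name ends with ".warc" or ".warc.gz" (case-sensitive by assumption). */
    static boolean looksLikeWarc(String filename) {
        if (filename == null) {
            return false; // assumption: null is simply not a WARC filename
        }
        return filename.endsWith(".warc") || filename.endsWith(".warc.gz");
    }

    public static void main(String[] args) {
        String[] names = {"job-1.warc", "job-1.warc.gz", "job-1.arc.gz"};
        for (String name : names) {
            System.out.println(name + " -> " + looksLikeWarc(name));
        }
    }
}
```

        Note that a name such as job-1.arc.gz is rejected, because only the .warc and .warc.gz suffixes are accepted.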