

warc package


This class collects most of the constants, the majority of which are primarily for internal use.

ReaderFactory and Readers

  • This factory can be used to create the various types of readers, with optional buffering. You can get either compressed or uncompressed readers. There are also methods that auto-detect whether a compressed reader is required.
  • Abstract reader class which is the base for all the readers. It also defines the options that can be set on a reader; currently these are only digest options.
  • A reader implementation for reading compressed records.
  • A reader implementation for reading uncompressed records.
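
As an illustration of how auto-detection of compression can work, the following is a minimal sketch that peeks at the first two bytes of a stream and checks for the gzip magic number (0x1f 0x8b). It is not the factory's actual implementation: the class CompressionSniffer and its methods are hypothetical names, and only standard JDK classes are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only; not part of the warc package API.
final class CompressionSniffer {

    // Wrap the raw stream so that up to two peeked bytes can be pushed back.
    static PushbackInputStream wrap(InputStream raw) {
        return new PushbackInputStream(raw, 2);
    }

    // Returns true if the stream starts with the gzip magic bytes 0x1f 0x8b.
    // The peeked bytes are pushed back, so the caller sees the stream unchanged.
    static boolean isGzipped(PushbackInputStream in) throws IOException {
        byte[] magic = new byte[2];
        int read = 0;
        while (read < 2) {
            int n = in.read(magic, read, 2 - read);
            if (n < 0) {
                break; // end of stream before two bytes were available
            }
            read += n;
        }
        if (read > 0) {
            in.unread(magic, 0, read);
        }
        return read == 2
                && (magic[0] & 0xff) == 0x1f
                && (magic[1] & 0xff) == 0x8b;
    }

    // Pick a decompressing or pass-through stream based on the sniffed header.
    static InputStream open(InputStream raw) throws IOException {
        PushbackInputStream in = wrap(raw);
        return isGzipped(in) ? new GZIPInputStream(in) : in;
    }
}
```

A caller that does not know in advance whether a file is compressed could pass its input stream to open and read records from the result in either case.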

This class contains the record parser, fields and validation.

Auxiliary classes

  • When a WARC header is read, each line is encapsulated in an instance of this class.
  • Parses and validates a WARC date.
  • Parses, validates and encapsulates a WARC digest header (algorithm, digest, encoding). The encoding is auto-detected and added later in the reading process.
  • Defines the different possible error types.
  • Defines a WARC validation error using a type, key and value.
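
To make the roles of these auxiliary classes more concrete, here is a minimal sketch of date parsing, digest parsing and error reporting. Everything in it is an assumption made for illustration: the class WarcFieldSketch, its nested types and its two error types are hypothetical and do not reflect the package's real class names or API. The date format (YYYY-MM-DDThh:mm:ssZ) and the "algorithm:digest" layout are assumed from the WARC 1.0 format, and the encoding is guessed at parse time from the digest length and alphabet, whereas the package adds it later in the reading process.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; names and error types are illustrative, not the package API.
final class WarcFieldSketch {
    enum ErrorType { EMPTY, INVALID }

    // A validation error described by a type, the header name (key) and its value.
    static final class ValidationError {
        final ErrorType type; final String key; final String value;
        ValidationError(ErrorType type, String key, String value) {
            this.type = type; this.key = key; this.value = value;
        }
    }

    static final class Digest {
        final String algorithm; final String digest; final String encoding;
        Digest(String algorithm, String digest, String encoding) {
            this.algorithm = algorithm; this.digest = digest; this.encoding = encoding;
        }
    }

    // Assumed WARC 1.0 date layout, always in UTC; STRICT rejects e.g. February 30.
    private static final DateTimeFormatter WARC_DATE = DateTimeFormatter
            .ofPattern("uuuu-MM-dd'T'HH:mm:ss'Z'").withResolverStyle(ResolverStyle.STRICT);

    final List<ValidationError> errors = new ArrayList<>();

    // Returns the parsed date, or null after recording a validation error.
    Instant parseDate(String key, String text) {
        if (text == null || text.isEmpty()) {
            errors.add(new ValidationError(ErrorType.EMPTY, key, text));
            return null;
        }
        try {
            return LocalDateTime.parse(text, WARC_DATE).toInstant(ZoneOffset.UTC);
        } catch (DateTimeParseException e) {
            errors.add(new ValidationError(ErrorType.INVALID, key, text));
            return null;
        }
    }

    // Splits "algorithm:digest"; returns null after recording a validation error.
    Digest parseDigest(String key, String text) {
        int colon = (text == null) ? -1 : text.indexOf(':');
        if (colon <= 0 || colon == text.length() - 1) {
            boolean empty = text == null || text.isEmpty();
            errors.add(new ValidationError(empty ? ErrorType.EMPTY : ErrorType.INVALID, key, text));
            return null;
        }
        String algorithm = text.substring(0, colon);
        String value = text.substring(colon + 1);
        return new Digest(algorithm, value, detectEncoding(algorithm, value));
    }

    // Guesses the encoding from the expected digest size; null if it cannot be determined.
    static String detectEncoding(String algorithm, String digest) {
        int n;
        switch (algorithm.toLowerCase(Locale.ROOT)) {
            case "md5": n = 16; break;
            case "sha1": n = 20; break;
            case "sha256": n = 32; break;
            default: return null;
        }
        int len = digest.length();
        if (len == 2 * n && digest.matches("[0-9A-Fa-f]+")) return "base16";
        if (len == (n + 4) / 5 * 8 && digest.matches("[A-Za-z2-7]+=*")) return "base32";
        if (len == (n + 2) / 3 * 4 && digest.matches("[A-Za-z0-9+/]+=*")) return "base64";
        return null;
    }
}
```

Collecting errors in a list rather than throwing lets a reader continue past a malformed header and report every problem found in a record.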


  • Abstract writer class which is the base for all the writers.
  • A writer implementation prototype.