jwat-common


  • Base64: uses an alphabet of 64 characters and is the most widely used.
  • Base32: uses an alphabet of 32 characters and appears to be the default encoding for WARC digests.
  • Base16: uses an alphabet of 16 characters and is more commonly called a hexadecimal string, or hex for short.
  • Base2: uses only 0s and 1s and represents each 8-bit value as a binary string.
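
To make the difference between the four alphabets concrete, the following is a minimal sketch that encodes the same bytes in each of them using only the JDK. It is not the jwat-common API: the class name EncodingSketch and its method names are hypothetical, and the Base32 alphabet is assumed to be the standard RFC 4648 one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration class; not part of jwat-common.
class EncodingSketch {
    // Assumption: the standard RFC 4648 Base32 alphabet.
    private static final String BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";
    private static final String HEX_ALPHABET = "0123456789abcdef";

    static String toBase64(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String toBase32(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0; // bit accumulator; only the low 'bits' bits are meaningful
        int bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                sb.append(BASE32_ALPHABET.charAt((buffer >> (bits - 5)) & 0x1F));
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Left-align the remaining bits in a final 5-bit group.
            sb.append(BASE32_ALPHABET.charAt((buffer << (5 - bits)) & 0x1F));
        }
        while (sb.length() % 8 != 0) {
            sb.append('='); // pad to a multiple of 8 characters
        }
        return sb.toString();
    }

    static String toBase16(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(HEX_ALPHABET.charAt((b >> 4) & 0x0F));
            sb.append(HEX_ALPHABET.charAt(b & 0x0F));
        }
        return sb.toString();
    }

    static String toBase2(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 8);
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                sb.append(((b >> i) & 1) == 0 ? '0' : '1');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "foobar".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toBase64(data));
        System.out.println(toBase32(data));
        System.out.println(toBase16(data));
        System.out.println(toBase2(data));
    }
}
```

The smaller the alphabet, the longer the output: each input byte takes two characters in Base16 and eight in Base2, while Base32 and Base64 pack 5 and 6 bits into each character.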

InputStream / StringReader

  • An extended InputStream modified to keep track of the number of consumed bytes.
  • An extended PushbackInputStream which also keeps track of the actual number of consumed bytes.
  • An extended DigestInputStream which overrides the skip method to perform reads.
  • An InputStream wrapper with a fixed amount of data available. The data must be consumed by reading or skipping it; any data still available when the stream is closed is skipped at that point.
  • An InputStream wrapper that can consume only a maximum amount of data; the consumed data is recorded internally and is available as a byte array afterwards.
  • A StringReader that uses a String as input and also keeps track of the number of consumed chars.
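
The byte-counting idea behind several of these wrappers can be sketched with a plain FilterInputStream subclass. This is an illustration under assumptions, not the jwat-common implementation: the class name CountingInputStreamSketch and the getConsumed method are hypothetical, and mark/reset is simply reported as unsupported to keep the count unambiguous.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of jwat-common.
class CountingInputStreamSketch extends FilterInputStream {
    private long consumed; // bytes read or skipped so far

    CountingInputStreamSketch(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) {
            consumed++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            consumed += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(n);
        consumed += skipped;
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // assumption: no mark/reset, so the counter never rewinds
    }

    long getConsumed() {
        return consumed;
    }

    // Reads 1 byte, then 4 bytes, then skips 2, and returns the counter.
    static long demo() {
        byte[] data = new byte[10];
        try (CountingInputStreamSketch in =
                 new CountingInputStreamSketch(new ByteArrayInputStream(data))) {
            in.read();
            in.read(new byte[4], 0, 4);
            in.skip(2);
            return in.getConsumed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Only the single-byte and array read methods need overriding, because FilterInputStream routes read(byte[]) through read(byte[], int, int).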

Common header and payload classes used by both ARC and WARC

  • Parses and validates a content-type header with optional parameters.
  • Parses and validates an IPv4 or IPv6 address using regular expressions.
  • Identifies and encapsulates valid HTTP header blocks. The payload is accessible through an InputStream with optional digest value computation.
  • Encapsulates an ARC/WARC payload and optionally computes a digest value.
  • An interface that must be implemented to receive notice of a payload's closure.
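
The combination of an optional payload digest and a closure callback might look roughly like the sketch below. It is built on the JDK's DigestInputStream and is not the jwat-common API: PayloadSketch, PayloadClosedSketchListener and their methods are hypothetical names. The sketch assumes that skipped and unread bytes should still count towards the digest, so skip is implemented with reads and close drains the remaining data first.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical callback; invoked once when the payload stream is closed.
interface PayloadClosedSketchListener {
    void payloadClosed(byte[] digestOrNull);
}

// Hypothetical illustration class; not part of jwat-common.
class PayloadSketch extends FilterInputStream {
    private final MessageDigest digest; // null when no digest was requested
    private final PayloadClosedSketchListener listener;
    private boolean closed;

    private PayloadSketch(InputStream in, MessageDigest digest,
                          PayloadClosedSketchListener listener) {
        super(digest == null ? in : new DigestInputStream(in, digest));
        this.digest = digest;
        this.listener = listener;
    }

    // Pass a null algorithm to disable digest computation.
    static PayloadSketch open(InputStream in, String algorithm,
                              PayloadClosedSketchListener listener)
            throws NoSuchAlgorithmException {
        MessageDigest md = (algorithm == null) ? null : MessageDigest.getInstance(algorithm);
        return new PayloadSketch(in, md, listener);
    }

    @Override
    public long skip(long n) throws IOException {
        // Skip by reading, so that skipped bytes still update the digest.
        byte[] scratch = new byte[1024];
        long total = 0;
        while (total < n) {
            int r = read(scratch, 0, (int) Math.min(scratch.length, n - total));
            if (r == -1) {
                break;
            }
            total += r;
        }
        return total;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        byte[] scratch = new byte[1024];
        while (read(scratch, 0, scratch.length) != -1) {
            // drain unread data so the digest covers the whole payload
        }
        super.close();
        listener.payloadClosed(digest == null ? null : digest.digest());
    }

    // Reads one byte, skips one, closes, and returns the SHA-1 digest as hex.
    static String demo() {
        final StringBuilder hex = new StringBuilder();
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        try {
            PayloadSketch p = open(new ByteArrayInputStream(data), "SHA-1", d -> {
                for (byte b : d) {
                    hex.append(String.format("%02x", b));
                }
            });
            p.read();
            p.skip(1);
            p.close();
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        return hex.toString();
    }
}
```

Because the listener fires only after the stream has been drained and closed, the digest it receives covers the full payload regardless of how much the caller actually read.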

RandomAccessFile InputStream wrappers

  • Used to access a File as an InputStream. All the RandomAccessFile methods can be used, such as seek to reposition the stream.
  • Used to access a File as an OutputStream.
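
As a rough illustration of the input side, the sketch below adapts a RandomAccessFile to the InputStream interface, so that seeking on the file repositions the stream. The class name RandomAccessFileInputStreamSketch is hypothetical and the code is not taken from jwat-common; it also assumes the caller keeps ownership of the RandomAccessFile, so the wrapper does not close it.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical illustration class; not part of jwat-common.
class RandomAccessFileInputStreamSketch extends InputStream {
    private final RandomAccessFile raf; // owned by the caller, not closed here

    RandomAccessFileInputStreamSketch(RandomAccessFile raf) {
        this.raf = raf;
    }

    @Override
    public int read() throws IOException {
        return raf.read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return raf.read(buf, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        // Skip by moving the file pointer, never past the end of the file.
        long remaining = raf.length() - raf.getFilePointer();
        long toSkip = Math.max(0, Math.min(n, remaining));
        raf.seek(raf.getFilePointer() + toSkip);
        return toSkip;
    }

    @Override
    public int available() throws IOException {
        long remaining = raf.length() - raf.getFilePointer();
        return (int) Math.min(Integer.MAX_VALUE, Math.max(0, remaining));
    }

    // Reads three bytes, seeks to offset 7, then reads to the end of the file.
    static String demo() {
        try {
            File f = File.createTempFile("raf-sketch", ".bin");
            f.deleteOnExit();
            Files.write(f.toPath(), "0123456789".getBytes(StandardCharsets.US_ASCII));
            StringBuilder sb = new StringBuilder();
            try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
                InputStream in = new RandomAccessFileInputStreamSketch(raf);
                for (int i = 0; i < 3; i++) {
                    sb.append((char) in.read());
                }
                raf.seek(7); // repositions the stream as well
                int c;
                while ((c = in.read()) != -1) {
                    sb.append((char) c);
                }
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An OutputStream counterpart would follow the same pattern, delegating write calls to the RandomAccessFile's write methods.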