The pages below describe the individual packages and also the process by which ARC and WARC files are read and validated.
If you cannot find the information you need, try the Javadocs or look at the source code. As a last resort, you are also welcome to email me.
This toolkit includes the following packages:
- jwat-common: General-purpose classes, including specialized streams, binary-to-string encoding, and HTTP response/payload code shared by ARC and WARC.
- jwat-gzip: GZip reader/validator/writer, including input/output streams for data.
- jwat-arc: ARC-specific reader, validator, and writer classes.
- jwat-warc: WARC-specific reader, validator, and writer classes.
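ARC and WARC files are often stored gzip-compressed, which is why a GZip layer sits next to the ARC and WARC packages. The sketch below is not JWAT code and does not use the jwat-gzip API; it relies only on the JDK's java.util.zip classes to show the general idea of checking for the gzip magic bytes and round-tripping data through gzip streams. The class name GzipSketch and its method names are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of JWAT: JDK-only gzip round trip.
public class GzipSketch {

    // Compress a byte array into a single gzip member.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        }
        return bytes.toByteArray();
    }

    // Decompress gzip data back into the original bytes.
    static byte[] decompress(byte[] gzipped) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gz.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return out.toByteArray();
    }

    // A gzip stream starts with the two magic bytes 0x1f 0x8b.
    static boolean looksLikeGzip(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "WARC/1.0\r\n".getBytes(StandardCharsets.UTF_8);
        byte[] gzipped = compress(raw);
        System.out.println("compressed looks like gzip: " + looksLikeGzip(gzipped));
        System.out.println("round trip matches: " + Arrays.equals(raw, decompress(gzipped)));
    }
}
```

A dedicated reader/validator would go further than this, for example by inspecting each gzip header and reporting problems rather than just throwing; consult the jwat-gzip page and the Javadocs for what the toolkit actually provides.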
ARC reader process
Describes the steps taken to read and validate an ARC record.
WARC reader process
Describes the steps taken to read and validate a WARC record.
Known side effects and pitfalls of the current reading/validating strategy