The following sub-pages describe in some detail how the readers are implemented.
For more fine-grained information about the API, consult the Javadocs.
This toolkit includes classes to read and validate Arc, GZip and Warc files. Arc and Warc files which are GZip compressed are also supported.
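As an illustration of what transparent support for GZip-compressed files can involve, the sketch below checks the first two bytes of a stream for the GZip magic number (0x1f 0x8b) and only then wraps the stream in a decompressor. It uses only the Java standard library. The class `GzipSniffer` and its methods are hypothetical names chosen for this example and are not part of the toolkit's API; the toolkit's own readers may detect compression differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not a class from the toolkit.
public class GzipSniffer {

    // Every GZip member starts with these two bytes (RFC 1952).
    private static final int MAGIC_1 = 0x1f;
    private static final int MAGIC_2 = 0x8b;

    /** Returns a stream of uncompressed bytes, unwrapping GZip if detected. */
    public static InputStream open(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] head = new byte[2];
        int n = 0;
        // Read up to two bytes; a single read() call may return fewer.
        while (n < 2) {
            int r = in.read(head, n, 2 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        // Push the peeked bytes back so the caller sees the whole stream.
        if (n > 0) {
            in.unread(head, 0, n);
        }
        boolean gzipped = n == 2
                && (head[0] & 0xff) == MAGIC_1
                && (head[1] & 0xff) == MAGIC_2;
        return gzipped ? new GZIPInputStream(in) : in;
    }

    /** Compresses a byte array with GZip (used here to build test input). */
    public static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(data);
        }
        return buf.toByteArray();
    }

    /** Reads a stream to the end and returns its bytes. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int r;
        while ((r = in.read(chunk)) != -1) {
            buf.write(chunk, 0, r);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "WARC/1.0".getBytes(StandardCharsets.US_ASCII);
        byte[] packed = gzip(plain);
        // Both the compressed and the plain input yield the same text.
        for (byte[] input : new byte[][] { packed, plain }) {
            byte[] out = readAll(open(new ByteArrayInputStream(input)));
            System.out.println(new String(out, StandardCharsets.US_ASCII));
        }
    }
}
```

Peeking with a `PushbackInputStream` keeps the check non-destructive, so the same entry point can serve both compressed and uncompressed input without the caller needing to know which one it has.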
The toolkit has the following package layout:
- jwat-common: General-purpose classes, including specialized streams, binary-to-string encoding and common Arc/Warc HTTP response and payload code.
- jwat-gzip: GZip input-stream/entry reader/validator.
- jwat-arc: Contains Arc reader/validator specific classes.
- jwat-warc: Contains Warc reader/validator specific classes.
It is in Java, it is beautiful, and it is not yet complete.
For historical reasons, the old repositories from before the merge can be found here.