Description of how to use the toolkit and examples

Since this project is mainly aimed at building a general-purpose Web Archiving Toolkit, these packages do not contain any applications; instead, they are intended to be used as building blocks.

Dependency

<dependency>
    <groupId>org.jwat</groupId>
    <artifactId>jwat</artifactId>
    <version>0.8.0-SNAPSHOT</version>
    <type>pom</type>
</dependency>

Usage: 

The WARC reader can be used either to read all the records in a file sequentially or to read selected records in random order.

Both scenarios are supported by the various factory and reader methods.
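The difference between the two access patterns can be illustrated without the toolkit. The following is a minimal plain-Java sketch, not the JWAT API: the class `MiniWarcAccess` and its methods `writeRecord` and `parseAt` are hypothetical, and the record format is deliberately simplified (only `WARC-Type` and `Content-Length` headers; real WARC records also carry mandatory fields such as `WARC-Record-ID` and `WARC-Date`). Sequential reading starts at offset 0 and uses each record's `Content-Length` to find where the next record begins; random access jumps straight to a known record offset, which in practice would typically come from an index kept alongside the file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical record format and parser for illustration; NOT the JWAT API.
public class MiniWarcAccess {

    /** Result of parsing one record: its type, its block, and where the next record starts. */
    static final class Parsed {
        final String type;
        final String block;
        final int next;

        Parsed(String type, String block, int next) {
            this.type = type;
            this.block = block;
            this.next = next;
        }
    }

    /** Writes a minimal record: version line, two headers, blank line, block, then two CRLFs. */
    static byte[] writeRecord(String type, String block) {
        byte[] payload = block.getBytes(StandardCharsets.UTF_8);
        String head = "WARC/1.0\r\n"
                + "WARC-Type: " + type + "\r\n"
                + "Content-Length: " + payload.length + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] tail = "\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(payload, 0, payload.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }

    /** Returns the index of the next CRLF at or after {@code from}. */
    static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return i;
            }
        }
        throw new IllegalArgumentException("no CRLF after offset " + from);
    }

    /** Parses the record starting at {@code pos}; header names are matched case-sensitively for brevity. */
    static Parsed parseAt(byte[] data, int pos) {
        String type = null;
        int contentLength = -1;
        int p = pos;
        while (true) {
            int eol = indexOfCrlf(data, p);
            String line = new String(data, p, eol - p, StandardCharsets.US_ASCII);
            p = eol + 2;
            if (line.isEmpty()) {
                break; // blank line ends the header block
            }
            if (line.startsWith("WARC-Type:")) {
                type = line.substring("WARC-Type:".length()).trim();
            } else if (line.startsWith("Content-Length:")) {
                contentLength = Integer.parseInt(line.substring("Content-Length:".length()).trim());
            }
        }
        String block = new String(data, p, contentLength, StandardCharsets.UTF_8);
        return new Parsed(type, block, p + contentLength + 4); // skip the trailing CRLF CRLF
    }

    public static void main(String[] args) {
        String[][] input = {{"warcinfo", "software: demo"}, {"resource", "hello"}, {"resource", "world"}};
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        for (String[] r : input) {
            offsets.add(file.size()); // offset where this record starts
            byte[] rec = writeRecord(r[0], r[1]);
            file.write(rec, 0, rec.length);
        }
        byte[] data = file.toByteArray();

        // Sequential: start at offset 0 and let each record tell us where the next begins.
        for (int pos = 0; pos < data.length; ) {
            Parsed rec = parseAt(data, pos);
            System.out.println(rec.type + ": " + rec.block);
            pos = rec.next;
        }

        // Random access: jump straight to a known offset, here the third record.
        System.out.println(parseAt(data, offsets.get(2)).block);
    }
}
```

Running `main` prints the three records in file order and then only the third record's block, `world`, without parsing the records before it.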

Compression:

Besides uncompressed WARC files, GZip-compressed files are also supported.

GZip compression is only supported for WARC files in which each record is compressed individually and the compressed records are concatenated into one file. It is not supported for files in which the whole WARC file, with all its records, is GZip'ed as a single stream. The latter case is unsupported mainly because it makes random access to individual records highly inefficient.
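Why per-record compression matters can be shown with `java.util.zip` alone. In this sketch (the class `PerRecordGzip` and its methods are hypothetical and not part of JWAT), each record is written as its own complete GZip member and the members are concatenated. Sequential reading works because `GZIPInputStream` continues across concatenated members, while random access only needs the compressed offsets where a member starts and ends: that slice can be decompressed on its own. If the whole file were one GZip stream there would be no such boundary, and reaching the third record would mean inflating everything before it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch using only java.util.zip; it is NOT the JWAT API.
public class PerRecordGzip {

    /** Compresses one record into its own complete GZip member. */
    static byte[] gzipMember(String record) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(record.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Decompresses every GZip member found in data[start, start + length). */
    static String gunzip(byte[] data, int start, int length) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data, start, length))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] records = {"record-1\r\n", "record-2\r\n", "record-3\r\n"};
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        for (String r : records) {
            offsets.add(file.size()); // compressed offset where this record's member starts
            byte[] member = gzipMember(r);
            file.write(member, 0, member.length);
        }
        offsets.add(file.size()); // end sentinel: offset just past the last member
        byte[] data = file.toByteArray();

        // Sequential read: GZIPInputStream continues across concatenated members.
        System.out.print(gunzip(data, 0, data.length));

        // Random access: decompress only the third record's member.
        int start = offsets.get(2);
        System.out.print(gunzip(data, start, offsets.get(3) - start));
    }
}
```

Here the member offsets are collected while writing; a real reader would more likely obtain them from an index built over the file.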
