
Instructions on how to run JWAT-Tools.

Options

...

Installing and running

To install JWAT-Tools, simply unpack the archive.

To run JWAT-Tools use the Windows or Linux scripts included in the package.

The scripts can be called from any location.

Windows scripts:

jwattools.cmd

jwattools_debug.cmd

jwattools_debug_suspended.cmd

Linux scripts:

jwattools.sh

jwattools_debug.sh

jwattools_debug_suspended.sh

Options

The command line interface has changed yet again for v0.5.6.

The main help page lists only the commands and global options.

Use jwattools help <command> to show a command's usage.

Command line options (v0.5.6):
C:\Java\workspace\jwat-tools>target\jwat-tools-0.5.6-SNAPSHOT\jwattools.cmd
JWATTools v0.5.6
usage: JWATTools <command> [<args>]
Commands:
   arc2warc    convert ARC to WARC
   cdx         create a CDX index (unsorted)
   compress    compress
   decompress  decompress
   extract     extract ARC/WARC record(s)
   interval    interval extract
   pathindex   create a heritrix path index (unsorted)
   test        test validity of ARC/WARC/GZip file(s)
   unpack      unpack multifile GZip

See 'jwattools help <command>' for more information on a specific command.

C:\Java\workspace\jwat-tools>

Command line interface for v0.5.5.

Command line options (v0.5.5):
C:\Java\workspace\jwat-tools>target\jwat-tools-0.5.5-SNAPSHOT\jwattools.cmd
JWATTools v0.5.5
usage: JWATTools command [file ...]
Commands:
   arc2warc     convert ARC to WARC
   cdx          create a CDX index (unsorted)
   compress     compress
   decompress   decompress
   extract      extract ARC/WARC record(s)
   interval     interval extract
   pathindex    create a heritrix path index (unsorted)
   test         test validity of ARC/WARC/GZip file(s)
   unpack       unpack multifile GZip
Options:
   -r      recursive (currently has no effect)
   -w<x>   set the amount of worker thread(s) (defaults to 1)
Test options:
   -e      show errors
   -l      relaxed URL URI validation
   -x      to validate text/xml payload (eg. mets)
Compress options:
   -1, --fast   compress faster
   -9, --slow   compress better

C:\Java\workspace\jwat-tools>
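The -w<x> option sets the number of worker threads, defaulting to 1. As a rough illustration of what such an option typically controls, the Java sketch below spreads per-file work over a fixed thread pool. The class WorkerPool, its methods and the placeholder task are hypothetical and not taken from JWAT-Tools; only the -w<x> syntax and the default of 1 come from the listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch, not JWAT-Tools code: run one task per file on a
// fixed pool of worker threads, mirroring the "-w<x>" option (default 1).
final class WorkerPool {

    // "-w4" -> 4; a missing or malformed option falls back to 1 worker.
    static int parseWorkers(String opt) {
        if (opt != null && opt.startsWith("-w")) {
            try {
                return Math.max(1, Integer.parseInt(opt.substring(2)));
            } catch (NumberFormatException e) {
                return 1;
            }
        }
        return 1;
    }

    // Submits every file to the pool and collects the results in input order.
    static <R> List<R> runAll(List<String> files, int workers,
                              Function<String, R> task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String file : files) {
                futures.add(pool.submit(() -> task.apply(file)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                // Throws ExecutionException if the task for this file failed.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> files = new ArrayList<>();
        int workers = 1;
        for (String arg : args) {
            if (arg.startsWith("-w")) {
                workers = parseWorkers(arg);
            } else {
                files.add(arg);
            }
        }
        // Placeholder task: report the length of each file name.
        System.out.println(runAll(files, workers, String::length));
    }
}
```

Collecting futures in submission order keeps the output deterministic even though files finish in any order.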

You can supply one or more files. Each file argument can contain * and/or ? wildcards, but only in the file name part of the path, not in directory names. Several wildcards can be combined in one argument.
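To illustrate the "wildcards only in the file name part" rule, here is a minimal Java sketch that splits an argument at its last path separator and matches only the final component with a glob. The class WildcardExpander is hypothetical and is not how JWAT-Tools is implemented; note that Java's glob syntax also treats [ ] { } specially, which plain * and ? matching would not.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not JWAT-Tools code: expand '*' and '?' wildcards
// that appear only in the file name part of a path argument.
final class WildcardExpander {

    static List<Path> expand(String arg) throws IOException {
        // Split on the last separator by hand: on Windows, Paths.get()
        // rejects '*' and '?', so the pattern must never reach it.
        int sep = Math.max(arg.lastIndexOf('/'), arg.lastIndexOf(File.separatorChar));
        String dirPart = sep >= 0 ? arg.substring(0, sep + 1) : ".";
        String pattern = arg.substring(sep + 1);
        List<Path> result = new ArrayList<>();
        if (pattern.indexOf('*') < 0 && pattern.indexOf('?') < 0) {
            result.add(Paths.get(arg)); // literal file name, no expansion
            return result;
        }
        // Glob syntax: '*' matches any run of characters, '?' exactly one.
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(Paths.get(dirPart), pattern)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) { // skip directories
                    result.add(p);
                }
            }
        }
        result.sort(null); // directory order is unspecified; sort by path
        return result;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            for (Path p : expand(arg)) {
                System.out.println(p);
            }
        }
    }
}
```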

-t (test)

Reads and validates all the files supplied. Files that are not recognized as GZip, ARC or WARC are skipped, as are files that do not match the wildcards.

Use -e to show more than a summary of the errors.
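One plausible way to decide whether a file is "recognized" is to look at its first bytes. The Java sketch below assumes that approach; it is not JWAT-Tools' actual detection logic. It relies on the GZip magic bytes 0x1f 0x8b (RFC 1952), the "WARC/" version line that starts a WARC record, and the "filedesc://" block that starts an ARC file. A compressed ARC or WARC file is reported only as GZIP here, whereas a real validator would look inside it. FormatSniffer is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not JWAT-Tools code: classify a file by its leading
// bytes so that unrecognized files can be skipped.
final class FormatSniffer {

    enum Format { GZIP, ARC, WARC, UNKNOWN }

    static Format sniff(byte[] head, int len) {
        // GZip members start with the magic bytes 0x1f 0x8b.
        if (len >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return Format.GZIP;
        }
        String s = new String(head, 0, len, StandardCharsets.US_ASCII);
        if (s.startsWith("WARC/")) {       // e.g. "WARC/1.0"
            return Format.WARC;
        }
        if (s.startsWith("filedesc://")) { // ARC version block
            return Format.ARC;
        }
        return Format.UNKNOWN;
    }

    static Format sniff(Path file) throws IOException {
        byte[] head = new byte[16];
        int len = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // read() may return fewer bytes than requested, so loop.
            while (len < head.length && (n = in.read(head, len, head.length - len)) > 0) {
                len += n;
            }
        }
        return sniff(head, len);
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Format f = sniff(Paths.get(arg));
            System.out.println(f == Format.UNKNOWN ? "skipped: " + arg : f + ": " + arg);
        }
    }
}
```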

-d (decompress)

Decompresses one or more (multi-part) GZip files and writes the decompressed data to a new file, one for each input file.
This is useful for decompressing ARC and/or WARC files.
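A record-compressed .warc.gz is a sequence of GZip members. In current JDKs, java.util.zip.GZIPInputStream reads concatenated members as one continuous stream, so a minimal decompressor can be quite short. This is a sketch only; the class Decompress and the output naming (strip ".gz", otherwise append ".out") are assumptions, not JWAT-Tools behaviour.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch, not JWAT-Tools code: decompress a multi-member GZip
// file into a new file next to the input.
final class Decompress {

    // "x.warc.gz" -> "x.warc"; other names get an assumed ".out" suffix.
    static Path outputName(Path in) {
        String name = in.getFileName().toString();
        String out = name.endsWith(".gz") ? name.substring(0, name.length() - 3) : name + ".out";
        return in.resolveSibling(out);
    }

    static Path decompress(Path in) throws IOException {
        Path out = outputName(in);
        // GZIPInputStream continues into the next member when one ends, so
        // all members are written to the output back to back.
        try (InputStream gz = new GZIPInputStream(Files.newInputStream(in), 64 * 1024);
             OutputStream os = Files.newOutputStream(out)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = gz.read(buf)) != -1) {
                os.write(buf, 0, n);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + " -> " + decompress(Paths.get(arg)));
        }
    }
}
```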

-r (recursive)

Currently ignored, because all operations are already recursive.

-1..-9 (compress)

Compresses ordinary files and/or WARC files. The level ranges from -1 (compress faster) to -9 (compress better).
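The -1..-9 levels map naturally onto the Deflater compression levels. GZIPOutputStream has no level parameter, but its Deflater is the protected field def, so a subclass can set it. The sketch below compresses a whole file as a single GZip member; it does not attempt per-record WARC compression, and Compress and LevelGZIPOutputStream are hypothetical names, not JWAT-Tools classes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not JWAT-Tools code: gzip a file at a chosen level,
// where 1 favours speed and 9 favours a smaller output.
final class Compress {

    // Sets the level on the inherited Deflater before any data is written.
    static final class LevelGZIPOutputStream extends GZIPOutputStream {
        LevelGZIPOutputStream(OutputStream out, int level) throws IOException {
            super(out);
            def.setLevel(level);
        }
    }

    static Path compress(Path in, int level) throws IOException {
        if (level < 1 || level > 9) {
            throw new IllegalArgumentException("level must be 1..9: " + level);
        }
        Path out = in.resolveSibling(in.getFileName() + ".gz");
        try (InputStream is = Files.newInputStream(in);
             OutputStream gz = new LevelGZIPOutputStream(Files.newOutputStream(out), level)) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = is.read(buf)) != -1) {
                gz.write(buf, 0, n);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): Compress <level> <file> [<file> ...]
        int level = Integer.parseInt(args[0]);
        for (int i = 1; i < args.length; i++) {
            System.out.println(compress(Paths.get(args[i]), level));
        }
    }
}
```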

-i (interval extract)

Extracts an interval from a given file. The interval can be expressed as offset,offset2 or offset,+length. Offsets and lengths can be given in hex by prepending "$" or "0x".
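The interval syntax can be made concrete with a small parser. This Java sketch assumes a comma separator with no other delimiters, and treats offset2 as an exclusive end; the page does not say whether offset2 is inclusive. IntervalExtract is a hypothetical class, not part of JWAT-Tools.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch, not JWAT-Tools code: parse "offset,offset2" or
// "offset,+length" (decimal, or hex with a "$" or "0x" prefix) and copy
// that byte range out of a file. offset2 is assumed to be exclusive.
final class IntervalExtract {

    final long start;
    final long end;

    IntervalExtract(long start, long end) {
        this.start = start;
        this.end = end;
    }

    static long parseNumber(String s) {
        s = s.trim();
        if (s.startsWith("$")) {
            return Long.parseLong(s.substring(1), 16);
        }
        if (s.startsWith("0x") || s.startsWith("0X")) {
            return Long.parseLong(s.substring(2), 16);
        }
        return Long.parseLong(s);
    }

    static IntervalExtract parse(String arg) {
        int comma = arg.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("expected offset,offset2 or offset,+length: " + arg);
        }
        long start = parseNumber(arg.substring(0, comma));
        String rest = arg.substring(comma + 1).trim();
        long end = rest.startsWith("+")
                ? start + parseNumber(rest.substring(1)) // offset,+length
                : parseNumber(rest);                     // offset,offset2
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid interval: " + arg);
        }
        return new IntervalExtract(start, end);
    }

    // Copies bytes [start, end) of 'in' to 'out', stopping early at end of file.
    void extract(Path in, Path out) throws IOException {
        try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(out, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = start;
            long limit = Math.min(end, src.size());
            while (pos < limit) {
                long n = src.transferTo(pos, limit - pos, dst);
                if (n <= 0) {
                    break;
                }
                pos += n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): IntervalExtract <interval> <input> <output>
        parse(args[0]).extract(Paths.get(args[1]), Paths.get(args[2]));
    }
}
```

For example, "$10,+0x20" parses to the range 16 to 48, the same as "16,48".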

-u (unpack)

Unpacks a multi-file GZip and saves each entry as an individual file.
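Unlike decompress, unpack has to keep the GZip members apart, so GZIPInputStream (which joins members) is not enough. The Java sketch below walks the member headers defined in RFC 1952 and inflates each member separately with a raw Inflater. It holds the whole file in memory and does not check the CRC32 trailer. The class Unpack and the output naming <input>.<index> are hypothetical, not JWAT-Tools behaviour.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical sketch, not JWAT-Tools code: split a multi-member GZip file
// and return each member's decompressed payload separately.
final class Unpack {

    private static final int FHCRC = 0x02, FEXTRA = 0x04, FNAME = 0x08, FCOMMENT = 0x10;

    static List<byte[]> members(byte[] data) throws IOException, DataFormatException {
        List<byte[]> result = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            if (data.length - pos < 18 || (data[pos] & 0xff) != 0x1f
                    || (data[pos + 1] & 0xff) != 0x8b || data[pos + 2] != 8) {
                throw new IOException("not a deflate GZip member at offset " + pos);
            }
            int flags = data[pos + 3] & 0xff;
            int p = pos + 10; // fixed header: magic, CM, FLG, MTIME, XFL, OS
            if ((flags & FEXTRA) != 0) {
                int xlen = (data[p] & 0xff) | (data[p + 1] & 0xff) << 8;
                p += 2 + xlen;
            }
            if ((flags & FNAME) != 0) {
                while (data[p++] != 0) { } // skip zero-terminated file name
            }
            if ((flags & FCOMMENT) != 0) {
                while (data[p++] != 0) { } // skip zero-terminated comment
            }
            if ((flags & FHCRC) != 0) {
                p += 2; // skip header CRC16
            }
            // Raw deflate data follows; 'true' means no zlib wrapper.
            Inflater inf = new Inflater(true);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try {
                inf.setInput(data, p, data.length - p);
                byte[] buf = new byte[8192];
                while (!inf.finished()) {
                    int n = inf.inflate(buf);
                    if (n == 0 && (inf.needsInput() || inf.needsDictionary())) {
                        throw new IOException("truncated GZip member at offset " + pos);
                    }
                    out.write(buf, 0, n);
                }
                // Unconsumed input starts at the 8-byte trailer (CRC32, ISIZE).
                int consumed = data.length - p - inf.getRemaining();
                pos = p + consumed + 8;
            } finally {
                inf.end();
            }
            result.add(out.toByteArray());
        }
        return result;
    }

    public static void main(String[] args) throws IOException, DataFormatException {
        Path in = Paths.get(args[0]);
        List<byte[]> parts = members(Files.readAllBytes(in));
        for (int i = 0; i < parts.size(); i++) {
            // Hypothetical naming scheme: <input>.<index>
            Files.write(in.resolveSibling(in.getFileName() + "." + i), parts.get(i));
        }
    }
}
```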

-c (convert)

Convert ARC files to WARC.