Added callback support to read and modify the ARC/WARC headers before processing the payload.
Improve support for empty files and errors at the end of (W)ARC files in the ArchiveParser/ArchiveParserCallback.
CR-JWAS-33: Follow-up on review.
JWAT-77: Unit tests and bug fixes for newly implemented ArcFileWriter/WarcFileWriter and related classes.

JWAT-76: Fix for archiveLengthStr/contentLengthStr being set while archiveLength/contentLength are null when using payload length validation.

Removed a lot of tabs and replaced them with spaces. (Company policy)

Minor code cleanup.

Merge in files from jwat-tools with history.
JWAT-78: PayloadManager in JWAT-Tools seems to have a bug related to the closing of the RandomAccessFile and a non-null tmpfile object.

Added unit tests for the most common classes.

Tweaked some constant definitions.

Make buffer sizes configurable in PayloadManager and ByteArrayIOStream.

Added new ManagedPayloadManager to support this.

Minor tweaks. Changed version to 0.6.0-SNAPSHOT. Deployed to Maven Central from now on.
Changed manageRecord back from private to public since it was used after all.
Adding a containermd task to create the containerMD representation of an ARC or a WARC file.

Use the 1.0.2 version of the jwat core libraries.

Correct the usage of identified payload by closing the handle in the case where a temporary file has to be created (large files) and deleting the file at the end.

Changed JWAT dependencies from 1.0.1-SNAPSHOT to 1.0.1.
Work in progress:

Git-style help command.

Improved multithreading for some tasks.

Support for Linux file identification.

Rewrote arc2warc to support multiple filedesc records and payload repair.

ManagedPayload added for reloading of payload in different validators.

Fully implemented two-step XML validation.

Improved FileIdent based on file name and stream peeking.

Corrected a spelling mistake in JWAT API.
Improved various tasks.
Changed to 0.5.5-SNAPSHOT. Changed the Command Line Interface.
Cleanup and refactoring.
Changed some classes to allow for different recordheader/payloadheader maximum sizes.

Temporarily set to 1024k.

Added reusable ArchiveParser and changed some code to work with the GUI. The parallelized CDX'er is almost complete.
Refactoring for use with JWAT-Tools GUI.

Forgot to add env.cmd and some other files.
