nclarkekb in JWAT

Removed a few unused import statements.
Added callback support to read and modify the ARC/WARC headers before processing the payload.
Improved support for empty files and errors at the end of (W)ARC files in the ArchiveParser/ArchiveParserCallback.
JWAT-89: Removed encodedwords use in HeaderLineParser. Both need to be refactored, and it is not really useful.
CR-JWAS-33: Follow-up on review.
JWAT-88: Unit test improved.
JWAT-88: Changed so that the payload digest is not checked for WARC revisit and continuation records.
JWAT-87: Improved detection of garbage at the end of (W)ARC files, with unit tests for this. Also added unit tests for empty (W)ARC files.
Removed some leftover dependencies on ArcHeader in the parser.
Made some changes so ArcHeader, WarcHeader and HttpHeader can be re-parsed without using the complete Arc/Warc Readers.
Made some methods public instead of protected. Various cleanup.
ArcReader and WarcReader now implement Iterable<..> interface.
POM cleanup.
Modified the UriProfile throw clauses so that the invalid character is hex-encoded and the exception message is more meaningful.
JWAT-77: Unit tests and bug fixes for newly implemented ArcFileWriter/WarcFileWriter and related classes.

JWAT-76: Fixed archiveLengthStr/contentLengthStr being set while archiveLength/contentLength remained null when using payload length validation.

Removed a lot of tabs and replaced them with spaces (company policy).

Minor code cleanup.

Unit tests/javadoc for the merged classes in jwat-common. Trivial full unit test of the old class.
Clean up unwanted head.
Merge in files from jwat-tools with history.
ANVLRecord adds space after ":" to make output pretty.

Made constant in WarcFileWriter public.

JWAT-78: PayloadManager in JWAT-Tools seems to have a bug related to the closing of the RandomAccessFile and a non-null tmpfile object.

Added unit tests for some of the most common classes.

Tweaked some constant definitions.

Changed some method signatures.
Added initial ANVLRecord class.
Make buffer sizes configurable in PayloadManager and ByteArrayIOStream.

Added new ManagedPayloadManager to support this.

JWAT-77: Add (W)ArcFileWriter helper classes.
Minor tweaks. Changed version to 0.6.0-SNAPSHOT. Will be deployed to Maven Central from now on.
Changed manageRecord back from private to public since it was used after all.
Fixed some texts. Added some spaces.
Changed JWAT dependencies from 1.0.1-SNAPSHOT to 1.0.1.
JWAT-69: Unit tested WARC-Refers-To-Target-URI and WARC-Refers-To-Date in writer.

Fixed some copy/paste errors in the two new headers.

JWAT-69: Unit tested WARC-Refers-To-Target-URI and WARC-Refers-To-Date in reader.

Fixed some small bugs and omissions with the reading of those new headers.

Removed some tabs.