nclarkekb in JWAT

Improve support for empty files and errors at the end of (W)ARC files in the ArchiveParser/ArchiveParserCallback.
JWAT-89: Removed encodedwords use in HeaderLineParser. Both need to be refactored and it is not really useful.
CR-JWAS-33: Follow-up on review.
  1. … 59 more files in changeset.
JWAT-88: Unit test improved.
JWAT-88: Change so payload digest is not checked for WARC revisit and continuation records.
JWAT-87: Improved detection of garbage at the end of (W)ARC files and added unit tests for this. Also added unit tests testing empty (W)ARC files.
    • -0
    • +60
    /jwat-arc/src/test/resources/invalid-arcfile-record-then-garbage.arc
    • -0
    • +43
    /jwat-warc/src/test/resources/invalid-warcfile-record-then-garbage.warc
Forgot to remove some dependencies on ArcHeader in the parser.
Made some changes so ArcHeader, WarcHeader and HttpHeader can be re-parsed without using the complete Arc/Warc Readers.
Made some methods public instead of protected. Various cleanup.
ArcReader and WarcReader now implement Iterable<..> interface.
POM cleanup.
UriProfile throw clauses modified so that the invalid character gets hex encoded and the message becomes more meaningful.
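
A minimal sketch of the idea behind the UriProfile message change, assuming the goal is simply to show the offending character as a hex code so that whitespace and control characters stay visible in the error text. The class name `InvalidCharMessageSketch`, the method `invalidCharMessage`, and the message wording are hypothetical and are not taken from JWAT's UriProfile.

```java
import java.util.Locale;

public class InvalidCharMessageSketch {

    // Hypothetical helper, NOT JWAT's actual UriProfile code.
    // Builds an error message in which the invalid character is hex encoded,
    // so characters such as spaces or control codes are unambiguous.
    static String invalidCharMessage(String uri, int index) {
        char c = uri.charAt(index);
        return String.format(Locale.ROOT,
                "Invalid URI character '0x%02x' at index %d", (int) c, index);
    }

    public static void main(String[] args) {
        // The space at index 20 is reported as 0x20.
        System.out.println(invalidCharMessage("http://example.org/a b", 20));
    }
}
```

Printing the raw character instead would make a stray space or tab nearly impossible to spot in a log line, which is presumably why the hex form reads as more meaningful.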
JWAT-77: Unit tests and bug fixes for newly implemented ArcFileWriter/WarcFileWriter and related classes.

JWAT-76: Fix for archiveLengthStr/contentLengthStr set and archiveLength/contentLength null when using payload length validation.

Removed a lot of tabs and replaced them with spaces. (Company policy)

Minor code cleanup.

    • -0
    • +505
    /jwat-arc/src/test/java/org/jwat/arc/TestArcFileWriter.java
    • -0
    • +64
    /jwat-arc/src/test/java/org/jwat/arc/TestArcFileWriterConfig.java
  1. … 82 more files in changeset.
Unit test/javadoc of merged classes in jwat-common. Trivial full unit test of old class.
    • -0
    • +50
    /jwat-common/src/test/java/org/jwat/common/TestDiagnosisType.java
Clean up unwanted head.
Merge in files from jwat-tools with history.
    • -0
    • +41
    /jwat-archive/pom.xml
ANVLRecord adds space after ":" to make output pretty.

Made constant in WarcFileWriter public.

JWAT-78: PayloadManager in JWAT-Tools seems to have a bug related to the closing of the RandomAccessFile and a non-null tmpfile object.

Added some unit tests of most common classes.

Tweaked some constant definitions.

Changed some method signatures.
Added initial ANVLRecord class.
    • -0
    • +77
    /jwat-common/src/main/java/org/jwat/common/ANVLRecord.java
Make buffer sizes configurable in PayloadManager and ByteArrayIOStream.

Added new ManagedPayloadManager to support this.

JWAT-77: Add (W)ArcFileWriter helper classes.
    • -0
    • +26
    /jwat-arc/src/main/java/org/jwat/arc/ArcFileNaming.java
    • -0
    • +44
    /jwat-arc/src/main/java/org/jwat/arc/ArcFileNamingSingleFile.java
    • -0
    • +138
    /jwat-arc/src/main/java/org/jwat/arc/ArcFileWriter.java
    • -0
    • +26
    /jwat-warc/src/main/java/org/jwat/warc/WarcFileNaming.java
    • -0
    • +82
    /jwat-warc/src/main/java/org/jwat/warc/WarcFileNamingDefault.java
    • -0
    • +177
    /jwat-warc/src/main/java/org/jwat/warc/WarcFileWriter.java
Minor tweaks. Changed version to 0.6.0-SNAPSHOT. Deployed to Maven Central from now on.
Changed manageRecord back from private to public since it was used after all.
Fixed some texts. Added some spaces.
  1. … 27 more files in changeset.
Changed JWAT dependencies from 1.0.1-SNAPSHOT to 1.0.1.
JWAT-69: Unit tested WARC-Refers-To-Target-URI and WARC-Refers-To-Date in writer.

Fixed some copy/paste errors in the two new headers.

JWAT-69: Unit tested WARC-Refers-To-Target-URI and WARC-Refers-To-Date in reader.

Fixed some small bugs and omissions with the reading of those new headers.

Removed some tabs.

Added some unit tests.
JWAT-72: Scheme class is not case insensitive

Unit test of isArcRecord().
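
For the JWAT-72 entry above: URI schemes are case-insensitive under RFC 3986, with lowercase as the canonical form. The following is a rough sketch of a case-insensitive comparison, assuming normalization to lowercase before comparing. The class `SchemeCaseSketch` and its methods are hypothetical and do not reflect how JWAT's Scheme class is implemented or was fixed.

```java
import java.util.Locale;

public class SchemeCaseSketch {

    // Hypothetical helper: normalize a scheme to its canonical lowercase form.
    // Locale.ROOT avoids locale-specific case mappings (e.g. the Turkish dotless i).
    static String normalizeScheme(String scheme) {
        return scheme.toLowerCase(Locale.ROOT);
    }

    // Hypothetical helper: compare two schemes without regard to case.
    static boolean sameScheme(String a, String b) {
        return normalizeScheme(a).equals(normalizeScheme(b));
    }

    public static void main(String[] args) {
        System.out.println(sameScheme("HTTP", "http"));  // true
        System.out.println(sameScheme("http", "https")); // false
    }
}
```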