JWAT

Minor refactoring of API, unittests, etc.
Added reusable ArchiveParser, changed some stuff to work with GUI. Almost complete parallelized CDX'er.
    • -0
    • +187
    /jwat-archive/src/main/java/org/jwat/archive/ArchiveParser.java
Refactoring for use with JWAT-Tools GUI.

Forgot to add env.cmd and some other files.

    • -0
    • +84
    /jwat-archive/src/main/java/org/jwat/archive/FileIdent.java
Changed existing tabs to spaces even though personal preference is for tabs.
Added an even more relaxed Uri profile for Heritrix written data.

Warc-Profile treated as a URI, oversight fixed (JWAT-61).

Minor review stuff.

Refactored Test classes file names.

    • -0
    • +42
    /jwat-arc/src/test/java/org/jwat/arc/TestArc_UriProfile.java
  1. … 44 more files in changeset.
Uri methods added with profile parameter, additional uri profiles added, minor unittesting, javadocs and review changes.
Unittest for UriProfile (JWAT-59).
URI and URI profile split into separate classes. Currently only includes a strict RFC3986 profile.
    • -0
    • +306
    /jwat-common/src/main/java/org/jwat/common/UriProfile.java
JWAT-59: Good progress on JWAT Uri implementation. Almost ready and tested.
Followup to review CR-JWAS-25. Experimental Uri implementation.
  1. … 17 more files in changeset.
Partial conclusion of the following issues:

JWAT-46: ARC reader refactoring

JWAT-8: Unit tests and coverage of ARCRecordBase, ArcRecord and ArcVersionBlock

JWAT-45: ARC writer

Partial lenient Uri implementation.

    • -0
    • +337
    /jwat-arc/src/test/java/org/jwat/arc/TestArcWriter_States.java
    • -0
    • +141
    /jwat-common/src/main/java/org/jwat/common/Uri.java
    • -0
    • +67
    /jwat-common/src/test/java/org/jwat/common/TestUri.java
ArcWriter->ArcReader combo unit tested.
Added the last validation checks in the Arc reader and added some unit tests for the new validation errors.
startOffset tweaking and unit testing.

Unit testing of those hard-to-throw exceptions.

  1. … 8 more files in changeset.
Minor javadoc and complete unit testing of ArcRecord and ArcVersionBlock.
A bit more unit testing.
    • -0
    • +285
    /jwat-arc/src/test/java/org/jwat/arc/TestParams_Writer.java
    • -0
    • +387
    /jwat-warc/src/test/java/org/jwat/warc/TestParams_Readers.java
    • -0
    • +284
    /jwat-warc/src/test/java/org/jwat/warc/TestParams_Writers.java
  1. … 31 more files in changeset.
JWAT-60: no-type was not correctly handled in ARCHeader rewrite.

JWAT-58: ARCReader now defaults to a less strict mode where LFs in an otherwise empty version block are allowed and trailing LFs between records are ignored when their count is not exactly 1.

JWAT-57: Should have been fixed in previous push.

Also added check for negative offset and archive-length.

Rewrote the record reader so that it no longer accepts just any line as a possible record header. Adds an error for data found before a record, if any occurs.

Added some unit testing here and there.

JWAT-57: Added workaround for test using toString() on ContentType.
Fixed a bug introduced with http request support in HttpHeader parser.

Added some unit tests for absolute resources in http request.

Added some more unit testing here and there.

Changed the ARC Writer slightly, still not 100% functional nor tested.

Better handling of "-" in ARC header values.
    • -0
    • +93
    /jwat-arc/src/test/java/org/jwat/arc/TestArcRecord.java
ArchiveLengthStr was not exposed, but needed in JHove2 Module.
Found a bug in the ARC reader, not all diagnoses were reported even though the isCompliant field was correctly updated.
Improved handling of versionblock and metadata, added hasEmptyPayload if the payload has been completely processed by the reader.

Added some more errors/warnings that needed to be refactored.

Unit test of ArcVersionBlock.

Added number of information strings comparison to diagnosis constructor.
Changes from review (spelling, javadoc, exception handling, etc.) and BnF tests (GZip compliant fields/methods). Added missing javadoc to common and warc packages.
  1. … 23 more files in changeset.
Refactoring of gzip test class names and addition of compressed entry size.
  1. … 7 more files in changeset.
Added missing consumed and isValid logic/methods to GZip reader and entry.
API changes for JHove2 modules.
Fixes excessive cleanup in the close method.