Added callback support to read and modify the ARC/WARC headers before processing the payload.
    • -0
    • +24
    ./org/jwat/warc/WarcRecordParserCallback.java
  1. … 5 more files in changeset.
CR-JWAS-33: Follow-up on review.
    • -1
    • +2
    ./org/jwat/warc/WarcFileNamingDefault.java
    • -1
    • +7
    ./org/jwat/warc/WarcFileNamingSingleFile.java
    • -34
    • +9
    ./org/jwat/warc/WarcFileWriter.java
    • -6
    • +0
    ./org/jwat/warc/WarcFileWriterConfig.java
    • -8
    • +8
    ./org/jwat/warc/WarcReaderFactory.java
  1. … 63 more files in changeset.
JWAT-88: Change so payload digest is not checked for WARC revisit and continuation records.
  1. … 1 more file in changeset.
JWAT-87: Improved detection of garbage at the end of (W)ARC files, with unit tests for this. Also added unit tests for empty (W)ARC files.
  1. … 5 more files in changeset.
Made some changes so ArcHeader, WarcHeader and HttpHeader can be re-parsed without using the complete Arc/Warc Readers.
  1. … 3 more files in changeset.
Made some methods public instead of protected. Various cleanup.
    • -11
    • +11
    ./org/jwat/warc/WarcFieldParsers.java
  1. … 1 more file in changeset.
ArcReader and WarcReader now implement Iterable<..> interface.
  1. … 3 more files in changeset.
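As a rough illustration of what implementing Iterable buys a caller, the sketch below wraps a pull-style "get next record" method in an Iterator so the reader can be used in a for-each loop. LineRecordReader, getNextRecord() and readAll() are hypothetical stand-ins written for this note (records are just lines of text); this is not the JWAT ArcReader/WarcReader code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a record reader; NOT the JWAT API. */
public class LineRecordReader implements Iterable<String> {
    private final BufferedReader in;

    public LineRecordReader(String data) {
        this.in = new BufferedReader(new StringReader(data));
    }

    /** Pull-style access: returns the next record, or null at end of input. */
    public String getNextRecord() throws IOException {
        return in.readLine();
    }

    /** Adapts the pull-style method to an Iterator so for-each works. */
    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next;      // record fetched ahead by hasNext()
            private boolean fetched;  // true while 'next' holds an unconsumed result

            @Override
            public boolean hasNext() {
                if (!fetched) {
                    try {
                        next = getNextRecord();
                    } catch (IOException e) {
                        // Iterator methods cannot throw checked exceptions.
                        throw new UncheckedIOException(e);
                    }
                    fetched = true;
                }
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                fetched = false;
                return next;
            }
        };
    }

    /** Usage example: collect every record with a for-each loop. */
    public static List<String> readAll(String data) {
        List<String> out = new ArrayList<>();
        for (String record : new LineRecordReader(data)) {
            out.add(record);
        }
        return out;
    }
}
```

The iterator reads one record ahead in hasNext(), which is one common way to bridge a method that signals end of input by returning null.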
Use default charset in case of bad charset and handle bad encoding in WARC-Target-URI header (add a simple test case)
  1. … 4 more files in changeset.
JWAT-77: Unit tests and bug fixes for newly implemented ArcFileWriter/WarcFileWriter and related classes.

JWAT-76: Fix for archiveLengthStr/contentLengthStr set and archiveLength/contentLength null when using payload length validation.

Removed a lot of tabs and replaced them with spaces. (Company policy)

Minor code cleanup.

    • -2
    • +17
    ./org/jwat/warc/WarcFileNaming.java
    • -32
    • +51
    ./org/jwat/warc/WarcFileNamingDefault.java
    • -15
    • +35
    ./org/jwat/warc/WarcFileNamingSingleFile.java
    • -62
    • +108
    ./org/jwat/warc/WarcFileWriter.java
    • -5
    • +28
    ./org/jwat/warc/WarcFileWriterConfig.java
  1. … 91 more files in changeset.
ANVLRecord adds space after ":" to make output pretty.

Made constant in WarcFileWriter public.

  1. … 2 more files in changeset.
JWAT-77: Add (W)ArcFileWriter helper classes.
    • -0
    • +26
    ./org/jwat/warc/WarcFileNaming.java
    • -0
    • +82
    ./org/jwat/warc/WarcFileNamingDefault.java
    • -0
    • +44
    ./org/jwat/warc/WarcFileNamingSingleFile.java
    • -0
    • +177
    ./org/jwat/warc/WarcFileWriter.java
    • -0
    • +52
    ./org/jwat/warc/WarcFileWriterConfig.java
  1. … 5 more files in changeset.
Fixed some texts. Added some spaces.
  1. … 39 more files in changeset.
Stuff with close() now implements Closeable
  1. … 8 more files in changeset.
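For context, the practical effect of implementing Closeable is that a class can be used in a try-with-resources statement. A minimal sketch follows, assuming a hypothetical TrackedResource class in place of any real reader or writer; none of these names come from the changeset.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CloseableSketch {
    /** Hypothetical resource; stands in for any class that has a close() method. */
    static class TrackedResource implements Closeable {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
        }
    }

    /** Returns true if the resource was closed automatically on leaving the try block. */
    public static boolean closedAfterUse() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            // Work with r here; close() is called when the block exits,
            // whether normally or because of an exception.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return resource.closed;
    }
}
```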
Fixed the javadoc so that the command 'mvn -Psonatype-oss-release clean install -Dgpg.skip=true' works
  1. … 6 more files in changeset.
JWAT-69: Unit tested WARC-Refers-To-Target-URI and WARC-Refers-To-Date in writer.

Fixed some copy/paste errors in the two new headers.

  1. … 1 more file in changeset.
JWAT-69: Unit tested WARC-Refers-To-Target-URI and WARC-Refers-To-Date in reader.

Fixed some small bugs and omissions with the reading of those new headers.

Removed some tabs.

    • -12
    • +25
    ./org/jwat/warc/WarcConstants.java
  1. … 10 more files in changeset.
Added some unit tests.
    • -1
    • +1
    ./org/jwat/warc/WarcReaderFactory.java
    • -1
    • +1
    ./org/jwat/warc/WarcWriterFactory.java
  1. … 6 more files in changeset.
JWAT-70: Unit test DataFormatException in Gzip reader/writer.

JWAT-71: Found IndexOutOfBoundsException while unit testing DataFormatException in GzipWriter.

Removed some tabs.

  1. … 7 more files in changeset.
JWAT-67: Validate support for sha256 Digest algorithm in WARC-header. Tested with sha1, sha-256, sha-512, tiger, RipeMD128, RipeMD160, RipeMD256, RipeMD320 and BouncyCastle.

Added reader and addHeader support for WARC-Refers-To-Target-URI and WARC-Refers-To-Date including minor tests.

    • -1
    • +2
    ./org/jwat/warc/WarcReaderFactory.java
  1. … 8 more files in changeset.
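To make the digest entry above concrete, here is a small sketch of computing a labelled digest value with the standard java.security.MessageDigest class. DigestSketch and labelledDigest are hypothetical names; the hex encoding and the "label:value" layout are assumptions chosen for illustration, not necessarily what the library reads or writes. Only SHA-1, SHA-256 and SHA-512 are guaranteed by the default JDK providers; algorithms such as Tiger or RIPEMD would need an extra provider (the entry mentions BouncyCastle).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSketch {
    /**
     * Computes a digest of the payload and returns it as "label:hex".
     * The label and the hex encoding are illustrative assumptions.
     *
     * @param label   prefix to put before the value, e.g. "sha256"
     * @param jcaName JCA algorithm name, e.g. "SHA-256"
     * @param payload bytes to digest
     */
    public static String labelledDigest(String label, String jcaName, byte[] payload) {
        try {
            MessageDigest md = MessageDigest.getInstance(jcaName);
            byte[] hash = md.digest(payload);
            StringBuilder sb = new StringBuilder(label).append(':');
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // The algorithm is not offered by any installed security provider.
            throw new IllegalArgumentException("Unsupported digest algorithm: " + jcaName, e);
        }
    }
}
```

A caller would pass the JCA name that matches the label found in the header, for example labelledDigest("sha1", "SHA-1", payloadBytes).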
Work in progress on unified ARC/WARC reader. Module not included yet.
    • -3
    • +3
    ./org/jwat/warc/WarcReaderCompressed.java
    • -2
    • +2
    ./org/jwat/warc/WarcReaderUncompressed.java
  1. … 6 more files in changeset.
Support for WARC-Refers-To-Target-URI and WARC-Refers-To-Date - see https://docs.google.com/document/d/1QyQBA7Ykgxie75V8Jziz_O7hbhwf7PF6_u9O6w6zgp0/edit
Follow-up from reviews.

Saving of test data for use in JHOVE2.

'no-type' is ignored when looking for http headers.

Improved detection of possible arc record.

Minor tweaks.

    • -1
    • +1
    ./org/jwat/warc/WarcReaderCompressed.java
    • -9
    • +18
    ./org/jwat/warc/WarcReaderFactory.java
    • -1
    • +1
    ./org/jwat/warc/WarcReaderUncompressed.java
  1. … 36 more files in changeset.
Zero length ARC, WARC and GZip files are now reported as non compliant.
  1. … 15 more files in changeset.
Raw ARC record line now stored.

Removed some tabs.

    • -41
    • +41
    ./org/jwat/warc/WarcFieldParsers.java
  1. … 4 more files in changeset.
Strict validation of <> encapsulating some URIs.

Tying up loose ends.

    • -4
    • +46
    ./org/jwat/warc/WarcFieldParsers.java
    • -6
    • +6
    ./org/jwat/warc/WarcReaderFactory.java
  1. … 11 more files in changeset.
Added some getters.
  1. … 3 more files in changeset.
Minor refactoring of API, unit tests, etc.
  1. … 7 more files in changeset.
Added an even more relaxed URI profile for data written by Heritrix.

WARC-Profile is now treated as a URI; oversight fixed (JWAT-61).

Minor review stuff.

Refactored Test classes file names.

  1. … 54 more files in changeset.
Follow-up to review CR-JWAS-25. Experimental Uri implementation.
    • -1
    • +1
    ./org/jwat/warc/WarcReaderCompressed.java
    • -1
    • +1
    ./org/jwat/warc/WarcReaderUncompressed.java
  1. … 27 more files in changeset.
ArcWriter->ArcReader combo unit tested.
  1. … 11 more files in changeset.