JWAT

[maven-release-plugin] prepare for next development iteration

[maven-release-plugin] copy for tag jwat-1.1.1

[maven-release-plugin] prepare release jwat-1.1.1

Removed a few unused import statements.

Added callback support to read and modify the ARC/WARC headers before processing the payload.

    /jwat-arc/src/main/java/org/jwat/arc/ArcRecordParserCallback.java (+24, -0)
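The callback commit above adds ArcRecordParserCallback, but its real method signature is not shown in this log. As a rough sketch of the general idea only, the Java below uses a hypothetical `HeaderCallback` interface and a toy `SimpleRecordParser` (both invented for illustration, not JWAT classes) to show a hook that runs after the headers are parsed and before the content block is processed:

```java
import java.util.Map;

// HYPOTHETICAL callback interface for illustration; this is NOT the signature of
// JWAT's ArcRecordParserCallback.
interface HeaderCallback {
    // Called after the header lines are parsed and before the content block is
    // processed; implementations may read or modify the header map in place.
    void headersParsed(Map<String, String> headers);
}

// HYPOTHETICAL minimal parser showing where such a callback would fire.
class SimpleRecordParser {
    private final HeaderCallback callback;

    SimpleRecordParser(HeaderCallback callback) {
        this.callback = callback;
    }

    /** Parses "Name: value" lines up to the first empty line, then returns the content block. */
    String parse(String record, Map<String, String> headersOut) {
        int end = record.indexOf("\r\n\r\n");
        String headerBlock = end >= 0 ? record.substring(0, end) : record;
        String block = end >= 0 ? record.substring(end + 4) : "";
        for (String line : headerBlock.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headersOut.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        }
        // Let the caller inspect or rewrite the headers before the block is used.
        if (callback != null) {
            callback.headersParsed(headersOut);
        }
        // Block processing runs after the callback, so it sees any header changes.
        String length = headersOut.get("Content-Length");
        if (length != null) {
            int n = Integer.parseInt(length.trim());
            block = block.substring(0, Math.max(0, Math.min(n, block.length())));
        }
        return block;
    }
}
```

Because `HeaderCallback` has a single abstract method, a caller can pass it as a lambda, for example `new SimpleRecordParser(h -> h.put("Content-Length", "3"))`, which truncates the block to three characters.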
[maven-release-plugin] prepare for next development iteration

[maven-release-plugin] copy for tag jwat-1.1.0

[maven-release-plugin] prepare release jwat-1.1.0

Improve support for empty files and errors at the end of (W)ARC files in the ArchiveParser/ArchiveParserCallback.

JWAT-89: Removed the use of encodedwords in HeaderLineParser. Both need to be refactored, and it is not really useful.

CR-JWAS-33: Follow-up on review.

    … 59 more files in changeset.
[maven-release-plugin] prepare for next development iteration

[maven-release-plugin] copy for tag jwat-1.0.6

[maven-release-plugin] prepare release jwat-1.0.6

JWAT-88: Unit test improved.

JWAT-88: Changed so the payload digest is not checked for WARC revisit and continuation records.
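A minimal sketch of the rule in JWAT-88, assuming the decision is made from the WARC-Type header value. The class `PayloadDigestPolicy` and its method are hypothetical names, not JWAT's code; the likely rationale is that a revisit record's payload digest describes an earlier capture, and a continuation record holds only one segment of a larger payload, so neither can be verified against the bytes in the record alone:

```java
import java.util.Locale;
import java.util.Set;

// HYPOTHETICAL helper, not JWAT's implementation: decides whether the payload
// digest of a WARC record should be validated, based on its WARC-Type value.
final class PayloadDigestPolicy {
    // Record types whose payload digest cannot be checked against the record's own bytes.
    private static final Set<String> SKIPPED_TYPES = Set.of("revisit", "continuation");

    private PayloadDigestPolicy() {}

    static boolean shouldCheckPayloadDigest(String warcType) {
        if (warcType == null) {
            return true; // unknown type: fall back to normal validation
        }
        return !SKIPPED_TYPES.contains(warcType.trim().toLowerCase(Locale.ROOT));
    }
}
```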

JWAT-87: Improved detection of garbage at the end of (W)ARC files, with unit tests for it. Also added unit tests for empty (W)ARC files.

    /jwat-arc/src/test/resources/invalid-arcfile-record-then-garbage.arc (+60, -0)
    /jwat-warc/src/test/resources/invalid-warcfile-record-then-garbage.warc (+43, -0)
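One simple way to detect trailing garbage, sketched under the assumption that the parser has already consumed the last complete record: treat leftover CR/LF bytes as ordinary record separators and count anything else as garbage. `TrailingGarbageCheck` is a hypothetical name, and this is not necessarily how JWAT-87 implements the check:

```java
import java.io.IOException;
import java.io.InputStream;

// HYPOTHETICAL sketch, not JWAT's implementation: inspects whatever remains in the
// stream after the last record. CR and LF are accepted as separators; any other
// byte is counted as garbage.
final class TrailingGarbageCheck {
    private TrailingGarbageCheck() {}

    /** Returns the number of non-CR/LF bytes left in the stream (0 means clean). */
    static long countTrailingGarbage(InputStream in) throws IOException {
        long garbage = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b != '\r' && b != '\n') {
                garbage++;
            }
        }
        return garbage;
    }
}
```

An empty remainder also yields 0, which matches the separate case of an empty (W)ARC file having no trailing data to complain about.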
Forgot to remove some dependencies on ArcHeader in the parser.

Made some changes so ArcHeader, WarcHeader and HttpHeader can be re-parsed without using the complete Arc/Warc Readers.

Made some methods public instead of protected. Various cleanup.

[maven-release-plugin] prepare for next development iteration

[maven-release-plugin] copy for tag jwat-1.0.5

[maven-release-plugin] prepare release jwat-1.0.5

ArcReader and WarcReader now implement the Iterable<..> interface.
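Implementing `java.lang.Iterable` is what lets a reader be used directly in an enhanced `for` loop. The sketch below uses a hypothetical `RecordReader` over strings (not JWAT's ArcReader/WarcReader, whose element types are elided in the message above) and a one-record look-ahead so `hasNext()` does not need to parse anything:

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// HYPOTHETICAL reader, not JWAT's ArcReader/WarcReader: yields records one at a
// time from an underlying source. Like a stream-backed reader, a second call to
// iterator() continues from the current position instead of restarting.
class RecordReader implements Iterable<String> {
    private final Iterator<String> source;

    RecordReader(List<String> records) {
        this.source = records.iterator();
    }

    // Stand-in for parsing the next record; returns null when the input is exhausted.
    private String nextRecord() {
        return source.hasNext() ? source.next() : null;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next = nextRecord(); // look-ahead of one record

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public String next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                String current = next;
                next = nextRecord();
                return current;
            }
        };
    }
}
```

Usage: `for (String record : new RecordReader(List.of("a", "b"))) { ... }` visits "a" and then "b".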

Merged in tledouxfr/jwat/unknown_charset (pull request #6)

Use default charset in case of bad charset and handle bad encoding in WARC-Target-URI header (add a simple test case)

Use default charset in case of bad charset and handle bad encoding in WARC-Target-URI header (add a simple test case)

    /jwat-warc/src/test/resources/invalid-warcfile-encoding-headers.warc.gz (binary)
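The charset fallback from pull request #6 can be illustrated with the standard `Charset.forName`, which throws `IllegalCharsetNameException` for a malformed name and `UnsupportedCharsetException` for an unknown one. This is a hypothetical helper, not the code from the pull request; since the message does not say which default charset is used, the fallback is passed in by the caller:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// HYPOTHETICAL helper, not the code from pull request #6: resolves a charset name
// taken from a header and falls back to a caller-supplied default when the name
// is missing, malformed, or unsupported.
final class CharsetFallback {
    private CharsetFallback() {}

    static Charset resolve(String name, Charset fallback) {
        if (name == null || name.trim().isEmpty()) {
            return fallback;
        }
        try {
            return Charset.forName(name.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // Bad charset in the input: use the default instead of failing the record.
            return fallback;
        }
    }
}
```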
[maven-release-plugin] prepare for next development iteration

[maven-release-plugin] copy for tag jwat-1.0.4

[maven-release-plugin] prepare release jwat-1.0.4

POM cleanup.