JWAT

[maven-release-plugin] prepare release jwat-1.1.1
Removed a few unused import statements.
Added callback support to read and modify the ARC/WARC headers before processing the payload.
    • -0
    • +24
    /jwat-arc/src/main/java/org/jwat/arc/ArcRecordParserCallback.java
[maven-release-plugin] prepare for next development iteration
[maven-release-plugin] copy for tag jwat-1.1.0
[maven-release-plugin] prepare release jwat-1.1.0
Improve support for empty files and errors at the end of (W)ARC files in the ArchiveParser/ArchiveParserCallback.
JWAT-89: Removed encodedwords use in HeaderLineParser. Both need to be refactored, and it is not really useful.
CR-JWAS-33: Follow-up on review.
  1. … 59 more files in changeset.
[maven-release-plugin] prepare for next development iteration
[maven-release-plugin] copy for tag jwat-1.0.6
[maven-release-plugin] prepare release jwat-1.0.6
JWAT-88: Unit test improved.
JWAT-88: Change so payload digest is not checked for WARC revisit and continuation records.
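The rule in the JWAT-88 change above can be sketched as a small predicate. This is only an illustration, not JWAT code: the class and method names are hypothetical, and the handling of a missing record type is an assumption of the sketch. The type names "revisit" and "continuation" are the WARC-Type values for those record kinds.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration; not part of the JWAT API.
final class PayloadDigestPolicySketch {

    // Record types whose payload digest is not verified, per the commit message.
    private static final Set<String> SKIPPED_TYPES =
            new HashSet<>(Arrays.asList("revisit", "continuation"));

    // Returns true when the payload digest of a record of this type should be checked.
    static boolean shouldCheckPayloadDigest(String warcType) {
        if (warcType == null) {
            return true; // assumption of this sketch: a missing type is still checked
        }
        return !SKIPPED_TYPES.contains(warcType.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String type : new String[] {"response", "revisit", "continuation"}) {
            System.out.println(type + " -> " + shouldCheckPayloadDigest(type));
        }
    }
}
```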
JWAT-87: Improved detection of garbage at the end of (W)ARC files and unit tests of this. Also added unit tests testing empty (W)ARC files.
    • -0
    • +60
    /jwat-arc/src/test/resources/invalid-arcfile-record-then-garbage.arc
    • -0
    • +43
    /jwat-warc/src/test/resources/invalid-warcfile-record-then-garbage.warc
Forgot to remove some dependencies on ArcHeader in the parser.
Made some changes so ArcHeader, WarcHeader and HttpHeader can be re-parsed without using the complete Arc/Warc Readers.
Made some methods public instead of protected. Various cleanup.
[maven-release-plugin] prepare for next development iteration
[maven-release-plugin] copy for tag jwat-1.0.5
[maven-release-plugin] prepare release jwat-1.0.5
ArcReader and WarcReader now implement the Iterable<..> interface.
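A minimal sketch of what implementing Iterable buys a pull-style reader: callers can use a for-each loop instead of calling a "next record" method until it returns null. LineRecordReader below is a hypothetical stand-in that treats each text line as a record; it is not a JWAT class and does not reproduce the ArcReader or WarcReader signatures.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a pull-style record reader; not a JWAT class.
final class LineRecordReader implements Iterable<String> {
    private final BufferedReader in;

    LineRecordReader(BufferedReader in) {
        this.in = in;
    }

    // Pull-style access: returns the next record, or null at end of input.
    String getNextRecord() throws IOException {
        return in.readLine();
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next;   // record fetched ahead by hasNext(), if any
            private boolean done;  // set once the underlying reader is exhausted

            @Override
            public boolean hasNext() {
                if (next == null && !done) {
                    try {
                        next = getNextRecord();
                    } catch (IOException e) {
                        // Iterator methods cannot throw checked exceptions.
                        throw new UncheckedIOException(e);
                    }
                    done = (next == null);
                }
                return next != null;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String record = next;
                next = null;
                return record;
            }
        };
    }

    // Collects all records using the for-each loop that Iterable enables.
    static List<String> readAll(String text) {
        List<String> out = new ArrayList<>();
        for (String record : new LineRecordReader(new BufferedReader(new StringReader(text)))) {
            out.add(record);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll("first\nsecond\nthird"));
    }
}
```

The iterator shares state with the reader, so in this sketch a reader supports a single pass; whether the JWAT readers behave the same way is not stated in the log.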
Merged in tledouxfr/jwat/unknown_charset (pull request #6)

Use default charset in case of bad charset and handle bad encoding in WARC-Target-URI header (add a simple test case)

Use default charset in case of bad charset and handle bad encoding in WARC-Target-URI header (add a simple test case)
    • binary
    /jwat-warc/src/test/resources/invalid-warcfile-encoding-headers.warc.gz
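The charset fallback described in the commit above can be sketched with the standard java.nio.charset API. The class and method names are hypothetical, and the fallback charset is passed in by the caller because the log does not say which default JWAT uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical illustration; not the JWAT implementation.
final class CharsetFallbackSketch {

    // Resolves a declared charset name, returning the fallback when the name
    // is missing, syntactically illegal, or not supported by this JVM.
    static Charset resolve(String declaredName, Charset fallback) {
        if (declaredName == null) {
            return fallback;
        }
        try {
            return Charset.forName(declaredName.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // The fallback choice here is an assumption made for the demo only.
        Charset fallback = StandardCharsets.ISO_8859_1;
        System.out.println(resolve("utf-8", fallback));
        System.out.println(resolve("no-such-charset-xyz", fallback));
        System.out.println(resolve("bad charset!", fallback));
    }
}
```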
[maven-release-plugin] prepare for next development iteration
[maven-release-plugin] copy for tag jwat-1.0.4
[maven-release-plugin] prepare release jwat-1.0.4
POM cleanup.
UriProfile throw clauses modified so that the invalid character gets hex encoded and the message becomes more meaningful.
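The idea behind the UriProfile change above, reporting an invalid character as a hex code so that control and non-printable characters are readable in the error message, might look roughly like this. The class, the acceptance rule, and the message wording are all assumptions of the sketch, not the JWAT UriProfile code.

```java
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical illustration; not the JWAT UriProfile implementation.
final class HexCharMessageSketch {

    // Formats a character as "0x" followed by at least two uppercase hex digits.
    static String toHex(char c) {
        return String.format(Locale.ROOT, "0x%02X", (int) c);
    }

    // Assumed rule for this sketch only: printable ASCII other than space is accepted.
    static void validate(String uri) throws URISyntaxException {
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c <= 0x20 || c >= 0x7F) {
                // The hex form stays legible even for control characters.
                throw new URISyntaxException(uri,
                        "Invalid URI character '" + toHex(c) + "'", i);
            }
        }
    }

    public static void main(String[] args) {
        try {
            validate("http://example.org/a\u0001b");
        } catch (URISyntaxException e) {
            System.out.println(e.getReason() + " at index " + e.getIndex());
        }
    }
}
```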
[maven-release-plugin] prepare for next development iteration