warclib

Moved files and folders around part 1.
  … 30 more files in changeset.
Big corporate merger!
Added some quoted-string parsing. Fixed a major skip bug that became apparent when testing with a BufferedInputStream.
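A common cause of skip bugs like this is that InputStream.skip(n) may skip fewer than n bytes. When a BufferedInputStream has data buffered, it skips at most what is currently in its buffer. The sketch below loops until the requested count has been skipped and reports a premature end of stream. The StreamUtil class and skipFully method are hypothetical names, not taken from this repository.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper: skips exactly n bytes or throws EOFException. */
final class StreamUtil {
    private StreamUtil() {}

    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            // skip() may legally return less than requested (e.g. only the buffered bytes)
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // skip() made no progress; a single read() tells a stall apart from end of stream
                throw new EOFException("stream ended with " + remaining + " bytes left to skip");
            } else {
                remaining--; // the byte consumed by read() counts as skipped
            }
        }
    }
}
```

A caller skipping over a record's content block would call skipFully(in, contentLength) instead of a single in.skip(contentLength).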
Minor additions to the read header routine.
Partial quoted-string support; encoded words are barely handled yet.
  … 1 more file in changeset.
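WARC header values borrow the RFC 2616 quoted-string rule: text between double quotes in which a backslash escapes the following character (a quoted-pair). Below is a minimal decoding sketch. It assumes the caller passes the whole quoted token, including both quotes. The QuotedString class is hypothetical, and RFC 2047 encoded words are deliberately left out.

```java
/** Hypothetical sketch of RFC 2616-style quoted-string decoding. */
final class QuotedString {
    private QuotedString() {}

    /** Decodes a token such as "a \"b\" c" (surrounding quotes included). */
    static String unquote(String s) {
        if (s.length() < 2 || s.charAt(0) != '"') {
            throw new IllegalArgumentException("not a quoted string: " + s);
        }
        StringBuilder out = new StringBuilder();
        int i = 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') {
                if (i + 1 >= s.length()) {
                    break; // dangling backslash: the string is unterminated
                }
                out.append(s.charAt(i + 1)); // quoted-pair: keep the escaped char literally
                i += 2;
            } else if (c == '"') {
                if (i != s.length() - 1) {
                    throw new IllegalArgumentException("text after closing quote: " + s);
                }
                return out.toString();
            } else {
                out.append(c);
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated quoted string: " + s);
    }
}
```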
Added UTF-8 support to the header line reader. It seems to work, but the tests are not conclusive.

Needs tweaking and more unit tests.

  … 2 more files in changeset.
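One way to read header lines as UTF-8 is to skip using a Reader and instead collect raw bytes up to the LF, then decode the completed line. Decoding whole lines means a multi-byte character is never split. Reading byte by byte also avoids the read-ahead of InputStreamReader, which could otherwise consume bytes that belong to the record's content block. HeaderLineReader is a hypothetical name. This sketch accepts a bare LF as well as CRLF, which a strict validator might want to flag instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical byte-level header line reader. */
final class HeaderLineReader {
    private HeaderLineReader() {}

    /** Returns the next line without its line ending, or null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            buf.write(b);
        }
        if (b == -1 && buf.size() == 0) {
            return null; // nothing left to read
        }
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--; // drop the CR of a CRLF line ending
        }
        // Malformed UTF-8 is replaced with U+FFFD here; a validator could use a
        // CharsetDecoder with CodingErrorAction.REPORT to reject it instead.
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }
}
```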
Wrote a working header-line reading method that now handles multiline headers.

Added some unit tests.

  … 4 more files in changeset.
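Multiline headers follow the RFC 2616 LWS convention that the WARC header grammar reuses: a physical line that begins with a space or a tab continues the previous field. The sketch below unfolds already-split physical lines into logical ones. It joins each continuation with a single space, which is an assumption about how whitespace should be normalized. HeaderUnfolder is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical unfolding of continuation lines into logical header lines. */
final class HeaderUnfolder {
    private HeaderUnfolder() {}

    static List<String> unfold(List<String> physicalLines) {
        List<String> logical = new ArrayList<>();
        for (String line : physicalLines) {
            boolean continuation = !line.isEmpty()
                    && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation) {
                if (logical.isEmpty()) {
                    throw new IllegalArgumentException("continuation before any field: " + line);
                }
                int last = logical.size() - 1;
                // Collapse the folding whitespace into one space
                logical.set(last, logical.get(last) + " " + line.trim());
            } else {
                logical.add(line);
            }
        }
        return logical;
    }
}
```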
Various stuff.

Moved test folders around.

Fixed trailing newline requirement after record.

Also fixed some incorrect test files.

Added a PushbackInputStream for the newline checker; it will also be used in the header readLine routine.

    ./WarcInputStream.java (+114, -0)
  … 46 more files in changeset.
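A WARC record ends with two CRLF pairs after its content block. The sketch below shows one way a PushbackInputStream can back a newline checker. It reads the expected bytes and, on a mismatch, pushes back everything it consumed so the parser can report the problem or resynchronize from the original position. RecordTrailer and consumeTrailer are hypothetical names. The stream must be created with a pushback buffer of at least four bytes, for example new PushbackInputStream(in, 4).

```java
import java.io.IOException;
import java.io.PushbackInputStream;

/** Hypothetical end-of-record check for the CRLF CRLF trailer. */
final class RecordTrailer {
    private RecordTrailer() {}

    private static final byte[] TRAILER = {'\r', '\n', '\r', '\n'};

    /** Consumes the trailer and returns true, or restores the stream and returns false. */
    static boolean consumeTrailer(PushbackInputStream in) throws IOException {
        byte[] seen = new byte[TRAILER.length];
        int n = 0;
        while (n < TRAILER.length) {
            int b = in.read();
            if (b == -1 || b != TRAILER[n]) {
                if (b != -1) {
                    seen[n++] = (byte) b; // keep the offending byte too
                }
                in.unread(seen, 0, n); // seen[0] will be the next byte read
                return false;
            }
            seen[n++] = (byte) b;
        }
        return true;
    }
}
```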
Fixed the Content-Length in some test WARC files, which turned out to be incorrect once the parser started checking for excess lines.

Minor tweaks.

Moved unit tests and test files to separate folders.

  … 4 more files in changeset.
Added a digest parser. Started on the header readLine method.

Added some more unit tests.

  … 3 more files in changeset.
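WARC digest fields such as WARC-Block-Digest and WARC-Payload-Digest hold a labelled digest of the form algorithm:value, for example sha1 followed by a colon and a base32 string. Here is a minimal parser sketch under that assumption. The WarcDigest class is hypothetical, and the sketch does not check whether the value is well formed for the named algorithm.

```java
/** Hypothetical value type for a labelled digest such as "sha1:ABC123". */
final class WarcDigest {
    final String algorithm; // label before the first colon, e.g. "sha1"
    final String value;     // encoded digest after the colon

    private WarcDigest(String algorithm, String value) {
        this.algorithm = algorithm;
        this.value = value;
    }

    /** Splits at the first colon; returns null if a part is empty or contains whitespace. */
    static WarcDigest parse(String field) {
        if (field == null) {
            return null;
        }
        String s = field.trim();
        int colon = s.indexOf(':');
        if (colon <= 0 || colon == s.length() - 1) {
            return null; // missing label or missing value
        }
        String algorithm = s.substring(0, colon);
        String value = s.substring(colon + 1);
        if (containsWhitespace(algorithm) || containsWhitespace(value)) {
            return null;
        }
        return new WarcDigest(algorithm, value);
    }

    private static boolean containsWhitespace(String s) {
        return s.chars().anyMatch(Character::isWhitespace);
    }
}
```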
Fixed some more header validation.

Added some matrix checks.

Added some Content-Type and segment-number checks.

Split the error types into more types with more meaningful names.

Added some more unit tests to cover most of the current functionality.

    ./WarcValidationError.java (+88, -0)
  … 13 more files in changeset.
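The rules below are the ones understood here from WARC 1.0. WARC-Segment-Number is a string of digits. It is mandatory in continuation records. Its value is 1 in the first segment and increases by one in each later segment. The sketch checks what can be decided from a single record. The class name and the convention of returning an error message (null when valid) are assumptions, not this repository's API.

```java
/** Hypothetical WARC-Segment-Number check; returns an error message, or null if acceptable. */
final class SegmentNumberCheck {
    private SegmentNumberCheck() {}

    static String check(String warcType, String segmentNumber) {
        boolean continuation = "continuation".equals(warcType);
        if (segmentNumber == null) {
            // A first segment also needs the field, but that is not knowable from one record alone.
            return continuation ? "continuation record without WARC-Segment-Number" : null;
        }
        if (segmentNumber.isEmpty() || !segmentNumber.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return "WARC-Segment-Number is not a digit string: " + segmentNumber;
        }
        long n;
        try {
            n = Long.parseLong(segmentNumber);
        } catch (NumberFormatException e) {
            return "WARC-Segment-Number out of range: " + segmentNumber;
        }
        if (n < 1) {
            return "WARC-Segment-Number must be at least 1";
        }
        if (continuation && n < 2) {
            return "a continuation record must have WARC-Segment-Number greater than 1";
        }
        return null;
    }
}
```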
Added detection of duplicate fields.

Finished some more unit tests.

    ./WarcErrorType.java (+32, -0)
  … 13 more files in changeset.
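Duplicate detection needs case-insensitive comparison, because header field names are matched without regard to case. It also needs an allow-list for fields that may legitimately repeat. Which fields those are depends on the spec version, so the sketch takes the set as a parameter; that set, and the DuplicateFieldCheck name, are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical duplicate-field detector. */
final class DuplicateFieldCheck {
    private DuplicateFieldCheck() {}

    /**
     * Returns the lower-cased names of fields that occur more than once.
     * Names in repeatable must already be lower-case and are never reported.
     */
    static Set<String> findDuplicates(List<String> fieldNames, Set<String> repeatable) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>(); // keeps first-detection order
        for (String name : fieldNames) {
            String key = name.toLowerCase(Locale.ROOT);
            if (!seen.add(key) && !repeatable.contains(key)) {
                duplicates.add(key);
            }
        }
        return duplicates;
    }
}
```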
Added some unit tests.

Added some more header parsing code.

Fixed a case error in date parsing and another in the magic identifier.

  … 4 more files in changeset.
Added an iterator to the parser.

Introduced myself to JUnit and wrote two small tests that compare the number of records with the expected number, using both the iterator and the nextRecord method.
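An iterator over a parser that already has a nextRecord method can be a thin look-ahead wrapper. The sketch assumes nextRecord returns null at end of input and may throw IOException. Both assumptions are encoded in the hypothetical RecordSource interface. Because Iterator methods cannot throw checked exceptions, I/O failures are rethrown as UncheckedIOException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical source: returns the next record, or null at end of input. */
interface RecordSource<T> {
    T nextRecord() throws IOException;
}

/** Adapts a nextRecord-style parser to java.util.Iterator. */
final class RecordIterator<T> implements Iterator<T> {
    private final RecordSource<T> source;
    private T pending;     // look-ahead record fetched by hasNext()
    private boolean done;  // set once the source has returned null

    RecordIterator(RecordSource<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (pending == null && !done) {
            try {
                pending = source.nextRecord();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            done = (pending == null);
        }
        return pending != null;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T record = pending;
        pending = null;
        return record;
    }
}
```

A record-count test in the spirit of the one described above would count iterations of a RecordIterator and compare the total with a loop over nextRecord on a fresh parser.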

The parser now validates almost all fields according to the spec.

Policy errors need their own class.

The warc parser now parses all fields in a simplistic way.
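A simplistic field parser can split each logical header line at its first colon. Splitting at the first colon keeps values that themselves contain colons, such as URIs, intact. The sketch rejects lines with no name or with whitespace in the name, but it does not enforce the full token grammar. WarcFieldParser is a hypothetical name.

```java
import java.util.AbstractMap;
import java.util.Map;

/** Hypothetical "Name: value" splitter for WARC header lines. */
final class WarcFieldParser {
    private WarcFieldParser() {}

    /** Returns (name, trimmed value), or null if the line has no usable field name. */
    static Map.Entry<String, String> parse(String line) {
        int colon = line.indexOf(':');
        if (colon <= 0) {
            return null; // no colon, or an empty name
        }
        String name = line.substring(0, colon);
        for (int i = 0; i < name.length(); i++) {
            if (Character.isWhitespace(name.charAt(i))) {
                return null; // a field name is a single token
            }
        }
        String value = line.substring(colon + 1).trim();
        return new AbstractMap.SimpleImmutableEntry<>(name, value);
    }
}
```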

WarcDateParser added. Other parsers were borrowed from the arc package.

    ./WarcDateParser.java (+65, -0)
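In WARC 1.0, WARC-Date is a UTC timestamp with second precision in the form 2006-09-19T17:20:24Z. Later versions allow more precision, which this sketch does not accept. java.time can parse this form strictly, so impossible dates such as February 30 are rejected. The literal T and Z are matched case-sensitively. WarcDates is a hypothetical name and is not the WarcDateParser added in this commit.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

/** Hypothetical strict parser for second-precision WARC-Date values. */
final class WarcDates {
    private WarcDates() {}

    // "uuuu" (proleptic year) is required for ResolverStyle.STRICT to accept the year field
    private static final DateTimeFormatter WARC_DATE =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss'Z'")
                    .withResolverStyle(ResolverStyle.STRICT);

    /** Returns the instant, or null if the value is not in the expected form. */
    static Instant parse(String value) {
        try {
            return LocalDateTime.parse(value, WARC_DATE).toInstant(ZoneOffset.UTC);
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```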
First commit.

CheckMagic and Version.

Primitive WARC field parser.

    ./WarcConstants.java (+139, -0)
    ./WarcRecord.java (+183, -0)
  … 1 more file in changeset.
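A magic-and-version check can look at the record's first line. It expects the literal WARC/ followed by digits, a dot, and digits (for example WARC/1.0), and returns the version string. Matching the magic case-sensitively is a policy choice made in this sketch. WarcVersion is a hypothetical name.

```java
/** Hypothetical check of a record's first line, e.g. "WARC/1.0". */
final class WarcVersion {
    private WarcVersion() {}

    private static final String MAGIC = "WARC/";

    /** Returns the version such as "1.0", or null if the magic or version is malformed. */
    static String parse(String firstLine) {
        if (firstLine == null || !firstLine.startsWith(MAGIC)) {
            return null; // case-sensitive: "warc/1.0" is rejected
        }
        String version = firstLine.substring(MAGIC.length());
        // Expect 1*DIGIT "." 1*DIGIT
        return version.matches("\\d+\\.\\d+") ? version : null;
    }
}
```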