Nicholas Clarke

Byterange was 1 byte too long. Job list does not call h3mon.init() any more, potential for long delay.
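
An off-by-one of this kind typically comes from treating an inclusive range end as exclusive. The commit does not say which range format is involved; assuming HTTP `Range` semantics, where both ends are inclusive, a minimal sketch looks like this. The class and method names are hypothetical and not taken from the code base.

```java
public class ByteRangeSketch {

    // Hypothetical helper, not part of the project.
    // Both ends of an HTTP byte range are inclusive, so a range covering
    // `length` bytes starting at `offset` ends at offset + length - 1.
    // Using offset + length as the end would request one byte too many.
    static String rangeHeader(long offset, long length) {
        if (offset < 0 || length <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and length > 0");
        }
        return "bytes=" + offset + "-" + (offset + length - 1);
    }

    public static void main(String[] args) {
        // Request the first 100 bytes of a resource.
        System.out.println(rangeHeader(0, 100));
    }
}
```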

Added callback support to read and modify the ARC/WARC headers before processing the payload.
    • -0
    • +24
    /jwat-arc/src/main/java/org/jwat/arc/ArcRecordParserCallback.java
Removed page size limit in H3 monitor. Do not show H3 monitor information for done or failed jobs.

Merge branch 'master' into NAS-2638

Button to fail a job when H3 has died unexpectedly. Minor code review changes. Larger HTTP parsing buffer in ArchiveExtractCDXJob.

Fixed bug in crawllog caching. Work on optimizing frontier queue viewer. Removed useless auto refresh from most pages.

Merge branch 'master' into NAS-2638

NAS-2638: Handle script errors in Scripting/FrontierQueue by showing stacktrace on failure. SLF4J/Logback currently not included in the assembled webapp.

NAS-2638: Merge and fix of NAS-2642. Edited some buttons and removed tomcat artifacts from WEB-INF/lib directory.

NAS-2638: Moved menu generation to template builder wrapper. Improved config by catching exception for invalid regexes.
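
Catching the exception for an invalid regex could look roughly like the sketch below. It assumes the config values are compiled with `java.util.regex.Pattern`; the class name, method name and fallback behaviour (log and skip the value) are hypothetical, since the commit does not describe them.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexConfigSketch {

    // Hypothetical helper, not part of the project.
    // Compiles a regex taken from configuration. An invalid expression is
    // reported and skipped instead of letting the exception propagate.
    static Optional<Pattern> compileConfigRegex(String regex) {
        try {
            return Optional.of(Pattern.compile(regex));
        } catch (PatternSyntaxException e) {
            System.err.println("Ignoring invalid regex '" + regex + "': " + e.getDescription());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(compileConfigRegex("job-\\d+").isPresent());
        System.out.println(compileConfigRegex("[unclosed").isPresent());
    }
}
```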

WEBDAN-269: Tweaked the menu.

NAS-2638: Tweaked zip artifact directory structure.

NAS-2638: Had to change the dependencies to include more since everything is mixed together.

NAS-2638: Also builds an H3 monitor artifact that uses tomcat embedded. H3 monitor resources split into separate class files. HTMLUtils modified to not be dependent on JSPOutputStream.

  1. … 8 more files in changeset.
Builds an artifact that uses tomcat embedded. Resources split into separate class files. HTMLUtils modified to not be dependent on JSPOutputStream.

  1. … 8 more files in changeset.
Added and fixed drop in hbase-phoenix ddl.

NAS-2638: Moved H3 remote monitor to separate module.

  1. … 44 more files in changeset.
NAS-2638: Moved H3 monitor classes to separate module.

  1. … 17 more files in changeset.
NAS-2641: Pagination now supports additional parameters in the page links.

Merge branch 'master' into staging

H3 monitor review follow-up and cleanup.

Improve support for empty files and errors at the end of (W)ARC files in the ArchiveParser/ArchiveParserCallback.
JWAT-89: Removed encodedwords use in HeaderLineParser. Both need to be refactored and it is not really useful.
CR-JWAS-33: Follow-up on review.
  1. … 59 more files in changeset.
JWAT-88: Unit test improved.
JWAT-88: Change so payload digest is not checked for WARC revisit and continuation records.
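
The JWAT-88 rule can be sketched as a simple check on the record type. This is only an illustration: the class and method are hypothetical and do not reflect how JWAT implements the check; the only thing assumed from the WARC format is that `revisit` and `continuation` are the `WARC-Type` values of the two record types named in the commit.

```java
public class PayloadDigestSketch {

    // Hypothetical helper, not part of JWAT.
    // Returns false for the two record types the commit excludes from
    // payload digest validation, true for every other record type.
    static boolean shouldCheckPayloadDigest(String warcType) {
        return !"revisit".equals(warcType) && !"continuation".equals(warcType);
    }

    public static void main(String[] args) {
        for (String type : new String[] {"response", "revisit", "continuation"}) {
            System.out.println(type + " -> " + shouldCheckPayloadDigest(type));
        }
    }
}
```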
JWAT-87: Improved detection of garbage at the end of (W)ARC files and unit tests of this. Also added unit tests testing empty (W)ARC files.
    • -0
    • +60
    /jwat-arc/src/test/resources/invalid-arcfile-record-then-garbage.arc
    • -0
    • +43
    /jwat-warc/src/test/resources/invalid-warcfile-record-then-garbage.warc
NAS-2610: Review followup.

Forgot to remove some dependencies on ArcHeader in the parser.