Changes

#159 (11-Jul-2022 10:43:59)

  1. Changing back to 10. Issue is with crawl log caching. (commit: 1ecfd3917eea680d0c84a1809b54b87104643555) — Colin Rosenthal (csr) / githubweb

#157 (11-Jul-2022 08:26:35)

  1. Fix backslash use with regex (commit: e56fe98e53c0e01fd885b5142fdc9a376a5ab3d8) — clara.wiatrowski / githubweb
  2. Report script error to GUI (commit: 99a977c0d2ef6a9b394ea035e74b7441be8b8836) — clara.wiatrowski / githubweb

#155 (12-May-2022 11:17:20)

  1. Added some fixes so getMetadataMapper works when caching is disabled. (commit: ab09f8ef5eb0c73702fe9598fedc9a6b92f197fc) — Colin Rosenthal (csr) / githubweb
  2. Testing new crawlrss (commit: c95f4b4e426fcf194116ae8a0e3ead5d2c26ecc0) — Colin Rosenthal (csr) / githubweb
  3. Excluding clashing dependency of httpclient (commit: 2d37bccb714e8981ceb811b94d904fb3d64234ce) — Colin Rosenthal (csr) / githubweb
  4. Added an exclusion to the assembly (commit: b228e2039e0220a767d11c8a6fdb0e6362c464d1) — Colin Rosenthal (csr) / githubweb

#152 (03-May-2022 12:01:05)

  1. Excluding clashing dependency of httpclient (commit: 2d37bccb714e8981ceb811b94d904fb3d64234ce) — Colin Rosenthal (csr) / githubweb

#148 (07-Feb-2022 08:43:20)

  1. NAS-2874 Increment loaded TLD counter (commit: 9ce76c4c90200e2aab6c837ae8da2ecc45ed58a6) — clara.wiatrowski / githubweb
  2. Increased wait time for GUI startup to 300 s in integration test. (commit: fadac574a4b23573135d738002f6a85d65576f02) — Colin Rosenthal (csr) / githubweb
  3. Increased wait time for GUI startup to 600 s in integration test. (commit: d4a533db3345d222697247bc62c0fdb2bbfb62c3) — Colin Rosenthal (csr) / githubweb
  4. Added more output on waiting (commit: 06e3f570f2b8080f70d019032ee743b51543a51a) — Colin Rosenthal (csr) / githubweb
  5. Added a suggested jacoco exclusion from (commit: 363bd0797a9485139c3b2b306d6a7c2c80a582d0) — Colin Rosenthal (csr) / githubweb
  6. Added a suggested jacoco exclusion from (commit: 021189494a52540ecd84b2f05faec4ead4f62a59) — Colin Rosenthal (csr) / githubweb
  7. Set a pageload timeout for selenium (commit: f217f6e44f9da92814982fdac739ad5cf5e66fc5) — Colin Rosenthal (csr) / githubweb
  8. Experimental removal of one jacoco (commit: d5689554f663f35c5dca4432acddf5e4439698de) — Colin Rosenthal (csr) / githubweb
  9. Commented out hadoop tests that require a cluster to function (commit: 7ffc31d8cf002ab00951c61c8ffa337e3a4823f5) — Colin Rosenthal (csr) / githubweb
  10. [maven-release-plugin] prepare release netarchivesuite-7.3 (commit: 21bc5d6b60808accb511deb6542151803e3fd283) — Colin Rosenthal (csr) / githubweb
  11. [maven-release-plugin] prepare for next development iteration (commit: ab6a59444fb018007fec0f5ca2b218a72398b382) — Colin Rosenthal (csr) / githubweb
  12. Added fallback behaviour if hdfs caching fails to cache file (commit: 76ac9cb9eb40bd9fcc9010d8f12d7e64e25ac443) — Colin Rosenthal (csr) / githubweb

#147 (28-Jan-2022 10:41:10)

  1. Revert "Switched field to search for domain in crawl log" (commit: 534e263a5f491acddaf2697a61d51df56cc6fda1) — Colin Rosenthal (csr) / githubweb
  2. Bumped heritrix version to our version for NAS 7.3 (commit: c60eaac037bb5099878ec4fe05e1d08ef5e43a21) — Colin Rosenthal (csr) / githubweb

#146 (25-Jan-2022 14:54:42)

  1. Fixed a tempfile name (commit: 31fc8d34137a414d5c746069437456712678d46b) — Colin Rosenthal (csr) / githubweb
  2. Added debugging to crawllog searching (commit: 6c7a5062550049c6f6aab16d58f275970c3f5ef1) — Colin Rosenthal (csr) / githubweb
  3. Switched field to search for domain in crawl log (commit: f0e2fec464f219bbccc5ad1a8c47bb5ab34b5f79) — Colin Rosenthal (csr) / githubweb

#145 (25-Jan-2022 12:43:35)

  1. Added some logging. (commit: 4f2a7081eba3dbf57dd74356caac68bf394bb4de) — Colin Rosenthal (csr) / githubweb
  2. Improved logging in PutFileEventHandler (commit: fa6ad587d2ca341a37f1c95a72cc699737b71052) — Colin Rosenthal (csr) / githubweb
  3. Added handling for IDENTIFY_TIMEOUT and correct handling of out-of-sync (commit: 0fe311646422afdbc3d46bebc0d550af56e558bd) — Colin Rosenthal (csr) / githubweb
  4. Changed expected application set in SystemTest to match new (commit: 9ea5aefda0f8b97159d90e98203f545b7c7600f7) — Colin Rosenthal (csr) / githubweb
  5. Changed log level. (commit: 2fad6ab352d1f932669f1c5b8f3c7350aff55071) — Colin Rosenthal (csr) / githubweb
  6. Ensure jobs are closed to prevent threadleak in invoking java process (commit: bf6398619466adaf8f019aae7210544afc6d142c) — Asger Askov Blekinge (abr) / githubweb
  7. Ensure Filesystem objects are closed after use (commit: 802c4d77c7232bf263fc1b0a534c0ee2944b83b1) — Asger Askov Blekinge (abr) / githubweb
  8. Include exit code in IOFailure exception (commit: 75bc6fef31051e97fa06da8e58c28c3d508346b7) — Asger Askov Blekinge (abr) / githubweb
  9. HadoopJobTools logs if the job fails (commit: 22d8f391af675451ab3dbef953113f2d77d968a3) — Asger Askov Blekinge (abr) / githubweb
  10. Changed log level. (commit: 6d4214fb0e32b897d1bddc72add08d79a5ed0dde) — Colin Rosenthal (csr) / githubweb
  11. GetMetadataMapper and cacheFile report progress to prevent a (commit: 934d4bb20c17b3e7ed28636af4164857e0fb7705) — Asger Askov Blekinge (abr) / githubweb
  12. Hadoop 3.3.1 as used in test and prod clusters (commit: 1b0383b08a490ff22a913d9888c0e19ae8298dcf) — Asger Askov Blekinge (abr) / githubweb
  13. dedupIndexer can now send progress info to hadoop and thus hopefully (commit: 936944d28b6add356f10837d7f4b1f5f3f8efa39) — Asger Askov Blekinge (abr) / githubweb
  14. Merged commit (commit: 7a085bb6f6c16325f007b5ff3614731fe928968e) — Colin Rosenthal (csr) / githubweb
  15. Fixed error introduced during merge (commit: 51e3eaacfe50ff91354213a63907835f73599123) — Colin Rosenthal (csr) / githubweb
  16. Fixed error in test spec (commit: 00f781947cfe78334b9723ae55fcfbcdf952cd2e) — Colin Rosenthal (csr) / githubweb
  17. Fixed error in test spec (commit: bc7b7cc0f5d8b3cd6778e6ed71771e56c635a7f4) — Colin Rosenthal (csr) / githubweb
  18. Explicitly create cache file when caching hdfs (commit: 34952e98ab34e3985d9171f85ee4128d6e7f8d29) — Asger Askov Blekinge (abr) / githubweb
  19. Modified FileResolver to return empty if http response code is not 200. (commit: cbc51994639305fe8a36746c6ba4c00492b3173e) — Colin Rosenthal (csr) / githubweb
  20. Fixed bitmag getfileids and some cleanup (commit: 520e4de5e267e3af2d40fa6e78803c15a33df510) — Colin Rosenthal (csr) / githubweb
  21. Writing direct to hdfs. (commit: 2ab46b46fece97462f8000d14ddcf3017c77dd2c) — Colin Rosenthal (csr) / githubweb
  22. Added direct output streaming from hdfs (commit: 480e7411841e986bc48e2bc9fa7d5f759f5eead7) — Colin Rosenthal (csr) / githubweb
  23. Rewritten GetFileIDsAction to use a new handler for each call. (commit: d354945a9730fe2b394f1cc434afefa39d4cf69e) — Colin Rosenthal (csr) / githubweb
  24. Fixed some issues with holding large hadoop result sets in memory (commit: c5d6c5d38bb0cfe52a15493ccde9fa24a07e6a8d) — Colin Rosenthal (csr) / githubweb

#142 (14-Sep-2021 14:23:05)

  1. Progress reporting in CDXMapper should prevent timeouts on big warc (commit: f4c54cddfe7440cc5024fee62d1feb6b057089d1) — Asger Askov Blekinge (abr) / githubweb