Changes

#31 (08-Aug-2022 13:31:59)

  1. Added some fixes so getMetadataMapper works when caching is disabled. (commit: ab09f8ef5eb0c73702fe9598fedc9a6b92f197fc) — Colin Rosenthal (csr) / githubweb
  2. Testing new crawlrss (commit: c95f4b4e426fcf194116ae8a0e3ead5d2c26ecc0) — Colin Rosenthal (csr) / githubweb
  3. Excluding clashing dependency of httpclient (commit: 2d37bccb714e8981ceb811b94d904fb3d64234ce) — Colin Rosenthal (csr) / githubweb
  4. Added an exclusion to the assembly (commit: b228e2039e0220a767d11c8a6fdb0e6362c464d1) — Colin Rosenthal (csr) / githubweb
  5. Fix backslash use with regex (commit: e56fe98e53c0e01fd885b5142fdc9a376a5ab3d8) — clara.wiatrowski / githubweb
  6. Delete all configs process (commit: a064796cbbdb99751763d85ace01f041a7bfd35c) — clara.wiatrowski / githubweb
  7. Report script error to GUI (commit: 99a977c0d2ef6a9b394ea035e74b7441be8b8836) — clara.wiatrowski / githubweb
  8. Upgrade database model (commit: 909341de18a58f46f100188110d0f61506448958) — clara.wiatrowski / githubweb
  9. Add and update totalBytesWritten (commit: c89a3a460eee1c732da3a4096c169ed2f42a067b) — clara.wiatrowski / githubweb
  10. Display totalBytesWritten in GUI (commit: 7a4044cbb9e83d149dd2291a92107ff10ab91632) — clara.wiatrowski / githubweb
  11. Removed two potential NPEs (commit: 214b354e830ce28d2bdea0d16704c1168b232997) — Colin Rosenthal (csr) / githubweb
  12. Distinguish queue types (commit: 810288f8f53177eb8f8f8e91ccf22467a754b099) — clara.wiatrowski / githubweb
  13. Update Danish wording (commit: a9d1464c12a7c08f9484cead7afee284764db8a4) — clara.wiatrowski / githubweb
  14. Update Danish wording (commit: c69421ee0e47bf1f31112f13e8fa887bf913892f) — clara.wiatrowski / githubweb
  15. Added a to-do related to NAS-2875 (commit: 85b7a3ac9e8761c879014a931f2f94fe07066be4) — Colin Rosenthal (csr) / githubweb
  16. Changed domain match to element 3 of crawl log line (commit: afd201569e56243da9f1fc5c88179806adb637a9) — Colin Rosenthal (csr) / githubweb
  17. Use crawl log field 11 for domain (commit: 530d443d1c093ced66f5fc473e516c6b5c0373f6) — Colin Rosenthal (csr) / githubweb
  18. Changing back to 10. Issue is with crawl log caching. (commit: 1ecfd3917eea680d0c84a1809b54b87104643555) — Colin Rosenthal (csr) / githubweb
  19. Removed last Windows bitarkiv from test setup (commit: 41bfb71dbef6f3b827ca963c193de5266a191653) — Colin Rosenthal (csr) / githubweb
  20. Added dependency to make intellij debugging easier (commit: 120e95f4208801032f84b690a936ed239298780d) — Colin Rosenthal (csr) / githubweb
  21. Created common method for filtering crawl logs by domain or regex (commit: c0b47406849900e9abe978d2039f36990858985c) — Colin Rosenthal (csr) / githubweb
  22. Commented out failing test - see https://sbforge.org/jira/browse/NAS-848 (commit: 40a83ba3f213e004d9c00e6478b437d23c298e8c) — Colin Rosenthal (csr) / githubweb
  23. Commented out test that sometimes fails on Jenkins. (commit: 40fea001615fca40b4eaf0429e6d925e5753792a) — Colin Rosenthal (csr) / githubweb
  24. Experiment with absolute-ordering (commit: 1d540343f571b3cd4234cc2f6d10d6149416de9d) — Colin Rosenthal (csr) / githubweb
  25. Marked GUI dependency as "provided". (commit: 0f0958484efc6bab2c64430a374a17b86732405a) — Colin Rosenthal (csr) / githubweb
  26. Added corrected behaviour when the seed is missing a scheme. (commit: 0d30f3765d08aa9bdfcbe3ba33429d7f6fb486f2) — Colin Rosenthal (csr) / githubweb
  27. Fix to NAS-2883 (commit: bfe20c8963fb50986ebacce856f636dee5b6f75f) — Colin Rosenthal (csr) / githubweb
  28. Upgraded heritrix version to latest snapshot (commit: a9da22a92f9dce69dd51ed76ffe8acbbe88da5a6) — Colin Rosenthal (csr) / githubweb
  29. Made sure to exclude unwanted older heritrix from crawl-rss import (commit: a01ffef132b3820aa9edf77b80ee36faa37933a7) — Colin Rosenthal (csr) / githubweb
  30. Update snapshot deps for wrapper and heritrix (commit: 6e3fd0260c15a0ccefa4dafe920fecf3bea774cf) — Colin Rosenthal (csr) / githubweb
  31. [maven-release-plugin] prepare release netarchivesuite-7.4 (commit: ff43b1ffbb7beea67078a7b5f95043677bbabeea) — Colin Rosenthal (csr) / githubweb
  32. [maven-release-plugin] prepare for next development iteration (commit: 395ff291e16c4aa5ecbeb0659df6fcd14ce1ff31) — Colin Rosenthal (csr) / githubweb
  33. Get sizeOnDisk from CrawlProgressMessage (commit: 5151b607a496449c1b856f1e1a30293c27c2e545) — clara.wiatrowski / githubweb

#30 (08-Aug-2022 13:30:54)

  1. Fix to NAS-2883 (commit: bfe20c8963fb50986ebacce856f636dee5b6f75f) — Colin Rosenthal (csr) / githubweb
  2. Upgraded heritrix version to latest snapshot (commit: a9da22a92f9dce69dd51ed76ffe8acbbe88da5a6) — Colin Rosenthal (csr) / githubweb
  3. Made sure to exclude unwanted older heritrix from crawl-rss import (commit: a01ffef132b3820aa9edf77b80ee36faa37933a7) — Colin Rosenthal (csr) / githubweb
  4. Update snapshot deps for wrapper and heritrix (commit: 6e3fd0260c15a0ccefa4dafe920fecf3bea774cf) — Colin Rosenthal (csr) / githubweb

#29 (12-Jul-2022 09:47:57)

  1. Marked GUI dependency as "provided". (commit: 0f0958484efc6bab2c64430a374a17b86732405a) — Colin Rosenthal (csr) / githubweb
  2. Added corrected behaviour when the seed is missing a scheme. (commit: 0d30f3765d08aa9bdfcbe3ba33429d7f6fb486f2) — Colin Rosenthal (csr) / githubweb

#28 (11-Jul-2022 15:52:06)

  1. Commented out failing test - see https://sbforge.org/jira/browse/NAS-848 (commit: 40a83ba3f213e004d9c00e6478b437d23c298e8c) — Colin Rosenthal (csr) / githubweb
  2. Commented out test that sometimes fails on Jenkins. (commit: 40fea001615fca40b4eaf0429e6d925e5753792a) — Colin Rosenthal (csr) / githubweb
  3. Experiment with absolute-ordering (commit: 1d540343f571b3cd4234cc2f6d10d6149416de9d) — Colin Rosenthal (csr) / githubweb

#27 (11-Jul-2022 14:10:53)

  1. Added dependency to make intellij debugging easier (commit: 120e95f4208801032f84b690a936ed239298780d) — Colin Rosenthal (csr) / githubweb
  2. Created common method for filtering crawl logs by domain or regex (commit: c0b47406849900e9abe978d2039f36990858985c) — Colin Rosenthal (csr) / githubweb

#26 (11-Jul-2022 13:40:45)

  1. Removed last Windows bitarkiv from test setup (commit: 41bfb71dbef6f3b827ca963c193de5266a191653) — Colin Rosenthal (csr) / githubweb

#24 (15-Mar-2022 14:51:03)

  1. [maven-release-plugin] prepare release netarchivesuite-7.3 (commit: 21bc5d6b60808accb511deb6542151803e3fd283) — Colin Rosenthal (csr) / githubweb
  2. [maven-release-plugin] prepare for next development iteration (commit: ab6a59444fb018007fec0f5ca2b218a72398b382) — Colin Rosenthal (csr) / githubweb
  3. Added fallback behaviour if hdfs caching fails to cache file (commit: 76ac9cb9eb40bd9fcc9010d8f12d7e64e25ac443) — Colin Rosenthal (csr) / githubweb

#22 (31-Jan-2022 08:30:58)

  1. Updated complete settings. (commit: ec8997c5169139e33d096a97306055628149c89c) — Colin Rosenthal (csr) / githubweb
  2. [maven-release-plugin] prepare release netarchivesuite-7.1 (commit: 1d53f8bcdc078160b94774ca5bceb31263ca8355) — Colin Rosenthal (csr) / githubweb
  3. [maven-release-plugin] prepare for next development iteration (commit: 6c38a07255284b81c271cd42aa9de132a29f810b) — Colin Rosenthal (csr) / githubweb
  4. Updated javadoc configuration (commit: f2fb2e833c885960de8308f986afb67f80b702b6) — Colin Rosenthal (csr) / githubweb
  5. Updated to support new protocol-agnostic server-ip attribute in heritrix (commit: 0533aefea0a2c4d8f8f997f2c02c1c752605c0bd) — Colin Rosenthal (csr) / githubweb
  6. Updated Heritrix snapshot version (commit: 2daa6a7e569f5c874201ba5881f271503a0ab31d) — Colin Rosenthal (csr) / githubweb
  7. Replaced transitive dnsjava with explicit import (commit: 3dec86c2b13056e8865abf04409afdc182c6912c) — Colin Rosenthal (csr) / githubweb
  8. Replaced transitive dnsjava with explicit import (completely purged now) (commit: 4100b6a03522368c5e229a9a893d7458c72b7a55) — Colin Rosenthal (csr) / githubweb
  9. Fixed api calls to new dnsjava (commit: ea179c16682e4c885cc4cdd90c0cc388871d8348) — Colin Rosenthal (csr) / githubweb
  10. Attempt to auto-detect heritrix version (commit: 63321f17f5b76200f5a5264afff93bf3b2b30f92) — Colin Rosenthal (csr) / githubweb
  11. Added some logging. (commit: 17f6c88b5f3c98dc1894ae06840215c123c28e63) — Colin Rosenthal (csr) / githubweb
  12. Included webarchive-commons dependency at compile time. (commit: 8d8ffd4e664644d75a7e8d9678d9b52a24308b72) — Colin Rosenthal (csr) / githubweb
  13. Added auto-detection of heritrix version (commit: 508f3ba00225857d00214c828b210fb9ab8cf3a5) — Colin Rosenthal (csr) / githubweb
  14. Excluded je (commit: e52e94422088473e8eed44b5058ba8ac1d5d4515) — Colin Rosenthal (csr) / githubweb
  15. Removed duplicate dependency (commit: 77c3130494c64b3babe8e5c1194b0334cd5a53e2) — Colin Rosenthal (csr) / githubweb
  16. Log warning if H3 version cannot be found automatically. (commit: e1e2cc4d1d52ca6be64a7fbd497ad36bf8db5853) — Colin Rosenthal (csr) / githubweb
  17. Update javadoc. (commit: 67531364d6acdb0486623fe78ff9b07bf74c4afa) — Colin Rosenthal (csr) / githubweb
  18. New complete settings (commit: f821e1f2e5722b3d341b2a3f470271698faca822) — Colin Rosenthal (csr) / githubweb
  19. Fixed minicluster test cases to not use hdfs caching. (commit: fb8bc4311890781df75ab50013d3da5f0a2cba7e) — Colin Rosenthal (csr) / githubweb
  20. Fixed additional minicluster test cases with missing configuration param (commit: e5f7d9d8e19d4724ce0f3e6d13c3376038984b00) — Colin Rosenthal (csr) / githubweb
  21. Updated the Heritrix version to our matching release 3.4.0-NAS-7.2 (commit: c9a3b57feb5ce65667b014d87b1af6e1dd65473d) — Colin Rosenthal (csr) / githubweb
  22. [maven-release-plugin] prepare release netarchivesuite-7.2 (commit: 5451945661aee7d11384d88bfb27e70fd2ffc021) — Colin Rosenthal (csr) / githubweb
  23. [maven-release-plugin] prepare for next development iteration (commit: 77d8764f3d78690ee5759d2536d5e3688c91bab8) — Colin Rosenthal (csr) / githubweb
  24. Removed jar scanning as an experiment (commit: 808014cd265f0e4d4030db8fd8d075329328cd21) — Colin Rosenthal (csr) / githubweb
  25. Progress reporting in CDXMapper should prevent timeouts on big warc (commit: f4c54cddfe7440cc5024fee62d1feb6b057089d1) — Asger Askov Blekinge (abr) / githubweb
  26. Added some logging. (commit: 4f2a7081eba3dbf57dd74356caac68bf394bb4de) — Colin Rosenthal (csr) / githubweb
  27. Improved logging in PutFileEventHandler (commit: fa6ad587d2ca341a37f1c95a72cc699737b71052) — Colin Rosenthal (csr) / githubweb
  28. Added handling for IDENTIFY_TIMEOUT and correct handling of out-of-sync (commit: 0fe311646422afdbc3d46bebc0d550af56e558bd) — Colin Rosenthal (csr) / githubweb
  29. Changed expected application set in SystemTest to match new (commit: 9ea5aefda0f8b97159d90e98203f545b7c7600f7) — Colin Rosenthal (csr) / githubweb
  30. Changed log level. (commit: 2fad6ab352d1f932669f1c5b8f3c7350aff55071) — Colin Rosenthal (csr) / githubweb
  31. Ensure jobs are closed to prevent threadleak in invoking java process (commit: bf6398619466adaf8f019aae7210544afc6d142c) — Asger Askov Blekinge (abr) / githubweb
  32. Ensure Filesystem objects are closed after use (commit: 802c4d77c7232bf263fc1b0a534c0ee2944b83b1) — Asger Askov Blekinge (abr) / githubweb
  33. Include exit code in IOFailure exception (commit: 75bc6fef31051e97fa06da8e58c28c3d508346b7) — Asger Askov Blekinge (abr) / githubweb
  34. HadoopJobTools logs if the job fails (commit: 22d8f391af675451ab3dbef953113f2d77d968a3) — Asger Askov Blekinge (abr) / githubweb
  35. Changed log level. (commit: 6d4214fb0e32b897d1bddc72add08d79a5ed0dde) — Colin Rosenthal (csr) / githubweb
  36. GetMetadataMapper and cacheFile report progress to prevent a (commit: 934d4bb20c17b3e7ed28636af4164857e0fb7705) — Asger Askov Blekinge (abr) / githubweb
  37. Hadoop 3.3.1 as used in test and prod clusters (commit: 1b0383b08a490ff22a913d9888c0e19ae8298dcf) — Asger Askov Blekinge (abr) / githubweb
  38. dedupIndexer can now send progress info to hadoop and thus hopefully (commit: 936944d28b6add356f10837d7f4b1f5f3f8efa39) — Asger Askov Blekinge (abr) / githubweb
  39. Merged commit (commit: 7a085bb6f6c16325f007b5ff3614731fe928968e) — Colin Rosenthal (csr) / githubweb
  40. Fixed error introduced during merge (commit: 51e3eaacfe50ff91354213a63907835f73599123) — Colin Rosenthal (csr) / githubweb
  41. Fixed error in test spec (commit: 00f781947cfe78334b9723ae55fcfbcdf952cd2e) — Colin Rosenthal (csr) / githubweb
  42. Fixed error in test spec (commit: bc7b7cc0f5d8b3cd6778e6ed71771e56c635a7f4) — Colin Rosenthal (csr) / githubweb
  43. Explicitly create cache file when caching hdfs (commit: 34952e98ab34e3985d9171f85ee4128d6e7f8d29) — Asger Askov Blekinge (abr) / githubweb
  44. Modified FileResolver to return empty if http response code is not 200. (commit: cbc51994639305fe8a36746c6ba4c00492b3173e) — Colin Rosenthal (csr) / githubweb
  45. Fixed bitmag getfileids and some cleanup (commit: 520e4de5e267e3af2d40fa6e78803c15a33df510) — Colin Rosenthal (csr) / githubweb
  46. Writing direct to hdfs. (commit: 2ab46b46fece97462f8000d14ddcf3017c77dd2c) — Colin Rosenthal (csr) / githubweb
  47. NAS-2874 Increment loaded TLD counter (commit: 9ce76c4c90200e2aab6c837ae8da2ecc45ed58a6) — clara.wiatrowski / githubweb
  48. Added direct output streaming from hdfs (commit: 480e7411841e986bc48e2bc9fa7d5f759f5eead7) — Colin Rosenthal (csr) / githubweb
  49. Rewritten GetFileIDsAction to use a new handler for each call. (commit: d354945a9730fe2b394f1cc434afefa39d4cf69e) — Colin Rosenthal (csr) / githubweb
  50. Fixed some issues with holding large hadoop result sets in memory (commit: c5d6c5d38bb0cfe52a15493ccde9fa24a07e6a8d) — Colin Rosenthal (csr) / githubweb
  51. Fixed a tempfile name (commit: 31fc8d34137a414d5c746069437456712678d46b) — Colin Rosenthal (csr) / githubweb
  52. Added debugging to crawllog searching (commit: 6c7a5062550049c6f6aab16d58f275970c3f5ef1) — Colin Rosenthal (csr) / githubweb
  53. Switched field to search for domain in crawl log (commit: f0e2fec464f219bbccc5ad1a8c47bb5ab34b5f79) — Colin Rosenthal (csr) / githubweb
  54. Revert "Switched field to search for domain in crawl log" (commit: 534e263a5f491acddaf2697a61d51df56cc6fda1) — Colin Rosenthal (csr) / githubweb
  55. Bumped heritrix version to our version for NAS 7.3 (commit: c60eaac037bb5099878ec4fe05e1d08ef5e43a21) — Colin Rosenthal (csr) / githubweb
  56. Increased wait time for GUI startup to 300 s in integration test. (commit: fadac574a4b23573135d738002f6a85d65576f02) — Colin Rosenthal (csr) / githubweb
  57. Increased wait time for GUI startup to 600 s in integration test. (commit: d4a533db3345d222697247bc62c0fdb2bbfb62c3) — Colin Rosenthal (csr) / githubweb
  58. Added more output on waiting (commit: 06e3f570f2b8080f70d019032ee743b51543a51a) — Colin Rosenthal (csr) / githubweb
  59. Added a suggested jacoco exclusion from (commit: 363bd0797a9485139c3b2b306d6a7c2c80a582d0) — Colin Rosenthal (csr) / githubweb
  60. Added a suggested jacoco exclusion from (commit: 021189494a52540ecd84b2f05faec4ead4f62a59) — Colin Rosenthal (csr) / githubweb
  61. Set a pageload timeout for selenium (commit: f217f6e44f9da92814982fdac739ad5cf5e66fc5) — Colin Rosenthal (csr) / githubweb
  62. Experimental removal of one jacoco (commit: d5689554f663f35c5dca4432acddf5e4439698de) — Colin Rosenthal (csr) / githubweb
  63. Commented out hadoop tests that require a cluster to function (commit: 7ffc31d8cf002ab00951c61c8ffa337e3a4823f5) — Colin Rosenthal (csr) / githubweb

#21 (06-Jul-2021 09:13:03)

  1. https://sbforge.org/jira/browse/NAS-2859 (commit: 5aa217c09975ae120744599f93eb99979658c293) — apre / githubweb
  2. Update OnbFreeSpaceProvider.java (commit: aed496e2550fe60a499b747c74dff74db965dd57) — noreply / githubweb
  3. Update OnbFreeSpaceProvider.java (commit: f4b2bc107c333408155e76b78985e4ed2d395790) — noreply / githubweb
  4. Update OnbFreeSpaceProvider.java (commit: 431ffbf03c51e065068eef0b8f70164a0d92f1a8) — noreply / githubweb
  5. Actually want to write requests and metadata by default in tests! (commit: 3ed813811e34e2d2d86d7dfede9d9ea5a318328f) — Colin Rosenthal (csr) / githubweb
  6. Updated version to a unique name (commit: e3b328a3cca883f6396a2955ef28bb0e2c7d2300) — Colin Rosenthal (csr) / githubweb
  7. Quick fix to NARK-1819 (commit: 7a4be0771bb6221cfc11689154393b1536dbb940) — Colin Rosenthal (csr) / githubweb
  8. non-function arcrep for use in bitmag development (commit: 143645020442b9cbaa453401829ad7fcbeb1e11f) — Colin Rosenthal (csr) / githubweb
  9. poms updated with Hadoop and basic settings for it added in (commit: 1360e402889d881aee7e6a964c7847f2a96de0b5) — Rasmus Bohl Kristensen (rbkr) / githubweb
  10. Small changes to settings (commit: a85a3ad520234cf8cedeeef058911ad47c304474) — Rasmus Bohl Kristensen (rbkr) / githubweb
  11. Can now at least work with local bitmag, it seems (commit: 91bdce496fa2ebe20ee0eeb8ce97528a868505f1) — Rasmus Bohl Kristensen (rbkr) / githubweb
  12. FileNameHarvester now grabs list of files directly from bitmag. Added (commit: 6824e324e6ccd74a10740790c11968bd5803a312) — Rasmus Bohl Kristensen (rbkr) / githubweb
  13. Indexing through hadoop instead of batch should now work for WARC files (commit: 982056795b045ed339987660c2d13dd7f41d1073) — Rasmus Bohl Kristensen (rbkr) / githubweb
  14. Changes from review (commit: a4871eadd0dcfd71fe3bda54170c5f911ae3da88) — Rasmus Bohl Kristensen (rbkr) / githubweb
  15. Fixed dependency conflict with hadoop-client package and finished (commit: e8735f0b8fa3800b012a7928e369cf358b5248d8) — Rasmus Bohl Kristensen (rbkr) / githubweb
  16. Refactored Bitrepository to a singleton. (commit: e494e8d0482fde2bc44ee840b4e675dac00a43d5) — Colin Rosenthal (csr) / githubweb
  17. Small logging changes (commit: c091c483521bf77ea95b7450dd0aee6449f5ccac) — Rasmus Bohl Kristensen (rbkr) / githubweb
  18. Bitrepository class changes (commit: 9a39a4b2aacdc9283f4c34b080693ca5a1ddfe92) — Rasmus Bohl Kristensen (rbkr) / githubweb
  19. Dependency fix to avoid logging loop and small logging changes (commit: a709b310c9f6af51c6312e3ed3fec73a912e2622) — Rasmus Bohl Kristensen (rbkr) / githubweb
  20. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: 68ab4244669d4e8d7847001c179f62cb019cacc1) — Colin Rosenthal (csr) / githubweb
  21. [maven-release-plugin] prepare for next development iteration (commit: 16147d5dddfd034ad243da25c911a9fa1e4d53d3) — Colin Rosenthal (csr) / githubweb
  22. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: b246f40d31190967ee84f51cf54e73def89c6737) — Colin Rosenthal (csr) / githubweb
  23. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: e97567d8b0cf594e4ba5ac10d3d7d8449adc0cc0) — Colin Rosenthal (csr) / githubweb
  24. [maven-release-plugin] prepare for next development iteration (commit: 974b3a9a687aca6fd6fba84828f84f0699ff8078) — Colin Rosenthal (csr) / githubweb
  25. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: df51498918198325ccd2b687dde5c09872ddcc1b) — Colin Rosenthal (csr) / githubweb
  26. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: 597dca6302626d2eb975dbd7b6e8f7bf8e4dfe17) — Colin Rosenthal (csr) / githubweb
  27. [maven-release-plugin] prepare for next development iteration (commit: d7f4a80b29070e594c26a17a7087d4fff502b8ea) — Colin Rosenthal (csr) / githubweb
  28. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: c8b3c3a9a215db98a3e07c327593f73b93d531ea) — Colin Rosenthal (csr) / githubweb
  29. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: d71699a62c3a45657e4faf14519b97ba71593b80) — Colin Rosenthal (csr) / githubweb
  30. [maven-release-plugin] prepare for next development iteration (commit: d2fae1846cd326ddf93451ef5d9f15ecbe4f6a73) — Colin Rosenthal (csr) / githubweb
  31. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: 185f170ccde75314ccc6dfec0939d2390c459824) — Colin Rosenthal (csr) / githubweb
  32. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: 3ca4bd51ae53845b7e6867d637188d4b866f17a6) — Colin Rosenthal (csr) / githubweb
  33. [maven-release-plugin] prepare for next development iteration (commit: 3208775980a139629d0de3a5c1c9859f2ee21543) — Colin Rosenthal (csr) / githubweb
  34. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: f67005308b68ec48d71838bc2efe4d7a9ee38a07) — Colin Rosenthal (csr) / githubweb
  35. Fixed issue with release deployment. (commit: 5d81a39a5881eb2f494a517ed2c7615929670b5e) — Colin Rosenthal (csr) / githubweb
  36. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: dff365105289bbe62558d6dafa673690caf7d153) — Colin Rosenthal (csr) / githubweb
  37. [maven-release-plugin] prepare for next development iteration (commit: 6fcd724e7c8c86b5edb57b79e74e0c2a202101e1) — Colin Rosenthal (csr) / githubweb
  38. WarcRecordClient.java and ApacheClientReaderFactory.java in (commit: d71da3e20c6df7f04c793f8b55f0fc6a028286d1) — Peter Christiansen (pech) / githubweb
  39. WarcRecordClient get and getFile changed (commit: 2b5e8b16b240a009f9ae1ee456350d70aaa821ac) — Peter Christiansen (pech) / githubweb
  40. Possible partial solution (commit: c81836a89060b2194e91860f82947d17a27fe8dc) — Colin Rosenthal (csr) / githubweb
  41. Fixed data file and added some documentation (commit: 26339a4799be4c7ecff361b2be29861d8af2e2f8) — Colin Rosenthal (csr) / githubweb
  42. Added some integration tests for indexing on hadoop (commit: 96dfe55073b04b9cab36dbc4607c7496c877c2fd) — Colin Rosenthal (csr) / githubweb
  43. Removed the test which was less like the anticipated prod architecture (commit: 1ac65bb88c41f4f60214f728145a0c39ad035d46) — Colin Rosenthal (csr) / githubweb
  44. Tidied up the hadoop/cdx integration test (commit: b8eaa110fb2b6f1b8e2530776aa5c8d3e901273c) — Colin Rosenthal (csr) / githubweb
  45. Added an integration test for WarcRecordClient (commit: 64ac9178de086347e9335ba4b98316370ab699b8) — Colin Rosenthal (csr) / githubweb
  46. Added Readme file in empty directory (commit: c4da8b152e8fa7f10de0bd1fb48ed0868b7664eb) — Colin Rosenthal (csr) / githubweb
  47. Added Readme file in empty directory (commit: 96b73b68fa7e1301ab0328df8e0210d56a735c52) — Colin Rosenthal (csr) / githubweb
  48. Added a hdfs setting that seems relevant (commit: 60cd273673637ba210763dda8b107a7b96bef508) — Colin Rosenthal (csr) / githubweb
  49. WarcRecord fixes for WarcRecordClientTest and Tester (commit: 961cff06f0e4c6241fc550d476add1739b142f7f) — Peter Christiansen (pech) / githubweb
  50. Error fix (commit: 3cce599a2ccae8aa6ba1c26c75a33cd893ca6220) — Peter Christiansen (pech) / githubweb
  51. Made method for indexing with Hadoop that assumes direct access to input (commit: b227515615b8ca0bd8c8fe2fd34be189679549c8) — Rasmus Bohl Kristensen (rbkr) / githubweb
  52. Dedup indexing (commit: f5508c47ebc69bc946ff16ca87218c949a59508a) — Colin Rosenthal (csr) / githubweb
  53. latest from pc (commit: 9cb0209af9f385e3527307c1dd08c4a78b588c4a) — Peter Christiansen (pech) / githubweb
  54. Moved getWarc from constructor to get (commit: 4e00feebd87d28c227a323907ab83cdaca0e143d) — Peter Christiansen (pech) / githubweb
  55. Code-maturation for cdx-indexing (commit: c87a69b57c851c9d90e12982a344da40f614bc74) — Colin Rosenthal (csr) / githubweb
  56. URI corrected to include filename Not yet robust for files not in gzip (commit: 3148c07477ae875b0e29021e146090e7487312f7) — Peter Christiansen (pech) / githubweb
  57. Hardcoded finName for testing (commit: dda3889ee6740e4306cf84f8e0b7bbee557ee8ee) — Peter Christiansen (pech) / githubweb
  58. Hardcoded finName for testing (commit: dee3eaab03e37446ffd1be94b1783193a68bba4d) — Peter Christiansen (pech) / githubweb
  59. Attempt to avoid double-indexing (commit: e1c23281deb8bd3055c9f85508c5076076f37c32) — Colin Rosenthal (csr) / githubweb
  60. Now passes integration test. (commit: 65fd5e068a37d7040f4ef741dd35347d00136ce5) — Colin Rosenthal (csr) / githubweb
  61. Now returns correct record. (commit: 671d7f64ec0db29802bba897eddfdce56d026b49) — Colin Rosenthal (csr) / githubweb
  62. After a little cleanup (commit: dd9de321b98a8635ba8dd1948152469046b3985f) — Peter Christiansen (pech) / githubweb
  63. Initial work on FileResolver (commit: d411beb865c7dace665ae2709953857503a37c27) — Colin Rosenthal (csr) / githubweb
  64. After a little more cleanup, but before logs (commit: e7504646b6f369f362a96d9ddcf01e4df7266323) — Peter Christiansen (pech) / githubweb
  65. Added hadoop job for getting metadata lines from archive files and an (commit: 553e20659df3bb62c6d121d50c6effa3fc8947e9) — Rasmus Bohl Kristensen (rbkr) / githubweb
  66. latest update (commit: 5763a813e2ec0f0eb1e8ad50cefcafc4767f0455) — Peter Christiansen (pech) / githubweb
  67. Added filehandling for GetMetadataArchiveMapper and small touch ups (commit: 24aaecbc74299fa9fda9191dfe510977aa027b8f) — Rasmus Bohl Kristensen (rbkr) / githubweb
  68. added null response if http statuscode is not 200 (commit: 4804a0787d97a2a9c3802edb716d3d3c78753259) — Peter Christiansen (pech) / githubweb
  69. removed printlns and added logging for http exception (commit: db425bdcd25a943fdf3c59f741fb5eeb04da82b1) — Peter Christiansen (pech) / githubweb
  70. Added pattern-matching method to file-resolver (commit: 57b380f2f8c289544404aecbb81f6cef8a084274) — Colin Rosenthal (csr) / githubweb
  71. Small refactor of ArchiveFile/HadoopUtils, few touch ups and started on (commit: 0d880c6017572102b4bf24e60d87ea35a84e2470) — Rasmus Bohl Kristensen (rbkr) / githubweb
  72. added test methods for archive files and negative testing (commit: 085be39423a4881b64793f1328a73ba0286a3f3f) — Peter Christiansen (pech) / githubweb
  73. Changed test to use paths relative to module root (commit: 0602fcb9458cfa655d0cb39f98d004e720bc9e42) — Colin Rosenthal (csr) / githubweb
  74. Added tests (commit: ec684feb1b69f99952e68dd4e366289b45710ed6) — Peter Christiansen (pech) / githubweb
  75. test corrections excludes .gz (commit: aca94d36b025e95458e0f718bfd83b9d6876545a) — Peter Christiansen (pech) / githubweb
  76. Added a conf flag to switch between standard indexing and dedup indexing (commit: 2e2173b833c5c5ddd879e7548b96d875a06353a1) — Colin Rosenthal (csr) / githubweb
  77. 'Start' of https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1970 (commit: d52a6bfda1ed72ad4fc125356ae274f18e0de8c6) — Rasmus Bohl Kristensen (rbkr) / githubweb
  78. Tiny settings change for NARK-1882 review (commit: 27a3d95258902008eda3d450c7261e3d694a4c10) — Rasmus Bohl Kristensen (rbkr) / githubweb
  79. Integration of Hadoop dedup indexing with GetMetadataArchiveMapper now (commit: ca2c62d474caf14d80e8e1e8f3970a4582e84672) — Rasmus Bohl Kristensen (rbkr) / githubweb
  80. Cleaned up a few things in RawMetadataCache and refactored HadoopUtils (commit: d10211a994d936309d076e87f0ae9699d99f385e) — Rasmus Bohl Kristensen (rbkr) / githubweb
  81. Squashed commit of the following: (commit: 8d9adc2b50d996dfaa544528b34a7b6b96947d1e) — Rasmus Bohl Kristensen (rbkr) / githubweb
  82. Added pattern configuration constants in GetMetadataMapper (commit: 9c130776d86bffa43170028cad724353348ec8dc) — Rasmus Bohl Kristensen (rbkr) / githubweb
  83. Cleanup according to review (commit: 5b3c5fbb6202b4be581afacd06be50dbe3e0deb2) — Peter Christiansen (pech) / githubweb
  84. Latest changes in getFile etc. (commit: 2c586ca6fd4c5b62a1df805f10f8bd1ac451b323) — Peter Christiansen (pech) / githubweb
  85. corrected (commit: 216184c3737c5b895f1eba74be84f5e4beab244e) — Peter Christiansen (pech) / githubweb
  86. A few final edits. (commit: e7fbf863417a855376671d375fbe0e4332b9f0ca) — Colin Rosenthal (csr) / githubweb
  87. 'Initial' commit (commit: c4553748c54d383e3f082fc10cf29b2ca4688ab3) — Rasmus Bohl Kristensen (rbkr) / githubweb
  88. First commit on arc_record branch (commit: d27f60647703e6333b0baa83ece150e2b1c238a5) — Peter Christiansen (pech) / githubweb
  89. Review https://sbforge.org/fisheye/cru/CR-NAS-385 changes (commit: 553c4afcb7ddf654b26cc4c9afa3c4cdc7c79197) — Rasmus Bohl Kristensen (rbkr) / githubweb
  90. Added testing (commit: 779f5a0dc37edfd41a0dd30d1dc4dd2167c8e13b) — Peter Christiansen (pech) / githubweb
  91. added .arc test-files (commit: dfffbac038e84ceb95a38d42024b6eb2fd557fdf) — Peter Christiansen (pech) / githubweb
  92. Fixed dependency problem and added simple application class to run (commit: 016eb5f4073e02288b2acd38f940e0002db0642f) — Rasmus Bohl Kristensen (rbkr) / githubweb
  93. Fixed get .arc-record with positive offset (commit: 66c495b026566a13658e39cf40e28120e91a31b8) — Peter Christiansen (pech) / githubweb
  94. Small refactor and implemented harvestRecentFilenames (commit: 199355bf63aba632f2dcb23c66de7778b794e97e) — Rasmus Bohl Kristensen (rbkr) / githubweb
  95. Javadoc added to few files https://sbforge.org/fisheye/cru/CR-NAS-387 (commit: 18336a02e91749175a3a19b4a9fafaa181af053c) — Rasmus Bohl Kristensen (rbkr) / githubweb
  96. More review changes https://sbforge.org/fisheye/cru/CR-NAS-387 (commit: 274b0fb8819d98558b17b3fae93c4885ebb6200c) — Rasmus Bohl Kristensen (rbkr) / githubweb
  97. minor changes tests (commit: fc4184449c23c26d5a014544479b6673b024164c) — Peter Christiansen (pech) / githubweb
  98. Initial functioning FileResolverRESTClient (commit: c60029793838e1421e0eb97c3ca8aee8d2b32149) — Colin Rosenthal (csr) / githubweb
  99. Removed some old bitmag classes (commit: 4674413716dbf6fc8e322c47ab8c2fb89c259664) — Rasmus Bohl Kristensen (rbkr) / githubweb
  100. Improved handling of try/catch logic (commit: 7a7cda6012b1ba4c6e7d0a1832e5c22b5133bb65) — Colin Rosenthal (csr) / githubweb
  101. Fix addRejectRule (commit: d4ccc959e652767475b8284b63394cceef8682b5) — clara.wiatrowski / githubweb
  102. Added some new tests and matured code ready for review (commit: b005ce2942cfab8a18e1389da311c32f2e5ac1ee) — Colin Rosenthal (csr) / githubweb
  103. Removed more old bitmag classes, refactored parts of some classes for (commit: dbf8703610bf19bfe044cf7f6f59a10710fdf7b4) — Rasmus Bohl Kristensen (rbkr) / githubweb
  104. Fixed some old imports that made the compiler angry (commit: 48d011475ef9dc0480986dbab35e8266e3207f4f) — Rasmus Bohl Kristensen (rbkr) / githubweb
  105. Fixed up FileResolverRESTClient for review and refactored code to enable (commit: 0a31340c22213cb7707a5188ec83ded5143c22ce) — Colin Rosenthal (csr) / githubweb
  106. Added more logging to FileNameHarvester (commit: b7120d70f3adf93ca6a12368f30897084a8a6295) — Rasmus Bohl Kristensen (rbkr) / githubweb
  107. Small refactor to make ArchiveFile's collectHadoopResults use (commit: 358b6977ce8fc98987a32327d57815bd30c0f34a) — Rasmus Bohl Kristensen (rbkr) / githubweb
  108. Latest bug fixes on loop testing (commit: f41a6bc3cf46940f48d0dbfca3121cd68679342a) — Peter Christiansen (pech) / githubweb
  109. Fixed bug with indexing threads sharing same filesystem instance (commit: d8c00a93115685b3357ec4202d7de727d76486fa) — Rasmus Bohl Kristensen (rbkr) / githubweb
  110. Undo of file-change permissions. (commit: 0019993d36df0e7d3be07594ded1edfe0c2e101b) — Colin Rosenthal (csr) / githubweb
  111. Fixed bug with indexing threads sharing same filesystem instance (commit: e9969a932d57a7e301ec95365219e3a662c3b0c6) — Colin Rosenthal (csr) / githubweb
  112. Fixed handling of returning used client to pool (commit: cba38820408a11fa56fe247f7b1c9eab668c6ba2) — Colin Rosenthal (csr) / githubweb
  113. Added cdx indexing for metadata files in CDXIndexer and proper testing (commit: 7000ae16f8936955299227260d601c5db7005b81) — Rasmus Bohl Kristensen (rbkr) / githubweb
  114. Got Hadoop replacement for ArchiveExtractCDXJob ready, refactored some (commit: 52c718231cc4eb4ba13f6161ace4f701aeb4b738) — Rasmus Bohl Kristensen (rbkr) / githubweb
  115. Added setting for new job input/output dirs and more logging (commit: 629996d5c70f12e916878cb48e1c234b932eedcd) — Rasmus Bohl Kristensen (rbkr) / githubweb
  116. Setting fix from review https://sbforge.org/jira/browse/NARK-1954 (commit: 5cb2bc46120e35bc9e1074ec1f135efd672d4b2a) — Rasmus Bohl Kristensen (rbkr) / githubweb
  117. Tidied up logic in client and tests (commit: 6996761e42e6f09685269eee0b656c2686cbc2d6) — Colin Rosenthal (csr) / githubweb
  118. Just save it for further improvement (commit: b527b008cd496ee73a0f32223f3a73a9936d4208) — Peter Christiansen (pech) / githubweb
  119. Review changes https://sbforge.org/fisheye/cru/CR-NAS-393, changes to (commit: bf5e943440e3760f4e25662ffcf603c3a78c0b2e) — Rasmus Bohl Kristensen (rbkr) / githubweb
  120. added FAILED check to JMSBitmagArcRepositoryClient.java (commit: ec70a97c6f06a04443c4997ce2c74087357f6807) — Peter Christiansen (pech) / githubweb
  121. FileResolverRESTClient now sends collectionId as an extra query (commit: 4327720c7867c82e5f3533a23657a5cd16149eba) — Colin Rosenthal (csr) / githubweb
  122. Fixed SimpleFileResolver, refactored how Hadoop jobs can be started, and (commit: 987c230dc013d45aaac9554df0392043965e56a0) — Rasmus Bohl Kristensen (rbkr) / githubweb
  123. Added collectionId parameter to WarcRecordClient (commit: 7aadc2501609e5b0ee956af0928ef4793af85ccb) — Colin Rosenthal (csr) / githubweb
  124. Added exactfilename parameter to FileResolverRESTClient. (commit: c8272eac493023dd1aaa9ebef671a1e3a42c3742) — Colin Rosenthal (csr) / githubweb
  125. uber.jar fixes and JMSBitmagArcRepositoryClient.java adds (commit: b60c7fc5255681dba809539cea8236e8763d2266) — Peter Christiansen (pech) / githubweb
  126. Added settings for new job and finished last refactoring parts (commit: dcee3b48afde25b3ab1ac42fa65adc64f672d91e) — Rasmus Bohl Kristensen (rbkr) / githubweb
  127. Made small fix/cleanup in crawl log mapper and added more documentation (commit: e052b35ecbe8d07c2a88e914d3202d863b57bf50) — Rasmus Bohl Kristensen (rbkr) / githubweb
  128. just to be sure (commit: f88f5411e3c1901a51e4140d0fb7898d27deb537) — Peter Christiansen (pech) / githubweb
  129. Squashed commit of the following: (commit: ab9b8860ca1f5323ca20cabf8a23c7ee01009bc8) — Rasmus Bohl Kristensen (rbkr) / githubweb
  130. Small changes from review https://sbforge.org/fisheye/cru/CR-NAS-395 (commit: 3d6bc39bc70440065a715acfe11a76e0575c5ea3) — Rasmus Bohl Kristensen (rbkr) / githubweb
  131. news (commit: a37a6b6d2a24db801c5491e63667dea5dbaf0bf3) — Peter Christiansen (pech) / githubweb
  132. unfinished code (commit: d854ab2d140a6b31223bf7aee4d7f4dda3861c44) — Peter Christiansen (pech) / githubweb
  133. Not finished 2 (commit: 765cdeea72e0987372c7979f6cb0b81981393d13) — Peter Christiansen (pech) / githubweb
  134. corrected for putFileAction (commit: 443afa016d24c5b802fae820712098a702c52480) — Peter Christiansen (pech) / githubweb
  135. Modified JMSBitmagArcRepositoryClient, PutfileAction and (commit: 539f2aef1903a6745252d0f4b8787c1dd3c1c282) — Peter Christiansen (pech) / githubweb
  136. small changes in PutFileAction and PutFileEventHandler (commit: 5896c19b108a9e79aea9e0f9490cc0442370ddef) — Peter Christiansen (pech) / githubweb
  137. latest (commit: ef0643a9605af26ce8d66d58423e802a22574056) — Peter Christiansen (pech) / githubweb
  138. Working version (commit: ccf8b537410437c4d657a06abc0f892d6dbe8950) — Peter Christiansen (pech) / githubweb
  139. newest version with warcRecordClient updates (commit: 9f2b875dd6b7ea22925f7d081b9b425b1f2fe8df) — Peter Christiansen (pech) / githubweb
  140. Cleaned up outcommenting (commit: d63a27a4dd72ccf5032917c76bd1ea022843349d) — Peter Christiansen (pech) / githubweb
  141. Added a default value for setting useBitmagHadoopBackend (commit: d4d44145f82d3a6d425472d62eed5dd070672e8f) — Colin Rosenthal (csr) / githubweb
  142. Squashed commit of the following: (commit: 9687194f6e849461945a3f75bdd3906f128d71c8) — Rasmus Bohl Kristensen (rbkr) / githubweb
  143. Removed bitmag entries reinvented by mistake (commit: 64c2cb0fe60ef0209fd609899cf3d76e2e67337a) — Peter Christiansen (pech) / githubweb
  144. Fixed duplicate code. (commit: c9dd56cca78bda6cf6a436949298f6426eccf6d7) — Colin Rosenthal (csr) / githubweb
  145. Removed old bitmag classes and remnants of it (commit: ad3aaf637af932564da7d15e83b75c02e9f7fb6f) — Rasmus Bohl Kristensen (rbkr) / githubweb
  146. modified copy-nas-and-heritrix.sh (commit: c20655c61b329ee9bfb6ff9fe9386d18630173fb) — Peter Christiansen (pech) / githubweb
  147. First attempt at a kill switch that returns an empty index for dedups (commit: 08d62e8104de4fe99d49b71b4b7933e41987bb56) — Colin Rosenthal (csr) / githubweb
  148. Second attempt using IndexReadyMessage (commit: 3199f61725d3badd01b8e26a1c0c295c7564cb09) — Colin Rosenthal (csr) / githubweb
  149. Added some logging (commit: aea04138a7ce030b1772456dee253c744b10453e) — Colin Rosenthal (csr) / githubweb
  150. Further attempt (commit: 701b2c647c674bc72091877fbd6ab2bd8e989ca9) — Colin Rosenthal (csr) / githubweb
  151. Further attempt using IndexReadyMessage (commit: ca9377f522312bdb2babb2c33e664becd1fbcf81) — Colin Rosenthal (csr) / githubweb
  152. Back to reply (commit: ddb5dd34fc6b691f0318b0965dc09b51f3da9f66) — Colin Rosenthal (csr) / githubweb
  153. Added a bit more logging. (commit: e4c67af253358e93ca41e108cfefff2072a6d9fa) — Colin Rosenthal (csr) / githubweb
  154. Removed potential error when requesting empty cache (commit: f6a7d91cbb5b2f3feaf42fc1b7e83ada5f2bb73a) — Colin Rosenthal (csr) / githubweb
  155. Clean-up (commit: f0f4a71edd0773f7fd35d30bfb5f80abe93057eb) — Colin Rosenthal (csr) / githubweb
  156. Removed dead code (commit: 83446ca81aa7274e519a5e4c1a61b1dc57000fe4) — Peter Christiansen (pech) / githubweb
  157. Removed copy-nas-and-heritrix.sh from version control (commit: c782cd44ff4cf17b0be0812f6dd8f5d45969f8b1) — Peter Christiansen (pech) / githubweb
  158. Basic CR-NAS-399 changes (commit: 55db2873e040bf7fdca20bab5b2fa6c4870c2b48) — Peter Christiansen (pech) / githubweb
  159. Fixed according to CFR-62389 (commit: 1f3ef2f191fa43a5cc9232c299e56e265643cb44) — Peter Christiansen (pech) / githubweb
  160. Improvements according to CR-NAS-399 (commit: 78f6044bd34cd5e1f9a066216c2dd45921d8e6fe) — Peter Christiansen (pech) / githubweb
  161. BitmagUtils.shutdown() and pillar check moved (commit: 633e7a019f3ce150f67872797f75cc3d57d3a162) — Peter Christiansen (pech) / githubweb
  162. Removed instance=new BitmagArcRepositoryClient() from constructor (commit: 3971b85059a5f68d7f270ad05885a5c2614241f2) — Peter Christiansen (pech) / githubweb
  163. Bit of refactoring and made SSL provider to work with https (commit: 8f03956481c07a5c8639a847a6a51a30a8827882) — Rasmus Bohl Kristensen (rbkr) / githubweb
  164. First attempt at a command-line metadata extraction job. (commit: 8183749da409edb2229dabb86b9e2fbaca8a182d) — Colin Rosenthal (csr) / githubweb
  165. Changed how the SSLContext is built to avoid trusting self-signed certs (commit: 18e8f959abf99005115a5e2521286242e98db82f) — Rasmus Bohl Kristensen (rbkr) / githubweb
  166. Fixed the error with closed hadoop file system (commit: 87397cff475747f2efdba4befe07276742918bc7) — Asger Askov Blekinge (abr) / githubweb
  167. Downgraded hadoop to stable 3.2.2 (commit: 2dd04f160af25e4c61fcdc14c697af66e40b3ed7) — Asger Askov Blekinge (abr) / githubweb
  168. Created an invoker-module to prevent the job from including all the (commit: fffd21b9e21b9572899722516c0c6e10a16db15c) — Asger Askov Blekinge (abr) / githubweb
  169. Package in libs (commit: 4ca12576421f97206a423b90978a9b684619ae7b) — Asger Askov Blekinge (abr) / githubweb
  170. Create FileSystem with newInstance and close it afterwards. DO NOT CLOSE (commit: b006660cc04ac3f6c2442dc0d09b74b6c017c9c9) — Asger Askov Blekinge (abr) / githubweb
  171. Added an extra sanity check in the run.sh script. (commit: dae7ccb17d1e55a49039a7388ff60797959fc609) — Colin Rosenthal (csr) / githubweb
  172. Added an extra line to show how to customise location of krb5.conf (commit: 6086fdeca97e33163c28c23062593b6f17f23b95) — Colin Rosenthal (csr) / githubweb
  173. Improved logging (commit: 6486dd243255f8bbf407a34207a76e5a4a27d331) — Colin Rosenthal (csr) / githubweb
  174. Attempted improvement of remote file handling of failures. (commit: 6724f3707ca7351efe7cedd055695d534c974fbb) — Colin Rosenthal (csr) / githubweb
  175. minor cleanup (commit: 5fc5716afe6a2532ad7a65e08660fda70ac3568e) — Peter Christiansen (pech) / githubweb
  176. Modified to support dynamic identification of the correct file-system (commit: dd68c05afd42a7a8beb6c10dde0a134b2a9b47a4) — Colin Rosenthal (csr) / githubweb
  177. Few clarification fixes to java doc (commit: 44da8bb849b06dc55392c01eaf7475563bc75313) — Rasmus Bohl Kristensen (rbkr) / githubweb
  178. Refactoring to make MetadataIndexingApplication closer to a reusable (commit: 48366f7a72262d6f1442b635b466a2b929b9bcd3) — Colin Rosenthal (csr) / githubweb
  179. Parametrised the script to make it more flexible. (commit: 048313b7a4b5d7b85e5aba5d9e31227a96008016) — Colin Rosenthal (csr) / githubweb
  180. Refactored to use login mechanism instead of doAs. (commit: dc8b3564c4d580f0a8078d58514296c11b01dbce) — Colin Rosenthal (csr) / githubweb
  181. Removed all unnecessary configuration overrides. (commit: e1c9cc6202e4f7da4fa340f9d9d84ab4f94ea957) — Colin Rosenthal (csr) / githubweb
  182. Initial version using fileresolver (commit: f212546a060f7271527d9d722308241d60e1b720) — Colin Rosenthal (csr) / githubweb
  183. Added default truststore settings (commit: d3522cfe5d45e6eeafeaf740119b5ef9bc7604d1) — Rasmus Bohl Kristensen (rbkr) / githubweb
  184. Added explicit jersey-server dep. to GUI. (commit: 06f48b57ed3030f29f067d2abae08d74bfab1f98) — Colin Rosenthal (csr) / githubweb
  185. Added a necessary filtering stage to match only current collection (commit: 5501fca44675886868cd1672489a9000f4a15c97) — Colin Rosenthal (csr) / githubweb
  186. Set fallback to environment name for collection (commit: 152eca7f4a888cbcf93654634c732545ce19c938) — Colin Rosenthal (csr) / githubweb
  187. Added skeletal getFile (commit: b89cc228ede40f449a714c9d22d93c22d475ede0) — Colin Rosenthal (csr) / githubweb
  188. Fix as vp creates empty toFile but bitmag requires non-existing toFile. (commit: e5a2e3f7c8f6ceb7c839aa104e38f39f7e300c6c) — Colin Rosenthal (csr) / githubweb
  189. Squashed commit of the following: (commit: 73e016f0e681e23ea9b50428fed236cbba4b706c) — Rasmus Bohl Kristensen (rbkr) / githubweb
  190. Tidying up for review. (commit: e68e482d4385d1ac0a0c8aac903055e1b23eaff3) — Colin Rosenthal (csr) / githubweb
  191. Quick attempt to enable hadoop in GUI (commit: 47d462cd57f8647b2162a7f8b2cd3a97d5fc8b42) — Colin Rosenthal (csr) / githubweb
  192. Upgraded guava version (commit: fe0375d5df7b63a2487714be99db0ac8a38005d9) — Colin Rosenthal (csr) / githubweb
  193. Sorting out separate inclusion of shaded jar. (commit: a00037a08b19e2a5d124a9feaeb239f755abad4a) — Colin Rosenthal (csr) / githubweb
  194. Remove "netarchivesuite" prefix from uber jar name. (commit: 2d76419cbe716ef7ea79a06723aff6b402b64677) — Colin Rosenthal (csr) / githubweb
  195. Improved logging on job creation (commit: ef7cc9ec201529ed39e1996abf4d939b9e91c518) — Colin Rosenthal (csr) / githubweb
  196. Forcing HadoopJobStrategy to use hdfs (commit: bafdb18e7a4b7667e765882e8a1205de182f3c91) — Colin Rosenthal (csr) / githubweb
  197. Forcing HadoopJobStrategy to use hdfs (commit: 6adbe99384b5754740a0cc4e6359ad3d1cc4e9ea) — Colin Rosenthal (csr) / githubweb
  198. Added harvester-core to uber jar (commit: 2214312e8db00065089f6ee1da3811cff20a181f) — Colin Rosenthal (csr) / githubweb
  199. Read hadoop truststore location from NAS settings (commit: 611487279c35ee59fc5ca71bf6c464d99f08be07) — Colin Rosenthal (csr) / githubweb
  200. Pom Jersey fix (commit: d8f038ad17589937dc4fdfd363f138c80098e822) — Rasmus Bohl Kristensen (rbkr) / githubweb
  201. Stuff (commit: 12036689cc9f582449065b5b4072c59a7fcee60d) — Rasmus Bohl Kristensen (rbkr) / githubweb
  202. Small guava pom change (commit: e25e3e3cc198dd3e350f4accd2ce268af0393687) — Rasmus Bohl Kristensen (rbkr) / githubweb
  203. Follow-up from code review (commit: 100475c5b0ecb5b09e6400ca0a404a79c82d1504) — Colin Rosenthal (csr) / githubweb
  204. Moved Kerberos logins (commit: d022a62c4c855acd8a042db1a240ea515a063d7f) — Rasmus Bohl Kristensen (rbkr) / githubweb
  205. Small fixes and revert (commit: ba7b6362e8dc63a0711bdc4a89c957ebca15a6bd) — Rasmus Bohl Kristensen (rbkr) / githubweb
  206. Readded Kerberos login to IndexRequestServer (commit: 9e06be74dd416fc3e0c6036fcc457796b4547e53) — Rasmus Bohl Kristensen (rbkr) / githubweb
  207. Added line 122 with casting (CleanupIF) (commit: 67d5ca5cad1db2c701d9bcf10046c9368895d4bc) — Peter Christiansen (pech) / githubweb
  208. Small job change for clarity in cluster job overview (commit: 7af86ebcd7c55acfb7a5244c9c85f5c52ca6764a) — Rasmus Bohl Kristensen (rbkr) / githubweb
  209. Outcommented TestCorrect, FileChecksumArchiveTester, (commit: 4319452c3343d6cc5aa64da31c9df337b5672688) — Peter Christiansen (pech) / githubweb
  210. Changed outcommenting to @Ignore (commit: f99ba354e51331301700cc3a95e044d7f53b9af3) — Peter Christiansen (pech) / githubweb
  211. collectionID setting fix to always default to env name when unset (commit: a114fff89a4046d8084f41be0a25f599dd785575) — Rasmus Bohl Kristensen (rbkr) / githubweb
  212. Small logging change (commit: 556502406169e61681d21e90a0425c9a8cd36ee6) — Rasmus Bohl Kristensen (rbkr) / githubweb
  213. Tests that fail locally are ignored (commit: ede844af57f06c03032ab97465a1294c881f4a5b) — Peter Christiansen (pech) / githubweb
  214. Added an intellij test configuration (commit: 45d07fd3b8861dcf7958f233ad42545b3436d77d) — Colin Rosenthal (csr) / githubweb
  215. Just some optimized imports and small stuff (commit: 48977bc912bcd2f4b9b44db06e65c04eb0281352) — Rasmus Bohl Kristensen (rbkr) / githubweb
  216. modified test config (commit: f03b993c4bd707d0ba75783892830364fd91a898) — Colin Rosenthal (csr) / githubweb
  217. Corrected internal versions (commit: d5e904f8602621fb085dc9d52d2bf6a31992b8d9) — Colin Rosenthal (csr) / githubweb
  218. Added hadoop-common as necessary (commit: 3cbe36d8da58ba1f65b23a4740b141a79886b53b) — Colin Rosenthal (csr) / githubweb
  219. More fixed versions (commit: e7b9a3383dc1ff2f5b4d6cfbd0ec7a8e3943d7ac) — Colin Rosenthal (csr) / githubweb
  220. switched line 123 with line 122 CleanupIF.. (commit: fd87b5a10b73ec8f3acbf23ad48445cd61ee6662) — Peter Christiansen (pech) / githubweb
  221. Made ArcRepositoryServer implement CleanupIF (commit: ee0494c514d833f4875829cb2f9ecb761ee94252) — Colin Rosenthal (csr) / githubweb
  222. Minimal fix to test if ssl works correctly (commit: c1acc3688c43b627668e1edcb6d89035145d9a67) — Rasmus Bohl Kristensen (rbkr) / githubweb
  223. Removed applications from test (commit: 08c4691699f10ee3dc467d168f4382d050510c5f) — Colin Rosenthal (csr) / githubweb
  224. Improved test logic (commit: 226e523673d58ad2acd92800489386d66a4200a9) — Colin Rosenthal (csr) / githubweb
  225. Unused imports and small line removals (commit: 84edae215c8f466a8002744284cb300e251d9882) — Rasmus Bohl Kristensen (rbkr) / githubweb
  226. Added an exclusion to prevent fatal runtime error. (commit: 6c2e5f7d039dd9bf50c0f1d2e67e57cc4f85778c) — Colin Rosenthal (csr) / githubweb
  227. Added another exclusion to prevent fatal runtime error. (commit: 11ffa7417d88fb549403d62de13144663bd70569) — Colin Rosenthal (csr) / githubweb
  228. Follow up to own review comments (commit: 681d1147f7824aa7a18804c4b1d9475ce9c26c19) — Colin Rosenthal (csr) / githubweb
  229. Filtering transient GUIWebServer from integration test (commit: dd8ac077ee484bdd1136c093400f60eeb4defd9d) — Colin Rosenthal (csr) / githubweb
  230. Fixed check in test of which type of instance is running. (commit: 5ed4647b549aaaa91eb8e1ee69866a0b04506b9b) — Colin Rosenthal (csr) / githubweb
  231. Debugging generalTest (commit: 55008cad5c80a4bcc652151d61713f63c93abe7c) — Colin Rosenthal (csr) / githubweb
  232. Returned old logic. VM running the integration test uses default (commit: 44a990fcc968cb5bd13f5d4b9ce9129c90d07f95) — Colin Rosenthal (csr) / githubweb
  233. new pom.xml for interfaces and annotations (commit: 1fd4d1d17ef45ac49b8f1d91504b390ecb47a70e) — Peter Christiansen (pech) / githubweb
  234. Fixed circular dependency (commit: b9239bc6e51ea9fafa652287c25a3a1a76306c30) — Peter Christiansen (pech) / githubweb
  235. Removed requirement for RequiresFileResolver (commit: fdc67c18e38c74545fbda1b6d0912b08d871deca) — Colin Rosenthal (csr) / githubweb
  236. Added compile-time groovy dep to make the groovy scripts look better in (commit: e46a498090f12ccd94a75a281e6a0ba2de28f96d) — Colin Rosenthal (csr) / githubweb
  237. Fixed tests and trying out @Ignore for failing mappers (commit: cb5f59cb8465f2eea112989e4ed878851c79ff87) — Rasmus Bohl Kristensen (rbkr) / githubweb
  238. Removed fileresolver exclusion from pom (commit: 9f33291822db7d835c4bb3169047d776da91472c) — Rasmus Bohl Kristensen (rbkr) / githubweb
  239. Explicitly exclude wrong je and httpclient from heritrix bundler (commit: 9a350b07cf6c4cb20cab1e75c3420bc0c0331ade) — Colin Rosenthal (csr) / githubweb
  240. Pom indentation and small fixes (commit: 12d18ec97e3c72d09addc0455c71dd17d96a1de5) — Rasmus Bohl Kristensen (rbkr) / githubweb
  241. Removed unused dependency which was causing a problem. (commit: 9a46907513468a7bdadafece89ceb3e223d5e65c) — Colin Rosenthal (csr) / githubweb
  242. Testing non-ignored Hadoop mapper test (commit: e7e250e1473b6da015c4eb19be7261be6db017be) — Rasmus Bohl Kristensen (rbkr) / githubweb
  243. Changed version to 7.0-SNAPSHOT (commit: 617efc9a7b921f06f00bcf8aee6e104253242f84) — Colin Rosenthal (csr) / githubweb
  244. Mappers enabled again as crawllog extraction works on Jenkins (commit: d9b2ca196fd40ddf7055d83fa454b94a71397a3c) — Rasmus Bohl Kristensen (rbkr) / githubweb
  245. Added super class for Hadoop mapper tests that handles setup of (commit: c8974baa0f7d26fd9024bfd18e513a112324dd0c) — Rasmus Bohl Kristensen (rbkr) / githubweb
  246. Fixed some post-merge stuff (commit: 7bb7162b0968ca5e0fdcb83f8b63551f8573c8f9) — Colin Rosenthal (csr) / githubweb
  247. Fixed casing of TESTX for StressTest to match bitmag collection (commit: 30f2835823f41d6f66644e748b133d6d1da0f2e0) — Colin Rosenthal (csr) / githubweb
  248. Cleanup of pom (commit: 4ab01eeafbda03d2c5f2bd4177c83a0a25a44376) — Colin Rosenthal (csr) / githubweb
  249. Removed hard-coded bitmag paths from start scripts (commit: b6ab8c0ac547e01fc616928ff52d2b70fa7e316b) — Colin Rosenthal (csr) / githubweb
  250. Restored httpclient 4.5.12 which was apparently necessary after all. (commit: e22863ff5dbff424e9996a765d3a3ebd6e423a3e) — Colin Rosenthal (csr) / githubweb
  251. Moved and renamed HadoopMapperTester and cleaned up its subclasses a bit (commit: 19da50ffad622e72f8795a09e117baf44bb5256f) — Rasmus Bohl Kristensen (rbkr) / githubweb
  252. Fixed FileNameHarvester throwing NullPointer on failed comms with Bitmag (commit: 191b3f4828a54bd30d183a745a9e05e7e85462c4) — Rasmus Bohl Kristensen (rbkr) / githubweb
  253. Made "TESTX" in automatic tests settable with -Dsystemtest.testx (commit: df4c1ca5e59fe558eb2c8a70157adb92c12a83c5) — Colin Rosenthal (csr) / githubweb
  254. [maven-release-plugin] prepare release netarchivesuite-7.0 (commit: 90520aea9d5e13775630dfd51d5942015feb6732) — Colin Rosenthal (csr) / githubweb
  255. [maven-release-plugin] prepare for next development iteration (commit: 9f8459ec50a9145c1922559b5ed7a407b550ca7b) — Colin Rosenthal (csr) / githubweb
  256. Fixed javadoc generation for most recent versions of maven plugin and (commit: e4dc5be95d6cc93ca9a6cfa1d91f4d79664f33de) — Colin Rosenthal (csr) / githubweb
  257. Bumped to heritrix version supporting sitemaps (commit: efb441d5c5471fa110a6465b96f9a86682b4a478) — Colin Rosenthal (csr) / githubweb
  258. Simple cache for metadata cdx records (commit: ad347f1a2b6d2f988a6a2acce8fd9e590241a05a) — Colin Rosenthal (csr) / githubweb
  259. Also cache crawl logs (commit: 1d15d6084d008ce72664e0c56bd68e1eabf14bd0) — Colin Rosenthal (csr) / githubweb
  260. Fixing record caching (commit: a2df94cb70d5a2577a7bf05f61790f6f4321284e) — Colin Rosenthal (csr) / githubweb
  261. Fixed deletion of harvested file on successful upload. (commit: 355cbb9655e82fbd799832099f1307d5c49bbd0d) — Colin Rosenthal (csr) / githubweb
  262. Added a default retry handler to the http(s) client. (commit: 9205ce706e71716126189dd7ca8474738613c996) — Colin Rosenthal (csr) / githubweb
  263. Made metadata cache directory configurable (commit: 70bf4d5757d431ba901d09081d02306d96459aa6) — Colin Rosenthal (csr) / githubweb
  264. Added store retries. (commit: 3a463182078ff63092d68f49a210567f2b255ea1) — Colin Rosenthal (csr) / githubweb
  265. Improved logging for upload failures. (commit: e0b51e738a1e2c2743b472b297405dd19541a090) — Colin Rosenthal (csr) / githubweb
  266. Added a retry wait to bitmag uploads. (commit: 6d47283d160b1b8dbe11635b21f8d58d38679b3e) — Colin Rosenthal (csr) / githubweb
  267. Set requestSentRetryEnabled to true in calls to (commit: a6926e367f5662441d852e3e4d0f69189a09130c) — Colin Rosenthal (csr) / githubweb
  268. Set requestSentRetryEnabled to true in calls to (commit: eca2e6f8e8291f94a917a05ef81db7ae272719d3) — Colin Rosenthal (csr) / githubweb
  269. Created a more aggressive retry handler. (commit: 90d20a892319d1ad816714fc2d722ed4653fe428) — Colin Rosenthal (csr) / githubweb
  270. Added some logging to retry handler (commit: 4673e7b2c09d6517d567f588cc3b44ca17df1c94) — Colin Rosenthal (csr) / githubweb
  271. Added some logging to retry handler ctor (commit: b575541e901a0432a89d4edd07861693ddc4952e) — Colin Rosenthal (csr) / githubweb
  272. Make sure we don't keep cached zero-length result files for crawl logs (commit: 5d54c34e35f0f5838993242982c9e19c5dd82644) — Colin Rosenthal (csr) / githubweb
  273. Add tooltip to bullet for job status (commit: 2622ea3eb86736e7efc62b04a14c932110b522ae) — clara.wiatrowski / githubweb
  274. Fix revisit issue write schema before digest (commit: ab2fb1c8b8cc3f72c8c22c164c951df3e15e01f9) — clara.wiatrowski / githubweb
  275. Remove locale for tooltip parameter (commit: 9539bc0797d4425b41988e18621f4c0807db9155) — clara.wiatrowski / githubweb
  276. Hadoop memory and core allocation utility functions (commit: a14e8103d5588c5760a5e6e7b69bb4d665fa8730) — Asger Askov Blekinge (abr) / githubweb
  277. Added utils for managing map-only uber-jobs (commit: 057052a055a46fa4ecb94e8398bc4b0279432a8f) — Colin Rosenthal (csr) / githubweb
  278. Added rethrows for better error handling from hadoop. (commit: 42fe34dbfa04f34062e56c107c65a71869fd3c04) — Colin Rosenthal (csr) / githubweb
  279. Added extra logging on file indexing (commit: 8cf9d33c4d4d8f96851390bd05f4f1d7d1a98995) — Colin Rosenthal (csr) / githubweb
  280. Set default hadoop queue names to "default" (commit: beb96074d734343d6314f5a6fe6f2114af9e999e) — Colin Rosenthal (csr) / githubweb
  281. Corrected a misinformative log statement (commit: e01be32a5147549f58d6d424c65a475ae46fcdd6) — Colin Rosenthal (csr) / githubweb

#18 (03-Mar-2021 08:25:34)

  1. Improved test logic (commit: 226e523673d58ad2acd92800489386d66a4200a9) — Colin Rosenthal (csr) / githubweb

#17 (03-Mar-2021 08:12:45)

  1. Removed applications from test (commit: 08c4691699f10ee3dc467d168f4382d050510c5f) — Colin Rosenthal (csr) / githubweb

#14 (02-Mar-2021 10:52:11)

  1. Made ArcRepositoryServer implement CleanupIF (commit: ee0494c514d833f4875829cb2f9ecb761ee94252) — Colin Rosenthal (csr) / githubweb