Changes

#31 (08-Aug-2022 13:39:06)

  1. Delete all configs process (commit: a064796cbbdb99751763d85ace01f041a7bfd35c) — clara.wiatrowski / detail
  2. Upgrade database model (commit: 909341de18a58f46f100188110d0f61506448958) — clara.wiatrowski / detail
  3. Add and update totalBytesWritten (commit: c89a3a460eee1c732da3a4096c169ed2f42a067b) — clara.wiatrowski / detail
  4. Display totalBytesWritten in GUI (commit: 7a4044cbb9e83d149dd2291a92107ff10ab91632) — clara.wiatrowski / detail
  5. Distinguish queues types (commit: 810288f8f53177eb8f8f8e91ccf22467a754b099) — clara.wiatrowski / detail
  6. update Danish wording (commit: a9d1464c12a7c08f9484cead7afee284764db8a4) — clara.wiatrowski / detail
  7. update Danish wording (commit: c69421ee0e47bf1f31112f13e8fa887bf913892f) — clara.wiatrowski / detail
  8. Added a to-do related to NAS-2875 (commit: 85b7a3ac9e8761c879014a931f2f94fe07066be4) — Colin Rosenthal (csr) / detail
  9. Changed domain match to element 3 of crawl log line (commit: afd201569e56243da9f1fc5c88179806adb637a9) — Colin Rosenthal (csr) / detail
  10. Use crawl log field 11 for domain (commit: 530d443d1c093ced66f5fc473e516c6b5c0373f6) — Colin Rosenthal (csr) / detail
  11. Changing back to 10. Issue is with crawl log caching. (commit: 1ecfd3917eea680d0c84a1809b54b87104643555) — Colin Rosenthal (csr) / detail
  12. Created common method for filtering crawl logs by domain or regex (commit: c0b47406849900e9abe978d2039f36990858985c) — Colin Rosenthal (csr) / detail
  13. Added corrected behaviour when the seed is missing a scheme. (commit: 0d30f3765d08aa9bdfcbe3ba33429d7f6fb486f2) — Colin Rosenthal (csr) / detail
  14. Fix to NAS-2883 (commit: bfe20c8963fb50986ebacce856f636dee5b6f75f) — Colin Rosenthal (csr) / detail
  15. [maven-release-plugin] prepare release netarchivesuite-7.4 (commit: ff43b1ffbb7beea67078a7b5f95043677bbabeea) — Colin Rosenthal (csr) / detail
  16. [maven-release-plugin] prepare for next development iteration (commit: 395ff291e16c4aa5ecbeb0659df6fcd14ce1ff31) — Colin Rosenthal (csr) / detail

#29 (12-Jul-2022 09:50:38)

  1. Added corrected behaviour when the seed is missing a scheme. (commit: 0d30f3765d08aa9bdfcbe3ba33429d7f6fb486f2) — Colin Rosenthal (csr) / detail

#27 (11-Jul-2022 14:17:55)

  1. Created common method for filtering crawl logs by domain or regex (commit: c0b47406849900e9abe978d2039f36990858985c) — Colin Rosenthal (csr) / detail

#22 (31-Jan-2022 08:33:54)

  1. [maven-release-plugin] prepare release netarchivesuite-7.1 (commit: 1d53f8bcdc078160b94774ca5bceb31263ca8355) — Colin Rosenthal (csr) / detail
  2. [maven-release-plugin] prepare for next development iteration (commit: 6c38a07255284b81c271cd42aa9de132a29f810b) — Colin Rosenthal (csr) / detail
  3. [maven-release-plugin] prepare release netarchivesuite-7.2 (commit: 5451945661aee7d11384d88bfb27e70fd2ffc021) — Colin Rosenthal (csr) / detail
  4. [maven-release-plugin] prepare for next development iteration (commit: 77d8764f3d78690ee5759d2536d5e3688c91bab8) — Colin Rosenthal (csr) / detail
  5. Changed log level. (commit: 2fad6ab352d1f932669f1c5b8f3c7350aff55071) — Colin Rosenthal (csr) / detail
  6. Ensure Filesystem objects are closed after use (commit: 802c4d77c7232bf263fc1b0a534c0ee2944b83b1) — Asger Askov Blekinge (abr) / detail
  7. GetMetadataMapper and cacheFile report progress to prevent a (commit: 934d4bb20c17b3e7ed28636af4164857e0fb7705) — Asger Askov Blekinge (abr) / detail
  8. Fixed error introduced during merge (commit: 51e3eaacfe50ff91354213a63907835f73599123) — Colin Rosenthal (csr) / detail
  9. Fixed some issues with holding large hadoop result sets in memory (commit: c5d6c5d38bb0cfe52a15493ccde9fa24a07e6a8d) — Colin Rosenthal (csr) / detail
  10. Fixed a tempfile name (commit: 31fc8d34137a414d5c746069437456712678d46b) — Colin Rosenthal (csr) / detail
  11. Added debugging to crawllog searching (commit: 6c7a5062550049c6f6aab16d58f275970c3f5ef1) — Colin Rosenthal (csr) / detail
  12. Switched field to search for domain in crawl log (commit: f0e2fec464f219bbccc5ad1a8c47bb5ab34b5f79) — Colin Rosenthal (csr) / detail
  13. Revert "Switched field to search for domain in crawl log" (commit: 534e263a5f491acddaf2697a61d51df56cc6fda1) — Colin Rosenthal (csr) / detail

#21 (06-Jul-2021 09:15:15)

  1. Updated version to a unique name (commit: e3b328a3cca883f6396a2955ef28bb0e2c7d2300) — Colin Rosenthal (csr) / detail
  2. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: 68ab4244669d4e8d7847001c179f62cb019cacc1) — Colin Rosenthal (csr) / detail
  3. [maven-release-plugin] prepare for next development iteration (commit: 16147d5dddfd034ad243da25c911a9fa1e4d53d3) — Colin Rosenthal (csr) / detail
  4. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: b246f40d31190967ee84f51cf54e73def89c6737) — Colin Rosenthal (csr) / detail
  5. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: e97567d8b0cf594e4ba5ac10d3d7d8449adc0cc0) — Colin Rosenthal (csr) / detail
  6. [maven-release-plugin] prepare for next development iteration (commit: 974b3a9a687aca6fd6fba84828f84f0699ff8078) — Colin Rosenthal (csr) / detail
  7. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: df51498918198325ccd2b687dde5c09872ddcc1b) — Colin Rosenthal (csr) / detail
  8. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: 597dca6302626d2eb975dbd7b6e8f7bf8e4dfe17) — Colin Rosenthal (csr) / detail
  9. [maven-release-plugin] prepare for next development iteration (commit: d7f4a80b29070e594c26a17a7087d4fff502b8ea) — Colin Rosenthal (csr) / detail
  10. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: c8b3c3a9a215db98a3e07c327593f73b93d531ea) — Colin Rosenthal (csr) / detail
  11. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: d71699a62c3a45657e4faf14519b97ba71593b80) — Colin Rosenthal (csr) / detail
  12. [maven-release-plugin] prepare for next development iteration (commit: d2fae1846cd326ddf93451ef5d9f15ecbe4f6a73) — Colin Rosenthal (csr) / detail
  13. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: 185f170ccde75314ccc6dfec0939d2390c459824) — Colin Rosenthal (csr) / detail
  14. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: 3ca4bd51ae53845b7e6867d637188d4b866f17a6) — Colin Rosenthal (csr) / detail
  15. [maven-release-plugin] prepare for next development iteration (commit: 3208775980a139629d0de3a5c1c9859f2ee21543) — Colin Rosenthal (csr) / detail
  16. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: f67005308b68ec48d71838bc2efe4d7a9ee38a07) — Colin Rosenthal (csr) / detail
  17. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: dff365105289bbe62558d6dafa673690caf7d153) — Colin Rosenthal (csr) / detail
  18. [maven-release-plugin] prepare for next development iteration (commit: 6fcd724e7c8c86b5edb57b79e74e0c2a202101e1) — Colin Rosenthal (csr) / detail
  19. Added hadoop job for getting metadata lines from archive files and an (commit: 553e20659df3bb62c6d121d50c6effa3fc8947e9) — Rasmus Bohl Kristensen (rbkr) / detail
  20. Added filehandling for GetMetadataArchiveMapper and small touch ups (commit: 24aaecbc74299fa9fda9191dfe510977aa027b8f) — Rasmus Bohl Kristensen (rbkr) / detail
  21. Small refactor of ArchiveFile/HadoopUtils, few touch ups and started on (commit: 0d880c6017572102b4bf24e60d87ea35a84e2470) — Rasmus Bohl Kristensen (rbkr) / detail
  22. 'Start' of https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1970 (commit: d52a6bfda1ed72ad4fc125356ae274f18e0de8c6) — Rasmus Bohl Kristensen (rbkr) / detail
  23. Integration of Hadoop dedup indexing with GetMetadataArchiveMapper now (commit: ca2c62d474caf14d80e8e1e8f3970a4582e84672) — Rasmus Bohl Kristensen (rbkr) / detail
  24. Cleaned up a few things in RawMetadataCache and refactored HadoopUtils (commit: d10211a994d936309d076e87f0ae9699d99f385e) — Rasmus Bohl Kristensen (rbkr) / detail
  25. Squashed commit of the following: (commit: 8d9adc2b50d996dfaa544528b34a7b6b96947d1e) — Rasmus Bohl Kristensen (rbkr) / detail
  26. Added pattern configuration constants in GetMetadataMapper (commit: 9c130776d86bffa43170028cad724353348ec8dc) — Rasmus Bohl Kristensen (rbkr) / detail
  27. Review https://sbforge.org/fisheye/cru/CR-NAS-385 changes (commit: 553c4afcb7ddf654b26cc4c9afa3c4cdc7c79197) — Rasmus Bohl Kristensen (rbkr) / detail
  28. Fixed up FileResolverRESTClient for review and refactored code to enable (commit: 0a31340c22213cb7707a5188ec83ded5143c22ce) — Colin Rosenthal (csr) / detail
  29. Added cdx indexing for metadata files in CDXIndexer and proper testing (commit: 7000ae16f8936955299227260d601c5db7005b81) — Rasmus Bohl Kristensen (rbkr) / detail
  30. Got Hadoop replacement for ArchiveExtractCDXJob ready, refactored some (commit: 52c718231cc4eb4ba13f6161ace4f701aeb4b738) — Rasmus Bohl Kristensen (rbkr) / detail
  31. Added setting for new job input/output dirs and more logging (commit: 629996d5c70f12e916878cb48e1c234b932eedcd) — Rasmus Bohl Kristensen (rbkr) / detail
  32. Setting fix from review https://sbforge.org/jira/browse/NARK-1954 (commit: 5cb2bc46120e35bc9e1074ec1f135efd672d4b2a) — Rasmus Bohl Kristensen (rbkr) / detail
  33. Review changes https://sbforge.org/fisheye/cru/CR-NAS-393, changes to (commit: bf5e943440e3760f4e25662ffcf603c3a78c0b2e) — Rasmus Bohl Kristensen (rbkr) / detail
  34. Fixed SimpleFileResolver, refactored how Hadoop jobs can be started, and (commit: 987c230dc013d45aaac9554df0392043965e56a0) — Rasmus Bohl Kristensen (rbkr) / detail
  35. Added settings for new job and finished last refactoring parts (commit: dcee3b48afde25b3ab1ac42fa65adc64f672d91e) — Rasmus Bohl Kristensen (rbkr) / detail
  36. Made small fix/cleanup in crawl log mapper and added more documentation (commit: e052b35ecbe8d07c2a88e914d3202d863b57bf50) — Rasmus Bohl Kristensen (rbkr) / detail
  37. Squashed commit of the following: (commit: ab9b8860ca1f5323ca20cabf8a23c7ee01009bc8) — Rasmus Bohl Kristensen (rbkr) / detail
  38. Squashed commit of the following: (commit: 9687194f6e849461945a3f75bdd3906f128d71c8) — Rasmus Bohl Kristensen (rbkr) / detail
  39. First attempt at a kill switch that returns an empty index for dedups (commit: 08d62e8104de4fe99d49b71b4b7933e41987bb56) — Colin Rosenthal (csr) / detail
  40. Second attempt using IndexReadyMessage (commit: 3199f61725d3badd01b8e26a1c0c295c7564cb09) — Colin Rosenthal (csr) / detail
  41. Added some logging (commit: aea04138a7ce030b1772456dee253c744b10453e) — Colin Rosenthal (csr) / detail
  42. Further attempt (commit: 701b2c647c674bc72091877fbd6ab2bd8e989ca9) — Colin Rosenthal (csr) / detail
  43. Further attempt using IndexReadyMessage (commit: ca9377f522312bdb2babb2c33e664becd1fbcf81) — Colin Rosenthal (csr) / detail
  44. Back to reply (commit: ddb5dd34fc6b691f0318b0965dc09b51f3da9f66) — Colin Rosenthal (csr) / detail
  45. Added a bit more logging. (commit: e4c67af253358e93ca41e108cfefff2072a6d9fa) — Colin Rosenthal (csr) / detail
  46. Removed potential error when requesting empty cache (commit: f6a7d91cbb5b2f3feaf42fc1b7e83ada5f2bb73a) — Colin Rosenthal (csr) / detail
  47. Clean-up (commit: f0f4a71edd0773f7fd35d30bfb5f80abe93057eb) — Colin Rosenthal (csr) / detail
  48. Refactoring to make MetadataIndexingApplication closer to a reusable (commit: 48366f7a72262d6f1442b635b466a2b929b9bcd3) — Colin Rosenthal (csr) / detail
  49. Initial version using fileresolver (commit: f212546a060f7271527d9d722308241d60e1b720) — Colin Rosenthal (csr) / detail
  50. Added explicit jersey-server dep. to GUI. (commit: 06f48b57ed3030f29f067d2abae08d74bfab1f98) — Colin Rosenthal (csr) / detail
  51. Added a necessary filtering stage to match only current collection (commit: 5501fca44675886868cd1672489a9000f4a15c97) — Colin Rosenthal (csr) / detail
  52. Tidying up for review. (commit: e68e482d4385d1ac0a0c8aac903055e1b23eaff3) — Colin Rosenthal (csr) / detail
  53. Forcing HadoopJobStrategy to use hdfs (commit: bafdb18e7a4b7667e765882e8a1205de182f3c91) — Colin Rosenthal (csr) / detail
  54. Forcing HadoopJobStrategy to use hdfs (commit: 6adbe99384b5754740a0cc4e6359ad3d1cc4e9ea) — Colin Rosenthal (csr) / detail
  55. Added harvester-core to uber jar (commit: 2214312e8db00065089f6ee1da3811cff20a181f) — Colin Rosenthal (csr) / detail
  56. Moved Kerberos logins (commit: d022a62c4c855acd8a042db1a240ea515a063d7f) — Rasmus Bohl Kristensen (rbkr) / detail
  57. Small fixes and revert (commit: ba7b6362e8dc63a0711bdc4a89c957ebca15a6bd) — Rasmus Bohl Kristensen (rbkr) / detail
  58. Readded Kerberos login to IndexRequestServer (commit: 9e06be74dd416fc3e0c6036fcc457796b4547e53) — Rasmus Bohl Kristensen (rbkr) / detail
  59. collectionID setting fix to always default to env name when unset (commit: a114fff89a4046d8084f41be0a25f599dd785575) — Rasmus Bohl Kristensen (rbkr) / detail
  60. Follow up to own review comments (commit: 681d1147f7824aa7a18804c4b1d9475ce9c26c19) — Colin Rosenthal (csr) / detail
  61. Changed version to 7.0-SNAPSHOT (commit: 617efc9a7b921f06f00bcf8aee6e104253242f84) — Colin Rosenthal (csr) / detail
  62. [maven-release-plugin] prepare release netarchivesuite-7.0 (commit: 90520aea9d5e13775630dfd51d5942015feb6732) — Colin Rosenthal (csr) / detail
  63. [maven-release-plugin] prepare for next development iteration (commit: 9f8459ec50a9145c1922559b5ed7a407b550ca7b) — Colin Rosenthal (csr) / detail
  64. Simple cache for metadata cdx records (commit: ad347f1a2b6d2f988a6a2acce8fd9e590241a05a) — Colin Rosenthal (csr) / detail
  65. Also cache crawl logs (commit: 1d15d6084d008ce72664e0c56bd68e1eabf14bd0) — Colin Rosenthal (csr) / detail
  66. Fixing record caching (commit: a2df94cb70d5a2577a7bf05f61790f6f4321284e) — Colin Rosenthal (csr) / detail
  67. Made metadata cache directory configurable (commit: 70bf4d5757d431ba901d09081d02306d96459aa6) — Colin Rosenthal (csr) / detail
  68. Make sure we don't keep cached zero-length result files for crawl logs (commit: 5d54c34e35f0f5838993242982c9e19c5dd82644) — Colin Rosenthal (csr) / detail
  69. Add tooltip to bullet for job status (commit: 2622ea3eb86736e7efc62b04a14c932110b522ae) — clara.wiatrowski / detail
  70. Added utils for managing map-only uber-jobs (commit: 057052a055a46fa4ecb94e8398bc4b0279432a8f) — Colin Rosenthal (csr) / detail
  71. Added rethrows for better error handling from hadoop. (commit: 42fe34dbfa04f34062e56c107c65a71869fd3c04) — Colin Rosenthal (csr) / detail