Changes

#31 (08-Aug-2022 13:39:06)

  1. [maven-release-plugin] prepare release netarchivesuite-7.4 (commit: ff43b1ffbb7beea67078a7b5f95043677bbabeea) — Colin Rosenthal (csr) / detail
  2. [maven-release-plugin] prepare for next development iteration (commit: 395ff291e16c4aa5ecbeb0659df6fcd14ce1ff31) — Colin Rosenthal (csr) / detail

#22 (31-Jan-2022 08:33:54)

  1. [maven-release-plugin] prepare release netarchivesuite-7.1 (commit: 1d53f8bcdc078160b94774ca5bceb31263ca8355) — Colin Rosenthal (csr) / detail
  2. [maven-release-plugin] prepare for next development iteration (commit: 6c38a07255284b81c271cd42aa9de132a29f810b) — Colin Rosenthal (csr) / detail
  3. [maven-release-plugin] prepare release netarchivesuite-7.2 (commit: 5451945661aee7d11384d88bfb27e70fd2ffc021) — Colin Rosenthal (csr) / detail
  4. [maven-release-plugin] prepare for next development iteration (commit: 77d8764f3d78690ee5759d2536d5e3688c91bab8) — Colin Rosenthal (csr) / detail
  5. Progress reporting in CDXMapper should prevent timeouts on big warc (commit: f4c54cddfe7440cc5024fee62d1feb6b057089d1) — Asger Askov Blekinge (abr) / detail
  6. dedupIndexer can now send progress info to hadoop and thus hopefully (commit: 936944d28b6add356f10837d7f4b1f5f3f8efa39) — Asger Askov Blekinge (abr) / detail
  7. Fixed bitmag getfileids and some cleanup (commit: 520e4de5e267e3af2d40fa6e78803c15a33df510) — Colin Rosenthal (csr) / detail
  8. Writing direct to hdfs. (commit: 2ab46b46fece97462f8000d14ddcf3017c77dd2c) — Colin Rosenthal (csr) / detail
  9. Added direct output streaming from hdfs (commit: 480e7411841e986bc48e2bc9fa7d5f759f5eead7) — Colin Rosenthal (csr) / detail
  10. Fixed some issues with holding large hadoop result sets in memory (commit: c5d6c5d38bb0cfe52a15493ccde9fa24a07e6a8d) — Colin Rosenthal (csr) / detail

#21 (06-Jul-2021 09:15:15)

  1. Updated version to a unique name (commit: e3b328a3cca883f6396a2955ef28bb0e2c7d2300) — Colin Rosenthal (csr) / detail
  2. Small changes to settings (commit: a85a3ad520234cf8cedeeef058911ad47c304474) — Rasmus Bohl Kristensen (rbkr) / detail
  3. Can now at least work with local bitmag, it seems (commit: 91bdce496fa2ebe20ee0eeb8ce97528a868505f1) — Rasmus Bohl Kristensen (rbkr) / detail
  4. FileNameHarvester now grabs list of files directly from bitmag. Added (commit: 6824e324e6ccd74a10740790c11968bd5803a312) — Rasmus Bohl Kristensen (rbkr) / detail
  5. Indexing through hadoop instead of batch should now work for WARC files (commit: 982056795b045ed339987660c2d13dd7f41d1073) — Rasmus Bohl Kristensen (rbkr) / detail
  6. Changes from review (commit: a4871eadd0dcfd71fe3bda54170c5f911ae3da88) — Rasmus Bohl Kristensen (rbkr) / detail
  7. Fixed dependency conflict with hadoop-client package and finished (commit: e8735f0b8fa3800b012a7928e369cf358b5248d8) — Rasmus Bohl Kristensen (rbkr) / detail
  8. Refactored Bitrepository to a singleton. (commit: e494e8d0482fde2bc44ee840b4e675dac00a43d5) — Colin Rosenthal (csr) / detail
  9. Small logging changes (commit: c091c483521bf77ea95b7450dd0aee6449f5ccac) — Rasmus Bohl Kristensen (rbkr) / detail
  10. Bitrepository class changes (commit: 9a39a4b2aacdc9283f4c34b080693ca5a1ddfe92) — Rasmus Bohl Kristensen (rbkr) / detail
  11. Dependency fix to avoid logging loop and small logging changes (commit: a709b310c9f6af51c6312e3ed3fec73a912e2622) — Rasmus Bohl Kristensen (rbkr) / detail
  12. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: 68ab4244669d4e8d7847001c179f62cb019cacc1) — Colin Rosenthal (csr) / detail
  13. [maven-release-plugin] prepare for next development iteration (commit: 16147d5dddfd034ad243da25c911a9fa1e4d53d3) — Colin Rosenthal (csr) / detail
  14. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: b246f40d31190967ee84f51cf54e73def89c6737) — Colin Rosenthal (csr) / detail
  15. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: e97567d8b0cf594e4ba5ac10d3d7d8449adc0cc0) — Colin Rosenthal (csr) / detail
  16. [maven-release-plugin] prepare for next development iteration (commit: 974b3a9a687aca6fd6fba84828f84f0699ff8078) — Colin Rosenthal (csr) / detail
  17. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: df51498918198325ccd2b687dde5c09872ddcc1b) — Colin Rosenthal (csr) / detail
  18. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: 597dca6302626d2eb975dbd7b6e8f7bf8e4dfe17) — Colin Rosenthal (csr) / detail
  19. [maven-release-plugin] prepare for next development iteration (commit: d7f4a80b29070e594c26a17a7087d4fff502b8ea) — Colin Rosenthal (csr) / detail
  20. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: c8b3c3a9a215db98a3e07c327593f73b93d531ea) — Colin Rosenthal (csr) / detail
  21. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: d71699a62c3a45657e4faf14519b97ba71593b80) — Colin Rosenthal (csr) / detail
  22. [maven-release-plugin] prepare for next development iteration (commit: d2fae1846cd326ddf93451ef5d9f15ecbe4f6a73) — Colin Rosenthal (csr) / detail
  23. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: 185f170ccde75314ccc6dfec0939d2390c459824) — Colin Rosenthal (csr) / detail
  24. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: 3ca4bd51ae53845b7e6867d637188d4b866f17a6) — Colin Rosenthal (csr) / detail
  25. [maven-release-plugin] prepare for next development iteration (commit: 3208775980a139629d0de3a5c1c9859f2ee21543) — Colin Rosenthal (csr) / detail
  26. [maven-release-plugin] rollback the release of netarchivesuite-6.0 (commit: f67005308b68ec48d71838bc2efe4d7a9ee38a07) — Colin Rosenthal (csr) / detail
  27. [maven-release-plugin] prepare release netarchivesuite-6.0 (commit: dff365105289bbe62558d6dafa673690caf7d153) — Colin Rosenthal (csr) / detail
  28. [maven-release-plugin] prepare for next development iteration (commit: 6fcd724e7c8c86b5edb57b79e74e0c2a202101e1) — Colin Rosenthal (csr) / detail
  29. Added some integration tests for indexing on hadoop (commit: 96dfe55073b04b9cab36dbc4607c7496c877c2fd) — Colin Rosenthal (csr) / detail
  30. Removed the test which was less like the anticipated prod architecture (commit: 1ac65bb88c41f4f60214f728145a0c39ad035d46) — Colin Rosenthal (csr) / detail
  31. Tidied up the hadoop/cdx integration test (commit: b8eaa110fb2b6f1b8e2530776aa5c8d3e901273c) — Colin Rosenthal (csr) / detail
  32. Added Readme file in empty directory (commit: c4da8b152e8fa7f10de0bd1fb48ed0868b7664eb) — Colin Rosenthal (csr) / detail
  33. Added Readme file in empty directory (commit: 96b73b68fa7e1301ab0328df8e0210d56a735c52) — Colin Rosenthal (csr) / detail
  34. Added a hdfs setting that seems relevant (commit: 60cd273673637ba210763dda8b107a7b96bef508) — Colin Rosenthal (csr) / detail
  35. Made method for indexing with Hadoop that assumes direct access to input (commit: b227515615b8ca0bd8c8fe2fd34be189679549c8) — Rasmus Bohl Kristensen (rbkr) / detail
  36. Dedup indexing (commit: f5508c47ebc69bc946ff16ca87218c949a59508a) — Colin Rosenthal (csr) / detail
  37. Code-maturation for cdx-indexing (commit: c87a69b57c851c9d90e12982a344da40f614bc74) — Colin Rosenthal (csr) / detail
  38. Attempt to avoid double-indexing (commit: e1c23281deb8bd3055c9f85508c5076076f37c32) — Colin Rosenthal (csr) / detail
  39. Initial work on FileResolver (commit: d411beb865c7dace665ae2709953857503a37c27) — Colin Rosenthal (csr) / detail
  40. Added hadoop job for getting metadata lines from archive files and an (commit: 553e20659df3bb62c6d121d50c6effa3fc8947e9) — Rasmus Bohl Kristensen (rbkr) / detail
  41. Added pattern-matching method to file-resolver (commit: 57b380f2f8c289544404aecbb81f6cef8a084274) — Colin Rosenthal (csr) / detail
  42. Small refactor of ArchiveFile/HadoopUtils, few touch ups and started on (commit: 0d880c6017572102b4bf24e60d87ea35a84e2470) — Rasmus Bohl Kristensen (rbkr) / detail
  43. Changed test to use paths relative to module root (commit: 0602fcb9458cfa655d0cb39f98d004e720bc9e42) — Colin Rosenthal (csr) / detail
  44. Added a conf flag to switch between standard indexing and dedup indexing (commit: 2e2173b833c5c5ddd879e7548b96d875a06353a1) — Colin Rosenthal (csr) / detail
  45. 'Start' of https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1970 (commit: d52a6bfda1ed72ad4fc125356ae274f18e0de8c6) — Rasmus Bohl Kristensen (rbkr) / detail
  46. Integration of Hadoop dedup indexing with GetMetadataArchiveMapper now (commit: ca2c62d474caf14d80e8e1e8f3970a4582e84672) — Rasmus Bohl Kristensen (rbkr) / detail
  47. Cleaned up a few things in RawMetadataCache and refactored HadoopUtils (commit: d10211a994d936309d076e87f0ae9699d99f385e) — Rasmus Bohl Kristensen (rbkr) / detail
  48. Squashed commit of the following: (commit: 8d9adc2b50d996dfaa544528b34a7b6b96947d1e) — Rasmus Bohl Kristensen (rbkr) / detail
  49. Added pattern configuration constants in GetMetadataMapper (commit: 9c130776d86bffa43170028cad724353348ec8dc) — Rasmus Bohl Kristensen (rbkr) / detail
  50. Review https://sbforge.org/fisheye/cru/CR-NAS-385 changes (commit: 553c4afcb7ddf654b26cc4c9afa3c4cdc7c79197) — Rasmus Bohl Kristensen (rbkr) / detail
  51. Small refactor and implemented harvestRecentFilenames (commit: 199355bf63aba632f2dcb23c66de7778b794e97e) — Rasmus Bohl Kristensen (rbkr) / detail
  52. Removed more old bitmag classes, refactored parts of some classes for (commit: dbf8703610bf19bfe044cf7f6f59a10710fdf7b4) — Rasmus Bohl Kristensen (rbkr) / detail
  53. Fixed some old imports that made the compiler angry (commit: 48d011475ef9dc0480986dbab35e8266e3207f4f) — Rasmus Bohl Kristensen (rbkr) / detail
  54. Fixed up FileResolverRESTClient for review and refactored code to enable (commit: 0a31340c22213cb7707a5188ec83ded5143c22ce) — Colin Rosenthal (csr) / detail
  55. Added more logging to FileNameHarvester (commit: b7120d70f3adf93ca6a12368f30897084a8a6295) — Rasmus Bohl Kristensen (rbkr) / detail
  56. Small refactor to make ArchiveFile's collectHadoopResults use (commit: 358b6977ce8fc98987a32327d57815bd30c0f34a) — Rasmus Bohl Kristensen (rbkr) / detail
  57. Fixed bug with indexing threads sharing same filesystem instance (commit: d8c00a93115685b3357ec4202d7de727d76486fa) — Rasmus Bohl Kristensen (rbkr) / detail
  58. Fixed bug with indexing threads sharing same filesystem instance (commit: e9969a932d57a7e301ec95365219e3a662c3b0c6) — Colin Rosenthal (csr) / detail
  59. Added cdx indexing for metadata files in CDXIndexer and proper testing (commit: 7000ae16f8936955299227260d601c5db7005b81) — Rasmus Bohl Kristensen (rbkr) / detail
  60. Got Hadoop replacement for ArchiveExtractCDXJob ready, refactored some (commit: 52c718231cc4eb4ba13f6161ace4f701aeb4b738) — Rasmus Bohl Kristensen (rbkr) / detail
  61. Setting fix from review https://sbforge.org/jira/browse/NARK-1954 (commit: 5cb2bc46120e35bc9e1074ec1f135efd672d4b2a) — Rasmus Bohl Kristensen (rbkr) / detail
  62. Review changes https://sbforge.org/fisheye/cru/CR-NAS-393, changes to (commit: bf5e943440e3760f4e25662ffcf603c3a78c0b2e) — Rasmus Bohl Kristensen (rbkr) / detail
  63. Fixed SimpleFileResolver, refactored how Hadoop jobs can be started, and (commit: 987c230dc013d45aaac9554df0392043965e56a0) — Rasmus Bohl Kristensen (rbkr) / detail
  64. Squashed commit of the following: (commit: ab9b8860ca1f5323ca20cabf8a23c7ee01009bc8) — Rasmus Bohl Kristensen (rbkr) / detail
  65. Removed old bitmag classes and remnants of it (commit: ad3aaf637af932564da7d15e83b75c02e9f7fb6f) — Rasmus Bohl Kristensen (rbkr) / detail
  66. Bit of refactoring and made SSL provider to work with https (commit: 8f03956481c07a5c8639a847a6a51a30a8827882) — Rasmus Bohl Kristensen (rbkr) / detail
  67. Create FileSystem with newInstance and close it afterwards. DO NOT CLOSE (commit: b006660cc04ac3f6c2442dc0d09b74b6c017c9c9) — Asger Askov Blekinge (abr) / detail
  68. Refactoring to make MetadataIndexingApplication closer to a reusable (commit: 48366f7a72262d6f1442b635b466a2b929b9bcd3) — Colin Rosenthal (csr) / detail
  69. Squashed commit of the following: (commit: 73e016f0e681e23ea9b50428fed236cbba4b706c) — Rasmus Bohl Kristensen (rbkr) / detail
  70. Moved Kerberos logins (commit: d022a62c4c855acd8a042db1a240ea515a063d7f) — Rasmus Bohl Kristensen (rbkr) / detail
  71. collectionID setting fix to always default to env name when unset (commit: a114fff89a4046d8084f41be0a25f599dd785575) — Rasmus Bohl Kristensen (rbkr) / detail
  72. Just some optimized imports and small stuff (commit: 48977bc912bcd2f4b9b44db06e65c04eb0281352) — Rasmus Bohl Kristensen (rbkr) / detail
  73. Follow up to own review comments (commit: 681d1147f7824aa7a18804c4b1d9475ce9c26c19) — Colin Rosenthal (csr) / detail
  74. Fixed tests and trying out @ignore for failing mappers (commit: cb5f59cb8465f2eea112989e4ed878851c79ff87) — Rasmus Bohl Kristensen (rbkr) / detail
  75. Changed version to 7.0-SNAPSHOT (commit: 617efc9a7b921f06f00bcf8aee6e104253242f84) — Colin Rosenthal (csr) / detail
  76. [maven-release-plugin] prepare release netarchivesuite-7.0 (commit: 90520aea9d5e13775630dfd51d5942015feb6732) — Colin Rosenthal (csr) / detail
  77. [maven-release-plugin] prepare for next development iteration (commit: 9f8459ec50a9145c1922559b5ed7a407b550ca7b) — Colin Rosenthal (csr) / detail
  78. Added rethrows for better error handling from hadoop. (commit: 42fe34dbfa04f34062e56c107c65a71869fd3c04) — Colin Rosenthal (csr) / detail
  79. Added extra logging on file indexing (commit: 8cf9d33c4d4d8f96851390bd05f4f1d7d1a98995) — Colin Rosenthal (csr) / detail
  80. Corrected a misinformative log statement (commit: e01be32a5147549f58d6d424c65a475ae46fcdd6) — Colin Rosenthal (csr) / detail