Started 3 yr 0 mo ago

Build NetarchiveSuite - wayback - indexer (08-Mar-2021 08:20:32): Not built

Changes
  1. [maven-release-plugin] prepare release netarchivesuite-5.6 (commit: 7b15d7abf1ed8a11f000177058f3acbea23ab29f)
  2. [maven-release-plugin] prepare for next development iteration (commit: 97ebda03303de3f69cd2656ced3de95af6226083)
  3. Updated version to a unique name (commit: e3b328a3cca883f6396a2955ef28bb0e2c7d2300)
  4. Small changes to settings (commit: a85a3ad520234cf8cedeeef058911ad47c304474)
  5. Can now at least work with local bitmag, it seems (commit: 91bdce496fa2ebe20ee0eeb8ce97528a868505f1)
  6. FileNameHarvester now grabs list of files directly from bitmag. Added (commit: 6824e324e6ccd74a10740790c11968bd5803a312)
  7. Indexing through hadoop instead of batch should now work for WARC files (commit: 982056795b045ed339987660c2d13dd7f41d1073)
  8. Changes from review (commit: a4871eadd0dcfd71fe3bda54170c5f911ae3da88)
  9. Fixed dependency conflict with hadoop-client package and finished (commit: e8735f0b8fa3800b012a7928e369cf358b5248d8)
  10. Refactored Bitrepository to a singleton. (commit: e494e8d0482fde2bc44ee840b4e675dac00a43d5)
  11. Small logging changes (commit: c091c483521bf77ea95b7450dd0aee6449f5ccac)
  12. Bitrepository class changes (commit: 9a39a4b2aacdc9283f4c34b080693ca5a1ddfe92)
  13. Dependency fix to avoid logging loop and small logging changes (commit: a709b310c9f6af51c6312e3ed3fec73a912e2622)
  14. Added some integration tests for indexing on hadoop (commit: 96dfe55073b04b9cab36dbc4607c7496c877c2fd)
  15. Removed the test which was less like the anticipated prod architecture (commit: 1ac65bb88c41f4f60214f728145a0c39ad035d46)
  16. Tidied up the hadoop/cdx integration test (commit: b8eaa110fb2b6f1b8e2530776aa5c8d3e901273c)
  17. Added Readme file in empty directory (commit: c4da8b152e8fa7f10de0bd1fb48ed0868b7664eb)
  18. Added Readme file in empty directory (commit: 96b73b68fa7e1301ab0328df8e0210d56a735c52)
  19. Added a hdfs setting that seems relevant (commit: 60cd273673637ba210763dda8b107a7b96bef508)
  20. Made method for indexing with Hadoop that assumes direct access to input (commit: b227515615b8ca0bd8c8fe2fd34be189679549c8)
  21. Dedup indexing (commit: f5508c47ebc69bc946ff16ca87218c949a59508a)
  22. Code-maturation for cdx-indexing (commit: c87a69b57c851c9d90e12982a344da40f614bc74)
  23. Attempt to avoid double-indexing (commit: e1c23281deb8bd3055c9f85508c5076076f37c32)
  24. Initial work on FileResolver (commit: d411beb865c7dace665ae2709953857503a37c27)
  25. Added hadoop job for getting metadata lines from archive files and an (commit: 553e20659df3bb62c6d121d50c6effa3fc8947e9)
  26. Added pattern-matching method to file-resolver (commit: 57b380f2f8c289544404aecbb81f6cef8a084274)
  27. Small refactor of ArchiveFile/HadoopUtils, few touch ups and started on (commit: 0d880c6017572102b4bf24e60d87ea35a84e2470)
  28. Changed test to use paths relative to module root (commit: 0602fcb9458cfa655d0cb39f98d004e720bc9e42)
  29. Added a conf flag to switch between standard indexing and dedup indexing (commit: 2e2173b833c5c5ddd879e7548b96d875a06353a1)
  30. 'Start' of https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1970 (commit: d52a6bfda1ed72ad4fc125356ae274f18e0de8c6)
  31. Integration of Hadoop dedup indexing with GetMetadataArchiveMapper now (commit: ca2c62d474caf14d80e8e1e8f3970a4582e84672)
  32. Cleaned up a few things in RawMetadataCache and refactored HadoopUtils (commit: d10211a994d936309d076e87f0ae9699d99f385e)
  33. Squashed commit of the following: (commit: 8d9adc2b50d996dfaa544528b34a7b6b96947d1e)
  34. Added pattern configuration constants in GetMetadataMapper (commit: 9c130776d86bffa43170028cad724353348ec8dc)
  35. Review https://sbforge.org/fisheye/cru/CR-NAS-385 changes (commit: 553c4afcb7ddf654b26cc4c9afa3c4cdc7c79197)
  36. Small refactor and implemented harvestRecentFilenames (commit: 199355bf63aba632f2dcb23c66de7778b794e97e)
  37. Removed more old bitmag classes, refactored parts of some classes for (commit: dbf8703610bf19bfe044cf7f6f59a10710fdf7b4)
  38. Fixed some old imports that made the compiler angry (commit: 48d011475ef9dc0480986dbab35e8266e3207f4f)
  39. Fixed up FileResolverRESTClient for review and refactored code to enable (commit: 0a31340c22213cb7707a5188ec83ded5143c22ce)
  40. Added more logging to FileNameHarvester (commit: b7120d70f3adf93ca6a12368f30897084a8a6295)
  41. Small refactor to make ArchiveFile's collectHadoopResults use (commit: 358b6977ce8fc98987a32327d57815bd30c0f34a)
  42. Fixed bug with indexing threads sharing same filesystem instance (commit: d8c00a93115685b3357ec4202d7de727d76486fa)
  43. Fixed bug with indexing threads sharing same filesystem instance (commit: e9969a932d57a7e301ec95365219e3a662c3b0c6)
  44. Added cdx indexing for metadata files in CDXIndexer and proper testing (commit: 7000ae16f8936955299227260d601c5db7005b81)
  45. Got Hadoop replacement for ArchiveExtractCDXJob ready, refactored some (commit: 52c718231cc4eb4ba13f6161ace4f701aeb4b738)
  46. Setting fix from review https://sbforge.org/jira/browse/NARK-1954 (commit: 5cb2bc46120e35bc9e1074ec1f135efd672d4b2a)
  47. Review changes https://sbforge.org/fisheye/cru/CR-NAS-393, changes to (commit: bf5e943440e3760f4e25662ffcf603c3a78c0b2e)
  48. Fixed SimpleFileResolver, refactored how Hadoop jobs can be started, and (commit: 987c230dc013d45aaac9554df0392043965e56a0)
  49. Squashed commit of the following: (commit: ab9b8860ca1f5323ca20cabf8a23c7ee01009bc8)
  50. Bit of refactoring and made SSL provider to work with https (commit: 8f03956481c07a5c8639a847a6a51a30a8827882)
  51. Create FileSystem with newInstance and close it afterwards. DO NOT CLOSE (commit: b006660cc04ac3f6c2442dc0d09b74b6c017c9c9)
  52. Refactoring to make MetadataIndexingApplication closer to a reusable (commit: 48366f7a72262d6f1442b635b466a2b929b9bcd3)
  53. Squashed commit of the following: (commit: 73e016f0e681e23ea9b50428fed236cbba4b706c)