Unstable Changes

Summary

  1. Updated version to a unique name (commit: e3b328a3cca883f6396a2955ef28bb0e2c7d2300)
  2. Small changes to settings (commit: a85a3ad520234cf8cedeeef058911ad47c304474)
  3. Can now at least work with local bitmag, it seems (commit: 91bdce496fa2ebe20ee0eeb8ce97528a868505f1)
  4. FileNameHarvester now grabs list of files directly from bitmag. Added (commit: 6824e324e6ccd74a10740790c11968bd5803a312)
  5. Indexing through hadoop instead of batch should now work for WARC files (commit: 982056795b045ed339987660c2d13dd7f41d1073)
  6. Changes from review (commit: a4871eadd0dcfd71fe3bda54170c5f911ae3da88)
  7. Fixed dependency conflict with hadoop-client package and finished (commit: e8735f0b8fa3800b012a7928e369cf358b5248d8)
  8. Refactored Bitrepository to a singleton. (commit: e494e8d0482fde2bc44ee840b4e675dac00a43d5)
  9. Small logging changes (commit: c091c483521bf77ea95b7450dd0aee6449f5ccac)
  10. Bitrepository class changes (commit: 9a39a4b2aacdc9283f4c34b080693ca5a1ddfe92)
  11. Dependency fix to avoid logging loop and small logging changes (commit: a709b310c9f6af51c6312e3ed3fec73a912e2622)
  12. Added some integration tests for indexing on hadoop (commit: 96dfe55073b04b9cab36dbc4607c7496c877c2fd)
  13. Removed the test which was less like the anticipated prod architecture (commit: 1ac65bb88c41f4f60214f728145a0c39ad035d46)
  14. Tidied up the hadoop/cdx integration test (commit: b8eaa110fb2b6f1b8e2530776aa5c8d3e901273c)
  15. Added Readme file in empty directory (commit: c4da8b152e8fa7f10de0bd1fb48ed0868b7664eb)
  16. Added Readme file in empty directory (commit: 96b73b68fa7e1301ab0328df8e0210d56a735c52)
  17. Added a hdfs setting that seems relevant (commit: 60cd273673637ba210763dda8b107a7b96bef508)
  18. Made method for indexing with Hadoop that assumes direct access to input (commit: b227515615b8ca0bd8c8fe2fd34be189679549c8)
  19. Dedup indexing (commit: f5508c47ebc69bc946ff16ca87218c949a59508a)
  20. Code-maturation for cdx-indexing (commit: c87a69b57c851c9d90e12982a344da40f614bc74)
  21. Attempt to avoid double-indexing (commit: e1c23281deb8bd3055c9f85508c5076076f37c32)
  22. Initial work on FileResolver (commit: d411beb865c7dace665ae2709953857503a37c27)
  23. Added hadoop job for getting metadata lines from archive files and an (commit: 553e20659df3bb62c6d121d50c6effa3fc8947e9)
  24. Added pattern-matching method to file-resolver (commit: 57b380f2f8c289544404aecbb81f6cef8a084274)
  25. Small refactor of ArchiveFile/HadoopUtils, few touch ups and started on (commit: 0d880c6017572102b4bf24e60d87ea35a84e2470)
  26. Changed test to use paths relative to module root (commit: 0602fcb9458cfa655d0cb39f98d004e720bc9e42)
  27. Added a conf flag to switch between standard indexing and dedup indexing (commit: 2e2173b833c5c5ddd879e7548b96d875a06353a1)
  28. 'Start' of https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1970 (commit: d52a6bfda1ed72ad4fc125356ae274f18e0de8c6)
  29. Integration of Hadoop dedup indexing with GetMetadataArchiveMapper now (commit: ca2c62d474caf14d80e8e1e8f3970a4582e84672)
  30. Cleaned up a few things in RawMetadataCache and refactored HadoopUtils (commit: d10211a994d936309d076e87f0ae9699d99f385e)
  31. Squashed commit of the following: (commit: 8d9adc2b50d996dfaa544528b34a7b6b96947d1e)
  32. Added pattern configuration constants in GetMetadataMapper (commit: 9c130776d86bffa43170028cad724353348ec8dc)
  33. Review https://sbforge.org/fisheye/cru/CR-NAS-385 changes (commit: 553c4afcb7ddf654b26cc4c9afa3c4cdc7c79197)
  34. Small refactor and implemented harvestRecentFilenames (commit: 199355bf63aba632f2dcb23c66de7778b794e97e)
  35. Removed more old bitmag classes, refactored parts of some classes for (commit: dbf8703610bf19bfe044cf7f6f59a10710fdf7b4)
  36. Fixed some old imports that made the compiler angry (commit: 48d011475ef9dc0480986dbab35e8266e3207f4f)
  37. Fixed up FileResolverRESTClient for review and refactored code to enable (commit: 0a31340c22213cb7707a5188ec83ded5143c22ce)
  38. Added more logging to FileNameHarvester (commit: b7120d70f3adf93ca6a12368f30897084a8a6295)
  39. Small refactor to make ArchiveFile's collectHadoopResults use (commit: 358b6977ce8fc98987a32327d57815bd30c0f34a)
  40. Fixed bug with indexing threads sharing same filesystem instance (commit: d8c00a93115685b3357ec4202d7de727d76486fa)
  41. Fixed bug with indexing threads sharing same filesystem instance (commit: e9969a932d57a7e301ec95365219e3a662c3b0c6)
  42. Added cdx indexing for metadata files in CDXIndexer and proper testing (commit: 7000ae16f8936955299227260d601c5db7005b81)
  43. Got Hadoop replacement for ArchiveExtractCDXJob ready, refactored some (commit: 52c718231cc4eb4ba13f6161ace4f701aeb4b738)
  44. Setting fix from review https://sbforge.org/jira/browse/NARK-1954 (commit: 5cb2bc46120e35bc9e1074ec1f135efd672d4b2a)
  45. Review changes https://sbforge.org/fisheye/cru/CR-NAS-393, changes to (commit: bf5e943440e3760f4e25662ffcf603c3a78c0b2e)
  46. Fixed SimpleFileResolver, refactored how Hadoop jobs can be started, and (commit: 987c230dc013d45aaac9554df0392043965e56a0)
  47. Squashed commit of the following: (commit: ab9b8860ca1f5323ca20cabf8a23c7ee01009bc8)
  48. Removed old bitmag classes and remnants of it (commit: ad3aaf637af932564da7d15e83b75c02e9f7fb6f)
  49. Bit of refactoring and made SSL provider to work with https (commit: 8f03956481c07a5c8639a847a6a51a30a8827882)
  50. Create FileSystem with newInstance and close it afterwards. DO NOT CLOSE (commit: b006660cc04ac3f6c2442dc0d09b74b6c017c9c9)
  51. Refactoring to make MetadataIndexingApplication closer to a reusable (commit: 48366f7a72262d6f1442b635b466a2b929b9bcd3)
  52. Squashed commit of the following: (commit: 73e016f0e681e23ea9b50428fed236cbba4b706c)
  53. Moved Kerberos logins (commit: d022a62c4c855acd8a042db1a240ea515a063d7f)
  54. collectionID setting fix to always default to env name when unset (commit: a114fff89a4046d8084f41be0a25f599dd785575)
  55. Just some optimized imports and small stuff (commit: 48977bc912bcd2f4b9b44db06e65c04eb0281352)
  56. Follow up to own review comments (commit: 681d1147f7824aa7a18804c4b1d9475ce9c26c19)
Commit e3b328a3cca883f6396a2955ef28bb0e2c7d2300 by Colin Rosenthal (csr)
Updated version to a unique name
(commit: e3b328a3cca883f6396a2955ef28bb0e2c7d2300)
The file was modified: harvester/heritrix3/heritrix3-controller/pom.xml
The file was modified: wayback/wayback-indexer/pom.xml
The file was modified: monitor/monitor-test/pom.xml
The file was modified: wayback/wayback-test/pom.xml
The file was modified: deploy/distribution/pom.xml
The file was modified: pom.xml
The file was modified: common/netarchivesuite-test-utils/pom.xml
The file was modified: harvester/harvest-scheduler/pom.xml
The file was modified: deploy/pom.xml
The file was modified: harvester/harvester-test/pom.xml
The file was modified: archive/archive-core/pom.xml
The file was modified: harvester/history-gui/pom.xml
The file was modified: harvester/heritrix3/heritrix3-extensions/pom.xml
The file was modified: common/pom.xml
The file was modified: harvester/pom.xml
The file was modified: monitor/status-gui/pom.xml
The file was modified: archive/archive-test/pom.xml
The file was modified: harvester/harvestchannel-gui/pom.xml
The file was modified: harvester/qa-gui/pom.xml
The file was modified: common/common-test/pom.xml
The file was modified: wayback/wayback-resourcestore/pom.xml
The file was modified: wayback/pom.xml
The file was modified: archive/pom.xml
The file was modified: common/common-core/pom.xml
The file was modified: deploy/deploy-core/pom.xml
The file was modified: harvester/heritrix3/heritrix3-monitor/pom.xml
The file was modified: deploy/deploy-test/pom.xml
The file was modified: integration-test/pom.xml
The file was modified: harvester/heritrix3/heritrix3-bundler/pom.xml
The file was modified: build-tools/pom.xml
The file was modified: harvester/harvestdefinition-gui/pom.xml
The file was modified: harvester/heritrix3/pom.xml
The file was modified: integration-test/system-test/pom.xml
The file was modified: monitor/monitor-core/pom.xml
The file was modified: archive/bitpreservation-gui/pom.xml
The file was modified: harvester/harvester-core/pom.xml
The file was modified: monitor/pom.xml
Commit a85a3ad520234cf8cedeeef058911ad47c304474 by Rasmus Bohl Kristensen (rbkr)
Small changes to settings
(commit: a85a3ad520234cf8cedeeef058911ad47c304474)
The file was modified: common/common-core/src/test/java/dk/netarkivet/common/HadoopUtilsTester.java
The file was modified: common/common-core/src/main/resources/dk/netarkivet/common/settings.xml
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/HadoopUtils.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
Commit 91bdce496fa2ebe20ee0eeb8ce97528a868505f1 by Rasmus Bohl Kristensen (rbkr)
Can now at least work with local bitmag, it seems
(commit: 91bdce496fa2ebe20ee0eeb8ce97528a868505f1)
The file was modified: common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/bitrepository/BitmagArcRepositoryClientSettings.xml
The file was added: common/common-core/src/main/resources/dk/netarkivet/common/settings.xml.new.bak
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/BitmagArcRepositoryClient.java
The file was modified: common/common-core/src/test/java/dk/netarkivet/common/HadoopUtilsTester.java
The file was modified: pom.xml
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/TestBitrepository.java
The file was added: common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/RepositorySettings.xml
The file was added: common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/ReferenceSettings.xml
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
Commit 6824e324e6ccd74a10740790c11968bd5803a312 by Rasmus Bohl Kristensen (rbkr)
FileNameHarvester now grabs list of files directly from bitmag. Added
basic stuff for getting started with using hadoop jobs instead of batch
in ArchiveFile
(commit: 6824e324e6ccd74a10740790c11968bd5803a312)
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit 982056795b045ed339987660c2d13dd7f41d1073 by Rasmus Bohl Kristensen (rbkr)
Indexing through hadoop instead of batch should now work for WARC files
(commit: 982056795b045ed339987660c2d13dd7f41d1073)
The file was modified: wayback/pom.xml
The file was modified: common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/bitrepository/BitmagArcRepositoryClientSettings.xml
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/HadoopUtils.java
The file was modified: common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/RepositorySettings.xml
The file was modified: common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/ReferenceSettings.xml
The file was modified: common/common-core/src/test/java/dk/netarkivet/common/HadoopUtilsTester.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified: pom.xml
The file was modified: common/common-core/pom.xml
The file was added: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXJob.java
The file was added: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMap.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was added: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXIndexer.java
Commit a4871eadd0dcfd71fe3bda54170c5f911ae3da88 by Rasmus Bohl Kristensen (rbkr)
Changes from review
(commit: a4871eadd0dcfd71fe3bda54170c5f911ae3da88)
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/BitmagUtils.java
The file was removed: common/common-core/src/main/resources/dk/netarkivet/common/settings.xml.new.bak
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/HadoopUtils.java
The file was modified: common/common-core/src/main/resources/dk/netarkivet/common/settings.xml
The file was removed: common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/RepositorySettings.xml
The file was modified: common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/bitrepository/BitmagArcRepositoryClientSettings.xml
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXJob.java
The file was removed: common/common-core/src/test/java/dk/netarkivet/common/HadoopUtilsTester.java
The file was removed: common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/ReferenceSettings.xml
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXIndexer.java
Commit e8735f0b8fa3800b012a7928e369cf358b5248d8 by Rasmus Bohl Kristensen (rbkr)
Fixed dependency conflict with hadoop-client package and finished
harvestRecentFilenames
(commit: e8735f0b8fa3800b012a7928e369cf358b5248d8)
The file was modified: pom.xml
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
Commit e494e8d0482fde2bc44ee840b4e675dac00a43d5 by Colin Rosenthal (csr)
Refactored Bitrepository to a singleton.
(commit: e494e8d0482fde2bc44ee840b4e675dac00a43d5)
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/BitmagArcRepositoryClient.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/Bitrepository.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/TestBitrepository.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/BitmagUtils.java
The file was modified: archive/archive-core/src/main/java/dk/netarkivet/archive/arcrepository/distribute/JMSBitmagArcRepositoryClient.java
Commit c091c483521bf77ea95b7450dd0aee6449f5ccac by Rasmus Bohl Kristensen (rbkr)
Small logging changes
(commit: c091c483521bf77ea95b7450dd0aee6449f5ccac)
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit 9a39a4b2aacdc9283f4c34b080693ca5a1ddfe92 by Rasmus Bohl Kristensen (rbkr)
Bitrepository class changes
(commit: 9a39a4b2aacdc9283f4c34b080693ca5a1ddfe92)
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/BitmagArcRepositoryClient.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified: archive/archive-core/src/main/java/dk/netarkivet/archive/arcrepository/distribute/JMSBitmagArcRepositoryClient.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/TestBitrepository.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/BitmagUtils.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/Bitrepository.java
Commit a709b310c9f6af51c6312e3ed3fec73a912e2622 by Rasmus Bohl Kristensen (rbkr)
Dependency fix to avoid logging loop and small logging changes
(commit: a709b310c9f6af51c6312e3ed3fec73a912e2622)
The file was modified: common/common-core/pom.xml
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit 96dfe55073b04b9cab36dbc4607c7496c877c2fd by Colin Rosenthal (csr)
Added some integration tests for indexing on hadoop
(commit: 96dfe55073b04b9cab36dbc4607c7496c877c2fd)
The file was added: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was modified: wayback/wayback-indexer/pom.xml
Commit 1ac65bb88c41f4f60214f728145a0c39ad035d46 by Colin Rosenthal (csr)
Removed the test which was less like the anticipated prod architecture
(commit: 1ac65bb88c41f4f60214f728145a0c39ad035d46)
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
Commit b8eaa110fb2b6f1b8e2530776aa5c8d3e901273c by Colin Rosenthal (csr)
Tidied up the hadoop/cdx integration test
(commit: b8eaa110fb2b6f1b8e2530776aa5c8d3e901273c)
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was modified: wayback/wayback-indexer/pom.xml
Commit c4da8b152e8fa7f10de0bd1fb48ed0868b7664eb by Colin Rosenthal (csr)
Added Readme file in empty directory
(commit: c4da8b152e8fa7f10de0bd1fb48ed0868b7664eb)
The file was added: harvester/harvester-test/src/test/resources/h3-templates/default_obeyrobots.xml
The file was added: wayback/wayback-indexer/src/test/testdata/Readme.md
Commit 96b73b68fa7e1301ab0328df8e0210d56a735c52 by Colin Rosenthal (csr)
Added Readme file in empty directory
(commit: 96b73b68fa7e1301ab0328df8e0210d56a735c52)
The file was added: wayback/wayback-indexer/src/test/testdata/Readme.md
The file was added: harvester/harvester-test/src/test/resources/h3-templates/default_obeyrobots.xml
Commit 60cd273673637ba210763dda8b107a7b96bef508 by Colin Rosenthal (csr)
Added a hdfs setting that seems relevant
(commit: 60cd273673637ba210763dda8b107a7b96bef508)
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
Commit b227515615b8ca0bd8c8fe2fd34be189679549c8 by Rasmus Bohl Kristensen (rbkr)
Made method for indexing with Hadoop that assumes direct access to input
files
(commit: b227515615b8ca0bd8c8fe2fd34be189679549c8)
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/HadoopUtils.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXJob.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit f5508c47ebc69bc946ff16ca87218c949a59508a by Colin Rosenthal (csr)
Dedup indexing
(commit: f5508c47ebc69bc946ff16ca87218c949a59508a)
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXIndexer.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMap.java
The file was added: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/DedupIndexer.java
The file was added: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/Indexer.java
Commit c87a69b57c851c9d90e12982a344da40f614bc74 by Colin Rosenthal (csr)
Code-maturation for cdx-indexing
(commit: c87a69b57c851c9d90e12982a344da40f614bc74)
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit e1c23281deb8bd3055c9f85508c5076076f37c32 by Colin Rosenthal (csr)
Attempt to avoid double-indexing
(commit: e1c23281deb8bd3055c9f85508c5076076f37c32)
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/IndexerQueue.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit d411beb865c7dace665ae2709953857503a37c27 by Colin Rosenthal (csr)
Initial work on FileResolver
(commit: d411beb865c7dace665ae2709953857503a37c27)
The file was added: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileResolver.java
The file was added: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/SimpleFileResolver.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit 553e20659df3bb62c6d121d50c6effa3fc8947e9 by Rasmus Bohl Kristensen (rbkr)
Added hadoop job for getting metadata lines from archive files and an
integration test to go with it
(commit: 553e20659df3bb62c6d121d50c6effa3fc8947e9)
The file was modified: harvester/harvester-test/pom.xml
The file was added: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetaDataArchiveHadoopJobTester.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJob.java
The file was modified: harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/TestInfo.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMap.java
The file was modified: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/GetMetadataArchiveBatchJobTester.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/GetMetadataArchiveMapper.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/archive/GetMetadataArchiveBatchJob.java
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was removed: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXJob.java
Commit 57b380f2f8c289544404aecbb81f6cef8a084274 by Colin Rosenthal (csr)
Added pattern-matching method to file-resolver
(commit: 57b380f2f8c289544404aecbb81f6cef8a084274)
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/SimpleFileResolver.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileResolver.java
Commit 0d880c6017572102b4bf24e60d87ea35a84e2470 by Rasmus Bohl Kristensen (rbkr)
Small refactor of ArchiveFile/HadoopUtils, few touch ups and started on
metadata job
(commit: 0d880c6017572102b4bf24e60d87ea35a84e2470)
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/HadoopUtils.java
The file was modified: harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/GetMetadataArchiveMapper.java
Commit 0602fcb9458cfa655d0cb39f98d004e720bc9e42 by Colin Rosenthal (csr)
Changed test to use paths relative to module root
(commit: 0602fcb9458cfa655d0cb39f98d004e720bc9e42)
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
Commit 2e2173b833c5c5ddd879e7548b96d875a06353a1 by Colin Rosenthal (csr)
Added a conf flag to switch between standard indexing and dedup indexing
for metadata files.
(commit: 2e2173b833c5c5ddd879e7548b96d875a06353a1)
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMap.java
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
Commit d52a6bfda1ed72ad4fc125356ae274f18e0de8c6 by Rasmus Bohl Kristensen (rbkr)
'Start' of https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1970
(commit: d52a6bfda1ed72ad4fc125356ae274f18e0de8c6)
The file was modified: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetadataArchiveMapperTester.java
The file was removed: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/SimpleFileResolver.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMap.java
The file was modified: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/GetMetadataArchiveBatchJobTester.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/SimpleFileResolver.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/FileResolver.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified: harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was removed: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileResolver.java
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/HadoopUtils.java
Commit ca2c62d474caf14d80e8e1e8f3970a4582e84672 by Rasmus Bohl Kristensen (rbkr)
Integration of Hadoop dedup indexing with GetMetadataArchiveMapper now
works - still needs few tweaks though
(commit: ca2c62d474caf14d80e8e1e8f3970a4582e84672)
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/SimpleFileResolver.java
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopFileUtils.java
The file was removed: common/common-core/src/main/java/dk/netarkivet/common/utils/HadoopUtils.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
The file was modified: harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
Commit d10211a994d936309d076e87f0ae9699d99f385e by Rasmus Bohl Kristensen (rbkr)
Cleaned up a few things in RawMetadataCache and refactored HadoopUtils
into two separate classes
(commit: d10211a994d936309d076e87f0ae9699d99f385e)
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
The file was modified: harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/GetMetadataMapper.java
The file was removed: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetadataArchiveMapperTester.java
The file was added: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMapper.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopFileUtils.java
The file was added: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetadataMapperTester.java
The file was removed: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/GetMetadataArchiveMapper.java
The file was removed: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMap.java
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
Commit 8d9adc2b50d996dfaa544528b34a7b6b96947d1e by Rasmus Bohl Kristensen (rbkr)
Squashed commit of the following:
commit d10211a994d936309d076e87f0ae9699d99f385e
Author: bohlski <rbkr@kb.dk>
Date:   Mon Sep 21 15:08:37 2020 +0200
    Cleaned up a few things in RawMetadataCache and refactored HadoopUtils into two separate classes
commit ca2c62d474caf14d80e8e1e8f3970a4582e84672
Author: bohlski <rbkr@kb.dk>
Date:   Tue Sep 15 13:07:22 2020 +0200
    Integration of Hadoop dedup indexing with GetMetadataArchiveMapper now works - still needs few tweaks though
commit d52a6bfda1ed72ad4fc125356ae274f18e0de8c6
Author: bohlski <rbkr@kb.dk>
Date:   Fri Sep 11 09:41:00 2020 +0200
    'Start' of https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1970
commit 73ec57e3facac150ad9dccb85f46718a98456bc0
Merge: 0d880c601 57b380f2f
Author: bohlski <rbkr@kb.dk>
Date:   Thu Sep 3 13:37:28 2020 +0200
    Merge branch 'NARK-1882-hadoop-indexing' of https://github.com/netarchivesuite/netarchivesuite into NARK-1882-hadoop-indexing
commit 0d880c6017572102b4bf24e60d87ea35a84e2470
Author: bohlski <rbkr@kb.dk>
Date:   Thu Sep 3 13:37:24 2020 +0200
    Small refactor of ArchiveFile/HadoopUtils, few touch ups and started on metadata job
commit 24aaecbc74299fa9fda9191dfe510977aa027b8f
Author: bohlski <rbkr@kb.dk>
Date:   Tue Sep 1 14:30:03 2020 +0200
    Added filehandling for GetMetadataArchiveMapper and small touch ups
(commit: 8d9adc2b50d996dfaa544528b34a7b6b96947d1e)
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/SimpleFileResolver.java
The file was modified: common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was removed: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileResolver.java
The file was added: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetadataMapperTester.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopFileUtils.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/FileResolver.java
The file was modified: harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/RawMetadataCacheTester.java
The file was modified: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was removed: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetaDataArchiveHadoopJobTester.java
The file was removed: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/GetMetadataArchiveMapper.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/GetMetadataMapper.java
The file was removed: common/common-core/src/main/java/dk/netarkivet/common/utils/HadoopUtils.java
The file was modified: wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was added: common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
The file was modified: harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/GetMetadataArchiveBatchJobTester.java
The file was removed: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMap.java
The file was modified: harvester/harvester-core/src/main/java/dk/netarkivet/harvester/harvesting/metadata/MetadataFile.java
The file was added: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMapper.java
The file was removed: wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/SimpleFileResolver.java
Commit 9c130776d86bffa43170028cad724353348ec8dc by Rasmus Bohl Kristensen (rbkr)
Added pattern configuration constants in GetMetadataMapper
(commit: 9c130776d86bffa43170028cad724353348ec8dc)
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/GetMetadataMapper.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMapper.java
Commit 553c4afcb7ddf654b26cc4c9afa3c4cdc7c79197 by Rasmus Bohl Kristensen (rbkr)
Review https://sbforge.org/fisheye/cru/CR-NAS-385 changes
(commit: 553c4afcb7ddf654b26cc4c9afa3c4cdc7c79197)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXIndexer.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopFileUtils.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/FileResolver.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/GetMetadataMapper.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was added common/common-core/src/test/java/dk/netarkivet/common/utils/SimpleFileResolverTester.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMapper.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/SimpleFileResolver.java
The file was modified harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetadataMapperTester.java
Commit 199355bf63aba632f2dcb23c66de7778b794e97e by Rasmus Bohl Kristensen (rbkr)
Small refactor and implemented harvestRecentFilenames
(commit: 199355bf63aba632f2dcb23c66de7778b794e97e)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/WaybackIndexer.java
Commit dbf8703610bf19bfe044cf7f6f59a10710fdf7b4 by Rasmus Bohl Kristensen (rbkr)
Removed more old bitmag classes, refactored parts of some classes for
easier later removal, and added old necessary functionality (e.g. utils)
to already existing classes
(commit: dbf8703610bf19bfe044cf7f6f59a10710fdf7b4)
The file was removed common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/BitmagArcRepositoryClient.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/action/getfile/GetFileEventHandler.java
The file was modified common/common-test/src/test/java/dk/netarkivet/common/tools/BitmagSimpleApplication.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/action/putfile/PutFileEventHandler.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/NetarchivesuiteBlockingEventHandler.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/action/getfileids/GetFileIDsEventHandler.java
The file was modified archive/archive-core/src/main/java/dk/netarkivet/archive/arcrepository/distribute/JMSBitmagArcRepositoryClient.java
The file was removed common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/bitrepository/BitmagArcRepositoryClientSettings.xml
The file was added common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/Bitrepository.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/Bitrepository.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/NetarchivesuiteBlockingEventHandler.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/action/putfile/PutFileAction.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/BitmagUtils.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/BitmagUtils.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/distribute/arcrepository/bitrepository/BitrepositoryUtils.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/action/getfile/GetFileAction.java
The file was modified common/common-core/src/main/resources/dk/netarkivet/common/distribute/arcrepository/bitrepository/JmsBitmagArcRepositoryClientSettings.xml
Commit 48d011475ef9dc0480986dbab35e8266e3207f4f by Rasmus Bohl Kristensen (rbkr)
Fixed some old imports that made the compiler angry
(commit: 48d011475ef9dc0480986dbab35e8266e3207f4f)
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/warc/WarcRecordClient.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/WaybackIndexer.java
Commit 0a31340c22213cb7707a5188ec83ded5143c22ce by Colin Rosenthal (csr)
Fixed up FileResolverRESTClient for review and refactored code to enable
its use via factory method
(commit: 0a31340c22213cb7707a5188ec83ded5143c22ce)
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/FileResolverRESTClient.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/SimpleFileResolverTester.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/FileResolver.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/FileResolverRESTClientTest.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/SimpleFileResolver.java
Commit b7120d70f3adf93ca6a12368f30897084a8a6295 by Rasmus Bohl Kristensen (rbkr)
Added more logging to FileNameHarvester
(commit: b7120d70f3adf93ca6a12368f30897084a8a6295)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
Commit 358b6977ce8fc98987a32327d57815bd30c0f34a by Rasmus Bohl Kristensen (rbkr)
Small refactor to make ArchiveFile's collectHadoopResults use
HadoopJobUtils' collectOutputLines method - fix for old todo
(commit: 358b6977ce8fc98987a32327d57815bd30c0f34a)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit d8c00a93115685b3357ec4202d7de727d76486fa by Rasmus Bohl Kristensen (rbkr)
Fixed bug with indexing threads sharing same filesystem instance
(commit: d8c00a93115685b3357ec4202d7de727d76486fa)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit e9969a932d57a7e301ec95365219e3a662c3b0c6 by Colin Rosenthal (csr)
Fixed bug with indexing threads sharing same filesystem instance
(commit: e9969a932d57a7e301ec95365219e3a662c3b0c6)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
Commit 7000ae16f8936955299227260d601c5db7005b81 by Rasmus Bohl Kristensen (rbkr)
Added cdx indexing for metadata files in CDXIndexer and proper testing
that works
(commit: 7000ae16f8936955299227260d601c5db7005b81)
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMapper.java
The file was modified wayback/wayback-indexer/pom.xml
The file was added wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXMapperTester.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/Reporting.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXIndexer.java
Commit 52c718231cc4eb4ba13f6161ace4f701aeb4b738 by Rasmus Bohl Kristensen (rbkr)
Got Hadoop replacement for ArchiveExtractCDXJob ready, refactored some
stuff, fixed old bugs in CDXMapper and added more tests for it
(commit: 52c718231cc4eb4ba13f6161ace4f701aeb4b738)
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/Reporting.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/batch/UrlCanonicalizerFactory.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXIndexer.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopFileUtils.java
The file was modified wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXMapperTester.java
The file was modified harvester/harvester-core/pom.xml
Commit 5cb2bc46120e35bc9e1074ec1f135efd672d4b2a by Rasmus Bohl Kristensen (rbkr)
Setting fix from review https://sbforge.org/jira/browse/NARK-1954
(commit: 5cb2bc46120e35bc9e1074ec1f135efd672d4b2a)
The file was modified common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/WaybackIndexer.java
Commit bf5e943440e3760f4e25662ffcf603c3a78c0b2e by Rasmus Bohl Kristensen (rbkr)
Review changes https://sbforge.org/fisheye/cru/CR-NAS-393, changes to
uber-jar set up and small pom fixes
(commit: bf5e943440e3760f4e25662ffcf603c3a78c0b2e)
The file was added hadoop-uber-jar/pom.xml
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/DedupIndexer.java
The file was modified harvester/harvest-scheduler/pom.xml
The file was modified harvester/history-gui/pom.xml
The file was modified harvester/harvester-core/pom.xml
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/Reporting.java
The file was modified harvester/heritrix3/heritrix3-bundler/pom.xml
The file was added harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/hadoop/MetadataCDXMapper.java
The file was modified harvester/heritrix3/heritrix3-extensions/pom.xml
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified wayback/wayback-indexer/pom.xml
The file was modified wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXIndexer.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
The file was modified pom.xml
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/hadoop/CDXMapper.java
The file was modified wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXMapperTester.java
Commit 987c230dc013d45aaac9554df0392043965e56a0 by Rasmus Bohl Kristensen (rbkr)
Fixed SimpleFileResolver, refactored how Hadoop jobs can be started, and
implemented getCrawlLogLinesMatchingRegexp
(commit: 987c230dc013d45aaac9554df0392043965e56a0)
The file was added harvester/harvester-test/src/test/java/dk/netarkivet/viewerproxy/webinterface/hadoop/MetadataCDXMapperTester.java
The file was modified wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXMapperTester.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/CrawlLogLinesMatchingRegexp.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/Reporting.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobTool.java
The file was added harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/hadoop/CrawlLogExtractionMapper.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJob.java
The file was modified wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetadataMapperTester.java
The file was added harvester/harvester-test/src/test/java/dk/netarkivet/viewerproxy/webinterface/hadoop/CrawlLogExtractionMapperTester.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/batch/FileBatchJob.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/JobType.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/SimpleFileResolver.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
The file was modified harvester/harvester-test/src/test/java/dk/netarkivet/viewerproxy/webinterface/TestInfo.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/SimpleFileResolverTester.java
Commit ab9b8860ca1f5323ca20cabf8a23c7ee01009bc8 by Rasmus Bohl Kristensen (rbkr)
Squashed commit of the following:
commit 92109a0ad5ddb9238d593fbce21023ae05804b16
Author: bohlski <rbkr@kb.dk>
Date:   Mon Dec 7 11:41:48 2020 +0100
    Small changes from review https://sbforge.org/fisheye/cru/CR-NAS-395
commit e052b35ecbe8d07c2a88e914d3202d863b57bf50
Author: bohlski <rbkr@kb.dk>
Date:   Wed Dec 2 15:05:48 2020 +0100
    Made small fix/cleanup in crawl log mapper and added more documentation
commit dcee3b48afde25b3ab1ac42fa65adc64f672d91e
Author: bohlski <rbkr@kb.dk>
Date:   Wed Dec 2 14:07:18 2020 +0100
    Added settings for new job and finished last refactoring parts
commit 987c230dc013d45aaac9554df0392043965e56a0
Author: bohlski <rbkr@kb.dk>
Date:   Mon Nov 30 11:56:25 2020 +0100
    Fixed SimpleFileResolver, refactored how Hadoop jobs can be started, and implemented getCrawlLogLinesMatchingRegexp
(commit: ab9b8860ca1f5323ca20cabf8a23c7ee01009bc8)
The file was modified common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/batch/FileBatchJob.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/SimpleFileResolver.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/SimpleFileResolverTester.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/MetadataExtractionStrategy.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJob.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobStrategy.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/Reporting.java
The file was added harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/hadoop/MetadataCDXExtractionStrategy.java
The file was modified harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetadataMapperTester.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobTool.java
The file was modified harvester/harvester-test/src/test/java/dk/netarkivet/viewerproxy/webinterface/TestInfo.java
The file was modified wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXMapperTester.java
The file was added harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/hadoop/CrawlLogExtractionMapper.java
The file was modified wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/CrawlLogLinesMatchingRegexp.java
The file was added harvester/harvester-test/src/test/java/dk/netarkivet/viewerproxy/webinterface/hadoop/CrawlLogExtractionMapperTester.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was added harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/hadoop/CrawlLogExtractionStrategy.java
The file was added harvester/harvester-test/src/test/java/dk/netarkivet/viewerproxy/webinterface/hadoop/MetadataCDXMapperTester.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
Commit ad3aaf637af932564da7d15e83b75c02e9f7fb6f by Rasmus Bohl Kristensen (rbkr)
Removed old bitmag classes and remnants of it
(commit: ad3aaf637af932564da7d15e83b75c02e9f7fb6f)
The file was removed common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/Bitrepository.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified archive/archive-core/src/main/java/dk/netarkivet/archive/arcrepository/distribute/BitmagArcRepositoryClient.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/utils/BitmagUtils.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/NetarchivesuiteBlockingEventHandler.java
Commit 8f03956481c07a5c8639a847a6a51a30a8827882 by Rasmus Bohl Kristensen (rbkr)
Bit of refactoring and made SSL provider to work with https
(commit: 8f03956481c07a5c8639a847a6a51a30a8827882)
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/CGIRequestBuilder.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/BasicTwoWaySSLProvider.java
The file was modified archive/archive-core/src/main/java/dk/netarkivet/archive/arcrepository/distribute/BitmagArcRepositoryClient.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/warc/WarcRecordClientTester.java
The file was modified common/common-core/src/main/resources/dk/netarkivet/common/settings.xml
The file was modified common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/SimpleFileResolver.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/FileResolver.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/utils/SimpleFileResolver.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/FileResolverRESTClient.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/utils/FileResolverRESTClient.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/utils/warc/WarcRecordClient.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJob.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/utils/FileResolver.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/WarcRecordClient.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/HttpsClientBuilder.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/FileResolverRESTClientTest.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/warc/WarcRecordClientTest.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/SimpleFileResolverTester.java
Commit b006660cc04ac3f6c2442dc0d09b74b6c017c9c9 by Asger Askov Blekinge (abr)
Create FileSystem with newInstance and close it afterwards. DO NOT CLOSE
FileSystems gotten with .get
(commit: b006660cc04ac3f6c2442dc0d09b74b6c017c9c9)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified hadoop-uber-jar/src/main/java/MetadataIndexingApplication.java
The file was modified wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/GetMetadataMapper.java
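Commit b006660 depends on a distinction in Hadoop's FileSystem API. `FileSystem.get(...)` returns an instance from a JVM-wide cache keyed on scheme, authority and user. Every caller with the same key therefore receives the same object, and closing it breaks all of them. `FileSystem.newInstance(...)` instead creates a private instance that the caller owns and should close. A shared cached instance is plausibly also what lay behind the "indexing threads sharing same filesystem instance" fixes listed above.

The sketch below models that behaviour with JDK-only stand-ins so it runs without Hadoop. `FsHandle`, `FsFactory` and `IndexingTask` are hypothetical names, not Hadoop or NetarchiveSuite classes, and this is not the code from the commit.

```java
import java.io.Closeable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a file-system client handle (NOT Hadoop's FileSystem).
class FsHandle implements Closeable {
    private volatile boolean closed = false;

    String read(String path) {
        // A closed handle can no longer be used by anyone holding a reference to it.
        if (closed) {
            throw new IllegalStateException("Filesystem closed");
        }
        return "contents of " + path;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Mimics the two factory styles: get() hands out one cached, shared instance per URI,
// newInstance() always creates a fresh instance owned by the caller.
final class FsFactory {
    private static final ConcurrentMap<String, FsHandle> CACHE = new ConcurrentHashMap<>();

    private FsFactory() {}

    static FsHandle get(String uri) {
        return CACHE.computeIfAbsent(uri, u -> new FsHandle());
    }

    static FsHandle newInstance(String uri) {
        // The URI is unused in this toy model; a real client would connect to it.
        return new FsHandle();
    }
}

// Hypothetical indexing step: each call owns a private instance and closes it when done,
// so other threads that use the shared cached instance are unaffected.
final class IndexingTask {
    private IndexingTask() {}

    static String indexFile(String uri, String path) {
        try (FsHandle fs = FsFactory.newInstance(uri)) {
            return fs.read(path);
        }
    }
}
```

The design point is ownership. Code that obtains a handle from the `newInstance`-style factory may close it with try-with-resources. A handle from the cached `get`-style factory should be left open, because other code may still be using the same object.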
Commit 48366f7a72262d6f1442b635b466a2b929b9bcd3 by Colin Rosenthal (csr)
Refactoring to make MetadataIndexingApplication closer to a reusable
real-world case.
(commit: 48366f7a72262d6f1442b635b466a2b929b9bcd3)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/Reporting.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
The file was added hadoop-uber-jar-invoker/src/main/resources/input.txt
The file was modified pom.xml
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified common/common-core/src/main/resources/dk/netarkivet/common/settings.xml
The file was modified hadoop-uber-jar-invoker/src/main/resources/run.sh
The file was modified common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified hadoop-uber-jar/src/main/java/MetadataIndexingApplication.java
Commit 73e016f0e681e23ea9b50428fed236cbba4b706c by Rasmus Bohl Kristensen (rbkr)
Squashed commit of the following:
commit c0397d7d25f0848af27c5ba646cc6b0124afa9ff
Merge: d3522cfe5 f0f4a71ed
Author: bohlski <rbkr@kb.dk>
Date:   Tue Feb 16 11:15:41 2021 +0100
    Merge branch 'bitmag' into NARK-2016-HTTPS-WRS-FileResolver
commit d3522cfe5d45e6eeafeaf740119b5ef9bc7604d1
Author: bohlski <rbkr@kb.dk>
Date:   Tue Feb 16 11:08:47 2021 +0100
    Added default truststore settings
commit 44da8bb849b06dc55392c01eaf7475563bc75313
Author: bohlski <rbkr@kb.dk>
Date:   Wed Feb 10 16:04:30 2021 +0100
    Few clarification fixes to java doc
commit 18e8f959abf99005115a5e2521286242e98db82f
Author: bohlski <rbkr@kb.dk>
Date:   Tue Feb 2 16:46:49 2021 +0100
    Changed how the SSLContext is built to avoid trusting self-signed certs
commit 8f03956481c07a5c8639a847a6a51a30a8827882
Author: bohlski <rbkr@kb.dk>
Date:   Tue Feb 2 11:02:29 2021 +0100
    Bit of refactoring and made SSL provider to work with https
(commit: 73e016f0e681e23ea9b50428fed236cbba4b706c)
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJob.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/utils/warc/WarcRecordClient.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/warc/WarcRecordClientTester.java
The file was modified archive/archive-core/src/main/java/dk/netarkivet/archive/arcrepository/distribute/BitmagArcRepositoryClient.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/utils/FileResolver.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/utils/FileResolverRESTClient.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/CGIRequestBuilder.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/warc/WarcRecordClientTest.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/WarcRecordClient.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/SimpleFileResolverTester.java
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/FileResolverRESTClientTest.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/BasicTwoWaySSLProvider.java
The file was modified common/common-core/src/main/resources/dk/netarkivet/common/settings.xml
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/FileResolverRESTClient.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/HttpsClientBuilder.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was removed common/common-core/src/main/java/dk/netarkivet/common/utils/SimpleFileResolver.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/SimpleFileResolver.java
The file was added common/common-core/src/main/java/dk/netarkivet/common/utils/service/FileResolver.java
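The squashed commit above changes how the SSLContext is built so that self-signed certificates are no longer trusted. It also adds default truststore settings.

With the standard JSSE API, one common way to get two-way (mutual) TLS without blanket trust is to initialise an `SSLContext` from two factories. A `KeyManagerFactory` holds the client's own key store, and a `TrustManagerFactory` holds an explicit trust store. This replaces a permissive trust manager or a strategy that accepts any self-signed certificate.

The following is a minimal sketch under that assumption. `TwoWaySslContexts` is a hypothetical name, and this is not the actual `BasicTwoWaySSLProvider` code.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper: builds a TLS context for mutual authentication from explicit stores.
final class TwoWaySslContexts {
    private TwoWaySslContexts() {}

    static SSLContext build(KeyStore clientKeys, char[] keyPassword, KeyStore trustedCerts)
            throws GeneralSecurityException {
        // Key managers present our own certificate to the server (the "two-way" part).
        KeyManagerFactory kmf = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(clientKeys, keyPassword);

        // Trust managers only accept server certificates that chain to entries in the
        // supplied trust store; no trust-all or accept-self-signed shortcut is installed.
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustedCerts);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null); // null = default SecureRandom
        return ctx;
    }
}
```

In practice the two `KeyStore` objects would be loaded from the configured keystore and truststore files, for example with `KeyStore.load(InputStream, char[])`.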
Commit d022a62c4c855acd8a042db1a240ea515a063d7f by Rasmus Bohl Kristensen (rbkr)
Moved Kerberos logins
(commit: d022a62c4c855acd8a042db1a240ea515a063d7f)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/WaybackIndexer.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/distribute/IndexRequestServer.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/RawMetadataCache.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/ArchiveFile.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/BasicTwoWaySSLProvider.java
The file was modified harvester/harvester-test/src/test/java/dk/netarkivet/harvester/indexserver/hadoop/GetMetadataMapperTester.java
Commit a114fff89a4046d8084f41be0a25f599dd785575 by Rasmus Bohl Kristensen (rbkr)
collectionID setting fix to always default to env name when unset
(commit: a114fff89a4046d8084f41be0a25f599dd785575)
The file was modified common/common-core/src/main/java/dk/netarkivet/common/webinterface/GUIWebServer.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/service/CGIRequestBuilder.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/distribute/bitrepository/BitmagUtils.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/harvester/indexserver/distribute/IndexRequestServer.java
The file was modified archive/archive-core/src/main/java/dk/netarkivet/archive/arcrepository/distribute/BitmagArcRepositoryClient.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/WaybackIndexer.java
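Commit a114fff makes the collectionID setting fall back to the environment name whenever it is unset. The rule amounts to a small null/blank check. The helper below is a hypothetical illustration, not the code in BitmagUtils or GUIWebServer. The class and parameter names are assumptions, and so is the choice to treat a blank value the same as an unset one.

```java
// Hypothetical helper illustrating the fallback rule described in commit a114fff.
final class CollectionIdResolver {
    private CollectionIdResolver() {}

    /**
     * Returns the configured collection ID, or the environment name when the setting
     * is unset. Treating an empty or whitespace-only value as unset is an assumption.
     */
    static String resolve(String configuredCollectionId, String environmentName) {
        if (configuredCollectionId == null || configuredCollectionId.trim().isEmpty()) {
            return environmentName;
        }
        return configuredCollectionId;
    }
}
```

Keeping the fallback in a single helper means every caller that reads the setting applies the same default.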
Commit 48977bc912bcd2f4b9b44db06e65c04eb0281352 by Rasmus Bohl Kristensen (rbkr)
Just some optimized imports and small stuff
(commit: 48977bc912bcd2f4b9b44db06e65c04eb0281352)
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/FileNameHarvester.java
The file was modified wayback/wayback-indexer/src/main/java/dk/netarkivet/wayback/indexer/WaybackIndexer.java
Commit 681d1147f7824aa7a18804c4b1d9475ce9c26c19 by Colin Rosenthal (csr)
Follow up to own review comments
(commit: 681d1147f7824aa7a18804c4b1d9475ce9c26c19)
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/warc/WarcRecordClientTest.java
The file was modified hadoop-uber-jar/pom.xml
The file was modified common/common-core/src/test/java/dk/netarkivet/common/utils/warc/WarcRecordClientTester.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/CommonSettings.java
The file was modified common/common-core/src/main/java/dk/netarkivet/common/utils/hadoop/HadoopJobUtils.java
The file was modified harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/Reporting.java
The file was modified wayback/wayback-indexer/src/test/java/dk/netarkivet/wayback/hadoop/CDXJobTest.java