Changes
Summary
- https://sbforge.org/jira/browse/NAS-2859 (details)
- Update OnbFreeSpaceProvider.java (details)
- Update OnbFreeSpaceProvider.java (details)
- Update OnbFreeSpaceProvider.java (details)
- Actually want to write requests and metadata by default in tests! (details)
- Updated version to a unique name (details)
- Quick fix to NARK-1819 (details)
- Non-functional arcrep for use in bitmag development (details)
- poms updated with Hadoop and basic settings for it added in (details)
- Small changes to settings (details)
- Can now at least work with local bitmag, it seems (details)
- FileNameHarvester now grabs list of files directly from bitmag. Added (details)
- Indexing through hadoop instead of batch should now work for WARC files (details)
- Changes from review (details)
- Fixed dependency conflict with hadoop-client package and finished (details)
- Refactored Bitrepository to a singleton. (details)
- Small logging changes (details)
- Bitrepository class changes (details)
- Dependency fix to avoid logging loop and small logging changes (details)
- WarcRecordClient.java and ApacheClientReaderFactory.java in (details)
- WarcRecordClient get and getFile changed (details)
- Possible partial solution (details)
- Fixed data file and added a little documentation (details)
- Added some integration tests for indexing on hadoop (details)
- Removed the test which was less like the anticipated prod architecture (details)
- Tidied up the hadoop/cdx integration test (details)
- Added an integration test for WarcRecordClient (details)
- Added Readme file in empty directory (details)
- Added Readme file in empty directory (details)
- Added a hdfs setting that seems relevant (details)
- WarcRecord fixes for WarcRecordClientTest and Tester (details)
- Error fix (details)
- Made method for indexing with Hadoop that assumes direct access to input (details)
- Dedup indexing (details)
- latest from pc (details)
- Moved getWarc from constructor to get (details)
- Code-maturation for cdx-indexing (details)
- URI corrected to include filename. Not yet robust for files not in gzip (details)
- Hardcoded finName for testing (details)
- Hardcoded finName for testing (details)
- Attempt to avoid double-indexing (details)
- Now passes integration test. (details)
- Now returns correct record. (details)
- After a little cleanup (details)
- Initial work on FileResolver (details)
- After a little more cleanup, but before logs (details)
- Added hadoop job for getting metadata lines from archive files and an (details)
- latest update (details)
- Added filehandling for GetMetadataArchiveMapper and small touch ups (details)
- added null response if http statuscode is not 200 (details)
- removed printlns and added logging for http exception (details)
- Added pattern-matching method to file-resolver (details)
- Small refactor of ArchiveFile/HadoopUtils, few touch ups and started on (details)
- added test methods for archive files and negative testing (details)
- Changed test to use paths relative to module root (details)
- Added tests (details)
- test corrections excludes .gz (details)
- Added a conf flag to switch between standard indexing and dedup indexing (details)
- 'Start' of https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1970 (details)
- Tiny settings change for NARK-1882 review (details)
- Integration of Hadoop dedup indexing with GetMetadataArchiveMapper now (details)
- Cleaned up a few things in RawMetadataCache and refactored HadoopUtils (details)
- Squashed commit of the following: (details)
- Added pattern configuration constants in GetMetadataMapper (details)
- Cleanup according to review (details)
- Latest changes in getFile etc. (details)
- corrected (details)
- A few final edits. (details)
- 'Initial' commit (details)
- First commit on arc_record branch (details)
- Review https://sbforge.org/fisheye/cru/CR-NAS-385 changes (details)
- Added testing (details)
- added .arc test-files (details)
- Fixed dependency problem and added simple application class to run (details)
- Fixed get .arc-record with positive offset (details)
- Small refactor and implemented harvestRecentFilenames (details)
- Javadoc added to few files https://sbforge.org/fisheye/cru/CR-NAS-387 (details)
- More review changes https://sbforge.org/fisheye/cru/CR-NAS-387 (details)
- minor changes tests (details)
- Initial functioning FileResolverRESTClient (details)
- Removed some old bitmag classes (details)
- Improved handling of try/catch logic (details)
- Added some new tests and matured code ready for review (details)
- Removed more old bitmag classes, refactored parts of some classes for (details)
- Fixed some old imports that made the compiler angry (details)
- Fixed up FileResolverRESTClient for review and refactored code to enable (details)
- Added more logging to FileNameHarvester (details)
- Small refactor to make ArchiveFile's collectHadoopResults use (details)
- Latest bug fixes on loop testing (details)
- Fixed bug with indexing threads sharing same filesystem instance (details)
- Undo of file-change permissions. (details)
- Fixed bug with indexing threads sharing same filesystem instance (details)
- Fixed handling of returning used client to pool (details)
- Added cdx indexing for metadata files in CDXIndexer and proper testing (details)
- Got Hadoop replacement for ArchiveExtractCDXJob ready, refactored some (details)
- Added setting for new job input/output dirs and more logging (details)
- Setting fix from review https://sbforge.org/jira/browse/NARK-1954 (details)
- Tidied up logic in client and tests (details)
- Just save it for further improvement (details)
- Review changes https://sbforge.org/fisheye/cru/CR-NAS-393, changes to (details)
- added FAILED check to JMSBitmagArcRepositoryClient.java (details)
- FileResolverRESTClient now sends collectionId as an extra query (details)
- Fixed SimpleFileResolver, refactored how Hadoop jobs can be started, and (details)
- Added collectionId parameter to WarcRecordClient (details)
- Added exactfilename parameter to FileResolverRESTClient. (details)
- uber.jar fixes and JMSBitmagArcRepositoryClient.java adds (details)
- Added settings for new job and finished last refactoring parts (details)
- Made small fix/cleanup in crawl log mapper and added more documentation (details)
- just to be sure (details)
- Squashed commit of the following: (details)
- Small changes from review https://sbforge.org/fisheye/cru/CR-NAS-395 (details)
- news (details)
- unfinished code (details)
- Not finished 2 (details)
- Corrected for PutFileAction (details)
- Modified JMSBitmagArcRepositoryClient, PutfileAction and (details)
- small changes in PutFileAction and PutFileEventHandler (details)
- latest (details)
- Working version (details)
- newest version with warcRecordClient updates (details)
- Cleaned up outcommenting (details)
- Added a default value for setting useBitmagHadoopBackend (details)
- Squashed commit of the following: (details)
- Removed bitmag entries reinvented by mistake (details)
- Fixed duplicate code. (details)
- Removed old bitmag classes and remnants of it (details)
- modified copy-nas-and-heritrix.sh (details)
- First attempt at a kill switch that returns an empty index for dedups (details)
- Second attempt using IndexReadyMessage (details)
- Added some logging (details)
- Further attempt (details)
- Further attempt using IndexReadyMessage (details)
- Back to reply (details)
- Added a bit more logging. (details)
- Removed potential error when requesting empty cache (details)
- Clean-up (details)
- Removed dead code (details)
- Removed copy-nas-and-heritrix.sh from version control (details)
- Basic CR-NAS-399 changes (details)
- Fixed according to CFR-62389 (details)
- Improvements according to CR-NAS-399 (details)
- BitmagUtils.shutdown() and pillar check moved (details)
- Removed instance=new BitmagArcRepositoryClient() from constructor (details)
- Bit of refactoring and made SSL provider to work with https (details)
- First attempt at a command-line metadata extraction job. (details)
- Changed how the SSLContext is built to avoid trusting self-signed certs (details)
- Fixed the error with closed hadoop file system (details)
- Downgraded hadoop to stable 3.2.2 (details)
- Created an invoker-module to prevent the job from including all the (details)
- Package in libs (details)
- Create FileSystem with newInstance and close it afterwards. DO NOT CLOSE (details)
- Added an extra sanity check in the run.sh script. (details)
- Added an extra line to show how to customise location of krb5.conf (details)
- Improved logging (details)
- Attempted improvement of remote file handling of failures. (details)
- minor cleanup (details)
- Modified to support dynamic identification of the correct file-system (details)
- Few clarification fixes to java doc (details)
- Refactoring to make MetadataIndexingApplication closer to a reusable (details)
- Parametrised the script to make it more flexible. (details)
- Refactored to use login mechanism instead of doAs. (details)
- Removed all unnecessary configuration overrides. (details)
- Initial version using fileresolver (details)
- Added default truststore settings (details)
- Added explicit jersey-server dep. to GUI. (details)
- Added a necessary filtering stage to match only current collection (details)
- Set fallback to environment name for collection (details)
- Added skeletal getFile (details)
- Fix as vp creates empty toFile but bitmag requires non-existing toFile. (details)
- Squashed commit of the following: (details)
- Tidying up for review. (details)
- Quick attempt to enable hadoop in GUI (details)
- Upgraded guava version (details)
- Sorting out separate inclusion of shaded jar. (details)
- Remove "netarchivesuite" prefix from uber jar name. (details)
- Improved logging on job creation (details)
- Forcing HadoopJobStrategy to use hdfs (details)
- Forcing HadoopJobStrategy to use hdfs (details)
- Added harvester-core to uber jar (details)
- Read hadoop truststore location from NAS settings (details)
- Pom Jersey fix (details)
- Stuff (details)
- Small guava pom change (details)
- Follow-up from code review (details)
- Moved Kerberos logins (details)
- Small fixes and revert (details)
- Readded Kerberos login to IndexRequestServer (details)
- Added line 122 with casting (CleanupIF) (details)
- Small job change for clarity in cluster job overview (details)
- Outcommented TestCorrect, FileChecksumArchiveTester, (details)
- Changed outcommenting to @Ignore (details)
- collectionID setting fix to always default to env name when unset (details)
- Small logging change (details)
- Tests that fail locally are ignored (details)
- Added an intellij test configuration (details)
- Just some optimized imports and small stuff (details)
- modified test config (details)
- Corrected internal versions (details)
- Added hadoop-common as necessary (details)
- More fixed versions (details)
- switched line 123 with line 122 CleanupIF.. (details)
- Made ArcRepositoryServer implement CleanupIF (details)
- Minimal fix to test if ssl works correctly (details)
- Removed applications from test (details)
- Improved test logic (details)
- Unused imports and small line removals (details)
- Added an exclusion to prevent fatal runtime error. (details)
- Added another exclusion to prevent fatal runtime error. (details)
- Follow up to own review comments (details)
- Filtering transient GUIWebServer from integration test (details)
- Fixed check in test of which type of instance is running. (details)
- Debugging generalTest (details)
- Returned old logic. VM running the integration test uses default (details)
- New pom.xml for interfaces and annotations (details)
- Fixed circular dependency (details)
- Removed requirement for RequiresFileResolver (details)
- Added compile-time groovy dep to make the groovy scripts look better in (details)
- Explicitly exclude wrong je and httpclient from heritrix bundler (details)
- Removed unused dependency which was causing a problem. (details)