0) Install NetarchiveSuite (NAS).
1) Make sure that the recovery log is enabled in Heritrix. This is done by setting the "recovery-log-enabled" property of the frontier to true:
<boolean name="recovery-log-enabled">true</boolean>
1a) Make sure that the recover.gz file is included in the metadata arc by adding
<logFilePattern>.*(\.log|\.out|\.gz)</logFilePattern> to harvester.harvesting.metadata in the HarvestController settings file.
2) Enable recovery in NetarchiveSuite for just one of the HIGH-priority harvesters by setting "settings.harvester.harvesting.continuationFromHeritrixRecoverlogEnabled" to true in the harvester settings file, like this:
<harvesting>
<continuationFromHeritrixRecoverlogEnabled>true</continuationFromHeritrixRecoverlogEnabled>
..
3) Restart the HIGH-priority harvester whose settings you have modified, and shut down the rest of the HIGH-priority harvesters in your installation.
4) Start a rather large selective harvest, so that the recovery mechanism has time to work. After about 15 minutes, kill the harvester (kill -9), and afterwards restart the harvesting process as well. If it is not restarted, it takes a long time (6 hours+) before NAS gives up on the dead Heritrix process, continues with the data processing, and uploads the metadata (including the recover.gz file) to the archive.
5) Restart the killed job.
6) Check that NAS successfully extracts the recover.gz file from the previous job and points to it in the order.xml (recover-path attribute).
7) After 15 minutes, terminate the job in the Heritrix GUI.
8) Examine the CDX entries from the two metadata-arc files, and verify that the URLs downloaded in the killed harvest are not harvested again in the restarted harvest.
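
The comparison in step 8 can be done by hand, but for a large harvest a small script may be easier. The following is only a sketch under some assumptions that are not part of the procedure above: the CDX lines of each job have already been extracted from the metadata-arc file into a plain text file, the fields are separated by whitespace, and the URL is the first field (adjust url_field if your CDX layout differs). The function names read_urls and reharvested are made up for this example.

```python
import sys


def read_urls(cdx_path, url_field=0):
    """Return the set of URLs found in a whitespace-separated CDX file.

    Assumption: the URL is in column `url_field` (0-based).
    """
    urls = set()
    with open(cdx_path, encoding="utf-8", errors="replace") as cdx:
        for line in cdx:
            fields = line.split()
            # Skip blank or short lines and a possible " CDX ..." header line.
            if len(fields) <= url_field or fields[0] == "CDX":
                continue
            urls.add(fields[url_field])
    return urls


def reharvested(first_cdx, second_cdx, url_field=0):
    """Return the URLs present in both the killed job and the restarted job."""
    return read_urls(first_cdx, url_field) & read_urls(second_cdx, url_field)


if __name__ == "__main__" and len(sys.argv) == 3:
    overlap = reharvested(sys.argv[1], sys.argv[2])
    for url in sorted(overlap):
        print(url)
    print(f"{len(overlap)} URL(s) appear in both jobs", file=sys.stderr)
```

Run it with the two extracted CDX files as arguments (killed job first, restarted job second). It prints every URL that occurs in both files. An empty list is what step 8 expects to see. Any URLs that are listed need to be checked by hand.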