For configuration related to NetarchiveSuite, please refer to the section Detailed Configurations#Configure Heritrix process.

For more specific Heritrix configurations, please refer to Appendix B - Managing Heritrix Harvest Templates (order.xml) and Appendix C - Migrate the Heritrix templates to NetarchiveSuite 3.6.0+ of this document.

By default, crawling in NetarchiveSuite uses deduplication. This feature, and how to disable it, is described in the Configuration Manual, Section 8.1.2.

How to configure which Heritrix reports are uploaded to the metadata ARC file

Three settings properties control which Heritrix reports are added to the metadata ARC file: