For Heritrix configuration specific to NetarchiveSuite, please refer to the section Detailed Configurations#Configure Heritrix process.

For more specific Heritrix configurations, please refer to Appendix B - Managing Heritrix Harvest Templates (order.xml) and Appendix C - Migrate the Heritrix templates to NetarchiveSuite 3.6.0+ of this document.

By default, crawling in NetarchiveSuite uses deduplication. This feature, and how to disable it, is described in the Configuration Manual, Section 8.1.2.
