
Note that this documentation is for the upcoming release and is still a work in progress.
For documentation on the released versions, please view the previous versions of the NetarchiveSuite documentation and select the relevant version.


For configuration related to NetarchiveSuite, please refer to the "Configure Heritrix process" section of Detailed Configurations.

For more specific Heritrix configurations, please refer to Appendix B - Managing Heritrix Harvest Templates (order.xml) and Appendix C - Migrate the Heritrix templates to NetarchiveSuite 3.6.0+ of this document.

By default, crawling in NetarchiveSuite uses deduplication. This feature, and how to disable it, is described in the Configuration Manual, Section 8.1.2.
