Release Date: 4th November 2016



Java 8

NetarchiveSuite now requires a Java 8 runtime for all components.

New Settings

  • ChecksumFileApplication


    * <b>settings.archive.checksum.usePrecomputedChecksumDuringUpload</b>: This decides whether or not to use the pre-computed checksum sent as part of the StoreMessage and UploadMessage.
    * The default is false.
        public static String CHECKSUM_USE_PRECOMPUTED_CHECKSUM_DURING_UPLOAD = "settings.archive.checksum.usePrecomputedChecksumDuringUpload";

    This boolean can be used to optimise the upload process to the bitarchives.


  • GUIApplication, HarvestJobManager

     * <b>settings.common.topLevelDomains.tld</b>:
     * An extra valid top-level domain, such as .dk or .org, that is not part of the currently embedded public_suffix_list.dat file
     * in common/common-core/src/main/resources/dk/netarkivet/common/utils/public_suffix_list.dat
     * downloaded from
    public static String TLDS = "settings.common.topLevelDomains.tld";

  • HarvestControllerApplication

     * The version number which goes in metadata file names like 12345-metadata-<version number>.warc.gz
    public static String METADATA_FILE_VERSION_NUMBER = "settings.harvester.harvesting.metadata.filename.versionnumber";

    This parameter allows for the definition of different generations of metadata file.

     * <b>settings.harvester.harvesting.metadata.compression</b>: Whether to compress the
     * metadata associated with a given harvest job.
     * Default: false.
    public static String METADATA_COMPRESSION = "settings.harvester.harvesting.metadata.compression";

    Controls whether metadata files are generated in compressed (warc.gz) format.
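
    As a rough illustration of how the version number and the compression setting could combine into a file name, here is a minimal sketch. The class and method names are hypothetical and not part of NetarchiveSuite, and the uncompressed form ending in .warc is an assumption derived from the 12345-metadata-<version number>.warc.gz pattern quoted above.

```java
class MetadataFilenameSketch {
    /**
     * Hypothetical helper (not a NetarchiveSuite API): builds a metadata file
     * name from a job id, the configured version number and the compression flag.
     */
    static String metadataFilename(long jobId, int versionNumber, boolean compress) {
        // Assumed layout: <jobid>-metadata-<version number>.warc[.gz]
        String name = jobId + "-metadata-" + versionNumber + ".warc";
        return compress ? name + ".gz" : name;
    }

    public static void main(String[] args) {
        System.out.println(metadataFilename(12345L, 1, true));
        System.out.println(metadataFilename(12345L, 2, false));
    }
}
```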

  • ViewerproxyApplication, IndexServerApplication, WaybackIndexerApplication

     * Specifies the suffix of a regex which can identify valid metadata files by job number. Thus preceding
     * the value of this setting with .* will find all metadata files.
    public static String METADATAFILE_REGEX_SUFFIX = "settings.common.metadata.fileregexsuffix";

    This parameter determines which metadata files are included in indexing (for Viewerproxy or Wayback). The full regex used for matching consists of the string <jobid>-<harvestid> followed by this suffix. The default value is -metadata-[0-9]+.(w)?arc(.gz)? which matches all metadata files that use the standard NetarchiveSuite naming scheme.
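
    To make the matching behaviour concrete, the following sketch tests a few file names against the default suffix, using the .* prefix mentioned in the setting's description. The class and method names are hypothetical and the sample file names are made up for illustration; only the suffix value itself is taken from these notes.

```java
import java.util.regex.Pattern;

class MetadataRegexSketch {
    // Default suffix value as quoted in these release notes.
    // The dots are unescaped, so each one matches any single character.
    static final String DEFAULT_SUFFIX = "-metadata-[0-9]+.(w)?arc(.gz)?";

    /** Hypothetical helper: true if the whole file name matches prefix + suffix. */
    static boolean isMetadataFile(String prefixRegex, String filename) {
        return Pattern.matches(prefixRegex + DEFAULT_SUFFIX, filename);
    }

    public static void main(String[] args) {
        // ".*" as prefix accepts metadata files for any job.
        System.out.println(isMetadataFile(".*", "12345-metadata-1.warc.gz"));
        System.out.println(isMetadataFile(".*", "12345-metadata-1.arc"));
        // A file name without the "-metadata-" part is rejected.
        System.out.println(isMetadataFile(".*", "12345-67-00001.warc.gz"));
    }
}
```

    Changing the suffix, for example to pin a specific version number, narrows which generations of metadata files are picked up for indexing.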

  • GUIApplication

         * <b>settings.harvester.viewerproxy.allowFileDownloads</b> If set to false, there will be no links to
         * allow download of warcfiles via the Viewerproxy GUI.
        public static String ALLOW_FILE_DOWNLOADS = "settings.harvester.viewerproxy.allowFileDownloads";

    A simple security feature to hinder operators from easily downloading harvested archive files. (default: true)

       public static String HERITRIX3_MONITOR_TEMP_PATH = "settings.harvester.harvesting.monitor.tempPath";

    Path to a directory which the new Heritrix3 monitor feature can use for caching. The setting is empty by default, in which case the monitor falls back to the system-wide temporary directory (usually /tmp).
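
    The fallback described above might look roughly like the sketch below. This is not the actual NetarchiveSuite code: the class and method names are hypothetical, and it assumes the system-wide temporary directory is the one the JVM reports through the java.io.tmpdir property.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class MonitorTempPathSketch {
    /**
     * Hypothetical helper: returns the configured directory, or the JVM's
     * temporary directory when the setting is empty or missing.
     */
    static Path resolveTempPath(String configuredValue) {
        if (configuredValue == null || configuredValue.trim().isEmpty()) {
            // Assumption: "system-wide temporary directory" means java.io.tmpdir.
            return Paths.get(System.getProperty("java.io.tmpdir"));
        }
        return Paths.get(configuredValue);
    }

    public static void main(String[] args) {
        System.out.println(resolveTempPath(""));                // falls back to the temp dir
        System.out.println(resolveTempPath("/data/h3monitor")); // example path, made up
    }
}
```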

Control Heritrix from NetarchiveSuite

NetarchiveSuite now requires Java 8.

Top-Level Domains Can Be Defined Externally

warc.gz metadata files

Warc Revisit Records


New Heritrix Version

RSS Crawling

GUI Styling

  • Download NetarchiveSuite

  • Download Heritrix 3 Bundle (required)

  • Javadoc

  • Manuals

Full list of issues resolved in this release


Known issues
