Generation of the deduplication indices is now parallelized. We have also upgraded from Lucene 2.0 to Lucene 3.6, which required major refactoring of the deduplication modules used by Heritrix.
If you need to generate big deduplication indices (based on more than a handful of jobs), make sure that the maximum number of open files, which is typically only 1024, is raised to 5024 (4096 will probably also do). On Linux this is typically done by calling "ulimit -n 5024", if the user has permission to do so.
Otherwise, an administrator needs to add the following lines to /etc/security/limits.conf:
* soft nofile 5024
* hard nofile 5025
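A minimal sketch of the "ulimit" route, assuming bash on Linux: it reads the current soft and hard limits on open files and raises only the soft limit for the current shell. The variable name REQUIRED_OPEN_FILES and the script itself are hypothetical, and /etc/security/limits.conf (read by pam_limits) is assumed to be where the nofile entries live on your system.

```shell
#!/bin/bash
# Hypothetical helper: raise the open-file limit before building large
# deduplication indices. 5024 is the value recommended in these notes.
REQUIRED_OPEN_FILES=5024

soft=$(ulimit -S -n)   # current (soft) limit on open files
hard=$(ulimit -H -n)   # ceiling a non-root user may raise the soft limit to

if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$REQUIRED_OPEN_FILES" ]; then
  echo "Open-file limit is already $soft; nothing to do."
elif [ "$hard" = "unlimited" ] || [ "$hard" -ge "$REQUIRED_OPEN_FILES" ]; then
  # Raise only the soft limit; without -S, bash would also set the
  # hard limit to this value.
  ulimit -S -n "$REQUIRED_OPEN_FILES"
  echo "Raised open-file limit from $soft to $(ulimit -S -n)."
else
  # Assumption: pam_limits reads /etc/security/limits.conf at login.
  echo "Hard limit ($hard) is below $REQUIRED_OPEN_FILES; add the nofile entries to /etc/security/limits.conf and start a new login session." >&2
fi
```

The raised limit applies only to the shell that ran "ulimit" and to processes started from it. The script should therefore be sourced (for example ". ./raise-nofile.sh", a hypothetical filename) in the same shell that launches the index generation, not run as a separate process.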
In 3.20, the NAS applications no longer try to upgrade the harvest database themselves. They only verify that the database tables have the correct versions, and fail if they do not. From now on, the system administrator must upgrade the harvest database using dk.netarkivet.harvester.tools.HarvestdatabaseUpdateApplication (see the Additional Tools Manual).
A template for a script that calls this program is included in the scripts directory of the NetarchiveSuite zip file.