5.4.1 Release Date: 2018-05-28

BugFix Release 5.4.2 (pending)

This release addresses issue NAS-2514, which resulted in many URLs receiving crawl-status code -50 in some harvests. It is only relevant for users of SeedUriDomainnameQueueAssignmentPolicy. The fix is in two parts:

  • A new QuotaEnforcer implementation, dk.netarkivet.harvester.harvesting.PrerequisiteIgnoringQuotaEnforcer, which can be used in a crawler-beans harvest template and which never enforces harvesting quotas on prerequisite URLs (typically DNS lookups and robots.txt), and
  • An alteration to SeedUriDomainnameQueueAssignmentPolicy to ensure that DNS queries are queued on the same queue as other URLs for the same seed. This appears to work around an undocumented race condition in Heritrix which was causing many crawl failures.
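
As a rough illustration of the first part, the quota enforcer bean in a harvest template could be pointed at the new class along the following lines. This is a minimal sketch, not a tested template: the bean id quotaenforcer is an assumption taken from the stock Heritrix 3 profile, and it is assumed that the properties of the existing quota enforcer bean carry over unchanged.

```xml
<!-- Sketch only: the bean id "quotaenforcer" is assumed from the stock
     Heritrix 3 profile; check the id used in your own harvest template. -->
<bean id="quotaenforcer"
      class="dk.netarkivet.harvester.harvesting.PrerequisiteIgnoringQuotaEnforcer">
  <!-- Keep the <property> elements of the template's existing
       quota enforcer bean here (assumed to be unchanged). -->
</bean>
```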

BugFix Release 5.4.1

NAS 5.4.1 is a bug-fix release addressing some issues found during the Acceptance Test phase of NAS 5.4. The issues addressed are:

  • A memory leak introduced by a new feature in NAS 5.4 (NAS-2614) to manage the number of jobs on the JMS queues,
  • An error in the functionality for searching/browsing in the frontier of running jobs, and
  • Introduction of a new setting (settings.harvester.indexserver.tryToMigrateDuplicationRecords), a switch for new functionality associated with the Danish netarchive's project to compress their archive. This functionality caused an unnecessary slowdown in indexing, and is now disabled by default.
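
For orientation, the new switch might appear in a settings file roughly as follows. This is a sketch under assumptions: the element nesting simply mirrors the dotted setting name, the switch is assumed to be a boolean where false matches the disabled default described above, and the attributes of the root element of a real settings file are omitted.

```xml
<!-- Sketch only: nesting mirrors the dotted setting name; the value
     false is assumed to correspond to the disabled default. -->
<settings>
  <harvester>
    <indexserver>
      <tryToMigrateDuplicationRecords>false</tryToMigrateDuplicationRecords>
    </indexserver>
  </harvester>
</settings>
```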

The functionality for browsing in the Heritrix frontier is still somewhat experimental and is in need of a usability overhaul. This is a priority for a future release.


JIRA issue list: project NAS, fix version 5.4.1, excluding the Test component and issues labelled not_for_release_notes, ordered by priority and then creation date.

Highlights in 5.4

  • NetarchiveSuite now ships with a customised version of Heritrix 3, forked from the version maintained by Kristinn Sigurdsson.
  • The integration between the NetarchiveSuite Web interface and Heritrix 3 has been much improved, both in regard to scaling and usability.
  • The job generation algorithm is significantly improved, so that the production of spurious duplicate jobs is now largely eliminated.
  • Support for Heritrix 1 has now been removed from the distribution.
  • You can now limit how many jobs are submitted to each job channel simultaneously by setting settings.harvester.scheduler.limitSubmittedJobsInQueue to true. When this is enabled, the default limit is one job at a time; you can change it by overriding settings.harvester.scheduler.submittedJobsInQueueLimit. The latter setting is ignored if limitSubmittedJobsInQueue is false, which is the default.
  • The setting settings.harvester.scheduler.jobgenerationperiode has been renamed to settings.harvester.scheduler.jobgenerationperiod (the default value is still 60, i.e. 1 minute).
  • A new setting, settings.webinterface.runningjobsFilteringMethod, chooses between filtering methods on History/Harveststatus-running.jsp (default: database; alternative: cachedLogs).
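
The scheduler and web interface settings above could be combined in a settings file along these lines. This is a minimal sketch: the element nesting mirrors the dotted setting names given in the list, the limit of 2 and the choice of cachedLogs are arbitrary illustrative values, and the attributes of the root element of a real settings file are omitted.

```xml
<!-- Sketch only: values are illustrative, not recommendations. -->
<settings>
  <harvester>
    <scheduler>
      <!-- Enable the per-channel limit (default: false). -->
      <limitSubmittedJobsInQueue>true</limitSubmittedJobsInQueue>
      <!-- Only read when the switch above is true (default: 1). -->
      <submittedJobsInQueueLimit>2</submittedJobsInQueueLimit>
      <!-- New spelling of the renamed setting (default: 60). -->
      <jobgenerationperiod>60</jobgenerationperiod>
    </scheduler>
  </harvester>
  <webinterface>
    <!-- Either "database" (default) or "cachedLogs". -->
    <runningjobsFilteringMethod>cachedLogs</runningjobsFilteringMethod>
  </webinterface>
</settings>
```

Note that a settings file still using the old name jobgenerationperiode would need to be updated to the new spelling.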

Upgrading from previous releases of NetarchiveSuite

  • Upgrading the database: After finishing the installation of NetarchiveSuite and starting it for the first time, go to the server where GUIApplication and HarvestJobManager are installed and run:

    cd NAS_INSTALLDIR/conf
    bash update_external_harvest_database.sh

    Please examine NAS_INSTALLDIR/update_external_harvest_database.log for any errors.

 


Most-recent updates for 5.4.1:


Issues resolved in release 5.4

JIRA issue list: project NAS, fix version 5.4, excluding the Test component and issues labelled not_for_release_notes, ordered by priority and then creation date.