The primary function of the NetarchiveSuite is to plan, schedule and archive web harvests of parts of the internet.


Packages
dk.netarkivet.archive This module makes it possible to set up and run a repository with replication, active bit consistency checks for bit preservation, and support for distributed batch jobs on the archive.
dk.netarkivet.archive.arcrepository  
dk.netarkivet.archive.arcrepository.bitpreservation  
dk.netarkivet.archive.arcrepository.distribute  
dk.netarkivet.archive.arcrepositoryadmin  
dk.netarkivet.archive.bitarchive  
dk.netarkivet.archive.bitarchive.distribute  
dk.netarkivet.archive.checksum  
dk.netarkivet.archive.checksum.distribute  
dk.netarkivet.archive.distribute  
dk.netarkivet.archive.indexserver  
dk.netarkivet.archive.indexserver.distribute  
dk.netarkivet.archive.tools  
dk.netarkivet.archive.webinterface  
dk.netarkivet.common The framework and utilities used by the whole suite, like exceptions, settings, messaging, file transfer (RemoteFile), and logging.
dk.netarkivet.common.distribute  
dk.netarkivet.common.distribute.arcrepository  
dk.netarkivet.common.distribute.indexserver  
dk.netarkivet.common.distribute.monitorregistry  
dk.netarkivet.common.exceptions  
dk.netarkivet.common.lifecycle  
dk.netarkivet.common.management  
dk.netarkivet.common.tools  
dk.netarkivet.common.utils  
dk.netarkivet.common.utils.arc  
dk.netarkivet.common.utils.batch  
dk.netarkivet.common.utils.cdx  
dk.netarkivet.common.webinterface  
dk.netarkivet.deploy Contains software for installing NetarchiveSuite on multiple machines.
dk.netarkivet.harvester  
dk.netarkivet.harvester.datamodel  
dk.netarkivet.harvester.distribute  
dk.netarkivet.harvester.harvesting This module handles the definition, scheduling, and execution of harvests.
dk.netarkivet.harvester.harvesting.controller  
dk.netarkivet.harvester.harvesting.distribute  
dk.netarkivet.harvester.harvesting.extractor  
dk.netarkivet.harvester.harvesting.frontier  
dk.netarkivet.harvester.harvesting.monitor  
dk.netarkivet.harvester.harvesting.report  
dk.netarkivet.harvester.scheduler  
dk.netarkivet.harvester.tools  
dk.netarkivet.harvester.webinterface  
dk.netarkivet.monitor Provides web access to JMX-packaged information from all NetarchiveSuite applications.
dk.netarkivet.monitor.distribute  
dk.netarkivet.monitor.jmx  
dk.netarkivet.monitor.logging  
dk.netarkivet.monitor.registry  
dk.netarkivet.monitor.registry.distribute  
dk.netarkivet.monitor.tools  
dk.netarkivet.monitor.webinterface  
dk.netarkivet.viewerproxy This module gives access to previously harvested material through a proxy solution.
dk.netarkivet.viewerproxy.distribute  
dk.netarkivet.viewerproxy.reporting  
dk.netarkivet.viewerproxy.webinterface  
dk.netarkivet.wayback Provides tools for integrating NetarchiveSuite with the open-source Wayback machine for browsing web archives.
dk.netarkivet.wayback.aggregator The Aggregator sorts the raw index files generated by the indexer and merges them into larger index files usable by Wayback.
dk.netarkivet.wayback.batch  
dk.netarkivet.wayback.batch.copycode  
dk.netarkivet.wayback.indexer Retrieves indexes of the ARC files in the repository which are needed by Wayback.

 

The primary function of the NetarchiveSuite is to plan, schedule and archive web harvests of parts of the internet. We use Heritrix as our web crawler. NetarchiveSuite was released in July 2007 as open source under the LGPL license and is used by the Danish organization Netarkivet.dk (http://netarkivet.dk). This organization has been using NetarchiveSuite since July 2005 to harvest Danish websites as authorized by the latest Danish Legal Deposit Act.

The NetarchiveSuite can organize three different kinds of harvests:

The software has been designed with the following in mind:

See the NetarchiveSuite wiki for further details.