DeployApplication - Creates deploy scripts from a deploy-config

HarvestdatabaseUpdateApplication - Updates HarvestDB schema

BuildCompleteSettings - Merges module settings files in NAS to one large global default settings file. Run as part of release process.

GetFile - Retrieves a file via the ArcRepository interface

GetRecord - Retrieves a (w)arc-record via the ArcRepository interface

LoadDatabaseChecksumArchive - Migration tool from file-based checksums to database-based checksums

ReestablishAdminDatabase - For reestablishing the admin database from a '' file

RunBatch - Runs a batch job from the command line

Upload - Uploads a file to the ArcRepository from the command line. (Handy for testdata.)


Should be deprecated? Reads old file.

ClassDependencies - Non-NAS utility (license is not ours)

CreateIndex - CLI to talk to IndexServer via IndexClient

RunChecksum - CLI to get all checksums from a Bitarchive (deprecated)


Asynchronously starts deduplication indexing on an IndexServer and then exits. Tue Hejlskov Larsen, is this what you use to generate deduplication indexes?

MakeIndex - Runs a CDX extraction on a single file in a remote ArcRepository

FindRelevantCrawllogLines - Finds crawl-log lines matching a given domain name in a local metadata file

JMXProxy - "This tool will simply reregister all MBeans that matches the given query from the JMX hosts read in settings, using its own platformmbeanserver. It will then wait forever."

DeduplicateToCDXApplication - Extracts CDX records for deduplicate annotations from a local crawl log file

ResetFailedFiles - Utility for WaybackIndexer to reset files that have failed more than 3 times so they can be retried

ARCReaderUtils - Splits an arcfile (not warc) and dumps results to a directory

ArcWrap - Creates an arcfile by wrapping a file

ExtractCDX - Extracts CDX records, unsorted, from a list of local input arcfiles (not warcs)

JMSBroker - Checks that a JMS broker (as specified in NAS settings) is up and running.

WriteBytesToFile - Just creates large files full of null bytes


Tests whether an FTP server configuration in a NAS settings file points to a NAS-compliant FTP server.

ArcMerge - Merges several arcfiles into one arcfile


Extracts CDX records, unsorted, from a list of local input (w)arcfiles

WARCExtractCDX - Extracts CDX records, unsorted, from a list of local input warcfiles

ReformatTranslationFile - i) reorders a translation file so keys are in the same order as a reference file, and ii) allows the encoding of the output file to be changed

MailValidator - Checks the validity of a mail server configured in NAS settings by sending a test mail

MakeNewMetadataFile - Creates a metadata file. For use when postprocessing fails. Is this used?


CheckDuplicateReduction - Validates deduplication by comparing a crawl log with a collection of arcfiles (not warc)

StandaloneApplicationReduced - Creates a standalone NetarchiveSuite in a single JVM

MigrateDefaultHarvestDatabase - This just initialises a SiteSection object which is supposed to upgrade the harvest database as a side-effect

CreateCDXMetadataFile - Complex tool that takes a set of filenames and runs a batch job to extract the CDX records from each file and pack them into a metadata arc or warc file, one record per input file

HarvesterQueueControl - Tool to count the number of messages in a given JMS queue

HarvestDatabaseValidator - Validates whether you can connect to the harvest database with the settings in a given settings file

HarvestTemplateApplication - Utility for uploading and updating heritrix templates (in test)

CheckDomainCrawltraps - Runs through all domains in the harvest database and checks whether each crawlertrap regexp can validly be included as text-content in an xml document


Runs through a list of crawler-trap regexes in a file and checks whether each crawlertrap regex can validly be included as text-content in an xml document
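
As an illustration of what the two crawler-trap checks above might involve, here is a minimal Java sketch. It is not the NetarchiveSuite implementation: the class name CrawlerTrapXmlCheck and its methods are hypothetical, and it assumes that "validly included as text-content" means every character of the regex is permitted by the XML 1.0 Char production. The real tools may instead build or parse an actual XML document.

```java
import java.util.List;

/** Hypothetical sketch; not part of NetarchiveSuite. */
public class CrawlerTrapXmlCheck {

    /** True if every code point of s is allowed by the XML 1.0 Char production. */
    public static boolean isValidXmlText(String s) {
        return s.codePoints().allMatch(CrawlerTrapXmlCheck::isXmlChar);
    }

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    public static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // Example inputs are made up for illustration.
        String withControlChar = "bad" + (char) 0x01 + "trap";
        List<String> traps = List.of(".*/calendar/.*", withControlChar);
        for (String trap : traps) {
            System.out.println((isValidXmlText(trap) ? "OK      " : "INVALID ") + trap.length() + " chars");
        }
    }
}
```

Characters such as < and & pass this check because an XML writer can escape them; control characters such as U+0001 cannot be represented in XML 1.0 text at all, so a regex containing one is flagged.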