DeployApplication: Creates deploy scripts from a deploy-config [y]

HarvestdatabaseUpdateApplication: Updates HarvestDB schema [y]

BuildCompleteSettings: Merges module settings files in NAS into one large global default settings file. Run as part of the release process. [y]

GetFile: Retrieves a file via the ArcRepository interface

GetRecord: Retrieves a (w)arc-record via the ArcRepository interface

LoadDatabaseChecksumArchive: Migration tool from file-based checksums to database-based checksums

ReestablishAdminDatabase: For reestablishing the admin database from a '' file

RunBatch: Runs a batch job from the command line

Upload: Uploads a file to the ArcRepository from the command line. (Handy for test data.) [y]


Should be deprecated? Reads old file.

ClassDependencies: Non-NAS utility (license is not ours)

CreateIndex: CLI to talk to IndexServer via IndexClient

RunChecksum: CLI to get all checksums from a Bitarchive (deprecated)


Asynchronously starts dedup indexing on an IndexServer and then exits. Tue Hejlskov Larsen, is this what you use to generate deduplication indexes?

MakeIndex: Runs a CDX extraction on a single file in a remote ArcRepository

FindRelevantCrawllogLines: Finds crawl-log lines matching a given domain name in a local metadata file

JMXProxy: "This tool will simply reregister all MBeans that matches the given query from the JMX hosts read in settings, using its own platformmbeanserver. It will then wait forever."

DeduplicateToCDXApplication: Extracts CDX records for deduplicate annotations from a local crawl log file

ResetFailedFiles: Utility for WaybackIndexer to reset files that have failed more than 3 times so they can be retried

ARCReaderUtils: Splits an arcfile (not warc) and dumps results to a directory

ArcWrap: Creates an arcfile by wrapping a file

ExtractCDX: Extracts CDX records, unsorted, from a list of local input arcfiles (not warcs)

JMSBroker: Checks that a JMS broker (as specified in NAS settings) is up and running.

WriteBytesToFile: Just creates large files full of null bytes


Tests if an FTP server configuration in a NAS settings file points to a NAS-compliant FTP server.

ArcMerge: Merges several arcfiles into one arcfile


Extracts CDX records, unsorted, from a list of local input (w)arcfiles

WARCExtractCDX: Extracts CDX records, unsorted, from a list of local input warcfiles

ReformatTranslationFile: (i) reorders a translation file so keys are in the same order as a reference file, and (ii) allows the encoding of the output file to be changed

MailValidator: Checks the validity of a mail server configured in NAS settings by sending a test mail

MakeNewMetadataFile: Creates a metadata file. For use when postprocessing fails. Is this used?


CheckDuplicateReduction: Validates deduplication by comparing a crawl log with a collection of arcfiles (not warc).

StandaloneApplicationReduced: Creates a standalone NetarchiveSuite in a single JVM

MigrateDefaultHarvestDatabase: This just initialises a SiteSection object, which is supposed to upgrade the harvest database as a side effect

CreateCDXMetadataFile: Complex tool that takes a set of filenames and runs a batch job to extract the CDX records from each file and pack them into a metadata arc or warc file, one record per input file

HarvesterQueueControl: Tool to count the number of messages in a given JMS queue

HarvestDatabaseValidator: Validates whether you can connect to the harvest database with the settings in a given settings file

HarvestTemplateApplication: Utility for uploading and updating Heritrix templates [y (in test)]

CheckDomainCrawltraps: Runs through all domains in the harvest database and checks whether each crawlertrap regexp can validly be included as text content in an XML document [y]


Runs through a list of crawler-trap regexes in a file and checks whether each crawlertrap regex can validly be included as text content in an XML document
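
The two crawlertrap checks above could be approximated along the following lines. This is only a sketch, not the NetarchiveSuite implementation: the class name `CrawlertrapXmlCheck`, the method `isValidXmlText`, and the wrapper element `<trap>` are hypothetical. It also assumes that "validly included as text content" means the raw regex, placed unescaped inside an element, parses as well-formed XML and yields only text nodes. It uses the JDK's built-in XML parser only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of NetarchiveSuite.
class CrawlertrapXmlCheck {

    // Returns true if the regex, inserted unescaped into a wrapper element,
    // parses as well-formed XML and produces only text nodes.
    static boolean isValidXmlText(String regex) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser
            // from printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(
                    new InputSource(new StringReader("<trap>" + regex + "</trap>")));
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                // Elements, comments, or CDATA sections mean the regex was
                // interpreted as markup rather than plain text.
                if (children.item(i).getNodeType() != Node.TEXT_NODE) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            // Not well-formed, e.g. a bare '<' or '&' in the regex.
            return false;
        }
    }
}
```

Under these assumptions a regex such as `.*calendar.*` passes, while one containing a bare `<` or `&` is rejected and would need escaping before being stored in an XML template.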