| Tool | Purpose | Denmark | France | Austria | Spain | Sweden |
| --- | --- | --- | --- | --- | --- | --- |
| DeployApplication | Creates deploy scripts from a deploy-config | y | | | | |
| HarvestdatabaseUpdateApplication | Updates HarvestDB schema | y | | | | |
| BuildCompleteSettings | Merges module settings files in NAS to one large global default settings file. Run as part of release process. | y | | | | |
| GetFile | Retrieves a file via the ArcRepository interface | | | | | |
| GetRecord | Retrieves a (w)arc-record via the ArcRepository interface | | | | | |
| LoadDatabaseChecksumArchive | Migration tool from file-based checksums to database-based checksums | | | | | |
| ReestablishAdminDatabase | For reestablishing the admin database from an 'admin.data' file | | | | | |
| RunBatch | Runs a batch job from the command line | | | | | |
| Upload | Uploads a file to the ArcRepository from the command line. (Handy for testdata.) | y | | | | |
| ReestablishAdminDatabase | Should be deprecated (?). Reads old admin.data file. | | | | | |
| ClassDependencies | Non-NAS utility (license is not ours) | | | | | |
| CreateIndex | CLI to talk to IndexServer via IndexClient | | | | | |
| RunChecksum | CLI to get all checksums from a Bitarchive (deprecated) | | | | | |
| SendDedupIndexRequestToIndexserver | Asynchronously starts a dedup indexing on an IndexServer and then exits. Tue Hejlskov Larsen: is this what you use to generate deduplication indexes? | | | | | |
| MakeIndex | Runs a CDX extraction on a single file in a remote ArcRepository | | | | | |
| FindRelevantCrawllogLines | Finds crawl-log lines matching a given domain name in a local metadata file | | | | | |
| JMXProxy | "This tool will simply reregister all MBeans that matches the given query from the JMX hosts read in settings, using its own platformmbeanserver. It will then wait forever." | | | | | |
| DeduplicateToCDXApplication | Extracts CDX records for deduplicate annotations from a local crawl log file | | | | | |
| ResetFailedFiles | Utility for WaybackIndexer to reset files that have failed more than 3 times so they can be retried | | | | | |
| ARCReaderUtils | Splits an arcfile (not warc) and dumps results to a directory | | | | | |
| ArcWrap | Creates an arcfile by wrapping a file | | | | | |
| ExtractCDX | Extracts CDX records, unsorted, from a list of local input arcfiles (not warcs) | | | | | |
| JMSBroker | Checks that a JMS broker (as specified in NAS settings) is up and running. | | | | | |
| WriteBytesToFile | Just creates large files full of null bytes | | | | | |
| FTPValidator | Tests if an ftp server configuration in a NAS settings file points to a NAS-compliant ftp server. | | | | | |
| ArcMerge | Merges several arcfiles into one arcfile | | | | | |
| ArchiveExtractCDX | Extracts CDX records, unsorted, from a list of local input (w)arcfiles | | | | | |
| WARCExtractCDX | Extracts CDX records, unsorted, from a list of local input warcfiles | | | | | |
| ReformatTranslationFile | i) reorders a translation file so keys are in the same order as a reference file, and ii) allows the encoding of the output file to be changed | | | | | |
| MailValidator | Checks the validity of a mail-server configured in NAS settings by sending a test-mail | | | | | |
| MakeNewMetadataFile | Creates a metadata file. For use when postprocessing fails. Is this used? | | | | | |
| FindDomainsForCrawllogExtraction | ? | | | | | |
| CheckDuplicateReduction | Validates deduplication by comparing a crawl log with a collection of arcfiles (not warc). | | | | | |
| StandaloneApplicationReduced | Creates a standalone NetarchiveSuite in a single JVM | | | | | |
| MigrateDefaultHarvestDatabase | This just initialises a SiteSection object which is supposed to upgrade the harvest database as a side-effect | | | | | |
| CreateCDXMetadataFile | Complex tool that takes a set of filenames and runs a batch job to extract the CDXes from each file and pack them in a metadata arc or warc file, one record per input file | | | | | |
| HarvesterQueueControl | Tool to count the number of messages in a given JMS queue | | | | | |
| HarvestDatabaseValidator | Validates whether you can connect to the harvest database with the settings in a given settings file | | | | | |
| HarvestTemplateApplication | Utility for uploading and updating Heritrix templates | y (in test) | | | | |
| CheckDomainCrawltraps | Runs through all domains in the harvest database and checks whether each crawlertrap regexp can validly be included as text-content in an xml document | y | | | | |
| CheckTrapsInFile | Runs through a list of crawler-trap regexes in a file and checks whether each crawlertrap regex can validly be included as text-content in an xml document | y(?) | | | | |
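
CheckDomainCrawltraps and CheckTrapsInFile both test whether a crawler-trap regex can be included as text content in an XML document. The sketch below shows one plausible form of such a check. It is an assumption and not taken from the NetarchiveSuite source: it accepts a string only if every code point matches the XML 1.0 "Char" production. The class and method names (XmlTextCheck, isXmlChar, isValidXmlText) and the sample regexes are hypothetical. Characters such as < and & are accepted here because they can be escaped when the text is written out.

```java
import java.util.List;

// Hypothetical helper, not part of NetarchiveSuite.
class XmlTextCheck {

    /** Returns true if the code point is allowed by the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns true if every code point in the string may appear in XML 1.0 text. */
    static boolean isValidXmlText(String s) {
        return s.codePoints().allMatch(XmlTextCheck::isXmlChar);
    }

    public static void main(String[] args) {
        // Sample trap regexes (made up); the last one contains a backspace character.
        List<String> traps = List.of(".*calendar.*", ".*\\?page=\\d+.*", "bad\btrap");
        for (String trap : traps) {
            String verdict = isValidXmlText(trap) ? "ok" : "invalid";
            System.out.println(verdict + ": " + trap.length() + " chars");
        }
    }
}
```

A character-level check like this only catches characters that XML cannot represent at all, such as most control characters. Whether the real tools apply further checks is not stated on this page.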