Class Heritrix3Settings
- java.lang.Object
  - dk.netarkivet.harvester.heritrix3.Heritrix3Settings
public class Heritrix3Settings extends java.lang.Object
Settings specific to the heritrix3 harvester module of NetarchiveSuite.
Field Summary
Fields
Modifier and Type   Field   Description
static java.lang.String
CRAWL_LOOP_WAIT_TIME
static java.lang.String
DISREGARD_SEEDURL_INFORMATION_IN_CRAWLLOG
static java.lang.String
FRONTIER_REPORT_WAIT_TIME
static java.lang.String
HARVEST_REPORT_CLASS
static java.lang.String
HERITRIX_ADMIN_NAME
static java.lang.String
HERITRIX_ADMIN_PASSWORD
static java.lang.String
HERITRIX_CONTROLLER_CLASS
static java.lang.String
HERITRIX_GUI_PORT
static java.lang.String
HERITRIX_HEAP_SIZE
static java.lang.String
HERITRIX_JVM_OPTS
static java.lang.String
HERITRIX_LAUNCHER_CLASS
static java.lang.String
METADATA_ARCHIVE_FILES_REPORT_HEADER
settings.harvester.harvesting.metadata.archiveFilesReportName If METADATA_GENERATE_ARCHIVE_FILES_REPORT is set to true, sets the header of the generated report file.
static java.lang.String
METADATA_ARCHIVE_FILES_REPORT_NAME
settings.harvester.harvesting.metadata.archiveFilesReportName If METADATA_GENERATE_ARCHIVE_FILES_REPORT is set to true, sets the name of the generated report file.
static java.lang.String
METADATA_GENERATE_ARCHIVE_FILES_REPORT
settings.harvester.harvesting.metadata.generateArchiveFilesReport This setting is a boolean flag that enables or disables the generation of an ARC/WARC files report.
static java.lang.String
UMBRA_HOPS_SHOULD_PROCESS
Regex specifying the Heritrix discovery path of URLs to be sent to Umbra.
static java.lang.String
UMBRA_IS_ENABLED
Flag indicating whether or not this HarvestControllerApplication is configured to support Umbra.
static java.lang.String
UMBRA_PRESTART_SCRIPT
Path to a script to be executed before Heritrix is started for every Umbra-enabled job.
static java.lang.String
UMBRA_URL
URL of the socket endpoint ("amqp://") for the RabbitMQ broker that feeds Umbra. The default value is amqp://guest:guest@localhost:8998/%2f
-
Constructor Summary
Constructors Constructor Description Heritrix3Settings()
-
Field Detail
-
UMBRA_IS_ENABLED
public static java.lang.String UMBRA_IS_ENABLED
Flag indicating whether or not this HarvestControllerApplication is configured to support Umbra.
-
UMBRA_URL
public static java.lang.String UMBRA_URL
URL of the socket endpoint ("amqp://") for the RabbitMQ broker that feeds Umbra. The default value is amqp://guest:guest@localhost:8998/%2f
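The parts of the default value can be inspected with the standard java.net.URI class. The following is only a sketch: the class name UmbraUrlParts and the helper virtualHost are hypothetical and not part of NetarchiveSuite, and reading the path segment as the broker's virtual host follows the usual AMQP URI convention (where %2f stands for the virtual host "/"), which this page does not itself state.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of NetarchiveSuite.
final class UmbraUrlParts {
    /** Default value quoted in the description of UMBRA_URL. */
    static final String DEFAULT_UMBRA_URL = "amqp://guest:guest@localhost:8998/%2f";

    /**
     * Assumption: as in the common AMQP URI convention, the path without its
     * leading slash names the virtual host. URI.getPath() returns the decoded
     * path, so "/%2f" is seen here as "//".
     */
    static String virtualHost(URI uri) {
        String path = uri.getPath();
        return (path == null || path.isEmpty()) ? "" : path.substring(1);
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = new URI(DEFAULT_UMBRA_URL);
        System.out.println("scheme   = " + uri.getScheme());
        System.out.println("userInfo = " + uri.getUserInfo());
        System.out.println("host     = " + uri.getHost());
        System.out.println("port     = " + uri.getPort());
        System.out.println("vhost    = " + virtualHost(uri));
    }
}
```

A check like this can help catch a mistyped port or credentials before a job is started.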
-
UMBRA_HOPS_SHOULD_PROCESS
public static java.lang.String UMBRA_HOPS_SHOULD_PROCESS
Regex specifying the Heritrix discovery path of URLs to be sent to Umbra. The default value "^$|.*L" selects only seeds (the empty discovery path) or URLs whose last hop is a link (discovery paths ending in L). Elements embedded in web pages are therefore not sent to Umbra.
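The effect of the default regex can be checked with java.util.regex. This is a minimal sketch, assuming the regex is applied to the whole discovery path (Matcher.matches()); how NetarchiveSuite actually applies it is not stated here. The class name UmbraHopsDemo and the method shouldSendToUmbra are hypothetical. In Heritrix discovery paths, L denotes a link hop and E an embed hop.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of NetarchiveSuite.
final class UmbraHopsDemo {
    /** Default value quoted in the description of UMBRA_HOPS_SHOULD_PROCESS. */
    static final Pattern DEFAULT_HOPS = Pattern.compile("^$|.*L");

    /** Assumption: the whole discovery path must match the regex. */
    static boolean shouldSendToUmbra(String discoveryPath) {
        return DEFAULT_HOPS.matcher(discoveryPath).matches();
    }

    public static void main(String[] args) {
        // "" is a seed; the other strings are sample discovery paths.
        String[] paths = {"", "L", "LL", "LEL", "LE", "E"};
        for (String path : paths) {
            System.out.println("\"" + path + "\" -> " + shouldSendToUmbra(path));
        }
    }
}
```

Under that assumption the seed and the paths ending in L ("L", "LL", "LEL") are selected, while "LE" and "E" are not.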
-
UMBRA_PRESTART_SCRIPT
public static java.lang.String UMBRA_PRESTART_SCRIPT
Path to a script to be executed before Heritrix is started for every Umbra-enabled job.
-
CRAWL_LOOP_WAIT_TIME
public static java.lang.String CRAWL_LOOP_WAIT_TIME
-
FRONTIER_REPORT_WAIT_TIME
public static java.lang.String FRONTIER_REPORT_WAIT_TIME
-
HERITRIX_ADMIN_NAME
public static java.lang.String HERITRIX_ADMIN_NAME
-
HERITRIX_ADMIN_PASSWORD
public static java.lang.String HERITRIX_ADMIN_PASSWORD
-
HERITRIX_GUI_PORT
public static java.lang.String HERITRIX_GUI_PORT
-
HERITRIX_HEAP_SIZE
public static java.lang.String HERITRIX_HEAP_SIZE
-
HERITRIX_JVM_OPTS
public static java.lang.String HERITRIX_JVM_OPTS
-
HERITRIX_CONTROLLER_CLASS
public static java.lang.String HERITRIX_CONTROLLER_CLASS
-
HERITRIX_LAUNCHER_CLASS
public static java.lang.String HERITRIX_LAUNCHER_CLASS
-
HARVEST_REPORT_CLASS
public static java.lang.String HARVEST_REPORT_CLASS
-
DISREGARD_SEEDURL_INFORMATION_IN_CRAWLLOG
public static java.lang.String DISREGARD_SEEDURL_INFORMATION_IN_CRAWLLOG
-
METADATA_GENERATE_ARCHIVE_FILES_REPORT
public static java.lang.String METADATA_GENERATE_ARCHIVE_FILES_REPORT
settings.harvester.harvesting.metadata.generateArchiveFilesReport This setting is a boolean flag that enables or disables the generation of an ARC/WARC files report. The default value is 'true'.
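The dotted key in this description reads like a path into a nested settings document. The sketch below prints the nesting that such a key would correspond to, assuming each dot-separated segment maps to one XML element; that mapping is an assumption made for illustration, and the class SettingsKeySketch and its method toXml are hypothetical, not NetarchiveSuite API.

```java
// Hypothetical illustration, not part of NetarchiveSuite.
final class SettingsKeySketch {
    /**
     * Renders a dotted key as nested XML elements with the value in the
     * innermost element. The value is inserted as-is (no XML escaping).
     */
    static String toXml(String dottedKey, String value) {
        String[] parts = dottedKey.split("\\.");
        int last = parts.length - 1;
        StringBuilder xml = new StringBuilder();
        // Opening tags for every segment except the last.
        for (int i = 0; i < last; i++) {
            indent(xml, i).append('<').append(parts[i]).append(">\n");
        }
        // Innermost element carrying the value.
        indent(xml, last).append('<').append(parts[last]).append('>')
                .append(value)
                .append("</").append(parts[last]).append(">\n");
        // Closing tags in reverse order.
        for (int i = last - 1; i >= 0; i--) {
            indent(xml, i).append("</").append(parts[i]).append(">\n");
        }
        return xml.toString();
    }

    private static StringBuilder indent(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb;
    }

    public static void main(String[] args) {
        System.out.print(toXml(
                "settings.harvester.harvesting.metadata.generateArchiveFilesReport",
                "true"));
    }
}
```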
-
METADATA_ARCHIVE_FILES_REPORT_NAME
public static java.lang.String METADATA_ARCHIVE_FILES_REPORT_NAME
settings.harvester.harvesting.metadata.archiveFilesReportName If METADATA_GENERATE_ARCHIVE_FILES_REPORT is set to true, sets the name of the generated report file. The default value is 'archivefiles-report.txt'.
-
METADATA_ARCHIVE_FILES_REPORT_HEADER
public static java.lang.String METADATA_ARCHIVE_FILES_REPORT_HEADER
settings.harvester.harvesting.metadata.archiveFilesReportName If METADATA_GENERATE_ARCHIVE_FILES_REPORT is set to true, sets the header of the generated report file. This setting should generally be left at its default value, which is '[ARCHIVEFILE] [Closed] [Size]'.
-
Constructor Detail
-
Heritrix3Settings
public Heritrix3Settings()