Serialized Form


Package dk.netarkivet.archive.arcrepository.bitpreservation

Class dk.netarkivet.archive.arcrepository.bitpreservation.AdminDataMessage extends ArchiveMessage implements Serializable

Serialized Fields

fileName

java.lang.String fileName
The filename to be updated in AdminData.


replicaId

java.lang.String replicaId
The id of the replica where the file resides.


newvalue

ReplicaStoreState newvalue
The new store state for the file. Used only when changestorestate is true.


checksum

java.lang.String checksum
The new checksum for the file. Used only when changechecksum is true.


changestorestate

boolean changestorestate
Flag indicating whether the store state should be changed. Default: false.


changechecksum

boolean changechecksum
Flag indicating whether the checksum should be changed. Default: false.
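The two flags decide which of the optional payload fields a receiver acts on. The following is a minimal sketch under stated assumptions: `AdminDataUpdate`, `AdminDataStub` and `StoreStateStub` are hypothetical names invented for illustration (the last one stands in for ReplicaStoreState), and the real message handler may apply updates differently.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ReplicaStoreState; the real enum defines its own constants. */
enum StoreStateStub { UPLOAD_STARTED, UPLOAD_COMPLETED, UPLOAD_FAILED }

/** Hypothetical value object mirroring the serialized fields of AdminDataMessage. */
final class AdminDataUpdate {
    final String fileName;
    final String replicaId;
    final StoreStateStub newvalue;
    final String checksum;
    final boolean changestorestate;
    final boolean changechecksum;

    AdminDataUpdate(String fileName, String replicaId, StoreStateStub newvalue,
                    String checksum, boolean changestorestate, boolean changechecksum) {
        this.fileName = fileName;
        this.replicaId = replicaId;
        this.newvalue = newvalue;
        this.checksum = checksum;
        this.changestorestate = changestorestate;
        this.changechecksum = changechecksum;
    }
}

/** Hypothetical in-memory admin data: one checksum per file, one store state per (file, replica). */
final class AdminDataStub {
    final Map<String, String> checksums = new HashMap<>();
    final Map<String, StoreStateStub> states = new HashMap<>();

    void apply(AdminDataUpdate u) {
        // newvalue is only meaningful when changestorestate is true.
        if (u.changestorestate) {
            states.put(u.fileName + "@" + u.replicaId, u.newvalue);
        }
        // checksum is only meaningful when changechecksum is true.
        if (u.changechecksum) {
            checksums.put(u.fileName, u.checksum);
        }
    }
}
```

With changestorestate true and changechecksum false, only the store state for that file and replica is touched; the checksum field is ignored even if it is non-null.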


Package dk.netarkivet.archive.arcrepository.distribute

Class dk.netarkivet.archive.arcrepository.distribute.StoreMessage extends ArchiveMessage implements Serializable

Serialized Fields

theRemoteFile

RemoteFile theRemoteFile
The actual data.


Package dk.netarkivet.archive.bitarchive.distribute

Class dk.netarkivet.archive.bitarchive.distribute.BatchEndedMessage extends ArchiveMessage implements Serializable

Serialized Fields

baApplicationId

java.lang.String baApplicationId
The identifier of the bitarchive application that performed the batch job.


originatingBatchMsgId

java.lang.String originatingBatchMsgId
The identifier of the message that initiated the batch job.


noOfFilesProcessed

int noOfFilesProcessed
Number of files processed by the batch job.


filesFailed

java.util.Collection<E> filesFailed
Collection of files that the batch job could not process.


rf

RemoteFile rf
The result of the batch job.


exceptions

java.util.List<E> exceptions
List of exceptions that occurred during processing.

Class dk.netarkivet.archive.bitarchive.distribute.BatchMessage extends ArchiveMessage implements Serializable

Serialized Fields

job

FileBatchJob job
The batch job that this message is sent to initiate.


replicaId

java.lang.String replicaId
The id of this replica.


args

java.util.List<E> args
The list of arguments for the batch job.


batchID

java.lang.String batchID
The ID for the batch process.

Class dk.netarkivet.archive.bitarchive.distribute.BatchReplyMessage extends ArchiveMessage implements Serializable

Serialized Fields

noOfFilesProcessed

int noOfFilesProcessed
Number of files processed by the BatchJob.


filesFailed

java.util.HashSet<E> filesFailed
Set of files that the BatchJob could not process.


resultFile

RemoteFile resultFile
The result of the BatchJob.

Class dk.netarkivet.archive.bitarchive.distribute.BatchTerminationMessage extends NetarkivetMessage implements Serializable

Serialized Fields

terminateID

java.lang.String terminateID
The ID of the batchjob to terminate.

Class dk.netarkivet.archive.bitarchive.distribute.GetFileMessage extends ArchiveMessage implements Serializable

Serialized Fields

arcfileName

java.lang.String arcfileName
The file to retrieve.


remoteFile

RemoteFile remoteFile
The actual data.


replicaId

java.lang.String replicaId
The id of this replica.

Class dk.netarkivet.archive.bitarchive.distribute.GetMessage extends ArchiveMessage implements Serializable

Serialized Fields

arcfile

java.lang.String arcfile
The ARC file to retrieve a record from.


index

long index
The offset of the record to retrieve.


record

BitarchiveRecord record
The retrieved record.

Class dk.netarkivet.archive.bitarchive.distribute.HeartBeatMessage extends ArchiveMessage implements Serializable

Serialized Fields

timestamp

long timestamp
The time when the heartbeat occurred. Note that timestamps cannot be compared between processes.


applicationId

java.lang.String applicationId
The id of the application sending the heartbeat.

Class dk.netarkivet.archive.bitarchive.distribute.RemoveAndGetFileMessage extends ArchiveMessage implements Serializable

Serialized Fields

fileName

java.lang.String fileName
The file to retrieve.


remoteFile

RemoteFile remoteFile
The actual data.


replicaId

java.lang.String replicaId
The id of this replica.


checksum

java.lang.String checksum
The checksum of the file to remove.


credentials

java.lang.String credentials
The bitarchive credentials.

Class dk.netarkivet.archive.bitarchive.distribute.UploadMessage extends ArchiveMessage implements Serializable

Serialized Fields

arcfileName

java.lang.String arcfileName
the name of the file to upload.


theRemoteFile

RemoteFile theRemoteFile
The actual data.


Package dk.netarkivet.archive.checksum.distribute

Class dk.netarkivet.archive.checksum.distribute.CorrectMessage extends ArchiveMessage implements Serializable

Serialized Fields

theRemoteFile

RemoteFile theRemoteFile
The file to replace the current bad entry.


arcFilename

java.lang.String arcFilename
The name of the arc-file.


theIncorrectChecksum

java.lang.String theIncorrectChecksum
The 'bad' checksum.


replicaId

java.lang.String replicaId
The replica to which this message should be sent.


credentials

java.lang.String credentials
The credentials to allow the correction of the archive entry.


removedFile

RemoteFile removedFile
The 'removed' file, which has to be returned.

Class dk.netarkivet.archive.checksum.distribute.GetAllChecksumsMessage extends ArchiveMessage implements Serializable

Serialized Fields

rf

RemoteFile rf
The file containing the output.


replicaId

java.lang.String replicaId
The id for the replica where this message should be sent.

Class dk.netarkivet.archive.checksum.distribute.GetAllFilenamesMessage extends ArchiveMessage implements Serializable

Serialized Fields

remoteFile

RemoteFile remoteFile
The file with the current content, which will be retrieved from the sender of this message.


replicaId

java.lang.String replicaId
The id for the replica where this message should be sent.

Class dk.netarkivet.archive.checksum.distribute.GetChecksumMessage extends ArchiveMessage implements Serializable

Serialized Fields

arcFilename

java.lang.String arcFilename
The name of the arc file to retrieve the checksum from.


checksum

java.lang.String checksum
The resulting checksum for the arcFile.


replicaId

java.lang.String replicaId
The id of the replica where the checksum should be retrieved.


isReply

boolean isReply
Whether this message is a reply.


Package dk.netarkivet.archive.distribute

Class dk.netarkivet.archive.distribute.ArchiveMessage extends NetarkivetMessage implements Serializable


Package dk.netarkivet.archive.indexserver.distribute

Class dk.netarkivet.archive.indexserver.distribute.IndexRequestMessage extends ArchiveMessage implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream s)
Invoke default method for deserializing object, and reinitialise the logger.


writeObject

private void writeObject(java.io.ObjectOutputStream s)
Invoke default method for serializing object.
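A logger is typically not serializable, so it is kept in a transient field, which comes back as null after default deserialization. The sketch below shows the "invoke default method, then reinitialise the logger" pattern on a hypothetical `LoggingMessage` class. It uses java.util.logging.Logger so that it compiles without extra libraries; the classes documented here may use another logging API (ExtractCDXJob, for example, declares an org.apache.commons.logging.Log).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.logging.Logger;

/** Hypothetical message class illustrating default (de)serialization plus logger reinitialisation. */
class LoggingMessage implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Not serialized; field initializers do not run on deserialization. */
    private transient Logger log = Logger.getLogger(LoggingMessage.class.getName());

    private final String payload;

    LoggingMessage(String payload) {
        this.payload = payload;
    }

    String getPayload() {
        return payload;
    }

    Logger getLog() {
        return log;
    }

    private void writeObject(ObjectOutputStream s) throws IOException {
        // Invoke the default method: writes all non-transient fields.
        s.defaultWriteObject();
    }

    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        // Invoke the default method: restores non-transient fields; log is still null here.
        s.defaultReadObject();
        // Reinitialise the transient logger.
        log = Logger.getLogger(LoggingMessage.class.getName());
    }

    /** Serializes and deserializes a message in memory, e.g. to simulate sending it. */
    static LoggingMessage roundTrip(LoggingMessage m) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (LoggingMessage) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }
}
```

Without the reassignment in readObject, any use of log on the receiving side would throw a NullPointerException.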

Serialized Fields

requestedJobs

java.util.Set<E> requestedJobs
The set of jobs for which an index is requested. Should always be set.


requestType

RequestType requestType
The type of index requested. Should always be set.


foundJobs

java.util.Set<E> foundJobs
The set of jobs for which an index _can_ be generated. Should only be set on reply. Should always be a subset of requestedJobs. If this set is equal to requestedJobs, resultFiles should also be set.


resultFiles

java.util.List<E> resultFiles
The list of files that make up the generated index. Should only be set on reply, and only if an index was generated for all files. If indexIsStoredInDirectory is false, this list must contain exactly one file (or not have been set yet).


indexIsStoredInDirectory

boolean indexIsStoredInDirectory
If true, the underlying cache uses a directory to store its files (which may be zero or more files), otherwise just a single file is used.


shouldReturnIndex

boolean shouldReturnIndex
If true, return the index to the sender. If false, send IndexReadyMessage instead.


harvestId

java.lang.Long harvestId
The ID of the harvest that needs this index for its jobs.
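The reply-side constraints on requestedJobs, foundJobs and resultFiles can be expressed as a small consistency check. This is a sketch only: the `IndexReplyInvariants` class is hypothetical, job IDs are assumed to be Long values, file names stand in for the actual files, "generated for all files" is read as "found for all requested jobs", and shouldReturnIndex is ignored.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical checker for the documented IndexRequestMessage field constraints. */
final class IndexReplyInvariants {
    private IndexReplyInvariants() {
    }

    /**
     * Returns null if the fields are consistent, otherwise a description of the first violation.
     * A null foundJobs or resultFiles means "not set (yet)".
     */
    static String check(Set<Long> requestedJobs, Set<Long> foundJobs,
                        List<String> resultFiles, boolean indexIsStoredInDirectory) {
        if (requestedJobs == null) {
            return "requestedJobs must always be set";
        }
        if (foundJobs != null && !requestedJobs.containsAll(foundJobs)) {
            return "foundJobs must be a subset of requestedJobs";
        }
        boolean allFound = foundJobs != null && foundJobs.equals(requestedJobs);
        if (resultFiles != null && !allFound) {
            return "resultFiles may only be set when an index exists for all requested jobs";
        }
        if (!indexIsStoredInDirectory && resultFiles != null && resultFiles.size() != 1) {
            return "a single-file index must consist of exactly one file";
        }
        return null;
    }
}
```

Returning a message instead of throwing keeps the check usable both in assertions and in log output.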


Package dk.netarkivet.common.distribute

Class dk.netarkivet.common.distribute.AbstractRemoteFile extends java.lang.Object implements Serializable

Serialized Fields

file

java.io.File file
The local file that this remote file represents.


useChecksums

boolean useChecksums
If true, communication is checksummed.


fileDeletable

boolean fileDeletable
If true, the file may be deleted after all transfers are done.


multipleDownloads

boolean multipleDownloads
If true, the file may be downloaded multiple times. Otherwise, the remote file is invalidated after first transfer.


filesize

long filesize
The size of the file.
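The interplay of multipleDownloads and fileDeletable can be modelled as a small state machine. The sketch below is hypothetical (`TransferPolicySketch` and its methods are not part of AbstractRemoteFile), it simulates transfers instead of moving data, and it assumes that for a multi-download file the owner explicitly signals when all transfers are done.

```java
/** Hypothetical model of the multipleDownloads / fileDeletable rules; no data is actually moved. */
final class TransferPolicySketch {
    private final boolean multipleDownloads;
    private final boolean fileDeletable;
    private boolean invalidated;
    private boolean deleted;
    private int transfers;

    TransferPolicySketch(boolean multipleDownloads, boolean fileDeletable) {
        this.multipleDownloads = multipleDownloads;
        this.fileDeletable = fileDeletable;
    }

    /** Simulates one download of the file. */
    void transfer() {
        if (invalidated) {
            throw new IllegalStateException("Remote file is no longer valid");
        }
        transfers++;
        if (!multipleDownloads) {
            // Single-use: the first transfer is also the last one.
            allTransfersDone();
        }
    }

    /** Marks the remote file as finished; the local file is deleted only if that is allowed. */
    void allTransfersDone() {
        invalidated = true;
        if (fileDeletable) {
            deleted = true;
        }
    }

    boolean isInvalidated() { return invalidated; }
    boolean isDeleted() { return deleted; }
    int getTransfers() { return transfers; }
}
```

A single-use, deletable file is thus invalidated and removed right after its first transfer, while a multi-download file stays valid until the owner calls allTransfersDone().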

Class dk.netarkivet.common.distribute.ChannelID extends java.lang.Object implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream ois)
Method used by Java deserialization. Our coding guidelines prescribe that this method should always be implemented, even if it only calls the default method: http://kb-prod-udv-001.kb.dk/twiki/bin/view/Netarkiv/ImplementeringOgTestAfSerializable See also "Effective Java", pages 219 and 224.

Throws:
IOFailure - if Java could not deserialize the object.

writeObject

private void writeObject(java.io.ObjectOutputStream oos)
Method used by Java serialization. Our coding guidelines prescribe that this method should always be implemented, even if it only calls the default method: http://kb-prod-udv-001.kb.dk/twiki/bin/view/Netarkiv/ImplementeringOgTestAfSerializable See also "Effective Java", pages 219 and 224.

Throws:
IOFailure - if Java could not serialize the object.

Serialized Fields

environmentName

java.lang.String environmentName
The name of the environment in which this process is running. It is used as a prefix for all ChannelIDs. An example value is "PROD".


name

java.lang.String name
A ChannelID is identified by its name. It carries one bit of state information: whether it is a queue or a topic.
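The guideline quoted above asks for both hooks even when they only call the default methods, and the Throws clauses say that failures surface as IOFailure. A sketch of that pattern follows; `ChannelIdSketch` and `IOFailureStub` are hypothetical names (the latter is an unchecked stand-in for IOFailure so that the block compiles on its own), and the real ChannelID may be implemented differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical unchecked stand-in for dk.netarkivet.common.exceptions.IOFailure. */
class IOFailureStub extends RuntimeException {
    private static final long serialVersionUID = 1L;

    IOFailureStub(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Illustrative class that implements both hooks even though they only delegate. */
class ChannelIdSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String environmentName;
    private final String name;

    ChannelIdSketch(String environmentName, String name) {
        this.environmentName = environmentName;
        this.name = name;
    }

    String getEnvironmentName() { return environmentName; }
    String getName() { return name; }

    private void writeObject(ObjectOutputStream oos) {
        try {
            oos.defaultWriteObject();
        } catch (IOException e) {
            throw new IOFailureStub("Could not serialize " + name, e);
        }
    }

    private void readObject(ObjectInputStream ois) {
        try {
            ois.defaultReadObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IOFailureStub("Could not deserialize channel", e);
        }
    }

    /** Serializes and deserializes in memory; checked exceptions are rethrown as IOFailureStub. */
    static ChannelIdSketch roundTrip(ChannelIdSketch c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(c);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ChannelIdSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IOFailureStub("Round trip failed", e);
        }
    }
}
```

Because IOFailure is unchecked (NetarkivetException extends java.lang.RuntimeException, per the exception classes listed in this document), the private hooks need no throws clause.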

Class dk.netarkivet.common.distribute.FTPRemoteFile extends AbstractRemoteFile implements Serializable

Serialized Fields

ftpServerName

java.lang.String ftpServerName
FTP connection information: the name of the FTP server, read from the FTP-related settings in settings.xml. Note that these settings are transferred to the receiver, which is necessary to allow the receiver to get data from different servers.


ftpServerPort

int ftpServerPort
The FTP server port.


ftpUserName

java.lang.String ftpUserName
The username used to connect to the FTP server.


ftpUserPassword

java.lang.String ftpUserPassword
The password used to connect to the FTP server.


ftpFileName

java.lang.String ftpFileName
The name that we use for the file on the FTP server. This is only for internal use.


checksum

java.lang.String checksum
If useChecksums is true, contains the file checksum.

Class dk.netarkivet.common.distribute.HTTPRemoteFile extends AbstractRemoteFile implements Serializable

Serialized Fields

hostname

java.lang.String hostname
The name of the host this file originated on.


url

java.net.URL url
The URL that exposes this remote file.


checksum

java.lang.String checksum
If useChecksums is true, contains the file checksum.

Class dk.netarkivet.common.distribute.HTTPSRemoteFile extends HTTPRemoteFile implements Serializable

Class dk.netarkivet.common.distribute.NetarkivetMessage extends java.lang.Object implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream s)
Invoke default method for deserializing object.


writeObject

private void writeObject(java.io.ObjectOutputStream s)
Invoke default method for serializing object.

Serialized Fields

errMsg

java.lang.String errMsg

isOk

boolean isOk

id

java.lang.String id

to

ChannelID to

replyTo

ChannelID replyTo

replyOfId

java.lang.String replyOfId

Class dk.netarkivet.common.distribute.NullRemoteFile extends java.lang.Object implements Serializable


Package dk.netarkivet.common.distribute.arcrepository

Class dk.netarkivet.common.distribute.arcrepository.BitarchiveRecord extends java.lang.Object implements Serializable

Serialized Fields

fileName

java.lang.String fileName
The file the data were retrieved from.


objectBuffer

byte[] objectBuffer
The actual data.


offset

long offset
The offset of the ArchiveRecord contained.


length

long length
The length of the ArchiveRecord contained.


objectAsRemoteFile

RemoteFile objectAsRemoteFile
The actual data as a remote file.


isStoredAsRemoteFile

boolean isStoredAsRemoteFile
Whether the data is stored in a RemoteFile.


hasRemoteFileBeenDeleted

boolean hasRemoteFileBeenDeleted
Set to true after the RemoteFile has been deleted.


LIMIT_FOR_SAVING_DATA_IN_OBJECT_BUFFER

long LIMIT_FOR_SAVING_DATA_IN_OBJECT_BUFFER
How large the ARCRecord can be before it is saved as a RemoteFile.
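LIMIT_FOR_SAVING_DATA_IN_OBJECT_BUFFER suggests a size-based choice between the in-memory objectBuffer and objectAsRemoteFile. A minimal sketch, assuming a hypothetical `RecordStorageSketch` helper and assuming that a record exactly at the limit still goes into the buffer; neither the real limit value nor the exact boundary is stated here.

```java
/** Hypothetical helper for the buffer-or-RemoteFile decision suggested by BitarchiveRecord's fields. */
final class RecordStorageSketch {
    enum Storage { OBJECT_BUFFER, REMOTE_FILE }

    private RecordStorageSketch() {
    }

    /** Records up to the limit go into the in-memory byte[]; larger ones into a RemoteFile. */
    static Storage choose(long recordLength, long limit) {
        return recordLength <= limit ? Storage.OBJECT_BUFFER : Storage.REMOTE_FILE;
    }
}
```

A Java byte[] is indexed by int, so a record larger than about Integer.MAX_VALUE bytes could not be held in objectBuffer in any case; a RemoteFile avoids both that cap and the memory cost of large records.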


Package dk.netarkivet.common.distribute.monitorregistry

Class dk.netarkivet.common.distribute.monitorregistry.HostEntry extends java.lang.Object implements Serializable

Serialized Fields

name

java.lang.String name
The name of the remote host.


jmxPort

int jmxPort
The JMX port allocated on the remote host.


rmiPort

int rmiPort
The RMI port allocated on the remote host.


time

java.util.Date time
The time this host-entry was last seen alive.


Package dk.netarkivet.common.exceptions

Class dk.netarkivet.common.exceptions.ArgumentNotValid extends NetarkivetException implements Serializable

Class dk.netarkivet.common.exceptions.BatchTermination extends NetarkivetException implements Serializable

Class dk.netarkivet.common.exceptions.ForwardedToErrorPage extends NetarkivetException implements Serializable

Class dk.netarkivet.common.exceptions.IllegalState extends NetarkivetException implements Serializable

Class dk.netarkivet.common.exceptions.IOFailure extends NetarkivetException implements Serializable

Class dk.netarkivet.common.exceptions.NetarkivetException extends java.lang.RuntimeException implements Serializable

Class dk.netarkivet.common.exceptions.NotImplementedException extends NetarkivetException implements Serializable

Class dk.netarkivet.common.exceptions.PermissionDenied extends NetarkivetException implements Serializable

Class dk.netarkivet.common.exceptions.UnknownID extends NetarkivetException implements Serializable


Package dk.netarkivet.common.utils

Class dk.netarkivet.common.utils.FixedUURI extends org.archive.net.UURI implements Serializable

Class dk.netarkivet.common.utils.SparseBitSet extends java.util.BitSet implements Serializable

Serialized Fields

setbits

java.util.Set<E> setbits
A set of the indices of bits that are set in this BitSet.
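Storing only the indices of set bits keeps memory proportional to the number of set bits rather than to the highest index, which pays off when few bits are set over a large range. The stand-alone sketch below illustrates the idea with a hypothetical `SparseBits` class; unlike SparseBitSet it does not extend java.util.BitSet, and keeping the indices in a sorted TreeSet is an assumption made for convenience.

```java
import java.util.TreeSet;

/** Hypothetical stand-alone sparse bit set: stores only the indices of set bits. */
final class SparseBits {
    /** Sorted, so the highest set bit is cheap to find. */
    private final TreeSet<Integer> setbits = new TreeSet<>();

    void set(int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("bitIndex < 0: " + index);
        }
        setbits.add(index);
    }

    void clear(int index) {
        setbits.remove(index);
    }

    boolean get(int index) {
        return setbits.contains(index);
    }

    /** Number of set bits, like BitSet.cardinality(). */
    int cardinality() {
        return setbits.size();
    }

    /** Highest set bit plus one, like BitSet.length(). */
    int length() {
        return setbits.isEmpty() ? 0 : setbits.last() + 1;
    }
}
```

Setting bit 1,000,000 here costs one set entry, whereas a plain BitSet would allocate roughly 125 KB of words to reach that index.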

Class dk.netarkivet.common.utils.SparseRangeFilter extends org.apache.lucene.search.RangeFilter implements Serializable

Serialized Fields

fieldName

java.lang.String fieldName

lowerTerm

java.lang.String lowerTerm

upperTerm

java.lang.String upperTerm

includeLower

boolean includeLower

includeUpper

boolean includeUpper

Package dk.netarkivet.common.utils.arc

Class dk.netarkivet.common.utils.arc.ARCBatchJob extends FileBatchJob implements Serializable

Serialized Fields

noOfRecordsProcessed

int noOfRecordsProcessed
The total number of records processed.


Package dk.netarkivet.common.utils.batch

Class dk.netarkivet.common.utils.batch.ARCBatchFilter extends java.lang.Object implements Serializable

Serialized Fields

name

java.lang.String name
The name of the BatchFilter.

Class dk.netarkivet.common.utils.batch.ByteJarLoader extends java.lang.ClassLoader implements Serializable

Serialized Fields

binaryData

java.util.Map<K,V> binaryData
The map that holds the class data.

Class dk.netarkivet.common.utils.batch.ChecksumJob extends FileBatchJob implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream s)
Invoke default method for deserializing object, and reinitialise the logger.


writeObject

private void writeObject(java.io.ObjectOutputStream s)
                  throws IOFailure
Invoke default method for serializing object.

Throws:
IOFailure - If an exception is caught during writing of the object.

Class dk.netarkivet.common.utils.batch.FileBatchJob extends java.lang.Object implements Serializable

Serialized Fields

filesToProcess

java.util.regex.Pattern filesToProcess
Regular expression for the files to process with this job. By default, all files are processed. This pattern must match the entire filename, but not the path (e.g. .*foo.* for any file with foo in it).
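The matching rule above (whole file name, never the path) corresponds to applying Matcher.matches() to File.getName(). A short sketch; the `FilenameFilterSketch` class is hypothetical and only illustrates the rule, not the job's actual selection code.

```java
import java.io.File;
import java.util.regex.Pattern;

/** Hypothetical helper applying the documented filesToProcess rule. */
final class FilenameFilterSketch {
    private FilenameFilterSketch() {
    }

    static boolean shouldProcess(Pattern filesToProcess, File file) {
        // getName() drops the directories; matches() requires the entire name to match (unlike find()).
        return filesToProcess.matcher(file.getName()).matches();
    }
}
```

So Pattern.compile("foo") does not select "xfooy.arc", while ".*foo.*" does; a directory named foo in the path never causes a match.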


noOfFilesProcessed

int noOfFilesProcessed
The total number of files processed (including any that generated errors).


batchJobTimeout

long batchJobTimeout
If positive, the timeout of this specific batch job in milliseconds. If negative, the standard timeout from settings is used.
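Resolving the effective timeout from batchJobTimeout is a one-line decision. This is a sketch with a hypothetical `BatchTimeoutSketch` helper; the documentation does not cover a value of zero, and treating it like a negative value (fall back to settings) is an assumption.

```java
/** Hypothetical helper resolving the effective timeout described for batchJobTimeout. */
final class BatchTimeoutSketch {
    private BatchTimeoutSketch() {
    }

    static long effectiveTimeoutMillis(long batchJobTimeout, long defaultTimeoutFromSettings) {
        // Positive: job-specific timeout in milliseconds. Otherwise: the standard timeout from settings.
        // Zero is undocumented; using the default for it is an assumption.
        return batchJobTimeout > 0 ? batchJobTimeout : defaultTimeoutFromSettings;
    }
}
```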


filesFailed

java.util.Set<E> filesFailed
A Set of files which generated errors.


exceptions

java.util.List<E> exceptions
A list with information about the exceptions thrown during the execution of the batch job.

Class dk.netarkivet.common.utils.batch.FileBatchJob.ExceptionOccurrence extends java.lang.Object implements Serializable

Serialized Fields

fileName

java.lang.String fileName
The name of the file we were processing when the exception occurred, or null.


fileOffset

long fileOffset
The offset in the file we were processing when the exception occurred.


outputOffset

long outputOffset
How much we had written to the output stream when the exception occurred.


exception

java.lang.Exception exception
The exception that was thrown.


inInitialize

boolean inInitialize
True if this exception was thrown during initialize().


inFinish

boolean inFinish
True if this exception was thrown during finish().

Class dk.netarkivet.common.utils.batch.FileListJob extends FileBatchJob implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream s)
Invoke default method for deserializing object, and reinitialise the logger.

Class dk.netarkivet.common.utils.batch.FileRemover extends FileBatchJob implements Serializable

Class dk.netarkivet.common.utils.batch.LoadableFileBatchJob extends FileBatchJob implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream in)
                 throws java.io.IOException,
                        java.lang.ClassNotFoundException
Override of the default way to deserialize an object of this class.

Throws:
java.io.IOException - If there is an error reading from the stream, or the serialized object cannot be deserialized due to errors in the serialized form.
java.lang.ClassNotFoundException - If the class definition of the serialized object cannot be found.

writeObject

private void writeObject(java.io.ObjectOutputStream out)
                  throws java.io.IOException
Override of the default way to serialize this class.

Throws:
java.io.IOException - In case there is an error from the underlying stream, or this object cannot be serialized.

Serialized Fields

fileContents

byte[] fileContents
The binary contents of the file before they are turned into a class.


fileName

java.lang.String fileName
The name of the file whose contents are turned into a class.


args

java.util.List<E> args
The arguments for instantiating the batch job.

Class dk.netarkivet.common.utils.batch.LoadableJarBatchJob extends FileBatchJob implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream in)
                 throws java.io.IOException,
                        java.lang.ClassNotFoundException
Override of the default way to deserialize an object of this class.

Throws:
java.io.IOException - If there is an error reading from the stream, or the serialized object cannot be deserialized due to errors in the serialized form.
java.lang.ClassNotFoundException - If the class definition of the serialized object cannot be found.

writeObject

private void writeObject(java.io.ObjectOutputStream out)
                  throws java.io.IOException
Override of the default way to serialize this class.

Throws:
java.io.IOException - In case there is an error from the underlying stream, or this object cannot be serialized.

Serialized Fields

multipleClassLoader

java.lang.ClassLoader multipleClassLoader
The ClassLoader of type ByteJarLoader associated with this job.


jobClass

java.lang.String jobClass
The name of the loaded Job.


args

java.util.List<E> args
The arguments for instantiating the batch job.


Package dk.netarkivet.common.utils.cdx

Class dk.netarkivet.common.utils.cdx.ARCFilenameCDXRecordFilter extends SimpleCDXRecordFilter implements Serializable

Serialized Fields

arcfilenamepattern

java.lang.String arcfilenamepattern

p

java.util.regex.Pattern p

Class dk.netarkivet.common.utils.cdx.ExtractCDXJob extends ARCBatchJob implements Serializable

Serialized Fields

fields

java.lang.String[] fields
The fields to be included in CDX output.


includeChecksum

boolean includeChecksum
If true, an MD5 checksum is also included in each CDX line.


log

org.apache.commons.logging.Log log
Logger for this class.

Class dk.netarkivet.common.utils.cdx.GetCDXRecordsBatchJob extends ARCBatchJob implements Serializable

Serialized Fields

URLMatcher

java.util.regex.Pattern URLMatcher
The URL pattern used to retrieve the CDX-records.


mimeMatcher

java.util.regex.Pattern mimeMatcher
The MIME pattern used to retrieve the CDX-records.

Class dk.netarkivet.common.utils.cdx.SimpleCDXRecordFilter extends java.lang.Object implements Serializable

Serialized Fields

filtername

java.lang.String filtername
The name of the filter.


Package dk.netarkivet.harvester.datamodel

Class dk.netarkivet.harvester.datamodel.ExtendedField extends java.lang.Object implements Serializable

Serialized Fields

extendedFieldID

java.lang.Long extendedFieldID
The persistent ID of this extended field.


extendedFieldTypeID

java.lang.Long extendedFieldTypeID
The ID of the reference to which the extended field belongs.


name

java.lang.String name
The name of the extended field. This name will not be translated.


formattingPattern

java.lang.String formattingPattern
The formatting pattern of the extended field.


datatype

int datatype
The datatype of the extended field. See the datatype list.


mandatory

boolean mandatory
Whether the extended field is mandatory.


sequencenr

int sequencenr
The sequence number used to sort fields.


defaultValue

java.lang.String defaultValue
The default value for this field.


options

java.lang.String options
Key-value pairs for the field's options.
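Ordering extended fields by sequencenr is a plain comparator sort. A sketch using hypothetical `FieldStub` and `FieldOrder` classes that carry only the two attributes needed; how ties are meant to be ordered is not documented, and keeping input order is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in carrying only the ExtendedField attributes used here. */
final class FieldStub {
    final String name;
    final int sequencenr;

    FieldStub(String name, int sequencenr) {
        this.name = name;
        this.sequencenr = sequencenr;
    }
}

/** Hypothetical helper ordering extended fields by sequencenr. */
final class FieldOrder {
    private FieldOrder() {
    }

    static List<String> namesInDisplayOrder(List<FieldStub> fields) {
        List<FieldStub> sorted = new ArrayList<>(fields);
        // List.sort is stable, so fields sharing a sequencenr keep their input order.
        sorted.sort(Comparator.comparingInt((FieldStub f) -> f.sequencenr));
        List<String> names = new ArrayList<>();
        for (FieldStub f : sorted) {
            names.add(f.name);
        }
        return names;
    }
}
```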

Class dk.netarkivet.harvester.datamodel.ExtendedFieldType extends java.lang.Object implements Serializable

Serialized Fields

extendedFieldTypeID

java.lang.Long extendedFieldTypeID

name

java.lang.String name

Class dk.netarkivet.harvester.datamodel.Job extends java.lang.Object implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream s)
Invoke default method for deserializing object, and reinitialise the logger.


writeObject

private void writeObject(java.io.ObjectOutputStream s)
Invoke default method for serializing object.

Serialized Fields

jobID

java.lang.Long jobID
The persistent ID of this job.


origHarvestDefinitionID

java.lang.Long origHarvestDefinitionID
The ID of the harvest definition that generated this job.


status

JobStatus status
The status of the job. See the JobStatus class for the possible states.


priority

JobPriority priority
The priority of this job.


forceMaxObjectsPerDomain

long forceMaxObjectsPerDomain
Overrides the individual configurations' maximum setting for objects retrieved from a domain, when set to a positive value.


forceMaxBytesPerDomain

long forceMaxBytesPerDomain
Overrides the individual configurations' maximum setting for bytes retrieved from a domain, when set to a value other than -1.
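The two override conditions differ: objects are overridden only by a positive value, bytes by any value other than -1. A literal sketch of those rules with a hypothetical `DomainLimitSketch` helper; it assumes the override simply replaces the configuration's value, while the actual job may combine the two limits differently.

```java
/** Hypothetical sketch of the documented override rules for job-wide per-domain limits. */
final class DomainLimitSketch {
    private DomainLimitSketch() {
    }

    /** forceMaxObjectsPerDomain overrides the configuration only when positive. */
    static long effectiveMaxObjects(long forceMaxObjectsPerDomain, long configMaxObjects) {
        return forceMaxObjectsPerDomain > 0 ? forceMaxObjectsPerDomain : configMaxObjects;
    }

    /** forceMaxBytesPerDomain overrides the configuration whenever it is not -1. */
    static long effectiveMaxBytes(long forceMaxBytesPerDomain, long configMaxBytes) {
        return forceMaxBytesPerDomain != -1 ? forceMaxBytesPerDomain : configMaxBytes;
    }
}
```

Read literally, a value of 0 therefore leaves the object limit to the configuration but overrides the byte limit.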


orderXMLname

java.lang.String orderXMLname
The name of the harvest template used by the job.


orderXMLdoc

org.dom4j.Document orderXMLdoc
The harvest template used by the job.


settingsXMLfiles

java.io.File[] settingsXMLfiles
The list of Heritrix settings files.


settingsXMLdocs

org.dom4j.Document[] settingsXMLdocs
The corresponding Dom4j Documents for these files.


seedListSet

java.util.Set<E> seedListSet
A set of seeds involved in this job. Outside the SetSeedList() method, the set of seeds is updated in the addConfiguration() method.


harvestNum

int harvestNum
Which run of the harvest definition this is.


harvestErrors

java.lang.String harvestErrors
Errors during harvesting.


harvestErrorDetails

java.lang.String harvestErrorDetails
Details about errors during harvesting.


uploadErrors

java.lang.String uploadErrors
Errors during upload of the harvested data.


uploadErrorDetails

java.lang.String uploadErrorDetails
Details about errors during upload of the harvested data.


actualStart

java.util.Date actualStart
The time at which the job actually started.


actualStop

java.util.Date actualStop
The time at which the job actually stopped.


submittedDate

java.util.Date submittedDate
The time when this job was submitted.


edition

long edition
Edition is used by the DAO to keep track of changes.


resubmittedAsJobWithID

java.lang.Long resubmittedAsJobWithID
Resubmitted as the Job with this ID. If null, this job has not been resubmitted.


domainConfigurationMap

java.util.Map<K,V> domainConfigurationMap
A map from domainName to domainConfigurationName. It must be accessible in order to update job information (see Ass. 2.4.3).


configsChanged

boolean configsChanged
A hint to the DAO that configurations have changed. Since configurations are large, the DAO can use a false value here to avoid updating the configuration list. The DAO can set it to false after saving configurations.


configurationSetsObjectLimit

boolean configurationSetsObjectLimit
Whether the maxObjects field was defined by the harvest definition or by the configuration limit. This determines whether smaller configurations are accepted when building jobs. True means the limit is defined by the configuration, false means that it is defined by the harvest definition.


configurationSetsByteLimit

boolean configurationSetsByteLimit
Whether the maxBytes field was defined by the harvest definition or by the configuration limit. This determines whether smaller configurations are accepted when building jobs. True means the limit is defined by the configuration, false means that it is defined by the harvest definition.


minCountObjects

long minCountObjects
The lowest number of objects expected by a configuration.


maxCountObjects

long maxCountObjects
The highest number of objects expected by a configuration.


totalCountObjects

long totalCountObjects
The total number of objects expected by all added configurations.


forceMaxRunningTime

long forceMaxRunningTime
The max time in seconds given to the harvester for this job. 0 is unlimited.


underConstruction

boolean underConstruction
If true, this job object is still undergoing changes due to having more configurations added. When set to false, the object is considered immutable except for status updates. Jobs loaded from the DAO are never under construction anymore.
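The underConstruction flag acts as a one-way switch: configurations may be added while it is true, and afterwards only the status may change. A sketch with a hypothetical `JobConstructionSketch` class; the status is modelled as a plain String instead of JobStatus, and the method names are illustrative, not the real Job API.

```java
/** Hypothetical sketch of the underConstruction rule; not the real Job class. */
final class JobConstructionSketch {
    private boolean underConstruction = true;
    private int configurationCount;
    private String status = "NEW";

    /** Adding configurations is only allowed while the job is under construction. */
    void addConfiguration() {
        if (!underConstruction) {
            throw new IllegalStateException("Job is no longer under construction");
        }
        configurationCount++;
    }

    /** Called once no more configurations will be added; there is no way back. */
    void finishConstruction() {
        underConstruction = false;
    }

    /** Status updates remain allowed after construction has finished. */
    void setStatus(String newStatus) {
        status = newStatus;
    }

    boolean isUnderConstruction() { return underConstruction; }
    int getConfigurationCount() { return configurationCount; }
    String getStatus() { return status; }
}
```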


LIM_MAX_REL_SIZE

long LIM_MAX_REL_SIZE
Job limits read from settings during construction.


LIM_MIN_ABS_SIZE

long LIM_MIN_ABS_SIZE

LIM_MAX_TOTAL_SIZE

long LIM_MAX_TOTAL_SIZE

useQuotaEnforcer

boolean useQuotaEnforcer

Class dk.netarkivet.harvester.datamodel.RepeatingSchedule extends Schedule implements Serializable

Serialized Fields

repeats

int repeats
How many times this schedule should be repeated.

Class dk.netarkivet.harvester.datamodel.Schedule extends java.lang.Object implements Serializable

Serialized Fields

name

java.lang.String name
Human readable name for the schedule.


comments

java.lang.String comments
Any comments added by the user.


startDate

java.util.Date startDate
The first run of the job: date and time (hour:min:sec). May be null, meaning at any time.


frequency

Frequency frequency
Frequency of runs, possibly with a time it should happen at.


edition

long edition
Edition is used by the DAO to keep track of changes.


id

java.lang.Long id
ID autogenerated by DB, ignored otherwise.

Class dk.netarkivet.harvester.datamodel.SeedList extends java.lang.Object implements Serializable

Serialized Fields

name

java.lang.String name
The name of the seedlist. Used for sorting.


seeds

java.util.List<E> seeds
The list of seeds; each String in the list holds one seed.


comments

java.lang.String comments
Any comments associated with this seedlist.


id

java.lang.Long id
ID autogenerated by DB, ignored otherwise.

Class dk.netarkivet.harvester.datamodel.TimedSchedule extends Schedule implements Serializable

Serialized Fields

endDate

java.util.Date endDate
The day this schedule should end.


Package dk.netarkivet.harvester.distribute

Class dk.netarkivet.harvester.distribute.HarvesterMessage extends NetarkivetMessage implements Serializable

Class dk.netarkivet.harvester.distribute.IndexReadyMessage extends HarvesterMessage implements Serializable

Serialized Fields

harvestId

java.lang.Long harvestId
The ID for a specific harvest.


Package dk.netarkivet.harvester.harvesting

Class dk.netarkivet.harvester.harvesting.ContentSizeAnnotationPostProcessor extends org.archive.crawler.framework.Processor implements Serializable

Class dk.netarkivet.harvester.harvesting.OnNSDomainsDecideRule extends org.archive.crawler.deciderules.SurtPrefixedDecideRule implements Serializable


Package dk.netarkivet.harvester.harvesting.distribute

Class dk.netarkivet.harvester.harvesting.distribute.CrawlProgressMessage extends HarvesterMessage implements Serializable

Serialized Fields

jobID

long jobID
The unique identifier of the job.


harvestID

long harvestID
The unique identifier of the associated harvest definition.


hostUrl

java.lang.String hostUrl
The URL to the host Heritrix admin UI.


status

CrawlProgressMessage.CrawlStatus status
The job's status.


progressStatisticsLegend

java.lang.String progressStatisticsLegend
A legend, fetched only once, for the CrawlProgressMessage.CrawlServiceJobInfo.progressStatistics property.


heritrixStatus

CrawlProgressMessage.CrawlServiceInfo heritrixStatus
The information provided by the CrawlService MBean.


jobStatus

CrawlProgressMessage.CrawlServiceJobInfo jobStatus
The information provided by the CrawlService.Job MBean.

Class dk.netarkivet.harvester.harvesting.distribute.CrawlProgressMessage.CrawlServiceInfo extends java.lang.Object implements Serializable

Serialized Fields

alertCount

int alertCount
The number of alerts raised by Heritrix.


isCrawling

boolean isCrawling
Flag is set to true when Heritrix is crawling or paused.


currentJob

java.lang.String currentJob
Contains the UID of the current job.

Class dk.netarkivet.harvester.harvesting.distribute.CrawlProgressMessage.CrawlServiceJobInfo extends java.lang.Object implements Serializable

Serialized Fields

discoveredFilesCount

long discoveredFilesCount
The number of URIs currently discovered.


downloadedFilesCount

long downloadedFilesCount
The number of URIs currently harvested.


frontierShortReport

java.lang.String frontierShortReport
A summary of the frontier queues.


elapsedSeconds

long elapsedSeconds
The time in seconds elapsed since the crawl began.


currentProcessedKBPerSec

long currentProcessedKBPerSec
The current download rate in KB/sec.


processedKBPerSec

long processedKBPerSec
The average download rate in KB/sec.


currentProcessedDocsPerSec

double currentProcessedDocsPerSec
The current download rate in URI/sec.


processedDocsPerSec

double processedDocsPerSec
The average download rate in URI/sec.


activeToeCount

int activeToeCount
The number of active toe threads for this job.


progressStatistics

java.lang.String progressStatistics
A textual summary of the crawler activity.


status

java.lang.String status
The job status.

Class dk.netarkivet.harvester.harvesting.distribute.CrawlStatusMessage extends HarvesterMessage implements Serializable

Serialized Fields

jobID

long jobID
The ID of the crawl job that this message reports on.


statusCode

JobStatus statusCode
The current state of the crawl-job.


harvestReport

HarvestReport harvestReport
A harvestReport created at the end of the crawl.


harvestErrors

java.lang.String harvestErrors
Harvest errors encountered.


harvestErrorDetails

java.lang.String harvestErrorDetails
Harvest errors encountered, with details.


uploadErrors

java.lang.String uploadErrors
Upload errors encountered.


uploadErrorDetails

java.lang.String uploadErrorDetails
Upload errors encountered, with details.

Class dk.netarkivet.harvester.harvesting.distribute.DomainStats extends java.lang.Object implements Serializable

Serialized Fields

objectCount

long objectCount
Count of how many objects have been harvested from this domain.


byteCount

long byteCount
Count of how many bytes have been harvested from this domain.


stopReason

StopReason stopReason
The reason why we 'only' harvested byteCount bytes or objectCount objects.

Class dk.netarkivet.harvester.harvesting.distribute.DoOneCrawlMessage extends HarvesterMessage implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream s)
                 throws java.lang.ClassNotFoundException,
                        java.io.IOException
Method needed to deserialize an object of this class.

Throws:
java.lang.ClassNotFoundException - In case the object read is of unknown class.
java.io.IOException - On I/O trouble reading the object.

writeObject

private void writeObject(java.io.ObjectOutputStream s)
                  throws java.io.IOException
Method needed to serialize an object of this class.

Throws:
java.io.IOException - On I/O trouble writing the object.
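The bodies of these private readObject/writeObject methods are not shown in this serialized-form listing. The sketch below only illustrates the standard Java serialization hook pattern they follow, using a hypothetical class ExampleMessage (not a NetarchiveSuite class): writeObject delegates to defaultWriteObject for the non-transient fields, and readObject restores those fields and then rebuilds any transient state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical message type illustrating private writeObject/readObject hooks. */
class ExampleMessage implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String jobName;          // written by defaultWriteObject()
    private transient StringBuilder cache; // not serialized; rebuilt in readObject

    ExampleMessage(String jobName) {
        this.jobName = jobName;
        this.cache = new StringBuilder(jobName);
    }

    String jobName() { return jobName; }

    String cached() { return cache.toString(); }

    // Invoked by ObjectOutputStream when this object is serialized.
    private void writeObject(ObjectOutputStream s) throws IOException {
        s.defaultWriteObject(); // write all non-transient fields
    }

    // Invoked by ObjectInputStream; reads data in the same order it was written.
    private void readObject(ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();                   // restore jobName
        this.cache = new StringBuilder(jobName); // rebuild the transient field
    }

    /** Serializes the message to bytes and reads it back. */
    static ExampleMessage roundTrip(ExampleMessage m) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(m);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ExampleMessage) in.readObject();
        }
    }
}
```

Without the readObject hook, the transient cache field would be null after deserialization. Rebuilding derived state like this is a common reason to declare these methods.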
Serialized Fields

submittedJob

Job submittedJob
The Job to crawl.


origHarvestInfo

PersistentJobData.HarvestDefinitionInfo origHarvestInfo
The original harvest info.


metadata

java.util.List<E> metadata
Extra metadata associated with the crawl-job.

Class dk.netarkivet.harvester.harvesting.distribute.FrontierReportMessage extends HarvesterMessage implements Serializable

Serialized Fields

filterId

java.lang.String filterId
The id of the filter that generated this report.


report

InMemoryFrontierReport report
The report.

Class dk.netarkivet.harvester.harvesting.distribute.HarvesterStatusMessage extends HarvesterMessage implements Serializable

Serialized Fields

jobProprity

JobPriority jobProprity
The priority of jobs crawled by the sender.


applicationInstanceId

java.lang.String applicationInstanceId
The sender's application instance ID.


isAvailable

boolean isAvailable
Whether or not the sender is processing a crawl request.

Class dk.netarkivet.harvester.harvesting.distribute.JobEndedMessage extends HarvesterMessage implements Serializable

Serialized Fields

jobId

long jobId
The associated job's ID.


jobStatus

JobStatus jobStatus
The associated job's current status.

Class dk.netarkivet.harvester.harvesting.distribute.MetadataEntry extends java.lang.Object implements Serializable

Serialization Methods

readObject

private void readObject(java.io.ObjectInputStream s)
                 throws java.lang.ClassNotFoundException,
                        java.io.IOException
Method needed to deserialize an object of this class.

Throws:
java.lang.ClassNotFoundException - If the class of the serialized object could not be found.
java.io.IOException - If an I/O error occurred while reading the serialized object.

writeObject

private void writeObject(java.io.ObjectOutputStream s)
                  throws java.io.IOException
Method needed to serialize an object of this class.

Throws:
java.io.IOException - If an I/O error occurred while writing to the output stream.
Serialized Fields

url

java.lang.String url
The URL for this metadataEntry: Used as the unique identifier for this bit of metadata in the Netarchive.


mimeType

java.lang.String mimeType
The mimetype for this metadataEntry: Identifies which type of document this bit of metadata is.


data

byte[] data
The metadata itself, as a byte array.

Class dk.netarkivet.harvester.harvesting.distribute.PersistentJobData.HarvestDefinitionInfo extends java.lang.Object implements Serializable

Serialized Fields

origHarvestName

java.lang.String origHarvestName
The original harvest name.


origHarvestDesc

java.lang.String origHarvestDesc
The original harvest description.


scheduleName

java.lang.String scheduleName
The name of the schedule for the original harvest definition.


Package dk.netarkivet.harvester.harvesting.extractor

Class dk.netarkivet.harvester.harvesting.extractor.ExtractorOAI extends org.archive.crawler.extractor.Extractor implements Serializable

Serialized Fields

log

org.apache.commons.logging.Log log
The class logger.


numberOfCURIsHandled

long numberOfCURIsHandled
The number of crawl-uris handled by this extractor.


numberOfLinksExtracted

long numberOfLinksExtracted
The number of links extracted by this extractor.


Package dk.netarkivet.harvester.harvesting.frontier

Class dk.netarkivet.harvester.harvesting.frontier.FrontierReportLine extends java.lang.Object implements Serializable

Serialized Fields

domainName

java.lang.String domainName
The queue name; in our case the domain, since we use per-domain queues.


currentSize

long currentSize
Number of URIs currently in the queue.


totalEnqueues

long totalEnqueues
Count of total times a URI has been enqueued to this queue; a measure of the total number of URI instances ever put on this queue. This can be a larger number than the unique URIs, as some URIs (most notably DNS/robots when refetched, but possibly other things force-requeued under advanced usage) may be enqueued more than once.


sessionBalance

long sessionBalance
When using the 'budget/rotation' functionality (a non-zero URI cost policy), this is the running 'balance' of a queue during its current 'active' session. This balance declines; when it hits zero, another queue (if any are waiting 'inactive') gets a chance to enter active crawling (as fast as politeness allows).


lastCost

double lastCost
The 'cost' of the last URI charged against the queue's budgets. If using a cost policy that makes some URIs more costly than others, this may indicate the queue has reached more-costly URIs. (Such larger-cost URIs will be inserted later in the queue, accelerate the depletion of the session balance, and accelerate progress towards the total queue budget, which could send the queue into 'retirement'. Thus higher-cost URIs mean a queue over time gets less of the crawler's cycles.)


averageCost

double averageCost
Average cost of a processed URI.


lastDequeueTime

java.lang.String lastDequeueTime
Timestamp of when the last URI came off this queue for processing. May give an indication of how long a queue has been empty/inactive.


wakeTime

java.lang.String wakeTime
If the queue is in any sort of politeness- or connect-problem-'snooze' delay, this indicates when it will again be eligible to offer URIs to waiting threads. (When it wakes, it gets in line -- so actual wait before next URI is tried may be longer depending on the balance of threads and other active queues.)


totalSpend

long totalSpend
The total of all URI costs charged against this queue.


totalBudget

long totalBudget
The totalBudget above which the queue will be retired (made permanently inactive unless its totalBudget is raised).


errorCount

long errorCount
The number of URIs from this queue that reached 'finished' status with an error code (non-retryable errors, or exhausted retries, or other errors). When nonzero and rising there may be special problems with the site(s) related to this queue.


lastPeekUri

java.lang.String lastPeekUri
The last URI peeked/dequeued from the head of this queue.


lastQueuedUri

java.lang.String lastQueuedUri
The last URI enqueued to anywhere in this queue.
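The sessionBalance, lastCost, totalSpend and totalBudget fields above describe Heritrix's budget/rotation bookkeeping. The sketch below is a simplified model of that bookkeeping as described in the field comments, not Heritrix's actual implementation. The class QueueBudget, its methods, and the sessionBudget parameter (the balance granted at the start of each active session) are hypothetical. Each URI's cost is charged against both the session balance and the total spend. The session ends when the balance reaches zero, and the queue is retired once the total spend rises above the total budget.

```java
/**
 * Hypothetical model of the budget/rotation bookkeeping described for
 * FrontierReportLine; not Heritrix's actual implementation.
 */
class QueueBudget {
    private final long sessionBudget; // assumed balance granted per active session
    private final long totalBudget;   // queue is retired once totalSpend exceeds this
    private long sessionBalance;
    private long totalSpend;
    private double lastCost;

    QueueBudget(long sessionBudget, long totalBudget) {
        this.sessionBudget = sessionBudget;
        this.totalBudget = totalBudget;
        this.sessionBalance = sessionBudget;
    }

    /** Charges one URI's cost against the session balance and the total spend. */
    void charge(long cost) {
        lastCost = cost;
        sessionBalance -= cost;
        totalSpend += cost;
    }

    /** True when the balance has hit zero and a waiting queue should get a turn. */
    boolean sessionExhausted() { return sessionBalance <= 0; }

    /** Begins a new active session with a fresh balance. */
    void startNewSession() { sessionBalance = sessionBudget; }

    /** True when total spend is above the total budget, i.e. the queue is retired. */
    boolean retired() { return totalSpend > totalBudget; }

    long sessionBalance() { return sessionBalance; }

    long totalSpend() { return totalSpend; }

    double lastCost() { return lastCost; }
}
```

In this model, higher-cost URIs drain sessionBalance faster and push totalSpend toward totalBudget sooner, which matches the field description's point that costlier queues receive fewer of the crawler's cycles over time.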

Class dk.netarkivet.harvester.harvesting.frontier.FullFrontierReport extends AbstractFrontierReport implements Serializable

Serialized Fields

dbEnvironment

com.sleepycat.je.Environment dbEnvironment
The Berkeley DB JE environment.


store

com.sleepycat.persist.EntityStore store
The BDB entity store.


linesIndex

com.sleepycat.persist.PrimaryIndex<PK,E> linesIndex
Primary index.


linesByDomain

com.sleepycat.persist.SecondaryIndex<SK,PK,E> linesByDomain
Secondary index, per domain name.


linesByCurrentSize

com.sleepycat.persist.SecondaryIndex<SK,PK,E> linesByCurrentSize
Secondary index, per current size.


linesBySpentBudget

com.sleepycat.persist.SecondaryIndex<SK,PK,E> linesBySpentBudget
Secondary index, per spent budget.


storageDir

java.io.File storageDir
The directory where the BDB is stored.

Class dk.netarkivet.harvester.harvesting.frontier.InMemoryFrontierReport extends AbstractFrontierReport implements Serializable

Serialized Fields

lines

java.util.TreeSet<E> lines
The lines of the report, sorted by natural order.


linesByDomain

java.util.TreeMap<K,V> linesByDomain
The lines of the report, mapped by domain name.
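InMemoryFrontierReport keeps the same lines in two views: a TreeSet sorted by natural order and a TreeMap keyed by domain name. The sketch below shows one way to keep two such views consistent. The classes ReportLine and SimpleFrontierReport are hypothetical and simplified. The natural order used here (largest currentSize first, ties broken by domain name) is an assumption, since the actual ordering of FrontierReportLine is not stated in this listing.

```java
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for a frontier report line. */
class ReportLine implements Comparable<ReportLine> {
    final String domainName;
    final long currentSize;

    ReportLine(String domainName, long currentSize) {
        this.domainName = domainName;
        this.currentSize = currentSize;
    }

    // Assumed natural order: largest queue first, ties broken by domain name,
    // so two different domains never compare as equal in the TreeSet.
    @Override
    public int compareTo(ReportLine o) {
        int bySize = Long.compare(o.currentSize, currentSize);
        return bySize != 0 ? bySize : domainName.compareTo(o.domainName);
    }
}

/** Hypothetical report holding the same lines in a sorted set and a domain-keyed map. */
class SimpleFrontierReport {
    private final TreeSet<ReportLine> lines = new TreeSet<>();
    private final TreeMap<String, ReportLine> linesByDomain = new TreeMap<>();

    void addLine(ReportLine line) {
        ReportLine previous = linesByDomain.put(line.domainName, line);
        if (previous != null) {
            lines.remove(previous); // drop the stale entry so both views agree
        }
        lines.add(line);
    }

    ReportLine lineForDomain(String domain) { return linesByDomain.get(domain); }

    ReportLine first() { return lines.first(); }

    int size() { return lines.size(); }
}
```

The map gives direct lookup by domain, while the set supports ordered traversal (for example, the largest queues first) without re-sorting.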


Package dk.netarkivet.harvester.harvesting.report

Class dk.netarkivet.harvester.harvesting.report.AbstractHarvestReport extends java.lang.Object implements Serializable

Serialized Fields

domainstats

java.util.Map<K,V> domainstats
Data structure holding the domain information contained in one harvest.


defaultStopReason

StopReason defaultStopReason
The default reason why we stopped harvesting this domain. This value is set by looking for a CRAWL ENDED in the crawl.log.

Class dk.netarkivet.harvester.harvesting.report.BnfHarvestReport extends AbstractHarvestReport implements Serializable

Class dk.netarkivet.harvester.harvesting.report.LegacyHarvestReport extends AbstractHarvestReport implements Serializable


Package dk.netarkivet.monitor.distribute

Class dk.netarkivet.monitor.distribute.MonitorMessage extends NetarkivetMessage implements Serializable


Package dk.netarkivet.monitor.registry.distribute

Class dk.netarkivet.monitor.registry.distribute.RegisterHostMessage extends MonitorMessage implements Serializable

Serialized Fields

hostEntry

HostEntry hostEntry
The HostEntry to register.


Package dk.netarkivet.viewerproxy.reporting

Class dk.netarkivet.viewerproxy.reporting.CrawlLogLinesMatchingRegexp extends ARCBatchJob implements Serializable

Serialized Fields

log

org.apache.commons.logging.Log log
The logger.


regexp

java.lang.String regexp
The regular expression to match in the crawl.log line.
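A minimal sketch of filtering crawl.log lines with a regular expression. The real batch job processes ARC records rather than a list of strings, and this listing does not say whether it requires the expression to match the whole line or only part of it. The sketch assumes a whole-line match via Matcher.matches(); Matcher.find() would give substring matching instead. The class CrawlLogFilter is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of NetarchiveSuite. */
final class CrawlLogFilter {
    private CrawlLogFilter() {}

    /** Returns the lines that the regular expression matches in full. */
    static List<String> matchingLines(List<String> crawlLogLines, String regexp) {
        Pattern pattern = Pattern.compile(regexp); // compile once, reuse for every line
        List<String> result = new ArrayList<>();
        for (String line : crawlLogLines) {
            if (pattern.matcher(line).matches()) {
                result.add(line);
            }
        }
        return result;
    }
}
```

For example, the expression ".* 404 .*" would select lines whose space-separated fields include the status code 404.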

Class dk.netarkivet.viewerproxy.reporting.HarvestedUrlsForDomainBatchJob extends ARCBatchJob implements Serializable

Serialized Fields

log

org.apache.commons.logging.Log log

domain

java.lang.String domain
The domain to extract crawl.log lines for.
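A minimal sketch of deciding whether a crawl.log line belongs to a given domain. It relies on the Heritrix crawl.log layout, in which the fetched URI is the fourth whitespace-separated field. The matching rule here (host equal to the domain or a subdomain of it) is an assumption, and the actual batch job may use a different rule. The class DomainMatcher is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/** Hypothetical helper; the batch job's actual matching rule may differ. */
final class DomainMatcher {
    private DomainMatcher() {}

    /**
     * Returns true if the URI in the 4th whitespace-separated field of a
     * crawl.log line has a host equal to the domain or a subdomain of it.
     */
    static boolean lineIsForDomain(String crawlLogLine, String domain) {
        String[] fields = crawlLogLine.trim().split("\\s+");
        if (fields.length < 4) {
            return false; // not a well-formed crawl.log line
        }
        String host;
        try {
            host = new URI(fields[3]).getHost();
        } catch (URISyntaxException e) {
            return false; // unparseable URI
        }
        if (host == null) {
            return false; // e.g. opaque "dns:" URIs have no host component
        }
        String h = host.toLowerCase(Locale.ROOT);
        String d = domain.toLowerCase(Locale.ROOT);
        // Requiring a "." before the domain avoids matching e.g. "notexample.dk".
        return h.equals(d) || h.endsWith("." + d);
    }
}
```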


Package dk.netarkivet.wayback.batch

Class dk.netarkivet.wayback.batch.ExtractDeduplicateCDXBatchJob extends ARCBatchJob implements Serializable

Serialized Fields

adapter

DeduplicateToCDXAdapter adapter
A utility which has methods for converting a deduplicate crawl-log entry to a CDX entry.


crawl_log_url_pattern

java.util.regex.Pattern crawl_log_url_pattern
A compiled Pattern matching the URL of a crawl-log entry in a metadata ARC file.

Class dk.netarkivet.wayback.batch.ExtractWaybackCDXBatchJob extends ARCBatchJob implements Serializable

Serialized Fields

log

org.apache.commons.logging.Log log
Logger for this class.


aToSAdapter

NetarchiveSuiteARCRecordToSearchResultAdapter aToSAdapter
Utility for converting an ArcRecord to a CaptureSearchResult (wayback's representation of a CDX record).


srToCDXAdapter

org.archive.wayback.resourceindex.cdx.SearchResultToCDXLineAdapter srToCDXAdapter
Utility for converting a wayback CaptureSearchResult to a String representing a line in a CDX file.


Package dk.netarkivet.wayback.batch.copycode

Class dk.netarkivet.wayback.batch.copycode.NetarchiveSuiteUURIFactory extends org.archive.net.UURI implements Serializable

serialVersionUID: -6146295130382209042L

Serialized Fields

schemes

java.lang.String[] schemes

ignoredSchemes

java.lang.String[] ignoredSchemes