dk.netarkivet.harvester.harvesting
Class MetadataFile

java.lang.Object
  extended by dk.netarkivet.harvester.harvesting.MetadataFile
All Implemented Interfaces:
java.lang.Comparable<MetadataFile>

public class MetadataFile
extends java.lang.Object
implements java.lang.Comparable<MetadataFile>

Wraps information about a Heritrix file that should be stored in the metadata ARC. Defines a natural ordering for sorting such files.


Field Summary
static java.lang.String CDX_PATTERN
          A pattern identifying a CDX metadata entry.
static java.lang.String CRAWL_LOG_PATTERN
          A pattern identifying the crawl log metadata entry.
static java.lang.String DOMAIN_SETTINGS_FILE
          The name of a domain-specific Heritrix settings file (a.k.a. override).
static java.lang.String HERITRIX_FILE_PATTERN
          The pattern controlling which files in the crawl directory root should be stored in the metadata ARC.
static java.lang.String LOG_FILE_PATTERN
          The pattern controlling which files in the logs subdirectory of the crawl directory root should be stored in the metadata ARC as log files.
static java.lang.String REPORT_FILE_PATTERN
          The pattern controlling which files in the crawl directory root should be stored in the metadata ARC as reports.
 
Constructor Summary
MetadataFile(java.io.File heritrixFile, java.lang.Long harvestId, java.lang.Long jobId, java.lang.String heritrixVersion)
          Creates a metadata file and finds which metadata type it belongs to.
MetadataFile(java.io.File heritrixFile, java.lang.Long harvestId, java.lang.Long jobId, java.lang.String heritrixVersion, java.lang.String domain)
          Creates a metadata file for a domain-specific override file.
 
Method Summary
 int compareTo(MetadataFile other)
          Compares first by type ordinal, then by URL.
 java.io.File getHeritrixFile()
          Returns the actual file.
 java.lang.String getUrl()
          Returns the metadata URL associated with this file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CDX_PATTERN

public static final java.lang.String CDX_PATTERN
A pattern identifying a CDX metadata entry.

See Also:
CDXDataCache.CDXDataCache(), Constant Field Values

CRAWL_LOG_PATTERN

public static final java.lang.String CRAWL_LOG_PATTERN
A pattern identifying the crawl log metadata entry.

See Also:
CrawlLogDataCache.CrawlLogDataCache(), Constant Field Values

HERITRIX_FILE_PATTERN

public static final java.lang.String HERITRIX_FILE_PATTERN
The pattern controlling which files in the crawl directory root should be stored in the metadata ARC.


REPORT_FILE_PATTERN

public static final java.lang.String REPORT_FILE_PATTERN
The pattern controlling which files in the crawl directory root should be stored in the metadata ARC as reports.


LOG_FILE_PATTERN

public static final java.lang.String LOG_FILE_PATTERN
The pattern controlling which files in the logs subdirectory of the crawl directory root should be stored in the metadata ARC as log files.


DOMAIN_SETTINGS_FILE

public static final java.lang.String DOMAIN_SETTINGS_FILE
The name of a domain-specific Heritrix settings file (a.k.a. override).

See Also:
Constant Field Values

Constructor Detail

MetadataFile

MetadataFile(java.io.File heritrixFile,
             java.lang.Long harvestId,
             java.lang.Long jobId,
             java.lang.String heritrixVersion)
Creates a metadata file and determines which metadata type it belongs to. The name of the Heritrix file is tested first against the report file pattern, then against the log file pattern. If the name matches neither, the file is considered a setup file.
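
The classification order described for this constructor can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class name MetadataTypeSketch, the Type enum and its constant names, and both regular expressions are hypothetical placeholders and are not the actual values of REPORT_FILE_PATTERN or LOG_FILE_PATTERN.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the report, then log, then setup test order. */
class MetadataTypeSketch {

    /** Hypothetical type names; the real class may use different ones. */
    enum Type { SETUP, REPORTS, LOGS }

    // Placeholder regexes, NOT the real REPORT_FILE_PATTERN / LOG_FILE_PATTERN values.
    static final String REPORT_PATTERN = ".*-report\\.txt";
    static final String LOG_PATTERN = ".*\\.log";

    /** Classifies a file name: report pattern first, log pattern second, setup otherwise. */
    static Type classify(String fileName) {
        if (Pattern.matches(REPORT_PATTERN, fileName)) {
            return Type.REPORTS;
        }
        if (Pattern.matches(LOG_PATTERN, fileName)) {
            return Type.LOGS;
        }
        // Neither pattern matched, so the file is treated as a setup file.
        return Type.SETUP;
    }
}
```

Because the report test runs first, a name that happened to match both patterns would be classified as a report in this sketch.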


MetadataFile

MetadataFile(java.io.File heritrixFile,
             java.lang.Long harvestId,
             java.lang.Long jobId,
             java.lang.String heritrixVersion,
             java.lang.String domain)
Creates a metadata file for a domain-specific override file.

Method Detail

getUrl

public java.lang.String getUrl()
Returns the metadata URL associated with this file.

Returns:
the metadata URL associated with this file.

getHeritrixFile

public java.io.File getHeritrixFile()
Returns the actual file.

Returns:
the actual file.

compareTo

public int compareTo(MetadataFile other)
Compares first by type ordinal, then by URL.

Specified by:
compareTo in interface java.lang.Comparable<MetadataFile>
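
A two-level ordering of this kind (type ordinal first, URL as tie-breaker) might look like the sketch below. It is not the actual MetadataFile source: the class OrderedEntrySketch, its Type enum, its fields, and the sortedUrls helper are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a class ordered by type ordinal, then URL. */
class OrderedEntrySketch implements Comparable<OrderedEntrySketch> {

    /** Hypothetical types; declaration order determines the ordinal. */
    enum Type { SETUP, REPORTS, LOGS }

    final Type type;
    final String url;

    OrderedEntrySketch(Type type, String url) {
        this.type = type;
        this.url = url;
    }

    @Override
    public int compareTo(OrderedEntrySketch other) {
        // Primary key: the ordinal of the type.
        int byType = Integer.compare(type.ordinal(), other.type.ordinal());
        // Secondary key: lexicographic comparison of the URLs.
        return byType != 0 ? byType : url.compareTo(other.url);
    }

    /** Returns the URLs of the given entries in their natural order. */
    static List<String> sortedUrls(List<OrderedEntrySketch> entries) {
        List<OrderedEntrySketch> copy = new ArrayList<>(entries);
        Collections.sort(copy); // uses compareTo
        List<String> urls = new ArrayList<>();
        for (OrderedEntrySketch entry : copy) {
            urls.add(entry.url);
        }
        return urls;
    }
}
```

With this ordering, all entries of one type are grouped together, and within a group they appear in URL order.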