Class MetadataFile
java.lang.Object
    dk.netarkivet.harvester.harvesting.metadata.MetadataFile
All Implemented Interfaces:
Comparable<MetadataFile>
public class MetadataFile extends Object implements Comparable<MetadataFile>
Wraps information for a Heritrix file that should be stored in the metadata ARC. Defines a natural order for sorting such files.
-
-
Field Summary
static String CDX_PATTERN
A pattern identifying a CDX metadata entry.
static String CRAWL_LOG_PATTERN
A pattern identifying the crawl log metadata entry.
static String DOMAIN_SETTINGS_FILE
The name of a domain-specific Heritrix settings file (a.k.a. override).
static String HERITRIX_FILE_PATTERN
The pattern controlling which files in the crawl directory root should be stored in the metadata ARC.
static String LOG_FILE_PATTERN
The pattern controlling which files in the logs subdirectory of the crawl directory root should be stored in the metadata ARC as log files.
static String RECOVER_LOG_PATTERN
A pattern identifying the recover.gz log metadata entry.
static String REPORT_FILE_PATTERN
The pattern controlling which files in the crawl directory root should be stored in the metadata ARC as reports.
-
Constructor Summary
MetadataFile(File heritrixFile, Long harvestId, Long jobId, String heritrixVersion)
Creates a metadata file and finds which metadata type it belongs to.
MetadataFile(File heritrixFile, Long harvestId, Long jobId, String heritrixVersion, String domain)
Creates a metadata file for a domain-specific override file.
-
Method Summary
int compareTo(MetadataFile other)
Compares the type ordinals first, then the URLs.
File getHeritrixFile()
Returns the actual file.
String getUrl()
Returns the metadata URL associated with this file.
-
-
-
Field Detail
-
CDX_PATTERN
public static final String CDX_PATTERN
A pattern identifying a CDX metadata entry.
See Also:
CDXDataCache(), Constant Field Values
-
CRAWL_LOG_PATTERN
public static final String CRAWL_LOG_PATTERN
A pattern identifying the crawl log metadata entry.
See Also:
CrawlLogDataCache(), Constant Field Values
-
RECOVER_LOG_PATTERN
public static final String RECOVER_LOG_PATTERN
A pattern identifying the recover.gz log metadata entry.
See Also:
Constant Field Values
-
HERITRIX_FILE_PATTERN
public static final String HERITRIX_FILE_PATTERN
The pattern controlling which files in the crawl directory root should be stored in the metadata ARC.
-
REPORT_FILE_PATTERN
public static final String REPORT_FILE_PATTERN
The pattern controlling which files in the crawl directory root should be stored in the metadata ARC as reports.
-
LOG_FILE_PATTERN
public static final String LOG_FILE_PATTERN
The pattern controlling which files in the logs subdirectory of the crawl directory root should be stored in the metadata ARC as log files.
-
DOMAIN_SETTINGS_FILE
public static final String DOMAIN_SETTINGS_FILE
The name of a domain-specific Heritrix settings file (a.k.a. override).
See Also:
Constant Field Values
-
-
Constructor Detail
-
MetadataFile
public MetadataFile(File heritrixFile, Long harvestId, Long jobId, String heritrixVersion)
Creates a metadata file and determines which metadata type it belongs to. The name of the Heritrix file is first tested against the report file pattern, then against the log file pattern. If the name matches neither of these, the file is considered a setup file.
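The classification order described above can be illustrated with a minimal sketch. This is not the NetarchiveSuite implementation: the class name `MetadataTypeSketch`, the `Type` enum, and both regular expressions are hypothetical stand-ins, since this page does not show the values of REPORT_FILE_PATTERN or LOG_FILE_PATTERN.

```java
public class MetadataTypeSketch {

    /** Hypothetical metadata types; the real type names are not shown on this page. */
    public enum Type { SETUP, REPORT, LOG }

    // Hypothetical patterns, chosen only for illustration.
    // The actual values of REPORT_FILE_PATTERN and LOG_FILE_PATTERN may differ.
    static final String REPORT_PATTERN = ".*-report\\.txt";
    static final String LOG_PATTERN = ".*\\.log";

    /**
     * Classifies a file name in the order the constructor documentation gives:
     * report pattern first, then log pattern, otherwise a setup file.
     */
    public static Type classify(String fileName) {
        if (fileName.matches(REPORT_PATTERN)) {
            return Type.REPORT;
        }
        if (fileName.matches(LOG_PATTERN)) {
            return Type.LOG;
        }
        return Type.SETUP;
    }
}
```

Under these assumed patterns, `MetadataTypeSketch.classify("order.xml")` falls through both tests and is treated as a setup file.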
-
MetadataFile
public MetadataFile(File heritrixFile, Long harvestId, Long jobId, String heritrixVersion, String domain)
Creates a metadata file for a domain-specific override file.
Parameters:
heritrixFile - a given Heritrix metadata file.
harvestId - the ID of the harvest that the job generating this file is part of.
jobId - the ID of the job generating this file.
heritrixVersion - the version of Heritrix generating the file.
domain - the name of the domain this metadata belongs to.
-
-
Method Detail
-
getUrl
public String getUrl()
Returns:
the metadata URL associated with this file.
-
getHeritrixFile
public File getHeritrixFile()
Returns the actual file.
Returns:
the actual file.
-
compareTo
public int compareTo(MetadataFile other)
Compares the type ordinals first, then the URLs.
Specified by:
compareTo in interface Comparable<MetadataFile>
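A two-level ordering of this kind can be sketched as follows. The `Entry` class is a hypothetical stand-in for MetadataFile that holds only what the comparison needs. The `Type` enum, its constant order, and the example URLs are assumptions, because the real type ordinals and URL format are not shown on this page.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MetadataOrderSketch {

    /** Hypothetical metadata types; the declaration order here is an assumption. */
    public enum Type { SETUP, REPORT, LOG }

    /** Hypothetical stand-in for MetadataFile: a type plus a metadata URL. */
    public static final class Entry implements Comparable<Entry> {
        final Type type;
        final String url;

        public Entry(Type type, String url) {
            this.type = type;
            this.url = url;
        }

        @Override
        public int compareTo(Entry other) {
            // Compare the type ordinals first.
            int byType = Integer.compare(type.ordinal(), other.type.ordinal());
            // Fall back to the URLs only when the types are equal.
            return byType != 0 ? byType : url.compareTo(other.url);
        }
    }

    /** Returns the URLs of the given entries in their natural order. */
    public static List<String> sortedUrls(List<Entry> entries) {
        List<Entry> copy = new ArrayList<>(entries);
        Collections.sort(copy);
        List<String> urls = new ArrayList<>();
        for (Entry e : copy) {
            urls.add(e.url);
        }
        return urls;
    }
}
```

With this ordering, all entries of one type are grouped together, and entries within a group are sorted lexicographically by URL.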
-
-