Class RawMetadataCache
- java.lang.Object
  - dk.netarkivet.harvester.indexserver.FileBasedCache<Long>
    - dk.netarkivet.harvester.indexserver.RawMetadataCache
- All Implemented Interfaces: RawDataCache
- Direct Known Subclasses: CDXDataCache, CrawlLogDataCache
public class RawMetadataCache extends FileBasedCache<Long> implements RawDataCache
This is an implementation of the RawDataCache specialized for data extracted from metadata files. It uses regular expressions to match the URL and mime-type of ARC entries, selecting the kind of metadata we want.
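The two-pattern selection described above can be sketched as follows. This is only an illustration: the class EntryFilterSketch and its accepts method are hypothetical names that do not exist in NetarchiveSuite, and whole-string matching via Matcher.matches() is an assumption about how the patterns are applied.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of NetarchiveSuite) that selects entries by URL and mime-type. */
final class EntryFilterSketch {
    private final Pattern urlMatcher;
    private final Pattern mimeMatcher;

    EntryFilterSketch(Pattern urlMatcher, Pattern mimeMatcher) {
        this.urlMatcher = urlMatcher;
        this.mimeMatcher = mimeMatcher;
    }

    /**
     * Returns true only if BOTH patterns match.
     * Assumption: each pattern must match the whole string (Matcher.matches()).
     */
    boolean accepts(String url, String mimeType) {
        return urlMatcher.matcher(url).matches()
                && mimeMatcher.matcher(mimeType).matches();
    }
}
```

An entry whose URL matches but whose mime-type does not (or the reverse) is rejected, which mirrors the "matches both patterns" wording in the constructor documentation.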
Field Summary
- static Pattern MATCH_ALL_PATTERN: A regular expression object that matches everything.
- Fields inherited from class dk.netarkivet.harvester.indexserver.FileBasedCache: cacheDir
Constructor Summary
- RawMetadataCache(String prefix, Pattern urlMatcher, Pattern mimeMatcher): Create a new RawMetadataCache.
Method Summary
- protected Long cacheData(Long id): Actually cache data for the given ID.
- File getCacheFile(Long id): Get the file potentially containing (cached) data for a single job.
- Methods inherited from class dk.netarkivet.harvester.indexserver.FileBasedCache: cache, get, getCacheDir, getIndex
- Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
- Methods inherited from interface dk.netarkivet.harvester.indexserver.RawDataCache: cache, get
Field Detail
MATCH_ALL_PATTERN
public static final Pattern MATCH_ALL_PATTERN
A regular expression object that matches everything.
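As a rough illustration of what a match-all pattern looks like in java.util.regex, consider the sketch below. It assumes the constant is equivalent to a plain ".*" pattern, as the constructor documentation suggests; the flags used by the real constant are not stated here, and the class MatchAllSketch is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration; the real MATCH_ALL_PATTERN may be compiled differently. */
final class MatchAllSketch {
    // Assumption: equivalent to the ".*" pattern mentioned for null matchers.
    static final Pattern MATCH_ALL = Pattern.compile(".*");

    // Variant in which '.' also matches line terminators.
    static final Pattern MATCH_ALL_DOTALL = Pattern.compile(".*", Pattern.DOTALL);

    /** Whole-string match of s against p. */
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}
```

A plain ".*" matches any single-line string, including the empty string, which covers URLs and mime-types. It does not match a whole string containing a line terminator unless Pattern.DOTALL is set.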
Constructor Detail
RawMetadataCache
public RawMetadataCache(String prefix, Pattern urlMatcher, Pattern mimeMatcher)
Create a new RawMetadataCache. For a given job ID, this will fetch and cache selected content from metadata files (<ID>-metadata-[0-9]+.arc). Any entry in a metadata file that matches both patterns will be returned. The returned data does not directly indicate which file it came from, though parts intrinsic to the particular format might.
- Parameters:
prefix - A prefix that distinguishes this cache's files from those of other caches. It is used to create a directory, so it must contain only characters that are legal in directory names.
urlMatcher - A pattern for matching URLs of the desired entries. If null, a .* pattern will be used.
mimeMatcher - A pattern for matching mime-types of the desired entries. If null, a .* pattern will be used.
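The null handling and the metadata file naming described above can be sketched as follows. This is a hedged illustration, not the class's actual code: ConstructorArgsSketch, orMatchAll and metadataFilePattern are hypothetical names, and escaping the dot before "arc" is an assumption about how the file name template is meant to be read.

```java
import java.util.regex.Pattern;

/** Hypothetical helpers illustrating the documented constructor behaviour. */
final class ConstructorArgsSketch {
    static final Pattern MATCH_ALL = Pattern.compile(".*");

    /** A null matcher falls back to a ".*" pattern, as the parameter docs state. */
    static Pattern orMatchAll(Pattern p) {
        return p != null ? p : MATCH_ALL;
    }

    /**
     * Builds a pattern for the metadata file names of one job,
     * following the template <ID>-metadata-[0-9]+.arc.
     * Assumption: the dot is literal, so it is escaped here.
     */
    static Pattern metadataFilePattern(long jobId) {
        return Pattern.compile(jobId + "-metadata-[0-9]+\\.arc");
    }
}
```

With the documented signature, a call such as new RawMetadataCache("crawllog", urlPattern, null) would then select entries by URL only; the prefix "crawllog" and the variable urlPattern are placeholders chosen for this example.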
Method Detail
getCacheFile
public File getCacheFile(Long id)
Get the file potentially containing (cached) data for a single job.
- Specified by: getCacheFile in class FileBasedCache<Long>
- Parameters:
id - The job to find data for.
- Returns: The file where cache data for the job can be stored.
- See Also: FileBasedCache.getCacheFile(Object)
cacheData
protected Long cacheData(Long id)
Actually cache data for the given ID.
- Specified by: cacheData in class FileBasedCache<Long>
- Parameters:
id - A job ID to cache data for.
- Returns: A File containing the data. This file will be the same as getCacheFile(id).
- See Also: FileBasedCache.cacheData(Object)