dk.netarkivet.archive.indexserver
Class RawMetadataCache
java.lang.Object
dk.netarkivet.archive.indexserver.FileBasedCache<java.lang.Long>
dk.netarkivet.archive.indexserver.RawMetadataCache
- All Implemented Interfaces:
- RawDataCache
- Direct Known Subclasses:
- CDXDataCache, CrawlLogDataCache
public class RawMetadataCache
- extends FileBasedCache<java.lang.Long>
- implements RawDataCache
This is an implementation of RawDataCache specialized for data extracted
from metadata files. It uses regular expressions to match the URL and
mime-type of ARC entries, selecting the kind of metadata we want.
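The selection rule described above can be sketched as a filter that holds two patterns. This is a minimal illustration, not the class's actual code: the class name EntryFilterSketch, the method accepts, and the example URLs are hypothetical, and the use of a full match (matches()) rather than a partial one (find()) is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of two-pattern entry selection; not part of the library. */
final class EntryFilterSketch {
    private final Pattern urlMatcher;
    private final Pattern mimeMatcher;

    EntryFilterSketch(Pattern urlMatcher, Pattern mimeMatcher) {
        this.urlMatcher = urlMatcher;
        this.mimeMatcher = mimeMatcher;
    }

    /** An entry is selected only when BOTH its URL and its mime-type match. */
    boolean accepts(String url, String mimeType) {
        // Assumption: the whole string must match, hence matches() and not find().
        return urlMatcher.matcher(url).matches()
                && mimeMatcher.matcher(mimeType).matches();
    }

    public static void main(String[] args) {
        // Made-up URL and mime-type patterns, for illustration only.
        EntryFilterSketch filter = new EntryFilterSketch(
                Pattern.compile("metadata://example\\.org/crawl/logs/crawl\\.log.*"),
                Pattern.compile("text/plain"));

        String url = "metadata://example.org/crawl/logs/crawl.log?job=42";
        System.out.println(filter.accepts(url, "text/plain"));        // true
        System.out.println(filter.accepts(url, "application/x-cdx")); // false
    }
}
```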
Constructor Summary
RawMetadataCache(java.lang.String prefix,
java.util.regex.Pattern urlMatcher,
java.util.regex.Pattern mimeMatcher)
Create a new RawMetadataCache.
Method Summary
protected java.lang.Long cacheData(java.lang.Long id)
Actually cache data for the given ID.
java.io.File getCacheFile(java.lang.Long id)
Get the file potentially containing (cached) data for a single job.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
RawMetadataCache
public RawMetadataCache(java.lang.String prefix,
java.util.regex.Pattern urlMatcher,
java.util.regex.Pattern mimeMatcher)
- Create a new RawMetadataCache. For a given job ID, this will fetch
and cache selected content from that job's metadata files
(<ID>-metadata-[0-9]+.arc). Every entry in a metadata file that
matches both patterns will be returned. The returned data does not
directly indicate which file it came from, though parts intrinsic to
the particular format might.
- Parameters:
prefix
- A prefix that will be used to distinguish this cache's
files from other caches'. It will be used for creating a directory,
so it must not contain characters not legal in directory names.
urlMatcher
- A pattern for matching URLs of the desired entries.
If null, a .* pattern will be used.
mimeMatcher
- A pattern for matching mime-types of the desired
entries. If null, a .* pattern will be used.
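Two details of the constructor contract lend themselves to a short sketch: the fallback to a .* pattern when a matcher is null, and the metadata file name form <ID>-metadata-[0-9]+.arc. The class and method names below (MetadataPatternSketch, orMatchAll, metadataFilePattern) are hypothetical, and escaping the dot before "arc" is a choice made here, not something the documentation states.

```java
import java.util.regex.Pattern;

/** Hypothetical helpers illustrating the constructor contract; not the library's code. */
final class MetadataPatternSketch {
    /** The catch-all pattern the documentation says replaces a null matcher. */
    static final Pattern MATCH_ALL = Pattern.compile(".*");

    /** Returns the given pattern, or the .* pattern when it is null. */
    static Pattern orMatchAll(Pattern pattern) {
        return pattern == null ? MATCH_ALL : pattern;
    }

    /**
     * Builds a pattern for the metadata file names of one job,
     * following the documented form <ID>-metadata-[0-9]+.arc.
     * Assumption: the dot is meant literally, so it is escaped here.
     */
    static Pattern metadataFilePattern(long jobId) {
        return Pattern.compile(jobId + "-metadata-[0-9]+\\.arc");
    }

    public static void main(String[] args) {
        Pattern job42 = metadataFilePattern(42L);
        System.out.println(job42.matcher("42-metadata-1.arc").matches());  // true
        System.out.println(job42.matcher("142-metadata-1.arc").matches()); // false
        System.out.println(orMatchAll(null).matcher("text/plain").matches()); // true
    }
}
```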
getCacheFile
public java.io.File getCacheFile(java.lang.Long id)
- Get the file potentially containing (cached) data for a single job.
- Specified by:
getCacheFile in class FileBasedCache<java.lang.Long>
- Parameters:
id
- The job to find data for.
- Returns:
- The file where cache data for the job can be stored.
- See Also:
FileBasedCache.getCacheFile(Object)
cacheData
protected java.lang.Long cacheData(java.lang.Long id)
- Actually cache data for the given ID.
- Specified by:
cacheData in class FileBasedCache<java.lang.Long>
- Parameters:
id
- A job ID to cache data for.
- Returns:
- The cached data is stored in a File, the same one that
getCacheFile(id) returns. (The declared return type is java.lang.Long.)
- See Also:
FileBasedCache.cacheData(Object)
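The relationship between cacheData and getCacheFile can be illustrated with a toy cache. Everything here is a sketch under stated assumptions rather than the real implementation: ToyJobCache is a hypothetical class, the file naming scheme (prefix, job ID, "-cache") is invented for the example, the data is passed in as a string instead of being fetched from metadata files, and returning the job ID on success is a guess based only on the Long return type.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Toy stand-in for a file-based, per-job cache; not the library's implementation. */
final class ToyJobCache {
    private final File cacheDir;
    private final String prefix;

    ToyJobCache(File cacheDir, String prefix) {
        this.cacheDir = cacheDir;
        this.prefix = prefix;
    }

    /** The file that holds (or will hold) cached data for one job. */
    File getCacheFile(Long id) {
        // Hypothetical naming scheme, chosen only for this example.
        return new File(cacheDir, prefix + "-" + id + "-cache");
    }

    /**
     * Writes the data for a job into the file named by getCacheFile(id).
     * Assumption: the job ID is returned to signal success.
     */
    Long cacheData(Long id, String data) throws IOException {
        Files.write(getCacheFile(id).toPath(), data.getBytes(StandardCharsets.UTF_8));
        return id;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("toy-cache").toFile();
        ToyJobCache cache = new ToyJobCache(dir, "crawllog");

        Long cached = cache.cacheData(42L, "example crawl log line\n");
        // The data ends up in exactly the file that getCacheFile reports.
        System.out.println(cached + " -> " + cache.getCacheFile(cached).getName());
    }
}
```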