dk.netarkivet.archive.indexserver
Class RawMetadataCache

java.lang.Object
  extended by dk.netarkivet.archive.indexserver.FileBasedCache<java.lang.Long>
      extended by dk.netarkivet.archive.indexserver.RawMetadataCache
All Implemented Interfaces:
RawDataCache
Direct Known Subclasses:
CDXDataCache, CrawlLogDataCache

public class RawMetadataCache
extends FileBasedCache<java.lang.Long>
implements RawDataCache

This is an implementation of RawDataCache specialized for data extracted from metadata files. It uses regular expressions to match the URL and mime-type of ARC entries, selecting the kind of metadata wanted.


Field Summary
 
Fields inherited from class dk.netarkivet.archive.indexserver.FileBasedCache
cacheDir
 
Constructor Summary
RawMetadataCache(java.lang.String prefix, java.util.regex.Pattern urlMatcher, java.util.regex.Pattern mimeMatcher)
          Create a new RawMetadataCache.
 
Method Summary
protected  java.lang.Long cacheData(java.lang.Long id)
          Actually cache data for the given ID.
 java.io.File getCacheFile(java.lang.Long id)
          Get the file potentially containing (cached) data for a single job.
 
Methods inherited from class dk.netarkivet.archive.indexserver.FileBasedCache
cache, get, getCacheDir, getIndex
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface dk.netarkivet.archive.indexserver.RawDataCache
cache, get
 

Constructor Detail

RawMetadataCache

public RawMetadataCache(java.lang.String prefix,
                        java.util.regex.Pattern urlMatcher,
                        java.util.regex.Pattern mimeMatcher)
Create a new RawMetadataCache. For a given job ID, this will fetch and cache selected content from metadata files (<ID>-metadata-[0-9]+.arc). Any entry in a metadata file that matches both patterns will be returned. The returned data does not directly indicate which file it came from, though parts intrinsic to the particular format might.

Parameters:
prefix - A prefix that will be used to distinguish this cache's files from other caches'. It will be used for creating a directory, so it must contain only characters that are legal in directory names.
urlMatcher - A pattern for matching URLs of the desired entries. If null, a .* pattern will be used.
mimeMatcher - A pattern for matching mime-types of the desired entries. If null, a .* pattern will be used.
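
The selection rule described above (an entry is returned only if it matches both patterns, and a null pattern behaves like .*) can be sketched with plain java.util.regex. This is a minimal illustration, not the NetarchiveSuite implementation: the class MetadataEntryFilterSketch, its methods, and the sample URL and mime-type values are hypothetical, and it assumes whole-string matching via Matcher.matches(), which the documentation above does not confirm. The file-name helper also escapes the dot before "arc", which the pattern quoted above leaves unescaped.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the two-pattern selection rule; not part of NetarchiveSuite. */
public class MetadataEntryFilterSketch {
    /** Fallback used when a null pattern is passed in. */
    private static final Pattern MATCH_ALL = Pattern.compile(".*");

    private final Pattern urlMatcher;
    private final Pattern mimeMatcher;

    public MetadataEntryFilterSketch(Pattern urlMatcher, Pattern mimeMatcher) {
        // A null pattern falls back to ".*", as the parameter descriptions state.
        this.urlMatcher = (urlMatcher != null) ? urlMatcher : MATCH_ALL;
        this.mimeMatcher = (mimeMatcher != null) ? mimeMatcher : MATCH_ALL;
    }

    /** An entry is selected only if both its URL and its mime-type match. */
    public boolean accepts(String url, String mimeType) {
        // Assumption: whole-string matching; the real class may match differently.
        return urlMatcher.matcher(url).matches()
                && mimeMatcher.matcher(mimeType).matches();
    }

    /** Builds a file-name pattern of the form <ID>-metadata-[0-9]+.arc for one job. */
    public static Pattern metadataFilePattern(long jobId) {
        return Pattern.compile(jobId + "-metadata-[0-9]+\\.arc");
    }

    public static void main(String[] args) {
        // Null URL pattern: any URL is accepted, only the mime-type is restricted.
        MetadataEntryFilterSketch filter = new MetadataEntryFilterSketch(
                null, Pattern.compile("application/x-cdx"));

        System.out.println(filter.accepts(
                "metadata://example.org/crawl/index/cdx", "application/x-cdx")); // true
        System.out.println(filter.accepts(
                "metadata://example.org/crawl/index/cdx", "text/plain"));        // false

        Pattern files = metadataFilePattern(42L);
        System.out.println(files.matcher("42-metadata-1.arc").matches());  // true
        System.out.println(files.matcher("142-metadata-1.arc").matches()); // false
    }
}
```

Passing null for both patterns in this sketch accepts every entry, which mirrors the documented .* default.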
Method Detail

getCacheFile

public java.io.File getCacheFile(java.lang.Long id)
Get the file potentially containing (cached) data for a single job.

Specified by:
getCacheFile in class FileBasedCache<java.lang.Long>
Parameters:
id - The job to find data for.
Returns:
The file where cache data for the job can be stored.
See Also:
FileBasedCache.getCacheFile(Object)

cacheData

protected java.lang.Long cacheData(java.lang.Long id)
Actually cache data for the given ID.

Specified by:
cacheData in class FileBasedCache<java.lang.Long>
Parameters:
id - A job ID to cache data for.
Returns:
A job ID of type java.lang.Long, as declared in the method signature (not a File). The cached data itself is stored in the file given by getCacheFile(id).
See Also:
FileBasedCache.cacheData(Object)