dk.netarkivet.archive.indexserver
Class CombiningMultiFileBasedCache<T extends java.lang.Comparable<T>>

java.lang.Object
  extended by dk.netarkivet.archive.indexserver.FileBasedCache<java.util.Set<T>>
      extended by dk.netarkivet.archive.indexserver.MultiFileBasedCache<T>
          extended by dk.netarkivet.archive.indexserver.CombiningMultiFileBasedCache<T>
Type Parameters:
T - The type of the IDs. Must implement the java.lang.Comparable interface.
Direct Known Subclasses:
CDXIndexCache, CrawlLogIndexCache

public abstract class CombiningMultiFileBasedCache<T extends java.lang.Comparable<T>>
extends MultiFileBasedCache<T>

This class provides the framework for classes that cache the result of combining multiple files into one. For instance, creating a Lucene index out of crawl.log files takes O(n log n) time, where n is the total number of lines in the combined files. It is based on an underlying cache of single files. If some of the files in the underlying cache are unavailable, it reports which files are available rather than returning an incomplete file.
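A minimal sketch of the pattern described above, under a simplified model: the underlying cache is a plain Map from ID to file contents rather than a FileBasedCache, and CombiningCacheSketch and ConcatCache are hypothetical names, not the NetarchiveSuite source. cacheData gathers what is available through prepareCombine, calls combine only when every requested ID was found, and returns the IDs that were available.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model of the combining-cache pattern. The raw cache
// is a Map from ID to file contents instead of a FileBasedCache.
abstract class CombiningCacheSketch<T extends Comparable<T>> {
    // Stand-in for the underlying cache of single files (ID -> contents).
    protected final Map<T, String> rawcache;

    protected CombiningCacheSketch(Map<T, String> rawcache) {
        this.rawcache = rawcache;
    }

    // Find what is available, combine only if nothing is missing,
    // and report which IDs were available.
    protected Set<T> cacheData(Set<T> ids) {
        Map<T, String> found = prepareCombine(ids);
        if (found.keySet().equals(ids)) {
            combine(found);
        }
        return found.keySet();
    }

    // Keep only the IDs whose raw data exists; never store null values.
    protected Map<T, String> prepareCombine(Set<T> ids) {
        Map<T, String> found = new TreeMap<>();
        for (T id : ids) {
            String data = rawcache.get(id);
            if (data != null) {
                found.put(id, data);
            }
        }
        return found;
    }

    protected abstract void combine(Map<T, String> filesFound);
}

// Toy subclass: "combines" by concatenating the contents in ID order.
class ConcatCache extends CombiningCacheSketch<Long> {
    String combined; // stays null until a complete combine has happened

    ConcatCache(Map<Long, String> raw) {
        super(raw);
    }

    @Override
    protected void combine(Map<Long, String> filesFound) {
        StringBuilder sb = new StringBuilder();
        for (String contents : filesFound.values()) {
            sb.append(contents);
        }
        combined = sb.toString();
    }
}
```

With a raw cache holding IDs 1 and 2, asking for both returns both IDs and fills combined; asking for an ID that is not in the raw cache returns only the available subset and leaves combined null.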


Field Summary
protected  FileBasedCache<T> rawcache
          The raw data cache that this cache gets data from.
 
Fields inherited from class dk.netarkivet.archive.indexserver.FileBasedCache
cacheDir
 
Constructor Summary
protected CombiningMultiFileBasedCache(java.lang.String name, FileBasedCache<T> rawcache)
          Constructor for a CombiningMultiFileBasedCache.
 
Method Summary
protected  java.util.Set<T> cacheData(java.util.Set<T> ids)
          This is called when an appropriate file for the ids in question has not been found.
protected abstract  void combine(java.util.Map<T,java.io.File> filesFound)
          Combine a set of files found in the raw data cache to form our kind of file.
protected  java.util.Map<T,java.io.File> prepareCombine(java.util.Set<T> ids)
          Prepare needed data for performing combine().
 
Methods inherited from class dk.netarkivet.archive.indexserver.MultiFileBasedCache
getCacheFile
 
Methods inherited from class dk.netarkivet.archive.indexserver.FileBasedCache
cache, get, getCacheDir, getIndex
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

rawcache

protected FileBasedCache<T> rawcache
The raw data cache that this cache gets data from.

Constructor Detail

CombiningMultiFileBasedCache

protected CombiningMultiFileBasedCache(java.lang.String name,
                                       FileBasedCache<T> rawcache)
Constructor for a CombiningMultiFileBasedCache.

Parameters:
name - The name of the cache
rawcache - The underlying cache of single files.

Method Detail

cacheData

protected java.util.Set<T> cacheData(java.util.Set<T> ids)
This is called when no appropriate file for the IDs in question has been found. It is expected to perform the operations needed to obtain the data. When it is called, the file for the given IDs is expected not to exist yet.

Specified by:
cacheData in class FileBasedCache<java.util.Set<T>>
Parameters:
ids - The set of IDs for which the corresponding data is wanted.
Returns:
The set of IDs, or a subset if fetching data failed for some IDs. If any ID failed, the combined file is not written, though some data may still be cached at a lower level.
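Because a partial result is reported as a smaller set of IDs, a caller can detect it by comparing the returned set with the requested one. The sketch below shows one hypothetical caller strategy, not part of this class: if only a subset is available, request exactly that subset again so that a complete combined file can be produced for it. SubsetRetry, requestWithFallback and the cacheData function parameter are illustrative stand-ins.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

final class SubsetRetry {
    private SubsetRetry() {
    }

    // Hypothetical caller strategy: ask for all IDs; if only a subset comes back,
    // ask again for exactly that subset, which can then be combined completely.
    static <T extends Comparable<T>> Set<T> requestWithFallback(
            Set<T> ids, Function<Set<T>, Set<T>> cacheData) {
        Set<T> requested = new TreeSet<>(ids);
        while (!requested.isEmpty()) {
            Set<T> found = new TreeSet<>(cacheData.apply(requested));
            found.retainAll(requested); // guard: ignore IDs that were never requested
            if (found.equals(requested)) {
                return found; // every ID succeeded, so the combined file was written
            }
            requested = found; // strictly smaller set, so the loop terminates
        }
        return requested; // empty: no data was available at all
    }
}
```

Each retry asks for a strictly smaller set, so the loop ends either with a set that was fully combined or with the empty set.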

prepareCombine

protected java.util.Map<T,java.io.File> prepareCombine(java.util.Set<T> ids)
Prepare needed data for performing combine(). This should ensure that all data is ready to use; any IDs for which the data cannot be obtained should be absent from the returned map.

Parameters:
ids - Set of job IDs to prepare for combining.
Returns:
A map from each ID to the file whose data will be combined for that ID. Subclasses that override this method to ensure that other data is present should remove from this map every ID for which that data is missing.
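A sketch of the override described above, assuming hypothetical classes PrimaryOnlyCache and ExtraDataCache with Long IDs, where the raw data cache is modelled as a Map from ID to File. The subclass calls the base prepareCombine and then removes every ID whose additional data is missing, so combine() only receives IDs whose data is complete.

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical base class: keeps only IDs whose raw file is available.
class PrimaryOnlyCache {
    protected final Map<Long, File> rawFiles; // stand-in for the raw data cache

    PrimaryOnlyCache(Map<Long, File> rawFiles) {
        this.rawFiles = rawFiles;
    }

    protected Map<Long, File> prepareCombine(Set<Long> ids) {
        Map<Long, File> found = new TreeMap<>();
        for (Long id : ids) {
            File f = rawFiles.get(id);
            if (f != null) {
                found.put(id, f); // never store null values: combine() forbids them
            }
        }
        return found;
    }
}

// Hypothetical subclass that also needs a second kind of data for each ID.
class ExtraDataCache extends PrimaryOnlyCache {
    private final Set<Long> idsWithExtraData;

    ExtraDataCache(Map<Long, File> rawFiles, Set<Long> idsWithExtraData) {
        super(rawFiles);
        this.idsWithExtraData = idsWithExtraData;
    }

    @Override
    protected Map<Long, File> prepareCombine(Set<Long> ids) {
        Map<Long, File> found = super.prepareCombine(ids);
        // Drop every ID whose extra data is missing, as the contract requires.
        found.keySet().removeIf(id -> !idsWithExtraData.contains(id));
        return found;
    }
}
```

Removing through keySet() edits the map in place, so the override returns the same map the base class built, minus the incomplete IDs.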

combine

protected abstract void combine(java.util.Map<T,java.io.File> filesFound)
Combine a set of files found in the raw data cache to form our kind of file.

Parameters:
filesFound - The files that were found for the IDs in the raw data cache. The map must not contain any null values.
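A sketch of one possible combine() implementation, assuming line-oriented input files such as CDX files. Unlike the abstract method above, which takes only the map, this hypothetical helper (SortedConcatCombiner) receives the output path as an explicit parameter. It rejects null values, in line with the contract above, and sorts all lines together, which costs O(n log n) for n lines in total.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class SortedConcatCombiner {
    private SortedConcatCombiner() {
    }

    // Hypothetical combine(): merges the lines of every input file into one
    // sorted output file.
    static <T extends Comparable<T>> void combine(Map<T, File> filesFound, Path output) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<T, File> e : filesFound.entrySet()) {
            // The map must not contain null values; fail fast if it does.
            File f = Objects.requireNonNull(e.getValue(), "null file for ID " + e.getKey());
            try {
                lines.addAll(Files.readAllLines(f.toPath(), StandardCharsets.UTF_8));
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }
        Collections.sort(lines); // O(n log n) over all n lines
        try {
            Files.write(output, lines, StandardCharsets.UTF_8);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Because all lines are sorted after reading, the result does not depend on the iteration order of the map passed in.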