dk.netarkivet.archive.indexserver
Class CombiningMultiFileBasedCache<T extends java.lang.Comparable<T>>

java.lang.Object
  extended by dk.netarkivet.archive.indexserver.FileBasedCache<java.util.Set<T>>
      extended by dk.netarkivet.archive.indexserver.MultiFileBasedCache<T>
          extended by dk.netarkivet.archive.indexserver.CombiningMultiFileBasedCache<T>
Direct Known Subclasses:
CDXIndexCache, CrawlLogIndexCache

public abstract class CombiningMultiFileBasedCache<T extends java.lang.Comparable<T>>
extends MultiFileBasedCache<T>

This class provides the framework for classes that cache the effort of combining multiple files into one. For instance, creating a Lucene index out of crawl.log files takes O(n log n) time, where n is the number of lines in the files being combined. It is based on an underlying cache of single files. If some of the files in the underlying cache are unavailable, it reports which files are available rather than returning an incomplete file.


Field Summary
(package private)  FileBasedCache<T> rawcache
          The raw data cache that this cache gets data from.
 
Fields inherited from class dk.netarkivet.archive.indexserver.FileBasedCache
cacheDir
 
Constructor Summary
protected CombiningMultiFileBasedCache(java.lang.String name, FileBasedCache<T> rawcache)
          Constructor for a CombiningMultiFileBasedCache
 
Method Summary
protected  java.util.Set<T> cacheData(java.util.Set<T> ids)
          This is called when an appropriate file for the ids in question has not been found.
protected abstract  void combine(java.util.Map<T,java.io.File> filesFound)
          Combine a set of files found in the raw data cache to form our kind of file.
protected  java.util.Map<T,java.io.File> prepareCombine(java.util.Set<T> ids)
          Prepares the data needed to perform combine().
 
Methods inherited from class dk.netarkivet.archive.indexserver.MultiFileBasedCache
getCacheFile
 
Methods inherited from class dk.netarkivet.archive.indexserver.FileBasedCache
cache, get, getCacheDir, getIndex
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

rawcache

FileBasedCache<T extends java.lang.Comparable<T>> rawcache
The raw data cache that this cache gets data from.

Constructor Detail

CombiningMultiFileBasedCache

protected CombiningMultiFileBasedCache(java.lang.String name,
                                       FileBasedCache<T> rawcache)
Constructor for a CombiningMultiFileBasedCache

Parameters:
name - The name of this cache.
rawcache - The underlying cache of single files.
Method Detail

cacheData

protected java.util.Set<T> cacheData(java.util.Set<T> ids)
This is called when an appropriate file for the IDs in question has not been found. It is expected to perform the operations needed to obtain the data. When this method is called, the file for the given IDs is expected not to exist yet.

Specified by:
cacheData in class FileBasedCache<java.util.Set<T extends java.lang.Comparable<T>>>
Parameters:
ids - The set of IDs to obtain and combine data for.
Returns:
The set of IDs, or a subset of it if fetching data failed for some IDs. If some IDs failed, the cache file is not filled in, although some data may still be cached at a lower level.
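The return contract above (combine only when every ID is available, but always report which IDs were found) can be modelled in a few lines. This is a hypothetical, simplified sketch, not the NetarchiveSuite implementation: the raw data cache is assumed to be a plain `Map` from ID to `File`, and the class name `CombiningCacheSketch` and its `combinedIdSets` field are illustrative only.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of the cacheData() contract.
 * Assumption: the raw data cache is a plain Map from ID to File,
 * not the real FileBasedCache API.
 */
public class CombiningCacheSketch<T extends Comparable<T>> {

    private final Map<T, File> rawCache;                    // stand-in for rawcache
    final List<Set<T>> combinedIdSets = new ArrayList<>();  // records each combine() call

    CombiningCacheSketch(Map<T, File> rawCache) {
        this.rawCache = rawCache;
    }

    /** Keeps only the IDs the raw cache has a file for, so no value is null. */
    Map<T, File> prepareCombine(Set<T> ids) {
        Map<T, File> found = new HashMap<>();
        for (T id : ids) {
            File f = rawCache.get(id);
            if (f != null) {
                found.put(id, f);
            }
        }
        return found;
    }

    /** Placeholder for the subclass-specific step; here it only records the IDs. */
    void combine(Map<T, File> filesFound) {
        combinedIdSets.add(new HashSet<>(filesFound.keySet()));
    }

    /** Combines only when every requested ID was found; always returns the found IDs. */
    Set<T> cacheData(Set<T> ids) {
        Map<T, File> filesFound = prepareCombine(ids);
        // filesFound is a subset of ids, so equal sizes means nothing is missing.
        if (filesFound.size() == ids.size()) {
            combine(filesFound);
        }
        return new HashSet<>(filesFound.keySet());
    }
}
```

Returning the set of found IDs rather than a boolean lets the caller see exactly which IDs are usable, which is what the documented "subset on partial failure" behaviour requires.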

prepareCombine

protected java.util.Map<T,java.io.File> prepareCombine(java.util.Set<T> ids)
Prepares the data needed to perform combine(). This should ensure that all data is ready to use; IDs for which the data cannot be obtained should be omitted from the returned map.

Parameters:
ids - Set of job IDs to prepare for combining.
Returns:
A map from each ID to the file holding the data to be combined for that ID. If subclasses override this method to ensure that other data is present, IDs whose data is missing should be removed from this map.
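The override guidance above (start from the IDs the raw cache can supply, then drop any ID whose additional data is missing) can be sketched as follows. This is a hypothetical illustration: the raw cache is assumed to be a plain `Map`, and the `hasExtraData` predicate stands in for whatever extra data a subclass might require; neither is part of the real API.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of prepareCombine() and a subclass-style override.
 * Assumptions: the raw cache is a plain Map, and hasExtraData is an
 * illustrative stand-in for any additional data a subclass needs.
 */
public class PrepareCombineSketch {

    /** Base step: keep only IDs the raw cache has a file for, so no value is null. */
    static <T> Map<T, File> basePrepare(Set<T> ids, Map<T, File> rawCache) {
        Map<T, File> found = new HashMap<>();
        for (T id : ids) {
            File f = rawCache.get(id);
            if (f != null) {
                found.put(id, f);
            }
        }
        return found;
    }

    /** Override pattern: reuse the base result, then remove IDs lacking extra data. */
    static <T> Map<T, File> prepareWithExtraData(Set<T> ids, Map<T, File> rawCache,
                                                 Predicate<T> hasExtraData) {
        Map<T, File> found = basePrepare(ids, rawCache);
        // Removing through the key-set view also removes the map entries.
        found.keySet().removeIf(id -> !hasExtraData.test(id));
        return found;
    }
}
```

Removing entries from the map, rather than returning a separate list of failures, keeps the returned map as the single source of truth for which IDs are ready to combine.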

combine

protected abstract void combine(java.util.Map<T,java.io.File> filesFound)
Combine a set of files found in the raw data cache to form our kind of file.

Parameters:
filesFound - The files that were found for the IDs in the raw data cache. The map must not contain any null values.
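A concrete subclass decides what "our kind of file" is. As a hypothetical example only (not how CDXIndexCache or CrawlLogIndexCache actually work), a combine step could concatenate the raw files in ascending ID order, relying on T being Comparable. The extra `output` parameter is an assumption made to keep the sketch self-contained; the real abstract method takes only the map.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

/**
 * Hypothetical combine() that concatenates raw files into one output file.
 * Assumption: the destination is passed in explicitly as a Path.
 */
public class ConcatenatingCombineSketch {

    static <T extends Comparable<T>> void combine(Map<T, File> filesFound, Path output)
            throws IOException {
        List<String> lines = new ArrayList<>();
        // Copying into a TreeMap orders the entries by the IDs' natural ordering.
        for (Map.Entry<T, File> entry : new TreeMap<>(filesFound).entrySet()) {
            // Enforce the documented precondition: the map has no null values.
            File file = Objects.requireNonNull(entry.getValue(),
                    "No file for ID " + entry.getKey());
            lines.addAll(Files.readAllLines(file.toPath(), StandardCharsets.UTF_8));
        }
        Files.write(output, lines, StandardCharsets.UTF_8);
    }
}
```

Reading whole files into memory keeps the sketch short; a real implementation combining large crawl logs would more likely stream or sort the data instead.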