dk.netarkivet.archive.indexserver
Class CombiningMultiFileBasedCache<T extends java.lang.Comparable<T>>
java.lang.Object
dk.netarkivet.archive.indexserver.FileBasedCache<java.util.Set<T>>
dk.netarkivet.archive.indexserver.MultiFileBasedCache<T>
dk.netarkivet.archive.indexserver.CombiningMultiFileBasedCache<T>
- Type Parameters:
T
- A comparable type. Must implement the
java.lang.Comparable interface.
- Direct Known Subclasses:
- CDXIndexCache, CrawlLogIndexCache
public abstract class CombiningMultiFileBasedCache<T extends java.lang.Comparable<T>>
- extends MultiFileBasedCache<T>
This class provides the framework for classes that cache the effort of
combining multiple files into one. For instance, creating a Lucene index
out of crawl.log files takes O(n log n) time, where n is the number of
lines in the combined files.
It is based on an underlying cache of single files.
It handles the possibility that some of the files in the underlying cache
are unavailable by reporting which files are available, rather than by
returning an incomplete file.
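The point of "caching the effort" is that an expensive combination for a given set of IDs is done once and then reused. The sketch below models that idea in memory only; the class name CombinedResultMemo, the String result, and the sorted-set key are hypothetical choices for illustration, not how FileBasedCache actually stores its files on disk.

```java
import java.util.*;
import java.util.function.Function;

/**
 * Hypothetical in-memory illustration: memoize an expensive "combine" step
 * per set of IDs. Not NetarchiveSuite code.
 */
class CombinedResultMemo<T extends Comparable<T>> {
    private final Map<SortedSet<T>, String> results = new HashMap<>();
    private final Function<SortedSet<T>, String> combiner;
    private int combineCalls = 0; // how often the expensive step actually ran

    CombinedResultMemo(Function<SortedSet<T>, String> combiner) {
        this.combiner = combiner;
    }

    String get(Set<T> ids) {
        // Normalize the key so {3, 1} and {1, 3} share one cached result.
        SortedSet<T> key = Collections.unmodifiableSortedSet(new TreeSet<>(ids));
        String cached = results.get(key);
        if (cached == null) {
            combineCalls++;
            cached = combiner.apply(key);
            results.put(key, cached);
        }
        return cached;
    }

    int combineCalls() {
        return combineCalls;
    }
}
```

Requesting the same IDs in a different order hits the cached entry, so the combiner runs only once.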
Method Summary
| Modifier and Type | Method and Description |
| --- | --- |
| protected java.util.Set<T> | cacheData(java.util.Set<T> ids): This is called when an appropriate file for the ids in question has not been found. |
| protected abstract void | combine(java.util.Map<T,java.io.File> filesFound): Combine a set of files found in the raw data cache to form our kind of file. |
| protected java.util.Map<T,java.io.File> | prepareCombine(java.util.Set<T> ids): Prepare needed data for performing combine(). |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
rawcache
protected FileBasedCache<T extends java.lang.Comparable<T>> rawcache
- The raw data cache that this cache gets data from.
CombiningMultiFileBasedCache
protected CombiningMultiFileBasedCache(java.lang.String name,
FileBasedCache<T> rawcache)
- Constructor for a CombiningMultiFileBasedCache.
- Parameters:
name
- The name of the cache.
rawcache
- The underlying cache of single files.
cacheData
protected java.util.Set<T> cacheData(java.util.Set<T> ids)
- This is called when an appropriate file for the IDs in question
has not been found. It is expected to perform the operations
necessary to get the data. At the outset, the file for the given
IDs is expected not to be present.
- Specified by:
cacheData
in class FileBasedCache<java.util.Set<T extends java.lang.Comparable<T>>>
- Parameters:
ids
- The set of identifiers for which we want the corresponding
data.
- Returns:
- The set of IDs, or a subset if data fetching failed for some IDs.
If some IDs failed, the combined file is not filled, though some data
may be cached at a lower level.
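The contract above (combine only when every ID is available, otherwise report the available subset) can be modelled roughly as follows. This is a simplified, hypothetical sketch: CombiningCacheSketch is an invented name, and prepareCombine() is made abstract here only to keep the example short, whereas the real class provides an implementation.

```java
import java.io.File;
import java.util.*;

/**
 * Hypothetical, simplified model of the cacheData() contract.
 * Not NetarchiveSuite code.
 */
abstract class CombiningCacheSketch<T extends Comparable<T>> {

    /** Returns ID -> raw file for every ID whose data is available. */
    protected abstract Map<T, File> prepareCombine(Set<T> ids);

    /** Builds the combined file; only called when every requested ID is available. */
    protected abstract void combine(Map<T, File> filesFound);

    protected Set<T> cacheData(Set<T> ids) {
        Map<T, File> filesFound = prepareCombine(ids);
        if (filesFound.keySet().containsAll(ids)) {
            combine(filesFound); // complete input: fill the combined file
        }
        // Report which IDs had data; if some were missing, combine() was skipped.
        return new HashSet<>(filesFound.keySet());
    }
}
```

A caller that receives fewer IDs than it asked for can then decide whether to retry or to request a combination of just the available subset.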
prepareCombine
protected java.util.Map<T,java.io.File> prepareCombine(java.util.Set<T> ids)
- Prepare needed data for performing combine(). This should ensure that
all data is ready to use; any IDs for which the data cannot be
obtained should be absent from the returned map.
- Parameters:
ids
- The set of job IDs to prepare for combining.
- Returns:
- The map of ID->file of the data we will combine for each ID.
If subclasses override this method to ensure that other data is present,
IDs of jobs for which that data is missing should be removed from this map.
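One way to satisfy this contract is to look each ID up in the raw cache, keep only the IDs that produced a file, and then drop any IDs that fail an additional check, as an overriding subclass might. In this sketch, RawFileCache, its fetch() method, and the otherDataPresent predicate are hypothetical stand-ins, not the real FileBasedCache API.

```java
import java.io.File;
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical stand-in for the underlying single-file cache (not the real API). */
interface RawFileCache<T> {
    /** Returns the cached file for the ID, or null if it could not be obtained. */
    File fetch(T id);
}

class PrepareCombineSketch {
    static <T extends Comparable<T>> Map<T, File> prepareCombine(
            Set<T> ids, RawFileCache<T> rawcache, Predicate<T> otherDataPresent) {
        Map<T, File> filesFound = new HashMap<>();
        for (T id : ids) {
            File file = rawcache.fetch(id);
            // Leave out IDs whose raw file is unavailable, so the map holds no null values.
            if (file != null) {
                filesFound.put(id, file);
            }
        }
        // Models a subclass override that also requires other data: drop IDs lacking it.
        filesFound.keySet().removeIf(id -> !otherDataPresent.test(id));
        return filesFound;
    }
}
```

Filtering out unavailable IDs here, rather than inserting null values, keeps the map valid as input to combine().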
combine
protected abstract void combine(java.util.Map<T,java.io.File> filesFound)
- Combine a set of files found in the raw data cache to form our
kind of file.
- Parameters:
filesFound
- The files that were found for the IDs in the raw
data cache. The map must not contain any null values.
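What "our kind of file" means is up to each subclass. As a purely hypothetical illustration, assume each raw file is a text file whose lines are already sorted. A combine() could then merge them into one sorted output with a k-way merge over a java.util.PriorityQueue, costing O(N log k) for N total lines across k files. SortedMergeCombiner and its output path are invented for this sketch; it is not how CDXIndexCache or CrawlLogIndexCache is implemented.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

/** Hypothetical combine(): k-way merge of already-sorted line files into one sorted file. */
class SortedMergeCombiner<T extends Comparable<T>> {
    private final Path output;

    SortedMergeCombiner(Path output) {
        this.output = output;
    }

    void combine(Map<T, File> filesFound) {
        List<BufferedReader> readers = new ArrayList<>();
        // Heap entries: the current line and the index of the reader it came from.
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>(Map.Entry.<String, Integer>comparingByKey());
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            try {
                for (File f : filesFound.values()) {
                    Objects.requireNonNull(f, "filesFound must not contain null values");
                    BufferedReader r = Files.newBufferedReader(f.toPath(), StandardCharsets.UTF_8);
                    readers.add(r);
                    String first = r.readLine();
                    if (first != null) {
                        heap.add(new AbstractMap.SimpleEntry<>(first, readers.size() - 1));
                    }
                }
                // Repeatedly emit the smallest current line and refill from its reader.
                while (!heap.isEmpty()) {
                    Map.Entry<String, Integer> smallest = heap.poll();
                    out.write(smallest.getKey());
                    out.newLine();
                    String next = readers.get(smallest.getValue()).readLine();
                    if (next != null) {
                        heap.add(new AbstractMap.SimpleEntry<>(next, smallest.getValue()));
                    }
                }
            } finally {
                for (BufferedReader r : readers) {
                    r.close();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each input is read line by line, memory use stays proportional to the number of files rather than their total size.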