Class CombiningMultiFileBasedCache<T extends Comparable<T>>
- java.lang.Object
-
- dk.netarkivet.harvester.indexserver.FileBasedCache<Set<T>>
-
- dk.netarkivet.harvester.indexserver.MultiFileBasedCache<T>
-
- dk.netarkivet.harvester.indexserver.CombiningMultiFileBasedCache<T>
-
- Type Parameters:
T
- A comparable type. Must implement the java.lang.Comparable interface.
- Direct Known Subclasses:
CDXIndexCache, CrawlLogIndexCache
public abstract class CombiningMultiFileBasedCache<T extends Comparable<T>> extends MultiFileBasedCache<T>
This class provides the framework for classes that cache the effort of combining multiple files into one. For instance, creating a Lucene index out of crawl.log files takes O(n log n) time, where n is the number of lines in the combined files. It is based on an underlying cache of single files. If some of the files in the underlying cache are not available, it reports which files are available rather than producing an incomplete file.
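The "report what is available" contract can be illustrated from the caller's side. The following is a minimal sketch, not NetarchiveSuite code: `AvailableSubsetRequest`, its `request` method, and the modeling of the cache as a `Function` from requested IDs to available IDs are all hypothetical, and retrying with the available subset is an assumed caller policy rather than documented behavior.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical caller-side helper; not part of NetarchiveSuite.
class AvailableSubsetRequest {
    /**
     * Asks the cache for a combined file covering ids. If the cache reports
     * that only a subset is available, asks once more for exactly that subset.
     * Returns the set of IDs reported by the last request (possibly empty).
     */
    static <T extends Comparable<T>> Set<T> request(
            Set<T> ids, Function<Set<T>, Set<T>> cache) {
        Set<T> available = cache.apply(ids);
        if (available.equals(ids) || available.isEmpty()) {
            // Either everything was combined, or nothing can be combined.
            return available;
        }
        // No combined file was produced for the full set (assumed policy):
        // retry with only the IDs reported as available.
        return cache.apply(new TreeSet<>(available));
    }
}
```

A caller that cannot accept a partial result could instead treat any returned set that differs from the requested one as a failure.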
-
-
Field Summary
protected FileBasedCache<T> rawcache
The raw data cache that this cache gets data from.
-
Fields inherited from class dk.netarkivet.harvester.indexserver.FileBasedCache
cacheDir
-
-
Constructor Summary
protected CombiningMultiFileBasedCache(String name, FileBasedCache<T> rawcache)
Constructor for a CombiningMultiFileBasedCache.
-
Method Summary
protected Set<T> cacheData(Set<T> ids)
This is called when an appropriate file for the ids in question has not been found.
protected abstract void combine(Map<T,File> filesFound)
Combine a set of files found in the raw data cache to form our kind of file.
protected Map<T,File> prepareCombine(Set<T> ids)
Prepare needed data for performing combine().
-
Methods inherited from class dk.netarkivet.harvester.indexserver.MultiFileBasedCache
getCacheFile
-
Methods inherited from class dk.netarkivet.harvester.indexserver.FileBasedCache
cache, get, getCacheDir, getIndex
-
-
-
-
Field Detail
-
rawcache
protected FileBasedCache<T> rawcache
The raw data cache that this cache gets data from.
-
-
Constructor Detail
-
CombiningMultiFileBasedCache
protected CombiningMultiFileBasedCache(String name, FileBasedCache<T> rawcache)
Constructor for a CombiningMultiFileBasedCache.
- Parameters:
name
- The name of the cache.
rawcache
- The underlying cache of single files.
-
-
Method Detail
-
cacheData
protected Set<T> cacheData(Set<T> ids)
This is called when an appropriate file for the ids in question has not been found. It is expected to do the actual operations necessary to get the data. At the outset, the file for the given IDs is expected not to be present.
- Specified by:
cacheData
in class FileBasedCache<Set<T>>
- Parameters:
ids
- The set of identifiers for which we want the corresponding data.
- Returns:
- The set of IDs, or a subset if data fetching failed for some IDs. If some IDs failed, the file is not filled, though some data may be cached at a lower level.
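The return contract described above can be sketched as follows. This is an illustration under stated assumptions, not the NetarchiveSuite source: `CombiningCacheSketch` is a hypothetical stand-in class, and the way it chains prepareCombine() and combine() is inferred from the documented behavior only.

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in that mirrors the documented contract.
abstract class CombiningCacheSketch<T extends Comparable<T>> {
    /** Maps each ID whose raw data is available to its file. */
    protected abstract Map<T, File> prepareCombine(Set<T> ids);

    /** Builds the combined file from the raw files. */
    protected abstract void combine(Map<T, File> filesFound);

    protected Set<T> cacheData(Set<T> ids) {
        Map<T, File> filesFound = prepareCombine(ids);
        if (filesFound.keySet().equals(ids)) {
            // Every requested ID is available: fill the combined file.
            combine(filesFound);
            return ids;
        }
        // Some IDs failed: leave the combined file unfilled and
        // report only the IDs that are available.
        return new TreeSet<>(filesFound.keySet());
    }
}
```

Note that in the failure branch combine() is never called, which matches the statement that the file is not filled when some IDs fail.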
-
prepareCombine
protected Map<T,File> prepareCombine(Set<T> ids)
Prepare needed data for performing combine(). This should ensure that all data is ready to use; any IDs for which the data cannot be obtained should be missing from the returned map.
- Parameters:
ids
- Set of job IDs to get ready to combine.
- Returns:
- The map of ID->file of the data we will combine for each ID. If subclasses override this method to ensure other data is present, IDs whose data is missing should be removed from this map.
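A minimal sketch of how such a map might be built is shown below. `PrepareCombineSketch` is hypothetical, and the raw cache is modeled as a function that returns the file for an ID or null when the data is unavailable; the actual lookup mechanism of the raw cache is not described on this page.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper; rawLookup stands in for the underlying raw cache.
class PrepareCombineSketch {
    static <T extends Comparable<T>> Map<T, File> prepareCombine(
            Set<T> ids, Function<T, File> rawLookup) {
        Map<T, File> filesFound = new HashMap<>();
        for (T id : ids) {
            File f = rawLookup.apply(id);
            if (f != null) {
                // Only IDs whose data could be obtained appear in the map.
                filesFound.put(id, f);
            }
        }
        return filesFound;
    }
}
```

A subclass that needs additional data per ID could apply the same pattern and remove entries from the map for IDs where that extra data is missing.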
-
-