CrawlLogIndexCache

dk.netarkivet.archive.indexserver
Class CrawlLogIndexCache

java.lang.Object
  dk.netarkivet.archive.indexserver.FileBasedCache<java.util.Set<T>>
      dk.netarkivet.archive.indexserver.MultiFileBasedCache<T>
          dk.netarkivet.archive.indexserver.CombiningMultiFileBasedCache<java.lang.Long>
              dk.netarkivet.archive.indexserver.CrawlLogIndexCache

All Implemented Interfaces: JobIndexCache

Direct Known Subclasses: DedupCrawlLogIndexCache, FullCrawlLogIndexCache

public abstract class CrawlLogIndexCache
extends CombiningMultiFileBasedCache<java.lang.Long>
implements JobIndexCache

A cache that serves Lucene indices of crawl logs for given job IDs. It uses the DigestIndexer from the deduplicator software (http://deduplicator.sourceforge.net/apidocs/is/hi/bok/deduplicator/DigestIndexer.html). When the underlying files are combined, each file in the resulting Lucene index is gzipped, and the compressed versions are stored in the directory given by getCacheFile(). Subclasses must specify in their constructor call which MIME types are included.
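The gzip step described above can be sketched with the JDK alone. This is a minimal sketch, assuming the finished Lucene index is a directory of regular files and that each compressed copy is named after its source file plus a `.gz` suffix; `GzipIndexFiles` and `gzipAll` are hypothetical names, not part of NetarchiveSuite, and the actual implementation may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/**
 * Sketch only: gzips every regular file in a finished index directory into a
 * cache directory, one ".gz" file per index file. The ".gz" suffix and the
 * class/method names are assumptions, not taken from NetarchiveSuite.
 */
public final class GzipIndexFiles {

    private GzipIndexFiles() {}

    /** Compresses each regular file in indexDir into cacheDir as name + ".gz"; returns the count. */
    public static int gzipAll(Path indexDir, Path cacheDir) throws IOException {
        Files.createDirectories(cacheDir);
        int count = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // only regular files are compressed; subdirectories are skipped
                }
                Path target = cacheDir.resolve(file.getFileName().toString() + ".gz");
                // Closing the GZIPOutputStream writes the gzip trailer.
                try (InputStream in = Files.newInputStream(file);
                     OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
                    in.transferTo(out); // Java 9+
                }
                count++;
            }
        }
        return count;
    }
}
```

Compressing file by file keeps each index file independently retrievable from the cache directory, at the cost of some compression ratio compared with one archive.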

Field Summary

Fields inherited from class dk.netarkivet.archive.indexserver.CombiningMultiFileBasedCache
`rawcache`

Fields inherited from class dk.netarkivet.archive.indexserver.FileBasedCache
`cacheDir`

Constructor Summary
`CrawlLogIndexCache(java.lang.String name, boolean blacklist, java.lang.String mimeFilter)` Constructor for the CrawlLogIndexCache class.

Method Summary
`protected void`	`combine(java.util.Map<java.lang.Long,java.io.File> rawfiles)` Combine a number of crawl.log files into one Lucene index.
`protected java.util.Map<java.lang.Long,java.io.File>`	`prepareCombine(java.util.Set<java.lang.Long> ids)` Prepare data for combining.

Methods inherited from class dk.netarkivet.archive.indexserver.CombiningMultiFileBasedCache
`cacheData`

Methods inherited from class dk.netarkivet.archive.indexserver.MultiFileBasedCache
`getCacheFile`

Methods inherited from class dk.netarkivet.archive.indexserver.FileBasedCache
`cache, get, getCacheDir, getIndex`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface dk.netarkivet.common.distribute.indexserver.JobIndexCache
`getIndex`

Constructor Detail

CrawlLogIndexCache

public CrawlLogIndexCache(java.lang.String name,
                          boolean blacklist,
                          java.lang.String mimeFilter)

Constructor for the CrawlLogIndexCache class.

Parameters:
name - The name of the CrawlLogIndexCache.
blacklist - Whether mimeFilter is treated as a blacklist (true) or a whitelist (false).
mimeFilter - A regular expression for the MIME types to exclude (blacklist) or include (whitelist).
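The blacklist/whitelist semantics of the `blacklist` and `mimeFilter` parameters can be sketched as follows. This assumes the regular expression must match the whole MIME type string (`Matcher.matches()`); whether the real indexer uses full or partial matching is not stated on this page. `MimeFilter` and `accepts` are hypothetical names.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of blacklist/whitelist MIME filtering. Full-string
 * matching via Matcher.matches() is an assumption, not confirmed behavior.
 */
public final class MimeFilter {
    private final Pattern pattern;
    private final boolean blacklist;

    public MimeFilter(String mimeFilter, boolean blacklist) {
        this.pattern = Pattern.compile(mimeFilter);
        this.blacklist = blacklist;
    }

    /** Returns true if a record with this MIME type should be indexed. */
    public boolean accepts(String mimeType) {
        boolean matches = pattern.matcher(mimeType).matches();
        // Blacklist: index everything that does NOT match.
        // Whitelist: index only what DOES match.
        return blacklist ? !matches : matches;
    }
}
```

For example, `new MimeFilter("^text/.*", false)` keeps only textual responses, while the same expression with `blacklist = true` keeps everything except them.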

Method Detail

prepareCombine

protected java.util.Map<java.lang.Long,java.io.File> prepareCombine(java.util.Set<java.lang.Long> ids)

Prepare data for combining. This class overrides prepareCombine to make sure that CDX data is available.

Overrides: prepareCombine in class CombiningMultiFileBasedCache<java.lang.Long>

Parameters: ids - Set of IDs that will be combined.
Returns: A map from ID to the File holding the data to combine, containing only the IDs for which data could be found.
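The return contract of prepareCombine (only IDs with data appear in the result) can be illustrated with a minimal sketch. It is not the NetarchiveSuite code: the `lookup` function is a hypothetical stand-in for fetching a job's crawl.log (and making sure CDX data is available), returning `null` when no data exists, and `PrepareCombineSketch` is likewise a hypothetical name.

```java
import java.io.File;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

/**
 * Hypothetical sketch of the prepareCombine contract: look up data for each
 * requested ID and keep only the IDs for which data was found.
 */
public final class PrepareCombineSketch {
    private PrepareCombineSketch() {}

    /** lookup is an assumed stand-in that returns null when no data exists for an ID. */
    public static Map<Long, File> prepare(Set<Long> ids, Function<Long, File> lookup) {
        Map<Long, File> found = new TreeMap<>(); // sorted by ID for predictable iteration
        for (Long id : ids) {
            File data = lookup.apply(id);
            if (data != null) {
                found.put(id, data); // IDs without data are simply left out
            }
        }
        return found;
    }
}
```

Callers can compare the returned key set with the requested IDs to see which jobs were actually covered.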

combine

protected void combine(java.util.Map<java.lang.Long,java.io.File> rawfiles)

Combine a number of crawl.log files into one Lucene index. The index files are stored as gzip files under the directory returned by getCacheFile().

Specified by: combine in class CombiningMultiFileBasedCache<java.lang.Long>

Parameters: rawfiles - A map from job ID to the corresponding crawl.log file. No null values are allowed in this map.
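The "no null values" precondition on `rawfiles` can be checked before any indexing starts. This is a minimal sketch, assuming the crawl.log files are processed in ascending job-ID order (an illustrative choice, not something this page specifies); `CombineInput` and `orderedCrawlLogs` are hypothetical names, and the DigestIndexer step itself is not shown.

```java
import java.io.File;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: validates that rawfiles has no null values and returns
 * the crawl.log files sorted by job ID (the ordering is an assumption).
 */
public final class CombineInput {
    private CombineInput() {}

    public static List<File> orderedCrawlLogs(Map<Long, File> rawfiles) {
        for (Map.Entry<Long, File> e : rawfiles.entrySet()) {
            if (e.getValue() == null) {
                throw new IllegalArgumentException("No crawl.log for job " + e.getKey());
            }
        }
        // Copy into a TreeMap so that files come back sorted by job ID.
        return List.copyOf(new TreeMap<>(rawfiles).values());
    }
}
```

Failing fast on a null value gives a clear error naming the job, instead of a failure partway through building the index.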
