Package is.hi.bok.deduplicator
Class CrawlDataIterator
- java.lang.Object
  - is.hi.bok.deduplicator.CrawlDataIterator
- Direct Known Subclasses: CrawlLogIterator
public abstract class CrawlDataIterator extends Object
An abstract base class for iterators over different sets of crawl data (e.g. crawl.log, ARC, WARC).
- Author: Kristinn Sigurðsson
Constructor Summary
- CrawlDataIterator(String source): Constructor.
Method Summary
All methods are abstract instance methods.
- abstract void close(): Close any resources held open to read the crawl data.
- abstract String getSourceType(): A short, human-readable string describing the source this iterator uses.
- abstract boolean hasNext(): Are there more elements?
- abstract CrawlDataItem next(): Get the next CrawlDataItem.
Constructor Detail
-
CrawlDataIterator
public CrawlDataIterator(String source)
Creates an iterator over the crawl data at the given source.
- Parameters:
source - The location of the crawl data. The meaning of this value may vary between concrete subclasses. Typically it refers to a directory or a file.
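Since the meaning of source is left to each subclass, a small example may help show what a concrete implementation has to provide. The sketch below is not part of the library: CrawlDataItem and CrawlDataIterator are local stand-ins that only mirror the signatures documented on this page, the url field is hypothetical, and ListCrawlDataIterator is an invented subclass that reads from an in-memory list instead of a file or directory.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in for is.hi.bok.deduplicator.CrawlDataItem; the url field is hypothetical.
class CrawlDataItem {
    final String url;
    CrawlDataItem(String url) { this.url = url; }
}

// Stand-in mirroring the signatures documented on this page.
// How the real class uses 'source' is not stated here, so the stand-in ignores it.
abstract class CrawlDataIterator {
    public CrawlDataIterator(String source) { }
    public abstract boolean hasNext() throws IOException;
    public abstract CrawlDataItem next() throws IOException;
    public abstract void close() throws IOException;
    public abstract String getSourceType();
}

// Hypothetical subclass: here 'source' is only a label for an in-memory list.
class ListCrawlDataIterator extends CrawlDataIterator {
    private Iterator<CrawlDataItem> items;

    ListCrawlDataIterator(String source, List<CrawlDataItem> data) {
        super(source);
        this.items = data.iterator();
    }

    @Override
    public boolean hasNext() {
        return items != null && items.hasNext();
    }

    @Override
    public CrawlDataItem next() {
        // Documented contract: return null when there are no further elements.
        return hasNext() ? items.next() : null;
    }

    @Override
    public void close() {
        // Nothing to release for a list; drop the reference so hasNext() reports false.
        items = null;
    }

    @Override
    public String getSourceType() {
        return "Iterator for an in-memory list";
    }
}

class ListCrawlDataIteratorDemo {
    public static void main(String[] args) throws IOException {
        CrawlDataIterator it = new ListCrawlDataIterator("demo", Arrays.asList(
                new CrawlDataItem("http://example.org/a"),
                new CrawlDataItem("http://example.org/b")));
        System.out.println(it.getSourceType());
        while (it.hasNext()) {
            System.out.println(it.next().url);
        }
        it.close();
    }
}
```

A file-backed subclass would typically open the file named by source in its constructor and release it in close().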
-
-
Method Detail
-
hasNext
public abstract boolean hasNext() throws IOException
Are there more elements?
- Returns:
true if there are more elements, false otherwise
- Throws:
IOException - If an error occurs accessing the crawl data.
-
next
public abstract CrawlDataItem next() throws IOException
Get the next CrawlDataItem.
- Returns:
the next CrawlDataItem. If there are no further elements then null will be returned.
- Throws:
IOException - If an error occurs accessing the crawl data.
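Because next() signals exhaustion by returning null rather than by throwing, a caller may want to check both hasNext() and the returned value. The following is a minimal sketch of such a loop, not code from the library: the two classes at the top are local stand-ins that mirror the documented signatures, and DrainExample, drain, stub and drainedCount are hypothetical names introduced for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Stand-ins mirroring the documented signatures (not the real library classes).
class CrawlDataItem { }

abstract class CrawlDataIterator {
    public CrawlDataIterator(String source) { }
    public abstract boolean hasNext() throws IOException;
    public abstract CrawlDataItem next() throws IOException;
    public abstract void close() throws IOException;
    public abstract String getSourceType();
}

class DrainExample {
    // Collects every remaining item, stopping on either end-of-data signal.
    static List<CrawlDataItem> drain(CrawlDataIterator it) throws IOException {
        List<CrawlDataItem> out = new ArrayList<>();
        while (it.hasNext()) {
            CrawlDataItem item = it.next();
            if (item == null) {
                break; // next() returns null when there are no further elements
            }
            out.add(item);
        }
        return out;
    }

    // Hypothetical stub that yields n empty items, used only to exercise drain().
    static CrawlDataIterator stub(final int n) {
        return new CrawlDataIterator("stub") {
            private int left = n;
            @Override public boolean hasNext() { return left > 0; }
            @Override public CrawlDataItem next() {
                if (left <= 0) {
                    return null;
                }
                left--;
                return new CrawlDataItem();
            }
            @Override public void close() { }
            @Override public String getSourceType() { return "stub"; }
        };
    }

    // Convenience wrapper that converts the checked exception for callers.
    static int drainedCount(int n) {
        try {
            return drain(stub(n)).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(drainedCount(3));
    }
}
```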
-
close
public abstract void close() throws IOException
Close any resources held open to read the crawl data.
- Throws:
IOException - If an error occurs closing access to crawl data.
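Since hasNext() and next() can throw IOException, callers usually want close() to run even when reading fails. This page lists no implemented interfaces for the class, so the sketch below assumes it cannot be used directly in try-with-resources and adapts close() through a java.io.Closeable method reference instead. The stand-in classes, CloseExample, FailingIterator and closedAfterFailure are hypothetical and exist only for this illustration.

```java
import java.io.Closeable;
import java.io.IOException;

// Stand-ins mirroring the documented signatures (not the real library classes).
class CrawlDataItem { }

abstract class CrawlDataIterator {
    public CrawlDataIterator(String source) { }
    public abstract boolean hasNext() throws IOException;
    public abstract CrawlDataItem next() throws IOException;
    public abstract void close() throws IOException;
    public abstract String getSourceType();
}

class CloseExample {
    // Hypothetical iterator whose next() always fails, to simulate a read error.
    static class FailingIterator extends CrawlDataIterator {
        boolean closed = false;

        FailingIterator() { super("stub"); }

        @Override public boolean hasNext() { return true; }

        @Override public CrawlDataItem next() throws IOException {
            throw new IOException("simulated read failure");
        }

        @Override public void close() { closed = true; }

        @Override public String getSourceType() { return "failing stub"; }
    }

    // Returns true if close() ran even though reading failed.
    static boolean closedAfterFailure() {
        FailingIterator it = new FailingIterator();
        // Closeable has a single abstract method, so it::close can serve as the resource.
        try (Closeable guard = it::close) {
            while (it.hasNext()) {
                it.next();
            }
        } catch (IOException e) {
            // The resource is closed before this handler runs.
        }
        return it.closed;
    }

    public static void main(String[] args) {
        System.out.println(closedAfterFailure());
    }
}
```

A plain try/finally block that calls close() in the finally clause achieves the same effect.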
-
getSourceType
public abstract String getSourceType()
A short, human-readable string describing the source this iterator uses, e.g. "Iterator for Heritrix style crawl.log".
- Returns:
A short, human-readable string describing the source this iterator uses.
-
-