Class CrawlDataIterator

  • Direct Known Subclasses:
    CrawlLogIterator

    public abstract class CrawlDataIterator
    extends Object
    An abstract base class for iterators over different sets of crawl data (e.g. crawl.log, ARC, WARC).
    Author:
    Kristinn Sigurðsson
    • Constructor Detail

      • CrawlDataIterator

        public CrawlDataIterator(String source)
        Creates an iterator over the crawl data found at the given source.
        Parameters:
        source - The location of the crawl data. The meaning of this value may vary based on the implementation of concrete subclasses. Typically it will refer to a directory or a file.
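
Because the meaning of source is left to each subclass, a concrete implementation has to decide how to treat a file versus a directory. The following is a minimal sketch of one possible interpretation, not the behaviour of any existing subclass. The class SourceResolver and its resolve method are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the documented API: shows one way a
// concrete subclass might interpret the 'source' constructor argument.
public class SourceResolver {

    // Returns the regular files that 'source' refers to: the file itself,
    // or the regular files directly inside it when it is a directory.
    public static List<Path> resolve(String source) throws IOException {
        Path path = Paths.get(source);
        if (Files.isDirectory(path)) {
            // Files.list holds a directory handle open, so close the stream.
            try (Stream<Path> entries = Files.list(path)) {
                return entries.filter(Files::isRegularFile)
                              .sorted()
                              .collect(Collectors.toList());
            }
        }
        if (Files.isRegularFile(path)) {
            return Collections.singletonList(path);
        }
        // Matches the IOException used by the iterator methods for access errors.
        throw new IOException("Crawl data source not found: " + source);
    }
}
```

A subclass constructor could call such a helper once and then read the resulting files in order. Subdirectories are deliberately ignored in this sketch.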
    • Method Detail

      • hasNext

        public abstract boolean hasNext()
                                 throws IOException
        Are there more elements?
        Returns:
        true if there are more elements, false otherwise
        Throws:
        IOException - If an error occurs accessing the crawl data.
      • next

        public abstract CrawlDataItem next()
                                    throws IOException
        Get the next CrawlDataItem.
        Returns:
        The next CrawlDataItem, or null if there are no further elements.
        Throws:
        IOException - If an error occurs accessing the crawl data.
      • close

        public abstract void close()
                            throws IOException
        Close any resources held open to read the crawl data.
        Throws:
        IOException - If an error occurs closing access to crawl data.
      • getSourceType

        public abstract String getSourceType()
        A short, human-readable string describing the source this iterator uses, e.g. "Iterator for Heritrix style crawl.log".
        Returns:
        A short, human-readable string describing the source this iterator uses.
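
The four methods above imply a simple usage pattern: loop while hasNext() is true, tolerate a null from next(), and always call close(). The sketch below illustrates that pattern under stated assumptions. The nested CrawlDataItem and CrawlDataIterator types are simplified stand-ins that only mirror the signatures documented here (the stored source field is an assumption), and ListBackedIterator and countItems are hypothetical names, not part of the real library.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CrawlDataIteratorSketch {

    // Stand-in for the real CrawlDataItem, reduced to a single field.
    static class CrawlDataItem {
        final String url;
        CrawlDataItem(String url) { this.url = url; }
    }

    // Stand-in mirroring the signatures documented above.
    static abstract class CrawlDataIterator {
        protected final String source; // assumption: the real class may not store this
        public CrawlDataIterator(String source) { this.source = source; }
        public abstract boolean hasNext() throws IOException;
        public abstract CrawlDataItem next() throws IOException;
        public abstract void close() throws IOException;
        public abstract String getSourceType();
    }

    // Hypothetical subclass backed by an in-memory list of URLs.
    static class ListBackedIterator extends CrawlDataIterator {
        private final Iterator<String> urls;
        private boolean closed = false;

        ListBackedIterator(String source, List<String> urls) {
            super(source);
            this.urls = urls.iterator();
        }

        @Override public boolean hasNext() { return !closed && urls.hasNext(); }

        @Override public CrawlDataItem next() {
            // Documented contract: null, not an exception, when exhausted.
            return hasNext() ? new CrawlDataItem(urls.next()) : null;
        }

        @Override public void close() { closed = true; }

        @Override public String getSourceType() {
            return "Iterator for an in-memory URL list";
        }
    }

    // Drains an iterator and closes it even if reading fails part-way.
    static int countItems(CrawlDataIterator it) throws IOException {
        int count = 0;
        try {
            while (it.hasNext()) {
                if (it.next() != null) {
                    count++;
                }
            }
        } finally {
            it.close();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        CrawlDataIterator it = new ListBackedIterator("memory",
                Arrays.asList("http://example.org/", "http://example.org/a"));
        System.out.println(it.getSourceType() + ": " + countItems(it) + " items");
    }
}
```

The try/finally block is used because this documentation does not state that the class implements AutoCloseable, so try-with-resources cannot be assumed to work.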