public class CrawlLogIterator extends CrawlDataIterator
A CrawlDataIterator capable of iterating over a Heritrix-style crawl.log.

Modifier and Type | Field and Description |
---|---|
protected SimpleDateFormat | crawlDataItemFormat: The date format specified by the CrawlDataItem for dates entered into it (and eventually into the index). |
protected SimpleDateFormat | crawlDateFormat: The date format used in crawl.log files. |
protected String | crawlDateFormatStr |
protected SimpleDateFormat | fallbackCrawlDateFormat |
protected String | fallbackCrawlDateFormatStr |
protected BufferedReader | in: A reader for the crawl.log file being processed. |
protected CrawlDataItem | next: The next item to be issued (if ready), or null if the next item has not been prepared or there are no more elements. |

Constructor and Description |
---|
CrawlLogIterator(String source): Create a new CrawlLogIterator that reads items from a Heritrix crawl.log. |

Modifier and Type | Method and Description |
---|---|
void | close(): Closes the crawl.log file. |
String | getSourceType(): A short, human-readable string describing the source this iterator uses. |
boolean | hasNext(): Returns true if there are more items available. |
CrawlDataItem | next(): Returns the next valid item from the crawl log. |
protected CrawlDataItem | parseLine(String line): Parse a line in the crawl log. |
protected void | prepareNext(): Ready the next item. |

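The methods above follow a look-ahead pattern: prepareNext() readies an item, hasNext() reports whether one is available, and next() hands it out. The sketch below mimics that contract over an in-memory reader. It is a stand-in for illustration, not the library's implementation: the class `LookAheadLineIterator`, its `readAll` helper, and the rule of rejecting blank lines are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in mirroring the hasNext()/next()/prepareNext() contract. */
class LookAheadLineIterator {
    private final BufferedReader in;
    private String next; // the prepared item, or null if none is ready

    LookAheadLineIterator(BufferedReader in) {
        this.in = in;
    }

    /** Reads lines until one is accepted or the input ends. Call only when next == null. */
    protected void prepareNext() throws IOException {
        String line;
        while (next == null && (line = in.readLine()) != null) {
            next = parseLine(line);
        }
    }

    /** Returns a usable item, or null to reject the line (assumption: blank lines are rejected). */
    protected String parseLine(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    boolean hasNext() throws IOException {
        if (next == null) {
            prepareNext();
        }
        return next != null;
    }

    String next() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException("No more items");
        }
        String item = next;
        next = null; // forces prepareNext() on the following hasNext() call
        return item;
    }

    void close() throws IOException {
        in.close();
    }

    /** Drains an iterator built over the given text and returns the accepted items. */
    static List<String> readAll(String text) {
        LookAheadLineIterator it =
                new LookAheadLineIterator(new BufferedReader(new StringReader(text)));
        List<String> items = new ArrayList<>();
        try {
            try {
                while (it.hasNext()) {
                    items.add(it.next());
                }
            } finally {
                it.close(); // always release the underlying reader
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return items;
    }

    public static void main(String[] args) {
        System.out.println(readAll("first\n\nsecond\n"));
    }
}
```

A caller of the real class would presumably use the same loop shape: test hasNext(), call next(), and call close() in a finally block.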
protected final String crawlDateFormatStr
protected final String fallbackCrawlDateFormatStr
protected final SimpleDateFormat crawlDateFormat
protected final SimpleDateFormat fallbackCrawlDateFormat
protected final SimpleDateFormat crawlDataItemFormat
The date format specified by the CrawlDataItem for dates entered into it (and eventually into the index).
protected BufferedReader in
protected CrawlDataItem next
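
The paired crawlDateFormat and fallbackCrawlDateFormat fields suggest that a timestamp is tried against a primary pattern first and against a secondary pattern if that fails. A minimal sketch of that idea follows. The two pattern strings are assumptions chosen for illustration (an ISO 8601 style and a compact 17-digit style); this page does not state the actual values of crawlDateFormatStr or fallbackCrawlDateFormatStr, and `CrawlDateParser` is a hypothetical helper.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

/** Hypothetical helper: parse with a primary format, then a fallback. */
class CrawlDateParser {
    // Assumed patterns, for illustration only.
    private final SimpleDateFormat primary = utc("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
    private final SimpleDateFormat fallback = utc("yyyyMMddHHmmssSSS");

    private static SimpleDateFormat utc(String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false); // reject out-of-range field values
        return format;
    }

    /**
     * Returns the parsed date, or null if neither format matches.
     * SimpleDateFormat is not thread-safe, so use one parser per thread.
     */
    Date parse(String timestamp) {
        try {
            return primary.parse(timestamp);
        } catch (ParseException primaryFailure) {
            try {
                return fallback.parse(timestamp);
            } catch (ParseException fallbackFailure) {
                return null;
            }
        }
    }

    public static void main(String[] args) {
        CrawlDateParser parser = new CrawlDateParser();
        System.out.println(parser.parse("2008-05-20T13:25:21.123Z"));
        System.out.println(parser.parse("20080520132521123"));
    }
}
```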

public CrawlLogIterator(String source) throws IOException
Parameters:
source - The path of a Heritrix crawl.log file.
Throws:
IOException - If errors were found reading the log.

public boolean hasNext() throws IOException
hasNext in class CrawlDataIterator
Throws:
IOException - If an error occurs accessing the crawl data.

public CrawlDataItem next() throws IOException
next in class CrawlDataIterator
Throws:
IOException - If there is an error reading the item *after* the item to be returned from the crawl.log.
NoSuchElementException - If there are no more items.

protected void prepareNext() throws IOException
Note: This method should only be called when next == null.
Throws:
IOException

protected CrawlDataItem parseLine(String line)
Override this method to change how individual crawl log items are processed and accepted or rejected. This method is called from within the loop in prepareNext().
Parameters:
line - A line from the crawl log. Must not be null.
Returns:
A CrawlDataItem if the next line in the crawl log yielded a usable item, null otherwise.

public void close() throws IOException
close in class CrawlDataIterator
Throws:
IOException - If an error occurs closing access to crawl data.

public String getSourceType()
A short, human-readable string describing the source this iterator uses.
getSourceType in class CrawlDataIterator
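
Since parseLine(String) is the documented extension point for accepting or rejecting crawl log lines, a rough sketch of what such a method might do is given here. It splits a line on whitespace and rejects lines with too few fields. The field positions used (timestamp first, then status code, size, and URL) follow the commonly described Heritrix crawl.log layout but are assumptions here, and `CrawlLineSketch` and `SimpleItem` are hypothetical stand-ins for the real parsing code and for CrawlDataItem.

```java
/** Hypothetical sketch of a parseLine-style method; not the library's code. */
class CrawlLineSketch {

    /** Simplified, hypothetical stand-in for CrawlDataItem. */
    static final class SimpleItem {
        final String timestamp;
        final String url;

        SimpleItem(String timestamp, String url) {
            this.timestamp = timestamp;
            this.url = url;
        }
    }

    /**
     * Returns an item if the line has at least four whitespace-separated
     * fields, null otherwise. Assumed layout: timestamp, status, size, URL.
     */
    static SimpleItem parseLine(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 4) {
            return null; // reject malformed or truncated lines
        }
        return new SimpleItem(fields[0], fields[3]);
    }

    public static void main(String[] args) {
        SimpleItem item = parseLine(
                "2008-05-20T13:25:21.123Z   200   1234 http://example.org/ - - text/html");
        System.out.println(item.timestamp + " " + item.url);
    }
}
```

Returning null for a rejected line matches the documented contract of parseLine, which lets the loop in prepareNext() move on to the following line.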

Copyright © 2005–2018 The Royal Danish Library, the National Library of France and the Austrian National Library. All rights reserved.