dk.netarkivet.wayback
Class DeduplicateToCDXApplication

java.lang.Object
  extended by dk.netarkivet.wayback.DeduplicateToCDXApplication

public class DeduplicateToCDXApplication
extends java.lang.Object

A simple command-line application to generate CDX files from local crawl.log files.
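The class does not document how a duplicate record is recognised in a crawl.log. As a hedged illustration only, this sketch assumes the Heritrix crawl.log layout (whitespace-separated fields, with the annotations in the twelfth field) and a duplicate annotation of the assumed shape `duplicate:"<archive file>,<offset>"`. The class `DuplicateLineSketch` and its method are hypothetical and are not part of NetarchiveSuite.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch, not NetarchiveSuite code: spot a duplicate record in one crawl.log line. */
public class DuplicateLineSketch {

    // Assumed Heritrix layout: whitespace-separated fields, annotations at index 11.
    static final int ANNOTATIONS_FIELD = 11;

    // Assumed annotation shape: duplicate:"<archive file>,<offset>"
    static final Pattern DUPLICATE =
            Pattern.compile("duplicate:\"([^\",]+),(\\d+)\"");

    /** Returns {archiveFile, offset} if the line records a duplicate, otherwise null. */
    static String[] duplicateOf(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length <= ANNOTATIONS_FIELD) {
            return null; // truncated or malformed line: treat as not a duplicate
        }
        Matcher m = DUPLICATE.matcher(fields[ANNOTATIONS_FIELD]);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String line = "2008-03-20T11:57:51.232Z 200 23131 http://example.com/ L - "
                + "text/html #042 20080320115751215+12 sha1:ABC - "
                + "duplicate:\"1-1-20080320-00001.arc,1234\",content-size:23402";
        String[] ref = duplicateOf(line);
        // Prints the archive file and offset taken from the duplicate annotation.
        System.out.println(ref[0] + " @ " + ref[1]);
    }
}
```

A regular expression is used rather than splitting the annotations on commas, because the assumed annotation itself contains a comma inside its quotes.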


Constructor Summary
DeduplicateToCDXApplication()
           
 
Method Summary
 void generateCDX(java.lang.String[] localCrawlLogs)
          Takes an array of file names (relative or full paths) of crawl.log files from which duplicate records are to be extracted.
static void main(java.lang.String[] args)
          An application to generate unsorted CDX files from the duplicate records present in one or more crawl.log files.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DeduplicateToCDXApplication

public DeduplicateToCDXApplication()
Method Detail

generateCDX

public void generateCDX(java.lang.String[] localCrawlLogs)
                 throws java.io.IOException
Takes an array of file names (relative or full paths) of crawl.log files from which duplicate records are to be extracted. Writes the CDX records for all duplicate records in these files, concatenated into a single stream, to standard out. An exception is thrown if any of the files cannot be read for any reason, or if the argument is null.

Parameters:
localCrawlLogs - an array of crawl.log file names (relative or full paths)
Throws:
java.io.FileNotFoundException - if one of the files cannot be found
java.io.IOException - if any of the files cannot be read
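The documented contract (an array of paths, concatenated output on standard out, `FileNotFoundException` for a missing file, an exception for a null argument) can be sketched as follows. This is a minimal sketch under stated assumptions: the class `GenerateCdxSketch`, the injectable `toCdx` converter and the `PrintStream` parameter are hypothetical, added so the logic can run without NetarchiveSuite. The real method's exception type for a null argument is not documented, so the `NullPointerException` raised here is a guess.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical sketch of the generateCDX contract; not the NetarchiveSuite implementation. */
public class GenerateCdxSketch {

    /**
     * Writes one line per duplicate record found in the given crawl logs, file after file.
     *
     * @param localCrawlLogs file names (relative or full paths); must not be null
     * @param toCdx hypothetical converter returning a CDX line, or null for non-duplicates
     * @param out destination stream (the documented method writes to standard out)
     */
    static void generateCdx(String[] localCrawlLogs, Function<String, String> toCdx,
                            PrintStream out) throws IOException {
        Objects.requireNonNull(localCrawlLogs, "localCrawlLogs must not be null");
        // Check every path first so a bad argument fails before any output is written.
        for (String name : localCrawlLogs) {
            if (!new File(name).isFile()) {
                throw new FileNotFoundException("Crawl log not found: " + name);
            }
        }
        // Read each file in turn; the output is the concatenation over all inputs.
        for (String name : localCrawlLogs) {
            try (BufferedReader reader =
                         Files.newBufferedReader(new File(name).toPath(), StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String cdx = toCdx.apply(line);
                    if (cdx != null) {
                        out.println(cdx);
                    }
                }
            }
        }
        out.flush();
    }
}
```

Validating all paths up front means a typo in the last argument produces no partial output; whether the documented method behaves this way is not stated, so treat it as a design choice of the sketch.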

main

public static void main(java.lang.String[] args)
                 throws java.io.IOException
An application to generate unsorted CDX files from the duplicate records present in one or more crawl.log files. The only arguments are a list of file paths. Output is written to standard out.

Parameters:
args - the file names (relative or absolute paths)
Throws:
java.io.FileNotFoundException - if one or more of the files do not exist
java.io.IOException - if any of the files cannot be read
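The output of this application is unsorted, while Wayback's CDX lookups generally expect a sorted index. The usual route is a byte-order shell sort such as `LC_ALL=C sort`; the hypothetical helper `SortCdxSketch` does the same in Java for output small enough to hold in memory, assuming UTF-8 input. It is an illustration, not a NetarchiveSuite tool.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: sort an unsorted CDX file in memory. */
public class SortCdxSketch {

    /** Returns a sorted copy; String order equals C-locale byte order for ASCII lines. */
    static List<String> sortCdxLines(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        Collections.sort(sorted);
        return sorted;
    }

    /** Usage: java SortCdxSketch unsorted.cdx sorted.cdx */
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java SortCdxSketch <unsorted.cdx> <sorted.cdx>");
            System.exit(1);
        }
        Path in = Paths.get(args[0]);
        Path out = Paths.get(args[1]);
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Files.write(out, sortCdxLines(lines), StandardCharsets.UTF_8);
    }
}
```

Holding the whole index in memory is only reasonable for small outputs; for large crawls an external sort such as `LC_ALL=C sort` is the safer choice.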