Package dk.netarkivet.wayback
Class DeduplicateToCDXApplication
java.lang.Object
    dk.netarkivet.wayback.DeduplicateToCDXApplication
public class DeduplicateToCDXApplication extends java.lang.Object
A simple command-line application to generate cdx files from local crawl-log files.
Constructor Summary
DeduplicateToCDXApplication()
Method Summary
void generateCDX(java.lang.String[] localCrawlLogs)
Takes an array of file names (relative or full paths) of crawl.log files from which duplicate records are to be extracted.
static void main(java.lang.String[] args)
An application to generate unsorted cdx files from duplicate records present in a crawl.log file.
Constructor Detail
DeduplicateToCDXApplication
public DeduplicateToCDXApplication()
Method Detail
generateCDX
public void generateCDX(java.lang.String[] localCrawlLogs) throws java.io.IOException
Takes an array of file names (relative or full paths) of crawl.log files from which duplicate records are to be extracted. Writes the concatenated cdx files of all duplicate records in these files to standard out. An exception is thrown if any of the files cannot be read for any reason, or if the argument is null.
Parameters:
localCrawlLogs - an array of file names
Throws:
java.io.FileNotFoundException - if one of the files cannot be found
java.io.IOException
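The contract described for generateCDX (null check, one pass over each named file, output to a stream, FileNotFoundException for a missing file) can be illustrated with a minimal, self-contained sketch. This is not the NetarchiveSuite implementation: the class name DuplicateLineFilterSketch, the method writeDuplicateLines, the "duplicate:" marker, and the choice of IllegalArgumentException for a null argument are all assumptions made for illustration. The sketch copies matching crawl.log lines unchanged and omits the conversion to cdx format.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

/** Hypothetical stand-in for illustration only; not the NetarchiveSuite class. */
public class DuplicateLineFilterSketch {

    // Assumption: duplicate records are flagged by this marker in the crawl.log line.
    public static final String DUPLICATE_MARKER = "duplicate:";

    /**
     * Copies every line containing the marker from each named file to the given
     * stream and returns the number of lines written. No cdx conversion is done.
     */
    public static int writeDuplicateLines(String[] localCrawlLogs, PrintStream out)
            throws IOException {
        if (localCrawlLogs == null) {
            // Assumption: the documented "exception" for null is modelled as this type.
            throw new IllegalArgumentException("localCrawlLogs must not be null");
        }
        int written = 0;
        for (String name : localCrawlLogs) {
            File file = new File(name); // relative or absolute path
            if (!file.isFile()) {
                throw new FileNotFoundException("Cannot find crawl log: " + name);
            }
            try (BufferedReader reader =
                     Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.contains(DUPLICATE_MARKER)) {
                        out.println(line);
                        written++;
                    }
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        int count = writeDuplicateLines(args, System.out);
        System.err.println(count + " duplicate line(s) written");
    }
}
```

Passing the output stream as a parameter, rather than writing to System.out directly, is a choice made here so the sketch can be exercised without redirecting standard out.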
main
public static void main(java.lang.String[] args) throws java.io.IOException
An application to generate unsorted cdx files from duplicate records present in a crawl.log file. The only parameters are a list of file paths. Output is written to standard out.
Parameters:
args - the file names (relative or absolute paths)
Throws:
java.io.FileNotFoundException - if one or more of the files does not exist
java.io.IOException
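Because main writes its result to standard out, a caller that invokes it from Java code may want to capture that output in a file. The following is a minimal sketch of one way to do so with System.setOut. The names StdoutCaptureSketch, EntryPoint, and runToFile are hypothetical, and a stand-in lambda replaces the real application so that the example runs without NetarchiveSuite on the classpath.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of NetarchiveSuite. */
public class StdoutCaptureSketch {

    /** Hypothetical shape matching a main(String[]) that may throw IOException. */
    @FunctionalInterface
    public interface EntryPoint {
        void run(String[] args) throws IOException;
    }

    /** Runs the entry point with System.out redirected to target, then restores it. */
    public static void runToFile(EntryPoint entryPoint, String[] args, Path target)
            throws IOException {
        PrintStream original = System.out;
        try (OutputStream file = Files.newOutputStream(target);
             PrintStream redirected = new PrintStream(file, true, "UTF-8")) {
            System.setOut(redirected);
            try {
                entryPoint.run(args);
            } finally {
                System.setOut(original); // restore even if the entry point throws
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in entry point that echoes its arguments. With the library on the
        // classpath, DeduplicateToCDXApplication::main has the same shape.
        Path target = Files.createTempFile("unsorted", ".cdx");
        runToFile(a -> {
            for (String s : a) {
                System.out.println(s);
            }
        }, args, target);
        System.out.println("Output captured in " + target);
    }
}
```

The captured output is unsorted, as stated above, so any sorting has to happen as a separate step after capture.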