Package dk.netarkivet.common.utils.cdx
Class CDXUtils
- java.lang.Object
  - dk.netarkivet.common.utils.cdx.CDXUtils
public class CDXUtils extends Object
Utility class for creating CDX-files. The CDX-format is described here: http://www.archive.org/web/researcher/cdx_file_format.php
Constructor Summary
Constructors
- CDXUtils()
Method Summary
- static void generateCDX(ArchiveProfile archiveProfile, File archiveFileDirectory, File cdxFileDirectory)
  Applies createCDXRecord() to all ARC/WARC files in a directory, creating one CDX file per ARC/WARC file.
- static void writeCDXInfo(File archivefile, OutputStream cdxstream)
  Add cdx info for a given archive file to a given OutputStream.
Method Detail
writeCDXInfo
public static void writeCDXInfo(File archivefile, OutputStream cdxstream)
Adds CDX info for a given archive file to a given OutputStream. Note: any exceptions are logged at level FINE but otherwise ignored.
Parameters:
archivefile - A file with archive records
cdxstream - An output stream to add CDX lines to
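The "logged at level FINE but otherwise ignored" behaviour can be pictured with the minimal sketch below. It is not the NetarchiveSuite implementation: the class QuietCdxWriterSketch, the LineSource interface, and the boolean return value are all hypothetical and exist only to make the pattern visible. The real method returns void, so a caller gets no failure signal.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical illustration of a log-and-ignore writer; not part of NetarchiveSuite. */
final class QuietCdxWriterSketch {
    private static final Logger LOG =
            Logger.getLogger(QuietCdxWriterSketch.class.getName());

    /** Hypothetical stand-in for whatever produces CDX lines from an archive file. */
    interface LineSource {
        void writeTo(OutputStream out) throws IOException;
    }

    /**
     * Runs the source against the stream. Any failure is logged at FINE and
     * swallowed, so nothing propagates to the caller. The boolean result is an
     * addition of this sketch: true if the source finished without an exception.
     */
    static boolean writeQuietly(LineSource source, OutputStream out) {
        try {
            source.writeTo(out);
            return true;
        } catch (IOException | RuntimeException e) {
            // Bytes written before the failure stay in the stream.
            LOG.log(Level.FINE, "Ignoring failure while writing CDX info", e);
            return false;
        }
    }
}
```

In this sketch, output written before a failure remains in the stream, and with the default java.util.logging configuration a FINE message is not printed. If writeCDXInfo behaves similarly, a caller that needs to detect problems would have to inspect the produced CDX lines or raise the log level.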
generateCDX
public static void generateCDX(ArchiveProfile archiveProfile, File archiveFileDirectory, File cdxFileDirectory) throws ArgumentNotValid
Applies createCDXRecord() to all ARC/WARC files in a directory, creating one CDX file per ARC/WARC file. Note: any exceptions during index generation are logged at level FINE but otherwise ignored. Exceptions while creating any CDX file are logged at level WARNING but otherwise ignored. CDX files are named after the ARC/WARC files, except that ".(w)arc" or ".(w)arc.gz" is extended with ".cdx".
Parameters:
archiveProfile - archive profile including filters, patterns, etc.
archiveFileDirectory - A directory with archive files to generate an index for
cdxFileDirectory - A directory to generate CDX files in
Throws:
ArgumentNotValid - if either directory is null or is not an existing directory, or if cdxFileDirectory is not writable.
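The argument checks listed under Throws can be sketched as follows. This is an illustration of the documented conditions only, not the library's code: the class CdxDirectoryChecksSketch and its methods are hypothetical, and IllegalArgumentException stands in for ArgumentNotValid so that the example compiles without NetarchiveSuite on the classpath.

```java
import java.io.File;

/** Hypothetical illustration of the documented precondition checks. */
final class CdxDirectoryChecksSketch {

    /**
     * Rejects a null or non-directory argument for either parameter, and a
     * non-writable cdxFileDirectory. IllegalArgumentException is a stand-in
     * for ArgumentNotValid (an assumption of this sketch).
     */
    static void checkDirectories(File archiveFileDirectory, File cdxFileDirectory) {
        requireExistingDirectory(archiveFileDirectory, "archiveFileDirectory");
        requireExistingDirectory(cdxFileDirectory, "cdxFileDirectory");
        if (!cdxFileDirectory.canWrite()) {
            throw new IllegalArgumentException(
                    "cdxFileDirectory is not writable: " + cdxFileDirectory);
        }
    }

    private static void requireExistingDirectory(File dir, String name) {
        if (dir == null) {
            throw new IllegalArgumentException(name + " must not be null");
        }
        // File.isDirectory() is true only if the path exists and is a directory.
        if (!dir.isDirectory()) {
            throw new IllegalArgumentException(
                    name + " is not an existing directory: " + dir);
        }
    }
}
```

Running a check like this before calling generateCDX lets a caller report a bad directory with its own message instead of handling ArgumentNotValid afterwards.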