java.lang.Object
- dk.netarkivet.common.utils.batch.FileBatchJob
  - dk.netarkivet.common.utils.warc.WARCBatchJob
    - dk.netarkivet.common.utils.cdx.WARCExtractCDXJob

All Implemented Interfaces:

Serializable
```
public class WARCExtractCDXJob
extends WARCBatchJob
```
Batch job that extracts information to create a CDX file.
A CDX file consists of sorted lines of metadata from the WARC files. Each line also gives the file and the offset at which the record was found and, optionally, a checksum. The timeout of this job is 7 days. See http://www.archive.org/web/researcher/cdx_file_format.php
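As a rough illustration of the output described above, the sketch below builds space-separated CDX-style lines that carry the file name and offset of each record, plus an optional checksum, and then sorts them. The `CdxEntry` record, its field selection (URL, date, MIME type) and the field order are hypothetical assumptions for illustration only; they are not the exact field set this job writes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one CDX entry per WARC record.
// The field selection and order are illustrative assumptions, not this job's actual format.
record CdxEntry(String url, String date, String mimeType,
                String fileName, long offset, String checksum) {

    // Formats the entry as one space-separated line; the checksum is appended only if present.
    String toLine() {
        StringBuilder sb = new StringBuilder()
                .append(url).append(' ')
                .append(date).append(' ')
                .append(mimeType).append(' ')
                .append(fileName).append(' ')
                .append(offset);
        if (checksum != null) {
            sb.append(' ').append(checksum);
        }
        return sb.toString();
    }
}

final class CdxLines {
    private CdxLines() {}

    // Converts entries to lines and sorts them with plain String ordering.
    static List<String> sortedLines(List<CdxEntry> entries) {
        List<String> lines = new ArrayList<>();
        for (CdxEntry e : entries) {
            lines.add(e.toLine());
        }
        lines.sort(null); // null comparator means natural (String) ordering
        return lines;
    }
}
```

Plain `String` ordering is used here for simplicity; whether a given CDX consumer expects exactly this ordering is an assumption to check against the CDX format description.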

See Also:

Serialized Form

Nested Class Summary
- Nested classes/interfaces inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
  FileBatchJob.ExceptionOccurrence

Field Summary
- Fields inherited from class dk.netarkivet.common.utils.warc.WARCBatchJob
  noOfRecordsProcessed
- Fields inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
  batchJobTimeout, exceptions, filesFailed, noOfFilesProcessed
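The class description gives this job a timeout of 7 days. Assuming the inherited `batchJobTimeout` field is expressed in milliseconds (an assumption; check `getBatchJobTimeout()` in the NetarchiveSuite version you use), the value works out as in this small sketch. `TimeoutExample` is a hypothetical helper.

```java
import java.time.Duration;

// Hypothetical helper: converts the documented 7-day timeout to milliseconds,
// assuming batchJobTimeout is measured in milliseconds.
final class TimeoutExample {
    // 7 days * 24 h * 60 min * 60 s * 1000 ms = 604,800,000 ms
    static final long SEVEN_DAYS_MILLIS = Duration.ofDays(7).toMillis();

    private TimeoutExample() {}

    public static void main(String[] args) {
        System.out.println(SEVEN_DAYS_MILLIS);
    }
}
```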

Constructor Summary

Constructors
| Constructor | Description |
|---|---|
| `WARCExtractCDXJob()` | Equivalent to `WARCExtractCDXJob(true)`. |
| `WARCExtractCDXJob(boolean includeChecksum)` | Constructs a new job for extracting CDX indexes. |

Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| `void` | `finish(OutputStream os)` | End of the batch job. |
| `WARCBatchFilter` | `getFilter()` | Filters out all records that are not response records. |
| `void` | `initialize(OutputStream os)` | Initialize any data needed (none). |
| `void` | `processRecord(org.archive.io.warc.WARCRecord sar, OutputStream os)` | Processes this entry, writing its metadata to the output stream. |
| `String` | `toString()` | Returns a human-readable description of this instance. |

Methods inherited from class dk.netarkivet.common.utils.warc.WARCBatchJob
getExceptionArray, handleException, noOfRecordsProcessed, processFile

Methods inherited from class dk.netarkivet.common.utils.batch.FileBatchJob
addException, addFinishException, addInitializeException, getBatchJobTimeout, getExceptions, getFilenamePattern, getFilesFailed, getNoOfFilesProcessed, maxExceptionsReached, postProcess, processOnlyFileNamed, processOnlyFilesMatching, processOnlyFilesMatching, processOnlyFilesNamed, setBatchJobTimeout

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - WARCExtractCDXJob
```
public WARCExtractCDXJob(boolean includeChecksum)
```
    Constructs a new job for extracting CDX indexes.
    
    Parameters:
    
    includeChecksum - If true, an MD5 checksum is also written for each record. If false, it is not.
  - WARCExtractCDXJob
```
public WARCExtractCDXJob()
```
    Equivalent to WARCExtractCDXJob(true).
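When `includeChecksum` is true, an MD5 checksum is written for each record. The sketch below shows one way to compute a lowercase hexadecimal MD5 digest of a record's bytes with the JDK's `MessageDigest`. Which bytes of a record are hashed, and the hex encoding, are assumptions rather than a description of this class's internals; `RecordChecksum` is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: MD5 of a record's content, returned as lowercase hex.
// Hashing the whole stream is an assumption about what "per record" covers.
final class RecordChecksum {
    private RecordChecksum() {}

    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            // Feed the stream through the digest in chunks.
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support MD5, so this should not happen.
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```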
- Method Detail
  - getFilter
```
public WARCBatchFilter getFilter()
```
    Filters out all records that are not response records.
    
    Overrides:
    
    getFilter in class WARCBatchJob
    
    Returns:
    
    The filter that defines what WARC records are wanted in the output CDX file.
    
    See Also:
    
    WARCBatchJob.getFilter()
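The filter returned by `getFilter()` restricts processing to WARC response records. As a self-contained sketch of that idea, the predicate below keeps only records whose `WARC-Type` header is `response`. `SimpleRecord` and `ResponseFilter` are hypothetical stand-ins for a parsed WARC record and its filter; they are not `WARCBatchFilter` itself.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for a parsed WARC record: only its header fields.
record SimpleRecord(Map<String, String> headers) {}

final class ResponseFilter {
    private ResponseFilter() {}

    // Keeps a record only when its WARC-Type header is "response".
    static final Predicate<SimpleRecord> RESPONSE_ONLY =
            r -> "response".equals(r.headers().get("WARC-Type"));

    static List<SimpleRecord> keepResponses(List<SimpleRecord> records) {
        return records.stream().filter(RESPONSE_ONLY).collect(Collectors.toList());
    }
}
```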
  - initialize
```
public void initialize(OutputStream os)
```
    Initialize any data needed (none).
    
    Specified by:
    
    initialize in class WARCBatchJob
    
    Parameters:
    
    os - The OutputStream to which output data is written
    
    See Also:
    
    WARCBatchJob.initialize(OutputStream)
  - processRecord
```
public void processRecord(org.archive.io.warc.WARCRecord sar,
                          OutputStream os)
```
    Processes this entry, writing its metadata to the output stream.
    
    Specified by:
    
    processRecord in class WARCBatchJob
    
    Parameters:
    
    sar - the WARC record to be processed.
    
    os - The OutputStream to which output data is written
    
    Throws:
    
    IOFailure - on trouble reading WARC record data
    
    See Also:
    
    WARCBatchJob.processRecord(WARCRecord, OutputStream)
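`processRecord` writes the metadata for each record to the supplied `OutputStream` and reports trouble as `IOFailure`. The sketch below shows only the writing side: it encodes one CDX line as UTF-8 followed by a newline and rethrows any `IOException` unchecked. `CdxWriter` is hypothetical, `UncheckedIOException` stands in for NetarchiveSuite's `IOFailure` so the example compiles without that library, and the UTF-8 and `\n` choices are assumptions.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: appends one CDX line to the job's output stream.
final class CdxWriter {
    private CdxWriter() {}

    // Writes a single line terminated by '\n'. The stream is left open because
    // the caller supplies it and may write further lines to it.
    static void writeLine(String cdxLine, OutputStream os) {
        try {
            os.write(cdxLine.getBytes(StandardCharsets.UTF_8));
            os.write('\n');
        } catch (IOException e) {
            // Stand-in for IOFailure: surface I/O trouble as an unchecked exception.
            throw new UncheckedIOException("Could not write CDX line", e);
        }
    }
}
```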
  - finish
```
public void finish(OutputStream os)
```
    End of the batch job.
    
    Specified by:
    
    finish in class WARCBatchJob
    
    Parameters:
    
    os - The OutputStream to which output data is written
    
    See Also:
    
    WARCBatchJob.finish(OutputStream)
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object
    
    Returns:
    
    A human-readable description of this instance.
