Class MetadataFileWriter
- java.lang.Object
-
- dk.netarkivet.harvester.harvesting.metadata.MetadataFileWriter
-
- Direct Known Subclasses:
MetadataFileWriterArc, MetadataFileWriterWarc
public abstract class MetadataFileWriter extends Object
Abstract base class for metadata file writers. Implementations must extend this class.
- Author:
- nicl
-
-
Field Summary
Fields
protected static String CDX_URI_SCHEME
- Constant used in constructing URIs for CDX content.
static int MDF_ARC
- Constant representing the ARC format.
static int MDF_WARC
- Constant representing the WARC format.
protected static int metadataFormat
- The metadata format in use.
-
Constructor Summary
Constructors
MetadataFileWriter()
-
Method Summary
Methods
abstract void close()
- Close the metadata file writer.
static boolean compressRecords()
static MetadataFileWriter createWriter(File metadataArchiveFile)
- Create a writer that writes data to the given archive file.
static URI getAlternateCDXURI(long jobID, String filename)
- Generates a URI identifying CDX info for one harvested ARC file.
static URI getCDXURI(String harvestID, String jobID, String filename)
- Generates a URI identifying CDX info for one harvested (W)ARC file.
abstract File getFile()
static String getMetadataArchiveFileName(String jobID, Long harvestID)
- Generates a name for an archive (ARC/WARC) file containing metadata regarding a given job.
protected static void initializeMetadataFormat()
- Initialize the used metadata format from settings.
void insertFiles(File parentDir, FilenameFilter filter, String mimetype, long harvestId, long jobId)
- Append the files contained in the directory to the metadata archive file, but only if the filename matches the supplied filter.
static void resetMetadataFormat()
- Reset the metadata format.
abstract void write(String uri, String contentType, String hostIP, long fetchBeginTimeStamp, byte[] payload)
- Write a record to the archive file.
abstract void writeFileTo(File file, String uri, String mime)
- Write the given file to the metadata file.
abstract boolean writeTo(File fileToArchive, String URL, String mimetype)
- Writes a File to an ARCWriter, if available, otherwise logs the failure to the class-logger.
-
-
-
Field Detail
-
MDF_ARC
public static final int MDF_ARC
Constant representing the ARC format.
- See Also:
- Constant Field Values
-
MDF_WARC
public static final int MDF_WARC
Constant representing the WARC format.
- See Also:
- Constant Field Values
-
metadataFormat
protected static int metadataFormat
The metadata format in use. Recognized formats are MDF_ARC and MDF_WARC.
-
CDX_URI_SCHEME
protected static final String CDX_URI_SCHEME
Constant used in constructing URIs for CDX content.
- See Also:
- Constant Field Values
-
-
Method Detail
-
initializeMetadataFormat
protected static void initializeMetadataFormat()
Initialize the used metadata format from settings.
-
getMetadataArchiveFileName
public static String getMetadataArchiveFileName(String jobID, Long harvestID) throws ArgumentNotValid
Generates a name for an archive (ARC/WARC) file containing metadata regarding a given job.
- Parameters:
jobID
- The number of the job that generated the archive file.
- Returns:
- A "flat" file name (i.e. no path) containing the jobID parameter and ending in "-metadata-N.(w)arc", where N is the serial number of the metadata files for this job, e.g. "42-metadata-1.(w)arc". Currently, only one file is ever made.
- Throws:
ArgumentNotValid
- if any parameter is null.
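The naming pattern described above can be illustrated with a minimal, self-contained sketch. The class `MetadataNameSketch` and its method are hypothetical and are not part of NetarchiveSuite. The sketch covers only the documented pattern (jobID, "-metadata-", serial number, extension) and ignores the harvestID parameter, since the text does not say how it is used.

```java
// Hypothetical helper: mirrors only the naming pattern described in the text,
// not the actual NetarchiveSuite implementation.
final class MetadataNameSketch {
    /** Serial number of the metadata file; the text says only one file is ever made. */
    static final int SERIAL = 1;

    /** Builds a flat file name such as "42-metadata-1.warc" (no path component). */
    static String metadataFileName(String jobID, boolean warc) {
        if (jobID == null) {
            // The real method throws ArgumentNotValid; a standard exception is used here.
            throw new IllegalArgumentException("jobID must not be null");
        }
        String extension = warc ? "warc" : "arc";
        return jobID + "-metadata-" + SERIAL + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(metadataFileName("42", true));  // prints 42-metadata-1.warc
        System.out.println(metadataFileName("42", false)); // prints 42-metadata-1.arc
    }
}
```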
-
createWriter
public static MetadataFileWriter createWriter(File metadataArchiveFile)
Create a writer that writes data to the given archive file.
- Parameters:
metadataArchiveFile
- The archive file to write to.
- Returns:
- a writer that writes data to the given archive file.
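Since the class has an ARC and a WARC subclass and a metadataFormat field, one plausible reading is that the factory picks the subclass from the configured format. The sketch below shows that dispatch pattern only. Every name in it (`WriterFactorySketch`, `SketchWriter`, the extra format parameter) is hypothetical, and the numeric constant values are assumed for illustration, not taken from the library.

```java
import java.io.File;

final class WriterFactorySketch {
    // Assumed values for illustration; the real ones are listed under "Constant Field Values".
    static final int MDF_ARC = 1;
    static final int MDF_WARC = 2;

    /** Hypothetical stand-in for the abstract MetadataFileWriter. */
    abstract static class SketchWriter {
        private final File file;

        SketchWriter(File file) {
            this.file = file;
        }

        File getFile() {
            return file;
        }

        abstract String formatName();
    }

    static final class ArcSketchWriter extends SketchWriter {
        ArcSketchWriter(File file) {
            super(file);
        }

        @Override
        String formatName() {
            return "ARC";
        }
    }

    static final class WarcSketchWriter extends SketchWriter {
        WarcSketchWriter(File file) {
            super(file);
        }

        @Override
        String formatName() {
            return "WARC";
        }
    }

    /** Chooses the concrete writer from the format constant. */
    static SketchWriter createWriter(File metadataArchiveFile, int metadataFormat) {
        switch (metadataFormat) {
            case MDF_ARC:
                return new ArcSketchWriter(metadataArchiveFile);
            case MDF_WARC:
                return new WarcSketchWriter(metadataArchiveFile);
            default:
                throw new IllegalArgumentException("Unknown metadata format: " + metadataFormat);
        }
    }

    public static void main(String[] args) {
        SketchWriter writer = createWriter(new File("42-metadata-1.warc"), MDF_WARC);
        System.out.println(writer.formatName() + " -> " + writer.getFile().getName());
    }
}
```

In the documented API the format is not passed as an argument; it is presumably read from the static metadataFormat field, which the sketch makes explicit to stay self-contained.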
-
close
public abstract void close()
Close the metadata file writer.
-
getFile
public abstract File getFile()
- Returns:
- the finished metadataFile
-
writeFileTo
public abstract void writeFileTo(File file, String uri, String mime)
Write the given file to the metadata file.
- Parameters:
file
- A given file with metadata to write to the metadata archive file.
uri
- The URI associated with the piece of metadata
mime
- The mimetype associated with the piece of metadata
-
writeTo
public abstract boolean writeTo(File fileToArchive, String URL, String mimetype)
Writes a File to an ARCWriter, if available, otherwise logs the failure to the class-logger.
- Parameters:
fileToArchive
- the File to archive
URL
- the URL with which it is stored in the arcfile
mimetype
- The mimetype of the File-contents
- Returns:
- true, if file exists, and is written to the arcfile.
-
write
public abstract void write(String uri, String contentType, String hostIP, long fetchBeginTimeStamp, byte[] payload) throws IOException
Write a record to the archive file.
- Parameters:
uri
- record URI
contentType
- content-type of record
hostIP
- resource IP address
fetchBeginTimeStamp
- record datetime
payload
- A byte array containing the payload
- Throws:
IOException
- See Also:
ARCWriter.write(String uri, String contentType, String hostIP, long fetchBeginTimeStamp, long recordLength, InputStream in)
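The method above takes a byte array, while the referenced ARCWriter.write overload takes a record length and an InputStream. A minimal sketch of how a byte[] payload can be adapted to that stream-based shape follows. The class `PayloadRecordSketch` and the in-memory sink are hypothetical stand-ins; no real ARC or WARC writer is involved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class PayloadRecordSketch {
    /** Hypothetical stream-based sink: copies exactly recordLength bytes from the stream. */
    static void writeStream(ByteArrayOutputStream sink, long recordLength, InputStream in)
            throws IOException {
        byte[] buffer = new byte[4096];
        long remaining = recordLength;
        while (remaining > 0) {
            int read = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (read < 0) {
                throw new IOException("Stream ended before recordLength bytes were read");
            }
            sink.write(buffer, 0, read);
            remaining -= read;
        }
    }

    /** Adapts a byte[] payload to the stream-based call: length is payload.length. */
    static void writePayload(ByteArrayOutputStream sink, byte[] payload) throws IOException {
        try (InputStream in = new ByteArrayInputStream(payload)) {
            writeStream(sink, payload.length, in);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        writePayload(sink, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(sink.size()); // prints 5
    }
}
```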
-
insertFiles
public void insertFiles(File parentDir, FilenameFilter filter, String mimetype, long harvestId, long jobId)
Append the files contained in the directory to the metadata archive file, but only if the filename matches the supplied filter.
- Parameters:
parentDir
- directory containing the files to append to metadata
filter
- filter describing which files to accept and which to ignore
mimetype
- The content-type to write along with the files in the metadata output
harvestId
- The harvestId of the harvest
jobId
- The jobId of the harvest
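The filtering step described above can be sketched with the standard java.io.FilenameFilter interface. The class `InsertFilesSketch` and its method are hypothetical. The sketch only selects the files that would be appended and does not write any archive records.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class InsertFilesSketch {
    /** Returns the regular files in parentDir whose names are accepted by the filter. */
    static List<File> selectFiles(File parentDir, FilenameFilter filter) {
        List<File> result = new ArrayList<>();
        File[] matches = parentDir.listFiles(filter);
        if (matches == null) {
            // listFiles returns null if parentDir is not a directory or an I/O error occurs.
            return result;
        }
        Arrays.sort(matches); // deterministic order, sorted by pathname
        for (File candidate : matches) {
            if (candidate.isFile()) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("metadata-sketch").toFile();
        new File(dir, "crawl.log").createNewFile();
        new File(dir, "notes.txt").createNewFile();

        FilenameFilter logsOnly = (parent, name) -> name.endsWith(".log");
        for (File file : selectFiles(dir, logsOnly)) {
            System.out.println(file.getName()); // prints crawl.log
        }
    }
}
```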
-
resetMetadataFormat
public static void resetMetadataFormat()
Reset the metadata format. Should only be used by unit tests.
-
getCDXURI
public static URI getCDXURI(String harvestID, String jobID, String filename) throws ArgumentNotValid, UnknownID
Generates a URI identifying CDX info for one harvested (W)ARC file. In Netarkivet, all of the parameters below are in the (W)ARC file's name.
- Parameters:
harvestID
- The number of the harvest that generated the (W)ARC file.
jobID
- The number of the job that generated the (W)ARC file.
filename
- The name of the ARC or WARC file behind the cdx-data
- Returns:
- A URI in the proprietary scheme "metadata".
- Throws:
ArgumentNotValid
- if any parameter is null.
UnknownID
- if something goes terribly wrong in our URI construction.
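As an illustration of building a URI in a custom "metadata" scheme from these three parameters, here is a minimal sketch using java.net.URI. The class `CdxUriSketch`, the authority `example.org`, the path `/cdx`, and the query parameter names are all hypothetical. The actual URI layout produced by the library is not given in the text.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class CdxUriSketch {
    // Scheme name taken from the Returns text; that the library constant
    // CDX_URI_SCHEME holds exactly this value is an assumption.
    static final String CDX_URI_SCHEME = "metadata";
    // Hypothetical authority and path, chosen only for this example.
    static final String AUTHORITY = "example.org";
    static final String PATH = "/cdx";

    static URI cdxUri(String harvestID, String jobID, String filename) {
        if (harvestID == null || jobID == null || filename == null) {
            // The real method throws ArgumentNotValid; a standard exception is used here.
            throw new IllegalArgumentException("No parameter may be null");
        }
        String query = "harvestid=" + harvestID + "&jobid=" + jobID + "&filename=" + filename;
        try {
            // The multi-argument constructor quotes characters that are illegal in each component.
            return new URI(CDX_URI_SCHEME, AUTHORITY, PATH, query, null);
        } catch (URISyntaxException e) {
            // Stand-in for the UnknownID case described in the text.
            throw new IllegalStateException("Failed to build CDX URI", e);
        }
    }

    public static void main(String[] args) {
        URI uri = cdxUri("7", "42", "42-7-example.warc");
        // prints metadata://example.org/cdx?harvestid=7&jobid=42&filename=42-7-example.warc
        System.out.println(uri);
    }
}
```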
-
getAlternateCDXURI
public static URI getAlternateCDXURI(long jobID, String filename) throws ArgumentNotValid, UnknownID
Generates a URI identifying CDX info for one harvested ARC file.
- Parameters:
jobID
- The number of the job that generated the ARC file.
filename
- the filename.
- Returns:
- A URI in the proprietary scheme "metadata".
- Throws:
ArgumentNotValid
- if any parameter is null.
UnknownID
- if something goes terribly wrong in our URI construction.
-
compressRecords
public static boolean compressRecords()
- Returns:
- true if we want to compress our metadata records, false if not
-
-