Class MetadataFileWriter

    • Field Detail

      • MDF_ARC

        public static final int MDF_ARC
        Constant representing the ARC format.
        See Also:
        Constant Field Values
      • MDF_WARC

        public static final int MDF_WARC
        Constant representing the WARC format.
        See Also:
        Constant Field Values
      • metadataFormat

        protected static int metadataFormat
        The metadata format currently in use. Recognized formats are MDF_ARC and MDF_WARC.
      • CDX_URI_SCHEME

        protected static final String CDX_URI_SCHEME
        Constant used when constructing URIs for CDX content.
        See Also:
        Constant Field Values
    • Constructor Detail

      • MetadataFileWriter

        public MetadataFileWriter()
    • Method Detail

      • initializeMetadataFormat

        protected static void initializeMetadataFormat()
        Initialize the used metadata format from settings.
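        As an illustration of how a settings value might be mapped onto MDF_ARC or MDF_WARC, here is a minimal sketch. The class MetadataFormatSketch, the method parseFormat, the numeric constant values, and the accepted strings "arc" and "warc" are all assumptions made for this example; the settings key and the values actually read by initializeMetadataFormat are not given in this page.

```java
import java.util.Locale;

/** Hypothetical stand-in for illustration; not the NetarchiveSuite implementation. */
final class MetadataFormatSketch {
    // Assumed values; the real ones are listed under "Constant Field Values".
    static final int MDF_ARC = 1;
    static final int MDF_WARC = 2;

    /** Maps a settings string (assumed to be "arc" or "warc") to a format constant. */
    static int parseFormat(String setting) {
        if (setting == null) {
            throw new IllegalArgumentException("setting must not be null");
        }
        // Normalize whitespace and case before comparing.
        switch (setting.trim().toLowerCase(Locale.ROOT)) {
            case "arc":
                return MDF_ARC;
            case "warc":
                return MDF_WARC;
            default:
                throw new IllegalArgumentException("Unrecognized metadata format: " + setting);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseFormat("warc") == MDF_WARC);
    }
}
```

        Rejecting unknown values up front, rather than defaulting silently, is a design choice of this sketch only.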
      • getMetadataArchiveFileName

        public static String getMetadataArchiveFileName(String jobID,
                                                        Long harvestID)
                                                 throws ArgumentNotValid
        Generates a name for an archive (ARC/WARC) file containing metadata regarding a given job.
        Parameters:
        jobID - The number of the job that generated the archive file.
        harvestID - The ID of the harvest that the job belongs to.
        Returns:
        A "flat" file name (i.e. no path) containing the jobID parameter and ending in "-metadata-N.(w)arc", where N is the serial number of the metadata files for this job, e.g. "42-metadata-1.(w)arc". Currently, only one file is ever made.
        Throws:
        ArgumentNotValid - if any parameter is null.
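        The naming pattern described above can be sketched as follows. MetadataFileNameSketch and metadataFileName are hypothetical names, and the explicit extension parameter is an assumption (the real method presumably derives the extension from the metadata format). The sketch only null-checks harvestID, and it throws IllegalArgumentException where the real method throws ArgumentNotValid.

```java
/** Hypothetical helper mirroring the documented naming pattern. */
final class MetadataFileNameSketch {
    /**
     * Builds "<jobID>-metadata-1.<extension>".
     * The extension argument ("arc" or "warc") is an assumption of this sketch.
     */
    static String metadataFileName(String jobID, Long harvestID, String extension) {
        if (jobID == null || harvestID == null || extension == null) {
            throw new IllegalArgumentException("no parameter may be null");
        }
        // harvestID is only validated here; this sketch does not put it in the name.
        int serial = 1; // currently only one metadata file is ever made per job
        return jobID + "-metadata-" + serial + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(metadataFileName("42", 7L, "warc"));
    }
}
```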
      • createWriter

        public static MetadataFileWriter createWriter(File metadataArchiveFile)
        Create a writer that writes data to the given archive file.
        Parameters:
        metadataArchiveFile - The archive file to write to.
        Returns:
        a writer that writes data to the given archive file.
      • close

        public abstract void close()
        Close the metadata file writer.
      • getFile

        public abstract File getFile()
        Returns:
        The finished metadata file.
      • writeFileTo

        public abstract void writeFileTo(File file,
                                         String uri,
                                         String mime)
        Write the given file to the metadata file.
        Parameters:
        file - A given file with metadata to write to the metadata archive file.
        uri - The URI associated with the piece of metadata.
        mime - The MIME type associated with the piece of metadata.
      • writeTo

        public abstract boolean writeTo(File fileToArchive,
                                        String URL,
                                        String mimetype)
        Writes a File to an ARCWriter if one is available; otherwise logs the failure to the class logger.
        Parameters:
        fileToArchive - the File to archive
        URL - the URL with which it is stored in the arcfile
        mimetype - The mimetype of the File-contents
        Returns:
        true if the file exists and is written to the arcfile.
      • write

        public abstract void write(String uri,
                                   String contentType,
                                   String hostIP,
                                   long fetchBeginTimeStamp,
                                   byte[] payload)
                            throws IOException
        Write a record to the archive file.
        Parameters:
        uri - record URI
        contentType - content-type of record
        hostIP - resource IP address
        fetchBeginTimeStamp - record datetime
        payload - A byte array containing the payload
        Throws:
        IOException
        See Also:
        ARCWriter.write(String uri, String contentType, String hostIP, long fetchBeginTimeStamp, long recordLength, InputStream in)
      • insertFiles

        public void insertFiles(File parentDir,
                                FilenameFilter filter,
                                String mimetype,
                                long harvestId,
                                long jobId)
        Append the files contained in the directory to the metadata archive file, but only if the filename matches the supplied filter.
        Parameters:
        parentDir - directory containing the files to append to metadata
        filter - filter describing which files to accept and which to ignore
        mimetype - The content-type to write along with the files in the metadata output
        harvestId - The harvestId of the harvest
        jobId - The jobId of the harvest
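        The filter-driven selection described above can be sketched with the standard java.io API. InsertFilesSketch, selectFiles, and demo are hypothetical names. The sketch only shows how a FilenameFilter narrows the directory listing; writing each selected file to the metadata archive is left out, and the restriction to regular files and the sort order are assumptions rather than documented behaviour.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of filter-based file selection. */
final class InsertFilesSketch {
    /** Returns the regular files in parentDir whose names the filter accepts, sorted by path. */
    static List<File> selectFiles(File parentDir, FilenameFilter filter) {
        List<File> selected = new ArrayList<>();
        // listFiles returns null if parentDir is not a readable directory.
        File[] matches = parentDir.listFiles(filter);
        if (matches == null) {
            return selected;
        }
        Arrays.sort(matches);
        for (File candidate : matches) {
            if (candidate.isFile()) {
                selected.add(candidate);
            }
        }
        return selected;
    }

    /** Builds a throwaway directory and returns the names accepted by a ".log" filter. */
    static List<String> demo() {
        try {
            File dir = Files.createTempDirectory("metadata-sketch").toFile();
            Files.createFile(new File(dir, "crawl.log").toPath());
            Files.createFile(new File(dir, "notes.txt").toPath());
            List<String> names = new ArrayList<>();
            for (File f : selectFiles(dir, (d, name) -> name.endsWith(".log"))) {
                names.add(f.getName());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```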
      • resetMetadataFormat

        public static void resetMetadataFormat()
        Reset the metadata format. Should only be used by unit tests.
      • getCDXURI

        public static URI getCDXURI(String harvestID,
                                    String jobID,
                                    String filename)
                             throws ArgumentNotValid,
                                    UnknownID
        Generates a URI identifying CDX info for one harvested (W)ARC file. In Netarkivet, all of the parameters below are in the (W)ARC file's name.
        Parameters:
        harvestID - The number of the harvest that generated the (W)ARC file.
        jobID - The number of the job that generated the (W)ARC file.
        filename - The name of the ARC or WARC file behind the CDX data.
        Returns:
        A URI in the proprietary scheme "metadata".
        Throws:
        ArgumentNotValid - if any parameter is null.
        UnknownID - if something goes terribly wrong in our URI construction.
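        To make the shape of such a URI concrete, here is a minimal sketch using java.net.URI. Only the "metadata" scheme comes from the description above; the class CdxUriSketch, the method cdxUri, the authority "example.org", the path "/cdx", and the query parameter names are hypothetical and do not reproduce the layout the real method emits. The sketch throws standard Java exceptions where the real method throws ArgumentNotValid and UnknownID.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical illustration of building a URI in the "metadata" scheme. */
final class CdxUriSketch {
    static final String CDX_URI_SCHEME = "metadata";

    static URI cdxUri(String harvestID, String jobID, String filename) {
        if (harvestID == null || jobID == null || filename == null) {
            throw new IllegalArgumentException("no parameter may be null");
        }
        // Assumed query layout; the real parameter names are not documented here.
        String query = "harvestid=" + harvestID + "&jobid=" + jobID + "&filename=" + filename;
        try {
            // The multi-argument constructor quotes illegal characters in each component.
            return new URI(CDX_URI_SCHEME, "example.org", "/cdx", query, null);
        } catch (URISyntaxException e) {
            throw new IllegalStateException("Could not construct CDX URI", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cdxUri("1", "42", "42-1.warc"));
    }
}
```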
      • getAlternateCDXURI

        public static URI getAlternateCDXURI(long jobID,
                                             String filename)
                                      throws ArgumentNotValid,
                                             UnknownID
        Generates a URI identifying CDX info for one harvested ARC file.
        Parameters:
        jobID - The number of the job that generated the ARC file.
        filename - The name of the ARC file behind the CDX data.
        Returns:
        A URI in the proprietary scheme "metadata".
        Throws:
        ArgumentNotValid - if any parameter is null.
        UnknownID - if something goes terribly wrong in our URI construction.
      • compressRecords

        public static boolean compressRecords()
        Returns:
        true if metadata records should be compressed, false otherwise.