Class MetadataFileWriter

    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static java.lang.String CDX_URI_SCHEME
      Constant used in constructing URIs for CDX content.
      static int MDF_ARC
      Constant representing the ARC format.
      static int MDF_WARC
      Constant representing the WARC format.
      protected static int metadataFormat
      Constant representing the metadata format.
    • Method Summary

      Modifier and Type Method Description
      abstract void close()
      Close the metadata file writer.
      static boolean compressRecords()  
      static MetadataFileWriter createWriter​(java.io.File metadataArchiveFile)
      Create a writer that writes data to the given archive file.
      static java.net.URI getAlternateCDXURI​(long jobID, java.lang.String filename)
      Generates a URI identifying CDX info for one harvested ARC file.
      static java.net.URI getCDXURI​(java.lang.String harvestID, java.lang.String jobID, java.lang.String filename)
      Generates a URI identifying CDX info for one harvested (W)ARC file.
      abstract java.io.File getFile()  
      static java.lang.String getMetadataArchiveFileName​(java.lang.String jobID, java.lang.Long harvestID)
      Generates a name for an archive (ARC/WARC) file containing metadata regarding a given job.
      protected static void initializeMetadataFormat()
      Initialize the used metadata format from settings.
      void insertFiles​(java.io.File parentDir, java.io.FilenameFilter filter, java.lang.String mimetype, long harvestId, long jobId)
      Append the files contained in the directory to the metadata archive file, but only if the filename matches the supplied filter.
      static void resetMetadataFormat()
      Reset the metadata format.
      abstract void write​(java.lang.String uri, java.lang.String contentType, java.lang.String hostIP, long fetchBeginTimeStamp, byte[] payload)
      Write a record to the archive file.
      abstract void writeFileTo​(java.io.File file, java.lang.String uri, java.lang.String mime)
      Write the given file to the metadata file.
      abstract boolean writeTo​(java.io.File fileToArchive, java.lang.String URL, java.lang.String mimetype)
      Writes a File to an ARCWriter if one is available; otherwise logs the failure to the class logger.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • initializeMetadataFormat

        protected static void initializeMetadataFormat()
        Initialize the used metadata format from settings.
      • getMetadataArchiveFileName

        public static java.lang.String getMetadataArchiveFileName​(java.lang.String jobID,
                                                                  java.lang.Long harvestID)
                                                           throws ArgumentNotValid
        Generates a name for an archive (ARC/WARC) file containing metadata regarding a given job.
        Parameters:
        jobID - The number of the job that generated the archive file.
        Returns:
        A "flat" file name (i.e. no path) containing the jobID parameter and ending in "-metadata-N.(w)arc", where N is the serial number of the metadata file for this job, e.g. "42-metadata-1.(w)arc". Currently, only one file is ever made.
        Throws:
        ArgumentNotValid - if any parameter was null.
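The naming pattern described above can be illustrated with a small stand-alone sketch. It does not call NetarchiveSuite: the class `MetadataNameSketch`, the method `metadataFileName`, and the explicit `extension` parameter are hypothetical. The serial number is fixed at 1 because the description says only one file is ever made. How the real method uses `harvestID` is not stated here, so the sketch leaves it out.

```java
import java.util.Objects;

// Hypothetical stand-in that mirrors the documented naming pattern;
// this is not the NetarchiveSuite implementation.
class MetadataNameSketch {
    // Only one metadata file is made per job, so the serial number is 1.
    static final int SERIAL_NUMBER = 1;

    /** Builds a flat file name (no path) from the job ID and an extension. */
    static String metadataFileName(String jobID, String extension) {
        // The real method throws ArgumentNotValid on null arguments;
        // NullPointerException is used here only to stay self-contained.
        Objects.requireNonNull(jobID, "jobID");
        Objects.requireNonNull(extension, "extension");
        return jobID + "-metadata-" + SERIAL_NUMBER + "." + extension;
    }

    public static void main(String[] args) {
        System.out.println(metadataFileName("42", "warc"));
        System.out.println(metadataFileName("42", "arc"));
    }
}
```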
      • createWriter

        public static MetadataFileWriter createWriter​(java.io.File metadataArchiveFile)
        Create a writer that writes data to the given archive file.
        Parameters:
        metadataArchiveFile - The archive file to write to.
        Returns:
        a writer that writes data to the given archive file.
      • close

        public abstract void close()
        Close the metadata file writer.
      • getFile

        public abstract java.io.File getFile()
        Returns:
        The finished metadata file.
      • writeFileTo

        public abstract void writeFileTo​(java.io.File file,
                                         java.lang.String uri,
                                         java.lang.String mime)
        Write the given file to the metadata file.
        Parameters:
        file - A given file with metadata to write to the metadata archive file.
        uri - The uri associated with the piece of metadata
        mime - The mimetype associated with the piece of metadata
      • writeTo

        public abstract boolean writeTo​(java.io.File fileToArchive,
                                        java.lang.String URL,
                                        java.lang.String mimetype)
        Writes a File to an ARCWriter if one is available; otherwise logs the failure to the class logger.
        Parameters:
        fileToArchive - the File to archive
        URL - the URL under which it is stored in the ARC file
        mimetype - The mimetype of the File-contents
        Returns:
        true if the file exists and is written to the ARC file.
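The return contract of writeTo (true only when the file exists and is written, with failures logged rather than thrown) might look roughly like the following sketch. It is not the NetarchiveSuite code: the class `WriteToSketch` is hypothetical, a plain `OutputStream` stands in for the ARC writer, and `java.util.logging` stands in for the class logger.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.logging.Logger;

// Hypothetical stand-in for the documented contract; an OutputStream
// replaces the ARC writer so the example stays self-contained.
class WriteToSketch {
    private static final Logger LOG = Logger.getLogger(WriteToSketch.class.getName());

    /** Returns true only if the file exists and was copied to the target. */
    static boolean writeTo(File fileToArchive, OutputStream target) {
        if (!fileToArchive.isFile()) {
            LOG.warning("File does not exist: " + fileToArchive);
            return false;
        }
        try {
            Files.copy(fileToArchive.toPath(), target);
            return true;
        } catch (IOException e) {
            // Failures are logged and reported through the return value.
            LOG.warning("Could not write " + fileToArchive + ": " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        File existing = File.createTempFile("writeto-sketch", ".txt");
        existing.deleteOnExit();
        System.out.println(writeTo(existing, new ByteArrayOutputStream()));
        System.out.println(writeTo(new File("no-such-file.txt"), new ByteArrayOutputStream()));
    }
}
```

Callers of such a method need to check the boolean result, since a missing file does not raise an exception.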
      • write

        public abstract void write​(java.lang.String uri,
                                   java.lang.String contentType,
                                   java.lang.String hostIP,
                                   long fetchBeginTimeStamp,
                                   byte[] payload)
                            throws java.io.IOException
        Write a record to the archive file.
        Parameters:
        uri - record URI
        contentType - content-type of record
        hostIP - resource ip-address
        fetchBeginTimeStamp - record datetime
        payload - A byte array containing the payload
        Throws:
        java.io.IOException
        See Also:
        ARCWriter.write(String uri, String contentType, String hostIP, long fetchBeginTimeStamp, long recordLength, InputStream in)
      • insertFiles

        public void insertFiles​(java.io.File parentDir,
                                java.io.FilenameFilter filter,
                                java.lang.String mimetype,
                                long harvestId,
                                long jobId)
        Append the files contained in the directory to the metadata archive file, but only if the filename matches the supplied filter.
        Parameters:
        parentDir - directory containing the files to append to metadata
        filter - filter describing which files to accept and which to ignore
        mimetype - The content-type to write along with the files in the metadata output
        harvestId - The harvestId of the harvest
        jobId - The jobId of the harvest
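The `filter` parameter is a standard `java.io.FilenameFilter`. The sketch below shows how such a filter selects files in a directory. It assumes, for illustration only, that the files of interest end in ".log"; the class `FilterSketch` and the helper `matchingNames` are hypothetical and do not reproduce what insertFiles does with the selected files.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical helper showing how a FilenameFilter narrows a directory listing.
class FilterSketch {
    // Accepts only names ending in ".log"; the suffix is an illustrative choice.
    static final FilenameFilter LOG_FILES = (dir, name) -> name.endsWith(".log");

    /** Returns the sorted names in parentDir that the filter accepts. */
    static String[] matchingNames(File parentDir, FilenameFilter filter) {
        String[] names = parentDir.list(filter);
        if (names == null) {
            // File.list returns null if parentDir is not a directory
            // or an I/O error occurs.
            return new String[0];
        }
        Arrays.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("filter-sketch").toFile();
        new File(dir, "crawl.log").createNewFile();
        new File(dir, "order.xml").createNewFile();
        System.out.println(Arrays.toString(matchingNames(dir, LOG_FILES)));
    }
}
```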
      • resetMetadataFormat

        public static void resetMetadataFormat()
        Reset the metadata format. Should only be used by unit tests.
      • getCDXURI

        public static java.net.URI getCDXURI​(java.lang.String harvestID,
                                             java.lang.String jobID,
                                             java.lang.String filename)
                                      throws ArgumentNotValid,
                                             UnknownID
        Generates a URI identifying CDX info for one harvested (W)ARC file. In Netarkivet, all of the parameters below are in the (W)ARC file's name.
        Parameters:
        harvestID - The number of the harvest that generated the (W)ARC file.
        jobID - The number of the job that generated the (W)ARC file.
        filename - The name of the ARC or WARC file behind the CDX data.
        Returns:
        A URI in the proprietary scheme "metadata".
        Throws:
        ArgumentNotValid - if any parameter is null.
        UnknownID - if the URI cannot be constructed.
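As a rough illustration of building a URI with the "metadata" scheme from these three parameters, the sketch below uses only `java.net.URI`. The host `example.org`, the path `/cdx`, and the query parameter names are assumptions made for the example, not the layout NetarchiveSuite actually produces. The class `CdxUriSketch` is hypothetical, and standard JDK exceptions stand in for ArgumentNotValid and UnknownID.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch; host, path, and query layout are assumptions.
class CdxUriSketch {
    static final String SCHEME = "metadata";

    /** Builds a URI in the "metadata" scheme carrying the three identifiers. */
    static URI cdxUri(String harvestID, String jobID, String filename) {
        if (harvestID == null || jobID == null || filename == null) {
            // Stand-in for ArgumentNotValid.
            throw new IllegalArgumentException("No parameter may be null");
        }
        String query = "harvestid=" + harvestID
                + "&jobid=" + jobID
                + "&filename=" + filename;
        try {
            // URI(scheme, authority, path, query, fragment) quotes illegal characters.
            return new URI(SCHEME, "example.org", "/cdx", query, null);
        } catch (URISyntaxException e) {
            // Stand-in for UnknownID.
            throw new IllegalStateException("Could not build CDX URI", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cdxUri("1", "42", "42-1-file.warc"));
    }
}
```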
      • getAlternateCDXURI

        public static java.net.URI getAlternateCDXURI​(long jobID,
                                                      java.lang.String filename)
                                               throws ArgumentNotValid,
                                                      UnknownID
        Generates a URI identifying CDX info for one harvested ARC file.
        Parameters:
        jobID - The number of the job that generated the ARC file.
        filename - The name of the ARC file behind the CDX data.
        Returns:
        A URI in the proprietary scheme "metadata".
        Throws:
        ArgumentNotValid - if any parameter is null.
        UnknownID - if the URI cannot be constructed.
      • compressRecords

        public static boolean compressRecords()
        Returns:
        true if metadata records should be compressed, false otherwise.