Class MetadataEntry

  • All Implemented Interfaces:
    Serializable

    public class MetadataEntry
    extends Object
    implements Serializable
    Class used to carry metadata in DoOneCrawl messages, including the URL and MIME type needed to write the metadata to metadata (W)ARC files.
    See Also:
    Serialized Form
    • Constructor Detail

      • MetadataEntry

        public MetadataEntry​(String url,
                             String mimeType,
                             String data)
        Constructor for this class.
        Parameters:
        url - the URL assigned to this metadata (needed for it to be searchable)
        mimeType - the mimeType for this metadata (normally text/plain or text/xml)
        data - the metadata itself
        Throws:
        ArgumentNotValid - if any argument is null or an empty string, if url is not a valid URL, or if mimeType is not a valid MIME type
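The argument checks described above can be illustrated with a minimal sketch. MetadataEntrySketch is a hypothetical stand-in, not the real class: it throws IllegalArgumentException in place of ArgumentNotValid, and its rules for what counts as a valid URL (an absolute URI) or a valid MIME type (a "type/subtype" pattern) are assumptions, as is the choice to store the data as UTF-8 bytes.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical stand-in for MetadataEntry; not the real class. */
final class MetadataEntrySketch {
    // Assumed shape of a valid MIME type: "type/subtype" without whitespace.
    private static final Pattern MIME = Pattern.compile("[\\w.+-]+/[\\w.+-]+");

    private final String url;
    private final String mimeType;
    private final byte[] data;

    MetadataEntrySketch(String url, String mimeType, String data) {
        requireNonEmpty(url, "url");
        requireNonEmpty(mimeType, "mimeType");
        requireNonEmpty(data, "data");
        try {
            // Assumption: a valid URL is an absolute URI (it has a scheme).
            if (!new URI(url).isAbsolute()) {
                throw new IllegalArgumentException("url has no scheme: " + url);
            }
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("url is not a valid URL: " + url, e);
        }
        if (!MIME.matcher(mimeType).matches()) {
            throw new IllegalArgumentException("not a valid MIME type: " + mimeType);
        }
        this.url = url;
        this.mimeType = mimeType;
        // Assumption: the String payload is kept as UTF-8 bytes.
        this.data = data.getBytes(StandardCharsets.UTF_8);
    }

    private static void requireNonEmpty(String value, String name) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(name + " must not be null or empty");
        }
    }

    String getURL() { return url; }

    String getMimeType() { return mimeType; }

    // Return a copy so callers cannot modify the stored bytes.
    byte[] getData() { return data.clone(); }

    public static void main(String[] args) {
        MetadataEntrySketch entry = new MetadataEntrySketch(
                "metadata://example.org/crawl/notes?jobid=42", "text/plain", "crawl notes");
        System.out.println(entry.getURL() + " (" + entry.getMimeType() + ", "
                + entry.getData().length + " bytes)");
    }
}
```

Validating in the constructor means an invalid entry can never exist, so code that later writes the entry to a (W)ARC file does not need to re-check it.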
    • Method Detail

      • makeAliasMetadataEntry

        public static MetadataEntry makeAliasMetadataEntry​(List<AliasInfo> aliases,
                                                           Long origHarvestDefinitionID,
                                                           int harvestNum,
                                                           Long jobId)
        Generate a MetadataEntry from a list of AliasInfo objects (VERSION 2). Expired aliases are skipped by this method.
        Parameters:
        aliases - the list of aliases (possibly empty)
        origHarvestDefinitionID - the ID of the harvest definition behind the job with the given jobId
        harvestNum - the number of the harvest that the job with the given jobId belongs to
        jobId - the ID of the job that this metadata belongs to
        Returns:
        null if the list is empty (or consists only of expired aliases); otherwise a MetadataEntry built from the unexpired aliases in the list.
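The "skip expired aliases, return null when nothing is left" contract can be sketched as follows. The Alias class is a hypothetical, simplified stand-in for AliasInfo, and the one-line-per-alias text format is an assumption made for illustration; the sketch returns only the payload text rather than a full MetadataEntry.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class AliasMetadataSketch {
    /** Hypothetical, simplified stand-in for AliasInfo. */
    static final class Alias {
        final String domain;
        final String aliasOf;
        final boolean expired;

        Alias(String domain, String aliasOf, boolean expired) {
            this.domain = domain;
            this.aliasOf = aliasOf;
            this.expired = expired;
        }
    }

    /** Returns the payload text, or null if no unexpired alias remains. */
    static String aliasPayload(List<Alias> aliases) {
        StringBuilder sb = new StringBuilder();
        for (Alias alias : aliases) {
            if (alias.expired) {
                continue; // expired aliases are skipped
            }
            // Assumed line format: one alias relation per line.
            sb.append(alias.domain).append(" is an alias for ")
              .append(alias.aliasOf).append('\n');
        }
        return sb.length() == 0 ? null : sb.toString();
    }

    public static void main(String[] args) {
        List<Alias> aliases = Arrays.asList(
                new Alias("old.example", "main.example", true),
                new Alias("www2.example", "main.example", false));
        System.out.print(aliasPayload(aliases));
        System.out.println(aliasPayload(Collections.<Alias>emptyList()));
    }
}
```

Because the method can return null, a caller would typically check the result before adding it to the list of metadata entries for a job.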
      • makeDuplicateReductionMetadataEntry

        public static MetadataEntry makeDuplicateReductionMetadataEntry​(List<Long> jobIDsForDuplicateReduction,
                                                                        Long origHarvestDefinitionID,
                                                                        int harvestNum,
                                                                        Long jobId)
        Generate a MetadataEntry from a list of job ids for duplicate reduction.
        Parameters:
        jobIDsForDuplicateReduction - the list of job IDs (possibly empty)
        origHarvestDefinitionID - the ID of the harvest definition behind the job with the given jobId
        harvestNum - the number of the harvest that the job with the given jobId belongs to
        jobId - the ID of the job that this metadata belongs to
        Returns:
        null if the list is empty; otherwise a MetadataEntry built from the list of job IDs.
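As a rough illustration of how such an entry might be assembled, the sketch below joins the job IDs into a text payload and builds a metadata URL from the harvest and job identifiers. The URL prefix, the query parameter names, and the comma-separated payload are all assumptions, not the documented format; the prefix test at the end shows one possible way a check like isDuplicateReductionMetadataEntry could recognize such an entry.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class DuplicateReductionSketch {
    // Hypothetical URL prefix; the real metadata URL scheme may differ.
    static final String PREFIX =
            "metadata://example.org/crawl/setup/duplicatereductionjobs";

    /** Returns the job IDs joined by commas, or null for an empty list. */
    static String payload(List<Long> jobIds) {
        if (jobIds.isEmpty()) {
            return null;
        }
        return jobIds.stream().map(String::valueOf).collect(Collectors.joining(","));
    }

    /** Builds an illustrative metadata URL; parameter names are assumed. */
    static String url(long harvestDefinitionId, int harvestNum, long jobId) {
        return PREFIX + "?harvestid=" + harvestDefinitionId
                + "&harvestnum=" + harvestNum + "&jobid=" + jobId;
    }

    /** One possible recognition rule: the URL starts with the known prefix. */
    static boolean isDuplicateReductionUrl(String url) {
        return url.startsWith(PREFIX);
    }

    public static void main(String[] args) {
        System.out.println(url(7L, 2, 42L));
        System.out.println(payload(Arrays.asList(1L, 2L, 3L)));
        System.out.println(payload(Collections.<Long>emptyList()));
    }
}
```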
      • getData

        public byte[] getData()
        Returns:
        the metadata as a byte array.
      • getMimeType

        public String getMimeType()
        Returns:
        the MIME type of this metadata.
      • getURL

        public String getURL()
        Returns:
        the URL assigned to this metadata.
      • isDuplicateReductionMetadataEntry

        public boolean isDuplicateReductionMetadataEntry()
        Checks whether this is a duplicate reduction MetadataEntry.
        Returns:
        true if this is a duplicate reduction MetadataEntry; otherwise false.
      • toString

        public String toString()
        Overrides:
        toString in class Object
        Returns:
        a string representation of this object
      • storeMetadataToDisk

        public static void storeMetadataToDisk​(List<MetadataEntry> metadata,
                                               File destinationDir)
        Store a list of metadata entries to disk.
        Parameters:
        metadata - the metadata entries to store
        destinationDir - the directory in which to store the metadata
      • getMetadataFromDisk

        public static List<MetadataEntry> getMetadataFromDisk​(File sourceDir)
        Retrieve the list of metadata entries serialized to disk.
        Parameters:
        sourceDir - the directory where the metadata is stored.
        Returns:
        the list of deserialized MetadataEntry objects.
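Since MetadataEntry is Serializable, storing and retrieving entries can be pictured as a round trip through standard Java object serialization. The sketch below is a minimal illustration under stated assumptions: Entry is a hypothetical stand-in class, the one-file-per-entry layout and file naming are invented for the example, and checked I/O exceptions are wrapped in unchecked ones as a design choice of the sketch.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class MetadataDiskSketch {
    /** Hypothetical stand-in for a Serializable MetadataEntry. */
    static final class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        final String url;
        final String mimeType;
        final String data;

        Entry(String url, String mimeType, String data) {
            this.url = url;
            this.mimeType = mimeType;
            this.data = data;
        }
    }

    // Assumed file naming: a zero-padded index followed by a fixed suffix.
    private static final String SUFFIX = "_metadata_entry.ser";

    /** Writes each entry to its own file in the given directory. */
    static void store(List<Entry> entries, File dir) {
        for (int i = 0; i < entries.size(); i++) {
            File file = new File(dir, String.format(Locale.ROOT, "%05d%s", i, SUFFIX));
            try (ObjectOutputStream out =
                    new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(entries.get(i));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Reads all entry files back, in the order they were written. */
    static List<Entry> load(File dir) {
        File[] files = dir.listFiles((d, name) -> name.endsWith(SUFFIX));
        if (files == null) {
            throw new IllegalArgumentException("not a readable directory: " + dir);
        }
        Arrays.sort(files); // zero-padded names sort in write order
        List<Entry> result = new ArrayList<>();
        for (File file : files) {
            try (ObjectInputStream in =
                    new ObjectInputStream(new FileInputStream(file))) {
                result.add((Entry) in.readObject());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return result;
    }

    /** Stores the entries in a fresh temporary directory and loads them again. */
    static List<Entry> roundTrip(List<Entry> entries) {
        try {
            File dir = Files.createTempDirectory("metadata-sketch").toFile();
            store(entries, dir);
            return load(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Entry> back = roundTrip(Arrays.asList(
                new Entry("metadata://example.org/a", "text/plain", "one"),
                new Entry("metadata://example.org/b", "text/xml", "<two/>")));
        for (Entry entry : back) {
            System.out.println(entry.url + " " + entry.mimeType);
        }
    }
}
```

Writing one file per entry keeps a partially written directory readable up to the last complete file; a single combined file would be an equally plausible layout.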