Class MetadataFile

  • All Implemented Interfaces:
    Comparable<MetadataFile>

    public class MetadataFile
    extends Object
    implements Comparable<MetadataFile>
    Wraps information about a Heritrix file that should be stored in the metadata ARC.

    Defines a natural ordering, so that instances can be sorted.
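The class description does not say which keys the natural ordering uses. The following is a minimal sketch that assumes a hypothetical ordering by metadata category first and file name second. `SortableMetadataFile` and its `Type` enum are illustrative stand-ins, not the real class or its actual `compareTo` logic.

```java
// Hypothetical simplified stand-in for MetadataFile. The real class's fields
// and compareTo() logic may differ; this only illustrates a natural ordering.
class SortableMetadataFile implements Comparable<SortableMetadataFile> {
    // Assumed category order: setup files, then reports, then logs.
    enum Type { SETUP, REPORT, LOG }

    final Type type;
    final String name;

    SortableMetadataFile(Type type, String name) {
        this.type = type;
        this.name = name;
    }

    @Override
    public int compareTo(SortableMetadataFile other) {
        // Primary key: category, in enum declaration order.
        int byType = type.compareTo(other.type);
        if (byType != 0) {
            return byType;
        }
        // Secondary key: file name, so ties within a category are broken deterministically.
        return name.compareTo(other.name);
    }

    @Override
    public String toString() {
        return type + ":" + name;
    }
}
```

With this assumed ordering, `Collections.sort(files)` places setup files before reports and reports before logs, and sorts files within a category alphabetically by name. Using the name as a secondary key keeps the result stable across runs when several files share a category.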

    • Field Detail

      • RECOVER_LOG_PATTERN

        public static final String RECOVER_LOG_PATTERN
        A pattern identifying the recover.gz log metadata entry.
        See Also:
        Constant Field Values
      • HERITRIX_FILE_PATTERN

        public static final String HERITRIX_FILE_PATTERN
        The pattern controlling which files in the crawl directory root should be stored in the metadata ARC.
      • REPORT_FILE_PATTERN

        public static final String REPORT_FILE_PATTERN
        The pattern controlling which files in the crawl directory root should be stored in the metadata ARC as reports.
      • LOG_FILE_PATTERN

        public static final String LOG_FILE_PATTERN
        The pattern controlling which files in the logs subdirectory of the crawl directory root should be stored in the metadata ARC as log files.
      • DOMAIN_SETTINGS_FILE

        public static final String DOMAIN_SETTINGS_FILE
        The name of a domain-specific Heritrix settings file (a.k.a. override).
        See Also:
        Constant Field Values
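The pattern fields above select files by name. The following sketch assumes that these are Java regular expressions and substitutes placeholder regexes, because the actual constant values are not listed here. It shows how such a pattern could be applied with `java.util.regex.Pattern` to pick files from a directory. `CrawlDirSelector` and its `EXAMPLE_*` constants are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch only: the regular expressions below are hypothetical placeholders,
// not the actual values of HERITRIX_FILE_PATTERN or LOG_FILE_PATTERN.
class CrawlDirSelector {
    static final String EXAMPLE_ROOT_PATTERN = ".*\\.(xml|txt)";
    static final String EXAMPLE_LOG_PATTERN = ".*\\.log";

    /** Returns the names that fully match the regex, in lexicographic order. */
    static List<String> select(String[] names, String regex) {
        Pattern p = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            // matches() requires the whole name to match, not just a substring.
            if (p.matcher(name).matches()) {
                selected.add(name);
            }
        }
        selected.sort(null); // null comparator means natural String order
        return selected;
    }

    /** Lists the regular files directly inside dir whose names match the regex. */
    static List<File> selectFiles(File dir, String regex) {
        List<File> result = new ArrayList<>();
        String[] names = dir.list();
        if (names == null) {
            return result; // dir is not a directory, or it could not be read
        }
        for (String name : select(names, regex)) {
            File f = new File(dir, name);
            if (f.isFile()) {
                result.add(f);
            }
        }
        return result;
    }
}
```

For example, `selectFiles(new File(crawlDir, "logs"), EXAMPLE_LOG_PATTERN)` would collect candidate log files from the logs subdirectory. Using `Matcher.matches()` rather than `find()` prevents a pattern from accepting a name that merely contains a matching substring.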
    • Constructor Detail

      • MetadataFile

        public MetadataFile(File heritrixFile,
                            Long harvestId,
                            Long jobId,
                            String heritrixVersion)
        Creates a metadata file and determines which metadata type it belongs to. The name of the Heritrix file is first tested against the report-file pattern, then against the log-file pattern. If the name matches neither pattern, the file is considered a setup file.
      • MetadataFile

        public MetadataFile(File heritrixFile,
                            Long harvestId,
                            Long jobId,
                            String heritrixVersion,
                            String domain)
        Creates a metadata file for a domain-specific override file.
        Parameters:
        heritrixFile - a given Heritrix metadata file.
        harvestId - the ID of the harvest that the job generating this file is part of.
        jobId - the ID of the job generating this file.
        heritrixVersion - the version of Heritrix that generated the file.
        domain - the name of the domain this metadata belongs to.
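The classification rule described for the constructors can be sketched as follows. This minimal illustration assumes placeholder regexes for the report and log patterns. `MetadataClassifier`, its `MetadataType` enum and the empty-`domain` check are hypothetical and not taken from the real class.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the classification order described for the
// MetadataFile constructors. The patterns are placeholders, not the real
// REPORT_FILE_PATTERN or LOG_FILE_PATTERN values.
class MetadataClassifier {
    enum MetadataType { SETUP, REPORT, LOG, DOMAIN_OVERRIDE }

    static final Pattern EXAMPLE_REPORT = Pattern.compile(".*-report\\.txt");
    static final Pattern EXAMPLE_LOG = Pattern.compile(".*\\.log");

    /** Mirrors the four-argument constructor: report first, then log, else setup. */
    static MetadataType classify(String fileName) {
        if (EXAMPLE_REPORT.matcher(fileName).matches()) {
            return MetadataType.REPORT;
        }
        if (EXAMPLE_LOG.matcher(fileName).matches()) {
            return MetadataType.LOG;
        }
        // Matches neither pattern: treated as a setup file.
        return MetadataType.SETUP;
    }

    /** Mirrors the five-argument constructor: the file is a domain-specific override. */
    static MetadataType classify(String fileName, String domain) {
        // Assumed validation, added for illustration only.
        if (domain == null || domain.isEmpty()) {
            throw new IllegalArgumentException("domain must be non-empty");
        }
        return MetadataType.DOMAIN_OVERRIDE;
    }
}
```

Because the report test runs before the log test, a name matching both patterns would be treated as a report. Any name matching neither pattern falls through to the setup category.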
    • Method Detail

      • getUrl

        public String getUrl()
        Returns:
        the metadata URL associated with this file.
      • getHeritrixFile

        public File getHeritrixFile()
        Returns the actual file.
        Returns:
        the actual file.