Class ArchiveBatchFilter

  • All Implemented Interfaces:
    Serializable

    public abstract class ArchiveBatchFilter
    extends Object
    implements Serializable
    A filter class for batch entries. Allows testing whether or not to process an entry without loading the entry data first.

    accept() is given an ArchiveRecordBase, so that the data of records the filter does not accept need not be read or copied.
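
    The filtering contract described above can be sketched as follows. This is a minimal, self-contained illustration and not the library's code: RecordStub and FilterStub are hypothetical stand-ins for ArchiveRecordBase and ArchiveBatchFilter, and the url() accessor is an assumption made only for this example.

```java
import java.io.Serializable;

public class FilterSketch {
    /** Hypothetical stand-in for a record header; NOT the real ArchiveRecordBase API. */
    interface RecordStub {
        String url();
    }

    /** Mirrors the documented shape: a named, serializable filter with an abstract accept(). */
    abstract static class FilterStub implements Serializable {
        protected final String name;

        protected FilterStub(String name) {
            this.name = name;
        }

        protected String getName() {
            return name;
        }

        /** Decide from the record metadata alone whether the record should be processed. */
        public abstract boolean accept(RecordStub record);
    }

    /** Assumed equivalent of a filter that accepts only records whose URL starts with "http". */
    static final FilterStub ONLY_HTTP = new FilterStub("OnlyHttp") {
        @Override
        public boolean accept(RecordStub record) {
            String url = record.url();
            return url != null && url.startsWith("http");
        }
    };

    public static void main(String[] args) {
        RecordStub web = () -> "http://example.org/";
        RecordStub dns = () -> "dns:example.org";
        System.out.println(ONLY_HTTP.accept(web)); // accepted
        System.out.println(ONLY_HTTP.accept(dns)); // rejected
    }
}
```

    Because accept() looks only at metadata, a batch job built on this pattern can skip a rejected record before touching its payload.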

    See Also:
    Serialized Form
    • Field Detail

      • name

        protected String name
        The name of the BatchFilter.
      • NO_FILTER

        public static final ArchiveBatchFilter NO_FILTER
        A default filter: Accepts everything.
      • EXCLUDE_NON_RESPONSE_RECORDS

        public static final ArchiveBatchFilter EXCLUDE_NON_RESPONSE_RECORDS
        A default filter: Accepts only response records.
      • EXCLUDE_NON_WARCINFO_RECORDS

        public static final ArchiveBatchFilter EXCLUDE_NON_WARCINFO_RECORDS
        A default filter: Accepts only warcinfo records.
      • ONLY_HTTP_ENTRIES

        public static final ArchiveBatchFilter ONLY_HTTP_ENTRIES
        A filter that accepts only records whose URL starts with http.
    • Constructor Detail

      • ArchiveBatchFilter

        protected ArchiveBatchFilter(String name)
        Create a new filter with the given name.
        Parameters:
        name - The name of this filter, for debugging mostly.
    • Method Detail

      • getName

        protected String getName()
        Get the name of the filter.
        Returns:
        the name of the filter.
      • accept

        public abstract boolean accept(ArchiveRecordBase record)
        Check if a given record is accepted (not filtered out) by this filter.
        Parameters:
        record - a given archive record
        Returns:
        true, if the given archive record is accepted by this filter
      • getMimetypeBatchFilter

        public static ArchiveBatchFilter getMimetypeBatchFilter(String mimetype)
                                                         throws MimeTypeParseException
        Get a filter that accepts only records with the given mimetype. Note that the mimetype of a WARC response record is not necessarily the same as the mimetype of its payload.
        Parameters:
        mimetype - String denoting the mimetype this filter represents
        Returns:
        a filter that rejects all archive records that do not have this mimetype
        Throws:
        MimeTypeParseException - if the mimetype is invalid
      • mimetypeIsOk

        public static boolean mimetypeIsOk(String mimetype)
        Check whether a given mimetype is valid.
        Parameters:
        mimetype - the mimetype to check
        Returns:
        true if the mimetype matches word/word, otherwise false
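
    The word/word check described for mimetypeIsOk can be sketched as below. This is an illustration under an assumption, not the library's implementation: "word" is taken to mean one or more regex word characters (\w+), and the class and method names are hypothetical.

```java
public class MimetypeCheckSketch {
    /**
     * Returns true if the whole string has the shape word/word,
     * where "word" is assumed to mean one or more [A-Za-z0-9_] characters.
     */
    static boolean looksLikeWordSlashWord(String mimetype) {
        // String.matches() requires the entire input to match the pattern.
        return mimetype != null && mimetype.matches("\\w+/\\w+");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeWordSlashWord("text/html"));             // true
        System.out.println(looksLikeWordSlashWord("texthtml"));              // false
        System.out.println(looksLikeWordSlashWord("application/xhtml+xml")); // false
    }
}
```

    Under this assumed reading, mimetypes whose subtype contains characters such as + or - are rejected; whether the real method behaves the same way is not stated in this documentation.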