Class ARCBatchFilter

  • All Implemented Interfaces:
    Serializable

    public abstract class ARCBatchFilter
    extends Object
    implements Serializable
    A filter class for batch entries. It lets you test whether to process an entry without loading the entry data first. The class itself is abstract but contains implementations of several filters.
    See Also:
    Serialized Form
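The contract described above can be sketched as follows: a named, serializable filter whose subclasses decide per record through `accept`. This is a minimal illustration, not NetarchiveSuite's source. `SketchRecord` is a hypothetical stand-in for `org.archive.io.arc.ARCRecord` that exposes only a URL and a mimetype, and `SketchBatchFilter` is a hypothetical mirror of this class.

```java
import java.io.Serializable;

// Hypothetical stand-in for org.archive.io.arc.ARCRecord (not the real API):
// it carries only header fields, no payload.
final class SketchRecord {
    private final String url;
    private final String mimetype;

    SketchRecord(String url, String mimetype) {
        this.url = url;
        this.mimetype = mimetype;
    }

    String getUrl() { return url; }
    String getMimetype() { return mimetype; }
}

// Hypothetical mirror of the documented contract: a named, serializable
// filter whose subclasses decide per record via accept().
abstract class SketchBatchFilter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    protected SketchBatchFilter(String name) {
        this.name = name;
    }

    protected String getName() { return name; }

    public abstract boolean accept(SketchRecord record);
}
```

Because `accept` sees only header fields in this sketch, a batch job can skip a record before its payload is ever read.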
    • Field Detail

      • NO_FILTER

        public static final ARCBatchFilter NO_FILTER
        A default filter: Accepts everything.
      • EXCLUDE_FILE_HEADERS

        public static final ARCBatchFilter EXCLUDE_FILE_HEADERS
        A default filter: Accepts all but the first record of each file (the ARC file header).
      • ONLY_HTTP_ENTRIES

        public static final ARCBatchFilter ONLY_HTTP_ENTRIES
        Filter that accepts only records whose URL starts with http.
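The three default filters can be sketched as anonymous subclasses held in static constants. This is an illustration under stated assumptions, not the actual implementation: `SketchRecord` and `SketchBatchFilter` are hypothetical stand-ins, and the header check assumes the ARC file header is the version block whose URL uses the `filedesc:` scheme.

```java
import java.io.Serializable;

// Hypothetical stand-in for ARCRecord header fields (not the real API).
final class SketchRecord {
    private final String url;
    private final String mimetype;

    SketchRecord(String url, String mimetype) {
        this.url = url;
        this.mimetype = mimetype;
    }

    String getUrl() { return url; }
    String getMimetype() { return mimetype; }
}

abstract class SketchBatchFilter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    protected SketchBatchFilter(String name) { this.name = name; }

    protected String getName() { return name; }

    public abstract boolean accept(SketchRecord record);

    // Accepts everything.
    public static final SketchBatchFilter NO_FILTER = new SketchBatchFilter("NO_FILTER") {
        @Override
        public boolean accept(SketchRecord record) { return true; }
    };

    // Assumption: the file header record is recognized by its "filedesc:" URL,
    // so it can be skipped without reading its content.
    public static final SketchBatchFilter EXCLUDE_FILE_HEADERS =
            new SketchBatchFilter("EXCLUDE_FILE_HEADERS") {
        @Override
        public boolean accept(SketchRecord record) {
            return !record.getUrl().startsWith("filedesc:");
        }
    };

    // Accepts records whose URL starts with "http"; note this also matches "https".
    public static final SketchBatchFilter ONLY_HTTP_ENTRIES =
            new SketchBatchFilter("ONLY_HTTP_ENTRIES") {
        @Override
        public boolean accept(SketchRecord record) {
            return record.getUrl().startsWith("http");
        }
    };
}
```

A plain prefix test on `"http"` keeps the check cheap, but it cannot distinguish `http` from `https`. A caller that needs that distinction would have to check the scheme explicitly.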
    • Constructor Detail

      • ARCBatchFilter

        protected ARCBatchFilter(String name)
        Create a new filter with the given name.
        Parameters:
        name - The name of this filter, mostly for debugging.
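A custom filter passes its name to the protected constructor and implements `accept`. The sketch below uses a hypothetical `UrlPrefixFilter` built on the same hypothetical stand-ins (`SketchRecord`, `SketchBatchFilter`). None of these names come from the library. It exposes the name through `toString()`, which matches the "mostly for debugging" purpose.

```java
import java.io.Serializable;

// Hypothetical stand-in for ARCRecord header fields (not the real API).
final class SketchRecord {
    private final String url;
    private final String mimetype;

    SketchRecord(String url, String mimetype) {
        this.url = url;
        this.mimetype = mimetype;
    }

    String getUrl() { return url; }
    String getMimetype() { return mimetype; }
}

abstract class SketchBatchFilter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    protected SketchBatchFilter(String name) { this.name = name; }

    protected String getName() { return name; }

    public abstract boolean accept(SketchRecord record);
}

// Hypothetical subclass: accepts records whose URL starts with a given prefix.
final class UrlPrefixFilter extends SketchBatchFilter {
    private static final long serialVersionUID = 1L;
    private final String prefix;

    UrlPrefixFilter(String prefix) {
        // The name encodes the parameter so log lines identify the filter.
        super("UrlPrefixFilter(" + prefix + ")");
        this.prefix = prefix;
    }

    @Override
    public boolean accept(SketchRecord record) {
        return record.getUrl().startsWith(prefix);
    }

    @Override
    public String toString() {
        return getName();
    }
}
```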
    • Method Detail

      • getName

        protected String getName()
        Get the name of the filter.
        Returns:
        the name of the filter.
      • getMimetypeBatchFilter

        public static ARCBatchFilter getMimetypeBatchFilter(String mimetype)
                                                     throws MimeTypeParseException
        Get a filter that accepts only records with the given mimetype.
        Parameters:
        mimetype - String denoting the mimetype this filter represents
        Returns:
        an ARCBatchFilter that filters out all ARCRecords that do not have this mimetype
        Throws:
        MimeTypeParseException - If mimetype is invalid
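A factory of this kind can be sketched as "validate, then return an anonymous filter that compares mimetypes." This is a sketch under assumptions, not the library's code. `SketchRecord` and `SketchBatchFilter` are hypothetical stand-ins. The nested `MimeTypeParseException` stands in for `javax.activation.MimeTypeParseException`, which is no longer bundled with the JDK since Java 11. The comparison choice is also an assumption: parameters such as `; charset=UTF-8` are ignored and case is not significant.

```java
import java.io.Serializable;

// Hypothetical stand-in for ARCRecord header fields (not the real API).
final class SketchRecord {
    private final String url;
    private final String mimetype;

    SketchRecord(String url, String mimetype) {
        this.url = url;
        this.mimetype = mimetype;
    }

    String getUrl() { return url; }
    String getMimetype() { return mimetype; }
}

abstract class SketchBatchFilter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    protected SketchBatchFilter(String name) { this.name = name; }

    protected String getName() { return name; }

    public abstract boolean accept(SketchRecord record);

    // Stand-in for javax.activation.MimeTypeParseException.
    static final class MimeTypeParseException extends Exception {
        private static final long serialVersionUID = 1L;
        MimeTypeParseException(String message) { super(message); }
    }

    // Assumed validity rule: "word/word", as documented for mimetypeIsOk.
    static boolean mimetypeIsOk(String mimetype) {
        return mimetype != null && mimetype.matches("\\w+/\\w+");
    }

    public static SketchBatchFilter getMimetypeBatchFilter(final String mimetype)
            throws MimeTypeParseException {
        if (!mimetypeIsOk(mimetype)) {
            throw new MimeTypeParseException("Invalid mimetype: " + mimetype);
        }
        return new SketchBatchFilter("MimetypeBatchFilter-" + mimetype) {
            @Override
            public boolean accept(SketchRecord record) {
                String type = record.getMimetype();
                if (type == null) {
                    return false;
                }
                // Design choice (assumption): drop parameters after ';' before comparing.
                int semicolon = type.indexOf(';');
                String base = (semicolon >= 0 ? type.substring(0, semicolon) : type).trim();
                return base.equalsIgnoreCase(mimetype);
            }
        };
    }
}
```

Ignoring parameters means a filter for `text/html` also keeps `text/html; charset=UTF-8`. A strict equality check would drop those records.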
      • mimetypeIsOk

        public static boolean mimetypeIsOk(String mimetype)
        Check whether a given mimetype is valid.
        Parameters:
        mimetype - a given mimetype
        Returns:
        true if the mimetype matches the pattern word/word, otherwise false
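One reading of "matches word/word" is a full match of the regular expression `\w+/\w+`. The sketch below implements that reading. The class name `MimetypeCheckSketch` is hypothetical, and this page does not state the exact pattern the real method uses.

```java
import java.util.regex.Pattern;

final class MimetypeCheckSketch {
    // Assumption: "word" means one or more regex word characters [A-Za-z0-9_].
    private static final Pattern WORD_SLASH_WORD = Pattern.compile("\\w+/\\w+");

    private MimetypeCheckSketch() {}

    static boolean mimetypeIsOk(String mimetype) {
        // matches() requires the entire string to match, not just a substring.
        return mimetype != null && WORD_SLASH_WORD.matcher(mimetype).matches();
    }
}
```

Under this strict reading, registered types such as `application/xhtml+xml` or `application/vnd.ms-excel` are rejected, as are values carrying parameters like `; charset=UTF-8`. A more lenient character class would be needed to accept them.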
      • accept

        public abstract boolean accept(org.archive.io.arc.ARCRecord record)
        Check if a given record is accepted (not filtered out) by this filter.
        Parameters:
        record - a given ARCRecord
        Returns:
        true if the given record is accepted by this filter
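A batch job would typically call `accept` once per record and handle only the records it accepts. The sketch below shows that loop. `SketchRecord`, `SketchBatchFilter`, and `BatchRunSketch.processAccepted` are hypothetical stand-ins, not NetarchiveSuite's batch API. Collecting URLs is a placeholder for reading and processing each accepted record's payload.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ARCRecord header fields (not the real API).
final class SketchRecord {
    private final String url;
    private final String mimetype;

    SketchRecord(String url, String mimetype) {
        this.url = url;
        this.mimetype = mimetype;
    }

    String getUrl() { return url; }
    String getMimetype() { return mimetype; }
}

abstract class SketchBatchFilter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String name;

    protected SketchBatchFilter(String name) { this.name = name; }

    protected String getName() { return name; }

    public abstract boolean accept(SketchRecord record);
}

final class BatchRunSketch {
    private BatchRunSketch() {}

    // Hypothetical batch loop: records rejected by the filter are skipped
    // before any payload work happens.
    static List<String> processAccepted(List<SketchRecord> records, SketchBatchFilter filter) {
        List<String> processed = new ArrayList<>();
        for (SketchRecord record : records) {
            if (!filter.accept(record)) {
                continue; // filtered out: payload is never touched
            }
            processed.add(record.getUrl()); // placeholder for payload processing
        }
        return processed;
    }
}
```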