Package dk.netarkivet.common.utils.batch
Class WARCBatchFilter
java.lang.Object
    dk.netarkivet.common.utils.batch.WARCBatchFilter

All Implemented Interfaces:
    java.io.Serializable

public abstract class WARCBatchFilter extends java.lang.Object implements java.io.Serializable
A filter class for batch entries. Allows testing whether to process an entry without first loading the entry data. The class itself is abstract but contains implementations of several filters.
See Also:
- Serialized Form

Field Summary
Fields (Modifier and Type, Field, Description):
- static WARCBatchFilter EXCLUDE_NON_RESPONSE_RECORDS
  A default filter: accepts only response records.
- static WARCBatchFilter NO_FILTER
  A default filter: accepts everything.
- static WARCBatchFilter ONLY_HTTP_ENTRIES
  A filter that accepts only records whose URL starts with "http".

Constructor Summary
Constructors (Modifier, Constructor, Description):
- protected WARCBatchFilter(java.lang.String name)
  Create a new filter with the given name.

Method Summary
Methods (Modifier and Type, Method, Description):
- abstract boolean accept(org.archive.io.warc.WARCRecord record)
  Check whether a given record is accepted (not filtered out) by this filter.
- static WARCBatchFilter getMimetypeBatchFilter(java.lang.String mimetype)
  Note that the mimetype of a WARC response record is not necessarily the same as that of its payload.
- protected java.lang.String getName()
  Get the name of the filter.
- static boolean mimetypeIsOk(java.lang.String mimetype)
  Check whether a given mimetype is valid.

Field Detail

NO_FILTER
public static final WARCBatchFilter NO_FILTER
A default filter: accepts everything.

EXCLUDE_NON_RESPONSE_RECORDS
public static final WARCBatchFilter EXCLUDE_NON_RESPONSE_RECORDS
A default filter: accepts only response records.

ONLY_HTTP_ENTRIES
public static final WARCBatchFilter ONLY_HTTP_ENTRIES
A filter that accepts only records whose URL starts with "http".

Constructor Detail

WARCBatchFilter
protected WARCBatchFilter(java.lang.String name)
Create a new filter with the given name.
Parameters:
name - The name of this filter, mostly for debugging.

Method Detail

getName
protected java.lang.String getName()
Get the name of the filter.
Returns:
the name of the filter.

getMimetypeBatchFilter
public static WARCBatchFilter getMimetypeBatchFilter(java.lang.String mimetype) throws java.awt.datatransfer.MimeTypeParseException
Note that the mimetype of a WARC response record is not necessarily the same as that of its payload.
Parameters:
mimetype - String denoting the mimetype this filter represents
Returns:
a WARCBatchFilter that filters out all WARCRecords that do not have this mimetype
Throws:
java.awt.datatransfer.MimeTypeParseException - If mimetype is invalid
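
The validate-then-build contract described above can be sketched in isolation. This is only an illustration under stated assumptions: MimetypeFilterSketch and forMimetype are hypothetical names, a plain mimetype string stands in for a WARCRecord, validity is assumed to mean the form word/word, and a record is assumed to match when its mimetype string starts with the requested one. None of these details are confirmed by this page.

```java
import java.awt.datatransfer.MimeTypeParseException;
import java.util.function.Predicate;

/** Hypothetical factory; mirrors only the documented validate-then-build contract. */
final class MimetypeFilterSketch {
    private MimetypeFilterSketch() {}

    /**
     * Builds a predicate over a record's mimetype string (a stand-in for a WARCRecord).
     * Assumption: an invalid mimetype is one that does not have the form word/word.
     * Assumption: matching is a simple prefix comparison.
     */
    static Predicate<String> forMimetype(String mimetype) throws MimeTypeParseException {
        if (mimetype == null || !mimetype.matches("\\w+/\\w+")) {
            throw new MimeTypeParseException("Invalid mimetype: " + mimetype);
        }
        return recordMimetype -> recordMimetype != null && recordMimetype.startsWith(mimetype);
    }
}
```

Callers of the real getMimetypeBatchFilter need to catch or declare java.awt.datatransfer.MimeTypeParseException, as the sketch's signature also requires.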

mimetypeIsOk
public static boolean mimetypeIsOk(java.lang.String mimetype)
Check whether a given mimetype is valid.
Parameters:
mimetype - a given mimetype
Returns:
true if mimetype matches word/word, otherwise false
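
The word/word check can be illustrated with a short, self-contained sketch. It assumes that "word" means one or more regular-expression word characters (\w+); this page does not state the actual pattern, and the class name MimetypeCheckSketch is hypothetical, not part of NetarchiveSuite.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not the NetarchiveSuite implementation. */
final class MimetypeCheckSketch {
    // Assumption: "word/word" means \w+/\w+ (letters, digits, underscore).
    private static final Pattern WORD_SLASH_WORD = Pattern.compile("\\w+/\\w+");

    private MimetypeCheckSketch() {}

    /** Returns true if the whole string has the form word/word. */
    static boolean mimetypeIsOk(String mimetype) {
        return mimetype != null && WORD_SLASH_WORD.matcher(mimetype).matches();
    }

    public static void main(String[] args) {
        System.out.println(mimetypeIsOk("text/html")); // true
        System.out.println(mimetypeIsOk("texthtml"));  // false
    }
}
```

Under this assumption, a type containing characters outside \w, such as application/xhtml+xml, would be rejected; whether the real method behaves this way is not stated here.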

accept
public abstract boolean accept(org.archive.io.warc.WARCRecord record)
Check whether a given record is accepted (not filtered out) by this filter.
Parameters:
record - a given WARCRecord
Returns:
true if the given record is accepted by this filter
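
The shape of the class (a named, serializable filter with an abstract accept method and predefined constant filters) can be sketched with a self-contained stand-in. To stay runnable without the NetarchiveSuite and Heritrix libraries, the sketch filters on a plain URL string instead of an org.archive.io.warc.WARCRecord; SketchFilter, ACCEPT_ALL, and HTTP_ONLY are hypothetical names that only mirror the structure described on this page.

```java
import java.io.Serializable;

/** Hypothetical stand-in mirroring the shape of WARCBatchFilter; not the real class. */
abstract class SketchFilter implements Serializable {
    private static final long serialVersionUID = 1L;

    /** Accepts every URL, analogous to NO_FILTER. */
    static final SketchFilter ACCEPT_ALL = new SketchFilter("ACCEPT_ALL") {
        private static final long serialVersionUID = 1L;

        @Override
        boolean accept(String targetUrl) {
            return true;
        }
    };

    /** Accepts only URLs starting with "http", analogous to ONLY_HTTP_ENTRIES. */
    static final SketchFilter HTTP_ONLY = new SketchFilter("HTTP_ONLY") {
        private static final long serialVersionUID = 1L;

        @Override
        boolean accept(String targetUrl) {
            // Assumption: a plain prefix test; "https://..." also passes.
            return targetUrl != null && targetUrl.startsWith("http");
        }
    };

    private final String name;

    /** Creates a filter with the given name (used mostly for debugging). */
    protected SketchFilter(String name) {
        this.name = name;
    }

    protected String getName() {
        return name;
    }

    /** Returns true if the record identified by this URL should be processed. */
    abstract boolean accept(String targetUrl);
}
```

Because accept is the only abstract method, each concrete filter in the sketch is a small anonymous subclass; a real subclass would inspect the WARCRecord header instead of a string.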