Package dk.netarkivet.common.utils.batch
Class ArchiveBatchFilter
- java.lang.Object
-
- dk.netarkivet.common.utils.batch.ArchiveBatchFilter
-
- All Implemented Interfaces:
Serializable
public abstract class ArchiveBatchFilter extends Object implements Serializable
A filter class for batch entries. Allows testing whether or not to process an entry without loading the entry data first. accept() is given an ArchiveRecordBase so that the data of records rejected by the filter need not be read or copied.
- See Also:
- Serialized Form
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
static ArchiveBatchFilter EXCLUDE_NON_RESPONSE_RECORDS
A default filter: Accepts only response records.
static ArchiveBatchFilter EXCLUDE_NON_WARCINFO_RECORDS
A default filter: Accepts only warcinfo records.
protected String name
The name of the BatchFilter.
static ArchiveBatchFilter NO_FILTER
A default filter: Accepts everything.
static ArchiveBatchFilter ONLY_HTTP_ENTRIES
Filter that only accepts records where the URL starts with http.
-
Constructor Summary
Constructors (Modifier, Constructor, Description):
protected ArchiveBatchFilter(String name)
Create a new filter with the given name.
-
Method Summary
Methods (Modifier and Type, Method, Description):
abstract boolean accept(ArchiveRecordBase record)
Check if a given record is accepted (not filtered out) by this filter.
static ArchiveBatchFilter getMimetypeBatchFilter(String mimetype)
Note that the mimetype of the WARC response record is not (necessarily) the same as that of its payload.
protected String getName()
Get the name of the filter.
static boolean mimetypeIsOk(String mimetype)
Check whether a given mimetype is valid.
-
-
-
Field Detail
-
name
protected String name
The name of the BatchFilter.
-
NO_FILTER
public static final ArchiveBatchFilter NO_FILTER
A default filter: Accepts everything.
-
EXCLUDE_NON_RESPONSE_RECORDS
public static final ArchiveBatchFilter EXCLUDE_NON_RESPONSE_RECORDS
A default filter: Accepts only response records.
-
EXCLUDE_NON_WARCINFO_RECORDS
public static final ArchiveBatchFilter EXCLUDE_NON_WARCINFO_RECORDS
A default filter: Accepts only warcinfo records.
-
ONLY_HTTP_ENTRIES
public static final ArchiveBatchFilter ONLY_HTTP_ENTRIES
Filter that only accepts records where the URL starts with http.
-
-
Constructor Detail
-
ArchiveBatchFilter
protected ArchiveBatchFilter(String name)
Create a new filter with the given name.
- Parameters:
name - The name of this filter, mostly for debugging.
-
-
Method Detail
-
getName
protected String getName()
Get the name of the filter.
- Returns:
the name of the filter.
-
accept
public abstract boolean accept(ArchiveRecordBase record)
Check if a given record is accepted (not filtered out) by this filter.
- Parameters:
record - a given archive record
- Returns:
true, if the given archive record is accepted by this filter
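The contract of accept() can be illustrated with a minimal, self-contained sketch. It does not use the real NetarchiveSuite classes: RecordStub and FilterSketch are hypothetical stand-ins that only mirror the members documented on this page (a name, a protected constructor, getName() and an abstract accept()), and getUrl() is an assumed accessor, since the API of ArchiveRecordBase is not shown here. The concrete filter follows the description of ONLY_HTTP_ENTRIES and accepts records whose URL starts with http.

```java
import java.io.Serializable;

// Hypothetical stand-in for ArchiveRecordBase. It only carries a URL;
// the accessors of the real class are not documented on this page.
final class RecordStub {
    private final String url;

    RecordStub(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }
}

// Simplified stand-in that mirrors the documented shape of ArchiveBatchFilter.
abstract class FilterSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    /** The name of the filter, mostly for debugging. */
    protected String name;

    protected FilterSketch(String name) {
        this.name = name;
    }

    protected String getName() {
        return name;
    }

    /** Returns true if the record is accepted (not filtered out). */
    public abstract boolean accept(RecordStub record);
}

// Accepts only records whose URL starts with "http",
// following the description of ONLY_HTTP_ENTRIES.
final class HttpOnlyFilterSketch extends FilterSketch {
    private static final long serialVersionUID = 1L;

    HttpOnlyFilterSketch() {
        super("HttpOnlyFilterSketch");
    }

    @Override
    public boolean accept(RecordStub record) {
        String url = record.getUrl();
        // The decision uses header-level information only; no payload is read.
        return url != null && url.startsWith("http");
    }
}
```

Because the filter decides from header-level information alone, a batch job can skip rejected records before any record data is read, which is the purpose stated in the class description.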
-
getMimetypeBatchFilter
public static ArchiveBatchFilter getMimetypeBatchFilter(String mimetype) throws MimeTypeParseException
Note that the mimetype of the WARC response record is not (necessarily) the same as that of its payload.
- Parameters:
mimetype - String denoting the mimetype this filter represents
- Returns:
a filter that rejects all records that do not have this mimetype
- Throws:
MimeTypeParseException - if mimetype is invalid
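The validate-then-build behaviour described above might look roughly like the following sketch. It is not the library's implementation: MimetypeFormatException is a hypothetical stand-in for MimeTypeParseException, a Predicate over the record's mimetype string stands in for the returned ArchiveBatchFilter, and matching by case-insensitive equality is an assumption, since this page does not say how a record's mimetype is compared.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for MimeTypeParseException.
class MimetypeFormatException extends Exception {
    private static final long serialVersionUID = 1L;

    MimetypeFormatException(String message) {
        super(message);
    }
}

final class MimetypeFilterSketch {
    private MimetypeFilterSketch() {
    }

    /**
     * Builds a predicate over a record's mimetype string.
     * Throws if the requested mimetype is not of the form word/word.
     */
    static Predicate<String> forMimetype(String mimetype) throws MimetypeFormatException {
        // Validate first, so that an invalid argument fails at construction
        // time rather than silently rejecting every record later.
        if (mimetype == null || !mimetype.matches("\\w+/\\w+")) {
            throw new MimetypeFormatException("Invalid mimetype: " + mimetype);
        }
        // Assumption: a record matches when its mimetype equals the
        // requested one, ignoring case. A null record mimetype never matches.
        return recordMimetype -> mimetype.equalsIgnoreCase(recordMimetype);
    }
}
```

Failing at construction time means the caller learns about a malformed mimetype before a batch job starts, instead of getting an empty result.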
-
mimetypeIsOk
public static boolean mimetypeIsOk(String mimetype)
Check whether a given mimetype is valid.
- Parameters:
mimetype - the mimetype to check
- Returns:
true if mimetype matches word/word, otherwise false
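One plausible reading of "matches word/word" is a regular-expression match where each word consists of one or more word characters (letters, digits, underscore). The sketch below implements that reading. The class name MimetypeCheckSketch, the exact pattern and the null handling are assumptions, not taken from the library source.

```java
import java.util.regex.Pattern;

final class MimetypeCheckSketch {
    // Assumption: "word" means one or more regex word characters [A-Za-z0-9_].
    private static final Pattern WORD_SLASH_WORD = Pattern.compile("\\w+/\\w+");

    private MimetypeCheckSketch() {
    }

    /** Returns true if the whole string has the form word/word. */
    static boolean mimetypeIsOk(String mimetype) {
        // matches() requires the entire input to match, not just a substring.
        return mimetype != null && WORD_SLASH_WORD.matcher(mimetype).matches();
    }
}
```

Under this reading, "text/html" is valid, while "text", "text/" and types containing characters such as '+' or '-' (for example "application/xhtml+xml") are not. Whether the real method is equally strict is not stated on this page.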
-
-