Package dk.netarkivet.common.utils.warc
Class WARCUtils
java.lang.Object
    dk.netarkivet.common.utils.warc.WARCUtils
public class WARCUtils extends Object
Various utilities for working with WARC records. Code has been borrowed from Wayback; see org.archive.wayback.resourcestore.indexer.WARCRecordToSearchResultAdapter.
-
Field Summary
Fields
protected static org.slf4j.Logger log
    Logging output place.
-
Constructor Summary
Constructors
WARCUtils()
-
Method Summary
All Methods (all are static, concrete methods)
static org.archive.io.warc.WARCWriter createWARCWriter(File newFile)
    Create a new WARCWriter, writing to the WARC file newFile.
static String getRecordType(org.archive.io.warc.WARCRecord record)
    Find out what type of WARC record this is.
static void insertWARCFile(File warcFile, org.archive.io.warc.WARCWriter writer)
    Insert the contents of a WARC file into another WARC file.
static boolean isWarc(String filename)
    Check if the given filename represents a WARC file.
static byte[] readWARCRecord(org.archive.io.warc.WARCRecord record)
    Read the contents (payload) of a WARC record into a byte array.
-
Method Detail
-
createWARCWriter
public static org.archive.io.warc.WARCWriter createWARCWriter(File newFile)
Create a new WARCWriter, writing to the WARC file newFile.
Parameters:
newFile - the WARC file that the WARCWriter writes to.
Returns:
a new WARCWriter, writing to the WARC file newFile.
-
insertWARCFile
public static void insertWARCFile(File warcFile, org.archive.io.warc.WARCWriter writer)
Insert the contents of a WARC file into another WARC file.
Parameters:
warcFile - A WARC file to read.
writer - A place to write the WARC records.
Throws:
IOFailure - if there are problems reading the file.
-
readWARCRecord
public static byte[] readWARCRecord(org.archive.io.warc.WARCRecord record) throws IOFailure
Read the contents (payload) of a WARC record into a byte array.
Parameters:
record - A WARC record to read from. After reading, the WARC record will no longer have its own data available for reading.
Returns:
A byte array containing the payload of the WARC record. Note that the size of the payload is calculated by subtracting the contentBegin value from the length of the record (both values are included in the record header).
Throws:
IOFailure - If there is an error reading the data, or if the record is longer than Integer.MAX_VALUE bytes (since Java arrays cannot be larger).
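The size calculation and the Integer.MAX_VALUE limit described above can be illustrated with a minimal, self-contained sketch. It is not the WARCUtils implementation: the class PayloadReadSketch and its methods are hypothetical, a plain InputStream stands in for the WARCRecord, and UncheckedIOException stands in for IOFailure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of NetarchiveSuite.
class PayloadReadSketch {

    /** Payload size = record length minus the offset at which the content begins. */
    static int payloadSize(long recordLength, long contentBegin) {
        long size = recordLength - contentBegin;
        if (size < 0 || size > Integer.MAX_VALUE) {
            // A Java array cannot hold more than Integer.MAX_VALUE elements.
            throw new UncheckedIOException(
                    new IOException("Payload size " + size + " does not fit in a byte array"));
        }
        return (int) size;
    }

    /** Reads exactly the payload bytes from a stream already positioned at contentBegin. */
    static byte[] readPayload(InputStream in, long recordLength, long contentBegin) {
        byte[] payload = new byte[payloadSize(recordLength, contentBegin)];
        int offset = 0;
        try {
            // read() may return fewer bytes than requested, so loop until full.
            while (offset < payload.length) {
                int n = in.read(payload, offset, payload.length - offset);
                if (n < 0) {
                    throw new IOException("Stream ended after " + offset
                            + " of " + payload.length + " bytes");
                }
                offset += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return payload;
    }
}
```

As with the documented method, the stream in this sketch is consumed by the read, so the data cannot be read from it a second time.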
-
getRecordType
public static String getRecordType(org.archive.io.warc.WARCRecord record)
Find out what type of WARC record this is.
Parameters:
record - a given WARCRecord
Returns:
the type of the WARCRecord as a String.
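For orientation, the WARC format stores the record type in the WARC-Type named field of the record header (for example "response" or "warcinfo"). The sketch below is a hypothetical, standalone illustration that extracts that field from a raw header block; it does not use the org.archive API and is not how getRecordType is necessarily implemented. The class name WarcTypeSketch and its method are assumptions made for the example.

```java
import java.util.Optional;

// Hypothetical illustration only; not part of NetarchiveSuite.
class WarcTypeSketch {

    /** Returns the value of the WARC-Type field in a raw WARC header block, if present. */
    static Optional<String> warcType(String headerBlock) {
        for (String line : headerBlock.split("\r?\n")) {
            int colon = line.indexOf(':');
            // Field names are compared case-insensitively in this sketch.
            if (colon > 0
                    && line.substring(0, colon).trim().equalsIgnoreCase("WARC-Type")) {
                return Optional.of(line.substring(colon + 1).trim());
            }
        }
        return Optional.empty();
    }
}
```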
-
isWarc
public static boolean isWarc(String filename)
Check if the given filename represents a WARC file.
Parameters:
filename - A given filename
Returns:
true, if the filename ends with .warc or .warc.gz
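The suffix rule above can be sketched as follows. This is a hypothetical re-implementation for illustration, not the WARCUtils code: the class WarcNameSketch is an assumption, the comparison is case-sensitive, and null yields false, neither of which is specified by the documentation above.

```java
// Hypothetical illustration only; not part of NetarchiveSuite.
class WarcNameSketch {

    /** True if the name ends with ".warc" or ".warc.gz" (case-sensitive in this sketch). */
    static boolean looksLikeWarc(String filename) {
        // Assumption: null is treated as "not a WARC file" rather than as an error.
        return filename != null
                && (filename.endsWith(".warc") || filename.endsWith(".warc.gz"));
    }
}
```

Under these assumptions, "crawl-0001.warc.gz" is accepted, while "crawl-0001.arc.gz" and "crawl-0001.warc.gz.tmp" are rejected.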
-