Package dk.netarkivet.common.utils
Class FileUtils
- java.lang.Object
  - dk.netarkivet.common.utils.FileUtils
public class FileUtils extends Object
Misc. handy file utilities.
-
-
Field Summary
Fields
Modifier and Type, Field, Description
static String ARC_EXTENSION
Extension used for ARC files, including the "." separator.
static String ARC_GZIPPED_EXTENSION
Extension used for gzipped ARC files, including the "." separator.
static String ARC_PATTERN
Pattern matching ARC files, including the "." separator.
static FilenameFilter ARCS_FILTER
A filter that matches ARC files, that is, any file whose name ends with .arc or .arc.gz, ignoring case.
static String CDX_EXTENSION
Extension used for CDX files, including the "." separator.
static FilenameFilter CDX_FILE_FILTER
A FilenameFilter accepting a file if and only if its name (transformed to lower case) ends with ".cdx".
static int MAX_IDS_IN_FILENAME
Maximum number of IDs we will put in a filename.
static String OPEN_ARC_PATTERN
Pattern matching open ARC files, including the "." separator.
static FilenameFilter OPEN_ARCS_FILTER
A filter that matches files left open by a crashed Heritrix process.
static String OPEN_WARC_PATTERN
Pattern matching open WARC files, including the "." separator.
static FilenameFilter OPEN_WARCS_FILTER
A filter that matches WARC files left open by a crashed Heritrix process.
static String WARC_ARC_PATTERN
Pattern matching WARC and ARC files, including the "." separator.
static String WARC_EXTENSION
Extension used for WARC files, including the "." separator.
static String WARC_GZIPPED_EXTENSION
Extension used for gzipped WARC files, including the "." separator.
static String WARC_PATTERN
Pattern matching WARC files, including the "." separator.
static FilenameFilter WARCS_ARCS_FILTER
A filter that matches WARC and ARC files, that is, any file whose name ends with .warc, .warc.gz, .arc or .arc.gz, ignoring case.
static FilenameFilter WARCS_FILTER
A filter that matches WARC files, that is, any file whose name ends with .warc or .warc.gz, ignoring case.
-
Constructor Summary
Constructors Constructor Description FileUtils()
-
Method Summary
Modifier and Type, Method, Description
static void appendToFile(File file, String... lines)
Append the given lines to a file.
static void copyDirectory(File from, File to)
Copy an entire directory from one location to another.
static void copyFile(File from, File to)
Copy file from one location to another.
static long countLines(File file)
Count the number of lines in a file.
static boolean createDir(File dir)
Check if the directory exists, and create it if needed.
static File createUniqueTempDir(File inDir, String prefix)
Creates a new temporary directory with a unique name.
static String formatFilename(String filename)
Returns a valid filename for most filesystems.
static <T extends Comparable<T>> String generateFileNameFromSet(Set<T> IDs, String suffix)
Given a set, generate a reasonable file name from the set.
static long getBytesFree(File f)
Returns the number of bytes free on the file system by calling the FreeSpaceProvider class defined by the setting CommonSettings.FREESPACE_PROVIDER_CLASS (a.k.a. settings.common.freespaceprovider.class).
static InputStream getEphemeralInputStream(File file)
Create an InputStream that reads from a file but removes the file when all data has been read.
static List<File> getFilesRecursively(String dir, List<File> files, String type)
Retrieves all files whose names end with 'type' from directory 'dir' and all its subdirectories.
static String getHumanReadableFileSize(File aFile)
Get a human-readable representation of the file size.
static File getResourceFileFromClassPath(String filePath)
Loads a file from the class path (for retrieving a file from '.jar').
static File getTempDir()
Get the location of the standard temporary directory.
static FilenameFilter getXmlFilesFilter()
Return a filter that only accepts XML files (ending with .xml), irrespective of their location.
static boolean hasFiles(File aDir)
static void makeSortedFile(File unsortedFile, File sortedOutput)
Sort a file into another.
static File makeValidFileFromExisting(String filename)
Makes a valid file from filename passed in String.
static void moveFile(File fromFile, File toFile)
Attempt to move a file using rename, and if that fails, move the file by copy-and-delete.
static byte[] readBinaryFile(File file)
Read an entire file, byte by byte, into a byte array, ignoring any locale issues.
static String readFile(File file)
Load file content into text string.
static String readLastLine(File file)
Read the last line in a file.
static List<String> readListFromFile(File file)
Read all lines from a file into a list of strings.
static String relativeTo(File theFile, File theDir)
static boolean remove(File f)
Remove a file.
static void removeLineFromFile(String line, File file)
Remove a line from a given file.
static boolean removeRecursively(File f)
Remove a file and any subfiles in case of directories.
static void sortCDX(File file, File toFile)
Sort a CDX file according to our standard for CDX file sorting.
static void sortCrawlLog(File file, File toFile)
Sort a crawl.log file according to the url.
static void sortCrawlLogOnTimestamp(File file, File toFile)
Sort a crawl.log file according to the timestamp.
static void sortFile(File file, File toFile)
Sort a file using UNIX sort.
static void writeBinaryFile(File file, byte[] b)
Write an entire byte array to a file, ignoring any locale issues.
static void writeCollectionToFile(File file, Collection<String> collection)
Writes a collection of strings to a file, each string on one line.
static void writeFileToStream(File f, OutputStream out)
Write the entire contents of a file to a stream.
static void writeStreamToFile(InputStream in, File f)
Write the contents of a stream into a file.
-
-
-
Field Detail
-
CDX_EXTENSION
public static final String CDX_EXTENSION
Extension used for CDX files, including the "." separator.
- See Also:
- Constant Field Values
-
ARC_EXTENSION
public static final String ARC_EXTENSION
Extension used for ARC files, including the "." separator.
- See Also:
- Constant Field Values
-
ARC_GZIPPED_EXTENSION
public static final String ARC_GZIPPED_EXTENSION
Extension used for gzipped ARC files, including the "." separator.
- See Also:
- Constant Field Values
-
WARC_EXTENSION
public static final String WARC_EXTENSION
Extension used for WARC files, including the "." separator.
- See Also:
- Constant Field Values
-
WARC_GZIPPED_EXTENSION
public static final String WARC_GZIPPED_EXTENSION
Extension used for gzipped WARC files, including the "." separator.
- See Also:
- Constant Field Values
-
ARC_PATTERN
public static final String ARC_PATTERN
Pattern matching ARC files, including the "." separator. Note: (?i) means case insensitive, (\\.gz)? means .gz is optionally matched, and $ matches the end of the line. Thus this pattern will match file.arc.gz, file.ARC, file.aRc.GZ, but not file.ARC.open.
- See Also:
- Constant Field Values
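As a rough illustration of the note above, the sketch below assembles a regular expression from the three described pieces and checks it against the listed names. The regex string is an assumption reconstructed from the description, not the actual value of ARC_PATTERN (see Constant Field Values for that), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class ArcPatternSketch {
    // Assumed pattern: case-insensitive ".arc", an optional ".gz", anchored at the end.
    static final Pattern ASSUMED_ARC = Pattern.compile("(?i)\\.arc(\\.gz)?$");

    static boolean looksLikeArc(String name) {
        // find() is used because the pattern is only anchored at the end of the name.
        return ASSUMED_ARC.matcher(name).find();
    }

    public static void main(String[] args) {
        String[] names = {"file.arc.gz", "file.ARC", "file.aRc.GZ", "file.ARC.open"};
        for (String name : names) {
            System.out.println(name + " -> " + looksLikeArc(name));
        }
    }
}
```

With this assumed pattern, the first three names match and file.ARC.open does not, which agrees with the examples in the description.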
-
OPEN_ARC_PATTERN
public static final String OPEN_ARC_PATTERN
Pattern matching open ARC files, including the "." separator. Note: (?i) means case insensitive, (\\.gz)? means .gz is optionally matched, and $ matches the end of the line. Thus this pattern will match file.arc.gz.open, file.ARC.open, file.arc.GZ.OpEn, but not file.ARC.open.txt.
- See Also:
- Constant Field Values
-
WARC_PATTERN
public static final String WARC_PATTERN
Pattern matching WARC files, including the "." separator. Note: (?i) means case insensitive, (\\.gz)? means .gz is optionally matched, and $ matches the end of the line. Thus this pattern will match file.warc.gz, file.WARC, file.WaRc.GZ, but not file.WARC.open.
- See Also:
- Constant Field Values
-
OPEN_WARC_PATTERN
public static final String OPEN_WARC_PATTERN
Pattern matching open WARC files, including the "." separator. Note: (?i) means case insensitive, (\\.gz)? means .gz is optionally matched, and $ matches the end of the line. Thus this pattern will match file.warc.gz.open, file.WARC.open, file.warc.GZ.OpEn, but not file.wARC.open.txt.
- See Also:
- Constant Field Values
-
WARC_ARC_PATTERN
public static final String WARC_ARC_PATTERN
Pattern matching WARC and ARC files, including the "." separator. Note: (?i) means case insensitive, (\\.gz)? means .gz is optionally matched, and $ matches the end of the line. Thus this pattern will match file.warc.gz, file.WARC, file.WaRc.GZ, file.arc.gz, file.ARC, file.aRc.GZ, but not file.WARC.open or file.ARC.open.
- See Also:
- Constant Field Values
-
CDX_FILE_FILTER
public static final FilenameFilter CDX_FILE_FILTER
A FilenameFilter accepting a file if and only if its name (transformed to lower case) ends with ".cdx".
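A filter with the behaviour described here can be sketched in a few lines. This is a minimal stand-in, not the library's own definition of CDX_FILE_FILTER; the class and field names are hypothetical, and the choice of Locale.ROOT for lower-casing is an assumption.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Locale;

final class CdxFilterSketch {
    // Accepts a name if and only if its lower-cased form ends with ".cdx".
    static final FilenameFilter CDX_LIKE =
            (dir, name) -> name.toLowerCase(Locale.ROOT).endsWith(".cdx");

    public static void main(String[] args) {
        File dir = new File(".");
        System.out.println(CDX_LIKE.accept(dir, "INDEX.CDX"));
        System.out.println(CDX_LIKE.accept(dir, "index.cdx.tmp"));
        // A filter like this is typically passed to File.list or File.listFiles.
        String[] matches = dir.list(CDX_LIKE);
        System.out.println(matches == null ? 0 : matches.length);
    }
}
```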
-
OPEN_ARCS_FILTER
public static final FilenameFilter OPEN_ARCS_FILTER
A filter that matches files left open by a crashed Heritrix process. Don't work on these files while Heritrix is still working on them.
-
OPEN_WARCS_FILTER
public static final FilenameFilter OPEN_WARCS_FILTER
A filter that matches WARC files left open by a crashed Heritrix process. Don't work on these files while Heritrix is still working on them.
-
ARCS_FILTER
public static final FilenameFilter ARCS_FILTER
A filter that matches ARC files, that is, any file whose name ends with .arc or .arc.gz, ignoring case.
-
WARCS_FILTER
public static final FilenameFilter WARCS_FILTER
A filter that matches WARC files, that is, any file whose name ends with .warc or .warc.gz, ignoring case.
-
WARCS_ARCS_FILTER
public static final FilenameFilter WARCS_ARCS_FILTER
A filter that matches WARC and ARC files, that is, any file whose name ends with .warc, .warc.gz, .arc or .arc.gz, ignoring case.
-
MAX_IDS_IN_FILENAME
public static final int MAX_IDS_IN_FILENAME
Maximum number of IDs we will put in a filename. Above this number, a checksum of the ids is generated instead. This is done to protect us from getting filenames too long for the filesystem.
- See Also:
- Constant Field Values
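The idea of switching from a list of IDs to a checksum can be sketched as follows. Everything specific here is an assumption made for illustration: the limit of 4, the "-" separator, sorting the IDs first, and MD5 as the checksum are not taken from this page, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

final class IdFileNameSketch {
    // Hypothetical limit; the real value is listed under Constant Field Values.
    static final int ASSUMED_MAX_IDS = 4;

    static <T extends Comparable<T>> String nameFor(Set<T> ids, String suffix) {
        // Sort first so that the same set always yields the same name.
        String joined = new TreeSet<>(ids).stream()
                .map(Object::toString)
                .collect(Collectors.joining("-"));
        if (ids.size() <= ASSUMED_MAX_IDS) {
            return joined + suffix;
        }
        // Too many IDs: fall back to a fixed-length checksum of the joined IDs.
        return md5Hex(joined) + suffix;
    }

    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 digest not available", e);
        }
    }

    public static void main(String[] args) {
        Set<Integer> small = new TreeSet<>(java.util.Arrays.asList(3, 1, 2));
        Set<Integer> big = new TreeSet<>(java.util.Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(nameFor(small, "-cache"));
        System.out.println(nameFor(big, "-cache"));
    }
}
```

The point of the fallback is that the name length stays bounded no matter how many IDs the set holds.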
-
-
Method Detail
-
removeRecursively
public static boolean removeRecursively(File f)
Remove a file and any subfiles in case of directories.
- Parameters:
f
- A file to completely and utterly remove.
- Returns:
- true if the file did exist, false otherwise.
- Throws:
SecurityException
- If a security manager exists and its SecurityManager.checkDelete(java.lang.String) method denies delete access to the file.
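A recursive delete with the documented return value (true if the file existed) might look like the sketch below. It is an illustration using only java.io.File, not the library's implementation; the class name and the demo helper are hypothetical, and symbolic links get no special treatment.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

final class RemoveRecursivelySketch {
    static boolean removeRecursively(File f) {
        if (!f.exists()) {
            return false;
        }
        File[] children = f.listFiles(); // null when f is not a directory
        if (children != null) {
            for (File child : children) {
                removeRecursively(child);
            }
        }
        f.delete(); // children are gone by now, so a directory can be deleted
        return true;
    }

    // Builds a small tree in a temporary directory, removes it, and reports success.
    static boolean demo() {
        try {
            File root = Files.createTempDirectory("rm-sketch").toFile();
            File sub = new File(root, "sub");
            sub.mkdir();
            new File(sub, "data.txt").createNewFile();
            boolean existed = removeRecursively(root);
            boolean secondCall = removeRecursively(root);
            return existed && !root.exists() && !secondCall;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```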
-
remove
public static boolean remove(File f)
Remove a file.
- Parameters:
f
- A file to completely and utterly remove.
- Returns:
- true if the file did exist, false otherwise.
- Throws:
ArgumentNotValid
- if f is null.
SecurityException
- If a security manager exists and its SecurityManager.checkDelete(java.lang.String) method denies delete access to the file.
-
formatFilename
public static String formatFilename(String filename)
Returns a valid filename for most filesystems. Exchanges the following characters:
" " -> "_"
":" -> "_"
"+" -> "_"
- Parameters:
filename
- the filename to format correctly
- Returns:
- a new formatted filename
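The three substitutions listed above can be sketched with plain String.replace calls. This is an illustrative stand-in with a hypothetical class and method name; the real method may handle more cases than the three documented here.

```java
final class FormatFilenameSketch {
    // Replaces space, colon and plus with an underscore, as listed in the description.
    static String format(String filename) {
        return filename.replace(' ', '_').replace(':', '_').replace('+', '_');
    }

    public static void main(String[] args) {
        System.out.println(format("job 12:30+01")); // prints job_12_30_01
    }
}
```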
-
getFilesRecursively
public static List<File> getFilesRecursively(String dir, List<File> files, String type)
Retrieves all files whose names end with 'type' from directory 'dir' and all its subdirectories.
- Parameters:
dir
- Path of base directory
files
- Initially, an empty list (e.g. an ArrayList)
type
- The extension/ending of the files to retrieve (e.g. ".xml", ".ARC")
- Returns:
- A list of files from directory 'dir' and all its subdirectories
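The accumulate-into-a-list signature suggests a recursive walk along the following lines. This sketch is an assumption about the general shape, not the actual implementation; in particular it compares the suffix case-sensitively, which this page does not specify, and the class, method, and demo names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

final class RecursiveListingSketch {
    static List<File> collect(String dir, List<File> files, String type) {
        File[] entries = new File(dir).listFiles();
        if (entries == null) {
            return files; // not a directory, or not readable
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collect(entry.getPath(), files, type); // descend, reusing the same list
            } else if (entry.getName().endsWith(type)) {
                files.add(entry);
            }
        }
        return files;
    }

    // Creates a.xml, sub/b.xml and sub/c.txt, then counts the ".xml" files found.
    static int demo() {
        try {
            File root = Files.createTempDirectory("list-sketch").toFile();
            File sub = new File(root, "sub");
            sub.mkdir();
            new File(root, "a.xml").createNewFile();
            new File(sub, "b.xml").createNewFile();
            new File(sub, "c.txt").createNewFile();
            return collect(root.getPath(), new ArrayList<>(), ".xml").size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```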
-
readFile
public static String readFile(File file) throws IOException
Load file content into text string.
- Parameters:
file
- The file to load
- Returns:
- file content loaded into text string
- Throws:
IOException
- If any IO trouble occurs while reading the file, or the file cannot be found.
-
copyFile
public static void copyFile(File from, File to)
Copy file from one location to another. Will silently overwrite an already existing file.
- Parameters:
from
- original to copy
to
- destination of copy
- Throws:
IOFailure
- if an io error occurs while copying file, or the original file does not exist.
-
copyDirectory
public static void copyDirectory(File from, File to) throws IOFailure
Copy an entire directory from one location to another. Note that this will silently overwrite old files, just like copyFile().
- Parameters:
from
- Original directory (or file, for that matter) to copy.
to
- Destination directory, i.e. the 'new name' of the copy of the from directory.
- Throws:
IOFailure
- On IO trouble copying files.
-
readBinaryFile
public static byte[] readBinaryFile(File file) throws IOFailure, IndexOutOfBoundsException
Read an entire file, byte by byte, into a byte array, ignoring any locale issues.
- Parameters:
file
- A file to be read.
- Returns:
- A byte array with the contents of the file.
- Throws:
IOFailure
- on IO trouble reading the file, or the file does not exist
IndexOutOfBoundsException
- If the file is too large to be in an array.
-
writeBinaryFile
public static void writeBinaryFile(File file, byte[] b)
Write an entire byte array to a file, ignoring any locale issues.
- Parameters:
file
- The file to write the data to
b
- The byte array to write to the file
- Throws:
IOFailure
- If an exception occurs during the writing.
-
getXmlFilesFilter
public static FilenameFilter getXmlFilesFilter()
Return a filter that only accepts XML files (ending with .xml), irrespective of their location.
- Returns:
- A new filter for XML files.
-
readListFromFile
public static List<String> readListFromFile(File file)
Read all lines from a file into a list of strings.
- Parameters:
file
- The file to read from.
- Returns:
- The list of lines.
- Throws:
IOFailure
- on trouble reading the file, or if the file does not exist
-
writeCollectionToFile
public static void writeCollectionToFile(File file, Collection<String> collection)
Writes a collection of strings to a file, each string on one line.
- Parameters:
file
- A file to write to. The contents of this file will be overwritten.
collection
- The collection to write. The order it will be written in is unspecified.
- Throws:
IOFailure
- if any error occurs writing to the file.
ArgumentNotValid
- if file or collection is null.
-
makeSortedFile
public static void makeSortedFile(File unsortedFile, File sortedOutput)
Sort a file into another. The current implementation slurps all lines into memory, so it will not scale to arbitrarily large files.
- Parameters:
unsortedFile
- A file to sort
sortedOutput
- The file to sort into
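The in-memory approach mentioned above can be sketched as read everything, sort, write. The natural String ordering and UTF-8 encoding used here are assumptions, since this page does not state the comparison or charset the real method uses; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SortedFileSketch {
    static void sortInto(Path unsorted, Path sorted) {
        try {
            // Every line is held in memory at once, so memory use grows with file size.
            List<String> lines =
                    new ArrayList<>(Files.readAllLines(unsorted, StandardCharsets.UTF_8));
            Collections.sort(lines);
            Files.write(sorted, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes three lines out of order, sorts them into a second file, and reads it back.
    static List<String> demo() {
        try {
            Path in = Files.createTempFile("unsorted", ".txt");
            Path out = Files.createTempFile("sorted", ".txt");
            Files.write(in, Arrays.asList("b", "c", "a"), StandardCharsets.UTF_8);
            sortInto(in, out);
            return Files.readAllLines(out, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```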
-
removeLineFromFile
public static void removeLineFromFile(String line, File file)
Remove a line from a given file.
- Parameters:
line
- The full line to remove
file
- The file to remove the line from. This file will be rewritten in full, and the entire contents will be kept in memory
- Throws:
UnknownID
- If the file does not exist
-
createDir
public static boolean createDir(File dir) throws PermissionDenied
Check if the directory exists, and create it if needed. The complete path down to the directory is created. If the directory creation fails, a PermissionDenied exception is thrown. If the directory is not writable, a warning is logged.
- Parameters:
dir
- The directory to create
- Returns:
- true if dir created.
- Throws:
ArgumentNotValid
- If dir is null or its name is the empty string
PermissionDenied
- If directory cannot be created for any reason
-
getBytesFree
public static long getBytesFree(File f)
Returns the number of bytes free on the file system by calling the FreeSpaceProvider class defined by the setting CommonSettings.FREESPACE_PROVIDER_CLASS (a.k.a. settings.common.freespaceprovider.class).
- Parameters:
f
- a given file
- Returns:
- the number of bytes free defined in the settings.xml
-
relativeTo
public static String relativeTo(File theFile, File theDir)
- Parameters:
theFile
- A file to make relative
theDir
- A directory
- Returns:
- the file path of theFile relative to theDir; null if theFile is not relative to theDir, or if theDir is not a directory.
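The return contract above (a relative path, or null in the two listed cases) can be illustrated with java.nio.file.Path. This is a sketch under assumptions: the real method may compare paths differently (for example by canonical path), and the class name is hypothetical.

```java
import java.io.File;
import java.nio.file.Path;

final class RelativeToSketch {
    static String relativeTo(File theFile, File theDir) {
        if (!theDir.isDirectory()) {
            return null; // second documented null case
        }
        Path dir = theDir.toPath().toAbsolutePath().normalize();
        Path file = theFile.toPath().toAbsolutePath().normalize();
        if (!file.startsWith(dir)) {
            return null; // theFile does not live under theDir
        }
        return dir.relativize(file).toString();
    }

    public static void main(String[] args) {
        File here = new File(".");
        System.out.println(relativeTo(new File(here, "a/b.txt"), here));
        System.out.println(relativeTo(new File("x"), new File("no-such-dir-for-this-sketch")));
    }
}
```

Note that the file itself does not need to exist for the relative path to be computed in this sketch; only theDir is checked.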
-
countLines
public static long countLines(File file)
Count the number of lines in a file.
- Parameters:
file
- the file to read
- Returns:
- the number of lines in the file
- Throws:
IOFailure
- If an error occurred while reading the file
-
getEphemeralInputStream
public static InputStream getEphemeralInputStream(File file)
Create an InputStream that reads from a file but removes the file when all data has been read.
- Parameters:
file
- A file to read. This file will be deleted when the input stream is closed, finalized, reaches end-of-file, or when the VM closes.
- Returns:
- An InputStream containing the file's contents.
- Throws:
IOFailure
- If an error occurs in creating the ephemeral input stream
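One way to get a stream that cleans up its own file with the standard library alone is StandardOpenOption.DELETE_ON_CLOSE. The sketch below covers only the "deleted when the stream is closed" case; it is not how this method is implemented, it does not reproduce the end-of-file, finalization, or VM-exit behaviour listed above, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class EphemeralStreamSketch {
    // The JDK makes a best-effort attempt to delete the file once the stream is closed.
    static InputStream open(Path file) throws IOException {
        return Files.newInputStream(file, StandardOpenOption.DELETE_ON_CLOSE);
    }

    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Writes a temp file, reads it through the ephemeral stream, then checks it is gone.
    static boolean demo() {
        try {
            Path tmp = Files.createTempFile("ephemeral", ".txt");
            Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            String content;
            try (InputStream in = open(tmp)) {
                content = readAll(in);
            }
            return content.equals("hello") && !Files.exists(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```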
-
makeValidFileFromExisting
public static File makeValidFileFromExisting(String filename) throws IOFailure
Makes a valid file from filename passed in String. Ensures that the File object returned is not null, and that isFile() returns true.
- Parameters:
filename
- The file to create the File object from
- Returns:
- A valid, non-null File object.
- Throws:
IOFailure
- if file cannot be created.
-
writeFileToStream
public static void writeFileToStream(File f, OutputStream out)
Write the entire contents of a file to a stream.
- Parameters:
f
- A file to write to the stream.
out
- The stream to write to.
- Throws:
IOFailure
- If any error occurs while writing the file to a stream
-
writeStreamToFile
public static void writeStreamToFile(InputStream in, File f)
Write the contents of a stream into a file.
- Parameters:
in
- A stream to read from. This stream is not closed by this method.
f
- The file to write the stream contents into.
- Throws:
IOFailure
- If any error occurs while writing the stream to a file
-
getTempDir
public static File getTempDir()
Get the location of the standard temporary directory. The existence of this directory should be ensured at the start of every application.
- Returns:
- The directory that should be used for temporary files.
-
moveFile
public static void moveFile(File fromFile, File toFile)
Attempt to move a file using rename, and if that fails, move the file by copy-and-delete.
- Parameters:
fromFile
- The source
toFile
- The target
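The rename-then-fallback strategy can be sketched as below. This is an illustration, not the library code: errors are reported as UncheckedIOException rather than whatever the real method throws, overwriting an existing target in the fallback is an assumption, and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

final class MoveFileSketch {
    static void move(File from, File to) {
        // A rename is cheap but can fail, for example across file systems.
        if (from.renameTo(to)) {
            return;
        }
        try {
            // Fallback: copy the bytes, then delete the source.
            Files.copy(from.toPath(), to.toPath(), StandardCopyOption.REPLACE_EXISTING);
            Files.delete(from.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Moves a 7-byte file inside a temporary directory and checks both ends.
    static boolean demo() {
        try {
            File dir = Files.createTempDirectory("move-sketch").toFile();
            File from = new File(dir, "from.txt");
            File to = new File(dir, "to.txt");
            Files.write(from.toPath(), "payload".getBytes(StandardCharsets.UTF_8));
            move(from, to);
            return !from.exists() && to.length() == 7;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```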
-
generateFileNameFromSet
public static <T extends Comparable<T>> String generateFileNameFromSet(Set<T> IDs, String suffix)
Given a set, generate a reasonable file name from the set.
- Type Parameters:
T
- The type of objects that the Set IDs argument contains.
- Parameters:
IDs
- A set of IDs.
suffix
- A suffix. May be empty string.
- Returns:
- A reasonable file name.
-
sortCrawlLog
public static void sortCrawlLog(File file, File toFile)
Sort a crawl.log file according to the url.
- Parameters:
file
- The file containing the unsorted data.
toFile
- The file that the sorted data can be put into.
- Throws:
IOFailure
- if there were errors running the sort process, or if the file does not exist.
-
sortCrawlLogOnTimestamp
public static void sortCrawlLogOnTimestamp(File file, File toFile)
Sort a crawl.log file according to the timestamp.
- Parameters:
file
- The file containing the unsorted data.
toFile
- The file that the sorted data can be put into.
- Throws:
IOFailure
- if there were errors running the sort process, or if the file does not exist.
-
sortCDX
public static void sortCDX(File file, File toFile)
Sort a CDX file according to our standard for CDX file sorting. This method depends on the Unix sort command.
- Parameters:
file
- The raw unsorted CDX file.
toFile
- The file that the result will be put into.
- Throws:
IOFailure
- If the file does not exist, or could not be sorted
-
sortFile
public static void sortFile(File file, File toFile)
Sort a file using UNIX sort.
- Parameters:
file
- the file that you want to sort.
toFile
- The destination file.
-
createUniqueTempDir
public static File createUniqueTempDir(File inDir, String prefix)
Creates a new temporary directory with a unique name. This directory will be deleted automatically at the end of the VM (though behaviour if there are files in it is undefined). This method will try a limited number of times to create a directory, using a randomly generated suffix, before giving up.
- Parameters:
inDir
- The directory where the temporary directory should be created.
prefix
- The prefix of the directory name, for identification purposes.
- Returns:
- A newly created directory that no other call to createUniqueTempDir returns.
- Throws:
ArgumentNotValid
- if inDir is not an existing directory that can be written to.
IOFailure
- if a free name couldn't be found within a reasonable number of tries.
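The bounded retry loop described above might be sketched like this. The retry count of 10, the hexadecimal random suffix, and the stand-in exception types (IllegalArgumentException, IllegalStateException instead of ArgumentNotValid and IOFailure) are all assumptions made for illustration; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.concurrent.ThreadLocalRandom;

final class UniqueTempDirSketch {
    // Hypothetical limit; this page only says "a limited number of times".
    static final int ASSUMED_MAX_TRIES = 10;

    static File create(File inDir, String prefix) {
        if (!inDir.isDirectory() || !inDir.canWrite()) {
            throw new IllegalArgumentException("Not a writable directory: " + inDir);
        }
        for (int attempt = 0; attempt < ASSUMED_MAX_TRIES; attempt++) {
            String suffix = Long.toHexString(ThreadLocalRandom.current().nextLong());
            File candidate = new File(inDir, prefix + suffix);
            // mkdir() returns false if the name is already taken, so success means uniqueness.
            if (candidate.mkdir()) {
                candidate.deleteOnExit(); // only removes the directory if it is empty at VM exit
                return candidate;
            }
        }
        throw new IllegalStateException("No free name found in " + ASSUMED_MAX_TRIES + " tries");
    }

    // Creates two directories with the same prefix and checks that they differ.
    static boolean demo() {
        try {
            File base = Files.createTempDirectory("unique-sketch").toFile();
            File a = create(base, "job-");
            File b = create(base, "job-");
            return a.isDirectory() && b.isDirectory() && !a.equals(b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Relying on the return value of mkdir(), rather than checking exists() first, avoids a race between the check and the creation.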
-
readLastLine
public static String readLastLine(File file)
Read the last line in a file. Note this method is not UTF-8 safe.
- Parameters:
file
- input file to read last line from.
- Returns:
- The last line in the file (ending newline is irrelevant), returns an empty string if file is empty.
- Throws:
ArgumentNotValid
- on null argument, or file is not a readable file.
IOFailure
- on IO trouble reading file.
-
appendToFile
public static void appendToFile(File file, String... lines)
Append the given lines to a file. Each line is terminated by a newline.
- Parameters:
file
- A file to append to.
lines
- The lines to write.
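Appending lines with a terminator after each one can be sketched with Files.write and the APPEND option. UTF-8, creating the file when it is missing, and the platform line separator that Files.write emits are assumptions here, not documented behaviour of this method; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

final class AppendToFileSketch {
    static void append(Path file, String... lines) {
        try {
            // Files.write terminates each element with the platform line separator.
            Files.write(file, Arrays.asList(lines), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends in two calls and reads the file back to show that earlier lines survive.
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("append-sketch", ".log");
            append(file, "first");
            append(file, "second", "third");
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```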
-
getResourceFileFromClassPath
public static File getResourceFileFromClassPath(String filePath) throws IOFailure
Loads a file from the class path (for retrieving a file from '.jar').
- Parameters:
filePath
- The path of the file.
- Returns:
- The file from the class path.
- Throws:
IOFailure
- If resource cannot be retrieved from the class path.
-
getHumanReadableFileSize
public static String getHumanReadableFileSize(File aFile)
Get a human-readable representation of the file size. If the file is a directory, the size is the aggregate of the files in the directory, except that subdirectories are ignored. The number is given with 2 decimals.
- Parameters:
aFile
- a File object
- Returns:
- a humanly readable representation of the file size (rounded)
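The two rules stated above (sum only the direct files of a directory, print two decimals) can be sketched as follows. The unit labels and the 1024 step between units are assumptions, since this page does not say which units the real method uses; the class and method names are hypothetical.

```java
import java.io.File;
import java.util.Locale;

final class HumanReadableSizeSketch {
    // For a directory, adds up direct child files only; subdirectories are ignored.
    static long sizeOf(File f) {
        if (!f.isDirectory()) {
            return f.length();
        }
        long total = 0;
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                if (child.isFile()) {
                    total += child.length();
                }
            }
        }
        return total;
    }

    // Assumed units with a factor of 1024 between them; always two decimals.
    static String format(long bytes) {
        String[] units = {"bytes", "KB", "MB", "GB", "TB"};
        double value = bytes;
        int unit = 0;
        while (value >= 1024 && unit < units.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.2f %s", value, units[unit]);
    }

    public static void main(String[] args) {
        System.out.println(format(1536)); // prints 1.50 KB
        System.out.println(format(sizeOf(new File("."))));
    }
}
```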
-
hasFiles
public static boolean hasFiles(File aDir)
- Parameters:
aDir
- A directory
- Returns:
- true, if the given directory contains files; else returns false
-
-