dk.netarkivet.common.utils
Class FileUtils

java.lang.Object
  extended by dk.netarkivet.common.utils.FileUtils

public class FileUtils
extends java.lang.Object

Miscellaneous file utilities.


Nested Class Summary
static class FileUtils.FilenameParser
          A class for parsing an ARC filename as generated by our runs of Heritrix and retrieving components like harvestID and jobID.
 
Field Summary
static java.lang.String ARC_EXTENSION
          Extension used for ARC files, including the "." separator.
static java.lang.String ARC_GZIPPED_EXTENSION
          Extension used for gzipped ARC files, including the "." separator.
static java.lang.String ARC_PATTERN
          Pattern matching ARC files, including the "." separator.
static java.io.FilenameFilter ARCS_FILTER
          A filter that matches ARC files, that is, any file whose name ends with .arc or .arc.gz, in any letter case.
static java.lang.String CDX_EXTENSION
          Extension used for CDX files, including the "." separator.
static java.io.FilenameFilter CDX_FILE_FILTER
          A FilenameFilter accepting a file if and only if its name (transformed to lower case) ends with ".cdx".
static org.apache.commons.logging.Log log
           
static int MAX_IDS_IN_FILENAME
          Maximum number of IDs we will put in a filename.
static java.lang.String OPEN_ARC_PATTERN
          Pattern matching open ARC files, including the "." separator.
static java.io.FilenameFilter OPEN_ARCS_FILTER
          A filter that matches files left open by a crashed Heritrix process.
 
Constructor Summary
FileUtils()
           
 
Method Summary
static void appendToFile(java.io.File file, java.lang.String... lines)
          Append the given lines to a file.
static void copyDirectory(java.io.File from, java.io.File to)
          Copy an entire directory from one location to another.
static void copyFile(java.io.File from, java.io.File to)
          Copy file from one location to another.
static long countLines(java.io.File file)
          Count the number of lines in a file.
static boolean createDir(java.io.File dir)
          Check that the directory exists and is writable, creating it if needed.
static java.io.File createUniqueTempDir(java.io.File inDir, java.lang.String prefix)
          Creates a new temporary directory with a unique name.
static java.lang.String formatFilename(java.lang.String filename)
          Returns a valid filename for most filesystems.
static <T extends java.lang.Comparable<T>> java.lang.String generateFileNameFromSet(java.util.Set<T> IDs, java.lang.String suffix)
          Given a set, generate a reasonable file name from the set.
static long getBytesFree(java.io.File f)
          Returns the number of bytes free on the file system that the given file resides on.
static java.io.InputStream getEphemeralInputStream(java.io.File file)
          Create an InputStream that reads from a file but removes the file when all data has been read.
static java.util.List getFilesRecursively(java.lang.String dir, java.util.List<java.io.File> files, java.lang.String type)
          Retrieves all files whose names end with 'type' from directory 'dir' and all its subdirectories.
static java.io.File getTempDir()
          Get the location of the standard temporary directory.
static java.io.FilenameFilter getXmlFilesFilter()
          Return a filter that only accepts XML files (ending with .xml).
static void makeSortedFile(java.io.File unsortedFile, java.io.File sortedOutput)
          Sort a file into another.
static java.io.File makeValidFileFromExisting(java.lang.String filename)
          Makes a valid file from filename passed in String.
static void moveFile(java.io.File fromFile, java.io.File toFile)
          Attempt to move a file using rename, and if that fails, move the file by copy-and-delete.
static byte[] readBinaryFile(java.io.File file)
          Read an entire file, byte by byte, into a byte array, ignoring any locale issues.
static java.lang.String readFile(java.io.File filename)
          Load file content into text string.
static java.lang.String readLastLine(java.io.File file)
          Read the last line in a file.
static java.util.List<java.lang.String> readListFromFile(java.io.File file)
          Read all lines from a file into a list of strings.
static java.lang.String relativeTo(java.io.File theFile, java.io.File crawlDir)
          Returns the path of a file relative to a given directory.
static boolean remove(java.io.File f)
          Remove a file.
static void removeLineFromFile(java.lang.String line, java.io.File file)
          Remove a line from a given file.
static boolean removeRecursively(java.io.File f)
          Remove a file and any subfiles in case of directories.
static void writeBinaryFile(java.io.File file, byte[] b)
          Write an entire byte array to a file, ignoring any locale issues.
static void writeCollectionToFile(java.io.File file, java.util.Collection<java.lang.String> collection)
          Writes a collection of strings to a file, each string on one line.
static void writeFileToStream(java.io.File f, java.io.OutputStream out)
          Write the entire contents of a file to a stream.
static void writeStreamToFile(java.io.InputStream in, java.io.File f)
          Write the contents of a stream into a file.
static void writeXmlToFile(org.dom4j.Document doc, java.io.File f)
          Write document tree to file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CDX_EXTENSION

public static final java.lang.String CDX_EXTENSION
Extension used for CDX files, including the "." separator.

See Also:
Constant Field Values

ARC_EXTENSION

public static final java.lang.String ARC_EXTENSION
Extension used for ARC files, including the "." separator.

See Also:
Constant Field Values

ARC_GZIPPED_EXTENSION

public static final java.lang.String ARC_GZIPPED_EXTENSION
Extension used for gzipped ARC files, including the "." separator.

See Also:
Constant Field Values

ARC_PATTERN

public static final java.lang.String ARC_PATTERN
Pattern matching ARC files, including the "." separator. Note: (?i) makes the match case-insensitive, (\\.gz)? optionally matches .gz, and $ matches the end of the input. Thus this pattern matches file.arc.gz, file.ARC, and file.aRc.GZ, but not file.ARC.open.

See Also:
Constant Field Values

OPEN_ARC_PATTERN

public static final java.lang.String OPEN_ARC_PATTERN
Pattern matching open ARC files, including the "." separator. Note: (?i) makes the match case-insensitive, (\\.gz)? optionally matches .gz, and $ matches the end of the input. Thus this pattern matches file.arc.gz.open, file.ARC.open, and file.arc.GZ.OpEn, but not file.ARC.open.txt.

See Also:
Constant Field Values
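The pattern notes above can be checked with java.util.regex. This is a minimal sketch: the class ArcPatternDemo is hypothetical, and the two pattern strings are reconstructed from the notes rather than copied from Constant Field Values. Because the patterns describe only the end of a name, the sketch calls Matcher.find() rather than matches().

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern strings are reconstructed from the notes above.
class ArcPatternDemo {
    // Assumed values: (?i) = case-insensitive, (\\.gz)? = optional ".gz", $ = end of name.
    static final String ARC_PATTERN = "(?i)\\.arc(\\.gz)?$";
    static final String OPEN_ARC_PATTERN = "(?i)\\.arc(\\.gz)?\\.open$";

    private static final Pattern ARC = Pattern.compile(ARC_PATTERN);
    private static final Pattern OPEN_ARC = Pattern.compile(OPEN_ARC_PATTERN);

    // find() rather than matches(): the patterns describe only the end of a name.
    static boolean isArc(String name) {
        return ARC.matcher(name).find();
    }

    static boolean isOpenArc(String name) {
        return OPEN_ARC.matcher(name).find();
    }

    public static void main(String[] args) {
        String[] names = {"file.arc.gz", "file.ARC", "file.aRc.GZ", "file.ARC.open",
                          "file.arc.GZ.OpEn", "file.ARC.open.txt"};
        for (String name : names) {
            System.out.println(name + ": arc=" + isArc(name) + ", open=" + isOpenArc(name));
        }
    }
}
```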

log

public static final org.apache.commons.logging.Log log

CDX_FILE_FILTER

public static final java.io.FilenameFilter CDX_FILE_FILTER
A FilenameFilter accepting a file if and only if its name (transformed to lower case) ends with ".cdx".


OPEN_ARCS_FILTER

public static final java.io.FilenameFilter OPEN_ARCS_FILTER
A filter that matches files left open by a crashed Heritrix process. Do not process these files while Heritrix is still writing to them.


ARCS_FILTER

public static final java.io.FilenameFilter ARCS_FILTER
A filter that matches ARC files, that is, any file whose name ends with .arc or .arc.gz, in any letter case.
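The three FilenameFilter fields above can be approximated with lambdas. In this hypothetical sketch (ArcFiltersSketch is not part of the library), ARCS_FILTER and OPEN_ARCS_FILTER are assumed to be built on the ARC_PATTERN and OPEN_ARC_PATTERN expressions, reconstructed from their documented notes.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of the three filter fields; not the dk.netarkivet source.
class ArcFiltersSketch {
    // Assumed to reuse the ARC_PATTERN / OPEN_ARC_PATTERN regular expressions.
    private static final Pattern ARC = Pattern.compile("(?i)\\.arc(\\.gz)?$");
    private static final Pattern OPEN_ARC = Pattern.compile("(?i)\\.arc(\\.gz)?\\.open$");

    // Lower-cases the name (locale-independently) before checking the suffix.
    static final FilenameFilter CDX_FILE_FILTER =
            (dir, name) -> name.toLowerCase(Locale.ROOT).endsWith(".cdx");

    static final FilenameFilter ARCS_FILTER = (dir, name) -> ARC.matcher(name).find();

    static final FilenameFilter OPEN_ARCS_FILTER = (dir, name) -> OPEN_ARC.matcher(name).find();

    // Usage: list the finished ARC files in a directory (null if dir is not a directory).
    static File[] listArcFiles(File dir) {
        return dir.listFiles(ARCS_FILTER);
    }
}
```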


MAX_IDS_IN_FILENAME

public static final int MAX_IDS_IN_FILENAME
Maximum number of IDs we will put in a filename. Above this number, a checksum of the IDs is generated instead. This keeps filenames from becoming too long for the filesystem.

See Also:
Constant Field Values
Constructor Detail

FileUtils

public FileUtils()
Method Detail

removeRecursively

public static final boolean removeRecursively(java.io.File f)
Remove a file and any subfiles in case of directories.

Parameters:
f - A file to completely and utterly remove.
Returns:
true if the file did exist, false otherwise.
Throws:
java.lang.SecurityException - If a security manager exists and its SecurityManager.checkDelete(java.lang.String) method denies delete access to the file
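A recursive delete along the lines described for removeRecursively might look like the sketch below. It is an illustration, not the dk.netarkivet implementation; in particular, refusing to descend through symbolic links is a safety choice of this hypothetical sketch, and the return value follows the documented "true if the file did exist" contract.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.LinkOption;

class RemoveRecursivelySketch {
    /** Deletes f and, for a directory, everything below it. Returns whether f existed. */
    static boolean removeRecursively(File f) {
        if (!Files.exists(f.toPath(), LinkOption.NOFOLLOW_LINKS)) {
            return false;
        }
        // Do not descend through symbolic links: only the link itself is removed.
        if (!Files.isSymbolicLink(f.toPath()) && f.isDirectory()) {
            File[] children = f.listFiles(); // null on I/O error
            if (children != null) {
                for (File child : children) {
                    removeRecursively(child);
                }
            }
        }
        f.delete(); // best effort; a failed delete is not reported in this sketch
        return true;
    }
}
```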

remove

public static final boolean remove(java.io.File f)
Remove a file.

Parameters:
f - A file to completely and utterly remove.
Returns:
true if the file did exist, false otherwise.
Throws:
ArgumentNotValid - if f is null.
java.lang.SecurityException - If a security manager exists and its SecurityManager.checkDelete(java.lang.String) method denies delete access to the file

formatFilename

public static java.lang.String formatFilename(java.lang.String filename)
Returns a valid filename for most filesystems by replacing the following characters with an underscore:

" " -> "_"
":" -> "_"

Parameters:
filename - the filename to format correctly
Returns:
a new formatted filename
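The two documented substitutions amount to a pair of character replacements. A minimal sketch with a hypothetical class name; any further characters the real method may replace are not covered.

```java
// Hypothetical sketch covering only the two documented substitutions.
class FormatFilenameSketch {
    static String formatFilename(String filename) {
        return filename.replace(' ', '_').replace(':', '_');
    }

    public static void main(String[] args) {
        System.out.println(formatFilename("job 42: run")); // job_42__run
    }
}
```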

getFilesRecursively

public static java.util.List getFilesRecursively(java.lang.String dir,
                                                 java.util.List<java.io.File> files,
                                                 java.lang.String type)
Retrieves all files whose names end with 'type' from directory 'dir' and all its subdirectories.

Parameters:
dir - Path of base directory
files - Initially, an empty list (e.g. an ArrayList)
type - The extension/ending of the files to retrieve (e.g. ".xml", ".ARC")
Returns:
A list of files from directory 'dir' and all its subdirectories
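The accumulator-style contract (pass in an empty list, get the filled list back) can be sketched as a depth-first walk. Assumptions in this hypothetical sketch: the suffix match is case-sensitive, symbolic links to directories are followed, and the return type is narrowed from the raw List in the signature to List<File>.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class FilesRecursivelySketch {
    /** Adds every file under dir whose name ends with type to files, and returns files. */
    static List<File> getFilesRecursively(String dir, List<File> files, String type) {
        File[] entries = new File(dir).listFiles();
        if (entries == null) {
            return files; // dir is not a directory, or could not be read
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                getFilesRecursively(entry.getPath(), files, type);
            } else if (entry.getName().endsWith(type)) { // case-sensitive (assumption)
                files.add(entry);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : ".";
        List<File> xml = getFilesRecursively(dir, new ArrayList<>(), ".xml");
        System.out.println(xml.size() + " XML files found");
    }
}
```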

readFile

public static final java.lang.String readFile(java.io.File filename)
                                       throws java.io.FileNotFoundException,
                                              java.io.IOException
Load file content into text string.

Parameters:
filename - file to load
Returns:
file content loaded into text string
Throws:
java.io.FileNotFoundException
java.io.IOException

copyFile

public static final void copyFile(java.io.File from,
                                  java.io.File to)
Copy file from one location to another. Will silently overwrite an already existing file.

Parameters:
from - original to copy
to - destination of copy
Throws:
IOFailure - if an io error occurs while copying file.

copyDirectory

public static final void copyDirectory(java.io.File from,
                                       java.io.File to)
                                throws IOFailure
Copy an entire directory from one location to another. Note that this will silently overwrite old files, just like copyFile().

Parameters:
from - Original directory (or file, for that matter) to copy.
to - Destination directory, i.e. the 'new name' of the copy of the from directory.
Throws:
IOFailure
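A recursive copy with the documented overwrite behaviour can be sketched with java.nio.file. This is an assumption-laden illustration: the class name is hypothetical, and UncheckedIOException stands in for the library's IOFailure.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

class CopyDirectorySketch {
    /** Copies from (a directory or a single file) to to, overwriting existing files. */
    static void copyDirectory(File from, File to) {
        Path src = from.toPath();
        Path dst = to.toPath();
        // Files.walk visits a directory before its contents, so parents are created first.
        try (Stream<Path> paths = Files.walk(src)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                Path target = dst.resolve(src.relativize(p));
                if (Files.isDirectory(p)) {
                    Files.createDirectories(target);
                } else {
                    Path parent = target.getParent();
                    if (parent != null) {
                        Files.createDirectories(parent);
                    }
                    Files.copy(p, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stand-in for IOFailure
        }
    }
}
```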

readBinaryFile

public static byte[] readBinaryFile(java.io.File file)
                             throws IOFailure
Read an entire file, byte by byte, into a byte array, ignoring any locale issues.

Parameters:
file - A file to be read.
Returns:
A byte array with the contents of the file.
Throws:
IOFailure

writeBinaryFile

public static void writeBinaryFile(java.io.File file,
                                   byte[] b)
Write an entire byte array to a file, ignoring any locale issues.

Parameters:
file - The file to write the data to
b - The byte array to write to the file
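Reading and writing without any character decoding maps directly onto Files.readAllBytes and Files.write. A minimal sketch with a hypothetical class name; UncheckedIOException stands in for IOFailure, and Files.readAllBytes cannot read files larger than about 2 GB.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class BinaryFileSketch {
    /** Reads the raw bytes of file; no character decoding takes place. */
    static byte[] readBinaryFile(File file) {
        try {
            return Files.readAllBytes(file.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stand-in for IOFailure
        }
    }

    /** Creates or truncates file and writes b to it unchanged. */
    static void writeBinaryFile(File file, byte[] b) {
        try {
            Files.write(file.toPath(), b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```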

getXmlFilesFilter

public static java.io.FilenameFilter getXmlFilesFilter()
Return a filter that only accepts XML files (ending with .xml).

Returns:
A new filter for XML files.

readListFromFile

public static java.util.List<java.lang.String> readListFromFile(java.io.File file)
Read all lines from a file into a list of strings.

Parameters:
file - The file to read from.
Returns:
The list of lines.
Throws:
IOFailure - on trouble reading the file.

writeXmlToFile

public static void writeXmlToFile(org.dom4j.Document doc,
                                  java.io.File f)
                           throws IOFailure
Write document tree to file.

Parameters:
doc - the document tree to save.
f - the file to write the document to.
Throws:
IOFailure - On trouble writing XML file to disk.

writeCollectionToFile

public static void writeCollectionToFile(java.io.File file,
                                         java.util.Collection<java.lang.String> collection)
Writes a collection of strings to a file, each string on one line.

Parameters:
file - A file to write to. The contents of this file will be overwritten.
collection - The collection to write. The order it will be written in is unspecified.
Throws:
IOFailure - if any error occurs writing to the file.
ArgumentNotValid - if file or collection is null.
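writeCollectionToFile and readListFromFile form a natural pair. The sketch below assumes UTF-8, which this page does not specify, and uses JDK exceptions in place of ArgumentNotValid and IOFailure; the class name is hypothetical. Files.write ends every line, including the last, with the platform line separator.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collection;
import java.util.List;
import java.util.Objects;

class LineListSketch {
    static void writeCollectionToFile(File file, Collection<String> collection) {
        Objects.requireNonNull(file, "file");             // ArgumentNotValid stand-in
        Objects.requireNonNull(collection, "collection");
        try {
            // Overwrites the file; one element per line, in the collection's iteration order.
            Files.write(file.toPath(), collection, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stand-in for IOFailure
        }
    }

    static List<String> readListFromFile(File file) {
        try {
            return Files.readAllLines(file.toPath(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```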

makeSortedFile

public static void makeSortedFile(java.io.File unsortedFile,
                                  java.io.File sortedOutput)
Sort the lines of a file into another file. The current implementation reads all lines into memory, so it will not scale to arbitrarily large files.

Parameters:
unsortedFile - A file to sort
sortedOutput - The file to sort into
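The in-memory approach described above can be sketched in a few lines. Assumptions of this hypothetical sketch: the file is UTF-8 and lines are compared in natural String order; the real method may differ on both points.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedFileSketch {
    static void makeSortedFile(File unsortedFile, File sortedOutput) {
        try {
            // Copy into an ArrayList: readAllLines does not promise a modifiable list.
            List<String> lines = new ArrayList<>(
                    Files.readAllLines(unsortedFile.toPath(), StandardCharsets.UTF_8));
            Collections.sort(lines); // natural String order (UTF-16 code units)
            Files.write(sortedOutput.toPath(), lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```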

removeLineFromFile

public static void removeLineFromFile(java.lang.String line,
                                      java.io.File file)
Remove a line from a given file.

Parameters:
line - The full line to remove
file - The file to remove the line from. The file is rewritten in full, and its entire contents are held in memory.
Throws:
UnknownID - If the file does not exist
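A sketch of the read-filter-rewrite cycle described above. It assumes that every line exactly equal to line is removed (the documentation does not say whether only the first match goes), assumes UTF-8, and uses IllegalArgumentException in place of UnknownID; the class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

class RemoveLineSketch {
    static void removeLineFromFile(String line, File file) {
        if (!file.exists()) {
            throw new IllegalArgumentException("No such file: " + file); // UnknownID stand-in
        }
        try {
            // The whole file is held in memory, as documented.
            List<String> kept = new ArrayList<>();
            for (String current : Files.readAllLines(file.toPath(), StandardCharsets.UTF_8)) {
                if (!current.equals(line)) {
                    kept.add(current);
                }
            }
            Files.write(file.toPath(), kept, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```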

createDir

public static boolean createDir(java.io.File dir)
                         throws PermissionDenied
Check that the directory exists and is writable, creating it if needed. Any missing parent directories are created as well. If directory creation fails, a PermissionDenied exception is thrown.

Parameters:
dir - The directory to create
Returns:
true if dir was created.
Throws:
ArgumentNotValid - If dir is null or its name is the empty string
PermissionDenied - If directory cannot be created for any reason, or is not writable.
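The check-then-create behaviour can be sketched with File.mkdirs, which creates missing parent directories as well. The class is hypothetical, and IllegalArgumentException and IllegalStateException stand in for ArgumentNotValid and PermissionDenied.

```java
import java.io.File;

class CreateDirSketch {
    /** Ensures dir exists and is writable; returns true only if this call created it. */
    static boolean createDir(File dir) {
        if (dir == null || dir.getPath().isEmpty()) {
            throw new IllegalArgumentException("dir must be non-null and non-empty"); // ArgumentNotValid stand-in
        }
        // mkdirs() also creates any missing parent directories.
        boolean created = !dir.exists() && dir.mkdirs();
        // Re-check afterwards: mkdirs() also returns false if another process won the race.
        if (!dir.isDirectory() || !dir.canWrite()) {
            throw new IllegalStateException("Cannot create or write to " + dir); // PermissionDenied stand-in
        }
        return created;
    }
}
```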

getBytesFree

public static long getBytesFree(java.io.File f)
Returns the number of bytes free on the file system that the given file resides on. Warning: this method is slow and only works on Linux, Windows, and Mac OS X.

Parameters:
f - a given file
Returns:
the number of bytes free on the file system where file f resides.
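Since Java 6, java.io.File offers a portable way to query free space. The sketch below is an alternative, not what the documented method does internally; it uses getUsableSpace(), which takes permissions into account where the operating system supports it, whereas getFreeSpace() would report all unallocated bytes. The class name is hypothetical.

```java
import java.io.File;

class BytesFreeSketch {
    static long getBytesFree(File f) {
        // Bytes available to this JVM on f's partition; 0L if that cannot be determined,
        // for example when f does not exist.
        return f.getUsableSpace();
    }

    public static void main(String[] args) {
        File f = new File(args.length > 0 ? args[0] : ".");
        System.out.println(getBytesFree(f) + " bytes usable");
    }
}
```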

relativeTo

public static java.lang.String relativeTo(java.io.File theFile,
                                          java.io.File crawlDir)
Returns the path of theFile relative to crawlDir.

Parameters:
theFile - A file to make relative
crawlDir - A directory
Returns:
the file path of theFile relative to crawlDir, or null if theFile is not located under crawlDir or if crawlDir is not a directory.
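The documented contract, including the two null cases, can be sketched with Path.relativize. This hypothetical version compares absolute, normalized paths without resolving symbolic links; Path.startsWith compares whole name elements, so a sibling such as /tmp-other is not treated as inside /tmp.

```java
import java.io.File;
import java.nio.file.Path;

class RelativeToSketch {
    static String relativeTo(File theFile, File crawlDir) {
        if (!crawlDir.isDirectory()) {
            return null; // documented: null if crawlDir is not a directory
        }
        Path base = crawlDir.getAbsoluteFile().toPath().normalize();
        Path target = theFile.getAbsoluteFile().toPath().normalize();
        // startsWith compares whole name elements, not raw string prefixes.
        if (!target.startsWith(base)) {
            return null; // documented: null if theFile is not under crawlDir
        }
        return base.relativize(target).toString();
    }
}
```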

countLines

public static long countLines(java.io.File file)
Count the number of lines in a file.

Parameters:
file - the file to read
Returns:
the number of lines in the file
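A line count can be sketched with BufferedReader.readLine, which treats \n, \r and \r\n as terminators. The sketch decodes as ISO-8859-1 so that no input can trigger a decoding error; for UTF-8 or ASCII files this gives the same count, since line-terminator bytes never occur inside multi-byte UTF-8 sequences. The class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class CountLinesSketch {
    static long countLines(File file) {
        // ISO-8859-1 maps every byte to a char, so decoding never fails.
        try (BufferedReader reader = Files.newBufferedReader(file.toPath(), StandardCharsets.ISO_8859_1)) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```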

getEphemeralInputStream

public static java.io.InputStream getEphemeralInputStream(java.io.File file)
Create an InputStream that reads from a file but removes the file when all data has been read.

Parameters:
file - A file to read. This file will be deleted when the input stream is closed or finalized, when it reaches end-of-file, or when the VM exits.
Returns:
An InputStream containing the file's contents.
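A self-deleting stream can be sketched by wrapping a FileInputStream in a FilterInputStream that deletes the file on close() and on end-of-file, with File.deleteOnExit() as the fallback for VM exit. This hypothetical sketch omits finalization, which is deprecated in current Java versions, and uses UncheckedIOException for a missing file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class EphemeralStreamSketch {
    static InputStream getEphemeralInputStream(File file) {
        final FileInputStream in;
        try {
            in = new FileInputStream(file);
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
        file.deleteOnExit(); // fallback if the stream is never closed or drained
        return new FilterInputStream(in) {
            private boolean closed;

            @Override
            public int read() throws IOException {
                if (closed) {
                    return -1;
                }
                int b = super.read();
                if (b == -1) {
                    close(); // end-of-file: release and delete
                }
                return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                if (closed) {
                    return -1;
                }
                int n = super.read(buf, off, len);
                if (n == -1) {
                    close();
                }
                return n;
            }

            @Override
            public void close() throws IOException {
                if (!closed) {
                    closed = true;
                    super.close(); // close first so the delete also works on Windows
                    file.delete();
                }
            }
        };
    }
}
```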

makeValidFileFromExisting

public static java.io.File makeValidFileFromExisting(java.lang.String filename)
                                              throws IOFailure
Makes a valid file from filename passed in String. Ensures that the File object returned is not null, and that isFile() returns true.

Parameters:
filename - The file to create the File object from
Returns:
A valid, non-null File object.
Throws:
IOFailure

writeFileToStream

public static void writeFileToStream(java.io.File f,
                                     java.io.OutputStream out)
Write the entire contents of a file to a stream.

Parameters:
f - A file to write to the stream.
out - The stream to write to.

writeStreamToFile

public static void writeStreamToFile(java.io.InputStream in,
                                     java.io.File f)
Write the contents of a stream into a file.

Parameters:
in - A stream to read from. This stream is not closed by this method.
f - The file to write the stream contents into.
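Both stream methods reduce to a buffered copy loop. In this sketch, writeStreamToFile leaves in open, as documented; leaving out open in writeFileToStream is an assumption, since this page does not say. The class name is hypothetical, and UncheckedIOException stands in for the library's error handling.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class StreamCopySketch {
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    /** Writes the file's bytes to out; out is flushed but left open. */
    static void writeFileToStream(File f, OutputStream out) {
        try (InputStream in = new FileInputStream(f)) {
            copy(in, out);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes everything left in in to f, overwriting f; in is not closed. */
    static void writeStreamToFile(InputStream in, File f) {
        try (OutputStream out = new FileOutputStream(f)) {
            copy(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```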

getTempDir

public static java.io.File getTempDir()
Get the location of the standard temporary directory. The existence of this directory should be ensured at the start of every application.

Returns:
The directory that should be used for temporary files.

moveFile

public static void moveFile(java.io.File fromFile,
                            java.io.File toFile)
Attempt to move a file using rename, and if that fails, move the file by copy-and-delete.

Parameters:
fromFile - The source
toFile - The target
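The rename-then-copy strategy can be sketched as follows. Overwriting an existing target in the fallback path is an assumption of this sketch, and the class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

class MoveFileSketch {
    static void moveFile(File fromFile, File toFile) {
        // Fast path: a rename, which typically only succeeds within one file system.
        if (fromFile.renameTo(toFile)) {
            return;
        }
        // Fallback: copy, then delete the source. Replacing an existing target is an assumption.
        try {
            Files.copy(fromFile.toPath(), toFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
            Files.delete(fromFile.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On Java 7 and later, Files.move(source, target, StandardCopyOption.REPLACE_EXISTING) follows a similar strategy for regular files, typically falling back to copy-and-delete when a plain rename is not possible.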

generateFileNameFromSet

public static <T extends java.lang.Comparable<T>> java.lang.String generateFileNameFromSet(java.util.Set<T> IDs,
                                                                                           java.lang.String suffix)
Given a set, generate a reasonable file name from the set.

Parameters:
IDs - A set of IDs.
suffix - A suffix.
Returns:
A reasonable file name.
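Combining this method with MAX_IDS_IN_FILENAME, one plausible scheme is sketched below: sort the IDs for a deterministic result, join them with dashes, and replace the joined string with an MD5 checksum when there are too many IDs. The limit of 4, the dash separator, and MD5 are all assumptions of this hypothetical sketch, not the library's documented behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class FileNameFromSetSketch {
    // Hypothetical limit; the real MAX_IDS_IN_FILENAME value is under Constant Field Values.
    static final int MAX_IDS_IN_FILENAME = 4;

    static <T extends Comparable<T>> String generateFileNameFromSet(Set<T> ids, String suffix) {
        List<T> sorted = new ArrayList<>(ids);
        Collections.sort(sorted); // same set, same name, regardless of iteration order
        StringBuilder joined = new StringBuilder();
        for (T id : sorted) {
            if (joined.length() > 0) {
                joined.append('-');
            }
            joined.append(id);
        }
        String base = joined.toString();
        if (sorted.size() > MAX_IDS_IN_FILENAME) {
            base = md5Hex(base); // fixed length: 32 hex characters
        }
        return base + suffix;
    }

    private static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }
}
```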

createUniqueTempDir

public static java.io.File createUniqueTempDir(java.io.File inDir,
                                               java.lang.String prefix)
Creates a new temporary directory with a unique name. This directory will be deleted automatically at the end of the VM (though behaviour if there are files in it is undefined). This method will try a limited number of times to create a directory, using a randomly generated suffix, before giving up.

Parameters:
inDir - The directory where the temporary directory should be created.
prefix - The prefix of the directory name, for identification purposes.
Returns:
A newly created directory that no other call to createUniqueTempDir will return.
Throws:
ArgumentNotValid - if inDir is not an existing directory that can be written to.
IOFailure - if a free name couldn't be found within a reasonable number of tries.
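Files.createTempDirectory already creates a fresh directory with a random name, so a sketch can replace the documented retry loop with a single call. This is a substitution, not the original technique; JDK exceptions stand in for ArgumentNotValid and IOFailure, and the class name is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class UniqueTempDirSketch {
    static File createUniqueTempDir(File inDir, String prefix) {
        if (inDir == null || !inDir.isDirectory() || !inDir.canWrite()) {
            throw new IllegalArgumentException("Not a writable directory: " + inDir); // ArgumentNotValid stand-in
        }
        try {
            // createTempDirectory appends a random suffix and never returns an existing directory.
            File dir = Files.createTempDirectory(inDir.toPath(), prefix).toFile();
            dir.deleteOnExit(); // at VM exit this only succeeds if the directory is empty
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stand-in for IOFailure
        }
    }
}
```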

readLastLine

public static java.lang.String readLastLine(java.io.File file)
Read the last line in a file. Note this method is not UTF-8 safe.

Parameters:
file - input file to read last line from.
Returns:
The last line in the file, regardless of whether the file ends with a newline; an empty string if the file is empty.
Throws:
ArgumentNotValid - if the argument is null or is not a readable file.
IOFailure - on IO trouble reading file.
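Reading the last line without scanning the whole file can be sketched with RandomAccessFile, walking backwards from the end. Unlike the method documented above, this hypothetical sketch collects the raw bytes of the line before decoding them as UTF-8, which is safe because \n and \r bytes never occur inside multi-byte UTF-8 sequences. It ignores one trailing terminator and reads one byte per seek, which is slow for very long lines.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadLastLineSketch {
    static String readLastLine(File file) {
        if (file == null || !file.isFile() || !file.canRead()) {
            throw new IllegalArgumentException("Not a readable file: " + file); // ArgumentNotValid stand-in
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long end = raf.length();
            // Ignore one trailing terminator: "\n", "\r\n" or "\r".
            if (end > 0) {
                int last = byteAt(raf, end - 1);
                if (last == '\n' || last == '\r') {
                    end--;
                    if (last == '\n' && end > 0 && byteAt(raf, end - 1) == '\r') {
                        end--;
                    }
                }
            }
            // Walk backwards to the byte after the previous line terminator.
            long start = end;
            while (start > 0) {
                int b = byteAt(raf, start - 1);
                if (b == '\n' || b == '\r') {
                    break;
                }
                start--;
            }
            byte[] bytes = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(bytes);
            // Decoding the whole line at once keeps multi-byte UTF-8 characters intact.
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stand-in for IOFailure
        }
    }

    private static int byteAt(RandomAccessFile raf, long pos) throws IOException {
        raf.seek(pos);
        return raf.read();
    }
}
```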

appendToFile

public static void appendToFile(java.io.File file,
                                java.lang.String... lines)
Append the given lines to a file. Each line is terminated by a newline.

Parameters:
file - A file to append to.
lines - The lines to write.
Throws:
IOFailure - if anything goes wrong writing.
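Appending with an explicit "\n" terminator can be sketched with Files.newBufferedWriter and the CREATE and APPEND open options. UTF-8 and creating a missing file are assumptions of this hypothetical sketch; UncheckedIOException stands in for IOFailure.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardOpenOption;

class AppendToFileSketch {
    static void appendToFile(File file, String... lines) {
        try (Writer w = Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n'); // every line, including the last, gets a newline
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stand-in for IOFailure
        }
    }
}
```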