dk.netarkivet.archive.arcrepositoryadmin
Class ReplicaCacheDatabase

java.lang.Object
  extended by dk.netarkivet.archive.arcrepositoryadmin.ReplicaCacheDatabase
All Implemented Interfaces:
BitPreservationDAO, CleanupIF

public final class ReplicaCacheDatabase
extends java.lang.Object
implements BitPreservationDAO

Class for storing the bit preservation cache in a database. This class uses the 'admin.data' file for retrieving the upload status. TODO: this file is extremely large (more than 2000 lines) and should be shortened.


Field Summary
protected static org.apache.commons.logging.Log log
          The log.
 
Method Summary
 void addChecksumInformation(java.util.List<java.lang.String> checksumOutput, Replica replica)
          Given the output of a checksum job, add the results to the database.
 void addFileListInformation(java.util.List<java.lang.String> filelist, Replica replica)
          Method for adding the results from a list of filenames on a replica.
 void changeStateOfReplicafileinfo(java.lang.String filename, Replica replica, ReplicaStoreState state)
          Method for inserting an entry into the database recording that a file upload has begun for a specific replica.
 void changeStateOfReplicafileinfo(java.lang.String filename, java.lang.String checksum, Replica replica, ReplicaStoreState state)
          Method for inserting an entry into the database recording that a file upload has begun for a specific replica.
 void cleanup()
          Method for cleaning up.
 boolean existsFileInDB(java.lang.String filename)
          Checks whether a file is already in the file table in the database.
 Replica getBitarchiveWithGoodFile(java.lang.String filename)
          Method for finding a replica with a valid version of a file.
 Replica getBitarchiveWithGoodFile(java.lang.String filename, Replica badReplica)
          Method for finding a replica with a valid version of a file.
 java.lang.String getChecksum(java.lang.String filename)
          Method for retrieving the checksum for a specific file.
 java.sql.Date getDateOfLastMissingFilesUpdate(Replica replica)
          Get the date for the last file list job.
 java.sql.Date getDateOfLastWrongFilesUpdate(Replica replica)
          Method for retrieving the date for the last update for corrupted files.
static ReplicaCacheDatabase getInstance()
          Method for retrieving the current instance of this class.
 java.lang.Iterable<java.lang.String> getMissingFilesInLastUpdate(Replica replica)
          Method for retrieving the list of the names of the files which were missing for the replica in the last filelist update.
 long getNumberOfFiles(Replica replica)
          Method for retrieving the number of files within a replica.
 long getNumberOfMissingFilesInLastUpdate(Replica replica)
          Method for retrieving the number of files missing from a specific replica.
 long getNumberOfWrongFilesInLastUpdate(Replica replica)
          Method for retrieving the number of files with an incorrect checksum within a replica.
 ReplicaFileInfo getReplicaFileInfo(java.lang.String filename, Replica replica)
          Method for retrieving the entry in the replicafileinfo table for a given file and replica.
 ReplicaStoreState getReplicaStoreState(java.lang.String filename, java.lang.String replicaId)
          Retrieves the ReplicaStoreState for the entry in the replicafileinfo table, which refers to the given file and replica.
 java.lang.Iterable<java.lang.String> getWrongFilesInLastUpdate(Replica replica)
          Method for retrieving the list of the files in the replica which have an incorrect checksum.
protected  void initialiseDB()
          Method for initialising the database.
 boolean insertAdminEntry(java.lang.String line)
          Method for inserting a line of Admin.Data into the database.
 void insertNewFileForUpload(java.lang.String filename, java.lang.String checksum)
          Creates a new entry for the filename for each replica, gives it the given checksum and sets the upload_status = UNKNOWN_UPLOAD_STATUS.
 boolean isEmpty()
          Method for telling whether the database is empty.
 java.util.Collection<java.lang.String> retrieveAllFilenames()
          Retrieves the names of all the files in the file table of the database.
 java.lang.String retrieveAsText()
          Method to print all the tables in the database.
 FileListStatus retrieveFileListStatus(java.lang.String filename, Replica replica)
          Method for retrieving the filelist_status for a replicafileinfo entry.
 java.util.Collection<java.lang.String> retrieveFilenamesForReplicaEntries(java.lang.String replicaId, ReplicaStoreState state)
          Retrieves the names of all the files in the given replica which have the specified UploadStatus.
 void setAdminDate(java.sql.Date date)
          Method for setting a specific value for the filelistdate and the checksumlistdate for all the replicas.
 void setReplicaStoreState(java.lang.String filename, java.lang.String replicaId, ReplicaStoreState state)
          Sets the ReplicaStoreState for the entry in the replicafileinfo table.
 void updateChecksumInformationForFileOnReplica(java.lang.String filename, java.lang.String checksum, Replica replica)
          Method for updating a specific entry in the replicafileinfo table.
 void updateChecksumStatus()
          This method is used to update the status for the checksums for all replicafileinfo entries.
 void updateChecksumStatus(java.lang.String filename)
          Method for updating the status for a specific file for all the replicas.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

log

protected static org.apache.commons.logging.Log log
The log.

Method Detail

getInstance

public static ReplicaCacheDatabase getInstance()
Method for retrieving the current instance of this class.

Returns:
The current instance.

initialiseDB

protected void initialiseDB()
Method for initialising the database. This basically makes sure that all the replicas are within the database, and that no unknown replicas have been defined.


getReplicaFileInfo

public ReplicaFileInfo getReplicaFileInfo(java.lang.String filename,
                                          Replica replica)
                                   throws ArgumentNotValid
Method for retrieving the entry in the replicafileinfo table for a given file and replica.

Specified by:
getReplicaFileInfo in interface BitPreservationDAO
Parameters:
filename - The name of the file for the entry.
replica - The replica of the entry.
Returns:
The replicafileinfo entry corresponding to the given filename and replica.
Throws:
ArgumentNotValid - If the filename is either null or empty, or if the replica is null.

getChecksum

public java.lang.String getChecksum(java.lang.String filename)
                             throws ArgumentNotValid
Method for retrieving the checksum for a specific file. Since a checksum is not directly attached to a file, the checksum of a file must be found by having the replicafileinfo entries for the file vote on it.

Parameters:
filename - The name of the file whose checksum is to be found.
Returns:
The checksum of the file, or null if no validated checksum can be found.
Throws:
ArgumentNotValid - If the filename is either null or the empty string.

retrieveAllFilenames

public java.util.Collection<java.lang.String> retrieveAllFilenames()
Retrieves the names of all the files in the file table of the database.

Returns:
The list of filenames known by the database.

getReplicaStoreState

public ReplicaStoreState getReplicaStoreState(java.lang.String filename,
                                              java.lang.String replicaId)
                                       throws ArgumentNotValid
Retrieves the ReplicaStoreState for the entry in the replicafileinfo table, which refers to the given file and replica.

Parameters:
filename - The name of the file in the filetable.
replicaId - The id of the replica.
Returns:
The ReplicaStoreState for the specified entry.
Throws:
ArgumentNotValid - If the replicaId or the filename is either null or the empty string.

setReplicaStoreState

public void setReplicaStoreState(java.lang.String filename,
                                 java.lang.String replicaId,
                                 ReplicaStoreState state)
                          throws ArgumentNotValid
Sets the ReplicaStoreState for the entry in the replicafileinfo table.

Parameters:
filename - The name of the file in the filetable.
replicaId - The id of the replica.
state - The ReplicaStoreState for the specified entry.
Throws:
ArgumentNotValid - If the replicaId or the filename is either null or the empty string, or if the ReplicaStoreState is null.

insertNewFileForUpload

public void insertNewFileForUpload(java.lang.String filename,
                                   java.lang.String checksum)
                            throws ArgumentNotValid,
                                   IllegalState
Creates a new entry for the filename for each replica, gives it the given checksum and sets the upload_status = UNKNOWN_UPLOAD_STATUS.

Parameters:
filename - The name of the file.
checksum - The checksum of the file.
Throws:
ArgumentNotValid - If the filename or the checksum is either null or the empty string.
IllegalState - If the file exists with another checksum on one of the replicas. Or if the file has already been completely uploaded to one of the replicas.
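
The contract above can be illustrated with a small in-memory sketch. It is not the class's database code: the class name UploadRegistrySketch, the UploadState enum, the setState and getState helpers and the map-based storage are all hypothetical, and the standard IllegalArgumentException and IllegalStateException stand in for ArgumentNotValid and IllegalState. Checking every replica before writing anything is a choice of this sketch, not something the documentation states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of the per-replica bookkeeping; every name here is hypothetical. */
final class UploadRegistrySketch {
    /** Stand-in for the upload states; only the two needed by this sketch. */
    enum UploadState { UNKNOWN_UPLOAD_STATUS, UPLOAD_COMPLETED }

    private static final class Entry {
        final String checksum;
        UploadState state;

        Entry(String checksum, UploadState state) {
            this.checksum = checksum;
            this.state = state;
        }
    }

    private final List<String> replicaIds;
    // Keyed by "replicaId/filename"; a stand-in for the replicafileinfo table.
    private final Map<String, Entry> entries = new HashMap<>();

    UploadRegistrySketch(List<String> replicaIds) {
        this.replicaIds = new ArrayList<>(replicaIds);
    }

    private static String key(String filename, String replicaId) {
        return replicaId + "/" + filename;
    }

    void insertNewFileForUpload(String filename, String checksum) {
        if (filename == null || filename.isEmpty() || checksum == null || checksum.isEmpty()) {
            throw new IllegalArgumentException("filename and checksum must be non-empty");
        }
        // Check every replica first, so that a rejected call changes nothing.
        for (String replicaId : replicaIds) {
            Entry existing = entries.get(key(filename, replicaId));
            if (existing == null) {
                continue;
            }
            if (!existing.checksum.equals(checksum)) {
                throw new IllegalStateException("Other checksum on replica " + replicaId);
            }
            if (existing.state == UploadState.UPLOAD_COMPLETED) {
                throw new IllegalStateException("Already uploaded to replica " + replicaId);
            }
        }
        // Create (or reset) one entry per replica with the unknown upload status.
        for (String replicaId : replicaIds) {
            entries.put(key(filename, replicaId),
                    new Entry(checksum, UploadState.UNKNOWN_UPLOAD_STATUS));
        }
    }

    void setState(String filename, String replicaId, UploadState state) {
        Entry entry = entries.get(key(filename, replicaId));
        if (entry == null) {
            throw new IllegalStateException("No entry for " + key(filename, replicaId));
        }
        entry.state = state;
    }

    /** Returns the state of the entry, or null if there is no such entry. */
    UploadState getState(String filename, String replicaId) {
        Entry entry = entries.get(key(filename, replicaId));
        return entry == null ? null : entry.state;
    }
}
```

In this sketch a second call with the same filename and checksum simply resets the entries, while a call with a different checksum, or after one replica has reached UPLOAD_COMPLETED, is rejected.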

changeStateOfReplicafileinfo

public void changeStateOfReplicafileinfo(java.lang.String filename,
                                         Replica replica,
                                         ReplicaStoreState state)
                                  throws ArgumentNotValid
Method for inserting an entry into the database recording that a file upload has begun for a specific replica. It is not tested whether the entry has another checksum or another UploadStatus.

Parameters:
filename - The name of the file.
replica - The replica for the replicafileinfo.
state - The new ReplicaStoreState for the entry.
Throws:
ArgumentNotValid - If the filename is either null or the empty string. Or if the replica or the status is null.

changeStateOfReplicafileinfo

public void changeStateOfReplicafileinfo(java.lang.String filename,
                                         java.lang.String checksum,
                                         Replica replica,
                                         ReplicaStoreState state)
                                  throws ArgumentNotValid,
                                         IllegalState
Method for inserting an entry into the database recording that a file upload has begun for a specific replica. It is not tested whether the entry has another checksum or another UploadStatus.

Parameters:
filename - The name of the file.
checksum - The new checksum for the entry.
replica - The replica for the replicafileinfo.
state - The new ReplicaStoreState for the entry.
Throws:
ArgumentNotValid - If the filename or the checksum is either null or the empty string. Or if the replica or the status is null.
IllegalState - If an sql exception is thrown.

retrieveFilenamesForReplicaEntries

public java.util.Collection<java.lang.String> retrieveFilenamesForReplicaEntries(java.lang.String replicaId,
                                                                                 ReplicaStoreState state)
                                                                          throws ArgumentNotValid
Retrieves the names of all the files in the given replica which have the specified UploadStatus.

Parameters:
replicaId - The id of the replica which contains the files.
state - The ReplicaStoreState for the wanted files.
Returns:
The list of filenames for the entries in the replica which have the specified UploadStatus.
Throws:
ArgumentNotValid - If the UploadStatus is null or if the replicaId is either null or the empty string.

existsFileInDB

public boolean existsFileInDB(java.lang.String filename)
                       throws IllegalState
Checks whether a file is already in the file table in the database.

Parameters:
filename - The name of the file in the database.
Returns:
Whether the file was found in the database.
Throws:
IllegalState - If more than one entry with the given filename was found.

retrieveFileListStatus

public FileListStatus retrieveFileListStatus(java.lang.String filename,
                                             Replica replica)
                                      throws ArgumentNotValid
Method for retrieving the filelist_status for a replicafileinfo entry.

Parameters:
filename - The name of the file.
replica - The replica where the file should be.
Returns:
The filelist_status for the file in the replica.
Throws:
ArgumentNotValid - If the replica is null or the filename is either null or the empty string.

updateChecksumStatus

public void updateChecksumStatus()
This method is used to update the status for the checksums for all replicafileinfo entries.

For each file in the database, the checksum vote is made in the following way.
Each entry in the replicafileinfo table containing the file is retrieved. All the unique checksums are retrieved, i.e. if a checksum is found more than once, the duplicates are ignored.
If only one unique checksum is found, then it must be the correct one, and all the replicas with this file will have their checksum_status set to 'OK'.
If more than one checksum is found, then a vote for the correct checksum is performed. This is done by counting the number of times each of the unique checksums is found among the replicafileinfo entries for the current file. The checksum with the most votes is chosen as the correct one, and the checksum_status for all the replicafileinfo entries with this checksum is set to 'OK', whereas the replicafileinfo entries with a different checksum are set to 'CORRUPT'.
If no winner is found, then a warning and a notification are issued, and the checksum_status for all the replicafileinfo entries for the current file is set to 'UNKNOWN'.

Specified by:
updateChecksumStatus in interface BitPreservationDAO
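
The vote described above can be sketched in plain Java for a single file. This is an illustration only, not the class's database code: the class name ChecksumVoteSketch, the Status enum and the map from replica id to checksum are assumptions made for the example, and it assumes that every replica reports a non-null checksum. The warning and notification issued when no winner is found are left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** In-memory sketch of the checksum vote for one file; all names are hypothetical. */
final class ChecksumVoteSketch {
    /** Stand-in for the values of the checksum_status column. */
    enum Status { OK, CORRUPT, UNKNOWN }

    private ChecksumVoteSketch() { }

    /**
     * @param checksumByReplica replica id mapped to the checksum reported for the file
     * @return replica id mapped to the status decided by the vote
     */
    static Map<String, Status> vote(Map<String, String> checksumByReplica) {
        // Count how many replicas report each distinct checksum.
        Map<String, Integer> votes = new HashMap<>();
        for (String checksum : checksumByReplica.values()) {
            votes.merge(checksum, 1, Integer::sum);
        }

        // Find the checksum with the most votes; a tie for first place means no winner.
        String winner = null;
        int best = 0;
        boolean tie = false;
        for (Map.Entry<String, Integer> e : votes.entrySet()) {
            int count = e.getValue();
            if (count > best) {
                best = count;
                winner = e.getKey();
                tie = false;
            } else if (count == best) {
                tie = true;
            }
        }

        // Entries matching the winner are OK, the rest CORRUPT; without a winner all are UNKNOWN.
        Map<String, Status> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : checksumByReplica.entrySet()) {
            if (tie || winner == null) {
                result.put(e.getKey(), Status.UNKNOWN);
            } else {
                result.put(e.getKey(), e.getValue().equals(winner) ? Status.OK : Status.CORRUPT);
            }
        }
        return result;
    }
}
```

With a single unique checksum the loop finds that checksum as the winner, so the "only one unique checksum" case needs no separate branch in this sketch.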

updateChecksumStatus

public void updateChecksumStatus(java.lang.String filename)
                          throws ArgumentNotValid
Method for updating the status for a specific file for all the replicas. If the checksums differ between the replicas, then based on a checksum vote, a specific checksum is chosen as the 'correct' one, and the entries with a checksum other than the 'correct' one will be marked as corrupt.

Specified by:
updateChecksumStatus in interface BitPreservationDAO
Parameters:
filename - The name of the file to update the status for.
Throws:
ArgumentNotValid - If the filename is either null or the empty string.

addChecksumInformation

public void addChecksumInformation(java.util.List<java.lang.String> checksumOutput,
                                   Replica replica)
Given the output of a checksum job, add the results to the database. The following fields in the table are updated for each corresponding entry in the replicafileinfo table:
- checksum = the given checksum.
- filelist_status = ok.
- filelist_checkdatetime = now.
- checksum_checkdatetime = now.

Specified by:
addChecksumInformation in interface BitPreservationDAO
Parameters:
checksumOutput - The output of a checksum job.
replica - The replica this checksum job is for.

addFileListInformation

public void addFileListInformation(java.util.List<java.lang.String> filelist,
                                   Replica replica)
                            throws ArgumentNotValid,
                                   UnknownID
Method for adding the results from a list of filenames on a replica. This list should contain all the files within the database. For each file in the FileListJob the following fields are set for the corresponding entry in the replicafileinfo table:
- filelist_status = ok.
- filelist_checkdatetime = now.
For each entry in the replicafileinfo table for the replica which is missing from the results of the FileListJob, the following fields are assigned the following values:
- filelist_status = missing.
- filelist_checkdatetime = now.

Specified by:
addFileListInformation in interface BitPreservationDAO
Parameters:
filelist - The list of filenames either parsed from a FilelistJob or the result from a GetAllFilenamesMessage.
replica - The replica, which the FilelistBatchjob has run upon.
Throws:
ArgumentNotValid - If the filelist or the replica is null.
UnknownID - If the replica does not already exist in the database.
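
The two rules above amount to comparing the reported file list with the entries already known for the replica. A minimal sketch, assuming in-memory collections in place of the replicafileinfo table: the class FileListSketch, the ListStatus enum and the reconcile method are hypothetical names, and the filelist_checkdatetime update is omitted.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** In-memory sketch of applying a file list to a replica's entries; names are hypothetical. */
final class FileListSketch {
    /** Stand-in for the values of the filelist_status column used here. */
    enum ListStatus { OK, MISSING }

    private FileListSketch() { }

    /**
     * @param knownFiles    filenames that already have an entry for the replica
     * @param reportedFiles filenames returned by the file list job for the replica
     * @return filename mapped to the status it would receive
     */
    static Map<String, ListStatus> reconcile(Collection<String> knownFiles,
                                             Collection<String> reportedFiles) {
        Set<String> reported = new HashSet<>(reportedFiles);
        Map<String, ListStatus> statuses = new LinkedHashMap<>();
        // Every file in the job result is present on the replica.
        for (String filename : reportedFiles) {
            statuses.put(filename, ListStatus.OK);
        }
        // Every known entry that the job result does not mention is missing.
        for (String filename : knownFiles) {
            if (!reported.contains(filename)) {
                statuses.put(filename, ListStatus.MISSING);
            }
        }
        return statuses;
    }
}
```

Counting the MISSING values of the returned map would correspond to what getNumberOfMissingFilesInLastUpdate reports, under the same assumptions.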

getDateOfLastMissingFilesUpdate

public java.sql.Date getDateOfLastMissingFilesUpdate(Replica replica)
                                              throws ArgumentNotValid,
                                                     java.lang.IllegalArgumentException
Get the date for the last file list job.

Specified by:
getDateOfLastMissingFilesUpdate in interface BitPreservationDAO
Parameters:
replica - The replica to get the date for.
Returns:
The date of the last missing files update for the replica. A null is returned if no last missing files update has been performed.
Throws:
ArgumentNotValid - If the replica is null.
java.lang.IllegalArgumentException - If the Date of the Timestamp cannot be instantiated.

getDateOfLastWrongFilesUpdate

public java.sql.Date getDateOfLastWrongFilesUpdate(Replica replica)
                                            throws ArgumentNotValid,
                                                   java.lang.IllegalArgumentException
Method for retrieving the date for the last update for corrupted files. This method does not contact the replicas, it only retrieves the data from the last time the checksum was retrieved.

Specified by:
getDateOfLastWrongFilesUpdate in interface BitPreservationDAO
Parameters:
replica - The replica to find the date for the latest update for corruption of files.
Returns:
The date for the last checksum update. A null is returned if no wrong files update has been performed for this replica.
Throws:
ArgumentNotValid - If the replica is null.
java.lang.IllegalArgumentException - If the Date of the Timestamp cannot be instantiated.

getNumberOfMissingFilesInLastUpdate

public long getNumberOfMissingFilesInLastUpdate(Replica replica)
                                         throws ArgumentNotValid
Method for retrieving the number of files missing from a specific replica. This method does not contact the replica directly, it only retrieves the count of missing files from the last filelist update.

Specified by:
getNumberOfMissingFilesInLastUpdate in interface BitPreservationDAO
Parameters:
replica - The replica to find the number of missing files for.
Returns:
The number of missing files for the replica.
Throws:
ArgumentNotValid - If the replica is null.

getMissingFilesInLastUpdate

public java.lang.Iterable<java.lang.String> getMissingFilesInLastUpdate(Replica replica)
                                                                 throws ArgumentNotValid
Method for retrieving the list of the names of the files which were missing for the replica in the last filelist update. This method does not contact the replica, it only uses the database to find the files which were missing during the last filelist update.

Specified by:
getMissingFilesInLastUpdate in interface BitPreservationDAO
Parameters:
replica - The replica to find the list of missing files for.
Returns:
A list containing the names of the files which are missing in the given replica.
Throws:
ArgumentNotValid - If the replica is null.

getNumberOfWrongFilesInLastUpdate

public long getNumberOfWrongFilesInLastUpdate(Replica replica)
                                       throws ArgumentNotValid
Method for retrieving the number of files with an incorrect checksum within a replica. This method does not contact the replica, it only uses the database to count the number of files which are corrupt.

Specified by:
getNumberOfWrongFilesInLastUpdate in interface BitPreservationDAO
Parameters:
replica - The replica to find the number of corrupted files for.
Returns:
The number of corrupted files.
Throws:
ArgumentNotValid - If the replica is null.

getWrongFilesInLastUpdate

public java.lang.Iterable<java.lang.String> getWrongFilesInLastUpdate(Replica replica)
                                                               throws ArgumentNotValid
Method for retrieving the list of the files in the replica which have an incorrect checksum, i.e. whose checksum_status is set to CORRUPT. This method does not contact the replica, it only uses the local database.

Specified by:
getWrongFilesInLastUpdate in interface BitPreservationDAO
Parameters:
replica - The replica to find the list of corrupted files for.
Returns:
The list of files which have wrong checksums.
Throws:
ArgumentNotValid - If the replica is null.

getNumberOfFiles

public long getNumberOfFiles(Replica replica)
                      throws ArgumentNotValid
Method for retrieving the number of files within a replica. This counts all the files which are not missing from the replica, i.e. all entries in the replicafileinfo table which have the filelist_status set to OK. Whether the files have a correct checksum is ignored. This method does not contact the replica, it only uses the local database.

Specified by:
getNumberOfFiles in interface BitPreservationDAO
Parameters:
replica - The replica to count the number of files for.
Returns:
The number of files within the replica.
Throws:
ArgumentNotValid - If the replica is null.

getBitarchiveWithGoodFile

public Replica getBitarchiveWithGoodFile(java.lang.String filename)
                                  throws ArgumentNotValid
Method for finding a replica with a valid version of a file. This method is used in order to find a replica from which a file should be retrieved, during the process of restoring a corrupt file on another replica. This replica must be of the type bitarchive, since a file cannot be retrieved from a checksum replica.

Specified by:
getBitarchiveWithGoodFile in interface BitPreservationDAO
Parameters:
filename - The name of the file which needs to have a valid version in a bitarchive.
Returns:
A bitarchive which contains a valid version of the file, or null if no such bitarchive exists.
Throws:
ArgumentNotValid - If the filename is null or the empty string.

getBitarchiveWithGoodFile

public Replica getBitarchiveWithGoodFile(java.lang.String filename,
                                         Replica badReplica)
                                  throws ArgumentNotValid
Method for finding a replica with a valid version of a file. This method is used in order to find a replica from which a file should be retrieved, during the process of restoring a corrupt file on another replica. This replica must be of the type bitarchive, since a file cannot be retrieved from a checksum replica.

Specified by:
getBitarchiveWithGoodFile in interface BitPreservationDAO
Parameters:
filename - The name of the file which needs to have a valid version in a bitarchive.
badReplica - The replica which has a bad copy of the given file.
Returns:
A bitarchive which contains a valid version of the file, or null if no such bitarchive exists.
Throws:
ArgumentNotValid - If the replica is null or the filename is either null or the empty string.
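
The selection rule shared by both overloads can be sketched as a filter over the entries for one file. The sketch below is an assumption-laden illustration rather than the class's query: GoodReplicaSketch, ReplicaType, Entry and findGoodBitarchive are hypothetical names, and replicas are represented by their ids only.

```java
import java.util.List;

/** In-memory sketch of choosing a bitarchive that holds a valid copy; names are hypothetical. */
final class GoodReplicaSketch {
    /** Stand-in for the two kinds of replica mentioned in the documentation. */
    enum ReplicaType { BITARCHIVE, CHECKSUM }

    /** One entry for the file on one replica. */
    static final class Entry {
        final String replicaId;
        final ReplicaType type;
        final boolean checksumOk;

        Entry(String replicaId, ReplicaType type, boolean checksumOk) {
            this.replicaId = replicaId;
            this.type = type;
            this.checksumOk = checksumOk;
        }
    }

    private GoodReplicaSketch() { }

    /**
     * @param entriesForFile the entries for one file, one per replica
     * @param badReplicaId   id of the replica holding the bad copy, or null if none is excluded
     * @return the id of a bitarchive replica with a valid copy, or null if there is none
     */
    static String findGoodBitarchive(List<Entry> entriesForFile, String badReplicaId) {
        for (Entry entry : entriesForFile) {
            boolean isBitarchive = entry.type == ReplicaType.BITARCHIVE;
            boolean isExcluded = entry.replicaId.equals(badReplicaId);
            // Checksum replicas cannot deliver the file, so only bitarchives qualify.
            if (isBitarchive && entry.checksumOk && !isExcluded) {
                return entry.replicaId;
            }
        }
        return null;
    }
}
```

Passing null as badReplicaId mirrors the single-argument overload in this sketch; the caller is expected to handle a null result, since no replica may qualify.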

updateChecksumInformationForFileOnReplica

public void updateChecksumInformationForFileOnReplica(java.lang.String filename,
                                                      java.lang.String checksum,
                                                      Replica replica)
                                               throws ArgumentNotValid
Method for updating a specific entry in the replicafileinfo table. Based on the filename, checksum and replica it is verified whether a file is missing, corrupt or valid.

Specified by:
updateChecksumInformationForFileOnReplica in interface BitPreservationDAO
Parameters:
filename - Name of the file.
checksum - The checksum of the file. Is allowed to be null, if no file is found.
replica - The replica where the file exists.
Throws:
ArgumentNotValid - If the filename is null or the empty string, or if the replica is null.

insertAdminEntry

public boolean insertAdminEntry(java.lang.String line)
                         throws ArgumentNotValid
Method for inserting a line of Admin.Data into the database. It is assumed that it is a '0.4' admin.data line.

Parameters:
line - The line to insert into the database.
Returns:
Whether the line was valid.
Throws:
ArgumentNotValid - If the line is null. If it is empty, then it is logged.

setAdminDate

public void setAdminDate(java.sql.Date date)
                  throws ArgumentNotValid
Method for setting a specific value for the filelistdate and the checksumlistdate for all the replicas.

Parameters:
date - The new date for the checksumlist and filelist for all the replicas.
Throws:
ArgumentNotValid - If the date is null.

isEmpty

public boolean isEmpty()
Method for telling whether the database is empty. The database is empty if it does not contain any files. The database will not be entirely empty, since the replicas are put into the replica table during the instantiation of this class, but if the file table is empty, then the replicafileinfo table is also empty, and the database will be considered empty.

Returns:
Whether the file list is empty.

retrieveAsText

public java.lang.String retrieveAsText()
Method to print all the tables in the database.

Returns:
All the tables as a text string.

cleanup

public void cleanup()
Method for cleaning up.

Specified by:
cleanup in interface BitPreservationDAO
Specified by:
cleanup in interface CleanupIF