
Check that the filelist is correct and that the database is loaded with the missing files

  • Click on Bitpreservation
  • Click on "Update" under Filelist status
  • Open a new tab in your browser and go to System status: http://$GUIadminserver:$http-port/Status/Monitor-JMXsummary.jsp
  • Check that you get INFO messages like this: "INFO: The file 'TEST2_999.arc' was not found in the database. Thus creating entry for the file."

(Be aware that the name of the last missing file inserted into the database will remain listed until the update has finished.)

  • Wait until the filelist update has completed without any errors. For each bitarchive server, the last log message should read something like "INFO: Received batch ended from bitarchive '172.17.0.176_BitApp_G':

BatchEndedMessage for batch job ID:51371-130.226.228.6(ca:65:66:63:d6:69)-55327-1270553597402 From Bitarchive 172.17.0.176_BitApp_G FilesProcessed = 27569"

Number of files per archive:

Total: 156775 files

As of 14/09/2010 there are only 5 bitapps, each with about 25,800 files, but only about 25,800 unique files.
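The per-bitarchive counts can be cross-checked against the total by summing the FilesProcessed values from the "Received batch ended" messages. A minimal sketch, assuming those log lines have been copied from the status page into a plain text file; the function name sum_files_processed and the input file are hypothetical, and if a bitarchive reports more than once only its last value is kept:

```shell
# Hypothetical helper: sums the "FilesProcessed = N" values found in
# BatchEndedMessage log lines copied from the System status page.
# Prints one line per bitarchive (order not guaranteed) and a final TOTAL line.
sum_files_processed() {
    grep -oE "From Bitarchive [^ ]+ FilesProcessed = [0-9]+" "$1" |
        awk '{ per[$3] = $6 }    # $3 = bitarchive name, $6 = file count
             END {
                 for (ba in per) { printf "%s %d\n", ba, per[ba]; total += per[ba] }
                 printf "TOTAL %d\n", total
             }'
}
```

For example, sum_files_processed batchended.txt prints the count for each bitarchive followed by a TOTAL line, which can then be compared with the expected number of files for the current setup.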

Check that the checksums are correct

  • Click on Bitpreservation
  • Click on "Update" under Checksum status
  • Open a new tab in your browser and go to System status: http://$GUIadminserver:$http-port/Status/Monitor-JMXsummary.jsp
  • Click on Instance-ID.
  • Click on one of the first bitarchive instance IDs.
  • Click "Show all" in the Index column.
  • Verify that you get log messages like "INFO: The batchjob 'class dk.netarkivet.archive.arcrepository.bitpreservation.ChecksumJob' has run for 1938 seconds and has reached file '11297-MB100.arc' which is number 1615 out of 27620" every 30 seconds. (Be aware that the checksum log messages can be delayed when very large files, > 1 GB, are being processed.)
  • Wait until the checksum job has completed without any errors. The last log message should read something like "INFO: Finished batch job dk.netarkivet.archive.arcrepository.bitpreservation.ChecksumJob with result: 0 failures in processing 27620 files at 172.17.0.176_BitApp_E"
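If the final "Finished batch job" lines from all bitarchive instances are copied into one text file, a short filter can confirm that every bitarchive reported 0 failures. A minimal sketch that relies only on the message text shown above; the function name check_checksum_failures and the input file are hypothetical:

```shell
# Hypothetical helper: reads copied "... with result: N failures in processing
# M files at <bitarchive>" lines, prints one summary per bitarchive and
# returns 0 if all reported 0 failures, 1 if any failed, 2 if nothing matched.
check_checksum_failures() {
    grep -oE "[0-9]+ failures in processing [0-9]+ files at [^ ']+" "$1" |
        awk '{ printf "%s: %d failures in %d files\n", $8, $1, $5
               if ($1 > 0) bad++ }
             END { if (NR == 0) exit 2
                   exit (bad > 0) }'
}
```

For example, check_checksum_failures checksum-finished.txt lists each bitarchive with its failure count; a non-zero return value means the checksum step needs a closer look.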

Stress test batch jobs

Setup test

export TESTX=TEST11B
cd /home/test/$TESTX/
mkdir batchprogs
scp test@kb-prod-udv-001.kb.dk:/home/test/test-batch/* batchprogs/.
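Before starting the jobs it can be worth confirming that the copy succeeded. A minimal sketch that checks for the four batch programs referenced by the RunBatch commands in this test; the function name check_batchprogs is hypothetical, and the file list should be adjusted if the contents of /home/test/test-batch differ:

```shell
# Hypothetical helper: checks that the batch programs used by the RunBatch
# commands in this test exist and are non-empty in the given directory
# (default: batchprogs). Returns non-zero if any file is missing or empty.
check_batchprogs() {
    local dir="${1:-batchprogs}" missing=0 f
    for f in ChecksumJob.class GoodPostProcessingJob.class \
             EvilPostProcessingJob.class mime.jar; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Run check_batchprogs from /home/test/$TESTX/; it prints every missing or empty file to stderr.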

ChecksumJob

This job calculates the MD5 checksum of the archive files (takes around 8 hours).

Run the following command:

java -cp lib/dk.netarkivet.archive.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/ChecksumJob.class -Ooutput.checksum

This should print the following messages to the console:

Running batch job 'batchprogs/ChecksumJob.class' on files matching '.*' on replica 'KBN', output written to file 'output.checksum', errors written to stderr
Processed 11 files with 0 failures
Cleaning up dk.netarkivet.common.distribute.JMSConnectionSunMQ
Cleaned up dk.netarkivet.common.distribute.JMSConnectionSunMQ
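To keep a record of the console output and check the summary line automatically, the RunBatch output can be captured, for example by appending 2>&1 | tee runbatch.log to the command above. A minimal sketch of the check, assuming the summary line has the "Processed N files with M failures" form shown above; the function name check_runbatch_summary and the file name runbatch.log are hypothetical:

```shell
# Hypothetical helper: finds the last "Processed N files with M failures" line
# in saved RunBatch console output, prints it, and returns 0 only if M is 0
# (1 if there were failures, 2 if no summary line was found).
check_runbatch_summary() {
    local summary
    summary=$(grep -E '^Processed [0-9]+ files with [0-9]+ failures' "$1" | tail -n 1 || true)
    if [ -z "$summary" ]; then
        echo "no 'Processed ... files with ... failures' line found" >&2
        return 2
    fi
    echo "$summary"
    case "$summary" in
        *" with 0 failures"*) return 0 ;;
        *) return 1 ;;
    esac
}
```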

The output is written to the file 'output.checksum', which should contain lines like the following:

1-1-20090316092641-00003-kb-test-har-002.kb.dk.arc##c68b3e18f7b870b76d86de7970a822c2
2-2-20090316092643-00003-kb-test-har-001.kb.dk.arc##7d723dd4d374437c5e29e995521bf014
.......
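A quick format check on the result file: each line should consist of an ARC file name, '##' and an MD5 checksum, as in the sample lines above. A minimal sketch, assuming the checksum is written as 32 lowercase hexadecimal digits (as in the samples) and that file names contain no '#'; the function name check_checksum_output is hypothetical:

```shell
# Hypothetical helper: verifies that every line of a ChecksumJob output file
# looks like "<filename>##<32 lowercase hex digits>". Malformed lines are
# printed to stderr with their line numbers; returns non-zero if any exist.
check_checksum_output() {
    local file="$1" pattern='^[^#]+##[0-9a-f]{32}$' total bad
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    total=$(grep -c '' "$file" || true)          # number of lines
    bad=$(grep -cvE "$pattern" "$file" || true)  # lines not matching the format
    grep -nvE "$pattern" "$file" >&2 || true
    echo "$total lines checked, $bad malformed"
    [ "$bad" -eq 0 ] && [ "$total" -gt 0 ]
}
```

For example, check_checksum_output output.checksum; the reported line count can also be compared with the number of files in the archive.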

GoodPostProcessingJob

Run the following job:

java -cp lib/dk.netarkivet.archive.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/GoodPostProcessingJob.class -Ogood.out

The output is put into the file 'good.out'. This file should contain the following text (sorted):

0G5.arc
0G5.arc
...
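Whether 'good.out' really is sorted can be checked with sort -c, which reports the first out-of-order line and exits non-zero. A minimal sketch; the function name is_sorted is hypothetical, and LC_ALL=C (plain byte order) is an assumption about the ordering produced by the post-processing, which for ASCII file names matches Java's default String ordering:

```shell
# Hypothetical helper: returns 0 if the file is sorted in byte order
# (duplicate lines allowed); otherwise sort -c reports the first
# out-of-order line on stderr and the function returns non-zero.
is_sorted() {
    LC_ALL=C sort -c "$1"
}
```

For example, is_sorted good.out && echo "good.out is sorted".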

Go to the status page and check the log for the BitarchiveMonitor. It should contain the following messages (newest first):

Jun 11, 2010 9:24:31 AM dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer doBatchReply
INFO: BatchReplyMessage: 'BatchReplyMessage for batch job ID:10-130.226.228.6(d3:b5:49:8b:d6:94)-37536-1276241061823
FilesProcessed = 156775
FilesFailed = 0
ID:1906780-130.226.228.6(d4:77:b5:34:b8:ae)-52334-1276241071022: To TEST11B_COMMON_THIS_REPOS_CLIENT_130_226_228_6_GUIA_BATCH ReplyTo TEST11B_KB_THE_BAMON OK' sent from BA monitor to queue: '[Queue 'TEST11B_COMMON_THIS_REPOS_CLIENT_130_226_228_6_GUIA_BATCH']'

Jun 11, 2010 9:24:27 AM dk.netarkivet.common.utils.batch.GoodPostProcessingJob postProcess
INFO: Sorting the filenames

Jun 11, 2010 9:24:27 AM dk.netarkivet.common.utils.batch.GoodPostProcessingJob postProcess
INFO: Reading all the filenames.

Jun 11, 2010 9:24:27 AM dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer doBatchReply
INFO: Post processing batchjob results for 'dk.netarkivet.common.utils.batch.LoadableFileBatchJob' with id 'ID:10-130.226.228.6(d3:b5:49:8b:d6:94)-37536-1276241061823'
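The FilesProcessed and FilesFailed values in the BatchReplyMessage can also be checked against the expected total. A minimal sketch, assuming the BitarchiveMonitor log text has been copied into a plain text file with these values at the start of their own lines, as in the excerpt above; the function name check_batch_reply and the input file are hypothetical:

```shell
# Hypothetical helper: takes the last "FilesProcessed = N" and "FilesFailed = N"
# values from copied BitarchiveMonitor log text and succeeds only if no files
# failed and the processed count equals the expected number (second argument).
check_batch_reply() {
    local logfile="$1" expected="$2" processed failed
    processed=$(sed -n 's/^[[:space:]]*FilesProcessed = \([0-9][0-9]*\).*/\1/p' "$logfile" | tail -n 1)
    failed=$(sed -n 's/^[[:space:]]*FilesFailed = \([0-9][0-9]*\).*/\1/p' "$logfile" | tail -n 1)
    if [ -z "$processed" ] || [ -z "$failed" ]; then
        echo "FilesProcessed/FilesFailed not found in $logfile" >&2
        return 2
    fi
    echo "FilesProcessed=$processed FilesFailed=$failed (expected $expected files)"
    [ "$failed" -eq 0 ] && [ "$processed" -eq "$expected" ]
}
```

For example, check_batch_reply bamon.log 156775 succeeds only if the last reply reports 156775 processed files and 0 failed files.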

EvilPostProcessingJob

Run the following job:

java -cp lib/dk.netarkivet.archive.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/EvilPostProcessingJob.class -Oevil.out

The output is put into the file 'evil.out'. This file should contain the following text (unsorted):

0G5.arc
1G5.arc
...
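Assuming the evil job produces the same file names as GoodPostProcessingJob and differs only in leaving them unsorted (an assumption consistent with the two sample outputs above, not something this page states), 'evil.out' can be compared with 'good.out' independently of line order. A minimal sketch; the function name compare_good_evil is hypothetical:

```shell
# Hypothetical helper: checks that two output files contain the same lines
# (duplicates included, order ignored) and that the evil output is NOT sorted.
# Returns 0 on success, 1 if the content differs, 2 if evil.out is sorted.
compare_good_evil() {
    local good="$1" evil="$2"
    # Same multiset of lines? Compare checksums of both files after sorting.
    if [ "$(LC_ALL=C sort "$evil" | cksum)" != "$(LC_ALL=C sort "$good" | cksum)" ]; then
        echo "$evil and $good do not contain the same lines" >&2
        return 1
    fi
    # The evil job is expected to leave its output unsorted.
    if LC_ALL=C sort -c "$evil" 2>/dev/null; then
        echo "warning: $evil is already sorted" >&2
        return 2
    fi
    echo "same lines in both files; $evil is unsorted as expected"
}
```

For example, compare_good_evil good.out evil.out, run after both jobs have finished.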
Go to the status page and check the log for the BitarchiveMonitor. There should be the following messages:
 

Running a batch job from a jar file

Run the following job (takes around 25 hours):

java -cp lib/dk.netarkivet.archive.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH dk.netarkivet.archive.tools.RunBatch -Jbatchprogs/mime.jar -Nbatchprogs.MimeSize -Omimesize.out

The output is put into the file 'mimesize.out'. This file should contain the following text:

...
text/html##567890
image/jpeg##1234567
...
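Each line of 'mimesize.out' has the form <mimetype>##<number>. Judging by the class name MimeSize, the number is presumably the accumulated size for that MIME type; treat that as an assumption. A minimal sketch that lists the types by that number, largest first, and ends with the sum; the function name summarize_mimesize is hypothetical, and lines not matching the format (such as the "..." placeholders) are skipped:

```shell
# Hypothetical helper: reads "<mimetype>##<number>" lines, prints
# "<number><TAB><mimetype>" sorted by number (largest first) and finishes
# with "<sum><TAB>TOTAL". Lines that do not match the format are ignored.
summarize_mimesize() {
    awk -F'##' 'NF == 2 && $2 ~ /^[0-9]+$/ { print $2 "\t" $1 }' "$1" |
        sort -k1,1nr |
        awk -F'\t' '{ print; sum += $1 } END { printf "%.0f\tTOTAL\n", sum }'
}
```

For example, summarize_mimesize mimesize.out | head shows the largest MIME types first; %.0f is used for the total so that sums above 2^31 are printed correctly by all awk variants.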
