Stress tests and crash tests of batch jobs and third-party batch jobs on 14 TB of data.

 

Start The Test

export TESTX=TEST11B
export PORT=807?
export MAILRECEIVERS=foo@bar.dk
stop_test.sh
cleanup_all_test.sh
prepare_test.sh deploy_config_TEST11B.xml
install_test.sh
start_test.sh
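
Because the scripts above depend on the exported variables, it can help to fail early when one is missing. The following is a minimal sketch; check_test_env is a hypothetical helper and not one of the test scripts.

```shell
# Hypothetical helper: verify that the variables exported above are set
# before running stop_test.sh, prepare_test.sh and the other scripts.
check_test_env() {
    for name in TESTX PORT MAILRECEIVERS; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing environment variable: $name" >&2
            return 1
        fi
    done
    echo "Environment OK for $TESTX"
}
```

One possible use is `check_test_env && stop_test.sh`, so that nothing runs with an incomplete environment.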

Update The Filelist

In the GUI, update the Filestatus for KB. This will take several minutes to run.

In the GUIApplication log you should see lines like

WARNING: There have been found multiple files with the name '9999-MB100.arc'

and

INFO: The file 'MB100' was not found in the database. Thus creating entry for the file.

The log should end with

INFO: Completed findMissingFiles for replica 'BITARCHIVEReplica (KB) KBN'.

In the Bitpreservation GUI you should see the following:

Filestatus for: KBN
Number of files: 25,890
Missing files: 0

Update Checksum and FileStatus

In the Bitpreservation GUI, click on "Update checksum and filestatus for CS2". After a few minutes, the BitarchiveServer should start producing log messages like

15-08-2013 15:26:47 dk.netarkivet.common.utils.batch.BatchLocalFiles run
INFO: The batchjob 'class dk.netarkivet.common.utils.batch.ChecksumJob' 
has run for 135 seconds and has reached file '1-MB100.arc', which is 
number 10 out of 25890
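
The "number N out of M" part of these messages can be turned into a rough progress figure. The sketch below assumes the log text is piped in on standard input; batch_progress is a hypothetical helper name, and the location of the BitarchiveServer log is not given here.

```shell
# Hypothetical helper: read log text on stdin and report the progress
# stated by the last "number N out of M" message found.
batch_progress() {
    sed -n 's/.*number \([0-9][0-9]*\) out of \([0-9][0-9]*\).*/\1 \2/p' |
        tail -n 1 |
        awk '$2 > 0 { printf "%d of %d files (%.1f%%)\n", $1, $2, 100 * $1 / $2 }'
}
```

If no progress message has been logged yet, the function prints nothing.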

Wait until the job has finished (about seven hours) with a message like

INFO: Finished batch job 
dk.netarkivet.archive.arcrepository.bitpreservation.ChecksumJob with 
result: 0 failures in processing 27620 files at 172.17.0.176_BitApp_E

on the BitApp and

Aug 15, 2013 10:11:34 PM dk.netarkivet.archive.arcrepositoryadmin.ReplicaCacheDatabase updateChecksumStatus
INFO: UpdateChecksumStatus operation completed!
Aug 15, 2013 10:11:34 PM dk.netarkivet.archive.arcrepository.bitpreservation.DatabaseBasedActiveBitPreservation findChangedFiles
INFO: Completed findChangedFiles for replica 'CHECKSUMReplica (CS2) CS2N'.

in the GUIApplication log.
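
Since the job runs for hours, polling the log for the completion message may be more convenient than watching it. This is a minimal sketch: wait_for_log_line is a hypothetical helper, and the path to the GUIApplication log has to be supplied by the tester.

```shell
# Hypothetical helper: block until a fixed string appears in a log file.
# Arguments: log file, text to wait for, poll interval in seconds (default 60).
wait_for_log_line() {
    logfile=$1
    pattern=$2
    interval=${3:-60}
    # -F treats the pattern as a literal string, not a regular expression.
    until grep -q -F -- "$pattern" "$logfile" 2>/dev/null; do
        sleep "$interval"
    done
    echo "Found: $pattern"
}
```

For example, `wait_for_log_line "$GUI_LOG" "Completed findChangedFiles for replica"`, where GUI_LOG is an assumed variable holding the log path. The loop has no timeout and must be interrupted by hand if the message never appears.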

Start Some Batch Jobs

On devel@kb-test-adm-001:

export TESTX=TEST11B
cd /home/devel/$TESTX/
mkdir batchprogs
scp test@kb-prod-udv-001.kb.dk:/home/test/test-batch/* batchprogs/.

Then start the following four jobs:

nohup java -cp lib/netarchivesuite-archive-core.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/ChecksumJob.class -Ooutput.checksum &

nohup java -cp lib/netarchivesuite-archive-core.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH2 dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/GoodPostProcessingJob.class -Ogood.out &

nohup java -cp lib/netarchivesuite-archive-core.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH3 dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/EvilPostProcessingJob.class -Oevil.out &

nohup java -cp lib/netarchivesuite-archive-core.jar -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml -Dsettings.common.applicationInstanceId=BATCH4 dk.netarkivet.archive.tools.RunBatch -Jbatchprogs/mime.jar -Nbatchprogs.MimeSize -Omimesize.out &
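
The four commands differ only in the application instance ID, the job arguments and the output file, so they could be wrapped in a small function. This is a sketch only; run_batch and the JAVA override variable are hypothetical and not part of NetarchiveSuite.

```shell
# Hypothetical wrapper around the RunBatch commands listed above.
# Arguments: instance ID, output file, then the job options (-C..., or -J... -N...).
run_batch() {
    instance_id=$1
    output=$2
    shift 2
    # JAVA may be set to use a different java binary; it defaults to "java".
    nohup "${JAVA:-java}" -cp lib/netarchivesuite-archive-core.jar \
        -Ddk.netarkivet.settings.file=conf/settings_ArcRepositoryApplication.xml \
        -Dsettings.common.applicationInstanceId="$instance_id" \
        dk.netarkivet.archive.tools.RunBatch "$@" -O"$output" &
}
```

With this in place, the last job above would be started as `run_batch BATCH4 mimesize.out -Jbatchprogs/mime.jar -Nbatchprogs.MimeSize`.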

The "good" and "evil" jobs complete quickly. You can follow the progress of all the jobs simply with

ps -fe | grep BATCH
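
The plain grep above may also list the grep process itself. A small variation, sketched here with a hypothetical helper name, counts only the RunBatch processes by matching on the instance ID option used in the commands above.

```shell
# Hypothetical helper: count batch jobs in `ps -fe` output read from stdin.
# The [B] bracket keeps this grep's own command line from matching.
count_batch_jobs() {
    grep -c 'applicationInstanceId=[B]ATCH' || true
}
```

Run it as `ps -fe | count_batch_jobs`; a result of 0 means all four jobs have ended.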

Check The Output Of "Good"

The output is written to the file 'good.out'. This file should contain the file names in sorted order, for example:

0G5.arc
0G5.arc
...
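
Whether the file really is sorted can be checked mechanically rather than by eye. The sketch below uses `sort -c`; it assumes the job sorts in plain byte order, which is why the C locale is forced, and is_sorted is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given file is already sorted.
# LC_ALL=C assumes byte-order sorting; adjust if the job sorts differently.
is_sorted() {
    LC_ALL=C sort -c "$1" 2>/dev/null
}
```

For example, `is_sorted good.out && echo "good.out is sorted"`. Repeated names, as in the sample above, do not count as a sorting error.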

Go to the status page and check the log for the BitarchiveMonitor. It should contain the following messages:

Jun 11, 2010 9:24:31 AM dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer doBatchReply
INFO: BatchReplyMessage: 'BatchReplyMessage for batch job ID:10-130.226.228.6(d3:b5:49:8b:d6:94)-37536-1276241061823
FilesProcessed = 156775
FilesFailed = 0
ID:1906780-130.226.228.6(d4:77:b5:34:b8:ae)-52334-1276241071022: To TEST11B_COMMON_THIS_REPOS_CLIENT_130_226_228_6_GUIA_BATCH ReplyTo TEST11B_KB_THE_BAMON OK' sent from BA monitor to queue: '[Queue 'TEST11B_COMMON_THIS_REPOS_CLIENT_130_226_228_6_GUIA_BATCH']'

Jun 11, 2010 9:24:27 AM dk.netarkivet.common.utils.batch.GoodPostProcessingJob postProcess
INFO: Sorting the filenames

Jun 11, 2010 9:24:27 AM dk.netarkivet.common.utils.batch.GoodPostProcessingJob postProcess
INFO: Reading all the filenames.

Jun 11, 2010 9:24:27 AM dk.netarkivet.archive.bitarchive.distribute.BitarchiveMonitorServer doBatchReply
INFO: Post processing batchjob results for 'dk.netarkivet.common.utils.batch.LoadableFileBatchJob' with id 'ID:10-130.226.228.6(d3:b5:49:8b:d6:94)-37536-1276241061823'
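
If the BitarchiveMonitor log is available as text, the failure count in the reply message can be extracted instead of read off the status page. This is a sketch under that assumption; batch_failures is a hypothetical helper name.

```shell
# Hypothetical helper: read log text on stdin and print the sum of all
# "FilesFailed = N" values. Exits with status 2 if no such line is found.
batch_failures() {
    awk '$1 == "FilesFailed" && $2 == "=" { total += $3; seen = 1 }
         END { if (!seen) exit 2; print total }'
}
```

A printed value of 0 corresponds to the "FilesFailed = 0" line shown above.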

Check The Output Of "Evil"

The output is written to the file 'evil.out'. This file should contain the file names in unsorted order, for example:

0G5.arc
1G5.arc
...

Check The Output Of The Checksum Job

This may take about eight hours to run. The output is written to the file 'output.checksum'. This file should contain one line per file, with the file name and its checksum separated by '##', for example:

1-1-20090316092641-00003-kb-test-har-002.kb.dk.arc##c68b3e18f7b870b76d86de7970a822c2
2-2-20090316092643-00003-kb-test-har-001.kb.dk.arc##7d723dd4d374437c5e29e995521bf014
.......
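
With tens of thousands of lines, a format check is more practical than reading the file. The sketch below assumes every line has the form name##checksum with a 32-digit lowercase hexadecimal checksum, as in the two sample lines; count_bad_checksum_lines is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of lines in the given file that do
# NOT look like "<name>##<32 hex digits>".
count_bad_checksum_lines() {
    grep -Evc '^.+##[0-9a-f]{32}$' "$1" || true
}
```

`count_bad_checksum_lines output.checksum` should then print 0.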

Check The Output Of MimeSize

This may take up to 25 hours to run. The output is written to the file 'mimesize.out'. This file should contain lines like the following:

...
text/html##567890
image/jpeg##1234567
...
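
A quick summary of the file can serve as a sanity check. The sketch below assumes each line has the form mimetype##number, as in the sample lines, and that the number is a size that can be meaningfully summed; mimesize_summary is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of well-formed lines and the sum of
# their numeric fields for a file in "<mimetype>##<number>" format.
mimesize_summary() {
    awk -F'##' 'NF == 2 && $2 ~ /^[0-9]+$/ { count++; total += $2 }
                END { printf "%d types, total %.0f\n", count, total }' "$1"
}
```

Lines that do not match the assumed format are skipped, so a type count lower than the line count of the file points to malformed output.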