Verifies the Batch GUI functionality

Standard functionality

  1. Go to the 'Bitpreservation' -> 'Batchjob Overview' page.
  2. Run the FileListJob for 'JobID = 1' and 'filetype = Both'. Verify that only filenames starting with 1- are included in the output.
  3. Run the ChecksumJob for 'JobID = .*' and 'filetype = Both'. Verify that both arc and warc files are included in the output.
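
The two checks above can be scripted once the job output has been saved to a text file. This is only a minimal sketch: the helper names and the file names in the usage lines are hypothetical, and it assumes each output line starts with the archive file name.

```shell
# Hypothetical helper: succeeds only if every non-empty line of file $1
# starts with "1-" (the file names expected for JobID = 1).
only_job_1() {
  awk 'NF && $0 !~ /^1-/ { bad = 1 } END { exit bad + 0 }' "$1"
}

# Hypothetical helper: succeeds only if file $1 mentions at least one
# .arc name and at least one .warc name.
has_arc_and_warc() {
  grep -q '\.arc' "$1" && grep -q '\.warc' "$1"
}

# Usage (file names are assumptions, save the GUI output under any name):
#   only_job_1 filelist-output.txt && echo "FileListJob output OK"
#   has_arc_and_warc checksum-output.txt && echo "ChecksumJob output OK"
```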

Adding new BatchJobs

Install the new BatchJobs on kb-test-adm-001.kb.dk. Build them using the recipe in kb-test-adm-001.kb.dk:/home/devel/batch/NetarchiveSuiteBatchprograms/src/README.txt.

ssh kb-test-adm-001.kb.dk
export TESTX=TEST11A
# Build BatchJobs.jar using the recipe in kb-test-adm-001.kb.dk:/home/devel/batch/NetarchiveSuiteBatchprograms/src/README.txt
# Replace <timestamp> below with the timestamp of the jar file that was built
export BATCHJOBS_JAR=/home/devel/batch/NetarchiveSuiteBatchprograms/BatchJobs-<timestamp>.jar
cd /home/devel/${TESTX}/
cp -pv "$BATCHJOBS_JAR" /home/devel/${TESTX}/BatchJobs.jar

Add the following to conf/settings_GUIApplication.xml in the common section:

<settings>
<common>
<batch>
  <batchjobs>
    <batchjob>
      <class>dk.netarkivet.common.utils.batch.ChecksumJob</class>
      <jarfile/>
    </batchjob>
    <batchjob>
      <class>dk.netarkivet.common.utils.batch.FileListJob</class>
      <jarfile/>
    </batchjob>
    <batchjob>
      <class>batchjobs.MimeSearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
    <batchjob>
      <class>batchjobs.URLsearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
    <batchjob>
      <class>batchjobs.ContentSearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
    <batchjob>
      <class>batchjobs.UrlAndMimeSearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
  </batchjobs>
</batch>
</common>
</settings>
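
Before restarting, it may be worth checking that the copied jar really contains the four classes named in the settings above. The following is a sketch under assumptions: missing_classes is a hypothetical helper, and it assumes the classes sit in the jar under the usual package path (for example batchjobs/MimeSearch.class).

```shell
# Classes that the settings fragment expects to find in BatchJobs.jar.
EXPECTED_CLASSES="batchjobs.MimeSearch batchjobs.URLsearch batchjobs.ContentSearch batchjobs.UrlAndMimeSearch"

# Hypothetical helper: reads a jar listing (one entry per line) on stdin
# and prints every expected class that has no matching .class entry.
missing_classes() {
  listing=$(cat)
  for class in $EXPECTED_CLASSES; do
    # Convert the dotted class name into the path used inside the jar.
    entry="$(printf '%s' "$class" | tr '.' '/').class"
    printf '%s\n' "$listing" | grep -qxF -- "$entry" || printf '%s\n' "$class"
  done
}

# Usage: prints nothing when all four classes are present.
#   jar tf /home/devel/${TESTX}/BatchJobs.jar | missing_classes
```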

Restart the GUI:

conf/restart.sh

Go to the GUI and verify that the new batch jobs are available on the 'Batchjob Overview' page.

Run all the BatchJobs on a snapshot harvest (setting the Job ID):

  1. Run the MimeSearch BatchJob with argument 'text/html' and verify that the result is a list of HTML pages.
  2. Run the URLsearch BatchJob with argument '.*kb.*'. This should generate a list of the harvested kb resources.
  3. Run the ContentSearch BatchJob with MimeType argument 'text/html' and TextPattern '.*statsbiblioteket.*'. This should generate a list of the HTML resources containing the word statsbiblioteket. Note: this operation takes a while to finish (about 10 minutes).
  4. Run the UrlAndMimeSearch BatchJob with argument 'image/.*' for mimetype and '.*kb\.dk/.*' for url. Verify that only images from the kb domain are listed.
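
To get a feel for what the patterns in the steps above select, they can be tried against a few sample URLs. This sketch assumes the batch jobs match the pattern against the whole value; matches_fully is a hypothetical helper and the URLs are made-up examples. Note that '.*kb.*' accepts any URL containing "kb", while '.*kb\.dk/.*' requires the literal text "kb.dk/".

```shell
# Hypothetical helper: succeeds if the whole string $2 matches the
# extended regular expression $1.
matches_fully() {
  printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Try both URL patterns from the steps above on made-up sample URLs.
for url in 'http://www.kb.dk/da/index.html' 'http://example.org/kbase/' 'http://www.statsbiblioteket.dk/'; do
  for pattern in '.*kb.*' '.*kb\.dk/.*'; do
    if matches_fully "$pattern" "$url"; then verdict='match'; else verdict='no match'; fi
    printf '%-14s %-32s %s\n' "$pattern" "$url" "$verdict"
  done
done
```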