Verifies the Batch GUI functionality

Standard functionality

  1. Go to the 'Quality Assurance' -> 'Batchjob Overview' page.
  2. Run the FileListJob for 'JobID = 1' and 'filetype = Both'. Verify that only filenames starting with 1- are included.
  3. Run the ChecksumJob for 'JobID = .*' and 'filetype = Both'. Verify that both arc and warc files are included in the output.
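
The two verifications above can be scripted once the job output has been saved from the GUI. The following is a minimal sketch in Python, assuming the FileListJob output holds one filename per line and the ChecksumJob output holds one `filename##checksum` line per file. Both formats, the function names and the sample data are assumptions made for illustration and should be checked against the actual output.

```python
def names_not_matching_job(filelist_lines, job_id):
    """Return the filenames that do NOT start with '<job_id>-'."""
    prefix = f"{job_id}-"
    names = [line.strip() for line in filelist_lines if line.strip()]
    return [name for name in names if not name.startswith(prefix)]


def has_arc_and_warc(checksum_lines, separator="##"):
    """Return True if at least one arc and one warc file are listed."""
    # Assumed line format: <filename><separator><checksum>
    names = [line.split(separator, 1)[0].strip()
             for line in checksum_lines if line.strip()]
    has_arc = any(".arc" in name for name in names)
    has_warc = any(".warc" in name for name in names)
    return has_arc and has_warc


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    filelist = ["1-1-20240101000000-00000-host.warc", "1-metadata-1.warc"]
    checksums = ["1-1-a.arc##0123", "2-3-b.warc.gz##4567"]
    print(names_not_matching_job(filelist, 1))  # unexpected filenames, if any
    print(has_arc_and_warc(checksums))
```

An empty list from `names_not_matching_job` means step 2 passed; a `True` result from `has_arc_and_warc` means step 3 passed.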

Adding new BatchJobs

Install new BatchJobs on kb-test-adm-001.kb.dk. Build them using the recipe in kb-test-adm-001.kb.dk:/home/devel/batch/NetarchiveSuiteBatchprograms/src/README.txt:

$ ssh kb-test-adm-001.kb.dk
$ # Build BatchJobs.jar by following the recipe in /home/devel/batch/NetarchiveSuiteBatchprograms/src/README.txt
$ export BATCHJOBS_JAR=/home/devel/batch/NetarchiveSuiteBatchprograms/...
$ cd /home/devel/${TESTX}/ 
$ cp -pv $BATCHJOBS_JAR /home/devel/${TESTX}/

Add the following to conf/settings_GUIApplication.xml in the commons section:

<batch>
  <batchjobs>
    <batchjob>
      <class>dk.netarkivet.common.utils.batch.ChecksumJob</class>
      <jarfile/>
    </batchjob>
    <batchjob>
      <class>dk.netarkivet.common.utils.batch.FileListJob</class>
      <jarfile/>
    </batchjob>
    <batchjob>
      <class>batchjobs.MimeSearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
    <batchjob>
      <class>batchjobs.URLsearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
    <batchjob>
      <class>batchjobs.ContentSearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
    <batchjob>
      <class>batchjobs.UrlAndMimeSearch</class>
      <jarfile>BatchJobs.jar</jarfile>
    </batchjob>
  </batchjobs>
</batch>
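
Before restarting the GUI, the fragment can be checked for well-formedness and for missing class names. The following is a minimal sketch in Python using only the standard library; the function name `list_batchjobs` is hypothetical. It handles the fragment as shown above and does not handle XML namespaces, which a complete settings file may declare.

```python
import xml.etree.ElementTree as ET


def list_batchjobs(batch_xml):
    """Return (class name, jar file or None) for each <batchjob> element."""
    root = ET.fromstring(batch_xml)  # raises ParseError if not well-formed
    jobs = []
    for job in root.iter("batchjob"):
        class_name = (job.findtext("class") or "").strip()
        if not class_name:
            raise ValueError("<batchjob> without a <class> value")
        # An empty <jarfile/> is reported as None.
        jar = (job.findtext("jarfile") or "").strip() or None
        jobs.append((class_name, jar))
    return jobs


if __name__ == "__main__":
    # Shortened copy of the fragment above, for illustration.
    fragment = """
    <batch>
      <batchjobs>
        <batchjob>
          <class>dk.netarkivet.common.utils.batch.ChecksumJob</class>
          <jarfile/>
        </batchjob>
        <batchjob>
          <class>batchjobs.MimeSearch</class>
          <jarfile>BatchJobs.jar</jarfile>
        </batchjob>
      </batchjobs>
    </batch>
    """
    for class_name, jar in list_batchjobs(fragment):
        print(class_name, jar)
```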

Restart the GUI:

conf/restart.sh

Go to the GUI and verify that the new batch jobs are available.

Run all the BatchJobs on a snapshot harvest (setting the Job ID).

  1. Run the MimeSearch BatchJob with argument 'text/html' and verify that the result is a list of HTML pages.
  2. Run the URLsearch BatchJob with argument '.*kb.*'. This should generate a list of the harvested kb resources.
  3. Run the ContentSearch BatchJob with MimeType argument 'text/html' and TextPattern '.*statsbiblioteket.*'. This should generate a list of HTML resources containing the word statsbiblioteket.
  4. Run the UrlAndMimeSearch BatchJob with argument 'image/.*' for mimetype and '.*kb\.dk/.*' for url. Verify that only images from the kb domain are listed.
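
To get a feel for what the patterns in the steps above select, they can be tried against a few sample records. The following is a minimal sketch in Python, assuming the BatchJobs require the pattern to match the whole URL or MIME type; that assumption, the function name `search` and the sample records are hypothetical. Python and Java regular expressions also differ in some details, so this is an aid for reading the results, not a replacement for running the jobs.

```python
import re


def search(records, url_pattern=".*", mime_pattern=".*"):
    """Return URLs whose URL and MIME type both fully match the patterns."""
    url_re = re.compile(url_pattern)
    mime_re = re.compile(mime_pattern)
    return [url for url, mime in records
            if url_re.fullmatch(url) and mime_re.fullmatch(mime)]


if __name__ == "__main__":
    # Hypothetical (URL, MIME type) records, for illustration only.
    records = [
        ("http://www.kb.dk/index.html", "text/html"),
        ("http://www.kb.dk/logo.png", "image/png"),
        ("http://example.org/kb/photo.jpg", "image/jpeg"),
    ]
    # Step 2: matches every URL containing "kb" anywhere, not only the host.
    print(search(records, url_pattern=".*kb.*"))
    # Step 4: only images whose URL contains "kb.dk/".
    print(search(records, url_pattern=r".*kb\.dk/.*", mime_pattern="image/.*"))
```

Note that '.*kb.*' also matches URLs that merely contain "kb" in the path, and '.*kb\.dk/.*' matches any host ending in "kb.dk", so the result lists still need a manual look.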

1 Comment

  1. We're harvesting kb and statsbiblioteket.dk, so the sites should be changed in the test next time