Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 3.18.0
    • Fix Version/s: 3.18.0
    • Component/s: None
    • Labels:
      None
    • Verification:
      Hide

      Prepare any multi-machine deploy. Check that the log properties in release_software_dist/TESTX
      show that the ConsoleHandler has been removed from the list of handlers in the log.prop file for
      any windows machines.

      Show
      Prepare any multi-machine deploy. Check that the log properties in release_software_dist/TESTX show that the ConsoleHandler has been removed from the list of handlers in the log.prop file for any windows machines.

      Description

      There is a step in TEST11A:

      java -cp lib/dk.netarkivet.archive.jar -Ddk.netarkivet.settings.file=conf/settings_IndexServerApplication.xml dk.netarkivet.archive.tools.RunBatch -Cbatchprogs/ExceptionBatch.class -Eerror.exception -Ooutput.exception

      which fails as follows

      1. The batch job hangs and never ends
      2. The batch job causes the whole arcrepository to freeze. Any subsequent batch jobs also hang, even jobs which ran ok previously
      3. After restarting, adding -BSBN to the command line to send the job to SB causes the job to run without problem.
        This means that the error closely resembles the problems we have had with automatic indexing for wayback.

        Issue Links

          Activity

          Hide
          Colin Rosenthal added a comment -

          Two of the six KB bitapps hang. In each case the bitapp hangs on the fourth file exception thrown. None of the other bitapps throw as many as four exceptions.

          Show
          Colin Rosenthal added a comment - Two of the six KB bitapps hang. In each case the bitapp hangs on the fourth file exception thrown. None of the other bitapps throw as many as four exceptions.
          Hide
          Colin Rosenthal added a comment -

          ... but adding a fourth metadata arcfile to bitapp_6 causes it to register four exceptions and then terminate normally.

          Show
          Colin Rosenthal added a comment - ... but adding a fourth metadata arcfile to bitapp_6 causes it to register four exceptions and then terminate normally.
          Hide
          Colin Rosenthal added a comment -

          Here is a stack trace showing the last thing the bitapp does before dying. Note that it now dies shortly after the first exception.

          INFO: Starting batch job on bitarchive application with id '172.17.0.53_BitApp_4': 'dk.netarkivet.common.utils.batch.LoadableFileBatchJob', on filename-pattern: '.*'
          08-11-2011 13:29:09 dk.netarkivet.archive.bitarchive.Bitarchive batch
          FINE: Batch: Job dk.netarkivet.common.utils.batch.LoadableFileBatchJob processing ExceptionBatch.class started at Tue Nov 08 13:29:09 CET 2011
          08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.BatchLocalFiles run
          INFO: The batchjob 'class dk.netarkivet.common.utils.batch.LoadableFileBatchJob' has run for 0 seconds and has reached file '10-2-20100903130259-00001-kb-test-har-002.kb.dk.arc', which is number 1 out of 706
          08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.BatchLocalFiles processFile
          FINEST: Started processing of file 'G:\bitarkiv_4\TEST11A\filedir\10-2-20100903130259-00001-kb-test-har-002.kb.dk.arc'.
          08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.LoadableFileBatchJob processFile
          FINEST: Started processing of file 'G:\bitarkiv_4\TEST11A\filedir\10-2-20100903130259-00001-kb-test-har-002.kb.dk.arc'.
          08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.BatchLocalFiles processFile
          FINEST: Started processing of file 'G:\bitarkiv_4\TEST11A\filedir\10-metadata-1.arc'.
          08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.LoadableFileBatchJob processFile
          FINEST: Started processing of file 'G:\bitarkiv_4\TEST11A\filedir\10-metadata-1.arc'.
          08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.BatchLocalFiles processFile
          WARNING: Exception while processing file G:\bitarkiv_4\TEST11A\filedir\10-metadata-1.arc with job dk.netarkivet.common.utils.batch.LoadableFileBatchJob processing ExceptionBatch.class
          java.lang.SecurityException: Security exception
          	at dk.netarkivet.archive.bitarchive.ExceptionBatch.processFile(ExceptionBatch.java:17)
          	at dk.netarkivet.common.utils.batch.LoadableFileBatchJob.processFile(LoadableFileBatchJob.java:179)
          	at dk.netarkivet.common.utils.batch.BatchLocalFiles.processFile(BatchLocalFiles.java:166)
          	at dk.netarkivet.common.utils.batch.BatchLocalFiles.run(BatchLocalFiles.java:113)
          	at dk.netarkivet.archive.bitarchive.Bitarchive.batch(Bitarchive.java:244)
          	at dk.netarkivet.archive.bitarchive.distribute.BitarchiveServer$1.run(BitarchiveServer.java:421)
          08-11-2011 13:29:09 dk.netarkivet.common.distribute.JMSConnection send
          FINEST: Sending message (Heartbeat for 172.17.0.53_BitApp_4 at 1320755349954) to [Queue 'TEST11A_KB_THE_BAMON']
          08-11-2011 13:29:10 dk.netarkivet.common.distribute.JMSConnection send
          FINEST: Sending message (NO ID: To TEST11A_COMMON_MONITOR ReplyTo TEST11A_COMMON_ERROR OK) to [Queue 'TEST11A_COMMON_MONITOR']
          
          Show
          Colin Rosenthal added a comment - Here is a stack trace showing the last thing the bitapp does before dying. Note that it now dies shortly after the first exception. INFO: Starting batch job on bitarchive application with id '172.17.0.53_BitApp_4': 'dk.netarkivet.common.utils.batch.LoadableFileBatchJob', on filename-pattern: '.*' 08-11-2011 13:29:09 dk.netarkivet.archive.bitarchive.Bitarchive batch FINE: Batch: Job dk.netarkivet.common.utils.batch.LoadableFileBatchJob processing ExceptionBatch.class started at Tue Nov 08 13:29:09 CET 2011 08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.BatchLocalFiles run INFO: The batchjob 'class dk.netarkivet.common.utils.batch.LoadableFileBatchJob' has run for 0 seconds and has reached file '10-2-20100903130259-00001-kb-test-har-002.kb.dk.arc', which is number 1 out of 706 08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.BatchLocalFiles processFile FINEST: Started processing of file 'G:\bitarkiv_4\TEST11A\filedir\10-2-20100903130259-00001-kb-test-har-002.kb.dk.arc'. 08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.LoadableFileBatchJob processFile FINEST: Started processing of file 'G:\bitarkiv_4\TEST11A\filedir\10-2-20100903130259-00001-kb-test-har-002.kb.dk.arc'. 08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.BatchLocalFiles processFile FINEST: Started processing of file 'G:\bitarkiv_4\TEST11A\filedir\10-metadata-1.arc'. 08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.LoadableFileBatchJob processFile FINEST: Started processing of file 'G:\bitarkiv_4\TEST11A\filedir\10-metadata-1.arc'. 08-11-2011 13:29:09 dk.netarkivet.common.utils.batch.BatchLocalFiles processFile WARNING: Exception while processing file G:\bitarkiv_4\TEST11A\filedir\10-metadata-1.arc with job dk.netarkivet.common.utils.batch.LoadableFileBatchJob processing ExceptionBatch.class java.lang.SecurityException: Security exception at dk.netarkivet.archive.bitarchive.ExceptionBatch.processFile(ExceptionBatch.java:17) at dk.netarkivet.common.utils.batch.LoadableFileBatchJob.processFile(LoadableFileBatchJob.java:179) at dk.netarkivet.common.utils.batch.BatchLocalFiles.processFile(BatchLocalFiles.java:166) at dk.netarkivet.common.utils.batch.BatchLocalFiles.run(BatchLocalFiles.java:113) at dk.netarkivet.archive.bitarchive.Bitarchive.batch(Bitarchive.java:244) at dk.netarkivet.archive.bitarchive.distribute.BitarchiveServer$1.run(BitarchiveServer.java:421) 08-11-2011 13:29:09 dk.netarkivet.common.distribute.JMSConnection send FINEST: Sending message (Heartbeat for 172.17.0.53_BitApp_4 at 1320755349954) to [Queue 'TEST11A_KB_THE_BAMON'] 08-11-2011 13:29:10 dk.netarkivet.common.distribute.JMSConnection send FINEST: Sending message (NO ID: To TEST11A_COMMON_MONITOR ReplyTo TEST11A_COMMON_ERROR OK) to [Queue 'TEST11A_COMMON_MONITOR']
          Hide
          Colin Rosenthal added a comment -

          And here's another curious thing. Bitapps 1,2,3 & 6 all say that they have finished and are sending reply messages but the KB BAMON has only received reply message from 1,3 & 6.

          Show
          Colin Rosenthal added a comment - And here's another curious thing. Bitapps 1,2,3 & 6 all say that they have finished and are sending reply messages but the KB BAMON has only received reply message from 1,3 & 6.
          Hide
          Colin Rosenthal added a comment -

          The fault is definitely in the logging. By commenting out all handlers in the failing bitapps I can get the job to complete normally. Alternatively by increasing the amount of logging in a bitapp that was running ok I can trigger the failure. Further experiments now to see which handler is the culprit.

          Show
          Colin Rosenthal added a comment - The fault is definitely in the logging. By commenting out all handlers in the failing bitapps I can get the job to complete normally. Alternatively by increasing the amount of logging in a bitapp that was running ok I can trigger the failure. Further experiments now to see which handler is the culprit.
          Hide
          Colin Rosenthal added a comment -

          The fault is in the way we handle stdout. The error can be switched on and off by enabling/disabling the ConsoleHandler.

          Show
          Colin Rosenthal added a comment - The fault is in the way we handle stdout. The error can be switched on and off by enabling/disabling the ConsoleHandler.
          Hide
          Colin Rosenthal added a comment -

          Fix (to deploy system) applied in svn r2173.

          Show
          Colin Rosenthal added a comment - Fix (to deploy system) applied in svn r2173.
          Hide
          Mikis Seth Sørensen added a comment -

          Soooo, this issue is properly at least 'In progress' and perhaps even ready for review?

          Show
          Mikis Seth Sørensen added a comment - Soooo, this issue is properly at least 'In progress' and perhaps even ready for review?
          Hide
          Søren Vejrup Carlsen added a comment -

          Verified as part of TEST11A using revision 2191

          Show
          Søren Vejrup Carlsen added a comment - Verified as part of TEST11A using revision 2191

            People

            • Assignee:
              Colin Rosenthal
              Reporter:
              Colin Rosenthal
              Inspector:
              Søren Vejrup Carlsen
            • Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 7h
                7h