Description
When the integrity service fails to get a page of checksums from a contributor, it stops collecting further data, but the integrity check workflow itself is not stopped. As a result, a new workflow cannot start until the running one is stopped by a timeout or the web service is restarted.
A possible solution would be to make the workflow retry fetching the page, or to stop the workflow.
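Both options can be combined: retry a failed page a bounded number of times, and if it still fails, end the workflow explicitly instead of leaving it running until the timeout. The Java sketch below assumes a simplified, hypothetical model. ChecksumCollectionWorkflow, PageFetcher, maxAttempts and retryDelayMillis are illustrative names, not the integrity service's actual classes or settings.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: ChecksumCollectionWorkflow and PageFetcher are illustrative names,
// not classes from the bitrepository code base.
class ChecksumCollectionWorkflow {

    /** Hypothetical source of checksum pages for a single contributor. */
    interface PageFetcher {
        /** Returns the checksums on the given page; an empty list means there are no more pages. */
        List<String> fetchPage(String contributor, int page) throws IOException;
    }

    private final PageFetcher fetcher;
    private final int maxAttempts;
    private final long retryDelayMillis;
    // True while a run is in progress; a new run is refused until this is cleared.
    private final AtomicBoolean running = new AtomicBoolean(false);

    ChecksumCollectionWorkflow(PageFetcher fetcher, int maxAttempts, long retryDelayMillis) {
        this.fetcher = fetcher;
        this.maxAttempts = maxAttempts;
        this.retryDelayMillis = retryDelayMillis;
    }

    boolean isRunning() {
        return running.get();
    }

    /**
     * Collects every page from the contributor into {@code collected}. Returns true on success
     * and false if some page could not be fetched within maxAttempts. In both cases the workflow
     * is marked as stopped, so the next run does not have to wait for a timeout.
     */
    boolean run(String contributor, List<String> collected) {
        if (!running.compareAndSet(false, true)) {
            throw new IllegalStateException("Workflow is already running");
        }
        try {
            for (int page = 0; ; page++) {
                Optional<List<String>> checksums = fetchWithRetry(contributor, page);
                if (!checksums.isPresent()) {
                    return false; // retries exhausted: stop the workflow instead of hanging
                }
                if (checksums.get().isEmpty()) {
                    return true; // no more pages
                }
                collected.addAll(checksums.get());
            }
        } finally {
            running.set(false); // always release the workflow, on success or failure
        }
    }

    /** Tries to fetch one page up to maxAttempts times; an empty Optional means every attempt failed. */
    private Optional<List<String>> fetchWithRetry(String contributor, int page) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return Optional.of(fetcher.fetchPage(contributor, page));
            } catch (IOException e) {
                System.err.printf("Page %d from %s failed (attempt %d of %d): %s%n",
                        page, contributor, attempt, maxAttempts, e.getMessage());
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(retryDelayMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        return Optional.empty();
                    }
                }
            }
        }
        return Optional.empty();
    }
}
```

The key design point is the finally block: whether collection succeeds or gives up after the last retry, the running flag is cleared, so a failed page no longer blocks the next integrity check.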
Below are the log statements from a failed workflow:
2015-02-27 12:25:02.111 WARN o.b.c.c.m.CollectionBasedConversationMediator - Failing timed out conversation 7b6e9e0e-ee04-49ed-a53b-73e2facafaa5 (Age 3600108ms)
2015-02-27 12:25:02.648 WARN o.b.i.c.IntegrityCollectorEventHandler - Failure: 7b6e9e0e: GET_CHECKSUMS: FAILED: , Failed to receive responses from all contributors before timeout(3600000ms). Missing contributors [avisoffline1]
2015-02-27 12:25:02.648 WARN o.b.i.a.IntegrityAlarmDispatcher - Sending alarm: org.bitrepository.bitrepositoryelements.Alarm@5e649872[origDateTime=2015-02-27T12:25:02.648+01:00, alarmCode=FAILED_OPERATION, alarmRaiser=integrity-service, alarmText=Failed integrity operation: 7b6e9e0e: GET_CHECKSUMS: FAILED: , Failed to receive responses from all contributors before timeout(3600000ms). Missing contributors [avisoffline1], fileID=<null>, collectionID=avis]