[NAS-2492] Error restarting jobs Created: 02/Feb/16 Updated: 03/Feb/16 Resolved: 03/Feb/16 |
|
Status: | Resolved |
Project: | NetarchiveSuite |
Component/s: | Harvest Definition |
Affects Version/s: | 5.0 |
Fix Version/s: | 5.1 |
Type: | Bug | Priority: | Major |
Reporter: | Colin Rosenthal | Assignee: | Søren Vejrup Carlsen (Inactive) |
Resolution: | Fixed | ||
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Verification: | Start a job. Restart the test system. The job is now in state failed. Restart the job. Once the job starts harvesting, the fix is confirmed. |
Description |
I had some jobs that failed because of a parse error from some global crawler traps. When I tried to restart them, they failed again with:
dk.netarkivet.common.exceptions.IllegalState: The placeholder for the property '%{METADATA_ITEMS_PLACEHOLDER}' was not found. Maybe the placeholder has already been replaced with the correct value. The template looks like this:
<?xml version="1.0" encoding="UTF-8"?> <!-- HERITRIX 3 CRAWL JOB CONFIGURATION FILE - For use with NetarchiveSuite 5.0 --> <beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" |
Comments |
Comment by Søren Vejrup Carlsen (Inactive) [ 03/Feb/16 ] |
verified as part of TEST6 |
Comment by Søren Vejrup Carlsen (Inactive) [ 03/Feb/16 ] |
Rolled back: https://github.com/netarchivesuite/netarchivesuite/commit/06a72134014dc9bff49da6fc1da7dd77504aabf4 |
Comment by Colin Rosenthal [ 03/Feb/16 ] |
Agree that we roll back this change. |
Comment by Søren Vejrup Carlsen (Inactive) [ 02/Feb/16 ] |
Or we should undo the last save to the job database: https://github.com/netarchivesuite/netarchivesuite/commit/ae8ecc0a6c17a28cafa591a534d94544b215d26c |
Comment by Søren Vejrup Carlsen (Inactive) [ 02/Feb/16 ] |
Or we should wait to insert the WarcInfoMetadata until we reach the harvester. |
Comment by Colin Rosenthal [ 02/Feb/16 ] |
The same fault crops up later in TEST2 when a job is deliberately made to fail by restarting the system. |
Comment by Colin Rosenthal [ 02/Feb/16 ] |
... or do nothing but log as a warning rather than throw an exception. |
Comment by Colin Rosenthal [ 02/Feb/16 ] |
Looks like the error is in JobDispatcher.doOneCrawl(), where it tries to insert warcInfoMetadata again. Fix should be to check if this job is a continuation, like this:
    if (job.getContinuationOf() != null) {
        ht.insertWarcInfoMetadata(job, origHarvestName, origHarvestSchedule, Settings.get(HarvesterSettings.PERFORMER));
    } |
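One alternative raised in the comments is to log a warning rather than throw an exception when the placeholder has already been replaced. The following is a minimal, self-contained sketch of that idea only. The class PlaceholderSketch and the method insertMetadata are hypothetical stand-ins operating on a plain string; they are not the NetarchiveSuite HeritrixTemplate API, and only the placeholder text is taken from the error message above.

```java
import java.util.logging.Logger;

/** Hypothetical stand-in for illustration; not a NetarchiveSuite class. */
public class PlaceholderSketch {
    private static final Logger LOG = Logger.getLogger(PlaceholderSketch.class.getName());

    /** Placeholder text as quoted in the IllegalState message. */
    static final String PLACEHOLDER = "%{METADATA_ITEMS_PLACEHOLDER}";

    /**
     * Replaces the placeholder with the given metadata items.
     * If the placeholder is absent (for example because the template was
     * already filled in before the job was restarted), a warning is logged
     * and the template is returned unchanged instead of throwing.
     */
    static String insertMetadata(String template, String metadataItems) {
        if (!template.contains(PLACEHOLDER)) {
            LOG.warning("Placeholder " + PLACEHOLDER + " not found; leaving template unchanged.");
            return template;
        }
        return template.replace(PLACEHOLDER, metadataItems);
    }

    public static void main(String[] args) {
        String fresh = "<map>" + PLACEHOLDER + "</map>";
        String first = insertMetadata(fresh, "<entry key=\"k\" value=\"v\"/>");
        // Second call simulates a restarted job whose template is already filled in.
        String second = insertMetadata(first, "<entry key=\"k\" value=\"v\"/>");
        System.out.println(first.equals(second)); // prints true
    }
}
```

With this approach a second insertion becomes a no-op, so a restarted job would keep the metadata written on the first dispatch. Whether silently keeping the old metadata is acceptable is a design question the comments leave open.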