[NAS-2501] 10 minutes waitstate between harvest is finished, and postprocessing begins Created: 11/Feb/16 Updated: 24/Feb/16 Resolved: 24/Feb/16 |
|
Status: | Resolved |
Project: | NetarchiveSuite |
Component/s: | Harvester Controller Server |
Affects Version/s: | None |
Fix Version/s: | 5.1 |
Type: | Bug | Priority: | Minor |
Reporter: | Søren Vejrup Carlsen (Inactive) | Assignee: | Søren Vejrup Carlsen (Inactive) |
Resolution: | Fixed | ||
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | 0.1h | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Description |
There seems to be a constant 10-minute wait state between the harvest finishing and postprocessing beginning:
2016-02-11 14:08:56.262 [pool-2-thread-1] INFO dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.run
- Job ID 4: crawl is finished.
2016-02-11 14:18:56.221 [Thread-9] INFO dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.doCrawl
- CrawlJob is now over
|
Comments |
Comment by Søren Vejrup Carlsen (Inactive) [ 17/Feb/16 ] |
Replaced the rather complex thread structure with a simple wait loop. |
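The ticket does not include the replacement code. As a rough illustration only, a wait loop of the kind described could look like the sketch below. The class name `WaitLoopSketch`, the method `waitUntilFinished`, and the `BooleanSupplier` standing in for the crawl-status check are all hypothetical and are not taken from NetarchiveSuite.

```java
import java.util.function.BooleanSupplier;

public class WaitLoopSketch {

    /**
     * Polls isFinished until it returns true, sleeping pollMillis between checks.
     * Returns the number of status checks performed (hypothetical helper,
     * not part of NetarchiveSuite).
     */
    public static int waitUntilFinished(BooleanSupplier isFinished, long pollMillis) {
        int polls = 1;
        while (!isFinished.getAsBoolean()) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt status and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
            polls++;
        }
        return polls;
    }

    public static void main(String[] args) {
        final int[] calls = {0};
        // Stand-in for a crawl that reports "finished" on the third status check.
        int polls = waitUntilFinished(() -> ++calls[0] >= 3, 10);
        System.out.println("polls=" + polls); // prints "polls=3"
    }
}
```

Because the loop exits on the same check that reports the crawl as finished, the caller continues immediately instead of waiting for further scheduled runs.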
Comment by Søren Vejrup Carlsen (Inactive) [ 12/Feb/16 ] |
After inserting extra logging in the first if-statement:

private class CrawlControl implements Runnable {
    @Override
    public void run() {
        if (crawlIsOver) {
            // Don't check again; we are already done
            log.warn("Why do you check me again. we're done already");
            return;
        }
        CrawlProgressMessage cpm = null;
        try {
            cpm = heritrixController.getCrawlProgress();
        } catch (IOFailure e) {
            // Log a warning and retry
            log.warn("IOFailure while getting crawl progress", e);
            return;
        } catch (HarvestingAbort e) {
            log.warn("Got HarvestingAbort exception while getting crawl progress. Means crawl is over", e);
            crawlIsOver = true;
            return;
        }
        JMSConnectionFactory.getInstance().send(cpm);
        Heritrix3Files files = getHeritrixFiles();
        if (cpm.crawlIsFinished()) {
            log.info("Job ID {}: crawl is finished.", files.getJobID());
            crawlIsOver = true;
            return;
        }
        log.info("Job ID: " + files.getJobID() + ", Harvest ID: " + files.getHarvestID() + ", " + cpm.getHostUrl()
                + "\n" + cpm.getProgressStatisticsLegend()
                + "\n" + cpm.getJobStatus().getStatus() + " " + cpm.getJobStatus().getProgressStatistics());
    }
}

I get the following sequence, which indicates that the thread runs 4 more times than it should:

2016-02-12 11:35:09.264 [pool-2-thread-1] INFO dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.run - Job ID 13: crawl is finished.
2016-02-12 11:36:09.215 [pool-2-thread-1] WARN dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.run - Why do you check me again. we're done already
2016-02-12 11:37:09.215 [pool-2-thread-1] WARN dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.run - Why do you check me again. we're done already
2016-02-12 11:38:09.215 [pool-2-thread-1] WARN dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.run - Why do you check me again. we're done already
2016-02-12 11:39:09.215 [pool-2-thread-1] WARN dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.run - Why do you check me again. we're done already
2016-02-12 11:39:09.216 [Thread-8] INFO dk.netarkivet.harvester.heritrix3.controller.HeritrixLauncher.doCrawl - CrawlJob is now over |
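The ticket does not show how CrawlControl is scheduled. The thread name pool-2-thread-1 suggests an executor pool, so the sketch below assumes a `ScheduledExecutorService`, purely as an illustration. Under that assumption, returning early from `run()` does not stop the periodic schedule: the task keeps firing until its `ScheduledFuture` is cancelled or the executor is shut down. `SchedulerSketch` and `countRunsAfterDone` are hypothetical names, and the 10 ms period is chosen only to keep the demo fast.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SchedulerSketch {

    /**
     * Schedules a periodic task that marks itself "done" on its first run and
     * returns early on every later run. Waits until at least extraRunsToObserve
     * such redundant runs have happened, then cancels the schedule and returns
     * how many redundant runs were counted.
     */
    public static int countRunsAfterDone(int extraRunsToObserve) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runsAfterDone = new AtomicInteger();
        CountDownLatch observed = new CountDownLatch(extraRunsToObserve);
        // Only touched by the single executor thread.
        final boolean[] done = {false};

        ScheduledFuture<?> handle = exec.scheduleAtFixedRate(() -> {
            if (done[0]) {
                // Mirrors the "we're done already" branch: returning here
                // does not cancel the periodic schedule.
                runsAfterDone.incrementAndGet();
                observed.countDown();
                return;
            }
            done[0] = true;
        }, 0, 10, TimeUnit.MILLISECONDS);

        try {
            observed.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Explicit cancellation is what actually stops the redundant runs.
            handle.cancel(false);
            exec.shutdownNow();
        }
        return runsAfterDone.get();
    }

    public static void main(String[] args) {
        System.out.println("redundant runs: " + countRunsAfterDone(4));
    }
}
```

If the real launcher worked this way, whatever waits for the crawl to end would need to cancel the schedule, or be notified directly, as soon as the finished state is seen.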