Details
- Bug
- Resolution: Fixed
- Critical
- 3.17.0
- None
- BNF
- Rough
Description
Our production engineers reported that index generation for our semiannual crawl had filled the disk partition holding the system temp directory.
We had configured the common.settings.tempDir property to point to a dedicated large partition, but this setting had no effect in this case.
Here is the stack trace we obtained:
Nov 7, 2011 4:28:24 PM dk.netarkivet.archive.indexserver.distribute.IndexRequestServer doGenerateIndex
WARNING: Unable to generate index for jobs [823,822,825,824]
dk.netarkivet.common.exceptions.IOFailure: Error code 2 sorting crawl log '/data/PROD_CIRCUIT_3.1.0/cache/crawllog/crawllog-823-cache'
at dk.netarkivet.common.utils.FileUtils.sortCrawlLog(FileUtils.java:1005)
at dk.netarkivet.archive.indexserver.CrawlLogIndexCache.getSortedCrawlLog(CrawlLogIndexCache.java:244)
at dk.netarkivet.archive.indexserver.CrawlLogIndexCache.indexFile(CrawlLogIndexCache.java:179)
at dk.netarkivet.archive.indexserver.CrawlLogIndexCache.combine(CrawlLogIndexCache.java:146)
at dk.netarkivet.archive.indexserver.CombiningMultiFileBasedCache.cacheData(CombiningMultiFileBasedCache.java:80)
at dk.netarkivet.archive.indexserver.CombiningMultiFileBasedCache.cacheData(CombiningMultiFileBasedCache.java:48)
at dk.netarkivet.archive.indexserver.FileBasedCache.cache(FileBasedCache.java:167)
at dk.netarkivet.archive.indexserver.distribute.IndexRequestServer.doGenerateIndex(IndexRequestServer.java:157)
at dk.netarkivet.archive.indexserver.distribute.IndexRequestServer.access$000(IndexRequestServer.java:58)
at dk.netarkivet.archive.indexserver.distribute.IndexRequestServer$1.run(IndexRequestServer.java:137)
A little investigation revealed that the IndexServer process had child processes running the Unix sort command. By default, sort writes its temporary files to the system /tmp, which caused the saturation.
The suggested fix is to add the '-T <value of common.settings.tempDir>' parameter when building the sort command within the application code.
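The suggested fix could look roughly like the sketch below. This is not the actual FileUtils.sortCrawlLog code: the class name SortCommandSketch, both method names, and the choice of ProcessBuilder are assumptions made for illustration, and the caller is assumed to pass in the directory read from the common.settings.tempDir property.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of NetarchiveSuite.
final class SortCommandSketch {

    /**
     * Builds the argument list for an external sort whose temporary
     * files go to tempDir instead of the system default.
     */
    static List<String> buildSortCommand(File input, File output, File tempDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("sort");
        cmd.add("-T");                        // directory for sort's temporary files
        cmd.add(tempDir.getAbsolutePath());   // assumed: value of the configured temp dir
        cmd.add("-o");                        // write the sorted result to this file
        cmd.add(output.getAbsolutePath());
        cmd.add(input.getAbsolutePath());
        return cmd;
    }

    /** Runs the sort and returns its exit code (0 means success). */
    static int runSort(File input, File output, File tempDir)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildSortCommand(input, output, tempDir));
        pb.inheritIO();                       // forward sort's stderr for diagnostics
        return pb.start().waitFor();
    }
}
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces. GNU sort also honours the TMPDIR environment variable, so setting it on the child process would be an alternative to '-T'.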
Issue Links
- Trackbacks
-
3.18.0 release test Previous 3.17.0 release test Release test status https://sbforge.org/jira/browse/NAS1909 Code freeze planned for 24.Oct. Release planned for the 11.Nov NAS:NetarchiveSuite 3.18.0 Release Notes System test environment