[NAS-2234] ChecksumFileServer Dying with OOM Error Created: 09/Aug/13 Updated: 17/Sep/15 Resolved: 08/May/14 |
|
Status: | Resolved |
Project: | NetarchiveSuite |
Component/s: | Archive |
Affects Version/s: | 4.0, 4.2 |
Fix Version/s: | 4.4 |
Type: | Bug | Priority: | Minor |
Reporter: | Colin Rosenthal | Assignee: | Søren Vejrup Carlsen (Inactive) |
Resolution: | Fixed | ||
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Verification: | Tested by changing the archive setting "archive.checksum.archive.class" to dk.netarkivet.archive.checksum.DatabaseChecksumArchive. Restart the FileChecksumApplication. All Bitpreservation actions should then be possible and give no error. |
Description |
In TEST7, the "Update checksum and filestatus for CS" step fails because of an OOM error:
Host: kb-test-acs-001.kb.dk
Date: Fri Aug 09 15:38:50 CEST 2013
dk.netarkivet.common.utils.ApplicationUtils.logExceptionAndPrint(ApplicationUtils.java:90)
Could not start class dk.netarkivet.archive.checksum.distribute.ChecksumFileServer
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at dk.netarkivet.common.utils.ApplicationUtils.startApp(ApplicationUtils.java:178)
at dk.netarkivet.archive.checksum.ChecksumFileApplication.main(ChecksumFileApplication.java:47)
Caused by: java.lang.OutOfMemoryError: Java heap space
Will try again with more heapspace! |
Comments |
Comment by Søren Vejrup Carlsen (Inactive) [ 08/May/14 ] |
The implementation is now completed. The default of this setting is dk.netarkivet.archive.checksum.FileChecksumArchive. When the DatabaseChecksumArchive is used instead, the database will be located in the "DB" subdir of the checksum basedir (by default set to "CS"). To migrate the file checksum archive to a DatabaseChecksumArchive, use the dk.netarkivet.archive.tools.LoadDatabaseChecksumArchive tool. |
Comment by Søren Vejrup Carlsen (Inactive) [ 11/Nov/13 ] |
Have now begun implementing a Berkeley DB backed DatabaseChecksumArchive |
Comment by Søren Vejrup Carlsen (Inactive) [ 12/Aug/13 ] |
Downgrading the criticality to minor to reflect my opinion of its status. |
Comment by Søren Vejrup Carlsen (Inactive) [ 12/Aug/13 ] |
We shouldn't IMHO close it, but we don't need to do more now. |
Comment by Mikis Seth Sørensen (Inactive) [ 12/Aug/13 ] |
Can we close this, with the increase of heap as solution? |
Comment by Mikis Seth Sørensen (Inactive) [ 12/Aug/13 ] |
The Bit repository should fix this. |
Comment by Søren Vejrup Carlsen (Inactive) [ 09/Aug/13 ] |
This is an old problem, so moving back to a pre-4 release will not help. The ChecksumFileServer was introduced in NetarchiveSuite 3.12. |
Comment by Søren Vejrup Carlsen (Inactive) [ 09/Aug/13 ] |
This is the case because every file name and its corresponding checksum is stored in a synchronized map, which is persisted using a file. During the start phase, the synchronized map is filled from the checksum file on local disk, and this is where it goes wrong here, because the process runs out of memory. The short-term fix is to increase the MaxHeap value. The long-term fix is to use a Berkeley DB to persist the information. |
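For illustration only, the start-up behaviour described in the comment above can be sketched roughly as follows. This is not the NetarchiveSuite code: the class name `ChecksumMapSketch`, the `load` method, and the `filename##checksum` line format are all assumptions made for the example. The point it shows is that every line of the persisted file ends up as an entry on the heap, so the memory needed at start-up scales with the number of archived files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual ChecksumFileServer implementation.
class ChecksumMapSketch {
    // Assumed line format "filename##checksum"; the separator is an assumption.
    static final String SEPARATOR = "##";

    // Reads every line into a synchronized in-memory map, so the whole
    // checksum file has to fit on the heap before the server can start.
    static Map<String, String> load(Reader source) {
        Map<String, String> checksums =
                Collections.synchronizedMap(new HashMap<String, String>());
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int split = line.lastIndexOf(SEPARATOR);
                if (split < 0) {
                    continue; // skip lines that do not match the assumed format
                }
                String filename = line.substring(0, split);
                String checksum = line.substring(split + SEPARATOR.length());
                checksums.put(filename, checksum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return checksums;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the checksum file on local disk.
        Map<String, String> checksums =
                load(new StringReader("a.arc##abc123\nb.arc##def456\n"));
        System.out.println(checksums.size() + " entries loaded");
    }
}
```

A disk-backed store such as the Berkeley DB mentioned above avoids this pattern because entries are looked up on demand instead of being loaded all at once.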
Comment by Søren Vejrup Carlsen (Inactive) [ 09/Aug/13 ] |
The more files your archive holds, the more memory the ChecksumFileServer will require. |
Comment by Søren Vejrup Carlsen (Inactive) [ 09/Aug/13 ] |
TLR recently saw this in production. By changing the start from using "-Xmx1536m" to "-Xmx1936m" the problem disappeared. |
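When changing "-Xmx" as described above, it can be useful to confirm which limit the JVM actually picked up. The following is a minimal sketch using the standard `java.lang.Runtime` API; the class name `HeapCheck` and its helper methods are hypothetical and not part of NetarchiveSuite. The reported maximum is typically close to, but not necessarily identical to, the "-Xmx" value.

```java
// Hypothetical helper; not part of NetarchiveSuite.
class HeapCheck {
    private static final long MIB = 1024L * 1024L;

    // Maximum heap the JVM will attempt to use, in MiB (governed by -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently in use, in MiB: allocated heap minus its free part.
    static long usedHeapMiB() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MIB;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB):  " + maxHeapMiB());
        System.out.println("Used heap (MiB): " + usedHeapMiB());
    }
}
```

Running it with, for example, `java -Xmx1936m HeapCheck` shows whether the larger setting took effect.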
Comment by Colin Rosenthal [ 09/Aug/13 ] |
Increasing the heap size to 2536m has helped. The job is now running and I am waiting to see if it completes normally. |