[NAS-2349] List of all files when adding new file to bitarchive is inefficient Created: 23/Jun/14 Updated: 19/Feb/16 Resolved: 19/Feb/16 |
|
Status: | Resolved |
Project: | NetarchiveSuite |
Component/s: | Archive |
Affects Version/s: | 4.4 |
Fix Version/s: | 5.1 |
Type: | Bug | Priority: | Major |
Reporter: | Mikis Seth Sørensen (Inactive) | Assignee: | Unassigned |
Resolution: | Fixed | ||
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
External reference: |
Description |
See NARK-506 for details. |
Comments |
Comment by Nicholas Clarke (Inactive) [ 24/Jun/14 ] |
http://codingjunkie.net/java-7-copy-move/ |
Comment by Thorbjørn Ravn Andersen (Inactive) [ 24/Jun/14 ] |
It should only be creating dirs and moving files, and I believe Jens Henrik has enough Unix-fu to be able to script that process instead of doing it by hand. |
Comment by Tue Hejlskov Larsen [ 24/Jun/14 ] |
And it is not a quick fix for Jens Henrik! He needs to split up >100 drives with 500 TB into new subfolders. |
Comment by Thorbjørn Ravn Andersen (Inactive) [ 24/Jun/14 ] |
I disagree that "split up into smaller subdirectories" is a quick and dirty fix. It is a very common technique for working around the fact that most filesystems implement directory operations linearly, meaning that things get slower as the number of files in a directory grows. |
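The "creating dirs and moving files" approach discussed here could be sketched roughly as follows. This is only an illustration, not NetarchiveSuite code: the class `ShardedLayout`, its methods, and the choice of a CRC32-derived two-hex-digit bucket name are all assumptions made for the example. It uses the Java 7 `Files.move` API from the article linked in this thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

// Hypothetical helper, not part of NetarchiveSuite.
final class ShardedLayout {

    /** Maps a file name to one of 256 bucket names ("00".."ff"). */
    static String bucketFor(String fileName) {
        CRC32 crc = new CRC32();
        crc.update(fileName.getBytes(StandardCharsets.UTF_8));
        // Keep only the low byte so each base directory gets at most 256 subdirectories.
        return String.format("%02x", crc.getValue() & 0xff);
    }

    /** Moves a file into its bucket below baseDir, creating the bucket if needed. */
    static Path moveIntoBucket(Path baseDir, Path file) throws IOException {
        Path bucket = baseDir.resolve(bucketFor(file.getFileName().toString()));
        Files.createDirectories(bucket);
        // Without REPLACE_EXISTING this throws if the target already exists,
        // so an existing archive file is never overwritten silently.
        return Files.move(file, bucket.resolve(file.getFileName()));
    }
}
```

Because the bucket is derived from the file name alone, a lookup can compute the subdirectory directly instead of listing the parent directory.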
Comment by Tue Hejlskov Larsen [ 24/Jun/14 ] |
The test system uses the same storage backend according to Jens Henrik. |
Comment by Mikis Seth Sørensen (Inactive) [ 24/Jun/14 ] |
A way forward could be:
|
Comment by Nicholas Clarke (Inactive) [ 24/Jun/14 ] |
Well, it could be an experiment to see if the watcher service works nicely over NFS with so many files. If NFS has native watching it should not be a problem. |
Comment by Thorbjørn Ravn Andersen (Inactive) [ 24/Jun/14 ] |
What would be the most appropriate way to solve this? |
Comment by Mikis Seth Sørensen (Inactive) [ 24/Jun/14 ] |
Aaah ok, so this functionality exists to ensure consistency between the actual disk data and the cached file lists. We have run into the same problem in the Bitrepository reference pillar, where Jonas had implemented similar functionality, which caused the pillar to slow down to a crawl. This has now been changed so that a scheduled service checks consistency on a regular basis instead of being triggered by external operations. This means that instead of using more and more resources as the load rises, the check can run when the system is idle. An even better strategy would of course be to trigger only a small update based on the actual relevant events, e.g. the files on a disk that have changed, that is, to use the JDK 1.7+ Watcher service as Nicholas proposes. |
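A minimal sketch of the event-driven idea, assuming the cached file list is a plain set of file names. The class `WatchedFileList` and its methods are hypothetical and not taken from NetarchiveSuite or the Bitrepository pillar; only the `java.nio.file.WatchService` API is real. Whether events are delivered at all for network-mounted directories depends on the platform, which is the experiment suggested elsewhere in this thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.HashSet;
import java.util.Set;

// Hypothetical cache of the file names in one directory.
final class WatchedFileList {
    private final Set<String> names = new HashSet<>();

    /** Applies a single create/delete event instead of relisting the directory. */
    synchronized void apply(WatchEvent.Kind<?> kind, String fileName) {
        if (kind == StandardWatchEventKinds.ENTRY_CREATE) {
            names.add(fileName);
        } else if (kind == StandardWatchEventKinds.ENTRY_DELETE) {
            names.remove(fileName);
        }
    }

    /** Full relisting; used for the initial load and after lost events. */
    synchronized void rescan(Path dir) throws IOException {
        names.clear();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
    }

    synchronized boolean contains(String fileName) {
        return names.contains(fileName);
    }

    synchronized int size() {
        return names.size();
    }

    /** Blocks and keeps the cache current until the key becomes invalid or the thread is interrupted. */
    void watch(Path dir) throws IOException, InterruptedException {
        try (WatchService service = dir.getFileSystem().newWatchService()) {
            dir.register(service,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_DELETE);
            rescan(dir);
            while (true) {
                WatchKey key = service.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        rescan(dir); // events were dropped, so fall back to a full listing
                    } else {
                        apply(event.kind(), event.context().toString());
                    }
                }
                if (!key.reset()) {
                    break; // directory is no longer accessible
                }
            }
        }
    }
}
```

The full relisting is still there, but it only runs at startup and on `OVERFLOW`, so the cost of adding one file no longer grows with the number of files already in the directory.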
Comment by Nicholas Clarke (Inactive) [ 24/Jun/14 ] |
Yes, Watcher service is nice. |
Comment by Thorbjørn Ravn Andersen (Inactive) [ 24/Jun/14 ] |
So this would be a very good reason to move to a Java 8 runtime? |
Comment by Nicholas Clarke (Inactive) [ 24/Jun/14 ] |
Well, Linux and many files in the same folder are a bad combo even without network-mounted drives. |
Comment by Thorbjørn Ravn Andersen (Inactive) [ 24/Jun/14 ] |
Sounds like the culprit. Wonder why the Network disk cannot keep up? |
Comment by Søren Vejrup Carlsen (Inactive) [ 24/Jun/14 ] |
The code related to this is in the class dk.netarkivet.archive.bitarchive.BitarchiveAdmin. Furthermore, it turns out that when adding a new file to an archiveDirectory, the list of files for this archiveDirectory is regenerated each time instead