[NAS-2726] The methods in dk.netarkivet.viewerproxy.webinterface.Reporting do not support metadata files using the BNF naming Created: 22/Mar/18 Updated: 11/Apr/18 Resolved: 11/Apr/18 |
|
Status: | Resolved |
Project: | NetarchiveSuite |
Component/s: | GUI |
Affects Version/s: | None |
Fix Version/s: | 5.4 |
Type: | Bug | Priority: | Blocker |
Reporter: | Søren Vejrup Carlsen (Inactive) | Assignee: | Søren Vejrup Carlsen (Inactive) |
Resolution: | Fixed | ||
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Description |
The methods in https://github.com/netarchivesuite/netarchivesuite/blob/master/harvester/harvester-core/src/main/java/dk/netarkivet/viewerproxy/webinterface/Reporting.java build the metadata file pattern as follows:

    private static String getMetadataFilePatternForJobId(long jobid) {
        // The old invalid metadataFilePattern
        //return ".*"+jobid + ".*" + metadatafile_suffix;
        return jobid + metadatafile_suffix;
    }

This code currently assumes that the metadata files use the legacy naming style. The difference between the two naming styles:

    if (isPrefix) {
        return collectionName + "-" + jobID + "-" + harvestID + "-metadata-" + versionNumber + ".warc" + possibleGzSuffix;
    } else {
        return jobID + "-metadata-" + versionNumber + ".warc" + possibleGzSuffix;
    }

Currently, collectionName is read from the setting HarvesterSettings.HERITRIX_PREFIX_COLLECTION_NAME. |
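To make the mismatch concrete, here is a minimal sketch that builds a filename in each naming style and tests it against the current pattern. It rests on several assumptions: the value of metadatafile_suffix is inferred from the regexp quoted in the comments on this issue, the pattern is assumed to be matched against the whole filename, and the class name, helper names, and the harvest ID 42 are hypothetical.

```java
import java.util.regex.Pattern;

final class MetadataNamingDemo {
    // Assumption: suffix regex inferred from the pattern quoted in the comments of this issue.
    static final String METADATAFILE_SUFFIX = "-metadata-[0-9]+\\.(w)?arc(\\.gz)?";

    /** Builds a metadata filename following the two branches quoted in the description. */
    static String metadataFileName(boolean isPrefix, String collectionName, long jobID,
            long harvestID, int versionNumber, String possibleGzSuffix) {
        if (isPrefix) {
            return collectionName + "-" + jobID + "-" + harvestID
                    + "-metadata-" + versionNumber + ".warc" + possibleGzSuffix;
        }
        return jobID + "-metadata-" + versionNumber + ".warc" + possibleGzSuffix;
    }

    /** Hypothetical helper: full-string match against the pattern the method currently returns. */
    static boolean matchesCurrentPattern(long jobid, String fileName) {
        return Pattern.matches(jobid + METADATAFILE_SUFFIX, fileName);
    }

    public static void main(String[] args) {
        String legacy = metadataFileName(false, "BNF", 1073, 42, 1, ".gz");
        String prefixed = metadataFileName(true, "BNF", 1073, 42, 1, ".gz");
        System.out.println(legacy + " matches: " + matchesCurrentPattern(1073, legacy));
        System.out.println(prefixed + " matches: " + matchesCurrentPattern(1073, prefixed));
    }
}
```

Under these assumptions the legacy name 1073-metadata-1.warc.gz matches, while the prefixed name BNF-1073-42-metadata-1.warc.gz does not, because nothing in the pattern accounts for the collection name or the harvest ID.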
Comments |
Comment by Søren Vejrup Carlsen (Inactive) [ 11/Apr/18 ] |
And I have also tested it using BNF style naming:

    <settings>
      <harvester><harvesting>
        <heritrix>
          <archiveNaming>
            <class>dk.netarkivet.harvester.harvesting.CollectionPrefixNamingConvention</class>
            <collectionName>BNF</collectionName>
          </archiveNaming>
        </heritrix>
        <metadata>
          <metadataFileNameFormat>prefix</metadataFileNameFormat>
          <filename>
            <versionnumber>1</versionnumber>
          </filename>
        </metadata>
      </harvesting></harvester>
    </settings>

And it works perfectly. |
Comment by Søren Vejrup Carlsen (Inactive) [ 10/Apr/18 ] |
I am currently testing your solution with the standard naming setup |
Comment by Søren Vejrup Carlsen (Inactive) [ 10/Apr/18 ] |
Yes, the same regexp is used for selecting the correct metadatafile to search in |
Comment by Sara Aubry [ 10/Apr/18 ] |
Will this regexp also fix the bugs in the features "Display crawl lines matching this regexp" and "Browse only crawl-log lines for this domain XXX"? |
Comment by Sara Aubry [ 10/Apr/18 ] |
Bert says we should stick to his proposal as the metadatafile_suffix starts with a hyphen. |
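The effect of the hyphen placement can be checked with a small sketch. It assumes that metadatafile_suffix has the value implied by the regexp quoted in this thread and that the pattern must match the whole filename; the class and method names, and the harvest ID 42, are hypothetical.

```java
import java.util.regex.Pattern;

final class HyphenPlacementDemo {
    // Assumption: suffix regex inferred from the pattern quoted in this thread.
    // Note that it already starts with a hyphen.
    static final String METADATAFILE_SUFFIX = "-metadata-[0-9]+\\.(w)?arc(\\.gz)?";

    /** Variant with a mandatory hyphen directly after the job ID. */
    static String hyphenOutsideGroup(long jobid) {
        return "(.*-)?" + jobid + "-(.*)?" + METADATAFILE_SUFFIX;
    }

    /** Variant where the hyphen belongs to the optional group after the job ID. */
    static String hyphenInsideGroup(long jobid) {
        return "(.*-)?" + jobid + "(-.*)?" + METADATAFILE_SUFFIX;
    }

    static boolean matches(String pattern, String fileName) {
        return Pattern.matches(pattern, fileName);
    }

    public static void main(String[] args) {
        String legacy = "1073-metadata-1.warc.gz";
        String prefixed = "BNF-1073-42-metadata-1.warc.gz";
        for (String name : new String[] {legacy, prefixed}) {
            System.out.println(name
                    + " outside=" + matches(hyphenOutsideGroup(1073), name)
                    + " inside=" + matches(hyphenInsideGroup(1073), name));
        }
    }
}
```

With the mandatory hyphen, the legacy name would need two hyphens in a row (one from the pattern, one from the suffix), so it no longer matches. Keeping the hyphen inside the optional group lets both names match.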
Comment by Colin Rosenthal [ 10/Apr/18 ] |
That looks good, but isn't there always a hyphen after the jobid? So "(.*-)?" + jobid + "-(.*)?" + metadatafile_suffix |
Comment by Sara Aubry [ 10/Apr/18 ] |
Here is Bert's recommendation: (.*-)?1073(-.*)?-metadata-[0-9]+\.(w)?arc(\.gz)? Hence "(.*-)?" + jobid + "(-.*)?" + metadatafile_suffix should work for both the KB and BnF naming schemes. |
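As a rough check of this recommendation, the sketch below filters a handful of made-up filenames for job 1073, once with the recommended pattern and once with the old ".*" + jobid + ".*" pattern from the description. The filenames, the class and method names, and the assumption that the pattern must match the whole filename are all illustrative and not taken from Reporting.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RecommendedPatternDemo {
    // Assumption: suffix regex as quoted in the recommendation above.
    static final String METADATAFILE_SUFFIX = "-metadata-[0-9]+\\.(w)?arc(\\.gz)?";

    static String recommendedPattern(long jobid) {
        return "(.*-)?" + jobid + "(-.*)?" + METADATAFILE_SUFFIX;
    }

    static String oldPattern(long jobid) {
        return ".*" + jobid + ".*" + METADATAFILE_SUFFIX;
    }

    /** Returns the names that match the given regex in full, preserving order. */
    static List<String> select(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    // Hypothetical filenames: three belong to job 1073, three to neighbouring job numbers.
    static final List<String> SAMPLE_NAMES = Arrays.asList(
            "1073-metadata-1.warc.gz",
            "BNF-1073-42-metadata-1.warc.gz",
            "1073-metadata-2.arc",
            "11073-metadata-1.warc.gz",
            "10731-metadata-1.warc.gz",
            "BNF-11073-42-metadata-1.warc.gz");

    public static void main(String[] args) {
        System.out.println("recommended: " + select(recommendedPattern(1073), SAMPLE_NAMES));
        System.out.println("old:         " + select(oldPattern(1073), SAMPLE_NAMES));
    }
}
```

On this sample the recommended pattern selects only the three files of job 1073, whereas the old pattern also selects the files of jobs 11073 and 10731, because its leading and trailing ".*" can absorb extra digits.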
Comment by Sara Aubry [ 10/Apr/18 ] |
Here is our naming scheme:
|
Comment by Colin Rosenthal [ 10/Apr/18 ] |
It's surely not beyond our intellectual limits to devise a regex that satisfies both sets of requirements. We need an optional {0,1}"prefix-" before the jobid and a "-" after. Questions: can the prefix contain international characters? Does it contain only letters and numbers? (See https://stackoverflow.com/questions/14636540/java-regular-expression-with-international-letters for use of international letters in regexes.) |
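If the prefix turned out to consist only of letters and digits, one possible way to allow international characters is the Unicode property classes \p{L} and \p{N} of java.util.regex. The sketch below compares that with an ASCII-only prefix. It is only an illustration: the suffix value is inferred from the regexp quoted elsewhere in this thread, and the class name, method names, and sample filenames are hypothetical.

```java
import java.util.regex.Pattern;

final class UnicodePrefixDemo {
    // Assumption: suffix regex inferred from the pattern quoted elsewhere in this thread.
    static final String METADATAFILE_SUFFIX = "-metadata-[0-9]+\\.(w)?arc(\\.gz)?";

    /** Optional prefix restricted to ASCII letters and digits. */
    static String asciiPrefixPattern(long jobid) {
        return "([A-Za-z0-9]+-)?" + jobid + "(-.*)?" + METADATAFILE_SUFFIX;
    }

    /** Optional prefix made of any Unicode letters (\p{L}) and numbers (\p{N}). */
    static String unicodePrefixPattern(long jobid) {
        return "([\\p{L}\\p{N}]+-)?" + jobid + "(-.*)?" + METADATAFILE_SUFFIX;
    }

    static boolean matches(String pattern, String fileName) {
        return Pattern.matches(pattern, fileName);
    }

    public static void main(String[] args) {
        // \u00e8 is a precomposed e with grave accent, written as an escape
        // so the example does not depend on the source file encoding.
        String accented = "Biblioth\u00e8que-1073-42-metadata-1.warc.gz";
        String plain = "BNF-1073-42-metadata-1.warc.gz";
        for (String name : new String[] {accented, plain}) {
            System.out.println(name
                    + " ascii=" + matches(asciiPrefixPattern(1073), name)
                    + " unicode=" + matches(unicodePrefixPattern(1073), name));
        }
    }
}
```

The ASCII-only variant rejects the accented prefix, while the \p{L} variant accepts it. A prefix containing other characters, such as underscores or decomposed accents, would need a wider character class.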
Comment by Søren Vejrup Carlsen (Inactive) [ 22/Mar/18 ] |
If we return to the old metadatafile-pattern ".*"+jobid + ".*" + metadatafile_suffix , |