[NAS-2545] harvestInfo.performer in warcinfo records is not included in harvestInfo.xml Created: 27/Jul/16 Updated: 03/Nov/16 Resolved: 19/Oct/16 |
Status: | Resolved |
Project: | NetarchiveSuite |
Component/s: | WARC |
Affects Version/s: | 5.1 |
Fix Version/s: | 5.2 |
Type: | New Feature | Priority: | Minor |
Reporter: | Sara Aubry | Assignee: | Søren Vejrup Carlsen (Inactive) |
Resolution: | Fixed | ||
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Organization: | BNF |
Inspector: | lam.mai |
Sprint: | NAS 5.2 |
Description |
In 5.1, the harvestInfo.performer entry included in warcinfo records is empty, and the performer is not included in harvestInfo.xml at all.
#added by NetarchiveSuite Version: 5.1 (cde61d7829: https://github.com/netarchivesuite/netarchivesuite/commit/cde61d78299cabccae6195908b81ef77c84a76b9)
<?xml version="1.0" encoding="UTF-8"?> |
Comments |
Comment by Sara Aubry [ 10/Oct/16 ] |
Tested: if settings.harvester.performer is not declared, the performer appears neither in harvestInfo.xml nor in the warcinfo metadata of the data files.
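The tested behaviour treats an undeclared performer as absent, so harvestInfo.xml gets no performer element instead of an empty one. The minimal Java sketch below illustrates that rule. It does not reproduce the real NetarchiveSuite code, which this ticket does not show. The class name `HarvestInfoSketch`, the `toXml` method and the `<harvestInfo>`, `<jobId>` and `<performer>` element names are all hypothetical. The sketch assumes that an undeclared setting reaches the code as null or an empty string.

```java
import java.util.Optional;

/**
 * Hypothetical sketch: builds a small harvestInfo-like XML document and writes
 * a performer element only when a non-blank performer is configured.
 * Element names are illustrative assumptions, not NetarchiveSuite's schema.
 */
final class HarvestInfoSketch {

    /** Treats a null or blank settings value as "not declared". */
    static Optional<String> declaredPerformer(String settingValue) {
        if (settingValue == null || settingValue.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(settingValue.trim());
    }

    static String toXml(long jobId, String performerSetting) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<harvestInfo>\n");
        sb.append("  <jobId>").append(jobId).append("</jobId>\n");
        // Omit the element entirely rather than writing an empty <performer/>.
        declaredPerformer(performerSetting).ifPresent(p ->
                sb.append("  <performer>").append(escape(p)).append("</performer>\n"));
        sb.append("</harvestInfo>\n");
        return sb.toString();
    }

    /** Minimal XML text escaping, sufficient for this sketch. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

The sketch routes both the null case and the empty-string case through one `Optional`. A writer that checks only `!= null` would therefore not reintroduce the empty value.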
Comment by Sara Aubry [ 28/Sep/16 ] |
Great, we'll test it. |
Comment by Søren Vejrup Carlsen (Inactive) [ 27/Sep/16 ] |
NAS is now consistent: it no longer adds empty performer values to the warcinfo metadata.
Comment by Søren Vejrup Carlsen (Inactive) [ 21/Sep/16 ] |
So you probably just need to override the empty default value of settings.harvester.performer. What we probably should do is avoid inserting an empty performer into the warcinfo metadata.
Comment by Sara Aubry [ 21/Sep/16 ] |
Yes, we saw that by declaring a performer in the settings. |
Comment by Søren Vejrup Carlsen (Inactive) [ 21/Sep/16 ] |
It is then appended to the warcinfo metadata by this code in H3HeritrixTemplate.insertWarcInfoMetadata:
    // Only null is skipped here; an empty performer string still gets an entry.
    if (performer != null) {
        sb.append(startMetadataEntry);
        sb.append(HARVESTINFO_PERFORMER + valuePart + performer + endMetadataEntry);
    }
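The check above lets an empty string through, so the empty default of settings.harvester.performer produces an empty harvestInfo.performer entry. The Java sketch below shows a corrected condition that skips both null and blank values. The values of the markup constants are placeholder assumptions. The real `startMetadataEntry`, `valuePart` and `endMetadataEntry` strings in H3HeritrixTemplate are not shown in this ticket. `WarcInfoMetadataSketch` and `appendPerformer` are hypothetical names, and this is not the actual committed fix.

```java
/**
 * Hypothetical sketch of a corrected check: the performer entry is appended
 * only when the value is non-null AND non-blank. The markup constants are
 * placeholder assumptions standing in for the real template strings.
 */
final class WarcInfoMetadataSketch {
    // Assumed markup, chosen only so the output is easy to read.
    static final String START_METADATA_ENTRY = "<entry key=\"";
    static final String VALUE_PART = "\">";
    static final String END_METADATA_ENTRY = "</entry>";
    static final String HARVESTINFO_PERFORMER = "harvestInfo.performer";

    static void appendPerformer(StringBuilder sb, String performer) {
        // A plain `performer != null` test would still append an entry for "".
        if (performer != null && !performer.trim().isEmpty()) {
            sb.append(START_METADATA_ENTRY);
            sb.append(HARVESTINFO_PERFORMER + VALUE_PART + performer + END_METADATA_ENTRY);
        }
    }
}
```

With a declared performer such as "BnF", the sketch appends `<entry key="harvestInfo.performer">BnF</entry>`. With null, "" or whitespace, it appends nothing.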
Comment by Søren Vejrup Carlsen (Inactive) [ 21/Sep/16 ] |
The value of the performer is currently read from settings in the JobDispatcher.doOneCrawl method:
    if (job.getContinuationOf() == null) {
        ht.insertWarcInfoMetadata(job, origHarvestName, origHarvestSchedule,
                Settings.get(HarvesterSettings.PERFORMER));
    } else {
        log.info("Job is a continuation of " + job.getContinuationOf()
                + " so no need to replace WarcInfoMetadata");
    }
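The dispatch decision above boils down to this: the performer is read from settings and passed on only for jobs that are not continuations of an earlier job. The simplified Java sketch below models that control flow. `DispatchSketch`, its nested `Job` and `Template` classes and the `prepare` method are hypothetical stand-ins, not NetarchiveSuite classes. The real `insertWarcInfoMetadata` call also passes the original harvest name and schedule, which are left out here. The settings lookup is modelled as a `Supplier<String>` in place of `Settings.get(HarvesterSettings.PERFORMER)`.

```java
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the dispatch decision: warcinfo metadata, including
 * the performer read from settings, is inserted only for jobs that are not
 * continuations. Job and Template are simplified stand-ins.
 */
final class DispatchSketch {

    /** Stand-in for a job; continuationOf is null for a fresh job. */
    static final class Job {
        private final Long continuationOf;
        Job(Long continuationOf) { this.continuationOf = continuationOf; }
        Long getContinuationOf() { return continuationOf; }
    }

    /** Stand-in for the Heritrix template; records the performer it received. */
    static final class Template {
        String insertedPerformer;
        void insertWarcInfoMetadata(Job job, String performer) {
            insertedPerformer = performer;
        }
    }

    /** Returns true if warcinfo metadata was inserted for this job. */
    static boolean prepare(Job job, Template ht, Supplier<String> performerSetting) {
        if (job.getContinuationOf() == null) {
            // The setting is only read when metadata is actually inserted.
            ht.insertWarcInfoMetadata(job, performerSetting.get());
            return true;
        }
        // Mirrors the log message: the warcinfo metadata is not replaced.
        System.out.println("Job is a continuation of " + job.getContinuationOf()
                + " so no need to replace WarcInfoMetadata");
        return false;
    }
}
```

Under this flow, the performer value that reaches the template is exactly the raw settings value. That is why an undeclared, empty setting ended up in the warcinfo metadata until the template itself started skipping empty values.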