Agenda for the joint BNF, ONB, SB, KB and BNE NetarchiveSuite tele-conference 2017-02-07, 13:00-14:00.

Practical information


  • BNF: Lam, Géraldine, Sara
  • ONB: Michaela, Andreas
  • KB/DK: Stephen, Tue
  • SB: Sabine, Colin
  • BNE: Mar
  • KB/Sweden: -

Upcoming NAS 5.3 Release

Status of developments.

On the BnF side, two pull requests have already been submitted (see details).



PR3: coming soon. It will also include the following fixes:

  • H3 pauseAtStart bean property ignored by Harvest Controller: NAS-2596
  • WARC-Refers-To-Date in WARC revisit records does not contain the correct original record date: NAS-2602

NAS workshop in Vienna

Date and participants:

Review of possible topics: 2017 NAS workshop

Questions from KB

1) What CDX format are you using today, and which do you plan to support within the next year?

2) Which version of (Open)Wayback are you using today, and what do you think about the future development of OpenWayback?

3) Which social media can you archive today?

Status of the production sites


We are concentrating our efforts on capturing social media. Last week all curators met in person to kick off the discussion of our social media strategy. We started by analysing the social media platforms so that we can decide on a selection and propose a crawl frequency.

Some statistics for 2016: there are 1,097,585 active websites listed in NAS, and 180,046 sites were larger than 10 MB in 2016. We harvested approximately 27 billion objects. At the end of 2016 the archive totalled 769 TB; we harvested 95 TB in 2016, which is 35 TB less than in 2015.


We are working on what should be the final corrections before switching over to Heritrix 3 and NAS 5. Our tests have not identified any major problems; we are continuing to analyse the results to prepare for any changes that might arise, such as an increase in the amount of data collected. Once we start crawling with H3, we will increase our usual monitoring so that we can deal with any unexpected changes.

We have started our 2017 election crawls.







Next meeting

March 7th

Any other business?



