
  • BNF: Lam, Géraldine, Sara
  • ONB: Michaela, Andreas
  • KB/DK - Copenhagen: Stephen, Tue
  • SBKD/DK - Aarhus: Sabine, Colin
  • BNE: Mar
  • KB/Sweden: -



(These have been validated by our automatic system test. CSR)

PR3: coming soon. It will also include the following fixes:


WARC-Refers-To-Date in WARC revisit records does not have the right original record date:


(Release Date? Next week (week 7) is a school holiday in DK. Colin would like to start work on organising release 5.3 immediately after that, if PR3 is available. So early March - week 10 - for release.)

NAS workshop in Vienna

Date and participants:



We are concentrating our efforts on the capture of social media. Last week all curators met in person to kick off the discussion of our social media strategy. We started by analysing the social media platforms so that we can decide what to select and propose a crawl frequency.

Some statistics for 2016: There are 1,097,585 active websites listed in NAS. 180,046 sites were larger than 10 MB in 2016. We harvested approx. 27 billion objects. The archive totalled 769 TB at the end of 2016; we harvested 95 TB in 2016, which is 35 TB less than in 2015.

Development is focused on migrating the existing archive to compressed (.gz) format. Compressing the files is easy - the difficulty is finding and updating all references to the old files:

  • in metadata files
  • in cdx indexes
  • in the admin database
  • in the checksum database

... on a running system, and with minimal downtime.
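
As an illustration of why the references are the hard part, here is a minimal Python sketch. It is not the actual migration code: it assumes each record is compressed as its own gzip member (the usual convention for .warc.gz files), and the function names and the simplified space-separated index line (offset and filename as the last two fields) are hypothetical. It shows that both the record offsets and the file checksum change on compression, so every index line and checksum entry that points at the old file has to be rewritten.

```python
import gzip
import hashlib
import io


def compress_records(raw: bytes, record_spans):
    """Compress each (offset, length) span of `raw` as its own gzip member.

    Returns the compressed bytes and a dict mapping each old
    (uncompressed) offset to the new (compressed) offset.
    """
    out = io.BytesIO()
    offset_map = {}
    for old_offset, length in record_spans:
        offset_map[old_offset] = out.tell()
        out.write(gzip.compress(raw[old_offset:old_offset + length]))
    return out.getvalue(), offset_map


def md5_hex(data: bytes) -> str:
    """Checksum as it might be stored in a checksum database (assumption: MD5)."""
    return hashlib.md5(data).hexdigest()


def rewrite_index_line(line: str, offset_map, old_name: str, new_name: str) -> str:
    """Update a hypothetical index line whose last two fields are offset and filename."""
    fields = line.split(" ")
    if fields[-1] == old_name:
        fields[-2] = str(offset_map[int(fields[-2])])
        fields[-1] = new_name
    return " ".join(fields)


# Two toy "records" standing in for real archive records.
raw = b"record-one\r\n\r\nrecord-two\r\n\r\n"
spans = [(0, 14), (14, 14)]
compressed, offset_map = compress_records(raw, spans)

old_line = "example.org 20160101 14 a.warc"
new_line = rewrite_index_line(old_line, offset_map, "a.warc", "a.warc.gz")
```

Because every record is a separate gzip member, a reader can still seek to the new offset and decompress a single record, which is what keeps index lookups working after the migration.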



We are working on what should be the final corrections before changing over to Heritrix 3 and NAS 5. Our tests have not identified any major problems. We are continuing to analyse the results to prepare for any changes that might arise, such as an increase in the amount of data collected. Once we start crawling with H3, we will step up our usual monitoring so that we can deal with any unexpected changes.

We have started our 2017 election crawls.