  • Preparation for the implementation of SolR wayback for external users is nearly finished. We had to resolve security issues and are now waiting for the final go-ahead from our legal experts.
  • We have sent a recommendation for decision to our directors: We want to implement WARC files created by crawling with into our preservation system
  • Our broad crawl is ongoing, and some of us are busy with the job follow-up.
  • We are negotiating with our IT department: we want them to allot time to implement the new Heritrix release, which we hope will, among other things, solve our problems with harvesting “lazy load” content.
  • The event crawl "Coronavirus in Denmark" is ongoing, of course. We are getting help from people outside the Netarchive Team, including colleagues who are not able to do their ordinary work.



We have finished our tests on the new Heritrix IIPC version and plan to put it into production before the end of June. This version will also include the migration to PostgreSQL 11.

After this deployment, we will be ready to launch a crawl of YouTube channels about the coronavirus. To enrich this collection, we will also launch an Instagram crawl: we are targeting a selection of 150 Instagram profiles. Images and text will be crawled from