
  • Preparations for implementing SolrWayback for external users are nearly finished. We had to resolve security issues and are now waiting for the final go-ahead from our legal experts.
  • We have sent a recommendation for decision to our directors: we want to bring the WARC files created by crawling into our preservation system.
  • Our broad crawl is ongoing; some of us are busy with job follow-up.
  • We are negotiating with our IT department: we want them to allot time to implement the new Heritrix release, which we hope will, among other things, solve our problems with harvesting “lazy load” content.



We have finished our tests on the new Heritrix IIPC version and plan to put it into production before the end of June. This version will also include the migration to PostgreSQL 11.

After this deployment, we will be ready to launch a crawl of YouTube channels about the coronavirus. To enrich this collection, we will also launch an Instagram crawl targeting a selection of 150 Instagram profiles. Images and text will be crawled from




  • We are focused on our coronavirus collection. We are collecting proposals from the regional web curators, who do not have access to the tools because of the situation. We are also accepting public nominations through a web form. So far we have more than 5 TB of information and almost 2,000 seeds.
  • We had planned to launch our annual broad crawl in April, but we have postponed it until the situation returns to normal.
  • We have already installed version 6.1 of BCWeb in a test environment. We will test it over the coming days before deploying it to the production environment.