Broad crawl

  • Last week we launched the third broad crawl of 2016. The crawl limit per domain will be at most 100 MB. There will be special crawls for ministries and government bodies, and for ultra big sites (e.g.
  • We will try to get in touch with the website owners/web hosts who are blocking our crawler (about 11% are blocking us).

Event crawl

  • The event collection for the Olympics in Rio 2016 will continue until the end of the 2016 Paralympics.

Selective crawls

  • We are working on the configuration of the regional/local news media crawls.
  • Facebook
    • We have test-crawled about 60 Danish Facebook profiles with Archive-It. We are analyzing how much we get from the profiles. We have to renew our account with Archive-It after the end of November and we are trying to negotiate a good price.
    • We made a special crawl of Prime Minister Lars Løkke's Facebook profile on 2016.08.30, the day he published his 2025 plan.

Compression of the archive

  • We are preparing for the compression, but it has to wait for NAS release 5.3.

Last but not least

Last week we learned that the Ministry of Culture wants KB and SB to merge: from January 2017 we will be “Nationalbiblioteket”, with two locations, in Copenhagen and Aarhus.








  • We have already switched to NAS 5.2 because we had severe problems with https websites in the former version. It has gone smoothly so far. We are still using the ARC format, because we have to refactor all our tools before we switch to WARC.
  • The crawl of our presidential elections is still running. We have a new election date at the beginning of December and hope to be able to finish the crawl soon.
  • Apart from one small, additional thematic crawl we will only have ongoing crawls until the end of the year. Next domain crawl is scheduled for 2017.





Next meetings