
We are still working on the reorganization of the selective crawls. The strategy is:

  • Extension of the selective crawls and smaller broad crawls:
    • We now collect all national Danish news media selectively, both newspaper websites and news media that exist only online.
    • We are investigating all local news media in order to decide the frequency and depth of future crawls.
    • We made a first crawl of university repositories (with OAI extraction).

Heritrix 3 is not able to archive Facebook profiles, but Archive-It can collect them through an API. We will collect about 100 representative open Facebook profiles at Archive-It; at the moment we are selecting the profiles.

We are working on the compression of our archive.

We are still collecting URLs for the Olympics event crawl (including the Paralympics). We nominate all collected URLs for the IIPC collection.




  • We have finally launched our online search interface and would be interested in your feedback. The websites are still not accessible, but it is possible to search for versions either by URL or in our (partial) full text. We built a bookmarking feature that allows users to save versions online and recall them at the library web archive terminals.
  • At the moment we have ongoing selective crawls, and an event crawl about the presidential elections is still running.