Start of our 2017 broad crawl on October 10th (4.4 million domains, 3,500 URLs per domain, 40 crawlers each running 10 threads during the day and 30 threads at night because of bandwidth constraints). We expect to harvest 80 TB of data.
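As a rough sanity check on these figures, the sketch below works out the envelope they imply: the maximum number of URLs, the total thread counts by day and by night, and the average size per URL if every domain used its full budget. The constant and function names are hypothetical, "TB" is assumed to mean 10^12 bytes, and in practice most domains will probably stay well under the 3,500 URL cap, so the per-URL figure is only a lower bound on the expected average.

```python
# Figures quoted in the update above; all names here are illustrative.
DOMAINS = 4_400_000
URL_BUDGET_PER_DOMAIN = 3_500
CRAWLERS = 40
THREADS_DAY = 10
THREADS_NIGHT = 30
EXPECTED_BYTES = 80 * 10**12  # assumption: 1 TB = 10^12 bytes


def crawl_envelope():
    """Return the upper bounds implied by the crawl settings."""
    max_urls = DOMAINS * URL_BUDGET_PER_DOMAIN
    return {
        "max_urls": max_urls,
        "threads_day": CRAWLERS * THREADS_DAY,
        "threads_night": CRAWLERS * THREADS_NIGHT,
        # Average bytes per URL if every domain reached its budget,
        # which is an assumption and unlikely to hold in practice.
        "bytes_per_url_at_cap": EXPECTED_BYTES / max_urls,
    }


if __name__ == "__main__":
    for name, value in crawl_envelope().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the budget caps the crawl at 15.4 billion URLs, with 400 threads in total during the day and 1,200 at night.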

We have redesigned our Wayback and will give access to our full-text-indexed 1996-2000 collection through Shine in November.


  • The crawl of our presidential elections is still running.


  • We are currently compressing all metadata ARC files using the JWAT tools. We are trying to gain between 1 and 2 TB of extra disk space with this task.


  • Our General Elections crawl is still running. We hope to close it by the end of the week, provided the Prime Minister is voted in by Parliament. For this last week the crawl is running daily, in collaboration with the regional web curators, who have been adding to, reviewing and quality-assuring the collection of seeds.
  • Our main tasks at the moment relate to the working meeting we are preparing with the regional web curators, here at the Library, on November 7th.
    • We have scheduled a short workshop on BCWeb, which the web curators have so far been using in a preproduction environment. They are, however, already building their own web collections using CWeb.
    • We are also working on secure access to our web collections (and to the non-print legal deposit in general) through Remote Desktop, so that the Regional Libraries can remotely access the non-print legal deposit, including the web archive and deposited publications.
  • Elena no longer works with us. She has moved to the European Commission in Luxembourg.