

At the beginning of May we held a meeting to prepare the programme for the 2018 broad crawl. We decided to put NetarchiveSuite 5.4 into production without any additional development except the management of TLDs. We'll contact the same registrars as last year to collect a similar number of seeds, but we'll try to pay closer attention to the scope of the harvest: we ran into problems with new TLDs such as .museum, which contained a lot of foreign web sites. We'll also review all our storage space and its management. The launch of the broad crawl is scheduled for October.

In parallel, we have updated the URL validity check in BCWeb so that version 5.3 can recognize all types of HTTPS URLs. We are aiming to use this release before the summer.


- The problem with "duplicate jobs submitted" seems to be solved. Changing the scheduling frequency from 1 to 5 minutes has worked well for the past 4 weeks.
- We finished our crawl for the 4th local election this year; no further elections are planned for this year.
- Our crawl of an Austrian blogging platform, which will shut down at the end of this month, is still running.
- We also have some changes in the infrastructure: our IT department is changing some network-attached storage systems, which means a lot of syncing and copying.