There have been several changes in the team over the summer. Pascal Tanésie has arrived as assistant head of the digital legal deposit team, and Vladimir Tybin has joined the team as digital curator. Sophie Derrot has left the BnF to take up a post at the Institut national d'histoire de l'art.
Our second test broad crawl, using the complete seed list, is nearly finished. The amount of data crawled in this test has turned out to be higher than our budget estimates, mainly because there is no deduplication for this first broad crawl with H3 (Heritrix 3). We will analyze the figures in detail and adjust the budget accordingly.
We are also using our new infrastructure for the tests: the crawlers are more powerful and faster, but they use more bandwidth. We will therefore need to reduce the number of crawlers from 40 to 35. We had set the duration of each job to 3 days, but this has proved to be too long; for the real crawl it will be between 2 and 2.5 days.
This week we aim to transfer all our crawls onto the new infrastructure, and the real broad crawl will start the following week.
Our IT team has been working on the implementation of NAS 5 and has installed a complete NAS 5.3 preproduction environment (connected to CWeb). They ran several tests and confirmed that some problems we had experienced with NAS 4 (especially related to security certificates) have been solved in NAS 5.
We have also been concentrating on curating our Catalan Politics collection. It was set up as a thematic collection, but it is in fact a mixture of a thematic and an event collection. We decided to keep it as a thematic collection, as before, while adding new seeds, launching it more frequently and tuning some configuration settings.
At the beginning of July we finally opened access to our web archive. The online interface only shows which captures we hold for each site; the archived content itself can only be viewed on our premises and at the regional libraries with legal deposit responsibilities. Some of these libraries have also opened this access to their users. Because of restrictions in our copyright law, downloading or copying any part of the web archive is not permitted.
We are also preparing our annual workshop with regional web curators, scheduled for November 20th, to review the current state of our collaborative project on web archiving and non-print legal deposit.