Status of the production sites
First of all, we welcome Nola N'Diaye to our team as harvesting manager and assistant head of the digital legal deposit team. She succeeds Pascal Tanésie, who is retiring in December.
Last month, nas-preload version 9.1 and NetarchiveSuite version 7.4.1 were released. The new version of NAS includes several improvements that will be useful for monitoring crawls: display of the compressed size of the WARC files produced by each running job, distinction between queue types on the Progression and Queues page, and a fix for a bug affecting the use of a regex containing a backslash on Browse/Delete frontier, among others.
We are also going to launch a test broad crawl this week. Our production crawl will be launched in October.
The crawl stemming from the LIFRANUM project, which concerns French-language digital literature websites, ended last week. 1089 seeds (websites and blogs hosted on several platforms such as wordpress.com and over-blog.com) were harvested. We also crawled a few thousand web pages of contextual content separately, with a dedicated job. The selection step was carried out with Hyphe, a web corpus curation tool based on a web crawler.
Finally, the IIPC webinar "Web Archiving the War in Ukraine" took place last Wednesday. On this occasion, our colleagues Vladimir Tybin and Anaïs Crinière-Boizet presented, together with Kees Teszelszky, the "War in Ukraine" IIPC collaborative collection led by the BnF and the National Library of the Netherlands.