Last week, a Datasprint organized by the BnF and Sciences Po Paris took place over five days at the BnF DataLab, within the framework of the ResPaDon project. The event brought together researchers, engineers and library professionals to work on four topics concerning web archives.
As we did last December for the AI harvest, and to celebrate 20 years of French election harvests, we have opened a participation form that will remain available until the end of June. The public is invited to take part in the selection of French websites concerning the 2022 presidential and legislative elections. Suggestions can be made at this address: https://www.bnf.fr/fr/collecte-du-web-electoral-pour-les-elections-presidentielle-et-legislatives-de-2022
For the third year, the Environmental Issues and Artificial Intelligence harvests have been prepared. The crawls were launched at the end of March. More than 714 and 639 websites were selected, respectively.
Finally, we are about to launch our annual harvest. Nearly 8,900 selections will be collected using 3 budget parameters (from 50,000 to 150,000 URLs per domain) and 19 jobs.
We continue testing SolrWayback. Recently, we indexed a closed collection about the terrorist group ETA. This collection has 943 seeds and approximately 2 TB of data. We want to test several aspects of the search response within a specific collection: the relevance and accuracy of the full-text search, and the way the SolrWayback tools represent information graphically (word cloud, link graph, domain stats or Ngram Netarchive). Our goal is to improve the researcher's experience by offering a better way of finding information and new ways to export and work with it. These tests will be very important in deciding how to present our collections through SolrWayback.
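As an illustration of how the relevance of full-text search results might be measured, here is a minimal sketch of precision at k, a common retrieval metric. It is not part of SolrWayback and does not describe our actual test procedure. It assumes that someone has manually judged which documents are relevant for a query, and the function name and document identifiers are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> float:
    """Return the fraction of the top-k ranked results that were judged relevant.

    ranked_ids   -- document identifiers in the order the search returned them
    relevant_ids -- identifiers a human assessor marked as relevant (assumed input)
    k            -- number of top results to inspect
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_ids[:k]
    # Count how many of the inspected results appear in the relevant set.
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Standard precision@k divides by k, even if fewer than k results came back.
    return hits / k


# Hypothetical example: four results returned, two of them judged relevant.
ranked = ["doc-a", "doc-b", "doc-c", "doc-d"]
relevant = {"doc-a", "doc-c", "doc-x"}
score = precision_at_k(ranked, relevant, 4)
```

Comparing this score across several queries, or across different search configurations, would give a rough and repeatable indication of how well the top results match what a researcher is looking for.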