Each year, the sections of the BnF legal deposit department give an overview of the documents they have received. L’Observatoire du dépôt légal : reflet de l’édition contemporaine is now available online (in French only). It gives analysis and raw data from 2015 on seed domains (more than 900,000 have appeared since the previous year and more than 500,000 have disappeared), on formats, on HTTP response codes, on the largest harvested domains…
Our annual crawl of auction houses has just finished. The scope of the collection is the same as in previous years, but last year the platform auction.fr, which represents about a third of the crawl, blocked access by our robots. The librarian in charge of the selection contacted the site owner, who was happy to let us crawl the site, and the quality seems much better this year. We also have to be careful, as the majority of the sites are hosted on two platforms (auction.fr and Drouot) and their catalogues and images are stored on a small number of hosts, so we have to increase the budget for these hosts to collect as much as possible.
This month we also have several project crawls on different themes.
Among these project crawls, the annual one dedicated to French Official Publications is continuing with a few new aims. Launched in mid-June, it contains a sample of the social web presence of the central administration, following the decision to add the social media accounts of ministers and public bodies. Unfortunately this does not include Facebook pages, because of the now well-known problem of captchas, but the goal is to reflect a type of official communication that was previously not well covered in our selections. The frequency of crawls of these channels for promoting official publications and administrative and political communication could be increased in the future. The traditional aim of collecting "classic" online publications remains relevant, with more than 800 seed URLs of traditional websites, each crawled with a budget of 100,000 URLs.
We are also maintaining our "Solidarities" crawl with the same scope as last year, though we have also included sites that were selected for an emergency crawl on the refugee crisis.