Agenda for the joint NetarchiveSuite tele-conference 2021-04-06, 13:00-14:00.
- BNF: Auriane, Clara
- ONB: Andreas
- KB/DK - Copenhagen: Tue, Stephen, Anders
- KB/DK - Aarhus: Colin
- BNE: José, Alicia
- KB/Sweden: Pär, Peter
Update on NAS latest tests and developments
NetarchiveSuite 7.0 has been released: NetarchiveSuite 7.x Release Notes
For the rest of the spring, the Core Development Team (i.e. Colin and Rasmus) will be concentrating on support tasks connected with the migration to and deployment of NetarchiveSuite 7.0, so there will be very limited resources for development work on the NetarchiveSuite codebase.
Status of the production sites
We are pleased to announce that, last month, we published the seed lists of our selective crawls on the new version of the BnF website dedicated to APIs and datasets. These lists are created from BCWeb exports and include some crawl settings and descriptive elements such as themes and keywords.
In 2020, three new crawls were launched and added to the website: Instagram, Artificial Intelligence and Environmental Issues.
You can consult all these lists at this address: https://api.bnf.fr/fr/liste-des-adresses-url-des-collectes-ciblees-du-web-francais-par-la-bnf
Another page, focused on Covid-19 selections, can be consulted at this address: https://api.bnf.fr/fr/node/176
For the second consecutive year, we launched an Instagram crawl. We plan to run five Instagram crawls, some of which will cover specific subjects such as the Olympic Games or the regional and departmental elections in France.
Just like last year, we had to crawl picuki.com because, in spite of many tests, we always end up being blocked by Instagram.
Finally, our in-house harvesting workshop about Flash is coming to an end. It was difficult to find a way to automatically harvest some of the websites with Flash animations, because some URLs are dynamically generated or relative and are therefore inaccessible to Heritrix. So we will try to discover all the URLs manually and launch the harvest in a second step.
Even when a crawl succeeds, we may still run into compatibility issues between the Flash plugin and the Wayback.
- New contribution to coronavirus international crawl with a selection of 200 seeds
- Last month we had two meetings with our regional web curators from different parts of Spain. We worked on the selection of seeds.
- This month we are working with our collaborators on a new event collection for the regional election in Madrid
- We continue to work on the regional election in Catalonia
- May 4th
- June 8th
- July 6th
- September 7th
- October 5th
- November 2nd
- December 14th
- January 11th, 2022
Any other business?