Update on NAS latest tests and developments

NetarchiveSuite 7.0 has been released: NetarchiveSuite 7.0 Release Notes

For the rest of the spring, the Core Development Team (i.e. Colin + Rasmus) will be concentrating on support tasks connected with the migration to and deployment of NetarchiveSuite 7.0, so there will be very limited resources for development work on the NetarchiveSuite codebase.

Status of the production sites






We are pleased to announce that, last month, we published the seed lists of our selective crawls on the new version of the BnF website dedicated to APIs and datasets. These lists are created from BCWeb exports and include some crawl settings and descriptive elements such as themes and keywords.
In 2020, three new crawls were launched and added to the website: Instagram, Artificial Intelligence and Environmental Issues.
You can consult all these lists at this address:
Another page which is focused on Covid-19 selections can be consulted at this address:

For the second consecutive year, we launched an Instagram crawl. We plan to run five Instagram crawls, some of them on specific subjects such as the Olympic Games or the regional and departmental elections in France.
Just like last year, we had to crawl [...]. In spite of many tests, we always end up being blocked by Instagram.

Finally, our in-house harvesting workshop on Flash is coming to an end. It was difficult to find a way to automatically harvest some of the websites with Flash animations, because some URLs are dynamically generated or relative and are therefore inaccessible to Heritrix. We will try to discover all the URLs manually and then launch the harvest in a second step.
Even when a crawl succeeds, we may sometimes run into compatibility issues between the Flash plugin and the Wayback.