Status of the production sites



We are working on upgrading our BCWeb installation to 7.2.0.



At the end of June, we put into production the new version of NAS (6.0.0) together with the official IIPC version of Heritrix (3.4.0-20200518). With this upgrade, we intend to improve the quality and completeness of our crawls. The new version of Heritrix includes contributions from the developers of BnF's IT team: handling of the "data" attribute in image tags, and harvesting of files hosted on SFTP servers, not only on FTP servers. With the new JavaScript extractor and the inclusion of "data" attributes, we expect a significant improvement in the harvesting of images, especially on responsive websites. In addition, the new version of Heritrix allows queues to be processed in parallel, so we expect faster and more complete harvesting of social network accounts, particularly Twitter. In the coming weeks, we plan to compare jobs run with the previous and the new version of Heritrix to assess whether these improvements materialize.

The second round of the local elections was held on 28 June. Since the beginning of June, our elections crawl has resumed its initial schedule: social networks are crawled twice a day and other websites twice a month. The crawl will continue until mid-July to cover the setup of the new city councils and the investiture of the mayors.