Status of the production sites
Our 3rd broad crawl for 2018 finished on January 5. We harvested about 45 TB and more than 650 million documents in 344 harvest jobs. We noticed that our crawls run very slowly because of the throttling rules in the firewall, which we set up to reduce disturbances for website owners.
Web hosting in Denmark is becoming more and more centralized. We have therefore made throttling agreements with the web hosts.
We will rethink our strategy for the broad crawls: we will most likely exclude more big sites (e.g. municipalities’ websites) from the “regular” broad crawls and crawl them selectively at the same frequency as the broad crawls.
We had problems with Wayback access: for ten days in December it accessed the live web instead of the archive. This happened in connection with an upgrade of our Citrix installation, and we had to roll back to the former version. Furthermore, Wayback (Blacklight) still does not perform very well: lots of images are missing and lots of sites using the https protocol are not displayed.
We are working on adjusting our procedures, documentation etc. to comply with the EU General Data Protection Regulation (GDPR).
A happy new year and best wishes to all for 2019 from the BnF web archiving team!
Over the past 4 months, a team of 2 developers, 4 project members and a user group of 4 curators has been working on developing new front-office and back-office functionalities for BCweb. Version 6.1 of BCweb is soon to be released (in January). One of the key features of this version will be a new functionality to check existing records, enabling the curators to easily monitor and fix invalid and redirected URLs. Administrators will also have a back-office import functionality to bulk-create records. More is to come in the next version, which will include an improved graphic design. A release note for the NAS community is being prepared. The code will be made available to the BCweb/NAS community (uploaded to the Git repository). The next three iterations of this scrum-based development project will focus on making BCweb data available for use by other BnF applications, using web services.