Agenda for the joint KB, BNF, ONB and BNE NetarchiveSuite tele-conference 2017-07-04, 13:00-14:00.
We launched our second domain crawl for 2017, but we still have problems with job scheduling: jobs hang in the SUBMITTED queue, jobs fail, and zombie jobs are created outside the GUI. Before restarting, we have to empty the queue manually.
We have NFS performance problems with Wayback.
We are migrating our documentation from 2005-2014 from a MoinMoin wiki to Jira. It includes a lot of important documentation for the selective crawls. The project takes time because we have to migrate the content manually.
Our developers want to implement Shine in our free-text interface. In the next couple of weeks, the curators will (hopefully) be able to test Shine before it is implemented.
We have started preparing our 2017 broad crawl. As we have a lot of work to do on our infrastructure (new server, new OS, new storage), we decided in the end not to start any development work in preparation for this crawl.
We are also working on full-text indexing and search: our target collection is 12 TB. We are currently redefining the structure of our search application to enable user profiles and internationalization.
We finished our second domain crawl with NAS on June 4th. Compared with our first domain crawl, which was launched last year and lasted three months, this second one finished in two months. The list of .es domains was 50,000 domains longer, and we were more ambitious this time, setting a limit of 150 MB per domain instead of last year's 100 MB. 655 million URLs have been archived this year, around 23 TB.
We are about to open access to our web collections, restricted to certain computers at the National Library and at the regional libraries with legal deposit competencies that have accepted the access conditions established by Spanish copyright legislation. We expect this access to be available in a couple of weeks.
Our engineers have installed a test environment for NAS 5 and have been running tests, with the aim of having a production environment ready to launch a crawl of the .gal domain (the regional domain of Galicia) in September.