We have started preparing our 2017 broad crawl. As we have a lot of work to do on our infrastructure (new server, new OS, new storage), we ultimately decided not to undertake any development work in preparation for this crawl.
We are also working on full-text indexing and search: our target collection is 12 TB. We are currently redefining the structure of our Search application to enable user profiles and internationalization.
We finished our second domain crawl with NAS on June 4th. Compared with our first domain crawl, which was launched last year and lasted three months, this second one finished in two months. The list of .es domains was 50,000 domains longer, and we were more ambitious this time, setting a limit of 150 MB per domain instead of last year's 100 MB. 655 million URLs have been archived this year, around 23 TB.
We are about to open access to our web collections, restricted to designated computers at the National Library and at the regional libraries responsible for legal deposit that have accepted the access conditions established by Spanish copyright legislation. We expect this access to be available in a couple of weeks.
Our engineers have installed a test environment for NAS 5 and have been running tests, with the aim of having a production environment ready to launch a crawl of the .gal domain (the regional domain of Galicia) in September.
- August 8th
- September 5th
- October 3rd
- November 7th
- December 5th
- January 9th, 2018