Agenda for the joint KB, BNF, ONB and BNE NetarchiveSuite tele-conference 2017-06-06, 13:00-14:00.
- BNF: Sara, Géraldine
- ONB: Andreas, Michaela
- KB/DK - Copenhagen: Stephen, Tue, Nicholas
- KB/DK - Aarhus: Sabine, Colin
- BNE: -
- KB/Sweden: -
Upcoming NAS releases
5.3.1 bug fix release: https://sbforge.org/jira/projects/NAS/versions/12945
Status of the production sites
This month we are focusing on becoming familiar with BCWeb and on adapting it to our needs. Local and regional elections will be held at the end of the year, and we would very much like to have a “Netarchive-BCweb” in place by then, because we want to involve researchers and experts (for instance, in the use of social media) in helping us to find URLs.
One crucial issue is that we need bulk upload of URLs to be implemented.
- On the production side, we are still running our 2017 election harvest to cover our parliamentary elections.
- We'll start preparing our 2017 broad crawl in mid-June.
This month we are starting work again on full-text indexing, having developed a prototype and an experimental interface "Archives de l'internet Labs" in 2015 and 2016. At that time we used Solr along with tools from Netsearch and Web Archive Discovery to index our oldest archives, from the period 1996-2000.
The main objective for this year is to index our daily news crawl from its start in late 2010 until the end of 2016. As part of this work we will be aiming to improve our indexing process, in particular the algorithms applied to the text, to improve the relevance of results for users. We will also be working on the interface to make it more modular, both to make it easier to include new collections in the future and to enable us to use the search function in our main access interface. Other functions (saved searches, corpus creation) will remain in the experimental Labs interface for the moment.
In relation to these developments, we hope to work with a team of researchers in linguistics who are studying the creation and use of neologisms in French. If the project goes ahead, they will use the news crawl in their work, and will bring their expertise to improving the text processing algorithms in our indexing process.
Any other business?