
Agenda for the joint KB, BNF, ONB and BNE NetarchiveSuite tele-conference 2018-05-15, 13:00-14:00.

Participants

  • BNF: Sara, Géraldine
  • ONB: Andreas
  • KB/DK - Copenhagen: Tue, Nicholas, Soren
  • KB/DK - Aarhus: Colin, Sabine
  • BNE: Mar
  • KB/Sweden: -

Feedback and update on NAS 5.4

NAS 5.4 is available for download here but we are awaiting completion of the acceptance test before making a formal announcement.

We have found a bug (a memory leak) in NAS 5.4, tracked as NAS-2751, which affects the new functionality to manage the number of jobs on the queue. The feature is disabled by default, but we are working on a quick patch release, so there will be a 5.4.1 within days.

Status of the production sites

Netarkivet

  • We finalized the compression of the archive. The size is now 477 TB; before we started the compression process (2017-09-17), it was 793 TB.
  • We upgraded our test environment to NAS 5.4. (We have in fact upgraded production today as well, and will patch production when the fix for NAS-2751 is available.)
  • We are currently upgrading the Blacklight search frontend to the newest version, which supports the new SOLR index.
  • We have just started a campaign to collect Danish podcasts and Danish YouTubers. The big challenge is to identify them. We sent an email to all colleagues in our institution asking them to help us. Do any of you have experience with identifying podcasts and YouTubers?
  • The collective negotiations on pay for public employees reached a result. Our event crawl will finish when all union members have voted on the agreement.
  • The new full-time functional lead (Strategic Coordinator) for Netarchive, Anders Klindt Myrvoll, will start at KB in Copenhagen on 28th May and will join us at forthcoming meetings.

BnF

At the beginning of May we had a meeting to prepare the programme of the 2018 broad crawl. We decided to put NetarchiveSuite 5.4 into production without any additional development except the management of TLDs. We'll contact the same registrars as last year to collect a similar number of seeds, but we'll try to be more attentive to the scope of the harvest: we ran into problems with new TLDs like .museum, which contained a lot of foreign web sites. We'll also review all our storage space and its management. The launch of the broad crawl is scheduled for October.

In parallel, we have updated the system that checks the validity of URLs in BCWeb, so that version 5.3 can recognize all types of HTTPS URLs. We aim to use this release before the summer.

ONB

  • The problem with "duplicate jobs submitted" seems to be solved. Changing the scheduling frequency from 1 to 5 minutes has worked well for the past 4 weeks.
  • We finished our crawl of the 4th local election this year; no further elections are planned for this year.
  • Our crawl of twoday.net, an Austrian blogging platform which will shut down at the end of this month, is still running.
  • We also have some changes in the infrastructure: our IT department is changing some network-attached storage devices, which means a lot of syncing and copying.

BNE

          

KB-Sweden


Next meetings

  • June 12th
  • July 17th
  • September 4th

Any other business?
