Agenda for the joint NetarchiveSuite tele-conference 2019-11-05, 13:00-14:00.
- BNF: Clara, Sara
- ONB: Andreas
- KB/DK - Copenhagen: Tue, Stephen, Anders, Kristian
- KB/DK - Aarhus: Colin, Sabine, Knud Åge
- BNE: Alicia, María, Manuel, José
- KB/Sweden: Par, Thomas, Peter
Join from PC, Mac, Linux, iOS or Android:
Or an H.323/SIP room system:
Meeting ID: 104 443 571
Or Skype for Business (Lync):
Denmark: +45 89 88 37 88 or +45 32 71 31 57
United Kingdom: +44 203 051 2874 or +44 203 481 5237 or +44 203 966 3809 or +44 131 460 1196
Finland: +358 9 4245 1488 or +358 3 4109 2129
Sweden: +46 850 539 728 or +46 8 4468 2488
Norway: +47 7349 4877 or +47 2396 0588
US: +1 669 900 6833 or +1 646 558 8656
Meeting ID: 104 443 571
International numbers available: https://zoom.us/u/acRu0MV3xJ
You can join the meeting using the app on a PC, a tablet or a smartphone, or you can use the browser-based version (it works with newer versions of Chrome or Firefox).
Update on NAS latest tests and developments
Feedback on usage / tests on NetarchiveSuite 5.6 release: see NetarchiveSuite 5.6 Release Notes
Feedback on tests of the BnF test NAS 6.0 + IIPC H3 release: see presentation
Status of the production sites
Because of a political decision to change the terms and conditions, a broadcast station (radio 24/syv) decided to close down on 31 October. The announcement that this popular station would close raised a storm of reactions on social media. People asked whether KB DK was going to keep the station's archive. We tried to capture as many podcasts as possible with Umbra. For QA of the harvested content, we had to wait for an index to be generated (there was a long queue in the index generator), and it was not ready until 4 November.
Since Yahoo Groups is also closing down, we crawled Danish Yahoo Groups over the last couple of days.
The fourth broad crawl for 2019 is in preparation: there will be a step 1 with a domain limit of 50 MB and a step 2 with a domain limit of 16 GB. Together with this broad crawl, we will run the following selective crawls: Research databases, Municipalities and regions, Ministries and Government Agencies, and YouTube.
We will crawl with NAS 5.5 and expect step 1 to last about 2 weeks, step 2 about 6-8 weeks.
Other projects keeping us busy:
• Work on risk assessment
• Implementation of SolrWayback
• Consolidation of BCWeb (build up a community)
• Revision of collection strategies
• Capture of content behind paywalls – the never-ending story
- We finished our 7th domain crawl, which was again done in a single stage. We crawled 150 million objects totalling almost 7 TB (about 3 TB on disk).
- We upgraded to NAS 5.6 and now use it in production.
- We moved the database to a more powerful server, and since then the duplicate job generation error has disappeared. Currently we are running only selective crawls, so we will see how it performs during a domain crawl.
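For a rough sense of scale, the domain crawl figures above can be turned into an average object size and a logical-to-disk ratio. This is only a back-of-the-envelope sketch: it assumes decimal terabytes and treats "almost 7 TB" as 7 TB. The notes do not say why the on-disk size is smaller, and the function names are illustrative.

```python
# Figures quoted in the status report above.
# Assumption: decimal terabytes (1 TB = 10**12 bytes).
TB = 10**12

objects = 150_000_000      # "150 million objects"
logical_bytes = 7 * TB     # "almost 7 TB", taken as an upper bound
on_disk_bytes = 3 * TB     # "about 3 TB on disk"


def average_object_size(total_bytes: int, count: int) -> float:
    """Mean size of one harvested object, in bytes."""
    return total_bytes / count


def logical_to_disk_ratio(logical: int, stored: int) -> float:
    """How many logical bytes each stored byte represents."""
    return logical / stored


avg_kb = average_object_size(logical_bytes, objects) / 1000
ratio = logical_to_disk_ratio(logical_bytes, on_disk_bytes)

print(f"average object size: about {avg_kb:.0f} kB")
print(f"logical-to-disk ratio: about {ratio:.1f}:1")
```

With these rounded inputs the result is roughly 47 kB per object and a ratio of about 2.3:1; the true values depend on the exact totals, which are not given here.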
We have finished our annual broad crawl:
- It covered about 2 million websites
- We configured it with a limit of 150 MB per domain
- It lasted 29 days
- We crawled 88% of the domains completely
We have some problems with the index and have not been able to perform QA on some of our content since February.
We are going to install version 6.1 of BCWeb in the coming weeks. After that, we will send you our changes to BCWeb so that you can review them and consider whether to include them in future versions.
- December 3
- January 7, 2020
Any other business?