
Agenda for the joint NetarchiveSuite tele-conference 2020-12-08, 13:00-14:00.

Participants

  • BNF: Sara
  • ONB: -
  • KB/DK - Copenhagen: Tue, Stephen, Anders, Alexandre
  • KB/DK - Aarhus: Kristian, Colin
  • BNE: -
  • KB/Sweden: Pär, Peter

Join from PC, Mac, Linux, iOS or Android:

    https://kbdk.zoom.us/j/104443571

Or an H.323/SIP room system:

    H.323: 109.105.112.236
    Meeting ID: 104 443 571

    SIP: 104443571@109.105.112.236

Or Skype for Business (Lync):

    https://kbdk.zoom.us/skype/104443571

Or Telephone:

Denmark: +45 89 88 37 88 or +45 32 71 31 57
United Kingdom: +44 203 051 2874 or +44 203 481 5237 or +44 203 966 3809 or +44 131 460 1196
Finland: +358 9 4245 1488 or +358 3 4109 2129
Sweden: +46 850 539 728 or +46 8 4468 2488
Norway: +47 7349 4877 or +47 2396 0588
US: +1 669 900 6833 or +1 646 558 8656
    Meeting ID: 104 443 571

    International numbers available: https://zoom.us/u/acRu0MV3xJ

You can join the meeting using the apps on a PC, tablet, or smartphone, or use the browser-based version (which works with recent versions of Chrome and Firefox).


Update on the latest NAS tests and developments

Feedback on revisits

Status of the production sites

Netarkivet

Broad crawl
Still moving along slowly. We are investigating why.


Corona event harvest
The harvest has been set to daily again due to the second lockdown.

Personnel

Allan Christophersen <ALCH@kb.dk> has joined as a project employee and works on Netarkivet 20% of his time.


SolrWayback

https://github.com/netarchivesuite/solrwayback/releases/tag/4.0.5

https://github.com/netarchivesuite/solrwayback


http://webadmin.oszk.hu/solrwayback/ (Hungarian Archive)


BnF

Our annual broad crawl ended on 7 November. It lasted 32 days, executed 1,037 jobs, and crawled 2.455 billion URLs, for a total size of 117.59 TB (compressed).

The French newspaper Liberation contacted our team to inform us that their blog platform (https://www.liberation.fr/blogs,26) would be closing during December. The platform hosts more than 300 blogs. Last week we launched an emergency crawl to capture and preserve these blogs.

We are working on the full-text indexing (with Solr) of our COVID-19 crawl, performed between February and July 2020 and covering the first wave of the pandemic. The collection is about 15 TB (compressed). It will be put into production during December and will be available to readers through the Archives de l'internet Labs interface.

ONB


BNE


KB-Sweden


Next meetings

  • January 5, 2021

Any other business?


