Agenda for the joint NetarchiveSuite tele-conference 2021-12-14, 13:00-14:00.
- BNF: Clara, Sara, Auriane
- ONB: Andreas
- KB/DK - Copenhagen: Anders, Thomas, Stephen, Tue
- KB/DK - Aarhus: Colin
- BNE: José, Alicia, Miguel
- KB/Sweden: Peter, Jonas
Update on NAS latest tests and developments
Status of the production sites
- NAS and Bitmagasinet. We are having considerable difficulty getting everything working properly, but we are working hard to iron out the issues. It seems to be mainly Hadoop issues.
- In the future, how do we test NAS releases as a community?
- We are planning to test RHEL 8, but we need a stable test system first so that we can see the implications of upgrading. Any feedback from others who have upgraded would be appreciated.
- We have a contact at YouTube/Google, and they have asked us for IP ranges, user agent and motivations, so there is hope of harvesting YouTube in the future, like BnF.
- We are waiting for the IIPC Steering Committee to decide whether or not to proceed with "User-Friendly High Fidelity Browser-Based Crawling for All" (IIPC Tools Portfolio Funding Proposal 2022-2023).
- The broad harvest is postponed until January, or until we have a working system.
- We still do selective and special harvests.
First of all, we are pleased to welcome to our team Kevin Locoh-Donou, data engineer for the LIFRANUM research project, for a period of 9 months.
On the occasion of "Fantastic Futures", the 3rd international conference on Artificial Intelligence (AI) for libraries, archives and museums, which took place at the BnF on December 10th 2021, the digital legal deposit service is highlighting its AI websites collection:
Finally, our 2021 broad crawl ended on November 14th and lasted a little less than 5 weeks. 2.5 billion URLs were crawled for a total of 114 TB.
- Working in strategy working groups on our vision 2035, in which web archiving should play a bigger role.
- For reasons, we started the Corona-Crawl again.
- Updated the frontend https://webarchiv.onb.ac.at from Bootstrap 3 (jQuery 1.x) to Bootstrap 4 (jQuery 3.6).
- We ingested the first 20 TB (20%) into the long-term preservation system.
- Ending the broad crawl of the .gal domain (the regional domain of Galicia): 2,500 domains and 315 GB of information.
- We are studying the creation of a new comics collection to harvest the webcomics and comics on the internet, and all this short-lived, freely accessible production.
- The update to NAS 7.2 is ready in pre-production. We are waiting for new, powerful hardware in January to carry it out.
Any other business?