Agenda for the joint BNF, ONB, SB, KB and BNE NetarchiveSuite tele-conference 29-03-2016, 13:00-14:00.
- BNF: Sara, Lam
- ONB: Michaela, Andreas
- KB: Søren
- SB: Colin, Sabine, Ditte
- BNE: Mar, Soledad
Update about NAS 5.1 release testing
Information about NAS at the IIPC GA?
Status of the production sites
- The first broad crawl of 2016 finished on Feb. 29
- The event crawl on the refugee crisis is ongoing: It is a supplement to our selective news media and social media crawls.
- We are still blocked by Facebook.
- All curators met for 1 1/2 days in Aarhus. We prepared for the NAS H3 test and started a discussion on a new strategy for the broad crawls
- We made some comparative tests: NAS 5 H3 versus NAS 4 H1
- We are participating in an application for a research grant: “Real time analysis and visualization of news streams”. Netarchive will participate in the project by extracting files (Twitter) from the archive, on the condition that the project pays for this work.
- Our archive reached 26 billion URLs and 668 TB of (compressed) data at the end of 2015.
- We are currently running our bi-annual crawl.
- We started to draft our NAS 5 H3 migration project for the second part of 2016. Besides the profiles, configurations and seed lists, there will be important changes to the ingest module of our preservation repository.
- We are currently developing a "Labs" application within the framework of a research project named Corpus. This application will include wayback, a full text index, WAT metadata files, link maps and the ability for researchers to create corpora.
- Presidential elections will take place in Austria in approx. 4 weeks. We will prepare an event crawl on the topic.
- The work on our online user interface continues and will be finished soon.
- We are about to launch our first domain crawl with NAS, starting around next Monday 4th April.
- Web curators of most regional libraries have accessed the BCWeb pilot environment we offered them to test and build their own web collections centered on topics of regional and local interest, as part of the Legal deposit of online publications. It is not connected to NAS yet. We are very thankful to the BnF web archiving team for this.
- Our IT staff are working on the NAS development environment (so far we have had only one environment, used for both development and production), which will soon be ready for our regional web curators to access.
- As we don’t have a new government yet (after the December 2015 General Elections, which will probably be repeated by June), our General Elections selective crawl is not closed yet (ca. 3.5 TB).
Any other business?