Agenda for the joint BNF, ONB, SB, KB and BNE NetarchiveSuite tele-conference 23-02-2016, 13:00-14:00.
- BNF: Sara, Annick, Lam
- ONB: Michaela, Andreas
- KB: Tue, Søren, Nicholas, Stephen, Jonas
- SB: Colin, Sabine, Ditte
- BNE: Mar, Soledad
Update about NAS 5.0 development and implemented features (Colin)
Update about NAS 5.0 configuration and testing (Tue)
Information about our participation in the IIPC GA (all)
- Sabine is going and will make a presentation (abstract)
- Ditte is going and will present
Status of the production sites
- Our first broad crawl is proceeding as planned.
- After 9 months we succeeded in getting through the paywall of one of the biggest Danish newspaper sites, politiken.dk (IP validation for the harvester).
- Facebook.com seems to have blocked our harvester; at the moment we are not capturing anything from FB.
- We had a fruitful meeting with our advisory board: the members gave us feedback on what they thought was important cultural heritage on the Internet.
- We have established a Jira backlog for handling problems with the selective crawls.
Here is an update on part of our regular activity, the ongoing crawls. The seeds are chosen by librarians in the different departments of the BnF, and also by partner libraries in Strasbourg and in Montpellier. They cover websites we absolutely must have in the main fields of knowledge, and each department or library draws up its collection policy for these crawls as part of its overall collection policy and within the legal deposit framework of web archiving.
In 2015, these crawls contained 14,000 seed URLs, which were harvested with a specific frequency (weekly, monthly, twice a year or annually) and depth (domain, host, path, page+2). In total, in 2015 we collected 756 million URLs, representing 38 TB.
We will maintain these ongoing crawls in 2016, alongside several project crawls (World War I, social movements, international publications, solidarity, official publications, Olympic games).
On this International Women's Day, we can tell you that we have a new partnership. The Centre of Archives on Feminism in Angers has joined us for the nomination of websites: in 2016, we will harvest 300 new sites dedicated to all aspects of feminism in France. This selection is included in our "Social Movements" crawl which will be held in May.
- We finished our 2015 broad crawl.
- Our archive currently holds 40 TB (physical, compressed). For 2016 we have a storage budget of only 1 TB.
- In April presidential elections will take place. Later this year we’ll have a project with ONB’s Women’s Documentation department. Media and politics collections are ongoing.
- We are still working on our search interface, which has a partial full-text search function.
Any other business?