Agenda for the joint BNF, ONB, SB, KB and BNE NetarchiveSuite tele-conference 14-06-2016, 13:00-14:00.
- BNF: Sara, Annick, Lam
- ONB: Michaela, Andreas
- KB/DK: Søren, Tue, Jonas, Stephen, Nicholas
- SB: Colin, Sabine, Niels
- BNE: Mar
- KB/SE: Bengt ??, Stewart ??
Introducing new members
Niels Bønding from Netarkivet.
The Royal Library of Sweden is going to use NAS. Bengt Neiss and Stewart Rutledge are joining the teleconferences.
NAS 5.1 Update
Feedback from DK on running NAS 5 + Heritrix 3.
A property in H3 makes the crawler respect the crawl-delay given in robots.txt; its default value is 300 seconds.
To make the crawler ignore the robots.txt crawl-delay, set the value of this property to 0.
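The effect of this setting can be sketched in Python. This is an illustration only, not Heritrix code: the function and parameter names are hypothetical, and it assumes the property acts as an upper limit on the robots.txt crawl-delay (as Heritrix 3's `respectCrawlDelayUpToSeconds` setting is understood to do), with the crawler's own politeness delay still applying.

```python
from typing import Optional


def effective_delay_seconds(
    robots_crawl_delay: Optional[float],
    respect_up_to: float = 300,
    politeness_delay: float = 0,
) -> float:
    """Return the wait between two fetches from the same host.

    All names here are hypothetical. Assumption: the robots.txt
    crawl-delay is honoured only up to `respect_up_to` seconds, and the
    crawler never waits less than its own politeness delay.
    """
    if robots_crawl_delay is None:
        # The site's robots.txt has no crawl-delay directive.
        return politeness_delay
    # Cap the site's request at the configured limit; a limit of 0
    # therefore means the crawl-delay is ignored entirely.
    capped = min(robots_crawl_delay, respect_up_to)
    return max(politeness_delay, capped)


# A site asking for 600 s is only granted the 300 s default limit.
print(effective_delay_seconds(600))
# With the limit set to 0, only the crawler's own delay remains.
print(effective_delay_seconds(600, respect_up_to=0, politeness_delay=2))
```

Under these assumptions, a limit of 0 leaves only the crawler's own politeness delay in force, which matches the "add value 0" advice above.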
See the property marked with yellow:
Next NAS meeting: end of January 2017, 2.5 days, in Vienna.
Please complete Michaela's poll: http://doodle.com/poll/nk6dfc3kav4a4hs8
Status of the production sites
Netarkivet
We started the second broad crawl of 2016, with a limit of 100 MB per domain.
We stopped the refugee crisis crawl. We did a smaller event crawl for the “Eurovision Song Contest”, where we focused on the Danish participants' presence on Twitter and on thematic news sections. We are preparing for a crawl of the Olympics in Rio.
We started the implementation of our revised collection strategy. We have almost established the new selective crawls of national news sites.
One of the first social media platforms, arto.com, closed on 1 June. We had problems with our last complete crawl before the closing. With a specially developed module, in which the FetchDNS method is changed, we hope to be able to get all content directly from their server.
Potential collaboration project
The Parliamentary Library gives in-house access to historical (archived) versions of the political parties’ websites. They are not quite satisfied with their solution. Netarchive and the Parliamentary Library are looking at potential future cooperation on this subject.
Niels Bønding is project lead for curation now.
BNF
This month we're opening an experimental access interface, Archives de l'internet Labs. This interface provides full-text searching of a small part of our collections, with the possibility to export results and save searches and selections in a personal workspace. It also provides access to statistics and metadata on the collections.
This interface builds on the work we have done over the past year or so on data mining and full text indexing. It is part of a four-year project at the BnF studying the creation of a service to provide researchers with corpora from the digital collections of the BnF, the web archives having been chosen as the case study for the first year of the project. For the moment this interface will only be available to researchers working on two specific projects who have signed a convention with the BnF, but as part of the overall project we will be looking at how this kind of service can be offered to more researchers.
ONB
- Please complete the Doodle poll for the NAS meeting in Vienna by the end of July: http://doodle.com/poll/nk6dfc3kav4a4hs8
The number of participants is needed to calculate the budget. Thank you!
- At the end of May, presidential elections took place in Austria; the crawl continues until the new president is sworn in. As mentioned during one of the last calls, one of the political parties is blocking our crawlers. We captured the content with webcrawler.io, but have not yet had time to investigate how to include the WARC files in our archive.
- The new online search interface will be launched soon and we look forward to your feedback. We are currently waiting for a security check by our IT department to be completed.
Any other business?