Location: National Library of Estonia (Eesti Rahvusraamatukogu), Tõnismägi 2, Tallinn (http://www.nlib.ee); meeting point: main entrance

Participants:

BnF
  • Technical: Sara Aubry
  • Curator: Clément Oury, Annick Le Follic, Géraldine Camile

ONB
  • Technical: Andreas Predikaka (participating through Skype)
  • Curator: Michaela Mayr (participating through Skype)

DK
  • Technical: Mikis Seth Sørensen, Søren Vejrup Carlsen, Tue Hejlskov Larsen, Per Møldrup-Dalum
  • Curator: Ditte Laursen, Sabine Schostag, Ulrich Karstoft Have

Estonia
  • Technical: Meelis Mihhailov, Rando Rostok
  • Curator: Jaanus Kõuts, Tiiu Daniel, Elis Karpov
  • Liina Abner (ebooks/newspapers discussion)

Spain
  • Technical: Juan Carlos García Arratia
  • Curator: Mar Pérez Morillo

The day before the workshop, six NAS participants will give talks at the International Seminar on Web Archiving in Tallinn; see 2015-01-28 International seminar on web archiving in Estonia for details.

Topics to be discussed:

Heritrix 3 - technical / Heritrix 3 - curatorial / NetarchiveSuite
  • State of the art of developments, scope of 1st release
  • WARC usage in Archive-it WARC-files compared to NAS (Tue)

  • Challenges
  • Upcoming developments: what, who, when
  • Feedback on testing
  • Motivations, practical examples for switching
  • Missing features
  • Priorities for future developments
  • Broad crawls: improve quality, reduce storage
  • Statistics based on ISO metrics: who is doing what? Future developments
  • Ebooks/newspapers: FTP harvesting
  • Any need from Estonia and Spain?
 

Agenda

Schedule for Day 1 (Thursday 29)

Location: Cupola Hall

09:00 - 09:30 Welcome and coffee

09:30 - 10:00 Workshop introduction (Sara)

Summary of the agenda, including any last minute additions.

10:00 - 11:15 Institution updates (one person each from ONB, Estonia, BNE, BnF, KB/SB)

Each institution presents its main work topics and developments for 2014/2015.

11:15 - 11:30 Coffee break

11:30 - 13:00 Statistics on web archives using ISO metrics (Annick, ?SB/KB)

BnF presents its current statistics, tool and workflow; KB/SB present their thoughts and decisions; followed by exchanges and possible actions.

13:00 - 14:00 Lunch

14:00 - 14:10 Introducing Heritrix 3 in NetarchiveSuite: NAS 5.0 status and plans (Mikis)

14:10 - 14:30 Quick demonstration of Heritrix 3 (Søren)

14:30 - 14:45 Introducing Heritrix 3 in practice: BnF approach (Sara)

14:45 - 17:00 Heritrix 3 WARC/coders tracks: WARC usage in NAS compared to Archive-it (1h), NAS 5.0 code redesign/collaborative development possibilities (1h) (Mikis), location: seminar room

14:30 - 17:00 Heritrix 3 curator track: monitoring and QA crawls with Heritrix 3, identification of missing features (Annick/Géraldine)

Important: if possible, all participants should prepare for this discussion by running some preliminary tests on H3 as a standalone application.

15:30 - 15:45 Coffee break

17:00 - 17:30 Tour of web archiving activities in Estonia (Jaanus)

19:00 - Dinner

Schedule for Day 2 (Friday 30)

Location: Cupola Hall

09:00 - 09:30 Harvesting complex websites: experiments with Archive-it 4.9/5.0 using 3.3.0 with Umbra (Tue)

Experience With IA Umbra (Colin)

09:30 - 11:15 Broad crawls: improve quality with limited budget (Per)

Digging in the data mines of the Net Archive

Per presents a study he has just run on the DK collections; all participants present their current practices and questions.

Details on the file identification experiment using Nanite: A Weekend With Nanite

Details on the "can we trust the MIME type as it was reported by the web server" experiment: http://rpubs.com/perdalum/de-dup1

The easiest way to get started with R: RStudio

My fork of JWAT-tools for easy extraction of crawl.log files: https://bitbucket.org/perdalum/jwat-tools/branch/netarkivet

11:15 - 11:30 Coffee break

11:30 - 13:00 H3 tracks sum-up, review of NAS curator roadmap, community next steps

13:00 - 14:00 Lunch

14:00 - 15:30 Ebooks/newspapers: deposit or FTP harvesting (Tue, Liina, Géraldine)

15:30 - 15:45 Coffee break

15:45 - 17:00 Open space for an additional topic, individual discussions or free time

 
