...

Location: National Library of Spain (entrance on the ground floor)

Address: Paseo de Recoletos, 20-22 - 28071-Madrid

...

Organization

Technical

Curator 

Netarkivet

Colin Samuel Rosenthal

Knud Aage Hansen 

Stephen Hunt

Tue Hejlskov Larsen

Kristian Bak

Anders Klindt Myrvoll

Sabine Schostag

Stephen Hunt

ONB (via Skype)

Andreas P

Michaela

BnF

Sara Aubry

Clara Wiatrowski

Géraldine Camile

BNE

Juan Carlos García

José María Martín

Fernando Monzón

Luis Sánchez

Nuria Serrano

Alicia Pastrana

María Bueno

María Ezquerra

Mar Pérez

NL of Sweden

Thomas Roos

Pär Nilsson

Peter Svanberg

...

Technical / Curatorial
  • State of the art of current bugs and possible fixes
  • State of the art of current and upcoming developments in NetarchiveSuite
  • Integration of latest H3 stable release
  • Videos, social media, Umbra (installation, configuration tests and usage)
  • Introducing WARC 1.1
  • Brainstorming on priorities for future developments
  • Brief state of the art on access tools: use and perspectives in the different institutions
  • SolrWayback demo
  • What does your NAS deployment/configuration look like (settings, hardware)?
  • Oracle Java is subject to a fee. ONB's IT department would like to switch completely to OpenJava if possible. Does NAS work as usual on OpenJava? What do you think, and what will you do?

  • NAS missing features
  • Brainstorming on priorities for future developments
  • Scheduling harvests at a precise date or period
  • Presentation of the latest BCweb release and future evolutions
  • Presentation of how we collect and crawl YouTube and give access
  • Coordination of external selections
  • What documentation shall we provide for researchers?
  • How do we make workspaces for researchers - tools, limits?
  • Capturing social media
  • Webarchives and digital preservation
  • Which browsers and versions do we support today and in the near future in harvester requests, in Umbra and in archive access? Umbra usage and experiences
  • OpenWayback and CDX creation issues and development, and experiences with other tools, e.g. pywb, SolrWayback?
  • Broad crawls
    • How do we monitor jobs during broad or big “deep” crawls?
    • How do we manage huge webhotels?
    • How do we manage byte/objects limits for different groups of domains?


...

16:15 - 17:30 NetarchiveSuite 5.5: demo and discussion of latest features including Umbra (Colin), Umbra usage and experiences, Feedback on tests (input from Clara, Tue, Andreas, ?). State of the art of current and upcoming developments (Colin)

20:00 or 21:00 30 Dinner (at own expense)

Schedule for 21.02.2019 (9:00-17:00)

09:00 - 12:3000

Technical track:

  • Share NAS deployment and configuration in our institutions to identify used/unused components (Colin to draft a "form" for common language, each institution to fill it in)
  • Troubleshoot management of broad crawls (input from Tue, Sara, others?):
    • How do we monitor jobs during broad or big “deep” crawls?
    • How do we manage huge webhotels?
    • How do we track web parkings?
    • How do we manage byte/objects limits for different groups of domains?

  • Discuss state of the art of current bugs and possible fixes
    • Current bugs identified during ONB Domain Crawl 2018 (Andreas)
    • Current bugs identified during BnF Domain Crawl 2018 (Sara, Clara)
  • Review lists of NAS bugs and missing features from the 2017 DK list, the BnF list and internal lists; label issues of interest "Madrid"
  • Review of issues labelled "Madrid":
      
    JIRA issue list (server SBForge): project = NAS AND labels = Madrid
  • Discuss state of the art of current and upcoming developments
  • Discuss possible integration of OpenJava, latest H3 stable release and WARC 1.1
  • ??? Review lists of NAS bugs and missing features from 2017 DK list - BnF list
  • Brainstorm on priorities and NAS codebase evolution for future developments from a technical perspective
  • Sum up
  • Discuss the possibility to submit an IIPC project

Curator track:

  • Troubleshoot management of broad crawls (Géraldine, ?)
  • Review and update the NAS curator roadmap (NASC) with comments from Vienna meeting (Sabine)
  • Brainstorm on priorities for future developments from a curatorial perspective
  • Discuss what type of documentation to provide for researchers (Michaela)
  • Discuss practices and challenges in coordinating external selections (Géraldine, Sabine, Mar?)
  • Sum up

10:30 - 10:45 Coffee break

12:30 - 14:00 Lunch14:00 - 1512:30 Sum up of curators and technical priorities. Discuss the possibility to submit an IIPC project.15

12:30 - 15:45 Coffee break15:45 14:00 Lunch

14:00 - 17:00 Access tools to webarchives:

...

Complex harvesting

Share experiences, practices and questions in the management of broad crawls (input from Tue, Sara, BNE, others?):

  • How do we monitor jobs during broad or big “deep” crawls?
  • How do we manage huge webhotels?
  • How do we track web parkings?
  • How do we manage byte/objects limits for different groups of domains?

  • How do we manage many harvesters running at the same time?

Share experiences and practices in crawling and giving access to YouTube videos (Sara, ?)

Share experiences in crawling social media (which media? who?)

Discuss possible further cooperation on these topics, common tools integration

15:30 - 15:45 Coffee break

17:00 - 19:00 ?? Guided tour of the BNE

Schedule for 22.02.2019 (9:00-14:30)

09:00 - 10:30 Update on BCweb (Géraldine, Clara)

  • Demo of BCweb new functionalities
  • Update on BnF current and upcoming developments
  • Update on open source status
  • Discuss interest in upgrading and possible community developments

10:30 - 10:45 Coffee break

10:45 - 12:30 Selecting, harvesting, accessing and preserving videos (Sara, ?)

  • Experiences and practices in the different institutions
  • Discuss common tool integration

Access tools to webarchives:

  • Discuss perspectives, projects and questions in the different institutions (input Sara, Géraldine, ?)
  • Browser, OpenWayback and CDX creation issues and development, experiences with other tools, e.g. pywb, SolrWayback
  • ??? Demo of SolrWayback

12:30 - 13:00 Community next steps

...