NAS5 / Heritrix 3 - technical
Heritrix 3 - curatorial
  • State of the art of current developments
  • Upcoming developments
  • Introduce a multiple crawlers approach into NAS

  • Videos/social media harvesting
  • What CDX format are you using today and plan to support within next year?

  • Which version of (Open)Wayback are you using today and what do you think about the future development of OpenWayback?

  • Performance of Wayback-Index. How to speed it up? Any experience with splitting up the index into several chunks or serving the index from multiple hosts?
  • Which social media can you archive today?

  • How to consolidate crawl.log and frontier search features in NetarchiveSuite?
  • BNF's freetext search (better than KB DK's) - anything to share with the community?
  • Automatic quality assurance. Any ideas? Proofs of concept?
  • Others ?
  • Feedback on using NAS 5 and Heritrix 3
  • Missing features
  • Priorities for future development
  • Is it possible to connect tools other than Heritrix to NAS (tools that can produce WARC files and capture content that Heritrix is not able to catch)? If so, which tools do we want to use?
  • Revival and update of the curator roadmap
  • Harvest the electoral web: selection, harvest parameters
  • Experiences with harvesting pages with login content (paywalls)
  • Experiences with harvesting images embedded in JavaScript (and replaying them in the archive)
  • Exchange of experiences with documentation of the crawls (in and outside NAS)
  • Others ?