We are working on what should be the final corrections before changing over to Heritrix 3 and NAS 5. Our tests have not identified any major problems, but we are continuing to analyse the results to prepare for any changes that might arise, such as an increase in the amount of data collected. Once we start crawling with H3, we will step up our usual monitoring so that we can deal with any unexpected changes.
We started our 2017 elections crawls.
- At the end of January we were finally able to finish our presidential elections crawl.
- We are currently preparing our 5th domain crawl.
- Finally, we redeployed our test environment. For that, we made some convenience changes to the DeployApplication (deploying with optional logo images). See our pull request: https://github.com/netarchivesuite/netarchivesuite/pull/38
Answers to Questions from KB
1) We are using the CDX format produced by WaybackCDXExtractionARCBatchJob
2) We are currently using OpenWayback 2.3.1
3) We crawled some Facebook pages using https://webrecorder.io