Status of the production sites
Is now running on our production system. Metadata will be compressed. We will not run deduplication until the whole archive has been compressed.
The analysis of how to crawl selected social media is ongoing. We are looking at Facebook, Twitter, YouTube, Vimeo, Instagram, SoundCloud, Reddit, Flickr, Vine, Pinterest and LinkedIn. We have already decided not to collect Snapchat, Google+ or Bandcamp.
We are going to test BCWeb to find out whether we can use it to get help from external curators.
We have upgraded our Citrix access software and solved problems with user categories.
Dialog with web hotels that block our harvester
We have started a dialog with the web hotels (web hosting providers) that are blocking our harvester, in order to find a solution that will make them lift the block.
Data amount (as of 5.3.2017)
Total GB/TB (base 1024) in the archive: 793544/774
Number of GB/TB Broad crawls and ultra-big sites: 634638/619 and 62346/60
Number of GB/TB Selective crawls: 97198/94
Number of GB/TB Event crawls: 35387/34
(Excluding metadata files and test crawl files)
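
The TB figures above are consistent with dividing each GB figure by 1024 and rounding down. A minimal sketch of that conversion, assuming this is how the figures were derived (the function and variable names are hypothetical and only for illustration):

```python
# Assumption: 1 TB = 1024 GB, and the reported TB values are rounded down.
GB_PER_TB = 1024


def gb_to_tb(gb: int) -> int:
    """Convert whole gigabytes to whole terabytes (base 1024), rounding down."""
    return gb // GB_PER_TB


# (GB, TB) pairs as reported in the status figures.
reported = {
    "Total": (793544, 774),
    "Broad crawls": (634638, 619),
    "Ultra-big sites": (62346, 60),
    "Selective crawls": (97198, 94),
    "Event crawls": (35387, 34),
}

for name, (gb, tb) in reported.items():
    status = "matches" if gb_to_tb(gb) == tb else "differs"
    print(f"{name}: {gb} GB -> {gb_to_tb(gb)} TB ({status} reported {tb} TB)")
```

This only checks the unit conversion of each line; it makes no assumption about how the individual categories add up to the total.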