First of all, this month we are launching an internal project to improve several of our harvests. The project will run until July. It includes several parts:
At the end of January, Wayback version 8.10.0 was released. This new version includes the publication of our new virtual guided tour on artificial intelligence.
A new video crawl has been running since January 26th. We are harvesting 13 YouTube channels, with an estimated size of 4.8 TB.
The .eus domain has been harvested for the first time: a broad crawl of the regional domain of the Basque Country, covering over 13,000 domains and 730 GB. This is a milestone for us because, for the first time, we have managed to save all the Spanish domains: .es, .gal, .cat and .eus.
We continue to have problems harvesting Twitter in general. We have tested lowering the number of objects to 5,000, but when we try to save many accounts in the same harvest, pictures are saved only for the first accounts; for the rest, only the text is captured. We have not found a solution to this problem.
Since mid-January, we have detected a new error when harvesting Twitter. When we try to save a hashtag, trending topic or search, we get a 404 error, even though they exist on the web. We think that Twitter has changed one of its security policies.