Last week, we launched our "Auction house" crawl, which covers the websites of French auction houses. About 200 websites were selected. Last year, we were blacklisted by large auction sites, so we set up a specific harvest system for auction.fr, where many of these websites are hosted. Before starting the harvest, we added filters to all the other jobs in progress, and we created a special queue-management rule that groups the URLs of all hosts belonging to a website into a single queue. This avoids sending too many requests at the same time and lets us limit the harvest to 100,000 URLs per website.
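To illustrate the queue grouping described above, here is a minimal Python sketch. It is not our crawler's actual configuration: the `SiteQueues` class, the `host_to_site` mapping and the example hostnames are hypothetical, and only the 100,000 URL cap comes from the setup described above.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit

MAX_URLS_PER_WEBSITE = 100_000  # per-website cap described in the post


class SiteQueues:
    """Group URLs from all hosts of one website into a single queue."""

    def __init__(self, host_to_site, budget=MAX_URLS_PER_WEBSITE):
        # Hypothetical mapping from hostname to a website key,
        # e.g. built from the list of selected websites.
        self.host_to_site = host_to_site
        self.budget = budget
        self.queues = defaultdict(deque)
        self.accepted = defaultdict(int)  # URLs accepted so far, per website

    def site_key(self, url):
        host = (urlsplit(url).hostname or "").lower()
        # Hosts missing from the mapping fall back to a queue of their own.
        return self.host_to_site.get(host, host)

    def enqueue(self, url):
        """Add a URL to its website's queue unless the cap is reached."""
        key = self.site_key(url)
        if self.accepted[key] >= self.budget:
            return False
        self.queues[key].append(url)
        self.accepted[key] += 1
        return True

    def next_url(self, key):
        """Return the next URL for a website, or None if its queue is empty."""
        queue = self.queues.get(key)
        return queue.popleft() if queue else None
```

Because every host of a website shares one queue, a crawler that takes one URL at a time from each queue never sends parallel requests to the same website, and the cap is counted per website rather than per host.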
The LIFRANUM crawl, carried out in partnership with researchers from Jean Moulin University Lyon 3 and Lumière University Lyon 2, is about to be launched.
Finally, we are continuing the preparations for our 2022 broad crawl.