...
Panel |
---|
First of all, we wish you a very happy new year and all the best for 2018! There has been a change in the team: Ange Aniesa has left to take up a position in another department at the BnF, and we wish him all the best. In December, we organized a week-long workshop within the team on collecting Twitter, building on last year's election crawls, in which we used Heritrix 3 to collect more than 3,500 Twitter accounts or hashtags twice a day, with a depth of page + 1 click. This allowed us to crawl the timeline of each seed (i.e. 40 tweets per day per seed) and part of the context (the timelines of other accounts or hashtags mentioned in the seed). The goal of the workshop was to continue this crawl throughout the year by creating a new dedicated harvest definition, and to improve its quality. The quality of the crawl depends on the number of seeds. First, we tested dividing the seed list between several jobs. Then we tested putting all the seeds in one job and dividing the twitter.com queue into 10 separate queues. The quality is better when the seed list is split between several jobs than when it is split between several queues within one job, apparently because the URLs are not divided equally between the queues: some queues crawled more than 15,000 URLs while others crawled fewer than 1,500. We need to continue testing.
|
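
As a rough illustration of the first approach (dividing the seed list between several jobs), here is a minimal Python sketch. It is not the tooling used at the BnF: the function names, the placeholder seed URLs and the choice of 10 jobs are assumptions made for the example. It deals the seeds out in turn so that job sizes differ by at most one, and computes a simple imbalance ratio that can be applied to per-queue URL counts such as those reported above.

```python
def split_round_robin(seeds, n_jobs):
    """Deal seeds into n_jobs lists in turn; sizes differ by at most one."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return [seeds[i::n_jobs] for i in range(n_jobs)]


def imbalance_ratio(counts):
    """Ratio of the largest to the smallest count (1.0 means perfectly even)."""
    smallest = min(counts)
    if smallest == 0:
        raise ValueError("at least one count is zero")
    return max(counts) / smallest


# Placeholder seeds (hypothetical account names, for illustration only).
seeds = [f"https://twitter.com/account{i}" for i in range(3500)]

# Assumption: 10 jobs, mirroring the 10 queues tried in the second test.
jobs = split_round_robin(seeds, 10)
job_sizes = [len(job) for job in jobs]  # 350 seeds in every job

# Illustrative counts only, using the two bounds quoted in the notes
# (more than 15,000 URLs in some queues, fewer than 1,500 in others).
queue_ratio = imbalance_ratio([15000, 1500])  # 10.0
```

An even split of the seeds does not guarantee an even number of crawled URLs per job, since each seed may lead to a different number of pages, so the per-job URL counts would still need to be compared after each test.
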
ONB
Panel |
---|
We are currently selecting seeds for the local elections taking place at the end of January. |
BNE
Panel |
---|
KB-Sweden
Panel |
---|
Next meetings
...