Details
- New Feature
- Resolution: Fixed
- Critical
- None
- None
Description
We have moved forward with the Netarchive harvesting of e-books from Museum Tusculanum Press (MTP, at Copenhagen University). The latest test harvest captured 389 MB in 104 objects:
http://kb-test-adm-001.kb.dk:8080/History/Harveststatus-jobdetails.jsp?jobID=734
Bjarne's comment:
"The next challenge for OAI harvesting is getting Heritrix to page through the OAI result. The way MTP have set up their OAI, we only get 100 books at a time, and at the bottom of the XML file there is a ResumptionToken, a kind of code needed to generate the link to the next page. This requires a special Heritrix setup. I believe we need to write a small script (BeanShellScript) for link extraction / link generation. A developer must write the actual code. I do not think anyone at Netarkivet has tried to write a link-extraction script in BeanShellScript before. A mail to the Heritrix mailing list could tell us whether anyone has already created an OAI-target script for Heritrix; it would be nice if it already existed."
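To illustrate the link generation Bjarne describes, here is a minimal sketch in Python rather than BeanShell. It is not the script that would run inside Heritrix. It only shows the logic: read the resumptionToken from an OAI-PMH response and build the URL of the next page. The base URL, the sample XML and the function name `next_page_url` are assumptions made for the example. The request form (`verb` plus `resumptionToken`) and the namespace follow the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def next_page_url(base_url, response_xml, verb="ListRecords"):
    """Return the URL of the next result page, or None if there is none.

    base_url is the repository endpoint (hypothetical in this example).
    An absent or empty resumptionToken means the list is complete.
    """
    root = ET.fromstring(response_xml)
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = (token_el.text or "").strip() if token_el is not None else ""
    if not token:
        return None
    # A follow-up request carries only the verb and the token.
    return base_url + "?" + urlencode({"verb": verb, "resumptionToken": token})


# Hypothetical, heavily shortened response for demonstration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2-token</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

LAST_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken/>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(next_page_url("http://example.org/oai", SAMPLE))
    print(next_page_url("http://example.org/oai", LAST_PAGE))
```

A Heritrix link-generation script would need to do the equivalent for each fetched OAI response and add the resulting URL to the crawl frontier. How that is wired into Heritrix is not covered here.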
Issue Links
- Trackbacks
- 2011-08-09 Netarkiv meeting DK meeting Time: 9 Aug 11:00 12:00 Brief information (Mikis) Workshop in December at BnF https://sbforge.org/display/NAS/2011DecemberworkshopatBnF. Application round for a full-time WARC developer under way....