  1. NetarchiveSuite
  2. NAS-1613

Page through the results of the OAI harvest of e-books


Details

    Description

      We have moved forward with the Netarchive harvesting of e-books from Museum Tusculanum Press (MTP, at Copenhagen University). The latest test harvest captured 389 MB in 104 objects:
      http://kb-test-adm-001.kb.dk:8080/History/Harveststatus-jobdetails.jsp?jobID=734
      Bjarne's comment:
      "The next challenge for OAI harvesting is getting Heritrix to page through the OAI results. The way MTP has set up its OAI service, we only get 100 books at a time, and at the bottom of the XML file there is a resumptionToken, a kind of code needed to generate the link to the next page. This requires a special Heritrix setup. I believe we need to write a small script (a BeanShell script) for link extraction / link generation. A developer must write the actual code. I do not think anyone at Netarkivet has written a link-extraction script in BeanShell before. A mail to the Heritrix mailing list could tell us whether anyone has already created an OAI-target script for Heritrix; it would be nice if one already existed."
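      The link generation described in the comment could look roughly like the sketch below. It is plain Java rather than a BeanShell script, and it is not wired into Heritrix; the class name `OaiResumptionLink`, the method names, and the example.org URL are hypothetical. It assumes the standard OAI-PMH rule that a follow-up request carries only the original `verb` and the `resumptionToken`, and that an empty or missing token marks the last page.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: derives the URL of the next OAI-PMH result page.
final class OaiResumptionLink {

    // Picks the verb (e.g. ListRecords) out of the URL that was just fetched.
    private static final Pattern VERB = Pattern.compile("[?&]verb=([A-Za-z]+)");

    // Returns the next page URL, or empty if this was the last page.
    static Optional<String> nextPageUrl(String requestUrl, String responseXml) {
        Matcher verb = VERB.matcher(requestUrl);
        if (!verb.find()) {
            return Optional.empty(); // not an OAI-PMH request URL
        }
        String token = resumptionToken(responseXml);
        if (token.isEmpty()) {
            return Optional.empty(); // empty or missing token: list is complete
        }
        int query = requestUrl.indexOf('?');
        String base = query >= 0 ? requestUrl.substring(0, query) : requestUrl;
        // The token may contain reserved characters, so it is URL-encoded.
        String encoded = URLEncoder.encode(token, StandardCharsets.UTF_8);
        return Optional.of(base + "?verb=" + verb.group(1)
                + "&resumptionToken=" + encoded);
    }

    // Reads the text of the first <resumptionToken> element, or "" if none.
    static String resumptionToken(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            NodeList nodes = doc.getElementsByTagNameNS("*", "resumptionToken");
            return nodes.getLength() == 0
                    ? ""
                    : nodes.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable OAI-PMH response", e);
        }
    }
}
```

      A crawler-side script would call `nextPageUrl` with the URL it just fetched and the response body, and queue the returned URL if one is present. Parsing the XML instead of matching it with a regular expression means entity-escaped characters in the token are decoded before the URL is built. The `URLEncoder.encode(String, Charset)` overload requires Java 10 or later.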

            People

              csr Colin Rosenthal
              clo Claus Lomborg (Inactive)
