Download the default_orderxml
Replace the class "org.archive.crawler.extractor.ExtractorJS" with the class
"dk.netarkivet.harvester.harvesting.extractor.ExtractorJS"
Replace the default_orderxml with the edited version
Make a simple harvest of netarkivet.dk
Nothing bad should happen: the harvest should complete without NullPointerExceptions (NPEs) or similar errors
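The class replacement in the edited order file can also be done with a small script instead of by hand. The following is only a minimal sketch: it assumes the downloaded default_orderxml is available as plain text, and the function name replace_extractor_class is hypothetical. It does a plain string substitution and does not parse the XML.

```python
OLD_CLASS = "org.archive.crawler.extractor.ExtractorJS"
NEW_CLASS = "dk.netarkivet.harvester.harvesting.extractor.ExtractorJS"


def replace_extractor_class(order_xml: str) -> str:
    """Return order_xml with every occurrence of OLD_CLASS replaced by NEW_CLASS.

    Hypothetical helper: raises ValueError if the old class name is absent,
    so an already edited (or unexpected) file is noticed instead of passing silently.
    """
    if OLD_CLASS not in order_xml:
        raise ValueError("order file does not reference " + OLD_CLASS)
    return order_xml.replace(OLD_CLASS, NEW_CLASS)
```

To use it, read the downloaded file into a string, pass it through replace_extractor_class, and write the result to a new file that is then uploaded as the edited version.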
Afterwards, look at the processors-report entry for the new extractor.
For a standard netarkivet.dk harvest it should look similar to this:
Processor: org.archive.crawler.extractor.ExtractorJS
Function: Link extraction on JavaScript code
CrawlURIs handled: 7
Links extracted: 46
but with a different processor line:
Processor: dk.netarkivet.harvester.harvesting.extractor.ExtractorJS
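The check of the report can be automated along these lines. This is a sketch under stated assumptions: the report is read as plain text and consists of "Key: value" lines grouped under a "Processor:" line, as in the excerpt above. The function names parse_processors_report and new_extractor_in_use are hypothetical.

```python
OLD_CLASS = "org.archive.crawler.extractor.ExtractorJS"
NEW_CLASS = "dk.netarkivet.harvester.harvesting.extractor.ExtractorJS"


def parse_processors_report(report_text: str) -> list:
    """Return one dict per 'Processor:' entry, mapping each key to its value."""
    entries = []
    current = None
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        if ":" not in line:
            continue  # skip blank lines and anything that is not 'Key: value'
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Processor":
            # A new processor entry starts here.
            current = {"Processor": value}
            entries.append(current)
        elif current is not None:
            current[key] = value
    return entries


def new_extractor_in_use(report_text: str) -> bool:
    """True if the report lists NEW_CLASS as a processor and does not list OLD_CLASS."""
    names = [entry["Processor"] for entry in parse_processors_report(report_text)]
    return NEW_CLASS in names and OLD_CLASS not in names
```

The parsed entry for the new extractor also carries the "CrawlURIs handled" and "Links extracted" values, which can be compared by eye with the figures from a harvest made with the original extractor.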