

Install the QuickStart according to the Quick Start Manual, e.g. in /home/test/netarchive.

  • Add some domains to harvest using the ADMGUI
  • Create and run a snapshot harvest with a byte limit of 100,000
  • Wait until the job is done
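  • Set up your browser for browsing and index your harvest job

The Arc Merge and Extract CDX examples below assume the following working directory and classpath:

cd /home/test/netarchive/scripts/simple_harvest/bitarchive1/filedir
export CLASSPATH=/home/test/netarchive/lib/dk.netarkivet.common.jar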
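
Before running the examples, it can help to confirm that the paths they rely on actually exist. The following is a minimal sketch, not part of NetarchiveSuite: `check_quickstart` is a hypothetical helper name, and the two paths it checks are simply the ones used in the commands above.

```shell
#!/bin/sh
# Sketch only: check_quickstart is a hypothetical helper, not a NetarchiveSuite tool.
# It checks that the jar and the bitarchive file directory used in the
# examples exist below the given installation directory.
check_quickstart() {
    base="$1"
    status=0
    for path in \
        "$base/lib/dk.netarkivet.common.jar" \
        "$base/scripts/simple_harvest/bitarchive1/filedir"
    do
        if [ ! -e "$path" ]; then
            # Report every missing path, then fail at the end.
            echo "missing: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```

For the installation used here, the call would be `check_quickstart /home/test/netarchive`; a non-zero exit status means at least one path is missing.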

Arc Merge:

java > resulting.arc 

Extract CDX:

java > output.cdx 
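
Once output.cdx has been written, a quick sanity check is to count its entries and search it for a URI from the harvest. The sketch below only assumes that the CDX file holds one record per line; `cdx_count` and `cdx_lookup` are hypothetical helper names, and nothing here depends on the exact field layout.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and treat the CDX file as plain text.

# cdx_count FILE: print the number of non-empty lines in FILE.
cdx_count() {
    grep -c . "$1"
}

# cdx_lookup FILE URI: print the lines of FILE that contain URI
# as a literal string (no regular-expression interpretation).
cdx_lookup() {
    grep -F -- "$2" "$1"
}
```

For example, `cdx_lookup output.cdx "$URI"` would list the records for the URI chosen in the next example, if any were harvested.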

Get Record using Lucene:

# e.g. a URI from the harvest, found in your "viewerproxy"
export URI=
cd /home/test/netarchive/scripts/simple_harvest/cache/fullcrawllogindex
cp -r 1-cache 1-cache.unzip
cd 1-cache.unzip/
gunzip *
export LUCENE_INDEX=/home/test/netarchive/scripts/simple_harvest/cache/fullcrawllogindex/1-cache.unzip
export SETTINGSFILE=/home/test/netarchive/scripts/simple_harvest/settings.xml
java -Ddk.netarkivet.settings.file=$SETTINGSFILE -Dsettings.common.remoteFile.port=5000 $LUCENE_INDEX $URI
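
The copy-and-gunzip preparation of the index directory can be wrapped in a small function so that it is repeatable and refuses to overwrite an existing copy. This is a sketch under stated assumptions: `unzip_index` is a hypothetical helper, and unlike the plain `gunzip *` above it only decompresses files ending in `.gz`.

```shell
#!/bin/sh
# Sketch only: unzip_index is a hypothetical helper, not a NetarchiveSuite tool.
# unzip_index SRC DEST: copy directory SRC to DEST and decompress every
# *.gz file in the copy. SRC itself is left untouched.
unzip_index() {
    src="$1"
    dest="$2"
    [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
    [ ! -e "$dest" ] || { echo "already exists: $dest" >&2; return 1; }
    cp -r "$src" "$dest" || return 1
    # Restrict to *.gz so files that are already uncompressed are skipped.
    find "$dest" -type f -name '*.gz' -exec gunzip {} +
}
```

With the paths from this example, the call would be `unzip_index 1-cache 1-cache.unzip` from within the fullcrawllogindex directory, followed by the `export LUCENE_INDEX=...` line.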


cd /home/test/netarchive/scripts/simple_harvest/
cp /home/test/netarchive/scripts/simple_harvest/bitarchive1/filedir/resulting.arc new_resulting.arc
java -Ddk.netarkivet.settings.file=/home/test/netarchive/scripts/simple_harvest/settings.xml -Dsettings.common.remoteFile.port=5000 -cp /home/test/netarchive/lib/dk.netarkivet.archive.jar new_resulting.arc
# press <CTRL-C> to stop the job

Batch, e.g. with checksum:

cd /home/test/netarchive
mkdir batchprogs

# copy the example batch program (ChecksumJob.class) to batchprogs/
# stay in /home/test/netarchive so that the relative paths lib/ and batchprogs/ resolve
java -cp lib/dk.netarkivet.archive.jar -Dsettings.common.remoteFile.port=5000 -Ddk.netarkivet.settings.file=/home/test/netarchive/scripts/simple_harvest/settings.xml -Cbatchprogs/ChecksumJob.class -Ooutput.checksum
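
To cross-check the batch result, checksums can also be computed locally for the files in the bitarchive directory. The sketch below rests on two assumptions that should be verified against your own output.checksum: that each output line has the form `<filename>##<checksum>`, and that the checksum is MD5. `local_checksums` is a hypothetical helper, and it relies on the `md5sum` command from GNU coreutils.

```shell
#!/bin/sh
# Sketch only: local_checksums is a hypothetical helper.
# Assumptions: the batch output uses "<filename>##<md5>" lines,
# and md5sum (GNU coreutils) is available.
# local_checksums DIR: print one "<filename>##<md5>" line per regular
# file directly inside DIR, sorted.
local_checksums() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # md5sum prints "<hash>  <path>"; keep only the hash.
        sum=$(md5sum "$f" | cut -d ' ' -f 1)
        printf '%s##%s\n' "$(basename "$f")" "$sum"
    done | sort
}
```

If the assumptions hold, writing `sort output.checksum` to a temporary file and comparing it with the output of `local_checksums /home/test/netarchive/scripts/simple_harvest/bitarchive1/filedir` using `diff` should show no differences.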
