Note that this documentation is for the old 5.2 release.
For the newest documentation, please see the current release documentation.

Install the QuickStart according to the Quick Start Manual, e.g. in /home/test/QUICKSTART

  • Add some domains to harvest using the ADMGUI, e.g. netarkivet.dk, kb.dk, statsbiblioteket.dk
  • Create and run a snapshot harvest with a byte limit of 100,000
  • Wait until the job is done
  • Set up your browser for browsing and index your harvest job

    cd /home/test/QUICKSTART/bitarkiv/filedir
    export CLASSPATH=/home/test/QUICKSTART/lib/netarchivesuite-common-core.jar
    export LOG=-Dlogback.configurationFile=/path/to/logback.xml
    ls 

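Instead of copying a file name from the ls output by hand, a small shell function can pick the first ARC or WARC file in the directory. This is only a sketch: pick_archive_file is a hypothetical helper, not part of NetarchiveSuite, and it assumes archive files end in .arc, .warc, .arc.gz or .warc.gz.

```shell
# Hypothetical helper, not part of NetarchiveSuite: print the first file in a
# directory (in ls order) whose name ends in .arc, .warc, .arc.gz or .warc.gz.
pick_archive_file() {
  local dir=${1:-.}
  # ls -1 lists one name per line; grep keeps archive names; head takes the first.
  ls -1 "$dir" | grep -E '\.w?arc(\.gz)?$' | head -n 1
}

# Usage, from bitarkiv/filedir:
#   export FILEONE=$(pick_archive_file .)
```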

Extract CDX:

export FILEONE=1-1-20090519083732-00002-dia-test-int-01.kb.dk.warc
java $LOG dk.netarkivet.common.tools.ArchiveExtractCDX $FILEONE > output.cdx 
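To check whether a given URL is in the archive file, the CDX output can be filtered on its first field. This sketch assumes, without confirmation from this page, that the first whitespace-separated field of each ArchiveExtractCDX line is the record URL; cdx_lines_for_url is a hypothetical helper.

```shell
# Hypothetical helper, not part of NetarchiveSuite: print every CDX line whose
# first field is exactly the given URL.
# ASSUMPTION: field 1 of each ArchiveExtractCDX output line is the record URL.
cdx_lines_for_url() {
  local url=$1 cdx_file=$2
  # $1 == url is a plain string comparison, so characters such as ? and . in the
  # URL need no regex escaping.
  awk -v url="$url" '$1 == url' "$cdx_file"
}
```

For example, cdx_lines_for_url http://netarkivet.dk/index-da.php output.cdx prints the matching lines, or nothing if that URL is not in the file.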

Get Record using Lucene:

# e.g. a URI from the harvest, found in your viewerproxy
export URI=http://netarkivet.dk/index-da.php
cd /home/test/QUICKSTART/cache/fullcrawllogindex
cp -r 1-cache 1-cache.unzip
cd 1-cache.unzip/
ls
gunzip *
export SETTINGSFILE=/home/test/QUICKSTART/conf/settings.xml
export LUCENE_INDEX=/home/test/QUICKSTART/cache/fullcrawllogindex/1-cache.unzip
export OPTS="-Ddk.netarkivet.settings.file=$SETTINGSFILE -Dsettings.common.remoteFile.port=5000"
java $LOG $OPTS dk.netarkivet.archive.tools.GetRecord $LUCENE_INDEX $URI
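The copy-and-decompress steps before GetRecord can be wrapped in a function so that the original cache directory is never modified by mistake. prepare_lucene_index is a hypothetical helper; like gunzip *, it only handles the top level of the copy, and it assumes the compressed index files carry a .gz suffix.

```shell
# Hypothetical helper, not part of NetarchiveSuite: copy a compressed index cache
# directory and gunzip the .gz files in the copy, leaving the original untouched.
prepare_lucene_index() {
  local src=$1 dst=$2 f
  # If dst already existed, cp -r would copy src *into* it (dst/src), so stop.
  if [ -e "$dst" ]; then
    echo "prepare_lucene_index: $dst already exists" >&2
    return 1
  fi
  cp -r "$src" "$dst" || return 1
  # Like "gunzip *", only files in the top level of the copy are decompressed.
  for f in "$dst"/*.gz; do
    if [ -f "$f" ]; then
      gunzip "$f" || return 1
    fi
  done
}
```

For example, from /home/test/QUICKSTART/cache/fullcrawllogindex: prepare_lucene_index 1-cache 1-cache.unzip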

Upload:

cd /home/test/QUICKSTART
cp /home/test/QUICKSTART/bitarkiv/filedir/resulting.arc new_resulting.arc
export SETTINGSFILE=/home/test/QUICKSTART/settings.xml
export OPTS="-Ddk.netarkivet.settings.file=$SETTINGSFILE -Dsettings.common.remoteFile.port=5000"
java $LOG $OPTS -cp /home/test/QUICKSTART/lib/netarchivesuite-archive-core.jar dk.netarkivet.archive.tools.Upload new_resulting.arc
# press <CTRL-C> to stop the job
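Every tool invocation repeats the settings file and the remote file port. A minimal sketch of a wrapper function that keeps them in one place; nas_java and the JAVA override variable are hypothetical names, and the function assumes SETTINGSFILE and LOG are set as in the commands above. Quoting "$SETTINGSFILE" keeps a settings path that contains spaces as a single argument.

```shell
# Hypothetical wrapper, not part of NetarchiveSuite, for the java calls above.
# Expects SETTINGSFILE (and optionally LOG) to be set as in those commands.
# JAVA can override the launcher, e.g. JAVA=echo for a dry run.
nas_java() {
  # $LOG is deliberately unquoted: when empty it adds no argument at all.
  "${JAVA:-java}" $LOG \
    -Ddk.netarkivet.settings.file="$SETTINGSFILE" \
    -Dsettings.common.remoteFile.port=5000 \
    "$@"
}

# Usage, from /home/test/QUICKSTART:
#   nas_java -cp /home/test/QUICKSTART/lib/netarchivesuite-archive-core.jar \
#     dk.netarkivet.archive.tools.Upload new_resulting.arc
```

Prefixing the call with JAVA=echo prints the command line instead of running it, which is a quick way to check the options before starting a real upload.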

Batch job, e.g. a checksum job:

cd /home/test/QUICKSTART
mkdir batchprogs

# copy the example batch program ChecksumJob.java to batchprogs/
cd batchprogs
javac ChecksumJob.java
# return to /home/test/QUICKSTART; the paths below are relative to it
cd ..
export SETTINGSFILE=/home/test/QUICKSTART/settings.xml
java -cp lib/netarchivesuite-archive-core.jar -Dsettings.common.remoteFile.port=5000 \
$LOG -Ddk.netarkivet.settings.file=$SETTINGSFILE dk.netarkivet.archive.tools.RunBatch \
-Cbatchprogs/ChecksumJob.class -Ooutput.checksum
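To sanity-check output.checksum, the same kind of checksums can be computed locally for the files in the QuickStart bitarchive. This is a hedged sketch: it assumes the example ChecksumJob computes MD5 and writes one filename##checksum line per file, which should be confirmed against a few lines of output.checksum. local_checksums is a hypothetical helper, and md5sum is the GNU coreutils tool (on macOS, md5 -q is the closest equivalent).

```shell
# Hypothetical cross-check, not part of NetarchiveSuite: print one
# "filename##md5" line for each regular file in a directory.
# ASSUMPTION: this matches the line format written by the example ChecksumJob.
local_checksums() {
  local dir=${1:-.} f sum
  for f in "$dir"/*; do
    [ -f "$f" ] || continue   # skip subdirectories and an unmatched glob
    # Reading from stdin makes md5sum print "<hash>  -", so field 1 is the hash.
    sum=$(md5sum < "$f" | cut -d ' ' -f 1)
    printf '%s##%s\n' "$(basename "$f")" "$sum"
  done
}
```

For example, from /home/test/QUICKSTART: local_checksums bitarkiv/filedir | sort > local.checksum, then sort output.checksum | diff - local.checksum. If the format assumption holds, no output from diff means the two lists agree.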

ChecksumJob.java

 
