- Test search and retrieval of harvested material through wayback.
1) All netarchivesuite apps on firstname.lastname@example.org now use the same derby server, listening on port 50002. If this server is down, start it with the command
cd derbyDB; bash start_derby.sh
2) This test is to be run with OpenWayback. IA wayback is no longer supported.
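For note 1 above, it can help to confirm whether anything is listening on port 50002 before starting the server. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and that it is run on the host where the derby server lives. The function name port_is_open is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp redirection, so no extra tools are needed.
port_is_open() {
    local host="$1" port="$2"
    # Try to open the connection in a subshell; discard any error message.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Usage (assumption: run on the derby host):
#   port_is_open localhost 50002 || (cd derbyDB && bash start_derby.sh)
```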
Clean TEST12 derby database
Note: some instances on email@example.com may need to be restarted after this operation (specifically, SystemTest and StressTest instances).
Run a standard devel setup as described in Setup DK test environment.
Upload a Small Bitarchive
Build Netarkivets Fork of OpenWayback
Clone the repository and build and deploy it to your local maven repository
Build Netarkivets OpenWayback Overlay
Clone the repository
Edit pom.xml to refer to the latest NetarchiveSuite snapshot version and to the same openwayback version installed in the previous step (currently 2.4.0-NAS-SNAPSHOT), and then build the package.
This builds the war file target/netarkivet-openwayback.war, which should be renamed to "wayback.war" for the next step.
Construct A Clean Wayback Environment
Check out the deploy template from ssh://firstname.lastname@example.org:7999/nark/openwayback-config.git (possibly with the command git clone ssh://email@example.com:7999/nark/openwayback-config.git on kb-test-way-001.kb.dk). Copy the entire tree to kb-test-way-001.
Follow the instructions in the Readme.md file in the wayback_deploytemplate directory. Note the following:
- The name of the directory should normally be wayback_test12
- The procedure for building a warfile is described below
- tomcat version is 6.0.26
- The NAS settings file from the wayback indexer can be used unchanged
- The default ports 8080/8090 are not usually available, so change these in settings.conf to free ports such as 8081 and 8091 (the ports used in the sanity checks below).
- Also conf/tomcat_conf/server.xml specifies a redirect port 8443 which is not available. Change this to 8444
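The redirect-port change in server.xml can be scripted. This is a minimal sketch, assuming the file contains the attribute literally as redirectPort="8443" and that leaving a .bak copy next to it is acceptable. The function name set_redirect_port is hypothetical.

```shell
# Hypothetical helper: rewrite redirectPort="8443" to redirectPort="8444"
# in the given server.xml, keeping a backup copy alongside it.
# $1: path to server.xml; $2: old port (default 8443); $3: new port (default 8444)
set_redirect_port() {
    local file="$1" old="${2:-8443}" new="${3:-8444}"
    cp "$file" "$file.bak" || return 1
    # Read from the backup and write the edited result back to the original,
    # which avoids relying on the non-portable `sed -i`.
    sed "s/redirectPort=\"$old\"/redirectPort=\"$new\"/g" "$file.bak" > "$file"
}

# Usage (path taken from the note above, relative to the installation):
#   set_redirect_port conf/tomcat_conf/server.xml
```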
To build an OpenWayback for NAS
- Checkout the git repository https://github.com/netarchivesuite/netarkivet-openwayback-overlay
- The pom.xml specifies version numbers for both NAS and OpenWayback. Check that these are the values we want.
Build the overlay with
- This should build a war file netarkivet-openwayback.war which can be renamed to wayback.war and dropped in the wars directory in the installation.
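The rename-and-drop step can be done in one go. This is a minimal sketch, assuming the overlay build has already produced target/netarkivet-openwayback.war. The function name stage_war and the destination path in the usage line are hypothetical.

```shell
# Hypothetical helper: copy a built war file into a wars directory
# under the name wayback.war.
# $1: path to the built war; $2: wars directory of the installation
stage_war() {
    local src="$1" dest_dir="$2"
    # Fail early with a message if the build output is missing.
    [ -f "$src" ] || { echo "war file not found: $src" >&2; return 1; }
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/wayback.war"
}

# Usage (the destination path is an assumption about the local layout):
#   stage_war target/netarkivet-openwayback.war ~/wayback_test12/wars
```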
Now start wayback/tomcat with the start script in wayback_test12/bin.
Check the log for error messages
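A quick way to scan a log for likely problems is to grep for common error markers. This is a minimal sketch: the log file location depends on the deploy template and is passed as an argument, and the marker list (ERROR, SEVERE, Exception) is an assumption about what the log contains, not an exhaustive set. The function name log_errors is hypothetical.

```shell
# Hypothetical helper: print log lines that look like errors, with line numbers.
# Returns 0 if any such line was found and 1 if none were.
# $1: path to the log file (location is an assumption; adjust to your install)
log_errors() {
    grep -nE 'ERROR|SEVERE|Exception' "$1"
}

# Usage:
#   log_errors /path/to/tomcat/log && echo "log needs a closer look"
```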
First do a sanity test that wayback is running and that the configuration is sane
- Use X-forwarding and start a firefox running directly on kb-test-way-001.kb.dk
- Check that the browser is not set to use a proxy
- Browse to localhost:8091 and check that you can reach wayback
After this you can try accessing the proxy endpoint (port 8081) via ssh port forwarding (see details below).
Alternatively Install IA-Wayback
The procedure for installing IA Wayback is identical to that given above except that the warfile is prepared differently.
- clone the git project https://github.com/netarchivesuite/wayback-netarchivesuite
build the project with
- Copy the file wayback-1.8.0-SNAPSHOT.war to the wars directory in the wayback_test12 folder.
Check That Wayback Proxy Endpoint Is Working
Now, in a browser of your choice set the internet connection settings to use kb-prod-udv-001.kb.dk Port $PORT as proxy. In Firefox, a good idea is to execute firefox -P --no-remote and create a new profile which uses this proxy setting and points to wayback as its start-page.
Go to http://kb-test-way-001.kb.dk:8080/ (or whichever port you set up as the wayback endpoint in settings.conf) and check that you can see the wayback search box.
Wait for Indexing to Complete
On kb-prod-udv-001 wait to see the indexer application run by executing:
The indexer runs every five minutes. If you are impatient, just log onto kb-test-way-001 and in the directory $TESTX/conf kill and restart the indexer. It will run right away.
You can follow the progress of indexing with the following two commands
The first gives the number of files discovered by the indexer, and the second gives the number of files indexed. When these are equal, indexing is done.
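Rather than re-running the two commands by hand, the comparison can be polled. This is a minimal sketch: it takes the two count commands as strings, because the actual commands are site-specific. The function name wait_until_equal, the 30-second default interval and the default of 60 polls are assumptions.

```shell
# Hypothetical helper: poll two commands until they print the same value.
# $1: command printing the number of files discovered
# $2: command printing the number of files indexed
# $3: seconds between polls (default 30); $4: maximum number of polls (default 60)
wait_until_equal() {
    local cmd_a="$1" cmd_b="$2" interval="${3:-30}" max="${4:-60}" i=0 a b
    while [ "$i" -lt "$max" ]; do
        a=$(eval "$cmd_a")
        b=$(eval "$cmd_b")
        echo "discovered=$a indexed=$b"
        # Done when both commands produced the same non-empty value.
        if [ -n "$a" ] && [ "$a" = "$b" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# Usage, with the two commands from this step substituted in:
#   wait_until_equal '<first command>' '<second command>'
```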
Wait for Aggregator
After the indexer is run, wait for the aggregator to run by watching for the creation of the index file:
until the file wayback_intermediate.index appears. This will take at most ten minutes. If you are impatient, just log onto kb-test-way-001 and in the directory $TESTX/conf kill and restart the aggregator. It will run right away.
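Waiting for the index file can also be automated. This is a minimal sketch, with a default timeout of 600 seconds to match the ten-minute upper bound mentioned above. The directory in which wayback_intermediate.index appears is installation-specific, so the full path is passed as an argument. The function name wait_for_file is hypothetical.

```shell
# Hypothetical helper: block until a file exists or a timeout expires.
# $1: path to wait for; $2: timeout in seconds (default 600)
# $3: seconds between checks (default 10)
# Returns 0 once the file exists, 1 if the timeout is reached first.
wait_for_file() {
    local path="$1" timeout="${2:-600}" interval="${3:-10}" waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Usage (replace the directory with the one used by your aggregator):
#   wait_for_file /path/to/wayback_intermediate.index && echo "aggregator has run"
```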
Move The Index File
Move the index file to the place where wayback expects to read it. [I think this is now unnecessary - CSR]
In the proxied browser you should now be able to search and browse in the repository. The following standard domains are present in the arcfiles:
In addition, the following domains are present in the warcfiles:
The warc.gz files contain a single harvest each of honda.dk, toyota.dk, mazda.dk and sa.dk from 2016-10-31. sa.dk is an example of a https site which renders badly in the current version of wayback used by Netarkivet.
- Use the wayback advanced search page to list all the URLs harvested from a particular domain.
- Choose some of them you would like to block by regular expressions.
- On devel@kb-test-way-001 add these regular expressions (one per line) to the file conf/wayback_regexps.txt under the wayback installation folder.
- On devel@kb-test-way-001, restart tomcat by executing the script bin/start_wayback.sh under the wayback installation folder.
- Check the blocked urls are no longer visible in advanced search
- Check that if you try to visit one of the blocked urls wayback shows you a page informing you that the content has been blocked
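Before restarting tomcat, the regular expressions can be dry-run against a list of URLs to see which ones they would catch. This is a minimal sketch using grep -E, which only approximates wayback's matching: the regex dialect wayback applies is not stated here, so treat the result as a rough preview for simple patterns. The function name blocked_urls and the urls.txt file are hypothetical.

```shell
# Hypothetical helper: print the URLs matched by any pattern in a regexp file.
# $1: file with one regular expression per line (e.g. conf/wayback_regexps.txt)
# $2: file with one URL per line (e.g. copied from the advanced search results)
# Caveat: an empty line in the pattern file makes grep match every URL.
blocked_urls() {
    grep -E -f "$1" "$2"
}

# Usage:
#   blocked_urls conf/wayback_regexps.txt urls.txt
```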
On kb-test-way-001, stop the wayback tomcat server using the stop_wayback.sh script in the bin folder of the wayback installation.
Start the tomcat again.
- Check that you can still browse in the material.
- Shutdown the wayback server
Shutdown the Test
- If you have a background ssh port-forwarding process running a proxy to wayback then you should also kill this at this stage.