- Test search and retrieval of harvested material through wayback.
1) All NetarchiveSuite apps on email@example.com now use the same Derby server, listening on port 50002. If this server is down, start it with the command
cd derbyDB; bash start_derby.sh
2) This test is to be run with OpenWayback. IA wayback is no longer supported.
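Step 1 can be scripted so that the Derby server is only started when nothing answers on port 50002. This is a minimal sketch. It assumes that it runs on the host of the shared Derby server, from the directory that contains derbyDB, and that bash has /dev/tcp support and coreutils `timeout` is installed. The function names derby_status and ensure_derby are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: start the shared Derby server only if nothing answers on its port.
# Assumptions (not from the test description): run on the Derby host, from the
# directory containing derbyDB/, with bash /dev/tcp support and coreutils timeout.

DERBY_PORT=50002

# Print "up" if a TCP connection to HOST:PORT succeeds within 2 seconds, else "down".
derby_status() {
  local host=$1 port=$2
  if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo up
  else
    echo down
  fi
}

# Run the documented start command when the port does not answer.
ensure_derby() {
  if [ "$(derby_status localhost "$DERBY_PORT")" = "down" ]; then
    echo "Derby not reachable on port ${DERBY_PORT}; starting it" >&2
    (cd derbyDB && bash start_derby.sh)
  else
    echo "Derby already listening on port ${DERBY_PORT}" >&2
  fi
}
```

Source the file and call ensure_derby. Running `derby_status localhost 50002` on its own only reports the state. The sketch joins the two commands with `&&` instead of `;`, so start_derby.sh is never run from the wrong directory if derbyDB is missing.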
Clean TEST12 derby database
Note: some instances on firstname.lastname@example.org may need to be restarted after this operation (specifically, the SystemTest and StressTest instances).
Run a standard devel setup (see Setup DK test environment).
Upload a Small Bitarchive
Build Netarkivet's Fork of OpenWayback
Clone the repository, then build it and install it into your local Maven repository.
Alternatively, use "mvn -DskipTests clean deploy" to deploy a new snapshot to Nexus.
Build Netarkivet's OpenWayback Overlay
Clone the repository
Edit pom.xml so that it refers to the latest NetarchiveSuite snapshot version and to the same OpenWayback version installed in the previous step (currently 2.4.0-NAS-SNAPSHOT), then build the package.
This builds the warfile target/netarkivet-openwayback.war, which should be renamed to "wayback.war" for the next step.
Construct A Clean Wayback Environment
Check out the deploy template from ssh://email@example.com:7999/nark/openwayback-config.git (for example with the command git clone ssh://firstname.lastname@example.org:7999/nark/openwayback-config.git on kb-test-way-001.kb.dk). Copy the entire tree to kb-test-way-001.
Follow the instructions in the Readme.md file in the wayback_deploytemplate directory. Note the following:
- The name of the directory should normally be wayback_test12
- The procedure for building a warfile is described above
- The tomcat version is 6.0.26
- The NAS settings file contained in the git repository can be used unchanged
- Change the default proxy endpoint ports in settings.conf to your assigned tester port
- If the conf/tomcat_conf/server.xml redirect port 8443 is not available, change it to 8444
- Finally, drop netarkivet-openwayback.war, renamed to wayback.war, into the wars directory of the installation.
Now start wayback/tomcat with the start script in wayback_test12/bin.
Check the log for error messages
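A quick way to check the log is to grep for typical error markers. This sketch assumes the log is logs/catalina.out under the wayback installation, which is the usual Tomcat layout but is not confirmed for this deploy template. It also assumes that ERROR, SEVERE and Exception are the markers worth looking for. Adjust both to the actual setup. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count and show suspicious lines in a wayback/Tomcat log.
# Assumption: the log path and the error markers are guesses based on Tomcat
# defaults, not taken from the wayback_deploytemplate instructions.

ERROR_PATTERN='ERROR|SEVERE|Exception'

# Print the number of lines in LOGFILE that match ERROR_PATTERN (0 if none).
count_log_errors() {
  local logfile=$1
  # grep -c prints 0 and exits 1 when nothing matches; keep the count, drop the status.
  grep -cE "$ERROR_PATTERN" "$logfile" || true
}

# Show the last 20 matching lines, with line numbers, for manual inspection.
show_log_errors() {
  local logfile=$1
  grep -nE "$ERROR_PATTERN" "$logfile" | tail -n 20
}
```

For example, `count_log_errors wayback_test12/logs/catalina.out` followed by `show_log_errors` on the same file if the count is not zero. A zero count does not replace reading the startup messages; it only flags obvious failures.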
First, check that wayback is running and that its configuration is sane:
- Use X forwarding to start a Firefox instance running directly on kb-test-way-001.kb.dk
- Check that the browser is not set to use a proxy
- Browse to localhost:<your port> and check that you can reach wayback
After this, you can try accessing the proxy endpoint via ssh port forwarding (see details below).
Redeploying to an existing installation
To redeploy to an existing wayback installation:
- Drop the warfile wayback.war in the wars directory
- Touch the context-descriptor file
- Wait a few seconds, then restart wayback with the provided script
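The redeploy steps can be collected in one function. This is a sketch under assumptions. The directory names (wars/, bin/start_wayback.sh) are taken from elsewhere on this page, and start_wayback.sh is assumed to be the "provided script". The context-descriptor path is not given here, so it is passed in as an argument. The function name redeploy_wayback is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: redeploy a newly built wayback.war into an existing installation.
# Assumptions: INSTALL_DIR contains wars/ and bin/start_wayback.sh (names used
# elsewhere on this page); the context-descriptor path is not given here, so
# the caller supplies it.

# Usage: redeploy_wayback WAR INSTALL_DIR CONTEXT_DESCRIPTOR [WAIT_SECONDS]
redeploy_wayback() {
  local war=$1 install_dir=$2 context_descriptor=$3 wait_seconds=${4:-5}

  # Drop the war file into the wars directory under the name wayback expects.
  cp "$war" "${install_dir}/wars/wayback.war" || return 1

  # Touch the context descriptor (updates its modification time).
  touch "$context_descriptor" || return 1

  # Wait a few seconds, then restart with the provided script.
  sleep "$wait_seconds"
  (cd "$install_dir" && bash bin/start_wayback.sh)
}
```

Check against the deploy template whether start_wayback.sh restarts a running tomcat or whether stop_wayback.sh must be run first.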
Check That Wayback Proxy Endpoint Is Working
Now, in a browser of your choice, set the connection settings to use kb-prod-udv-001.kb.dk port $PORT as a proxy. In Firefox, it is a good idea to run firefox -P --no-remote and create a new profile that uses this proxy setting and has wayback as its start page.
Go to http://kb-test-way-001.kb.dk:8080/ (or whichever port you set up as the wayback endpoint in settings.conf) and check that you can see the wayback search box.
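As a rough complement to the browser check, the proxy endpoint can be probed from the command line with curl's -x (--proxy) option. This sketch assumes curl is available on the machine you test from. The function name proxy_http_status is hypothetical. A status code only shows that something answered through the proxy, not that the search box renders correctly.

```shell
#!/usr/bin/env bash
# Sketch: fetch a URL through the wayback proxy endpoint and print the HTTP status.
# Assumptions: curl is installed; PROXY_HOST/PROXY_PORT are the proxy settings
# described above (e.g. kb-prod-udv-001.kb.dk and your $PORT).

# Usage: proxy_http_status PROXY_HOST PROXY_PORT URL
# Prints the status code, or "000" if no HTTP response was received.
proxy_http_status() {
  local proxy_host=$1 proxy_port=$2 url=$3
  # -o /dev/null discards the body; --max-time bounds the wait.
  curl -s -o /dev/null --max-time 10 \
    -x "http://${proxy_host}:${proxy_port}" \
    -w '%{http_code}' "$url" || true
}
```

For example: `proxy_http_status kb-prod-udv-001.kb.dk "$PORT" http://kb-test-way-001.kb.dk:8080/`. A result of "000" means curl got no HTTP response at all, which usually points at the proxy host or port rather than at wayback itself.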
Wait for Indexing to Complete
On kb-prod-udv-001 wait to see the indexer application run by executing:
The indexer runs every five minutes. If you are impatient, log onto kb-test-way-001 and, in the directory $TESTX/conf, kill and restart the indexer; it will then run right away.
You can follow the progress of indexing with the following two commands
The first gives the number of files discovered by the indexer, and the second gives the number of files indexed. When these are equal, indexing is done.
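The two commands themselves are not preserved on this page. Given any two commands that print the discovered and indexed file counts, the wait can be automated by polling until the counts are equal. In the sketch below, the commands are passed in as strings. The function name, polling interval and poll limit are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: poll two counting commands until they report the same number.
# Assumption: the two command strings stand in for the progress commands
# referred to above; each is expected to print a single integer.

# Usage: wait_for_indexing DISCOVERED_CMD INDEXED_CMD [INTERVAL_SECONDS] [MAX_POLLS]
# Returns 0 once the counts are equal, 1 if MAX_POLLS is reached first.
wait_for_indexing() {
  local discovered_cmd=$1 indexed_cmd=$2 interval=${3:-60} max_polls=${4:-30}
  local i discovered indexed
  for ((i = 1; i <= max_polls; i++)); do
    # Strip whitespace so padded output (e.g. from wc -l) compares cleanly.
    discovered=$(bash -c "$discovered_cmd" | tr -d '[:space:]')
    indexed=$(bash -c "$indexed_cmd" | tr -d '[:space:]')
    echo "poll ${i}: discovered=${discovered} indexed=${indexed}" >&2
    if [ -n "$discovered" ] && [ "$discovered" = "$indexed" ]; then
      return 0
    fi
    # Do not sleep after the final poll.
    if [ "$i" -lt "$max_polls" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

Pass the two progress commands from the procedure as the first two arguments. The empty-string check stops a failing command, which prints nothing, from being mistaken for a finished index.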
Wait for Aggregator
After the indexer is run, wait for the aggregator to run by watching for the creation of the index file:
until the file wayback_intermediate.index appears. This takes at most ten minutes. If you are impatient, log onto kb-test-way-001 and, in the directory $TESTX/conf, kill and restart the aggregator; it will then run right away.
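Watching for the index file can also be done with a small polling loop. The directory that holds wayback_intermediate.index is not stated here, so the full path is an argument. The default limit of 600 seconds mirrors the ten-minute bound given above. The function name and polling interval are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait until a file exists, giving up after a time limit.
# Assumption: the caller passes the full path of wayback_intermediate.index,
# whose directory is not stated on this page.

# Usage: wait_for_file PATH [MAX_WAIT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 when PATH exists, 1 if MAX_WAIT_SECONDS elapse first.
wait_for_file() {
  local path=$1 max_wait=${2:-600} interval=${3:-15}
  local waited=0
  until [ -e "$path" ]; do
    if [ "$waited" -ge "$max_wait" ]; then
      echo "gave up after ${waited}s: ${path} not found" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "found ${path} after about ${waited}s" >&2
}
```

If the function gives up, that is a hint to restart the aggregator as described above rather than to keep waiting.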
Move The Index File
Move the index file to the place where wayback expects to read it. [I think this is now unnecessary - CSR]
In the proxied browser you should now be able to search and browse in the repository. The following standard domains are present in the arcfiles:
In addition, the following domains are present in the warcfiles:
The warc.gz files contain a single harvest of each of honda.dk, toyota.dk, mazda.dk and sa.dk, made on 2016-10-31. sa.dk is an example of an HTTPS site that renders badly in the current version of wayback used by Netarkivet.
- Use the wayback advanced search page to list all the URLs harvested from a particular domain.
- Choose some of them to block by regular expressions.
- On devel@kb-test-way-001, add these regular expressions (one per line) to the file conf/wayback_regexps.txt under the wayback installation folder.
- On devel@kb-test-way-001, restart tomcat by executing the script bin/start_wayback.sh under the wayback installation folder.
- Check that the blocked URLs are no longer visible in advanced search
- Check that if you try to visit one of the blocked URLs, wayback shows a page informing you that the content has been blocked
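The step of adding the blocking expressions can be scripted so that re-running it does not create duplicate lines. This is a sketch. It assumes the wayback installation folder is passed as the first argument and that each further argument is one regular expression. The function name add_block_regexps is hypothetical. It does not check that the expressions are valid for wayback.

```shell
#!/usr/bin/env bash
# Sketch: append blocking regular expressions to conf/wayback_regexps.txt,
# one per line, skipping any expression that is already present.
# Assumption: INSTALL_DIR is the wayback installation folder on kb-test-way-001.

# Usage: add_block_regexps INSTALL_DIR REGEXP...
add_block_regexps() {
  local install_dir=$1; shift
  local file="${install_dir}/conf/wayback_regexps.txt"
  local re
  touch "$file" || return 1
  # Ensure the existing file ends with a newline before appending.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  for re in "$@"; do
    # -F: literal match, -x: whole line, -q: quiet; skip exact duplicates.
    if ! grep -qxF -- "$re" "$file"; then
      printf '%s\n' "$re" >> "$file"
    fi
  done
}
```

For example: `add_block_regexps ~/wayback_test12 'http://www\.honda\.dk/.*'`, then restart tomcat as described in the list. Single quotes keep the shell from expanding the backslashes and asterisks in the expressions.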
On kb-test-way-001, stop the wayback tomcat server using the stop_wayback.sh script in the bin folder of the wayback installation.
Make sure that the NAS settings file in the conf directory includes a block with the following settings
Start tomcat again.
- Check that you can still browse in the material.
- Shut down the wayback server
Shut Down the Test
- If you have a background ssh port-forwarding process acting as a proxy to wayback, kill it at this stage as well.