[NAS-5] Researchers should not be allowed access to e-books through Wayback Created: 10/Jan/11  Updated: 13/Apr/11  Due: 21/Jan/11  Resolved: 13/Apr/11

Status: Resolved
Project: NetarchiveSuite
Component/s: Wayback
Affects Version/s: I45
Fix Version/s: I46

Type: Bug Priority: Critical
Reporter: Christen Hedegaard (Inactive) Assignee: Colin Rosenthal
Resolution: Fixed  
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Inspector: Søren Vejrup Carlsen (Inactive)

 Description   

Download of e-books to Netarkivet.dk started in December 2010. As researchers are not allowed access to the e-books through the Wayback system, a method must be implemented to hide e-books from researchers. Is it possible to create a rule in the Wayback system, or must it be implemented in the index system for Wayback? This has to be implemented before the Wayback index is updated with the latest Netarchive.dk content.



 Comments   
Comment by Colin Rosenthal [ 24/Jan/11 ]

The proof of concept works. It is possible to block the entire subdomain www.mtp.hum.ku.dk. Blocking
of only non-free full-texts may or may not be possible, but cannot currently be tested because they
are apparently not harvestable from the test system.

Comment by Colin Rosenthal [ 19/Jan/11 ]

In principle, blocking access to URLs in Wayback is simple. Define an
exclusion factory:
<bean id="static-exclusion" class="org.archive.wayback.accesscontrol.staticmap.StaticMapExclusionFilterFactory" init-method="init">
  <property name="file" value="/tmp/exclude.txt" />
  <property name="checkInterval" value="600" />
</bean>
and then add it to the access point definition:
<property name="exclusionFactory" ref="static-exclusion" />

The file lists the URL prefixes to be excluded. It is more difficult to see how we can enable QA of the e-books while preventing unauthorised access. One possibility is to define an additional access point for Wayback on a port visible only to the internal network.
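
To illustrate what "excluding by URL prefix" means in practice, here is a minimal Python sketch. It is not Wayback's implementation: the normalisation rule (lowercase host plus path, scheme dropped), the one-prefix-per-line file format, and the helper names normalize, load_prefixes and is_excluded are all assumptions made for the example. Only the host www.mtp.hum.ku.dk comes from this issue; the other URLs are made up.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to lowercase host plus path (assumed rule, for illustration only)."""
    # Add a scheme if missing so that urlsplit parses the host correctly.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    return host + (parts.path or "/")


def load_prefixes(lines):
    """Read one URL prefix per line, skipping blank lines (assumed file format)."""
    return [normalize(line.strip()) for line in lines if line.strip()]


def is_excluded(url, prefixes):
    """Return True if the normalised URL starts with any excluded prefix."""
    key = normalize(url)
    return any(key.startswith(prefix) for prefix in prefixes)


# Hypothetical exclusion list that blocks a whole subdomain.
prefixes = load_prefixes(["www.mtp.hum.ku.dk/", ""])

print(is_excluded("http://www.mtp.hum.ku.dk/book/1.pdf", prefixes))
print(is_excluded("http://www.ku.dk/index.html", prefixes))
```

With a list like this, every URL under the listed host is hidden, which matches the whole-subdomain blocking in the proof of concept. Blocking only some full-texts would need longer, path-specific prefixes.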

Generated at Sat Apr 20 03:02:08 CEST 2024 using Jira 9.4.15#940015-sha1:bdaa9cbecfb6791ea579749728cab771f0dfe90b.