[NAS-2550] Don't try to show large warc-files in the browser but save them to a file
Created: 13/Sep/16  Updated: 03/Nov/16  Resolved: 19/Oct/16

Status: Resolved
Project: NetarchiveSuite
Component/s: GUI, Viewerproxy
Affects Version/s: 5.1
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Søren Vejrup Carlsen (Inactive) Assignee: Søren Vejrup Carlsen (Inactive)
Resolution: Fixed  
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

External reference:

https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1138

Sprint: NAS 5.2
Verification:

I didn't check how sensitive this is to the file-size parameter, but warc.gz files are certainly saved to disk now, in the release test cases.


 Description   

The links presented by the QA-getfiles.jsp (e.g. http://kb-prod-adm-001.kb.dk:8080/QA/QA-getfiles.jsp?jobid=29234&harvestprefix=29234-76)

<p>The links below will only work if your browser is set up to use the viewerproxy as web proxy.</p>
    <a href="http://netarchivesuite.viewerproxy.invalid/getFile?arcFile=29234-76-20080606102834-00000-sb-prod-har-002.statsbiblioteket.dk.arc">29234-76-20080606102834-00000-sb-prod-har-002.statsbiblioteket.dk.arc</a><br>
    <a href="http://netarchivesuite.viewerproxy.invalid/getFile?arcFile=29234-76-20080606102834-00001-sb-prod-har-002.statsbiblioteket.dk.arc">29234-76-20080606102834-00001-sb-prod-har-002.statsbiblioteket.dk.arc</a><br>
    <a href="http://netarchivesuite.viewerproxy.invalid/getFile?arcFile=29234-76-20080606103124-00002-sb-prod-har-002.statsbiblioteket.dk.arc">29234-76-20080606103124-00002-sb-prod-har-002.statsbiblioteket.dk.arc</a><br>

ask the viewerproxy to fetch a single (w)arc-file and show it in the browser.

If the file is big, this will crash the browser.

The solution: if the file is big, don't try to show it in the browser; let the browser save it to a file instead.
We already do this for big records.
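A minimal sketch of the size check, assuming the decision is a plain comparison against a size threshold (the maxSizeInBrowser setting discussed in the comments) and that it is expressed through the standard HTTP Content-Disposition header: "inline" lets the browser try to render the content, "attachment" asks it to save the content to a file. The class and method names are hypothetical, not the actual viewerproxy code, and treating a file exactly at the limit as small enough for the browser is also an assumption.

```java
/**
 * Hypothetical helper: decides whether the viewerproxy should let the browser
 * display a (w)arc file or record, or ask the browser to save it to disk.
 * Not NetarchiveSuite's actual implementation.
 */
final class BrowserDeliveryPolicy {

    private BrowserDeliveryPolicy() {
    }

    /**
     * Returns the Content-Disposition type to send for the given size.
     *
     * @param sizeInBytes      size of the file or record being served
     * @param maxSizeInBrowser largest size we are willing to show in the browser
     * @return "inline" if sizeInBytes <= maxSizeInBrowser, otherwise "attachment"
     */
    static String dispositionType(long sizeInBytes, long maxSizeInBrowser) {
        if (sizeInBytes < 0 || maxSizeInBrowser < 0) {
            throw new IllegalArgumentException("Sizes must be non-negative");
        }
        // Assumption: a file exactly at the limit is still shown in the browser.
        return sizeInBytes > maxSizeInBrowser ? "attachment" : "inline";
    }
}
```

For example, with a 10 MB threshold, dispositionType(25_000_000L, 10_000_000L) returns "attachment", so a 25 MB warc file would be offered as a download instead of being rendered.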



 Comments   
Comment by Søren Vejrup Carlsen (Inactive) [ 13/Sep/16 ]

Both the new feature and the existing feature used by GetRecord use the maxSizeInBrowser setting, which defaults to 100000000 (100 MB). I suggest reducing it to 10 MB.

<viewerproxy>
    <baseDir>viewerproxy</baseDir>
    <tryLookupUriAsFtp>false</tryLookupUriAsFtp>
    <maxSizeInBrowser>100000000</maxSizeInBrowser>
</viewerproxy>
Comment by Søren Vejrup Carlsen (Inactive) [ 13/Sep/16 ]

I have now reduced the default to 10 MB
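To illustrate how the setting is interpreted, here is a sketch that reads maxSizeInBrowser from a viewerproxy settings fragment like the one above using the JDK's DOM parser, falling back to a caller-supplied default when the element is absent. This is only an illustration: NetarchiveSuite reads its settings through its own settings framework, and the class name is hypothetical. The value is a byte count, so in the same decimal convention as 100000000 (100 MB), 10 MB corresponds to 10000000.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for the maxSizeInBrowser element; not the real settings API. */
final class ViewerproxySettingsSketch {

    private ViewerproxySettingsSketch() {
    }

    /**
     * Returns the maxSizeInBrowser value (in bytes) from the given XML fragment,
     * or defaultValue if the element is not present.
     */
    static long maxSizeInBrowser(String viewerproxyXml, long defaultValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(viewerproxyXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("maxSizeInBrowser");
            if (nodes.getLength() == 0) {
                return defaultValue;
            }
            // Trim so that indented XML like the fragment above still parses.
            return Long.parseLong(nodes.item(0).getTextContent().trim());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse viewerproxy settings", e);
        }
    }
}
```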

Comment by Søren Vejrup Carlsen (Inactive) [ 13/Sep/16 ]

If the arcFile parameter is missing, we get this error:

Internal server error for: http://netarchivesuite.viewerproxy.invalid/getFile?

dk.netarkivet.common.exceptions.IOFailure: Missing parameter 'arcFile'
	at dk.netarkivet.viewerproxy.GetDataResolver.getParameter(GetDataResolver.java:246)
	at dk.netarkivet.viewerproxy.GetDataResolver.doGetFile(GetDataResolver.java:211)
	at dk.netarkivet.viewerproxy.GetDataResolver.executeCommand(GetDataResolver.java:115)
	at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:72)
	at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:74)
	at dk.netarkivet.viewerproxy.WebProxy.handle(WebProxy.java:144)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Comment by Søren Vejrup Carlsen (Inactive) [ 13/Sep/16 ]

If we use the link

http://netarchivesuite.viewerproxy.invalid/getFile?arcFile=

or

http://netarchivesuite.viewerproxy.invalid/getFile?arcFile

we get this error:

Internal server error for: http://netarchivesuite.viewerproxy.invalid/getFile?arcFile=
dk.netarkivet.common.exceptions.ArgumentNotValid: The value of the variable 'arcfilename' must not be an empty string.
	at dk.netarkivet.common.exceptions.ArgumentNotValid.checkNotNullOrEmpty(ArgumentNotValid.java:63)
	at dk.netarkivet.archive.arcrepository.distribute.JMSArcRepositoryClient.getFile(JMSArcRepositoryClient.java:203)
	at dk.netarkivet.viewerproxy.GetDataResolver.doGetFile(GetDataResolver.java:218)
	at dk.netarkivet.viewerproxy.GetDataResolver.executeCommand(GetDataResolver.java:115)
	at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:72)
	at dk.netarkivet.viewerproxy.CommandResolver.lookup(CommandResolver.java:74)
	at dk.netarkivet.viewerproxy.WebProxy.handle(WebProxy.java:144)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Comment by Søren Vejrup Carlsen (Inactive) [ 13/Sep/16 ]

So the parameter check should also reject an empty argument.

Comment by Søren Vejrup Carlsen (Inactive) [ 14/Sep/16 ]

It does that now
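A sketch of a parameter check that rejects both failure cases shown above: a query with no arcFile at all, and one where arcFile is present but empty (getFile?arcFile= or getFile?arcFile). It is a hypothetical stand-alone version, not the actual GetDataResolver.getParameter code, and it throws IllegalArgumentException where the viewerproxy throws IOFailure or ArgumentNotValid.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical query-parameter helpers for a getFile-style command. */
final class GetFileParameterSketch {

    private GetFileParameterSketch() {
    }

    /** Parses "a=1&b" into {a=1, b=""}; a name without '=' maps to an empty value. */
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new HashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                    URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    /** Rejects both a missing parameter and one that is present but empty. */
    static String requireNonEmpty(Map<String, String> params, String name) {
        String value = params.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing parameter '" + name + "'");
        }
        if (value.trim().isEmpty()) {
            throw new IllegalArgumentException("Empty parameter '" + name + "'");
        }
        return value;
    }
}
```

Because "arcFile" without "=" is parsed as an empty value, a single emptiness check covers both of the links that previously produced the ArgumentNotValid error.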

Comment by Colin Rosenthal [ 15/Sep/16 ]

Did you commit this to master already? For me it already saves files to disk (I'm testing with warc.gz), but the files always get the name "getFile".

Comment by Søren Vejrup Carlsen (Inactive) [ 06/Oct/16 ]

Yes, I did.
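The name "getFile" is most likely what the browser derives from the last path segment of the URL when the response carries no file name. A common remedy is to send a Content-Disposition header with an explicit filename parameter; the sketch below builds such a header from the arcFile value. The class name is hypothetical, and the sketch only accepts printable ASCII names without quotes or backslashes; non-ASCII names would need the filename* form described in RFC 6266.

```java
/** Hypothetical builder for a Content-Disposition header that names the download. */
final class ContentDispositionSketch {

    private ContentDispositionSketch() {
    }

    /**
     * Returns e.g. attachment; filename="29234-76-...arc" so the browser saves the
     * file under its archive name instead of the URL path segment "getFile".
     */
    static String attachmentHeader(String fileName) {
        if (fileName == null || fileName.isEmpty()) {
            throw new IllegalArgumentException("File name must not be empty");
        }
        for (char c : fileName.toCharArray()) {
            // Keep the quoted-string simple: printable ASCII only, no quote or backslash.
            if (c == '"' || c == '\\' || c < 0x20 || c > 0x7e) {
                throw new IllegalArgumentException("Unsupported character in file name: " + fileName);
            }
        }
        return "attachment; filename=\"" + fileName + "\"";
    }
}
```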

Generated at Mon Dec 09 07:15:55 CET 2019 using Jira 7.13.8#713008-sha1:1606a5c1e7006e1ab135aac81f7a9566b2dbc3a6.