Details
-
New Feature
-
Resolution: Unresolved
-
Major
-
2.0
-
None
Description
In the PLIGT system we have harvested a lot of data from www.bs.dk.
None of it (except the front page) can be viewed in viewerproxy.
Heritrix encodes URLs like:
http://www.bs.dk/showfile.aspx?IdGuid=
to
http://www.bs.dk/showfile.aspx?IdGuid=%7BBB0455A5-4BA9-4054-8EC3-4251813B96F4%7D
but when browsing (with IE; other browsers have not been checked) the braces are
not encoded by the browser, so viewerproxy finds nothing for the requested URL.
A possible fix:
have CDXReader.getKey(String uri) URL-encode the uri before calling BinSearch.
It should be checked how this would affect URIs that are already URL-encoded;
maybe only some characters should be encoded.
BJA remembers doing a similar thing in the special RoyalWedding branch of viewerproxy.
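
A minimal sketch of the "only some characters" idea, assuming the lookup key can be normalized before the binary search. The class UriKeyEncoder, the method encodeSelected and the character set TO_ESCAPE are hypothetical names chosen for illustration; they are not part of the existing code, and the exact set of characters Heritrix escapes would need to be verified. Because '%' is deliberately left alone, a URI that is already encoded passes through unchanged instead of being double-encoded.

```java
/** Hypothetical helper for illustration; not an existing viewerproxy class. */
final class UriKeyEncoder {
    // Assumed set of characters that the crawler escapes but a browser may
    // send raw. '%' is intentionally absent so existing %XX escapes survive.
    private static final String TO_ESCAPE = "{}|\\^`\"<> ";

    private UriKeyEncoder() {
    }

    /** Percent-encodes only the characters listed in TO_ESCAPE. */
    static String encodeSelected(String uri) {
        StringBuilder out = new StringBuilder(uri.length() + 16);
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (TO_ESCAPE.indexOf(c) >= 0) {
                // Upper-case hex, matching the %7B / %7D form seen in the index.
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this, CDXReader.getKey(String uri) could call UriKeyEncoder.encodeSelected(uri) before the lookup, so that a request carrying raw braces maps to the same key as the encoded form stored in the index. Non-ASCII characters are not handled in this sketch.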
NOTE: This bug is originally from Bugzilla bug_id=623.
This bug was previously assigned to Unassigned.