[NAS-2690] The function "Browse only relevant crawl-log lines for this domain" is faulty Created: 08/Jan/18 Updated: 24/Apr/18 Resolved: 24/Apr/18 |
Status: | Closed |
Project: | NetarchiveSuite |
Component/s: | GUI |
Affects Version/s: | 5.2.2, 5.3.1 |
Fix Version/s: | 5.4 |
Type: | Bug | Priority: | Minor |
Reporter: | Søren Vejrup Carlsen (Inactive) | Assignee: | Søren Vejrup Carlsen (Inactive) |
Resolution: | Fixed | ||
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
External reference: | https://sbprojects.statsbiblioteket.dk/jira/browse/NARK-1212 |
Sprint: | NAS 5.4 |
Verification: | Test that the new domain-specific regexp returns the correct lines.
Description |
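The code in harvester/qa-gui/src/main/webapp/QA-searchcrawllog.jsp is faulty:

    if (regexp != null && regexp.length() != 0) {
        crawlLogExtract = Reporting.getCrawlLoglinesMatchingRegexp(jobid, regexp);
    } else {
        // use 'domain' as the regular expression
        regexp = ".*" + domain.replaceAll("\\.", "\\\\.") + ".*";
        crawlLogExtract = Reporting.getCrawlLoglinesMatchingRegexp(jobid, regexp);
    }

The regexp built in the else branch is the one used by the "Browse only relevant crawl-log lines for this domain" functionality. Because it only wraps the escaped domain name in ".*", it selects every line in which the domain name occurs anywhere as a substring, for example a line for a different host such as www.notexample.dk when browsing example.dk.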
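The sketch below reproduces the else branch outside the JSP so the over-matching can be tried in isolation. It assumes that Reporting.getCrawlLoglinesMatchingRegexp selects a line when the whole line matches the pattern (hence Matcher.matches()); the class name OldDomainRegexpDemo, the domain example.dk and the shortened sample lines are hypothetical and not taken from NetarchiveSuite.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical demo class: rebuilds the old fallback regexp from
 * QA-searchcrawllog.jsp and shows which sample lines it selects.
 */
public class OldDomainRegexpDemo {

    /** Same construction as the else branch: escape the dots, wrap in ".*". */
    static String oldRegexp(String domain) {
        return ".*" + domain.replaceAll("\\.", "\\\\.") + ".*";
    }

    /** Assumption: a line is selected when the whole line matches the pattern. */
    static boolean selects(String regexp, String line) {
        return Pattern.compile(regexp).matcher(line).matches();
    }

    public static void main(String[] args) {
        String regexp = oldRegexp("example.dk");
        // Hypothetical, shortened crawl-log lines (not real log data).
        List<String> lines = List.of(
                "200 1234 http://www.example.dk/index.html",
                "200 5678 http://www.notexample.dk/index.html",
                "200 910 http://example.com/mirror/example.dk/page.html");
        for (String line : lines) {
            System.out.println(selects(regexp, line) + "  " + line);
        }
    }
}
```

All three sample lines are selected, although only the first one is served from example.dk.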
Comments |
Comment by Søren Vejrup Carlsen (Inactive) [ 06/Mar/18 ] |
Do you still want a review, or is it enough to read the diff? https://sbforge.org/fisheye/changelog/NetarchiveSuite-Github?cs=33f143e1f05312303548b4f512202b071032a374
Comment by Colin Rosenthal [ 02/Feb/18 ] |
Is there a review? If so, it doesn't seem to be linked to the issue. |
Comment by Søren Vejrup Carlsen (Inactive) [ 16/Jan/18 ] |
This code is ready for review.
Comment by Søren Vejrup Carlsen (Inactive) [ 12/Jan/18 ] |
A working regular expression has now been found:

    ".*(https?:\\/\\/(www\\.)?|dns:|ftp:\\/\\/)([\\w_-]+\\.)?([\\w_-]+\\.)?([\\w_-]+\\.)?" + domain.replaceAll("\\.", "\\\\.") + "($|\\/|\\w|\\s).*";
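To check that the domain-specific regexp returns the correct lines, the old and new expressions can be compared side by side. This is a sketch under stated assumptions: whole-line matching via Matcher.matches(), a hypothetical class DomainRegexpComparison, and made-up, shortened crawl-log lines; the new regexp string is copied from the comment above. The new expression requires the domain to follow a scheme prefix (http://, https://, ftp:// or dns:), optionally with "www." and up to three further host labels.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical demo class comparing the old fallback regexp with the one
 * proposed in the 12/Jan/18 comment.
 */
public class DomainRegexpComparison {

    /** Old fallback: the escaped domain anywhere in the line. */
    static String oldRegexp(String domain) {
        return ".*" + domain.replaceAll("\\.", "\\\\.") + ".*";
    }

    /** New regexp: scheme prefix, optional www., up to three labels, domain, boundary. */
    static String newRegexp(String domain) {
        return ".*(https?:\\/\\/(www\\.)?|dns:|ftp:\\/\\/)([\\w_-]+\\.)?([\\w_-]+\\.)?([\\w_-]+\\.)?"
                + domain.replaceAll("\\.", "\\\\.") + "($|\\/|\\w|\\s).*";
    }

    /** Assumption: a line is selected when the whole line matches the pattern. */
    static boolean selects(String regexp, String line) {
        return Pattern.compile(regexp).matcher(line).matches();
    }

    public static void main(String[] args) {
        String domain = "example.dk";
        // Hypothetical, shortened crawl-log lines (not real log data).
        String[] lines = {
            "200 1234 http://www.example.dk/index.html",
            "200 5678 http://www.notexample.dk/index.html",
            "1 62 dns:sub.example.dk"
        };
        for (String line : lines) {
            System.out.println("old=" + selects(oldRegexp(domain), line)
                    + " new=" + selects(newRegexp(domain), line) + "  " + line);
        }
    }
}
```

For the www.notexample.dk line the old regexp reports true and the new one false; the other two lines are selected by both.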