Søren Vejrup Carlsen

Merge pull request #76 from netarchivesuite/NAS-2733

Nas 2733

All follow-up to this issue has been pushed to branch NAS-2733

Follow-up on NAS-2677 - removed H1 templates in deploy/distribution/src/main/resources/order_templates_dist/

NAS-2673 - fixed createHarvestDB.pgsql script and added documentation to the scripts. Added a deploy_standalone_example_postgresql.xml deploy script

Fixed NAS-2744 - changed HERITRIX3_VERSION to '3.3.0-BDB-5.0.x'

Fix NAS-2673. Fix incorrect creation of tables 'harvestdefinitions' and 'harvestchannel'. Include creation of SNAPSHOT and FOCUSED channels used by Quickstart

Work on NAS-2673. Updates on scripts RunNetarchiveSuite.sh and mq.sh. Moved sql-scripts from the sql folder to the folders mysql, postgresql, and derby respectively

    /deploy/deploy-core/scripts/RunNetarchiveSuite.sh (-16, +41)
    /deploy/deploy-core/scripts/derby/createArchiveDB.sql (-0, +156)
    /deploy/deploy-core/scripts/derby/createfullhddb.sql (-0, +641)
    /deploy/deploy-core/scripts/mysql/createArchiveDB.mysql (-0, +113)
    /deploy/deploy-core/scripts/mysql/createfullhddb.mysql (-0, +486)
    /deploy/deploy-core/scripts/openmq/mq.sh (-1, +18)

NAS-2733 - don't search in database if chosen filteringMethod is 'database'

NAS-2733 - lacked some imports in the jsp page

Note that the re-enabled "old" filtering method gives you a "Domain X is not registered!" error if domain X is not registered in the Domains table of the harvest database

NAS-2733 - Forgot to update the complete_settings.xml file. Added the default value for tempPath to HarvesterSettings class

In the latest commit, I have re-enabled the old filtering method, which uses a lookup in the database, and made it the default method.
There is now a setting for the filtering method used by the Harveststatus-running.jsp page:
settings.harvester.webinterface.runningjobsFilteringMethod

If this value is set to "cachedLogs", the filtering is done by searching in the cached crawl logs.
If this value is set to "database", the filtering is done by searching in the database.
The latter is the default value in the settings.
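
A minimal sketch of how the two setting values could be mapped to a filtering method. This is not the actual NetarchiveSuite code: the class RunningJobsFilterSketch, the enum FilteringMethod and the method parse are hypothetical names. Treating a missing value as "database" and rejecting unknown values are assumptions.

```java
/** Hypothetical sketch; not the actual NetarchiveSuite implementation. */
final class RunningJobsFilterSketch {

    /** The two filtering methods named in the comment above. */
    enum FilteringMethod { CACHED_LOGS, DATABASE }

    /**
     * Maps the raw value of settings.harvester.webinterface.runningjobsFilteringMethod
     * to a filtering method. A missing or blank value yields DATABASE (assumption,
     * based on "database" being the default value in the settings).
     */
    static FilteringMethod parse(String settingValue) {
        if (settingValue == null || settingValue.trim().isEmpty()) {
            return FilteringMethod.DATABASE;
        }
        switch (settingValue.trim()) {
            case "cachedLogs":
                return FilteringMethod.CACHED_LOGS;
            case "database":
                return FilteringMethod.DATABASE;
            default:
                // Assumption: fail loudly on an unknown value instead of guessing.
                throw new IllegalArgumentException(
                        "Unknown filtering method: " + settingValue);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("cachedLogs"));
        System.out.println(parse(null));
    }
}
```

The JSP page could then branch on the returned value to choose between searching the cached crawl logs and querying the database.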

Fixed in the latest commit

Fixed in the latest commit

Fixed in the latest commit to branch NAS-2733

Made fix for NAS-2735. Work on NAS-2733 - Enabled filtering in running jobs using previously used FindRunningJobQuery class

In 5.2.2, the search for jobs harvesting a domain
used ./harvester/harvester-core/src/main/java/dk/netarkivet/harvester/webinterface/FindRunningJobQuery.java
and this code in Harveststatus-running.jsp:

 FindRunningJobQuery findJobQuery = new FindRunningJobQuery(request);
    Long[] jobIdsForDomain = findJobQuery.getRunningJobIds();

and further down:

<% if (jobIdsForDomain.length > 0) { %>
<br/>
<table class="selection_table_small">
<tr>
    <th><fmt:message key="running.jobs.finder.table.jobId"/></th>
</tr>
<% for (long jobId : jobIdsForDomain) {
    String jobDetailsLink = "Harveststatus-jobdetails.jsp?"
       + Constants.JOB_PARAM + "=" + jobId;
%>
<tr><td><a href="<%=jobDetailsLink%>"><%=jobId%></a></td></tr>
<% } %>
</table>
<% } else {

    //after using the search button "searchDone" !=null
    String searchDone = request.getParameter("searchDone");
    if (searchDone != null) { %>
    	 <fmt:message key="table.job.no.jobs"/>

<% } %>
<% } %>
<% } %>
<%
 HTMLUtils.generateFooter(out);
%>


Currently the constructor FindRunningJobQuery(ServletRequest req) is no longer called in Harveststatus-running.jsp.

Also, when using the new search form, jobs not found in the cached logs are not shown at all.
In the old version, the jobs matching the search were shown in a table of their own.

However, we could reintroduce the old method without too much work.
But what do we want here?

Fixed trivial issue NAS-2741. It wrote out useless warnings to the log

But in that case it is only useful, if caching is enabled for all the H3 hosts

After Sara's comments last Friday, I would still like to know what the initial requirements for this feature were.
According to BNF, they would have preferred a filter that searched in the harvest database instead of in the cached crawl logs.
As it is now, it is very confusing.

I think it should be possible to disable this feature.

There was no logging at all for the NASEnvironment class. And I didn't want to waste any time figuring out how logging should work in NICL's framework.
So I've added my own framework.
So, if you think the method name writeLog is confusing, you're welcome to find another name for it.

Merge these two loglines

If the directory represented by tempPath doesn't exist, we should create it instead of falling back to /tmp
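
A minimal sketch of that suggestion, using only the standard java.nio.file API. The class TempPathSketch and the method ensureTempDir are hypothetical names, not part of NetarchiveSuite; how the tempPath value is read from the settings is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch; not the actual NetarchiveSuite implementation. */
final class TempPathSketch {

    /**
     * Returns the directory named by tempPath, creating it (and any missing
     * parent directories) instead of silently falling back to /tmp.
     */
    static Path ensureTempDir(String tempPath) throws IOException {
        Path dir = Paths.get(tempPath);
        // createDirectories does nothing if the directory already exists,
        // and throws if the path exists but is not a directory.
        return Files.createDirectories(dir);
    }
}
```

Letting the IOException propagate makes a misconfigured or unwritable tempPath visible to the caller, rather than hiding it behind a fallback directory.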

This value is always set to "", even if we just searched for jobs harvesting a specific domain.
We should replace "" with the value of searchedDomainName if it is not null

Created https://sbforge.org/jira/browse/NAS-2735 for this
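
The replacement suggested above could look roughly like this. The class SearchFieldSketch and the method searchFieldValue are hypothetical names; only searchedDomainName comes from the comment.

```java
/** Hypothetical sketch; not the actual NetarchiveSuite implementation. */
final class SearchFieldSketch {

    /**
     * Returns the value to show in the search field: the domain that was just
     * searched for, or "" when no search has been done (searchedDomainName is null).
     */
    static String searchFieldValue(String searchedDomainName) {
        return searchedDomainName != null ? searchedDomainName : "";
    }
}
```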

This fix is really not related to the bug, but I couldn't let this typo stay in the codebase

This line will no longer work, as NASEnvironment.getCrawledUrls no longer reads the path from h3Job.crawlLogFilePath

I've added my own logging to the NASEnvironment class to find out what was going on. This is written to log/h3monitor.log

Work on NAS-2733 - now searching for the domain in the cached logs. Only does the lookup if the crawl log is cached. If there is no caching, the job will not be shown on the result page

Fixed NAS-2726