NetarchiveSuite-Github

Note that the re-enabled "old" filtering method gives you a

 Domain X is not registered! 

error if domain X is not registered in the Domains table of the harvest database.

In the latest commit, I have re-enabled the old filtering method, which uses a lookup in the database, and made this the default method.
There is now a setting for the filtering method used by the Harveststatus-running.jsp page:
settings.harvester.webinterface.runningjobsFilteringMethod

If this value is set to "cachedLogs", the filtering is done by searching in the cached crawl logs;
if this value is set to "database", the filtering is done by searching in the database.
The latter is the default value in the settings.
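Assuming the dotted setting name maps onto nested elements in the settings XML the way other NetarchiveSuite settings do (an assumption on my part, not verified against the schema), an override would look something like:

```xml
<settings>
  <harvester>
    <webinterface>
      <!-- "database" (the default) or "cachedLogs" -->
      <runningjobsFilteringMethod>database</runningjobsFilteringMethod>
    </webinterface>
  </harvester>
</settings>
```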

Fixed in the latest commit

Fixed in the latest commit

Fixed in the latest commit to branch NAS-2733

In 5.2.2, the search for jobs harvesting a domain used
./harvester/harvester-core/src/main/java/dk/netarkivet/harvester/webinterface/FindRunningJobQuery.java
and this code in Harveststatus-running.jsp:

 FindRunningJobQuery findJobQuery = new FindRunningJobQuery(request);
    Long[] jobIdsForDomain = findJobQuery.getRunningJobIds();

and further down:

<% if (jobIdsForDomain.length > 0) { %>
<br/>
<table class="selection_table_small">
<tr>
    <th><fmt:message key="running.jobs.finder.table.jobId"/></th>
</tr>
<% for (long jobId : jobIdsForDomain) {
    String jobDetailsLink = "Harveststatus-jobdetails.jsp?"
       + Constants.JOB_PARAM + "=" + jobId;
%>
<tr><td><a href="<%=jobDetailsLink%>"><%=jobId%></a></td></tr>
<% } %>
</table>
<% } else {

    //after using the search button "searchDone" !=null
    String searchDone = request.getParameter("searchDone");
    if (searchDone != null) { %>
    	 <fmt:message key="table.job.no.jobs"/>

<% } %>
<% } %>
<% } %>
<%
 HTMLUtils.generateFooter(out);
%>


Currently the constructor FindRunningJobQuery(ServletRequest req) is no longer called in Harveststatus-running.jsp.

And also, when using the new search form, jobs not found in the cached logs are not shown at all.
In the old version, the jobs matching the search were shown in a table of their own.

However, we could introduce the old method again without too much work.
But what do we want here?

But in that case it is only useful if caching is enabled for all the H3 hosts.

I was thinking of a use case where a domain-owner/host complains that a harvest is too aggressive, but we don't know which job is doing the harvesting - for example of inline content. So we want to search in the crawl logs.

After Sara's comments last Friday, I would still like to know what the initial requirements for this feature were.
According to BNF, they would have preferred a filter that searched in the harvest database instead of in the cached crawl logs.
As it is now, it is very confusing.

I think it should be possible to disable this feature.

There was no logging at all for the NASEnvironment class, and I didn't want to waste any time figuring out how logging should work in NICL's framework.
So I've added my own framework.
So, if you think the method name writeLog is confusing, you're welcome to find another name for it.

I'm getting confused because the word "logging" is being used both for diagnostic logging, and for crawl-logs. So is this intended to be a fix for the underlying bug, or just added diagnostic logging to find out what is going on?

Merge these two log lines.

We should create the directory represented by tempPath if it doesn't exist, instead of falling back to /tmp.
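A minimal sketch of that behaviour; the resolveTempDir helper and the class around it are hypothetical, only tempPath comes from the discussion:

```java
import java.io.File;
import java.io.IOException;

public class TempPathCheck {
    // Ensure the configured tempPath directory exists, creating it on demand
    // instead of silently falling back to /tmp.
    static File resolveTempDir(String tempPath) throws IOException {
        File dir = new File(tempPath);
        if (dir.isDirectory()) {
            return dir;
        }
        if (dir.mkdirs()) { // create the configured directory (and parents) on demand
            return dir;
        }
        throw new IOException("Could not create temp dir: " + tempPath);
    }

    public static void main(String[] args) throws IOException {
        File dir = resolveTempDir(System.getProperty("java.io.tmpdir") + "/nas-demo");
        System.out.println(dir.isDirectory()); // prints "true"
    }
}
```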

Work on NAS-2733 - now searching for the domain in the cached logs. Only does lookup, if the crawlog...
This value is always set to "", even if we just searched for jobs harvesting a specific domain.
We should replace "" with the value of searchedDomainName if it is not null.
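A minimal sketch of the suggested fix; searchedDomainName is the variable named above, while the helper and class around it are hypothetical:

```java
public class DomainValueDemo {
    // Use searchedDomainName when it is set, and only fall back to the
    // empty string when it is null.
    static String domainFieldValue(String searchedDomainName) {
        return (searchedDomainName != null) ? searchedDomainName : "";
    }

    public static void main(String[] args) {
        System.out.println(domainFieldValue("netarkivet.dk")); // prints "netarkivet.dk"
        System.out.println(domainFieldValue(null).isEmpty());  // prints "true"
    }
}
```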

Created https://sbforge.org/jira/browse/NAS-2735 for this

This fix is really not related to the bug, but I couldn't let this typo stay in the codebase

This line will no longer work, as NASEnvironment.getCrawledUrls no longer reads the path from h3Job.crawlLogFilePath

I've added my own logging to the NASEnvironment class to find out what was going on. This is written to log/h3monitor.log

It looks like C++ syntax, and I guess it does a .toString().
I have just used parts of the existing implementation.

I had quite a bit of trouble testing for the empty string. This worked for me.

Tricky. The commented-out code is convenient when it runs together with the rest of the GUI.
And what runs now is best when it runs standalone and has to work against the database.
So it would be nice if it could detect whether it is running standalone or in the GUI.

It works fine if we get a jobmonitor object.
Otherwise it tries again after 1 minute.
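A minimal sketch of that retry pattern; the obtainWithRetry helper is hypothetical, and in the actual code the delay would be one minute rather than milliseconds:

```java
import java.util.concurrent.Callable;

public class RetryDemo {
    // Try to obtain a value (e.g. a jobmonitor object); if the supplier
    // returns null, wait and try again, up to maxAttempts times.
    static <T> T obtainWithRetry(Callable<T> supplier, int maxAttempts,
                                 long delayMillis) throws Exception {
        for (int attempt = 1; ; attempt++) {
            T result = supplier.call();
            if (result != null) {
                return result;
            }
            if (attempt >= maxAttempts) {
                throw new IllegalStateException("gave up after " + maxAttempts + " attempts");
            }
            Thread.sleep(delayMillis); // in production this would be 60_000 ms
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated supplier: fails twice, then succeeds on the third call.
        String monitor = obtainWithRetry(
                () -> ++calls[0] < 3 ? null : "jobmonitor", 5, 10);
        System.out.println(monitor + " after " + calls[0] + " attempts");
    }
}
```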

Are exceptions in Java normally internationalized? Hmm...

Damn! I don't know...

I'll throw in a TODO. There could potentially be far too many useless log messages if the network or H3 is down.

The index file holds a long value for each line start in the text/crawllog file.
But the start of the first line is never found, because the routine below uses linefeeds to update the index file.
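A minimal sketch of the issue and the obvious fix; the class and method names are hypothetical. Seeding the index with offset 0 covers the first line, which starts before any linefeed:

```java
import java.util.ArrayList;
import java.util.List;

public class LineIndexDemo {
    // Build the list of byte offsets where lines start. Scanning only for
    // '\n' yields offsets *after* each linefeed, so offset 0 (the start of
    // the first line) must be added explicitly.
    static List<Long> lineStartOffsets(byte[] data) {
        List<Long> offsets = new ArrayList<>();
        if (data.length > 0) {
            offsets.add(0L); // first line starts at offset 0, before any linefeed
        }
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n' && i + 1 < data.length) {
                offsets.add((long) (i + 1)); // next line starts just after the linefeed
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        byte[] log = "url1\nurl2\nurl3\n".getBytes();
        System.out.println(lineStartOffsets(log)); // prints "[0, 5, 10]"
    }
}
```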

Possibly; I doubt a job gets created if it cannot find a channel, but you never know.

No meta refresh header in the HTML output.

Javadoc has now been added to this and the SearchResult class.