NetarchiveSuite-Github

Note that the harvestName needs to be taken afterwards from the attributeMap generated during the parsing of the multidata request

I sort of followed your suggestion

I've only looked over this quickly, but it looks to me as though the request contains only a payload when you add seeds from a file (checked with the Firefox Web Console). There are no parameters. ADD_SEEDS_PARAM will therefore be null, so you will never process the "else" statement. Instead, the logic should start:

 if (isMultiPart) {

} else {

}
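
A minimal sketch of the branching suggested above, assuming the multipart check is made on the request's Content-Type header. Only ADD_SEEDS_PARAM and isMultiPart come from the comment; the class, the method names, and the plain-string arguments are hypothetical stand-ins for the real servlet request and its uploaded file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical illustration; not the actual NetarchiveSuite code. */
class SeedRequestSketch {

    /** True when the Content-Type header marks a multipart upload. */
    static boolean isMultipart(String contentType) {
        return contentType != null
                && contentType.toLowerCase(Locale.ROOT).startsWith("multipart/");
    }

    /**
     * Chooses the seed source by request type first, rather than by testing
     * whether the seeds parameter is null.
     * uploadedPayload stands in for the uploaded file content and
     * addSeedsParam for the value of ADD_SEEDS_PARAM (both assumptions).
     */
    static List<String> extractSeeds(String contentType,
                                     String uploadedPayload,
                                     String addSeedsParam) {
        String raw;
        if (isMultipart(contentType)) {
            raw = uploadedPayload;   // seeds arrive as a file payload
        } else {
            raw = addSeedsParam;     // seeds arrive as an ordinary form parameter
        }
        List<String> seeds = new ArrayList<>();
        if (raw == null) {
            return seeds;
        }
        // One seed per line; blank lines are skipped.
        for (String line : raw.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                seeds.add(trimmed);
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        System.out.println(extractSeeds("multipart/form-data; boundary=x",
                "http://a.example/\nhttp://b.example/", null));
        System.out.println(extractSeeds("application/x-www-form-urlencoded",
                null, "http://c.example/"));
    }
}
```

The point of the sketch is only the order of the checks: the multipart case is handled before anything looks at the parameter, so a null parameter no longer hides the file upload.
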
Merged with master

Marking release 5.2.3 RC1

  1. … 25 more files in changeset.
Added cleanup of h3 files after post-processing - to 5.2.3 release

I like it.

Fixes for NAS-2579 (TLD-SPAM), NAS-2686 (postprocessing fix), and NAS-2687 (indexing fix)

  1. … 30 more files in changeset.
I'm a bit worried about how well reading the URI list into an array will scale. Can't you process the underlying InputStream one line at a time, i.e. have getCrawledUrls return some sort of closeable iterator?
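
One way such a closeable iterator could look, as a rough sketch only. The class name CrawledUrlIterator is hypothetical, and the sketch assumes the stream holds one URI per line in UTF-8; the real getCrawledUrls may differ.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical line-by-line iterator; not part of NetarchiveSuite. */
class CrawledUrlIterator implements Iterator<String>, Closeable {
    private final BufferedReader reader;
    private String next; // the line that the following next() call will return

    CrawledUrlIterator(InputStream in) {
        this.reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8));
        advance();
    }

    /** Reads one line ahead so hasNext() can answer without side effects. */
    private void advance() {
        try {
            next = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        advance();
        return current;
    }

    @Override
    public void close() throws IOException {
        reader.close(); // also closes the underlying InputStream
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "http://a.example/\nhttp://b.example/\n"
                        .getBytes(StandardCharsets.UTF_8));
        // try-with-resources releases the stream even if processing fails.
        try (CrawledUrlIterator urls = new CrawledUrlIterator(in)) {
            while (urls.hasNext()) {
                System.out.println(urls.next());
            }
        }
    }
}
```

Only one line is held in memory at a time, so the memory cost no longer grows with the length of the URI list.
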

Find which jobs are harvesting a given domain - even if the domain is not in the seedlist.
NAS-2683: Adding seeds to an event harvest fails if the list of seeds contains invalid seeds
I have now changed the functionality of Definitions-add-event-seeds.jsp and Definitions-edit-selective-harvest.jsp.
The HTML form from Definitions-add-event-seeds.jsp is now sent to the Definitions-edit-selective-harvest.jsp script.
Invalid seeds found during the process are now shown in a text area, just as we already do for invalid domains.

NOTE: It does not work when we try to read the data from a file.
Nothing is written in the logs, although I have tried adding extra log entries.
The form data is just ignored, and Definitions-add-event-seeds.jsp is rendered as though you forgot to transmit the harvestName to the form!

We should also check deployment

Yes, but
i) I wanted to be able to log the duplicates because we still don't understand what's really happening, and
ii) what if the same id is returned by both queries, i.e. for some reason the same harvestId shows up as both fullharvest and partialharvest? It should never happen, but the schema actually allows it.
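
The two concerns above can be sketched roughly as follows. This assumes the results of the two queries are available as lists of ids; the class and method names are hypothetical and the logging uses java.util.logging purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.logging.Logger;

/** Hypothetical illustration; not the actual NetarchiveSuite code. */
class HarvestIdMerge {
    private static final Logger LOG =
            Logger.getLogger(HarvestIdMerge.class.getName());

    /**
     * Merges the ids from both queries, keeps first-seen order, and logs
     * every id that shows up more than once, whether inside one result
     * list or across the two.
     */
    static List<Long> mergeAndLogDuplicates(List<Long> fullHarvestIds,
                                            List<Long> partialHarvestIds) {
        Set<Long> seen = new LinkedHashSet<>();
        for (List<Long> ids : Arrays.asList(fullHarvestIds, partialHarvestIds)) {
            for (Long id : ids) {
                // Set.add returns false when the id was already present.
                if (!seen.add(id)) {
                    LOG.warning("Duplicate harvest id " + id);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Long> merged = mergeAndLogDuplicates(
                Arrays.asList(1L, 2L, 2L), Arrays.asList(2L, 3L));
        System.out.println(merged);
    }
}
```

Deduplicating in code rather than inside each query keeps the duplicates visible for logging, and it also covers an id that is returned by both queries.
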

Nothing. I just wanted to put the configuration key in a variable so I could log it.

no followup required

No followup required

What do you want to prevent here?

Instead of this, just use SELECT distinct in lines 672 and 676

Max number of running h3 instances. So 80 or so now?

Is it possible to define the configuration and executions tags in pluginManagement so they don't need to be repeated in each module?
