Colin Rosenthal

I think this is a single commit aa5ff1dbdaefd04652a9c66506d20f1a6ae01dc3 which we could offer as a pull request.

This is already in ia/master

This is something we added because Heritrix was treating inline image data as links. I think we should make a pull request for it.
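A minimal sketch of the kind of guard such a fix needs. It assumes the change amounts to skipping inline `data:` URIs (RFC 2397) before a candidate string is treated as an outlink. The class and method names (`InlineDataFilter`, `isInlineData`) are hypothetical, not the actual Heritrix code:

```java
/** Hypothetical helper: recognises inline data: URIs so they are not queued as links. */
public class InlineDataFilter {

    private static final String DATA_SCHEME = "data:";

    /** Returns true if the candidate is an inline data: URI (URI schemes are case-insensitive). */
    public static boolean isInlineData(String candidate) {
        if (candidate == null) {
            return false;
        }
        String trimmed = candidate.trim();
        // regionMatches returns false when trimmed is shorter than the prefix
        return trimmed.regionMatches(true, 0, DATA_SCHEME, 0, DATA_SCHEME.length());
    }

    public static void main(String[] args) {
        String[] candidates = {
            "data:image/png;base64,iVBORw0KGgo=",
            "https://example.org/logo.png"
        };
        for (String c : candidates) {
            System.out.println(c + " -> " + (isInlineData(c) ? "skip" : "treat as link"));
        }
    }
}
```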

This is already what is in ia/master.

This is already in ia/master

Merge issues NAS-heritrix/IIPC-heritrix
Something is obviously weird here, as "contrib" is there twice.

?? Where does this come from? What does it do?

?

The fallback is "false", meaning no match, meaning "accept this url". Is this the best choice? Does it matter? Should the behaviour be configurable?

This commit is mission-critical for us because we have had serious problems with pathological regexes. To get it accepted, we should probably make the default behaviour backwards compatible, i.e. an infinite timeout, even though that is probably a terrible idea. I'd like to persuade Andy to allow a sensible default such as 20 s.

(There is also a possibly better solution: use a third-party regex engine with guaranteed runtime complexity, e.g. https://www.brics.dk/automaton/faq.html)
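One common way to bound a `java.util.regex` match is sketched below. This is an assumption, not necessarily what the commit does. The idea is to wrap the input in a `CharSequence` whose `charAt` throws once a deadline has passed, since the matcher reads its input through `charAt`. The timeout and the fallback are both parameters. A timeout of 0 keeps the backwards-compatible "no timeout" behaviour, and the fallback discussed above can be configured instead of hardcoded. All names (`TimeoutRegex`, `matchesWithTimeout`, `DeadlineCharSequence`) are hypothetical:

```java
import java.util.regex.Pattern;

/** Sketch: bound a java.util.regex match by a deadline, returning a fallback on timeout. */
public class TimeoutRegex {

    /** Thrown from charAt once the deadline has passed. */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException() {
            super("regex match timed out");
        }
    }

    /** Wrapper that aborts the match the first time the matcher reads a char after the deadline. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // Overflow-safe comparison of System.nanoTime() values
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException();
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /**
     * Full match of pattern against input. If matching takes longer than timeoutMillis,
     * returns fallback instead; timeoutMillis <= 0 means no timeout (the current behaviour).
     */
    static boolean matchesWithTimeout(Pattern pattern, CharSequence input,
                                      long timeoutMillis, boolean fallback) {
        if (timeoutMillis <= 0) {
            return pattern.matcher(input).matches();
        }
        long deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
        try {
            return pattern.matcher(new DeadlineCharSequence(input, deadlineNanos)).matches();
        } catch (RegexTimeoutException e) {
            // e.g. in a reject-by-regex rule, false = "no match" = accept the URL
            return fallback;
        }
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("https?://[^/]+/.*\\.pdf");
        System.out.println(matchesWithTimeout(pattern, "https://example.org/a/b.pdf", 20_000L, false)); // prints true
    }
}
```

The deadline is checked on every character read, so each read costs one `System.nanoTime()` call. That is the usual price of this approach. A DFA-based engine such as the brics automaton library avoids backtracking altogether, so it never needs a timeout.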

This shouldn't be hardcoded. Why is this not just a bean property that can be set in the crawler beans?
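A Spring bean needs little more than a getter/setter pair for a value to become settable from crawler-beans.cxml, via a `<property name="…" value="…"/>` element on the bean definition. Heritrix's own modules usually back such properties with `KeyedProperties` so they can also be overridden per sheet, but this sketch uses a plain field. It assumes, for illustration only, that the hardcoded value is a timeout. The class, the property name `timeoutMillis` and the 20 s default are hypothetical:

```java
/** Hypothetical module whose timeout is a bean property instead of a hardcoded constant. */
public class ConfigurableTimeoutModule {

    // Default used when crawler-beans.cxml does not set the property (hypothetical value).
    private long timeoutMillis = 20_000L;

    public long getTimeoutMillis() {
        return timeoutMillis;
    }

    // Spring calls this for <property name="timeoutMillis" value="..."/>; 0 disables the timeout.
    public void setTimeoutMillis(long timeoutMillis) {
        if (timeoutMillis < 0) {
            throw new IllegalArgumentException("timeoutMillis must be >= 0, got " + timeoutMillis);
        }
        this.timeoutMillis = timeoutMillis;
    }

    public static void main(String[] args) {
        ConfigurableTimeoutModule module = new ConfigurableTimeoutModule();
        module.setTimeoutMillis(5_000L); // what Spring would do for value="5000"
        System.out.println("timeoutMillis = " + module.getTimeoutMillis());
    }
}
```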

Maybe move/copy the javadoc to the super-class?

I think this is the main change added to enable access to the frontier queue, so it really ought to have some javadoc.

I think the following three methods just expose some internals so that they can be accessed from scripts, i.e. there should be no good reason to object to them.

?

There are/were some issues with the fact that the contrib package was disabled in the LBS releases. Can we check whether it is included by default when the IIPC release is built? If so, this assembly plugin may not be necessary.

Presumably both these exclusions are correct.

This class is not used and not necessary - the functionality is standard in all Processors.

Merge pull request #1 from maeb/fix/warcrecord

Ensure header.warcTypeIdx is not null when used in comparison
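The pitfall this commit guards against is presumably auto-unboxing. Comparing a null `Integer` field with an `int` constant using `==` throws a `NullPointerException`. A minimal sketch, assuming `warcTypeIdx` is an `Integer`. The `Header` class, the method names and the constant's value are stand-ins, not the real WARC record code:

```java
/** Stand-in sketch of a null-safe comparison on a boxed WARC-type index. */
public class WarcTypeCheck {

    // Hypothetical index value for the "response" record type.
    static final int RT_IDX_RESPONSE = 3;

    /** Minimal stand-in for the WARC header; warcTypeIdx may be null, e.g. if WARC-Type was not recognised. */
    static class Header {
        Integer warcTypeIdx;
    }

    /** Unsafe: unboxing a null warcTypeIdx throws NullPointerException. */
    static boolean isResponseUnsafe(Header header) {
        return header.warcTypeIdx == RT_IDX_RESPONSE;
    }

    /** Safe: check for null before the unboxing comparison. */
    static boolean isResponse(Header header) {
        return header.warcTypeIdx != null && header.warcTypeIdx == RT_IDX_RESPONSE;
    }

    public static void main(String[] args) {
        Header header = new Header(); // warcTypeIdx is null
        System.out.println(isResponse(header)); // prints false
    }
}
```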

So $USER is actually the CollectionID? Since that's a bit confusing, maybe you should add a comment pointing that out at the start of the file.

What are you saying here?

Follow-ups completed.

In principle, yes, although not here, because we don't do anything with it on Windows machines. In practice, there will be automatic integration tests that use this parameter.

Doubtless added automatically by IntelliJ, but now fixed.

Added javadoc to the superclass declaration, plus an @Override annotation.

OK. I don't know anything at all about that, but it can certainly be done.

Missing a Constants.DASH