heritrix3

Commented this test back in to make Travis happy

Merge pull request #312 from internetarchive/guava-dep

Exclude hbase-client's guava 12 transitive dependency

Exclude hbase-client's guava 12 transitive dependency

Guava 12 from hbase-client is closer to the root of the dependency tree than guava 17 from webarchive-commons so Maven prefers it. But recent changes to heritrix-commons rely on classes in the newer version of Guava. So let's ensure webarchive-commons wins.

Hopefully this doesn't break the hbase module, I have no way of testing it.

Fixes #311

Use HTTPS to resolve dependencies in Maven Build where possible

Merge pull request #308 from ldko/fix-restlet-errors

Fix stream closed exception for Paged view

Fix stream closed exception for Paged view

Merge pull request #300 from hennekey/fix-299

Use Guice instead of custom implementation

Merge pull request #304 from hennekey/remove-custom-base32

Replace custom Base32 encoding

Correct encoding

The previous implementation appears to always have returned upper case, was able to encode either case, and did not return padding.

Merge pull request #303 from hennekey/refactor-history

Replace constant with accessor methods

Merge pull request #306 from internetarchive/fix-stream-closed

Fix stream closed exception by not closing output stream

Fix stream closed exception by not closing output stream

ServerCall.writeResponseBody() flushes it after we return so it must remain open.

Fixes #305
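
The fix described above (flush, but leave the caller's stream open) can be sketched as follows. This is a minimal illustration and not the Heritrix or Restlet code: FlushDontClose, TrackingStream and writeBody are hypothetical names, and a ByteArrayOutputStream stands in for the response stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical names throughout; this is not the Heritrix or Restlet code.
class FlushDontClose {

    /** Stand-in for the response stream; records whether close() was called. */
    static class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Writes the body through a Writer, then flushes but leaves 'out' open. */
    static void writeBody(OutputStream out, String body) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(body);
            // flush() pushes buffered characters down to 'out'.
            // close() would also close 'out', which the caller still needs.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream out = new TrackingStream();
        writeBody(out, "page 1 of 3");
        out.flush(); // the caller flushes after we return, so 'out' must be open
        System.out.println(out.toString("UTF-8") + " closed=" + out.closed);
    }
}
```

Wrapping the stream in a try-with-resources block would close it on exit, which is the kind of failure the commit message describes.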

Replace custom Base32 encoding

Guava is available so a custom implementation is unnecessary.

Replace constant with accessor methods

CrawlURI already had the accessor method, and the use of the constant was a bit inconsistent. This change adds the corresponding mutator method to make working with the CrawlURI history a bit simpler.

Merge pull request #296 from hennekey/update-junit

Set JUnit version to latest

Merge pull request #302 from nlevitt/ydl-streaming

limit ExtractorYoutubeDL heap usage

Merge pull request #301 from nlevitt/fix-logging-config

fix logging config

Increment the count only when the filter notes it

Otherwise this is a count of how many times this add method is called, not how many times an element was noted as being actually added.
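
A minimal sketch of the counting rule described above, assuming a HashSet as a stand-in for the real filter. CountingUriSet and its methods are hypothetical names, not the Heritrix classes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class; a HashSet stands in for the real "already seen" filter.
class CountingUriSet {
    private final Set<String> seen = new HashSet<>();
    private long addedCount = 0;

    /** Returns true only if the key was not already present. */
    boolean add(String key) {
        boolean added = seen.add(key);
        if (added) {
            // Count only when the filter reports a genuinely new element.
            addedCount++;
        }
        return added;
    }

    long getAddedCount() {
        return addedCount;
    }

    public static void main(String[] args) {
        CountingUriSet set = new CountingUriSet();
        set.add("http://example.com/");
        set.add("http://example.com/"); // duplicate, not counted
        set.add("http://example.org/");
        System.out.println(set.getAddedCount()); // prints 2
    }
}
```

Incrementing unconditionally inside add() would report 3 here, which is the number of calls rather than the number of elements added.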

Fix assertions

By using assertEquals and setting the expected and actual values, the failure messages become a bit more useful.

java 8 compatibility

limit ExtractorYoutubeDL heap usage

We were seeing OOME due to large youtube-dl json (for playlists and such). So instead of storing the json in ram, stream through it, and stash the contents in a thread-local anonymous tempfile so it can be written to the warc.
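
The streaming approach described above can be sketched roughly as below. This is an assumption-laden illustration, not the ExtractorYoutubeDL code: TempFileStash, SCRATCH, stash and readBack are hypothetical names, and the 8 KiB buffer size is arbitrary.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch; names and buffer size are assumptions.
class TempFileStash {

    // One scratch file per thread, created lazily; DELETE_ON_CLOSE asks the
    // JVM to remove it when the channel is closed.
    static final ThreadLocal<FileChannel> SCRATCH = ThreadLocal.withInitial(() -> {
        try {
            Path p = Files.createTempFile("stash-", ".json");
            return FileChannel.open(p, StandardOpenOption.READ,
                    StandardOpenOption.WRITE, StandardOpenOption.DELETE_ON_CLOSE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    });

    /** Copies 'in' to this thread's scratch file in chunks; returns the byte count. */
    static long stash(InputStream in) {
        try {
            FileChannel ch = SCRATCH.get();
            ch.truncate(0); // discard whatever the previous call left behind
            ch.position(0);
            byte[] buf = new byte[8192]; // only this much is held in memory
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                ByteBuffer bb = ByteBuffer.wrap(buf, 0, n);
                while (bb.hasRemaining()) {
                    ch.write(bb);
                }
                total += n;
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: reads the stashed bytes back. A real consumer would stream them. */
    static byte[] readBack() {
        try {
            FileChannel ch = SCRATCH.get();
            ByteBuffer bb = ByteBuffer.allocate((int) ch.size());
            ch.position(0);
            while (bb.hasRemaining() && ch.read(bb) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            return bb.array();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] json = "{\"entries\": []}".getBytes(StandardCharsets.UTF_8);
        long n = stash(new ByteArrayInputStream(json));
        System.out.println(n + " bytes stashed");
    }
}
```

Memory use stays bounded by the copy buffer regardless of how large the input is, which is the property the commit is after.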

fix logging config

by setting the system property java.util.logging.config.file, because the new version of restlet reconfigures logging after heritrix has already configured it
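
A small sketch of why the system property helps, assuming the later reconfiguration goes through LogManager.readConfiguration(), which reloads from the file that java.util.logging.config.file names. LoggingConfigByProperty, configure and writeSampleConfig are hypothetical names, and the sample config content is invented for the demo.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.LogManager;
import java.util.logging.Logger;

// Hypothetical sketch; not the Heritrix startup code.
class LoggingConfigByProperty {

    /** Points java.util.logging at a config file and loads it. */
    static void configure(Path configFile) {
        System.setProperty("java.util.logging.config.file", configFile.toString());
        try {
            // With no arguments, this re-reads the file named by the property.
            LogManager.getLogManager().readConfiguration();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes a tiny demo config that sets the root logger level to FINE. */
    static Path writeSampleConfig() {
        try {
            Path p = Files.createTempFile("logging-", ".properties");
            Files.write(p, ".level=FINE\n".getBytes(StandardCharsets.UTF_8));
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        configure(writeSampleConfig());
        // Simulate a library reconfiguring logging later on: because the
        // property is set, the same file is loaded again.
        LogManager.getLogManager().readConfiguration();
        System.out.println("root level: " + Logger.getLogger("").getLevel());
    }
}
```

Configuration loaded only from an explicit stream, without the property set, would be replaced by the JDK defaults on such a later reload.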

Remove version

Allow the parent POM to specify the version

Merge pull request #298 from hennekey/fix-297

Speed up ObjectIdentityBdbManualCacheTest

Use Guice instead of custom implementation

This uses the available code in Guice rather than a custom implementation. It also provides a performance increase (as demonstrated by the unit tests).

Merge pull request #287 from nlevitt/trough-dedup-fix

change trough dedup `date` type to varchar.

Merge pull request #294 from internetarchive/fix-config-post

Fix 'Method Not Allowed' on POST of config editor form

Merge pull request #295 from internetarchive/disable-wbm-test

Disable test that connects to wwwb-dedup.us.archive.org

Add default constructor to IdentityCacheWrapper

Kryo is spending a lot of time during serialization handling a NoSuchMethodException. By adding this default constructor we can skip doing all that work.
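
The exception mentioned above is what plain Java reflection raises when a class has no no-argument constructor. The sketch below shows that general behaviour only and makes no claim about Kryo's internals. DefaultConstructorDemo, WithoutDefault and WithDefault are hypothetical names.

```java
// Hypothetical classes; illustrates reflection behaviour, not Kryo's code.
class DefaultConstructorDemo {

    static class WithoutDefault {
        final String key;

        WithoutDefault(String key) {
            this.key = key;
        }
    }

    static class WithDefault {
        String key;

        WithDefault() {
            // no-arg constructor that a reflective instantiator can find
        }

        WithDefault(String key) {
            this.key = key;
        }
    }

    /** True if the class declares a no-arg constructor; false if the lookup throws. */
    static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            // Building and catching this exception on every lookup is the
            // kind of repeated work a default constructor avoids.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(WithoutDefault.class)); // prints false
        System.out.println(hasNoArgConstructor(WithDefault.class));    // prints true
    }
}
```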

Move IdentityCacheableWrapper

This is utilized only in tests so it belongs there