colin rosenthal <csr@statsbiblioteket.dk> in heritrix3

[maven-release-plugin] prepare for next development iteration

[maven-release-plugin] prepare release heritrix-3.4.0-20200518-NAS-6.0

[maven-release-plugin] rollback the release of heritrix-3.4.0-20200518-NAS-6.0

[maven-release-plugin] prepare release heritrix-3.4.0-20200518-NAS-6.0

[maven-release-plugin] rollback the release of heritrix-3.4.0-20200518-NAS-6.0

[maven-release-plugin] prepare release heritrix-3.4.0-20200518-NAS-6.0

[maven-release-plugin] rollback the release of heritrix-3.4.0-20200518-NAS-6.0

[maven-release-plugin] prepare for next development iteration

[maven-release-plugin] prepare release heritrix-3.4.0-20200518-NAS-6.0

Fixed regex timeout handling, following the suggestion at https://github.com/internetarchive/heritrix3/pull/290#discussion_r366711640

Commented this test back in to make Travis happy

Updated manually to new SNAPSHOT version

Merge branch 'inline-image-filter' into h3.4-merge

Merge branch 'crawltrap-regex-timeout' into h3.4-merge

Removed mistaken additions

    -13 +0  /.idea/libraries/Maven__com_google_code_gson_gson_2_2_4.xml
    -13 +0  /.idea/libraries/Maven__com_googlecode_json_simple_json_simple_1_1_1.xml
    -13 +0  /.idea/libraries/Maven__com_rethinkdb_rethinkdb_driver_2_3_3.xml
    -13 +0  /.idea/libraries/Maven__com_sun_istack_istack_commons_runtime_3_0_7.xml
    -13 +0  /.idea/libraries/Maven__com_sun_xml_fastinfoset_FastInfoset_1_2_15.xml
    -13 +0  /.idea/libraries/Maven__javax_activation_javax_activation_api_1_2_0.xml
    -13 +0  /.idea/libraries/Maven__javax_servlet_javax_servlet_api_3_1_0.xml
    -13 +0  /.idea/libraries/Maven__javax_xml_bind_jaxb_api_2_3_1.xml
    -13 +0  /.idea/libraries/Maven__org_apache_avro_avro_1_7_6_cdh5_3_5.xml
    -13 +0  /.idea/libraries/Maven__org_apache_curator_curator_client_2_6_0.xml
    -13 +0  /.idea/libraries/Maven__org_apache_curator_curator_framework_2_6_0.xml
    -13 +0  /.idea/libraries/Maven__org_apache_curator_curator_recipes_2_6_0.xml
    … 35 more files in changeset.

Removed .iml file

    -0 +13  /.idea/libraries/Maven__com_101tec_zkclient_0_7.xml
    -0 +13  /.idea/libraries/Maven__com_google_code_gson_gson_2_2_4.xml
    -0 +13  /.idea/libraries/Maven__com_googlecode_json_simple_json_simple_1_1_1.xml
    -0 +13  /.idea/libraries/Maven__com_rethinkdb_rethinkdb_driver_2_3_3.xml
    -0 +13  /.idea/libraries/Maven__com_sleepycat_je_4_1_6.xml
    -0 +13  /.idea/libraries/Maven__com_sun_istack_istack_commons_runtime_3_0_7.xml
    -0 +13  /.idea/libraries/Maven__com_sun_xml_fastinfoset_FastInfoset_1_2_15.xml
    -0 +13  /.idea/libraries/Maven__javax_activation_javax_activation_api_1_2_0.xml
    -0 +13  /.idea/libraries/Maven__javax_servlet_javax_servlet_api_3_1_0.xml
    -0 +13  /.idea/libraries/Maven__javax_xml_bind_jaxb_api_2_3_1.xml
    -0 +13  /.idea/libraries/Maven__junit_junit_4_10.xml
    -0 +13  /.idea/libraries/Maven__org_apache_avro_avro_1_7_6_cdh5_3_5.xml
    -0 +13  /.idea/libraries/Maven__org_apache_curator_curator_client_2_6_0.xml
    -0 +13  /.idea/libraries/Maven__org_apache_curator_curator_framework_2_6_0.xml
    -0 +13  /.idea/libraries/Maven__org_apache_curator_curator_recipes_2_6_0.xml
    … 36 more files in changeset.

Merge remote-tracking branch 'origin/crawltrap-regex-timeout' into crawltrap-regex-timeout

Updated manually to new SNAPSHOT version

Added a timeout to crawler-trap regex matching

[maven-release-plugin] prepare for next development iteration

[maven-release-plugin] prepare release heritrix-3.3.0-BDB-5.0.x-NAS-5.6

Updated manually to new SNAPSHOT version

Added SCM configuration to the POM

Release version for NAS 5.5

Attempt to filter out embedded images.

Attempt to filter out embedded images.

(cherry picked from commit aa5ff1dbdaefd04652a9c66506d20f1a6ae01dc3)

Refactored to use KeyedProperties to store regex timeout.

Added a new field timeoutPerRegexSeconds to MatchesListRegexDecideRule. The default value is 20 seconds.
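The per-regex timeout described in these commits can be illustrated with a minimal sketch of one common Java technique. This is not the Heritrix implementation: the class `RegexTimeoutSketch`, the helper `matchesAny`, and the exception type are hypothetical, and treating a timed-out pattern as a non-match is an assumption. `java.util.regex` has no built-in timeout and a running match does not react to thread interruption, so the sketch wraps the input in a `CharSequence` whose `charAt` checks a deadline and aborts the match once it has passed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexTimeoutSketch {

    /** Hypothetical exception signalling that a single regex ran past its budget. */
    static class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String msg) { super(msg); }
    }

    /** Wraps the input; every charAt() call checks the deadline first. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override public char charAt(int index) {
            // Overflow-safe comparison of System.nanoTime() values.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regex match exceeded its timeout");
            }
            return inner.charAt(index);
        }

        @Override public int length() { return inner.length(); }

        @Override public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override public String toString() { return inner.toString(); }
    }

    /**
     * Returns true if any pattern fully matches the input. Each pattern gets its
     * own budget of timeoutPerRegexSeconds; a pattern that times out is skipped
     * (assumed behaviour, the real decide rule may handle this differently).
     */
    static boolean matchesAny(List<Pattern> patterns, String input, long timeoutPerRegexSeconds) {
        for (Pattern p : patterns) {
            long deadline = System.nanoTime() + timeoutPerRegexSeconds * 1_000_000_000L;
            try {
                if (p.matcher(new DeadlineCharSequence(input, deadline)).matches()) {
                    return true;
                }
            } catch (RegexTimeoutException e) {
                // Timed out: treat as a non-match and try the next pattern.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = Arrays.asList(Pattern.compile("^.*/calendar/.*$"));
        // prints true
        System.out.println(matchesAny(patterns, "http://example.com/calendar/2020", 20));
    }
}
```

With a very small budget, a catastrophically backtracking pattern such as `(a+)+b` run against a long string of `a` characters aborts instead of tying up the thread indefinitely.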

Added contrib module and checks that it is included in the distribution.

Merge branch 'BrowserBased' into NAS-2703-srcset

Conflicts:

.gitignore

commons/src/main/java/org/archive/util/KeyTool.java

engine/src/main/java/org/archive/crawler/Heritrix.java

modules/src/test/java/org/archive/modules/fetcher/CookieFetchHTTPIntegrationTest.java