Details

Type: Improvement
Resolution: Fixed
Priority: Minor
Fix Version/s: 4.2
Affects Version/s: None
Component/s: None
Labels: None

Verification:

Give the IndexServer local access to a large number of PROD metadata ARC files on the index server (using LocalArcRepositoryClient as the arcrepositoryClient), and make the IndexServer always return the same large index, based on 8000 or 9000 jobs, regardless of the request. This is done with the TestIndexRequestServer class: the jobs that make up the index are listed in the file given as argument to TestIndexRequestServer.
See kb-test-acs-001.kb.dk:/home/test/LUCENETEST/conf/settings_IndexServerApplication.xml for how this is set up.
Ingest 50,000 domains (use the list kb-test-adm-001.kb.dk:/home/test/domains-30-05-2012.txt).
Start a snapshot crawl; set the maximum number of objects per domain to 100 (to avoid getting complaints).
Let one job run to completion, and check that it doesn't go into paused mode (a symptom of an OOM error in Heritrix).

IndexServer settings override:

<common>
  <environmentName>LUCENETEST</environmentName>
  <arcrepositoryClient>
    <class>dk.netarkivet.common.distribute.arcrepository.LocalArcRepositoryClient</class>
    <fileDir>/data/rawdata/prod-metadata</fileDir>
  </arcrepositoryClient>
...
<harvester>
  ..
  <indexserver>
    <indexrequestserver>
      <class>dk.netarkivet.archive.indexserver.distribute.TestIndexRequestServer</class>
      <fileContainingJobsForTestindex>/home/test/prod-metadata-ids.txt</fileContainingJobsForTestindex>
    </indexrequestserver>
  </indexserver>
</harvester>

Remember to set the Heritrix heap size to roughly 2 GB (the example uses 1936M):

<heritrix>
  ..
  <heapSize>1936M</heapSize>
  ..
</heritrix>
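
As a quick sanity check before starting the crawl, the override values listed above can be compared against the deployed settings files. The following is a minimal Python sketch and not part of NetarchiveSuite: the function name check_overrides and the approach of matching elements by local tag name (ignoring nesting and any XML namespace) are assumptions made for illustration. The expected values are copied from the snippets above.

```python
import xml.etree.ElementTree as ET

# Expected override values, copied from the settings snippets in this issue.
EXPECTED = {
    "environmentName": "LUCENETEST",
    "fileDir": "/data/rawdata/prod-metadata",
    "fileContainingJobsForTestindex": "/home/test/prod-metadata-ids.txt",
    "heapSize": "1936M",
}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def check_overrides(xml_text, expected=EXPECTED):
    """Return {name: (wanted, found)} for every expected value that is
    missing or different in the given settings XML.

    Assumption: each element name of interest occurs once in the file,
    so only the first occurrence is compared and nesting is ignored.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in expected and name not in found:
            # Surrounding whitespace (e.g. a line break before the
            # closing tag) is ignored for the comparison.
            found[name] = (elem.text or "").strip()
    return {
        name: (wanted, found.get(name))
        for name, wanted in expected.items()
        if found.get(name) != wanted
    }
```

An empty result means every expected value was found. Since the overrides may live in different settings files, pass only the keys relevant to each file as the expected argument.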

Sub-Tasks

Fix unittests using old Lucene indices.

Resolved

Søren Vejrup Carlsen (Inactive)

People

Assignee: Søren Vejrup Carlsen (Inactive)

Reporter: Søren Vejrup Carlsen (Inactive)

Watchers: 1

Dates

Created: 30/May/13 6:38 PM

Updated: 17/Sep/15 8:50 AM

Resolved: 21/Aug/13 11:22 AM