

The front page by default shows the list of selective harvests.

You can click Activate to activate an inactive harvest definition, and Deactivate to deactivate an active one. If you deactivate a harvest that is running, the jobs that are already running will still be finished.

Click Edit to change an existing harvest definition, or click Create new harvest definition to make a new one.

Click History to trace all the jobs from previously finished harvests.

Creating/editing a selective harvest

Create a new selective harvest definition by clicking Create new selective harvestdefinition on the front page.

Give the harvest definition a recognizable name; the name cannot be changed later. Add a comment if necessary.

Choose a schedule from the dropdown list.

Now you can add domains to the harvest definition.

Enter the names of the domains you want to add in the box Enter domain(s) to add to the harvest here and click Add domains.

The added domains will appear in the column Domain.

For each added domain, choose the desired configuration from its dropdown list. Click Save to save the harvest definition.
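To make the relationship between these fields concrete, here is a minimal Python sketch of a selective harvest definition as a data structure. It is a hypothetical model for illustration, not NetarchiveSuite's actual classes: the class name `SelectiveHarvestDefinition`, its attributes and the `add_domain` method are assumptions. The name has no setter because it cannot be changed after creation, and each domain maps to exactly one chosen configuration.

```python
from typing import Dict, Optional


class SelectiveHarvestDefinition:
    """Hypothetical model of a selective harvest definition (illustration only)."""

    def __init__(self, name: str, schedule: str, comment: Optional[str] = None):
        self._name = name            # fixed once the definition is created
        self.schedule = schedule     # name of a schedule from the dropdown list
        self.comment = comment       # optional free-text comment
        # Each added domain maps to the one configuration chosen for it.
        self.domain_configs: Dict[str, str] = {}

    @property
    def name(self) -> str:
        # No setter is defined, so assigning to `name` raises AttributeError,
        # mirroring that the harvest name cannot be changed later.
        return self._name

    def add_domain(self, domain: str, configuration: str) -> None:
        # Choosing a configuration again for the same domain replaces the old choice.
        self.domain_configs[domain] = configuration
```

For example, `SelectiveHarvestDefinition("news_sites", "Once a day")` followed by `add_domain("example.com", "defaultconfig")`; the schedule and configuration names here are made up.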

The scheduling of a selective harvest definition can be overridden by filling in the input field Override with new date. Set the date to when you want the harvest definition to run next; the schedule then continues from that point in time.
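As an illustration of how an override affects later runs, the sketch below assumes a schedule that repeats at a fixed interval. The function `next_runs` and the fixed-interval model are assumptions made for this example; real schedules in the system may be defined differently. The override date becomes the next run, and every later run is counted from it.

```python
from datetime import datetime, timedelta
from typing import List


def next_runs(override: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Return the next `count` run times after overriding the schedule.

    Hypothetical fixed-interval model: the override date is the next run,
    and each following run is one interval after the previous one, so the
    schedule continues from the override date.
    """
    return [override + i * interval for i in range(count)]
```

With a weekly interval and an override of 1 May 2024 at 03:00, the next three runs would be 1, 8 and 15 May at 03:00.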

Easy creation of non-existing domains

When you add a domain that does not exist in the database, you get the warning "The following domains are unknown and were not added". You can add the unknown domains to both the database and your harvest definition by clicking Create and add to the harvestdefinition.
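The behaviour described above amounts to splitting the entered names into known and unknown domains, then optionally creating the unknown ones. Below is a minimal sketch that assumes the database can be modelled as a set of domain names; the helpers `split_domains` and `create_and_add` are hypothetical and only mimic what the buttons do from the user's point of view.

```python
from typing import Iterable, List, Set, Tuple


def split_domains(entered: Iterable[str], known: Set[str]) -> Tuple[List[str], List[str]]:
    """Split entered domain names into (added, unknown). Hypothetical helper."""
    added: List[str] = []
    unknown: List[str] = []
    for raw in entered:
        domain = raw.strip()
        if not domain:
            continue  # assumption: blank lines in the input box are ignored
        if domain in known:
            added.append(domain)    # exists in the database: added to the harvest
        else:
            unknown.append(domain)  # reported as unknown and not added
    return added, unknown


def create_and_add(unknown: Iterable[str], known: Set[str], harvest_domains: List[str]) -> None:
    """Mimic "Create and add to the harvestdefinition": create each unknown
    domain in the (simulated) database, then add it to the harvest."""
    for domain in unknown:
        known.add(domain)
        if domain not in harvest_domains:
            harvest_domains.append(domain)
```

Entering `example.com` and `new-site.org` when only `example.com` is known adds the first and reports the second; `create_and_add` then registers `new-site.org` and adds it as well.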

Event harvest

The system treats event harvests almost the same as selective harvests. The only difference is a function for power-adding domains. It can be used for selective harvests as well, but was developed for event harvest definitions, where the operator must enter a large number of URLs without having to edit the configurations and seed lists of all those domains.

Adding seeds to an event harvest

Click Add seeds at the bottom of the Selective Harvest page. Enter the start URLs you have identified as covering the event in the Enter seeds: box. In Max number of bytes per domain, enter the preferred maximum, e.g. 1000000000. Select a harvest template from the Harvest template dropdown box.

All seeds entered at the same time use the same template. To harvest different seeds with different templates, add them in batches, one batch for each template you need for your event harvest.
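If you plan an event harvest as a list of seeds, each with a chosen template, the batches can be prepared by grouping the seeds by template. This minimal sketch assumes such a planning list exists outside the system; the function `batches_by_template` and the template names in the example are hypothetical. Each resulting batch corresponds to one submission of the Add seeds form.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def batches_by_template(seeds: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (seed_url, template) pairs into one batch per template.

    All seeds submitted together share a single harvest template, so each
    returned list is one batch to paste into the Enter seeds: box.
    """
    batches: Dict[str, List[str]] = defaultdict(list)
    for url, template in seeds:
        batches[template].append(url)  # keeps the original order within a batch
    return dict(batches)
```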

Clicking Insert starts the power-adding function. This function processes the entered seeds one by one and does the following for each seed:

  1. Finds the domain the seed belongs to.
  2. Creates a seed list whose name combines the name of the harvest definition and the template.
  3. Creates a configuration whose name combines the name of the harvest definition and the template, and selects the seed list from step 2 for the new configuration.

If the seed list from step 2 or the configuration from step 3 already exists (because the power-adding function has already been used with other seeds from the same domain in the same event harvest), the system only adds the new URLs to the existing seed list.

You can also use Add seeds from a file. This lets you upload a file containing the seeds instead of entering them in a text field. Otherwise the functionality is the same.
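The power-adding steps can be sketched as follows. This is a simplified illustration, not the system's implementation, and it rests on several labelled assumptions: the database is modelled as two dictionaries keyed by (domain, name); the combined name `<harvest>_<template>` is a hypothetical naming scheme; and `domain_of` naively keeps the last two host labels, which is wrong for suffixes such as co.uk (a real implementation would need a public-suffix list).

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit


def domain_of(seed: str) -> str:
    """Naive domain extraction (assumption): keep the last two host labels."""
    url = seed if "://" in seed else "http://" + seed
    host = urlsplit(url).hostname or ""  # hostname is returned in lower case
    return ".".join(host.split(".")[-2:])


def power_add(
    seeds: List[str],
    harvest_name: str,
    template: str,
    max_bytes: int,
    seedlists: Dict[Tuple[str, str], List[str]],
    configs: Dict[Tuple[str, str], dict],
) -> None:
    """Apply the three power-adding steps to one batch of seeds (sketch)."""
    name = f"{harvest_name}_{template}"  # hypothetical combined name
    for raw in seeds:
        seed = raw.strip()
        if not seed:
            continue
        domain = domain_of(seed)                   # step 1: find the domain
        key = (domain, name)
        urls = seedlists.setdefault(key, [])       # step 2: create or reuse the seed list
        if seed not in urls:
            urls.append(seed)                      # only new URLs are added
        configs.setdefault(key, {                  # step 3: create the configuration once
            "template": template,
            "max_bytes": max_bytes,
            "seedlist": name,
        })
```

Running `power_add` twice with seeds from the same domain and the same template produces one seed list and one configuration for that domain; the second run only appends its new URLs.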
