dk.netarkivet.harvester.datamodel
Class DomainConfiguration

java.lang.Object
  extended by dk.netarkivet.harvester.datamodel.DomainConfiguration
All Implemented Interfaces:
Named

public class DomainConfiguration
extends java.lang.Object
implements Named

This class describes a configuration for harvesting a domain. It combines a number of seedlists, a number of passwords, an order template, and some specialised settings to define the way to harvest a domain.


Constructor Summary
DomainConfiguration(java.lang.String theConfigName, Domain domain, java.util.List<SeedList> seedlists, java.util.List<Password> passwords)
          Create a new configuration for a domain.
DomainConfiguration(java.lang.String theConfigName, java.lang.String domainName, DomainHistory history, java.util.List<java.lang.String> crawlertraps, java.util.List<SeedList> seedlists, java.util.List<Password> passwords)
          Alternate constructor.
 
Method Summary
 void addPassword(Domain domain, Password password)
          Add password to the configuration.
 void addSeedList(Domain domain, SeedList seedlist)
          Add a new seedlist to the configuration.
 java.lang.String getComments()
          Returns comments.
 java.util.List<java.lang.String> getCrawlertraps()
           
 DomainHistory getDomainhistory()
           
 java.lang.String getDomainName()
          Returns the name of the domain aggregating this configuration.
 long getExpectedNumberOfObjects(long objectLimit, long byteLimit)
          Gets the best expectation for how many objects a harvest using this configuration will retrieve, given a job with a maximum limit per domain.
(package private)  long getID()
          Get the ID of this configuration.
 long getMaxBytes()
          Returns the maximum number of bytes to download during a single harvest of a domain.
 long getMaxObjects()
          Returns the maximum number of objects to harvest from the domain.
 int getMaxRequestRate()
          Returns the maximum request rate to use when harvesting the domain.
 java.lang.String getName()
          Get the configuration name.
 java.lang.String getOrderXmlName()
          Returns the name of the order xml file used by the domain.
 java.util.Iterator<Password> getPasswords()
          Get an iterator of passwords used in this configuration.
 java.util.Iterator<SeedList> getSeedLists()
          Get an iterator of seedlists used in this configuration.
(package private)  boolean hasID()
          Check if this configuration has an ID set yet (doesn't happen until the DBDAO persists it).
 long minObjectsBytesLimit(long objectLimit, long byteLimit, long expectedObjectSize)
          Return the lowest limit for the two values, or MAX_DOMAIN_SIZE if both are infinite, which is the max size we harvest from this domain.
 void removePassword(java.lang.String passwordName)
          Remove a password from the list of passwords used in this domain.
 void setComments(java.lang.String comments)
          Set the comments field.
 void setCrawlertraps(java.util.List<java.lang.String> someCrawlertraps)
          Set the crawlertraps for this configuration.
 void setDomainhistory(DomainHistory newDomainhistory)
          Set the domainHistory for this configuration.
(package private)  void setID(long anId)
          Set the ID of this configuration.
 void setMaxBytes(long maxBytes)
          Specify the maximum number of bytes to download from a domain in a single harvest.
 void setMaxObjects(long max)
          Specify the maximum number of objects to retrieve from the domain.
 void setMaxRequestRate(int maxrate)
          Specify the maximum request rate to use when harvesting data.
 void setOrderXmlName(java.lang.String ordername)
          Specify the name of the order.xml template to use.
 void setPasswords(Domain domain, java.util.List<Password> newPasswords)
          Sets the used passwords to the given list.
 void setSeedLists(Domain domain, java.util.List<SeedList> newSeedlists)
          Sets the used seedlists to the given list.
 java.lang.String toString()
          ToString of DomainConfiguration class.
 boolean usesPassword(java.lang.String passwordName)
          Check whether this domain uses a given password.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

DomainConfiguration

public DomainConfiguration(java.lang.String theConfigName,
                           Domain domain,
                           java.util.List<SeedList> seedlists,
                           java.util.List<Password> passwords)
Create a new configuration for a domain.

Parameters:
theConfigName - The name of this configuration
domain - The domain that this configuration is for
seedlists - Seedlists to use in this configuration.
passwords - Passwords to use in this configuration.

DomainConfiguration

public DomainConfiguration(java.lang.String theConfigName,
                           java.lang.String domainName,
                           DomainHistory history,
                           java.util.List<java.lang.String> crawlertraps,
                           java.util.List<SeedList> seedlists,
                           java.util.List<Password> passwords)
Alternate constructor. TODO Filter all history not relevant for this configuration

Parameters:
theConfigName - The name of this configuration
domainName - The name of the domain that this configuration is for
history - The domainhistory belonging to the given domain
crawlertraps - The crawlertraps to use in this configuration
seedlists - Seedlists to use in this configuration
passwords - Passwords to use in this configuration.
Method Detail

setOrderXmlName

public void setOrderXmlName(java.lang.String ordername)
Specify the name of the order.xml template to use.

Parameters:
ordername - order.xml template name
Throws:
ArgumentNotValid - if ordername is null or empty

setMaxObjects

public void setMaxObjects(long max)
Specify the maximum number of objects to retrieve from the domain.

Parameters:
max - maximum number of objects to retrieve
Throws:
ArgumentNotValid - if max < -1

setMaxRequestRate

public void setMaxRequestRate(int maxrate)
Specify the maximum request rate to use when harvesting data.

Parameters:
maxrate - the maximum request rate
Throws:
ArgumentNotValid - if maxrate < 0

setMaxBytes

public void setMaxBytes(long maxBytes)
Specify the maximum number of bytes to download from a domain in a single harvest.

Parameters:
maxBytes - Maximum number of bytes to download, or -1 for no limit.
Throws:
ArgumentNotValid - if maxBytes < -1
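
The -1 "no limit" convention documented for maxBytes can be illustrated with a minimal sketch. The class LimitSketch and the method isUnlimited() are hypothetical stand-ins, not part of DomainConfiguration, and IllegalArgumentException is used here in place of ArgumentNotValid so that the example compiles on its own.

```java
/** Hypothetical stand-in showing the documented "-1 means no limit" convention. */
final class LimitSketch {
    /** Sentinel for "no limit", as documented for maxBytes. */
    static final long NO_LIMIT = -1L;

    private long maxBytes = NO_LIMIT;

    /** Accepts -1 or any larger value; rejects everything below -1. */
    void setMaxBytes(long maxBytes) {
        if (maxBytes < NO_LIMIT) {
            // The documented class throws ArgumentNotValid in this case.
            throw new IllegalArgumentException("maxBytes must be >= -1, was " + maxBytes);
        }
        this.maxBytes = maxBytes;
    }

    long getMaxBytes() {
        return maxBytes;
    }

    /** Hypothetical helper: true when no byte limit is in effect. */
    boolean isUnlimited() {
        return maxBytes == NO_LIMIT;
    }
}
```

Validating in the setter keeps an out-of-range value from ever being stored, so a getter never has to re-check it.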

getName

public java.lang.String getName()
Get the configuration name.

Specified by:
getName in interface Named
Returns:
the configuration name

getComments

public java.lang.String getComments()
Returns comments.

Specified by:
getComments in interface Named
Returns:
string containing comments

getOrderXmlName

public java.lang.String getOrderXmlName()
Returns the name of the order xml file used by the domain.

Returns:
name of the order.xml file that should be used when harvesting the domain

getMaxObjects

public long getMaxObjects()
Returns the maximum number of objects to harvest from the domain.

Returns:
maximum number of objects to harvest

getMaxRequestRate

public int getMaxRequestRate()
Returns the maximum request rate to use when harvesting the domain.

Returns:
maximum request rate

getMaxBytes

public long getMaxBytes()
Returns the maximum number of bytes to download during a single harvest of a domain.

Returns:
Maximum bytes limit, or -1 for no limit.

getDomainName

public java.lang.String getDomainName()
Returns the name of the domain aggregating this configuration.

Returns:
the name of the domain aggregating this configuration.

getSeedLists

public java.util.Iterator<SeedList> getSeedLists()
Get an iterator of seedlists used in this configuration.

Returns:
seedlists as iterator

addSeedList

public void addSeedList(Domain domain,
                        SeedList seedlist)
Add a new seedlist to the configuration. The seedlist must exist in the associated domain and be equal to the seedlist defined there.

Parameters:
seedlist - the seedlist to add
domain - The domain to check if the seedlist exists
Throws:
ArgumentNotValid - if the seedlist is null
UnknownID - if the seedlist is not defined on the domain
PermissionDenied - if the seedlist is different from the one on the domain.
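
The three documented failure cases suggest a check order of null, then unknown, then different. The following is only a sketch of that order under simplifying assumptions: the domain is modelled as a Map from seedlist name to its seeds, a seedlist is a name plus a List of URLs, and standard Java exceptions stand in for ArgumentNotValid, UnknownID and PermissionDenied. None of these stand-ins are the actual types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in illustrating the documented checks of addSeedList. */
final class AddSeedListSketch {
    private final List<String> usedSeedListNames = new ArrayList<>();

    void addSeedList(Map<String, List<String>> domainSeedLists,
                     String name, List<String> seeds) {
        // Case 1: null argument (ArgumentNotValid in the documented class).
        if (name == null || seeds == null) {
            throw new IllegalArgumentException("seedlist must not be null");
        }
        // Case 2: not defined on the domain (UnknownID in the documented class).
        List<String> onDomain = domainSeedLists.get(name);
        if (onDomain == null) {
            throw new NoSuchElementException("no seedlist named " + name);
        }
        // Case 3: defined, but with different content (PermissionDenied).
        if (!onDomain.equals(seeds)) {
            throw new IllegalStateException("seedlist differs from the domain's: " + name);
        }
        usedSeedListNames.add(name);
    }

    /** Returns a copy of the names of the seedlists added so far. */
    List<String> usedSeedListNames() {
        return new ArrayList<>(usedSeedListNames);
    }
}
```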

getPasswords

public java.util.Iterator<Password> getPasswords()
Get an iterator of passwords used in this configuration.

Returns:
The passwords in an iterator

addPassword

public void addPassword(Domain domain,
                        Password password)
Add password to the configuration.

Parameters:
password - to add (must exist in the domain)
domain - the domain where the password should come from.

getExpectedNumberOfObjects

public long getExpectedNumberOfObjects(long objectLimit,
                                       long byteLimit)
Gets the best expectation for how many objects a harvest using this configuration will retrieve, given a job with a maximum limit per domain.

Parameters:
objectLimit - The maximum limit, or Constants.HERITRIX_MAXOBJECTS_INFINITY for no limit. This limit overrides the limit set on the configuration, unless override is in effect.
byteLimit - The maximum number of bytes that will be used as limit in the harvest. This limit overrides the limit set on the configuration, unless override is in effect.
Returns:
The expected number of objects.

minObjectsBytesLimit

public long minObjectsBytesLimit(long objectLimit,
                                 long byteLimit,
                                 long expectedObjectSize)
Return the lowest limit for the two values, or MAX_DOMAIN_SIZE if both are infinite, which is the max size we harvest from this domain.

Parameters:
objectLimit - A long value defining an object limit, or 0 for infinite
byteLimit - A long value defining a byte limit, or HarvesterSettings.MAX_DOMAIN_SIZE for infinite.
expectedObjectSize - The expected number of bytes per object
Returns:
The lowest of the two boundaries, or MAX_DOMAIN_SIZE if both are unlimited.
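
One way to read "the lowest of the two boundaries" is to convert the byte limit into an object count using expectedObjectSize and then take the minimum. The sketch below follows that reading and is not the actual implementation. The sentinel INFINITE and the value of MAX_DOMAIN_SIZE are placeholders chosen for the example; the real "infinite" markers are the ones named in the parameter descriptions above.

```java
/** Hypothetical sketch of picking the lower of an object limit and a byte limit. */
final class MinLimitSketch {
    /** Placeholder "no limit" marker; the real constants may differ. */
    static final long INFINITE = -1L;
    /** Placeholder for the configured MAX_DOMAIN_SIZE fallback. */
    static final long MAX_DOMAIN_SIZE = 5_000L;

    static long minObjectsBytesLimit(long objectLimit, long byteLimit,
                                     long expectedObjectSize) {
        if (expectedObjectSize <= 0) {
            throw new IllegalArgumentException("expectedObjectSize must be positive");
        }
        boolean objectsInfinite = objectLimit == INFINITE;
        boolean bytesInfinite = byteLimit == INFINITE;
        if (objectsInfinite && bytesInfinite) {
            return MAX_DOMAIN_SIZE; // neither limit applies
        }
        if (bytesInfinite) {
            return objectLimit; // only the object limit applies
        }
        // Express the byte limit as a number of objects.
        long objectsAllowedByBytes = byteLimit / expectedObjectSize;
        if (objectsInfinite) {
            return objectsAllowedByBytes; // only the byte limit applies
        }
        return Math.min(objectLimit, objectsAllowedByBytes);
    }
}
```

Under these assumptions the result is always an object count, which is the unit getExpectedNumberOfObjects works in.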

setComments

public void setComments(java.lang.String comments)
Set the comments field.

Parameters:
comments - User-entered free-form comments.

removePassword

public void removePassword(java.lang.String passwordName)
Remove a password from the list of passwords used in this domain.

Parameters:
passwordName - Name of the password to remove.

usesPassword

public boolean usesPassword(java.lang.String passwordName)
Check whether this domain uses a given password.

Parameters:
passwordName - The name of the password to check
Returns:
whether the given password is used

setSeedLists

public void setSeedLists(Domain domain,
                         java.util.List<SeedList> newSeedlists)
Sets the used seedlists to the given list. Note: list is copied.

Parameters:
newSeedlists - The seedlists to use.
domain - The domain where the seedlists should come from
Throws:
ArgumentNotValid - if the seedlists are null
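
The note "list is copied" means that later changes to the caller's list do not affect the configuration. A minimal sketch of that copy-on-set behaviour follows, using plain strings in place of SeedList objects and omitting the Domain argument. The class CopyOnSetSketch is hypothetical, and IllegalArgumentException stands in for ArgumentNotValid.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in showing a setter that copies its list argument. */
final class CopyOnSetSketch {
    private List<String> seedlists = new ArrayList<>();

    void setSeedLists(List<String> newSeedlists) {
        if (newSeedlists == null) {
            // The documented class throws ArgumentNotValid in this case.
            throw new IllegalArgumentException("newSeedlists must not be null");
        }
        // Copy, so the caller's list and the internal list are independent.
        this.seedlists = new ArrayList<>(newSeedlists);
    }

    /** Exposes the seedlists as an iterator, mirroring getSeedLists(). */
    Iterator<String> getSeedLists() {
        return seedlists.iterator();
    }
}
```

With this approach, adding an element to the list passed to setSeedLists afterwards leaves the iterator's contents unchanged.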

setPasswords

public void setPasswords(Domain domain,
                         java.util.List<Password> newPasswords)
Sets the used passwords to the given list. Note: list is copied.

Parameters:
newPasswords - The passwords to use.
domain - The domain where the passwords should come from
Throws:
ArgumentNotValid - if the passwords are null

getID

long getID()
Get the ID of this configuration. Only for use by DBDAO

Returns:
the ID of this configuration

setID

void setID(long anId)
Set the ID of this configuration. Only for use by DBDAO

Parameters:
anId - use this id for this configuration

hasID

boolean hasID()
Check if this configuration has an ID set yet (doesn't happen until the DBDAO persists it).

Returns:
true, if the configuration has an ID

toString

public java.lang.String toString()
ToString of DomainConfiguration class.

Overrides:
toString in class java.lang.Object
Returns:
a string with info about the instance of this class.

setCrawlertraps

public void setCrawlertraps(java.util.List<java.lang.String> someCrawlertraps)
Set the crawlertraps for this configuration.

Parameters:
someCrawlertraps - a list of crawlertraps

getCrawlertraps

public java.util.List<java.lang.String> getCrawlertraps()
Returns:
the known crawlertraps for this configuration.

getDomainhistory

public DomainHistory getDomainhistory()
Returns:
the domainhistory for this configuration

setDomainhistory

public void setDomainhistory(DomainHistory newDomainhistory)
Set the domainHistory for this configuration.

Parameters:
newDomainhistory - the new domainHistory for this configuration (null is accepted for no history)