Class DomainConfiguration
- java.lang.Object
- dk.netarkivet.harvester.datamodel.DomainConfiguration
- All Implemented Interfaces:
Named
public class DomainConfiguration extends java.lang.Object implements Named
This class describes a configuration for harvesting a domain. It combines a number of seedlists, a number of passwords, an order template, and some specialised settings to define the way to harvest a domain.
-
-
Constructor Summary
Constructor Description
DomainConfiguration(java.lang.String theConfigName, Domain domain, java.util.List<SeedList> seedlists, java.util.List<Password> passwords)
Create a new configuration for a domain.
DomainConfiguration(java.lang.String theConfigName, java.lang.String domainName, DomainHistory history, java.util.List<java.lang.String> crawlertraps, java.util.List<SeedList> seedlists, java.util.List<Password> passwords)
Alternate constructor.
-
Method Summary
Modifier and Type, Method, Description
void addPassword(Domain domain, Password password)
Add password to the configuration.
void addSeedList(Domain domain, SeedList seedlist)
Add a new seedlist to the configuration.
static java.lang.String cfgToString(DomainConfiguration cfg)
java.util.List<EAV.AttributeAndType> getAttributesAndTypes()
Get this configuration's EAV attributes and attribute types.
java.lang.String getComments()
Returns comments.
java.util.List<java.lang.String> getCrawlertraps()
DomainHistory getDomainhistory()
java.lang.String getDomainName()
Returns the name of the domain aggregating this configuration.
long getExpectedNumberOfObjects(long objectLimit, long byteLimit)
Gets the best expectation for how many objects a harvest using this configuration will retrieve, given a job with a maximum limit per domain.
java.lang.Long getID()
Get the ID of this configuration.
long getMaxBytes()
Returns the maximum number of bytes to download during a single harvest of a domain.
long getMaxObjects()
Returns the maximum number of objects to harvest from the domain.
int getMaxRequestRate()
Returns the maximum request rate to use when harvesting the domain.
java.lang.String getName()
Get the configuration name.
java.lang.String getOrderXmlName()
Returns the name of the order xml file used by the domain.
java.util.Iterator<Password> getPasswords()
Get an iterator of passwords used in this configuration.
java.util.Iterator<SeedList> getSeedLists()
Get an iterator of seedlists used in this configuration.
long minObjectsBytesLimit(long objectLimit, long byteLimit, long expectedObjectSize)
Return the lowest limit for the two values, or MAX_DOMAIN_SIZE if both are infinite, which is the max size we harvest from this domain.
void removePassword(java.lang.String passwordName)
Remove a password from the list of passwords used in this domain.
void setAttributesAndTypes(java.util.List<EAV.AttributeAndType> attributesAndTypes)
Set this configuration's EAV attributes and attribute types.
void setComments(java.lang.String comments)
Set the comments field.
void setCrawlertraps(java.util.List<java.lang.String> someCrawlertraps)
Set the crawlertraps for this configuration.
void setDomainhistory(DomainHistory newDomainhistory)
Set the domainHistory for this configuration.
void setMaxBytes(long maxBytes)
Specify the maximum number of bytes to download from a domain in a single harvest.
void setMaxObjects(long max)
Specify the maximum number of objects to retrieve from the domain.
void setMaxRequestRate(int maxrate)
Specify the maximum request rate to use when harvesting data.
void setName(java.lang.String configName)
Change the name of configuration to the given configName.
void setOrderXmlName(java.lang.String ordername)
Specify the name of the order.xml template to use.
void setPasswords(Domain domain, java.util.List<Password> newPasswords)
Sets the used passwords to the given list.
void setSeedLists(Domain domain, java.util.List<SeedList> newSeedlists)
Sets the used seedlists to the given list.
java.lang.String toString()
ToString of DomainConfiguration class.
boolean usesPassword(java.lang.String passwordName)
Check whether this domain uses a given password.
-
-
-
Constructor Detail
-
DomainConfiguration
public DomainConfiguration(java.lang.String theConfigName, Domain domain, java.util.List<SeedList> seedlists, java.util.List<Password> passwords)
Create a new configuration for a domain.
- Parameters:
theConfigName
- The name of this configuration
domain
- The domain that this configuration is for
seedlists
- Seedlists to use in this configuration.
passwords
- Passwords to use in this configuration.
-
DomainConfiguration
public DomainConfiguration(java.lang.String theConfigName, java.lang.String domainName, DomainHistory history, java.util.List<java.lang.String> crawlertraps, java.util.List<SeedList> seedlists, java.util.List<Password> passwords)
Alternate constructor. TODO: Filter all history not relevant for this configuration.
- Parameters:
theConfigName
- The name of this configuration
domainName
- The name of the domain that this configuration is for
history
- The domainhistory of the given domain
crawlertraps
- The crawlertraps of the given domain
seedlists
- Seedlists to use in this configuration
passwords
- Passwords to use in this configuration.
-
-
Method Detail
-
cfgToString
public static java.lang.String cfgToString(DomainConfiguration cfg)
-
setOrderXmlName
public void setOrderXmlName(java.lang.String ordername)
Specify the name of the order.xml template to use.
- Parameters:
ordername
- order.xml template name
- Throws:
ArgumentNotValid
- if ordername is null or empty
-
setMaxObjects
public void setMaxObjects(long max)
Specify the maximum number of objects to retrieve from the domain.
- Parameters:
max
- maximum number of objects to retrieve
- Throws:
ArgumentNotValid
- if max < -1
-
setMaxRequestRate
public void setMaxRequestRate(int maxrate)
Specify the maximum request rate to use when harvesting data.
- Parameters:
maxrate
- the maximum request rate
- Throws:
ArgumentNotValid
- if maxrate < 0
-
setMaxBytes
public void setMaxBytes(long maxBytes)
Specify the maximum number of bytes to download from a domain in a single harvest.
- Parameters:
maxBytes
- Maximum number of bytes to download, or -1 for no limit.
- Throws:
ArgumentNotValid
- if maxBytes < -1
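The three limit setters above share one contract: values below the documented minimum are rejected, and -1 is the documented "no limit" value for bytes. The following is a minimal sketch of that contract only. `LimitHolder` is a hypothetical stand-in, not the NetarchiveSuite class; `IllegalArgumentException` replaces `ArgumentNotValid`; the initial field values are assumptions.

```java
// Hypothetical stand-in for the limit fields of a configuration.
// It only mirrors the documented argument checks.
class LimitHolder {
    /** Sentinel meaning "no limit", as documented for setMaxBytes. */
    static final long NO_LIMIT = -1L;

    // Initial values are assumptions made for this sketch.
    private long maxObjects = NO_LIMIT;
    private long maxBytes = NO_LIMIT;
    private int maxRequestRate = 0;

    void setMaxObjects(long max) {
        // Documented rule: reject max < -1.
        if (max < -1) {
            throw new IllegalArgumentException("max must be >= -1, was " + max);
        }
        this.maxObjects = max;
    }

    void setMaxBytes(long maxBytes) {
        // Documented rule: reject maxBytes < -1; -1 means no limit.
        if (maxBytes < -1) {
            throw new IllegalArgumentException("maxBytes must be >= -1, was " + maxBytes);
        }
        this.maxBytes = maxBytes;
    }

    void setMaxRequestRate(int maxrate) {
        // Documented rule: reject maxrate < 0.
        if (maxrate < 0) {
            throw new IllegalArgumentException("maxrate must be >= 0, was " + maxrate);
        }
        this.maxRequestRate = maxrate;
    }

    long getMaxObjects() { return maxObjects; }
    long getMaxBytes() { return maxBytes; }
    int getMaxRequestRate() { return maxRequestRate; }
}
```

Calling `setMaxBytes(-1)` on this sketch is accepted and stores the sentinel, while `setMaxBytes(-2)` throws.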
-
getName
public java.lang.String getName()
Get the configuration name.
-
getComments
public java.lang.String getComments()
Returns comments.
- Specified by:
getComments in interface Named
- Returns:
- string containing comments
-
getOrderXmlName
public java.lang.String getOrderXmlName()
Returns the name of the order xml file used by the domain.
- Returns:
- name of the order.xml file that should be used when harvesting the domain
-
getMaxObjects
public long getMaxObjects()
Returns the maximum number of objects to harvest from the domain.
- Returns:
- maximum number of objects to harvest
-
getMaxRequestRate
public int getMaxRequestRate()
Returns the maximum request rate to use when harvesting the domain.
- Returns:
- maximum request rate
-
getMaxBytes
public long getMaxBytes()
Returns the maximum number of bytes to download during a single harvest of a domain.
- Returns:
- Maximum bytes limit, or -1 for no limit.
-
getDomainName
public java.lang.String getDomainName()
Returns the name of the domain aggregating this configuration.
- Returns:
- the name of the domain aggregating this configuration.
-
getSeedLists
public java.util.Iterator<SeedList> getSeedLists()
Get an iterator of seedlists used in this configuration.
- Returns:
- seedlists as iterator
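Because getSeedLists() and getPasswords() return a java.util.Iterator rather than a collection, callers that need a size or repeated traversal have to drain the iterator first. A minimal sketch of such a helper follows; `IteratorDrain` and `toList` are hypothetical names and are not part of NetarchiveSuite.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: copies the remaining elements of an iterator into a list.
class IteratorDrain {
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

With the real class, a call might look like `IteratorDrain.toList(cfg.getSeedLists())`, where `cfg` is a DomainConfiguration.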
-
addSeedList
public void addSeedList(Domain domain, SeedList seedlist)
Add a new seedlist to the configuration. The seedlist must exist in the associated domain and be equal to that seedlist.
- Parameters:
seedlist
- the seedlist to add
domain
- The domain to check if the seedlist exists
- Throws:
ArgumentNotValid
- if the seedlist is null
UnknownID
- if the seedlist is not defined on the domain
PermissionDenied
- if the seedlist is different from the one on the domain.
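The three documented failure cases can be read as an ordered sequence of checks: null argument, unknown name, then content mismatch. The sketch below illustrates that order under stated assumptions. The domain is modelled as a plain map from seedlist name to seed text, and the nested exception classes are local stand-ins that only borrow the names from the Throws list above. None of this is the NetarchiveSuite implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the documented addSeedList checks.
class SeedListChecks {
    // Local stand-ins for the exceptions named in the Throws list.
    static class ArgumentNotValid extends RuntimeException {
        ArgumentNotValid(String msg) { super(msg); }
    }
    static class UnknownID extends RuntimeException {
        UnknownID(String msg) { super(msg); }
    }
    static class PermissionDenied extends RuntimeException {
        PermissionDenied(String msg) { super(msg); }
    }

    /** Seedlists attached to this configuration: name -> seed text. */
    private final Map<String, String> configured = new LinkedHashMap<>();

    void addSeedList(Map<String, String> domainSeedLists, String name, String seeds) {
        if (name == null || seeds == null) {
            throw new ArgumentNotValid("seedlist must not be null");
        }
        String onDomain = domainSeedLists.get(name);
        if (onDomain == null) {
            throw new UnknownID("seedlist '" + name + "' is not defined on the domain");
        }
        if (!onDomain.equals(seeds)) {
            throw new PermissionDenied("seedlist '" + name + "' differs from the one on the domain");
        }
        configured.put(name, seeds);
    }

    int size() { return configured.size(); }
}
```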
-
setSeedLists
public void setSeedLists(Domain domain, java.util.List<SeedList> newSeedlists)
Sets the used seedlists to the given list. Note: list is copied.
- Parameters:
newSeedlists
- The seedlists to use.
domain
- The domain where the seedlists should come from
- Throws:
ArgumentNotValid
- if the seedlists are null
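"List is copied" means that later changes to the caller's list do not affect the configuration; the same note appears on setPasswords. A minimal sketch of that defensive-copy pattern follows, using a hypothetical `CopyOnSet` class that stores seedlist names as strings and throws `IllegalArgumentException` in place of `ArgumentNotValid`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a setter that copies its argument.
class CopyOnSet {
    private List<String> seedListNames = new ArrayList<>();

    void setSeedLists(List<String> newSeedlists) {
        if (newSeedlists == null) {
            throw new IllegalArgumentException("newSeedlists must not be null");
        }
        // Copy, so the caller's list and the stored list are independent.
        this.seedListNames = new ArrayList<>(newSeedlists);
    }

    int count() { return seedListNames.size(); }
}
```

After `setSeedLists(list)`, adding to or clearing `list` leaves `count()` unchanged.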
-
getPasswords
public java.util.Iterator<Password> getPasswords()
Get an iterator of passwords used in this configuration.
- Returns:
- The passwords in an iterator
-
addPassword
public void addPassword(Domain domain, Password password)
Add password to the configuration.
- Parameters:
password
- to add (must exist in the domain)
domain
- the domain where the password should come from.
-
getExpectedNumberOfObjects
public long getExpectedNumberOfObjects(long objectLimit, long byteLimit)
Gets the best expectation for how many objects a harvest using this configuration will retrieve, given a job with a maximum limit per domain.
- Parameters:
objectLimit
- The maximum limit, or Constants.HERITRIX_MAXOBJECTS_INFINITY for no limit. This limit overrides the limit set on the configuration, unless override is in effect.
byteLimit
- The maximum number of bytes that will be used as limit in the harvest. This limit overrides the limit set on the configuration, unless override is in effect.
- Returns:
- The expected number of objects.
-
minObjectsBytesLimit
public long minObjectsBytesLimit(long objectLimit, long byteLimit, long expectedObjectSize)
Return the lowest limit for the two values, or MAX_DOMAIN_SIZE if both are infinite, which is the max size we harvest from this domain.
- Parameters:
objectLimit
- A long value defining an object limit, or 0 for infinite
byteLimit
- A long value defining a byte limit, or HarvesterSettings.MAX_DOMAIN_SIZE for infinite.
expectedObjectSize
- The expected number of bytes per object
- Returns:
- The lowest of the two boundaries, or MAX_DOMAIN_SIZE if both are unlimited.
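One way to read "the lowest of the two boundaries" is to convert the byte limit into an object count via expectedObjectSize and then take the smaller count. The sketch below follows that reading and is not the library's code. It assumes a single sentinel of -1 for "infinite" on both limits, which differs from the sentinel values named in the parameter descriptions above, and it uses a made-up MAX_DOMAIN_SIZE value. `LimitSketch` is a hypothetical class.

```java
// Hypothetical sketch of combining an object limit and a byte limit.
class LimitSketch {
    /** Assumed sentinel for "infinite"; an assumption of this sketch. */
    static final long INFINITY = -1L;
    /** Made-up fallback; the real value comes from HarvesterSettings.MAX_DOMAIN_SIZE. */
    static final long MAX_DOMAIN_SIZE = 5000L;

    static long minObjectsBytesLimit(long objectLimit, long byteLimit, long expectedObjectSize) {
        if (expectedObjectSize <= 0) {
            throw new IllegalArgumentException("expectedObjectSize must be positive");
        }
        boolean objectsUnlimited = objectLimit == INFINITY;
        boolean bytesUnlimited = byteLimit == INFINITY;
        if (objectsUnlimited && bytesUnlimited) {
            return MAX_DOMAIN_SIZE;          // both infinite: fall back to the global cap
        }
        if (bytesUnlimited) {
            return objectLimit;              // only the object limit applies
        }
        // Express the byte limit as a number of objects (integer division).
        long objectsByBytes = byteLimit / expectedObjectSize;
        if (objectsUnlimited) {
            return objectsByBytes;           // only the byte limit applies
        }
        return Math.min(objectLimit, objectsByBytes);
    }
}
```

For example, with an object limit of 100, a byte limit of 1,000,000 and an expected object size of 1,000 bytes, the byte limit corresponds to 1,000 objects, so the sketch returns 100.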
-
setComments
public void setComments(java.lang.String comments)
Set the comments field.
- Parameters:
comments
- User-entered free-form comments.
-
removePassword
public void removePassword(java.lang.String passwordName)
Remove a password from the list of passwords used in this domain.
- Parameters:
passwordName
- Name of the password to remove.
-
usesPassword
public boolean usesPassword(java.lang.String passwordName)
Check whether this domain uses a given password.
- Parameters:
passwordName
- The given password
- Returns:
- whether the given password is used
-
setPasswords
public void setPasswords(Domain domain, java.util.List<Password> newPasswords)
Sets the used passwords to the given list. Note: list is copied.
- Parameters:
newPasswords
- The passwords to use.
domain
- The domain where the passwords should come from
- Throws:
ArgumentNotValid
- if the passwords are null
-
getID
public java.lang.Long getID()
Get the ID of this configuration.
- Returns:
- the ID of this configuration
-
toString
public java.lang.String toString()
ToString of DomainConfiguration class.
- Overrides:
toString in class java.lang.Object
- Returns:
- a string with info about the instance of this class.
-
setCrawlertraps
public void setCrawlertraps(java.util.List<java.lang.String> someCrawlertraps)
Set the crawlertraps for this configuration.
- Parameters:
someCrawlertraps
- a list of crawlertraps
-
getCrawlertraps
public java.util.List<java.lang.String> getCrawlertraps()
- Returns:
- the known crawlertraps for this configuration.
-
getDomainhistory
public DomainHistory getDomainhistory()
- Returns:
- the domainhistory for this configuration
-
setDomainhistory
public void setDomainhistory(DomainHistory newDomainhistory)
Set the domainHistory for this configuration.
- Parameters:
newDomainhistory
- the new domainHistory for this configuration (null is accepted for no history)
-
setName
public void setName(java.lang.String configName)
Change the name of the configuration to the given configName.
- Parameters:
configName
- a new name for this configuration.
-
getAttributesAndTypes
public java.util.List<EAV.AttributeAndType> getAttributesAndTypes()
Get this configuration's EAV attributes and attribute types.
- Returns:
- this configuration's EAV attributes and attribute types
-
setAttributesAndTypes
public void setAttributesAndTypes(java.util.List<EAV.AttributeAndType> attributesAndTypes)
Set this configuration's EAV attributes and attribute types.
- Parameters:
attributesAndTypes
- EAV attributes and attribute types
-
-