Class Domain
java.lang.Object
  dk.netarkivet.harvester.datamodel.extendedfield.ExtendableEntity
    dk.netarkivet.harvester.datamodel.Domain
All Implemented Interfaces:
Named
public class Domain extends ExtendableEntity implements Named
Represents known information about a domain. A domain is identified by a domain name (e.g. kb.dk).
The following information is used to control how a domain is harvested: seedlists, configurations and passwords. Each seedlist defines one or more URLs that the harvester should use as starting points. A configuration defines a specific combination of settings (seedlist, harvester settings, passwords) that should be used during a harvest. Passwords define user names and passwords that might be used for the domain.
Information about previous harvests of this domain is available via the domainHistory.
Information from the domain registrant (DK-HOSTMASTER) about the domain registration is available in the registration. This includes the dates on which the domain was known to exist (that is, was included in a domain list), together with domain owner information.
Note that each configuration references one of the seedlists by name, and possibly one of the passwords.
-
-
Field Summary
Fields (modifier and type, field, description):
protected static org.slf4j.Logger log - The logger for this class.
Fields inherited from class dk.netarkivet.harvester.datamodel.extendedfield.ExtendableEntity:
extendedFieldValues
-
-
Method Summary
Methods (modifier and type, method, description):
void addConfiguration(DomainConfiguration cfg) - Adds a new configuration to the domain.
void addOwnerInfo(DomainOwnerInfo owner) - Add owner information.
void addPassword(Password password) - Adds a password to the domain.
void addSeedList(SeedList seedlist) - Adds a seed list to the domain.
AliasInfo getAliasInfo() - Returns the alias info for this domain, or null if this domain is not an alias.
Iterator<DomainConfiguration> getAllConfigurations() - Gets all configurations belonging to this domain.
List<DomainConfiguration> getAllConfigurationsAsSortedList(Locale loc) - Gets all configurations belonging to this domain.
DomainOwnerInfo[] getAllDomainOwnerInfo() - Get array of domain owner information.
Iterator<Password> getAllPasswords() - Return the passwords defined for this domain.
List<Password> getAllPasswordsAsSortedList(Locale loc) - Returns the passwords defined for this domain.
Iterator<SeedList> getAllSeedLists() - Get all seedlists belonging to this domain.
List<SeedList> getAllSeedListsAsSortedList(Locale loc) - Gets all seedlists belonging to this domain.
HarvestInfo getBestHarvestInfoExpectation(String configName) - Gets the harvest info that gives the best expectation of how many objects a harvest using the given configuration will retrieve. The most recent full harvest is prioritised.
String getComments() - Get the comment of this object.
DomainConfiguration getConfiguration(String cfgName) - Returns an already registered configuration.
List<String> getCrawlerTraps() - Returns the list of regexps never to be harvested from this domain, or the empty list if none.
DomainConfiguration getDefaultConfiguration() - Gets the default configuration.
static Domain getDefaultDomain(String domainName) - Get a new domain, initialised with default values.
long getEdition() - Get the edition number.
protected int getExtendedFieldType() - All derived classes allow ExtendedFields from Type ExtendedFieldTypes.DOMAIN.
DomainHistory getHistory() - Get the domain history.
long getID() - Get the ID of this domain.
String getName() - Gets the name of this domain.
Password getPassword(String name) - Get password information.
SeedList getSeedList(String name) - Get a specific seedlist previously added to this domain.
boolean hasConfiguration(String configName) - Returns true if this domain has the named configuration.
boolean hasPassword(String passwordName) - Returns true if this domain has the named password.
boolean hasSeedList(String name) - Return true if the named seedlist exists in this domain.
void removeConfiguration(String configName) - Removes a configuration from this domain.
void removePassword(String name) - Removes a password from this Domain.
void removeSeedList(String name) - Removes a seedlist from this Domain.
void setComments(String comments) - Set the comments for this domain.
void setCrawlerTraps(List<String> regExps, boolean strictMode) - Sets a list of regular expressions defining URLs that should never be harvested from this domain.
void setDefaultConfiguration(String cfgName) - Mark a configuration as the default configuration to use.
void setEdition(long theNewEdition) - Set the edition number.
String toString() - Return a human-readable representation of this object.
void updateAlias(String alias) - Update which domain this domain is considered an alias of.
void updateConfiguration(DomainConfiguration cfg) - Replaces existing configuration with cfg, using cfg.getName() as the id for the configuration.
void updatePassword(Password password) - Updates a password on the domain.
void updateSeedList(SeedList seedlist) - Update a seed list to the domain.
Methods inherited from class dk.netarkivet.harvester.datamodel.extendedfield.ExtendableEntity
addExtendedFieldValue, addExtendedFieldValues, getExtendedFieldValue, getExtendedFieldValues, setExtendedFieldValues, updateExtendedFieldValue
-
-
-
-
Constructor Detail
-
Domain
protected Domain(String theDomainName)
Create a new instance of a domain. It is generally recommended that getDefaultDomain is used instead of this constructor.
Parameters:
theDomainName - Name used to reference the domain
Throws:
ArgumentNotValid - if the domain name is null or empty, or if it does not match the regex for valid domains
-
-
Method Detail
-
getDefaultDomain
public static Domain getDefaultDomain(String domainName)
Get a new domain, initialised with default values.
Parameters:
domainName - The name of the domain
Returns:
a domain with the given name
Throws:
ArgumentNotValid - if domainName is null or empty
-
addConfiguration
public void addConfiguration(DomainConfiguration cfg)
Adds a new configuration to the domain. If this is the first configuration added, it becomes the default configuration. The seedlist referenced by the configuration must already be registered in this domain, otherwise an UnknownID exception is thrown.
Parameters:
cfg - the configuration that is added
Throws:
UnknownID - if the name of the seedlist referenced by cfg is unknown
PermissionDenied - if a configuration with the same name already exists
ArgumentNotValid - if null supplied
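The rules above can be illustrated with a small self-contained sketch. It is not the NetarchiveSuite implementation: AddConfigurationSketch, its string-based method signatures, and the nested exception classes are hypothetical stand-ins, and the order in which the checks run is an assumption.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for Domain that models only the addConfiguration rules. */
final class AddConfigurationSketch {
    /** Stand-in for the UnknownID exception named in the documentation. */
    static final class UnknownID extends RuntimeException {
        UnknownID(String message) { super(message); }
    }

    /** Stand-in for the PermissionDenied exception named in the documentation. */
    static final class PermissionDenied extends RuntimeException {
        PermissionDenied(String message) { super(message); }
    }

    private final Set<String> seedListNames = new HashSet<>();
    // Configuration name -> name of the seedlist it references.
    private final Map<String, String> configurations = new LinkedHashMap<>();
    private String defaultConfigName;

    void addSeedList(String name) {
        seedListNames.add(name);
    }

    void addConfiguration(String cfgName, String seedListName) {
        if (cfgName == null || seedListName == null) {
            // The documented exception is ArgumentNotValid; a JDK type is used here.
            throw new IllegalArgumentException("null supplied");
        }
        if (configurations.containsKey(cfgName)) {
            throw new PermissionDenied("Configuration already exists: " + cfgName);
        }
        if (!seedListNames.contains(seedListName)) {
            throw new UnknownID("Unknown seedlist: " + seedListName);
        }
        configurations.put(cfgName, seedListName);
        if (defaultConfigName == null) {
            // The first configuration added becomes the default.
            defaultConfigName = cfgName;
        }
    }

    String getDefaultConfigurationName() {
        if (defaultConfigName == null) {
            throw new UnknownID("No configurations exist");
        }
        return defaultConfigName;
    }
}
```

In this sketch, adding "first" and then "second" against a registered seedlist leaves "first" as the default, and adding a configuration that names an unregistered seedlist fails with UnknownID.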
-
addSeedList
public void addSeedList(SeedList seedlist)
Adds a seed list to the domain.
Parameters:
seedlist - the seedlist to add
Throws:
ArgumentNotValid - if an argument is null
PermissionDenied - if a seedlist with the same name already exists
-
updateSeedList
public void updateSeedList(SeedList seedlist)
Updates a seed list in the domain. Replaces an existing seedlist with the same name.
Parameters:
seedlist - the seedlist that replaces the existing one
Throws:
ArgumentNotValid - if an argument is null
UnknownID - if no seedlist named seedlist.getName() exists
-
addPassword
public void addPassword(Password password)
Adds a password to the domain.
Parameters:
password - A password object to add.
Throws:
ArgumentNotValid - if the argument is null
PermissionDenied - if a password already exists with this name
-
updatePassword
public void updatePassword(Password password)
Updates a password on the domain.
Parameters:
password - A password object to update.
Throws:
ArgumentNotValid - if the argument is null
PermissionDenied - if no password exists with this name
-
setDefaultConfiguration
public void setDefaultConfiguration(String cfgName)
Mark a configuration as the default configuration to use. The configuration name must match an already added configuration, otherwise an UnknownID exception is thrown.
Parameters:
cfgName - the name of a configuration
Throws:
UnknownID - when cfgName does not match an added configuration
ArgumentNotValid - if cfgName is null or empty
-
getConfiguration
public DomainConfiguration getConfiguration(String cfgName)
Returns an already registered configuration.
Parameters:
cfgName - the name of a registered configuration
Returns:
the configuration
Throws:
UnknownID - if the name is not a registered configuration
ArgumentNotValid - if cfgName is null or empty
-
getDefaultConfiguration
public DomainConfiguration getDefaultConfiguration()
Gets the default configuration. If no configuration has been explicitly set, the first configuration added to this domain is returned. If no configurations have been added at all, an UnknownID exception is thrown.
Returns:
the default configuration (never null)
Throws:
UnknownID - if no configurations exist
-
getName
public String getName()
Gets the name of this domain.
-
getComments
public String getComments()
Description copied from interface: Named
Get the comment of this object.
Specified by:
getComments in interface Named
Returns:
the domain comments.
-
getHistory
public DomainHistory getHistory()
Get the domain history.
Returns:
the domain history
-
getSeedList
public SeedList getSeedList(String name)
Get a specific seedlist previously added to this domain.
Parameters:
name - the name of the seedlist to return
Returns:
the specified seedlist
Throws:
ArgumentNotValid - if name is null or empty
UnknownID - if no seedlist has been added with the supplied name
-
hasSeedList
public boolean hasSeedList(String name)
Return true if the named seedlist exists in this domain.
Parameters:
name - String representing a possible seedlist for the domain.
Returns:
true, if the named seedlist exists in this domain
-
removeSeedList
public void removeSeedList(String name)
Removes a seedlist from this Domain. The seedlist must not be in use by any of the configurations, otherwise a PermissionDenied exception is thrown.
Parameters:
name - the name of the seedlist to remove
Throws:
PermissionDenied - if the seedlist is in use by a configuration or this is the last seedlist in this Domain
UnknownID - if no seedlist exists with the name
ArgumentNotValid - if a null argument is supplied
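The removal guards listed above can be sketched as follows. This is an illustration under stated assumptions, not the library code: SeedListRemovalSketch and its exception classes are hypothetical, configurations are reduced to a name-to-seedlist map, and the order of the checks is a guess.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for Domain that models only the removeSeedList guards. */
final class SeedListRemovalSketch {
    /** Stand-in for the PermissionDenied exception named in the documentation. */
    static final class PermissionDenied extends RuntimeException {
        PermissionDenied(String message) { super(message); }
    }

    /** Stand-in for the UnknownID exception named in the documentation. */
    static final class UnknownID extends RuntimeException {
        UnknownID(String message) { super(message); }
    }

    private final Set<String> seedListNames;
    // Configuration name -> name of the seedlist it references.
    private final Map<String, String> configToSeedList;

    SeedListRemovalSketch(Set<String> seedListNames, Map<String, String> configToSeedList) {
        this.seedListNames = new HashSet<>(seedListNames);
        this.configToSeedList = new HashMap<>(configToSeedList);
    }

    void removeSeedList(String name) {
        if (name == null) {
            // The documented exception is ArgumentNotValid; a JDK type is used here.
            throw new IllegalArgumentException("name must not be null");
        }
        if (!seedListNames.contains(name)) {
            throw new UnknownID("No seedlist named " + name);
        }
        if (seedListNames.size() == 1) {
            throw new PermissionDenied("Cannot remove the last seedlist");
        }
        if (configToSeedList.containsValue(name)) {
            throw new PermissionDenied("Seedlist is in use by a configuration: " + name);
        }
        seedListNames.remove(name);
    }

    boolean hasSeedList(String name) {
        return seedListNames.contains(name);
    }
}
```

The documentation for removePassword describes the same pattern for passwords.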
-
removePassword
public void removePassword(String name)
Removes a password from this Domain. The password must not be in use by any of the configurations, otherwise a PermissionDenied exception is thrown.
Parameters:
name - the name of the password to remove
Throws:
PermissionDenied - if the password is in use by a configuration or this is the last password in this Domain
UnknownID - if no password exists with the name
ArgumentNotValid - if a null argument is supplied
-
removeConfiguration
public void removeConfiguration(String configName)
Removes a configuration from this domain. The default configuration cannot be removed; attempting to do so throws PermissionDenied. It is also not possible to remove a configuration that is referenced by one or more HarvestDefinitions.
Parameters:
configName - The name of a configuration to remove.
Throws:
ArgumentNotValid - if configName is null or empty
PermissionDenied - if an attempt is made to remove the default configuration, or if one or more HarvestDefinitions reference the configuration
-
getAllConfigurations
public Iterator<DomainConfiguration> getAllConfigurations()
Gets all configurations belonging to this domain.
Returns:
all configurations belonging to this domain.
-
getAllSeedLists
public Iterator<SeedList> getAllSeedLists()
Get all seedlists belonging to this domain.
Returns:
all seedlists belonging to this domain
-
getAllPasswords
public Iterator<Password> getAllPasswords()
Return the passwords defined for this domain.
Returns:
an Iterator of known passwords.
-
getAllConfigurationsAsSortedList
public List<DomainConfiguration> getAllConfigurationsAsSortedList(Locale loc)
Gets all configurations belonging to this domain. The returned list is sorted by name according to the language given in the loc parameter.
Parameters:
loc - the locale whose language the sorting must adhere to
Returns:
all configurations belonging to this domain, sorted according to language
-
getAllSeedListsAsSortedList
public List<SeedList> getAllSeedListsAsSortedList(Locale loc)
Gets all seedlists belonging to this domain. The returned list is sorted by name according to the language given in the loc parameter.
Parameters:
loc - the locale whose language the sorting must adhere to
Returns:
all seedlists belonging to this domain, sorted according to language
-
getAllPasswordsAsSortedList
public List<Password> getAllPasswordsAsSortedList(Locale loc)
Returns the passwords defined for this domain. The returned list is sorted by name according to the language given in the loc parameter.
Parameters:
loc - the locale whose language the sorting must adhere to
Returns:
a list of known passwords, sorted according to language
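Sorting names "according to language" is typically done in Java with java.text.Collator. The sketch below shows that technique on plain strings. Whether the three AsSortedList methods use a Collator internally is an assumption, and SortedNamesSketch and sortedByLocale are hypothetical names.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper showing locale-aware sorting of names. */
final class SortedNamesSketch {
    /** Returns a new list holding the names in the collation order of the given locale. */
    static List<String> sortedByLocale(List<String> names, Locale loc) {
        // Collator implements Comparator<Object>, so it can be passed to List.sort.
        Collator collator = Collator.getInstance(loc);
        List<String> copy = new ArrayList<>(names); // leave the caller's list untouched
        copy.sort(collator);
        return copy;
    }
}
```

Unlike the natural String ordering, which places every uppercase ASCII letter before every lowercase one, a Collator compares letters first and case second, so "apple" sorts before "Banana".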
-
addOwnerInfo
public void addOwnerInfo(DomainOwnerInfo owner)
Add owner information.
Parameters:
owner - the owner information to add
-
getAllDomainOwnerInfo
public DomainOwnerInfo[] getAllDomainOwnerInfo()
Get array of domain owner information.
Returns:
array containing information about the domain owner(s)
-
getPassword
public Password getPassword(String name)
Get password information.
Parameters:
name - the id of the password settings to retrieve
Returns:
the password information
Throws:
UnknownID - if no password info exists with the id "name"
-
setComments
public void setComments(String comments)
Set the comments for this domain.
Parameters:
comments - The new comments (can be null)
-
updateConfiguration
public void updateConfiguration(DomainConfiguration cfg)
Replaces existing configuration with cfg, using cfg.getName() as the id for the configuration.
Parameters:
cfg - the configuration to update
Throws:
UnknownID - if no configuration exists with the id cfg.getName()
ArgumentNotValid - if cfg is null
-
hasPassword
public boolean hasPassword(String passwordName)
Returns true if this domain has the named password.
Parameters:
passwordName - the identifier of the password info
Returns:
true if this domain has password info with id passwordName
-
hasConfiguration
public boolean hasConfiguration(String configName)
Returns true if this domain has the named configuration.
Parameters:
configName - the identifier of the configuration
Returns:
true if this domain has a configuration with id configName
-
getEdition
public long getEdition()
Get the edition number.
Returns:
the edition number
-
setEdition
public void setEdition(long theNewEdition)
Set the edition number.
Parameters:
theNewEdition - the new edition
-
getID
public long getID()
Get the ID of this domain. Only for use by DBDAO.
Returns:
the ID of this domain
-
toString
public String toString()
Return a human-readable representation of this object.
-
setCrawlerTraps
public void setCrawlerTraps(List<String> regExps, boolean strictMode)
Sets a list of regular expressions defining URLs that should never be harvested from this domain. The strings are trimmed, any empty strings are removed, and the result is copied to a list that is stored immutably.
Parameters:
regExps - The list defining URLs never to be harvested.
strictMode - If true, an ArgumentNotValid exception is thrown if invalid regexps are found
Throws:
ArgumentNotValid - if regExps is null, or if regExps contains invalid regular expressions and strictMode is true.
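The clean-up steps described for setCrawlerTraps (trim, drop empty strings, validate, store immutably) can be sketched with the standard regex classes. CrawlerTrapSketch and cleanTraps are hypothetical names, IllegalArgumentException stands in for ArgumentNotValid, and keeping invalid expressions when strictMode is false is an assumption, since the documentation does not say what happens to them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper modelling the crawler trap clean-up steps. */
final class CrawlerTrapSketch {
    static List<String> cleanTraps(List<String> regExps, boolean strictMode) {
        if (regExps == null) {
            // The documented exception is ArgumentNotValid; a JDK type is used here.
            throw new IllegalArgumentException("regExps must not be null");
        }
        List<String> cleaned = new ArrayList<>();
        for (String raw : regExps) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue; // empty entries are dropped
            }
            try {
                Pattern.compile(trimmed); // validation only; the compiled pattern is discarded
            } catch (PatternSyntaxException e) {
                if (strictMode) {
                    throw new IllegalArgumentException("Invalid regexp: " + trimmed, e);
                }
                // Assumption: in non-strict mode the invalid entry is kept as-is.
            }
            cleaned.add(trimmed);
        }
        // Callers get a read-only view, matching the "stored immutably" wording.
        return Collections.unmodifiableList(cleaned);
    }
}
```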
-
getCrawlerTraps
public List<String> getCrawlerTraps()
Returns the list of regexps never to be harvested from this domain, or the empty list if none. The returned list is never null.
Returns:
The list of regexps of URLs never to be harvested when harvesting this domain. This list is immutable.
-
getAliasInfo
public AliasInfo getAliasInfo()
Returns the alias info for this domain, or null if this domain is not an alias.
Returns:
the alias info for this domain, or null if this domain is not an alias
-
updateAlias
public void updateAlias(String alias)
Update which domain this domain is considered an alias of. Calling this function will a) cause some slightly expensive checks to be performed, and b) set the time of last update. For object construction and copying, use setAlias.
Parameters:
alias - The name (e.g. "netarkivet.dk") of the domain that this domain is an alias of.
Throws:
UnknownID - If the given domain does not exist
IllegalState - If updating the alias info would violate the constraints on aliases: no transitivity, no reflection.
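One reading of the two alias constraints is sketched below: "no reflection" means a domain cannot be an alias of itself, and "no transitivity" means alias chains are not allowed. That interpretation is an assumption. AliasRulesSketch is a hypothetical registry keyed by domain name, and the JDK exceptions NoSuchElementException and IllegalStateException stand in for UnknownID and IllegalState.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical registry modelling the alias constraints as read above. */
final class AliasRulesSketch {
    private final Set<String> knownDomains;
    // Domain name -> name of the domain it is an alias of.
    private final Map<String, String> aliasOf = new HashMap<>();

    AliasRulesSketch(Collection<String> knownDomains) {
        this.knownDomains = new HashSet<>(knownDomains);
    }

    void updateAlias(String domain, String target) {
        if (!knownDomains.contains(target)) {
            throw new NoSuchElementException("Unknown domain: " + target);
        }
        if (domain.equals(target)) {
            throw new IllegalStateException("No reflection: a domain cannot alias itself");
        }
        if (aliasOf.containsKey(target)) {
            throw new IllegalStateException("No transitivity: " + target + " is itself an alias");
        }
        if (aliasOf.containsValue(domain)) {
            throw new IllegalStateException("No transitivity: other domains are aliases of " + domain);
        }
        aliasOf.put(domain, target);
    }

    /** Returns the alias target of the domain, or null if it is not an alias. */
    String getAliasOf(String domain) {
        return aliasOf.get(domain);
    }
}
```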
-
getBestHarvestInfoExpectation
public HarvestInfo getBestHarvestInfoExpectation(String configName)
Gets the harvest info that gives the best expectation of how many objects a harvest using the given configuration will retrieve. The most recent full harvest is prioritised.
Parameters:
configName - The name of the configuration
Returns:
The Harvest Information for the harvest defining the best expectation, including the number retrieved and the stop reason.
-
getExtendedFieldType
protected int getExtendedFieldType()
All derived classes allow ExtendedFields from Type ExtendedFieldTypes.DOMAIN.
Specified by:
getExtendedFieldType in class ExtendableEntity
Returns:
ExtendedFieldTypes.DOMAIN
-
-