dk.netarkivet.harvester.datamodel
Class Domain

java.lang.Object
  extended by dk.netarkivet.harvester.datamodel.Domain
All Implemented Interfaces:
Named

public class Domain
extends java.lang.Object
implements Named

Represents known information about a domain. A domain is identified by a domain name (e.g. kb.dk).

The following information is used to control how a domain is harvested: seedlists, configurations and passwords. Each seedlist defines one or more URLs that the harvester should use as starting points. A configuration defines a specific combination of settings (seedlist, harvester settings, passwords) that should be used during a harvest. Passwords define user names and passwords that might be used for the domain.

Information about previous harvests of this domain is available via the domainHistory.

Information from the domain registrant (DK-HOSTMASTER) about the domain registration is available in the registration. This includes the dates on which the domain was known to exist (included in a domain list), together with domain owner information.

Note that each configuration references one of the seedlists by name, and possibly one of the passwords.


Field Summary
(package private)  long edition
          Edition is used by the DAO to keep track of changes.
protected static org.apache.commons.logging.Log log
          The logger for this class.
 
Constructor Summary
protected Domain(java.lang.String theDomainName)
          Create new instance of a domain.
 
Method Summary
 void addConfiguration(DomainConfiguration cfg)
          Adds a new configuration to the domain.
 void addExtendedFieldValue(ExtendedFieldValue aValue)
          Adds a value to the ExtendedFieldValue list.
 void addOwnerInfo(DomainOwnerInfo owner)
          Add owner information.
 void addPassword(Password password)
          Adds a password to the domain.
 void addSeedList(SeedList seedlist)
          Adds a seed list to the domain.
 AliasInfo getAliasInfo()
          Returns the alias info for this domain, or null if this domain is not an alias.
 java.util.Iterator<DomainConfiguration> getAllConfigurations()
          Gets all configurations belonging to this domain.
 java.util.List<DomainConfiguration> getAllConfigurationsAsSortedList(java.util.Locale loc)
          Gets all configurations belonging to this domain.
 DomainOwnerInfo[] getAllDomainOwnerInfo()
          Get array of domain owner information.
 java.util.Iterator<Password> getAllPasswords()
          Return the passwords defined for this domain.
 java.util.List<Password> getAllPasswordsAsSortedList(java.util.Locale loc)
          Returns the passwords defined for this domain.
 java.util.Iterator<SeedList> getAllSeedLists()
          Get all seedlists belonging to this domain.
 java.util.List<SeedList> getAllSeedListsAsSortedList(java.util.Locale loc)
          Gets all seedlists belonging to this domain.
 java.lang.String getComments()
          Get the comment of this object.
 DomainConfiguration getConfiguration(java.lang.String cfgName)
          Returns an already registered configuration.
 java.util.List<java.lang.String> getCrawlerTraps()
          Returns the list of regexps never to be harvested from this domain, or the empty list if none.
 DomainConfiguration getDefaultConfiguration()
          Gets the default configuration.
static Domain getDefaultDomain(java.lang.String domainName)
          Get a new domain, initialised with default values.
 long getEdition()
          Get the edition number.
 ExtendedFieldValue getExtendedFieldValue(java.lang.Long aExtendedFieldId)
          Gets an ExtendedFieldValue by extended field ID.
 java.util.List<ExtendedFieldValue> getExtendedFieldValues()
          Returns a list of all ExtendedFieldValues.
 DomainHistory getHistory()
          Get the domain history.
 long getID()
          Get the ID of this domain.
 java.lang.String getName()
          Gets the name of this domain.
 Password getPassword(java.lang.String name)
          Get password information.
 SeedList getSeedList(java.lang.String name)
          Get a specific seedlist previously added to this domain.
 boolean hasConfiguration(java.lang.String configName)
          Returns true if this domain has the named configuration.
(package private)  boolean hasID()
          Checks whether this domain has an ID set yet (this does not happen until the DBDAO persists it).
 boolean hasPassword(java.lang.String passwordName)
          Returns true if this domain has the named password.
 boolean hasSeedList(java.lang.String name)
          Return true if the named seedlist exists in this domain.
 void removeConfiguration(java.lang.String name)
          Removes a configuration from this domain.
 void removePassword(java.lang.String name)
          Removes a password from this Domain.
 void removeSeedList(java.lang.String name)
          Removes a seedlist from this Domain.
(package private)  void setAliasInfo(AliasInfo aliasInfo)
          Set the alias field on this object.
 void setComments(java.lang.String comments)
          Set the comments for this domain.
 void setCrawlerTraps(java.util.List<java.lang.String> regExps, boolean strictMode)
          Sets a list of regular expressions defining URLs that should never be harvested from this domain.
 void setDefaultConfiguration(java.lang.String cfgName)
          Mark a configuration as the default configuration to use.
 void setEdition(long theEdition)
          Set the edition number.
 void setExtendedFieldValues(java.util.List<ExtendedFieldValue> aList)
          Sets the list of ExtendedFieldValues.
(package private)  void setID(long id)
          Set the ID of this domain.
 java.lang.String toString()
          Return a human-readable representation of this object.
 void updateAlias(java.lang.String alias)
          Update which domain this domain is considered an alias of.
 void updateConfiguration(DomainConfiguration cfg)
          Replaces existing configuration with cfg, using cfg.getName() as the id for the configuration.
 void updateExtendedFieldValue(java.lang.Long aExtendedFieldId, java.lang.String aContent)
          Updates an ExtendedFieldValue by extended field ID.
 void updatePassword(Password password)
          Updates a password on the domain.
 void updateSeedList(SeedList seedlist)
          Updates a seed list on the domain.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

log

protected static final org.apache.commons.logging.Log log
The logger for this class.


edition

long edition
Edition is used by the DAO to keep track of changes.

Constructor Detail

Domain

protected Domain(java.lang.String theDomainName)
Create new instance of a domain. It is generally recommended that getDefaultDomain is used instead of this constructor.

Parameters:
theDomainName - Name used to reference the domain
Throws:
ArgumentNotValid - if the domain name is null or empty, or if it does not match the regex for valid domains
Method Detail

getDefaultDomain

public static Domain getDefaultDomain(java.lang.String domainName)
Get a new domain, initialised with default values.

Parameters:
domainName - The name of the domain
Returns:
a domain with the given name
Throws:
ArgumentNotValid - if domainName is null or empty

addConfiguration

public void addConfiguration(DomainConfiguration cfg)
Adds a new configuration to the domain. If this is the first configuration added, it becomes the default configuration. The seedlist referenced by the configuration must already be registered in this domain; otherwise an UnknownID exception is thrown.

Parameters:
cfg - the configuration that is added
Throws:
UnknownID - if the name of the seedlist referenced by cfg is unknown
PermissionDenied - if a configuration with the same name already exists
ArgumentNotValid - if null supplied
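
The rules above (the first configuration becomes the default, the referenced seedlist must already be registered, duplicate names are rejected) can be illustrated with a minimal, self-contained sketch. It does not use the NetarchiveSuite classes: ConfigRegistrySketch is a hypothetical stand-in that tracks names only, standard Java exceptions stand in for UnknownID, PermissionDenied and ArgumentNotValid, and the order of the checks is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical stand-in for the configuration bookkeeping of a domain. */
final class ConfigRegistrySketch {
    private final Set<String> seedLists = new LinkedHashSet<>();
    // configuration name -> name of the seedlist it references
    private final Map<String, String> configurations = new LinkedHashMap<>();
    private String defaultConfigName; // null until the first configuration is added

    void addSeedList(String name) {
        seedLists.add(name);
    }

    void addConfiguration(String cfgName, String seedListName) {
        if (cfgName == null || seedListName == null) {
            throw new IllegalArgumentException("null supplied"); // stands in for ArgumentNotValid
        }
        if (configurations.containsKey(cfgName)) {
            throw new IllegalStateException("duplicate: " + cfgName); // stands in for PermissionDenied
        }
        if (!seedLists.contains(seedListName)) {
            throw new NoSuchElementException("unknown seedlist: " + seedListName); // stands in for UnknownID
        }
        configurations.put(cfgName, seedListName);
        if (defaultConfigName == null) {
            defaultConfigName = cfgName; // the first configuration becomes the default
        }
    }

    String getDefaultConfigurationName() {
        if (defaultConfigName == null) {
            throw new NoSuchElementException("no configurations exist"); // stands in for UnknownID
        }
        return defaultConfigName;
    }

    public static void main(String[] args) {
        ConfigRegistrySketch domain = new ConfigRegistrySketch();
        domain.addSeedList("defaultseeds");
        domain.addConfiguration("fullharvest", "defaultseeds");
        domain.addConfiguration("frontpage", "defaultseeds");
        System.out.println(domain.getDefaultConfigurationName()); // prints "fullharvest"
    }
}
```

The names "defaultseeds", "fullharvest" and "frontpage" are illustrative only.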

addSeedList

public void addSeedList(SeedList seedlist)
Adds a seed list to the domain.

Parameters:
seedlist - the seed list to add.
Throws:
ArgumentNotValid - if an argument is null
PermissionDenied - if a seed list with the same name already exists

updateSeedList

public void updateSeedList(SeedList seedlist)
Updates a seed list on the domain. Replaces an existing seedlist with the same name.

Parameters:
seedlist - the replacement seed list.
Throws:
ArgumentNotValid - if an argument is null
UnknownID - if no seed list named seedlist.getName() exists
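
The difference between adding and updating (add refuses an existing name, update requires one) can be sketched as follows. This is a hedged illustration, not the NetarchiveSuite implementation: SeedListStoreSketch is a hypothetical class, a seed list is reduced to a name plus a list of URL strings, and standard Java exceptions stand in for PermissionDenied, UnknownID and ArgumentNotValid.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in showing add versus update semantics for named seed lists. */
final class SeedListStoreSketch {
    // seed list name -> seed URLs
    private final Map<String, List<String>> seedLists = new HashMap<>();

    /** Registers a new seed list; refuses to overwrite an existing one. */
    void addSeedList(String name, List<String> seeds) {
        checkNotNull(name, seeds);
        if (seedLists.containsKey(name)) {
            throw new IllegalStateException("seed list already exists: " + name); // stands in for PermissionDenied
        }
        seedLists.put(name, seeds);
    }

    /** Replaces an existing seed list; refuses to create a new one. */
    void updateSeedList(String name, List<String> seeds) {
        checkNotNull(name, seeds);
        if (!seedLists.containsKey(name)) {
            throw new NoSuchElementException("no seed list named: " + name); // stands in for UnknownID
        }
        seedLists.put(name, seeds);
    }

    /** Returns the seed list registered under the given name. */
    List<String> getSeedList(String name) {
        List<String> seeds = seedLists.get(name);
        if (seeds == null) {
            throw new NoSuchElementException("no seed list named: " + name); // stands in for UnknownID
        }
        return seeds;
    }

    private static void checkNotNull(String name, List<String> seeds) {
        if (name == null || seeds == null) {
            throw new IllegalArgumentException("null argument"); // stands in for ArgumentNotValid
        }
    }

    public static void main(String[] args) {
        SeedListStoreSketch domain = new SeedListStoreSketch();
        domain.addSeedList("defaultseeds", Arrays.asList("http://www.kb.dk"));
        domain.updateSeedList("defaultseeds", Arrays.asList("http://www.kb.dk", "http://kb.dk"));
        System.out.println(domain.getSeedList("defaultseeds").size()); // prints 2
    }
}
```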

addPassword

public void addPassword(Password password)
Adds a password to the domain.

Parameters:
password - A password object to add.
Throws:
ArgumentNotValid - if the argument is null
PermissionDenied - if a password already exists with this name

updatePassword

public void updatePassword(Password password)
Updates a password on the domain.

Parameters:
password - A password object to update.
Throws:
ArgumentNotValid - if the argument is null
PermissionDenied - if no password exists with this name

setDefaultConfiguration

public void setDefaultConfiguration(java.lang.String cfgName)
Mark a configuration as the default configuration to use. The configuration name must match an already added configuration, otherwise an UnknownID exception is thrown.

Parameters:
cfgName - the name of the configuration to mark as default
Throws:
UnknownID - when the cfgName does not match an added configuration
ArgumentNotValid - if cfgName is null or empty

getConfiguration

public DomainConfiguration getConfiguration(java.lang.String cfgName)
Returns an already registered configuration.

Parameters:
cfgName - the name of a registered configuration
Returns:
the configuration
Throws:
UnknownID - if the name is not a registered configuration
ArgumentNotValid - if cfgName is null or empty

getDefaultConfiguration

public DomainConfiguration getDefaultConfiguration()
Gets the default configuration. If no default has been explicitly set, the first configuration added to this domain is returned. If no configurations have been added at all, an UnknownID exception is thrown.

Returns:
the default configuration (never null)
Throws:
UnknownID - if no configurations exist

getName

public java.lang.String getName()
Gets the name of this domain.

Specified by:
getName in interface Named
Returns:
the name of this domain

getComments

public java.lang.String getComments()
Description copied from interface: Named
Get the comment of this object.

Specified by:
getComments in interface Named
Returns:
The comments of this object.

getHistory

public DomainHistory getHistory()
Get the domain history.

Returns:
the domain history

getSeedList

public SeedList getSeedList(java.lang.String name)
Get a specific seedlist previously added to this domain.

Parameters:
name - the name of the seedlist to return
Returns:
the specified seedlist
Throws:
ArgumentNotValid - if name is null or empty
UnknownID - if no seedlist has been added with the supplied name

hasSeedList

public boolean hasSeedList(java.lang.String name)
Return true if the named seedlist exists in this domain.

Parameters:
name - String representing a possible seedlist for the domain.
Returns:
true, if the named seedlist exists in this domain

removeSeedList

public void removeSeedList(java.lang.String name)
Removes a seedlist from this Domain. The seedlist must not be in use by any of the configurations, otherwise a PermissionDenied exception is thrown.

Parameters:
name - the name of the seedlist to remove
Throws:
PermissionDenied - if the seedlist is in use by a configuration or this is the last seedlist in this Domain
UnknownID - if no seedlist exists with the name
ArgumentNotValid - if a null argument is supplied
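
The removal guard described above (a seedlist cannot be removed while a configuration references it, nor when it is the last one) can be sketched like this. SeedListRemovalSketch is a hypothetical, names-only model rather than the real class; standard Java exceptions stand in for the NetarchiveSuite ones, and the order of the checks is an assumption.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical stand-in for the checks made before a seedlist is removed. */
final class SeedListRemovalSketch {
    private final Set<String> seedLists = new HashSet<>();
    // configuration name -> name of the seedlist it references
    private final Map<String, String> seedListByConfig = new HashMap<>();

    void addSeedList(String name) {
        seedLists.add(name);
    }

    void addConfiguration(String cfgName, String seedListName) {
        seedListByConfig.put(cfgName, seedListName);
    }

    boolean hasSeedList(String name) {
        return seedLists.contains(name);
    }

    void removeSeedList(String name) {
        if (name == null) {
            throw new IllegalArgumentException("name is null"); // stands in for ArgumentNotValid
        }
        if (!seedLists.contains(name)) {
            throw new NoSuchElementException("no seedlist named: " + name); // stands in for UnknownID
        }
        if (seedListByConfig.containsValue(name)) {
            throw new IllegalStateException("seedlist in use: " + name); // stands in for PermissionDenied
        }
        if (seedLists.size() == 1) {
            throw new IllegalStateException("cannot remove the last seedlist"); // stands in for PermissionDenied
        }
        seedLists.remove(name);
    }

    public static void main(String[] args) {
        SeedListRemovalSketch domain = new SeedListRemovalSketch();
        domain.addSeedList("defaultseeds");
        domain.addSeedList("extraseeds");
        domain.addConfiguration("fullharvest", "defaultseeds");
        domain.removeSeedList("extraseeds"); // allowed: unused and not the last seedlist
        System.out.println(domain.hasSeedList("extraseeds")); // prints false
    }
}
```

The same pattern would apply to removePassword, with passwords in place of seedlists.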

removePassword

public void removePassword(java.lang.String name)
Removes a password from this Domain. The password must not be in use by any of the configurations, otherwise a PermissionDenied exception is thrown.

Parameters:
name - the name of the password to remove
Throws:
PermissionDenied - if the password is in use by a configuration or this is the last password in this Domain
UnknownID - if no password exists with the name
ArgumentNotValid - if a null argument is supplied

removeConfiguration

public void removeConfiguration(java.lang.String name)
Removes a configuration from this domain. The default configuration cannot be removed; attempting to do so throws PermissionDenied. It is not possible to remove a configuration that is referenced by one or more HarvestDefinitions.

Parameters:
name - the name of the configuration to remove
Throws:
ArgumentNotValid - if name is null or empty
PermissionDenied - if an attempt is made to remove the default configuration, or if one or more HarvestDefinitions reference the configuration

getAllConfigurations

public java.util.Iterator<DomainConfiguration> getAllConfigurations()
Gets all configurations belonging to this domain.

Returns:
all configurations belonging to this domain.

getAllSeedLists

public java.util.Iterator<SeedList> getAllSeedLists()
Get all seedlists belonging to this domain.

Returns:
all seedlists belonging to this domain

getAllPasswords

public java.util.Iterator<Password> getAllPasswords()
Return the passwords defined for this domain.

Returns:
Iterator of known passwords.

getAllConfigurationsAsSortedList

public java.util.List<DomainConfiguration> getAllConfigurationsAsSortedList(java.util.Locale loc)
Gets all configurations belonging to this domain. The returned list is sorted by name according to the language given in the parameter.

Parameters:
loc - contains the language sorting must adhere to
Returns:
all configurations belonging to this domain sorted according to language
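
Sorting names "according to language" is commonly done in Java with java.text.Collator. The sketch below shows that technique on plain strings; whether the real method uses a Collator internally is an assumption, and LocaleSortSketch and sortedByName are hypothetical names introduced for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper showing locale-aware sorting of names. */
final class LocaleSortSketch {

    /** Returns a sorted copy of the names, ordered by the collation rules of the given locale. */
    static List<String> sortedByName(List<String> names, Locale loc) {
        Collator collator = Collator.getInstance(loc); // a Collator is a Comparator<Object>
        List<String> copy = new ArrayList<>(names);    // leave the caller's list untouched
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("cherry", "Banana", "apple");
        // Plain String ordering would put "Banana" first because uppercase letters
        // precede lowercase ones; the collator compares letters before case.
        System.out.println(sortedByName(names, Locale.ENGLISH)); // prints [apple, Banana, cherry]
    }
}
```

The same approach would apply to the seedlist and password variants, comparing on each object's name.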

getAllSeedListsAsSortedList

public java.util.List<SeedList> getAllSeedListsAsSortedList(java.util.Locale loc)
Gets all seedlists belonging to this domain. The returned list is sorted by name according to the language given in the parameter.

Parameters:
loc - contains the language sorting must adhere to
Returns:
all seedlists belonging to this domain sorted according to language

getAllPasswordsAsSortedList

public java.util.List<Password> getAllPasswordsAsSortedList(java.util.Locale loc)
Returns the passwords defined for this domain. The returned list is sorted by name according to the language given in the parameter.

Parameters:
loc - contains the language sorting must adhere to
Returns:
a sorted list of known passwords according to language

addOwnerInfo

public void addOwnerInfo(DomainOwnerInfo owner)
Add owner information.

Parameters:
owner - the owner information to add

getAllDomainOwnerInfo

public DomainOwnerInfo[] getAllDomainOwnerInfo()
Get array of domain owner information.

Returns:
array containing information about the domain owner(s)

getPassword

public Password getPassword(java.lang.String name)
Get password information.

Parameters:
name - the id of the password settings to retrieve
Returns:
the password information
Throws:
UnknownID - if no password info exists with the id "name"

setComments

public void setComments(java.lang.String comments)
Set the comments for this domain.

Parameters:
comments - the comments to set

updateConfiguration

public void updateConfiguration(DomainConfiguration cfg)
Replaces existing configuration with cfg, using cfg.getName() as the id for the configuration.

Parameters:
cfg - the configuration to update
Throws:
UnknownID - if no configuration exists with the id cfg.getName()
ArgumentNotValid - if cfg is null

hasPassword

public boolean hasPassword(java.lang.String passwordName)
Returns true if this domain has the named password.

Parameters:
passwordName - the identifier of the password info
Returns:
true if this domain has password info with id passwordName

hasConfiguration

public boolean hasConfiguration(java.lang.String configName)
Returns true if this domain has the named configuration.

Parameters:
configName - the identifier of the configuration
Returns:
true if this domain has a configuration with id configName

getEdition

public long getEdition()
Get the edition number.

Returns:
the edition number

setEdition

public void setEdition(long theEdition)
Set the edition number.

Parameters:
theEdition - the new edition number

getID

public long getID()
Get the ID of this domain. Only for use by DBDAO.

Returns:
the ID of this domain

setID

void setID(long id)
Set the ID of this domain. Only for use by DBDAO.

Parameters:
id - The new ID for this domain.

hasID

boolean hasID()
Checks whether this domain has an ID set yet (this does not happen until the DBDAO persists it).

Returns:
true, if this domain has an ID different from null

toString

public java.lang.String toString()
Return a human-readable representation of this object.

Overrides:
toString in class java.lang.Object
Returns:
Some string identifying the object. Do not use this for machine processing.

setCrawlerTraps

public void setCrawlerTraps(java.util.List<java.lang.String> regExps,
                            boolean strictMode)
Sets a list of regular expressions defining URLs that should never be harvested from this domain. The list is copied, after trimming the strings and removing any empty strings, to a list that is stored immutably.

Parameters:
regExps - The list defining URLs never to be harvested.
strictMode - If true, we throw ArgumentNotValid exception if invalid regexps are found
Throws:
ArgumentNotValid - if regExps is null or regExps contains invalid regular expressions (unless strictMode is false).
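
The processing described above (trim, drop empty strings, validate, store immutably) can be sketched with the standard java.util.regex API. CrawlerTrapsSketch is a hypothetical stand-in, IllegalArgumentException stands in for ArgumentNotValid, and keeping invalid expressions in the list when strictMode is false is an assumption of this sketch, since the text above only states that no exception is thrown in that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical stand-in for the crawler trap handling of a domain. */
final class CrawlerTrapsSketch {
    private List<String> crawlerTraps = Collections.emptyList();

    void setCrawlerTraps(List<String> regExps, boolean strictMode) {
        if (regExps == null) {
            throw new IllegalArgumentException("regExps is null"); // stands in for ArgumentNotValid
        }
        List<String> cleaned = new ArrayList<>();
        List<String> invalid = new ArrayList<>();
        for (String raw : regExps) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue; // empty entries are dropped
            }
            try {
                Pattern.compile(trimmed); // validation only; the compiled pattern is discarded
            } catch (PatternSyntaxException e) {
                invalid.add(trimmed);
            }
            cleaned.add(trimmed); // assumption: invalid entries are kept unless strictMode rejects them
        }
        if (strictMode && !invalid.isEmpty()) {
            throw new IllegalArgumentException("invalid regexps: " + invalid); // stands in for ArgumentNotValid
        }
        crawlerTraps = Collections.unmodifiableList(cleaned); // callers cannot modify the stored list
    }

    List<String> getCrawlerTraps() {
        return crawlerTraps; // never null; empty if no traps are set
    }

    public static void main(String[] args) {
        CrawlerTrapsSketch domain = new CrawlerTrapsSketch();
        domain.setCrawlerTraps(Arrays.asList("  .*calendar.*  ", "", "   "), true);
        System.out.println(domain.getCrawlerTraps()); // prints [.*calendar.*]
    }
}
```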

getCrawlerTraps

public java.util.List<java.lang.String> getCrawlerTraps()
Returns the list of regexps never to be harvested from this domain, or the empty list if none. The returned list should never be null.

Returns:
The list of regexps of URLs never to be harvested when harvesting this domain. This list is immutable.

getAliasInfo

public AliasInfo getAliasInfo()
Returns the alias info for this domain, or null if this domain is not an alias.

Returns:
The alias info for this domain, or null if this domain is not an alias.

updateAlias

public void updateAlias(java.lang.String alias)
Update which domain this domain is considered an alias of. Calling this function will a) cause some slightly expensive checks to be performed, and b) set the time of last update. For object construction and copying, use setAliasInfo.

Parameters:
alias - The name (e.g. "netarkivet.dk") of the domain that this domain is an alias of.
Throws:
UnknownID - If the given domain does not exist
IllegalState - If updating the alias info would violate constraints of alias: No transitivity, no reflection.
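
One plausible reading of the constraints "no transitivity, no reflection" is that a domain may not be an alias of itself, and that alias chains (A is an alias of B, B is an alias of C) are not allowed. The sketch below encodes that reading. It is an interpretation rather than the actual implementation: AliasRulesSketch is a hypothetical class that tracks domain names only, and standard Java exceptions stand in for UnknownID and IllegalState.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical stand-in for the alias constraints between domains. */
final class AliasRulesSketch {
    private final Set<String> knownDomains = new HashSet<>();
    // domain name -> name of the domain it is an alias of
    private final Map<String, String> aliasOf = new HashMap<>();

    void addDomain(String name) {
        knownDomains.add(name);
    }

    void updateAlias(String domain, String target) {
        if (!knownDomains.contains(target)) {
            throw new NoSuchElementException("unknown domain: " + target); // stands in for UnknownID
        }
        if (domain.equals(target)) {
            throw new IllegalStateException("a domain cannot be an alias of itself"); // no reflection
        }
        if (aliasOf.containsKey(target)) {
            throw new IllegalStateException(target + " is itself an alias"); // no transitivity
        }
        if (aliasOf.containsValue(domain)) {
            throw new IllegalStateException("other domains are aliases of " + domain); // no transitivity
        }
        aliasOf.put(domain, target);
    }

    /** Returns the domain that the given domain is an alias of, or null if it is not an alias. */
    String getAliasOf(String domain) {
        return aliasOf.get(domain);
    }

    public static void main(String[] args) {
        AliasRulesSketch domains = new AliasRulesSketch();
        domains.addDomain("netarkivet.dk");
        domains.addDomain("netarchive.dk");
        domains.updateAlias("netarchive.dk", "netarkivet.dk");
        System.out.println(domains.getAliasOf("netarchive.dk")); // prints netarkivet.dk
    }
}
```

The domain name "netarchive.dk" is illustrative only.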

setAliasInfo

void setAliasInfo(AliasInfo aliasInfo)
Set the alias field on this object. This function performs no checking of the existence or transitivity of alias domains, but it does check that the alias info is for this domain.

Parameters:
aliasInfo - Alias information
Throws:
ArgumentNotValid - if the alias info is not for this domain

getExtendedFieldValues

public java.util.List<ExtendedFieldValue> getExtendedFieldValues()
Returns a list of all ExtendedFieldValues.


setExtendedFieldValues

public void setExtendedFieldValues(java.util.List<ExtendedFieldValue> aList)
Sets the list of ExtendedFieldValues.

Parameters:
aList - list of ExtendedFieldValue objects

addExtendedFieldValue

public void addExtendedFieldValue(ExtendedFieldValue aValue)
Adds a value to the ExtendedFieldValue list.

Parameters:
aValue - value object of the extended field

getExtendedFieldValue

public ExtendedFieldValue getExtendedFieldValue(java.lang.Long aExtendedFieldId)
Gets an ExtendedFieldValue by extended field ID.

Parameters:
aExtendedFieldId - ID of the extended field
Returns:
the ExtendedFieldValue object

updateExtendedFieldValue

public void updateExtendedFieldValue(java.lang.Long aExtendedFieldId,
                                     java.lang.String aContent)
Updates an ExtendedFieldValue by extended field ID.

Parameters:
aExtendedFieldId - ID of the extended field
aContent - the content to set