Class Domain

  • All Implemented Interfaces:
    Named

    public class Domain
    extends ExtendableEntity
    implements Named
    Represents known information about a domain. A domain is identified by its domain name (e.g. kb.dk).

    The following information is used to control how a domain is harvested: seedlists, configurations and passwords. Each seedlist defines one or more URLs that the harvester should use as starting points. A configuration defines a specific combination of settings (seedlist, harvester settings, passwords) to use during a harvest. Passwords define the user names and passwords that may be used when harvesting the domain.

    Information about previous harvests of this domain is available via the domainHistory.

    Information from the domain registry (DK-HOSTMASTER) about the domain registration is available in the registration. This includes the dates on which the domain was known to exist (i.e. was included in a domain list), together with domain owner information.

    Note that each configuration references one of the seedlists by name, and possibly one of the passwords.
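    The name-based references between configurations, seedlists and passwords can be sketched as follows. This is a minimal illustration, not the actual NetarchiveSuite code: the record types SeedListSketch, PasswordSketch and ConfigSketch are hypothetical simplifications of SeedList, Password and DomainConfiguration, and NoSuchElementException stands in for UnknownID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical stand-ins for SeedList, Password and DomainConfiguration (not the real classes).
record SeedListSketch(String name, String seeds) {}

record PasswordSketch(String name, String username, String password) {}

// A configuration refers to its seedlist by name and, optionally, to a password by name.
record ConfigSketch(String name, String seedListName, Optional<String> passwordName) {}

class DomainModelSketch {
    final String domainName;
    final Map<String, SeedListSketch> seedLists = new HashMap<>();
    final Map<String, PasswordSketch> passwords = new HashMap<>();

    DomainModelSketch(String domainName) {
        this.domainName = domainName;
    }

    /** Resolves the seedlist named by cfg; NoSuchElementException stands in for UnknownID. */
    SeedListSketch seedListFor(ConfigSketch cfg) {
        SeedListSketch seedList = seedLists.get(cfg.seedListName());
        if (seedList == null) {
            throw new NoSuchElementException("Unknown seed list: " + cfg.seedListName());
        }
        return seedList;
    }

    /** Resolves the optional password reference; empty if cfg names no known password. */
    Optional<PasswordSketch> passwordFor(ConfigSketch cfg) {
        return cfg.passwordName().map(passwords::get);
    }
}
```

    In this sketch, because configurations hold names rather than object references, a seedlist or password can be replaced under the same name without touching the configurations that use it.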

    • Field Detail

      • log

        protected static final org.slf4j.Logger log
        The logger for this class.
    • Constructor Detail

      • Domain

        protected Domain(String theDomainName)
        Create a new instance of a domain. In general, getDefaultDomain should be used instead of this constructor.
        Parameters:
        theDomainName - Name used to reference the domain
        Throws:
        ArgumentNotValid - if theDomainName is null or empty, or if it does not match the regex for valid domains
    • Method Detail

      • getDefaultDomain

        public static Domain getDefaultDomain(String domainName)
        Get a new domain, initialised with default values.
        Parameters:
        domainName - The name of the domain
        Returns:
        a domain with the given name
        Throws:
        ArgumentNotValid - if name is null or empty
      • addConfiguration

        public void addConfiguration(DomainConfiguration cfg)
        Adds a new configuration to the domain. If this is the first configuration added, it becomes the default configuration. The seedlist referenced by the configuration must already be registered in this domain; otherwise an UnknownID exception is thrown.
        Parameters:
        cfg - the configuration that is added
        Throws:
        UnknownID - if the name of the seedlist referenced by cfg is unknown
        PermissionDenied - if a configuration with the same name already exists
        ArgumentNotValid - if null supplied
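        The rules listed for addConfiguration can be sketched as below. This is a hypothetical model, not the real implementation: a configuration is reduced to a name plus the name of its seedlist, and standard exceptions stand in for NetarchiveSuite's (NoSuchElementException for UnknownID, IllegalStateException for PermissionDenied, NullPointerException for ArgumentNotValid). The order of the checks is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of addConfiguration's documented rules (not the real Domain class).
class ConfigRegistrySketch {
    private final Set<String> seedListNames;
    private final Map<String, String> seedListByConfig = new LinkedHashMap<>();
    private String defaultConfigName; // null until the first configuration is added

    ConfigRegistrySketch(Set<String> seedListNames) {
        this.seedListNames = seedListNames;
    }

    void addConfiguration(String cfgName, String seedListName) {
        // NullPointerException stands in for ArgumentNotValid.
        Objects.requireNonNull(cfgName, "cfgName");
        Objects.requireNonNull(seedListName, "seedListName");
        if (seedListByConfig.containsKey(cfgName)) {
            // IllegalStateException stands in for PermissionDenied.
            throw new IllegalStateException("Configuration already exists: " + cfgName);
        }
        if (!seedListNames.contains(seedListName)) {
            // NoSuchElementException stands in for UnknownID.
            throw new NoSuchElementException("Unknown seed list: " + seedListName);
        }
        seedListByConfig.put(cfgName, seedListName);
        if (defaultConfigName == null) {
            defaultConfigName = cfgName; // the first configuration added becomes the default
        }
    }

    String defaultConfigName() {
        return defaultConfigName;
    }
}
```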
      • addSeedList

        public void addSeedList(SeedList seedlist)
        Adds a seed list to the domain.
        Parameters:
        seedlist - the seed list to add.
        Throws:
        ArgumentNotValid - if an argument is null
        PermissionDenied - if a seed list with the same name already exists
      • updateSeedList

        public void updateSeedList(SeedList seedlist)
        Updates a seed list in the domain, replacing the existing seed list with the same name.
        Parameters:
        seedlist - the updated seed list.
        Throws:
        ArgumentNotValid - if an argument is null
        UnknownID - if no seed list named seedlist.getName() exists
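        addSeedList and updateSeedList form a pair: adding requires a new name, updating requires an existing one. A minimal sketch of that contract, assuming a seedlist is reduced to a name and a string of seeds; SeedListStoreSketch is hypothetical, with IllegalStateException standing in for PermissionDenied, NoSuchElementException for UnknownID and NullPointerException for ArgumentNotValid.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;

// Hypothetical sketch of the add/update contract for named seedlists.
class SeedListStoreSketch {
    private final Map<String, String> seedsByName = new HashMap<>();

    /** Add: the name must not already be in use. */
    void addSeedList(String name, String seeds) {
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(seeds, "seeds");
        // putIfAbsent returns the existing value (non-null here) if the name is taken.
        if (seedsByName.putIfAbsent(name, seeds) != null) {
            throw new IllegalStateException("Seed list already exists: " + name);
        }
    }

    /** Update: the name must already exist; the old entry is replaced. */
    void updateSeedList(String name, String seeds) {
        Objects.requireNonNull(name, "name");
        Objects.requireNonNull(seeds, "seeds");
        if (!seedsByName.containsKey(name)) {
            throw new NoSuchElementException("Unknown seed list: " + name);
        }
        seedsByName.put(name, seeds);
    }

    String get(String name) {
        return seedsByName.get(name);
    }
}
```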
      • addPassword

        public void addPassword(Password password)
        Adds a password to the domain.
        Parameters:
        password - A password object to add.
        Throws:
        ArgumentNotValid - if the argument is null
        PermissionDenied - if a password already exists with this name
      • updatePassword

        public void updatePassword(Password password)
        Updates a password on the domain.
        Parameters:
        password - A password object to update.
        Throws:
        ArgumentNotValid - if the argument is null
        PermissionDenied - if no password exists with this name
      • setDefaultConfiguration

        public void setDefaultConfiguration(String cfgName)
        Mark a configuration as the default configuration to use. The configuration name must match an already added configuration, otherwise an UnknownID exception is thrown.
        Parameters:
        cfgName - a name of a configuration
        Throws:
        UnknownID - when the cfgName does not match an added configuration
        ArgumentNotValid - if cfgName is null or empty
      • getConfiguration

        public DomainConfiguration getConfiguration(String cfgName)
        Returns an already registered configuration.
        Parameters:
        cfgName - the name of a registered configuration
        Returns:
        the configuration
        Throws:
        UnknownID - if the name is not a registered configuration
        ArgumentNotValid - if cfgName is null or empty
      • getDefaultConfiguration

        public DomainConfiguration getDefaultConfiguration()
        Gets the default configuration. If no configuration has been explicitly set as the default, the first configuration added to this domain is returned. If no configurations have been added at all, an UnknownID exception is thrown.
        Returns:
        the default configuration (never null)
        Throws:
        UnknownID - if no configurations exist
      • getName

        public String getName()
        Gets the name of this domain.
        Specified by:
        getName in interface Named
        Returns:
        the name of this domain
      • getComments

        public String getComments()
        Description copied from interface: Named
        Get the comment of this object.
        Specified by:
        getComments in interface Named
        Returns:
        the domain comments.
      • getHistory

        public DomainHistory getHistory()
        Get the domain history.
        Returns:
        the domain history
      • getSeedList

        public SeedList getSeedList(String name)
        Get a specific seedlist previously added to this domain.
        Parameters:
        name - the name of the seedlist to return
        Returns:
        the specified seedlist
        Throws:
        ArgumentNotValid - if name is null or empty
        UnknownID - if no seedlist has been added with the supplied name
      • hasSeedList

        public boolean hasSeedList(String name)
        Return true if the named seedlist exists in this domain.
        Parameters:
        name - the name of the seedlist to look for.
        Returns:
        true, if the named seedlist exists in this domain
      • removeSeedList

        public void removeSeedList(String name)
        Removes a seedlist from this Domain. The seedlist must not be in use by any of the configurations; otherwise a PermissionDenied exception is thrown.
        Parameters:
        name - the name of the seedlist to remove
        Throws:
        PermissionDenied - if the seedlist is in use by a configuration or this is the last seedlist in this Domain
        UnknownID - if no seedlist exists with the given name
        ArgumentNotValid - if a null argument is supplied
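        The two guards on removeSeedList (the seedlist is not used by any configuration, and it is not the domain's last seedlist) can be sketched as follows. SeedListRemovalSketch is a hypothetical, simplified model rather than the real Domain code; NoSuchElementException stands in for UnknownID, IllegalStateException for PermissionDenied and NullPointerException for ArgumentNotValid. removePassword documents the same pair of guards for passwords.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the documented removeSeedList guards.
class SeedListRemovalSketch {
    private final Set<String> seedLists = new HashSet<>();
    private final Map<String, String> seedListByConfig = new HashMap<>(); // config name -> seedlist name

    void addSeedList(String name) {
        seedLists.add(name);
    }

    // No validation here; this sketch only needs the config -> seedlist mapping.
    void addConfiguration(String configName, String seedListName) {
        seedListByConfig.put(configName, seedListName);
    }

    boolean hasSeedList(String name) {
        return seedLists.contains(name);
    }

    void removeSeedList(String name) {
        Objects.requireNonNull(name, "name");
        if (!seedLists.contains(name)) {
            throw new NoSuchElementException("No seed list named " + name);
        }
        if (seedListByConfig.containsValue(name)) {
            throw new IllegalStateException("Seed list " + name + " is used by a configuration");
        }
        if (seedLists.size() == 1) {
            throw new IllegalStateException("Cannot remove the last seed list of a domain");
        }
        seedLists.remove(name);
    }
}
```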
      • removePassword

        public void removePassword(String name)
        Removes a password from this Domain. The password must not be in use by any of the configurations; otherwise a PermissionDenied exception is thrown.
        Parameters:
        name - the name of the password to remove
        Throws:
        PermissionDenied - if the password is in use by a configuration or this is the last password in this Domain
        UnknownID - if no password exists with the given name
        ArgumentNotValid - if a null argument is supplied
      • removeConfiguration

        public void removeConfiguration(String configName)
        Removes a configuration from this domain. The default configuration cannot be removed; attempting to do so throws PermissionDenied. Likewise, a configuration that is referenced by one or more HarvestDefinitions cannot be removed.
        Parameters:
        configName - The name of a configuration to remove.
        Throws:
        ArgumentNotValid - if configName is null or empty
        PermissionDenied - if an attempt is made to remove the default configuration, or if one or more HarvestDefinitions reference the configuration
      • getAllConfigurations

        public Iterator<DomainConfiguration> getAllConfigurations()
        Gets all configurations belonging to this domain.
        Returns:
        all configurations belonging to this domain.
      • getAllSeedLists

        public Iterator<SeedList> getAllSeedLists()
        Get all seedlists belonging to this domain.
        Returns:
        all seedlists belonging to this domain
      • getAllPasswords

        public Iterator<Password> getAllPasswords()
        Return the passwords defined for this domain.
        Returns:
        Iterator of known passwords.
      • getAllConfigurationsAsSortedList

        public List<DomainConfiguration> getAllConfigurationsAsSortedList(Locale loc)
        Gets all configurations belonging to this domain. The returned list is sorted by name according to the language of the given locale.
        Parameters:
        loc - the locale whose language determines the sort order
        Returns:
        all configurations belonging to this domain sorted according to language
      • getAllSeedListsAsSortedList

        public List<SeedList> getAllSeedListsAsSortedList(Locale loc)
        Gets all seedlists belonging to this domain. The returned list is sorted by name according to the language of the given locale.
        Parameters:
        loc - the locale whose language determines the sort order
        Returns:
        all seedlists belonging to this domain sorted according to language
      • getAllPasswordsAsSortedList

        public List<Password> getAllPasswordsAsSortedList(Locale loc)
        Returns the passwords defined for this domain. The returned list is sorted by name according to the language of the given locale.
        Parameters:
        loc - the locale whose language determines the sort order
        Returns:
        a sorted list of known passwords according to language
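        The three *AsSortedList methods sort by name according to the language of a Locale. In standard Java, locale-sensitive string ordering is provided by java.text.Collator; the sketch below assumes, without confirmation from this page, that Domain sorts in a similar way. LocaleSortSketch and its sortByName helper are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical helper: sort any named items by name using the collation rules of a locale.
class LocaleSortSketch {
    /** Returns a new list of items sorted by nameOf(item) according to loc; the input is not modified. */
    static <T> List<T> sortByName(List<T> items, Function<T, String> nameOf, Locale loc) {
        Collator collator = Collator.getInstance(loc);
        List<T> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparing(nameOf, collator));
        return sorted;
    }
}
```

        Collator ordering differs from String.compareTo, which compares UTF-16 code units, so every upper-case ASCII letter sorts before every lower-case one. With a Danish locale the collator is expected to follow the Danish alphabet, placing æ, ø and å after z.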
      • addOwnerInfo

        public void addOwnerInfo(DomainOwnerInfo owner)
        Add owner information.
        Parameters:
        owner - the owner information to add
      • getAllDomainOwnerInfo

        public DomainOwnerInfo[] getAllDomainOwnerInfo()
        Get array of domain owner information.
        Returns:
        array containing information about the domain owner(s)
      • getPassword

        public Password getPassword(String name)
        Get password information.
        Parameters:
        name - the id of the password settings to retrieve
        Returns:
        the password information
        Throws:
        UnknownID - if no password info exists with the id "name"
      • setComments

        public void setComments(String comments)
        Set the comments for this domain.
        Parameters:
        comments - The new comments (can be null)
      • updateConfiguration

        public void updateConfiguration(DomainConfiguration cfg)
        Replaces the existing configuration with cfg, using cfg.getName() as the id of the configuration.
        Parameters:
        cfg - the configuration to update
        Throws:
        UnknownID - if no configuration exists with the id cfg.getName()
        ArgumentNotValid - if cfg is null
      • hasPassword

        public boolean hasPassword(String passwordName)
        Returns true if this domain has the named password.
        Parameters:
        passwordName - the identifier of the password info
        Returns:
        true if this domain has password info with id passwordName
      • hasConfiguration

        public boolean hasConfiguration(String configName)
        Returns true if this domain has the named configuration.
        Parameters:
        configName - the identifier of the configuration
        Returns:
        true if this domain has a configuration with id configName
      • getEdition

        public long getEdition()
        Get the edition number.
        Returns:
        the edition number
      • setEdition

        public void setEdition(long theNewEdition)
        Set the edition number.
        Parameters:
        theNewEdition - the new edition
      • getID

        public long getID()
        Get the ID of this domain. Only for use by DBDAO.
        Returns:
        the ID of this domain
      • toString

        public String toString()
        Return a human-readable representation of this object.
        Overrides:
        toString in class Object
        Returns:
        Some string identifying the object. Do not use this for machine processing.
      • setCrawlerTraps

        public void setCrawlerTraps(List<String> regExps,
                                    boolean strictMode)
        Sets a list of regular expressions defining URLs that should never be harvested from this domain. Each string is trimmed, empty strings are removed, and the result is copied to a list that is stored immutably.
        Parameters:
        regExps - The list defining URLs never to be harvested.
        strictMode - If true, an ArgumentNotValid exception is thrown if invalid regexps are found
        Throws:
        ArgumentNotValid - if regExps is null or regExps contains invalid regular expressions (unless strictMode is false).
      • getCrawlerTraps

        public List<String> getCrawlerTraps()
        Returns the list of regexps never to be harvested from this domain, or the empty list if none. The returned list should never be null.
        Returns:
        The list of regexps of URLs never to be harvested when harvesting this domain. This list is immutable.
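        A minimal sketch of the crawler-trap handling described for setCrawlerTraps and getCrawlerTraps, using java.util.regex.Pattern to validate expressions and List.copyOf for the immutable copy. CrawlerTrapsSketch is hypothetical, and standard exceptions stand in for ArgumentNotValid. The documentation does not say what happens to invalid expressions when strictMode is false, so this sketch simply keeps them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch of trimming, filtering, validating and storing crawler traps.
class CrawlerTrapsSketch {
    // Immutable and never null: empty when no traps are set.
    private List<String> traps = List.of();

    void setCrawlerTraps(List<String> regExps, boolean strictMode) {
        // NullPointerException stands in for ArgumentNotValid on a null list.
        Objects.requireNonNull(regExps, "regExps");
        List<String> cleaned = new ArrayList<>();
        for (String regExp : regExps) {
            String trimmed = regExp.trim();
            if (trimmed.isEmpty()) {
                continue; // empty strings are dropped
            }
            if (strictMode) {
                try {
                    Pattern.compile(trimmed);
                } catch (PatternSyntaxException e) {
                    // IllegalArgumentException stands in for ArgumentNotValid.
                    throw new IllegalArgumentException("Invalid regexp: " + trimmed, e);
                }
            }
            // Assumption: in non-strict mode, invalid expressions are kept as given.
            cleaned.add(trimmed);
        }
        // Only replace the stored list once every entry has been processed.
        traps = List.copyOf(cleaned);
    }

    List<String> getCrawlerTraps() {
        return traps;
    }
}
```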
      • getAliasInfo

        public AliasInfo getAliasInfo()
        Returns the alias info for this domain, or null if this domain is not an alias.
        Returns:
        The alias info for this domain, or null if this domain is not an alias.
      • updateAlias

        public void updateAlias(String alias)
        Update which domain this domain is considered an alias of. Calling this method will (a) cause some slightly expensive checks to be performed, and (b) set the time of last update. For object construction and copying, use setAlias.
        Parameters:
        alias - The name (e.g. "netarkivet.dk") of the domain that this domain is an alias of.
        Throws:
        UnknownID - If the given domain does not exist
        IllegalState - If updating the alias info would violate the constraints on aliases: no transitivity and no reflexivity (a domain cannot be an alias of itself).
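        The alias constraints can be illustrated with a small in-memory registry. This sketch rests on one interpretation of "no transitivity" (an alias may not point to a domain that is itself an alias, and a domain that already has aliases may not become one) and of "no reflexivity" (a domain may not be an alias of itself). AliasRegistrySketch is hypothetical, and NoSuchElementException and IllegalStateException stand in for UnknownID and IllegalState.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical sketch of alias validation; not the real Domain.updateAlias.
class AliasRegistrySketch {
    private final Set<String> knownDomains;
    private final Map<String, String> aliasOf = new HashMap<>();      // domain -> domain it is an alias of
    private final Map<String, Instant> lastUpdated = new HashMap<>(); // domain -> time of last alias update

    AliasRegistrySketch(Set<String> knownDomains) {
        this.knownDomains = knownDomains;
    }

    void updateAlias(String domain, String target) {
        if (!knownDomains.contains(target)) {
            throw new NoSuchElementException("Unknown domain: " + target);
        }
        if (domain.equals(target)) {
            throw new IllegalStateException("A domain cannot be an alias of itself");
        }
        if (aliasOf.containsKey(target)) {
            throw new IllegalStateException(target + " is itself an alias");
        }
        // Linear scan over all aliases: one source of cost in checks like these.
        if (aliasOf.containsValue(domain)) {
            throw new IllegalStateException(domain + " already has aliases");
        }
        aliasOf.put(domain, target);
        lastUpdated.put(domain, Instant.now()); // record the time of last update
    }

    String aliasOf(String domain) {
        return aliasOf.get(domain);
    }

    Instant lastUpdated(String domain) {
        return lastUpdated.get(domain);
    }
}
```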
      • getBestHarvestInfoExpectation

        public HarvestInfo getBestHarvestInfoExpectation(String configName)
        Gets the harvest info that gives the best expectation of how many objects a harvest using the given configuration will retrieve. Priority is given to the most recent harvest that was a full harvest.
        Parameters:
        configName - The name of the configuration
        Returns:
        The Harvest Information for the harvest defining the best expectation, including the number retrieved and the stop reason.
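        One reading of this selection rule, sketched with hypothetical types (HarvestInfoSketch, StopReasonSketch, ExpectationSketch) rather than the real HarvestInfo and stop-reason classes: prefer the most recent harvest with the given configuration that ran to completion, and otherwise fall back to the most recent harvest of any kind. Both the fallback and the use of Optional instead of a possibly null return value are assumptions.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stop reasons; only COMPLETE counts as a full harvest in this sketch.
enum StopReasonSketch { COMPLETE, OBJECT_LIMIT, SIZE_LIMIT }

// Hypothetical, simplified harvest record.
record HarvestInfoSketch(String configName, Instant date, long objectCount, StopReasonSketch stopReason) {}

class ExpectationSketch {
    /** Most recent complete harvest for configName, else most recent harvest for it, else empty. */
    static Optional<HarvestInfoSketch> best(List<HarvestInfoSketch> history, String configName) {
        Comparator<HarvestInfoSketch> byDate = Comparator.comparing(HarvestInfoSketch::date);
        List<HarvestInfoSketch> forConfig = history.stream()
                .filter(h -> h.configName().equals(configName))
                .toList();
        return forConfig.stream()
                .filter(h -> h.stopReason() == StopReasonSketch.COMPLETE)
                .max(byDate)
                .or(() -> forConfig.stream().max(byDate)); // assumed fallback
    }
}
```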
      • getExtendedFieldType

        protected int getExtendedFieldType()
        All derived classes allow ExtendedFields of type ExtendedFieldTypes.DOMAIN.
        Specified by:
        getExtendedFieldType in class ExtendableEntity
        Returns:
        ExtendedFieldTypes.DOMAIN