Class OnNSDomainsDecideRule

  • All Implemented Interfaces:
    Serializable, EventListener, org.archive.checkpointing.Checkpointable, org.archive.modules.seeds.SeedListener, org.archive.spring.HasKeyedProperties, org.springframework.beans.factory.Aware, org.springframework.beans.factory.BeanNameAware, org.springframework.context.ApplicationListener<org.springframework.context.ApplicationEvent>

    public class OnNSDomainsDecideRule
    extends org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
    Class that re-creates the SurtPrefixSet so that it contains only domain names as defined by NetarchiveSuite. NetarchiveSuite cannot use org.archive.crawler.deciderules.OnDomainsDecideRule because that rule uses a different domain definition.
    See Also:
    Serialized Form
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static String NON_VALID_DOMAIN
      This is what SurtPrefixSet.prefixFromPlain returns for an invalid URI.
      static Pattern SURT_FIRSTPART_PATTERN
      Pattern that matches the first part of SURT - until ??
      • Fields inherited from class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule

        beanName, recoveryCheckpoint, seeds, seedsAsSurtPrefixes, surtPrefixes, surtsDumpFile, surtsSource
      • Fields inherited from class org.archive.modules.deciderules.DecideRule

        comment, kp
    • Constructor Summary

      Constructors 
      Constructor Description
      OnNSDomainsDecideRule()
      Constructor for the class OnNSDomainsDecideRule.
    • Method Summary

      Modifier and Type Method Description
      static String convertToDomain(String uri)
      Convert a URI to its domain.
      protected void myBuildSurtPrefixSet()
      Method that rebuilds the SurtPrefixSet to include only topmost domains - according to the domain definition in NetarchiveSuite.
      protected String prefixFrom(String uri)
      Generate the SURT prefix that matches the domain definition of NetarchiveSuite.
      protected void readPrefixes()
      Overrides the default readPrefixes because this class builds its own prefixes.
      • Methods inherited from class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule

        addedSeed, buildSurtPrefixSet, concludedSeedBatch, doCheckpoint, dumpSurtPrefixSet, evaluate, finishCheckpoint, getAlsoCheckVia, getSeeds, getSeedsAsSurtPrefixes, getSurtsDumpFile, getSurtsSource, getSurtsSourceFile, nonseedLine, onApplicationEvent, setAlsoCheckVia, setBeanName, setRecoveryCheckpoint, setSeeds, setSeedsAsSurtPrefixes, setSurtsDumpFile, setSurtsSource, setSurtsSourceFile, startCheckpoint
      • Methods inherited from class org.archive.modules.deciderules.PredicatedDecideRule

        getDecision, innerDecide, onlyDecision, setDecision
      • Methods inherited from class org.archive.modules.deciderules.DecideRule

        accepts, decisionFor, getComment, getEnabled, getKeyedProperties, setComment, setEnabled
    • Field Detail

      • NON_VALID_DOMAIN

        public static final String NON_VALID_DOMAIN
        This is what SurtPrefixSet.prefixFromPlain returns for an invalid URI.
        See Also:
        Constant Field Values
      • SURT_FIRSTPART_PATTERN

        public static final Pattern SURT_FIRSTPART_PATTERN
        Pattern that matches the first part of SURT - until ??
    • Constructor Detail

      • OnNSDomainsDecideRule

        public OnNSDomainsDecideRule()
        Constructor for the class OnNSDomainsDecideRule. Makes the configured decision for any URI that is inside one of the domains in the configured set of domains, according to the domain definition of the NetarchiveSuite system. For example, sports.tv2.dk resolves to tv2.dk, while www.bbc.co.uk resolves to bbc.co.uk.
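
        The two host names above can be reproduced with a minimal sketch, assuming the domain definition amounts to "one label plus the longest known suffix". The class NsDomainSketch, the method toDomain and the hard-coded SUFFIXES set are hypothetical and exist only for illustration; they are not part of NetarchiveSuite, which takes its suffix list from its own configuration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: not the NetarchiveSuite implementation.
final class NsDomainSketch {
    // Hypothetical suffix list; the real list is an assumption here.
    private static final Set<String> SUFFIXES =
            new HashSet<>(Arrays.asList("dk", "com", "uk", "co.uk"));

    // Reduce a host name to "one label + longest known suffix".
    static String toDomain(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        String[] labels = h.split("\\.");
        // Candidates shrink from the full host down to the last label,
        // so the first hit is the longest matching suffix.
        for (int i = 0; i < labels.length; i++) {
            String candidate = String.join(".",
                    Arrays.copyOfRange(labels, i, labels.length));
            if (SUFFIXES.contains(candidate)) {
                // If the host itself is a suffix, return it unchanged.
                return i == 0 ? h : labels[i - 1] + "." + candidate;
            }
        }
        return h; // unknown suffix: leave the host as it is
    }

    public static void main(String[] args) {
        System.out.println(toDomain("sports.tv2.dk"));
        System.out.println(toDomain("www.bbc.co.uk"));
    }
}
```

        Checking the longest suffix first is what keeps www.bbc.co.uk from collapsing to co.uk.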
    • Method Detail

      • readPrefixes

        protected void readPrefixes()
        Overrides the default readPrefixes because this class builds its own prefixes.
        Overrides:
        readPrefixes in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
      • myBuildSurtPrefixSet

        protected void myBuildSurtPrefixSet()
        Method that rebuilds the SurtPrefixSet to include only topmost domains, according to the domain definition in NetarchiveSuite. This is presumably done only once, during the startup phase.
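
        The idea of collapsing a set of entries to their topmost domains might be sketched as follows. This is an assumption about the general approach, not the actual method body: PrefixSetRebuildSketch, rebuild and lastTwoLabels are hypothetical names, the sketch works on plain host names rather than on a real SurtPrefixSet, and lastTwoLabels is only a stand-in for the NetarchiveSuite domain definition.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.UnaryOperator;

// Illustrative only: operates on host names, not on Heritrix classes.
final class PrefixSetRebuildSketch {
    // Map every host to its domain; the set removes duplicates.
    static SortedSet<String> rebuild(Collection<String> hosts,
                                     UnaryOperator<String> toDomain) {
        SortedSet<String> domains = new TreeSet<>();
        for (String host : hosts) {
            domains.add(toDomain.apply(host));
        }
        return domains;
    }

    // Stand-in domain rule (assumption): keep the last two labels.
    static String lastTwoLabels(String host) {
        String[] labels = host.split("\\.");
        int n = labels.length;
        return n <= 2 ? host : labels[n - 2] + "." + labels[n - 1];
    }

    public static void main(String[] args) {
        System.out.println(rebuild(
                Arrays.asList("sports.tv2.dk", "www.tv2.dk", "netarkivet.dk"),
                PrefixSetRebuildSketch::lastTwoLabels));
    }
}
```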
      • prefixFrom

        protected String prefixFrom(String uri)
        Generate the SURT prefix that matches the domain definition of NetarchiveSuite.
        Overrides:
        prefixFrom in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
        Parameters:
        uri - URL to convert to SURT
        Returns:
        String with SURT that matches the domain definition of NetarchiveSuite
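
        To show what a domain-level SURT prefix may look like, here is a minimal sketch that reverses the labels of a domain name. It assumes the usual Heritrix convention of an open-ended prefix of the form http://(dk,tv2, so that subdomains also match. SurtPrefixSketch and domainToSurtPrefix are hypothetical names, and the sketch does not call SurtPrefixSet or any other Heritrix class.

```java
import java.util.Locale;

// Illustrative only: hand-rolled label reversal, no Heritrix calls.
final class SurtPrefixSketch {
    // Turn "tv2.dk" into "http://(dk,tv2," (assumed prefix form).
    static String domainToSurtPrefix(String domain) {
        String[] labels = domain.toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder sb = new StringBuilder("http://(");
        // Append the labels right to left, each followed by a comma.
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(domainToSurtPrefix("tv2.dk"));
        System.out.println(domainToSurtPrefix("bbc.co.uk"));
    }
}
```

        Under that assumption, a URI on sports.tv2.dk has a SURT beginning with http://(dk,tv2,sports, and therefore starts with the tv2.dk prefix.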
      • convertToDomain

        public static String convertToDomain(String uri)
        Convert a URI to its domain.
        Parameters:
        uri - URL to convert to its topmost domain name according to the NetarchiveSuite definition
        Returns:
        Domain name