dk.netarkivet.harvester.harvesting
Class OnNSDomainsDecideRule

java.lang.Object
  extended by javax.management.Attribute
      extended by org.archive.crawler.settings.Type
          extended by org.archive.crawler.settings.ComplexType
              extended by org.archive.crawler.settings.ModuleType
                  extended by org.archive.crawler.deciderules.DecideRule
                      extended by org.archive.crawler.deciderules.ConfiguredDecideRule
                          extended by org.archive.crawler.deciderules.PredicatedDecideRule
                              extended by org.archive.crawler.deciderules.SurtPrefixedDecideRule
                                  extended by dk.netarkivet.harvester.harvesting.OnNSDomainsDecideRule
All Implemented Interfaces:
java.io.Serializable, javax.management.DynamicMBean, org.archive.crawler.scope.SeedListener

public class OnNSDomainsDecideRule
extends org.archive.crawler.deciderules.SurtPrefixedDecideRule

Class that re-creates the SurtPrefixSet so that it includes only domain names as defined by NetarchiveSuite. NetarchiveSuite cannot use org.archive.crawler.deciderules.OnDomainsDecideRule, because that rule uses a different domain definition.

See Also:
Serialized Form

Nested Class Summary
 
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator
 
Field Summary
static java.lang.String NON_VALID_DOMAIN
          The value that SurtPrefixSet.prefixFromPlain returns for an invalid URI.
static java.util.regex.Pattern SURT_FIRSTPART_PATTERN
          Pattern that matches the first part of a SURT - until ??
 
Fields inherited from class org.archive.crawler.deciderules.SurtPrefixedDecideRule
ATTR_ALSO_CHECK_VIA, ATTR_REBUILD_ON_RECONFIG, ATTR_SEEDS_AS_SURT_PREFIXES, ATTR_SURTS_DUMP_FILE, ATTR_SURTS_SOURCE_FILE, DEFAULT_ALSO_CHECK_VIA, DEFAULT_REBUILD_ON_RECONFIG, surtPrefixes
 
Fields inherited from class org.archive.crawler.deciderules.ConfiguredDecideRule
ALLOWED_TYPES, ATTR_DECISION
 
Fields inherited from class org.archive.crawler.deciderules.DecideRule
ACCEPT, PASS, REJECT
 
Fields inherited from class org.archive.crawler.settings.ComplexType
definition, definitionMap
 
Constructor Summary
OnNSDomainsDecideRule(java.lang.String s)
          Constructor for the class OnNSDomainsDecideRule.
 
Method Summary
static java.lang.String convertToDomain(java.lang.String uri)
          Convert a URI to its domain.
protected  void myBuildSurtPrefixSet()
          Rebuilds the SurtPrefixSet to include only topmost domains, according to the domain definition in NetarchiveSuite.
protected  java.lang.String prefixFrom(java.lang.String uri)
          Generate the SURT prefix that matches the domain definition of NetarchiveSuite.
protected  void readPrefixes()
          Overrides the default readPrefixes so that the prefixes are built according to the NetarchiveSuite domain definition.
 
Methods inherited from class org.archive.crawler.deciderules.SurtPrefixedDecideRule
addedSeed, buildSurtPrefixSet, dumpSurtPrefixSet, evaluate, getSeedfile, kickUpdate
 
Methods inherited from class org.archive.crawler.deciderules.PredicatedDecideRule
decisionFor
 
Methods inherited from class org.archive.crawler.deciderules.ConfiguredDecideRule
singlePossibleNonPassDecision
 
Methods inherited from class org.archive.crawler.deciderules.DecideRule
getController
 
Methods inherited from class org.archive.crawler.settings.ModuleType
addElement, listUsedFiles
 
Methods inherited from class org.archive.crawler.settings.ComplexType
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, toString, unsetAttribute
 
Methods inherited from class org.archive.crawler.settings.Type
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient
 
Methods inherited from class javax.management.Attribute
getName, hashCode
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

NON_VALID_DOMAIN

public static final java.lang.String NON_VALID_DOMAIN
The value that SurtPrefixSet.prefixFromPlain returns for an invalid URI.

See Also:
Constant Field Values

SURT_FIRSTPART_PATTERN

public static final java.util.regex.Pattern SURT_FIRSTPART_PATTERN
Pattern that matches the first part of a SURT - until ??

Constructor Detail

OnNSDomainsDecideRule

public OnNSDomainsDecideRule(java.lang.String s)
Constructor for the class OnNSDomainsDecideRule.

Parameters:
s - The name of this DecideRule
Method Detail

readPrefixes

protected void readPrefixes()
Overrides the default readPrefixes so that the prefixes are built according to the NetarchiveSuite domain definition.

Overrides:
readPrefixes in class org.archive.crawler.deciderules.SurtPrefixedDecideRule

myBuildSurtPrefixSet

protected void myBuildSurtPrefixSet()
Rebuilds the SurtPrefixSet to include only topmost domains, according to the domain definition in NetarchiveSuite. This is presumably done only once, during the startup phase.


prefixFrom

protected java.lang.String prefixFrom(java.lang.String uri)
Generate the SURT prefix that matches the domain definition of NetarchiveSuite.

Overrides:
prefixFrom in class org.archive.crawler.deciderules.SurtPrefixedDecideRule
Parameters:
uri - URL to convert to a SURT prefix
Returns:
String with SURT that matches the domain definition of NetarchiveSuite
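
As an illustration of what a domain-level SURT prefix looks like, the following is a minimal sketch, not the actual implementation of prefixFrom. It assumes the commonly used SURT form in which the host labels are reversed and comma-separated after "http://(", and that the prefix is left open so that it also covers every subdomain. The class SurtPrefixSketch and its methods are hypothetical names introduced for this example only.

```java
import java.util.Locale;

// Hypothetical helper, not part of NetarchiveSuite or Heritrix.
class SurtPrefixSketch {

    // Turn a plain domain name such as "netarkivet.dk" into an open SURT
    // prefix: labels reversed, each followed by a comma, no closing ")".
    static String domainToSurtPrefix(String domain) {
        String[] labels = domain.toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder prefix = new StringBuilder("http://(");
        for (int i = labels.length - 1; i >= 0; i--) {
            prefix.append(labels[i]).append(',');
        }
        return prefix.toString();
    }

    // A SURT is covered by a prefix when it starts with that prefix.
    static boolean covers(String surtPrefix, String surt) {
        return surt.startsWith(surtPrefix);
    }

    public static void main(String[] args) {
        String prefix = domainToSurtPrefix("netarkivet.dk");
        System.out.println(prefix); // prints http://(dk,netarkivet,
        System.out.println(covers(prefix, "http://(dk,netarkivet,www,)/index.html"));
    }
}
```

Because every label ends with a comma, the prefix for netarkivet.dk covers www.netarkivet.dk but not a different domain whose name merely begins with the same characters.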

convertToDomain

public static java.lang.String convertToDomain(java.lang.String uri)
Convert a URI to its domain.

Parameters:
uri - URL to convert to the topmost domain name according to the NetarchiveSuite definition
Returns:
Domain name
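
The reduction of a URI to its topmost domain can be sketched as follows. This is an assumption-laden illustration and not the code behind convertToDomain: it assumes that a domain is one label plus a public suffix, that IPv4 addresses are kept whole, and that invalid input yields null. The class NsDomainSketch, the method toDomain, and the suffix list TWO_LABEL_SUFFIXES are all hypothetical; the real domain definition and the real return value for invalid URIs are determined by NetarchiveSuite.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of NetarchiveSuite or Heritrix.
class NsDomainSketch {

    // Assumed examples of two-label suffixes; a real list would be configured.
    static final Set<String> TWO_LABEL_SUFFIXES = Set.of("co.uk", "com.au");

    // Returns the topmost domain of the URI's host, or null if no host
    // can be extracted (the null return is an assumption of this sketch).
    static String toDomain(String uri) {
        String host;
        try {
            host = new URI(uri).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
        if (host == null) {
            return null;
        }
        host = host.toLowerCase(Locale.ROOT);
        // Keep IPv4 addresses whole instead of cutting them down to two parts.
        if (host.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            return host;
        }
        String[] labels = host.split("\\.");
        int n = labels.length;
        if (n < 2) {
            return host;
        }
        String lastTwo = labels[n - 2] + "." + labels[n - 1];
        if (n >= 3 && TWO_LABEL_SUFFIXES.contains(lastTwo)) {
            return labels[n - 3] + "." + lastTwo;
        }
        return lastTwo;
    }

    public static void main(String[] args) {
        System.out.println(toDomain("http://www.netarkivet.dk/index.html"));
        System.out.println(toDomain("http://news.bbc.co.uk/"));
    }
}
```

Under these assumptions, both www.netarkivet.dk and netarkivet.dk reduce to netarkivet.dk, so seeds on either host would end up sharing one domain-level prefix.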