Class OnNSDomainsDecideRule
- java.lang.Object
  - org.archive.modules.deciderules.DecideRule
    - org.archive.modules.deciderules.PredicatedDecideRule
      - org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
        - dk.netarkivet.harvester.harvesting.OnNSDomainsDecideRule
All Implemented Interfaces:
Serializable, EventListener, org.archive.checkpointing.Checkpointable, org.archive.modules.seeds.SeedListener, org.archive.spring.HasKeyedProperties, org.springframework.beans.factory.Aware, org.springframework.beans.factory.BeanNameAware, org.springframework.context.ApplicationListener<org.springframework.context.ApplicationEvent>
public class OnNSDomainsDecideRule extends org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Class that re-creates the SurtPrefixSet to include only domain names according to the domain definition of NetarchiveSuite. NetarchiveSuite cannot use org.archive.crawler.deciderules.OnDomainsDecideRule, because that rule uses a different domain definition.
See Also:
- Serialized Form
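The decision itself comes from the inherited SurtPrefixedDecideRule machinery: conceptually, a URI is converted to its SURT form and checked against the prefixes in the set. The following is a minimal, self-contained sketch of that idea, not the Heritrix implementation. The class SurtPrefixMatchSketch and its helpers toSurt and matches are hypothetical, and toSurt uses a simplified SURT form that ignores ports, user info and other canonicalization rules.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

class SurtPrefixMatchSketch {

    // Hypothetical, simplified SURT conversion: scheme://(reversed,host,labels,)path.
    // Real SURT canonicalization also handles ports, user info and more.
    static String toSurt(String uri) {
        URI u = URI.create(uri);
        String[] labels = u.getHost().toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder sb = new StringBuilder(u.getScheme()).append("://(");
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        sb.append(')');
        String path = u.getRawPath();
        sb.append(path == null || path.isEmpty() ? "/" : path);
        return sb.toString();
    }

    // A URI is "inside" the configured set when its SURT starts with any prefix.
    static boolean matches(List<String> prefixes, String uri) {
        String surt = toSurt(uri);
        return prefixes.stream().anyMatch(surt::startsWith);
    }

    public static void main(String[] args) {
        List<String> prefixes = List.of("http://(dk,tv2,", "http://(uk,co,bbc,");
        System.out.println(matches(prefixes, "http://sports.tv2.dk/news")); // true
        System.out.println(matches(prefixes, "http://www.bbc.co.uk/"));     // true
        System.out.println(matches(prefixes, "http://example.com/"));       // false
    }
}
```

Because a domain prefix such as http://(dk,tv2, stops before the closing parenthesis, it also matches every subdomain of that domain.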
Field Summary
Modifier and Type | Field | Description
static String | NON_VALID_DOMAIN | This is what SurtPrefixSet.prefixFromPlain returns for a non-valid URI.
static Pattern | SURT_FIRSTPART_PATTERN | Pattern that matches the first part of a SURT, until ??
Constructor Summary
Constructor | Description
OnNSDomainsDecideRule() | Constructor for the class OnNSDomainsDecideRule.
Method Summary
Modifier and Type | Method | Description
static String | convertToDomain(String uri) | Convert a URI to its domain.
protected void | myBuildSurtPrefixSet() | Method that rebuilds the SurtPrefixSet to include only topmost domains, according to the domain definition in NetarchiveSuite.
protected String | prefixFrom(String uri) | Generate the SURT prefix that matches the domain definition of NetarchiveSuite.
protected void | readPrefixes() | We override the default readPrefixes, because we want to build our own prefixes.
Methods inherited from class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
addedSeed, buildSurtPrefixSet, concludedSeedBatch, doCheckpoint, dumpSurtPrefixSet, evaluate, finishCheckpoint, getAlsoCheckVia, getSeeds, getSeedsAsSurtPrefixes, getSurtsDumpFile, getSurtsSource, getSurtsSourceFile, nonseedLine, onApplicationEvent, setAlsoCheckVia, setBeanName, setRecoveryCheckpoint, setSeeds, setSeedsAsSurtPrefixes, setSurtsDumpFile, setSurtsSource, setSurtsSourceFile, startCheckpoint
Methods inherited from class org.archive.modules.deciderules.PredicatedDecideRule
getDecision, innerDecide, onlyDecision, setDecision
Field Detail
NON_VALID_DOMAIN
public static final String NON_VALID_DOMAIN
This is what SurtPrefixSet.prefixFromPlain returns for a non-valid URI.
See Also:
- Constant Field Values
SURT_FIRSTPART_PATTERN
public static final Pattern SURT_FIRSTPART_PATTERN
Pattern that matches the first part of a SURT, until ??
Constructor Detail
OnNSDomainsDecideRule
public OnNSDomainsDecideRule()
Constructor for the class OnNSDomainsDecideRule. The rule makes the configured decision for any URI that is inside one of the domains in the configured set of domains, according to the domain definition of the NetarchiveSuite system. For example, sports.tv2.dk resolves to tv2.dk, while www.bbc.co.uk resolves to bbc.co.uk.
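The two examples above are consistent with a simple rule: keep the top-level domain plus one label, or one label more when the top-level domain is a multi-label suffix such as co.uk. A minimal sketch of that rule follows. It assumes a small, hypothetical suffix list (MULTI_LABEL_SUFFIXES) rather than whatever list NetarchiveSuite actually uses, and resolveDomain is a hypothetical stand-in for convertToDomain, not its actual code. Hosts given as IP addresses are not special-cased.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

class DomainResolverSketch {

    // Hypothetical suffix list for illustration only; a real resolver needs a
    // complete list of top-level domains, including multi-label ones.
    private static final Set<String> MULTI_LABEL_SUFFIXES = Set.of("co.uk", "ac.uk", "org.uk");

    /** Returns the domain of the URI's host, or null if no domain can be derived. */
    static String resolveDomain(String uri) {
        String host = URI.create(uri).getHost();
        if (host == null) {
            return null;
        }
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        if (labels.length < 2) {
            return null; // a bare host name such as "localhost" has no domain
        }
        String lastTwo = labels[labels.length - 2] + "." + labels[labels.length - 1];
        // Keep the suffix plus one label; a multi-label suffix needs one label more.
        int keep = MULTI_LABEL_SUFFIXES.contains(lastTwo) ? 3 : 2;
        if (labels.length < keep) {
            return null; // the host is only a suffix, e.g. "co.uk"
        }
        return String.join(".", Arrays.copyOfRange(labels, labels.length - keep, labels.length));
    }

    public static void main(String[] args) {
        System.out.println(resolveDomain("http://sports.tv2.dk/")); // tv2.dk
        System.out.println(resolveDomain("http://www.bbc.co.uk/")); // bbc.co.uk
    }
}
```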
Method Detail
readPrefixes
protected void readPrefixes()
We override the default readPrefixes, because we want to build our own prefixes.
Overrides:
readPrefixes in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
myBuildSurtPrefixSet
protected void myBuildSurtPrefixSet()
Method that rebuilds the SurtPrefixSet to include only topmost domains, according to the domain definition in NetarchiveSuite. This is presumably done only once, during the startup phase.
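Keeping only topmost domains means a prefix that is already covered by a shorter prefix in the set adds nothing, since every SURT matching the longer prefix also matches the shorter one. One way to keep a prefix set minimal is sketched below, assuming the prefixes are plain strings held in a TreeSet. TopmostPrefixSetSketch, addPrefix and build are hypothetical names, and this is not the NetarchiveSuite or Heritrix code.

```java
import java.util.List;
import java.util.TreeSet;

class TopmostPrefixSetSketch {

    /**
     * Adds a SURT prefix while keeping the set minimal: a prefix already covered
     * by an existing shorter (or equal) entry is ignored, and longer entries that
     * the new prefix covers are removed.
     */
    static void addPrefix(TreeSet<String> set, String prefix) {
        for (String existing : set) {
            if (prefix.startsWith(existing)) {
                return; // already covered
            }
        }
        // Drop entries that the new prefix makes redundant.
        set.removeIf(existing -> existing.startsWith(prefix));
        set.add(prefix);
    }

    static TreeSet<String> build(List<String> prefixes) {
        TreeSet<String> set = new TreeSet<>();
        for (String p : prefixes) {
            addPrefix(set, p);
        }
        return set;
    }

    public static void main(String[] args) {
        TreeSet<String> set = build(List.of(
                "http://(dk,tv2,sports,",
                "http://(dk,tv2,",
                "http://(uk,co,bbc,",
                "http://(uk,co,bbc,www,"));
        System.out.println(set); // [http://(dk,tv2,, http://(uk,co,bbc,]
    }
}
```

The linear scan in addPrefix keeps the sketch short; since the set is kept minimal, a sorted lookup with TreeSet.floor could replace it for large seed lists.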
prefixFrom
protected String prefixFrom(String uri)
Generate the SURT prefix that matches the domain definition of NetarchiveSuite.
Overrides:
prefixFrom in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Parameters:
uri - URL to convert to SURT
Returns:
String with SURT that matches the domain definition of NetarchiveSuite
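Once a URI has been reduced to its domain (for example tv2.dk), what remains is to reverse the host labels into SURT order and leave the host part open, so that every subdomain shares the prefix. A minimal sketch of that last step, assuming the domain has already been resolved and assuming the http scheme; domainToSurtPrefix is a hypothetical helper, not the body of prefixFrom.

```java
import java.util.Locale;

class DomainSurtPrefixSketch {

    /**
     * Builds a SURT prefix covering a domain and all of its subdomains,
     * e.g. "tv2.dk" becomes "http://(dk,tv2,". The domain is assumed to be resolved already.
     */
    static String domainToSurtPrefix(String domain) {
        String[] labels = domain.toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder sb = new StringBuilder("http://(");
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        // No closing ')': leaving the host part open lets subdomains match too.
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(domainToSurtPrefix("tv2.dk"));    // http://(dk,tv2,
        System.out.println(domainToSurtPrefix("bbc.co.uk")); // http://(uk,co,bbc,
    }
}
```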