public class NASSurtPrefixedDecideRule extends org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
Custom SurtPrefixedDecideRule class.
Enables SURT seeds that allow sub-domains if the original seed URI has no path at all.
Can also add/remove www/www[…] sub-domains.

Modifier and Type | Field and Description |
---|---|
protected boolean | addBeforeAddingW3SubDomain: Enable/disable adding the original SURT before adding a preceding www. |
protected boolean | addBeforeRemovingW3xSubDomain: Enable/disable adding the original SURT before removing the preceding www[…]. |
protected boolean | addW3SubDomain: Enable/disable adding a preceding www to the SURT host if none is present. |
protected boolean | allowSubDomainsRewrite: Enable/disable removing ')/' from the SURT if the original URI has no path at all. |
protected boolean | removeW3xSubDomain: Enable/disable removing a preceding www[…]. |

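One plausible reading of the four www flags is that a remove or add rewrite produces a new SURT host, and the matching "add before" flag decides whether the original host is kept as well. The following is a minimal sketch of that reading, not the NetarchiveSuite code. It assumes the SURT host is the reversed, comma-terminated form (for example `com,example,www,`) and, because the `www[` pattern is truncated on this page, it assumes "www optionally followed by digits". `W3Variants` and `hostVariants` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper illustrating the www add/remove flags; not the NetarchiveSuite implementation. */
final class W3Variants {

    // Assumption: the truncated "www[" pattern is read as "www" followed by optional digits.
    private static final String W3X_LABEL = "www[0-9]*";

    private W3Variants() {}

    /**
     * Returns the SURT hosts to register for one seed host.
     * surtHost is expected in SURT form with a trailing comma, e.g. "com,example,www,".
     */
    static Set<String> hostVariants(String surtHost,
                                    boolean removeW3x, boolean addBeforeRemoving,
                                    boolean addW3, boolean addBeforeAdding) {
        Set<String> result = new LinkedHashSet<>();
        String[] labels = surtHost.split(",");  // trailing empty string is dropped by split
        String last = labels[labels.length - 1];
        boolean hasW3x = last.matches(W3X_LABEL);

        if (hasW3x && removeW3x) {
            if (addBeforeRemoving) {
                result.add(surtHost);             // keep the original www SURT host as well
            }
            // Drop the final label but keep the trailing comma: "com,example,www," -> "com,example,"
            result.add(surtHost.substring(0, surtHost.length() - last.length() - 1));
        } else if (!hasW3x && addW3) {
            if (addBeforeAdding) {
                result.add(surtHost);             // keep the original bare-domain SURT host as well
            }
            result.add(surtHost + "www,");        // "com,example," -> "com,example,www,"
        } else {
            result.add(surtHost);                 // no rewrite applies
        }
        return result;
    }
}
```

In this sketch, `hostVariants("com,example,www,", true, true, false, false)` keeps both `com,example,www,` and `com,example,`, while the same call with `addBeforeRemoving` set to `false` keeps only `com,example,`.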
Constructor and Description |
---|
NASSurtPrefixedDecideRule() |

Modifier and Type | Method and Description |
---|---|
void | addedSeed(org.archive.modules.CrawlURI curi) |
protected String | addedSeedImpl(org.archive.modules.CrawlURI curi): Implementation of addedSeed. |
boolean | getAddBeforeAddingW3SubDomain() |
boolean | getAddBeforeRemovingW3xSubDomain() |
boolean | getAddW3SubDomain() |
boolean | getAllowSubDomainsRewrite() |
boolean | getRemoveW3xSubDomain() |
void | setAddBeforeAddingW3SubDomain(boolean addBeforeAddingW3SubDomain) |
void | setAddBeforeRemovingW3xSubDomain(boolean addBeforeRemovingW3xSubDomain) |
void | setAddW3SubDomain(boolean addW3SubDomain) |
void | setAllowSubDomainsRewrite(boolean allowSubDomainsRewrite) |
void | setRemoveW3xSubDomain(boolean removeW3xSubDomain) |
protected String | subDomainsRewrite(String path, String originalUri, String scheme, String surtHost, String port, String surt): Rewrites the SURT to allow sub-domains if the original URI has no path at all. |

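Removing the closing `)/` turns an exact-host SURT into a string that every sub-domain's SURT also starts with, which is what lets sub-domains through when SURT prefixes are compared by plain string prefix. The sketch below assumes that "no path at all" means no `/` follows the host part of the original URI, and it reduces the parameter list to the original URI and its SURT. `SubDomainRewriteSketch`, `hasNoPath` and `rewrite` are hypothetical names, not the NetarchiveSuite implementation.

```java
/** Hypothetical sketch of the ")/" rewrite; not the NetarchiveSuite implementation. */
final class SubDomainRewriteSketch {

    private SubDomainRewriteSketch() {}

    /** Assumption: the URI has no path if no '/' follows the "scheme://" part. */
    static boolean hasNoPath(String originalUri) {
        int schemeEnd = originalUri.indexOf("://");
        String rest = schemeEnd >= 0 ? originalUri.substring(schemeEnd + 3) : originalUri;
        return rest.indexOf('/') < 0;
    }

    /** Drops the closing ")/" so the SURT becomes a prefix shared by all sub-domains. */
    static String rewrite(String originalUri, String surt) {
        if (hasNoPath(originalUri) && surt.endsWith(")/")) {
            return surt.substring(0, surt.length() - 2);
        }
        return surt;  // URI has a path (even just "/"), or SURT is not in the expected form
    }
}
```

In this sketch the seed `http://example.com` yields `http://(com,example,`, which `http://(com,example,news,)/` starts with, while `http://(com,exampleshop,)/` does not. The seed `http://example.com/` keeps its `)/` and so only covers that exact host.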

Methods inherited from class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule: buildSurtPrefixSet, concludedSeedBatch, doCheckpoint, dumpSurtPrefixSet, evaluate, finishCheckpoint, getAlsoCheckVia, getSeeds, getSeedsAsSurtPrefixes, getSurtsDumpFile, getSurtsSource, getSurtsSourceFile, nonseedLine, onApplicationEvent, prefixFrom, readPrefixes, setAlsoCheckVia, setBeanName, setRecoveryCheckpoint, setSeeds, setSeedsAsSurtPrefixes, setSurtsDumpFile, setSurtsSource, setSurtsSourceFile, startCheckpoint
Methods inherited from further up the class hierarchy: getDecision, innerDecide, setDecision
protected boolean removeW3xSubDomain
protected boolean addBeforeRemovingW3xSubDomain
protected boolean addW3SubDomain
protected boolean addBeforeAddingW3SubDomain
protected boolean allowSubDomainsRewrite
public NASSurtPrefixedDecideRule()
public boolean getRemoveW3xSubDomain()
public void setRemoveW3xSubDomain(boolean removeW3xSubDomain)
public boolean getAddBeforeRemovingW3xSubDomain()
public void setAddBeforeRemovingW3xSubDomain(boolean addBeforeRemovingW3xSubDomain)
public boolean getAddW3SubDomain()
public void setAddW3SubDomain(boolean addW3SubDomain)
public boolean getAddBeforeAddingW3SubDomain()
public void setAddBeforeAddingW3SubDomain(boolean addBeforeAddingW3SubDomain)
public boolean getAllowSubDomainsRewrite()
public void setAllowSubDomainsRewrite(boolean allowSubDomainsRewrite)
public void addedSeed(org.archive.modules.CrawlURI curi)
Specified by: addedSeed in interface org.archive.modules.seeds.SeedListener
Overrides: addedSeed in class org.archive.modules.deciderules.surt.SurtPrefixedDecideRule
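When a seed is added, its host has to be in SURT form, which the `surtHost` parameter of `subDomainsRewrite` describes as a comma-separated list of names. The following is a simplified sketch of that form only, assuming lower-cased, reversed labels with a trailing comma as in `http://(com,example,www,)/`. It is not Heritrix's own SURT conversion and ignores ports, user info and IP addresses; `SurtHostSketch`, `toSurtHost` and `toSurt` are hypothetical names.

```java
import java.util.Locale;

/** Hypothetical, simplified host-to-SURT conversion; not Heritrix's SURT implementation. */
final class SurtHostSketch {

    private SurtHostSketch() {}

    /** "www.Example.com" -> "com,example,www," (labels reversed, each followed by a comma). */
    static String toSurtHost(String host) {
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        return sb.toString();
    }

    /** Builds "scheme://(surtHost)/" for a host-only URI; ports and paths are not handled. */
    static String toSurt(String scheme, String host) {
        return scheme.toLowerCase(Locale.ROOT) + "://(" + toSurtHost(host) + ")/";
    }
}
```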
protected String addedSeedImpl(org.archive.modules.CrawlURI curi)
Implementation of addedSeed.
Parameters:
curi - CrawlURI object to convert

protected String subDomainsRewrite(String path, String originalUri, String scheme, String surtHost, String port, String surt)
Rewrites the SURT to allow sub-domains if the original URI has no path at all.
Parameters:
path - SURT path string
originalUri - original URI
scheme - SURT scheme string
surtHost - SURT host as a comma-separated list of names
surt - URI converted to SURT by Heritrix's default conversion

Copyright © 2005–2016 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.