public class SeedUriDomainnameQueueAssignmentPolicy extends org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
A DomainnameQueueAssignmentPolicy in which the domain name returned is the domain name of the candidate URI, except where the seed URI belongs to a different domain; in that case the seed's domain name is used.
The domain is used as the queue name.
The domain is defined as the last two names of the hostname, or the entire IP address:
x.y.z -> y.z
y.z -> y.z
nn.nn.nn.nn -> nn.nn.nn.nn

Constructor and Description |
---|
SeedUriDomainnameQueueAssignmentPolicy() |
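The domain rule from the class description (last two names of a hostname, or the whole IP address) can be sketched in plain Java as below. This is an illustrative sketch, not the NetarchiveSuite implementation: the class name `DomainKeySketch` and method `domainOf` are hypothetical, and only dotted-quad IPv4 addresses are recognised (IPv6 is not handled).

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the "domain as queue name" rule.
 * Not the actual NetarchiveSuite code.
 */
public class DomainKeySketch {

    // Dotted-quad IPv4 address such as 192.168.0.1 (octet ranges are not checked).
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    /** Returns the last two labels of a hostname, or the whole IPv4 address. */
    public static String domainOf(String host) {
        if (host == null || host.isEmpty()) {
            return host;
        }
        if (IPV4.matcher(host).matches()) {
            return host; // nn.nn.nn.nn -> nn.nn.nn.nn
        }
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host; // y.z -> y.z
        }
        // x.y.z -> y.z
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(domainOf("x.y.z"));    // y.z
        System.out.println(domainOf("y.z"));      // y.z
        System.out.println(domainOf("10.0.0.1")); // 10.0.0.1
    }
}
```

Taken literally, the rule maps `www.example.co.uk` to `co.uk`; the sketch reproduces that literal rule and does not consult a public-suffix list.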
Modifier and Type | Method and Description |
---|---|
String | getClassKey(org.archive.modules.CrawlURI cauri) The logic is as follows: we try to get the queue name as the domain name of the seed. |
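The `getClassKey` logic summarised above (prefer the seed's domain, otherwise use the candidate URI's own domain) might look roughly like the following. This is a hedged sketch under stated assumptions: plain URI strings stand in for Heritrix's `CrawlURI`, the seed URI is assumed to be passed in directly (how the real class obtains it is not documented here), and the fallback to the candidate's domain is an assumption drawn from the class description. `SeedDomainKeySketch`, `hostOf` and `classKey` are hypothetical names.

```java
import java.net.URI;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: queue key = seed's domain if a seed is known,
 * otherwise the candidate URI's domain. Not the NetarchiveSuite code.
 */
public class SeedDomainKeySketch {

    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    /** Last two hostname labels, or the whole IPv4 address. */
    static String domainOf(String host) {
        if (host == null || IPV4.matcher(host).matches()) {
            return host;
        }
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    /** Host part of a URI string, or null if absent or unparsable. */
    static String hostOf(String uri) {
        if (uri == null) {
            return null;
        }
        try {
            return URI.create(uri).getHost();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    /** seedUri is the seed the candidate was discovered from (assumed; may be null). */
    public static String classKey(String candidateUri, String seedUri) {
        String seedHost = hostOf(seedUri);
        if (seedHost != null) {
            // Seed known: queue under the seed's domain, even if the
            // candidate lives on a different domain.
            return domainOf(seedHost);
        }
        // No usable seed: fall back to the candidate's own domain.
        return domainOf(hostOf(candidateUri));
    }

    public static void main(String[] args) {
        System.out.println(classKey("http://cdn.example.net/a.js", "http://www.kb.dk/")); // kb.dk
        System.out.println(classKey("http://www.example.org/page", null));               // example.org
    }
}
```

Keying on the seed's domain keeps resources fetched from other hosts (for example a CDN) in the same queue as the site that linked to them, at the cost of politeness being applied per seed domain rather than per host actually contacted.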
Inherited methods:
bucketBasis, getDeferToPrevious, getParallelQueues, getSubqueue, setDeferToPrevious, setParallelQueues
getForceQueueAssignment, getKeyedProperties, maximumNumberOfKeys, setForceQueueAssignment
public SeedUriDomainnameQueueAssignmentPolicy()
public String getClassKey(org.archive.modules.CrawlURI cauri)
Overrides:
getClassKey in class org.archive.crawler.frontier.URIAuthorityBasedQueueAssignmentPolicy
Parameters:
cauri - The crawl URI from which to find the key.

Copyright © 2005–2018 The Royal Danish Library, the National Library of France and the Austrian National Library. All rights reserved.