public class SeedUriDomainnameQueueAssignmentPolicy extends org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
A DomainnameQueueAssignmentPolicy variant in which the domain name returned is that of the candidate URI, except when the seed URI has a different domain name, in which case the seed URI's domain name is used.
The domain is used as the queue name. The domain is defined as the last two names in the entire hostname, or the entirety of an IP address:
x.y.z -> y.z
y.z -> y.z
nn.nn.nn.nn -> nn.nn.nn.nn
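The mapping rule above can be sketched in a few lines of Java. This is a minimal illustration of the "last two names, or the whole IP address" rule as stated here, not the code used by this class; `DomainKeySketch`, `looksLikeIpv4` and `domainOf` are hypothetical names introduced for the example, and the IPv4 check is a simple pattern match that does not validate octet ranges.

```java
// Hypothetical sketch: not part of the documented API.
final class DomainKeySketch {
    private DomainKeySketch() {}

    /** Assumption: a host made of four dot-separated groups of 1-3 digits is treated as an IP address. */
    static boolean looksLikeIpv4(String host) {
        return host.matches("\\d{1,3}(\\.\\d{1,3}){3}");
    }

    /** Returns the last two dot-separated names of a hostname, or the whole string for an IP address. */
    static String domainOf(String host) {
        if (looksLikeIpv4(host)) {
            return host; // nn.nn.nn.nn -> nn.nn.nn.nn
        }
        String[] parts = host.split("\\.");
        if (parts.length <= 2) {
            return host; // y.z -> y.z
        }
        // x.y.z -> y.z
        return parts[parts.length - 2] + "." + parts[parts.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(domainOf("x.y.z"));       // y.z
        System.out.println(domainOf("y.z"));         // y.z
        System.out.println(domainOf("192.168.0.1")); // 192.168.0.1
    }
}
```

Note that a plain "last two names" rule groups hosts such as `a.example.co.uk` under `co.uk`; whether the real policy handles such suffixes differently is not stated on this page.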
Constructor and Description |
---|
SeedUriDomainnameQueueAssignmentPolicy() |
Modifier and Type | Method and Description |
---|---|
String | getClassKey(org.archive.crawler.framework.CrawlController controller, org.archive.crawler.datamodel.CandidateURI cauri) Return a key for queue names based on domain names (last two parts of host name) or IP address. |
public SeedUriDomainnameQueueAssignmentPolicy()
public String getClassKey(org.archive.crawler.framework.CrawlController controller, org.archive.crawler.datamodel.CandidateURI cauri)
Overrides:
getClassKey in class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
Parameters:
controller - The controller the crawl is running on.
cauri - A potential URI.
See Also:
HostnameQueueAssignmentPolicy.getClassKey(org.archive.crawler.framework.CrawlController, org.archive.crawler.datamodel.CandidateURI)
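As a rough illustration of how a seed-aware key could be chosen, the sketch below works on plain hostname strings rather than on `CrawlController` and `CandidateURI`, so it makes no claims about the Heritrix API. It assumes that the seed URI's domain takes precedence whenever a seed is known and its domain differs from the candidate's. `SeedDomainKeySketch`, `domainOf` and `classKey` are hypothetical names, and a missing seed is modelled as `null`.

```java
// Hypothetical sketch: operates on hostnames only, not on Heritrix types.
final class SeedDomainKeySketch {
    private SeedDomainKeySketch() {}

    /** Last two dot-separated names of a hostname, or the whole string if it looks like an IPv4 address. */
    static String domainOf(String host) {
        if (host.matches("\\d{1,3}(\\.\\d{1,3}){3}")) {
            return host;
        }
        String[] parts = host.split("\\.");
        if (parts.length <= 2) {
            return host;
        }
        return parts[parts.length - 2] + "." + parts[parts.length - 1];
    }

    /**
     * Assumption: the candidate's domain is the key unless a seed host is known
     * and its domain differs, in which case the seed's domain is the key.
     */
    static String classKey(String candidateHost, String seedHost) {
        String candidateDomain = domainOf(candidateHost);
        if (seedHost == null || seedHost.isEmpty()) {
            return candidateDomain; // no seed information available
        }
        String seedDomain = domainOf(seedHost);
        return seedDomain.equals(candidateDomain) ? candidateDomain : seedDomain;
    }

    public static void main(String[] args) {
        // A URI on another domain, discovered from a seed on example.org,
        // is queued together with the seed's domain.
        System.out.println(classKey("cdn.other.net", "www.example.org")); // example.org
        System.out.println(classKey("www.example.org", null));            // example.org
    }
}
```

Under this assumption, every URI reached from a given seed lands in that seed's domain queue, even when the URI itself is hosted elsewhere.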
Copyright © 2005–2016 The Royal Danish Library, the Danish State and University Library, the National Library of France and the Austrian National Library. All rights reserved.