dk.netarkivet.harvester.harvesting
Class DomainnameQueueAssignmentPolicy
java.lang.Object
org.archive.crawler.frontier.QueueAssignmentPolicy
org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
dk.netarkivet.harvester.harvesting.DomainnameQueueAssignmentPolicy
public class DomainnameQueueAssignmentPolicy
- extends org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
Uses the domain as the queue name.
The domain is defined as the last two names (dot-separated labels) of the
hostname, or the entire IP address when the host is an IP address:
x.y.z -> y.z
y.z -> y.z
nn.nn.nn.nn -> nn.nn.nn.nn
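The domain rule above can be sketched as a small standalone Java method. This is a literal reading of the description, not the project's own implementation. The class and method names (`DomainRuleSketch`, `domainOf`) are hypothetical, only dotted-quad IPv4 addresses are recognized as IP addresses, and multi-part public suffixes such as co.uk are not special-cased.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of the "last two names, or the whole IP address" rule. */
final class DomainRuleSketch {
    // Dotted-quad IPv4 shape only; IPv6 literals are not handled in this sketch.
    private static final Pattern IPV4 =
            Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    /** Returns the last two dot-separated names of host, or host itself if it is an IPv4 address. */
    static String domainOf(String host) {
        if (IPV4.matcher(host).matches()) {
            return host; // nn.nn.nn.nn -> nn.nn.nn.nn
        }
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host; // y.z -> y.z (a single name is also returned unchanged)
        }
        // x.y.z -> y.z
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(domainOf("x.y.z"));    // y.z
        System.out.println(domainOf("y.z"));      // y.z
        System.out.println(domainOf("10.0.0.1")); // 10.0.0.1
    }
}
```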
Field Summary
(package private) static java.lang.String DEFAULT_CLASS_KEY
A fallback key used when no queue key can be derived from the URI.
Method Summary
java.lang.String getClassKey(org.archive.crawler.framework.CrawlController controller, org.archive.crawler.datamodel.CandidateURI cauri)
Returns a key for queue names based on the domain name (the last two parts of the host name) or IP address.
Methods inherited from class org.archive.crawler.frontier.QueueAssignmentPolicy
maximumNumberOfKeys
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
DEFAULT_CLASS_KEY
static final java.lang.String DEFAULT_CLASS_KEY
- A fallback key used when no queue key can be derived from the URI.
This value is copied from the parent class, where the constant is private.
The parent returns it for URIs such as about:blank.
- See Also:
- Constant Field Values
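Why a fallback key is needed can be illustrated with java.net.URI, used here only as a stand-in for Heritrix's own URI classes: an opaque URI such as about:blank has no host component, so there is nothing to build a queue name from. The class name, method name, and the "default..." placeholder value below are hypothetical; the real constant's value is listed under Constant Field Values.

```java
import java.net.URI;

/** Hypothetical sketch: fall back to a fixed key when a URI has no host. */
final class DefaultKeySketch {
    // Placeholder value for illustration only; not the actual constant.
    static final String DEFAULT_CLASS_KEY = "default...";

    /** Returns the URI's host, or DEFAULT_CLASS_KEY when the URI has none. */
    static String hostOrDefault(String uri) {
        // getHost() returns null for opaque URIs such as about:blank.
        String host = URI.create(uri).getHost();
        return (host == null || host.isEmpty()) ? DEFAULT_CLASS_KEY : host;
    }

    public static void main(String[] args) {
        System.out.println(hostOrDefault("http://www.example.com/a")); // www.example.com
        System.out.println(hostOrDefault("about:blank"));              // default...
    }
}
```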
DomainnameQueueAssignmentPolicy
public DomainnameQueueAssignmentPolicy()
getClassKey
public java.lang.String getClassKey(org.archive.crawler.framework.CrawlController controller,
org.archive.crawler.datamodel.CandidateURI cauri)
- Returns a key for queue names based on the domain name (the last two parts
of the host name) or IP address. The key may include a "#" suffix at the end.
- Overrides:
getClassKey
in class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
- Parameters:
controller - The controller the crawl is running on.
cauri - A potential URI.
- Returns:
- a class key (really an arbitrary string): a domain name or IP address,
optionally followed by "#" and a suffix, or "default...".
- See Also:
HostnameQueueAssignmentPolicy.getClassKey(org.archive.crawler.framework.CrawlController, org.archive.crawler.datamodel.CandidateURI)
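The key derivation can be sketched as a post-processing step on a hostname-based key. This sketch assumes the parent key has the form host or host#port (Heritrix's hostname policy writes ':' as '#'), and that any "#" suffix is kept, as the description above suggests. `ClassKeySketch` and `classKey` are hypothetical names, and the sketch works on plain strings rather than on CrawlController and CandidateURI objects.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: reduce a hostname-based class key (host or host#port)
 * to a domain-based one, keeping any "#..." suffix.
 */
final class ClassKeySketch {
    // Dotted-quad IPv4 addresses are kept whole (IPv6 is not handled).
    private static final Pattern IPV4 =
            Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    /** Maps a parent-style key to a domain-based key. */
    static String classKey(String parentKey) {
        int hash = parentKey.indexOf('#');
        String host = hash < 0 ? parentKey : parentKey.substring(0, hash);
        String suffix = hash < 0 ? "" : parentKey.substring(hash); // e.g. "#8080"
        if (IPV4.matcher(host).matches()) {
            return host + suffix;
        }
        String[] labels = host.split("\\.");
        if (labels.length <= 2) {
            return host + suffix; // already a two-part name, or a single name
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1] + suffix;
    }

    public static void main(String[] args) {
        System.out.println(classKey("www.example.com"));      // example.com
        System.out.println(classKey("www.example.com#8080")); // example.com#8080
        System.out.println(classKey("192.168.1.10#443"));     // 192.168.1.10#443
    }
}
```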