dk.netarkivet.harvester.harvesting
Class DomainnameQueueAssignmentPolicy

java.lang.Object
  extended by org.archive.crawler.frontier.QueueAssignmentPolicy
      extended by org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
          extended by dk.netarkivet.harvester.harvesting.DomainnameQueueAssignmentPolicy

public class DomainnameQueueAssignmentPolicy
extends org.archive.crawler.frontier.HostnameQueueAssignmentPolicy

Uses the domain as the queue name. The domain is defined as the last two names in the entire hostname, or the entirety of an IP address:

x.y.z -> y.z
y.z -> y.z
nn.nn.nn.nn -> nn.nn.nn.nn
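The mapping described above can be illustrated with a minimal sketch. This is not the NetarchiveSuite implementation: the class name DomainNameSketch, the method domainOf, and the IPv4-only pattern are assumptions made for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of NetarchiveSuite or Heritrix. */
final class DomainNameSketch {
    // Assumption: "IP address" means four dot-separated groups of 1-3 digits (IPv4 only).
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    /** Returns the last two names of a hostname, or the whole string for an IPv4 address. */
    static String domainOf(String hostname) {
        if (IPV4.matcher(hostname).matches()) {
            return hostname; // nn.nn.nn.nn -> nn.nn.nn.nn
        }
        String[] names = hostname.split("\\.");
        if (names.length <= 2) {
            return hostname; // y.z -> y.z
        }
        // x.y.z -> y.z
        return names[names.length - 2] + "." + names[names.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(domainOf("x.y.z"));
        System.out.println(domainOf("y.z"));
        System.out.println(domainOf("192.168.0.1"));
    }
}
```

Under this rule every host of a domain shares one queue, so www.example.com and images.example.com would both map to example.com.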


Field Summary
(package private) static java.lang.String DEFAULT_CLASS_KEY
          A key used for the cases when we can't figure out the URI.
 
Constructor Summary
DomainnameQueueAssignmentPolicy()
           
 
Method Summary
 java.lang.String getClassKey(org.archive.crawler.framework.CrawlController controller, org.archive.crawler.datamodel.CandidateURI cauri)
          Return a key for queue names based on domain names (last two parts of host name) or IP address.
 
Methods inherited from class org.archive.crawler.frontier.QueueAssignmentPolicy
maximumNumberOfKeys
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_CLASS_KEY

static final java.lang.String DEFAULT_CLASS_KEY
A key used for the cases when we can't figure out the URI. The value is copied from the parent class, where it has private access. The parent returns it for URIs such as about:blank.

See Also:
Constant Field Values

Constructor Detail

DomainnameQueueAssignmentPolicy

public DomainnameQueueAssignmentPolicy()

Method Detail

getClassKey

public java.lang.String getClassKey(org.archive.crawler.framework.CrawlController controller,
                                    org.archive.crawler.datamodel.CandidateURI cauri)
Return a key for queue names based on domain names (last two parts of host name) or IP address. The key may include a # suffix at the end.

Overrides:
getClassKey in class org.archive.crawler.frontier.HostnameQueueAssignmentPolicy
Parameters:
controller - The controller the crawl is running on.
cauri - A potential URI.
Returns:
a class key (really an arbitrary string): a domain name or IP address, optionally followed by a # suffix, or "default...".
See Also:
HostnameQueueAssignmentPolicy.getClassKey(org.archive.crawler.framework.CrawlController, org.archive.crawler.datamodel.CandidateURI)
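
As a rough illustration of the three kinds of return value, the sketch below derives a key from a host-based key string. It assumes the incoming string is a hostname or IP address optionally followed by a # suffix, and that the suffix is kept unchanged; neither assumption is confirmed by this page. The names ClassKeySketch, FALLBACK_KEY, classKey, and domainOf are hypothetical, and the fallback string is copied literally from the text above, not taken from the real constant.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch; does not call Heritrix and is not the actual getClassKey. */
final class ClassKeySketch {
    // Placeholder only: the real DEFAULT_CLASS_KEY value is not shown on this page.
    static final String FALLBACK_KEY = "default...";

    // Assumption: IPv4 addresses only.
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    /** Last two names of a hostname, or the whole string for an IPv4 address. */
    static String domainOf(String host) {
        if (IPV4.matcher(host).matches()) {
            return host;
        }
        String[] names = host.split("\\.");
        if (names.length <= 2) {
            return host;
        }
        return names[names.length - 2] + "." + names[names.length - 1];
    }

    /** Maps a host-based key (optionally ending in a # suffix) to a domain-based key. */
    static String classKey(String hostKey) {
        if (hostKey == null || hostKey.isEmpty()) {
            return FALLBACK_KEY; // nothing usable to derive a domain from
        }
        int hash = hostKey.indexOf('#');
        String host = hash < 0 ? hostKey : hostKey.substring(0, hash);
        String suffix = hash < 0 ? "" : hostKey.substring(hash); // kept as-is (assumption)
        return domainOf(host) + suffix;
    }

    public static void main(String[] args) {
        System.out.println(classKey("www.example.com"));
        System.out.println(classKey("www.example.com#8080"));
        System.out.println(classKey("10.0.0.1"));
        System.out.println(classKey(null));
    }
}
```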