Class ExtendedDNSFetcher

  • All Implemented Interfaces:
    org.archive.checkpointing.Checkpointable, org.archive.spring.HasKeyedProperties, org.springframework.beans.factory.Aware, org.springframework.beans.factory.BeanNameAware, org.springframework.context.Lifecycle

    public class ExtendedDNSFetcher
    extends org.archive.modules.Processor
    Processor that resolves 'dns:' URIs. Based on the version of FetchDNS taken from https://github.com/internetarchive/heritrix3/commit/aee83dfe26ea5a36a4eb3092380e1b0d7b242aab. It makes it possible to skip the lookup of bad hostnames, e.g. 'components' or 'www', that carry no valid domain information.
    Author:
    multiple
    Sample usage:
    <bean id="fetchDns" class="dk.netarkivet.harvester.harvesting.ExtendedDNSFetcher">
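The class description says lookups of bad hostnames such as 'components' or 'www' can be avoided, but this page does not show how the check is done. The following is a minimal sketch of one plausible prevalidation rule, using only the JDK. The class name `HostnamePrevalidationSketch` and the method `looksResolvable` are hypothetical and are not part of ExtendedDNSFetcher; the rule itself (at least two well-formed DNS labels) is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; not part of ExtendedDNSFetcher's documented API. */
final class HostnamePrevalidationSketch {

    // One DNS label: 1 to 63 letters, digits or hyphens,
    // neither starting nor ending with a hyphen.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    /**
     * Returns true when the name consists of at least two well-formed labels.
     * Single-label names such as "www" or "components" are rejected.
     * Assumption: a trailing dot (fully qualified form) is also rejected here.
     */
    static boolean looksResolvable(String hostname) {
        if (hostname == null || hostname.isEmpty() || hostname.length() > 253) {
            return false;
        }
        // Limit -1 keeps trailing empty strings, so "a.b." yields an empty label.
        String[] labels = hostname.split("\\.", -1);
        if (labels.length < 2) {
            return false;
        }
        for (String label : labels) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"www", "components", "example.com"}) {
            System.out.println(name + " -> " + looksResolvable(name));
        }
    }
}
```

In a processor like this one, a check of this kind would presumably run before any resolver call when the prevalidateHostname property is enabled, so that obviously unusable names never reach the DNS server.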
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected java.lang.String digestAlgorithm
      Which algorithm (for example MD5 or SHA-1) to use to perform an on-the-fly digest hash of retrieved content-bodies.
      protected org.archive.modules.net.ServerCache serverCache
      Used to do DNS lookups.
      protected java.net.InetAddress serverInetAddr  
      • Fields inherited from class org.archive.modules.Processor

        beanName, isRunning, kp, recoveryCheckpoint, uriCount
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      boolean getAcceptNonDnsResolves()  
      java.lang.String getDigestAlgorithm()  
      boolean getDigestContent()  
      boolean getDisableJavaDnsResolves()  
      protected byte[] getDNSRecord​(long fetchStart, org.xbill.DNS.Record[] rrecordSet)  
      protected org.xbill.DNS.ARecord getFirstARecord​(org.xbill.DNS.Record[] rrecordSet)  
      boolean getPrevalidateHostname()  
      org.archive.modules.net.ServerCache getServerCache()  
      protected void innerProcess​(org.archive.modules.CrawlURI curi)  
      protected boolean isQuadAddress​(org.archive.modules.CrawlURI curi, java.lang.String dnsName, org.archive.modules.net.CrawlHost targetHost)  
      protected void recordDNS​(org.archive.modules.CrawlURI curi, org.xbill.DNS.Record[] rrecordSet)  
      void setAcceptNonDnsResolves​(boolean acceptNonDnsResolves)  
      void setDigestAlgorithm​(java.lang.String digestAlgorithm)  
      void setDigestContent​(boolean digest)  
      void setDisableJavaDnsResolves​(boolean disableJavaDnsResolves)  
      void setPrevalidateHostname​(boolean prevalidateHostname)  
      void setServerCache​(org.archive.modules.net.ServerCache serverCache)  
      protected void setUnresolvable​(org.archive.modules.CrawlURI curi, org.archive.modules.net.CrawlHost host)  
      protected boolean shouldProcess​(org.archive.modules.CrawlURI curi)  
      protected void storeDNSRecord​(org.archive.modules.CrawlURI curi, java.lang.String dnsName, org.archive.modules.net.CrawlHost targetHost, org.xbill.DNS.Record[] rrecordSet)  
      • Methods inherited from class org.archive.modules.Processor

        doCheckpoint, finishCheckpoint, flattenVia, fromCheckpointJson, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, report, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop, toCheckpointJson
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • serverCache

        protected org.archive.modules.net.ServerCache serverCache
        Used to do DNS lookups.
      • digestAlgorithm

        protected java.lang.String digestAlgorithm
        Which algorithm (for example MD5 or SHA-1) to use to perform an on-the-fly digest hash of retrieved content-bodies. The default is 'sha1'.
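To illustrate what a content digest with a configurable algorithm looks like, here is a minimal sketch based on the JDK's `java.security.MessageDigest`. The class `ContentDigestSketch` and the method `hexDigest` are hypothetical. How ExtendedDNSFetcher maps its 'sha1' setting to an algorithm name, and which bytes it feeds into the digest, are not documented on this page; the sketch simply uses the standard JCA name "SHA-1".

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper; not part of ExtendedDNSFetcher's documented API. */
final class ContentDigestSketch {

    /** Hashes the content with the named JCA algorithm and returns lowercase hex. */
    static String hexDigest(String algorithm, byte[] content) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            // Surface a misconfigured algorithm name as an unchecked error.
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
        byte[] hash = md.digest(content);
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] body = "abc".getBytes(StandardCharsets.US_ASCII);
        // prints a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(hexDigest("SHA-1", body));
    }
}
```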
    • Method Detail

      • getServerCache

        public org.archive.modules.net.ServerCache getServerCache()
      • setServerCache

        @Autowired
        public void setServerCache​(org.archive.modules.net.ServerCache serverCache)
      • setDigestAlgorithm

        public void setDigestAlgorithm​(java.lang.String digestAlgorithm)
      • shouldProcess

        protected boolean shouldProcess​(org.archive.modules.CrawlURI curi)
        Specified by:
        shouldProcess in class org.archive.modules.Processor
      • innerProcess

        protected void innerProcess​(org.archive.modules.CrawlURI curi)
        Specified by:
        innerProcess in class org.archive.modules.Processor
      • storeDNSRecord

        protected void storeDNSRecord​(org.archive.modules.CrawlURI curi,
                                      java.lang.String dnsName,
                                      org.archive.modules.net.CrawlHost targetHost,
                                      org.xbill.DNS.Record[] rrecordSet)
      • isQuadAddress

        protected boolean isQuadAddress​(org.archive.modules.CrawlURI curi,
                                        java.lang.String dnsName,
                                        org.archive.modules.net.CrawlHost targetHost)
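The method name suggests a check for whether the name in a 'dns:' URI is already a dotted-quad IPv4 address, in which case no lookup is needed. This page gives no description of the method, so the following is only a sketch of that idea using the JDK. `QuadAddressSketch` and `parseQuad` are hypothetical names, and the real method additionally receives a CrawlURI and a CrawlHost whose handling is not shown here.

```java
import java.net.InetAddress;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of ExtendedDNSFetcher's documented API. */
final class QuadAddressSketch {

    // Four groups of 1 to 3 digits separated by dots.
    private static final Pattern IPV4_QUAD =
            Pattern.compile("(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})\\.(\\d{1,3})");

    /** Returns the four address bytes, or null when the name is not a dotted quad. */
    static byte[] parseQuad(String dnsName) {
        Matcher m = IPV4_QUAD.matcher(dnsName);
        if (!m.matches()) {
            return null;
        }
        byte[] addr = new byte[4];
        for (int i = 0; i < 4; i++) {
            int octet = Integer.parseInt(m.group(i + 1));
            if (octet > 255) {
                return null; // e.g. "999.1.1.1" is not a valid address
            }
            addr[i] = (byte) octet;
        }
        return addr;
    }

    public static void main(String[] args) throws Exception {
        byte[] addr = parseQuad("192.168.0.1");
        // getByAddress builds the InetAddress from raw bytes without a DNS lookup.
        System.out.println(InetAddress.getByAddress(addr).getHostAddress());
        System.out.println(parseQuad("example.com") == null);
    }
}
```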
      • recordDNS

        protected void recordDNS​(org.archive.modules.CrawlURI curi,
                                 org.xbill.DNS.Record[] rrecordSet)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • getDNSRecord

        protected byte[] getDNSRecord​(long fetchStart,
                                      org.xbill.DNS.Record[] rrecordSet)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • setUnresolvable

        protected void setUnresolvable​(org.archive.modules.CrawlURI curi,
                                       org.archive.modules.net.CrawlHost host)
      • getFirstARecord

        protected org.xbill.DNS.ARecord getFirstARecord​(org.xbill.DNS.Record[] rrecordSet)