Class NASFetchDNS

  • All Implemented Interfaces:
    org.archive.checkpointing.Checkpointable, org.archive.spring.HasKeyedProperties, org.springframework.beans.factory.Aware, org.springframework.beans.factory.BeanNameAware, org.springframework.context.Lifecycle

    public class NASFetchDNS
    extends org.archive.modules.fetcher.FetchDNS
    Extended FetchDNS processor that allows hosts to be overridden before they are queried through a DNS server.
    Author:
    nicl
    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected boolean acceptDefinedHosts
      Look for hosts in the hosts file/text value before doing a DNS lookup.
      protected Map<String,String> hosts
      Map of hosts that override the normal DNS lookup.
      protected org.archive.io.ReadSource hostsFile
      Hosts file from which to load hosts.
      protected org.archive.io.ReadSource hostsSource
      Value text from which to load hosts.
      • Fields inherited from class org.archive.modules.fetcher.FetchDNS

        digestAlgorithm, serverCache, serverInetAddr
      • Fields inherited from class org.archive.modules.Processor

        beanName, isRunning, kp, recoveryCheckpoint, uriCount
    • Constructor Summary

      Constructors 
      Constructor Description
      NASFetchDNS()  
    • Method Summary

      Modifier and Type Method Description
      boolean getAcceptDefinedHosts()  
      protected void getHosts(org.archive.io.ReadSource hostsSource)
      Run through the lines in a ReadSource and add all valid host lines encountered.
      org.archive.io.ReadSource getHostsFile()
      org.archive.io.ReadSource getHostsSource()
      protected void innerProcess(org.archive.modules.CrawlURI curi)
      protected void reload()
      Clear loaded hosts and reload them from the hosts file and value text.
      void setAcceptDefinedHosts(boolean acceptDefinedHosts)
      void setHostsFile(org.archive.io.ReadSource hostsFile)
      void setHostsSource(org.archive.io.ReadSource hostsSource)
      static int tokenize(String str, String[] tokensArr)
      Split input string into tokens.
      • Methods inherited from class org.archive.modules.fetcher.FetchDNS

        getAcceptNonDnsResolves, getDigestAlgorithm, getDigestContent, getDisableJavaDnsResolves, getDNSRecord, getFirstARecord, getServerCache, isQuadAddress, recordDNS, setAcceptNonDnsResolves, setDigestAlgorithm, setDigestContent, setDisableJavaDnsResolves, setServerCache, setUnresolvable, shouldProcess, storeDNSRecord
      • Methods inherited from class org.archive.modules.Processor

        doCheckpoint, finishCheckpoint, flattenVia, fromCheckpointJson, getBeanName, getEnabled, getKeyedProperties, getRecordedSize, getShouldProcessRule, getURICount, hasHttpAuthenticationCredential, innerProcessResult, innerRejectProcess, isRunning, isSuccess, process, report, setBeanName, setEnabled, setRecoveryCheckpoint, setShouldProcessRule, start, startCheckpoint, stop, toCheckpointJson
    • Field Detail

      • acceptDefinedHosts

        protected boolean acceptDefinedHosts
        Look for hosts in the hosts file/text value before doing a DNS lookup.
      • hostsFile

        protected org.archive.io.ReadSource hostsFile
        Hosts file from which to load hosts.
      • hostsSource

        protected org.archive.io.ReadSource hostsSource
        Value text from which to load hosts.
      • hosts

        protected Map<String,String> hosts
        Map of hosts that override the normal DNS lookup.
    • Constructor Detail

      • NASFetchDNS

        public NASFetchDNS()
    • Method Detail

      • getAcceptDefinedHosts

        public boolean getAcceptDefinedHosts()
      • setAcceptDefinedHosts

        public void setAcceptDefinedHosts(boolean acceptDefinedHosts)
      • getHostsFile

        public org.archive.io.ReadSource getHostsFile()
      • setHostsFile

        public void setHostsFile(org.archive.io.ReadSource hostsFile)
      • getHostsSource

        public org.archive.io.ReadSource getHostsSource()
      • setHostsSource

        public void setHostsSource(org.archive.io.ReadSource hostsSource)
      • innerProcess

        protected void innerProcess(org.archive.modules.CrawlURI curi)
        Overrides:
        innerProcess in class org.archive.modules.fetcher.FetchDNS
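        The override-first behavior described for this class can be sketched in plain Java. This is not the NASFetchDNS source: OverrideFirstResolver and its resolve method are hypothetical, a Function<String, String> stands in for the inherited FetchDNS lookup, and the hosts map is assumed to hold lower-case hostnames mapped to IP address strings.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of override-first resolution. Not the NASFetchDNS source:
 * the real innerProcess works on a CrawlURI and Heritrix's DNS machinery.
 */
final class OverrideFirstResolver {
    private final Map<String, String> hosts;          // hostname -> IP address (assumed layout)
    private final boolean acceptDefinedHosts;         // mirrors the documented flag
    private final Function<String, String> dnsLookup; // stand-in for the inherited DNS query

    OverrideFirstResolver(Map<String, String> hosts, boolean acceptDefinedHosts,
                          Function<String, String> dnsLookup) {
        this.hosts = hosts;
        this.acceptDefinedHosts = acceptDefinedHosts;
        this.dnsLookup = dnsLookup;
    }

    /** Returns the address for host, preferring a defined override when enabled. */
    String resolve(String host) {
        if (acceptDefinedHosts) {
            // Keys are assumed to be stored lower-case, so normalize before lookup.
            String override = hosts.get(host.toLowerCase(Locale.ROOT));
            if (override != null) {
                return override; // defined host wins; no DNS query is made
            }
        }
        return dnsLookup.apply(host); // fall back to the normal DNS lookup
    }
}
```

        Keeping the DNS query behind a function makes the precedence rule (defined host first, DNS second) testable without network access.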
      • reload

        protected void reload()
        Clear loaded hosts and reload them from the hosts file and value text.
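        A minimal sketch of what reload() might do, assuming plain Strings in place of the two ReadSource fields and a simple "address hostname" line format. ReloadSketch is hypothetical, and loading the value text after the file (so that it wins on conflicts) is an assumed ordering, not documented behavior.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of reload(): clear, then re-read both sources.
 * Plain Strings stand in for the hostsFile and hostsSource ReadSource fields.
 */
final class ReloadSketch {
    final Map<String, String> hosts = new HashMap<>();
    String hostsFileText;   // stand-in for hostsFile (may be unset)
    String hostsSourceText; // stand-in for hostsSource (may be unset)

    synchronized void reload() {
        hosts.clear(); // drop every previously loaded override first
        // Load the file first and the value text second, so the value text
        // wins when both define the same host (an assumed ordering).
        for (String text : new String[] {hostsFileText, hostsSourceText}) {
            if (text == null) {
                continue; // source not configured
            }
            for (String line : text.split("\\R")) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue; // skip blank and comment lines
                }
                String[] t = trimmed.split("\\s+");
                if (t.length >= 2) {
                    hosts.put(t[1].toLowerCase(Locale.ROOT), t[0]); // "address hostname"
                }
            }
        }
    }
}
```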
      • getHosts

        protected void getHosts(org.archive.io.ReadSource hostsSource)
        Run through the lines in a ReadSource and add all valid host lines encountered.
        Parameters:
        hostsSource - hosts file or value text
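        The line handling in getHosts can be approximated as follows. A java.io.Reader stands in for org.archive.io.ReadSource, and the notion of a "valid host line" is an assumption here: an IPv4 address followed by one or more hostnames, with '#' comments ignored as in a Unix hosts file. HostsParserSketch is hypothetical; the real method may use tokenize and apply different rules.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of getHosts(ReadSource) using a plain Reader. */
final class HostsParserSketch {
    // Assumed validity rule: the first token must be a dotted-quad IPv4 address.
    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    private static final Pattern IPV4 = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    final Map<String, String> hosts = new HashMap<>();

    /** Adds every valid "address name [name...]" line read from source. */
    void getHosts(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                int hash = line.indexOf('#');
                if (hash >= 0) {
                    line = line.substring(0, hash); // strip comment, hosts-file style
                }
                String[] tokens = line.trim().split("\\s+");
                if (tokens.length < 2 || !IPV4.matcher(tokens[0]).matches()) {
                    continue; // blank, malformed, or non-IPv4 line: ignore it
                }
                // Every name after the address maps to that address.
                for (int i = 1; i < tokens.length; i++) {
                    hosts.put(tokens[i].toLowerCase(Locale.ROOT), tokens[0]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```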
      • tokenize

        public static int tokenize(String str,
                                   String[] tokensArr)
        Splits the input string into tokens, treating runs of whitespace as a single separator. Only as many tokens as fit into the supplied token array are parsed.
        Parameters:
        str - input string to split into tokens
        tokensArr - string array to be filled with tokens
        Returns:
        number of tokens inserted into the token array
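        A stand-alone sketch of the documented tokenize contract: runs of whitespace count as one separator, and parsing stops once the supplied array is full. TokenizeSketch is hypothetical, and treating any Character.isWhitespace character (spaces, tabs, and so on) as a separator is an assumption.

```java
/** Hypothetical stand-alone sketch of the documented tokenize(String, String[]) contract. */
final class TokenizeSketch {
    private TokenizeSketch() {}

    /**
     * Splits str on whitespace, treating runs of whitespace as one separator,
     * and fills at most tokensArr.length slots. Returns the number filled.
     */
    static int tokenize(String str, String[] tokensArr) {
        int count = 0;
        int i = 0;
        int len = str.length();
        while (i < len && count < tokensArr.length) {
            // Skip the run of whitespace before the next token.
            while (i < len && Character.isWhitespace(str.charAt(i))) {
                i++;
            }
            if (i == len) {
                break; // only trailing whitespace was left
            }
            int start = i;
            // Advance to the end of the current token.
            while (i < len && !Character.isWhitespace(str.charAt(i))) {
                i++;
            }
            tokensArr[count++] = str.substring(start, i);
        }
        return count;
    }
}
```

        Passing a two-element array, for example, keeps only the first two tokens of a line, which suits an "address hostname" pair; the return value tells the caller how many slots were actually filled.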