Class NASEnvironment

    • Constructor Detail

      • NASEnvironment

        public NASEnvironment(javax.servlet.ServletContext servletContext,
                              javax.servlet.ServletConfig theServletConfig)
                       throws javax.servlet.ServletException
        Throws:
        javax.servlet.ServletException
    • Method Detail

      • getResourceAsString

        public java.lang.String getResourceAsString(java.lang.String resource)
                                             throws java.io.IOException
        Throws:
        java.io.IOException
      • start

        public void start()
      • cleanup

        public void cleanup()
        Performs cleanup by waiting for the workflow threads to stop running.
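        A minimal sketch of the general pattern of waiting for worker threads to stop: signal them to finish, then block until each has terminated. The class WorkflowCleanupSketch, its fields, and its polling loop are hypothetical stand-ins for illustration and are not taken from NASEnvironment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: not the NASEnvironment implementation.
final class WorkflowCleanupSketch {
    private final List<Thread> workflowThreads = new ArrayList<>();
    private volatile boolean running = true;

    // Starts the given number of worker threads that poll until told to stop.
    void start(int threadCount) {
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(() -> {
                while (running) {
                    try {
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }, "workflow-" + i);
            t.start();
            workflowThreads.add(t);
        }
    }

    // Signals the workers to stop, then waits for each one to terminate.
    void cleanup() {
        running = false;
        for (Thread t : workflowThreads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Reports whether any worker thread is still alive.
    boolean anyAlive() {
        for (Thread t : workflowThreads) {
            if (t.isAlive()) {
                return true;
            }
        }
        return false;
    }
}
```

        In this sketch, cleanup() returns only once every worker has exited its loop, so a caller can release shared resources afterwards.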
      • replaceH3HostnamePortRegexList

        public void replaceH3HostnamePortRegexList(java.util.List<java.lang.String> h3HostnamePortRegexList,
                                                   java.util.List<java.lang.String> invalidPatternsList)
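        The signature suggests that one list carries host/port regular expressions and the other collects patterns that fail to compile, though the documentation does not say so. The following is a sketch under that assumption only. The class RegexListSketch and the method compileAll are hypothetical names and do not appear in NASEnvironment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper: assumes invalidPatternsList is an output parameter.
final class RegexListSketch {
    // Compiles each regex; patterns that fail to compile are appended
    // to invalidPatternsList instead of aborting the whole update.
    static List<Pattern> compileAll(List<String> regexList,
                                    List<String> invalidPatternsList) {
        List<Pattern> compiled = new ArrayList<>();
        for (String regex : regexList) {
            try {
                compiled.add(Pattern.compile(regex));
            } catch (PatternSyntaxException e) {
                invalidPatternsList.add(regex);
            }
        }
        return compiled;
    }
}
```

        Collecting the rejected patterns rather than throwing lets a caller report all invalid entries at once.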
      • getCrawledUrls

        public java.util.stream.Stream<java.lang.String> getCrawledUrls(long jobId,
                                                                        Heritrix3JobMonitor h3Job)
        Gets the URLs that were crawled, or whose crawl was attempted, according to the crawl log of the running job with the given job ID.
        Parameters:
        jobId - ID of the running job
        h3Job - Heritrix3JobMonitor from which to get the job for the given jobId
        Returns:
        The crawled (or attempted) URLs from the crawl log of the given job
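        As an illustration of how URLs could be pulled from crawl log lines into a Stream, here is a sketch that assumes the Heritrix crawl.log layout of whitespace-separated fields with the URI in the fourth column. The class CrawlLogUrlsSketch and the method crawledUrls are hypothetical, and the real method may read and filter the log differently.

```java
import java.util.stream.Stream;

// Hypothetical helper: assumes the URI is the 4th whitespace-separated field.
final class CrawlLogUrlsSketch {
    // Maps crawl log lines to their URI field, skipping blank or short lines.
    // Failed fetches are kept, which is why the URLs are only "attempted".
    static Stream<String> crawledUrls(Stream<String> crawlLogLines) {
        return crawlLogLines
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .map(line -> line.split("\\s+"))
                .filter(fields -> fields.length > 3)
                .map(fields -> fields[3]);
    }
}
```

        Because the result is a lazy Stream, a caller that backs it with a file should close it, for example with try-with-resources.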
      • jobHarvestsDomain

        public boolean jobHarvestsDomain(long jobId,
                                         java.lang.String domainName,
                                         Heritrix3JobMonitor h3Job)
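        Determines whether the given job harvests the given domain.
        Parameters:
        jobId - ID of the job
        domainName - Name of the domain
        h3Job - Heritrix3JobMonitor from which to get the job for the given jobId
        Returns:
        true if the given job harvests the given domain, false otherwise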
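        One plausible way to answer such a question is to test whether any crawled URL has a host that equals the domain or is a subdomain of it. The sketch below takes that approach as an assumption. The class JobHarvestsDomainSketch and its methods are hypothetical, and the actual check may derive domains differently (for example, using public-suffix rules).

```java
import java.net.URI;
import java.util.Locale;
import java.util.stream.Stream;

// Hypothetical helper: simplified host-suffix matching, not the real logic.
final class JobHarvestsDomainSketch {
    // Returns true if at least one URL's host is the domain or a subdomain.
    static boolean harvestsDomain(Stream<String> crawledUrls, String domainName) {
        String wanted = domainName.toLowerCase(Locale.ROOT);
        return crawledUrls.anyMatch(url -> hostBelongsTo(url, wanted));
    }

    // Parses the URL and compares its host against the (lowercased) domain.
    static boolean hostBelongsTo(String url, String domain) {
        try {
            String host = URI.create(url).getHost();
            if (host == null) {
                return false; // opaque or host-less URI, e.g. "dns:example.com"
            }
            host = host.toLowerCase(Locale.ROOT);
            return host.equals(domain) || host.endsWith("." + domain);
        } catch (IllegalArgumentException e) {
            return false; // malformed URL in the log
        }
    }
}
```

        The leading dot in the suffix comparison keeps a host such as notexample.com from matching example.com.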