Class ToeThread

  • All Implemented Interfaces:
    Runnable, org.archive.io.SinkHandlerLogThread, org.archive.modules.fetcher.HostResolver, org.archive.modules.ProcessorChain.ChainStatusReceiver, org.archive.util.ProgressStatisticsReporter, org.archive.util.Reporter

    public class ToeThread
    extends Thread
    implements org.archive.util.Reporter, org.archive.util.ProgressStatisticsReporter, org.archive.modules.fetcher.HostResolver, org.archive.io.SinkHandlerLogThread, org.archive.modules.ProcessorChain.ChainStatusReceiver
    One "worker thread"; asks for CrawlURIs, processes them, repeats unless told otherwise.
    Author:
    Gordon Mohr
    • Constructor Detail

      • ToeThread

        public ToeThread(org.archive.crawler.framework.ToePool g,
                         int sn)
        Creates a ToeThread.
        Parameters:
        g - the ToePool this thread belongs to
        sn - serial number
    • Method Detail

      • atProcessor

        public void atProcessor(org.archive.modules.Processor proc)
        Specified by:
        atProcessor in interface org.archive.modules.ProcessorChain.ChainStatusReceiver
      • getSerialNumber

        public int getSerialNumber()
        Specified by:
        getSerialNumber in interface org.archive.io.SinkHandlerLogThread
        Returns:
        The toe thread's serial number.
      • getController

        public org.archive.crawler.framework.CrawlController getController()
        Gets the CrawlController associated with this thread.
        Returns:
        The CrawlController.
      • kill

        protected void kill()
        Terminates this thread on a best-effort basis.

        Calling this method asks the thread to stop processing as soon as possible (note: this may be never). It is meant to 'short circuit' hung threads.

        The current crawl URI will have its fetch status set accordingly and will be returned to the frontier immediately.

        As noted above, this does not guarantee that the thread will ever stop running. Once invoked, however, the thread will no longer try to communicate with other parts of the crawler, and it will terminate as soon as it regains control.
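        The best-effort nature of such a kill can be illustrated with a minimal, self-contained sketch. It is not the Heritrix implementation: KillableWorker, its killed flag, and the published counter are hypothetical stand-ins, and the long sleep stands in for a hung fetch. The sketch assumes a kill is a flag plus an interrupt, and that a killed thread must not touch shared state afterwards.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a worker thread that can be short-circuited. */
public class KillableWorker extends Thread {
    // volatile so the flag set by kill() is visible to the running thread
    private volatile boolean killed = false;
    // Stand-in for shared crawler state the worker would normally update
    private final AtomicInteger published;

    public KillableWorker(AtomicInteger published) {
        this.published = published;
    }

    /** Best effort only: flags the thread and interrupts it; cannot force it to stop. */
    public void kill() {
        killed = true;
        interrupt();
    }

    @Override
    public void run() {
        try {
            Thread.sleep(10_000); // stand-in for a hung blocking operation
        } catch (InterruptedException e) {
            // Control is back with the worker; fall through to the killed check
        }
        if (killed) {
            return; // do not communicate with the rest of the system
        }
        published.incrementAndGet();
    }

    /** Starts a worker, kills it, and returns how many results it published (-1 if still alive). */
    public static int runDemo() {
        AtomicInteger published = new AtomicInteger();
        KillableWorker worker = new KillableWorker(published);
        worker.start();
        worker.kill();
        try {
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.isAlive() ? -1 : published.get();
    }

    public static void main(String[] args) {
        System.out.println("published=" + runDemo());
    }
}
```

        If the blocking call ignored interruption, the worker in this sketch would keep running until the call returned, which mirrors the "this may be never" caveat above.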

      • getStep

        public Object getStep()
        Returns:
        The current step: an abstract indication of where this thread is in its work, for debugging and reporting.
      • isActive

        public boolean isActive()
        Whether this thread is validly processing a URI, as opposed to being paused, waiting for a URI, or interrupted.
        Returns:
        whether thread is actively processing a URI
      • retire

        public void retire()
        Request that this thread retire (exit cleanly) at the earliest opportunity.
      • shouldRetire

        public boolean shouldRetire()
        Whether this thread should cleanly retire at the earliest opportunity.
        Returns:
        true if this thread should retire.
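        The retire()/shouldRetire() pair describes cooperative shutdown: one thread sets a request, and the worker checks it between units of work. The following is a minimal sketch of that pattern under stated assumptions, not the Heritrix code. RetirableWorker, its queue of strings, and processedCount() are hypothetical names introduced for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical worker that exits cleanly once asked to retire. */
public class RetirableWorker extends Thread {
    private final BlockingQueue<String> queue;
    private final AtomicInteger processed = new AtomicInteger();
    // volatile so a retire() call from another thread is seen by the loop
    private volatile boolean shouldRetire = false;

    public RetirableWorker(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    /** Requests a clean exit at the next check of the flag. */
    public void retire() {
        shouldRetire = true;
    }

    public boolean shouldRetire() {
        return shouldRetire;
    }

    public int processedCount() {
        return processed.get();
    }

    @Override
    public void run() {
        // The flag is checked between items, never in the middle of one
        while (!shouldRetire) {
            try {
                String item = queue.poll(50, TimeUnit.MILLISECONDS);
                if (item != null) {
                    processed.incrementAndGet(); // stand-in for real processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Processes two items, retires the worker, and returns the processed count (-1 on failure). */
    public static int runDemo() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("http://example.com/a");
        queue.add("http://example.com/b");
        RetirableWorker worker = new RetirableWorker(queue);
        worker.start();
        try {
            long deadline = System.currentTimeMillis() + 2000;
            while (worker.processedCount() < 2 && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            worker.retire();
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return worker.isAlive() ? -1 : worker.processedCount();
    }

    public static void main(String[] args) {
        System.out.println("processed=" + runDemo());
    }
}
```

        Polling with a timeout, rather than blocking indefinitely, is one way to make sure the flag is rechecked even when no work arrives.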
      • reportTo

        public void reportTo(PrintWriter pw)
        Compiles a report on this thread's status and writes it to the given PrintWriter.
        Specified by:
        reportTo in interface org.archive.util.Reporter
        Parameters:
        pw - Where to print.
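        Because reportTo writes to a PrintWriter instead of returning a value, a caller that wants the report as a String has to supply a writer backed by a buffer. The sketch below shows one way to do that. ReportCapture and its ReportSource interface are hypothetical stand-ins that only mirror the reportTo(PrintWriter) signature; they are not part of Heritrix.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

/** Hypothetical helper that captures PrintWriter-based reports as Strings. */
public class ReportCapture {
    /** Stand-in for any object exposing a reportTo(PrintWriter) method. */
    public interface ReportSource {
        void reportTo(PrintWriter pw);
    }

    /** Runs the report against an in-memory writer and returns the text. */
    public static String capture(ReportSource source) {
        StringWriter buffer = new StringWriter();
        PrintWriter pw = new PrintWriter(buffer);
        source.reportTo(pw);
        pw.flush(); // push any buffered output into the StringWriter
        return buffer.toString();
    }

    public static void main(String[] args) {
        // A fake source, used here in place of a real thread
        ReportSource fake = pw -> pw.print("toe #3 step=fetching");
        System.out.println(capture(fake));
    }
}
```

        Given the signature documented above, a real thread could presumably be passed as a method reference, for example capture(toeThread::reportTo), where toeThread is an assumed variable name.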
      • reportThread

        public static void reportThread(Thread t,
                                        PrintWriter pw)
        Parameters:
        t - Thread
        pw - PrintWriter
      • shortReportMap

        public Map<String,Object> shortReportMap()
        Specified by:
        shortReportMap in interface org.archive.util.Reporter
      • shortReportLineTo

        public void shortReportLineTo(PrintWriter w)
        Specified by:
        shortReportLineTo in interface org.archive.util.Reporter
        Parameters:
        w - PrintWriter to write to.
      • shortReportLegend

        public String shortReportLegend()
        Specified by:
        shortReportLegend in interface org.archive.util.Reporter
      • shortReportLine

        public String shortReportLine()
      • progressStatisticsLine

        public void progressStatisticsLine(PrintWriter writer)
        Specified by:
        progressStatisticsLine in interface org.archive.util.ProgressStatisticsReporter
      • progressStatisticsLegend

        public void progressStatisticsLegend(PrintWriter writer)
        Specified by:
        progressStatisticsLegend in interface org.archive.util.ProgressStatisticsReporter
      • getCurrentProcessorName

        public String getCurrentProcessorName()
        Specified by:
        getCurrentProcessorName in interface org.archive.io.SinkHandlerLogThread
      • resolve

        public InetAddress resolve(String host)
        Specified by:
        resolve in interface org.archive.modules.fetcher.HostResolver