Interface DeduplicateToCDXAdapterInterface

    • Method Summary

      Modifier and Type Method Description
      java.lang.String adaptLine(java.lang.String line)
      Converts a deduplicate line from a crawl log into a line for a CDX file suitable for searching in Wayback.
      void adaptStream(java.io.InputStream is, java.io.OutputStream os)
      Scans a crawl-log input stream and converts every line containing deduplicate information into a CDX record, which it writes to the output stream.
    • Method Detail

      • adaptLine

        java.lang.String adaptLine(java.lang.String line)
        Converts a deduplicate line from a crawl log into a line for a CDX file suitable for searching in Wayback. This method canonicalizes the target URL in the line, using the default canonicalizer configured in the wayback settings.xml file. If the input string is not a crawl-log duplicate line, the method returns null.
        Parameters:
        line - a line from a crawl log
        Returns:
        a line for a CDX file, or null if the input is not a duplicate line
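
        Because adaptLine returns null for any input that is not a duplicate line, callers need a null check before using the result. The following is a minimal sketch of that pattern, not code from the library: LineAdapter is a local stand-in that mirrors only the documented adaptLine signature, and the toy adapter in main (its "duplicate:" marker and its output format) is purely hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in that mirrors only the documented adaptLine signature.
// It is NOT the real DeduplicateToCDXAdapterInterface.
interface LineAdapter {
    String adaptLine(String line);
}

class AdaptLineDemo {
    // Collects the adapted lines for a batch of crawl-log lines and skips
    // every line for which the adapter returns null.
    static List<String> toCdxLines(LineAdapter adapter, List<String> crawlLogLines) {
        List<String> cdxLines = new ArrayList<>();
        for (String line : crawlLogLines) {
            String cdx = adapter.adaptLine(line);
            if (cdx != null) {
                cdxLines.add(cdx);
            }
        }
        return cdxLines;
    }

    public static void main(String[] args) {
        // Hypothetical adapter for illustration only: the marker it looks
        // for and the text it returns are assumptions, not the real formats.
        LineAdapter toy = line -> line.contains("duplicate:") ? "CDX<" + line + ">" : null;
        List<String> result = toCdxLines(toy, Arrays.asList("first duplicate:x", "second plain"));
        System.out.println(result);
    }
}
```

        With a real implementation of the interface, the same null check applies; only the adapter passed to toCdxLines would change.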
      • adaptStream

        void adaptStream(java.io.InputStream is,
                         java.io.OutputStream os)
        Scans a crawl-log input stream and converts every line containing deduplicate information into a CDX record, which it writes to the output stream.
        Parameters:
        is - the input stream
        os - the output stream
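
        One plausible way to build adaptStream on top of adaptLine is a line-by-line loop that writes only the non-null results. The sketch below illustrates that idea under several assumptions the documentation does not state: UTF-8 encoding, one record per line terminated by "\n", streams left open for the caller to close, and I/O failures rethrown as UncheckedIOException. LineBasedStreamAdapter and the toy adapter are hypothetical names, not part of the library.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical base class: subclasses supply adaptLine, and adaptStream
// is derived from it. The real implementation may work differently.
abstract class LineBasedStreamAdapter {
    abstract String adaptLine(String line);

    void adaptStream(InputStream is, OutputStream os) {
        try {
            // Assumption: the crawl log is UTF-8 text, one entry per line.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(is, StandardCharsets.UTF_8));
            BufferedWriter writer = new BufferedWriter(
                    new OutputStreamWriter(os, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                String cdx = adaptLine(line);
                // Lines without deduplicate information yield null and are skipped.
                if (cdx != null) {
                    writer.write(cdx);
                    writer.write('\n');
                }
            }
            // Flush buffered output; closing both streams is left to the caller.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class AdaptStreamDemo {
    public static void main(String[] args) {
        // Toy adapter for illustration only; marker and output are assumptions.
        LineBasedStreamAdapter toy = new LineBasedStreamAdapter() {
            @Override
            String adaptLine(String line) {
                return line.contains("duplicate:") ? "CDX<" + line + ">" : null;
            }
        };
        byte[] log = "first duplicate:x\nsecond plain\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        toy.adaptStream(new ByteArrayInputStream(log), out);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

        Leaving the streams open keeps ownership with the caller, which matters when the output stream is shared with other writers; whether the real implementation closes them is not stated here.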