  1. WebDanica
  2. WEBDAN-281

Simple loadSeeds testing tool


Details

    • New Feature
    • Resolution: Unresolved
    • Minor

    Description

      This is an idea for a simple loadSeeds testing tool that neither checks for duplicates in HBase nor inserts anything into HBase. It does, however, read the blacklists from HBase.

      The steps of the tool are as follows:

      Argument: outlinks file with or without annotations
      Should use PROD webdanica_settings.xml

      pseudo-code:

      for each line in the outlinks file do {
        if line is not an acceptable seed, skip seed
        if seed matches any of the enabled blacklists, skip seed
        print out seed
      }
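
      The loop above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the WebDanica implementation: the names `is_acceptable_seed`, `matches_blacklist` and `filter_seeds` are hypothetical, the URL is assumed to be the first whitespace-separated field of each line (with any annotations after it), an "acceptable" seed is assumed to be an http(s) URL with a host, and the enabled blacklists are assumed to be lists of regular expressions that have already been read from HBase.

```python
import re
from urllib.parse import urlsplit


def is_acceptable_seed(url):
    """Hypothetical check: accept only http(s) URLs that have a host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL, e.g. an unbalanced IPv6 bracket
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def matches_blacklist(url, blacklists):
    """Return True if the URL fully matches any pattern in any blacklist.

    Assumption: each blacklist is a list of compiled regular expressions.
    """
    return any(p.fullmatch(url) for patterns in blacklists for p in patterns)


def filter_seeds(lines, blacklists):
    """Yield the seeds that pass both checks, in input order."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        seed = fields[0]  # assumption: URL first, annotations after it
        if not is_acceptable_seed(seed):
            continue  # not an acceptable seed, skip
        if matches_blacklist(seed, blacklists):
            continue  # matches an enabled blacklist, skip
        yield seed
```

      Printing each value yielded by `filter_seeds` gives the output that is then deduplicated. Nothing is written to HBase, and no duplicate check against HBase takes place.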
      

      Duplicates are then removed from the printed seeds with the Unix command:

      sort | uniq
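
      If the deduplication were done inside the tool instead of in the shell, a minimal equivalent might look like the sketch below. The function name `sort_uniq` is hypothetical. Python orders strings by code point, which should correspond to `sort` under `LC_ALL=C`; other locales may order the seeds differently.

```python
def sort_uniq(seeds):
    """Return the distinct seeds in sorted order, like `sort | uniq`."""
    return sorted(set(seeds))
```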
      


          People

            Unassigned
            svc Søren Vejrup Carlsen (Inactive)
