  1. NetarchiveSuite
  2. NAS-1727

Ingest domain seed URLs


Details

    • New Feature
    • Resolution: Unresolved
    • Major
    • None
    • 3.8
    • None

    Description

      Currently, the Definitions/Create Domain UI allows batch importing of domains from a file. It would be nice to be able to process a list of seed URLs in a similar way, with the following workflow:
      for each seed {
          check that the URL is well formed; if not, log it
          extract the domain
          if the domain is a subdomain (e.g. blah.gibberish.foo) and the root domain is not declared in the TLDs, log it
          if the domain does not exist in the DB, create it
          add the seed to the domain's default seedlist
      }
      Feedback on this proposed workflow is welcome.
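
      To make the proposed workflow easier to review, here is a minimal, language-neutral sketch of it in Python. It is not NetarchiveSuite code: `KNOWN_TLDS`, `registrable_domain`, `ingest_seeds`, and the in-memory `domains` dict are hypothetical stand-ins for the configured TLD list and the domain database. The issue does not say what should happen to a seed after it is logged; the sketch assumes the seed is skipped and reported back to the caller.

```python
import logging
from urllib.parse import urlsplit

log = logging.getLogger("seed_ingest")

# Hypothetical stand-in for the TLD list declared in the installation settings.
KNOWN_TLDS = {"dk", "com", "org", "co.uk"}


def registrable_domain(host, tlds=KNOWN_TLDS):
    """Return '<label>.<known TLD>' for host, or None if no declared TLD matches."""
    labels = host.lower().rstrip(".").split(".")
    # Scan from the longest candidate suffix so that "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in tlds:
            return ".".join(labels[i - 1:])
    return None


def ingest_seeds(seeds, domains):
    """Add each seed to its domain's default seedlist.

    domains is a dict mapping domain name -> list of seeds (stand-in for the DB).
    Returns the seeds that were logged and skipped (an assumption of this sketch).
    """
    rejected = []
    for seed in seeds:
        seed = seed.strip()
        if not seed:
            continue
        # Step 1: check that the URL is well formed; if not, log it.
        try:
            parts = urlsplit(seed)
        except ValueError:
            parts = None
        if parts is None or parts.scheme not in ("http", "https") or not parts.hostname:
            log.warning("Malformed seed URL: %s", seed)
            rejected.append(seed)
            continue
        # Steps 2 and 3: extract the domain; log it if its root is not under a declared TLD.
        domain = registrable_domain(parts.hostname)
        if domain is None:
            log.warning("No declared TLD matches host %s (seed %s)", parts.hostname, seed)
            rejected.append(seed)
            continue
        # Step 4: create the domain if it does not exist yet.
        seedlist = domains.setdefault(domain, [])
        # Step 5: add the seed to the default seedlist, avoiding duplicates.
        if seed not in seedlist:
            seedlist.append(seed)
    return rejected
```

      In this sketch a seed such as `http://www.example.dk/news` ends up in the seedlist of `example.dk`, while `http://blah.gibberish.foo/` is logged and skipped because `foo` is not in the assumed TLD set. Whether subdomain seeds should attach to the root domain, as done here, is one of the points the proposal leaves open.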

          People

            ngiraud Nicolas Giraud (Inactive)
