NetarchiveSuite / NAS-2044

Display of crawl log lines doesn't scale

Details

    • Bug
    • Resolution: Fixed
    • Major
    • I53, 4.0
    • 3.18.0
    • Archive
    • None

      Make a harvest with at least 100 entries in the crawl log. On the job-details page, test the crawl-log filtering with different regexps. When fewer than 100 lines match, the result should be displayed in the browser. When more than 100 lines match, the browser should prompt to save the result to a file. The same behaviour should apply when filtering by seed domain.
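
      The expected behaviour above can be pictured with a small sketch. This is not the NetarchiveSuite code: the function name, the file name, and the choice to treat exactly 100 matches as a download are all assumptions made for illustration, since the description only covers "<100" and ">100".

```python
from typing import Dict, List

# Threshold taken from the test description; the constant name is hypothetical.
INLINE_LIMIT = 100


def response_headers(matched_lines: List[str], limit: int = INLINE_LIMIT) -> Dict[str, str]:
    """Pick HTTP headers that either show the result or trigger a save dialog."""
    headers = {"Content-Type": "text/plain; charset=utf-8"}
    if len(matched_lines) < limit:
        # Small result: let the browser render it directly.
        headers["Content-Disposition"] = "inline"
    else:
        # Large result (assumption: this includes exactly `limit` lines):
        # ask the browser to save it. The file name is made up.
        headers["Content-Disposition"] = 'attachment; filename="crawl-log-matches.txt"'
    return headers
```

      A tester would then expect an inline response for 99 matching lines and a save prompt for 101.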

    Description

      From corresponding NARK issue:

      When we search for .virk\.kb\.dk. in job 140309, the request takes a couple of minutes, returns the start of a seemingly random list of crawl-log lines, and hangs the browser until it crashes.

      Our only alternative is to ask TLR to manually fetch the crawl log from one of the bitarchive servers in order to get the crawl-log lines from the harvesting jobs of virk.dk.

      This is critical for understanding and solving the problem with Erhvervsstyrelsen!

      There may be two related issues: i) a scaling issue that prevents the request from being processed correctly, and ii) a browser issue, because the result is too large to be displayed in a browser. Both need to be addressed.
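
      One way to address the scaling half of the problem is to filter the crawl log as a stream instead of collecting every matching line in memory first. The sketch below only illustrates that idea and is not the fix that was applied. The function names and the sample log lines are hypothetical, and the sample lines are simplified rather than real crawl-log entries.

```python
import io
import re
from typing import IO, Iterator


def filter_crawl_log(lines: IO[str], pattern: str) -> Iterator[str]:
    """Yield matching lines one at a time, so memory use does not grow with log size."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line


def write_matches(source: IO[str], pattern: str, sink: IO[str]) -> int:
    """Copy matching lines from source to sink and return how many were written."""
    count = 0
    for line in filter_crawl_log(source, pattern):
        sink.write(line)
        count += 1
    return count


if __name__ == "__main__":
    # Made-up, simplified log lines used only for this demonstration.
    sample = io.StringIO(
        "200 http://virk.kb.dk/a\n"
        "200 http://example.org/\n"
        "404 http://virk.kb.dk/b\n"
    )
    out = io.StringIO()
    print(write_matches(sample, r"virk\.kb\.dk", out))  # prints 2
```

      Because the sink can be a file or a response stream, the same loop could serve both the in-browser case and the save-to-file case without ever holding the full result.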

          People

            csr Colin Rosenthal