[NAS-2076] TEST1: http://netarkivet.dk/ harvested as part of TEST1 event harvest
Created: 02/Jul/12  Updated: 11/Mar/21

Status: Reopened
Project: NetarchiveSuite
Component/s: Test
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Søren Vejrup Carlsen (Inactive)
Assignee: Søren Vejrup Carlsen (Inactive)
Resolution: Unresolved  
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

TEST1: http://netarkivet.dk/ harvested as part of TEST1 event harvest
This was not supposed to happen, but it does, because the seed http://netarkivet.dk/index-da.php is permanently redirected to http://netarkivet.dk/
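
The redirect problem can be illustrated with a minimal sketch. It assumes a hand-written redirect table rather than live HTTP requests; only the index-da.php entry comes from this issue, and the names PERMANENT_REDIRECTS, resolve and is_front_page are hypothetical helpers, not part of NetarchiveSuite or Heritrix. The second seed is the replacement suggested in the comments and is assumed here not to redirect.

```python
from urllib.parse import urlsplit

# Assumption: a static redirect table standing in for real HTTP responses.
# Only this one entry is taken from the issue description.
PERMANENT_REDIRECTS = {
    "http://netarkivet.dk/index-da.php": "http://netarkivet.dk/",
}


def resolve(url, redirects, max_hops=10):
    """Follow the redirect table until a URL has no further redirect."""
    seen = set()
    while url in redirects and url not in seen and len(seen) < max_hops:
        seen.add(url)
        url = redirects[url]
    return url


def is_front_page(url):
    """True when the URL path is empty or just '/'."""
    return urlsplit(url).path in ("", "/")


# Seeds to check: the original seed and the replacement suggested in the comments.
seeds = [
    "http://netarkivet.dk/index-da.php",
    "http://netarkivet.dk/adgang-for-forskere/",
]

# Seeds that end up at the front page after redirects are followed.
seeds_hitting_front_page = [
    seed for seed in seeds if is_front_page(resolve(seed, PERMANENT_REDIRECTS))
]
```

A check like this on the seed list before defining the event harvest would flag index-da.php as a seed that pulls in the front page.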



 Comments   
Comment by Colin Rosenthal [ 07/Mar/18 ]

In the latest test, the front page is not harvested, but the "adgang" ("access") page is harvested, which contradicts the test description.

Comment by Colin Rosenthal [ 24/Oct/16 ]

It looks like the relevant parts of the crawl log are:

2016-10-24T08:22:03.803Z   200      97184 http://netarkivet.dk/wp-includes/js/jquery/jquery.js?ver=1.12.4 E http://netarkivet.dk/adgang/ application/javascript #035 20161024082202186+162 sha1:A5SSIGDNXPOUYQNPXPLLEYGZ4RVASWAR - duplicate:"3-2-20161019120659383-00000-sb-test-har-001.statsbiblioteket.dk.warc.gz,1528384,20161019120712453"
2016-10-24T08:22:38.067Z   200      28584 http://netarkivet.dk/ EX http://netarkivet.dk/wp-includes/js/jquery/jquery.js?ver=1.12.4 text/html #035 20161024082237539+384 sha1:DOF3QA7RZD7KYGVE7EAEFID3XD7MWF3M - -

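The "X" in the discovery path "EX" means that the link is speculative, i.e. it was guessed from the content of jquery.js rather than taken from an explicit link. One could try again, but without JavaScript extraction.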
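
As a rough sketch of how such entries could be picked out of a crawl log automatically, the snippet below splits each line into the twelve whitespace-separated fields seen in the two lines above and flags entries whose discovery path ends in "X". The field names and the helpers parse_crawl_log_line and is_speculative are illustrative assumptions, not part of NetarchiveSuite or Heritrix.

```python
from typing import NamedTuple


class CrawlLogEntry(NamedTuple):
    # Field names are assumptions based on the column order in the log lines above.
    timestamp: str
    status: int
    size: str
    uri: str
    discovery_path: str
    referrer: str
    mime_type: str
    thread: str
    fetch_time: str
    digest: str
    source_tag: str
    annotations: str


def parse_crawl_log_line(line: str) -> CrawlLogEntry:
    """Split one crawl log line into its twelve whitespace-separated fields."""
    fields = line.split(None, 11)
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}")
    return CrawlLogEntry(fields[0], int(fields[1]), *fields[2:])


def is_speculative(entry: CrawlLogEntry) -> bool:
    """True when the last hop of the discovery path is 'X' (speculative)."""
    return entry.discovery_path.endswith("X")


# The two log lines quoted in this comment.
SAMPLE_LINES = [
    '2016-10-24T08:22:03.803Z   200      97184 http://netarkivet.dk/wp-includes/js/jquery/jquery.js?ver=1.12.4 E http://netarkivet.dk/adgang/ application/javascript #035 20161024082202186+162 sha1:A5SSIGDNXPOUYQNPXPLLEYGZ4RVASWAR - duplicate:"3-2-20161019120659383-00000-sb-test-har-001.statsbiblioteket.dk.warc.gz,1528384,20161019120712453"',
    '2016-10-24T08:22:38.067Z   200      28584 http://netarkivet.dk/ EX http://netarkivet.dk/wp-includes/js/jquery/jquery.js?ver=1.12.4 text/html #035 20161024082237539+384 sha1:DOF3QA7RZD7KYGVE7EAEFID3XD7MWF3M - -',
]

entries = [parse_crawl_log_line(line) for line in SAMPLE_LINES]
speculative_uris = [entry.uri for entry in entries if is_speculative(entry)]
```

Run against the two lines above, only the front page entry is flagged, with jquery.js as its referrer.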

Comment by Søren Vejrup Carlsen (Inactive) [ 05/Sep/12 ]

The harvest still includes more than the front pages.

Comment by Søren Vejrup Carlsen (Inactive) [ 31/Aug/12 ]

The TEST1 subpage https://netarkivet.statsbiblioteket.dk/suite/It11DefineEventHarvest
has been corrected accordingly.

Comment by Søren Vejrup Carlsen (Inactive) [ 02/Jul/12 ]

Suggest replacing http://netarkivet.dk/index-da.php with http://netarkivet.dk/adgang-for-forskere/

Generated at Tue Apr 23 13:31:43 CEST 2024 using Jira 9.4.15#940015-sha1:bdaa9cbecfb6791ea579749728cab771f0dfe90b.