
Overview of the web harvesting history in Denmark.

Broad crawls

| Name | Pause days¹ | Start date | Stop date | Number of days | Number of days per whole harvest | Bytes harvested | Bytes per whole harvest | Documents harvested | Documents per whole harvest | Bytes per document |
|---|---|---|---|---|---|---|---|---|---|---|
| 2005-4-10MB | | 16-12-2005 | 10-02-2006 | 56 | | 1.084.082.767.483 | | 48.862.988 | | |
| 2005-4-500MB | 10 | 20-02-2006 | 30-05-2006 | 99 | 155 | 6.374.278.151.888 | 7.458.360.919.371 | 244.320.699 | 293.183.687 | 25.439 |
| 2006-2-10MB | 9 | 08-06-2006 | 12-07-2006 | 34 | | 1.281.775.851.603 | | 56.939.820 | | |
| 2006-2-1GB | 76 | 26-09-2006 | 18-12-2006 | 83 | 117 | 7.688.670.632.009 | 8.970.446.483.612 | 251.650.664 | 308.590.484 | 29.069 |
| 2007-1-2GB | 67 | 23-02-2007 | Interrupted | | | 6.129.586.061 | | 1.355.573 | | |
| 2007-1a-2GB | 73 | 01-03-2007 | 14-07-2007 | 135 | 135 | 12.496.411.679.528 | 12.496.411.679.528 | 431.565.956 | 431.565.956 | 28.956 |
| 2007-2-10MB | 3 | 17-07-2007 | 29-08-2007 | 43 | | 1.651.476.467.031 | | 72.230.501 | | |
| 2007-2-2GB | 16 | 14-09-2007 | 09-01-2008 | 117 | 160 | 11.116.847.990.114 | 12.768.324.457.145 | 350.645.166 | 422.875.667 | 30.194 |
| 2008-1-10MB | 82 | 31-03-2008 | 19-05-2008 | 49 | | 1.849.483.647.241 | | 78.482.709 | | |
| 2008-1-4GB² | 79 | 05-08-2008 | 27-01-2009 | 174 | 223 | 18.627.533.172.100 | 20.477.016.819.341 | 587.221.620 | 665.704.329 | 30.760 |
| 2009-1-10MB | 30 | 26-02-2009 | 06-03-2009 | 8 | | 2.162.985.398.954 | | 83.279.445 | | |
| 2009-1-4GB | 35 | 10-04-2009 | 06-07-2009 | 87 | 95 | 20.435.171.532.058 | 22.598.156.931.012 | 621.134.022 | 704.413.467 | 32.081 |
| 2009-2-10MB | 14 | 20-07-2009 | 31-07-2009 | 11 | | 2.323.322.214.391 | | 86.228.738 | | |
| 2009-2-6GB | 13 | 13-08-2009 | 21-12-2009 | 130 | 141 | 22.180.425.491.440 | 24.503.747.705.831 | 590.836.809 | 677.065.547 | 36.191 |
| 2010-1-10MB | 44 | 03-02-2010 | 19-02-2010 | 16 | | 2.419.969.931.887 | | 84.727.007 | | |
| 2010-1-6GB | 20 | 11-03-2010 | 25-06-2010 | 106 | 122 | 26.014.450.885.687 | 28.434.420.817.574 | 642.849.219 | 727.576.226 | 39.081 |
| 2010-2a-10MB | 21 | 16-07-2010 | 27-08-2010 | 42 | | 2.436.346.897.279 | | 81.364.820 | | |
| 2010-2a-6GB | 31 | 27-09-2010 | 01-02-2011 | 127 | 169 | 24.025.609.222.449 | 26.461.956.119.728 | 586.338.352 | 667.703.172 | 39.631 |
| 2011-1-10MB³ | 48 | 21-03-2011 | 05-04-2011 | 15 | | 2.568.891.852.563 | | 78.579.431 | | |
| 2011-1-8GB³ | 7 | 12-04-2011 | 05-07-2011 | 84 | 99 | 21.412.357.838.088 | 23.981.249.690.651 | 468.317.276 | 546.896.707 | 43.850 |
| 2011-2-10MB | 44 | 18-08-2011 | 30-08-2011 | 12 | | 2.638.149.069.320 | | 84.468.450 | | |
| 2011-2-8GB | 1 | 31-08-2011 | 17-10-2011 | 47 | 59 | 24.835.281.019.954 | 27.473.430.089.274 | 545.554.046 | 630.022.496 | 43.607 |
| Total: | 655 | | | 1475 | 1475 | | | | | |

| | Number of days | % |
|---|---|---|
| Since start: | 2130 | 100 |
| Pauses: | 655 | 31 |
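The two summary figures appear to follow from the Total row: the days since start are consistent with harvesting days plus pause days, and the pause share is the ratio of pause days to days since start, rounded to a whole percent. This is a reading of the numbers, not a documented definition:

```latex
\[
\underbrace{1475}_{\text{harvest days}} + \underbrace{655}_{\text{pause days}} = 2130,
\qquad
\frac{655}{2130} \times 100\,\% \approx 30.8\,\% \approx 31\,\%
\]
```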

  • ¹ Only the days between harvests are counted, that is, the time from when one harvest finishes until the next harvest begins (interruptions within a harvest are not counted).
  • ² The default per-domain limit was raised from 500 MB to 1 GB, and domains previously limited to 499 MB were raised to 999 MB.
  • ³ The 2011-1 harvest respected robots.txt, the limit for 3.614 domains was lowered from 999 MB to 499 MB, and for the first time files were deleted during the harvest.
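A minimal Python sketch of how the day-count and "per whole harvest" columns can be derived from the table, assuming that "Number of days" and pause days are plain calendar-date differences (stop date minus start date), that a whole harvest is the sum of its two steps, and that bytes per document is rounded to the nearest integer. The helper names `parse_dmy` and `days_between` are hypothetical. For the two steps of the 2005-4 crawl it reproduces the listed 56, 99, 10, 155 and 25.439; under the same convention the 2008-1-4GB row comes out at 175 days (listed: 174) with a 78-day pause (listed: 79), so the original counting may have differed slightly for some rows.

```python
from datetime import date


def parse_dmy(text: str) -> date:
    """Parse a DD-MM-YYYY date as written in the table."""
    day, month, year = (int(part) for part in text.split("-"))
    return date(year, month, day)


def days_between(start: str, stop: str) -> int:
    """Assumed convention: plain date difference, stop minus start."""
    return (parse_dmy(stop) - parse_dmy(start)).days


# The two steps of the 2005-4 broad crawl, copied from the table.
step1 = {"start": "16-12-2005", "stop": "10-02-2006",
         "bytes": 1_084_082_767_483, "docs": 48_862_988}
step2 = {"start": "20-02-2006", "stop": "30-05-2006",
         "bytes": 6_374_278_151_888, "docs": 244_320_699}

days1 = days_between(step1["start"], step1["stop"])  # Number of days, step 1
days2 = days_between(step2["start"], step2["stop"])  # Number of days, step 2
pause = days_between(step1["stop"], step2["start"])  # Pause days before step 2

# "Per whole harvest" columns: sums over both steps of the crawl.
whole_days = days1 + days2
whole_bytes = step1["bytes"] + step2["bytes"]
whole_docs = step1["docs"] + step2["docs"]
# Bytes per document, rounded to the nearest integer (assumed rounding rule).
bytes_per_doc = round(whole_bytes / whole_docs)

print(days1, days2, pause, whole_days)         # 56 99 10 155
print(whole_bytes, whole_docs, bytes_per_doc)  # 7458360919371 293183687 25439
```

Rounding to the nearest integer, rather than truncating, matters for some rows: for 2007-1a-2GB the exact quotient is just below 28.956, and only rounding gives the listed value.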

1 Comment

  1. The Danish Netarchive uses NAS not only for broad crawls but also for selective crawls and event crawls.

    Selective crawls: http://netarkivet.dk/om-netarkivet/selektive-hostninger/

    Event crawls: http://netarkivet.dk/om-netarkivet/begivenhedshostninger/