Details
-
Sub-task
-
Resolution: Fixed
-
Minor
-
None
-
None
Description
We need to define how to identify metadata records in the metadata WARC file.
With the existing metadata arcfile, each kind of metadata (logs/reports/setup/cdx) have their own unique URI:
e.g.:
metadata://netarkivet.dk/crawl/setup/duplicatereductionjobs?majorversion=1&minorversion=0&harvestid=1&harvestnum=2&jobid=3 metadata://netarkivet.dk/crawl/setup/crawl-manifest.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/setup/harvestInfo.xml?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/setup/order.xml?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/setup/seeds.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/reports/arcfiles-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/reports/crawl-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/reports/frontier-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/reports/hosts-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/reports/mimetype-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/reports/processors-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/reports/responsecode-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/reports/seeds-report.txt?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/logs/crawl.log?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/logs/heritrix.out?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/logs/heritrix_dmesg.log?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/logs/local-errors.log?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/logs/progress-statistics.log?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/logs/runtime-errors.log?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/logs/uri-errors.log?heritrixVersion=1.14.4&harvestid=1&jobid=3 metadata://netarkivet.dk/crawl/index/cdx?majorversion=1&minorversion=0&harvestid=1&jobid=3×tamp=20120329112720&serialno=00000
Note that the HeritrixVersion, harvestId, jobId are included within the URI as URL parameters.
Attachments
Issue Links
- mentioned in
-
Page Loading...