From NAS 5.2 onwards, it is possible to harvest RSS feeds using the Crawler RSS (https://github.com/Landsbokasafn/crawlrss) module developed by Kristinn Sigurðsson at the National Library of Iceland. To use the module, the feeds to be harvested must be configured in a special crawler-bean template. At present it is not possible to define the seeds of an RSS harvest directly through the NAS GUI. A sample template suitable for use with NAS can be downloaded from https://raw.githubusercontent.com/netarchivesuite/crawlrss/master/src/main/conf/jobs/CrawlRSS-Sample-Profile/netarkivet-crawlrss.dr.dk.cxml .
The template can be customised by replacing this section
with your own list of feeds to be harvested. Each RSS feed URI has an associated list of implied pages. These can be ordinary HTML landing pages associated with the feed. Harvesting them together with the RSS feed helps ensure a consistent browsing experience in the harvested data.
To use the RSS template, define, for any domain, a configuration with an empty seed list. Strictly speaking, a seed list cannot be completely empty, but it can consist solely of a single comment character "#". Then simply define a harvest configuration that uses the crawlrss template together with the empty seed list.
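For illustration, the complete content of such a placeholder seed list would be the single line shown here. This is a minimal sketch based only on the description above; the list contains nothing but the comment character.

```text
#
```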