...

Code Block
            <list>
                <bean class="is.landsbokasafn.crawler.rss.RssFeed">
                    <property name="uri" value="http://www.dr.dk/nyheder/service/feeds/indland" />  <!-- RSS feed URL -->
                    <property name="impliedPages">
                        <list>
                            <value>https://www.dr.dk/nyheder/</value>
                            <value>http://www.dr.dk/nyheder/allenyheder/indland</value> <!-- Landing Page -->
                        </list>
                    </property>
                </bean>
                <bean class="is.landsbokasafn.crawler.rss.RssFeed">
                    <property name="uri" value="http://www.dr.dk/nyheder/service/feeds/udland" />  <!-- RSS feed URL -->
                    <property name="impliedPages">
                        <list>
                            <value>http://www.dr.dk/nyheder/allenyheder/udland</value>
                        </list>
                    </property>
                </bean>
                <bean class="is.landsbokasafn.crawler.rss.RssFeed">
                    <property name="uri" value="http://www.dr.dk/nyheder/service/feeds/penge" />  <!-- RSS feed URL -->
                    <property name="impliedPages">
                        <list>
                            <value>http://www.dr.dk/nyheder/allenyheder/penge</value>
                        </list>
                    </property>
                </bean>
            </list>

...

To use the rss-template, define a configuration with an empty seed list for any domain. Strictly speaking, a seed list cannot be completely empty, but it can consist solely of a single comment character, "#". Then simply define a harvest configuration that uses the crawlrss template together with the empty seed list.
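
As a minimal sketch of the "empty" seed list described above, the entire content of the seed list would be the single comment character. This assumes only what the text states, namely that "#" starts a comment line; nothing else goes in the list.

```text
#
```

Since the list contains no actual seeds, the harvest would presumably draw its URLs from the RssFeed beans configured in the template rather than from the seed list.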