java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Filter
org.archive.crawler.framework.CrawlScope
org.archive.crawler.deciderules.DecidingScope
dk.netarkivet.harvester.tools.TwitterDecidingScope
public class TwitterDecidingScope
Heritrix CrawlScope that uses the Twitter Search API (https://dev.twitter.com/docs/api/1/get/search) to add seeds to a crawl. The following Twitter search parameters are supported: keywords, a list equivalent to Twitter's "query" text; geo_locations, as defined in the Twitter API; and language, equivalent to Twitter's "lang" parameter. Any of these may be omitted. In practice, only "keywords" works well with the current version of Twitter. In addition, the number of results to be considered is determined by the parameters "pages" and "twitter_results_per_page".
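To illustrate how these settings map onto a single search request, the sketch below builds the query string for one page of results. It is a hypothetical helper, not part of TwitterDecidingScope: the parameter names q, geocode, lang, rpp and page are assumed from the retired v1 Search API, and joining keywords with spaces (which Twitter treated as AND) is likewise an assumption.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of TwitterDecidingScope): builds the query
 * string for one page of a v1-style Twitter search. The parameter names
 * q, geocode, lang, rpp and page are assumptions based on the retired API.
 */
public class TwitterSearchQuery {

    public static String buildQuery(List<String> keywords, String geoLocation,
                                    String language, int resultsPerPage, int page) {
        List<String> params = new ArrayList<>();
        // Assumption: keywords are joined with spaces, i.e. an AND query.
        if (keywords != null && !keywords.isEmpty()) {
            params.add("q=" + encode(String.join(" ", keywords)));
        }
        // Optional parameters are left out entirely when not configured.
        if (geoLocation != null && !geoLocation.isEmpty()) {
            params.add("geocode=" + encode(geoLocation));
        }
        if (language != null && !language.isEmpty()) {
            params.add("lang=" + encode(language));
        }
        // Paging parameters corresponding to "twitter_results_per_page" and "pages".
        params.add("rpp=" + resultsPerPage);
        params.add("page=" + page);
        return String.join("&", params);
    }

    private static String encode(String s) {
        // URL-encode values (spaces become '+', commas become %2C).
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(List.of("netarkivet", "heritrix"), null, "da", 50, 1));
    }
}
```

Running main prints `q=netarkivet+heritrix&lang=da&rpp=50&page=1`; the geo_locations value is omitted because it was not set, mirroring the statement that any parameter may be left out.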
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
---|
org.archive.crawler.settings.ComplexType.MBeanAttributeInfoIterator |
Field Summary | |
---|---|
static java.lang.String |
ATTR_GEOLOCATIONS
Attribute/value pair for the "geo_locations" search parameter. |
static java.lang.String |
ATTR_KEYWORDS
Attribute/value pair for the "keywords" search parameter (the list of search keywords). |
static java.lang.String |
ATTR_LANG
Attribute/value pair for the "language" search parameter. |
static java.lang.String |
ATTR_PAGES
Attribute/value pair for the "pages" parameter (the number of result pages to consider). |
static java.lang.String |
ATTR_QUEUE_KEYWORD_LINKS
Attribute/value pair specifying whether an HTML search for the given keyword(s) should also be queued. |
static java.lang.String |
ATTR_QUEUE_LINKS
Attribute/value pair specifying whether embedded links should be queued. |
static java.lang.String |
ATTR_QUEUE_USER_STATUS
Attribute/value pair specifying whether the status of discovered users should be harvested. |
static java.lang.String |
ATTR_QUEUE_USER_STATUS_LINKS
Attribute/value pair specifying whether all links embedded in a user's status should also be queued. |
static java.lang.String |
ATTR_RESULTS_PER_PAGE
Attribute/value pair for the "twitter_results_per_page" parameter (the number of results per page). |
(package private) static java.util.logging.Logger |
logger
Logger for this class. |
Fields inherited from class org.archive.crawler.deciderules.DecidingScope |
---|
ATTR_DECIDE_RULES |
Fields inherited from class org.archive.crawler.framework.CrawlScope |
---|
ATTR_NAME, ATTR_REREAD_SEEDS_ON_CONFIG, ATTR_SEEDS, DEFAULT_REREAD_SEEDS_ON_CONFIG, seedListeners |
Fields inherited from class org.archive.crawler.framework.Filter |
---|
ATTR_ENABLED |
Fields inherited from class org.archive.crawler.settings.ComplexType |
---|
definition, definitionMap |
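The ATTR_QUEUE_* flags decide which extra URIs are queued for each search hit. The sketch below shows one plausible way such flags could be combined; it is hypothetical and not taken from TwitterDecidingScope. The user-page URL (https://twitter.com/<user>) and the HTML search URL (https://twitter.com/search?q=...) are assumed shapes, the regex link extraction is deliberately crude, and ATTR_QUEUE_USER_STATUS_LINKS is left out because it would require fetching the user's status first.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: chooses which URIs to queue for one search hit,
 * driven by flags analogous to ATTR_QUEUE_LINKS, ATTR_QUEUE_USER_STATUS
 * and ATTR_QUEUE_KEYWORD_LINKS. URL shapes are assumptions.
 */
public class QueueDecision {

    // Crude matcher for http(s) links embedded in tweet text.
    private static final Pattern LINK = Pattern.compile("https?://\\S+");

    public static List<String> urisToQueue(String tweetText, String user, String keywords,
                                           boolean queueLinks, boolean queueUserStatus,
                                           boolean queueKeywordLinks) {
        List<String> uris = new ArrayList<>();
        if (queueLinks) {
            // Queue every link found in the tweet body.
            Matcher m = LINK.matcher(tweetText);
            while (m.find()) {
                uris.add(m.group());
            }
        }
        if (queueUserStatus) {
            // Assumed URL of the discovered user's status page.
            uris.add("https://twitter.com/" + user);
        }
        if (queueKeywordLinks) {
            // Assumed URL of the HTML search for the configured keywords.
            uris.add("https://twitter.com/search?q="
                    + URLEncoder.encode(keywords, StandardCharsets.UTF_8));
        }
        return uris;
    }

    public static void main(String[] args) {
        System.out.println(urisToQueue("see https://example.org/a now", "alice", "heritrix",
                true, true, false));
    }
}
```

Keeping each flag as an independent check means any combination can be enabled without the branches interfering with one another.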
Constructor Summary | |
---|---|
TwitterDecidingScope(java.lang.String name)
Creates a new TwitterDecidingScope with the given name. |
Method Summary | |
---|---|
boolean |
addSeed(org.archive.crawler.datamodel.CandidateURI curi)
Adds a candidate uri as a seed for the crawl. |
void |
initialize(org.archive.crawler.framework.CrawlController controller)
This routine makes any necessary Twitter API calls and queues the content discovered. |
Methods inherited from class org.archive.crawler.deciderules.DecidingScope |
---|
getDecideRule, innerAccepts, kickUpdate |
Methods inherited from class org.archive.crawler.framework.CrawlScope |
---|
addSeedListener, checkClose, getSeedfile, isSameHost, isSeed, listUsedFiles, refreshSeeds, seedsIterator, seedsIterator, toString |
Methods inherited from class org.archive.crawler.framework.Filter |
---|
accepts, getFilterOffPosition, returnTrueIfMatches |
Methods inherited from class org.archive.crawler.settings.ModuleType |
---|
addElement |
Methods inherited from class org.archive.crawler.settings.ComplexType |
---|
addElementToDefinition, checkValue, earlyInitialize, getAbsoluteName, getAttribute, getAttribute, getAttribute, getAttributeInfo, getAttributeInfo, getAttributeInfoIterator, getAttributes, getDataContainerRecursive, getDataContainerRecursive, getDefaultValue, getDescription, getElementFromDefinition, getLegalValues, getLocalAttribute, getMBeanInfo, getMBeanInfo, getParent, getPreservedFields, getSettingsHandler, getUncheckedAttribute, getValue, globalSettings, invoke, isInitialized, isOverridden, iterator, removeElementFromDefinition, setAsOrder, setAttribute, setAttribute, setAttributes, setDescription, setPreservedFields, unsetAttribute |
Methods inherited from class org.archive.crawler.settings.Type |
---|
addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
Methods inherited from class javax.management.Attribute |
---|
getName, hashCode |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
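initialize() is described as making the necessary Twitter API calls and queuing what it discovers, with the amount of work bounded by "pages" and "twitter_results_per_page". The following is a minimal sketch of such a paging loop, not the class's actual code. The page fetcher is a hypothetical stand-in for a real API call (it would already use the results-per-page setting), stopping at the first empty page is an assumption, and in Heritrix each collected URI would presumably be handed to addSeed(...).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntFunction;

/**
 * Hypothetical sketch of a paging loop like the one initialize() might run:
 * request pages 1..pages and collect discovered URIs without duplicates.
 * The fetcher stands in for a real Twitter API call.
 */
public class PagedSeeding {

    public static Set<String> collectSeeds(int pages, IntFunction<List<String>> fetchPage) {
        Set<String> seeds = new LinkedHashSet<>(); // keeps discovery order, drops duplicates
        for (int page = 1; page <= pages; page++) {
            List<String> hits = fetchPage.apply(page);
            if (hits.isEmpty()) {
                break; // assumption: an empty page means there are no further results
            }
            seeds.addAll(hits);
        }
        return seeds;
    }

    public static void main(String[] args) {
        // Fake fetcher: two pages of results, then nothing.
        IntFunction<List<String>> fake = page -> page <= 2
                ? List.of("https://example.org/" + page, "https://example.org/shared")
                : List.<String>of();
        System.out.println(collectSeeds(5, fake));
    }
}
```

Using a LinkedHashSet means a URI found on several pages is seeded only once, while the order of discovery is preserved.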
Field Detail |
---|
static java.util.logging.Logger logger
public static final java.lang.String ATTR_KEYWORDS
public static final java.lang.String ATTR_PAGES
public static final java.lang.String ATTR_RESULTS_PER_PAGE
public static final java.lang.String ATTR_GEOLOCATIONS
public static final java.lang.String ATTR_LANG
public static final java.lang.String ATTR_QUEUE_LINKS
public static final java.lang.String ATTR_QUEUE_USER_STATUS
public static final java.lang.String ATTR_QUEUE_USER_STATUS_LINKS
public static final java.lang.String ATTR_QUEUE_KEYWORD_LINKS
Constructor Detail |
---|
public TwitterDecidingScope(java.lang.String name)
Parameters:
name
- the name of this scope.
Method Detail |
---|
public void initialize(org.archive.crawler.framework.CrawlController controller)
Overrides:
initialize
in class org.archive.crawler.framework.CrawlScope
Parameters:
controller
- The controller for this crawl.
public boolean addSeed(org.archive.crawler.datamodel.CandidateURI curi)
Overrides:
addSeed
in class org.archive.crawler.framework.CrawlScope
Parameters:
curi
- The crawl URI to be added.