Package dk.netarkivet.wayback.aggregator
The Aggregator takes care of sorting the raw index files generated by the indexer and
merging them into larger index files usable by Wayback.
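The sort-and-merge idea can be illustrated with the minimal sketch below. It sorts the lines of each raw index file and then combines the sorted files with a k-way merge over a priority queue. It is not the IndexAggregator implementation: it works on in-memory lists of lines rather than files, the class name SortMergeSketch is hypothetical, and it assumes one index record per line (as in CDX files) compared with plain String ordering, which for ASCII lines matches byte order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: sort raw index "files", then k-way merge them into one sorted index. */
final class SortMergeSketch {

    /** Cursor into one sorted input list, ordered by the line it currently points at. */
    private static final class Cursor implements Comparable<Cursor> {
        final List<String> lines;
        int pos;

        Cursor(List<String> lines) {
            this.lines = lines;
        }

        String current() {
            return lines.get(pos);
        }

        @Override
        public int compareTo(Cursor other) {
            return current().compareTo(other.current());
        }
    }

    /** Sorts each raw input, then merges the sorted inputs into a single sorted list. */
    static List<String> sortAndMerge(List<List<String>> rawFiles) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<String> raw : rawFiles) {
            List<String> sorted = new ArrayList<>(raw);
            Collections.sort(sorted); // String order; equals byte order for ASCII lines
            if (!sorted.isEmpty()) {
                heap.add(new Cursor(sorted));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();       // cursor holding the smallest current line
            merged.add(c.current());
            c.pos++;
            if (c.pos < c.lines.size()) {
                heap.add(c);              // re-insert keyed on its next line
            }
        }
        return merged;
    }
}
```

Because each input is already sorted, the merge needs only O(n log k) comparisons for n lines spread across k files, rather than a full re-sort of everything.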
Once every WaybackSettings.WAYBACK_AGGREGATOR_AGGREGATION_INTERVAL, an index aggregation is run. An aggregation process consists of:
- All new index files found in the WaybackSettings.WAYBACK_AGGREGATOR_OUTPUT_DIR are sorted and merged into a temporary intermediate index file.
- The temporary intermediate index file is merged into the working intermediate index file.
- If the intermediate index file size exceeds WaybackSettings.WAYBACK_AGGREGATOR_MAX_INTERMEDIATE_INDEX_FILE_SIZE, the following sequence occurs:
  - The currently active wayback index file is checked to see whether it can contain the indexes from the new intermediate file without reaching the WaybackSettings.WAYBACK_AGGREGATOR_MAX_MAIN_INDEX_FILE_SIZE limit.
  - If the limit isn't reached, the intermediate index file is merged into the active index file ('wayback.index').
  - Otherwise, the main index file is renamed with a unique name containing the current timestamp, and a new wayback main index file is created. This becomes the active index file, and the intermediate indexes are merged into it.
- The original unsorted index files are deleted.
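The size checks in the steps above can be condensed into a single decision, sketched here under stated assumptions. Sizes are taken in bytes, the two limit parameters stand in for the values of WAYBACK_AGGREGATOR_MAX_INTERMEDIATE_INDEX_FILE_SIZE and WAYBACK_AGGREGATOR_MAX_MAIN_INDEX_FILE_SIZE, and "without reaching the limit" is read as strictly below it. The class AggregationPlanSketch and its Action enum are hypothetical names, not part of the package.

```java
/**
 * Hypothetical sketch of the size checks made in one aggregation run.
 * All sizes are in bytes.
 */
final class AggregationPlanSketch {

    /** What happens to the working intermediate index once new files are merged into it. */
    enum Action {
        /** Intermediate file is within its limit: leave it for a later run. */
        KEEP_INTERMEDIATE,
        /** Active main index has room: merge the intermediate file into 'wayback.index'. */
        MERGE_INTO_ACTIVE,
        /** No room: rename the active file with a timestamp, start a new one, merge into it. */
        ROLL_OVER_THEN_MERGE
    }

    static Action plan(long intermediateSize, long activeMainSize,
                       long maxIntermediateSize, long maxMainSize) {
        // Only an intermediate file that exceeds its limit is moved into the main index.
        if (intermediateSize <= maxIntermediateSize) {
            return Action.KEEP_INTERMEDIATE;
        }
        // Assumption: "without reaching the limit" means strictly below maxMainSize.
        if (activeMainSize + intermediateSize < maxMainSize) {
            return Action.MERGE_INTO_ACTIVE;
        }
        return Action.ROLL_OVER_THEN_MERGE;
    }
}
```

For example, with an intermediate limit of 100 and a main limit of 1000, an intermediate file of 150 bytes is merged into an active file of 500 bytes, but triggers a rollover when the active file is already 900 bytes.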
Class Summary
Class | Description
AggregationWorker | The AggregationWorker singleton contains the schedule and file bookkeeping functionality needed in the aggregation of indexes.
AggregatorApplication | This wrapper class is used to start the AggregationWorker inside Jetty.
IndexAggregator | Encapsulates the functionality for sorting and merging index files.
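A scheduling singleton in the spirit of AggregationWorker could look roughly like the sketch below, which reruns an aggregation task once every interval. It is an assumption-laden illustration, not the actual class: the name AggregationScheduleSketch and its methods are hypothetical, the interval argument stands in for WAYBACK_AGGREGATOR_AGGREGATION_INTERVAL, and it uses java.util.concurrent.ScheduledExecutorService, which may differ from the timer mechanism the real worker uses.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical singleton that runs an aggregation task once every interval. */
final class AggregationScheduleSketch {

    private static AggregationScheduleSketch instance;

    // One daemon thread: runs of the task never overlap and do not keep the JVM alive.
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "index-aggregation");
        t.setDaemon(true);
        return t;
    });

    private ScheduledFuture<?> scheduled;

    private AggregationScheduleSketch() {
    }

    static synchronized AggregationScheduleSketch getInstance() {
        if (instance == null) {
            instance = new AggregationScheduleSketch();
        }
        return instance;
    }

    /**
     * Runs the task now and then every intervalMillis. Note: if the task throws,
     * later runs are suppressed, so a real task should catch its own exceptions.
     */
    synchronized void start(Runnable aggregation, long intervalMillis) {
        if (scheduled == null) {
            scheduled = timer.scheduleAtFixedRate(aggregation, 0, intervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Cancels future runs; a run already in progress is allowed to finish. */
    synchronized void stop() {
        if (scheduled != null) {
            scheduled.cancel(false);
            scheduled = null;
        }
    }
}
```

A single-threaded scheduler is a natural fit here because two aggregation runs touching the same intermediate and main index files at once would conflict.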