Package dk.netarkivet.wayback.aggregator

The Aggregator sorts the raw index files generated by the indexer and merges them into larger index files usable by Wayback. An index aggregation runs once every WaybackSettings.WAYBACK_AGGREGATOR_AGGREGATION_INTERVAL. An aggregation run consists of the following steps:
  • All new index files found in WaybackSettings.WAYBACK_AGGREGATOR_OUTPUT_DIR are sorted and merged into a temporary intermediate index file.
  • The temporary intermediate index file is merged into the working intermediate index file.
  • If the size of the intermediate index file exceeds WaybackSettings.WAYBACK_AGGREGATOR_MAX_INTERMEDIATE_INDEX_FILE_SIZE, the following sequence occurs:
    • The currently active Wayback index file is checked to see whether it can take in the entries from the intermediate index file without reaching the WaybackSettings.WAYBACK_AGGREGATOR_MAX_MAIN_INDEX_FILE_SIZE limit.
    • If the limit would not be reached, the intermediate index file is merged into the active index file ('wayback.index').
    • Otherwise, the main index file is renamed to a unique name containing the current timestamp, and a new Wayback main index file is created. The new file becomes the active index file, and the intermediate index is merged into it.
  • The original unsorted index files are deleted.
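
The steps above can be sketched in Java as follows. This is a minimal illustration, not the package's actual code: the class AggregationSketch, its methods, and the Action enum are hypothetical names, indexes are modelled as in-memory lists of lines rather than files, and all sizes are assumed to be in the same unit. The exact comparisons (strictly greater than the intermediate limit, strictly below the main limit) are an interpretation of "exceeds" and "without reaching" in the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch; these names are not the package's actual classes. */
public class AggregationSketch {

    /** What happens to the intermediate index at the end of a run. */
    public enum Action { KEEP_INTERMEDIATE, MERGE_INTO_ACTIVE, ROLL_OVER_THEN_MERGE }

    /** Collects the lines of all new raw index files and sorts them (temporary intermediate index). */
    public static List<String> sortAndMerge(List<List<String>> rawIndexFiles) {
        List<String> merged = new ArrayList<>();
        for (List<String> file : rawIndexFiles) {
            merged.addAll(file);
        }
        Collections.sort(merged);
        return merged;
    }

    /** Two-way merge of two indexes that are each already sorted. */
    public static List<String> mergeSorted(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            // Take the smaller head element; ties go to the first list.
            out.add(a.get(i).compareTo(b.get(j)) <= 0 ? a.get(i++) : b.get(j++));
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    /** Size-based decision; the limits stand in for the two MAX_..._FILE_SIZE settings. */
    public static Action decide(long intermediateSize, long activeSize,
                                long maxIntermediateSize, long maxMainSize) {
        if (intermediateSize <= maxIntermediateSize) {
            return Action.KEEP_INTERMEDIATE;      // limit not exceeded: keep accumulating
        }
        if (activeSize + intermediateSize < maxMainSize) {
            return Action.MERGE_INTO_ACTIVE;      // fits into the active 'wayback.index'
        }
        return Action.ROLL_OVER_THEN_MERGE;       // rename the main index, start a new one
    }

    public static void main(String[] args) {
        List<String> temp = sortAndMerge(Arrays.asList(
                Arrays.asList("b.example/", "a.example/"),
                Arrays.asList("c.example/")));
        List<String> working = mergeSorted(Arrays.asList("aa.example/", "d.example/"), temp);
        System.out.println(working);
        System.out.println(decide(150, 900, 100, 1000));
    }
}
```

In this sketch a rollover only changes which file receives the merge; the intermediate index is merged into the active index in both of the last two cases.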