Update Tracking

This document describes how the backend services should use views to detect changes in records.

Motivation

Given that we want to work on Views of data, we want to be able to monitor when an object View has changed. We say that an object View has changed if any of the objects in the View has changed. We use the term Record to denote the set of objects in a View.

States in Fedora get special treatment for Records. A Record is considered Active if all objects in the record are Active. A Record is considered Deleted if the entry object is Deleted. In all other combinations the Record is considered Inactive.
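
The state rule above can be sketched in a few lines. This is a minimal illustration, not the DOMS implementation: the names `State` and `record_state` are hypothetical, and the one-letter codes assume Fedora's usual A/I/D object states.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "A"
    INACTIVE = "I"
    DELETED = "D"


def record_state(entry_pid, object_states):
    """Derive the state of a Record.

    object_states maps every pid in the Record (entry object included)
    to its State. Both names are hypothetical, chosen for illustration.
    """
    # A Record is Deleted if its entry object is Deleted.
    if object_states[entry_pid] is State.DELETED:
        return State.DELETED
    # A Record is Active only if every object in it is Active.
    if all(s is State.ACTIVE for s in object_states.values()):
        return State.ACTIVE
    # Every other combination counts as Inactive.
    return State.INACTIVE
```

Note that a Deleted data object inside an otherwise Active Record yields Inactive, not Deleted, since only the entry object decides deletion.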

We want to be able to return all Records in a given Collection for a given State that have been modified after a given Time. To do this, we maintain a database of Views that is updated on all changes of an object.

Finding Changed objects

To find changed records we will ask for a set of entry objects matching the following criteria:

  • collectionPid: The id of the collection containing the objects
  • state: The state of the entry object
  • viewAngle: The viewangle which the entry objects must be entries in
  • offset: Disregard any entries not modified after this timestamp
  • limit: The number of entry objects to return

We will then get a list of records, sorted by timestamp, with these fields:

  • entryPid: The pid of the entry object
  • timestamp: The timestamp of the last change to this record
  • collectionPid: The collection pid, as given in the parameter. This is not currently used for anything
  • contentModelPid: The content model which denoted this object as an entry object for the given view. This is not currently used for anything

For each entry pid, we can then construct the Record as it looked at the given timestamp by using the method CalculateView.
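
As a rough sketch of this query, the filter and sort can be expressed over an in-memory list of rows. The names `EntryRow` and `changed_entries` are hypothetical, timestamps are plain integers for simplicity, and ascending sort order is an assumption (the text only says "sorted").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntryRow:
    entry_pid: str
    view_angle: str
    state: str
    timestamp: int
    collection_pid: str
    content_model_pid: str


def changed_entries(rows, collection_pid, state, view_angle, offset, limit):
    """Return up to `limit` rows modified strictly after `offset`."""
    matches = [
        r for r in rows
        if r.collection_pid == collection_pid
        and r.state == state
        and r.view_angle == view_angle
        and r.timestamp > offset  # "not modified after" rows are dropped
    ]
    # Oldest change first (assumed order).
    matches.sort(key=lambda r: r.timestamp)
    return matches[:limit]
```

A caller that pages through the result would pass the timestamp of the last row it received as the next `offset`.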

Maintaining state

Whenever one of the objects in a Record is changed, the whole Record counts as updated. As such, any services that subscribe to the Repository in any way need to be notified. If there is a search index for the Records and a Record is updated, its entry in the index must be recomputed.

The difficulty arises when trying to do this. The View system is designed to make it easy to compute a Record when the Entry object is known. Here we need the reverse: finding the Records, i.e. the Entry objects, that have a given data object in their View. Rather than encoding this information in the model, we chose to keep an external record of all the views.

The external record will be a database. It will have two tables.

The first table, ENTRIES, will have these columns

  • entryPid: This is the pid of the entry object
  • viewAngle: This is the viewangle
  • state: This is the fedora object state
  • dateForChange: This is the timestamp when this row was created
  • collectionPid: This is the collection this entry object is part of
  • contentModelPid: This is the content model that marked this object as an entry object in this view angle. It is not used in any of the queries but must be part of the return value, although nothing currently uses it

EntryPid, viewAngle and state form a unique key. Note that this means that for each (entryPid, viewAngle, state) triplet there can be only one (dateForChange, collectionPid, contentModelPid) triplet. As such, we presently require that an object is part of only ONE collection, and that only one content model denotes the object as an entry for each viewAngle (one content model can denote the object as an entry for multiple viewAngles, but two content models cannot both denote the object as an entry for a given viewAngle).

The second table, OBJECTS, will have these columns

  • objectPid: The pid of this object
  • entryPid: The pid of the entry object that includes this object
  • viewAngle: The name of the view angle by which the entry object includes this object

As such, this table contains part of the unique key to the ENTRIES table. 
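
One possible shape for the two tables, sketched with SQLite purely for illustration. The column types, the uniqueness constraint on OBJECTS, and the helper names `open_tracker_db` and `entries_containing` are assumptions, not part of the design above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ENTRIES (
    entryPid        TEXT    NOT NULL,
    viewAngle       TEXT    NOT NULL,
    state           TEXT    NOT NULL,
    dateForChange   INTEGER NOT NULL,
    collectionPid   TEXT    NOT NULL,
    contentModelPid TEXT    NOT NULL,
    UNIQUE (entryPid, viewAngle, state)
);
CREATE TABLE OBJECTS (
    objectPid TEXT NOT NULL,
    entryPid  TEXT NOT NULL,
    viewAngle TEXT NOT NULL,
    UNIQUE (objectPid, entryPid, viewAngle)  -- assumed, not stated in the text
);
"""


def open_tracker_db(path=":memory:"):
    """Create the two tables in a fresh SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def entries_containing(conn, object_pid):
    """Reverse lookup: which entryPid/viewAngle pairs include this object?"""
    cur = conn.execute(
        "SELECT entryPid, viewAngle FROM OBJECTS "
        "WHERE objectPid = ? ORDER BY entryPid, viewAngle",
        (object_pid,),
    )
    return cur.fetchall()
```

The `UNIQUE (entryPid, viewAngle, state)` constraint is what enforces the one-collection, one-content-model restriction: a second row for the same triplet is rejected.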

Changing an object and marking the view as updated

Basically, we need three kinds of operations to handle updates:

  • We need to update the time for when a bundle was last updated. We'll call this "updateTimestamps"
  • We need to update which bundles in which states exist. We'll call this "modifyState"
  • We need to update which objects are part of the view. We'll call this "recalculateView"

There are a fixed number of operations that can be done on objects in DOMS.

For each of these, this is what should be done on the index as a result:

  1. Object Created: The Object was created in DOMS
    Fedora operations: 
    - ingest
    Action:

      modifyState()
      recalculateView()
      updateTimestamps()

     

  2. Object Deleted: The Object was purged from DOMS
    Fedora operations:
    - purgeObject
    Action:

      modifyState('D')
      updateTimestamps()
      if content model
        for all objects of this class
          recalculateView()
          updateTimestamps()

     

  3. Object State Changed: The Object changed state in DOMS
    Fedora operations:
    - modifyObject
    Action:

      modifyState()
      updateTimestamps()

     

  4. Datastream Changed: The Object datastreams changed. Handled differently depending on whether this is the relations datastream
    Fedora operations:
    - addDatastream
    - modifyDatastreamByReference
    - modifyDatastreamByValue
    - purgeDatastream
    - setDatastreamState
    - setDatastreamVersionable
    updateTimestamp
    Action:

      if RELS-EXT
        recalculateView()
      fi
      updateTimestamps()
      if VIEW and Content Model
        for all objects of this class
          recalculateView()
          updateTimestamps()
      fi
    
  5. Object Relations Changed: The Object changed in a fashion that DOES require the view to be recomputed.
    Fedora operations:
    - addRelationship
    - purgeRelationship
    Action:

      recalculateView()
      updateTimestamps()
      if content model
        for all objects of this class
          recalculateView()
          updateTimestamps()
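
The mapping from Fedora operation to tracker actions can be summarised as a small lookup. This is only a sketch: the function name `actions_for` and its flag are hypothetical, and the content model fan-out ("for all objects of this class") is deliberately left out.

```python
DATASTREAM_OPS = {
    "addDatastream",
    "modifyDatastreamByReference",
    "modifyDatastreamByValue",
    "purgeDatastream",
    "setDatastreamState",
    "setDatastreamVersionable",
}
RELATION_OPS = {"addRelationship", "purgeRelationship"}


def actions_for(operation, is_relations_datastream=False):
    """Return the tracker actions, in order, for one Fedora operation.

    The extra recalculation for every object of a changed content model
    is not modelled here.
    """
    if operation == "ingest":
        return ["modifyState", "recalculateView", "updateTimestamps"]
    if operation == "purgeObject":
        return ["modifyState", "updateTimestamps"]
    if operation == "modifyObject":
        return ["modifyState", "updateTimestamps"]
    if operation in DATASTREAM_OPS:
        # Only a change to RELS-EXT can alter which objects are in a view.
        if is_relations_datastream:
            return ["recalculateView", "updateTimestamps"]
        return ["updateTimestamps"]
    if operation in RELATION_OPS:
        return ["recalculateView", "updateTimestamps"]
    raise ValueError("unknown Fedora operation: " + operation)
```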


Each of these operations is elaborated below.

modifyState

When an object's state changes, we will have to update or add a row in the ENTRIES table.

If the object was not previously known in OBJECTS and if the object is an Entry object

  • Create row in ENTRIES and OBJECTS denoting this Record

If this object became Active

  • For each Record (ENTRIES row) containing this object (found via the entryPid and viewAngle of its OBJECTS rows),
    • if all objects in the Record are now Active and the Record was not already Active
      • addsert (add, or update if already present) a row in ENTRIES with state Active

If this Object became Deleted

  • If this object is an Entry object
    • Delete all rows with this entry pid from OBJECTS
    • Remove all rows with this entry pid and state=Active from ENTRIES
    • Change the state to Deleted for all rows in ENTRIES with this entryPid and state=Inactive, and update their timestamp to now.
  • Otherwise
    • Delete all rows with objectPid=this from OBJECTS
    • For each of these entryPid/viewAngle pairs
      • remove old view from OBJECTS
      • recalculate view of entryPid/viewAngle
      • update OBJECTS

Otherwise (the object became Inactive): do nothing here; this case is covered by updateTimestamps.

 

modifyState(pid, date, state) {

    // TODO: we cannot get this if the object is deleted
    List<viewAngle,cmpid,collection,isEntry> viewEntries = doms.getViewInfo(pid, date);

    // If this object was previously unknown and is an entry object, add it as a new entry object
    if (!OBJECTS.contains(pid)) {
        foreach (viewEntry : viewEntries) {
            if (viewEntry.isEntry) {
                ENTRIES.add(pid, date, state, viewEntry.viewAngle, viewEntry.cmpid, viewEntry.collection);
                OBJECTS.add(pid, pid, viewEntry.viewAngle);
            }
        }
    }

    if (state == 'A') {
        // Find the OBJECTS rows that regard this object.
        // There will be one per entry/viewAngle combination
        domsObjects = OBJECTS.list(objectPid=pid);

        // Find all Entries that include this object
        foreach (domsObject : domsObjects) {
            entries = ENTRIES.list(entryPid=domsObject.entryPid, viewAngle=domsObject.viewAngle);
            foreach (entry : entries) {
                oldState = entry.state;
                if (oldState != 'A') {
                    newState = calculateState(entry.entryPid, pid, date, state); // TODO: missing
                    // If the Record is now Active, remove any Deleted entries
                    if (newState == 'A') {
                        ENTRIES.remove(entryPid=entry.entryPid, state='D');
                        ENTRIES.addsert(entry.entryPid, date, 'A', entry.viewAngle, entry.cmpid, entry.collection);
                    }
                }
            }
        }
    }

    if (state == 'D') {
        foreach (viewEntry : viewEntries) {
            if (viewEntry.isEntry) {
                OBJECTS.removeAll(entryPid=pid);
                ENTRIES.removeAll(entryPid=pid, viewAngle=viewEntry.viewAngle, state='A');
                ENTRIES.add(pid, date, 'D', viewEntry.viewAngle, viewEntry.cmpid, viewEntry.collection);
            }
        }
        domsObjects = OBJECTS.list(objectPid=pid);

        // Find all Entries that include this object and recalculate their views
        foreach (domsObject : domsObjects) {
            entryPid = domsObject.entryPid;
            OBJECTS.remove(domsObject);
            recalculateView(entryPid, date);
        }
    }
}

Update Timestamps

For each entryPid/ViewAngle in OBJECTS(objectPid):

  • for each record in ENTRIES(entryPid,viewAngle)
    • switch record.state
      • Inactive: update timestamp
      • Deleted: do nothing
      • Active: if entryPid is currently Active: update timestamp
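
The switch above can be sketched over simple in-memory stand-ins for the two tables. All names here are hypothetical: `objects` is a set of (objectPid, entryPid, viewAngle) tuples, `entries` maps (entryPid, viewAngle, state) to a timestamp, and `is_currently_active` is a caller-supplied predicate for the "currently Active" check.

```python
def update_timestamps(objects, entries, object_pid, now, is_currently_active):
    """Bump timestamps for every Record that contains object_pid."""
    for obj_pid, entry_pid, view_angle in objects:
        if obj_pid != object_pid:
            continue
        for state in ("A", "I", "D"):
            key = (entry_pid, view_angle, state)
            if key not in entries:
                continue
            if state == "I":
                # Inactive rows are always updated.
                entries[key] = now
            elif state == "A" and is_currently_active(entry_pid, view_angle):
                # Active rows are updated only while the entry is Active.
                entries[key] = now
            # Deleted rows are left untouched.
```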

Recalculate View

An object's relations changed. This could change which objects are in which entry's views. 

If this Object has now become an Entry Object for any ViewAngle

  • insert row in ENTRIES
  • insert row in OBJECTS

If this object has now ceased to be an Entry Object for any ViewAngle

  • remove row from ENTRIES
  • remove row from OBJECTS

If this object is an Entry object for any viewAngle

  • update collection field in ENTRIES

We now know that the list of entryPid/viewAngle/state rows in ENTRIES is correct. The only way an object can enter or leave a view is if some objects in the view have their relations (or state) changed (this is not entirely true, but we disregard that). We can find all views that this object belongs to by querying OBJECTS.

For each entryPid/viewAngle this object is part of (query OBJECTS):

  • remove old view from OBJECTS
  • recalculate view of entryPid/viewAngle
  • update OBJECTS
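
The three steps above can be sketched against an in-memory stand-in for OBJECTS. The names are hypothetical: `objects` is a set of (objectPid, entryPid, viewAngle) tuples, and `calculate_view` stands in for the CalculateView method, returning the pids currently in the view of an entry.

```python
def refresh_views(objects, changed_pid, calculate_view):
    """Recompute every view that currently contains changed_pid.

    Returns the set of (entryPid, viewAngle) pairs that were recomputed.
    """
    # Query OBJECTS: which views is this object part of?
    affected = {(e, v) for (o, e, v) in objects if o == changed_pid}
    for entry_pid, view_angle in affected:
        # Remove the old view from OBJECTS.
        stale = {r for r in objects if r[1] == entry_pid and r[2] == view_angle}
        objects.difference_update(stale)
        # Recalculate the view and write it back.
        for pid in calculate_view(entry_pid, view_angle):
            objects.add((pid, entry_pid, view_angle))
    return affected
```

Because the lookup starts from OBJECTS, an object that was not previously in a view is never found this way, which is the limitation behind case 4 in the list that follows.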

There are a number of cases that are worth discussing now:

  1. A view relation is added, meaning that some other object will now be included in the view. We recalculate the view and update OBJECTS, so this will be noticed.
  2. A non-view relation is added, meaning that the view will not change. We recalculate the view and update OBJECTS, but this will be a no-change.
  3. A view relation is removed, meaning that the view will now contain one object less. We recalculate the view and update OBJECTS, so this will be noticed.
  4. A non-view relation is added, pointing to an object in another view which has this relation as an inverseViewRelation, meaning that this object will now be part of that other view. This will not currently be noticed, which is a problem.
  5. A content model relation is added. If this content model makes the object an Entry object, this will be noticed. If this object was already part of a view, the view will be recalculated, and thus the change will be noticed. If the object was not part of a view and the change did not make it an entry, it should not be noticed.
  6. A collection relation is added. If this object is or becomes an entry, this will be noticed. Otherwise, it should not be noticed. 

Other kinds of changes we need to handle


Implementation considerations

 

  • For performance reasons, it might be a VERY GOOD IDEA to cache some of the content model lookups, as recalculateView could otherwise be far too slow.
  • "if entryPid is currently Active" can be calculated by checking whether the entry pid in ENTRIES has an Active row with the same timestamp as the Inactive row. If the Inactive row has a higher timestamp, the entry has been set Inactive.
  • Update Timestamps in particular will have to complete in milliseconds if this design is to work. Otherwise the DOMS operations will be faster, and with the current load the update tracker would fall further and further behind. Whereas DOMS can be multithreaded, the update tracker will have to complete operations sequentially or implement very advanced locking.
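
The "currently Active" check from the second point might look as follows. This is a sketch under assumptions: `entries` is a hypothetical mapping from (entryPid, viewAngle, state) to a timestamp, and the case where no Inactive row exists is not covered by the text, so treating it as Active is a guess.

```python
def is_currently_active(entries, entry_pid, view_angle):
    """Compare the Active and Inactive rows of one entry."""
    active = entries.get((entry_pid, view_angle, "A"))
    inactive = entries.get((entry_pid, view_angle, "I"))
    if active is None:
        # No Active row at all: the Record cannot be Active.
        return False
    if inactive is None:
        # Assumption: with no Inactive row, the Active row stands.
        return True
    # An Inactive row newer than the Active row means the entry
    # has since been set Inactive.
    return inactive <= active
```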


Purging a content model is meaningful. Marking one as Deleted does not matter, since content model states are not taken into account.

A content model table, used to quickly check whether an object is a content model and what it is an entry for, could speed things up a lot.
