Friday, January 20, 2012

On the Oracle SCN flaw

This is an interesting problem with some complicated solutions.  Since the problem really is unpredictable growth in a very large number, with cross-infection between critical systems, there seem to be some fairly straightforward solutions.

The first is what Oracle have done, which is to patch and inoculate recent versions.  Another would be to raise the ceiling of the soft limit, which again they have done.  Another would be to expand the hard limit by adding a second number as a multiplier, which would allow the hard-limit clock to roll over and turn the SCN into a much larger number.  This reduces the possibility of hitting the hard limit while remediation is made.
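To make the multiplier idea concrete, here is a minimal sketch in Python.  It is only an illustration of the arithmetic, not how Oracle stores or extends the SCN: the `ExtendedSCN` class, the `rollovers` field and the `hard_limit` parameter are all names I have made up, and the limit used in the example is tiny on purpose.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ExtendedSCN:
    """Hypothetical SCN made of a rollover count plus the original counter."""

    rollovers: int  # assumed second number acting as the multiplier
    base: int       # the existing counter, always below hard_limit

    def increment(self, hard_limit: int) -> "ExtendedSCN":
        """Advance by one, rolling the base over instead of hitting the limit."""
        if self.base + 1 >= hard_limit:
            return ExtendedSCN(self.rollovers + 1, 0)
        return ExtendedSCN(self.rollovers, self.base + 1)

    def as_int(self, hard_limit: int) -> int:
        """Flatten the pair into one much larger number."""
        return self.rollovers * hard_limit + self.base


# Illustrative only: a toy hard limit of 10 so the rollover is easy to see.
scn = ExtendedSCN(rollovers=0, base=9)
rolled = scn.increment(hard_limit=10)
```

Because the rollover count is compared before the base, a rolled-over value still sorts after every value that came before it, so ordering is preserved across the rollover.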

The last solution would be to remove the hard synchronisation requirement between databases in large interconnected data centers and instead simply have a synchronisation table for each connected database.  This way there is a stupidity buffer between the instance's SCN (which it only increments) and the SCN of any other instance.  If the two DBs need to keep in step, then they keep a step difference in a table and do some math as needed.  This way the only thing that increments a db's SCN is the actual transactions that are happening in that db.

So even if one db is poisoned with a large SCN, or older systems get in a tangle with low soft ceilings during the patching, there is no propagation of SCNs through the interconnects.  It's just table data that a dba can get in and edit to correct.  Then reconnect the sane dbs and get on with the business.

These interconnect SCN offsets will probably have to be tracked in the logs and reconciled where needed.  The point is to stop it being a hard requirement and allow the dbas to set and correct as they need.
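The offset table idea can be sketched roughly as follows.  This is a toy model in Python and not anything Oracle provides: the `Instance` class and its methods (`commit`, `sync_with`, `to_peer_scn`, `correct_offset`) are hypothetical names chosen for illustration, and the "table" is just a dictionary.

```python
class Instance:
    """Toy database instance whose SCN only moves on its own transactions."""

    def __init__(self, name: str, scn: int = 0) -> None:
        self.name = name
        self.scn = scn
        self.peer_offsets: dict[str, int] = {}  # stand-in for the sync table

    def commit(self) -> int:
        """A local transaction is the only thing that increments the SCN."""
        self.scn += 1
        return self.scn

    def sync_with(self, peer_name: str, peer_scn: int) -> None:
        """Record the step difference instead of adopting the peer's SCN."""
        self.peer_offsets[peer_name] = peer_scn - self.scn

    def to_peer_scn(self, peer_name: str, local_scn: int) -> int:
        """Do the math to express a local SCN in the peer's terms."""
        return local_scn + self.peer_offsets[peer_name]

    def correct_offset(self, peer_name: str, offset: int) -> None:
        """What a dba would do by hand: edit the table row for one peer."""
        self.peer_offsets[peer_name] = offset


a = Instance("a", scn=100)
a.sync_with("b", peer_scn=250)   # healthy peer: offset of 150 is recorded
a.commit()                       # local SCN moves to 101
a.sync_with("b", peer_scn=10**14)  # poisoned peer: only the offset changes
scn_after_poison = a.scn
a.correct_offset("b", 150)       # dba puts the table row back
```

The poisoned value only ever lands in `peer_offsets`, so the local SCN is untouched and the fix is a one-row edit.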

The biggest problem is just how fundamental this part of the architecture is.  Any changes will take a massive amount of testing and care on Oracle's part.
