Whether you want to build the software, run it, grow the community or just learn more about it, there will be content, workshops and design sessions for you to attend at the OpenStack Summit, Oct 15-18 in San Diego. Stick around on Friday for the first OpenStack service day, a half-day beach cleanup.
This session will include the following subject(s):
Coordinate all the State!:
One of the design tenets of OpenStack is "Accept eventual consistency and use it where it is appropriate". This is clearly necessary in a distributed system such as OpenStack. However, "accepting it" does not make it work. OpenStack (nova in particular) relies mainly on the database for coordination, and it does not use features such as foreign keys that might help with that. Using a message queue (MQ) is also a good idea, but combining an MQ with a DB, and thus having two state "channels", does not make the problem easier (for example, the nova-compute service).
Moving services out into separate projects is generally a good idea, but it only makes this problem worse.
I think it is time to talk about state coordination and about guidelines developers could use to increase the consistency of the system's internal state. Just adding "#TODO: Sometimes, strange things happen here. Might be a concurrency bug" does not solve the problem.
This problem is hard, but we should start solving it.
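One guideline along these lines, sketched here under stated assumptions, is to make every state transition a conditional update: the database accepts the write only if the instance is still in the state the caller expects, so no other worker can slip in between the check and the write. The sketch uses Python's built-in `sqlite3` module and a hypothetical, simplified `instances` table with `vm_state` and `task_state` columns; the table and the `transition` helper are illustrative only and are not nova's schema or API.

```python
import sqlite3

# Hypothetical, simplified table; it is NOT nova's real schema.
SCHEMA = """
CREATE TABLE instances (
    uuid       TEXT PRIMARY KEY,
    vm_state   TEXT NOT NULL,  -- stable state, e.g. 'active'
    task_state TEXT            -- transitional state, NULL when idle
)
"""


def transition(conn, uuid, expected_task_state, new_task_state, new_vm_state=None):
    """Set task_state (and optionally vm_state) only if the instance's
    task_state is still expected_task_state. Check and write happen in a
    single UPDATE statement. Returns True if this caller made the change."""
    cur = conn.execute(
        "UPDATE instances "
        "SET task_state = ?, vm_state = COALESCE(?, vm_state) "
        "WHERE uuid = ? AND task_state IS ?",  # IS also matches NULL
        (new_task_state, new_vm_state, uuid, expected_task_state),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO instances VALUES ('vm-1', 'active', NULL)")
conn.commit()

# Simulate two workers racing to reboot the same idle instance
# (sequential here for simplicity): only the first one wins.
first = transition(conn, "vm-1", None, "rebooting")
second = transition(conn, "vm-1", None, "rebooting")
print(first, second)  # True False

# The winner finishes the task and returns the instance to a stable state.
done = transition(conn, "vm-1", "rebooting", None, "active")
```

A caller that gets `False` has lost the race and should re-read the state instead of proceeding. This keeps the database as the single authority for each transition, even if a request arrives late or twice over the message queue.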
Recovery of instances from inconsistent state:
If an OpenStack service goes down (or is already down) while a request is being processed, the corresponding instance can, in various scenarios, remain in an inconsistent, transitional state (one of the '-ing' states). Only a limited set of operations is possible on such instances, which mostly leaves them unusable. These instances also keep consuming resources without doing any useful work. Therefore, they need to be identified and put into a stable state, and their associated resources need to be released if they are no longer required.
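One possible shape for the "identify, then stabilize" steps is a periodic sweep, shown here as a minimal sketch. It assumes a hypothetical `Instance` record with a stable `vm_state`, a transitional `task_state`, and an `updated_at` timestamp, plus an arbitrary 30-minute timeout. The names, the timeout, and the choice of `"error"` as the stable state are illustrative assumptions, not how nova actually behaves. The sweep only resets instances and reports them; releasing their resources is left to the caller.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical threshold: how long a task may run before it counts as stuck.
STUCK_TIMEOUT = timedelta(minutes=30)


@dataclass
class Instance:
    uuid: str
    vm_state: str              # stable state, e.g. "active" or "error"
    task_state: Optional[str]  # transitional "-ing" state, None when idle
    updated_at: datetime


def recover_stuck(instances: List[Instance], now: datetime,
                  timeout: timedelta = STUCK_TIMEOUT) -> List[str]:
    """Put instances that have sat in a transitional state longer than
    `timeout` into a stable state and return their UUIDs, so the caller
    can decide whether their resources should be released."""
    recovered = []
    for inst in instances:
        if inst.task_state is None or now - inst.updated_at <= timeout:
            continue  # idle, or the task may still legitimately be running
        # Hypothetical policy: mark as "error" and clear the task so the
        # instance can be deleted or rebuilt instead of staying unusable.
        inst.vm_state = "error"
        inst.task_state = None
        inst.updated_at = now
        recovered.append(inst.uuid)
    return recovered


now = datetime(2012, 10, 15, 12, 0, tzinfo=timezone.utc)
fleet = [
    Instance("a", "active", None, now - timedelta(hours=2)),           # idle
    Instance("b", "active", "rebooting", now - timedelta(hours=1)),    # stuck
    Instance("c", "active", "rebooting", now - timedelta(minutes=5)),  # running
]
recovered = recover_stuck(fleet, now)
print(recovered)  # ['b']
```

A time-based check is a heuristic: a task that is merely slow is flagged too, so the timeout has to exceed the slowest legitimate operation.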