There has been buzz around the CAP theorem and its validity recently, so it came up in the Data Management in the Cloud class I'm taking at UC Santa Cruz this quarter. I was asked to talk about it in one of the lectures, and here's the essence of the discussion.
CAP is a good theoretical way of looking at systems, and it definitely makes sense, but it's limiting. After reading Gilbert and Lynch's proof of the CAP theorem, I was left with questions about how to map it to a practical system. It's a good theoretical paper, but it did not tell me much about how to build systems or how to trade off between C, A and P. From a practical standpoint, the choice really is between CA and AP. A CP system does not make sense. It's like saying, "I will give you a consistent answer whenever I give one, but I don't guarantee an answer." There is little practical use for such a system. So, essentially, the choice comes down to this: either the system tries to give you consistency all the time and, when there is a failure, gives up availability, which means it does not respond to requests; or it gives you availability even in case of failures, in which case you can't ask for consistency.
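To make the two options concrete, here is a minimal Python sketch of a single replica that is cut off from the rest of the system. Everything in it (the `Replica` and `Unavailable` classes, the `prefer_consistency` flag, the `remote_latest` argument standing in for the newest value held on the other side of a partition) is hypothetical and only illustrates the trade-off described above; it is not modeled on any particular database.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk an inconsistent reply."""


class Replica:
    def __init__(self, prefer_consistency):
        self.prefer_consistency = prefer_consistency
        self.local_value = "v1"   # last value this replica managed to synchronize
        self.partitioned = False  # True while cut off from the other replicas

    def read(self, remote_latest):
        """remote_latest is the newest value held elsewhere; reachable only without a partition."""
        if not self.partitioned:
            # No failure: sync with the rest of the system, then answer (consistent and available).
            self.local_value = remote_latest
            return self.local_value
        if self.prefer_consistency:
            # Give up availability: do not answer rather than risk a stale value.
            raise Unavailable("partitioned; cannot confirm the latest value")
        # Give up consistency: answer from the local, possibly stale, copy.
        return self.local_value


if __name__ == "__main__":
    consistent, available = Replica(prefer_consistency=True), Replica(prefer_consistency=False)
    consistent.partitioned = available.partitioned = True
    print(available.read("v2"))  # prints v1, a stale value
    try:
        consistent.read("v2")
    except Unavailable as err:
        print("no answer:", err)
```

Without a partition both replicas behave identically; the flag only matters once a failure cuts the replica off, which is exactly where the post says the real choice lies.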
To overcome the shortcomings of the theorem, Daniel Abadi proposed PACELC some time back: if there is a partition (P), the system trades off availability (A) against consistency (C); else (E), when the system is running normally, it trades off latency (L) against consistency (C). I think what he says makes sense. I agree with Abadi's idea of including latency as a parameter, since there is no need to constrain the system the same way in every scenario. Saying that the system is going to give up consistency in every situation, rather than only during a partition, does not necessarily make sense. If there is no network partition, I have a choice between giving a consistent view of the data (either by reading all replicas and reconciling them, or by writing all replicas in the write phase and reading only one at read time) and answering a request with low latency. This means the system can be designed to have a low-latency write operation (write only one replica; this is what Bigtable does in some sense), a low-latency read operation (read only one replica), or an always-consistent view when the data is read (the Bigtable approach). Note that no network partition is involved here, and I can still choose among these three options. When a partition does happen, however, the choice is between the system being available or consistent. This gives more flexibility in the design choices one has to make while building a scalable system.
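The latency-versus-consistency choice in the no-partition case can be sketched with a toy replicated register in which each write touches `w` replicas and each read touches `r` of the `n` replicas. This is a generic quorum-style model, assumed here for illustration; it is not how Bigtable is actually built. The class name `QuorumStore`, the global version counter, and "keep the highest version" as the reconciliation rule are all hypothetical. Writes go to the first `w` replicas and reads come from the last `r`, which models the worst case where the two sets are as far apart as possible.

```python
class QuorumStore:
    """Toy replicated register: n replicas, each holding a (version, value) pair."""

    def __init__(self, n):
        self.n = n
        self.replicas = [(0, None)] * n
        self.version = 0  # hypothetical global version counter used for reconciliation

    def write(self, value, w):
        """Write to the first w replicas; the return value is how many replicas were contacted."""
        self.version += 1
        for i in range(w):
            self.replicas[i] = (self.version, value)
        return w

    def read(self, r):
        """Read the last r replicas and reconcile by keeping the highest version."""
        contacted = self.replicas[self.n - r:]
        _, value = max(contacted, key=lambda entry: entry[0])
        return value


if __name__ == "__main__":
    # (3, 1): write all, read one.  (1, 3): write one, read all.  (1, 1): low latency both ways.
    for w, r in [(3, 1), (1, 3), (1, 1)]:
        store = QuorumStore(3)
        store.write("x", w)
        print(f"W={w} R={r} overlap={w + r > store.n} read={store.read(r)!r}")
    # W=3 R=1 overlap=True read='x'
    # W=1 R=3 overlap=True read='x'
    # W=1 R=1 overlap=False read=None
```

The first two configurations correspond to the two ways of getting a consistent view mentioned above: paying latency at write time or at read time. Whenever `w + r > n`, every read set overlaps every write set, so the latest write is always seen. Contacting only one replica on both paths is the cheapest option, but the read can miss the write entirely.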
Talking about PACELC and CAP, to quote Ryan Rawson: "It's computer science vs. software engineering." I'm still thinking about PACELC and CAP, and I'm not yet convinced that either is comprehensive enough to cover all scenarios.


