Tuesday, August 17, 2010

Ceph as an alternative file system for Hadoop

Ceph has been gaining ground as a quality file system and is under active development. It is on its way to becoming a strong contender to replace existing systems such as HDFS. I co-authored an article for the August edition of USENIX ;login: on Ceph as an alternative file system for Hadoop. You can read it here.

HBase @ Hadoop Day Seattle

Tuesday, May 18, 2010

Comparing PNUTS, HBase and Cassandra


A lot of NoSQL systems have been sprouting up recently, and an increasing number of people are moving away from relational databases to NoSQL data stores. There's nothing wrong with relational database systems, but they are optimized for certain use cases, which they handle very well. NoSQL systems (Bigtable, Dynamo, PNUTS, CouchDB, MongoDB, Keyspace, to name a few) solve different sets of problems, for which they are best suited.

Recently, in a course that I'm taking at UC Santa Cruz, I got a chance to present the PNUTS paper and compare the system with Bigtable and Dynamo.

At a high level, here's how these systems compare:
(This post talks about HBase instead of Bigtable and Cassandra instead of Dynamo)




Each row below covers the three systems in the same order: HBase, Cassandra, PNUTS.

Consistency Model

HBase: Fully consistent.

Cassandra: Eventual consistency. Divergent version trees of the same row can exist. Client can trade off between latency and consistency.

PNUTS: Timeline consistency. All versions of a row honor a timeline and there are no divergent version trees.

ACID Semantics

HBase: The put() call is atomic at the row level. There is no concept of transactions and no notion of consistency between rows. Scans don't give a consistent view of the table. However, any row returned by a scan will be a consistent view of that given row. Any row updated with a timestamp older than the scanner initialization timestamp may show up in the results. (Details are available in the HBASE-2294 JIRA.)

Cassandra: None specified.

PNUTS: A write call gives the same ACID semantics as a transaction involving a single row.

Data Model

HBase: Tabular, column oriented. A table consists of column families and each family has multiple columns. The schemas are flexible and there can be arbitrary columns in any given family. However, the families are specified on table creation. Different versions can be stored for each cell. The storage model is columnar and is strictly ordered on the rows.

Cassandra: Similar to HBase. Dynamo, on the other hand, is a key-value store. Cassandra has the Bigtable data model over the Dynamo P2P architecture. Cassandra also has super columns, which are like column families within a column family.

PNUTS: Tabular, row oriented. The schemas are flexible and a row can have arbitrary columns, with some being empty as well. Each node stores only a single version of any given row, but different versions can exist across the cluster. The storage model is row oriented.

Underlying Storage

HBase: HDFS or any other distributed file system.

Cassandra: Node's local storage.

PNUTS: Node's local storage (choice of hash table or ordered table).

Replication

HBase: Asynchronous. Data is replicated by the file system when it is persisted.

Cassandra: Choice between synchronous and asynchronous for each update.

PNUTS: Asynchronous. Data is written to the master copy, which propagates it to the message broker, which takes care of replication.

Fault Tolerance

HBase: Regions are restarted (on the same node or any other) if they crash. If a region server dies, its regions are distributed to the other servers that are functional. No re-allocation of data takes place. Updates are first logged into a Write Ahead Log before they go to the memstore. One WAL is maintained per region server.

Cassandra: Data from the failing node is re-assigned to the next node in the consistent hashing circle. Updates are first written to a Write Ahead Log before being committed to the table.

PNUTS: If the master for a given record fails, either a new master is elected or the write fails. It is never the case that a write will go to a node that is not the master for a record. The message broker does logging when it receives the update from the master. There is no logging done at the individual nodes.

Scalability

HBase: 1000s of nodes. Each table can have millions of columns and billions of rows.

Cassandra: 10-100s of nodes.

PNUTS: 10s of sites with 1000s of nodes. Since it is row oriented storage, the number of columns would not be very high (no numbers reported in the papers).

Optimized for

HBase: Writes, scans. The writes are kept in memory (in the memstore) and flushed to disk in chunks, which gives good write performance.

Cassandra: Writes. Poor scan performance as compared to the other two systems. [5]

PNUTS: Reads. The client has the option to read from a copy which is geographically close and can define the level of consistency desired.

Where does it fit?

HBase: If you want a scalable system that is deployed in a single data center and you don't care about network partitions, HBase is your friend. If you cannot tolerate loose consistency in data, this is the best option. Update: If you already have Hadoop and want to be able to read/write small objects quickly and also run some analytics over the data, HBase is the way to go.

Cassandra: If you can deal with eventual consistency and want a highly available system that can span data centers, Cassandra is your friend. As of now, Cassandra is easier to get off the ground than HBase and has fewer components that you need to get running to begin with. Contrary to popular belief, Cassandra has more moving parts than HBase, but they are managed by the framework and the user does not need to worry much about them. Update: If you don't have Hadoop and don't need it, but you just need a database that scales beyond a single node, Cassandra is your friend. Keep in mind that there is no SQL here. For HBase, you also need to deploy Hadoop (HDFS at least), which is not required by Cassandra.

PNUTS: If you want the system to be geographically distributed in order to serve a large number of reads with low latency from across the globe, PNUTS is your friend. You also get fine-grained control over consistency and can trade it for low read latency.

Access options

HBase: Native Java API, Jython, Groovy DSL, Scala, REST, Thrift

Cassandra: Native Java API, Ruby, Perl, Python, Scala, PHP, Clojure, Grails, C++, C#
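The fault-tolerance row above says that, in Cassandra, data from a failing node is re-assigned to the next node in the consistent hashing circle. Here is a minimal Python sketch of that idea. It is a toy under stated assumptions, not Cassandra's implementation: the `HashRing` class, the node names, the row keys and the use of MD5 are all made up for illustration, and replication is ignored entirely.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a node name or row key to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hashing ring: one position per node, no replication."""

    def __init__(self, nodes):
        self._positions = sorted((ring_position(n), n) for n in nodes)

    def owner(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around the ring."""
        hashes = [h for h, _ in self._positions]
        index = bisect.bisect_left(hashes, ring_position(key)) % len(self._positions)
        return self._positions[index][1]

    def remove(self, node: str) -> None:
        """Drop a failed node; its keys now resolve to the next node on the ring."""
        self._positions = [(h, n) for h, n in self._positions if n != node]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"row-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}

ring.remove("node-b")  # simulate node-b failing
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Only keys that lived on the failed node change owner.
print(all(before[k] == "node-b" for k in moved))
```

The point of the sketch is that keys owned by the surviving nodes stay where they are; only the failed node's share of the key space moves, and it all lands on that node's successor on the ring.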



If you are out there looking for a scalable database solution, you've got quite a few choices. These are just some of the more popular ones. PNUTS is not open source, but it is a nice system from an architectural standpoint, and that's why I talk about it in this post.
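The HBase write path mentioned in the comparison (log the update to a Write Ahead Log first, then put it in the memstore, then flush the memstore to disk in chunks) can be sketched roughly as follows. This is a toy in Python and not how HBase is implemented: the `TinyRegionStore` class, the file names, the JSON encoding and the flush threshold are all assumptions for illustration. Clearing the log on every flush is a simplification, and the toy neither serves reads from flushed chunks nor tracks them across restarts.

```python
import json
import os
import tempfile


class TinyRegionStore:
    """Toy write path: append to a log first, then update an in-memory buffer."""

    def __init__(self, directory, flush_threshold=3):
        self.directory = directory
        self.wal_path = os.path.join(directory, "wal.log")  # hypothetical file name
        self.flush_threshold = flush_threshold
        self.memstore = {}
        self.chunks = []

    def put(self, row, value):
        # 1. Make the update durable in the write-ahead log.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps([row, value]) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Only then apply it to the in-memory buffer.
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffered rows to disk as one sorted chunk and empty the buffer."""
        path = os.path.join(self.directory, f"chunk-{len(self.chunks)}.json")
        with open(path, "w", encoding="utf-8") as chunk:
            json.dump(sorted(self.memstore.items()), chunk)
        self.chunks.append(path)
        self.memstore.clear()
        # Simplification: the flushed rows no longer need the log, so truncate it.
        open(self.wal_path, "w", encoding="utf-8").close()

    def recover(self):
        """Rebuild the memstore after a crash by replaying the log."""
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, encoding="utf-8") as wal:
            for line in wal:
                row, value = json.loads(line)
                self.memstore[row] = value


with tempfile.TemporaryDirectory() as directory:
    store = TinyRegionStore(directory)
    store.put("row1", "a")
    store.put("row2", "b")

    # Simulate a crash: the in-memory state is lost, the log survives.
    restarted = TinyRegionStore(directory)
    restarted.recover()
    print(restarted.memstore)
```

Because every update reaches the log before the memstore, the restarted store can rebuild the unflushed rows by replaying the log. Batching rows in memory and writing them out in sorted chunks is what keeps writes cheap.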

References:
[1] HBase: http://hadoop.apache.org/hbase/
[2] Cassandra: http://cassandra.apache.org/
[3] Bigtable: http://labs.google.com/papers/bigtable.html
[4] Dynamo: http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
[5] Yahoo! YCSB Benchmark: http://research.yahoo.com/node/3202
[6] Lars George's description of the HBase Architecture: http://www.larsgeorge.com/2010/01/hbase-architecture-101-write-ahead-log.html

Thursday, May 06, 2010

My thoughts on the CAP theorem

There has been some buzz around the CAP theorem and its validity in the recent past, and so it was brought up in the Data Management in the Cloud class that I'm taking at UC Santa Cruz this quarter. I was asked to talk about it in one of the lectures, and here's the essence of the discussion.

CAP is a good theoretical way of looking at systems and it definitely makes sense, but it's limiting. After reading the proof of the CAP theorem by Gilbert and Lynch, I was left with questions about how I could map it to a practical system. It's a good theoretical paper, but it did not tell me much about how to build systems and trade off between C, A and P. From a practical standpoint, the choice really is between CA and AP. A CP system does not make sense. It's like saying "I will give you a consistent answer whenever I give one, but I don't guarantee an answer". There is little practical use for such a system. So, essentially, the choice comes down to the following: either the system tries to give you consistency all the time and gives up availability when there is a failure, which means it does not answer requests; or it gives you availability even in case of failures, in which case you can't ask for consistency.

To overcome the shortcomings of the theorem, Daniel Abadi proposed PACELC some time back. I think what he says makes sense. I agree with Abadi's idea of including latency as a parameter, since the system does not have to be constrained in every scenario. Saying that the system is going to give up consistency in every situation does not necessarily make sense. If there is no network partitioning, I have a choice between giving a consistent view of the data (by either reading all replicas and reconciling, or writing all replicas in the write phase and reading only one at read time) and answering a request with low latency. This means that the system can be designed to have a low latency write operation (write only one replica - this is what Bigtable does in some sense), or a low latency read operation (read from only one replica), or to give an always consistent view when the data is read (the Bigtable approach). Note that there is no partitioning of the network involved here and I can choose between the above 3 options. However, when a partition happens, the choice is between the system being available or consistent. This gives more flexibility in terms of the design choices that one needs to make while building a scalable system.
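One way to make the no-partition trade-off concrete is the usual quorum rule for replicated data: with N replicas, W replicas acknowledging each write and R replicas consulted on each read, a read is guaranteed to touch at least one replica holding the latest write when R + W > N. That rule is not part of the discussion above; I am bringing it in as an assumption to illustrate the options. The Python sketch below checks it by brute force. The function name and the choice of N = 3 are made up, and it ignores concurrent writes, failures and timing.

```python
from itertools import combinations


def read_always_sees_latest_write(n: int, w: int, r: int) -> bool:
    """Brute-force check that every read set of size r overlaps every write set of size w."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


N = 3
configs = {
    "write all, read one": (N, 1),
    "write one, read all and reconcile": (1, N),
    "write one, read one": (1, 1),
}
for name, (w, r) in configs.items():
    consistent = read_always_sees_latest_write(N, w, r)
    print(f"{name}: W={w} R={r} consistent={consistent}")
```

The first two configurations buy consistency by making one side of the operation slow (it has to wait for all replicas), while the third gives low latency on both sides at the cost of possibly stale reads.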

Talking about PACELC and CAP, I quote Ryan Rawson: "It's computer science vs. software engineering". I'm still thinking about PACELC and CAP, and I'm not yet convinced that either of them is comprehensive enough to cover all scenarios.

Thursday, June 05, 2008

A new member in the family...

We just got a new member into the family... Notty... a cute little golden retriever pup! :)

The sadness and pain of Pluto passing away had not even sunk in when his vet called up and said "I have a cute little golden retriever pup for you. Are you ready for it?" Dad's immediate reply was "No.. not yet.. And we would want a smaller dog." I wasn't quite sure what I was feeling about this whole thing. Pluto not being with us anymore was something that had not yet sunk in. I didn't know how to react, didn't know what I was feeling. Whether I was happy that the pain he was going through had come to an end, or whether I was sad about him not being with us anymore, I really don't know... But before I could figure out what I was feeling, I had already begun to think beyond it. In the midst of all the chaos in my head, discussions started to happen at home about whether to get this pup or not. My sister and mom were all excited and ready. Dad was neutral and had concerns. I really didn't know what I was feeling (very unlike me). But I said "Yes" to the idea and started to look forward to it.

3rd June is the day I went to get him. Oh baby, the moment I looked at him, I knew I was ready to have him home. A cute but scared golden-colored pup. I just fell in love with him there and then. On the way back home, we picked up some rice Cerelac for him. Yes.. that's right.. Cerelac! :) For the little one. We got home and put him on the floor. To begin with, he was a little jittery, scared, unsure of his steps and where he was walking. But in no time, he was all over the place, biting things, tearing paper, chewing shoes and playing with all the family members. From a scared little pup to a bundle of joy in no time.

Monday, March 17, 2008

Experiencing a lay off - one of the many experiences at QSquare (a software startup)

I joined QSquare Technologies Pvt Ltd, a Delhi-based software testing startup, in August 2007, thinking that I would get to learn how startups work, get a more holistic picture of the software lifecycle and also be close to the management of the company. I was working at a client end as a Design Analyst and Systems Architect. Unfortunately, the whole idea backfired on me and I was laid off! Even before I could see myself growing and learning with the company, I was asked to leave and find something else for myself... Now that certainly didn't sound all that great.

Here's what happened and what I learnt from it (both good and bad experiences):

1. There was no clarity about projects and work at the client end. And to top it off, everyone was busy enough with existing projects, and nothing new actually got started. So, at the end of the day, I was left with almost no work for most of my time. And that was quite frustrating! In due course, I also slacked off and stopped demanding and expecting work.
Learning- Don't care about what's happening in the company and what others are up to! Make sure you don't sit idle and don't stop growing. You've got to take responsibility for your own involvement in the company!

2. The internal teams at the client resisted my involvement in their projects much more than I had ever anticipated. This didn't make my work any easier. Instead, I struggled more with dealing with this than with getting or doing work.
Learning- When working at a client end, make sure that the teams there are aligned with your involvement in the project. And if any issues come up during the work, get the senior management involved immediately. Don't wait for things to blow up beyond what can be handled.

3. When there was no work, I did not spend the whole day at work. I reached at 11 am and left at 6 pm. Instead, I involved myself in other activities and started learning and doing stuff beyond my job. This didn't go down well with the people at the client end.
Learning- It doesn't matter whether you have work or not. Unless the team and the management are in the loop about what you are doing, you are bound to be held to account for the erratic timings.
In fact, the day I was laid off, I entered the office at 12:30 pm. There was no work at the office, so I spent the morning getting some of my own pending work sorted out. This didn't go down well with the client at all!

Lastly, working at a client end is a risky affair. You've got to be very careful about what gets displayed to the teams involved in the project!

I am taking whatever learnings I can from this experience and will incorporate them into my career. This was probably just a glimpse of what I'll be facing when I start my own company! It's probably a small trailer of entrepreneurship...!

Failure is success if we learn from it - Malcolm Forbes

Sunday, October 21, 2007

Is that it??

You came alone, you'll go alone. In the end it all means NOTHING!

Surprising but true, the statement above completely describes life. Is that it? There is nothing to do, nowhere to go, nothing to achieve, no place to get to...? It's all just what you say...! That's it!

On this note, I asked myself a question... How about having my life be about love and happiness? And about contributing value to this world and its people in whatever way I can? How about giving away love and making a difference to this world that I live in...? Wow! That seems to be truly a purpose worth living life for. But the thought doesn't seem to last long. The very next moment, life takes over and I am back to thinking about how I can win the rat race... Damn!

But really, is that it?

Saturday, January 07, 2006

Indian courts!

How I wish I had written my full name on my 10th class CBSE form. But I hadn't!
I wrote Amandeep Singh and left out the Khurana. I didn't think it would cause such an issue at a later stage... and yet it did. The name carried over to class 12, and so my CBSE certificates said "Amandeep Singh" and not "Amandeep Singh Khurana".
To make matters worse, my passport had my full name, which led to my license having the full version too.

When it came to applying for admission into Engineering after 12th (I was trying for Canada), it became a matter of concern as there was a discrepancy. But it didn't cause trouble then because I took admission into Thapar. And that is when the escapade of getting my name legally changed from Amandeep Singh to Amandeep Singh Khurana began... and a long one. As I write this down, it has still not ended, though I am hoping it will before I pass out in May.

The first step, naturally, was to request the Thapar authorities to write the full name in the grade cards and subsequently in all certificates to follow. The response was pretty straightforward: "We'll follow what the CBSE certificates say". Alright, never mind, let's go to CBSE. And so I did, to request them to change my name in their records and issue me fresh certificates.

CBSE, like any other government organization, had its own (pretty long) procedures to be followed. I was asked to get an order from the court saying that the name should be changed in their records, and only then would the certificates be reissued. The procedure wasn't supposed to take too long.

Ok! So, next, I approached a lawyer, who was also handling my father's legal matters. I asked him what to do and how to go about it. The answer was, "It isn't a big deal. You'll need to get it in the newspaper once and file a request in the court". And so I did! This was back in August 2003! And it's Jan 2006 now. The story from then to now is what follows.

A small ad was placed in the paper and subsequently a case was filed in the Karkardooma courts in New Delhi, involving CBSE and the Delhi Government. Neither of the two organizations should ideally have had any objection to me adding my family name to mine, but apparently, they did! And they did raise objections and cause trouble.

The case was filed and a date was given. I appeared in the court on the given date and time with my father. An objection was raised!! Dad had filed the case on my behalf with an affidavit from me allowing him to do so, since I was in Patiala and couldn't have come here on weekdays to do it myself. Moreover, he had decided not to drag me into the legal formalities and rather do it himself along with his other matters. The objection was about exactly that. "Why did I not file the case myself?" Case rejected! File it again, and this time in person. Not through anybody else...

Ok your honour! Shall do that! Thank you! So, the case was filed again, and this time by me. Again, I had to appear in the court for the hearing and cross examination and so on......
And I did on the given date and time, once again! This time, my statement was recorded and the next date was given for some further formalities.

Right then! The next date came, and I was present. The CBSE lawyer raised an objection. God knows for what. Just for the heck of it! And another date was given to answer to his objections.

On the next date, we tipped the CBSE lawyer to not trouble us and let the case get resolved smoothly. So, he presented no objections this time. Phew!!! So some movement at least! But yet another date was given..

Next time, I reached the court and found that my original certificates needed to be shown, which apparently I had left in Patiala. Again, another date..

The next date came by; this was Dec'05. I was present there with my original certificates and all the papers that the judge could possibly ask for. Just to find out that the judge was on leave that day! I drove 45 minutes just to find that out!!! Bastard! I was literally fuming that day! Ready to kick the judge on his butt with all the force I possibly could. But that of course wasn't to be, and I was given yet another date in Jan'06. So, I appeared in the court, once again, hoping that the matter would be closed this time. The judge went through the papers and asked the lawyers involved a few things; they of course were in a hurry to get this petty case off their list. I was cross-examined and so was my Dad... At the end, the judge said "Date for final arguments is blah blah blah!" Now that was enough! Final arguments for what? No one had anything to argue about!! But the arguments had to be done before the order could be passed.

Now, here I am, waiting for the final arguments to be done on the appointed date, and the order passed, so that the next procedure can be initiated for the certificates to be reissued by CBSE and then by my college... Not sure when the wait will end... All I can do is hope to get all of it done before I pass out of college in May.

I need not comment on the system at all. None of this really needs anything more to be said about it. Spending so much time, money and effort and wasting national resources on such a petty thing.. It happens only in India!