NoSQL

NoSQL for the layman

I haven’t come across a definition that explains, to someone from a strictly relational background (like me!), the exact difference between an RDBMS and a NoSQL database. Here’s what I have learned so far.

  • First and foremost, NoSQL does not mean ‘No SQL Anymore’ or ‘No To SQL’. It is usually read as ‘Not Only SQL’, and in practice any database that is not relational gets classified as NoSQL. ‘Non-relational database’ would be a better description, but for some reason it didn’t catch on.
  • With the explosion of the web, the amount of data grew exponentially. Wikipedia started with zero articles in 2001 and now has more than 4 million of them (http://en.wikipedia.org/wiki/File:EnwikipediaArt.PNG). Twitter averaged 5,000 tweets per day in 2007; today it is more than 200 million (http://blog.twitter.com/2011/06/200-million-tweets-per-day.html). Moreover, the data structures the web uses are varied: RSS feeds, ontologies, tags, hypertext, text documents, blogs, graphs, nodes, user-generated content, photos, videos, and the list goes on. Relational engines were not designed with these data structures in mind and hence are not necessarily optimized to store them. Sure, you can hack around and make an RDBMS store them anyway, but performance will suffer. New data storage mechanisms were invented to tackle this issue, and collectively they came to be known as NoSQL.
  • In the relational world, data is modeled using tables, where each row represents one instance of what is being stored. The schema is strictly adhered to, and changing it is a complex process. For example, in a Customers table we store one row per customer, with columns for name, address, DOB, etc. Recording a new piece of information about our customers means changing the table’s schema to add a new column. In a petabyte-size database this can be a costly operation. In NoSQL databases, by contrast, the data is schema-free, or at least the restrictions on the schema are weak, and records are modeled loosely on the real-life entity.
  • RDBMSs are very strict about enforcing ACID properties and data integrity. They use highly normalized tables, primary keys, foreign keys, indexes etc. to avoid data redundancy, prevent duplicates, and rule out orphan records. As databases grow, data retrieval becomes expensive because many tables have to be joined, and joins are expensive. NoSQL databases sacrifice some of the ACID properties, or at least give them a secondary place. ‘Say what? No ACID?’ you might say, but many web applications simply don’t need it. Certain NoSQL databases promise eventual consistency instead, but we won’t discuss that here.
  • Scale out and scale up. This aspect confuses the heck out of me. In layman’s terms, the performance of an RDBMS is roughly proportional to the resources it has: the more resources, the better it performs. Hence the usual practice for improving RDBMS performance is to throw more resources at a single server (not that it works all the time, but that’s a separate issue entirely). Those resources are expensive, and there is always a limit to how much one machine can hold. This is called scaling up. Many NoSQL databases, on the other hand, have been designed from the ground up to run on commodity hardware. Imagine ten beat-up old servers connected together in a cluster. Each is a separate node, and in this example one of them is assigned as the master node. Your NoSQL database is running on this cluster. When you ask it to do something, the master node splits the job and assigns a piece to each node. Each node then works on its piece in parallel with the other nodes (this is the Map from the MapReduce you might have heard of). At the end, each node hands its result to the master, which combines all the results and gives the answer back to the user (this is the Reduce from MapReduce). The nodes are swappable, which means I can add a node on the fly when I need more computing power. Hence NoSQL databases are good at scaling out.
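
To make the split, map, and reduce steps from the last point concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: threads stand in for the cluster nodes, the job is counting hashtags in a handful of made-up tweets, and the function names (split_job, map_on_node, reduce_on_master, count_hashtags) are hypothetical, not taken from any real NoSQL product.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_job(records, node_count):
    """Master: deal the records out into one chunk per node."""
    return [records[i::node_count] for i in range(node_count)]


def map_on_node(chunk):
    """Map: each node counts the hashtags in its own chunk only."""
    return Counter(
        word for tweet in chunk for word in tweet.split() if word.startswith("#")
    )


def reduce_on_master(partial_counts):
    """Reduce: the master merges the per-node counts into one answer."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def count_hashtags(tweets, node_count=3):
    chunks = split_job(tweets, node_count)
    # Threads play the role of nodes here (an assumption for illustration);
    # a real cluster would send each chunk to a different machine.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(map_on_node, chunks))
    return reduce_on_master(partials)


# Made-up sample data.
tweets = [
    "learning #nosql today",
    "#nosql and #sql can coexist",
    "joins are expensive #sql",
    "scale out with #nosql",
]
print(count_hashtags(tweets))
```

Changing node_count changes how the work is divided but not the final counts, which is the property that lets a cluster grow by adding nodes.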

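The schema point from the Customers example can also be sketched in a few lines of Python. This is a rough illustration, not how any particular database works internally: an in-memory SQLite table plays the relational side, plain dictionaries stand in for schema-free records, and the customers and the twitter_handle field are made up. On a tiny SQLite table the schema change is cheap; the cost I mentioned applies to very large production databases.

```python
import sqlite3

# Relational side: the schema is fixed up front.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (name TEXT, address TEXT, dob TEXT)")
conn.execute("INSERT INTO Customers VALUES ('Asha', '12 Main St', '1980-04-02')")

# A new attribute needs a schema change before any row can carry it.
conn.execute("ALTER TABLE Customers ADD COLUMN twitter_handle TEXT")
conn.execute("UPDATE Customers SET twitter_handle = '@asha' WHERE name = 'Asha'")
relational_columns = [row[1] for row in conn.execute("PRAGMA table_info(Customers)")]

# Schema-free side: plain dicts stand in for stored records (an assumption
# for illustration). Each record carries only the fields it actually has.
customers = [
    {"name": "Asha", "address": "12 Main St", "dob": "1980-04-02"},
    {"name": "Ben", "twitter_handle": "@ben"},  # new field, no schema change
]
document_fields = sorted({field for doc in customers for field in doc})

print(relational_columns)
print(document_fields)
```

In the relational half every row gets the new column whether it has a value or not, while in the second half only Ben’s record carries twitter_handle.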
 

So that is my understanding of NoSQL so far. I will add more information to this post as and when I come across any. In the next post I am going to briefly explain the different types of NoSQL databases.

Regards