If the product you manage requires content aggregation, you may have heard your dev team say “we need to go with a NoSQL storage solution”. Below is a summary of the key points about NoSQL that I found through some basic research.
How does NoSQL differ from SQL?
NoSQL, usually read as “Not only SQL”, is a database movement that promotes non-relational data stores that do not need a fixed schema.
There are several primary storage techniques or “implementations” used by the NoSQL approach:
- Document store – MongoDB, CouchDB
- Eventually‐consistent key‐value store (“ColumnFamily”) – Cassandra
- Graph – Neo4j
- Key/value store on disk – Amazon’s SimpleDB
- Key/value cache in RAM – Redis, memcached
Taken from this Wikipedia article on NoSQL databases
An introduction to NoSQL on Hacker News
A 10 minute talk from Brian Aker bashing NoSQL
What is MapReduce?
MapReduce is a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster.
“Map” step: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.
“Reduce” step: The master node then takes the answers to all the sub-problems and combines them in a way to get the output – the answer to the problem it was originally trying to solve.
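A SQL “GROUP BY” with an aggregate function is a rough analogy for MapReduce: the map step assigns each record to a key, and the reduce step combines all the values that share a key.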
Taken from this Wikipedia article on MapReduce
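The map and reduce steps described above can be sketched as a word count. This is only a single-process illustration in plain Java: the class and method names (WordCountSketch, map, reduce, wordCount) are made up for this example, and a real cluster would run the map step on many worker nodes in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: everything runs in one process, not on a cluster.
class WordCountSketch {

    // "Map" step: one worker turns its chunk of input into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String chunk) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: combine all pairs that share a key by summing their values.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Master: hand each chunk to map(), gather the pairs, then reduce them.
    static Map<String, Integer> wordCount(List<String> chunks) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String chunk : chunks) {
            allPairs.addAll(map(chunk));
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        // prints {graph=1, nosql=2, sql=2}
        System.out.println(wordCount(List.of("sql nosql sql", "nosql graph")));
    }
}
```

The result is roughly what a SQL query like SELECT word, COUNT(*) ... GROUP BY word would return if each word were a row in a table.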
What is a Graph db?
A graph database is a database that uses graph structures with nodes, edges and properties to represent and store information.
Example:
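// Assumes graphDb is an open Neo4j GraphDatabaseService, that this code runs inside a transaction,
// and that MyRelationshipTypes is an enum implementing RelationshipType.
Node firstNode = graphDb.createNode();
Node secondNode = graphDb.createNode();
// Connect the two nodes with a typed, directed relationship.
Relationship relationship = firstNode.createRelationshipTo( secondNode, MyRelationshipTypes.KNOWS );
// Nodes and relationships can both carry key/value properties.
firstNode.setProperty( "message", "Hello, " );
secondNode.setProperty( "message", "world!" );
relationship.setProperty( "message", "brave Neo4j " );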
We now have a graph that looks like this:
(firstNode)--KNOWS-->(secondNode)
A popular vendor is Neo4j.
Taken from this Wikipedia article on Graph Databases
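To make the idea of nodes, edges and properties more concrete without needing a database, here is a minimal in-memory sketch in plain Java. It is not the Neo4j API: TinyGraphSketch, Node, Edge, connectTo and neighbours are hypothetical names invented for this illustration. It builds the same two-node graph as the example above and then follows the KNOWS edge.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph; not the Neo4j API.
class TinyGraphSketch {

    static class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Edge> outgoing = new ArrayList<>();

        // Create a directed, typed edge from this node to the target.
        Edge connectTo(Node target, String type) {
            Edge edge = new Edge(this, target, type);
            outgoing.add(edge);
            return edge;
        }
    }

    static class Edge {
        final Node from;
        final Node to;
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Edge(Node from, Node to, String type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    // Traversal: follow every outgoing edge of the given type and collect the targets.
    static List<Node> neighbours(Node start, String type) {
        List<Node> result = new ArrayList<>();
        for (Edge edge : start.outgoing) {
            if (edge.type.equals(type)) {
                result.add(edge.to);
            }
        }
        return result;
    }

    // Build (first)--KNOWS-->(second) and read the three "message" properties back.
    static String greeting() {
        Node first = new Node();
        Node second = new Node();
        Edge knows = first.connectTo(second, "KNOWS");
        first.properties.put("message", "Hello, ");
        second.properties.put("message", "world!");
        knows.properties.put("message", "brave Neo4j ");

        Node reached = neighbours(first, "KNOWS").get(0);
        return "" + first.properties.get("message")
                + knows.properties.get("message")
                + reached.properties.get("message");
    }

    public static void main(String[] args) {
        // prints Hello, brave Neo4j world!
        System.out.println(greeting());
    }
}
```

The point of the structure is that relationships are stored directly on the nodes, so finding who a node KNOWS means walking its edges rather than joining tables.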
What is a Document db?
As opposed to relational databases, document-based databases do not store data in tables with uniform-sized fields for each record. Instead, each record is stored as a document that has certain characteristics. There is no real hierarchy among the documents; there is just a collection of documents, each of which may contain virtually any kind of data. The documents need not be the same length, as some documents may contain fields that other documents do not need to store. In other words, you are not constrained by a database schema.
Example:
FirstName="Bob", Address="5 Oak St.", Hobby="sailing".
Another document could be:
FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=("Michael,10", "Jennifer,8", "Samantha,5", "Elena,2").
Notice that the two documents share some fields and differ in others. In a relational database each record would have the same set of fields, and unused fields might be kept empty; here there are no empty ‘fields’ in either document (record). This system allows information to be added at any time without using storage space for “empty fields” as in relational databases.
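The two example documents above can be modelled in plain Java as free-form maps, which shows what “no fixed schema” means in practice. This is only a sketch: DocumentStoreSketch, sampleCollection and withField are hypothetical names, and a real document database adds persistence, indexing and a query language on top of this idea.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: a "collection" is a list of free-form documents.
class DocumentStoreSketch {

    // Build the two example documents; each one has only the fields it needs.
    static List<Map<String, Object>> sampleCollection() {
        Map<String, Object> bob = new LinkedHashMap<>();
        bob.put("FirstName", "Bob");
        bob.put("Address", "5 Oak St.");
        bob.put("Hobby", "sailing");

        Map<String, Object> jonathan = new LinkedHashMap<>();
        jonathan.put("FirstName", "Jonathan");
        jonathan.put("Address", "15 Wanamassa Point Road");
        jonathan.put("Children",
                List.of("Michael,10", "Jennifer,8", "Samantha,5", "Elena,2"));

        return List.of(bob, jonathan);
    }

    // Return the documents that contain the given field at all.
    // Documents without it are simply skipped; nothing is padded with nulls.
    static List<Map<String, Object>> withField(
            List<Map<String, Object>> collection, String field) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> document : collection) {
            if (document.containsKey(field)) {
                matches.add(document);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> people = sampleCollection();
        // Only Bob has a "Hobby" field; both documents have an "Address" field.
        System.out.println(withField(people, "Hobby").size());   // prints 1
        System.out.println(withField(people, "Address").size()); // prints 2
    }
}
```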
A popular vendor is MongoDB.
MongoDB manages collections of JSON-like documents. This allows many applications to model data in a more natural way, as data can be nested in complex hierarchies and still be query-able and indexable.
{
  "username" : "bob",
  "address" : {
    "street" : "123 Main Street",
    "city" : "Springfield",
    "state" : "NY"
  }
}
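To illustrate how a nested document like the one above can still be queried, here is a small sketch in plain Java that walks a dotted path such as "address.city" through nested maps. NestedLookupSketch, get and bob are hypothetical names used only for this example, and nothing here calls MongoDB itself; MongoDB's own query language uses a similar dot notation for reaching into embedded fields.

```java
import java.util.Map;

// Hypothetical illustration of looking up a nested field by a dotted path.
class NestedLookupSketch {

    // Walk the path one key at a time; return null if any step is missing
    // or if the path tries to descend into a value that is not a nested document.
    static Object get(Map<String, Object> document, String dottedPath) {
        Object current = document;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // The example document, with the address nested inside the user record.
    static Map<String, Object> bob() {
        return Map.of(
                "username", "bob",
                "address", Map.of(
                        "street", "123 Main Street",
                        "city", "Springfield",
                        "state", "NY"));
    }

    public static void main(String[] args) {
        System.out.println(get(bob(), "address.city")); // prints Springfield
        System.out.println(get(bob(), "address.zip"));  // prints null
    }
}
```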
Another popular option is Apache CouchDB, which stores JSON documents and is accessed over an HTTP API; it is often said to work well with Rails.
Taken from this Wikipedia article on Document-Oriented Databases