Understanding NoSQL for Product Owners

If the product you manage requires content aggregation, you may have heard your dev team say “we need to go with a NoSQL storage solution”. Below is a summary of key points about NoSQL that I found through some basic research.

How does NoSQL differ from SQL?

“Not only SQL”
NoSQL is a database movement which promotes non-relational data stores that do not need a fixed schema.
There are several primary storage techniques or “implementations” used by the NoSQL approach:

  • Document store – MongoDB, CouchDB
  • Eventually‐consistent key‐value store (“ColumnFamily”) – Cassandra
  • Graph – Neo4j
  • Key/value store on disk – Amazon’s SimpleDB
  • Key/value cache in RAM – Redis, memcached

Taken from this Wikipedia article on NoSQL databases
An introduction to NoSQL on Hacker News
A 10 minute talk from Brian Aker bashing NoSQL

What is MapReduce?

MapReduce is a framework for processing huge datasets on certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster.

“Map” step: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.

“Reduce” step: The master node then takes the answers to all the sub-problems and combines them in a way to get the output – the answer to the problem it was originally trying to solve.

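For intuition, MapReduce in a NoSQL store plays a role similar to “GROUP BY” in SQL: the map step tags each record with a grouping key, and the reduce step combines the records that share a key, much as an aggregate like SUM or COUNT does.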
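To make the comparison with GROUP BY concrete, here is a minimal single-machine sketch in plain Java. It is not how any particular NoSQL product implements MapReduce, and there is no cluster involved; the class name, method names and the "category,amount" input format are all assumptions made up for illustration. It computes roughly what SELECT category, SUM(amount) ... GROUP BY category would.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: everything runs in one process.
class MapReduceSketch {

    // "Map" step: turn one input line such as "books,12" into a (key, value) pair.
    static Map.Entry<String, Integer> map(String line) {
        String[] parts = line.split(",");
        return new AbstractMap.SimpleEntry<>(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }

    // Grouping step: collect all values that share the same key.
    static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: combine the values for one key into a single answer (here, a sum).
    static int reduce(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // Runs map, grouping and reduce in sequence over all input lines.
    static Map<String, Integer> totalsByCategory(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.add(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : groupByKey(pairs).entrySet()) {
            totals.put(group.getKey(), reduce(group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> orders = Arrays.asList("books,12", "music,5", "books,8", "music,10");
        System.out.println(totalsByCategory(orders)); // prints {books=20, music=15}
    }
}
```

In a real cluster the map calls would run on many worker nodes in parallel and the grouped values would travel over the network before being reduced; the sketch only shows the shape of the computation.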

Taken from this Wikipedia article on MapReduce

What is a Graph db?

A graph database is a database that uses graph structures with nodes, edges and properties to represent and store information.

Example:

// Create two nodes and connect them with a KNOWS relationship
Node firstNode = graphDb.createNode();
Node secondNode = graphDb.createNode();
Relationship relationship = firstNode.createRelationshipTo( secondNode, MyRelationshipTypes.KNOWS );

// Attach a property to each node and to the relationship itself
firstNode.setProperty( "message", "Hello, " );
secondNode.setProperty( "message", "world!" );
relationship.setProperty( "message", "brave Neo4j " );

We now have a graph that looks like this:
(firstNode)--KNOWS-->(secondNode)

A popular vendor is Neo4j.
Taken from this Wikipedia article on Graph Databases

What is a Document db?

As opposed to relational databases, document-based databases do not store data in tables with uniform-sized fields for each record. Instead, each record is stored as a document that has certain characteristics. There is no real hierarchy of data, just a collection of documents which may contain virtually any kind of data. The documents are not necessarily the same length, as some documents may contain fields that other documents do not need to store. In other words, you are not constrained by a database schema.

Example:

FirstName="Bob", Address="5 Oak St.", Hobby="sailing".

Another document could be:

FirstName="Jonathan", Address="15 Wanamassa Point Road", Children=("Michael,10", "Jennifer,8", "Samantha,5", "Elena,2").

Notice that both documents have some similar information and some different – but unlike a relational database where each record would have the same set of fields and unused fields might be kept empty, there are no empty ‘fields’ in either document (record) in this case. This system allows information to be added any time without using storage space for “empty fields” as in relational databases.

A popular vendor is MongoDB.
MongoDB manages collections of JSON-like documents. This allows many applications to model data in a more natural way, as data can be nested in complex hierarchies and still be queryable and indexable.

{
  "username" : "bob",
  "address" : {
    "street" : "123 Main Street",
    "city" : "Springfield",
    "state" : "NY"
  }
}
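As a rough illustration of what “nested but still queryable” means, the sketch below keeps schema-free documents as plain Java maps and looks them up by a dotted path such as address.city. This is not MongoDB's API; the class DocumentStoreSketch, its methods and the dotted-path convention as implemented here are assumptions invented for this example, and a real document database would add indexes, persistence and a much richer query language.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a document collection.
class DocumentStoreSketch {

    private final List<Map<String, Object>> documents = new ArrayList<>();

    // Helper to build a document from alternating keys and values.
    static Map<String, Object> doc(Object... keyValues) {
        Map<String, Object> document = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            document.put((String) keyValues[i], keyValues[i + 1]);
        }
        return document;
    }

    // No schema check: any document with any set of fields is accepted.
    void insert(Map<String, Object> document) {
        documents.add(document);
    }

    // Follow a dotted path ("address.city") into nested documents.
    // Returns null when any part of the path is missing.
    static Object get(Map<String, Object> document, String path) {
        Object current = document;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Return every document whose field at the given path equals the expected value.
    List<Map<String, Object>> find(String path, Object expected) {
        List<Map<String, Object>> matches = new ArrayList<>();
        for (Map<String, Object> document : documents) {
            if (expected.equals(get(document, path))) {
                matches.add(document);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        DocumentStoreSketch store = new DocumentStoreSketch();
        store.insert(doc("username", "bob",
                "address", doc("street", "123 Main Street", "city", "Springfield", "state", "NY")));
        // A second document with different fields and no address at all.
        store.insert(doc("username", "jonathan", "children", Arrays.asList("Michael", "Jennifer")));

        System.out.println(store.find("address.city", "Springfield").size()); // prints 1
    }
}
```

The second document has no address, so the nested lookup simply finds nothing for it; nothing had to be declared up front and no empty field is stored.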

Another popular option is Apache CouchDB, which works well with Rails.
Taken from this Wikipedia article on Document-Oriented Databases

How I use Social Networks

I read this paragraph in an article titled 10 Ways Social Media will change in 2009:

“They will connect around meaningful topics and have live and simultaneous conversations within parameters they themselves define, which will bring relevance back to their interaction with others.”

This got me thinking about my “meaningful topics” and how they related to the social networking technologies I use the most: Facebook, Twitter, BrightKite, LinkedIn and Gyminee.

Facebook is my mindless entertainment

Like everyone, it seems, I joined Facebook a while ago and connected with a few old high school friends. Those seem like the good ole’ days of Facebook. Now I am friends with clients, coworkers, people I grew up with but was never friends with, my wife, and my Denver friends. Facebook competes with entertainment like watching TV or reading a magazine. Most of my Facebook activity happens at night or is triggered by an email notification. Facebook becomes increasingly relevant to my social life as my friends begin to use it and post time-sensitive, relevant things like “wanna go skiing tomorrow?”. Find Kelly Taylor on Facebook

Twitter means keeping in touch with the tech community

Most of the people I follow on Twitter are tweeps in the Boulder/Denver tech scene, VCs or software developers. Only a small percentage of the people I follow are actually friends of mine in real life. Surfing Twitter usually sends me down productive, educational rabbit holes, and informs me of what’s going on in the community and what is top of mind for the important thinkers in the tech industry. I take Twitter very seriously and feel it has added tremendous value to my career. Find Kelly Taylor on Twitter

BrightKite broadcasts how rad I am

I only “Check In” occasionally using BrightKite on my iPhone. Usually I do this because I am somewhere cool, like my favorite restaurant, Mountain Sun, or hanging out skiing in Silverthorne. I love BrightKite’s iPhone app and have fun with the service, but I kind of wish Twitter would buy them to simplify things. Find Kelly Taylor on BrightKite

LinkedIn is my Career Counselor

I look at and tweak my LinkedIn profile occasionally, which helps me perform an internal audit of my career and how things are progressing. The act of joining groups on LinkedIn seems almost more important than participating in them. Occasionally I comment on a discussion or attempt to connect with someone through someone in my network, but that’s about the extent of it. View Kelly Taylor’s profile on LinkedIn

Gyminee is my training reality check

I’ve used web apps such as Training Peaks before to help with my marathon and triathlon training. My latest favorite is Gyminee because of its beautiful UI, including graphs and “letter grades”, as well as its social networking component. Most of my “friends” on Gyminee are similar to my Twitter friends: I only know them online, and we’ve possibly met once or twice in real life. Even though I don’t know these people, it still provides a good motivator for me to keep up on my workouts, post my progress and comment on group discussions. View Kelly Taylor’s Workouts on Gyminee

_______  

It is interesting to ponder the “meaningful topics” in my life that I don’t use social networking technologies for: there aren’t any.

This leads me to the conclusion that the idea of a “Social Network” is going to fade into basic computing and life infrastructure.  

I enjoyed reading the above article for the author’s predictions for 2009: ad revenue, convergence of networks and platforms, social media jobs… all very similar to the 2008 predictions I’ve read. One prediction I feel he missed is that today’s “Social Media” is the gateway to interactive TV. Watching Obama’s inauguration speech on CNN/Facebook, with my Facebook “News Feed” showing my friends’ real-time comments about the event, is a good indicator of things to come.

All in all, pondering Social Media’s future from time to time is a good thing. Goodness knows most of my conversations over beers or dinner with friends begin with “So, on Facebook I read that…”.