Big Data

From wiki.mikejung.biz



SQL, NoSQL, Big Data Oh My!

You might be asking yourself what NoSQL is, and why you would want to use it over typical SQL systems like MySQL or MSSQL. The short answer is that times are changing: data is getting larger and more diverse. The main selling point of NoSQL and Big Data systems is that they accept many kinds of data and do not require a strict, predefined schema to operate.

Where things really start to get confusing is when multiple applications enter the fray. For instance, there are many, many NoSQL applications out there, and not all of them are used for the same thing. Below are some basic selling points for NoSQL software.

The last point that I want to add to this section for now is that relational databases can still be used alongside NoSQL or Big Data-like applications. Typical interactions between these two types of databases would be:

  • Using a relational database as a source that is fed into MapReduce functions. The output can then be compared against other data sources for detailed analysis.
  • Re-injecting the results of a MapReduce job into a relational database, which can then be used as a source again, as described in the first point. Mind-blowing stuff, maaann.
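
The round trip described in the two points above can be sketched in a few lines of Python. This is only a minimal, single-machine illustration: the `orders` and `customer_totals` tables, the sample rows, and the `map_phase` / `reduce_phase` helpers are all made up for the example, and SQLite stands in for whatever relational database you actually use. A real MapReduce framework would spread the two phases across many machines.

```python
import sqlite3
from collections import defaultdict

# Hypothetical relational source: an in-memory SQLite table of orders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],
)

def map_phase(rows):
    """Emit one (key, value) pair per source row."""
    for customer, amount in rows:
        yield customer, amount

def reduce_phase(pairs):
    """Group the pairs by key and sum the values for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# 1. Feed the relational data into the map and reduce phases.
rows = conn.execute("SELECT customer, amount FROM orders")
totals = reduce_phase(map_phase(rows))

# 2. Re-inject the results into a relational table for ordinary SQL queries.
conn.execute("CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", totals.items())

top = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY total DESC"
).fetchall()
print(top)
```

Once the totals are back in a table, they can be joined against any other relational data with plain SQL, which is the "used as a source again" step.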

SMAQ-based software

Cassandra

Hadoop

NoSQL with MapReduce features

CouchDB

  • Has MapReduce functionality
  • Distributed database offering semi-structured, document-based storage
  • Strong replication and distributed updates are the key features
  • Queries (views) are written in JavaScript, with separate Map and Reduce phases
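
To make the Map and Reduce phases a bit more concrete, here is a rough sketch of how a view over semi-structured documents behaves. It is written in Python purely for illustration and is not CouchDB's API: the sample documents and the `map_doc`, `reduce_values` and `run_view` names are hypothetical. Note that the documents do not share a schema, and the map step simply skips the ones that lack the field it cares about.

```python
from collections import defaultdict

# Hypothetical semi-structured documents: fields vary from one to the next.
docs = [
    {"_id": "1", "type": "post", "tags": ["nosql", "couchdb"]},
    {"_id": "2", "type": "post", "tags": ["nosql"]},
    {"_id": "3", "type": "comment", "text": "no tags field here"},
]

def map_doc(doc):
    """Map phase: emit (tag, 1) for every tag; docs without tags emit nothing."""
    for tag in doc.get("tags", []):
        yield tag, 1

def reduce_values(key, values):
    """Reduce phase: combine all values that were emitted for one key."""
    return sum(values)

def run_view(documents):
    """Run the map phase over every document, group by key, then reduce."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_doc(doc):
            grouped[key].append(value)
    return {key: reduce_values(key, values) for key, values in grouped.items()}

tag_counts = run_view(docs)
print(tag_counts)
```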

MongoDB

  • Similar to CouchDB, but with a stronger emphasis on performance
  • Places less emphasis than CouchDB on distributed updates, replication and versioning
  • MapReduce functions are written in JavaScript

Riak

  • Similar to CouchDB and MongoDB
  • Strong emphasis on high availability
  • MapReduce functions are written in either JavaScript or Erlang

Solr

  • Used for search and indexing alongside NoSQL systems. Built on Apache Lucene search technology.
  • Example: MapReduce is used on, say, Facebook to compute the influential power of each person according to some predetermined metric. This ranking is then injected into another database. Solr then provides search over that database, for example to find out who is most popular.
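
The pipeline in that example (compute a ranking, store it, then search it) can be sketched roughly as follows. None of this is Solr's API: a plain Python dictionary acts as a toy inverted index standing in for the search layer, and the interaction log, the "one point per interaction" metric, and the `search` helper are all assumptions made up for the illustration.

```python
from collections import defaultdict

# Hypothetical interaction log: (actor, person who received the interaction).
interactions = [
    ("ann", "bob smith"), ("cat", "bob smith"),
    ("ann", "dan smith"), ("bob", "eve jones"),
]

# Map phase: emit (person, 1) for every interaction that person received.
pairs = [(target, 1) for _actor, target in interactions]

# Reduce phase: sum the emitted values per person to get an influence score.
scores = defaultdict(int)
for person, value in pairs:
    scores[person] += value

# "Inject" the ranking into the store that the search layer reads from.
ranking_db = dict(scores)

# Indexing step: a tiny inverted index from name token to matching people.
index = defaultdict(set)
for person in ranking_db:
    for token in person.split():
        index[token].add(person)

def search(term):
    """Return the people matching a name token, most influential first."""
    matches = index.get(term.lower(), set())
    return sorted(matches, key=lambda p: (-ranking_db[p], p))

print(search("smith"))
```

The point of the split is that the expensive scoring runs as a batch job, while the search side only reads the precomputed scores and stays fast.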