- http://www.crummy.com/software/BeautifulSoup/ Beautiful Soup is a Python library designed for quick turnaround projects like screen-scraping
- http://www.google.com/trends/ Google Trends, used for looking up what people are searching for and how that changes over time
- http://www.datastax.com/ Cassandra solutions
- http://www-958.ibm.com/software/analytics/manyeyes/ IBM page for visualizing cool data
- http://nltk.org/ NLTK is a leading platform for building Python programs to work with human language data.
- https://www.mturk.com/mturk/welcome Amazon Mechanical Turk, a marketplace for work.
- http://aws.amazon.com/dynamodb/ DynamoDB, Amazon's hosted NoSQL database offering
- http://cassandra.apache.org/ Cassandra main page
- http://www.infochimps.com/ We built the world’s largest data marketplace.
SQL, NoSQL, Big Data, Oh My!
You might be asking yourself: what is NoSQL, and why would I want to use it over typical SQL systems like MySQL or MSSQL? The answer is simple: times are changing, and data is getting larger and more diverse. The main selling point of NoSQL and Big Data systems is that they accept almost any kind of data and don't require a strict, predefined schema to operate.
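To make the schema point concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite table as the stand-in for a typical SQL system and a plain list of JSON documents as the stand-in for a document store; the `users` table, the field names and the sample records are all made up for illustration.

```python
import json
import sqlite3

# Relational side: the columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))

try:
    # 'twitter' is not a column of the table, so SQLite rejects the insert.
    conn.execute(
        "INSERT INTO users (id, name, twitter) VALUES (?, ?, ?)",
        (2, "bob", "@bob"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Document side (hypothetical store): every record describes itself,
# so records with different fields can sit next to each other.
documents = [
    {"_id": 1, "name": "alice"},
    {"_id": 2, "name": "bob", "twitter": "@bob"},
    {"_id": 3, "name": "carol", "tags": ["python", "nlp"]},
]
stored = [json.dumps(doc) for doc in documents]

# Reading the documents back shows that each one kept its own set of fields.
field_sets = [sorted(json.loads(raw).keys()) for raw in stored]
```

The relational table would need an `ALTER TABLE` before it could take the extra field, while the document list takes it as-is. The flip side is that the application now has to cope with fields that may or may not be there.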
Where things really start to get confusing is when multiple applications enter the fray. For instance, there are many, many NoSQL applications out there, and not all of them are used for the same thing. Below are some basic selling points for NoSQL software.
The last point I want to add to this section for now is that relational databases can still be used alongside NoSQL or Big Data-style applications. Typical interactions between these two types of databases would be:
- Using relational databases as a source that is fed into MapReduce functions. This can then be compared against other data sources for detailed analysis.
- Re-injecting the results that MapReduce produced back into a relational database, which can then be used as a source again, as described above. Mind-blowing stuff, maaann.
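The round trip described in the two points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite stands in for the relational database, the map and reduce phases run in a single process instead of on a cluster, and the `orders` and `customer_totals` tables and their rows are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Relational source (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0), ("bob", 1.0)],
)

def map_phase(row):
    """Emit one (key, value) pair per input row."""
    customer, amount = row
    yield customer, amount

def reduce_phase(key, values):
    """Collapse all values that share a key into one result."""
    return key, sum(values)

def map_reduce(rows):
    # Shuffle step: group the mapped values by key before reducing.
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_phase(row):
            groups[key].append(value)
    return [reduce_phase(key, values) for key, values in groups.items()]

# 1. Feed the relational data into the MapReduce functions.
rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
totals = map_reduce(rows)

# 2. Re-inject the results into a relational table.
conn.execute("CREATE TABLE customer_totals (customer TEXT PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO customer_totals VALUES (?, ?)", totals)

result = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY total DESC"
).fetchall()
```

Once the totals are back in a table, they can be joined or compared against other sources with ordinary SQL, which is the kind of follow-up analysis the first point hints at.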
SMAQ-based software
NoSQL with MapReduce features
CouchDB
- Has MapReduce functionality
- Distributed database offering semi-structured, document-based storage
- Strong replication and distributed updates are the key features
- JavaScript-based queries that allow for Map and Reduce phases
MongoDB
- Similar to CouchDB, but with a stronger emphasis on performance
- Less suitable for distributed updates, replication and versioning
- MapReduce functions are written in JavaScript
Riak
- Similar to CouchDB and MongoDB
- Strong emphasis on high availability
- MapReduce functions are written in either JavaScript or Erlang
Solr
- Used for search and indexing on NoSQL systems. Based on Lucene search technology.
- Example: MapReduce is used on, say, Facebook to compute the influence of each person according to some predetermined metric. This ranking is then injected into another database. Solr is then used to search for who is most popular, using the database that MapReduce created.
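The influence example can be walked through with a small Python sketch. Everything in it is an assumption made for illustration: the posts, the profiles, and the metric (one point per mention) are invented, the map and reduce phases run in one process, and a toy inverted index stands in for Solr. It does not use any real Solr or Facebook API.

```python
from collections import defaultdict

# Semi-structured input documents (hypothetical); note that one has no 'mentions'.
posts = [
    {"author": "alice", "mentions": ["bob", "carol"]},
    {"author": "bob", "mentions": ["carol"]},
    {"author": "dave", "mentions": ["carol", "alice"]},
    {"author": "carol"},
]

def map_post(post):
    """Map phase: emit (person, 1) for every mention in a post."""
    for person in post.get("mentions", []):
        yield person, 1

def reduce_mentions(person, counts):
    """Reduce phase: the assumed influence metric is the mention count."""
    return person, sum(counts)

# Group mapped values by key, then reduce each group.
grouped = defaultdict(list)
for post in posts:
    for person, one in map_post(post):
        grouped[person].append(one)

# The "other database" that receives the ranking, here just a dict.
ranking_db = dict(reduce_mentions(p, c) for p, c in grouped.items())

# Toy stand-in for the search layer: an inverted index over profile text.
profiles = {
    "alice": "python data engineer",
    "bob": "java developer",
    "carol": "python nlp researcher",
}
index = defaultdict(set)
for person, text in profiles.items():
    for word in text.split():
        index[word].add(person)

def search(term):
    """Return matching people, most influential first (ties broken by name)."""
    hits = index.get(term, set())
    return sorted(hits, key=lambda p: (-ranking_db.get(p, 0), p))
```

With this data, `search("python")` matches both alice and carol and puts carol first because she has more mentions. The split of work is the point: the batch job computes the ranking ahead of time, and the search layer only has to look it up at query time.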