Oracle Big Data Appliance – nonsense in a box

Oracle has been making a lot of noise about Big Data lately, culminating in the unveiling of their Big Data Appliance. At the recent database session at Oracle’s OpenWorld conference, Andy Mendelsohn, Senior VP of Oracle’s server technologies, explained that the Appliance will have 216 CPU cores, up to 864 GB of RAM, and up to 432 TB of local storage, all connected with 40 Gbps InfiniBand.


For the sake of simplicity, let’s treat this “full cabinet”-size appliance as 42 1U servers, which works out to roughly the following specs per server:

  • 4 to 6 CPU cores
  • about 20 GB of RAM
  • about 10 TB of HDD space
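The per-server figures are simply the published rack totals divided by 42. Here is that arithmetic as a small Python sketch; the 42-server split is this post’s simplifying assumption, and `per_server` is a hypothetical helper:

```python
# Rack-wide totals quoted at OpenWorld for the full-cabinet appliance.
TOTAL_CORES = 216
TOTAL_RAM_GB = 864
TOTAL_DISK_TB = 432

# Simplifying assumption from this post: the rack is treated as 42 1U servers.
SERVERS = 42


def per_server(total: float, servers: int = SERVERS) -> float:
    """Average share of a rack-wide resource per server."""
    return total / servers


if __name__ == "__main__":
    print(f"cores per server: {per_server(TOTAL_CORES):.1f}")       # → 5.1
    print(f"RAM per server:   {per_server(TOTAL_RAM_GB):.1f} GB")   # → 20.6 GB
    print(f"disk per server:  {per_server(TOTAL_DISK_TB):.1f} TB")  # → 10.3 TB
```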

This is actually similar to the dev servers we use at iTrend – only we use less local storage per server.

Hadoop works great with SATA drives and no RAID. Most real-world data gathering and processing applications will work fine with 10 GigE or even 1 GigE. Reliability (and uptime) of individual nodes is not a problem either – Hadoop is designed to run on servers that fail*.  
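To make the “servers that fail” point concrete: HDFS keeps each data block on several Datanodes (three by default, set by `dfs.replication`), so losing a node does not lose data. The Python sketch below is a simplified model, assuming random placement on distinct nodes and ignoring HDFS’s rack-aware placement policy; `place_replicas` and `available_blocks` are hypothetical helpers, not Hadoop APIs.

```python
import random


def place_replicas(num_blocks, nodes, replication=3, seed=0):
    """Put each block on `replication` distinct, randomly chosen nodes.

    Simplified model: real HDFS placement is rack-aware; racks are ignored here.
    """
    rng = random.Random(seed)
    return {block: set(rng.sample(nodes, replication)) for block in range(num_blocks)}


def available_blocks(placement, failed):
    """Blocks that still have at least one replica on a live node."""
    return {block for block, holders in placement.items() if holders - failed}


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(42)]
    placement = place_replicas(num_blocks=10_000, nodes=nodes)
    # With 3 replicas on distinct nodes, any two node failures leave a live copy.
    survivors = available_blocks(placement, failed={"node03", "node17"})
    print(len(survivors), "of", len(placement), "blocks readable")  # → 10000 of 10000 blocks readable
```

This is why cheap, unreliable nodes are acceptable: availability comes from replication across nodes, not from making each node more reliable.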

So, how much would it cost us (iTrend) to assemble an “appliance” that would match (or outperform) Oracle’s monolithic system? 

$38,000

It will be interesting to see Oracle’s pricing.

“Real-world” scenarios are key here. I am sure Oracle’s hardware will perform better against artificial test cases (and will fail less) in a lab. However, they are talking about the following real-life scenarios in their press release:

Weblogs, social media feeds, smart meters, sensors and other devices generate massive volumes of data (commonly defined as ‘Big Data’) that isn’t readily accessible in enterprise data warehouses and business intelligence applications today.

Some of the reasons why the Big Data Appliance makes no sense to me:

  1. Hadoop is designed to run on cheap commodity hardware that fails.  Oracle’s expensive hardware is designed for uptime.
  2. Hadoop and its related software stack are designed for nodes working with local data sets. Oracle’s appliance puts heavy emphasis on data transfer rates between nodes.
  3. Sensors and specialized devices aside, none of the data sources commonly used for Big Data processing generate new content at a rate that would require InfiniBand.
  4. Even if you are after terabytes of new data daily, and are crawling the web or scraping microblogs, your data fetching rates are still limited by politeness policies. Again, no InfiniBand is needed.
  5. Hadoop lets you scale out as you grow (or scale up and rebalance as needed, such as adding RAM to the machine handling search indexes), while with Oracle’s monolithic system you are stuck: you hit a performance ceiling on one parameter (e.g. CPU), and the rest of your expensive resources sit unutilized.
  6. The default “appliance” configuration will not work for everyone.  For example, you may discover that you need more RAM in your Namenode, and less RAM but more CPU in your Datanodes.  See above.
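A back-of-the-envelope check of points 3 and 4, as a Python sketch. The 1 TB/day ingest volume, the 1,000 crawled hosts, the one-request-per-second-per-host politeness delay and the 50 KB average page size are all illustrative assumptions, not measurements; the helper names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def sustained_mbps(bytes_per_day: float) -> float:
    """Average link rate (megabits per second) needed to move `bytes_per_day`."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


def polite_crawl_bytes_per_day(hosts: int, delay_s: float, avg_page_bytes: float) -> float:
    """Upper bound on bytes fetched per day if each host is hit at most once per `delay_s`."""
    fetches_per_day = hosts * SECONDS_PER_DAY / delay_s
    return fetches_per_day * avg_page_bytes


if __name__ == "__main__":
    print(f"1 TB/day -> {sustained_mbps(1e12):.0f} Mbit/s")  # → 93 Mbit/s
    # Hypothetical crawl: 1,000 hosts, 1 s politeness delay, 50 KB pages.
    crawl = polite_crawl_bytes_per_day(hosts=1_000, delay_s=1.0, avg_page_bytes=50_000)
    print(f"crawl    -> {crawl / 1e12:.2f} TB/day, {sustained_mbps(crawl):.0f} Mbit/s")  # → 4.32 TB/day, 400 Mbit/s
```

Under these assumptions, both figures are averages that fit within 1 GigE, let alone 40 Gbps InfiniBand. Real traffic is burstier than an average, so leave headroom, but the order of magnitude is what matters here.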
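Point 2 in code form: Hadoop prefers to run each map task on a node that already holds a replica of its input block, so most reads hit local disk instead of the network. The Python sketch below is a simplified greedy model of that preference, with no rack awareness or delay scheduling; `assign_tasks` is a hypothetical helper, not Hadoop’s actual scheduler.

```python
def assign_tasks(block_locations, free_slots):
    """Greedy, locality-first assignment of one task per input block.

    block_locations: block id -> set of node names holding a replica
    free_slots:      node name -> number of free task slots
    Returns (assignment dict block -> node, number of non-local tasks).
    """
    slots = dict(free_slots)  # don't mutate the caller's dict
    assignment = {}
    remote = []

    # Pass 1: run each task on a node that stores its block, if one has a free slot.
    for block, holders in block_locations.items():
        local = [n for n in sorted(holders) if slots.get(n, 0) > 0]
        if local:
            slots[local[0]] -= 1
            assignment[block] = local[0]
        else:
            remote.append(block)

    # Pass 2: leftover tasks go to the node with the most free slots and
    # have to pull their block over the network.
    non_local = 0
    for block in remote:
        node = max(slots, key=slots.get, default=None)
        if node is None or slots[node] == 0:
            break  # cluster is full; remaining tasks wait for the next round
        slots[node] -= 1
        assignment[block] = node
        non_local += 1
    return assignment, non_local


if __name__ == "__main__":
    locations = {"b1": {"n1", "n2"}, "b2": {"n2"}, "b3": {"n3"}}
    slots = {"n1": 1, "n2": 1, "n3": 0, "n4": 1}
    print(assign_tasks(locations, slots))  # → ({'b1': 'n1', 'b2': 'n2', 'b3': 'n4'}, 1)
```

Only the task whose data sits on a busy node (`b3`) crosses the network; in a well-balanced cluster that is the minority, which is why inter-node bandwidth matters less than local disk throughput.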
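And for point 6: the Namenode keeps the entire filesystem namespace in RAM, with every file, directory and block held as an in-memory object. A commonly quoted rule of thumb is roughly 150 bytes of heap per object. The Python sketch below uses that figure as an assumption to estimate heap size; `namenode_heap_gb` is a hypothetical helper, and the file count is illustrative.

```python
# Commonly quoted rule of thumb (an assumption, not an exact figure): each file,
# directory and block costs the Namenode roughly 150 bytes of heap.
BYTES_PER_OBJECT = 150


def namenode_heap_gb(files: int, blocks_per_file: float = 1.0, directories: int = 0) -> float:
    """Rough Namenode heap estimate in GB (10**9 bytes) for a given namespace size."""
    blocks = files * blocks_per_file
    objects = files + blocks + directories
    return objects * BYTES_PER_OBJECT / 1e9


if __name__ == "__main__":
    # Hypothetical namespace: 100 million small files, one block each.
    print(f"{namenode_heap_gb(100_000_000):.0f} GB")  # → 30 GB
```

By this estimate, a namespace of 100 million single-block files already needs more Namenode heap than the roughly 20 GB per server mentioned above, before any JVM headroom.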

In conclusion, Oracle Big Data Appliance seems poorly thought out, impractical, inflexible, and bad for your ROI.  But we need to remember that this solution targets the Enterprise market, which means this product may do well.

* Hadoop Namenodes, unlike Datanodes, need special provisions for failover/redundancy.

By Michael Alatortsev


December 19 2011 update: surprisingly, this article remains quite popular, according to our Google Analytics reports.  iTrend provides hardware consulting, and we will be happy to assist you with your Big Data initiative.  Email your requirements to info@itrend.tv and mention this blog.

Technologist, parallel entrepreneur. Interests: travel, photography, big data, analytics, predictive modeling.
