Oracle has been making a lot of noise about Big Data lately, culminating in the unveiling of their Big Data Appliance. At the recent database session at Oracle’s OpenWorld conference, Andy Mendelsohn, Senior VP of Oracle’s server technologies, explained that the Appliance will have 216 CPU cores, up to 864 GB of RAM, and up to 432 TB of local storage – all connected with 40 Gbps Infiniband.
For the sake of comparison, this "full cabinet"-size appliance translates into 42 1U servers, each averaging roughly:
- 5 CPU cores
- 20 GB RAM
- 10 TB HDD space
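The per-node figures above are just the cabinet totals divided evenly across the 42 servers. A quick sanity check of that arithmetic, using the totals from Oracle's stated specs:

```python
# Back-of-envelope check: divide the appliance's cabinet totals
# evenly across its 42 servers to get average per-node specs.
NODES = 42
total_cores = 216    # CPU cores in the full cabinet
total_ram_gb = 864   # max RAM, GB
total_hdd_tb = 432   # max local storage, TB

cores_per_node = total_cores / NODES   # ~5.1 cores
ram_per_node = total_ram_gb / NODES    # ~20.6 GB
hdd_per_node = total_hdd_tb / NODES    # ~10.3 TB

print(f"{cores_per_node:.1f} cores, {ram_per_node:.1f} GB RAM, "
      f"{hdd_per_node:.1f} TB HDD per node")
```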
Hadoop works great with SATA drives and no RAID. Most real-world data gathering and processing applications will work fine with 10 GigE or even 1 GigE. Reliability (and uptime) of individual nodes is not a problem either – Hadoop is designed to run on servers that fail*.
So, how much would it cost us (iTrend) to assemble an “appliance” that would match (or outperform) Oracle’s monolithic system?
It will be interesting to see Oracle’s pricing.
“Real world” scenarios are key here. I am sure Oracle’s hardware will perform better against artificial test cases (and will fail less) in a lab. However, they are talking about the following real-life scenarios in their press release:
“Weblogs, social media feeds, smart meters, sensors and other devices generate massive volumes of data (commonly defined as ‘Big Data’) that isn’t readily accessible in enterprise data warehouses and business intelligence applications today.”
Some of the reasons why the Big Data Appliance makes no sense to me:
- Hadoop is designed to run on cheap commodity hardware that fails. Oracle’s expensive hardware is designed for uptime.
- Hadoop and its related software stack are designed around nodes working with local data sets. Oracle’s appliance puts heavy emphasis on data transfer rates between nodes.
- Sensors and specialized devices aside, no data sources (that are commonly used for Big Data processing) generate new content at a rate that would require Infiniband.
- Even if you are after Terabytes of new data daily, and are crawling the web or scraping microblogs – your data fetching rates are still limited by politeness policies. Again – no Infiniband is needed.
- Hadoop lets you scale out as you grow (or scale up and rebalance as needed – like adding RAM to the machine handling search indexes), while with Oracle’s monolithic system you are stuck: once you hit a performance ceiling on one parameter (e.g. CPU), the rest of your expensive resources will sit underutilized.
- The default “appliance” configuration will not work for everyone. For example, you may discover that you need more RAM in your NameNode, and less RAM but more CPU in your DataNodes. See above.
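To put the politeness-policy point in numbers, here is a back-of-envelope sketch. The host count, crawl delay, and average page size below are illustrative assumptions, not measurements:

```python
# Rough upper bound on how fast a polite crawler can ingest data.
# Assumptions (illustrative): 10,000 distinct hosts crawled in
# parallel, one request per host per second (a typical politeness
# delay), and an average fetched page of ~100 KB.
hosts = 10_000      # distinct hosts crawled concurrently
delay_s = 1.0       # politeness delay per host, seconds
page_kb = 100       # average page size, KB

pages_per_sec = hosts / delay_s
mbit_per_sec = pages_per_sec * page_kb * 8 / 1000  # KB/s -> Mbit/s

print(f"{pages_per_sec:,.0f} pages/s, about {mbit_per_sec:,.0f} Mbit/s")
```

Even under these generous assumptions, the whole cluster ingests on the order of 8 Gbit/s – comfortably served by 10 GigE, and nowhere near needing 40 Gbps Infiniband per node.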
In conclusion, the Oracle Big Data Appliance seems poorly thought out, impractical, inflexible, and bad for your ROI. But we need to remember that this solution targets the Enterprise market, which means it may do well regardless.
* Hadoop NameNodes, unlike DataNodes, need special provisions for failover/redundancy.
By Michael Alatortsev
December 19 2011 update: surprisingly, this article remains quite popular, according to our Google Analytics reports. iTrend provides hardware consulting, and we will be happy to assist you with your Big Data initiative. Email your requirements to firstname.lastname@example.org and mention this blog.