Currently, we are using Hadoop, Hive, Pig, HBase, Nutch, Lucene, Solr, and several other frameworks.
Our custom data-gathering software is written in Python or Java. The software runs primarily on Dell hardware, and our typical dev server setup looks like this:
We are now relying on commercial “cloud” providers for additional capacity, but we are planning to upgrade our own infrastructure over the next few months to support full-size data sets (billions of objects).
If you have practical “big data” experience and would like to join the iTrend team, send your resume to info at itrend.tv.