Hadoop and Big Data no longer run on Commodity Hardware

I spent last week in Mexico and will be there again this week, meeting with clients, press and partners. It’s been a great experience with a lot of learning opportunities. During these discussions I have been struck by the persistent perception that Hadoop runs on ‘commodity hardware’. That was clearly the case around two years ago, when cheap servers were combined into a high-performance, fault-tolerant, scalable cluster. But, as I mentioned previously, that was fine for clusters delivering batch-processed, overnight jobs for actionable insights or reports. With the continuing development of the Hadoop ecosystem, and of Cloudera in particular, this has changed completely. Here’s why:

  1. Spark requires much more memory. Machines with 32 or 64GB cannot perform well running Spark; 128GB, 256GB or even more is really the standard now. As Spark replaces MapReduce, this requirement will only grow.
     
  2. The transition from batch to real-time processing, in particular the heavy adoption of NoSQL databases such as HBase, means the machines hosting HBase regions need 128GB as a minimum, 256GB as standard, or 512GB for in-memory performance. Combine HBase with Spark and you need some very high-end machines.
     
  3. The increasing requirement for streaming and/or transactional data, using Kafka and other tools, means the servers that ingest the data and then serve up the analysis in real time need much more memory.
     
  4. With the move to real-time analytics and services, most new systems really benefit from SSD storage. While the cost of SSD storage is declining, it’s still an expensive option.
     
  5. Take all of the above into account and quad-core systems are now the absolute minimum.
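
To make the memory argument in the list above a little more concrete, here is a rough back-of-the-envelope sketch in Python. It is not a sizing tool: the function name `size_executors`, the 16-core node, the 8GB and one core held back for the operating system and Hadoop daemons, and the four cores per executor are all illustrative assumptions. The 10% off-heap overhead is a simplified stand-in for the per-executor memory overhead that Spark adds on top of the heap.

```python
def size_executors(node_ram_gb, node_cores, reserved_ram_gb=8.0,
                   reserved_cores=1, cores_per_executor=4,
                   overhead_fraction=0.10):
    """Return (executors_per_node, heap_gb_per_executor) for one worker node.

    All defaults are illustrative assumptions, not recommendations.
    """
    # Hold back some cores and RAM for the OS and cluster daemons (assumed values).
    usable_cores = node_cores - reserved_cores
    usable_ram_gb = node_ram_gb - reserved_ram_gb

    # Number of executors that fit on the node by core count.
    executors = usable_cores // cores_per_executor
    if executors < 1 or usable_ram_gb <= 0:
        return 0, 0.0

    # Split the usable RAM evenly, then carve the off-heap overhead
    # out of each executor's share to get the heap size.
    container_gb = usable_ram_gb / executors
    heap_gb = container_gb / (1.0 + overhead_fraction)
    return executors, heap_gb


if __name__ == "__main__":
    # Compare the node sizes mentioned in the list, assuming 16 cores per node.
    for ram_gb in (32, 64, 128, 256):
        n, heap = size_executors(ram_gb, node_cores=16)
        print(f"{ram_gb:>3} GB node: {n} executors, ~{heap:.1f} GB heap each")
```

Under these assumptions the heap available to each executor scales almost directly with the RAM in the node, and a quad-core machine with one core reserved cannot host even a single four-core executor.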
     

So, when thinking about Big Data, and Hadoop/Cloudera in particular, it is probably a good idea to reset your expectations on hardware costs: they are going up and will continue to go up. The good news is that as the Hadoop ecosystem grows in capability, organizations will be able to deliver a much broader spread of use cases (see my post next week for a use case discussion), covering not just BI/analytics but actual services to consumers and users.

What do you think? Is Hadoop moving beyond commodity hardware to be more expensive? Will this slow down Hadoop adoption?

If you have additional questions, get in touch with us!
