We recently configured a small three-node cluster (Dell servers + Dell managed switch) using Cloudera’s distribution of Hadoop:
OS: CentOS 6.2
Hadoop: Cloudera CDH, Cloudera Manager Free Edition version 3.7.3.
Hadoop installation was a snap (a much better experience overall compared to version 3.6), but some services (HDFS, HBase, Hue) would not start on some nodes.
Note that on a typical CentOS 6.2 installation (which all our Hadoop nodes run), the firewall is enabled by default on all interfaces. This can prevent your nodes from talking to each other, which in turn keeps some services from starting in distributed mode.
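One quick way to confirm that the firewall is the culprit is to test whether the Hadoop ports on one node are reachable from another. Below is a minimal sketch of such a check, not something we shipped with the cluster. The function names are our own invention for illustration, and the addresses and ports you pass in are up to you (8020 and 50010 are common NameNode and DataNode defaults in CDH3, but check your own configuration).

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, filtered by a firewall, unresolvable, or timed out.
        return False
    conn.close()
    return True


def check_nodes(hosts, ports, timeout=2.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return dict(
        ((host, port), port_is_reachable(host, port, timeout))
        for host in hosts
        for port in ports
    )
```

For example, calling `check_nodes(["10.0.0.2", "10.0.0.3"], [8020, 50010])` from the first node (the 10.0.0.x addresses are placeholders) returns a dictionary showing which combinations are blocked. A service that is running but reported as unreachable from a peer points to the firewall rather than to Hadoop itself.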
We like to physically separate internal Hadoop chatter from other kinds of traffic by designating one interface (eth0) on each node as “Hadoop”, giving it its own subnet with statically assigned IP addresses, and connecting these interfaces via their own VLAN. This approach helps improve your cluster’s performance, security, and ease of management.
Because our Hadoop traffic is already restricted at several levels, we can simply designate each node’s “Hadoop” interface (eth0) as “Trusted” in the firewall configuration.
This needs to be done on each node.
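If you would rather script this than click through the firewall tool on every node, the same effect can be had by adding an accept rule for the interface to the iptables rules file. The sketch below is an assumption on our part rather than a record of what we ran: `trust_interface` is a hypothetical helper that takes the text of a rules file in the stock CentOS 6 format (normally /etc/sysconfig/iptables) and inserts `-A INPUT -i eth0 -j ACCEPT` ahead of the first INPUT rule that rejects traffic.

```python
def trust_interface(rules_text, iface="eth0"):
    """Return rules_text with an ACCEPT rule for iface added to INPUT.

    The rule is placed just before the first INPUT REJECT rule so that it
    is evaluated before traffic is rejected. If the rule is already
    present, the text is returned unchanged.
    """
    rule = "-A INPUT -i %s -j ACCEPT" % iface
    lines = rules_text.splitlines()

    if rule in (line.strip() for line in lines):
        return rules_text  # interface is already trusted

    for index, line in enumerate(lines):
        if line.startswith("-A INPUT") and "-j REJECT" in line:
            lines.insert(index, rule)
            break
    else:
        # Unfamiliar layout: safer to stop than to guess where the rule goes.
        raise ValueError("no INPUT REJECT rule found; edit the file by hand")

    return "\n".join(lines) + "\n"
```

The function only transforms text. Writing the result back to the rules file (after taking a backup) and reloading the firewall with `service iptables restart` is left to you, and should only be done if the interface really is isolated as described above.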
Once the firewall settings have been updated, you can restart the affected services using Cloudera Manager (HDFS first, followed by MapReduce, and finally HBase).
by Michael Alatortsev