This version offers several interesting improvements over the previous stable release, affecting both HDFS and MapReduce, and with them the framework’s ability to scale:
- HDFS Federation
Lets you use multiple independent Namenodes/namespaces and scale the name service horizontally. The Namenodes don’t require coordination with each other, and all of them use the Datanodes as common storage.
Key benefits, according to the release document on apache.org:
- Namespace Scalability – HDFS cluster storage scales horizontally but the namespace does not. Large deployments or deployments using a lot of small files benefit from scaling the namespace by adding more Namenodes to the cluster.
- Performance – File system operation throughput is currently limited by a single Namenode. Adding more Namenodes to the cluster scales the file system read/write operations throughput.
- Isolation – A single Namenode offers no isolation in a multi-user environment. An experimental application can overload the Namenode and slow down production-critical applications. With multiple Namenodes, different categories of applications and users can be isolated to different namespaces.
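To make the federation idea concrete, here is a toy Python model, not Hadoop code. Every class and method name in it (`Namenode`, `FederatedCluster`, `route`, and so on) is made up for illustration, and the path-prefix routing is an assumption about how a client might pick a namespace. It only shows the shape of the design: each Namenode owns its own namespace, none of them talks to the others, and all of them store blocks in the same pool of Datanode storage.

```python
# Toy model of HDFS Federation. Illustrative only; all names are hypothetical.

class Namenode:
    """Owns one independent namespace: file path -> list of block ids."""

    def __init__(self, name):
        self.name = name
        self.namespace = {}


class FederatedCluster:
    def __init__(self, mounts):
        # mounts: path prefix -> Namenode. No Namenode knows about the others.
        self.mounts = mounts
        # Storage shared by every Namenode, keyed by (namenode name, block id).
        self.shared_blocks = {}

    def route(self, path):
        """Return the Namenode whose prefix is the longest match for path."""
        matches = [
            prefix for prefix in self.mounts
            if path == prefix or path.startswith(prefix.rstrip("/") + "/")
        ]
        if not matches:
            raise KeyError(f"no namespace mounted for {path}")
        return self.mounts[max(matches, key=len)]

    def write(self, path, data):
        namenode = self.route(path)
        block_id = len(self.shared_blocks)
        self.shared_blocks[(namenode.name, block_id)] = data
        namenode.namespace[path] = [block_id]
        return namenode.name

    def read(self, path):
        namenode = self.route(path)
        blocks = namenode.namespace[path]
        return b"".join(self.shared_blocks[(namenode.name, b)] for b in blocks)


# Production and experimental users get separate namespaces.
prod = Namenode("nn1")
experiments = Namenode("nn2")
cluster = FederatedCluster({"/prod": prod, "/experiments": experiments})

cluster.write("/prod/sales.csv", b"a,b")
cluster.write("/experiments/tmp.dat", b"x")
```

In this sketch a flood of writes under `/experiments` only grows the namespace held by `nn2`; the `nn1` namespace is untouched, which is the isolation benefit listed above, while both still land in the same `shared_blocks` storage.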
- MapReduce NextGen aka YARN aka MRv2
The new architecture splits JobTracker functionality into separate components: resource management (assignment of compute resources to applications) and job life-cycle management (application scheduling and coordination).
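The split can be sketched with a toy Python model. This is not the YARN API: the class names borrow YARN's terms for the two roles, but the methods, the notion of "slots", and the job names are all assumptions made for the example. The point is only that one object hands out capacity and knows nothing about jobs, while each job gets its own object that tracks its life cycle.

```python
# Toy model of the resource-management / job-life-cycle split.
# Illustrative only; methods and "slots" are hypothetical, not the YARN API.

class ResourceManager:
    """Cluster-wide: hands out capacity and knows nothing about job logic."""

    def __init__(self, total_slots):
        self.free_slots = total_slots

    def allocate(self, requested):
        granted = min(requested, self.free_slots)
        self.free_slots -= granted
        return granted

    def release(self, slots):
        self.free_slots += slots


class ApplicationMaster:
    """Per job: tracks task state and asks the ResourceManager for capacity."""

    def __init__(self, job_name, num_tasks, resource_manager):
        self.job_name = job_name
        self.pending = num_tasks
        self.running = 0
        self.completed = 0
        self.rm = resource_manager

    def request(self):
        """Ask for enough slots to start every pending task."""
        granted = self.rm.allocate(self.pending)
        self.pending -= granted
        self.running += granted
        return granted

    def finish_running(self):
        """Mark running tasks as done and hand their slots back."""
        self.rm.release(self.running)
        self.completed += self.running
        self.running = 0

    @property
    def done(self):
        return self.pending == 0 and self.running == 0


rm = ResourceManager(total_slots=4)
etl = ApplicationMaster("etl", 3, rm)
adhoc = ApplicationMaster("adhoc", 3, rm)

etl.request()           # etl takes 3 of the 4 slots
adhoc.request()         # adhoc gets the 1 slot that is left
etl.finish_running()    # etl returns its slots
adhoc.request()         # adhoc can now start its remaining tasks
adhoc.finish_running()
```

Because job bookkeeping lives in the per-job object, adding more jobs in this sketch does not add any job state to the `ResourceManager`.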
These changes should let you do more with the same infrastructure, and Hadoop needed them.
Also – check out the following presentation by Hortonworks founder, Arun C. Murthy.