Nutch 2.0 typically won’t run on a full SCM installation of Cloudera’s CDH3 (Hadoop base services, HBase, Hue).
The same problem occurs on a CDH3 distribution without SCM if Hue is installed. The error is caused by the change introduced in MAPREDUCE-967, which modifies the way MapReduce unpacks the job’s jar. Previously, the whole jar was unpacked; after the change, only the classes/ and lib/ directories are unpacked. As a result, Nutch complains about a missing plugins/ directory.
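To make the failure concrete, here is a minimal Python sketch of such an unpack filter. It assumes that MapReduce extracts only the job-jar entries whose full name matches a regular expression, with a default of `(?:classes/|lib/).*`. The extended pattern, the helper name `unpacked_entries` and the job-file layout are hypothetical and only illustrate the behaviour described above.

```python
import re

# Assumption: after MAPREDUCE-967, only job-jar entries whose *full* name
# matches this pattern are extracted (classes/ and lib/ only).
DEFAULT_UNPACK_PATTERN = r"(?:classes/|lib/).*"
# Hypothetical extended pattern that also keeps Nutch's plugins/ directory.
EXTENDED_UNPACK_PATTERN = r"(?:classes/|lib/|plugins/).*"


def unpacked_entries(entries, pattern):
    """Return the jar entries that would be extracted under `pattern` (hypothetical helper)."""
    regex = re.compile(pattern)
    # fullmatch: the whole entry name must match, not just a prefix of it.
    return [name for name in entries if regex.fullmatch(name)]


# Illustrative, made-up contents of a Nutch .job file.
job_entries = [
    "lib/some-dependency.jar",           # hypothetical dependency jar
    "plugins/protocol-http/plugin.xml",  # a plugin descriptor
    "nutch-default.xml",                 # top-level configuration file
]

if __name__ == "__main__":
    print(unpacked_entries(job_entries, DEFAULT_UNPACK_PATTERN))
    print(unpacked_entries(job_entries, EXTENDED_UNPACK_PATTERN))
```

With the default pattern only `lib/some-dependency.jar` is extracted, so nothing under plugins/ reaches the task's working directory. Adding `plugins/` to the alternation keeps the plugin descriptor as well.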
1) force unpacking of the plugins/ directory by adding the following properties to nutch-site.xml:
2) remove hue-plugins-1.2.0-cdh3u1.jar from the Hadoop lib folder (e.g. /usr/lib/hadoop-0.20/lib)
3) rebuild the Nutch job file using Ant
4) set HADOOP_OPTS="-Djob.local.dir=/<MY HOME>/nutch/plugins" in hadoop-env.sh
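Step 1 edits nutch-site.xml. The Python sketch below shows one way to set a property in that file programmatically. It assumes that the relevant property is `mapreduce.job.jar.unpack.pattern` (the unpack filter introduced by MAPREDUCE-967) and that extending its value with `plugins/` is sufficient. Both are assumptions to verify against the Nutch Wiki, and the helpers `set_property` and `patch_nutch_site` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the job-jar unpack filter is controlled by this property;
# verify the exact property names against the Nutch Wiki for your CDH3 release.
UNPACK_PROPERTY = "mapreduce.job.jar.unpack.pattern"
# Assumed value: the classes/ and lib/ alternation plus Nutch's plugins/ directory.
UNPACK_VALUE = "(?:classes/|lib/|plugins/).*"


def set_property(configuration, name, value):
    """Set a Hadoop-style <property> inside <configuration>, adding it if absent."""
    for prop in configuration.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def patch_nutch_site(xml_text):
    """Return nutch-site.xml content with the (assumed) unpack-pattern property set."""
    root = ET.fromstring(xml_text)
    set_property(root, UNPACK_PROPERTY, UNPACK_VALUE)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Minimal, empty nutch-site.xml used as input.
    print(patch_nutch_site("<configuration></configuration>"))
```

The helper updates an existing property instead of appending a duplicate, so running it twice is harmless. Note that `xml.etree.ElementTree` drops comments and the XML declaration, so review the output before replacing nutch-site.xml with it.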
See the Nutch Wiki for more information: