Running Nutch 2.0 with Cloudera’s CDH3 – missing plugins

Nutch 2.0 typically won’t run on a full SCM installation of Cloudera’s CDH3 (Hadoop base services, HBase, Hue).

The same problem occurs on a CDH3 distribution (without SCM) when Hue is installed. The error stems from MAPREDUCE-967, which changed the way MapReduce unpacks the job’s jar: previously the whole jar was unpacked, but after the change only classes/ and lib/ are. As a result, Nutch complains about a missing plugins/ directory.


1) Force unpacking of the plugins/ directory by adding the following properties to nutch-site.xml:



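The properties themselves are not shown above; a likely candidate, based on CDH3’s job-jar unpack behavior, is Cloudera’s `mapreduce.job.jar.unpack.pattern` setting — treat the exact name and value as an assumption to verify against your CDH3 documentation:

```xml
<!-- Assumed CDH3 property; verify the name against your distribution's docs. -->
<property>
  <name>mapreduce.job.jar.unpack.pattern</name>
  <!-- Unpack plugins/ in addition to the default classes/ and lib/ -->
  <value>(?:classes/|lib/|plugins/).*</value>
</property>
```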
2) Remove hue-plugins-1.2.0-cdh3u1.jar from the Hadoop lib folder (e.g. /usr/lib/hadoop-0.20/lib)

3) Recreate the Nutch job file using ant

4) Set HADOOP_OPTS="-Djob.local.dir=/<MY HOME>/nutch/plugins" in hadoop-env.sh
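As a rough sketch, steps 2–4 might look like the following on a CDH3 node. The jar path, Nutch checkout location, and ant target are assumptions based on typical setups; `<MY HOME>` is the placeholder from step 4:

```sh
# Step 2: take the Hue plugin jar out of Hadoop's classpath
# (move it aside rather than deleting it outright)
sudo mv /usr/lib/hadoop-0.20/lib/hue-plugins-1.2.0-cdh3u1.jar ~/backup/

# Step 3: rebuild the Nutch job file from the Nutch source tree
cd ~/nutch && ant job    # assumed checkout location and target; check build.xml

# Step 4: point job.local.dir at the plugins directory,
# e.g. in conf/hadoop-env.sh
export HADOOP_OPTS="-Djob.local.dir=/<MY HOME>/nutch/plugins"
```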


See the Nutch Wiki for more information.




One comment on “Running Nutch 2.0 with Cloudera’s CDH3 – missing plugins”
  1. Michael Alatortsev says:

    Jeff, thanks for noticing. We are planning another Nutch install on CDH3 a few weeks from now, will be happy to provide more details to your team.
