Monday, April 4, 2016

Adding Hadoop jar files to your MapReduce project in Eclipse

So I've been learning Hadoop using Tom White's book, Hadoop: The Definitive Guide. Chapter 2 gives the first sample MapReduce program, MaxTemperature.

First off, for getting the NCDC data set, I found the script on this page immensely useful:
https://gist.github.com/Alexander-Ignatyev/6478289
Note the updated FTP server listed in the comments section there.

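I downloaded a few of the year.gz files into an otherwise empty directory and then ran
  • gunzip *
  • cat * > sample.txt
to create my sample input file for the MaxTemperature program. (The directory should contain only the downloaded files, since * matches everything in it.)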
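If gunzip and cat are not available (on Windows, say), the same step can be sketched in plain Java. This is only a rough equivalent: the class name BuildSample and the method concatGz are made up for illustration, the files are processed in name order, and unlike gunzip it leaves the .gz files in place.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical helper: decompresses every *.gz file in a directory into one text file.
class BuildSample {

    // Decompress each *.gz file in dir (sorted by file name) and append it to out.
    static void concatGz(Path dir, Path out) throws IOException {
        List<Path> inputs = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*.gz")) {
            for (Path p : ds) {
                inputs.add(p);
            }
        }
        Collections.sort(inputs);

        // newOutputStream creates the file, or truncates it if it already exists.
        try (OutputStream os = Files.newOutputStream(out)) {
            byte[] buf = new byte[8192];
            for (Path p : inputs) {
                try (InputStream in = new GZIPInputStream(Files.newInputStream(p))) {
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        os.write(buf, 0, n);
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory holding the downloaded year.gz files; defaults to the current one.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        concatGz(dir, dir.resolve("sample.txt"));
    }
}
```

Running it with the download directory as the only argument should leave a sample.txt next to the .gz files.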

Anyhow, I am using Eclipse to code the MaxTemperature example, so I created a project for it. To add the jar files so that Eclipse and Java will recognize the org.apache.hadoop.* packages:
Right-click on the project > Properties > Java Build Path > Libraries
Click the "Add External JARs..." button, and add these two files from the Hadoop installation directory (2.X.Y being your Hadoop version):
  • share/hadoop/common/hadoop-common-2.X.Y.jar
  • share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.X.Y.jar
After that, the org.apache.hadoop.* packages will be recognized and the imports in the MaxTemperature classes will resolve.
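A quick way to sanity-check the build path is to drop a tiny class like the one below into the project and run it from Eclipse. It is just a sketch: the class name HadoopClasspathCheck is made up, and I am assuming org.apache.hadoop.io.Text (from hadoop-common) and org.apache.hadoop.mapreduce.Mapper (from hadoop-mapreduce-client-core) are good representatives of the two jars.

```java
// Hypothetical helper: reports whether a couple of Hadoop classes are visible to the project.
class HadoopClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.hadoop.io.Text",          // ships in hadoop-common
            "org.apache.hadoop.mapreduce.Mapper"  // ships in hadoop-mapreduce-client-core
        };
        for (String name : names) {
            System.out.println(name + ": " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If either line reports MISSING, the corresponding jar most likely did not make it onto the Java Build Path.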
