Thursday, April 28, 2016

Using Maven

Source directory:
src/main/java
contains all the Java source files

Test directory:
src/test/java
contains all the Java test files

pom.xml must sit in the project root, at the same level as the src directory
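
As a rough sketch, the layout above can be created from a shell like this. The project name my-app is just a placeholder I made up, and the pom.xml here is an empty stand-in (a real one needs at least the project coordinates).

```shell
# Hypothetical project name; any directory name works.
project=my-app

# Standard Maven layout: main sources and test sources live in separate trees.
mkdir -p "$project/src/main/java" "$project/src/test/java"

# Empty placeholder only; Maven will not build until this has real content.
touch "$project/pom.xml"

# List what was created, to confirm pom.xml sits next to src.
find "$project" | sort
```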

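java -version
prints the installed Java version
mvn --version
prints the installed Maven version
mvn compile
compiles only the main source files to .class files (under target/classes)
mvn test-compile
compiles both the main and test source files to .class files
mvn clean
deletes the build output (the target directory)
mvn test
runs the unit tests
mvn package
packages the compiled code into a jar file under target (assuming jar packaging, which is the default)
mvn verify
runs any integration tests and checks that the package is valid
mvn install
installs the jar file into the local Maven repository (~/.m2/repository by default) so other local projects can depend on it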
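
Maven build lifecycle (main phases of the default lifecycle, in order):
validate, compile, test, package, integration-test, verify, install, deploy
Running a phase also runs every phase before it, so mvn package runs validate, compile, and test first. clean is not on this list because it belongs to a separate lifecycle.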

Monday, April 4, 2016

Adding Hadoop jar files to your MapReduce project in Eclipse

So I've been learning Hadoop using Tom White's book, Hadoop: The Definitive Guide. Chapter 2 gives the first sample MapReduce program.

First off, to get the NCDC data set, I found the script at this page immensely useful:
https://gist.github.com/Alexander-Ignatyev/6478289
Note the updated FTP server listed in the comments section.

I downloaded a few of the year.gz files and then ran
  • gunzip *
  • cat * > sample.txt
to create my sample input file for the MaxTemperature program.
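
A slightly more careful variant of those two steps, as a sketch: it decompresses to standard output so the .gz files stay in place, and it only matches *.gz, so an existing sample.txt is never fed back into itself on a rerun. The function name make_sample and its two arguments are my own invention, not something from the book or the script.

```shell
# Concatenate the decompressed contents of every .gz file in a directory
# into one output file, leaving the .gz files untouched.
# Usage: make_sample <input_dir> <output_file>
make_sample() {
    input_dir=$1
    output_file=$2
    # -c sends decompressed data to stdout instead of replacing each file;
    # the glob expands in sorted order, so years are appended in order.
    gunzip -c "$input_dir"/*.gz > "$output_file"
}
```

For example, with the downloaded files in a directory called ncdc (again, a made-up name), `make_sample ncdc sample.txt` would write the combined records to sample.txt.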

Anyhow, I am using Eclipse to code the MaxTemperature example, so I created a project for MaxTemperature. To add the jar files so that Eclipse and Java will recognize the org.apache.hadoop.* packages,
Right Click on the project > Properties > Java Build Path > Libraries
Click the "Add External JARs..." option, and add these files from the Hadoop installation directory:
  • share/hadoop/common/hadoop-common-2.X.Y.jar
  • share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.X.Y.jar
After that, the org.apache.hadoop.* packages will be appropriately recognized.
