Big Data

Hadoop Standalone Mode

By default, Hadoop is configured to run in non-distributed mode, as a single Java process. In this mode HDFS is not used; Hadoop reads from and writes to the local file system instead. This makes standalone mode useful for debugging.

The following example shows how to run the Maximum Temperature example in standalone mode. First, download the jar file and a weather sample file. Create an input folder and put the sample file inside it. Then run the following command:

$ bin/hadoop jar maxtemp.jar \
  heigvd.bda.labs.weather.MaxTemperature input output

The output is written to the given output folder. This folder must not exist when you run the command; if it does, Hadoop refuses to start the job. You can check the output with:

$ cat output/*
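The preparation steps described above (create the input folder, copy the sample file into it, make sure the output folder is absent) can be sketched as a small shell helper. This is only a sketch: the function name prepare_job_dirs and the sample file name are hypothetical, and it assumes that results from a previous run may be discarded.

```shell
# Hypothetical helper (not part of the lab material): stage a sample file
# into the input folder and remove any stale output folder before a run.
prepare_job_dirs() {
    sample_file="$1"            # path to the downloaded weather sample (assumed name)
    input_dir="${2:-input}"     # defaults match the folders used in this section
    output_dir="${3:-output}"

    mkdir -p "$input_dir" || return 1
    cp "$sample_file" "$input_dir"/ || return 1

    # Hadoop will not start a job whose output folder already exists,
    # so delete the results of any previous run (assumed to be disposable).
    rm -rf "$output_dir"
}
```

For example, prepare_job_dirs sample.txt followed by the bin/hadoop command above starts from a clean state on every run.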

The source code of the Maximum Temperature example is taken from Hadoop: The Definitive Guide, 3rd Edition, by Tom White, and is available at https://s3.amazonaws.com/mse-bda-data/codes/weather.zip.

Maven

We use Maven to manage dependencies, build the project, and create the jar file.

You can compile the project using:

$ mvn clean compile

You can create the project jar file using:

$ mvn package

The jar file will be created in the target folder.
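Since the exact jar name depends on the project's pom.xml, which is not shown here, a small helper can look the jar up in the target folder instead of hard-coding its name. The following is a minimal sketch; the function name find_project_jar is hypothetical, and it simply returns the first jar it finds.

```shell
# Hypothetical helper (not part of the lab material): print the path of the
# first jar found in a Maven target folder, or fail with a hint.
find_project_jar() {
    target_dir="${1:-target}"
    for jar in "$target_dir"/*.jar; do
        # If nothing matches, the pattern is left unexpanded, so check that
        # the candidate really is a file before printing it.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    echo "no jar found in $target_dir; run 'mvn package' first" >&2
    return 1
}

# Possible usage, assuming the hadoop launcher is reachable as bin/hadoop:
#   bin/hadoop jar "$(find_project_jar)" \
#     heigvd.bda.labs.weather.MaxTemperature input output
```

If the build produces several jars, the helper picks the first one in glob order, so pass the jar path explicitly in that case.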

To run the project using Maven:

$ mvn exec:java -Dexec.mainClass="heigvd.bda.labs.weather.MaxTemperature" -Dexec.args="input output"

Eclipse

You can create an Eclipse project using:

$ mvn eclipse:eclipse

Once the project is created, you can import it into Eclipse:

In Eclipse, choose Import... -> Existing Projects into Workspace and select the project folder.

To compile the project using Eclipse:

Run Configuration -> New Maven Build

Define the goals: clean compile package

To run the project using Eclipse:

Run Configuration -> New Java Application

Define the main class (for example heigvd.bda.labs.weather.MaxTemperature) and the program arguments (for example input output).

Finally, you can attach the Hadoop sources to your Eclipse project:

Project properties -> Java Build Path -> Libraries

Select the library you want to attach the source or Javadoc to (for Hadoop, this is M2_REPO/org/apache/hadoop/hadoop-core/1.0.3/hadoop-core-1.0.3.jar) and expand it. You will see a list like:

Source Attachment: (none)
Javadoc location: (none)
Native library location: (none)
Access rules: (No restrictions)

Select Source Attachment -> Edit -> Variable -> New -> Folder

Add the Hadoop folder, then choose the Hadoop source folder as the extension.

You can now open the source code of a Hadoop class by selecting it and pressing F3.