Big Data

Preparation for using Hadoop

We will use Apache Hadoop to implement MapReduce jobs and HDFS as a distributed file system in some of the labs. Hadoop supports a single-node setup, which is sufficient for testing and debugging MapReduce programs.

Requirements

We require the following platform and software for the labs:

  1. GNU/Linux or Mac OS X platform

  2. Java 1.7.x, preferably from Oracle

  3. ssh and rsync (already included in Mac OS X)

  4. Hadoop version 1.0.3

  5. Maven 3.0.5

  6. Eclipse 4.4
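You can check from a terminal whether the required tools are installed and which versions they are. A minimal sketch (exact output format varies by system and vendor):

```shell
# Check that the required tools are available and report their versions.
java -version               # should report a 1.7.x JVM
ssh -V                      # ssh client version
rsync --version | head -n 1 # rsync version
mvn -version                # should report Maven 3.0.5
```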

Download

Hadoop version 1.0.3 can be downloaded from http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz

Configuration

Unpack the Hadoop distribution:

$ tar -xvzf hadoop-1.0.3.tar.gz

In the distribution, edit the file conf/hadoop-env.sh to define JAVA_HOME to be the root of your Java installation.
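For example, the relevant line in conf/hadoop-env.sh could look as follows; the JDK path shown is only an illustration for Oracle JDK 7 on Linux and must be adjusted to your system:

```shell
# conf/hadoop-env.sh -- point JAVA_HOME at the root of the Java installation.
# Example path for Oracle JDK 7 on Linux; adjust for your machine.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

# On Mac OS X, the installed JDK root can instead be resolved with:
# export JAVA_HOME=$(/usr/libexec/java_home)
```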

Try the following command from the hadoop directory:

$ bin/hadoop

This will display the usage documentation for the hadoop script.
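Beyond printing the usage message, you can verify the installation by running one of the example jobs bundled with the distribution in standalone mode. A sketch, run from the hadoop-1.0.3 directory (the jar name is assumed to match the 1.0.3 distribution):

```shell
# Copy the XML config files into an input directory and run the bundled
# grep example over them in standalone (local) mode.
mkdir input
cp conf/*.xml input
bin/hadoop jar hadoop-examples-1.0.3.jar grep input output 'dfs[a-z.]+'

# Inspect the matches written to the output directory.
cat output/*
```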

Hadoop can be started in one of three supported modes:

  1. Local (standalone) mode

  2. Pseudo-distributed mode

  3. Fully-distributed mode

By default, Hadoop is configured to run in local (standalone) mode, as a single Java process, which is sufficient for testing and debugging.

Reference

Hadoop documentation can be found in the Hadoop folder: ${HADOOP_HOME}/docs/single_node_setup.pdf.