Preparation for using Hadoop
We will use Apache Hadoop to implement MapReduce jobs and HDFS as a distributed file system for some of the labs. Hadoop supports a single-node setup, which makes it easy to test and debug MapReduce programs locally.
Requirements
The labs require the following platform and software:
- GNU/Linux or Mac OS X platform
- Java 1.7.x, preferably from Oracle
- ssh and rsync (already included in Mac OS X)
- Hadoop version 1.0.3
- Maven 3.0.5
- Eclipse 4.4
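Before installing Hadoop, it is worth confirming that the required tools are already on the PATH. A minimal sketch (the command names are the standard ones; adjust them if your distribution packages the tools differently):

```shell
# Check that each required tool is on the PATH and report any that are
# missing. The command names (java, ssh, rsync, mvn) are assumptions
# based on the standard package names.
missing=""
for cmd in java ssh rsync mvn; do
  command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
done
if [ -z "$missing" ]; then
  echo "all prerequisites found"
else
  echo "missing:$missing"
fi
```

Running `java -version` and `mvn -version` afterwards confirms that the installed versions match the requirements above.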
Download
Hadoop version 1.0.3 can be downloaded from http://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz
Configuration
Unpack the Hadoop distribution:
$ tar -xvzf hadoop-1.0.3.tar.gz
In the distribution, edit the file conf/hadoop-env.sh to define JAVA_HOME to be the root of your Java installation.
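For example, on a Linux machine with the Oracle JDK the line might look as follows (the path below is an assumption; point it at your own Java installation):

```shell
# In conf/hadoop-env.sh -- the path below is only an example and must
# be adapted to wherever your JDK is actually installed.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
```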
Try the following command from the hadoop directory:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
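As a quick smoke test in standalone mode, the example jar bundled with the distribution can be run against the unpacked configuration files (the jar name and paths below assume an unmodified hadoop-1.0.3 directory):

```shell
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-1.0.3.jar grep input output 'dfs[a-z.]+'
$ cat output/*
```

This runs a MapReduce job entirely in the local process and writes its result to the output directory, so it exercises the installation without any cluster configuration.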
Hadoop can be started in one of three supported modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
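For the single-node labs, Pseudo-Distributed Mode is the usual choice: each Hadoop daemon runs in a separate Java process on one machine. A minimal configuration sketch for Hadoop 1.x follows; the host and port values are the conventional defaults and can be adapted:

```xml
<!-- conf/core-site.xml: where HDFS (the NameNode) is reachable -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml: a single node can only keep one replica -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml: where the JobTracker is reachable -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```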
Reference
Hadoop documentation can be found in the Hadoop folder: ${HADOOP_HOME}/docs/single_node_setup.pdf.