Running a MapReduce program on Cloudera QuickStart VM

Requirements:
1. Cloudera QuickStart VM must be installed and running on the system.

Creating a Java project:

1. Start your virtual machine and open Eclipse, which comes pre-installed with the Cloudera QuickStart VM.
2. Right-click in the Package Explorer window and select New > Java Project. See Figure 1.

Figure 1: Creating JAVA Project

3. Enter the name of the project in the Project Name field and click Next. See Figure 2.
Figure 2: Creating JAVA Project

4. Click the “Libraries” tab and select the Add External JARs option. See Figure 3.
Figure 3: Creating JAVA Project

5. The external jars that need to be added are located in two directories:
/usr/lib/hadoop/client-0.20
/usr/lib/hadoop/lib

6. Open the client-0.20 directory inside the hadoop directory. It contains many jar files. Not all of them are required, but to be on the safe side, add all of them to your project. See Figure 4.
Figure 4: Creating JAVA Project

7. Select all the jar files and click OK.

8. One more jar file, located in /usr/lib/hadoop/lib, still needs to be added. Click Add External JARs again and go to the directory:
/usr/lib/hadoop/lib

9. Find and select the jar file named commons-httpclient-3.1.jar. See Figure 5.
Figure 5: Creating JAVA Project

10. All the jar files required to run your MapReduce program have now been added to the project. Click Finish to complete the process.

11. A new project with the name entered in the Project Name field is created and appears in the “Package Explorer” window. The next step is to add the Java classes to this project.
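
Before adding any MapReduce code, it can help to confirm that the build path is set up. The class below is a sketch and not part of the original steps: it only tries to load a few Hadoop classes that MapReduce programs commonly import. The class name ClasspathCheck and the list of classes are illustrative choices. If a class is reported as missing when this is run from the new project, the jar that provides it is probably not on the build path.

```java
// Sanity check for the build path set up above. The Hadoop classes listed
// in main are an illustrative choice, not an official required set.
class ClasspathCheck {

    // True if the class can be found by this class's loader. Passing
    // initialize=false avoids running static initializers; LinkageError
    // covers classes whose own dependencies are missing from the classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] hadoopClasses = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.fs.Path",
            "org.apache.hadoop.mapreduce.Job",
            "org.apache.hadoop.mapreduce.Mapper",
            "org.apache.hadoop.mapreduce.Reducer"
        };
        for (String name : hadoopClasses) {
            System.out.println((isOnClasspath(name) ? "found:   " : "MISSING: ") + name);
        }
    }
}
```

Run it with Run As > Java Application; every line should start with "found:" once the jars are on the build path.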

If source files are available:

1. Open a source file and find the package name in the package declaration at the top of the code.
2. In the Package Explorer section of Eclipse, go to your project, right-click src, go to New and select Package.
3. In the Name field of the “New Java Package” window, enter the same package name as the one written in the source code.
4. Next, add the source files to the package by selecting all of them and dragging and dropping them onto the newly created package.
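
The reason the package name must match is that Eclipse expects each source file to sit in the folder under src that corresponds to its package declaration, with dots becoming folder separators. The hypothetical helper below (the name PackageFinder is an assumption for illustration) reads the package declaration from the text of a source file and derives that folder. It avoids Java 8 features so that it also compiles on older JDKs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the package declared at the top of a Java source
// file and the folder under src/ where Eclipse expects that file to live.
class PackageFinder {

    // Matches a line such as "package com.example.wordcount;".
    private static final Pattern PACKAGE_DECL =
        Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);

    // Returns the declared package name, or null if the file has no package
    // declaration (that is, it belongs to the default package).
    static String packageOf(String source) {
        Matcher m = PACKAGE_DECL.matcher(source);
        return m.find() ? m.group(1) : null;
    }

    // Maps a package name to its source folder: dots become folder separators.
    static String folderFor(String packageName) {
        return "src/" + packageName.replace('.', '/');
    }

    public static void main(String[] args) {
        String source = "// WordCount.java\npackage com.example.wordcount;\n\npublic class WordCount { }\n";
        String pkg = packageOf(source);
        if (pkg == null) {
            System.out.println("default package: place the file directly under src/");
        } else {
            System.out.println(pkg + " -> " + folderFor(pkg));
        }
    }
}
```

For the sample source in main, this prints com.example.wordcount -> src/com/example/wordcount.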

If source files are not available, create new ones:

1. Right-click src, go to New and select Package.
2. Enter a relevant package name in the Name field. By Java convention, package names are lowercase and start with a reversed domain name, for example com.<packagename>. The compiler does not enforce this convention, so no error is raised if it is not followed.
3. Right-click the newly created package and select New > Class.
4. Enter the name of the class in the Name field and click Finish.
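
A new class will usually hold one of the three parts that a MapReduce program is built from: the Driver (the class with the main method), the Mapper or the Reducer. Hadoop's own Mapper and Reducer base classes need the jars added earlier, so the sketch below instead simulates the same map, shuffle and reduce flow of a word count in plain Java. It illustrates the data flow only; it does not use the Hadoop API, and the class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the map -> shuffle -> reduce flow of a word count.
// It does NOT use the Hadoop API; all names here are hypothetical.
class WordCountSketch {

    // Mapper role: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Reducer role: add up all the counts collected for one word (key).
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver role: map every line, group the values by key (the "shuffle";
    // the TreeMap also sorts the keys, as Hadoop does before reducing),
    // then call reduce once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                List<Integer> values = grouped.get(pair.getKey());
                if (values == null) {
                    values = new ArrayList<>();
                    grouped.put(pair.getKey(), values);
                }
                values.add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("the quick brown fox", "the lazy dog");
        System.out.println(run(input));
    }
}
```

Running it prints {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}. In a real Hadoop program the grouping step is done by the framework, so only the map and reduce logic and the job setup in the driver are written by hand.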

Running the MapReduce program in standalone mode:

Once the project is created and the source files are added to it, you can run the program in standalone mode, in which Hadoop runs the whole job in a single Java process and reads from the local file system. Follow these steps:
1. Right-click the project, select Run As > Run Configurations, and click the “New Launch Configuration” button in the upper-left corner. See Figure 6 and Figure 7.

Figure 6: Running MapReduce


Figure 7: Running MapReduce

2. Enter the name of the class containing the main method in the “Main class” field, or click the Search button and select the main class. Classes are listed in the format <driver class> - <package name>. See Figure 8. Once you select the correct entry, <packagename>.<classname> appears in the “Main class” field. See Figure 9.
Figure 8: Running MapReduce
Figure 9: Running MapReduce

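3. If the program takes its input as command-line arguments, switch to the “Arguments” tab and enter them in the “Program arguments” box, in the order the program expects (for example, the input path followed by the output directory).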

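4. Ensure that the input file exists at the path given in the arguments. Relative paths are resolved against the working directory set on the “Arguments” tab, which by default is the project folder inside the “workspace” folder.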

5. Click Run.
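
If the arguments are entered in the wrong order, or a relative input path does not resolve where expected, the job fails with errors that can be hard to read. The sketch below shows one way a driver's main method might check its arguments in standalone mode, where the paths refer to the local file system. It assumes the program takes exactly two arguments, an input path followed by an output directory; that order and the class name DriverArgs are illustrative assumptions, not something Hadoop requires. It also rejects an existing output directory, because Hadoop's FileOutputFormat refuses to write into an output directory that already exists.

```java
import java.io.File;

// Hypothetical argument check for a driver's main method, assuming the
// program is called as: <input path> <output directory>.
class DriverArgs {

    // Relative paths are resolved against the given working directory, which
    // for an Eclipse launch is the run configuration's working directory.
    static File resolve(String path, File workingDir) {
        File file = new File(path);
        return file.isAbsolute() ? file : new File(workingDir, path);
    }

    // Returns a description of the first problem found, or null if the
    // arguments look usable.
    static String validate(String[] args, File workingDir) {
        if (args.length != 2) {
            return "expected 2 arguments: <input path> <output directory>, got " + args.length;
        }
        File input = resolve(args[0], workingDir);
        File output = resolve(args[1], workingDir);
        if (!input.exists()) {
            return "input not found: " + input.getAbsolutePath();
        }
        if (output.exists()) {
            // Hadoop refuses to write into an existing output directory.
            return "output directory already exists: " + output.getAbsolutePath();
        }
        return null;
    }

    public static void main(String[] args) {
        // user.dir holds the JVM's working directory.
        File workingDir = new File(System.getProperty("user.dir"));
        String problem = validate(args, workingDir);
        if (problem != null) {
            System.err.println(problem);
            System.exit(1);
        }
        System.out.println("arguments OK; the MapReduce job would be configured and submitted here");
    }
}
```

Printing the absolute paths in the error messages makes it easy to see which working directory Eclipse actually used.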

Running the MapReduce program in pseudo-distributed mode:

1. Before the program can run in pseudo-distributed mode, the compiled class files must be packaged into a jar file. Right-click the project and select Export. See Figure 10.

Figure 10: Running MapReduce in pseudo-distributed mode.


2. The “Export” window appears. Expand the Java option, select JAR file, and click Next. See Figure 11.

Figure 11: Running MapReduce in pseudo-distributed mode.

3. In the “Select the export destination:” field, enter the folder where the jar file should be stored and the name of the jar file. Any destination folder and file name can be used. Click Next. See Figure 12.
Figure 12: Running MapReduce in pseudo-distributed mode

4. Click Finish. The export may take some time. When it completes, a dialog box appears indicating that the jar export has finished. Click OK. The jar file with the specified name is created in the chosen folder. See Figure 13.
Figure 13: Running MapReduce in pseudo-distributed mode

5. If your program requires an input file, the file must be stored on HDFS (the Hadoop Distributed File System). Copy a file from the local file system to HDFS with the following command:
hdfs dfs -put <local input file path> <destination path in HDFS>

6. Go to the directory where the jar file is stored:
cd <path to the jar directory>

7. Ensure that your jar file contains the Driver, Mapper and Reducer class files, then run it with the following command:
hadoop jar <jar_file_name> <packagename>.<driver_class> <arguments>
Here <packagename>.<driver_class> is the fully qualified name of the class containing the main method. The arguments are typically the path of the input file on HDFS and the path of an HDFS directory where the output will be stored; the output files generated by the program are also stored on HDFS. Ensure that no directory with the name given as the output folder exists before running the jar file, because the job fails if the output directory already exists. An old output directory can be removed with:
hdfs dfs -rm -r <output directory>
8. After the MapReduce job completes, check the output files generated in the specified HDFS directory using HDFS read commands, for example:
hdfs dfs -ls <output directory>
hdfs dfs -cat <output directory>/part-*
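
Step 7 asks you to make sure the jar contains the Driver, Mapper and Reducer classes. The JDK's jar tf <jar_file_name> command lists a jar's contents; the sketch below does the same check from Java using the standard java.util.jar API and reports any expected class that is missing. The class name JarCheck and its command-line usage are hypothetical. Mapper and Reducer classes written as nested classes appear with a $ in their names (for example, Outer$Map).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the classes inside an exported jar and reports
// which of the expected classes (e.g. Driver, Mapper, Reducer) are missing.
class JarCheck {

    // Converts a jar entry name such as "com/example/WordCount$Map.class"
    // into a class name ("com.example.WordCount$Map"); returns null for
    // entries that are not class files (e.g. META-INF/MANIFEST.MF).
    static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")) {
            return null;
        }
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Reads every entry of the jar and collects the class names it contains.
    static List<String> classNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String className = toClassName(entries.nextElement().getName());
                if (className != null) {
                    names.add(className);
                }
            }
        }
        return names;
    }

    // Usage: java JarCheck <jar file> <expected class> [<expected class> ...]
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: JarCheck <jar file> <expected class>...");
            return;
        }
        List<String> present = classNames(args[0]);
        for (int i = 1; i < args.length; i++) {
            String status = present.contains(args[i]) ? "present: " : "MISSING: ";
            System.out.println(status + args[i]);
        }
    }
}
```

The fully qualified driver name reported as present is the value to pass to hadoop jar after the jar file name.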
