Running A Mapreduce Program On Cloudera Quickstart VM: Requirements
Requirements:
1. Cloudera QuickStart VM must be installed and running on the system.
1. Start your virtual machine and open Eclipse, which comes pre-installed with the Cloudera QuickStart VM.
2. Right-click in the Package Explorer window and select New Java Project. See Figure 1.
3. Enter the name of the project in the Project Name field and click Next. See Figure 2.
Figure 2: Creating Java Project
4. Click the “Libraries” tab and select the Add External JARs option. See Figure 3.
Figure 3: Creating Java Project
5. The external jars which need to be added are present in the following directories:
/usr/lib/hadoop/client-0.20 and
/usr/lib/hadoop/lib
6. Select the client-0.20 directory inside the hadoop directory. Many jar files are present in this directory. Not all of them are required, but to be on the safe side it is advisable to add all the jar files to your project. See Figure 4.
Figure 4: Creating Java Project
7. Find and select the jar file named commons-httpclient-3.1.jar. See Figure 5.
Figure 5: Creating Java Project
8. All the jar files required to run your MapReduce program have now been added to the project. Click Finish to complete the process.
9. A new project with the name specified in the Project Name field is created and appears in the “Package Explorer” window. Now it is time to link the Java classes to this project.
1. Open a source file and find the package name at the top of the code.
2. In the Package Explorer section of Eclipse, go to your project, right-click on src, go to New and select Package.
3. In the Name field of the “New Java Package” window, enter the same package name as the one written in the source code.
4. Next, add the source files to the package by selecting them all and dragging and dropping them into the newly created package.
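For example, assuming the package created above was named wordcount (a hypothetical name), every source file dropped into it must begin with a matching package declaration. A minimal sketch:

```java
// Hypothetical example: "wordcount" is an assumed package name; it must match
// the Eclipse package the file is placed in under src, or Eclipse reports an
// error.
package wordcount;

public class WordCountDriver {
    public static void main(String[] args) {
        // Eclipse refers to the class by its fully qualified name,
        // i.e. <package name>.<class name>.
        System.out.println(WordCountDriver.class.getName());
    }
}
```

Running this class prints the fully qualified name, the same form that appears later in the run configuration.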
Creating new source files if source files are not available: right-click the newly created package, go to New and select Class, then type the source code into the file that Eclipse creates.
Once the project is created and the source files are added to it, you are ready to run the program in standalone mode. Follow these steps:
1. Right-click on the project, select “Run As” → “Run Configurations…”, and click the “New launch configuration” button in the upper left corner. See Figure 6 and Figure 7.
2. Enter the name of the class containing the main function in the “Main Class” field. Click the Search button and select the main class. It is displayed in the format <driver class> - <package name>. See Figure 8. Once you select the correct option, <package name>.<class name> appears in the “Main Class” field. See Figure 9.
Figure 8: Running MapReduce
Figure 9: Running MapReduce
3. If the program takes its input as arguments, switch to the “Arguments” tab and enter the required arguments in the correct sequence.
4. Ensure that the input file exists in the package folder inside the “workspace”
folder.
5. Click Run.
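To see how the values typed into the “Arguments” tab reach the program, here is a minimal, Hadoop-free sketch; the default file names used here are hypothetical:

```java
// Minimal sketch (no Hadoop dependencies) of how the space-separated values
// entered in the "Arguments" tab arrive in main(); the default file names
// below are hypothetical.
public class ArgsDemo {
    static String describe(String[] args) {
        // In a typical MapReduce driver, args[0] is the input path and
        // args[1] is the output path, in the order entered in the tab.
        String input = args.length > 0 ? args[0] : "input.txt";
        String output = args.length > 1 ? args[1] : "output";
        return "input=" + input + ", output=" + output;
    }

    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

In a real driver these two values would be handed on to the job's input and output path settings, which is why their order in the Arguments tab matters.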
1. The class files obtained after compilation need to be packaged into a jar file before the program can be run in pseudo-distributed mode. Right-click on the project and select Export. See Figure 10.
2. Specify the export destination where the jar file is to be stored, together with the name of the jar file, in the “Select the export destination:” field. There is no constraint on the choice of export destination or the name of the jar file. Click Next. See Figure 12.
Figure 12: Running MapReduce in Standalone mode
3. Click Finish. Compilation may take some time. When the process is complete, a dialog box appears indicating that the jar export has finished. Click OK. The jar file with the specified name is created in the chosen folder. See Figure 13.
Figure 13: Running MapReduce in standalone mode
4. If your program requires an input file, the file should be stored on HDFS (Hadoop Distributed File System). The command for copying a file from the local file system to HDFS is as follows:
hdfs dfs -put <input file with path> <path in HDFS>
5. Go to the directory where the jar file is stored, using the following command:
cd <path>
6. Run the jar file stored in the folder using the following command. Before doing so, ensure that your jar file contains the Driver, Mapper and Reducer class files.
hadoop jar <jar_file_name> <driver_class_path> <arguments>
The arguments are typically the path of the input file stored on HDFS and the path of a directory on HDFS where the output is to be stored. The output files generated by the program will also be stored on HDFS. Ensure that no directory with the name specified for the output folder exists before running the jar file; otherwise the job fails.
7. After completion of the MapReduce job, check the output files generated in the specified directory on HDFS using read commands such as hdfs dfs -cat.
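The command-line steps above can be sketched as one shell sequence. Every name below is hypothetical (wordcount.jar, wordcount.WordCountDriver, the /user/cloudera paths and the workspace folder). The hdfs and hadoop commands only work on the QuickStart VM, so they are shown commented, with the assembled job command echoed at the end:

```shell
# All names here are hypothetical: replace the jar, driver class and HDFS
# paths with the ones from your own project.
JAR=wordcount.jar
DRIVER=wordcount.WordCountDriver
INPUT=/user/cloudera/input.txt
OUTPUT=/user/cloudera/wc_output

# On the QuickStart VM the full sequence would be:
#   hdfs dfs -put input.txt "$INPUT"                # copy the input to HDFS
#   cd ~/workspace/WordCount                        # go to the jar's folder
#   hadoop jar "$JAR" "$DRIVER" "$INPUT" "$OUTPUT"  # run the job
#   hdfs dfs -cat "$OUTPUT/part-r-00000"            # read the generated output
echo "hadoop jar $JAR $DRIVER $INPUT $OUTPUT"
```

Note that the output directory named in OUTPUT must not already exist on HDFS when the job is launched.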