Hadoop Exams
8) What is HDFS?
A. HDFS is a regular file system like any other file system, and you can
perform any operations on HDFS
B. HDFS is a layered file system on top of your native file system, and you
can do all the operations you want
C. HDFS is a layered file system which modifies the local file system in such
a way that you can perform any operations
D. HDFS is a layered file system on top of your local file system which does
not modify the local file system, and there are some restrictions with
respect to the operations which you can perform
10) When you put files on HDFS, where does HDFS store its
blocks?
A. On HDFS
B. On the Name Node's local file system
C. On the Data Nodes' local file systems
D. Blocks are placed on both the Name Node's and the Data Nodes' local file
systems, so that if a Data Node goes down, the Name Node is able to
replicate the data from its own local file system
A. You will search each of the data nodes and ask each data node for its list
of blocks. Then you check each of the blocks and read the appropriate
block
B. You will ask the Name Node, and since the Name Node has the meta
information, it will read the data from the data nodes and give the file back
to you
C. You will ask the Name Node, and since the Name Node has the meta
information, it will give you the list of data nodes which are hosting the
blocks; then you go to each of those data nodes and read the blocks
D. You will directly read the files from HDFS
14) What is not true about Local Job Runner Mode? (Choose two)
15) What is the command you will use to run a driver named
SalesAnalysis whose compiled code is available in a jar file
SalesAnalytics.jar, with input data in the directory /sales/data
and output in the directory /sales/analytics?
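A typical invocation (assuming the driver class is SalesAnalysis, packaged in
SalesAnalytics.jar, and that it takes the input and output paths as its two
arguments) would be:

    hadoop jar SalesAnalytics.jar SalesAnalysis /sales/data /sales/analytics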
16) A map-reduce program takes a text file where each line
is considered one complete record and the line offset is the key.
The map method parses the record into words, and for each word
it creates multiple key-value pairs where the keys are the words
themselves and the values are the characters in the word. The
reducer finds the characters used for each unique word. This
program may not be a perfect program, but it works correctly.
The problem this program has is that it creates many key-value
pairs in the intermediate output of the mappers from a single
input key-value pair. This leads to an increase in which of the
following? (Select the correct answer)
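For reference, a minimal sketch of the mapper described above (class and
method names here are illustrative, not taken from the question):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits one (word, character) pair per character of every word,
    // so a single input record fans out into many intermediate pairs.
    public class WordCharMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String word : line.toString().split("\\s+")) {
                for (char c : word.toCharArray()) {
                    context.write(new Text(word), new Text(String.valueOf(c)));
                }
            }
        }
    }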
A. The reduce-side join is faster as it receives the records sorted by keys.
B. The reduce-side join is faster as it uses secondary sort.
C. The map-side join is faster as it caches the data from one file in memory.
D. The map-side join is faster as it writes the intermediate data to the local
file system.
20) You want to run 2 different jobs which may use the same lookup
data (for example, US state codes). While submitting the first job
you used the distributed cache to copy the lookup data file to
each data node. Both jobs have a mapper configure method
where the distributed file is retrieved programmatically and the
values are cached in a hash map. Both jobs use ToolRunner
so that the file for the distributed cache can be provided at the
command prompt. You run the first job with the file passed to the
distributed cache. When the job is complete, you fire the second
job without passing the lookup file to the distributed cache. What
is the consequence? (Select one)
A. The first job runs but the second job fails. This is because the distributed
cache is persistent only as long as the job is not complete. After the job is
complete, the distributed cache gets removed.
B. The first and second jobs complete without any problem, as distributed
caches, once set, are permanently copied.
C. The first and second jobs will be successfully completed if the number of
reducers is set to zero, because the distributed cache works only with map-only
jobs.
D. Both jobs are successful if they are chained using ChainMapper or
ChainReducer, because the distributed cache only works with ChainMapper or
ChainReducer.
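As a point of reference, when the driver uses ToolRunner, the cache file is
usually supplied per job with the generic -files option (jar, class, and file
names below are illustrative):

    hadoop jar lookup-job.jar LookupDriver -files us_state_codes.txt /input /output

The file has to be passed again for every job that needs it; the localized
copies are scoped to a single job, which is why the second job above fails.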
21) You have just executed a MapReduce job. Where is the
intermediate data written after being emitted from the mapper's
map method?
A. Have your system administrator copy the JAR to all nodes in the cluster
and set its location in the HADOOP_CLASSPATH environment variable
before you submit your job.
B. Have your system administrator place the JAR file on a web server
accessible to all cluster nodes and then set the HTTP_JAR_URL environment
variable to its location.
C. When submitting the job on the command line, specify the -libjars option
followed by the JAR file path.
D. Package your code and the Apache Commons Math library into a single zip
file.
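Option C normally looks like this on the command line; it requires the driver
to use ToolRunner/GenericOptionsParser so that the generic options are parsed
(jar and class names are illustrative):

    hadoop jar my-job.jar MyDriver -libjars commons-math3-3.6.1.jar /input /output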
A. Run all the nodes in your production cluster as virtual machines on your
development workstation.
B. Run the Hadoop command with the -jt local and -fs local options.
C. Run the DataNode, TaskTracker, JobTracker and NameNode daemons on
a single machine.
D. Run simpldoop, Apache open-source software for simulating a Hadoop
cluster.
24) You are developing a MapReduce job for reporting. The
mapper will process input keys representing the year (int
27)You have submitted a job on an input file which has 400 input
splits in HDFS. How many map tasks will run?
A. At most 400.
B. At least 400.
C. Between 400 and 1200.
D. Between 100 and 400.
B. Any programming language that can comply with the MapReduce concept
can be supported.
C. Only Java is supported, since Hadoop was written in Java.
D. Currently MapReduce supports Java, C, C++ and COBOL.
31) If you run the word count MapReduce program with m mappers
and r reducers, how many output files will you get at the end of
the job, and how many key-value pairs will there be in each file?
Assume k is the number of unique words in the input files.
A. There will be r files, each with exactly k/r key-value pairs.
B. There will be r files, each with approximately k/m key-value pairs.
C. There will be r files, each with approximately k/r key-value pairs.
D. There will be m files, each with exactly k/m key-value pairs.
E. There will be m files, each with approximately k/m key-value pairs.
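As a sanity check on the arithmetic, assuming the default hash partitioner:
each of the k unique words hashes to exactly one of the r reducers, and
hashing spreads keys roughly (not exactly) evenly, so the job produces r
output files with approximately k/r pairs each. For example, k = 1,000,000
unique words and r = 10 reducers gives about 100,000 pairs per file.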
E. Input file splits may cross line boundaries. A line that crosses file splits
is read by the RecordReader of the split that contains the beginning of the
broken line.
B. The values are arbitrarily ordered, and the ordering may vary from run to
run of the same MapReduce job.
C. The values are arbitrarily ordered, but multiple runs of the same
MapReduce job will always have the same ordering.
D. Since the values come from mapper outputs, the reducers will receive
contiguous sections of sorted values.
38) You write a MapReduce job to process 100 files in HDFS. Your
MapReduce algorithm uses TextInputFormat and the
IdentityReducer. The mapper applies a regular expression over
input values and emits key-value pairs with the key consisting of
the matching text, and the value containing the filename and
byte offset. Determine the difference between setting the number
of reducers to zero and setting it to one.
A. There is no difference in output between the two settings.
B. With zero reducers, no reducer runs and the job throws an exception; with
one reducer, instances of matching patterns are stored in a single file on
HDFS.
C. With zero reducers, all instances of matching patterns are stored in
multiple files on HDFS.
D. With zero reducers, instances of matching patterns are stored in multiple
files on HDFS. With one reducer, all instances of matching patterns are
collected in one file on HDFS.
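For context, the number of reducers is set on the Job object in the driver; a
minimal sketch (variable names illustrative):

    Job job = Job.getInstance(conf, "regex-match");  // org.apache.hadoop.mapreduce.Job
    job.setNumReduceTasks(0);  // mappers write directly to HDFS: one output file per map task
    // job.setNumReduceTasks(1);  // all map output is shuffled to one reducer: a single output file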
39) You use the hadoop fs -put command to write a 300 MB file
using an HDFS block size of 64 MB. Just after this command has
finished writing 200 MB of this file, what would another user see
when trying to access this file?
A. They would see no content until the whole file is written and closed.
B. They would see the content of the file through the last complete block.
C. They would see the current state of the file, up to the last bit written by
the command.
D. They would see Hadoop throw a ConcurrentFileAccessException when
they try to access this file.
41) Your cluster has 10 DataNodes, each with a single 1 TB hard
drive. You utilize all your disk capacity for HDFS, reserving none
for MapReduce. You implement the default replication settings.
What is the storage capacity of your Hadoop cluster (assuming no
compression)?
A. About 3 TB
B. About 5 TB
C. About 10 TB
D. About 11 TB
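As a quick check of the arithmetic: 10 DataNodes x 1 TB = 10 TB of raw disk,
and the default replication factor of 3 stores every block three times, so the
usable HDFS capacity is roughly 10 TB / 3, or about 3.3 TB.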
43) You have written a MapReduce job that processes 500 million
input records and generates 500 million key-value pairs. The data
is not uniformly distributed. Your MapReduce job will create a
significant amount of intermediate data that it needs to transfer
between mappers and reducers, which is a potential bottleneck. A
custom implementation of which of the following interfaces is
most likely to reduce the amount of intermediate data transferred
across the network?
A. Writable
B. WritableComparable
C. InputFormat
D. OutputFormat
E. Combiner
F. Partitioner
44) How does the NameNode detect that a DataNode has failed?
A. The NameNode does not need to know that a DataNode has failed.
B. When the NameNode fails to receive periodic heartbeats from the
DataNode, it considers the DataNode to have failed.
C. The NameNode pings the DataNode. If the DataNode does not respond,
the NameNode considers the DataNode failed.
D. When HDFS starts up, the NameNode tries to communicate with the
DataNodes and considers a DataNode failed if it does not respond.
48) If you want to load a lookup table which will be used by all
map tasks, what is the best way to do it?
A. Copy the file into HDFS and, using the Hadoop API, initialize the values in
each mapper task.
B. Create a hash map of lookup values and pass it to the JobConf object in
the driver code as a known parameter. Then access it using the parameter
name in each map task.
C. Copy the lookup table to each node using the distributed cache during the
submission of the job (or programmatically in driver code). Then access it
during the configure method of the mapper tasks.
D. It is not possible to use a lookup table in mapper tasks.
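A minimal sketch of option C using the newer MapReduce API (the file path is
hypothetical; older code uses the DistributedCache class and the configure
method instead of setup):

    // Driver (main declared with throws Exception; URI is java.net.URI):
    job.addCacheFile(new URI("/lookup/us_state_codes.txt"));

    // Mapper:
    @Override
    protected void setup(Context context) throws IOException {
        URI[] cacheFiles = context.getCacheFiles();  // localized on each node for this job
        // open cacheFiles[0] and load the values into an in-memory HashMap
    }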
49) One large data set has few keys, but each key has a large
number of occurrences in the data. A single reducer may not be
able to process the whole data set, so you decided to create one
reducer task per key range. What component will you use to
make sure each key is processed by the appropriate reducer?
A. Combiner
B. OOZIE
C. PIG
D. Total Order Partitioner
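For reference, wiring up the TotalOrderPartitioner means pointing it at a
partition file that defines the key-range split points (the path is
illustrative; the file is often generated with InputSampler):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

    job.setPartitionerClass(TotalOrderPartitioner.class);
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
            new Path("/tmp/partitions.lst"));  // split points defining the key ranges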
50) You want to count the number of occurrences of each unique
word in the supplied input data. You have decided to implement
this by having your mapper tokenize each word and emit a literal
value 1, and then have your reducer increment a counter for each
literal 1 it receives. After successful implementation, it occurs to
you that you could optimize this by specifying a combiner. Will
you be able to use your existing reducer as your combiner, and
why or why not?
A. Yes, because the sum operation is both associative and commutative, and
the input and output types of the reduce method match.
B. No, because the sum operation in the reducer is incompatible with the
operation of a combiner.
C. No, because combiners and reducers use different interfaces.
D. No, because the mapper and combiner must use the same input data types.
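Reusing the reducer as the combiner is a one-line change in the driver; a
sketch with an illustrative class name:

    job.setReducerClass(IntSumReducer.class);
    job.setCombinerClass(IntSumReducer.class);  // valid because summing is associative and
                                                // commutative and reduce input/output types match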
52) You are developing a combiner that takes as input Text keys
and IntWritable values, and emits Text keys and IntWritable
values. Which interface should your class implement?
A. Combiner<Text, IntWritable, Text, IntWritable>
B. Reducer<Text, Text, IntWritable, IntWritable>
C. Reducer<Text, Text, IntWritable, IntWritable>
D. Reducer<Text, IntWritable, Text, IntWritable>
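In the newer org.apache.hadoop.mapreduce API a combiner is simply a Reducer
whose input and output types match the map output types (in the older mapred
API, Reducer is an interface you implement); a minimal sketch:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));  // key/value types match in and out
        }
    }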
55) What is the maximum limit for the number of key-value pairs
that a mapper can emit?
A. It is equivalent to the number of lines in the input files.
B. It is equivalent to the number of times the map() method is called in the
mapper task.
C. There is no such restriction. It depends on the use case and logic.
D. 10000
57) You need to move a large file titled weblog into HDFS.
When you try to copy the file, you can't. When you verify the
cluster's storage you see that there is ample space available.
Which action should you take to relieve this situation and store
more files in HDFS?
A. Combiner
B. Partitioner
C. Comparator
D. Reducer
E. All of the above
A. Task Tracker
B. Job Tracker
C. Name Node
D. Data Node
E. All of the above
60) Which of the following is used for copying files from one
cluster to another cluster?
A. Shell script
B. Java program
C. DistCp
D. None of the above
E. All of the above
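For reference, a typical DistCp invocation between two clusters (NameNode
host names and paths are illustrative):

    hadoop distcp hdfs://nn1:8020/source/path hdfs://nn2:8020/destination/path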