Big Data Solution Assignment-I


Quiz Assignment-I Solutions: Big Data Computing (Week-1)

___________________________________________________________________________

Q.1 ________________ is responsible for allocating system resources to the various applications running in a Hadoop cluster and scheduling tasks to be executed on different cluster nodes.

A. Hadoop Common
B. Hadoop Distributed File System (HDFS)
C. Hadoop YARN
D. Hadoop MapReduce

Answer: C) Hadoop YARN

Explanation:

Hadoop Common: It contains libraries and utilities needed by other Hadoop modules.

Hadoop Distributed File System (HDFS): It is a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the entire cluster.

Hadoop YARN: It is a resource management platform responsible for managing compute resources in the cluster and using them to schedule users' applications. YARN is responsible for allocating system resources to the various applications running in a Hadoop cluster and scheduling tasks to be executed on different cluster nodes.

Hadoop MapReduce: It is a programming model for processing and generating large data sets in parallel across the cluster.
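
To make the MapReduce programming model concrete, here is a minimal local sketch (plain Python, no cluster) of the map, shuffle, and reduce steps using word count; the sample records and function names are illustrative assumptions, not part of the course material.

# wordcount_sketch.py - a local, in-memory sketch of the MapReduce programming
# model (map -> shuffle/group by key -> reduce), using word count as the example.
# It only illustrates the model; on a real cluster YARN would schedule the
# map and reduce tasks across different nodes.
from collections import defaultdict

def map_phase(record):
    # Emit (word, 1) for every word in one input record (a line of text).
    for word in record.split():
        yield word, 1

def reduce_phase(word, counts):
    # Sum all partial counts that were shuffled to this key.
    return word, sum(counts)

records = ["big data computing", "big data platforms", "data"]

# Shuffle: group intermediate (key, value) pairs by key.
groups = defaultdict(list)
for record in records:
    for word, count in map_phase(record):
        groups[word].append(count)

for word, counts in sorted(groups.items()):
    print(reduce_phase(word, counts))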

Q. 2 Which of the following tools is designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases?

A. Pig
B. Mahout
C. Apache Sqoop
D. Flume

Answer: C) Apache Sqoop

Explanation: Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
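
As a rough illustration of how such a bulk transfer is typically launched, the following sketch builds a Sqoop import command from Python; the JDBC URL, table name, and target directory are hypothetical placeholders, not values from the quiz.

# sqoop_import_sketch.py - launches a Sqoop import of one relational table into
# HDFS. Host, database, table, and target directory are assumed placeholders.
import subprocess

cmd = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://db.example.com/sales",  # assumed JDBC URL
    "--username", "etl_user",                          # assumed database user
    "--table", "orders",                               # assumed source table
    "--target-dir", "/data/orders",                    # HDFS output directory
    "--num-mappers", "4",                              # parallel map tasks
]
subprocess.run(cmd, check=True)
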
Q. 3 _________________is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of log data.

A. Flume
B. Apache Sqoop
C. Pig
D. Mahout

Answer: A) Flume
Explanation: Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and very flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms, failover, recovery, and other mechanisms that keep the cluster safe and reliable. It uses a simple, extensible data model that allows us to build all kinds of online analytic applications.
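
For a feel of that streaming-data-flow architecture, here is a minimal sketch that writes out a hypothetical Flume agent configuration (a netcat source feeding a memory channel that drains into a logger sink); all agent, source, channel, and sink names are assumptions for illustration.

# flume_agent_sketch.py - writes a minimal, hypothetical Flume agent config:
# one netcat source, one in-memory channel, one logger sink.
FLUME_CONF = """\
a1.sources  = r1
a1.channels = c1
a1.sinks    = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.channels.c1.type = memory

a1.sinks.k1.type = logger

a1.sources.r1.channels = c1
a1.sinks.k1.channel    = c1
"""

with open("flume-agent.conf", "w") as f:
    f.write(FLUME_CONF)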

Q. 4 _______________refers to the connectedness of big data.

A. Value
B. Veracity
C. Velocity
D. Valence

Answer: D) Valence

Explanation: Valence refers to the connectedness of big data, such as data represented in the form of graph networks.

Q. 5 Consider the following statements:

Statement 1: Volatility refers to the data velocity relative to the timescale of the event being studied

Statement 2: Viscosity refers to the rate of data loss and the stable lifetime of data

A. Only statement 1 is true
B. Only statement 2 is true
C. Both statements are true
D. Both statements are false

Answer: D) Both statements are false

Explanation: The correct statements are:

Statement 1: Viscosity refers to the data velocity relative to the timescale of the event being studied

Statement 2: Volatility refers to the rate of data loss and the stable lifetime of data

Q. 6 ________________refers to the biases, noise and abnormality in data, trustworthiness
of data.

A. Value
B. Veracity
C. Velocity
D. Volume

Answer: B) Veracity

Explanation: Veracity refers to the biases, noise, and abnormality in data, i.e., the trustworthiness of the data.

Q. 7 _____________ brings scalable parallel database technology to Hadoop and allows users to submit low-latency queries to the data stored within HDFS or HBase without requiring a lot of data movement and manipulation.

A. Apache Sqoop
B. Mahout
C. Flume
D. Impala

Answer: D) Impala

Explanation: Impala was designed at Cloudera, and it is a query engine that runs on top of Apache Hadoop. The project was officially announced at the end of 2012 and became a publicly available, open-source distribution. Impala brings scalable parallel database technology to Hadoop and allows users to submit low-latency queries to the data stored within HDFS or HBase without requiring a lot of data movement and manipulation.
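
As an illustration of submitting such a low-latency query, the sketch below uses the impyla client library from Python; the daemon host, port, and table name are assumptions for the example, not part of the quiz.

# impala_query_sketch.py - runs a low-latency query against Impala using the
# impyla client library. Host, port, and table are hypothetical.
from impala.dbapi import connect

conn = connect(host="impala-daemon.example.com", port=21050)  # assumed Impala daemon
cur = conn.cursor()
# Query data already stored in HDFS/HBase-backed tables; no bulk data movement.
cur.execute("SELECT region, COUNT(*) FROM web_logs GROUP BY region")
for region, hits in cur.fetchall():
    print(region, hits)
cur.close()
conn.close()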

Q. 8 True or False?
NoSQL databases store unstructured data with no particular schema.
A. True
B. False

Answer: A) True
Explanation: While traditional SQL can be used effectively to handle large amounts of structured data, we need NoSQL (Not Only SQL) to handle unstructured data. NoSQL databases store unstructured data with no particular schema.
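
To make "no particular schema" concrete, the following sketch stores two differently shaped documents in the same MongoDB collection using pymongo; the connection URI and collection name are illustrative assumptions.

# nosql_schema_sketch.py - illustrates schema-less storage: two documents with
# different fields live side by side in the same MongoDB collection.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local MongoDB
events = client["demo"]["events"]

events.insert_one({"user": "alice", "action": "login", "device": "mobile"})
events.insert_one({"sensor": "s-17", "reading": 21.4})   # different shape, same collection

for doc in events.find():
    print(doc)
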
Q. 9 ____________ is a highly reliable distributed coordination kernel, which can be used for distributed locking, configuration management, leader election, work queues, etc.

A. Apache Sqoop
B. Mahout
C. ZooKeeper
D. Flume

Answer: C) ZooKeeper

Explanation: ZooKeeper is a central key-value store that distributed systems can use to coordinate. Since it needs to be able to handle the load, ZooKeeper itself runs on many machines.
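
As a small illustration of the distributed locking mentioned above, the sketch below uses the kazoo client library for ZooKeeper; the ensemble address and lock path are assumptions for the example.

# zk_lock_sketch.py - acquires a distributed lock via ZooKeeper using the kazoo
# client library, so only one process at a time runs the critical section.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")   # assumed ZooKeeper ensemble address
zk.start()

lock = zk.Lock("/locks/nightly-job", identifier="worker-1")  # hypothetical lock path
with lock:                                  # blocks until the lock is granted
    print("only one worker runs this section at a time")

zk.stop()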

Q. 10 True or False?
MapReduce is a programming model and an associated implementation for processing and generating large data sets.

A. True
B. False

Answer: A) True
___________________________________________________________________________
