Big Data Solution Assignment-I
___________________________________________________________________________
A. Hadoop Common
B. Hadoop Distributed File System (HDFS)
C. Hadoop YARN
D. Hadoop MapReduce
Explanation:
Hadoop Common: It contains libraries and utilities needed by other Hadoop modules.
Hadoop Distributed File System (HDFS): It is a distributed file system that stores data on
commodity machines, providing very high aggregate bandwidth across the entire cluster.
Hadoop MapReduce: It is a programming model for processing large data sets in parallel
across many different processes.
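The MapReduce programming model mentioned above can be illustrated with a small word-count sketch. This is plain Python that runs in a single process, not Hadoop code; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster the map and reduce calls would run on many machines and the framework would handle the grouping step.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling word_count on a list of lines returns a dictionary that maps each lowercased word to the number of times it appears.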
Q. 2 Which of the following tools is designed for efficiently transferring bulk data between
Apache Hadoop and structured datastores such as relational databases?
A. Pig
B. Mahout
C. Apache Sqoop
D. Flume
Answer: C) Apache Sqoop
Explanation: Apache Sqoop is a tool designed for efficiently transferring bulk data between
Apache Hadoop and structured datastores such as relational databases.
Q. 3 _________________is a distributed, reliable, and available service for efficiently
collecting, aggregating, and moving large amounts of log data.
A. Flume
B. Apache Sqoop
C. Pig
D. Mahout
Answer: A) Flume
Explanation: Flume is a distributed, reliable, and available service for efficiently collecting,
aggregating, and moving large amounts of log data. It has a simple and very flexible
architecture based on streaming data flows. It is robust and fault tolerant, with tunable
reliability mechanisms for failover and recovery that keep the cluster safe and reliable. It
uses a simple, extensible data model that allows for online analytic applications.
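The streaming data flow described above can be sketched as a toy pipeline. Flume agents are usually described in terms of a source, a channel, and a sink; the code below imitates that shape in plain Python and is not the Flume API or its configuration format. The names Channel, run_source, and run_sink are hypothetical.

```python
from collections import deque


class Channel:
    """Buffer that holds events between the source and the sink."""

    def __init__(self):
        self._events = deque()

    def put(self, event):
        self._events.append(event)

    def take(self):
        """Return the oldest buffered event, or None when the channel is empty."""
        return self._events.popleft() if self._events else None


def run_source(lines, channel):
    """Wrap each incoming log line as an event and place it on the channel."""
    for line in lines:
        channel.put({"body": line.strip()})


def run_sink(channel, store):
    """Drain the channel into the destination list; return the number of events delivered."""
    delivered = 0
    while (event := channel.take()) is not None:
        store.append(event["body"])
        delivered += 1
    return delivered
```

Because the channel sits between the two ends, the source and the sink do not have to run at the same speed, which is the main point the sketch tries to show.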
A. Value
B. Veracity
C. Velocity
D. Valence
Answer: D) Valence
Explanation: Valence refers to the connectedness of big data, such as in the form of graph
networks.
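One way to put a number on connectedness is the ratio of actual connections to all possible connections between data items. The sketch below assumes that definition and an undirected graph; the function name valence and its parameters are hypothetical, and other measures of connectedness are possible.

```python
def valence(num_items, connections):
    """Ratio of distinct undirected connections to all possible pairs of items."""
    possible = num_items * (num_items - 1) // 2
    if possible == 0:
        return 0.0
    # Treat (a, b) and (b, a) as the same connection and ignore self-loops.
    distinct = {frozenset(edge) for edge in connections if edge[0] != edge[1]}
    return len(distinct) / possible
```

A value near 0 means the items are mostly isolated, while a value near 1 means almost every pair of items is directly connected.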
Statement 1: Volatility refers to the data velocity relative to the timescale of the event being studied.
Statement 2: Viscosity refers to the rate of data loss and the stable lifetime of data.
Statement 1: Viscosity refers to the data velocity relative to the timescale of the event being studied.
Statement 2: Volatility refers to the rate of data loss and the stable lifetime of data.
Q. 6 ________________refers to the biases, noise and abnormality in data, trustworthiness
of data.
A. Value
B. Veracity
C. Velocity
D. Volume
Answer: B) Veracity
Explanation: Veracity refers to the biases, noise and abnormality in data, and the
trustworthiness of data.
A. Apache Sqoop
B. Mahout
C. Flume
D. Impala
Answer: D) Impala
Explanation: Impala was designed at Cloudera, and it is a query engine that runs on top of
Apache Hadoop. The project was officially announced at the end of 2012, and became a
publicly available, open source distribution. Impala brings scalable parallel database
technology to Hadoop and allows users to submit low-latency queries to the data that is
stored within HDFS or HBase without requiring a ton of data movement and manipulation.
Q. 8 True or False?
NoSQL databases store unstructured data with no particular schema
A. True
B. False
Answer: A) True
Explanation: While traditional SQL can be effectively used to handle large amounts of
structured data, we need NoSQL (Not Only SQL) to handle unstructured data. NoSQL
databases store unstructured data with no particular schema.
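The idea of storing data with no particular schema can be shown with a toy in-memory document store. This is only a sketch of the concept and does not correspond to any real NoSQL product; the DocumentStore class and its insert and find methods are hypothetical.

```python
class DocumentStore:
    """Toy schema-less store: each document is a dict with whatever fields it has."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, document):
        """Store a copy of the document and return its generated id."""
        doc_id = self._next_id
        self._docs[doc_id] = dict(document)
        self._next_id += 1
        return doc_id

    def find(self, **criteria):
        """Return every document whose fields match all of the given criteria."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]
```

Two documents in the same store can carry completely different fields, for example a tweet with a text field next to a log record with level and host fields, and no table definition has to be changed to accept either one.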
Q. 9 ____________is a highly reliable distributed coordination kernel, which can be used for
distributed locking, configuration management, leadership election, work queues, etc.
A. Apache Sqoop
B. Mahout
C. ZooKeeper
D. Flume
Answer: C) ZooKeeper
Explanation: ZooKeeper is a central key-value store that distributed systems can use to
coordinate. Since it needs to be able to handle the load, ZooKeeper itself runs on many
machines.
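Leadership election, one of the uses named in the question, can be sketched with a toy coordinator. The common ZooKeeper recipe has each member create a sequentially numbered node, and the member holding the lowest number acts as leader; the code below imitates that rule in memory within one process. It is not the ZooKeeper client API, and ToyCoordinator with its join, leave, and leader methods is hypothetical.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service; not the ZooKeeper API."""

    def __init__(self):
        self._nodes = {}   # sequence number -> member name
        self._counter = 0

    def join(self, member):
        """Register a member and return the sequence number assigned to it."""
        self._counter += 1
        self._nodes[self._counter] = member
        return self._counter

    def leave(self, seq):
        """Remove a member, as would happen when its session ends."""
        self._nodes.pop(seq, None)

    def leader(self):
        """The member with the lowest sequence number leads; None if nobody has joined."""
        return self._nodes[min(self._nodes)] if self._nodes else None
```

When the current leader leaves, the member with the next-lowest sequence number takes over without any further negotiation, which is why the numbering scheme is attractive for elections.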
Q. 10 True or False?
MapReduce is a programming model and an associated implementation for processing and
generating large data sets.
A. True
B. False
Answer: A) True
___________________________________________________________________________