Big Data Computing - Assignment 3
Assessment submitted.
Thank you for taking the Week 3: Assignment-3.
Week 3: Assignment-3
Your last recorded submission was on 2021-09-06, 06:41 IST. Due date: 2021-09-15, 23:59 IST.
1) In Spark, a ______________________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.
1 point
Spark Streaming
FlatMap
Driver
Resilient Distributed Dataset (RDD)
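Question 1 describes a read-only, partitioned collection that can be rebuilt after a partition is lost. As a rough illustration of that idea, here is a toy model in plain Scala, assuming no Spark dependency: `ToyDataset`, `parallelize`, and `recoverPartition` are hypothetical names, not Spark's API. Each derived dataset records how to recompute its partitions from its parent (its lineage), so a lost partition is rebuilt by replaying that recipe rather than restored from a copy.

```scala
// Toy model of lineage-based recovery (hypothetical names, not Spark's API).
object LineageSketch {
  // An immutable dataset split into partitions. `recompute`, when present,
  // rebuilds partition i from the parent dataset (the recorded lineage).
  final case class ToyDataset[A](
      partitions: Vector[Vector[A]],
      recompute: Option[Int => Vector[A]] = None
  ) {
    // Transformations never modify this dataset; they return a new one that
    // remembers how to rebuild each of its partitions from this one.
    def map[B](f: A => B): ToyDataset[B] =
      ToyDataset(
        partitions.map(_.map(f)),
        Some((i: Int) => this.recoverPartition(i).map(f))
      )

    // Simulates losing partition i: derived datasets replay their lineage,
    // while a source dataset falls back to its (assumed durable) input data.
    def recoverPartition(i: Int): Vector[A] =
      recompute match {
        case Some(rebuild) => rebuild(i)
        case None          => partitions(i)
      }
  }

  // Splits the input into roughly equal partitions.
  def parallelize[A](data: Seq[A], numPartitions: Int): ToyDataset[A] = {
    val size = math.max(1, math.ceil(data.size.toDouble / numPartitions).toInt)
    ToyDataset(data.grouped(size).map(_.toVector).toVector)
  }

  def main(args: Array[String]): Unit = {
    val base = parallelize(1 to 6, 3)    // partitions: [1,2] [3,4] [5,6]
    val squared = base.map(x => x * x)   // partitions: [1,4] [9,16] [25,36]
    println(squared.recoverPartition(1)) // rebuilt from lineage: Vector(9, 16)
  }
}
```

Spark itself records lineage as dependencies between datasets and recomputes only the lost partitions; this sketch models a single chain of `map` steps only.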
2) Given the following definition of the join transformation in Apache Spark:
1 point
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
The join operation is used to join two datasets. When it is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.
val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
val joinrdd = rdd1.join(rdd2)
Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))
https://onlinecourses.nptel.ac.in/noc21_cs86/unit?unit=33&assessment=94 1/3
9/6/21, 6:41 AM Big Data Computing - - Unit 5 - Week-3
Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))
Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
None of the mentioned
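The join semantics in Question 2 (every key present in both inputs yields every (v, w) combination; keys present on only one side are dropped) can be modeled on plain Scala collections. This is a sketch, not Spark's implementation: `JoinSketch.join` is a hypothetical helper, and the left dataset below is made up because the question's `rdd1` is not reproduced here. The right dataset has the same contents as `rdd2`.

```scala
// Plain-Scala model of the pair-RDD join semantics (hypothetical helper):
// for every key present in BOTH inputs, emit every (v, w) combination.
object JoinSketch {
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    // Group the right side by key so each left record can find its matches.
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2) }
    for {
      (k, v) <- left
      w <- rightByKey.getOrElse(k, Seq.empty) // no match => record dropped
    } yield (k, (v, w))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical left dataset (the quiz's rdd1 is not reproduced here).
    val left = Seq(("m", 1), ("m", 2), ("x", 3))
    // Same contents as rdd2 in the question.
    val right = Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64))
    join(left, right).foreach(println)
    // Prints (m,(1,60)), (m,(1,65)), (m,(2,60)), (m,(2,65)), one per line;
    // "x", "s" and "h" have no partner on the other side, so they vanish.
  }
}
```

Spark's `join` is computed across partitions, so the order of elements in `joinrdd.collect()` is not guaranteed to match the order printed by this sketch; only the set of pairs is comparable.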
3) Consider the following statements:
1 point
Statement 1: Spark improves efficiency through in-memory computing primitives and general computation graphs.
Statement 2: Spark improves usability through high-level APIs in Java, Scala, Python and also provides an interactive shell.
Only statement 1 is true
Only statement 2 is true
Both statements are true
Both statements are false
4) True or False?
1 point
True
False
HBase
Cassandra
SQL Server
None of the mentioned
6) True or False?
1 point
Apache Spark can potentially run batch-processing programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.
True
False
MLlib
Spark Streaming
GraphX
RDDs
MLlib
Spark streaming
GraphX
All of the mentioned
9) Point out the incorrect statement in the context of Cassandra:
1 point
It is a centralized key-value store
It was originally designed at Facebook
It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing
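Question 9 mentions a ring-based DHT. A minimal sketch of ring placement, assuming a token range of 0 to 99 and `String.hashCode` as the hash (Cassandra's default partitioner uses Murmur3 over a much larger token range): each node owns one token, a key is stored on the first node at or after the key's token walking clockwise, and further replicas go to the following nodes. `RingSketch` and its methods are hypothetical. Because every node in this model holds the full token map, placement is a local lookup, with no Chord-style finger-table routing.

```scala
// Toy ring placement in the spirit of a ring-based DHT. NOT Cassandra's
// partitioner: the token range 0..99 and String.hashCode are assumptions.
object RingSketch {
  val RingSize = 100

  // Maps a key onto the ring (hypothetical hash choice).
  def token(key: String): Int = Math.floorMod(key.hashCode, RingSize)

  // Each node owns one token; replicas are the first `rf` nodes found walking
  // clockwise from token t, wrapping past the top of the ring.
  def replicasForToken(nodes: Map[String, Int], t: Int, rf: Int): Vector[String] = {
    val ring = nodes.toVector.sortBy(_._2) // (node, token) in ring order
    val (atOrAfter, before) = ring.partition(_._2 >= t)
    (atOrAfter ++ before).map(_._1).take(rf)
  }

  def replicasFor(nodes: Map[String, Int], key: String, rf: Int): Vector[String] =
    replicasForToken(nodes, token(key), rf)

  def main(args: Array[String]): Unit = {
    val nodes = Map("A" -> 10, "B" -> 40, "C" -> 70, "D" -> 90)
    println(replicasForToken(nodes, 55, 2)) // Vector(C, D)
    println(replicasForToken(nodes, 95, 2)) // wraps around: Vector(A, B)
    println(replicasFor(nodes, "user42", 3))
  }
}
```

No node in this model acts as a coordinator for placement: any node holding the same token map computes the same replica list for a key.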
10) Consider the following statements:
1 point
Statement 1: Scale out means grow your cluster capacity by replacing with more powerful machines.
Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS machines (Components Off the Shelf).
Only statement 1 is true
Only statement 2 is true
Both statements are false
Both statements are true
You may submit any number of times before the due date. The final submission will be
considered for grading.