
Big Data Computing - Assignment 3

This document contains a 10 question quiz about Spark, distributed computing concepts, and NoSQL databases for the Week 3 assignment of the Big Data Computing course. The questions cover topics like RDDs, Spark APIs, Spark Streaming, GraphX, Cassandra, and scaling strategies. The document provides the questions, possible answer options, and a submission button to record responses before the due date.

Uploaded by

VarshaMega
Copyright
© All Rights Reserved

9/6/21, 6:41 AM Big Data Computing - - Unit 5 - Week-3

Assessment submitted.

Thank you for taking the Week 3: Assignment-3.

Week 3: Assignment-3
Your last recorded submission was on 2021-09-06, 06:41 IST. Due date: 2021-09-15, 23:59 IST.

1) In Spark, a ______________________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. 1 point

Spark Streaming

FlatMap

Driver

Resilient Distributed Dataset (RDD)

2) Given the following definition of the join transformation in Apache Spark: 1 point

                   def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

Here, the join operation joins two datasets. When it is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.

Output the result of joinrdd when the following code is run.

                   val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
                   val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
                   val joinrdd = rdd1.join(rdd2)
                   joinrdd.collect

Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))

Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))

Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))

None of the mentioned
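
The join semantics described in question 2 can be tried out without a Spark cluster. The following is a minimal sketch that emulates the behaviour on plain Scala sequences; it is not Spark's implementation. The object name JoinSketch and the sample data are hypothetical and deliberately differ from the data in the question.

```scala
object JoinSketch {
  // Emulates an inner join of two key-value collections on plain Scala Seqs.
  // Every left value is paired with every right value that has the same key;
  // keys present on only one side produce no output.
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, v)  <- left
      (k2, w) <- right
      if k == k2
    } yield (k, (v, w))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data: "b" appears only on the left, "c" only on the right.
    val left  = Seq(("a", 1), ("a", 2), ("b", 3))
    val right = Seq(("a", 10), ("c", 30))
    println(join(left, right)) // prints List((a,(1,10)), (a,(2,10)))
  }
}
```

In this sketch the result follows the iteration order of the inputs. With a real RDD, the order of the elements returned by collect depends on how the data is partitioned, so only the set of pairs should be compared.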

3) Consider the following statements in the context of Spark: 1 point

Statement 1: Spark improves efficiency through in-memory computing primitives and general
computation graphs.

Statement 2: Spark improves usability through high-level APIs in Java, Scala, Python and also
provides an interactive shell.


Only statement 1 is true

Only statement 2 is true

Both statements are true

Both statements are false

4) True or False? 1 point

Resilient Distributed Datasets (RDDs) are fault-tolerant and immutable.


True

False

5) Which of the following is not a NoSQL database? 1 point


HBase

Cassandra

SQL Server

None of the mentioned

6) True or False? 1 point

Apache Spark can potentially run batch-processing programs up to 100 times faster than Hadoop
MapReduce in memory, or 10 times faster on disk.


True

False

7) ______________ leverages Spark Core's fast scheduling capability to perform streaming analytics. 1 point


MLlib

Spark Streaming

GraphX

RDDs

8) ____________________ is a distributed graph processing framework on top of Spark. 1 point


MLlib

Spark Streaming

GraphX

All of the mentioned

9) Point out the incorrect statement in the context of Cassandra: 1 point


It is a centralized key-value store

It was originally designed at Facebook

It is designed to handle large amounts of data across many commodity servers,
providing high availability with no single point of failure

It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing
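
The ring-based placement mentioned in the last option can be illustrated with a toy lookup. The following is a minimal sketch, assuming a hypothetical ring of 100 positions with three nodes at fixed tokens; the object name RingSketch, the node names, the tokens, and the hash function are all illustrative assumptions and not Cassandra's actual partitioner.

```scala
object RingSketch {
  // Hypothetical ring with positions 0..99; each node sits at one token.
  val ringSize = 100
  val tokens: Vector[(Int, String)] =
    Vector(10 -> "node-A", 40 -> "node-B", 70 -> "node-C") // kept sorted by token

  // Toy hash that maps a key to a ring position (illustration only).
  def position(key: String): Int = java.lang.Math.floorMod(key.hashCode, ringSize)

  // The owner is the first node whose token is >= the position;
  // positions past the last token wrap around to the first node.
  def ownerOfPosition(p: Int): String =
    tokens.find { case (token, _) => token >= p }.getOrElse(tokens.head)._2

  def ownerOf(key: String): String = ownerOfPosition(position(key))

  def main(args: Array[String]): Unit = {
    println(ownerOfPosition(5))  // prints node-A
    println(ownerOfPosition(11)) // prints node-B
    println(ownerOfPosition(71)) // prints node-A (wraps around the ring)
  }
}
```

Because the whole token list is available locally in this sketch, the owner of a key is found with a single lookup rather than by forwarding the request from node to node, which is the sense in which no finger tables or routing are needed.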

10) Consider the following statements: 1 point

Statement 1: Scale out means grow your cluster capacity by replacing with more powerful
machines.

Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS
machines (Components Off the Shelf).


Only statement 1 is true

Only statement 2 is true

Both statements are false

Both statements are true

You may submit any number of times before the due date. The final submission will be
considered for grading.
Submit Answers

https://onlinecourses.nptel.ac.in/noc21_cs86/unit?unit=33&assessment=94