Questions On Google File System
Q1: Why was GFS created when there were so many other file systems already? What is the
rationale behind creating such a file system?
Ans1: GFS shares many goals with earlier distributed file systems, such as performance,
scalability, reliability, and availability. However, its design was driven by observations of
Google's application workloads and technological environment, both current and anticipated. This
reflected a marked departure from some earlier file system assumptions and led Google to
reexamine traditional choices and explore radically different design points.
Following are the main considerations that Google took into account while designing GFS:
i) GFS spreads a file's data across many storage servers, unlike AFS, which serves a file
from a single server.
ii) GFS uses simple file replication for redundancy, unlike RAID, where redundancy is
more complex to implement.
iii) GFS does not provide caching below the file system interface, while the Sun Network
File System (NFS) does.
iv) HDFS (Hadoop) is an open-source implementation of the Google File System written in Java.
It follows the same overall design, but differs in supported features and implementation
details:
a.) Does not support random writes
b.) Does not support appending to existing files
c.) Does not support multiple concurrent writers
Ans2: Modularity allows GFS to expand easily to accommodate increasing amounts of data and
users. The paper states that the system currently holds approximately 300 TB of data; however,
the system is designed so that adding more chunkservers can be accomplished without
significantly modifying the master server. Further, the decentralized method of data access, in
which applications exchange file data directly with chunkservers, alleviates significant
bottlenecks at the master. The combination of extensibility and sustained performance across
growing numbers of users and volumes of data means that the system scales well within the
Google context.
GFS is composed of clusters. A cluster is a set of networked computers. A GFS cluster contains
three types of interdependent entities: clients, a master, and chunkservers. Clients can be
computers or applications manipulating existing files or creating new files on the system. The
master server is the orchestrator or manager of the cluster and maintains the operation log. The
operation log keeps track of the metadata changes made by the master itself, which helps keep
service interruptions to a minimum. At startup, the master retrieves information about the
contents and inventories of the chunkservers; thereafter, it keeps track of the location of the
chunks within the cluster. The GFS architecture keeps the messages that the master server sends
and receives very small. The master itself does not handle file data at all; that is done by the
chunkservers. Chunkservers are the core engine of GFS. They store file chunks of 64 MB in size,
coordinate with the master server, and send requested chunks directly to clients.
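To make this division of labor concrete, here is a minimal Python sketch of the read path just
described. Every class and method name (Master, ChunkServer, Client.read, and so on) is an
illustrative assumption rather than the real GFS API; only the 64 MB chunk size and the rule that
file data never flows through the master come from the description above.

    CHUNK_SIZE = 64 * 1024 * 1024  # fixed 64 MB chunks, as in GFS

    class Master:
        """Holds only metadata: which chunks make up a file, and where they live."""
        def __init__(self):
            self.file_chunks = {}      # filename -> list of chunk handles
            self.chunk_locations = {}  # chunk handle -> list of chunkserver names

        def lookup(self, filename, chunk_index):
            # A tiny metadata reply; no file data ever flows through the master.
            handle = self.file_chunks[filename][chunk_index]
            return handle, self.chunk_locations[handle]

    class ChunkServer:
        def __init__(self):
            self.chunks = {}  # chunk handle -> bytes

        def read_chunk(self, handle, offset_in_chunk, length):
            return self.chunks[handle][offset_in_chunk:offset_in_chunk + length]

    class Client:
        def __init__(self, master, chunkservers):
            self.master = master
            self.chunkservers = chunkservers  # chunkserver name -> ChunkServer

        def read(self, filename, offset, length):
            # Assumes the read fits inside one chunk, for brevity.
            chunk_index = offset // CHUNK_SIZE  # 1. translate offset -> chunk index
            handle, locations = self.master.lookup(filename, chunk_index)  # 2. ask master
            server = self.chunkservers[locations[0]]  # 3. go straight to a chunkserver
            return server.read_chunk(handle, offset % CHUNK_SIZE, length)

Note that step 3 contacts a chunkserver directly, which is exactly why the master does not
become a bottleneck for file data.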
GFS replicas: GFS maintains two kinds of replicas: primary and secondary. A primary replica is
the chunk copy through which a chunkserver serves client mutations; secondary replicas serve as
backups on other chunkservers. The master server decides which replicas act as primary or
secondary. If a client makes changes to the data in a chunk, the master lets the chunkservers
holding secondary replicas know that they have to copy the new chunk from the primary
chunkserver to stay up to date.
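As a rough illustration of that update flow, the following Python sketch shows how one mutation
might reach all replicas: the data is first pushed to and buffered at every replica, then the
primary applies it first and the secondaries follow, so all copies end up in the same state. The
Replica class and its methods are simplified assumptions, not the actual GFS protocol (which
also involves chunk leases granted by the master).

    class Replica:
        def __init__(self):
            self.data = {}     # chunk handle -> committed bytes
            self.pending = {}  # chunk handle -> buffered, not-yet-applied bytes

        def push_data(self, handle, data):
            # Phase 1: data is pushed to every replica and only buffered.
            self.pending[handle] = data

        def apply(self, handle):
            # Phase 2: the buffered mutation becomes visible when applied.
            self.data[handle] = self.pending.pop(handle)

    def write_chunk(handle, data, primary, secondaries):
        for replica in [primary] + secondaries:
            replica.push_data(handle, data)  # push data to all replicas
        primary.apply(handle)                # primary commits first, fixing the order
        for replica in secondaries:
            replica.apply(handle)            # secondaries copy the primary's state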
Ans4: In GFS, each chunk is replicated on 3 different chunkservers by default, and the
replication factor can be increased later. So, in case of a node failure, there are still at least 2
nodes holding the same data as the failed node. A node failure is detected when the
datanode/chunkserver fails to send its heartbeat to the GFS master; when that happens, the GFS
master/namenode decreases the replica count for each of the failed node's blocks and then
re-replicates them on a different datanode/chunkserver.
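A minimal sketch of this detect-and-repair loop is below, assuming a master object that tracks
three dictionaries (last_heartbeat, chunks_on_server, replica_count); the timeout value and the
three helper methods at the end are hypothetical, while the heartbeat trigger, the replica-count
bookkeeping, and the default factor of three come from the text above.

    import time

    HEARTBEAT_TIMEOUT = 60.0  # seconds; an assumed value, not from the paper
    TARGET_REPLICAS = 3       # GFS's default replication factor

    def check_failures(master, now=None):
        now = time.time() if now is None else now
        for server, last_seen in list(master.last_heartbeat.items()):
            if now - last_seen <= HEARTBEAT_TIMEOUT:
                continue  # heartbeat arrived in time; server is alive
            # Missed heartbeat: treat the server as failed and repair its chunks.
            for handle in master.chunks_on_server.pop(server, []):
                master.replica_count[handle] -= 1
                if master.replica_count[handle] < TARGET_REPLICAS:
                    src = master.live_replica_of(handle)    # hypothetical helper
                    dst = master.least_loaded_server()      # hypothetical helper
                    master.schedule_copy(handle, src, dst)  # hypothetical helper
            del master.last_heartbeat[server]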
Master state is also replicated for reliability on multiple machines, using the operation log and
checkpoints.
i) If the master fails, GFS can start a new master process at any of these replicas and
update the DNS alias accordingly.
ii) "Shadow" masters also provide read-only access to the file system, even when the
primary master is down.
a. They read a replica of the operation log and apply the same sequence of changes.
b. They are not mirrors of the master; they lag the primary master by fractions of a second.
c. This means clients can still read up-to-date file contents while the master is in recovery!
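The recovery mechanism behind both points is the same: load the latest checkpoint, then replay
the operation log from that point on. Here is a small Python sketch of that replay loop; the
on-disk formats (a JSON checkpoint, one JSON record per log line) and the record fields are
invented for illustration, since the actual formats are not part of this summary.

    import json

    def apply_record(state, record):
        # Each log record describes one metadata mutation. The 'add_chunk' and
        # 'delete_file' operations and their fields are an illustrative format.
        if record["op"] == "add_chunk":
            state.setdefault(record["file"], []).append(record["handle"])
        elif record["op"] == "delete_file":
            state.pop(record["file"], None)

    def recover_master_state(checkpoint_path, log_path):
        # 1. Load the most recent checkpoint: a snapshot of the metadata.
        with open(checkpoint_path) as f:
            state = json.load(f)
        # 2. Replay, in order, every operation logged after the checkpoint.
        #    A shadow master runs this same loop continuously, which is why it
        #    lags the primary only by the records it has not yet replayed.
        with open(log_path) as f:
            for line in f:
                apply_record(state, json.loads(line))
        return state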