1. What is Distributed Data Processing?

A distributed file system (DFS) is a method of storing and accessing files based on a client/server architecture. In a distributed file system, one or more central servers store files that can be accessed, with the proper authorization rights, by any number of remote clients in the network.

Much like an operating system organizes files in a hierarchical file management system, a distributed file system uses a uniform naming convention and a mapping scheme to keep track of where files are located. When a client device retrieves a file from the server, the file appears as a normal file on the client machine, and the user can work with it in the same way as if it were stored locally on the workstation. When the user finishes working with the file, it is returned over the network to the server, which stores the now-altered file for later retrieval.
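The cycle described above (retrieve a file from the server, work on it as if it were local, then return it) can be made concrete with a small sketch. The following Python fragment is illustrative only and assumes a hypothetical server object exposing get_file and put_file calls; it is not the API of any particular distributed file system.

# Illustrative sketch of the DFS access pattern described above.
# The client fetches a file from a central server, works on a local copy,
# and returns the modified file to the server when finished.

from pathlib import Path

class DFSClient:
    def __init__(self, server):
        self.server = server  # hypothetical remote file server object

    def open_local_copy(self, remote_path: str, local_dir: str) -> Path:
        """Retrieve a remote file; to the user it looks like a normal local file."""
        data = self.server.get_file(remote_path)              # assumed server call
        local_path = Path(local_dir) / Path(remote_path).name
        local_path.write_bytes(data)
        return local_path

    def return_to_server(self, local_path: Path, remote_path: str) -> None:
        """Send the (possibly modified) file back to the server for later retrieval."""
        self.server.put_file(remote_path, local_path.read_bytes())  # assumed server call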

Distributed file systems can be advantageous because they make it easier to distribute documents to multiple clients and they provide centralized storage, so client machines do not have to use their own resources to store files.

[Beal, Vangie. (n.d.). Distributed File System. Webopedia. Retrieved December 14,
2020, from https://www.webopedia.com/TERM/D/distributed_file_system.html]

2. Watch on YouTube: “Google File System – Paper that inspired Hadoop”
https://www.youtube.com/watch?v=eRgFNW4QFDc

3. Illustrate and discuss the Google File System and its Components.

The Google File System (GFS) is essentially a distributed file store. Any given GFS cluster can contain hundreds or thousands of commodity servers, and the cluster provides an interface for any number of clients to read or write files. Conceptually, it works exactly like a file system, but one distributed over hundreds or thousands of servers.

A GFS cluster consists of a single master and multiple chunk servers that are continuously accessed by different client systems. Chunk servers store data as Linux files on local disks. Stored data is divided into large chunks (64 MB), which are replicated in the network a minimum of three times. The large chunk size reduces network overhead.
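To make the chunk arithmetic concrete, here is a minimal sketch; the 64 MB chunk size and three-way replication come from the description above, while the function and variable names are assumptions for illustration. A client turns a byte offset within a file into a chunk index before asking the master which chunk servers hold that chunk.

# Illustrative chunk arithmetic, assuming the figures given above.
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks
REPLICATION_FACTOR = 3          # each chunk stored on at least three servers

def locate(offset: int) -> tuple[int, int]:
    """Map a byte offset in a file to (chunk index, offset within that chunk)."""
    return offset // CHUNK_SIZE, offset % CHUNK_SIZE

# Example: byte 200,000,000 of a file lies in chunk 2 (the third chunk),
# roughly 62.7 MB past the start of that chunk.
chunk_index, within_chunk = locate(200_000_000)

# A 1 GB file therefore occupies 16 chunks and, with three replicas each,
# about 3 GB of raw disk across the cluster.
raw_bytes = 16 * CHUNK_SIZE * REPLICATION_FACTOR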
GFS is designed to accommodate Google's large cluster requirements without burdening applications. Files are stored in hierarchical directories identified by path names. Metadata - such as the namespace, access control data, and mapping information - is controlled by the master, which interacts with and monitors the status of each chunk server through timed heartbeat messages.
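A highly simplified sketch of the master's bookkeeping may help here. The data structures and names below are illustrative assumptions rather than the actual GFS implementation: the master keeps a namespace mapping path names to chunk handles, and a separate mapping from chunk handles to the chunk servers currently holding replicas.

# Illustrative sketch of master metadata (not the real GFS data structures).
from dataclasses import dataclass, field

@dataclass
class MasterMetadata:
    # namespace: full path name -> chunk handles, in file order
    namespace: dict[str, list[int]] = field(default_factory=dict)
    # chunk handle -> chunk servers currently holding a replica
    chunk_locations: dict[int, set[str]] = field(default_factory=dict)

    def chunk_for(self, path: str, chunk_index: int) -> tuple[int, set[str]]:
        """Answer a client lookup: which chunk handle, and which servers hold it?"""
        handle = self.namespace[path][chunk_index]
        return handle, self.chunk_locations[handle]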
GFS features include:
 Fault tolerance
 Critical data replication
 Automatic and efficient data recovery
 High aggregate throughput
 Reduced client and master interaction because of the large chunk size
 Namespace management and locking
 High availability

The largest GFS clusters have more than 1,000 nodes with over 300 TB of disk storage capacity, accessed by hundreds of clients on a continuous basis.

COMPONENTS
1) Commodity Hardware – Commodity servers are cheap and can be made to scale horizontally with the right software.
2) Google Large Files – The system is optimized for storing and reading large files, ranging from 100 MB to multiple GB.
3) File Operations – GFS is optimized for two kinds of operations: reads and appends. Google keeps appending newly crawled content to files and uses batch processing (large reads) to create the index.
4) Chunks – Each file is split into 64 MB chunks that are distributed across multiple machines.
5) Replicas – The Google File System ensures that each chunk of a file has at least three replicas on three different servers, so that even if one server goes down, the other two replicas are still available.
6) Google File System Master – A single master server that holds the metadata described above (the namespace, access control data, and file-to-chunk mapping) and coordinates the chunk servers.
7) Heartbeats – The chunk servers run on cheap, off-the-shelf commodity hardware and can go down for any number of reasons, so it is important that they pass periodic heartbeat messages to the master; as long as heartbeats keep arriving, the master knows the chunk server is still alive (a monitoring sketch follows this list).
8) Ensure Chunk Replica Count – If a chunk server goes down, the master ensures that every chunk that was stored on it is copied to other servers until the replica count is restored, as sketched after this list.
9) Operations Log – Each file operation, together with its timestamp and the details of the user who performed it, is recorded in the operations log.
10) Shadow Master – A shadow master provides read-only access to the file system metadata when the primary master is down. Files are identified by path names; the namespace, access control data, and mapping information are controlled by the master server, while each file is divided into fixed-size chunks stored by the chunk servers, and data transfer happens directly between clients and chunk servers.
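To make items 7 and 8 above concrete, the following sketch shows one way a master could use heartbeat timestamps to notice a dead chunk server and then restore the replica count of the chunks it held. The timings, data structures, and names are assumptions for illustration, not GFS code.

import time

HEARTBEAT_TIMEOUT = 60.0       # seconds of silence before a chunk server is presumed dead
REPLICATION_FACTOR = 3         # minimum replicas per chunk, as described above

last_heartbeat: dict[str, float] = {}      # chunk server -> time of its last heartbeat
chunk_locations: dict[int, set[str]] = {}  # chunk handle -> servers holding a replica

def record_heartbeat(server: str) -> None:
    """Called whenever a chunk server's periodic heartbeat reaches the master."""
    last_heartbeat[server] = time.time()

def handle_dead_servers() -> None:
    """Detect silent chunk servers and bring their chunks back up to full replication."""
    now = time.time()
    dead = [s for s, t in last_heartbeat.items() if now - t > HEARTBEAT_TIMEOUT]
    for server in dead:
        del last_heartbeat[server]
    for handle, servers in chunk_locations.items():
        servers -= set(dead)                                   # those replicas are gone
        missing = max(REPLICATION_FACTOR - len(servers), 0)
        spares = [s for s in last_heartbeat if s not in servers][:missing]
        for target in spares:
            # In a real system the master would instruct `target` to copy the chunk
            # from a surviving replica; here we only record the new location.
            servers.add(target)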
