Parallel Processing

This document discusses data-intensive computing systems, which process large volumes of data, often terabytes or petabytes in size, known as big data. It proposes a data-intensive computer system consisting of an HPC cluster, a massively parallel database, and an intermediate operating system layer to process petascale datasets. The operating system exploits parallelism in the database and optimizes data flow between the cluster and the database. A data-object-oriented operating system is proposed to support high-level data objects such as multi-dimensional arrays. User applications compile to code executing both on the cluster and inside the database. The system supports collaborative work, where large datasets are created and processed by many users.


Introduction

Data-intensive computing is a class of parallel computing applications which use a data-parallel approach to process large volumes of data, typically terabytes or petabytes in size, and typically referred to as big data. Computing applications which devote most of their execution time to computational requirements are deemed compute-intensive, whereas computing applications which require large volumes of data and devote most of their processing time to I/O and manipulation of data are deemed data-intensive. Scientific instruments, as well as simulations, generate increasingly large datasets, changing the way we do science. We propose that processing petascale datasets will be carried out in a data-intensive computer, a system consisting of an HPC cluster, a massively parallel database and an intermediate operating system layer. The operating system will run on dedicated servers and will exploit massive parallelism in the database, as well as numerous optimization strategies, to deliver high-throughput, balanced and regular data flow for I/O operations between the HPC cluster and the database. The programming model of sequential file storage is not appropriate for data-intensive computations, so we propose a data-object-oriented operating system, in which support for high-level data objects, such as multi-dimensional arrays, is built in. User application programs will be compiled into code that is executed both on the HPC cluster and inside the database. The data-intensive operating system is, however, non-local, so that user applications running on a remote PC will be compiled into code executing both on the PC and inside the database. This model supports a collaborative environment, in which a large data set is typically created and processed by a large group of users. We have implemented a software library, MPI-DB, which is a prototype of the data-intensive operating system. It is currently being used to ingest the output of a simulation of turbulent channel flow into the database.
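
MPI-DB's actual programming interface is not reproduced in this document, so the following is only a minimal sketch of the ingest idea under invented names: a simulation's 3-D array is cut into fixed-size blocks and each block is written as a row of a database table, with an in-memory SQLite database standing in for the massively parallel database.

    # Hedged sketch: ingesting a multi-dimensional simulation array by splitting it
    # into blocks. Function and table names are hypothetical; a real MPI-DB layer
    # would stream the blocks over MPI into a parallel database, not local SQLite.
    import sqlite3
    import numpy as np

    def ingest_array(db, name, data, block=16):
        """Cut a 3-D array into block**3 chunks and store each chunk as one row."""
        db.execute("CREATE TABLE IF NOT EXISTS blocks "
                   "(name TEXT, i INT, j INT, k INT, payload BLOB)")
        nx, ny, nz = data.shape
        for i in range(0, nx, block):
            for j in range(0, ny, block):
                for k in range(0, nz, block):
                    chunk = np.ascontiguousarray(data[i:i+block, j:j+block, k:k+block])
                    db.execute("INSERT INTO blocks VALUES (?, ?, ?, ?, ?)",
                               (name, i, j, k, chunk.tobytes()))
        db.commit()

    if __name__ == "__main__":
        db = sqlite3.connect(":memory:")       # stand-in for the parallel database
        velocity = np.random.rand(64, 64, 64)  # stand-in for one simulation time step
        ingest_array(db, "channel_flow_step_0001", velocity)
        print(db.execute("SELECT COUNT(*) FROM blocks").fetchone()[0], "blocks stored")

Storing fixed-size blocks keyed by their grid offsets is what would later allow the operating system layer to fetch only the sub-volumes that a query actually touches.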

Examples
1. The large-scale structure of the Universe
Contemporary research in astrophysics has deep and important connections to particle physics. Observations of large structures in the universe led physicists to the discovery of dark matter and dark energy, and understanding these new forms of matter will change our view of the universe on all scales, including the particle scale and the human scale. Theoretical developments in astrophysics must be tested against vast amounts of data collected by instruments such as the Hubble Space Telescope, as well as against the results of supercomputer simulation experiments like the Millennium Run [5]. These data sets are available in public databases and are being mined by scientists to gain intuition and to make new discoveries, but the researchers are limited by the technological means available to access the data. In order to analyze astrophysical data, researchers write scripts that perform database queries, transfer the resulting data sets to their local computers and store them as flat files. Such limited access has already produced important discoveries. For example, a new log-power density spectrum was recently discovered by such analysis of the data in the Millennium Run database [6]. This is the most efficient quantitative description of the distribution of matter density in the Universe obtained so far.
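
As a rough illustration of this query-and-flat-file workflow, the sketch below runs an SQL query and writes the result set to a CSV file. The schema (a halos table with mass and position columns) is invented for the example, and a local in-memory SQLite database stands in for the remote Millennium Run database.

    # Hedged sketch of the typical access pattern: query the database, pull the
    # result set to the local machine, and store it as a flat file for later analysis.
    # Schema and data are invented; a real script would connect to the remote
    # Millennium Run database instead of this local stand-in.
    import csv
    import random
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE halos (id INT, mass REAL, x REAL, y REAL, z REAL)")
    db.executemany("INSERT INTO halos VALUES (?, ?, ?, ?, ?)",
                   [(i, random.lognormvariate(12, 1),
                     random.random(), random.random(), random.random())
                    for i in range(10_000)])

    # Select the most massive halos, as a researcher might for a clustering study.
    rows = db.execute("SELECT id, mass, x, y, z FROM halos "
                      "WHERE mass > ? ORDER BY mass DESC", (2.0e5,)).fetchall()

    with open("massive_halos.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "mass", "x", "y", "z"])
        writer.writerows(rows)

    print(f"wrote {len(rows)} rows to massive_halos.csv")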
2. Computational modeling of the cochlea
The human cochlea is a remarkable, highly nonlinear transducer that extracts vital information from sound pressure and converts it into neuronal impulses that are sent to the auditory cortex. The cochlea's accuracy, amplitude range and frequency range are orders of magnitude better than those of man-made transducers. Understanding its function has tremendous medical and engineering significance. The two most fundamental questions of cochlear research are to provide a mathematical description of the transform computed by the cochlea and to explain the biological mechanisms that compute this transform. Presently there is no adequate answer to either of these two questions. Signal processing in the cochlea is carried out by a collection of coupled biological processes occurring on length scales ranging from one centimeter down to a fraction of a nanometer. A comprehensive model describing the coupling of the dynamics of the biological processes occurring on multiple scales is needed in order to achieve a system-level understanding of cochlear signal processing. A model of cochlear macro-mechanics was constructed in 1999-2002 by Givelberg and Bunn [18], who used supercomputers to generate very large data sets containing the results of simulation experiments. These results were stored as flat files which were subsequently analyzed by the authors on workstations using specially developed software. A set of web pages devoted to this research [19] is widely and frequently accessed; however, the data was never exposed to the wider community for analysis, since no tools to ingest simulation output into a database existed when the cochlea model was developed.

 Characteristics:

Several common characteristics of data-intensive computing systems distinguish them from other forms of computing:

(1) The principle of collocation of the data and the programs or algorithms used to perform the computation. To achieve high performance in data-intensive computing, it is important to minimize the movement of data.[19] This characteristic allows processing algorithms to execute on the nodes where the data resides, reducing system overhead and increasing performance.[20] Newer technologies such as InfiniBand allow data to be stored in a separate repository and provide performance comparable to collocated data.
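
A minimal sketch of this collocation principle, using an invented in-process "cluster": each worker process stands in for a node that already holds one partition of the data, so only partition identifiers are sent out and only small (sum, count) summaries come back, never the raw records.

    # Hedged sketch of "move the program to the data": each worker process stands in
    # for a cluster node that already holds one partition; the raw records never
    # leave the worker, only a tiny (sum, count) summary is returned.
    from concurrent.futures import ProcessPoolExecutor

    def load_local_partition(node_id):
        """Stand-in for reading the partition already stored on this node's disk."""
        return range(node_id, 1_000_000, 4)

    def local_summary(node_id):
        partition = load_local_partition(node_id)   # data stays on the "node"
        return sum(partition), len(partition)

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=4) as pool:
            partials = list(pool.map(local_summary, range(4)))
        total = sum(s for s, _ in partials)
        count = sum(n for _, n in partials)
        print("global mean =", total / count)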

(2) The programming model utilized. Data-intensive computing systems utilize a machine-independent approach in which applications are expressed in terms of high-level operations on data, and the runtime system transparently controls the scheduling, execution, load balancing, communications, and movement of programs and data across the distributed computing cluster.[21] The programming abstraction and language tools allow the processing to be expressed in terms of data flows and transformations, incorporating new dataflow programming languages and shared libraries of common data manipulation algorithms such as sorting.
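
The sketch below illustrates this programming model with a toy, invented runtime: the application is expressed as high-level transformations (parse, filter, sort), and the runtime applies them partition by partition and then merges the sorted partitions, standing in for the scheduling and data movement a real system would handle transparently.

    # Hedged sketch of the dataflow programming model: the user supplies high-level
    # transformations; the toy runtime applies them per partition and merges the
    # sorted partitions, much as a shared sort library would.
    import heapq

    def run_pipeline(partitions, parse, keep, key):
        """Toy runtime: parse and filter each partition, sort locally, merge globally."""
        sorted_parts = []
        for part in partitions:                        # a real runtime would schedule
            records = [parse(line) for line in part]   # these on different nodes
            records = [r for r in records if keep(r)]
            sorted_parts.append(sorted(records, key=key))
        return heapq.merge(*sorted_parts, key=key)     # lazy merge of sorted streams

    if __name__ == "__main__":
        partitions = [["carol 17", "alice 3", "dave 99"],
                      ["bob 42", "erin 8", "frank 1"]]
        result = run_pipeline(
            partitions,
            parse=lambda line: (line.split()[0], int(line.split()[1])),
            keep=lambda rec: rec[1] > 2,               # drop small values
            key=lambda rec: rec[1])                    # global sort by value
        for name, value in result:
            print(name, value)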
(3) A focus on reliability and availability. Large-scale systems with hundreds or
thousands of processing nodes are inherently more susceptible to hardware failures,
communications errors, and software bugs. Data-intensive computing systems are
designed to be fault resilient. This typically includes redundant copies of all data files
on disk, storage of intermediate processing results on disk, automatic detection of
node or processing failures, and selective re-computation of results.
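
A minimal sketch of this style of fault resilience, with an invented checkpoint layout: each partition's intermediate result is written to disk as soon as it is computed, failed tasks are detected, and only the partitions that still lack a checkpoint are recomputed on the next pass.

    # Hedged sketch of fault-resilient processing: intermediate results are
    # checkpointed to disk, failures are detected, and only partitions without a
    # checkpoint are recomputed.
    import json
    import os
    import random

    CHECKPOINT_DIR = "checkpoints"

    def process_partition(pid):
        if random.random() < 0.3:                      # simulated transient node failure
            raise RuntimeError(f"node running partition {pid} failed")
        return {"partition": pid, "row_count": 1000 + pid}

    def run_with_recovery(partition_ids):
        os.makedirs(CHECKPOINT_DIR, exist_ok=True)
        # Partitions already checkpointed by an earlier run are not redone.
        pending = {pid for pid in partition_ids
                   if not os.path.exists(f"{CHECKPOINT_DIR}/{pid}.json")}
        while pending:
            for pid in sorted(pending):
                try:
                    result = process_partition(pid)
                except RuntimeError as err:            # detected failure; stays pending
                    print("detected:", err)
                    continue
                with open(f"{CHECKPOINT_DIR}/{pid}.json", "w") as f:
                    json.dump(result, f)               # checkpoint the intermediate result
                pending.discard(pid)
        return [json.load(open(f"{CHECKPOINT_DIR}/{pid}.json")) for pid in partition_ids]

    if __name__ == "__main__":
        print(run_with_recovery(range(8)))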

(4) The inherent scalability of the underlying hardware and software architecture. Data-intensive computing systems can typically be scaled in a linear fashion to accommodate virtually any amount of data, or to meet time-critical performance requirements, simply by adding more processing nodes. The number of nodes and processing tasks assigned to a specific application can be variable or fixed, depending on the hardware, software, communications, and distributed file system architecture.
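
The back-of-the-envelope calculation below illustrates this linear scaling for a time-critical job; the dataset size, per-node throughput and deadline are invented figures, not measurements.

    # Hedged sketch: how many nodes a full scan needs to finish by a deadline,
    # assuming throughput adds linearly as nodes are added. Figures are illustrative.
    import math

    dataset_bytes = 500e12          # 500 TB to scan
    node_throughput = 400e6         # 400 MB/s of useful bandwidth per node
    deadline_seconds = 2 * 3600     # results needed within two hours

    nodes_needed = math.ceil(dataset_bytes / (node_throughput * deadline_seconds))
    print("nodes needed:", nodes_needed)                               # 174 under these figures
    runtime_2x = dataset_bytes / (2 * nodes_needed * node_throughput)
    print(f"runtime with twice the nodes: {runtime_2x / 3600:.2f} h")  # roughly halves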

 The data-intensive computer differs from the traditional computer in a number of important aspects:

A. Direct I/O between memory and database.
B. Moving the program to the data.
C. Data-object-oriented operating system.
D. Operating system support for distributed data objects (see the sketch after this list).
E. Collaborative, non-local operating system services.
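
To make items C and D concrete, here is a minimal sketch, under invented names, of a distributed data object: a 2-D array whose row blocks live on different "nodes", with a read method that fetches only the blocks overlapping the requested rows.

    # Hedged sketch of a distributed data object: a 2-D array stored as row blocks
    # spread over several nodes; reading a row range touches only the owning blocks.
    # Class name and storage layout are invented for illustration.
    import numpy as np

    class DistributedArray:
        def __init__(self, shape, block_rows, n_nodes):
            self.shape, self.block_rows, self.n_nodes = shape, block_rows, n_nodes
            # node_id -> {block_start_row: block}; stands in for per-node storage.
            self.nodes = {n: {} for n in range(n_nodes)}

        def _owner(self, start_row):
            return (start_row // self.block_rows) % self.n_nodes

        def write(self, data):
            for start in range(0, self.shape[0], self.block_rows):
                self.nodes[self._owner(start)][start] = data[start:start + self.block_rows]

        def read_rows(self, lo, hi):
            """Assemble rows [lo, hi) by fetching only the blocks that overlap them."""
            parts = []
            first = (lo // self.block_rows) * self.block_rows
            for start in range(first, hi, self.block_rows):
                block = self.nodes[self._owner(start)][start]
                parts.append(block[max(lo - start, 0):hi - start])
            return np.vstack(parts)

    if __name__ == "__main__":
        data = np.arange(1000 * 8).reshape(1000, 8)
        arr = DistributedArray(shape=data.shape, block_rows=100, n_nodes=4)
        arr.write(data)
        window = arr.read_rows(250, 420)       # touches only blocks 200, 300 and 400
        assert np.array_equal(window, data[250:420])
        print(window.shape)                    # (170, 8)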

Why Is Big Data Important?

The importance of big data doesn't revolve around how much data you have, but around what you do with it. You can take data from any source and analyze it to find answers that enable cost reductions, time reductions, new product development, optimized offerings, and smarter decision making. When you combine big data with high-powered analytics, you can accomplish business-related tasks such as:

 Determining root causes of failures, issues and defects in near-real time.

 Generating coupons at the point of sale based on the customer’s buying habits.

 Recalculating entire risk portfolios in minutes.

 Detecting fraudulent behavior before it affects your organization.


References:

1. J. Gray, "Distributed Computing Economics," ACM Queue, Vol. 6, No. 3, 2008, pp. 63-68.
2. I. Gorton, P. Greenfield, A. Szalay, and R. Williams, "Data-Intensive Computing in the 21st Century," IEEE Computer, Vol. 41, No. 4, 2008, pp. 30-32.
3. R.E. Bryant, "Data Intensive Scalable Computing," 2008.
4. Data Intensive Computer (https://www.scribd.com).
5. "Big Data: What It Is and Why It Matters," SAS (https://www.sas.com/en_us/insights/big-data/what-is-big-data.html).
