
Distributed Large-Scale Graph Processing
Data Mining (CS6720)
John Augustine
Jan 16, 2020

Parallel & Distributed Computing Models

• Shared Memory: PRAM
• Programming Models: MapReduce, Think like a vertex
• Message Passing: Massively Parallel Computation, k-machine model

Massively Parallel Computation (MPC) Model

• Input data size N words; each word = O(log N) bits.
• The number of machines k. (Machines identified by {1, 2, …, k}.)
• Memory size per machine S words.
  • S ≥ N is uninteresting. Assume: S = O(N^ε) for some ε ∈ (0,1].
  • Also, require Sk ≥ N.
• Synchronous communication rounds:
  • Local computation within each machine.
  • Create messages for other machines. Sum of message sizes ≤ S.
  • Send… Receive. Ensure no machine requires more than S memory.
• Goal: Solve the problem in as few rounds as possible.
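Not on the slides: a minimal Python sketch of one synchronous round under the constraints above. The Machine class and run_round driver are illustrative names; the two assertions encode the sender- and receiver-side S-word limits.

    # Hypothetical sketch (names illustrative): one synchronous MPC round.
    class Machine:
        def __init__(self, mid, memory_words):
            self.mid = mid             # machine id in {1, ..., k}
            self.S = memory_words      # memory budget S (in words)
            self.data = []             # local input words
            self.inbox = []            # words received this round

        def local_step(self):
            # Local computation; return {destination_id: [words]}.
            return {}                  # overridden per algorithm

    def run_round(machines):
        outboxes = {}
        for m in machines:
            msgs = m.local_step()
            # Sender-side constraint: sum of message sizes <= S.
            assert sum(len(ws) for ws in msgs.values()) <= m.S
            outboxes[m.mid] = msgs
        for m in machines:
            m.inbox = [w for msgs in outboxes.values()
                         for w in msgs.get(m.mid, [])]
            # Receiver-side constraint: no machine needs more than S memory.
            assert len(m.inbox) <= m.S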


Initial Data Distribution

• Typically, data is split into words (often as ⟨key, value⟩ pairs).
• The words could be either randomly distributed or arbitrarily distributed.
• Load balanced, so that no machine has much more than the other machines.
• Output: usually distributed & depends on the problem.
• Questions:
  • How to achieve a random, load-balanced distribution?
  • How to remove duplicates?

On Graphs

• Input size: N = O(n + m) = O(m).
• Memory size S regimes:
  • (Strongly) Superlinear: S = n^(1+ε) for some constant ε > 0.
  • Near Linear: S = O(n).
  • (Strongly) Sublinear: S = n^α for α ∈ (0,1).

Broadcasting

• Let S = n^(1+ε) for some constant ε > 0.
• One machine src needs to broadcast n words.
• Approach 1: the machine sends k messages of size n. If k > n^ε, the total message size kn exceeds S, violating the bandwidth constraint.
• Approach 2: Build an n^ε-ary tree with src as root.
  • Broadcast takes O(height) rounds.
  • height = O(log_(n^ε) k) = O(1/ε), since N = poly(S) (N = O(n²) for graphs) and k ≤ N.
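A small sketch of Approach 2 (the function name and example numbers are mine, not the deck's): the broadcast proceeds level by level down an n^ε-ary tree, so the round count is the tree height.

    def broadcast_rounds(k, n, eps):
        """Height of an (n^eps)-ary broadcast tree over k machines,
        i.e., the number of communication rounds."""
        fanout = max(2, round(n ** eps))  # sending n words to each of n^eps
                                          # children uses n^(1+eps) = S per round
        rounds = 0
        while fanout ** rounds < k:       # machines reachable within `rounds` levels
            rounds += 1
        return rounds

    # Example (hypothetical numbers): k = 10**6 machines, n = 10**6 words,
    # eps = 0.5 gives fanout 1000 and 2 rounds.
    print(broadcast_rounds(10**6, 10**6, 0.5))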
Maximal Matching

• A matching in a graph G = (V, E) is a set of edges that don't share common vertices.
• A maximum matching is a matching of maximum possible cardinality.
• A maximal matching is a matching that ceases to be one when any edge is added to it. (On the path a-b-c-d, the single edge {b, c} is a maximal matching, while the maximum matching {{a, b}, {c, d}} has size 2.)
• A maximal matching has cardinality at least half that of a maximum matching. Homework: Prove this.


Sequential Algorithm for finding a maximal matching

1. Let X = ∅.
2. For each e = {u, v} ∈ E:
   1. If neither u nor v is an endpoint of any edge in X, then X = X ∪ {e}.
3. Output X.

Correctness:
• Invariant: X is a matching at all times.
• Suppose X is not maximal at the end. Then some edge e can be added to it, and it will remain a matching. But then why was e rejected when it was scanned? Contradiction.
Filtering: Idea to find a maximal matching in the superlinear memory regime

Preprocessing. Let ℓ be a designated “leader” machine (say, machine 0). Assume it doesn’t hold any edge at the beginning. (Why is this OK?) During the course of the algorithm, ℓ maintains a matching (initially empty). Other machines are called regular machines. G_r = (V_r, E_r) denotes the graph during phase r. We use m_r for the number of edges in G_r. G_0 ← G.

Steps in each phase r = 0, 1, … (until G_r becomes empty):
1. Each regular machine marks each local edge independently with probability p = S/(2 m_r) and sends the marked edges to the leader ℓ.
2. The leader ℓ recomputes the maximal matching with the edges it received, but without losing any edge from the previous matching. (How?)
3. The leader ℓ broadcasts the matching so computed (≤ n/2 edges) to all machines.
4. Each regular machine removes edges that have at least one common vertex with the received matching. Isolated vertices are also removed.
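To make the phases concrete, here is a single-process simulation, a sketch rather than a distributed implementation: the sampling probability p = S/(2 m_r) is the reconstruction used above, and the leader's "recompute without losing any edge" (step 2) is realized by greedily extending the matching kept so far.

    import random

    def filtering_matching(edges, n, eps, seed=0):
        """Simulate the filtering phases; returns a maximal matching."""
        rng = random.Random(seed)
        S = n ** (1 + eps)                   # superlinear memory budget
        matched, M = set(), []
        E = list(edges)
        while E:                             # one iteration = one phase
            p = min(1.0, S / (2 * len(E)))
            sampled = [e for e in E if rng.random() < p]
            for u, v in sampled:             # leader extends M greedily,
                if u not in matched and v not in matched:  # never dropping
                    M.append((u, v))                       # earlier edges
                    matched.update((u, v))
            # Step 4: drop edges touching the broadcast matching.
            E = [(u, v) for (u, v) in E
                 if u not in matched and v not in matched]
        return M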


Outline of the Analysis

• Correctness is obvious (similar to the sequential algorithm), provided the bandwidth limitation is not violated.
• Claims:
  • The leader ℓ receives at most n^(1+ε) edges (whp) in step 1. (Homework)
  • If a phase r starts with m_r edges, then the number of edges at the end of round r is at most 2n/p = 4 m_r / n^ε with high probability.
  • The total number of rounds is log_(n^ε) m ∈ O(1/ε). Why?

Claim: At most 2n/p edges whp at end of round r

• Let G_(r+1) = (V_(r+1), E_(r+1)) be the leftover graph at the end of round r.
• For some pair of vertices u, v ∈ V_(r+1), can e = {u, v} have been sent to the leader? No! (Why? If sent, at least one of u or v would have been matched, and the edge therefore discarded.)
• Consider any set of vertices J with more than 2n/p edges with both endpoints in J.
• What is the chance that V_(r+1) = J?
  Pr[all induced edges not sent] ≤ (1 − p)^(2n/p) ≤ e^(−2n).
• There are at most 2^n subsets of V, so by a union bound, the result holds.
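Written out with the reconstructed p = S/(2 m_r), the two calculations behind these claims are:

    \[
      \Pr[\text{no induced edge of } J \text{ sent}]
        \le (1-p)^{2n/p} \le e^{-2n},
      \qquad
      2^n \cdot e^{-2n} = (2/e^2)^n \to 0 .
    \]
    \[
      m_{r+1} \le \frac{2n}{p} = \frac{4n\, m_r}{S} = \frac{4 m_r}{n^{\varepsilon}}
      \;\Longrightarrow\;
      \text{rounds} = O\!\left(\log_{n^{\varepsilon}} m\right) = O(1/\varepsilon).
    \]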


The k-machine Model

• Input data size N words; each word = O(log N) bits.
• The number of machines k. (Machines identified by {1, 2, …, k}.)
• Memory size is unbounded (but usually not abused).
• Synchronous communication rounds:
  • Local computation within each machine.
  • Each machine creates one message of O(log n) bits for every other machine.
  • Send… Receive.
• Goal: Solve the problem in as few rounds as possible.

Data Distribution: The Random Vertex Partitioning (RVP)

• Typically, data is split into words (often as ⟨key, value⟩ pairs).
• The words could be either randomly distributed or arbitrarily distributed.
• Typically used in processing large graphs.
• RVP: The most common approach is to randomly partition the vertices into k parts and place each part on one of the machines. Then, a copy of each edge is placed in the (≤ 2) machines that contain either of its endpoints.
• Other partitionings of the graph data are also conceivable (e.g., random edge partitioning, arbitrary edge partitioning).
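A minimal Python sketch of RVP (function and field names are mine):

    import random

    def random_vertex_partition(vertices, edges, k, seed=0):
        """Place each vertex on a machine u.a.r.; copy each edge to the
        (at most 2) machines holding its endpoints."""
        rng = random.Random(seed)
        vertices = list(vertices)
        home = {v: rng.randrange(k) for v in vertices}
        parts = [{"vertices": [], "edges": []} for _ in range(k)]
        for v in vertices:
            parts[home[v]]["vertices"].append(v)
        for u, v in edges:
            for mid in {home[u], home[v]}:   # one machine if homes coincide
                parts[mid]["edges"].append((u, v))
        return parts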


RVP is Load Balanced

Claim: Under RVP of a graph G = (V, E) with n vertices and m edges, whp, every machine has
1. at most Õ(n/k) vertices, and
2. at most Õ(m/k + Δ) edges,
where Δ is the maximum degree in G.

Proof of part 1 is easy: just use a Chernoff bound.
Proof of part 2 is more complicated and therefore skipped.
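A sketch of part 1, filling in the standard Chernoff step: the number of vertices X_i on machine i is binomial, and the tail bound Pr[X ≥ R] ≤ 2^(−R) for R ≥ 6 E[X] applies.

    \[
      X_i \sim \mathrm{Bin}(n, 1/k), \quad \mathbb{E}[X_i] = n/k, \qquad
      \Pr\big[X_i \ge 6(n/k + \ln n)\big] \le 2^{-6(n/k + \ln n)} \le n^{-4},
    \]

so a union bound over the k ≤ n machines leaves every machine with O(n/k + log n) = Õ(n/k) vertices whp.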
