Brian Cho

The MapReduce model uses a barrier between the Map and Reduce stages. This provides simplicity in both programming and implementation. However, in many situations this barrier hurts performance because it is overly restrictive. Hence, we develop a method to break the barrier in MapReduce in a way that improves efficiency. Careful design of our barrier-less MapReduce framework preserves the generality of the original model and retains ease of programming. We motivate our case with, and experimentally study our barrier-less techniques in, a wide variety of MapReduce applications divided into seven classes. Our experiments show that our approach can achieve shorter job completion times than a traditional MapReduce framework: the reduction is 25% on average and 87% in the best case.

Publication Date: 2010

Research Interests:
Cluster Computing, Experimental Study, and Data Intensive Computing

Publication Date: 2010

Research Interests:
Distributed Data Mining, Experimental Evaluation, Storage system, Replication, Fault Tolerant, and Program Generation

Cloud collaborators wish to combine large amounts of data, on the order of terabytes, from multiple distributed locations into a single datacenter. Such groups face the challenge of reducing the latency of the transfer without incurring excessive dollar costs. Our Pandora system is an autonomic system that creates data transfer plans that can satisfy latency and cost needs,

Publication Date: 2011

Research Interests:
Cloud Computing, Data transfer, Experimental Evaluation, Web Service, Autonomic System, Data Intensive Computing, Boolean Satisfiability, Budget Constraint, and Binary Search

Publication Date: 2009

Research Interests:
Cloud Computing and Data storage

Publication Date: 2007

Research Interests:
Distributed System

Publication Date: 2010

Research Interests:
Computer Science, Distributed Computing, Urban Planning, Integer Programming, Cloud Computing, Data transfer, Experimental Evaluation, Data Handling, Boolean Satisfiability, Large Dataset Analysis, Internet, and Integer Program

Based on the intuition that "two objects are similar if they are related to similar objects", SimRank (proposed by Jeh and Widom in 2002) has become a famous measure to compare the similarity between two nodes using network structure. Although SimRank is applicable to a wide range of areas such as social networks, citation networks, and link prediction, it suffers from heavy computational complexity and space requirements. Most existing efforts to accelerate SimRank computation work only for static graphs and on single machines. This paper considers the problem of computing SimRank efficiently in a distributed system while handling dynamic networks which grow with time. We first consider an abstract model called Harmonic Field on Node-pair Graph. We use this model to derive SimRank and the proposed Delta-SimRank, which is demonstrated to fit the nature of distributed computing and can be efficiently implemented using Google's MapReduce paradigm. Delta-SimRank can effectively reduce the computational cost and can also benefit applications with non-static network structures. Our experimental results on four real-world networks show that Delta-SimRank is much more efficient than the distributed SimRank algorithm, and leads to up to 30 times speed-up in the best case.

Publication Date: 2012

Research Interests:
Distributed Computing, Computational Complexity, Distributed System, Link Prediction, Network structure, Social Network, Dynamic Networks, and Single Machine

Research Interests:
Distributed System
