15. Dryad && DryadLINQ
• Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel applications. A Dryad application combines computational "vertices" with communication "channels" to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of available computers, communicating as appropriate through files, TCP pipes, and shared-memory FIFOs.
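The vertex/channel model can be sketched as a tiny in-process dataflow graph. This is a toy illustration only: the names `Graph`, `Vertex`, and `Channel` below are invented for the sketch, not Dryad's real API, and channels are plain in-memory vectors rather than files, TCP pipes, or FIFOs.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy model of a Dryad-style dataflow graph (illustrative names, not Dryad's API).
// A Vertex reads all of its input channels and produces one output channel.
using Channel = std::vector<int>;
using Vertex  = std::function<Channel(const std::vector<Channel>&)>;

struct Graph {
    struct Node { Vertex op; std::vector<std::string> inputs; };
    std::map<std::string, Node> nodes;   // vertex name -> vertex + its input edges

    // Execute the (assumed acyclic) graph by memoized recursion; in real Dryad
    // each vertex would run as a process on some cluster machine and the
    // channels would be files, TCP pipes, or shared-memory FIFOs.
    Channel run(const std::string& name, std::map<std::string, Channel>& done) {
        if (done.count(name)) return done[name];
        std::vector<Channel> ins;
        for (const std::string& dep : nodes[name].inputs) ins.push_back(run(dep, done));
        return done[name] = nodes[name].op(ins);
    }
};

// Build and run a three-vertex chain: source -> double -> sum.
inline int run_demo() {
    Graph g;
    g.nodes["source"] = { [](const std::vector<Channel>&) { return Channel{1, 2, 3}; }, {} };
    g.nodes["double"] = { [](const std::vector<Channel>& in) {
        Channel out; for (int x : in[0]) out.push_back(2 * x); return out;
    }, {"source"} };
    g.nodes["sum"] = { [](const std::vector<Channel>& in) {
        int s = 0; for (int x : in[0]) s += x; return Channel{s};
    }, {"double"} };
    std::map<std::string, Channel> done;
    return g.run("sum", done)[0];   // (1+2+3)*2 = 12
}
```

Swapping the execution strategy (threads per vertex, one process per vertex) without touching the graph definition is exactly the separation Dryad exploits.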
17. Dryad System Architecture
[Figure: Dryad system architecture. Control plane: the job manager holds the job schedule and, via the name server (NS), assigns vertices (V) to cluster machines, where per-machine process daemons (PD) run them. Data plane: the vertices exchange data directly through files, TCP, FIFOs, and the network.]
19. Dryad
• select distinct p.objID
  from photoObjAll p
  join neighbors n            -- call this join "X"
    on p.objID = n.objID
    and n.objID < n.neighborObjID
    and p.mode = 1
  join photoObjAll l          -- call this join "Y"
    on l.objid = n.neighborObjID
    and l.mode = 1
    and abs((p.u-p.g)-(l.u-l.g)) < 0.05
    and abs((p.g-p.r)-(l.g-l.r)) < 0.05
    and abs((p.r-p.i)-(l.r-l.i)) < 0.05
    and abs((p.i-p.z)-(l.i-l.z)) < 0.05
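To make the two-join structure of the query concrete, here is a toy in-memory version showing how it decomposes into join "X" (photoObjAll with neighbors) followed by join "Y" (with photoObjAll again). The struct layouts and data values are made up for illustration; only the field names and predicates come from the query.

```cpp
#include <cmath>
#include <set>
#include <vector>

// Toy rows mirroring the two tables used by the query above.
struct Photo    { int objID; int mode; double u, g, r, i, z; };
struct Neighbor { int objID; int neighborObjID; };

// The four color-difference predicates from the query.
inline bool close_colors(const Photo& p, const Photo& l) {
    return std::fabs((p.u - p.g) - (l.u - l.g)) < 0.05 &&
           std::fabs((p.g - p.r) - (l.g - l.r)) < 0.05 &&
           std::fabs((p.r - p.i) - (l.r - l.i)) < 0.05 &&
           std::fabs((p.i - p.z) - (l.i - l.z)) < 0.05;
}

inline std::set<int> distinct_objids(const std::vector<Photo>& photos,
                                     const std::vector<Neighbor>& neighbors) {
    std::set<int> out;                         // "select distinct p.objID"
    for (const Photo& p : photos)
        for (const Neighbor& n : neighbors)    // join "X"
            if (p.objID == n.objID && n.objID < n.neighborObjID && p.mode == 1)
                for (const Photo& l : photos)  // join "Y"
                    if (l.objID == n.neighborObjID && l.mode == 1 &&
                        close_colors(p, l))
                        out.insert(p.objID);
    return out;
}
```

In a Dryad plan, each of the two joins becomes a stage of vertices, with the tables partitioned across machines instead of looped over in memory.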
20. Pregel
• Graph algorithms can be written as a series of chained MapReduce jobs.
• Usability and performance: Pregel keeps vertices and edges on the machine that performs the computation, and uses the network only to transmit messages, not data.
• MapReduce is essentially function-oriented, so implementing a graph algorithm with MapReduce requires passing the entire state of the graph from one stage to the next, which incurs a great deal of communication and the accompanying serialization and deserialization overhead.
• Coordinating the stages of chained MapReduce jobs also adds programming difficulty, which Pregel avoids through its iteration over supersteps.
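The superstep model above can be sketched in a few lines. This is a minimal vertex-centric illustration in the spirit of Pregel, not Google's API: each superstep, every vertex folds the messages sent to it in the previous superstep into its local value and sends new messages along its out-edges. Only messages cross the "network" (the `inbox` map here); vertex state stays local, which is the point made above. The example propagates the maximum value through a directed graph.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// out_edges[v] lists the targets of vertex v; value[v] is v's local state.
// Runs the given number of supersteps of max-propagation and returns the values.
inline std::vector<int> pregel_max(const std::vector<std::vector<int>>& out_edges,
                                   std::vector<int> value, int supersteps) {
    for (int step = 0; step < supersteps; ++step) {
        std::map<int, std::vector<int>> inbox;   // messages for this superstep
        for (std::size_t v = 0; v < out_edges.size(); ++v)
            for (int dst : out_edges[v])
                inbox[dst].push_back(value[v]);  // send my value to neighbors
        for (std::size_t v = 0; v < value.size(); ++v)
            for (int msg : inbox[(int)v])        // compute(): fold received messages
                if (msg > value[v]) value[v] = msg;
    }
    return value;
}
```

A chained-MapReduce version of the same algorithm would have to shuffle the whole `value` array (the graph state) between every pair of jobs; here it never leaves the vertex.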
44. Introduction: Basic Events
• Record: add record, batch add record, timer record, full record, del record, modify record
• Segment: add segment, del segment, modify segment, occupy segment, reduce
45. Introduction: Basic Interfaces
• int init(Context context);
• int process(DataItem doc);
• int resolve(List<Value> valueSet);
• int uninit();
• Extension interfaces…
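A processor built on these four interfaces might look like the sketch below. `Context`, `DataItem`, and `Value` are stand-ins (the real iprocess types are not shown in the slides), and the `CountingProcessor` example is invented; the only thing taken from the slide is the four-method shape, with 0 assumed to mean success given the int return types.

```cpp
#include <list>
#include <string>

// Stand-in types for the iprocess interfaces listed above (assumed shapes).
struct Context  { std::string config; };
struct DataItem { std::string payload; };
struct Value    { int v; };

// A trivial processor: init() resets state, process() handles one incoming
// document, resolve() folds a set of partial values, uninit() releases state.
class CountingProcessor {
public:
    int init(Context context) { (void)context; count_ = 0; resolved_ = 0; return 0; }
    int process(DataItem doc) { if (!doc.payload.empty()) ++count_; return 0; }
    int resolve(std::list<Value> valueSet) {
        int sum = 0;
        for (const Value& x : valueSet) sum += x.v;   // combine partial values
        resolved_ = sum;
        return 0;
    }
    int uninit() { return 0; }

    int count() const { return count_; }
    int resolved() const { return resolved_; }
private:
    int count_ = 0;
    int resolved_ = 0;
};
```

The framework presumably drives the lifecycle: init once, process per event, resolve when partial results must be combined, uninit on teardown.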
59. How to Write an iprocess Job Response
• Responding to events:
  – int process(DataItem doc);
  – int resolve(list<Value> valueSet);
  – vector<VM> Partition(doc);
  – The extension interfaces are all further customizations of the process function
    • Reduce(key, list<v>)
    • Reduce(key, tree<v>)
60. Iprocess: Locality
• Mt
• St and gt are deployed on the same physical machine as pn
• Data locality is monitored by the master through the metainfo of st and gt kept in the master or in the lock service; according to policy it dynamically migrates processors, temporarily blocks data migration, or tolerates data migration
• Push computation to the data
65. Job: Join
• The Hadoop approach
  – Map-Side Join
    • All datasets must be sorted
    • All datasets must be partitioned
    • The number of partitions in the datasets must be identical
    • A given key has to be in the same partition in each dataset
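These preconditions exist so that one map task can stream the matching partitions of both datasets and merge-join them with no shuffle. The sketch below shows the merge step over two sorted partitions; it is illustrative code, not Hadoop's CompositeInputFormat.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// A record is (key, value); both inputs must be sorted by key, which is what
// the map-side join preconditions above guarantee for each partition pair.
using Rec = std::pair<int, std::string>;

inline std::vector<std::pair<std::string, std::string>>
merge_join(const std::vector<Rec>& a, const std::vector<Rec>& b) {
    std::vector<std::pair<std::string, std::string>> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].first < b[j].first)      ++i;   // advance the smaller side
        else if (b[j].first < a[i].first) ++j;
        else {                                   // keys match: emit all pairs
            for (std::size_t k = j; k < b.size() && b[k].first == a[i].first; ++k)
                out.push_back({a[i].second, b[k].second});
            ++i;                                 // keep j: a may repeat the key
        }
    }
    return out;
}
```

If the partition counts differed or a key could land in different partitions, two matching records might never meet in the same map task, which is why all four requirements are mandatory.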
67. iprocess: Join
• Materialized view: A.2 = B.1 AND B.2 = C.1
  View
  {
    table a;
    view
    {
      table b;
      table c;
      joinFiled b.2, c.1;
    } bc
    joinFiled a.2, bc.1;
  }
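The nested view above materializes in two steps: first join b and c on b.2 = c.1 into an intermediate view bc, then join a with bc on a.2 = bc.1. The sketch below shows those semantics with rows modeled as vectors of column values (1-based columns in the DSL, 0-based here); it illustrates what the view computes, not how the iprocess engine computes it.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Nested-loop equi-join of two row sets on left[lcol] == right[rcol],
// concatenating the matching rows (the "joinFiled" condition of the DSL).
inline std::vector<Row> join_on(const std::vector<Row>& left, std::size_t lcol,
                                const std::vector<Row>& right, std::size_t rcol) {
    std::vector<Row> out;
    for (const Row& l : left)
        for (const Row& r : right)
            if (l[lcol] == r[rcol]) {
                Row merged = l;
                merged.insert(merged.end(), r.begin(), r.end());
                out.push_back(merged);
            }
    return out;
}

// View { a; view { b; c; joinFiled b.2, c.1 } bc; joinFiled a.2, bc.1 }
inline std::vector<Row> materialize(const std::vector<Row>& a,
                                    const std::vector<Row>& b,
                                    const std::vector<Row>& c) {
    std::vector<Row> bc = join_on(b, 1, c, 0);   // inner view: b.2 = c.1
    return join_on(a, 1, bc, 0);                 // outer view: a.2 = bc.1
}
```

The nesting fixes the join order: bc is a reusable intermediate result, so updates to b or c only have to re-derive bc and the rows of the outer view that reference it.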
68. iprocess: Real-Time Join
• The shop-owner (member) table is joined with the product (offer) table, with memberid as the join key. Note that the member data and the offer data are inserted into the system in parallel, in no particular order, in real time (MR needs all the data of both tables to be ready before it can start, whereas iprocess starts computing as soon as data is inserted; recall that diagram). The join is a real-time join: as soon as a record is joined, it is output. This is the essential difference from MapReduce. Although the code looks much like an MR implementation, the real-time output is supported by the system: the developer does not need to check the completeness condition (that both the member data and the offer data with the same memberid have been inserted).

class MemberMapper : public Mapper
{
public:
    void map(string key, v value, MapperContextPtr context)
    {
        context->write(value->get("member_id").toString(), value, "member");
    }
};

class OfferMapper : public Mapper
{
public:
    void map(string key, v value, MapperContextPtr context)
    {
        context->write(value->get("member_id").toString(), value, "offer");
    }
};

Reduce(key, tree<v> values, ReducerContext context)
{
    Iterator<v> members = values.get("member");
    while (members.hasNext()) {
        v member = members.next();
        // Re-fetch the offer iterator for each member so that every
        // (member, offer) pair is emitted: the Cartesian product of the
        // two groups sharing this key.
        Iterator<v> offers = values.get("offer");
        while (offers.hasNext()) {
            v offer = offers.next();
            context.write(offer.get("offer_id"), member->merger(offer));
        }
    }
}
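The "join a record as soon as it arrives" behavior the system provides can be sketched with two hash tables, one per side: each arriving record is immediately joined against everything already seen on the other side and emitted. This is toy code to show the mechanism, not the iprocess implementation.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Symmetric hash join: member and offer records arrive in any order; each
// arrival is buffered by member_id and immediately joined against the
// records already buffered on the opposite side, so output is produced
// without waiting for either table to be complete.
struct RtJoin {
    std::map<std::string, std::vector<std::string>> members, offers;   // by member_id
    std::vector<std::pair<std::string, std::string>> out;              // joined rows

    void insert_member(const std::string& id, const std::string& m) {
        members[id].push_back(m);
        for (const std::string& o : offers[id]) out.push_back({m, o});  // emit now
    }
    void insert_offer(const std::string& id, const std::string& o) {
        offers[id].push_back(o);
        for (const std::string& m : members[id]) out.push_back({m, o}); // emit now
    }
};
```

Because each record is joined on insertion, the completeness condition the slide mentions is handled by the structure itself; an MR-style reduce would instead wait until both groups for a key were fully collected.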