Dynamic Replication in Data-Centers Connected Over IPFS
Abstract— The default replication rate in IPFS-Cluster[5] sometimes creates a bottleneck in the number of requests a cluster can handle when multiple data centers are connected to each other over IPFS[4] to share data using IPFS-Cluster. Dynamic placement and replication of data is necessary both inside a cluster and among the connected clusters to deal with request overload on a single node or cluster in this setting, i.e. multiple clusters connected over IPFS using IPFS-Cluster.

I. PROBLEM DESCRIPTION

IPFS is a distributed file system that seeks to connect all computing devices with the same system of files. IPFS-Cluster, on the other hand, is software that orchestrates IPFS daemons running on different hosts. An IPFS-Cluster is formed by a number of peers, each associated with one IPFS daemon. The peers share a pin-set (also known as the shared state), which lists the CIDs (Content Identifiers) that are cluster-pinned and their properties (allocations, replication factor, etc.). Multiple data centers can also connect their clusters by installing IPFS over their clusters and sharing the secret key (used by new nodes to connect with the peers of the cluster). IPFS-Cluster uses Raft[3] consensus to maintain a consistent view of the peer-set (the peers participating in the cluster) and the pin-set.

[...] the potential peers and place or pin the CID on the top peer obtained from that sorting metric. The replication factor, however, remains constant.

Our main contribution in this project would be to support dynamic replication in IPFS-Cluster and to use it to load-balance the number of requests for a CID within and among clusters by modifying the number of replicas it has both within and among the clusters.

To achieve the goal stated above, we need to consider the following:

1) Finding hot-spots: These are the CIDs experiencing the largest number of read/write requests. They could be found by looking at the requests arriving at the cluster and setting a threshold above which a CID is termed hot. All requests eventually go to the leader elected by the consensus protocol, so changes are required in how this leader processes requests.
2) Supporting dynamic replication: Right now, replication is done based on the default replication value field in the configuration file. This needs to be changed to support dynamic replication on the go.
3) Replication metric: This would be the core of the problem: deciding which CID needs to be replicated, and how many times, based on the number of requests it gets. The replication metric needs to consider both the number of replicas of the CID and the requests it gets within and across data centers.
   Some changes to the shared pin-set state are needed to track the number of requests for each CID on a per-cluster basis. Based on this number, we can track which hot CIDs are under-replicated and perform dynamic replication of the respective CID.
4) Cold CIDs: Since we are pinning more instances of hot CIDs, cold CIDs (CIDs whose number of requests is below a minimum threshold) need to have their number of pins reduced. This could also be done by analysing the shared state from time to time.
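
The four considerations above can be illustrated with a minimal sketch. It is not the IPFS-Cluster implementation and uses no IPFS-Cluster API. The names `PinStats`, `classify`, `target_replicas` and `plan_replication`, the two thresholds, and the `per_replica_capacity` parameter are all hypothetical. The metric used here (total requests divided by an assumed per-replica capacity, clamped to a range) is only one possible choice of replication metric.

```python
import math
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PinStats:
    """Hypothetical per-CID entry in the shared pin-set."""
    replicas: int  # current number of replicas of the CID
    # Requests seen in the current window, keyed by cluster (assumed layout).
    requests: Dict[str, int] = field(default_factory=dict)

    @property
    def total_requests(self) -> int:
        # Sum the requests within and across clusters.
        return sum(self.requests.values())


def classify(stats: PinStats, hot_threshold: int, cold_threshold: int) -> str:
    """Label a CID as hot, cold, or warm from its total request count."""
    total = stats.total_requests
    if total >= hot_threshold:
        return "hot"
    if total < cold_threshold:
        return "cold"
    return "warm"


def target_replicas(stats: PinStats, per_replica_capacity: int,
                    min_replicas: int, max_replicas: int) -> int:
    """Assumed metric: enough replicas to absorb the load, within bounds."""
    needed = math.ceil(stats.total_requests / per_replica_capacity)
    return max(min_replicas, min(max_replicas, needed))


def plan_replication(pin_set: Dict[str, PinStats], hot_threshold: int,
                     cold_threshold: int, per_replica_capacity: int,
                     min_replicas: int, max_replicas: int) -> Dict[str, int]:
    """Return {cid: new replica count} for CIDs whose count should change."""
    changes: Dict[str, int] = {}
    for cid, stats in pin_set.items():
        label = classify(stats, hot_threshold, cold_threshold)
        target = target_replicas(stats, per_replica_capacity,
                                 min_replicas, max_replicas)
        if label == "hot" and target > stats.replicas:
            changes[cid] = target  # under-replicated hot CID: add pins
        elif label == "cold" and target < stats.replicas:
            changes[cid] = target  # over-replicated cold CID: remove pins
    return changes


# Example with made-up CIDs, clusters and request counts.
pin_set = {
    "cidA": PinStats(replicas=2, requests={"dc1": 900, "dc2": 300}),
    "cidB": PinStats(replicas=3, requests={"dc1": 5}),
    "cidC": PinStats(replicas=2, requests={"dc1": 150, "dc2": 50}),
}
plan = plan_replication(pin_set, hot_threshold=1000, cold_threshold=10,
                        per_replica_capacity=200,
                        min_replicas=1, max_replicas=5)
```

In this example `cidA` is hot and under-replicated, `cidB` is cold and over-replicated, and `cidC` is left unchanged. In the setting described above, such a plan would be computed by the leader from the shared state and run periodically.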