Blockstack PDF
Technical Whitepaper
Copyright 2017 Blockstack PBC -- a Public Benefit Corp. All rights reserved.
Parts of this whitepaper were published earlier in the following peer-reviewed
conferences and magazine:
DISCLAIMER: The Blockstack Tokens are a crypto asset that is currently being developed by Blockstack
Token LLC, a Delaware limited liability company, whose website can be found at www.blockstack.com.
This whitepaper does not constitute an offer or sale of Blockstack Tokens (the Tokens) or any other
mechanism for purchasing the Tokens (such as, without limitation, a fund holding the Tokens or a simple
agreement for future tokens related to the Tokens). Any offer or sale of the Tokens or any related instrument
will occur only based on definitive offering documents for the Tokens or the applicable instrument.
Blockstack: A New Internet for Decentralized Applications
Muneeb Ali∗ Ryan Shea∗ Jude Nelson Michael J. Freedman†
http://blockstack.org
Whitepaper Version 1.1
Abstract
The traditional internet has many central points of failure and trust, like (a) the Domain
Name System (DNS) servers, (b) public-key infrastructure, and (c) end-user data
stored on centralized data stores. We present the design and implementation of a new
internet, called Blockstack, where users don’t need to trust remote servers. We remove
any trust points from the middle of the network and use blockchains to secure critical
data bindings. Blockstack implements services for identity, discovery, and storage and
can survive failures of underlying blockchains. The design of Blockstack is informed by
three years of experience from a large blockchain-based production system. Blockstack
gives comparable performance to traditional internet services and enables a much-needed
security and reliability upgrade to the traditional internet.
1 Introduction
The internet was designed more than 40 years ago and is showing signs of age. Critical
internet services can be taken offline by attacks like the DDoS attack on DNS servers [1].
Further, in the current internet architecture users implicitly trust certain hidden services
and intermediaries like domain name servers and certificate authorities (CAs). These
trust points can be exploited to trick users into connecting to malicious websites, as in
the recent incident where a Turkish CA issued false security certificates for Google [2].
Over the last decade, we’ve seen a shift from desktop apps (that run locally) to
cloud-based apps that store user data on remote servers. These centralized services are
a prime target for hackers and frequently get hacked. In 2016, Yahoo! admitted to
losing information for 500 million people [3]. Security problems with the core internet
infrastructure and the centralized data models of web services built on top have exposed
flaws in the internet’s original design.
∗ Co-primary author.
† Professor of Computer Science at Princeton University and an advisor to Blockstack.
Blockstack is an open-source effort to re-decentralize the internet; it builds a new
internet for decentralized applications and enables users to own their application data
directly [4]. Blockstack uses the existing internet transport layer (TCP or UDP) and
underlying communication protocols and focuses on removing points of centralization
that exist at the application layer. Alternative transport layer protocols, like new mesh
networking protocols [5], can also be supported with Blockstack.
There are many fundamental technical challenges with creating a fully decentralized
replacement for core internet components like DNS, public-key infrastructure, and
storage backends. New users/nodes need to establish trust on the network and discover
the relevant data without relying on any remote servers. The decentralized solutions
need to give comparable performance to the traditional internet and scale accordingly
as well. Our implementation of Blockstack has three components:
Blockstack is deployed in production and, to date, 74,000 new domains have been
registered on it, with several companies and open-source contributors actively developing
new services using Blockstack [4]. We’ve released Blockstack as open-source [7].
2 System Architecture
Blockstack has the following design goals:
Until recently, decentralized naming systems with human-readable names were considered
impossible to build (see Zooko’s Triangle in Section 3), and decentralized storage
systems like BitTorrent don’t offer performance/bandwidth comparable to centralized
services [8]. Blockstack presents a solution to these problems.
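[Figure: Overview of the Blockstack architecture. Storage: Amazon S3, Dropbox, Microsoft Azure, FreeNAS Server, Google Drive. Peer Network: zone files, with a local DB at each peer; URIs in zone files point to stored data. Blockchain: blocks n through n+4, which record zone file hashes.]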
Layer 1: Blockchain
In our architecture, the blockchain occupies the lowest layer and serves two purposes:
it provides the storage medium for operations, and it provides consensus on the order in
which the operations were written. Virtualchain encodes operations in transactions on
the underlying blockchain. The blockchain provides an abstraction of totally ordered
operations to virtualchain and serves as the “narrow waist” of our architecture. Much
of the complexity, like mining operations, consensus algorithms, and cryptocurrency
fluctuations, is hidden underneath this abstraction. The higher layers only care about
reading/writing totally ordered operations and can operate on top of any blockchain.
The blockchain layer also includes a virtualchain, which defines new operations without
requiring changes to the underlying blockchain. Nodes of the underlying blockchains
are not aware of this layer. Virtualchains are like virtual machines, where a specific
VM like Debian 8.7 can run on top of a specific physical machine. Different types of
virtualchains can be defined and they run on top of the specific underlying blockchain.
Virtualchain operations are encoded in valid blockchain transactions as additional
metadata. Blockchain nodes do see the raw transactions, but the logic to process virtualchain
operations only exists at the virtualchain level.
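To make the idea of carrying virtualchain operations as transaction metadata concrete, here is a minimal Python sketch. The wire format (the two magic bytes `id`, the one-character opcodes, and the 80-byte size limit) is an assumption made for illustration, not a specification of Blockstack's encoding. What it shows is that the underlying blockchain only stores opaque bytes, while the virtualchain decoder decides which transactions belong to it and ignores the rest.

```python
from typing import NamedTuple, Optional

# Hypothetical wire format for this sketch: 2 magic bytes, 1 opcode byte, then payload.
MAGIC = b"id"
OPCODES = {b"?": "NAME_PREORDER", b":": "NAME_REGISTRATION",
           b"+": "NAME_UPDATE", b">": "NAME_TRANSFER"}
MAX_METADATA = 80  # assumed size limit of the transaction's metadata field


class VirtualchainOp(NamedTuple):
    opcode: str
    payload: bytes


def encode_op(opcode_byte: bytes, payload: bytes) -> bytes:
    """Pack a virtualchain operation into bytes for a transaction's metadata field."""
    if opcode_byte not in OPCODES:
        raise ValueError(f"unknown opcode {opcode_byte!r}")
    data = MAGIC + opcode_byte + payload
    if len(data) > MAX_METADATA:
        raise ValueError("operation does not fit in the metadata field")
    return data


def decode_op(metadata: bytes) -> Optional[VirtualchainOp]:
    """Return the operation if this metadata belongs to our virtualchain, else None.
    Nodes of the underlying blockchain never run this; they only carry the bytes."""
    if len(metadata) < 3 or not metadata.startswith(MAGIC):
        return None  # another application's transaction: ignore it
    opcode = OPCODES.get(metadata[2:3])
    if opcode is None:
        return None  # right magic bytes but no operation we understand
    return VirtualchainOp(opcode, metadata[3:])
```

For example, `decode_op(encode_op(b"+", b"werner.id"))` yields `VirtualchainOp(opcode='NAME_UPDATE', payload=b'werner.id')`, while metadata written by an unrelated application decodes to `None`.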
The rules for accepting or rejecting virtualchain operations are defined in the specific
virtualchain, e.g., a virtualchain which defines a single state machine implementing
operations for a global naming system. Operations accepted by rules defined in our
virtualchain are processed to construct a database that stores information on the global
state of the naming system along with state changes at any given blockchain block.
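The rule-driven state machine described above can be sketched as follows. The three operations (register, update, transfer) and their acceptance rules are simplified, hypothetical stand-ins for the actual BNS rules; the sketch only illustrates that rejected operations are skipped rather than treated as errors, and that any node replaying the same ordered operations reaches the same state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NameRecord:
    owner: str
    zonefile_hash: str


@dataclass
class NameDB:
    """Global naming state rebuilt by replaying virtualchain operations in block order."""
    names: Dict[str, NameRecord] = field(default_factory=dict)
    history: List[Tuple[int, str, str]] = field(default_factory=list)  # (block, op, name)

    def apply(self, block: int, op: str, name: str, sender: str,
              arg: Optional[str] = None) -> bool:
        """Apply one operation if the (hypothetical) rules accept it; return whether it was."""
        rec = self.names.get(name)
        if op == "register" and rec is None:
            self.names[name] = NameRecord(owner=sender, zonefile_hash=arg or "")
        elif op == "update" and rec is not None and rec.owner == sender:
            rec.zonefile_hash = arg or ""
        elif op == "transfer" and rec is not None and rec.owner == sender and arg:
            rec.owner = arg  # arg is the new owner
        else:
            return False  # rejected: the operation is ignored, not an error
        self.history.append((block, op, name))  # state change recorded per block
        return True


def replay(ops: List[Tuple[int, str, str, str, Optional[str]]]) -> NameDB:
    """Every node that replays the same totally ordered operations gets the same NameDB."""
    db = NameDB()
    for op in ops:
        db.apply(*op)
    return db
```

With `ops = [(100, "register", "werner.id", "alice", "h1"), (101, "update", "werner.id", "bob", "h2")]`, `replay(ops)` accepts the registration and ignores Bob's update, because Bob does not own the name.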
Layer 3: Storage
The top-most layer (layer-3) is the storage layer, which hosts the actual data values and
is part of the data plane. All stored data values are signed by an owner key defined in
the control plane. By storing data values outside of the blockchain, Blockstack allows
values of arbitrary size and allows for a variety of storage backends. Users do not
need to trust the storage layer, since they can verify the integrity of the data using the control plane.
Our design benefits from the performance and reliability of the backend cloud storage
systems used and offers comparable performance to traditional internet services.
Blockstack implements a decentralized naming system, called the Blockchain Name
System (BNS), by defining operations in a new virtualchain and storing discovery data
in a peer network called the Atlas Network (Section 5). Our virtualchain uses the
underlying blockchain to achieve consensus on the state of BNS and binds names to
data records. Relying on the consensus protocol of the underlying blockchain, our
virtualchain can provide a total ordering for all operations supported by BNS, like
name registrations, name updates and name transfers. Our virtualchain represents the
global state of BNS, including who owns a particular name and what data is associated
with a name. We present more details on these components in the next sections.
blockchain. Our experience of running a production network on Namecoin revealed
certain security and reliability issues that highlight the need for using the largest and
most secure blockchain network [17].
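[Figure: DNS lookup compared with BNS lookup. In DNS, the End-user queries DNS Root Servers and TLD Servers (e.g., .com, .edu). In BNS, TLDs (e.g., .app, .id) are defined on a blockchain and synced to a Peer Network that serves the End-user.]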
Pricing Functions: Anyone can create a namespace or register names in a namespace,
as there is no central party to stop someone from doing so. Pricing functions
define how expensive it is to create a namespace or to register names in a namespace.
Defining intelligent pricing functions is a way to prevent “land grabs”, i.e., to stop people
from registering a lot of namespaces/names that they don’t intend to actually use. BNS
has support for sophisticated pricing functions. For example, we created a .id namespace
in our implementation of BNS with a pricing function where (a) the price of a
name drops as name length increases and (b) introducing non-alphabetic characters
in names also drops the price. With this pricing function, the price of john.id >
johnadam.id > john0001.id. The function is generally inspired by the observation that
short, alphabetic-only names are considered more desirable in namespaces like the
one for Twitter usernames. It’s also possible to create namespaces where name registrations
are free. Further, we expect that in the future there will be a reseller market
for names, just as there is for DNS. A detailed discussion of pricing functions is out of
the scope of this whitepaper, and the reader is encouraged to see [15] for more details
on pricing functions and name squatting problems in decentralized naming systems.
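The shape of the .id pricing function can be illustrated with a toy Python version. The base price, the inverse-length rule, and the flat discount for non-alphabetic names are hypothetical parameters chosen only so that the ordering john.id > johnadam.id > john0001.id from the text holds; they are not the actual BNS price schedule, and the units are arbitrary.

```python
def name_price(name: str, base: int = 6400, nonalpha_discount: int = 4) -> int:
    """Hypothetical pricing: longer names and names with non-alphabetic
    characters are cheaper. Units are arbitrary."""
    label = name.split(".")[0]      # drop the namespace suffix, e.g. ".id"
    price = base // len(label)      # price falls as the name gets longer
    if not label.isalpha():
        price //= nonalpha_discount  # discount for digits, '_' and similar
    return max(price, 1)            # this sketch never makes a name free
```

With these parameters, `name_price` returns 1600, 800, and 200 for john.id, johnadam.id, and john0001.id. A namespace with free registrations would simply use a function that returns 0.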
Blockstack uses BNS as the default naming system. BNS is implemented by defining
a state machine and rules for state transitions in a new virtualchain. We store zone files
in a new peer network, called the Atlas Network. We present the details for our BNS
implementation with virtualchains and Atlas in Section 4 and Section 5 respectively.
Like names, namespaces also have a pricing function [7]. To start the first namespace on
Blockstack, the .id namespace, we paid 40 bitcoins ($10,000 at the time) to the network.
This shows that even the developers of this decentralized system have to follow the rules
and pay appropriate fees.
BNS already provides public key associations with domain names and all domains,
by default, get certificates. While efforts like Let’s Encrypt [19] are reducing the cost
of obtaining digital certificates and encouraging more websites to enable secure connections,
a vast majority of the internet still runs on insecure connections. If the naming
system binds public keys by default, then all websites have security certificates and
security is on by default. In BNS, domain names can serve as memorable identifiers for
public keys. The names themselves make no implication about identity and are used as
memorable identifiers only. Third-party attestations can be attached to the memorable
name later on. Further, all BNS nodes have access to a single global state, so any key
revocations or state changes to public key mappings cannot be hidden from any user.
4 Virtualchain
Public blockchains are becoming a universal network service. However, it’s hard to
make consensus-breaking changes to production blockchain networks. Introducing new
features directly in a blockchain requires everyone on the network, including miners, to
upgrade. These upgrades potentially break consensus and cause forks [18]. Our experience
with the Namecoin blockchain shows that starting new, smaller blockchains leads
to security problems, like reduced computational power needed to attack the network,
and should be avoided when possible [17]. To overcome this, we created virtualchains,
virtual blockchains for creating arbitrary state machines on top of already-running
blockchains. Virtualchains, like virtual machines, make it possible to migrate (from
one blockchain to another) and improve fault tolerance. We previously used virtualchains
to migrate a production network from Namecoin to Bitcoin [20]. The migration showed
that virtualchains could be used to cope with failures of the underlying blockchain.
Blockchains provide a totally-ordered, tamper-resistant log of state transitions. New
applications can store a log of all state changes in a public blockchain, such as
Bitcoin [21], Litecoin [22], or Ethereum [10]. By using the blockchain as a shared
communication channel, these applications can then bootstrap global state in a secure,
decentralized manner, since every node on the network can independently construct the
same state.
However, there are two key challenges to using blockchains as a building block for
decentralized applications and services:
1. First, a blockchain can fail, i.e., it can go offline, or its consensus mechanism can
become “centralized” by falling under the de facto control of a single entity. To
tolerate such failures, it should be possible to migrate application state across
blockchains efficiently.
2. The second challenge is that the application’s log can be forked and corrupted
if the underlying blockchain forks. Under a blockchain fork, nodes on different
forks will write and read different events. The blockchain may drop and re-order
transactions when the forks resolve, causing bootstrapping nodes to construct
[Figure: Virtualchain layering. Control Plane: the Bitcoin Blockchain carries transactions with (op_code, hash) entries, and the Virtualchain, a virtual blockchain, interprets some of them as (name_op, hash) operations. Data Plane: Storage Drivers.]
[Figure 5: Consensus hash calculation. V_n = Merkle(tx ∈ b_n) over the virtualchain transactions in block b_n; P_n = {CH(p) | p = n - 2^i}, i.e., CH(n-1), CH(n-2), CH(n-4), ...; CH(n) = Hash(V_n + P_n).]
power is controlled by honest peers [18]. This means that most of the time, transactions
are very likely to be durable and linearizable after a constant number of blocks
(confirmations) have been appended on top of them. We use these properties to implement
fork*-consistent replicated state machines (RSMs) on top of public blockchains.
Application nodes read the blockchain to construct state machine replicas and submit
new transactions to the blockchain to execute state transitions.
Consensus Hashes: To make forward progress, nodes read new blockchain transactions
and determine whether or not each transaction of the underlying blockchain
represents a valid state transition in virtualchain. Since anyone can write transactions
and they can get arbitrarily delayed, nodes must be able to filter transactions (and
associated state transitions) and ignore transactions that relate to a fork that they’re not
interested in. We achieve this by requiring that the current consensus hash is announced
in new transactions.
A consensus hash is a cryptographic hash that each node calculates at each block.
It is derived from the accepted state transitions in the last-processed block and a
geometric series of prior-calculated consensus hashes. Figure 5 shows this process. Let
$tx \in b_n$ be the sequence of transaction logs found in block $b_n$, let $\mathrm{Merkle}(tx \in b_n)$ be a
function that calculates the Merkle tree root over these transactions, and let $\mathrm{Hash}(x)$
be a cryptographic hash function. Then, we define $CH(n)$ to be the consensus hash at
block $n$, where

$$V_n = \mathrm{Merkle}(tx \in b_n) \qquad (1)$$

$$P_n = \{\, CH(p) \mid p = n - 2^i,\ i \ge 0 \,\}$$

$$CH(n) = \mathrm{Hash}(V_n + P_n)$$

Block $b_0$ contains the first log entry, while $P_n$ is the geometric series of prior consensus
hashes starting from $b_{n-1}$, i.e., the consensus hash for the previous block, two blocks
ago, four blocks ago, etc.
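A minimal Python sketch of the consensus hash recurrence follows. It assumes SHA-256 for both Hash and the Merkle tree, duplicates the last node on odd Merkle levels, and interprets "+" as concatenating hex digests; the encoding actually used by virtualchain may differ. The usage shows the property that matters: changing one block's transactions changes that block's consensus hash and every later one that transitively includes it.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(txs: List[bytes]) -> str:
    """V_n: Merkle root over a block's virtualchain transactions.
    Assumptions: SHA-256, odd levels duplicate their last node, empty block hashes b''."""
    if not txs:
        return sha256_hex(b"")
    level = [sha256_hex(tx) for tx in txs]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def prior_heights(n: int) -> List[int]:
    """Heights in P_n: n-1, n-2, n-4, ... (p = n - 2^i), keeping only p >= 0."""
    heights, i = [], 0
    while n - 2 ** i >= 0:
        heights.append(n - 2 ** i)
        i += 1
    return heights


def consensus_hash(n: int, txs: List[bytes], ch: Dict[int, str]) -> str:
    """CH(n) = Hash(V_n + P_n), with '+' taken here as string concatenation."""
    v_n = merkle_root(txs)
    p_n = "".join(ch[p] for p in prior_heights(n))
    return sha256_hex((v_n + p_n).encode())
```

Usage: compute the hashes block by block, e.g. `for n in range(6): ch[n] = consensus_hash(n, [f"tx{n}".encode()], ch)` starting from `ch = {}`. Because `prior_heights(8)` is `[7, 6, 4, 0]`, each CH(n) commits to only O(log n) earlier hashes, yet through them to the whole history.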
Users include their latest known CH(n) in each transaction they submit through
their clients, and applications ignore state transitions with “stale” (too old) or unknown
consensus hashes. This way, applications ignore forks of their own log, and application
users (or the clients they’re using) can tell when to retry lost transactions (announcing
state transitions). In doing so, consensus hashes preserve the join-at-most-once
property of fork*-consistency: an application will accept a state transition with
CH(n) only if it has accepted all the prior state-transitions that derived CH(n).
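The filtering rule can be sketched as a freshness check on the consensus hash a transaction announces. The window of `max_age` blocks is a hypothetical parameter; the text only says that stale or unknown consensus hashes are ignored, not how old "too old" is.

```python
from typing import Dict


def is_fresh(announced: str, tip: int, known: Dict[int, str], max_age: int = 6) -> bool:
    """Accept a state transition only if the consensus hash it announces equals CH(h)
    for some recent block h in [tip - max_age, tip] of this node's own history.
    Unknown hashes (another fork) and stale ones (older than max_age) are rejected."""
    for h in range(max(0, tip - max_age), tip + 1):
        if known.get(h) == announced:
            return True
    return False
```

A client whose transaction was rejected as stale can simply resubmit it with its latest CH(n), which is how users can tell when to retry lost transactions.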
Fast Queries: Not all users will have a copy of the full blockchain on their machine.
We use a protocol for fast queries that is useful for creating “lightweight nodes” that
do not need blockchain or state replicas. Instead, they can query highly-available but
untrusted “full nodes” (which have a full copy of the blockchain) as needed. For example,
Blockstack’s virtualchain uses this feature to implement a Simple Name Verification
(SNV) protocol [26].
For fast queries, application users obtain $CH(n)$ from a trusted node, such as one
running on the same host. A user can then use this trusted $CH(n)$ to query previous
state transitions from untrusted nodes in a logarithmic amount of time and space. To
do so, it iteratively queries and verifies $P_n$ and $\mathrm{Merkle}(tx \in b_n)$ using $CH(n)$ until
it finds $CH(n')$ and $\mathrm{Merkle}(tx \in b_{n'})$, where $b_{n'}$ is the block that contains the state
transition to query. Once it has $\mathrm{Merkle}(tx \in b_{n'})$, it can ask for and verify the previous
state transitions ($tx \in b_{n'}$).
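The backward walk can be sketched in Python as below. It reuses the assumed encoding CH(n) = SHA-256(V_n concatenated with the hashes in P_n), models the untrusted full node as a dictionary (`served`, a hypothetical name) mapping each height to (V_n, P_n), and picks at each hop the largest jump n - 2^i that does not overshoot the target. This is an illustration of the idea, not Blockstack's SNV protocol.

```python
import hashlib
from typing import Dict, List, Tuple


def h(s: str) -> str:
    """Assumed Hash(): SHA-256 over a UTF-8 string, hex-encoded."""
    return hashlib.sha256(s.encode()).hexdigest()


def prior_heights(n: int) -> List[int]:
    """Heights referenced by P_n: n-1, n-2, n-4, ... (p = n - 2^i, p >= 0)."""
    out, i = [], 0
    while n - 2 ** i >= 0:
        out.append(n - 2 ** i)
        i += 1
    return out


def build_chain(roots: List[str]) -> Dict[int, str]:
    """CH(n) = Hash(V_n + P_n) for every height, given each block's Merkle root V_n."""
    ch: Dict[int, str] = {}
    for n, v in enumerate(roots):
        ch[n] = h(v + "".join(ch[p] for p in prior_heights(n)))
    return ch


# What an untrusted full node serves for block n: (V_n, the hashes in P_n).
Served = Dict[int, Tuple[str, List[str]]]


def verify_back_to(target: int, n: int, trusted_ch: str, served: Served) -> Tuple[str, int]:
    """Starting from a trusted CH(n), walk back to block `target` (target <= n),
    verifying every hop. Returns the verified V_target and the number of hops."""
    current, expected, hops = n, trusted_ch, 0
    while True:
        v, p_list = served[current]
        if h(v + "".join(p_list)) != expected:
            raise ValueError(f"data served for block {current} does not match its consensus hash")
        if current == target:
            return v, hops
        heights = prior_heights(current)
        # Largest jump 2^i that does not overshoot: the remaining distance more than halves.
        nxt = min(p for p in heights if p >= target)
        expected = p_list[heights.index(nxt)]  # trusted now, because P_current was verified
        current, hops = nxt, hops + 1
```

With 100 blocks, `verify_back_to(3, 99, ch[99], served)` needs only two hops (99 to 35 to 3). Once V_3 is verified, the individual transactions of block 3 can be checked against it with ordinary Merkle proofs.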
Blockchain Fork Detection and Recovery: If the transaction logs never retroactively
fork, the application logic and consensus hashes can preserve the legitimate-request
property of fork*-consistency. Retroactive forks in proof-of-work blockchains
are highly unlikely, but they can occur since an entity can (theoretically) come up with
a longer blockchain with a different transaction history of old blocks (called a “deep
chain reorg”). Short-lived forks, on the other hand, are fairly common and are not an
issue for applications/services built with virtualchains. Nodes avoid short-lived forks
by only accepting sufficiently-confirmed transactions. Applications may increase the
number of required confirmations to decrease the likelihood of loss or reordering, e.g.,
Blockstack requires 10 confirmations (in the Bitcoin blockchain).
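The confirmation rule is simple to state in code. This sketch uses the common Bitcoin convention that a transaction in the tip block has one confirmation, together with the threshold of 10 from the text; the function names are illustrative.

```python
from typing import List, Tuple

REQUIRED_CONFIRMATIONS = 10  # the threshold the text gives for Blockstack on Bitcoin


def confirmations(tx_height: int, tip_height: int) -> int:
    """A transaction in the tip block has 1 confirmation; each later block adds one."""
    return tip_height - tx_height + 1 if tip_height >= tx_height else 0


def confirmed(txs: List[Tuple[str, int]], tip_height: int,
              required: int = REQUIRED_CONFIRMATIONS) -> List[str]:
    """Keep only the transaction ids buried deeply enough to survive short-lived forks."""
    return [txid for txid, height in txs if confirmations(height, tip_height) >= required]
```

Raising `required` trades latency for a lower chance that an accepted transaction is later dropped or reordered.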
To detect deep chain reorgs, a node runs multiple processes that subscribe to a
geometric series of prior block heights. If a process at a lower height derives a different
consensus hash than one from a higher height, then a blockchain fork might have oc-
curred, and all processes at higher heights have potentially-divergent state. This means
all running nodes may be in a separate fork set from bootstrapping nodes.
We can automate deep reorg discovery, but reconciling the fork sets requires human
intervention, since irreversible actions taken by the application may be based on now-lost
state transitions. Fortunately, long-lived forks are rare and severe enough to be widely
noticed [27] [28] [29]. This means that when they happen, end-users or app developers
can determine which transactions were affected, and re-send state-transitions.
[Figure: Migrating from Blockchain A to Blockchain B with migrate_to(B, state at b_n).]
5 Atlas Network
Blockchains have limited bandwidth and cannot store much data. Every node on the
network has a copy of the data stored on blockchains, and blockchains typically grow linearly
with time, e.g., the Bitcoin blockchain grew from 14GB to 120GB between 2014 and
2017 [31]. In our architecture, only pointers to data values are kept in the blockchain;
peer-networks are used as additional storage. In the Blockstack implementation, the
peer-networks store zone files for BNS (these zone files are identical to DNS zone files).
Using peer networks significantly increases the storage capacity but comes with other
challenges: traditional peer-networks are susceptible to Sybil attacks [32] and are not a
reliable source of data, especially under high churn.
In peer networks, participating nodes are equally privileged and collaborate to perform
a function or provide a service. Peer networks were popularized by file sharing
networks like Napster in 1999 [33]. Nodes in a peer network maintain a connection
to a subset of other peers on the network and these connections can be structured or
unstructured (random connections to peers). In our architecture, we use peer networks
for content discovery. Pointers to large data files are stored in peer networks, while the
actual data resides on storage backends (Section 6).
The reliability of the applications and services running on our internet architecture
depends on the reliability of the blockchain layer and discovery/storage peers. Out of
the different layers, the peer networks used for discovery are the most vulnerable to
reliability issues (cloud storage providers have 99.9% uptime SLAs [34] and blockchains
are fully-replicated across peers). Theoretically, any person or company can decide to
run a (centralized) index of discovery data for their particular app/service. Apps can
also choose to index/mirror only a particular namespace (TLD), and they don’t have
to index the pointers to all data. This helps with scalability. Let’s say there are m
namespaces with n name-value pairs in each namespace. Instead of indexing O(m × n)
records, you can index O(n) records, and n for your namespace could be significantly
smaller. However, realistically, the global Blockstack network should have at least one, if
not more, default discovery service for all data in addition to any specialized app-specific
discovery services. Further, the global discovery service cannot violate the trust-to-trust
principle and cannot be centralized. This implies the need to use decentralized peer-to-
peer networks for content discovery.
Challenges with Peer Networks: Peer networks are well studied in distributed
systems [35, 8] and researchers have identified several challenges with peer networks.
data every so often to keep it available. Data sources can go off-line before repub-
lishing [41]. Structured peer networks can split into one or more disjoint networks
due to partitions, and re-join later on. This can lead to inconsistent state; some
clients can see one value for the key, and other clients can see a different value.
• Junk Data Writes: Without some rate-limiting or access-control mechanism,
peer networks have no way to limit the amount of data inserted. An adversary
can flood the peer network with lots of garbage data and knock nodes off-line.
• Node Eclipse Attack: In structured peer networks, an attacker can take over
the neighbors of all nodes storing a particular key/value pair and effectively censor
nodes/keys from the network. Such Sybil attacks are a general problem for structured
peer networks with no good solutions available without requiring centralized
gatekeepers or human input on peer connections [42].
For BNS, the size of individual zone files is fairly small (<4KB) and the total space
needed to store them increases linearly with the number of domain registrations. Currently,
it takes only 300MB to store all zone files of the 70,000 domains registered on BNS,
and 100GB of space can store zone files for all 250 million ICANN domains (which is
smaller than the size of the current Bitcoin blockchain). Inspired by the need to store
the (small-sized) zone files of BNS, we designed a new peer network called the Atlas
Network. The Atlas network solves a particular case of decentralized storage using peer
networks–the case where:
All Atlas nodes maintain a 100% state replica, and they organize into an unstructured
overlay network. The unstructured approach is easier to implement, has no overhead
for maintaining routing structure, and is resilient against targeted node attacks.
When a new Atlas node boots up, it first gets the index of all data keys and hashes of
values stored in the blockchain. After getting the index, Atlas nodes talk to their peers
to fetch key/value pairs they don’t have. The Atlas network implements a K-regular
random graph. Each node selects K other nodes at random to be its neighbors using
the Metropolis-Hastings Random Walk algorithm with delayed acceptance (MHRW-
DA [43]), and regularly asks them for the set of key/value pairs they have. Peers pull
missing key/value pairs in rarest-first order to maximize availability, i.e., new key/value
pairs written to the network are given preference for propagation through the network.
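The rarest-first pull order can be sketched as below. Each neighbor's advertised inventory is modeled as a plain set of keys (for BNS, zone file hashes); how inventories are exchanged over the network is left out, and breaking ties by key is an arbitrary choice for determinism. Keys held by few neighbors, typically newly written ones, are fetched first.

```python
from collections import Counter
from typing import Dict, List, Set


def rarest_first(local: Set[str], neighbor_inventories: Dict[str, Set[str]]) -> List[str]:
    """Order the keys this node is missing so that those held by the fewest
    neighbors come first; ties are broken by key for determinism."""
    counts: Counter = Counter()
    for inventory in neighbor_inventories.values():
        counts.update(inventory - local)  # count only keys we do not have yet
    return sorted(counts, key=lambda k: (counts[k], k))
```

For a node holding only `a` whose neighbors advertise `{a, b, c}`, `{b, c}`, and `{b, d}`, the pull order is `d`, `c`, `b`: the key only one neighbor holds is fetched before it can disappear.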
In addition to storing key/value pairs locally, peers can also write them to remote
backup locations (e.g., a service like Dropbox or S3) for additional protection against
data loss. When a peer receives a missing key/value pair, it pushes it to its immediate
neighbors that don’t have it yet.
Because Atlas nodes already know the hashes of the zone files from the blockchain, no one
can upload invalid data. Data is replicated on O(N) nodes instead of only on a small subset of
nodes (typical of DHT-based networks). The Atlas network makes censoring attacks
expensive. Censoring the entire network requires attacking O(N) nodes. By contrast,
only O(log N) DHT nodes need to be taken over to censor a key/value pair for everyone.
Even then, the victim node will detect the censorship unless the attacker also eclipses
the victim’s Bitcoin node (which requires building a fraudulent blockchain fork with
sufficient proof-of-work). We believe that the Atlas network is a significant step forward
towards having a reliable, hard-to-censor, and decentralized peer network.
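The validation rule that keeps junk data out of Atlas can be sketched as a store that only accepts zone files whose hash already appears in the index built from the blockchain. SHA-256 is an assumption here; the sketch does not claim to match the hash function BNS records on-chain, and the class name is hypothetical.

```python
import hashlib
from typing import Dict, Set


class ZonefileStore:
    """Accept a zone file only if its hash was already announced on the blockchain.
    Assumption: SHA-256 hex digests stand in for the hash BNS actually records."""

    def __init__(self, blockchain_hashes: Set[str]) -> None:
        self.expected = set(blockchain_hashes)  # index built from the blockchain at boot
        self.files: Dict[str, bytes] = {}

    def missing(self) -> Set[str]:
        """Hashes we know about but still need to fetch from peers."""
        return self.expected - set(self.files)

    def offer(self, data: bytes) -> bool:
        """Store data pushed by a peer; reject anything no blockchain entry vouches for."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.expected:
            return False  # junk or tampered data
        self.files[digest] = data
        return True
```

Because the set of acceptable hashes is fixed by paid blockchain transactions, flooding the network with garbage costs the attacker fees rather than costing honest nodes storage.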
In our production deployment, between September 2015 and November 2016, we used a
Kademlia-based DHT network. We didn’t notice any explicit node eclipse attacks, but
we did encounter partitions of the DHT overlay where some nodes hosted in Hong Kong
and Europe would end up on a different partition. Churn is a general problem with
structured DHTs. Our DHT nodes were programmed not to accept data writes unless
a hash of the data is present in the blockchain, i.e., someone has paid a fee to gain
access to write data. The DHT-based discovery network served as an acceptable initial
design, but with a growing network, the daily and hourly churn became a bigger issue.
The Blockstack implementation switched to the Atlas network from the DHT-based
discovery network in Fall 2016, and since November 2016, we have been distributing
BNS zone files using the Atlas network.
Network Partitions: The Atlas network is more reliable than the previous DHT
network. For our DHT deployment, we frequently ran into network partition issues
where some nodes, e.g., in Hong Kong, would get disconnected from the “mainline” DHT.
Between September 2015 and September 2016, there were at least 7 major incidents
where we had to work with our community to restore network partitions in our DHT
deployment. Since moving entirely to the Atlas network, between November 2016 and
the time of this writing (May 2017), we’ve had 0 incidents of network partitions or any
other network outage. In fact, there is no concept of a network partition on the network
since Atlas is unstructured and all nodes have a full replica.
Node Recovery: Atlas nodes can recover from failures on their own. If the local
index of the Atlas data becomes corrupt, the nodes can reconstruct it from the
blockchain data. Nodes can also re-fetch all zone files in case of a data loss. We ran
an experiment where we intentionally destroyed zone files on Atlas nodes, and 100% of
our nodes fully recovered from a complete data loss within hours.
The Atlas network is self-healing in that aspect and can recover from failures even if
very few copies of data remain on the peer network.
[Figure 7: Overview of Gaia. Virtualchain Layer: blocks N-5 through N store (name, hash) pairs. Discovery Layer: a Peer Node in the Peer Network holds the full index and answers lookup(hash). Storage Layer: Dropbox, Amazon S3, Google Drive, and a FreeNAS Server hold encrypted data, reached via lookup(URI).]
Blockstack has the potential to release users from these data silos by giving users
access to a decentralized storage system, called Gaia, that provides performance
comparable to centralized cloud providers. Users can log in to apps and services by using
blockchain-based decentralized identity [44] and save data generated by apps/services
on storage backends owned by the user (instead of the service provider). Gaia’s design
philosophy is to reuse existing cloud providers and infrastructure in a way that end-users
don’t need to trust the underlying cloud providers. We treat cloud storage providers
(like Dropbox, Amazon S3, and Google Drive) as “dumb drives” and store encrypted
and/or signed data on them. The cloud providers, like Dropbox, have no visibility into
users’ data; they only see encrypted data blobs. Further, since the associated public
keys or data hashes are discoverable through the blockchain channel, cloud providers
cannot tamper with user data.
In Gaia, the user’s zone file contains a URI record that points to the data, and the
data is constructed to include a signature from the user’s private key. Writing the data
involves signing and replicating the data (but not the zone file), and reading the data
involves fetching the zone file and data, verifying that hash(zonefile) matches the hash
in the blockchain, and verifying the data’s signature with the user’s public key. This
allows writes to be as fast as the signature algorithm and underlying storage system
allow, since updating the data does not alter the zone file and thus does not require any
blockchain transactions. However, readers and writers must employ a data versioning
scheme to avoid consuming stale data.
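The text leaves the versioning scheme open. One simple possibility, sketched here as an assumption rather than Gaia's actual design, is for writers to attach a monotonically increasing version number to each value and for readers to remember the highest version they have accepted per key. For this to be safe, the version must be covered by the owner's signature, which the sketch does not model.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Envelope:
    key: str
    version: int  # hypothetical counter chosen by the writer, covered by the signature
    payload: bytes


class VersionedReader:
    """Reject data that is not newer than what this reader has already accepted."""

    def __init__(self) -> None:
        self.latest: Dict[str, int] = {}

    def accept(self, env: Envelope) -> bool:
        """Return True only for a strictly newer version of env.key."""
        if env.version <= self.latest.get(env.key, -1):
            return False  # stale or replayed copy, e.g. from a lagging replica
        self.latest[env.key] = env.version
        return True
```

Since writes go to several backends without a blockchain transaction, a reader may briefly see an older replica; the version check makes that visible instead of silently consuming stale data.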
Figure 7 shows an overview of Gaia. We show an example encrypted data blob with
three replicated copies at Dropbox, Google Drive, and a FreeNAS Server (and not on
Amazon S3). In our Blockstack implementation, we have drivers for individual cloud
providers like Dropbox and S3, and integrate them as storage backends. This hides
the individual APIs for storage backends and exposes a simple PUT/GET interface to
Blockstack users. Looking up data for a name, like werner.id, works as follows:
1. Look up the name in the virtualchain to get the (name, hash) pair.
2. Look up the hash in the Atlas network to get the respective zone file (all
peers in the Atlas network have the full replica of all zone files).
3. Get the storage backend URI from the zone file and look up the URI to connect to
the storage backend.
4. Read the data (decrypt it if needed and if you have the access rights) and verify
the respective signature or hash.
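The four lookup steps can be traced with in-memory dictionaries standing in for the virtualchain, the Atlas network, and the storage backends. The zone file hash function (SHA-256), the URI record layout, and the caller-supplied `verify_sig` callback are assumptions; a real client would check the signature with the name owner's public key and decrypt the data when needed, neither of which this stdlib-only sketch implements.

```python
import hashlib
from typing import Callable, Dict, Tuple


def uri_from_zonefile(zonefile: str) -> str:
    """Return the target of the first URI record (assumed form: `owner URI prio weight "target"`)."""
    for line in zonefile.splitlines():
        parts = line.split()
        if "URI" in parts:
            return parts[-1].strip('"')
    raise ValueError("zone file has no URI record")


def lookup(name: str,
           virtualchain: Dict[str, str],
           atlas: Dict[str, str],
           storage: Dict[str, Tuple[bytes, bytes]],
           verify_sig: Callable[[bytes, bytes], bool]) -> bytes:
    # Step 1: the virtualchain maps the name to its zone file hash.
    zonefile_hash = virtualchain[name]
    # Step 2: any Atlas peer can serve the zone file; check it against the blockchain hash.
    zonefile = atlas[zonefile_hash]
    if hashlib.sha256(zonefile.encode()).hexdigest() != zonefile_hash:
        raise ValueError("zone file does not match the hash recorded in the blockchain")
    # Step 3: the zone file's URI record says which storage backend holds the data.
    data, signature = storage[uri_from_zonefile(zonefile)]
    # Step 4: verify the data (signature check supplied by the caller in this sketch).
    if not verify_sig(data, signature):
        raise ValueError("signature verification failed")
    return data
```

Neither the Atlas peer nor the storage backend is trusted here: a tampered zone file fails the hash check in step 2, and tampered data fails the check in step 4.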
network). We’re exploring the option to pack multiple virtualchain transactions into a
single blockchain transaction [45] for addressing blockchain scalability. This can enable
us to register several hundreds of millions of end-users. Scaling Blockstack to billions
of users in practice will likely uncover scalability issues that are not obvious right now
and addressing these challenges is an area of future work.
7 Conclusion
We present Blockstack, a new decentralized internet secured by blockchains. Blockstack
provides a full stack to developers for building decentralized applications including ser-
vices for identity, discovery, and storage. Blockstack can introduce new functionality
without modifying the underlying blockchains and can survive the failure of underlying
blockchains. The design of Blockstack is informed by 3 years of production experience
from one of the largest blockchain-based production systems to date. Our performance
results show that Blockstack can give comparable performance to cloud services on
the traditional internet and only introduces a small CPU overhead. We’ve released
Blockstack as open-source [7].
Acknowledgements
Blockstack was started by Muneeb and Ryan in 2014 and over the years many people
have contributed to it. We’d like to thank Larry Salibra, Guy Lepage, Patrick Stanley,
Aaron Blankstein, John Light, and 2,500+ open-source community members for their
contributions. We will update this list as we release new versions of the white paper.
References
[1] L. Newman, “What we know about fridays massive east coast internet outage,” Oct. 2016. https:
//www.wired.com/2016/10/internet-outage-ddos-dns-dyn/.
[2] S. Rosenblatt, “Fake Turkish site certs create threat of bogus Google sites,” Jan. 2013. http://cnet.co/2oArU6O.
[3] N. Perlroth, “Yahoo says hackers stole data on 500 million users in 2014,” Sept. 2016. http://nyti.ms/2oAqn0G.
[4] “Blockstack website,” 2017. http://blockstack.org.
[5] “Gotenna mesh networking.” https://www.gotenna.com.
[6] J. Nelson, M. Ali, R. Shea, and M. J. Freedman, “Extending existing blockchains with virtualchain,” in Workshop on Distributed Cryptocurrencies and Consensus Ledgers (DCCL’16), (Chicago, IL), June 2016.
[7] “Blockstack source code release v0.14,” 2017. http://github.com/blockstack/blockstack-core.
[8] R. Hasan, Z. Anwar, W. Yurcik, L. Brumbaugh, and R. Campbell, “A survey of peer-to-peer
storage techniques for distributed file systems,” in Proceedings of the International Conference on
Information Technology: Coding and Computing (ITCC’05) - Volume 02, ITCC ’05, pp. 205–213,
2005.
[9] “Namecoin.” https://namecoin.info.
[10] V. Buterin, “A next-generation smart contract and decentralized application platform,” tech. rep.,
2017. https://github.com/ethereum/wiki/wiki/White-Paper.
[11] M. Ali, J. Nelson, R. Shea, and M. J. Freedman, “Bootstrapping trust in distributed systems with
blockchains,” USENIX ;login:, vol. 41, no. 3, pp. 52–58, 2016.
[12] H. Balakrishnan, S. Shenker, and M. Walfish, “Semantic-Free Referencing in Linked Distributed Systems,” in Peer-to-Peer Systems II, vol. 2735 of Lecture Notes in Computer Science, pp. 197–206, Springer Berlin / Heidelberg, 2003.
[13] J. Benet, “IPFS - Content Addressed, Versioned, P2P File System,” draft, ipfs.io, 2015. https://github.com/ipfs/papers.
[14] D. Mazières, M. Kaminsky, M. F. Kaashoek, and E. Witchel, “Separating key management from file system security,” in Proc. 17th SOSP, (Kiawah Island Resort, SC), 1999.
[15] H. Kalodner, M. Carlsten, P. Ellenbogen, J. Bonneau, and A. Narayanan, “An empirical study
of Namecoin and lessons for decentralized namespace design,” WEIS ’15: Proceedings of the 14th
Workshop on the Economics of Information Security, June 2015.
[16] D. Kaminsky, “Spelunking the triangle: Exploring Aaron Swartz’s take on Zooko’s triangle,” Jan. 2011. http://dankaminsky.com/2011/01/13/spelunk-tri/.
[17] M. Ali, J. Nelson, R. Shea, and M. Freedman, “Blockstack: A global naming and storage system
secured by blockchains,” in Proc. USENIX Annual Technical Conference (ATC ’16), June 2016.
[18] J. Bonneau, A. Miller, J. Clark, A. Narayanan, J. A. Kroll, and E. W. Felten, “SoK: Research perspectives and challenges for Bitcoin and cryptocurrencies,” in 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, May 17–21, 2015, pp. 104–121, 2015.
[19] “Let’s encrypt.” https://letsencrypt.org.
[20] “Why Blockstack is migrating to the Bitcoin blockchain.” https://blockstack.org/blog/why-blockstack-is-migrating-to-the-bitcoin-blockchain.
[21] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” tech. rep., 2009. https://bitcoin.org/bitcoin.pdf.
[22] “Litecoin.” https://litecoin.org.
[23] J. Li and D. Mazières, “Beyond one-third faulty replicas in Byzantine fault tolerant systems,” in Proc. 4th USENIX/ACM Symposium on Networked Systems Design and Implementation (NSDI ’07), Feb. 2007.
[24] “Statistics of usage for Bitcoin OP_RETURN.” Retrieved from http://opreturn.org in May 2017.
[25] M. Jakobsson and A. Juels, “Proofs of work and bread pudding protocols,” in Secure Information
Networks, pp. 258–272, Springer, 1999.
[26] “Simplified name verification protocol.” http://blockstack.org/docs/light-clients.
[27] “Bitcoin Improvement Proposal 50.” https://github.com/bitcoin/bips/blob/master/bip-0050.mediawiki.
[28] “Bitcoin Improvement Proposal 66.” https://github.com/bitcoin/bips/blob/master/bip-0066.mediawiki.
[29] “List of Bitcoin CVEs.” https://en.bitcoin.it/wiki/Common_Vulnerabilities_and_Exposures.
[30] “Virtualchain source code release v0.14.1,” May 2017. http://github.com/blockstack/virtualchain.
[31] “Bitcoin blockchain size.” https://blockchain.info/charts/blocks-size.
[32] J. R. Douceur, “The sybil attack,” in Revised Papers from the First International Workshop on
Peer-to-Peer Systems, IPTPS ’01, (London, UK), pp. 251–260, Springer-Verlag, 2002.
[33] B. Carlsson and R. Gustavsson, The Rise and Fall of Napster - An Evolutionary Approach,
pp. 347–354. Springer Berlin Heidelberg, 2001.
[34] “Google Cloud Storage SLA.” Retrieved from https://cloud.google.com/storage/sla in May
2017.
[35] B. P. B, K. Bertels, and S. Vassiliadis, “A survey of peer-to-peer networks,” in 16th workshop on
circuits, systems, and signal processing (proRISC), 2005.
[36] J. Liang, R. Kumar, and K. Ross, “The KaZaA Overlay: A Measurement Study,” Sept. 2004.
[37] O. Heckmann, A. Bock, A. Mauthe, and R. Steinmetz, “The eDonkey file-sharing network,” in GI Jahrestagung (2) (P. Dadam and M. Reichert, eds.), vol. 51 of LNI, pp. 224–228, GI.
[38] E. K. Lua, J. Crowcroft, M. Pias, R. Sharma, and S. Lim, “A survey and comparison of peer-to-
peer overlay network schemes,” Commun. Surveys Tuts., vol. 7, pp. 72–93, Apr. 2005.
[39] M. J. Freedman, E. Freudenthal, and D. Mazières, “Democratizing content publication with Coral,” in Proc. 1st NSDI, (San Francisco, CA), 2004.
[40] V. Ramasubramanian and E. G. Sirer, “Beehive: O(1) lookup performance for power-law query distributions in peer-to-peer overlays,” in Proceedings of the 1st Conference on Symposium on Networked Systems Design and Implementation - Volume 1, NSDI’04, pp. 8–8, 2004.
[41] M. J. Freedman, “Experiences with CoralCDN: A five-year operational view,” in Proceedings of the 7th USENIX Conference on Networked Systems Design and Implementation, NSDI’10, 2010.
[42] C. Lesniewski-Laas and M. F. Kaashoek, “Whānau: A Sybil-proof distributed hash table,” in
Proceedings of the 7th USENIX Symposium on Networked Systems Design and Implementation
(NSDI ’10), (San Jose, CA), Apr. 2010.
[43] C.-H. Lee, X. Xu, and D. Y. Eun, “Beyond random walk and metropolis-hastings samplers:
Why you should not backtrack for unbiased graph sampling,” in Proceedings of the 12th ACM
SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling
of Computer Systems, SIGMETRICS ’12, pp. 319–330, 2012.
[44] “What is a Blockstack ID?” https://blockstack.org/docs/blockchain-identity.
[45] “Chainpoint white paper.” https://tierion.com/chainpoint.