
03 BCT Bitcoin Cryptographic Concepts


Cryptographic Concepts for Blockchain

Source: Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller, Steven
Goldfeder, “Bitcoin and Cryptocurrency Technologies: A Comprehensive
Introduction”, Princeton University Press, Princeton, NJ, 2016

Basic Cryptographic Concepts in Blockchain
 Properties of:
 Hash Functions
 Digital Signatures
 … and applications
 Build a basic cryptocurrency using these properties
Cryptography
 Cryptography provides a mechanism for securely encoding the rules of
a cryptocurrency system within the system itself.
 Prevent tampering and equivocation
 Equivocation: using ambiguity to conceal or hide facts
 Encode rules for creation of new units of the currency in a mathematical
protocol
 Bitcoin relies on a handful of relatively well-known cryptographic
constructions
 Cryptographic Hashes
 Digital Signatures
 Zero-knowledge Proofs
 Used in proposed extensions and modifications to Bitcoin
 Reveal only the required information
Hash Function
▪ A hash function is a mathematical function with
the following three properties:
▪ Its input can be any string of any size.
▪ Its output has a fixed size (assume 256 bits).
▪ Efficiently computable
▪ Output of the hash function can be computed in a
reasonable amount of time
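
A minimal Python sketch of these properties, assuming SHA-256 from the standard hashlib module as the example hash function (the helper name H and the sample inputs are made up for illustration):

```python
import hashlib

def H(data: bytes) -> bytes:
    """Hash an input of any size to a fixed 256-bit (32-byte) output."""
    return hashlib.sha256(data).digest()

short_input = b"a"
long_input = b"a" * 1_000_000  # one million bytes

# Inputs of very different sizes map to outputs of the same size.
print(len(H(short_input)) * 8)  # 256
print(len(H(long_input)) * 8)   # 256
```

Both calls return quickly, which is the "efficiently computable" property in practice.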

Cryptographic Hash Function

▪ Three additional properties for cryptographic
hash functions:
▪ Collision Resistance
▪ Hiding
▪ Puzzle Friendliness

Collision Resistance (1)
▪ A collision occurs when two distinct inputs produce the
same output.

▪ A hash function H(.) is collision resistant if nobody can find a collision.
▪ Collision Resistance: A hash function H is said to be
collision resistant if it is infeasible to find two values x and y
such that x ≠ y, yet H(x) = H(y)
Collision Resistance[2]
▪ But collisions do exist. Why?
▪ Input Space is much larger than the output space

▪ More possibilities on the input side


▪ … but can anyone find them?
How to find a collision?
▪ Hash function with 256-bit output size:
▪ Pick 2^256 + 1 distinct values
▪ Compute the hash of each of them
▪ Check whether any two outputs are equal
▪ Guaranteed to find a collision, since there are only 2^256 possible outputs
▪ Picking just 2^130 + 1 distinct values at random already produces a
collision with 99.8% probability
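
The same procedure can be tried at toy scale. The sketch below assumes a hash with only a 24-bit output, obtained by truncating SHA-256 (an illustrative stand-in, not something from the slides), and hashes distinct inputs until two of them collide. The names truncated_hash and find_collision are hypothetical.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256, standing in for a hash with a small output."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int):
    """Hash distinct inputs until two of them share an output."""
    seen = {}  # output -> input that produced it
    for i in count():
        x = str(i).encode()
        h = truncated_hash(x, bits)
        if h in seen:
            return seen[h], x, i + 1
        seen[h] = x

x, y, tries = find_collision(bits=24)
print(x, y, tries)
```

With a 24-bit output, 2^24 + 1 inputs guarantee a collision, but one typically appears after only a few thousand inputs, on the order of 2^12. The same square-root effect is why roughly 2^130 inputs suffice for a 256-bit hash.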
How to find a collision?
▪ Finding a collision by this brute-force method takes
too long to matter in practice.
▪ If a computer calculates 10,000 hashes per second, it
would take more than 10^27 years to compute 2^128 hashes.
▪ Another way of thinking about it:

Is there a faster way to find collisions?
 Consider the following Hash function:
 H(x) = x mod 2^256
 Accepts input of any size and returns a fixed-size (256-bit) output.
 Efficiently computable
 But this function just returns the last 256 bits of the input
 One collision: 3 and 3 + 2^256
 This function is not usable in practice
 For other hash functions, we do not know of such a method.
 No hash function has been proven to be collision resistant
 SHA-256 (Secure Hash Algorithm): no collisions are known
 MD5 algorithm: collisions were found after years of work; the function was
deprecated and phased out of practical use.
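
The collision for the toy function H(x) = x mod 2^256 on this slide can be checked directly. A small sketch, treating inputs as Python integers:

```python
def H(x: int) -> int:
    """Toy hash from the slide: keep only the last 256 bits of x."""
    return x % 2**256

a = 3
b = 3 + 2**256

# Two distinct inputs, one output: a collision found with no search at all.
print(a != b, H(a) == H(b))  # True True
```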

Application: Message Digests
 Use Hash outputs as a message digest.
 Message Digests help us conclude whether two messages contain
the same content, without actually examining the messages.
 Scenario:
 How can we check that a file downloaded from cloud storage is the same as
the file uploaded earlier?
 Naive approach: keep a local copy and compare it to the downloaded file.
 Inefficient and wastes storage space
 Collision-resistant hashes provide an elegant and efficient solution to this
problem.
 Store the hash of the file locally
 Download the original file
 Compare its hash with the stored hash.
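
The store-then-compare steps can be sketched as follows, assuming SHA-256 as the collision-resistant hash. A temporary file stands in for the uploaded and downloaded file, and file_digest and same_content are hypothetical helper names.

```python
import hashlib
import os
import tempfile

def file_digest(path: str) -> str:
    """SHA-256 digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def same_content(path: str, stored_digest: str) -> bool:
    return file_digest(path) == stored_digest

# A temporary file stands in for the file that is uploaded and later downloaded.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"quarterly report v1")
    path = tmp.name

stored = file_digest(path)             # kept locally before upload
unchanged = same_content(path, stored)

with open(path, "ab") as f:            # simulate the file being altered
    f.write(b"!")
changed = same_content(path, stored)
os.remove(path)

print(unchanged, changed)  # True False
```

Only the 32-byte digest has to be kept locally, regardless of the file size.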
Property 2: Hiding[1]
 Informally this property states that, if we are given the output of a Hash
function H(x) = y, there is no feasible way to figure out the input x.
 This will not hold if the number of possible inputs x is limited:
 it is easy to pre-calculate y for every possible x.
 For example, consider a coin-flip game
 Heads → H(“heads”), Tails → H(“tails”)
 Seeing the announced hash, an adversary can easily tell which of heads/tails
was flipped by pre-computing the hashes of “heads” and “tails.”
 To achieve the hiding property, there must be no value of x that is
particularly likely.
 x has to be chosen from a set that is very spread out
 What if x ∈ {“heads”, “tails”}?
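
The coin-flip attack takes only a few lines. A sketch assuming SHA-256 as H and the strings “heads” and “tails” as the only possible inputs:

```python
import hashlib

def H(s: str) -> str:
    return hashlib.sha256(s.encode()).hexdigest()

# The adversary precomputes the hash of every possible input.
table = {H(x): x for x in ("heads", "tails")}

announced = H("heads")   # the published hash of the secret coin flip
print(table[announced])  # heads
```

The lookup succeeds because the input set has only two elements, so the hash hides nothing.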
Property 2: Hiding[2]
 We can hide an input that is not spread out by concatenating (symbol
ǁ denotes concatenation) with another input that is spread out.
 Hiding: A hash function H is said to be hiding if when a secret value
r is chosen from a probability distribution that has high min-
entropy, then given H(rǁx), it is infeasible to find x.
 In information theory, min-entropy is a measure of the predictability
of an outcome.
 High min-entropy captures the intuitive idea that the distribution (of
a random variable) is very spread out.
 If r is chosen uniformly from among all strings that are 256 bits
long, then any particular string is chosen with a probability of 1/2^256,
an infinitesimally small value.
Applications: Commitments[1]
 A commitment is the digital analog of taking a value, sealing it in an
envelope, and putting that envelope on the table where everyone can
see it.
 You have committed yourself to the value inside the envelope.
 The value remains a secret from everyone else.
 Later, you can open the envelope and reveal the value that you committed to
earlier.
 Putting the sealed envelope on the table:
 The commit function is applied to a random nonce plus msg, the value being
committed to, giving the commitment com
 Opening the envelope:
 Publish the random nonce and the message msg.
 The two properties of hiding and binding ensure that the committed value
stays secret until it is revealed, and that we cannot commit to one value and
then later claim that we committed to another value.
Applications: Commitments[3]
 Commitment schemes can be implemented using a cryptographic
hash function.
 Commit(msg, nonce) := H(nonce ǁ msg), where nonce is a random
256-bit value.
 The properties required for commitment now become:
 Hiding: Given H(nonce ǁ msg), it is infeasible to find msg.
 Binding: It is infeasible to find two pairs (nonce, msg) and (nonce’,
msg’) such that msg ≠ msg’ and H(nonce ǁ msg) == H(nonce’ ǁ
msg’)
 The binding property is implied by the collision resistant property of
the underlying hash function.
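
A minimal sketch of this commitment scheme, assuming SHA-256 as H and a 256-bit nonce from Python's secrets module. The function names commit and verify are illustrative choices.

```python
import hashlib
import hmac
import secrets

def commit(msg: bytes, nonce: bytes) -> bytes:
    """com := H(nonce || msg)."""
    return hashlib.sha256(nonce + msg).digest()

def verify(com: bytes, msg: bytes, nonce: bytes) -> bool:
    """Recompute the commitment from the revealed values and compare."""
    return hmac.compare_digest(com, commit(msg, nonce))

nonce = secrets.token_bytes(32)      # random 256-bit nonce
com = commit(b"heads", nonce)        # publish com; keep nonce and msg secret

# Later: reveal nonce and msg so anyone can check the commitment.
print(verify(com, b"heads", nonce))  # True
print(verify(com, b"tails", nonce))  # False
```

Because the nonce has a fixed length, the concatenation nonce ǁ msg can be split in only one way, so the sketch needs no separator between the two parts.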
Property 3: Puzzle Friendliness
 Puzzle friendliness: A hash function H is said to be puzzle friendly
if for every possible n-bit output value y, if k is chosen from a
distribution with high min-entropy, then it is infeasible to find x
such that H(k ǁ x) = y in time significantly less than 2^n.
 Suppose we want to target the hash function to have some particular
output value y:
 if part of the input, k, has been chosen in a suitably randomized way,
 then it’s very difficult to find another value that hits exactly that target.
 Search puzzle
 A mathematical problem that requires searching a very large space to find a
solution
 A search puzzle has no shortcuts
 No way to find a solution other than searching that large space.
 A search puzzle consists of a hash function H, a puzzle-ID ‘id’ chosen from a
high min-entropy distribution, and a target set Y.
 If H has an n-bit output, then it can take any of 2^n values.
 Solving the puzzle requires finding an input (id ǁ x) such that the output
H(id ǁ x) falls within the set Y, which is typically much smaller than the set
of all outputs.
 ‘id’ is fixed, so we have to work with ‘x’
 Difficulty of the puzzle is determined by the size of Y.
 If Y is the set of all n-bit strings, then the puzzle is trivial, whereas if Y has
only one element, then the puzzle is maximally hard.
 For a puzzle friendly hash function, there’s no solving strategy
that is much better than just trying random values of x.
 So we can pose a puzzle that’s difficult to solve this way,
as long as we can generate puzzle-IDs in a suitably random way.
 This idea is used in Bitcoin mining.
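
A toy search puzzle along these lines, assuming SHA-256 as H and taking Y to be the set of outputs whose first 16 bits are zero. The shape of Y and the helper names are assumptions for illustration; the slides only require that Y be a subset of the outputs.

```python
import hashlib
import secrets
from itertools import count

def in_target(digest: bytes, zero_bits: int) -> bool:
    """Y = all outputs whose first `zero_bits` bits are zero."""
    return int.from_bytes(digest, "big") >> (256 - zero_bits) == 0

def solve(puzzle_id: bytes, zero_bits: int) -> int:
    """Try x = 0, 1, 2, ... until H(id || x) falls in Y."""
    for x in count():
        digest = hashlib.sha256(puzzle_id + x.to_bytes(8, "big")).digest()
        if in_target(digest, zero_bits):
            return x

puzzle_id = secrets.token_bytes(32)  # high min-entropy puzzle-ID
x = solve(puzzle_id, zero_bits=16)   # about 2**16 attempts on average
print(x)
```

Each extra zero bit halves the size of Y and doubles the expected work, which is how the size of Y sets the difficulty.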
The SHA-256 Hash Function
 The SHA-256 Hash function is used in Bitcoin.
 Secure Hash Algorithm
 A Hash function should work on inputs of arbitrary lengths and yield
a fixed length output.
 Merkle-Damgard Transform: A generic method that converts a
Hash function that works on a fixed length input to a Hash function
that works on inputs of arbitrary lengths.
 In common terminology, the underlying fixed-length collision
resistant hash function is called a compression function.
 It can be shown that, if the underlying compression function is
collision resistant, then the overall hash function is collision
resistant as well.
The Merkle Damgard Transform [1]
 Input to compression function = length m
 Output of compression function = length n , m > n
 The input of the hash function (any size) is divided into blocks of
length (m-n)
 Pass each block together with the output of the previous block into
the compression function
 Input length is (m – n) + n = m, which is the input length to the
compression function.

The Merkle Damgard Transform [2]
 For the first block, where there is no previous block, an initialization
vector (IV) is used.
 The IV is a standard, well-known value.
 For the last block, the input length may be < (m-n).
 The input is padded so that its length is a multiple of (m-n).
 For SHA-256, m = 768 bits, n = 256 bits and m-n = 512 bits
 The result (hash) is the output of the last block.

The Merkle Damgard Transform [3]
 To summarize:
 SHA-256 uses a compression function that takes 768-bit input
and produces 256-bit outputs.
 The block size is 512 bits.
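
The construction can be sketched with a toy compression function. This is not SHA-256 itself. The compression function below simply calls SHA-256 on the 768-bit input as a stand-in for the real one, the all-zero IV is an assumption, and the padding (a 0x80 byte, zero bytes, then the message length in bits) follows the general style SHA-256 uses.

```python
import hashlib

BLOCK = 64   # m - n = 512 bits
STATE = 32   # n = 256 bits

def compress(chaining: bytes, block: bytes) -> bytes:
    """Toy compression function: 768 bits in, 256 bits out (NOT SHA-256's real one)."""
    assert len(chaining) == STATE and len(block) == BLOCK
    return hashlib.sha256(chaining + block).digest()

def pad(msg: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 64-bit message length in bits."""
    length = (len(msg) * 8).to_bytes(8, "big")
    zeros = (-(len(msg) + 1 + 8)) % BLOCK
    return msg + b"\x80" + b"\x00" * zeros + length

def md_hash(msg: bytes, iv: bytes = b"\x00" * STATE) -> bytes:
    """Feed each block, together with the previous output, into compress."""
    state = iv
    padded = pad(msg)
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state  # output of the last block

print(md_hash(b"abc").hex())
```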

Hash Pointers and Data Structures [1]
 Hash pointer: A hash pointer is a pointer
to where data is stored together with a
cryptographic hash of the value of this data
at some fixed point in time.
 Here, a familiar data structure that uses
pointers, such as a linked list or a binary
search tree is implemented with hash
pointers instead of ordinary pointers

• Block chain: A block chain is a linked list that is built
with hash pointers instead of pointers.
Hash Pointers and Data Structures [2]
 Regular linked list:
 A series of blocks
 Each block has data as well as a pointer to the previous block in the list.
 Block chain:
 The previous-block pointer of a regular linked list is replaced with a hash
pointer.
 Each block not only tells us
 the location of the value of the previous block
 but also contains a digest of that value
 This allows us to verify that the previous block value hasn’t been changed.
 Head of the list: a regular hash pointer that points to the most recent data
block
Blockchain as a Tamper-evident log
 Tamper-evident log: A log data structure that
stores data and allows appending of data to the
end of the log.
 Any alteration to the data that appears earlier in
the log can be easily detected
• Suppose an adversary wants to tamper with data in the middle of the chain.
• The adversary’s goal is to tamper data in such a way that someone who
remembers only the hash pointer at the head of the block chain won’t be
able to detect the tampering.
• Suppose the adversary changes the data of some block k.
• Because the data changed, the hash in block k + 1, which is a hash of the entire block k, is
not going to match, since the hash function is collision resistant.
• The inconsistency between the new data in block k and the hash pointer in block k + 1
will be detected.
Blockchain as a Tamper-evident log[2]

• The adversary can continue to try and cover up this change by changing the next block’s hash
as well.
• The adversary can continue doing this for other blocks
• This strategy will fail when the adversary reaches the head of the list.
• Specifically, as long as the hash pointer at the head of the list is stored in a place where
the adversary cannot change it, the adversary will be unable to change any block
without being detected
• The hash pointer at the head of the list is a tamper-evident hash of the entire list.
• The first block is called the genesis block.
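
A minimal sketch of a hash-pointer chain and the tamper check, assuming SHA-256 and blocks modelled as (data, previous-hash) tuples. The names block_hash, append and verify are illustrative.

```python
import hashlib
from typing import List, Optional, Tuple

Block = Tuple[bytes, Optional[bytes]]  # (data, hash of previous block)

def block_hash(block: Block) -> bytes:
    data, prev = block
    # Hash the entire block: its hash pointer and its data.
    return hashlib.sha256((prev or b"\x00" * 32) + data).digest()

def append(chain: List[Block], data: bytes) -> bytes:
    """Append a block and return the new head hash pointer."""
    prev = block_hash(chain[-1]) if chain else None
    chain.append((data, prev))
    return block_hash(chain[-1])

def verify(chain: List[Block], head: bytes) -> bool:
    """Walk back from the remembered head, checking every hash pointer."""
    expected = head
    for block in reversed(chain):
        if block_hash(block) != expected:
            return False
        expected = block[1]
    return expected is None  # reached the genesis block

chain: List[Block] = []
for item in (b"genesis", b"tx1", b"tx2"):
    head = append(chain, item)

ok_before = verify(chain, head)
chain[1] = (b"tx1-forged", chain[1][1])  # adversary edits a middle block
ok_after = verify(chain, head)
print(ok_before, ok_after)  # True False
```

Only the single value head has to be stored where the adversary cannot reach it.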
Merkle Trees [1]
• A binary tree with hash pointers is known as a Merkle tree after its inventor, Ralph
Merkle.
• The blocks of data make up the leaves of the tree.
• The data blocks (say, transactions) are grouped into pairs.
• For each pair, we build a data structure that has two hash pointers, one to each of the
blocks. These data structures make up the next level of the tree.

Merkle Trees [2]
• These, in turn, are grouped into pairs.
• For each pair, a new data structure that contains the hash of each is created.
• This process continues until we reach a single block, the root of the tree.
• Only the hash pointer at the root of the tree needs to be remembered.

Merkle Trees [3]
• If an adversary tampers with some data block at the bottom of the tree, this change will
cause the hash pointer one level up to not match
• Even if he continues to tamper with other blocks farther up the tree, the change will
eventually propagate to the top.
• There the tampering is detected, because the root hash pointer is stored safely and he
cannot alter it.

Merkle Trees: Proof of Membership

To confirm Transaction D, one only needs to know H(AB), H(C), H(D), and H(EFGH).
• Concise Proof of Membership: Prove that a certain data block is a member of the
Merkle tree. The root is known.
• Required: The data block, and the blocks on the path from the data block to the root.
• The rest of the tree can be ignored as blocks on this path are enough to allow us to
verify the hashes all the way up to the root of the tree.
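
A sketch of a proof of membership for the eight-leaf example (transactions A to H), assuming SHA-256 and a leaf count that is a power of two. The helper names are illustrative, and details a production design would need (odd leaf counts, distinguishing leaves from inner nodes) are left out.

```python
import hashlib
from typing import List, Tuple

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All tree levels, leaf hashes first, root last."""
    level = [H(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the hashes along the path and compare with the known root."""
    node = H(leaf)
    for sibling, sibling_is_left in proof:
        node = H(sibling + node) if sibling_is_left else H(node + sibling)
    return node == root

leaves = [bytes([c]) for c in b"ABCDEFGH"]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, index=3)        # transaction D
print(verify(b"D", proof, root))      # True
```

For D the proof holds three hashes, H(C), H(AB) and H(EFGH), which together with D itself and the known root match the list in the caption above.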
Digital Signatures [1]
 A digital signature is supposed to be the digital analog
(equivalent) of a handwritten signature on paper.
 Two desirable properties of digital signatures:
 Only the concerned person can make his/her signature, but
anyone who sees it can verify that it’s valid.
 The signature should be tied to a particular document, so that
the signature cannot be used to indicate the signer’s agreement
or endorsement of a different document.
 For handwritten signatures, this latter property is analogous to
ensuring that somebody can’t take your signature and snip it off
one document and glue it to the bottom of another one.
Digital Signatures [2]
• generateKeys and sign are
randomized algorithms
• Generates different keys for
different people
• Verify is always deterministic
• Valid signatures must be
verifiable – basic requirement
• Sign a message with secret key,
sk.
• Later, validate that signature
over that same message using the
public key, pk.
• The signature must validate
correctly.
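
The three-algorithm interface can be illustrated with the Python standard library alone by swapping in a Lamport one-time signature, a hash-based scheme. This is not ECDSA and not what Bitcoin uses. It is chosen here only because it needs nothing beyond SHA-256. Unlike the scheme described above, its sign step is deterministic, and each key pair may safely sign only one message.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def generate_keys():
    """Randomized: 256 pairs of secrets (sk) and their hashes (pk)."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def message_bits(message: bytes):
    """The 256 bits of H(message); the hash is signed, not the message itself."""
    digest = int.from_bytes(H(message), "big")
    return [(digest >> i) & 1 for i in range(256)]

def sign(sk, message: bytes):
    """Reveal one secret per bit of H(message). A key must sign only ONE message."""
    return [sk[i][bit] for i, bit in enumerate(message_bits(message))]

def verify(pk, message: bytes, sig) -> bool:
    """Deterministic: check each revealed secret against the public key."""
    return len(sig) == 256 and all(
        H(sig[i]) == pk[i][bit] for i, bit in enumerate(message_bits(message))
    )

sk, pk = generate_keys()
sig = sign(sk, b"pay 1 coin to Alice")
print(verify(pk, b"pay 1 coin to Alice", sig))    # True
print(verify(pk, b"pay 1 coin to Mallory", sig))  # False
```

The second check fails because the signature is tied to the hash of one particular message.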
Digital Signatures [3]
 Unforgeability – It is computationally infeasible to forge signatures.
 An adversary who knows your public key (pk) and has seen your
signatures on some messages (m_1, ..., m_n) cannot forge your signature
on some message that he has not seen, i.e. message m_unseen.
 Formalized in terms of a game that is played with an adversary.
 The adversary claims that he can forge signatures
 The challenger tests this claim
 Step 1: Use generateKeys to generate sk and pk.
 Step 2: sk is given to the challenger, and pk is given to both the
challenger and the adversary
 Step 3: The adversary knows only information that is public, and
his task is to forge a signature on a message
 Step 4: The challenger can make signatures since he knows sk.
Digital Signatures [4]
 The setup of this game matches real-
world conditions
 A real-world attacker would be able
to see valid signatures from his
would-be victim on different
documents
 Manipulate the victim into signing
innocuous-looking documents
 Game: Allow the adversary to get
signatures on documents of his
choice, for as long as he wants and as
long as the number of guesses is
plausible
 Try 1 million guesses but not 2^80
guesses
• After the adversary has seen enough
signatures, he will pick some message
M that he will attempt to forge a
signature on.
• M should not have been signed before.
Digital Signatures [5]
 The challenger runs the verify
algorithm on the signature
produced by the adversary.
 Is the signature produced by the
adversary on M a valid one under
the public verification key?
 If the signature successfully
verifies, the adversary wins the
game.
 A signature scheme is unforgeable
if and only if the chance of
successfully forging a signature is
extremely small, so small that it
will never happen in practice.
Digital Signatures [6]
 Practical concerns:
 Source of Randomness
 Many signature algorithms are randomized
 Good source of randomness is important
 Bad randomness makes an otherwise secure algorithm insecure.
 Message Size:
 In practice, there is a limit to the length of the message that you can sign.
 Getting around the limitation
 Sign the hash of the message rather than the message itself.
 Sign “Hash Pointer”
 Here, the signature covers or protects the whole structure, not just the hash
pointer itself but everything the chain of hash pointers point to.
 Digitally sign the entire Blockchain
 Sign the hash pointer located at the end of the Blockchain.
Digital Signatures [7]
 ECDSA (Elliptic Curve Digital Signature Algorithm)
 Digital signature scheme used in Bitcoin
 US Government standard
 An update of the earlier DSA algorithm adapted to use elliptic curves
 These algorithms are generally believed to be secure
 Bitcoin uses ECDSA over the standard elliptic curve secp256k1, which
is estimated to provide 128 bits of security
 it is as difficult to break this algorithm as it is to perform 2^128 symmetric-key
cryptographic operations, such as invoking a hash function
 Other applications using ECDSA (such as key exchange in the TLS
protocol for secure web browsing) use the more common secp256r1 curve
 secp256k1 was chosen by Satoshi in the early specification of the
system and is now difficult to change.
Digital Signatures [8]

 ECDSA (Elliptic Curve Digital Signature Algorithm)


 A good source of randomness is essential.
 Bad randomness during key generation will likely leak your key, which is intuitive
 Particular quirk of ECDSA: even if you use bad randomness only when
making a signature, and your key is perfectly good, the bad signature
will also leak your private key
 A bad source of randomness is a common pitfall of otherwise secure systems.
Public Keys as Identities
 The public verification keys from a digital signature scheme are sometimes
used to identify a person or an actor in the system.
 If a message with a signature verifies correctly under a public key pk, we can
think of this as the actor pk stating the message.
 Then, pk becomes the identity.
 For someone to speak for the identity pk, he must know the corresponding secret key
sk.
 To verify that a message comes from an identity X, one will have to check that
(1) pk indeed hashes to the identity X, and (2) the message verifies under public
key pk.
 A new identity can be generated by just creating a new fresh key pair (pk, sk).
 Public keys are long, so a hash of the public key can be used as the
identity.
 A public key pk looks random, so it is difficult to connect pk to a real-world
identity.
Decentralized Identity Management
 No need for a central authority for registering users in a system
 Users can register by themselves
 A user can generate a name at any time and can create as many
names/identities as he/she wishes
 Bitcoin uses the term “addresses” to refer to identities.
 These are hashes of a public key
 The probability of two users generating the same 256-bit key is so small
that we need not worry in practice.
 Decentralized Identity Management might seem to offer great
anonymity and privacy.
 However, the pattern of a user’s behavior might itself be identifying:
 The actions of a person under an identity can be linked over time, and certain
identity-revealing inferences can be made.
Two Simple Cryptocurrencies
 Goofycoin
 Rules:
 Only Goofy can create new coins by simply signing a
statement that he’s making a new coin with a unique
coin ID {createCoin[uniqueCoinID]}.
 Whoever owns a coin can pass it on to someone else
by signing a statement that says, “Pass on this coin to
X” (where X is specified as a public key).
 Anyone can verify the validity of a coin by following
the chain of hash pointers back to its creation by
Goofy, verifying all signatures along the way.
 Security Problem with Goofycoin
 Goofycoin does not prevent double spending
 A person can pay the same coin to multiple people
simultaneously
• Double-spending attacks are one
of the key problems that any
cryptocurrency has to solve

Scroogecoin [1]
 Solving the double-spending problem
 A designated entity called Scrooge publishes an append-only ledger containing the history
of all transactions
 The append-only ledger protects against double-spending by requiring all transactions to be
written in the ledger before they are accepted.
 The append-only functionality can be implemented via a Blockchain that Scrooge digitally
signs.
 Each block has the ID of a transaction, the transaction’s contents, and a hash pointer to the previous block.
 Scrooge digitally signs the final hash pointer, which binds all the data in this entire structure, and he
publishes the signature along with the block chain

Scroogecoin [2]
 A transaction is valid only if it is in the block chain
signed by Scrooge.
 Verification: Anybody can verify that a transaction was
endorsed by Scrooge by checking Scrooge’s signature on
the block that records the transaction.
 A transaction that attempts to double spend an
already spent coin is not endorsed by Scrooge.
 To ensure the append-only property, we need both a block
chain with hash pointers and Scrooge’s signature on
each block.
 Any change by Scrooge, i.e. addition, modification
or removal of a transaction, will affect all following
blocks because of the hash pointers.
 If someone monitors the latest hash pointer published by
Scrooge, the change will be obvious and easy to catch.
• On the other hand, in a system
where Scrooge signed blocks
individually, one would have to
keep track of every single
signature Scrooge ever issued.
• A block chain makes it easy for
any two individuals to verify
that they have observed the
same history of transactions
signed by Scrooge
Transactions in Scroogecoin [1]
 CreateCoins creates multiple new coins with different values and assigns
them to people as initial owners.
 Multiple coins are allowed to be created in one transaction
 Each coin has a serial number in the transaction.
 Each coin also has a value; it’s worth a certain number of scroogecoins.
 Each coin has a recipient - a public key that gets the coin when it’s created.
 CoinIDs: A CoinID is a combination of a transaction ID and the coin’s serial
number in that transaction.

Transactions in Scroogecoin [2]
 PayCoins transaction consumes some coins (i.e., destroys them) and creates
new coins of the same total value.
 The new coins might belong to different people (public keys).
 This transaction has to be signed by everyone who’s paying in a coin.
 The owner of each coin that’s going to be consumed in this transaction
has to digitally sign the transaction to say that he/she is OK with spending this
coin.

Transactions in Scroogecoin [3]
 A PayCoins transaction is valid if it satisfies four conditions:
 Consumed coins are valid, i.e. they were created in previous transactions.
 The consumed coins have not already been consumed in some previous transaction.
That is, this is not a double-spend transaction.
 The total value of the coins that come out of this transaction is equal to the total value of
the coins that went in. Only Scrooge can create new value.
 The transaction is validly signed by the owners of all coins consumed in the transaction.
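
The four conditions can be written as a single validity check. In this sketch the data layout (Coin, PayCoins, the coins and spent collections) is hypothetical, and fake_verify is a placeholder standing in for a real signature check. None of these names come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

CoinID = Tuple[str, int]  # (transaction ID, serial number within it)

@dataclass(frozen=True)
class Coin:
    value: int
    owner: str  # recipient's public key

@dataclass
class PayCoins:
    consumed: List[CoinID]
    created: List[Coin]
    signatures: Dict[str, bytes]  # owner's public key -> signature

def is_valid(tx: PayCoins, coins: Dict[CoinID, Coin], spent: Set[CoinID],
             verify: Callable[[str, PayCoins, bytes], bool]) -> bool:
    # 1. Consumed coins were created in previous transactions.
    if any(cid not in coins for cid in tx.consumed):
        return False
    # 2. No consumed coin was already spent (or is listed twice here).
    if len(set(tx.consumed)) != len(tx.consumed) or spent & set(tx.consumed):
        return False
    # 3. Total value out equals total value in.
    if sum(coins[c].value for c in tx.consumed) != sum(c.value for c in tx.created):
        return False
    # 4. Every owner of a consumed coin has validly signed.
    owners = {coins[c].owner for c in tx.consumed}
    return all(pk in tx.signatures and verify(pk, tx, tx.signatures[pk])
               for pk in owners)

# Placeholder standing in for a real signature scheme (assumption).
def fake_verify(pk: str, tx: PayCoins, sig: bytes) -> bool:
    return sig == b"signed-by-" + pk.encode()

coins = {("tx0", 0): Coin(3, "alice"), ("tx0", 1): Coin(2, "bob")}
tx = PayCoins(
    consumed=[("tx0", 0), ("tx0", 1)],
    created=[Coin(5, "carol")],
    signatures={"alice": b"signed-by-alice", "bob": b"signed-by-bob"},
)
print(is_valid(tx, coins, spent=set(), verify=fake_verify))         # True
print(is_valid(tx, coins, spent={("tx0", 0)}, verify=fake_verify))  # False
```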

Transactions in Scroogecoin [4]
 Coins in this system are immutable—they are never changed, subdivided, or
combined.
 Each coin is created, once, in one transaction and then later consumed in
another transaction.
 We can get the same effect as being able to subdivide or combine coins by
using transactions.
 For example, to subdivide a coin, Alice creates a new transaction that
consumes that one coin and then produces two new coins of the same total
value.
 Those two new coins could be assigned back to her. So although coins are
immutable in this system, it has all the flexibility of a system that doesn’t have
immutable coins.
Transactions in Scroogecoin [5]
 Core problem with Scroogecoin
 Scroogecoin works in the sense that people can see which coins are valid, and
it prevents double spending.
 Everyone can look into the blockchain and see that all transactions are valid
and that every coin is consumed only once.
 The central problem here is Scrooge—he has too much influence.
 Scrooge can’t create fake transactions, because he can’t forge other people’s signatures.
 Scrooge could stop endorsing transactions from some users, denying them service and
making their coins unspendable.
 If Scrooge is greedy, he could refuse to publish transactions unless they transfer some
mandated transaction fee to him.
 Scrooge can also of course create as many new coins for himself as he wants.
 Finally, Scrooge could get bored of the whole system and stop updating the block chain
completely.
Videos
 Inside a Bit Coin Mining Farm (Longer Video)
 https://www.youtube.com/watch?v=82vMOVREXzM
 Inside the Largest Bitcoin Mine in The U.S. | WIRED
 https://www.youtube.com/watch?v=x9J0NdV0u9k
Thanks
