
Bitcoin MOOC Lecture 1


Lecture 1


Intro to Crypto and Cryptocurrencies


This lecture
Crypto background
hash functions
digital signatures
… and applications
Intro to cryptocurrencies
basic digital cash
Lecture 1.1:

Cryptographic Hash Functions


Hash function:
takes any string as input
fixed-size output (we’ll use 256 bits)
Security properties:
Property 1: Collision-free
Property 2: Pre-image resistance (hiding)
Property 3: Second pre-image resistance (puzzle-friendly)
Property4: efficiently computable
Property5: High Avalanche effect
Property6: Deterministic
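A few of these properties can be observed directly. The sketch below assumes Python's standard hashlib SHA-256 as H; the helper names H and differing_bits are made up for this illustration.

```python
import hashlib

def H(data: bytes) -> bytes:
    """SHA-256 digest of data: always 32 bytes (256 bits)."""
    return hashlib.sha256(data).digest()

def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Fixed-size output: a 1-byte input and a 1 MB input both give 256 bits.
print(len(H(b"a")) * 8, len(H(b"a" * 1_000_000)) * 8)

# Deterministic: the same input always gives the same output.
print(H(b"hello") == H(b"hello"))

# Avalanche effect: changing one character flips roughly half of the 256 bits.
print(differing_bits(H(b"hello"), H(b"hellp")))
```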
Hash property 1: Collision-free
Nobody can find x and y such that
x != y and H(x)=H(y)

[Figure: two different inputs x and y mapped to the same output, H(x) = H(y)]
Collisions do exist ...

[Figure: a large set of possible inputs mapped into a smaller set of possible outputs]

… but can anyone find them?


Collision-Resistance
● It should be hard to find two different inputs, of any length, that
produce the same hash. A hash function with this property is also
called collision-free.
○ For a hash function h, it is hard to find any two different inputs x and y such that h(x) = h(y).
● Because a hash function compresses arbitrarily long inputs into a
fixed-length output, collisions must exist. Collision-free only means
that these collisions are hard to find.
● This property makes it very difficult for an attacker to find two input
values with the same hash.
Birthday Paradox – Finding collisions

● If you gather 20-30 people in one room, the odds that some two of
them share a birthday are surprisingly high. With 23 people there is
already about a 50-50 chance that two of them share a birthday!
○ Assuming that all days of the year are equally likely, the chance
that one particular other person shares your birthday is only
1/365, which is about 0.27%.
How to find the collisions?
● Suppose an event has N equally likely outcomes. Then you need only
about sqrt(N) random items (more precisely, about 1.18 * sqrt(N))
for a 50% chance that two of them collide
○ Applying this to birthdays: there are 365 possible birthdays,
sqrt(365) is about 19, and 1.18 * 19 is about 23, so 23 randomly
chosen people give a 50% chance of two people sharing a birthday.
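The birthday numbers can be checked with a short calculation. This is a minimal sketch that assumes all N outcomes are equally likely; the function names are chosen for this example only.

```python
import math

def collision_probability(n_items: int, n_possibilities: int) -> float:
    """Probability that at least two of n_items uniformly random picks coincide."""
    p_all_distinct = 1.0
    for i in range(n_items):
        p_all_distinct *= (n_possibilities - i) / n_possibilities
    return 1.0 - p_all_distinct

def items_for_half(n_possibilities: int) -> int:
    """Smallest number of items whose collision probability reaches 50%."""
    k = 1
    while collision_probability(k, n_possibilities) < 0.5:
        k += 1
    return k

print(items_for_half(365))                        # 23
print(round(collision_probability(23, 365), 4))   # 0.5073
print(round(1.1774 * math.sqrt(365), 1))          # 22.5, the sqrt(N) rule of thumb
```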
How to find a collision?
A collision is usually found after about sqrt(N) tries,
where N is the total number of possible outputs
For example, for a 256-bit output, N = 2^256:
try 2^130 randomly chosen inputs
99.8% chance that two of them will collide

This works no matter what H is …


… but it takes too long to matter
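To see the generic method work at a scale where it does finish, one can shrink the output. The sketch below truncates SHA-256 to 24 bits (an assumption made purely so the search completes quickly) and remembers every hash seen until two inputs collide; with N = 2^24 this typically takes a few thousand tries, in line with sqrt(N) = 4096.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of SHA-256: a deliberately weakened hash for the demo."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Hash distinct inputs until two share a truncated hash.

    Returns (x, y, tries) with x != y and truncated_hash(x) == truncated_hash(y).
    """
    seen = {}
    for i in count():
        x = str(i).encode()          # inputs are distinct by construction
        h = truncated_hash(x, n_bytes)
        if h in seen:
            return seen[h], x, i + 1
        seen[h] = x

x, y, tries = find_collision()
print(x, y, tries)
```

The same loop against the full 256-bit output would need on the order of 2^128 hashes, which is why the method works no matter what H is but takes too long to matter.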
Is there a faster way to find collisions?
For some possible Hashes, yes.
For others, we don’t know of one.
Finding two inputs with the same hash is infeasible,
but not impossible.
No hash function has been proven collision-free.
Application: Hash as message digest
If we know H(x) = H(y),
it’s safe to assume that x = y.

To recognize a file that we saw before,
just remember its hash.

Useful because the hash is small.
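A minimal sketch of the message-digest idea: keep only the 32-byte digests of the files seen so far, not the files themselves. The names digest and seen_before are hypothetical, and SHA-256 stands in for H.

```python
import hashlib

def digest(content: bytes) -> str:
    """Hex SHA-256 digest, used as a short fingerprint of the content."""
    return hashlib.sha256(content).hexdigest()

def seen_before(known_digests: set, content: bytes) -> bool:
    """Return True if this content was seen already; otherwise remember it."""
    d = digest(content)
    if d in known_digests:
        return True
    known_digests.add(d)
    return False

known = set()
print(seen_before(known, b"lecture-1 slides"))   # False, first sighting
print(seen_before(known, b"lecture-1 slides"))   # True, same digest
print(seen_before(known, b"lecture-2 slides"))   # False, different content
```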


Hash property 2: Pre-image
resistance (Hiding)

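We want something like this:
Given H(x), it is infeasible to find x.

Problem: this fails when x comes from a small set of likely values.
For a coin flip, x is either “heads” or “tails”, so an attacker just
computes H(“heads”) and H(“tails”) and compares:
easy to find x!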
Hash property 2: Pre-image
resistance (Hiding)
Hiding property:
If r is chosen from a probability distribution that has high
min-entropy, then given H(r | x), it is infeasible to find x.

High min-entropy means that the distribution is “very
spread out”, so that no particular value is chosen with more
than negligible probability.
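The difference between hashing a guessable value and hashing it together with a high min-entropy r can be sketched as follows. The sketch assumes SHA-256 as H and a 256-bit random r from Python's secrets module; the function names are invented for this example.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def guess_coin(target: bytes):
    """Attacker's strategy when x is known to be b'heads' or b'tails'."""
    for candidate in (b"heads", b"tails"):
        if H(candidate) == target:
            return candidate
    return None

def hide(x: bytes):
    """Compute H(r | x) for a fresh random 256-bit r; returns (r, hash)."""
    r = secrets.token_bytes(32)
    return r, H(r + x)

print(guess_coin(H(b"tails")))   # b'tails': plain H(x) hides nothing here
r, hidden = hide(b"tails")
print(guess_coin(hidden))        # None: the two-guess attack no longer works
```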
Pre-image resistance
● Means that it should be computationally hard
to reverse a hash function.
○ if a hash function h produced a hash value
z, then it should be a difficult process to
find any input value x that hashes to z.
● This property protects against an attacker who
only has a hash value and is trying to find the
input.
Application: Commitment
Want to “seal a value in an envelope”, and
“open the envelope” later.

Commit to a value, reveal it later.


Commitment API
(com, key) := commit(msg)
match := verify(com, key, msg)

To seal msg in envelope:
(com, key) := commit(msg) -- then publish com
To open envelope:
publish key, msg
anyone can use verify() to check validity
Commitment API
(com, key) := commit(msg)
match := verify(com, key, msg)

Security properties:
Hiding: Given com, infeasible to find msg.
Binding: Infeasible to find msg != msg’ such that
verify(commit(msg), msg’) == true
Commitment API
commit(msg) := ( H(key | msg), key )
where key is a random 256-bit value
verify(com, key, msg) := ( H(key | msg) == com )

Security properties:
Hiding: Given H(key | msg), infeasible to find msg.
Binding: Infeasible to find msg != msg’ such that
H(key | msg) == H(key | msg’)
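The commitment API can be sketched in a few lines. This assumes SHA-256 as H and plain byte concatenation for key | msg (unambiguous here because key is always exactly 32 bytes); commit returns the key itself so that the envelope can be opened later.

```python
import hashlib
import hmac
import secrets

def commit(msg: bytes):
    """Seal msg: returns (com, key). Publish com now, keep key and msg secret."""
    key = secrets.token_bytes(32)              # random 256-bit value
    com = hashlib.sha256(key + msg).digest()   # H(key | msg)
    return com, key

def verify(com: bytes, key: bytes, msg: bytes) -> bool:
    """Check that (key, msg) opens the commitment com."""
    return hmac.compare_digest(hashlib.sha256(key + msg).digest(), com)

com, key = commit(b"heads")
print(verify(com, key, b"heads"))   # True: the envelope opens to the sealed value
print(verify(com, key, b"tails"))   # False: binding, cannot open to another value
```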
Hash property 3: Second pre-image
resistance (puzzle-friendly)
Puzzle-friendly:
For every possible output value y,
if k is chosen from a distribution with high min-entropy,
then it is infeasible to find x such that H(k | x) = y.
Second Pre-image resistance
● Means given an input and its hash, it should be hard to
find a different input with the same hash.
○ if a hash function h for an input x produces hash value
h(x), then it should be difficult to find any other input
value y such that h(y) = h(x).

● This property protects against an attacker who has an input
value and its hash and wants to substitute a different value
as the legitimate one in place of the original input value.
Application: Search puzzle
Given a “puzzle ID” id (from high min-entropy distrib.),
and a target set Y:
Try to find a “solution” x such that
H(id | x) ∈ Y.

Puzzle-friendly property implies that no solving strategy is
much better than trying random values of x.
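A search puzzle can be sketched by taking the target set Y to be all 256-bit outputs below a threshold, which is an assumption of this example (any target set would do). The solver simply tries values of x one after another, which by the puzzle-friendly property is essentially the best available strategy.

```python
import hashlib
import secrets
from itertools import count

def in_target_set(digest: bytes, difficulty_bits: int) -> bool:
    """Y = all outputs whose top difficulty_bits bits are zero."""
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

def solve(puzzle_id: bytes, difficulty_bits: int):
    """Try x = 0, 1, 2, ... until H(id | x) lands in Y. Returns (x, tries)."""
    for nonce in count():
        x = nonce.to_bytes(8, "big")
        if in_target_set(hashlib.sha256(puzzle_id + x).digest(), difficulty_bits):
            return x, nonce + 1

puzzle_id = secrets.token_bytes(32)        # id drawn with high min-entropy
x, tries = solve(puzzle_id, difficulty_bits=12)
print(tries)                               # about 2^12 = 4096 on average
```

Each extra difficulty bit halves the size of Y and so doubles the expected number of tries.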
Pictorial representations of properties of
Hash Function
Examples of cryptographic hash functions
● MD5:
○ Produces a 128-bit hash. Collision resistance was
broken after ~2^21 hashes.
● SHA-1:
○ Produces a 160-bit hash. Collision resistance was broken after
~2^61 hashes.
● SHA-256:
○ Produces a 256-bit hash. This is currently being used by
Bitcoin.
● Keccak-256:
○ Produces a 256-bit hash and is currently used by
Ethereum.
SHA-256 hash function
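[Figure: the padded message (padding: 10* | length) is split into 512-bit blocks, block 1 to block n. The compression function c takes a 256-bit chaining value, starting from the IV, together with one block and outputs 256 bits; the output of the last c is the Hash.]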

Theorem: If c is collision-free, then SHA-256 is collision-free.


SHA-256 Operation
● Takes the message you're hashing and pads it to a multiple of
512 bits (a 1 bit, then as many 0 bits as needed, then the
64-bit message length), then breaks it up into 512-bit blocks
● Starts with a 256-bit value called the IV, specified in the
standards document, and the first block. This 768-bit string
goes through a special function c (the compression function)
that outputs a 256-bit string
● The compression function is then applied to the concatenation
of that output and the second block; chaining c in this way is
the Merkle-Damgård transform
● The process is repeated until the last block; the hash
is the final 256-bit output
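The block-by-block structure can be sketched as below. Only the padding rule follows SHA-256; the IV and the compression function here are placeholders (an all-zero IV, and hashlib's SHA-256 standing in for c), so md_hash does not compute real SHA-256 digests. It only shows how a 768-bit-to-256-bit function is chained over the blocks.

```python
import hashlib

BLOCK_BYTES = 64          # 512-bit blocks
TOY_IV = bytes(32)        # placeholder; NOT the IV from the SHA-256 standard

def pad(message: bytes) -> bytes:
    """SHA-256-style padding: a 1 bit, 0 bits, then the 64-bit message length."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                           # 1 bit plus seven 0 bits
    padded += b"\x00" * ((56 - len(padded)) % BLOCK_BYTES)
    return padded + bit_length.to_bytes(8, "big")

def toy_compress(state: bytes, block: bytes) -> bytes:
    """Stand-in for c: 256-bit state + 512-bit block in, 256 bits out."""
    return hashlib.sha256(state + block).digest()

def md_hash(message: bytes) -> bytes:
    """Chain the compression function over all padded blocks."""
    state = TOY_IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_BYTES):
        state = toy_compress(state, padded[i:i + BLOCK_BYTES])
    return state

print(len(pad(b"abc")), len(pad(b"a" * 56)))   # 64 128
print(md_hash(b"abc").hex())
```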
One Compression function in SHA-256
One compression function in SHA-256 comprises
• a 256-bit block cipher with 64 rounds,
• a key expansion mechanism from 512 to 2048 bits, and
• a final set of eight 32-bit additions.
One round of the block cipher inside SHA-256
Last 5 rounds of SHA-256 computation
Application of SHA-256 in bitcoin
Lecture 1.2:

Hash Pointers and Data Structures


Pointers and Linked Lists
● Pointers
○ A pointer is a variable that stores the address of another
variable.
● Linked Lists
○ A sequence of blocks, each containing data and a pointer
variable that holds the address of the next block; these
pointers link the blocks together
○ In a blockchain, the first block is called the “genesis block”
Linked List
Hash Pointer
● A hash pointer is:
○ a pointer to where some info is stored, together with a
(cryptographic) hash of that info.
● If we have a hash pointer, we can
○ get the info back, and
○ verify that it hasn’t changed
Hash Pointer

[Figure: an arrow labelled H( ) pointing at (data); hash pointers will be drawn like this]
key idea:
build data structures with hash pointers


Blockchain
● A blockchain is a linked list with hash pointers
○ A series of blocks, each block has data as well as a hash pointer
to the previous block in the list
■ Benefit: each hash pointer says where the previous block is and
gives a digest of its value, which lets us verify that the value hasn’t changed
○ The hash pointers make the log tamper-evident (effectively immutable)
■ Suppose the adversary changes the data of some block k. The
hash in block k + 1, which is a hash of the entire block k,
will no longer match, because collision resistance prevents
the adversary from finding different data with the same hash
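A tamper-evident log along these lines can be sketched with dictionaries as blocks. The block layout (fields prev and data), the JSON serialization, and the function names are assumptions of this example, not Bitcoin's actual block format.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash of the entire block, including its own prev hash pointer."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def build_chain(items):
    """Return (chain, head), where head is the hash pointer to the newest block."""
    chain, prev = [], None
    for data in items:
        block = {"prev": prev, "data": data}
        chain.append(block)
        prev = block_hash(block)
    return chain, prev

def verify_chain(chain, head) -> bool:
    """Walk back from the remembered head, checking every hash pointer."""
    expected = head
    for block in reversed(chain):
        if block_hash(block) != expected:
            return False
        expected = block["prev"]
    return expected is None

chain, head = build_chain(["tx A", "tx B", "tx C"])
print(verify_chain(chain, head))    # True
chain[0]["data"] = "tx A (edited)"  # adversary changes an old block
print(verify_chain(chain, head))    # False: the pointer stored in block 1 no longer matches
```

Only the head hash pointer needs to be stored somewhere the adversary cannot change.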
linked list with hash pointers = “block chain”

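[Figure: three blocks in a row, each holding data and a field prev: H( ), a hash pointer to the previous block; a separate H( ) points to the most recent block]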

use case: tamper-evident log


detecting tampering

[Figure: the same chain of blocks, each with prev: H( ) and data, and the H( ) pointer to the most recent block]

use case: tamper-evident log


Merkle Tree
● Binary tree with hash pointers = “Merkle tree”
○ In a Merkle tree, data blocks are grouped in pairs, and the hashes of
the two blocks in each pair are stored in a parent node.
○ The parent nodes are in turn grouped in pairs and their hashes
stored one level up the tree.
○ This continues all the way up the tree until we reach the root node.
■ If an adversary tampers with some data block at the bottom of
the tree,
■ the hash pointer one level up will no longer match, and
■ even if he goes on to tamper with that hash pointer too, the
change will eventually propagate to the top of the tree, where
he won’t be able to tamper with the root hash pointer that we’ve stored.
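Computing the root can be sketched as follows, assuming SHA-256 as the hash and assuming that a level with an odd number of nodes duplicates its last node (one common convention; the slides do not say how odd levels are handled).

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks) -> bytes:
    """Hash the blocks, then hash pairs upward until a single root remains."""
    if not blocks:
        raise ValueError("need at least one data block")
    level = [H(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:                 # odd count: duplicate the last node
            level.append(level[-1])
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"tx0", b"tx1", b"tx2", b"tx3"]
root = merkle_root(blocks)
tampered = merkle_root([b"tx0", b"tx1", b"txX", b"tx3"])
print(root.hex())
print(root == tampered)   # False: a change at the bottom reaches the root
```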
binary tree with hash pointers = “Merkle tree”

[Figure: a binary tree of hash pointers. Eight (data) blocks sit at the bottom; each node above them holds two hash pointers H( ) H( ) to its children, up to a single root node.]


proving membership in a Merkle tree

show O(log n) items, where n is the total number of leaf nodes

[Figure: the path of hash pointers from the root down to one (data) block]
Advantages of Merkle trees
● The tree holds many items, but we only need to remember the root hash
● Can verify membership in O(log n) time/space
● Variant: a sorted Merkle tree, where the blocks at the bottom are
ordered, can verify non-membership in O(log n)
● Proof of non-membership: show a path to the item that’s
just before where the item in question would be, and
a path to the item that is just after where it would be
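A membership proof is the list of sibling hashes along the path from the leaf to the root, so it has about log2(n) entries. The sketch below assumes SHA-256 and duplication of the last node on odd levels; build_proof and verify_proof are hypothetical names.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_proof(blocks, index):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    level = [H(b) for b in blocks]
    proof = []
    while len(level) > 1:
        if len(level) % 2:                  # odd count: duplicate the last node
            level.append(level[-1])
        sibling = index ^ 1                 # the other node in this pair
        proof.append((level[sibling], sibling < index))
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify_proof(root: bytes, block: bytes, proof) -> bool:
    """Recompute the path to the root using only the block and its siblings."""
    h = H(block)
    for sibling, sibling_is_left in proof:
        h = H(sibling + h) if sibling_is_left else H(h + sibling)
    return h == root

blocks = [bytes([i]) for i in range(8)]
root, proof = build_proof(blocks, 5)
print(len(proof))                             # 3 = log2(8) items
print(verify_proof(root, blocks[5], proof))   # True
print(verify_proof(root, b"forged", proof))   # False
```

The verifier needs only the remembered root, the block, and the proof, never the other seven blocks.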
More generally ...

can use hash pointers in any pointer-based
data structure that has no cycles
Lecture 1.3:

Digital Signatures
What we want from signatures

Only you can sign, but anyone can verify

Signature is tied to a particular document:
it can’t be cut-and-pasted to another doc
Requirements for signatures
“valid signatures verify”
verify(pk, message, sign(sk, message)) == true

“can’t forge signatures”


adversary who:
knows pk
gets to see signatures on messages of his choice
can’t produce a verifiable signature on another message
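The sign/verify interface can be tried out with nothing but a hash function. The sketch below is a Lamport one-time signature, swapped in here because it needs only the standard library; it is not ECDSA and not what Bitcoin uses, and each key pair must sign at most one message.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def generate_keys():
    """sk: 256 pairs of random secrets; pk: the hashes of those secrets."""
    sk = [[secrets.token_bytes(32) for _ in range(2)] for _ in range(256)]
    pk = [[H(secret) for secret in pair] for pair in sk]
    return sk, pk

def _digest_bits(message: bytes):
    """The 256 bits of Hash(message); we sign the hash, not the raw message."""
    digest = H(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def sign(sk, message: bytes):
    """Reveal one secret per digest bit. Never reuse sk for a second message."""
    return [sk[i][bit] for i, bit in enumerate(_digest_bits(message))]

def verify(pk, message: bytes, sig) -> bool:
    bits = _digest_bits(message)
    return len(sig) == 256 and all(H(sig[i]) == pk[i][bits[i]] for i in range(256))

sk, pk = generate_keys()
sig = sign(sk, b"pay 1 coin to Alice")
print(verify(pk, b"pay 1 coin to Alice", sig))   # True: valid signatures verify
print(verify(pk, b"pay 9 coins to Eve", sig))    # False: tied to one document
```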
Unforgeability game
● Unforgeability game
○ Participants: an adversary who claims that he can forge signatures
and a challenger who will test this claim
○ Keys are generated: the secret key is given to the challenger,
and the public key is given to both the challenger and the adversary
○ The attacker may request signatures on documents of his
choice for as long as he wants, as long as the number of requests is
plausible
○ After that, the attacker picks some message M for which he has not
seen a signature, and attempts to forge a signature on it
○ The challenger runs the verify algorithm to determine whether the
signature produced by the attacker is a valid signature on M
■ If it successfully verifies, the attacker wins the game
Practical stuff...
algorithms are randomized
need good source of randomness
limit on message size
fix: use Hash(message) rather than message
fun trick: sign a hash pointer
signature “covers” the whole structure
Digital Signature with Hash
Bitcoin uses ECDSA standard
Elliptic Curve Digital Signature Algorithm

relies on hairy math


will skip the details here --- look it up if you care

good randomness is essential


foul this up in generateKeys() or sign() ?
probably leaked your private key
● ECDSA
○ a cryptographic algorithm used by Bitcoin to ensure that funds can
only be spent by their rightful owners
○ private key (256 bits, i.e. 32 bytes):
■ A secret number, known only to the person that generated it.
■ someone with the private key that corresponds to funds on the
block chain can spend the funds
○ public key
■ A number that corresponds to a private key, but does not need
to be kept secret
■ used to determine if a signature is genuine
■ Compressed – 33 bytes
● prefix either 0x02 or 0x03, and a 256-bit integer called x
■ Uncompressed – 65 bytes
● constant prefix (0x04), followed by two 256-bit integers called x and y (2 * 32 bytes)
○ signature
■ A number that proves that a signing operation took place
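The two public-key encodings can be sketched as follows. The point used is the generator of secp256k1, the curve Bitcoin's ECDSA is defined over; the prefix rule assumed here is the usual one (0x02 when y is even, 0x03 when y is odd), and the function names are made up for the example.

```python
# Coordinates of the secp256k1 generator point, used as a sample public key.
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8

def serialize_uncompressed(x: int, y: int) -> bytes:
    """0x04 prefix, then x and y as 32-byte big-endian integers: 65 bytes."""
    return b"\x04" + x.to_bytes(32, "big") + y.to_bytes(32, "big")

def serialize_compressed(x: int, y: int) -> bytes:
    """0x02 (y even) or 0x03 (y odd) prefix, then x only: 33 bytes."""
    prefix = b"\x02" if y % 2 == 0 else b"\x03"
    return prefix + x.to_bytes(32, "big")

print(len(serialize_uncompressed(GX, GY)))   # 65
print(len(serialize_compressed(GX, GY)))     # 33
print(serialize_compressed(GX, GY).hex())
```

The compressed form is enough because y can be recovered from x and the curve equation, up to the sign recorded in the prefix.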
● ECDSA
○ An ellipse is a special case of the general second-degree equation
ax² + bxy + cy² + dx + ey + f = 0.
■ Depending on the values of the parameters a to f, the resulting
graph could be a circle, ellipse, hyperbola, or parabola.
■ Elliptic curve cryptography uses third-degree equations; despite
the name, an elliptic curve is not an ellipse.
■ The Digital Signature Standard defines two kinds of elliptic curves
for use with ECC
● pseudo-random curves
○ whose coefficients are generated from the output of a
seeded cryptographic hash function;
● special curves
○ whose coefficients and underlying field have been
selected to optimize the efficiency of the elliptic curve
operations
● Pseudo-random curves can be defined over
○ prime fields GF(p)
■ which contain a prime number p of elements. The
elements of this field are the integers modulo p
■ Field arithmetic is: integer arithmetic modulo p
■ y² = x³ + ax + b
○ binary fields GF(2^m)
■ which contain 2^m elements for some m (called the
degree of the field), where the elements of this field
are the bit strings of length m
■ Field arithmetic is: operations on bits
■ y² + xy = x³ + ax² + b
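What "a curve over GF(p)" means is easiest to see with a tiny prime. The sketch below uses the toy curve y² = x³ + 2x + 2 over GF(17); these parameters are chosen only for illustration and are far too small for cryptography.

```python
P, A, B = 17, 2, 2   # toy parameters: y^2 = x^3 + 2x + 2 over GF(17)

def on_curve(x: int, y: int, p: int = P, a: int = A, b: int = B) -> bool:
    """Check y^2 = x^3 + ax + b with all arithmetic done modulo p."""
    return (y * y - (x * x * x + a * x + b)) % p == 0

# Over a finite field the "curve" is just a finite set of (x, y) pairs.
points = [(x, y) for x in range(P) for y in range(P) if on_curve(x, y)]

print(on_curve(5, 1))   # True:  1 = 125 + 10 + 2 = 137 = 1 (mod 17)
print(on_curve(5, 2))   # False
print(len(points))      # 18 points, plus the point at infinity
```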
Lecture 1.4:

Public Keys as Identities


Useful trick: public key == an identity

if you see sig such that verify(pk, msg, sig)==true,
think of it as
pk says, “[msg]”.

to “speak for” pk, you must know matching secret key sk


How to make a new identity

create a new, random key-pair (sk, pk)


pk is the public “name” you can use
[usually better to use Hash(pk)]
sk lets you “speak for” the identity

you control the identity, because only you know sk


if pk “looks random”, nobody needs to know who you are
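Making a new identity can be sketched in pure Python on secp256k1, the curve behind Bitcoin's ECDSA. Several things here are simplifications: the arithmetic is not constant-time, so this is for illustration and not for real keys; the public name is plain SHA-256 of the compressed public key, whereas real Bitcoin addresses use a different hashing and encoding pipeline; and the function names are hypothetical. It needs Python 3.8+ for pow(x, -1, p).

```python
import hashlib
import secrets

P = 2**256 - 2**32 - 977   # field prime of secp256k1
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141  # group order
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def ec_add(p1, p2):
    """Add two points on y^2 = x^3 + 7 over GF(P); None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P            # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P       # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P

def ec_mul(k: int, point):
    """Double-and-add scalar multiplication k * point."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result

def new_identity():
    """Return (sk, pk, name): random secret key, compressed pk, and Hash(pk)."""
    sk = secrets.randbelow(N - 1) + 1                  # uniform in [1, N-1]
    x, y = ec_mul(sk, G)
    pk = (b"\x02" if y % 2 == 0 else b"\x03") + x.to_bytes(32, "big")
    return sk, pk, hashlib.sha256(pk).hexdigest()

sk, pk, name = new_identity()
print(name)   # looks random; nobody needs to know who is behind it
```

No registry is consulted anywhere, which is the point: anybody can run this at any time, as often as they like.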
Decentralized identity management

anybody can make a new identity at any time


make as many as you want!

no central point of coordination

These identities are called “addresses” in Bitcoin.


Privacy
Addresses not directly connected to real-world identity.

But an observer can link together an address’s activity over
time and make inferences.

Later: a whole lecture on privacy in Bitcoin ...


Lecture 1.5:

Simple Cryptocurrencies
GoofyCoin
Goofy can create new coins

New coins belong to me.

signed by pkGoofy
CreateCoin [uniqueCoinID]
A coin’s owner can spend it.

Alice owns it now.

signed by pkGoofy
Pay to pkAlice : H( )

signed by pkGoofy
CreateCoin [uniqueCoinID]
The recipient can pass on the coin again.

signed by pkAlice Bob owns it now.

Pay to pkBob : H( )

signed by pkGoofy
Pay to pkAlice : H( )

signed by pkGoofy
CreateCoin [uniqueCoinID]
double-spending attack

signed by pkAlice signed by pkAlice


Pay to pkBob : H( ) Pay to pkChuck : H( )

signed by pkGoofy
Pay to pkAlice : H( )

signed by pkGoofy
CreateCoin [uniqueCoinID]
double-spending attack

the main design challenge in digital currency
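The double spend can be made concrete by modelling each GoofyCoin statement as a record with a hash pointer to the coin it spends. Signatures are left out of this sketch: the signer field is only a label and nothing is cryptographically verified. The record layout and function names are assumptions of the example.

```python
import hashlib
import json

def tx_hash(tx: dict) -> str:
    """Hash pointer to a transaction record."""
    return hashlib.sha256(json.dumps(tx, sort_keys=True).encode()).hexdigest()

create = {"type": "CreateCoin", "coin_id": "uniqueCoinID", "signer": "pkGoofy"}
pay_alice = {"type": "Pay", "to": "pkAlice", "prev": tx_hash(create), "signer": "pkGoofy"}
# Alice signs two different payments that both point at the same coin.
pay_bob = {"type": "Pay", "to": "pkBob", "prev": tx_hash(pay_alice), "signer": "pkAlice"}
pay_chuck = {"type": "Pay", "to": "pkChuck", "prev": tx_hash(pay_alice), "signer": "pkAlice"}

def find_double_spends(transactions):
    """Group payments by the coin they spend; report coins spent more than once."""
    spenders = {}
    for tx in transactions:
        prev = tx.get("prev")
        if prev is not None:
            spenders.setdefault(prev, []).append(tx)
    return {prev: txs for prev, txs in spenders.items() if len(txs) > 1}

conflicts = find_double_spends([create, pay_alice, pay_bob, pay_chuck])
for prev, txs in conflicts.items():
    print(prev[:16], [tx["to"] for tx in txs])   # both pkBob and pkChuck
```

Each payment looks valid on its own; the conflict only shows up to someone who sees the whole history, which Bob and Chuck individually do not.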


ScroogeCoin
Scrooge publishes a history of all transactions
(a block chain, signed by Scrooge)
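[Figure: a chain of blocks with transID: 71, transID: 72, transID: 73, each holding a transaction (trans) and a prev: H( ) hash pointer to the previous block; a final H( ) points to the most recent block]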

optimization: put multiple transactions in the same block


CreateCoins transaction creates new coins

Valid, because I said so.

transID: 73 type:CreateCoins

coins created
num   value   recipient   coinID
0     3.2     0x...       73(0)
1     1.4     0x...       73(1)
2     7.1     0x...       73(2)
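The numbering scheme in this table, where a coin is identified by the transaction ID plus its position in that transaction, can be sketched as below. The recipient strings are hypothetical placeholders, since the slide elides the actual public keys, and the class and function names are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CreatedCoin:
    value: float
    recipient: str   # would be a public key; placeholder strings in this sketch

def create_coins(trans_id: int, outputs):
    """Map each coinID, written transID(num), to the coin created at position num."""
    return {f"{trans_id}({num})": coin for num, coin in enumerate(outputs)}

coins = create_coins(73, [
    CreatedCoin(3.2, "pk_recipient_0"),   # hypothetical recipients
    CreatedCoin(1.4, "pk_recipient_1"),
    CreatedCoin(7.1, "pk_recipient_2"),
])
for coin_id, coin in coins.items():
    print(coin_id, coin.value, coin.recipient)
```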
