# Merkle trees and inclusion proofs

#### 2019-01-20

This is mainly stuff that I'm writing so I can better understand merkle trees and inclusion proofs. A Merkle tree is a data structure with the following definition:
• Non leaf nodes are hashes of their children, the children are typically concatenated.
• Leaf nodes are hashes of data blocks.
If you wanted to create a merkle tree in Go and come from an introductory data structures background, this is what you might do, given that it's called a merkle tree, and it seems like a recursive data structure:
```package main

// We'll need this later on
import (
"crypto/sha256"
)

// merkleData wraps block as []byte so we can check for nil in the data field of MerkleNode
type merkleData struct {
block []byte
}

// MerkleNode represents a node of a merkle tree. If the data field is not nil, it is a leaf.
type MerkleNode struct {
data  merkleData
// This merkle node can have any number of children, we can implement a binary tree version if we want.
nodes []*MerkleNode
}

// Create creates a merkle leaf with the data stored in block.
func(n *MerkleNode) Create(block []byte) (MerkleNode) {
n.data = merkleData{
block: block,
}
return n
}

// now for insertion and deletion, which depend on what type of balancing this tree uses.
```
This is actually code that I wrote before I understood that the exact structure of the merkle tree didn't matter as much as I thought it did. We aren't constructing a BST. When in reality, the most important part of merkle trees are the data blocks that get hashed, and using the roots to generate proofs of inclusion on a set of data. These proofs, also called merkle proofs, are used to prove that a certain set of data is included in a larger set of data without the need to reveal the whole set of data. The only things revealed are hashes, in fact in order to prove inclusion you only need to reveal `O(log(n))` hashes. And the prover only needs to store the merkle root, since the proofs will allow the prover to recompute this merkle root. So the most important thing is finding out how to create and verify these proofs. Bitcoin Core implements a merkle tree something like this:
```package main

import (
"crypto/sha256"
"fmt"
"hash"
)

type Transaction struct {
vin       []*txInput
vout      []*txOutput
nVersion  int32
nLockTime uint32
}

type txInput struct {
prevout       outPoint
scriptSig     []byte
sequence      uint32
scriptWitness txScriptWitness
}

// Not going to define OutPoint or txScriptWitness because they're not necessary
// for understanding the concept of merkle trees, but it's good to know what's
// actually in a transaction, so that's why I'm defining txInput and txOutput.

type txOutput struct {
scriptPubKey []byte
value        int64
}

// Block is a block of transactions
type Block struct {
vtx      []*Transaction
fChecked bool
}

// ComputeMerkleRoot computes the merkle root of a block's transactions.
func (b *Block) ComputeMerkleRoot() []byte {
var hashList []hash.Hash
for tx := range b.vtx {
hash := sha256.New()
// This is my lazy way of encoding transactions, bitcoin encodes transactions better.
hash.Write([]byte(fmt.Sprintf("%v", tx)))
hashList = append(hashList, hash)
}
return computeRootOfHashList(hashList)
}

// compute the merkle root of an arbitrary list of hashes
func computeRootOfHashList(hashList []hash.Hash) []byte {
for len(hashList) > 1 {
if len(hashList)%2 == 1 {
hashList = append(hashList, hashList[len(hashList)-1])
}
hashList = hashPairs(hashList)
}
if len(hashList) == 0 {
return nil
}
return hashList[0].Sum(nil)
}

// hash pairs of values and return their parent hashes
func hashPairs(hashList []hash.Hash) []hash.Hash {
var newList []hash.Hash

// Just in case the input is odd, even though we expect it to be even
if len(hashList)%2 == 1 {
hashList = append(hashList, hashList[len(hashList)-1])
}

for i := 0; i+1 < len(hashList); i += 2 {
// Concatenate the child hashes
concatHashes := append(hashList[i].Sum(nil)[:], hashList[i+1].Sum(nil)[:]...)
parentHash := sha256.New()
// Hash the concatenated child hashes
parentHash.Write(concatHashes)
// Add the hash to the list of parent hashes
newList = append(newList, parentHash)
}
return newList
}
```
This implementation is very similar to Bitcoin Core, but this implementation (as well as Bitcoin Core's) has a vulnerability. This vulnerability (CVE-2012-2459) is a result of how we "fill" the merkle tree. We duplicate elements when there is an odd number in the row we're at, meaning one would be able to generate a merkle root & inclusion proof that proves duplicate transactions. This was fixed in Bitcoin Core back in 2012, by adding validation code confirming that there are no duplicate transactions before accepting the block. As you can see, we represent the merkle tree first as the list of transactions, and start hashing the consecutive pairs until we reduce the list to a single element, the merkle root.

Previous article: How this blog is supposed to work