# Merkle trees and inclusion proofs

#### 2019-01-20

This is mainly stuff that I'm writing so I can better understand merkle trees and inclusion proofs. A**Merkle tree**is a data structure with the following definition:

- Non leaf nodes are hashes of their children, the children are typically concatenated.
- Leaf nodes are hashes of data blocks.

*tree*, and it seems like a recursive data structure:

package main // We'll need this later on import ( "crypto/sha256" ) // merkleData wraps block as []byte so we can check for nil in the data field of MerkleNode type merkleData struct { block []byte } // MerkleNode represents a node of a merkle tree. If the data field is not nil, it is a leaf. type MerkleNode struct { data merkleData // This merkle node can have any number of children, we can implement a binary tree version if we want. nodes []*MerkleNode } // Create creates a merkle leaf with the data stored in block. func(n *MerkleNode) Create(block []byte) (MerkleNode) { n.data = merkleData{ block: block, } return n } // now for insertion and deletion, which depend on what type of balancing this tree uses.This is actually code that I wrote before I understood that the exact structure of the merkle tree didn't matter as much as I thought it did. We aren't constructing a BST. When in reality, the most important part of merkle trees are the data blocks that get hashed, and using the roots to generate

**proofs of inclusion**on a set of data. These proofs, also called

**merkle proofs**, are used to prove that a certain set of data is included in a larger set of data without the need to reveal the whole set of data. The only things revealed are hashes, in fact in order to prove inclusion you only need to reveal

`O(log(n))`

hashes. And the prover only needs to store the merkle root, since the proofs will allow the prover to recompute this merkle root.
So the most important thing is finding out how to create and verify these proofs.
Bitcoin Core implements a merkle tree something like this:
package main import ( "crypto/sha256" "fmt" "hash" ) type Transaction struct { vin []*txInput vout []*txOutput nVersion int32 nLockTime uint32 } type txInput struct { prevout outPoint scriptSig []byte sequence uint32 scriptWitness txScriptWitness } // Not going to define OutPoint or txScriptWitness because they're not necessary // for understanding the concept of merkle trees, but it's good to know what's // actually in a transaction, so that's why I'm defining txInput and txOutput. type txOutput struct { scriptPubKey []byte value int64 } // Block is a block of transactions type Block struct { vtx []*Transaction fChecked bool } // ComputeMerkleRoot computes the merkle root of a block's transactions. func (b *Block) ComputeMerkleRoot() []byte { var hashList []hash.Hash for tx := range b.vtx { hash := sha256.New() // This is my lazy way of encoding transactions, bitcoin encodes transactions better. hash.Write([]byte(fmt.Sprintf("%v", tx))) hashList = append(hashList, hash) } return computeRootOfHashList(hashList) } // compute the merkle root of an arbitrary list of hashes func computeRootOfHashList(hashList []hash.Hash) []byte { for len(hashList) > 1 { if len(hashList)%2 == 1 { hashList = append(hashList, hashList[len(hashList)-1]) } hashList = hashPairs(hashList) } if len(hashList) == 0 { return nil } return hashList[0].Sum(nil) } // hash pairs of values and return their parent hashes func hashPairs(hashList []hash.Hash) []hash.Hash { var newList []hash.Hash // Just in case the input is odd, even though we expect it to be even if len(hashList)%2 == 1 { hashList = append(hashList, hashList[len(hashList)-1]) } for i := 0; i+1 < len(hashList); i += 2 { // Concatenate the child hashes concatHashes := append(hashList[i].Sum(nil)[:], hashList[i+1].Sum(nil)[:]...) parentHash := sha256.New() // Hash the concatenated child hashes parentHash.Write(concatHashes) // Add the hash to the list of parent hashes newList = append(newList, parentHash) } return newList }This implementation is very similar to Bitcoin Core, but this implementation (as well as Bitcoin Core's) has a vulnerability. This vulnerability (CVE-2012-2459) is a result of how we "fill" the merkle tree. We duplicate elements when there is an odd number in the row we're at, meaning one would be able to generate a merkle root & inclusion proof that proves duplicate transactions. This was fixed in Bitcoin Core back in 2012, by adding validation code confirming that there are no duplicate transactions before accepting the block. As you can see, we represent the merkle tree first as the list of transactions, and start hashing the consecutive pairs until we reduce the list to a single element, the

**merkle root**.