Building a Blockchain from Scratch
Blockchains are the Doritos Locos Taco of computer science. Complex as software can become, there are a limited number of tools available to software developers. Software developers combine arrays, databases, objects, pointers and other logical constructs in unique ways to create efficient solutions to problems, but most of these combinations have been known for decades. (A classic book describes some of these common combinations.) Software developers are like chefs at a Mexican restaurant mixing hard shells, salsa and beans to make a taco, but later using a flour tortilla, salsa and beans to make a burrito.
Because there are only so many ingredients available, innovative combinations are rare. In the fast-food space, the Doritos Locos Taco was one of these innovative combinations: It is just like a regular taco, but its shell is a giant, folded Dorito. It was unveiled at Taco Bell in 2012 and quickly became the restaurant's most popular menu item. It was a brilliant combination of existing ingredients, it took everyone by surprise, and it was delicious.
Like the Doritos Locos Taco, blockchains combine well-known software ingredients to make something completely new, brilliant and surprising. Hashing and peer-to-peer file sharing have been standard software constructs for decades, but no one thought to piece them together to build a blockchain until recently. And blockchains have had a big effect on software, the economy and internet culture in the form of cryptocurrency, non-fungible tokens (NFTs) and public ledgers of ownership.
And like a taco, the elements of a blockchain are not complicated. This post walks through building a blockchain from scratch to explain how it all comes together to create something unique in computer science.
The Ingredients
More complex blockchains add other components, but the basic ingredients of a blockchain are hashing and peer-to-peer file sharing. IP/Decode has discussed hashing in two posts:
- Hash Functions: Their Utility for Both Clients and Lawyers
- Forensic Hashing in Criminal and Civil Discovery
Effectively, hashing takes some input and converts it into a fixed-length "hash" string of numbers and letters. For example,
"All you need is love" becomes "ed590c566dc35fefb1a1424c6541ba11"
Hashing is a one-way street. While there is "nothing you can do that can't be done," no one can take "ed590c566dc35fefb1a1424c6541ba11" and transform it back into the line from the Beatles song. Also, it is nearly impossible to find two input strings that produce the same output hash value. Even when the input value is very similar ("All you need is lov"), the output is dramatically different ("b9965afad09081b3b7e6d14f037ca56e").
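The 32-character hash values above are the length of an MD5 digest, so this minimal sketch assumes MD5 via Python's standard `hashlib` module (the post does not name the algorithm it used). It prints the digests of the lyric and of the near-miss, then counts how many characters differ between them.

```python
import hashlib


def md5_hex(text):
    """Return the MD5 digest of `text` as a 32-character hexadecimal string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


lyric = md5_hex("All you need is love")
near_miss = md5_hex("All you need is lov")
print(lyric)
print(near_miss)

# Every MD5 digest is 32 hex characters long, however long the input is.
assert len(lyric) == len(near_miss) == 32

# Count how many positions differ after dropping one character from the input.
differing = sum(a != b for a, b in zip(lyric, near_miss))
print(f"{differing} of 32 characters differ")
```

MD5 is convenient for illustration, but it is no longer considered collision-resistant; production blockchains rely on stronger functions such as SHA-256, which Bitcoin uses.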
Peer-to-peer file sharing refers to making copies of a document and spreading them to different computers. File sharing accomplishes two goals: The document is available to more computers, and it receives protection against alterations. The availability is obvious: The more copies of the Declaration of Independence you create, the more people in the colonies can read it. (The first printing of the Declaration was about 200 copies.) The protection against alterations is a byproduct of the wide distribution. If a Loyalist in Boston printed a version of the Declaration that was a swooning love song to King George III, it would be hard to pass it off as the authentic document with hundreds of real copies circulating in Massachusetts and the other twelve colonies.
Building a Blockchain
These two basic ingredients come together in a blockchain to store information that is widespread and difficult to alter. Blockchains are built from a series of "blocks" that each contain information. Each block is "chained" together by storing the hashed value of a previous block.
I wrote SimpleBlockchain to show how these elements work together. The source code is available on my GitHub, and both the blockchain itself and the XML file that stores its data can be explored online.
The source code for each block is below. Each block has an ID, a timestamp (recording when the site was accessed), its own hash value and the previous block's hash value:
Below is a visual representation of part of the blockchain:
Real blockchains would carry more data. For example, a block may record transferring a crypto asset from one party to another or the movement of an item through a distribution channel.
Each time the site is loaded, a new block containing the timestamp is created and added to the top of the chain. The function "gen_hash()" below generates a block's hash by combining all of its content, including the previous block's hash value ("last_hash"), and hashing the result:
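The post does not spell out exactly what gen_hash() combines beyond noting that it includes last_hash, so the following is only a sketch under stated assumptions: blocks are plain Python dictionaries with `id`, `timestamp`, `last_hash` and `hash` keys, the fields are joined with a `|` separator and hashed with MD5, and the first block uses a hypothetical all-zeros `last_hash`. `add_block()` is a hypothetical helper standing in for what happens on each page load.

```python
import hashlib
from datetime import datetime, timezone

# Hypothetical placeholder: the first block has no predecessor to point to.
GENESIS_LAST_HASH = "0" * 32


def gen_hash(block_id, timestamp, last_hash):
    """Hash all of a block's content, including the previous block's hash."""
    content = f"{block_id}|{timestamp}|{last_hash}"
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def add_block(chain, timestamp):
    """Create a block for this page load and add it to the top of the chain.

    The end of the Python list plays the role of the chain's "top".
    """
    top = chain[-1] if chain else None
    block_id = top["id"] + 1 if top else 1
    last_hash = top["hash"] if top else GENESIS_LAST_HASH
    block = {
        "id": block_id,
        "timestamp": timestamp,
        "last_hash": last_hash,
        "hash": gen_hash(block_id, timestamp, last_hash),
    }
    chain.append(block)
    return block


# Simulate three visits to the site.
chain = []
for _ in range(3):
    add_block(chain, datetime.now(timezone.utc).isoformat())

for block in chain:
    print(block["id"], block["last_hash"][:8], "->", block["hash"][:8])
```

The separator keeps different combinations of fields from running together into the same string (for example, ID `1` with a timestamp starting `23` versus ID `12` with a timestamp starting `3`).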
You can see these hash links between blocks in the annotated blocks below:
Using the previous block's hash value to generate the current block's hash value is incredibly powerful, because it effectively locks the data inside every previous block. Remember that a hash value is the result of its input, and even small adjustments to that input result in dramatically different hash values. If someone were to adjust the timestamp or hash value in Block 1, then the hash values for Block 2 and Block 3 would no longer match what has been written to the blockchain. This mismatch occurs because Block 2's hash value depends on Block 1's hash value, and Block 3's hash value depends on Block 2's, so a change in one block ripples forward through every block after it.
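To see the ripple effect described above, this sketch rebuilds each block's expected hash from the start of the chain and reports every block that no longer matches. It assumes dictionary blocks, MD5 over `|`-joined fields and a hypothetical all-zeros `last_hash` for Block 1; `build_chain()` and `find_broken_blocks()` are hypothetical helpers, not part of SimpleBlockchain.

```python
import hashlib

# Hypothetical placeholder for the first block's predecessor hash.
GENESIS_LAST_HASH = "0" * 32


def gen_hash(block_id, timestamp, last_hash):
    # Hypothetical scheme: MD5 over the block's fields joined with a separator.
    return hashlib.md5(f"{block_id}|{timestamp}|{last_hash}".encode("utf-8")).hexdigest()


def build_chain(timestamps):
    """Build a correctly linked chain, one block per timestamp."""
    chain, last_hash = [], GENESIS_LAST_HASH
    for block_id, ts in enumerate(timestamps, start=1):
        block_hash = gen_hash(block_id, ts, last_hash)
        chain.append({"id": block_id, "timestamp": ts, "last_hash": last_hash, "hash": block_hash})
        last_hash = block_hash
    return chain


def find_broken_blocks(chain):
    """Return the IDs of blocks that no longer match what their content implies."""
    broken = []
    expected_last = GENESIS_LAST_HASH
    for block in chain:
        recomputed = gen_hash(block["id"], block["timestamp"], expected_last)
        if block["last_hash"] != expected_last or block["hash"] != recomputed:
            broken.append(block["id"])
        # Carry the recomputed hash forward, so a content change in one
        # block ripples into the check for every later block.
        expected_last = recomputed
    return broken


chain = build_chain(["09:00", "09:05", "09:10"])
print(find_broken_blocks(chain))  # → []

chain[0]["timestamp"] = "08:59"  # quietly edit Block 1
print(find_broken_blocks(chain))  # → [1, 2, 3]
```

Editing Block 1's timestamp flags Block 1 and every block after it, while editing Block 2 instead would leave Block 1 intact and flag only Blocks 2 and 3.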
Altering a block's content is similar to editing the evolution of humans by adding an African elephant to our natural history 10 million years ago. Because our physiology and DNA today depend on all of our ancestors who came before us, we could easily spot the error of including the African elephant in our evolutionary chain. (Humans and African elephants did evolve from a common ancestor, at least as far back as an ancient flopping fish, but humans did not evolve directly from African elephants.)
If there were only one copy of the SimpleBlockchain, then altering any block in the chain would be easy: Just modify the block and reconstruct all blocks that follow it so the hashes match. That modified blockchain would become the only copy of the SimpleBlockchain, and no one would be the wiser.
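A single copy offers no such protection, as this sketch shows. It assumes dictionary blocks, MD5 over `|`-joined fields and a hypothetical all-zeros predecessor hash for Block 1. It forges Block 1, shows the chain failing validation, then recomputes every link and hash after the forgery so the chain validates again. `rehash()` and `is_valid()` are hypothetical helpers.

```python
import hashlib

# Hypothetical placeholder for the first block's predecessor hash.
GENESIS_LAST_HASH = "0" * 32


def gen_hash(block_id, timestamp, last_hash):
    # Hypothetical scheme: MD5 over the block's fields joined with a separator.
    return hashlib.md5(f"{block_id}|{timestamp}|{last_hash}".encode("utf-8")).hexdigest()


def rehash(chain):
    """Recompute every link and hash in order, making the chain self-consistent."""
    last_hash = GENESIS_LAST_HASH
    for block in chain:
        block["last_hash"] = last_hash
        block["hash"] = gen_hash(block["id"], block["timestamp"], last_hash)
        last_hash = block["hash"]
    return chain


def is_valid(chain):
    """True if every block links to its predecessor and its hash matches its content."""
    last_hash = GENESIS_LAST_HASH
    for block in chain:
        if block["last_hash"] != last_hash:
            return False
        if block["hash"] != gen_hash(block["id"], block["timestamp"], last_hash):
            return False
        last_hash = block["hash"]
    return True


chain = rehash([
    {"id": 1, "timestamp": "09:00"},
    {"id": 2, "timestamp": "09:05"},
    {"id": 3, "timestamp": "09:10"},
])
chain[0]["timestamp"] = "08:59"  # forge Block 1
print(is_valid(chain))  # → False
rehash(chain)           # reconstruct every block that follows the forgery
print(is_valid(chain))  # → True
```

Nothing inside the rebuilt copy reveals the forgery; only comparing it against other copies could.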
And that is why peer-to-peer file sharing better secures the blockchain. If millions of copies of the blockchain are widely distributed, then the one copy with the modified blocks is the odd one out, and the network will reject it as inauthentic. Continuing with the example above, if the modified blockchain is the fake Declaration of Independence that praises King George III, then no one will believe it with so many more copies of the authentic Declaration in circulation.
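One simplified way to picture the network rejecting the odd copy is a majority vote. Because a chain's top-block hash depends on every block beneath it, two peers with the same top hash hold the same chain. This sketch counts top hashes across peers with `collections.Counter` and reports the peers that disagree with the majority; the peer names and hash strings are made-up placeholders. Real networks use more involved consensus rules (Bitcoin, for example, follows the valid chain with the most accumulated proof-of-work), so this only illustrates the intuition.

```python
from collections import Counter


def consensus_tip(copies):
    """Return the top-block hash held by the most peers."""
    tip, _count = Counter(copies.values()).most_common(1)[0]
    return tip


def odd_ones_out(copies):
    """Return the peers whose copy disagrees with the majority."""
    tip = consensus_tip(copies)
    return sorted(peer for peer, top_hash in copies.items() if top_hash != tip)


# Hypothetical peers mapped to the hash of the top block in their copy.
copies = {
    "peer-a": "hash-of-authentic-chain",
    "peer-b": "hash-of-authentic-chain",
    "peer-c": "hash-of-authentic-chain",
    "peer-d": "hash-of-forged-chain",
}
print(odd_ones_out(copies))  # → ['peer-d']
```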
Uses of Blockchains
Uses of blockchains have exploded in the decade-plus since they became popular tools. Today, blockchains power cryptocurrencies, NFTs and supply chains, among other uses. Blockchains should be considered whenever an organization wants to create a record of transactions that is extremely difficult to alter.
The law often builds chains of title to property, and each of these chains is an obvious use case for blockchains. For example, current title recording for real estate occurs at the registrar of deeds, typically at the county level. Our current (and historical) approach to recording title is in sharp contrast to blockchain recording. Title recording is centralized: The county registrar of deeds is the single source for determining ownership. Blockchains are decentralized: There may be millions of identical records of ownership spread across many computers. Title recording links previous deeds with current deeds: To trace title, you have to work backward from conveyance to conveyance to determine if today's title is clean. Blockchains perform this chain-of-title authentication automatically by using hashes to link each clean title to the next conveyance.
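The chain-of-title idea can be sketched the same way. In this hypothetical model, each `Conveyance` records a grantor, a grantee and the SHA-256 hash of the conveyance that gave the grantor title. `trace_title()` works backward from the current deed, as a title examiner would, and fails if any record is missing, altered, or conveyed by someone who never held title. The class, function and party names are illustrative assumptions, not an existing land-records system.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Conveyance:
    grantor: str
    grantee: str
    prev_hash: str  # hash of the conveyance that gave the grantor title; "" for the original grant

    def digest(self):
        """SHA-256 over the conveyance, including the previous conveyance's hash."""
        content = f"{self.grantor}|{self.grantee}|{self.prev_hash}"
        return hashlib.sha256(content.encode("utf-8")).hexdigest()


def trace_title(ledger, current_hash):
    """Work backward from the current deed to the original grant.

    `ledger` maps each conveyance's digest to the conveyance. Raises ValueError
    if a record is missing or altered, or if a grantor never received title.
    """
    history = []
    h = current_hash
    while h:
        deed = ledger.get(h)
        if deed is None or deed.digest() != h:
            raise ValueError(f"missing or altered conveyance {h[:8]}")
        if history and history[-1].grantor != deed.grantee:
            raise ValueError(f"{history[-1].grantor} never received title")
        history.append(deed)
        h = deed.prev_hash
    return list(reversed(history))  # oldest conveyance first


grant = Conveyance("County", "Alice", "")
sale1 = Conveyance("Alice", "Bob", grant.digest())
sale2 = Conveyance("Bob", "Carol", sale1.digest())
ledger = {c.digest(): c for c in (grant, sale1, sale2)}

for deed in trace_title(ledger, sale2.digest()):
    print(f"{deed.grantor} -> {deed.grantee}")
```

A forged deed from someone outside the chain, or an edited record filed under its old hash, stops the trace with an error instead of producing a clean title.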