This article was originally published on December 19, 2020, in Chinese. For more details, see here.
Recently, the Bitcoin world has been abuzz with talk about "Snowden."
Yes, that Snowden, the Bitcoin supporter who paid for his document exposure server with Bitcoin. The same Snowden who is wanted by the U.S. and currently exiled in Russia.
On December 16, 2020, when Bitcoin reached a historic high of over $20,000, Snowden tweeted: "One word: Bitcoin." What did he mean by that? There could be several interpretations, but it’s clear that people were happy—70,000 of his followers liked the tweet. If you can't see the tweet, here's a screenshot for you. If you can see it, congratulations.
However, amid the excitement, one can't help but remember Snowden's former employer—the U.S. National Security Agency (NSA). The NSA became infamous due to the PRISM program, which collected user data from Google, Facebook, and Microsoft. We learned about this program through documents leaked by Snowden, which is also why he remains exiled and wanted.
Here's another fact you need to know: Bitcoin uses an important algorithm, SHA-256, designed by the NSA. Does that make you uneasy? Why would Satoshi Nakamoto use something from the NSA? This has been a concern for a long time. Ten years ago, Satoshi Nakamoto addressed this issue, and so far, no problems have been found with SHA-256.
On June 14, 2010, before disappearing from the public eye, Satoshi Nakamoto shared his thoughts on the potential cracking of SHA-256. He offered two suggestions based on different scenarios:
Sudden Break: If SHA-256 is suddenly broken, the community can agree on an honest blockchain before the issue occurred and start using a new hash function from there.
Gradual Break: If SHA-256 is gradually broken, the community can transition to a new hash function in an orderly way, with all users updating their software to use the new hash function at a certain block height.
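The gradual-break scenario boils down to a rule keyed on block height. Here is a minimal sketch of that idea. The switch height is a made-up number, and SHA3-256 is only a stand-in for whatever replacement the community might pick; neither comes from Satoshi's post.

```python
import hashlib

# Hypothetical activation height; not a real Bitcoin parameter.
SWITCH_HEIGHT = 1_000_000


def block_hash(header: bytes, height: int) -> str:
    """Hash a block header with the function in force at the given height."""
    if height < SWITCH_HEIGHT:
        # Bitcoin's current rule: SHA-256 applied twice.
        digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    else:
        # Stand-in replacement, chosen only for illustration.
        digest = hashlib.sha3_256(header).digest()
    return digest.hex()


print(block_hash(b"example header", SWITCH_HEIGHT - 1))
print(block_hash(b"example header", SWITCH_HEIGHT))
```

The point of the sketch is that old blocks keep verifying under the old rule, while every node applies the new function from the agreed height onward.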
Even with these measures, concerns about potential backdoors in Bitcoin remain. If SHA-256 has a backdoor, is Bitcoin still secure?
Such worries are understandable, given that the NSA, known for dealing with secrecy, designed SHA-256. Can they really create an encryption system they can't break? I doubt it, but there's no evidence.
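Bitcoin's development has run almost in parallel with advances in cryptanalysis. In 2009, the year Bitcoin was created, Chinese researchers Xie Tao and Feng Dengguo published a fast collision attack on MD5, a commonly used hash algorithm whose first collisions had been shown in 2004. In 2011, the Internet Engineering Task Force (IETF) published guidance against using MD5 wherever collision resistance is required.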
SHA-256 has been used in Bitcoin for 11 years, and concerns have always existed. But it's a difficult issue to prove or disprove. Unless another Snowden appears, such debates will continue. We should be cautious and assume that SHA-256 might have a backdoor, facing the issue head-on to see if we can find a better solution than Satoshi Nakamoto did 11 years ago.
The following content is a bit technical but not complex. You need to understand it, as it’s the foundation of Bitcoin.
Recent research shows that Bitcoin's trust stems from its "immutable" nature. If you don’t want to read the original paper, you can check out my previous article, "The Secret of Bitcoin's 'Bubble' Not Bursting for 11 Years!"
Immutability means that once something is said or done, it can't be changed. Unlike the Chinese idioms "a word that carries weight" or "a gentleman's word is as good as his bond," Bitcoin's immutability isn't based on virtue or goodness but on a powerful system. This means Bitcoin’s immutability is objective, not dependent on someone’s will to change or not.
Ensuring data immutability is crucial for Bitcoin because Bitcoin is essentially a public ledger accessible to anyone, anywhere. Immutability brings certainty. When everyone sees the same ledger, the cost of trust between people significantly decreases. This is why Bitcoin can be decentralized and transfer funds without banks.
Immutability is supported by cryptographic algorithms. The development of cryptography has enabled Bitcoin. However, cryptography’s progression also brings potential risks to Bitcoin. Theoretically, no encryption algorithm is unbreakable—it's just a matter of time.
Encryption algorithms are based on mathematics. Essentially, Bitcoin's immutability is mathematically guaranteed. Mathematics, unlike physics or chemistry, is proven through logical reasoning, which is far more rigorous than empirical sciences. For instance, we've known 1+1=2 for ages, but Newton's law of universal gravitation has been superseded by general relativity.
However, encryption algorithms are not pure mathematics but applications of it, and they have many flaws. Thus, breaking encryption algorithms is inevitable, and Bitcoin’s algorithms are no exception. Encryption and decryption are like offense and defense. Bitcoin's security is only temporary—it’s just that the offensive tools aren’t strong enough to breach the defenses yet. Keep this in mind.
Here’s a diagram of the Bitcoin blockchain. It helps to understand the structure of the Bitcoin blockchain. You’ll see the word "Hash" frequently. Every transaction generates a hash, transactions are linked by hashes, each block has its own hash, and each block contains the previous block's hash. Essentially, the Bitcoin blockchain is a hash chain. Without hashes, there would be no Bitcoin. Bitcoin’s immutability is guaranteed by hashes.
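To make the "hash chain" idea concrete, here is a toy sketch. The block layout (a dictionary with `prev`, `data`, and `hash`) is invented for illustration and is far simpler than a real Bitcoin block, but the linking principle is the same: each block's hash covers the previous block's hash.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_chain(payloads):
    """Link payloads into a toy chain: each block stores the previous block's hash."""
    chain, prev_hash = [], "0" * 64  # placeholder "previous hash" for the first block
    for payload in payloads:
        block_hash = sha256_hex((prev_hash + payload).encode())
        chain.append({"prev": prev_hash, "data": payload, "hash": block_hash})
        prev_hash = block_hash
    return chain


def first_invalid(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = "0" * 64
    for i, block in enumerate(chain):
        expected = sha256_hex((prev_hash + block["data"]).encode())
        if block["prev"] != prev_hash or block["hash"] != expected:
            return i
        prev_hash = block["hash"]
    return None


chain = build_chain(["Alice pays Bob 1", "Bob pays Carol 1", "Carol pays Dave 1"])
print(first_invalid(chain))  # None: the untouched chain verifies
chain[1]["data"] = "Bob pays Carol 10"
print(first_invalid(chain))  # 1: the edited block no longer matches its hash
```

A forger could recompute the hash of the edited block, but then the next block's stored `prev` would stop matching, and so on down the chain.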
A hash function, also known as a hash algorithm, creates a digital "fingerprint" from any electronic data. Here’s a diagram showing how a hash function works: the left side is the input, the right side is the output. Notice that regardless of the input length, the output is always a fixed length. This is a key feature of hash functions.
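You can check the fixed-length property yourself. This small sketch uses Python's standard `hashlib`; the sample inputs are arbitrary.

```python
import hashlib

inputs = [b"", b"a", b"Bitcoin", b"x" * 1_000_000]
for data in inputs:
    digest = hashlib.sha256(data).hexdigest()
    # 64 hex characters = 256 bits, whatever the input size
    print(len(data), len(digest), digest[:16] + "...")
```

Whether the input is empty or a million bytes long, the output is always 64 hexadecimal characters.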
You might wonder, with such a vast world and so much information, how can only 8 characters on the right represent unique data? Strictly speaking, they can't. When two different inputs produce the same output, it is called a "collision."
As mentioned earlier, algorithms are human-designed and have many flaws. Collisions in hash functions are a significant problem. These problems drive the evolution of algorithms. Mathematics’ 1+1=2 remains constant, but algorithms must evolve.
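With an output of only 8 hex characters, collisions are easy to find by brute force. The sketch below assumes a toy hash made by truncating SHA-256 to 8 characters (this truncation is my own illustration, not the function in the diagram) and searches until two different messages share the same short hash.

```python
import hashlib
from itertools import count


def short_hash(data: bytes, hex_chars: int = 8) -> str:
    """Keep only the first `hex_chars` hex characters of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:hex_chars]


def find_collision(hex_chars: int = 8):
    """Search for two different inputs with the same truncated hash."""
    seen = {}
    for i in count():
        msg = f"message-{i}".encode()
        h = short_hash(msg, hex_chars)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg


a, b, h = find_collision()
print(a, b, h)
```

On an ordinary laptop this typically finishes in well under a second, after somewhere around a hundred thousand tries. The same search against the full 256-bit output is what remains out of reach.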
Once an algorithm’s collision problem is discovered, it’s essentially declared dead. Older algorithms like MD5 met this fate. In this field, a prominent figure is Wang Xiaoyun, a mathematician and an academician of the Chinese Academy of Sciences. In April 2020, the International Association for Cryptologic Research (IACR) awarded Wang the Test-of-Time Award for her groundbreaking paper published in 2005 on hash function analysis.
Her paper focused on hash collisions. The algorithms mentioned in her paper—MD4, MD5, HAVAL-128, and RIPEMD—are no longer trusted or widely used.
Does Bitcoin’s hash algorithm have collision problems?
Bitcoin uses SHA-256 from the SHA-2 family. SHA-1, its predecessor, was found to have collision issues by Wang Xiaoyun and others. SHA-3 is its successor. SHA-256, used by Bitcoin, generates a 256-bit hash value.
2^256 ≈ 1.1579 × 10^77
This value is enormous. It approaches the number of atoms in the observable universe, which is commonly estimated at around 10^78 to 10^82, so an accidental collision is vanishingly improbable. Numbers can be abstract, so here's an image to illustrate the size of a small part of the universe. Below is the famous "Pale Blue Dot" photo.
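If you want to check the arithmetic, Python handles integers this large natively. The last lines add one fact not spelled out above: by the birthday bound, a brute-force collision search needs on the order of 2^128 hashes rather than 2^256.

```python
import math

space = 2 ** 256
print(f"{space:.4e}")           # 1.1579e+77
print(math.log10(space))        # about 77.06

# Birthday bound: a brute-force collision search needs on the order of
# sqrt(2^256) = 2^128 hashes, which is still an enormous number.
collision_work = 2 ** 128
print(f"{collision_work:.4e}")  # 3.4028e+38
```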
This image was processed by NASA in 2020, 30 years after it was originally taken on February 14, 1990, from 6 billion kilometers away by Voyager 1. Our Earth is the tiny dot. In the vast universe, Earth is insignificant, like a speck of dust. Viewing this photo might give you a sense of awe about the universe’s size and Earth’s smallness. This corner is just a tiny part of the universe.
You might feel the above explanation is indirect and lacks persuasiveness. Here's a video from YouTube, "How Secure is 256-bit Encryption?" It shows that even with an absurd amount of computing power, guessing a 256-bit number would take about 507 billion years, roughly 37 times the age of the universe (13.8 billion years). The video is vivid and interesting. I have provided a link, but you may not be able to watch it. If you can't, try to find a way to see it.
We can temporarily set aside concerns about SHA-256 collisions. You might wonder if it's possible to subtly alter a number in the Bitcoin blockchain, like changing your account balance from 1 to 10 Bitcoin, without being noticed. The answer is no, due to the "avalanche effect" of hashes.
Here's a website offering online SHA-256 calculations. Using the previous article’s title "即使美国也不能杀死比特币" as input, the output is:
0c294590692aa32f7c6c0dd85f065e87c13b6f928e5ad5c6c163b287bba11882
Adding "!" to the title changes the output to:
07e24d0c3efdd396b9c82bd866c8e9c966459135ebe9b2a42c4877246af19fe9
You'll notice that although only "!" was added, the hash output is vastly different. Comparing both outputs:
0c294590692aa32f7c6c0dd85f065e87c13b6f928e5ad5c6c163b287bba11882
07e24d0c3efdd396b9c82bd866c8e9c966459135ebe9b2a42c4877246af19fe9
Almost every character is different. This is the "avalanche effect," where a small change in input results in a drastically different output, similar to a small pebble triggering an avalanche on Mount Everest.
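The avalanche effect can also be measured in bits rather than eyeballed in hex. The sketch below counts how many of the 256 output bits flip when "!" is appended. It assumes the title is encoded as UTF-8 and the "!" is a plain ASCII character, so its hashes may not match the ones from the website above.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between SHA-256(a) and SHA-256(b)."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")


title = "即使美国也不能杀死比特币".encode("utf-8")  # encoding is an assumption
print(bit_difference(title, title + b"!"))  # typically close to 128 of 256 bits
```

For a good hash function, about half of the output bits change on average, no matter how small the change in the input.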
Now you know that any slight modification in the Bitcoin blockchain will be immediately detected due to the avalanche effect. If you’ve read this far, it’s partly due to my writing but mostly because of your patience and curiosity. Give yourself a pat on the back. With this cryptographic knowledge, you can better understand SHA-256's role in Bitcoin and the potential impact of any backdoors.
SHA-256 is mainly used in Bitcoin mining and generating Bitcoin addresses. Let’s talk about addresses first.
The sequence is: Bitcoin addresses come from public keys, and public keys come from private keys. The process involves various data manipulations, all done through algorithms. Public and private keys are just long numbers, so don't be intimidated by the terms. Different private keys generate different public keys, and different public keys generate different addresses. The critical part is that the process from private key to public key and from public key to address is irreversible. This diagram clearly illustrates the process, including the algorithms used. SHA-256 appears in the transition from public key to public key hash, indicated by the red arrow.
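Here is a partial sketch of where SHA-256 shows up on the way from public key to address. It covers two pieces: the first hashing step, and Base58Check, the encoding that turns a 20-byte public key hash into a readable address using a double SHA-256 checksum. The RIPEMD-160 step in between is left out because Python's `hashlib` does not provide it on every platform, and the example public key is made-up bytes, not a real key.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def base58check(version: bytes, payload: bytes) -> str:
    """Encode version + payload with a 4-byte double-SHA-256 checksum."""
    raw = version + payload
    raw += sha256(sha256(raw))[:4]
    # Each leading zero byte is written as the character '1'.
    n_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    num = int.from_bytes(raw, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = B58_ALPHABET[rem] + encoded
    return "1" * n_zeros + encoded


# First step of "public key -> public key hash": SHA-256 of the public key bytes.
# (Bitcoin then applies RIPEMD-160, omitted in this sketch.)
example_pubkey = bytes.fromhex("02" + "11" * 32)  # made-up bytes, not a real key
print(sha256(example_pubkey).hex())

# Base58Check of an all-zero 20-byte public key hash (version byte 0x00).
print(base58check(b"\x00", bytes(20)))
```

Every step here runs forward only: nothing in the output lets you walk back to the public key, let alone the private key.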
Let’s get a more intuitive understanding of private keys and addresses. Below is a sample paper wallet with two QR codes serving different purposes. The left one is for receiving funds, and the right one is for spending. The right one is more critical—if it’s damaged or lost, no one can help you, not even God. This is why paper wallets are often criticized. Remember, the private key represents control, the address can be shared, but the private key should remain private. However, avoid linking your address to your identity, as it can lead to tracking of your Bitcoin transactions. Staying low-key is advisable for Bitcoin owners.
A private key is essentially a random number—the more random, the safer. Since the private key is 256 bits, it’s hard to guess or replicate. We’ve covered this already; you can review it if needed. If you wish, you can generate the most secure private key yourself, ensuring no one can crack it. Here’s how:
1. Prepare a pen, paper, and a coin.
2. Flip the coin.
3. Heads represent 1, tails represent 0; note it down.
4. Repeat steps 2-3 until you have recorded 256 digits.
5. The 256-bit binary number you get is your private key.
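The coin-flip procedure can be mimicked in a few lines, with the operating system's random source standing in for the coin. One detail is added here that the steps above skip: a valid Bitcoin private key must lie between 1 and the order of the secp256k1 curve minus 1, so the sketch retries in the astronomically unlikely case that the flips land outside that range. This is only an illustration, not a replacement for wallet software.

```python
import secrets

# Order of the secp256k1 curve group; valid private keys lie in [1, N - 1].
N = int("F" * 31 + "E" + "BAAEDCE6AF48A03B" + "BFD25E8CD0364141", 16)


def coin_flip_private_key() -> int:
    """Simulate 256 fair coin flips with the OS random source."""
    while True:
        bits = "".join(str(secrets.randbits(1)) for _ in range(256))
        key = int(bits, 2)
        if 1 <= key < N:  # retry in the astronomically rare out-of-range case
            return key


key = coin_flip_private_key()
print(f"{key:064x}")  # the same key written as 64 hex characters
```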
Few people do this due to the hassle, relying instead on wallet software to generate random private keys. The principle remains the same. Crucially, you must safeguard this private key, or the funds in your account will be irretrievable. To stress the importance, let me introduce someone.
James Howells, an IT worker from Newport, Wales, started mining in 2009 using a laptop and mined 7,500 Bitcoins. You might envy those days of laptop mining. Today, he’s considered a pioneer.
Later, he stopped mining, possibly due to the transition to GPU mining, rendering laptops obsolete. When his laptop broke, he removed the hard drive, which contained his private key, for safekeeping. Unfortunately, during a house cleanup in 2013, he accidentally threw the hard drive away. Thus, 7,500 Bitcoins were lost forever. Remember, forever. James said he was distracted by family life and moving house. At today's Bitcoin prices, those Bitcoins would be worth $150 million. Such distractions are costly.
Now you understand the importance of Bitcoin private keys. Referring back to the "From Private Key to Bitcoin Address" diagram, SHA-256 only comes into play after the public key already exists: it hashes the public key on the way to the address and never touches the private key. The step from private key to public key uses a different algorithm (elliptic-curve mathematics), and it cannot be run in reverse. Why can't the private key be derived from the public key? It's a good question, but complex. We'll cover it another time. For now, just remember: even if SHA-256 has a backdoor, our private keys are safe.
Even if SHA-256 has a backdoor, it won’t affect the process of generating Bitcoin addresses. Now, let’s consider if the backdoor could be exploited in mining.
Mining is a metaphor for treasure hunting, but the process happens entirely in computers. Here's a recent mining result. The screenshot shows a block mined on December 18, 2020, at 6:01 am. The arrows point to the miner's reward: 6.25 Bitcoins as block reward and 1.4744 Bitcoins as transaction fees, valued at over $150,000 at $20,000 per Bitcoin. To claim this reward, the miner had to find a hash value with 19 leading zeros by adjusting the number in the circle (the nonce). The winning value was about 3.5 billion, although the network as a whole tried vastly more combinations before this hash turned up.
This might seem abstract, so here’s a diagram illustrating the mining process. The puzzle is to find a hash value with 19 leading zeros. The first to find it wins the reward. The competition is fierce.
In the diagram, the function y=f(x) represents the SHA-256 algorithm. The winner is the one whose value of x2 yields a hash meeting the criteria. Below, we use "Hello, world!" to represent unchanging block data, incrementing the number appended to it to find a hash with 4 leading zeros.
"Hello, world!0" => 1312af178c253f84028d480a6adc1e25e81caa44c749ec81976192e2ec934c64 = 2^252.253458683
"Hello, world!1" => e9afc424b79e4f6ab42d99c81156d3a17228d6e1eef4139be78e948a9332a7d8 = 2^255.868431117
"Hello, world!2" => ae37343a357a8297591625e7134cbea22f5928be8ca2a32aa475cf05fd4266b7 = 2^255.444730341
...
"Hello, world!4248" => 6e110d98b388e77e9c6f042ac6b497cec46660deef75a55ebc7cfdf65cc0b965 = 2^254.782233115
"Hello, world!4249" => c004190b822f1669cac8dc37e761cb73652e7832fb814565702245cf26ebb9e6 = 2^255.585082774
"Hello, world!4250" => 0000c3af42fc31103f1fdc0151fa747ff87349a4714df7cc52ea464e12dcd4e9 = 2^239.61238653
Notice that after 4,251 attempts (numbers 0 through 4250), a hash with 4 leading zeros suddenly appears without warning. The only way to find more leading zeros is to guess more and faster, hence the computing power arms race. This process, called proof of work, is costly and resource-intensive.
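The guessing game above fits in a few lines. This is a toy sketch: real Bitcoin mining hashes an 80-byte block header with SHA-256 applied twice and compares the result against a numeric target, whereas here we simply append a counter to a string and look for leading hex zeros. The function name `mine` is my own.

```python
import hashlib
from itertools import count


def mine(data: str, zeros: int):
    """Try nonces 0, 1, 2, ... until SHA-256(data + nonce) starts with `zeros` hex zeros."""
    prefix = "0" * zeros
    for nonce in count():
        digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


nonce, digest = mine("Hello, world!", 4)
print(nonce, digest)
```

If the listing above is accurate, the search should stop at 4250. Asking for one more zero makes the search about 16 times longer on average, which is why difficulty can be tuned so finely.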
If SHA-256 has a backdoor, could one cheat and find the required hash faster than others? Theoretically, yes. However, no evidence of a backdoor has been found. Otherwise, Bitcoin’s hash rate wouldn’t remain at 140 EH/s, the highest on Earth. The combined computing power of the top 100 supercomputers is less than one ten-thousandth of this.
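To get a feel for these numbers, here is a rough back-of-the-envelope calculation. It assumes the 19 leading zeros are hexadecimal digits and treats the target as "exactly 19 zeros", which is a simplification of the real numeric target.

```python
HEX_ZEROS = 19      # leading hex zeros, from the block described above
HASH_RATE = 140e18  # 140 EH/s, the network figure quoted above

# Each hex digit is zero with probability 1/16, so on average
# 16**19 hashes are needed to hit 19 leading zeros.
expected_hashes = 16 ** HEX_ZEROS
seconds = expected_hashes / HASH_RATE

print(f"{expected_hashes:.3e} hashes")  # 7.556e+22 hashes
print(f"{seconds / 60:.1f} minutes")    # 9.0 minutes
```

The result lands close to Bitcoin's roughly ten-minute block interval, which is what you would expect if nobody is taking a shortcut through SHA-256.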
But if SHA-256’s backdoor or vulnerability is discovered, what should we do?
Satoshi’s two methods still apply. Additionally, we can use Bitcoin's alert system to minimize damage.
The alert system has been used 15 times since February 18, 2012. The latest alert was on September 21, 2018, reminding users to upgrade Bitcoin Core. So, if a backdoor in SHA-256 were discovered, the impact on the Bitcoin community would be limited and quickly fixed through software updates. Software updates are a form of evolution; Bitcoin has undergone 21 major updates and can handle one more, especially on such a critical issue.
In reality, even without a backdoor, SHA-256 will eventually be broken. It’s just a matter of time. As mentioned earlier, encryption and decryption are like offense and defense, evolving together.
Satoshi Nakamoto’s statement about SHA-256 is crucial. He said: "SHA-256 is very strong. It’s not like the incremental step from MD5 to SHA1. It can last several decades unless there’s some massive breakthrough attack."
SHA-256 is part of the SHA-2 family. Even if it has no backdoor, it will eventually be broken.
In 2009, Chinese scientists Xie Tao and Feng Dengguo showed that MD5 collisions could be found on an ordinary computer in just seconds. Two years later, the IETF advised against using MD5 wherever collision resistance matters.
SHA1 fell on February 23, 2017, when Google and CWI Amsterdam announced the first practical collision.
SHA-2, which includes SHA-256, will inevitably be broken too.
Why not proactively upgrade to a more secure algorithm not designed by the NSA?
Proactive measures are wise but challenging, involving significant governance issues. Many issues more critical than algorithm security remain unsolved. We’ll explore this in the next article.