Meta, the company behind Facebook, recently revealed its own AI supercomputer, a cluster of approximately 6,000 GPUs. This makes it roughly the fifth-fastest supercomputer in the world and underscores Meta’s ambition to own the infrastructure on which the world’s artificial intelligence is developed.
As we’ve seen in the past decade, what’s good for Meta and other oligopolists generally hasn’t been good for us; the ease of use of their products typically trades off against price gouging and censorship/anti-competitive behaviour.
Thus far, most of the fight against Big Tech centralisation has been over age-old issues, principally freedom of speech. In the coming decades, as AI becomes inextricably integrated into society, a more fundamental concern will arise: freedom of compute.
As we discuss below, limitations on, and control of, compute carry existential implications, so let’s rewind a bit, unpack what’s really at stake here, and examine a radically new way to think about the problem.
Every face you see on a video call and all the audio you hear is manipulated. To improve call quality, neural networks selectively adjust the resolution in Zoom and suppress background noise in Microsoft Teams. More recent advances even see lower resolution video ‘dreamed’ into a higher resolution.
Neural networks are the models used in the deep learning branch of artificial intelligence. They are loosely based on the structure of the human brain and have myriad applications, perhaps ultimately creating human-level artificial intelligence. Bigger models generally yield better results, and the compute used to train state-of-the-art models has been doubling roughly every three months.
This explosion in development has made deep learning a fundamental part of the modern human experience. In 2020, a neural network operated the radar on a US spy plane, language models now write better scam emails than humans, and self-driving car algorithms outperform humans in many environments.
In the next decade, deep learning will become even more fundamental as jumps in model development become more frequent and also as hardware advances create greater sensory immersion. For example, Brain Machine Interfaces (BMI) use artificial intelligence to decode brain signals. BMIs, like Neuralink, hold the near-term promise of allowing someone to access and browse the internet with their thoughts.
If you’re sceptical of grand visions for deep learning then you might have well-founded concerns. The space has been anthropomorphised and overhyped from the moment it entered the public eye.
Yet in the past decade, some of the human-like promise of AI has indeed come to fruition. In 2016, AlphaGo, a neural network developed by DeepMind (an Alphabet subsidiary), toppled Lee Sedol, one of the world’s top Go players. It accomplished this by analysing games played by Go masters and then playing against itself to improve. Go has a state space roughly 10^100 times larger than that of chess.
Less well-known is the fact that the following year, a new version, AlphaGo Zero, learned to play Go without ever being shown games from the Go masters. Having never seen a real game of Go, it went on to beat the previous version that had beaten Sedol.
However, despite enormous gains in narrow areas like Go, the most basic and innate concepts of the human experience remain the hardest to replicate: self-awareness, morality, and ‘gut instinct’. This shortcoming was famously captured in Blade Runner’s Voight-Kampff test–loosely based on the Turing Test–in which AI replicants were asked a series of morally ambiguous questions to determine their humanity.
For a system to convincingly answer these questions, it would probably need to satisfy some version of Artificial General Intelligence (AGI), aka ‘Strong AI’: a system that matches average human intelligence and, on some definitions, has a sense of consciousness. Achieving this state is an area of passionate research and fundamental uncertainty (there is no universally accepted definition of consciousness, for example). Estimates for achieving AGI vary wildly; Blade Runner might have been set in 2019, but rough estimates for AGI creation span from 2029 to 2220!
If you’ve noticed that most of the above examples are produced by the same set of companies, that’s because the deep learning industry currently looks like a game of monopoly between Big Tech companies. At the state level, too, it often looks like a trade and talent war between China and the United States. These forces are resulting in huge centralisation of the key resources that get us to AGI: compute power, knowledge, and data.
Compute power: access to superior processors enables increasingly large/complex models to be trained. In the past decade, transistor density gains and advances in memory access speed/parallelisation have dramatically reduced training times for large models. Virtual access to this hardware, via cloud giants like AWS and Alibaba, has simultaneously widened adoption.
Accordingly, there is strong state interest in acquiring the means to produce state-of-the-art processors. Mainland China does not yet have the end-to-end capability to fabricate state-of-the-art semiconductors and must import them, particularly from TSMC (Taiwan Semiconductor Manufacturing Company). Chip vendors also attempt to block out other customers from accessing chip manufacturers by buying up supply. At the state level, the US has been aggressively blocking any move by Chinese companies to acquire this technology.
Further up the tech stack, some companies have gone as far as creating their own deep learning specific hardware, like Google’s TPU clusters. These outperform standard GPUs at deep learning and aren’t available for sale, only for rent.
Knowledge: many of the most public breakthroughs have stemmed from new model architectures developed by researchers, but there is a battle over the underlying IP and talent. The US has historically captured over 50% (!) of the AI talent emerging from China, and the companies that develop models with this talent are increasingly making the technology less accessible. GPT-3, built by OpenAI, was (as the company’s name suggests) meant to be openly available. But, as of today, it controversially sits behind an API, with only Microsoft having access to the source code.
Data: deep learning models require huge volumes of data–both labelled and unlabelled–and generally improve as data quantity increases. GPT-3 was trained on roughly 300 billion tokens. Labelled data is particularly important, and the industry has been steadily accruing it for years. A clandestine example: every time you solve a reCAPTCHA to access a website, you are labelling training data that improves products like Google Maps.
The internet might have been born of the US Government in the 1960s, but by the 1990s it was an anarchic web of creativity, individualism, and opportunity. Well before Google was stockpiling TPUs, projects like SETI@home attempted to discover alien life by crowdsourcing decentralised compute power. By the year 2000, SETI@home had a processing rate of 17 teraflops, exceeding that of the best supercomputer at the time, the IBM ASCI White. This period is generally called ‘web1’, the moment before the hegemony of large platforms like Google and Amazon (web2).
The shift from web1, and the relative obscurity of SETI@home, might be explained by some of the inherent issues with decentralised infrastructure. For one, all of the data must be distributed, increasing bandwidth requirements. Equally important is how fast you can complete tasks that must be performed in sequence. For example, if you want to analyse the background radiation of the universe (and maybe find alien life), you can divide the sky into small parts and distribute them to everyone, because the radiation in one part of the sky can be analysed independently of the other parts. This allows for perfect parallelisation of the work; it also makes it relatively trivial to check that the work has been done correctly: after asking a third party to perform a computation, you can randomly select units of the submitted work and check them yourself.
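The spot-checking scheme described above can be sketched in a few lines of Python. The `analyse` function and work-unit layout here are hypothetical stand-ins, not SETI@home’s actual code–the point is only that independent work units make random auditing cheap:

```python
import random

def analyse(sky_patch):
    # Stand-in for the real computation: each patch of sky can be
    # processed independently of every other patch.
    return sum(sky_patch) % 97  # hypothetical "signal score"

def spot_check(patches, results, sample_size=2):
    # Re-run a random sample of the submitted work and compare it
    # against what the worker claimed. Because the units are
    # independent, each check costs only one unit of work.
    for i in random.sample(range(len(patches)), sample_size):
        if analyse(patches[i]) != results[i]:
            return False  # worker cheated or erred on unit i
    return True

# Divide the "sky" into independent work units and farm them out.
patches = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
results = [analyse(p) for p in patches]  # honest workers

assert spot_check(patches, results)  # a random audit passes
```

A dishonest worker who falsifies many units is caught with high probability even when only a small sample is re-checked, which is what makes this model of verification so cheap.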
In contrast, if you want to calculate the next best move in a game of chess (or Go), you must have access to the positions of all the pieces on the board. This makes it very difficult to efficiently decentralise the computation in chess and Go engines.
However, the current centralisation of web infrastructure into huge web2 platforms creates its own, arguably larger, issues: chiefly the price gouging, censorship, and anti-competitive behaviour described above.
There is a third way, however. Web3 can be thought of as a combination of the decentralised components of web1 and the capitalist components of web2. For example, decentralising compute with a blockchain and buying/selling processor cycles with tokens would circumvent these web2 issues: anyone could buy or sell compute at open-market prices, with no central platform able to gouge on price or censor access.
It’s clear that decentralising compute creates a cheaper and freer base from which to research and develop artificial intelligence. But the fundamental blocker to the decentralisation of deep learning training has been verification of work. Essentially, how do you know that another party has completed the computation that you requested? The two factors driving this blocker are:
State dependency: neural networks are more like the chess board than the night sky. Generally, each layer in a network is connected to all the nodes in the layer before it, so computing a layer requires the state of the previous layer (literally, ‘state dependent’). Worse still, the weights at every training step are determined by the weights at the previous step. So if you want to verify that someone has trained a model–say, by picking a random point in the training run and checking that you reach the same state–you need to train the model all the way up to that point, which is very computationally expensive.
High computational expense: It cost ~$12m in 2020 for a single training run of GPT-3, >270x more than the estimated ~$43k for the 2019 training of GPT-2. In general, model complexity (size) of the best neural networks is currently doubling every three months. If neural networks were less expensive, and/or if the training represented less of the model development process, then perhaps the verification overhead stemming from state dependency would be acceptable.
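The state-dependency problem can be seen in a toy replay sketch. The update rule below is a hypothetical stand-in for real gradient descent, not any actual training loop: the point is that a verifier checking a claimed weight value at step `k` has no shortcut, because each step’s weights are a function of the previous step’s.

```python
def train(w0, data):
    # Each update depends on the weights from the previous step,
    # so step k cannot be computed without steps 0..k-1.
    w = w0
    history = [w]
    for x in data:
        w = w - 0.1 * (w - x)  # hypothetical gradient-style update
        history.append(w)
    return history

def verify_claim(w0, data, step, claimed_w):
    # The only way to check the claim is to redo all the work
    # up to `step` and compare.
    return abs(train(w0, data)[step] - claimed_w) < 1e-9

data = [1.0, 2.0, 3.0, 4.0]
history = train(0.0, data)
assert verify_claim(0.0, data, 4, history[4])  # honest claim verified by full replay
```

Contrast this with the sky-patch example: there, verifying one unit cost one unit of work; here, verifying step `k` costs `k` steps of work.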
If we want to lower the price, and decentralise the control, of deep learning training, we need a system that trustlessly manages state dependent verification whilst also being inexpensive in terms of overhead and rewarding to those who contribute compute.
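As a quick sanity check on the cost and growth figures quoted above (the dollar amounts are the estimates from this article, not independent measurements):

```python
# Cost ratio between the quoted GPT-3 and GPT-2 training estimates.
gpt3_cost = 12_000_000  # ~$12m (2020 estimate)
gpt2_cost = 43_000      # ~$43k (2019 estimate)
assert gpt3_cost / gpt2_cost > 270  # the ">270x" figure quoted above

# Doubling every three months compounds to 2**4 = 16x per year.
doublings_per_year = 12 / 3
growth_per_year = 2 ** doublings_per_year
assert growth_per_year == 16
```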
The Gensyn protocol trustlessly trains neural networks at hyperscale and low cost. It achieves these lower prices and higher scale by combining two things: a verification system that efficiently resolves the state-dependency problem described above, and a vastly larger pool of compute supply connected to the network.
Vastly increasing the scale of accessible compute, whilst simultaneously reducing its unit cost, opens the door to a completely new paradigm for deep learning for both research and industrial communities.
Improvements in scale and cost allow the protocol to build up a set of already-proven, pre-trained, base models–also known as Foundation Models–in a similar way to the model zoos of popular frameworks. This allows researchers and engineers to openly research and train superior models over huge open datasets, in a similar fashion to the EleutherAI project. These models will solve some of humanity’s fundamental problems without centralised ownership or censorship.
Cryptography, particularly Functional Encryption, will allow the protocol to be leveraged over private data on-demand. Huge foundation models can then be fine-tuned by anyone using a proprietary dataset, maintaining the value/privacy in that data but still sharing collective knowledge in model design and research.
With the Gensyn protocol, we finally have hyperscale, cost-efficient deep learning training that isn’t marshalled by institutions playing god over who gets access. The first version of the protocol, our testnet, will be deployed later this year.
Finally, thank you to everyone in the Gensyn community who helped with this piece!