At the beginning of 2023, I wrote down some thoughts about “the Singularity”. This was, in no small measure, influenced by the novelty of ChatGPT at the time.
A lot has happened since then, and due to several flashpoints of drama, as well as the information cascade in machine learning and the maturation of several components of the Ethereum ecosystem, I think the moment calls for a recap of the state of the art in both fields, and my dubious conjectural interpretation between the lines.
We’ve all come to love the Cambrian explosion of LLMs, especially in chatbots. While extremely large models like GPT and PaLM have grown by orders of magnitude in recent years, the public consensus has drifted towards models that are small enough for consumer hardware (and further research indicates they can shrink even further). There’s a catch, though: most of these models are mediocre at best, and bleeding-edge products like GPT-4 are overhyped as sufficient for virtually every NLP use case, when they’re really not. One can only fit so much context into these prompts, and there is active research like MegaByte and LongMem that attempts to rectify that limit. However, large-context products like Anthropic’s Claude and open source models like MPT-7B-StoryWriter are not small enough to run locally. TL;DR: there has to be some semi-permanent extension of the math going into these models, and thus large corpora of information can be divided and converted into vector embeddings. Additionally, one can generate indices of these vector embeddings and work with the proximity of those vectors in a shared space, which can lead to a very powerful search engine for colossal amounts of recorded natural language. Below is a pretty useful walkthrough of these principles.
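For concreteness, here’s a minimal sketch of that chunk → embed → index → search loop. `embed()` is a stand-in for whatever embedding model or API you’d actually call (e.g. ada-002), and the brute-force cosine search stands in for a proper vector database:

```python
# A minimal sketch of the "chunk -> embed -> index -> search" loop, assuming
# embed() wraps whatever embedding model you actually use (placeholder here).
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: in practice this calls an embedding model or API.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def chunk(corpus: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real pipelines split on sentences/sections.
    return [corpus[i:i + size] for i in range(0, len(corpus), size)]

class VectorIndex:
    def __init__(self):
        self.chunks: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, corpus: str) -> None:
        for c in chunk(corpus):
            self.chunks.append(c)
            self.vectors.append(embed(c))

    def search(self, query: str, k: int = 3) -> list[str]:
        # Cosine similarity == dot product, since vectors are unit-normalized.
        q = embed(query)
        scores = np.array(self.vectors) @ q
        top = np.argsort(scores)[::-1][:k]
        return [self.chunks[i] for i in top]

corpus = "Gorilla learns API calls. Voyager keeps a skill library. " * 20
index = VectorIndex()
index.add(corpus)
print(index.search("what does the author say about skills?"))
```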
Ultimately, tools like LlamaIndex, Pinecone, and LangChain are leading indicators for a network that can exchange previously established information across very manageable nodes with very permissive specifications, including many semifungible schemas for short- and long-term memory, and these AI services might even be overlaid by data structures like knowledge graphs. If you think we need a monolithic, multimodal, multitrillion-parameter language model to augment human civilization, boy, do I have a GPU to sell you.
Gorilla is one piece of research among many that describe the journey to make LLMs actionably valuable to the real world. More specifically, Gorilla is one attempt of many (like Toolformer & tool-makers) to train on datasets of API calls, so that a model “learns” to use a standard interface to existing (human) software that was thoroughly developed to procedurally interact with the real world. Point is, humans first learned to use tools, then we learned to change our natural environment with those tools, and API-oriented models will follow suit. Oh, and OpenAI will also follow suit.
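As a rough illustration of the pattern these models are trained toward, here’s a toy dispatch loop; `generate()` stands in for a fine-tuned model, and the API registry is entirely hypothetical:

```python
# A toy sketch of the API-call pattern Gorilla-style models are trained on:
# the model emits a structured call, and a thin runtime dispatches it.
# generate() is a stand-in for a real model; the registry is hypothetical.
import json

API_REGISTRY = {
    "weather.lookup": lambda city: f"Forecast for {city}: 21C, clear",
    "calendar.add": lambda title, when: f"Added '{title}' at {when}",
}

def generate(instruction: str) -> str:
    # Placeholder for a fine-tuned model that maps instructions to API calls.
    return json.dumps({"api": "weather.lookup", "args": {"city": "Reykjavik"}})

def run(instruction: str) -> str:
    call = json.loads(generate(instruction))
    fn = API_REGISTRY[call["api"]]
    return fn(**call["args"])

print(run("Do I need a jacket in Reykjavik today?"))
```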
If models can use tools, then models can use other models. Orca is a relatively small LLM that learns from GPT-4 explanations and ChatGPT assistance; it does not just learn to mimic, but rather attempts to learn to reason from preexisting thought processes. This somewhat challenges a recent study which indicated that the small LLMs coming out of the LLaMA leak were not properly benchmarked, and on further scrutiny did not perform nearly as well as the much more performant models (GPT-4) that they imitated. Personally, I’m optimistic that reasoning can be learned by models small enough to be trained on older GPUs, though this is clearly falsifiable and may not be the case.
One might ask, “all of these models are great prototypes, especially when the researchers have the resources to burn, but what’s the most atomic version of this software that actually does the job?” It’s a very good question, and it has been brought up repeatedly because researchers often juggle OpenAI’s various tiers of models, like the cheaper ada-002 for embeddings and the decidedly more expensive GPT-4 for the best reasoning skills and generative ability. FrugalGPT is an attempt to use techniques like the “LLM cascade”: start from the bottom tier and stop at the cheapest model that sufficiently does the task at hand. With software that daisy-chains many tasks together, the opportunity to minimize resource cost and avoid overpriced APIs grows larger.
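Here’s a hedged, minimal sketch of the cascade idea; the model callables, thresholds, and scorer are placeholders rather than FrugalGPT’s actual components:

```python
# A minimal sketch of an "LLM cascade" in the FrugalGPT spirit: try the
# cheapest model first and only escalate when a scorer isn't satisfied.
from typing import Callable

Model = Callable[[str], str]

def cheap_model(prompt: str) -> str:
    return "short answer"

def mid_model(prompt: str) -> str:
    return "better answer with some reasoning"

def expensive_model(prompt: str) -> str:
    return "thorough answer"

def good_enough(prompt: str, answer: str) -> float:
    # Placeholder scorer; FrugalGPT trains a small verifier for this step.
    return min(1.0, len(answer) / 40)

CASCADE: list[tuple[str, Model, float]] = [
    ("gpt-cheap", cheap_model, 0.8),          # (name, model, acceptance threshold)
    ("gpt-mid", mid_model, 0.7),
    ("gpt-expensive", expensive_model, 0.0),  # last resort, always accepted
]

def answer(prompt: str) -> str:
    for name, model, threshold in CASCADE:
        out = model(prompt)
        if good_enough(prompt, out) >= threshold:
            return f"[{name}] {out}"
    return out  # unreachable given the 0.0 threshold, kept for clarity

print(answer("Summarize the FrugalGPT idea in one line."))
```

The design choice that matters is the per-tier acceptance threshold: the more reliable your cheap verifier is, the less often you pay for the expensive model.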
What if none of the models suffice in “zero shots”? There’s a pattern in pretty much all LLMs, especially the highest-performing models, of working through a sequence of intermediate steps to “reason” into a much better final output. This subfield has progressed from few-shot benchmarking into Chain of Thought (CoT). And now, with Tree of Thoughts, there’s more thoroughness, more decision-making, and possibly more introspection. Between FrugalGPT and Tree of Thoughts, there’s a model-agnostic spectrum of how many resources should be spent, and to what standard, for a given demand, but both pragmatically indicate that the consumer is not captive to a select few products that just happen to be very performant.
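A stripped-down sketch of the Tree of Thoughts search loop looks something like this; `propose()` and `score()` stand in for LLM calls, and the beam search is simplified from the paper:

```python
# Tree-of-Thoughts-style search: expand several candidate "thoughts" per step,
# score them, and keep only the best few. Not the paper's actual code.
import heapq

def propose(state: str, n: int = 3) -> list[str]:
    # Placeholder: a real system asks the LLM for n possible next reasoning steps.
    return [f"{state} -> step{i}" for i in range(n)]

def score(state: str) -> float:
    # Placeholder: a real system asks the LLM (or a verifier) to rate the state.
    return -len(state)  # pretend shorter chains are better

def tree_of_thoughts(problem: str, depth: int = 3, beam: int = 2) -> str:
    frontier = [problem]
    for _ in range(depth):
        candidates = [child for s in frontier for child in propose(s)]
        # Keep the `beam` highest-scoring partial solutions.
        frontier = heapq.nlargest(beam, candidates, key=score)
    return max(frontier, key=score)

print(tree_of_thoughts("solve: 24 from [4, 9, 10, 13]"))
```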
What would speak to performance more than an agent, especially one in the real world? That’s the approach taken at Google, where they have a robot perform tasks while also using GPT in the background as an Inner Monologue (AdaPlanner also explores this subfield). In the case of Voyager, a distributed team used agentized GPT-3.5/4 (AutoGPT) as a “blackbox LLM”, along with other transparent elements, to create an embodied, lifelong-learning agent that can excel at Minecraft. The element that sticks out to me is the skill library: if this architecture can be applied to the real world, then most of the trial & error results are extremely communicable across the Internet, and possibly across multiple use cases & modalities. It also adds to FrugalGPT and Tree of Thoughts: now we have a “three-legged stool” for LLM workloads that balances frugality, thoroughness, and skillfulness.
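To make the skill library idea concrete, here’s a rough sketch under the assumption that skills are stored as code keyed by an embedding of their description; this is an illustration, not Voyager’s implementation:

```python
# A rough sketch of a Voyager-style skill library: once a skill survives trial
# and error, store its code keyed by an embedding of its description, and
# retrieve the closest skills when a new task comes in. embed() is the same
# kind of stand-in as before.
import numpy as np

def embed(text: str) -> np.ndarray:
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

class SkillLibrary:
    def __init__(self):
        self.descriptions: list[str] = []
        self.programs: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, description: str, program: str) -> None:
        # Only called after the skill has passed execution + self-verification.
        self.descriptions.append(description)
        self.programs.append(program)
        self.vectors.append(embed(description))

    def retrieve(self, task: str, k: int = 2) -> list[str]:
        scores = np.array(self.vectors) @ embed(task)
        top = np.argsort(scores)[::-1][:k]
        return [self.programs[i] for i in top]

library = SkillLibrary()
library.add("mine a log from a tree", "def mine_log(bot): ...")
library.add("craft a wooden pickaxe", "def craft_pickaxe(bot): ...")
print(library.retrieve("gather wood to build a shelter"))
```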
And the aggregate demand for these workloads already converges to the same interface standard: natural language. Where others might see single-payer API-based software, I see a massively multiplayer market that, most of all, emphasizes coordination of optimized intelligence work with no unnecessary overlap or repeats.
Moore’s Law used to be a question of how far we could push single-threaded computation; then we realized we needed multi-threaded computation, and even more specialized silicon & algorithms after that. While projects like Apple Vision Pro take this to the extreme and introduce an entirely new paradigm of spatial computation, one has to challenge the conclusion that these have to be mass-produced and mass-consumed, in single threads, for maximal economic windfall. I think we get a lot more juice from a much simpler squeeze, when anyone only needs to figure out that squeeze once for everyone else. Instead of just hyperperformant autonomous hardware and “blackbox” autonomous software, our mundane daily driver might be the effectively coordinated mesh of much simpler computational processes, and the specialization of silicon & “inscrutable matrices” should instead be invested in the unknown, one-off frontier (and perhaps bleeding-edge form factors for nominal entertainment devices as well).
In “A Singular Trajectory” I was somewhat obsessed with the “Mixture of Experts” architecture because it seemed so intuitive on the surface. Human intelligence is built on a neocortex, the secondary structure on top of an allocortex, and is composed of several lobes, which all work together to handle more abstract, rational control over instinctual behavior underneath. Furthermore, agrarian society has further cemented a tertiary form of “sparsely-gated” human intelligence in specialized roles like warrior and priest castes. And just like the human mind & society generate and prune generations of living memory, present-day machine learning is complemented by ever-evolving systems that can handle magnitudes more generation and distillation of ephemeral data. My pet theory is that we only have to implement a global market with a “neocortex of the crowds”/consensus of demand on one side, and this imminent “network of AI atomizers/indexers” on the other side. Much like biological evolution is the sum of many beneficial mutations, I think that human civilization will find repetitive footholds in something that, while presently inscrutable, remains profoundly invaluable thereafter. Current generative AI will be seen in retrospect as extremely transient compared to the semipermanent relay race we’ll find ourselves in later.
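For reference, the “sparsely-gated” part is mechanically simple; here’s a bare-bones sketch of an MoE layer where a gate scores the experts and only the top-k actually run (shapes and routing heavily simplified for illustration):

```python
# A bare-bones sketch of a sparsely-gated Mixture of Experts layer: a gating
# network scores the experts, only the top-k compute, and their outputs are
# mixed by the gate weights.
import numpy as np

class SparseMoE:
    def __init__(self, dim: int, n_experts: int, top_k: int = 2, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.gate = rng.normal(size=(dim, n_experts)) * 0.02
        self.experts = [rng.normal(size=(dim, dim)) * 0.02 for _ in range(n_experts)]
        self.top_k = top_k

    def __call__(self, x: np.ndarray) -> np.ndarray:
        logits = x @ self.gate                        # score every expert
        top = np.argsort(logits)[::-1][: self.top_k]  # keep only the top-k
        weights = np.exp(logits[top])
        weights /= weights.sum()                      # softmax over the chosen few
        # Only the selected experts compute anything -- that's the sparsity.
        return sum(w * (x @ self.experts[i]) for w, i in zip(weights, top))

layer = SparseMoE(dim=16, n_experts=8, top_k=2)
print(layer(np.random.default_rng(1).normal(size=16)).shape)
```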
I’m somewhat optimistic about the current turn of events in social networks, and part of it is centered in the somewhat negative disruption of twitter and reddit as public square platforms, by twitter and reddit as profit-seeking, advertising platforms. Why? Because we discovered the network effects of social media at a time when content and forums were diffuse, and we’ll rediscover network effects again, when content and forums are necessarily decentralized into the fediverse, bluesky, and farcaster, and reconnected by the most ambitious hubs that index them all. Though not everyone agrees with me.
To be candid, I think a lot of projects take on the quixotic ideal of “one size fits all”, and respectfully, the evidence is pretty clear. Zillow tried to broker properties everywhere with iBuyer, but they couldn’t outperform local brokers familiar with the area, nor could they anticipate the macro shift. Even dispatcher services, whether through public service like the fire department or through private service like Uber, end up converging on a federated architecture to internalize all that bargaining without sacrificing nuance or diversity. Social markets of ideas need to federate moderation & gossip policies, much like real life.
Moreover, we’ll discover a new form of additive moderation (as opposed to censorship-based moderation) when hubs balance toll-gating their service with retroactively rewarding the most active, quality curators with subsidized access. The days of bot-infested, attention-economy advertised apps are not done, but they are waning. And we simply have not caught up to the technical debt or the full potential of coordination games, when a supermajority of the human population can assemble online in one ecosystem.
However, this does speak to the growing technical complexity and underlying political nuances of maintaining a continuous digital town square in every place, all at once. Most of these federated apps do have fallible moderation policies and poor user security, but more to the main challenge, they’re not cohesive like twitter or reddit in providing the same cultural security of boundless assembly. At best, there are dedicated nomads that maintain continuity between these alcoves, but the larger public consciousness yearns for a unified forum for the sake of assembly and oversight, which will be a heightened challenge for a while. Wat do?
Proposals like Farcaster’s FIP-2 delve into the possibility of superficial discussions of ephemeral events (and pretty much everything else) being incredibly easy to deploy and contain, since the underlying hubs have already adopted the appropriate communications standard.
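A hypothetical sketch of that “flexible target” idea follows, with field names invented for illustration rather than taken from Farcaster’s actual message schema:

```python
# Hypothetical sketch of the idea behind FIP-2-style flexible targets: a cast
# can anchor itself to any URI, so a discussion can live next to an offchain
# event, an onchain object, or a webpage. Fields are invented for illustration.
from dataclasses import dataclass
import time

@dataclass
class Cast:
    fid: int                 # the author's Farcaster ID
    text: str
    parent_uri: str | None   # arbitrary target: URL, chain://..., another cast
    timestamp: float

def cast_about(fid: int, text: str, target: str) -> Cast:
    return Cast(fid=fid, text=text, parent_uri=target, timestamp=time.time())

thread = [
    cast_about(42, "Minutes are wrong, the vote was 5-2",
               "https://townhall.example/2023-06-meeting"),
    cast_about(77, "Agreed, the clip at 01:14:30 shows it",
               "https://townhall.example/2023-06-meeting"),
]
print(len({c.parent_uri for c in thread}), "topic(s) in this thread")
```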
In fact, I think that there’s an opportunity to start treating namespace and “chatspace” as essential to basic societal function, and that partly entails making both first-class features in a web browser, well beyond the vicissitudes of a team or company that will retain unilateral control and other liabilities if those spaces ever get sufficiently populated or controversial. Developments like EIP-3668 give namespaces like ENS the ability to register on L2s, and chatspaces like Farcaster or Orbis the ability to validate offchain information as users freely express themselves around the virtual locus of that information. Imagine a Kafkaesque town meeting over Zoom, then reimagine that same meeting when the town locals can digitally opine their timestamped criticism of the conduct therein, without the chilling effect of censorship or retaliation. Imagine generative AI, instead of being constrained to a single consumer, commanded by the consensus of any arbitrary community. Then reimagine generative AI that is continuously prompted by the aggregate sum of published content from many communities, bots, and individually divergent thinkers. Ethereum is already positioned to be the mapping and the market for this space (and all application-layer protocols should attempt to extend this nonrivalrously).
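For the curious, the EIP-3668 loop is roughly “revert with a pointer, fetch offchain, verify onchain”; below is a simplified client-side sketch with in-memory stand-ins for the contract and gateway:

```python
# A simplified sketch of the EIP-3668 (CCIP Read) client loop that lets a
# contract point readers to offchain data and then verify the gateway's
# response onchain. The contract and gateway here are stand-ins, not web3 calls.
class OffchainLookup(Exception):
    def __init__(self, urls, call_data, callback, extra_data):
        self.urls = urls
        self.call_data = call_data
        self.callback = callback
        self.extra_data = extra_data

def contract_resolve(name: bytes):
    # An L1 contract that keeps its records offchain reverts with a lookup hint.
    raise OffchainLookup(
        urls=["https://gateway.example/{data}"],
        call_data=name,
        callback=contract_resolve_with_proof,
        extra_data=name,
    )

def gateway_fetch(url: str, call_data: bytes) -> bytes:
    # Stand-in for the HTTPS gateway that serves provable responses.
    return b"proof:" + call_data

def contract_resolve_with_proof(response: bytes, extra_data: bytes) -> bytes:
    # Onchain callback: verify the gateway's proof before trusting the answer.
    assert response.endswith(extra_data), "proof does not match the query"
    return response

def ccip_read(name: bytes) -> bytes:
    try:
        return contract_resolve(name)
    except OffchainLookup as lookup:
        response = gateway_fetch(lookup.urls[0], lookup.call_data)
        return lookup.callback(response, lookup.extra_data)

print(ccip_read(b"alice.eth"))
```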
Recently, I read Vitalik Buterin’s “The Three Transitions”, and it’s excellent. I think that it’s one thing for me to casually claim that open-source AI is just a short leap from a cryptographic parasocial network that is extremely emergent as the sum of its parts, and it’s another thing entirely for the deep thinker at the center of this developing network to soberly enumerate the ways in which it hasn’t matured enough to take on some of that high-level responsibility. Long story short, the network needs more L2 scaling, securely abstracted wallet logic, and privacy. Coincidentally, the SEC is suing some of the main onramps to Ethereum, the CFTC just set a dangerous precedent for unincorporated DAOs, and there are other ways in which organizations or homunculi on Ethereum can be mismanaged or attacked. More to the point, we know that we don’t entirely know what needs to be ingrained into an autonomous agent, or an account abstraction of personhood, or the defense against some unrealized, retroactive counterparty risk. I highly recommend everyone read that piece to get an idea of the versatility that is still needed to really augment Ethereum to fill all gaps IRL.
Part of this versatility is covered by EIP-4337. Just like EIP-3668 was proposed to standardize the outer validation of otherwise arbitrary data storage and availability methods, alternative-mempool account abstraction tries to standardize how an arbitrary first-class contract would send an operation to be aggregated with those of other arbitrary account contracts. This is just my generalist take, but EIP-6551 also follows with an attempt to standardize nonfungible tokens themselves as first-class managers/transactooors of other onchain objects, while sharing a common registry with other token-managing tokens. EIP-4626 is an attempt to tokenize yield-bearing vaults as an extension of the ERC-20 standard, which can be further leveraged by protocols like Wildcat (which itself might lead to a standard of undercollateralized but compartmentalized debt). Why am I just dropping a bunch of Ethereum Improvement Proposals, and what does that have to do with overhyped AI? What I’m trying to say is that we shouldn’t just have initial projects like ChainGPT or infra like Autonolas, Gelato, and Lit Protocol; instead there should be an autonomous-execution-bearing standard, especially because onchain primitives like these previous EIPs are abstractions of distinctly valuable workloads. With LLMs like FinGPT and colossal context capacity, it should come as no surprise that the main “active participants” of DAOs in the imminent future will be onchain and as close to trustworthy as possible, because they won’t be human. With SOTA DeFi like CowSwap, Crocswap, Balancer, Poolshark, Uniswap v4, & cross-chain applications, and more complex architectures like MUD, more of the control of economic velocity should shift to superhumanly attentive agents via these onchain objects. But the caveat, as Vitalik points out, is that the execution towards IRL needs sufficient privacy. These things work too well in theory for captive human participants in authoritarian economies, like the United States, to either control or benefit from them.
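To ground the account-abstraction piece of that, here’s a simplified sketch of the ERC-4337 flow; the field list is trimmed from the spec and the bundler is purely illustrative:

```python
# Simplified sketch of the ERC-4337 flow: smart accounts sign UserOperations,
# a bundler collects them from an alternative mempool, and one handleOps call
# to the EntryPoint executes the batch. Validation and gas logic are omitted.
from dataclasses import dataclass

@dataclass
class UserOperation:
    sender: str            # the smart account contract
    nonce: int
    call_data: bytes       # what the account should execute
    max_fee_per_gas: int
    signature: bytes

class Bundler:
    def __init__(self):
        self.mempool: list[UserOperation] = []

    def submit(self, op: UserOperation) -> None:
        # Real bundlers simulate validateUserOp here and reject bad ops.
        self.mempool.append(op)

    def bundle(self) -> str:
        ops, self.mempool = self.mempool, []
        # Stand-in for EntryPoint.handleOps(ops, beneficiary) onchain.
        return f"handleOps({len(ops)} ops) submitted"

bundler = Bundler()
bundler.submit(UserOperation("0xAccountA", 0, b"transfer(...)", 30_000_000_000, b"sig"))
bundler.submit(UserOperation("0xAccountB", 7, b"vote(...)", 25_000_000_000, b"sig"))
print(bundler.bundle())
```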
“Token DAOs”, as advertised through the 2021 boom, don’t seem to be sustainable. After experiencing several myself, I think that the most important lesson is that DOs can be profoundly trust-intensive, slow-moving, and liability-inducing in such a demoralizing fashion that it is not reasonable to involve the public. Not unless decentralization, autonomy, and organization are clearly hardcoded and clearly reinforceable, before resources are solicited from uninformed (or even misled) buyers, and well before resources can be misappropriated from an unenforceable “governance” or “constitution”. Expect most tokens and promises associated with DAOs to go to zero, and try to entertain just a little healthy skepticism before FOMOing in.
Hyperbole aside, I think a lot more has to be protocolized and kept away from both a tyranny of structurelessness, as well as collusion-weak insiders. Hypercerts are one example of a protocol that can introduce a free market for proofs of impact:
Developers of DAO infra have shown how to nest multisigs so that they’re self-evidently checked & balanced against DM politicians, FUDsters, and other social engineers that drain public treasuries. And, of course, Nounish DAOs are chugging away (though they could be more nested as well).
I’ve even written about other nested approaches that could be developed further to show, beyond a reasonable doubt, that capital can only be allocated to constructive ends, and in the case of human discretion, only a minimized fraction of the public funds can be allocated to a limited scope of work. And yes, I am saying that immutable Proof of Work should be more prevalent in cases where the network is already maintained by Proof of Stake or something in the same ballpark of security and efficiency. “Passive income” is great for speculators, but DAOs need to account for actually productive work when driving real world impact, and they need to acknowledge when the labor market rate they set for contribution is no longer realistic. With effective autonomous agents arriving in force, there will be a question of how their workloads can be tokenized, lent, and borrowed across time. Ideally, those workloads can get exchanged without bureaucracy, info asymmetry, or illiquidity.
We need hardened protocols more than we need onchain politics, just like most of our demand is pointed towards an optimistic North Star where our software solves a lot of pain points (which I write about in “The Possibility of a Seldon Crisis”).
That’s assuming the incentives are in place to balance privacy with zk-attestations, autonomy with accountability, and all of this is legitimately sustained by real economic purpose. So part of the equation to forming the “neocortex of the crowd” is finding sector-specific “orderbooks” between the work we, our computers, and our actuators (physical or digital) can afford, and our thoughts, our speech, and our wallets on the other side of the market. Well, that’s a big mess of atomic tasks, skill sets, and invoices. Besides, there are many layers to a full-fledged modern economy built from first principles. Parts of this equation need to be simplified even beyond EIPs, and just like intermediate economics deals with Coase’s vs Ostrom’s philosophies of bargaining and resource management, there has to be a credibly neutral index & archive that can’t be taken down. (This is a periodic reminder to download backups of your data.)
Frankly, I think that tech has not sufficiently deployed a minimalist graph for simple economics. For example, if someone wanted to create a dapp for managing private goods, they might want to track depository structures like EIP-1271-gated locker hubs as the nodes, and chains of custody between reputation-staked agents as the edges. If neighborhoods wanted to create a dapp for organizing supply and demand for local agriculture, they might map out community gardens and use a combination of reputation-staked gardeners and offchain sensors like cameras, thermostats, and hygrometers to make sure that enough of what neighbors want to consume is growing as planned. The other aspect to consider for simple economics is that even reputation and surveillance can be removed from the system if we achieve enough competing trusted execution environments (and not just one that can be a single point of failure). After all, the larger picture is the sum of correct executions, some of which really need to remain as close to zero-knowledge and as diversified as humanly possible.
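As a toy sketch of that minimalist graph, with hypothetical names and fields: hubs as nodes, custody hand-offs between reputation-staked agents as edges:

```python
# Toy sketch of the "minimalist graph" idea: depository hubs (e.g. contract-
# gated lockers) as nodes and custody hand-offs between reputation-staked
# agents as edges. Everything here is invented for illustration.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class CustodyEdge:
    item_id: str
    courier: str     # reputation-staked agent responsible for this hop
    stake: int       # amount slashable if the hand-off fails verification

class CustodyGraph:
    def __init__(self):
        self.edges: dict[str, list[tuple[str, CustodyEdge]]] = defaultdict(list)

    def hand_off(self, from_hub: str, to_hub: str, edge: CustodyEdge) -> None:
        self.edges[from_hub].append((to_hub, edge))

    def chain_of_custody(self, item_id: str, start_hub: str) -> list[str]:
        # Walk the graph following only edges that carried this item.
        path, hub = [start_hub], start_hub
        while True:
            hops = [(nxt, e) for nxt, e in self.edges[hub] if e.item_id == item_id]
            if not hops:
                return path
            hub = hops[0][0]
            path.append(hub)

graph = CustodyGraph()
graph.hand_off("locker:uptown", "locker:midtown", CustodyEdge("pkg-17", "agent:ana", 50))
graph.hand_off("locker:midtown", "locker:depot", CustodyEdge("pkg-17", "agent:bo", 50))
print(graph.chain_of_custody("pkg-17", "locker:uptown"))
```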
Suppose we figure all this out. Suppose that AI becomes this MMO enterprise, and Ethereum becomes the “Internet of Value” that compounds this “MMOE” (there, I coined it), and all of a sudden we’re on the doorstep of the Singularity™. Why not sit back and just drink in the public good that manifests? Here’s how far I see us being from such a passive reality. Twitter, as flawed as it was, dropped from being an ad-saturated public town square to a privately controlled, rent-seeking platform. Reddit, once the fabled “front page of the internet”, dropped from being a quasi-confederation of information-sharing subcultures and mobile frontends to a profit-driven, rent-seeking candidate for an imminent IPO.
To be clear, I’m not complaining about the commercial model, in fact others have explored why certain aspirational public goods and hyperstructures need to be grounded in a toll good or some other monetization mechanism to keep the lights on.
No, the issue at hand is that these social platforms are the crutch that normally insignificant, disenfranchised members of society (like myself) use to broadcast really important values and form parasocial relationships that connect them to every other corner and every manner of public figure in this permanently online society. When twitter and reddit recede, and the only platforms left are “attention economy” maximizers like instagram, youtube, and tiktok vs insular, dark corners for private communities, where can this online society afford the divergent, multidirectional discourse that keeps us discovering and developing more technology that further expands our ecological and sociological limits? Midjourney, MusicGen, and ChatGPT are generative trifles, so do we truly expect them to magically generate walkthroughs for human problems? If we’re not carefully invested in a semi-unified public forum, there is a runaway timeline where very few members of society are incentivized to genuinely connect with each other, and even fewer have the cognitive or charismatic means to reverse that historical episode. Interestingly, we seem to be at a crossroads for content creation, where culturally incendiary “speechcrime” becomes more justifiably censorable, rather than more easily addressable by the wealth of tools like twitter’s “community notes”. Here’s Mr. Bean explaining how suppression of offensive speech is actually self-defeating:
Part of the Singularity™ that I’d like to avoid is economic dystopia, which is a somewhat likely timeline that we have to constantly spend human capital to avoid. Not only does the cost increase as digital communities get further separated into segregated namespaces, but there already exists a background of totalitarian apparatuses that just get less checked over time. At the moment, the SEC is suing the primary arteries of crypto as a safe haven from a potential banking collapse. That same banking system depends on the captivity of American and foreign nationals that transact or remit in the dollar. Others use the term fiat currency, which is effectively a nationalist scrip instrument with diminishing soft power over time. Wealth is attained by either investing away from that scrip or inheriting previous investments. If there were no public resistance, forms of social mobility like sweat equity, unaccredited investors, and electable office would cease to exist. The same can easily be claimed for generative AI and crypto. It is somewhat worrying when the CFTC rules that an unincorporated DAO is, in casual legal terms, radioactive waste for its active participants. These MMOEs, which deal in unsanctioned units of account & self-autonomy, don’t magically translate to traditional society as self-sovereign panaceas. Especially not as completely public transactions, not without a certain degree of compartmentalization like BORGs, and it’s pertinent to bring up that a smart contract or an AI has never come close to legal personhood in the eyes of a traditional court, so don’t expect anything but a human intermediary/fiduciary to manage a given utopian apparatus. And for human intermediaries, you gotta give.
Then again, are we really in short supply of worthwhile humans that can intermediate as SMEs in larger IRL coordination games? There’s a simplified concept that a city doesn’t really form until a hospital, school, market, and bank are built in close proximity. From another viewpoint, communities typically don’t survive without some central structure for social welfare (religious centers often take on this role). So while there is a much more extensive global market for machine learning and cryptoeconomies, dialing those down to the local IRL level, especially without profit-seeking, is going to be a worthwhile challenge. However, while it may seem directly unsustainable, that local IRL public good actually creates more of an economic & intellectual base, especially if the at-cost toll is calculable, minimal, and enforceable. The best way to explain this train of thought is to say that an extensive philanthropic network is probably one of the directly achievable precedents to a non-dystopian Singularity™, and perhaps an underappreciated substrate in AI & crypto that drives NGU vibes, which later provide the revenue needed for other ventures.
In a sense, what I’m trying to describe throughout this piece is that there is digitally-bound knowledge and infrastructure that can do pretty amazing things, but it falls to us as fallible beholders to actually complete the physical impact that these technologies might have, should we successfully divide and conquer the space without succumbing to Moloch. And yes, all signs seem to indicate that we are closer to a potential convergence of technology with humanity than we were six months ago. Yes, the broader OSS community has discovered a plethora of cool things between AI and crypto, but no, there isn’t enough attention to make these cool things tangible enough for positive-sum, mainstream adoption. Without incentives or a commonly shared adversary, the public interest is not worth incubating on its own, and without public goods like censorship-resistant forums, there’s very little space for ambivalent special interests (like us) to accidentally overlap on this public interest.
Nevertheless, it’s worth reflecting on how this acceleration is really happening, and how the software is promising even though most of it isn’t fully developed or seamless. The fact that we have so many platforms and so much competition is also a good, if complicated, thing, even though we do have to confront the fact that there’s still a bit too much zero-sum competition.
Whether you call it an MMOE, or a market of BORGs, EIP-????s, neocortices, or sparsely-gated MoEs, I think we’re very close to disposing of a “singularitarian” viewpoint that sees this:
or reads about the Andromeda cluster and thinks it’s worth spending on some monetizable, single-threaded juggernaut. In fact, it’s more prudent than ever to recover your data from these silos for the next stage. Our civilization always has been, and always will be, the juggernaut; the question is which superficial, ephemeral protocol our civilization picks next, and whether we succumb to it or thrive with it.