Mass and the Law of [Economic] Gravitation
March 1st, 2022

Part 4.2 of Planetary-Scale Computation: An industry primer on the hyperscale CSP oligopoly (AWS/Azure/GCP):

  1. Let’s Get Physical, (Cyber)Physical!: Flows of Atoms, Flows of Electrons
  2. A Cloudy History: Four Histories of Cloud Computing
  3. Primer on the Economics of Cloud Computing
  4. Three-Body: Competitive Dynamics in the Hyperscale Oligopoly
    1. Initial Positions and Laws of [Competitive] Motion
    2. Mass and the Law of [Economic] Gravitation
    3. Velocity and the n-body problem
  5. The Telos of Planetary-Scale Computation: Ongoing and Future Developments

Table of Contents for Mass and the Law of [Economic] Gravitation:

  • Mass: The Cloud is a variable-mass system
  • Law of [Economic] Gravity: Supply and Demand
    • Compute Demand
    • Supply: Capex/Capacity Rules Everything Around Me (C.R.E.A.M.)

Mass: The Cloud is a variable-mass system

On top-down market sizing and the economic mass of the system. On TAM, market penetration, and intra-market share capture vs inter-market share capture.

The Rose by NASA/JPL-Caltech/SSI

Whereas the classic three-body system in physics is a closed system containing only three bodies, the system we are considering is a variable-mass system in which the three hyperscalers constitute the system’s center of mass but are not its sole source of mass. Infrastructure and platforms, by definition, cannot be infrastructure and platforms unless they serve as the foundation for bigger things to be built on top of them. In much the same way that the point of roads and bridges is not the roads and bridges themselves, the point of the tens of thousands of tons of metal and glass that compose the hyperscalers’ “clouds” is not itself. In both cases, the value of the infrastructural layers is contingent upon what can be enabled by and through them — roads and bridges enable the physical movement of automobiles and the people within them; IaaS and PaaS enable the computation, communication, and storage of electronic information through code.

From a strictly materialist perspective, the literal center of physical mass within the Cloud industry is the material infrastructure layer constituted by the thousands of tons of hyperscaler-operated servers in the Cloud, with all of the -illions of electrons and atoms that comprise the code and data in the software layer weighing, well, much less than that. However, from an economic perspective, the “mass” of the dematerialized economic flows (i.e., periodic recurring revenue, annual free cash flow, etc.) and stocks of capital (i.e., market capitalization, enterprise value, etc.) of the software layer exceeds the economic “mass” of the infrastructure and platform layers which form its foundation.

From a [revenue] flow-based perspective, global public SaaS runs at around 2-3x the rate of IaaS+PaaS.

This stacked bar chart isn’t up to date (2016 to 2018) but it clearly illustrates the relative market sizes of the layers of the public Cloud industry — IaaS+PaaS serve as the foundation for a larger, dematerialized economic structure.

More recent IDC figures for CY2020 indicate that global public SaaS revenues are ~2x IaaS+PaaS revenues:

[SaaS - System Infrastructure Software] + [SaaS - Applications] = 16.0% + 49.7% = 65.7% of market share; SaaS/[IaaS+PaaS] ⇒ 65.7%/34.3% = 1.91x

From a stock-based [”stock” as opposed to “flow”-based, that is; not referring to equity value] perspective, napkin math indicates that the global market value of public and private cloud companies exceeds the value of the hyperscaler cloud business segments by roughly a factor of 2 to 3.

The existence of a higher software value layer on top of the Cloud’s infrastructure and platform layers is what makes these layers Infrastructure-as-a-Service and Platform-as-a-Service. The market cap of the top 20 public US utilities companies is on the order of ~$700B [a figure that’s less than half the market cap of Amazon, the smallest of the Big Three clouds’ parent companies], while the aggregate market cap of the 30 US-based constituents of the S&P Global Luxury Index is ~$1.7T, yet the flourishing of the latter industry is dependent on the functioning of the former — whereas Thanos-snapping the 30 US luxury companies off the face of the Earth would immediately destroy $1.7T of market value, doing the same to the 20 US utilities companies would destroy $700B of market value as well as the market value of every company in the Luxury Index, because bullets are more valuable than Birkins when the world is ending. Similarly, while the economic mass of SaaS companies eclipses that of the hyperscale cloud businesses, their value is predicated on the existence of the thousands of tons of servers that compose the Cloud’s infrastructure — this is why, for the sake of my contrived three-body analogy, the hyperscalers are the center of gravity of the system despite being less valuable than SaaS companies in the aggregate.

That Cloud platforms create systems greater than the underlying platforms themselves is the intended vision that has been implicit from the outset of the modern cloud computing industry and is increasingly explicit as the industry continues to expand.

If declining and dying industries might be conceptualized as systems which are losing both mass and energy [in the analogical/economic sense as well as the literal/materialist sense; an industry’s economic value relies on the employment of actual mass and energy in the form of labor and capital], then the system in which the three hyperscalers form the center of gravity is steadily accumulating mass and energy from the broader economic universe. The Cloud is a variable-mass system.

Q1’20 estimations [for this primer, the primary focus is on framing and not on up-to-date numbers] from GS and Gartner indicate that IaaS+PaaS will reach ~18% penetration of the total enterprise IT market opportunity (est’d at $777B by Gartner) in 2022, leaving over $600B of additional inter-market share capture for hyperscale cloud players. This Q1’20 estimate of ~$140B in IaaS+PaaS revenue is less than the ~$180B (= $71,525M + $106,800M) implied by Gartner’s public cloud spending forecast from Apr ’21. More recent Aug ’21 forecasts from IDC predict combined IaaS+PaaS revenues of $400B in 2025 at a 28.8% ’21-’25 CAGR, implying ~$187B of IaaS+PaaS revenue in 2022 ($400B/[1.288]^3 = $187.2B).
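For transparency, here is the napkin math above expressed as a quick Python sketch; the $777B TAM, ~$140B Q1’20 estimate, and IDC’s $400B / 28.8% CAGR figures come straight from the sources cited, and everything else is just arithmetic:

```python
# Napkin math for IaaS+PaaS market sizing, using the figures cited above
total_it_tam = 777e9        # Gartner's est. total enterprise IT market opportunity
iaas_paas_2022_gs = 140e9   # Q1'20 GS/Gartner estimate of 2022 IaaS+PaaS revenue

penetration_2022 = iaas_paas_2022_gs / total_it_tam            # ~18%
remaining_opportunity = total_it_tam - iaas_paas_2022_gs       # ~$637B

# IDC (Aug '21): $400B of combined IaaS+PaaS revenue by 2025 at a 28.8% '21-'25 CAGR
idc_2025, cagr = 400e9, 0.288
implied_2022_idc = idc_2025 / (1 + cagr) ** 3                  # discount 2025 back 3 years

print(f"2022 penetration: {penetration_2022:.1%}")
print(f"Remaining inter-market opportunity: ${remaining_opportunity / 1e9:.0f}B")
print(f"IDC-implied 2022 IaaS+PaaS revenue: ${implied_2022_idc / 1e9:.0f}B")
```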

Framing the cloud market as a pie chart with relative percentage shares between the hyperscale players omits the fact that the pie is continually getting bigger. In growing industries, participants can sublimate the desire to engage in zero/negative-sum, intra-market competition and tacitly cooperate to expand the overall market, effectively engaging in inter-market competition in which competition is reframed as disruptive industry (Public Cloud) vs incumbent industry (Traditional IT).

Overall industry growth relieves the pressure within the variable-mass system known as the Cloud, so the hyperscalers have the option of expending resources on growing the pie rather than fighting over how to cut it. Some people disagree with framing the entirety of IT as an opportunity set for the Cloud, arguing that this framing understates the SAM/TAM penetration of global Cloud IaaS/PaaS/SaaS spend. They may be right but, frankly, it doesn’t really matter for our purposes today — the takeaway that most everyone can agree on is that the Cloud is growing and still has room to grow, regardless of whether or not the “Global IT Spend = TAM” framing is overzealous. It’s questionable whether Porter would consider public cloud infrastructure an “Emerging Industry” or “Maturing Industry” in the ontology he sets out in Competitive Strategy [certainly not a “Fragmented Industry” or “Declining Industry”], but the point applies regardless — firms in an industry that are still “inducing substitution and attracting first-time buyers” are mutually dependent: they have common enemies and common problems.


Law of [Economic] Gravity: Supply and Demand

On the competitive laws that govern the overall mass of the system and its constituent bodies. Cost vs Price. Margin evolution.

Gravitational Waves (2017) by Tomás Saraceno

Depending on what we consider as mass bodies, analogizing supply and demand as gravity should be considered anything between an artificial contrivance and conceptual bastardization. For example, if we are considering the dollar value of the three hyperscalers’ hypothetical standalone market capitalizations and/or enterprise value as individual masses, then gravitational interaction between these three bodies would be impossible to interpret in terms of supply/demand dynamics — the primary mode of interaction between AWS and Azure has little to do with supply or demand between the two businesses; the more appropriate interpretation within this mass-gravity framework might be one of attraction [i.e., industry clustering] or collision [i.e., negative-sum competition]. If we, however, consider global aggregate demand for public cloud infrastructure services as a mass and hyperscale capacity to supply these services as another mass, there might be some amenable interpretation that makes this mass-gravity analogy work for supply/demand.

Of the five pairings in this three-body problem analogy that I assigned a comparable business/economics element to in this primer, relating Newton’s Law of Universal Gravitation to the “law” of supply and demand has been my least favorite. It’s not that there’s no merit to the comparison — in fact, there’s even precedent for this gravity to supply/demand comparison in international trade economics in the form of the gravity model of trade. That this analogy works somewhat well is the problem, placing it in some sort of uncanny valley for conceptual metaphors that the “Laws of Motion ⇒ Law of Conservation of Attractive Profits” avoids by virtue of obviously not making any sense. I should also mention at this point that the use of Newton’s laws in describing economic phenomena [and therefore criticism of its use] has a long tradition dating back to Marx’s concept of an economic law of motion all the way to obscure, heterodox, Marx-Keynes-Schumpeter (MKS) syntheses in the form of H.J. Wagener and J. W. Drukker’s The Economic Law of Motion of Modern Society (1986) and Peter Flaschel’s The Macrodynamics of Capitalism (2009).

In any case, the “gravity” metaphor works for now, so I’ve tried to make the best of it. If the analytical result of gravitational interactions between two bodies is their path and position, the intention of this Supply/Demand analysis is to build a framing around how margins for cloud infrastructure providers might evolve. The basic idea is this — too much demand relative to supply and industry margins rise, because undercapacity gives CSPs pricing power; too much supply relative to demand and industry margins fall, because overcapacity takes away CSPs’ pricing power. Industry margin therefore becomes a question of how compute demand will evolve relative to compute supply/capacity, and what the subdrivers are for both demand and supply. This is the question we will be exploring.

Compute Demand

Metaverse. AI. Jevons paradox.

The Last Judgement in Cyberspace (2006) by Miao Xiaochun

Broadly speaking, forecasting demand growth for compute is tantamount to predicting the future of humanity’s relationship with technology and, thus, humanity’s future. There might be people out there who can say something like “The 10 year CAGR for compute demand will be X%, with factors = [a, b, ... n] driving [a%, b%, ... n%] of incremental growth in the demand forecast” and, in fact, the OECD formed a task force in 2021 around metricizing and quantifying AI compute demand needs for national governments — this subsection will not even begin to attempt anything of the sort. Quantification of demand and capacity for AI compute will undoubtedly have widespread, global policy implications as the adoption of general purpose AI/ML inevitably accelerates in coming years ...

... so I look forward to reading analyses from better informed, better resourced analysts, but a granular breakdown of global demand for compute [and accompanying demand for data storage and networking] is currently infeasible for me. Some useful framings for structuring the demand forecast for basic cloud infrastructure (i.e., compute, networking, storage) include ...

  • cloud penetration into the broader global IT market [discussed in the previous subsection]
  • Workloads/Data in Public Clouds (today vs planned)
  • Data Volume and Data Stewardship (consumer vs enterprise vs cloud)
  • Technology Adoption Rates
  • Data Generation by Category

[Note: Please see the Notion block for the trend gallery]

... etc., but I think what might better drive this discussion of future compute demand are explorations into specific use cases that are relatable to people on an individual level, because usage of cloud computation is ultimately downstream from people.

This assertion is most obvious when we think about internet-based services like e-commerce, video streaming (Netflix, Youtube), app-based ridehailing/delivery (Uber, Lyft), consumer cloud storage (Dropbox, iCloud), social media (Facebook), multiplayer gaming, etc., but remains valid even when considering less-intuitive areas like manufactured consumer goods, industrial agriculture, oil & gas discovery, and national defense — the provisioning of most goods and services in today’s world, from digital media to state-operated physical defense systems, embeds and incorporates some quanta of cloud infrastructure services. The photos in your iPhone gallery that are backed up on iCloud, and that comprise part of the $30mm+ bill that Apple pays to AWS every month (according to CNBC), can very clearly be attributed to each iPhone user. The cloud services sought by the US Department of Defense in whatever replaces the now-cancelled $10bn JEDI cloud contract can be conceptualized as a pool of computational/storage resources attributable pro rata to the 320+ million people in the US — i.e., the cost of powering the cloud infrastructure demand for the national defense of each American is ~$30 over 10 years (as implied by the JEDI contract) [that is, assuming the DoD were solely responsible for this function, which it’s not].

In essence, each and every one of the 7.9+ billion people living on Earth today (and those dead people who either didn’t get a chance to shut down their AWS instances or are executing a Daemon) can be thought of as directly or indirectly consuming some quanta of the world’s computational resources. The in/direct use of computational resources obviously varies by individual, with those swathes of people living without Internet and/or personal compute devices (PCs, smartphones, etc.) in/directly using the least and Always Online people (me, and anyone reading this) in/directly using the most. Despite the disparity of computational resource consumption [i.e., the digital divide] inherent in the unequal global socioeconomic realities that also underlie disparities in the consumption of more fundamental resources like food, water, and energy, I believe that thinking of overall compute demand as [7.9+ billion people] x [computation per capita] (and similar analogs like “data storage per capita” and “bandwidth-use per capita”) is an effective and parsimonious framing.
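To make the framing concrete, here is a minimal Python sketch of the [population] x [computation per capita] decomposition, along with the per-capita JEDI arithmetic referenced above; the population and contract figures come from the text, while the CPC value is a placeholder assumption rather than an estimate:

```python
# Aggregate compute demand framed as [population] x [average computation per capita].
# The CPC unit is deliberately abstract (vCPU-hours/year, FLOP/s, $ of cloud spend, etc.)
WORLD_POPULATION = 7.9e9

def aggregate_compute_demand(population: float, avg_cpc: float) -> float:
    """Total demand under the [people] x [computation per capita] framing."""
    return population * avg_cpc

# Placeholder assumption: pretend the average person consumes 100 "compute units" per year
print(aggregate_compute_demand(WORLD_POPULATION, avg_cpc=100))   # 7.9e11 units/year

# Per-capita attribution example from above: the cancelled $10bn, 10-year JEDI contract
# spread pro rata across 320M+ Americans
jedi_contract, us_population, years = 10e9, 320e6, 10
per_capita = jedi_contract / us_population                       # ~$31 over 10 years
print(f"~${per_capita:.0f} per American over {years} years (~${per_capita / years:.2f}/year)")
```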

The distribution of in/direct C/N/S resource consumption among individuals, although highly correlated with measures like income and wealth [source = N/A; common sense, general vibes], is probably much less so than for other, more material resources. Whereas Jeff Bezos owns multiple homes and cars (therefore in/directly consuming quantities of steel, cement, wood, glass, petroleum-based products, etc.), he probably spends less time streaming Netflix, scrolling Facebook, or shopping on Amazon than you do. Of course that isn’t the final word on his in/direct usage of computational resources, because everything from the computation involved in his private security detail to the computation and storage used in the various systems that manage his personal wealth [e.g., if Jeff had a 5% LP stake in a computationally-intensive quant hedge fund then I’d attribute 5% of the fund’s yearly compute usage to him] should be attributed to him as well — make no mistake, he definitely uses more computational resources than you do. However, I would posit that the ratio of his compute use to your compute use is smaller than the ratio of his wealth to your wealth — i.e., [Jeff’s in/direct compute consumption]/[Your in/direct CC] < [Jeff’s wealth]/[Your wealth]. The difference in the in/direct consumption of C/N/S resources between Jeff Bezos binging 8 hours of Netflix and you binging 8 hours of Netflix is negligible and, given that people are granted only 24 hours in a day regardless of who they are, you can see why the global distribution of computation per capita is likely more egalitarian than global distributions of things like wealth or CO2 emissions.

This is how I justify reframing the question of forecasting changes in aggregate compute demand as a question about predicting how the daily life of average people [and therefore the compute intensity of daily life for average people] will change over time. Essentially what I’m saying is that if we decompose aggregate compute demand into ...

  1. ~7.9+ bn individuals
    1. birth rate, death rate, fertility, etc.
    2. increased human lifespan through advances in longevity
    3. [residual factors]
  2. variability (statistical dispersion) of individual CPC
    1. remaining global penetration of PCs and smartphones
    2. continued global penetration of internet access
    3. variations in adoption of emerging technologies between income and age cohorts
    4. [residual factors]
  3. average computation per capita (CPC)

... then average CPC is the most interesting factor to analyze. Global population trends are extremely predictable and if we accept my provisional, minimally-viable argument that variability of individual CPC is negligible in the steady state [i.e., The digital divide is very real, but what I’m saying is that individual demand for in/direct C/N/S consumption won’t vary tremendously by age/wealth/geography/etc. cohorts (relative to other measures of consumption) in the near future when literally everyone has internet access and connected devices], then average in/direct consumption of computation becomes the leverage point for the entire equation. A [rough, non-rigorous, non-MECE, and ontologically inconsistent] decomposition of average CPC might look like this:

average CPC = f( ...

  • average indirect CPC
    • compute intensity of manufacturing, utilities, healthcare, national defense, etc. (goods and services that are not easily, directly attributable to individuals)
  • average direct CPC [this subsection’s area of focus]
    • computational overhead common between different use cases [i.e., all aforementioned use cases will require computation for de/encryption and AI/ML training/inferencing]
      • privacy-oriented cryptography
      • AI/ML
    • key use cases
      • personal finance (portfolio rebalancing via robo-advisors, budgeting, payments, etc.)
      • personal health (biometric tracking, consumer genomics, etc.)
      • “Metaverse” [this will be our main focus for the remainder of this subprimer]
        • social and leisure (gaming, online dating, chatting, VR-based social media, VR experiences, etc.)
        • education
          • rise in simulation-based learning [my personal pet theory]
          • persistence of distance-learning post-COVID
        • work
          • growth in proportion of global population engaged in knowledge work (Forrester estimates = 1.25bn as of 2018) ⇒ primarily driven by [secular, monotonic] automation of repetitive manual labor
          • “Creator Economy”, citizen developers, citizen data scientists, etc. ⇒ primarily driven by [secular, monotonic] automation of repetitive cognitive labor [through RPA, low/no-code, OpenAI Codex, etc.]

... )

[Sidenote: The distinction between direct and indirect compute consumption is a matter of degree. This is most evident in cases of consumer tech hardware that utilize AI/ML to deliver services. For example, in the case of my Alexa, a certain quanta of compute can be attributed to ...

  1. the design and manufacture of the hardware
  2. cloud-based inferencing (via AWS’s Inferentia chips) when I ask Alexa for the weather
  3. training of Alexa’s deep-learning (i.e., neural network w/ 3+ layers) ML model

... and it’s clear that the computational cost of inference is directly attributable to me but that the indirect computational cost of continually training the overall Alexa model can be attributed to me on some sort of pro rata basis as well.]

I should note here that the entirety of the multi-stage decomposition I’ve just presented reflects my imposed framing (i.e., there exist multiple valid ways to decompose average CPC that aren’t by direct vs indirect compute use) and my own personal views on what constitute key compute use cases for individuals. Framing demand growth drivers as being fueled by key use cases is an analytic choice I’ve made because it’s the framing that I believe to be the most informative, but there are other ways to frame the problem space such as by industry as in this slide on interconnection bandwidth capacity ...

From Credit Suisse: The Cloud Has Four Walls, 2021 Mid-year Outlook

... which decomposes interconnection bandwidth growth by existing industries — a similar analysis can be, and has definitely been, done on compute growth for it’s unimaginable that the hyperscale CSPs don’t internally conduct similar kinds of analyses that segment incremental cloud workload growth by industry. But, like I’ve said, I won’t be attempting anything of the sort.

Regarding computational overhead from privacy-oriented de/encryption and AI/ML training/inferencing, compute demand growth from the latter is more obvious and [rightly] expected to be more impactful than the former. My belief is that privacy and security concerns will only continue to rise as the Cloud manages larger and larger quantities of increasingly sensitive personal data and that the computational cost of de/encryption will be non-negligible ...

From Security Algorithms in Cloud Computing (2016) by Bhardwaj et al.

... at the minimum. The potential for mass adoption of zero-knowledge proofs and time-lock encryption (esp. verifiable delay functions) by individuals and enterprises represents potential computational overhead beyond the minimum table-stakes level, but this is a rabbit hole unto itself. With respect to AI/ML-based computational overhead, Mule’s/FabricatedKnowledge’s GPT-3 and the Writing on the Wall runs the gamut of what I’d cover outside of a deeper dive and this graphic of his ...

... is as succinct as it gets.

Cryptography [once again, for privacy/security-related reasons; computational overhead for maintaining cryptonetworks through PoS, PoW, or other proof-based mechanisms is another issue entirely that I won’t get into here] and AI/ML will serve as sources of computational overhead for nearly every use case. In fact, these two sources of overhead are already ubiquitous in many of the internet services we use today. Look no further than “https://www.google.com/”, an AI/ML-enabled search engine that you connect to via HyperText Transfer Protocol Secure (HTTPS), where the ‘S’ carries the computational cost of de/encryption. More pertinent to the Cloud is the fact that each major CSP offers encryption both “at rest” and “in transit” in accordance with Zero Trust Architecture (ZTA) principles, which are increasingly relevant in a world where the surface area for cybersecurity attacks has exponentiated and cybersecurity has become an area of concern at the federal level.

This recent (Jan 26, 2022) strategy memorandum from the U.S.’s Office of Management and Budget, in response to Biden’s May ’21 Executive Order titled ‘Improving the Nation’s Cybersecurity’ (EO 14028) which “required agencies to develop their own plans for implementing zero trust architecture”, clearly outlines those aspects of ZTA-oriented design that will become standards [de jure for gov’t agencies and therefore de facto for CSPs looking to earn gov’t contracts] sooner rather than later.
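As a toy illustration of why “encrypt everything, at rest and in transit” carries a real computational cost, here is a sketch using Python’s widely-used cryptography package (AES-based Fernet). It is only meant to show that the overhead is measurable and scales with data volume; it says nothing about any CSP’s actual encryption stack:

```python
# Toy illustration: the CPU cost of symmetric at-rest de/encryption scales with data volume.
# Requires the `cryptography` package (pip install cryptography); this is NOT representative
# of any particular CSP's encryption implementation.
import time
from cryptography.fernet import Fernet

fernet = Fernet(Fernet.generate_key())
payload = b"x" * (50 * 1024 * 1024)          # 50 MB of dummy "user data"

start = time.perf_counter()
ciphertext = fernet.encrypt(payload)          # encryption "at rest"
plaintext = fernet.decrypt(ciphertext)        # decryption on read
elapsed = time.perf_counter() - start

assert plaintext == payload
print(f"Encrypt + decrypt 50 MB: {elapsed:.2f}s of CPU time")
```

Multiply that kind of overhead across every object stored and every request served and the aggregate compute bill becomes non-trivial.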

The privacy-based de/encryption and AI/ML-based requirements (and thus, associated computational cost) for personal finance and personal health applications are a function of these two use cases requiring particularly sensitive personal data while also being greatly reliant on ML models to be effective. A potential future where my health insurance premiums are partially determined through analyses of my biometric data streams [something already possible at a technical level but still outside of the acceptable range of the Overton window at the societal level] is a future that will necessitate, whether by law or by consumers, lots of redundant privacy measures powered by cryptography (’Chapter 4: Contactless Love’ in Kai-Fu Lee’s AI 2041 is a compelling sci-fi illustration of this idea). The promise of health start-ups like Rootine [if you’ve ridden the MTA in the past year you’ve seen their ads] and ZOE is the potential for pooling biometric and genetic data in order to derive insights for better health results. Here are two key excerpts from a highly recommended recent interview between Azeem Azhar and ZOE’s CEO and co-founder, Jonathan Wolf, that give an idea as to the potential compute and data intensity of the industry:

JONATHAN WOLF: Today the product is live in the US and we’ll be launching in the UK shortly. What we do is we take this largest nutrition science study in the world and then we allow you to do a very simple at-home test and then we can compare your data using machine learning with the data from all of those studies. What that means is that it starts with a box that arrives at home and you unwrap and an app that takes you through a process that very easily explains how you can send us a microbiome sample, so that is a sample of your poop. How we can measure your blood sugar. So there’s a blood sugar sensor called a continuous glucose monitor they can put on your arm with a standardized meal that allows us to understand exactly how you respond to a meal with sugar and fat in it. And we also do a blood test to understand what’s happening to your fat, so your lipids. The fat is just as important as sugar in terms of what’s going on. We take all of that data and we also get you to track what you’re eating for a few days and give us some more context about your health. All of that information, we can then compare with thousands of people who took part in these weeks of clinical study in hospital and everybody else who’s been participating in ZOE since.

JONATHAN WOLF: The challenge to understand is that most science studies, particularly around nutrition, are very small. So whenever you’ve seen something on the front page about a particular food being a superfood or liable to kill you or give your cancer, most of those studies have involved 20 or 30 people. That means that the accuracy of that data is very low because there’s just not enough information, particularly given this huge personal variation that’s going on. Now, the reason why that’s happening is not because those scientists are bad scientists. It’s because the amount of funding that you can get to support a nutrition science study is very small. I won’t go into all the boring details, but the net result is that you end up funding maybe 30 tiny studies rather than one large scale study that can follow this over enough time to really get useful data. And so what is exciting, I think, for many of the scientists working with us is that they’ve been able to participate in what is the largest nutrition science study in the world that therefore gives you that depth of data that allows them to answer many, many questions, often questions we hadn’t even thought of at the point that we started the study because it’s got that scale of data.

It should surprise no one that all of Big Tech wants to get into personalized healthcare and has presented explicit strategies towards targeting this emerging industry (despite the continued reluctance of traditional healthcare companies to dive into public clouds for various reasons).

Moreover, the recent successes of AlphaFold (extremely computationally intensive to train) in predicting protein structure from amino acid sequences is catalyzing a wave of exploration into the potential of computational bioinformatics. Large-scale panel (across both # of subjects and time) analyses of combined biometric and genetic data and the resultant health and phenotypic outcomes will require lots of compute for privacy-preservation methods (incl. differential privacy methods) and model training/inference. To be clear, I consider myself uninformed in all things bio/health-tech [my expertise begins and ends with running agarose-based gel electrophoresis experiments on DNA in high school] so these bits I’m presenting are primarily invitations to DYDD, but what’s clear is that progress in this space will require lots and lots of computation.

With respect to personal finance, the crux of my argument here is that “everyone” (in the “everyone” has an iPhone sense of the word) will use some level of automated, ML-based [and therefore computationally-intensive] financial asset management software akin to what existing robo-advisors already offer for the six-figure cohort and AM platforms like BlackRock offer for the 7+ figure cohort. What’s interesting about BlackRock is that [based on some limited conversations] they’ve been working on integrating quantified behavior and risk profiles into their wealth management business beyond just “behavioral coaching” for clients, indicative of what PWM might evolve towards in the future, but I don’t know enough about ongoing developments to say more. What I can say, however, is that things like AI/ML-powered automated portfolio rebalancing sound more complicated than they actually are and are extremely low-hanging fruit, hence why I (someone who doesn’t identify as a “developer” or “coder” and definitely not a quant) was able to create a minimally-viable portfolio rebalancing algorithm using:

  1. an LSTM module

  2. an open-source rebalancing algorithm (Hedgecraft)

  3. Alpaca’s APIs for financial data and trade execution

    [Note: Alpaca changed their API sometime last year and I haven’t updated the code to reflect their new API endpoints. Regardless, I’m not responsible for anything you do with my code nor does anything I’m presenting anywhere constitute financial advice or a solicitation for a financial product.]

My algorithm is [or rather, was, before Alpaca changed their API endpoints] able to automatically rebalance my personal equities portfolio via an API-first brokerage (Alpaca) using a risk-minimizing rebalancing algorithm plugged into a basic LSTM module I stole and, if I knew how to deal with .yaml files, could have been made into an automated, pre-scheduled serverless task. In any case, it sucked for various reasons: not least because it was just too simplistic (simple ARIMA-based models often do a better job than LSTM-based models when using only historical price information), but primarily because I’m not a quant. The point I’m trying to make, though, is that I (not a quant or coder) was able to cobble together something that worked and literally anyone [who can pass KYC for Alpaca’s brokerage] can use this — ostensibly, better renditions from more capable programmers will commoditize automated portfolio rebalancing, making this type of personal finance product as common as Robinhood is now.
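For the curious, the core order-generation step of such a rebalancer really is simple. Below is a hypothetical, stripped-down sketch in plain Python (no LSTM, no Alpaca calls, since their endpoints have changed): given current positions, prices, and target weights, compute the share deltas that a brokerage API would then execute. This is a simplified illustration rather than the actual code described above and, again, none of this is financial advice:

```python
# Hypothetical sketch of the order-generation step of a portfolio rebalancer.
# Target weights would come from a risk model (e.g., an LSTM- or ARIMA-based signal);
# the resulting orders would be submitted through a brokerage API such as Alpaca's.

def rebalance_orders(positions: dict[str, float],
                     prices: dict[str, float],
                     target_weights: dict[str, float]) -> dict[str, float]:
    """Return share deltas (buy > 0, sell < 0) that move the portfolio to target weights."""
    portfolio_value = sum(positions.get(t, 0.0) * prices[t] for t in prices)
    orders = {}
    for ticker, weight in target_weights.items():
        target_shares = (portfolio_value * weight) / prices[ticker]
        delta = target_shares - positions.get(ticker, 0.0)
        if abs(delta) > 1e-6:
            orders[ticker] = round(delta, 4)
    return orders

# Example: drift back to a hypothetical 60/30/10 allocation
positions = {"VTI": 50, "VXUS": 40, "BND": 10}
prices = {"VTI": 220.0, "VXUS": 60.0, "BND": 75.0}
targets = {"VTI": 0.6, "VXUS": 0.3, "BND": 0.1}
print(rebalance_orders(positions, prices, targets))
```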

What I really want to talk about, and what I’ve been muddling through these various tangents and preparatory remarks to get to, is the open question of the computational requirements for the emerging digital media landscape that we’ve christened as the “Metaverse.”

Matthew Ball opines in Compute and the Metaverse that ...

In totality, the Metaverse will have the greatest ongoing computational requirements in human history. And compute is, and is likely to remain, incredibly scarce.

... and, depending on how you define the M-word, there’s reason to believe this may prove true. There’s the question of what proportion of Metaverse-induced computation demand will, in the steady state, take place in centralized cloud servers versus on-device (i.e., GPU-based processors integrated into VR headsets) versus via decentralized, local edge servers, and the various potential combinations in between. While, due to many of the reasons mentioned in Matthew’s article under ‘Where to Locate and Build up Compute’, I can’t imagine a centralized cloud gaming model for VR experiences [in which VR headsets primarily function as an interface and offload the majority of computation for rendering to hyperscale cloud servers], I can imagine use cases for massively multiplayer real-time experiences for people locally proximate to “edge” [that is, “edge” relative to larger, centralized data centers] locations — i.e., MMO cloud gaming is more tenable if the client-side users are all using their interface devices in Lower Manhattan and the edge datacenter is there as well instead of, say, Northern Virginia.

However, the thesis that the Metaverse will require enormous amounts of cloud-based compute isn’t contingent on graphics rendering being offloaded onto cloud/edge servers. The graphics rendering for Facebook’s Oculus standalone models (Quest, Quest 2, and Go) is done on-device and yet the company is planning on doubling the number of buildings that it operates for internal use. From an article titled Facebook Has 47 Data Centers Under Construction from Data Center Frontier:

“As I’m writing this, we have 48 active buildings and another 47 buildings under construction,” said Tom Furlong, President of Infrastructure, Data Centers at Meta (formerly Facebook). “So we’re going to have more than 70 buildings in the near future.”

The article’s title is wrong — the “47 buildings under construction” are not all data centers in the same way the “48 active buildings” were not all data centers. Facebook’s Data Center Map lists 46 (not 48) buildings, only 18 of which are data centers with the rest comprising infrastructure to manage in/outflows of energy and water. If I had to bet, I’d guess that data centers comprise a higher proportion of the 47 buildings UC as compared to the 18/46 proportion Facebook currently operates because they’re probably planning on leveraging existing energy/water infrastructure for their new data centers.

[Sidenote: For context, I wrote this subsection in the days leading up to and following FB’s disastrous Q4’21 print.]

From what I’ve seen, the attention of the analyst community underindexes Facebook’s Metaverse investments in DC infrastructure relative to Oculus-related R&D investments, but the trajectory of their CapEx, which is largely driven by additional investments in their "data center capacity, servers, network infrastructure, and office facilities”, indicates an internal expectation that the computational intensity of providing their services will increase. Given that they’ve reached the asymptote of their global user penetration, the only explanation for continued increases in CapEx is that they expect usage on a per user basis to increase, through more apps and/or more usage per app [FB’s explanation is more AI/ML workloads, but I doubt they’re expecting the training data to come only from their Family of Apps]. While their stated position is that CapEx growth has not been driven by expected capacity needs from their VR services ...

From FB Q4’21 Earnings Call:

While our reality labs products and services may require more infrastructure capacity in the future, they do not require substantial capacity today and, as a result, are not a significant driver of 2022 capital expenditures.

... their entire Metaverse, “next computing platform” strategy depends on mass adoption, which implies mass data collection, which implies lots and lots and lots of data centers due to the drastically higher C/N/S capacity required per user relative to regular ol’ Facebook/Instagram/Whatsapp. Given the long lead times for adding DC capacity, from a combination of supply chain constraints and increased demand for all things DC (servers, chips, networking gear, etc.), FB is stuck in the unfortunate position of having to make a bet on expanding capacity before being able to gauge whether the inflection point of adoption will take hold for their Oculus hardware, and doubly so given the lack of information about consumer uptake of competing VR hardware from Sony and Apple.

It makes sense, then, why Facebook would partner with AWS (announced Dec ’21 during AWS re:Invent) to keep open the potential for offloading their Reality Labs workloads in the case of insufficient internal capacity. To be clear, the AWS press release doesn’t say anything about offloading cloud workloads onto AWS from Facebook’s existing lines of business, but rather that FB will keep AWS-based workloads on AWS in the case of acquisitions that are already on AWS ...

Meta will run third-party collaborations in AWS and use the cloud to support acquisitions of companies that are already powered by AWS. It will also use AWS’s compute services to accelerate artificial intelligence (AI) research and development for its Meta AI group.

... but my impression, reading between the lines, is that FB is backed into a corner and is ...

  1. hoping to assuage Washington before pursuing tack-on VR acquisitions in order to ...
  2. ... spur the adoption of their VR/Metaverse platform ...
  3. ... while hedging the risk of not having sufficient DC capacity by forming a “long-term strategic partnership” with AWS ...

... which stands as the hyperscale CSP that is least at odds with them [FB & MSFT compete on the Metaverse; FB & GOOG compete on ads and AI/ML frameworks; FB & AMZN compete on ads too, but AMZN is nonetheless the lesser of the three evils for FB]. But enough about Facebook.

[Note: See this Notion block for “A sidebar rant on Facebook, Metaverse compute infrastructure, and framing potential CapEx requirements” that I can’t collapse/hide in Mirror]

What AWS has been doing with gaming is quite instructive for how the Metaverse might actually operate in a Universe in which the M-word is (eventually? inevitably?) brought into reality, a prospect that everyone seems to have an opinion on but few take to its logical conclusion with respect to the underlying [hardware-based] requirements. Their most recent re:Invent revealed implementation details of how a Metaverse on the Cloud [the only place with sufficient capacity for it to run at scale] might be designed to handle participants at scale, but it should be noted that the company’s competence in multiplayer experiences has been many years in the making — Minecraft, before being transitioned onto Azure in 2020 post-acquisition by Microsoft, had run on AWS since 2014; Roblox has been running on AWS since 2017; Fortnite has run completely on AWS since 2018; Figma (and therefore FigJam) uses AWS; League of Legends runs on AWS — so they’ve had time to iterate and learn from operating multiplayer experiences at scale prior to the release of their first, internally produced MMO game title, New World.

What Werner is talking about when he makes the distinction between the “Old World’s” scale up and the “New World’s” [he’s referring to both the game title and a new philosophy/architecture here] scale out is the actual, physical servers that mediate multiplayer online experiences — whereas your client (i.e., your PC) had to connect to different physical servers that were dedicated to a particular “town” or “zone” in traditional MMOs (which is why you get loading screens when you teleport from Town A to Town B), New World treats the entire world as a unified space. The 2020 Travis Scott Fortnite concert [hosted on AWS], which claimed an attendance of 12.3 million live viewers, did not have 12.3 million people interacting in the same, synchronous virtual world but rather split those millions of people into 50-person groups that primarily corresponded to user location [i.e., Epic Games’ player matching engine prefers to match players within the same geographic area to minimize latency].

From Wired: It's a Short Hop From Fortnite to a New AI Best Friend (2019)

Tim Sweeney: It makes me wonder where the future evolutions of these types of games will go that we can't possibly build today. Our peak is 10.7 million players in Fortnite — but that's 100,000 hundred-player sessions. Can we eventually put them all together in this shared world? And what would that experience look like? There are whole new genres that cannot even be invented yet because of the ever upward trend of technology.

Absent pesky things like hardware constraints and the laws of physics [ugh, SO annoying. who agrees??], the Platonic ideal of the Metaverse approximates something like 8 billion people [a good proportion of whom would normally be sleeping] concurrently in a synchronous VR space just, like, 🌊vibing🌊 out, man. Pretty sure there’s a Buddhist Sutra like this, minus the VR headsets. In any case, New World’s accommodation capacity of ~2,500 players per “world” is the closest thing we have to this Platonic ideal so far.

Here’s a simplified breakdown of New World’s ontology:

  • Five regions for New World (”US WEST”, “US EAST”, “SA EAST”, “EU CENTRAL”, “AP SOUTHEAST”), with each region responsible for several of the 100+ “worlds” [See this website for live stats on New World server status and player count]
  • 100+ “worlds” [variable, depending on overall player count within a geographic region], representing synchronous, persistent virtual worlds; the number of “worlds” reached ~500 at one point post-launch, but as user count declined following hype, worlds were “merged” together
  • 14 “blocks”, corresponding to 7 synchronized EC2 instances (2 zones per EC2 instance), per “world”
  • ~2,500 players, ~7,000 A.I. entities, and X00,000s of objects per “world” [note that it looks like per server player capacity was reduced to 2,000 in Sep ‘21]

All the interactions between thousands of players, entities, and objects in each of these hundreds of worlds [to be clear, 2,000 is the current cap, but a quick look at the New World Server Status page, depending on when you check, shows servers don’t approach that capacity] require a lot of compute and produce a lot of data.
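Here is a toy sketch of the scale-out idea, under my own simplified assumptions (this is not AWS’s or Amazon Games’ actual implementation): partition a world into a fixed number of blocks, assign blocks to a pool of server instances, and route each player to the instance that owns the block containing their position.

```python
# Toy sketch of "scale out" world sharding: one world, 14 blocks, 7 instances (2 blocks each),
# players routed to the instance that owns the block containing their position.
# Not AWS's actual architecture; just an illustration of spatial partitioning.
import random

WORLD_SIZE = 14_000          # arbitrary world width (hypothetical units)
NUM_BLOCKS = 14              # per the New World breakdown above
NUM_INSTANCES = 7            # 2 blocks per instance
BLOCK_WIDTH = WORLD_SIZE / NUM_BLOCKS

def block_for(x: float) -> int:
    """Map an x-coordinate to its block index (1D partition for simplicity)."""
    return min(int(x // BLOCK_WIDTH), NUM_BLOCKS - 1)

def instance_for(block: int) -> int:
    """Assign blocks to instances round-robin style."""
    return block % NUM_INSTANCES

# Simulate ~2,500 players scattered across the world and count the load per instance
players = [random.uniform(0, WORLD_SIZE) for _ in range(2500)]
load = [0] * NUM_INSTANCES
for x in players:
    load[instance_for(block_for(x))] += 1

print(load)  # roughly ~357 players per instance
```

The hard part, of course, is synchronizing state for interactions that cross block boundaries, which is what separates this toy from a production game backend.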

You can begin to see why Matthew Ball’s prediction that “In totality, the Metaverse will have the greatest ongoing computational requirements in human history” might end up being true. Relative to the CPC (computation per capita) that Meta’s ~3 billion Facebook MAUs (monthly active users) require, the CPC that a New World player requires is orders of magnitude larger — how many more billions of dollars of cloud infrastructure would Meta require if those 3 billion Facebook users become Reality Labs [or whatever they’re calling it] users, thereby exponentiating both per user compute demand and per user data generation?

To be clear, my 2 cents is that Meta/Facebook won’t be able to monopolize the Metaverse. I think that consumer hardware as the point of integration [i.e., hardware + platform + social graph + existing data, etc.] to capture users won’t work for Facebook because, not only has FB lost the trust of, well, literally everybody, but there are credible contenders in the hardware space that will make AR/VR hardware into an oligopolistic industry before FB is able to capture a critical mass of users that catalyzes a sustainable ecosystem. Furthermore, the creation of digital assets, virtual worlds, and curated communities requires a massively decentralized effort, but both creators and users [a distinction that may become increasingly blurry over time as the barriers to entry of being a “Creator” are lowered; in my mind, everyone can become a Creator in the same way everyone became a photographer and blogger post-iPhone/IG/Twitter] don’t like Facebook — they will most likely opt for credibly neutral, platform-agnostic, crypto-enabled ownership of digital assets/worlds that ensures fairer distribution of financial upside and stronger privacy protections, not only because that’s the economically rational thing to do, but because people hate Facebook. This was the same reason Facebook’s Libra/Diem never took off — the crypto community hated (still hates, but also hated) Facebook, which was obvious if you went to crypto conferences circa 2019 (and that’s on top of simply having no idea how to think about regulatory/legal structure, from what I’ve heard).

I have more thoughts on the value chain of the Metaverse from a modularity-interdependence, profit pool perspective that warrants its own primer, so I’ll leave the rest of my thoughts about Facebook’s VR ambitions for later. To be clear, it just so happens that Amazon’s New World is currently the most elucidating case study for the ideas around Metaverse computing demand I’m trying to get across — my thesis around Metaverse compute demand isn’t dependent on any particular game or even the category of “video games” in the traditional sense of the word. The reason I said I agree with Ball’s claim on computational requirements “depending on how you define the M-word” is because the concept of the Metaverse holds the promise of catalyzing a convergence between the physical and digital worlds and redefining what constitutes a “game”.

As of the time of my writing this sentence [Feb 10, 2022], both the AWS re:Invent breakout session and the interview with Bill Vass (VP Engineering @AWS) are at sub-500 views on Youtube, which is part of the reason why I believe infrastructure requirements of the Metaverse are currently being ignored. The world’s foremost Cloud infrastructure provider has revealed their thinking around one of the hottest ideas of our current zeitgeist and only 500 people are paying attention [Bill Vass also touches upon the idea of NFTs, crypto, and open standards in the interview; you would think more people would be paying attention to Amazon’s opinion on these things].

The idea of the convergence of games, simulation, and reality has long been explored in science fiction, but AWS’s case study exposition during their Dec ‘21 re:Invent “From game worlds to real worlds” session is, to my [once again, limited] knowledge, the first public demonstration of how this convergence would be achieved in real life. The key takeaway from this breakout session is that the underlying technology, infrastructure, and architectural design of massively multiplayer games can be applied to large-scale simulations — for a computer [or, rather, a distributed collection of connected computers], a workload utilizing a physics engine for New World is indistinguishable from a workload utilizing the same physics engine for modeling Earth’s environment.

And, in fact, the distributed architecture that Werner referred to when presenting New World mirrors the distributed architecture that Wesley and Maurizio present as solutions to their respective simulation problems:

  • for Wesley: recovery and national resource allocation in a California earthquake scenario ⇒ [AWS + here’s geospatial data + Unity game engine]
  • for Maurizio: 1 million independent A.I. agents pathing through Melbourne and Denver ⇒ [AWS + Cesium’s photogrammetry data + Unreal Engine]

In both of these cases, as in the case of New World, the “world” is partitioned into blocks that are allocated between a distributed set of compute instances [scale out] with the effect of simulating a unified digital space. If/when the Metaverse becomes ubiquitous in 10-30(?) years, it will be this type of cloud-based scale out architecture, most likely in conjunction with non-centralized edge computing devices, that enables expansive, near-synchronous AR/VR experiences for millions/billions of people — short of advances in silicon photonics or quantum computing [at mass scale], this is the only feasible way for the Metaverse to manifest in reality. To be clear, there will be interconnections between different virtual worlds and some of these worlds might even exhibit non-Euclidean logic [see also: a non-Euclidean VR example; more likely, 3-dimensional worlds that require meshing together blocks along the z-axis will probably become the norm before people start messing with non-Euclidean worlds], but the point is that massively multi-player/agent [an agent can be human or A.I.] shared digital spaces will require this kind of distributed, scale out architecture for reliable functioning.

The near-term, logical conclusion of this computationally-enabled merging of virtual and physical realities is manifest in convergent visions of a digital twin of Earth. Microsoft’s “Planetary Computer” initiative continues their pre-existing efforts to increasingly model and parameterize the Earth’s environment for developers using global environmental data and accessible APIs. “Destination Earth” is the name that the European Commission has given to its effort to build a digital twin of the planet in order to run climate simulations. Nvidia has a nearly identical, supercomputer-based simulation project [at the level of press releases, that is] which they’ve dubbed Earth-2, announced the same month as the EU’s Destination Earth project.

In what seems like a direct contradiction to the claim I just made about shared digital spaces requiring distributed, scale out architecture, Jensen posits hosting millions of people inside a scaled up, cloud-native supercomputer. If Jensen is talking about multiplayer experiences with synchronous, mutually dependent interactivity [i.e., if I kill you in Fortnite before you kill me, you can’t keep shooting me], then synchronizing between millions of people is impossible given latency constraints. However, I think Jensen may be alluding to multiplayer experiences where eventual consistency, rather than synchronous mutual dependence, takes precedence — i.e., millions of people each editing an Omniverse file that corresponds to millions of separate “plots” on the central Earth-2 simulation, with limited or batch-processed interactions between players and player created/edited entities.

Who knows? I certainly don’t. What’s clear, however, is that computational demand begets more computational demand. Jensen’s Earth-2 is not just one model, but a multiplicity of models achieved through “millions of overlays and millions of alternative universes” built by both AI and humans. These millions of alternate universes will ostensibly be accessed by AR/VR headsets which themselves create new use cases for compute workloads beyond [hopefully encrypted, privacy-preserving, unattributable, and anonymized] Cloud-based AI/ML pattern analysis of biometric/eye-tracking/facial data. The rise of Web2 and social media throughout the last decade was fueled by network effects, which meant that compute demand begot more compute demand after a critical mass of users. Positive explanations for the Jevons paradox, which has seen previous mentions in the context of cloud computing, have been generally unsatisfactory. I posit [I’m sure this is not a new theory by any means; it’s just that, in my ignorance, I haven’t seen it clearly stated before] that the overall rise in resource usage [and therefore resource demand] is driven by game theoretic, competitive logics. My interpretation of the original coal use case that inspired Jevons’ observation is that technological improvements that drove increased efficiency in coal use increased system-wide coal consumption throughout a range of industries because it became a competitive imperative to do so. Applied to cloud computing, the efficiency gains from cloud computing have ignited competitive pressures between companies to engage in cloud-based, compute-intensive digital transformation. Applied to the conception of the Metaverse as outlined here, the increasing digitalization of our lives may create positive feedback loops that catalyze step function increases in average computation per capita akin to what we’ve already experienced in the past two decades.


Supply: Capex/Capacity Rules Everything Around Me (C.R.E.A.M.)

On capital commitment. Capacity expansion.

From Credit Suisse: The Cloud Has Four Walls: 2021 Mid-year Outlook

There was a brief period towards the beginning of the last decade when the business community (after gifting AWS a healthy seven-year head start) finally realized the success of AWS’s business model and the list of potential entrants for the nascent cloud infrastructure industry included now-ignored names like Verizon, AT&T, CenturyLink (now Lumen), HP, Rackspace, IBM, and Oracle, as well as Google and Microsoft — arguably the only successful market entrants. Rackspace was the first to bow out of the market in 2014, focusing instead on providing cloud management services for businesses making the transition to the cloud. Verizon, AT&T, and CenturyLink, despite their attempts, were clearly out of the race by 2016, to the applause of many analysts. HP discontinued their public cloud business around the same time. And it is well understood that the last two holdouts, IBM and Oracle, despite kicking and screaming on their way out of the cloud infrastructure business (mostly Oracle), will never be competitive in the public cloud business at the hyperscale at which the Big Three operate.

Cloud infrastructure is a highly capital-intensive industry, meaning that “capacity decisions have long lead times and involve commitments of resources which may be large in relation to firms' total capitalization” for companies seeking to be viable competitors in the industry. As Google has repeatedly demonstrated through its continued lossmaking in GCP, credible communication of intent to remain in the industry costs billions of dollars in CapEx and requires that investors be able to stomach prolonged periods of losses. Google’s opposite is Oracle [a point highlighted by Ben Thompson in 2018 after Thomas Kurian (an ex-Oracle exec) took the helm as Google Cloud’s CEO], a company which has repeatedly exaggerated their dominance in the cloud but clearly lacks the requisite costly infrastructural capacity to back up their claims — Charles Fitzgerald puts it best in a 2016 blog post titled Sorry Oracle, Clouds are Built with CAPEX, Not Hyperbole.

Fitzgerald’s longstanding coverage of the Cloud through the lens of CapEx, especially his Follow the Capex series, has [to my knowledge] been the most consistent source of reminders on the importance of “putting your money where your mouth is” in the cloud by simply spending billions on servers and datacenters.

This chart on annual CapEx spend from a Jul ‘20 article by Charles is a good illustration of what Porter identifies as the “single most important concept in planning and executing offensive or defensive moves” — “commitment”:

From Platformonomics: Follow the CAPEX: Clown Watch

Google’s ad-driven money printing machine has given the company the cash flow to plough back into GCP infrastructure and make credible commitments towards market entry that can be interpreted by competitors (mainly AWS) as “Give me some space or you’re going to hurt more from a price war than we are” and by existing/potential customers as “Look, we’re here to stay. Don’t believe me? Look at this pile of cash that we’ve been burning for years!” Relative to Google’s “Other Bets”, more conservative shareholders are comparatively ecstatic to see Google investing in their cloud segment given that, at the scale of their existing media and advertising businesses, many innovations end up getting passed down to their internal segments regardless of their ability to sell them as services — e.g., even if they never sell their internally developed, in-house VCU instances (hardware-based encoding for uploaded video data, primarily Youtube) to external customers, they operate at a scale that justifies the R&D costs of the chip design. In other words, the cost of process/design/engineering improvements that Google invests in to supply their internal hyperscaled business lines can be amortized over a broader base if they’re able to capture market share in the public cloud and improve the operating leverage on their investments. The question of whether or not GCP can capture market share and achieve sufficient operating leverage to reach profit margins that start converging towards AWS’s 25-30% operating margins has remained the overarching concern for investors trying to underwrite the business.

Guidance from management ...

... indicates that, despite negative but improving operating margins, Google continues to optimize more for top-line growth than for profit. From what I understand, the general street consensus is that there's no reason (at least from a technological standpoint) why GCP shouldn't eventually converge toward AWS/Azure-level margins; the only question is how long Google intends to keep spending to expand its footprint before dialing back growth CapEx and positioning its asset base toward profitable levels of utilization [i.e., continued growth-oriented CapEx spend ahead of expected demand keeps realized utilization rates, and therefore profitability, depressed].
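A toy calculation of that utilization dynamic; the capacity and demand growth rates below are assumptions for illustration, not Google's actual figures:

```python
# Toy model of utilization when capacity is built ahead of demand.
# The growth rates below are illustrative assumptions, not reported figures.
capacity = 100.0         # installed capacity (indexed)
demand = 60.0            # consumed capacity (indexed)
capacity_growth = 0.40   # growth-oriented CapEx adds 40% capacity per year
demand_growth = 0.30     # demand grows 30% per year

for year in range(1, 6):
    capacity *= 1 + capacity_growth
    demand *= 1 + demand_growth
    print(f"Year {year}: utilization = {demand / capacity:.0%}")
# So long as capacity growth outpaces demand growth, realized utilization
# (and with it, margins) keeps drifting lower.
```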

Google's continued willingness to operate this segment at a loss while still growing its capacity/infrastructure-oriented investment spend reflects how large the company expects the IaaS market to become by the back half of the market's growth curve. When Google says it thinks we're still in the "early innings" of cloud adoption/penetration, it is really putting its money where its mouth is. Google's ability to put chips on the table (and chips into data centers) has cemented its status as one of the three hyperscalers in the public cloud infrastructure industry's triopoly (ex-China), an industry whose barriers to entry can be summarized in one graph:

From Platformonomics: Follow the CAPEX: Cloud Table Stakes 2020 Retrospective

It is with this backdrop that we can begin a discussion about the risk of industry-wide overcapacity in this mutually dependent oligopoly, as outlined by Porter in Competitive Strategy and Capacity Expansion in a Growing Oligopoly: what's to prevent an overcapacity situation among the hyperscaler CSPs in which "too many competitors add capacity" and "no firm is likely to escape the adverse consequences"?
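One way to see the shape of the problem is as a simple two-firm capacity game; the payoff numbers below are invented purely to illustrate the incentive structure, not derived from any real data:

```python
# A toy two-firm capacity game to make the overcapacity worry concrete.
# Payoffs are invented for illustration; they are not estimates of anything.
payoffs = {  # (A's choice, B's choice) -> (A's payoff, B's payoff)
    ("hold",   "hold"):   (10, 10),  # industry stays tight, healthy margins for both
    ("expand", "hold"):   (14,  6),  # the expander takes share from the holder
    ("hold",   "expand"): ( 6, 14),
    ("expand", "expand"): ( 7,  7),  # everyone adds capacity -> industry overcapacity
}

# Expanding is the individually rational (dominant) choice for firm A...
for b_choice in ("hold", "expand"):
    a_if_hold = payoffs[("hold", b_choice)][0]
    a_if_expand = payoffs[("expand", b_choice)][0]
    print(f"If B {b_choice}s, A earns {a_if_expand} by expanding vs {a_if_hold} by holding")
# ...and symmetrically for B, yet (expand, expand) leaves both firms worse off than
# (hold, hold) -- exactly the "no firm escapes" outcome Porter warns about.
```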

It’s likely my research simply hasn’t been extensive enough, but I have yet to encounter anyone else investigating this question [please do direct me if you’ve encountered something along these lines] and it’s probably because:

  1. Case studies of industry capacity through a game theoretic lens aren’t in the public domain [i.e., These types of case studies are either academically-oriented (rather than practitioner-oriented) or they exist behind sell-side research portals]
  2. Demand for Cloud infrastructure services [and therefore for the inputs that enable the construction of data centers] has remained perennially strong for the last decade.

The latter point only partly answers my question. In situations where an industry's firms expect predictably high demand [which has certainly been the case for cloud computing services], economically rational decisions by individual firms to expand capacity to meet that expected demand can still produce industry-wide overcapacity in aggregate. In the case of the public cloud industry, however, the ability of firms to overbuild capacity in anticipation of demand has been limited by constraints in the supply of components, semiconductors in particular, which have emerged as a conspicuous bottleneck during the pandemic. The key thing to note is that hyperscalers get preferential treatment from chip suppliers ...

... so AWS/Azure/GCP will be the last to suffer from semiconductor-related supply constraints. Furthermore, hyperscalers won't significantly over-order ("double book") beyond what their customers can consume. Yes, fabs can tell the difference between "real" orders and double-booked orders, but more importantly, the historical rate of improvement in semiconductors meant a three-to-four-year [now five, after AWS's recent change to server depreciation] depreciation schedule for servers. In other words, even if TSMC gave the green light and permitted Microsoft or Google to buy four years' worth of chip supply in a single year in a bid to catch up with AWS, Azure/GCP would find those servers sitting underutilized for lack of customer demand, and they couldn't simply wait ten years for demand to catch up to capacity because the chips in those servers would be obsolete by then.
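Here's a rough sketch of that logic with invented numbers (the fleet size, demand ramp, and useful life are all assumptions for illustration):

```python
# Hypothetical sketch of why front-loading several years of chip supply backfires:
# servers obsolesce/depreciate faster than demand catches up. Numbers are illustrative.
servers_bought_upfront = 400   # "four years' worth" of servers bought in year one
useful_life_years = 5          # rough server depreciation schedule
demand = 100.0                 # servers' worth of customer demand in year one
demand_growth = 0.30           # demand grows 30% per year

server_years_used = 0.0
for year in range(1, useful_life_years + 1):
    in_use = min(demand, servers_bought_upfront)
    server_years_used += in_use
    print(f"Year {year}: {in_use:.0f} of {servers_bought_upfront} servers doing useful work")
    demand *= 1 + demand_growth

lifetime_capacity = servers_bought_upfront * useful_life_years
print(f"Lifetime utilization of the fleet: {server_years_used / lifetime_capacity:.0%}")
# Much of the fleet sits idle and is fully depreciated before demand ever reaches it.
```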

I also want to make a [not so] quick point here about server depreciation given the recent changes ...

... in AWS's estimates of the useful life of servers and networking equipment. The common-sense explanation for the continued extension of hardware useful-life estimates is that the rate of innovation present during the early stages of cloud infrastructure buildout has moderated over time. This is not to say there isn't still innovation [there is, especially given the re-architecting required for proprietary chip designs, software-defined networking, data center disaggregation (i.e., the DC as the unit of compute), and the mix shift toward HPC and GPU hardware for AI workloads], but merely to state the obvious: AWS/Azure/GCP had more to figure out on a foundational level 5-10 years ago than they do now. A decade ago, the hyperscalers were still figuring out the optimal designs and configurations for their data center infrastructure, and Moore's Law, as it is traditionally understood, was still humming along [see this great FabricatedKnowledge post on how "the spirit of Moore's Law lives on"]. As the low-hanging Pareto fruit got picked and chip node cycles lengthened on the approach to 5nm, it only makes sense for equipment useful life to extend.

But why do they replace old servers in the first place? Surely the servers don’t break down and become completely unusable after three years of use? This was the line of questioning that bothered me to no end, not least because search terms like “server depreciation why physics reason” or “why do servers depreciate” yielded links to accounting standards and financial minutiae that didn’t address the physics/engineering-based realities underlying the derivative accounting concerns. Why didn’t hyperscalers just continue using and operating hardware that was ostensibly perfectly functional, even after three to four years, and shift lower priority workloads to older, less performant servers? James Hamilton saves the day again.

Not only does replacing servers increase speed, but newer servers also cost less in power- and cooling-related overhead because of efficiency gains from higher transistor density on server chips: if transistors are packed closer together then, all other things being equal, less energy is needed to move electrons around the chip, which means less power drawn and less excess heat produced for the same amount of computational work. As Hamilton demonstrates, there comes a point at which bringing in new, more efficient [and more performant] servers is more economical than continuing to rely on the old ones.
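A minimal sketch of that breakeven logic, capturing only the power/cooling side (Hamilton's actual analysis is richer); all the dollar and power figures below are placeholders:

```python
# Back-of-the-envelope version of the replacement-economics argument, considering only
# power/cooling savings vs the cost of new hardware. Every figure is a made-up placeholder.
old_kw_per_unit_work = 1.0               # power draw of the old fleet per unit of work
new_kw_per_unit_work = 0.7               # newer silicon does the same work on less power
power_cooling_cost_per_kw_year = 2000.0  # $ per kW-year, including cooling overhead
new_server_cost_per_unit_work = 1500.0   # capex per unit of work for the new server
new_server_life_years = 5

annual_power_savings = (old_kw_per_unit_work - new_kw_per_unit_work) \
    * power_cooling_cost_per_kw_year
annualized_new_server_cost = new_server_cost_per_unit_work / new_server_life_years

print(f"Annual power/cooling savings per unit of work: ${annual_power_savings:.0f}")
print(f"Annualized cost of the replacement server:     ${annualized_new_server_cost:.0f}")
if annual_power_savings > annualized_new_server_cost:
    print("Refreshing the hardware is the economical choice.")
else:
    print("Keep sweating the old servers a while longer.")
```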

[From TSMC slidedeck by way of ExtremeTech]: Note the “Speed Improvement at Same Power” and “Power Reduction at Same Speed” improvement categories.

What the continued increase in server/equipment useful life indicates is that the point at which replacing old servers with new ones becomes economical is arriving less and less frequently, coincident with diminishing power-efficiency gains from process node improvements [i.e., the second derivative of power-efficiency improvements across 28nm→22nm→ ... →7nm→5nm is negative]. Furthermore, the moderation of DC-related depreciation expense has room to run as hyperscalers pursue disaggregated data center architectures ["Comparing to the monolithic server approach that data centres are being built now, in a disaggregated data centre, CPU, memory and storage are separate resource blades and they are interconnected via a network fabric"], thereby enabling more granular equipment refresh:

That is, instead of having to replace an entire server blade (CPU + memory + fans + chillers + etc.) every four years, DC operators can, for example, choose to replace only the CPU after four years while keeping the fans in service for another three years.
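A quick illustration of that granularity with made-up component costs and refresh intervals:

```python
# Illustrative (hypothetical) comparison of refresh costs: replacing a whole server
# every four years vs refreshing disaggregated resource pools on their own cadences.
components = {
    # component: (cost in $, sensible refresh interval in years)
    "cpu":          (3000, 4),
    "memory":       (2000, 6),
    "storage":      (1500, 6),
    "chassis/fans": (1000, 8),
}

# Monolithic: the whole bundle is replaced whenever the shortest-lived part forces it.
monolithic_annualized = sum(cost for cost, _ in components.values()) / 4

# Disaggregated: each resource pool is refreshed only when it actually needs to be.
disaggregated_annualized = sum(cost / life for cost, life in components.values())

print(f"Annualized refresh cost, monolithic:    ${monolithic_annualized:.0f}")
print(f"Annualized refresh cost, disaggregated: ${disaggregated_annualized:.0f}")
# Refreshing only what needs refreshing lowers the blended depreciation bill.
```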

One more tangent on what I originally intended to be a "quick point" on server depreciation: while the hyperscalers don't publicly break out what proportion of their DC CapEx goes to refresh versus expansion, I think the split is a useful frame for thinking about hyperscaler CapEx spend.

From McKinsey: How high-tech suppliers are responding to the hyperscaler opportunity

But back to capacity: what analysts have been concerned about recently [and what I personally need to do more research on] is whether the hyperscalers are subject to capacity constraints from global supply chain issues. Although buyers of server chips, especially those who end up supplying the hyperscalers, get preferential treatment from fabs ...

... there may nevertheless be constraints that impact even the hyperscalers.

