Part 4 of Planetary-Scale Computation: An industry primer on the hyperscale CSP oligopoly (AWS/Azure/GCP):
The fundamental axiom of economics is the human mercenary instinct. Without that assumption, the entire field would collapse.
— Cixin Liu, The Dark Forest
Let’s imagine a fictional place — we’ll call it “ECON”.
Imagine that we’re playing through the starter tutorial for an immersive, multiplayer VR game from an alien civilization called The n-Body Problem.
From our VR headset, we log into a command center from which we can observe this fictional game Universe, which operates under laws that are similar to, but distinct from, the laws under which our own Universe operates. In this fictional Universe, matter is either dumb, smart, or dark — dumb matter passively adheres to the same laws of physics that exist in our Universe, smart matter is able to actively exert force to shape its surroundings and form connections with other smart matter, and dark matter is an inferable but unobservable theoretical construct that is some function of the dumb and smart mass within a system. Furthermore, dumb matter can be converted into smart matter and smart matter can degrade back into dumb matter, but trends indicate that the net conversion of dumb matter into smart matter is monotonically increasing.
Gamers here can make a career out of placing bets on the amount of dark matter that they estimate is present in particular regions of Space. Since different market participants have differing information and there exist various methodologies for evaluating the measure of this theoretical construct [some like to measure flows, others estimate using multiples of comparable masses, and even others just buy and sell whatever they see is popular], there exists an active in-game financial market of people expressing their opinions through buying/selling at whatever price others are willing to sell/buy.
Our observation of this other Universe is limited (so far) to a single galaxy which is composed of various stellar systems. In this galaxy, which has been named “ECON”, there exists one particular galactic sector, the “IT” sector, that has been growing faster than the other sectors in ECON through its faster absorption of surrounding matter in Space. As an aside, the ECON galaxy itself has been accumulating mass and energy from the broader Universe at a rate of 5-6% per year (Earth years, that is).
The defining characteristic of this “IT” sector is the relatively high ratio of smart mass in the system, mass that seems to be forming connections to the concentrations of smart mass (smart masses communicate by streaming patterns of electrons at each other) in the Economy’s other systems and is also accelerating the pace at which the “IT” sector is gathering matter. While ECON has historically exhibited dispersed concentrations of carbon-based smart matter, there has been a relatively recent burst of silicon-based smart matter in “IT” that has better computing, networking, and storage capabilities than the carbon-based smart matter that still dominates ECON (we suspect, too, that there might be other forms of smart matter in this Universe that exist outside of our current observational capabilities). Observations show that larger, concentrated masses of carbon-based smart matter have historically utilized and instrumentalized silicon-based smart matter to communicate, but increasing concentrations of silicon-based smart matter seem to be networking and communicating without carbon-based mediation, leading some to believe that silicon-based smart matter will one day replace carbon-based smart matter.
In the game’s lore, only a few decades following the formation of the “IT” sector, observers noticed an interstellar Cloud (which we’ll just call “The Cloud”) within the “IT” sector that contained the highest mass and concentration of highly networked, silicon-based smart matter observed in ECON. Whereas the Total Astronomical Matter (TAM) of IT is growing at a rate of only 5-10% per year given IT’s already large size, the size of the Cloud is growing at a rate of around 20% per year through the steady absorption of less connected, less smart matter within “IT” — some people predict that the Cloud will eventually engulf the entirety of “IT”, leaving some observers with the view that the TAM of “The Cloud” will converge towards the TAM of IT.
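To make that convergence claim concrete, here's the kind of back-of-the-envelope calculation an observer might run. The growth rates are the ones from the lore above (I've taken a midpoint for IT); the Cloud's starting share of IT's TAM is purely my own assumption:

```python
# Back-of-the-envelope sketch: how long until a ~20%/yr "Cloud" catches up to an
# "IT" sector growing ~5-10%/yr? Growth rates come from the lore above; the
# Cloud's starting share of IT (15%) is my own assumption.

cloud_share = 0.15                    # assumed starting point: Cloud = 15% of IT's TAM
it_growth, cloud_growth = 0.08, 0.20  # rates cited above (midpoint used for IT)

years = 0
while cloud_share < 0.90:
    cloud_share *= (1 + cloud_growth) / (1 + it_growth)  # share compounds at the growth differential
    years += 1

print(f"At these rates, the Cloud reaches ~90% of IT's TAM in about {years} years.")
```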
The Cloud’s center of gravity is comprised of three bodies of highly concentrated masses of smart matter, both carbon and silicon-based — the first of these bodies to coalesce is currently the largest of the three, with the second-largest body possibly approaching the size of the first and the smallest body lagging behind the other two in terms of mass. The carbon-based smart matter of these three bodies is exceptionally proficient at the rotation of the variously shaped silicon-based matter required for the instrumentalization of silicon for computation, and so these carbon-based masses are widely known as “shape rotators” to observers. While shape rotators exist throughout ECON, there seem to be especially high concentrations of shape rotators within the Cloud and particularly at these three compan-, I mean celestial bodies.
Normally, a system of three sufficiently large celestial bodies comprised of dumb mass would devolve into a chaotic system, but since these three bodies are smart, they exist in a stable configuration because, unlike theoretical three-body systems in physics textbooks in which unchanging mass bodies indiscriminately follow the physical laws of motion and gravity, these smart masses in the Cloud follow competitive and economic laws that seem to be maximizing for the quanta of dark matter (darK matter henceforth referred to with a capital “K”) that the bodies possess. These three bodies are pushed and pulled by each other, constantly jostling for a position that gives them access to more flows of smart matter (carbon-based and silicon-based) as well as dumb matter to convert into networked silicon-based smart matter that feeds into the maximization of their bodies’ overall estimated measure of “K”.
Stellar systems in other galactic sectors like CPG, FIG, and NRG (particularly in the OIL subsector of the NRG sector) increasingly network their internal smart matter with the Cloud’s smart matter, utilizing information derived from the Cloud’s computational capabilities to better accumulate K for their own sectors of the ECON galaxy and transferring a portion of their increased flows back to the Cloud as payment for utilizing the Cloud’s smart mass — the Big Three are increasingly competing with each other to connect with mass bodies in other sectors of ECON.
The overall mass of the Cloud, despite being anchored by the Big Three, is comprised of assemblages of other masses which in aggregate hold two to three times more K than the Big Three despite having much less matter, both dumb and smart. These dematerialized, “softer” bodies — dubbed Interstellar Volumes, or ISVs — began emerging in the Cloud in the wake of the Big Three’s formation, orbiting around the Big Three to utilize computational resources and sell resultant information to other sectors of ECON whilst simultaneously attempting to avoid getting too close to the trio lest they risk being devoured. While each of the Big Three would prefer not being disintermediated by these softer satellite bodies, each of the three is too preoccupied with competitively positioning against the other two to also fight back against the soft bodies, which end up paying to use their computational resources anyways. It should be noted here that other mass bodies with large concentrations of silicon-based smart matter exist in the Cloud but these other compani-, masses, have decided that selling the utilization of their silicon isn’t the optimal path towards accumulating K. The Big Three are therefore referred to as the “Public Cloud” in that they network with and sell to whichever mass bodies will pay them for use of their silicon-based smart matter.
Each of these n bodies in the Cloud, from the dematerialized soft masses to the Big Three, are variable-mass bodies seeking to maximize their own long-term K and are constantly engaged in positioning themselves to benefit from flows of information and matter that serve this purpose, potentially at the expense of other mass bodies in this ecosystem — this is the Cloud’s n-body problem.
Observers of ECON from our Universe place bets on the predicted K value of the various mass bodies in ECON and use historical measures of position, velocity, and mass with their knowledge of ECON’s laws of motion and gravity so as to try and predict values of K better than other observers. Financially interested observers are motivated to iterate on their calculations of these masses’ dynamically-adjusting velocities because it informs them as to whether or not their favored mass bodies might eventually occupy a strategically enviable position that precludes existential risk (i.e., collision with other bodies, consumption by a much larger celestial body, etc.) while also maximizing for the accumulation of matter that results in larger K values.
During the Cloud’s nascent stages, prevailing consensus among observers was that the Big Three would end up sucking up all of the Cloud’s matter, leaving no room for independent softwar- softer mass bodies within the Cloud. However, it turns out that the Cloud’s smaller and more numerous independent bodies are able, through the inherent nimbleness that comes with dematerialization and relatively smaller size, to leverage the computational infrastructure provided by the Big Three and quickly reposition themselves in order to exploit competition between the Big Three and stay competitively viable in their own niches. The predictability of the competition among the Big Three has enabled the rise of an ecosystem of other masses in the Cloud that don’t need to construct their own silicon-based smart matter because the central three-body structure ensures favorable terms that wouldn’t exist if the Cloud was dominated by a single body of mass.
But despite the Cloud’s structural stability, each of the ecosystem’s mass bodies is seeking relative advantage and reforming its chains of flows and connections to accelerate into trajectories with more matter to absorb and more unintermediated connections with masses in other sectors. Some of these n bodies even form connections with each of the Big Three and seek to position themselves as neutral, equidistant intermediaries that try to utilize select aspects of each of the Big Three’s computing capabilities, siphoning off a proportion of flows for themselves. Each body seeks to engulf the connections and flows of other bodies (observers of the Cloud like to say they try to “eat” each other) for their own benefit, in order to reconstruct the networked chain of flows to their own advantage and minimize their own contributions to mass bodies that aren’t their own. Among the more consequential celestial events that are ongoing within the Cloud ecosystem are ...
... and much, much more.
At this point the tutorial ends and a line of text appears:
For observers of the Cloud, competitive and economic principles serve as the foundation for fundamental analysis of the motion, dynamic masses, and changing positions and velocities of the n-bodies.
The goal of The n-Body Problem is this: Use your knowledge to predict the movement, positioning, and mass of these celestial bodies.
We invite you to log on again.
What does it mean to “eat”? Who’s eating who?
As penance for the contrived allegory I subjected you to in the previous section, let me begin this section in concrete fashion. Whether explicit or implicit, most discussions about the Cloud industry can usually be reframed as discussions about Christensen’s law of conservation of attractive profits, and specifically about which stages and subsystems of the technology value chain are undergoing commoditization versus decommoditization — companies that are well-positioned in stages that are being commoditized are unable to capture the value of the revenues that flow through them; companies that control differentiated/decommodified, “interdependent links in the value chain capture the most profit.”
Discussions around whether the cost of cloud represents a Trillion Dollar Paradox, questions around “Who eats who?” between Cloud providers and open source software, proclamations about The Tech Monopolies Going Vertical (more on this later), or predictions about the reshuffling of the Cloud (i.e., reshuffling the Cloud’s value chain) — these are all discussions about the relative power of different factions within the Cloud ecosystem and how one faction’s relative power is de/increasing, thereby justifying lower/higher margin capture along the value chain. When people talk about X eating Y, what they’re ultimately talking about is X treating Y as a modularized, “good enough” commodity and integrating Y into X’s subsystems to form a new interdependent, differentiated value proposition.
Eat and be eaten.
Marc Andreessen’s contribution to the tech canon in the form of his Why Software Is Eating The World, while unfortunately not giving an explicit definition of what it means for X to be ”eating” Y, leaves hints as to viable interpretations of what it means for software to “eat” the world — in the article he writes that “Software is also eating much of the value chain of industries that are widely viewed as primarily existing in the physical world.” and also refers to markets and industries being eaten by software.
It was difficult to portray “eating” in my n-body allegory because the closest analogy to eating in a celestial context [that is, when black holes consume stars in a process usually accompanied by spaghettification] is inappropriate for the “eating” being done in the Cloud, which is the simultaneous commodification of competitors’ offerings [as well as the offerings of players in adjacent value chains] and reintegration of proprietary solutions into interdependent subsystems along the value chain — the three/n-body analogy wasn’t well-suited to illustrating this idea. A celestial body analogy that is more appropriate but impossibly contrived would be if the company with control over an interdependent link in the value chain was represented by a black hole that was continually spaghettifying [ayyy i’m making up words ova here🤌] the outer gaseous layers of multiple stars [”stars” that are themselves gathering flows of mass from other sources], with the stars representing companies/subsystems being commodified and mass accumulation by the black hole serving as the metaphorical value capture along the value chain. In other words, the more accurate analogy would be a dynamic, fractal web of value chains.
The “stars” can live on indefinitely, their growth will just be limited by the black hole capturing most of the excess mass in the system. Therefore Y doesn’t have to completely vanish to have been “eaten” by X (though this can be the case, as in Andreessen’s example of Amazon bankrupting Borders). X bankrupting Y isn’t necessarily X “eating” Y, nor is X acquiring Y equivalent to X “eating” Y — IBM didn’t “eat” anything through its acquisition of Red Hat, but AWS has certainly taken big bites out of open source without bankrupting or acquiring companies that monetize through OSS. Rather, X can “eat” Y by keeping Y alive in order to produce increasingly commodified modules that X reintegrates into X’s own processes and subsystems in order to capture value/margin along the value chain. The symbiote-parasite spectrum (where the one doing the “eating” = parasite) is a more useful analogy than predator-prey. Cloud “eating” open source does not mean AWS or Azure write a new operating system kernel to replace/”kill” Linux, nor does multi-cloud [tools/software/companies] “eating” cloud mean that the cloud dies in any world where multi-cloud ends up succeeding. That which is “eaten” continues to live on, earning returns that theoretically, eventually converge towards their cost of capital until fortunes reverse or the host dies and the “parasite”/symbiote finds other nutrient sources.
[Sidenote: The word “parasite” has negative connotations but this is wholly unintended; AWS commodifying basic compute through its “parasitic” use of OSS has unlocked untold surplus value into the world and, like, what else is OSS there for if not for enabling the actual democratization of large-scale computation?]
So the “eating” metaphor is imperfect but intuitive in much the same way that the “Cloud” is, and any phrase with the “Cloud” “eating” this or being eaten by that ends up sounding like technobabble for people who don’t already have mental models of the concepts underlying “Cloud” and “eating” at their disposal. The purpose of this exposition was to attempt to draw connections between the “X eating Y” throughline with the commoditization-differentiation & modularity-interdependence frameworks we’ve been working with so far. Looking with a Christensonian lens, what all of these articles/posts/tweets ...
... are saying when they say “X is eating Y” is just that X has modularized Y and Y is becoming increasingly commoditized so that the majority of excess returns will eventually go to X instead of Y if the dynamic is left unchanged.
To be clear, I’m not saying that “Cloud eating Software” means that software companies sacrifice all of their margins to cloud providers [they don’t, despite IaaS costs constituting 50+% of SaaS COGS; also, I don’t think cloud is eating software so much as software is becoming cloud-like], but that this would be the case if the dynamic were left unchanged and if not for the countervailing trends of “Multi-cloud eating Cloud” or “OSS eating IaaS/PaaS”, which help to modularize cloud infrastructure at the same time that Cloud is eating Software. These are continual processes that offset and interact with each other.
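For a sense of the magnitudes involved, here's a quick arithmetic sketch with purely hypothetical numbers — the only figure carried over from above is the "50+% of COGS" share:

```python
# Quick arithmetic sketch with hypothetical numbers: even if IaaS is 50%+ of a SaaS
# company's COGS, that is very different from the cloud provider capturing "all the margin".

revenue = 100.0                          # hypothetical SaaS revenue
gross_margin = 0.75                      # assumed: a typical-ish 75% SaaS gross margin
cogs = revenue * (1 - gross_margin)      # = 25
iaas_share_of_cogs = 0.50                # the "50+% of COGS" figure cited above
iaas_spend = cogs * iaas_share_of_cogs   # = 12.5

print(f"IaaS spend = {iaas_spend / revenue:.1%} of revenue")  # ~12.5%, not the whole margin
```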
Since nature doesn’t typically exhibit examples of one animal eating another animal while itself being eaten by two other animals (who are, themselves, being chewed on by even others), this “eating” example is only accurate if the predator-prey relationship is unidirectional, totalizing, and discrete — it never is. A more faithful representation of the competitive reality might be a fractal web of celestial mass bodies, each of whom is being spaghettified while also spaghettifying others, with the mass body gaining mass at the fastest rate representing the player that controls interdependent links in the value chain — but that’s hardly a useful marketing metaphor that people can instantly get like “eat” and “cloud”.
Instead, here’s what I think might be a more useful and explanatory representation of the Cloud ecosystem that is inspired by this graphic of The Dis-Integration of the Computer Industry featured in both Skate to Where the Money Will Be and The Innovator’s Solution ...
... that:
This diagram on the right uses the hyperscale triopoly as the central frame of reference in order to illustrate the various fronts of competition that are occurring — from the perspective of the hyperscalers, they are being challenged ...
This diagram is essentially a kind of Porter’s Five Forces [buyers, suppliers, substitutes, new entrants ⇒ S&P500, NVDA, SNOW, NET, respectively] that the value chain decomposition diagram elaborates upon by identifying exactly where along the value chain the different players have competing interests with respect to modularization/commoditization vs interdependence/integration.
Here’s how to read the value chain diagram:
Let me elaborate on what I think might be the most confusing point here, which is that the red or blue blocks for each player are an indication of what they’re hoping to make interdependent or modular. “XYZ Co.’s” [representative of Cloud customers] blue block is an indication that they would prefer for the Cloud to be modularized and commoditized, although things won’t necessarily turn out that way. Same story with AWS and Azure’s solid red block — both these players would ideally be the beneficiaries of an interdependent Cloud value chain for which they are the primary suppliers of a fully integrated solution, but the realities of competition will be the ultimate determinant.
While the value chain diagram can certainly be improved and iterated upon, and you could easily argue that I’ve mischaracterized at least a couple of things, I think much of the utility comes from the framing and from making the divergent interests within the cloud ecosystem explicit and visible. Mapping all of this out makes it easier for me to now finally write out my views on the Cloud ecosystem without having to provide qualifications for every point for fear of not providing sufficient context:
[Sidenote: These points don’t map perfectly (i.e., one-to-one) onto the “areas of competing interest”]
Cost of cloud. ISVs vs hyperscalers. Partners.
Here’s the “cost of cloud” debate ...
... between [what I’m facetiously dubbing] the Repatriarchy and the Hypershillers [primarily embodied by Martin Casado and Zach Kanter, respectively] in broad strokes:
... because repatriation (and more specifically, designing in optionality/flexibility through modular, vendor-agnostic design) can actually free up resources to pursue more growth opportunities. I should be explicit at this point, for those who haven’t read or have forgotten the original a16z article, and say that the “Repatriarchy” isn’t arguing for or against repatriation, but rather that “infrastructure spend should be a first-class metric” and that the “trillion dollar paradox” is centered around managing this cloud spending throughout the company lifecycle. As stated by Sarah Wang and Martin Casado, the paradox is this: You’re crazy if you don’t start in the cloud; you’re crazy if you stay on it.
Customers have historically had a frenemy relationship with the major public clouds, especially tech-first independent software vendors (ISVs) who essentially outsource most or all of their infrastructure needs to AWS/Azure/GCP whilst, at the same time, potentially competing with them. The “won’t AWS just copy it?” narrative is well-trodden territory given that it’s been in the minds of investors for over a decade now. It’s clear now, however, that ISVs have particular advantages relative to the hyperscalers ...
ISVs now inhabit niches in the Cloud ecosystem that are somewhat defensible from the hyperscalers, who have been so engaged in maintaining competitiveness against each other that maintaining dominance over the entire ecosystem has become untenable — this is the nature of an “ecosystem”. To Offringa’s point on “the spectrum of differing approaches”, the strategic calculus for hyperscalers has increasingly become one in which GCP thinks to itself “If I can be the best ecosystem partner for OSS/ISVs then I can take share from AWS and Azure” and AWS thinks to itself “I’ll lose out if I’m not a good ecosystem partner to ISVs and Azure and Google Cloud are.” Although many software products compete with the hyperscalers’ offerings, if customers are going to use Snowflake or MongoDB anyways, it might as well be on their infrastructure rather than their competitors’ infrastructure. In a similar way to when a certain European country was too busy fighting on two separate fronts to invade Switzerland and ended up partnering with them instead, the hyperscalers’ détentes with ISVs in order to combat one another have evolved into full-blown partnerships. As is to be expected from the disruptor with the least to lose, Google outlines this relationship in the relatively most straightforward manner:
... which, well, seems to have more layers to it. For example, AWS feeling the need to say that “Our fundamental strategy around our partners is unchanged” might be reinterpreted as AWS’s tacit recognition of how “partners” now have a level of strategic leverage that needs to be negotiated with. Selipsky’s answer to “Well, are you going to compete with your ecosystem?” is clearly not an unequivocal “No”. Selipsky instead answers that “there are many, many thousand of explosive opportunities for partners in our ecosystem”, which I read as “Yes. Yes, we are going to compete with those in our ecosystem who are in areas that we consider to be strategic to AWS.” This is not me passing a value judgement on AWS — if you’re an AMZN bull then these interpretations might be reassuring if your long thesis depends on AWS exerting control over the SaaS layer [or they might not if your long thesis depends on AWS becoming a good ecosystem partner, it all depends].
It should be noted that both of these public explications on partner strategy by Kurian and Selipsky are from late 2021 (September and late November, respectively) which, along with Satya’s heavy emphasis on partner ecosystems throughout 2021, can be interpreted as hyperscaler recognition of the strategic need for more allegiances. Once again, these types of competitive dynamics are only possible because of cloud infrastructure’s oligopolistic structure — AWS would not be talking about a “partner ecosystem” in the same way if they had a sole monopoly over cloud infrastructure, nor would there be as many ISVs to populate the ecosystem in the first place.
This industry-wide frenemy dynamic has the effect of contributing to the modularization of cloud infrastructure from cloud-based software and, in my estimation, acts as one of the motivators for hyperscalers’ continuing efforts towards vertical integration — this is the focus of the next subsection.
On backwards integration. On forwards integration. Where everyone is going.
As discussed previously in Primer on the Economics of Cloud Computing, the decision by what are now known as the Cloud’s hyperscale infrastructure players to begin selling cloud services represented a decision to integrate backwards in the value chain for Internet-based services [those services being e-commerce for AMZN, search for GOOG, and software + pre-existing Windows-based server management business for MSFT] — from the very start the hyperscale cloud infrastructure business has been an exercise in backwards integration, an exercise that has continued further backwards as the hyperscalers have increased their scale.
Hyperscalers [and particularly AWS given their longer operating history] quickly realized that equipment vendors upstream from them were unable to meet their specific needs and that they themselves were beginning to get “large enough to support an in-house supplying unit large enough to reap all the economies of scale in producing the input internally”, and so they did. Here’s James Hamilton at a 2016 AWS re:Invent keynote talking about his realization that no switch gear manufacturer could meet their data center needs and that they had to simply do it themselves:
[Sidenote: Hamilton is talking about how their old DC switch gears didn’t switch from utility power to their backup generators in the event of certain circumstances that weren’t relevant or important to AWS. Their proprietary switch gear control system is called AMCOP. The entire keynote is worth watching if you’re interested in cloud infrastructure.]
In the same keynote, Hamilton talks about the operational benefits of their decision to integrate backwards into their networking equipment ...
[18:35] Hamilton: Second thing is, okay, you’ve got a problem, what do you do? Well if a pager goes off we can deal with it right now. It’s our data centers, we can go and figure it out, and we can fix it. We’ve got the code, we’ve got skilled individuals that work in that space — we can just fix it. If you’re calling another company, it’s going to be a long time. They have to duplicate something that happened at the scale I showed you in their test facilities — how many test facilities look like that? There’s not one on the planet.
... through their Nitro networking ASICs, namely that they don’t have to depend on third party suppliers in the event of problems, problems that these suppliers wouldn’t even necessarily be able to solve [and certainly not as effectively as AWS could internally] given their lack of experience with the scale and operational complexity within AWS’s datacenters. Furthermore, backwards integration into networking equipment through Nitro means that AWS have full control over the direction and speed [i.e., velocity] of their design for all the components of their interdependent architecture. At the AWS re:Invent 2020 Infrastructure Keynote, Peter DeSantis talks about how backwards integration allows them to “innovate more quickly” in networking ...
Charles Fine’s “Clockspeed” thesis is that this ability to dynamically choose and execute on design and engineering competences across a company’s supply/value chain is “the meta-core competency” and “the only lasting competency”:
In line with Christensen’s theory that integration/decommoditization and modularization/commoditization are reciprocal processes within a value chain, the disintegration of suppliers [upstream/backwards/down the stack] from AWS resulted in their modularization and altered the basis of competition in favor of [increasingly concentrated] buyers (i.e., the hyperscalers):
If you’re reading this section in conjunction with the other sections in Three Body, then the first quote can be read as “Look! We’ve modularized networking ASICs!” and Hamilton’s second quote should remind you of our discussion around Christensen’s thinking regarding modularity vs interdependence. Hamilton even invokes the same mainframe example, which isn’t a surprise given that, like I’ve said, an industry analysis of cloud computing is the natural extension of analyzing pre-cloud computing with respect to Christensen’s modularity theory. Hamilton’s observation that “as soon as you chop up these vertical stacks, you’ve got companies focused on every layer” was relevant in the Mainframe vs PC/Server era, is relevant in AWS’s backwards integration into networking (and other areas of infrastructure), and, if you’re of the opinion that the vertical stack on the proverbial chopping block now is AWS’s stack, continues to be relevant.
The failure of AWS’s equipment vendors to react to the needs of what would eventually be a significant and highly concentrated buyer segment through uncomfortable modularization is reminiscent of Intel’s failure to meet Apple’s iPhone chip needs over a decade ago ...
... leading to what we can now recognize as a situation of modularization at the processor level (thanks to TSMC’s enabling of fabless chip design) and reintegration at the SoC and SiP level, optimized [in terms of compute AND power consumption] for Apple’s proprietary software. The modularization of manufacturing [enabler = TSMC] and design [enabled = Big Tech, NVDA, AMD, chip start-ups, etc.] has led to the gradual disintegration of Intel into what it is now, which is, in many analysts’ opinions, a company better off separated in the way AMD spun off its manufacturing arm, GlobalFoundries, in 2009 [and the stock performance of AMD relative to INTC over the past decade speaks to the success of AMD’s decision]. Intel’s disintegration is in line with Christensen’s thinking regarding modularity-interdependence reciprocality within value chains — AAPL has found a defensible point of reintegration in software+hardware whereas INTC’s former point of integration in design+manufacturing is no longer defensible. This tangent matters to our discussion about the Cloud because Intel, which has historically had a monopoly in the CPU server chip market ...
... is now experiencing pressure from TSMC-enabled, fabless designers in the form of AMD and NVDA [NVDA dominates the GPU market for both servers and PCs and announced its ARM-based Grace server CPU effort in 2021] as well as the hyperscalers themselves, who are eschewing Intel’s x86 architecture and opting instead for ARM-based architectures which are both cheaper to license and more power efficient, resulting in lower server TCOs overall. Furthermore, since Intel has clearly fallen behind in advancing their 7nm (roughly equivalent to TSMC’s 5nm) process node and TSMC is already working on their 3nm process, hyperscalers also have a performance [not just price] argument to make in favor of continuing their design efforts as well — backwards integration is the only way to maximally ensure that the velocity of product improvement stays in sync across the stack.
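To gesture at why "cheaper to license and more power efficient" translates into a TCO argument, here's a deliberately crude sketch — every number below is an assumption of mine, not a vendor or hyperscaler figure:

```python
# Crude server TCO sketch (all numbers are my own assumptions, not vendor figures):
# hardware cost and power draw both feed total cost of ownership per unit of useful work.

def tco_per_unit_of_work(hw_cost, watts, relative_perf, years=4, kwh_price=0.10, pue=1.2):
    """Hardware cost plus lifetime energy cost, divided by relative performance."""
    energy_cost = watts * 24 * 365 * years / 1000 * kwh_price * pue
    return (hw_cost + energy_cost) / relative_perf

x86 = tco_per_unit_of_work(hw_cost=8000, watts=300, relative_perf=1.00)  # assumed baseline
arm = tco_per_unit_of_work(hw_cost=5000, watts=210, relative_perf=0.95)  # assumed: cheaper, cooler, slightly slower

print(f"ARM TCO per unit of work is roughly {arm / x86:.0%} of the x86 baseline")
```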
Therefore the situation for all the hyperscalers is one in which they design, manage, and self-service nearly every component in their infrastructure stack ...
... leading to the christening of The Tech Monopolies Go Vertical narrative which predicts the acceleration of Big Tech’s continued backwards integration into semiconductor design:
This is a prediction which seems to be coming true, particularly for the hyperscale cloud players who have no choice but to seek new points of integration in order to stay competitive within an ecosystem that continues to bifurcate profit pools between the infrastructure and software [and above] — hyperscalers earn 25-30% operating margins +/- [whatever pricing pressure results from f(industry capacity vs demand, semi R&D costs, intensity of intra-CSP competition for market share vs margins, residual factors)] on integrated IaaS+PaaS (where PaaS is more or less a commoditized complement) plus margins from full stack opportunities (forward integration, industry focus, value-add AI/ML services) in the steady state; ISVs and SaaS earn whatever margins are justified given lower barriers to entry with industry margins averaging higher than IaaS but with higher variance across participants.
The nearly complete integration of infrastructure along the value chain by hyperscalers implies a reciprocal process of modularization of those aspects that the hyperscalers have integrated. When Hamilton talked about “chopp[ing] up these vertical stacks” in reference to the server networking market, he was talking about modularization of networking equipment. This reciprocal process is also present in the CPU market, for in hyperscale data centers what is important is the entirety of the interdependent architecture that is the data center, and not any particular component.
In a market with dispersed buying power, the suppliers can dictate the standards/basis of competition and expect that their customers will have square holes for the suppliers’ square pegs. The rise of the hyperscalers has meant that buying power in the data center market has become concentrated, with the hyperscalers now dictating the terms and standards by which suppliers will compete for their business — AWS does not care about the sticker stats on Intel’s chips, they care about how well the modules perform as an interdependent whole. How well Intel’s CPUs or Cisco’s switches and routers adhere to the standards and specifications of AWS/Azure/GCP data centers matters more now than it did when buyers were more fragmented and suppliers could tell them to take it or leave it.
It also helps the hyperscalers that, even on the module level, proprietary designs are outperforming designs from suppliers.
AWS wants to convince their IaaS customers, through better price/performance, to rearchitect their software to be compatible with their ARM-based chips rather than Intel’s x86-based server chips. In many (most?) cases, customers can’t necessarily just flip a switch to go from x86 to ARM: low-level code needs to be recompiled, software packages need to be rewritten, and things need to be debugged and tested. This case study and one-year retrospective by Honeycomb, an AWS customer, as well as this 2020 re:Invent deep dive on Graviton2, get into some of the gory details of rearchitecting for AWS chips, details that I have no business writing about since I’m neither a developer nor software engineer.
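Still, to give a flavor of the most basic part of the problem, here's a minimal sketch (my own illustration, not anything from the Honeycomb write-up) of the kind of inventory check a team might run before a move to Graviton — pure-Python code ports as-is, while anything that ships compiled extensions needs an arm64 build or a recompile:

```python
# Minimal sketch: which installed Python dependencies ship native (compiled) modules
# and would therefore need arm64 wheels or a recompile when moving from x86 to ARM?
# The dependency list below is hypothetical.

import platform
import importlib.metadata

print("Running on:", platform.machine())  # e.g. 'x86_64' on Intel, 'aarch64' on Graviton

def ships_native_code(dist_name: str) -> bool:
    """Heuristic: does this installed package include compiled extension modules?"""
    dist = importlib.metadata.distribution(dist_name)
    return any(str(f).endswith((".so", ".pyd")) for f in (dist.files or []))

for pkg in ("numpy", "cryptography", "requests"):  # hypothetical dependency list
    try:
        verdict = "needs an arm64 build/recompile" if ships_native_code(pkg) else "pure Python, ports as-is"
    except importlib.metadata.PackageNotFoundError:
        verdict = "not installed"
    print(f"{pkg}: {verdict}")
```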
The necessity to rearchitect workloads for ARM helps to explain why AWS has promoted their EC2 M1 Mac instances so strongly — since Apple’s M1 is ARM-based, any rearchitecting that developers and engineers do to optimize for the M1 instances will translate over [at least to some degree] to AWS’s ARM-based Graviton instances. Of course, whatever developers and engineers rearchitect for AWS’s and Apple’s ARM chips will also be compatible with the ARM-based designs of other players [e.g., Microsoft’s newer, ARM-based Surface PCs] and vice-versa.
Before we move forward [haha get it? because forward integration?], there’s a point I want to make about the hyperscalers’ backwards integration that I haven’t yet seen articulated, which is that I’m not exactly sure if proprietary chip designs constitute true differentiators. Yes, the astronomical amounts of CapEx and R&D spend certainly constitute high barriers to entry into the Cloud infrastructure industry, but whether or not these investments translate into differentiation within the industry is something I’m not as convinced about. The price/performance and TCO measures for proprietary chip designs indicate a cost strategy rather than a differentiation strategy. That is, if Microsoft hypothetically comes out with ARM-based server chips that perform just as well on a price/performance basis as AWS’s Graviton chips, can AWS really claim that Graviton is a differentiator? I’m not so sure.
The reason it Feels Like They [i.e., the hyperscalers] Only Integrate Backwards is because turning general purpose cloud services into tailored solutions for industries/verticals isn’t as conspicuous a case of forwards integration as, say, a material goods manufacturer integrating into distribution might be, even though that’s exactly what the hyperscalers are doing by moving downstream the value chain and up the stack [reminder that moving downstream the value chain is isomorphic to moving up the tech stack]. Hyperscalers offering so-called “industry clouds” can be viewed as a manufacturer [of C/N/S services through their infrastructure networks] integrating forwards to better couple this manufacturing capacity [Reminder: “A datacenter is a factory that transforms and stores bits”] with marketing and distribution.
If “the Cloud” is a marketing term, then “Industry Cloud” is (marketing)$^2$ — for many industries, the “Industry Cloud” designation is a purely virtual construct. For example, two of Microsoft’s [currently; this number will inevitably rise] six industry clouds are Microsoft’s Healthcare Cloud and their Financial Services Cloud, but the distinction, from a materialist perspective, between these two “Industry Clouds” is nonexistent — both their Healthcare Cloud and Financial Services Cloud will run on the same pool of servers [granted, the FS cloud will ostensibly utilize a higher proportion of GPU-based instances for ML workloads] and transport data down the same sets of dark fiber, thereby operating on the same exact “Cloud”. The only significant differences between providing cloud services in the Cloud for Healthcare and the Cloud for Financial Services are how engineers and systems architects design for compliance and security [and even then, it’s not like the encryption algorithm cares if the encrypted string is my blood type or the password to my checking account] and how internal salespeople and external consultants (i.e., ”partners”; “global systems integrators”) with industry expertise market the services to companies in each industry.
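To belabor that bracketed aside in code for a second — this is a toy illustration using the third-party `cryptography` package, with obviously made-up payloads — the cipher runs the exact same code path regardless of which "industry" the bytes belong to:

```python
# Toy illustration: the same cipher, key handling, and ciphertext format apply whether
# the plaintext is a blood type or a banking password. (Third-party `cryptography`
# package; payloads are made up.)

from cryptography.fernet import Fernet

key = Fernet.generate_key()
cipher = Fernet(key)

healthcare_record = cipher.encrypt(b"blood_type=O-")
banking_secret = cipher.encrypt(b"checking_password=hunter2")

# The "industry" distinction lives in compliance, access control, and go-to-market,
# not in the math running in the datacenter.
print(healthcare_record[:16], banking_secret[:16])
```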
But, well, there actually are industry clouds after all, because marketing matters and, even if the engineers and solutions architects at the Big Three know that there aren’t “Healthcare” electrons or “Financial Services” electrons in NAND charge trap cells, the Healthcare VP in charge of allocating her department’s IT budget doesn’t care about any of that nonsense. She cares about providing healthcare services and the Cloud is merely a means to that end. This is essentially the evolution in internal culture/philosophy that Google Cloud brought Kurian in to catalyze — having the “best” product/service isn’t better than a “good enough” (in the Christensonian sense) product/service that the customer can actually understand.
If the hyperscale CSPs have been selling picks and shovels to tech-savvy, software-oriented organizations in the form of modular primitives, then they are now increasingly doing the mining and digging themselves for less tech-savvy, larger enterprises that don’t have enough miners and diggers because the hyperscalers and tech-first companies hired them all out of digging and mining school. And AWS/Azure/GCP have all converged on this realization to the point where it’d be difficult to identify the hyperscaler by its publicly expressed strategy around Industry Clouds. No, seriously, try guessing which hyperscaler said what.
Let’s Play Guess The Hyperscaler!
“It's great. I mean, we see our go-to-market, first of all, in three or four different lenses. One, we have shifted as an organization from talking about technology and shifted from technology to products to solutions. Customers want to have us understand their business, the opportunity to transform their business, and then provide solutions to their business problems as opposed to coming in and talking about how great our technology is. So that's been a big change in selling methodology and approach.
A second thing that we've done is we've organized our sales organization around industries because different industries have different needs. In financial services, for example, the use of data may be for fraud detection, may be for regulatory reporting, may be for financial market simulation. In contrast, in retail it may be about personalization, inventory management, supply-chain optimization, etcetera. So we've segmented our sales force by industry so that we can build greater competency in understanding customer needs.”
“And let me also just one more comment, when I say we're going after verticals, I want to make sure I'm also clear on this. It is partnering, with SIs, ISVs, basically our customer. We're not competing with our customers. There are other clouds that do that. They actually think, gosh, I should just go capture that margin, looks like a good opportunity. In our case, we're going to continue to partner. But I think there is a huge opportunity for us to go into that space and help enable that ecosystem.”
“At the same time, [REDACTED] says, [REDACTED] needs to continue to expand vertically as well, by providing more complete solutions for specific industries such as health care and manufacturing. To do so, the company is bringing its services to the edge of the network, including traditional data centers, the factory floor, and even the field, and establishing an operating model that integrates [REDACTED] more deeply into businesses in virtually every industry.”
“Not surprisingly, companies are also looking for more remote and edge use cases. The pandemic magnified the idea that the cloud must move to where the people and applications are – at the edge. More and more use cases are emerging such as cloud in cars, cloud in factories, cloud in farm equipment and so on. [REDACTED] wants its cloud to be everywhere. “We are going to work aggressively on ‘internet of things’ solutions, on ML solutions and these horizontal use cases and applications as well as bundle it together in ways that are attractive to solving customer problems,” said [REDACTED].”
Check your answers via this toggle Notion block.
Each of the three hyperscalers has made every point in each of these quotes at some recent (last 2 years-ish) conference/interview or another — they’re all ...
... and they all have explanations why they’re the best at all of these things ...
... because it is these five factors (integrated solutions, sales focus, partner ecosystem, deployment mode, AI/ML) that are common to all of their Industry Cloud strategies. The importance of AI/ML capabilities as a selling point for these hyperscalers cannot be overstated — the provision of AI/ML capabilities is the primary point of differentiation the hyperscalers can offer prospective enterprise customers, not only because the former possess world-class integrated hardware and software for AI/ML, but also because the latter lack the human capital needed to assess and utilize these new tools, relative to Big Tech companies who act as black holes of talent.
From the perspective of the interdependence-modularity framework, not only can AI/ML serve as a demand driver for increasingly commoditized basic C/N/S services, but the provisioning of AI/ML services as industry-focused “solutions” also enables hyperscalers to reintegrate commodified C/N/S with differentiated software [reminder that AI/ML is technically software], differentiated hardware [e.g., Google’s TPUs, AWS’s Trainium and Inferentia], standardized development frameworks [Google has TensorFlow, AWS has partnered with Facebook on PyTorch], and 1P/3P consulting that integrates this technology into the systems of enterprises [hence the term “systems integrators”], with the overall effect of creating a differentiated, integrated solution with pricing power.
An important point to make here is that the prospect of AI/ML capabilities (incl. integrated hard/software + frameworks/platforms + 1P/3P “expertise”) being a point of reintegration along the value chain for hyperscalers is contingent upon interactions with counteracting forces that are primarily manifesting as “multi-cloud” but ultimately stem from customers’ desire to avoid vendor lock-in. If, for example, Google Cloud can catalyze modularization of basic compute and storage with their multi-cloud initiatives and offer differentiated AI/ML services, then basic C/N/S won’t be reintegrated into the overall solution. That is, if GCP can incrementally invest more in differentiated hardware like TPUs and VCUs and offload the “commoditized” and functionally modularized basic compute and storage to AWS and Azure, then GCP is a successful disruptor:
You can see some evidence of Google’s modularity-oriented, multi-cloud strategy playing out in initiatives like BigQuery Omni ...
... which is powered by Google’s open source (i.e., Anthos is based on Kubernetes and Istio), deployment/cloud-agnostic application management platform, Anthos. Anthos enables multi-cloud “Infrastructure, container, and cluster management”, “Multicluster management”, “Configuration management”, “Migration”, and “Service management” — that Google released and champions this interoperability-enabling tool is understandable given that GCP has more to gain from modularization along the Cloud’s value chain than AWS or Azure. There’s also public anecdata about the actual realization of Google’s multi-cloud strategy in clients like Wells Fargo ...
... which utilizes Azure as their primary cloud provider but allocates AI/ML workloads to GCP, so it’s clear that this modularity-oriented strategy has a basis in reality. Obviously AWS and Azure are not going to sit idly by as Google has its cake and eats it too, so the question of whether or not GCP’s modular strategy takes root throughout the broader market is an open one. Enterprises, especially those in highly competitive industries where data can be maximally utilized, will prioritize making sure that the solution the CSP offers is an effective one before considering secondary vendor lock-in and multicloud concerns. If a company in an industry undergoing digital transformation has to choose between a relatively complex, multi-cloud AI/ML solution that requires technical expertise that isn’t internally available and a proven, fully integrated solution, company management will probably opt to stay competitive and clearly have someone to blame if something goes wrong.
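As an aside on what "deployment/cloud-agnostic" means mechanically: because Anthos standardizes on Kubernetes, the same API calls can target clusters running in any of the three clouds. Here's a minimal sketch using the official `kubernetes` Python client — the context names are hypothetical:

```python
# Minimal sketch of cloud-agnostic cluster access: the same Kubernetes client code
# works against clusters on GCP, AWS, or Azure; only the kubeconfig context changes.
# Context names below are hypothetical.

from kubernetes import client, config

for ctx in ("gke-analytics", "eks-frontend", "aks-batch"):
    config.load_kube_config(context=ctx)    # point the client at a cluster in a different cloud
    nodes = client.CoreV1Api().list_node()  # identical API call regardless of underlying cloud
    print(f"{ctx}: {len(nodes.items)} nodes")
```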
[Note: See this Notion block for gallery of select McKinsey slides on industry AI/ML]
Forwards integration by hyperscalers into providing industry solutions means clearly communicating the potential value from applying recurrent neural network techniques to repeat customer purchase data to the management team of a retail company, and explaining why your Retail Industry Cloud is better than the other [two] options because you don’t compete in retail like AWS [you’re GCP in this example] and because you can derive unique, applicable customer insights from your search business. There are lots of complex moving parts in [buzzword warning] Cloud-based, AI/ML-enabled digital transformation that aren’t the core competencies of the C-suite of many companies in non-tech industries/verticals — simplification into a value proposition that makes sense for the companies’ decision makers and resource allocators, via a combination of both technical and industry expertise, is a differentiated service that deserves differentiated margins.
AWS, Azure, and GCP each have good cases to make for why they’re able to provide superior Industry Cloud offerings compared to their competitors:
The potential for an integrated, full stack Microsoft offering that ties together Azure IaaS/PaaS, GitHub Copilot, modular datacenters, IoT edge devices, and HoloLens [along with myriad other modular elements] is, for me, the most interesting value proposition of the three. The idea of an on-site engineer controlling and live programming [ostensibly with the aid of a CTRL-Labs type interface device] factory robots and/or drones with an AR headset on seems pretty cool, if not a little bit scary. Whether Microsoft’s HoloLens represents forwards or backwards integration depends on our perspective of the value chain. At its most abstract, the Cloud’s value chain is really the value chain for decision making and information processing — thinking of the HoloLens as a human-machine interface that tightens the cybernetic loop allows it to be interpreted as being up the stack, despite its being hardware. Or at least that’s how I’m currently thinking about it.