The Centralizing Forces Within AI

Introduction

Artificial intelligence, broadly the ability of machines to perform cognitive tasks, has quickly become an essential technology in our day-to-day lives. The breakthrough came in 2017, when the transformer architecture was developed to solve the problem of neural machine translation: taking an input sequence and producing an output sequence. This enabled a neural network to take text, speech, or images as input, process them, and produce an output.

OpenAI and DeepMind pioneered this technology, and more recently OpenAI's GPT (Generative Pre-trained Transformer) models created a eureka moment for AI with the proliferation of LLM chatbots. GPT-1 was introduced in June 2018, featuring a model composed of twelve processing layers. It applied a specialized technique called "masked self-attention" across twelve attention heads, allowing it to understand and interpret language more effectively. Unlike simpler learning methods, GPT-1 employed the Adam optimization algorithm for more efficient learning, with its learning rate gradually increasing and then decreasing in a controlled manner. Overall, it contained 117 million adjustable elements, or parameters, which together define its language processing capabilities.
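
To make these numbers concrete, here is a rough sketch of the GPT-1 setup expressed as a configuration dictionary. The layer, head and parameter counts come from the description above; the hidden size and learning-rate schedule are as reported in the original GPT-1 paper, included here as assumptions rather than values stated in this article.

```python
# Rough summary of the GPT-1 setup described above, as a config dict.
gpt1_config = {
    "layers": 12,                 # twelve transformer decoder blocks
    "attention_heads": 12,        # "masked self-attention" across 12 heads
    "hidden_size": 768,           # width of each token representation (from the paper)
    "parameters": 117_000_000,    # total adjustable weights and biases
    "optimizer": "Adam",          # instead of plain stochastic gradient descent
    "lr_schedule": "linear warm-up, then decay",  # rate rises, then falls
}
print(gpt1_config)
```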

GPT-1 Architecture

Fast forward to March 14th, 2023, when OpenAI released GPT-4, which reportedly features approximately 1.8 trillion parameters spread across 120 layers. The increase in parameters and layers enhances its ability to understand and generate more nuanced and contextually relevant language, among other things. The more than 10,000x increase in the number of parameters in OpenAI's GPT models in under 5 years shows the astounding rate of innovation happening at the cutting edge of generative models.

[insert performance data]

Regulation

Running parallel to this innovation, and underpinning the AI stack, is regulation. Whenever a transformative technology comes to market, regulators introduce laws and processes so that they can better control it. Almost prophetically, we saw this play out in 1991 when Joe Biden, then chairman of the Senate Judiciary Committee, proposed a bill that would have effectively banned encryption in email. This potential ban on code and mathematics inspired Phil Zimmermann to build the open source Pretty Good Privacy (PGP) program, which enabled users to communicate securely by encrypting and decrypting messages, authenticating messages through digital signatures, and encrypting files. The United States Customs Service went on to open a criminal investigation into Zimmermann for allegedly violating the Arms Export Control Act, as it regarded his PGP software as a munition and wanted to limit citizens' and foreign entities' access to strong cryptography.

Reminiscent of the email encryption bill, on the 30th of October 2023 Joe Biden, now President of the United States, signed an Executive Order on "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence". The order invokes the Defense Production Act (DPA), affording the President a broad set of authorities to ensure the country has the resources necessary for national security. Broadly, the order seeks to establish new standards for AI safety and security. It imposes strict Know Your Customer (KYC) requirements on compute and data, whilst also requiring foreign entities to obtain authorization before training AI models on US soil or in US data centers. On top of this, permissionless AI models could effectively be capped at "tens of billions of parameters"; for reference, Mistral-7B-v0.1 has 7 billion parameters. We are also witnessing this play out with hardware, as the US recently prohibited the sale of semiconductor chips above a certain capability threshold to China, Russia and other nations.

Model Generation

On top of the centralizing regulatory pressures that artificial intelligence faces, there are a number of centralizing forces throughout the creation of a model. The creation of an AI model, particularly large-scale models like those used in natural language processing, typically follows three main phases: pre-training, fine-tuning, and inference. We will walk through each phase and the centralizing forces that are present:

Pre-Training

The pre-training phase is the initial step where the model learns a wide range of knowledge and skills from a large and diverse dataset. Before the advent of transformer-based architectures, top-performing neural models in natural language processing (NLP) primarily used supervised learning, which required vast quantities of curated and manually labeled data that resided mostly within corporate boundaries. This dependence on supervised learning restricted their applicability to datasets lacking extensive annotations and created a centralizing force, given the prohibitive costs of employing skilled researchers and developers to perform the labeling. During this pre-transformer era, supervised pre-training of models was dominated by centralized entities like Google, which had the resources to fund this work. The advent of transformer-based architectures, among other advancements, contributed significantly to the rise of unsupervised learning, particularly in the field of natural language processing, enabling models to be trained on datasets without predefined labels or annotated outcomes.

Data Collection & Preparation

The first step in pre-training a model is gathering the data that the model will be trained on. A large and diverse data set is collected from a vast corpus of text such as books, websites and articles. The data is then cleaned and processed.

Tokenization involves breaking down text data into smaller units, or tokens, which may range from words to parts of words, or even individual characters, based on the model's architecture. Following this, the data undergoes formatting to make it comprehensible to the model. This typically includes transforming the text into numerical values that correspond to the tokens, such as through the use of word embeddings.
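
As a minimal illustration of these two steps, the sketch below tokenizes a sentence with a toy whitespace tokenizer and maps the tokens to integer ids using a made-up vocabulary; production models use subword schemes such as byte-pair encoding instead.

```python
# Minimal sketch of tokenization and numerical encoding (toy example,
# not a production tokenizer; real models typically use BPE or similar).

text = "the model reads the text"

# 1. Tokenization: split the text into smaller units (here, whole words).
tokens = text.lower().split()          # ['the', 'model', 'reads', 'the', 'text']

# 2. Build a toy vocabulary mapping each unique token to an integer id.
vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}

# 3. Formatting: convert tokens into the numerical ids the model consumes.
token_ids = [vocab[tok] for tok in tokens]

print(vocab)      # {'model': 0, 'reads': 1, 'text': 2, 'the': 3}
print(token_ids)  # [3, 0, 1, 3, 2]
```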

Model Architecture

Selecting the right model architecture is a crucial step in the development process, tailored to the specific application at hand. For instance, transformer-based architectures are frequently chosen for language models due to their effectiveness in handling sequential data. Alongside choosing a framework, it's also important to set the initial parameters of the model, such as the weights within the neural network. These parameters serve as the starting point for training and will be fine-tuned to optimize the model's performance.
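
The sketch below shows what this looks like in practice for a miniature transformer-based language model in PyTorch: tokens and positions are embedded, passed through a small stack of attention layers whose weights start out random, and projected back to vocabulary logits. All sizes are illustrative assumptions, and a real GPT-style model would use a much larger, causally masked decoder stack rather than this tiny unmasked encoder.

```python
import torch
import torch.nn as nn

# Hypothetical miniature configuration for illustration only.
VOCAB_SIZE, D_MODEL, N_HEADS, N_LAYERS, MAX_LEN = 1000, 64, 4, 2, 128

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.token_emb = nn.Embedding(VOCAB_SIZE, D_MODEL)   # token -> vector
        self.pos_emb = nn.Embedding(MAX_LEN, D_MODEL)         # position -> vector
        block = nn.TransformerEncoderLayer(
            d_model=D_MODEL, nhead=N_HEADS, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=N_LAYERS)
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)          # back to vocab logits

    def forward(self, token_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token_emb(token_ids) + self.pos_emb(positions)
        x = self.blocks(x)     # parameters here are initialized randomly, then trained
        return self.lm_head(x)

model = TinyLM()
logits = model(torch.randint(0, VOCAB_SIZE, (2, 16)))  # batch of 2, 16 tokens each
print(logits.shape)  # torch.Size([2, 16, 1000])
```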

Training Procedure

Using the cleaned and processed data, the model is fed a large amount of text and learns patterns and relationships in order to make predictions about that text. During training, a few key components are used to dial in the parameters of the model so that it produces accurate results. The first is the learning algorithm:

The learning algorithm in neural network training prominently involves backpropagation, a fundamental method that propagates the error—defined as the difference between the predicted and actual outputs—back through the network layers. This identifies the contribution of each parameter, like weights, to the error. Backpropagation involves gradient calculation, where gradients of the error with respect to each parameter are computed. These gradients, essentially vectors, indicate the direction of the greatest increase of the error function.

Additionally, Stochastic Gradient Descent (SGD) is employed as an optimization algorithm to update the model's parameters, aiming to minimize the error. SGD updates parameters for each training example or small batches thereof, moving in the opposite direction of the error gradient. A critical aspect of SGD is the learning rate, a hyperparameter that influences the step size towards the loss function's minimum. A very high learning rate can cause overshooting of the minimum, while a very low rate can slow down the training process significantly.
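
To make the update rule concrete, here is a minimal sketch of stochastic gradient descent fitting a one-parameter model y = w·x to toy data where the true relationship is y = 2x. The data, learning rate and epoch count are arbitrary assumptions for illustration.

```python
import numpy as np

# Toy stochastic gradient descent on a single parameter w, with squared error.
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=100)
ys = 2.0 * xs                        # "true" relationship the model should learn

w = 0.0              # initial parameter
learning_rate = 0.1  # hyperparameter controlling the step size

for epoch in range(50):
    for x, y in zip(xs, ys):         # one example at a time (stochastic)
        error = w * x - y            # prediction minus target
        grad = 2 * error * x         # derivative of the squared error w.r.t. w
        w -= learning_rate * grad    # step in the opposite direction of the gradient

print(round(w, 4))  # approaches 2.0; too high a rate overshoots, too low crawls
```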

Furthermore, the Adam optimizer, an enhancement over SGD, is used for its efficiency in handling separate learning rates for each parameter. It adjusts these rates based on the first moment (average of recent gradients) and the second moment (square of these gradients). Adam's popularity stems from its ability to achieve better results more quickly, making it ideal for large-scale problems with extensive datasets or numerous parameters.
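
The sketch below shows a single Adam update for one parameter, making the first and second moment estimates and the bias correction explicit. The hyperparameter values are the commonly used defaults, and the gradient is a placeholder number.

```python
import numpy as np

# One Adam update step for a single parameter (default hyperparameters).
beta1, beta2, lr, eps = 0.9, 0.999, 1e-3, 1e-8
m = v = 0.0          # first and second moment estimates
w, t = 0.5, 0        # parameter value and step counter

def adam_step(w, grad, m, v, t):
    t += 1
    m = beta1 * m + (1 - beta1) * grad        # first moment: average of recent gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter effective step size
    return w, m, v, t

w, m, v, t = adam_step(w, grad=0.8, m=m, v=v, t=t)
print(w, m, v)
```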

The second key procedure we use in the training phase is the loss function, also known as a cost function. It plays a crucial role in supervised learning by quantifying the difference between the expected output and the model's predictions. It serves as a measure of error for the training algorithm to minimize. Common loss functions include Mean Squared Error (MSE), typically used in regression problems, where it computes the average of the squares of the differences between actual and predicted values. In classification tasks, Cross-Entropy Loss is often employed. This function measures the performance of a classification model by evaluating the probability output between 0 and 1. During the training process, the model generates predictions, the loss function assesses the error, and the optimization algorithm subsequently updates the model's parameters to reduce this loss. The choice of loss function is pivotal, significantly influencing the training's efficacy and the model's ultimate performance. It must be carefully selected to align with the specific objectives and nature of the problem at hand.
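
The toy calculations below illustrate both loss functions on hand-picked numbers: Mean Squared Error on a small regression example, and Cross-Entropy on a single three-class prediction.

```python
import numpy as np

# Mean Squared Error (regression): average of squared differences.
y_true = np.array([3.0, 5.0, 2.5])
y_pred = np.array([2.5, 5.0, 3.0])
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # ~0.167

# Cross-Entropy (classification): penalizes confident wrong probabilities.
p_true = np.array([1, 0, 0])          # one-hot encoding of the true class
p_pred = np.array([0.7, 0.2, 0.1])    # model's predicted probabilities
cross_entropy = -np.sum(p_true * np.log(p_pred))
print(cross_entropy)  # ~0.357
```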

Resource Allocation

Resource allocation during the pre-training phase of AI models, particularly for large-scale models like those in the GPT series, necessitates a careful and substantial deployment of both computational and human resources. This phase is pivotal as it establishes the groundwork for the model's eventual performance and capabilities. The pre-training of these complex AI models demands an extensive amount of computational power, primarily sourced from Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs), which are specialized for handling the intense parallel processing tasks typical in machine learning. To address the considerable computational needs, a distributed computing approach is often adopted, utilizing multiple GPUs or TPUs across various machines or data centers in tandem to process the vast amounts of training data and update the model parameters efficiently.

Moreover, the significant volume of data required for pre-training, potentially reaching petabytes, necessitates robust storage solutions for both the raw and processed data formats. The energy consumption during this phase is notably high due to the prolonged operation of high-performance computing hardware, prompting a need to optimize computational resource use to strike a balance between performance, cost, and environmental impact. The financial aspects also play a critical role, as the acquisition and maintenance of necessary hardware, alongside the electricity for powering and cooling these devices, entail substantial costs. Furthermore, many organizations turn to cloud computing services to access the needed computational resources, adding a variable cost based on usage rates. In fact, when asked at an MIT event, Sam Altman said that GPT-4 cost "more than $100 million" to train.

Fine-Tuning

The next stage in the creation of a model is fine-tuning. The pre-trained model undergoes adaptation to excel in specific tasks or with certain datasets that were not part of its initial training regimen. This phase takes advantage of the broad capabilities acquired during pre-training, refining them for superior performance in more focused applications, such as text classification, sentiment analysis, or question-answering. Fine-tuning involves preparing a smaller, task-specific dataset that reflects the nuances of the intended application, modifying the model's architecture to suit the task's unique output requirements, and adjusting parameters, including adopting a lower learning rate for more precise, targeted optimization. The model is then retrained on this curated dataset, which may involve training only the newly adjusted layers or the entire model, depending on the task's demands.
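
As a rough sketch of what this looks like in code, the hypothetical PyTorch snippet below freezes a stand-in pre-trained backbone, attaches a new task-specific classification head, and trains it with a lower learning rate; the layer shapes, dataset and class count are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical fine-tuning sketch: freeze the "pre-trained" backbone and train
# only a new head for a 3-class task, using a low learning rate.
backbone = nn.Sequential(            # stand-in for a pre-trained encoder
    nn.Linear(768, 768), nn.ReLU(),
    nn.Linear(768, 768), nn.ReLU(),
)
classifier_head = nn.Linear(768, 3)  # new task-specific output layer

for param in backbone.parameters():
    param.requires_grad = False      # keep the pre-trained knowledge fixed

model = nn.Sequential(backbone, classifier_head)
optimizer = torch.optim.Adam(classifier_head.parameters(), lr=1e-5)  # lower LR
loss_fn = nn.CrossEntropyLoss()

features = torch.randn(8, 768)              # placeholder task-specific batch
labels = torch.randint(0, 3, (8,))
loss = loss_fn(model(features), labels)
loss.backward()                             # gradients flow only to the new head
optimizer.step()
print(float(loss))
```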

Following the initial pre-training and fine-tuning phases, models, particularly those akin to OpenAI's GPT-3, may undergo Reinforcement Learning from Human Feedback (RLHF) as an additional refinement step. This advanced training approach integrates supervised fine-tuning with reward modeling and reinforcement learning, leveraging human feedback to steer the model towards outputs that align with human preferences and judgments. This process begins with fine-tuning on a dataset of input-output pairs to guide the model towards expected outcomes. Human annotators then assess the model's outputs, providing feedback that helps to model rewards based on human preferences. A reward model is subsequently developed to predict these human-given scores, guiding reinforcement learning to optimize the AI model's outputs for more favorable human feedback. RLHF thus represents a sophisticated phase in AI training, aimed at aligning model behavior more closely with human expectations and making it more effective in complex decision-making scenarios.
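
A heavily simplified sketch of the reward-modeling step is shown below: given pairs of responses where human annotators preferred one over the other, a scalar reward model is trained to score the preferred response higher via a pairwise ranking loss. The embeddings, model shape and loss are illustrative assumptions rather than any lab's actual implementation.

```python
import torch
import torch.nn as nn

# Toy reward model: maps a response embedding to a scalar score, trained so
# that human-preferred responses receive higher scores than rejected ones.
reward_model = nn.Linear(768, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

preferred = torch.randn(16, 768)     # placeholder embeddings of chosen responses
rejected = torch.randn(16, 768)      # placeholder embeddings of rejected responses

r_chosen = reward_model(preferred)
r_rejected = reward_model(rejected)
# Pairwise ranking loss: maximize the margin between chosen and rejected scores.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
print(float(loss))
```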

Inference

The inference stage marks the point where the model, after undergoing training and possible fine-tuning, is applied to make predictions or decisions on new, unseen data. This stage harnesses the model's learned knowledge to address real-world problems across various domains. The process begins with preparing the input data to match the training format, involving normalization, resizing, or tokenizing steps, followed by loading the trained model into the deployment environment, whether it be a server, cloud, or edge devices. The model then processes the input to generate outputs, such as class labels, numerical values, or sequences of tokens, tailored to its specific task. Inference can be categorized into batch and real-time, with the former processing data in large volumes where latency is less critical, and the latter providing immediate feedback, crucial for interactive applications. Performance during inference is gauged by latency, throughput, and efficiency—key factors that influence the deployment strategy, choosing between edge computing for local processing and cloud computing for scalable resources. However, challenges such as model updating, resource constraints, and ensuring security and privacy remain paramount.
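
The sketch below illustrates the inference stage using the toy `TinyLM` model defined earlier: trained weights would be loaded from disk, the model is switched to evaluation mode, and it runs without gradient tracking on a single real-time request and on a larger batch. The file path and input shapes are assumptions.

```python
import torch

# Inference sketch reusing the TinyLM architecture from the earlier example.
model = TinyLM()
# model.load_state_dict(torch.load("tiny_lm.pt"))  # hypothetical trained weights
model.eval()                                       # disable training-only behavior

with torch.no_grad():                              # no gradients needed at inference
    # Real-time: a single prompt, where latency matters most.
    single = model(torch.randint(0, 1000, (1, 16)))
    # Batch: many inputs at once, where throughput matters more than latency.
    batch = model(torch.randint(0, 1000, (64, 16)))

print(single.shape, batch.shape)
```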

Centralizing Forces Within Model Generation

In the process of creating an AI model, numerous centralizing and monopolistic forces come into play. The significant resources needed for every phase of development pave the way for economies of scale, meaning that efficiency improvements tend to concentrate superior models in the hands of a select few corporations. Below, we detail the diverse mechanisms through which AI centralization occurs:

Pre-Training

As we have seen, the pre-training phase of a model combines a few things: data, training and resources. When it comes to the data collection, there are a number of issues:

Access to data

The pre-training phase requires a large corpus of data, typically from books, articles, corporate databases and from scraping the internet. As we discussed, when supervised learning dominated as a training technique, large companies like Google could create the best models because of the huge amount of data they were able to store from users interacting with their search engine. We see a similar centralizing and monopolistic force throughout AI today. Large companies such as Microsoft, Google and OpenAI have access to the best data through data partnerships, in-house user data or the infrastructure required to run an industrial internet-scraping pipeline. For example, leaked documents suggest OpenAI is preparing to purchase user data from Tumblr and WordPress, at the expense of users' privacy.

The top 1% of x networks facilitates x proportion of the total traffic/volume. Source: Chris Dixon's "Read Write Own".

Transformers enabled unsupervised learning, but scraping web data is no easy feat: websites typically ban scraper IP addresses and user agents, and employ rate limits and CAPTCHA services.

AI companies deploy a variety of tactics to navigate around the barriers websites put in place to obstruct data collection efforts. One common method involves utilizing a diverse array of IP addresses to sidestep IP-based rate limiting or outright bans, often achieved through the use of proxy servers or VPN services. Additionally, altering the user-agent string in HTTP requests—a technique known as User-Agent Spoofing—allows these companies to emulate different browsers or devices, thereby potentially circumventing blocks aimed at user-agent strings typically associated with automated bots or scrapers. Furthermore, to overcome CAPTCHA challenges, which are frequently employed by websites to prevent automated data collection, some AI companies turn to CAPTCHA solving services. These services are designed to decode CAPTCHAs, enabling uninterrupted access to the site's data, albeit raising questions about the ethical implications of such practices.

Beyond their ability to gather large amounts of data, big corporations also have the financial means to build strong legal teams. These teams work tirelessly to help them collect data from the internet and through partnerships, as well as to obtain patents. We can see this happening today with OpenAI and Microsoft, who are in a legal dispute with The New York Times. The issue is over the use of The New York Times' articles to train the ChatGPT models without permission.

Patent Centralization. Source: Statista

Closed source data

There are also ethical and bias considerations involved in training a model. All data has some inherent bias attached to it; since AI models learn patterns, associations, and correlations from their training data, any inherent biases in this data can be absorbed and perpetuated by the model. Common biases found in AI models stem from sample bias, measurement bias and historical bias, and can lead to models producing poor or unintended results. For example, Amazon trained an automated recruitment model designed to assess candidates based on their fit for different technical positions. The model developed its criteria for evaluating suitability by analyzing resumes from past applicants. However, since the dataset it was trained on consisted predominantly of male resumes, the model learned to penalize resumes that included the word "women".

Resource allocation

As we have discussed, pre-training foundation models requires vast cycles of GPU compute, costing hundreds of millions of dollars for the top models (in 2022, OpenAI reportedly lost $540 million, driven largely by the costs of training and running its GPT models). Demand for accessible and usable GPUs vastly outstrips current supply, and this has led to the consolidation of model pre-training within the largest and best-funded tech companies (FAANG, OpenAI, Anthropic) and their data centers.

Although corporations keep details of their data centers and operations somewhat secret for a variety of reasons (security, regulatory compliance, customer data protection & competitive advantages), we can see that the top 5 cloud & data center providers dominate the market.

Model quality improves roughly logarithmically with training scale, so, in general, the best models are the ones trained with the most GPU compute cycles. A very centralizing force within the pre-training of models is therefore the economies of scale and productivity gains that large incumbent tech and data companies enjoy, and we are seeing this play out with OpenAI, Google, Amazon, Microsoft and Meta dominating.

Source: Epochai

The concentration of the power to develop transformative artificial intelligence technologies within a small number of large corporations, such as OpenAI, Google, and Microsoft, prompts significant concerns. As articulated by Facebook's first president, the primary objective of these platforms is to capture and retain as much of our time and conscious attention as possible. This reveals a fundamental misalignment of incentives when interacting with Web2 companies, an issue we have begrudgingly accepted due to the perceived benefits their services bring to our lives. However, transplanting this oligopolistic Web2 model onto a technology that is far more influential than social media, and that holds the capacity to profoundly influence our decisions and experiences, presents a concerning scenario. A perfect example of this is the Cambridge Analytica scandal of the 2010s. The British firm harvested personal data from up to 87 million Facebook users without authorization in order to build a profile of each user before serving them targeted political ads to influence elections. This data aided the 2016 U.S. presidential campaigns of Ted Cruz and Donald Trump, and was implicated in interference in the Brexit referendum. If such a powerful tool as AI falls under the control of a few dominant players, it risks amplifying the potential for misuse and manipulation, raising ethical, societal, and governance issues.

GPU Supply-Side Centralisation

The resultant effect of models scaling logarithmically with training size is that demand for GPU compute grows exponentially to achieve linear gains in model quality. We have certainly seen this play out over the last two years, with demand for GPU compute skyrocketing since the launch of ChatGPT and the start of the AI race. If we take Nvidia's revenue as a proxy for GPU demand, Nvidia's quarterly revenue increased 405% from Q4 2022 to Q4 2023.

Source: Nvidia reports

The production of GPUs and microchips for AI training is an extremely complex and expensive process with high barriers to entry. As such, there are few companies capable of producing hardware that delivers the performance companies like OpenAI require to train their GPT models. The largest of these semiconductor and GPU companies is Nvidia, holding approximately 80% of the global market share in GPU semiconductor chips. Founded in 1993 to create graphics hardware for video games, Nvidia quickly became a pioneer in high-end GPUs and made its seminal step into AI in 2006 with the launch of its Compute Unified Device Architecture (CUDA), a platform for general-purpose GPU parallel processing.

The hardware used to train a model is vital, and, as we have discussed, its costs are extremely high. Compounding the barriers to entry, access to this hardware is extremely limited, with only top tech companies receiving their orders in a timely manner. Normal people like you or me cannot simply buy the latest and greatest H100 Tensor Core GPU from Nvidia. Nvidia works directly with Microsoft, Amazon, Google and co. to facilitate large bulk orders of GPUs, leaving regular people at the bottom of the waitlist. We have seen a number of initiatives between chip manufacturers and large corporations to create the infrastructure required to train and provide inference for these models, for example:

  1. OpenAI - In 2020, Microsoft built a supercomputer exclusively for OpenAI to train its GPT models. The supercomputer developed for OpenAI is a single system with more than 285,000 CPU cores, 10,000 Nvidia V100 and A100 GPUs, and 400 gigabits per second of network connectivity for each GPU server.

  2. Microsoft - In 2022, Nvidia partnered with Microsoft to create a 1,123,200-core supercomputer utilizing Microsoft's Azure cloud technology. Eagle is now the 3rd largest supercomputer in the world, with a maximum performance of 561 petaFLOPS generated from 14,400 Nvidia H100 GPUs and Intel's Xeon Platinum 8480C 48-core CPUs.

  3. Google - In 2023, Google announced the A3 supercomputer, purpose-built for AI & ML models. A3 combines Nvidia's H100 GPUs with Google's custom-designed 200 Gbps Infrastructure Processing Units (IPUs), allowing the A3 to host up to 26,000 H100 GPUs.

  4. Meta - By the end of 2024, Meta expects to operate some 350,000 Nvidia H100 GPUs, and the equivalent of 600,000 H100s of compute once older GPUs, such as the Nvidia A100s used to train Meta's LLaMA models, are included.

Source: Statista

The impact of these feats of engineering on model training is immediately apparent. The large number of GPUs allows for parallel processing, enabling AI training to be greatly sped up and far larger models to be created. Take Microsoft's Eagle supercomputer: using the MLPerf benchmarking suite, this system completed the GPT-3 (175-billion-parameter) training benchmark in just 4 minutes. The 10,752 H100 GPUs significantly speed up the process by leveraging their parallel processing capabilities, specialized Tensor Cores for deep learning acceleration, and high-speed interconnects like NVLink and NVSwitch. These GPUs' large memory bandwidth and capacity, along with optimized CUDA and AI frameworks, facilitate efficient data handling and computation. Consequently, this setup enables distributed training strategies, allowing different parts of the model to be processed simultaneously, which drastically reduces training times for complex AI models.

Scale records on the model GPT-3 (175 billion parameters) from MLPerf Training v3.0 in June 2023 (3.0-2003) and Azure on MLPerf Training v3.1 in November 2023 (3.1-2002). Source: Microsoft

We have clearly established that the powerhouse behind the training of these large models is compute, primarily in the form of GPUs. The centralizing forces we run into here are twofold:

  1. Exclusivity - Nvidia GPUs have a huge waitlist & monopolistic corporations bulk order GPUs with priority over smaller orders / individuals.

  2. Costs - The sheer cost of these GPU configurations mean only a small set of entities worldwide can train these models. For reference, each Nvidia H100 costs anywhere between $30,000 to $40,000, meaning Meta’s 600,000 H100 equivalent compute infrastructure will cost between $10.5 Billion and $24 Billion.

Supercomputer Geographical Centralization. Source: Wikipedia TOP500

Amid the consolidation of computational power by major corporations, there's a parallel and strategic push by leading nations to enhance their computational capabilities, mirroring the intense competition of the Cold War's nuclear arms race. These countries are crafting and implementing comprehensive AI strategies, accompanied by a suite of regulatory measures aimed at securing technological supremacy. Notably, a Presidential executive order now mandates that foreign entities must obtain authorization to train AI models on U.S. territory. Additionally, export restrictions on microchips are set to hinder China's efforts to expand its supercomputing infrastructure, showcasing the geopolitical maneuvers to maintain and control the advancement of critical technologies.

Chip Manufacturing

Whilst Nvidia and other semiconductor companies are at the cutting edge of chip design, they outsource all of their manufacturing to other corporations. Taiwan serves as the global hub for microchip production, accounting for more than 60% of the world's semiconductors and over 90% of the most sophisticated ones. The majority of these chips are produced by the Taiwan Semiconductor Manufacturing Company (TSMC), the sole manufacturer of the most advanced semiconductors. Nvidia's partnership with TSMC is fundamental to the company's success and to the efficient production of H100 GPUs. TSMC distinguishes itself in the semiconductor industry with its advanced chip packaging patents, utilizing high-density packaging technology that stacks chips in three dimensions to enhance performance. This technology is crucial for producing chips designed for intensive data processing tasks, such as AI, enabling faster operation.

Whilst microchip production is currently running at maximum capacity, there are risks to production from increased military threats from China towards Taiwan, a democratic island claimed by Beijing despite Taipei's vehement opposition. Geopolitical tensions in the region have heightened, and worldwide we are seeing AI tensions rise, with the US banning certain microchip exports to China so as not to strengthen China's AI capabilities and military. Should China move on Taiwan, it could strategically position itself to dominate microchip manufacturing and, with it, the AI race.

Fine-Tuning & Closed-source Models

In the fine-tuning stage the model is trained on new, specific datasets, and the internal configurations that allow the model to make predictions or decisions based on input data are altered. These internal configurations are called parameters. In neural networks, 'weights' are coefficients applied to input data that determine the connection strength between units across different layers of the model, and they are adjusted throughout training to minimize prediction errors. 'Biases' are constants added before the activation function; they ensure the model can produce meaningful outputs even when inputs are zero, facilitating pattern recognition by allowing shifts in where the activation function fires.
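
The toy single-neuron calculation below illustrates these two kinds of parameters: the weights scale each input, and the bias shifts the result before the activation, so the neuron can still produce a non-zero output when every input is zero. The numbers are arbitrary.

```python
import numpy as np

# Toy single neuron: weights scale the inputs, the bias shifts the activation.
def neuron(inputs, weights, bias):
    pre_activation = np.dot(weights, inputs) + bias   # weighted sum plus bias
    return max(0.0, pre_activation)                   # ReLU activation

inputs = np.array([0.0, 0.0])          # even with all-zero inputs...
weights = np.array([0.4, -0.2])        # learned connection strengths
bias = 0.1                             # learned constant offset

print(neuron(inputs, weights, bias))   # 0.1 -- the bias alone drives the output
```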

Closed-source models like OpenAI's GPT series maintain the confidentiality of their training data and model architecture, meaning the specific configurations of their parameters remain exclusive. The owner of this model retains complete control over how it is used, developed and deployed which can lead to a number of centralizing forces within the fine-tuning stage of a model:

  1. Censorship - Owners can decide what types of content the model generates or processes. They can implement filters that block certain topics, keywords, or ideas from being produced or recognized by the model. This could be used to avoid controversial subjects, comply with legal regulations, or align with the company's ethical guidelines or business interests. Since the launch of ChatGPT, its outputs have become increasingly censored and less useful. An extreme case of censorship of these models is showcased in China, where WeChat conversations with Robot (built atop OpenAI's foundational model) will not answer questions such as "What is Taiwan?" or allow users to ask questions about Xi Jinping. In fact, through adversarial bypass techniques, a WSJ reporter was able to get Robot to admit that it was programmed to avoid discussing "politically sensitive content about the Chinese government or Communist Party of China."

  2. Bias - In neural networks, the role of weights and biases is pivotal, yet their influence can inadvertently introduce bias, particularly if the training data lacks diversity. Weights, by adjusting the strength of connections between neurons, may disproportionately highlight or ignore certain features, potentially leading to a bias of omission where critical information or patterns in underrepresented data are overlooked. Similarly, biases, set to enhance learning capabilities, might predispose the model to favor certain data types if not calibrated to reflect a broad spectrum of inputs. The closed source nature of these models can cause the model to neglect important patterns from specific groups or scenarios, skewing predictions and perpetuating biases in the model's output, meaning certain perspectives, voices or information are excluded or misrepresented. A good example of bias and censorship by the model owner is Google’s latest and greatest LLM, Gemini.

  3. Verifiability - In a closed-source environment, users cannot confirm whether the claimed version of a model, such as ChatGPT 4 versus ChatGPT 3, is actually being used. This is because the underlying model architecture, parameters, and training data are not accessible for external review. Such opacity makes it difficult to ascertain whether the latest advancements or features are indeed present, or whether older technologies are being passed off as newer versions, potentially affecting the quality and capabilities of the AI service received. For example, when AI models are used to assess an applicant's creditworthiness for a loan, how can the applicant be sure that the model run on their application is the same one run for other applicants? Or how can we be sure the model only used the inputs it was supposed to use?

  4. Dependency, lock-in and stagnation - Entities that rely on closed source AI platforms or models find themselves dependent on the corporations that maintain these services, leading to a monopolistic concentration of power that stifles open innovation. This dependency arises because the owning corporation can, at any moment, restrict access or alter the model, directly impacting those who build upon it. A historical perspective reveals numerous instances of this dynamic: Facebook, which initially embraced open development with its public APIs to foster innovation, notably restricted access for applications like Vine as they gained traction. Similarly, Voxer, a messaging app that gained popularity in 2012 for allowing users to connect with their Facebook friends, lost its access to Facebook's 'Find Friends' feature. This pattern is not exclusive to Facebook; many networks and platforms begin with an open-source or open-innovation ethos only to later prioritize shareholder value, often at the expense of their user base. For-profit corporations eventually impose take rates in order to meet their stated goal of creating shareholder value; for example, Apple's App Store takes a 30% fee on the revenue generated from apps. Another example is Twitter: despite its original commitment to openness and interoperability with the RSS protocol, it eventually prioritized its centralized database, disconnecting from RSS in 2013 and taking with it users' data ownership and social graphs. Amazon has also been accused of using its internal data to replicate and prioritize its own products over those of other sellers. These examples underscore a trend where platforms evolve from open ecosystems to more controlled, centralized models, impacting both innovation and the broader digital community.

  5. Privacy - The owners of these centralized models, large corporations such as OpenAI, retain all rights to use the prompt and user data to better train their models. This greatly inhibits user privacy. For example, Samsung employees inadvertently exposed highly confidential information by utilizing ChatGPT for assistance with their projects. The organization permitted its semiconductor division engineers to use this AI tool for debugging source code. However, this led to the accidental disclosure of proprietary information, including the source code of an upcoming software, internal discussion notes, and details about their hardware. Given that ChatGPT collects and uses the data inputted into it for its learning processes, Samsung's trade secrets have unintentionally been shared with OpenAI.

Reinforcement Learning from Human Feedback (RLHF)

Reinforcement Learning from Human Feedback (RLHF) integrates supervised fine-tuning, reward modeling, and reinforcement learning, all underpinned by human feedback. In this approach, human evaluators critically assess the AI's outputs, assigning ratings that facilitate the development of a reward model attuned to human preferences. This process necessitates high-quality human input, highlighting the importance of skilled labor in refining these models. Typically, this expertise tends to be concentrated within a few organizations capable of offering competitive compensation for such specialized tasks. Consequently, corporations with substantial resources are often in a better position to enhance their models, leveraging top talent in the field. This dynamic presents challenges for open-source projects, which may struggle to attract the necessary human labor for feedback without comparable funding or revenue streams. The result is a landscape where resource-rich entities are more likely to advance their AI capabilities, underscoring the need for innovative solutions to support diverse contributions in the development of AI technologies.

Inference

To effectively deploy machine learning (ML) or artificial intelligence (AI) models for user applications, it is imperative to ensure these models are equipped to manage real-world data inputs and provide precise, timely predictions or analyses. This necessitates careful deliberation on two pivotal aspects: the choice of deployment platform and the infrastructure requirements.

Deployment Platform

The deployment platform serves as the foundation for hosting the model, dictating its accessibility, performance, and scalability. Options range from on-premises servers, offering heightened control over data security and privacy, to cloud-based solutions that provide flexible, scalable environments capable of adapting to fluctuating demand. Additionally, edge computing presents a viable alternative for applications requiring real-time processing, minimizing latency by bringing computation closer to the data source. As with the pre-training stage, we run into similar centralisation problems when deploying the model for real world use:

Infrastructure centralisation - The majority of models are deployed on top of high-performance cloud infrastructure, of which there are not many options worldwide. As highlighted earlier, a small set of corporations have the facilities to process inference for these high parameter models and the majority are located in the US (as of 2023, 58.6% of all data centers were located in the USA). This is particularly relevant in light of the presidential executive order on AI and the EU AI act as it could greatly limit the number of countries that are able to train and provide inference for complex AI models.

Source: Statista

Costs - Another centralizing force within the inference stage is the significant costs involved in deploying these models on one's own servers, cloud infrastructure, or through edge computing. OpenAI has partnered with Microsoft to utilize Microsoft's Azure cloud infrastructure for serving its models. Dylan Patel, the chief analyst at consulting firm SemiAnalysis, estimated that OpenAI's server costs for enabling inference for GPT-3 were $700,000 per day. Importantly, this was when OpenAI was offering inference for their 175 billion parameter model, so, all things being equal, we would expect this number to have escalated well into the seven figures today. In addition to the geographical and jurisdictional centralization of these data centers, we also observe this necessary infrastructure being consolidated within a few corporations (84.9% of cloud revenues were generated by four companies in 2023).

Source: Amazon, Microsoft, Google, Equinix, Statista

Centralized Frontends

Centralized hosting of frontends involves delivering the user-interface components of websites and web applications from one primary location or a select few data centers managed by a handful of service providers. This method is widely adopted for rolling out web applications, particularly those leveraging AI technologies to offer dynamic content and interactive user experiences. The frontend is therefore susceptible to take-downs through regulations or through changes in the policies of the service providers. We have seen this play out in Mainland China as citizens are blocked from interacting with the frontends of the popular AI interfaces such as ChatGPT and Hugging Face.

Conclusion

In conclusion, the status quo for AI suffers from a number of centralizing and monopolistic forces that enable a minority of the world's largest entities to control and distribute models to the population. We have seen from the failures of Web2 that the misalignment of incentives between user and corporation poses a dire threat to our freedoms, privacy and right to use AI. The impending regulation surrounding AI and the flourishing open source space show that we are at a pivotal moment in the advancement of the technology, and that we should do everything in our power to ensure it remains free and open source for all to use. In our next blog we will cover how crypto, at the intersection with AI, is enabling these free and open source systems to scale, improving the status quo for both AI and crypto.
