Blockchain-Based Accountability for LLMs

August 20th, 2023

The rapid rise of large language models (LLMs) has revolutionized industries, from customer service to content creation. However, as LLMs become increasingly integrated into our daily lives, concerns about transparency, accountability, and ethical behavior have emerged. In response, this post outlines a hypothetical framework that uses blockchain technology to regulate LLMs.

At the core of this blueprint is a blockchain-based system that records information about interactions with an LLM. Data such as the prompt, response, and model parameters would be stored for each request, alongside other metadata such as timestamps and user info. The system consists of three main components: the LLM, the smart contract, and the blockchain. The smart contract is responsible for interacting with the LLM and recording those interactions on the blockchain.
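As a rough sketch, the record stored for each request might look like the following. The class name, the field names, and the SHA-256 digest helper are illustrative assumptions, not part of any existing standard.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InteractionRecord:
    """One logged LLM request. All field names are hypothetical."""
    prompt: str
    response: str
    model_params: dict   # e.g. temperature, max tokens
    timestamp: float     # Unix time of the request
    user_id: str         # whatever user info the deployment collects

    def digest(self) -> str:
        """SHA-256 over a canonical (sorted-key) JSON encoding of the record."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


record = InteractionRecord(
    prompt="What is a smart contract?",
    response="A program stored on a blockchain.",
    model_params={"temperature": 0.7},
    timestamp=1692489600.0,
    user_id="user-123",
)
print(record.digest())
```

The digest gives a short, fixed-size fingerprint of the record, which is the kind of value a chain could store or reference even if the full text lives elsewhere.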

Deployments

Corporate LLMs with Limited Public Access: Corporations like OpenAI could use the system to record interactions with their LLMs, such as GPT. The government could regulate these models by mandating the use of smart contracts to record all prompts and responses, making it possible to verify that RLHF training and guardrails are properly implemented. The blockchain record could also serve as evidence in legal proceedings regarding the LLM.
Open Source Models: Open-source models like Meta’s LLaMA could benefit from this system. While it would be impossible to regulate every deployment of these models unless they shipped with the regulatory framework built in, users could opt in to running a blockchain alongside their own deployment for transparency. The chain data could be shared publicly or with regulatory bodies to improve the base model.
Private LLMs: LLMs developed by private organizations or individuals with no public access could still be regulated using this system. The government would need to identify these models, create contracts with each company for each model, and continuously monitor their interactions.

A Hypothetical Design

Machine A (Corporate-owned):

Function: Hosts the LLM and provides an API for interaction. This machine is responsible for running the LLM and allowing external interactions only through the API.
Network Access Control: Machine A's network access is restricted to only allow communication with Machine B. All other external communications are blocked.
Traffic Monitoring: Machine A records all incoming and outgoing traffic. Logs are maintained to ensure that all interactions are only with Machine B. Any anomalies or unauthorized access attempts are flagged and reported.

Machine B (Regulatory-owned):

Function: Hosts a local blockchain and a smart contract. This machine is responsible for recording interactions with the LLM on the blockchain.
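Smart Contract: The smart contract deployed on the local blockchain exposes a function that interacts with the LLM's API on Machine A. This function takes a prompt and parameters as input, communicates with the API, and receives the LLM's response. Since contracts on EVM-style chains cannot make outbound network calls themselves, in practice this step would likely be carried out by an oracle or relay process running on Machine B.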
Transaction Recording: The smart contract records the transaction on the blockchain, including the input prompt, parameters, and the LLM's response, ensuring an immutable record of the interaction.
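The flow on Machine B can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not contract code: `call_llm` stands in for whatever component reaches Machine A's API, `ledger` stands in for the blockchain, and both names, along with `make_recording_gateway`, are hypothetical.

```python
import time
from typing import Callable


def make_recording_gateway(call_llm: Callable[[str, dict], str], ledger: list):
    """Return a function that forwards a prompt and logs the exchange.

    call_llm is a placeholder for the request to Machine A's API and
    ledger is a placeholder for the chain on Machine B (both hypothetical).
    """
    def submit(prompt: str, params: dict) -> str:
        response = call_llm(prompt, params)
        # Record the input, the parameters, and the output together.
        ledger.append({
            "prompt": prompt,
            "params": dict(params),
            "response": response,
            "timestamp": time.time(),
        })
        return response
    return submit


def fake_llm(prompt: str, params: dict) -> str:
    """Toy stand-in for the real model."""
    return f"echo: {prompt}"


ledger = []
submit = make_recording_gateway(fake_llm, ledger)
answer = submit("Hello", {"temperature": 0.2})
print(answer, len(ledger))
```

Because the caller only ever receives `submit`, every response it sees has already been written to the ledger, which is the property the design relies on.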

Access Control, Usage, and Jurisdictional Divisions:

Corporate Access: The corporation can access Machine B to use their LLM. They interact with the LLM by sending prompts and parameters to the smart contract on Machine B, which then communicates with the LLM on Machine A. The corporation does not have direct access to the blockchain records on Machine B but can use the LLM through the designated interface.
Regulatory Access: The regulatory body or government has full access to Machine B, allowing them to monitor, audit, and review the LLM's interactions and the blockchain records. The regulatory body ensures that the LLM on Machine A is not accessed directly and that all interactions are recorded on the blockchain.
Jurisdictional Divisions: Clear jurisdictional boundaries are established between Machine A (corporate jurisdiction) and Machine B (regulatory jurisdiction). This division ensures that the corporation and the regulatory body operate within their respective domains, maintaining the integrity of the system.
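The split in access rights described above could be expressed as a small permission table. The role and action names here are assumptions made for illustration.

```python
from enum import Enum, auto


class Role(Enum):
    CORPORATE = auto()
    REGULATOR = auto()


class Action(Enum):
    SUBMIT_PROMPT = auto()   # use the LLM through the contract on Machine B
    READ_LEDGER = auto()     # inspect the blockchain records


# Hypothetical policy: the corporation may use the model but not read the
# records, while the regulator has full access to Machine B.
PERMISSIONS = {
    Role.CORPORATE: {Action.SUBMIT_PROMPT},
    Role.REGULATOR: {Action.SUBMIT_PROMPT, Action.READ_LEDGER},
}


def is_allowed(role: Role, action: Action) -> bool:
    """Return True if the role may perform the action."""
    return action in PERMISSIONS.get(role, set())


print(is_allowed(Role.CORPORATE, Action.READ_LEDGER))
print(is_allowed(Role.REGULATOR, Action.READ_LEDGER))
```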

Just use a regular db

The case for a blockchain over a regular database rests on immutability. Rows in a database can be edited or deleted by anyone with write access, whereas each block in a chain commits to the hash of the block before it, so changing a past record breaks every later link and is easy to detect. This matters for building an accurate picture of an LLM's behavior, since a regular database is vulnerable to malicious or retroactive revision. With a blockchain, a violation cannot be quietly scrubbed from the ledger after the fact.
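A toy hash chain shows why retroactive edits are detectable. This is only a sketch of the linking idea: a real blockchain adds consensus and replication on top, and the function names below are made up for the example.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, data: dict) -> str:
    """Hash the block's position, its link to the previous block, and its data."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "data": data}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, data: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "prev": prev_hash,
        "data": data,
        "hash": block_hash(index, prev_hash, data),
    })


def verify_chain(chain: list) -> bool:
    """Recompute every hash and check every link from the start."""
    prev_hash = GENESIS_PREV
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["data"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"prompt": "a", "response": "b"})
append_block(chain, {"prompt": "c", "response": "d"})
ok_before = verify_chain(chain)           # True: untouched chain
chain[0]["data"]["response"] = "edited"   # simulate a retroactive edit
ok_after = verify_chain(chain)            # False: stored hash no longer matches
print(ok_before, ok_after)
```

An attacker who also recomputed block 0's hash would then break the link stored in block 1, so hiding the edit means rewriting every later block as well.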

Exposure

The choice between private and public blockchains is crucial. Public blockchains offer the most transparency, as all responses are visible to everyone. However, every write to the chain incurs a gas cost. If a model such as GPT were to write every interaction onto a blockchain such as Ethereum, the result would likely be severe network congestion and substantial gas fees.
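A back-of-the-envelope estimate makes the gas concern concrete. The 21,000 gas base fee and the 16 gas per non-zero calldata byte are Ethereum protocol constants, and the 15 million gas target with 12-second slots reflects Ethereum as of 2023. The record size, gas price, and traffic volume are assumptions picked purely for illustration. The estimate treats every byte as non-zero and ignores contract execution and storage, so real costs would be higher.

```python
BASE_TX_GAS = 21_000            # fixed cost of any Ethereum transaction
GAS_PER_CALLDATA_BYTE = 16      # non-zero calldata byte (EIP-2028)
GAS_TARGET_PER_BLOCK = 15_000_000   # per-block gas target as of 2023
BLOCKS_PER_DAY = 86_400 // 12       # one block per 12-second slot

# Everything below is an assumption chosen for illustration.
BYTES_PER_INTERACTION = 2_000   # prompt + response + metadata
GAS_PRICE_GWEI = 20
INTERACTIONS_PER_DAY = 1_000_000

WEI_PER_GWEI = 10**9
WEI_PER_ETH = 10**18


def gas_per_interaction(num_bytes: int) -> int:
    """Base fee plus calldata only; execution and storage are ignored."""
    return BASE_TX_GAS + GAS_PER_CALLDATA_BYTE * num_bytes


gas = gas_per_interaction(BYTES_PER_INTERACTION)
cost_eth = gas * GAS_PRICE_GWEI * WEI_PER_GWEI / WEI_PER_ETH
daily_gas = gas * INTERACTIONS_PER_DAY
share_of_chain = daily_gas / (GAS_TARGET_PER_BLOCK * BLOCKS_PER_DAY)

print(f"gas per interaction: {gas}")
print(f"cost per interaction: {cost_eth:.5f} ETH")
print(f"share of daily gas target: {share_of_chain:.0%}")
```

Under these assumptions a single interaction costs 53,000 gas, and a million interactions a day would consume roughly half of the chain's daily gas target before any storage cost is counted.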

Private blockchains are often gasless and, as long as the operating body can be trusted (a federal regulatory agency, for example), they are better suited to recording massive volumes of transactions.

Decentralization through Tokenomics

Tokenomics, or token economics, refers to the design and implementation of a token within a blockchain ecosystem. It encompasses the rules, incentives, and distribution mechanisms that govern the token's usage and value. In the context of the LLM blockchain, a utility token could be introduced to incentivize participation in the voting process and governance of the LLM.

The voting mechanism allows users to evaluate the quality and appropriateness of the LLM's responses. Users can vote on the LLM's responses based on criteria such as relevance, accuracy, and ethical considerations. The mechanism will need four parts: an interface, a set of criteria, published results, and incentives.

Users who participate in the voting process are rewarded with utility tokens. These tokens can be used to access premium features of the LLM, participate in governance decisions, or be exchanged for other cryptocurrencies. The tokenomics model is designed to encourage active participation in the voting process and ensure a fair and transparent evaluation of the LLM's behavior.
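One way to picture the vote-and-reward loop is the small model below. The 1 to 5 scale, the one-token reward, and the class and method names are all assumptions for the sake of the example, and nothing here touches a real chain.

```python
from collections import defaultdict

CRITERIA = ("relevance", "accuracy", "ethics")
REWARD_PER_VOTE = 1  # hypothetical number of utility tokens per ballot


class VotingLedger:
    """In-memory stand-in for on-chain voting state."""

    def __init__(self):
        self.votes = defaultdict(dict)    # response_id -> {voter: scores}
        self.balances = defaultdict(int)  # voter -> token balance

    def cast_vote(self, voter: str, response_id: str, scores: dict) -> None:
        if set(scores) != set(CRITERIA):
            raise ValueError("scores must cover exactly the voting criteria")
        if any(not 1 <= s <= 5 for s in scores.values()):
            raise ValueError("each score must be between 1 and 5")
        if voter in self.votes[response_id]:
            raise ValueError("voter has already rated this response")
        self.votes[response_id][voter] = dict(scores)
        self.balances[voter] += REWARD_PER_VOTE  # reward participation

    def average_scores(self, response_id: str) -> dict:
        ballots = list(self.votes.get(response_id, {}).values())
        if not ballots:
            return {}
        return {c: sum(b[c] for b in ballots) / len(ballots) for c in CRITERIA}


voting = VotingLedger()
voting.cast_vote("alice", "resp-1", {"relevance": 5, "accuracy": 4, "ethics": 5})
voting.cast_vote("bob", "resp-1", {"relevance": 3, "accuracy": 4, "ethics": 5})
print(voting.average_scores("resp-1"), dict(voting.balances))
```

Rejecting a second ballot from the same voter matters here, since otherwise the reward would pay people to vote repeatedly on the same response.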

DAOs

A Decentralized Autonomous Organization (DAO) could own an LLM and use tokenomics to vote on the governance of the LLM. In this use case, the DAO would consist of a community of token holders who collectively make decisions about the LLM's development, deployment, and behavior.

Imagine a DAO that owns an LLM designed for medical research, and licenses it out to medical research facilities. The DAO's community consists of researchers, medical professionals, and patients. The utility token is used to incentivize participation in the voting process and governance of the LLM.

Voting on LLM Responses: Community members can vote on the LLM's responses to medical queries, ensuring that the LLM provides accurate and ethical information.
Governance Decisions: Token holders can vote on governance decisions, such as updating the LLM's training data, implementing new features, or establishing partnerships with medical institutions.
Token Distribution: Tokens are distributed to community members based on their contributions to the DAO, such as voting, providing feedback, or conducting research using the LLM.

In this use case, the DAO leverages tokenomics to create a decentralized and community-driven approach to the ownership and governance of the LLM. The utility token incentivizes active participation and ensures that the LLM's behavior aligns with the community's values and goals.
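A token-weighted governance vote could be tallied roughly as follows. The 50% quorum, the simple-majority rule, and the example balances are assumptions. Real DAOs use a wide range of voting schemes.

```python
def tally(balances: dict, ballots: dict, quorum: float = 0.5) -> str:
    """Token-weighted yes/no vote.

    balances maps holder -> token count, ballots maps holder -> True/False.
    The quorum is the minimum share of total supply that must vote.
    """
    total_supply = sum(balances.values())
    yes = sum(balances.get(voter, 0) for voter, choice in ballots.items() if choice)
    no = sum(balances.get(voter, 0) for voter, choice in ballots.items() if not choice)
    if total_supply == 0 or (yes + no) / total_supply < quorum:
        return "no quorum"
    return "passed" if yes > no else "rejected"


# Hypothetical holders in the medical research DAO.
balances = {"researcher": 40, "clinician": 35, "patient": 25}

# Proposal: update the LLM's training data.
result = tally(balances, {"researcher": True, "patient": False})
print(result)
```

Here 65 of 100 tokens vote, which clears the quorum, and the proposal passes 40 to 25. If only the patient had voted, the outcome would be "no quorum".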

On-Chain Reincarnation

Another transformative application of the blockchain in the context of LLMs is its potential use for further reinforcement learning. Traditional models are often fixed to their initial datasets, but with the blockchain, a dynamic evolution of the dataset becomes possible. By hosting an initial dataset on the blockchain and subsequently recording all of the LLM's interactions and responses on the same chain, a continuous feedback loop is established.

This feedback loop allows the LLM to improve recursively. By combining the original dataset with the accumulated responses and additional human guidance, it becomes possible to train a new iteration of the model. This iterative process can lead to a more refined and accurate version of the original model, one that is better adapted to real-world interactions and more aligned with human values and expectations.
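The data-assembly step of that loop might look like the sketch below. It assumes each recorded interaction has an `id` and that human guidance arrives as a numeric rating per record. Both are hypothetical, as is the rating threshold, and the training itself is out of scope.

```python
def build_training_set(base_examples, ledger_records, ratings, min_rating=4.0):
    """Combine the original dataset with well-rated recorded interactions.

    ratings maps a record id to a human-assigned score. Records without a
    rating, or rated below min_rating, are left out.
    """
    approved = [
        {"prompt": r["prompt"], "response": r["response"]}
        for r in ledger_records
        if ratings.get(r["id"], 0.0) >= min_rating
    ]
    return list(base_examples) + approved


base = [{"prompt": "2+2?", "response": "4"}]
recorded = [
    {"id": "r1", "prompt": "Capital of France?", "response": "Paris"},
    {"id": "r2", "prompt": "Capital of Spain?", "response": "Lisbon"},
    {"id": "r3", "prompt": "Largest planet?", "response": "Jupiter"},
]
human_ratings = {"r1": 5.0, "r2": 1.0}   # r3 has not been reviewed yet

next_dataset = build_training_set(base, recorded, human_ratings)
print(len(next_dataset))
```

Filtering on human ratings is what keeps the loop from simply feeding the model's own mistakes back into the next iteration.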

However, there are significant challenges to consider. Hosting a dataset on a blockchain and using it for the continuous training of an LLM would require vast amounts of data storage. Additionally, the computational power needed for the training and re-training of such models would be astronomically high. These challenges would need to be addressed to make this vision feasible, possibly through advancements in blockchain scalability solutions and more efficient training algorithms. This is perhaps something L2s or zkEVMs could tackle.

Educational Access

Educational institutions can greatly benefit from the use of LLMs, offering students and instructors access to powerful AI tools that can assist in research, teaching, and learning. However, ensuring ethical usage and proper access control is crucial in an academic setting. The blockchain-based regulation system can be adapted to meet the specific needs of educational institutions.

Recording Interactions for Accountability: Educational institutions can record all interactions with the LLM, including prompts, responses, model parameters, and which student or faculty member sent the request. This creates an immutable record of the LLM's behavior, allowing institutions to monitor usage, ensure compliance with ethical guidelines, and prevent misuse.
Access Control for Students and Instructors: The system can be configured to control access to the LLM, ensuring that only authorized students and instructors can interact with the model. Access can be granted based on user roles, courses, or research projects, allowing institutions to tailor access to specific academic needs. For instance, you likely wouldn’t need the neuroscience-tailored LLM for your law students. This could also be used if an institution is licensing an LLM from a third party, and would like to prevent unrestricted and unauthorized access.
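Course-based access combined with per-request logging could be modeled along these lines. The model names, course codes, and function names are invented for the example, and the plain list stands in for the chain.

```python
# Hypothetical mapping from each licensed model to the courses allowed to use it.
MODEL_ACCESS = {
    "neuro-llm": {"NEURO-501"},
    "law-llm": {"LAW-210", "LAW-330"},
}

audit_log = []  # stand-in for the on-chain record of requests


def can_use(user_courses: set, model: str) -> bool:
    """A user may use a model if enrolled in at least one permitted course."""
    return bool(MODEL_ACCESS.get(model, set()) & user_courses)


def request(user_id: str, user_courses: set, model: str, prompt: str) -> bool:
    """Log every attempt, allowed or not, along with who sent it."""
    allowed = can_use(user_courses, model)
    audit_log.append(
        {"user": user_id, "model": model, "prompt": prompt, "allowed": allowed}
    )
    return allowed


law_student = {"LAW-210"}
print(request("s-001", law_student, "law-llm", "Summarize this case."))
print(request("s-001", law_student, "neuro-llm", "Explain synaptic pruning."))
```

Logging denied attempts as well as successful ones gives the institution a record of attempted misuse, not just of normal usage.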

/>_

The rise of LLMs and their integration into our daily lives is not just a technological phenomenon; it has profound implications for society at large. As these AI systems become more pervasive, the need for transparency, accountability, and ethical behavior becomes paramount. The blockchain-based regulatory system we've explored offers a promising approach to addressing these concerns, ensuring that AI serves the interests of all stakeholders. By fostering a culture of responsible AI development and usage, we can harness the transformative potential of LLMs while safeguarding our values and ensuring a more equitable and inclusive digital future for everyone.

