An Operating System for Software Development at Scale

Introduction

In recent years, three forces have converged to enable a new mechanism for software development at scale: the entrenchment of collaborative, API-drivable software development platforms such as GitHub; the trustless payment and contract systems of crypto networks; and the rapid acceleration of artificial intelligence, especially the adoption and rapid improvement of large language models.

Software development is fundamentally modular, able to be deconstructed into discrete tasks, issues, and pull requests. This makes it an ideal candidate for a task-oriented marketplace. This marketplace operates as a two-sided platform, driven by project owners who post tasks, and developers or automated bots that proactively find tasks to tackle. Those tasks can include addressing security vulnerabilities, optimizing performance, or fixing software bugs, thereby pushing the boundaries of the traditional development model.

The Power of Small Tasks

Traditional approaches to software development often involve long, monolithic processes that can become unwieldy and inefficient. Research has consistently shown that breaking down complex workflows into smaller, more manageable parts leads to greater efficiency and higher-quality output.

Even given this evidence, most new systems built to automate software development today still start with a monolithic approach.

The marketplace enables developers to home in on specific tasks that align with their expertise. With each task being a self-contained unit, developers can efficiently solve problems, contributing to a faster, better, and more specialized development ecosystem.

A Competitive Race to Value

The open structure and the automated nature of this marketplace introduce a significant and novel dynamic to software development: competition. Developers are no longer working in silos but are part of a competitive landscape where the race to complete high-value tasks is on. This ecosystem taps into game theory and economic incentives, motivating developers to deliver solutions quickly without compromising quality. Such a race to the top stands to revolutionize the pace and quality of software development.

It also means developers can focus on the narrow areas where their expertise is deepest. For example, a developer in the top percentile of PostgreSQL query optimization typically gets to do that sort of work for only a small percentage of their time in a typical job, or else has to establish a brand and a consulting practice in order to focus on it. In a task-oriented marketplace, the same developer could apply their skills across many codebases and organizations, taking advantage of hard-won specialized expertise to develop an income stream and reputation that scales beyond what’s possible in existing systems.

APIs == Automation

As referenced in the introduction to this memo, the emergence and mainstreaming of collaborative coding systems such as GitHub, GitLab, and others have generated an unexpected side effect. Because these systems are built for developers by developers, almost every interaction is API-accessible. From issue creation and review to pull requests, reviews, and code merges, it’s possible to interact with these systems without any human intervention.

The first time I saw snyk.io, a lightbulb went off in my head. In its first iteration, Snyk would watch for dependencies that were out of date or had known vulnerabilities and create patches for those dependencies in the form of pull requests. While that’s obviously how developers work on GitHub, what got me excited was the realization that the pull request mechanism in GitHub created a perfect structure for automation to interact with humans naturally and frictionlessly. While we might not all be comfortable with bots modifying our code unsupervised, we’re already used to receiving pull requests, reviewing them, commenting, and then merging. Additionally, developers have been using IDEs to do things like automated refactoring for many years now. The primary difference between accepting these machine-authored changes and bots making changes autonomously is the modality. We have come to think of our source editors as “ours”, but the editor will not be the primary canvas for software development in the future.
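To make the "no human intervention" point concrete, here is a minimal sketch of how a bot might prepare a pull request through GitHub's REST API. The endpoint (POST /repos/{owner}/{repo}/pulls) and its title, head, base, and body fields are GitHub's; the function name, repository names, branch names, and token below are placeholders I made up for illustration. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_pull_request(owner, repo, token, title, head, base, body=""):
    """Build (but do not send) an HTTP request that opens a pull request.

    `head` is the branch holding the bot's changes and `base` is the
    branch the changes should be merged into.
    """
    payload = {"title": title, "head": head, "base": base, "body": body}
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # All values here are hypothetical placeholders.
    request = build_pull_request(
        owner="example-org",
        repo="example-repo",
        token="<personal-access-token>",
        title="Bump vulnerable dependency",
        head="bot/bump-dependency",
        base="main",
        body="Automated patch for a known vulnerability.",
    )
    print(request.get_method(), request.full_url)
    # Sending it would be: urllib.request.urlopen(request)
```

From the maintainer's side, a request like this shows up as an ordinary pull request to review, comment on, and merge, which is exactly why the mechanism suits bots so well.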

Automation == AI

The natural end state of an automatable economy for software development task completion in an API-drivable collaborative code environment is for most of the work to be completed by bots.  Some of these bots might be simple scripts created opportunistically to solve timely problems such as newly discovered vulnerabilities.  Others might be far more complex and feature-rich.

Even given the current, nascent state of LLMs as coding aids, machines are remarkably capable of digesting plain text descriptions of software modules and spitting out code.  The more focused the task, the better they are.  Machines are also capable of reading other code, mimicking coding styles and conventions, reacting to comments, and–most importantly–reacting to errors.  The most effective way to code with an LLM is to iterate with the code, generating, executing, posting errors, and resolving them.  This is most easily accomplished on small, well-defined tasks.
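The generate, execute, post errors, resolve loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `generate` stands in for whatever model call a bot would make (its signature is hypothetical), `fake_generate` is a canned stand-in so the example runs without a model, and a real system would execute candidates in a sandbox rather than with a bare `exec`.

```python
import traceback
from typing import Callable, Optional

# Hypothetical signature: (task description, last error or None) -> source code
Generator = Callable[[str, Optional[str]], str]


def iterate_until_passing(task, generate, check, max_rounds=5):
    """Generate code, run it, and feed any error back into the next attempt.

    Returns the passing source and the round on which it passed.
    """
    error = None
    for round_number in range(1, max_rounds + 1):
        source = generate(task, error)
        namespace = {}
        try:
            exec(source, namespace)  # unsafe outside a sandbox; sketch only
            check(namespace)         # the task's acceptance check
            return source, round_number
        except Exception:
            # Post the full error text back to the generator.
            error = traceback.format_exc()
    raise RuntimeError(f"no passing solution after {max_rounds} rounds")


def fake_generate(task, error):
    """Canned stand-in for a model: buggy first, fixed once it sees an error."""
    if error is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def check_add(namespace):
    """Acceptance check for the toy task."""
    assert namespace["add"](2, 3) == 5


if __name__ == "__main__":
    source, rounds = iterate_until_passing("write add(a, b)", fake_generate, check_add)
    print(f"passed on round {rounds}")
    print(source)
```

The smaller and better specified the task, the easier it is to write a `check` like this one, which is a large part of why small tasks suit this style of work.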

Financial Integration

To realize the full power of a software development marketplace such as this one, we need a credibly neutral payments & escrow system.  Developers need to believe that the best and fastest responses will win and be rewarded. Organizations owning codebases need to believe that they are receiving the best responses to their work at the lowest possible cost.

And all of it must be tied directly into the code management system. To merge third-party code is to accept that contribution and its value. For code changes and additions, the merge and payment should be atomic.
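As a rough illustration of what "atomic" means here, the sketch below models an escrow in memory: the project owner locks a bounty against a task, and the bounty is released only if the merge succeeds. Every name in it (`Escrow`, `fund`, `merge_and_pay`, `SettlementError`) is hypothetical. An in-memory model like this cannot actually guarantee atomicity across a code host and a payment network; a real system would need a single transaction or contract to enforce it.

```python
from dataclasses import dataclass, field


class SettlementError(Exception):
    """Raised when a bounty cannot be funded or settled."""


@dataclass
class Escrow:
    balances: dict = field(default_factory=dict)  # account -> available funds
    locked: dict = field(default_factory=dict)    # task_id -> (owner, amount)
    merged: set = field(default_factory=set)      # task_ids already settled

    def fund(self, task_id, owner, amount):
        """Move `amount` from the owner's balance into escrow for a task."""
        if self.balances.get(owner, 0) < amount:
            raise SettlementError("insufficient funds")
        self.balances[owner] -= amount
        self.locked[task_id] = (owner, amount)

    def merge_and_pay(self, task_id, contributor, merge):
        """Run the merge and release the bounty as one all-or-nothing step.

        `merge` is a callback that performs the merge and raises on failure.
        """
        if task_id not in self.locked:
            raise SettlementError("no bounty locked for this task")
        _, amount = self.locked[task_id]
        try:
            merge()
        except Exception as exc:
            # Nothing has moved yet, so the bounty simply stays in escrow.
            raise SettlementError("merge failed; bounty stays in escrow") from exc
        del self.locked[task_id]
        self.balances[contributor] = self.balances.get(contributor, 0) + amount
        self.merged.add(task_id)


if __name__ == "__main__":
    escrow = Escrow(balances={"owner": 100})
    escrow.fund("task-1", "owner", 40)
    escrow.merge_and_pay("task-1", "dev", merge=lambda: None)
    print(escrow.balances)
```

The design choice worth noting is the ordering: the payment is released only after the merge callback returns, so a failed merge can never result in a payout, and a successful merge always does.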

Developers also need to believe that the code hosting service has neither an unfair advantage nor a conflict of interest. Centralized code hosting services may have both: an unfair advantage (low-latency access to both private and public customer code) and a conflict of interest (code hosting services are already building code generation tools for obvious reasons).

Beyond Small Tasks

While it’s true that small tasks are easier to understand, complete effectively, and automate, real world problems don’t always present themselves as small tasks.  How do we create entirely new complex systems?  How do we break down abstract ideas into actionable product implementations?

In practice, all complex software development scenarios are broken into small tasks which are usually completed by specialists.  Product managers translate business ideas into concrete product ideas. Project managers break down product ideas into discrete pieces of work and sequence them into timelines of dependencies.  Designers create user experience mockups and wireframes from the requirements. Software developers translate these more granular requirements into working code and executable test cases.  Quality assurance engineers map out edge cases and look for rough edges in implementations.

Our system is no different. While our first pragmatic step is to address clearly defined issues in existing software projects, the system is designed to work as a federated hierarchy of agents—both human and machine—which hyper-specialize in a specific capability or discipline (or even programming language, business domain, etc.) and are optimized to be the best possible executor of related tasks.

Imagine an asynchronous message bus, accepting tasks of various types and at varying levels of the stack, from product ideation through testing and deployment.  Drop an idea in at the top level and a cast of thousands of specialized agents work to decompose, vet, specify, plan, implement, test, and deploy the software.  Each layer of the work and associated task is the subject of competition for each relevant agent in the system to see which can complete the tasks the fastest, cheapest, and/or most effectively and get paid for the work.
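A toy version of that message bus might look like the following. It is a sketch under loud assumptions: the `Task`, `Result`, and `Agent` types, the agent names, and the rule that the first finisher wins (with ties broken by lowest cost) are all invented for illustration, and simulated delays stand in for real work. Payment, vetting, and task decomposition are left out entirely.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    kind: str          # e.g. "sql", "test", "design"
    description: str


@dataclass(frozen=True)
class Result:
    agent: str
    cost: float
    output: str


class Agent:
    """A specialist that only attempts the kinds of task it is built for."""

    def __init__(self, name, kinds, cost, delay):
        self.name, self.kinds, self.cost, self.delay = name, kinds, cost, delay

    async def attempt(self, task):
        await asyncio.sleep(self.delay)  # simulated work
        return Result(self.name, self.cost, f"{self.name}: {task.description}")


async def compete(task, agents):
    """Let every eligible agent race on a task; the first finisher wins."""
    eligible = [a for a in agents if task.kind in a.kinds]
    if not eligible:
        return None
    attempts = [asyncio.create_task(a.attempt(task)) for a in eligible]
    done, pending = await asyncio.wait(attempts, return_when=asyncio.FIRST_COMPLETED)
    for loser in pending:
        loser.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    # If several agents finish together, prefer the cheapest.
    return min((t.result() for t in done), key=lambda r: r.cost)


async def run_bus(tasks, agents):
    """Drain a queue of tasks and record which agent won each one."""
    queue = asyncio.Queue()
    for task in tasks:
        queue.put_nowait(task)
    winners = {}
    while not queue.empty():
        task = queue.get_nowait()
        result = await compete(task, agents)
        winners[task.description] = result.agent if result else None
    return winners


AGENTS = [
    Agent("fast-sql-bot", {"sql"}, cost=5, delay=0.01),
    Agent("slow-sql-dev", {"sql"}, cost=3, delay=0.2),
    Agent("test-bot", {"test"}, cost=1, delay=0.01),
]
TASKS = [
    Task("sql", "optimize query"),
    Task("test", "write tests"),
    Task("design", "draw mockup"),
]

if __name__ == "__main__":
    print(asyncio.run(run_bus(TASKS, AGENTS)))
```

Note that the design task goes unclaimed because no agent in this toy roster handles it. In the system I'm describing, that gap is itself a market signal for someone to build the missing specialist.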

Challenges

There are many challenges to making a system like this work. Here is a short, incomplete list of unanswered questions and problems:

  • Closed source code & IP protection: How do developers get enough access to code without putting customers’ intellectual property at risk?

  • How do developers propose solutions without giving them away unpaid? For example, if a developer finds a performance problem in a piece of code and offers a fix for compensation, how does the customer verify the solution without being able to simply adopt it without paying the developer?

  • How do open source projects fund development?

  • How do we verify submissions at scale?  Manually reviewing pull requests for an open source project is already an overwhelming task for some.  How much harder does this get when bots are submitting solutions in exchange for compensation?  How do humans keep up with the pace of reviews and merges and how do we validate the correctness of submissions?

Are you building this?

Or are you building something that aligns with these ideas? Or do you think I’ve got this wrong and you have a better way to achieve the same goals? If so, I’d love to hear from you at team@blueyard.com.
