When people find out that the Livepeer network has thousands of GPUs actively transcoding millions of minutes of video per week, one of the most common questions is whether those GPUs could be put to use for other types of compute. In particular, with the rise of AI in 2023 and the associated surge in demand for GPUs, the hardware used for AI training and inference, it is natural to assume that the Livepeer network could make its supply available to tap into the billions of dollars being spent on AI infrastructure. Nvidia's data center business, which provides GPUs for AI compute, earned $14B last quarter alone, up from $4B in the same quarter a year earlier.
That assumption is correct - the Livepeer network can certainly be used by those seeking cost-disruptive AI processing. With the groundwork laid in recent months, through both the growth of video usage on Livepeer via Livepeer Studio and the introduction of the new community-governed Livepeer treasury, the time is now to bring AI video compute capabilities to the project.
The rest of this post lays out how AI video compute will be introduced on the Livepeer network, along with a plan, strategy, and timeline for making it a reality.
Livepeer has always been committed to its mission: building the world's open video infrastructure. Other compute platforms have tried to be generic "AWS on the blockchain" or "run any type of compute task" marketplaces, but this has made for a challenging go-to-market due to the difficulty of targeting a solution to an industry segment. Instead, Livepeer focused on video compute, via transcoding, and was able to build targeted products and a go-to-market for a specific industry - the $100B+ video streaming market - solving a real use case and tapping into existing demand, rather than marketing a generic, abstract solution that no one wanted.
This focus on video has meant that Livepeer has avoided being overly reactive and pivoting to the latest hot trend such as ICOs, NFTs, or DeFi, and has instead always asked how these innovations can apply to video. The highs aren't quite as high, but more importantly the lows aren't quite as low. This has also attracted a mission-focused team and community, with deep video expertise, who are excited about what we're doing over a long period of time, rather than a community that jumps ship when the trend-of-the-month loses steam.
At the moment, there is no trend hotter than the fast rise of artificial intelligence, or AI. But unlike many crypto teams and projects, Livepeer hasn't abandoned its mission and "pivoted to AI." Instead we have asked: how will AI impact the future of video? AI lowers the barriers for video creators in many ways. Two important ones are reducing the time and cost of creation in the first place, and reducing the time, cost, and expertise required for high quality video production and output.
On the creation side, generative AI can be used to create video clips from text or image prompts. Setting up a scene that used to require a film crew, set, cameras, script, actors, editing, and more, now may only require a user typing a text prompt on their keyboard and waiting a few minutes for GPUs to generate a sample of potential results. Generative video won’t replace high quality productions, but it can create tremendous cost savings at various stages in the process.
As for production, capabilities like upscaling, frame interpolation, subtitle generation, and more can rapidly expand the quality and accessibility of video content, whether it was created by AI or submitted by creators. Advanced capabilities like interactivity within video can be enabled via automatic object detection, masking, and scene-type classification.
The timing for Livepeer to tap into this AI capability set is exciting: recently released open source foundational models, including Stable Video Diffusion, ESRGAN, FAST, and more, are keeping pace with closed source proprietary models. The aim is for the world's open video infrastructure to support running open source models accessible to all - and these models are here now, getting better at a rapid pace thanks to the innovation of the open source AI community.
There are many stages in the AI lifecycle, but the three that typically require substantial compute power are Training, Fine Tuning, and Inference. Read here for definitions of these areas, but in short:
Training requires creating models and running computation over very large datasets. Training a foundational model, like those built by OpenAI or Google, can require tens or hundreds of millions of dollars worth of compute.
Fine tuning is more cost effective: it takes an existing foundational model and adjusts its weights based on a specific set of inputs for a particular task.
Inference is the act of taking the already trained and tuned model and having it produce outputs or make predictions from a set of inputs. A single inference job is typically cheap in compute relative to the first two phases, but inference is performed over and over again, millions of times, so total inference spend often dwarfs the cost of training - which is what justifies the training investment in the first place.
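For a rough illustration with made-up numbers: a model that costs $5M to train and then serves 50 million inference requests per month at $0.01 each generates $500K per month in inference spend - passing the total training cost within a year, and continuing to grow for as long as the model stays in production.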
Training and tuning require access to large datasets and densely networked GPUs that can communicate with one another and share information quickly. A network like Livepeer is not well suited for training out of the box, and it would require significant updates to retrofit it for the task. While decentralized networks are attractive alternatives to proprietary big-tech training clouds, it's questionable whether decentralized networks can ever be cost competitive for training foundational models, given the networking overheads and inefficiencies involved.
Inference, on the other hand, is where decentralized networks like Livepeer can shine. Each node operator can choose to load given models onto their GPUs and compete on cost to perform inference jobs on users' inputs. Just as with transcoding, users can submit AI inference jobs to the Livepeer network and receive the benefits of open-market competitive pricing that leverages currently idle GPU power.
GPUs are the lifeblood of the AI boom. Nvidia's data center business, which is predicated on demand for its GPUs, has exploded over the past year. Elon Musk joked that GPUs are harder to buy than drugs. Yet DePIN networks like Livepeer have shown that, through open marketplace dynamics and bootstrapping incentives via inflationary token rewards, they can attract a global supply of GPUs ahead of demand, elastically supporting the growth of new users and apps with the appearance of near-infinite pay-as-you-go capacity. Developers no longer need to reserve hardware in advance, at high prices, only for it to sit idle when not in use; instead they can pay as they go at the lowest market rate. This is the huge opportunity for decentralized networks to power the AI boom.
Cloud providers like GCP or AWS let you "reserve a GPU server" on their corporate clouds. Open networks like Akash go one step further and let you rent a server on demand from one of many decentralized providers around the world. But regardless of which you choose, you then have to manage that rented server to run your model and perform your tasks. You have to scale it if you want to build an app that performs many tasks at once. You have to chain your workflows together.
Livepeer abstracts things up a level to a "job" that you can submit to the network and trust will get done. Livepeer already does this with transcoding video, where the job is a 2-second segment of video to be transcoded. You just fire the job at the network and can have confidence that your broadcast node will get it completed reliably, handling worker node selection, failovers, and redundancies.
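To make that concrete, here is a minimal sketch of pushing one such segment to a locally running broadcaster node, assuming go-livepeer's HTTP push ingest. The host, port, path, and stream name here are illustrative assumptions rather than a definitive interface; check the node documentation for the exact details.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Read a 2-second MPEG-TS segment from disk.
	seg, err := os.ReadFile("seg0.ts")
	if err != nil {
		panic(err)
	}

	// Illustrative endpoint: go-livepeer broadcasters accept segments over
	// HTTP push ingest; the host, port, and path are assumptions for this sketch.
	url := "http://localhost:8935/live/mystream/0.ts"
	resp, err := http.Post(url, "video/mp2t", bytes.NewReader(seg))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The broadcaster handles orchestrator selection, failover, and payment
	// behind this single request; the caller just receives transcoded results.
	fmt.Println("status:", resp.Status)
}
```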
For AI video compute tasks, it can work the same way. There could be a job that "generates video from text." You can trust that your node will get it done, and you can scale to as many concurrent jobs as you want through a single node, which can tap into a network of thousands of GPUs to perform the actual compute. Taking this one step further - and this is still very much in the design phase - you could possibly submit an entire workflow such as:
Generate video from text
Upscale it
Do frame interpolation to make it play back smoothly
And the network could do this for you, without you having to deploy separate models to separate machines, manage the I/O, shared storage, and more. No more managing servers, scaling them, or handling failover. Livepeer is scalable infrastructure that is maximally cost effective and highly reliable. If the network can deliver on these promises for AI video compute, the way it already does for video transcoding jobs, it offers a level of developer experience and cost reduction not yet seen in the open AI world.
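To sketch the idea, a workflow submission might look something like the following: a declarative list of steps posted to a single node, which fans the work out across the network. Everything here - the endpoint, the job schema, the step and parameter names - is hypothetical, since this interface is still in the design phase.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// WorkflowStep and Workflow are hypothetical types sketching what a
// declarative AI video job submission could look like; no such schema
// exists in the Livepeer node software today.
type WorkflowStep struct {
	Task   string            `json:"task"`
	Params map[string]string `json:"params,omitempty"`
}

type Workflow struct {
	Steps []WorkflowStep `json:"steps"`
}

func main() {
	wf := Workflow{Steps: []WorkflowStep{
		{Task: "text-to-video", Params: map[string]string{"prompt": "a sailboat at sunset"}},
		{Task: "upscale", Params: map[string]string{"factor": "4x"}},
		{Task: "frame-interpolation", Params: map[string]string{"target_fps": "60"}},
	}}

	body, err := json.Marshal(wf)
	if err != nil {
		panic(err)
	}

	// Hypothetical gateway endpoint; the network would handle model
	// placement, intermediate storage, and failover behind it.
	resp, err := http.Post("http://localhost:8935/ai/workflow", "application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```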
As is consistent with Livepeer's journey over the last seven years, the project will work to show real, usable, functional, open source software and network capabilities before promoting that "Livepeer has this." Here is the short version of the plan to get there:
Select a specific initial use case for an additional job type beyond video transcoding: AI-based generative video, supported by AI upscaling and frame interpolation. Great open models, like Stable Video Diffusion, are evolving every day in this space.
Move quickly by building within a fork/spike of the node software to add these capabilities to both our orchestrator (supply side) node, and broadcaster (demand side) node. Livepeer’s open media server, Catalyst, should support an interface for requesting and consuming these generative video tasks.
Users running this spike will form a sort of sub-network on Livepeer, but they will use the Livepeer protocol for discovery, and payments to nodes running this new capability will settle through Livepeer mainnet.
Work with a frontend, consumer-facing app to leverage Livepeer's highly cost effective open compute network, and to capture and showcase data that validates Livepeer's cost effectiveness relative to public clouds.
After we validate this, merge into the core Livepeer clients, add additional job types, and grow the ecosystem around leveraging additional forms of AI based video compute.
Livepeer recently introduced a community-governed, on-chain treasury to the protocol through its Delta upgrade, and it has been accumulating LPT for a number of months to fund public goods initiatives. There is already a pre-proposal being discussed, and nearing a vote, which aims to fund a special purpose entity (or SPE) focused on making this AI video compute prospect a reality. The first proposal aims to deliver the core development for the first four tasks listed above, including:
Development of these AI capabilities in a fork of the Livepeer nodes.
A subnetwork that node operators can form to perform these tasks with payments cashed in on Livepeer mainnet.
A frontend app that demonstrates these capabilities for consumers.
The collection of benchmarks and data that show the cost effectiveness of the Livepeer network for performing AI inference at scale.
It also suggests a potential future funding milestone to deliver infrastructure credits from the treasury to cover the initial cost of consumer usage during this data collection period.
The #ai-video channel in the Livepeer Discord has become a hotspot for discussion and collaboration around this initiative, and anyone who believes in the future of open AI infrastructure and video AI compute should drop in to say hello and get involved. Node operators have begun benchmarking different hardware, getting familiar with running these open video models, and solving the challenges of expanding from a video transcoding specialization to additional video-specific job types. It's a fun time to be part of a group working on a fast moving project.
While this initial milestone can show that Livepeer is cost effective for specific, supported forms of AI video compute, the ultimate power lies in the ability for AI developers to BYO-model, BYO-weights, BYO-fine-tunings, or deploy custom LoRAs on top of existing foundational models on the network.
Supporting these initial capabilities, across a handful of different models and forms of compute, will drive fast learning in the areas of node operation, model loading/unloading on GPUs, node discovery and negotiation, failover, payments, verification, and more as they apply to AI video compute. From there, we can evaluate future milestones for productionizing and supporting arbitrary AI video compute job types on the Livepeer network.
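As one example of the kind of problem on that list, the sketch below shows the basic shape of model loading/unloading on a GPU worker: an LRU-style cache that keeps hot models resident and evicts cold ones to free GPU memory. This is purely illustrative, not how the Livepeer node implements it; the load and unload steps are stubs standing in for moving model weights on and off a GPU.

```go
package main

import "fmt"

// modelCache keeps at most capacity models "loaded" at once, evicting the
// least recently used model when a new one is requested. In a real worker,
// load and unload would move weights on and off GPU memory.
type modelCache struct {
	capacity int
	order    []string // loaded model IDs, least recently used first
}

func (c *modelCache) ensureLoaded(id string) {
	for i, m := range c.order {
		if m == id {
			// Already loaded: move it to the back (most recently used).
			c.order = append(append(c.order[:i:i], c.order[i+1:]...), id)
			return
		}
	}
	if len(c.order) >= c.capacity {
		evicted := c.order[0]
		c.order = c.order[1:]
		fmt.Println("unloading", evicted) // stub for freeing GPU memory
	}
	fmt.Println("loading", id) // stub for loading weights onto the GPU
	c.order = append(c.order, id)
}

func main() {
	// With room for two models, a stream of mixed jobs forces evictions.
	c := &modelCache{capacity: 2}
	for _, job := range []string{"stable-video-diffusion", "esrgan", "stable-video-diffusion", "fast", "esrgan"} {
		c.ensureLoaded(job)
	}
}
```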
In the early days, video-specific platforms such as Livepeer Studio could build APIs and products for video developers to take advantage of supported models. Consumer apps, like the one proposed in the AI Video SPE, can use these capabilities directly on the Livepeer network via the Catalyst node. But as these capabilities expand, new creator-focused AI businesses can take shape and leverage Livepeer's global network of GPUs to cost-effectively build their custom experiences, without relying on expensive big-tech clouds and their proprietary models as the backbone of their businesses.
It's an exciting road to get there. AI will undoubtedly change the world of video faster than we can imagine in the coming years, and we look forward to the world's open video infrastructure being the most cost effective, scalable, and reliable backbone for all the compute required to enable this bold new future.