The launch of powerful generative AI video marked a before-and-after moment for the creative industries.
The AI video craze was kicked off by Runway back in 2023 with the launch of Gen-2, its pioneering video generation model. It was a huge leap forward from the clunky, warped images the public was used to. The real watershed moment, though, was the launch of OpenAI’s Sora. Sora not only created astoundingly high-quality videos, it was also capable of generating clips that could feasibly pass for real-world footage. Video production had been changed forever.
But all this innovation is private, closed source and often only available on a subscription basis. That’s why free-to-use, open-source options like Stable Diffusion are rising as serious challengers to the top generation models. Combined with affordable, decentralized access to AI compute, there is a clear path towards an open AI video infrastructure for everyone.
This post will look at how the leading AI video generation models have changed video production, the projects they’ve made possible, and how open-source models are disrupting this hotly contested market.
Sora is a video generation model from OpenAI, the company behind the agenda-setting ChatGPT. Sora can currently create photorealistic videos up to a minute long from user prompts, in either image-to-video or text-to-video format.
The launch of Sora was a big break from everything else on the market. Previous text-to-video generators produced dream-like output, with images that ooze and flow like a surreal lo-fi tapestry brought to life. Sora, by contrast, could produce videos that, while not instantly convincing as real footage, might make the average internet user do a double take.
Sora’s launch videos created waves by showing off a huge variety of the model’s capabilities, from a herd of woolly mammoths shuffling through a snow-covered mountain plateau to a happy young couple walking through a bustling, wintry cityscape.
Yet despite the millions of overawed internet creatives, Sora remains off-limits to almost everyone. As of now, the model is only available to a select few testers, filmmakers, visual artists and designers. That is set to change, with OpenAI CTO Mira Murati stating that the video generation model will become publicly available in 2024.
But Sora isn’t the only video generation model on the market. OpenAI’s model has a growing list of competitors, most notably Runway and Pika Labs.
When Runway released its Gen-2 system, it was a huge leap forward in the public consciousness of what AI could create.
Gen-2 handled both image-to-video and text-to-video prompts and would duly create something approximately resembling whatever users asked for, up to 3 seconds in length. Now, creators can generate 4-second videos as standard, with the option to extend a shot to a total of 16 seconds, for both image and text prompts.
In June 2024, Runway released details about Gen-3, highlighting fine-grained temporal control for dramatic transitions and photorealistic humans with a diverse range of actions, gestures and emotions.
Runway is leaning hard into the cultural and serious filmmaking side of generative AI. It has even created its own annual AI Film Festival. Creators from around the world can submit films of up to 10 minutes that incorporate AI-powered tooling, via a Runway link. The top prize is $15,000 and 1 million Runway credits.
Last year, filmmaker Daniel Antebi won the top prize with his unnerving film Get Me Out, which sees a man struggle against nightmarish supernatural forces that trap him in a seemingly normal suburban American house. It might not be an Oscar winner, but it shows that AI can be effectively integrated into serious, narrative filmmaking.
Pika is in some ways an offshoot of Runway’s original generative video model.
Pika co-founder Demi Guo was determined to make a film using Runway. Guo labored over her film for months, but she ended up frustrated with what the combined efforts of a group of highly technical people could produce using the generator.
So, she dropped out of Stanford University to launch Pika, with the dream of developing a more user-friendly AI video generator for amateur creators. Since then, Pika has raised millions in funding and attracted millions of users.
Pika also offers text-, image- and video-to-video conversion, and gives users the chance to mix up their videos by changing dimensions, altering characters and environments, and adjusting content length.
Pika isn’t aimed at professional filmmakers, but it’s still more than capable of creating compelling imagery that can carry a narrative. This sci-fi trailer made with Pika shows that you don’t need photorealistic quality to create a powerful atmosphere and hold a viewer’s attention:
Pika founder Guo told Bloomberg that Pika will be able to create high-quality videos up to a minute in length, and that within two years it could be used to create a feature film. But that doesn’t mean there won’t be a significant impact on production and filmmaking before then; many impactful clips and ads run well under 10 seconds.
AI video isn’t just some kind of boredom-killing experiment. It’s already serious business, and prominent partners are lining up to implement these models into their business offerings.
In April 2024, U.S. software maker Adobe said that it was in the early stages of integrating tools like OpenAI’s Sora into its popular Premiere Pro app. While this sounds promising, it’s worth noting that Adobe’s bullish statements on AI might have more to do with boosting its lagging share price than with building actual products.
Meanwhile, the recently revived Toys ‘R’ Us became the first big brand to use Sora in its advertising. The ad premiered at the Cannes Lions Festival, an indulgent get-together for the international advertising glitterati, and received mixed reviews. Some praised the innovation from Native Foreign, the agency behind the ad, while others criticized the lack of visual consistency, particularly in the appearance of the main character. Regardless of the advert’s finer technical or artistic points, it’s a landmark moment for the use of AI in advertising.
It’s not just Sora getting high-profile attention. Madonna’s content director Sasha Kasiuha said the team used Runway to create stage visuals for the Queen of Pop’s Celebration Tour.
Source: https://runwayml.com/customers/generating-new-worlds-for-madonnas-celebration-tour/
Many commentators have described AI video generation as an existential threat to the TV and film industry. That is almost certainly an exaggeration, but it is true that AI has leveled the playing field for video production.
Scenes that would once have required a team of experts can now be conjured up with a few short prompts. This reduces not only the potential payroll of production teams but also the overall length of shoots. AI can also help with color correction, scene transitions and other time-consuming editing jobs.
AI video generation is significant because it puts professional-quality tools in the hands of the wider public, without requiring detailed know-how. This is a game changer for independent creators and even smaller studios. Professional studios will need to adapt to stay competitive and integrate AI into their workflows.
While AI video isn’t set to replace Hollywood production studios just yet, the impact is already being felt. Tyler Perry, a US TV and film mogul, said that he halted an $800m expansion of his Atlanta studio after the launch of Sora. Perry stated that although he is impressed by how the technology will improve film production, he foresees the loss of a lot of jobs across the industry.
There’s one thing these models all have in common: they’re private and closed source.
Open versus closed source is a divide in the tech world as old as the industry itself. There are those who think tech should be freely available and those who think the source code behind innovative products should remain behind closed doors. In the past, the closed-source option has usually won out: for-profit products have better earning potential for investors, which makes them easier to fund.
But AI video could be set to buck this long-established trend. Open-source AI video models like Stable Diffusion are putting up some stiff competition.
Stable Diffusion is widely available to anyone with the technical capability to implement and run it, and it can be deployed locally or in the cloud. Stable Video Diffusion is already producing promising short videos, and as the models rapidly evolve, we can expect even higher-quality outputs in the near future. Development and updates can also move much faster than on closed-source models, thanks to a large and active community of users. This large-scale collaboration is one of the biggest reasons open-source models can deliver cutting-edge performance updates at a rate that can even outpace large, well-funded models. Stable Diffusion is generally free to use, aside from the costs of running the models, which can be significant.
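To give a sense of what “deploying it locally” looks like in practice, here is a minimal sketch of running Stable Video Diffusion through the Hugging Face diffusers library. The model ID and parameters reflect the library’s documented image-to-video example at the time of writing, but defaults may change between releases, and a GPU with plenty of VRAM is assumed:

```python
# Minimal sketch: generate a short clip locally with Stable Video Diffusion
# via Hugging Face diffusers (model IDs and defaults may change over time).
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the image-to-video checkpoint; fp16 keeps VRAM usage manageable.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # trade speed for lower GPU memory use

# Condition the video on a single still image at the model's native size.
image = load_image("input_frame.png").resize((1024, 576))

# Generate the frames and write them out as a short video file.
frames = pipe(image, decode_chunk_size=8).frames[0]
export_to_video(frames, "generated.mp4", fps=7)
```

A few seconds of video from a consumer GPU is a far cry from Sora, but the whole pipeline runs on hardware you control, with no subscription in sight.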
Any way you look at it, generating AI video is expensive. Even simple text-to-text tasks demand huge amounts of compute, let alone the complex outputs required for AI video.
Runway Gen-3 costs around $1 per 10-second video generated.
Sora can cost up to $6 per generation; a recent music video made with Sora required up to 700 generations, totaling over $4,000.
Luma AI charges $400 for 30 Dream Machine generations.
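A quick back-of-the-envelope comparison, using only the figures quoted above (rough, point-in-time numbers, not official pricing):

```python
# Rough per-generation costs derived from the figures quoted above.
# Point-in-time, back-of-the-envelope numbers, not official pricing.
costs_usd = {
    "Runway Gen-3": 1.00,              # ~$1 per 10-second video
    "Sora (music video)": 4000 / 700,  # >$4,000 across ~700 generations
    "Luma Dream Machine": 400 / 30,    # $400 for 30 generations
}
for model, usd in costs_usd.items():
    print(f"{model}: ~${usd:.2f} per generation")
```

On those numbers, a single Dream Machine generation costs more than thirteen Runway Gen-3 clips.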
The costs are high because the computing demands are heavy. On top of that, the servers capable of providing this power are in short supply: almost all AI-ready compute is controlled by a small handful of companies like NVIDIA, Microsoft and Amazon. With all the power and the ability to set prices for ever-increasing AI compute demand, they have little incentive to change. Add in the prominence of private, closed-source AI video models, and it’s clear the future of video is at risk of becoming yet another tech black box.
This is where open-source models, combined with decentralized networks, have a real opportunity to smash the chokepoint on AI. Decentralized networks are webs of interconnected GPUs capable of processing AI tasks. Using an open job marketplace model on a decentralized network like Livepeer means that developers and small businesses can simply process individual inference jobs at minimal cost instead of renting whole servers, as sketched below. This sidesteps the biggest cost factor for AI video compute.
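Conceptually, submitting a job to such a marketplace is just an HTTP request to a gateway, which routes the work to an available GPU on the network and bills only for that inference. The gateway URL, endpoint path and request fields below are illustrative assumptions, not a documented Livepeer API:

```python
# Hypothetical sketch: submit one image-to-video inference job to a
# decentralized AI gateway. The URL, path and fields are assumptions
# for illustration, not a documented API.
import requests

GATEWAY = "https://my-ai-gateway.example.com"  # hypothetical gateway URL

with open("input_frame.png", "rb") as f:
    resp = requests.post(
        f"{GATEWAY}/image-to-video",
        files={"image": ("input_frame.png", f, "image/png")},
        data={"model_id": "stabilityai/stable-video-diffusion-img2vid-xt"},
        timeout=600,  # video generation can take minutes
    )
resp.raise_for_status()

# Pay-per-job: the caller is billed for this single inference,
# not for renting an entire GPU server by the hour.
print(resp.json())
```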
A decentralized, open-job AI marketplace like Livepeer is the most cost-effective and accessible option for unlocking the full potential of AI video. With affordable compute, the ability to experiment with the models and build innovative applications is open to all creative developers and companies, regardless of how well funded they are.
AI video is still a very young technology. Access to new, exciting technology is often gated and expensive at first, with prices dropping as competition arrives. For that to happen to AI video, the underlying foundations of the technology need to change, or the competition will simply be priced out of the market.
Livepeer is a video infrastructure network for live and on-demand streaming. It has integrated AI Video Compute capabilities (Livepeer AI) by harnessing its massive GPU network and plans to become a key infrastructure player in video-focused AI computing.
Twitter | Discord | Website