Is AI video generation any good yet?
The short answer is no…
But if you feel like reading more, the longer answer is that it’s getting better very quickly, but it’s not yet at the quality level we expect in the Film/TV industry. If you squint your eyes and give these a passing glance, like most images on social media, they will probably look pretty great on your phone screen. The animation is subtle but convincing, and therein lies the trick… any extreme movements quickly devolve into a smear of pixels. A closer inspection reveals that fine details are lost, colours shift, and features pop in and out of existence. Feel free to browse the galleries below, and click on the images to see these issues more easily.
Before we get into the details, I think it’s important to recognize how significant an improvement the latest Runway ML Gen-2 update is… Below is what I like to call The Frog Test. I run it in all the AI applications because I like frogs, and they are the first thing I want to see every day. The older version of Gen-2 from July of this year was producing Lovecraftian abominations, and while the new version does finally appear to understand what a frog is, it hasn’t fully resolved the way a frog moves… or blinks, for that matter. Still a very large improvement.
Now let’s examine the issues in closer detail, as we would in a TV/Film environment.
Below are the best results out of dozens of tests that took images from Midjourney v5.2 and turned them into videos via Runway ML Gen-2 (Nov 2, 2023 release)
Below are some of the worst results out of dozens of tests that took images from Midjourney v5.2 and turned them into videos via Runway ML Gen-2 (Nov 2, 2023 release)
Results
Pros
Fast generations, usually under a minute
Simple interface
Okay for very subtle animations that don’t include humans or animals
Probably good enough for social media already (This is actually a con, when you think about it)
Cons
Complex animations don’t work; no run cycles or Michael Bay camera moves yet
Faces/eyes tend to distort and flicker
The training set seems limited; it’s not very good at animating things that don’t exist or are poorly documented
Artifacts are everywhere in the high-quality outputs, even after upscaling and denoising (a sketch of that kind of cleanup pass follows this list)
The longer the video generation is, the worse it gets
Most generations don’t produce the results you want; they come out broken, missing animation, or doing something else unusual
Price is still high for individuals
No way to iterate; any minor change requires a completely new video, even if you reuse the previous seed value
Significant detail loss and colour shifts happen when using an existing image as the input
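To be clear about what I mean by upscaling and denoising: the cleanup pass was nothing exotic, just a per-frame denoise and upscale over the generated clip, and the artifacts still survive it. Here is a minimal sketch of that kind of pass using OpenCV; the file names and strength values are placeholders, not Runway’s own processing, and an AI upscaler could stand in for the simple Lanczos resize.

```python
# Per-frame denoise + 2x upscale pass over a generated clip.
# File names and parameter values are placeholders; tune to taste.
import cv2

cap = cv2.VideoCapture("gen2_output.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
writer = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Non-local-means denoising; higher h smooths more but blurs fine detail
    clean = cv2.fastNlMeansDenoisingColored(frame, None, 7, 7, 7, 21)
    # Simple 2x Lanczos upscale (swap in an AI upscaler here if you have one)
    up = cv2.resize(clean, None, fx=2, fy=2, interpolation=cv2.INTER_LANCZOS4)
    if writer is None:
        h, w = up.shape[:2]
        writer = cv2.VideoWriter("gen2_cleaned.mp4",
                                 cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    writer.write(up)

cap.release()
if writer is not None:
    writer.release()
```

Even after a pass like this, the flicker and smearing are baked into the generation itself, so post-processing only softens the problem rather than fixing it.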
Conclusion
The results are scary, even if they are not good enough to use in high-quality productions like TV/Film, where every pixel is scrutinized a hundred times over. I wasn’t expecting this kind of advancement in video until summer next year, but as with all things AI, it seems you can’t predict it. I will attempt a few more predictions, however…
Next year we’ll start seeing a lot more AI videos on social media because of advances like this and other software like Pika and AnimateDiff
The generations are sometimes good enough to pass for stock footage, so we’ll also see a boom there, as we already have with images (get ready for every YouTube video to have AI thumbnails and stock videos)
Lastly, we’ll probably start to see more tech-savvy clients sending us AI-generated reference for use in productions. And this will cause many a headache :(
But don’t be discouraged; there are still many, many things wrong with these videos. One of the biggest, which I didn’t mention above, is that you can’t really generate the same thing consistently across multiple shots/angles, in image generators or video… so if you are worried about AI coming in to do all your comp work, matte paintings, or asset creation, don’t be… yet