What the Tech!
Welcome to What the Tech!, a space where our Creative Technologist James Pollock shares updates that have piqued his interest in the world of tech.
Generative Extend in Premiere Pro (beta)
A while back Adobe announced that new AI-powered features would be coming to their video editing software, Premiere Pro, including Generative Extend. Well, that feature has now arrived in the latest beta version of Premiere, but what does it do? Adobe’s demo video does a great job of showcasing how it works:
So, basically, Generative Extend lets you pull the end of your clip beyond the length of the actual footage, prompting Adobe's generative video model to continue the clip and create entirely new content based on the previous frames. It sounds similar to the image-to-video feature we tried with Runway a few weeks ago, but I was intrigued to see how much looking at the previous frames helps generate a more consistent and realistic extension.
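As a way of picturing the idea (and only that; this isn't how Adobe's model is actually implemented or exposed), you can think of clip extension as repeatedly asking a generative model for new frames conditioned on a window of the most recent ones. Here's a rough Python sketch, where model.predict_next_frames is a hypothetical stand-in:

```python
# Conceptual sketch only: `model.predict_next_frames` is a hypothetical
# stand-in, not Adobe's actual API. The point is the conditioning loop.

def extend_clip(frames, model, extra_frames, context_window=16):
    """Append `extra_frames` generated frames to a clip, each batch
    conditioned on the most recent `context_window` frames."""
    extended = list(frames)
    target_length = len(frames) + extra_frames
    while len(extended) < target_length:
        context = extended[-context_window:]        # what the model "sees"
        extended.extend(model.predict_next_frames(context))
    return extended[:target_length]
```

How big that context window effectively is (the last frame, a few seconds, or the whole clip) is exactly what I wanted to probe with the test below.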
Okay, so here’s the test I gave Generative Extend: a video of a tram passing in front of a bookshop, which I downloaded from Pexels. Now let's imagine I wanted to use this shot, but I wanted to start on the bookshop, have the tram pass, and end on the bookshop. Right now the tram is already passing by at the start of the clip, so what happens if I use Generative Extend to give me the moments before the tram enters the shot?
Interesting! I’ve put a green border around the generated section of the video. It manages to recover some of what's in that space when the tram's not there, but a lot of it is just missing or wrong. To be honest, I set out to push Generative Extend to its limits and thought this might be something it would struggle with, so I'm not disappointed at all! I was trying to discover whether it takes information from the whole video or just the last frame, and it definitely seems to lean on the nearest frames rather than the clip as a whole.
It looks like it interprets the reflection of a truck as being a truck on the other side of the tram. Maybe the generated shop front looks greenish and odd because it's reconstructing from what it sees through the window, rather than from later in the video. I'll have to do another test!
Runway Gen-3
A few weeks ago we looked at Runway's new Gen-3 video generation model and how it can take a single image and use it as the start or end frame of a generated video. Since then, they've unveiled a new 'video to video' feature, which generates new footage guided by an existing video.
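Just to illustrate the general shape of that kind of workflow, here's a hypothetical Python sketch; the endpoint, parameters and response format are invented for illustration and aren't Runway's actual API:

```python
# Hypothetical sketch of a video-to-video request. The URL, fields and
# response format are invented for illustration; this is not Runway's API.
import requests

def video_to_video(guide_video_path, prompt, api_key):
    """Send a guide video and a text prompt, get back a URL for footage
    that follows the guide video's motion and framing."""
    with open(guide_video_path, "rb") as f:
        response = requests.post(
            "https://example.com/v1/video-to-video",   # placeholder endpoint
            headers={"Authorization": f"Bearer {api_key}"},
            files={"guide_video": f},
            data={"prompt": prompt},
            timeout=600,
        )
    response.raise_for_status()
    return response.json()["output_url"]
```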
To test this process, I rendered out a simple previz-style shot where the camera flies down from above a city street and comes to rest on a car under a bridge.
I uploaded the shot to Runway and prompted for "Dramatic cinematography, camera swoops down over London and onto a classic black sedan parked under a bridge, cool greenish bluish tones", and this is what Gen-3 generated:
Pretty interesting! Obviously the quality isn't there for a final output, but it does capture some of the vibe of what a final shot could look like. The question is, how helpful is that for previz, storyboarding and beyond?
Film and TV studio Lionsgate appears to see potential in Runway's offering: it has signed a deal with the AI company to train a bespoke model on its own content, and claims it will save millions. On the flipside, Runway and other AI companies have been accused of using content without permission to train their public models, something that factors into AI use policies at broadcasters and studios, as well as at VFX companies like us.
SMERF
SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration
While NeRFs (neural radiance fields) were seen as the big thing in volumetric capture and rendering, they drifted into the background when the more performant Gaussian splatting technique appeared. However, Google has been doing research to make NeRFs suitable for real-time applications too.
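For a sense of why real-time is the hard part: a NeRF renders each pixel by querying a neural network at many sample points along a camera ray and then compositing those samples, which adds up fast at video resolutions. Here's a minimal NumPy sketch of just that compositing step, assuming the per-sample densities and colours have already come out of the network:

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Standard NeRF-style volume rendering for a single camera ray.
    densities: (N,) density predicted at each of N samples along the ray
    colors:    (N, 3) RGB predicted at each sample
    deltas:    (N,) spacing between consecutive samples
    Returns the final RGB colour for the pixel this ray belongs to."""
    alphas = 1.0 - np.exp(-densities * deltas)            # opacity per sample
    # Transmittance: how much light reaches each sample unblocked
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = alphas * transmittance                      # contribution per sample
    return (weights[:, None] * colors).sum(axis=0)
```

Doing that for every pixel of every frame is why plain NeRFs struggle in real time, and a big part of why Gaussian splatting pulled ahead on performance.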
This is where SMERF comes in. I won't get into the ins and outs of the advancements, but the project site has all the details, as well as some great demos you can try in your own web browser. It looks like Google is rolling this tech out to Google Maps to recreate 3D interiors for shops, cafes and so on. I'm interested to see which approach ends up being most applicable to VFX.