David Shapton charts the course up the learning curve that is the use of AI in all aspects of video production.
Learning about AI developments from Twitter is like drinking from a firehose. Among the authoritative and often surprising revelations about the latest AI models, you’ll find items like “100,000 apps to transform your video editing” or “improve your scriptwriting with these 50,000 brilliant ChatGPT prompts”.
These are all very well; if you’re prepared to work through endless examples, you’ll probably learn something useful. But what’s often missing is any context around these techniques. That’s not entirely surprising: AI is progressing so fast that there often is no context. When we see news about AI, it’s not as if we can say, “Oh, something similar happened in 1948”. AI timescales feel like they’re quantised in milliseconds, and what happened yesterday isn’t a reliable guide to what will happen today.
So, rather than looking at individual and specific apps, it’s worth looking for areas that AI could usefully improve. There’s a lot of overlap, but broadly we’re talking about AI techniques that assist, create and process.
There are all kinds of areas where AI will, in the future, assist. For example, if you show a script to an AI assistant and tell it about your resources (equipment, personnel, budget, time, etc.), it could create a project management chart based on its knowledge of formal project planning methodologies. It could generate a list of equipment to order. Knowing what human resources are available, it could print out task lists for each individual. When the production is about to start, it could create call sheets. All of this assumes that it has accurate and relevant information and that inputting that data doesn’t take longer than doing the whole thing the old-fashioned way.
As a side note, it’s worth saying that all of this assumes filmmakers want AI involved at this level. It’s natural to be wary of AI running the whole show, but over time it will be incorporated into productions, because even where it falls short in some areas, it will be compellingly good in others.
It’s tempting to think about big-budget filmmaking when you’re talking about AI, but it’s probably in other types of production that assistive AI will make the most difference. AI transcription services can already “listen” to audio and display it as written text. What’s more, each word carries a timecode, with a clickable link to the frames containing that word. This is a massive boon when editing news clips and documentaries, where several hours of interview footage can be quickly assessed for relevance and cut into the finished package.
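As a concrete illustration of that word-to-frame link, here’s a minimal sketch in Python. The `words` list is a hypothetical stand-in for whatever word-level timestamps your transcription service returns, and 25 fps is an assumed frame rate; the only real substance is the arithmetic converting seconds into an HH:MM:SS:FF timecode an editor could click on.

```python
def seconds_to_timecode(seconds: float, fps: int = 25) -> str:
    """Convert a time in seconds to a non-drop-frame HH:MM:SS:FF string."""
    total_frames = int(round(seconds * fps))
    frames = total_frames % fps
    total_seconds = total_frames // fps
    s = total_seconds % 60
    m = (total_seconds // 60) % 60
    h = total_seconds // 3600
    return f"{h:02d}:{m:02d}:{s:02d}:{frames:02d}"

# Hypothetical word-level output from a transcription service:
words = [
    {"word": "breaking", "start": 12.04},
    {"word": "news", "start": 12.52},
]

# Index each word by its timecode so an editing UI could make it clickable.
index = {w["word"]: seconds_to_timecode(w["start"]) for w in words}
print(index["news"])  # 00:00:12:13
```

Real services differ in how they report timestamps, and broadcast frame rates bring drop-frame complications this sketch ignores, but the principle - every transcribed word pinned to a frame - is exactly why transcript-based editing is so fast.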
AI will be able to sort your footage for you. It can dispose of obviously bad shots (like images of people’s feet and the catering van). However, this will always need a degree of caution - especially in a production where weird and unconventional shots are part of the plan. Through its ability to “understand” content, AI will play a significant role in media management. It should take over almost all logging and metadata-tagging duties, although, again, this will require caution and restraint until either there are no errors or we have a much better understanding of where errors are likely and how to work around or eliminate them.

In future, editing is likely to sit on a continuum from “traditional” to “hands-off” - the equivalent of autonomous driving. Tell your AI assistant editor your ideas, and it will scan the footage and make cuts that reflect your intent. Obviously, this raises about a hundred more questions than it answers at this stage in AI’s evolution.
From a creative person’s point of view - which includes the entirely reasonable aspiration that AI won’t put them out of a job - the new technology can help most with mundane tasks. There are already applications that can apply the same colour look to an entire collection of clips, and go way beyond merely imposing the same LUT. AI can “understand” a look in a deeper, more nuanced way than a LUT ever could because it is not restricted to a fixed number of “dimensions”. These are not dimensions in the sense that we usually understand them; they’re more like different ways to recognise things or to characterise them. It’s likely that AI “sees” things entirely differently from our perception, and this is one of the reasons that AI models - or the conclusions they come to - are hard for us to reverse-engineer. But it does mean that AI can capture the “essence” of a look and apply it to other clips, with a subtlety that would be hard to emulate without a lot of work.
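To make the LUT comparison concrete, here’s a purely illustrative sketch (not any particular grading tool’s implementation) of what a simple 1D LUT actually is: a fixed table where every input level maps to one output level, regardless of what’s in the picture. An AI model, by contrast, can condition its adjustment on the content of the image - which is precisely what a table can’t do.

```python
import numpy as np

# A 1D LUT is just a fixed lookup table: 256 input levels, each mapped to
# one output level. Here, a simple gamma-style lift as the "look".
lut = np.clip((np.linspace(0.0, 1.0, 256) ** 0.8) * 255.0, 0, 255).astype(np.uint8)

def apply_lut(frame: np.ndarray) -> np.ndarray:
    """Apply the same per-pixel lookup to an 8-bit image (H x W x 3)."""
    # NumPy fancy indexing: every pixel value becomes an index into the table.
    return lut[frame]

frame = np.full((4, 4, 3), 128, dtype=np.uint8)  # flat mid-grey test frame
graded = apply_lut(frame)
```

Every mid-grey pixel comes out identical whether it belongs to a face, a sky or a shadow; the mapping has no idea what it’s looking at. That blindness to content is the limitation the article is pointing at when it says AI can capture a look in more “dimensions” than a LUT.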
In an ideal world, AI would do that basic matching, leaving the colour artist free to pursue their own and the director’s artistic vision. Please understand this doesn’t mean we have to rely on AI for inspiration: it’s possible to train AI models with human expert knowledge, stored in a nuanced, accurate and detailed way, ready to apply to your current project. Remember: this is much more powerful than LUTs because it can take the complete essence of a look and reinterpret it for radically different, as well as similar, material.
RedShark readers might remember that for about a decade, I’ve been predicting you’ll be able to feed a script into a computer, do a bit of rendering, and output a finished feature film. At first, it was 99% tongue-in-cheek, but still with a sense that this was the almost inevitable result of extrapolating filmmaking technology as far as you could go. If you’d asked me then when I thought it would happen, I’d have said maybe in twenty to fifty years, if ever. Fast-forward to today, and we know how long it will take - because we can do it now. The only drawback is that it’s not very good, although still remarkable. The problem is that it’s tough to generate realistic-looking video.
Still images are pretty much solved: the latest outputs of Midjourney are photorealistic and can accept “artistic” and technical instructions with great precision and nuance. Adobe Creative Cloud’s Firefly can generate amazing new content based on existing images and do it transparently without copyright worries. Want a backdrop that’s autumnal instead of spring-like? Just ask. Want to extend a background? No problem.
But video? Not so much. Someone will solve it soon, but for now, here’s why it’s difficult.
When you shoot virtually anything with a video camera, whatever’s in front of the lens has to obey the laws of physics. Tables and chairs don’t jump around randomly, nor do they look like Monet painted them in one frame and Hokusai or Van Gogh in the next. But in AI-generated video, the results really do look like a different artist painted them in every frame. Generative video tools like Runway are rapidly getting better, producing videos that are at least watchable, but they still look as though they were made by a team of animators who never look at each other’s work. What’s needed is consistency across time - let’s call it “temporal continuity”. Future models will be able to do this. Solving the problem of time is only a matter of time.
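One crude, hypothetical way to put a number on that frame-to-frame inconsistency - a sketch, not a production metric - is to measure how much the image changes between consecutive frames. Real footage of a static scene barely changes; video generated one frame at a time with no temporal model changes everywhere, all the time.

```python
import numpy as np

def temporal_flicker(frames: np.ndarray) -> float:
    """Mean absolute pixel change between consecutive frames (0-255 scale).

    frames: array of shape (num_frames, height, width, channels), uint8.
    A static scene scores near zero; independently generated frames score high.
    """
    diffs = np.abs(np.diff(frames.astype(np.int16), axis=0))
    return float(diffs.mean())

steady = np.full((10, 8, 8, 3), 100, dtype=np.uint8)  # static "shot"
rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=(10, 8, 8, 3), dtype=np.uint8)  # every frame independent

print(temporal_flicker(steady))  # 0.0
print(temporal_flicker(noisy))   # large - no temporal continuity at all
```

Real temporal-consistency research uses far more sophisticated measures (optical flow, perceptual metrics), but the intuition is the same: the number a good video model has to drive down is the one this toy function measures.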
It seems likely that AI will help with virtual production, and therefore become part of mainstream filmmaking. I hope that human artists still generate the main ideas, but AI can help to convert these ideas into fully-fledged backdrops for a virtual stage. I’m assuming that VP professionals will welcome any ability to speed up the process of generating 3D assets from original ideas.
When generative AI arrived in the public consciousness last year, it was like a runaway train, and it’s still hard to see exactly where it might lead. But some of the dust is settling, and some things are getting clearer. It’s important not to be distracted by those who say, “We don’t need creatives any more”: the greatest works intended for humans will always be made by humans. If AI can help make those works faster and better, and allows us to create even more outstanding work, that’s a good result. There are no guarantees, but I think it’s OK to be optimistic about the future.