As powerful as video game consoles have become, even the most graphically stunning video games don’t look like realistic, real-world footage, which is arguably the ultimate goal. But researchers at Intel Labs may have found a shortcut by applying machine learning techniques to rendered footage from a console that takes it from beautiful to photorealistic.
Over the past few decades, the graphics capabilities of home consoles have advanced by leaps and bounds. More processing power in the machines allows them to not only render more detail in the 3D models that make up a scene, but to also more accurately recreate the behaviour of light so that reflections, highlights, and shadows behave and look more and more like they do in the real world.
But the hardware isn’t quite to the point where video games look as photo-realistic as the computer-generated visual effects that Hollywood blockbusters employ to wow audiences. A console can render 60 frames of video at 4K resolution every second, but a single frame of a movie featuring complex computer-generated effects can take hours or even days to render with photo-realistic results. Game streaming is one solution, where powerful computers far away render a game in real-time and then send finalised frames over the internet to a gamer’s screen, but this new research is even more clever than that.
We’ve already seen machine learning used to transfer the unique artistic style of a famous painter’s work to another image, and even moving video, and that’s not entirely dissimilar to what’s happening in this research. But instead of training a neural network on famous masterpieces, the researchers at Intel Labs relied on the Cityscapes Dataset, a collection of images of a German city’s urban centre captured by a car’s built-in camera, for training.
When a different artistic style is applied to footage using machine learning techniques, the results are often temporally unstable, which means that frame by frame there are weird artifacts jumping around, appearing and reappearing, that diminish how real the results look. With this new approach, the rendered effects exhibit none of those telltale artifacts, because in addition to processing the footage rendered by Grand Theft Auto V’s game engine, the neural network also uses other rendered data the game’s engine has access to, like the depth of objects in a scene, and information about how the lighting is being processed and rendered.
That’s a gross simplification — you can read a more in-depth explanation of the research here — but the results are remarkably photorealistic. The surface of the road is smoothed out, highlights on vehicles look more pronounced, and the surrounding hills in several clips look more lush and alive with vegetation. What’s even more impressive is that the researchers think, with the right hardware and further optimisation, the gameplay footage could be enhanced by their convolutional network at “interactive rates” — another way to say in real-time — when baked into a video game’s rendering engine.
So instead of needing a $2,000 PS6 for games to look like this, all that may be needed is a software update.