Fake videos are getting really damn good, and now they’re getting even easier to make. For innocuous purposes like, say, a moving avatar, it’s a pretty cool development. But for more insidious use cases, like exploiting the technology to harass someone online, it’s unsettling.
Researchers at the Samsung AI Center in Moscow and the Skolkovo Institute of Science and Technology published a paper on Monday, "Few-Shot Adversarial Learning of Realistic Neural Talking Head Models," illustrating how their system can create a virtual talking head from as little as a single photo.
While researchers have put out a number of new ways to create deepfakes over the last year (videos in which someone uses machine learning to create an ultrarealistic fake of a person), one crucial prerequisite has remained: you need to collect a large set of images of an individual to generate a realistic deepfake of them.
Of course, this isn't impossible to do if you have an open-source photo-scraping tool and that person has posted enough public photos or videos of themselves online. But it was still a roadblock, and one that gave potential victims the chance to be more careful about how much exploitable data they shared online. This new system removes what was once a necessary and time-consuming step.
The researchers write in the paper that their system can create "talking head models from a handful of photographs" and "with limited training time." Normally, when someone develops a deepfake, they have to feed a wealth of photos of an individual (the training data set) into a deep neural network, which then generates a manipulated video. These researchers claim their system not only needs as little as one photo, but also takes far less time to learn from the training data before it can spit out a fake video.
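The few-shot idea can be sketched in miniature: average an identity embedding over however many source photos you have (even just one), then condition a pre-trained generator on that embedding plus the target pose. This toy NumPy version uses made-up names, shapes, and random stand-in weights, not the paper's actual architecture, which relies on deep convolutional networks and adversarial training:

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 16
IMG_PIXELS = 64  # a flattened 8x8 "photo," purely for illustration

# Stand-ins for the pre-trained ("meta-learned") network weights.
W_embed = rng.standard_normal((EMBED_DIM, IMG_PIXELS)) * 0.1
W_gen = rng.standard_normal((IMG_PIXELS, EMBED_DIM + IMG_PIXELS)) * 0.1

def embed(photos):
    """Average a per-photo embedding over the available photos --
    this is what lets the system work from a handful of images, or one."""
    return np.mean([W_embed @ p for p in photos], axis=0)

def generate_frame(identity_embedding, landmarks):
    """Condition the generator on the identity embedding plus the
    target pose/expression landmarks to synthesize one frame."""
    return np.tanh(W_gen @ np.concatenate([identity_embedding, landmarks]))

# A single source photo is enough to produce *a* frame; more photos
# would sharpen the identity embedding.
photos = [rng.standard_normal(IMG_PIXELS)]
identity = embed(photos)
frame = generate_frame(identity, landmarks=rng.standard_normal(IMG_PIXELS))
print(frame.shape)  # (64,)
```

The key design point the paper's title hints at: the heavy lifting happens in pre-training across many people, so adapting to a new face needs only a few examples.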
The researchers wrote in the study that for "perfect realism" the model was trained on 32 images, which is still a very small number and easy to collect in today's age of oversharing online. It's hard not to imagine someone gleaning that many images from a cursory search of a Facebook page. More importantly, it shows this technology is developing at a rapid clip.
There are examples in the study of talking head models trained on just one image, and even the stills show the range of what this system can do, bringing images of the Mona Lisa and the Girl with a Pearl Earring to life with a variety of expressions. The videos are even more unnerving: with just one source image, the system was able to generate realistic talking head models. And for many of the examples, it's not easy to tell that these are completely fake.
In the paper, the researchers note that this type of technology might have "practical applications for telepresence," including multiplayer games and the special effects industry. As tech companies move into animated avatars and virtual reality, this tech feels like a natural next step toward even more personalised and realistic visuals. It also feels like a natural next step for places like film studios, which might want to, say, create a posthumous recreation of an actor in record time.
But it would be irresponsible to exalt this technology without pointing to the very real threat it poses to victims of manipulated videos. In fact, when deepfakes first entered the public consciousness, it was only a matter of time before the tech was weaponised against women online.
The bleak reality is that there will always be shitty people online who will exploit this type of technology, and as tech goes, there will always be people trying to make it easier and more efficient.