‘Speaking Portraits’ Make It Unsettlingly Easy to Turn Still Photos Into Animated Deepfakes
Earlier this year, social media was briefly taken over by seemingly everyone using MyHeritage’s Deep Nostalgia feature to bring old photos to life. The company whose AI technology powers Deep Nostalgia, D-ID, is taking that technology one step further, turning still headshot photos into videos that move and say whatever a user wants.

As impressively lifelike as the results from MyHeritage’s Deep Nostalgia often were, the feature had its limitations. After a still photo of a person was uploaded, their orientation in the shot was analysed to determine which direction their head and eyes were looking, at which point a matching video from a small collection of ‘driver videos’ was selected to be used as a reference to create the AI-generated movements. Users had no control over the movements in the generated video, and the subject made no attempt to speak.

At the recent TechCrunch Disrupt 2021, D-ID revealed a more advanced version of Deep Nostalgia called Speaking Portraits that can make still photos appear to move and talk based on either a source video, just an audio clip, or even a text file with a pre-written script.

Two flavours of Speaking Portrait will be available. Single Portrait can turn a still photograph into a talking head, but the movements will be limited to just the head; anything else in an uncropped photo, including a person’s body and whatever’s in the background, will remain static, potentially ruining the believability of the effect.

The other, more advanced version of Speaking Portrait is Trained Character. Instead of a still photo, it requires a 10-minute video of the person being animated, in which they go through a specific set of motions and say certain phrases, as defined by guidelines D-ID has created. The results, as seen in the sample above of a newscaster delivering a story, are far more realistic and believable than what Single Portrait produces, which still shows the telltale signs of a ‘deepfake’, including blurry edges and unusual warping artifacts as the face moves. Trained Character also adds the flexibility of swapping out what’s in the background, and the potential to animate the person’s body, including their arms and hands.

MyHeritage’s Deep Nostalgia feature felt more like a promotional tool than anything: a way to drive new users to the website’s various services. But Speaking Portrait has far more potential, and not just for those who want an animated stand-in for themselves after rolling out of bed just in time for their first Zoom call of the day. The technology could ensure that news agencies always have a ‘live’ presenter on hand for breaking news, even in the middle of the night, and it could also allow someone to appear to deliver the news in languages they don’t actually speak. It’s an application we’ve seen other companies pursuing as well, where AI-powered facial manipulations can make movies dubbed into other languages appear more natural by ensuring mouth and facial movements match the new dialogue.

Are there still reasons to be concerned about how quickly deepfake technologies have progressed? Of course. But now that they’ve matured and become much easier to use, we’re finally starting to see the potential benefits of the technology, too.