Researchers Can Now Make Moving Videos From Just a Single Photo

A photo can bring back fond memories, but a video can help transport you back to the tropical destination you vacationed at years ago. So researchers at the University of Washington have developed a deep learning method that can turn a single still image into moving video without requiring any reference footage captured where the photo was taken.

It’s another novel use of machine learning that demonstrates the potential benefits of the technology. While it’s similar to the techniques that companies like MyHeritage leverage to bring photos of old relatives to life, this new approach instead focuses on natural, flowing phenomena such as water, smoke, and clouds. Developed at the University of Washington’s Paul G. Allen School of Computer Science & Engineering, the model doesn’t require any input from a user aside from a still photo, such as one from a recent trip to Niagara Falls.

As with any automated image processing done through deep learning methods, the process starts with training the model, which in this case was done using thousands of videos of rivers, waterfalls, and even clips of the ocean: anything that demonstrated a noticeable amount of fluid motion. During training, the neural network tried to predict the motion in a video given just its starting frame. Each prediction was then compared to the actual motion, so the model slowly learned to identify the visual cues that reveal how fluids are supposed to move, correcting inaccuracies in its predictions along the way.
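
For readers who like to see the idea in code, the predict, compare, and correct cycle described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: a small linear model stands in for the researchers’ deep neural network, the “features” and “true motion” are synthetic numbers rather than real video frames, and the names `train_motion_predictor`, `features`, and `true_flow` are invented for this illustration.

```python
import numpy as np

def train_motion_predictor(features, true_flow, steps=200, lr=0.1):
    """Fit a toy linear model that maps per-pixel features to 2D motion.

    Hypothetical stand-in for a deep network: each row of `features`
    describes one pixel, each row of `true_flow` is that pixel's (dx, dy).
    """
    weights = np.zeros((features.shape[1], 2))
    losses = []
    for _ in range(steps):
        predicted = features @ weights            # predict the motion
        error = predicted - true_flow             # compare with the actual motion
        losses.append(float(np.mean(error ** 2)))
        grad = 2.0 * features.T @ error / error.size
        weights -= lr * grad                      # correct the inaccuracies
    return weights, losses

# Synthetic training data (an assumption, not real footage).
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 3))
hidden_rule = np.array([[1.0, 0.0], [0.0, -0.5], [0.25, 0.25]])
true_flow = features @ hidden_rule

weights, losses = train_motion_predictor(features, true_flow)
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.2e}")
```

The shrinking loss is the code-level version of the model “slowly” learning how fluids move: every pass nudges the weights so the next prediction lands closer to the real motion.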

The trained model could then be applied to still photos, where it would determine how each pixel should move on a frame-by-frame basis to create a short animation. But that created its own challenges: rivers and waterfalls are perpetual phenomena, so pixels need to be constantly replenished at the point where the flow begins. The researchers developed another technique called “symmetric splatting” that predicted not only the motion of a flow moving forward in time, but also its motion if time were moving backward. The result was two different animations that, when combined and intelligently blended, create a perpetual, believable motion that’s perfectly looped.
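
The forward-and-backward blending idea can also be illustrated with a short Python sketch. It is a heavy simplification and not the researchers’ implementation: it assumes a single row of pixels all moving at the same whole-pixel speed, and the function names `shift_with_holes` and `looping_frames` are made up for this example. Moving pixels forward leaves gaps where the flow begins, moving them backward leaves gaps at the other end, and a blend whose weights change over time lets each animation fill in the other’s gaps.

```python
import numpy as np

def shift_with_holes(row, offset):
    """Move every pixel by `offset` positions; vacated positions are holes."""
    n = row.size
    offset = int(np.clip(offset, -n, n))
    shifted = np.zeros(n)
    valid = np.zeros(n, dtype=bool)
    if offset >= 0:
        shifted[offset:] = row[:n - offset]
        valid[offset:] = True
    else:
        shifted[:n + offset] = row[-offset:]
        valid[:n + offset] = True
    return shifted, valid

def looping_frames(row, speed, num_frames):
    """Blend a forward-moving and a backward-moving copy of `row` into a loop.

    Assumes speed * num_frames <= row.size so the two sets of holes never overlap.
    """
    frames = []
    for t in range(num_frames + 1):
        alpha = t / num_frames
        fwd, fwd_ok = shift_with_holes(row, speed * t)
        bwd, bwd_ok = shift_with_holes(row, speed * (t - num_frames))
        w_fwd = (1.0 - alpha) * fwd_ok   # forward copy fades out, holes get zero weight
        w_bwd = alpha * bwd_ok           # backward copy fades in
        total = w_fwd + w_bwd
        blended = np.divide(w_fwd * fwd + w_bwd * bwd, total,
                            out=np.zeros(row.size), where=total > 0)
        frames.append(blended)
    return frames

row = np.arange(1.0, 9.0)  # eight made-up pixel brightness values
frames = looping_frames(row, speed=1, num_frames=4)
print(np.allclose(frames[0], frames[-1]))
```

Because the last frame matches the first, playing every frame except the last one on repeat gives a loop with no visible jump, which is the effect the article describes.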

At times the results are very close to being photo-realistic, but other times there are subtle visual clues your brain picks up on that raise red flags about the authenticity of what you’re seeing. One thing the researchers’ deep learning method ignores is how moving water and smoke can distort light. Ripples at the bottom of a waterfall distort reflections in complex ways that your brain is used to seeing, and when those distortions aren’t recreated accurately, their absence is very noticeable. The same goes for how mist or smoke obscures and distorts what’s behind it.

But these are issues that can often be overcome with more training on a larger database of source videos. In addition to the options on your smartphone that let you adjust and tweak the lighting of a photo after it’s been taken, you might one day be able to bring a static shot to life again.