MIT Is Teaching The Machines How To Make Pizza Based On A Single Photo

Generative adversarial networks (GANs) can do a lot of things; they're basically the type of machine learning used to generate realistic AI faces and deepfakes. But researchers at MIT are using a GAN to do the holy, blessed work of building a neural network to teach computers how to make pizza.

The study is titled “How to make a pizza: Learning a compositional layer-based GAN model,” and was spotted by ZDNet on

The so-called “PizzaGAN Project” is an attempt to “teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure.” In plain terms, because a pizza is built up in layers, the researchers set out to teach machines how to recognise different steps in cooking by dissecting images of pizza for individual ingredients.

So, a plain pizza would look one way. Adding toppings and ingredients would visually change the overall appearance. By identifying the visual changes, the neural network could, in theory, then reverse engineer the correct sequence of steps.

The researchers first created a synthetic dataset of about 5500 clip art pizza images. The next step involved trawling the #pizza hashtag on Instagram for real-life pizza photos. After filtering out ‘undesired’ images, the researchers were left with 9213 pizza photos. The PizzaGAN code then does two things.

First, it trains the machine to add and remove individual ingredients, such as pepperoni, and then create a synthesised image. Another model then detects the toppings that appear, and predicts the order in which they were added during the cooking process by calculating depth.
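To make the layering idea concrete, here is a toy sketch in Python. It is not the PizzaGAN model, which learns to generate these changes from photos; it only illustrates why treating a pizza as a stack of layers makes adding and removing a topping well defined. The `render` function, the tiny 2x2 grids, and the string "pixels" are all assumptions made up for this illustration.

```python
from typing import List, Optional

# A "pixel" is just an ingredient label here; None means transparent.
Pixel = Optional[str]
Layer = List[List[Pixel]]


def render(layers: List[Layer]) -> Layer:
    """Composite layers from bottom to top (hypothetical helper)."""
    height, width = len(layers[0]), len(layers[0][0])
    canvas: Layer = [[None] * width for _ in range(height)]
    for layer in layers:  # later layers sit on top of earlier ones
        for y in range(height):
            for x in range(width):
                if layer[y][x] is not None:
                    canvas[y][x] = layer[y][x]
    return canvas


# Made-up 2x2 "pizza": a cheese base plus two topping layers.
base = [["cheese", "cheese"], ["cheese", "cheese"]]
pepperoni = [["pepperoni", None], [None, None]]
olives = [["olive", "olive"], [None, None]]

# "Adding" a topping means appending its layer to the stack.
with_toppings = render([base, pepperoni, olives])

# "Removing" the olives means rendering without that layer,
# which reveals the pepperoni that was hidden underneath.
without_olives = render([base, pepperoni])
```

In this toy version the hidden pepperoni is already known. The hard part the researchers tackle is that a real photo only shows the top of the stack, so the model has to generate a plausible guess at what lies underneath.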

So if you have a photo of a pizza with mushroom, pepperoni, and olives, PizzaGAN would potentially be able to identify the three toppings, then see that the mushrooms were on top, and therefore deduce that they were added last. (You can play around with removing and adding ingredients, as well as cooking/uncooking the pizza on the PizzaGAN site.)
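The ordering step in that example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `infer_topping_order` and the depth scores are hypothetical, and it assumes each detected topping already comes with a single number where a larger value means it sits higher in the stack.

```python
from typing import Dict, List


def infer_topping_order(depth_by_topping: Dict[str, float]) -> List[str]:
    """Return toppings in the order they were presumably added.

    Assumption: a larger score means the topping sits higher on the
    pizza, so it went on later. Sorting from lowest to highest score
    therefore gives first-added to last-added.
    """
    return sorted(depth_by_topping, key=depth_by_topping.get)


# Made-up scores for the three toppings in the example above.
depths = {"mushroom": 0.9, "pepperoni": 0.2, "olives": 0.5}
order = infer_topping_order(depths)
print(order)  # ['pepperoni', 'olives', 'mushroom']
```

With these invented scores the mushrooms come out last, matching the example. How the real model arrives at its depth estimates is described in the paper, not here.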

The PizzaGAN works by first identifying ingredients, before predicting depth layers to reverse engineer the cooking process. (Image: MIT)

The results were pretty accurate, though in their paper the MIT researchers noted they got better results from the synthetic data set. In general, they found the experiments showed PizzaGAN could detect and segment pizza toppings, fill in what was supposed to be underneath, and infer the order with minimal supervision.

In the long run, one could imagine a neural network being able to scan a photo and spit out a pretty accurate recipe based on ingredients, how thoroughly it’s cooked, and even barely visible spices. As it is, the research is mostly just demonstrating an AI’s ability to tell apart the ingredients in a confusing pile.

While pizza is all well and good, some of us out here are lactose intolerant. To that end, the researchers concluded that the same approach used in PizzaGAN could be applied to other layered foods like burgers, sandwiches, and salads.

In a non-food context, the researchers noted it could also be applied to areas like fashion via digital shopping assistants. Think of a modern version of that smart closet in Clueless that Cher uses to pick out her outfit. I’m still waiting on that by the way.