There’s been loads of speculation about the HTC One M8’s strange new camera. With one large lens on the back and a slightly smaller one just above it, no one quite knew what to think. Is it just a 3D camera? Is one telephoto and one wide? Is it a light field camera like the Lytro? None of the above.
Turns out, it’s actually something entirely new (to mobile phones, anyway). And it could be a glimpse at the future of photography.
A few basics to get us started. As with last year’s HTC One, the HTC One M8’s main camera uses a 4 UltraPixel sensor with an f/2.0 lens. The lens is exactly the same as last year’s, but while the image sensor is based on last year’s design, it’s totally new (and comes from a new supplier). The revamped sensor should give daylight photos much better saturation and colour accuracy.
The real star of the show, though, is the two-camera design. Dubbed Duo Camera, the system deploys a second lens to get detailed range/depth information for every pixel. This allows the camera to take shots just as fast as a standard shooter would (in fact, it’s the fastest phone camera we’ve used), but then opens the door for you to do some serious tweaking of the images in post-production, making your subjects really stand out from the background.
To get all the details about what makes this camera so unique, we chatted with Symon Whitehorn, HTC’s Director of Camera Development. As the father of the Duo Camera, he had a lot to say.
Gizmodo: So, what is the overall intent with the camera on the new HTC One?
Symon Whitehorn: It’s our real first foray into what we believe is the next wave in imaging, which is information imaging. We’ve seen the early steps and concept platforms for computational imaging, and we’ve been talking about it for a long time as an industry. But we feel that now is the time to start putting products in people’s hands which we can actually start to learn from and develop. We genuinely believe that the next wave in computational imaging and information imaging will open up a host of new imaging opportunities.
We’ve had the film age, we transitioned to the digital age, and now we’re sort of transitioning to the information imaging age, where cameras start being less dumb. They’re becoming smarter and are starting to be more aware of what they’re seeing and can assign values to what they’re seeing. And that starts opening up a host of opportunities for us to A) manipulate images, but also B) extract the data and information from them.
Ultimately we don’t even know yet what all the use cases are for this, which is kind of why we want to put it in consumers’ hands. It’s a pretty exciting phase.
Giz: Right, so it seems we’ve seen a bit of this with Lytro, but putting computational imaging capabilities into a phone is a pretty different application.
SW: I really admire what Ren (Ng) did with Lytro. I think it’s a really worthwhile thing to pursue. I do feel like there’s a little bit of a solution looking for a problem with that technology. I think the shame there is that it was touted as solving “the focus problem,” and I think there really wasn’t a focus problem to begin with. Not really.
As we move forward we start to get into looking at what is the emotional content of an image? How can we extract all of the information and get more meaning out of it? What we really wanted to do is create a camera that not just captures memories but one that can actually augment your experience. I think that’s the future of imaging. How can technology help these smart cameras really augment the experience as opposed to just capture it?
Giz: So, in terms of the technical details, simply put, how does the Duo Camera actually work?
SW: Ha! Well, the basis of it is sort of old technology. You take a stereoscopic view of the world, much like your brain does — you see the world in 3D and you assign range and speed values to things based on your experience — and that’s essentially what this is doing. It’s a stereoscopic view of the world, but what it has to do in order to make it into something you can manipulate is to actually assign values to each individual pixel in the picture. So, basically, what the brain of the system is doing is assigning values to what it sees.
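[ed. For the technically curious, the stereoscopic principle Whitehorn describes can be sketched in a few lines of Python. This is a minimal illustration of textbook stereo triangulation, not HTC’s actual pipeline. The function names are ours, and the focal length and lens spacing in the example are made-up values, not Duo Camera specifications.]

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Textbook pinhole-stereo relation: depth = focal_length * baseline / disparity.

    disparity_px is how far (in pixels) a point shifts between the two views.
    Nearby objects shift a lot; distant objects barely shift at all.
    """
    if disparity_px <= 0:
        return float("inf")  # no measurable shift: treat the point as infinitely far
    return focal_length_px * baseline_m / disparity_px


def depth_map(disparities, focal_length_px, baseline_m):
    """Assign a range value to every pixel of a 2D grid of disparities."""
    return [
        [depth_from_disparity(d, focal_length_px, baseline_m) for d in row]
        for row in disparities
    ]


# Hypothetical numbers for illustration only (not HTC specifications):
# a 1000-pixel focal length and lenses spaced 2 cm apart.
example = depth_map([[10, 5], [2, 0]], focal_length_px=1000, baseline_m=0.02)
```

The key point is the inverse relationship: halving the disparity doubles the estimated distance, which is why range estimates get coarser for far-away objects.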
There are a few early benefits we already have from this, which is the shallow depth-of-field “bokeh” kind of effect we can create that’s typical of expensive glass. But there are going to be many more applications for it. We’ve just scratched the surface, and we’re not really there yet in terms of the pure accuracy of the speed measurements, but the more we develop this platform, there’s going to be some pretty intriguing data we’re going to be able to extract. So, yeah, I think the best analogy is that it sort of replicates what your own biological system does, but it’s silicon.
Giz: And in terms of the second, smaller camera, when we look at the photos we’ve shot, the image isn’t actually coming from that second lens, right? It’s just for depth information, is that correct?
SW: That’s correct. At present that other sensor and lens stack doesn’t actually add photo data to the image file, because we wouldn’t be able to load two UltraPixel sensors in there, and that wouldn’t make a lot of sense, though there are some things we’re investigating for the future. But yes, currently, that second camera stack is just there to add extra data to the content.
Giz: OK, so you take a photo and you realise “It’s not quite where I wanted the focus to be.” Two questions: Just how much can you correct the focus and have it look alright? And secondly, how does it do that?
SW: So, I mentioned it assigns a value to every pixel in the image — we know the distance of each pixel from the camera, that is. So we can then assign which part of the frame you’d actually want the focus on. It’s different from, say, the Lytro system which takes many exposures. That’s why movement isn’t a good thing with that system, and also you end up with one very low-resolution file that’s actually quite heavy because of all of the layers. We don’t do that. We basically assign information to the pixels, that way we can generate range-data which we can then interpret.
For instance, the bokeh effect, essentially what we do there is we know the range of those highlights in the distance, we know they’re further away from the face. We can actually combine that with facial recognition as well, and other proprietary computational stuff. I’ve got to be careful not to give away any secret sauce, but essentially we can identify the scene, identify that it’s a portrait, identify the range of objects within the scene, and then you can actually change the focal point. Basically it divides into zones currently, so the level of finite resolution of range is sort of a coefficient of the processing power and speed available. So we’ve had to make some pretty smart decisions about how to maximise the phone’s capabilities.
If I were using conventional glass I’d essentially bracket an area that I wanted in focus and drop out the background to get that lovely out-of-focus bokeh effect in the background. With this we’ve sort of made some assumptions, initially, prioritising the portrait and dropping out the background, but you can correct that if you want to, because the value is always assigned. You know how in Photoshop you have lenses and blur effects? We can basically apply those automatically to pixels with a value setting within a certain range. So, really it’s more adding layering, masking, and depth to the image than it is focal correction.
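[ed. The “zones” idea can be pictured as sorting every pixel’s range value into a handful of distance bands, then choosing which band stays sharp. The Python sketch below is our own simplified illustration; HTC has not published how its zones are defined, so the boundary distances and function names here are hypothetical.]

```python
import bisect


def assign_zones(depths_m, boundaries_m):
    """Map each per-pixel depth to a zone index.

    boundaries_m must be sorted ascending. Zone 0 is everything nearer than
    the first boundary, and the last zone is everything at or beyond the
    final boundary.
    """
    return [[bisect.bisect_right(boundaries_m, d) for d in row] for row in depths_m]


def refocus_mask(zones, focus_zone):
    """True where a pixel belongs to the zone chosen to stay in focus."""
    return [[z == focus_zone for z in row] for row in zones]


# Hypothetical boundaries at 1 m and 3 m give three zones: near, middle, far.
depths = [[0.5, 2.0], [5.0, float("inf")]]
zones = assign_zones(depths, [1.0, 3.0])
keep_sharp = refocus_mask(zones, focus_zone=1)
```

Because the zone values are stored with the photo, picking a different `focus_zone` afterwards only changes the mask. The shot itself never has to be retaken, which matches Whitehorn’s point that the result is closer to layering and masking than to true focal correction.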
Giz: So, basically, with the depth information added by the second camera, the app can then use that information to, say, emphasise the pixels that are between three and five feet away from the camera (when the image was taken) and de-emphasise or blur pixels that were, say, seven feet and beyond. Is that about right?
SW: Essentially, yeah, in gross terms that’s right. Basically the brain correlates the data between the two lenses, and assigns a value to that which you can then manipulate later. And like I said, we’re sort of at the embryonic phases of what we can do with this. So in future revs it will be more refined, it will have more detail, and as processing power increases, we’ll be able to do more with it as well.
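[ed. To make that exchange concrete, here is a rough Python sketch of depth-gated blurring: pixels inside a chosen range are left alone and everything else is softened. It assumes a greyscale image and a matching per-pixel depth grid, and it uses a plain box blur as a stand-in for whatever lens-style blur HTC actually applies. All names are ours.]

```python
def box_blur(image, radius=1):
    """Blur a 2D greyscale image (a list of rows) with a simple box filter."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average every pixel in the window, clipped at the image edges.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [image[j][i] for j in ys for i in xs]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def selective_blur(image, depth_ft, near_ft, far_ft, radius=1):
    """Keep pixels whose depth lies in [near_ft, far_ft]; blur the rest."""
    blurred = box_blur(image, radius)
    return [
        [
            image[y][x] if near_ft <= depth_ft[y][x] <= far_ft else blurred[y][x]
            for x in range(len(image[0]))
        ]
        for y in range(len(image))
    ]


# Tiny 2x2 example: two pixels at 4 ft (subject), two at 8 ft (background).
image = [[0, 100], [100, 0]]
depth_ft = [[4, 8], [8, 4]]
result = selective_blur(image, depth_ft, near_ft=3, far_ft=5)
```

In this toy example the two 4 ft pixels keep their original values, while the two 8 ft pixels are replaced by the average of their neighbourhood.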
And I think the really exciting thing we’re doing here (which I can just now announce) is that we’re going to be releasing an SDK for the camera. So hopefully people will go out there and do better things with it than we had even imagined. We expect to see some pretty amazing third-party apps come out of this.
Giz: And could you talk briefly about Smart Flash 2.0?
SW: Well, I think we did an excellent job on the M7 [ed. the original HTC One’s codename] with its flash metering. I think we all know one of the worst things about a flash is when it blows out the scene and basically destroys any recognisable features you might have had.
Dual flash itself is not really new; what’s different for us is the cleverness behind the algorithm which allows it to throw out the correct temperature and illumination for the scene. So, basically, we’re instantly assessing the environment (because the camera is awake before you hit the shutter button), and actually metering it appropriately based on the range from the subject and the ambient temperature of light to help you generate the best-looking skin tones. It’s really about subtlety. It’s about putting subtlety back into low-light flash imaging. Of course the UltraPixels help, but sometimes you just need better illumination. And it’s sampling at a very high rate, and we can produce over a thousand variations of colour temperature to match the scene.
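[ed. HTC hasn’t shared the algorithm, but the general idea of matching a two-LED flash to ambient light can be sketched as follows. This Python example assumes one warm and one cool LED and blends them linearly in mired (one million divided by kelvin), a common rough approximation for mixing colour temperatures. The LED temperatures, the number of steps, and the function name are illustrative assumptions, not HTC figures.]

```python
def led_mix_for_target(target_k, warm_k=2700.0, cool_k=5600.0, steps=1024):
    """Return (warm_fraction, cool_fraction) approximating target_k.

    The blend is computed in mired space and then snapped to one of `steps`
    discrete drive levels, since real hardware cannot mix continuously.
    Targets outside the warm-to-cool span are clamped to the nearest LED.
    """
    warm_m = 1e6 / warm_k
    cool_m = 1e6 / cool_k
    target_m = 1e6 / target_k

    # 0.0 means all warm LED, 1.0 means all cool LED.
    cool_fraction = (warm_m - target_m) / (warm_m - cool_m)
    cool_fraction = min(1.0, max(0.0, cool_fraction))

    # Quantise to the available number of discrete mix levels.
    cool_fraction = round(cool_fraction * (steps - 1)) / (steps - 1)
    return 1.0 - cool_fraction, cool_fraction


# Example: ambient light measured at a hypothetical 4000 K.
warm_share, cool_share = led_mix_for_target(4000.0)
```

A real system would also scale the overall flash power by the distance to the subject, as Whitehorn describes. That step is left out here to keep the sketch focused on colour matching.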
Giz: Great, anything you’d like to add?
SW: I think the biggest thing is what I was saying before, the opportunity is there for the community to tell us what they want to do with this new imaging technology. We want to drive a lot more photographic performance, obviously, but I think there are other applications and other uses for the technology, and it will be really intriguing to see what will be done with it. We basically want to build this into a really holistic system, so we can actually take this content and publish stuff. At the end of the day the reason you shoot is to share, right? So, we’re interested in how we can make it more compelling to share and enjoy.
These are certainly some lofty goals, and it’s always fun to see companies put bleeding-edge technology in consumers’ hands just to kind of see what will happen. To read our impressions of what it’s like to actually use the new camera, head on over to our full HTC One (2014 edition) review.