In January 2019, when China Central Television, the largest broadcast network in the most populous nation in the world, aired a special to celebrate the Lunar New Year, the hosts welcomed four life-sized “personal artificial intelligences” to share the stage with them.
Called PAIs, they were three-dimensional holographic replicas of the presenters that moved, spoke, and sang to the delight of the cheering live audience. The program was viewed some 1.8 billion times. One of the most-watched TV shows in the world had been hosted by AI avatars.
The company behind those avatars is the Pasadena-based ObEN. This startup, with its 100-plus employees, is betting that in the future, everyone will want their own PAIs: to digitally try on clothes, to interact with friends, to keep the kids company while you’re away on a business trip.
In that future, celebrities will create PAIs to interact with fans to promote their latest films and albums. Teachers and doctors will have PAIs that offer personalised services to their students and patients. When you go to the mall, PAIs will pop up on the interactive screens there, enticing you to buy stuff.
ObEN describes its ambitious vision of the future as “personal AI for all.” And ObEN is far from alone, of course. Investors, tech giants, and even governments are betting big on lifelike digital avatars—between Facebook’s push to port your likeness into VR, the eerily lifelike AI news anchors put on the air by Xinhua, China’s state-run news agency, and the burgeoning CGI celebrity simulacra scene in Hollywood, there’s a newfangled interest in the (potentially vastly profitable) art of porting people’s digital likeness to our screens.
Cyberculture has revolved around avatars for decades, but the avi-to-avi future pursued by ObEN and others promises a level of representation saturation hitherto unimagined by even the most fervent cyberpunks. Would a world filled with PAIs really beget more convenience and entertainment? Or would it further accelerate already ascendant trendlines of the crowding and hyper-commercialisation of our digital spaces?
To better understand this new frontier of companion AI, and both its utopian and dystopian implications, I headed to Pasadena, to ObEN’s HQ, to become the first non-celebrity civilian to get my own PAI.
AI-generated avatars are a growing trend, especially in China. What sets ObEN apart from the competition, the company says, is the lengths to which it goes to personalise its product. In order to experience this firsthand, I sat for my personal PAI-ification in Pasadena one long afternoon. The 3D face scan was fairly painless—I essentially sat in a chair and made faces when prompted, and 3D mapping technology rendered a digital model of my mug.
The audio session was the gruelling part. Sitting in a recording studio in the ObEN office space, I intoned line after varyingly comprehensible line, like I was recording a spoken word poetry album of prose poems written by Siri. But I got through it (pretty quickly, apparently) and at the end ObEN had a sample of my voice.
I returned to the studio a week or so later so they could photograph me wearing another shirt, and that was it—my raw PAI materials were ready for processing.
“The creation of a PAI is based on the concept of ‘models,’” ObEN’s chief engineer, Dr. Mark Harvilla, tells me. There are just two major components of a PAI—its appearance and its voice.
“The appearance of a PAI is comprised mainly of a user-specific face model,” Harvilla explains, adding that “for now, a general body shape is used across all PAIs, but things like height, weight, body type, and clothing can be customised by the end user.” The PAI’s speech, meanwhile, is generated from a “user-specific voice model.”
Both models are built using a machine learning process called “adaptation,” or “fine tuning,” Harvilla says. “To create the 3D head of a PAI, a so-called base model is chosen, which most closely represents the given user’s appearance. This base model is then adapted to the specific user; the adaptation process captures nuances of the facial structure and appearance, closing the gap between the base model and the fully representative PAI.”
“Voice model creation follows a similar pipeline,” he says, “wherein a language-specific base model is chosen, and from the relatively small sample of audio recordings of the user, fine-tuned to reflect idiosyncrasies of the user’s voice.”
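As a rough illustration of the two-step idea Harvilla describes (pick the closest base model, then adapt it to a small user sample), here is a toy sketch. It is not ObEN's pipeline: the three-number parameter vectors, the names `choose_base` and `adapt`, and the plain gradient-descent update are all assumptions made for the example, standing in for what would in practice be deep neural networks.

```python
from statistics import fmean

def mean_vector(samples):
    """Average the user's samples, column by column."""
    return [fmean(column) for column in zip(*samples)]

def squared_distance(a, b):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def choose_base(base_models, samples):
    """Return the name of the base model closest to the user's average sample."""
    target = mean_vector(samples)
    return min(base_models, key=lambda name: squared_distance(base_models[name], target))

def adapt(base, samples, lr=0.5, steps=20):
    """Fine-tune a copy of the base parameters toward the user's samples.

    Each step is gradient descent on half the mean squared error between
    the parameters and the samples; the gradient is (params - sample mean).
    """
    target = mean_vector(samples)
    params = list(base)
    for _ in range(steps):
        params = [p - lr * (p - t) for p, t in zip(params, target)]
    return params

# Hypothetical base models and a small, made-up user sample.
base_models = {"base_a": [0.0, 0.0, 0.0], "base_b": [1.0, 1.0, 1.0]}
user_samples = [[0.9, 1.2, 0.7], [1.1, 1.0, 0.9]]

chosen = choose_base(base_models, user_samples)
adapted = adapt(base_models[chosen], user_samples)
```

Starting from the nearest base model means the adaptation step only has to close a small gap, which is one plausible reason a relatively small sample of recordings or scans can be enough.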
ObEN’s neural nets can currently give voice to PAIs speaking English, Chinese, Japanese, and Korean. “Both technology suites,” Harvilla assures me, “utilise state-of-the-art deep learning algorithms.”
For me, the process of bringing my AI-ified avatar to life took a couple of weeks, though ObEN was busy finishing various projects, and I’m not sure just how much of a priority making a wacky singing high-res avatar for a journalist was. Did I not mention my PAI was going to sing?
“I started this company,” ObEN’s CEO Nikhil Jain says, “when I realised I was travelling a lot and my kids were missing me back home.”
Jain and ObEN co-founder Adam Zheng, ObEN’s COO, were both veterans of the tech industry, and they’d often find themselves talking about how they had to travel frequently for work, leaving too little time to spend with their children. The pair also frequently found themselves discussing the emergence of AI technology and the increasing personalisation of AI. Apparently, the topics converged, and the idea of PAIs materialised.
“I thought—what if my kids had my PAI back home?” Jain says. That way, his kids could interact with an avatar that looked, sounded, and behaved like their father while he was away for days or weeks at a time.
As Jain and Zheng set out looking for funding, though, investors pointed them towards the entertainment space. Chinese tech conglomerate Tencent, which provided a significant portion of ObEN’s $US23.7 million ($34 million) in funding, seemed particularly interested in driving ObEN towards celebrity PAIs. So ObEN started out targeting the entertainment industry.
“When we created technology,” Jain says, “we thought, ‘how do we get this out to a newer audience?’ And we felt like the best way to get that out was using the power of celebrities. Thanks to these investors, we were able to team up with the Spice Girls of China.”
Jain is referring to SNH48, a massive ‘idol group’ in the J- or K-pop mould, filled with rotating members, where 20-year-old singers join and get voted off.
(SNH48 itself is essentially a massive startup; according to Quartz, investors have dumped more than $217 million in the venture, which really took off after China banned Korean entertainment imports after the neighbouring nation began constructing a missile defence shield. The world is a weird place.)
Regardless, Tencent brokered a deal that saw ObEN create PAIs for one of SNH48’s new songs and the accompanying music video. It was heralded as “the world’s first commercially released song co-starring human singers and their 3D AI avatars,” when it aired in December 2018.
ObEN developed PAIs for the pop stars that could both speak (in multiple languages, no less) and sing, and so they did. “Our AI takes the regular speaking voice and converts it into the singing voice,” Jain says. “We want to make it like a Turing test: is it AI singing, or you singing?”
The PAIs show up in the video above, and their voices were recorded in the chorus of the single—another ostensible first in the realm of entertainment-focused AI.
“Our personal AI technology has great potential in the entertainment industry,” Zheng said at the time. “Let’s say you have an actor who stars in a movie. A producer may want him to sing the ending theme song, which he is not good at. We can make his AI avatar sing for him.”
Next, ObEN teamed up with CCTV for the Spring Festival Gala, to celebrate the Lunar New Year, and that’s where the PAI-ified TV hosts came in. So ObEN has designed PAIs for two of China’s major cultural institutions, and kicked off the year with a bang, earning a nice amount of exposure for its signature PAI sidekicks. Now, the company is trying to figure out what else people might want to do with their PAIs.
My PAI, it turns out, has a better singing voice than I ever will.
Personally, it’s hard for me to place where, exactly, this lands with regard to the uncanny valley.
I found it deeply unsettling at first, then just a little weird, then sort of funny, then ridiculous, then a little unsettling again. So it’s kind of the effect of staring into the mirror while you’re stoned for too long, if the mirror were a screen undertaking the realtime rendering of your face in triple-A PS4 game-calibre graphics.
I found it a little hard to look at, and as of now, I’m not sure how much more I want to see of PAI-me. But it’s also only a limited clip, with no interactivity, so I can’t really engage with it or see what else it can do.
“It certainly doesn’t seem human,” Kelly Bourdet, the editor in chief of Gizmodo, said. “It looks like an uncanny valley creep show avi but one that looks closer to you than to any other human on earth.”
“Your eyes are freaking me out,” a friend said. “Sound is all off, to me at least,” said another. “And you look like you tried but failed to audition for a Korean boy band.”
“Wow .... very cool... (and a little unsettling at the same time.),” my mum texted when I sent her the video. “Pretty close... when I first looked at the picture I wasn’t sure if it was you or someone else…until you started talking and said who you were.”
Strangely, it was my wife who was the most sold on the rendering. “Wow!” she said. “The teeth are weird, but other than that…”
Which was sort of what I was expecting—most agreed it looked well enough like me, and most got a kick out of the thing, which is precisely what it was designed to do, I think. But I wonder if one challenge these PAIs will face is that the uncanny valley will inevitably seem wider when you’re measuring your own likeness across it.
“For these avatars to be useful the uncanny valley has to be passed,” Jain says. “The voice has to have some emotion to it. The four different expressions we are able to create (joy, anger, sadness), to have the corneas move in conjunction with how your lips are moving, these are things that will reduce the uncanny valley.”
But renderings of our own faces come bundled with more insecurities and hang-ups than any other, I’d imagine, and we judge them and engage them more skeptically, more harshly.
That’s why, perhaps, some of ObEN’s competition, like SoGo, chooses to render personal avatars more cartoonishly. “The difference between us,” Jain says, “is that it’s an extremely personalised version of you.”
My PAI was decidedly not an example of what a typical version will look like—for one thing, I can’t interact with it; it’s basically a demo built using the company’s AI and my training inputs. For another, it’s something of a premium model. Later this year, ObEN says, it will release an app that will let users create their own lower-resolution PAIs by taking smartphone photos of their face.
These PAIs will mainly be toys; you can dress your digitised mini-self in different outfits and make yourself dance. It’s a way, Jain says, to get people familiarised with the idea of having a constant AI companion on hand. Now, for celebrity PAIs, and the PAI ObEN made for me, an in-depth, sometimes multiple-appointment process is necessary to get a good audio sample.
“For celebrities the requirement is to have an extremely high resolution,” Jain says. “First they go through a 3D scan, then full voice recordings. We outsource that work to a studio of their choice.” Once the ObEN team is satisfied with the sample, it then takes another few days to generate the PAI.
For regular users though, smartphone resolution will have to do. “From one selfie you can make your PAI,” Jain says. “It may not be as high resolution as Taylor Swift’s, but it’s going to be looking like you. Singing in a voice like yours. We believe that to make your PAI should always be free.”
It’s early yet, but the cyberpunk would point out two potentially diverging classes of PAIs here: high-res, glamorised versions for the rich, and freemium PAI models for the plebes. But I digress. Speaking of cyberpunk, ObEN imagines users bringing those PAIs to the mall, where they might interact with mall-PAIs, the digital concierges there.
“Your favourite celebrity might pop up when you enter the mall, and tell you where the jacket you’re looking for is,” Jain says, describing what could be a scene out of Minority Report. “Any consumer can have a PAI, get the full shopping experience.”
Beyond commerce, ObEN is also experimenting with healthcare PAIs, and has entered into a partnership with the Georgetown-Howard Universities Center for Clinical and Translational Science (GHUCCTS) and MedStar Health Research Institute to create Tara, an avatar designed to interact with patients with heart disease on their tablets.
The institutes are carrying out a 5-year study to see if interacting with Tara will help improve patients’ recovery times and propensity for taking medication and so forth. “The PAI will monitor them, give them comfort,” Jain says, “let them know someone is always there with them.”
“In this case it’s very nonclinical — it’s a virtual nurse — talking through this proposed cardiac surgery or that,” Jain says. “It’s recording the responses, and it can do follow ups.” They also imagine teacher PAIs. “Kids learn faster when the character,” Jain says with a laugh, “a cartoon character teaching them; my kids respect their teacher more than me—if it was their teacher, talking to them, teaching them a lesson, they might learn more.”
If ObEN has its way, and current trends — at least in China, Korea, and Japan — accelerate, then we may soon find ourselves in a world festooned with PAIs, emissaries of bystander and celebrity likeness, popping out of our phones, glasses, malls, bus stops, desktops.
There’s an energy to the company that I don’t often encounter in tech startups; an easy, unforced good-naturedness, a willingness to embrace weird frontiers. Maybe that’s in part because it’s one of the only companies I’ve encountered lately where I’ve sat down at the conference table and been the only white guy in the room. It’s a diverse staff, with diverse leadership, and our talk was refreshingly free of the buzzword drip you drown in across Silicon Valley.
Jain and co delight in talking about the theoretical uses of PAIs, like a Taylor Swift PAI that users can ask to sing for them, or for biographical info, and she’ll respond with personal knowledge gleaned from a long-ago training session. That example came up more than once, and it and the other entertainment-based applications are still the ones that seem most likely to resonate.
But it also brings us to one of the starker concerns I see arising with any hypothetical mass popularisation of ObEN’s technology: who controls the likeness once it’s out in the world? For one thing, the contract that ObEN’s face-scanner asked me to sign seemed rather onerous. Here’s the opening clause:
“I hereby grant to Oben, Inc, and its successors, assigns and licenses (collectively, “Company”) the irrevocable right and permission to (i) take photos of me, and (ii) to reproduce and use the photos, my facial expressions, likeness, poses, and voice (the “Images”) for all lawful purposes and without further compensation of any kind. Company may edit, alter, copy, exhibit, publish and otherwise use these images and I expressly waive any right to inspect or approve the finish product when any Image appears.”
I’m no legal expert, but it seems to me that ObEN would own the rights to my face and PAI in perpetuity, and they’d even be free to profit off my likeness without sharing any of the revenue. When I declined to sign the contract, another rep said it wasn’t meant for me anyway, and we proceeded without it. I’m not sure who, if anyone, has signed away their digital face rights.
For his part, Jain says, “That aspect of the law is important. We work very closely with agencies to make sure their PAI is only used to create content that they are in control of. Once the data is collected, they are set for life. Their estates could own the PAI and keep it long after the person is gone.” He says he takes privacy very seriously, and is looking at Washington State’s data storage laws as a model. It’s also why he’s exploring the blockchain, he says, as a means of storing data in a distributed fashion.
“Data privacy is a huge topic for us,” he says. “Because you’re storing it. That’s when blockchain came out as one feasible way to recall your data.” They will also take proactive measures to combat deepfakes, he says.
But ultimately, it all comes back to his children. I may not be convinced that my kids would want to spend time with PAI-dad, or that I’d want them to. But Jain’s are.
“My kids have been our test pilots,” he says. “The first one was me just talking to them then dancing on the couch in AR. I remember very clearly the first time I tried it on them—the weird part was not seeing their dad, it was seeing their dad on their hand, dancing. Once they got over the surprise, they loved it.”