British Telecom is putting Siri through a kind of biomedical training. Last month, at a conference in Boston, BT’s Bas Burger used Siri to launch a mock experiment that analysed data on the new cloud service the company built specifically for life sciences R&D.
Talking into his iPhone, Burger asked Siri to crunch some numbers on BT’s cloud using a common research tool called Pipeline Pilot, and after authenticating Burger, Siri complied. Moments later, Burger asked her for a status update, and she told him the experiment was complete, offering the results on his phone (see video below).
This Siri-in-lab-coat app was born during a hothouse session at BT in September that aimed to make it easier for scientists to run online experiments, and it was put together by developers at BioTeam, a Cambridge, Massachusetts-based consulting firm that serves life sciences research. BioTeam co-founder and director of technology Chris Dagdigian says that the app is merely a proof-of-concept. “It proves that we weren’t completely insane when we were thinking about this idea in a room in the U.K.,” he says. But it’s indicative of a larger movement across the life sciences field to harness the power of cloud computing.
BT’s service was one of three cloud offerings for life scientists that debuted at last month’s Bio-IT World conference. The other two came from Chinese genome sequencing centre BGI and sequencing equipment-maker Illumina. And then there’s Amazon Web Services, the general-purpose cloud that has already found a home in life sciences. At the conference, pharmaceutical giants Novartis and Bristol-Myers Squibb detailed how they’re using AWS to improve the drug development process, and the service is home to the 1000 Genomes Project, which now offers over 1,700 human genomes to genetics researchers across the globe.
The problem isn’t so much the raw volume of sequencing data this produces. Sequencing instruments have become very efficient. The problem is that researchers are doing more with the data and generating large, unpredictable amounts of downstream data, says Dagdigian. “It’s very easy for me to model the technical requirements of an instrument. It’s much harder for me to model the storage requirements of a Ph.D. scientist,” he says.
But the cloud can help, given the elastic nature of infrastructure-as-a-service offerings such as AWS. The rub is that using the cloud means moving huge volumes of data. Researchers routinely ship hard drives containing terabytes of data to Amazon. Handling growing numbers of hard drives is becoming a logistical burden, says Dagdigian, but improvements in networking technology are making it easier to move data online. In March, BioTeam participated in a data transfer test that sustained 700 megabytes per second for more than seven hours. A lab could ship 60 genomes a day at that rate, he says.
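The arithmetic behind that estimate is easy to check. The sketch below is a back-of-the-envelope calculation, not a description of the test itself: it assumes decimal units (one terabyte is a million megabytes) and a link that holds the 700 MB/s rate for a full 24 hours. The figure of roughly one terabyte per genome is not stated in the test results; it is simply what a 60-genomes-a-day estimate implies.

```python
# Back-of-the-envelope check of the transfer figures quoted above.
# Assumptions (not from the test itself): decimal units, and a rate
# held for a full 24 hours rather than the seven-plus hours measured.

RATE_MB_PER_S = 700          # sustained rate reported for the March test
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000        # decimal terabyte


def terabytes_moved(rate_mb_per_s: float, hours: float) -> float:
    """Return the data volume in terabytes moved at a constant rate."""
    return rate_mb_per_s * SECONDS_PER_HOUR * hours / MB_PER_TB


test_run_tb = terabytes_moved(RATE_MB_PER_S, 7)    # the seven-hour test
full_day_tb = terabytes_moved(RATE_MB_PER_S, 24)   # hypothetical full day
implied_tb_per_genome = full_day_tb / 60           # 60 genomes a day

print(f"Seven-hour test: {test_run_tb:.2f} TB")
print(f"Full day:        {full_day_tb:.2f} TB")
print(f"Implied size:    {implied_tb_per_genome:.2f} TB per genome")
```

Under those assumptions, a full day at that rate moves about 60 terabytes, which matches the estimate if each genome's data set weighs in at about a terabyte.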
Avoiding a data overload crisis, however, is going to require order-of-magnitude improvements in the process, says Dagdigian. One potential saviour is a new data-compression format dubbed CRAM that compares sequence data against a reference genome and only includes the differences. “It would be cool if storage arrays could natively do CRAM compression and decompression on-the-fly,” he says.
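The idea of storing only the differences from a reference can be illustrated with a toy example. The sketch below is not the CRAM format and does not use any CRAM library; the function names and record layout are hypothetical, and it assumes each read lines up against the reference with no insertions or deletions.

```python
# Toy illustration of reference-based compression. This is NOT the CRAM
# format; it only shows the idea of storing differences from a reference.
# All names here are hypothetical, and reads are assumed to align to the
# reference without insertions or deletions.

def encode_read(reference: str, read: str, position: int) -> dict:
    """Store a read as its position, length and mismatching bases only."""
    window = reference[position:position + len(read)]
    if len(window) != len(read):
        raise ValueError("read runs past the end of the reference")
    diffs = [(i, base) for i, (ref_base, base) in enumerate(zip(window, read))
             if ref_base != base]
    return {"pos": position, "len": len(read), "diffs": diffs}


def decode_read(reference: str, record: dict) -> str:
    """Rebuild the original read from the reference plus stored differences."""
    start = record["pos"]
    bases = list(reference[start:start + record["len"]])
    for offset, base in record["diffs"]:
        bases[offset] = base
    return "".join(bases)


reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
read = "CGTTAGACGAT"  # matches the reference from position 5, one mismatch

record = encode_read(reference, read, 5)
restored = decode_read(reference, record)
print(record)
print(restored == read)
```

Here an 11-base read shrinks to a position, a length and a single mismatch, and decoding it against the same reference recovers the read exactly. The saving depends on how closely the reads match the reference.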
And then there’s Siri, who would in theory help researchers move data to the cloud and analyse it once it’s there. Today, Siri is not a lab assistant; she just plays one on YouTube. Apple has yet to release a software development kit or public API for Siri, so in building their app, BioTeam developers Bill Van Etten and Adam Kraut turned to a Siri proxy server kludge built by a man named Pete Lamonica.
Freely available on GitHub, the proxy listens to the traffic travelling between your iPhone and Apple’s Siri servers, intercepts your custom commands, and routes them to the appropriate application. Apple could shut the thing down at any time with a new version of its software – though it hasn’t so far. A fully automated life sciences cloud is still a ways away. But it’s coming.