The new file format is called SLOW5, but it is far from slow. It has been developed specifically for analysing nanopore sequencing data, which provides a broader view of genetic variation than standard DNA sequencing.
Nanopore sequencing allows for real-time analysis of long DNA and RNA fragments by monitoring nucleic acids as they pass through a protein nanopore. It is typically used to help develop specialised treatments for patients with cancer and other diseases, with the goal of delivering results quickly and on portable equipment. The problem is that nanopore sequencing usually produces huge amounts of data.
This new file format speeds up the process of analysing collected DNA data by about 30 times, which could result in much faster treatment recommendations.
The software for working with the SLOW5 file format has been made available online and is open source. UNSW says it has been downloaded more than 1,000 times over the past few weeks.
Until now, the kind of data SLOW5 is designed for has been stored in a file type called FAST5 (no relation to the similarly named Fast and the Furious movie).
FAST5 files would often be as large as 1.3 terabytes, making them quite difficult to store, read and transfer. Because of this, it would take about two weeks for computers to analyse the data in its entirety.
Why is FAST5 so slow? Because the data can’t be accessed in parallel. The format is based on a file design from the 1990s, back when computers were much more basic.
But SLOW5 solves this problem, cutting down analysis time to around 10.5 hours for the exact same information. It also reduces the size of the files significantly.
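As a rough check, the two figures quoted above (about two weeks with FAST5, around 10.5 hours with SLOW5) can be compared directly. The short Python calculation below assumes "two weeks" means 14 full days of computing, which is an interpretation rather than a number from the paper.

```python
# Rough check of the speed-up implied by the figures quoted in the article.
# Assumption: "about two weeks" is taken as 14 full days of compute time.
HOURS_PER_DAY = 24

fast5_hours = 14 * HOURS_PER_DAY   # about two weeks with FAST5
slow5_hours = 10.5                 # around 10.5 hours with SLOW5

speedup = fast5_hours / slow5_hours
print(f"Approximate speed-up: {speedup:.0f}x")
```

Under that assumption the ratio comes out at roughly 32, which is in line with the "about 30 times" figure.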
“You can think of this like trying to dig a very big hole with ten people, but there is only one shovel they have to share round,” says Dr Hasindu Gamaarachchi, the lead author of the paper.
“That’s how it used to be with FAST5. But with SLOW5 everyone gets their own shovel and they can all dig at the same time and do the job much faster.”
SLOW5 is purpose-built for this job, allowing for parallel computing and quicker data analysis.
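The shovel analogy can be illustrated with a toy example. The Python sketch below is not the SLOW5 format or its software: the file layout, the index and the function names are all hypothetical, chosen only to show the general idea of several workers each opening the same file and jumping straight to their own record.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_toy_file(path, records):
    """Write records back to back; return an index of (offset, length) per record."""
    index = []
    with open(path, "wb") as f:
        for rec in records:
            index.append((f.tell(), len(rec)))
            f.write(rec)
    return index


def read_record(path, entry):
    """Open a private file handle (the worker's own 'shovel') and seek to one record."""
    offset, length = entry
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def read_all_parallel(path, index, workers=4):
    """Fetch every record using a pool of workers; results keep the index order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda entry: read_record(path, entry), index))


# Hypothetical stand-ins for sequencing reads.
records = [f"read_{i}:signal-data-{i}".encode() for i in range(10)]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_reads.bin")
    index = write_toy_file(path, records)
    recovered = read_all_parallel(path, index)

print(recovered == records)  # True: every record was read back intact
```

Because each worker has its own file handle and knows exactly where its record starts, no worker has to wait for another to finish with a shared one.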
“SLOW5 has removed one of the major bottlenecks to the use of nanopore sequencing, a new technology that has countless potential applications in clinical genetics, agriculture and other bioscience domains,” added Dr Ira Deveson, a co-author of the paper and the head of genomic technologies at the Garvan Institute.
“With the development of SLOW5, our ability to process nanopore sequencing data can now keep up with our ability to generate it. This will open the door to many new applications in medical science for this exciting, emerging technology.”
The research on SLOW5 has been published in Nature Biotechnology.