DNA research points to a new future for data storage


Readers of a certain age will remember buying much-loved albums on multiple formats. Perhaps first you had the vinyl version, then maybe a cassette for playing in the car and later a CD for added digital clarity. How about the DNA version? Earlier this year, Massive Attack, the British band, encoded their 1998 album Mezzanine into DNA to celebrate its 20th birthday.

Scientists from ETH Zurich compressed the album to 15 megabytes – small enough to be converted into 920,000 short strands of DNA. Those molecules are then inserted into 5,000 tiny glass spheres, so small that they are invisible to the naked eye. The beads will be stored in water and at any time the DNA can be taken out and converted back into music.

Aside from this being a novel way to mark the anniversary of an album, the Massive Attack project is part of a series of experiments into the possibilities of DNA as a data storage medium.

Last year, Microsoft, in partnership with the University of Washington and Twist Bioscience, encoded 200 megabytes of data into DNA, including the Universal Declaration of Human Rights in more than 100 languages, a high-definition music video by the band OK Go, and a CropTrust database of the seeds stored in the Svalbard Global Seed Vault.

DNA, or deoxyribonucleic acid, is nature’s way of encoding instructions for building every known life form. Animals, plants, bacteria and other life forms all grow according to the recipes embedded in DNA. Researchers have been exploring its potential for data storage for a little over 10 years.
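To make the idea of writing digital data into DNA a little more concrete, here is a minimal sketch that maps every two bits of a file onto one of the four bases A, C, G and T. This mapping is an assumption chosen for illustration, not the scheme used by ETH Zurich or Microsoft; real systems add error correction and avoid sequences that are hard to synthesise or read. The function names are hypothetical.

```python
# Illustrative only: a naive two-bits-per-base mapping (an assumption,
# not the encoding used in the projects described in this article).
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            # Shift in two bits per base; index() raises on invalid letters.
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


strand = bytes_to_dna(b"Mezzanine")
restored = dna_to_bytes(strand)
```

Each byte becomes four bases, so the nine-character title turns into a 36-base strand, and decoding reverses the process exactly.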

As a storage medium, its advantage is that it packs a lot of information into a very small package. In theory, just one gram can store 455 exabytes of data – and one exabyte is the equivalent of one thousand petabytes. All the world’s data could, it’s estimated, be stored in 10 tons of DNA, which would fit in the back of a semi-trailer.

And it’s incredibly durable. While the lifespan of a CD is estimated at around 30 years, the DNA version of Massive Attack’s album is expected to last for hundreds or even thousands of years. In perfect conditions, DNA can last for hundreds of thousands of years.

Finally, it will never be obsolete; at least, not until we are. Finding a cassette player to listen to your old copy of Mezzanine might be a challenge today. Storage media go out of fashion as technology moves on, which means retrieving old data can be tricky. DNA is different: scientists have an obvious interest in continuing to study it, so the tools for reading it are unlikely to disappear.

There are problems, of course. Reading and writing the data is an expensive process. Current technology would deliver DNA storage devices at between $2 billion and $4 billion per terabyte. Clearly that isn’t practical, but prices are falling all the time. The synthetic DNA used to store data costs around $0.05 per base pair, but just a few years ago the cost would have been $1 per base pair. Costs need to come down further, but researchers are confident that they will.

The other thing that needs to improve is speed. Microsoft was able to write data at 400 bytes per second. That means the 15MB version of the Massive Attack album would take more than 10 hours to encode. Reading is even more complicated because a specific file needs to be found within the larger data. Imagine trying to find just track 6 on Mezzanine, for example. The data must be sequenced and decoded in bulk to find the required information.
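The write-time arithmetic is easy to check. The sketch below uses the 400 bytes per second quoted for Microsoft and the 15-megabyte size given earlier for the compressed album; treating a megabyte as one million bytes is an assumption, and the function name is hypothetical.

```python
def write_time_hours(size_bytes: int, bytes_per_second: float) -> float:
    """Return how many hours it takes to write size_bytes at the given rate."""
    return size_bytes / bytes_per_second / 3600


# Assumption: decimal megabytes (1 MB = 1,000,000 bytes).
album_bytes = 15 * 1_000_000
hours = write_time_hours(album_bytes, 400)
print(f"{hours:.1f} hours")  # prints "10.4 hours"
```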

But that last problem is changing too. Earlier this year, Microsoft’s researchers announced that they had developed a way to speed that process up. A University of Washington press release said: “They also present their system for random access — that is, the selective retrieval of individual data files encoded in more than 13 million DNA oligonucleotides. While this is not the first time researchers have achieved random access in DNA, the UW and Microsoft team have produced the first demonstration of random access at such a large scale.”
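The press release does not spell out the mechanism, but one way selective retrieval can work in principle is to tag every strand belonging to a file with a short address sequence, so that only strands carrying that address need to be pulled out and decoded. The toy sketch below models this with string prefixes; the address length, the sequences and the function names are all hypothetical, and a real system would do the selection chemically, not in software.

```python
from typing import Dict, List

# Hypothetical: every strand starts with a 4-base address identifying its file.
ADDRESS_LENGTH = 4


def build_pool(files: Dict[str, List[str]]) -> List[str]:
    """Prefix each payload with its file's address and mix all strands together."""
    return [
        address + payload
        for address, payloads in files.items()
        for payload in payloads
    ]


def retrieve(pool: List[str], address: str) -> List[str]:
    """Return only the payloads of strands whose address tag matches."""
    return [
        strand[ADDRESS_LENGTH:]
        for strand in pool
        if strand.startswith(address)
    ]


pool = build_pool({
    "ACGT": ["AAAA", "CCCC"],  # strands of one file
    "TGCA": ["GGGG"],          # strands of another file
})
track = retrieve(pool, "TGCA")
```

Here only the single strand tagged `TGCA` is returned, without decoding the rest of the pool.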

This breakthrough, while significant, is just another step on the path. It will take some time before DNA storage is commercially viable. But it is not inconceivable that one day your music collection – along with all your other cloud documents – will be encoded using the tools of life itself.

Written by Shane Richmond (Guest)


Shane Richmond is a freelance technology writer and former Technology Editor of The Daily Telegraph. You can follow him at @shanerichmond

