Biology Technology: DNA as Compact Data Storage

Sometimes, it seems like the trajectory of science and technology is completely incompatible with the popular media predictions and depictions made thirty or fifty years ago. (This author may still be bitter that there are no flying cars like in Back to the Future or Blade Runner.) Other times, though, the present technological possibilities may feel eerily like a science fiction movie.

Microsoft’s exploration of DNA as a data storage technology qualifies as sci-fi come to life. DNA as a possible storage medium is not breaking news or a recently invented idea. DNA already serves as one of nature’s data storage centers, and more than three years ago, scientists successfully used DNA to store binary data (in this case, MP3 files).

What brings scientists back to DNA’s storage potential is the flood of new data driven by the rising popularity of the Internet of Things, on top of an already large volume of data that grows daily. Microsoft has partnered with the University of Washington to further research the possibilities of DNA as a compact storage medium.

Microsoft purchased ten million strands of DNA from biology startup Twist Bioscience for these investigations. The data density of DNA is significantly higher than that of the conventional storage systems currently on the market, with 1 gram of DNA able to represent close to 1 billion terabytes (1 zettabyte) of data. DNA is also remarkably robust; the key to maintaining that robustness is keeping the DNA in a cool, dry place.

DNA containing the code of the woolly mammoth has been found in such good condition that bringing back a woolly mammoth is a possibility, even though the animal has been extinct for 10,000 years. (Hopefully, scientists will heed the lessons taught by Jurassic Park.) In contrast, magnetic tape, the best long-term data storage option today, lasts only a few decades before starting to degrade.

Researchers from Microsoft and the University of Washington have already made progress: earlier this year, they succeeded in storing 200 MB of data on strands of DNA. Those 200 MB occupy only a small dot in a test tube, many times smaller than the tip of a pencil.

Despite the small space occupied by the DNA strands, the researchers successfully stored and retrieved a high-definition digital video (a music video from OK Go), the top 100 books from Project Gutenberg, and copies of the Universal Declaration of Human Rights in more than 100 languages.

The big difficulty with DNA storage is reading and writing. To store the data as DNA, the binary bits of machine language must first be translated into sequences of the four nucleotide bases that make up a DNA strand: adenine (A), cytosine (C), guanine (G) and thymine (T).

The molecules are then synthetically built following the coding rules. Scientists then use a technique normally employed by molecular biologists known as polymerase chain reaction to make multiple copies of the DNA strands they want to read. Then they can sample, sequence and decode the relevant data.
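To make the translation between bits and bases more concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base, which is an illustration only and not the coding scheme the Microsoft and University of Washington researchers use. The names `encode`, `decode`, and the lookup tables are hypothetical, and real schemes add things this sketch ignores, such as error correction and addressing.

```python
# Illustrative two-bits-per-base mapping (an assumption for this sketch,
# not the researchers' actual coding rules).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of A/C/G/T, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a string of A/C/G/T back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Each byte becomes four bases, so the two-character message above turns into an eight-base sequence, and decoding simply reverses the lookup.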

Reading the data uses genetic sequencing. The costs of genetic sequencing have dropped substantially over the last 20 years. The Human Genome Project, which ran from 1990 to 2003, cost about $3 billion. In 2007, the same project would have cost $10 million. In 2015, it would have cost about $1,000.

Although the technology has a long way to go before it can be commercialized, the researchers said they are upbeat. The Microsoft and university team has already managed to increase the storage capacity of its DNA system 1,000-fold in just the last year.

The entire process is currently too expensive to replace magnetic tape storage. With the costs of tools used to manipulate DNA falling thanks to a growing biotech industry, using DNA to store data may eventually become more cost-effective.

Costs aside, the technology itself works. Microsoft says that its initial trials with Twist have shown that the process allowed full retrieval of the encoded data from the DNA. If the technology can be made cheap enough, it means that one day long-term data archiving could use the same technology as life itself.

“The company is interested in learning whether we can create an end-to-end system that can store information, that’s automated, and can be used for enterprise storage, based on DNA,” says Karin Strauss, Microsoft’s lead researcher on the project.

Strauss says the project is motivated by the fact that electronic storage devices are not improving as quickly as the amount of data we use. “If you look at current projections, we can’t store all the information we want with devices at the cost that they are,” she notes.

IDC predicts that the worldwide total of stored digital data will be 16 trillion gigabytes in 2017, most of it housed in huge data centers. Strauss estimates that a shoebox worth of DNA could hold the equivalent of roughly 100 giant data centers.
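As a rough back-of-the-envelope check, the IDC projection can be combined with the density figure quoted earlier (about 1 zettabyte per gram). The sketch below assumes decimal units and treats that density as a theoretical upper bound; it ignores redundancy, error correction, and physical packaging, so the result is an illustration, not an engineering estimate. The function name `grams_of_dna` is hypothetical.

```python
BYTES_PER_GB = 10**9   # decimal gigabyte (assumption)
BYTES_PER_ZB = 10**21  # decimal zettabyte (assumption)


def grams_of_dna(data_gb: int, zb_per_gram: int = 1) -> float:
    """Grams of DNA needed for data_gb gigabytes at the given density."""
    return data_gb * BYTES_PER_GB / (zb_per_gram * BYTES_PER_ZB)


if __name__ == "__main__":
    world_data_gb = 16 * 10**12  # 16 trillion GB, the IDC projection for 2017
    print(grams_of_dna(world_data_gb))  # 16.0
```

Under those assumptions, the projected worldwide total of 16 zettabytes would fit in roughly 16 grams of DNA.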

Instead of building data centers, we may need to start saving our shoeboxes to donate to companies, so that enterprise businesses can keep all their data in one physical shoebox. Truly the makings of a science fiction movie.

Lindsey Cobb

Lindsey Cobb, a Georgia native and former history major, is a technology researcher who is fascinated by the past and future of technology. When she is not engrossed in the prophecy of science fiction stories, Lindsey is likely to be planning her next adventurous trip or petting every dog she meets.