The Future of Data Storage Lies in DNA
DNA archival storage within reach thanks to new PCR technique.
Storing data in DNA sounds like science fiction, yet it lies in the near future. Professor Tom de Greef expects the first DNA data center to be up and running within five to ten years. Data won’t be stored as zeros and ones on a hard drive but in the base pairs that make up DNA: AT and CG. Such a data center would take the form of a lab, many times smaller than today’s data centers. De Greef can already picture it all. In one part of the building, new files will be encoded via DNA synthesis. Another part will contain large fields of capsules, each capsule packed with a file. A robotic arm will remove a capsule, read its contents and place it back.
We’re talking about synthetic DNA. In the lab, bases are stuck together in a certain order to form synthetically produced strands of DNA. Files and photos that are currently stored in data centers can then be stored in DNA. For now, the technique is suitable only for archival storage. This is because the reading of stored data is very expensive, so you want to consult the DNA files as little as possible.
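To make the idea of storing a file in bases concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping of two bits per base (A, C, G, T); the article does not say which encoding De Greef's group uses, and practical schemes add error correction and avoid long runs of the same base. The mapping table and function names are illustrative only.

```python
# Hypothetical mapping of two bits per base. This is an assumption for
# illustration, not the encoding used in the study.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

In this toy scheme every byte becomes four bases, so a one-megabyte photo would need roughly four million bases before any redundancy is added.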
Large, energy-guzzling data centers made obsolete
Data storage in DNA offers many advantages. A DNA file can be stored much more compactly, for instance, and the lifespan of the data is also many times longer. But perhaps most importantly, this new technology renders large, energy-guzzling data centers obsolete. And this is desperately needed, warns De Greef, “because in three years, we will generate so much data worldwide that we won’t be able to store half of it.”
Together with PhD student Bas Bögels, Microsoft and a group of university partners, De Greef has developed a new technique to make the innovation of data storage with synthetic DNA scalable. The results have been published today in the journal Nature Nanotechnology. De Greef works at the Department of Biomedical Engineering and the Institute for Complex Molecular Systems (ICMS) at TU Eindhoven and serves as a visiting professor at Radboud University.
Scalable
The idea of using strands of DNA for data storage emerged in the 1980s but was far too difficult and expensive at the time. It became technically possible three decades later, when DNA synthesis started to take off. George Church, a geneticist at Harvard Medical School, elaborated on the idea in 2011. Since then, synthesis and the reading of data have become exponentially cheaper, bringing the technology closer to the market.
In recent years, De Greef and his group have looked mainly into reading the stored data. For the time being, this is the biggest problem facing this new technique. The PCR method currently used for this, called ‘random access’, is highly error-prone. You can therefore only read one file at a time and, in addition, the data quality deteriorates too much each time you read a file. Not exactly scalable.
Here’s how it works: PCR (Polymerase Chain Reaction) creates millions of copies of the piece of DNA that you need by adding a primer with the desired DNA code. Coronavirus tests in the lab, for example, are based on this: even a minuscule amount of virus material from your nose becomes detectable once it has been copied that many times. But if you want to read multiple files simultaneously, you need multiple primer pairs doing their work at the same time. This creates many errors in the copying process.
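The selection step can be pictured with a toy model in Python. It assumes an idealized PCR in which every strand that starts with the chosen primer doubles in each cycle and nothing else is copied. Real PCR uses primer pairs at both ends of a strand and is never perfectly efficient, and the sequences and function name below are made up for illustration.

```python
from collections import Counter

# Hypothetical pool: each strand starts with a made-up primer site
# that marks the file it belongs to.
POOL = [
    "ACGTAC" + "GGATTC",  # file A, strand 1
    "ACGTAC" + "TTAGCA",  # file A, strand 2
    "TGCATG" + "CCGATA",  # file B, strand 1
]


def amplify(pool, primer, cycles):
    """Return strand counts after idealized PCR.

    Strands that start with the primer double in every cycle;
    all other strands stay at their original count.
    """
    counts = Counter(pool)
    for strand in counts:
        if strand.startswith(primer):
            counts[strand] *= 2 ** cycles
    return counts


counts = amplify(POOL, "ACGTAC", cycles=20)
for strand, n in counts.items():
    print(strand, n)
```

After 20 idealized cycles each strand of file A is present about a million times (2 to the power 20 is 1,048,576), while file B still has a single copy, which is why a sequencer afterwards reads almost exclusively file A.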
Every capsule contains one file
This is where the capsules come into play. De Greef’s group developed a microcapsule of proteins and a polymer and then anchored one file per capsule. De Greef: “These capsules have thermal properties that we can use to our advantage.” Above 50 degrees Celsius, the capsules seal themselves, allowing the PCR process to take place separately in each capsule. Not much room for error then. De Greef calls this ‘thermo-confined PCR’. In the lab, it has so far managed to read 25 files simultaneously without significant error.
If you then lower the temperature again, the copies detach from the capsule and the anchored original remains, meaning that the quality of your original file does not deteriorate. De Greef: “We currently stand at a loss of 0.3 percent after three reads, compared to 35 percent with the existing method.”
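To get a feeling for what these two figures mean in the long run, the short calculation below extrapolates them in Python. It assumes that the same fraction of the file is lost with every read, so that losses compound; that assumption and the 30-read horizon are ours, not results from the study.

```python
def remaining(loss_after_k: float, k: int, reads: int) -> float:
    """Fraction of the original file left after `reads` reads.

    Assumes (hypothetically) a constant fractional loss per read,
    calibrated on a measured loss of `loss_after_k` after `k` reads.
    """
    return (1 - loss_after_k) ** (reads / k)


for label, loss in [("thermo-confined PCR", 0.003), ("existing method", 0.35)]:
    left = remaining(loss, k=3, reads=30)
    print(f"{label}: {left:.1%} left after 30 reads")
```

Under this assumption about 97 percent of the file would survive 30 reads with the capsules, against little more than 1 percent with the existing method.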
Searchable with fluorescence
And that’s not all. De Greef has also made the data library even easier to search. Each file is given a fluorescent label and each capsule its own color. A device can then recognize the colors and separate them from one another. This brings us back to the imaginary robotic arm at the beginning of this story, which will neatly select the desired file from the pool of capsules in the future.
This solves the problem of reading the data. De Greef: “Now it’s just a matter of waiting until the costs of DNA synthesis fall further. The technique will then be ready for application.” As a result, he hopes that the Netherlands will soon be able to open its inaugural DNA data center – a world first.
Written by Eindhoven University News
Photo by Tom de Greef