(iStock )

Opinion | Molecular biology could hold the key to data storage

Information can be recorded on snippets of synthetic DNA and then converted back to digital form

Albert Einstein is thought to have said, “Make things simple, but never simpler.” For a columnist writing about technology and science, and their intersection with business and with humans, this is a tall order. Today’s technology has, in some areas, moved into the realm of fantasy, and deconstructing the powerful concepts underlying some of the advances is never easy, especially for someone like me who is at best a bystander.

One of the main contributors to making artificial intelligence a powerful tool today is the advanced level of computing we have arrived at. Most of us hold smartphones in our hands that have more computing capability than it took to land Neil Armstrong on the moon 50 years ago. Efforts are on to keep increasing this capability, and the advances researchers are seeking to make in quantum computing are among them.

The other axis is the inexorable rise in data that we cough up, some of it voluntarily and a lot of it involuntarily, for the benefit of the Big Tech companies that now control our world. This data is produced by our ever-increasing use of the internet. The result is a phenomenon called “data inundation”, where firms are collecting large amounts of data on their customers and operations, but don’t quite yet know what to do with it. Meanwhile, according to the advisory firm Ark Invest, such data is predicted to grow to 44 zettabytes by the end of next year, and the deficit of computer storage space that can contain all this data will grow to 500%, meaning that most of it can’t be stored and will become useless.

Molecular biology may come to the rescue. It turns out that Mother Nature’s DNA is the data storage mechanism to beat all computers. According to New Scientist, 1 gram of DNA can hold up to 455 exabytes of data (there are 1,000 exabytes in 1 zettabyte). This means all 44 zettabytes of data produced by the end of next year, or 44,000 exabytes, could be stored on about 97 grams of DNA (44,000 divided by 455).
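The arithmetic behind that figure can be checked in a few lines of Python. This is only a sketch that takes the two quoted numbers at face value; the constant and function names are my own, chosen for illustration.

```python
# Figures quoted in the column, taken at face value:
# 455 exabytes per gram of DNA and a projected 44 zettabytes of data.
EXABYTES_PER_GRAM = 455
EXABYTES_PER_ZETTABYTE = 1_000
PROJECTED_ZETTABYTES = 44


def grams_of_dna_needed(zettabytes: float) -> float:
    """Return the mass of DNA, in grams, needed at the quoted density."""
    return zettabytes * EXABYTES_PER_ZETTABYTE / EXABYTES_PER_GRAM


grams = grams_of_dna_needed(PROJECTED_ZETTABYTES)
print(f"{grams:.1f} grams")  # prints "96.7 grams", which rounds to 97
```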

Four types of molecules, called bases, make up DNA, and they bond in pairs. To encode information on DNA, scientists translate the 1s and 0s of binary, the language of digital data, into sequences of these bases. This concept is not new; scientists at Harvard University encoded a book onto DNA in 2012. Up to now, however, it has been difficult to retrieve the information stored in DNA.

Now, researchers from Microsoft Corp. and the University of Washington claim to have demonstrated the first fully automated system to store and retrieve data in manufactured “synthetic” DNA, a key step in moving the technology out of the research lab and into commercial data centres. Under the right conditions, DNA can last much longer than current computer storage technologies, which can degrade in a few years. As we know, some DNA has managed to persist in less than ideal storage conditions for tens of thousands of years in the bones of early humans, and for thousands of years in remains such as the iceman found frozen in the Alps.

In a simple proof-of-concept test, described in a new paper published on 21 March in Scientific Reports, a Nature Research journal, the team successfully encoded the word “HELLO” in snippets of fabricated DNA and converted it back to digital data using a fully automated end-to-end system. Using this prototype system, the team stored and later retrieved the 5-byte “HELLO” (01001000, 01000101, 01001100, 01001100 and 01001111 in bits). This took approximately 21 hours to accomplish.
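For readers who want to verify those five bytes, here is a small Python sketch that turns the word into its 8-bit ASCII codes. The function name is my own; nothing here comes from the researchers' software.

```python
def text_to_bits(text: str) -> list[str]:
    """Return each character's 8-bit ASCII code as a string of 0s and 1s."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


bits = text_to_bits("HELLO")
print(bits)
# ['01001000', '01000101', '01001100', '01001100', '01001111']
```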

Information is stored in synthetic DNA molecules created in a lab, not DNA taken from living beings, and can be encrypted before it is sent to the system. While sophisticated machines such as synthesizers and sequencers already perform key parts of the process, many of the intermediate steps until now have required manual labour in the research lab. This would not be viable in a commercial setting, but work is being done to automate this.

The automated DNA data storage system uses software developed by the Microsoft and University of Washington team to convert the 1s and 0s of digital data into the four molecular building blocks of DNA, known as A, C, T and G. This translation has to happen before any file can be written to DNA.
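To give a feel for what such a translation might look like, here is a deliberately naive Python sketch that assigns two bits to each base. The mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are my own assumptions for illustration; this is not the team's encoding, and a real system would presumably need more, such as error correction.

```python
# Hypothetical two-bits-per-base mapping, chosen only for illustration.
# It is NOT the scheme used by the Microsoft and University of Washington team.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_to_dna(data: bytes) -> str:
    """Translate bytes into a string of bases, two bits per base."""
    bits = "".join(format(byte, "08b") for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_from_dna(strand: str) -> bytes:
    """Translate a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode_to_dna(b"HELLO")
print(strand)                   # CAGACACCCATACATACATT
print(decode_from_dna(strand))  # b'HELLO'
```

Under this toy mapping, the 5 bytes of “HELLO” become 20 bases, four for each byte.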

The team claims that it then used inexpensive, largely off-the-shelf lab equipment to flow the necessary liquids and chemicals into a synthesizer that builds manufactured snippets of DNA and pushes them into a storage vessel. When the system needed to retrieve the information, it added other chemicals to prepare the DNA and used microfluidic pumps to push the liquids into other parts of the system that “read” the DNA sequences and converted them back to information that a computer can understand.

The goal of the above project was not to prove how fast or inexpensively the system could work, according to the researchers, but simply to demonstrate that automation is possible. While a paltry 5 bytes in 21 hours is not commercially viable, the researchers say there is a precedent for improvements of many orders of magnitude in such data storage. Also, unlike silicon-based computing systems, DNA-based storage and computing systems have to use liquids to move molecules around. Fluids behave very differently from the electrons that flow through silicon, and they require entirely new engineering solutions.
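To put “5 bytes in 21 hours” in perspective, the short Python sketch below works out the implied rate and how many orders of magnitude separate it from a target speed. The target of 1 megabyte per second is purely my own assumption, picked as a round number; the researchers quote no such figure.

```python
import math

BYTES_STORED = 5      # the word "HELLO"
HOURS_TAKEN = 21      # approximate figure from the paper, as reported above

# Hypothetical target rate, assumed only for comparison.
TARGET_BYTES_PER_SECOND = 1_000_000

seconds_taken = HOURS_TAKEN * 3600
bytes_per_second = BYTES_STORED / seconds_taken

# Number of powers of ten between the prototype and the assumed target.
orders_of_magnitude = math.log10(TARGET_BYTES_PER_SECOND / bytes_per_second)

print(f"Prototype rate: {bytes_per_second:.6f} bytes per second")
print(f"Gap to assumed target: about {orders_of_magnitude:.1f} orders of magnitude")
```

On these assumptions the gap is roughly ten orders of magnitude, which is why the researchers' appeal to precedent matters.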

Nonetheless, this research opens up a fascinating new flank at the intersection of biology and computing.

*Siddharth Pai is founder of Siana Capital, a venture fund management company focused on tech