Are We Ever Going to Run Out of Digital Storage Space?

Illustration: Benjamin Currie/Gizmodo

We live in a world of refuse — not simply the orange peels, Amazon boxes, and old TVs rotting away in landfills but also the texts, emails, and torrented movies looked at once and left to languish in the cloud. We appear to be verging on a crisis point with the first kind of trash, but rarely ever think about the second: it is more or less inconceivable that, one day, there may be no more room for our digital ephemera. Is this a reasonable assumption? Or might we actually one day run out of space? For this week’s Giz Asks, we reached out to a number of experts to find out.

Francesca Musiani

Associate Research Professor at CNRS, Deputy Director of the Centre for Internet and Society, doing research on Internet governance and in particular on infrastructures as tools of governance

To attempt an answer, it may be worth specifying who or what is the “we” here.

If the “we” is the population of the Earth, most likely, “we” will not. So far, history has shown that as the sheer volume of data to be stored increases (and indeed, it has increased spectacularly in the past decade especially), so the capacity and efficiency of storage systems have been increased and optimised. There seems to be an overall consensus among engineers and technical operators that for all practical purposes, due to a number of technical and economic factors, we are not going to run out of storage space in the foreseeable future (and for the less-foreseeable one, we’re discussing atoms and quantum computing).

If the “we” is rather this or that particular IT company or firm, and its pool of users, the answer is somewhat more nuanced. Finding and allocating storage space is a very concrete and material infrastructural concern, and the Big Tech actors are much more likely than other actors to be able to increase by ‘brute force’ the quantity and quality of their storage space. As engineer Ben Podgursky recently mentioned, “YouTube can hoard cat videos as long as Moore’s Law holds for disk space costs.” This may eventually (in fact, may already) contribute to reinforcing inequalities in the digital ecosystem. However, will Moore’s Law actually hold? And if so, until when? Even net giants may at some point need to make some choices.

What about decentralised storage solutions? Are they the way to the sustainability and sobriety we might want to achieve even in an era of seemingly abundant storage capabilities? The blockchain has a number of advantages compared to centralised storage, but does not seem to be the way to go in terms of sobriety, because its characteristic built-in redundancy has an intrinsic downside known as blockchain bloat. However, some solutions based on more “old-fashioned” peer-to-peer architectures (which leverage idle storage space and CPU capacity) have shown, as in the case of the (unfortunately short-lived) Wuala, that interesting experiments exist to try to avoid treating brute-force scalability and the building of brand-new massive data centres as the only way to plan tomorrow’s storage.

In the end, I agree with Cisco engineer J. Metz as he points out that a much more problematic question than “finding a place to store data” could actually be “finding your data.” As the sheer volume of data increases and storage capabilities increase with it — and they will — will our tools for handling and retrieving just the information we want, exactly when we want it, keep pace?

John D. Villasenor

Professor, Electrical & Computer Engineering, UCLA

The short answer is no. Of course, I don’t mean that no disk or device or cloud storage account will ever get full. What I mean is that, for most applications, the cost of storage is no longer an obstacle. These days, very few people say “this data is important, but we can’t retain it because that would require spending too much on storage.”

A decade ago, I published a paper through the Brookings Institution that examined the implications of the decades-long trend of exponentially decreasing storage costs. Many of the points in the paper are still relevant today — perhaps even more so given continued cost declines. The fact that storage has become so inexpensive has plenty of positive consequences, including, for example, the ability to store large photo collections. But it has negative consequences as well in relation to privacy and to the reach of authoritarian governments that can create vast databases derived from surveillance data.

The biggest challenge with data these days is not figuring out how to store it, but rather problems with the quality of the data itself. There is plenty of data that is biased, incomplete, noisy, privacy-violating, or otherwise problematic. Addressing those shortcomings needs to be a major focus in the coming years.

Leonard Kleinrock

Distinguished Professor, Computer Science, UCLA, who developed the mathematical theory of packet networks, the technology underpinning the Internet

Are we ever going to run out of digital storage space? Most likely not!

The future demands for data storage are overwhelming. We are generating data far faster than we can store it using today’s existing storage technology (by 2025 it is estimated that we’ll be generating more bytes per year than there are stars in the observable universe). So clearly, we need to dramatically change the way we store data.

But we have seen such challenges met decade after decade before, and there is no reason to doubt our ability to continue to do so as we are constantly surprised at the pace of basic technology growth over the decades with respect to processing speed, communications bandwidth and, in our case, digital storage. It is the ingenuity, innovation, and creativity of our motivated and challenged scientists and engineers that will achieve this continuous show of magic. So, in spite of the enormous growth of data storage needs, don’t underestimate the potential of forthcoming solutions.

We already see a number of technologies that hold promise for our future data storage needs, with each presenting challenges yet to be met. Beyond these we can look forward to as-yet unforeseen technologies and solutions that will come about, as well as to the gains to be had by sophisticated data compression.

One interesting technology uses single molecule magnets. When these magnets, made of new material (i.e., a transition metal), are magnetized using a magnetic field, they remain magnetized after that field is removed. Transition metals can exhibit switchable magnetic properties such as spin crossover (changing the spin of one or more electrons from up to down or vice versa) and can maintain that change for some period. This means that each molecule can contain one bit of information, thus providing enormous storage density. In order to realise this promise of vast storage capability (e.g., hundreds of terabytes per square inch), there are a number of hurdles to be overcome. Among these challenges are the requirement for supercooling in order to fabricate the molecules, the fact that they hold their memory for only seconds, and the problem of adjacent magnets interfering with each other.

Another technology uses femtosecond laser writing to etch hundreds of terabytes of data into nanostructured quartz glass discs. These discs can last for thousands of years and potentially survive for billions of years. Although the technology is currently slow to read and write, no supercooling is needed and the discs can survive in very high temperatures.

Nature herself provides DNA storage as one of the real contenders for our future data storage needs since, long ago, she figured out how to store enormous amounts of information securely in the form of DNA, the building blocks of biological life. Now, some researchers (biologists, chemists and information technologists) are showing the ability to code data (e.g., words, images, music) into synthetically created polymers that use the four bases of DNA: A, C, G, and T (the nucleobases adenine, cytosine, guanine and thymine). The storage density available with DNA is far greater than that of any electronic device; indeed a single gram of DNA has a data density in the range of a zettabyte (10^21 bytes). One way to understand the enormity of that density is to realise we could store all the data that humans have ever recorded into a container full of DNA about the size of a few shoe boxes, and the energy requirements to drive DNA data centres would be rather small. Moreover, DNA storage can survive for thousands of years while at the same time presenting huge protection against hacking. So the advantages of synthetic polymer DNA are: huge capacity; extremely small size; durability and longevity; low energy requirements; no need for supercooling; and great security. Counteracting these advantages, we note that chemically writing or reading DNA is incredibly expensive and the speed for these operations is far too slow.
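To make the idea of coding data into A, C, G, and T a little more concrete, here is a minimal sketch in Python. It assumes the simplest possible mapping, two bits per base, and the names `BASES`, `encode`, and `decode` are invented for this illustration. It is not the scheme any particular research group uses; real DNA storage systems add error correction and avoid sequences that are hard to synthesise or read.

```python
# Toy illustration only: map every two bits of data to one DNA base.
# The mapping below (A=00, C=01, G=10, T=11) is an assumption for this sketch.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Reverse encode(): read four bases at a time back into one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError for anything other than A, C, G, T
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


message = "Giz Asks".encode("utf-8")
strand = encode(message)
assert decode(strand) == message  # the round trip recovers the original bytes
print(strand)
```

At two bits per base, each byte becomes four bases, so the eight-byte message above turns into a strand of 32 letters.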

So we already have some exciting candidates to solve our pending data storage needs moving forward, and further, as-yet-undiscovered solutions will likely be provided as we meet the challenge of the future.

Shane Greenstein

Professor of Business Administration and Co-Chair of the HBS Digital Initiative at Harvard Business School

How could the world ever run out of storage? It does not seem possible. One hyperscale data centre contains several hundred thousand servers. That is more than enough space to house Wikipedia, the third largest collection of writing ever assembled by humans (next to the Library of Congress and the British Library). That suggests storage is inexhaustible, and that any chance of running out of storage space is remote, far away beyond a horizon that society will never reach.

Not quite. New uses continue to emerge, and that demand induces more use of available supply. Probably the most interesting question is whether demand will ever stop growing. I think not, at least over the next few decades, and for two basic reasons. The edges of the internet require it, and human ingenuity is far from exhausting imaginative new applications.

Consider the edges first. The presence of any bottleneck will turn an infinite resource into a limited one, and every system contains at least one bottleneck. The internet contains many. A surfer occupying a speeding subway under Fifth Avenue in Manhattan wants an article without delay. Accomplishing that requires all the appropriate pieces to work at peak — fast lines, multiple antennae, cached content, and ever more storage at the edge of the system. As the videos get better and bigger, the content grows in data-intensity, so it arrives more slowly, and performance deteriorates. Wikipedia needs your donations, so their operations do not become the bottleneck. Less visibly, carriers of the data want your business, and a piece of that revenue goes into upgrades to the system, and, yes, frontier storage at the edges. It is in your smartphones, in the servers, and in the Content Delivery Networks. In other words, the modern internet contains market mechanisms that induce the use of ever more storage.

As storage has become cheap, human ingenuity has found ever more clever ways to use it. That turns last year’s frontier into next year’s routine. There was a time when it was novel to send more videos of babies and cats, and to share more fashion and gossip. In a few years a fleet of autonomous vehicles will draw on everything the system can handle. Nobody forecasts a stopping point for such ingenuity. Last year a team of scientists developed a visual representation of the detritus around a black hole. The research team collected so much data they could not ship it over the internet. They had to divide it into many packages and physically send the disks by post. Scientists have not come close to the end of black holes to view. More to the point, that precedent will inspire some clever inventor somewhere to find an object here on earth that pushes the boundaries of visual representations. When he or she does, we will all gawk at the picture in awe.

Christian Fuchs

Professor of Media and Communication Studies at the University of Westminster, author of Social Media: A Critical Introduction

We do not live in a data-and-digital society. We live in big data capitalism, and under digital capitalism. In this social formation, corporate and political power aim to store as much data as possible about humans in order to monetise and profit from our human life and to securitise our citizenship. If data and digital capitalism continues in an unlimited manner and uses up non-renewable resources required for computing, it might at some point of time run out of the physical resources needed for building computing and storage devices. But there is also Moore’s law that cheapens data storage and the search for new forms of computing such as quantum computing, which mitigate these limits and might result in new forms of data and digital capitalism that overcome storage limits.

The key question, however, is not a technological but a moral-political one: Do we want endless storage of almost all aspects of our lives? What will be the consequences and impacts of a society that monetises and securitises ever more of our thoughts and activities on human life? The big danger is that digital capitalism becomes digital dictatorship and digital fascism. The solution is therefore that we minimise the data that is stored to the minimum that is necessary for operating computers and society. We need digital democracy instead of digital capitalism in order to avoid the rise of digital fascism.

Eric Osterweil

Assistant Professor, Computer Science, George Mason University

I think it would certainly be possible to produce data at a rate that would make it impossible to capture, store, and/or preserve it all. For example, just look at all of the data the LHC at CERN drops on the floor when running their experiments. What I think we care about is storing data that we can use to discern relevant meaning to us (to our lives, to investigations, to our business interests, etc.).

I draw this distinction because what I believe we are really doing is storing data that lets us derive meaning, and often we store processed data (aggregates or representations that summarise, or are more compact, or have more semantics, etc.). This often means we’re guessing which pieces or representations of data we will need and/or want.

While we strive to increase our storage capacity, and record (perhaps) finer and finer grained data, we are still learning that our digital footprint can be a liability. As a cybersecurity researcher (who does data analyses), I think we are learning more and more, all the time, that the indelible footprint we have been leaving online is not washed away by time. What’s more, our community’s data analysis techniques (Big Data, ML/AI, etc.) are becoming more insightful, while data sets are becoming more richly connected. I would turn this question around and ask: will we get to a point where people will want our digital storage space to run out so their footprint can age out? The problem there is that, with all of the information we have been sharing online for decades now, we may have already opened Pandora’s Box.

Kevin Curran

Professor of Cyber Security & Co-Executive Director of the Legal Innovation Centre at Ulster University

All digital data is stored on hard drives. Hard drive capacity has progressed incredibly, just like other technology. The main changes have been the move from spinning drives to solid state drives, which have no moving parts, along with substantial increases in disk capacity. Interestingly, for the first time, due in part to Covid, we are seeing delays in products due to chip shortages. Chips are also a component of hard drives, and should such a shortage of core components for hard drives arise in the future, then we could conceivably run out of digital storage space. What could be done, however, is to recycle existing hard drives and basically overwrite older files. That could alleviate the problem for some, but of course the giants of the internet may struggle to reclaim enough storage space. The laws of economics may also play a large part here, as digital storage prices would increase dramatically and only the rich would be able to store their data.
