On Sunday, one of the United Kingdom’s public health agencies announced that 15,841 covid-19 cases had gone unreported due to a “technical issue” that occurred during “the data load process.” As citizens demand answers, it’s looking increasingly likely that a simple error in Microsoft Excel is to blame for the missing data.
Public Health England (PHE) is responsible for collating covid-19 test results that are supplied from public and private labs across the UK. The data is supposed to be published on a daily basis, and the database is used to identify individuals for contact tracing measures. PHE said the technical issue caused authorities to temporarily lose data that was entered between September 25 and October 2. The bulk of the data losses occurred between September 30 and October 2, the agency said in a statement.
PHE claims that all patients who received a positive covid diagnosis have been informed and that everything has now been added to the proper NHS Test and Trace contact tracing system. But precious time has been lost due to the error. The Guardian estimates that 50,000 contacts were missed while the error went unnoticed.
But what was the error? The growing suspicion is that the Microsoft Excel sheet used to collate covid testing data at the PHE simply ran out of space and no one noticed until it was too late. Outlets like the Guardian and the Daily Mail have claimed that when a CSV file from an independent lab was received and added to the master list, it caused the Excel sheet to hit its entry limit. The spreadsheet program maxes out at 1,048,576 rows and 16,384 columns. The PHE did not respond to Gizmodo’s request for comment, but health secretary Matt Hancock is reportedly going to address the House of Commons on the matter sometime on Monday.
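To show how simple a guard against this kind of overflow could be, here is a minimal Python sketch that checks whether appending a CSV file to an existing sheet would exceed the limits cited above. It is purely illustrative: the function name `fits_in_excel_sheet` and its parameters are hypothetical, and nothing here reflects how PHE's actual pipeline works.

```python
import csv
import io

# Worksheet limits cited in the article.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_excel_sheet(csv_text, existing_rows=0):
    """Hypothetical helper: report whether CSV data fits on one sheet.

    Returns a tuple (fits, total_rows, widest_row), where total_rows
    counts the rows already on the sheet plus the rows in csv_text.
    """
    new_rows = 0
    widest_row = 0
    for row in csv.reader(io.StringIO(csv_text)):
        new_rows += 1
        widest_row = max(widest_row, len(row))

    total_rows = existing_rows + new_rows
    fits = total_rows <= EXCEL_MAX_ROWS and widest_row <= EXCEL_MAX_COLS
    return fits, total_rows, widest_row
```

Calling `fits_in_excel_sheet(lab_csv, existing_rows=1_048_575)` on a two-row file would return a false first value, which is the signal to stop and raise an alarm rather than let the extra rows be dropped silently.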
The scandal comes at a time when UK Prime Minister Boris Johnson has been warning citizens of tough times ahead as covid outbreaks accelerate and stricter lockdowns are implemented. Whether or not an Excel error is to blame for the missing testing data, the PHE has been such a disaster amid the coronavirus pandemic that it’s set to be abolished and replaced with a fresh agency in the near future.
Max Roser, Oxford researcher and founder of the Our World in Data project, took the opportunity to highlight the need for greater focus on the integrity of spreadsheet data. Roser tweeted that he has “struggled in the past to explain to funders that machine learning and AI are not the most urgent next steps. Clean, accurate csv-files are the frontier.”
As the Guardian points out, we’ve seen some minor data entry errors result in major consequences in recent years. In 2013, JPMorgan blamed a $US6 billion ($8 billion) loss, in part, on an error in an Excel spreadsheet used for financial modelling. And in August, researchers announced that they’d changed the names of 27 genes over the last year because Excel was continuously recognising the old names as dates. It was simply easier to change the names than to abandon Excel.