While Google couldn't track influenza particularly accurately using search analysis, a team of researchers reckons it can predict the spread of diseases using data lurking within Wikipedia.
In a new study published in PLOS Computational Biology, scientists from the Defense Systems and Analysis Division at Los Alamos National Laboratory explain how there's a wealth of information in Wikipedia that can help predict the spread of illness. "Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social Internet data, such as social media and search queries, are emerging," they write. "These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness."
So, they set about probing the data in Wikipedia and comparing it to a more conventional data source: incidence reports from the World Health Organization. By comparing the one with the other, they were able to build a model that translated traffic to Wikipedia articles into disease incidence around the world. They did that for seven diseases (cholera, dengue, Ebola, HIV/AIDS, influenza, plague, and tuberculosis) in nine different locations, from Haiti to Norway.
But such a conversion isn't much use if you can't do anything with it. So they also set about working out whether the same model could be used to predict the spread of disease. Turns out it can: they were able to predict outbreaks of dengue in Brazil and influenza in the US a full 28 days before they happened. The technique isn't perfect, however: it entirely failed to predict an outbreak of tuberculosis in China.
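To make the idea concrete, here is a minimal sketch of how a Wikipedia signal might be mapped onto case counts and then used to look ahead. It is not the researchers' code: it assumes the signal is a single daily count per article (page views here), fits a plain straight line by least squares, and runs on made-up numbers. The names `fit_lagged`, `forecast`, `views` and `cases` are all hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_lagged(views, cases, lag):
    """Fit cases on day t against page views on day t - lag (lag >= 1)."""
    return fit_line(views[:len(views) - lag], cases[lag:])


def forecast(views, slope, intercept, lag):
    """Estimate cases for the `lag` days after the last observed day."""
    return [slope * v + intercept for v in views[-lag:]]


# Synthetic data (an assumption for illustration, not real measurements):
# 60 days of page views with a weekly rhythm and a slow upward trend.
LAG = 28
views = [100 + 10 * (day % 7) + day for day in range(60)]

# Pretend reported cases follow the page views from 28 days earlier.
# The first 28 days have no earlier views, so they are filled with 0.0
# and are never used in the fit.
cases = [0.0] * LAG + [0.5 * v + 3 for v in views[:len(views) - LAG]]

slope, intercept = fit_lagged(views, cases, LAG)
predicted = forecast(views, slope, intercept, LAG)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"forecast for the next {len(predicted)} days starts at {predicted[0]:.1f}")
```

Because the fit pairs each day's cases with the page views from 28 days before, the most recent 28 days of traffic can be fed back through the same line to estimate cases that have not been reported yet. Real data is far noisier than this toy series, which is one reason a model like this can work for one disease and miss badly on another.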
But maybe that's OK. "In the same way we check the weather each morning, individuals and public health officials can monitor disease incidence and plan for the future based on today's forecast," Dr Sara Del Valle, one of the researchers, explained in a press release. "The goal of this research is to build an operational disease monitoring and forecasting system with open data and open source code. This paper shows we can achieve that goal."