As a natural side-effect of its use as a medium for information storage and communication, the internet has become an extensive record of human history, at least for the last few decades. It’s understandable, then, why two scientists would be working on a way to use this massive repository of our existence to predict future events.
Kira Radinsky of the Technion–Israel Institute of Technology and Eric Horvitz of Microsoft Research recently published a paper entitled “Mining the Web to Predict Future Events” outlining their attempts at devising an intelligent system that could “forecast forthcoming events of interest” by scouring Wikipedia, over two decades of stories from the New York Times and 90 other web resources — at least to start with.
The idea is to automate the process of identifying important occurrences, extracting the relevant information and forming a “generalisation of sequences of events” in order to pick out patterns and, hopefully, have a go at guessing what comes next. If such a system proved accurate enough, it could be used as an early-warning system.
But forging a cybernetic crystal ball is no easy task. As mentioned, the system would need to find similar events and link them together. The paper itself provides the following example of how that works:
In particular, we define and extract from the NYT archive news storylines — sets of topically cohesive ordered segments of news that include two or more declarative independent clauses about a single story. As an example, the following events form a storyline: (drought in Africa, 02/17/2006), (storm in Rwanda, 01/26/2007), (flood in Rwanda, 01/27/2007), (cholera outbreak in Rwanda, 01/30/2007).
Then these events need to be generalised, so “flood in Rwanda” would become “flood in an African country”. The system then breaks down countries, causes and so on into entities, which it can then feed into the appropriate predictive algorithms.
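To make the generalisation step a little more concrete, here's a rough Python sketch using the storyline quoted above. It is not the paper's implementation: the `REGION_OF` lookup and the `generalise` and `describe` helpers are made up for illustration, and a real system would presumably lean on a far larger knowledge source than a one-entry dictionary.

```python
from datetime import date

# Hypothetical lookup table (an assumption for this sketch): maps a
# specific place to the broader category it should be generalised to.
REGION_OF = {"Rwanda": "an African country"}

# The storyline from the paper's example, as (event, place, date) tuples.
storyline = [
    ("drought", "Africa", date(2006, 2, 17)),
    ("storm", "Rwanda", date(2007, 1, 26)),
    ("flood", "Rwanda", date(2007, 1, 27)),
    ("cholera outbreak", "Rwanda", date(2007, 1, 30)),
]


def generalise(event):
    """Swap a specific place for its broader category, if one is known."""
    kind, place, when = event
    return (kind, REGION_OF.get(place, place), when)


def describe(event):
    """Render an event tuple as a short phrase, ignoring the date."""
    kind, place, _ = event
    return f"{kind} in {place}"


# Keep the events in chronological order, then generalise each one.
generalised = [generalise(e) for e in sorted(storyline, key=lambda e: e[2])]

for event in generalised:
    print(describe(event))
```

Run as written, the third line printed is "flood in an African country", matching the example in the text. Places missing from the lookup, such as "Africa" here, simply pass through unchanged.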
The paper then delves into these algorithms which, as you can imagine, are quite complicated. Suffice it to say, it’s only in recent times that we’ve had access to enough computational power to make such a system feasible. Whether it’ll be accurate enough is another question entirely, but given enough sources of information, processing grunt and formulas… who knows?