Melbourne's entire suburban train network ground to a halt at the worst possible time yesterday. An "infrastructure fault" was to blame — but how does a city-wide computer system just... stop?
iTnews reports that Metro Trains' chief told media that "a failure in the network's core train control system" meant operators could not actively track services as they moved around the network, so all trains were brought to a halt for safety reasons. A backup system designed for failover also failed. It's not immediately clear what hardware and train control systems Metro is using to control the network.
A 2012 document on Melbourne's train control system (TCS) running at Metrol, the facility near Melbourne's Southern Cross railway station, shows the main train control system originally ran on a DEC PDP-11, 16-bit computer hardware first developed in 1970 and made end-of-life in the mid-1990s. It was then ported to a new system called Osprey on a 'PC-based platform': a box running Windows XP, extended support for which ended in 2014.
In late 2013, an article in The Age said that the hardware was still '80s-era, and that a "series of delays" had set back planned upgrades by more than two decades. In mid-2014, another article said that the upgrade had been delayed further. Public Transport Victoria's website doesn't mention the status of any upgrades since then.
Upgrading a system as crucial and extensive as Melbourne's train network is hard, even from a purely engineering standpoint. The original PDP-11's 16-bit architecture doesn't play well with 2017's 32- and 64-bit computing hardware, so software needs to be substantially or completely redesigned, and bespoke circuitry needs to be created. Even on newer hardware (if the system has in fact been upgraded in the three years since the most recent reporting), bugs can take years to appear.
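To give a rough sense of why word size matters when porting old software, here is a minimal Python sketch. It is purely illustrative and is not based on Metro's code: the function names and the counter value are hypothetical. It shows how the same addition gives different answers depending on whether the result is held in a 16-bit word or in a wider integer, which is the kind of hidden assumption that can surface as a bug long after a port.

```python
def add_16bit(a: int, b: int) -> int:
    """Add two unsigned values as a 16-bit register would, wrapping at 65536."""
    return (a + b) & 0xFFFF


def add_native(a: int, b: int) -> int:
    """Add two values with no fixed word size, as on a wider modern platform."""
    return a + b


# Hypothetical counter sitting at the largest unsigned 16-bit value.
counter = 0xFFFF  # 65535

print(add_16bit(counter, 1))   # 0: the 16-bit result wraps around
print(add_native(counter, 1))  # 65536: the wider result keeps counting
```

Software written for a 16-bit machine may quietly depend on the wrap-around behaviour, so a straight port to wider hardware can change results only once a value finally crosses that boundary.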