How Facebook Is Hacking Together A Better Data Centre 

Data centres are boring. They have to be; these are spaces of control, consistency, security. You wouldn’t expect to find much creativity inside the plain facades of these highly-regulated structures — much less hacked-together experiments involving robotic Blu-ray storage systems and thousands of Mac Minis.

But inside these seemingly boring buildings there’s an industry booming with change and experimentation, largely thanks to the exponential growth of the data we produce in our lives. Like the sewage systems whose installation set off an explosion of technology in 19th-century cities, data centres are a largely invisible infrastructure that is nonetheless absolutely vital to the way we live today.

Overshadowed by Facebook’s user-facing initiatives, like its new inexplicably-named live show FaceCast or its multiplying stand-alone apps like Groups, there’s a huge side of the company working on the infrastructure and systems that power the website.

Partly because of the sheer scale of its operation (its 864 million daily active users create petabytes upon petabytes of data), Facebook has dug into the engineering problems confronting its storage centres. And as that number grows, Facebook is rushing to build the physical infrastructure needed to store all that data efficiently. The company has hired architects to design Ikea-inspired data centres that can be assembled more cheaply, quickly, and efficiently than other buildings, and it’s buying up wind farms to power those buildings.

Then there are the machines inside the buildings. This is the nervous system of the second most-trafficked website on the internet (after Google), the bit-based neurons that keep the entire system stable and secure. These machines are arguably the most important part of the whole operation, and they are where Facebook is bucking the norm. It’s eschewing off-the-shelf hardware and building these machines itself, and it’s making all of the work open source.

Facebook’s mad hardware scientists

Unlike many companies, Facebook runs its own data centres. So it has an incredible amount of leeway to experiment with how its nervous system is designed. That task falls to Facebook’s Director of Hardware Engineering, Matt Corddry, who leads the team that creates the hardware used in Facebook’s data centres.

Corddry and his team make sure the nervous system runs smoothly, but they also look at core problems with data centre design and try out new, and sometimes fairly bizarre, solutions. Armed with a fabrication lab at the Facebook campus stocked with everything from sheet metal bending machines to 3D printers to endless supplies of balsa wood for models, the group is able to hack together prototypes of their ideas and delve into the smallest or largest design issues within the data centre.

Cold storage drives at Facebook’s Prineville, Oregon, data centre.

When a hard drive storage rack proved too tall for workers to access easily, one of Corddry’s engineers suggested a hinged rack inspired by a garage door opener he had seen recently, building a curtain hinge prototype in the hardware lab that let workers swing down the highest racks for easy access.

To create the infrastructure needed to develop and test Facebook’s iOS app, the team bought thousands of Mac Minis — the resulting library of Minis is now a well-known project for its use of off-the-shelf Apple hardware, a rarity in data centre design or software engineering.

Thousands of Mac Minis at Facebook’s Forest City, North Carolina, data centre.

“I really believe you need to strip away all the layers of abstraction between the engineer and the opportunity or the problem they’re solving,” Corddry told me on a recent phone call. That means every team member experiences life inside the data centre doing repair maintenance, studying the supply chain, and visiting manufacturers of data centre hardware in Asia. It also means working closely with the software developers, since Corddry says the two sides, hard and soft, often exist in a vacuum until “you mash them together and hope it works.”

Storage for the end of the world (or 20 years, whichever comes first)

If you’re on Facebook, odds are good that you’ve uploaded photos to the site. But that doesn’t mean you actually look at them very often. More than 80 per cent of Facebook’s traffic comes from a mere 8 per cent of photos on the site — these are “hot” photos, usually recent uploads still getting views. But the majority of Facebook’s 240 billion photos are “cold,” rarely accessed, but still in need of storage.
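The split above can be worked through as rough arithmetic. This is a minimal sketch using only the figures quoted here, and it rests on two assumptions: "more than 80 per cent" is treated as exactly 80, and the remaining 92 per cent of photos are assumed to share the remaining 20 per cent of traffic. The result is therefore a lower bound, not a measured number.

```python
# Figures quoted in the article.
TOTAL_PHOTOS = 240_000_000_000
HOT_PHOTO_PCT = 8      # share of photos that are "hot"
HOT_TRAFFIC_PCT = 80   # share of traffic those photos receive (assumed exact)

hot_photos = TOTAL_PHOTOS * HOT_PHOTO_PCT // 100
cold_photos = TOTAL_PHOTOS - hot_photos

# Average traffic per photo in each group, in arbitrary units
# (percentage points of traffic per percentage point of photos).
hot_rate = HOT_TRAFFIC_PCT / HOT_PHOTO_PCT
cold_rate = (100 - HOT_TRAFFIC_PCT) / (100 - HOT_PHOTO_PCT)
traffic_ratio = hot_rate / cold_rate

print(f"hot photos:  {hot_photos:,}")
print(f"cold photos: {cold_photos:,}")
print(f"a hot photo gets about {traffic_ratio:.0f}x the traffic of a cold one")
```

On those assumptions, roughly 19 billion photos carry most of the load, while more than 220 billion sit almost untouched.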

Hot photos are stored on racks of hard drives that make up the conventional hardware within a data centre. But what about the cold photos? The images no one has accessed in years? Facebook still needs to archive them — it’s part of the service it provides — but there’s no reason to keep them on hard drive racks, which are more expensive to cool and maintain than other forms of storage. Corddry and his team had another idea: Why not create a hierarchy of storage systems that could keep “cold” photos on cheaper, simpler hardware than the hot photos?
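The idea of a storage hierarchy can be sketched in a few lines. This is only an illustration, not Facebook's actual policy: the one-year cutoff, the `Photo` type, and the `choose_tier` function are all hypothetical, since the article does not say how a photo is judged to be cold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed cutoff for illustration only; the article gives no threshold.
COLD_AFTER = timedelta(days=365)


@dataclass
class Photo:
    photo_id: str
    last_viewed: datetime


def choose_tier(photo: Photo, now: datetime) -> str:
    """Return "hot" for recently viewed photos and "cold" otherwise."""
    return "hot" if now - photo.last_viewed < COLD_AFTER else "cold"


now = datetime(2014, 12, 1)
recent = Photo("a", datetime(2014, 11, 20))
stale = Photo("b", datetime(2011, 6, 1))

print(choose_tier(recent, now))  # hot
print(choose_tier(stale, now))   # cold
```

A rule this simple would route recent uploads to the hard drive racks and everything else to the cheaper tier; a real system would presumably weigh more signals than a single timestamp.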

Facebook’s new Luleå Data Centre in Sweden.

The normal solution to cold storage is magnetic tape: Plenty of other companies store their archived data on tape, whose data density is always increasing. But Corddry describes tape as an “operationally ugly thing”, and difficult to scale to Facebook’s four data centres. So he set out to find another storage system for those 92 per cent of photos that don’t often see the light of day, and was surprised to find out that there’s a growing interest in Blu-ray discs as archival storage.

Even though Blu-ray arrived at the exact wrong time for consumers, it’s still a useful and powerful technology: The discs withstand fluctuations in temperature and humidity better than sensitive hard drives, for example, and are often rated for 50 years of storage. It intrigued Corddry, and the team at Facebook began looking at storing their cold photos on Blu-ray. They partnered with a robotics company that helped them build a rack that could store and access stacks of hundreds of discs using a robotic arm, and worked out the numbers for what such a rack would cost to operate.

The Blu-ray storage prototype in action.

Their Blu-ray system is not only 50 per cent cheaper than hard drive storage; it also uses 80 per cent less power and is more resilient. Plus, it’s safer in a more conceptual way. “From a price point it’s a very compelling technology, but it’s also really nice from a diversity standpoint,” says Corddry. Diversity refers to how many different storage systems a company uses: if one fails or ends up crashing, it helps to have an entirely distinct system waiting in the wings.

To me, 50 years doesn’t seem like that long. Shouldn’t Facebook be planning for centuries? As Corddry explained, that approach just doesn’t make sense from an economic standpoint — or a technological one. “We need to be thinking very long term in terms of the durability we give to our customers,” he says. “That said, technology lifespan versus retaining the data are two different things.”

A Blu-ray disc and drive, part of the cold storage prototype.

In other words, in 10 years storage technology will have improved so much that it won’t make sense to use the team’s hacked-together Blu-ray contraption, even if it’s a great system today.

The long-term storage conundrum

Which gets to the heart of the conundrum confronting Facebook and many other internet giants: The data we generate is increasing exponentially. At the same time, the systems that can store that data are evolving at a quickening pace. Faced with the necessity of planning a real-world infrastructure for all that data, these companies have to choose their storage systems carefully, balancing cost and longevity with an eye to what might replace their racks in just a few years.

In that light, Corddry and his team start to feel a lot more important than “the guys who make the hardware for data centres.” With petabyte upon petabyte of data to store, tiny decisions about hardware can become massive liabilities, which explains why his team does everything from supply chain analysis to 3D printing. “Technology improves so much every year, it’s often not cost effective to run gear more than 10 or 20 years tops,” he says. “Just because it will be so obsolete, relative to what we can buy on the market at the time.”

One example of this thinking is already powering Facebook’s data centres: After an in-depth study of how lithium-ion batteries have evolved in electric cars over the past decade, including trips to battery factories and plenty of chemistry, the team decided to try the smaller, lighter, but pricier batteries as an alternative to the large lead-acid car batteries that normally keep the servers running. Even though data centres have long shunned lithium-ion batteries as too expensive, the team found that at Facebook’s scale they were actually the cheaper option, because they reduced the need for cabinet storage and other infrastructure for the car batteries.

So, rather than react to the shifts in storage technology as they happen, Facebook is taking an active role in designing that technology, which makes the hardware team into hybrid market analysts and engineers. The work they’re doing today might not even be technologically relevant in 10 years — but if your photos are still around in 2025, you’ll have them to thank.

Picture: Rows of servers at the Prineville, Oregon, data centre, via Facebook.